WO2019139951A1 - Detection of protein interaction sites in nucleic acids - Google Patents

Detection of protein interaction sites in nucleic acids

Info

Publication number
WO2019139951A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
protein
binding
artificial
domain
Application number
PCT/US2019/012851
Other languages
English (en)
Inventor
Daniel Zvi BAR
Francis S. Collins
Original Assignee
The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services
Application filed by The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6816 Hybridisation assays characterised by the detection means
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/305 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Micrococcaceae (F)
    • C07K14/31 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Micrococcaceae (F) from Staphylococcus (G)
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/315 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Streptococcus (G), e.g. Enterococci
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K19/00 Hybrid peptides, i.e. peptides covalently bound to nucleic acids, or non-covalently bound protein-protein complexes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/1003 Transferases (2.) transferring one-carbon groups (2.1)
    • C12N9/1007 Methyltransferases (general) (2.1.1.)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/90 Isomerases (5.)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6804 Nucleic acid analysis using immunogens

Definitions

  • The present invention lies in the fields of biochemistry, molecular biology and cell biology, and related fields, and concerns compositions and methods useful for the detection and mapping of protein interaction sites in nucleic acids, such as, but not limited to, binding sites for nucleic acid-binding proteins.
  • A version of this common method, chromatin immunoprecipitation followed by sequencing (ChIP-seq), involves cross-linking DNA-bound proteins to DNA by treating cell or tissue samples with a chemical cross-linker, such as formaldehyde, followed by fragmentation of the DNA (for example, by sonication) and immunoprecipitation of the protein cross-linked to DNA by an antibody specific for the DNA-binding protein of interest (such as a transcription factor).
  • The immunoprecipitated DNA fragments are then sequenced, and the resulting sequences are mapped to the genome.
  • Disadvantages of ChIP-seq include the need for a specific antibody of very high quality to immunoprecipitate the DNA-protein complex, and the losses and inefficiencies associated with immunoprecipitation, which result in the requirement for a large sample input (10³ to 10⁷ cells). These disadvantages make it difficult or impossible to perform ChIP-seq on single cells, rare cell types and small clinical samples.
  • Other methods include DNA Adenine Methyltransferase Identification (DamID) and Chromatin Immuno/Endogenous Cleavage methods (ChIC/ChEC).
  • DamID involves expressing a DNA adenine methyltransferase (Dam) fused to a DNA-binding protein of interest; the fusion protein catalyzes the methylation of Dam recognition sites in the vicinity of the binding site of the protein of interest. Methylated adenines are then detected, for example, by a PCR-based assay. DamID requires laborious construction of a unique fusion protein for each protein of interest, and the requirement for fusion protein expression makes DamID unsuitable for use on primary tissue samples.
  • ChIC/ChEC methods rely on tethering an endonuclease, either by antibodies or by fusion, to a DNA-bound protein of interest. The subsequent endonuclease activity produces a specific DNA cleavage pattern, allowing for the isolation and detection of double-stranded DNA fragments from the vicinity of the protein’s binding site. ChIC/ChEC methods cannot produce protein binding maps from small input samples, and each successful nuclease-targeting event yields only a single data point, from the cleaved fragment. Additionally, ChIC/ChEC methods cannot be multiplexed to produce data for several proteins in a single experiment.
  • Compositions described herein include artificially engineered nucleic acid-modifying enzymes, such as engineered methyltransferases, with antibody-binding domains.
  • The methods utilize a suitable “primary” antibody bound specifically to a nucleic acid-interacting protein of interest (for example, a protein bound, directly or indirectly, to its target nucleic acid, or found in proximity to it), and the engineered nucleic acid-modifying enzyme, such as an engineered methyltransferase, bound to the antibody/protein-of-interest complex.
  • A primary antibody may first be allowed to bind to the protein of interest, and the resulting complex then allowed to bind to the engineered nucleic acid-modifying enzyme.
  • Alternatively, an engineered nucleic acid-modifying enzyme can first be allowed to bind to the primary antibody or antibodies, and the resulting complexes then allowed to bind to their nucleic acid targets.
  • The methods according to the embodiments of the present invention are not limited by the order in which the primary antibody, the engineered nucleic acid-modifying enzyme and the nucleic acid-interacting protein of interest are bound to each other to form a complex that tethers the engineered nucleic acid-modifying enzyme in the vicinity of the protein interaction site in the nucleic acid.
  • The reaction catalyzed by the nucleic acid-tethered engineered nucleic acid-modifying enzyme modifies nearby reaction sites in the nucleic acid, producing distinct modification patterns (such as methylation patterns) in the nucleic acid sequence in proximity to the interaction site of a protein of interest, which allows the interaction site to be assessed by various nucleic acid analysis methods.
  • Some embodiments of the methods described herein use an engineered cytosine methyltransferase with an antibody-binding domain.
  • The engineered cytosine methyltransferase methylates cytosines in the nucleic acid that are in proximity to the tethered methyltransferase, typically within approximately 25 nm.
  • The methylation pattern of the nucleic acid is then analyzed, for example, by a bisulfite sequencing approach.
  • The bisulfite treatment used in such sequencing converts unmethylated cytosine residues to uracil residues (read as thymine after amplification) but leaves methylated cytosines unchanged.
  • Consequently, bisulfite-treated nucleic acid retains only the methylated cytosines, which were “protected” by methylation. Because the bisulfite treatment introduces specific changes in the DNA sequence that depend on the methylation status of individual cytosine residues, this exemplary embodiment of the method provides single-nucleotide-resolution information about the methylation status of a segment of DNA.
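The bisulfite readout described above can be sketched in a few lines of Python. This is a minimal illustration, not the source's analysis pipeline: it assumes complete conversion, considers only the top strand, and takes a read that is already aligned to the reference without insertions or deletions. The function names `bisulfite_convert` and `call_methylation` are hypothetical.

```python
from __future__ import annotations


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate bisulfite conversion of the top strand followed by PCR.

    Unmethylated C reads as T; a C whose 0-based position is in `methylated`
    stays C. All other bases are unchanged.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """Call methylation at each reference C from an aligned, equal-length read."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned and of equal length")
    calls: dict[int, bool] = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True    # protected from conversion: methylated
        elif read_base == "T":
            calls[i] = False   # converted: unmethylated
        # any other base (mismatch or SNP) is left uncalled
    return calls
```

For example, converting the reference `AGCTGCATCG` with cytosines 2 and 5 methylated gives the read `AGCTGCATTG`, from which `call_methylation` recovers methylated calls at positions 2 and 5 and an unmethylated call at position 8. Positions where the read shows neither C nor T are left uncalled rather than guessed.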
  • The methods described herein offer a number of significant improvements over previously known methods. For example, they do not require immunoprecipitation, which makes it possible to use a broad range of primary antibodies that may not be suitable for immunoprecipitation-based methods such as ChIP-seq.
  • The methods described herein also avoid the losses and inefficiencies associated with immunoprecipitation.
  • In contrast to DamID, the methods described herein also do not require uniquely engineered proteins or their expression in the cells, which makes them significantly less laborious and also suitable for use with primary cell and/or tissue samples.
  • An engineered nucleic acid-modifying enzyme according to embodiments of the present invention, such as an engineered methyltransferase, that has a short recognition sequence can generate a meaningful signal over a relatively short single nucleic acid molecule while maintaining a high mapping resolution.
  • The methods described herein allow for the identification of protein interaction sites in a single cell and/or a single nucleic acid molecule.
  • Using a variety of engineered nucleic acid-modifying enzymes with different recognition sequences and/or producing different nucleic acid modifications allows for the detection of different protein target sites, in some cases in a single experiment.
  • The variations and advantages of the compositions and methods of the present invention are discussed throughout this document and illustrated in the accompanying figures.
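The advantage of a short recognition sequence can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a random sequence with independent positions and a chosen GC content, which real genomes only approximate; it compares the dinucleotide GpC context methylated by ChAMP (Figures 4C and 4D) with the four-base GATC site recognized by Dam. The function `expected_sites` is hypothetical.

```python
def expected_sites(molecule_length: int, site: str, gc_content: float = 0.5) -> float:
    """Expected count of `site` on one strand of a random sequence.

    Assumes independent positions: each G or C has probability gc_content / 2,
    and each A or T has probability (1 - gc_content) / 2.
    """
    p_site = 1.0
    for base in site.upper():
        p_site *= gc_content / 2 if base in "GC" else (1 - gc_content) / 2
    # number of positions at which the site could start
    start_positions = max(molecule_length - len(site) + 1, 0)
    return start_positions * p_site


for site in ("GC", "GATC"):
    print(site, round(expected_sites(1000, site), 2))
```

With 50% GC content this prints `GC 62.44` and `GATC 3.89`: a 1,000 bp molecule is expected to carry roughly 62 GpC sites but only about 4 GATC sites, so a dinucleotide-specific enzyme samples the molecule far more densely.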
  • Embodiments of the present invention include an artificial nucleic acid-modifying enzyme comprising a nucleic acid modifying domain and an antibody-binding domain.
  • The nucleic acid-modifying domain may be a methyltransferase domain, a pseudouridine synthase domain or a guanine transglycosylase domain.
  • The nucleic acid-modifying domain may be a DNA or RNA methyltransferase domain.
  • The nucleic acid-modifying domain may be a DNA cytosine methyltransferase domain.
  • The antibody-binding domain may be derived from Protein A, Protein G, or other naturally occurring or artificial antibody-binding proteins.
  • In some embodiments, the artificial nucleic acid-modifying enzyme is a recombinant fusion polypeptide.
  • The nucleic acid-modifying domain and the antibody-binding domain may be separated by a linker.
  • Embodiments of the present invention also include nucleic acid sequences encoding the above artificial nucleic acid-modifying enzymes; methods of producing the artificial nucleic acid-modifying enzymes, which may include a step of expressing the above nucleic acid sequences; vectors comprising the above nucleic acid sequences; cells comprising such vectors; and kits comprising the above nucleic acid sequences, vectors, cells and other reagents and/or devices for producing the artificial nucleic acid-modifying enzyme.
  • Methods for detecting a protein interaction site in a nucleic acid may include the steps of: contacting a sample comprising a nucleic acid-interacting protein and a nucleic acid with a primary antibody specific for the nucleic acid-interacting protein, under conditions allowing binding of the primary antibody to the nucleic acid-interacting protein to occur; exposing the sample to the artificial nucleic acid-modifying enzyme under conditions allowing binding of the antibody-binding domain of the nucleic acid-modifying enzyme to the primary antibody to occur; concurrently with or subsequent to the exposing step, allowing the artificial nucleic acid-modifying enzyme to catalyze the nucleic acid modification reaction; subsequent to the modification reaction, isolating the nucleic acid molecule; and analyzing the nucleic acid molecule to detect modifications by the artificial nucleic acid-modifying enzyme, wherein the modifications are indicative of the proximity of the interaction site of the nucleic acid-interacting protein in the nucleic acid.
  • In some embodiments, the artificial nucleic acid-modifying enzyme is an artificial methyltransferase, and the modification reaction is a methylation reaction, such as cytosine or adenine methylation.
  • The sample can contain the nucleic acid-interacting protein interacting with its interaction site in the nucleic acid.
  • The nucleic acid can be DNA or RNA.
  • The nucleic acid-interacting protein can be a nucleic acid-binding protein, and the interaction site can be a binding site of the nucleic acid-binding protein.
  • The nucleic acid-interacting protein can be a transcription factor.
  • The analyzing step can include determining a sequence of the nucleic acid.
  • The analyzing step can include comparing the determined sequence of the nucleic acid to a reference sequence.
  • The artificial nucleic acid-modifying enzyme can be an artificial cytosine methyltransferase.
  • The modification reaction can be cytosine methylation.
  • The analyzing step can comprise treating the nucleic acid with bisulfite prior to determining the sequence of the nucleic acid.
  • The sample can be a eukaryotic cell or tissue sample.
  • The sample can be or include an immobilized nucleic acid or a nucleic acid array.
  • The sample can be or include an immobilized nucleic acid-interacting protein or a protein array.
  • The sample can include the nucleic acid-interacting protein interacting with the nucleic acid molecule.
  • The above methods can include a step of allowing the nucleic acid-binding protein to interact with the nucleic acid molecule.
  • The sample can include the nucleic acid-interacting protein cross-linked to the nucleic acid molecule.
  • The above methods can comprise a step of cross-linking the nucleic acid-binding protein to the nucleic acid molecule.
  • Also included are kits for detecting a protein interaction site in a nucleic acid, which include the artificial nucleic acid-modifying enzymes according to the embodiments of the present invention and, optionally, other reagents for performing the methods according to the embodiments of the present invention.
  • Figure 1 is a schematic illustration of an embodiment of a method of the present invention.
  • FIG 2 is a schematic illustration of selected embodiments of engineered methyltransferases comprising an antibody-binding domain (B) and a nucleic acid-modifying domain (B). As shown in the schematic illustration, the order of the domains can be changed, and they can be separated by an optional linker (L).
  • B: antibody-binding domain
  • B: nucleic acid-modifying domain
  • L: optional linker
  • FIG 3A is a schematic illustration of a recombinant construct encoding an engineered methyltransferase termed “ChAMP.”
  • Figures 3B and 3C are images of a stained gel (B) and a Western blot (C) demonstrating successful expression of the ChAMP recombinant protein. No primary antibody was used for the Western blot, as the protein G domain of ChAMP directly bound the secondary antibody used for detection (IRDye® 800CW (925-32210, LI-COR® Biotechnology, Lincoln, Nebraska)).
  • FIG. 4A is a compilation of immunofluorescence images of HeLa cell samples demonstrating successful binding of the ChAMP recombinant protein to primary antibodies in situ.
  • Left panel shows a positive control: the nuclear envelope imaged with a primary antibody (rabbit anti-lamin B) and an appropriate secondary antibody (Alexa Fluor® 555 anti-rabbit (Molecular Probes, Inc., Eugene, Oregon)).
  • Middle panel shows a negative control: the Cy3 anti-mouse antibody is unable to bind the primary antibody (rabbit anti-lamin B).
  • The inset in the top right corner of the middle panel was taken at 10x exposure and shows non-nuclear-envelope background.
  • Right panel shows that ChAMP binds both antibodies in situ.
  • ChAMP bound the primary antibody (rabbit anti-lamin B) and tethered it to the secondary antibody (Cy3 anti-mouse), as evident from the clear nuclear envelope images.
  • The images in the middle and right panels were taken at the same exposure, using the same antibody concentrations.
  • FIG. 4B is an image of a Western blot showing that purified ChAMP recombinant protein had methyltransferase activity following incubation with S-adenosyl methionine (SAM).
  • SAM: S-adenosyl methionine
  • Figures 4C and 4D are line plots of fluorescence intensities showing the results of fluorescent sequencing of DNA purified from HeLa cells and then methylated by ChAMP (4C), or of DNA methylated by ChAMP in situ in a sample of fixed HeLa cells (4D), prior to bisulfite conversion and sequencing.
  • The letters above the plots are the unmodified genomic DNA sequence, with arrows indicating the cytosines preserved from bisulfite conversion after ChAMP methylation.
  • Figures 4C and 4D show that only the cytosines following a guanine (that is, cytosines in a GpC context) were preserved during bisulfite conversion, indicating ChAMP methyltransferase activity.
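The GpC context noted for Figures 4C and 4D can be located programmatically. The sketch below lists cytosines preceded by a guanine on one strand. As an added assumption beyond the source, it sets aside GCG positions: in mammalian DNA such cytosines may also carry endogenous CpG methylation and so cannot be attributed to the GpC methyltransferase alone, a convention commonly followed in GpC-methyltransferase footprinting assays such as NOMe-seq. The function name `gpc_sites` is hypothetical.

```python
from __future__ import annotations


def gpc_sites(seq: str) -> dict[str, list[int]]:
    """Classify 0-based positions of cytosines preceded by G (GpC context).

    "GpC": cytosine not followed by G, attributable to GpC methylation alone.
    "GCG": cytosine that is also in a CpG context (or sits at the sequence end,
    where the next base is unknown), so endogenous CpG methylation
    cannot be excluded.
    """
    seq = seq.upper()
    informative: list[int] = []
    ambiguous: list[int] = []
    for i in range(1, len(seq)):
        if seq[i] == "C" and seq[i - 1] == "G":
            next_base = seq[i + 1] if i + 1 < len(seq) else None
            if next_base in ("G", None):
                ambiguous.append(i)
            else:
                informative.append(i)
    return {"GpC": informative, "GCG": ambiguous}
```

For `AGCTGCGATGC`, the cytosine at position 2 is an informative GpC, while positions 5 (GCG) and 10 (end of sequence) are set aside as ambiguous.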
  • FIG. 4E is a diagram schematically illustrating that anti-CTCF (CCCTC-binding factor) primary antibodies guided ChAMP to methylate DNA near a CTCF-binding site in a sample of fixed HeLa cells.
  • The upper diagram shows the PCR design relative to the CTCF-binding site.
  • The signal from a control ChIP-seq experiment is shown in blue.
  • Hae III restriction sites are indicated by black vertical bars, and PCR primers are indicated by red arrows.
  • Figure 4F is an image of the Western blot demonstrating that no PCR amplification of the fragment shown in Figure 4E occurred when ChAMP was not added (lane ChAMP -; Antibody -), that minimal PCR amplification resulting from non-specific binding occurred when the anti-CTCF antibody was omitted (lane ChAMP +; Antibody -), and that strong PCR amplification occurred when the anti-CTCF antibody guided ChAMP to methylate DNA near CTCF-binding sites (lane ChAMP +; Antibody +).
  • The first lane shows the markers.
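The logic of the Hae III readout in Figures 4E and 4F can be sketched as follows. The key assumption, implied by the assay design rather than stated explicitly in the source, is that a Hae III site (GGCC) is protected from cleavage when the cytosine of its internal GpC (offset 2 within GGCC) is methylated; an amplicon then yields PCR product only if every Hae III site between the primers is protected. Only the top strand is modeled, and the function names are hypothetical.

```python
from __future__ import annotations

import re


def haeiii_sites(seq: str) -> list[int]:
    """0-based start positions of Hae III recognition sites (GGCC) in seq."""
    return [m.start() for m in re.finditer("GGCC", seq.upper())]


def amplicon_survives(seq: str, methylated: set[int]) -> bool:
    """Return True if the amplicon is expected to amplify after Hae III digestion.

    Assumes a site is protected when its internal GpC cytosine (site start + 2)
    is in `methylated`; any unprotected site is assumed to be cut, blocking PCR.
    """
    return all(start + 2 in methylated for start in haeiii_sites(seq))


amplicon = "ATGGCCTTAGGCCA"
print(haeiii_sites(amplicon))                 # [2, 9]
print(amplicon_survives(amplicon, {4, 11}))   # True: both sites protected
print(amplicon_survives(amplicon, {4}))       # False: the site at 9 is cut
```

Under this model an amplicon without any Hae III site would amplify regardless of methylation, so an informative amplicon must contain at least one such site; in Figure 4E the primers flank Hae III sites near the CTCF-binding site.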
  • Figure 5A is a compilation of immunofluorescence images of fixed HeLa cell samples, with the fluorescence signal from non-specific binding detected at the red wavelength.
  • Top-left panel (no secondary antibody added) shows the autofluorescence background.
  • Top-right panel (secondary antibody only) shows the background from non-specific binding of the secondary antibody.
  • Bottom-left panel (ChAMP treatment followed by secondary antibody incubation) shows the fluorescence background, predominantly from ChAMP.
  • Figure 5B is the image of the Western blot demonstrating that the detergents NP40, Tween-20 (T20) and Triton X-100 (T), separately or together (T3: all three detergents), did not inhibit the enzymatic activity of ChAMP, as evident from the comparison of Hae III digestion patterns in the presence of ChAMP (Hae III +; ChAMP +) with (+) or without (-) the detergents.
  • Figures 5A and 5B together demonstrate that the detergent washes removed the non-specific binding of ChAMP without affecting its enzymatic activity.
  • Figure 6A is an image of the Western blot demonstrating that ChAMP was stable when incubated in GpC buffer (50 mM NaCl; 50 mM Tris-HCl; 10 mM dithiothreitol (DTT); pH 8.5) or in double-distilled water (DDW), but not at elevated temperatures (the lanes labeled 42C and 55C) or in phosphate-buffered saline with Tween-20 (PBST).
  • GpC buffer: 50 mM NaCl; 50 mM Tris-HCl; 10 mM dithiothreitol (DTT); pH 8.5
  • DTT: dithiothreitol
  • PBST: phosphate-buffered saline with Tween-20
  • ChAMP was incubated in the indicated buffer or at the indicated temperature for the specified time, and then used to methylate plasmid DNA (pcDNA3-GFPLaminA-R482W, obtained from Addgene, Cambridge, Massachusetts; the choice of plasmid is not of particular significance) before Hae III digestion.
  • Lane E is a sample used to test the activity of ChAMP incubated for 1 hour in the original elution buffer used for ChAMP purification. Based on this testing, the elution buffer was deemed unsuitable for ChAMP storage.
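Preparing the GpC buffer listed above is a standard C1V1 = C2V2 dilution. In the sketch below, the final concentrations come from the source, while the stock concentrations (5 M NaCl, 1 M Tris-HCl already adjusted to pH 8.5, 1 M DTT) and the 10 mL final volume are hypothetical choices for illustration; the function `dilution_volumes` is likewise hypothetical.

```python
from __future__ import annotations


def dilution_volumes(final_mM: dict[str, float], stock_mM: dict[str, float],
                     total_mL: float) -> dict[str, float]:
    """Volumes (mL) of each stock plus water needed to reach the final concentrations.

    Applies C1 * V1 = C2 * V2 per component, i.e. V1 = C2 * V2 / C1.
    """
    volumes: dict[str, float] = {}
    for component, c_final in final_mM.items():
        c_stock = stock_mM[component]
        if c_final > c_stock:
            raise ValueError(f"{component}: stock is less concentrated than the target")
        volumes[component] = c_final * total_mL / c_stock
    water = total_mL - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute to reach the targets in this volume")
    volumes["water"] = water
    return volumes


gpc_buffer = {"NaCl": 50.0, "Tris-HCl pH 8.5": 50.0, "DTT": 10.0}   # from the source
stocks = {"NaCl": 5000.0, "Tris-HCl pH 8.5": 1000.0, "DTT": 1000.0}  # hypothetical
for name, volume in dilution_volumes(gpc_buffer, stocks, total_mL=10.0).items():
    print(f"{name}: {volume:.2f} mL")
```

With these assumed stocks, 10 mL of buffer takes 0.10 mL NaCl, 0.50 mL Tris-HCl, 0.10 mL DTT and 9.30 mL water.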
  • FIG. 6B is an image of the Western blot illustrating the results of testing the ability of ChAMP to methylate DNA in situ depending on the fixation procedure used on HeLa cell samples.
  • The fixation procedure is indicated by the “Fixation” label as follows: no procedure, -; formaldehyde (FA) fixation, FA at the indicated concentrations; UV, ultraviolet (UV) exposure fixation at the indicated doses (µJ/cm² x 100) using a Stratalinker 2400 UV Crosslinker with a 254 nm bulb.
  • The samples were methylated by ChAMP in situ.
  • The DNA was then extracted, digested with Hae III and amplified by PCR within the exponential range.
  • Figure 6B demonstrates that fixation at the low FA concentration allowed for ChAMP methylation, which was improved by a 10-minute incubation of the fixed cells at 65°C. In contrast, UV fixation inhibited ChAMP methylation.
  • Figure 7 is a bar graph showing the results of real-time PCR amplification of Hae III-digested DNA from formaldehyde-fixed HeLa cells after anti-CTCF antibody-mediated ChAMP methylation of CTCF-binding sites.
  • The graph indicates significant methylation at two different CTCF-binding sites (the bars labeled “CTCF”) but not at two different non-CTCF protein binding sites (the bars labeled “non-CTCF”), demonstrating targeted methylation of CTCF-binding sites by ChAMP.
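The source does not state how the real-time PCR signal in Figure 7 was converted into a methylation measure. One common approach, shown here only as an assumption, compares the threshold cycle (Ct) of Hae III-digested DNA with that of a mock-digested aliquot of the same sample: with equal input and an amplification factor of E per cycle, the fraction of template surviving digestion is approximately E raised to the power -(Ct_digested - Ct_mock). The function name and the Ct values below are hypothetical, not data from the figure.

```python
def fraction_protected(ct_digested: float, ct_mock: float,
                       efficiency: float = 2.0) -> float:
    """Approximate fraction of template that survived Hae III digestion.

    Assumes equal template input in both reactions and a constant
    per-cycle amplification factor (2.0 corresponds to 100 % efficiency).
    """
    delta_ct = ct_digested - ct_mock
    return efficiency ** (-delta_ct)


# Hypothetical Ct values, for illustration only
print(fraction_protected(ct_digested=25.0, ct_mock=22.0))   # 0.125
```

Under this model, a site methylated by the antibody-guided enzyme would show a smaller Ct shift after digestion (a larger surviving fraction) than an untargeted site.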
  • FIG. 8 is a bar graph showing how ChAMP methylation was distributed relative to CTCF binding sites after anti-CTCF antibody-mediated ChAMP methylation of CTCF binding sites in formaldehyde-fixed HeLa cell samples. The figure illustrates that there is a strong enrichment near CTCF-binding sites, with the periodic decreases representing locations of the nucleosomes.
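A distribution like the one in FIG. 8 can be built by measuring, for each methylated position, the signed distance to the nearest CTCF-binding site and binning the results. The sketch below assumes that positions and sites lie on a single chromosome in the same coordinate system; the function names, bin size and distance cutoff are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left
from collections import Counter


def nearest_signed_distance(pos: int, sorted_sites: list[int]) -> int:
    """Signed distance from pos to the nearest site (negative: pos lies left of it).

    Ties are resolved in favor of the site to the right of pos.
    """
    if not sorted_sites:
        raise ValueError("no binding sites given")
    i = bisect_left(sorted_sites, pos)
    candidates = []
    if i < len(sorted_sites):
        candidates.append(pos - sorted_sites[i])      # site at or right of pos
    if i > 0:
        candidates.append(pos - sorted_sites[i - 1])  # site left of pos
    return min(candidates, key=abs)


def distance_histogram(methylated: list[int], sites: list[int],
                       bin_size: int = 50, max_distance: int = 1000) -> Counter:
    """Count methylated positions per distance bin (each bin keyed by its lower edge)."""
    sorted_sites = sorted(sites)
    histogram: Counter = Counter()
    for pos in methylated:
        d = nearest_signed_distance(pos, sorted_sites)
        if abs(d) <= max_distance:
            histogram[(d // bin_size) * bin_size] += 1
    return histogram
```

Plotted against the bin keys, such a histogram would peak near zero if methylation is concentrated around the binding sites; positions farther than `max_distance` from every site are dropped.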
  • Figures 9A and 9B show a schematic representation, generated by the UCSC Genome Browser (University of California, Santa Cruz, Genomics Institute), of a representative single nanopore-sequenced DNA molecule approximately 150 kbp in size.
  • Figure 9B is a zoom-in of Figure 9A.
  • The DNA molecule was isolated from HeLa cells and subjected to antibody-guided methylation using ChAMP targeting H3K27ac, a modification of histone H3 consisting of acetylation of the lysine residue at position 27.
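The source does not describe how methylation was called from the nanopore signal. Assuming per-site calls (position on the read, methylated or not) are already available from a separate methylation caller, one simple way to summarize a single long molecule such as the one in Figures 9A and 9B is the fraction of methylated sites in sliding windows. The function name and window parameters are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Optional


def windowed_methylation(calls: Iterable[tuple[int, bool]], read_length: int,
                         window: int = 1000, step: int = 500,
                         min_sites: int = 5) -> list[tuple[int, Optional[float]]]:
    """Fraction of methylated calls in sliding windows along a single read.

    calls: (0-based position on the read, is_methylated) pairs.
    Windows with fewer than min_sites calls report None instead of a fraction.
    """
    calls = list(calls)
    results: list[tuple[int, Optional[float]]] = []
    for start in range(0, max(read_length - window, 0) + 1, step):
        end = start + window
        in_window = [methylated for pos, methylated in calls if start <= pos < end]
        if len(in_window) >= min_sites:
            results.append((start, sum(in_window) / len(in_window)))
        else:
            results.append((start, None))
    return results
```

Reporting `None` for sparse windows keeps regions with few informative sites from being read as unmethylated; the linear scan per window is simple rather than fast, which is adequate for a single molecule.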
  • Embodiments of the present invention provide improved methods for detection and mapping of the protein interaction sites in nucleic acids.
  • One example of such sites is a binding site for a nucleic acid-binding protein, such as a transcription factor binding site in genomic DNA.
  • The methods according to the embodiments of the present invention may be referred to as “detection methods,” “mapping methods,” “methods for assessment” and other similar terms.
  • The compositions for performing such detection methods, such as reagents and their mixtures, as well as the related kits and the methods embodying the analytical, diagnostic, clinical and therapeutic applications of the detection methods, are also included among the embodiments of the present invention.
  • The terms “a,” “an,” and “the” can refer to “one,” “one or more” or “at least one,” unless specifically noted otherwise.
  • The term “occurrence” may be used herein to denote an incidence of something, for example, molecules, residues, sites etc., as well as the frequency of their appearance, their quantity, or their distribution.
  • The combination of such information can be referred to as a “pattern.”
  • Any of the foregoing information falling within the meaning of the term “occurrence” can be utilized in relation to modified nucleotides, such as methylated or unmethylated nucleotides, protein interaction sites in nucleic acids, modification (such as methylation) sites in nucleic acids, restriction sites, nucleotide sequences, etc.
  • The term “occurrence” can be utilized in relation to methylation sites for methyltransferases and/or in relation to binding sites of nucleic acid-binding proteins.
  • Information on occurrences or patterns, such as the information obtained in the course of performing the methods described herein, can be compared or correlated with information previously obtained, processed or stored. For example, such information can be compared to known or determined reference nucleic acid sequences.
  • The results of such comparison can lead to the assessment, detection or mapping of a protein interaction site in a nucleic acid, such as a binding site of a nucleic acid-binding protein.
  • The terms “assess,” “assessment” and similar terms are used herein to broadly refer to a process of discovering or determining the presence or absence, as well as the degree, quantity, level, probability of occurrence or properties, of something.
  • When used in reference to a protein interaction site in a nucleic acid, such as a binding site of a nucleic acid-binding protein, the term “assess” and related terms denote detection or determination of the presence, absence, probability of presence or absence, quantity or changes of such a site, as well as determination of the site’s sequence and/or location in a nucleotide sequence (which can be referred to as “mapping”).
  • The term “assess” and related terms can be used interchangeably with the terms “monitor,” “detect,” “detecting,” “indicate,” “map,” “mapping” and other related terms.
  • The terms “analysis” or “analyzing” and similar terms are used herein to broadly refer to studying, determining or identifying the nature, properties, or quantity of an object under analysis, or of its components. Analysis can include assessment or detection, as discussed above. Analysis can include studying, determining or identifying changes, for example, changes over time or under different conditions. Analysis can also involve chemical or biochemical manipulations or steps, manipulations or steps of another nature, and manipulation of information in an appropriate manner (for example, storage of information in computer memory and computer calculations may be used).
  • The terms “analysis,” “analyzing” and related terms can be used interchangeably with the terms “assessment,” “detection,” “identification,” “monitoring” and other related terms.
  • The terms “analysis,” “analyzing” and related terms can be used in relation to nucleic acid sequencing, described in more detail elsewhere in this document.
  • The terms “subject,” “individual,” and “patient” may be used interchangeably. The use of these terms does not imply any kind of relationship to a medical professional, such as a nurse, a medical or veterinary technician, a physician or a veterinarian.
  • The term “subject” and related terms refer to an organism. A subject may be a mammal, such as a primate, including a human.
  • The term “subject” includes non-human animals, for example, domesticated animals, such as cats, dogs, etc., livestock (cattle, horses, pigs, sheep, goats, etc.), and laboratory animals (mouse, rabbit, rat, guinea pig, etc.).
  • The term “subject” may refer to a subject, such as, but not limited to, a human person, who may or may not have a medical disease or condition. It is to be understood that a subject having a medical disease or condition can be a patient with a known disease or condition, meaning a disease or condition that was detected prior to the performance of the embodiments of the methods of the present invention, or a subject with a previously undetected disease or condition.
  • The term “condition,” when used in reference to the embodiments of the invention described herein, is used broadly to denote a biological state or process, which can be normal, abnormal or pathological.
  • The term “condition” can be used to refer to a medical or clinical condition, meaning broadly a process occurring in a body or an organism and distinguished by certain symptoms and signs.
  • The term “condition” can be used to refer to a disease or pathology, meaning broadly an abnormal condition affecting a body or an organism.
  • The terms “sample” or “samples” as used herein are not intended to be limiting unless qualified otherwise, and refer to any product, composition, cell, tissue or organism that may be analyzed by the methods described in this document.
  • A sample may be any cell or tissue sample or extract originating from cells, tissues or subjects, and includes samples of human or animal cells or tissues as well as cells of non-human or non-animal origin, including bacterial samples.
  • A sample can be directly obtained from a human or animal organism, or synthesized, propagated, cultured or otherwise generated.
  • A sample may be an ex vivo sample, an in vitro sample, a laboratory sample, a synthetic sample, etc.
  • Samples can be subject to various treatments, storage or processing procedures before being analyzed according to the methods described in this document. Generally, the terms “sample” or “samples” are not intended to be limited by their source, origin, manner of procurement, treatment, processing, storage or analysis, or any modification. Samples include, but are not limited to, samples of cells, tissues and/or organs. Samples encompass samples of healthy or pathological cells, tissues and/or organs. Samples can contain or be predominantly composed of cells or tissues, or can be prepared from cells or tissues.
  • Examples of samples are solutions, suspensions, supernatants, precipitates (for example, cell precipitates), pellets, cell extracts (for example, cell lysates), blood or plasma samples, tissue sections, including needle biopsies, microscopy slides, including fixed tissues (for example, formalin-fixed, paraffin-embedded (FFPE)) or frozen tissue sections, flow cytometry samples, and fixed cell and tissue samples.
  • Samples also include samples comprising nucleic acids (including DNA and RNA), such as samples of cells or tissues, chromatin samples, samples of isolated and/or artificially generated nucleic acids, samples of immobilized nucleic acids, nucleic acid arrays, etc.
  • The term “specific binding” refers to a molecular interaction in which, under designated conditions, a specific binding molecule, or a composition containing it, binds to its binding partner or partners and does not bind in a significant amount to anything else. Binding to anything other than the binding partner is typically referred to as “nonspecific binding” or “background.” The absence of binding in a significant amount is considered, for example, to be binding of less than 1.5 times background (the level of non-specific binding or slightly above non-specific binding levels).
  • Examples of specific binding are antibody-antigen or antibody-epitope binding, binding of oligo- or polynucleotides to other oligo- or polynucleotides, binding of oligo- or polynucleotides to proteins or polypeptides (and vice versa), binding of proteins or polypeptides to other proteins or polypeptides, and receptor-ligand binding.
  • Specific binding molecules can be or can include a protein, a polypeptide, an antibody, an oligo- or polynucleotide, a receptor, or a ligand. This list is not intended to be limiting, and other types of specific binding molecules may be employed.
  • The term “primary binding molecule” is used herein to denote a specific binding molecule capable of specifically binding a target molecule directly.
  • A primary binding molecule can be an antibody, which can be referred to as a “primary antibody.”
  • A nucleic acid-binding protein that directly binds its nucleic acid target may be viewed as a primary binding molecule.
  • The term “target” is used herein to denote a molecule, or a part thereof, that interacts with another molecule, such as a part of a nucleic acid molecule that is an interaction site of a nucleic-acid-interacting protein, for instance, a binding site of a nucleic acid-binding protein.
  • For example, a binding site of a transcription factor in genomic DNA may be described as the transcription factor’s “target,” “target binding site,” “target sequence,” etc.
  • Similarly, a recognition site for a restriction endonuclease may be described as its “target.”
  • A recognition site for a nucleic acid methyltransferase can also be described as its “target.”
  • The use of the term “target” does not necessarily mean or imply direct binding of the two molecules, such as a protein and a nucleic acid. Although direct binding may occur in some situations, such as antibody-antigen binding or binding of a nucleic acid-binding protein to a nucleic acid, in other situations the interaction may be indirect.
  • For example, a first nucleic-acid-interacting protein may not bind a nucleic acid directly but may instead interact directly with a second nucleic-acid-interacting protein, which, in turn, may bind directly to a nucleic acid sequence; that sequence may be considered a “target” sequence for the first and/or the second protein.
  • In another example, a first nucleic-acid-interacting protein may bind directly to a nucleic acid at one site and also interact with a second nucleic-acid-interacting protein, which, in turn, may bind to the same or a different nucleic acid at a different binding site. Both binding sites may be considered “target” sequences for the first and/or the second protein.
  • In such cases, the sequence or sequences in the nucleic acid may be referred to as a “target” of the first protein and/or the second protein.
  • The term “target” may be used to denote an interaction site or sequence, a recognition site or sequence, a binding site or sequence, a molecular binding partner, a site at which an enzymatic reaction occurs (such as a methylation site or a restriction site), and other similar terms and concepts.
  • The term “target” may also be used interchangeably and/or in conjunction with the above and other relevant terms.
  • The term “interaction,” as used herein, encompasses direct binding (covalent and non-covalent), indirect binding (such as via a mediator molecule), binding of molecular complexes, etc.
  • The term “interaction” also encompasses close molecular proximity that may be detected by the methods according to the embodiments of the present invention.
  • The term “target” may also be used to denote an interaction or proximity of the engineered nucleic-acid modifying proteins of the present invention and/or the target protein that results from the three-dimensional conformation of the nucleic acid being tested, for example, from the three-dimensional arrangement of chromatin in situ.
  • The term “antibody” and related terms, in the broadest sense, are used herein to denote any product, composition or molecule that contains at least one epitope-binding site, meaning a molecule capable of specifically binding an “epitope,” a region or structure within an antigen.
  • The term “antibody” encompasses whole immunoglobulins (intact antibodies) of any class, including natural, natural-based, modified and non-natural antibodies, as well as their fragments.
  • The term “antibody” encompasses “polyclonal antibodies,” which react against the same antigen but may bind to different epitopes within it, as well as “monoclonal antibodies” (“mAbs”), meaning a substantially homogeneous population of antibodies or an antibody obtained from a substantially homogeneous population of antibodies.
  • In a population of mAbs, the antigen-binding sites of the individual antibodies are composed of polypeptide regions that are similar (although not necessarily identical) in sequence.
  • The term “antibody” also encompasses fragments, variants, and modified and engineered antibodies, such as those produced artificially (“engineered”), for example, by recombinant techniques.
  • The term “antibody” encompasses, but is not limited to, chimeric antibodies and hybrid antibodies, antibodies with dual or multiple antigen or epitope specificities, and fragments, such as F(ab')2, Fab', Fab, hybrid fragments, single-chain variable fragments (scFv), “third generation” (3G) fragments, fusion proteins, and single-domain and “miniaturized” antibody molecules.
  • The abbreviation “MTase” denotes a methyltransferase.
  • A nucleic acid MTase is an enzyme that catalyzes the transfer of a methyl group to a nucleic acid.
  • A DNA MTase is an enzyme that catalyzes the transfer of a methyl group to DNA.
  • Some DNA methyltransferases are site-specific enzymes that methylate the amino group at the C-4 position of cytosines in DNA; they can be referred to as “cytosine-N4-specific.”
  • Naturally occurring DNA C-MTases methylate a ring carbon and form C5-methylcytosine.
  • Naturally occurring N-MTases methylate an exocyclic nitrogen and form either N4-methylcytosine (N4-MTases) or N6-methyladenine (N6-MTases).
  • An RNA MTase is an enzyme that catalyzes the transfer of a methyl group to RNA; examples include 2′-O RNA MTases.
  • The abbreviation “SAM” denotes S-adenosyl methionine, the methyl-group donor used by methyltransferases.
  • Other uses of methyltransferases also fall within the scope of the term.
  • One example is a situation in which a methyltransferase uses a SAM analog, which leads to a modification of a nucleic acid other than methylation.
  • Non-limiting examples of nucleic acid MTases that can be suitably used in the methods and compositions according to the embodiments of the present invention are cytosine methyltransferases with various recognition sequences, such as (A/T/G/C)pC, Cp(A/T/C), NpNpC, CpHpN (H is anything but G) or any 3- or 4-base specific or semi-specific recognition sequence (one example is a DNA cytosine methyltransferase with a GpC recognition sequence), and adenine methyltransferases, such as Dam.
  • The term “methyltransferase” is used herein to denote natural or artificial (engineered, recombinantly produced, modified, etc.) methyltransferases, their fragments, variants, isoforms, etc.
  • Nucleic-acid modifying enzymes included among or employed in the embodiments of the present invention broadly encompass enzymes capable of modifying a nucleic acid base in such a way that the modification can be detected by a sequencing method.
  • Non-limiting examples of nucleic-acid modifying enzymes are methyltransferases, deaminases, pseudouridine synthases and guanine transglycosylases.
  • The terms “isolate,” “separate” or “purify,” and similar terms, are not necessarily used to refer to the removal of all materials other than the components of interest from a sample. Instead, in some embodiments, the terms are used to refer to a procedure that enriches the amount of one or more components of interest relative to one or more other components present in the sample. In some embodiments, “isolation,” “separation” or “purification” may be used to remove, or decrease the amount of, one or more components of a sample that could interfere with the detection of the component of interest.
  • A membrane, where used, is typically a nitrocellulose or PVDF (polyvinylidene difluoride) membrane.
  • The term “lysis” and related terms are used herein to refer to the breaking down of the membrane of a cell. A fluid or suspension containing the contents of lysed cells is called a lysate.
  • The term “sequencing” is used herein to denote the process of determining, fully or partially, a nucleic acid sequence.
  • The terms “sequencing,” “sequence determination,” “to sequence” and related terms can be used to refer to the methods, procedures and protocols by which nucleic acid sequences are detected. Sequencing can involve nucleic acid amplification (such as polymerase chain reaction (PCR)), quantitative amplification (such as quantitative PCR (qPCR)), etc. Amplifications may be monitored in “real time,” for example, in real-time PCR.
  • Sequencing can be particularly effective when high-throughput sequencing is used, for example, HiSeq™, MiSeq™ or Genome Analyzer (Illumina, San Diego, California), SOLiD™ or Ion Torrent™ (Life Technologies, Carlsbad, California) and 454™ sequencing (Roche Diagnostics, Indianapolis, Indiana).
  • In high-throughput sequencing, parallel sequencing reactions using multiple templates and multiple primers allow rapid sequencing of genomes or large portions of genomes.
  • Amplicons may be sequenced by a base-incorporation method, a pyrosequencing method, a hydrogen ion detection method, or a dye-terminator detection method.
  • Sequencing may use an in vitro cloning (nucleic acid amplification) step or steps to amplify individual DNA molecules. Such methods may be referred to as “indirect” sequencing. Sequencing can also involve direct detection of nucleotide composition and/or modifications on a single molecule, including nanopore sequencing (for example, using the devices supplied by Oxford Nanopore Technologies, Oxford, United Kingdom) or zero-mode waveguide sequencing (for example, PacBio® SMRT supplied by Pacific Biosciences, Menlo Park, CA). Such sequencing processes may be referred to as “direct” or “single-molecule” sequencing. Deep sequencing technologies and instruments (meaning technologies and instruments capable of digital sequence readout) may also be employed. Sequencing may involve comparing the information obtained by a sequencing procedure with a known “reference” sequence.
  • The term “mapping” denotes the methods used to describe the positions or locations of particular nucleic acid segments or sequences, such as the location of protein-interaction sites in a larger nucleic acid sequence, gene, chromatin, etc. Mapping may involve the comparison of sequencing data with a known “reference” sequence. Mapping may also refer to the result of a mapping process.
  • The terms “cross-linking,” “cross-linked” and related terms are used herein to denote a process, or the result of a process, of joining molecules together by a chemical bond.
  • One example of cross-linking is formaldehyde cross-linking of biological molecules, which is a form of chemical cross-linking.
  • Formaldehyde reacts with a relatively strong nucleophile, most commonly the ε-amino group of a lysine in a protein. This reaction forms a methylol intermediate that can lose water to yield a Schiff base (an imine), which, in turn, reacts with another nucleophile to generate a cross-linked product.
  • This second nucleophile may be a moiety of a nucleic acid, another protein, the same protein as the first nucleophile, a quencher molecule, or another endogenous small molecule.
  • A protein-DNA cross-linked product is one of the possibilities. All of the reactions in the formaldehyde cross-linking process are reversible.
  • Formaldehyde cross-linking can be reversed by heating a formaldehyde-cross-linked sample in the presence of a salt, such as NaCl.
  • Glutaraldehyde cross-linking is another example of chemical cross-linking.
  • Cross-linking can also be accomplished by exposure to electromagnetic radiation, such as UV light.
  • The term “fixation” and related terms are used herein to refer to processes used to stabilize proteins, nucleic acids and other components of cell and tissue samples, and may be used interchangeably with the term “cross-linking,” depending on the context. Fixation processes may involve exposure to radiation, such as UV radiation, or to chemical reagents referred to as “fixatives,” which typically make cell and tissue components, including proteins, insoluble.
  • Some fixatives, such as aldehydes (for example, glutaraldehyde, formaldehyde or acrolein (propenal)), can create chemical bonds between proteins and/or between proteins and nucleic acids in a sample.
  • Other fixatives act by reducing the solubility of protein molecules and/or disrupting hydrophobic interactions.
  • Examples of precipitating fixatives are alcohols, such as ethanol and methanol.
  • Another example of a fixative is acetone.
  • Samples may be exposed to a fixative (“fixed”) by various procedures. For example, a fixative may be injected into an animal and spread through the animal’s body via blood flow in a process commonly called “perfusion.” Samples may then be prepared from the perfused animal’s tissues. In another example, a cell sample may be immersed in a fixative and incubated to allow the fixative to diffuse. The duration of the procedure is determined by tissue type, size and density, as well as by the type of fixative employed.
  • The term “permeabilization” is used herein to refer to processes used to make tissues, cell membranes or cell walls permeable, for example, to polypeptides and/or proteins. Permeabilization may be achieved by exposing a sample to organic solvents and/or detergents. Some examples of detergents useful for permeabilizing samples in the methods of the present invention are Triton X-100, polyoxyethylene (20) sorbitan monolaurate (Tween-20) and saponin.
  • The expressions “antibody-binding polypeptide” or “antibody-binding protein” are used herein to refer to a polypeptide that is capable of binding to immunoglobulin molecules.
  • Some examples of antibody-binding polypeptides are Protein A, Protein G, Protein M and Protein L, which are proteins of microbial origin that bind to mammalian immunoglobulin molecules.
  • Antibody-binding polypeptides can be recombinant polypeptides derived from the above or other naturally occurring binding proteins, or can be artificially engineered.
  • The expressions “antibody-binding domain” or “antibody-binding site” can also be used to refer to antibody-binding polypeptides or to parts or fragments of antibody-binding polypeptides.
  • Embodiments of the present invention include polypeptides that can be referred to as “fusion proteins.”
  • A “fusion protein” refers to a composition containing at least one polypeptide or peptide domain that is associated with a second domain.
  • The second domain can be a polypeptide, peptide, polysaccharide, or the like.
  • The “fusion” generally can be an association generated by a peptide bond, a chemical linkage, a charge interaction (for example, electrostatic attractions, such as salt bridges, H-bonding, etc.), a ligand-ligand noncovalent interaction or the like.
  • Embodiments of the present invention include “fusion proteins” that are recombinant polypeptides comprising at least two domains with different functional properties that are translated from a common nucleic acid sequence and are thus found in the same polypeptide chain.
  • The domains can be linked by any chemical or electrostatic means.
  • Fusion proteins can include one or more optional linkers, epitope tags, enzyme cleavage recognition sequences, signal sequences, secretion signals, and the like. Fusion proteins can contain polypeptides intended to improve expression and/or purification, for example, histidine-tagged fusion proteins, thioredoxin fusion proteins, etc.
  • A polypeptide according to the embodiments of the present invention may contain a terminal polyhistidine (poly-His) amino acid sequence and/or a FLAG sequence (tag) to simplify purification.
  • Embodiments of the present invention encompass nucleic acids, including DNA and RNA, and polypeptides, which can be characterized by their respective sequences.
  • Embodiments of the present invention may encompass homologues, variants, isoforms, fragments, mutants, modified forms and other variations of the amino acid and nucleic acid sequences described in this document.
  • The terms “homologous,” “homologues” and related terms used in this document in reference to various amino acid and nucleic acid sequences are intended to describe a degree of sequence similarity among protein sequences or among nucleic acid sequences, calculated according to an accepted procedure.
  • Homologous sequences may be at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% similar to reference sequences.
  • The “% similarity” of two amino acid sequences or of two nucleic acid sequences is determined using the algorithm of Karlin and Altschul, which is incorporated into the NBLAST and XBLAST programs, available for public use through the website of the National Institutes of Health (U.S.A.). To obtain gapped alignments for comparison purposes, Gapped BLAST is used.
  • Nucleotide sequences are also homologous when two polynucleotide molecules hybridize to each other, or to a third nucleic acid, under stringent conditions.
  • Stringent conditions are sequence-dependent and will differ in different circumstances.
  • Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe.
  • Typically, stringent conditions will be those in which the salt concentration is about 1 molar at pH 7 and the temperature is at least about 60°C.
  • RNA polynucleotides can be identified in Northern blots under stringent conditions using nucleotide sequences, or fragments thereof, of, for example, at least about 100 nucleotides.
  • Stringent conditions for such RNA-DNA hybridizations are those that include at least one wash in 6× SSC for 20 minutes at a temperature of at least about 50°C, usually about 55°C to about 60°C, or equivalent conditions.
  • Homologous nucleic acid molecules may include nucleic acid molecules that hybridize, under defined stringent conditions, with other nucleic acid molecules.
  • An indication that protein amino acid sequences are homologous may be that one protein is immunologically reactive with antibodies raised against the other protein.
  • Fragments of a polypeptide or an amino acid sequence can include any portion of a polypeptide or an amino acid sequence of at least 3, 5, 8, 10, 15, 20, 25, 30, 35, 40, 45 or 50 amino acids.
  • Variants of a polypeptide may result from sequence variations, such as amino acid substitutions, deletions, and insertions, as well as from post-translational modifications and their variations.
  • Variations in post-translational modifications can include variations in the type or amount of carbohydrate moieties of the protein core or any fragment or derivative thereof.
  • Variations in amino acid sequence may arise naturally as allelic variations (such as due to genetic polymorphism) or may be produced by human intervention (such as by mutagenesis of cloned DNA sequences), the examples being induced point, deletion, insertion and substitution mutants.
  • Variations in a nucleic acid sequence may result in changes in the amino acid sequence, provide silent mutations, modify a restriction site, or provide other specific mutations.
  • Variants of a polypeptide may also be conformational variations, with or without changes in the amino acid sequence and/or post-translational modifications.
  • The term “mutation,” when used in reference to a nucleotide or amino acid sequence, can be used interchangeably and/or in conjunction with the terms “variant,” “allelic variant,” “variance,” or “polymorphism.”
  • Amino acid sequence modifications include substitutions, insertions or deletions. Insertions include amino and/or carboxyl terminal fusions as well as intrasequence insertions of single or multiple amino acid residues. Insertions ordinarily will be smaller insertions than those of amino or carboxyl terminal fusions, for example, on the order of one to four residues. Deletions are characterized by the removal of one or more amino acid residues from the protein sequence.
  • Amino acid substitutions are typically of single residues but may include multiple substitutions at different positions; insertions usually will be on the order of about 1 to 10 amino acid residues but can be more; and deletions will range from about 1 to 30 residues, but can be more. Amino acid substitutions may be characterized as “conservative,” meaning substitution of an amino acid with another amino acid having similar properties. Some examples of conservative substitutions are shown in Table 1, below. A variant or an isoform can contain one or more substitutions (including, for example, conservative amino acid substitutions, such as 1-5, 1-10, 1-20, 1-50 or more conservative amino acid substitutions), deletions or insertions.
  • An isoform or a variant can be a result of post-translational modifications, derivatizations or lack thereof.
  • Variants may arise, for example, as a result of differences in glycosylation, such as N- and O-glycosylation.
  • Glutaminyl and asparaginyl residues are frequently post-translationally deamidated to the corresponding glutamyl and aspartyl residues. Alternatively, these residues may be deamidated under mildly acidic conditions.
  • Other post-translational modifications include hydroxylation of proline and lysine, phosphorylation of hydroxyl groups of seryl or threonyl residues, methylation of the α-amino groups of lysine, arginine, and histidine side chains, acetylation of the N-terminal amine and, in some instances, amidation of the C-terminal carboxyl. Modifications can also include modifications in glycosylation.
  • The embodiments of the present invention include various polypeptides, as well as nucleic acids encoding such polypeptides.
  • The polypeptides described in this document can be produced from the nucleic acids encoding them with the aid of recombinant technologies.
  • The embodiments of the present invention include expression vectors containing one or more nucleic acids encoding one or more of the polypeptides described in this document.
  • The encoding nucleic acid is typically operably linked to one or more regulatory sequences.
  • Useful regulatory sequences include, for example, early or late promoters, such as promoter sequences of SV40, CMV, vaccinia, polyoma or adenovirus, the lac system, the trp system, the TAC system, the TRC system, the LTR system, the major operator and promoter regions of phage lambda, the control regions of fd coat protein, the promoter for 3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid phosphatase (for example, Pho5), the AOX1 promoter of methylotrophic yeast, the promoters of the yeast α-mating factors, and other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses.
  • An expression vector according to the embodiments of the present invention can be designed to produce fusion proteins described in this document.
  • An expression vector can be suitable for expression in eukaryotic or prokaryotic cells and thus include DNA molecules capable of integration into a prokaryotic or eukaryotic chromosome and subsequent expression.
  • The inserted genes in viral and retroviral vectors usually contain promoters and/or enhancers to help control the expression of the desired gene product.
  • A promoter is generally a sequence or sequences of DNA that function when in a relatively fixed location relative to the transcription start site.
  • A promoter contains core elements required for basic interaction of RNA polymerase and transcription factors, and may contain upstream elements and response elements.
  • Specific regulatory elements can be cloned and used to construct expression vectors that are selectively expressed in specific cell types.
  • Expression vectors used in eukaryotic host cells may also contain sequences necessary for the termination of transcription which may affect mRNA expression. These regions are transcribed as polyadenylated segments in the untranslated portion of the mRNA encoding tissue factor protein. The 3' untranslated regions also include transcription termination sites. It is preferred that the transcription unit also contain a polyadenylation region. One benefit of this region is that it increases the likelihood that the transcribed unit will be processed and transported like mRNA. The identification and use of polyadenylation signals in expression constructs is well established.
  • In one example, the polyadenylation region is derived from the SV40 early polyadenylation signal and consists of about 400 bases.
  • The transcribed units may contain other standard sequences, alone or in combination with the above sequences, to improve expression from, or stability of, the construct.
  • The vectors according to the embodiments of the present invention include viral vectors that transport the nucleic acids encoding the polypeptides described in this document into cells without degradation and include a promoter yielding expression of the nucleic acids in the cells into which they are delivered.
  • Viral vectors can be derived from viruses such as retroviruses, adenovirus, adeno-associated virus, herpes virus, vaccinia virus, polio virus, AIDS virus, neuronal trophic virus, Sindbis virus and other RNA viruses. Also preferred are any viral families that share the properties of these viruses that make them suitable for use as vectors.
  • Retroviruses include Moloney murine leukemia virus (MMLV) and retroviruses that express the desirable properties of MMLV as a vector.
  • Other examples of viral vectors are simian virus 40 (SV40) and baculovirus vectors.
  • Viral vectors can contain nonstructural early genes, structural late genes, an RNA polymerase III transcript, inverted terminal repeats necessary for replication and encapsidation, and promoters to control the transcription and replication of the viral genome.
  • When engineered as vectors, viruses typically have one or more of the early genes removed, and a gene or gene/promoter cassette is inserted into the viral genome in place of the removed viral DNA.
  • The necessary functions of the removed early genes are typically supplied by cell lines that have been engineered to express the gene products of the early genes in trans.
  • Cells containing the expression vectors are also included among the embodiments of the present invention. Such cells can thus produce the polypeptides described in this document, for example, engineered methyltransferases.
  • A cell can be either a eukaryotic or a prokaryotic cell.
  • Some examples of the cells are bacterial cells, for example, cells of E. coli, Pseudomonas, Bacillus or Streptomyces; fungal cells, such as yeast cells (for example, cells of Saccharomyces, and methylotrophic yeast such as Pichia, Candida, Hansenula and Torulopsis); and animal cells, such as CHO and R1.1 cells.
  • Other examples are African Green Monkey kidney cells, for example, COS 1, COS 7, BSC1, BSC40 and BMT10.
  • Insect cells, for example, Sf9 cells, may also be used.
  • Human cells, such as human embryonic kidney cells (for instance, HEK293), may also be used.
  • Plant cells in cell or tissue culture may also be used.
  • Eukaryotic cells can also be co-transformed with polynucleotide sequences encoding the antibody, labeled antibody, or antigen binding fragment thereof, and a second foreign DNA molecule encoding a selectable phenotype, such as the herpes simplex thymidine kinase gene.
  • Another method is to use a eukaryotic viral vector, such as simian virus 40 (SV40) or bovine papilloma virus, to transiently infect or transform eukaryotic cells and express the protein.
  • Expression systems, such as plasmids and vectors, can be employed to produce proteins in cells, including higher eukaryotic cells, such as the COS, CHO, HeLa and myeloma cell lines.
  • The methods described in this document can involve computer-based calculations and tools. Tools can be advantageously provided in the form of computer programs that are executable by a general-purpose computer system (which can be called a “host computer”) of conventional design.
  • The host computer may be configured with many different hardware components and can be made in many dimensions and styles (e.g., desktop PC, laptop, tablet PC, handheld computer, server, workstation, mainframe). Standard components, such as monitors, keyboards, disk drives, CD and/or DVD drives, and the like, may be included.
  • Network connections may be provided via any suitable transport media (e.g., wired, optical, and/or wireless media) and any suitable communication protocol (e.g., TCP/IP); the host computer may include suitable networking hardware (e.g., modem, Ethernet card, WiFi card).
  • The host computer may implement any of a variety of operating systems, including UNIX, Linux, Microsoft Windows, MacOS, or any other operating system.
  • Computer code for implementing aspects of the present invention may be written in a variety of languages, including PERL, C, C++, Java, JavaScript, VBScript, AWK, or any other scripting or programming language that can be executed on the host computer or that can be compiled to execute on the host computer. Code may also be written or distributed in low level languages such as assembler languages or machine languages.
  • The host computer system advantageously provides an interface via which the user controls operation of the tools.
  • In some embodiments, software tools are implemented as scripts (for example, using PERL), the execution of which can be initiated by a user from a standard command line interface of an operating system such as Linux or UNIX. Commands can be adapted to the operating system as appropriate.
  • In other embodiments, a graphical user interface may be provided, allowing the user to control operations using a pointing device.
  • The present invention is not limited to any particular user interface.
  • Scripts or programs incorporating various features of the present invention may be encoded on various computer readable media for storage and/or transmission.
  • Examples of suitable media include magnetic disk or tape, optical storage media such as compact disk (CD) or DVD (digital versatile disk), flash memory, and carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet.
  • Engineered nucleic-acid modifying enzymes include recombinant nucleic acid methyltransferases (MTases), which are recombinantly produced polypeptides or polypeptide fusions containing a domain having a nucleic acid modifying activity (such as methyltransferase activity) and an antibody-binding domain.
  • the two domains may optionally be joined by a suitable linker, comprise additional domains or sequences, etc.
  • a nucleic acid modifying domain possesses a nucleic acid modifying activity, and an antibody-binding domain is capable of binding a range of immunoglobulins, as discussed in more detail in the relevant parts of the “Terms and concepts” section of the present document. Examples of domain arrangements of the engineered nucleic-acid modifying enzymes are schematically illustrated in Figure 2.
  • the engineered nucleic-acid modifying enzymes are capable of modifying a nucleic acid, including single- and double-stranded nucleic acid, DNA and RNA, in such a way that a modification can be detected by a sequencing method.
  • an engineered nucleic-acid modifying enzyme is an engineered methyltransferase comprising a nucleic acid methyltransferase domain, which can catalyze methylation of a nucleic acid base using SAM, or, in some cases, a different modification if a SAM analog is used.
  • Other examples of engineered nucleic-acid modifying enzymes are engineered DNA and RNA deaminases, pseudouridine synthases and guanine transglycosylases.
  • Nucleic acid-modifying domains can be derived from naturally occurring nucleic-acid modifying enzymes, such as those listed in Table 2, engineered de novo, produced by guided evolution, etc.
  • FIG. 3A schematically illustrates an exemplary non-limiting embodiment of an engineered methyltransferase coding sequence containing a methyltransferase domain coding sequence (MTase), a nucleotide sequence encoding a Protein-G derived antibody-binding domain (Protein G), as well as linker sequences, promoter, enhancer and terminator sequences, histidine tag sequences, etc.
  • ChAMP methyltransferase
  • ChAMP amino acid sequence is represented by SEQ ID NO: 1 or variants thereof having at least 95%, 90%, 85%, 80%, 75% or 70% sequence similarity to SEQ ID NO: 1, as well as amino acid sequences comprising SEQ ID NO: 1 or variants thereof having at least 95%, 90%, 85%, 80%, 75% or 70% sequence similarity to SEQ ID NO: 1.
  • Nucleic acid sequences encoding engineered methyltransferases are also envisioned and included among the embodiments of the present invention.
  • such nucleic acid sequences include SEQ ID NO:2, which encodes SEQ ID NO: 1, or variants thereof encoding amino acid sequences having at least 95%, 90%, 85%, 80%, 75% or 70% sequence similarity to SEQ ID NO: 1, and variants of SEQ ID NO:2 having at least 95%, 90%, 85%, 80%, 75% or 70% sequence similarity to SEQ ID NO:2.
  • Molecules, vectors, compositions, products, kits including the above polypeptides or nucleic acids, as well as methods of making and using the above polypeptides or nucleic acids are also envisioned and included among the embodiments of the present invention.
  • the embodiments of the present invention include methods of detecting (methods of assessing, methods of locating, methods of mapping, methods of determining, etc.) protein interaction sites in nucleic acids. Such methods can be useful, for example, for mapping binding sites of nucleic acid-binding proteins or the nucleic acid sequences located in proximity to proteins interacting with nucleic acids in some capacity.
  • one example of a nucleic acid interacting protein is a transcription factor.
  • Other examples of nucleic acid-interacting proteins are structural chromatin and nuclear proteins, including proteins with topological roles, proteins with genomic DNA maintenance roles, nuclear envelope or nuclear body proteins, histones and histone complex-associated proteins, replication, initiation, transcription and elongation complexes and associated proteins, etc.
  • the detection methods according to the embodiments of the present invention utilize a suitable “primary” antibody bound specifically to a nucleic acid interacting protein of interest (for example, a protein bound to its nucleic acid binding site, such as a DNA-bound transcription factor), and the engineered nucleic-acid modifying enzyme, such as an engineered methyltransferase, bound to the antibody/protein-of-interest complex.
  • the nucleic acid modification reaction, such as a methylation reaction catalyzed by the engineered methyltransferase, is then allowed to proceed.
  • the nucleic-acid modifying domain of the engineered nucleic-acid modifying protein is able to modify its target site found in the vicinity.
  • a methyltransferase domain of the engineered methyltransferase is able to methylate its methylation sites found nearby, thus producing distinct methylation patterns in the nucleic acid sequence in proximity to the interaction site of a protein-of-interest.
  • the resulting nucleic acid modification patterns are then assessed by various nucleic acid analysis methods to detect the interaction site of the protein-of-interest.
  • the methods according to the embodiments of the present invention are not limited by the order in which the primary antibody, the engineered nucleic-acid modifying protein, such as an engineered methyltransferase, and the nucleic-acid interacting protein of interest are bound to each other to form a complex that tethers the engineered nucleic-acid modifying protein in the vicinity of the protein interaction site in the nucleic acid.
  • the methods according to the embodiments of the present invention may comprise a step of allowing a suitable primary antibody to bind specifically to a nucleic-acid interacting protein of interest.
  • the methods may comprise a step of contacting the primary antibody with a sample comprising a nucleic acid interacting protein under the conditions allowing for the specific binding of the primary antibody and the nucleic acid interacting protein.
  • the nucleic acid interacting protein may be brought into contact with its target site in the nucleic acid prior to, concurrently with, or after being contacted with a primary antibody.
  • a nucleic acid binding protein may be bound to its nucleic acid binding site prior to, concurrently with, or after being contacted with a primary antibody.
  • the methods according to the embodiments of the present invention may comprise a step of allowing a nucleic-acid interacting protein of interest to interact with a nucleic acid under suitable conditions.
  • the methods may comprise a step of contacting the nucleic acid interacting protein and a nucleic acid under conditions allowing the nucleic acid interacting protein to interact with the nucleic acid.
  • a suitable primary antibody may be a monoclonal or a polyclonal antibody capable of specifically binding the protein-of-interest without interfering with its interaction with the nucleic acid or while the protein of interest is interacting with the nucleic acid.
  • the detection methods may include a step of binding the primary antibody to the nucleic acid interacting protein, such as a nucleic-acid binding protein. This step is performed under the conditions that allow such binding to occur.
  • the conditions under which the binding of the primary antibody occurs depend on the context of the specific method. For example, the binding may occur in a buffer with a suitable pH containing a mild detergent (such as Tween 20) and a common protein (such as bovine serum albumin (BSA)) for blocking non-specific interactions.
  • Selection of the primary antibody is made based on the protein of interest. Various degrees of specificity of the primary antibody may be acceptable, depending on the goals and the applications of the detection method. For example, in some instances, a primary antibody may be selected that specifically binds all or a larger subset of variants of its target. In other instances, a primary antibody may be selected that specifically binds a smaller subset of variants, or only a particular variant. For example, in some embodiments, a primary antibody can be selected that specifically binds several variants and isoforms of a target nucleic-acid binding protein. In some other embodiments of the present method a primary antibody is selected that is specific for a mutant target.
  • in other embodiments, the primary antibody is selected to be specific for a posttranslationally modified variant of a target. In still other embodiments, the primary antibody is selected to be specific for a splice isoform, etc.
  • the nucleic acid interacting protein may be cross-linked to its target site in the nucleic acid prior to or after being contacted with a primary antibody. For example, if a sample comprising genomic DNA and DNA interacting proteins, such as a cell sample or a nuclear lysate sample, is analyzed according to the methods of the present invention, the sample may be subjected to a cross-linking procedure, such as formaldehyde cross-linking, prior to being contacted with the primary antibody.
  • the cross-linking conditions may be optimized based on the specific context. For example, the cross-linking agent concentrations, cross-linking times and temperatures may be adjusted. However, cross-linking is not required, and the methods can be performed without cross-linking, depending on the context, the sample, etc.
  • the cross-linking process employed may be reversible or irreversible, provided that irreversible cross-linking still allows methylation by the engineered methyltransferase.
  • an engineered nucleic acid-modifying enzyme, such as an engineered methyltransferase, comprising an antibody-binding domain is contacted with the sample, and the binding of the engineered enzyme to the primary antibody is allowed to occur.
  • the sample is incubated with the engineered nucleic acid-modifying enzyme under suitable conditions.
  • incubation with an engineered methyltransferase may include the presence of SAM or SAM analog.
  • the nucleic acid modification reaction is then allowed to proceed. For example, the reaction may be allowed to proceed for 1 minute to 1 hour.
  • the reaction may then be stopped by heating the sample and/or by other suitable methods, such as the addition of a proteinase.
  • the reaction can be performed at room temperature, or at lower temperatures to reduce the rate of the reaction.
  • Other reaction conditions may also be changed.
  • the SAM concentration in a methyltransferase-catalyzed reaction may be changed to reduce or accelerate the reaction rate.
  • the methods of the present invention may include one or more washing and/or one or more blocking steps. Washing and blocking steps are employed, among other things, to decrease nonspecific binding and improve signal-to-noise ratios of the detection methods of the present invention.
  • the reagents and the conditions selected for such steps may vary, but can be experimentally determined according to commonly known procedures. Some reagents that can be suitably incorporated into washing and blocking solutions are bovine serum albumin (BSA) and detergents.
  • the methods according to the embodiments of the present invention can be successfully used on a wide variety of samples.
  • one example of a suitable sample is an in vitro prepared sample, such as a solution containing a nucleic-acid interacting protein and a nucleic acid molecule.
  • a sample containing immobilized nucleic acid or nucleic acids or nucleic-acid interacting protein or proteins is another example of an in vitro prepared sample.
  • a sample may be an array containing immobilized nucleic acid or nucleic acids or nucleic-acid interacting protein or proteins.
  • a sample may be a blot containing immobilized nucleic acid or nucleic acids or nucleic-acid interacting protein or proteins.
  • a sample may be a library of nucleic acid or nucleic acids or nucleic-acid interacting protein or proteins.
  • a sample may be an isolated sample, such as an isolated chromatin sample or a nuclear sample. Samples include cultured cells and/or tissues, organoids, cell, tissue and/or organ samples obtained from humans or animals, whole small animals (such as single-cell animals or nematodes).
  • a sample may be a fixed, and optionally permeabilized, sample of a cell or a tissue.
  • a cell or tissue sample may be permeabilized to facilitate diffusion and binding of the reagents used in the methods of the present invention, such as the primary antibody and the engineered methyltransferase.
  • Permeabilization may be accomplished by exposure to various detergents, including, but not limited to, Triton X-100, NP-40, Polyoxyethylene (20) sorbitan monooleate (Tween-20), saponin, or their combinations, as well as by other suitable methods.
  • the sample may be treated under the conditions allowing for lysis of the cells, solubilization of the proteins and/or other components in the sample and/or reversal of cross-linking.
  • the reversal of the cross-linking can be accomplished by incubation at 55-99°C, optionally in the presence of NaCl or other salts. It is to be understood that the exposure of the sample to cross-linking reversal conditions may allow for partial reversal of the effects of the reversible cross-linking reagents.
  • a sample can also be treated with a suitable proteinase (such as proteinase K) to digest the proteins.
  • the sample may be purified by various procedures. For example, nucleic acids may be purified by column purification, phenol-chloroform DNA extraction, DNA binding beads or salt precipitation.
  • the nucleic acid is analyzed by a suitable method to determine which residues were modified by a nucleic acid-modification reaction.
  • the nucleic acid may be analyzed to determine which nucleotides were methylated by the engineered methyltransferase.
  • the methylation pattern of the nucleic acid may be analyzed by bisulfite sequencing. Bisulfite treatment converts unmethylated cytosine residues to uracil residues but leaves methylated cytosines unchanged.
  • the method provides single-nucleotide resolution information about the methylation status of a segment of DNA. Comparison of the resulting sequencing information with a known reference sequence reveals the location of the methylated residues in a DNA, thus detecting (“mapping”) the location of the nucleic acid-interacting protein in the nucleic acid.
  • Bisulfite sequencing is only one example of a suitable nucleic acid analysis method. Various other sequencing methods may be utilized. For example, direct nanopore sequencing or other single-molecule sequencing methods may be used instead of or in addition to bisulfite sequencing, depending on the context.
  • the “modification reach” of the engineered nucleic-acid modifying enzyme may change depending on the properties of the engineered enzyme, the antibody, the target protein, etc.
  • various types of engineered nucleic-acid modifying enzymes, such as engineered methyltransferases, can be employed, depending on the context.
  • GpC methylation by cytosine methyltransferases does not generally occur in eukaryotic genomes, thus making the cytosine methylation patterns produced by engineered GpC cytosine methyltransferases in eukaryotic cells easier to analyze.
  • engineered GpC cytosine methyltransferases may be selected for analysis of eukaryotic genomes.
  • Engineered adenine methyltransferases, such as those based on Dam, may also be used instead of or together with engineered cytosine methyltransferases, and adenine methylation patterns may be analyzed by suitable direct sequencing methods, such as nanopore sequencing.
  • Different types of engineered nucleic-acid modifying enzymes, such as different types of engineered methyltransferases, and/or primary antibodies may be combined together in multiplexed methods to detect the interaction sites of different proteins in a single experiment.
  • an experiment may be performed to identify nucleic acid interaction sites for several distinct nuclear proteins by using respective suitable primary antibodies, with each antibody tethered to a different engineered methyltransferase (for example, differing by their respective methylation sequences). Methylation analysis can then identify the sites at which each nuclear protein interacts with genomic DNA. Moreover, the relation between the proteins may also be inferred (for example, if protein B is found only along with protein A but not vice versa, it may be inferred that protein A is required to interact with the DNA before protein B).
  • Engineered methyltransferase molecules may be altered, for example, by altering their methylation site specificity and/or introducing and/or changing the linker length to affect their ability to access and methylate the nucleic acid and the “methylation reach” or proximity of the nucleic acid moieties being methylated.
  • the nucleic acid residues being methylated are at a distance of approximately 25 nm or less from the protein-of-interest interaction site in the nucleic acid molecule. It is to be understood that this distance is not necessarily a linear distance along the nucleic acid molecule, as it depends on the three-dimensional structure of the nucleic acid. The distance can be shortened to improve resolution. In other examples, different substrates can be used.
  • for example, a methyltransferase may be used with a SAM analog that results in a detectable modification of a nucleic acid, such as 5′-amino-5′-deoxyadenosine, which results in cytosine-to-uracil deamination by a DNA methyltransferase.
  • an embodiment of a detection method is schematically illustrated in Figure 1.
  • the detection methods of the present invention may be useful in a wide range of analytical, diagnostic, clinical and therapeutic applications, for example, in research and laboratory applications in which detection of nucleic acid-protein interactions is desirable. Additional applications may include the identification of nucleic acid-protein interactions and chromatin architecture in cases where the data may be used to diagnose, classify or recommend treatment for diseases or conditions. For example, the methods may be used to test whether a specific mutation in a protein changes its DNA binding preferences, or to explore how general chromatin-organizing proteins change the chromatin architecture.
  • Kits for performing the methods of the present invention are included among its embodiments.
  • a kit is a set of components, comprising at least some components for performing the methods according to the embodiments of the present invention.
  • Such a kit may contain an engineered nucleic-acid modifying enzyme, such as an engineered methyltransferase, and, optionally, a primary antibody.
  • a kit may include one or more of cross-linking reagents, permeabilizing reagents, substrates, such as SAM or its analogues, buffers, blocking reagents, PCR reagents, such as primers, probes etc., sequencing reagents, etc.
  • Systems for performing the methods of the present invention are included among the embodiments of the present invention. These systems include various stations and/or components.
  • the term “station” is broadly defined and includes any suitable apparatus or assemblies, conglomerations or collections of apparatuses or components suitable for carrying out a method according to the embodiments of the present invention.
  • the stations need not be integrally connected or situated with respect to each other in any particular way.
  • the invention includes any suitable arrangements of the stations with respect to each other. For example, the stations need not even be in the same room. But in some embodiments, the stations are connected to each other in an integral unit.
  • a system may include a station for treating a sample, such as a cross-linking station.
  • a system may comprise a station for performing binding of primary antibody.
  • a system may comprise a station for performing a nucleic-acid modification reaction.
  • a system may comprise a sequencing station.
  • a system may comprise a station for generating reports.
  • a system may comprise a station or components for data analysis.
  • a system may comprise a computer, a processor, electronic memory, software instructions etc.
  • a system, or parts of the system may be controlled by a computer.
  • a fusion protein termed “ChAMP,” which binds to antibodies and methylates GpC in the presence of S-adenosyl methionine (SAM), was designed, expressed and isolated.
  • the fusion protein coding sequence (SEQ ID NO:2) was designed in silico (see Figure 3A), custom-produced as a gBlock® gene fragment by Integrated DNA Technologies (Coralville, Iowa), PCR amplified and cloned by SLiCE substantially as described in Yongwei Zhang et al., 2014 (Yongwei Zhang, Uwe Werling, and Winfried Edelmann. Seamless Ligation Cloning Extract (SLiCE) Cloning Method. Methods Mol Biol. 2014; 1116: 235-244).
  • ChAMP was expressed in T7 Express lysY E. coli (C3010, NEB Ipswich, MA).
  • the bacteria transformed with the expression plasmid were grown overnight at 37°C in 50 ml of LB medium supplemented with 50 µg/mL of kanamycin, added to 200 ml of LB supplemented with 1 mM IPTG (no kanamycin added) and grown for an additional 4 hours. Bacterial culture samples were taken intermittently and later tested for fusion protein expression. ChAMP expression was readily detected without induction, and no difference in expression level after IPTG induction was detected by Western blot.
  • the culture was cooled on ice for 15 minutes and centrifuged at 10,000 × g; the pellet was resuspended in GpC buffer (50 mM NaCl; 50 mM Tris-HCl; 10 mM DTT; pH 8.5) and moved to a 50 ml tube.
  • the tube was centrifuged and the pellet resuspended in 10 ml lysis buffer (GpC buffer + 0.5% Triton X-100 + protease inhibitor + 0.1 mM EDTA) and incubated at 4°C with rotation for 1 hour, followed by centrifugation for 10 minutes at 5,000 rpm.
  • the resulting supernatant (10 ml) was incubated with 0.5 ml of His beads in lysis buffer for 1 hour at 4°C, with rotation. Following the incubation, the beads were loaded on a column, which was washed with 5 ml of GpC buffer followed by a series of elutions with 1.5 ml volume of GpC buffer with increasing concentration of imidazole.
  • the eluted samples were mixed with a protein loading buffer (LI-COR 928-40004) and heated to 99°C for 5 minutes.
  • the proteins were resolved on a bis-tris gel (NuPAGE, Thermo Fisher) and transferred to a nitrocellulose membrane (iBlot mini, Thermo Fisher).
  • AGM: antibody-guided methylation
  • SAM: S-adenosyl methionine
  • DTT: dithiothreitol
  • the fusion protein can also be stored for several weeks at -20°C with glycerol, but its activity should be tested with every new experiment.
  • a tagged fusion protein based on a naturally occurring DNA-interacting protein is expressed in a specific cell and/or subset of cells in a transgenic animal, and the AGM protocol is performed using a tag-specific primary antibody.
  • a transcription factor fusion protein with a FLAG tag is expressed in a specific neuron in C. elegans, which allows for the identification of the transcription factor binding sites in that neuron and observation of the binding site change under various experimental conditions (age, stress, exposure to drugs, etc.).
  • FIG. 9A shows a schematic representation, generated by the UCSC Genome Browser (University of California, Santa Cruz, Genomics Institute), of a representative single nanopore-sequenced DNA molecule ~150 kbp in size.
  • Figure 9B is a zoom-in of Figure 9A.
  • in the H3K27ac single-molecule track, the y axis indicates the probability of GpC methylation for a given k-mer.
  • the strongest peak corresponds to a known H3K27ac binding site, with weaker peaks mainly along the gene body, presumably due to spatial proximity.
  • Dense ChAMP mediated methylation was observed on single molecules at known CTCF peaks. Sparse methylation was seen on the specific areas of adjacent DNA, potentially indicating three-dimensional proximity.
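
The percentage thresholds recited for variants of SEQ ID NO: 1 and SEQ ID NO:2 can be checked with a simple calculation. The sketch below is illustrative only. It computes percent identity over a pre-computed pairwise alignment. That is a simplification, because sequence similarity for proteins usually also credits conservative substitutions. The alignment step, the gap handling and the function names are all assumptions, not anything specified in this document.

```python
from __future__ import annotations

# Tiers recited in the text, highest first.
THRESHOLDS = (95, 90, 85, 80, 75, 70)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two sequences that are already aligned.

    Assumption: '-' marks a gap. A gap opposite a residue counts as a
    mismatch; columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:  # both-gap columns were skipped, so this is a residue match
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def highest_tier(identity: float) -> int | None:
    """Highest recited threshold that the identity meets, or None."""
    for threshold in THRESHOLDS:
        if identity >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical toy sequences, not SEQ ID NO: 1.
    score = percent_identity("MKTAYIAKQR", "MKTAYLAKQR")
    print(score, highest_tier(score))
```

In practice the alignment would come from a standard pairwise aligner. Keeping it separate here makes the threshold logic easy to see.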
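
The bisulfite readout described above can be illustrated with a toy simulation. After conversion and amplification, unmethylated cytosines read as thymine, while methylated cytosines still read as cytosine. Comparing a read with the reference therefore recovers the methylation state at each queried cytosine.

This sketch makes several assumptions:
- it uses a single top-strand read;
- the read is already aligned to the reference without gaps;
- the function names are hypothetical.

Real data would first go through a bisulfite-aware aligner, for example Bismark.

```python
from __future__ import annotations


def bisulfite_convert(reference: str, methylated: set[int]) -> str:
    """Simulate the top-strand sequence read after bisulfite treatment and PCR.

    Unmethylated C -> U -> read as T; cytosines whose 0-based positions are in
    `methylated` are protected and still read as C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(reference.upper())
    )


def call_methylation(reference: str, read: str,
                     positions: list[int]) -> dict[int, bool | None]:
    """Call methylation at reference cytosines from a gap-free aligned read.

    Returns True (read shows C: methylated), False (read shows T:
    unmethylated) or None (any other base, e.g. a sequencing error or SNP).
    """
    ref, obs = reference.upper(), read.upper()
    if len(ref) != len(obs):
        raise ValueError("read must be aligned to the reference without gaps")
    calls: dict[int, bool | None] = {}
    for i in positions:
        if ref[i] != "C":
            raise ValueError(f"reference position {i} is not a cytosine")
        calls[i] = {"C": True, "T": False}.get(obs[i])
    return calls


if __name__ == "__main__":
    ref = "AGCTGCAAGC"
    read = bisulfite_convert(ref, methylated={2})
    print(read)                                    # AGCTGTAAGT
    print(call_methylation(ref, read, [2, 5, 9]))  # {2: True, 5: False, 9: False}
```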
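
The GpC readout works because engineered GpC methylation can be told apart from endogenous methylation. In mammalian genomes, endogenous DNA methylation occurs mainly at CpG. A common precaution in GpC-footprinting analyses is therefore to ignore GpC cytosines that also sit in a CpG context (GCG). This precaution is an assumption here, not a step stated in this document. A minimal sketch that lists candidate GpC cytosines on one strand (function name hypothetical):

```python
from __future__ import annotations

import re

# C in a GpC dinucleotide, regardless of the following base.
_GPC_ANY = re.compile(r"(?<=G)C")
# C in GpC followed by A, C or T, i.e. not also part of a CpG.
_GPC_NOT_CPG = re.compile(r"(?<=G)C(?=[ACT])")


def gpc_sites(seq: str, exclude_gcg: bool = True) -> list[int]:
    """0-based positions of GpC cytosines on the given strand only.

    With exclude_gcg=True, a GpC cytosine is kept only if the next base is
    known and is not G, so GCG sites (ambiguous with CpG) and a GpC at the
    very end of the sequence are skipped.
    """
    pattern = _GPC_NOT_CPG if exclude_gcg else _GPC_ANY
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(gpc_sites("AGCTGCGA"))                     # [2]: GCT kept, GCG skipped
    print(gpc_sites("AGCTGCGA", exclude_gcg=False))  # [2, 5]
```

The reverse strand would be handled the same way on the reverse complement.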
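
In the multiplexed format described above, each protein of interest is linked to a methyltransferase with a different recognition sequence. A detected modification can therefore be attributed to an enzyme, and hence to a protein, from its sequence context.

The sketch below assumes two enzymes:
- a GpC cytosine methyltransferase, which modifies the C of GC;
- a Dam-based adenine methyltransferase, which modifies the A of GATC.

The enzyme-to-protein pairing and the function names are hypothetical.

```python
from __future__ import annotations

# enzyme name -> (recognition motif, 0-based offset of the modified base)
MOTIFS: dict[str, tuple[str, int]] = {
    "GpC_MTase": ("GC", 1),  # cytosine of GpC
    "Dam": ("GATC", 1),      # adenine of GATC
}

# Hypothetical pairing used in one multiplexed experiment.
ENZYME_TO_PROTEIN = {"GpC_MTase": "protein A", "Dam": "protein B"}


def assign_events(reference: str, positions: list[int],
                  motifs: dict[str, tuple[str, int]] = MOTIFS) -> dict[int, list[str]]:
    """For each modified position, list the enzymes whose motif explains it."""
    ref = reference.upper()
    assignments: dict[int, list[str]] = {}
    for pos in positions:
        matched = []
        for enzyme, (motif, offset) in motifs.items():
            start = pos - offset
            if start >= 0 and ref[start:start + len(motif)] == motif:
                matched.append(enzyme)
        assignments[pos] = matched
    return assignments


if __name__ == "__main__":
    ref = "TTGATCAAGCTT"
    for pos, enzymes in assign_events(ref, [3, 9, 0]).items():
        proteins = [ENZYME_TO_PROTEIN[e] for e in enzymes]
        print(pos, enzymes, proteins)
```

An empty list marks a modification that neither motif explains, for example background signal or a calling error.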
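
The order inference mentioned above reduces to a subset test on the sites where each protein is detected: protein B is found only together with protein A, but not vice versa.

In this sketch, positions are grouped into fixed-size bins before comparison. The 200 bp bin size, the single-chromosome coordinates and the function names are assumptions for illustration. As the text's own wording ("may be inferred") implies, a subset relationship only suggests a binding order.

```python
from __future__ import annotations


def to_bins(positions: list[int], bin_size: int = 200) -> set[int]:
    """Map bp coordinates (one chromosome) to bin indices."""
    return {p // bin_size for p in positions}


def infer_order(sites_a: list[int], sites_b: list[int], bin_size: int = 200) -> str:
    """Suggest a binding order from co-occurrence of detected sites.

    If every B bin is also an A bin but not the reverse, A may be required
    before B (and symmetrically). Anything else is left undetermined.
    """
    a = to_bins(sites_a, bin_size)
    b = to_bins(sites_b, bin_size)
    b_within_a = bool(b) and b <= a
    a_within_b = bool(a) and a <= b
    if b_within_a and not a_within_b:
        return "A may bind before B"
    if a_within_b and not b_within_a:
        return "B may bind before A"
    return "undetermined"


if __name__ == "__main__":
    protein_a = [100, 1_000, 5_000]  # hypothetical detected sites
    protein_b = [150, 1_050]
    print(infer_order(protein_a, protein_b))
```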
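
A deliberately simplified one-dimensional model can illustrate how the tethered enzyme, its "methylation reach" and the inferred interaction site relate to each other. The model makes these assumptions:
- DNA is straight B-form with a rise of about 0.34 nm per base pair, so a reach of about 25 nm corresponds to roughly 73 bp of contour length;
- folding is ignored, although, as the text notes, real chromatin is folded and sites farther along the molecule can also lie within reach;
- the site estimate is the median of the methylated positions, which is an illustrative choice rather than the method described here.

All names are hypothetical.

```python
from __future__ import annotations

import statistics

B_DNA_RISE_NM_PER_BP = 0.34  # approximate rise per base pair of B-form DNA


def reach_in_bp(reach_nm: float, rise_nm_per_bp: float = B_DNA_RISE_NM_PER_BP) -> int:
    """Contour length in bp spanned by reach_nm of straight B-DNA (rounded down)."""
    return int(reach_nm / rise_nm_per_bp)


def tethered_methylation(sites: list[int], anchor: int, reach_bp: int) -> list[int]:
    """Target sites (bp coordinates) within reach_bp of the tethering point.

    1D simplification: ignores DNA folding, accessibility and incomplete
    methylation, all of which matter in real samples.
    """
    return sorted(s for s in sites if abs(s - anchor) <= reach_bp)


def estimate_site(methylated: list[int]) -> float | None:
    """Estimate the interaction site as the median methylated position."""
    return statistics.median(methylated) if methylated else None


if __name__ == "__main__":
    reach = reach_in_bp(25.0)                  # 25 / 0.34 = 73.5, so 73 bp
    gpc = [10, 95, 100, 130, 300]              # hypothetical GpC coordinates
    hit = tethered_methylation(gpc, anchor=100, reach_bp=reach)
    print(reach, hit, estimate_site(hit))      # 73 [95, 100, 130] 100
```

Shortening the reach in this model narrows the set of methylated sites around the anchor. This mirrors the text's remark that shortening the distance can improve resolution.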
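
The buffer compositions in the expression and purification example can be turned into pipetting volumes with the usual C1V1 = C2V2 dilution relation. The stock concentrations below are assumptions chosen for illustration: 5 M NaCl, 1 M Tris-HCl pH 8.5, 1 M DTT, 10% Triton X-100 and 0.5 M EDTA. The protease inhibitor is omitted because its form and concentration are not specified. The same arithmetic gives the IPTG mass needed for 1 mM in 200 ml of LB, using an IPTG molar mass of about 238.3 g/mol.

```python
from __future__ import annotations

# Lysis buffer from the example: GpC buffer + 0.5% Triton X-100 + 0.1 mM EDTA.
# Each entry: (final concentration, assumed stock concentration), same units.
LYSIS_BUFFER = {
    "NaCl (mM)": (50.0, 5000.0),
    "Tris-HCl pH 8.5 (mM)": (50.0, 1000.0),
    "DTT (mM)": (10.0, 1000.0),
    "Triton X-100 (%)": (0.5, 10.0),
    "EDTA (mM)": (0.1, 500.0),
}

IPTG_G_PER_MOL = 238.3  # approximate molar mass of IPTG


def stock_volume_ml(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock to add, from C1*V1 = C2*V2."""
    return final_conc * final_volume_ml / stock_conc


def recipe(components: dict[str, tuple[float, float]], final_volume_ml: float) -> dict[str, float]:
    """Stock volumes (ml) for each component, plus water to reach the final volume."""
    volumes = {name: stock_volume_ml(final, stock, final_volume_ml)
               for name, (final, stock) in components.items()}
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


def iptg_mg(final_mM: float, volume_ml: float) -> float:
    """Milligrams of IPTG for a given millimolar concentration and volume."""
    # mM * ml = micromol; micromol * g/mol = micrograms; / 1000 -> mg
    return final_mM * volume_ml * IPTG_G_PER_MOL / 1000.0


if __name__ == "__main__":
    for name, ml in recipe(LYSIS_BUFFER, final_volume_ml=10.0).items():
        print(f"{name}: {ml * 1000:.0f} µl")
    print(f"IPTG for 1 mM in 200 ml: {iptg_mg(1.0, 200.0):.1f} mg")
```

With these assumed stocks, 10 ml of lysis buffer takes 100 µl NaCl, 500 µl Tris-HCl, 100 µl DTT, 500 µl Triton X-100 and 2 µl EDTA, made up with water.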
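
The single-molecule tracks in Figures 9A and 9B plot a GpC methylation probability per k-mer along one nanopore read. They show dense methylation at the interaction site and sparser signal nearby. One simple way to flag such dense stretches is a sliding-window mean over consecutive k-mer probabilities; this is offered only as an illustrative sketch. The input format, the window size and the threshold are assumptions, not parameters from this document. The input is taken to be parallel lists of read coordinates and probabilities.

```python
from __future__ import annotations


def dense_regions(positions: list[int], probs: list[float],
                  window: int = 5, threshold: float = 0.5) -> list[tuple[int, int]]:
    """Merged (start, end) spans where the mean methylation probability of
    `window` consecutive k-mers is at least `threshold`.

    positions must be sorted read coordinates, one per k-mer, parallel to probs.
    """
    if len(positions) != len(probs):
        raise ValueError("positions and probs must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    spans: list[tuple[int, int]] = []
    for i in range(len(probs) - window + 1):
        if sum(probs[i:i + window]) / window >= threshold:
            start, end = positions[i], positions[i + window - 1]
            if spans and start <= spans[-1][1]:
                # Overlaps the previous dense window: extend it.
                spans[-1] = (spans[-1][0], max(spans[-1][1], end))
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    pos = list(range(0, 100, 10))  # hypothetical k-mer coordinates
    p = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
    print(dense_regions(pos, p, window=3, threshold=0.6))  # [(10, 70)]
```

A lower threshold or a longer window would also pick up the sparser, presumably proximity-driven signal described for the regions flanking the main peak.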

Abstract

The invention relates to compositions and methods useful for identifying, in nucleic acids, the interaction sites of proteins that interact with nucleic acids. Some embodiments of the invention relate to reagents and kits for identifying such interaction sites, while certain other embodiments relate to methods. The methods, reagents and compositions described herein are useful for research applications, clinical applications, diagnostic applications and other applications.
PCT/US2019/012851 2018-01-09 2019-01-09 Detection of protein interaction sites in nucleic acids WO2019139951A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862615233P 2018-01-09 2018-01-09
US62/615,233 2018-01-09

Publications (1)

Publication Number Publication Date
WO2019139951A1 (fr) 2019-07-18

Family

ID=65324547

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/012851 WO2019139951A1 (fr) 2018-01-09 2019-01-09 Detection of protein interaction sites in nucleic acids

Country Status (1)

Country Link
WO (1) WO2019139951A1 (fr)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111944874A (zh) * 2020-07-20 2020-11-17 广东省微生物研究所(广东省微生物分析检测中心) 一种筛选鉴定胁迫应答基因表达调控因子的方法
WO2021030666A1 (fr) * 2019-08-15 2021-02-18 The Broad Institute, Inc. Édition de bases par transglycosylation
WO2021203047A1 (fr) * 2020-04-02 2021-10-07 Altius Institute For Biomedical Sciences Procédés, compositions et kits pour identifier des régions d'adn génomique liées à une protéine
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070009937A1 (en) * 2005-05-19 2007-01-11 Laemmli Ulrich K Mapping of proteins along chromatin by chromatin cleavage
US20090061426A1 (en) * 2007-08-31 2009-03-05 Alexander Belyaev Binary signaling assay using a split-polymerase

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GABRIEL N. AUGHEY ET AL: "Dam it's good! DamID profiling of protein-DNA interactions : Dam it's good!", WILEY INTERDISCIPLINARY REVIEWS: DEVELOPMENTAL BIOLOGY, vol. 5, no. 1, 18 September 2015 (2015-09-18), pages 25 - 37, XP055566853, ISSN: 1759-7684, DOI: 10.1002/wdev.205 *
MANFRED SCHMID ET AL: "ChIC and ChEC: Genomic Mapping of Chromatin Proteins", MOLECULAR CELL, 1 January 2004 (2004-01-01), United States, pages 147 - 157, XP055567197, Retrieved from the Internet <URL:https://ac.els-cdn.com/S1097276504005404/1-s2.0-S1097276504005404-main.pdf?_tid=77ff9dc9-88ac-413c-b1c8-7b6cb08e3ecd&acdnat=1552312381_06feb49cc5edfa5ca9d867dfd768a282> [retrieved on 20190311], DOI: 10.1016/j.molcel.2004.09.007 *
YONGWEI ZHANG ET AL.: "Seamless Ligation Cloning Extract (SLiCE) Cloning Method", METHODS MOL BIOL., vol. 1116, 2014, pages 235 - 244

Cited By (21)

Publication number Priority date Publication date Assignee Title
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
WO2021030666A1 (fr) * 2019-08-15 2021-02-18 The Broad Institute, Inc. Base editing by transglycosylation
WO2021203047A1 (fr) * 2020-04-02 2021-10-07 Altius Institute For Biomedical Sciences Methods, compositions and kits for identifying protein-bound genomic DNA regions
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
CN111944874A (zh) * 2020-07-20 2020-11-17 广东省微生物研究所(广东省微生物分析检测中心) Method for screening and identifying regulatory factors of stress-response gene expression

Similar Documents

Publication Publication Date Title
WO2019139951A1 (fr) Detection of protein interaction sites in nucleic acids
US20230392141A1 (en) Methods and compositions for analyzing nucleic acid
Altemose et al. DiMeLo-seq: a long-read, single-molecule method for mapping protein–DNA interactions genome wide
Truax et al. ChIP and Re-ChIP assays: investigating interactions between regulatory proteins, histone modifications, and the DNA sequences to which they bind
JP6293742B2 (ja) DNA barcoding of designer mononucleosomes and chromatin array libraries for profiling chromatin readers, writers, erasers, and modulators thereof
AU2017263810B2 (en) Recovering long-range linkage information from preserved samples
EP3507382A1 (fr) Analysis of chromatin using a cutting enzyme
De Sousa et al. Microbial omics: applications in biotechnology
JP2022520616A (ja) Quantitative mapping of chromatin-associated proteins
US20090215029A1 (en) Methods of isolating and purifying nucleic acid-binding biomolecules and compositions including same
Liu et al. Transcriptome-wide measurement of poly (A) tail length and composition at subnanogram total RNA sensitivity by PAIso-seq
Jordán-Pla et al. Considerations on experimental design and data analysis of chromatin immunoprecipitation experiments
CN114206895A (zh) Method and kit for detecting N4-acetyldeoxycytidine in DNA
Chaban et al. Tail-tape-fused virion and non-virion RNA polymerases of a thermophilic virus with an extremely long tail
Anthony et al. Using Disulfide Bond Engineering To Study Conformational Changes in the β′ 260-309 Coiled-Coil Region of Escherichia coli RNA Polymerase during σ70 Binding
Felix et al. Harnessing Nature’s Molecular Recognition Capabilities to Map and Study RNA Modifications
Farace et al. Phylogenomic analysis for Campylobacter fetus occurring in Argentina
Marr et al. Whole-genome methods to define DNA and histone accessibility and long-range interactions in chromatin
Baytek et al. Robust co-immunoprecipitation with mass spectrometry for Caenorhabditis elegans using solid-phase enhanced sample preparation
Koren et al. Antibody variable-region sequencing as a method for hybridoma cell-line authentication
Benns et al. Prioritization of antimicrobial targets by CRISPR-based oligo recombineering
Sharma et al. Chromatin Immunoprecipitation (Chip)
Pinz et al. Assessing HDAC function in the regulation of signal transducer and activator of transcription 5 (STAT5) activity using chromatin immunoprecipitation (ChIP)
WO2020145405A1 (fr) Method for analyzing three-dimensional DNA structure interactions
CN110938610B (zh) Transposase mutant, fusion protein, and preparation method and use thereof

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19703799; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 19703799; Country of ref document: EP; Kind code of ref document: A1