WO2013102290A1 - Method for specifically recognizing DNA containing 5-methylated cytosine - Google Patents

Method for specifically recognizing DNA containing 5-methylated cytosine

Publication number
WO2013102290A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
protein
dhax3
tale
cytosine
Prior art date
Application number
PCT/CN2012/001718
Other languages
French (fr)
Chinese (zh)
Inventor
施一公
颜宁
邓东
闫创业
潘孝敬
Original Assignee
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学
Priority to CN201280060513.0A (patent CN103987860B)
Publication of WO2013102290A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P: SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00: Antineoplastic agents

Definitions

  • The present invention relates to the field of biotechnology, and more particularly to a method for specifically recognizing DNA containing 5-methylcytosine.
  • TALE: Transcription Activator-Like Effectors
  • TALE family proteins generally consist of three major functional domains. The N-terminal domain is involved in the secretion and translocation of TALE; the C-terminus has a transcriptional activation domain and a nuclear localization signal peptide; the region in the middle of TALE is the DNA-binding domain, which differs from other known DNA-binding domains in that it consists of a series of tandem repeat units. In most cases each repeat unit consists of 34 amino acids, and individual repeat units consist of 33 or 35 amino acid residues. Among the 34 amino acids, those at positions 12 and 13 vary considerably, whereas the others are highly conserved. These two non-conserved amino acids are named the RVD (repeat variable diresidue). J. Boch et al. and M.J. Moscou et al. (see J.
  • The invention relates to a method for detecting cytosine methylation in DNA, comprising using a TALE protein or a derived protein thereof to specifically recognize 5-methylcytosine in DNA.
  • Two different TALE proteins are used to specifically recognize cytosine and 5-methylated cytosine, respectively, in the target sequence.
  • The method is used for detecting methylation of CpG islands.
  • The invention relates to the use of a TALE protein and derived proteins thereof for the specific recognition of 5-methylated cytosine in DNA.
  • The invention relates to the use of a TALE protein and derived proteins thereof in the preparation of a reagent for the specific recognition of 5-methylcytosine in DNA.
  • The invention relates to the use of a TALE protein and derived proteins thereof in the preparation of a medicament for the diagnosis or treatment of cancer.
  • The diagnosis or treatment is carried out by specifically recognizing 5-methylcytosine in DNA.
  • The invention further relates to TALE proteins and derived proteins thereof for the specific recognition of DNA modified with 5-methylcytosine.
  • The invention also relates to TALE proteins and derived proteins thereof for use in the diagnosis or treatment of cancer.
  • The TALE protein can be a naturally occurring TALE protein or a TALE-derived protein, obtained from it by genetic mutation, modification, or assembly, that retains or enhances the specific recognition of 5-methylcytosine in DNA.
  • the TALE-derived protein further comprises a recombinant protein having a TALE protein DNA binding domain.
  • Figure 1 is a schematic representation of the high-resolution crystal structure (1.85 Å) of the DNA binding domain of dHax3 (dHax3 truncation, labeled dHax3-A) in complex with double-stranded DNA.
  • The numbers 1-10 in the left panel denote the repeat units of the DNA binding domain of dHax3, which recognize the corresponding DNA sequence on the right.
  • Each repeat unit consists of two α-helices, designated a and b.
  • The structure has been deposited in the PDB database under the code 3V6T.
  • dHax3 (designed Hax3) refers to the modified TALE protein Hax3.
  • Figure 2 shows the interactions between dHax3 and the DNA bases.
  • A: orientation of the RVD side chains in dHax3. The first RVD residue does not extend into the DNA major groove, whereas the second residue points its side chain into the major groove.
  • B: the first RVD residue stabilizes the loop conformation through a hydrogen bond. When this residue (position 12 of the repeat unit) is asparagine (N) or histidine (H), it forms a hydrogen bond with the backbone carbonyl oxygen atom of the eighth residue of its own repeat, which stabilizes the conformation of the loop that carries the RVD.
  • C: the second RVD residue interacts directly with the DNA base. When this residue is aspartic acid (D), its carboxyl oxygen forms a direct hydrogen bond with the amino group of cytosine in the DNA; when it is serine (S), its hydroxyl group forms a direct hydrogen bond with N7 of adenine.
  • Figure 3 compares the structures of thymine (left) and 5-methylcytosine (right). The comparison makes clear that the only difference between thymine (left) and 5-methylcytosine (right) is the amino group versus the carbonyl oxygen atom at the six position. Both the amino group and the carbonyl oxygen atom may interact with amino acid residues of the protein through van der Waals forces.
  • Figure 4 shows that biochemical experiments and crystal structure analysis revealed that the TALE protein recognizes 5-methylcytosine through NG.
  • a. DNA sequence recognized by dHax3 that contains 5-methylcytosine (5mC) (this sequence is called dHax3-5mC and contains three 5mC; only the bases recognized by the RVDs of dHax3 are shown, and the full sequence is given in the examples), together with the corresponding RVDs in the dHax3 protein;
  • b. EMSA measurement of the binding of dHax3 to the DNA sequence without 5mC (called dHax3 box; identical to the dHax3-5mC sequence except that each 5mC is C) and to the DNA sequence containing 5mC (dHax3-5mC). About 4 nM nucleic acid probe was added to each lane, and a concentration gradient of dHax3 protein was added to lanes 0 to 10: 0, 8 nM, 16 nM, 31.5 nM, 62.5 nM, 125 nM, 250 nM, 500 nM, 1000 nM,
  • Figure 5 is an electropherogram showing the purification of the full-length dHax3 protein. Lane legend: 1. whole-cell lysate; 2. whole-cell lysate after centrifugation; 3. supernatant of the centrifuged whole-cell lysate; 4. nickel column flow-through; 5. nickel column wash; 6. nickel column eluate; 7. nickel column; 8. molecular weight marker.
  • Figure 6 is an electropherogram showing the purification of the truncated dHax3 protein (dHax3-A).
  • Figure 7 shows DNA binding experiments demonstrating that NG can specifically recognize methylated cytosine.
  • 6T-6C indicates that 6 thymines (T) in dHax3-box were replaced with 6 cytosines (C);
  • 6T-6mC indicates that 6 thymines (T) in dHax3-box were replaced with 6 methylated cytosines (5mC);
  • 5C-5mC indicates that 5 cytosines (C) in dHax3-box were replaced with 5 methylated cytosines (5mC);
  • 5C-5T indicates that 5 cytosines (C) in dHax3-box were replaced with 5 thymines (T); 5C-5A: five cytosines
  • dHax3 binds the DNA sequence containing six methylation modifications (6T-6mC) with an ability similar to that seen in the control experiment (dHax3-box).
  • c. One RVD in dHax3, NG, cannot bind cytosine (C) that lacks the methylation modification.
  • d. One RVD in dHax3, HD, specifically recognizes cytosine (C), and methylation affects the recognition of cytosine by HD.
  • Figure 8 shows the crystal structure of the DNA binding domain of the dHax3-NN variant (dHax3-NN-A; the RVD (NS) in the seventh repeat unit and the RVD (HD) in the ninth repeat unit of the DNA binding domain of dHax3 were each changed to NN by point mutation so as to recognize two methylated CpG islands; the specific recognition sequence is given in the examples) bound to DNA containing two methylated CpG islands.
  • The inventors successfully solved the crystal structure of the complex of the DNA binding domain of the modified TALE protein Hax3 (referred to herein as dHax3 (designed Hax3)) with dsDNA.
  • The NG RVD interacts with the 5-methyl group of thymine through van der Waals forces, and the other groups of thymine do not take part in the interaction.
  • Because 5-methylcytosine is structurally similar to thymine, the TALE protein may specifically recognize 5-methylcytosine in double-stranded DNA through NG.
  • The inventors also successfully solved the crystal structure of the complex of the DNA binding domain of dHax3 with dsDNA containing 5-methylcytosine.
  • This discovery provides a novel method for detecting and intervening in cytosine methylation and can be used in the following ways:
  • DNA methylation refers to the covalent attachment of a methyl group to the 5-carbon of cytosine in genomic CpG dinucleotides under the action of a DNA methyltransferase. Because of the close relationship between DNA methylation and human development and tumor diseases, especially the transcriptional inactivation of tumor suppressor genes caused by methylation of CpG islands,
  • Methylated regions appear in the genome of cancer cells, and this methylation does not occur in normal cells. Because the method of the present invention can effectively distinguish whether methylation has occurred at a specific genomic locus, it can serve as a new method for detecting cancer cells.
  • DNA methylation in cancer cells inhibits the expression of many tumor suppressor genes. Because the method of the present invention can specifically reopen the expression of these genes in cancer cells, it can promote apoptosis of cancer cells.
  • TALE itself has the function of activating transcription. By designing the RVDs of the TALE repeats so that the protein binds specifically to the methylated upstream promoter sequences of tumor suppressor genes, the expression of a large number of tumor suppressor genes in cancer cells can be specifically switched on, with the aim of killing the cancer cells.
  • TALE protein refers to Transcription Activator like
  • The TALE protein can be a naturally occurring TALE protein or a TALE-derived protein, obtained by genetic mutation, modification, or assembly, that retains or enhances the ability to bind DNA or DNA-RNA hybrid strands.
  • Hax3 refers to one of the members of the TALE protein family.
  • The full name of Hax is "Homolog of avrBs3 in Xanthomonas"; Hax3 is one of three homologous proteins identified in Xanthomonas campestris pv. armoraciae.
  • As a member of the TALE protein family, its function is similar to that of other known TALE proteins such as AvrBs3 (see S. Kay, J. Boch, U.
  • dHax3 refers to an artificially engineered Hax3 (designed Hax3) whose nucleotide sequence is SEQ ID NO: 1 and whose amino acid sequence is given in SEQ ID NO: 2 (with a 6xHis tag inserted). M.M. Mahfouz et al. designed dHax3 to specifically recognize the following DNA sequence: TCCCTTTATCTCT (M.M. Mahfouz, L. Li, M. Shamimuzzaman, A. Wibowo, X. Fang, J.K. Zhu, De novo-engineered transcription activator-like effector (TALE) hybrid nuclease with novel DNA binding specificity creates double-strand breaks,
  • The dHax3 truncated protein ("dHax3-A") as used herein refers to a truncation of dHax3 from which the N-terminal domain and the C-terminal domain have been removed, corresponding to residues 230-721 of the dHax3 protein sequence.
  • The dHax3-NN variant refers to a variant of dHax3 in which the RVD (NS) in the seventh repeat unit of the DNA binding domain is changed to NN by point mutation and the RVD (HD) in the ninth repeat unit is changed to NN by point mutation.
  • dHax3-NN-A refers to the truncation comprising residues 230-721 of the dHax3-NN variant protein sequence, that is, the portion that retains the DNA binding domain.
  • The ability of the RVD NG to specifically recognize methylated cytosine, shown for dHax3 in the examples, applies equally to TALE proteins other than the dHax3 sequence of the examples, and such proteins are also within the scope of this patent.
  • The composition of the 50 μl standard PCR reaction system is shown in the following table; the system can be scaled up proportionally if necessary;
  • The amplified target gene fragment was recovered directly using a common DNA recovery kit. Note that if the amplified gene fragment carries a point mutation, the DNA template is first removed by agarose gel electrophoresis, and the target gene is then recovered using an agarose gel DNA recovery kit.
  • The amplified fragment and the vector were treated with the same restriction endonucleases to generate matching cohesive DNA ends.
  • The composition of the 50 μl double digestion reaction system is shown in the following table:
  • The DNA fragment is recovered by agarose gel electrophoresis using an agarose gel DNA recovery kit.
  • The digested target gene fragment was ligated into the vector at room temperature for 30 to 120 min using T4 DNA ligase.
  • The ligation system is shown in the following table:
  • The ligation product was transformed into DH5α competent cells as follows, in preparation for screening positive clones: 50-100 μl of DH5α competent cells were added to the ligation product and placed on ice for 30 min; heat shocked at 42 °C for 90 s; placed on ice for 2 min; the entire product was applied to an ampicillin agar plate, spread with a spreader, and incubated inverted for 14-16 hours.
  • The plasmid was extracted using a common plasmid mini-prep kit, and sequencing was performed by Genewiz Biotech Co., Ltd.
  • overexpression is required.
  • Existing overexpression systems include Escherichia coli (E. coli), yeast, insect cells, and the like. Different proteins may be suited to expression in different systems.
  • The target protein is a protein from Gram-negative bacteria, so E. coli was selected as the expression system for protein expression and purification.
  • The specific purification steps are as follows: 50 ml of LB medium containing ampicillin or ampicillin/chloramphenicol antibiotics was added and incubated overnight at 37 °C on a shaker.
  • The induced E. coli was centrifuged at 4400 rpm for 4 min at 10 °C, and the supernatant was discarded.
  • The wet cells collected by centrifugation from each liter of culture were resuspended in 20 ml of lysis buffer (25 mM Tris-HCl pH 8.0, 500 mM NaCl).
  • elution buffer 25 mM Tris-HCl pH 8.0, 50 mM NaCl, 300 mM imidazole
  • Coomassie Brilliant Blue G-250 is used to check whether the elution is complete. If the elution is incomplete, repeat the above procedure.
  • The protein purified by the two-step affinity chromatography above was concentrated to approximately 10 mg/ml using an ultrafiltration concentrator. Finally, the protein was further purified on a molecular sieve column (Superdex 200) before use.
  • The buffer used for the molecular sieve was 25 mM Tris-HCl pH 8.0, 150 mM NaCl, 10 mM DTT.
  • Using a desalting column (HiPrep 26/10), the buffer of the dHax3 (231-720) protein was exchanged into 25 mM MES pH 6.0, 50 mM NaCl, 5 mM MgCl2, 10 mM DTT.
  • the dHax3 (designed Hax3) gene is obtained by whole gene synthesis and the sequence is as follows (SEQ ID NO:
  • The synthesized gene was ligated directly into the pET300 (Invitrogen) plasmid.
  • The expressed full-length protein carries a six-histidine tag at the N-terminus, which is used for nickel-column affinity purification.
  • The full-length protein sequence is as follows (SEQ ID NO: 2):
  • A truncated protein (dHax3 truncation, labeled dHax3-A) containing residues 230-721 of the protein sequence was constructed to obtain a more stable protein.
  • The dHax3 truncation was cloned into the pET21 (Novagen) expression vector.
  • The expressed dHax3 truncated protein sequence is as follows; the C-terminus contains a His6 tag used for nickel-column affinity purification (SEQ ID
  • the inventors also constructed and expressed the dHax3-NN-A protein for use with CpG islands.
  • Table 2 shows the RVDs of the TALE repeat units involved in the experiments and the DNA bases they recognize:
  • The synthesized single-stranded DNA was dissolved to 1 mM; the two single strands were mixed in an equimolar ratio, heated in a water bath at 85 °C for more than 3 min, and slowly cooled to 22 °C over no less than 3 hours.
  • Lyophilization and cryopreservation can be performed.
  • The purified dHax3 truncated protein (231-721 of the full-length sequence) was adjusted to a protein concentration of 6 to 7 mg/ml; annealed double-stranded DNA was added at a molar ratio of 1.5:1, and the mixture was incubated at 4 °C for 30 min.
  • Protein crystallization conditions were screened with the above kits and then optimized by adjusting the concentration and type of the precipitant, the concentration and type of the salt ions, and the concentration and type of the buffer.
  • The crystals were further optimized using the Additive Screen and Detergent Screen kits. The crystals were also dehydrated, annealed, and so on to improve their diffraction quality.
  • Crystallization mother liquor: 8-10% PEG3350 (w/v), 12% ethanol, 0.1 M MES pH 6.0. Data collection and processing
  • Additionally allowed 7.3 6.5; generously allowed 0.0 0.0
  • dHax3-A and double-stranded DNA dsDNA
  • This structure clearly shows that dHax3 adopts a right-handed helical structure that wraps around the dsDNA at the center of the complex.
  • The protein wraps around the outside of the DNA and is embedded in the DNA major groove (see Figure 1).
  • The structure shows that the 12th amino acid (histidine/asparagine) of each repeat does not interact directly with DNA; instead, it forms a hydrogen bond with the main-chain oxygen atom of the 8th amino acid (alanine) of its own repeat, which fixes the loop in which the RVD is located.
  • A comparison of thymine (T) and 5-methylcytosine (5mC) indicates that 5-methylcytosine has a methyl group at the fifth position, and this methyl group is the only group recognized by NG. Therefore, NG may recognize 5mC. Accordingly, the inventors designed the DNA sequence dHax3-5mC (Fig. 4a)
  • The inventors designed the DNA sequence dHax3-CpG.
  • Table 4. Data collection and structure refinement statistics for the crystal structure of the complex of dHax3-A with dHax3-5mC and for the crystal structure of the complex of dHax3-NN-A with dHax3-CpG. Data columns: dHax3-A bound to DNA (dHax3-5mC); dHax3-NN-A bound to DNA (dHax3-CpG)
  • The inventors solved the structure of the complex of the dHax3 protein with DNA containing three 5mC at a resolution of 1.85 Å.
  • The high-resolution structure clearly reveals the molecular mechanism by which the dHax3 protein recognizes 5mC (Fig. 4c).
  • Figure 8 is a schematic diagram of the crystal structure of the DNA binding domain of the dHax3-NN variant bound to DNA containing two methylated CpG islands, which confirms that dHax3-NN-A binds DNA containing two methylated CpG islands.
  • DNA methylation occurs only on C in CpG islands.
  • The applicant solved the crystal structure of TALE bound to a DNA sequence containing two CpG islands, further demonstrating that TALE can specifically recognize methylated DNA. This is very important for expanding the applications of TALE.
  • Example 4: Gel retardation assay demonstrates the ability of dHax3 to bind DNA duplexes containing 5-methylcytosine (5mC)
  • the gel retardation assay is a special gel electrophoresis technique that studies the interaction of DNA/RNA with proteins in vitro.
  • The basic principle is as follows: in gel electrophoresis, under the action of the electric field, a small free nucleic acid fragment moves toward the anode faster than a nucleic acid fragment to which a protein is bound. Therefore, a short nucleic acid fragment can be labeled, mixed with a protein, and the mixture subjected to gel electrophoresis. If the target DNA binds to a specific protein, its migration is retarded, and the nucleic-acid-binding protein can be detected by autoradiography of the gel. At the same time, by quantitatively comparing the amount of DNA bound to the protein with the amount of DNA not bound to the protein, the binding affinity of the protein for the nucleic acid can be estimated more accurately by curve fitting.
  • 6T-6C: 5'-CCACATATGTCATACGTGTCCCCCCACCCCCCTCCAGCTCGAG
  • T4 polynucleotide kinase (10 U/μl) 1 μl. After setting up the reaction system according to the above table, mix gently and incubate at 37 °C for 30 min; use a G25 prepacked desalting column to remove excess [γ-32P]-ATP; add an excess of the unlabeled complementary strand and anneal to generate double-stranded DNA or DNA-RNA hybrid duplexes.
  • The reaction components were added to the reaction system in the above ratio and mixed for 4 min at 4 °C; the reacted sample was run on a 6% non-denaturing gel;
  • Image data was read with a Typhoon 9400 variable scanner.
  • Figure 7 shows that one RVD in dHax3, NG, is unable to bind cytosine that lacks the methylation modification, and that another RVD in dHax3, HD, specifically recognizes cytosine (C), with methylation of cytosine affecting the recognition of cytosine by HD.
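The quantification step described for the gel retardation assay (comparing bound and unbound DNA, then fitting a binding affinity) can be sketched roughly as follows. This is a minimal illustration rather than the procedure used in the patent: the function names are hypothetical, the band intensities would come from the scanned gel, and the single-site model theta = [P] / (Kd + [P]) assumes 1:1 binding with the probe concentration well below the protein concentration, which the source does not state.

```python
def fraction_bound(bound: float, free: float) -> float:
    """Fraction of probe in the shifted (protein-bound) band of one lane."""
    total = bound + free
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return bound / total


def binding_curve(protein_nM: float, kd_nM: float) -> float:
    """Assumed single-site binding model: theta = [P] / (Kd + [P])."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(protein_nM, fractions, lo=0.1, hi=10000.0, steps=2000):
    """Least-squares Kd estimate by scanning a log-spaced grid of candidates."""
    best_kd, best_err = lo, float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        err = sum((binding_curve(p, kd) - f) ** 2
                  for p, f in zip(protein_nM, fractions))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd


if __name__ == "__main__":
    # Protein concentrations (nM) listed for the EMSA lanes in Figure 4b.
    concentrations = [0, 8, 16, 31.5, 62.5, 125, 250, 500, 1000]
    # Simulated fractions for a hypothetical Kd of 100 nM, not measured data.
    simulated = [binding_curve(p, 100.0) for p in concentrations]
    print(round(fit_kd(concentrations, simulated), 1))
```

A grid scan keeps the sketch dependency-free; with real band intensities a nonlinear least-squares routine would normally be used instead.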

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Genetics & Genomics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biochemistry (AREA)
  • Biotechnology (AREA)
  • Immunology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medicinal Chemistry (AREA)
  • General Chemical & Material Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Veterinary Medicine (AREA)
  • Public Health (AREA)
  • Peptides Or Proteins (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Disclosed is a method for specifically recognizing DNA containing 5-methylated cytosine. The method comprises recognizing 5-methylated cytosine in DNA by using TALE protein.

Description

Method for specifically recognizing DNA containing 5-methylated cytosine
Technical field
The present invention relates to the field of biotechnology, and more particularly to a method for specifically recognizing DNA containing 5-methylated cytosine.
Background
TALE (Transcription Activator-Like Effector) is a protein found in the cells of the plant-pathogenic bacterial genus Xanthomonas. When the pathogen infects a plant, it injects a series of effector molecules, including TALEs, into the plant cells through its own type III secretion system. These effector molecules help the pathogen proliferate further by affecting host cell signaling, gene expression, and so on. TALEs are the largest class of these effector molecules, and they function like the plant's own transcriptional activators.
TALE family proteins generally consist of three major functional domains. The N-terminal domain is involved in the secretion and translocation of TALE; the C-terminus has a transcriptional activation domain and a nuclear localization signal peptide; the region in the middle of TALE is the DNA-binding domain, which differs from other known DNA-binding domains in that it consists of a series of tandem repeat units. In most cases each repeat unit consists of 34 amino acids, and individual repeat units consist of 33 or 35 amino acid residues. Among the 34 amino acids, those at positions 12 and 13 vary considerably, whereas the others are highly conserved. These two non-conserved amino acids are named the RVD (repeat variable diresidue). In 2009, J. Boch et al. and M.J. Moscou et al. (see J. Boch, H. Scholze, S. Schornack, A. Landgraf, S. Hahn, S. Kay, T. Lahaye, A. Nickstadt, U. Bonas, Breaking the code of DNA binding specificity of TAL-type III effectors, Science, 326 (2009) 1509-1512, and M.J. Moscou, A.J. Bogdanove, A simple cipher governs DNA recognition by TAL effectors, Science, 326 (2009) 1501) found, through experiments and bioinformatics analysis respectively, that the amino acids at positions 12 and 13 of each repeat unit (the RVD) correspond in a specific way to the nucleotide that is recognized, for example:
Table 1. Correspondence between selected RVDs and DNA bases
RVD amino acid sequence    DNA base
HD                         C
NG                         T
NI                         A
NN                         G/A
NS                         A/G/C/T
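The correspondence in Table 1 can be expressed as a small lookup. The following Python sketch is illustrative only: the names `RVD_CODE`, `extract_rvd`, and `recognizes` are hypothetical, and the example repeat is an assumed 34-residue repeat sequence rather than one taken from this document.

```python
# RVD-to-base correspondence copied from Table 1.
RVD_CODE = {
    "HD": {"C"},
    "NG": {"T"},
    "NI": {"A"},
    "NN": {"G", "A"},
    "NS": {"A", "G", "C", "T"},
}

# Assumed example repeat (34 residues); not a sequence from this document.
EXAMPLE_REPEAT = "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG"


def extract_rvd(repeat: str) -> str:
    """Return the residues at positions 12 and 13 (1-based) of a repeat unit."""
    if len(repeat) < 13:
        raise ValueError("repeat unit is too short to contain an RVD")
    return repeat[11:13]


def recognizes(rvds: list[str], dna: str) -> bool:
    """Check whether each base is allowed by the RVD at the same position."""
    if len(rvds) != len(dna):
        return False
    return all(base in RVD_CODE[rvd] for rvd, base in zip(rvds, dna.upper()))


if __name__ == "__main__":
    print(extract_rvd(EXAMPLE_REPEAT))
    print(recognizes(["HD", "NG", "NI", "NN"], "CTAG"))
```

Sets are used for the allowed bases because several RVDs in Table 1 accept more than one nucleotide.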
The specific DNA sequence recognition of TALE proteins and the flexibility with which they can be assembled offer great promise for applications in molecular biology: scientists can design and assemble arbitrary TALE units to recognize arbitrary DNA double-helix sequences. This property has already been used to construct TALENs (TALE nucleases), enzymes that cleave specific double-stranded DNA sequences, for introducing site-directed mutations, targeted knockouts, and similar operations into cellular genomes (A.J. Bogdanove, D.F. Voytas, TAL effectors: customizable proteins for DNA targeting, Science, 333 (2011) 1843-1846). In all reports known to date, the DNA recognized by TALEs is unmodified double-stranded DNA.
Summary of the invention
In one aspect, the invention relates to a method for detecting cytosine methylation in DNA, comprising using a TALE protein or a derived protein thereof to specifically recognize 5-methylcytosine in DNA.
In a preferred embodiment, two different TALE proteins are used to specifically recognize cytosine and 5-methylated cytosine, respectively, in the target sequence.
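As a rough sketch of how two such TALE proteins might be specified, the snippet below builds one RVD array for the unmethylated reading of a target and one for the methylated reading. It rests on assumptions: HD for cytosine, NG for thymine, NI for adenine, and NN for guanine follow Table 1; using NG for 5-methylcytosine follows the dHax3 binding results this document reports; the letter `M` for 5mC and the function names are hypothetical.

```python
# One RVD per base. "M" is an assumed single-letter code for 5-methylcytosine.
RVD_FOR_BASE = {
    "A": "NI",
    "C": "HD",  # HD recognizes unmethylated cytosine
    "G": "NN",
    "T": "NG",
    "M": "NG",  # NG is reported here to accept 5mC as well as thymine
}


def design_rvds(target: str) -> list[str]:
    """Return one RVD per base of the target sequence."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def probe_pair(target: str, position: int) -> tuple[list[str], list[str]]:
    """RVD arrays for the unmethylated and methylated state of one cytosine.

    position is the 0-based index of the cytosine of interest in target.
    """
    target = target.upper()
    if target[position] != "C":
        raise ValueError("position does not point at a cytosine")
    methylated = target[:position] + "M" + target[position + 1:]
    return design_rvds(target), design_rvds(methylated)


if __name__ == "__main__":
    unmethylated_rvds, methylated_rvds = probe_pair("ACGT", 1)
    print(unmethylated_rvds)
    print(methylated_rvds)
```

Because NG also recognizes thymine, the methylated-state array on its own cannot tell 5mC from T at that position; comparing binding of the two proteins is what carries the information.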
In a further preferred embodiment, the method is used for detecting methylation of CpG islands.
In one aspect, the invention relates to the use of a TALE protein and derived proteins thereof for the specific recognition of 5-methylated cytosine in DNA.
In another aspect, the invention relates to the use of a TALE protein and derived proteins thereof in the preparation of a reagent for the specific recognition of 5-methylcytosine in DNA.
In another aspect, the invention relates to the use of a TALE protein and derived proteins thereof in the preparation of a medicament for the diagnosis or treatment of cancer.
In a preferred embodiment, the diagnosis or treatment is carried out by specifically recognizing 5-methylcytosine in DNA.
The invention further relates to TALE proteins and derived proteins thereof for the specific recognition of DNA modified with 5-methylcytosine.
The invention also relates to TALE proteins and derived proteins thereof for use in the diagnosis or treatment of cancer. The TALE protein can be a naturally occurring TALE protein or a TALE-derived protein, obtained from it by genetic mutation, modification, or assembly, that retains or enhances the specific recognition of 5-methylcytosine in DNA. The TALE-derived protein further includes recombinant proteins having a TALE protein DNA binding domain.
Brief description of the drawings
图 1是 dHax3的 DNA结合域 ( dHax3截短体, 标记为 dHax3-A ) 与双链 DNA的高分辨率晶体结构( 1.85埃)示意图。左图中的 1-10 表 示 dHax3的 DNA结合域的每个重复单元, 其识别右侧对应的 DNA序 列。 每个重复单元由两个 o螺旋组成, 两个螺旋分别为 和1>。 该结构 已上传到 PDB数据库中,代码为: 3V6T。其中 dHax3 ( designed Hax3 ) 指经过改造的 TALE蛋白 Hax3。  Figure 1 is a schematic representation of the high-resolution crystal structure (1.85 angstrom) of the DNA binding domain of dHax3 (dHax3 truncated, labeled dHax3-A) and double-stranded DNA. 1-10 in the left panel shows each repeat unit of the DNA binding domain of dHax3, which recognizes the corresponding DNA sequence on the right side. Each repeating unit consists of two o helices, and the two helices are and 1>, respectively. The structure has been uploaded to the PDB database with the code: 3V6T. Where dHax3 (designed Hax3 ) refers to the modified TALE protein Hax3.
Figure 2 shows the interactions between dHax3 and DNA bases. A, orientation of the RVD side chains in dHax3: the first RVD residue does not reach into the DNA major groove, whereas the second residue points its side chain into the major groove. B, the first RVD residue stabilizes the loop conformation through a hydrogen bond: when the first RVD residue of a repeat is asparagine (N) or histidine (H), it forms a hydrogen bond with the main-chain carbonyl oxygen of the eighth residue of its own repeat, which stabilizes the conformation of the loop that carries the RVD. C, the second RVD residue interacts directly with the DNA base: when it is aspartate (D), its carboxylate oxygen forms a direct hydrogen bond with the amino group of cytosine; when it is serine (S), its hydroxyl group forms a direct hydrogen bond with N7 of adenine; when it is glycine (G), there are van der Waals interactions between it and the methyl group of thymine. D, loop conformation of the NG RVD in the molecule shown in panel A. E, loop conformation of the NG RVD in the molecule shown in panel B.
Figure 3 compares the structures of thymine (left) and 5-methylcytosine (right). The comparison shows that the only difference between the two is at the six position, which carries a carbonyl oxygen in thymine and an amino group in 5-methylcytosine. Either group can interact with amino acid residues of a protein through van der Waals forces.
Figure 4 shows that biochemical experiments and crystal structure analysis reveal that TALE proteins recognize 5-methylcytosine through NG. a, the 5-methylcytosine (5mC)-containing DNA sequence recognized by dHax3 and the corresponding RVDs of the dHax3 protein. This sequence, called dHax3-5mC, contains three 5mC; only the bases recognized by the dHax3 RVDs are shown, and the full sequence is given in the Examples. b, EMSA comparing the binding of dHax3 to the DNA sequence without 5mC (called dHax3 box; identical to dHax3-5mC except that each 5mC is C) and to the 5mC-containing sequence (dHax3-5mC). About 4 nM nucleic acid probe was added to each lane, and lanes 0 to 10 contained dHax3 protein at 0, 8 nM, 16 nM, 31.5 nM, 62.5 nM, 125 nM, 250 nM, 500 nM, 1000 nM, 2000 nM, and 4000 nM, respectively. c, crystal structure of the dHax3 DNA-binding domain (dHax3-A) in complex with the 5mC-containing DNA sequence (dHax3-5mC). The bases drawn with side chains are 5-methylcytosine. Glycine makes van der Waals contacts with 5-methylcytosine, and this interaction resembles that between glycine and thymine.
Figure 5 is an electrophoresis gel showing the purification of full-length dHax3 protein. Lane labels: 1, whole-cell lysate; 2, pellet after centrifugation of the lysate; 3, supernatant after centrifugation of the lysate; 4, liquid discarded after incubation with the nickel column; 5, nickel-column wash; 6, nickel-column eluate; 7, nickel-column resin; 8, molecular weight marker.
Figure 6 is an electrophoresis gel showing the purification of the dHax3 truncated protein (dHax3-A). Lane labels: A, whole-cell lysate; P, pellet after centrifugation of the lysate; S, supernatant after centrifugation of the lysate; F, nickel-column flow-through; W1, nickel-column wash 1; W2, nickel-column wash 2; E, nickel-column eluate; R, nickel-column resin; M, molecular weight marker.
Figure 7 shows DNA-binding experiments demonstrating that NG specifically recognizes methylated cytosine. a, the DNA probes used to test DNA binding (only the bases recognized by the dHax3 RVDs are shown; see the Examples for details). 6T-6C: the six thymines (T) in dHax3-box are replaced with six cytosines (C). 6T-6mC: the six thymines (T) in dHax3-box are replaced with six methylated cytosines (5mC). 5C-5mC: the five cytosines (C) in dHax3-box are replaced with five methylated cytosines (5mC). 5C-5T: the five cytosines (C) in dHax3-box are replaced with five thymines (T). 5C-5A: the five cytosines (C) in dHax3-box are replaced with five adenines (A). 5C-5G: the five cytosines (C) in dHax3-box are replaced with five guanines (G). b, dHax3 binds the DNA sequence carrying six methylation modifications (6T-6mC) about as well as it binds the control (dHax3-box). c, NG, one of the RVDs in dHax3, cannot bind unmethylated cytosine (C). d, HD, another RVD in dHax3, recognizes cytosine (C) specifically, and methylation interferes with the recognition of cytosine by HD. In the EMSA experiments, lanes 1 to 5, 6 to 10, 11 to 15, and 16 to 20 each received a gradient of full-length dHax3 protein at 0, 146 nM, 440 nM, 1330 nM, and 4000 nM.
Figure 8 is a schematic of the crystal structure of the DNA-binding domain of the dHax3-NN variant (dHax3-NN-A) bound to DNA containing two methylated CpG islands. In dHax3-NN-A, the RVD of the seventh repeat unit of the dHax3 DNA-binding domain (NS) and the RVD of the ninth repeat unit (HD) were each changed to NN by point mutation so that two methylated CpG islands are recognized; the recognition sequence is given in the Examples.
Detailed Description
The inventors solved the crystal structure of the DNA-binding domain of the engineered TALE protein Hax3 (referred to herein as dHax3, for designed Hax3) in complex with dsDNA. The structure reveals the molecular basis by which each RVD specifically recognizes a DNA base: the NG RVD contacts the 5-methyl group of thymine through van der Waals interactions, and the other groups of thymine take no part in the interaction. This finding suggested that TALE proteins might specifically recognize 5-methylcytosine in double-stranded DNA through NG, because 5-methylcytosine is structurally similar to thymine. The inventors also solved the crystal structure of the dHax3 DNA-binding domain in complex with dsDNA containing 5-methylcytosine.
This discovery provides a new way to detect and interfere with cytosine methylation, and it can be applied as follows:
1. Detection of CpG islands in cancer cells
5-Methylcytosine is the product of DNA methylation, an important modification in epigenetics. DNA methylation is the covalent attachment of a methyl group to carbon 5 of the cytosine in genomic CpG dinucleotides, catalyzed by DNA methyltransferases. DNA methylation is closely linked to human development and to tumor diseases, in particular through the transcriptional inactivation of tumor suppressor genes caused by CpG island methylation.
Some regions of the genome are methylated in cancer cells but not in normal cells. Because the method of the invention can determine whether a specific genomic site is methylated, it can serve as a new means of detecting cancer cells.
2. A new approach to treating cancer
DNA methylation in cancer cells suppresses the expression of many tumor suppressor genes. Because the method of the invention can specifically switch the expression of these genes back on in cancer cells, it can promote the apoptosis of cancer cells. TALE proteins have an intrinsic transcription-activating function. By designing the RVDs of the TALE repeats so that the protein specifically binds the methylated promoter sequence upstream of a tumor suppressor gene, expression of that gene can be switched on specifically and strongly in cancer cells, with the aim of killing them.
Unless otherwise defined herein, the scientific and technical terms used in the invention have the meanings commonly understood by those of ordinary skill in the art. Unless the context requires otherwise, singular terms include the plural and plural terms include the singular. In general, the nomenclature and techniques of molecular biology, biochemistry, and structural biology used herein are those well known and commonly used in the art. Unless otherwise stated, the following terms have the meanings given below:
The term "TALE protein" as used herein refers to Transcription Activator-Like Effectors. The TALE protein may be a naturally occurring TALE protein, or a TALE-derived protein obtained from it by genetic mutation, modification, or assembly that retains or enhances the ability to bind DNA or DNA-RNA hybrid strands.
The term "Hax3" as used herein refers to a member of the TALE protein family. Hax stands for "Homolog of avrBs3 in Xanthomonas"; Hax3 is one of three homologous proteins identified in Xanthomonas campestris pv. armoraciae. As a member of the TALE protein family, its function resembles that of other known TALE proteins such as AvrBs3 (see S. Kay, J. Boch, U. Bonas, Characterization of AvrBs3-like effectors from a Brassicaceae pathogen reveals virulence and avirulence activities and a protein with a novel repeat architecture, Molecular Plant-Microbe Interactions: MPMI, 18 (2005) 838-848).
The term "dHax3" as used herein refers to an artificially engineered Hax3 (designed Hax3). The nucleotide sequence of its gene is SEQ ID NO: 1, and its amino acid sequence is given in SEQ ID NO: 2 (with a 6×His tag inserted). M.M. Mahfouz et al. designed dHax3 to specifically recognize the DNA sequence TCCCTTTATCTCT (M.M. Mahfouz, L. Li, M. Shamimuzzaman, A. Wibowo, X. Fang, J.K. Zhu, De novo-engineered transcription activator-like effector (TALE) hybrid nuclease with novel DNA binding specificity creates double-strand breaks, Proceedings of the National Academy of Sciences of the United States of America, 108 (2011) 2623-2628).
The term "dHax3 truncated protein" ("dHax3-A") as used herein refers to dHax3 with its N-terminal and C-terminal domains removed. It corresponds to residues 230-721 of the dHax3 protein sequence and contains 11.5 repeat units.
The term "dHax3-NN variant" as used herein refers to a variant of dHax3 in which the RVD of the seventh repeat unit of the DNA-binding domain (NS) and the RVD of the ninth repeat unit (HD) have each been changed to NN by point mutation so that two methylated CpG islands are recognized. The DNA sequence listed for dHax3-NN is: TCCCTTTATCTCT.
The term "dHax3-NN-A" as used herein refers to the truncation of the dHax3-NN variant comprising residues 230-721 of the protein sequence, that is, the retained DNA-binding domain.
The molecular mechanism by which RVDs recognize DNA bases is the same in all TALE proteins. Although TALE proteins differ somewhat in sequence, the ability of the NG RVD to specifically recognize methylated cytosine, shown for dHax3 in the Examples, therefore applies equally to TALE proteins whose sequences differ from that of dHax3, and these also fall within the scope of protection of this patent.
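The RVD-to-base correspondence described above can be sketched in code. The following Python sketch is illustrative only and is not part of the patent: the names `RVD_ACCEPTS`, `DHAX3_RVDS`, `DHAX3_NN_RVDS`, and `site_matches` are hypothetical, and the lookup table encodes only the pairings reported in this document (HD with C, NG with T or 5mC, NS with A, NN with G), not the full range of RVD specificities. It assumes, following Table 2, that position 0 of the target is a T that is not read by an RVD.

```python
# Illustrative sketch; all names and the lookup table are assumptions.
# Each RVD maps to the bases it is reported to recognize in this document.
RVD_ACCEPTS = {
    "HD": {"C"},        # HD pairs with unmethylated cytosine
    "NG": {"T", "mC"},  # NG pairs with thymine and with 5-methylcytosine
    "NS": {"A"},        # NS as used in dHax3 (paired with adenine)
    "NN": {"G"},        # NN as used in dHax3-NN (paired with guanine)
}

# RVDs of repeats 1 to 11.5, in the order listed in Table 2.
DHAX3_RVDS = "HD HD HD NG NG NG NS NG HD NG HD NG".split()
DHAX3_NN_RVDS = "HD HD HD NG NG NG NN NG NN NG HD NG".split()


def site_matches(rvds, bases):
    """Return True if every base after position 0 is accepted by its RVD.

    `bases` lists the target from position 0 onward; position 0 must be T.
    """
    if len(bases) != len(rvds) + 1 or bases[0] != "T":
        return False
    return all(base in RVD_ACCEPTS[rvd] for rvd, base in zip(rvds, bases[1:]))


if __name__ == "__main__":
    dhax3_box = "T C C C T T T A T C T C T".split()
    dhax3_5mc = "T C C C T mC T A mC C T C mC".split()
    dhax3_cpg = "T C C C T T mC G mC G T C T".split()
    print(site_matches(DHAX3_RVDS, dhax3_box))
    print(site_matches(DHAX3_RVDS, dhax3_5mc))
    print(site_matches(DHAX3_NN_RVDS, dhax3_cpg))
```

Under these assumptions, a probe in which the cytosines read by HD are methylated is rejected, which mirrors the observation that methylation interferes with recognition by HD.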
The reagents used in the Examples, including buffers, enzymes, vectors, and kits, are commercially available or can be prepared by the methods recommended in Molecular Cloning: A Laboratory Manual, 3rd edition (Huang Peitang, Science Press, 2002).
Examples
Example 1: Construction and purification of several TALE proteins
1. The experimental methods for molecular cloning and expression vector construction are as follows:
• PCR amplification of the target gene fragment
The composition of the standard 50 μl PCR reaction is shown in the table below; the reaction can be scaled up proportionally if needed.
50 μl standard PCR reaction
Component                Volume (μl)
Ex Taq                   0.25
10× Ex Taq buffer        5
dNTP                     4
DNA template             2.5 ng
5' primer                1
3' primer                1
ddH2O                    to 50 μl
After the target fragment has been amplified, recover it directly with a standard DNA recovery kit. Note that when the amplified fragment carries a point mutation, the DNA template must first be removed by agarose gel electrophoresis, and the target gene is then recovered with an agarose gel DNA recovery kit.
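The proportional scaling of the 50 μl PCR recipe mentioned above can be sketched as follows. This is a minimal illustration, not part of the original protocol: the names `PCR_50UL` and `scale_pcr` are hypothetical, and because the table gives the template as a mass (2.5 ng) rather than a volume, the template volume is passed in as an assumed parameter so that the water volume can be computed.

```python
# Illustrative sketch; names are hypothetical. Volumes (in microliters) are
# taken from the 50 ul recipe in the table above.
PCR_50UL = {
    "Ex Taq": 0.25,
    "10x Ex Taq buffer": 5.0,
    "dNTP": 4.0,
    "5' primer": 1.0,
    "3' primer": 1.0,
}
TEMPLATE_NG_50UL = 2.5  # template mass per 50 ul reaction


def scale_pcr(total_ul, template_ul):
    """Scale the 50 ul recipe to `total_ul`.

    `template_ul` is the volume in which the scaled template mass is added;
    it is an assumption, since the source gives only the template mass.
    """
    factor = total_ul / 50.0
    volumes = {name: vol * factor for name, vol in PCR_50UL.items()}
    water = total_ul - sum(volumes.values()) - template_ul
    if water < 0:
        raise ValueError("components exceed the requested total volume")
    return {
        "volumes_ul": volumes,
        "template_ng": TEMPLATE_NG_50UL * factor,
        "water_ul": water,
    }


if __name__ == "__main__":
    mix = scale_pcr(100.0, template_ul=1.0)
    for name, vol in mix["volumes_ul"].items():
        print(f"{name}: {vol:.2f} ul")
    print(f"template: {mix['template_ng']:.1f} ng, water: {mix['water_ul']:.2f} ul")
```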
• Restriction digestion of the amplified fragment and the vector
Digest the amplified fragment and the vector with the same restriction endonucleases to generate matching cohesive ends. The composition of the 50 μl double-digestion reaction is shown in the table below:
50 μl standard double-digestion reaction
Component                                  Volume (μl)
PCR-amplified fragment or plasmid          42
10× digestion buffer (NEB buffer 4)        5
NdeI                                       1.2
XhoI                                       1.8
Incubate at 37 °C for 30 to 180 min. Once the reaction is judged complete, run gel electrophoresis and recover the DNA fragment by gel excision with an agarose gel DNA recovery kit.
• DNA ligation
Ligate the digested target gene fragment into the vector with T4 DNA ligase, reacting at room temperature for 30 to 120 min. The ligation reaction is shown in the table below:
Component                          Volume (μl)
Digested target gene fragment      7
Digested vector                    1
10× T4 ligase buffer               1
T4 DNA ligase                      1
• Transformation
Transform the ligation product into DH5α competent cells as follows, in preparation for screening positive clones: add 50 to 100 μl of DH5α competent cells to the ligation product and keep on ice for 30 min; heat-shock at 42 °C for 90 s; keep on ice for 2 min; spread the whole mixture evenly on an ampicillin agar plate with a spreader and incubate inverted at 37 °C for 14 to 16 hours.
• Screening positive clones by colony PCR
Mark 4 to 8 colonies on the plate obtained in the previous step and test them for positive clones with the following reaction:
Colony PCR reaction
Component                Volume (μl)
Taq                      0.2
10× Ex Taq buffer        3
dNTP                     2
DNA template             colony
5' primer                0.3
3' primer                0.3
ddH2O                    to 30 μl
Confirm the result by gel electrophoresis, pick the positive clones, and culture them overnight in ampicillin LB medium at 37 °C and 220 rpm.
• Plasmid extraction
Extract the plasmid with a standard plasmid miniprep kit. Sequencing was performed by GENEWIZ.
• Induced expression of recombinant protein
Obtaining a large amount of purified protein requires overexpression. Available overexpression systems include Escherichia coli (E. coli), yeast, and insect cells, and different proteins may be better suited to different systems. Because the target protein originates from a Gram-negative bacterium, E. coli was chosen as the system for protein expression and purification.
Well-behaved protein of high purity is a prerequisite for biochemical and crystallization experiments. The techniques for purifying recombinantly expressed protein from E. coli are mature. To allow convenient purification by affinity chromatography, recombinant proteins carrying various tags were constructed. After comparison, the histidine-tagged recombinant protein was used for subsequent experiments. A histidine tag made of six histidines binds through coordinate bonds to resin carrying metal atoms such as nickel. Purification by nickel-column affinity chromatography followed by heparin affinity chromatography yields protein of about 95% purity or higher.
The purification steps are as follows:
a. Inoculate 50 ml of LB medium containing ampicillin, or ampicillin plus chloramphenicol, and culture overnight on a shaker at 37 °C.
b. Transfer 5 to 10 ml of the starter culture into 1 L of LB medium containing antibiotics and culture on a shaker at 37 °C for about 3 hours. When OD600 reaches 0.8 to 1.0, add IPTG to a final concentration of 0.2 mM and induce expression at 22 °C for 14 to 16 hours.
c. Centrifuge the induced E. coli at 4400 rpm and 4 °C for 10 min and discard the supernatant. Resuspend the wet cells collected from each liter of culture in 20 ml of lysis buffer (25 mM Tris-HCl pH 8.0, 500 mM NaCl).
d. After lysing the cells by sonication, centrifuge at 14000 rpm for 50 min and keep the supernatant for purification.
e. Load the supernatant slowly onto a nickel column pre-equilibrated with lysis buffer (25 mM Tris-HCl pH 8.0, 500 mM NaCl). Pass the flow-through over the column again 1 to 2 times.
f. Add 10 ml of wash buffer I (25 mM Tris-HCl pH 8.0, 1000 mM NaCl) to remove some of the impurities. Repeat 3 times.
g. Add 10 ml of wash buffer II (25 mM Tris-HCl pH 8.0, 100 mM NaCl, 10 mM imidazole) to remove further contaminating proteins.
h. Add 10 ml of elution buffer (25 mM Tris-HCl pH 8.0, 50 mM NaCl, 300 mM imidazole) to elute the target protein from the nickel column. Check with Coomassie Brilliant Blue G-250 whether elution is complete; if it is not, repeat this step.
i. Load the eluted protein slowly onto a heparin column (Heparin Sepharose 6 Fast Flow) pre-equilibrated with buffer (25 mM Tris-HCl pH 8.0, 50 mM NaCl). Pass the flow-through over the column again 1 to 2 times.
j. Add 10 ml of wash buffer I (25 mM Tris-HCl pH 8.0, 100 mM NaCl) to remove impurities. Repeat 3 times.
k. Add 10 ml of elution buffer (25 mM Tris-HCl pH 8.0, 1000 mM NaCl, 10 mM DTT) to elute the target protein from the heparin column. Check with Coomassie Brilliant Blue G-250 whether elution is complete; if it is not, repeat this step. Assess protein purity by SDS-PAGE.
l. Concentrate the protein obtained from the two affinity steps to about 10 mg/ml in an ultrafiltration concentrator. Finally, purify the protein further and assess its behavior by size-exclusion chromatography (Superdex 200) in 25 mM Tris-HCl pH 8.0, 150 mM NaCl, 10 mM DTT. Use a desalting column (HiPrep 26/10) to exchange the dHax3 (231-720) protein into 25 mM MES pH 6.0, 50 mM NaCl, 5 mM MgCl2, 10 mM DTT.
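The nickel-column buffers in steps e to h differ only in their NaCl and imidazole concentrations, so they can be prepared from shared stock solutions using C1·V1 = C2·V2. The sketch below is a hedged illustration: the stock concentrations in `STOCKS_MM` are assumptions (the source does not state how the buffers were prepared), and the names `BUFFERS_MM` and `stock_volumes` are hypothetical. Only the final concentrations come from the steps above.

```python
# Illustrative sketch; stock concentrations are ASSUMED, not from the source.
STOCKS_MM = {
    "Tris-HCl pH 8.0": 1000.0,  # assumed 1 M stock
    "NaCl": 5000.0,             # assumed 5 M stock
    "imidazole": 2000.0,        # assumed 2 M stock
}

# Final concentrations in mM, as given in purification steps e to h.
BUFFERS_MM = {
    "lysis": {"Tris-HCl pH 8.0": 25, "NaCl": 500},
    "Ni wash I": {"Tris-HCl pH 8.0": 25, "NaCl": 1000},
    "Ni wash II": {"Tris-HCl pH 8.0": 25, "NaCl": 100, "imidazole": 10},
    "Ni elution": {"Tris-HCl pH 8.0": 25, "NaCl": 50, "imidazole": 300},
}


def stock_volumes(buffer_name, final_ml):
    """Return the volume (ml) of each stock, plus water, for `final_ml` of buffer.

    Applies C1 * V1 = C2 * V2 to each component.
    """
    recipe = BUFFERS_MM[buffer_name]
    volumes = {c: final_ml * mm / STOCKS_MM[c] for c, mm in recipe.items()}
    volumes["water"] = final_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name in BUFFERS_MM:
        vols = stock_volumes(name, 100.0)
        print(name, {k: round(v, 2) for k, v in vols.items()})
```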
2. Construction and expression of dHax3 and dHax3-A
The dHax3 (designed Hax3) gene was obtained by whole-gene synthesis; its sequence is as follows (SEQ ID NO: 1):
Figure imgf000013_0001
Figure imgf000014_0001
TGGTTAATGGAACTTCTACCGCAATGA TGGTTAATGGAACTTCTACCGCAATGA
The synthesized gene was ligated directly into the pET300 (Invitrogen) plasmid. The expressed full-length protein carries six histidines at its N-terminus for affinity purification on a nickel column. The full-length protein sequence is as follows (SEQ ID NO: 2):
EELAWLMELLPQ EELAWLMELLPQ
The purification of full-length dHax3 protein is shown in Figure 5 (purified by nickel-column affinity chromatography through the 6×His tag, separated by SDS-PAGE, and stained with Coomassie Brilliant Blue).
Secondary structure prediction showed that both the N-terminus and the C-terminus of the protein contain a long region without secondary structure. Such regions are unfavorable for protein crystallization, so the inventors designed a truncated protein (the dHax3 truncation, labeled dHax3-A, comprising residues 230-721 of the protein sequence) to obtain a better-behaved protein. The dHax3 truncation was cloned into the pET21 (Novagen) expression vector. The expressed dHax3 truncated protein carries a C-terminal His6 tag for affinity purification on a nickel column; its sequence is as follows (SEQ ID
Figure imgf000016_0001
The purification of the dHax3 truncated protein is shown in Figure 6 (purified by nickel-column affinity chromatography through the His6 tag, separated by SDS-PAGE, and stained with Coomassie Brilliant Blue).
3. Construction and expression of dHax3-NN-A
The inventors also constructed and expressed the dHax3-NN-A protein for co-crystallization with DNA sequences containing CpG islands.
Table 2 shows the correspondence between the RVDs of the TALE repeat units used in the experiments and the DNA bases they recognize:
Position         0    1    2    3    4    5    6    7    8    9    10   11   11.5
dHax3 RVD             HD   HD   HD   NG   NG   NG   NS   NG   HD   NG   HD   NG
dHax3-box        T    C    C    C    T    T    T    A    T    C    T    C    T
dHax3-5mC        T    C    C    C    T    mC   T    A    mC   C    T    C    mC
dHax3-NN RVD          HD   HD   HD   NG   NG   NG   NN   NG   NN   NG   HD   NG
dHax3-CpG        T    C    C    C    T    T    mC   G    mC   G    T    C    T
Example 2: Obtaining the crystal structure of dHax3 and the crystal structure of the complex of dHax3-A with double-stranded DNA
• Obtaining single- and double-stranded DNA
To test the binding of dHax3 to single- and double-stranded DNA and to obtain crystals of the protein-dsDNA complex, the inventors obtained single-stranded DNA (17 nt) by chemical synthesis (Invitrogen and Takara):
5' TG TCCCTTTATCTCT CT 3' (SEQ ID NO:4 )  5' TG TCCCTTTATCTCT CT 3' (SEQ ID NO: 4)
3' AC AGGGAAATAGAGA GA 5' (SEQ ID NO:5)  3' AC AGGGAAATAGAGA GA 5' (SEQ ID NO: 5)
Dissolve the synthesized single-stranded DNA to 1 mM, mix the two strands in an equimolar ratio, incubate at 85 °C for at least 3 min, and cool slowly to 22 °C over no less than 3 hours. For long-term storage, the annealed double-stranded DNA can be lyophilized and kept at ultra-low temperature.
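Before annealing, it can be useful to confirm that the two synthesized strands are fully complementary and to work out the cooling rate implied by the protocol. The sketch below is illustrative only; the function names `complement`, `anneals_fully`, and `max_cooling_rate` are hypothetical. The sequences are SEQ ID NO: 4 and SEQ ID NO: 5 with the spacing removed, the second written 3' to 5' as in the text.

```python
# Illustrative sketch; function names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

TOP_5TO3 = "TGTCCCTTTATCTCTCT"     # SEQ ID NO: 4, written 5' -> 3'
BOTTOM_3TO5 = "ACAGGGAAATAGAGAGA"  # SEQ ID NO: 5, written 3' -> 5'


def complement(seq):
    """Return the base-by-base complement of `seq` (no reversal)."""
    return "".join(COMPLEMENT[base] for base in seq)


def anneals_fully(top_5to3, bottom_3to5):
    """True if the bottom strand, read 3' -> 5', pairs with every top base."""
    return complement(top_5to3) == bottom_3to5


def max_cooling_rate(start_c, end_c, min_hours):
    """Largest average cooling rate (deg C per min) that still takes `min_hours`."""
    return (start_c - end_c) / (min_hours * 60.0)


if __name__ == "__main__":
    print(len(TOP_5TO3), anneals_fully(TOP_5TO3, BOTTOM_3TO5))
    print(round(max_cooling_rate(85.0, 22.0, 3.0), 2))
```

Cooling from 85 °C to 22 °C over at least 3 hours corresponds to an average rate of at most 0.35 °C per minute.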
• Obtaining crystals of the complex
Adjust the purified dHax3 truncated protein (residues 231-721 of the full-length sequence) to a concentration of 6 to 7 mg/ml, add the annealed double-stranded DNA at a molar ratio of 1.5:1, and incubate at 4 °C for 30 min.
Initial screening of crystallization conditions relied mainly on commercial screening kits: SaltRx, Natrix, PEG/Ion, Crystal Screen, and Index from Hampton; Wizard I, II, and III from Emerald; and ProPlex from Molecular Dimensions.
Conditions that gave protein crystals in these kits were then optimized by adjusting the concentration and type of precipitant, the concentration and type of salt, and the concentration and type of buffer. The crystals were further optimized with the Additive Screen and Detergent Screen kits. Dehydration and annealing of the crystals were also tried in order to improve diffraction quality.
使用蛋白质结晶没有规律可循, 所以到目前为止仍然还是一门艺 ': 0 术。 起始阶段常用 Sparse matrix screen, 即购买各公司配置的结晶条件 进行筛选。 大多数情况下, 初筛得到的结晶条件中并不能长出衍射质 量高的晶体, 在接下来的实验中, 发明人又进一步对初始结晶条件的 基础上进一步细化, 包括调整沉淀剂、 pH緩冲液、 盐、 添加还原剂、 去垢剂或醇; 调整结晶实验的温度, 时间等。 最后采用的结晶条件为 There is no regularity in the use of protein crystallization, so it is still a piece of art ': 0 surgery so far. Sparse matrix screen is commonly used in the initial stage, that is, the crystallization conditions of each company's configuration are purchased for screening. In most cases, crystals with high diffraction quality cannot be grown in the crystallization conditions obtained by the initial screening. In the following experiments, the inventors further refined the initial crystallization conditions, including adjusting the precipitant, pH. Buffer, salt, addition of reducing agent, detergent or alcohol; adjust the temperature, time, etc. of the crystallization experiment. The final crystallization conditions used are
1 将如下结晶母液与孵育好的蛋白核酸复合物通过 1 : 1的体积比混合,通 过悬滴法 ( hanging drop vapor diffusion method )在 18 °C培养两天, 即 可获得晶体。 1 The following crystallization mother liquid and the incubated protein nucleic acid complex were mixed by a volume ratio of 1:1, and cultured at 18 ° C for two days by a hanging drop vapor diffusion method, whereby crystals were obtained.
结晶母液: 8-10% PEG3350 (w/v), 12% ethanol, 0.1 M MES pH 6.0。 攀数据收集及处理  Crystallization mother liquor: 8-10% PEG3350 (w/v), 12% ethanol, 0.1 M MES pH 6.0. Climbing data collection and processing
Data were collected at beamline BL17U of the Shanghai Synchrotron Radiation Facility (SSRF) or at beamline BL41XU of SPring-8 in Japan. All diffraction data were integrated with HKL2000, and further data processing was carried out with CCP4. The structure of the dHax3-DNA complex was solved by molecular replacement, using DNA-free dHax3 as the search model. The structure was then refined with Phenix and Coot. After data processing, structure determination and refinement, the structure of the dHax3 protein reached a resolution of 2.4 Å and the structure of the dHax3-dsDNA complex reached 1.85 Å. The data collection and refinement statistics are given in the table below:

Table 3. Data collection and refinement statistics for the dHax3 crystal structure and for the crystal structure of the complex of dHax3-△ with double-stranded DNA
Data                              dHax3 (270-703)          DNA (dHax3 box)-bound dHax3-A
Integration package               HKL2000                  HKL2000
Space group                       C2221                    P21
Unit cell (Å)                     74.76, 95.51, 153.21     81.719, 87.679, 88.494
Unit cell (°)                     90, 90, 90               90.00, 103.04, 90.00
Wavelength (Å)                    0.97915                  1.00000
Resolution (Å)                    40-2.4 (2.49-2.4)        40-1.85 (1.92-1.85)
Rmerge (%)                        4.9 (35.0)               6.1 (60.8)
I/sigma                           24.1 (4.4)               22.5 (2.6)
Completeness (%)                  95.6 (98.2)              99.7 (99.9)
Number of measured reflections    84,417                   391,380
Number of unique reflections      20,832                   103,239
Redundancy                        4.1 (4.1)                3.8 (3.7)
Wilson B factor (Å^2)             60.9                     24.6
R / Rfree (%)                     21.11 / 26.36            19.07 / 21.99
No. atoms
  Overall                         2760                     9579
  Protein                         2711                     7066
  DNA                             0                        1383
  Water                           49                       1130
  Other entities                  0                        0
Average B value (Å^2)
  Overall                         63.86                    33.26
  Protein                         63.89                    31.94
  DNA                             0.0                      33.98
  Water                           62.47                    40.58
  Other entities                  0.0                      0.0
R.m.s. deviations
  Bonds (Å)                       0.009                    0.008
  Angles (°)                      1.301                    1.184
Ramachandran plot statistics (%)
  Most favourable                 92.7                     93.5
  Additionally allowed            7.3                      6.5
  Generously allowed              0.0                      0.0
  Disallowed                      0.0                      0.0

The inventors solved the high-resolution (1.8 Å) crystal structure of dHax3-A in complex with double-stranded DNA (dsDNA). The structure clearly shows that dHax3 adopts a right-handed helical structure that encloses the dsDNA in the middle of the complex. The protein wraps around the outside of the DNA and inserts into its major groove (see Figure 1).
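Because Table 3 was recovered from a flattened layout, it is worth cross-checking a few of its entries against one another. The sketch below is only a consistency check on numbers transcribed from the table, not part of the original analysis. It recomputes the overall redundancy as measured/unique reflections and checks that the per-component atom counts add up to the overall count. The dictionary layout and function name are assumptions made for illustration.

```python
# Values transcribed from Table 3; keys are illustrative labels.
datasets = {
    "dHax3 (270-703)": {
        "measured": 84_417, "unique": 20_832, "redundancy": 4.1,
        "atoms": {"protein": 2711, "dna": 0, "water": 49, "other": 0},
        "overall_atoms": 2760,
    },
    "DNA-bound dHax3": {
        "measured": 391_380, "unique": 103_239, "redundancy": 3.8,
        "atoms": {"protein": 7066, "dna": 1383, "water": 1130, "other": 0},
        "overall_atoms": 9579,
    },
}


def is_consistent(entry):
    """Check redundancy (to one decimal place) and the atom-count total."""
    redundancy = round(entry["measured"] / entry["unique"], 1)
    total_atoms = sum(entry["atoms"].values())
    return (redundancy == entry["redundancy"]
            and total_atoms == entry["overall_atoms"])


for name, entry in datasets.items():
    print(name, "consistent:", is_consistent(entry))
```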
The structure shows that the amino acid at position 12 of each repeat (histidine or asparagine) does not contact the DNA directly. Instead, it forms a hydrogen bond with the main-chain oxygen atom of the amino acid at position 8 (alanine) of the same repeat, thereby stabilizing the loop that contains the RVD.
If the amino acid at position 13 of a repeat is serine or aspartate, it interacts directly with the DNA base through a hydrogen bond; if it is glycine, it makes van der Waals contacts with the methyl group of thymine (see Figure 2).

Example 3. Obtaining the crystal structure of the complex of dHax3-A with dHax3-5mC and the crystal structure of the complex of dHax3-NN-A with dHax3-CpG
As shown in Figure 3, thymine (T) and 5-methylcytosine (5mC) both carry a methyl group at position 5, and this methyl group is the only group recognized by NG. NG may therefore recognize 5mC. On this basis, the inventors designed the DNA sequence dHax3-5mC (Figure 4a):
5' TCCT5mCTA5mCCTC5mC 3' (SEQ ID NO: 6)
3' AGGA GAT GGAG G 5' (SEQ ID NO: 7)
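The "5mC" notation mixes a three-character token with single-letter bases, so the sequence cannot be processed one character at a time. The following minimal sketch splits SEQ ID NO: 6 into one token per nucleotide, counts the methylated positions, and checks that every position pairs with SEQ ID NO: 7 (5mC pairs with G, like C). The regular expression and helper names are illustrative assumptions.

```python
import re

# "5mC" must be tried before the single letters so it is kept as one token.
TOKEN = re.compile(r"5mC|[ACGT]")

# Base-pairing partners; 5-methylcytosine pairs with G like cytosine.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "5mC": "G"}


def tokenize(seq):
    """Split a sequence into one token per nucleotide, ignoring spaces."""
    return TOKEN.findall(seq.replace(" ", ""))


top = tokenize("TCCT5mCTA5mCCTC5mC")   # SEQ ID NO: 6, written 5'->3'
bottom = tokenize("AGGA GAT GGAG G")    # SEQ ID NO: 7, written 3'->5'

n_methylated = top.count("5mC")
pairs_ok = len(top) == len(bottom) and all(
    PARTNER[t] == b for t, b in zip(top, bottom)
)
print("positions:", len(top), "5mC count:", n_methylated, "paired:", pairs_ok)
```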
To study the ability of the dHax3-NN variant to recognize CpG islands, the inventors designed the DNA sequence dHax3-CpG:
5' TG TCCCTT(mC)G(mC)GTCTCT 3' (SEQ ID NO: 8)
3' AC AGGGAA GC GCAGAGA 5' (SEQ ID NO: 9)
Using the method described in Example 2, the inventors obtained and solved the crystal structures of the two complexes. The data collection and refinement statistics are shown in Table 4.

Table 4. Data collection and refinement statistics for the crystal structure of the complex of dHax3-A with dHax3-5mC and the crystal structure of the complex of dHax3-NN-A with dHax3-CpG

Data                              DNA (dHax3-5mC)-bound dHax3-A    DNA (dHax3-CpG)-bound dHax3-NN-A
Data collection
Space group                       P21                              P21
Cell dimensions
  a, b, c (Å)                     81.84, 87.63, 88.46              81.20, 87.11, 88.15
  α, β, γ (°)                     90, 102.85, 90                   90, 102.85, 90
Resolution (Å)                    40-1.85 (1.92-1.85)              40-1.95 (2.02-1.95)
Rmerge (%)                        6.4 (64.1)                       6.2 (55.7)
I/σI                              21.6 (2.8)                       22.3 (2.8)
Completeness (%)                  99.7 (100.0)                     99.7 (99.5)
Redundancy                        3.7 (3.7)                        3.7 (3.8)
Refinement
Resolution (Å)                    40-1.85                          40-1.95
No. reflections                   103,273                          87,970
Rwork / Rfree (%)                 19.79 / 22.97                    20.05 / 22.45
No. atoms
  Protein                         7121                             7123
  DNA/RNA                         1387                             1328
  Water                           832                              753
  Ion                             0                                2
B-factors
  Protein                         33.96                            39.53
  DNA/RNA                         34.55                            42.19
  Water                           39.94                            47.88
  Ion                             -                                59.56
R.m.s. deviations
  Bond lengths (Å)                0.007                            0.009
  Bond angles (°)                 1.158                            1.344
Ramachandran plot statistics (%)
  Most favoured                   93.7                             93.2
  Additional allowed              6.3                              6.6
  Generously allowed              0.0                              0.2
  Disallowed                      0.0                              0.0

The inventors solved the structure of the dHax3 protein in complex with DNA containing three 5mC bases, at a resolution of 1.85 Å. The high-resolution structure clearly reveals the molecular mechanism by which the dHax3 protein recognizes mC (Figure 4c).
Figure 8 is a schematic of the crystal structure of the DNA-binding domain of the dHax3-NN variant with DNA containing two methylated CpG islands; it confirms that dHax3-NN-A binds DNA containing two methylated CpG islands. In mammalian cells, DNA methylation occurs only on the C of CpG islands. The applicants solved the crystal structure of a TALE with a DNA sequence containing two CpG islands, further demonstrating that TALEs can specifically recognize methylated DNA. This is of great significance for extending the applications of TALEs.

Example 4. Gel retardation assay verifying the ability of dHax3 to bind double-stranded DNA containing 5-methylcytosine (5mC)
• EMSA (electrophoretic mobility shift assay, also known as the gel retardation assay)
The gel retardation assay is a specialized gel electrophoresis technique for studying DNA/RNA-protein interactions in vitro. Its principle is as follows: during gel electrophoresis, under the electric field, a small free nucleic acid fragment migrates toward the anode faster than the same fragment bound to a protein. A short nucleic acid fragment can therefore be labeled and mixed with protein, and the mixture subjected to gel electrophoresis. If the target DNA binds a specific protein, its migration is retarded, and autoradiography of the gel reveals the nucleic acid-binding protein. In addition, by quantifying the amounts of protein-bound and unbound DNA, the binding affinity of the protein for the nucleic acid can be fitted with reasonable accuracy.
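The text does not state which binding model was used for the fit. As a minimal sketch, the following assumes a simple one-site model with protein in large excess over labeled DNA, so that the fraction bound is [P] / (Kd + [P]). The band intensities are hypothetical values chosen for illustration, not data from this document, and the grid search stands in for whatever fitting routine is actually preferred.

```python
def fraction_bound(bound, free):
    """Fraction of labeled DNA in the shifted (protein-bound) band."""
    return bound / (bound + free)


def predicted(protein_nM, kd_nM):
    """One-site binding isotherm, assuming protein is in excess over DNA."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(protein_nM, theta, lo=0.1, hi=10_000.0, steps=2000):
    """Least-squares Kd estimate by grid search over log-spaced candidates."""
    best_kd, best_err = None, float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        err = sum((predicted(p, kd) - t) ** 2 for p, t in zip(protein_nM, theta))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd


# Hypothetical band intensities (arbitrary units) for five protein concentrations.
protein_nM = [10, 30, 100, 300, 1000]
bound = [9, 23, 50, 75, 91]
free = [91, 77, 50, 25, 9]

theta = [fraction_bound(b, f) for b, f in zip(bound, free)]
kd_estimate = fit_kd(protein_nM, theta)
print(f"Estimated Kd: {kd_estimate:.0f} nM")
```

A cooperative (Hill) model or a model that accounts for ligand depletion may be more appropriate when the DNA concentration is not negligible compared with Kd.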
• DNA/DNA oligo  • DNA/DNA oligo
The DNA/DNA oligo fragments used in the gel retardation assay are listed in Table 5 below:

Table 5. Sequences of the DNA/DNA oligo fragments used in the gel retardation assay

dHax3-box   5'-CCACATATGTCATACGTGTCCCTTTATCTCTCTCCAGCTCGAGGAATTC (SEQ ID NO: 10)
            5'-GAATTCCTCGAGCTGGAGAGAGATAAAGGGACACGTATGACATATGTGG (SEQ ID NO: 11)
dHax3-5mC   5'-CCACATATGTCATACGTGTCCCT(mC)TA(mC)CTC(mC)CTCCAGCTCGAGGAATTC (SEQ ID NO: 12)
            5'-GAATTCCTCGAGCTGGAGGGAGGTAGAGGGACACGTATGACATATGTGG (SEQ ID NO: 13)
5C-5mC      5'-CCACATATGTCATACGTGT(mC)(mC)(mC)TTTAT(mC)T(mC)TCTCCAGCTCGAGGAATTC (SEQ ID NO: 14)
            ...TATGTGG (SEQ ID NO: 15)
5C-5G       5'-CCACATATGTCATACGTGTGGGTTTATGTGTCTCCAGCTCGAGGAATTC (SEQ ID NO: 16)
            5'-GAATTCCTCGAGCTG...TATGTGG (SEQ ID NO: 17)
5C-5T       5'-CCACATATGTCATACGTGTTTTTTTATTTTTCTCCAGCTCGAGGAATTC (SEQ ID NO: 18)
            5'-...TATGTGG (SEQ ID NO: 19)
5C-5A       5'-CCACATATGTCATACGTGTAAATTTATATATCTCCAGCTCGAGGAATTC (SEQ ID NO: 20)
            5'-...TATGTGG (SEQ ID NO: 21)
6T-6mC      5'-CCACATATGTCATACGTGTCCC(mC)(mC)(mC)A(mC)C(mC)C(mC)CTCCAGCTCGAGGAATTC (SEQ ID NO: 22)
            5'-GAATTCCTCGAGCTGGAGGGGGGTGGGGGGACACGTATGACATATGTGG (SEQ ID NO: 23)
6T-6C       5'-CCACATATGTCATACGTGTCCCCCCACCCCCCTCCAGCTCGAGGAATTC (SEQ ID NO: 24)
            5'-GAATTCCTCGAGCTGGAGGGGGGTGGGGGGACACGTATGACATATGTGG (SEQ ID NO: 25)

(mC) denotes methylated cytosine.
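The probe names in Table 5 (for example "5C-5T") appear to describe which bases of the dHax3 box were substituted. The sketch below tests that reading: it compares three of the unmodified variant probes with the dHax3-box probe position by position and reports the substitutions. Only top strands that survive intact in the table are used; the function name and 0-based indexing are illustrative choices.

```python
# Top-strand probe sequences transcribed from Table 5.
PROBES = {
    "dHax3-box": "CCACATATGTCATACGTGTCCCTTTATCTCTCTCCAGCTCGAGGAATTC",  # SEQ ID NO: 10
    "5C-5G": "CCACATATGTCATACGTGTGGGTTTATGTGTCTCCAGCTCGAGGAATTC",      # SEQ ID NO: 16
    "5C-5T": "CCACATATGTCATACGTGTTTTTTTATTTTTCTCCAGCTCGAGGAATTC",      # SEQ ID NO: 18
    "5C-5A": "CCACATATGTCATACGTGTAAATTTATATATCTCCAGCTCGAGGAATTC",      # SEQ ID NO: 20
}


def substitutions(reference, variant):
    """Return (index, ref_base, var_base) for each mismatch (0-based index)."""
    if len(reference) != len(variant):
        raise ValueError("probes must have the same length")
    return [
        (i, r, v)
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    ]


reference = PROBES["dHax3-box"]
for name in ("5C-5G", "5C-5T", "5C-5A"):
    changes = substitutions(reference, PROBES[name])
    print(name, [f"{r}{i + 1}{v}" for i, r, v in changes])
```

For each of these three probes the comparison finds five cytosines replaced by the named base, all inside the recognition sequence, while the flanking sequence is unchanged.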
The recognition sequences are highlighted.
• DNA/RNA end labeling
DNA to be phosphorylated          1-20 pmol (5' ends)
Reaction buffer A (10X)           2 µl
[γ-32P]-ATP (3,000 Ci/mmol)       20 pmol
Nuclease-free deionized water     to 19 µl
T4 polynucleotide kinase (10 U/µl) 1 µl

After the reaction was set up according to the table above, it was mixed gently and incubated at 37 °C for 30 min. Excess [γ-32P]-ATP was removed with a prepacked G25 desalting column, an excess of the unlabeled complementary strand was added, and the strands were annealed to give double-stranded DNA or a DNA-RNA hybrid duplex.
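The quantities in the labeling reaction can be checked with a little arithmetic. The sketch below assumes that water brings the mixture to 19 µl before 1 µl of kinase is added, giving a 20 µl reaction; under that reading the 10X buffer ends up at 1X. It also converts 20 pmol of ATP at 3,000 Ci/mmol into total radioactivity. The variable names are illustrative.

```python
# Total radioactivity of the labeled ATP added to the reaction.
specific_activity_ci_per_mmol = 3000.0
atp_pmol = 20.0
atp_mmol = atp_pmol * 1e-9                      # 1 pmol = 1e-9 mmol
activity_uci = specific_activity_ci_per_mmol * atp_mmol * 1e6  # Ci -> µCi

# Final buffer strength, assuming 19 µl before kinase plus 1 µl of kinase.
final_volume_ul = 19.0 + 1.0
buffer_final_x = 10.0 * 2.0 / final_volume_ul

# Kinase units in the reaction (1 µl at 10 U/µl).
kinase_units = 1.0 * 10.0

print(f"ATP activity: {activity_uci:.0f} µCi")
print(f"Buffer: {buffer_final_x:.1f}X in {final_volume_ul:.0f} µl")
print(f"T4 polynucleotide kinase: {kinase_units:.0f} U")
```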
• DNA/RNA-protein interaction system
Full-length protein (various concentrations)   5 µl
DNA/RNA                                        2 µl
5X buffer                                      2 µl
ddH2O                                          1 µl

The components were combined in the proportions above, mixed, and incubated at 4 °C for 20 min;
the samples were run on a 6% native gel;
after electrophoresis, the gel was dried completely on a gel dryer and exposed to a phosphor screen overnight;
the image data were read with a Typhoon 9400 variable scanner.
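Because the protein is varied while the volumes stay fixed, each stock concentration maps to a final concentration by the same dilution factor. The sketch below works this out for the 10 µl binding reaction listed above. The protein stock concentrations are hypothetical values for illustration only; the source does not state them.

```python
# Volumes (µl) of the binding reaction as listed above.
volumes_ul = {"protein": 5.0, "DNA/RNA": 2.0, "5X buffer": 2.0, "ddH2O": 1.0}
total_ul = sum(volumes_ul.values())


def final_concentration(stock, volume_ul):
    """Concentration after dilution into the full reaction volume."""
    return stock * volume_ul / total_ul


# The 5X buffer should end up at 1X in the assembled reaction.
buffer_final_x = final_concentration(5.0, volumes_ul["5X buffer"])

# Hypothetical protein stocks (nM); not taken from the source.
protein_stocks_nM = [20.0, 200.0, 2000.0]
protein_final_nM = [
    final_concentration(stock, volumes_ul["protein"]) for stock in protein_stocks_nM
]

print(f"Total volume: {total_ul:.0f} µl, buffer: {buffer_final_x:.1f}X")
print("Final protein concentrations (nM):", protein_final_nM)
```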
The interaction of the dHax3 protein with DNA containing 5-methylcytosine (5mC) was examined by EMSA. Binding was not noticeably weakened (see Figure 4b). Figure 7 shows that one RVD of dHax3, NG, cannot bind unmethylated cytosine, whereas another RVD of dHax3, HD, recognizes cytosine (C) specifically, and methylation of the cytosine interferes with its recognition by HD.
Although the invention has been described in detail herein with reference to exemplary embodiments, it should be understood that the invention is not limited to those embodiments. Persons of ordinary skill in the art who have access to the teachings herein will recognize other variations, modifications and embodiments within the scope of the invention. Accordingly, the invention should be construed broadly, consistent with the claims that follow.

Claims

1. A method for detecting cytosine methylation in DNA, comprising using a TALE protein and derivative proteins thereof to specifically recognize 5-methylcytosine in the DNA.
2. The method of claim 1, wherein recombinant proteins of two different TALE proteins and derivative proteins thereof are used to specifically recognize cytosine and 5-methylcytosine, respectively, in a target sequence.
3. The method of claim 1 or 2, wherein the method is used to detect methylation of CpG islands.
4. Use of a TALE protein and derivative proteins thereof for specifically recognizing 5-methylcytosine in DNA.
5. Use of a TALE protein and derivative proteins thereof in the preparation of a reagent for specifically recognizing 5-methylcytosine in DNA.
6. Use of a TALE protein and derivative proteins thereof in the preparation of a medicament for diagnosing or treating cancer, wherein the diagnosis or treatment is carried out by specifically recognizing 5-methylcytosine in DNA.
7. A method of diagnosing or treating cancer, carried out by specific recognition of 5-methylcytosine in DNA by a TALE protein and derivative proteins thereof.
8. A TALE protein and derivative proteins thereof, for use in specifically recognizing 5-methylcytosine in DNA.
9. A TALE protein and derivative proteins thereof, for use in diagnosing or treating cancer, wherein the diagnosis or treatment is carried out by specifically recognizing 5-methylcytosine in DNA.
PCT/CN2012/001718 2012-01-04 2012-12-21 Method for specifically recognizing dna containing 5-methylated cytosine WO2013102290A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201280060513.0A CN103987860B (en) 2012-01-04 2012-12-21 Method for specifically recognizing DNA containing 5-methylated cytosine

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210021039.2 2012-01-04
CN201210021039 2012-01-04

Publications (1)

Publication Number Publication Date
WO2013102290A1 true WO2013102290A1 (en) 2013-07-11

Family

ID=48744961

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/001718 WO2013102290A1 (en) 2012-01-04 2012-12-21 Method for specifically recognizing dna containing 5-methylated cytosine

Country Status (2)

Country Link
CN (1) CN103987860B (en)
WO (1) WO2013102290A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104498594A (en) * 2014-12-04 2015-04-08 李云英 TALEs double-recognition detection method and application thereof
WO2015052335A1 (en) * 2013-10-11 2015-04-16 Cellectis Methods and kits for detecting nucleic acid sequences of interest using dna-binding protein domain
WO2019024081A1 (en) * 2017-08-04 2019-02-07 北京大学 Tale rvd specifically recognizing dna base modified by methylation and application thereof
CN109384833A (en) * 2017-08-04 2019-02-26 北京大学 The TALE RVD of specific recognition methylation modifying DNA base and its application
US11624077B2 (en) 2017-08-08 2023-04-11 Peking University Gene knockout method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105154558B (en) * 2015-09-22 2018-10-09 武汉大学 A method of methylated cytosine in detection DNA

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110265198A1 (en) * 2010-04-26 2011-10-27 Sangamo Biosciences, Inc. Genome editing of a Rosa locus using nucleases
WO2011146121A1 (en) * 2010-05-17 2011-11-24 Sangamo Biosciences, Inc. Novel dna-binding proteins and uses thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONG DENG ET AL.: "Recognition of methylated DNA by TAL effectors", CELL RESEARCH, vol. 22, 4 September 2012 (2012-09-04), pages 1502 - 1504, XP055084118, DOI: doi:10.1038/cr.2012.127 *
MAGDY M.MAHFOUZ ET AL.: "De novo-engineered transcription activator-like effector (TALE) hybrid nuclease with novel DNA binding specificity creates double-strand breaks", PNAS, vol. 108, no. 6, 8 February 2011 (2011-02-08), pages 2623 - 2628, XP055007615 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015052335A1 (en) * 2013-10-11 2015-04-16 Cellectis Methods and kits for detecting nucleic acid sequences of interest using dna-binding protein domain
CN104498594A (en) * 2014-12-04 2015-04-08 李云英 TALEs double-recognition detection method and application thereof
WO2019024081A1 (en) * 2017-08-04 2019-02-07 北京大学 Tale rvd specifically recognizing dna base modified by methylation and application thereof
CN109384833A (en) * 2017-08-04 2019-02-26 北京大学 The TALE RVD of specific recognition methylation modifying DNA base and its application
CN111278848A (en) * 2017-08-04 2020-06-12 北京大学 TALE RVD for specifically recognizing methylated modified DNA base and application thereof
US11897920B2 (en) 2017-08-04 2024-02-13 Peking University Tale RVD specifically recognizing DNA base modified by methylation and application thereof
US11624077B2 (en) 2017-08-08 2023-04-11 Peking University Gene knockout method

Also Published As

Publication number Publication date
CN103987860B (en) 2017-04-12
CN103987860A (en) 2014-08-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12864154

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12864154

Country of ref document: EP

Kind code of ref document: A1