CN104694490A - TET2 protein and 5-mC DNA three-dimensional structure and application thereof

Info

Publication number: CN104694490A
Application number: CN201310647635.6A
Authority: CN
Inventors: 徐彦辉; 胡璐璐; 李泽; 程净东; 巩微; 饶钦辉; 刘梦杰; 朱佳玉; 王平
Original assignee: Individual
Current assignee: Individual
Priority date: 2013-12-04
Filing date: 2013-12-04
Publication date: 2015-06-10

Abstract

The invention relates to TET2 gene modification, protein expression and purification, the crystal structure of the TET2-5-mC DNA complex, and applications thereof. It provides a polypeptide whose amino acid sequence is given by the formula X-L-Y, in which X represents a fragment of SEQ ID NO: 2 containing at least amino acid residues 1129-1448 of SEQ ID NO: 2; Y represents a fragment of SEQ ID NO: 2 containing at least amino acid residues 1844-1925 of SEQ ID NO: 2; and L represents a linker sequence. The polypeptide is a truncated fragment of the TET2 protein. It can be expressed in and purified from Escherichia coli in large quantities, has DNA hydroxymethylase activity, and is stable. The invention also covers use of the polypeptide for detecting whether 5-hmC is present in a DNA sequence, for catalyzing the oxidation of 5-mC in a DNA sequence, and for studying the mechanism of interaction between TET proteins and DNA.

Description

Three-dimensional structure of TET2 protein and 5-mC DNA and application thereof

Technical field

The application belongs to the field of structural biology and relates to TET2 gene modification, protein expression and purification, the crystal structure of the TET2-5-mC DNA complex, and applications thereof.

Background art

Malignant tumors and diseases of the blood system have long been among the greatest threats to human life and health, and mortality among newly diagnosed cancer cases is rising by about 1% per year. According to the International Agency for Research on Cancer of the World Health Organization (WHO), by 2030 the number of patients diagnosed with cancer worldwide will be more than double that at the beginning of the 21st century; that is, by 2030 about 27 million people worldwide will be diagnosed with cancer and about 17 million will die of it. At present, the WHO and the health authorities of national governments treat conquering cancer as a top priority.

With growing interest in epigenetics in recent years, its significance in genomic imprinting, regulation of gene expression, maintenance and differentiation of embryonic stem cells, embryonic development, adult hematopoiesis, and the onset of malignant hematologic diseases and tumors has become increasingly recognized. Epigenetics is the study of heritable mechanisms that alter gene expression or cell phenotype while the DNA base sequence remains unchanged; these mainly include DNA (cytosine, C) methylation, histone modification, and chromatin remodeling.

In higher eukaryotic cells, DNA methylation occurs mainly on cytosine (C), forming 5-methylcytosine (5-mC). Conversion of 5-mC back to C is DNA demethylation. Methylation and demethylation of cellular DNA are in dynamic equilibrium, which regulates gene expression and cell function. Both processes are strictly regulated, and this regulation is closely tied to individual growth and development and to physiological activity; disease is often accompanied by disordered DNA methylation caused by abnormalities in the methylation regulatory mechanism [OKANO M, BELL D W, HABER D A, et al. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development [J]. Cell, 1999, 99 (3): 247-257]. DNA methylation is catalyzed mainly by DNA methyltransferases (DNMT). DNA demethylation includes genome-wide demethylation and demethylation at specific gene loci; its mechanism is relatively complex and falls into two classes, passive and active. Passive demethylation occurs during DNA replication: because DNMT activity is inhibited, the methylation level on newly synthesized DNA strands is diluted as the DNA replicates. Active demethylation is independent of DNA replication: enzymes catalyze the stepwise conversion of 5-mC to C. In active demethylation, on the one hand, the methyl group of 5-mC can be removed directly by enzyme catalysis [WU S C, ZHANG Y. Active DNA demethylation: many roads lead to Rome [J]. Nature Reviews Molecular Cell Biology, 2010, 11 (9): 607-620]; on the other hand, DNA damage repair mechanisms such as base or nucleotide excision repair convert 5-mC, or chemically modified 5-mC, to C [WU S C, ZHANG Y. Active DNA demethylation: many roads lead to Rome [J]. Nature Reviews Molecular Cell Biology, 2010, 11 (9): 607-620; SANCAR A, LINDSEY B L A, UNSAL K K, et al. Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints [J]. Annu Rev Biochem, 2004, 73: 39-85].

Numerous studies have shown that the TET protein family catalyzes the oxidation of 5-mC to 5-hydroxymethylcytosine (5-hmC). 5-hmC is also called the sixth base, but its distribution and importance have long been poorly understood. 5-hmC was first found in the DNA of a variety of animals and is abundant in early embryos and the nervous system. The TET protein family has the enzymatic activity to hydroxylate 5-mC, and the distribution of 5-hmC on chromosomes is clearly related to gene expression [Loenarz C, Schofield CJ. Oxygenase catalyzed 5-methylcytosine hydroxylation [J]. Chem Biol, 2009, 16 (6): 580-583]. Later studies found that, in the genomic DNA of many cell types, 5-hmC is further oxidized by TET proteins to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) [S. Ito, L. Shen, et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine [J]. Science, 2011, 333: 1300-1303]. All of these are considered intermediates of the demethylation process and are important DNA epigenetic marks in mammals.

Studies have shown that epigenetic modifications play an important role in the onset and progression of tumors, and several experiments have found that mutations or structural abnormalities of TET protein genes are closely associated with hematopoietic malignancies [Langemeijer SM, Aslanyan MG, Jansen JH. TET proteins in malignant hematopoiesis [J]. Cell Cycle, 2009, 8 (24): 4044-4048]. In March 2009, Tefferi A et al. first found that about 14% of JAK2 V617F-positive patients with myeloproliferative neoplasms (MPN) carry TET2 mutations and that these mutations are present in early hematopoietic stem cells (CD34+CD38-); this discovery made TET2 a research focus in hematologic oncology. Many subsequent studies found TET protein mutations in patients with other myeloid malignancies, such as chronic myelomonocytic leukemia (CMML), myelodysplastic syndrome (MDS), acute myelocytic leukemia (AML), and M7 AML. Fusion of TET1 with the MLL gene has been identified in several AML patients [Lorsbach RB, Moore J, Mathew S, et al. TET1, a member of a novel protein family, is fused to MLL in acute myeloid leukemia containing the t(10;11)(q22;q23) [J]. Leukemia, 2003, 17 (3): 637-641]. The TET2 mutation rate is 7.6% in MPN, 42% in CMML, and 12% in AML [Abdel-Wahab O, Mullally A, Hedvat C, et al. Genetic characterization of TET1, TET2, and TET3 alterations in myeloid malignancies [J]. Blood, 2009, 114 (1): 144-147]. TET2 is the most frequently mutated MDS-related gene identified to date [Langemeijer SM, Kuiper RP, Berends M, et al. Acquired mutations in TET2 are common in myelodysplastic syndromes [J]. Nat Genet, 2009, 41 (7): 838-842]; TET2 mutations are found in about 20% of MDS patients and are closely related to clinical symptoms, so they can serve as a good diagnostic molecular marker [Kosmider O, Gelsi-Boyer V, Cheok M, et al. TET2 mutation is an independent favorable prognostic factor in myelodysplastic syndromes (MDSs) [J]. Blood, 2009, 114 (15): 3285-3291].

The full name of TET2 is TET oncogene family member 2. Its gene is located on chromosome 4q24 and contains 11 exons; its mRNA has three alternative splice variants and is widely expressed in vivo. The TET2 protein has three isoforms (isoforms 1, 2, and 3), of which isoform 1, 2002 amino acids in full length, contains two evolutionarily highly conserved regions. Vertebrates have three conserved TET proteins, TET1, TET2, and TET3: TET1 is highly expressed in embryonic stem cells (ES cells), TET2 in hematopoietic cells, and TET3 in oocytes and zygotes. TET proteins are involved in many biological processes, such as early embryogenesis, stem cell differentiation, and hematopoiesis [M. R. Branco, G. Ficz, W. Reik. Uncovering the role of 5-hydroxymethylcytosine in the epigenome [J]. Nat. Rev. Genet., 2011, 13: 7-13; H. Wu, Y. Zhang. Mechanisms and functions of Tet protein-mediated 5-methylcytosine oxidation [J]. Genes & Dev., 2011, 25: 2436-2452]. TET proteins are α-ketoglutarate (α-KG)- and Fe²⁺-dependent dioxygenases. They have one catalytic domain (catalytic/dioxygenase domain) near the C-terminus; this domain has binding sites for 3 metal ions (Fe²⁺) and 1 α-KG, and a cysteine-rich region (Cys-rich domain) precedes it. The catalytic mechanism of TET proteins is as follows: oxygen and α-KG provide the hydroxyl group for 5-mC while α-KG is oxidatively decarboxylated to succinate; the process also requires Fe²⁺ and ascorbic acid [Xu W, Yang H, Liu Y, et al. Oncometabolite 2-hydroxyglutarate is a competitive inhibitor of α-ketoglutarate-dependent dioxygenases [J]. Cancer Cell, 2011, 19 (1): 17-30]. On the one hand, 5-hmC can be converted to 5-hydroxymethyluracil (5hmU) by the deaminase AID (activation-induced deaminase); 5hmU can be recognized and excised by thymine-DNA glycosylase (TDG), and the base excision repair (BER) pathway then converts the site to C, achieving DNA demethylation [Guo JU, Su Y, Zhong C, et al. Hydroxylation of 5-methylcytosine by TET1 promotes active DNA demethylation in the adult brain [J]. Cell, 2011, 145 (3): 423-434]. On the other hand, 5-hmC also brings about demethylation of 5-mC through other mechanisms. Studies have found that TET proteins themselves can further convert 5-hmC to 5fC and 5caC; the resulting 5caC can be recognized and excised by TDG [He YF, Li BZ, Li Z, et al. Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA [J]. Science, 2011, 333 (6047): 1303-1307], and BER can then restore C at the site, completing demethylation [Nabel CS, Kohli RM. Molecular biology. Demystifying DNA demethylation [J]. Science, 2011, 333 (6047): 1229-1230]. Together with the DNMT family, this allows the three base types C, 5-mC, and 5-hmC at the same site on DNA to reach a dynamic equilibrium, which can also be regarded as a mechanism of reversible DNA methylation.
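The conversions named in the preceding paragraph can be laid out as a small directed graph. The following Python sketch is only an illustration: the dictionary `CONVERSIONS` and the helper `routes` are hypothetical names introduced here, and the edges are limited to the steps the paragraph itself mentions (it does not attempt to model kinetics or any pathway not described above).

```python
# Substrate -> {product: enzyme or pathway}, as described in the paragraph above.
CONVERSIONS = {
    "C": {"5mC": "DNMT"},
    "5mC": {"5hmC": "TET"},
    "5hmC": {"5hmU": "AID", "5fC": "TET"},
    "5fC": {"5caC": "TET"},
    "5hmU": {"C": "TDG + BER"},
    "5caC": {"C": "TDG + BER"},
}


def routes(start, goal, path=None):
    """Enumerate every simple path from start to goal through CONVERSIONS."""
    path = (path or []) + [start]
    found = []
    for nxt in CONVERSIONS.get(start, {}):
        if nxt == goal:
            found.append(path + [nxt])
        elif nxt not in path:  # avoid revisiting a state
            found.extend(routes(nxt, goal, path))
    return found


# Active demethylation routes that lead from 5-mC back to unmodified C.
demethylation_routes = routes("5mC", "C")
for route in demethylation_routes:
    print(" -> ".join(route))
```

Under these assumptions the sketch lists two routes, one through AID-mediated deamination (5hmU) and one through iterative TET oxidation (5fC, 5caC), both closed by TDG excision and BER.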

The discovery of TET-catalyzed hydroxylation of 5-mC and of its physiological functions has greatly expanded our understanding of the complexity of DNA base modification and has deepened our understanding of the mechanism and biological function of reversible DNA methylation. It offers new explanations for some important biological questions and disease mechanisms, gives new insight into tumors and blood diseases caused by TET abnormalities, and thereby provides new targets for the diagnosis, prevention, and treatment of such diseases.

In addition, in June 2012 Cell reported the work of Chuan He's research group at the University of Chicago, which described a new-generation method that uses TET protein to locate all 5-hmC in a mammalian genome at single-base resolution: TET-assisted bisulfite sequencing (TAB-Seq) [Yu M, et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell. 2012; 149: 1368-1380]. In bisulfite sequencing, bisulfite converts unmethylated cytosine in DNA to uracil (U) by deamination, while methylated or hydroxymethylated cytosine remains unchanged. The DNA fragment of interest is amplified by PCR (polymerase chain reaction), during which uracil is replaced by thymine (T), and comparing sequencing results before and after bisulfite treatment reveals which sites are methylated. However, because both methylated cytosine (5-mC) and hydroxymethylated cytosine (5-hmC) resist this deamination, conventional bisulfite sequencing cannot distinguish 5-mC from 5-hmC and so cannot resolve them at single-base resolution; this has become a bottleneck for research such as the discovery of hydroxymethylation sites and the diagnosis of related diseases. Chuan He's group used β-glucosyltransferase (βGT) to transfer a glucose moiety to 5-hmC, forming β-glucosyl-5-hydroxymethylcytosine (5-gmC), which prevents 5-hmC from being further oxidized by TET; the 5-mC on the DNA fragment is then oxidized to 5-caC by mouse Tet1 protein. After bisulfite treatment, all C and 5-caC become U or 5-carboxyuracil (5-caU) and are read as T after amplification, whereas 5-hmC protected by the glucosyl group remains in the sequence as 5-gmC and is read as C after amplification. Comparing the sequencing results gives the exact position of 5-hmC in the DNA fragment and distinguishes 5-mC from 5-hmC on the DNA. The group used the Flag-tagged catalytic domain of mouse Tet1 (amino acids 1367-2039), extracted from transfected Sf9 cells and purified on anti-FLAG M2 antibody agarose affinity gel.
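The read-out logic described above can be summarized in a few lines of Python. This is a minimal sketch under idealized assumptions (complete conversion at every step, no sequencing error); the names `BISULFITE_READ`, `tab_seq_read`, and `call_base` are hypothetical and are not taken from the cited work.

```python
# Base read after conventional bisulfite treatment and PCR:
# unmodified C is deaminated and read as T; 5mC and 5hmC resist and read as C.
BISULFITE_READ = {"C": "T", "5mC": "C", "5hmC": "C"}


def tab_seq_read(state):
    """Follow one cytosine through TAB-Seq and return the base read after PCR."""
    if state == "5hmC":
        state = "5gmC"  # glucosylation by betaGT protects 5hmC from TET
    elif state == "5mC":
        state = "5caC"  # TET oxidizes 5mC to 5caC
    # Bisulfite deaminates C and 5caC (read as T); 5gmC is untouched (read as C).
    return "C" if state == "5gmC" else "T"


def call_base(state):
    """Combine both read-outs to call the original cytosine state."""
    if tab_seq_read(state) == "C":
        return "5hmC"
    if BISULFITE_READ[state] == "C":
        return "5mC"
    return "C"


calls = {
    s: (BISULFITE_READ[s], tab_seq_read(s), call_base(s))
    for s in ("C", "5mC", "5hmC")
}
for state, (bs, tab, call) in calls.items():
    print(f"{state}: bisulfite={bs}, TAB-Seq={tab}, call={call}")
```

The point of the sketch is that only 5-hmC reads as C in TAB-Seq, so a site that reads as C in conventional bisulfite sequencing but as T in TAB-Seq is assigned to 5-mC.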

Since TET proteins were discovered, their three-dimensional structure has remained unknown, partly because protein of sufficient purity could not be obtained in large enough amounts or because the protein was not stable enough. Because TET2 is widely present in mammalian cells and has very important physiological significance, we focused on obtaining active, well-behaved, crystallizable TET protein. A well-behaved, enzymatically active truncated TET protein can be used to detect whether 5-hmC is present in a DNA sequence, to catalyze the oxidation of 5-mC in a DNA sequence, and for similar purposes. It also allows the detailed mechanism of the interaction between TET proteins and DNA to be explained from the structure, which in turn makes it possible to regulate TET protein activity, to prepare antibodies that detect the presence of TET protein mutations, and to design and screen small-molecule drugs, providing a solid theoretical basis for predicting, preventing, and treating bone marrow hematopoietic diseases. In addition, expression and purification of the mouse Tet1 protein used by Chuan He's group in their sequencing technique is complicated and costly; TET protein that is expressed and prepared in large quantities in Escherichia coli, with high activity and stable, homogeneous properties, therefore has a considerable advantage.

Summary of the invention

The inventors expressed several truncated proteins in human HEK293T cells and found by dot-blot enzyme activity assay (see Fig. 1) that truncating TET2 before amino acid 1129 at the N-terminus or after amino acid 1936 at the C-terminus has minimal effect on enzyme activity, whereas truncating before amino acid 1156 at the N-terminus or after amino acid 1913 at the C-terminus markedly reduces enzyme activity. Of all the constructs, TET2 (1129-1936) is the shortest protein with enzyme activity. The C-terminal catalytic domain of TET2 consists of a cysteine-rich domain, a double-stranded β-helix (DSBH) domain, and a low-complexity insert. Earlier reports showed that this insert is poorly conserved and has no fixed three-dimensional structure; its flexibility may disrupt orderly crystal packing. The inventors therefore cloned the human TET1, TET2, and TET3 genes and the mouse Tet1 gene and, by genetic engineering, built catalytic domain genes with different truncations and different insert deletions into the prokaryotic expression vector pPEI for protein expression and purification. The construct finally selected was the TET2 catalytic domain (1129-1936) with residues 1481-1843 deleted and replaced by a 15-residue GS linker sequence (GGGGSGGGGSGGGGS). Using a GST tag, sufficient soluble protein can be purified from Escherichia coli, and a three-step purification by affinity chromatography, ion-exchange chromatography, and size-exclusion chromatography yields protein that is at least 95% pure, stable, homogeneous, and has hydroxymethylase activity.
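The arithmetic of the selected construct can be checked with a short Python sketch. The helper `splice_with_linker` is a hypothetical name introduced here, and the 2002-residue string of `X` characters is only a placeholder for SEQ ID NO:2, which is not reproduced in this section; coordinates are 1-based and inclusive, as in the text.

```python
GS_LINKER = "GGGGSGGGGSGGGGS"  # 15-residue GS linker quoted in the text


def splice_with_linker(full_seq, start, del_start, del_end, end, linker=GS_LINKER):
    """Keep residues start..end (1-based, inclusive) of full_seq and replace
    residues del_start..del_end with the linker."""
    if not (1 <= start < del_start <= del_end < end <= len(full_seq)):
        raise ValueError("coordinates out of order or outside the sequence")
    n_part = full_seq[start - 1 : del_start - 1]  # residues start..del_start-1
    c_part = full_seq[del_end:end]                # residues del_end+1..end
    return n_part + linker + c_part


# Placeholder only: 'X' (unknown residue) repeated to the stated length of 2002.
placeholder_tet2 = "X" * 2002
construct = splice_with_linker(placeholder_tet2, 1129, 1481, 1843, 1936)
print(len(construct))
```

With these coordinates the N-terminal part spans residues 1129-1480 (352 residues) and the C-terminal part spans residues 1844-1936 (93 residues), so the construct is 352 + 15 + 93 = 460 residues long before any affinity tag is added.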

Phases were determined from collected Zn-SAD data, and the structure of the whole protein-DNA complex was then solved. In the presence of NOG (N-oxalylglycine, an analogue of α-ketoglutarate), different TET2 proteins were mixed with DNA to obtain complexes. The complex of human TET2 (1129-1936), in which amino acids 1481 to 1843 are replaced by the 15-residue GS linker, with a 12-bp DNA gave the best crystals, and the structural model was finally refined to a resolution of [value missing] (see Fig. 2). The TET2 protein adopts a protein fold that has not been reported before: its N-terminal cysteine-rich domain (Cys-rich domain) wraps around the carboxyl-terminal catalytic domain (DSBH domain), and the catalytic domain is divided into two parts. The double-stranded DNA sits above the wrapped catalytic domain; a coil structure extending from the cysteine-rich domain pries open the DNA double helix through hydrophobic interactions, and 5-mC flips out and inserts into the catalytic pocket, where hydroxylation is completed. Further analysis showed that residues 1129-1131, 1136-1143, 1925-1936, and 1464-1481 of TET2, the GS linker, and cytosine 11, thymine 12, and adenine 12 of the DNA duplex lack electron density and are therefore not shown in the model, possibly because of flexibility in the crystal structure. Because amino acids 1481-1843 are spatially far from the core structure, this insert is inferred to lie on the outer surface of the TET2 catalytic domain, and deleting it does not greatly affect the enzyme activity of the protein. The structure of the complex of truncated TET2 and DNA shows that two loop regions of the carboxyl-terminal Cys subdomain (named L1 and L2; see Fig. 3) lie on either side of the DSBH core and form a shallow groove that interacts with the DNA: L2 (amino acids 1288-1312) inserts into the major groove of the DNA, and L1 (amino acids 1256-1273) supports the DNA from the other side. Three zinc fingers are important for stabilizing the catalytic action of the protein and its binding to DNA. For these reasons, the TET2 catalytic domain protein (1129-1936) with residues 1481-1843 deleted both has strong enzyme activity and can be obtained in large quantities as a stable, homogeneous protein from an Escherichia coli expression system.

Based on the above findings, the invention provides a polypeptide selected from:

(1) a truncated fragment of SEQ ID NO:2 from which at least amino acid residues 1490-1830 have been removed, wherein the amino acid sequence of the truncated fragment is given by the following formula:

X-L-Y (formula I)

In the formula,

X represents a fragment of SEQ ID NO:2 containing at least amino acid residues 1129-1448 of SEQ ID NO:2;

Y represents a fragment of SEQ ID NO:2 containing at least amino acid residues 1844-1925 of SEQ ID NO:2;

L represents a linker sequence; and

(2) a polypeptide having one or several amino acid insertions, substitutions, or deletions in the polypeptide sequence of (1) while still retaining hydroxymethylase activity.

In one embodiment, X is a-b-c, wherein a is amino acid residues n to 1128 of SEQ ID NO:2, n is an integer from 1 to 1128, b is amino acid residues 1129-1448 of SEQ ID NO:2, and c is amino acid residues 1449 to m of SEQ ID NO:2, where m is an integer between 1450 and 1489; and Y is d-e-f, wherein d, which is optionally present, is amino acid residues g to 1843 of SEQ ID NO:2, e is amino acid residues 1844-1925 of SEQ ID NO:2, and f is amino acid residues 1926 to h of SEQ ID NO:2, where g is an integer between 1831 and 1842 and h is an integer between 1927 and 2002.
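The index ranges in this embodiment can be cross-checked against the deletion stated for formula (I). The Python sketch below is illustrative only: the function `formula_i_segments` is a hypothetical helper, and treating each "between" range as inclusive of both endpoints is an assumption made here, not something the text states.

```python
def formula_i_segments(n, m, h, g=None):
    """Return the residue ranges (1-based, inclusive) kept in X and Y and the
    range removed between them, for the a-b-c / d-e-f embodiment above.
    Endpoints of every allowed range are assumed to be inclusive."""
    allowed = {"n": (1, 1128), "m": (1450, 1489), "h": (1927, 2002)}
    values = {"n": n, "m": m, "h": h}
    if g is not None:  # d is optional; g is only defined when d is present
        allowed["g"] = (1831, 1842)
        values["g"] = g
    for name, (low, high) in allowed.items():
        if not low <= values[name] <= high:
            raise ValueError(f"{name}={values[name]} is outside {low}-{high}")
    y_start = g if g is not None else 1844
    return {"X": (n, m), "deleted": (m + 1, y_start - 1), "Y": (y_start, h)}


# Largest X and Y the embodiment allows: the smallest possible deletion.
widest = formula_i_segments(n=1, m=1489, h=2002, g=1831)
print(widest["deleted"])
```

Under the inclusive-endpoint assumption, the smallest deletion this embodiment permits is residues 1490-1830, which agrees with the minimum deletion stated in item (1) of formula (I).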

In one embodiment, the polypeptide of formula (I) is selected from:

(1) a truncated fragment of SEQ ID NO:2 from which at least amino acid residues 1481-1843 have been removed, wherein X represents a fragment of SEQ ID NO:2 containing at least amino acid residues 1129-1448 of SEQ ID NO:2; Y represents a fragment of SEQ ID NO:2 containing at least amino acid residues 1844-1925 of SEQ ID NO:2; and L represents a linker sequence; and

In one embodiment, X can be expressed as a-b-c, wherein a is amino acid residues n to 1128 of SEQ ID NO:2, n is an integer from 1 to 1128, b is amino acid residues 1129-1448 of SEQ ID NO:2, and c is amino acid residues 1449 to m of SEQ ID NO:2, where m is an integer between 1450 and 1480; for example, m is 1461.

In one embodiment, Y can be expressed as e-f, wherein e is amino acid residues 1844-1925 of SEQ ID NO:2 and f is amino acid residues 1926 to h of SEQ ID NO:2, where h is an integer between 1927 and 2002; for example, h is 1936 or 2002.

In one embodiment, n is 970 or 1099, and m is 1461 or 1480.

In one embodiment, X represents a fragment of SEQ ID NO:2 containing amino acid residues 1129-1461 of SEQ ID NO:2, and Y represents a fragment of SEQ ID NO:2 containing at least amino acid residues 1844-1936 of SEQ ID NO:2.

In one embodiment, X represents a fragment of SEQ ID NO:2 containing amino acid residues 1129-1480 of SEQ ID NO:2, and Y represents a fragment of SEQ ID NO:2 containing at least amino acid residues 1844-1936 of SEQ ID NO:2.

In one embodiment, L is a linker sequence 2-20 amino acid residues long containing G and S.

In one embodiment, X is amino acid residues 1099-1480 of SEQ ID NO:2, and Y is amino acid residues 1844-1936 of SEQ ID NO:2.

In one embodiment, X is amino acid residues 1099-1461 of SEQ ID NO:2, and Y is amino acid residues 1844-1925 of SEQ ID NO:2.

In one embodiment, X is amino acid residues 1129-1480 of SEQ ID NO:2, and Y is amino acid residues 1844-1936 of SEQ ID NO:2.

In one embodiment, X is amino acid residues 1129-1461 of SEQ ID NO:2, and Y is amino acid residues 1844-1925 of SEQ ID NO:2.

In one embodiment, the TET2 polypeptide fragment of the invention is selected from polypeptides consisting of the following amino acid sequences (N-terminus to C-terminus):

(1) amino acid residues 1099-1480 of SEQ ID NO:2, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1844-1936 of SEQ ID NO:2;

(2) amino acid residues 1099-1461 of SEQ ID NO:2, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1844-1925 of SEQ ID NO:2;

(3) amino acid residues 1129-1480 of SEQ ID NO:2, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1844-1936 of SEQ ID NO:2; and

(4) amino acid residues 1129-1461 of SEQ ID NO:2, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1844-1925 of SEQ ID NO:2.

In one embodiment, the one or several amino acid insertions, deletions, or mutations do not occur within amino acid residues 1129-1448 and 1844-1925 of SEQ ID NO:2.

The invention also provides a polypeptide selected from:

(1) a polypeptide whose amino acid sequence is given by the following formula (II):

X'-L-Y' (formula II)

In the formula,

X' represents a fragment of SEQ ID NO:1 containing at least amino acid residues 1418-1754 of SEQ ID NO:1;

Y' represents a fragment of SEQ ID NO:1 containing at least amino acid residues 1991-2081 of SEQ ID NO:1;

L represents a linker sequence; and

In one embodiment, the polypeptide of formula (II) is a truncated fragment of SEQ ID NO:1 from which at least amino acid residues 1772-1990 have been removed.

In one embodiment, X' can be expressed as a'-b'-c', wherein a' is amino acid residues n' to 1417 of SEQ ID NO:1, n' is an integer from 1 to 1417, b' is amino acid residues 1418-1754 of SEQ ID NO:1, and c' is amino acid residues 1755 to m' of SEQ ID NO:1, where m' is an integer between 1756 and 1771; for example, m' is 1771.

In one embodiment, n' can be, for example, 1395.

In one embodiment, Y' can be expressed as d'-e', wherein d' is amino acid residues 1991-2081 of SEQ ID NO:1 and e' is amino acid residues 2081 to f' of SEQ ID NO:1, where f' is an integer between 2082 and 2136; for example, f' is 2136.

In one embodiment, X' is amino acid residues 1395-1754 of SEQ ID NO:1, and Y' is amino acid residues 1991-2136 of SEQ ID NO:1.

In one embodiment, X' is amino acid residues 1418-1754 of SEQ ID NO:1, and Y' is amino acid residues 1991-2136 of SEQ ID NO:1.

In one embodiment, X' is amino acid residues 1418-1754 of SEQ ID NO:1, and Y' is amino acid residues 1991-2081 of SEQ ID NO:1.

In one embodiment, X' is amino acid residues 1418-1772 of SEQ ID NO:1, and Y' is amino acid residues 1991-2081 of SEQ ID NO:1.

In one embodiment, the TET1 polypeptide fragment of the invention is selected from polypeptides consisting of the following amino acid sequences (N-terminus to C-terminus):

(1) amino acid residues 1418-2081 of SEQ ID NO:1;

(2) amino acid residues 1418-2136 of SEQ ID NO:1;

(3) amino acid residues 1395-1754 of SEQ ID NO:1, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1991-2136 of SEQ ID NO:1;

(4) amino acid residues 1418-1754 of SEQ ID NO:1, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1991-2136 of SEQ ID NO:1;

(5) amino acid residues 1418-1754 of SEQ ID NO:1, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1991-2081 of SEQ ID NO:1; and

(6) amino acid residues 1418-1771 of SEQ ID NO:1, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1991-2081 of SEQ ID NO:1.

In one embodiment, the one or several amino acid insertions, substitutions, or deletions occur outside amino acids 1418-1754 and 1991-2081 of SEQ ID NO:1.

The invention also provides a polypeptide selected from:

(1) a polypeptide whose amino acid sequence is given by the following formula (III):

X"-L-Y" (formula III)

In the formula,

X" represents a fragment of SEQ ID NO:1 containing at least amino acid residues 717-1041 of SEQ ID NO:3;

Y" represents a fragment of SEQ ID NO:1 containing at least amino acid residues 1501-1596 of SEQ ID NO:3;

L represents a linker sequence; and

In one embodiment, the polypeptide of formula (III) is a truncated fragment of SEQ ID NO:3 from which at least amino acid residues 1061-1500 have been removed.

In one embodiment, X" can be expressed as a"-b"-c", wherein a" is amino acid residues n" to 716 of SEQ ID NO:3, n" is an integer from 1 to 717, b" is amino acid residues 717-1041 of SEQ ID NO:3, and c" is amino acid residues 1042 to m" of SEQ ID NO:3, where m" is an integer between 1043 and 1060; for example, m" is 1060.

In one embodiment, n" can be, for example, 663 or 689.

In one embodiment, Y" can be expressed as d"-e", wherein d" is amino acid residues 1501-1596 of SEQ ID NO:3 and e" is amino acid residues 1597 to f" of SEQ ID NO:3, where f" is an integer between 1598 and 1660; for example, f" is 1660.

In one embodiment, X" is amino acid residues 663-1041 of SEQ ID NO:3, and Y" is amino acid residues 1501-1596 of SEQ ID NO:3.

In one embodiment, X" is amino acid residues 663-1041 of SEQ ID NO:3, and Y" is amino acid residues 1501-1660 of SEQ ID NO:3.

In one embodiment, X" is amino acid residues 689-1060 of SEQ ID NO:3, and Y" is amino acid residues 1501-1596 of SEQ ID NO:3.

In one embodiment, X" is amino acid residues 717-1060 of SEQ ID NO:3, and Y" is amino acid residues 1501-1596 of SEQ ID NO:3.

In one embodiment, the TET3 polypeptide fragment of the invention is selected from polypeptides consisting of the following amino acid sequences (N-terminus to C-terminus):

(1) amino acid residues 663-1041 of SEQ ID NO:3, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1051-1596 of SEQ ID NO:3;

(2) amino acid residues 663-1041 of SEQ ID NO:3, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1051-1660 of SEQ ID NO:3;

(3) amino acid residues 689-1060 of SEQ ID NO:3, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1051-1596 of SEQ ID NO:3; and

(4) amino acid residues 717-1060 of SEQ ID NO:3, the linker sequence shown in SEQ ID NO:8, and amino acid residues 1051-1596 of SEQ ID NO:3.

In one embodiment, the one or several amino acid insertions, substitutions, or deletions occur outside amino acids 717-1041 and 1501-1596 of SEQ ID NO:1.

In one embodiment, L is a sequence 2-20 amino acid residues long containing G and S. In a specific embodiment, the linker sequence is as shown in SEQ ID NO:8.

The invention includes polynucleotide sequences encoding these polypeptides, and their complementary sequences.

The invention also includes expression vectors containing the polynucleotide sequence or its complementary sequence, and host cells containing the expression vector. Host cells are typically used for recombinant expression of the TET polypeptide fragments of the invention. In one embodiment, the host cell is a non-human mammalian cell. In another specific embodiment, the host cell is a bacterial cell, such as an Escherichia coli cell.

The invention also includes use of the TET polypeptide fragments of the invention for detecting whether 5-hmC is present in a DNA sequence and for catalyzing the formation of 5-mC oxidation products in a DNA sequence.

The invention also includes use of the TET polypeptide fragments of the invention for studying the interaction sites and detailed mechanism of interaction between TET proteins and DNA.

Accompanying drawing explanation

Figure 1A and 1B is presented at dot hybridization experiment (dot-blot assay) when measuring methylolation enzymic activity, uses the impact that different TET2 truncated protein is lived for enzyme.

Fig. 1 C shows the TET2 albumen with catalytic activity being used for crystallization and enzyme experiment alive can be purified to homogeneous state.

Fig. 2 A shows the sequence signature of TET2 catalyst structure domain section.

Fig. 2 B shows the overall three-dimensional structure of TET2-DNA mixture.

Fig. 3 A shows DNA sequence dna and TET2 protein surface potential energy diagram in TET2 and DNA mixture.

In Fig. 3 B show dna sequence, 5-mC translates into the windup-degree of DNA double chain before and after in TET2 catalyst structure domain.

Fig. 3 C and 3D shows TET2 proteins carry structural domain and the interactional concrete site of DNA.

Fig. 4 shows the enzyme activity levels of the TET1 and TET2 proteins in the in vitro enzyme activity assay.

Fig. 5 shows an amino acid sequence alignment of α-ketoglutarate- and Fe²⁺-dependent dioxygenases from different species.

Embodiments

I. Definitions

As used in this specification and in the claims, the singular forms "a", "an" and "the" include plural references unless the content clearly dictates otherwise.

The following amino acid abbreviations are used herein:

"DNA hydroxymethylase activity" means that a polypeptide can oxidize 5-methylcytosine (5-mC) on DNA to the hydroxymethylated form (5-hmC), and further oxidize it to formylcytosine and carboxylcytosine. This activity is referred to herein as hydroxymethylase activity.

"Isolated" means that a substance has been separated from its original environment (for a natural substance, the original environment is the natural environment). For example, a polynucleotide or polypeptide in its natural state within a living cell is not isolated or purified, but the same polynucleotide or polypeptide is isolated and purified once it has been separated from the other substances that accompany it in the natural state.

"Fragment" refers to a sequence, or a part of one, obtained by truncating a full-length sequence. As used herein, the term covers both amino acid sequences and nucleotide sequences. For example, an amino acid sequence may be truncated from either end or from both ends at once, or one residue or a stretch of residues may be excised (that is, deleted) from the middle of the full-length sequence. If one residue or a stretch of residues is deleted from the middle, the parts that are not deleted are optionally joined by a linker sequence, and the retained sequences are usually joined in the same order as in the original sequence (5' to 3' direction).

Linker sequences suitable for the invention include linker sequences comprising GS. A linker sequence is usually 2-20 amino acid residues long, for example 5-15 amino acid residues. Examples of linker sequences suitable for the invention include, but are not limited to, GGGGSGGGGSGGGGS (SEQ ID NO:8).

"TET fragment" and "TET polypeptide fragment" are used interchangeably herein. Both refer to a polypeptide formed from one fragment, or from several joined fragments, of the human TET proteins and mouse TET proteins described herein, where the fragment retains the hydroxymethylase activity of the TET protein.

A conservative substitution is an amino acid substitution that is functionally or structurally equivalent to the residue being replaced. Conservative substitutions may include exchanging one residue for another that has similar polarity or steric arrangement, or that belongs to the same class (for example, hydrophobic, acidic, or basic) as the replaced residue. The following table gives illustrative examples of conservative substitutions:

Initial residue	Representative substitutions	Preferred substitution
Ala (A)	Val; Leu; Ile	Val
Arg (R)	Lys; Gln; Asn	Lys
Asn (N)	Gln; His; Lys; Arg	Gln
Asp (D)	Glu	Glu
Cys (C)	Ser	Ser
Gln (Q)	Asn	Asn
Glu (E)	Asp	Asp
Gly (G)	Pro; Ala	Ala
His (H)	Asn; Gln; Lys; Arg	Arg
Ile (I)	Leu; Val; Met; Ala; Phe	Leu
Leu (L)	Ile; Val; Met; Ala; Phe	Ile
Lys (K)	Arg; Gln; Asn	Arg
Met (M)	Leu; Phe; Ile	Leu
Phe (F)	Leu; Val; Ile; Ala; Tyr	Leu
Pro (P)	Ala	Ala
Ser (S)	Thr	Thr
Thr (T)	Ser	Ser
Trp (W)	Tyr; Phe	Tyr
Tyr (Y)	Trp; Phe; Thr; Ser	Phe
Val (V)	Ile; Leu; Met; Phe; Ala	Leu
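
As an illustration only, the "preferred substitution" column of the table can be encoded as a lookup and used to count how many differences between two equal-length sequences are preferred conservative substitutions. The following Python sketch is not part of the invention. The function name and the restriction to substitutions (no insertions or deletions) are assumptions made for the example.

```python
# Preferred conservative substitutions, transcribed from the table above
# (one-letter codes). The helper below is illustrative, not part of the patent.
PREFERRED = {
    "A": "V", "R": "K", "N": "Q", "D": "E", "C": "S", "Q": "N",
    "E": "D", "G": "A", "H": "R", "I": "L", "L": "I", "K": "R",
    "M": "L", "F": "L", "P": "A", "S": "T", "T": "S", "W": "Y",
    "Y": "F", "V": "L",
}


def count_substitutions(reference: str, variant: str) -> tuple:
    """Return (preferred_conservative, other) substitution counts.

    Assumes both sequences have the same length, i.e. the variant
    differs from the reference by substitutions only.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    preferred = other = 0
    for ref, var in zip(reference, variant):
        if ref == var:
            continue  # unchanged position
        if PREFERRED.get(ref) == var:
            preferred += 1
        else:
            other += 1
    return preferred, other
```

A variant could then be screened against the limits discussed in the text (for example, up to about 1-10 conservative substitutions) by comparing the first returned count with the chosen threshold.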

It is therefore reasonable to predict that an isolated replacement of leucine with isoleucine or valine, of aspartate with glutamate, or of threonine with serine, or a similar conservative replacement of an amino acid with a structurally related amino acid, will not have a major effect on biological activity. For example, a polypeptide of the invention may contain up to about 1-10 conservative amino acid substitutions, or even up to about 15-25 conservative amino acid substitutions, or any integer between 2 and 25, as long as the desired function of the molecule remains intact. Using Hopp/Woods and Kyte-Doolittle plots, which are well known in the art, those skilled in the art can readily determine the regions of a molecule of interest that tolerate change.

"Homology" refers to the percent similarity between two polynucleotides or two polypeptide moieties. Two DNA sequences or two polypeptide sequences are "substantially homologous" to each other when, over a defined length of the molecules, the sequences show at least about 50%, preferably at least about 75%, more preferably at least about 80-85%, particularly preferably at least about 90%, and most preferably at least about 95-98% sequence similarity. As used herein, substantially homologous also refers to sequences that are identical to the specified DNA or polypeptide sequence.

"Identity" refers to an exact nucleotide-to-nucleotide or amino acid-to-amino acid correspondence between two polynucleotide or polypeptide sequences. Percent identity can be obtained by directly comparing the sequence information of the two molecules: align the two sequences, count the exact number of matches between the aligned sequences, divide by the length of the shorter sequence, and multiply by 100.

Readily available computer programs can assist in homology and identity analysis, such as ALIGN, Dayhoff, M.O. (in Atlas of Protein Sequence and Structure, M.O. Dayhoff ed., 5 Suppl. 3:353-358, National Biomedical Research Foundation, Washington, DC), which adapts the local homology algorithm of Smith and Waterman (Advances in Appl. Math. 2:482-489, 1981) for peptide analysis. Programs for determining nucleotide sequence homology, such as BESTFIT, FASTA and GAP, are available in the Wisconsin Sequence Analysis Package (Version 8, available from Genetics Computer Group, Madison, WI). These programs also rely on the Smith and Waterman algorithm. They are readily used with the default parameters recommended by the manufacturer and described in the Wisconsin Sequence Analysis Package. For example, the percent identity of a given nucleotide sequence to a reference sequence can be determined using the homology algorithm of Smith and Waterman with the default scoring table and a gap penalty of six nucleotide positions.
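
To make the two calculations above concrete, the following Python sketch shows (a) percent identity as defined in this section (exact matches divided by the length of the shorter sequence, multiplied by 100) and (b) a minimal Smith-Waterman local alignment score. It is a simplified illustration, not the implementation used by ALIGN, BESTFIT, FASTA or GAP. The scoring values (match +2, mismatch -1, linear gap -2), the gap character and the function names are assumptions chosen for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal aligned length.

    Matches are divided by the length of the shorter ungapped sequence,
    following the definition given in the text.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("sequences must not be empty")
    return 100.0 * matches / shorter


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring parameters are illustrative assumptions, not the
    defaults of any particular software package.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

In practice the alignment passed to `percent_identity` would come from an alignment program. The sketch only shows how the percentage follows from an alignment once it is available.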

Alternatively, homology can be determined by hybridizing polynucleotides under conditions that form stable duplexes between homologous regions, digesting with a single-strand-specific nuclease, and then determining the size of the digested fragments. Substantially homologous DNA sequences can be identified in a Southern hybridization experiment carried out under, for example, stringent conditions (as defined for the particular system). Determining suitable hybridization conditions is within the knowledge of those skilled in the art. See, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, second edition, 1989.

As used herein, structure coordinates are Cartesian coordinates that describe the position of atoms in three-dimensional space relative to other atoms in a molecule or molecular complex. Structure coordinates can be obtained using, for example, X-ray crystallography or NMR techniques. Additional structural information can be obtained from spectroscopic techniques (for example, optical rotatory dispersion (ORD) and circular dichroism (CD)), homology modeling, and computational methods (such as methods that incorporate data from molecular mechanics or from dynamics assays).

Various software programs allow a set of structure coordinates to be represented graphically to obtain a representation of a molecule or molecular complex. In general, such a representation should accurately reflect (relatively and/or absolutely) the structure coordinates, or information derived from the structure coordinates, such as distances or angles between features. The representation may be a two-dimensional figure, such as a stereoscopic two-dimensional figure, or an interactive two-dimensional display (for example, a computer display that can show different faces of the molecule or molecular complex), or an interactive stereoscopic two-dimensional display. The coordinates can be used to direct the creation of a physical three-dimensional representation of the molecule or molecular complex, such as a ball-and-stick model or a model prepared by rapid prototyping. Structure coordinates can be modified by mathematical manipulation, such as inversion or integer addition or subtraction. Structure coordinates are therefore relative coordinates, and are in no way specifically limited by the actual x, y, z coordinates.
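
The statement that structure coordinates are relative can be illustrated with a short Python sketch: adding the same integers to every atom, or inverting every coordinate, leaves all interatomic distances unchanged. The three atoms below are hypothetical values chosen for the example and are not taken from the TET2-DNA structure. The function names are likewise illustrative.

```python
import math


def translate(coords, shift):
    """Add the same (dx, dy, dz) shift to every atom."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in coords]


def invert(coords):
    """Invert every coordinate through the origin (x, y, z) -> (-x, -y, -z)."""
    return [tuple(-c for c in atom) for atom in coords]


def distance(p, q):
    """Euclidean distance between two atoms."""
    return math.dist(p, q)


# Hypothetical coordinates in angstroms, for illustration only.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)]
shifted = translate(atoms, (10, -4, 7))   # integer addition/subtraction
mirrored = invert(atoms)                  # inversion

# Distances between the first and third atom in each coordinate set.
d_original = distance(atoms[0], atoms[2])
d_shifted = distance(shifted[0], shifted[2])
d_mirrored = distance(mirrored[0], mirrored[2])
```

All three distances are equal, which is why a model is characterized by its relative geometry rather than by one particular set of x, y, z values. Note that inversion preserves distances but produces the mirror image of the original arrangement.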

A three-dimensional model is a representation of a molecule or molecular complex. A three-dimensional model may be a physical model of the molecular structure (for example, a ball-and-stick model) or a graphical representation of the molecular structure. A graphical representation may include, for example, a picture or figure presented on a computer display. A two-dimensional graphical representation can be a three-dimensional model when it reflects three-dimensional information, for example by using perspective or shading, or by having features closer to the viewer obscure features farther from the viewer. Preferably, the graphical representation accurately reflects the structure coordinates, or information derived from the structure coordinates, such as distances or angles between features of the model. When the three-dimensional model includes a polypeptide, the model may include one or more levels of structure, such as primary structure (amino acid sequence), secondary structure (for example, alpha-helices and beta-sheets), tertiary structure (overall fold), and quaternary structure (oligomeric state). A model may include different levels of detail. For example, a model may include the relative positions of the secondary structure features of a protein without specifying the positions of atoms. A more detailed model may include the positions of atoms.

A model may include features derived from the structure coordinates and other chemical information. For example, the shape of the solvent-accessible surface can be derived from the structure coordinates, the van der Waals radii of the atoms in the model, and the van der Waals radius of the solvent (for example, water). Other features that can be derived from the structure coordinates include, but are not limited to, the electrostatic potential, the positions of voids and pockets within the macromolecular structure, and the positions of hydrogen bonds and salt bridges.

A model may include the structure coordinates of the atoms in a molecular structure. Structure coordinates can be determined experimentally, for example by X-ray crystallography or NMR spectroscopy, or can be generated by, for example, homology modeling. A molecular structure may include a single molecule, a portion of a molecule, a complex of two or more molecules, a portion of a complex, or a combination thereof. In a model of a molecular complex, the molecules may be associated through covalent or non-covalent bonds, including, for example, hydrogen bonds, hydrophobic interactions, or electrostatic attraction. Molecular complexes may include tightly associated molecules, such as an enzyme/inhibitor complex, and loosely associated molecules, such as a crystalline compound that has ordered solvent molecules or ions within the crystal. A model may include, for example, a complex of a protein bound to a reagent, such as a complex of an enzyme bound to an inhibitor. When a model includes structure coordinates, the coordinates of some atoms in the molecule may be omitted.

A reagent includes a protein, polypeptide, peptide, nucleic acid (including DNA or RNA), molecule, compound, or drug.

An active site is a region of a molecule or molecular complex that can interact with or bind a reagent (including, but not limited to, a protein, polypeptide, peptide, nucleic acid including DNA or RNA, molecule, compound, or drug). An active site may include, for example, the actual site of reagent binding, as well as accessory binding sites adjacent or proximal to the actual binding site that may affect activity upon interaction with or binding of a particular reagent. An active site may include an inhibitor binding site. An inhibitor may act by directly interfering with the actual site of substrate binding (for example, by competing with the substrate for binding), or by indirectly affecting the three-dimensional conformation or charge potential, thereby preventing or reducing substrate binding at the actual binding site. For example, an active site may be the site of cofactor binding, substrate binding (for example, binding of a substrate to be phosphorylated), or inhibitor binding. An active site may also include a site of allosteric effector binding, or a site of phosphorylation, glycosylation, alkylation, acylation, or other covalent modification.

A computer display can be used to show a three-dimensional model of the TET2-DNA complex, for example a graphical representation of its binding sites.

In the sequence listing, SEQ ID NO:1 shows the full-length amino acid sequence of human TET1 protein; SEQ ID NO:2 shows the full-length amino acid sequence of human TET2 protein; SEQ ID NO:3 shows the full-length amino acid sequence of human TET3 protein; SEQ ID NO:4 shows the full-length amino acid sequence of mouse Tet1 isoform 1; SEQ ID NO:5 shows the full-length amino acid sequence of mouse Tet1 isoform 2; SEQ ID NO:6 shows the full-length amino acid sequence of mouse Tet2; SEQ ID NO:7 shows the full-length amino acid sequence of mouse Tet3; SEQ ID NO:8 shows the linker sequence; SEQ ID NOs:9-13 show tag sequences; SEQ ID NOs:14-39 are primer sequences; and SEQ ID NO:40 is the coding sequence of the linker sequence shown in SEQ ID NO:8.

II. Detailed description of the invention

The amino acid sequences provided by the invention are TET fragments of any of the amino acid sequences shown in SEQ ID NOs:1-7 that retain the hydroxymethylase activity of the TET protein. Preferably, the fragment retains the cysteine-rich domain and the catalytic domain of the TET protein. The fragments are suitable for large-scale recombinant expression and preparation, and are stable and homogeneous.

The cysteine-rich domain and catalytic domain of human TET2 protein (SEQ ID NO:2) are shown in Fig. 2. The inventors found that the fragment remaining after part of this sequence is deleted still retains the hydroxymethylase function of human TET2 protein, and is suitable for large-scale recombinant expression and preparation.

The deleted stretch is generally 100-395 amino acid residues long, for example 150-395, 200-395, 250-395, or 300-395 amino acid residues. Usually, at least residues 1490-1830 (inclusive) of SEQ ID NO:2 are deleted. In certain embodiments, the deleted residues lie between positions 1449 and 1843 (inclusive) of SEQ ID NO:2. In certain embodiments, the deleted residues include at least residues 1481-1830 of SEQ ID NO:2. For example, the deleted residues usually include at least residues 1462-1830 of SEQ ID NO:2, or usually at least residues 1449-1830. In certain embodiments, residues 1481-1843 of SEQ ID NO:2 are deleted. In certain embodiments, residues 1462-1843 of SEQ ID NO:2 are deleted. In certain embodiments, residues 1449-1843 of SEQ ID NO:2 are deleted.

The retained amino acid residues generally include residues 1129-1448 and residues 1844-1925 of SEQ ID NO:2. For example, the retained residues generally include residues 1129-1461 and 1844-1925 of SEQ ID NO:2, or residues 1129-1480 and 1844-1925, or residues 1129-1461 and 1844-1936, or residues 1129-1480 and 1844-1936. The following table lists exemplary residue combinations of SEQ ID NO:2 that the TET fragment may include:

Retained N-terminal residues	Retained C-terminal residues
1129-1448	1844-1925
1129-1448	1844-1936
1129-1448	1844-2002
1129-1461	1844-1925
1129-1461	1844-1936
1129-1461	1844-2002
1129-1480	1844-1925
1129-1480	1844-1936
1129-1480	1844-2002

In some embodiments, the retained N-terminal residues can be expressed as residues n to 1448, n to 1461, or n to 1480, where n is an integer from 1 to 1129. That is, the retained residues may be, for example, residues 2-1448, 2-1461, or 2-1480. In a specific embodiment, n is 970 or 1099.

In some embodiments, the retained C-terminal residues can be expressed as residues 1830 to m or 1844 to m, where m is an integer from 1925 to 2002. For example, the retained C-terminal residues may be residues 1830-1925, 1830-1936, 1844-1925, 1844-1936, or 1844-2002.

After the residues in the middle portion are removed, the retained N-terminal residues may be joined directly to the retained C-terminal residues, or joined through a linker sequence. A flexible linker sequence is usually used. Flexible linker sequences are usually composed of G and S residues and are 2-20 amino acid residues long. An exemplary flexible linker sequence is shown in SEQ ID NO:8 of the application.
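
The construction described above (retain an N-terminal segment X and a C-terminal segment Y, delete the residues between them, and optionally join X and Y through a G/S linker L) can be sketched in Python as follows. This is an illustrative sketch only. The function names are hypothetical, the linker check is a strict reading of "2-20 residues composed of G and S", and the 2002-residue placeholder string stands in for SEQ ID NO:2, whose actual sequence is not reproduced here.

```python
LINKER = "GGGGSGGGGSGGGGS"  # SEQ ID NO:8 as quoted in the text


def is_gs_linker(linker: str) -> bool:
    """Strict reading: 2-20 residues, made up of G and S only, both present."""
    return 2 <= len(linker) <= 20 and set(linker) == {"G", "S"}


def build_fragment(full_length: str, n_range, c_range, linker: str = LINKER) -> str:
    """Return X + L + Y for 1-based, inclusive residue ranges.

    An empty linker joins the two retained segments directly.
    """
    (n_start, n_end), (c_start, c_end) = n_range, c_range
    if not (1 <= n_start <= n_end < c_start <= c_end <= len(full_length)):
        raise ValueError("ranges must be ordered and lie within the sequence")
    if linker and not is_gs_linker(linker):
        raise ValueError("linker must be 2-20 residues of G and S")
    x = full_length[n_start - 1:n_end]   # retained N-terminal segment
    y = full_length[c_start - 1:c_end]   # retained C-terminal segment
    return x + linker + y


def deleted_length(n_range, c_range) -> int:
    """Number of residues removed between the two retained segments."""
    return c_range[0] - n_range[1] - 1


# Placeholder only: 2002 residues, matching the highest residue number
# mentioned in the text. It is NOT the real TET2 sequence.
placeholder = "A" * 2002
fragment = build_fragment(placeholder, (1129, 1480), (1844, 1936))
```

For the combination 1129-1480 / 1844-1936 the sketch removes 363 residues (1481-1843), and for 1129-1448 / 1844-1925 it removes 395 residues, in line with the 100-395 residue deletion length stated above.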

One or several conservative substitutions may be present in the TET sequences of the invention, for example up to 1-20 (such as 1-15, 1-10, 1-8, or 1-5) conservative substitutions. Examples of conservative substitutions are given above. Preferably, the conservative substitutions occur outside residues 1129-1449 and 1844-1936 of SEQ ID NO:2. For example, one or several conservative substitutions, or even non-conservative mutations, may occur among residues 1450-1480 of SEQ ID NO:2, if these residues are retained. In addition to conservative substitutions, up to 1-20 (such as 1-15, 1-10, 1-8, or 1-5) deletion or insertion mutations may be present outside the above positions.

The application also includes amino acid sequences having at least 70% sequence identity with the above fragments, preferably at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99%. More preferably, the differences occur outside residues 1129-1449 and 1844-1936 of SEQ ID NO:2. For example, one or several conservative substitutions, or even non-conservative substitutions, may occur among residues 1450-1480 of SEQ ID NO:2, if these residues are retained. A polypeptide sequence formed in this way has a certain sequence identity with the corresponding polypeptide sequence of the invention, while also retaining enzyme activity and remaining capable of recombinant expression and purification.

TET fragments based on SEQ ID NO:1 (human TET1 protein) usually have at least residues 1772-1990 of SEQ ID NO:1 deleted. In certain embodiments, residues 1755-1990 of SEQ ID NO:1 are deleted. The retained residues are at least residues 1418-1754 and 1991-2081 of SEQ ID NO:1. The two retained fragments may be joined by a flexible linker as described herein. Similarly, one or several conservative substitutions may be present in the polypeptide sequence, for example up to 1-20 (such as 1-15, 1-10, 1-8, or 1-5) conservative substitutions. Examples of conservative substitutions are given above. Preferably, the conservative substitutions occur outside residues 1418-1754 and 1991-2081 of SEQ ID NO:1. For example, one or several conservative substitutions, or even non-conservative mutations, may occur among residues 2081-2136 or residues 1755-1771 of SEQ ID NO:1, if these residues are retained. The application also includes amino acid sequences having at least 70% sequence identity with the above fragments, preferably at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99%. More preferably, the differences occur outside residues 1418-1754 and 1991-2081 of SEQ ID NO:1. For example, one or several conservative substitutions, or even non-conservative mutations, may occur among residues 2081-2136 or residues 1755-1771 of SEQ ID NO:1, if these residues are retained. A polypeptide sequence formed in this way has a certain sequence identity with the corresponding polypeptide sequence of the invention, while also retaining enzyme activity and remaining capable of recombinant expression and purification.

TET fragments based on SEQ ID NO:3 (human TET3 protein) usually have at least residues 1061-1500 of SEQ ID NO:3 deleted. In certain embodiments, residues 1042-1500 of SEQ ID NO:3 are deleted. The retained residues are at least residues 717-1041 and 1501-1596 of SEQ ID NO:3. In one embodiment, the retained residues are at least residues 689-1041 and 1501-1596 of SEQ ID NO:3. In one embodiment, the retained residues are at least residues 663-1041 and 1501-1596 of SEQ ID NO:3. The two retained fragments may be joined by a flexible linker as described herein. Similarly, one or several conservative substitutions may be present in the polypeptide sequence, for example up to 1-20 (such as 1-15, 1-10, 1-8, or 1-5) conservative substitutions. Examples of conservative substitutions are given above. Preferably, the conservative substitutions occur outside residues 663-717 and 1596-1660 of SEQ ID NO:3. For example, one or several conservative substitutions, or even non-conservative mutations, may occur among residues 663-717 and 1596-1660 of SEQ ID NO:3, if these residues are retained. The application also includes amino acid sequences having at least 70% sequence identity with the above fragments, preferably at least 80%, at least 85%, at least 90%, at least 95%, at least 97%, or at least 99%. More preferably, the differences occur outside residues 717-1041 and 1501-1596 of SEQ ID NO:3. For example, one or several conservative substitutions, or even non-conservative mutations, may occur among residues 663-717 and 1596-1660 of SEQ ID NO:3, if these residues are retained. A polypeptide sequence formed in this way has a certain sequence identity with the corresponding polypeptide sequence of the invention, while also retaining enzyme activity and remaining capable of recombinant expression and purification.

The amino terminus or carboxyl terminus of the TET fragments of the invention may also carry one or more polypeptide fragments as protein tags. Any suitable tag may be used in the invention. For example, the tag may be FLAG, HA, HA1, c-Myc, Poly-His, Poly-Arg, Strep-TagII, AU1, EE, T7, 4A6, ε, B, gE, or Ty1. These tags can be used to purify the protein. The following table lists some of these tags and their sequences.

To have the translated protein secreted (for example, secreted outside the cell), a signal peptide sequence, such as the pelB signal peptide, may also be added to the N-terminus of the TET fragment of the invention. The signal peptide may be cleaved off as the TET fragment is secreted from the cell.

The present invention also includes polynucleotide sequences encoding the TET fragments of the invention. The polynucleotide sequences of the invention may be in DNA form or RNA form. DNA forms include cDNA, genomic DNA, and synthetic DNA. The DNA may be single-stranded or double-stranded, and may be the coding strand or the non-coding strand. One example of a polynucleotide sequence encoding a TET fragment of the invention is the portion of the coding sequences of hTET1, hTET2 and hTET3 known in the art that corresponds to the TET polypeptide fragment of the invention. The coding sequence of a polypeptide fragment of the invention can be obtained from known TET cDNA by methods known in the art.

As an illustrative example, the nucleotide sequence encoding the TET2 fragment 1129-1936del1481-1843 of the invention consists, from 5' to 3', of bases 3872-4927 of GenBank accession number NM_001127208.2 (encoding residues 1129-1480), SEQ ID NO:40 (encoding the linker sequence shown in SEQ ID NO:8), and bases 6017-6295 of NM_001127208.2 (encoding residues 1844-1936).
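
The base coordinates quoted in this example can be cross-checked with simple codon arithmetic. The following Python sketch maps a residue range to an mRNA base range. The coding-sequence start position of 488 is an assumption inferred from the coordinates quoted in the text (residue 1129 beginning at base 3872). It has not been checked against the GenBank record, and the function name is hypothetical.

```python
def residue_range_to_bases(first_res: int, last_res: int, cds_start: int):
    """Map a 1-based inclusive residue range to 1-based inclusive bases.

    cds_start is the position of the first base of codon 1.
    """
    start = cds_start + 3 * (first_res - 1)
    end = cds_start + 3 * last_res - 1
    return start, end


# Assumed: codon 1 starts at base 488 of NM_001127208.2. This offset is
# derived from the quoted coordinates, not looked up in GenBank.
CDS_START = 488

n_segment = residue_range_to_bases(1129, 1480, CDS_START)
c_segment = residue_range_to_bases(1844, 1936, CDS_START)
```

Bases 3872-4927 span 1,056 bases, that is 352 codons, which corresponds to residues 1129-1480 and agrees with the "del1481-1843" in the fragment name. Bases 6017-6295 span 279 bases, that is 93 codons, corresponding to residues 1844-1936.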

A polynucleotide encoding a TET fragment of the invention may be a polynucleotide that includes the sequence encoding the TET fragment, or a polynucleotide that additionally includes further coding and/or non-coding sequences.

The invention also relates to variants of the above polynucleotides, which encode polypeptides having the same amino acid sequence as the polypeptides of the invention, or fragments, analogs and derivatives of those polypeptides. Such polynucleotide variants may be naturally occurring allelic variants or non-naturally occurring variants. These nucleotide variants include substitution variants, deletion variants and insertion variants. As is known in the art, an allelic variant is an alternative form of a polynucleotide. It may involve the substitution, deletion or insertion of one or more nucleotides, but does not substantially alter the function of the encoded polypeptide.

The invention also relates to polynucleotides that hybridize to the polynucleotide sequences of the invention and have at least 50%, preferably at least 70%, and more preferably at least 80% identity between the two sequences. The invention relates in particular to polynucleotides that hybridize to the polynucleotides of the invention under stringent conditions. As used herein, "stringent conditions" means: (1) hybridization and washing at low ionic strength and high temperature, such as 0.2 × SSC, 0.1% SDS, 60 °C; or (2) hybridization in the presence of a denaturing agent, such as 50% (v/v) formamide, 0.1% calf serum/0.1% Ficoll, 42 °C; or (3) hybridization that occurs only when the identity between the two sequences is at least 90%, more preferably at least 95%. In addition, the polypeptide encoded by the hybridizable polynucleotide has the same biological function and activity as the polypeptide of the invention.

The invention also relates to nucleic acid fragments that hybridize to the polynucleotide sequences of the invention. As used herein, a "nucleic acid fragment" contains at least 15 nucleotides, preferably at least 30 nucleotides, more preferably at least 50 nucleotides, and most preferably at least 100 nucleotides. Nucleic acid fragments can be used in nucleic acid amplification techniques (such as PCR) to identify and/or isolate polynucleotides encoding the polypeptides of the invention.

The TET fragments and polynucleotides of the invention are preferably provided in isolated form, and more preferably are purified to homogeneity.

The coding sequences of the TET fragments of the invention can generally be obtained by PCR amplification, by recombinant methods, or by synthetic methods. For PCR amplification, primers can be designed according to polynucleotide sequences published in the art, in particular the open reading frame sequence, and the relevant sequence can be amplified using a commercially available cDNA library, or a cDNA library prepared by conventional methods known to those skilled in the art, as the template. When the sequence is long, two or more rounds of PCR amplification are usually needed, after which the amplified fragments are spliced together in the correct order.

Once the relevant sequence has been obtained, it can be produced in large quantities by recombinant methods. This usually involves cloning it into a vector, transferring the vector into cells, and then isolating the relevant sequence from the propagated host cells by conventional methods.

In addition, the relevant sequence can be produced by synthetic methods, especially when the fragment is short. Usually, a long sequence can be obtained by first synthesizing several small fragments and then ligating them.

At present, a DNA sequence encoding a TET fragment of the invention can be obtained entirely by chemical synthesis. The DNA sequence can then be introduced into various existing DNA molecules (such as vectors) and cells known in the art. Mutations can also be introduced into the protein sequences of the invention by chemical synthesis.

Amplification of DNA/RNA by PCR is the preferred way of obtaining the genes of the invention. In particular, when it is difficult to obtain full-length cDNA from a library, the RACE method (rapid amplification of cDNA ends) is preferably used. The PCR primers can be suitably selected according to the sequence information of the invention disclosed herein and synthesized by conventional methods. The amplified DNA/RNA fragments can be isolated and purified by conventional methods, such as gel electrophoresis.

The invention also relates to vectors comprising the polynucleotides of the invention, to host cells genetically engineered with the vectors of the invention or with the coding sequences of the polypeptides of the invention, and to methods of producing the polypeptides of the invention by recombinant techniques.

The polynucleotide sequences of the invention can be used to express or produce the recombinant TET fragments of the invention by conventional recombinant DNA techniques. In general, this involves the following steps:

(1) transforming or transducing a suitable host cell with a polynucleotide (or variant) encoding a TET fragment of the invention, or with a recombinant expression vector containing the polynucleotide;

(2) culturing the host cell in a suitable medium;

(3) isolating and purifying the protein from the medium or the cells.

In the invention, the polynucleotide sequence of the TET fragment can be inserted into a recombinant expression vector. The term "recombinant expression vector" refers to bacterial plasmids, bacteriophages, yeast plasmids, plant cell viruses, mammalian cell viruses such as adenoviruses and retroviruses, or other vectors well known in the art. Any plasmid or vector can be used as long as it can replicate and remain stable in the host. An important feature of expression vectors is that they usually contain an origin of replication, a promoter, a marker gene, and translation control elements.

Methods well known to those skilled in the art can be used to construct expression vectors containing a TET-encoding DNA sequence and suitable transcription/translation control signals. These methods include in vitro recombinant DNA techniques, DNA synthesis techniques, in vivo recombination techniques, and the like. The DNA sequence can be operably linked to a suitable promoter in the expression vector to direct mRNA synthesis. Representative examples of such promoters are: the lac or trp promoter of E. coli; the PL promoter of lambda phage; eukaryotic promoters including the CMV immediate early promoter, the HSV thymidine kinase promoter, the early and late SV40 promoters, and retroviral LTRs; and other promoters known to control gene expression in prokaryotic or eukaryotic cells or their viruses. The expression vector also includes a ribosome binding site for translation initiation and a transcription terminator.

In addition, the expression vector preferably contains one or more selectable marker genes to provide a phenotypic trait for selecting transformed host cells, such as dihydrofolate reductase, neomycin resistance, and green fluorescent protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E. coli.

A vector containing a suitable DNA sequence as described above, together with a suitable promoter or control sequence, can be used to transform a suitable host cell so that it can express the protein.

The host cell may be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell; or a higher eukaryotic cell, such as a mammalian cell. Representative examples are: bacterial cells such as E. coli, Streptomyces, and Salmonella typhimurium; fungal cells such as yeast; plant cells; insect cells such as Drosophila S2 or Sf9; and animal cells such as CHO, COS, 293 cells, or Bowes melanoma cells.

When the polynucleotide of the present invention is expressed in higher eukaryotic cells, transcription is enhanced if an enhancer sequence is inserted into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 base pairs long, that act on a promoter to enhance gene transcription. Examples include the SV40 enhancer of 100 to 270 base pairs on the late side of the replication origin, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.

Those of ordinary skill in the art know how to select suitable vectors, promoters, enhancers and host cells.

Transformation of host cells with recombinant DNA can be carried out by conventional techniques well known to those skilled in the art. When the host is a prokaryote such as Escherichia coli, competent cells capable of taking up DNA can be harvested after the exponential growth phase and treated by the CaCl₂ method, using steps well known in the art. Another method uses MgCl₂. If necessary, transformation can also be carried out by electroporation. When the host is a eukaryote, the following DNA transfection methods can be used: calcium phosphate co-precipitation, or conventional mechanical methods such as microinjection, electroporation and liposome packaging.

The transformants obtained can be cultured by conventional methods to express the polypeptide encoded by the gene of the present invention. Depending on the host cell used, the culture medium can be selected from various conventional media. Culture is carried out under conditions suitable for growth of the host cell. When the host cells have grown to a suitable density, the selected promoter is induced by a suitable method (such as a temperature shift or chemical induction), and the cells are cultured for a further period.

The recombinant polypeptide in the above methods can be expressed inside the cell or on the cell membrane, or secreted outside the cell. If necessary, the recombinant protein can be isolated and purified by various separation methods that exploit its physical, chemical and other properties. These methods are well known to those skilled in the art. Examples include, but are not limited to: conventional renaturation treatment, treatment with a protein precipitant (salting out), centrifugation, osmotic lysis, sonication, ultracentrifugation, size-exclusion chromatography (gel filtration), adsorption chromatography, ion-exchange chromatography, high-performance liquid chromatography (HPLC), various other liquid chromatography techniques, and combinations of these methods.

The present invention is illustrated below by way of embodiments. It should be understood that these embodiments are illustrative only and not limiting. Unless otherwise stated, the reagents mentioned in the embodiments are conventional, commercially available reagents.

Embodiment 1: Construction and purification of the human TET2 catalytic domain truncated protein (1129-1936 del 1481-1843):

Using TET2 cDNA as the template, fragments 1129-1480 and 1844-1936 were amplified separately, the first with the 1129 forward primer and 1481 reverse primer and the second with the 1843 forward primer and 1936 reverse primer (DNA polymerase; initial denaturation at 95 °C for 3 minutes; denaturation at 95 °C for 15 seconds, annealing at 55 °C for 30 seconds, extension at 72 °C for 1 minute). The two fragments were mixed in equimolar amounts and amplified with the 1129 forward primer and 1936 reverse primer (DNA polymerase; initial denaturation at 95 °C for 3 minutes; denaturation at 95 °C for 15 seconds, annealing at 55 °C for 30 seconds, extension at 72 °C for 1 minute 20 seconds) to obtain the target fragment 1129-1936 del 1481-1843.
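The size of the fused fragment follows directly from the residue ranges given above. The short Python sketch below works through that arithmetic; the function names are illustrative only, and the base-pair figure counts codons alone (primer tails, restriction sites and any stop codon are not included, since they are not specified here).

```python
# Residue ranges (inclusive, 1-based) taken from the text above.
N_TERMINAL = (1129, 1480)
C_TERMINAL = (1844, 1936)
DELETED = (1481, 1843)


def span_length(span):
    """Number of residues in an inclusive 1-based range."""
    start, end = span
    return end - start + 1


def fused_construct(n_part, c_part):
    """Return (amino acids, coding base pairs) for the fused fragment.

    The base-pair count covers codons only; primer tails, restriction
    sites and any stop codon are not included.
    """
    residues = span_length(n_part) + span_length(c_part)
    return residues, residues * 3


residues, base_pairs = fused_construct(N_TERMINAL, C_TERMINAL)
print(residues, base_pairs, span_length(DELETED))
```

Under these assumptions the construct comprises 445 residues (352 + 93), encoded by 1335 bp, with 363 residues removed from the 1129-1936 region.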

The PCR product and pGEX-6p-1 were each double-digested with BamH I/Xho I (Takara) at 37 °C, recovered, and ligated at room temperature. After the construct was verified, pGEX-6p-1-TET2 (1129-1936 del 1481-1843) was transformed into the E. coli BL21(DE3) expression strain and spread on fresh LB plates containing ampicillin (one part per thousand). After single colonies had grown, a single colony was picked into 100 ml of LB liquid medium containing ampicillin and cultured overnight at 37 °C; the overnight culture was then transferred at a ratio of 1:100 into 1 L of 2×YT liquid medium for expansion. When the culture reached OD₆₀₀ = 0.6-0.8, it was cooled to 15 °C, and isopropyl-β-D-thiogalactoside (IPTG) was added to a final concentration of 0.1 mM. After about 16 hours of further culture at this temperature, the cells were collected by centrifugation (4000 rpm, 15 min).
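The inoculation ratio and inducer concentration above reduce to simple dilution arithmetic (C1·V1 = C2·V2). The following is a minimal sketch of that calculation; the 1 M IPTG stock concentration is an assumption made purely for illustration and is not stated in this application.

```python
def stock_volume_ml(final_mM, culture_ml, stock_mM):
    """Volume of stock (ml) needed so that culture_ml reaches final_mM.

    Uses C1*V1 = C2*V2 and neglects the small volume added by the stock.
    """
    return final_mM * culture_ml / stock_mM


def inoculum_volume_ml(culture_ml, ratio=100):
    """Overnight culture (ml) for a 1:ratio inoculation."""
    return culture_ml / ratio


CULTURE_ML = 1000.0        # 1 L of 2xYT, from the text
IPTG_FINAL_MM = 0.1        # final concentration, from the text
IPTG_STOCK_MM = 1000.0     # hypothetical 1 M stock (assumption)

iptg_ml = stock_volume_ml(IPTG_FINAL_MM, CULTURE_ML, IPTG_STOCK_MM)
inoculum_ml = inoculum_volume_ml(CULTURE_ML)
print(f"IPTG stock: {iptg_ml * 1000:.0f} ul, inoculum: {inoculum_ml:.0f} ml")
```

With the assumed 1 M stock, about 100 μl of IPTG would be added per litre, and the 1:100 ratio corresponds to 10 ml of overnight culture.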

The cells were resuspended in lysis buffer (1× PBS, 150 mM NaCl, 5 mM imidazole), and DNase (Sigma) was added to a final concentration of 10 μg/ml. The cells were disrupted under high pressure (1500 bar) and centrifuged at high speed (12000 rpm, 4 °C, 25 min), and the pellet was discarded. The supernatant was loaded onto an equilibrated Ni²⁺ column (GE Healthcare), which was washed with wash buffer (1× PBS, 150 mM NaCl, 20 mM imidazole) until no contaminating protein flowed out. Samples were run on 12% SDS-PAGE and detected by Coomassie brilliant blue staining.

1-2 ml of laboratory-made 3C protease solution was added to the Ni²⁺ column carrying the bound protein, and cleavage proceeded overnight.

The cleaved TET2 protein was eluted with low-salt buffer containing a low concentration of imidazole (20 mM Tris pH 8.0, 50 mM NaCl, 20 mM imidazole), diluted with low-salt buffer (20 mM Tris pH 8.5, 3 mM DTT), and adjusted to pH 8.5. It was loaded at a flow rate of 8 ml/min onto a prepacked high-resolution Source Q anion-exchange column (GE Healthcare) on an AKTA purifier fast protein liquid chromatography system (GE Healthcare). The protein was eluted with a continuous gradient from low-salt buffer (20 mM Tris-HCl pH 8.5, 3 mM dithiothreitol (DTT)) to high-salt buffer (20 mM Tris-HCl pH 8.5, 1 M NaCl, 3 mM DTT). Protein peaks appeared; samples taken at the peak positions were run on 12% SDS-PAGE and detected by Coomassie brilliant blue staining.

The protein collected from ion-exchange purification was concentrated and centrifuged at high speed, and the supernatant was purified on a prepacked Superdex 200 size-exclusion column (GE Healthcare), eluting with buffer containing 10 mM HEPES pH 7.0, 100 mM NaCl and 10 mM β-mercaptoethanol at a flow rate of 0.5 ml/min. The peak fractions from size-exclusion chromatography were pooled, concentrated to 50 mg/ml, divided into aliquots of 50-100 μl per tube, and stored frozen at -80 °C until use.

Embodiment 2: Construction and purification of the human TET1 catalytic domain truncated protein (1418-2136)

Using TET1 cDNA as the template, fragment 1418-2136 was amplified with the 1418 forward primer and 2136 reverse primer (DNA polymerase; initial denaturation at 95 °C for 3 min; denaturation at 95 °C for 15 s, annealing at 55 °C for 30 s, extension at 72 °C for 2 min). The PCR product and pGEX-6p-1 were each double-digested with BamH I/Xho I (Takara) at 37 °C, recovered, and ligated at room temperature. After the construct was verified, pGEX-6p-1-TET1 (1418-2136) was transformed into the E. coli Rosetta expression strain and spread on fresh LB plates containing both ampicillin and chloramphenicol. After single colonies had grown, a single colony was picked into 100 ml of LB liquid medium containing both ampicillin and chloramphenicol and cultured overnight at 37 °C; the overnight culture was then transferred at a ratio of 1:100 into 1 L of 2×YT liquid medium for expansion. When the culture reached OD₆₀₀ = 0.6-0.8, it was cooled to 15 °C, and isopropyl-β-D-thiogalactoside (IPTG) was added to a final concentration of 0.1 mM. After about 16 hours of further culture at this temperature, the cells were collected by centrifugation (4000 rpm, 15 min).

The cleaved TET1 protein was eluted with low-salt buffer containing a low concentration of imidazole (20 mM Tris pH 8.0, 50 mM NaCl, 20 mM imidazole), diluted with low-salt buffer (20 mM Tris pH 8.5, 3 mM DTT), and adjusted to pH 8.5. It was loaded at a flow rate of 8 ml/min onto a prepacked high-resolution Source Q anion-exchange column (GE Healthcare) on an AKTA purifier fast protein liquid chromatography system (GE Healthcare). The protein was eluted with a continuous gradient from low-salt buffer (20 mM Tris pH 8.5, 3 mM DTT) to high-salt buffer (20 mM Tris pH 8.5, 1 M NaCl, 3 mM DTT). Protein peaks appeared; samples taken at the peak positions were run on 12% SDS-PAGE and detected by Coomassie brilliant blue staining.

The protein collected from ion-exchange purification was concentrated and centrifuged at high speed, and the supernatant was purified on a prepacked Superdex 200 size-exclusion column (GE Healthcare), eluting with buffer containing 10 mM HEPES pH 7.0, 100 mM NaCl and 10 mM β-mercaptoethanol at a flow rate of 0.5 ml/min. The peak fractions from size-exclusion chromatography were pooled, concentrated to 50 mg/ml, divided into aliquots of 50-100 μl per tube, and stored frozen at -80 °C until use.

Embodiment 3: Construction and purification of the human TET3 catalytic domain truncated protein (663-1660 del 1042-1500)

Using TET3 cDNA as the template, fragments 663-1041 and 1501-1660 were amplified separately, the first with the 663 forward primer and 1042 reverse primer and the second with the 1500 forward primer and 1660 reverse primer (DNA polymerase; initial denaturation at 95 °C for 3 min; denaturation at 95 °C for 15 s, annealing at 55 °C for 30 s, extension at 72 °C for 1 min). The two fragments were mixed in equimolar amounts and amplified with the 663 forward primer and 1660 reverse primer (DNA polymerase; initial denaturation at 95 °C for 3 min; denaturation at 95 °C for 15 s, annealing at 55 °C for 30 s, extension at 72 °C for 2 min) to obtain the target fragment 663-1660 del 1042-1500. The PCR product and pGEX-6p-1 were each double-digested with BamH I/Xho I (Takara) at 37 °C, recovered, and ligated at room temperature. After the construct was verified, pGEX-6p-1-TET3 (663-1660 del 1042-1500) was transformed into the E. coli BL21(DE3) expression strain and spread on fresh LB plates containing ampicillin. After single colonies had grown, a single colony was picked into 100 ml of LB liquid medium containing ampicillin and cultured overnight at 37 °C; the overnight culture was then transferred at a ratio of 1:100 into 1 L of 2×YT liquid medium for expansion. When the culture reached OD₆₀₀ = 0.6-0.8, it was cooled to 15 °C, and isopropyl-β-D-thiogalactoside (IPTG) was added to a final concentration of 0.1 mM. After about 16 hours of further culture at this temperature, the cells were collected by centrifugation (4000 rpm, 15 min).

The cleaved TET3 protein was eluted with low-salt buffer containing a low concentration of imidazole (20 mM Tris pH 8.0, 50 mM NaCl, 20 mM imidazole), diluted with low-salt buffer (20 mM Bis-Tris pH 6.0, 3 mM DTT), and adjusted to pH 6.0. It was loaded at a flow rate of 8 ml/min onto a prepacked high-resolution Source S cation-exchange column (GE Healthcare) on an AKTA purifier fast protein liquid chromatography system (GE Healthcare). The protein was eluted with a continuous gradient from low-salt buffer (20 mM Bis-Tris pH 6.0, 3 mM DTT) to high-salt buffer (20 mM Bis-Tris pH 6.0, 1 M NaCl, 3 mM DTT). Protein peaks appeared; samples taken at the peak positions were run on 12% SDS-PAGE and detected by Coomassie brilliant blue staining.

Embodiment 4: Construction and purification of the mTet1 catalytic domain truncated protein (1367-2039)

Using TET1 cDNA as the template, fragment 1367-2039 was amplified with the 1367 forward primer and 2039 reverse primer (DNA polymerase; initial denaturation at 95 °C for 3 min; denaturation at 95 °C for 15 s, annealing at 55 °C for 30 s, extension at 72 °C for 2 min). The PCR product and pGEX-6p-1 were each double-digested with BamH I/Not I (Takara) at 37 °C, recovered, and ligated at room temperature. After the construct was verified, pGEX-6p-1-TET1 (1367-2039) was transformed into the E. coli Rosetta expression strain and spread on fresh LB plates containing both ampicillin and chloramphenicol. After single colonies had grown, a single colony was picked into 100 ml of LB liquid medium containing both ampicillin and chloramphenicol and cultured overnight at 37 °C; the overnight culture was then transferred at a ratio of 1:100 into 1 L of 2×YT liquid medium for expansion. When the culture reached OD₆₀₀ = 0.6-0.8, it was cooled to 15 °C, and isopropyl-β-D-thiogalactoside (IPTG) was added to a final concentration of 0.1 mM. After about 16 hours of further culture at this temperature, the cells were collected by centrifugation (4000 rpm, 15 min).

Embodiment 5: Crystallization, crystal data collection and structure determination of the TET2 (1129-1936 del 1481-1843) protein:

Freshly purified TET2 protein was screened at 4 °C by sitting-drop vapor diffusion using crystallization kits from Hampton Research (Crystallization Screening Kit I, II; Salt I, II; PEG/Ion I, II, etc.; 432 conditions in total). Protein, DNA and NOG were mixed at a molar ratio of 1:1:3, and clusters of plate-like crystals were obtained in 0.1 M MES (pH 6.3), 25% PEG monomethyl ether 2000. Preliminary diffraction analysis of the crystals was carried out on a Bruker Cu-target rotating-anode X-ray generator at a voltage of 45 kV and a current of 60 mA, wavelength [value missing]. Data were collected under cryogenic conditions, with an Oxford cryosystem maintaining a temperature of 100 K. The cryoprotectant for the complex crystals was a buffer containing 0.1 M MES (pH 6.3) and 25% PEG monomethyl ether 2000. Each sample was picked up with a nylon loop, soaked in the cryoprotectant for 2-3 s, and quickly transferred into the 100 K nitrogen stream. Crystals that diffracted well, selected in the X-ray room, were stored in liquid nitrogen.

The initially collected data were processed with the HKL2000 software package (HKL Research, Inc.). The TET-DNA complex crystal belongs to space group C222₁, with a=48.3, b=88.2, c=263.0, α=β=γ=90°.
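As a quick consistency check on the cell parameters above, the unit cell volume can be computed from the general cell-volume formula, which reduces to a·b·c when all three angles are 90°. The sketch below assumes that the cell lengths are given in ångströms, which the text does not state explicitly; the function name is illustrative.

```python
import math


def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume from cell lengths and angles (degrees).

    General triclinic formula; for alpha = beta = gamma = 90 degrees it
    reduces to a * b * c. The result is in the cube of the length unit.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


# Cell parameters from the text; lengths assumed to be in angstroms.
volume = unit_cell_volume(48.3, 88.2, 263.0)
print(f"{volume:.0f}")
```

For these parameters the volume is about 1.12 × 10⁶ in cubic length units (Å³ under the stated assumption).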

Crystal data collection and statistics

Zn-SAD data were collected to solve the phases, and the structure of the whole protein complex was then determined.

The results are shown in Figures 2 and 3.

Embodiment 6: Verification of the hydroxymethylase activity of different TET2 truncated proteins:

Dot blot assay in cells

The dot blot assay in cells used different segments of human TET2 protein fused to a Flag tag, including full-length TET2 (1-2002), the 970-2002, 1129-2002, 1156-1936 and 1129-1913 segments, and the TET2 1129-1936 segment with different inserted sequences deleted (1481-1843, 1462-1843, 1449-1843). The full-length and truncated TET2 constructs were transfected into human HEK293T (human embryonic kidney) cells, and the cells were collected after 42 hours. Genomic DNA was extracted from the collected cells, denatured by heating, and spotted in a two-fold dilution series onto a Hybond-N+ nitrocellulose membrane (Millipore). After UV cross-linking, the membrane was blocked for one hour at room temperature in TBST buffer (10 mM Tris pH 7.4, 150 mM NaCl, 0.1% Tween 20) containing 5% milk, and then incubated with 5-hmC antibody (Active Motif) overnight at 4 °C. The membrane was washed three times with TBST buffer and incubated with horseradish peroxidase (HRP)-conjugated goat anti-rabbit antibody (Abmart), after which a chemiluminescence detection system was used to detect the signal of each dot. The negative control used genomic DNA extracted from HEK293T cells transfected with the empty vector; no obvious 5-hmC could be detected on the membrane, and thus no visible chemiluminescence was observed. A clearly visible increase in signal indicates that 5-hmC was generated, which means that the corresponding TET protein has hydroxymethylase activity in cells. In this experiment, an antibody against the Flag tag was also used to detect the presence of the full-length and truncated TET2 proteins transfected into the HEK293T cells.
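The two-fold dilution series used for spotting can be laid out as follows. This is only an illustrative sketch: the starting amount of 400 ng and the number of dots are hypothetical values chosen for the example, since neither is given in the text.

```python
def twofold_series(start_amount, n_points):
    """Amounts for an n-point two-fold serial dilution, highest first."""
    return [start_amount / 2 ** i for i in range(n_points)]


# Hypothetical values for illustration: 400 ng in the first dot, 5 dots.
series = twofold_series(400.0, 5)
print(series)
```

Each dot carries half the DNA of the previous one, so a signal that fades roughly in step with the series supports a dose-dependent 5-hmC signal rather than background.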

The results are shown in Figure 1.

In vitro enzyme activity assay:

0.5 μg of the 58-bp DNA substrate was mixed with 2 μg of each TET truncated protein and reacted for 1 h at 37 °C in a reaction buffer containing 50 mM HEPES pH 8.0, 100 mM NaCl, 100 μM Fe(NH₄)₂(SO₄)₂, 2 mM ascorbate, 1 mM DTT and 1 mM ATP. The DNA product was extracted, denatured by heating at 100 °C for 10 min, and digested with 0.5 U nuclease P1 (Sigma Aldrich) at 37 °C for 16 h; 0.5 U CIP (NEB) was then added and the reaction continued at 37 °C for 1.5 h. Samples were analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) on a Shimadzu LC (LC-20AB pump) system. Standard curves were plotted from the mass signals of base standards (5-mC, 5-hmC, 5-fC, 5-caC and G), and the linear equations fitted to the standard curves were used to calculate, as percentages relative to G, the amounts of 5-mC and of the 5-hmC, 5-fC and 5-caC generated in the reaction (because G pairs with C and their amounts are equal, G represents the total of all modified and unmodified C in the DNA). The 58-bp DNA substrate contains 28 C and 2 5-mC.
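The quantification step described above (fit a line to the standards, convert each sample signal to an amount, then express it relative to G) can be sketched as below. The standard-curve numbers are made-up placeholders, not data from this application, and the function names are illustrative. The last line also works out the 5-mC fraction expected for the unreacted substrate, on the assumption that "28 C and 2 5-mC" means 30 cytosines in total.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve to recover an amount from a signal."""
    return (signal - intercept) / slope


def percent_of_g(base_amount, g_amount):
    """Express a base amount as a percentage of G (total cytosine)."""
    return 100.0 * base_amount / g_amount


# Placeholder standard curve (amount vs. signal); not measured data.
std_amounts = [1.0, 2.0, 4.0, 8.0]
std_signals = [10.0, 20.0, 40.0, 80.0]
slope, intercept = fit_line(std_amounts, std_signals)
sample_amount = amount_from_signal(30.0, slope, intercept)

# Expected 5-mC share of the unreacted substrate: 2 of (28 + 2) cytosines.
expected_5mc_percent = percent_of_g(2, 28 + 2)
print(round(sample_amount, 3), round(expected_5mc_percent, 2))
```

In practice one curve would be fitted per base standard, and the 5-mC, 5-hmC, 5-fC and 5-caC amounts would each be divided by the G amount from the same sample.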

The results are shown in Figure 4. Because the amount of TET1 protein used was relatively low, its measured enzyme activity was relatively low; if the enzyme activity of the TET1 truncated protein is to be used, its amount can be increased appropriately.

Embodiment 7: Summary of truncation schemes for other proteins of the TET family:

Because the catalytic domain sequences of human TET1/TET2/TET3 and mouse Tet1/Tet2/Tet3 are highly conserved, the inventors designed and constructed other fragments of human TET1, TET2 and TET3 on the basis of this similarity and obtained stable, homogeneous proteins with hydroxymethylase activity. The following table lists the amino acid sequences of the different fragments of human TET1, TET2 and TET3 and mouse Tet1 prepared in this application, the vectors and strains used for expression, and the protein yields.

The primer sequences used in this application are listed below. For primer sequences not mentioned, those skilled in the art can readily design them with known software from the known TET coding sequences.

hTET1:

hTET1-1395 forward primer: SEQ ID NO:14

hTET1-1418 forward primer: SEQ ID NO:15

hTET1-2081 reverse primer: SEQ ID NO:16

hTET1-2136 reverse primer: SEQ ID NO:17

hTET1-1755 reverse primer: SEQ ID NO:18

hTET1-1772 reverse primer: SEQ ID NO:19

hTET1-1991 forward primer (1): SEQ ID NO:20

hTET1-1991 forward primer (2): SEQ ID NO:21

hTET2:

hTET2-1099 forward primer: SEQ ID NO:22

hTET2-1129 forward primer: SEQ ID NO:23

hTET2-1925 reverse primer: SEQ ID NO:24

hTET2-1936 reverse primer: SEQ ID NO:25

hTET2-1462 reverse primer: SEQ ID NO:26

hTET2-1481 reverse primer: SEQ ID NO:27

hTET2-1843 forward primer (1): SEQ ID NO:28

hTET2-1843 forward primer (2): SEQ ID NO:29

hTET3:

hTET3-663 forward primer: SEQ ID NO:30

hTET3-689 forward primer: SEQ ID NO:31

hTET3-717 forward primer: SEQ ID NO:32

hTET3-1596 reverse primer: SEQ ID NO:33

hTET3-1660 reverse primer: SEQ ID NO:34

hTET3-1042 reverse primer: SEQ ID NO:35

hTET3-1061 reverse primer: SEQ ID NO:36

hTET3-1500 forward primer: SEQ ID NO:37

mTET1:

mTET1-1367 forward primer: SEQ ID NO:38

mTET1-2039 reverse primer: SEQ ID NO:39.

Claims

1. A polypeptide selected from:

X-L-Y

In the formula,

L represents a linker sequence; and

(2) a polypeptide that has one or several amino acid insertions, substitutions or deletions in the polypeptide sequence of (1) but still retains DNA hydroxymethylase activity.

2. The polypeptide according to claim 1, characterized in that X is a-b-c, wherein a is amino acid residues n-1128 of SEQ ID NO:2, n being an integer from 1 to 1128; b is amino acid residues 1129-1448 of SEQ ID NO:2; and c is amino acid residues 1449-m of SEQ ID NO:2, m being an integer between 1450 and 1489;

Y is d-e-f, wherein d is optionally present amino acid residues g-1843 of SEQ ID NO:2; e is amino acid residues 1844-1925 of SEQ ID NO:2; and f is amino acid residues 1926-h of SEQ ID NO:2, wherein g is an integer between 1831 and 1842 and h is an integer between 1927 and 2002.

3. A polypeptide selected from:

(1) a polypeptide whose amino acid sequence is represented by the following formula:

X′-L-Y′

In the formula,

L represents a linker sequence; and

4. A polypeptide selected from:

(1) a polypeptide whose amino acid sequence is represented by the following formula:

X″-L-Y″

In the formula,

L represents a linker sequence; and

5. The polypeptide according to any one of claims 1-4, characterized in that, in said polypeptide:

(a) X is amino acid residues 1099-1480 of SEQ ID NO:2, and Y is amino acid residues 1844-1936 of SEQ ID NO:2;

(b) X is amino acid residues 1099-1461 of SEQ ID NO:2, and Y is amino acid residues 1844-1925 of SEQ ID NO:2;

(c) X is amino acid residues 1129-1480 of SEQ ID NO:2, and Y is amino acid residues 1844-1936 of SEQ ID NO:2;

(d) X is amino acid residues 1129-1461 of SEQ ID NO:2, and Y is amino acid residues 1844-1925 of SEQ ID NO:2;

(e) X′ is amino acid residues 1395-1754 of SEQ ID NO:1, and Y′ is amino acid residues 1991-2136 of SEQ ID NO:1;

(f) X′ is amino acid residues 1418-1754 of SEQ ID NO:1, and Y′ is amino acid residues 1991-2136 of SEQ ID NO:1;

(g) X′ is amino acid residues 1418-1754 of SEQ ID NO:1, and Y′ is amino acid residues 1991-2081 of SEQ ID NO:1;

(h) X′ is amino acid residues 1418-1772 of SEQ ID NO:1, and Y′ is amino acid residues 1991-2081 of SEQ ID NO:1;

(i) X″ is amino acid residues 663-1041 of SEQ ID NO:3, and Y″ is amino acid residues 1501-1596 of SEQ ID NO:3;

(j) X″ is amino acid residues 663-1041 of SEQ ID NO:3, and Y″ is amino acid residues 1501-1660 of SEQ ID NO:3;

(k) X″ is amino acid residues 689-1060 of SEQ ID NO:3, and Y″ is amino acid residues 1501-1596 of SEQ ID NO:3; or

(l) X″ is amino acid residues 717-1060 of SEQ ID NO:3, and Y″ is amino acid residues 1501-1596 of SEQ ID NO:3.

6. The polypeptide according to any one of claims 1-5, characterized in that L is a linker sequence 2-20 amino acid residues in length containing G and S.

7. A polynucleotide sequence selected from:

(1) a polynucleotide sequence encoding the polypeptide according to any one of claims 1-6; and

(2) the complementary sequence of the polynucleotide sequence of (1).

8. A vector comprising the polynucleotide sequence according to claim 7.

9. A host cell comprising the vector according to claim 8.

10. Use of the polypeptide according to any one of claims 1-6 in detecting whether 5-hmC is present in a DNA sequence, in catalyzing the oxidation of 5-mC in a DNA sequence, and in studying the mechanism of interaction between TET proteins and DNA.