CN109384833A - TALE RVDs that specifically recognize methylated DNA bases, and uses thereof - Google Patents

TALE RVDs that specifically recognize methylated DNA bases, and uses thereof

Info

Publication number
CN109384833A
Authority
CN
China
Prior art keywords
tale
rvd
target sequence
fusion protein
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710660240.8A
Other languages
Chinese (zh)
Other versions
CN109384833B (en)
Inventor
魏文胜
伊成器
张媛
郭生杰
朱晨旭
刘璐璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boya Series For (beijing) Biotechnology Co Ltd
Peking University
Original Assignee
Boya Series For (beijing) Biotechnology Co Ltd
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Boya Series For (beijing) Biotechnology Co Ltd, Peking University
Priority to CN201710660240.8A
Publication of CN109384833A
Application granted
Publication of CN109384833B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/34 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase
    • C12Q1/44 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving hydrolase involving esterase
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/90 Enzymes; Proenzymes
    • G01N2333/914 Hydrolases (3)
    • G01N2333/916 Hydrolases (3) acting on ester bonds (3.1), e.g. phosphatases (3.1.3), phospholipases C or phospholipases D (3.1.4)
    • G01N2333/922 Ribonucleases (RNAses); Deoxyribonucleases (DNAses)

Abstract

The present invention identifies RVDs that have recognition preferences for 5mC, 5hmC and 6mA and that show distinct binding characteristics toward these epigenetic modifications. These RVDs enable applications such as methylation-dependent gene activation, high-efficiency genome editing, and targeted detection of 5hmC. Accordingly, the present invention provides isolated DNA-binding polypeptides comprising a TALE, fusion proteins, polynucleotides, vectors and host cells comprising the polynucleotides, and the use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting methylated bases in a target sequence of a target gene. The present invention also provides methods for targeted binding to a target sequence of a target gene in a cell.

Description

TALE RVDs that specifically recognize methylated DNA bases, and uses thereof
Field of the invention
The present invention relates to techniques for regulating, editing and detecting genes using DNA-binding proteins.
Background
Transcription activator-like effectors (TALEs) are virulence factors from the plant pathogen Xanthomonas that can reprogram eukaryotic genomes (1,2). A TALE contains a DNA-binding domain composed of a variable number of tandem repeat units (3). Each repeat comprises a consensus sequence of 33-35 amino acid residues, except for two highly variable amino acids at positions 12 and 13 (the repeat-variable diresidue, or RVD) (4,5). DNA recognition by TALE proteins is mediated by the tandem repeat units: each repeat unit binds its target nucleotide through its RVD, and the RVDs determine nucleotide specificity in a sequence-specific manner (4,6). The RVDs make direct contact with the DNA bases. Owing to this modular, sequence-specific DNA recognition, TALEs can be fused to functional domains such as transcriptional activators (7,8), transcriptional repressors (9,10) or endonucleases (11,12), and serve as programmable gene-editing tools. Some studies have partially decoded the RVD-DNA recognition code using experimental and computational methods (4,6); the four most common RVDs (NI, NG, HD and NN) were found to bind preferentially to A, T, C and G/A, respectively (4,6).
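The canonical recognition code described above can be written as a small lookup table. The following Python sketch is illustrative only: the table contents come from the preceding paragraph, while the helper names `rvds_for_sequence` and `rvd_array_matches` are hypothetical and not part of the original disclosure.

```python
# Canonical RVD preferences as stated in the text:
# NI -> A, NG -> T, HD -> C, NN -> G or A.
CANONICAL_RVD_TARGETS = {
    "NI": {"A"},
    "NG": {"T"},
    "HD": {"C"},
    "NN": {"G", "A"},
}

# One preferred RVD per unmodified base (illustrative choice: NN is used for G).
PREFERRED_RVD = {"A": "NI", "T": "NG", "C": "HD", "G": "NN"}


def rvds_for_sequence(target: str) -> list[str]:
    """Return one canonical RVD per base of the target (hypothetical helper)."""
    return [PREFERRED_RVD[base] for base in target.upper()]


def rvd_array_matches(rvds: list[str], target: str) -> bool:
    """True if every RVD preferentially binds the base at its position."""
    target = target.upper()
    if len(rvds) != len(target):
        return False
    return all(base in CANONICAL_RVD_TARGETS[rvd] for rvd, base in zip(rvds, target))


if __name__ == "__main__":
    print(rvds_for_sequence("ATCG"))
```

Because NN is listed as binding both G and A, a check such as `rvd_array_matches(["NN"], "A")` also succeeds in this sketch.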
In addition to the four canonical deoxyribonucleotides, mammalian genomes also contain modified DNA bases. For example, 5-methylcytosine (5mC), known as the fifth DNA base, is an important epigenetic mark that regulates gene expression (Figure 1A) (15,16). 5mC can be successively oxidized by ten-eleven translocation (TET) family proteins to produce 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC); the latter two are substrates of thymine DNA glycosylase and are ultimately restored to unmodified cytosine (17,22). 5hmC constitutes ~1%-10% of modified cytosines and is considered a stable epigenetic mark; dysregulation of 5hmC is frequently observed in cancer.
In addition to methylation of cytosine, another common DNA methylation modification is N6-methyladenine (6mA), a covalent modification of adenine in DNA that plays important roles in prokaryotic cells. It participates in regulating a variety of biological pathways, including defense against invading foreign DNA as part of restriction-modification (RM) systems, and it has regulatory roles in DNA replication, mismatch repair, gene transcription and transposition (41,47). By contrast, 6mA has been studied much less in eukaryotes, and its role in epigenetics is not yet fully understood (46).
In 2015, three articles in Cell reported 6mA in the genomes of eukaryotes such as Chlamydomonas, nematodes and Drosophila (42,43,48). Mapping of 6mA sites in Chlamydomonas reinhardtii showed that 6mA is distributed in most Chlamydomonas genes, mostly in an ApT dinucleotide context; 6mA is also enriched at transcription start sites and is associated with actively expressed genes (42). Studies in Drosophila melanogaster and the nematode Caenorhabditis elegans found that 6mA likely plays important regulatory roles in differentiation and development (48). The enzymes involved in 6mA methylation and demethylation were found to be evolutionarily conserved, so 6mA is very likely present in other eukaryotes as well (43). In 2016, Koziol et al. confirmed the presence of 6mA in vertebrate genomes; the samples included different tissues of the African clawed frog (Xenopus laevis) and tissues or cell lines from mouse and human. The abundance of 6mA in vertebrates is very low. Unlike in Chlamydomonas and Drosophila, 6mA in the Xenopus laevis and mouse genomes was found to be widely distributed in regions other than exons, while also showing certain sequence-motif patterns, suggesting that 6mA may have different functions in different eukaryotes (44). The distribution of this epigenetic modification in higher organisms, and its roles and mechanisms in cellular and organismal development, remain to be studied in depth.
TALE proteins have been reported to recognize modified DNA bases (24-26). For example, NG or N* (the asterisk denotes deletion of the amino acid at position 13) has been reported to recognize 5mC in cognate DNA (25,27-31), and combinations of NG/N* and HD have been used in in vitro assays to distinguish 5mC/5hmC from C (32). Recent studies have also reported that TALE proteins with truncated repeat loops (G*, S* and T*) can bind C, 5mC, 5hmC, 5fC and 5caC with similar affinity (33,34). In crystal structures of TALE-DNA complexes, the RVD loop contacts the major groove of the DNA duplex; the first residue stabilizes the proper loop conformation, and the second residue makes the direct base-specific contact (35,36). At present, the potential of RVDs to recognize 5mC, 5hmC and 6mA remains to be further studied.
Summary of the invention
The present invention identifies RVDs that have recognition preferences for 5mC, 5hmC and 6mA and that show distinct binding characteristics toward these epigenetic modifications. These RVDs enable applications such as methylation-dependent gene activation, high-efficiency genome editing, and targeted detection of 5hmC.
According to one aspect of the present invention, there is provided an isolated DNA-binding polypeptide comprising a TALE, the TALE comprising one or more RVDs selected from the following:
HA or NA, which specifically recognize 5mC;
FS, which specifically recognizes 5hmC;
N*, NG or KP, which recognize both C and 5mC;
HV or KV, which recognize both C and 5hmC;
K* or RG, which recognize both 5mC and 5hmC;
G*, H*, R* or Y*, which recognize all three of C, 5mC and 5hmC;
NP, FT, CV or CP, which specifically recognize 6mA; or
RI, NI, KI or HI, which recognize both A and 6mA;
wherein * indicates that the amino acid at that position is deleted.
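The RVD list above amounts to a mapping from each RVD to the set of base states it recognizes. The following Python sketch encodes that mapping for illustration; the table contents are taken from the list, whereas the names `RVD_RECOGNITION`, `rvds_recognizing` and `is_selective` are hypothetical.

```python
# Base states recognized by each RVD, transcribed from the list above.
# "*" denotes a deleted amino acid at position 13.
RVD_RECOGNITION = {
    "HA": {"5mC"}, "NA": {"5mC"},
    "FS": {"5hmC"},
    "N*": {"C", "5mC"}, "NG": {"C", "5mC"}, "KP": {"C", "5mC"},
    "HV": {"C", "5hmC"}, "KV": {"C", "5hmC"},
    "K*": {"5mC", "5hmC"}, "RG": {"5mC", "5hmC"},
    "G*": {"C", "5mC", "5hmC"}, "H*": {"C", "5mC", "5hmC"},
    "R*": {"C", "5mC", "5hmC"}, "Y*": {"C", "5mC", "5hmC"},
    "NP": {"6mA"}, "FT": {"6mA"}, "CV": {"6mA"}, "CP": {"6mA"},
    "RI": {"A", "6mA"}, "NI": {"A", "6mA"}, "KI": {"A", "6mA"}, "HI": {"A", "6mA"},
}


def rvds_recognizing(base_state: str) -> set[str]:
    """All listed RVDs whose recognition set includes the given base state."""
    return {rvd for rvd, states in RVD_RECOGNITION.items() if base_state in states}


def is_selective(rvd: str) -> bool:
    """True if the RVD is listed as recognizing exactly one base state."""
    return len(RVD_RECOGNITION[rvd]) == 1


if __name__ == "__main__":
    print(sorted(rvds_recognizing("6mA")))
    print(is_selective("HA"), is_selective("NG"))
```

In this representation the selective RVDs (HA, NA, FS, NP, FT, CV, CP) are simply those whose recognition set has a single member.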
According to another aspect of the present invention, there is provided a fusion protein comprising a functional domain and a TALE, the TALE comprising one or more RVDs selected from the following:
HA or NA, which specifically recognize 5mC;
FS, which specifically recognizes 5hmC;
N*, NG or KP, which recognize both C and 5mC;
HV or KV, which recognize both C and 5hmC;
K* or RG, which recognize both 5mC and 5hmC;
G*, H*, R* or Y*, which recognize all three of C, 5mC and 5hmC;
NP, FT, CV or CP, which specifically recognize 6mA; or
RI, NI, KI or HI, which recognize both A and 6mA;
wherein * indicates that the amino acid at that position is deleted.
In some embodiments, the functional domain is a functional domain that regulates gene expression, an epigenetic modification functional domain, a gene-editing functional domain, or a fluorescent protein.
In some embodiments, the functional domain that regulates gene expression is a transcriptional activator, a transcriptional repressor, or a functional fragment thereof; the epigenetic modification functional domain is a methyltransferase, a demethylase, or a functional fragment thereof; and the gene-editing functional domain is a nuclease or a functional fragment thereof.
In some embodiments, the gene-editing functional domain is an endonuclease, preferably the FokI endonuclease, more preferably the DNA cleavage domain of the FokI endonuclease.
According to another aspect of the present invention, there is provided a polynucleotide encoding the above DNA-binding polypeptide or any of the above fusion proteins.
According to another aspect of the present invention, there is provided a vector comprising the above polynucleotide.
According to another aspect of the present invention, there is provided a host cell comprising the above polynucleotide or the above vector.
According to another aspect of the present invention, there is provided the use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting a methylated base in a target sequence of a target gene, comprising:
(1) use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting the methylated base 5mC in a target sequence of a target gene, wherein one or more RVDs of the TALE repeat domain are HA or NA;
(2) use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting the methylated base 5hmC in a target sequence of a target gene, wherein one or more RVDs of the TALE repeat domain are FS; or
(3) use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting the methylated base 6mA in a target sequence of a target gene, wherein one or more RVDs of the TALE repeat domain are NP, FT, CV or CP.
According to another aspect of the present invention, there is provided the use of the above DNA-binding polypeptide, any of the above fusion proteins, the above polynucleotide, the above vector or the above host cell in the preparation of a reagent for targeted binding to a target sequence of a target gene in a cell.
According to another aspect of the present invention, there is provided the use of any of the above fusion proteins, or a polynucleotide encoding the fusion protein, in the preparation of a reagent for regulating expression of a target gene in a cell, wherein the functional domain comprised in the fusion protein is a functional domain that regulates gene expression.
In some embodiments, the functional domain that regulates gene expression is a transcriptional activator or a functional fragment thereof, or a transcriptional repressor or a functional fragment thereof.
According to another aspect of the present invention, there is provided the use of any of the above fusion proteins, or a polynucleotide encoding the fusion protein, in the preparation of a reagent for gene editing of a target gene in a cell, wherein the functional domain comprised in the fusion protein is a gene-editing functional domain.
In some embodiments, the gene editing is nucleic acid cleavage, and the gene-editing functional domain is a nuclease or a functional fragment thereof, preferably an endonuclease or a functional fragment thereof, more preferably the FokI endonuclease or its DNA cleavage domain.
According to another aspect of the present invention, there is provided the use of any of the above fusion proteins, or a polynucleotide encoding the fusion protein, in the preparation of a reagent for epigenetic modification of a target gene in a cell, wherein the functional domain comprised in the fusion protein is an epigenetic modification functional domain.
In some embodiments, the epigenetic modification functional domain is a methyltransferase, a demethylase, or a functional fragment thereof.
According to another aspect of the present invention, there is provided a method for targeted binding to a target sequence of a target gene in a cell, comprising: introducing the above DNA-binding polypeptide, any of the above fusion proteins or the above polynucleotide into the cell, so that the TALE in the DNA-binding polypeptide or fusion protein binds the target sequence of the target gene.
In some embodiments, in the above method:
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from HA or NA, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5mC;
the TALE in the DNA-binding polypeptide or fusion protein comprises the RVD FS, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5hmC;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 6mA;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5mC;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5hmC;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be 5mC or 5hmC;
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C, 5mC or 5hmC; or
the TALE in the DNA-binding polypeptide or fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be A or 6mA;
wherein * indicates that the amino acid at that position is deleted.
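The embodiments above state which base states remain possible at a recognition site when a TALE carrying a given RVD binds. As a purely illustrative sketch, the Python code below looks up those candidate states for the cytosine-related RVDs. Intersecting the candidate sets from several bound TALEs that probe the same site is an assumption added here for illustration and is not described in the original text; the names `RECOGNIZED` and `candidate_states` are hypothetical.

```python
# Cytosine-related RVDs and the base states each is described as recognizing.
RECOGNIZED = {
    **dict.fromkeys(("HA", "NA"), frozenset({"5mC"})),
    "FS": frozenset({"5hmC"}),
    **dict.fromkeys(("N*", "NG", "KP"), frozenset({"C", "5mC"})),
    **dict.fromkeys(("HV", "KV"), frozenset({"C", "5hmC"})),
    **dict.fromkeys(("K*", "RG"), frozenset({"5mC", "5hmC"})),
    **dict.fromkeys(("G*", "H*", "R*", "Y*"), frozenset({"C", "5mC", "5hmC"})),
}

ALL_CYTOSINE_STATES = frozenset({"C", "5mC", "5hmC"})


def candidate_states(bound_rvds):
    """Base states consistent with binding by every RVD in bound_rvds.

    Assumption (not from the source): observations from different TALEs
    probing the same site can be combined by set intersection.
    """
    states = ALL_CYTOSINE_STATES
    for rvd in bound_rvds:
        states = states & RECOGNIZED[rvd]
    return set(states)


if __name__ == "__main__":
    print(candidate_states(["HA"]))        # selective RVD: a single state remains
    print(candidate_states(["NG"]))        # permissive RVD: state stays undetermined
    print(candidate_states(["NG", "K*"]))  # combined observations (illustrative)
```

With a selective RVD such as HA the candidate set has one member, which mirrors the "only when" wording; with a permissive RVD such as NG the set keeps two members, which mirrors the "undetermined" wording.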
According to another aspect of the present invention, there is provided a method for regulating expression of a target gene in a cell, comprising: introducing any of the above fusion proteins, or a polynucleotide encoding the fusion protein, into the cell, so that the TALE in the fusion protein binds the target sequence of the target gene and expression of the target gene is regulated by the functional domain in the fusion protein, wherein the functional domain is a functional domain that regulates gene expression.
In some embodiments, in the above method:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5mC;
the TALE in the fusion protein comprises the RVD FS, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be A or 6mA;
wherein * indicates that the amino acid at that position is deleted.
In some embodiments, in the above method, the functional domain that regulates gene expression is a transcriptional activator or a functional fragment thereof, or a transcriptional repressor or a functional fragment thereof.
According to another aspect of the present invention, there is provided a method for gene editing of a target gene in a cell, comprising: introducing any of the above fusion proteins, or a polynucleotide encoding the fusion protein, into the cell, so that the TALE in the fusion protein binds the target sequence of the target gene and the target gene is edited by the functional domain in the fusion protein, wherein the functional domain is a gene-editing functional domain.
In some embodiments, in the above method:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5mC;
the TALE in the fusion protein comprises the RVD FS, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be A or 6mA;
wherein * indicates that the amino acid at that position is deleted.
In some embodiments, in the above method, the gene editing is nucleic acid cleavage, and the gene-editing functional domain is a nuclease or a functional fragment thereof, preferably an endonuclease or a functional fragment thereof, more preferably the FokI endonuclease or its DNA cleavage domain.
According to another aspect of the present invention, there is provided a method for epigenetic modification of a target gene in a cell, comprising: introducing the fusion protein of any one of claims 2-5, or a polynucleotide encoding the fusion protein, into the cell, so that the TALE in the fusion protein binds the target sequence of the target gene and the target gene is epigenetically modified by the functional domain in the fusion protein, wherein the functional domain is an epigenetic modification functional domain.
In some embodiments, in the above method:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5mC;
the TALE in the fusion protein comprises the RVD FS, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be A or 6mA;
wherein * indicates that the amino acid at that position is deleted.
In some embodiments, in the above method, the epigenetic modification functional domain is a methyltransferase, a demethylase, or a functional fragment thereof.
According to another aspect of the present invention, there is provided a method for labeling chromosomes in living cells, comprising: introducing any of the above fusion proteins, or a polynucleotide encoding the fusion protein, into the cell, so that the TALE in the fusion protein binds the target sequence of the target gene, wherein the functional domain is a fluorescent protein, and binding of the TALE in the fusion protein to the target sequence of the target gene fluorescently labels the target sequence.
In some embodiments, in the above method:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5mC;
the TALE in the fusion protein comprises the RVD FS, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds the target sequence of the target gene only when the base at that RVD's recognition site in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at that RVD's recognition site in the target sequence of the target gene is undetermined and may be A or 6mA;
wherein * indicates that the amino acid at that position is deleted.
According to another aspect of the present invention, there is provided a method for detecting whether 5mC is present at a specific site of a target sequence in the genome of a cell, comprising:
(1) introducing a protein comprising a TALE into the cell, wherein the TALE targets the target sequence and the RVD in the TALE that recognizes the specific site is HA or NA;
(2) then introducing a nuclease into the cell, wherein the targeted cleavage site of the nuclease lies within the TALE target sequence;
(3) detecting whether the target sequence has been cleaved, and thereby determining whether 5mC is present at the specific site of the target sequence: if the target sequence has not been cleaved, the TALE has bound the target sequence so that the nuclease cannot bind and cleave it, and 5mC is present at the specific site; if the target sequence has been cleaved, the TALE has not bound the target sequence, the nuclease has bound and cleaved it, and 5mC is absent from the specific site.
According to another aspect of the present invention, there is provided a method for detecting whether 5hmC is present at a specific site of a target sequence in a cellular genome, comprising the following steps:
(1) introducing a protein comprising a TALE into a cell, wherein the TALE targets the target sequence and the RVD in the TALE that recognizes the specific site is FS;
(2) then introducing a nuclease into the cell, wherein the targeted cleavage site of the nuclease lies within the TALE target sequence;
(3) detecting whether the target sequence is cleaved, and thereby determining whether 5hmC is present at the specific site of the target sequence: if the target sequence is not cleaved, the TALE has bound the target sequence so that the nuclease cannot bind and cleave it, and 5hmC is present at the specific site; if the target sequence is cleaved, the TALE has not bound the target sequence, the nuclease has bound and cleaved it, and 5hmC is absent from the specific site.
According to another aspect of the present invention, there is provided a method for detecting whether 6mA is present at a specific site of a target sequence in a cellular genome, comprising:
(1) introducing a protein comprising a TALE into a cell, wherein the TALE targets the target sequence and the RVD in the TALE that recognizes the specific site is NP, FT, CV or CP;
(2) then introducing a nuclease into the cell, wherein the targeted cleavage site of the nuclease lies within the TALE target sequence;
(3) detecting whether the target sequence is cleaved, and thereby determining whether 6mA is present at the specific site of the target sequence: if the target sequence is not cleaved, the TALE has bound the target sequence so that the nuclease cannot bind and cleave it, and 6mA is present at the specific site; if the target sequence is cleaved, the TALE has not bound the target sequence, the nuclease has bound and cleaved it, and 6mA is absent from the specific site.
In some embodiments, the nuclease is an endonuclease.
In some embodiments, the nuclease is a Cas9 nuclease, and in step (1) the Cas9 nuclease is introduced into the cell together with an sgRNA.
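The read-out logic shared by the three detection methods above can be sketched as follows. This is only an illustration: the function name `modification_present` and the dictionary `DETECTION_RVDS` are hypothetical, and the sketch assumes an idealized all-or-nothing cleavage result rather than a measured protection efficiency.

```python
# RVDs that the methods above assign to each modification-specific detection.
DETECTION_RVDS = {
    "5mC": {"HA", "NA"},
    "5hmC": {"FS"},
    "6mA": {"NP", "FT", "CV", "CP"},
}


def modification_present(rvd: str, modification: str, target_cut: bool) -> bool:
    """Interpret step (3): an uncut target means the TALE bound the site,
    so the modification is present; a cut target means it is absent."""
    if rvd not in DETECTION_RVDS[modification]:
        raise ValueError(f"RVD {rvd} is not listed for detecting {modification}")
    return not target_cut


# Example: a TALE carrying FS at the site of interest, target left uncut.
print(modification_present("FS", "5hmC", target_cut=False))
```

In practice the cleavage result is quantitative, so a real analysis would compare protection efficiencies rather than a single yes/no value.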
Brief description of the drawings
Fig. 1 is a schematic of the screen for assessing the recognition of modified cytosines by all potential TALE RVDs. (a) Chemical structures of C, 5mC and 5hmC. (b) Schematic of the system for screening new RVDs against modified cytosines, consisting of a TALE activator and a GFP-expressing reporter DNA fragment. (c) When the customized TALE does not bind the reporter DNA fragment (left), for example TALE-(E*)3 with the 5mC reporter DNA fragment, GFP expression stays at baseline level (right); conversely, when the TALE binds the reporter DNA fragment tightly (left), for example TALE-(G*)3 with the 5mC reporter DNA fragment, GFP expression is up-regulated (right). mCherry intensity indicates the transfection efficiency of the TALE-(XX')3 plasmid.
Fig. 2 shows the preparation of reporter DNA fragments containing 5mC and 5hmC. 5mC and 5hmC were incorporated into the primers used to generate the reporter DNA fragments containing 5mC and 5hmC. HPLC chromatograms show the incorporation of (a) 5mC and (b) 5hmC; the 5hmC peak can be clearly seen in the enlarged view. (c) Schematic of the PCR amplification of the reporter DNA fragments containing 5mC and 5hmC.
Fig. 3 shows the complete assessment of the efficiency and specificity of TALE RVDs for 5mC and 5hmC.
(a) Screening data for 5mC and 5hmC summarized as a heat map. For ease of comparison, the results for conventional C and T reporter DNA fragments are also shown. Different colours indicate the identity of the reporter DNA fragment and encode the EGFP activity for the different reporter DNA fragments; the brightness of the colour indicates the fold induction of the reporter DNA fragment by the TALE construct, normalized to the baseline level. One-letter amino acid abbreviations are used.
(b) Confirmation results for RVDs picked from the preliminary screen in panel (a) that showed recognition of 5mC and 5hmC. Specifically, RVDs giving relatively large EGFP fold activation of the 5mC and 5hmC reporter systems were selected for confirmation experiments in triplicate; the panel shows the preference of each RVD for modified cytosines. RVDs are grouped by base preference and, within each group, by the residue at position 13. Data are shown as mean ± SD, n=3; *P < 0.05, **P < 0.005.
Fig. 4 shows the binding preferences of 420 TALE RVDs for modified cytosines. The data correspond to the heat map (Fig. 3a). The Y axis is the fold induction of the EGFP reporter and the X axis is the RVD. The bar charts are grouped by the first residue of the RVD, and within each chart the data are listed in alphabetical order of the second residue.
Fig. 5 shows the quantitative measurement of DNA recognition by TALE RVDs using the in vitro protection assay.
(a) Principle of the in vitro protection assay. Briefly, binding of a TALE protein (the TAL effector protein in the figure) to a DNA fragment of a specific sequence blocks the MspI restriction endonuclease site, thereby inhibiting cleavage by the endonuclease and giving rise to a protected full-length band and a cleaved DNA band in denaturing PAGE analysis. The protection efficiency of the DNA reflects the binding efficiency between the TALE protein and the DNA.
(b) Normalized protection efficiency, obtained by measuring the fraction of uncut (protected) DNA, was fitted to a protection curve for each TALE RVD. The curves were fitted with the specific-binding-with-Hill-slope model (GraphPad). All experiments were performed in triplicate.
(c) Inhibition constants calculated from (b); the ratio of each constant to the lowest inhibition constant of the same RVD is shown in parentheses. The inhibition constant of an RVD was obtained by running the cleavage protection assay on C, 5mC and 5hmC with TALE proteins containing different RVDs to obtain protection efficiencies, then fitting the protection efficiency curves with GraphPad Prism 6 software to calculate the inhibition constant. The constant characterizes the binding efficiency of each RVD for C, 5mC and 5hmC: the smaller the inhibition constant, the stronger the protection by that RVD and the stronger its binding to the corresponding DNA fragment. The lowest inhibition constant of a given RVD referred to here is the inhibition constant for whichever of C, 5mC and 5hmC the RVD binds most efficiently.
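As a rough illustration of the analysis described for Fig. 5(b) and (c), the sketch below writes out the commonly used form of a specific-binding model with Hill slope, Y = Bmax·X^h / (Kd^h + X^h), and the ratio of each constant to the lowest constant of the same RVD. The function names and the numerical constants are hypothetical placeholders, not values from the patent, and the actual fitting was done in GraphPad Prism rather than with this code.

```python
def hill_specific_binding(x: float, bmax: float, kd: float, h: float) -> float:
    """Specific binding with Hill slope: Y = Bmax * X^h / (Kd^h + X^h)."""
    return bmax * x**h / (kd**h + x**h)


def ratios_to_minimum(constants: dict) -> dict:
    """Ratio of each constant to the smallest constant of the same RVD."""
    smallest = min(constants.values())
    return {base: value / smallest for base, value in constants.items()}


# Hypothetical constants for one RVD (arbitrary units, illustration only).
example = {"C": 200.0, "5mC": 50.0, "5hmC": 400.0}
print(ratios_to_minimum(example))

# At X equal to Kd the model predicts half of Bmax, whatever the Hill slope.
print(hill_specific_binding(10.0, bmax=1.0, kd=10.0, h=2.0))
```

A ratio of 1 marks the base the RVD binds best; larger ratios mean weaker binding relative to that base.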
Fig. 6 shows the specific binding of different TALE RVDs to epigenetically modified cytosines in the in vitro protection assay.
(a) Representative size-exclusion chromatography of a purified TALE protein.
(b) SDS-PAGE analysis shows that the molecular weights of the purified TALE proteins agree well with the calculated molecular weights.
(c) Representative gel images of the in vitro protection assay. As can be seen from Fig. 6, MAPK6-HD protects C with the highest efficiency, HA protects 5mC with higher efficiency than unmodified C and 5hmC, and FS protects 5hmC with the highest efficiency.
Fig. 7 shows methylation-dependent activation of gene expression and gene editing.
(a) TALE_TET1 targets a 16-bp DNA sequence about 80 bp upstream of the transcription start site (TSS) of the TET1 gene. All three CpGs in this region (their C's are shown in black) are highly methylated in HeLa cells but unmethylated in HEK293T cells.
(b) Relative mRNA levels of TET1 in HeLa and HEK293T cells transfected with TALE_TET1 containing different RVDs.
(c) TALE_LRP2 targets a 16-bp sequence 100 bp upstream of the TSS of the LRP2 gene. The two CpGs in this region carry a medium level of methylation in HeLa cells and are unmethylated in HEK293T cells.
(d) Relative mRNA levels of LRP2 in HeLa and HEK293T cells transfected with TALE_LRP2 containing different RVDs.
(e) Positions of the sequences targeted by the TALENs (transcription activator-like effector nucleases, i.e. TALE effectors fused to the FokI endonuclease). Methylated CpGs are shown in red.
(f) Gene editing efficiency of TALENs using different RVDs. Data are mean ± SD, n=3; *P < 0.05, **P < 0.005.
Fig. 8 shows methylation-dependent activation of gene expression and genome editing.
(A) Relative mRNA levels of TET1 in HeLa and HEK293T cells transfected with TALE_TET1 carrying the RVD NA, G* or Y*.
(B) Relative mRNA levels of LRP2 in HeLa and HEK293T cells transfected with TALE_LRP2 carrying the RVD NA, G* or Y*.
(C) Genome editing efficiency of TALENs carrying the RVD NA, G* or Y*. Data are mean ± SD, n=3; *P < 0.05, **P < 0.005.
Fig. 9 shows the detection of 5hmC in genomic DNA at single-base resolution.
(a) Workflow for detecting 5hmC at single-base resolution with the newly identified RVD. Briefly, the targeted genomic region is protected by the TALE from Cas9-mediated DNA cleavage.
(b) Protection efficiency of TALE-FS (black) and TALE-HD (grey) targeting a single 5hmC site in the mESC genome.
(c) Protection efficiency of TALE-FS for a single 5hmC site in genomic DNA from mESC, RAW264.7, L-M(TK-) and L929 cells. At this specific site, the mESC genome carries the highest level of 5hmC modification among all the cell lines.
Figure 10 shows selective protection of 5hmC-containing DNA by TALE-FS. DNAs containing 5mC, 5hmC or unmodified C (with the same sequence as the MAPK6 gene) were mixed pairwise in different ratios. When the fraction of 5mC increased (grey circles), protection efficiency improved only slightly. When the fraction of 5hmC increased (mixed with C or 5mC; black circles and black triangles), protection efficiency improved greatly, indicating selective protection of 5hmC by the RVD FS.
Figure 11 shows the binding characteristics of TALE-(XX')3 for 6mA and A.
Figure 12 shows the binding characteristics of selected TALE-(XX')3 for 6mA and A. RVDs are grouped by their second amino acid, and within each group the RVDs are sorted from low to high activation efficiency on the 6mA reporter system; the vertical axis is the fold activation of the EGFP reporter system, grey corresponds to the A reporter system, black corresponds to the 6mA reporter system, and the horizontal axis is the RVD; only data sets whose 6mA mean after replication is greater than 5 are shown. Data are mean ± s.d., n=3.
Figure 13 shows the recognition efficiency of different RVDs for the A, T, C and G reporter systems.
Detailed description of the embodiments
The invention shows that the binding of TALE proteins to DNA is affected by DNA base modifications. By studying 420 RVDs, the invention has identified RVDs with unique specificity for 5mC, 5hmC and/or 6mA. 5mC, 5hmC and 6mA are important epigenetic marks in higher eukaryotes. Methyl and hydroxymethyl groups do not interfere with base pairing, but they lie in the major groove of the DNA duplex, which affects their interaction with TALE proteins.
The structure of the TALE-DNA complex shows that the amino acid at position 13 is the only residue that interacts directly with the DNA base of the sense strand, while the role of the residue at position 12 is to stabilize the proper loop conformation during base recognition (35, 36). The invention demonstrates that a small amino acid (Gly or Ala) or a deletion at position 13 can increase the affinity for 5mC. This observation is consistent with the earlier finding that N* and NG (which naturally recognizes T) can bind 5mC. It is possible that the absence of a large side chain at position 13 leaves enough space to accommodate the methyl group of 5mC. There are, however, exceptions to this general trend. For example, the invention also observed that HG has very weak affinity for 5mC, even though HG carries a smaller residue at position 13 than HD, the natural binder of C. Interestingly, when the His at position 12 is replaced by Arg (giving RG), strong binding to 5mC is observed. In fact, RG also recognizes 5hmC. These observations suggest that recognition of modifications by the two residues may follow a more complex pattern.
The invention demonstrates TALE-mediated, methylation-dependent gene activation and genome editing at several highly methylated genomic regions. As an important control, almost no gene activation is observed when the same region lacks cytosine methylation (in a different cell type). The RVDs found by the invention therefore offer the possibility of manipulating a target gene in vivo according to its modification state. Many differentially methylated regions (DMRs) are known, and they are involved in many important biological events, including genomic imprinting and disease. The unique ability of TALE proteins to read epigenetic marks therefore opens the way to future epigenome-dependent applications of TALEs in vivo.
The term "polynucleotide" as used herein refers to a deoxyribonucleotide or ribonucleotide polymer in linear or circular conformation and in single-stranded or double-stranded form.
In the present invention, the terms "polypeptide", "peptide" and "protein" are used interchangeably and denote a polymer of amino acids, in which one or more amino acids may be naturally occurring amino acids or chemical analogues or modified derivatives thereof.
"Binding" as used herein refers to a sequence-specific, non-covalent interaction between macromolecules (for example, between a protein and a nucleic acid). The term "binding polypeptide" as used herein is a polypeptide or protein capable of binding another molecule non-covalently. The other molecule may be a DNA molecule, an RNA molecule and/or a protein molecule.
The term "TALE" as used in the present invention refers to a transcription activator-like effector (Transcription Activator-like Effector), which comprises a DNA-binding domain (also called the TALE repeat domain or TALE repeat units) flanked by N-terminal and C-terminal non-repeat sequences, and can specifically recognize DNA sequences. The DNA-binding domain consists of tandem "repeat units". Each repeat unit comprises 33-35 amino acids, of which residues 12 and 13 are the key sites for target recognition and are called the repeat-variable diresidue (RVD); each RVD recognizes only one base. A TALE or its DNA-binding domain recognizes, through its RVDs, the DNA target sequence corresponding to the order of the RVDs.
Naturally occurring TALEs generally comprise 1.5 to 33.5 repeat units, but existing studies show that effective recognition and binding of DNA usually requires at least 6.5 repeat units, and that 10.5 or more repeat units show stronger activity (Boch, Jens, and Ulla Bonas. "Xanthomonas AvrBs3 family-type III effectors: discovery and function." Annual Review of Phytopathology 48 (2010): 419-436; Boch, Jens, et al. "Breaking the code of DNA binding specificity of TAL-type III effectors." Science 326.5959 (2009): 1509-1512).
A TALE repeat unit may be a truncated repeat unit, also called a half repeat unit, i.e. the N-terminal part of a complete repeat unit; the truncated repeat unit includes the RVD. In general, the last repeat unit at the C-terminus of a natural TALE repeat domain is a truncated repeat unit. A half repeat unit generally comprises 17-20 amino acids.
In the present invention, in some embodiments, the number of repeat units in the TALE may be 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 or 35. The repeat units of the TALE may comprise 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 or 34 complete repeat units and one half repeat unit.
In preferred embodiments, the TALE comprises 14 complete repeat units and 1 half repeat unit, wherein the half repeat unit is located at the C-terminus of the entire TALE repeat array.
In preferred embodiments, a single "repeat unit" in the TALE may be LTPEQVVAIASXX'GGKQALETVQRLLPVLCQAHG. In some embodiments, the half repeat unit sequence in the TALE is LTPEQVVAIASXX'GGKQ. Here XX' is the RVD.
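The layout of the repeat unit can be made concrete with a small sketch that places an RVD at positions 12 and 13 of the sequences given above. The function names are hypothetical, and treating `*` as a residue that is simply left out of the sequence is an assumption drawn from the statement that `*` denotes a deletion at that position.

```python
REPEAT_PREFIX = "LTPEQVVAIAS"           # residues 1-11 of the repeat unit
FULL_SUFFIX = "GGKQALETVQRLLPVLCQAHG"   # remainder of a complete repeat unit
HALF_SUFFIX = "GGKQ"                    # remainder of the half repeat unit


def build_repeat(rvd: str, half: bool = False) -> str:
    """Build one repeat unit for a two-character RVD such as 'HA' or 'N*'.

    Assumption: '*' (deletion) contributes no residue to the sequence.
    """
    if len(rvd) != 2:
        raise ValueError("an RVD is written with exactly two characters")
    residues = rvd.replace("*", "")
    return REPEAT_PREFIX + residues + (HALF_SUFFIX if half else FULL_SUFFIX)


def build_repeat_domain(rvds: list) -> str:
    """Concatenate complete repeats and put the last RVD in a C-terminal half repeat."""
    *full, last = rvds
    return "".join(build_repeat(r) for r in full) + build_repeat(last, half=True)


print(len(build_repeat("HD")), len(build_repeat("N*")), len(build_repeat("FS", half=True)))
```

With these sequences a complete repeat is 34 residues long (33 when the residue at position 13 is deleted) and the half repeat is 17 residues long, which matches the 33-35 and 17-20 residue ranges stated in the text.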
The TALE repeat unit sequence used in the examples of the invention is taken from the amino acid sequence of an AvrBs3 protein of Xanthomonas. Beyond this sequence, the RVDs of the invention are equally applicable to TALEs containing other repeat unit sequences. AvrBs3 has different homologues in different Xanthomonas subspecies; specific sequences can be found in Boch, Jens, and Ulla Bonas. "Xanthomonas AvrBs3 family-type III effectors: discovery and function." Annual Review of Phytopathology 48 (2010): 419-436.
In the present invention, amino acids in polypeptide sequences are shown by their one-letter abbreviations. The amino acids involved in the invention and their abbreviations are as follows:
Amino acid       Three-letter   One-letter
Glycine          Gly            G
Alanine          Ala            A
Valine           Val            V
Leucine          Leu            L
Isoleucine       Ile            I
Proline          Pro            P
Phenylalanine    Phe            F
Tyrosine         Tyr            Y
Tryptophan       Trp            W
Serine           Ser            S
Threonine        Thr            T
Cysteine         Cys            C
Methionine       Met            M
Asparagine       Asn            N
Glutamine        Gln            Q
Aspartic acid    Asp            D
Glutamic acid    Glu            E
Lysine           Lys            K
Arginine         Arg            R
Histidine        His            H
In the present invention, when describing an RVD, * indicates an amino acid deletion at that position.
In the present invention, "base" and "nucleotide" are used interchangeably and refer to a compound made up of three components, namely a purine or pyrimidine base, ribose or deoxyribose, and phosphate, which is the main constituent of DNA and RNA sequences. Common deoxynucleotides include cytosine (C), thymine (T), adenine (A) and guanine (G).
In addition to the four conventional deoxyribonucleotides above, mammalian genomes also contain modified DNA bases. For example, 5-methylcytosine (5mC), known as the fifth DNA base, is an important epigenetic mark that regulates gene expression. 5mC can be oxidized successively by ten-eleven translocation (TET) family proteins to produce 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). Besides methylation of cytosine, another common DNA methylation, N6-methyladenine (6mA), is a covalent modification on the adenine of DNA and plays an important role in prokaryotic cells.
"Methylated base" as used in the present invention refers to a base carrying a methylation modification, including 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC) and N6-methyladenine (6mA).
The present invention has found RVDs that specifically recognize 5mC, 5hmC or 6mA, as well as degenerate RVDs that recognize these methylated bases together with the corresponding unmodified bases, as detailed in the following table:
Recognized base(s)    RVD
5mC                   HA, NA
5hmC                  FS
C, 5mC                N*, NG, KP
C, 5hmC               HV, KV
5mC, 5hmC             K*, RG
C, 5mC, 5hmC          G*, H*, R*, Y*
6mA                   NP, FT, CV, CP
6mA, A                RI, NI, KI, HI
According to the table above, the RVD HA or NA can specifically recognize 5mC, i.e. it can distinguish 5mC from 5hmC and C; the RVD FS can specifically recognize 5hmC, i.e. it can distinguish 5hmC from 5mC and C; the RVD NP, FT, CV or CP can specifically recognize 6mA, i.e. it can distinguish 6mA from A; and the degenerate RVD N*, NG or KP can recognize C and 5mC.
In the present invention, unless the context indicates otherwise, when describing the base recognized by an RVD, "C" refers to cytosine without methylation modification; "A" refers to adenine without methylation modification; "5mC" refers to 5-methylcytosine; "5hmC" refers to 5-hydroxymethylcytosine; and "6mA" refers to N6-methyladenine.
According to the present invention, "specific recognition" of a particular methylated base means that the binding affinity of the RVD for that particular methylated base is significantly greater than for the same base carrying other forms of modification, for the same base without modification, or for other, different bases.
Binding affinity can be measured by various methods well known to those skilled in the art. For example, a TALE-VP64-mCherry construct can be built as described in the references below, together with reporter DNA fragments containing different modified bases and a fluorescent protein gene; the TALE-VP64 protein expressed in cells from the TALE-VP64-mCherry construct binds and activates the reporter DNA fragment, and the resulting fold increase in fluorescent protein signal is used to determine the binding affinity between the RVD in the TALE and the modified base in the reporter DNA fragment. When the EGFP fold activation of an RVD for a particular modified base is clearly higher than its EGFP fold activation for other forms of the base, the RVD is considered to specifically recognize that modified base. Binding affinity can also be determined by the in vitro protection assay described in Example 4 of the invention.
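The reporter-based criterion can be sketched numerically as follows. The text only requires the fold activation for one base to be clearly higher than for the other forms; the 2-fold margin used here (`min_ratio`) is an arbitrary assumption for illustration, as are the function names and the example values.

```python
def fold_induction(signal: float, baseline: float) -> float:
    """Fold activation of the reporter relative to the baseline signal."""
    return signal / baseline


def specifically_recognized(fold_by_base: dict, min_ratio: float = 2.0):
    """Return the base whose fold activation exceeds every other base by at
    least min_ratio (an assumed threshold), or None if no base stands out."""
    ranked = sorted(fold_by_base.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_fold), (_, runner_up_fold) = ranked[0], ranked[1]
    return top_base if top_fold >= min_ratio * runner_up_fold else None


# Hypothetical fold activations of one RVD on three reporter fragments.
print(specifically_recognized({"C": 2.0, "5mC": 30.0, "5hmC": 3.0}))
print(specifically_recognized({"C": 20.0, "5mC": 22.0, "5hmC": 1.0}))
```

The second call returns None because two bases are activated to a similar degree, which corresponds to a degenerate rather than a specific RVD.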
According to the table above, the invention has found that the RVD HA or NA can specifically recognize 5mC. The binding affinity of the RVD HA or NA for 5mC is clearly higher than for 5hmC and C; these RVDs can be used to distinguish 5mC from 5hmC and C, to achieve specific binding of a TALE to 5mC, and to realize various applications that depend on 5mC.
Applications that depend on 5mC include, but are not limited to: detection of 5mC in a gene; 5mC-dependent gene expression regulation, gene editing, epigenetic modification and the like (i.e. gene expression regulation, gene editing or epigenetic modification is carried out only when 5mC is present in the target sequence, and not when the corresponding position is C or 5hmC); 5mC-dependent chromosome labelling in living cells (i.e. only genes carrying 5mC at the corresponding position on the chromosome are labelled, and they are not labelled if the corresponding position is C or 5hmC, so that the cytosine methylation status of a gene can be observed in living cells); and preparation of proteins that specifically bind 5mC-containing sequences.
The invention has also found that the RVD FS can specifically recognize 5hmC. The binding affinity of the RVD FS for 5hmC is clearly higher than for 5mC and C; this RVD can be used to distinguish 5hmC from 5mC and C, to achieve specific binding of a TALE to 5hmC, and to realize various applications that depend on 5hmC.
Applications that depend on 5hmC include, but are not limited to: detection of 5hmC in a gene; 5hmC-dependent gene expression regulation, gene editing, epigenetic modification and the like (i.e. gene expression regulation, gene editing or epigenetic modification is carried out only when 5hmC is present in the target sequence, and not when the corresponding position is C or 5mC); 5hmC-dependent chromosome labelling in living cells (i.e. only genes carrying 5hmC at the corresponding position on the chromosome are labelled, and they are not labelled if the corresponding position is C or 5mC, so that the cytosine methylation status of a gene can be observed in living cells); and preparation of proteins that specifically bind 5hmC-containing sequences.
The invention has also found that the RVD NP, FT, CV or CP can specifically recognize 6mA. The binding affinity of these RVDs for 6mA is clearly higher than for A; these RVDs can be used to distinguish 6mA from A, to achieve specific binding of a TALE to 6mA, and to realize various applications that depend on 6mA.
Applications that depend on 6mA include, but are not limited to: detection of 6mA in a gene; 6mA-dependent gene expression regulation, gene editing, epigenetic modification and the like (i.e. gene expression regulation, gene editing or epigenetic modification is carried out only when 6mA is present in the target sequence, and not when the corresponding position is A); 6mA-dependent chromosome labelling in living cells (i.e. only genes carrying 6mA at the corresponding position on the chromosome are labelled, and they are not labelled if the corresponding position is A, so that the adenine methylation status of a gene can be observed in living cells); and preparation of proteins that specifically bind 6mA-containing sequences.
The invention has also found that the degenerate RVD N*, NG or KP can recognize C and 5mC. These degenerate RVDs bind C and 5mC with similar binding affinity, and their binding affinity for C and 5mC is clearly higher than for 5hmC.
The invention has also found that the degenerate RVD HV or KV can recognize C and 5hmC. These degenerate RVDs bind C and 5hmC with similar binding affinity, and their binding affinity for C and 5hmC is clearly higher than for 5mC.
The invention has also found that the degenerate RVD K* or RG can recognize 5mC and 5hmC. These degenerate RVDs bind 5mC and 5hmC with similar binding affinity, and their binding affinity for 5mC and 5hmC is clearly higher than for C.
The invention has also found that the degenerate RVD G*, H*, R* or Y* can recognize C, 5mC and 5hmC. These degenerate RVDs bind C, 5mC and 5hmC with similar binding affinity.
The invention has also found that the degenerate RVD RI, NI, KI or HI can recognize 6mA and A. These degenerate RVDs bind A and 6mA with similar binding affinity.
These degenerate RVDs can recognize two or three different methylated or unmethylated forms of a base at once. They can be used when the methylation status of the base is unknown, improving the targeted binding efficiency of the TALE and reducing the effect of methylation on the binding of the TALE to the target sequence. For example, 5mC in a cellular genome can be oxidized to 5hmC under catalysis by TET family proteins; using a degenerate RVD that recognizes both 5mC and 5hmC avoids problems such as reduced binding efficiency caused by differences in cytosine methylation type. Therefore, in a specific experiment, RVDs that specifically recognize one methylated base, degenerate RVDs that recognize two methylation forms of a base, and degenerate RVDs that recognize three methylation forms of a base can be combined according to the experimental purpose to meet the specific experimental need.
The RVDs of the invention can be used in any application that requires binding to a base with a specific methylation form; such applications may be carried out in vitro or in vivo, and may be non-therapeutic.
A TALE containing an RVD of the invention can be expressed as a DNA-binding polypeptide to bind a base with a specific methylation form. In some cases, such a DNA-binding polypeptide can act like an "antibody" that binds its "antigen" (i.e. a target sequence containing a base with a specific methylation form). In some cases, such a DNA-binding polypeptide can bind a target sequence containing a base with a specific methylation form and thereby protect it from cleavage by a nuclease or from interaction with other DNA-binding polypeptides (e.g. transcription regulators).
A TALE containing an RVD of the invention can also be coupled to a fluorescent protein to form a fusion protein. The fusion protein can bind target sequences on chromosomes in living cells that contain a base with a specific methylation form, so that dynamic changes of the chromosomes can be observed in living cells.
Fluorescent proteins are well known to those skilled in the art and include, but are not limited to, green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), red fluorescent protein (RFP) and blue fluorescent protein (BFP).
A TALE containing an RVD of the invention can also be coupled to a functional domain to form a fusion protein, which can be used to manipulate a target gene containing a base with a specific methylation form. The manipulation may be gene editing, regulation of gene expression, epigenetic modification or the like, and the functional domain may be a gene editing functional domain, a domain that regulates gene expression, or an epigenetic modification domain.
The term "gene editing" refers to changing the gene sequence at a target site, including insertion, deletion or replacement. For example, gene editing may use a nuclease to make a DNA double-strand cut or a single-strand nick at the target site; insertions or deletions (indels) of DNA can then arise during non-homologous end joining (NHEJ) repair of the DNA sequence, causing a frameshift mutation and thereby achieving gene knockout. A gene editing functional domain is an amino acid sequence capable of performing the gene editing function.
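Why an indel can knock out a gene follows from codon arithmetic: within a coding sequence, an indel whose net length is not a multiple of three shifts the reading frame downstream of the edit. A minimal sketch of that check is given below; the function name is hypothetical and the check says nothing about where in the gene the indel falls.

```python
def causes_frameshift(inserted_bp: int, deleted_bp: int) -> bool:
    """An indel in a coding sequence shifts the reading frame when its net
    length change is not a multiple of 3 (one codon)."""
    return (inserted_bp - deleted_bp) % 3 != 0


print(causes_frameshift(inserted_bp=1, deleted_bp=0))   # 1-bp insertion
print(causes_frameshift(inserted_bp=0, deleted_bp=3))   # in-frame 3-bp deletion
```

In-frame indels can still disrupt protein function, so this check only identifies the frameshift route to knockout mentioned in the text.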
Gene editing is carried out using the fusion protein of the TALE containing RVD of the invention and gene editing functional domain When, the gene editing functional domain can be nuclease.Nuclease includes but is not limited to endonuclease, Zinc finger nuclease (ZFN), Cas9 nuclease.The application of Cas9 nuclease be it is well-known in the art, using being usually by Cas9 nuclease It is collectively incorporated into cell with sgRNA, to realize the cutting to target sequence.
In the present invention, when carrying out gene editing, the preferably described fusion protein can be provided in the form of TALEN, at this time Gene editing functional domain is the DNA cutting domain of FokI endonuclease.
The term "regulating gene expression" refers to changing the expression of a gene or the level of an RNA molecule, including non-coding RNAs and RNAs encoding one or more proteins or protein subunits. "Regulating gene expression" also includes changing the activity of one or more gene products, proteins, or protein subunits. A functional domain that regulates gene expression is an amino acid sequence capable of regulating expression of the target gene.
The functional domain that regulates gene expression can be a transcriptional activator or a functional fragment thereof, or a transcriptional repressor or a functional fragment thereof.
The term "epigenetic modification" refers to a modification made without changing the DNA sequence of the target gene; for DNA, this includes DNA methylation, DNA demethylation, and the like. An epigenetic modification functional domain is an amino acid sequence capable of epigenetically modifying the target gene.
The epigenetic modification functional domain can be a methyltransferase or a demethylase.
The term "functional fragment" refers to a protein or polypeptide whose sequence is part of a full-length protein or polypeptide but which has the same function as the full-length protein or polypeptide; for example, it can be a protein domain that performs the corresponding function under specific experimental conditions, such as the cleavage domain of a nucleic acid-cleaving enzyme.
The cell described herein can be any cell or cell line. It can be a cell of a plant, an animal (for example, a mammal such as a mouse, rat, primate, livestock animal, or rabbit), a fish, or the like, and can also be a eukaryotic cell (for example, a cell of yeast, a plant, a fungus, a fish, or a mammal such as a cat, dog, mouse, cow, sheep, or pig).
The cell described herein can be an oocyte, a K562 cell, a CHO (Chinese hamster ovary) cell, a HEP-G2 cell, a BaF-3 cell, a Schneider cell, a COS cell (a monkey kidney cell expressing the SV40 T antigen), a CV-1 cell, a HuTu80 cell, an NTERA2 cell, an NB4 cell, an HL-60 cell, a HeLa cell, a HEK293T cell, or the like.
The method of any one of the technical solutions of the invention can be carried out in vitro or in vivo.
The method of any one of the technical solutions of the invention can be non-therapeutic.
Embodiment 1: Materials and methods
1. DNA synthesis and purification
Oligo DNA primers were synthesized on an Expedite 8909 DNA/RNA synthesizer using standard reagents (Glen Research), including 5mC and 5hmC phosphoramidites. The oligo DNA was deprotected by the standard method recommended by Glen Research and purified with Glen-Pak DNA purification cartridges.
The synthesized DNA was verified by high-performance liquid chromatography (HPLC). Briefly, the DNA was digested to nucleosides with nuclease P1 (Sigma, N8630) and alkaline phosphatase (Sigma, P4252), and the nucleosides were separated on an SB-Aq C18 column (Agilent) with 5%-50% acetonitrile over 30 minutes.
2. Cell culture, transfection, and flow cytometry
HEK293T cells (from the laboratory of Stanley Cohen, Stanford University) and HeLa cells (maintained in this laboratory) were cultured in DMEM supplemented with 10% FBS and 1% penicillin-streptomycin at 37 °C and 5% CO2. Cells were seeded in 24-well plates at 7 x 10^4 cells per well 24 hours before transfection. The cells in each well were co-transfected with 0.15 μg of TALE-(XX')3 plasmid and 0.15 μg of reporter DNA using polyethyleneimine (PEI). Cells were collected 48 h after transfection and analyzed on a BD LSR Fortessa flow cytometer (BD Biosciences). EGFP and mCherry protein expression were quantified with 488 nm and 561 nm lasers, respectively. At least 10,000 events were collected from each sample to obtain enough data for analysis. Cells with an mCherry fluorescence intensity of 5 x 10^3 to 5 x 10^4 were used for analysis.
3. Construction of TALENs
The backbone of the TALEN plasmid contains a CMV promoter, a nuclear localization signal, the non-repetitive N-terminal and C-terminal sequences of TALE, and an endonuclease FokI monomer; for the specific sequences, see reference 37 below.
For use, TALE repeat units containing different RVDs were inserted into the TALEN backbone vector to test the effect of the different RVDs; for the construction method, see Yang, Junjiao, et al. "Assembly of Customized TAL Effectors Through Advanced ULtiMATE System." TALENs: Methods and Protocols (2016): 49-60.
4. Expression and purification of TALE proteins
The expressed and purified TALE proteins were used for the in vitro protection assay.
TALE repeat units with canonical RVDs (i.e., NI, NG, HD, and NN) were constructed using the ULtiMATE system, as described previously (37). For TALE repeat units using new RVDs, TALE repeat monomers containing the new RVDs were synthesized individually. Final assembly of these TALE constructs was carried out with the same ULtiMATE protocol, as described previously (37).
The TALE expression plasmids were built from TALE repeat units constructed in the TALEN backbone: a fragment containing the TALE N- and C-terminal sequences together with the intervening repeat units was amplified from the corresponding TALEN plasmid and cloned into the NheI and HindIII sites of pET-28a(+).
TALEs with different RVD sequences (comprising a His tag for purification, the TALE N- and C-terminal sequences, and the TALE repeat units that specifically recognize DNA) were cloned into the pET28a vector (Novagen). Overexpression of the TALE proteins was induced in E. coli BL21(DE3) with 1.0 mM isopropyl β-D-thiogalactoside (IPTG) when the cell density reached an OD600 of 8.0. After growth at 20 °C for 16 hours, the cells were harvested, resuspended in buffer containing 25 mM Tris-HCl pH 8.0 and 150 mM NaCl, and lysed by sonication. The recombinant proteins were purified sequentially on Ni2+-nitrilotriacetate affinity resin (Ni-NTA, GE Healthcare) (Buffer A: 10 mM Tris-HCl pH 8.0, 300 mM NaCl; Buffer B: 10 mM Tris-HCl pH 8.0, 300 mM NaCl, and 500 mM imidazole) and HiLoad Superdex PG200 (GE Healthcare) (Buffer GF: 10 mM Tris-HCl pH 8.0, 100 mM NaCl).
5. TALE repeat units
The TALE repeat region used in the following embodiments comprises 14 consecutive repeat units and one half repeat unit. Each full repeat unit comprises 34 amino acid residues with the sequence LTPEQVVAIASXX'GGKQALETVQRLLPVLCQAHG; the half repeat unit comprises the first 17 amino acid residues of a full repeat unit, with the sequence LTPEQVVAIASXX'GGKQ. XX' is the RVD.
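The repeat layout described above can be sketched in a few lines of Python. This is only an illustration: the repeat sequences are taken from the text, while the function names and the convention of writing a deleted residue 13 as a trailing "*" (as in N*) are assumptions made for the example.

```python
# Illustrative sketch: assemble the 14.5-repeat TALE region from a list of RVDs.
# The repeat sequence comes from the text; all names here are hypothetical.
N_TERM_PART = "LTPEQVVAIAS"            # residues 1-11 of a repeat unit
C_TERM_PART = "GGKQALETVQRLLPVLCQAHG"  # residues 14-34 of a full repeat unit
HALF_TAIL = "GGKQ"                     # residues 14-17 of the half repeat unit


def repeat_unit(rvd: str, half: bool = False) -> str:
    """Return one repeat unit with the RVD at positions 12-13.

    A trailing '*' (e.g. 'N*') marks an RVD whose residue 13 is deleted.
    """
    residues = rvd.replace("*", "")
    if len(residues) not in (1, 2):
        raise ValueError(f"unexpected RVD: {rvd!r}")
    tail = HALF_TAIL if half else C_TERM_PART
    return N_TERM_PART + residues + tail


def tale_repeat_region(rvds: list[str]) -> str:
    """Concatenate 14 full repeats and a final half repeat (15 RVDs in total)."""
    if len(rvds) != 15:
        raise ValueError("expected 15 RVDs (14 full repeats + 1 half repeat)")
    units = [repeat_unit(r) for r in rvds[:14]]
    units.append(repeat_unit(rvds[14], half=True))
    return "".join(units)


full = repeat_unit("HD")
half = repeat_unit("NG", half=True)
region = tale_repeat_region(["HD"] * 15)
print(len(full), len(half), len(region))  # 34 17 493
```

A full repeat built this way has 34 residues and the half repeat 17, matching the lengths stated above; an RVD with a deleted residue 13 gives a 33-residue unit.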
The materials and methods described in this embodiment were used in embodiments 2-7 below.
Embodiment 2: Construction of the artificial screening system
The artificial screening system consists of a reporter DNA element and a TALE-VP64 expression library.
The TALE-VP64 expression library comprises 400 TALE-VP64-mCherry constructs. Each TALE-VP64-mCherry construct is a circular plasmid that expresses a TALE-VP64 fusion protein after being transferred into a cell (for the specific construction method, see reference 37 below). As shown in Fig. 1b, each construct comprises an artificial TALE array containing 14.5 repeats fused to VP64. Repeats 1-6, repeats 10-14, and the final half repeat (shown as repeat 14.5 in Fig. 1b) are identical across constructs, whereas repeats 7-9 differ between constructs. In each construct, the RVDs of the three consecutive monomers at repeats 7 to 9 of the artificial TALE array are encoded by the same six randomly synthesized nucleotides; that is, repeat units 7 to 9 express three identical RVDs in tandem. These artificial TALE arrays are referred to as TALE-(XX')3. This yields 400 TALEs with different test RVDs XX', which are used to assess recognition of 5mC and 5hmC by the different RVDs. X and X' denote residues 12 and 13 of the repeat, respectively (i.e., the RVD). In addition, because N* had previously been found to recognize 5mC, a further 20 TALE-(X*)3 arrays were assembled, in which residue 13 is deleted. Hereinafter, TALE-(XX')3 and TALE-(X*)3 are referred to collectively as TALE-(XX')3. Accordingly, the TALE-VP64 expression library used comprises 420 TALE-VP64-mCherry constructs, each containing one of the 420 different TALE-(XX')3 arrays; these are hereinafter referred to collectively as TALE constructs, and also as TALE-(XX')3 plasmids or TALE-(XX')3 expression plasmids.
The TALE-VP64 expression library was generated as follows. The 420 TALE-(XX')3 plasmids fall into two classes. In 400 of them, residues 12 and 13 of the RVD in repeat units 7 to 9 are the combinations of the 20 natural amino acid residues; these TALE-(XX')3 plasmids were constructed as described in reference 13 below.
In the other 20 TALE-(XX')3 plasmids, the RVD expressed in repeat units 7 to 9 lacks residue 13, i.e., A*, C*, D*, E*, F*, G*, H*, I*, K*, L*, M*, N*, P*, Q*, R*, S*, T*, V*, W*, Y*. These 20 TALE-(XX')3 expression plasmids were each constructed by the TALE-(XX')3 construction method reported in reference 13 below. Briefly, a forward primer encoding one specific RVD, 5'-tCGTCTCaGAACAGGTTGTAGCCATAGCTTCTNNNNNNGGAGGTAAGCAGGCACTGGAA-3' (NNNNNN represents the base sequence encoding the specific RVD), and a common reverse primer, 5'-aaCGTCTCaGTTCGGGGGTCAACCCATGAGCCTGACACAGTACTGGGAGCAGGCGCTGCACGGTTTCCAGTGCCTGCTT-3', were annealed and extended by PCR to generate a 102-bp TALE monomer fragment carrying a BsmBI restriction site at each end. The TALE monomer fragments were then joined by 6 cycles of Golden-Gate digestion-ligation, and the TALE multimers were amplified with primers G-lib-F and G-lib-R. Finally, fragments containing exactly three TALE monomers were selectively recovered by gel extraction, ligated into the pre-built library expression vector, and transformed into Trans1-T1 competent cells. TALE-(XX')3 plasmids correctly expressing the corresponding RVD were identified by Sanger sequencing. The primers were:
G-lib-F: 5'-TAGCTATACGTCTCATTGACCCCCGAACAGGTTGTAGCC-3'
G-lib-R: 5'-TAGCTATACGTCTCACCCATGAGCCTGACACAGTACTGGGAGCA-3'
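The composition of the RVD library (400 two-residue RVDs plus 20 RVDs with residue 13 deleted) can be enumerated with a short Python sketch. The function name and the "X*" string notation are assumptions made for illustration; only the counts and the residue alphabet come from the text.

```python
from itertools import product

# One-letter codes of the 20 natural amino acids, in the order listed in the text.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rvd_library() -> list[str]:
    """Enumerate the test RVDs: all XX' pairs, then the X* deletion RVDs."""
    two_residue = [x + x_prime for x, x_prime in product(AMINO_ACIDS, repeat=2)]
    deletion = [x + "*" for x in AMINO_ACIDS]
    return two_residue + deletion


library = rvd_library()
print(len(library))   # 420
print(library[-20:])  # the deletion RVDs A* ... Y*
```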
The reporter DNA element is a linear DNA fragment comprising a TALE-(XX')3 recognition sequence, a miniCMV promoter, an EGFP coding sequence, and a polyA signal (Fig. 1b). The TALE-(XX')3 recognition sequence in the reporter DNA element is 15 bases long; bases 1-6 and 10-15 are recognized by the RVDs in repeats 1-6 and 10-14.5, respectively, of the library TALE constructs. Bases 7-9 of the recognition sequence can be three consecutive 5mC, 5hmC, or 6mA residues, used to measure the binding of different RVDs to the corresponding methylated base; the resulting elements are referred to as the 5mC, 5hmC, and 6mA reporter DNA elements, respectively. One or more reporter DNA elements are chosen according to the methylated base to be screened. The reporter DNA element, about 1450 bp in size, is obtained by PCR amplification with a chemically synthesized forward primer, Report-F, containing the specific methylated bases and a common reverse primer, Report-R.
The primer sequences are as follows:
Report-F: 5'-G*C*C*AGATATACGCGTTACTGGAGCCATCTGGCCNNNTACGTAGGCGTGTAC-3', where N represents 5mC, 5hmC, or 6mA;
Report-R: 5'-A*G*C*GTCTCCCGTAAAGCACTAAATCGGAACCCTAAAGGGAGC-3'
(* indicates a thio-modified base, which mainly serves to protect the reporter DNA element from nuclease degradation in the cell; the underline marks the TALE-(XX')3 recognition sequence, i.e., the TALE binding sequence CTGGCCNNNTACGTA)
The reporter DNA element was constructed as follows. First, the reporter plasmid pcDNA6_3A (Fig. 2c), which contains the TALE-(XX')3 binding-site template sequence CTGGCCAAATACGTA, was amplified in E. coli. Then, using the chemically synthesized primers above containing 5mC, 5hmC, or 6mA, PCR was used to generate linear reporter DNA elements carrying 5mC, 5hmC, or 6mA in the TALE binding sequence (Fig. 2c). The forward primer contains the TALE-(XX')3 binding sequence CTGGCCNNNTACGTA, where N represents 5mC, 5hmC, or 6mA; this sequence lies immediately upstream of the minimal CMV promoter (pminiCMV) and the EGFP gene downstream of it. The bases in the TALE binding sequence corresponding to repeats 7-9 are three consecutive 5mC, 5hmC, or 6mA residues.
In addition, the artificial screening system can also include reporter DNA elements for C and T. These are circular DNAs constructed by the method of reference 13 below. The TALE-(XX')3 recognition sequence they contain is the same as described above, except that NNN is CCC or TTT.
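As a small illustration of how the reporter recognition sequences relate to the template, the following Python sketch places a chosen base at positions 7-9 of the 15-base binding site. The template sequence is taken from the text; representing the site as a list of per-position tokens (so that modified bases such as "5mC" fit in one position) and the function name are assumptions made for the example.

```python
# Binding-site template carried by the reporter plasmid pcDNA6_3A (from the text).
TEMPLATE = "CTGGCCAAATACGTA"


def recognition_sequence(base: str) -> list[str]:
    """Return the 15 positions of the binding site with `base` at positions 7-9.

    Positions are returned as tokens so that modified bases ("5mC", "5hmC",
    "6mA") occupy a single position each.
    """
    positions = list(TEMPLATE)
    positions[6:9] = [base] * 3  # 1-based positions 7-9, read by repeats 7-9
    return positions


for base in ("5mC", "5hmC", "6mA", "C", "T"):
    print(base, "->", " ".join(recognition_sequence(base)))
```

Only positions 7-9 change between reporters; the flanking bases read by repeats 1-6 and 10-14.5 stay fixed.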
Using the artificial screening system above, the specificity with which TALE-(XX')3 binds the TALE binding sequence in the reporter DNA element is determined by measuring the EGFP fluorescence level. A screening platform for systematically assessing TALE RVD recognition was thus established.
Embodiment 3: Screening for TALE RVDs that recognize modified cytosine
To measure the binding affinity of the 420 RVDs for 5mC and 5hmC, each of the 420 TALE constructs was introduced into HEK293T cells together with each of three EGFP reporter DNA elements (containing three consecutive C, 5mC, or 5hmC bases, respectively). EGFP and mCherry fluorescence levels were measured by FACS analysis (Fig. 1c). The binding specificity of the 420 RVDs in the TALE constructs for C, 5mC, and 5hmC was determined by comparing the fold change in EGFP expression for each RVD relative to the baseline level for the corresponding base. The 1260 data points for C, 5mC, and 5hmC, together with the 420 data points for T from previous work (see reference 13 below), are summarized in a heat map (Fig. 3a and Fig. 4).
From the preliminary screening results in Fig. 3a, a number of RVDs with relatively high binding to 5mC and 5hmC were selected for confirmation experiments performed in triplicate. An RVD with an EGFP activation fold of 4 or more for the 5mC or 5hmC reporter DNA fragment was considered to bind the respective nucleotide strongly. The results are shown in Fig. 3b.
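The selection rule above (fold activation relative to baseline, with a cutoff of 4) can be sketched as follows. This is a minimal illustration, not the analysis pipeline actually used: how the EGFP signal and baseline are summarized from the flow cytometry data is not specified in the text, and the numbers below are hypothetical placeholders.

```python
THRESHOLD = 4.0  # activation fold at or above which an RVD counts as a strong binder


def fold_activation(egfp_signal: float, baseline: float) -> float:
    """EGFP expression of an RVD relative to the baseline for the same reporter."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return egfp_signal / baseline


def strong_binders(signals: dict[str, float], baseline: float) -> list[str]:
    """Return the RVDs whose fold activation meets the threshold, sorted by name."""
    return sorted(
        rvd for rvd, signal in signals.items()
        if fold_activation(signal, baseline) >= THRESHOLD
    )


# Hypothetical EGFP readings for one reporter (arbitrary units), not measured data.
example = {"HA": 1200.0, "HD": 150.0, "RG": 400.0}
print(strong_binders(example, baseline=100.0))  # ['HA', 'RG']
```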
The results show that the screen yielded specific and degenerate RVDs that effectively recognize 5mC and 5hmC. Several binders with high affinity for 5mC were identified and grouped into three classes according to the amino acid residue at position 13: Gly-containing RVDs (NG, KG, and RG), Ala-containing RVDs (HA and NA), and deletion-containing RVDs (N*, K*, H*, R*, Y*, and G*). Among the Gly-containing and deletion-containing RVDs there are universal RVDs (recognizing 5mC, 5hmC, and unmodified C) and degenerate RVDs (recognizing 5mC and 5hmC); interestingly, the two Ala-containing RVDs (HA and NA) are selective for 5mC. Earlier studies used NG (the natural binder of T) and N* to recognize 5mC. We also identified these two RVDs in the screen, but many of the new RVDs reported in this study have higher binding affinity for 5mC than they do. For example, HA, NA, and X* (where X is K, H, Y, or G) showed higher binding affinity for 5mC. We also found that these three classes of RVDs bind unmodified T; this is not surprising, since they either have an amino acid residue with a small side chain or lack the residue at position 13.
No RVD that selectively binds 5hmC had been reported before. As discussed above, we identified degenerate and universal RVDs that bind 5hmC well. Among them, several 5hmC binders gave ~15-fold induction, showing strong affinity for 5hmC. In addition, we observed a new group of 5hmC-binding RVDs that have serine at residue 13 (FS, YS, and WS). Although their affinity for 5hmC is lower than that of the universal binders, they bind 5hmC in preference to 5mC, which opens the possibility of positive and selective 5hmC recognition. In summary, we found that universal and degenerate binders of 5mC and 5hmC tend to have glycine or a deletion at position 13, whereas specific binders of 5mC and 5hmC have alanine or serine, respectively, at position 13.
Embodiment 4: Quantitative measurement of the binding affinity and specificity of RVDs for 5mC, 5hmC, and unmodified C
The recognition of DNA by the new RVDs obtained in embodiment 3 was verified by an in vitro protection assay (for the reaction principle, see Fig. 5a). In this experiment, a sequence from the MAPK6 gene was chemically synthesized: 5'-TTCAGCTGGAT[CCCGGAGGA]GCGGATATAACCAGG-3'. The TALE recognition sequence designed from it is shown in square brackets and contains an MspI restriction enzyme recognition site (CCGG). During chemical synthesis, the DNA oligonucleotide carried C, 5mC, or 5hmC at the given position (the second C of the MspI recognition site). The endonuclease MspI was added to the DNA probe in the presence of varying concentrations of TALE protein. Binding of the TALE protein to the cytosine base it recognizes inhibits cleavage of the DNA by the endonuclease, so that denaturing PAGE analysis shows a protected full-length band and a band of cleaved DNA. The protection efficiency was then calculated for each RVD and expressed as an inhibition constant (Ki, a reciprocal measure of binding affinity). To obtain the inhibition constants, protection efficiencies for C, 5mC, and 5hmC were measured in cleavage protection assays with TALE proteins containing the different RVDs, and the protection efficiency curves were fitted with GraphPad Prism 6 to calculate Ki. The inhibition constant characterizes how efficiently each RVD binds C, 5mC, and 5hmC: the smaller its value, the stronger the protection by the RVD and the stronger its binding to the corresponding DNA fragment.
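To make the relation between a protection curve and Ki concrete, the sketch below fits a single constant to protection data in plain Python. It is not the GraphPad Prism procedure used in the text, whose fitting model is not stated; the one-site model (protected fraction = [TALE] / ([TALE] + Ki)), the search bounds, and the simulated data points are all assumptions made for illustration.

```python
import math


def protected_fraction(tale_nm: float, ki_nm: float) -> float:
    """Assumed one-site model: fraction of DNA protected at a TALE concentration."""
    return tale_nm / (tale_nm + ki_nm)


def fit_ki(conc_nm: list[float], fractions: list[float],
           lo_nm: float = 1.0, hi_nm: float = 1e5, steps: int = 60) -> float:
    """Least-squares estimate of Ki by ternary search on log10(Ki).

    The search assumes the squared error has a single minimum between the
    bounds, which holds for data that follow the model closely.
    """
    def sse(log_ki: float) -> float:
        ki = 10.0 ** log_ki
        return sum((protected_fraction(c, ki) - f) ** 2
                   for c, f in zip(conc_nm, fractions))

    a, b = math.log10(lo_nm), math.log10(hi_nm)
    for _ in range(steps):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if sse(m1) < sse(m2):
            b = m2
        else:
            a = m1
    return 10.0 ** ((a + b) / 2.0)


# Simulated, noise-free data (hypothetical Ki of 200 nM), spanning 10 nM to 8 uM.
concentrations = [10.0, 50.0, 100.0, 500.0, 1000.0, 4000.0, 8000.0]
observed = [protected_fraction(c, 200.0) for c in concentrations]
print(f"estimated Ki: {fit_ki(concentrations, observed):.1f} nM")
```

Under this model a lower Ki means that less TALE protein is needed to protect half of the DNA, which is why a smaller value corresponds to stronger binding.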
The in vitro protection assay was carried out with the endonuclease MspI (for the principle, see Fig. 5a). Each 10 μl reaction contained 1 nM labeled DNA, 1 μl of 10X CutSmart Buffer (NEB), and 100 nM NaCl. TALE protein was added to a final concentration between 10 nM and 8 μM. The binding reaction was incubated at 25 °C for 30 minutes. Then 0.4 U of MspI was added and incubation continued for 15 minutes. The reaction was quenched by adding 10 μl of formamide and then heated at 95 °C for 5 minutes. Protected and cleaved DNA were separated by urea-PAGE and imaged with the Chemiluminescent Nucleic Acid Detection Module Kit (Thermo).
The assay was first optimized with the RVD HD, the natural binder with high affinity for unmodified cytosine. HD gave a low Ki for C, whereas its Ki for 5mC and 5hmC was at least 30-fold higher (Fig. 5b and 5c, Fig. 6c), demonstrating that the protection assay can quantitatively assess binding affinity. In this in vitro assay, NG and N* bound only 5mC and not 5hmC (Fig. 5b and 5c). Representative RVDs were then selected from the screening results (Fig. 3b) for further evaluation. The 5mC-specific RVD HA showed the lowest Ki for 5mC; in this in vitro assay its selectivity for 5mC over C and 5hmC was ~5-7-fold. The 5hmC-specific RVD FS showed ~5-6-fold selectivity for 5hmC over C and 5mC, although its binding affinity for 5hmC appeared weaker than that of HA for 5mC. In addition, the degenerate RVD RG protected 5mC and 5hmC comparably, and the universal RVD R*, which binds C, 5mC, and 5hmC, showed similar affinity for all three (see Fig. 5b and 5c).
Embodiment 5: New RVDs activate gene expression in a methylation-dependent manner
To study the potential of the newly identified RVDs to recognize cytosine methylation in vivo, their performance in activating target genes in human cells was examined. TALE-activators were designed and built on the previously developed TALE-VP64 to activate specific genes (37). The backbone of the TALE-activator plasmid contains a CMV promoter, a nuclear localization signal, the non-repetitive N-terminal and C-terminal sequences of TALE, and the activator VP64; for the specific sequences, see reference 37 below.
For use, TALE repeat units containing different RVDs were inserted into the TALE-activator backbone to test the effect of the different RVDs; for the construction method, see Yang, Junjiao, et al. "Assembly of Customized TAL Effectors Through Advanced ULtiMATE System." TALENs: Methods and Protocols (2016): 49-60.
First, the TET1 gene was selected using existing methylation data from the UCSC database; its promoter is hypermethylated in HeLa cells but unmethylated in HEK293T cells (Fig. 7a). TALE-activators containing TALE repeat units targeting the TET1 gene were constructed. In HeLa cells, the 5mC-specific HA, the degenerate RG, and the universal R* all significantly activated TET1 expression (significant activation means that TET1 expression was raised relative to the control group and the increase was statistically significant; *, P < 0.05; **, P < 0.005), with RG reaching about 10-fold activation (Fig. 7b). All three newly identified RVDs (HA, RG, and R*) performed better than NG and N*; moreover, HD did not significantly upregulate TET1 expression. In HEK293T cells, by contrast, HD bound the unmethylated TET1 promoter well and further increased its expression (although expression was already high); HA and RG did not affect gene expression; and the universal R*, whose affinity for unmodified C is lower than that of HD, upregulated expression slightly. Because NG and N* discriminate poorly against unmodified C, they also slightly activated TET1 expression.
Next, TALE-activators containing TALE repeat units targeting the LRP2 gene were constructed. They target the promoter region of the LRP2 gene, which is moderately methylated in HeLa cells and unmethylated in HEK293T cells (Fig. 7c). In addition, this region contains only two CpGs, which makes RVD-mediated discrimination more challenging.
HEK293T and HeLa cells were seeded in 6-well plates and grown to 60% confluence. Each well was transfected with 2 μg of TALE-activator plasmid using 2000 (Invitrogen). Transfected cells were cultured for 3 days before mCherry-positive cells were sorted by flow cytometry. Total RNA was isolated from the mCherry-positive cells and reverse transcribed, and real-time PCR analysis was performed on a ViiA 7 Real-Time PCR System (Applied Biosystems) with SYBR Green 2X premix II (Takara) under standard reaction conditions.
The 5mC-binding RVDs (HA and RG) were observed to activate the gene significantly in HeLa cells. In HEK293T cells, only HD and the universal RVD R* activated LRP2 expression; the 5mC-binding RVDs did not (Fig. 7d). The newly identified RVDs (HA and RG) can therefore distinguish moderately methylated sites from unmethylated sites in vivo.
Embodiment 6: Methylation-dependent genome editing using new RVDs
To test the feasibility of methylation-dependent genome editing, TALEN constructs containing different RVDs were used to target the human PLXNB2 gene for DNA cleavage (Fig. 7e). The constructs were obtained by inserting TALE repeat units into the TALEN expression vector (i.e., the TALEN plasmid backbone), which contains a CMV promoter, a nuclear localization signal, the non-repetitive N-terminal and C-terminal sequences of TALE, and an endonuclease FokI monomer; for the specific sequences, see reference 37 below, and for the construction method, see Yang, Junjiao, et al. "Assembly of Customized TAL Effectors Through Advanced ULtiMATE System." TALENs: Methods and Protocols (2016): 49-60. The second exon of PLXNB2 was selected; it is highly methylated in HeLa cells (data from UCSC). TALEN-mediated DNA cleavage was evaluated by the indel rate (i.e., the insertion and deletion rate).
HeLa cells were seeded in 6-well plates and grown to 60% confluence. For each well, a pair of TALEN plasmids and pmaxGFP (Lonza Group Ltd.) were co-transfected at a ratio of 9:9:2 (0.9 μg:0.9 μg:0.2 μg) using X-tremeGENE HP (Roche). Transfected cells were cultured for 3 days before GFP-positive cells were sorted by flow cytometry. The TALEN target region was PCR-amplified from genomic DNA of the isolated GFP-positive cells. TALEN-mediated indels were analyzed with the mismatch-sensitive T7 endonuclease I (T7E1; New England Biolabs), as described previously (41).
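The text does not state how the indel rate was computed from the T7E1 digest. A commonly used estimate, assumed here purely for illustration, converts the fraction of cleaved PCR product into an indel percentage as 100 x (1 - sqrt(1 - fraction cleaved)). The band intensities in the example are hypothetical, and the function name is not from the source.

```python
import math


def indel_percent(uncut: float, cut_bands: list[float]) -> float:
    """Estimate indel frequency (%) from T7E1 gel band intensities.

    Uses the common approximation 100 * (1 - sqrt(1 - fraction_cleaved)),
    which accounts for random re-annealing of mutant and wild-type strands.
    """
    total = uncut + sum(cut_bands)
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = sum(cut_bands) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Hypothetical densitometry values (arbitrary units): one uncut band, two cut bands.
print(round(indel_percent(uncut=64.0, cut_bands=[18.0, 18.0]), 1))  # 20.0
```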
The results show that TALEN-HD had negligible editing efficiency (Fig. 7f), indicating that the three 5mC modifications present in this region effectively blocked its binding. When the three HD-containing RVDs were replaced with 5mC-binding RVDs (HA, R*, NG, and N* were tested), high indel rates were observed (Fig. 7f and Fig. 8c). These results show that these RVDs enable methylation-dependent genome editing in human cells.
Embodiment 7: RVD-mediated detection of 5hmC in mammalian genomes at single-base resolution
The methylation ratio of cytosine can be determined by bisulfite sequencing, but conventional bisulfite sequencing cannot distinguish 5hmC from 5mC (38). Indirect 5hmC detection using TALE proteins that bind C and 5mC has been reported previously (32). To investigate direct 5hmC detection using TALE proteins containing a 5hmC-recognizing RVD, model DNA sequences with 5hmC, 5mC, or C incorporated at a specific site were first synthesized, and the selectivity of the RVD FS for 5hmC was tested. In the in vitro protection assay, the amount of protected full-length DNA increased linearly with the proportion of 5hmC (Fig. 10); in contrast, the protection rate changed only slightly when the ratio of 5mC to C was varied. In this experiment, DNA fragments of identical sequence containing C, 5mC, or 5hmC were mixed in the ratios shown in the figure. Black circles show the change in protection as the proportion of 5hmC in a 5mC/5hmC mixture increases from 0% to 100%. Black triangles show the change in protection as the proportion of 5hmC in a C/5hmC mixture increases from 0% to 100%. Gray circles show the change in protection as the proportion of 5mC in a C/5mC mixture increases from 0% to 100%. As Fig. 10 shows, TALE-FS protection of the DNA increased only slightly as the proportion of 5mC in the C/5mC mixture increased. By contrast, as the proportion of 5hmC in the C/5hmC and 5mC/5hmC mixtures increased, TALE-FS protection of the DNA increased greatly, showing that TALE-FS selectively protects DNA fragments containing 5hmC. These observations indicate that the 5hmC-specific RVD FS can be used to detect 5hmC in genomic DNA samples under complex modification conditions (where C, 5mC, and 5hmC coexist at the nucleotide of interest).
The FS-containing TALE protein (the TAL effector protein in Fig. 9a) was then used for site-specific 5hmC detection in genomic DNA. Given the complexity of genomic DNA, the CRISPR/Cas9 system was used in place of a restriction enzyme to generate DNA cleavage in this protection assay (Fig. 9a). A 10-bp sequence in an intron of the mouse slc9a9 gene was selected, in which the first cytosine is reported to be highly hydroxymethylated in mES cells (39).
The reaction conditions were as follows: each 10 μl reaction contained 50 ng of genomic DNA, 1 μl of 10X Cas9 nuclease reaction buffer (NEB), and 1 nM DTT. TALE protein was added to a final concentration between 20 nM and 500 nM. The binding reaction was incubated at 25 °C for 30 minutes. Then 5 μl of pre-incubated Cas9 and sgRNA was added and incubation continued at 37 °C for 1 hour. The reaction was quenched by heating at 95 °C. The DNA was purified with Ampure beads and analyzed by qPCR with SYBR Green 2X premix II (Takara) on 96 (Roche).
The results show that the protection efficiency of TALE-FS is far higher than that of TALE-HD (Fig. 9b), indicating that TALE-FS can detect a single 5hmC site in the complex context of genomic DNA. To further examine the capability of this method for 5hmC detection, it was applied to genomic DNA from other cell lines whose hydroxymethylation level at the same locus was unknown. Compared with the mESC sample, the genomic DNA of these cells (RAW264.7, L-M(TK-), and L929 cells) was protected much less at relatively low TALE protein concentrations (Fig. 9c), indicating that the level of 5hmC at this specific site is lower in these cells. These results show that TALE proteins containing the newly identified RVDs can be used to detect the hydroxymethylation state of genomic DNA at single-base resolution.
Embodiment 8: Identification of TALE RVDs that recognize 6mA
Using the same screening system as described in embodiment 2, i.e., the library of individual TALE-(XX')3 RVDs and the linear DNA reporter system carrying the 6mA modification, HEK293T cells were co-transfected with each pair, and the fold activation of EGFP expression from the 6mA reporter by each TALE-(XX')3 was measured by flow cytometry. Figure 11 is a heat map of the 6mA screening results for the 420 RVDs.
The heat map of the 6mA screening results shows that quite a few TALE-(XX')3 arrays activate the 6mA reporter. Their first amino acid is mainly His (H), Lys (K), Asn (N), or Arg (R), and the second amino acid of these efficient RVDs is mostly Ile (I), Pro (P), Ser (S), Thr (T), or Val (V). The overlaid heat map (Fig. 11c) shows that many of the RVDs with high recognition of 6mA, such as the XI, XS, XT, and XV series, also recognize unmodified adenine well; others, such as the XP series, are relatively specific for 6mA. Figure 12 shows the results for RVDs picked from the preliminary screen as recognizing 6mA; specifically, RVDs with an EGFP activation fold greater than 5 for the 6mA reporter were selected for confirmation experiments performed in triplicate.
Recognition of and preference for 6mA are again closely related to the second amino acid of the RVD. Overall, this study found that the XP series and RVDs such as NA, CV, and FT show a clear preference for 6mA, whereas the XI, XC, and some of the XT series show no clear preference between unmodified adenine and N6-methyladenine. The contact of Ile (I) with the A base is a van der Waals interaction between its side chain and C8 and N7 of adenine (45), so it may be unaffected by the added methyl group on the 6-amino group. Among the RVDs highly specific for 6mA (6mA/A > 5), FT, CV, CP, and NP have relatively low background recognition of the other bases lacking methylation (Fig. 13); of these, NP recognizes 6mA best, followed by FT, with CV and CP somewhat lower. These can be regarded as the RVDs of choice for 6mA preference.
It to sum up, the study find that, is above small amino acid (Gly and Ala) or missing, energy at the 13rd usually Enough compatibilities increased to 5mC.This observation result and discovery before, i.e. N* and NG (naturally identifying T) can be in conjunction with 5mC Unanimously.It is possible that big side chain on the 13rd lacks the methyl group that can produce enough space 5mC.But There is exception in this general trend.For example, observing that HG is very weak to the compatibility of 5mC, HG contains compared with HD at the 13rd Lesser residue, HD are the natural binders of C.It is interesting that (thus become RG) when the 12nd His is replaced by Arg, I Observe strong combination with 5mC.In fact, RG also identifies 5hmC.These are observation indicate that identification of double residues to modification There may be increasingly complex modes.In order to fully understand TALEs to the recognition mechanism of modification, need these new RVDs with The crystal structure in compound that the cytimidine of modification is formed.
This work also demonstrates TALE-mediated, methylation-dependent gene activation and genome editing at several highly methylated genomic regions. As an important control, almost no gene activation was observed when the same regions lacked cytosine methylation (in different cells). The new RVDs reported here therefore make it possible to manipulate a target gene in vivo according to its modification state. Many differentially methylated regions (DMRs) are known, and they are involved in many important biological events, including genomic imprinting and disease. The unique ability of TALE proteins to read epigenetic marks thus opens the way to future epigenome-dependent applications of TALEs in vivo.
In addition, high-throughput screening in this study identified RVDs with a good preference for N6-methyladenine, such as CV, FT and NP. These RVDs can be used to build sequence-specific TALE proteins that bind N6-methyladenine and act much like an antibody; they can also be combined with RVDs that recognize only the unmodified A base to detect 6mA quantitatively or qualitatively. RVDs with no preference between 6mA and A, such as NI, can be used for unbiased targeting of sequences that may carry adenine methylation, overcoming the loss of gene-editing efficiency caused by methylation.
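The RVD classes named in the claims of this document can be collected in a lookup table, which makes the choice between a modification-specific RVD and an unbiased one explicit. The sketch below is an illustration under stated assumptions: the table and function names are hypothetical, and the table records only the recognition classes as claimed, not binding strengths or recognition of other bases (NG, for example, also recognizes T).

```python
# Recognition classes as listed in the claims; "*" marks a deleted residue.
# RVD_CLASSES, RVD_TARGETS and rvds_for are illustrative names only.
RVD_CLASSES = [
    (("HA", "NA"), {"5mC"}),
    (("FS",), {"5hmC"}),
    (("N*", "NG", "KP"), {"C", "5mC"}),
    (("HV", "KV"), {"C", "5hmC"}),
    (("K*", "RG"), {"5mC", "5hmC"}),
    (("G*", "H*", "R*", "Y*"), {"C", "5mC", "5hmC"}),
    (("NP", "FT", "CV", "CP"), {"6mA"}),
    (("RI", "NI", "KI", "HI"), {"A", "6mA"}),
]

# Flatten the classes into one RVD -> set-of-recognized-bases mapping.
RVD_TARGETS = {
    rvd: frozenset(bases) for rvds, bases in RVD_CLASSES for rvd in rvds
}


def rvds_for(bases):
    """Return the RVDs whose claimed recognition set equals `bases` exactly."""
    wanted = frozenset(bases)
    return sorted(rvd for rvd, targets in RVD_TARGETS.items() if targets == wanted)
```

Here `rvds_for({"6mA"})` returns `['CP', 'CV', 'FT', 'NP']`, the candidates for a 6mA-specific reader, while `rvds_for({"A", "6mA"})` returns `['HI', 'KI', 'NI', 'RI']`, the candidates for unbiased targeting of a possibly methylated adenine.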
Bibliography
1. Kay S & Bonas U (2009) How Xanthomonas type III effectors manipulate the host plant. Curr Opin Microbiol 12(1):37-43.
2. Kay S, Hahn S, Marois E, Hause G, & Bonas U (2007) A bacterial effector acts as a plant transcription factor and induces a cell size regulator. Science 318(5850):648-651.
3. Boch J & Bonas U (2010) Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annu Rev Phytopathol 48:419-436.
4. Boch J, et al. (2009) Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326(5959):1509-1512.
5. Gurlebeck D, Thieme F, & Bonas U (2006) Type III effector proteins from the plant pathogen Xanthomonas and their role in the interaction with the host plant. J Plant Physiol 163(3):233-255.
6. Moscou MJ & Bogdanove AJ (2009) A simple cipher governs DNA recognition by TAL effectors. Science 326(5959):1501.
7. Bogdanove AJ & Voytas DF (2011) TAL effectors: customizable proteins for DNA targeting. Science 333(6051):1843-1846.
8. Morbitzer R, Romer P, Boch J, & Lahaye T (2010) Regulation of selected genome loci using de novo-engineered transcription activator-like effector (TALE)-type transcription factors. Proc Natl Acad Sci U S A 107(50):21617-21622.
9. Cong L, Zhou R, Kuo YC, Cunniff M, & Zhang F (2012) Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nat Commun 3:968.
10. Garg A, Lohmueller JJ, Silver PA, & Armel TZ (2012) Engineering synthetic TAL effectors with orthogonal target sites. Nucleic Acids Res 40(15):7584-7595.
11. Christian M, et al. (2010) Targeting DNA double-strand breaks with TAL effector nucleases. Genetics 186(2):757-761.
12. Miller JC, et al. (2011) A TALE nuclease architecture for efficient genome editing. Nat Biotechnol 29(2):143-148.
13. Yang J, et al. (2014) Complete decoding of TAL effectors for DNA recognition. Cell research 24(5):628-631.
14. Miller JC, et al. (2015) Improved specificity of TALE-based genome editing using an expanded RVD repertoire. Nat Methods 12(5):465-471.
15. Kohli RM & Zhang Y (2013) TET enzymes, TDG and the dynamics of DNA demethylation. Nature 502(7472):472-479.
16. Pastor WA, Aravind L, & Rao A (2013) TETonic shift: biological roles of TET proteins in DNA demethylation and transcription. Nat Rev Mol Cell Biol 14(6):341-356.
17. Kriaucionis S & Heintz N (2009) The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324(5929):929-930.
18. Tahiliani M, et al. (2009) Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324(5929):930-935.
19. Ito S, et al. (2010) Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466(7310):1129-1133.
20. He YF, et al. (2011) Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science 333(6047):1303-1307.
21. Maiti A & Drohat AC (2011) Thymine DNA glycosylase can rapidly excise 5-formylcytosine and 5-carboxylcytosine: potential implications for active demethylation of CpG sites. J Biol Chem 286(41):35334-35338.
22. Pfaffeneder T, et al. (2011) The discovery of 5-formylcytosine in embryonic stem cell DNA. Angew Chem Int Ed Engl 50(31):7008-7012.
23. Huang Y & Rao A (2014) Connections between TET proteins and aberrant DNA modification in cancer. Trends Genet 30(10):464-474.
24. Bultmann S, et al. (2012) Targeted transcriptional activation of silent oct4 pluripotency gene by combining designer TALEs and inhibition of epigenetic modifiers. Nucleic Acids Res 40(12):5368-5377.
25. Valton J, et al. (2012) Overcoming transcription activator-like effector (TALE) DNA binding domain sensitivity to cytosine methylation. J Biol Chem 287(46):38427-38432.
26. Kim Y, et al. (2013) A library of TAL effector nucleases spanning the human genome. Nat Biotechnol 31(3):251-258.
27. Deng D, et al. (2012) Recognition of methylated DNA by TAL effectors. Cell research 22(10):1502-1504.
28. Dupuy A, et al. (2013) Targeted gene therapy of xeroderma pigmentosum cells using meganuclease and TALEN. PLoS One 8(11):e78678.
29. Hu J, et al. (2014) Direct activation of human and mouse Oct4 genes using engineered TALE and Cas9 transcription factors. Nucleic Acids Res 42(7):4375-4390.
30. Kubik G, Schmidt MJ, Penner JE, & Summerer D (2014) Programmable and highly resolved in vitro detection of 5-methylcytosine by TALEs. Angew Chem Int Ed Engl 53(23):6002-6006.
31. Kubik G & Summerer D (2015) Achieving single-nucleotide resolution of 5-methylcytosine detection with TALEs. Chembiochem 16(2):228-231.
32. Kubik G, Batke S, & Summerer D (2015) Programmable sensors of 5-hydroxymethylcytosine. J Am Chem Soc 137(1):2-5.
33. Maurer S, Giess M, Koch O, & Summerer D (2016) Interrogating Key Positions of Size-Reduced TALE Repeats Reveals a Programmable Sensor of 5-Carboxylcytosine. ACS Chem Biol 11(12):3294-3299.
34. Rathi P, Maurer S, Kubik G, & Summerer D (2016) Isolation of Human Genomic DNA Sequences with Expanded Nucleobase Selectivity. J Am Chem Soc 138(31):9910-9918.
35. Deng D, et al. (2012) Structural basis for sequence-specific recognition of DNA by TAL effectors. Science 335(6069):720-723.
36. Mak AN, Bradley P, Cernadas RA, Bogdanove AJ, & Stoddard BL (2012) The crystal structure of TAL effector PthXo1 bound to its DNA target. Science 335(6069):716-719.
37. Yang J, et al. (2013) ULtiMATE system for rapid assembly of customized TAL effectors. PLoS One 8(9):e75649.
38. Wu H & Zhang Y (2015) Charting oxidized methylcytosines at base resolution. Nat Struct Mol Biol 22(9):656-661.
39. Yu M, et al. (2012) Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell 149(6):1368-1380.
40. Hsu PD, Lander ES, & Zhang F (2014) Development and applications of CRISPR-Cas9 for genome engineering. Cell 157(6):1262-1278.
41. Mussolino C, et al. (2011) A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity. Nucleic Acids Res 39(21):9283-9293.
42. Fang, G., Munera, D., Friedman, D.I., Mandlik, A., Chao, M.C., Banerjee, O., Feng, Z., Losic, B., Mahajan, M.C., Jabado, O.J., et al. (2012). Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nature biotechnology 30, 1232-1239.
43. Fu, Y., Luo, G.Z., Chen, K., Deng, X., Yu, M., Han, D., Hao, Z., Liu, J., Lu, X., Dore, L.C., et al. (2015). N6-methyldeoxyadenosine marks active transcription start sites in Chlamydomonas. Cell 161, 879-892.
44. Greer, E.L., Blanco, M.A., Gu, L., Sendinc, E., Liu, J., Aristizabal-Corrales, D., Hsu, C.H., Aravind, L., He, C., and Shi, Y. (2015). DNA Methylation on N6-Adenine in C. elegans. Cell 161, 868-878.
45. Koziol, M.J., Bradshaw, C.R., Allen, G.E., Costa, A.S., Frezza, C., and Gurdon, J.B. (2016). Identification of methylated deoxyadenosines in vertebrates reveals diversity in DNA modifications. Nature structural & molecular biology 23, 24-30.
46. Mak, A.N., Bradley, P., Cernadas, R.A., Bogdanove, A.J., and Stoddard, B.L. (2012). The crystal structure of TAL effector PthXo1 bound to its DNA target. Science 335, 716-719.
Ratel, D., Ravanat, J.L., Berger, F., and Wion, D. (2006). N6-methyladenine: the other methylated base of DNA. BioEssays: news and reviews in molecular, cellular and developmental biology 28, 309-315.
47. Wion, D., and Casadesus, J. (2006). N6-methyl-adenine: an epigenetic signal for DNA-protein interactions. Nature reviews Microbiology 4, 183-192.
48. Zhang, G., Huang, H., Liu, D., Cheng, Y., Liu, X., Zhang, W., Yin, R., Zhang, D., Zhang, P., Liu, J., et al. (2015). N6-methyladenine DNA modification in Drosophila. Cell 161, 893-906.

Claims (25)

1. An isolated DNA-binding polypeptide comprising a TALE, the TALE comprising one or more RVDs selected from the following:
HA or NA, which specifically recognizes 5mC;
FS, which specifically recognizes 5hmC;
N*, NG or KP, which recognizes both C and 5mC;
HV or KV, which recognizes both C and 5hmC;
K* or RG, which recognizes both 5mC and 5hmC;
G*, H*, R* or Y*, which recognizes all three of C, 5mC and 5hmC;
NP, FT, CV or CP, which specifically recognizes 6mA; or
RI, NI, KI or HI, which recognizes both A and 6mA;
wherein * indicates that the amino acid at this position is deleted.
2. A fusion protein comprising a functional domain and a TALE, the TALE comprising one or more RVDs selected from the following:
HA or NA, which specifically recognizes 5mC;
FS, which specifically recognizes 5hmC;
N*, NG or KP, which recognizes both C and 5mC;
HV or KV, which recognizes both C and 5hmC;
K* or RG, which recognizes both 5mC and 5hmC;
G*, H*, R* or Y*, which recognizes all three of C, 5mC and 5hmC;
NP, FT, CV or CP, which specifically recognizes 6mA; or
RI, NI, KI or HI, which recognizes both A and 6mA;
wherein * indicates that the amino acid at this position is deleted.
3. The fusion protein of claim 2, wherein the functional domain is a functional domain that regulates gene expression, an epigenetic modification functional domain, a gene editing functional domain or a fluorescent protein.
Preferably, the functional domain that regulates gene expression is a transcriptional activator, a transcriptional repressor or a functional fragment thereof; the epigenetic modification functional domain is a methyltransferase, a demethylase or a functional fragment thereof; and the gene editing functional domain is a nuclease or a functional fragment thereof.
More preferably, the gene editing functional domain is an endonuclease, preferably FokI endonuclease, more preferably the DNA cleavage domain of FokI endonuclease.
4. A polynucleotide encoding the DNA-binding polypeptide of claim 1 or the fusion protein of any one of claims 2-3.
5. A vector comprising the polynucleotide of claim 4.
6. A host cell comprising the polynucleotide of claim 4 or the vector of claim 5.
7. Use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting a methylated base in a target sequence of a target gene, comprising:
(1) use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting the methylated base 5mC in a target sequence of a target gene, wherein one or more RVDs of the TALE repeat domain are HA or NA;
(2) use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting the methylated base 5hmC in a target sequence of a target gene, wherein one or more RVDs of the TALE repeat domain are FS; or
(3) use of a protein comprising a TALE repeat domain in the preparation of a reagent for detecting the methylated base 6mA in a target sequence of a target gene, wherein one or more RVDs of the TALE repeat domain are NP, FT, CV or CP.
8. Use of the DNA-binding polypeptide of claim 1, the fusion protein of any one of claims 2-3, the polynucleotide of claim 4, the vector of claim 5 or the host cell of claim 6 in the preparation of a reagent for targeted binding to a target sequence of a target gene in a cell.
9. Use of the fusion protein of any one of claims 2-3, or a polynucleotide encoding the fusion protein, in the preparation of a reagent for regulating expression of a target gene in a cell, wherein the functional domain contained in the fusion protein is a functional domain that regulates gene expression.
Preferably, the functional domain that regulates gene expression is a transcriptional activator or a functional fragment thereof, or a transcriptional repressor or a functional fragment thereof.
10. Use of the fusion protein of any one of claims 2-3, or a polynucleotide encoding the fusion protein, in the preparation of a reagent for gene editing of a target gene in a cell, wherein the functional domain contained in the fusion protein is a gene editing functional domain.
Preferably, the gene editing is nucleic acid cleavage, and the gene editing functional domain is a nuclease or a functional fragment thereof, preferably an endonuclease or a functional fragment thereof, more preferably FokI endonuclease or its DNA cleavage domain.
11. Use of the fusion protein of any one of claims 2-3, or a polynucleotide encoding the fusion protein, in the preparation of a reagent for epigenetic modification of a target gene in a cell, wherein the functional domain contained in the fusion protein is an epigenetic modification functional domain.
Preferably, the epigenetic modification functional domain is a methyltransferase, a demethylase or a functional fragment thereof.
12. A method for targeted binding to a target sequence of a target gene in a cell, comprising: introducing the DNA-binding polypeptide of claim 1, the fusion protein of any one of claims 2-3 or the polynucleotide of claim 4 into the cell, so that the TALE in the DNA-binding polypeptide or fusion protein binds to the target sequence of the target gene.
13. the method for claim 12, in which:
TALE in the DNA combination polypeptide or fusion protein includes the RVD selected from HA or NA, only when the target gene TALE ability and mesh in target sequence when being 5mC on the recognition site of the RVD, in the DNA combination polypeptide or fusion protein The target sequence for marking gene combines;
TALE in the DNA combination polypeptide or fusion protein includes the RVD selected from FS, only when the target sequence of the target gene TALE ability and target base in column when being 5hmC on the recognition site of the RVD, in the DNA combination polypeptide or fusion protein The target sequence of cause combines;
TALE in the DNA combination polypeptide or fusion protein includes the RVD selected from NP, FT, CV or CP, only when the target TALE in the target sequence of gene when being 6mA on the recognition site of the RVD, in the DNA combination polypeptide or fusion protein Just in conjunction with the target sequence of target gene;
TALE in the DNA combination polypeptide or fusion protein includes the RVD selected from N*, NG or KP, the target of the target gene The methylation state of particular bases in sequence on the recognition site of the RVD is uncertain, it may be possible to C or 5mC;
TALE in the DNA combination polypeptide or fusion protein includes the RVD selected from HV or KV, the target sequence of the target gene In particular bases on the recognition site of the RVD methylation state it is uncertain, it may be possible to C or 5hmC;
TALE in the DNA combination polypeptide or fusion protein includes the RVD selected from K* or RG, the target sequence of the target gene In particular bases on the recognition site of the RVD methylation state it is uncertain, it may be possible to 5mC or 5hmC;
TALE in the DNA combination polypeptide or fusion protein includes the RVD selected from G*, H*, R* or Y*, the target gene The methylation state of particular bases in target sequence on the recognition site of the RVD is uncertain, it may be possible to C, 5mC or 5hmC; Or
TALE in the DNA combination polypeptide or fusion protein includes the RVD selected from RI, NI, KI or HI, the target gene The methylation state of particular bases in target sequence on the recognition site of the RVD is uncertain, it may be possible to A or 6mA;
Wherein * indicates amino acid deletions in this position.
14. A method for regulating expression of a target gene in a cell, comprising: introducing the fusion protein of any one of claims 2-3, or a polynucleotide encoding the fusion protein, into the cell, so that the TALE in the fusion protein binds to the target sequence of the target gene and expression of the target gene is regulated by the functional domain in the fusion protein, wherein the functional domain is a functional domain that regulates gene expression.
15. The method of claim 14, wherein:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 5mC;
the TALE in the fusion protein comprises an RVD selected from FS, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be A or 6mA;
wherein * indicates that the amino acid at this position is deleted.
More preferably, the functional domain that regulates gene expression is a transcriptional activator or a functional fragment thereof, or a transcriptional repressor or a functional fragment thereof.
16. A method for gene editing of a target gene in a cell, comprising: introducing the fusion protein of any one of claims 2-3, or a polynucleotide encoding the fusion protein, into the cell, so that the TALE in the fusion protein binds to the target sequence of the target gene and the target gene is edited by the functional domain in the fusion protein, wherein the functional domain is a gene editing functional domain.
17. The method of claim 16, wherein:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 5mC;
the TALE in the fusion protein comprises an RVD selected from FS, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be A or 6mA;
wherein * indicates that the amino acid at this position is deleted.
More preferably, the gene editing is nucleic acid cleavage, and the gene editing functional domain is a nuclease or a functional fragment thereof, preferably an endonuclease or a functional fragment thereof, more preferably FokI endonuclease or its DNA cleavage domain.
18. A method for epigenetic modification of a target gene in a cell, comprising: introducing the fusion protein of any one of claims 2-3, or a polynucleotide encoding the fusion protein, into the cell, so that the TALE in the fusion protein binds to the target sequence of the target gene and the target gene is epigenetically modified by the functional domain in the fusion protein, wherein the functional domain is an epigenetic modification functional domain.
19. The method of claim 18, wherein:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 5mC;
the TALE in the fusion protein comprises an RVD selected from FS, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be A or 6mA;
wherein * indicates that the amino acid at this position is deleted.
More preferably, the epigenetic modification functional domain is a methyltransferase, a demethylase or a functional fragment thereof.
20. A method for labeling chromosomes in living cells, comprising: introducing the fusion protein of any one of claims 2-3, or a polynucleotide encoding the fusion protein, into a cell, so that the TALE in the fusion protein binds to the target sequence of a target gene, wherein the functional domain is a fluorescent protein, and binding of the TALE in the fusion protein to the target sequence of the target gene fluorescently labels the target sequence.
21. The method of claim 20, wherein:
the TALE in the fusion protein comprises an RVD selected from HA or NA, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 5mC;
the TALE in the fusion protein comprises an RVD selected from FS, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 5hmC;
the TALE in the fusion protein comprises an RVD selected from NP, FT, CV or CP, and the TALE binds to the target sequence of the target gene only when the base at the recognition site of the RVD in the target sequence is 6mA;
the TALE in the fusion protein comprises an RVD selected from N*, NG or KP, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5mC;
the TALE in the fusion protein comprises an RVD selected from HV or KV, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C or 5hmC;
the TALE in the fusion protein comprises an RVD selected from K* or RG, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be 5mC or 5hmC;
the TALE in the fusion protein comprises an RVD selected from G*, H*, R* or Y*, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be C, 5mC or 5hmC; or
the TALE in the fusion protein comprises an RVD selected from RI, NI, KI or HI, and the methylation state of the base at the recognition site of the RVD in the target sequence of the target gene is uncertain and may be A or 6mA;
wherein * indicates that the amino acid at this position is deleted.
22. A method for detecting whether 5mC is present at a specific site of a target sequence in the genome of a cell, comprising:
(1) introducing a protein comprising a TALE into the cell, wherein the TALE targets the target sequence and the RVD in the TALE that recognizes the specific site is HA or NA;
(2) then introducing a nuclease into the cell, wherein the targeted cleavage site of the nuclease lies within the TALE target sequence;
(3) detecting whether the target sequence is cleaved, and thereby determining whether 5mC is present at the specific site of the target sequence: if the target sequence is not cleaved, the TALE has bound the target sequence so that the nuclease cannot bind and cleave it, and 5mC is present at the specific site; if the target sequence is cleaved, the TALE has not bound the target sequence, the nuclease has bound and cleaved it, and 5mC is absent from the specific site.
23. A method for detecting whether 5hmC is present at a specific site of a target sequence in the genome of a cell, comprising:
(1) introducing a protein comprising a TALE into the cell, wherein the TALE targets the target sequence and the RVD in the TALE that recognizes the specific site is FS;
(2) then introducing a nuclease into the cell, wherein the targeted cleavage site of the nuclease lies within the TALE target sequence;
(3) detecting whether the target sequence is cleaved, and thereby determining whether 5hmC is present at the specific site of the target sequence: if the target sequence is not cleaved, the TALE has bound the target sequence so that the nuclease cannot bind and cleave it, and 5hmC is present at the specific site; if the target sequence is cleaved, the TALE has not bound the target sequence, the nuclease has bound and cleaved it, and 5hmC is absent from the specific site.
24. A method for detecting whether 6mA is present at a specific site of a target sequence in the genome of a cell, comprising:
(1) introducing a protein comprising a TALE into the cell, wherein the TALE targets the target sequence and the RVD in the TALE that recognizes the specific site is NP, FT, CV or CP;
(2) then introducing a nuclease into the cell, wherein the targeted cleavage site of the nuclease lies within the TALE target sequence;
(3) detecting whether the target sequence is cleaved, and thereby determining whether 6mA is present at the specific site of the target sequence: if the target sequence is not cleaved, the TALE has bound the target sequence so that the nuclease cannot bind and cleave it, and 6mA is present at the specific site; if the target sequence is cleaved, the TALE has not bound the target sequence, the nuclease has bound and cleaved it, and 6mA is absent from the specific site.
25. the method for any one of claim 30-32, wherein the nuclease is endonuclease.
Preferably, wherein the nuclease is Cas9 nuclease, and the Cas9 nuclease and sgRNA are total in step (1) With introducing cell.
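The read-out logic shared by the three detection methods above (a bound TALE blocks the nuclease, so an uncut target implies that the modification is present) can be sketched as follows. The function and table names are hypothetical, and the sketch assumes an idealized all-or-nothing cleavage result rather than a measured cleavage efficiency.

```python
# RVDs named in the detection claims as specific readers of each modification.
# READER_RVDS and modification_present are illustrative names only.
READER_RVDS = {
    "5mC": ("HA", "NA"),
    "5hmC": ("FS",),
    "6mA": ("NP", "FT", "CV", "CP"),
}


def modification_present(modification, rvd, target_cut):
    """Interpret the cleavage read-out for one site probed with one reader RVD."""
    if rvd not in READER_RVDS.get(modification, ()):
        raise ValueError(f"{rvd} is not listed as a specific reader of {modification}")
    # Not cut: the TALE bound and blocked the nuclease, so the mark is present.
    # Cut: the TALE did not bind, so the mark is absent.
    return not target_cut
```

For example, `modification_present("5mC", "HA", target_cut=False)` reports that 5mC is present, while the same call with `target_cut=True` reports that it is absent.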
CN201710660240.8A 2017-08-04 2017-08-04 TALE RVD for specifically recognizing methylated modified DNA base and application thereof Active CN109384833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710660240.8A CN109384833B (en) 2017-08-04 2017-08-04 TALE RVD for specifically recognizing methylated modified DNA base and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710660240.8A CN109384833B (en) 2017-08-04 2017-08-04 TALE RVD for specifically recognizing methylated modified DNA base and application thereof

Publications (2)

Publication Number Publication Date
CN109384833A true CN109384833A (en) 2019-02-26
CN109384833B CN109384833B (en) 2021-04-27

Family

ID=65412408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710660240.8A Active CN109384833B (en) 2017-08-04 2017-08-04 TALE RVD for specifically recognizing methylated modified DNA base and application thereof

Country Status (1)

Country Link
CN (1) CN109384833B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110106231A (en) * 2019-04-22 2019-08-09 Wuhan University Method for detecting methylation modification at the N6 or N1 position of adenine in nucleic acid by using dUTP or dTTP
CN111876414A (en) * 2020-06-24 2020-11-03 Hunan University of Arts and Science Improved yeast upstream activation element and application thereof in fish
CN113677694A (en) * 2019-04-09 2021-11-19 Japan Science and Technology Agency Nucleic acid binding proteins
CN114591949A (en) * 2020-12-04 2022-06-07 Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences Method for detecting endogenous low-abundance gene and lncRNA level of cell

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103025344A (en) * 2010-05-17 2013-04-03 Sangamo BioSciences, Inc. Novel DNA-binding proteins and uses thereof
WO2013102290A1 (en) * 2012-01-04 2013-07-11 Tsinghua University Method for specifically recognizing DNA containing 5-methylated cytosine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sarah Flade et al.: "The N6-Position of Adenine is a Blind Spot for TAL-Effectors that Enables Effective Binding of Methylated and Fluorophore-Labeled DNA", ACS Chemical Biology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113677694A (en) * 2019-04-09 2021-11-19 Japan Science and Technology Agency Nucleic acid binding proteins
CN110106231A (en) * 2019-04-22 2019-08-09 Wuhan University Method for detecting methylation modification at the N6 or N1 position of adenine in nucleic acid by using dUTP or dTTP
CN110106231B (en) * 2019-04-22 2021-08-17 Wuhan University Method for detecting methylation modification at the N6 or N1 position of adenine in nucleic acid by using dUTP or dTTP
CN111876414A (en) * 2020-06-24 2020-11-03 Hunan University of Arts and Science Improved yeast upstream activation element and application thereof in fish
CN114591949A (en) * 2020-12-04 2022-06-07 Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences Method for detecting endogenous low-abundance gene and lncRNA level of cell

Also Published As

Publication number Publication date
CN109384833B (en) 2021-04-27

Similar Documents

Publication Publication Date Title
CN107922931B (en) Thermostable Cas9 nuclease
US11946040B2 (en) Adenine DNA base editor variants with reduced off-target RNA editing
AU2018254619B2 (en) Variants of Cpf1 (CAS12a) with altered PAM specificity
KR20190059966A (en) S. pyogenes Cas9 mutant genes and polypeptides encoded thereby
JP6408914B2 (en) Modified CASCADE ribonucleoproteins and their uses
US20200140835A1 (en) Engineered CRISPR-Cas9 Nucleases
JP2023126956A (en) Using split deaminases to limit unwanted off-target base editor deamination
EP3467125B1 (en) Using rna-guided foki nucleases (rfns) to increase specificity for rna-guided genome editing
KR102655021B1 (en) Using nucleosome interacting protein domains to enhance targeted genome modification
CN109790527A (en) Variants of CRISPR from Prevotella and Francisella 1 (Cpf1)
KR20190104342A (en) Thermostable CAS9 nuclease
CN109384833A (en) The TALE RVD of specific recognition methylation modifying DNA base and its application
JP2012531909A (en) Rapid screening of biologically active nucleases and isolation of nuclease modified cells
WO2021042062A2 (en) Combinatorial adenine and cytosine dna base editors
CN114672473B (en) Optimized Cas protein and application thereof
JP7207665B2 (en) TALE RVDs that specifically recognize DNA bases modified by methylation and uses thereof
WO2001068807A2 (en) Identification of in vivo dna binding loci of chromatin proteins using a tethered nucleotide modification enzyme
US11608570B2 (en) Targeted in situ protein diversification by site directed DNA cleavage and repair
EP3978513A1 (en) Efficient ppr protein production method and use thereof
EP4095243A1 (en) System for hybridization-based precision genome cleavage and editing, and uses thereof
Park et al. Characterization of functional, noncovalently assembled zinc finger nucleases
Gao Combining CRISPR-Cas9 and Proximity Labeling to Illuminate Chromatin Composition, Organization, and Regulation
Tarlachkov et al. Cloning, purification and characterization of translationally fused protein DNA methyltransferase M• HhaI-EGFP
WO2019090287A2 (en) Sequence detection systems
KR20120087860A (en) A novel zinc finger nuclease and uses thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant