CN114940979B

CN114940979B - Method for improving cation-pi interaction by utilizing genetic code expansion and application

Info

Publication number: CN114940979B
Application number: CN202210140263.7A
Authority: CN
Inventors: 林世贤; 赵红霞; 刘超; 方誉
Original assignee: Hangzhou Chihua Hesheng Pharmaceutical Technology Co ltd
Current assignee: Hangzhou Chihua Hesheng Pharmaceutical Technology Co ltd
Priority date: 2022-02-16
Filing date: 2022-02-16
Publication date: 2024-01-23
Anticipated expiration: 2042-02-16
Also published as: CN114940979A

Abstract

Cation-pi interactions are an important class of non-covalent interactions between molecules and play an important role in biology and chemistry. Despite great success in understanding the origin and biological function of cation-pi interactions, research on designing and synthesizing stronger cation-pi interactions is scarce. The invention provides a method for improving cation-pi interaction by utilizing genetic code expansion, and an application thereof. Taking histone methylation decoding proteins as an example, a tryptophan analogue bearing a strongly electron-donating side-chain substituent is introduced at a tryptophan site of the aromatic cage of the decoding protein by genetic code expansion, which improves the affinity of the decoding protein for histone methylation modifications and establishes a super-affinity molecular recognition system for recognizing histone methylation modifications.

Description

Method for improving cation-pi interaction by utilizing genetic code expansion and application

Technical Field

The invention relates to a method for improving cation-pi interaction, in particular to a method for improving cation-pi interaction by utilizing genetic code expansion and application thereof, belonging to the technical field of biology.

Background

Non-covalent interactions regulate the structure and function of biomolecules and play a key role in molecular folding and recognition. They include cation-pi interactions, hydrogen bonding, ionic interactions and hydrophobic interactions. The cation-pi interaction is a strong non-covalent interaction between a cation and a pi electron cloud, and it plays an important role in biomolecular self-assembly, molecular recognition, molecular adhesion and molecular folding. A series of studies in recent years on the origin and rationale of cation-pi interactions suggests that they play a critical role in substrate-receptor binding and in the recognition of histone post-translational modifications. It has been reported that replacing aromatic amino acids in aromatic cages with fluorine-substituted tryptophan analogues weakens the cation-pi interaction because of the electron-withdrawing ability of fluorine. In addition, mutating aromatic amino acids in the aromatic cage of a protein can significantly reduce or abolish the interaction. Despite great success in understanding the origin and biological function of cation-pi interactions, the design and synthesis of stronger cation-pi interactions remains essentially unexplored.

Take histone methylation decoding proteins as an example. Histone methylation refers to methyltransferase-mediated methylation of arginine or lysine residues in the N-terminal tails of histones H3 and H4. Recognition of these modifications by decoding proteins is involved in regulating important life processes such as gene expression, DNA replication, DNA damage repair and cell cycle regulation, so studying the distribution and abundance of histone methylation is the basis for understanding the histone code and the molecular mechanisms of chromatin regulation. Antibodies against histone methylation are currently the main technical means of detecting the genomic distribution and site specificity of histone methylation. Unfortunately, antibodies suffer from sequence-dependent affinity, low substrate resolution, non-specific recognition and suitability for in vitro experiments only, which limits their use and the precise dissection of histone methylation function. There is therefore a need to develop new methods for high-affinity detection of histone methylation modifications.

Research shows that a histone methylation decoding protein forms a hydrophobic pocket (the aromatic cage) from 2-4 aromatic amino acids and specifically recognizes histone methylation modifications through cation-pi interactions. Because decoding proteins recognize histone methylation specifically, detection methods based on decoding protein domains have become an alternative to specific antibodies and have attracted wide attention. For example, the ADD domain of ATRX and the PWWP domain of DNMT3A are used to capture H3K9me3 and H3K36me3, respectively; the MBT2 domain of L3MBTL1 broadly recognizes mono-methylated or di-methylated lysine and has therefore been developed into a method for capturing the methyl-lysine proteome. Detection methods based on decoding protein domains are easy to engineer, economical and able to capture a variety of PTMs, but the affinity of decoding protein domains for histone methylation modifications is at the micromolar level, which limits wide application of the technology. Therefore, there is a need to design high-affinity histone methylation decoding proteins to facilitate the use of decoding protein domains in the enrichment, imaging and sequencing of histone methylation modifications.

Genetic code expansion (GCE) introduces unnatural amino acids with novel structures and unique properties into proteins in a site-specific manner. It expands the set of building blocks available for protein synthesis and provides a powerful tool for precise protein manipulation and for the identification and optimization of protein function. The invention patent No. ZL 2019 1 0440254.8, entitled "construction of orthogonal aminoacyl-tRNA synthetase/tRNA system using chimeric design method", describes how the universal orthogonality of the pyrrolysyl-tRNA synthetase (PylRS)/tRNA orthogonal pair is transplanted onto the human mitochondrial phenylalanyl-tRNA synthetase (PheRS)/tRNA pair by chimeric protein design, giving a chimeric phenylalanyl-tRNA synthetase (chPheRS)/tRNA system with universal orthogonality. This widens the range of unnatural amino acids that can be recognized and provides a new tool for genetic code expansion. The chimeric phenylalanyl-tRNA synthetase of that system can specifically recognize tryptophan analogues such as 6-methyl-tryptophan, 7-methyl-tryptophan, 6-chloro-tryptophan, 7-chloro-tryptophan, 6-cyano-tryptophan and 7-cyano-tryptophan.

Disclosure of Invention

The invention aims to provide a method for improving cation-pi interaction by using genetic code expansion, which replaces tryptophan forming the cation-pi interaction in a biological molecule with tryptophan analogues by using genetic code expansion technology, thereby improving the binding energy of the cation-pi interaction.

The technical scheme adopted for solving the technical problems is as follows:

a method for increasing cation-pi interactions using genetic code expansion, which replaces tryptophan forming an aromatic cage of cation-pi interactions in a biomolecule with tryptophan analogues using genetic code expansion techniques to increase binding energy of the cation-pi interactions.

Non-covalent interactions regulate the structure and function of biomolecules and play a key role in molecular folding and molecular recognition. The cation-pi interaction is a strong non-covalent interaction between a cation and a pi electron cloud and plays an important role in biomolecular self-assembly, molecular recognition, molecular adhesion and molecular folding, yet methods for enhancing cation-pi interactions are essentially lacking. The invention provides a method for synthesizing tryptophan compounds bearing electron-donating groups and develops a method for replacing tryptophan in an aromatic cage by genetic code expansion; the synthesized compounds A1-A6 were verified to significantly improve the binding energy of the cation-pi interaction. Histone methylation modification plays an important role in regulating gene expression, DNA replication, DNA damage repair, the cell cycle and other important life processes. Current methods for detecting histone methylation modifications depend mainly on specific antibodies, but antibodies suffer from sequence-dependent affinity, low substrate resolution, non-specific recognition and suitability for in vitro experiments only. The method is used to establish a super-affinity molecular recognition system for histone methylation modifications, which is applied to the detection, imaging and sequencing of histone methylation modifications.

In the invention, a series of tryptophan analogues are site-specifically introduced into the aromatic cage of a decoding protein to tune its binding affinity for histone methylation modifications. With this strategy the affinity between histone methylation and its decoding protein is improved 4-8 fold, the affinity between H3K4me3 and the PHD decoding protein domain reaches the nanomolar level through tandem repeat design, and a super-affinity molecular recognition system for detecting histone methylation modifications is developed.

The method of the invention can be applied to study any biomacromolecule that forms a cation-pi interaction.

Preferably, the method comprises the steps of:

S1, designing and synthesizing tryptophan analogues substituted with strongly electron-donating side-chain groups, wherein the tryptophan analogue is an unnatural amino acid selected from one of 6-methyl-tryptophan (A1), 6-methoxy-tryptophan (A2), 7-methyl-tryptophan (A3), 7-methoxy-tryptophan (A4), 6,7-methoxy-tryptophan (A5), 6,7-methyl-tryptophan (A6), 7,8-dihydrofuran-tryptophan (A7), 6,7-dihydrofuran-tryptophan (A8), 7,8-furan-tryptophan (A9), 6,7-furan-tryptophan (A10), 6,7-dioxole-tryptophan (A11) or 6,7-cyclopentane-tryptophan (A12), and the structural formulas of the tryptophan analogues A1 to A12 are as follows:

S2, screening chimeric phenylalanine aminoacyl-tRNA synthetase mutants which specifically recognize tryptophan analogues A1 to A12;

S3, taking a biomolecule that forms a cation-pi interaction as the research object, and site-specifically introducing the tryptophan analogue into the biomolecule by genetic code expansion using the chimeric phenylalanine aminoacyl-tRNA synthetase mutant, to obtain a protein bearing the tryptophan analogue.

Preferably, indoles B substituted at different positions are used as reactants to obtain the target product. The chemical structural formula of the indoles substituted at different positions is as follows: [structural formula]. The general structural formula of the target product is as follows: [structural formula], wherein X is selected from one of an oxygen atom or a carbon atom.

Preferably, the synthesis method of the tryptophan analogues A1 to A12 comprises the following steps:

Step one: synthesis of starting material compound B:

Starting material B is selected from one of 6-methyl-indole (B1), 6-methoxy-indole (B2), 7-methyl-indole (B3), 7-methoxy-indole (B4), 6,7-methoxy-indole (B5), 6,7-methyl-indole (B6), 7,8-dihydrofuran-indole (B7), 6,7-dihydrofuran-indole (B8), 7,8-furan-indole (B9), 6,7-furan-indole (B10), 6,7-dioxole-indole (B11) or 6,7-cyclopentane-indole (B12), the structural formulas of the indole analogues B1 to B12 being as follows:

(1) Synthesis of compounds B6, B7, B8, B9, B10: aniline (G6, G7, G8, G9 or G10) and triethanolamine are used as reactants, RuCl₃·nH₂O, SnCl₂·2H₂O and PPh₃ are used as catalysts, and the reaction is carried out in anhydrous dioxane to obtain starting material compound B;

(2) Synthesis of compounds B11, B12: aniline (G11 or G12), chloral hydrate and hydroxylamine hydrochloride are used as reactants, sulfuric acid as a catalyst and water as a solvent to obtain a crude product; the crude product is reacted with methanesulfonic acid to obtain an isatin product; finally, the isatin product is reduced with lithium aluminum hydride to obtain starting material compound B;

Step two: synthesis of compound C: starting material compound B and iodine are used as reactants, potassium hydroxide as the base and anhydrous N,N-dimethylformamide as the solvent, and the reaction gives intermediate compound C;

Step three: synthesis of compound D: compound C and di-tert-butyl dicarbonate are used as reactants, triethylamine as the base and DMAP as the catalyst, and the reaction in anhydrous dichloromethane gives intermediate compound D;

Step four: synthesis of compound E: compound D and Boc-3-iodo-L-alanine methyl ester are used as reactants, palladium acetate as the catalyst and S-Phos as the ligand, and the reaction in anhydrous N,N-dimethylformamide under nitrogen protection gives intermediate compound E;

Step five: synthesis of compound F: compound E is reacted with potassium hydroxide as the base, with methanol and water as solvents, to obtain intermediate compound F;

Step six: synthesis of compound A: compound F is reacted with trifluoroacetic acid as a catalyst, with anhydrous dichloromethane as the solvent, to obtain the target products (tryptophan analogues A1 to A12).

Preferably, in the fourth step, the palladium acetate catalyst is used in an amount of 2% of the substrate (compound D) in terms of molar amount;

the reaction time is 1-2h in the first step, 2h in the second step, 8h in the third step, 5h in the fourth step, 2-3h in the fifth step and 2h in the sixth step;

the reaction temperature is 90 ℃ in the first step, 0 ℃ in the second step, 0 ℃ in the third step, 40 ℃ in the fourth step, 25 ℃ in the fifth step and 0 ℃ in the sixth step.
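The reaction times and temperatures listed above can be gathered into a small lookup table, which also makes the 2 mol% palladium acetate loading of step four easy to compute. The following is a minimal Python sketch. The names `STEP_CONDITIONS`, `pd_acetate_mg` and `total_time_range_h` are illustrative, and the molar mass of palladium acetate (224.51 g/mol) is a standard literature value rather than a figure from this document.

```python
# Conditions as stated in the text: ((min hours, max hours), temperature in deg C).
STEP_CONDITIONS = {
    "step one (compound B)": ((1, 2), 90),
    "step two (compound C)": ((2, 2), 0),
    "step three (compound D)": ((8, 8), 0),
    "step four (compound E)": ((5, 5), 40),
    "step five (compound F)": ((2, 3), 25),
    "step six (compound A)": ((2, 2), 0),
}

PD_OAC2_MOLAR_MASS = 224.51  # g/mol; standard value, an assumption not in the source
PD_LOADING = 0.02            # 2 mol% relative to compound D, as stated for step four


def pd_acetate_mg(substrate_mmol):
    """Mass of palladium acetate (mg) for a given amount of compound D (mmol)."""
    # mmol * (g/mol) gives mg directly.
    return substrate_mmol * PD_LOADING * PD_OAC2_MOLAR_MASS


def total_time_range_h():
    """Shortest and longest total reaction time over the six steps, in hours."""
    shortest = sum(hours[0] for hours, _ in STEP_CONDITIONS.values())
    longest = sum(hours[1] for hours, _ in STEP_CONDITIONS.values())
    return shortest, longest
```

For a hypothetical 5 mmol batch of compound D, `pd_acetate_mg(5.0)` gives about 22.5 mg of catalyst. The total covers reaction time only and ignores work-up and purification.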

Preferably, in S2, (1) chimeric phenylalanyl-tRNA synthetase mutants that specifically recognize tryptophan analogues are selected by constructing a saturation mutagenesis library of the residues in the amino acid binding pocket of the chimeric phenylalanyl-tRNA synthetase; (2) the efficiency and specificity with which the synthetase mutants recognize the analogues are identified by a GFP fluorescence assay and LC-MS; (3) the chimeric phenylalanyl-tRNA synthetase mutants obtained by screening are used for expression in hosts such as bacteria, cells or viruses.

In screening for chimeric phenylalanyl-tRNA synthetase mutants that specifically recognize 6-methoxy-tryptophan (A2), 7-methoxy-tryptophan (A4), 6,7-methoxy-tryptophan (A5), 6,7-methyl-tryptophan (A6), 7,8-dihydrofuran-tryptophan (A7), 6,7-dihydrofuran-tryptophan (A8), 7,8-furan-tryptophan (A9), 6,7-furan-tryptophan (A10), 6,7-dioxole-tryptophan (A11) and 6,7-cyclopentane-tryptophan (A12), a saturation mutagenesis library of the amino acid binding pocket residues (E391, V393, F464, T467, M490 and A507) is constructed and subjected to positive and negative screening, finally yielding a phenylalanyl-tRNA synthetase mutant containing the six mutations E391D, V393G, M490V, F464V, T467G and A507G, whose nucleotide sequence and amino acid sequence are shown in SEQ ID NO: 1-2.
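Mutation labels of the form used above (wild-type residue, position, new residue) are easy to parse and check programmatically. The sketch below is a minimal Python illustration; the function names are hypothetical, and the full synthetase sequence (SEQ ID NO: 2) is not reproduced here, so any real use would need that sequence as input.

```python
import re

# The six substitutions named in the text.
MUTATIONS = ["E391D", "V393G", "M490V", "F464V", "T467G", "A507G"]

_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'E391D' into (wild_type, position, mutant)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def apply_mutations(protein, labels):
    """Apply point mutations to a protein sequence using 1-based positions.

    Raises ValueError if the stated wild-type residue does not match the sequence.
    """
    residues = list(protein)
    for label in labels:
        wild_type, position, mutant = parse_mutation(label)
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(f"{label}: sequence has {found} at position {position}")
        residues[position - 1] = mutant
    return "".join(residues)
```

The wild-type check guards against numbering offsets, which are a common source of error when a construct carries tags or is numbered from a different start than the reference sequence.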

Preferably, in S3, (1) the codon of the tryptophan that forms the aromatic cage of the decoding protein is mutated to the amber stop codon (TAG); (2) the decoding protein mutant is co-expressed with the chimeric phenylalanyl-tRNA synthetase mutant, and the corresponding tryptophan analogue (typically 1 mM) is added during expression; (3) the decoding protein variant is purified by the GST-tag protein purification method, and its fidelity is identified by LC-MS.
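Sub-step (1) of S3, replacing a tryptophan codon with the amber codon, can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be an in-frame coding sequence whose residue numbering starts at its first codon, and the short sequence in the usage note is a toy example, not one of the sequences of the invention.

```python
def introduce_amber(cds, residue_number):
    """Replace the Trp codon (TGG) at a 1-based residue number with TAG."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    start = (residue_number - 1) * 3
    codon = cds[start:start + 3]
    if codon != "TGG":
        raise ValueError(
            f"residue {residue_number} is {codon!r}, not a Trp codon (TGG)"
        )
    return cds[:start] + "TAG" + cds[start + 3:]


def amber_positions(cds):
    """Return the 1-based residue numbers of in-frame TAG codons."""
    cds = cds.upper()
    return [
        i // 3 + 1
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 3] == "TAG"
    ]
```

With the toy sequence `ATGGCATGGAAATAA` (Met-Ala-Trp-Lys-stop), `introduce_amber(seq, 3)` swaps the third codon for TAG, and `amber_positions` then reports residue 3. If the decoding protein is expressed behind a fusion tag, residue numbers must be shifted accordingly.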

Preferably, the nucleotide sequence and the amino acid sequence of the chimeric phenylalanine-tRNA synthetase mutants recognizing tryptophan analogues A1-A6 are set forth in SEQ ID NO: 1-2.

Preferably, the method is directed to a histone methylation decoding protein domain, said decoding protein domain being any one of the Chromo, PHD, PWWP, Tudor, MBT, CW, SPIN and BAH domains.

Use of a protein bearing a tryptophan analogue obtained by the method for constructing a super-affinity decoding protein recognition system that specifically recognizes histone methylation modifications. Taking H3K4me3 as an example, a super-affinity KDM5A-PHD3 decoding protein that recognizes H3K4me3 is established and named PHD super-affinity; taking H3K9me3 as an example, a super-affinity CDY1-Chromo decoding protein that recognizes H3K9me3 is established and named Chromo super-affinity; taking H3K27me3 as an example, a super-affinity BAHD1-BAH decoding protein that recognizes H3K27me3 is established and named BAH super-affinity; taking H3K36me3 as an example, a super-affinity DNMT3B-PWWP decoding protein that recognizes H3K36me3 is established and named PWWP super-affinity.

A super-affinity decoding protein that specifically recognizes a histone methylation modification is established, with affinity reaching the nanomolar level and potency superior to that of antibodies specific for the histone methylation modification, and is used to detect histone methylation modifications in biological samples. When labeled with a fluorophore, the super-affinity decoding protein can be applied to the detection of histone methylation modifications in biological samples by imaging. The super-affinity decoding protein recognition system can be applied to live imaging and can dynamically detect changes in histone methylation modifications. It can also be used to enrich histone methylation modifications from a sample and applied to single-cell sequencing. Tryptophan analogues substituted with strongly electron-donating side-chain groups improve the affinity of the decoding protein for histone methylation modifications 4-8 fold, and tandem repeats of the decoding protein further improve this affinity.

Taking KDM5A PHD3 as an example:

(1) Analysis of the complex structure (PDB: 2KGI) shows that the aromatic cage that specifically recognizes H3K4me3 is formed by W18 and W28, so the W18 site and the W28 site of KDM5A PHD3 are each mutated to a stop codon; the nucleotide sequences are shown in SEQ ID NO: 3 to 4, respectively.

(2) Tryptophan analogues are site-specifically introduced at the W18 and W28 sites of the PHD3 decoding protein domain, respectively, using the chimeric phenylalanyl translation system, to obtain variant proteins of the PHD3 decoding protein domain, wherein the tryptophan analogue is any one of 6-cyano-tryptophan, 7-cyano-tryptophan, 6-chloro-tryptophan, 7-chloro-tryptophan, 6-methyl-tryptophan (A1), 6-methoxy-tryptophan (A2), 7-methyl-tryptophan (A3), 7-methoxy-tryptophan (A4), 6,7-methoxy-tryptophan (A5), 6,7-methyl-tryptophan (A6), 7,8-dihydrofuran-tryptophan (A7), 6,7-dihydrofuran-tryptophan (A8), 7,8-furan-tryptophan (A9), 6,7-furan-tryptophan (A10), 6,7-dioxole-tryptophan (A11) and 6,7-cyclopentane-tryptophan (A12).

(3) The affinity of the PHD3 decoding protein domain variants for H3K4me3 is measured with a microscale thermophoresis instrument. When the unnatural amino acid A2 is inserted at the W28 site of the PHD3 decoding protein domain, the affinity of PHD3 for H3K4me3 is improved 8-fold; this decoding protein variant is PHD3-W28-A2 and is named PHD. The amino acid sequence of the specific H3K4me3 polypeptide is shown in Table 2.

(4) In some specific embodiments, site-specific introduction of 6-methoxy-tryptophan (A2) into other histone methylation decoding protein domains increases the affinity of the decoding protein domain for histone methylation modifications.

Taking H3K9me3 as an example, the Chromo domain of CDY1 is selected as the study object; after 6-methoxy-tryptophan is inserted at the W28 site of the CDY1 Chromo domain, the affinity of the CDY1 Chromo domain for H3K9me3 is improved 2-fold. Taking H3K27me3 as an example, the BAH domain of BAHD1 is selected as the study object; after 6-methoxy-tryptophan is inserted at the W667 site of the BAHD1 BAH domain, the affinity of the BAH domain for H3K27me3 is improved 5-fold. Taking H3K36me3 as an example, the PWWP domain of DNMT3B is selected as the study object; after 6-methoxy-tryptophan is inserted at the W263 site of the DNMT3B PWWP domain, the affinity of the DNMT3B PWWP domain for H3K36me3 is improved 7-fold. The nucleotide sequence and the protein sequence of the Chromo domain of CDY1 are shown in SEQ ID NO: 5-6; those of the BAH domain of BAHD1 are shown in SEQ ID NO: 7-8; and those of the PWWP domain of DNMT3B are shown in SEQ ID NO: 9-10.

The following illustrates the application of several aspects of the invention.

1. The invention provides a method for establishing tandem repeat histone methylation decoding proteins to improve the affinity between histone methylation modifications and decoding proteins.

Taking PHD3 as an example:

(1) Constructing multiple repeats of the decoding protein variant: using SEQ ID NO: 4 as a template, double and triple decoding protein mutants are constructed, carrying 2 and 3 amber stop codons (TAG), respectively; the nucleotide sequences of the double and triple mutants are shown in SEQ ID NO: 11 to 12.

(2) Two or three 6-methoxy-tryptophan (A2) residues are site-specifically introduced at the amber stop codon sites of the double or triple PHD protein by genetic code expansion, giving double and triple PHD protein variants, named 2xPHD and 3xPHD, respectively.

The affinity of the double and triple PHD protein variants for H3K4me3 was determined with a microscale thermophoresis instrument. The double and triple PHD variants showed 14.7-fold and 62.9-fold increases in affinity for H3K4me3, respectively.

In some specific embodiments, the above strategy is equally applicable to other histone methylation modified decoding protein domains.
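The fold improvements in affinity quoted above can be converted into an approximate change in binding free energy with the standard relation ΔΔG = -RT ln(fold), where "fold" is the ratio of the reference dissociation constant to that of the variant. The Python sketch below is illustrative only: the temperature of 298.15 K is an assumption (the measurement temperature is not stated here), and the dictionary keys are shorthand labels for the variants described in the text.

```python
import math

R_KCAL = 1.987204e-3    # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15  # assumed temperature; not stated in the source

# Fold improvements in affinity quoted in the text.
FOLD_IMPROVEMENT = {
    "PHD3-W28-A2": 8.0,
    "CDY1 Chromo + A2": 2.0,
    "BAHD1 BAH + A2": 5.0,
    "DNMT3B PWWP + A2": 7.0,
    "2xPHD": 14.7,
    "3xPHD": 62.9,
}


def delta_delta_g(fold, temperature_k=TEMPERATURE_K):
    """Change in binding free energy (kcal/mol) for a fold decrease in Kd.

    Negative values mean tighter binding than the reference.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return -R_KCAL * temperature_k * math.log(fold)


if __name__ == "__main__":
    for name, fold in FOLD_IMPROVEMENT.items():
        print(f"{name}: {fold}-fold, ddG = {delta_delta_g(fold):.2f} kcal/mol")
```

Under these assumptions an 8-fold gain corresponds to roughly -1.2 kcal/mol, which is the order of magnitude expected for tuning a single non-covalent contact.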

2. The invention provides a method for detecting histone methylation modifications with the histone methylation super-affinity molecular recognition system.

(1) Expressing and purifying the 6-methoxy-tryptophan (A2) substituted PHD protein variant to obtain PHD protein, 2xPHD protein and 3xPHD protein.

(2) Taking HeLa cell lysate as an example, HeLa cells are lysed, the lysate is serially diluted to different concentrations, and the protein samples are separated by SDS-PAGE and transferred to PVDF membranes.

(3) After the PVDF membranes are blocked with milk, they are incubated overnight with the H3K4me3-specific antibody, PHD protein, 2xPHD protein or 3xPHD protein, respectively.

(4) The PVDF membrane incubated with the H3K4me3-specific antibody is then incubated with the corresponding secondary antibody. The PVDF membranes incubated with the PHD proteins are further incubated with a GST-specific antibody and finally with the corresponding secondary antibody.

(5) Chemiluminescent imaging. The 2xPHD protein and the 3xPHD protein show higher detection sensitivity than the H3K4me3-specific antibody.

3. The invention provides a method for detecting histone methylation modifications by imaging with the histone methylation super-affinity molecular recognition system. The method can be applied to live-cell imaging and also to in vitro immunofluorescence imaging. It can be applied to different cells, such as the HEK 293T cell line, HeLa cell line, NCI-60 cell lines and CHO cell line.

The method is applied to living cell imaging and comprises the following steps:

(1) Constructing plasmids for expression in cells: the PHD-W28TAG fragment is cloned using SEQ ID NO: 4 as a template, the vector fragment is cloned using pEGFP-EGFP as a template, and the pEGFP-PHD-W28TAG-EGFP plasmid is constructed by Gibson assembly; the phenylalanyl-tRNA synthetase (chPheRS) fragment is cloned using SEQ ID NO: 1 as a template, the vector fragment is cloned using pCDNA3.1 as a template, and the pCDNA3.1-chPheRS plasmid is constructed by Gibson assembly. The plasmid maps are shown in FIG. 1.

(2) HEK 293T cells are transfected with the two plasmids at a 1:1 molar ratio. After 6-8 hours, 2 mM 6-methoxy-tryptophan (A2) is added.

(3) Changes in the EGFP fluorescence signal are detected with a live-cell imaging microscope.
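Because the two plasmids are combined at a 1:1 molar ratio rather than a 1:1 mass ratio, the mass of each plasmid has to scale with its length. The Python sketch below shows the arithmetic. The plasmid sizes and total DNA amount in the usage note are hypothetical placeholders (the actual sizes are not given here), and 650 g/mol per base pair is the common approximation for double-stranded DNA.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of double-stranded DNA (approximation)


def equimolar_masses_ng(total_ng, sizes_bp):
    """Split a total DNA mass among plasmids so each is present in equal moles.

    For equal moles, the mass of each plasmid is proportional to its length.
    """
    total_bp = sum(sizes_bp.values())
    return {name: total_ng * bp / total_bp for name, bp in sizes_bp.items()}


def picomoles(mass_ng, size_bp):
    """Convert a plasmid mass in ng to pmol using the average base-pair mass."""
    return mass_ng * 1000.0 / (size_bp * AVG_BP_MASS)
```

For example, with hypothetical sizes of 5000 bp for the reporter plasmid and 7000 bp for the synthetase plasmid, `equimolar_masses_ng(1200.0, {"reporter": 5000, "synthetase": 7000})` assigns 500 ng and 700 ng, and `picomoles` confirms that both amounts correspond to the same number of moles.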

The immunofluorescence imaging technique applied in vitro comprises the following steps:

(1) Expressing and purifying the 6-methoxy-tryptophan substituted PHD protein variant to obtain PHD protein, 2xPHD protein and 3xPHD protein.

(2) The PHD protein, 2xPHD protein and 3xPHD protein are each labeled with NHS-Cy5 activated ester.

(3) Taking the HeLa cell line as an example, formaldehyde-fixed cells are incubated with the Cy5-labeled PHD proteins or with the histone methylation-specific antibody, respectively, and after incubation the localization of the corresponding histone methylation modification is detected by confocal microscopy. The 2xPHD and 3xPHD protein variants have a higher signal-to-noise ratio than the H3K4me3-specific antibody.

4. The invention provides a method for detecting the histone methylation modification interactome by proximity labeling with the histone methylation super-affinity molecular recognition system.

Proximity labeling, which generally uses CRISPR gene editing or plasmid-based expression to express in cells a proximity biotinylating enzyme fused to a bait protein, has emerged as a complement to traditional methods of studying molecular interactions in living cells. After exogenous biotin is added, proteins near the bait protein are biotinylated; they can be enriched with streptavidin-coupled magnetic beads and then identified by mass spectrometry. The affinity of the H3K4me3 super-affinity molecular recognition system provided by the invention for H3K4me3 reaches 7 nM. PHD-W28TAG can be expressed as a fusion with the ascorbate peroxidase APEX/APEX2; when the PHD variant binds specifically to H3K4me3, APEX2 labels the proteome near H3K4me3 with biotin upon stimulation with hydrogen peroxide, and the labeled proteins are then enriched with streptavidin and finally analyzed by LC-MS. The method is not limited to APEX2-based proximity labeling and is also applicable to other biotin-based proximity labeling techniques, such as horseradish peroxidase (HRP) and the biotin ligases BioID, BASU, TurboID and miniTurbo.

Compared with the prior art, the invention has the main advantages that:

1. The invention provides a method for improving cation-pi interaction that can be applied to any biomacromolecule with a cation-pi interaction. The invention provides synthetic routes to 6-methyl-tryptophan (A1), 6-methoxy-tryptophan (A2), 7-methyl-tryptophan (A3), 7-methoxy-tryptophan (A4), 6,7-methoxy-tryptophan (A5), 6,7-methyl-tryptophan (A6), 7,8-dihydrofuran-tryptophan (A7), 6,7-dihydrofuran-tryptophan (A8), 7,8-furan-tryptophan (A9), 6,7-furan-tryptophan (A10), 6,7-dioxole-tryptophan (A11) and 6,7-cyclopentane-tryptophan (A12), which can be used in genetic code expansion and can improve cation-pi interactions.

2. The invention has a wide range of applications, including (but not limited to): detecting dynamic changes in histone methylation modifications in combination with live-cell imaging; detecting histone methylation modifications in biological samples in combination with immunofluorescence; analyzing genomic regions associated with histone methylation modifications in combination with genome sequencing; and identifying the histone methylation modification interactome in combination with proximity labeling. Specifically: (1) Introducing 6-methoxy-tryptophan (A2) by genetic code expansion at a tryptophan site that forms a cation-pi interaction can increase the affinity between biomolecules 4-8 fold. (2) Site-specific introduction of 6-methoxy-tryptophan (A2) into the PHD decoding protein domain for H3K4me3 improves the affinity of PHD for H3K4me3 8-fold, and after the triple-repeat design the affinity of the PHD variant for H3K4me3 reaches 7 nM. (3) The histone methylation super-affinity molecular recognition system provided by the invention recognizes histone methylation modifications with high sensitivity, and has higher specificity and sensitivity than antibodies specific for histone methylation modifications. (4) The histone methylation super-affinity molecular recognition system is easy to engineer, economical and able to capture a variety of PTMs, and can be developed into a range of super-affinity molecular recognition systems for specific methylation modifications.

Drawings

FIG. 1 is a plasmid map;

FIG. 2 shows the chemical synthesis routes: A is the synthesis route of 6-methoxy-tryptophan (A2) and 7-methoxy-tryptophan (A4), and B is the synthesis route of 6,7-cyclopentane-tryptophan (A12);

FIG. 3 shows the strategy of the invention for modulating the cation-pi interaction between histone methylation and its decoding proteins using genetic code expansion. (A) The number of tryptophan residues among the aromatic cage components of different histone methylation decoding proteins. (B) Flow diagram for developing a histone methylation superphilic molecular recognition system by genetic code expansion. Taking H3K4me3 as an example, tryptophan in the aromatic cage of the decoding protein is replaced with a tryptophan analogue by genetic code expansion, so as to modulate the cation-pi interaction in the aromatic cage and to obtain the unnatural amino acid analogue that markedly improves the affinity of the decoding protein for histone methylation. (C) Structural formulas of the unnatural amino acids used in the invention;

FIG. 4 shows the efficiency and specificity of the chimeric phenylalanyl-tRNA synthetases in recognizing A2 and A4, wherein GFP fluorescence reporter assays identify the efficiency with which the chimeric phenylalanyl-tRNA synthetases A2RS (A) and A4RS (B) recognize A2 and A4, respectively, and mass spectrometry identifies the fidelity with which A2RS (C) and A4RS (D) recognize A2 and A4, respectively;

FIG. 5 shows PHD domain variant proteins designed to increase affinity for H3K4me3. (A) Structure of the complex of KDM5A PHD3 protein and H3K4me3 polypeptide (PDB: 2KGI), in which PHD3 and the polypeptide are shown in cartoon representation and the aromatic amino acids and the H3K4me3 modification are shown in stick representation. (B) Coomassie brilliant blue staining of the PHD-W18-UAA and PHD-W28-UAA variants. (C) Affinity of the PHD-W18-UAA variants for H3K4me3 determined by microscale thermophoresis (MST), wherein H3K4me3 is labeled with a FITC fluorophore. (D) Affinity of the PHD-W28-UAA variants for H3K4me3 determined by MST, wherein H3K4me3 is labeled with a FITC fluorophore;

FIG. 6 shows multivalent tandem repeat PHD domains designed to recognize H3K4me3. (A) Cartoon of the multivalent tandem repeat PHD domain design. (B) Coomassie blue staining showing the purity of the tandem PHD protein variants. (C) Affinity of the tandem PHD protein variants for H3K4me3 determined by MST, wherein H3K4me3 is labeled with a FITC fluorophore;

FIG. 7 shows the detection and imaging of H3K4me3 using the histone methylation superphilic molecular recognition system. (A) Strategy diagram for applying the histone methylation superphilic molecular recognition system to detection and imaging. (B) Detection of the H3K4me3 level of HeLa cells using the histone methylation superphilic molecules. The H3-specific antibody and the H3K4me3-specific antibody serve as controls, and PHD-WT, 2xPHD and 3xPHD are each used to detect H3K4me3. (C) Application of the histone methylation superphilic molecular recognition system to fluorescence imaging of H3K4me3 localization in cells, wherein the PHD protein is labeled with Cy5;

FIG. 8 shows flow cytometry analysis of the efficiency with which the chimeric phenylalanyl-tRNA synthetase obtained by screening recognizes 6-methoxy-tryptophan, 6,7-methoxy-tryptophan and 6,7-methyl-tryptophan in mammalian cells;

FIG. 9 shows the experimental flow chart (A) for applying the histone methylation superphilic molecular recognition system established by genetic code expansion to proximity-labeling detection of the protein interactome, and the GO analysis data.

Detailed Description

The technical scheme of the invention is further specifically described by the following specific examples. It should be understood that the practice of the invention is not limited to the following examples, but is intended to be illustrative of any of the various modifications and/or variations that may be made to the invention.

In the present invention, unless otherwise specified, all parts and percentages are by weight, and the equipment, materials, etc. used are commercially available or are conventional in the art. The methods in the following examples are conventional in the art unless otherwise specified.

The primer sequences used in the construction of the vector of the present invention in the specific examples are shown in Table 1:

Table 1: Primer sequences for constructing vectors

The strategy of the present invention for modulating the cation-pi interaction between histone methylation and its decoding proteins using genetic code expansion is shown in FIG. 3; the following examples illustrate the specific methods.

Example 1: Chemical synthesis of compounds A2 and A4

To a solution of B (2.0 g, 13.6 mmol) in 50 mL of anhydrous N,N-dimethylformamide was added potassium hydroxide (1.68 g, 29.9 mmol), and the mixture was stirred at room temperature for 20 min. A solution of iodine (4.14 g, 16.3 mmol) in 30 mL of anhydrous N,N-dimethylformamide was added dropwise to the reaction flask, and stirring was continued at room temperature for 2 hours. The reaction mixture was poured into an ice-water solution containing 0.1% sodium thiosulfate and placed in a refrigerator to ensure complete precipitation. The precipitate was filtered, washed with cold water and dried in vacuo. 3-Iodo-1H-indole B (B2 yield 90%, B4 yield 93%) was obtained as a pale yellow solid, which was used in the next step without further purification.
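The reagent loadings in this iodination step can be cross-checked by converting the stated millimoles into molar equivalents relative to B. The following is a minimal sketch only; the function name `equivalents` is illustrative and not part of the patent, and only the mmol values quoted above are used.

```python
# Molar equivalents relative to substrate B, using only the mmol
# values stated in the text. The helper name is hypothetical.
def equivalents(reagent_mmol: float, substrate_mmol: float) -> float:
    """Return molar equivalents of a reagent relative to the substrate."""
    if substrate_mmol <= 0:
        raise ValueError("substrate amount must be positive")
    return reagent_mmol / substrate_mmol


substrate_b_mmol = 13.6                     # B, as stated above
reagents_mmol = {"KOH": 29.9, "I2": 16.3}   # as stated above

for name, mmol in reagents_mmol.items():
    print(f"{name}: {equivalents(mmol, substrate_b_mmol):.2f} equiv")
```

Under these figures the base is used at roughly 2.2 equivalents and iodine at roughly 1.2 equivalents relative to B.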

The solid B (2.73 g, 10.0 mmol) obtained in the first step was dissolved in 30 mL of anhydrous N,N-dimethylformamide. 60% NaH (391.2 mg, 16.3 mmol) was washed with hexane and then suspended in 10 mL of anhydrous N,N-dimethylformamide under nitrogen. The solution of B was slowly added to the suspension in an ice bath and stirred for 10 min; p-toluenesulfonyl chloride (2.1 g, 11.0 mmol) was then added and the mixture was stirred for 5 h at 25 °C. The mixture was poured into water and extracted three times with ethyl acetate, and the combined ethyl acetate layers were washed with saturated brine and dried over anhydrous sodium sulfate. The organic phase was then concentrated under reduced pressure. Column chromatography with petroleum ether and ethyl acetate gave compound C as a white solid (C2 yield 85%, C4 yield 81%).

Dry, degassed N,N-dimethylformamide was charged under nitrogen into a vessel containing zinc powder (3.9 g, 50.0 mmol). TMSCl (108.6 mg, 1.0 mmol) was added, and the mixture was stirred vigorously at room temperature for 30 min, after which stirring was stopped and the zinc was allowed to settle. The supernatant was withdrawn with a syringe under a flow of nitrogen, and fresh N,N-dimethylformamide was added to the zinc. After stirring for 2 minutes, stirring was stopped, the zinc powder was allowed to settle and the supernatant was removed as before; this step was repeated two more times. 1,2-Dibromoethane (751.4 mg, 4.0 mmol) was then added to the vessel and the mixture was stirred at 80 °C for 30 min. After the mixture had cooled to 25 °C, TMSCl (325.8 mg, 3.0 mmol) was added and the resulting mixture was stirred for an additional 30 min. Boc-3-iodo-L-alanine methyl ester (3.95 g, 12 mmol) was dissolved in 10 mL of N,N-dimethylformamide and added to the activated zinc powder, and the mixture was stirred vigorously. After the exotherm had subsided (controlled with an ice bath), stirring was continued for an additional 30 min, then stopped, and the zinc was allowed to settle. The supernatant was gently withdrawn with a syringe and transferred, under a flow of nitrogen, into a clean reaction flask containing compound D (2.13 g, 5.0 mmol), Pd(OAc)₂ (112.2 mg, 0.5 mmol) and S-Phos (410.5 mg, 1.0 mmol). The reaction was carried out for 4 h under nitrogen. After completion of the reaction, the mixture was poured into water and extracted with ethyl acetate, and the organic layer was washed with brine, dried over anhydrous sodium sulfate, concentrated under reduced pressure, and purified by column chromatography with petroleum ether and ethyl acetate to give compound E (E2 yield 57%, E4 yield 45%) as a pale yellow oil.

Product E was analyzed and the results were as follows:

E2: ¹H NMR (500 MHz, CDCl₃) δ 7.70 (d, J = 8.5 Hz, 2H), 7.48 (d, J = 2.3 Hz, 1H), 7.30 (d, J = 8.7 Hz, 1H), 7.22 (d, J = 8.1 Hz, 3H), 6.84 (dd, J = 8.7, 2.3 Hz, 1H), 5.05 (d, J = 8.0 Hz, 1H), 4.60 (d, J = 7.1 Hz, 1H), 3.86 (s, 3H), 3.62 (s, 3H), 3.14 (qd, J = 14.7, 5.6 Hz, 2H), 2.34 (s, 3H), 1.49–1.26 (m, 9H). ¹³C NMR (125 MHz, CDCl₃) δ 172.15, 158.25, 155.13, 145.01, 136.29, 135.28, 129.98, 126.82, 123.22, 120.13, 117.44, 112.55, 98.09, 80.26, 55.91, 53.69, 52.47, 28.46, 21.70. HRMS (ESI) m/z calcd. for C₂₀H₂₃N₂O₅S⁺ (M-Boc)⁺ 403.1322, found 403.1331.

E4: ¹H NMR (500 MHz, CDCl₃) δ 7.69 (d, J = 8.1 Hz, 2H), 7.62 (s, 1H), 7.24 (d, J = 8.1 Hz, 2H), 7.16–7.06 (m, 2H), 6.67 (dd, J = 7.3, 1.5 Hz, 1H), 5.15 (d, J = 8.0 Hz, 1H), 4.65 (dt, J = 8.0, 5.5 Hz, 1H), 3.71 (s, 3H), 3.67 (s, 3H), 3.32–3.11 (m, 2H), 2.37 (s, 3H), 1.44 (s, 9H). ¹³C NMR (125 MHz, CDCl₃) δ 172.28, 155.16, 147.49, 144.21, 137.36, 133.77, 129.43, 127.30, 126.91, 124.79, 124.04, 114.95, 112.05, 107.16, 80.12, 55.54, 53.85, 52.49, 28.41, 28.01, 26.99, 21.67. HRMS (ESI) m/z calcd. for C₂₀H₂₃N₂O₅S⁺ (M-Boc)⁺ 403.1322, found 403.1334.

Compound E (973.2 mg, 2.0 mmol) was dissolved in 50 mL of methanol, and NaOH (1.2 g, 30.0 mmol) dissolved in 20 mL of H₂O was added. The mixture was heated under reflux for 8 h, and methanol was then evaporated under reduced pressure until the volume was about half of the reaction volume. The mixture was acidified with ice-cold 2 M hydrochloric acid to pH 3. The aqueous solution was extracted with cold ethyl acetate, and the organic layer was washed with saturated brine, dried over anhydrous sodium sulfate, and evaporated in vacuo to give carbamate F as a colorless oil, which was used without further purification. F was then dissolved in dichloromethane, and trifluoroacetic acid (112.2 mg, 0.5 mmol) was added for deprotection to give the title compound A as a pale yellow solid (A2 yield 74%, A4 yield 68%). The complete synthetic route is shown in FIG. 2.

Product A was analyzed and the results were as follows:

A2: ¹H NMR (500 MHz, D₂O) δ 7.49 (d, J = 8.7 Hz, 1H), 7.01 (s, 1H), 6.95 (d, J = 2.4 Hz, 1H), 6.72 (dd, J = 8.7, 2.4 Hz, 1H), 3.75 (s, 3H), 3.45 (dd, J = 7.3, 5.2 Hz, 1H), 3.03 (dd, J = 14.4, 5.2 Hz, 1H), 2.86 (dd, J = 14.4, 7.3 Hz, 1H). ¹³C NMR (125 MHz, D₂O) δ 182.83, 155.19, 136.74, 123.18, 122.03, 119.51, 110.63, 108.80, 95.26, 56.42, 55.78, 30.43. HRMS (ESI) m/z calcd. for C₁₂H₁₅N₂O₃⁺ (M+H)⁺ 235.1077, found 235.1081.

A4: ¹H NMR (500 MHz, D₂O) δ 7.27–7.22 (m, 2H), 7.08 (td, J = 7.9, 0.9 Hz, 1H), 6.78 (d, J = 7.7 Hz, 1H), 4.31 (ddd, J = 6.3, 5.4, 0.9 Hz, 1H), 3.94 (s, 3H), 3.44 (ddt, J = 15.4, 5.3, 0.9 Hz, 1H), 3.36 (dd, J = 15.4, 7.3 Hz, 1H). ¹³C NMR (125 MHz, D₂O) δ 171.83, 146.06, 128.04, 126.58, 124.97, 120.10, 117.44, 115.12, 106.92, 55.63, 53.27, 25.83. HRMS (ESI) m/z calcd. for C₁₂H₁₃N₂O₃⁻ (M-H)⁻ 233.0932, found 233.0939.
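The calculated HRMS values quoted above can be reproduced from monoisotopic atomic masses. The sketch below is illustrative only: the helper names `ion_mz` and `ppm_error` are hypothetical, and the ion compositions passed in (C12H15N2O3 for the A2 cation, C12H13N2O3 for the A4 anion, C20H23N2O5S for the E fragment cation) are assumptions chosen because they reproduce the stated calculated m/z values.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503223, "N": 14.00307400443,
        "O": 15.99491461957, "S": 31.9720711744}
ELECTRON = 0.000548579909


def ion_mz(formula: str, charge: int) -> float:
    """m/z of an ion whose formula already includes added or removed protons."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    # One electron is removed per positive charge and added per negative charge.
    mass -= charge * ELECTRON
    return mass / abs(charge)


def ppm_error(found: float, calculated: float) -> float:
    """Relative deviation of the measured m/z from the calculated m/z, in ppm."""
    return (found - calculated) / calculated * 1e6


a2 = ion_mz("C12H15N2O3", +1)   # assumed composition of the A2 (M+H)+ ion
a4 = ion_mz("C12H13N2O3", -1)   # assumed composition of the A4 (M-H)- ion
e2 = ion_mz("C20H23N2O5S", +1)  # assumed composition of the E (M-Boc)+ ion

print(f"A2 (M+H)+ calcd {a2:.4f}, error vs found 235.1081: {ppm_error(235.1081, a2):.1f} ppm")
print(f"A4 (M-H)- calcd {a4:.4f}, error vs found 233.0939: {ppm_error(233.0939, a4):.1f} ppm")
print(f"E  (M-Boc)+ calcd {e2:.4f}")
```

A deviation of a few ppm between found and calculated values is the usual basis for accepting an elemental composition from HRMS data.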

Example 2: Construction of the chimeric phenylalanyl-tRNA synthetase mutant library and positive and negative selection

In this example, the gene sequence of the chimeric phenylalanyl-tRNA synthetase chPheRS is shown in SEQ ID NO: 1.

(1) With reference to the structure of human mitochondrial phenylalanyl-tRNA synthetase, the following residues of the chimeric phenylalanyl-tRNA synthetase were selected: the amino acid binding site residues F464, T467 and A507, and the residues surrounding the binding pocket, E391, V393 and M490.

(2) Using the chimeric phenylalanyl-tRNA synthetase (T467G and A507G) as the template, gene fragments were amplified with primers chPheRS-E391NNK-V393NNK-R/F, chPheRS-M490NNK-R/F and chPheRS-F464NNK-R/F, whose nucleotide sequences are shown in SEQ ID NO: 19-24. The mutant fragments were cloned into the pBK vector by Gibson assembly to generate the chPheRS mutant gene library (E391NNK, V393NNK, M490NNK, F464NNK, T467G and A507G).

(3) The pNEG-chPheT-Barnase-2 TAG was transformed into E.coli DH10B to prepare negative selection competent cells, the plasmid map of which is shown in FIG. 1; positive selection competent cells were prepared by transforming pNEG-3C11-CAT-112TAG-GFP190TAG into E.coli DH10B, the plasmid map of which is shown in FIG. 1.

(4) The library constructed in (2) was transformed into the negative selection competent cells, and the culture was spread on LB plates (kanamycin, 50 µg/mL; ampicillin, 100 µg/mL; 0.2% L-arabinose) and incubated at 37 °C.

(5) The clones from (4) were collected and their plasmids extracted; the plasmids were transformed into the positive selection competent cells, and the entire culture was plated on LB agar plates supplemented with unnatural amino acid (kanamycin, 50 µg/mL; ampicillin, 100 µg/mL; chloramphenicol, 10 µg/mL; 0.2% L-arabinose; 2 mM unnatural amino acid), cultured at 37 °C for 12 hours, and then cultured at 30 °C for a further 48 hours.
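The theoretical size of such an NNK library can be estimated with a short calculation. The sketch below assumes that all four positions (E391, V393, M490 and F464) are randomized simultaneously and that codons are equally represented; the helper `transformants_for_coverage` and the 95% coverage target are illustrative choices, not values from the patent.

```python
import math
from itertools import product

# Standard genetic code in NCBI ordering (bases T, C, A, G; third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# NNK: N = A/C/G/T at positions 1 and 2, K = G/T at position 3.
nnk_codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
encoded = {CODON_TABLE[c] for c in nnk_codons}
amino_acids = sorted(encoded - {"*"})
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

N_SITES = 4  # assumed: E391, V393, M490 and F464 randomized together
dna_diversity = len(nnk_codons) ** N_SITES
protein_diversity = len(amino_acids) ** N_SITES


def transformants_for_coverage(diversity: int, coverage: float = 0.95) -> int:
    """Clones needed so that a given variant is sampled with the stated probability,
    assuming equal representation of all variants (Poisson approximation)."""
    return math.ceil(-math.log(1.0 - coverage) * diversity)


print(len(nnk_codons), "NNK codons,", len(amino_acids), "amino acids, stops:", stop_codons)
print("DNA-level diversity:", dna_diversity)
print("Protein-level diversity:", protein_diversity)
print("Transformants for 95% coverage:", transformants_for_coverage(dna_diversity))
```

Because NNK retains the amber codon TAG, clones that carry it at a randomized position read through only when suppression occurs, which is one reason the theoretical diversity is an upper bound on the functional library.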

Example 3: Screening of chimeric phenylalanyl-tRNA synthetase mutants that specifically recognize unnatural amino acids by GFP fluorescence reporter assay

(1) After two rounds of positive selection, single colonies from Example 2 showing a fluorescent signal were picked and cultured overnight.

(2) The overnight culture was inoculated at 1:100 and grown at 37 °C to OD₆₀₀ = 0.6-0.8, and 0.2% L-arabinose was added to induce expression; at the same time, 1 mL of the culture was taken, supplemented with 1 mM of the corresponding unnatural amino acid, and expressed at 30 °C for 20 h.

(3) 750 µL of the culture from (2) was centrifuged, 150 µL of 1x BugBuster (Millipore, lot: 3492682) was added, and the sample was left at 25 °C for 30 min and centrifuged again; 100 µL of the supernatant was transferred to a 96-well plate, and 100 µL of the culture from (2) was taken in parallel. The GFP fluorescence intensity and OD₆₀₀ of each clone were measured on a BioTek Synergy NEO2 microplate reader, and the efficiency with which the mutant recognizes the unnatural amino acid was calculated.

(4) The chimeric phenylalanyl-tRNA synthetase mutants that recognized the corresponding unnatural amino acid with high efficiency were sequenced to obtain the specific mutant sequences, and the plasmids of the corresponding clones were stored at -20 °C until use.

(6) Finally, a chimeric phenylalanyl-tRNA synthetase mutant was identified that recognizes 6-methoxy-tryptophan, 7-methoxy-tryptophan, 6, 7-methyl-tryptophan, 6, 7-methoxy-tryptophan, and comprises six mutations E391D, V393G, M490V, F464V, T467G, and A507G, designated chPheRS9, the nucleotide and amino acid sequence of which is shown in SEQ ID NO: 1-2.

(7) The efficiency of the chimeric phenylalanine translation system in recognizing unnatural amino acids at different unnatural amino acid concentrations was determined using GFP fluorescence reporting experiments. The efficiency and fidelity of the recognition of 6-methoxy-tryptophan and 7-methoxy-tryptophan by the chimeric phenylalanyl-tRNA synthetases is shown in FIG. 4.
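The text does not spell out how the GFP signal and OD₆₀₀ readings are combined. One common way, shown here purely as an assumption, is to normalize fluorescence to cell density and compare cultures grown with and without the unnatural amino acid. All function names and the plate-reader values in the example are made up for illustration and are not data from the patent.

```python
def normalized_fluorescence(gfp: float, od600: float, blank_gfp: float = 0.0) -> float:
    """Background-subtracted GFP signal per unit of cell density."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (gfp - blank_gfp) / od600


def incorporation_summary(gfp_plus: float, od_plus: float,
                          gfp_minus: float, od_minus: float) -> dict:
    """Compare a culture grown with the unnatural amino acid (plus) to one without (minus)."""
    plus = normalized_fluorescence(gfp_plus, od_plus)
    minus = normalized_fluorescence(gfp_minus, od_minus)
    fold = plus / minus if minus > 0 else float("inf")
    return {"plus_uaa": plus, "minus_uaa": minus, "fold_over_background": fold}


# Hypothetical readings, for illustration only.
summary = incorporation_summary(gfp_plus=12000.0, od_plus=0.8,
                                gfp_minus=600.0, od_minus=0.75)
print(summary)
```

A high normalized signal with the unnatural amino acid reflects efficiency, while a low signal without it reflects fidelity, since read-through in its absence would indicate that a natural amino acid is being charged.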

Example 4: Construction of the plasmid series for the KDM5A PHD3 (PHD) domain

Unless otherwise specified, all plasmids were constructed by Gibson assembly. The plasmid series for the KDM5A PHD3 (PHD) domain is taken as an example.

1. PHD wild-type plasmid: the GST tag was amplified from the pGEX-6p vector with primers pNEG-GST-F/R (SEQ ID NO: 25-26); the PHD domain (UniProt ID: P29375, nucleotides 1598-1663) was amplified from cDNA with primers pNEG-PHD-F/R (SEQ ID NO: 27-28); the vector was amplified from the pNEG-2xchPheT vector with primers pNEG-PHD-V-F/R (SEQ ID NO: 29-30); and plasmid pNEG-2xchPheT-PHD-GST was constructed by Gibson assembly.

2. PHD mutant plasmids: using pNEG-2xchPheT-PHD-GST as the template, an amber codon was introduced by site-directed mutagenesis at W28 of the PHD domain with primers pNEG-PHD-W28TAG-F/R, and plasmid pNEG-2xchPheT-PHD-W28TAG-GST was constructed by Gibson assembly; likewise, an amber codon was introduced at W18 of the PHD domain with primers pNEG-PHD-W18TAG-F/R, and plasmid pNEG-2xchPheT-PHD-W18TAG-GST was constructed by Gibson assembly. The nucleotide sequences of the primers are shown in SEQ ID NO: 31-34.

3. Multivalent tandem repeat PHD domain plasmids: using pNEG-2xchPheT-PHD-W28TAG-GST as the template, the PHD-W28TAG fragment containing a 6x-Linker (GGSGGS) was amplified with primers pNEG-2xPHD-F/R (SEQ ID NO: 35-36); the vector was amplified with primers pNEG-2xPHD-V-R and pNEG-PHD-V-F (SEQ ID NO: 37 and SEQ ID NO: 29); and the duplex and triplex PHD plasmids pNEG-2xchPheT-2xPHD-W28TAG-GST and pNEG-2xchPheT-3xPHD-W28TAG-GST were constructed by Gibson assembly.

4. Multicomponent tandem repeat PHD-Chromo domain plasmid: the vector was amplified from pNEG-2xchPheT-PHD-W28-GST with primers pNEG-PHD-V-F and pNEG-2xPHD-V-R; the CDY1-W2TAG fragment was amplified from pNEG-2xchPheT-CDY1-W28TAG-GST with primers pNEG-PHD-CDY1-F/R (SEQ ID NO: 38-39). The multicomponent tandem plasmid pNEG-2xchPheT-PHD-W28TAG-CDY1-W28TAG-GST was constructed by Gibson assembly. The plasmid map is shown in FIG. 1.

5. Eukaryotic expression plasmids. Construction of pEGFP-PHD3-W28TAG-EGFP: the vector was amplified from pEGFP-EGFP with primers pEGFP-PHD-V-F/R (SEQ ID NO: 40-41); the PHD domain was amplified from pNEG-2xchPheT-PHD-W28TAG-GST with primers pEGFP-PHD-F/R (SEQ ID NO: 42-43); the plasmid was constructed by Gibson assembly, and the plasmid map is shown in FIG. 1. Construction of pCDNA3.1-chPheRS9: primers were designed to amplify the chimeric phenylalanyl-tRNA synthetase (chPheRS9), which was cloned into the pcDNA3.1 vector under the control of the CMV and U6 promoters, respectively; the cloned genes and the primers are shown in SEQ ID NO: 44-47, and the plasmid map is shown in FIG. 1.

Sequencing of plasmids was done by Peking Optimaceae. The construction of the remaining plasmids was the same as above.

Example 5: expression purification of KDM5A PHD3 (PHD) wild-type and mutant proteins

1. Expression of KDM5A PHD3 (PHD) wild-type protein

1. Plasmid transformation: chemically competent DH10B cells were taken from the -80 °C freezer and immediately placed on ice. Once the cells had thawed, plasmid pNEG-2xchPheT-PHD-GST was added and mixed by gently flicking the tube. The cells were kept on ice for 30 min, heat-shocked at 42 °C for 90 s, and returned to ice for 2 min. Antibiotic-free LB liquid medium was added and the cells were recovered at 37 °C for 40 min; 200 µL of the culture was then spread on an LB agar plate (ampicillin, 100 µg/mL) and incubated at 37 °C overnight.

2. Induction of expression: a single colony was picked from the above resistance plate into 3 mL of LB liquid medium (ampicillin, 100 µg/mL) and cultured overnight with shaking (37 °C, 220 rpm); the overnight culture was inoculated at 1:100 and grown at 37 °C to OD₆₀₀ = 0.6-0.8, then L-arabinose (final concentration 0.2%) and ZnCl₂ (final concentration 0.1 mM) were added and expression was induced at 22 °C for 24 h.

2. Expression of KDM5A PHD3 mutant proteins

The PHD3-W28-6MeOW mutant is taken as an example.

1. Plasmids pNEG-2xchPheT-PHD-W28TAG-GST and pBK-chPheRS9 were co-transformed into E. coli DH10B in the same manner as above.

2. Induction of expression: a single colony was picked from the above resistance plate into 3 mL of LB liquid medium (ampicillin, 100 µg/mL; kanamycin, 50 µg/mL) and cultured overnight with shaking (37 °C, 220 rpm); the overnight culture was inoculated at 1:100 into 100 mL of LB liquid medium and grown at 37 °C to OD₆₀₀ = 0.6-0.8, then L-arabinose (final concentration 0.2%), ZnCl₂ (final concentration 0.1 mM) and the unnatural amino acid (final concentration 0.5 mM) were added and expression was induced at 22 °C for 24 h.

3. Purification of KDM5A PHD3 (PHD)

1. Harvesting the culture. The culture was centrifuged (4 °C, 4000 rpm, 20 min) and the cell pellet was collected.

2. Resuspending the cells. The cells were resuspended in lysis buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 0.1 mM ZnCl₂, 2 mM β-ME, protease inhibitors PMSF and aprotinin).

3. Sonication. Sonicator program: 2 s on, 5 s off, 60% power, at 4 °C.

4. The lysate was centrifuged (4 °C, 12000 rpm, 20 min) and the supernatant was collected.

5. 0.5 mL of GST beads was loaded onto a gravity column, the beads were washed with ddH₂O, and the column was equilibrated with 10 column volumes of lysis buffer.

6. The supernatant from step 4 was loaded onto the equilibrated GST column.

7. The column was washed with 20 column volumes of lysis buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 0.1 mM ZnCl₂, 2 mM β-ME, protease inhibitors PMSF and aprotinin) to remove non-specifically adsorbed contaminating proteins.

8. The protein was eluted with 10 column volumes of elution buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 20 mM reduced glutathione), and the eluate, which contains the target protein, was collected.

9. Protein purity was assessed by SDS polyacrylamide gel electrophoresis (SDS-PAGE), and protein concentration was measured with a NanoDrop (micro UV and fluorescence spectrophotometer, Siemens). The protein was used for subsequent SDS-PAGE analysis, mass spectrometry identification and MST experiments.

As shown in FIG. 5B, the purity of the protein by SDS-PAGE was 90% or higher.

4. LC-MS identification of proteins

Purified proteins were analyzed on a SCIEX TripleTOF 6600 mass spectrometer with electrospray ionization and SCIEX Analyst TF software. A Phenomenex Aeris widepore C4 column (2.1 x 50 mm, 3.6 µm) was used. Mobile phase A was 0.1% formic acid in water and mobile phase B was 0.1% formic acid in acetonitrile. The flow rate was held constant at 0.2 mL/min. Mass spectra were deconvoluted and analyzed with SCIEX OS-Q software (version 2.0, SCIEX Corporation). The theoretical molecular weight of the protein was predicted with the ExPASy Compute pI/Mw tool.

The LC-MS results are shown in FIGS. 4C and 4D. The theoretical molecular weight of the target protein is 33378 Da, and the measured molecular weights are 33377 Da and 33378 Da, respectively, demonstrating that the chimeric phenylalanyl-tRNA synthetase mutants specifically recognize 6-methoxy-tryptophan and 7-methoxy-tryptophan.
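The agreement between theoretical and measured molecular weights can be expressed as a deviation in Da and in ppm. The following is a minimal sketch using only the three values quoted above; the function name `mass_deviation` is illustrative.

```python
def mass_deviation(theoretical_da: float, measured_da: float) -> tuple:
    """Return the deviation of the measured mass in Da and in ppm."""
    if theoretical_da <= 0:
        raise ValueError("theoretical mass must be positive")
    delta = measured_da - theoretical_da
    return delta, delta / theoretical_da * 1e6


THEORETICAL = 33378.0  # Da, as stated above
measured = {"6-methoxy-tryptophan variant": 33377.0,
            "7-methoxy-tryptophan variant": 33378.0}

for label, mass in measured.items():
    delta_da, delta_ppm = mass_deviation(THEORETICAL, mass)
    print(f"{label}: {delta_da:+.0f} Da ({delta_ppm:+.1f} ppm)")
```

A deviation of at most 1 Da on a protein of about 33 kDa is far smaller than the mass difference that would result from misincorporating a natural amino acid in place of the methoxy-tryptophan analogue.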

Example 6: Determination of the affinity of decoding protein domain variants for histone methylation-modified polypeptides by microscale thermophoresis (MST)

The peptides used in the experiments were all synthesized by Beijing cloisonne midbody biotechnology Co., Ltd., and the C-terminus of each peptide was labeled with fluorescein isothiocyanate (FITC); the specific sequences are shown in Table 2.

Table 2: Polypeptide sequence information used in MST experiments

The MST assay is described below using the decoding protein PHD and H3K4me3 as an example.

(1) Desalting of protein samples. Protein samples were dialyzed against 2 L of MST buffer (20 mM Tris-HCl, 50 mM NaCl, 1 mM DTT, 0.05% Tween-20, pH 7.5); dialysis was repeated 3 times.

(2) Protein concentration. Protein samples were concentrated to a suitable concentration using a 10 kDa protein concentrator (Millipore).

(3) Serial dilution. Sixteen PCR tubes were prepared and 10 µL of MST buffer was added to tubes 2-16. 20 µL of protein sample was placed in tube 1, 10 µL was transferred from tube 1 to tube 2 and mixed, and this two-fold dilution was repeated down the series;

(4) 10 µL of polypeptide was added to each tube to give a final polypeptide concentration of 100 nM in a total volume of 20 µL, and the contents were mixed thoroughly;

(5) The samples were loaded into capillaries.

(6) Kd measurement. Measurements were performed on a Monolith NT.115 instrument (NanoTemper Technologies, Munich, Germany) with a blue LED excitation source at a constant temperature of 25 °C. Instrument settings: blue LED excitation power 20%, infrared laser power 40%. Unless otherwise specified, all measurements were performed in standard glass capillaries (NanoTemper Technologies, cat# MO-K022), and each experiment was repeated 3 times.

(7) Data processing. Using NT Analysis software, the data were fitted with a 1:1 binding model for the target protein and the fluorescent peptide to obtain the dissociation constant Kd. All data were further processed with Origin software.

(8) The affinities of other histone methylation decoding proteins for their histone methylation modifications were determined by the same procedure.
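The dilution scheme in step (3) and the 1:1 binding model in step (7) can be sketched as follows. This is an illustration under stated assumptions, not the NT Analysis implementation: the 20 µM protein stock is a hypothetical value, the function names are made up, and the binding equation is the standard exact solution for a 1:1 complex, which accounts for depletion of protein by the 100 nM labeled peptide.

```python
import math


def dilution_series(stock_conc: float, n_tubes: int = 16,
                    factor: float = 2.0, final_mix: float = 2.0) -> list:
    """Protein concentration in each tube after serial dilution and after
    mixing 10 uL of protein with 10 uL of peptide (a further two-fold dilution)."""
    return [stock_conc / factor ** i / final_mix for i in range(n_tubes)]


def fraction_bound(protein: float, peptide: float, kd: float) -> float:
    """Fraction of labeled peptide in complex for a 1:1 binding model.
    All three arguments must be in the same concentration unit."""
    if peptide <= 0:
        raise ValueError("peptide concentration must be positive")
    s = protein + peptide + kd
    return (s - math.sqrt(s * s - 4.0 * protein * peptide)) / (2.0 * peptide)


PEPTIDE_NM = 100.0   # final labeled peptide concentration, from step (4)
STOCK_NM = 20000.0   # hypothetical protein stock (20 uM), for illustration only

series = dilution_series(STOCK_NM)
for kd in (440.0, 52.0):  # Kd values reported in this example, in nM
    curve = [fraction_bound(p, PEPTIDE_NM, kd) for p in series]
    print(f"Kd = {kd:.0f} nM: bound fraction from {curve[-1]:.4f} to {curve[0]:.4f}")
```

Because some of the reported Kd values are below the 100 nM peptide concentration, the quadratic form is used here rather than the simpler hyperbolic approximation, which assumes the labeled partner is present at a concentration well below Kd.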

Experimental results: experimental data as shown in fig. 5C and 5D: the affinity of the PHD wild-type domain and H3K4me3 is 440nM, the affinity of the PHD variant with the 6-methoxy-tryptophan introduced by the W28 site specificity and the H3K4me3 is 52nM, and compared with the PHD wild-type domain of PHD3, the affinity of the PHD variant with the 6-methoxy-tryptophan introduced by the W28 site specificity and the H3K4me3 is increased by 8 times, and the affinity of other electron donating tryptophan analogues of the PHD protein and the H3K4me3 is increased by 2-6 times. Similarly, the site-specific introduction of 6-methoxy-tryptophan into other decoding protein domains also increases the affinity of the decoding protein for its corresponding histone methylation modification by 2-4 fold.

Example 7: construction of multivalent tandem repeat PHD domains to increase their affinity for H3K4me3

1. Duplex and triplex repeat PHD domain plasmids were constructed as described in example 4: pNEG-2 xchPheT-2 xPHD-W28TAG-GST (2 xPHD) and pNEG-2 xchPheT-3 xPHD-W28TAG-GST (3 xPHD).

2. Duplex and triplex PHD variants with 6-methoxy-tryptophan site-specifically introduced at W28 (2xPHD, 3xPHD) were expressed and purified as described in Example 5; protein purity was assessed by SDS polyacrylamide gel electrophoresis (SDS-PAGE) and protein molecular weight was confirmed by LC-MS.

3. The affinities of the multivalent tandem repeat PHD domains 2xPHD and 3xPHD for H3K4me3 were determined as described in Example 6.

4. The strategy is not limited to the interaction between the PHD domain and H3K4me3 and can be extended to other decoding proteins and other histone methylation modifications; experiments show that it improves the affinity between different decoding protein domains and their corresponding histone methylation modifications.

Experimental results: experimental data are shown in fig. 6C: the affinity of PHD wild-type domain and H3K4me3 was 440nM, and the affinity of the W28 site-specifically introduced duplex and triplex PHD variants of 6-methoxy-tryptophan with H3K4me3 was 30nM and 7nM, respectively, as compared to the PHD wild-type domain of PHD3, the W28 site-specifically introduced duplex and triplex PHD variants of 6-methoxy-tryptophan with H3K4me3 affinity was up to 14.7-fold and 62.9-fold. The affinity assay results that extend this strategy to other decoded proteins and their corresponding histone methylation modifications indicate that: the multivalent tandem repeat histone methylation modification can increase the affinity of the multivalent tandem repeat histone methylation modification to the corresponding histone methylation modification.

Example 8: far-Western Blot assessment of the efficiency of PHD super-parent molecule recognition on H3K4me3

1. Protein expression. Purified 6-methoxy-tryptophan-substituted PHD protein variants were expressed as described in example 4, example 5, to obtain PHD protein, 2x PHD protein and 3x PHD protein.

2. SDS polyacrylamide gel electrophoresis. Taking HeLa cell lysate as an example, after Hela cells are lysed, the protein samples are subjected to gradient dilution to different concentrations and separated by SDS-PAGE running gel

3. And (5) transferring films. The proteins were transferred to PVDF membranes. Constant current 300mA, transfer film 2.5h.

4. Closing: the membrane was placed in a plastic box containing 5% skim milk/TBST, placed on a shaker, sealed for 1h and the sealing solution was poured off. Washed 3 times with TBST for 10min.

5. Incubating the bait protein. H3K4me 3-specific antibodies, PHD protein, 2x PHD protein and 3x PHD protein were incubated overnight, respectively. TBST was washed 3 times, once for 10min.

6. And (5) incubating the antibody. PVDF membranes incubated with H3K4me 3-specific antibodies the corresponding secondary antibodies were incubated at room temperature. Incubation of PVDF membrane of PHD protein GST-specific antibodies (Sigma-Aldrich, cat#G7781) were further incubated, followed by final incubation of the corresponding secondary antibodies (Proteintech, cat#SA 00001-2). TBST was washed 3 times, once for 10min.

7. Chemiluminescent imaging. The PVDF film was covered on the developing solution, taking care of the uniform coverage, left at room temperature for 3 minutes, and then imaged and developed on a multifunctional imager.

Experimental results: experimental data as shown in fig. 7B, the 2x PHD protein and the 3x PHD protein exhibited higher signal-to-noise ratios than the H3K4me3 specific antibodies. The experiment is not limited to interaction of PHD and H3K4me3, and the same is applicable to detection of corresponding histone methylation modification of different decoding proteins.

Example 9: histone methylation superphilic molecular recognition systems detect histone methylation modifications in combination with immunofluorescence techniques.

Histone methylation of H3K4me3 and PHD-decoded proteins are exemplified.

1. The PHD variant was marked. PHD decoding protein domains were labeled with Cy5 dye. NHS-Cy5 was dissolved in DMSO and PHD-decoded protein in PBS solution. NHS-Cy5 and PHD proteins at 1:2 molar ratio, incubated at 37℃in the absence of light for 1h,50mM Tris-HCl, pH 8.0 solution, and the reaction stopped.

2. PHD protein was purified using a PD MiniTrapTM G-25 desalting column (Cytiva, cat# 28918007).

3. Preparation of cell sheets. HeLa cells were inoculated into a petri dish in which a treated cover glass was previously placed, and cultured at 37 ℃.

4. Cell fixation. After the cells were completely adherent, the medium was removed, rinsed 1 time with PBS, fixed for 10min at room temperature with 4% paraformaldehyde (4% PFA/PBS), and rinsed 3 times with PBS.

5. Cell permeabilization. Cells were permeabilized with PBS containing 0.5% Triton X-100 for 10min and rinsed 3 times with PBS.

6. Blocking. The cells were blocked with PBS containing 3% BSA for 30 min at room temperature.

7. Primary antibody incubation. Cells were incubated for 2 h at room temperature with either the Cy5-labeled PHD protein from step (2) (wild type and mutant) or the H3K4me3 antibody (Abcam, cat# ab8580).

8. Secondary antibody incubation. Cells incubated with the H3K4me3 antibody were rinsed with PBS 3 times for 10 min each, then incubated with DyLight 488-conjugated goat anti-rabbit IgG (Abbkine, cat# A23220) for 1 h at room temperature and rinsed with PBS 3 times for 10 min each. Cells incubated with Cy5-labeled PHD protein were rinsed directly with PBS 3 times for 10 min each.

9. Mounting. The coverslips were mounted face down on glass slides with DAPI-containing mounting medium (Abcam, cat# ab104139) and left overnight in the dark before imaging.

10. Imaging. Images were acquired at room temperature on an LSM 710 confocal microscope (Zeiss) with a 63x oil-immersion objective. All images were analyzed and processed using Zeiss ZEN 2.3 lite software.

Experimental results: as shown in FIG. 7C, the immunofluorescence results indicate that the Cy5-labeled PHD superphilic molecules can detect the co-localization of H3K4me3 with condensed M-phase chromosomes during mitosis. The PHD superphilic molecules have a higher signal-to-noise ratio than the commercial H3K4me3 antibody (Abcam, cat# ab8580).
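The 1:2 NHS-Cy5:PHD molar ratio in step 1 can be turned into a pipetting volume once the protein concentration and dye stock concentration are known. The following is a minimal sketch of that arithmetic only; the function name, the protein concentration, the molecular weight and the dye stock concentration are hypothetical illustration values and are not taken from this document.

```python
def dye_volume_ul(protein_mg_per_ml, protein_volume_ul, protein_mw_da,
                  dye_stock_mm, dye_to_protein_ratio=0.5):
    """Volume of dye stock (in uL) giving the requested dye:protein molar ratio.

    dye_to_protein_ratio=0.5 corresponds to a 1:2 dye:protein ratio.
    """
    # mg/mL equals g/L, so dividing by g/mol gives mol/L.
    protein_molar = protein_mg_per_ml / protein_mw_da
    # mol/L * uL * 1e-6 L/uL * 1e9 nmol/mol = nmol
    protein_nmol = protein_molar * protein_volume_ul * 1e3
    dye_nmol = dye_to_protein_ratio * protein_nmol
    # A stock at X mM contains X nmol per uL.
    return dye_nmol / dye_stock_mm


if __name__ == "__main__":
    # Hypothetical inputs: 100 uL of a 1 mg/mL protein of 10 kDa, 10 mM dye stock.
    volume = dye_volume_ul(1.0, 100.0, 10_000.0, 10.0)
    print(f"Add {volume:.2f} uL of dye stock")
```

With these placeholder inputs the protein amount is 10 nmol, so 5 nmol of dye is needed, which is 0.5 uL of a 10 mM stock.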

Example 10: flow cytometry analysis of the efficiency of chimeric phenylalanine translation systems in mammalian cells

1. Cell transfection. 293T cells were transfected following a standard transient plasmid transfection procedure. The experimental group was co-transfected with plasmid pCDNA3.1-chPheRS9, which expresses the chimeric phenylalanine translation system, and the fluorescent reporter plasmid pEGFP-mCherry-T2A-EGFP-190TAG; the control groups were transfected with pEGFP-mCherry or pEGFP-EGFP alone.

2. At 48 h after transfection, the medium was aspirated and the residual medium was washed off with 1x PBS.

3. The PBS was aspirated, and the cells were digested with trypsin, resuspended in 1 mL DMEM medium, and transferred to a 1.5 mL centrifuge tube.

4. The flow cytometer was set up as follows: 293T cells were used to set the forward- and side-scatter gates, mCherry-expressing cells to set the parameters and gate of the PE channel, and EGFP-expressing cells to set the parameters and gate of the FITC channel.

5. The experimental group cells were then analyzed, collecting 50,000 cells per sample. Data were analyzed using FlowJo software.

Experimental results: as shown in FIG. 8, the flow cytometry results show that the chimeric phenylalanine aminoacyl-tRNA synthetase (chPheRS9) efficiently recognizes each of the following unnatural amino acids in mammalian cells: 6-methoxy-tryptophan (6MeOW), 7-methoxy-tryptophan (7MeOW), 6,7-methyl-tryptophan (67MW) and 6,7-methoxy-tryptophan (67MeOW). The procedure is described for the 293T cell line but is not limited to it and is applicable to various cell lines.
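In the mCherry-T2A-EGFP-190TAG reporter, mCherry marks transfected cells, while EGFP is produced only when the TAG codon at position 190 is read through. The document does not state how efficiency was quantified, so the following is only a sketch under one assumed definition: the fraction of mCherry-positive events that are also EGFP-positive. The function name, the gate values and the event intensities are hypothetical.

```python
def suppression_efficiency(events, mcherry_gate, egfp_gate):
    """Fraction of mCherry-positive events that are also EGFP-positive.

    events: iterable of (mcherry_intensity, egfp_intensity) pairs.
    The gates are assumed to come from the single-colour controls.
    """
    transfected = [e for e in events if e[0] >= mcherry_gate]
    if not transfected:
        return 0.0
    suppressed = [e for e in transfected if e[1] >= egfp_gate]
    return len(suppressed) / len(transfected)


if __name__ == "__main__":
    # Hypothetical (mCherry, EGFP) intensities for four events.
    demo_events = [(1000, 900), (1200, 50), (20, 10), (1500, 800)]
    eff = suppression_efficiency(demo_events, mcherry_gate=100, egfp_gate=100)
    print(f"Assumed readthrough efficiency: {eff:.1%}")
```

Comparing this fraction with and without the unnatural amino acid in the medium would separate genuine incorporation from background readthrough; that comparison is a suggestion here, not a step described in the example.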

Example 11: capturing proteomes interacting with histone methylation modifications using proximity labeling techniques

1. Plasmid construction. The vector backbone was amplified from the pT3 vector template with primers pT3-PHD-APEX-V-F/R (nucleotide sequences shown in SEQ ID NO: 52-53). The PHD gene fragment was amplified from pNEG-2XchPheT-PHD-W28TAG-GST with primers pT3-PHD-F/R (SEQ ID NO: 56-57). The APEX2 gene fragment was amplified with primers pT3-APEX2-F/R (SEQ ID NO: 54-55); the nucleotide and amino acid sequences of APEX2 are shown in SEQ ID NO: 13-14. Plasmid pT3-PHD-APEX2 was then constructed by Gibson assembly. The plasmid map is shown in FIG. 1.

2. Construction of a stable cell line expressing the PHD-APEX2 fusion protein. Plasmid pCMV-SB100, which carries the Sleeping Beauty transposon system (see FIG. 1 for the plasmid map), was co-transfected with plasmid pT3-APEX2-PHD into HeLa cells. The cells were cultured at 37 ℃ for 24 h, after which the medium was replaced at regular intervals with DMEM containing 2 μg/mL puromycin. After all cells in the blank control group had died, the experimental group was maintained in DMEM containing 1 μg/mL puromycin to obtain a polyclonal stable cell line.

3. Cell transfection. The stable cell line was transfected to overexpress pCDNA3.1-chPheRS9, 2 mM 6-methoxy-tryptophan was added, and the cells were cultured for 36 h.

4. APEX2-catalyzed proximity labeling. The stable cell line from step (3) was incubated in DMEM containing 500 μM biotin-phenol at 37 ℃ for 30 min. The medium was then replaced with PBS containing 1 mM H₂O₂, and the cells were left at room temperature for 5 min. The cells were rinsed 4 times with pre-chilled 20 mM ascorbic acid/PBS and then once with PBS. The cells were digested with trypsin, neutralized with DMEM, and centrifuged (1000 g, 1 min), and the supernatant was discarded. Finally, the cells were resuspended in PBS and centrifuged (1000 g, 1 min), the supernatant was discarded, and this wash was repeated once.

5. Isolation of nuclei. The cells obtained in step (4) were resuspended in 1.5 mL of hypotonic buffer (10 mM HEPES, 10 mM KCl, 0.05% NP40), left on ice for 10 min, and centrifuged (4 ℃, 12000 rpm, 20 min), and the supernatant was discarded. These steps were repeated 5-8 times.

6. Lysis of nuclei. To the pellet from step (5) were added 400 μL of lysis buffer (25 mM TEOA pH 7.5, 150 mM NaCl, 0.1% SDS, 1% Triton X-100, 0.5% sodium deoxycholate, 1 mM PMSF, 1x PIC) and 20 μL of DNase. The pellet was resuspended, left at room temperature for 20 min, and centrifuged (4 ℃, 18000 rpm, 15 min), and the supernatant was collected.

7. Enrichment of biotin-labeled proteins. Streptavidin-conjugated magnetic beads were added to the supernatant collected in step (6) and incubated overnight at 4 ℃. The beads were washed once with 0.01% NP40/PBS buffer, then washed 3 times with 0.01% NP40/PBS buffer containing 500 mM NaCl, 0.01% NP40/PBS buffer containing 0.2% SDS, and 0.01% NP40/PBS buffer containing 2 M urea, and washed once more with 0.01% NP40/PBS buffer. Finally, the beads were resuspended in 100 μL of 1x SDS loading buffer and boiled at 100 ℃ for 10 min.

8. SDS polyacrylamide gel electrophoresis. The protein samples in step (7) were separated by SDS polyacrylamide gel electrophoresis (SDS-PAGE), stained with Coomassie brilliant blue G250, and destained.

9. LC-MS/MS detection of the proteome interacting with histone methylation modifications. Protein bands were excised from the gel of step (8) with a clean blade and subjected in turn to destaining, dehydration, drying, reduction, alkylation, enzymatic digestion, peptide extraction, desalting, isotope labeling and desalting. The treated samples were analyzed on a Q Exactive Orbitrap mass spectrometer with Proxeon nanospray ionization, coupled to a Proxeon Easy-nLC II HPLC system. Samples were loaded onto a 100 μm x 20 mm Magic C18 5U reverse-phase column for desalting and then separated on a 75 μm x 100 mm Magic C18 3U reverse-phase column. The elution flow rate was 300 nL/min and the elution time was 60 min, yielding the MS/MS results. Data processing: the experimental results were analyzed with MaxQuant and pfbel software.

The workflow of this example is shown in FIG. 9. It combines proximity labeling with the histone methylation superphilic recognition system to capture the proteome interacting with a histone methylation modification (H3K4me3), which is useful for analyzing the biological functions of H3K4me3 in life processes.
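The protocol in this example gives some centrifugation settings in g (1000 g) and others in rpm (12000 rpm, 18000 rpm). Converting between the two requires the rotor radius, which the text does not state. The following is a minimal sketch of the standard conversion, RCF = 1.118 x 10^-5 x r(cm) x rpm^2; the 8 cm radius is a hypothetical value used only for illustration, and the function names are not from this document.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf, radius_cm):
    """Rotor speed in rpm needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 8.0  # hypothetical rotor radius; check the actual rotor specification
    for rpm in (12000, 18000):
        print(f"{rpm} rpm at {radius_cm} cm is about {rpm_to_rcf(rpm, radius_cm):.0f} x g")
    print(f"1000 x g at {radius_cm} cm is about {rcf_to_rpm(1000, radius_cm):.0f} rpm")
```

Reporting the force in g alongside the rpm value makes a protocol reproducible on a centrifuge with a different rotor.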

In summary, the invention provides a method for synthesizing tryptophan analogue compounds and uses these compounds to markedly strengthen cation-pi interactions. It thereby offers a research method for studying biomacromolecules that rely on cation-pi interactions, a theoretical basis for developing superphilic biotechnology based on decoding proteins of histone methylation modifications, and scope for further application, with considerable clinical value and value for development and application.

It should be understood that the foregoing detailed description is provided for illustration only and that the invention is not limited to the technical solutions described in the embodiments. Those skilled in the art will understand that the invention may be modified or equivalently substituted to achieve the same technical effect; any such variant that meets the requirements of use falls within the scope of protection of the invention.

Sequence listing

<110> Zhejiang University

<120> A method for improving cation-pi interaction by utilizing genetic code expansion and application thereof

<130> ZJDX-002

<160> 49

<170> SIPOSequenceListing 1.0

<210> 1

<211> 1668

<212> DNA

<213> Synthesis (synthetic sequence)

<400> 1

atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc 60

ggcacgctgc acaagatcaa gcactatgag atttctcgtt ctaaaatcta catcgaaatg 120

gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtcccgc acgtgcattc 180

cgttatcata aataccgtaa aacctgcaaa cgttgtcgtg tttctgacga agatatcaac 240

aacttcctga cccgttctac cgaaggcaaa acctctgtta aagttaaagt tgtttctgag 300

ccgaaagtga aaaaagcgat gccgaaatct gtttctcgtg cgccgaaacc gctggaaaat 360

ccggtttctg cgaaagcgtc taccgacacc tctcgttctg ttccgtctcc ggcgaaatct 420

accccgaact ctccggttcc gacctctgca agcgccccag ctctgactaa atcccagacg 480

gaccgtctgg aggtgctgct gaacccaaag gatgaaatct ctctgaacag cggcaagcct 540

ttccgtgagc tggaaagcga gctgctgtct cgtcgtaaaa aggatctgca acagatctac 600

gctgaggaac gcgagggtgg cggaagcggc ggcggtggcg gaagcggcgg cggtggcgga 660

agcggcggcg gtggaagcca ggcctgggga tcgaggcctc ctgcagcaga gtgtgccacc 720

caaagagctc caggcagtgt ggtggagctg ctgggcaaat cctaccctca ggacgaccac 780

agcaacctca cccggaaggt cctcaccaga gttggcagga acctgcacaa ccagcagcat 840

caccctctgt ggctgatcaa ggagagggtg ttggagcact tcaacaagca gtatgtgggc 900

agctctggga ccccgttgtt ctcggtctat gacaaccttt cgccagtggt cacgacctgg 960

cagaactttg acagcctgct catcccagct gatcacccct gcaggaagaa gggggacaac 1020

tattacctga atcggactca catgctgaga gcgcacacgt ccgcacacca gtgggacttg 1080

ctgcacgcgg gactggatgc cttcctggtg gtgggtgatg tctacaggcg tgaccagatc 1140

gactcccagc actaccctat tttccaccag ctggacgccg gtcggctctt ctctaagcat 1200

gagttatttg ctggtataaa ggatggggaa agcctgcagc tctttgaaca aagttctcgc 1260

tctgcgcata aacaagagac acacaccatg gaggccgtga agcttgttga gtttgatctt 1320

aagcaaacgc ttaccaggct catggcacat ctttttggag atgagccgga gataaggtgg 1380

gtagactgct acgttccttt tggacatcct tcctttgaga tggagatcaa ctttcatgga 1440

gaatggctgg aagttcttgg ctgcggggtg gttgaacaac aactggtcaa ttcagctggt 1500

gctcaagacc gaatcggctg gggatttggc ctagggttag aaaggctagc catgatcctc 1560

tacgacatcc ctgatatccg tctcttctgg tgtgaggacg agcgcttcct gaagcagttc 1620

tgtgtatcca acattaatca gaaggtgaag tttcagcctc ttagcaaa 1668

<210> 2

<211> 556

<212> PRT

<213> Synthesis (synthetic sequence)

<400> 2

Met Asp Lys Lys Pro Leu Asp Val Leu Ile Ser Ala Thr Gly Leu Trp

1 5 10 15

Met Ser Arg Thr Gly Thr Leu His Lys Ile Lys His Tyr Glu Ile Ser

20 25 30

Arg Ser Lys Ile Tyr Ile Glu Met Ala Cys Gly Asp His Leu Val Val

35 40 45

Asn Asn Ser Arg Ser Cys Arg Pro Ala Arg Ala Phe Arg Tyr His Lys

50 55 60

Tyr Arg Lys Thr Cys Lys Arg Cys Arg Val Ser Asp Glu Asp Ile Asn

65 70 75 80

Asn Phe Leu Thr Arg Ser Thr Glu Gly Lys Thr Ser Val Lys Val Lys

85 90 95

Val Val Ser Glu Pro Lys Val Lys Lys Ala Met Pro Lys Ser Val Ser

100 105 110

Arg Ala Pro Lys Pro Leu Glu Asn Pro Val Ser Ala Lys Ala Ser Thr

115 120 125

Asp Thr Ser Arg Ser Val Pro Ser Pro Ala Lys Ser Thr Pro Asn Ser

130 135 140

Pro Val Pro Thr Ser Ala Ser Ala Pro Ala Leu Thr Lys Ser Gln Thr

145 150 155 160

Asp Arg Leu Glu Val Leu Leu Asn Pro Lys Asp Glu Ile Ser Leu Asn

165 170 175

Ser Gly Lys Pro Phe Arg Glu Leu Glu Ser Glu Leu Leu Ser Arg Arg

180 185 190

Lys Lys Asp Leu Gln Gln Ile Tyr Ala Glu Glu Arg Glu Gly Gly Gly

195 200 205

Ser Gly Gly Gly Gly Gly Ser Gly Gly Gly Gly Gly Ser Gly Gly Gly

210 215 220

Gly Ser Gln Ala Trp Gly Ser Arg Pro Pro Ala Ala Glu Cys Ala Thr

225 230 235 240

Gln Arg Ala Pro Gly Ser Val Val Glu Leu Leu Gly Lys Ser Tyr Pro

245 250 255

Gln Asp Asp His Ser Asn Leu Thr Arg Lys Val Leu Thr Arg Val Gly

260 265 270

Arg Asn Leu His Asn Gln Gln His His Pro Leu Trp Leu Ile Lys Glu

275 280 285

Arg Val Leu Glu His Phe Asn Lys Gln Tyr Val Gly Ser Ser Gly Thr

290 295 300

Pro Leu Phe Ser Val Tyr Asp Asn Leu Ser Pro Val Val Thr Thr Trp

305 310 315 320

Gln Asn Phe Asp Ser Leu Leu Ile Pro Ala Asp His Pro Cys Arg Lys

325 330 335

Lys Gly Asp Asn Tyr Tyr Leu Asn Arg Thr His Met Leu Arg Ala His

340 345 350

Thr Ser Ala His Gln Trp Asp Leu Leu His Ala Gly Leu Asp Ala Phe

355 360 365

Leu Val Val Gly Asp Val Tyr Arg Arg Asp Gln Ile Asp Ser Gln His

370 375 380

Tyr Pro Ile Phe His Gln Leu Asp Ala Gly Arg Leu Phe Ser Lys His

385 390 395 400

Glu Leu Phe Ala Gly Ile Lys Asp Gly Glu Ser Leu Gln Leu Phe Glu

405 410 415

Gln Ser Ser Arg Ser Ala His Lys Gln Glu Thr His Thr Met Glu Ala

420 425 430

Val Lys Leu Val Glu Phe Asp Leu Lys Gln Thr Leu Thr Arg Leu Met

435 440 445

Ala His Leu Phe Gly Asp Glu Pro Glu Ile Arg Trp Val Asp Cys Tyr

450 455 460

Val Pro Phe Gly His Pro Ser Phe Glu Met Glu Ile Asn Phe His Gly

465 470 475 480

Glu Trp Leu Glu Val Leu Gly Cys Gly Val Val Glu Gln Gln Leu Val

485 490 495

Asn Ser Ala Gly Ala Gln Asp Arg Ile Gly Trp Gly Phe Gly Leu Gly

500 505 510

Leu Glu Arg Leu Ala Met Ile Leu Tyr Asp Ile Pro Asp Ile Arg Leu

515 520 525

Phe Trp Cys Glu Asp Glu Arg Phe Leu Lys Gln Phe Cys Val Ser Asn

530 535 540

Ile Asn Gln Lys Val Lys Phe Gln Pro Leu Ser Lys

545 550 555

<210> 3

<211> 201

<212> DNA

<213> Human (Homo sapiens)

<400> 3

atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag 60

cgcccgtgta aagataaagt tgattaggtt cagtgtgatg gtggttgtga tgaatggttt 120

catcaggttt gtgttggtgt tagcccggaa atggcagaaa atgaagatta tatttgcatc 180

aactgcgcaa aaaaacaggg t 201

<210> 4

<211> 201

<212> DNA

<213> Human (Homo sapiens)

<400> 4

atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag 60

cgcccgtgta aagataaagt tgattgggtt cagtgtgatg gtggttgtga tgaatagttt 120

catcaggttt gtgttggtgt tagcccggaa atggcagaaa atgaagatta tatttgcatc 180

aactgcgcaa aaaaacaggg t 201

<210> 5

<211> 198

<212> DNA

<213> Human (Homo sapiens)

<400> 5

atggcaagtc aggaatttga agtagaagca attgttgata aacgtcaaga taaaaacggt 60

aatacccaat atctggttcg ttggaaaggt tatgataaac aggatgatac atgggaaccg 120

gaacagcatc tgatgaattg tgaaaaatgt gtgcatgatt tcaaccgtcg ccaaaccgaa 180

aaacagaaag gtggaagc 198

<210> 6

<211> 66

<212> PRT

<213> Human (Homo sapiens)

<400> 6

Met Ala Ser Gln Glu Phe Glu Val Glu Ala Ile Val Asp Lys Arg Gln

1 5 10 15

Asp Lys Asn Gly Asn Thr Gln Tyr Leu Val Arg Trp Lys Gly Tyr Asp

20 25 30

Lys Gln Asp Asp Thr Trp Glu Pro Glu Gln His Leu Met Asn Cys Glu

35 40 45

Lys Cys Val His Asp Phe Asn Arg Arg Gln Thr Glu Lys Gln Lys Gly

50 55 60

Gly Ser

65

<210> 7

<211> 579

<212> DNA

<213> Human (Homo sapiens)

<400> 7

atgaatggct gggtacctgt tggggctgcg tgtgagaagg ctgtgtatgt cttggatgag 60

ccggagccag ccatccgaaa gagctaccag gcggtagagc ggcatgggga gacaatccga 120

gtccgggaca ccgtccttct caaatcaggc ccacgaaaga cctccacacc ttatgtggcc 180

aagatctctg ccctctggga gaaccccgag tcaggagagc tgatgatgag cctcctgtgg 240

tattacagac ctgagcactt acagggaggc cgcagtccca gcatgcacga gcccttgcag 300

aatgaagtgt ttgcatcgcg acatcaggac cagaacagtg tggcctgcat tgaggagaag 360

tgctatgtgc tgacttttgc cgagtactgc aggttctgtg ccatggccaa gcgccgaggt 420

gaaggcctcc ccagccgaaa gacagcactg gttcccccct ctgcagacta ttccacccca 480

ccccaccgca cagtgccaga ggacacggac cctgagctgg tgttcctttg ccgccatgtc 540

tatgacttcc gccacgggcg catccttaag aacccccag 579

<210> 8

<211> 193

<212> PRT

<213> Human (Homo sapiens)

<400> 8

Met Asn Gly Trp Val Pro Val Gly Ala Ala Cys Glu Lys Ala Val Tyr

1 5 10 15

Val Leu Asp Glu Pro Glu Pro Ala Ile Arg Lys Ser Tyr Gln Ala Val

20 25 30

Glu Arg His Gly Glu Thr Ile Arg Val Arg Asp Thr Val Leu Leu Lys

35 40 45

Ser Gly Pro Arg Lys Thr Ser Thr Pro Tyr Val Ala Lys Ile Ser Ala

50 55 60

Leu Trp Glu Asn Pro Glu Ser Gly Glu Leu Met Met Ser Leu Leu Trp

65 70 75 80

Tyr Tyr Arg Pro Glu His Leu Gln Gly Gly Arg Ser Pro Ser Met His

85 90 95

Glu Pro Leu Gln Asn Glu Val Phe Ala Ser Arg His Gln Asp Gln Asn

100 105 110

Ser Val Ala Cys Ile Glu Glu Lys Cys Tyr Val Leu Thr Phe Ala Glu

115 120 125

Tyr Cys Arg Phe Cys Ala Met Ala Lys Arg Arg Gly Glu Gly Leu Pro

130 135 140

Ser Arg Lys Thr Ala Leu Val Pro Pro Ser Ala Asp Tyr Ser Thr Pro

145 150 155 160

Pro His Arg Thr Val Pro Glu Asp Thr Asp Pro Glu Leu Val Phe Leu

165 170 175

Cys Arg His Val Tyr Asp Phe Arg His Gly Arg Ile Leu Lys Asn Pro

180 185 190

Gln

<210> 9

<211> 207

<212> DNA

<213> Human (Homo sapiens)

<400> 9

atggagtatc aggatgggaa ggagtttgga ataggggacc tcgtgtgggg aaagatcaag 60

ggcttctcct ggtggcccgc catggtggtg tcttggaagg ccacctccaa gcgacaggct 120

atgtctggca tgcggtgggt ccagtggttt ggcgatggca agttctccga ggtctctgca 180

gacaaactgg tggcactggg gctgttc 207

<210> 10

<211> 69

<212> PRT

<213> Human (Homo sapiens)

<400> 10

Met Glu Tyr Gln Asp Gly Lys Glu Phe Gly Ile Gly Asp Leu Val Trp

1 5 10 15

Gly Lys Ile Lys Gly Phe Ser Trp Trp Pro Ala Met Val Val Ser Trp

20 25 30

Lys Ala Thr Ser Lys Arg Gln Ala Met Ser Gly Met Arg Trp Val Gln

35 40 45

Trp Phe Gly Asp Gly Lys Phe Ser Glu Val Ser Ala Asp Lys Leu Val

50 55 60

Ala Leu Gly Leu Phe

65

<210> 11

<211> 417

<212> DNA

<213> Human (Homo sapiens)

<400> 11

atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag 60

cgcccgtgta aagataaagt tgattgggtt cagtgtgatg gtggttgtga tgaatagttt 120

catcaggttt gtgttggtgt tagcccggaa atggcagaaa atgaagatta tatttgcatc 180

aactgcgcaa aaaaacaggg tggcagcagc ggcagcagca gcggtgcaga agaatcagat 240

gatgaaaatg cagtttgtgc agcacagaat tgtcagcgcc cgtgtaaaga taaagttgat 300

tgggttcagt gtgatggtgg ttgtgatgaa tagtttcatc aggtttgtgt tggtgttagc 360

ccggaaatgg cagaaaatga agattatatt tgcatcaact gcgcaaaaaa acagggt 417

<210> 12

<211> 636

<212> DNA

<213> Human (Homo sapiens)

<400> 12

atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag 60

cgcccgtgta aagataaagt tgattgggtt cagtgtgatg gtggttgtga tgaatggttt 120

catcaggttt gtgttggtgt tagcccggaa atggcagaaa atgaagatta tatttgcatc 180

aactgcgcaa aaaaacaggg tggcagcagc ggcagcagca gcggtgcaga agaatcagat 240

gatgaaaatg cagtttgtgc agcacagaat tgtcagcgcc cgtgtaaaga taaagttgat 300

tgggttcagt gtgatggtgg ttgtgatgaa tggtttcatc aggtttgtgt tggtgttagc 360

ccggaaatgg cagaaaatga agattatatt tgcatcaact gcgcaaaaaa acagggtctg 420

gtgccgcgcg gcagcagcag cggtgcagaa gaatcagatg atgaaaatgc agtttgtgca 480

gcacagaatt gtcagcgccc gtgtaaagat aaagttgatt gggttcagtg tgatggtggt 540

tgtgatgaat ggtttcatca ggtttgtgtt ggtgttagcc cggaaatggc agaaaatgaa 600

gattatattt gcatcaactg cgcaaaaaaa cagggt 636

<210> 13

<211> 747

<212> DNA

<213> Soybean (Glycine max)

<400> 13

ggaaagtctt acccaactgt gagtgctgat taccaggacg ccgttgagaa ggcgaagaag 60

aagctcagag gcttcatcgc tgagaagaga tgcgctcctc taatgctccg tttggcattc 120

cactctgctg gaacctttga caagggcacg aagaccggtg gacccttcgg aaccatcaag 180

caccctgccg aactggctca cagcgctaac aacggtcttg acatcgctgt taggcttttg 240

gagccactca aggcggagtt ccctattttg agctacgccg atttctacca gttggctggc 300

gttgttgccg ttgaggtcac gggtggacct aaggttccat tccaccctgg aagagaggac 360

aagcctgagc caccaccaga gggtcgcttg cccgatccca ctaagggttc tgaccatttg 420

agagatgtgt ttggcaaagc tatggggctt actgaccaag atatcgttgc tctatctggg 480

ggtcacacta ttggagctgc acacaaggag cgttctggat ttgagggtcc ctggacctct 540

aatcctctta ttttcgacaa ctcatacttc acggagttgt tgagtggtga gaaggaaggt 600

ctccttcagc taccttctga caaggctctt ttgtctgacc ctgtattccg ccctctcgtt 660

gacaaatatg cagcggacga agatgccttc tttgctgatt acgctgaggc tcaccaaaag 720

ctttccgagc ttgggtttgc tgatgcc 747

<210> 14

<211> 249

<212> PRT

<213> Soybean (Glycine max)

<400> 14

Gly Lys Ser Tyr Pro Thr Val Ser Ala Asp Tyr Gln Asp Ala Val Glu

1 5 10 15

Lys Ala Lys Lys Lys Leu Arg Gly Phe Ile Ala Glu Lys Arg Cys Ala

20 25 30

Pro Leu Met Leu Arg Leu Ala Phe His Ser Ala Gly Thr Phe Asp Lys

35 40 45

Gly Thr Lys Thr Gly Gly Pro Phe Gly Thr Ile Lys His Pro Ala Glu

50 55 60

Leu Ala His Ser Ala Asn Asn Gly Leu Asp Ile Ala Val Arg Leu Leu

65 70 75 80

Glu Pro Leu Lys Ala Glu Phe Pro Ile Leu Ser Tyr Ala Asp Phe Tyr

85 90 95

Gln Leu Ala Gly Val Val Ala Val Glu Val Thr Gly Gly Pro Lys Val

100 105 110

Pro Phe His Pro Gly Arg Glu Asp Lys Pro Glu Pro Pro Pro Glu Gly

115 120 125

Arg Leu Pro Asp Pro Thr Lys Gly Ser Asp His Leu Arg Asp Val Phe

130 135 140

Gly Lys Ala Met Gly Leu Thr Asp Gln Asp Ile Val Ala Leu Ser Gly

145 150 155 160

Gly His Thr Ile Gly Ala Ala His Lys Glu Arg Ser Gly Phe Glu Gly

165 170 175

Pro Trp Thr Ser Asn Pro Leu Ile Phe Asp Asn Ser Tyr Phe Thr Glu

180 185 190

Leu Leu Ser Gly Glu Lys Glu Gly Leu Leu Gln Leu Pro Ser Asp Lys

195 200 205

Ala Leu Leu Ser Asp Pro Val Phe Arg Pro Leu Val Asp Lys Tyr Ala

210 215 220

Ala Asp Glu Asp Ala Phe Phe Ala Asp Tyr Ala Glu Ala His Gln Lys

225 230 235 240

Leu Ser Glu Leu Gly Phe Ala Asp Ala

245

<210> 15

<211> 47

<212> DNA

<213> Artificial sequence (synthetic sequence)

<220>

<221> misc_feature

<222> (21)..(22)

<223> n is a, c, g, or t

<400> 15

taagatgggt agactgctac nnkccttttg gtcatccttc ttttgag 47

<210> 16

<211> 25

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 16

gtagcagtct acccatctta tctcc 25

<210> 17

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<220>

<221> misc_feature

<222> (21)..(22)

<223> n is a, c, g, or t

<400> 17

aagttcttgg ctgcggggtg nnkgaacaac aactggtcaa ttcagc 46

<210> 18

<211> 20

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 18

caccccgcag ccaagaactt 20

<210> 19

<211> 50

<212> DNA

<213> Artificial sequence (synthetic sequence)

<220>

<221> misc_feature

<222> (21)..(22)

<223> n is a, c, g, or t

<220>

<221> misc_feature

<222> (27)..(28)

<223> n is a, c, g, or t

<400> 19

accctatttt ccaccagctg nnkgccnnkc ggctcttctc caagcatgag 50

<210> 20

<211> 24

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 20

cagctggtgg aaaatagggt agtg 24

<210> 21

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 21

ctggtgccgc gcggcagcat gtcccctata ctaggttatt ggaaaa 46

<210> 22

<211> 40

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 22

gtggcgacca tcctccaaaa tgaagcatgc accattcctt 40

<210> 23

<211> 45

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 23

taagaaggag atatacatat gagcggtgca gaagaatcag atgat 45

<210> 24

<211> 44

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 24

catgctgccg cgcggcacca gaccctgttt ttttgcgcag ttga 44

<210> 25

<211> 22

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 25

tgaagcatgc accattcctt gc 22

<210> 26

<211> 58

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 26

catatgtata tctccttctt aaagttaaac aaaattattt ctagcccaaa aaaacggg 58

<210> 27

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 27

cgtgtaaaga taaagttgat taggttcagt gtgatggtgg ttgtga 46

<210> 28

<211> 25

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 28

atcaacttta tctttacacg ggcgc 25

<210> 29

<211> 27

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 29

tttcatcagg tttgtgttgg tgttagc 27

<210> 30

<211> 47

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 30

ccaacacaaa cctgatgaaa ctattcatca caaccaccat cacactg 47

<210> 31

<211> 59

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 31

actgcgcaaa aaaacagggt ggcagcagcg gcagcagcag cggtgcagaa gaatcagat 59

<210> 32

<211> 42

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 32

atgctgccgc gcggcaccag accctgtttt tttgcgcagt tg 42

<210> 33

<211> 39

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 33

accctgtttt tttgcgcagt tgatgcaaat ataatcttc 39

<210> 34

<211> 42

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 34

atgctgccgc gcggcaccag gcttccacct ttctgttttt cg 42

<210> 35

<211> 59

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 35

actgcgcaaa aaaacagggt ggcagcagcg gcagcagcgc aagtcaggaa tttgaagta 59

<210> 36

<211> 40

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 36

ggcagcagcg gcagcagcgt gagcaagggc gaggagctgt 40

<210> 37

<211> 22

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 37

catggtggcg accggtagcg ct 22

<210> 38

<211> 42

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 38

cgctaccggt cgccaccatg agcggtgcag aagaatcaga tg 42

<210> 39

<211> 43

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 39

cgctgctgcc gctgctgcca ccctgttttt ttgcgcagtt gat 43

<210> 40

<211> 43

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 40

ctgcacggaa gcttgccacc atggataaga agccgctgga tgt 43

<210> 41

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 41

tagtgatggt gatggtggtg tttgctaaga ggctgaaact tcacct 46

<210> 42

<211> 25

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 42

caccaccatc accatcacta aaccc 25

<210> 43

<211> 22

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 43

ggtggcaagc ttccgtgcag tt 22

<210> 44

<211> 23

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 44

taactagtcc actgagatcg acg 23

<210> 45

<211> 26

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 45

cttatcgtcg tcatccttgt agtcca 26

<210> 46

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 46

tctggcagcg gttctgctag cggaaagtct tacccaactg tgagtg 46

<210> 47

<211> 40

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 47

cgatctcagt ggactagtta ggcatcagca aacccaagct 40

<210> 48

<211> 41

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 48

acaaggatga cgacgataag agcggtgcag aagaatcaga t 41

<210> 49

<211> 51

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 49

gctagcagaa ccgctgccag aaccgctgcc accctgtttt tttgcgcagt t 51
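As an illustrative check on the listing above, a coding sequence can be scanned for in-frame amber (TAG) codons, the positions at which genetic code expansion would insert the tryptophan analogue. The following is a minimal sketch using SEQ ID NO: 3 as input; the function name is hypothetical, and reading frame 1 starting at the first base is an assumption.

```python
# SEQ ID NO: 3 as printed in the listing (201 nt), with spaces removed.
SEQ_ID_NO_3 = (
    "atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag"
    "cgcccgtgta aagataaagt tgattaggtt cagtgtgatg gtggttgtga tgaatggttt"
    "catcaggttt gtgttggtgt tagcccggaa atggcagaaa atgaagatta tatttgcatc"
    "aactgcgcaa aaaaacaggg t"
).replace(" ", "")


def in_frame_amber_codons(dna):
    """Return 1-based codon indices of TAG codons in frame 1 (first base = codon 1)."""
    dna = dna.lower()
    return [i // 3 + 1
            for i in range(0, len(dna) - 2, 3)
            if dna[i:i + 3] == "tag"]


if __name__ == "__main__":
    print(len(SEQ_ID_NO_3), in_frame_amber_codons(SEQ_ID_NO_3))
```

For SEQ ID NO: 3 this reports a single in-frame TAG at codon 29 when the initiator ATG is counted as codon 1. The plasmid name used in Example 11 refers to this site as W28, which would be consistent with a numbering that does not count the initiator methionine; that reading is an assumption, not something the listing states.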

Claims

1. A method for improving cation-pi interaction by utilizing genetic code expansion, characterized in that tryptophan residues forming the aromatic cage of a cation-pi interaction in a biomolecule are replaced with a tryptophan analogue by genetic code expansion technology so as to increase the binding energy of the cation-pi interaction, the method comprising the following specific steps:

S1, designing and synthesizing a tryptophan analogue bearing a strongly electron-donating side-chain substituent, wherein the tryptophan analogue is an unnatural amino acid and is selected from one of 6-methoxy-tryptophan (A2), 7-methyl-tryptophan (A3) and 7-methoxy-tryptophan (A4), and the structural formulas of the tryptophan analogues A2 to A4 are as follows:

S2, providing a chimeric phenylalanine aminoacyl-tRNA synthetase mutant capable of specifically recognizing the tryptophan analogues A2 to A4, wherein the nucleotide sequence and the amino acid sequence of the chimeric phenylalanine aminoacyl-tRNA synthetase mutant are shown in SEQ ID NO: 1-2;

2. The method according to claim 1, characterized in that the tryptophan analogues are synthesized as follows: an indole B substituted at different positions is used as the reactant and reacted to obtain the target product,

the chemical structural formula of the indole substituted at different positions is as follows: the general structural formula of the obtained target product is as follows: wherein X is selected from one of an oxygen atom or a carbon atom.

3. The method according to claim 1, characterized in that, in step S3,

(1) the tryptophan site of the decoding protein that forms the aromatic cage is mutated to a stop codon,

(2) the decoding protein mutant and the chimeric phenylalanyl-tRNA synthetase mutant are co-transformed, and the corresponding tryptophan analogue is added during expression,

(3) the decoding protein variants are purified according to the GST-tag protein purification method, and the fidelity of the decoding protein variants is verified by LC-MS.

4. The method according to claim 1, characterized in that the method takes a histone methylation decoding protein domain as the research object, wherein the decoding protein domain is any one of the Chromo, PHD, PWWP, Tudor, MBT, CW, SPIN and BAH domains.

5. Use of a protein containing a tryptophan analogue, obtained by the method of claim 1, for establishing a decoding protein superphilic recognition system that specifically recognizes histone methylation modifications.