CN114940979A

CN114940979A - Method for improving cation-pi interaction by genetic code expansion and application

Info

Publication number: CN114940979A
Application number: CN202210140263.7A
Authority: CN
Inventors: 林世贤; 赵红霞; 刘超; 方誉
Original assignee: Zhejiang University ZJU
Current assignee: Hangzhou Chihua Hesheng Pharmaceutical Technology Co ltd
Priority date: 2022-02-16
Filing date: 2022-02-16
Publication date: 2022-08-26
Anticipated expiration: 2042-02-16
Also published as: CN114940979B

Abstract

Cation-pi interactions are important non-covalent intermolecular interactions that play a major role in biology and chemistry. Although the origin and biological functions of cation-pi interactions are well understood, research on designing and synthesizing stronger cation-pi interactions is scarce. The invention provides a method for improving cation-pi interaction by genetic code expansion, and an application thereof. Taking decoding proteins of histone methylation modification as an example, tryptophan analogues substituted with strongly electron-donating side-chain groups are introduced at the tryptophan sites of the aromatic cage of the decoding protein by genetic code expansion. This improves the affinity of the decoding protein for the histone methylation modification and establishes a super-affinity molecular recognition system for recognizing histone methylation modification.

Description

Method for improving cation-pi interaction by genetic code expansion and application

Technical Field

The invention relates to a method for improving cation-pi interaction, in particular to a method for improving cation-pi interaction by genetic code expansion and application, and belongs to the technical field of biology.

Background

Non-covalent interactions regulate the structure and function of biomolecules and play a key role in molecular folding and molecular recognition. They include cation-pi interactions, hydrogen bonding, ionic interactions and hydrophobic interactions. The cation-pi interaction is a strong non-covalent interaction between a cation and a pi electron cloud, and it plays an important role in biomolecular self-assembly, molecular recognition, molecular adhesion and molecular folding. A series of recent studies on the origin and principles of cation-pi interactions suggests that they are critical for substrate-receptor binding and for the recognition of histone post-translational modifications. It has been reported that replacing aromatic amino acids in aromatic cages with fluorine-substituted tryptophan analogs weakens cation-pi interactions because fluorine is electron-withdrawing. In addition, mutating the aromatic amino acids in the aromatic cage of a decoding protein significantly reduces or abolishes the interaction. Despite great success in understanding the origin and biological function of cation-pi interactions, research on designing and synthesizing stronger cation-pi interactions is essentially absent.

Take histone methylation decoding proteins as an example. Histone methylation is a methyltransferase-mediated modification of arginine or lysine residues at the N-termini of histones H3 and H4. Histone methylation modifications are recognized by decoding proteins and thereby participate in important life processes such as regulation of gene expression, DNA replication, DNA damage repair and cell cycle regulation. Studying the distribution and abundance of histone methylation is the basis for understanding the molecular mechanisms of the histone code and chromatin regulation. At present, antibodies against histone methylation are the main tool for detecting the genomic distribution and site specificity of histone methylation. Unfortunately, antibodies suffer from sequence-dependent affinity, low substrate resolution, non-specific recognition, and suitability only for in vitro experiments, which limits their use and the accurate analysis of histone methylation functions. A new method for detecting histone methylation modification with high affinity is therefore urgently needed.

Studies have shown that decoding proteins of histone methylation form a hydrophobic pocket from 2 to 4 aromatic amino acids and specifically recognize histone methylation modifications through cation-pi interactions. Because of this specificity, detection methods based on decoding protein domains have attracted wide attention as an alternative to specific antibodies. For example, the ADD domain of ATRX and the PWWP domain of DNMT3A have been used to capture H3K9me3 and H3K36me3, respectively; the MBT2 domain of L3MBTL1 broadly recognizes mono- or di-methylated lysine modifications and has therefore been developed into a method for capturing the methyl-lysine proteome. Detection methods based on decoding protein domains are easy to engineer, economical, and able to capture multiple PTMs, but the affinity of decoding protein domains for histone methylation modifications is in the micromolar range, which limits wide application of the technology. It is therefore highly desirable to design high-affinity histone methylation decoding proteins to facilitate the use of decoding protein domains in enrichment, imaging, sequencing and other analyses of histone methylation modification.

Genetic code expansion (GCE) site-specifically introduces unnatural amino acids with novel structures and unique properties into proteins and expands the set of building blocks available for protein synthesis, providing a powerful tool for precise control of proteins and for identifying and optimizing protein function. Patent ZL 201910440254.8, entitled "construction of orthogonal aminoacyl-tRNA synthetase/tRNA System Using chimeric design", discloses the use of chimeric protein design to transplant the universal orthogonality of the pyrrolysyl-tRNA synthetase (PylRS)/tRNACUA orthogonal pair onto the human mitochondrial phenylalanyl-tRNA synthetase (PheRS)/tRNA pair, constructing a chimeric phenylalanyl-tRNA synthetase (chPheRS)/tRNA system with universal orthogonality. This broadens the range of unnatural amino acids that can be recognized and provides new tools for genetic code expansion. The chimeric phenylalanyl-tRNA synthetase of that system can specifically recognize tryptophan analogs such as 6-methyl-tryptophan, 7-methyl-tryptophan, 6-chloro-tryptophan, 7-chloro-tryptophan, 6-cyano-tryptophan and 7-cyano-tryptophan.

Disclosure of Invention

The invention aims to provide a method for improving cation-pi interaction by genetic code expansion, which utilizes a genetic code expansion technology to replace tryptophan forming cation-pi interaction in a biological molecule with a tryptophan analogue so as to improve the binding energy of the cation-pi interaction.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A method for improving cation-pi interaction by genetic code expansion, characterized in that the tryptophan of the aromatic cage that forms the cation-pi interaction in a biomolecule is replaced with a tryptophan analogue by genetic code expansion, so as to improve the binding energy of the cation-pi interaction.

Non-covalent interactions regulate the structure and function of biomolecules and play a key role in molecular folding and molecular recognition. Among them, the cation-pi interaction is a strong non-covalent interaction between a cation and a pi electron cloud that plays an important role in biomolecular self-assembly, molecular recognition, molecular adhesion and molecular folding, yet methods for improving cation-pi interactions are lacking. The invention provides a synthesis method for tryptophan compounds bearing electron-donating groups, and develops a method in which tryptophan in an aromatic cage is replaced by genetic code expansion; the synthesized compounds A1-A6 were verified to markedly improve the binding energy of the cation-pi interaction. Detection of histone methylation modification currently relies mainly on specific antibodies, but antibodies suffer from sequence-dependent affinity, low substrate resolution, non-specific recognition and suitability only for in vitro experiments. Taking decoding proteins of histone methylation modification as an example, the invention uses genetic code expansion to introduce tryptophan analogues substituted with strongly electron-donating side-chain groups at the tryptophan sites of the aromatic cage of the decoding protein. This improves the affinity of the decoding protein for the histone methylation modification by 4-8 times without affecting the function of the protein, and shows potential value for detecting histone methylation modification. Using this method, a super-affinity molecular recognition system for recognizing histone methylation modification is established and applied to the detection, imaging, sequencing and similar analyses of histone methylation modification.

In the invention, a series of tryptophan analogues are site-specifically introduced into the aromatic cage of the decoding protein to regulate the binding affinity between the decoding protein and the histone methylation modification. With this strategy, the affinity between a histone methylation modification and its decoding protein is improved by 4-8 times, the affinity between H3K4me3 and the PHD decoding protein domain reaches the nanomolar level through tandem-repeat design, and a super-affinity molecular recognition system for detecting histone methylation modification is developed.

The method of the invention can be applied to the study of any biomacromolecule forming cation-pi interaction.

Preferably, the method comprises the steps of:

S1, designing and synthesizing tryptophan analogues bearing strongly electron-donating side-chain substituents, wherein the tryptophan analogues are unnatural amino acids selected from one of 6-methyl-tryptophan (A1), 6-methoxy-tryptophan (A2), 7-methyl-tryptophan (A3), 7-methoxy-tryptophan (A4), 6, 7-methoxy-tryptophan (A5), 6, 7-methyl-tryptophan (A6), 7, 8-dihydrofuran-tryptophan (A7), 6, 7-dihydrofuran-tryptophan (A8), 7, 8-furan-tryptophan (A9), 6, 7-furan-tryptophan (A10), 6, 7-dioxole-tryptophan (A11) or 6, 7-cyclopentane-tryptophan (A12), the structural formulas of the tryptophan analogs A1 to A12 are as follows:

S2, screening chimeric phenylalanyl-tRNA synthetase mutants that specifically recognize the tryptophan analogs A1 to A12;

S3, taking a biomolecule that forms a cation-pi interaction as the research object, and using genetic code expansion to site-specifically introduce the tryptophan analogue into the biomolecule by means of the chimeric phenylalanyl-tRNA synthetase mutant, obtaining a protein bearing the tryptophan analogue.

Preferably, indole B substituted at different positions is used as a reactant to react to obtain a target product, wherein the chemical structural formula of the indole substituted at different positions is as follows:

the general structural formula of the target product is as follows:

wherein X is selected from: oxygen atom or carbon atom.

Preferably, the method for synthesizing the tryptophan analogs A1 to A12 comprises the following steps:

step one: synthesis of starting material compound B:

starting material B is selected from one of 6-methyl-indole (B1), 6-methoxy-indole (B2), 7-methyl-indole (B3), 7-methoxy-indole (B4), 6, 7-methoxy-indole (B5), 6, 7-methyl-indole (B6), 7, 8-dihydrofuran-indole (B7), 6, 7-dihydrofuran-indole (B8), 7, 8-furan-indole (B9), 6, 7-furan-indole (B10), 6, 7-dioxole-indole (B11) or 6, 7-cyclopentane-indole (B12), the structural formulae of the above indole analogs B1 to B12 are:

(1) synthesis of compounds B6, B7, B8, B9, B10: the aniline (G6, G7, G8, G9 or G10) and triethanolamine are used as reactants, with RuCl₃·nH₂O, SnCl₂·2H₂O and PPh₃ as catalysts, and reacted in anhydrous dioxane to obtain the starting material compound B; (2) synthesis of compounds B11, B12: the aniline (G11 or G12), chloral hydrate and hydroxylamine hydrochloride are used as reactants, with sulfuric acid as the catalyst and water as the solvent, to obtain a crude product; the crude product is then reacted with methanesulfonic acid to obtain an isatin product, which is finally reduced with lithium aluminum hydride to obtain the starting material compound B;

step two: synthesis of compound C: the starting material compound B and iodine are used as reactants, with potassium hydroxide as the base and anhydrous N,N-dimethylformamide as the solvent, to obtain intermediate compound C;

step three: synthesis of compound D: compound C and di-tert-butyl dicarbonate are reacted in anhydrous dichloromethane, with triethylamine as the base and DMAP (4-dimethylaminopyridine) as the catalyst, to obtain intermediate compound D;

step four: synthesis of compound E: compound D and Boc-3-iodo-L-alanine methyl ester are reacted in anhydrous N,N-dimethylformamide under nitrogen protection, with palladium acetate as the catalyst and S-Phos as the ligand, to obtain intermediate compound E;

step five: synthesis of compound F: compound E is reacted in methanol and water as solvents, with potassium hydroxide as the base, to obtain intermediate compound F;

step six: synthesis of compound A: compound F is reacted in anhydrous dichloromethane as the solvent, with trifluoroacetic acid as the catalyst, to obtain the target product (tryptophan analogues A1-A12).

Preferably, in the fourth step, the catalyst palladium acetate is used in an amount of 2% by mole of the substrate (compound D);

the reaction time of the first step is 1-2h, the reaction time of the second step is 2h, the reaction time of the third step is 8h, the reaction time of the fourth step is 5h, the reaction time of the fifth step is 2-3h, and the reaction time of the sixth step is 2 h;

the reaction temperature in the first step is 90 ℃, the reaction temperature in the second step is 0 ℃, the reaction temperature in the third step is 0 ℃, the reaction temperature in the fourth step is 40 ℃, the reaction temperature in the fifth step is 25 ℃, and the reaction temperature in the sixth step is 0 ℃.
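As a hedged worked example of the 2 mol% palladium acetate loading stated for step four, the sketch below converts a substrate amount into a catalyst mass. The 1.0 mmol scale of compound D and the function name are illustrative assumptions and do not come from the specification; the molar mass of palladium(II) acetate (about 224.5 g/mol) is a standard value.

```python
# Illustrative sketch: the reaction scale (1.0 mmol of compound D) is an
# assumed example value, not a quantity given in the specification.
PD_OAC2_MOLAR_MASS = 224.51  # g/mol, palladium(II) acetate


def catalyst_mass_mg(substrate_mmol: float,
                     loading_mol_percent: float = 2.0,
                     molar_mass: float = PD_OAC2_MOLAR_MASS) -> float:
    """Return the catalyst mass in mg for a substrate amount and mol% loading."""
    if substrate_mmol < 0 or loading_mol_percent < 0:
        raise ValueError("amounts must be non-negative")
    catalyst_mmol = substrate_mmol * loading_mol_percent / 100.0
    return catalyst_mmol * molar_mass  # mmol x g/mol = mg


if __name__ == "__main__":
    mass = catalyst_mass_mg(1.0)
    print(f"{mass:.2f} mg Pd(OAc)2 for 1.0 mmol of compound D at 2 mol%")
```

The same function can be reused for other scales by changing `substrate_mmol`.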

Preferably, in S2, (1) chimeric phenylalanyl-tRNA synthetase mutants that specifically recognize the tryptophan analogs are screened by constructing a saturation mutagenesis gene library for the amino acids in the amino acid binding pocket of the chimeric phenylalanyl-tRNA synthetase; (2) the recognition efficiency and specificity of the chimeric phenylalanyl-tRNA synthetase mutants are identified by GFP fluorescence and LC-MS; (3) the chimeric phenylalanyl-tRNA synthetase mutants obtained by screening are applied to expression in bacteria, cells, viruses and other hosts.

When screening for chimeric phenylalanyl-tRNA synthetase mutants that specifically recognize 6-methoxy-tryptophan (A2), 7-methoxy-tryptophan (A4), 6, 7-methoxy-tryptophan (A5), 6, 7-methyl-tryptophan (A6), 7, 8-dihydrofuran-tryptophan (A7), 6, 7-dihydrofuran-tryptophan (A8), 7, 8-furan-tryptophan (A9), 6, 7-furan-tryptophan (A10), 6, 7-dioxole-tryptophan (A11) and 6, 7-cyclopentane-tryptophan (A12), amino acids in the amino acid binding pocket of the chimeric phenylalanyl-tRNA synthetase (E391, V393, F464, M490, T467 and A507) were selected to construct saturation mutagenesis gene libraries (E391NNK, V393NNK, F464NNK, M490NNK, T467G and A507G). Mutants specifically recognizing these tryptophan analogs were screened by positive and negative selection, finally yielding a chimeric phenylalanyl-tRNA synthetase mutant carrying six mutations, E391D, V393G, M490V, F464V, T467G and A507G, whose nucleotide and amino acid sequences are shown in SEQ ID NO: 1-2.
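To give a sense of the size of such an NNK saturation library, the following sketch enumerates the NNK codons (N = A/C/G/T, K = G/T) against the standard genetic code and counts the theoretical diversity. It assumes, for illustration only, that the four NNK positions (E391, V393, F464, M490) are randomized together in one library while T467G and A507G are fixed; the specification does not state how the positions were combined.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons() -> list:
    """All codons matching the degenerate pattern NNK."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def library_summary(n_nnk_sites: int) -> dict:
    """Theoretical diversity of a library with n fully combined NNK sites."""
    codons = nnk_codons()
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]
    return {
        "codons_per_site": len(codons),
        "amino_acids_per_site": len(amino_acids),
        "stop_codons": stop_codons,
        "dna_variants": len(codons) ** n_nnk_sites,
        "protein_variants": len(amino_acids) ** n_nnk_sites,
    }


if __name__ == "__main__":
    # Assumption: four NNK sites combined in a single library.
    for key, value in library_summary(4).items():
        print(f"{key}: {value}")
```

NNK covers all 20 canonical amino acids with 32 codons and retains only the amber stop codon TAG, which is why it is a common choice for saturation mutagenesis.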

Preferably, in S3, (1) the codon of the tryptophan that forms the aromatic cage of the decoding protein is mutated to the amber stop codon (TAG); (2) the decoding protein mutant is co-transformed with the chimeric phenylalanyl-tRNA synthetase mutant, and the corresponding tryptophan analog (typically 1 mM) is added during expression; (3) the decoding protein variant is purified by the GST-tag protein purification method, and its fidelity is verified by LC-MS.
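Step (1) above, replacing a tryptophan codon with the amber codon, can be sketched in silico as follows. The function name and the short toy coding sequence are hypothetical and serve only to illustrate the operation; they are not the sequences of SEQ ID NO: 3 to 4.

```python
def introduce_amber(cds: str, residue_number: int,
                    expected_codon: str = "TGG") -> str:
    """Replace the codon of a 1-based residue with the amber codon TAG.

    The original codon is checked first (TGG = tryptophan by default) so that
    an off-by-one residue number is caught instead of silently mutated.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    start = (residue_number - 1) * 3
    if residue_number < 1 or start + 3 > len(cds):
        raise ValueError("residue number is outside the coding sequence")
    codon = cds[start:start + 3]
    if codon != expected_codon:
        raise ValueError(f"expected {expected_codon} at residue "
                         f"{residue_number}, found {codon}")
    return cds[:start] + "TAG" + cds[start + 3:]


if __name__ == "__main__":
    toy_cds = "ATGGCTTGGAAATGA"  # hypothetical: Met-Ala-Trp-Lys-stop
    print(introduce_amber(toy_cds, 3))
```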

Preferably, the nucleotide sequence and amino acid sequence of the chimeric phenylalanyl-tRNA synthetase mutant recognizing the tryptophan analogues A1-A6 are shown in SEQ ID NO: 1-2, respectively.

Preferably, the method takes histone methylation decoding protein structural domain as a research object, and the decoding protein structural domain is any one of Chromo, PHD, PWWP, Tudor, MBT, CW, SPIN and BAH structural domain.

Application of the protein bearing the tryptophan analogue obtained by the method in establishing a super-affinity decoding protein recognition system that specifically recognizes histone methylation modification. Taking H3K4me3 as an example, a super-affinity variant of the decoding protein KDM5A-PHD3 that recognizes H3K4me3 is established and named PHD; taking H3K9me3 as an example, a super-affinity variant of the decoding protein CDY1-Chromo that recognizes H3K9me3 is established and named super-affinity Chromo; taking H3K27me3 as an example, a super-affinity variant of the decoding protein BAHD1-BAH that recognizes H3K27me3 is established and named super-affinity BAH; taking H3K36me3 as an example, a super-affinity variant of the decoding protein DNMT3B-PWWP that recognizes H3K36me3 is established and named PWWP.

The method establishes super-affinity decoding proteins that specifically recognize histone methylation modifications. Their affinity reaches the nanomolar level and their titer is superior to that of antibodies specific for histone methylation modifications, and they are used for detecting histone methylation modification in biological samples. When labeled with a fluorescent group, a super-affinity decoding protein can be used to detect histone methylation modification in a biological sample by imaging. The super-affinity decoding protein recognition system can be applied to live imaging to dynamically detect changes in histone methylation modification. It can also be used to enrich histone methylation modifications from samples and applied to single-cell sequencing. Tryptophan analogues bearing strongly electron-donating side-chain substituents improve the affinity of the decoding protein for the histone methylation modification by 4-8 times, and tandem-repeat decoding proteins further improve this affinity.

Taking KDM5A PHD3 as an example,

(1) the complex structure (PDB: 2KGI) indicates that the aromatic cage formed by W18 and W28 specifically recognizes H3K4me3, so the W18 and W28 sites of KDM5A PHD3 are each mutated to a stop codon; the nucleotide sequences are shown in SEQ ID NO: 3 to 4, respectively.

(2) Using the chimeric phenylalanyl translation system, any one of the tryptophan analogs 6-cyano-tryptophan, 7-cyano-tryptophan, 6-chloro-tryptophan, 7-chloro-tryptophan, 6-methyl-tryptophan (A1), 6-methoxy-tryptophan (A2), 7-methyl-tryptophan (A3), 7-methoxy-tryptophan (A4), 6, 7-methoxy-tryptophan (A5), 6, 7-methyl-tryptophan (A6), 7, 8-dihydrofuran-tryptophan (A7), 6, 7-dihydrofuran-tryptophan (A8), 7, 8-furan-tryptophan (A9), 6, 7-furan-tryptophan (A10), 6, 7-dioxole-tryptophan (A11) and 6, 7-cyclopentane-tryptophan (A12) is site-specifically introduced at the W18 or W28 site of the PHD3 decoding protein domain to obtain PHD3 decoding protein domain variants.

(3) The affinity of the PHD3 decoding protein domain variants for H3K4me3 is measured by microscale thermophoresis (MST). When the unnatural amino acid A2 is inserted at the W28 site of the PHD3 decoding protein domain, the affinity of PHD3 for H3K4me3 is improved by 8 times; this decoding protein variant is PHD3-W28-A2 and is named PHD. The amino acid sequence of the specific H3K4me3 polypeptide is detailed in Table 2.

(4) In some specific embodiments, site-specific introduction of 6-methoxy-tryptophan (A2) into the decoding protein domains of different histone methylation modifications increases the affinity of the decoding protein domain for the histone methylation modification.

Taking H3K9me3 as an example, the Chromo domain of CDY1 is selected as the research object; when 6-methoxy-tryptophan is inserted at the W28 site of the CDY1 Chromo domain, the affinity between H3K9me3 and the CDY1 Chromo domain is improved by 2 times. Taking H3K27me3 as an example, the BAH domain of BAHD1 is selected as the research object; when 6-methoxy-tryptophan is inserted at the W667 site of the BAHD1 BAH domain, the affinity between H3K27me3 and the BAH domain is improved by 5 times. Taking H3K36me3 as an example, the PWWP domain of DNMT3B is selected as the research object; when 6-methoxy-tryptophan is inserted at the W263 site of the DNMT3B PWWP domain, the affinity between H3K36me3 and the DNMT3B PWWP domain is improved by 7 times. The nucleotide and protein sequences of the CDY1 Chromo domain are shown in SEQ ID NO: 5-6; those of the BAHD1 BAH domain are shown in SEQ ID NO: 7-8; and those of the DNMT3B PWWP domain are shown in SEQ ID NO: 9-10.
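The fold improvements in affinity reported above can be related to a change in binding free energy through the standard thermodynamic relation ΔΔG = -RT ln(fold), where fold is the ratio of the dissociation constant before and after substitution. The sketch below applies that relation; the temperature of 298.15 K is an assumption, since the measurement temperature is not stated here, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold_improvement: float,
                  temperature_k: float = 298.15) -> float:
    """Change in binding free energy (kcal/mol) for a fold change in Kd.

    fold_improvement = Kd(reference) / Kd(variant); a negative result means
    tighter binding. The default temperature is an assumed value.
    """
    if fold_improvement <= 0:
        raise ValueError("fold improvement must be positive")
    return -R_KCAL * temperature_k * math.log(fold_improvement)


if __name__ == "__main__":
    # Fold improvements quoted in the description for A2-substituted domains.
    for label, fold in [("CDY1 Chromo", 2), ("BAHD1 BAH", 5),
                        ("DNMT3B PWWP", 7), ("KDM5A PHD3", 8)]:
        print(f"{label}: {fold}-fold -> {ddg_from_fold(fold):.2f} kcal/mol")
```

Under this assumption, an 8-fold gain corresponds to roughly 1.2 kcal/mol of additional binding free energy, which is the scale expected for tuning a single non-covalent contact.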

The following illustrates the application of several aspects of the present invention.

The invention provides a method for establishing tandem repeat histone methylation decoding protein to improve the affinity of histone methylation modification and decoding protein.

Taking PHD3 as an example,

(1) decoding protein variants with multiple repeats are constructed: using SEQ ID NO: 4 as a template, double and triple tandem decoding protein mutants are constructed, carrying 2 and 3 amber stop codons (TAG), respectively; the nucleotide sequences of the double and triple mutants are shown in SEQ ID NO: 11 to 12.

(2) By genetic code expansion, 2 or 3 molecules of 6-methoxy-tryptophan (A2) are site-specifically introduced at the amber stop codon sites of the double or triple PHD protein, giving double and triple PHD protein variants designated 2xPHD and 3xPHD, respectively.

The affinity of the double and triple PHD protein variants for H3K4me3 was determined by microscale thermophoresis (MST). The double and triple PHD variants showed 14.7-fold and 62.9-fold improved affinity for H3K4me3, respectively.
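As a rough consistency check on these numbers, the sketch below back-calculates dissociation constants from the fold improvements. It rests on two assumptions that the text does not confirm: that the 7 nM affinity stated elsewhere in this description belongs to the triple variant, and that both fold values are expressed relative to the same reference protein. The results are therefore illustrative estimates, not measured values.

```python
def implied_reference_kd(variant_kd_nm: float, fold_improvement: float) -> float:
    """Kd of the reference protein implied by a variant Kd and its fold gain."""
    return variant_kd_nm * fold_improvement


def implied_variant_kd(reference_kd_nm: float, fold_improvement: float) -> float:
    """Kd of a variant implied by the reference Kd and the fold gain."""
    return reference_kd_nm / fold_improvement


if __name__ == "__main__":
    # Assumption: 7 nM is the triple variant, 62.9-fold better than the reference.
    reference = implied_reference_kd(7.0, 62.9)
    double = implied_variant_kd(reference, 14.7)
    print(f"implied reference Kd: {reference:.1f} nM")
    print(f"implied double-variant Kd: {double:.1f} nM")
```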

In some specific embodiments, the above strategy is equally applicable to decoding protein domains of other histone methylation modifications.

The invention provides a method for detecting histone methylation modification with the histone methylation super-affinity molecular recognition system.

(1) The PHD protein variants substituted with 6-methoxy-tryptophan (A2) are expressed and purified to obtain PHD protein, 2xPHD protein and 3xPHD protein.

(2) Taking HeLa cell lysate as an example, HeLa cells were lysed and the lysate was serially diluted to different concentrations; the protein samples were separated by SDS-PAGE and transferred to a PVDF membrane.

(3) After blocking with milk, the PVDF membranes were incubated overnight with H3K4me3-specific antibody, PHD protein, 2xPHD protein or 3xPHD protein, respectively.

(4) PVDF membranes incubated with the H3K4me3-specific antibody were then incubated with the corresponding secondary antibody. PVDF membranes incubated with PHD protein were further incubated with a GST-specific antibody and finally with the corresponding secondary antibody.

(5) Chemiluminescence imaging was performed. Compared with the H3K4me3-specific antibody, the 2xPHD and 3xPHD proteins showed higher detection ability.

The invention provides a method for detecting histone methylation modification by imaging with the histone methylation super-affinity molecular recognition system. The method can be applied to live-cell imaging and also to in vitro immunofluorescence imaging. It can be applied to different cells, such as the HEK 293T cell line, the HeLa cell line, NCI-60 cell lines and the CHO cell line.

The application of the method to living cell imaging comprises the following steps:

(1) Construction of plasmids for expression in cells: a PHD-W28TAG fragment is cloned using SEQ ID NO: 4 as the template, a vector fragment is cloned using pEGFP-EGFP as the template, and the plasmid pEGFP-PHD-W28TAG-EGFP is constructed by Gibson assembly; a phenylalanyl-tRNA synthetase (chPheRS) fragment is cloned using SEQ ID NO: 1 as the template, a vector fragment is cloned using pCDNA3.1 as the template, and the plasmid pCDNA3.1-chPheRS is constructed by Gibson assembly. The plasmid maps are shown in FIG. 1.

(2) HEK 293T cells were transfected with the two plasmids at a molar ratio of 1:1. After 6-8 hours the medium was changed and 2 mM 6-methoxy-tryptophan (A2) was added.

(3) Changes in the EGFP fluorescence signal were detected with a live-cell imaging microscope.
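For the 1:1 molar-ratio co-transfection in step (2), the mass of each plasmid scales with its length, because equal numbers of double-stranded DNA molecules weigh in proportion to their base-pair counts. The sketch below splits a total DNA mass accordingly. The plasmid sizes (6000 bp and 7000 bp) and the 2000 ng total are hypothetical placeholders; the actual sizes are not given in this description.

```python
def equimolar_masses_ng(total_ng: float, plasmid_sizes_bp: dict) -> dict:
    """Split a total DNA mass so that every plasmid is present in equal moles.

    For double-stranded DNA, mass per molecule is proportional to length in
    base pairs, so equal moles means mass proportional to size.
    """
    if total_ng <= 0 or any(bp <= 0 for bp in plasmid_sizes_bp.values()):
        raise ValueError("total mass and plasmid sizes must be positive")
    total_bp = sum(plasmid_sizes_bp.values())
    return {name: total_ng * bp / total_bp
            for name, bp in plasmid_sizes_bp.items()}


if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    sizes = {"pEGFP-PHD-W28TAG-EGFP": 6000, "pCDNA3.1-chPheRS": 7000}
    for name, mass in equimolar_masses_ng(2000.0, sizes).items():
        print(f"{name}: {mass:.0f} ng")
```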

The application of the method to in vitro immunofluorescence imaging comprises the following steps:

(1) The PHD protein variants substituted with 6-methoxy-tryptophan are expressed and purified to obtain PHD protein, 2xPHD protein and 3xPHD protein.

(2) Protein labeling: the PHD protein, 2xPHD protein and 3xPHD protein were each labeled with Cy5 NHS ester.

(3) Taking the HeLa cell line as an example, formaldehyde-fixed cells were incubated with the Cy5-labeled PHD protein or with the antibody specific for the histone methylation modification, respectively, and after incubation the localization of the corresponding histone methylation modification was detected by confocal microscopy. Compared with the H3K4me3-specific antibody, the 2xPHD and 3xPHD protein variants showed a higher signal-to-noise ratio.

The invention provides a method for detecting the interactome of histone methylation modification by proximity labeling with the histone methylation super-affinity molecular recognition system.

Proximity labeling complements traditional methods for studying intermolecular interactions in living cells. The technique generally uses CRISPR gene editing or plasmid-based expression to express a proximity biotinylation enzyme fused to a bait protein in cells. After exogenous biotin is added, proteins adjacent to the bait protein are biotinylated; the biotinylated proteins can be enriched with streptavidin-coupled magnetic beads and then identified by mass spectrometry. The affinity of the H3K4me3 super-affinity molecular recognition system provided by the invention for H3K4me3 reaches 7 nM. PHD-W28TAG can be expressed as a fusion with the ascorbate peroxidase APEX/APEX2; when the PHD variant binds specifically to H3K4me3, APEX2 labels the proteome adjacent to H3K4me3 with biotin upon stimulation with hydrogen peroxide, the labeled proteome is enriched with streptavidin, and it is finally analyzed by LC-MS. The method is not limited to APEX2-based proximity labeling and is also applicable to other proximity biotinylation enzymes, such as horseradish peroxidase (HRP) and the biotin ligases BioID, BASU, TurboID and miniTurbo.

Compared with the prior art, the invention has the main advantages that:

1. The present invention provides a method for enhancing cation-pi interactions that can be applied to any biomacromolecule that forms cation-pi interactions. The invention provides synthetic routes for 6-methyl-tryptophan (A1), 6-methoxy-tryptophan (A2), 7-methyl-tryptophan (A3), 7-methoxy-tryptophan (A4), 6, 7-methoxy-tryptophan (A5), 6, 7-methyl-tryptophan (A6), 7, 8-dihydrofuran-tryptophan (A7), 6, 7-dihydrofuran-tryptophan (A8), 7, 8-furan-tryptophan (A9), 6, 7-furan-tryptophan (A10), 6, 7-dioxole-tryptophan (A11) and 6, 7-cyclopentane-tryptophan (A12), all of which are compatible with genetic code expansion and can improve the cation-pi interaction.

2. The present invention has a wide range of applications, including but not limited to: combination with live-cell imaging to detect dynamic changes in histone methylation modification; combination with immunofluorescence to detect histone methylation modification in biological samples; combination with genome sequencing and related techniques to analyze the genomic regions associated with histone methylation modification; and combination with proximity labeling and related techniques to identify the interacting proteome of histone methylation modification. Specifically: (1) introducing 6-methoxy-tryptophan (A2) at the tryptophan site that forms the cation-pi interaction by genetic code expansion improves the affinity between the biomolecules by 4-8 times. (2) Site-specific introduction of 6-methoxy-tryptophan (A2) into PHD, the decoding protein of H3K4me3, improves the affinity between H3K4me3 and PHD by 8 times, and after the triple tandem design the affinity of the PHD variant for H3K4me3 reaches 7 nM. (3) The histone methylation super-affinity molecular recognition system recognizes histone methylation modification with high sensitivity, showing higher specificity and sensitivity than antibodies specific for histone methylation modification. (4) The system is easy to engineer, economical and able to capture multiple PTMs, and multiple super-affinity molecular recognition systems can be developed for specific methylation modifications.

Drawings

FIG. 1 is a plasmid map;

FIG. 2 shows the chemical synthesis routes: (A) the synthesis route of 6-methoxy-tryptophan (A2) and 7-methoxy-tryptophan (A4); (B) the synthesis route of 6,7-cyclopentane-tryptophan (A12);

FIG. 3 shows the strategy of the invention for modulating cation-π interactions between histone methylation and its decoding proteins using genetic code expansion. (A) The number of tryptophan residues among the aromatic cage components of the decoding proteins of different histone methylation modifications. (B) Flow diagram of the histone methylation super-affinity molecular recognition system developed with genetic code expansion. Taking H3K4me3 as an example, tryptophan in the aromatic cage of the decoding protein is replaced by tryptophan analogues through genetic code expansion, so that the cation-π interaction in the aromatic cage is modulated and unnatural amino acid analogues that markedly improve the affinity of the decoding protein for histone methylation are obtained. (C) Structural formulas of the unnatural amino acids used in the invention;

FIG. 4 shows the identification of the efficiency and specificity of the chimeric phenylalanyl-tRNA synthetases in recognizing A2 and A4, wherein GFP fluorescence reporter assays identify the efficiency with which the chimeric phenylalanyl-tRNA synthetases A2RS (A) and A4RS (B) recognize A2 and A4, respectively, and mass spectrometry identifies the fidelity with which A2RS (C) and A4RS (D) recognize A2 and A4, respectively;

FIG. 5 shows the PHD domain variant proteins designed to increase affinity for H3K4me3. (A) Complex structure (PDB: 2KGI) of the KDM5A PHD3 protein and the H3K4me3 peptide, in which PHD3 and the peptide are shown in cartoon representation and the aromatic amino acids and the H3K4me3 modification are shown as sticks. (B) Coomassie Brilliant Blue staining of the PHD-W18-UAA and PHD-W28-UAA variants. (C) Affinity of the PHD-W18-UAA variants for H3K4me3 determined by microscale thermophoresis (MST), in which H3K4me3 is labeled with the FITC fluorophore. (D) Affinity of the PHD-W28-UAA variants for H3K4me3 determined by microscale thermophoresis (MST), in which H3K4me3 is labeled with the FITC fluorophore;

FIG. 6 shows the multivalent tandem repeat PHD domains designed to recognize H3K4me3. (A) Cartoon diagram of the multivalent tandem repeat PHD domain design. (B) Coomassie Brilliant Blue staining identifying the purity of the tandem PHD protein variants. (C) Affinity of the tandem PHD protein variants for H3K4me3 determined by microscale thermophoresis (MST), in which H3K4me3 is labeled with the FITC fluorophore;

FIG. 7 shows the detection and imaging of H3K4me3 using the histone methylation super-affinity molecular recognition system. (A) Strategy diagram of the histone methylation super-affinity molecular recognition system applied to detection and imaging. (B) Detection of H3K4me3 levels in HeLa cells using the histone methylation super-affinity molecules. An H3-specific antibody and an H3K4me3-specific antibody are used as controls, and PHD-WT, 2xPHD and 3xPHD are each used to detect H3K4me3. (C) Fluorescence imaging of the localization of H3K4me3 in cells using the histone methylation super-affinity molecular recognition system, in which the PHD protein is labeled with Cy5;

FIG. 8 shows flow cytometry detection, in mammalian cells, of the efficiency with which the screened chimeric phenylalanyl-tRNA synthetase recognizes 6-methoxy-tryptophan, 6,7-methoxy-tryptophan and 6,7-methyl-tryptophan;

FIG. 9 shows the application of the histone methylation super-affinity molecular recognition system established by genetic code expansion to proximity-labeling detection of the protein interactome: (A) experimental flow chart; (B) GO analysis data.

Detailed Description

The technical solution of the present invention will be further specifically described below by way of specific examples. It is to be understood that the practice of the present invention is not limited to the following examples, and that any variations and/or modifications may be made to the present invention without departing from the scope thereof.

In the present invention, all parts and percentages are by weight unless otherwise specified, and the equipment and materials used are commercially available or commonly used in the art. The methods in the following examples are conventional in the art unless otherwise specified.

The sequences of the primers used in the construction of the vectors of the invention in the specific examples are shown in Table 1:

Table 1: Primer sequences for constructing vectors

The inventive strategy for modulating cation-pi interactions between histone methylation and its decoded proteins using genetic code expansion techniques is shown in FIG. 3, and the following example illustrates a specific method.

Example 1: Chemical synthesis of compounds A2 and A4

To a solution of B (2.0 g, 13.6 mmol) in 50 mL of anhydrous N,N-dimethylformamide was added potassium hydroxide (1.68 g, 29.9 mmol), and the mixture was stirred at room temperature for 20 min. A solution of iodine (4.14 g, 16.3 mmol) in 30 mL of anhydrous N,N-dimethylformamide was added dropwise to the reaction flask, and stirring was continued at room temperature for 2 h. The reaction mixture was poured into ice water containing 0.1% sodium thiosulfate and placed in a refrigerator to ensure complete precipitation. The precipitate was filtered, washed with cold water and dried in vacuo. The 3-iodo-1H-indole B (B2, 90% yield; B4, 93% yield) was obtained as a light yellow solid and used in the next step without further purification.
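As a quick consistency check on the quantities in the iodination step, the molar amounts and equivalents can be recomputed from the stated masses. The sketch below is illustrative only: the molecular weights of KOH and I₂ are standard textbook values rather than values given in the text, and the helper name `mmol_from_mass` is hypothetical.

```python
# Standard molecular weights in g/mol (textbook values, not stated in the text).
MW_KOH = 56.11
MW_I2 = 253.81


def mmol_from_mass(mass_g: float, mw_g_per_mol: float) -> float:
    """Convert a mass in grams to an amount in millimoles."""
    return mass_g / mw_g_per_mol * 1000.0


# Amount of starting material as stated in the text.
substrate_mmol = 13.6

# Reagent masses as stated in the text.
koh_mmol = mmol_from_mass(1.68, MW_KOH)
iodine_mmol = mmol_from_mass(4.14, MW_I2)

# Equivalents relative to the starting material.
koh_equiv = koh_mmol / substrate_mmol
iodine_equiv = iodine_mmol / substrate_mmol

if __name__ == "__main__":
    print(f"KOH: {koh_mmol:.1f} mmol ({koh_equiv:.1f} equiv)")
    print(f"I2:  {iodine_mmol:.1f} mmol ({iodine_equiv:.1f} equiv)")
```

Under these assumptions the recomputed amounts agree with the 29.9 mmol and 16.3 mmol given above, corresponding to roughly 2.2 and 1.2 equivalents.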

The solid B obtained in the first step (2.73 g, 10.0 mmol) was dissolved in 30 mL of anhydrous N,N-dimethylformamide. 60% NaH (391.2 mg, 16.3 mmol) was washed with hexane and suspended in 10 mL of anhydrous N,N-dimethylformamide under nitrogen. The solution of B was slowly added to the suspension in an ice bath; after stirring for 10 min, p-toluenesulfonyl chloride (2.1 g, 11.0 mmol) was added and the mixture was stirred at 25 °C for 5 h. The mixture was poured into water and extracted three times with ethyl acetate, and the ethyl acetate layer was washed with saturated brine and dried over anhydrous sodium sulfate. The organic phase was then concentrated under reduced pressure. Column chromatography with petroleum ether and ethyl acetate gave compound C as a white solid (C2, 85% yield; C4, 81% yield).

Dry, degassed N,N-dimethylformamide was charged under nitrogen into a vessel containing zinc dust (3.9 g, 50.0 mmol). TMSCl (108.6 mg, 1.0 mmol) was added and the mixture was stirred vigorously at room temperature for 30 min; stirring was then stopped and the zinc was allowed to settle. The supernatant was withdrawn with a syringe under a nitrogen flow, and fresh N,N-dimethylformamide was added to the zinc. After stirring for a further 2 min, stirring was stopped to let the zinc powder settle and the supernatant was removed as before; this step was repeated twice more. 1,2-Dibromoethane (751.4 mg, 4.0 mmol) was then added to the vessel and the mixture was stirred at 80 °C for 30 min. After the mixture had cooled to 25 °C, TMSCl (325.8 mg, 3.0 mmol) was added and the resulting mixture was stirred for a further 30 min. Boc-3-iodo-L-alanine methyl ester (3.95 g, 12 mmol) was dissolved in 10 mL of N,N-dimethylformamide and added to the activated zinc powder, and the mixture was stirred vigorously. After the exotherm had subsided (controlled with an ice bath), stirring was continued for a further 30 min, then stopped, and the zinc was allowed to settle. The supernatant was carefully withdrawn with a syringe under a flow of nitrogen and transferred to a clean reaction flask containing compound D (2.13 g, 5.0 mmol), Pd(OAc)₂ (112.2 mg, 0.5 mmol) and S-Phos (410.5 mg, 1.0 mmol). The reaction was run for 4 h under nitrogen. After completion of the reaction, the mixture was poured into water and extracted with ethyl acetate, and the upper organic layer was washed with brine, dried over anhydrous sodium sulfate and concentrated under reduced pressure. Purification by column chromatography with petroleum ether and ethyl acetate gave compound E (E2, 57% yield; E4, 45% yield) as a pale yellow oil.

The product E was analyzed and the results were as follows:

E2) ¹H NMR (500 MHz, CDCl₃) δ 7.70 (d, J = 8.5 Hz, 2H), 7.48 (d, J = 2.3 Hz, 1H), 7.30 (d, J = 8.7 Hz, 1H), 7.22 (d, J = 8.1 Hz, 3H), 6.84 (dd, J = 8.7, 2.3 Hz, 1H), 5.05 (d, J = 8.0 Hz, 1H), 4.60 (d, J = 7.1 Hz, 1H), 3.86 (s, 3H), 3.62 (s, 3H), 3.14 (qd, J = 14.7, 5.6 Hz, 2H), 2.34 (s, 3H), 1.49–1.26 (m, 9H). ¹³C NMR (125 MHz, CDCl₃) δ 172.15, 158.25, 155.13, 145.01, 136.29, 135.28, 129.98, 126.82, 123.22, 120.13, 117.44, 112.55, 98.09, 80.26, 55.91, 53.69, 52.47, 28.46, 21.70. HRMS (ESI) m/z calcd for C₂₀H₂₃N₂O₅S⁺ (M-Boc)⁺ 403.1322, found 403.1331.

E4) ¹H NMR (500 MHz, CDCl₃) δ 7.69 (d, J = 8.1 Hz, 2H), 7.62 (s, 1H), 7.24 (d, J = 8.1 Hz, 2H), 7.16–7.06 (m, 2H), 6.67 (dd, J = 7.3, 1.5 Hz, 1H), 5.15 (d, J = 8.0 Hz, 1H), 4.65 (dt, J = 8.0, 5.5 Hz, 1H), 3.71 (s, 3H), 3.67 (s, 3H), 3.32–3.11 (m, 2H), 2.37 (s, 3H), 1.44 (s, 9H). ¹³C NMR (125 MHz, CDCl₃) δ 172.28, 155.16, 147.49, 144.21, 137.36, 133.77, 129.43, 127.30, 126.91, 124.79, 124.04, 114.95, 112.05, 107.16, 80.12, 55.54, 53.85, 52.49, 28.41, 28.01, 26.99, 21.67. HRMS (ESI) m/z calcd for C₂₀H₂₃N₂O₅S⁺ (M-Boc)⁺ 403.1322, found 403.1334.

Compound E (973.2 mg, 2.0 mmol) was dissolved in 50 mL of methanol, and NaOH (1.2 g, 30.0 mmol) dissolved in 20 mL of H₂O was added. The mixture was heated at reflux for 8 h, and methanol was then evaporated under reduced pressure to about half the reaction volume. The mixture was acidified to pH 3 with ice-cold 2 M hydrochloric acid. The aqueous solution was extracted with cold ethyl acetate, and the upper organic layer was washed with saturated brine, dried over anhydrous sodium sulfate and evaporated in vacuo to give carbamate F as a colorless oil, which was used without further purification. F was then dissolved in dichloromethane and deprotected by adding trifluoroacetic acid (112.2 mg, 0.5 mmol) to afford the title compound A (A2, 74% yield; A4, 68% yield) as a pale yellow solid. The complete synthetic route is shown in FIG. 2.

The product A was analyzed and the results were as follows:

A2) ¹H NMR (500 MHz, D₂O) δ 7.49 (d, J = 8.7 Hz, 1H), 7.01 (s, 1H), 6.95 (d, J = 2.4 Hz, 1H), 6.72 (dd, J = 8.7, 2.4 Hz, 1H), 3.75 (s, 3H), 3.45 (dd, J = 7.3, 5.2 Hz, 1H), 3.03 (dd, J = 14.4, 5.2 Hz, 1H), 2.86 (dd, J = 14.4, 7.3 Hz, 1H). ¹³C NMR (125 MHz, D₂O) δ 182.83, 155.19, 136.74, 123.18, 122.03, 119.51, 110.63, 108.80, 95.26, 56.42, 55.78, 30.43. HRMS (ESI) m/z calcd for C₁₂H₁₅N₂O₃⁺ (M+H)⁺ 235.1077, found 235.1081.

A4) ¹H NMR (500 MHz, D₂O) δ 7.27–7.22 (m, 2H), 7.08 (td, J = 7.9, 0.9 Hz, 1H), 6.78 (d, J = 7.7 Hz, 1H), 4.31 (ddd, J = 6.3, 5.4, 0.9 Hz, 1H), 3.94 (s, 3H), 3.44 (ddt, J = 15.4, 5.3, 0.9 Hz, 1H), 3.36 (dd, J = 15.4, 7.3 Hz, 1H). ¹³C NMR (125 MHz, D₂O) δ 171.83, 146.06, 128.04, 126.58, 124.97, 120.10, 117.44, 115.12, 106.92, 55.63, 53.27, 25.83. HRMS (ESI) m/z calcd for C₁₂H₁₃N₂O₃⁻ (M-H)⁻ 233.0932, found 233.0939.
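The calculated HRMS values can be reproduced from monoisotopic masses. The following is a minimal sketch, assuming the elemental composition of methoxy-tryptophan (C₁₂H₁₄N₂O₃) and standard isotope masses; the function names `ion_mz` and `ppm_error` are hypothetical helpers, not part of the original disclosure.

```python
# Monoisotopic masses (u) of the most abundant isotopes; standard values.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}
ELECTRON = 0.00054858  # electron mass in u


def ion_mz(formula: dict, charge: int) -> float:
    """m/z of an ion whose full elemental composition (protons already
    added or removed) is given by `formula`; `charge` is the signed charge."""
    atoms = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    # A cation has lost electrons; an anion has gained them.
    return (atoms - charge * ELECTRON) / abs(charge)


def ppm_error(found: float, calculated: float) -> float:
    """Relative deviation of a measured m/z from the calculated one, in ppm."""
    return (found - calculated) / calculated * 1e6


# Protonated and deprotonated methoxy-tryptophan (neutral: C12H14N2O3).
mz_pos = ion_mz({"C": 12, "H": 15, "N": 2, "O": 3}, charge=+1)
mz_neg = ion_mz({"C": 12, "H": 13, "N": 2, "O": 3}, charge=-1)
# Detosylation-independent check on intermediate E: the (M-Boc)+ ion.
mz_e = ion_mz({"C": 20, "H": 23, "N": 2, "O": 5, "S": 1}, charge=+1)

if __name__ == "__main__":
    print(f"[M+H]+ {mz_pos:.4f}, [M-H]- {mz_neg:.4f}, E (M-Boc)+ {mz_e:.4f}")
    print(f"A2 deviation: {ppm_error(235.1081, mz_pos):.1f} ppm")
```

With these inputs the computed values round to the calculated m/z values reported for E, A2 and A4, and the measured A2 value deviates from the calculated one by less than 2 ppm.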

Example 2: Library construction and positive-negative screening of chimeric phenylalanyl-tRNA synthetase mutants

The gene sequence of the chimeric phenylalanyl-tRNA synthetase chPheRS in this example is shown in SEQ ID NO: 1.

(1) Using the structure of human mitochondrial phenylalanyl-tRNA synthetase as a reference, the amino acid binding sites of the chimeric phenylalanyl-tRNA synthetase (F464, T467 and A507) and the amino acids surrounding the binding pocket (E391, V393 and M490) were selected.

(2) Using the chimeric phenylalanyl-tRNA synthetase (T467G and A507G) as a template, gene fragments were amplified with the primers chPheRS-E391NNK-V393NNK-R/F, chPheRS-M490NNK-R/F and chPheRS-F464NNK-R/F, whose nucleotide sequences are shown in SEQ ID NO: 19-24. The mutant library was cloned into the pBK vector by Gibson assembly to generate the chPheRS mutant gene library (E391NNK, V393NNK, M490NNK, F464NNK, T467G and A507G).

(3) pNEG-chPheT-Barnase-2TAG was transformed into Escherichia coli DH10B to prepare negative screening competent cells; the plasmid map is shown in FIG. 1. pNEG-3C11-CAT-112TAG-GFP190TAG was transformed into E. coli DH10B to prepare positive screening competent cells; the plasmid map is shown in FIG. 1.

(4) The library from (2) was transformed into the negative screening competent cells, and the culture was spread on LB plates (kanamycin, 50 μg/mL; ampicillin, 100 μg/mL; 0.2% L-arabinose) and incubated at 37 °C.

(5) Plasmids were extracted from the clones in (4) and transformed into the positive screening competent cells. The whole culture was spread on LB agar plates supplemented with the unnatural amino acid (kanamycin, 50 μg/mL; ampicillin, 100 μg/mL; chloramphenicol, 10 μg/mL; 0.2% L-arabinose; 2 mM unnatural amino acid), cultured at 37 °C for 12 h, and then cultured at 30 °C for a further 48 h.
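The theoretical diversity of the NNK library in step (2) follows from the degenerate codon itself. The sketch below enumerates NNK codons (N = any base, K = G or T) against the standard genetic code and computes the library size for the four randomized positions (E391, V393, M490 and F464); it is an illustrative calculation, not a description of how the library was actually assessed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons() -> list:
    """All codons matching the degenerate pattern NNK (K = G or T)."""
    return ["".join(codon) for codon in product(BASES, BASES, "GT")]


codons = nnk_codons()
amino_acids = sorted({CODON_TABLE[c] for c in codons} - {"*"})
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]

N_SITES = 4  # E391, V393, M490 and F464 are randomized in the library
dna_diversity = len(codons) ** N_SITES
protein_diversity = len(amino_acids) ** N_SITES

if __name__ == "__main__":
    print(f"{len(codons)} codons, {len(amino_acids)} amino acids, stops: {stop_codons}")
    print(f"DNA variants: {dna_diversity}, protein variants: {protein_diversity}")
```

NNK covers all 20 amino acids with 32 codons and retains only the amber stop codon (TAG), so four randomized sites give about 1.05 × 10⁶ DNA variants encoding 1.6 × 10⁵ protein variants.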

Example 3: screening of chimeric phenylalanine aminoacyl-tRNA synthetase mutant for specifically recognizing unnatural amino acid through GFP (green fluorescent protein) fluorescence report experiment

(1) After two rounds of forward screening, the single clones with fluorescent signals from example 2 were picked for overnight culture.

(2) According to the following steps: 100 percent of the strain solution in the step (1) is inoculated, when the strain solution is cultured at 37 ℃ until OD600 is 0.6-0.8, 0.2 percent of L-arabinose is added for induction expression, and 1mL of the strain solution is added with 1mM of corresponding unnatural amino acid and expression is carried out for 20h at 30 ℃.

(3) After 750. mu.L of the bacterial suspension in (2) was centrifuged, 150. mu.L of 1 XBugbuster (Millipore, Lot: 3492682) was added and the mixture was incubated at 25 ℃ for 30min, followed by centrifugation, 100. mu.L of the supernatant was transferred to a 96-well plate, and 100. mu.L of the bacterial suspension in (2) was simultaneously subjected to measurement of the GFP fluorescence signal intensity and OD of the corresponding clone by means of a microplate reader Bio Tek Synergy NEO2 ₆₀₀ And calculating the efficiency of the mutant for recognizing the unnatural amino acid.

(4) The chimera phenylalanine aminoacyl-tRNA synthetase mutant which can efficiently identify corresponding unnatural amino acid is sequenced to obtain a specific mutant sequence, and the corresponding cloned plasmid is placed at the temperature of minus 20 ℃ for standby.

(6) Finally, the mutant of the chimeric phenylalanyl-tRNA synthetase which recognizes 6-methoxy-tryptophan, 7-methoxy-tryptophan, 6, 7-methyl-tryptophan and 6, 7-methoxy-tryptophan was identified, and the mutant of the phenylalanyl-tRNA synthetase which comprises six mutations of E391D, V393G, M490V, F464V, T467G and A507G is named chPheRS9, and the nucleotide sequence and the amino acid sequence of the mutant of the phenylalanyl-tRNA synthetase are detailed in SEQ ID NO: 1-2.

(7) The efficiency of the chimera phenylalanine translation system for recognizing the unnatural amino acid under different concentrations of the unnatural amino acid is determined by a GFP fluorescence report experiment. The efficiency and fidelity of recognition of 6-methoxy-tryptophan and 7-methoxy-tryptophan by the chimeric phenylalanyl-tRNA synthetase is shown in FIG. 4.
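The text does not spell out how recognition efficiency is calculated from the plate-reader data. One common convention, shown here purely as an assumption, is to normalize GFP fluorescence by OD₆₀₀ and compare cultures grown with and without the unnatural amino acid. All readings and function names below are hypothetical.

```python
def normalized_fluorescence(gfp: float, od600: float, blank_gfp: float = 0.0) -> float:
    """Blank-corrected GFP signal per unit of cell density."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (gfp - blank_gfp) / od600


def fold_over_background(with_uaa: float, without_uaa: float) -> float:
    """Ratio of normalized signal with vs. without the unnatural amino acid."""
    return with_uaa / without_uaa


# Hypothetical plate-reader readings, for illustration only.
plus_uaa = normalized_fluorescence(gfp=12000.0, od600=1.5)
minus_uaa = normalized_fluorescence(gfp=560.0, od600=1.4)
selectivity = fold_over_background(plus_uaa, minus_uaa)

if __name__ == "__main__":
    print(f"+UAA: {plus_uaa:.0f}, -UAA: {minus_uaa:.0f}, fold: {selectivity:.1f}")
```

A high ratio indicates that amber suppression depends on the unnatural amino acid, which is the behavior expected of a synthetase mutant that does not charge natural amino acids.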

Example 4: Construction of the plasmid series for the KDM5A PHD3 (PHD) domain

Unless otherwise indicated, all plasmids were constructed by Gibson assembly. The construction of the plasmid series for the KDM5A PHD3 (PHD) domain is given as an example.

1. PHD wild-type plasmid: the GST tag was amplified from the pGEX-6p vector with the primers pNEG-GST-F/R (nucleotide sequences shown in SEQ ID NO: 25-26); the PHD domain (UniProt ID: P29375, nucleotide 1598-1663) was amplified from cDNA (primer nucleotide sequences shown in SEQ ID NO: 27-28); and the vector was amplified from the pNEG-2×chPheT vector with the primers pNEG-PHD-V-F/R (nucleotide sequences shown in SEQ ID NO: 29-30). The plasmid map is shown in FIG. 1.

2. PHD mutant plasmids: using pNEG-2×chPheT-PHD-GST as the template, an amber codon was introduced by site-directed mutagenesis at W28 of the PHD domain with the primers pNEG-PHD-W28TAG-F/R, and the plasmid pNEG-2×chPheT-PHD-W28TAG-GST was constructed by Gibson assembly. Likewise, an amber codon was introduced at W18 of the PHD domain with the primers pNEG-PHD-W18TAG-F/R, and the plasmid pNEG-2×chPheT-PHD-W18TAG-GST was constructed by Gibson assembly. The nucleotide sequences of the primers are shown in SEQ ID NO: 31-34.

3. Multivalent tandem repeat PHD domain plasmids: using pNEG-2×chPheT-PHD-W28TAG-GST as the template, a PHD-W28TAG fragment containing the 6x-linker (GGSGGS) was amplified with the primers pNEG-2PHD-F/R (nucleotide sequences shown in SEQ ID NO: 35-36); the vector was amplified with the primers pNEG-2PHD-V-R and pNEG-PHD-V-F (nucleotide sequences shown in SEQ ID NO: 37 and SEQ ID NO: 29). The duplex and triplex PHD plasmids pNEG-2×chPheT-2xPHD-W28TAG-GST and pNEG-2×chPheT-3xPHD-W28TAG-GST were constructed by Gibson assembly.

4. Multicomponent tandem repeat PHD-Chromo domain plasmid: the vector was amplified from pNEG-2×chPheT-PHD-W28-GST with the primers pNEG-PHD-V-F and pNEG-2PHD-V-R; the CDY1-W28TAG fragment was amplified from pNEG-2×chPheT-CDY1-W28TAG-GST with the primers pNEG-PHD-CDY1-F/R (nucleotide sequences shown in SEQ ID NO: 38-39). The multicomponent tandem plasmid pNEG-2×chPheT-PHD-W28TAG-CDY1-W28TAG-GST was constructed by Gibson assembly. The plasmid map is shown in FIG. 1.

5. Eukaryotic cell expression plasmids. Construction of pEGFP-PHD3-W28TAG-EGFP: the vector was amplified from pEGFP-EGFP with the primers pEGFP-PHD-V-F/R (nucleotide sequences shown in SEQ ID NO: 40-41); the PHD domain was amplified from pNEG-2×chPheT-PHD-W28TAG-GST with the primers pEGFP-PHD-F/R (nucleotide sequences shown in SEQ ID NO: 42-43); the plasmid was constructed by Gibson assembly, and its map is shown in FIG. 1. Construction of pCDNA3.1-chPheRS9: primers were designed to amplify the chimeric phenylalanyl-tRNA synthetase (chPheRS9), which was cloned into the pCDNA3.1 vector under the control of the CMV and U6 promoters, respectively; the primers for the cloned gene and the vector are shown in SEQ ID NO: 44-47, and the plasmid map is shown in FIG. 1.

All plasmids were sequenced by Beijing Okagaku Biotech. The remaining plasmids were constructed in the same way as above.

Example 5: expression and purification of wild type and mutant proteins of KDM5A PHD3(PHD)

First, expression of KDM5A PHD3 (PHD) wild-type protein

1. Plasmid transformation: chemically competent DH10B cells were taken from the -80 °C freezer and immediately placed on ice. After the cells had thawed, the plasmid pNEG-2×chPheT-PHD-GST was added and mixed by gently flicking the tube. The cells were kept on ice for 30 min, heat-shocked at 42 °C for 90 s, and returned to ice for 2 min. Antibiotic-free LB liquid medium was added and the cells were recovered at 37 °C for 40 min. 200 μL of the culture was spread on an LB agar plate (ampicillin, 100 μg/mL) and cultured at 37 °C overnight.

2. Induction of expression: single colonies were picked from the above resistance plate into 3 mL of LB liquid medium (ampicillin, 100 μg/mL) and cultured overnight with shaking (37 °C, 220 rpm). The overnight culture was inoculated at a ratio of 1:100 and grown at 37 °C until OD₆₀₀ reached 0.6-0.8; L-arabinose (final concentration: 0.2%) and ZnCl₂ (final concentration: 0.1 mM) were added, and expression was induced at 22 °C for 24 h.

Second, expression of KDM5A PHD3 mutant protein

The PHD3-W28-6MeOW mutant is taken as an example.

1. The plasmids pNEG-2×chPheT-PHD-W28TAG-GST and pBK-chPheRS9 were co-transformed into E. coli DH10B by the same procedure as above.

2. Induction of expression: single colonies were picked from the above resistance plate into 3 mL of LB liquid medium (ampicillin, 100 μg/mL; kanamycin, 50 μg/mL) and cultured overnight with shaking (37 °C, 220 rpm). The overnight culture was inoculated at a ratio of 1:100 into 100 mL of LB liquid medium and grown at 37 °C until OD₆₀₀ reached 0.6-0.8; L-arabinose (final concentration: 0.2%), ZnCl₂ (final concentration: 0.1 mM) and the unnatural amino acid (final concentration: 0.5 mM) were added, and expression was induced at 22 °C for 24 h.

Third, purification of KDM5A PHD3 (PHD)

1. Harvesting. The culture was centrifuged (4 °C, 4000 rpm, 20 min) and the cell pellet was collected.

2. Resuspension. The cells were resuspended in lysis buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 0.1 mM ZnCl₂, 2 mM β-ME, protease inhibitors PMSF and aprotinin).

3. Sonication. The sonicator program was set to 2 s on, 5 s off, 60% power, and sonication was carried out at 4 °C.

4. The lysate was centrifuged (4 °C, 12000 rpm, 20 min) and the supernatant was collected.

5. 0.5 mL of GST beads was loaded onto a gravity column; the beads were washed with ddH₂O and the column was equilibrated with 10 column volumes of lysis buffer.

6. The supernatant from step 4 was applied to the equilibrated GST column.

7. Nonspecifically adsorbed contaminating proteins were washed off with 20 column volumes of lysis buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 0.1 mM ZnCl₂, 2 mM β-ME, protease inhibitors PMSF and aprotinin).

8. The target protein was eluted with 10 column volumes of elution buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 20 mM glutathione) and the eluate was collected.

9. The eluted protein was analyzed by SDS polyacrylamide gel electrophoresis (SDS-PAGE) to determine its purity, and the protein yield was measured with a NanoDrop microvolume spectrophotometer (Thermo Fisher). The protein was used for subsequent SDS-PAGE analysis, mass spectrometry identification and MST experiments.

The results of SDS-PAGE are shown in FIG. 5B; the purity of the protein reached 90% or more.

Fourth, LC-MS identification of proteins

The purified proteins were analyzed on a SCIEX TripleTOF 6600 mass spectrometer with electrospray ionization, using SCIEX Analyst TF software. Samples were separated and desalted on a Phenomenex Aeris WIDEPORE C4 column (2.1 × 50 mm, 3.6 μm). Mobile phase A was 0.1% formic acid in water and mobile phase B was 0.1% formic acid in acetonitrile, at a constant flow rate of 0.2 mL/min. Mass spectra were deconvoluted and analyzed with SCIEX OS-Q software (version 2.0, SCIEX Corporation). The theoretical molecular weight of the protein was predicted with the ExPASy Compute pI/Mw tool.

The LC-MS identification results are shown in FIGS. 4C and 4D: the theoretical molecular weight of the target protein is 33378 Da, and the measured molecular weights are 33377 Da and 33378 Da, respectively, demonstrating that the chimeric phenylalanyl-tRNA synthetase mutant specifically recognizes 6-methoxy-tryptophan and 7-methoxy-tryptophan.
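The agreement between theoretical and measured masses can be expressed as an absolute deviation and a relative error in ppm. The sketch below is illustrative: the function name `mass_error` is hypothetical, and the mass shift for replacing tryptophan with methoxy-tryptophan (a net gain of CH₂O, computed from average atomic masses) is a standard calculation rather than a value stated in the text.

```python
# Average atomic masses (g/mol), standard values.
C, H, O = 12.011, 1.008, 15.999

# Trp -> methoxy-Trp replaces one ring H with OCH3, i.e. a net gain of CH2O.
METHOXY_SHIFT_DA = C + 2 * H + O


def mass_error(measured_da: float, theoretical_da: float) -> tuple:
    """Return (absolute deviation in Da, relative deviation in ppm)."""
    delta = measured_da - theoretical_da
    return delta, delta / theoretical_da * 1e6


THEORETICAL_DA = 33378.0
# Measured values as reported for FIGS. 4C and 4D, respectively.
MEASURED_DA = {"6-methoxy-tryptophan": 33377.0, "7-methoxy-tryptophan": 33378.0}

if __name__ == "__main__":
    for analog, measured in MEASURED_DA.items():
        delta, ppm = mass_error(measured, THEORETICAL_DA)
        print(f"{analog}: {delta:+.0f} Da ({ppm:+.1f} ppm)")
    print(f"Expected Trp -> MeO-Trp shift: {METHOXY_SHIFT_DA:.3f} Da")
```

Both measured masses lie within 1 Da of the theoretical value, whereas a protein carrying natural tryptophan at the amber site would be expected to be about 30 Da lighter.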

Example 6: Measurement of the affinity of decoding protein domain variants for histone methylation-modified peptides by microscale thermophoresis (MST)

The peptides used in the experiments were all synthesized by Beijing cloisonne department of China, Biotechnology, Inc.; the C-terminus of each peptide was labeled with fluorescein isothiocyanate (FITC), and the specific sequences are shown in Table 2.

TABLE 2 polypeptide sequence information used in MST experiments

The MST determination method is described below, taking the decoding protein PHD and H3K4me3 as an example.

(1) Desalting of the protein sample. Protein samples were dialyzed 3 times against 2 L of MST buffer (20 mM Tris-HCl, 50 mM NaCl, 1 mM DTT, 0.05% Tween-20, pH 7.5).

(2) Protein concentration. Protein samples were concentrated to an appropriate concentration using a 10 kDa protein concentrator (Millipore).

(3) Serial dilution. Sixteen PCR tubes were prepared; 10 μL of MST buffer was added to tubes 2-16 and 20 μL of protein sample to tube 1. 10 μL was transferred from tube 1 to tube 2 and mixed, and the two-fold serial dilution was continued in the same way through the remaining tubes.

(4) 10 μL of 100 nM peptide was added to each tube and mixed well, giving a total volume of 20 μL.

(5) The capillaries were loaded.

(6) Kd measurement. Measurements were performed on a Monolith NT.115 instrument (NanoTemper Technologies, Munich, Germany) with a blue LED excitation source at a constant temperature of 25 °C. The instrument settings were: blue LED excitation power 20%, infrared laser power 40%. Unless otherwise specified, all measurements used standard glass capillaries (NanoTemper Technologies, cat# MO-K022), and each set of experiments was repeated 3 times.

(7) Data processing. Using the NT analysis software, the data were fitted with a 1:1 binding model for the target protein and the fluorescent peptide to obtain the dissociation constant Kd of the target protein. All data were processed and analyzed with Origin software.

(8) The affinities of other decoding proteins for their corresponding histone methylation modifications were determined by the same procedure.
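The dilution scheme in steps (3)-(4) and the 1:1 binding model in step (7) can be sketched as follows. This is a minimal illustration under stated assumptions: the top protein concentration is hypothetical (the text only says "appropriate concentration"), the function names are invented for this example, and the quadratic expression is the standard exact solution for 1:1 binding, not necessarily the exact routine used by the instrument software.

```python
import math


def serial_dilution(top_conc: float, n_tubes: int = 16, factor: float = 2.0) -> list:
    """Protein concentration in each tube after two-fold serial dilution."""
    return [top_conc / factor**i for i in range(n_tubes)]


def after_peptide_addition(concs: list, sample_ul: float = 10.0, peptide_ul: float = 10.0) -> list:
    """Concentrations after mixing each 10 uL sample with 10 uL of peptide."""
    dilution = sample_ul / (sample_ul + peptide_ul)
    return [c * dilution for c in concs]


def fraction_bound(protein: float, ligand: float, kd: float) -> float:
    """Fraction of labeled ligand bound under 1:1 binding (exact quadratic
    solution); all three arguments must share the same concentration unit."""
    total = protein + ligand + kd
    return (total - math.sqrt(total * total - 4.0 * protein * ligand)) / (2.0 * ligand)


# Hypothetical top protein concentration of 20 uM, expressed in nM.
stock_series = serial_dilution(20000.0)
final_series = after_peptide_addition(stock_series)
# 10 uL of 100 nM peptide in a 20 uL final volume gives 50 nM labeled peptide.
PEPTIDE_NM = 100.0 * 10.0 / 20.0

if __name__ == "__main__":
    for conc in final_series:
        print(f"{conc:10.3f} nM -> {fraction_bound(conc, PEPTIDE_NM, 440.0):.3f} bound")
```

In a fit, Kd would be the free parameter adjusted until the predicted bound fraction matches the normalized MST signal across the 16 capillaries; the example simply evaluates the model at the wild-type Kd of 440 nM reported below.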

The experimental results are as follows: experimental data as shown in figures 5C and 5D: the affinity of the PHD wild-type domain and H3K4me3 is 440nM, the affinity of the PHD variant introduced with 6-methoxy-tryptophan at the W28 site is 52nM with H3K4me3, compared with the PHD wild-type domain of PHD3, the affinity of the PHD variant introduced with 6-methoxy-tryptophan at the W28 site is improved by 8 times with H3K4me3, and the affinity of the PHD protein and H3K4me3 is improved by 2-6 times with other electron-donating tryptophan analogs. Similarly, the introduction of 6-methoxy-tryptophan site specificity into other decoding protein domains also improves the affinity of the decoding protein for methylation modification of the corresponding histone by 2-4 times.

Example 7: construction of multivalent tandem repeat PHD Domain to increase its affinity for H3K4me3

1. The duplex and triplex tandem repeat PHD domain plasmids were constructed as described in Example 4: pNEG-2×chPheT-2xPHD-W28TAG-GST (2x PHD) and pNEG-2×chPheT-3xPHD-W28TAG-GST (3x PHD).

2. The duplex and triplex PHD variants with 6-methoxy-tryptophan site-specifically introduced at W28 (2x PHD, 3x PHD) were expressed and purified as described in Example 5; protein purity was determined by SDS polyacrylamide gel electrophoresis (SDS-PAGE) and protein molecular weight was determined by LC-MS.

3. The affinity of the multivalent tandem repeat PHD domains 2x PHD and 3x PHD for H3K4me3 was determined as described in Example 6.

4. The strategy is not limited to the interaction between the PHD domain and H3K4me3 and can be extended to other decoding proteins and other histone methylation modifications; experiments show that it improves the affinity between different decoding protein domains and their corresponding histone methylation modifications.

Experimental results: as shown in FIG. 6C, the affinity of the wild-type PHD domain for H3K4me3 is 440 nM, while the affinities of the duplex and triplex PHD variants with 6-methoxy-tryptophan site-specifically introduced at W28 are 30 nM and 7 nM, respectively, improvements of 14.7-fold and 62.9-fold over the wild-type PHD domain. Affinity measurements extending the strategy to other decoding proteins and their corresponding histone methylation modifications show that multivalent tandem repeats of a histone methylation decoding protein domain improve its affinity for the corresponding histone methylation modification.
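The fold improvements quoted above follow directly from the ratios of the dissociation constants. As a hedged illustration, the sketch below also converts each ratio into an apparent binding free energy difference at 25 °C (the MST measurement temperature) using ΔΔG = RT ln(Kd,variant / Kd,WT). The free energy values are not reported in the text, and for the tandem constructs they reflect avidity as well as the per-site interaction.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
T_KELVIN = 298.15     # 25 degrees C

# Dissociation constants (nM) as reported in Examples 6 and 7.
KD_NM = {
    "PHD-WT": 440.0,
    "PHD W28-6MeOW": 52.0,
    "2x PHD W28-6MeOW": 30.0,
    "3x PHD W28-6MeOW": 7.0,
}


def fold_improvement(kd_ref: float, kd_variant: float) -> float:
    """How many times tighter the variant binds than the reference."""
    return kd_ref / kd_variant


def apparent_ddg(kd_ref: float, kd_variant: float, temperature: float = T_KELVIN) -> float:
    """Apparent ddG in kcal/mol; negative values mean tighter binding."""
    return R_KCAL * temperature * math.log(kd_variant / kd_ref)


if __name__ == "__main__":
    wt = KD_NM["PHD-WT"]
    for name, kd in KD_NM.items():
        if name == "PHD-WT":
            continue
        print(f"{name}: {fold_improvement(wt, kd):.1f}-fold, "
              f"ddG = {apparent_ddg(wt, kd):.2f} kcal/mol")
```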

Example 8: Far-Western blot evaluation of the efficiency with which the PHD super-affinity molecule recognizes H3K4me3

1. Protein expression. The 6-methoxy-tryptophan-substituted PHD protein variants were expressed and purified as described in Examples 4 and 5 to obtain the PHD protein, 2x PHD protein and 3x PHD protein.

2. SDS polyacrylamide gel electrophoresis. Taking HeLa cell lysate as an example, HeLa cells were lysed and serially diluted to different concentrations, and the protein samples were separated by SDS-PAGE.

3. Membrane transfer. Proteins were transferred to PVDF membranes at a constant current of 300 mA for 2.5 h.

4. Blocking. The membrane was placed in a plastic box containing 5% skim milk/TBST and blocked on a shaker for 1 h, and the blocking solution was discarded. The membrane was washed 3 times with TBST, 10 min each.

5. Bait protein incubation. Membranes were incubated overnight with the H3K4me3-specific antibody, PHD protein, 2x PHD protein or 3x PHD protein, respectively, and then washed 3 times with TBST, 10 min each.

6. Antibody incubation. The PVDF membrane incubated with the H3K4me3-specific antibody was incubated with the corresponding secondary antibody at room temperature. The PVDF membranes incubated with PHD proteins were further incubated with a GST-specific antibody (Sigma-Aldrich, cat# G7781) and finally with the corresponding secondary antibody (Proteintech, cat# SA00001-2). The membranes were washed 3 times with TBST, 10 min each.

7. Chemiluminescence imaging. The PVDF membrane was covered evenly with developing solution, left at room temperature for 3 min, and imaged on a multifunctional imager.

Experimental results: as shown in FIG. 7B, the 2x PHD protein and the 3x PHD protein show higher signal-to-noise ratios than the H3K4me3-specific antibody. The experiment is not limited to the interaction between PHD and H3K4me3 and is also applicable to the detection of the corresponding histone methylation modifications by other decoding proteins.

Example 9: Detection of histone methylation modification by combining the histone methylation super-affinity molecular recognition system with immunofluorescence

Histone methylation H3K4me3 and PHD decoding protein are taken as examples.

1. The PHD variant is labeled. The PHD decoding protein domain was labeled with Cy5 dye. NHS-Cy5 was dissolved in DMSO and the PHD-decoded protein was in PBS. NHS-Cy5 and PHD proteins were expressed as 1: 2 molar ratio, incubating at 37 ℃ for 1h in the absence of light, and terminating the reaction with 50mM Tris-HCl, pH 8.0 solution.

2. The labeled PHD protein was purified using a PD MiniTrap™ G-25 desalting column (Cytiva, cat # 28918007).

3. Preparation of cell slides. HeLa cells were seeded in a culture dish in which a treated coverslip had been placed in advance, and cultured at 37 ℃.

4. Cell fixation. After the cells had fully adhered, the medium was removed, and the cells were rinsed once with PBS, fixed with 4% paraformaldehyde (4% PFA/PBS) for 10 min at room temperature, and rinsed 3 times with PBS.

5. Cell permeabilization. Cells were permeabilized with PBS containing 0.5% Triton X-100 for 10 min and rinsed 3 times with PBS.

6. Blocking. Blocking was performed at room temperature for 30 min using 3% BSA in PBS.

7. Primary antibody incubation. The cells were incubated for 2 h at room temperature with either the Cy5-labeled PHD protein (wild type and mutant) from step (2) or the H3K4me3 antibody (Abcam, cat # ab8580).

8. Secondary antibody incubation. Cells incubated with the H3K4me3 antibody were rinsed 3 times with PBS, 10 min each time, incubated with DyLight 488-conjugated goat anti-rabbit IgG (Abbkine, cat # A23220) for 1 h at room temperature, and then rinsed 3 times with PBS, 10 min each time. Cells incubated with Cy5-labeled PHD protein were rinsed directly 3 times with PBS, 10 min each time.

9. Mounting. Mounting medium containing DAPI (Abcam, cat # ab104139) was dropped onto a glass slide, and the coverslip was placed on it with the cell side facing down and left overnight in the dark before imaging.

10. Imaging. Imaging was performed at room temperature using an LSM710 confocal microscope (Zeiss) with a 63x oil-immersion objective. All images were analyzed and processed using ZEN 2.3 lite software from Zeiss.

Experimental results: As shown in FIG. 7C, the immunofluorescence results indicate that the Cy5-labeled PHD super-parent molecule was able to detect the co-localization of H3K4me3 with M-phase condensed chromosomes during mitosis. The PHD super-parent molecule had a higher signal-to-noise ratio than the commercially available H3K4me3 antibody (Abcam, cat # ab8580).
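Step 1 of this example mixes NHS-Cy5 and PHD protein at a 1:2 molar ratio. As a minimal sketch of the arithmetic behind that step, the helper below converts a protein amount into the volume of dye stock to add. The function name, its parameters, and the example concentrations are illustrative assumptions; the description gives only the molar ratio, not the stock concentrations.

```python
def cy5_stock_volume_ul(protein_um: float, protein_ul: float,
                        dye_stock_mm: float,
                        dye_per_protein: float = 0.5) -> float:
    """Volume of NHS-Cy5 stock (in µL) for a given dye:protein molar ratio.

    dye_per_protein = 0.5 corresponds to the 1:2 (dye:protein) ratio in step 1.
    All other numbers passed in are hypothetical working values.
    """
    if dye_stock_mm <= 0:
        raise ValueError("dye stock concentration must be positive")
    # µM x µL gives pmol; divide by 1000 to get nmol of protein.
    protein_nmol = protein_um * protein_ul / 1000.0
    dye_nmol = protein_nmol * dye_per_protein
    # 1 mM equals 1 nmol/µL, so nmol / mM gives µL of stock.
    return dye_nmol / dye_stock_mm


# Hypothetical example: 200 µL of 100 µM PHD protein, 10 mM dye stock in DMSO.
example_volume = cy5_stock_volume_ul(100.0, 200.0, 10.0)
```

With these assumed inputs, 20 nmol of protein calls for 10 nmol of dye, which is 1 µL of a 10 mM stock.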

Example 10: flow cytometry analysis of efficiency of chimeric phenylalanine translation System in mammalian cells

1. Cell transfection. 293T cells were transfected according to a standard transient plasmid transfection protocol. The experimental group consisted of cells co-transfected with plasmid pCDNA3.1-chPheRS9, which expresses the chimeric phenylalanine translation system, and the fluorescent reporter plasmid pEGFP-mCherry-T2A-EGFP-190TAG; the control groups consisted of cells transfected with pEGFP-mCherry alone or pEGFP-EGFP alone.

2. 48 h after transfection, the medium was aspirated and the cells were washed with 1× PBS to remove residual medium.

3. The PBS was aspirated, trypsin was added to digest the cells, 1 mL of DMEM medium was added to resuspend the cells, and the cells were transferred to a 1.5 mL centrifuge tube.

4. The forward and side scatter gates of the flow cytometer were set with 293T cells, the parameters and gates of the PE channel were set with cells expressing mCherry, and the parameters and gates of the FITC channel were set with cells expressing EGFP.

5. The experimental group of cells was measured, with 50,000 cells collected per sample. Data were analyzed using FlowJo software.

Experimental results: As shown in FIG. 8, the flow cytometry results showed that the chimeric phenylalanine aminoacyl-tRNA synthetase (chPheRS9) can efficiently recognize any one of the unnatural amino acids 6-methoxy-tryptophan (6MeOW), 7-methoxy-tryptophan (7MeOW), 6,7-methyl-tryptophan (67MW), and 6,7-methoxy-tryptophan (67MeOW) in mammalian cells. The experimental procedure is not limited to the 293T cell line and is applicable to various cell lines.
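The description does not state how the efficiency was quantified from the two fluorescence channels. One common way to summarize a dual-fluorescence readthrough reporter such as mCherry-T2A-EGFP-190TAG, given here only as an assumed sketch, is to gate on mCherry-positive events and normalize the median EGFP/mCherry ratio to that of a reference reporter without the TAG codon. The function name, the gate value, the reference ratio, and the event data below are all hypothetical.

```python
from statistics import median


def readthrough_efficiency(events, mcherry_gate, reference_ratio):
    """Relative TAG readthrough from per-cell (mCherry, EGFP) intensities.

    events          -- iterable of (mcherry, egfp) pairs, one per cell
    mcherry_gate    -- positive threshold defining transfected (mCherry+) cells
    reference_ratio -- EGFP/mCherry ratio of an assumed no-TAG reference reporter
    """
    if mcherry_gate <= 0 or reference_ratio <= 0:
        raise ValueError("gate and reference ratio must be positive")
    gated = [(m, g) for m, g in events if m >= mcherry_gate]
    if not gated:
        raise ValueError("no events pass the mCherry gate")
    sample_ratio = median(g / m for m, g in gated)
    return sample_ratio / reference_ratio


# Hypothetical events; the (50.0, 5.0) cell falls below the gate and is dropped.
example_events = [(100.0, 20.0), (200.0, 40.0), (50.0, 5.0), (400.0, 80.0)]
efficiency = readthrough_efficiency(example_events,
                                    mcherry_gate=100.0,
                                    reference_ratio=0.8)
```

Using the per-cell ratio rather than raw EGFP intensity keeps the estimate from being dominated by differences in transfection level between samples.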

Example 11: capture of proteomes interacting with histone methylation modifications using proximity labeling technology

1. Plasmid construction. Using the pT3 vector as a template, the vector was amplified with primers pT3-PHD-APEX-V-F/R, whose nucleotide sequences are shown in SEQ ID NO: 52-53. Using pNEG-2 chPheT-PHD-W28TAG-GST as a template, the PHD gene fragment was amplified with primers pT3-PHD-F/R, whose nucleotide sequences are shown in SEQ ID NO: 56-57. The APEX2 gene fragment was amplified with primers pT3-APEX2-F/R, whose nucleotide sequences are shown in SEQ ID NO: 54-55; the nucleotide and amino acid sequences of APEX2 are shown in SEQ ID NO: 13-14. Plasmid pT3-PHD-APEX2 was constructed by Gibson assembly. The plasmid map is shown in FIG. 1.

2. Construction of a cell line stably expressing the PHD-APEX2 fusion protein. The plasmid pCMV-SB100 containing the Sleeping Beauty transposon system (the plasmid map is shown in FIG. 1) and the plasmid pT3-APEX2-PHD were co-transfected into HeLa cells. After 24 hours of culture at 37 ℃, the cells were cultured in DMEM containing 2 μg/mL puromycin, and the medium was changed periodically. After all cells in the blank control group had died, the cells of the experimental group were cultured in DMEM containing 1 μg/mL puromycin to obtain a mixed-clone stable cell line.

3. Cell transfection. pCDNA3.1-chPheRS9 was transfected into and overexpressed in the mixed-clone stable cell line, 6-methoxy-tryptophan was added to 2 mM, and the cells were cultured for 36 h.

4. Proximity labeling catalyzed by APEX2. The stable cell line from step (3) was incubated in DMEM containing 500 μM biotin-phenol at 37 ℃ for 30 min. The medium was then replaced with PBS containing 1 mM H₂O₂, and the cells were left at room temperature for 5 min. The cells were rinsed 4 times with pre-chilled 20 mM ascorbic acid/PBS and then once with PBS. The cells were digested with trypsin, neutralized with DMEM, and centrifuged (1000 g, 1 min), and the supernatant was discarded. Finally, the cells were resuspended in PBS and centrifuged (1000 g, 1 min), the supernatant was discarded, and this wash was repeated once.

5. Isolation of cell nuclei. The cells obtained in step (4) were resuspended in 1.5 mL of hypotonic buffer (10 mM HEPES, 10 mM KCl, 0.05% NP40), left on ice for 10 min, and centrifuged (4 ℃, 12000 rpm, 20 min), and the supernatant was discarded. These steps were repeated 5-8 times.

6. Lysis of cell nuclei. To the pellet from step (5), 400 μL of lysis buffer (25 mM TEOA pH 7.5, 150 mM NaCl, 0.1% SDS, 1% Triton X-100, 0.5% sodium deoxycholate, 1 mM PMSF, 1× PIC) and 20 μL of DNase were added; the pellet was resuspended, left at room temperature for 20 min, and centrifuged (4 ℃, 18000 rpm, 15 min), and the supernatant was collected.

7. Enrichment of biotinylated proteins. Streptavidin-coupled magnetic beads (Streptavidin Beads) were added to the supernatant collected in step (6), and the mixture was incubated overnight at 4 ℃. The beads were washed once with 0.01% NP40/PBS buffer, then 3 times each with 0.01% NP40/PBS buffer containing 500 mM NaCl, 0.01% NP40/PBS buffer containing 0.2% SDS, and 0.01% NP40/PBS buffer containing 2 M urea, and then once with 0.01% NP40/PBS buffer. Finally, the beads were resuspended in 100 μL of 1× SDS loading buffer and boiled at 100 ℃ for 10 min.

8. SDS Polyacrylamide gel electrophoresis. The protein sample in step (7) was separated by SDS polyacrylamide gel electrophoresis (SDS-PAGE), stained with Coomassie Brilliant blue G250, and then destained.

9. Detection by LC-MS/MS of proteomes that interact with histone methylation modifications. The protein bands in the gel from step (8) were excised with a clean blade and subjected in turn to destaining and dehydration, drying, reduction, alkylation, enzymatic digestion, peptide extraction, desalting, isotope labeling, and a second desalting. The processed samples were analyzed on a Q Exactive Orbitrap mass spectrometer with Proxeon nanospray ionization; the HPLC instrument was a Proxeon Easy-nLC II. The samples were loaded onto a 100 μm × 20 mm Magic C18 5U reversed-phase column for desalting and then separated on a 75 μm × 100 mm Magic C18 3U reversed-phase column. The elution flow rate was set to 300 nL/min and the elution time to 60 min to obtain the MS/MS results. Data processing: the experimental results were analyzed and processed with MaxQuant and pLabel software.

The workflow of this example is shown in FIG. 9. It combines proximity labeling with the histone methylation super-parent recognition system to capture the proteome interacting with a histone methylation modification (H3K4me3), which helps in analyzing the biological functions of H3K4me3 in life processes.
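Step 1 of this example builds the plasmid by Gibson assembly, where each primer has a 3' portion that anneals to its own fragment and a 5' overhang shared with the neighbouring fragment. The sketch below shows one way to check this on sequences taken from the sequence listing: SEQ ID NO: 46, SEQ ID NO: 49, and the first 40 nt of the APEX2 sequence (SEQ ID NO: 13). The helper names are hypothetical, and no claim is made about which named primer in step 1 each entry corresponds to; the check uses only the sequences as listed.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a lower- or upper-case DNA string (a/c/g/t only)."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp[base] for base in reversed(seq.lower()))


def three_prime_anneal_length(primer: str, template: str) -> int:
    """Length of the longest primer 3' end that matches the template's 5' start."""
    primer, template = primer.lower(), template.lower()
    for k in range(min(len(primer), len(template)), 0, -1):
        if primer[-k:] == template[:k]:
            return k
    return 0


# Sequences copied from the listing; spaces are the listing's 10-nt grouping.
SEQ_46 = "tctggcagcg gttctgctag cggaaagtct tacccaactg tgagtg".replace(" ", "")
SEQ_49 = "gctagcagaa ccgctgccag aaccgctgcc accctgtttt tttgcgcagt t".replace(" ", "")
SEQ_13_START = "ggaaagtctt acccaactgt gagtgctgat taccaggacg".replace(" ", "")

# 3' portion of SEQ ID NO: 46 that anneals to the start of APEX2.
anneal_len = three_prime_anneal_length(SEQ_46, SEQ_13_START)
# Remaining 5' overhang, expected to be shared with the neighbouring fragment.
overhang = SEQ_46[:len(SEQ_46) - anneal_len]
# The overhang is shared if the reverse primer's complement ends with it.
overhang_shared = reverse_complement(SEQ_49).endswith(overhang)
```

A check of this kind is a quick guard against ordering primers whose overlaps do not line up before any PCR is run.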

In conclusion, the invention provides a method for synthesizing tryptophan analogue compounds and applies these compounds to significantly improve the cation-pi interaction. It thereby provides a method for studying biomacromolecules that rely on cation-pi interactions and a theoretical basis for developing histone methylation super-parent biotechnology based on decoding proteins, opens the way to further applications, and has considerable clinical value as well as development and application value.

It should be understood that the above detailed description is intended only to illustrate the present invention and does not limit it to the technical solutions described in the embodiments. Those skilled in the art should understand that the present invention can still be modified or equivalently substituted to achieve the same technical effects; such modifications and substitutions fall within the scope of protection of the present invention as long as the requirements of use are met.

Sequence listing

<110> Zhejiang University

<120> method for improving cation-pi interaction by genetic code expansion and application

<130> ZJDX-002

<160> 49

<170> SIPOSequenceListing 1.0

<210> 1

<211> 1668

<212> DNA

<213> Artificial Synthesis (synthetic sequence)

<400> 1

atggataaga agccgctgga tgttctgatc tctgcgaccg gtctgtggat gtcccgtacc 60

ggcacgctgc acaagatcaa gcactatgag atttctcgtt ctaaaatcta catcgaaatg 120

gcgtgtggtg accatctggt tgtgaacaac tctcgttctt gtcgtcccgc acgtgcattc 180

cgttatcata aataccgtaa aacctgcaaa cgttgtcgtg tttctgacga agatatcaac 240

aacttcctga cccgttctac cgaaggcaaa acctctgtta aagttaaagt tgtttctgag 300

ccgaaagtga aaaaagcgat gccgaaatct gtttctcgtg cgccgaaacc gctggaaaat 360

ccggtttctg cgaaagcgtc taccgacacc tctcgttctg ttccgtctcc ggcgaaatct 420

accccgaact ctccggttcc gacctctgca agcgccccag ctctgactaa atcccagacg 480

gaccgtctgg aggtgctgct gaacccaaag gatgaaatct ctctgaacag cggcaagcct 540

ttccgtgagc tggaaagcga gctgctgtct cgtcgtaaaa aggatctgca acagatctac 600

gctgaggaac gcgagggtgg cggaagcggc ggcggtggcg gaagcggcgg cggtggcgga 660

agcggcggcg gtggaagcca ggcctgggga tcgaggcctc ctgcagcaga gtgtgccacc 720

caaagagctc caggcagtgt ggtggagctg ctgggcaaat cctaccctca ggacgaccac 780

agcaacctca cccggaaggt cctcaccaga gttggcagga acctgcacaa ccagcagcat 840

caccctctgt ggctgatcaa ggagagggtg ttggagcact tcaacaagca gtatgtgggc 900

agctctggga ccccgttgtt ctcggtctat gacaaccttt cgccagtggt cacgacctgg 960

cagaactttg acagcctgct catcccagct gatcacccct gcaggaagaa gggggacaac 1020

tattacctga atcggactca catgctgaga gcgcacacgt ccgcacacca gtgggacttg 1080

ctgcacgcgg gactggatgc cttcctggtg gtgggtgatg tctacaggcg tgaccagatc 1140

gactcccagc actaccctat tttccaccag ctggacgccg gtcggctctt ctctaagcat 1200

gagttatttg ctggtataaa ggatggggaa agcctgcagc tctttgaaca aagttctcgc 1260

tctgcgcata aacaagagac acacaccatg gaggccgtga agcttgttga gtttgatctt 1320

aagcaaacgc ttaccaggct catggcacat ctttttggag atgagccgga gataaggtgg 1380

gtagactgct acgttccttt tggacatcct tcctttgaga tggagatcaa ctttcatgga 1440

gaatggctgg aagttcttgg ctgcggggtg gttgaacaac aactggtcaa ttcagctggt 1500

gctcaagacc gaatcggctg gggatttggc ctagggttag aaaggctagc catgatcctc 1560

tacgacatcc ctgatatccg tctcttctgg tgtgaggacg agcgcttcct gaagcagttc 1620

tgtgtatcca acattaatca gaaggtgaag tttcagcctc ttagcaaa 1668

<210> 2

<211> 556

<212> PRT

<213> Artificial Synthesis (synthetic sequence)

<400> 2

Met Asp Lys Lys Pro Leu Asp Val Leu Ile Ser Ala Thr Gly Leu Trp

1 5 10 15

Met Ser Arg Thr Gly Thr Leu His Lys Ile Lys His Tyr Glu Ile Ser

20 25 30

Arg Ser Lys Ile Tyr Ile Glu Met Ala Cys Gly Asp His Leu Val Val

35 40 45

Asn Asn Ser Arg Ser Cys Arg Pro Ala Arg Ala Phe Arg Tyr His Lys

50 55 60

Tyr Arg Lys Thr Cys Lys Arg Cys Arg Val Ser Asp Glu Asp Ile Asn

65 70 75 80

Asn Phe Leu Thr Arg Ser Thr Glu Gly Lys Thr Ser Val Lys Val Lys

85 90 95

Val Val Ser Glu Pro Lys Val Lys Lys Ala Met Pro Lys Ser Val Ser

100 105 110

Arg Ala Pro Lys Pro Leu Glu Asn Pro Val Ser Ala Lys Ala Ser Thr

115 120 125

Asp Thr Ser Arg Ser Val Pro Ser Pro Ala Lys Ser Thr Pro Asn Ser

130 135 140

Pro Val Pro Thr Ser Ala Ser Ala Pro Ala Leu Thr Lys Ser Gln Thr

145 150 155 160

Asp Arg Leu Glu Val Leu Leu Asn Pro Lys Asp Glu Ile Ser Leu Asn

165 170 175

Ser Gly Lys Pro Phe Arg Glu Leu Glu Ser Glu Leu Leu Ser Arg Arg

180 185 190

Lys Lys Asp Leu Gln Gln Ile Tyr Ala Glu Glu Arg Glu Gly Gly Gly

195 200 205

Ser Gly Gly Gly Gly Gly Ser Gly Gly Gly Gly Gly Ser Gly Gly Gly

210 215 220

Gly Ser Gln Ala Trp Gly Ser Arg Pro Pro Ala Ala Glu Cys Ala Thr

225 230 235 240

Gln Arg Ala Pro Gly Ser Val Val Glu Leu Leu Gly Lys Ser Tyr Pro

245 250 255

Gln Asp Asp His Ser Asn Leu Thr Arg Lys Val Leu Thr Arg Val Gly

260 265 270

Arg Asn Leu His Asn Gln Gln His His Pro Leu Trp Leu Ile Lys Glu

275 280 285

Arg Val Leu Glu His Phe Asn Lys Gln Tyr Val Gly Ser Ser Gly Thr

290 295 300

Pro Leu Phe Ser Val Tyr Asp Asn Leu Ser Pro Val Val Thr Thr Trp

305 310 315 320

Gln Asn Phe Asp Ser Leu Leu Ile Pro Ala Asp His Pro Cys Arg Lys

325 330 335

Lys Gly Asp Asn Tyr Tyr Leu Asn Arg Thr His Met Leu Arg Ala His

340 345 350

Thr Ser Ala His Gln Trp Asp Leu Leu His Ala Gly Leu Asp Ala Phe

355 360 365

Leu Val Val Gly Asp Val Tyr Arg Arg Asp Gln Ile Asp Ser Gln His

370 375 380

Tyr Pro Ile Phe His Gln Leu Asp Ala Gly Arg Leu Phe Ser Lys His

385 390 395 400

Glu Leu Phe Ala Gly Ile Lys Asp Gly Glu Ser Leu Gln Leu Phe Glu

405 410 415

Gln Ser Ser Arg Ser Ala His Lys Gln Glu Thr His Thr Met Glu Ala

420 425 430

Val Lys Leu Val Glu Phe Asp Leu Lys Gln Thr Leu Thr Arg Leu Met

435 440 445

Ala His Leu Phe Gly Asp Glu Pro Glu Ile Arg Trp Val Asp Cys Tyr

450 455 460

Val Pro Phe Gly His Pro Ser Phe Glu Met Glu Ile Asn Phe His Gly

465 470 475 480

Glu Trp Leu Glu Val Leu Gly Cys Gly Val Val Glu Gln Gln Leu Val

485 490 495

Asn Ser Ala Gly Ala Gln Asp Arg Ile Gly Trp Gly Phe Gly Leu Gly

500 505 510

Leu Glu Arg Leu Ala Met Ile Leu Tyr Asp Ile Pro Asp Ile Arg Leu

515 520 525

Phe Trp Cys Glu Asp Glu Arg Phe Leu Lys Gln Phe Cys Val Ser Asn

530 535 540

Ile Asn Gln Lys Val Lys Phe Gln Pro Leu Ser Lys

545 550 555

<210> 3

<211> 201

<212> DNA

<213> human (H. sapiens)

<400> 3

atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag 60

cgcccgtgta aagataaagt tgattaggtt cagtgtgatg gtggttgtga tgaatggttt 120

catcaggttt gtgttggtgt tagcccggaa atggcagaaa atgaagatta tatttgcatc 180

aactgcgcaa aaaaacaggg t 201

<210> 4

<211> 201

<212> DNA

<213> human (H. sapiens)

<400> 4

atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag 60

cgcccgtgta aagataaagt tgattgggtt cagtgtgatg gtggttgtga tgaatagttt 120

catcaggttt gtgttggtgt tagcccggaa atggcagaaa atgaagatta tatttgcatc 180

aactgcgcaa aaaaacaggg t 201

<210> 5

<211> 198

<212> DNA

<213> human (H. sapiens)

<400> 5

atggcaagtc aggaatttga agtagaagca attgttgata aacgtcaaga taaaaacggt 60

aatacccaat atctggttcg ttggaaaggt tatgataaac aggatgatac atgggaaccg 120

gaacagcatc tgatgaattg tgaaaaatgt gtgcatgatt tcaaccgtcg ccaaaccgaa 180

aaacagaaag gtggaagc 198

<210> 6

<211> 66

<212> PRT

<213> human (H. sapiens)

<400> 6

Met Ala Ser Gln Glu Phe Glu Val Glu Ala Ile Val Asp Lys Arg Gln

1 5 10 15

Asp Lys Asn Gly Asn Thr Gln Tyr Leu Val Arg Trp Lys Gly Tyr Asp

20 25 30

Lys Gln Asp Asp Thr Trp Glu Pro Glu Gln His Leu Met Asn Cys Glu

35 40 45

Lys Cys Val His Asp Phe Asn Arg Arg Gln Thr Glu Lys Gln Lys Gly

50 55 60

Gly Ser

65

<210> 7

<211> 579

<212> DNA

<213> human (H. sapiens)

<400> 7

atgaatggct gggtacctgt tggggctgcg tgtgagaagg ctgtgtatgt cttggatgag 60

ccggagccag ccatccgaaa gagctaccag gcggtagagc ggcatgggga gacaatccga 120

gtccgggaca ccgtccttct caaatcaggc ccacgaaaga cctccacacc ttatgtggcc 180

aagatctctg ccctctggga gaaccccgag tcaggagagc tgatgatgag cctcctgtgg 240

tattacagac ctgagcactt acagggaggc cgcagtccca gcatgcacga gcccttgcag 300

aatgaagtgt ttgcatcgcg acatcaggac cagaacagtg tggcctgcat tgaggagaag 360

tgctatgtgc tgacttttgc cgagtactgc aggttctgtg ccatggccaa gcgccgaggt 420

gaaggcctcc ccagccgaaa gacagcactg gttcccccct ctgcagacta ttccacccca 480

ccccaccgca cagtgccaga ggacacggac cctgagctgg tgttcctttg ccgccatgtc 540

tatgacttcc gccacgggcg catccttaag aacccccag 579

<210> 8

<211> 193

<212> PRT

<213> human (H. sapiens)

<400> 8

Met Asn Gly Trp Val Pro Val Gly Ala Ala Cys Glu Lys Ala Val Tyr

1 5 10 15

Val Leu Asp Glu Pro Glu Pro Ala Ile Arg Lys Ser Tyr Gln Ala Val

20 25 30

Glu Arg His Gly Glu Thr Ile Arg Val Arg Asp Thr Val Leu Leu Lys

35 40 45

Ser Gly Pro Arg Lys Thr Ser Thr Pro Tyr Val Ala Lys Ile Ser Ala

50 55 60

Leu Trp Glu Asn Pro Glu Ser Gly Glu Leu Met Met Ser Leu Leu Trp

65 70 75 80

Tyr Tyr Arg Pro Glu His Leu Gln Gly Gly Arg Ser Pro Ser Met His

85 90 95

Glu Pro Leu Gln Asn Glu Val Phe Ala Ser Arg His Gln Asp Gln Asn

100 105 110

Ser Val Ala Cys Ile Glu Glu Lys Cys Tyr Val Leu Thr Phe Ala Glu

115 120 125

Tyr Cys Arg Phe Cys Ala Met Ala Lys Arg Arg Gly Glu Gly Leu Pro

130 135 140

Ser Arg Lys Thr Ala Leu Val Pro Pro Ser Ala Asp Tyr Ser Thr Pro

145 150 155 160

Pro His Arg Thr Val Pro Glu Asp Thr Asp Pro Glu Leu Val Phe Leu

165 170 175

Cys Arg His Val Tyr Asp Phe Arg His Gly Arg Ile Leu Lys Asn Pro

180 185 190

Gln

<210> 9

<211> 207

<212> DNA

<213> human (H. sapiens)

<400> 9

atggagtatc aggatgggaa ggagtttgga ataggggacc tcgtgtgggg aaagatcaag 60

ggcttctcct ggtggcccgc catggtggtg tcttggaagg ccacctccaa gcgacaggct 120

atgtctggca tgcggtgggt ccagtggttt ggcgatggca agttctccga ggtctctgca 180

gacaaactgg tggcactggg gctgttc 207

<210> 10

<211> 69

<212> PRT

<213> human (H. sapiens)

<400> 10

Met Glu Tyr Gln Asp Gly Lys Glu Phe Gly Ile Gly Asp Leu Val Trp

1 5 10 15

Gly Lys Ile Lys Gly Phe Ser Trp Trp Pro Ala Met Val Val Ser Trp

20 25 30

Lys Ala Thr Ser Lys Arg Gln Ala Met Ser Gly Met Arg Trp Val Gln

35 40 45

Trp Phe Gly Asp Gly Lys Phe Ser Glu Val Ser Ala Asp Lys Leu Val

50 55 60

Ala Leu Gly Leu Phe

65

<210> 11

<211> 417

<212> DNA

<213> human (H. sapiens)

<400> 11

atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag 60

cgcccgtgta aagataaagt tgattgggtt cagtgtgatg gtggttgtga tgaatagttt 120

catcaggttt gtgttggtgt tagcccggaa atggcagaaa atgaagatta tatttgcatc 180

aactgcgcaa aaaaacaggg tggcagcagc ggcagcagca gcggtgcaga agaatcagat 240

gatgaaaatg cagtttgtgc agcacagaat tgtcagcgcc cgtgtaaaga taaagttgat 300

tgggttcagt gtgatggtgg ttgtgatgaa tagtttcatc aggtttgtgt tggtgttagc 360

ccggaaatgg cagaaaatga agattatatt tgcatcaact gcgcaaaaaa acagggt 417

<210> 12

<211> 636

<212> DNA

<213> human (H. sapiens)

<400> 12

atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag 60

cgcccgtgta aagataaagt tgattgggtt cagtgtgatg gtggttgtga tgaatggttt 120

catcaggttt gtgttggtgt tagcccggaa atggcagaaa atgaagatta tatttgcatc 180

aactgcgcaa aaaaacaggg tggcagcagc ggcagcagca gcggtgcaga agaatcagat 240

gatgaaaatg cagtttgtgc agcacagaat tgtcagcgcc cgtgtaaaga taaagttgat 300

tgggttcagt gtgatggtgg ttgtgatgaa tggtttcatc aggtttgtgt tggtgttagc 360

ccggaaatgg cagaaaatga agattatatt tgcatcaact gcgcaaaaaa acagggtctg 420

gtgccgcgcg gcagcagcag cggtgcagaa gaatcagatg atgaaaatgc agtttgtgca 480

gcacagaatt gtcagcgccc gtgtaaagat aaagttgatt gggttcagtg tgatggtggt 540

tgtgatgaat ggtttcatca ggtttgtgtt ggtgttagcc cggaaatggc agaaaatgaa 600

gattatattt gcatcaactg cgcaaaaaaa cagggt 636

<210> 13

<211> 747

<212> DNA

<213> Soybean (Glycine max)

<400> 13

ggaaagtctt acccaactgt gagtgctgat taccaggacg ccgttgagaa ggcgaagaag 60

aagctcagag gcttcatcgc tgagaagaga tgcgctcctc taatgctccg tttggcattc 120

cactctgctg gaacctttga caagggcacg aagaccggtg gacccttcgg aaccatcaag 180

caccctgccg aactggctca cagcgctaac aacggtcttg acatcgctgt taggcttttg 240

gagccactca aggcggagtt ccctattttg agctacgccg atttctacca gttggctggc 300

gttgttgccg ttgaggtcac gggtggacct aaggttccat tccaccctgg aagagaggac 360

aagcctgagc caccaccaga gggtcgcttg cccgatccca ctaagggttc tgaccatttg 420

agagatgtgt ttggcaaagc tatggggctt actgaccaag atatcgttgc tctatctggg 480

ggtcacacta ttggagctgc acacaaggag cgttctggat ttgagggtcc ctggacctct 540

aatcctctta ttttcgacaa ctcatacttc acggagttgt tgagtggtga gaaggaaggt 600

ctccttcagc taccttctga caaggctctt ttgtctgacc ctgtattccg ccctctcgtt 660

gacaaatatg cagcggacga agatgccttc tttgctgatt acgctgaggc tcaccaaaag 720

ctttccgagc ttgggtttgc tgatgcc 747

<210> 14

<211> 249

<212> PRT

<213> Soybean (Glycine max)

<400> 14

Gly Lys Ser Tyr Pro Thr Val Ser Ala Asp Tyr Gln Asp Ala Val Glu

1 5 10 15

Lys Ala Lys Lys Lys Leu Arg Gly Phe Ile Ala Glu Lys Arg Cys Ala

20 25 30

Pro Leu Met Leu Arg Leu Ala Phe His Ser Ala Gly Thr Phe Asp Lys

35 40 45

Gly Thr Lys Thr Gly Gly Pro Phe Gly Thr Ile Lys His Pro Ala Glu

50 55 60

Leu Ala His Ser Ala Asn Asn Gly Leu Asp Ile Ala Val Arg Leu Leu

65 70 75 80

Glu Pro Leu Lys Ala Glu Phe Pro Ile Leu Ser Tyr Ala Asp Phe Tyr

85 90 95

Gln Leu Ala Gly Val Val Ala Val Glu Val Thr Gly Gly Pro Lys Val

100 105 110

Pro Phe His Pro Gly Arg Glu Asp Lys Pro Glu Pro Pro Pro Glu Gly

115 120 125

Arg Leu Pro Asp Pro Thr Lys Gly Ser Asp His Leu Arg Asp Val Phe

130 135 140

Gly Lys Ala Met Gly Leu Thr Asp Gln Asp Ile Val Ala Leu Ser Gly

145 150 155 160

Gly His Thr Ile Gly Ala Ala His Lys Glu Arg Ser Gly Phe Glu Gly

165 170 175

Pro Trp Thr Ser Asn Pro Leu Ile Phe Asp Asn Ser Tyr Phe Thr Glu

180 185 190

Leu Leu Ser Gly Glu Lys Glu Gly Leu Leu Gln Leu Pro Ser Asp Lys

195 200 205

Ala Leu Leu Ser Asp Pro Val Phe Arg Pro Leu Val Asp Lys Tyr Ala

210 215 220

Ala Asp Glu Asp Ala Phe Phe Ala Asp Tyr Ala Glu Ala His Gln Lys

225 230 235 240

Leu Ser Glu Leu Gly Phe Ala Asp Ala

245

<210> 15

<211> 47

<212> DNA

<213> Artificial sequence (synthetic sequence)

<220>

<221> misc_feature

<222> (21)..(22)

<223> n is a, c, g, or t

<400> 15

taagatgggt agactgctac nnkccttttg gtcatccttc ttttgag 47

<210> 16

<211> 25

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 16

gtagcagtct acccatctta tctcc 25

<210> 17

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<220>

<221> misc_feature

<222> (21)..(22)

<223> n is a, c, g, or t

<400> 17

aagttcttgg ctgcggggtg nnkgaacaac aactggtcaa ttcagc 46

<210> 18

<211> 20

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 18

caccccgcag ccaagaactt 20

<210> 19

<211> 50

<212> DNA

<213> Artificial sequence (synthetic sequence)

<220>

<221> misc_feature

<222> (21)..(22)

<223> n is a, c, g, or t

<220>

<221> misc_feature

<222> (27)..(28)

<223> n is a, c, g, or t

<400> 19

accctatttt ccaccagctg nnkgccnnkc ggctcttctc caagcatgag 50

<210> 20

<211> 24

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 20

cagctggtgg aaaatagggt agtg 24

<210> 21

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 21

ctggtgccgc gcggcagcat gtcccctata ctaggttatt ggaaaa 46

<210> 22

<211> 40

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 22

gtggcgacca tcctccaaaa tgaagcatgc accattcctt 40

<210> 23

<211> 45

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 23

taagaaggag atatacatat gagcggtgca gaagaatcag atgat 45

<210> 24

<211> 44

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 24

catgctgccg cgcggcacca gaccctgttt ttttgcgcag ttga 44

<210> 25

<211> 22

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 25

tgaagcatgc accattcctt gc 22

<210> 26

<211> 58

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 26

catatgtata tctccttctt aaagttaaac aaaattattt ctagcccaaa aaaacggg 58

<210> 27

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 27

cgtgtaaaga taaagttgat taggttcagt gtgatggtgg ttgtga 46

<210> 28

<211> 25

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 28

atcaacttta tctttacacg ggcgc 25

<210> 29

<211> 27

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 29

tttcatcagg tttgtgttgg tgttagc 27

<210> 30

<211> 47

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 30

ccaacacaaa cctgatgaaa ctattcatca caaccaccat cacactg 47

<210> 31

<211> 59

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 31

actgcgcaaa aaaacagggt ggcagcagcg gcagcagcag cggtgcagaa gaatcagat 59

<210> 32

<211> 42

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 32

atgctgccgc gcggcaccag accctgtttt tttgcgcagt tg 42

<210> 33

<211> 39

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 33

accctgtttt tttgcgcagt tgatgcaaat ataatcttc 39

<210> 34

<211> 42

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 34

atgctgccgc gcggcaccag gcttccacct ttctgttttt cg 42

<210> 35

<211> 59

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 35

actgcgcaaa aaaacagggt ggcagcagcg gcagcagcgc aagtcaggaa tttgaagta 59

<210> 36

<211> 40

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 36

ggcagcagcg gcagcagcgt gagcaagggc gaggagctgt 40

<210> 37

<211> 22

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 37

catggtggcg accggtagcg ct 22

<210> 38

<211> 42

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 38

cgctaccggt cgccaccatg agcggtgcag aagaatcaga tg 42

<210> 39

<211> 43

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 39

cgctgctgcc gctgctgcca ccctgttttt ttgcgcagtt gat 43

<210> 40

<211> 43

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 40

ctgcacggaa gcttgccacc atggataaga agccgctgga tgt 43

<210> 41

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 41

tagtgatggt gatggtggtg tttgctaaga ggctgaaact tcacct 46

<210> 42

<211> 25

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 42

caccaccatc accatcacta aaccc 25

<210> 43

<211> 22

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 43

ggtggcaagc ttccgtgcag tt 22

<210> 44

<211> 23

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 44

taactagtcc actgagatcg acg 23

<210> 45

<211> 26

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 45

cttatcgtcg tcatccttgt agtcca 26

<210> 46

<211> 46

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 46

tctggcagcg gttctgctag cggaaagtct tacccaactg tgagtg 46

<210> 47

<211> 40

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 47

cgatctcagt ggactagtta ggcatcagca aacccaagct 40

<210> 48

<211> 41

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 48

acaaggatga cgacgataag agcggtgcag aagaatcaga t 41

<210> 49

<211> 51

<212> DNA

<213> Artificial sequence (synthetic sequence)

<400> 49

gctagcagaa ccgctgccag aaccgctgcc accctgtttt tttgcgcagt t 51

Claims

1. A method for improving cation-pi interaction by genetic code expansion is characterized in that tryptophan of an aromatic cage forming cation-pi interaction in a biological molecule is replaced by a tryptophan analogue by utilizing a genetic code expansion technology so as to improve the binding energy of the cation-pi interaction.

2. A method according to claim 1, characterized in that the method comprises the steps of:

S1, designing and synthesizing tryptophan analogues substituted with strong electron-donating side chain groups, wherein the tryptophan analogues are unnatural amino acids selected from one of 6-methyl-tryptophan (A1), 6-methoxy-tryptophan (A2), 7-methyl-tryptophan (A3), 7-methoxy-tryptophan (A4), 6,7-methoxy-tryptophan (A5), 6,7-methyl-tryptophan (A6), 7,8-dihydrofuran-tryptophan (A7), 6,7-dihydrofuran-tryptophan (A8), 7,8-furan-tryptophan (A9), 6,7-furan-tryptophan (A10), 6,7-dioxole-tryptophan (A11) or 6,7-cyclopentane-tryptophan (A12); the structural formulas of the tryptophan analogues A1 to A12 are as follows:

S3, taking the biological molecule forming the cation-pi interaction as the research object, and using the genetic code expansion technology to specifically introduce the tryptophan analogue into the biological molecule by means of the chimeric phenylalanine aminoacyl-tRNA synthetase mutant, to obtain a protein containing the tryptophan analogue.

3. The method according to claim 1, characterized in that the tryptophan analogues are synthesized as follows: indole B substituted at different positions is used as a reactant and reacted to obtain the target product,

the chemical structural formula of the indole substituted at different positions is as follows:

the general structural formula of the target product is as follows:

wherein X is selected from: oxygen atom or carbon atom.

4. The method of claim 1, wherein: the method for synthesizing the tryptophan analogs A1 to A12 comprises the following steps:

(1) synthesis of compounds B6, B7, B8, B9, B10: aniline (G6, G7, G8, G9 or G10) and triethanolamine are used as reactants, with RuCl₃·nH₂O, SnCl₂·2H₂O and PPh₃ as catalysts, and reacted in anhydrous dioxane to obtain the starting material compound B; (2) synthesis of compounds B11, B12: aniline (G11 or G12), chloral hydrate and hydroxylamine hydrochloride are used as reactants, with sulfuric acid as a catalyst and water as a solvent, to obtain a crude product; the crude product is reacted with methanesulfonic acid to obtain an isatin product, which is finally reduced with lithium aluminum hydride to obtain the starting material compound B;

5. The synthesis method of claim 4, wherein in the fourth step, the amount of the palladium acetate catalyst is 2 mol% relative to the substrate (compound D);

the reaction temperature in the first step is 90 ℃, the reaction temperature in the second step is 0 ℃, the reaction temperature in the third step is 0 ℃, the reaction temperature in the fourth step is 40 ℃, the reaction temperature in the fifth step is 25 ℃ and the reaction temperature in the sixth step is 0 ℃.

6. The method of claim 1, wherein, in step S2:

(1) constructing a saturation mutagenesis gene library for the amino acids in the amino acid binding pocket of the chimeric phenylalanyl-tRNA synthetase, and screening for chimeric phenylalanyl-tRNA synthetase mutants that specifically recognize a tryptophan analogue;

(2) identifying the recognition efficiency and specificity of the chimeric phenylalanine aminoacyl-tRNA synthetase mutant by GFP fluorescence and LC-MS;

(3) the chimeric phenylalanine aminoacyl-tRNA synthetase mutant obtained by screening is applied to expression in bacteria, cells, viruses and other hosts.
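The library primers in the sequence listing (SEQ ID NO: 15, 17 and 19) carry the degenerate codon NNK at the randomized positions. As an illustrative sketch, the snippet below expands NNK with the standard genetic code to show what such a saturation library can encode per site. The variable and function names are hypothetical, and the code makes no claim about the actual composition of the library used in the invention.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity codes needed here: N = any base, K = G or T.
IUPAC = {"N": "ACGT", "K": "GT"}


def expand(degenerate_codon: str) -> list:
    """List every concrete codon represented by a degenerate codon."""
    choices = (IUPAC.get(base, base) for base in degenerate_codon.upper())
    return ["".join(c) for c in product(*choices)]


nnk_codons = expand("NNK")
amino_acids = {CODON_TABLE[c] for c in nnk_codons} - {"*"}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

# Nucleotide-level diversity for a primer with two NNK sites (as in SEQ ID NO: 19).
two_site_variants = len(nnk_codons) ** 2
```

NNK covers all 20 amino acids with 32 codons and leaves only the TAG stop codon, which is why it is a frequent choice for saturation mutagenesis.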

7. The method of claim 1, wherein, in step S3:

(1) the codon of the tryptophan site that forms the aromatic cage of the decoding protein is mutated to a stop codon (TAG),

(2) the decoding protein mutant and the chimeric phenylalanyl-tRNA synthetase mutant are co-transformed and co-expressed, and the corresponding tryptophan analogue is added during expression,

(3) the decoding protein variant is purified according to the GST-tag protein purification method, and the fidelity of the decoding protein variant is identified by LC-MS.
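The mutation in item (1) can be illustrated on the PHD sequences in the sequence listing. The sketch below replaces a tryptophan codon (tgg) with the amber codon (tag) and compares the result with the start of SEQ ID NO: 3 and SEQ ID NO: 4. The function name is hypothetical, and the codon numbering counts the initial Met as codon 1, which is an assumption of this sketch rather than the numbering convention of the description.

```python
def introduce_amber(dna: str, codon_number: int) -> str:
    """Replace the Trp codon (tgg) at a 1-based codon position with tag."""
    start = 3 * (codon_number - 1)
    codon = dna[start:start + 3].lower()
    if codon != "tgg":
        raise ValueError(f"codon {codon_number} is {codon!r}, not tgg")
    return dna[:start] + "tag" + dna[start + 3:]


# Shared first 60 nt of the PHD coding sequence (spaces are listing grouping).
FIRST_60 = "atgagcggtg cagaagaatc agatgatgaa aatgcagttt gtgcagcaca gaattgtcag"

# Nucleotides 61-120 as they open SEQ ID NO: 12 (both Trp codons intact).
PHD_START = (FIRST_60
             + "cgcccgtgta aagataaagt tgattgggtt cagtgtgatg gtggttgtga tgaatggttt"
             ).replace(" ", "")
# Same region as listed in SEQ ID NO: 3 and SEQ ID NO: 4.
SEQ3_START = (FIRST_60
              + "cgcccgtgta aagataaagt tgattaggtt cagtgtgatg gtggttgtga tgaatggttt"
              ).replace(" ", "")
SEQ4_START = (FIRST_60
              + "cgcccgtgta aagataaagt tgattgggtt cagtgtgatg gtggttgtga tgaatagttt"
              ).replace(" ", "")

mutant_29 = introduce_amber(PHD_START, 29)  # first Trp codon in this region
mutant_39 = introduce_amber(PHD_START, 39)  # second Trp codon in this region
```

Checking that the target codon really is tgg before editing avoids silently introducing a stop codon at the wrong position when the numbering convention is off by one.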

8. The method of claim 1, wherein: the nucleotide sequence and the amino acid sequence of the chimeric phenylalanine-tRNA synthetase mutant for recognizing the tryptophan analogue A1-A6 are respectively shown as SEQ ID NO: 1-2.

9. The method of claim 1, wherein: the method takes histone methylation decoding protein structural domain as a research object, and the decoding protein structural domain is any one of Chromo, PHD, PWWP, Tudor, MBT, CW, SPIN and BAH structural domain.

10. Use of the protein with tryptophan analogues obtained by the method of claim 1 to establish a super-parent recognition system for specifically recognizing histone methylation-modified decoded proteins.