CN113899902A - Tyrosine phosphatase substrate identification method

Info

Publication number: CN113899902A
Application number: CN202010573873.7A
Authority: CN
Inventors: 范高峰; 庄敏; 张佳丽
Original assignee: ShanghaiTech University
Current assignee: ShanghaiTech University
Priority date: 2020-06-22
Filing date: 2020-06-22
Publication date: 2022-01-07

Abstract

The invention relates to the field of biotechnology, and in particular to a method for identifying tyrosine phosphatase substrates. The method comprises: placing a system comprising potential substrates that interact with a tyrosine phosphatase in the presence of a tyrosine phosphatase-PafA fusion protein and a pup protein, and labeling the pup protein; and enriching the potential substrates that interact with the tyrosine phosphatase by means of the labeled pup protein. The tyrosine phosphatase-PafA fusion protein comprises a tyrosine phosphatase fragment and a PafA fragment, and the tyrosine phosphatase fragment carries a substrate-trapping mutation. The method and system can extend tyrosine phosphatase research to any important member of the tyrosine phosphatase family and have good prospects for industrial application.

Description

Tyrosine phosphatase substrate identification method

Technical Field

The invention relates to the technical field of biology, in particular to a method for identifying a tyrosine phosphatase substrate.

Background

Signal transduction in an organism depends on a complex and precise balance. Protein adaptors, protein effectors, proteases and their substrates, and other protein complexes must cooperate to deliver extracellular signals to intracellular targets in a timely and accurate manner. This balance is controlled in many ways, one of which is the phosphorylation and dephosphorylation of key proteins by kinases and phosphatases during signal transduction. Studying how kinases and phosphatases are dynamically balanced in organisms is therefore of great importance for understanding signal transduction in cells.

Protein-protein interactions have long been a central topic in biochemistry and cell biology. As research has progressed, it has become increasingly clear that most intracellular protein interactions are transient, including the interaction between tyrosine phosphatases and their catalytic substrates. Conventional methods for detecting protein interactions, including co-immunoprecipitation combined with mass spectrometry, fluorescence resonance energy transfer (FRET) and yeast two-hybrid techniques, cannot meet the need for real-time detection of protein-protein interactions. Ligase-mediated proximity labeling is a technology reported in recent years that converts interactions between proteins (including weak and transient interactions) into stable covalent linkages in living cells, and it provides a powerful tool for detecting protein interactions. The proximity labeling technologies in wide use today include BioID and APEX, but both have limitations: BioID gives a high background, and APEX requires the addition of hydrogen peroxide, which damages cells.

Disclosure of Invention

In view of the above-mentioned drawbacks of the prior art, it is an object of the present invention to provide a method for identifying a substrate for tyrosine phosphatase, which solves the problems of the prior art.

To achieve the above and other related objects, according to one aspect of the present invention, there is provided a method for identifying a substrate for tyrosine phosphatase, comprising:

1) placing a system comprising potential substrates that interact with tyrosine phosphatase in the presence of a tyrosine phosphatase-PafA fusion protein and a pup protein, and labeling the pup protein;

2) enriching the potential substrates that interact with tyrosine phosphatase by means of the labeled pup protein;

wherein the tyrosine phosphatase-PafA fusion protein comprises a tyrosine phosphatase fragment and a PafA fragment, and the tyrosine phosphatase fragment has a substrate-trapping mutation.

In some embodiments of the invention, the tyrosine phosphatase fragment is selected from the group consisting of a SHP1 fragment, a SHP2 fragment, a PTP1B fragment, a TCPTP fragment, a PTPRK fragment, and a CD45 fragment.

In some embodiments of the invention, in step 1), the system comprising potential substrates that interact with tyrosine phosphatase is a target cell, and step 1) is specifically: culturing the target cell in the presence of the tyrosine phosphatase-PafA fusion protein and the pup protein, and labeling the pup protein.

In some embodiments of the invention, step 1) further comprises: lysing the cultured cells to provide a lysate.

In some embodiments of the invention, the tyrosine phosphatase-PafA fusion protein and/or the pup protein is expressed by a cell of interest.

In some embodiments of the invention, the pup protein comprises a pup fragment and a tag protein fragment, wherein the tag protein is selected from the group consisting of biotin tag proteins.

In some embodiments of the invention, in step 1), the pup protein is labeled with biotin.

In some embodiments of the invention, in step 2), the substrate potentially interacting with the target protein is enriched by the labeled pup protein based on the biotin-avidin system.

In another aspect, the invention provides a tyrosine phosphatase substrate identification system comprising a combination of a tyrosine phosphatase-PafA fusion protein comprising a tyrosine phosphatase fragment and a PafA fragment, and a pup protein, the tyrosine phosphatase fragment having a substrate capture mutation.

In some embodiments of the invention, a system comprising potential substrates that interact with tyrosine phosphatase is also included.

In some embodiments of the invention, a marker for labeling the pup protein is also included.

In another aspect, the invention provides an expression system that either comprises a construct containing polynucleotides encoding the tyrosine phosphatase-PafA fusion protein and the pup protein, or has a genome into which exogenous polynucleotides encoding the tyrosine phosphatase-PafA fusion protein and the pup protein are integrated.

Drawings

FIG. 1 is a schematic diagram of the principle of searching for phosphatase substrates with PEPSI according to the invention.

FIG. 2 is a volcano plot of the mass spectrometry results for SHP1-DA versus SHP1-WT in example 1 of the present invention.

FIG. 3 shows the experimental results for the interaction between THEMIS and SHP1 in example 2 of the present invention.

FIG. 4 shows the experimental results for phosphorylation of THEMIS by LCK in example 3 of the present invention.

FIG. 5 shows the experimental results for LCK acting on the Tyr34 site of THEMIS in example 4 of the present invention.

FIG. 6 shows the experimental results for dephosphorylation of THEMIS by SHP1 in example 5 of the present invention.

FIG. 7 shows the experimental results for SHP1 dephosphorylating the THEMIS peptide phosphorylated at the Tyr34 site in example 6 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments, and other advantages and effects of the present invention will be apparent to those skilled in the art from the disclosure of the present specification.

Through extensive experimental study, the inventors found that when a tyrosine phosphatase-PafA fusion protein carrying a substrate-trapping mutation binds the phosphorylation site of an interacting substrate in its surroundings, PafA catalyzes the attachment of the pup protein to a lysine site of the substrate that interacts with the tyrosine phosphatase fragment of the fusion protein. By labeling the pup protein, the substrates of the tyrosine phosphatase can then be separated and identified. This provides an efficient method and system for identifying tyrosine phosphatase substrates.

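In a first aspect, the present invention provides a method for identifying a substrate for tyrosine phosphatase, comprising:

1) placing a system comprising potential substrates that interact with tyrosine phosphatase in the presence of a tyrosine phosphatase-PafA fusion protein and a pup protein, and labeling the pup protein;

2) enriching the potential substrates that interact with tyrosine phosphatase by means of the labeled pup protein;

wherein the tyrosine phosphatase-PafA fusion protein comprises a tyrosine phosphatase fragment and a PafA fragment, and the tyrosine phosphatase fragment has a substrate-trapping mutation.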

In the method for identifying a tyrosine phosphatase substrate provided by the invention, the tyrosine phosphatase-PafA fusion protein comprises a tyrosine phosphatase fragment. The tyrosine phosphatase fragment may be a fragment of any enzyme of the tyrosine phosphatase family, for example an SHP1 (Gene ID: 5777) fragment, an SHP2 (Gene ID: 5781) fragment, a PTP1B (Gene ID: 5770) fragment, a TCPTP (Gene ID: 5771) fragment, a PTPRK (Gene ID: 5796) fragment, a CD45 (Gene ID: 5788) fragment, and the like. The tyrosine phosphatase fragment is generally derived from human (Homo sapiens) or mouse (Mus musculus), whose sequences are highly homologous. As described above, the tyrosine phosphatase fragment in the fusion protein usually has a substrate-trapping mutation. A substrate-trapping mutation generally means that the mutated tyrosine phosphatase fragment can still bind the phosphorylation site of a substrate but, having lost its phosphatase activity, does not readily release the substrate after binding. Compared with the corresponding wild-type tyrosine phosphatase, a fragment with a substrate-trapping mutation therefore binds the substrate with higher affinity and for a longer time, and can be used to search for substrates that interact with the tyrosine phosphatase. In 1997, Andrew J. Flint et al. reported that members of the tyrosine phosphatase family all have a highly conserved PTP domain containing 27 invariant residues, and that a mutation at one of these positions changes the spatial conformation of the enzyme; for example, the D181A mutation gives PTP1B substrate-trapping properties (DOI: 10.1073/pnas.94.5.1680).
Members of the tyrosine phosphatase family differ in amino acid sequence, so their substrate-trapping mutation sites differ as well. The sites nevertheless follow a pattern: they typically lie in the highly conserved substrate-binding pocket of the PTP domain, at an aspartic acid (Asp) or cysteine (Cys) residue. Examples of mutation sites in different phosphatases are SHP1 D419A, PTP1B D181A, SHP2 D425A and TCPTP D182A. Specifically, the amino acid sequence of the tyrosine phosphatase fragment may comprise: a) an amino acid sequence shown in one of SEQ ID NO.1-4; or b) an amino acid sequence having 80% or more sequence similarity to the amino acid sequence shown in any one of SEQ ID NO.1-4 and having the function of the amino acid sequence defined in a). Specifically, the amino acid sequence in b) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in any one of SEQ ID NO.1-4 by substituting, deleting or adding one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids, or by adding one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids to the N-terminus and/or C-terminus, and that retains the function of the polypeptide fragment shown in any one of SEQ ID NO.1-4, for example a polypeptide fragment that can bind the phosphorylation site of a substrate but has lost tyrosine phosphatase activity and does not readily release the substrate after binding. The amino acid sequence in b) may have 80%, 85%, 90%, 93%, 95%, 97% or 99% or more similarity to one of SEQ ID NO.1-4.
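The mutation notation used above (for example D181A or D419A) names the wild-type residue, its 1-based position and the replacement residue. A minimal Python sketch of applying such a substitution to a protein sequence is given below. The function name `apply_point_mutation` and the 10-residue toy sequence are hypothetical and serve only as an illustration; they are not taken from the sequence listing.

```python
import re


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Apply a substitution written as <wild type><1-based position><mutant>, e.g. 'D181A'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_type, mutant = match.group(1), match.group(3)
    index = int(match.group(2)) - 1  # convert to 0-based index
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position {index + 1} is outside the sequence")
    # Guard against numbering mismatches between the notation and the sequence.
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {index + 1}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Hypothetical toy sequence (not a real phosphatase); residue 4 is Asp (D).
toy_sequence = "MKVDLLRSCG"
print(apply_point_mutation(toy_sequence, "D4A"))  # MKVALLRSCG
```

Checking the wild-type residue before substituting is a deliberate choice: position numbering differs between isoforms and constructs, so a mismatch is reported rather than silently mutating the wrong residue.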

In the method for identifying a substrate for tyrosine phosphatase provided by the invention, the tyrosine phosphatase-PafA fusion protein comprises a PafA fragment. The PafA fragment is typically derived from Corynebacterium glutamicum. The amino acid sequence of the PafA fragment may include: c) the amino acid sequence shown in SEQ ID NO.4; or d) an amino acid sequence having 80% or more sequence similarity to SEQ ID NO.4 and having the function of the amino acid sequence defined in c). Specifically, the amino acid sequence in d) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.4 by substituting, deleting or adding one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids, or by adding one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids to the N-terminus and/or C-terminus, and that retains the function of the polypeptide fragment shown in SEQ ID NO.4, for example the ability to catalyze the modification of the pup protein onto a lysine site of a substrate that interacts with the tyrosine phosphatase fragment of the tyrosine phosphatase-PafA fusion protein. The amino acid sequence in d) may have 80%, 85%, 90%, 93%, 95%, 97% or 99% or more similarity to SEQ ID NO.4.

In the method for identifying a substrate for tyrosine phosphatase provided by the present invention, the pup protein may include a pup fragment. The pup fragment is usually derived from Corynebacterium glutamicum. The amino acid sequence of the pup fragment may include: e) the amino acid sequence shown in SEQ ID NO.5; or f) an amino acid sequence having 80% or more sequence similarity to SEQ ID NO.5 and having the function of the amino acid sequence defined in e). Specifically, the amino acid sequence in f) refers to a polypeptide fragment that is obtained from the amino acid sequence shown in SEQ ID NO.5 by substituting, deleting or adding one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids, or by adding one or more (specifically 1-50, 1-30, 1-20, 1-10, 1-5, 1-3, 1, 2 or 3) amino acids to the N-terminus and/or C-terminus, and that retains the function of the polypeptide fragment shown in SEQ ID NO.5, for example a polypeptide fragment that the PafA fragment of the tyrosine phosphatase-PafA fusion protein can catalytically attach to a lysine site of a substrate interacting with the tyrosine phosphatase fragment of the fusion protein. The amino acid sequence in f) may have 80%, 85%, 90%, 93%, 95%, 97% or 99% or more similarity to SEQ ID NO.5.
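The 80% threshold stated above for the phosphatase, PafA and pup fragments can be checked once a variant has been aligned to its reference sequence. The specification does not say how similarity is computed, so the Python sketch below assumes one simple proxy: percent identity over the columns of an existing pairwise alignment, with "-" marking gaps. The function names and the toy sequences are hypothetical, and the alignment step itself is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" and res_b == "-":
            continue  # a column that is a gap in both sequences carries no information
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if identity is at or above the threshold (80% by default, as in the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical toy pair differing at one of ten positions.
print(percent_identity("MKVDLLRSCG", "MKVALLRSCG"))  # 90.0
print(meets_threshold("MKVDLLRSCG", "MKVALLRSCG"))   # True
```

A similarity score based on a substitution matrix would count conservative replacements as partial matches and could give a higher value than plain identity, so the result here should be read as a conservative estimate.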

The method for identifying a tyrosine phosphatase substrate provided by the invention may comprise: placing a system comprising potential substrates that interact with tyrosine phosphatase in the presence of the tyrosine phosphatase-PafA fusion protein and the pup protein, and labeling the pup protein. A system comprising potential substrates that interact with tyrosine phosphatase generally means any system that may contain a substrate capable of interacting with tyrosine phosphatase, for example a target cell. When such a system is brought together with the fusion protein and the pup protein, the tyrosine phosphatase fragment of the fusion protein binds any substrate in the system that can interact with tyrosine phosphatase, and the PafA fragment catalyzes the attachment of the pup protein to a lysine site of that substrate. The substrate is thereby marked by the fusion protein, and the pup label remains on the substrate after the substrate separates from the tyrosine phosphatase fragment. Consequently, if the system does contain a substrate capable of interacting with tyrosine phosphatase, that substrate can be enriched in subsequent processing by means of the labeled pup protein.

In the method for identifying a tyrosine phosphatase substrate provided by the invention, the system comprising potential substrates that interact with tyrosine phosphatase may be a target cell. That is, the target cell is placed in the presence of the tyrosine phosphatase-PafA fusion protein and the pup protein, so that any substrate in the target cell that can interact with tyrosine phosphatase is able to interact with the fusion protein and the pup protein. If the target cell does contain such a substrate, the substrate can be enriched by means of the labeled pup protein. The target cell is generally a eukaryotic cell, more specifically a mammalian cell, for example a human, mouse, rat or hamster cell. In one embodiment of the invention, the target cell may be Jurkat, HEK293, HeLa, HepG2, CAOV4, K562, MCF7, RAW264.7, C2C12, 3T3L1, CHO-K1 or the like.

In the method for identifying a tyrosine phosphatase substrate provided by the invention, when the system comprising potential substrates that interact with tyrosine phosphatase is a target cell, the target cell is generally made to express the tyrosine phosphatase-PafA fusion protein and/or the pup protein, so that the target cell is in the presence of both proteins. Suitable methods for expressing the tyrosine phosphatase-PafA fusion protein and/or the pup protein in the target cell are known to those skilled in the art. For example, a construct comprising polynucleotides encoding the fusion protein and the pup protein can be transfected into the target cell, or exogenous polynucleotides encoding the fusion protein and the pup protein can be integrated into the genome of the target cell. Furthermore, when the system is a target cell, the cultured cells usually need to be lysed to provide a lysate, so that the substrate capable of interacting with tyrosine phosphatase can be enriched by means of the labeled pup protein.

In the method for identifying a tyrosine phosphatase substrate provided by the invention, an appropriate labeling system is generally selected to label the pup protein, so that the potential substrates that interact with tyrosine phosphatase can be enriched by means of the labeled pup protein. For example, the pup protein may include a pup fragment as described above and may further include a tag protein fragment. The tag protein may be a biotin tag protein, FLAG, MYC, HA or the like. For example, the amino acid sequence of the biotin tag protein may include the sequence shown in SEQ ID NO.6, that of FLAG the sequence shown in SEQ ID NO.7, that of MYC the sequence shown in SEQ ID NO.8, and that of HA the sequence shown in SEQ ID NO.9. When the pup protein includes a tag protein fragment, those skilled in the art can select an appropriate label for the pup protein and an appropriate method for enriching the potential substrates by means of the labeled pup protein. For example, when the pup protein is a fusion protein comprising a biotin tag protein fragment, the pup protein becomes labeled with biotin, and the substrates potentially interacting with the target protein can then be enriched using the biotin-avidin system. As another example, when the pup protein is a fusion protein comprising a tag protein fragment such as FLAG, MYC or HA, the substrates potentially interacting with the target protein can be enriched using the corresponding antibodies.

The method for identifying a tyrosine phosphatase substrate provided by the invention may further comprise: identifying the enriched substrates that interact with the tyrosine phosphatase, to provide a substrate capable of interacting with the tyrosine phosphatase. Suitable methods for identifying the substrate are known to those skilled in the art, for example mass spectrometry.

In a second aspect, the invention provides a tyrosine phosphatase substrate identification system comprising a combination of a tyrosine phosphatase-PafA fusion protein and a pup protein, wherein the fusion protein comprises a tyrosine phosphatase fragment and a PafA fragment, and the tyrosine phosphatase fragment has a substrate-trapping mutation. As described above, when the tyrosine phosphatase-PafA fusion protein with the substrate-trapping mutation binds the phosphorylation site of an interacting substrate in its surroundings, PafA catalyzes the attachment of the pup protein to a lysine site of the substrate that interacts with the tyrosine phosphatase fragment of the fusion protein, and the substrate of the tyrosine phosphatase can be separated and identified by labeling the pup protein.

The tyrosine phosphatase substrate identification system provided by the invention can also comprise a marker for marking the pup protein and/or a system comprising a substrate potentially interacting with tyrosine phosphatase.

In a third aspect, the invention provides an expression system that either comprises a construct containing polynucleotides encoding the tyrosine phosphatase-PafA fusion protein and the pup protein, or has a genome into which exogenous polynucleotides encoding the tyrosine phosphatase-PafA fusion protein and the pup protein are integrated. The expression system is typically the target cell to be studied, which may contain substrates that potentially interact with tyrosine phosphatase. The expression system can be constructed from a cell line corresponding to the target cell to be studied. The cell may be a eukaryotic cell, more specifically a mammalian cell, for example a human, mouse, rat or hamster cell. In one embodiment of the invention, the target cell may be a Jurkat, HEK293, HeLa, HepG2, CAOV4, K562, MCF7, RAW264.7, C2C12, 3T3L1 or CHO-K1 cell, or the like.

The method and system for identifying tyrosine phosphatase substrates can be applied to a wide range of tyrosine phosphatases to explore their interaction proteomes and substrates and to capture transient interactions between a tyrosine phosphatase and its substrates. The human tyrosine phosphatase family has 105 members, each with important functions, and different cell lines can be selected to study the functions of the same phosphatase in different organs. The method and system provided by the invention can therefore extend tyrosine phosphatase research to any important member of the family, which benefits research on cancer, obesity, diabetes, neurodegenerative diseases and other serious diseases, and they have good prospects for industrial application.

The invention of the present application is further illustrated by the following examples, which are not intended to limit the scope of the present application.

Unless otherwise indicated, the experimental methods, detection methods and preparation methods disclosed herein employ techniques conventional in molecular biology, biochemistry, chromatin structure and analysis, analytical chemistry, cell culture, recombinant DNA technology and related fields. These techniques are well described in the literature; see in particular Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, Second edition, Cold Spring Harbor Laboratory Press, 1989 and Third edition, 2001; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, 1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press, San Diego; Wolffe, CHROMATIN STRUCTURE AND FUNCTION, Third edition, Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304, Chromatin (P.M. Wassarman and A.P. Wolffe, eds.), Academic Press, San Diego, 1999; and METHODS IN MOLECULAR BIOLOGY, Vol. 119, Chromatin Protocols (P.B. Becker, ed.), Humana Press, Totowa, 1999.

Example 1

SHP1, a cytoplasmic protein encoded by the PTPN6 gene, is a non-receptor tyrosine phosphatase of class I of the tyrosine phosphatase superfamily. It consists mainly of an N-SH2 domain, a C-SH2 domain and a PTP catalytic domain; in signal transduction, dephosphorylation of tyrosine sites is carried out mainly by the catalytic center of the PTP domain. As a signal-regulating molecule, the protein tyrosine phosphatase PTPN6/SHP1 participates in physiological processes such as cell proliferation and differentiation, apoptosis and intracellular signal transduction. In 1975, Green, M.C. et al. found that the mutant phenotype called motheaten (me) in C57BL/6J mice was an autoimmune disease caused by a disorder of the immune system. The phenotype is caused by the deletion of a cytidine in the third exon of the SHP1 gene, which produces a frameshift mutation so that the expressed protein almost completely loses its phosphatase activity. In 1984, Shultz, L.D. et al. found a phenotype with the same symptoms as motheaten but in which the mice survived longer, hence the name motheaten viable (mev). In mev, an insertion/deletion in the ninth exon of the SHP1 gene causes the expressed protein to lose part of its phosphatase activity. Complete loss of SHP1 function produces the motheaten phenotype, in which a disordered immune system leads to systemic skin ulceration, suggesting that SHP1 is critical for the development of the immune system.
The T cell antigen receptor (TCR) is a receptor on the surface of T lymphocytes that binds the major histocompatibility complex (MHC) on the surface of antigen-presenting cells. Various reports indicate that SHP1 regulates T cell development, TCR signaling and the like. Conditional knockout of SHP1 in CD4-expressing T cells enhances the strength of TCR signaling, so that the numbers of apoptotic cells and of mature cells increase during developmental selection while the developmental process itself is not affected. In contrast, T cell development is impaired in germline SHP1 knockout mice, which indicates that SHP1 plays a very important role in T cell development. However, the molecular mechanism that regulates SHP1 activity in immune cells is not well understood, and little is known about the substrates of SHP1.

In this example, the protein tyrosine phosphatase SHP1 was used as the subject, and its substrates were searched for in the Jurkat T cell line by the PEPSI method (that is, the method for identifying a tyrosine phosphatase substrate provided by the present invention). FIG. 1 is a schematic diagram of the substrate search using PEPSI with the SHP1 substrate-trapping mutant (SHP1-D419A).

First, a Jurkat cell line with inducible expression of a fusion protein of the biotin tag protein and the small protein pup was constructed. Plasmids of a Tet-On system (pHR-Tet3G rtTA; pHR-Tet3G Bio-PupE-IRES-BFP) containing the sequence of the fusion protein (SEQ ID NO.10) were transfected, together with lentiviral packaging plasmids, into HEK293T cells using liposomes. Jurkat cells were infected with the resulting lentivirus to obtain a Jurkat cell line in which expression of the pup-biotin tag fusion protein is induced by doxycycline. Because BFP is expressed together with pup, positive cells can be screened by BFP fluorescence. Next, a plasmid encoding the SHP1-WT + PafA-Myc-6His fusion protein or a plasmid encoding the SHP1-DA + PafA-Myc-6His fusion protein was transfected into HEK293T cells together with lentiviral packaging plasmids using liposomes, and the resulting lentivirus was used to infect the pup-inducible Jurkat cell line described above. This gave Jurkat cell lines stably expressing the SHP1-WT + PafA-Myc-6His fusion protein or the SHP1-DA + PafA-Myc-6His fusion protein for mass spectrometry.

The amino acid sequence of the SHP1-WT + PafA-Myc-6His fusion protein is as follows:

whereas the SHP1-DA + PafA-Myc-6His fusion protein differs from SHP1-WT + PafA-Myc-6His in that the bold and underlined D becomes A.

Cells were adjusted to a concentration of 0.5 million/ml, with 10-20 million starting cells per sample. The cell lines described above were treated with 2 μg/ml doxycycline (Selleck S4163) and 4 μM biotin (Hushi 67000260) and placed in an incubator at 37 °C and 5% carbon dioxide for 28-32 h, so that pup could complete the labeling of potential substrates. Because the biotin tag protein fused to pup carries biotin, the potential substrates can then be enriched with streptavidin-conjugated magnetic beads. After labeling, the cells were taken out of the incubator and collected by centrifugation in a swinging-bucket rotor at 500 g for 3 min at room temperature, and the supernatant was removed. Cells were lysed on ice for 30-60 min using 900 μl per sample of lysis buffer (50 mM Tris, pH 7.5; 200 mM NaCl; 2% Triton; 0.1% SDS) supplemented with protease inhibitor cocktail (APExBIO K1007). The lysate was then centrifuged at 13,000 g for 15 min at 4 °C to remove DNA. To 800 μl of the supernatant, urea was added to a final concentration of 8 M (0.384 g) and DTT (MDBio D023-5g) to a final concentration of 10 mM, and the mixture was incubated at 56 °C on a Thermomixer for 1 h. IAM (iodoacetamide, ABCONE I53892-25G) was added to a final concentration of 25 mM, and the mixture was left at room temperature for 45 min protected from light. DTT was then added to a final concentration of 25 mM, and the mixture was left at room temperature for 0.5 h. The pup-labeled proteins were enriched by adding 50 μl of streptavidin beads (NEB S1420S) and binding on a rotating mixer at room temperature for 1 h. The beads were then washed to remove non-specifically bound proteins, using the following 4 buffers in sequence, each wash being rotated on the mixer at room temperature for 5 min.

Buffer 1: 2 times (8M Urea; 50mM Tris 8.0; 200mM NaCl; 0.2% SDS)

Buffer 2: 2 times (8M Urea; 50mM Tris 8.0; 200mM NaCl)

Buffer 3: 2 times (50mM Tris 8.0; 0.5mM EDTA; 1mM DTT)

Buffer 4: 2 times (50mM ammonium bicarbonate)
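Two of the quantities in the protocol above can be cross-checked with simple arithmetic: the mass of urea needed for 8 M in 800 μl, and the culture volume implied by 10-20 million cells at 0.5 million/ml. The Python sketch below is illustrative only. The function names are hypothetical, the molar mass of urea is taken as 60.06 g/mol, and the calculation assumes that dissolving the solid does not change the volume, which is an approximation.

```python
UREA_MOLAR_MASS_G_PER_MOL = 60.06  # CH4N2O


def solute_mass_g(final_molarity: float, volume_ul: float, molar_mass: float) -> float:
    """Mass of solid needed for a target molarity, ignoring the volume change on dissolving."""
    volume_l = volume_ul * 1e-6
    return final_molarity * volume_l * molar_mass


def culture_volume_ml(total_cells: float, cells_per_ml: float) -> float:
    """Culture volume needed to hold a given number of cells at a given density."""
    return total_cells / cells_per_ml


# 8 M urea in 800 ul of supernatant, as stated in the protocol.
print(round(solute_mass_g(8.0, 800, UREA_MOLAR_MASS_G_PER_MOL), 3))  # 0.384

# 10-20 million cells at 0.5 million cells/ml.
print(culture_volume_ml(10e6, 0.5e6), culture_volume_ml(20e6, 0.5e6))  # 20.0 40.0
```

The first result agrees with the 0.384 g given in the text, and the second indicates a culture volume of roughly 20-40 ml per sample.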

The protein was digested into peptides by adding 6 μg of trypsin (Promega V5113), as 15 μl of trypsin plus 110 μl of 50 mM ammonium bicarbonate, and shaking overnight at 37 °C on a Thermomixer.

The peptides from the overnight digestion were desalted using ZipTip (Merck/Millipore ZTC18S096). The tube digested overnight at 37 °C was removed, and the supernatant was transferred to a new EP tube with the aid of a magnet. 10% formic acid was added to a final concentration of 1% to inactivate the enzyme and precipitate protein; after the addition, the pH was checked with pH test paper to confirm that it was below 3. The ZipTip was wetted with 100% acetonitrile: a 200 μl pipette tip was loaded with acetonitrile and fitted into the ZipTip, and the acetonitrile was pushed through slowly so that the ZipTip was fully wetted. The ZipTip was equilibrated with 0.1% TFA (trifluoroacetic acid) by aspirating directly and dispensing to waste, twice. The sample was then bound by pipetting up and down 4-5 times, taking care not to introduce air bubbles. The sample was desalted by washing with 0.1% TFA, with the wash expelled after each aspiration, repeated 2-3 times. Finally, 50 μl of 70% ACN-0.1% TFA was drawn up with a new 100 μl pipette tip, the ZipTip containing the sample was fitted onto it, the sample was eluted into a new EP tube by pipetting 2-3 times, and the eluate was vacuum-dried. The protein components of the samples were identified by mass spectrometry, and the mass spectrometry data were analyzed using the Perseus platform. The number of unique peptides reflects how specifically the detected peptides map to a protein; to some extent, a higher value indicates a higher abundance of the protein in the sample, so it can serve as a reference index. In this example, the candidate genes were ranked by unique peptides, and THEMIS was selected as the candidate gene for further study. The specific results are shown in FIG. 2.
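The selection logic described above, comparing SHP1-DA against SHP1-WT as in a volcano plot and then ranking candidates by unique peptides, can be sketched in Python as follows. This is not the Perseus workflow used in the example. The thresholds (log2 fold change of at least 1, p-value of at most 0.05), the class and function names, and all data values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    unique_peptides: int
    intensity_da: float  # signal in the substrate-trapping (DA) sample
    intensity_wt: float  # signal in the wild-type (WT) sample
    p_value: float

    @property
    def log2_fold_change(self) -> float:
        """x-axis of a volcano plot: log2(DA / WT)."""
        return math.log2(self.intensity_da / self.intensity_wt)

    @property
    def neg_log10_p(self) -> float:
        """y-axis of a volcano plot: -log10(p)."""
        return -math.log10(self.p_value)


def rank_candidates(candidates, min_log2_fc=1.0, max_p=0.05):
    """Keep candidates enriched in DA over WT, then sort by unique peptides (descending)."""
    enriched = [
        c for c in candidates
        if c.log2_fold_change >= min_log2_fc and c.p_value <= max_p
    ]
    return sorted(enriched, key=lambda c: c.unique_peptides, reverse=True)


# Hypothetical placeholder data, not results from the example.
example = [
    Candidate("GENE_A", 12, 8.0e6, 1.0e6, 0.001),
    Candidate("GENE_B", 20, 1.1e6, 1.0e6, 0.6),   # many peptides but not enriched
    Candidate("GENE_C", 5, 4.0e6, 1.0e6, 0.01),
]
print([c.gene for c in rank_candidates(example)])  # ['GENE_A', 'GENE_C']
```

Filtering on enrichment before ranking keeps abundant background proteins, which yield many unique peptides in both samples, from dominating the candidate list.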

Example 2

To verify the mass spectrometry result, THEMIS-FLAG and SHP1-WT or SHP1-DA were cloned into the pEF6/myc-His A vector by homologous recombination to construct overexpression plasmids.

The base sequence of SHP1-WT is as follows:

atggtgaggtggtttcaccgagacctcagtgggctggatgcagagaccctgctcaagggccgaggtgtccacggtagcttcctggctcggcccagtcgcaagaaccagggtgacttctcgctctccgtcagggtgggggatcaggtgacccatattcggatccagaactcaggggatttctatgacctgtatggaggggagaagtttgcgactctgacagagctggtggagtactacactcagcagcagggtgtcctgcaggaccgcgacggcaccatcatccacctcaagtacccgctgaactgctccgatcccactagtgagaggtggtaccatggccacatgtctggcgggcaggcagagacgctgctgcaggccaagggcgagccctggacgtttcttgtgcgtgagagcctcagccagcctggagacttcgtgctttctgtgctcagtgaccagcccaaggctggcccaggctccccgctcagggtcacccacatcaaggtcatgtgcgagggtggacgctacacagtgggtggtttggagaccttcgacagcctcacggacctggtggagcatttcaagaagacggggattgaggaggcctcaggcgcctttgtctacctgcggcagccgtactatgccacgagggtgaatgcggctgacattgagaaccgagtgttggaactgaacaagaagcaggagtccgaggatacagccaaggctggcttctgggaggagtttgagagtttgcagaagcaggaggtgaagaacttgcaccagcgtctggaagggcagcggccagagaacaagggcaagaaccgctacaagaacattctcccctttgaccacagccgagtgatcctgcagggacgggacagtaacatccccgggtccgactacatcaatgccaactacatcaagaaccagctgctaggccctgatgagaacgctaagacctacatcgccagccagggctgtctggaggccacggtcaatgacttctggcagatggcgtggcaggagaacagccgtgtcatcgtcatgaccacccgagaggtggagaaaggccggaacaaatgcgtcccatactggcccgaggtgggcatgcagcgtgcttatgggccctactctgtgaccaactgcggggagcatgacacaaccgaatacaaactccgtaccttacaggtctccccgctggacaatggagacctgattcgggagatctggcattaccagtacctgagctggcccgaccatggggtccccagtgagcctgggggtgtcctcagcttcctggaccagatcaaccagcggcaggaaagtctgcctcacgcagggcccatcatcgtgcactgcagcgccggcatcggccgcacaggcaccatcattgtcatcgacatgctcatggagaacatctccaccaagggcctggactgtgacattgacatccagaagaccatccagatggtgcgggcgcagcgctcgggcatggtgcagacggaggcgcagtacaagttcatctacgtggccatcgcccagttcattgaaaccactaagaagaagctggaggtcctgcagtcgcagaagggccaggagtcggagtacgggaacatcacctatcccccagccatgaagaatgcccatgccaaggcctcccgcacctcgtccaaacacaaggaggatgtgtatgagaacctgcacactaagaacaagagggaggagaaagtgaagaagcagcggtcagcagacaaggagaagagcaagggttccctcaagaggaag(SEQ ID NO.12)

The base sequence of SHP1-DA is as follows:

atggtgaggtggtttcaccgagacctcagtgggctggatgcagagaccctgctcaagggccgaggtgtccacggtagcttcctggctcggcccagtcgcaagaaccagggtgacttctcgctctccgtcagggtgggggatcaggtgacccatattcggatccagaactcaggggatttctatgacctgtatggaggggagaagtttgcgactctgacagagctggtggagtactacactcagcagcagggtgtcctgcaggaccgcgacggcaccatcatccacctcaagtacccgctgaactgctccgatcccactagtgagaggtggtaccatggccacatgtctggcgggcaggcagagacgctgctgcaggccaagggcgagccctggacgtttcttgtgcgtgagagcctcagccagcctggagacttcgtgctttctgtgctcagtgaccagcccaaggctggcccaggctccccgctcagggtcacccacatcaaggtcatgtgcgagggtggacgctacacagtgggtggtttggagaccttcgacagcctcacggacctggtggagcatttcaagaagacggggattgaggaggcctcaggcgcctttgtctacctgcggcagccgtactatgccacgagggtgaatgcggctgacattgagaaccgagtgttggaactgaacaagaagcaggagtccgaggatacagccaaggctggcttctgggaggagtttgagagtttgcagaagcaggaggtgaagaacttgcaccagcgtctggaagggcagcggccagagaacaagggcaagaaccgctacaagaacattctcccctttgaccacagccgagtgatcctgcagggacgggacagtaacatccccgggtccgactacatcaatgccaactacatcaagaaccagctgctaggccctgatgagaacgctaagacctacatcgccagccagggctgtctggaggccacggtcaatgacttctggcagatggcgtggcaggagaacagccgtgtcatcgtcatgaccacccgagaggtggagaaaggccggaacaaatgcgtcccatactggcccgaggtgggcatgcagcgtgcttatgggccctactctgtgaccaactgcggggagcatgacacaaccgaatacaaactccgtaccttacaggtctccccgctggacaatggagacctgattcgggagatctggcattaccagtacctgagctggcccgcccatggggtccccagtgagcctgggggtgtcctcagcttcctggaccagatcaaccagcggcaggaaagtctgcctcacgcagggcccatcatcgtgcactgcagcgccggcatcggccgcacaggcaccatcattgtcatcgacatgctcatggagaacatctccaccaagggcctggactgtgacattgacatccagaagaccatccagatggtgcgggcgcagcgctcgggcatggtgcagacggaggcgcagtacaagttcatctacgtggccatcgcccagttcattgaaaccactaagaagaagctggaggtcctgcagtcgcagaagggccaggagtcggagtacgggaacatcacctatcccccagccatgaagaatgcccatgccaaggcctcccgcacctcgtccaaacacaaggaggatgtgtatgagaacctgcacactaagaacaagagggaggagaaagtgaagaagcagcggtcagcagacaaggagaagagcaagggttccctcaagaggaag(SEQ ID NO.13)
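One way to locate the substitution that distinguishes SHP1-DA from the wild-type sequence is to compare the two sequences position by position. The sketch below does this on a 15-nucleotide in-frame excerpt copied from the two sequences above (tggcccgaccatggg in the wild type, tggcccgcccatggg in SHP1-DA). The function names and the reduced codon table are illustrative assumptions; the table covers only the codons present in the excerpt.

```python
# Reduced codon table: only the codons that occur in the excerpt below.
CODONS = {"tgg": "W", "ccc": "P", "gac": "D", "gcc": "A", "cat": "H", "ggg": "G"}


def point_differences(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (0-based index, ref base, alt base) for every mismatching position."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def translate(seq: str) -> str:
    """Translate an in-frame sequence; raises KeyError for codons outside CODONS."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


wt_excerpt = "tggcccgaccatggg"  # excerpt of the wild-type sequence
da_excerpt = "tggcccgcccatggg"  # same region of SHP1-DA

print(point_differences(wt_excerpt, da_excerpt))
print(translate(wt_excerpt), "->", translate(da_excerpt))
```

In this excerpt the single base change turns a gac codon (Asp) into gcc (Ala), which is consistent with the "DA" name of the mutant.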

The proteins were overexpressed by introducing the THEMIS-FLAG plasmid and the SHP1-WT or SHP1-DA plasmid into Jurkat cells by electroporation. Before electroporation, the DNA of each sample (30ug of plasmid in total per sample) was mixed well. The cells were collected by centrifugation at 800g for 5min, the supernatant was discarded, and the cells were resuspended in Opti-MEM (Gibco, 31985070) to a final concentration of 40M/ml (4×10^7 cells/ml). 300ul (12M, i.e., 1.2×10^7 cells) of resuspended cells was added to each plasmid mixture, and the tube wall was flicked to mix evenly. The mixture of plasmids and cells was left to stand at room temperature for 15min and then transferred to an electroporation cuvette (Bio-Rad, 1652088). The cells were electroporated in a Bio-Rad instrument using the preset program for Jurkat cells, with the parameters adjusted to 250V and 0.4cm. The cells were left to stand for 15min after the pulse, placed in 10ml of preheated culture medium, and then cultured in an incubator at 37 ℃ with a carbon dioxide concentration of 5% for 24-32h.
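The cell numbers in this step follow from concentration × volume: 300ul at 40 million cells per ml is 12 million cells per sample. A minimal sketch of that arithmetic is given below; the function names are hypothetical, and the harvested cell count in the second calculation is an assumed value for illustration, not a figure from this example.

```python
def cells_in_volume(cells_per_ml: float, volume_ul: float) -> float:
    """Number of cells contained in volume_ul at the given concentration."""
    return cells_per_ml * volume_ul / 1000.0


def resuspension_volume_ml(total_cells: float, target_cells_per_ml: float) -> float:
    """Volume of Opti-MEM (ml) needed to bring a cell pellet to the target concentration."""
    return total_cells / target_cells_per_ml


per_sample = cells_in_volume(cells_per_ml=40e6, volume_ul=300)
print(f"{per_sample:.0f} cells per electroporation sample")

# Assumed pellet of 60 million cells (hypothetical), resuspended to 40M/ml.
print(resuspension_volume_ml(total_cells=60e6, target_cells_per_ml=40e6), "ml")
```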

Wherein the amino acid sequence of the fusion protein of THEMIS-FLAG is as follows: MALSLEEFVHSLDLRTLPRVLEIQAGIYLEGSIYEMFGNECCFSTGEVIKITGLKVKKIIAEICEQIEGCESLQPFELPMNFPGLFKIVADKTPYLTMEEITRTIHIGPSRLGHPCFYHQKDIKLENLIIKQGEQIMLNSVEEIDGEIMVSCAVARNHQTHSFNLPLSQEGEFYECEDERIYTLKEIVEWKIPKNRTRTVNLTDFSNKWDSTNPFPKDFYGTLILKPVYEIQGVMKFRKDIIRILPSLDVEVKDITDSYDANWFLQLLSTEDLFEMTSKEFPIVTEVIEAPEGNHLPQSILQPGKTIVIHKKYQASRILASEIRSNFPKRHFLIPTSYKGKFKRRPREFPTAYDLEIAKSEKEPLHVVATKAFHSPHDKLSSVSVGDQFLVHQSETTEVLCEGIKKVVNVLACEKILKKSYEAALLPLYMEGGFVEVIHDKKQYPISELCKQFRLPFNVKVSVRDLSIEEDVLAATPGLQLEEDITDSYLLISDFANPTECWEIPVGRLNMTVQLVSNFSRDAEPFLVRTLVEEITEEQYYMMRRYESSASHPPPRPPKHPSVEETKLTLLTLAEERTVDLPKSPKRHHVDITKKLHPNQAGLDSKVLIGSQNDLVDEEKERSNRGATAIAETFKNEKHQKPGLEPWKLMDYKDDDDKDI (SEQ ID NO.14)

The cells were collected by centrifugation in a horizontal rotor at 300g for 5min at room temperature and lysed on ice for 10min in 800ul of lysis buffer (20mM Hepes pH 7.5; 150mM NaCl; 1% NP40). The lysate was centrifuged at 4 ℃ and 15000rpm for 10min to remove DNA, and the supernatant was collected. 200ul of the supernatant was taken as the INPUT sample to indicate transfection, 8ul of FLAG beads (Sigma A2220-1ML) was added to the remaining 550ul to enrich THEMIS protein by immunoprecipitation, and WB (Western blotting) was used to detect whether SHP1-WT or SHP1-DA protein was also precipitated. The results are shown in FIG. 3, where LANE1 is the control cell transfected with 2.5ug of pEF6/myc-His A empty plasmid, LANE2 is the cell transfected with 1.25ug of THEMIS-FLAG overexpression plasmid and 1.25ug of pEF6/myc-His A empty plasmid, LANE3 is the cell transfected with 1.25ug of SHP1-WT overexpression plasmid and 1.25ug of pEF6/myc-His A empty plasmid, LANE4 is the cell transfected with 1.25ug of SHP1-DA overexpression plasmid and 1.25ug of pEF6/myc-His A empty plasmid, LANE5 is the cell transfected with 1.25ug of SHP1-WT and 1.25ug of SHP1-DA overexpression plasmid, and LANE6 is the cell transfected with 1.25ug of THEMIS-FLAG and 1.25ug of SHP1-DA overexpression plasmid. As can be seen from FIG. 3, THEMIS does interact with SHP1 protein, and SHP1-DA binds THEMIS more strongly than SHP1-WT.

Example 3

Because SHP1, as a phosphatase, dephosphorylates its substrates, it was first tested whether THEMIS can be phosphorylated; according to previous literature reports, LCK can phosphorylate THEMIS. Thus, LCK and THEMIS-WT-FLAG were expressed in HEK293T cells by lipofection. The liposome (Mirus Trans-IT 2020) and the plasmids were mixed in Opti-MEM, left to stand at room temperature for 20min, slowly added dropwise to the cell culture medium, and the cells were then placed in an incubator at 37 ℃ and 5% carbon dioxide concentration for about 18-24h. The liposome was added at a volume (ul) twice the mass of the plasmid (ug), i.e., 2ul of liposome per 1ug of plasmid. The liposome binds the phosphate groups of the nucleic acid through electrostatic interaction and wraps the DNA molecules to form a DNA-lipid complex, which can be adsorbed by the negatively charged cell membrane. Through membrane fusion or endocytosis, the DNA is then transferred into the cell, where it forms endosomes or enters lysosomes; a small part of the DNA is released from the endosomes, enters the cytoplasm, and further enters the nucleus for transcription and expression. 24h after transfection, the medium supernatant was discarded, 800ul of lysate (20mM Hepes pH 7.5; 150mM NaCl; 1% NP40) was added, and the cells were lysed on ice for 10min. DNA was removed by centrifugation at 4 ℃ and 15000rpm for 10min, and the supernatant was collected. 200ul of the supernatant was taken as the INPUT sample to indicate transfection, 8ul of FLAG beads (Sigma A2220-1ML) was added to the remaining 550ul to enrich the THEMIS protein by immunoprecipitation, and the phosphorylation level of the tyrosine sites of the THEMIS protein was detected by WB. The experiments show that LCK can indeed phosphorylate THEMIS. The specific results are shown in FIG. 4, wherein LANE1 is a control cell transfected with 2.5ug of pEF6/myc-His A empty plasmid, LANE2 is a cell transfected with 1.25ug of THEMIS-WT-FLAG overexpression plasmid and 1.25ug of pEF6/myc-His A empty plasmid, LANE3 is a cell transfected with 1.25ug of LCK overexpression plasmid and 1.25ug of pEF6/myc-His A empty plasmid, and LANE4 is a cell transfected with 1.25ug of LCK and 1.25ug of THEMIS-WT-FLAG overexpression plasmid simultaneously.
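The 2ul-per-1ug rule for the liposome can be applied to the four transfections of FIG. 4 as a simple calculation. The sketch below assumes only the plasmid amounts listed for LANE1 to LANE4 above; the dictionary layout and function name are illustrative choices.

```python
LIPOSOME_UL_PER_UG = 2.0  # 2 ul of liposome per 1 ug of plasmid, as stated in the text

# Plasmid amounts (ug) per lane, as listed for FIG. 4.
lanes = {
    "LANE1": {"empty": 2.5},
    "LANE2": {"THEMIS-WT-FLAG": 1.25, "empty": 1.25},
    "LANE3": {"LCK": 1.25, "empty": 1.25},
    "LANE4": {"LCK": 1.25, "THEMIS-WT-FLAG": 1.25},
}


def liposome_volume_ul(plasmids_ug: dict[str, float]) -> float:
    """Liposome volume (ul) for one transfection, from the total plasmid mass."""
    return LIPOSOME_UL_PER_UG * sum(plasmids_ug.values())


for lane, plasmids in lanes.items():
    print(lane, sum(plasmids.values()), "ug plasmid ->", liposome_volume_ul(plasmids), "ul liposome")
```

Padding each lane with empty plasmid keeps the total DNA at 2.5ug, so every lane receives the same liposome volume.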

Wherein the amino acid sequence of the LCK protein is as follows: MGCGCSSHPE DDWMENIDVC ENCHYPIVPL DGKGTLLIRN GSEVRDPLVT YEGSNPPASP LQDNLVIALH SYEPSHDGDL GFEKGEQLRI LEQSGEWWKA QSLTTGQEGF IPFNFVAKAN SLEPEPWFFK NLSRKDAERQ LLAPGNTHGS FLIRESESTA GSFSLSVRDF DQNQGEVVKH YKIRNLDNGG FYISPRITFP GLHELVRHYT NASDGLCTRL SRPCQTQKPQ KPWWEDEWEV PRETLKLVER LGAGQFGEVW MGYYNGHTKV AVKSLKQGSM SPDAFLAEAN LMKQLQHQRL VRLYAVVTQE PIYIITEYME NGSLVDFLKT PSGIKLTINK LLDMAAQIAE GMAFIEERNY IHRDLRAANI LVSDTLSCKI ADFGLARLIE DNEYTAREGA KFPIKWTAPE AINYGTFTIK SDVWSFGILL TEIVTHGRIP YPGMTNPEVI QNLERGYRMV RPDNCPEELY QLMRLCWKER PEDRPTFDYL RSVLEDFFTA TEGQYQPQP (SEQ ID NO.15)

Example 4

The THEMIS protein has 19 tyrosine sites in total. Each of the 19 tyrosine (Y) sites was mutated to phenylalanine (F) by point mutation to obtain 19 single-point-mutation plasmids, and the 19 different THEMIS YF mutants were tested for changes in the degree of phosphorylation. LCK (SEQ ID NO.15) together with THEMIS-WT-FLAG (SEQ ID NO.14) or one of the 19 THEMIS YF mutants (each with one Y in the sequence of SEQ ID NO.14 changed to F) was expressed in HEK293T cells by lipofection. The liposome (Mirus Trans-IT 2020) and the plasmids were mixed in Opti-MEM (Gibco 31985070), left to stand at room temperature for 20min, and slowly added dropwise to the cell culture medium. The cells were cultured in an incubator at 37 ℃ and 5% carbon dioxide concentration for 24h, the medium supernatant was discarded, 800ul of lysate (20mM Hepes pH 7.5; 150mM NaCl; 1% NP40) was added, and the cells were lysed on ice for 10min. DNA was removed by centrifugation at 4 ℃ and 15000rpm for 10min, and the supernatant was collected. 200ul of the supernatant was taken as the INPUT sample to indicate transfection, and FLAG beads (Sigma A2220-1ML) were added to the remaining 550ul to enrich the THEMIS protein by immunoprecipitation. The tyrosine phosphorylation level of the THEMIS protein was detected by WB to evaluate whether mutating a single site affects the degree of phosphorylation of the THEMIS protein, thereby finding the site at which THEMIS is phosphorylated by LCK. The specific results are shown in FIG. 5, in which EV is the control transfected with the empty plasmid, THEMIS WT only is the cell transfected with the THEMIS-WT overexpression plasmid and an amount of empty plasmid equivalent to LCK, and the rest are cells transfected with both LCK and the corresponding single-point-mutant THEMIS overexpression plasmid. As can be seen from FIG. 5, THEMIS Tyr34 is critical for its phosphorylation: once Tyr at position 34 is mutated, THEMIS is hardly phosphorylated by LCK any more.
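The design of the single-site YF mutants can be illustrated in silico. The sketch below enumerates the tyrosine positions of a protein sequence and builds one Y-to-F variant per position; it is run only on the first 40 residues of SEQ ID NO.14 as written in Example 2, so it yields two of the 19 mutants. The function names are hypothetical, and the code models the sequence design only, not the cloning procedure.

```python
def tyrosine_positions(protein: str) -> list[int]:
    """1-based positions of every tyrosine (Y) in a one-letter protein sequence."""
    return [i + 1 for i, aa in enumerate(protein) if aa == "Y"]


def yf_mutants(protein: str) -> dict[str, str]:
    """One single-point mutant per tyrosine, keyed by names such as 'Y34F'."""
    mutants = {}
    for pos in tyrosine_positions(protein):
        mutants[f"Y{pos}F"] = protein[:pos - 1] + "F" + protein[pos:]
    return mutants


# First 40 residues of the THEMIS-FLAG sequence (SEQ ID NO.14).
n_term = "MALSLEEFVHSLDLRTLPRVLEIQAGIYLEGSIYEMFGNE"

print(tyrosine_positions(n_term))
for name, seq in yf_mutants(n_term).items():
    print(name, seq)
```

Applying the same function to the full THEMIS sequence would produce the complete mutant panel; note that a tag appended to the sequence can contribute additional tyrosines that are not part of THEMIS itself.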

Example 5

7.5ug of THEMIS plasmid and 7.5ug of LCK plasmid were co-expressed in 293T cells (10cm dish) by lipofection. The liposome (Mirus Trans-IT 2020) and the plasmids were mixed in Opti-MEM, left to stand at room temperature for 20min, and slowly added dropwise to the cell culture medium. The cells were incubated at 37 ℃ in an incubator with 5% carbon dioxide concentration for 24h, the medium supernatant was discarded, 1000ul of lysate (20mM Hepes pH 7.5; 150mM NaCl; 1% NP40) was added, and the cells were lysed on ice for 10min. DNA was removed by centrifugation at 4 ℃ and 15000rpm for 10min, the supernatant was collected, and 20ul of FLAG beads (Sigma A2220-1ML) was added to enrich the THEMIS protein by immunoprecipitation. SHP1 has two states, an off state and an on state; in general, the N-SH2 domain of SHP1 binds to the PTP domain, thereby inhibiting its phosphatase activity. In addition, the results of in vitro enzyme activity experiments showed that SHP1-EA (E74A, i.e., the amino acid at position 74 in the aforementioned sequence of SHP-WT is changed from E to A), which is in the activated state, showed the highest phosphatase activity when compared with WT. Therefore, the in vitro dephosphorylation test was carried out using SHP1-EA and the inactive SHP1-CS (C453S, i.e., the amino acid at position 453 in the aforementioned sequence of SHP-WT is changed from C to S). The FLAG beads enriched with phosphorylated THEMIS protein were divided into 12 equal portions and reacted with purified SHP1-EA or SHP1-CS protein at 30 ℃ for 1h. The amounts of protein added were, in order from left to right: 0.25ug; 0.5ug; 1ug; 2ug, with the same amounts used from left to right for SHP1-EA and for SHP1-CS. The reaction was stopped by adding protein loading buffer (250mM Tris-HCl (pH6.8); 10% (W/V) SDS; 0.5% (W/V) bromophenol blue; 50% (V/V) glycerol; 5% (W/V) beta-mercaptoethanol), the protein was denatured by boiling at 95 ℃ for 5min, and the phosphorylation level of THEMIS was then measured by WB. The specific results are shown in FIG. 6. As can be seen from FIG. 6, SHP1 did dephosphorylate THEMIS, and the degree of phosphorylation of THEMIS decreased continuously with increasing concentration of SHP1. This also verifies the mass spectrometry results obtained with the PEPSI system: THEMIS is indeed a novel substrate of SHP1.
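The substitution names used here (E74A, C453S) encode the wild-type residue, its position, and the replacement residue. A minimal sketch of how such a name can be checked against a sequence before it is applied is given below. The two fragments are residues 65-80 and 449-464 of SEQ ID NO.1 in the sequence listing, written in one-letter code; the function name and the fragment-plus-offset interface are illustrative assumptions.

```python
import re


def apply_substitution(fragment: str, first_residue: int, mutation: str) -> str:
    """Apply a substitution such as 'E74A' to a fragment whose first residue
    has the 1-based position first_residue in the full-length protein."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation name: {mutation}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - first_residue
    if not 0 <= index < len(fragment):
        raise ValueError(f"position {position} lies outside the fragment")
    if fragment[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {fragment[index]}")
    return fragment[:index] + replacement + fragment[index + 1:]


# Residues 65-80 and 449-464 of SEQ ID NO.1 (one-letter code).
print(apply_substitution("GGEKFATLTELVEYYT", 65, "E74A"))
print(apply_substitution("IIVHCSAGIGRTGTII", 449, "C453S"))
```

The wild-type check guards against off-by-one numbering errors, which are easy to introduce when a sequence carries a tag or a linker.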

Example 6

The in vitro experiments confirmed that SHP1 can indeed dephosphorylate THEMIS; it was then further tested whether its specific site of action is Tyr34. A peptide carrying the phosphorylation modification at the Tyr34 site was synthesized: EGSI(p-Y)EMFGNECCFS. After solubilization (20ng/ul), the peptide was reacted with purified SHP1-EA or SHP1-CS protein at 30 ℃ for 1h, with 360ng of peptide added per sample. The amounts of protein added were, in order from left to right: 0.5ug; 1ug; 2ug, with the same amounts used from left to right for SHP1-EA and for SHP1-CS. The reaction was stopped by adding protein loading buffer containing no bromophenol blue, and the protein was denatured by boiling at 95 ℃ for 5min. The reaction mixture was spotted onto a nitrocellulose membrane, air-dried, and blocked with a 2.5% BSA solution, and the degree of phosphorylation of the THEMIS peptide was detected using the 4G10 (pTyr) antibody. The specific results are shown in FIG. 7. As can be seen from FIG. 7, SHP1-EA was able to dephosphorylate the pY34 THEMIS peptide.

Taken together, LCK is able to phosphorylate THEMIS, its primary site of action is THEMIS Tyr34, and the phosphorylation of THEMIS Tyr34 can be removed by SHP1.

In conclusion, the present invention effectively overcomes various disadvantages of the prior art and has high industrial utilization value.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Sequence listing

<110> ShanghaiTech University

<120> a method for identifying a substrate for tyrosine phosphatase

<160> 15

<170> SIPOSequenceListing 1.0

<210> 1

<211> 595

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 1

Met Val Arg Trp Phe His Arg Asp Leu Ser Gly Leu Asp Ala Glu Thr

1 5 10 15

Leu Leu Lys Gly Arg Gly Val His Gly Ser Phe Leu Ala Arg Pro Ser

20 25 30

Arg Lys Asn Gln Gly Asp Phe Ser Leu Ser Val Arg Val Gly Asp Gln

35 40 45

Val Thr His Ile Arg Ile Gln Asn Ser Gly Asp Phe Tyr Asp Leu Tyr

50 55 60

Gly Gly Glu Lys Phe Ala Thr Leu Thr Glu Leu Val Glu Tyr Tyr Thr

65 70 75 80

Gln Gln Gln Gly Val Leu Gln Asp Arg Asp Gly Thr Ile Ile His Leu

85 90 95

Lys Tyr Pro Leu Asn Cys Ser Asp Pro Thr Ser Glu Arg Trp Tyr His

100 105 110

Gly His Met Ser Gly Gly Gln Ala Glu Thr Leu Leu Gln Ala Lys Gly

115 120 125

Glu Pro Trp Thr Phe Leu Val Arg Glu Ser Leu Ser Gln Pro Gly Asp

130 135 140

Phe Val Leu Ser Val Leu Ser Asp Gln Pro Lys Ala Gly Pro Gly Ser

145 150 155 160

Pro Leu Arg Val Thr His Ile Lys Val Met Cys Glu Gly Gly Arg Tyr

165 170 175

Thr Val Gly Gly Leu Glu Thr Phe Asp Ser Leu Thr Asp Leu Val Glu

180 185 190

His Phe Lys Lys Thr Gly Ile Glu Glu Ala Ser Gly Ala Phe Val Tyr

195 200 205

Leu Arg Gln Pro Tyr Tyr Ala Thr Arg Val Asn Ala Ala Asp Ile Glu

210 215 220

Asn Arg Val Leu Glu Leu Asn Lys Lys Gln Glu Ser Glu Asp Thr Ala

225 230 235 240

Lys Ala Gly Phe Trp Glu Glu Phe Glu Ser Leu Gln Lys Gln Glu Val

245 250 255

Lys Asn Leu His Gln Arg Leu Glu Gly Gln Arg Pro Glu Asn Lys Gly

260 265 270

Lys Asn Arg Tyr Lys Asn Ile Leu Pro Phe Asp His Ser Arg Val Ile

275 280 285

Leu Gln Gly Arg Asp Ser Asn Ile Pro Gly Ser Asp Tyr Ile Asn Ala

290 295 300

Asn Tyr Ile Lys Asn Gln Leu Leu Gly Pro Asp Glu Asn Ala Lys Thr

305 310 315 320

Tyr Ile Ala Ser Gln Gly Cys Leu Glu Ala Thr Val Asn Asp Phe Trp

325 330 335

Gln Met Ala Trp Gln Glu Asn Ser Arg Val Ile Val Met Thr Thr Arg

340 345 350

Glu Val Glu Lys Gly Arg Asn Lys Cys Val Pro Tyr Trp Pro Glu Val

355 360 365

Gly Met Gln Arg Ala Tyr Gly Pro Tyr Ser Val Thr Asn Cys Gly Glu

370 375 380

His Asp Thr Thr Glu Tyr Lys Leu Arg Thr Leu Gln Val Ser Pro Leu

385 390 395 400

Asp Asn Gly Asp Leu Ile Arg Glu Ile Trp His Tyr Gln Tyr Leu Ser

405 410 415

Trp Pro Asp His Gly Val Pro Ser Glu Pro Gly Gly Val Leu Ser Phe

420 425 430

Leu Asp Gln Ile Asn Gln Arg Gln Glu Ser Leu Pro His Ala Gly Pro

435 440 445

Ile Ile Val His Cys Ser Ala Gly Ile Gly Arg Thr Gly Thr Ile Ile

450 455 460

Val Ile Asp Met Leu Met Glu Asn Ile Ser Thr Lys Gly Leu Asp Cys

465 470 475 480

Asp Ile Asp Ile Gln Lys Thr Ile Gln Met Val Arg Ala Gln Arg Ser

485 490 495

Gly Met Val Gln Thr Glu Ala Gln Tyr Lys Phe Ile Tyr Val Ala Ile

500 505 510

Ala Gln Phe Ile Glu Thr Thr Lys Lys Lys Leu Glu Val Leu Gln Ser

515 520 525

Gln Lys Gly Gln Glu Ser Glu Tyr Gly Asn Ile Thr Tyr Pro Pro Ala

530 535 540

Met Lys Asn Ala His Ala Lys Ala Ser Arg Thr Ser Ser Lys His Lys

545 550 555 560

Glu Asp Val Tyr Glu Asn Leu His Thr Lys Asn Lys Arg Glu Glu Lys

565 570 575

Val Lys Lys Gln Arg Ser Ala Asp Lys Glu Lys Ser Lys Gly Ser Leu

580 585 590

Lys Arg Lys

595

<210> 2

<211> 593

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 2

Met Thr Ser Arg Arg Trp Phe His Pro Asn Ile Thr Gly Val Glu Ala

1 5 10 15

Glu Asn Leu Leu Leu Thr Arg Gly Val Asp Gly Ser Phe Leu Ala Arg

20 25 30

Pro Ser Lys Ser Asn Pro Gly Asp Phe Thr Leu Ser Val Arg Arg Asn

35 40 45

Gly Ala Val Thr His Ile Lys Ile Gln Asn Thr Gly Asp Tyr Tyr Asp

50 55 60

Leu Tyr Gly Gly Glu Lys Phe Ala Thr Leu Ala Glu Leu Val Gln Tyr

65 70 75 80

Tyr Met Glu His His Gly Gln Leu Lys Glu Lys Asn Gly Asp Val Ile

85 90 95

Glu Leu Lys Tyr Pro Leu Asn Cys Ala Asp Pro Thr Ser Glu Arg Trp

100 105 110

Phe His Gly His Leu Ser Gly Lys Glu Ala Glu Lys Leu Leu Thr Glu

115 120 125

Lys Gly Lys His Gly Ser Phe Leu Val Arg Glu Ser Gln Ser His Pro

130 135 140

Gly Asp Phe Val Leu Ser Val Arg Thr Gly Asp Asp Lys Gly Glu Ser

145 150 155 160

Asn Asp Gly Lys Ser Lys Val Thr His Val Met Ile Arg Cys Gln Glu

165 170 175

Leu Lys Tyr Asp Val Gly Gly Gly Glu Arg Phe Asp Ser Leu Thr Asp

180 185 190

Leu Val Glu His Tyr Lys Lys Asn Pro Met Val Glu Thr Leu Gly Thr

195 200 205

Val Leu Gln Leu Lys Gln Pro Leu Asn Thr Thr Arg Ile Asn Ala Ala

210 215 220

Glu Ile Glu Ser Arg Val Arg Glu Leu Ser Lys Leu Ala Glu Thr Thr

225 230 235 240

Asp Lys Val Lys Gln Gly Phe Trp Glu Glu Phe Glu Thr Leu Gln Gln

245 250 255

Gln Glu Cys Lys Leu Leu Tyr Ser Arg Lys Glu Gly Gln Arg Gln Glu

260 265 270

Asn Lys Asn Lys Asn Arg Tyr Lys Asn Ile Leu Pro Phe Asp His Thr

275 280 285

Arg Val Val Leu His Asp Gly Asp Pro Asn Glu Pro Val Ser Asp Tyr

290 295 300

Ile Asn Ala Asn Ile Ile Met Pro Glu Phe Glu Thr Lys Cys Asn Asn

305 310 315 320

Ser Lys Pro Lys Lys Ser Tyr Ile Ala Thr Gln Gly Cys Leu Gln Asn

325 330 335

Thr Val Asn Asp Phe Trp Arg Met Val Phe Gln Glu Asn Ser Arg Val

340 345 350

Ile Val Met Thr Thr Lys Glu Val Glu Arg Gly Lys Ser Lys Cys Val

355 360 365

Lys Tyr Trp Pro Asp Glu Tyr Ala Leu Lys Glu Tyr Gly Val Met Arg

370 375 380

Val Arg Asn Val Lys Glu Ser Ala Ala His Asp Tyr Thr Leu Arg Glu

385 390 395 400

Leu Lys Leu Ser Lys Val Gly Gln Gly Asn Thr Glu Arg Thr Val Trp

405 410 415

Gln Tyr His Phe Arg Thr Trp Pro Asp His Gly Val Pro Ser Asp Pro

420 425 430

Gly Gly Val Leu Asp Phe Leu Glu Glu Val His His Lys Gln Glu Ser

435 440 445

Ile Met Asp Ala Gly Pro Val Val Val His Cys Ser Ala Gly Ile Gly

450 455 460

Arg Thr Gly Thr Phe Ile Val Ile Asp Ile Leu Ile Asp Ile Ile Arg

465 470 475 480

Glu Lys Gly Val Asp Cys Asp Ile Asp Val Pro Lys Thr Ile Gln Met

485 490 495

Val Arg Ser Gln Arg Ser Gly Met Val Gln Thr Glu Ala Gln Tyr Arg

500 505 510

Phe Ile Tyr Met Ala Val Gln His Tyr Ile Glu Thr Leu Gln Arg Arg

515 520 525

Ile Glu Glu Glu Gln Lys Ser Lys Arg Lys Gly His Glu Tyr Thr Asn

530 535 540

Ile Lys Tyr Ser Leu Ala Asp Gln Thr Ser Gly Asp Gln Ser Pro Leu

545 550 555 560

Pro Pro Cys Thr Pro Thr Pro Pro Cys Ala Glu Met Arg Glu Asp Ser

565 570 575

Ala Arg Val Tyr Glu Asn Val Gly Leu Met Gln Gln Gln Lys Ser Phe

580 585 590

Arg

<210> 3

<211> 438

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 3

Met Glu Met Glu Lys Glu Phe Glu Gln Ile Asp Lys Ser Gly Ser Trp

1 5 10 15

Ala Ala Ile Tyr Gln Asp Ile Arg His Glu Ala Ser Asp Phe Pro Cys

20 25 30

Arg Val Ala Lys Leu Pro Lys Asn Lys Asn Arg Asn Arg Tyr Arg Asp

35 40 45

Val Ser Pro Phe Asp His Ser Arg Ile Lys Leu His Gln Glu Asp Asn

50 55 60

Asp Tyr Ile Asn Ala Ser Leu Ile Lys Met Glu Glu Ala Gln Arg Ser

65 70 75 80

Tyr Ile Leu Thr Gln Gly Pro Leu Pro Asn Thr Cys Gly His Phe Trp

85 90 95

Glu Met Val Trp Glu Gln Lys Ser Arg Gly Val Val Met Leu Asn Arg

100 105 110

Val Met Glu Lys Gly Ser Leu Lys Cys Ala Gln Tyr Trp Pro Gln Lys

115 120 125

Glu Glu Lys Glu Met Ile Phe Glu Asp Thr Asn Leu Lys Leu Thr Leu

130 135 140

Ile Ser Glu Asp Ile Lys Ser Tyr Tyr Thr Val Arg Gln Leu Glu Leu

145 150 155 160

Glu Asn Leu Thr Thr Gln Glu Thr Arg Glu Ile Leu His Phe His Tyr

165 170 175

Thr Thr Trp Pro Asp Phe Gly Val Pro Glu Ser Pro Ala Ser Phe Leu

180 185 190

Asn Phe Leu Phe Lys Val Arg Glu Ser Gly Ser Leu Ser Pro Glu His

195 200 205

Gly Pro Val Val Val His Cys Ser Ala Gly Ile Gly Arg Ser Gly Thr

210 215 220

Phe Cys Leu Ala Asp Thr Cys Leu Leu Leu Met Asp Lys Arg Lys Asp

225 230 235 240

Pro Ser Ser Val Asp Ile Lys Lys Val Leu Leu Glu Met Arg Lys Phe

245 250 255

Arg Met Gly Leu Ile Gln Thr Ala Asp Gln Leu Arg Phe Ser Tyr Leu

260 265 270

Ala Val Ile Glu Gly Ala Lys Phe Ile Met Gly Asp Ser Ser Val Gln

275 280 285

Asp Gln Trp Lys Glu Leu Ser His Glu Asp Leu Glu Pro Pro Pro Glu

290 295 300

His Ile Pro Pro Pro Pro Arg Pro Pro Lys Arg Ile Leu Glu Pro His

305 310 315 320

Asn Gly Lys Cys Arg Glu Phe Phe Pro Asn His Gln Trp Val Lys Glu

325 330 335

Glu Thr Gln Glu Asp Lys Asp Cys Pro Ile Lys Glu Glu Lys Gly Ser

340 345 350

Pro Leu Asn Ala Ala Pro Tyr Gly Ile Glu Ser Met Ser Gln Asp Thr

355 360 365

Glu Val Arg Ser Arg Val Val Gly Gly Ser Leu Arg Gly Ala Gln Ala

370 375 380

Ala Ser Pro Ala Lys Gly Glu Pro Ser Leu Pro Glu Lys Asp Glu Asp

385 390 395 400

His Ala Leu Ser Tyr Trp Lys Pro Phe Leu Val Asn Met Cys Val Ala

405 410 415

Thr Val Leu Thr Ala Gly Ala Tyr Leu Cys Tyr Arg Phe Leu Phe Asn

420 425 430

Ser Asn Thr Arg Trp Ala

435

<210> 4

<211> 482

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 4

Met Ser Thr Val Glu Ser Ala Leu Thr Arg Arg Ile Met Gly Ile Glu

1 5 10 15

Thr Glu Tyr Gly Leu Thr Phe Val Asp Gly Asp Ser Lys Lys Leu Arg

20 25 30

Pro Asp Glu Ile Ala Arg Arg Met Phe Arg Pro Ile Val Glu Lys Tyr

35 40 45

Ser Ser Ser Asn Ile Phe Ile Pro Asn Gly Ser Arg Leu Tyr Leu Asp

50 55 60

Val Gly Ser His Pro Glu Tyr Ala Thr Ala Glu Cys Asp Asn Leu Thr

65 70 75 80

Gln Leu Ile Asn Phe Glu Lys Ala Gly Asp Val Ile Ala Asp Arg Met

85 90 95

Ala Val Asp Ala Glu Glu Ser Leu Ala Lys Glu Asp Ile Ala Gly Gln

100 105 110

Val Tyr Leu Phe Lys Asn Asn Val Asp Ser Val Gly Asn Ser Tyr Gly

115 120 125

Cys His Glu Asn Tyr Leu Val Gly Arg Ser Met Pro Leu Lys Ala Leu

130 135 140

Gly Lys Arg Leu Met Pro Phe Leu Ile Thr Arg Gln Leu Ile Cys Gly

145 150 155 160

Ala Gly Arg Ile His His Pro Asn Pro Leu Asp Lys Gly Glu Ser Phe

165 170 175

Pro Leu Gly Tyr Cys Ile Ser Gln Arg Ser Asp His Val Trp Glu Gly

180 185 190

Val Ser Ser Ala Thr Thr Arg Ser Arg Pro Ile Ile Asn Thr Arg Asp

195 200 205

Glu Pro His Ala Asp Ser His Ser Tyr Arg Arg Leu His Val Ile Val

210 215 220

Gly Asp Ala Asn Met Ala Glu Pro Ser Ile Ala Leu Lys Val Gly Ser

225 230 235 240

Thr Leu Leu Val Leu Glu Met Ile Glu Ala Asp Phe Gly Leu Pro Ser

245 250 255

Leu Glu Leu Ala Asn Asp Ile Ala Ser Ile Arg Glu Ile Ser Arg Asp

260 265 270

Ala Thr Gly Ser Thr Leu Leu Ser Leu Lys Asp Gly Thr Thr Met Thr

275 280 285

Ala Leu Gln Ile Gln Gln Val Val Phe Glu His Ala Ser Lys Trp Leu

290 295 300

Glu Gln Arg Pro Glu Pro Glu Phe Ser Gly Thr Ser Asn Thr Glu Met

305 310 315 320

Ala Arg Val Leu Asp Leu Trp Gly Arg Met Leu Lys Ala Ile Glu Ser

325 330 335

Gly Asp Phe Ser Glu Val Asp Thr Glu Ile Asp Trp Val Ile Lys Lys

340 345 350

Lys Leu Ile Asp Arg Phe Ile Gln Arg Gly Asn Leu Gly Leu Asp Asp

355 360 365

Pro Lys Leu Ala Gln Val Asp Leu Thr Tyr His Asp Ile Arg Pro Gly

370 375 380

Arg Gly Leu Phe Ser Val Leu Gln Ser Arg Gly Met Ile Lys Arg Trp

385 390 395 400

Thr Thr Asp Glu Ala Ile Leu Ala Ala Val Asp Thr Ala Pro Asp Thr

405 410 415

Thr Arg Ala His Leu Arg Gly Arg Ile Leu Lys Ala Ala Asp Thr Leu

420 425 430

Gly Val Pro Val Thr Val Asp Trp Met Arg His Lys Val Asn Arg Pro

435 440 445

Glu Pro Gln Ser Val Glu Leu Gly Asp Pro Phe Ser Ala Val Asn Ser

450 455 460

Glu Val Asp Gln Leu Ile Glu Tyr Met Thr Val His Ala Glu Ser Tyr

465 470 475 480

Arg Ser

<210> 5

<211> 64

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 5

Met Asn Ala Lys Gln Thr Gln Ile Met Gly Gly Gly Gly Arg Asp Glu

1 5 10 15

Asp Asn Ala Glu Asp Ser Ala Gln Ala Ser Gly Gln Val Gln Ile Asn

20 25 30

Thr Glu Gly Val Asp Ser Leu Leu Asp Glu Ile Asp Gly Leu Leu Glu

35 40 45

Asn Asn Ala Glu Glu Phe Val Arg Ser Tyr Val Gln Lys Gly Gly Glu

50 55 60

<210> 6

<211> 84

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 6

Ala Gly Lys Ala Gly Glu Gly Glu Ile Pro Ala Pro Leu Ala Gly Thr

1 5 10 15

Val Ser Lys Ile Leu Val Lys Glu Gly Asp Thr Val Lys Ala Gly Gln

20 25 30

Thr Val Leu Val Leu Glu Ala Met Lys Met Glu Thr Glu Ile Asn Ala

35 40 45

Pro Thr Asp Gly Lys Val Glu Lys Val Leu Val Lys Glu Arg Asp Ala

50 55 60

Val Gln Gly Gly Gln Gly Leu Ile Lys Ile Gly Asp Tyr Asp Ile Pro

65 70 75 80

Thr Thr Ala Ser

<210> 7

<211> 8

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 7

Asp Tyr Lys Asp Asp Asp Asp Lys

1 5

<210> 8

<211> 10

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 8

Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu

1 5 10

<210> 9

<211> 9

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 9

Tyr Pro Tyr Asp Val Pro Asp Tyr Ala

1 5
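The sequence listing uses three-letter amino acid codes, whereas the examples write sequences in one-letter code. The sketch below converts between the two so that listing entries can be compared with the text; it is applied to the short sequences of SEQ ID NO.7 to 9 above. The function name is hypothetical, and only the 20 standard residues are mapped.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_line: str) -> str:
    """Convert a whitespace-separated three-letter sequence to one-letter code.

    Raises KeyError for any token that is not one of the 20 standard residues.
    """
    return "".join(THREE_TO_ONE[token] for token in three_letter_line.split())


seq7 = to_one_letter("Asp Tyr Lys Asp Asp Asp Asp Lys")
seq8 = to_one_letter("Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu")
seq9 = to_one_letter("Tyr Pro Tyr Asp Val Pro Asp Tyr Ala")
print(seq7, seq8, seq9)
```

The one-letter form of SEQ ID NO.7 also appears near the C-terminus of the THEMIS-FLAG sequence (SEQ ID NO.14) given in Example 2.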

<210> 10

<211> 450

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 10

atggccggga aggcaggcga gggagagatc cccgcaccct tggccggcac ggtcagcaaa 60

atcctggtca aggaaggcga caccgtgaag gctggacaga cggtgttggt actggaggcg 120

atgaagatgg agacagagat caatgccccg accgatggga aggtggagaa ggtgttggtt 180

aaggaaaggg acgccgtgca gggcggtcag ggactgatca agatcggcga ctacgacatc 240

ccgacaaccg ccagcatgaa cgcgaaacag acccagatca tgggtggcgg tggtcgtgac 300

gaagacaatg cggaagactc tgctcaggcg tctggtcagg ttcagatcaa taccgaaggt 360

gttgactctc tgctggacga aatcgacggc ctgctcgaaa acaacgcgga ggaattcgtt 420

cgttcttacg ttcagaaagg tggtgaataa 450
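Each nucleotide line of the listing consists of groups of ten bases followed by a running base count. The sketch below shows one way to join such lines into a continuous sequence while checking each running count; it is demonstrated on the first two lines of SEQ ID NO.10 only. The function name is hypothetical, and the parser assumes every line ends with its cumulative count.

```python
def parse_listing_lines(lines: list[str]) -> str:
    """Join listing lines into one sequence, validating the running counts."""
    sequence = ""
    for line in lines:
        *groups, counter = line.split()
        sequence += "".join(groups)
        if len(sequence) != int(counter):
            raise ValueError(
                f"running count {counter} does not match {len(sequence)} bases read"
            )
    return sequence


# First two lines of SEQ ID NO.10 as printed in the listing.
first_lines = [
    "atggccggga aggcaggcga gggagagatc cccgcaccct tggccggcac ggtcagcaaa 60",
    "atcctggtca aggaaggcga caccgtgaag gctggacaga cggtgttggt actggaggcg 120",
]

dna = parse_listing_lines(first_lines)
print(len(dna), dna[:12])
```

Running the parser over all lines of an entry and comparing the final length with the <211> field gives a quick integrity check on a copied sequence.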

<210> 11

<211> 1112

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 11

Met Val Arg Trp Phe His Arg Asp Leu Ser Gly Leu Asp Ala Glu Thr

1 5 10 15

Leu Leu Lys Gly Arg Gly Val His Gly Ser Phe Leu Ala Arg Pro Ser

20 25 30

Arg Lys Asn Gln Gly Asp Phe Ser Leu Ser Val Arg Val Gly Asp Gln

35 40 45

Val Thr His Ile Arg Ile Gln Asn Ser Gly Asp Phe Tyr Asp Leu Tyr

50 55 60

Gly Gly Glu Lys Phe Ala Thr Leu Thr Glu Leu Val Glu Tyr Tyr Thr

65 70 75 80

Gln Gln Gln Gly Val Leu Gln Asp Arg Asp Gly Thr Ile Ile His Leu

85 90 95

Lys Tyr Pro Leu Asn Cys Ser Asp Pro Thr Ser Glu Arg Trp Tyr His

100 105 110

Gly His Met Ser Gly Gly Gln Ala Glu Thr Leu Leu Gln Ala Lys Gly

115 120 125

Glu Pro Trp Thr Phe Leu Val Arg Glu Ser Leu Ser Gln Pro Gly Asp

130 135 140

Phe Val Leu Ser Val Leu Ser Asp Gln Pro Lys Ala Gly Pro Gly Ser

145 150 155 160

Pro Leu Arg Val Thr His Ile Lys Val Met Cys Glu Gly Gly Arg Tyr

165 170 175

Thr Val Gly Gly Leu Glu Thr Phe Asp Ser Leu Thr Asp Leu Val Glu

180 185 190

His Phe Lys Lys Thr Gly Ile Glu Glu Ala Ser Gly Ala Phe Val Tyr

195 200 205

Leu Arg Gln Pro Tyr Tyr Ala Thr Arg Val Asn Ala Ala Asp Ile Glu

210 215 220

Asn Arg Val Leu Glu Leu Asn Lys Lys Gln Glu Ser Glu Asp Thr Ala

225 230 235 240

Lys Ala Gly Phe Trp Glu Glu Phe Glu Ser Leu Gln Lys Gln Glu Val

245 250 255

Lys Asn Leu His Gln Arg Leu Glu Gly Gln Arg Pro Glu Asn Lys Gly

260 265 270

Lys Asn Arg Tyr Lys Asn Ile Leu Pro Phe Asp His Ser Arg Val Ile

275 280 285

Leu Gln Gly Arg Asp Ser Asn Ile Pro Gly Ser Asp Tyr Ile Asn Ala

290 295 300

Asn Tyr Ile Lys Asn Gln Leu Leu Gly Pro Asp Glu Asn Ala Lys Thr

305 310 315 320

Tyr Ile Ala Ser Gln Gly Cys Leu Glu Ala Thr Val Asn Asp Phe Trp

325 330 335

Gln Met Ala Trp Gln Glu Asn Ser Arg Val Ile Val Met Thr Thr Arg

340 345 350

Glu Val Glu Lys Gly Arg Asn Lys Cys Val Pro Tyr Trp Pro Glu Val

355 360 365

Gly Met Gln Arg Ala Tyr Gly Pro Tyr Ser Val Thr Asn Cys Gly Glu

370 375 380

His Asp Thr Thr Glu Tyr Lys Leu Arg Thr Leu Gln Val Ser Pro Leu

385 390 395 400

Asp Asn Gly Asp Leu Ile Arg Glu Ile Trp His Tyr Gln Tyr Leu Ser

405 410 415

Trp Pro Asp His Gly Val Pro Ser Glu Pro Gly Gly Val Leu Ser Phe

420 425 430

Leu Asp Gln Ile Asn Gln Arg Gln Glu Ser Leu Pro His Ala Gly Pro

435 440 445

Ile Ile Val His Cys Ser Ala Gly Ile Gly Arg Thr Gly Thr Ile Ile

450 455 460

Val Ile Asp Met Leu Met Glu Asn Ile Ser Thr Lys Gly Leu Asp Cys

465 470 475 480

Asp Ile Asp Ile Gln Lys Thr Ile Gln Met Val Arg Ala Gln Arg Ser

485 490 495

Gly Met Val Gln Thr Glu Ala Gln Tyr Lys Phe Ile Tyr Val Ala Ile

500 505 510

Ala Gln Phe Ile Glu Thr Thr Lys Lys Lys Leu Glu Val Leu Gln Ser

515 520 525

Gln Lys Gly Gln Glu Ser Glu Tyr Gly Asn Ile Thr Tyr Pro Pro Ala

530 535 540

Met Lys Asn Ala His Ala Lys Ala Ser Arg Thr Ser Ser Lys His Lys

545 550 555 560

Glu Asp Val Tyr Glu Asn Leu His Thr Lys Asn Lys Arg Glu Glu Lys

565 570 575

Val Lys Lys Gln Arg Ser Ala Asp Lys Glu Lys Ser Lys Gly Ser Leu

580 585 590

Lys Arg Lys Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly

595 600 605

Ser Gly Gly Gly Ser Gly Met Ser Thr Val Glu Ser Ala Leu Thr Arg

610 615 620

Arg Ile Met Gly Ile Glu Thr Glu Tyr Gly Leu Thr Phe Val Asp Gly

625 630 635 640

Asp Ser Lys Lys Leu Arg Pro Asp Glu Ile Ala Arg Arg Met Phe Arg

645 650 655

Pro Ile Val Glu Lys Tyr Ser Ser Ser Asn Ile Phe Ile Pro Asn Gly

660 665 670

Ser Arg Leu Tyr Leu Asp Val Gly Ser His Pro Glu Tyr Ala Thr Ala

675 680 685

Glu Cys Asp Asn Leu Thr Gln Leu Ile Asn Phe Glu Lys Ala Gly Asp

690 695 700

Val Ile Ala Asp Arg Met Ala Val Asp Ala Glu Glu Ser Leu Ala Lys

705 710 715 720

Glu Asp Ile Ala Gly Gln Val Tyr Leu Phe Lys Asn Asn Val Asp Ser

725 730 735

Val Gly Asn Ser Tyr Gly Cys His Glu Asn Tyr Leu Val Gly Arg Ser

740 745 750

Met Pro Leu Lys Ala Leu Gly Lys Arg Leu Met Pro Phe Leu Ile Thr

755 760 765

Arg Gln Leu Ile Cys Gly Ala Gly Arg Ile His His Pro Asn Pro Leu

770 775 780

Asp Lys Gly Glu Ser Phe Pro Leu Gly Tyr Cys Ile Ser Gln Arg Ser

785 790 795 800

Asp His Val Trp Glu Gly Val Ser Ser Ala Thr Thr Arg Ser Arg Pro

805 810 815

Ile Ile Asn Thr Arg Asp Glu Pro His Ala Asp Ser His Ser Tyr Arg

820 825 830

Arg Leu His Val Ile Val Gly Asp Ala Asn Met Ala Glu Pro Ser Ile

835 840 845

Ala Leu Lys Val Gly Ser Thr Leu Leu Val Leu Glu Met Ile Glu Ala

850 855 860

Asp Phe Gly Leu Pro Ser Leu Glu Leu Ala Asn Asp Ile Ala Ser Ile

865 870 875 880

Arg Glu Ile Ser Arg Asp Ala Thr Gly Ser Thr Leu Leu Ser Leu Lys

885 890 895

Asp Gly Thr Thr Met Thr Ala Leu Gln Ile Gln Gln Val Val Phe Glu

900 905 910

His Ala Ser Lys Trp Leu Glu Gln Arg Pro Glu Pro Glu Phe Ser Gly

915 920 925

Thr Ser Asn Thr Glu Met Ala Arg Val Leu Asp Leu Trp Gly Arg Met

930 935 940

Leu Lys Ala Ile Glu Ser Gly Asp Phe Ser Glu Val Asp Thr Glu Ile

945 950 955 960

Asp Trp Val Ile Lys Lys Lys Leu Ile Asp Arg Phe Ile Gln Arg Gly

965 970 975

Asn Leu Gly Leu Asp Asp Pro Lys Leu Ala Gln Val Asp Leu Thr Tyr

980 985 990

His Asp Ile Arg Pro Gly Arg Gly Leu Phe Ser Val Leu Gln Ser Arg

995 1000 1005

Gly Met Ile Lys Arg Trp Thr Thr Asp Glu Ala Ile Leu Ala Ala Val

1010 1015 1020

Asp Thr Ala Pro Asp Thr Thr Arg Ala His Leu Arg Gly Arg Ile Leu

1025 1030 1035 1040

Lys Ala Ala Asp Thr Leu Gly Val Pro Val Thr Val Asp Trp Met Arg

1045 1050 1055

His Lys Val Asn Arg Pro Glu Pro Gln Ser Val Glu Leu Gly Asp Pro

1060 1065 1070

Phe Ser Ala Val Asn Ser Glu Val Asp Gln Leu Ile Glu Tyr Met Thr

1075 1080 1085

Val His Ala Glu Ser Tyr Arg Ser Glu Gln Lys Leu Ile Ser Glu Glu

1090 1095 1100

Asp Leu His His His His His His

1105 1110

<210> 12

<211> 1785

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 12

atggtgaggt ggtttcaccg agacctcagt gggctggatg cagagaccct gctcaagggc 60

cgaggtgtcc acggtagctt cctggctcgg cccagtcgca agaaccaggg tgacttctcg 120

ctctccgtca gggtggggga tcaggtgacc catattcgga tccagaactc aggggatttc 180

tatgacctgt atggagggga gaagtttgcg actctgacag agctggtgga gtactacact 240

cagcagcagg gtgtcctgca ggaccgcgac ggcaccatca tccacctcaa gtacccgctg 300

aactgctccg atcccactag tgagaggtgg taccatggcc acatgtctgg cgggcaggca 360

gagacgctgc tgcaggccaa gggcgagccc tggacgtttc ttgtgcgtga gagcctcagc 420

cagcctggag acttcgtgct ttctgtgctc agtgaccagc ccaaggctgg cccaggctcc 480

ccgctcaggg tcacccacat caaggtcatg tgcgagggtg gacgctacac agtgggtggt 540

ttggagacct tcgacagcct cacggacctg gtggagcatt tcaagaagac ggggattgag 600

gaggcctcag gcgcctttgt ctacctgcgg cagccgtact atgccacgag ggtgaatgcg 660

gctgacattg agaaccgagt gttggaactg aacaagaagc aggagtccga ggatacagcc 720

aaggctggct tctgggagga gtttgagagt ttgcagaagc aggaggtgaa gaacttgcac 780

cagcgtctgg aagggcagcg gccagagaac aagggcaaga accgctacaa gaacattctc 840

ccctttgacc acagccgagt gatcctgcag ggacgggaca gtaacatccc cgggtccgac 900

tacatcaatg ccaactacat caagaaccag ctgctaggcc ctgatgagaa cgctaagacc 960

tacatcgcca gccagggctg tctggaggcc acggtcaatg acttctggca gatggcgtgg 1020

caggagaaca gccgtgtcat cgtcatgacc acccgagagg tggagaaagg ccggaacaaa 1080

tgcgtcccat actggcccga ggtgggcatg cagcgtgctt atgggcccta ctctgtgacc 1140

aactgcgggg agcatgacac aaccgaatac aaactccgta ccttacaggt ctccccgctg 1200

gacaatggag acctgattcg ggagatctgg cattaccagt acctgagctg gcccgaccat 1260

ggggtcccca gtgagcctgg gggtgtcctc agcttcctgg accagatcaa ccagcggcag 1320

gaaagtctgc ctcacgcagg gcccatcatc gtgcactgca gcgccggcat cggccgcaca 1380

ggcaccatca ttgtcatcga catgctcatg gagaacatct ccaccaaggg cctggactgt 1440

gacattgaca tccagaagac catccagatg gtgcgggcgc agcgctcggg catggtgcag 1500

acggaggcgc agtacaagtt catctacgtg gccatcgccc agttcattga aaccactaag 1560

aagaagctgg aggtcctgca gtcgcagaag ggccaggagt cggagtacgg gaacatcacc 1620

tatcccccag ccatgaagaa tgcccatgcc aaggcctccc gcacctcgtc caaacacaag 1680

gaggatgtgt atgagaacct gcacactaag aacaagaggg aggagaaagt gaagaagcag 1740

cggtcagcag acaaggagaa gagcaagggt tccctcaaga ggaag 1785

<210> 13

<211> 1785

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 13

atggtgaggt ggtttcaccg agacctcagt gggctggatg cagagaccct gctcaagggc 60

cgaggtgtcc acggtagctt cctggctcgg cccagtcgca agaaccaggg tgacttctcg 120

ctctccgtca gggtggggga tcaggtgacc catattcgga tccagaactc aggggatttc 180

tatgacctgt atggagggga gaagtttgcg actctgacag agctggtgga gtactacact 240

cagcagcagg gtgtcctgca ggaccgcgac ggcaccatca tccacctcaa gtacccgctg 300

aactgctccg atcccactag tgagaggtgg taccatggcc acatgtctgg cgggcaggca 360

gagacgctgc tgcaggccaa gggcgagccc tggacgtttc ttgtgcgtga gagcctcagc 420

cagcctggag acttcgtgct ttctgtgctc agtgaccagc ccaaggctgg cccaggctcc 480

ccgctcaggg tcacccacat caaggtcatg tgcgagggtg gacgctacac agtgggtggt 540

ttggagacct tcgacagcct cacggacctg gtggagcatt tcaagaagac ggggattgag 600

gaggcctcag gcgcctttgt ctacctgcgg cagccgtact atgccacgag ggtgaatgcg 660

gctgacattg agaaccgagt gttggaactg aacaagaagc aggagtccga ggatacagcc 720

aaggctggct tctgggagga gtttgagagt ttgcagaagc aggaggtgaa gaacttgcac 780

cagcgtctgg aagggcagcg gccagagaac aagggcaaga accgctacaa gaacattctc 840

ccctttgacc acagccgagt gatcctgcag ggacgggaca gtaacatccc cgggtccgac 900

tacatcaatg ccaactacat caagaaccag ctgctaggcc ctgatgagaa cgctaagacc 960

tacatcgcca gccagggctg tctggaggcc acggtcaatg acttctggca gatggcgtgg 1020

caggagaaca gccgtgtcat cgtcatgacc acccgagagg tggagaaagg ccggaacaaa 1080

tgcgtcccat actggcccga ggtgggcatg cagcgtgctt atgggcccta ctctgtgacc 1140

aactgcgggg agcatgacac aaccgaatac aaactccgta ccttacaggt ctccccgctg 1200

gacaatggag acctgattcg ggagatctgg cattaccagt acctgagctg gcccgcccat 1260

ggggtcccca gtgagcctgg gggtgtcctc agcttcctgg accagatcaa ccagcggcag 1320

gaaagtctgc ctcacgcagg gcccatcatc gtgcactgca gcgccggcat cggccgcaca 1380

ggcaccatca ttgtcatcga catgctcatg gagaacatct ccaccaaggg cctggactgt 1440

gacattgaca tccagaagac catccagatg gtgcgggcgc agcgctcggg catggtgcag 1500

acggaggcgc agtacaagtt catctacgtg gccatcgccc agttcattga aaccactaag 1560

aagaagctgg aggtcctgca gtcgcagaag ggccaggagt cggagtacgg gaacatcacc 1620

tatcccccag ccatgaagaa tgcccatgcc aaggcctccc gcacctcgtc caaacacaag 1680

gaggatgtgt atgagaacct gcacactaag aacaagaggg aggagaaagt gaagaagcag 1740

cggtcagcag acaaggagaa gagcaagggt tccctcaaga ggaag 1785
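
The listing lines that end at position 1260 in SEQ ID NO: 12 and SEQ ID NO: 13 differ by a single base. The following is a minimal sketch, not part of the original disclosure, of how such a difference can be located and translated. The helper names `clean` and `codon_changes` are illustrative, and the sketch assumes the standard genetic code and a reading frame that starts at base 1 of the listed sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order (assumption:
# the listed sequences are read in frame from base 1 with the standard code).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def clean(listing_line):
    """Keep only the bases, dropping spaces and the trailing position counter."""
    return "".join(ch for ch in listing_line.lower() if ch in BASES)


def codon_changes(seq_a, seq_b, offset=0):
    """Return (codon number, codon_a, aa_a, codon_b, aa_b) for each differing codon.

    `offset` is the number of bases that precede the fragment in the full
    sequence; it must be a multiple of 3 so the reading frame is preserved.
    """
    changes = []
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        if ca != cb:
            number = (offset + i) // 3 + 1
            changes.append((number, ca, CODON_TABLE[ca], cb, CODON_TABLE[cb]))
    return changes


# Bases 1201-1260 as printed in SEQ ID NO: 12 and SEQ ID NO: 13.
line_12 = "gacaatggag acctgattcg ggagatctgg cattaccagt acctgagctg gcccgaccat 1260"
line_13 = "gacaatggag acctgattcg ggagatctgg cattaccagt acctgagctg gcccgcccat 1260"

changes = codon_changes(clean(line_12), clean(line_13), offset=1200)
print(changes)  # [(419, 'gac', 'D', 'gcc', 'A')]
```

Under these assumptions, the fragment shows codon 419 changing from gac (Asp) in SEQ ID NO: 12 to gcc (Ala) in SEQ ID NO: 13. The sketch compares only the two lines shown and makes no claim about the remaining lines of either sequence.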

<210> 14

<211> 660

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 14

Met Ala Leu Ser Leu Glu Glu Phe Val His Ser Leu Asp Leu Arg Thr

1 5 10 15

Leu Pro Arg Val Leu Glu Ile Gln Ala Gly Ile Tyr Leu Glu Gly Ser

20 25 30

Ile Tyr Glu Met Phe Gly Asn Glu Cys Cys Phe Ser Thr Gly Glu Val

35 40 45

Ile Lys Ile Thr Gly Leu Lys Val Lys Lys Ile Ile Ala Glu Ile Cys

50 55 60

Glu Gln Ile Glu Gly Cys Glu Ser Leu Gln Pro Phe Glu Leu Pro Met

65 70 75 80

Asn Phe Pro Gly Leu Phe Lys Ile Val Ala Asp Lys Thr Pro Tyr Leu

85 90 95

Thr Met Glu Glu Ile Thr Arg Thr Ile His Ile Gly Pro Ser Arg Leu

100 105 110

Gly His Pro Cys Phe Tyr His Gln Lys Asp Ile Lys Leu Glu Asn Leu

115 120 125

Ile Ile Lys Gln Gly Glu Gln Ile Met Leu Asn Ser Val Glu Glu Ile

130 135 140

Asp Gly Glu Ile Met Val Ser Cys Ala Val Ala Arg Asn His Gln Thr

145 150 155 160

His Ser Phe Asn Leu Pro Leu Ser Gln Glu Gly Glu Phe Tyr Glu Cys

165 170 175

Glu Asp Glu Arg Ile Tyr Thr Leu Lys Glu Ile Val Glu Trp Lys Ile

180 185 190

Pro Lys Asn Arg Thr Arg Thr Val Asn Leu Thr Asp Phe Ser Asn Lys

195 200 205

Trp Asp Ser Thr Asn Pro Phe Pro Lys Asp Phe Tyr Gly Thr Leu Ile

210 215 220

Leu Lys Pro Val Tyr Glu Ile Gln Gly Val Met Lys Phe Arg Lys Asp

225 230 235 240

Ile Ile Arg Ile Leu Pro Ser Leu Asp Val Glu Val Lys Asp Ile Thr

245 250 255

Asp Ser Tyr Asp Ala Asn Trp Phe Leu Gln Leu Leu Ser Thr Glu Asp

260 265 270

Leu Phe Glu Met Thr Ser Lys Glu Phe Pro Ile Val Thr Glu Val Ile

275 280 285

Glu Ala Pro Glu Gly Asn His Leu Pro Gln Ser Ile Leu Gln Pro Gly

290 295 300

Lys Thr Ile Val Ile His Lys Lys Tyr Gln Ala Ser Arg Ile Leu Ala

305 310 315 320

Ser Glu Ile Arg Ser Asn Phe Pro Lys Arg His Phe Leu Ile Pro Thr

325 330 335

Ser Tyr Lys Gly Lys Phe Lys Arg Arg Pro Arg Glu Phe Pro Thr Ala

340 345 350

Tyr Asp Leu Glu Ile Ala Lys Ser Glu Lys Glu Pro Leu His Val Val

355 360 365

Ala Thr Lys Ala Phe His Ser Pro His Asp Lys Leu Ser Ser Val Ser

370 375 380

Val Gly Asp Gln Phe Leu Val His Gln Ser Glu Thr Thr Glu Val Leu

385 390 395 400

Cys Glu Gly Ile Lys Lys Val Val Asn Val Leu Ala Cys Glu Lys Ile

405 410 415

Leu Lys Lys Ser Tyr Glu Ala Ala Leu Leu Pro Leu Tyr Met Glu Gly

420 425 430

Gly Phe Val Glu Val Ile His Asp Lys Lys Gln Tyr Pro Ile Ser Glu

435 440 445

Leu Cys Lys Gln Phe Arg Leu Pro Phe Asn Val Lys Val Ser Val Arg

450 455 460

Asp Leu Ser Ile Glu Glu Asp Val Leu Ala Ala Thr Pro Gly Leu Gln

465 470 475 480

Leu Glu Glu Asp Ile Thr Asp Ser Tyr Leu Leu Ile Ser Asp Phe Ala

485 490 495

Asn Pro Thr Glu Cys Trp Glu Ile Pro Val Gly Arg Leu Asn Met Thr

500 505 510

Val Gln Leu Val Ser Asn Phe Ser Arg Asp Ala Glu Pro Phe Leu Val

515 520 525

Arg Thr Leu Val Glu Glu Ile Thr Glu Glu Gln Tyr Tyr Met Met Arg

530 535 540

Arg Tyr Glu Ser Ser Ala Ser His Pro Pro Pro Arg Pro Pro Lys His

545 550 555 560

Pro Ser Val Glu Glu Thr Lys Leu Thr Leu Leu Thr Leu Ala Glu Glu

565 570 575

Arg Thr Val Asp Leu Pro Lys Ser Pro Lys Arg His His Val Asp Ile

580 585 590

Thr Lys Lys Leu His Pro Asn Gln Ala Gly Leu Asp Ser Lys Val Leu

595 600 605

Ile Gly Ser Gln Asn Asp Leu Val Asp Glu Glu Lys Glu Arg Ser Asn

610 615 620

Arg Gly Ala Thr Ala Ile Ala Glu Thr Phe Lys Asn Glu Lys His Gln

625 630 635 640

Lys Pro Gly Leu Glu Pro Trp Lys Leu Met Asp Tyr Lys Asp Asp Asp

645 650 655

Asp Lys Asp Ile

660
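
As a reading aid that is not part of the original disclosure, the sketch below converts the three-letter residue lines of a listing into a one-letter string and locates a short motif in it. The function name `to_one_letter` and the chosen motif are illustrative assumptions. The input is the last two residue lines of SEQ ID NO: 14 (residues 641 to 660), with the position-counter lines left in to show that they are skipped.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing_lines):
    """Concatenate residues from listing lines, ignoring position counters."""
    residues = []
    for line in listing_lines:
        for token in line.split():
            if token in THREE_TO_ONE:  # numeric counters are not in the table
                residues.append(THREE_TO_ONE[token])
    return "".join(residues)


# Residues 641-660 of SEQ ID NO: 14 as printed in the listing.
tail = [
    "Lys Pro Gly Leu Glu Pro Trp Lys Leu Met Asp Tyr Lys Asp Asp Asp",
    "645 650 655",
    "Asp Lys Asp Ile",
    "660",
]

first_residue = 641  # position of the first residue in `tail`
one_letter = to_one_letter(tail)
motif_position = first_residue + one_letter.find("DYKDDDDK")

print(one_letter)      # KPGLEPWKLMDYKDDDDKDI
print(motif_position)  # 651
```

The motif DYKDDDDK found at residues 651 to 658 matches the commonly used FLAG epitope sequence. Whether it serves as a tag in this construct is an inference from the sequence alone and is not stated in this part of the listing.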

<210> 15

<211> 509

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 15

Met Gly Cys Gly Cys Ser Ser His Pro Glu Asp Asp Trp Met Glu Asn

1 5 10 15

Ile Asp Val Cys Glu Asn Cys His Tyr Pro Ile Val Pro Leu Asp Gly

20 25 30

Lys Gly Thr Leu Leu Ile Arg Asn Gly Ser Glu Val Arg Asp Pro Leu

35 40 45

Val Thr Tyr Glu Gly Ser Asn Pro Pro Ala Ser Pro Leu Gln Asp Asn

50 55 60

Leu Val Ile Ala Leu His Ser Tyr Glu Pro Ser His Asp Gly Asp Leu

65 70 75 80

Gly Phe Glu Lys Gly Glu Gln Leu Arg Ile Leu Glu Gln Ser Gly Glu

85 90 95

Trp Trp Lys Ala Gln Ser Leu Thr Thr Gly Gln Glu Gly Phe Ile Pro

100 105 110

Phe Asn Phe Val Ala Lys Ala Asn Ser Leu Glu Pro Glu Pro Trp Phe

115 120 125

Phe Lys Asn Leu Ser Arg Lys Asp Ala Glu Arg Gln Leu Leu Ala Pro

130 135 140

Gly Asn Thr His Gly Ser Phe Leu Ile Arg Glu Ser Glu Ser Thr Ala

145 150 155 160

Gly Ser Phe Ser Leu Ser Val Arg Asp Phe Asp Gln Asn Gln Gly Glu

165 170 175

Val Val Lys His Tyr Lys Ile Arg Asn Leu Asp Asn Gly Gly Phe Tyr

180 185 190

Ile Ser Pro Arg Ile Thr Phe Pro Gly Leu His Glu Leu Val Arg His

195 200 205

Tyr Thr Asn Ala Ser Asp Gly Leu Cys Thr Arg Leu Ser Arg Pro Cys

210 215 220

Gln Thr Gln Lys Pro Gln Lys Pro Trp Trp Glu Asp Glu Trp Glu Val

225 230 235 240

Pro Arg Glu Thr Leu Lys Leu Val Glu Arg Leu Gly Ala Gly Gln Phe

245 250 255

Gly Glu Val Trp Met Gly Tyr Tyr Asn Gly His Thr Lys Val Ala Val

260 265 270

Lys Ser Leu Lys Gln Gly Ser Met Ser Pro Asp Ala Phe Leu Ala Glu

275 280 285

Ala Asn Leu Met Lys Gln Leu Gln His Gln Arg Leu Val Arg Leu Tyr

290 295 300

Ala Val Val Thr Gln Glu Pro Ile Tyr Ile Ile Thr Glu Tyr Met Glu

305 310 315 320

Asn Gly Ser Leu Val Asp Phe Leu Lys Thr Pro Ser Gly Ile Lys Leu

325 330 335

Thr Ile Asn Lys Leu Leu Asp Met Ala Ala Gln Ile Ala Glu Gly Met

340 345 350

Ala Phe Ile Glu Glu Arg Asn Tyr Ile His Arg Asp Leu Arg Ala Ala

355 360 365

Asn Ile Leu Val Ser Asp Thr Leu Ser Cys Lys Ile Ala Asp Phe Gly

370 375 380

Leu Ala Arg Leu Ile Glu Asp Asn Glu Tyr Thr Ala Arg Glu Gly Ala

385 390 395 400

Lys Phe Pro Ile Lys Trp Thr Ala Pro Glu Ala Ile Asn Tyr Gly Thr

405 410 415

Phe Thr Ile Lys Ser Asp Val Trp Ser Phe Gly Ile Leu Leu Thr Glu

420 425 430

Ile Val Thr His Gly Arg Ile Pro Tyr Pro Gly Met Thr Asn Pro Glu

435 440 445

Val Ile Gln Asn Leu Glu Arg Gly Tyr Arg Met Val Arg Pro Asp Asn

450 455 460

Cys Pro Glu Glu Leu Tyr Gln Leu Met Arg Leu Cys Trp Lys Glu Arg

465 470 475 480

Pro Glu Asp Arg Pro Thr Phe Asp Tyr Leu Arg Ser Val Leu Glu Asp

485 490 495

Phe Phe Thr Ala Thr Glu Gly Gln Tyr Gln Pro Gln Pro

500 505

Claims

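1. A method for identifying a tyrosine phosphatase substrate, comprising:

1) placing a system comprising substrates potentially interacting with a tyrosine phosphatase in the presence of a tyrosine phosphatase-PafA fusion protein and a pup protein, and labeling the pup protein;

2) enriching the substrates potentially interacting with the tyrosine phosphatase by means of the labeled pup protein;

wherein the tyrosine phosphatase-PafA fusion protein comprises a tyrosine phosphatase fragment and a PafA fragment, and the tyrosine phosphatase fragment has a substrate capture mutation.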

2. The method for identifying a tyrosine phosphatase substrate according to claim 1, wherein the tyrosine phosphatase fragment is selected from the group consisting of a SHP1 fragment, a SHP2 fragment, a PTP1B fragment, a TCPTP fragment, a PTPRK fragment, and a CD45 fragment.

3. The method for identifying a tyrosine phosphatase substrate according to claim 1, wherein in step 1), the system comprising the substrates potentially interacting with the tyrosine phosphatase is a target cell, and step 1) comprises: culturing the target cell in the presence of the tyrosine phosphatase-PafA fusion protein and the pup protein, and labeling the pup protein.

4. The method for identifying a tyrosine phosphatase substrate according to claim 3, wherein step 1) further comprises: lysing the cells obtained from the culture to provide a lysate.

5. A method of identifying a tyrosine phosphatase substrate according to claim 3, wherein the tyrosine phosphatase-PafA fusion protein and/or the pup protein is expressed by the target cell.

6. The method for identifying a tyrosine phosphatase substrate according to claim 1, wherein in step 1), the pup protein comprises a pup fragment and a tag protein fragment, and the tag protein is a biotin tag protein;

and/or, the pup protein is labeled with biotin;

and/or, in step 2), the substrates potentially interacting with the target protein are enriched by means of the labeled pup protein on the basis of a biotin-avidin system.

7. A tyrosine phosphatase substrate identification system, comprising a tyrosine phosphatase-PafA fusion protein and a pup protein, wherein the tyrosine phosphatase-PafA fusion protein comprises a tyrosine phosphatase fragment and a PafA fragment, and the tyrosine phosphatase fragment has a substrate capture mutation.

8. The tyrosine phosphatase substrate identification system according to claim 7, further comprising a system comprising substrates potentially interacting with the tyrosine phosphatase.

9. The tyrosine phosphatase substrate identification system according to claim 7, further comprising a marker for labeling the pup protein.

10. An expression system, which comprises a construct containing polynucleotides encoding a tyrosine phosphatase-PafA fusion protein and a pup protein, or which has exogenous polynucleotides encoding the tyrosine phosphatase-PafA fusion protein and the pup protein integrated into its genome.