WO2020158947A1

WO2020158947A1 - Polypeptide

Info

Publication number: WO2020158947A1
Application number: PCT/JP2020/003795
Authority: WO
Inventors: 裕生上久保; 有吾林; 健大佐藤; 中村　浩之
Original assignee: Spiber Inc.
Priority date: 2019-01-31
Filing date: 2020-01-31
Publication date: 2020-08-06
Also published as: JPWO2020158947A1

Abstract

The present invention pertains to a polypeptide having an amino acid sequence in which the average value, per amino acid residue, of the aggregation free energy g(k) calculated in accordance with formula (1) is −4.0 or less. [In formula (1), $\varepsilon^{p}_{i,j}(L)$ and $\varepsilon^{a}_{i,j}(L)$ represent energies calculated for a pair of amino acid residues; $\delta_{i \le k < i+L}$ is 1 if k falls within the range from i to i+L−1, and 0 otherwise; $\delta_{j \le k < j+L}$ is 1 if k falls within the range from j to j+L−1, and 0 otherwise; i and j are residue numbers in the amino acid sequence, counted from 1 at the start point; and λ is 2.0.]

Description

Polypeptide

The present invention relates to polypeptides. The present invention also relates to fusion proteins containing the polypeptide.

Proteins may form a specific complex through the interaction between protein molecules (protein-protein interaction). The protein-protein interaction can be detected, for example, by the Two-hybrid method. In recent years, methods for predicting protein-protein interactions have also been reported (for example, Patent Document 1 and Non-Patent Document 1).

Patent Document 1: Japanese Patent No. 3588605

Amino acid sequences involved in protein-protein interactions have a variety of potential applications. For example, adding a polypeptide having an amino acid sequence involved in protein-protein interaction to a desired protein is thought to make it possible to control the association between proteins.

The higher-order structure of proteins (e.g., secondary structure, tertiary structure) is also considered to be one factor involved in protein-protein interaction. The higher-order structure of a protein is easily affected by, for example, temperature, pH, and the presence of a denaturant, so proteins may fail to interact with each other stably, depending on the environment in which they are placed.

The present invention aims to provide a polypeptide that can selectively interact even under denaturing conditions.

The present invention relates to the following inventions, for example.
[1]
A polypeptide having an amino acid sequence in which the average value of the aggregation free energy g(k) calculated by the following formula (1) per amino acid residue is −4.0 or less.

[In formula (1), $\varepsilon^{p}_{i,j}(L)$ and $\varepsilon^{a}_{i,j}(L)$ are energies calculated for a pair of amino acid residues. $\delta_{i \le k < i+L}$ is 1 when $i \le k \le i+L-1$, and 0 otherwise. $\delta_{j \le k < j+L}$ is 1 when $j \le k \le j+L-1$, and 0 otherwise. i and j are the residue numbers in the amino acid sequence, with the start point numbered 1. λ is 2.0.]
[2]
A polypeptide comprising an amino acid sequence having 60% or more sequence identity with the amino acid sequence represented by any of SEQ ID NOs: 3 to 16.
[3]
The polypeptide according to [1] or [2], which comprises a tag sequence at either or both of the N-terminus and C-terminus.
[4]
The polypeptide according to [3], wherein the tag sequence comprises the amino acid sequence shown by SEQ ID NO:17.
[5]
A polypeptide comprising the amino acid sequence set forth in any of SEQ ID NOs: 3-16.
[6]
A polypeptide comprising the amino acid sequence set forth in any of SEQ ID NOs: 18-31.
[7]
A nucleic acid encoding the polypeptide according to any of [1] to [6].
[8]
A fusion protein comprising the amino acid sequence obtained by fusing the polypeptide according to any one of [1] to [6] with a protein.
[9]
The fusion protein according to [8], which comprises the amino acid sequence of the protein and the amino acid sequence of the polypeptide.
[10]
The fusion protein according to [8], wherein the amino acid sequence of the protein contains the amino acid sequence of the polypeptide.
[11]
The fusion protein according to [10], wherein the number of amino acid residues from the N-terminus or C-terminus of the protein to the fusion site is 10% or less with respect to the total number of amino acid residues of the protein.
[12]
The fusion protein according to any of [8] to [11], wherein the protein is a structural protein.
[13]
The fusion protein according to any of [8] to [12], wherein the protein is a spider silk protein.
[14]
A nucleic acid encoding the fusion protein according to any of [8] to [13].
[15]
A method for imparting self-association ability to a protein, which comprises fusing the polypeptide according to any one of [1] to [6] with the protein.

According to the present invention, it is possible to provide a polypeptide that can selectively interact even under denaturing conditions.

FIG. 1 is a graph in which the aggregation free energy g(k) of the amino acid sequence #GEN891 (SEQ ID NO: 1), obtained by adding the tag sequence represented by SEQ ID NO: 17 to the N-terminus of the major ampullate dragline silk protein (ADF4) of the spider Araneus diadematus, is plotted against residue number.
FIG. 2 is a graph in which the aggregation free energy g(k) of the amino acid sequence #GEN889 (SEQ ID NO: 2), obtained by adding the tag sequence represented by SEQ ID NO: 17 to the N-terminus of the major ampullate dragline silk protein (ADF3) of Araneus diadematus, is plotted against residue number.
FIG. 3 is a photograph showing the results of SDS-PAGE analysis, in the presence of a denaturant (2-mercaptoethanol), of the supernatant and precipitate of polypeptide solutions (His-ADF3-NRC (SEQ ID NO: 20) or His-ADF3-shortNRC (SEQ ID NO: 22)) containing SDS and guanidine thiocyanate.
FIG. 4 is a photograph showing the results of SDS-PAGE analysis of each solution obtained during column purification, in the absence of a denaturant, of a polypeptide having the amino acid sequence represented by SEQ ID NO: 20 or 22.
FIG. 5 is a photograph showing the results of SDS-PAGE analysis of solutions of a polypeptide (His-ADF4-NRC (SEQ ID NO: 18)) before addition of a denaturant, after addition of a denaturant (DTT), and 3, 6, and 12 hours after the start of dialysis.
FIG. 6 is a photograph showing the results of SDS-PAGE analysis of solutions of a polypeptide (His-ADF3-9467 (SEQ ID NO: 30)) immediately after addition of a denaturant (DTT) (0 hours) and 1.5, 3, 6, and 8 hours after the start of dialysis.

Hereinafter, modes for carrying out the present invention will be described in detail. However, the present invention is not limited to the following embodiments.

The polypeptide according to this embodiment has an amino acid sequence in which the average value of the aggregation free energy g(k) calculated by the following formula (1) per amino acid residue is −4.0 or less.

[In formula (1), $\varepsilon^{p}_{i,j}(L)$ and $\varepsilon^{a}_{i,j}(L)$ are energies calculated for a pair of amino acid residues. $\delta_{i \le k < i+L}$ is 1 when $i \le k \le i+L-1$, and 0 otherwise. $\delta_{j \le k < j+L}$ is 1 when $j \le k \le j+L-1$, and 0 otherwise. i and j are the residue numbers in the amino acid sequence, with the start point numbered 1. λ is 2.0.]

The aggregation free energy g(k) is a value calculated for the amino acid residue with residue number k. The average value of the aggregation free energy g(k) per amino acid residue means the sum of g(k) over all amino acid residues in a given amino acid sequence divided by the number of amino acid residues in that sequence.
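The averaging step and the −4.0 criterion can be sketched in Python as follows. This is a minimal illustration, assuming the per-residue g(k) values have already been computed with formula (1), which is not reproduced here; the function names and the list-of-floats input are hypothetical. The stricter cut-offs are the preferred values listed in the description.

```python
from typing import Optional

# Stricter cut-offs that the description lists as increasingly preferred.
PREFERRED_THRESHOLDS = (-4.5, -5.0, -5.5, -6.0, -6.5, -7.0)


def mean_g(profile: list[float]) -> float:
    """Sum of g(k) over all residues divided by the number of residues.

    `profile` is assumed to hold precomputed g(k) values, one per residue.
    """
    if not profile:
        raise ValueError("profile must contain at least one residue")
    return sum(profile) / len(profile)


def is_self_association_sequence(profile: list[float], threshold: float = -4.0) -> bool:
    """True if the per-residue average of g(k) is at or below `threshold`."""
    return mean_g(profile) <= threshold


def strictest_threshold_met(profile: list[float]) -> Optional[float]:
    """Lowest preferred cut-off that the profile's average satisfies, or None."""
    avg = mean_g(profile)
    met = [t for t in PREFERRED_THRESHOLDS if avg <= t]
    return min(met) if met else None
```

Averaging over the whole sequence, rather than taking the single lowest g(k), matches the definition given above: one strongly negative residue does not qualify a sequence whose remaining residues are unfavorable.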

Since the polypeptide according to this embodiment has such a structure, it can selectively interact (self-associate) even under denaturing conditions.

The energies $\varepsilon^{p}_{i,j}(L)$ and $\varepsilon^{a}_{i,j}(L)$ are values calculated by formula (2) below from the secondary structures assigned by the DSSP algorithm to a set of high-quality globular proteins (the top 500) registered in the Protein Data Bank (PDB). The specific values are shown in Table 1 ($E^{p}_{ab}$) and Table 2 ($E^{a}_{ab}$).

In Tables 1 and 2, the alphabets in the first column and the first row are the one-letter codes of amino acid residues.

The polypeptide according to the present embodiment has an amino acid sequence whose average aggregation free energy g(k) per amino acid residue is −4.0 or less (hereinafter also referred to as a "self-association sequence"). In addition to the self-association sequence, it may have an amino acid sequence whose average aggregation free energy g(k) per amino acid residue is more than −4.0.

The average aggregation free energy g(k) per amino acid residue of the self-association sequence is preferably −4.5 or less, more preferably −5.0 or less, still more preferably −5.5 or less, even more preferably −6.0 or less, yet more preferably −6.5 or less, and particularly preferably −7.0 or less.

Specific examples of the polypeptide according to the present embodiment include polypeptides having the amino acid sequence consisting of the 546th to 582nd amino acid residues of the amino acid sequence shown in SEQ ID NO: 1, the amino acid sequence consisting of the 547th to 579th amino acid residues of the amino acid sequence shown in SEQ ID NO: 2, or any of the amino acid sequences shown in SEQ ID NOs: 3 to 16.

The amino acid sequences represented by SEQ ID NOs: 3 to 8 are amino acid sequences containing a part or all of the self-association sequence in the amino acid sequence represented by SEQ ID NO: 1 or SEQ ID NO: 2.

A polypeptide according to another embodiment contains an amino acid sequence having 60% or more sequence identity with the amino acid sequence shown in any of SEQ ID NOs: 3 to 16. Since the polypeptide according to the present embodiment has such a constitution, it can selectively interact (self-associate) even in the presence of a denaturant.

The amino acid sequence represented by SEQ ID NO: 3 is the amino acid sequence of the C-terminal non-repetitive region (NRC region) of the major ampullate dragline silk protein (ADF4) of the spider Araneus diadematus. The amino acid sequence represented by SEQ ID NO: 4 is a partial sequence of the amino acid sequence represented by SEQ ID NO: 3, corresponding to the portion in which the average aggregation free energy g(k) per amino acid residue is particularly low.

The amino acid sequence represented by SEQ ID NO: 5 is the amino acid sequence of the C-terminal non-repetitive region of the major ampullate dragline silk protein (ADF3) of Araneus diadematus. The amino acid sequence represented by SEQ ID NO: 6 is a partial sequence of the amino acid sequence represented by SEQ ID NO: 5, corresponding to the portion in which the average aggregation free energy g(k) per amino acid residue is particularly low.

The amino acid sequence represented by SEQ ID NO: 7 is the amino acid sequence obtained by deleting helix 5, the helix located closest to the C-terminus, from the C-terminal non-repetitive region of the major ampullate dragline silk protein (ADF3) of Araneus diadematus. The amino acid sequence represented by SEQ ID NO: 8 is a partial sequence of the amino acid sequence represented by SEQ ID NO: 7, corresponding to the portion in which the average aggregation free energy g(k) per amino acid residue is particularly low.

The amino acid sequences shown in SEQ ID NOs: 10, 12, 14, and 16 were selected by the following procedure. First, a profile hidden Markov model (HMM) was created from the C-terminal amino acid sequences of known major ampullate spidroins (MaSp) (specifically, the amino acid sequences represented by SEQ ID NOs: 27 to 191). Using the created HMM as input, 10,000 amino acid sequences were generated with the hmmemit program. For each generated sequence, the average aggregation free energy g(k) per amino acid residue was calculated for every window of 5 consecutive amino acid residues, and the sequences whose minimum window average was −4.5 or less were selected.
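The screening step can be sketched as a sliding-window minimum. This sketch covers only the selection after hmmemit has produced the sequences; it assumes each generated sequence comes with a precomputed per-residue g(k) profile (formula (1) is not implemented), and the names `window_means`, `passes_screen`, and `select_sequences` are hypothetical.

```python
def window_means(profile: list[float], width: int = 5) -> list[float]:
    """Average g(k) per residue for every run of `width` consecutive residues."""
    if not 1 <= width <= len(profile):
        raise ValueError("width must be between 1 and len(profile)")
    total = sum(profile[:width])
    means = [total / width]
    for k in range(width, len(profile)):
        # Slide the window by one residue: add the new residue, drop the oldest.
        total += profile[k] - profile[k - width]
        means.append(total / width)
    return means


def passes_screen(profile: list[float], width: int = 5, cutoff: float = -4.5) -> bool:
    """Keep a sequence if its lowest window average is at or below `cutoff`."""
    return min(window_means(profile, width)) <= cutoff


def select_sequences(profiles: dict[str, list[float]], width: int = 5,
                     cutoff: float = -4.5) -> list[str]:
    """Names of the sequences whose g(k) profiles pass the screen, in input order."""
    return [name for name, prof in profiles.items()
            if passes_screen(prof, width, cutoff)]
```

A running sum keeps each window update constant-time, which matters little for one sequence but adds up over 10,000 generated sequences.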

The amino acid sequences shown in SEQ ID NOs: 9, 11, 13 and 15 are amino acid sequences obtained by adding an additional sequence to the N-terminal of the amino acid sequences shown in SEQ ID NOs: 10, 12, 14 and 16, respectively.

The amino acid sequences shown in SEQ ID NOs: 3, 5, and 7 share at least 60% (63%) sequence identity with one another.

The polypeptide according to this embodiment may have 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 98% or more, or 99% or more sequence identity with the amino acid sequence shown in any of SEQ ID NOs: 3 to 16.
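The description does not state how sequence identity is computed. One common approach is a global alignment followed by counting identical aligned positions. The sketch below is an assumption on both counts: it uses a plain Needleman-Wunsch alignment with unit match, mismatch, and gap scores, and divides by the alignment length. Practical comparisons typically use substitution matrices such as BLOSUM62 with affine gap penalties, which can give different percentages.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    top: list[str] = []
    bottom: list[str] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    if not a and not b:
        raise ValueError("at least one sequence must be non-empty")
    top, bottom = global_align(a, b)
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


def meets_identity(a: str, b: str, minimum: float = 60.0) -> bool:
    """True if the two sequences reach the given percent identity."""
    return percent_identity(a, b) >= minimum
```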

The polypeptide according to this embodiment may comprise the amino acid sequence represented by any of SEQ ID NOs: 3 to 16, or may consist of that amino acid sequence.

The polypeptide according to the present embodiment may have a tag sequence added to either or both of the N-terminus and the C-terminus. This enables isolation, immobilization, detection, visualization, etc. of the polypeptide according to this embodiment.

Examples of the tag sequence include affinity tags, which utilize specific affinity (binding) to another molecule. A specific example of an affinity tag is the histidine tag (His tag). The His tag is a short peptide of about 4 to 10 consecutive histidine residues that binds specifically to metal ions such as nickel; the polypeptide according to this embodiment can therefore be isolated via the His tag by metal chelate chromatography. A specific example of the tag sequence is the amino acid sequence represented by SEQ ID NO: 17 (an amino acid sequence containing a His tag).

Alternatively, tag sequences such as glutathione-S-transferase (GST) that specifically binds to glutathione and maltose binding protein (MBP) that specifically binds to maltose can be used.

An "epitope tag," which utilizes an antigen-antibody reaction, can also be used. Adding an antigenic peptide (epitope) as a tag sequence allows an antibody against that epitope to bind. Examples of epitope tags include the HA tag (a peptide sequence of influenza virus hemagglutinin), the myc tag, and the FLAG tag. An epitope tag allows the polypeptide according to this embodiment to be purified easily and with high specificity.

Furthermore, a tag sequence that can be cleaved with a specific protease can also be used. By subjecting the polypeptide of the present embodiment adsorbed via the tag sequence to a protease treatment, the polypeptide of the present embodiment from which the tag sequence is cleaved can be recovered.

Specific examples of the polypeptide containing the tag sequence include, for example, the polypeptides containing the amino acid sequences represented by any of SEQ ID NOs: 18 to 31. The amino acid sequences represented by SEQ ID NOS: 18 to 31 are obtained by adding the amino acid sequences represented by SEQ ID NO: 17 (amino acid sequences including His tag) to the N-terminals of the amino acid sequences represented by SEQ ID NOS: 3 to 16, respectively.

The polypeptide according to one embodiment may comprise the amino acid sequence shown in any of SEQ ID NOs: 18 to 31, or may consist of that amino acid sequence.

The molecular weight of the polypeptide according to this embodiment is not particularly limited, but may be, for example, 1 kDa or more and 300 kDa or less. The molecular weight of the polypeptide according to the present embodiment may be, for example, 2 kDa or more, 3 kDa or more, 4 kDa or more, 5 kDa or more, 6 kDa or more, 7 kDa or more, 8 kDa or more, 9 kDa or more, or 10 kDa or more, for example, 250 kDa or less, It may be 200 kDa or less, 150 kDa or less, 100 kDa or less, 90 kDa or less, 80 kDa or less, 70 kDa or less, 60 kDa or less, 50 kDa or less, 40 kDa or less, 30 kDa or less, 20 kDa or less, or 15 kDa or less.

The number of amino acid residues of the polypeptide according to this embodiment is not particularly limited. It may be, for example, 150 or less, 120 or less, 100 or less, 80 or less, 60 or less, 40 or less, 30 or less, 25 or less, or 20 or less, and, for example, 5 or more, 10 or more, or 15 or more. When the polypeptide according to this embodiment has few amino acid residues (for example, about 20), producing a fusion protein containing it in a microorganism or the like does not greatly increase the total number of amino acid residues. The effect on productivity can therefore be kept small while the aggregation property of the fusion protein is enhanced or newly imparted.

The polypeptide according to this embodiment may be hydrophilic or hydrophobic. For the polypeptide according to the present embodiment, the value obtained by summing the hydropathy indices (HI) of all amino acid residues constituting the polypeptide and dividing the sum by the total number of amino acid residues (average HI, hereinafter also referred to as "hydrophobicity") is preferably −1.0 or more. A known index is used as the hydropathy index of each amino acid residue (hydropathy index: Kyte J. & Doolittle R. (1982) "A simple method for displaying the hydropathic character of a protein", J. Mol. Biol., 157, 105-132). Specifically, the hydropathy index of each amino acid is as shown in Table 3 below.

The hydrophobicity of the polypeptide according to this embodiment may be −0.5 or more, 0 or more, 1 or more, 5 or more, or 10 or more, and 10 or less, 5 or less, or 1 or less.
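The average HI described above can be sketched as follows. The dictionary holds the published Kyte-Doolittle hydropathy values for the 20 standard amino acids, which are assumed to correspond to Table 3 (the table itself is not reproduced here); the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def average_hydropathy(seq: str) -> float:
    """Sum of per-residue HI values divided by the number of residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def meets_hydropathy_criterion(seq: str, minimum: float = -1.0) -> bool:
    """True if the average HI is at or above `minimum` (the preferred -1.0 by default)."""
    return average_hydropathy(seq) >= minimum
```

Because every index lies between −4.5 (Arg) and 4.5 (Ile), the average HI of any sequence also falls within that range.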

The polypeptide according to the present embodiment preferably has self-association ability under denaturing conditions, that is, its molecules interact with one another to form a specific complex under such conditions, and more preferably has self-association ability.

The polypeptide according to this embodiment may be an artificial polypeptide. As used herein, the term "artificial polypeptide" means an artificially produced polypeptide. Artificial polypeptides include, for example, polypeptides produced using gene recombination technology and chemically synthesized polypeptides.

(Denaturation conditions)
Denaturing conditions are conditions that can destroy the higher-order structure (e.g., secondary structure, tertiary structure) of a protein. Examples of denaturing conditions include the pH of the protein solution (e.g., acidic conditions of pH 1 to 4 and alkaline conditions of pH 10 to 14), the ambient temperature of the protein (e.g., low-temperature conditions of −20 °C to 5 °C and high-temperature conditions of 35 °C or higher), and the presence of a denaturant.

Examples of the denaturant include substances that break hydrogen bonds, hydrophobic bonds, ionic bonds, and disulfide bonds between and/or within protein molecules. Specific examples of the denaturant include urea, guanidine salts (e.g., guanidine thiocyanate), sodium dodecyl sulfate (SDS), 2-mercaptoethanol, and water-miscible polar solvents such as tetrahydrofuran (THF), dimethyl sulfoxide (DMSO), acetone, alcohols (e.g., methanol, ethanol, isopropanol), dioxane, dimethylformamide (DMF), and acetonitrile.
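The example conditions above can be encoded as a simple checker. This is only a sketch: the pH and temperature ranges and the denaturant names are the examples given in the text, not an exhaustive definition of denaturing conditions. DTT is included because it is used as a denaturant in the examples described in the figures. The function name `denaturing_reasons` is hypothetical.

```python
from typing import Iterable, Optional

# Denaturants named in the description (lower-case keys for matching).
DENATURANTS = frozenset({
    "urea", "guanidine thiocyanate", "sds", "2-mercaptoethanol", "dtt",
    "thf", "dmso", "acetone", "methanol", "ethanol", "isopropanol",
    "dioxane", "dmf", "acetonitrile",
})


def denaturing_reasons(ph: Optional[float] = None,
                       temp_c: Optional[float] = None,
                       additives: Iterable[str] = ()) -> list[str]:
    """List which of the example denaturing conditions apply (empty if none)."""
    reasons = []
    if ph is not None and (1 <= ph <= 4 or 10 <= ph <= 14):
        reasons.append("pH")
    if temp_c is not None and (-20 <= temp_c <= 5 or temp_c >= 35):
        reasons.append("temperature")
    found = sorted(a for a in additives if a.strip().lower() in DENATURANTS)
    if found:
        reasons.append("denaturant: " + ", ".join(found))
    return reasons
```

Returning the reasons rather than a bare boolean makes it clear which example condition triggered the result; an empty list only means that none of the listed examples apply.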

(Method for producing polypeptide)
The polypeptide according to this embodiment can be produced by a conventional method using a nucleic acid encoding the polypeptide. The nucleic acid encoding the polypeptide may be chemically synthesized based on the nucleotide sequence information, or may be synthesized by utilizing the PCR method or the like.

[Fusion protein]
The fusion protein according to this embodiment includes an amino acid sequence in which the polypeptide according to the present invention and the protein are fused.

The mode of fusion in the fusion protein according to the present embodiment is not particularly limited. The fusion protein may, for example, contain the amino acid sequence of the protein and the amino acid sequence of the polypeptide of the present invention (e.g., a fusion protein in which the amino acid sequence of the polypeptide of the present invention is attached to either or both of the N-terminus and the C-terminus of the amino acid sequence of the protein), or may contain the amino acid sequence of the polypeptide of the present invention within the amino acid sequence of the protein (e.g., a fusion protein in which the amino acid sequence of the polypeptide of the present invention is inserted partway along the amino acid sequence of the protein).

In a fusion protein in which the amino acid sequence of the polypeptide of the present invention is contained within the amino acid sequence of the protein, the number of amino acid residues from the N-terminus or C-terminus of the protein to the fusion site (the insertion site of the polypeptide of the present invention in the amino acid sequence of the protein) may be, relative to the total number of amino acid residues of the protein, for example 45% or less, 40% or less, 35% or less, 30% or less, 25% or less, or 20% or less; it is preferably 15% or less, 10% or less, 9% or less, 8% or less, 7% or less, 6% or less, or 5% or less, and more preferably 4% or less, 3% or less, 2% or less, or 1% or less. The lower this ratio, the closer to the N-terminus or C-terminus of the protein the polypeptide of the present invention is fused, and the more readily selective interaction (self-association) tends to occur.
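The ratio described above can be computed as in the following sketch. It assumes that the distance is measured from whichever terminus is nearer to the insertion site and that `position` counts the protein residues preceding the inserted polypeptide; both conventions, and the function names, are illustrative assumptions.

```python
def insert_polypeptide(protein: str, polypeptide: str, position: int) -> str:
    """Insert `polypeptide` after the first `position` residues of `protein`."""
    if not 0 <= position <= len(protein):
        raise ValueError("position must lie between 0 and len(protein)")
    return protein[:position] + polypeptide + protein[position:]


def terminal_distance_ratio(protein_length: int, position: int) -> float:
    """Residues between the nearer terminus and the fusion site, as a fraction
    of the protein's total residue count."""
    if protein_length <= 0 or not 0 <= position <= protein_length:
        raise ValueError("invalid protein length or insertion position")
    distance = min(position, protein_length - position)
    return distance / protein_length
```

For example, inserting after residue 5 of a 100-residue protein gives a ratio of 0.05 (5%), which satisfies the 10% criterion of item [11].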

(protein)
The protein in the fusion protein according to the present embodiment is not particularly limited, and any protein can be used. Specific examples of the protein include industrially usable proteins, medically usable proteins, and structural proteins. Specific examples of proteins usable for industrial or medical purposes include spider silk proteins, enzymes, regulatory proteins, receptors, peptide hormones, cytokines, membrane or transport proteins, antigens used for vaccination, vaccines, antigen-binding proteins, immunostimulatory proteins, allergens, and full-length antibodies or antibody fragments or derivatives. Specific examples of structural proteins include fibroin (e.g., spider silk fibroin (spider silk) and silkworm silk), keratin, collagen, elastin, resilin, fragments of these proteins, and proteins derived from them.

In the present specification, fibroin includes naturally derived fibroin and modified fibroin. "Naturally derived fibroin" means a fibroin having the same amino acid sequence as a naturally occurring fibroin, and "modified fibroin" means a fibroin having an amino acid sequence different from that of naturally derived fibroin.

Fibroin may be spider silk fibroin. "Spider silk fibroin" includes natural spider silk fibroin and modified fibroin derived from natural spider silk fibroin. Examples of the natural spider silk fibroin include spider silk protein (SSP) produced by spiders.

Fibroin may be, for example, a protein containing a domain sequence represented by Formula 3: [(A)_n motif-REP]_m or Formula 4: [(A)_n motif-REP]_m-(A)_n motif. The fibroin according to the present embodiment may further have amino acid sequences (an N-terminal sequence and a C-terminal sequence) added to either or both of the N-terminal side and the C-terminal side of the domain sequence. The N-terminal sequence and the C-terminal sequence are typically, but not necessarily, regions lacking the repeated amino acid motifs characteristic of fibroin, and consist of about 100 amino acids.

As used herein, the term "domain sequence" means an amino acid sequence consisting of the crystalline regions characteristic of fibroin (typically corresponding to the (A)_n motifs of the amino acid sequence) and its amorphous regions (typically corresponding to the REPs of the amino acid sequence), represented by Formula 3: [(A)_n motif-REP]_m or Formula 4: [(A)_n motif-REP]_m-(A)_n motif. Here, the (A)_n motif is an amino acid sequence composed mainly of alanine residues, with 2 to 27 amino acid residues. The number of amino acid residues in the (A)_n motif may be an integer of 2 to 20, 4 to 27, 4 to 20, 8 to 20, 10 to 20, 4 to 16, 8 to 16, or 10 to 16. The proportion of alanine residues in the total number of amino acid residues in the (A)_n motif may be 40% or more, 60% or more, 70% or more, 80% or more, 83% or more, 85% or more, 86% or more, 90% or more, 95% or more, or 100% (meaning that the motif consists only of alanine residues). At least seven of the (A)_n motifs present in the domain sequence may consist only of alanine residues. REP is an amino acid sequence of 2 to 200 amino acid residues, and may be an amino acid sequence of 10 to 200 amino acid residues. m is an integer of 2 to 300, and may be an integer of 10 to 300. The plurality of (A)_n motifs may have the same or different amino acid sequences, and the plurality of REPs may have the same or different amino acid sequences.
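The constraints on Formula 3 and Formula 4 can be illustrated by a builder that validates each part before concatenating. This sketch assumes "mainly composed of alanine" means an alanine share of at least 40% (the loosest value listed) and uses only the broadest length ranges (motif 2 to 27, REP 2 to 200, m 2 to 300); `build_domain_sequence` is a hypothetical helper, not a method from the source.

```python
from typing import Optional


def alanine_fraction(motif: str) -> float:
    """Share of alanine residues in a motif."""
    return motif.count("A") / len(motif)


def build_domain_sequence(motifs: list[str], reps: list[str],
                          trailing_motif: Optional[str] = None,
                          min_alanine: float = 0.4) -> str:
    """Assemble [(A)_n motif-REP]_m (Formula 3), or Formula 4 when
    `trailing_motif` is given, after checking the stated ranges."""
    if len(motifs) != len(reps):
        raise ValueError("each (A)_n motif must be paired with exactly one REP")
    if not 2 <= len(motifs) <= 300:
        raise ValueError("m must be an integer from 2 to 300")
    checked = list(motifs) + ([trailing_motif] if trailing_motif is not None else [])
    for motif in checked:
        if not 2 <= len(motif) <= 27:
            raise ValueError(f"(A)_n motif {motif!r} must have 2 to 27 residues")
        if alanine_fraction(motif) < min_alanine:
            raise ValueError(f"(A)_n motif {motif!r} is not alanine-rich enough")
    for rep in reps:
        if not 2 <= len(rep) <= 200:
            raise ValueError(f"REP {rep!r} must have 2 to 200 residues")
    # Formula 3: m motif-REP units; Formula 4 appends one final (A)_n motif.
    domain = "".join(motif + rep for motif, rep in zip(motifs, reps))
    return domain + (trailing_motif or "")
```

Raising the `min_alanine` argument to 1.0 gives motifs that consist only of alanine residues, the strictest option listed above.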

Examples of naturally derived fibroin include proteins containing a domain sequence represented by Formula 3: [(A)_n motif-REP]_m or Formula 4: [(A)_n motif-REP]_m-(A)_n motif. Specific examples of naturally derived fibroin include fibroin produced by insects or arachnids.

Examples of fibroin produced by insects include silk proteins produced by silkworms such as Bombyx mori, Bombyx mandarina, Antheraea yamamai, Antheraea pernyi, Samia cynthia, Caligula japonica, Antheraea mylitta, and Antheraea assama, as well as hornet silk protein.

More specific examples of fibroin produced by insects include the silkworm fibroin L chain (GenBank Accession No. M76430 (nucleotide sequence) and AAA27840.1 (amino acid sequence)).

Examples of fibroin produced by arachnids include spider silk proteins produced by spiders belonging to the order Araneae. More specific examples include spider silk proteins produced by spiders belonging to the genera Araneus, Pronus, Cyrtarachne, Gasteracantha, Ordgarius, Argiope, Acusilas, Cyrtophora, Poltys, Cyclosa, and Chorizopes, and spider silk proteins produced by spiders belonging to the family Tetragnathidae, such as spiders of the genera Tetragnatha, Leucauge, Menosira, and Dyschiriognatha, and by spiders belonging to the genus Euprosthenops. Examples of spider silk proteins include dragline proteins such as MaSp (MaSp1 and MaSp2) and ADF (ADF3 and ADF4), MiSp (MiSp1 and MiSp2), AcSp, PySp, and Flag.

Examples of keratin-derived proteins include Capra hircus type I keratin.

Examples of collagen-derived proteins include proteins containing a domain sequence represented by Formula 5: [REP2]_p (in Formula 5, p is an integer of 5 to 300; REP2 is an amino acid sequence composed of Gly-X-Y, where X and Y are any amino acid residues other than Gly; and the plurality of REP2s may have the same or different amino acid sequences).

Examples of elastin-derived proteins include proteins having amino acid sequences such as NCBI GenBank Accession Nos. AAC98395 (human), I47076 (sheep), and NP786966 (bovine).

Examples of resilin-derived proteins include proteins containing a domain sequence represented by Formula 6: [REP3]_q (in Formula 6, q is an integer of 4 to 300; REP3 is an amino acid sequence composed of Ser-J-J-Tyr-Gly-U-Pro, where J is any amino acid residue, particularly preferably one selected from the group consisting of Asp, Ser, and Thr, and U is any amino acid residue, preferably one selected from the group consisting of Pro, Ala, Thr, and Ser; and the plurality of REP3s may have the same or different amino acid sequences).
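Formula 5 (collagen) and Formula 6 (resilin) are regular patterns, so a domain can be checked with a regular expression. This sketch assumes one-letter codes for the 20 standard amino acids. It tests whether a whole sequence is exactly such a domain (a protein that merely contains one would use `search` instead of `fullmatch`), and it does not enforce the preferred residues for J and U. The pattern and function names are hypothetical.

```python
import re

STANDARD = "ACDEFGHIKLMNPQRSTVWY"
NON_GLY = STANDARD.replace("G", "")

# Formula 5: [REP2]_p with REP2 = Gly-X-Y (X, Y not Gly), p = 5..300.
COLLAGEN_DOMAIN = re.compile(rf"(?:G[{NON_GLY}][{NON_GLY}]){{5,300}}")

# Formula 6: [REP3]_q with REP3 = Ser-J-J-Tyr-Gly-U-Pro (J, U any), q = 4..300.
RESILIN_DOMAIN = re.compile(rf"(?:S[{STANDARD}]{{2}}YG[{STANDARD}]P){{4,300}}")


def is_collagen_domain(seq: str) -> bool:
    """True if the whole sequence matches Formula 5."""
    return COLLAGEN_DOMAIN.fullmatch(seq) is not None


def is_resilin_domain(seq: str) -> bool:
    """True if the whole sequence matches Formula 6."""
    return RESILIN_DOMAIN.fullmatch(seq) is not None
```

The doubled braces in the f-strings produce literal regex quantifiers such as `{5,300}`, so the repeat counts p and q come straight from the formula definitions.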

(Method for producing fusion protein)
The fusion protein according to this embodiment can be produced by a conventional method using a nucleic acid encoding the fusion protein. The nucleic acid encoding the fusion protein may be chemically synthesized based on the nucleotide sequence information, or may be synthesized by using the PCR method or the like.

[Nucleic acid]
The nucleic acid according to the present embodiment encodes the polypeptide according to the present invention or the fusion protein according to the present invention.

The nucleic acid according to one embodiment may be a nucleic acid that hybridizes under stringent conditions with a complementary strand of the nucleic acid encoding the polypeptide of the present invention or the fusion protein of the present invention.

"Stringent conditions" are conditions under which so-called specific hybrids are formed and non-specific hybrids are not. The stringent conditions may be any of low-, medium-, and high-stringency conditions. Low-stringency conditions are conditions under which hybridization occurs only between sequences sharing at least 85% identity, for example, hybridization at 42 °C using 5×SSC containing 0.5% SDS. Medium-stringency conditions are conditions under which hybridization occurs only between sequences sharing at least 90% identity, for example, hybridization at 50 °C using 5×SSC containing 0.5% SDS. High-stringency conditions are conditions under which hybridization occurs only between sequences sharing at least 95% identity, for example, hybridization at 60 °C using 5×SSC containing 0.5% SDS.

The nucleic acid according to one embodiment may be a nucleic acid having 90% or more sequence identity with the nucleic acid encoding the polypeptide of the present invention or the fusion protein of the present invention. The sequence identity is preferably 95% or more.
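The text does not state how sequence identity is computed, and conventions differ in the choice of denominator. As one possible sketch, assuming the two sequences have already been aligned to equal length with `-` marking gaps, identity can be taken as the number of identical columns divided by the number of columns that are not gaps in both sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap).

    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch. Comparison is case-sensitive.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical aligned fragments, not sequences from this document.
    pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(pct, pct >= 90.0)
```

In practice the alignment itself would come from a dedicated alignment tool; this helper only scores an alignment it is given.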

Specific examples of the nucleic acid according to the present embodiment include nucleic acids having the nucleotide sequences represented by SEQ ID NOs: 32, 33 and 34. These nucleotide sequences encode the amino acid sequences represented by SEQ ID NOs: 3, 5 and 7, respectively.

The nucleic acid according to this embodiment may be provided in a form incorporated in an expression vector. The expression vector has the nucleic acid sequence according to the present embodiment and one or more regulatory sequences operably linked to the nucleic acid sequence. The regulatory sequence is a sequence that controls the expression of the recombinant protein in the host (for example, a promoter, an enhancer, a ribosome binding sequence, a transcription termination sequence, etc.), and can be appropriately selected depending on the type of host. The type of expression vector can be appropriately selected according to the type of host, such as a plasmid vector, a virus vector, a cosmid vector, a fosmid vector, or an artificial chromosome vector.

[Method for imparting self-association ability to a protein]
The present invention can also be regarded as a method for imparting a self-association ability to a protein, which comprises fusing the polypeptide of the present invention with a protein. Specific aspects such as the method for producing the polypeptide, protein, and fusion protein are as described above.

Hereinafter, the present invention will be described more specifically based on examples. However, the present invention is not limited to the following examples.

[Test Example 1: Evaluation of aggregation free energy g(k)]
The aggregation free energy g(k) was calculated according to the following formula (1) for two amino acid sequences: #GEN891 (SEQ ID NO: 1), in which the tag sequence represented by SEQ ID NO: 17 is added to the N-terminus of the major ampullate dragline silk protein ADF4 of Araneus diadematus, and #GEN889 (SEQ ID NO: 2), in which the tag sequence represented by SEQ ID NO: 17 is added to the N-terminus of the major ampullate dragline silk protein ADF3 of the same spider.

In formula (1), ε^p_{i,j}(L) and ε^a_{i,j}(L) are energies calculated for a pair of amino acid residues (see Table 1 and Table 2).

Further, in formula (1), δ_{i≤k<i+L} is 1 when i ≤ k ≤ i+L−1, and 0 otherwise. δ_{j≤k<j+L} is 1 when j ≤ k ≤ j+L−1, and 0 otherwise. i and j correspond to residue numbers in the amino acid sequence, with the first residue numbered 1. λ is 2.0.
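Formula (1) itself is not reproduced in this text, so the sketch below covers only the indicator terms defined above: it evaluates δ for a segment of length L starting at a 1-based residue number and lists which residues k a segment pair starting at i and j touches. How ε^p, ε^a and λ combine into g(k) is deliberately not assumed; the function names are hypothetical.

```python
from typing import List


def delta(start: int, length: int, k: int) -> int:
    """δ_{start≤k<start+length}: 1 if residue k lies in start..start+length-1, else 0."""
    return 1 if start <= k < start + length else 0


def residues_touched(i: int, j: int, length: int, n_residues: int) -> List[int]:
    """1-based residues k for which δ_{i≤k<i+L} or δ_{j≤k<j+L} equals 1."""
    return [
        k
        for k in range(1, n_residues + 1)
        if delta(i, length, k) or delta(j, length, k)
    ]


if __name__ == "__main__":
    # Segment pair starting at residues 1 and 10, each L = 3 residues long.
    print(residues_touched(1, 10, 3, 20))
```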

FIG. 1 is a graph plotting the aggregation free energy g(k) against residue number for #GEN891 (SEQ ID NO: 1), in which the tag sequence represented by SEQ ID NO: 17 is added to the N-terminus of the major ampullate dragline silk protein ADF4 of Araneus diadematus. FIG. 2 is the corresponding graph for #GEN889 (SEQ ID NO: 2), in which the same tag sequence is added to the N-terminus of the major ampullate dragline silk protein ADF3 of the same spider. In #GEN891, g(k) was −4.0 or less for every residue from residue 546 to residue 582 (FIG. 1). In #GEN889, g(k) was −4.0 or less for every residue from residue 547 to residue 579 (FIG. 2). Because every residue in each of these stretches has g(k) of −4.0 or less, the average g(k) per residue over each stretch is necessarily −4.0 or less as well. The lower g(k), the stronger the tendency to aggregate; these amino acid sequences are therefore considered to have self-association ability.
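Given a per-residue g(k) profile, stretches such as residues 546 to 582 can be located by scanning for contiguous runs at or below −4.0. The sketch below does that and averages g(k) over a run, illustrating why the run average cannot exceed the threshold. It takes g(k) values as input and does not re-implement formula (1); the profile in the usage line is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def runs_at_or_below(g: Sequence[float], threshold: float = -4.0) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) ranges in which every g(k) <= threshold."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for k, value in enumerate(g, start=1):
        if value <= threshold:
            if start is None:
                start = k
        elif start is not None:
            runs.append((start, k - 1))
            start = None
    if start is not None:
        runs.append((start, len(g)))
    return runs


def mean_g(g: Sequence[float], start: int, end: int) -> float:
    """Average g(k) per residue over residues start..end (1-based, inclusive)."""
    segment = g[start - 1:end]
    return sum(segment) / len(segment)


if __name__ == "__main__":
    profile = [-1.0, -4.5, -5.0, -4.0, -2.0, -6.0]  # hypothetical g(k) values
    for s, e in runs_at_or_below(profile):
        print(s, e, mean_g(profile, s, e))
```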

[Test Example 2: Production and evaluation of polypeptide having self-association ability]
(1) Production of polypeptides having self-association ability
Polypeptides having the amino acid sequences represented by SEQ ID NOs: 18, 20 and 22 were designed. The amino acid sequence represented by SEQ ID NO: 18 is obtained by adding the amino acid sequence represented by SEQ ID NO: 17 (tag sequence) to the amino acid sequence of the C-terminal non-repetitive region (NRC region) of the major ampullate dragline silk protein ADF4 of Araneus diadematus (His-ADF4-NRC; average HI 0.297; molecular weight 12.9 kDa). The amino acid sequence represented by SEQ ID NO: 20 is obtained by adding the tag sequence represented by SEQ ID NO: 17 to the amino acid sequence of the C-terminal non-repetitive region of the major ampullate dragline silk protein ADF3 of the same spider (His-ADF3-NRC; average HI 0.221; molecular weight 13.5 kDa). The amino acid sequence represented by SEQ ID NO: 22 is obtained by adding the tag sequence represented by SEQ ID NO: 17 to the C-terminal non-repetitive region of ADF3 of A. diadematus from which helix 5, the helix located closest to the C-terminus, has been deleted (His-ADF3-shortNRC; average HI 0.777; molecular weight 10.6 kDa). The polypeptides having the amino acid sequences represented by SEQ ID NOs: 18, 20 and 22 each contain part or all of the amino acid sequence consisting of the residues of #GEN891 or #GEN889 whose aggregation free energy g(k) is −4.0 or less.

A nucleic acid encoding each polypeptide was cloned into a cloning vector (pUC118). Each nucleic acid was then excised by digestion with the restriction enzymes NdeI and EcoRI and ligated into the protein expression vector pET-22b(+) to obtain an expression vector. Escherichia coli BLR(DE3) was transformed with the resulting expression vector to obtain transformed E. coli (recombinant cells) expressing each polypeptide.

The resulting transformed E. coli was cultured for 15 hours in 2 mL of LB medium containing ampicillin. This culture was added to 100 mL of seed culture medium containing ampicillin (Table 4) to an OD₆₀₀ of 0.005. With the culture temperature kept at 30°C, flask culture was performed until the OD₆₀₀ reached 5 (about 15 hours) to obtain a seed culture.

The seed culture was added to a jar fermenter containing 500 mL of production medium (Table 5) to an OD₆₀₀ of 0.05. The culture was maintained at 37°C and a constant pH of 6.9, and the dissolved oxygen concentration was maintained at 20% of the saturation concentration.

Immediately after the glucose in the production medium was completely consumed, a feed solution (glucose 455 g/L, yeast extract 120 g/L) was added at a rate of 1 mL/min. Culture was continued for 20 hours at 37°C and a constant pH of 6.9, with the dissolved oxygen concentration maintained at 20% of the saturation concentration. Then, 1 M isopropyl-β-D-thiogalactopyranoside (IPTG) was added to the culture to a final concentration of 1 mM to induce expression of the target polypeptide. Twenty hours after IPTG addition, the culture was centrifuged to collect the bacterial cells. Cells prepared from the culture before and after IPTG addition were disrupted by sonication and centrifuged to recover the target polypeptide. SDS-PAGE of the recovered material showed a band corresponding to the molecular weight of the target polypeptide that appeared only after IPTG addition, confirming that the target polypeptide was expressed as an insoluble product.
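The inoculation and induction steps above are simple dilutions. A minimal sketch, assuming the added volume counts toward the final volume (stock·v = target·(V₀ + v)) and treating OD₆₀₀ as proportional to cell density, which holds only approximately: the seed culture at OD₆₀₀ ≈ 5 diluted into 500 mL to reach 0.05, and 1 M IPTG dosed to 1 mM. The culture volume at induction is not stated in the text (the feed adds volume), so the 1,000 mL used below is a hypothetical value.

```python
def volume_to_add(stock: float, target: float, base_volume: float) -> float:
    """Volume of stock to add to base_volume so that the mixture reaches target.

    Mass balance: stock * v = target * (base_volume + v)
      => v = target * base_volume / (stock - target)
    stock and target must share units; the result has the units of base_volume.
    """
    if not 0 <= target < stock:
        raise ValueError("target must be non-negative and below the stock value")
    return target * base_volume / (stock - target)


if __name__ == "__main__":
    # Seed culture (OD600 about 5) into 500 mL production medium to OD600 0.05.
    print(f"seed inoculum: {volume_to_add(5.0, 0.05, 500.0):.2f} mL")
    # 1 M (1000 mM) IPTG to 1 mM in a hypothetical 1000 mL culture.
    print(f"IPTG stock: {volume_to_add(1000.0, 1.0, 1000.0):.3f} mL")
```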

(2) Solubilization of polypeptides having self-association ability
Each target polypeptide was collected in an Eppendorf tube, and sodium dodecyl sulfate (SDS) (2%; FUJIFILM Wako Pure Chemical, biochemistry grade) and guanidine thiocyanate (FUJIFILM Wako Pure Chemical, biochemistry grade) were added to dissolve it. FIG. 3 is a photograph showing the results of SDS-PAGE analysis, in the presence of a denaturant (2-mercaptoethanol), of the supernatant and precipitate of each polypeptide (His-ADF3-NRC (SEQ ID NO: 20) or His-ADF3-shortNRC (SEQ ID NO: 22)) treated with SDS and guanidine thiocyanate. Bands corresponding to the molecular weight of each polypeptide (His-ADF3-NRC and His-ADF3-shortNRC) were detected mainly in the supernatant, showing that each polypeptide was solubilized by the addition of SDS and guanidine thiocyanate.

(3) Confirmation of self-association ability
Each polypeptide was dissolved in an SDS-containing buffer (1% SDS, 5 mM Tris-HCl, 2 mM DTT, pH 8.0), the solvent was exchanged by dialysis against a urea-containing buffer (8 M urea, 25 mM Tris-HCl, pH 8.0), and column purification using the histidine tag was performed under the following conditions.

(Column purification)
A solution containing each polypeptide was applied to a His GraviTrap column (bed volume 1 mL, GE Healthcare) equilibrated with equilibration buffer (8 M urea, 100 mM sodium chloride, 25 mM Tris-HCl, 10 mM imidazole, pH 8.0), and the polypeptide was adsorbed onto the column. The column was washed with 4 mL of wash buffer (8 M urea, 100 mM sodium chloride, 25 mM Tris-HCl, 10 mM imidazole, pH 8.0). Next, 3 mL of first elution buffer (8 M urea, 100 mM sodium chloride, 25 mM Tris-HCl, 300 mM imidazole, pH 8.0) was passed through the column to elute the polypeptide. Then 3 mL of second elution buffer (8 M urea, 100 mM sodium chloride, 25 mM Tris-HCl, 500 mM imidazole, pH 8.0) was passed through the column to elute the remaining polypeptide.
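All buffers in this purification share one base composition and differ only in imidazole concentration, so the protocol can be recorded as data. The sketch below is a hypothetical representation, not part of the source: it stores each step's volume (None where the text gives none) and imidazole level, rebuilds each buffer from the shared base, and checks that imidazole rises stepwise. The loading step is omitted because the load solution is the dialysis buffer, which has a different composition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Shared base composition of the equilibration, wash and elution buffers.
BASE_BUFFER: Dict[str, float] = {"urea_M": 8.0, "NaCl_mM": 100.0, "Tris_HCl_mM": 25.0, "pH": 8.0}


@dataclass(frozen=True)
class Step:
    name: str
    volume_ml: Optional[float]  # None where the text gives no volume
    imidazole_mM: float

    def buffer(self) -> Dict[str, float]:
        """Full buffer composition for this step."""
        return {**BASE_BUFFER, "imidazole_mM": self.imidazole_mM}


PROTOCOL: List[Step] = [
    Step("equilibrate", None, 10.0),
    Step("wash", 4.0, 10.0),
    Step("elute1", 3.0, 300.0),
    Step("elute2", 3.0, 500.0),
]


def imidazole_is_stepwise(steps: List[Step]) -> bool:
    """True if imidazole never decreases from one step to the next."""
    levels = [s.imidazole_mM for s in steps]
    return all(a <= b for a, b in zip(levels, levels[1:]))


def total_volume(steps: List[Step], prefix: str = "elute") -> float:
    """Summed volume (mL) of steps whose name starts with prefix."""
    return sum(s.volume_ml for s in steps if s.name.startswith(prefix) and s.volume_ml is not None)


if __name__ == "__main__":
    print(imidazole_is_stepwise(PROTOCOL), total_volume(PROTOCOL))
```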

FIG. 4 is a photograph showing the results of SDS-PAGE analysis, in the absence of a denaturant, of each fraction collected during column purification. In FIG. 4, “through” is the flow-through when the lysate was loaded onto the column, “wash” is the flow-through when the column was washed, “elute1” is the fraction eluted with the first elution buffer, and “elute2” is the fraction eluted with the second elution buffer. As shown in FIG. 4, each polypeptide was detected as a band with a molecular weight corresponding to a dimer. This confirmed that each polypeptide dimerizes in the absence of a denaturant, that is, that it has self-association ability.
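Assigning a band as monomer or dimer amounts to comparing its apparent molecular weight with one or two times the monomer weight. The sketch below uses the monomer weights given in (1) (12.9, 13.5 and 10.6 kDa) with a hypothetical ±15% relative tolerance; apparent weights on SDS-PAGE can deviate from calculated ones, so the tolerance is an assumption, and the observed values in the usage line are illustrative, not measurements from FIG. 4.

```python
# Monomer molecular weights (kDa) as stated in Test Example 2 (1).
MONOMER_KDA = {"His-ADF4-NRC": 12.9, "His-ADF3-NRC": 13.5, "His-ADF3-shortNRC": 10.6}


def classify_band(observed_kda: float, monomer_kda: float, rel_tol: float = 0.15) -> str:
    """Label a band 'monomer' or 'dimer' if its apparent MW is within rel_tol of n x monomer."""
    for label, n in (("monomer", 1), ("dimer", 2)):
        expected = n * monomer_kda
        if abs(observed_kda - expected) / expected <= rel_tol:
            return label
    return "unassigned"


if __name__ == "__main__":
    for name, mw in MONOMER_KDA.items():
        print(f"{name}: expected dimer about {2 * mw:.1f} kDa")
    print(classify_band(27.0, MONOMER_KDA["His-ADF3-NRC"]))  # illustrative value
```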

(4) Confirmation of self-association ability in the presence of a denaturant
Using the polypeptide having the amino acid sequence represented by SEQ ID NO: 18 (His-ADF4-NRC), column-purified under the same conditions as in (3) above, it was examined whether the polypeptide has selective self-association ability in the presence of a denaturant (DTT: dithiothreitol). As a control, SNase (K116C), prepared according to the methods described in non-patent documents (Proteins: Struct. Funct. Genet., vol. 49, pp. 255-265 (2002); Proteins: Struct. Funct. Bioinformat., vol. 58, pp. 271-277 (2005)), was used in place of the polypeptide.

DTT was added to a His-ADF4-NRC solution (solvent: 5 M guanidine thiocyanate, 10 mM Tris-HCl, pH 8) to a concentration of 5 mM. The DTT-containing solution was transferred to dialysis tubing (BioDesign Dialysis Tubing #D100, 8,000 MWCO, BioDesign Inc.), placed in dialysate (5 M guanidine thiocyanate, 10 mM Tris-HCl, pH 8), and dialyzed for 12 hours. Samples of the His-ADF4-NRC solution were collected before DTT addition, immediately after DTT addition, and 3, 6 and 12 hours after the start of dialysis, and analyzed by SDS-PAGE. The same analysis was performed for SNase (K116C). FIG. 5 is a photograph showing the results of SDS-PAGE analysis of the polypeptide (His-ADF4-NRC) solution before addition of the denaturant, immediately after addition of the denaturant (DTT), and 3, 6 and 12 hours after the start of dialysis.

As shown in FIG. 5, the polypeptide having the amino acid sequence represented by SEQ ID NO: 18 (His-ADF4-NRC) was a monomer immediately after addition of the denaturant, but was almost completely dimerized 3 hours after the start of dialysis. In contrast, SNase (K116C) still contained undimerized protein even 12 hours after the start of dialysis.

In the presence of a denaturant, a protein cannot maintain its three-dimensional structure, and encounters between molecules are governed by random thermal fluctuations, so protein-protein interactions are normally extremely inefficient. Consistent with this, a general protein (SNase (K116C)) did not dimerize efficiently, whereas the polypeptide having the amino acid sequence represented by SEQ ID NO: 18 (His-ADF4-NRC) dimerized with extremely high efficiency by 3 hours after the start of dialysis, a point at which the effect of the denaturant is considered to persist. That is, the polypeptide having the amino acid sequence represented by SEQ ID NO: 18 (His-ADF4-NRC) can be said to have selective self-association ability even in the presence of a denaturant. In addition, this polypeptide can form a disulfide bond in a short time through selective interaction. The partial dimerization of SNase (K116C) is considered to result from molecular collisions caused by Brownian motion.

[Test Example 3: Design and evaluation of polypeptide having self-association ability]
(1) Design of polypeptides having self-association ability
A profile hidden Markov model (HMM) was built from the C-terminal amino acid sequences of known major ampullate spidroins (MaSp) (amino acid sequences represented by SEQ ID NOs: 27 to 191). The hmmemit program was then run with the built HMM as input to generate amino acid sequences with an E-value smaller than 10⁻⁵ (10,000 sequences). For each generated sequence, the average aggregation free energy g(k) per amino acid residue was calculated for every window of 5 consecutive residues, and the amino acid sequences represented by SEQ ID NOs: 10, 12, 14 and 16 were selected as sequences whose minimum window average was −4.5 or less.
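The selection step can be sketched as a filter over candidate sequences. The sketch assumes each candidate already carries its E-value and per-residue g(k) profile (sequence generation, E-value scoring and formula (1) are not reproduced), and it reads "every 5 consecutive residues" as overlapping windows advanced one residue at a time, which is an interpretation rather than something the text states. Names and the demo data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def window_means(g: Sequence[float], width: int = 5) -> List[float]:
    """Average g(k) over every run of `width` consecutive residues (step of 1)."""
    if len(g) < width:
        raise ValueError("profile shorter than window")
    return [sum(g[s:s + width]) / width for s in range(len(g) - width + 1)]


def select_candidates(
    candidates: Dict[str, Tuple[float, Sequence[float]]],
    e_cutoff: float = 1e-5,
    threshold: float = -4.5,
    width: int = 5,
) -> List[str]:
    """Names whose E-value < e_cutoff and whose minimum window mean <= threshold."""
    selected = []
    for name, (e_value, g) in candidates.items():
        if e_value < e_cutoff and min(window_means(g, width)) <= threshold:
            selected.append(name)
    return selected


if __name__ == "__main__":
    demo = {  # hypothetical (E-value, g(k) profile) pairs
        "a": (1e-8, [0, 0, -5, -5, -5, -5, -5, 0]),
        "b": (1e-8, [-4.0] * 8),
        "c": (1e-3, [-6.0] * 8),
    }
    print(select_candidates(demo))
```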

(2) Production of a polypeptide having self-association ability
A polypeptide having the amino acid sequence represented by SEQ ID NO: 30 (His-ADF3-9467) was designed. The amino acid sequence represented by SEQ ID NO: 30 is obtained by adding the amino acid sequence represented by SEQ ID NO: 17 (tag sequence) to the N-terminus of the amino acid sequence represented by SEQ ID NO: 15. The amino acid sequence represented by SEQ ID NO: 15 is the amino acid sequence represented by SEQ ID NO: 16 with an additional sequence at its N-terminus.

The polypeptide having the amino acid sequence represented by SEQ ID NO: 30 was produced and solubilized by the same procedure as in Test Example 2, except that a nucleic acid encoding this polypeptide was used, and its self-association ability in the presence of a denaturant was examined by the same procedure as in Test Example 2. For this examination, samples of the polypeptide (His-ADF3-9467) solution were collected immediately after DTT addition and 1.5, 3, 6 and 8 hours after the start of dialysis, and analyzed by SDS-PAGE. The results are shown in FIG. 6.

FIG. 6 is a photograph showing the results of SDS-PAGE analysis of the polypeptide (His-ADF3-9467) solution immediately after addition of the denaturant (DTT) (0 hours) and 1.5, 3, 6 and 8 hours after the start of dialysis. As shown in FIG. 6, the polypeptide having the amino acid sequence represented by SEQ ID NO: 30 (His-ADF3-9467) was a monomer immediately after addition of the denaturant but had formed a dimer 1.5 hours after the start of dialysis, confirming that it has selective self-association ability even in the presence of a denaturant.

Claims

A polypeptide having an amino acid sequence in which the average value of the aggregation free energy g(k) calculated by the following formula (1) per amino acid residue is −4.0 or less.

[In formula (1), ε^p_{i,j}(L) and ε^a_{i,j}(L) are energies calculated for a pair of amino acid residues. δ_{i≤k<i+L} is 1 when i ≤ k ≤ i+L−1, and 0 otherwise. δ_{j≤k<j+L} is 1 when j ≤ k ≤ j+L−1, and 0 otherwise. i and j correspond to residue numbers in the amino acid sequence, with the first residue numbered 1. λ is 2.0.]
A polypeptide comprising an amino acid sequence having 60% or more sequence identity with the amino acid sequence represented by any of SEQ ID NOs: 3 to 16.
The polypeptide according to claim 1 or 2, comprising a tag sequence at either or both of the N-terminus and the C-terminus.
The polypeptide according to claim 3, wherein the tag sequence comprises the amino acid sequence shown by SEQ ID NO:17.
A polypeptide comprising the amino acid sequence shown in any of SEQ ID NOs: 3 to 16.
A polypeptide comprising the amino acid sequence shown in any of SEQ ID NOs: 18 to 31.
A nucleic acid encoding the polypeptide according to any one of claims 1 to 6.
A fusion protein comprising the amino acid sequence obtained by fusing the polypeptide according to any one of claims 1 to 6 and a protein.
The fusion protein according to claim 8, comprising the amino acid sequence of the protein and the amino acid sequence of the polypeptide.
The fusion protein according to claim 8, which comprises the amino acid sequence of the polypeptide in the amino acid sequence of the protein.
The fusion protein according to claim 10, wherein the number of amino acid residues from the N terminus or C terminus of the protein to the fusion site is 10% or less with respect to the total number of amino acid residues of the protein.
The fusion protein according to any one of claims 8 to 11, wherein the protein is a structural protein.
The fusion protein according to any one of claims 8 to 12, wherein the protein is a spider silk protein.
A nucleic acid encoding the fusion protein according to any one of claims 8 to 13.
A method for imparting self-association ability to a protein, comprising fusing the polypeptide according to any one of claims 1 to 6 with the protein.