WO2017127940A1 - Artificial spider aciniform silk proteins, methods of making and uses thereof
- Publication number
- WO2017127940A1 (PCT/CA2017/050099)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sasp
- seq
- recombinant
- amino acid
- protein
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/435—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
- C07K14/43504—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from invertebrates
- C07K14/43513—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from invertebrates from arachnidae
- C07K14/43518—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from invertebrates from arachnidae from spiders
-
- D—TEXTILES; PAPER
- D01—NATURAL OR MAN-MADE THREADS OR FIBRES; SPINNING
- D01D—MECHANICAL METHODS OR APPARATUS IN THE MANUFACTURE OF ARTIFICIAL FILAMENTS, THREADS, FIBRES, BRISTLES OR RIBBONS
- D01D5/00—Formation of filaments, threads, or the like
- D01D5/06—Wet spinning methods
-
- D—TEXTILES; PAPER
- D01—NATURAL OR MAN-MADE THREADS OR FIBRES; SPINNING
- D01F—CHEMICAL FEATURES IN THE MANUFACTURE OF ARTIFICIAL FILAMENTS, THREADS, FIBRES, BRISTLES OR RIBBONS; APPARATUS SPECIALLY ADAPTED FOR THE MANUFACTURE OF CARBON FILAMENTS
- D01F4/00—Monocomponent artificial filaments or the like of proteins; Manufacture thereof
- D01F4/02—Monocomponent artificial filaments or the like of proteins; Manufacture thereof from fibroin
Definitions
- the present invention relates generally to isolated artificial spider aciniform silk proteins, referred to herein as (SASPs).
- the present invention also relates to methods of making recombinant SASPs and uses of these SASPs to produce commercially important spun aciniform silk fibers.
- Spider silks are extraordinary biomaterials with diverse and impressive mechanical properties such as high toughness and tensile strength, making them mechanically superior to synthetic materials such as polyester and nylon. Spider silks are also biodegradable, making them desirable for many downstream applications in industry and medicine.
- Spiders produce up to seven types of silk, serving different biological functions such as web construction and locomotion (major ampullate (MA) silk, or dragline silk) or capturing prey as the orb-web spiral (flagelliform silk).
- Aciniform silk, or wrapping silk, is the toughest spider silk and is used to wrap and immobilize prey, build the egg case inner layer, and decorate the web. Its toughness is more than two, seven and sixty times that of spider dragline silk, Kevlar and steel, respectively.
- The main component of aciniform silk is the protein aciniform spidroin 1 (AcSp1), which is typically a large protein (~300-430 kDa, depending upon the species) composed of
- AcSp1 undergoes a structural transition, from a state composed of globular helical domains connected by intrinsically disordered linkers to a state retaining a similar proportion of disorder alongside a mixture of oriented β-sheet and α-helical domains.
- the conversion from α-helix to β-sheet is believed to be important in providing strength to silk fibers.
- A previously isolated and characterized AcSp1, including its amino acid sequence, can be found under National Center for Biotechnology Information (NCBI) Accession No. AAR83925.1.
- spiders are cannibalistic and territorial in nature, making it infeasible to collect large amounts of silk for commercial applications. As well, it has been found that spiders produce silk in lower quantities when held in captivity, making collection of large amounts even more difficult.
- yeast cells, mammalian cells, insect cells, bacterial cells, transgenic plants, and animals have been used to produce recombinant spider silks.
- an Escherichia coli system has many advantages over other methods because of its high yield, cost effectiveness and simplicity in cultivation and protein purification. Since fiber formation in aciniform silk is poorly understood and harvesting large amounts of fibrous material from spiders is not feasible, coupled with the ability to readily modify gene/protein sequence in E. coli, employment of recombinant spider silk proteins is therefore highly desirable.
- the spinning dope is known to be an essential component involved in spinning fibers and can also influence fiber mechanical properties.
- 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP) has been used to dissolve recombinant spider silk proteins at relatively high concentrations. A major obstacle with artificial spider silk proteins is finding an effective means of preparing a sufficiently concentrated spinning dope for wet-spinning.
- dimethylformamide and dimethylacetamide combined with organic salts such as N-methylmorpholine-N-oxide (NMMO).
- Fluorinated organic solvents such as HFIP and HFA, which are potent hydrogen bond disruptors, have been reported to cause no measurable degradation to recombinant spider silk proteins.
- Electrospinning has been applied to regenerated Bombyx silk and recombinant MA silk proteins.
- Another method that has been employed is the use of microfluidic devices for elongational flow (rather than shearing flow), allowing for changes in pH, ionic strength and salt composition, essentially acting as a biomimetic of the MA gland.
- the wet-spinning method subjects a spinning dope to shear force followed by extrusion into a coagulation bath, serving to amalgamate the protein in a solid fiber (just like the spider).
- IPA: isopropyl alcohol
- MeOH: methanol
- the fiber thus induced (referred to as "as-spun” (AS) at this stage) may then be wound onto a spool/collector.
- Although spider silk proteins including major ampullate silk proteins (MaSp1, MaSp2 and ADF3), flagelliform silk proteins and tubuliform silk proteins (TuSp1) have been spun into fibers using a wet-spinning method, wet-spinning of aciniform silk has not been previously reported.
- Post-spin stretching treatments. To further promote molecular alignment within AS fibers, a post-spin stretching treatment is typically performed. As the name implies, this is carried out by stretching the AS fibers in a solvent. This step is usually used to promote favorable structural changes within the fiber that lead to improvements in mechanical properties such as strength, extensibility and toughness. Like the wet-spinning method, the rationale of the post-spinning treatment is based on what is known about the MA spinning gland.
- the spinning duct is designed to allow for elongational flow (and/or mechanical drawing) of the β-sheet crystals and/or micelle spheroids (henceforth simply referred to as spheroids) diverging into an endpoint, the spinneret(s)/spigots.
- Elongational flow is postulated to allow for increased velocity with a simultaneous increase in shearing forces, inducing unfolding and subsequent refolding of the protein structures. This is held to promote alignment within the spheroids and result in self-assembly into fibrillar precursors for fiber formation.
- This hypothetical mechanism is consistent with behavior of silk in studies using small angle X-ray scattering and atomic force microscopy, confocal scanning light microscopy and electron microscopy, as well as solid-state nuclear magnetic resonance spectroscopy.
- shearing flow is applied with wet-spinning. With shearing flow, both the velocity and shearing forces are constant.
- the post-spinning bath is composed of a mixture of organic solvent and water, with water being the functional component that improves fiber properties.
- organic solvent is added because many AS silk-based fibers cannot withstand water penetration without being dissolved.
- a two-step post-spinning treatment is often employed, with the first step using an organic solvent/water mixture followed by a second step in pure water.
- the present invention relates to isolated recombinant spider aciniform silk proteins (SASPs) comprising or consisting of an amino acid sequence as set forth in SEQ ID NOs: 1-44, and 47-84 or any homolog thereof.
- the recombinant SASPs can be used to manufacture silk fibers having exceptional toughness and strength.
- the present invention provides a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, preferably, from about 100 amino acids to about 3,000 amino acids,
- the SASP comprises: at least one W subunit, each W subunit ranging from about 150 to 250 amino acid residues in length, and at least one non-repetitive fragment selected from: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment.
- the N-terminal (NT) non-repetitive fragment ranges from about 100 to about 150 amino acid residues in length, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2).
- the C-terminal (CT) non-repetitive fragment ranging from about 100 to about 150 amino acid residues in length may be derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (c) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2).
- the at least one W subunit, the N-terminal (NT) non-repetitive fragment, and the C-terminal (CT) non-repetitive fragment have an amino acid sequence as shown in one or more of SEQ ID NOs: 1-84 and 106-131.
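The length windows recited above can be checked mechanically. The following is only a minimal sketch: the class name `SaspDesign`, the helper `in_window`, and the choice to widen every bound by 10% (borrowing the definition of "about" given later in this document) are illustrative assumptions, and purification tags or linkers are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Length windows quoted in the text, in amino acid residues.
W_SUBUNIT = (150, 250)
TERMINAL_FRAGMENT = (100, 150)
WHOLE_PROTEIN = (100, 5000)
ABOUT = 0.10  # assumption: "about" widens each bound by 10%


def in_window(length: int, window: Tuple[int, int], tol: float = ABOUT) -> bool:
    """Return True if length lies inside the window after widening by tol."""
    low, high = window
    return low * (1 - tol) <= length <= high * (1 + tol)


@dataclass
class SaspDesign:
    """Hypothetical description of a SASP by the lengths of its parts."""
    w_lengths: List[int]
    nt_length: Optional[int] = None  # N-terminal non-repetitive fragment
    ct_length: Optional[int] = None  # C-terminal non-repetitive fragment

    def total_length(self) -> int:
        return sum(self.w_lengths) + (self.nt_length or 0) + (self.ct_length or 0)

    def problems(self) -> List[str]:
        """List every way the design departs from the recited ranges."""
        found = []
        if not self.w_lengths:
            found.append("needs at least one W subunit")
        for i, n in enumerate(self.w_lengths, start=1):
            if not in_window(n, W_SUBUNIT):
                found.append(f"W subunit {i} has {n} residues")
        if self.nt_length is None and self.ct_length is None:
            found.append("needs an NT or CT non-repetitive fragment")
        for name, n in (("NT", self.nt_length), ("CT", self.ct_length)):
            if n is not None and not in_window(n, TERMINAL_FRAGMENT):
                found.append(f"{name} fragment has {n} residues")
        if not in_window(self.total_length(), WHOLE_PROTEIN):
            found.append(f"total length {self.total_length()} is out of range")
        return found
```

An empty list from `problems()` means the design falls inside every window under these assumptions; it says nothing about sequence identity to the listed SEQ ID NOs.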
- the present invention provides a method for producing polymeric forms of SASPs of higher molecular weight, the method comprising: (i) providing two recombinant SASP polypeptides, one of which (SASPN) is fused at its C-terminus to a split intein N-terminal domain (IN), the split intein having an amino acid sequence as shown in any one of SEQ ID NOs: 90-105 and/or being a related split intein, providing the construct SASPN-IN, and the other of which (SASPc) is fused at its N-terminus to a split intein C-terminal domain (Ic), providing the construct Ic-SASPc; (ii) trans-splicing in vitro to join SASPN and SASPc to form the desired full-length SASP (SASPN-SASPC) ligated by a native peptide bond.
- Exemplary fusion proteins are shown in SEQ ID NOs: 59-76. Two or more inteins could be used to ligate three or more SASPs together to form an even larger protein, with exemplary fusion proteins shown in SEQ ID NOs: 70-76.
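The bookkeeping of split-intein trans-splicing can be illustrated with a toy model in which each construct is a list of domain labels. This is a sketch under stated assumptions, not a model of the chemistry: the function `trans_splice` and the labels `"IN"` and `"IC"` are hypothetical names, and a single intein pair stands in for what would in practice be orthogonal pairs when three or more fragments are joined.

```python
from typing import List

N_INTEIN = "IN"  # split intein N-terminal domain
C_INTEIN = "IC"  # split intein C-terminal domain (written Ic in the text)


def trans_splice(n_precursor: List[str], c_precursor: List[str]) -> List[str]:
    """Join two precursors, excising the intein halves.

    n_precursor must end with the N-terminal intein domain (SASPN-IN);
    c_precursor must start with the C-terminal intein domain (Ic-SASPc).
    The result is the ligated product SASPN-SASPC.
    """
    if not n_precursor or n_precursor[-1] != N_INTEIN:
        raise ValueError("first precursor must end with the IN domain")
    if not c_precursor or c_precursor[0] != C_INTEIN:
        raise ValueError("second precursor must start with the IC domain")
    return n_precursor[:-1] + c_precursor[1:]


# Four W subunits fused to IN, spliced with Ic fused to two W subunits.
w4_in = ["W", "W", "W", "W", N_INTEIN]
ic_w2 = [C_INTEIN, "W", "W"]
w6 = trans_splice(w4_in, ic_w2)
```

In this toy model `w6` holds six `"W"` labels and no intein labels, mirroring the way the inteins are removed and the flanking SASP fragments are joined by a native peptide bond.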
- the present invention provides a method for producing polymeric forms of SASPs of higher molecular weight, the method comprising: (i) providing two recombinant SASP fragments, one of which (SASPN) is fused at its C-terminus to a split intein, coming from SEQ ID 89-96 or related split inteins, N-terminal domain (IN) giving SASPN-IN and the other of which (SASPc) is fused at its N-terminus to a split intein C-terminal domain (Ic) giving Ic-SASPc; (ii) co-expression of both fusion proteins, SASPN-IN and Ic-SASPc, in one organism and trans-splicing in vivo to join these proteins to form the desired full-length SASP (SASPN-SASPC) ligated by a native peptide bond, with exemplary fusion proteins shown in SEQ ID NO: 59-76. Two or more inteins could be used to ligate
- the present invention provides a method for producing a polymeric form of an isolated spider aciniform silk protein (SASP).
- recombinant SASPs are produced by transforming a suitable host as defined herein with a vector or a nucleic acid disclosed herein, and expressing the spider silk gene under suitable conditions.
- the illustrative method incorporates the steps: (i) providing a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, or from about 100 amino acids to about 3,000 amino acids, and, optionally, (d) a purification tag; (ii) optionally, removing the purification tag from the recombinant SASP; (iii) passing a solution containing the recombinant SASP through an affinity chromatography column; and (iv) isolating the SASP from said affinity chromatography column.
- the present invention provides a method for producing a spider silk fiber, the method comprising: (i) providing a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acid residues, or from about 100 amino acids to about 3,000 amino acids, said SASP comprising: (a) at least one W subunit, each W subunit ranging from about 150 to 250 amino acids; (b) at least one of: an N-terminal (NT) non-repetitive fragment ranging from about 100 to about 150 amino acid residues in length derived from spider silk protein major ampullate spidroin 2 (MaSp2) and a C-terminal (CT) non-repetitive fragment ranging from about 100 to about 150 amino acid residues in length derived from at least one of: a C-terminal fragment of the spider silk protein aciniform spidroin 1 (AcSp1), a C-terminal fragment of the spider silk protein major ampullate spidro
- the wet-spinning step of the third aspect further comprises dissolving the recombinant SASP in at least two solvents selected from the group consisting of water, phosphoric acid, acetic acid, formic acid, hydrochloric acid, sulfuric acid, nitric acid, hexafluoroisopropanol (HFIP), hexafluoropropanol (HFP), hexafluoroacetone (HFA), trifluoroacetic acid (TFA), trifluoroethanol (TFE), and methylimidazolium chloride, and wherein the recombinant SASP is wet-spun at a rate of 0.1 to 20 mL/hr, or from about 0.3 to about 20 mL/hr, or from about 0.6 to about 20 mL/hr, or from about 0.1 to about 10 mL/hr, or from about 0.1 to about 5 mL/hr, in at least one coagul
- FIG. 1 depicts a schematic diagram of the wet-spinning fiber production method.
- the illustrated apparatus consists of a pump (for a controlled extrusion rate), a glass syringe (to store and allow for controlled delivery of the spinning dope), PEEK tubing or needle (for shear force application), a coagulation bath (to amalgamate protein in the soluble dope into an insoluble fiber form), and a collector for the AS fibers.
- FIG. 2 depicts a schematic illustration of the post-spinning apparatus.
- the translational control knob allows for controlled motion of the stage
- the metric ruler allows for a more consistent and precise measurement of fiber stretching
- the drain plug allows the dH2O bath to steadily drain.
- FIG. 3A-3D depict photomicrographs of Coomassie-stained SDS-polyacrylamide gels.
- Protein constructs GLGL, LGLG, GLGLG and LGLGL were fused with H6-SUMO and purification was carried out in three steps: initial precursor purification using affinity chromatography, SUMO cleavage and reverse purification. The resulting silk proteins are tag free.
- H6-W3 and H6-W4 are the only recombinant spidroins fused to an H6 tag, and purification was carried out in one step using affinity chromatography.
- the resulting silk proteins have an H6 tag at their N-terminus.
- FT 1-3: reverse purification through a series of flow-throughs was carried out by nickel affinity chromatography. Ni: after reverse
- the purified target proteins are labeled with boxes.
- FIG. 4A and FIG. 4B depict photographs of light microscope images of representative W3 AS fibers formed using the wet-spinning method. Images were taken at both 100X magnification (FIG. 4A) and 400X magnification (FIG. 4B).
- FIG. 5A and FIG. 5B depict photographs of light microscope images of representative PS 4x (AS fiber stretched to 4 times its original length) W3 fibers formed from wet-spinning. Images were taken at both 100X magnification (FIG. 5A) and 400X magnification (FIG. 5B).
- FIG. 6A-6C depict combined line graphs of data related to stress-strain curves for AS and PS 4x W3 fibers spun from the HFIP/H2O spinning dope (represented as dotted lines), and AS and PS 4x W3 fibers spun from the TFA/TFE/H2O spinning dope (represented as solid lines).
- FIG. 7 depicts a line graph representing analysis of W3 protein secondary structure in HFIP/H2O.
- FIG. 8 depicts a line graph representing analysis of W3 protein secondary structure in TFA/TFE/H2O.
- FIG. 9A-9D depict line graphs of Raman spectra of W3 fibers spun from the HFIP/H2O spinning dope in perpendicular (represented as dotted lines) or parallel (represented as solid lines) alignment relative to the incident polarized scattered light: full spectral range of AS fibers (FIG. 9A), amide I region of AS fibers (FIG. 9B), full spectral range of PS 4x fibers (FIG. 9C) and amide I region of PS 4x fibers (FIG. 9D).
- FIG. 10A-10D depict line graphs of Raman spectra of W3 fibers spun from the TFA/TFE/H2O spinning dope in perpendicular (represented as dotted lines) or parallel
- FIG. 11A and FIG. 11B depict photographs of birefringence of W3 fibers spun from HFIP/H2O spinning dope visualized by polarized light microscopy of AS fibers as shown in FIG. 11A, and PS 4x fibers as shown in FIG. 11B.
- FIG. 12A and FIG. 12B depict photographs of birefringence of W3 fibers spun from TFA/TFE/H2O spinning dope visualized by polarized light microscopy of AS fibers as shown in FIG. 12A, and PS 4x fibers as shown in FIG. 12B.
- FIG. 13A and FIG. 13B depict photomicrographs of W2Cac fibers as shown in FIG. 13A, and W3 fibers as shown in FIG. 13B.
- FIG. 14 depicts a line graph representing illustrative stress-strain curves of W3 and W2Cac fibers.
- FIG. 15A-15D depict bar graphs illustrating four mechanical properties of indicated fibers.
- FIG. 16 depicts a line graph of Far-UV circular dichroism spectroscopy of indicated protein in 50 mM potassium phosphate, pH 7.5, at 22°C.
- FIG. 17A-17C depict secondary structure and structural orientation of different fibers.
- Panel A presents Raman spectra of indicated fibers. Amide I decomposition was based on five bands: α-helix; β-sheet; random coil, turns, etc.
- Panel B depicts an overlay of amide I and amide III bands of indicated fibers from XX (perpendicular) and ZZ (parallel) directions.
- Panel C depicts bar graphs illustrating WIxx ratios of α-helix and β-sheet bands indicated in (Panel B).
- the β-sheet structure of W2Cac and native aciniform silk is statistically more oriented, as indicated by "*".
- FIG. 18A-18C depict intein splicing of SASPs.
- FIG. 18A depicts examples of screening intein splicing activity of the W4 protein.
- FIG. 18B depicts a schematic diagram of an exemplary production scheme of the W2 protein.
- splicing products were passed through a Ni-NTA column, allowing the spliced W2 to be collected in the flow-through while all other proteins were H6-tagged and thus retained in the column.
- FIG. 18C in the illustrated SDS-PAGE gel shows production of W6 and NW4 by split-intein-mediated trans-splicing (IN and Ic refer to N- and C-terminal intein fragments, respectively).
- Precursor proteins W4IN and IcW2 were mixed to produce splicing product W6; NW2IN and IcW2 were mixed to produce splicing product NW4.
- FIG. 19 depicts a schematic of an apparatus useful in the optimization of wet-spinning of spinning dope comprising the recombinant SASPs of the present invention.
- FIG. 20 depicts a schematic illustration of an actual, fully automated wet-spinning apparatus described in FIG. 19.
- FIG. 21 is a schematic illustration of a front perspective view of an exemplary fiber-spinning apparatus in accordance with the various embodiments of the present invention.
- FIG. 22 depicts a schematic of a front plan view of an exemplary fiber-spinning apparatus in accordance with the various embodiments of the present invention.
- FIG. 23 is a schematic illustration of a rear perspective view of an exemplary fiber-spinning apparatus in accordance with the various embodiments of the present invention.
- FIG. 24A-FIG. 24B are line graphs depicting the stress-strain characteristics of recombinant SASPs in accordance with the embodiments of the present invention.
- Embodiments including the transition phrase "consisting of" or "consisting essentially of" include only the recited components and inactive ingredients.
- the term "about” means plus or minus 10% of the numerical value of the number with which it is being used. Therefore, about 50% means in the range of 45%-55%.
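The definition of "about" translates directly into arithmetic. The helper names `about_range` and `is_about` below are illustrative only; the 10% tolerance is the value stated in the definition.

```python
from typing import Tuple


def about_range(value: float, tolerance: float = 0.10) -> Tuple[float, float]:
    """Return the (low, high) interval covered by "about value"."""
    delta = abs(value) * tolerance
    return (value - delta, value + delta)


def is_about(candidate: float, value: float, tolerance: float = 0.10) -> bool:
    """Return True if candidate falls within "about value"."""
    low, high = about_range(value, tolerance)
    return low <= candidate <= high
```

For example, `about_range(50)` gives the 45 to 55 interval used in the definition, and `is_about(3200, 3000)` is true because 3,200 lies within 10% of 3,000.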
- amino acid not only encompasses the 20 common amino acids in naturally synthesized proteins, but also includes any modified, unusual, or synthetic amino acid.
- protein is a polymer consisting essentially of any of the 20 amino acids.
- polypeptide refers to a polymer of amino acids and does not refer to a specific length of the product.
- peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not exclude post-expression modifications of the polypeptide, for example, glycosylations, acetylations, phosphorylations and the like. Included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.
- As is known in the art, “proteins”, “peptides,” “polypeptides” and “oligopeptides” are chains of amino acids (typically L-amino acids) whose alpha carbons are linked through peptide bonds formed by a condensation reaction between the carboxyl group of the alpha carbon of one amino acid and the amino group of the alpha carbon of another amino acid.
- amino acids making up a protein are numbered in order, starting at the amino terminal residue and increasing in the direction toward the carboxy terminal residue of the protein.
- a polypeptide or protein "domain” comprises a region along a polypeptide or protein that comprises an independent unit. Domains may be defined in terms of structure, sequence and/or biological activity. In one embodiment, a polypeptide domain may comprise a region of a protein that folds in a manner that is substantially independent from the rest of the protein. Domains may be identified using domain databases such as, but not limited to, PFAM, PRODOM, PROSITE, BLOCKS, PRINTS, SBASE, ISREC PROFILES, SMART, and PROCLASS.
- wild-type or “native” (used interchangeably) refers to the naturally occurring polynucleotide sequence encoding a protein, or a portion thereof, or protein sequence, or portion thereof, respectively, as it normally exists in vivo.
- isolated polypeptide refers to a polypeptide that has been separated or purified from cellular components that naturally accompany it. Typically, the polypeptide is considered “purified” when it is at least 70% (e.g., at least 75%, 80%, 85%, 90%, 95%, or 99%) by dry weight, free from the proteins and naturally occurring molecules with which it is naturally associated.
- mutant refers to any change in the genetic material of an organism, in particular a change (i.e., deletion, substitution, addition, or alteration) in a wild-type
- mutant refers to a change in the sequence of a wild-type protein regardless of whether that change alters the function of the protein (e.g., increases, decreases, imparts a new function), or whether that change has no effect on the function of the protein (e.g., the mutation or variation is silent).
- nucleic acid is well known in the art.
- a “nucleic acid” as used herein will generally refer to a molecule (i.e., a strand) of DNA, RNA or a derivative or analog thereof, comprising a nucleobase.
- a nucleobase includes, for example, a naturally occurring purine or pyrimidine base found in DNA (e.g., an adenine "A,” a guanine "G,” a thymine “T” or a cytosine “C”) or RNA (e.g., an A, a G, a uracil "U” or a C).
- nucleic acid encompasses the terms “oligonucleotide” and “polynucleotide,” each as a subgenus of the term “nucleic acid.”
- oligonucleotide refers to a molecule of between 3 and about 100 nucleobases in length.
- polynucleotide refers to at least one molecule of greater than about 100 nucleobases in length.
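The two length-based definitions above can be expressed as a small classifier. This is a sketch only: the function name is hypothetical, and the hard cutoff at 100 nucleobases is a simplifying assumption, since the definitions say "about 100".

```python
def classify_nucleic_acid(sequence: str) -> str:
    """Classify a nucleobase string by length using the definitions above."""
    length = len(sequence)
    if length < 3:
        return "shorter than an oligonucleotide"
    if length <= 100:  # assumption: "about 100" treated as exactly 100
        return "oligonucleotide"
    return "polynucleotide"
```

A 20-base probe such as `"ACGT" * 5` is classified as an oligonucleotide, while a 250-base probe is classified as a polynucleotide.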
- Double stranded nucleic acids are formed by fully complementary binding, although in some embodiments a double stranded nucleic acid may form by partial or substantial complementary binding.
- a nucleic acid may encompass a double-stranded molecule that comprises one or more complementary strand(s) or "complement(s)" of a particular sequence, typically comprising a molecule.
- a single stranded nucleic acid may be denoted by the prefix "ss" and a double stranded nucleic acid by the prefix "ds”.
- nucleotide refers to a nucleoside further comprising a "backbone moiety".
- a backbone moiety generally covalently attaches a nucleotide to another molecule comprising a nucleotide, or to another nucleotide to form a nucleic acid.
- the "backbone moiety” in naturally occurring nucleotides typically comprises a phosphorus moiety, which is covalently attached to a 5-carbon sugar. The attachment of the backbone moiety typically occurs at either the 3'- or 5'-position of the 5-carbon sugar.
- other types of attachments are known in the art, particularly when a nucleotide comprises derivatives or analogs of a naturally occurring 5-carbon sugar or phosphorus moiety.
- polynucleotide sequence and “nucleotide sequence” are also used interchangeably herein.
- upstream refers to a residue that is N-terminal to a second residue where the molecule is a protein, or 5' to a second residue where the molecule is a nucleic acid.
- downstream refers to a residue that is C-terminal to a second residue where the molecule is a protein, or 3' to a second residue where the molecule is a nucleic acid.
- portion and “fragment” are used interchangeably to refer to parts of a polypeptide, nucleic acid, or other molecular construct.
- the term "vector” refers to a nucleic acid molecule that may be used to transport a second nucleic acid molecule into a cell.
- the vector allows for replication of DNA sequences inserted into the vector.
- the vector may comprise a promoter to enhance and/or maintain expression of the nucleic acid molecule in at least some host cells.
- Vectors may replicate autonomously (extrachromosomally) or may be integrated into a host cell chromosome.
- the vector may comprise an expression vector capable of producing a protein or a nucleic acid derived from at least part of a nucleic acid sequence inserted into the vector.
- recombinant as used herein in relation to a polynucleotide intends a polynucleotide of semisynthetic, or synthetic origin, or encoded by cDNA or genomic DNA ("gDNA") such that it is not entirely associated with all or a portion of a polynucleotide with which it is associated in nature.
- recombinant as used herein in relation to a protein intends a protein produced in a non-arachnid organism transformed or transfected with a polynucleotide encoding a SASP or a homolog thereof, and isolated from the non-arachnid organism.
- the recombinant SASPs are produced in prokaryotic organisms, for example, bacteria.
- hybridization conditions refer to washing hybrids in low salt buffer at high temperatures.
- Hybridization may be to filter-bound DNA using hybridization solutions standard in the art such as 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), at 65 °C, and washing in 0.25 M NaHPO4, 3.5% SDS followed by washing in 0.1 x SSC/0.1% SDS at a temperature ranging from room temperature to 68 °C depending on the length of the probe (see e.g. Ausubel, F. M. et al., Short Protocols in Molecular Biology, 4th Ed., Chapter 2, John Wiley & Sons, N.Y.).
- a high stringency wash comprises washing in 6 x SSC/0.05% sodium pyrophosphate at 37 °C for a 14 base oligonucleotide probe, or at 48 °C for a 17 base oligonucleotide probe, or at 55 °C for a 20 base oligonucleotide probe, or at 60 °C for a 25 base oligonucleotide probe, or at 65 °C for a nucleotide probe about 250 nucleotides in length.
- Nucleic acid probes may be labeled with radionucleotides by end-labeling with, for example, [γ-32P]ATP, or incorporation of radiolabeled nucleotides such as [α-32P]dCTP by random primer labeling.
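The probe-length-to-temperature pairs listed for the high stringency wash amount to a lookup table. The sketch below simply encodes those pairs; the function name is hypothetical, the 225 to 275 window for "about 250 nucleotides" assumes the 10% meaning of "about" defined earlier, and no interpolation is attempted for probe lengths the text does not mention.

```python
from typing import Optional

# Probe length (bases) -> wash temperature (degrees C), as listed in the text.
WASH_TEMPERATURE_C = {14: 37, 17: 48, 20: 55, 25: 60}
LONG_PROBE_WINDOW = (225, 275)  # assumption: "about 250" taken as 250 +/- 10%
LONG_PROBE_TEMPERATURE_C = 65


def wash_temperature_c(probe_length: int) -> Optional[int]:
    """Return the listed wash temperature, or None if the length is not covered."""
    if probe_length in WASH_TEMPERATURE_C:
        return WASH_TEMPERATURE_C[probe_length]
    low, high = LONG_PROBE_WINDOW
    if low <= probe_length <= high:
        return LONG_PROBE_TEMPERATURE_C
    return None
```

Returning `None` for unlisted lengths keeps the sketch from implying conditions the text does not state.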
- probes may be labeled by incorporation of biotinylated or fluorescein labeled nucleotides, and the probe detected using Streptavidin or anti-fluorescein antibodies.
- identity refers to sequence identity between two amino acid sequences or between two nucleic acid sequences. Percent identity can be determined by aligning two sequences and refers to the number of identical residues (i.e., amino acid or nucleotide) at positions shared by the compared sequences. Sequence alignment and comparison may be conducted using the algorithms standard in the art (e.g. Smith and Waterman, 1981, Adv.
- BLAST and FASTA publicly available as BLAST and FASTA.
- ENTREZ available through the National Institutes of Health, Bethesda MD, may be used for sequence comparison.
- the percent identity of two sequences may be determined using GCG with a gap weight of 1, such that each amino acid gap is weighted as if it were a single amino acid mismatch between the two sequences.
- the term at least 90% identical thereto includes sequences that range from 90 to 100% identity to the indicated sequences and includes all ranges in between.
- the term at least 90% identical thereto includes sequences that are
- the term "at least 70% identical" includes sequences that range from 70 to 100% identical, with all ranges in between. Percent identity is determined using the algorithms described here.
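A percent identity calculation in the spirit of the gap-weight-of-1 convention described above can be sketched for two sequences that have already been aligned. This is not the GCG, BLAST or FASTA implementation: the function names are hypothetical, gaps are assumed to be written as `-`, and dividing by the full alignment length is an assumption about the denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Each gap column counts as a single mismatch, so a one-residue gap
    costs the same as a one-residue substitution.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """Return True if the pair is "at least threshold% identical"."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With one substitution in ten aligned residues the function returns 90.0, so the pair satisfies "at least 90% identical"; with one gap in six columns it returns roughly 83.3 and satisfies only the 70% threshold.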
- homologous proteins are those that are similar in sequence and function. Typically, the sequence identity between two homologous sequences will be at least 50%. Also, homologous proteins will have conservative substitutions for non-identical sequences. In alternate embodiments, the sequence identity between two homologous sequences will be at least 60%; or at least 75%; or at least 80%; or at least 90%, or at least 95%, or at least 96%, or at least 97%, or at least 98%, or at least 99%. Also, as used herein, the term "homologue" means a polypeptide having a degree of homology with the wild-type amino acid sequence. Homology comparisons can be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs can calculate percent homology between two or more sequences (e.g. Wilbur, W. J. and Lipman, D. J., 1983, Proc. Natl. Acad. Sci. USA, 80:726-730).
- spiders of the present invention are from the Arachnida class, the Araneae order, and include for example Nephilidae (especially Nephila species, like clavipes), and Araneidae such as Araneus and Argiopes, among many other suitable species.
- spider species from which the recombinant SASP are derived may include: Argiope trifasciata, Argiope amoena, Euprosthenops australis, and Araneus diadematus
- silk includes proteins and peptides produced by arthropods, typically by spiders, or by Lepidoptera, that display properties typical of native silk peptides.
- Lepidopteran silk generally is made up of a heavy fibroin polypeptide and a light chain fibroin peptide that are joined by a disulfide bond. In spiders there are two or more peptides not joined by a disulfide bond.
- silk includes proteinaceous filaments produced by insects or spiders, typically (but not necessarily) of two or more polypeptides. These may be chemically linked, and are typically very long polypeptides.
- a native spider silk polypeptide is one of the proteins or polypeptides, or fragments thereof, produced by spider silk glands. As used herein, a native spider silk polypeptide is a polypeptide having at least 99% identity, or in some cases 100% identity, to a native spider silk heavy and/or light fibroin polypeptide. Spider silk is a protein based fiber. In some
- a native spider silk protein is an AcSp1 spider fibroin polypeptide. It is known for its high strength and elasticity. Each species of spider produces several kinds of silk, and the silks vary in sequence between the species. Each of these types of silk is encompassed by the present invention. Some of the varieties of silk produced by spiders for which either the natural peptides, or recombinant variants are encompassed by the methods, compositions and systems of the invention are: aciniform silk, a tough and elastic silk that is used to wrap captured prey;
- a "variant of a spider silk" or a "variant of a spider silk polypeptide" comprises or consists of a synthetic non-naturally occurring polypeptide having amino acid domains, such as beta-sheets and alpha helices that are derived from, or homologous to, those domains as found in spider silk proteins.
- a spider silk analog such as beta-sheets and alpha helices
- polypeptide is comprised of peptide domains that are at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, or 98% identical to native spider silk.
- the SASPs may comprise, or consist of, a sequence made up of a plurality of alternating spider silk beta-sheet sequences and alpha helices as described herein.
- the spider silk polypeptide may comprise from 4 to 1000, or 4 to 800, or 4 to 500, or 5 to 200, or 5 to 100, or 5 to 50, or 6 to 40, or 6 to 30, or 6 to 15 or 6 to 12, or about 9 beta-sheet domains.
- the beta sheet regions may comprise a plurality of consecutive alanine residues, or a plurality of other amino acids that can form hydrogen bonds and that are typically arranged in consecutive order in beta sheet regions, and may range from about 3 to 50, or 4 to 40, or 4 to 30, or 4 to 15, or 4 to 12, or 6 to 10, or about 9 consecutive hydrogen bonding amino acids (e.g., Ala-Ala-Ala-Ala-Ala-Ala-Ala-Ala-Ala-Ala-Ala-Ala).
- the SASPs may comprise from 4 to 1000, or 4 to 800, or 4 to 500, or 5 to 200, or 5 to 100, or 5 to 50, or 6 to 40, or 6 to 30, or 6 to 15 or 6 to 12, or about 9 or 10 alpha helix domains.
- the alpha helix domains may comprise a plurality of glycine residues interspersed with other amino acids (e.g., Q, Y, L, S, R, A or P) typically found in alpha helix domains, and may range from about 4 to 200, or 5 to 100, or 5 to 50, or 6 to 45, or 12 to 40, or 12 to 45 amino acids in length.
- the spider silk peptide domains are derived from, i.e., are at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, or 98% identical to spider silk fibroin sequences.
- a SASP may comprise a single polypeptide having a mixture of different spider silk polypeptide domains, or analogs thereof, either from the same or different species.
- SASPs may be generated using molecular techniques. For example, PCR mutagenesis of DNA encoding the spider silk peptide analogs can be used. Or RNA based mutagenesis techniques may be used. An example of a PCR technique for making mutations in DNA is described in WO 92/22653. Another method for making analogs, muteins, and derivatives, is cassette mutagenesis based on the technique described by Wells, Gene, (1985) 34:315. Or, chemical modification of the peptides may be performed.
- the SASPs may contain amino acid substitutions, deletions, or insertions.
- the amino acid substitutions can be conservative amino acid substitutions or substitutions to eliminate non-essential amino acid residues such as to alter a glycosylation site, a
- conserved residues refers to amino acids that are the same among a plurality of proteins having the same structure and/or function.
- a region of conserved residues may be important for protein structure or function.
- contiguous conserved residues as identified in a three-dimensional protein may be important for protein structure or function.
- a comparison of sequences for the same or similar proteins from different species, or of individuals of the same species may be made.
- Conservative amino acid substitutions are generally those that preserve the general charge, hydrophobicity/hydrophilicity and/or steric bulk of the amino acid substituted, for example, substitutions between the members of the following groups are conservative substitutions: Gly/Ala, Val/Ile/Leu, Asp/Glu, Lys/Arg, Asn/Gln, Ser/Cys/Thr and Phe/Trp/Tyr.
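The substitution groups listed above can be expressed as a small lookup. The following is a minimal sketch only: the function names `is_conservative` and `classify_substitutions` are hypothetical, one-letter residue codes are assumed, and the two sequences are assumed to be of equal length with no insertions or deletions.

```python
# Conservative substitution groups, taken from the list in the text above.
CONSERVATIVE_GROUPS = [
    {"G", "A"},       # Gly/Ala
    {"V", "I", "L"},  # Val/Ile/Leu
    {"D", "E"},       # Asp/Glu
    {"K", "R"},       # Lys/Arg
    {"N", "Q"},       # Asn/Gln
    {"S", "C", "T"},  # Ser/Cys/Thr
    {"F", "W", "Y"},  # Phe/Trp/Tyr
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if two different residues belong to the same group."""
    if original == replacement:
        return False  # identical residues are not a substitution at all
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(wild_type: str, variant: str):
    """List (position, original, replacement, conservative?) for each change.

    Positions are 1-based. Equal-length sequences are an assumption of this
    sketch; insertions and deletions are not handled.
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i, a, b, is_conservative(a, b))
        for i, (a, b) in enumerate(zip(wild_type, variant), start=1)
        if a != b
    ]


# Hypothetical five-residue example: G1A is conservative, A4D is not.
changes = classify_substitutions("GASAG", "AASDG")
```

A list like `changes` makes it straightforward to count how many conservative replacements a candidate homolog carries relative to a reference sequence.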
- Whether a change in the amino acid sequence of a peptide results in a structural SASP of the present invention with the mechanical attributes can be readily determined by assessing the ability of the recombinant SASP to produce a fiber with the desired mechanical properties in a fashion similar to the wild-type spider aciniform silk protein. Polypeptides in which more than one amino acid replacement has taken place can readily be spun into a fiber and tested in the same manner.
- Percent (%) amino acid sequence identity with respect to a peptide or polypeptide sequence is defined as the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in the specific peptide or polypeptide sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN or Megalign (DNASTAR) software.
- % amino acid sequence identity values are generated using the sequence comparison computer program ALIGN-2, as described in U.S. Pat. No. 6,828,146.
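The percent identity definition above can be illustrated with a short calculation on two sequences that have already been aligned. This is a sketch under stated assumptions, not the ALIGN-2 procedure: the alignment is taken as given (gaps written as `-`), the denominator is read as the number of residues in the candidate sequence, and the function names and example sequences are hypothetical.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str,
                     gap: str = "-") -> float:
    """Percent of candidate residues identical to the aligned reference residue.

    Conservative substitutions are NOT counted as identical, in line with the
    definition in the text. Both inputs must come from the same alignment and
    therefore have the same length.
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    candidate_residues = sum(1 for c in aligned_candidate if c != gap)
    if candidate_residues == 0:
        raise ValueError("candidate sequence contains no residues")
    matches = sum(
        1
        for c, r in zip(aligned_candidate, aligned_reference)
        if c == r and c != gap
    )
    return 100.0 * matches / candidate_residues


def meets_threshold(aligned_candidate: str, aligned_reference: str,
                    threshold: float) -> bool:
    """True if identity is at least `threshold` percent (e.g. 70 or 90)."""
    return percent_identity(aligned_candidate, aligned_reference) >= threshold


# Hypothetical aligned pair: 8 of the 10 candidate residues are identical.
identity = percent_identity("GASAGLISRV", "GASSGL-SRV")
```

Other conventions divide by the alignment length or by the reference length, so the value reported for the same alignment can differ between programs.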
- the inventors have discovered a method to express, purify, and isolate modified recombinant SASPs.
- the recombinant SASPs of the present invention are produced in prokaryotic organisms, such as bacteria.
- the present invention provides a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, or from about 100 amino acids to about 3,000 amino acids.
- the SASP comprises: at least one W subunit, each W subunit ranging from about 150 to 250 amino acid residues in length, and at least one of: (a) a non-repetitive N-terminal fragment and (b) a non-repetitive C-terminal fragment.
- the N-terminal (NT) non-repetitive fragment ranges from about 100 amino acids to about 150 amino acid residues in length, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2).
- the C-terminal (CT) non-repetitive fragment ranging from about 100 to about 150 amino acid residues in length may be derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (c) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2).
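The modular layout described above (one or more W subunits plus an NT and/or CT non-repetitive fragment) can be captured in a small validation helper. This is an illustrative sketch only: the class `SaspDesign` and its method are hypothetical, and the "about" ranges from the text are treated here as hard bounds, which is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Length ranges quoted in the text, treated as inclusive hard bounds (assumption).
W_RANGE: Tuple[int, int] = (150, 250)         # residues per W subunit
TERMINAL_RANGE: Tuple[int, int] = (100, 150)  # residues per NT or CT fragment


def _in_range(length: int, bounds: Tuple[int, int]) -> bool:
    return bounds[0] <= length <= bounds[1]


@dataclass
class SaspDesign:
    """Hypothetical container for the parts of a recombinant SASP."""
    w_units: List[str]
    nt_fragment: Optional[str] = None
    ct_fragment: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable rule violations (empty list if none)."""
        issues = []
        if not self.w_units:
            issues.append("at least one W subunit is required")
        for i, w in enumerate(self.w_units, start=1):
            if not _in_range(len(w), W_RANGE):
                issues.append(f"W subunit {i} has {len(w)} residues")
        if self.nt_fragment is None and self.ct_fragment is None:
            issues.append("at least one of NT or CT fragment is required")
        for name, frag in (("NT", self.nt_fragment), ("CT", self.ct_fragment)):
            if frag is not None and not _in_range(len(frag), TERMINAL_RANGE):
                issues.append(f"{name} fragment has {len(frag)} residues")
        return issues


# Placeholder sequences (poly-Ala / poly-Gly) stand in for real domains.
valid = SaspDesign(w_units=["A" * 199], ct_fragment="G" * 120)
missing_terminal = SaspDesign(w_units=["A" * 199])
```

Calling `problems()` on a design returns an empty list when the layout is consistent with the ranges quoted above.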
- the at least one W subunit, the N-terminal (NT) non-repetitive fragment, and the C-terminal (CT) non-repetitive fragment have an amino acid sequence as shown in one or more of SEQ ID NOs: 1-131, or one or more of SEQ ID NOs: 1-76, and 106-131, for example, one or more of SEQ ID NOs: 1-84, and 106-131, or for example one or more of SEQ ID NOs: 1-44.
- the at least one W repeat unit, the non-repetitive N-terminal (NT) fragment and the non-repetitive C-terminal (CT) fragment have an amino acid sequence as shown in any one or more of SEQ ID NOs: 1-123.
- the isolated recombinant SASP or a homolog of any thereof is derived from the arachnid genus and species Argiope trifasciata, or a combination of both Argiope trifasciata and Euprosthenops australis, or Argiope trifasciata and Argiope amoena.
- W unit W subunit and W repeat unit
- the recombinant SASPs comprise an amino acid sequence as set forth in any one or more of SEQ ID NOs: 1-44 & 47-84, for example, SEQ ID NOs: 1-44, or any homolog thereof.
- the isolated SASPs of the present invention comprise or consist of one or more polypeptides having an amino acid sequence as set forth in SEQ ID NOs: 1-84 & 106-131.
- the recombinant SASPs of the present invention also contemplate homolog proteins having an amino acid sequence as set forth in SEQ ID NO: 1-131, for example, SEQ ID NOs: 1-44 & 47-84, wherein one to ten, for example, one to seven, or one to five amino acids are substituted with a conservative amino acid substitution.
- recombinant SASPs of the present invention also contemplate allelic variants of the recombinant SASPs expressed in various non-arachnid organisms.
- the isolated recombinant SASP comprises or consists of one or more polypeptides having an amino acid sequence as set forth in any one or more of SEQ ID NOs: 1-84 & 106-131, for example, SEQ ID NOs: 1-84, or for example, SEQ ID NOs: 2, 3, 5, 6, 37, 38, 45-58 and 59-84.
- an exemplary SASP of the present invention comprises or consists of one or more polypeptides having an amino acid sequence as set forth in any one or more of SEQ ID NOs: 2, 3, 5, 6, 37, 38, 45-58 and 59-84.
- the polypeptides are fused together to form one contiguous SASP.
- an isolated recombinant spider aciniform silk protein (SASP) or a homolog of any thereof comprises or consists of one or more peptides having an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-84, or one or more of SEQ ID NOs: 1-44 & 47-84, or one or more peptides having an amino acid sequence of SEQ ID NOs: 15-43, or SEQ ID NO: 2, 3, 5, 6, 37, 38, 47-84 or a homolog of any thereof, each of which is optionally fused to a purification tag molecule.
- SASP spider aciniform silk protein
- purification tag molecule means a tag such as glutathione-S-transferase (GST), c-Myc, biotin, streptavidin, Small Ubiquitin-like Modifier (SUMO), hexa-histidine (H6), H6-SUMO, Maltose Binding Protein (MBP), Thioredoxin (Trx), FLAG tag, streptavidin-binding peptide (SBP), calmodulin-binding peptide (CBP), S-tag, and hemagglutinin (HA).
- GST glutathione-S-transferase
- SUMO Small Ubiquitin-like Modifier
- MBP Maltose Binding Protein
- Trx Thioredoxin
- SBP streptavidin-binding peptide
- CBP calmodulin-binding peptide
- exemplary recombinant SASPs comprising or consisting of an amino acid sequence of any one or more of SEQ ID NOs: 1-131, for example, SEQ ID NOs: 1-84, or for example, SEQ ID NOs: 1-44 & 47-84, or for example, SEQ ID NOs: 2, 3, 5, 6, 37, 38, 45-84, are provided in Table 1 below.
- the W repeat unit protein refers to a protein having an amino acid sequence as shown in SEQ ID NOs: 45 or 46.
- W is a wild-type repeat unit comprising 199 residues (SEQ ID NO: 46) if it is a one-repeat-unit protein (W1), or if it is the first repeat unit in a protein that comprises two to eight W units (e.g. W2-W8).
- the W repeat unit W may also contain 200 amino acid residues (SEQ ID NO: 45) if it is not the first repeat unit.
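Taken together, the two statements above imply a simple length rule for a concatenated W repeat region. The sketch below assumes exactly what is stated (199 residues for the first unit, 200 for each later unit) and ignores any tag, linker, or terminal fragment; the function name is hypothetical.

```python
FIRST_W_LENGTH = 199  # first (or only) repeat unit, SEQ ID NO: 46
LATER_W_LENGTH = 200  # each subsequent repeat unit, SEQ ID NO: 45


def w_repeat_length(n_units: int) -> int:
    """Residue count of n concatenated wild-type W units.

    Tags, linkers and N-/C-terminal fragments are not included.
    """
    if n_units < 1:
        raise ValueError("a SASP contains at least one W unit")
    return FIRST_W_LENGTH + LATER_W_LENGTH * (n_units - 1)


# Lengths for W1 through W8.
lengths = {n: w_repeat_length(n) for n in range(1, 9)}
```

Under these assumptions W1 has 199 residues, W2 has 399, and W8 has 1599.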
- the W unit may be the wild-type W1 as shown in SEQ ID NOs: 45 or 46, or it may have an amino acid sequence as shown in SEQ ID NOs: 45 or 46, with one or more amino acid mutations in either of these W units.
- Illustrative SASPs which may be recombinantly produced, and/or isolated, may have one or more amino acid mutations in the native sequence of W (SEQ ID NOs: 45 or 46) and are provided in Table 1.
- W units also include mutated W units having polypeptide amino acid sequences as provided in SEQ ID NO: 108-123.
- exemplary W units also include mutated W units having polypeptide amino acid sequences as provided in SEQ ID NO: 108-123 wherein each polypeptide has an additional serine (S) positioned as the first N-terminal amino acid.
- Each SASP contains at least one W unit (either wild-type or a wild-type amino acid sequence having one or more amino acid mutations) and at least one of: (a) a non-repetitive N-terminal fragment and (b) a non-repetitive C-terminal fragment, wherein the N-terminal (NT) non-repetitive fragment comprises from about 100 amino acids to about 150 amino acid residues, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2), and wherein the C-terminal (CT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues and may be derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (c) a C-terminal fragment (Cma
- GASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGS DTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSS SSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRV ANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFS SALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASA SSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATG
- GASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGS DTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSS SSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRV ANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFS SALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASA SSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTL RTVLRTGVSQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSA
- VQAVSRLPAGSDTSAY AQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLATGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
- VQAVSRLPAGSDTSAY AQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALWNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISR VANALANTSTLRTVLRTGVSQQIASSVVQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAF SSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSS
- TX ThyX
- Intein #4 Ssp GyrB (SG) from Synechocystis species, strain PCC6803 4 N : N-fragment of SG intein
- Intein #4 Ssp GyrB (SG) from Synechocystis species, strain PCC6803 4 C : C-fragment of SG intein
- the SASPs encompass full-length polypeptides (from about 100 amino acids to about 5,000 amino acids, and any number therebetween, for example, 1 to 3,000 amino acids) comprising two or more polypeptides, each of which has an amino acid sequence from two or more of SEQ ID NOs: 1-131.
- an exemplary SASP amino acid sequence from two or more of SEQ ID Nos: 1-131.
- the N-terminal (NT) non-repetitive fragment ranges from about 100 amino acids to about 150 amino acid residues in length, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2).
- the C-terminal (CT) non-repetitive fragment ranging from about 100 to about 150 amino acid residues in length may be derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (c) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2).
- an exemplary SASP will have at least one W repeat unit, wherein each W repeat unit (whether wild-type or mutated) or its circularly permuted analog (e.g., GLG, LGL, etc.) is repeated 0-22 times and is flanked, if present, by a non-repetitive N- and/or C-terminal fragment derived from AcSp1, MaSp1 or MaSp2, and/or a mutated W subunit and/or a mutated C-terminal fragment thereof.
- SASPs of the present invention comprise one to fourteen (and all numbers therebetween, e.g.
- W units (either wild-type or mutated, or combinations thereof), for example one to four W units, wherein at least one of the W units is mutated compared to the W subunit derived from AcSp1 (SEQ ID NOs: 45 and/or 46).
- SASPs of the present invention comprise one to fourteen W unit(s), wherein at least one of the W units is mutated compared to the W unit derived from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a non-repetitive C-terminal fragment derived from AcSp1, MaSp1, MaSp2 or TuSp1.
- SASPs of the present invention comprise one to fourteen W unit(s), wherein at least one of the W units is mutated compared to the W unit derived from AcSp1 (SEQ ID NOs: 45 or 46) in combination with a mutated non-repetitive C-terminal fragment derived from
- SASPs of the present invention comprise a single W unit from AcSp1 (SEQ ID NOs: 45 and/or 46, which may either be wild-type or mutated) in combination with a non-repetitive C-terminal fragment derived from AcSp1, MaSp1, MaSp2 or TuSp1.
- SASPs of the present invention comprise a single mutated W unit from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a non-repetitive C-terminal fragment derived from AcSp1, MaSp1 or MaSp2.
- SASPs of the present invention comprise a single mutated W unit from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a mutated non-repetitive C-terminal fragment derived from AcSp1, MaSp1 or MaSp2.
- SASPs of the present invention comprise one to four W units from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a mutated non-repetitive C-terminal fragment derived from AcSp1, MaSp1 or MaSp2.
- SASPs of the present invention comprise a mutated W unit derived from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a mutated non-repetitive C-terminal fragment derived from AcSp1, MaSp1 or MaSp2.
- SASPs of the present invention are recombinantly produced.
- the method of synthesizing a recombinant SASP provides an isolated and purified SASP protein that has a purity (on a % wt basis versus protein contaminants) that is typically greater than 90% pure, or greater than 95% pure, or greater than 96% pure, or greater than 97% pure, or greater than 98% pure, or greater than 99% pure, or greater than 99.5% pure, or greater than 99.9% pure.
- the present invention also provides isolated and/or recombinant nucleic acids encoding recombinant spider aciniform silk proteins (SASPs).
- SASPs spider aciniform silk proteins
- the subject nucleic acids may be single-stranded or double-stranded, DNA or RNA molecules. These nucleic acids are useful as therapeutic agents. For example, these nucleic acids are useful in making recombinant spider aciniform silk proteins which are spun into a variety of articles of manufacture.
- the invention provides isolated or recombinant nucleic acid sequences that are at least 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% identical to a region of the nucleotide sequence depicted in SEQ ID NOs: 124-129 in which the nucleotide sequence encodes a recombinant spider aciniform silk protein as described herein.
- nucleic acid sequences complementary to the subject nucleic acids, and variants of the subject nucleic acids are also within the scope of this invention.
- the nucleic acid sequences of the invention can be isolated, recombinant, and/or fused with a heterologous nucleotide sequence, or in a DNA library.
- nucleic acids of the invention also include nucleotide sequences that hybridize under highly stringent conditions to the nucleotide sequence depicted in SEQ ID NOs: 132-137, or a complement sequence thereof.
- appropriate stringency conditions which promote DNA hybridization can be varied. For example, one could perform the hybridization at 6.0x sodium chloride/sodium citrate (SSC) at about 45°C, followed by a wash of 2.0x SSC at 50°C.
- the salt concentration in the wash step can be selected from a low stringency of about 2.
- the temperature in the wash step can be increased from low stringency conditions at room temperature, about 22°C, to high stringency conditions at about 65°C. Both temperature and salt may be varied, or temperature or salt concentration may be held constant while the other variable is changed.
- the invention provides nucleic acids which hybridize under low stringency conditions of 6x SSC at room temperature followed by a wash at 2x SSC at room temperature.
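The SSC multiples quoted above can be converted to salt concentrations for comparison. The sketch below assumes the standard SSC recipe (1x SSC is 150 mM sodium chloride and 15 mM sodium citrate, i.e. 20x stock is 3 M NaCl and 0.3 M sodium citrate); the class name, the 22°C value used for "room temperature", and the comparison rule are illustrative assumptions, not part of the specification.

```python
from dataclasses import dataclass

NACL_MM_PER_X = 150.0     # mM NaCl in 1x SSC (standard recipe)
CITRATE_MM_PER_X = 15.0   # mM sodium citrate in 1x SSC (standard recipe)


@dataclass(frozen=True)
class HybridizationStep:
    """One hybridization or wash step: SSC multiple and temperature in deg C."""
    ssc_x: float
    temp_c: float

    def nacl_mm(self) -> float:
        return self.ssc_x * NACL_MM_PER_X

    def citrate_mm(self) -> float:
        return self.ssc_x * CITRATE_MM_PER_X


def is_more_stringent(a: HybridizationStep, b: HybridizationStep) -> bool:
    """Simplified rule: lower (or equal) salt and higher (or equal) temperature,
    with at least one of the two actually different."""
    return a != b and a.ssc_x <= b.ssc_x and a.temp_c >= b.temp_c


# Conditions named in the text; 22 deg C stands in for room temperature.
hybridization = HybridizationStep(ssc_x=6.0, temp_c=45.0)
wash_50c = HybridizationStep(ssc_x=2.0, temp_c=50.0)
low_stringency_wash = HybridizationStep(ssc_x=2.0, temp_c=22.0)
```

With this recipe, a 6x SSC hybridization corresponds to 900 mM NaCl and a 2x SSC wash to 300 mM NaCl; at equal salt, the 50°C wash is the more stringent of the two washes.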
- the recombinant nucleic acids of the invention may be operably linked to one or more regulatory nucleotide sequences in an expression construct.
- Regulatory nucleotide sequences will generally be appropriate for a host cell used for expression. Numerous types of appropriate expression vectors and suitable regulatory sequences are known in the art for a variety of host cells.
- one or more regulatory nucleotide sequences may include, but are not limited to, promoter sequences, leader or signal sequences, ribosomal binding sites, transcriptional start and termination sequences, translational start and termination sequences, and enhancer or activator sequences. Constitutive or inducible promoters as known in the art are contemplated by the invention.
- the promoters may be either naturally occurring promoters, or hybrid promoters that combine elements of more than one promoter.
- An expression construct may be present in a cell on an episome, such as a plasmid, or the expression construct may be inserted in a chromosome.
- the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selectable marker genes are well known in the art and will vary with the host cell used.
- the nucleotide sequence encoding a recombinant SASP is operably fused (in frame) to a signal peptide, or expression or purification aid, for example, an illustrative recombinant SASP (comprising an amino acid sequence of one or more of SEQ ID NOs: 1-131, for example, SEQ ID NOs: 1-44 & 47-84, or for example, SEQ ID NOs: 2, 3, 5, 6, 37, 38, 45-84, or SEQ ID NOs: 1-44 provided in Table 1 herein), is fused to a purification tag, for example, an enzyme label, a peptide, a marker, for example, glutathione-S-transferase (GST), c-Myc, biotin, streptavidin, Small Ubiquitin-like Modifier (SUMO), hexa-histidine (H6), H6-SUMO, Maltose Binding Protein (MBP), Thior
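What "fused in frame" means can be shown with a short check on coding sequences. This is a sketch under stated assumptions: the standard genetic code is used, the tag and the nine-codon target fragment are hypothetical examples (the codons were chosen arbitrarily to encode the residues GASAGLISR), and the helper names are not from the specification.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def fuse_in_frame(tag_cds: str, target_cds: str) -> str:
    """Join tag and target so the target keeps its reading frame.

    The tag must be a whole number of codons and must not contain a stop
    codon, otherwise the downstream protein would be shifted or truncated.
    """
    tag_cds, target_cds = tag_cds.upper(), target_cds.upper()
    if len(tag_cds) % 3 != 0:
        raise ValueError("tag length would shift the reading frame")
    if "*" in translate(tag_cds):
        raise ValueError("tag contains a stop codon")
    return tag_cds + target_cds


# Hypothetical N-terminal tag: start codon plus six histidine codons.
HIS6_TAG = "ATG" + "CATCAC" * 3
# Hypothetical target fragment encoding GASAGLISR, followed by a stop codon.
TARGET_FRAGMENT = "GGTGCTTCTGCTGGTCTGATCTCTCGT" + "TAA"

fusion_protein = translate(fuse_in_frame(HIS6_TAG, TARGET_FRAGMENT))
```

The translated fusion reads through the tag directly into the target residues, which is the practical meaning of an in-frame fusion.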
- the subject nucleic acid is provided in an expression vector comprising a nucleotide sequence encoding a SASP and operably linked to at least one regulatory sequence.
- Regulatory sequences are art-recognized and are selected to direct expression of the soluble polypeptide. Accordingly, the term “regulatory sequence” includes promoters, enhancers, and other expression control elements. Exemplary regulatory sequences are described in Goeddel; Gene Expression Technology: Methods in Enzymology, Academic Press, San Diego, Calif. (1990). For instance, any of a wide variety of expression control sequences that control the expression of a DNA sequence when operatively linked to it may be used in these vectors to express DNA sequences encoding a soluble polypeptide.
- Such useful expression control sequences include, for example, the early and late promoters of SV40, tet promoter, adenovirus or cytomegalovirus immediate early promoter, the lac system, the trp system, the TAC or TRC system, T7 promoter whose expression is directed by T7 RNA polymerase, the major operator and promoter regions of phage lambda, the control regions for fd coat protein, the promoter for 3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid phosphatase, e.g., Pho5, the promoters of the yeast α-mating factors, the polyhedron promoter of the baculovirus system and other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various combinations thereof.
- the design of the expression vector may depend on such factors as the choice of the host cell to be transformed and/or the type of protein desired to be expressed. Moreover, the vector's copy number, the ability to control that copy number and the expression of any other protein encoded by the vector, such as antibiotic markers, should also be considered.
- This invention also pertains to a host cell transfected with a recombinant gene including a coding sequence for one or more of the subject SASPs.
- the host cell may be any prokaryotic or eukaryotic cell.
- a soluble polypeptide of the invention may be expressed in bacterial cells such as E. coli, insect cells (e.g., using a baculovirus expression system), yeast, or mammalian cells.
- Other suitable host cells are known to those skilled in the art. Large numbers of suitable vectors are known to those of skill in the art, and are commercially available.
- Such vectors include, but are not limited to, the following vectors: 1) Bacterial: pET, pQE70, pQE60, pQE-9 (Qiagen), pBS, pD10, phagescript, psiX174, pbluescript SK, pETDuet™, pBSKS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); 2) Eukaryotic: pWLNEO, pSV2CAT, pOG44, PXT1, pSG
- mammalian expression vectors comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation sites, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking non-transcribed sequences.
- DNA sequences derived from the SV40 splice, and polyadenylation sites may be used to provide the required non-transcribed genetic elements.
- transcription of the DNA encoding the SASPs by higher eukaryotes is increased by inserting an enhancer sequence into the vector.
- Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp, that act on a promoter to increase its transcription.
- Enhancers useful in the present invention include, but are not limited to, the SV40 enhancer on the late side of the replication origin bp 100 to 270, a cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.
- the DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (for example, a promoter) to direct mRNA synthesis.
- Promoters useful in the present invention include, but are not limited to, the LTR or SV40 promoter, the E. coli lac or trp, the phage lambda PL and PR, T3 and T7 promoters, and the cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, and mouse metallothionein-I promoters and other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses.
- CMV cytomegalovirus
- HSV herpes simplex virus
- recombinant expression vectors include origins of replication and selectable markers permitting transformation of the host cell (e.g., dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or selectable antibiotic markers, for example, tetracycline or ampicillin resistance in E. coli).
- the expression vector may also contain a ribosome binding site for translation initiation (IRES) and a transcription terminator.
- the vector may also include appropriate sequences for amplifying expression.
- the present invention provides host cells containing the above-described vector constructs.
- the host cell is a higher eukaryotic cell (e.g., a mammalian or insect cell).
- the host cell is a lower eukaryotic cell (e.g., a yeast cell).
- the host cell can be a prokaryotic cell (e.g., a bacterial cell).
- host cells include, but are not limited to, Escherichia coli, Salmonella typhimurium, Bacillus subtilis, species within the genera Pseudomonas, Streptomyces, and Staphylococcus, as well as eukaryotic host cells such as Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila S2 cells, Spodoptera Sf9 cells, Chinese hamster ovary (CHO) cells, COS-7 lines of monkey kidney fibroblasts, C127, 3T3, 293, 293T, HeLa, epithelial cell lines (for example, A549, BEAS-2B, PtK1, NCI H441), BHK cell lines, T-1 (tobacco cell culture line), root cell and cultured plant cells.
- the constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence.
- introduction of the construct into the host cell can be accomplished by calcium phosphate transfection, DEAE-dextran mediated transfection, electroporation, a gene gun approach, or other known methods for introducing DNA into cells (See e.g., Davis et al. [1986] Basic Methods in Molecular Biology).
- the polypeptides and polynucleotides, including nucleic acid probes of the invention can be synthetically produced by conventional peptide and oligonucleotide synthesizers.
- SASPs can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such SASPs using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al. (1989) Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N.Y. Exemplary methods for expressing SASPs and SASP fusion proteins are provided in further detail in the Examples below.
- the selected promoter is induced by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period.
- cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.
- microbial cells employed in expression of SASPs can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.
- the protein can be expressed in bacterial cells using plasmid based expression vectors, insect cells using baculoviral vectors, or in mammalian cells using vaccinia virus or specialized eukaryotic expression vectors.
- the cDNA sequence may be ligated to heterologous promoters, such as the simian virus 40 (SV40) promoter in the pSV2 vector or other similar vectors and introduced into cultured eukaryotic cells such as COS cells to achieve transient or long-term expression.
- the stable integration of the chimeric gene construct may be maintained in mammalian cells by biochemical selection, such as neomycin and mycophenolic acid.
- the DNA sequence can be altered using procedures such as restriction enzyme digestion, fill-in with DNA polymerase, deletion by exonuclease, extension by terminal deoxynucleotide transferase, ligation of synthetic or cloned DNA sequences and site-directed sequence alteration with the use of specific oligonucleotides together with PCR.
- the cDNA sequence or portions thereof can be introduced into eukaryotic expression vectors by conventional techniques. These vectors permit the transcription of the cDNA in eukaryotic cells by providing regulatory sequences that initiate and enhance the transcription of the cDNA and ensure its proper splicing and polyadenylation. Different promoters within vectors have different activities, which alters the level of expression of the cDNA. In addition, certain promoters can also modulate function such as the glucocorticoid-responsive promoter from the mouse mammary tumor virus.
- Cell lines can also be produced which have integrated the vector into the genomic DNA. In this manner, the gene product is produced on a continuous basis.
- Vectors are introduced into recipient cells by various methods including calcium phosphate, strontium phosphate, electroporation, lipofection, DEAE dextran, microinjection, or by protoplast fusion.
- the cDNA can be introduced by infection using viral vectors.
- the expression vectors containing the SASP gene or portions thereof can be introduced into a variety of mammalian cells from other species or into non-mammalian cells.
- the recombinant expression vector comprises the selected DNA of the DNA sequences of this invention for expression in a suitable host.
- the DNA is operatively joined in the vector to an expression control sequence in the recombinant DNA molecule so that normal or mutant protein can be expressed.
- the expression control sequence may be selected from the group consisting of sequences that control the expression of genes of prokaryotic or eukaryotic cells and their viruses and combinations thereof.
- the expression control sequence may be selected from the group consisting of the lac system, the trp system, the tac system, the trc system, major operator and promoter regions of phage lambda, the control region of the fd coat protein, early and late promoters of SV40, promoters derived from polyoma, adenovirus, retrovirus, baculovirus, simian virus, 3-phosphoglycerate kinase promoter, yeast acid phosphatase promoters, yeast alpha-mating factors and combinations thereof.
- the host cells to be transfected with the vectors of this invention may be from a host selected from the group consisting of viruses, bacteria, yeasts, fungi, insects, mice or other animals or plant hosts.
- similar systems are employed to express and produce the SASPs.
- SASPs of the present invention are produced by, and isolated from, eukaryotic or prokaryotic organisms after the polynucleotide sequences encoding the SASPs of the present invention are transfected or transformed into said eukaryotic or prokaryotic organisms.
- the SASPs are expressed recombinantly in prokaryotic organisms, such as bacteria.
- methods for producing the recombinant SASPs include the steps of:
- a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, said SASP comprising: (a) at least one W subunit, each W subunit ranging from about 150 to 250 amino acids; and (b) at least one of: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment, wherein the N-terminal (NT) non-repetitive fragment comprises from about 100 amino acids to about 150 amino acid residues, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2), and wherein the C-terminal (CT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues and may be derived from: (1) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (2) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidr
- methods for producing the recombinant SASP include providing an expression vector containing a polynucleotide encoding an illustrative SASP and transforming the expression vector into a competent expression host cell.
- the polynucleotide encoding the SASP is also operably linked to, and in frame with, a purification tag, for example, glutathione-S-transferase (GST), c-Myc, streptavidin, Small Ubiquitin-like Modifier (SUMO), hexa-histidine (H6), H6-SUMO, Maltose Binding Protein (MBP), Thioredoxin (Trx), FLAG tag, streptavidin-binding peptide (SBP), calmodulin-binding peptide (CBP), S-tag, and hemagglutinin (HA).
- the purification tag can be either the H6-SUMO purification tag or the H6 purification tag. The former is removed by cleaving off the H6-SUMO purification tag, whereas the latter is not removed (thereby saving time and purification steps).
- the recombinant SASP may be recombinantly made having the H6 purification tag fused in frame to a SASP comprising one or more amino acid sequence(s) selected from SEQ ID NOs: 1-44 & 47-84, or SEQ ID NO: 1-44, or SEQ ID NOs: 2, 3, 5, 6, 37, 38, 58, 59 and 59-76.
- methods for producing and isolating the recombinant SASPs do not involve the use of a purification tag and affinity chromatography.
- the recombinant SASPs are produced in a competent host cell transformed or transfected with a polynucleotide encoding a recombinant SASP or a homolog thereof.
- the host cells are then induced to express the SASP and are then treated to disrupt the cell wall or cell membrane to release the contents of the cell.
- a cell lysate is thereby produced and is used as the source material for extracting and isolating the recombinant SASP.
- the soluble recombinant SASPs contained within the cell lysate are precipitated by adding a precipitating solution, such as a low-alkyl alcohol, for example ethanol; alternatively, an ammonium sulfate salt can be used.
- a solubilizing agent such as a surfactant or a detergent.
- the surfactant can be a non-ionic surfactant.
- the solubilizing agent can include a non-ionic surfactant, for example, a hydrophilic polyethylene oxide polymer, for example, a Triton series surfactant, for example, Triton X-100 or Triton X-114.
- non-ionic detergents useful as solubilizing agents can include P-40, Brij series surfactants, for example, Brij-35 or Brij-58, Tween 20, Tween 80, octyl glucoside, octyl thioglucoside, and the like.
- the surfactant or detergent can be removed and washed with pure water and subsequently frozen at -80 °C prior to the last step of optional lyophilization.
- the SASP is produced by splicing two smaller SASP precursors together using intein trans-splicing.
- Methods for producing the recombinant SASPs include the steps of:
- each of the two precursor SASPs is linked or fused in frame with an intein N- fragment (denoted as IN) and/or an intein C-fragment (denoted Ic) and a purification tag and expressed separately (SASPN-IN and SASPc-Ic, as defined above);
- the precursors are purified using affinity column chromatography or surfactant washes;
- the present invention provides the following steps for making fibers comprising the SASPs of the present invention: (i) providing a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, for example, from 1-3,000 amino acids, said SASP comprising: (a) at least one W subunit, each W subunit ranging from about 150 to 250 amino acids; and (b) at least one of: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment, wherein the N-terminal (NT) non-repetitive fragment comprises from about 100 amino acids to about 150 amino acid residues, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2), and wherein the C-terminal (CT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues and may be derived from: (1) the C-terminal fragment
- the present invention provides a method for producing a spider silk fiber. The steps used to produce the fibers are as follows: (i) providing a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, for example, from 1-3,000 amino acids, said SASP comprising: (a) at least one W subunit, each W subunit ranging from about 150 to 250 amino acids; and (b) at least one of: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment, wherein the N-terminal (NT) non-repetitive fragment comprises from about 100 amino acids to about 150 amino acid residues, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2), and wherein the C-terminal (CT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues and may be derived from: (1) the C-terminal fragment (Cac
- the amino acid sequence of the recombinant SASP comprises any one or more of SEQ ID NO: 1-44 & 47-84, or SEQ ID NO: 1-44, or SEQ ID NOs: 2, 3, 5, 6, 37, 38, 58, 59 and 59-76.
- wet-spinning the recombinant SASP comprises dissolving the recombinant SASP in at least two solvents selected from the group consisting of water, phosphoric acid, acetic acid, formic acid, hydrochloric acid, sulfuric acid, nitric acid, hexafluoroisopropanol (HFIP), hexafluoropropanol (HFP), hexafluoroacetone (HFA), trifluoroethanol (TFE), and methylimidazolium chloride, for example, HFIP and water or a mixture of TFA/TFE/water.
- the solubilizing solvent is 70% HFIP/30% dH2O (v/v), or 40-60% TFA/20-40% TFE/20% dH2O.
- the spinning dope containing concentrated recombinant silk protein is wet-spun: the dope is subjected to shear force and then extruded, at a rate of 0.3 to 20 mL/hr, into a coagulation bath that amalgamates the protein into a solid fiber (as in the spider).
- the coagulation bath contains a single or plurality of solvents, selected from the group consisting of methanol, ethanol, isopropanol, acetone, ammonium sulfate, and water.
- the coagulation bath contains a mixture of an organic dehydrating solvent such as methanol, ethanol, isopropanol, acetone, ammonium sulfate, with water.
- the coagulation bath contains 80-95% ethanol/5-20% dH2O or aqueous solutions of ammonium sulfate.
- the fiber thus induced (referred to as "as-spun" (AS) at this stage) may then be wound onto a spool/collector.
- the apparatus is fully automated and consists of a pump (for a controlled extrusion rate), a glass syringe (to store and allow for controlled delivery of the spinning dope), PEEK tubing or needle (for shear force application), a coagulation bath (to amalgamate protein in the soluble dope into an insoluble fiber form), a first set of rollers to guide the as-spun fiber into the post-spin stretching bath, a second set of rollers to facilitate stretching in the bath and a smooth transition of the post-spun fibers to the collector.
- a spinning apparatus 10 is used to form silk fibers. Spinning dope 16 is loaded into syringe 12.
- the spinning dope 16 may comprise recombinant spider silk protein powder and one or more solvents.
- the spinning dope 16 may comprise HFIP, TFA, TFE, water, or any combination thereof, in addition to mixtures of different recombinant spider silk protein powders and metal ions including, but not limited to, aluminum (Al³⁺), titanium (Ti³⁺ or Ti⁴⁺), iron (Fe²⁺ or Fe³⁺), and zinc (Zn²⁺).
- the syringe 12 is connected to syringe pump 14.
- the syringe pump 14 pumps the spinning dope 16 out of the syringe 12.
- the spinning dope 16 flows out of the syringe 12, is extruded through syringe tubing 18 and exits into coagulation bath 20.
- the syringe 12 is connected to and in fluid communication with the syringe tubing 18 at a proximal end of the syringe tubing 18; a distal end of the syringe tubing 18 is below the surface of the coagulation bath 20.
- the coagulation bath 20 comprises one or more solvents to aid in coagulation of the spinning dope 16.
- the coagulation bath 20 may comprise a mixture of water and ethanol.
- the coagulation bath 20 can also include organic solvents such as isopropanol and methanol, combinations of these organic solvents (including ethanol), mixtures of these with water, and metal ions including, but not limited to, aluminum (Al³⁺), titanium (Ti³⁺ or Ti⁴⁺), iron (Fe²⁺ or Fe³⁺), and zinc (Zn²⁺).
- the spinning dope can be immersed in the coagulation bath from about 1 second to about thirty seconds.
- the coagulation bath 20 amalgamates protein in the soluble spinning dope 16 into an insoluble fiber form.
- the coagulation bath 20 is contained within coagulation vessel 22.
- the AS fiber 24 exits the coagulation bath 20 and is connected to a first roller 26A.
- the AS fiber 24 may be initially connected to the first roller 26A by manual means, for example, using tweezers.
- the AS fiber 24 is then rolled across a first set of rollers 26 that are attached to a first roller mount 28.
- the rollers are driven rotationally by a motor so as to pull the AS fiber 24 circumferentially across a portion of their surface.
- a continuous length of fiber is rolled over each of the rollers.
- after the AS fiber 24 travels across second roller 26B and as it travels across third roller 26C, the AS fiber 24 is soaked in a post-spin stretching bath 30.
- the AS fiber 24 undergoes post-spin stretching while in the post-spin stretching bath 30.
- the post-spin stretching bath 30 comprises one or more solvents to aid in the stretching of the AS fiber 24.
- the post-spin stretching bath 30 comprises water in combination with organic solvents such as isopropanol and ethanol, and metal ions including, but not limited to, aluminum (Al³⁺), titanium (Ti³⁺ or Ti⁴⁺), iron (Fe²⁺ or Fe³⁺), and zinc (Zn²⁺).
- the post-spin stretching bath 30 is contained within post-spin stretching vessel 32.
- Post-spun fiber (PS fiber) 38 travels across fourth roller 34A, exits the post-spin stretching bath 30 and continues to travel across a second set of rollers 34.
- the third roller 26C and fourth roller 34A are fully or partially submerged in the post-spin stretching bath 30.
- a length of the fiber between the third roller 26C and fourth roller 34A is submerged in the post-spin stretching bath 30.
- the second set of rollers 34 are attached to a second roller mount 36.
- the PS fiber 38 continues to be rolled across fifth roller 34B and sixth roller 34C.
- the PS fiber 38 is collected on the sixth roller 34C also referred to as the collector. Collection of the PS fiber 38 on the collector occurs by continuously winding the PS fiber 38 around the full circumference of the collector.
- the second set of rollers 34 rotate at a faster rate than the first set of rollers 26 to stretch the fiber.
- different ratios of stretching provide PS fibers with differing mechanical properties.
- the PS fibers may be stretched from 3x to 6x. The speed ratio (second set of rollers : first set of rollers) may be 3:1, 4:1, 5:1, 6:1, or 7:1.
- the rollers may be grooved rollers or planar cylindrical rollers.
- the first roller 26A, second roller 26B, fifth roller 34B and sixth roller 34C are planar cylindrical rollers and the third roller 26C and fourth roller 34A are grooved rollers.
- rollers 26A-B and 34B-C can be planar cylindrical rollers having diameters ranging from about 3 cm to about 9 cm, preferably around 6 cm.
- the rollers 26A-B and 34B-C can have lengths ranging from about 4 cm to about 12 cm, preferably 8 cm.
- rollers 26C and 34A are grooved rollers, wherein the diameter in the middle of the grooved portion can range from about 3 cm to about 9 cm, preferably 6 cm, and the diameter at the ends can range from about 6 cm to about 12 cm, preferably about 9 cm.
- the lengths of rollers 26C and 34A can measure from about 3 cm to about 11 cm in length, preferably 7 cm in length.
- the grooved rollers may have a flared cylindrical shape with a diameter that is smallest at or near the middle of the cylinder and largest at or near the ends of the cylinder.
- the rollers are mounted on the roller mounts in a configuration that allows the fiber to be rolled across the rollers in a zig-zagging manner. This allows for tension in the fiber and greater surface contact of the fiber on the rollers.
- the coagulation vessel 22 and post-spin stretching vessel 32 are adjacent to one another.
- the coagulation vessel 22 and post-spin stretching vessel 32 are physically attached to one another or are formed from a single vessel with a divider to separate the coagulation bath 20 and post-spin stretching bath 30.
- the syringe pump 14 has a power supply 40 that may be connected to a power source.
- the syringe pump 14 may have an arm 42 that is driven by the syringe pump 14 to push a syringe plunger 46 of the syringe 12.
- the syringe plunger 46 may be connected to the syringe pump arm 42 at a syringe holder 44.
- the syringe 12 may also be connected to a syringe needle (not shown).
- the syringe pump 14 may be controlled to produce a desired flow rate based on the diameter of the syringe 12 and the drive rate of the syringe pump arm 42.
- this flow rate can be used to control the velocity of the spinning dope 16 into the coagulation bath 20 based on the diameter of the syringe tubing 18.
- the velocity and diameter of the flow of spinning dope 16 into the coagulation bath 20 may be used to control production of the AS fiber 24, which is fed into the remainder of the apparatus.
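The relationship between pump drive rate, syringe diameter, and tubing diameter described above can be sketched numerically. This is a minimal illustration, assuming an incompressible dope and circular cross-sections; the function names and the example values (a 2.3 mm barrel and a 10 µL/min flow) are hypothetical and are not taken from the specification. Only the 0.127 mm tubing inner diameter appears in the text.

```python
import math


def syringe_flow_rate_ul_per_min(barrel_diameter_mm, plunger_speed_mm_per_min):
    """Volumetric flow = barrel cross-section x plunger speed (1 mm^3 = 1 uL)."""
    barrel_area_mm2 = math.pi * (barrel_diameter_mm / 2.0) ** 2
    return barrel_area_mm2 * plunger_speed_mm_per_min


def extrusion_velocity_mm_per_s(flow_ul_per_min, tubing_inner_diameter_mm):
    """Mean dope velocity at the tubing outlet = flow / tubing cross-section."""
    tubing_area_mm2 = math.pi * (tubing_inner_diameter_mm / 2.0) ** 2
    return flow_ul_per_min / tubing_area_mm2 / 60.0


if __name__ == "__main__":
    # Hypothetical barrel diameter and plunger speed, for illustration only.
    flow = syringe_flow_rate_ul_per_min(barrel_diameter_mm=2.3,
                                        plunger_speed_mm_per_min=2.4)
    print(f"flow rate: {flow:.2f} uL/min")
    # 0.127 mm is the smaller PEEK tubing inner diameter quoted in the text.
    velocity = extrusion_velocity_mm_per_s(10.0, 0.127)
    print(f"outlet velocity at 10 uL/min: {velocity:.2f} mm/s")
```

Because the velocity scales with the inverse square of the tubing diameter, halving the inner diameter at a fixed flow rate quadruples the outlet velocity.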
- First stepper motor board and keypad 70 and second stepper motor board and keypad 72 are used to control a first set of stepper motors 74 and second set of stepper motors 76, respectively.
- Each stepper motor board and keypad (70, 72) has a corresponding power supply 80, 82 that may be connected to a power source.
- Each stepper motor board and keypad (70, 72) is in electrical communication with a series of stepper motor drivers (not shown).
- Each stepper motor board and keypad (70, 72) is also in electrical communication with its corresponding set of stepper motors (74, 76).
- the first set of stepper motors 74 consists of a first, second, and third stepper motor (74A, 74B, and 74C) that each drive rotation of the corresponding first, second, and third rollers (26A, 26B, and 26C) (third roller not shown in FIG. 23).
- the second set of stepper motors 76 consists of a fourth, fifth, and sixth stepper motor (76A, 76B, and 76C) that each drive rotation of the corresponding fourth, fifth, and sixth rollers (34A, 34B, and 34C) (fourth roller not shown in FIG. 23).
- the rotation rate of the rollers as driven by the stepper motors can be used to control the stretching of the fibers.
- the rotation rate of the second set of rollers 34 may be greater than the rotation rate of the first set of rollers 26 in order to stretch the fiber as disclosed above.
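How roller rotation rates translate into a stretch ratio can be sketched as follows. This is an illustrative calculation only, assuming the fiber does not slip and therefore moves at the surface speed of each roller; the function names and the rpm values are hypothetical. The 6 cm diameter is the preferred roller diameter given in the text.

```python
import math


def surface_speed_cm_per_min(diameter_cm, rpm):
    """Linear speed of the roller surface (circumference x revolutions per minute)."""
    return math.pi * diameter_cm * rpm


def draw_ratio(first_diameter_cm, first_rpm, second_diameter_cm, second_rpm):
    """Ratio of the take-up speed (second set) to the feed speed (first set)."""
    feed = surface_speed_cm_per_min(first_diameter_cm, first_rpm)
    take_up = surface_speed_cm_per_min(second_diameter_cm, second_rpm)
    return take_up / feed


def second_set_rpm(target_ratio, first_rpm,
                   first_diameter_cm=6.0, second_diameter_cm=6.0):
    """Rotation rate needed on the second set to reach a target draw ratio."""
    return target_ratio * first_rpm * first_diameter_cm / second_diameter_cm


if __name__ == "__main__":
    # Hypothetical feed rate of 2 rpm; ratios taken from the 3:1 to 7:1 range.
    for ratio in (3, 4, 5, 6, 7):
        print(ratio, second_set_rpm(ratio, first_rpm=2.0))
```

When both roller sets have the same contact diameter, the draw ratio reduces to the ratio of the rotation rates, which is why the speed ratio and the stretch ratio can be quoted interchangeably.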
- Materials, components, and process conditions familiar to persons of ordinary skill in the art may be used in constructing and using the spinning apparatus 10. The following are some specific exemplary embodiments that may be used in the spinning apparatus.
- the coagulation vessel 22 may be made from glass or another solid material, for example, hardened plastic. Rollers and roller mounts may be made of a non-warping metal, such as stainless steel or 6061 aluminum.
- the post-spin stretching vessel 32 may be made from poly(methyl methacrylate).
- the syringe 12 may be a 250 µL Hamilton reversible needle (RN) syringe (Hamilton, Reno NV) or a 250 µL Hamilton luer tip special cemented needle (LTSN) syringe (Hamilton, Reno NV).
- the syringe tubing 18 may be 6-10 cm long polyetheretherketone (PEEK) tube (inner diameter: 0.127-0.254 mm) (Sigma-Aldrich; Oakville, ON).
- the syringe tubing 18 may be attached to the syringe 12 by RN compression fittings (1/16 inch; Hamilton; Reno, NV).
- a 26s gauge needle (inner diameter: 0.13 mm) may also be used with the syringe 12.
- the syringe pump 14 may be a KD Scientific model 100 series syringe pump (Holliston, MA). The syringe pump 14 may be set to a flow rate of 5 to 50 µL/min.
- the wet-spinning apparatus as described herein was optimized to incorporate an automated post-spin stretching treatment. Since it was demonstrated that a post-spin stretching treatment improves fiber mechanical properties (mechanical testing), increases fiber
- the present invention also provides SASP fibers useful in the manufacture of various articles.
- articles of manufacture incorporating the SASP fibers of the present invention may be useful in the manufacture of articles for use in the field of biotechnology and/or medicine.
- such articles could include:
- threads/fibers of the present invention can be used in the
- SASPs of the present invention are in the manufacture and processing of clothing fabric (textiles) and leather, automotive covers and parts, aircraft construction materials as well as in the manufacture and processing of paper.
- the recombinant SASPs of the present invention may be added to cellulose and keratin and collagen products and thus, the present invention is also directed to a paper or a skin care and hair care product, comprising cellulose and/or keratin and/or collagen and the spider silk proteins of the present invention. Papers and skin care and hair care products, in which the SASPs of the present invention can be incorporated can improve or enhance the tensile strength or tear strength of various woven materials.
- the recombinant SASPs of the present invention can be used as a coating for metals, plastics, textile and leather products, thereby conferring stability and durability to the coated product.
- Expression plasmid was transformed into E. coli BL21(DE3) (Novagen, Darmstadt, Germany).
- Luria-Bertani (LB) medium was prepared (Fisher Scientific; Ottawa, ON), and a starter culture was formed by inoculating a single colony of cells in LB media containing 50 µg/mL ampicillin (Fisher Scientific; Ottawa, ON), which was incubated with a shaker at 37 °C overnight. The resulting overnight culture was stored at 4 °C and, following 6 h storage at 4 °C, was transferred into 1.6 L LB medium containing 50 µg/mL ampicillin.
- Expression plasmid was transformed into E. coli BL21(DE3) (Novagen, Darmstadt, Germany). A starter culture (100 mL) was formed as described above at 37 °C overnight. Cells were harvested and resuspended in 1 L ZYP auto-induction media containing 10 g/L tryptone, 5 g/L yeast extract, 25 mM sodium succinate, 2% glycerol, 2 mM MgSO4, 50 mM Na2HPO4, 50 mM KH2PO4, 25 mM (NH4)2SO4, 0.05% glucose, 0.2% alpha-lactose monohydrate and trace metals (50 µM FeCl3, 20 µM CaCl2, 10 µM MnCl2, 10 µM ZnSO4, 2 µM CoCl2, 2 µM CuCl2, 2 µM NiCl2, 2 µM Na2Mo
- the supernatant was loaded onto a column with immobilized Ni-NTA Sepharose (Qiagen, Germany), flowing through twice at room temperature. As per the manufacturer's instructions, the column was then washed using wash buffer, and the bound recombinant proteins with a hexahistidine (H6)-SUMO protein tag or just an H6 tag were eluted using elution buffer (50 mM NaH2PO4, 300 mM NaCl, 250 mM imidazole, pH 8.0). For H6-SASPs, the eluted protein is the final purified target protein.
- H6-SUMO-silk proteins were further digested by SUMO protease in the presence of 1 mM dithiothreitol (Fisher Scientific; Ottawa, ON). The reaction mixture was then transferred to dialysis tubing and dialyzed against 50 mM K3PO4 (pH 7.5). SUMO cleavage and dialysis were carried out at 4 °C overnight.
- the soluble fraction of cell lysate (supernatant) was also used for protein purification in this method. First, the soluble target protein was precipitated by adding 20% (NH4)2SO4 (20% of its saturation concentration) to the supernatant at 4 °C overnight. The precipitated silk protein was pelleted by centrifugation at 12,000 rcf for 30 min. The pellet was suspended in 0.1-1% Triton X-100 (~0.5 mL per 1 mg protein) with a tissue-grinder homogenizer until all clumps were disrupted. The suspension was centrifuged at 12,000 rcf for 10 min to collect the pellet, and the Triton wash and centrifugation steps were repeated twice. After the Triton wash, the silk protein should remain in the pellet.
- dH2O was used to wash the pellet using the same suspension and centrifugation method used for Triton.
- the silk protein (in the pellet) was finally suspended in dH2O, frozen at -80 °C, and lyophilized.
- Step 1 Solubilization of recombinant spider silk protein
- Dope #1 Spinning dope type #1 was made by dissolving ~8% (w/v) lyophilized recombinant spider silk protein powder into 70% HFIP/30% dH2O (v/v) in glass vials. The protein-solvent mixture was vortexed until homogeneous, and then sonicated twice at 37 °C for 5 min, with vortexing in between. Subsequently, the glass vials were wrapped with aluminum foil to prevent exposure of the protein-solvent mixtures to light, and incubated for ~48 h at room temperature, with occasional vortexing.
- the protein-solvent mixture was centrifuged at 14,000 rcf for 30 min at 20°C and transferred into a new glass vial. This was repeated until the protein-solvent mixture (spinning dope) was transparent with no visible insoluble components remaining.
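The quantities needed for a spinning dope of this kind can be worked out with a short calculation. The sketch below is illustrative only: it assumes that % (w/v) means grams of protein per 100 mL of solvent and that the solvent volumes are simply additive, and the function name and the 200 µL batch size are hypothetical. The 8% (w/v) and 70%/30% (v/v) figures are those quoted for dope #1.

```python
import math


def dope_recipe(total_solvent_ul, protein_percent_wv, solvent_fractions):
    """Return the protein mass (mg) and per-solvent volumes (uL) for a dope.

    protein_percent_wv: g protein per 100 mL solvent (assumed definition).
    solvent_fractions: mapping of solvent name to its volume fraction (v/v).
    """
    if not math.isclose(sum(solvent_fractions.values()), 1.0, abs_tol=1e-9):
        raise ValueError("solvent volume fractions must sum to 1")
    # 1% (w/v) = 10 mg/mL = 0.01 mg/uL
    protein_mg = protein_percent_wv * 0.01 * total_solvent_ul
    volumes_ul = {name: fraction * total_solvent_ul
                  for name, fraction in solvent_fractions.items()}
    return protein_mg, volumes_ul


if __name__ == "__main__":
    # Hypothetical 200 uL batch of dope #1 (8% w/v in 70% HFIP / 30% dH2O).
    mass_mg, volumes = dope_recipe(200.0, 8.0, {"HFIP": 0.70, "dH2O": 0.30})
    print(f"protein: {mass_mg:.1f} mg")
    for name, volume in volumes.items():
        print(f"{name}: {volume:.1f} uL")
```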
- Dope #2 Spinning dope type #2 was made by dissolving ~10-15% (w/v) lyophilized recombinant spider silk protein powder into 40-60% TFA/20-40% TFE/20% dH2O. The protein-solvent mixture was vortexed for ~2 min followed by incubation for ~30 min at room
- Step 2 Wet-spinning
- the spinning dopes were loaded into a 250 µL Hamilton reversible needle (RN) syringe (Hamilton, Reno NV), which was attached to a 6-10 cm long polyetheretherketone (PEEK) tube (inner diameter: 0.127-0.254 mm; Sigma-Aldrich; Oakville, ON) by RN compression fittings (1/16 inch; Hamilton; Reno, NV).
- a 26s gauge needle (inner diameter: 0.13 mm) was also used for this syringe.
- the syringe containing the spinning dope was securely attached to a KD Scientific model 100 series syringe pump (Holliston, MA).
- the dope was extruded through the PEEK tube into a coagulation bath of 80-95% ethanol/5-20% dH2O at a constant speed of 16 µL/min. Fibers formed in the coagulation bath were as-spun (AS) fibers and were carefully picked up with tweezers and guided onto a collector. A schematic of this can be seen in FIG. 1.
- FIG. 1 Schematic of the wet-spinning fiber production method.
- This apparatus consists of a pump (for a controlled extrusion rate), a glass syringe (to store and allow for controlled delivery of the spinning dope), PEEK tubing (for shear force application), a coagulation bath (to amalgamate protein in the soluble dope into an insoluble fiber form), and a collector for the AS fibers.
- Step 3 Post-spin stretching
- the post-spin stretching apparatus was slowly filled with dH2O until the fibers were fully immersed. Using the control knob, the fibers were smoothly stretched to 4x their original length and allowed to rest in the dH2O bath for 3 min. The dH2O bath was then drained with simultaneous misting using 95% EtOH until the fibers were no longer in contact with dH2O. Subsequently, the resulting "PS stretched" fibers were allowed to dry at room temperature for 5-10 min. A schematic of the post-spin stretching procedure with this apparatus can be seen in FIG. 2.
- FIG. 2 An image of the post-spin stretching apparatus.
- the translational control knob allows for controlled motion of the stage, the metric ruler allows for a more consistent and precise measurement of fiber stretching, and the drain plug allows the dH2O bath to steadily drain.
- Fibers were pulled from solutions of purified spider silk proteins in different buffers (10 mM Tris-HCl, pH 8.0 or 50 mM K3PO4, pH 7.5) at room temperature.
- a 5-20 µL protein solution (~0.1-1 mg/mL) was placed on a glass slide at room temperature, and fibers were pulled from the protein solution using a plastic 200 µL pipette tip, with one end of the fiber attached to the pipette tip.
- Each pulling action was a continuous motion at a speed of ~2-10 mm/s.
- split-intein can ligate two proteins together to make a larger protein.
- the ability to splice more than two silk proteins together is desirable and requires tandem intein splicing, meaning that more than one split intein is needed.
- Intein splicing is also protein dependent, depending especially on the residues adjacent to the intein. Six split inteins were therefore selected to test splicing activity on the W4 protein, which has four W repeat units. In total, 30 constructs (Table 3) were made, each containing H6 tag + intein C-fragment (Ic) + W4 + intein N-fragment (IN) + H6 tag.
- Trichodesmium erythraeum FMS101 SEQ ID NO 92; 4: Ssp GyrB (SG) from Synechocystis species, strain PCC6803, SEQ ID NO 93; 5: Rma DnaB (RB) from Rhodothermus marinus, SEQ ID NO 94; 6: Cne-AD PRP8 (CP) from Cryptococcus neoformans, SEQ ID NO 95.
- W 4 is four W repeat units which includes the first W repeat unit of SEQ ID NO: 46 and three W repeat units containing the amino acid sequence of SEQ ID NO 45.
- the N-fragment of intein No. 1 is referred to as 1N
- the C-fragment of intein No. 3 is referred to as 3C, etc.
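One way to arrive at 30 constructs from six split inteins is to pair the C-fragment of one intein with the N-fragment of a different intein on each W4 construct, which gives 6 x 5 = 30 ordered combinations. The sketch below enumerates them under that assumption; the specification does not state how the 30 constructs were chosen, and the label format and function names are hypothetical.

```python
from itertools import permutations

INTEIN_IDS = [1, 2, 3, 4, 5, 6]


def enumerate_constructs(intein_ids=INTEIN_IDS):
    """Ordered (C-fragment intein, N-fragment intein) pairs using distinct inteins."""
    return list(permutations(intein_ids, 2))


def label(construct):
    """Hypothetical plain-text label, e.g. C-fragment of intein 4, W4, N-fragment of intein 3."""
    c_id, n_id = construct
    return f"{c_id}C-W4-{n_id}N"


def precursor_pairs_for(intein_id, constructs):
    """Pairs of constructs that together supply both halves of one intein.

    The first member carries the intein's N-fragment, the second its C-fragment.
    """
    n_donors = [c for c in constructs if c[1] == intein_id]
    c_donors = [c for c in constructs if c[0] == intein_id]
    return [(n_donor, c_donor) for n_donor in n_donors for c_donor in c_donors]


if __name__ == "__main__":
    constructs = enumerate_constructs()
    print(len(constructs), "constructs")
    # Candidate pairs for testing intein 3 include 4C-W4-3N mixed with 3C-W4-5N.
    for n_donor, c_donor in precursor_pairs_for(3, constructs)[:3]:
        print(label(n_donor), "+", label(c_donor))
```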
- Plasmid was transformed into E. coli BL21(DE3), and a starter culture and a larger culture were then prepared in the same manner as described in [00140]. The cells were incubated with shaking at 37 °C until the optical density at 600 nm (OD600) reached 0.8-1, and expression was then induced with 0.8 mM IPTG at 37 °C for 2-5 hours.
- Intein trans-splicing was performed by mixing two precursor proteins together in 50 mM potassium phosphate, pH 7.5, with 1 mM DTT added; for example, 4CW4 3N and 3CW4 5N were mixed together in order to test the splicing activity of intein 3. The splicing reaction was carried out at room temperature or at 4 °C for 1-16 hours.
- After screening intein splicing activity, the two best-performing inteins (4: Ssp GyrB and 5: Rma DnaB) were selected for protein ligation to make larger proteins, e.g., proteins with more than four W units.
- N-precursor W1/2/4IN or NW2/4IN: W1, W2 or W4 + IN + H6 tag; or NTD + W2 or W4 + IN + H6 tag
- C-precursor IcW1/2/4 or IcW2/4C: H6 tag + intein Ic + W1, W2 or W4; or H6 tag + Ic + W2 or W4 + CTD
- N- and C-precursors were purified by Ni-NTA affinity chromatography.
- the two precursors were mixed together and splicing reaction was carried out in purification elution buffer (50 mM sodium phosphate, 300 mM NaCl, 250 mM imidazole, pH 8.0) with 1 mM DTT at 4°C for > 6 hours.
- urea was needed to solubilize the protein and was added to the purification buffers; it had to be reduced to ≤ 4 M in the final splicing mixture by dialysis.
- the mixture was then dialyzed against 50 mM potassium phosphate, pH 7.5 at 4°C for > 2 hours and reverse purified using Ni-NTA Sepharose as described above.
- Proteins whose expression has been tested in E. coli include: proteins of SEQ ID NO: 1-47 and 58, and precursor proteins to produce SEQ ID NO: 48-57.
- FIGs. 3A-3D provide photomicrograph examples of protein purification of GLGL (SEQ ID NO: 11), LGLG (SEQ ID NO: 12), GLGLG (SEQ ID NO: 13), LGLGL (SEQ ID NO: 14), H-W3 (SEQ ID NO: 37) and H-W (SEQ ID NO: 38) using affinity chromatography.
- GLGL, LGLG, GLGLG and LGLGL were fused with H6-SUMO and purification was carried out in three steps: initial precursor purification using affinity chromatography, SUMO cleavage and reverse purification. The resulting silk proteins are tag free.
- H6-W3, H6-W4 and H6-W3Cma2 are the only proteins fused to an H6 tag, and purification was carried out in one step using affinity chromatography. The resulting silk proteins have an H6 tag at the N-terminus.
- FT1-3: reverse purification through a series of flow-throughs, carried out by nickel affinity chromatography. Ni: the other H6-proteins bound to Ni-NTA Sepharose after reverse purification. The purified target proteins are labeled with red boxes. Samples were resolved by SDS-PAGE and visualized with Coomassie blue staining.
- SASPs should be able to form fibers using a wet-spinning method.
- At least 10 exemplary SASPs have been shown to form fibers, including W2, W3, W4, H6-W3, H6-W4, H6-W3Cma2, GLGL, LGLG, GLGLG and LGLGL.
- Fibers formed using the wet-spinning method are homogeneous, smooth and continuous.
- FIGs. 4A-4B show an example of a wet-spun AS fiber formed by W3.
- FIGs. 4A-4B: light microscope images of representative AS W3 fibers formed using the wet-spinning method. Images were taken at 100X magnification (FIG. 4A) and at 400X magnification (FIG. 4B).
- FIGs. 5A & 5B: light microscope images of a representative PS 4x W3 fiber (an AS fiber stretched to 4 times its original length) formed by wet-spinning. Images were taken at 100X magnification (FIG. 5A) and at 400X magnification (FIG. 5B).
- PS 4x W3 fibers spun from the TFA/TFE/H2O spinning dope were stronger, more extensible, and tougher, with an engineering stress of 139 MPa, an engineering strain of 47.3%, and a toughness of 58.8 ± 33 MJ m^-3.
- Spinning dope solvent composition plays an important role in the final mechanical properties when a wet-spinning/post-spin stretching method is applied. This could be because different pre-structuring and resulting pre-assembly within the two spinning dopes give rise to different mechanical properties, as reflected by the notable differences observed in the CD spectra of soluble protein in the spinning dopes (FIGs. 7 and 8).
- A conformational change in protein secondary structure occurs in the fibers formed from the HFIP/H2O spinning dope following post-spin stretching (FIGs. 9A-9D). In particular, in the amide I region the AS fibers formed from the HFIP/H2O spinning dope are in an α-helical/β-sheet conformation, which then shifts to an enrichment in β-sheet conformation in the PS 4x fibers. This conformational change into a predominantly β-sheet structure reinforces the idea that a more complete transition of the crystalline phase is likely occurring. Specifically, this could reflect an increase in β-sheet crystallization, resulting in an increase in local protein concentration and therefore promoting greater mechanical strength (Table 4).
- FIGs. 9A-9D: Raman spectra of W3 fibers formed from the HFIP/H2O spinning dope in perpendicular (blue) or parallel (red) alignment relative to the incident polarized scattered light are depicted.
- The full spectral range of AS fibers is shown in FIG. 9A.
- FIG. 9B depicts the amide I region of AS fibers.
- FIG. 9C illustrates the full spectral range of PS 4x fibers, while FIG. 9D shows the amide I region of PS 4x fibers.
- Raman spectra of W3 fibers formed from the TFA/TFE/H2O spinning dope in perpendicular (blue) or parallel (red) alignment relative to the incident polarized scattered light are compared.
- The full spectral range of AS fibers is shown in FIG. 10A.
- The amide I region of AS fibers is shown in FIG. 10B.
- FIG. 10C depicts the full spectral range of PS 4x fibers.
- FIG. 10D depicts the amide I region of PS 4x fibers.
- Protein secondary structure also plays an important role in fiber mechanical properties. Better molecular alignment leads to greater strength and is reflected by brighter birefringence. As expected, notable differences can be observed between AS W3 fibers from the two spinning dopes (FIGs. 11 and 12). In addition, in both conditions, birefringence increased following post-spin stretching, suggesting greater alignment of protein molecules within PS fibers (FIGs. 11B and 12B).
- FIGs. 11A & 11B: micrographs depicting the birefringence of W3 fibers spun from HFIP/H2O spinning dope, visualized by polarized light microscopy. FIG. 11A depicts AS fibers and FIG. 11B depicts PS 4x fibers.
- FIG. 12 provides micrographs depicting birefringence of W3 fibers spun from TFA/TFE/H2O spinning dope.
- FIG. 12A depicts AS fibers and FIG. 12B depicts PS 4x fibers.
- Fibers' mechanical properties: We have tested the mechanical properties of more types of hand-pulled fibers and found two trends: (1) the larger the protein, the better the mechanical properties of the fibers; and (2) proteins with the C-terminal non-repetitive domain (CTD) yield fibers with better mechanical properties.
- CTD: C-terminal non-repetitive domain
- Fiber surface morphology: Now that an automated wet-spinning method has been developed, more types of wet-spun fibers can readily be tested and production can be scaled up for different applications.
- Fibers formed using the hand-pulling method are thin (diameter ~1.5-4 μm),
- FIG. 13 shows two examples of hand-pulled fibers formed by W3 and W2Cac.
- Optical microscopy images of the indicated fiber types are provided.
- The average toughness observed for W2Cac fibers is more than double that for W2 and W3 fibers and ~1.2x that for W4 fibers.
- FIGs. 17A-17C: the secondary structure and structural orientation of different types of fiber are shown.
- Panel (A): orientation-insensitive Raman spectra of the indicated fibers. Amide I decomposition was based on five bands: α-helix (red); β-sheet (purple); random coil, turns, etc. (3 x green).
- FIG. 17B: an overlay of amide I and amide III bands of the indicated fibers from the XX (perpendicular) and ZZ (parallel) directions is provided.
- FIG. 17C: Izz/Ixx ratios of the α-helix and β-sheet bands indicated in FIG. 17B.
- The β-sheet structure of W2Cac and native aciniform silk is statistically more oriented, as indicated by "*".
Abstract
The present invention relates to isolated artificial spider aciniform silk proteins (SASPs). Methods for synthesizing recombinant SASPs are provided. Methods of making spider silk protein and some commercial uses thereof are also disclosed.
Description
ARTIFICIAL SPIDER ACINIFORM SILK PROTEINS, METHODS OF MAKING AND USES THEREOF
CROSS REFERENCE TO RELATED PATENT APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 62/287,564, filed January 27, 2016, the disclosure of which is hereby incorporated by reference into this application in its entirety.
INCORPORATION BY REFERENCE OF MATERIAL SUBMITTED ELECTRONICALLY
[0002] Incorporated by reference in its entirety is a computer-readable sequence listing submitted concurrently herewith and identified as follows: One 639 KB ASCII (Text) file named "223274-377859_Sequence_Listing_ST25.txt," created on January 27, 2017.
TECHNICAL FIELD
[0003] The present invention relates generally to isolated artificial spider aciniform silk proteins, referred to herein as SASPs. The present invention also relates to methods of making recombinant SASPs and uses of these SASPs to produce commercially important spun aciniform silk fibers.
BACKGROUND
[0004] Spider silks are extraordinary biomaterials with diverse and impressive mechanical properties such as high toughness and tensile strength, making them mechanically superior to synthetic materials such as polyester and nylon. Spider silks are also biodegradable, making them desirable for many downstream applications in industry and medicine.
[0005] Spiders produce up to seven types of silk, serving different biological functions such as web construction and locomotion (major ampullate (MA) silk, or dragline silk) or capturing prey as the orb-web spiral (flagelliform silk). Aciniform silk (wrapping silk) is the toughest spider silk and is used to wrap and immobilize prey, build the egg case inner layer, and decorate the web. Its toughness is more than two, seven and sixty times that of spider dragline silk, Kevlar and steel, respectively.
[0006] The main component of aciniform silk is the protein aciniform spidroin 1 (AcSp1), which is typically a large protein (~300-430 kDa, depending upon the species) composed of iterated repeats comprising >95% of the protein sequence and flanked by short N- and C-terminal non-repetitive domains (the NTD and CTD, respectively). In AcSp1 of Argiope trifasciata, the iterated repeat is made up of a concatenated series of identical 200 amino acid units. In other species currently sequenced, these range from about 150 to 250 amino acids. Here, we refer to these iterated repeat units as "W units". AcSp1 is produced and stored in the aciniform gland as a highly soluble protein and is readily spun (extruded) into solid fibers when needed. Between the soluble and fibrous forms, AcSp1 undergoes a structural transition, from a state composed of globular helical domains connected by intrinsically disordered linkers to a state retaining a similar proportion of disorder alongside a mixture of oriented β-sheet and α-helical domains. The conversion from α-helix to β-sheet is believed to be important in providing strength to silk fibers. The amino acid sequence of a previously isolated and characterized AcSp1 can be found under National Center for Biotechnology Information (NCBI) Accession No. AAR83925.1.
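As a rough illustration of the sizes discussed above, the following sketch relates the number of 200-residue W units to an approximate molecular mass. The average residue mass of 110 Da is a general rule of thumb assumed here for illustration; it is not taken from this disclosure, and silk proteins rich in small residues may fall below it. The function names are hypothetical.

```python
# Assumption: ~110 Da per residue is a generic rule of thumb, not a value
# from this disclosure; real masses depend on the actual sequence.
AVG_RESIDUE_MASS_DA = 110.0

# From the text: each W unit of A. trifasciata AcSp1 is 200 amino acids.
W_UNIT_LENGTH = 200


def construct_mass_kda(n_w_units: int, extra_residues: int = 0) -> float:
    """Approximate mass (kDa) of a construct with n W units plus extra residues."""
    residues = n_w_units * W_UNIT_LENGTH + extra_residues
    return residues * AVG_RESIDUE_MASS_DA / 1000.0


def w_units_for_mass(mass_kda: float) -> float:
    """Approximate number of W units corresponding to a given mass (kDa)."""
    return mass_kda * 1000.0 / (AVG_RESIDUE_MASS_DA * W_UNIT_LENGTH)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"W{n}: about {construct_mass_kda(n):.0f} kDa")
```

Under these assumptions the recombinant W2 to W4 constructs are several-fold smaller than the ~300-430 kDa native protein, which is the motivation for ligating W units into larger proteins.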
[0007] Unlike silkworms, spiders are cannibalistic and territorial in nature, making it infeasible to collect large amounts of silk for commercial applications. As well, it has been found that spiders produce silk in lower quantities when held in captivity, making collection of large amounts even more difficult. Currently, making recombinant spider silks using different host cells is by far the most effective way to produce artificial spider silks. At present, yeast cells, mammalian cells, insect cells, bacterial cells, transgenic plants, and animals have been used to produce recombinant spider silks. However, an Escherichia coli system has many advantages over other methods because of its high yield, cost effectiveness and simplicity in cultivation and protein purification. Because fiber formation in aciniform silk is poorly understood, harvesting large amounts of fibrous material from spiders is not feasible, and gene/protein sequences can be readily modified in E. coli, the use of recombinant spider silk proteins is highly desirable.
[0008] Once the protein has been expressed and purified, solubilizing it is required for further processing. The spinning dope is known to be an essential component involved in spinning fibers and can also influence fiber mechanical properties. In most wet-spinning (fiber production) protocols, 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP) has been used to dissolve recombinant spider silk proteins at relatively high concentrations. It should be mentioned that a major obstacle with artificial spider silk proteins is finding an effective means of preparing a sufficiently concentrated spinning dope for wet-spinning. 100% HFIP has been used to dissolve recombinant MA silk (MaSp1, MaSp2 and ADF3), recombinant flagelliform (Flag) silk, and recombinant cylindriform/tubuliform silk (TuSp1) proteins. Aside from organic solvents, aqueous buffer solutions such as phosphate buffered saline and urea have been used to dissolve recombinant MA-based proteins.
[0009] Other solvents have also been used on regenerated Bombyx silkworm silk fibroin to make spinning dopes for wet-spinning. Silkworms have two silk proteins: sericin (a glue-like protein), which allows composite fibers to hold the cocoon case together for protection of the silkworms, and fibroin, which acts as the structural component of the fibers. Regenerated silk fibroin is produced by removal of sericin (termed "degumming") to obtain isolated fibroin. Solvents used with regenerated Bombyx silk fibroin have been combined with inorganic salts such as lithium bromide, and have included acids, such as phosphoric acid or a phosphoric acid/formic acid mixture. Other examples include fluorinated organic solvents such as hexafluoroacetone (HFA), as well as organic solvents including dimethyl sulfoxide, dimethylformamide and dimethylacetamide combined with organic salts such as N-methylmorpholine-N-oxide (NMMO).
[0010] Even though there is a long list of solvents that have been successfully used to make spinning dopes for wet-spinning, each type of silk protein (whether from a spider or silkworm) will behave distinctly because of inherent differences at the primary through to quaternary structure levels. There are also advantages and disadvantages for each solvent type. Organic salts can be easily regenerated after use, but often require high temperatures (>100 °C) to dissolve proteins, resulting in degradation. Conversely, inorganic salts do not require high temperature, but require long dialysis times for the removal of salts, again resulting in degradation. Acids are also commonly used for dissolving regenerated silk fibroin, but degradation is again a drawback because the acids perturb and attack peptide bonds. However, this can be avoided if wet-spinning is carried out shortly after dissolving the protein. Fluorinated organic solvents such as HFIP and HFA, which are potent hydrogen bond disruptors, have been reported to cause no measurable degradation to recombinant spider silk proteins.
[0011] Recombinant aciniform silk protein constructs with varying numbers of W units have been produced in E. coli, and recombinant W2, W3, and W4 protein constructs were demonstrated to be capable of forming multi-centimeter silk-like fibers when manually drawn from a low-concentration (~0.06 mg/mL) buffered aqueous solution. However, a low-concentration spinning dope is not optimal for efficient production of fibers and cannot be applied to wet-spinning (described in the next section), which is why a new spinning dope with significantly higher concentration needed to be formulated.
[0012] Wet-spinning of silk proteins
[0013] A number of methods involved in spinning silks from a dope solution have been employed. Electrospinning has been applied to regenerated Bombyx silk and recombinant MA silk proteins. Another method that has been employed is the use of microfluidic devices for elongational flow (rather than shearing flow), allowing for changes in pH, ionic strength and salt composition, essentially acting as a biomimetic of the MA gland.
[0014] Dehydration (coagulation) bath
[0015] Once the protein is solubilized in an adequately concentrated spinning dope, wet-spinning in a coagulation bath is a common method used to spin the silk protein into a fiber. As mentioned, most of what is known today about the spider spinning process is based on the processing machinery of MA spidroins from the MA gland. This machinery involves an intricate combination of pH and ionic gradients from the spinning gland (where the spinning dope is produced) to the spinning duct leading into the spinneret(s)/spigot(s) for fiber formation. This process fundamentally dehydrates the spidroin to shock it out of solution and promote self-assembly of the protein molecules into a solid fiber. Although the spinning apparatus is different in the aciniform gland, the involvement of dehydration as a key aspect of fiber formation in the MA gland implies that this may also be involved in aciniform silk fiber formation.
[0016] Following this rationale, the wet-spinning method subjects a spinning dope to shear force followed by extrusion into a coagulation bath, serving to amalgamate the protein into a solid fiber (just as in the spider). Typically, either isopropyl alcohol (IPA) or methanol (MeOH) is used as the dehydrating solvent. The fiber thus induced (referred to as "as-spun" (AS) at this stage) may then be wound onto a spool/collector. Although several recombinant spider silk proteins, including major ampullate silk proteins (MaSp1, MaSp2 and ADF3), flagelliform silk proteins and tubuliform silk proteins (TuSp1), have been spun into fibers using a wet-spinning method, wet-spinning of aciniform silk has not been previously reported.
[0017] Post-spin stretching treatments
[0018] To further promote molecular alignment within AS fibers, a post-spin stretching treatment is typically performed. As the name implies, this is simply carried out by stretching the AS fibers in a solvent. This step is usually used to promote favorable structural changes within the fiber that lead to improvements in mechanical properties such as strength, extensibility and toughness. Just like the wet-spinning method, the rationale of the post-spinning treatment method is based on what is known about the MA spinning gland. The spinning duct is designed to allow for elongational flow (and/or mechanical drawing) of the β-sheet crystals and/or micelle spheroids (henceforth simply referred to as spheroids) converging toward an endpoint, the spinneret(s)/spigots. Elongational flow is postulated to allow for increased velocity with a simultaneous increase in shearing forces, inducing unfolding and subsequent refolding of the protein structures. This is held to promote alignment within the spheroids and result in self-assembly into fibrillar precursors for fiber formation. This hypothetical mechanism is consistent with the behavior of silk in studies using small angle X-ray scattering and atomic force microscopy, confocal scanning light microscopy and electron microscopy, as well as solid-state nuclear magnetic resonance spectroscopy.
[0019] Instead of elongational flow (and/or mechanical drawing), shearing flow is applied with wet-spinning. With shearing flow, both the velocity and shearing forces are constant. Because of this, it is likely that unfolding of the protein structures cannot occur and that alignment is hindered, resulting in aggregation of the spheroids into a solid fiber-like material. This behavior is the rationale for including a post-spin stretching step to further promote molecular alignment, rather than directly using AS fibers. Typically, the post-spinning bath is composed of a mixture of organic solvent and water, with water being the functional component that improves fiber properties. In most cases, organic solvent is added because many AS silk-based fibers cannot withstand water penetration without being dissolved. In order to address this issue, a two-step post-spinning treatment is often employed, with the first step using an organic solvent/water mixture followed by a second step in pure water.
SUMMARY
[0020] It is to be understood that the present invention includes a variety of different versions or embodiments, and this Summary is not meant to be limiting or all-inclusive. This Summary provides some general descriptions of some of the embodiments, but may also include some more specific descriptions of other embodiments.
[0021] The present invention relates to isolated recombinant spider aciniform silk proteins (SASPs) comprising or consisting of an amino acid sequence as set forth in SEQ ID NOs: 1-44, and 47-84 or any homolog thereof. The recombinant SASPs can be used to manufacture silk fibers having exceptional toughness and strength.
[0022] In a related aspect, the present invention provides a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, preferably from about 100 amino acids to about 3,000 amino acids. In some embodiments, the SASP comprises: at least one W subunit, each W subunit ranging from about 150 to 250 amino acid residues in length, and at least one non-repetitive fragment selected from: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment. In various embodiments, the N-terminal (NT) non-repetitive fragment ranges from about 100 amino acids to about 150 amino acid residues in length, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2). In various embodiments, the C-terminal (CT) non-repetitive fragment, ranging from about 100 to about 150 amino acid residues in length, may be derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (c) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2). In various embodiments, the at least one W subunit, the N-terminal (NT) non-repetitive fragment, and the C-terminal (CT) non-repetitive fragment have an amino acid sequence as shown in one or more of SEQ ID NOs: 1-84 & 106-131.
[0023] In a second aspect, the present invention provides a method for producing polymeric forms of SASPs of higher molecular weight, the method comprising: (i) providing two recombinant SASP polypeptides, one of which (SASPN) is fused at its C-terminus to a split intein N-terminal domain (IN), the split intein having an amino acid sequence as shown in any one of SEQ ID NOs: 90-105 or being a related split intein, providing the construct SASPN-IN, and the other of which (SASPc) is fused at its N-terminus to a split intein C-terminal domain (Ic), providing the construct Ic-SASPc; (ii) trans-splicing in vitro to join SASPN and SASPc to form the desired full-length SASP (SASPN-SASPC) ligated by a native peptide bond. In various embodiments, exemplary fusion proteins are shown in SEQ ID NO: 59-76. Two or more inteins could be used to ligate three or more SASPs together to form an even larger protein, with exemplary fusion proteins shown in SEQ ID NO: 70-76.
[0024] In a related aspect, the present invention provides a method for producing polymeric forms of SASPs of higher molecular weight, the method comprising: (i) providing two recombinant SASP fragments, one of which (SASPN) is fused at its C-terminus to a split intein N-terminal domain (IN), the split intein being selected from SEQ ID 89-96 or related split inteins, giving SASPN-IN, and the other of which (SASPc) is fused at its N-terminus to a split intein C-terminal domain (Ic), giving Ic-SASPc; (ii) co-expression of both fusion proteins, SASPN-IN and Ic-SASPc, in one organism and trans-splicing in vivo to join these proteins to form the desired full-length SASP (SASPN-SASPC) ligated by a native peptide bond, with exemplary fusion proteins shown in SEQ ID NO: 59-76. Two or more inteins could be used to ligate three or more proteins together to form an even larger protein, with exemplary fusion proteins shown in SEQ ID NO: 70-76.
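The domain bookkeeping of split-intein ligation described above can be sketched as follows. This is only an illustrative model under stated assumptions: proteins are represented as ordered lists of domain labels, the labels "IN" and "IC" and the function name trans_splice are hypothetical, and purification tags, intein identities and reaction chemistry are not modeled.

```python
from typing import List


def trans_splice(n_precursor: List[str], c_precursor: List[str]) -> List[str]:
    """Join SASP_N-IN and IC-SASP_C into SASP_N-SASP_C.

    Domains are listed from N-terminus to C-terminus. The two intein halves
    are excised and the flanking SASP segments are joined end to end.
    """
    if not n_precursor or n_precursor[-1] != "IN":
        raise ValueError("N-precursor must end with the intein N-terminal domain")
    if not c_precursor or c_precursor[0] != "IC":
        raise ValueError("C-precursor must start with the intein C-terminal domain")
    return n_precursor[:-1] + c_precursor[1:]


if __name__ == "__main__":
    # Illustrative precursors: four W units fused to IN, and IC fused to two W units.
    w4_in = ["W"] * 4 + ["IN"]
    ic_w2 = ["IC"] + ["W"] * 2
    product = trans_splice(w4_in, ic_w2)
    print("-".join(product))
```

Ligating three or more segments would, in this model, amount to repeated calls with distinct intein pairs, so that each junction is formed only by its own matching halves.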
[0025] In a third aspect, the present invention provides a method for producing a polymeric form of an isolated spider aciniform silk protein (SASP). In some exemplary methods, recombinant SASPs are produced by transforming a suitable host as defined herein with a vector or a nucleic acid disclosed herein, and expressing the spider silk gene under suitable conditions. The illustrative method incorporates the steps: (i) providing a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, or from about 100 amino acids to about 3,000 amino acids, and, optionally, (d) a purification tag; (ii) optionally, removing the purification tag from the recombinant SASP; (iii) passing a solution containing the recombinant SASP through an affinity chromatography column; and (iv) isolating the SASP from said affinity chromatography column.
[0026] In a fourth aspect, the present invention provides a method for producing a spider silk fiber, the method comprising: (i) providing a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acid residues, or from about 100 amino acids to about 3,000 amino acids, said SASP comprising: (a) at least one W subunit, each W subunit ranging from about 150 to 250 amino acids; (b) at least one of: an N-terminal (NT) non-repetitive fragment ranging from about 100 to about 150 amino acid residues in length derived from spider silk protein major ampullate spidroin 2 (MaSp2) and a C-terminal (CT) non-repetitive fragment ranging from about 100 to about 150 amino acid residues in length derived from at least one of: a C-terminal fragment of the spider silk protein aciniform spidroin 1 (AcSp1), a C-terminal fragment of the spider silk protein major ampullate spidroin 1 (MaSp1) or a C-terminal fragment of the spider silk protein major ampullate spidroin 2 (MaSp2), wherein the at least one W subunit and optional C-terminal (CT) non-repetitive fragment have an amino acid sequence as shown in SEQ ID NOs: 1-84 & 106-131; and (c) a purification tag; (ii) removing the purification tag from the recombinant SASP; (iii) passing a solution containing the recombinant SASP through an affinity chromatography column; (iv) isolating the SASP from said affinity chromatography column; and (v) wet-spinning the collected recombinant SASP.
[0027] In a related aspect, the wet-spinning step of the fourth aspect further comprises dissolving the recombinant SASP in at least two solvents selected from the group consisting of water, phosphoric acid, acetic acid, formic acid, hydrochloric acid, sulfuric acid, nitric acid, hexafluoroisopropanol (HFIP), hexafluoropropanol (HFP), hexafluoroacetone (HFA), trifluoroacetic acid (TFA), trifluoroethanol (TFE), and methylimidazolium chloride, and wherein the recombinant SASP is wet-spun at a rate of 0.1 to 20 mL/hr, or from about 0.3 to about 20 mL/hr, or from about 0.6 to about 20 mL/hr, or from about 0.1 to about 10 mL/hr, or from about 0.1 to about 5 mL/hr, in at least one coagulation-inducing solvent selected from the group consisting of methanol, ethanol, isopropanol, acetone, ammonium sulfate, and water.
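To give a feel for what the stated extrusion rates mean physically, the sketch below converts a volumetric rate in mL/hr into a mean linear velocity and an apparent wall shear rate inside the extrusion tubing. It assumes fully developed laminar flow of a Newtonian fluid in a circular tube, for which the wall shear rate is 4Q/(πr³); a real spinning dope is unlikely to be Newtonian, so the values are order-of-magnitude estimates only. The tubing inner diameter is a hypothetical value chosen for illustration and is not taken from this disclosure.

```python
import math


def ml_per_hr_to_m3_per_s(flow_ml_per_hr: float) -> float:
    """Convert a volumetric flow rate from mL/hr to m^3/s."""
    return flow_ml_per_hr * 1e-6 / 3600.0


def mean_velocity_m_per_s(flow_ml_per_hr: float, inner_diameter_m: float) -> float:
    """Mean linear velocity: flow rate divided by tube cross-sectional area."""
    radius = inner_diameter_m / 2.0
    return ml_per_hr_to_m3_per_s(flow_ml_per_hr) / (math.pi * radius ** 2)


def newtonian_wall_shear_rate_per_s(flow_ml_per_hr: float, inner_diameter_m: float) -> float:
    """Apparent wall shear rate 4Q / (pi r^3), valid for Newtonian laminar flow."""
    radius = inner_diameter_m / 2.0
    return 4.0 * ml_per_hr_to_m3_per_s(flow_ml_per_hr) / (math.pi * radius ** 3)


# Hypothetical tubing inner diameter (0.127 mm); an assumption for illustration.
HYPOTHETICAL_INNER_DIAMETER_M = 0.127e-3

if __name__ == "__main__":
    for rate in (0.1, 0.6, 5.0, 20.0):  # mL/hr, within the ranges stated above
        v = mean_velocity_m_per_s(rate, HYPOTHETICAL_INNER_DIAMETER_M)
        g = newtonian_wall_shear_rate_per_s(rate, HYPOTHETICAL_INNER_DIAMETER_M)
        print(f"{rate:5.1f} mL/hr: mean velocity {v * 1000:.2f} mm/s, wall shear rate {g:.0f} 1/s")
```

Because the shear rate scales with the inverse cube of the radius, the choice of tubing or needle bore affects the shear applied to the dope far more strongly than the pump rate does.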
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] FIG. 1 depicts a schematic diagram of the wet-spinning fiber production method in accordance with the present invention. The illustrated apparatus consists of a pump (for a controlled extrusion rate), a glass syringe (to store and allow for controlled delivery of the spinning dope), PEEK tubing or needle (for shear force application), a coagulation bath (to amalgamate protein in the soluble dope into an insoluble fiber form), and a collector for the AS fibers.
[0029] FIG. 2 depicts a schematic illustration of the post-spinning apparatus. The translational control knob allows for controlled motion of the stage, the metric ruler allows for a more consistent and precise measurement of fiber stretching, and the drain plug allows the dH2O bath to steadily drain.
[0030] FIG. 3A-3D depict photomicrographs of Coomassie-stained SDS-polyacrylamide gels showing examples of protein purification of the protein constructs GLGL (SEQ ID NO: 11), LGLG (SEQ ID NO: 12), GLGLG (SEQ ID NO: 13), LGLGL (SEQ ID NO: 14), H-W3 (SEQ ID NO: 37) and H-W4 (SEQ ID NO: 38) using affinity chromatography. Protein constructs GLGL, LGLG, GLGLG and LGLGL were fused with H6-SUMO and purification was carried out in three steps: initial precursor purification using affinity chromatography, SUMO cleavage and reverse purification. The resulting silk proteins are tag free. H6-W3 and H6-W4 are the only recombinant spidroins fused to an H6 tag, and purification was carried out in one step using affinity chromatography. The resulting silk proteins have an H6 tag at their N-terminus. Non: noninduced total cell lysate; T: induced total cell lysate; S: soluble fraction of cell lysate; P: insoluble fraction of cell lysate; U: unattached proteins without H6 tag; E: elution of H6-silk proteins; C: after SUMO protease cleavage and overnight dialysis; FT1-3: reverse purification through a series of flow-throughs, carried out by nickel affinity chromatography; Ni: after reverse purification, the other H6-proteins bound to Ni-NTA Sepharose. The purified target proteins are labeled with boxes.
[0031] FIG. 4A and FIG. 4B depict photographs of light microscope images of representative W3 AS fibers formed using a wet-spinning method. Images were taken at both 100X magnification (FIG. 4A) and 400X magnification (FIG. 4B).
[0032] FIG. 5A and FIG. 5B depict photographs of light microscope images of a representative PS 4x W3 fiber (an AS fiber stretched to 4 times its original length) formed by wet-spinning. Images were taken at both 100X magnification (FIG. 5A) and 400X magnification (FIG. 5B).
[0033] FIG. 6A-6C depict combined line graphs of data related to stress-strain curves for AS and PS 4x W3 fibers spun from the HFIP/H2O spinning dope (represented as dotted lines), and AS and PS 4x W3 fibers spun from the TFA/TFE/H2O spinning dope (represented as solid lines).
[0034] FIG. 7 depicts a line graph representing analysis of W3 protein secondary structure in HFIP/H2O. Far-UV CD spectra of either 8% (w/v) W3 in HFIP/H2O (represented as solid lines) or 0.8% (w/v) W3 in HFIP/H2O (represented as dotted lines).
[0035] FIG. 8 depicts a line graph representing analysis of W3 protein secondary structure in TFA/TFE/H2O. Far-UV CD spectra of either 10% (w/v) W3 in TFA/TFE/H2O (represented as solid lines) or 1% (w/v) W3 in TFA/TFE/H2O (represented as dotted lines).
[0036] FIG. 9A-9D depict line graphs of Raman spectra of W3 fibers spun from the HFIP/H2O spinning dope in perpendicular (represented as dotted lines) or parallel (represented as solid lines) alignment relative to the incident polarized scattered light. Full spectra range of AS fibers (FIG. 9A), amide I region of AS fibers (FIG. 9B), full spectra range of PS 4x fibers (FIG. 9C) and amide I region of PS 4x fibers (FIG. 9D).
[0037] FIG. 10A-10D depict line graphs of Raman spectra of W3 fibers spun from the TFA/TFE/H2O spinning dope in perpendicular (represented as dotted lines) or parallel (represented as solid lines) alignment relative to the incident polarized scattered light. Full spectra range of AS fibers (FIG. 10A), amide I region of AS fibers (FIG. 10B), full spectra range of PS 4x fibers (FIG. 10C) and amide I region of PS 4x fibers (FIG. 10D).
[0038] FIG. 11A and FIG. 11B depict photographs of birefringence of W3 fibers spun from HFIP/H2O spinning dope visualized by polarized light microscopy of AS fibers as shown in FIG. 11A, and PS 4x fibers as shown in FIG. 11B.
[0039] FIG. 12A and FIG. 12B depict photographs of birefringence of W3 fibers spun from TFA/TFE/H2O spinning dope visualized by polarized light microscopy of AS fibers as shown in FIG. 12A, and PS 4x fibers as shown in FIG. 12B.
[0040] FIG. 13A and FIG. 13B depict photomicrographs of W2Cac fibers as shown in FIG. 13A, and W3 fibers as shown in FIG. 13B.
[0041] FIG. 14 depicts a line graph representing illustrative stress-strain curves of W3 and W2Cac fibers.
[0042] FIG. 15A-15D depict bar graphs illustrating four mechanical properties of indicated fibers.
[0043] FIG. 16 depicts a line graph of Far-UV circular dichroism spectroscopy of indicated protein in 50 mM potassium phosphate, pH 7.5, at 22°C.
[0044] FIG. 17A-17C depict secondary structure and structural orientation of different fibers. In Panel A, Raman spectra of indicated fibers are presented. Amide I decomposition was based on five bands: a-helix; β-sheet; random coil, turns, etc. Panel B depicts an overlay of amide I and amide III bands of indicated fibers from XX (perpendicular) and ZZ (parallel) directions. Panel C depicts bar graphs illustrating WIxx ratios of a-helix and β-sheet bands indicated in (Panel B). The β-sheet structure of W2Cac and native aciniform silk is statistically more orientated and indicated by "*".
[0045] FIG. 18A-18C depict intein splicing of SASPs. FIG. 18A depicts examples of screening intein splicing activity of the W4 protein. FIG. 18B depicts a schematic diagram of an exemplary production scheme of the W2 protein. In the illustrated SDS-PAGE gel, splicing products were passed through a Ni-NTA column, allowing the spliced W2 to be collected in the flow-through while all other proteins were His-tagged and thus retained in the column. FIG. 18C in the illustrated SDS-PAGE gel shows production of W6 and NW4 by split-intein-mediated trans-splicing (IN and Ic refer to N- and C-terminal intein fragments, respectively). Precursor proteins W4IN and IcW2 were mixed to produce splicing product W6; NW2IN and IcW2 were mixed to produce splicing product NW4.
[0046] FIG. 19 depicts a schematic of an apparatus useful in the optimization of wet-spinning of spinning dope comprising the recombinant SASPs of the present invention.
[0047] FIG. 20 depicts a schematic illustration of an actual, fully automated wet-spinning apparatus described in FIG. 19.
[0048] FIG. 21 is a schematic illustration of a front perspective view of an exemplary fiber-spinning apparatus in accordance with the various embodiments of the present invention.
[0049] FIG. 22 depicts a schematic of a front plan view of an exemplary fiber-spinning apparatus in accordance with the various embodiments of the present invention.
[0050] FIG. 23 is a schematic illustration of a rear perspective view of an exemplary fiber-spinning apparatus in accordance with the various embodiments of the present invention.
[0051] FIG. 24A-FIG. 24B are line graphs depicting the stress-strain characteristics of recombinant SASPs in accordance with the embodiments of the present invention.
[0052] These figures are provided by way of example and are not intended to limit the scope of the invention.
DETAILED DESCRIPTION
DEFINITIONS:
[0053] For purposes of this disclosure, unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. See, e.g. Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994); Sambrook et al.,
Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press (Cold Spring Harbor, N.Y. 1989). These references are hereby incorporated into this disclosure by reference in their entireties.
[0054] Before the present compositions and methods are described, it is to be understood that
the invention is not limited to the particular processes, compositions, or methodologies described, as these may vary. Moreover, the processes, compositions, and methodologies described in particular embodiments are interchangeable. Therefore, for example, a composition, dosage regimen, route of administration, and so on described in a particular embodiment may be used in any of the methods described in other particular embodiments. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the present invention, which will be limited only by the appended claims. Unless clearly defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. Although any methods similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, the preferred methods are now described. All publications and references mentioned herein are incorporated by reference. Nothing herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.
[0055] It must be noted that, as used herein, and in the appended claims, the singular forms "a", "an", and "the" include plural reference unless the context clearly dictates otherwise.
[0056] Embodiments including the transition phrase "consisting of" or "consisting essentially of" include only the recited components and inactive ingredients.
[0057] As used herein, the term "about" means plus or minus 10% of the numerical value of the number with which it is being used. Therefore, about 50% means in the range of 45%-55%.
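The definition of "about" above can be illustrated with a short sketch. The function name and its parameter are hypothetical and are not part of the disclosure; the sketch only restates the plus-or-minus 10% rule and the worked example of about 50% in code form.

```python
def about_range(value, tolerance_percent=10):
    """Return the (low, high) interval covered by "about <value>".

    Assumes the definition above: plus or minus 10% of the numerical value.
    """
    delta = abs(value) * tolerance_percent / 100
    return (value - delta, value + delta)


# "About 50%" per the definition above.
low, high = about_range(50)
print(f"about 50 spans {low} to {high}")
```

Under this reading, a range such as "about 100 amino acids" would be evaluated the same way, by calling about_range(100).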
[0058] "Optional" or "optionally" may be taken to mean that the subsequently described structure, event or circumstance may or may not occur, and that the description includes both instances where the event occurs and instances where it does not.
[0059] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. Practitioners are particularly directed to Current Protocols in Molecular Biology (Ausubel) for definitions and terms of the art. Abbreviations for amino acid residues are the standard 3-letter and/or 1-letter codes used in the art to refer to one of the 20 common L-amino acids.
[0060] The term "amino acid" not only encompasses the 20 common amino acids in naturally synthesized proteins, but also includes any modified, unusual, or synthetic amino acid. One of ordinary skill in the art would be familiar with modified, unusual, or synthetic amino acids.
[0061] As used herein, "protein" is a polymer consisting essentially of any of the 20 amino acids. Although "polypeptide" is often used in reference to relatively large polypeptides, and "peptide" is often used in reference to small polypeptides, usage of these terms in the art overlaps and is varied. The terms "peptide(s)", "protein(s)" and "polypeptide(s)" are used interchangeably herein.
[0062] As used herein, the term "polypeptide" refers to a polymer of amino acids and does not refer to a specific length of the product. Thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not exclude post-expression modifications of the polypeptide, for example, glycosylations, acetylations, phosphorylations and the like. Included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring. As is known in the art, "proteins", "peptides," "polypeptides" and "oligopeptides" are chains of amino acids (typically L-amino acids) whose alpha carbons are linked through peptide bonds formed by a condensation reaction between the carboxyl group of the alpha carbon of one amino acid and the amino group of the alpha carbon of another amino acid. Typically, the amino acids making up a protein are numbered in order, starting at the amino terminal residue and increasing in the direction toward the carboxy terminal residue of the protein.
[0063] As used herein, a polypeptide or protein "domain" comprises a region along a polypeptide or protein that comprises an independent unit. Domains may be defined in terms of structure, sequence and/or biological activity. In one embodiment, a polypeptide domain may comprise a region of a protein that folds in a manner that is substantially independent from the rest of the protein. Domains may be identified using domain databases such as, but not limited to, PFAM, PRODOM, PROSITE, BLOCKS, PRINTS, SBASE, ISREC PROFILES, SMART, and PROCLASS.
[0064] The term "wild-type" or "native" (used interchangeably) refers to the naturally-occurring polynucleotide sequence encoding a protein, or a portion thereof, or protein sequence, or portion thereof, respectively, as it normally exists in vivo.
[0065] The term "isolated" or "purified" polypeptide as used herein refers to a polypeptide that has been separated or purified from cellular components that naturally accompany it.
Typically, the polypeptide is considered "purified" when it is at least 70% (e.g., at least 75%, 80%, 85%, 90%, 95%, or 99%) by dry weight, free from the proteins and naturally occurring molecules with which it is naturally associated.
[0066] The term "mutant" refers to any change in the genetic material of an organism, in particular a change (i.e., deletion, substitution, addition, or alteration) in a wild-type
polynucleotide sequence or any change in a wild-type protein sequence. The term "variant" is used interchangeably with "mutant". Although it is often assumed that a change in the genetic material results in a change of the function of the protein, the terms "mutant" and "variant" refer to a change in the sequence of a wild-type protein regardless of whether that change alters the function of the protein (e.g., increases, decreases, imparts a new function), or whether that change has no effect on the function of the protein (e.g., the mutation or variation is silent).
[0067] The term "nucleic acid" is well known in the art. A "nucleic acid" as used herein will generally refer to a molecule (i.e., a strand) of DNA, RNA or a derivative or analog thereof, comprising a nucleobase. A nucleobase includes, for example, a naturally occurring purine or pyrimidine base found in DNA (e.g., an adenine "A," a guanine "G," a thymine "T" or a cytosine "C") or RNA (e.g., an A, a G, a uracil "U" or a C). The term "nucleic acid" encompasses the terms "oligonucleotide" and "polynucleotide," each as a subgenus of the term "nucleic acid." The term "oligonucleotide" refers to a molecule of between 3 and about 100 nucleobases in length. The term "polynucleotide" refers to at least one molecule of greater than about 100 nucleobases in length.
[0068] These definitions refer to a single-stranded or double-stranded nucleic acid molecule. Double stranded nucleic acids are formed by fully complementary binding, although in some embodiments a double stranded nucleic acid may form by partial or substantial complementary binding. Thus, a nucleic acid may encompass a double-stranded molecule that comprises one or more complementary strand(s) or "complement(s)" of a particular sequence, typically comprising a molecule. As used herein, a single stranded nucleic acid may be denoted by the prefix "ss" and a double stranded nucleic acid by the prefix "ds".
[0069] As used herein, a "nucleotide" refers to a nucleoside further comprising a "backbone moiety". A backbone moiety generally covalently attaches a nucleotide to another molecule comprising a nucleotide, or to another nucleotide to form a nucleic acid. The "backbone moiety" in naturally occurring nucleotides typically comprises a phosphorus moiety, which is covalently
attached to a 5-carbon sugar. The attachment of the backbone moiety typically occurs at either the 3'- or 5'-position of the 5-carbon sugar. However, other types of attachments are known in the art, particularly when a nucleotide comprises derivatives or analogs of a naturally occurring 5-carbon sugar or phosphorus moiety.
[0070] The terms "polynucleotide sequence" and "nucleotide sequence" are also used interchangeably herein.
[0071] As used herein, the term "upstream" refers to a residue that is N-terminal to a second residue where the molecule is a protein, or 5' to a second residue where the molecule is a nucleic acid. Also as used herein, the term "downstream" refers to a residue that is C-terminal to a second residue where the molecule is a protein, or 3' to a second residue where the molecule is a nucleic acid. Also, the terms "portion" and "fragment" are used interchangeably to refer to parts of a polypeptide, nucleic acid, or other molecular construct.
[0072] The term "vector" refers to a nucleic acid molecule that may be used to transport a second nucleic acid molecule into a cell. In one embodiment, the vector allows for replication of DNA sequences inserted into the vector. The vector may comprise a promoter to enhance and/or maintain expression of the nucleic acid molecule in at least some host cells. Vectors may replicate autonomously (extrachromosomally) or may be integrated into a host cell chromosome. In one embodiment, the vector may comprise an expression vector capable of producing a protein or a nucleic acid derived from at least part of a nucleic acid sequence inserted into the vector.
[0073] The term "recombinant" as used herein in relation to a polynucleotide intends a polynucleotide of semisynthetic, or synthetic origin, or encoded by cDNA or genomic DNA ("gDNA") such that it is not entirely associated with all or a portion of a polynucleotide with which it is associated in nature. The term "recombinant" as used herein in relation to a protein intends a protein produced in a non-arachnid organism transformed or transfected with a polynucleotide encoding a SASP or a homolog thereof, and isolated from the non-arachnid organism. In some embodiments, the recombinant SASPs are produced in prokaryotic organisms, for example, bacteria.
[0074] As is known in the art, conditions for hybridizing nucleic acid sequences to each other can be described as ranging from low to high stringency. Generally, highly stringent
hybridization conditions refer to washing hybrids in low salt buffer at high temperatures.
Hybridization may be to filter-bound DNA using hybridization solutions standard in the art such as 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), at 65 °C, and washing in 0.25 M NaHPO4, 3.5% SDS followed by washing in 0.1 x SSC/0.1% SDS at a temperature ranging from room temperature to 68 °C depending on the length of the probe (see e.g. Ausubel, F. M. et al., Short Protocols in Molecular Biology, 4th Ed., Chapter 2, John Wiley & Sons, N.Y.). For example, a high stringency wash comprises washing in 6 x SSC/0.05% sodium pyrophosphate at 37 °C for a 14 base oligonucleotide probe, or at 48 °C for a 17 base oligonucleotide probe, or at 55 °C for a 20 base oligonucleotide probe, or at 60 °C for a 25 base oligonucleotide probe, or at 65 °C for a nucleotide probe about 250 nucleotides in length. Nucleic acid probes may be labeled with radionucleotides by end-labeling with, for example, [γ-32P]ATP, or incorporation of radiolabeled nucleotides such as [α-32P]dCTP by random primer labeling. Alternatively, probes may be labeled by incorporation of biotinylated or fluorescein labeled nucleotides, and the probe detected using streptavidin or anti-fluorescein antibodies.
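The example high stringency wash temperatures listed above can be collected into a simple lookup, sketched below. The names are hypothetical. The table holds only the five probe lengths stated in the paragraph and deliberately returns nothing for other lengths, since the text gives no rule for interpolating between them.

```python
# Wash temperatures (degrees C) for the 6 x SSC/0.05% sodium pyrophosphate
# high stringency wash, keyed by probe length in bases, as listed above.
# The 250 entry stands for a probe "about 250 nucleotides in length".
HIGH_STRINGENCY_WASH_C = {14: 37, 17: 48, 20: 55, 25: 60, 250: 65}


def wash_temperature_c(probe_length):
    """Return the listed wash temperature, or None if the length is not listed."""
    return HIGH_STRINGENCY_WASH_C.get(probe_length)


for length in sorted(HIGH_STRINGENCY_WASH_C):
    print(f"{length}-base probe: wash at {wash_temperature_c(length)} C")
```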
[0075] The terms "identity" or "percent identical" refer to sequence identity between two amino acid sequences or between two nucleic acid sequences. Percent identity can be determined by aligning two sequences and refers to the number of identical residues (i.e., amino acid or nucleotide) at positions shared by the compared sequences. Sequence alignment and comparison may be conducted using the algorithms standard in the art (e.g. Smith and Waterman, 1981, Adv.
Appl. Math. 2:482; Needleman and Wunsch, 1970, J. Mol. Biol. 48:443; Pearson and Lipman,
1988, Proc. Natl. Acad. Sci., USA, 85:2444) or by computerized versions of these algorithms
(Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science
Drive, Madison, Wis.) publicly available as BLAST and FASTA. Also, ENTREZ, available through the National Institutes of Health, Bethesda MD, may be used for sequence comparison.
In one embodiment, the percent identity of two sequences may be determined using GCG with a gap weight of 1, such that each amino acid gap is weighted as if it were a single amino acid mismatch between the two sequences. For example, the term at least 90% identical thereto includes sequences that range from 90 to 100% identity to the indicated sequences and includes all ranges in between. Thus, the term at least 90% identical thereto includes sequences that are
91, 91.5, 92, 92.5, 93, 93.5, 94, 94.5, 95, 95.5, 96, 96.5, 97, 97.5, 98, 98.5, 99, 99.5 percent identical to the indicated sequence. Similarly, the term "at least 70% identical" includes sequences that range from 70 to 100% identical, with all ranges in between. Percent identity is determined using the algorithms described here.
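As a minimal sketch of the percent identity calculation described in this paragraph, the following assumes the two sequences have already been aligned by one of the cited algorithms and are supplied as equal-length strings with "-" marking gaps. Each gap position is counted as a single mismatch, following the gap weight of 1 mentioned above. The function names, the gap character, and the choice of denominator (all aligned columns that contain at least one residue) are assumptions made for illustration and do not reproduce the GCG, BLAST or FASTA implementations.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    A column with a gap in one sequence counts as one mismatch.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # no residue in either sequence at this column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def is_at_least(aligned_a, aligned_b, threshold_percent):
    """True if the pair meets an "at least N% identical" threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Toy sequences, not taken from the sequence listing.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(is_at_least("ACDEFGHIKL", "ACDEFGHIKV", 90))
```

With this convention, a single substitution in a ten-residue alignment still satisfies "at least 90% identical", while a second mismatch or gap would not.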
[0076] As used herein, "homology" refers to the degree of similarity between two proteins and/or nucleic acid sequences. Homologous proteins are those that are similar in sequence and function. Typically, the sequence identity between two homologous sequences will be at least 50%. Also, homologous proteins will have conservative substitutions for non-identical sequences. In alternate embodiments, the sequence identity between two homologous sequences will be at least 60%; or at least 75%; or at least 80%; or at least 90%, or at least 95%, or at least 96%, or at least 97%, or at least 98%, or at least 99%. Also, as used herein, the term
"homologue" means a polypeptide having a degree of homology with the wild-type amino acid sequence. Homology comparisons can be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs can calculate percent homology between two or more sequences (e.g. Wilbur, W. J. and Lipman, D. J., 1983, Proc. Natl. Acad. Sci. USA, 80:726-730).
[0077] As used herein, the term "spider" refers to air-breathing chelicerate arthropods that have two body segments, eight legs, no chewing parts and that make silk. Spiders of the present invention are from the Arachnida class, the Araneae order, and include for example Nephilidae (especially Nephila species, like clavipes), and Araneidae such as Araneus and Argiopes, among many other suitable species. In some embodiments, spider species from which the recombinant SASP are derived may include: Argiope trifasciata, Argiope amoena, Euprosthenops australis, and Araneus diadematus.
[0078] As used herein, "silk" includes proteins and peptides produced by arthropods, typically by spiders, or by Lepidoptera, that display properties typical of native silk peptides. Lepidopteran silk generally is made up of a heavy fibroin polypeptide and a light chain fibroin peptide that are joined by a disulfide bond. In spiders there are two or more peptides not joined by a disulfide bond. Thus, silk includes proteinaceous filaments produced by insects or spiders, typically (but not necessarily) of two or more polypeptides. These may be chemically linked, and are typically very long polypeptides.
[0079] The term "fiber" as used herein relates to polymers having a thickness of at least 0.1 μm, preferably macroscopic polymers that are visible to the human eye, i.e. having a thickness of at least 1 μm, and have a considerable extension in length compared to its thickness, preferably above 5 mm. The term "fiber" does not encompass unstructured aggregates or precipitates.
[0080] A native spider silk polypeptide is one of the proteins or polypeptides, or fragments thereof, produced by spider silk glands. As used herein, a native spider silk polypeptide is a polypeptide having at least 99% identity, or in some cases 100% identity, to a native spider silk heavy and/or light fibroin polypeptide. Spider silk is a protein based fiber. In some
embodiments, a native spider silk protein is an AcSp1 spider fibroin polypeptide. It is known for its high strength and elasticity. Each species of spider produces several kinds of silk, and the silks vary in sequence between the species. Each of these types of silk is encompassed by the present invention. Some of the varieties of silk produced by spiders for which either the natural peptides, or recombinant variants are encompassed by the methods, compositions and systems of the invention are: aciniform silk, a tough and elastic silk that is used to wrap captured prey.
[0081] As used herein, a "variant of a spider silk" or a "variant of a spider silk polypeptide" comprises or consists of a synthetic non-naturally occurring polypeptide having amino acid domains, such as beta-sheets and alpha helices that are derived from, or homologous to, those domains as found in spider silk proteins. In certain embodiments, a spider silk analog
polypeptide is comprised of peptide domains that are at least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, or 98% identical to native spider silk. For example, the SASPs may comprise, or consist of, a sequence made up of a plurality of alternating spider silk beta-sheet sequences and alpha helices as described herein. In certain embodiments, the spider silk polypeptide may comprise from 4 to 1000, or 4 to 800, or 4 to 500, or 5 to 200, or 5 to 100, or 5 to 50, or 6 to 40, or 6 to 30, or 6 to 15 or 6 to 12, or about 9 beta-sheet domains. The beta sheet regions may comprise a plurality of consecutive alanine residues, or a plurality of other amino acids that can form hydrogen bonds and that are typically arranged in consecutive order in beta sheet regions, and may range from about 3 to 50, or 4 to 40, or 4 to 30, or 4 to 15, or 4 to 12, or 6 to 10, or about 9 consecutive hydrogen bonding amino acids (e.g., Ala-Ala-Ala-Ala-Ala-Ala-Ala-Ala-Ala). In certain embodiments, the SASPs may comprise from 4 to 1000, or 4 to 800, or 4 to 500, or 5 to 200, or 5 to 100, or 5 to 50, or 6 to 40, or 6 to 30, or 6 to 15 or 6 to 12, or about 9 or 10 alpha helix domains. The alpha helix domains may comprise a plurality of glycine residues interspersed with other amino acids (e.g., Q, Y, L, S, R, A or P) typically found in alpha helix domains, and may range from about 4 to 200, or 5 to 100, 5 to 50, or 6 to 45, or 12 to 40, or 12 to 45 amino acids in length.
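By way of illustration only, runs of consecutive alanine residues of the kind described above can be located in a one-letter amino acid sequence with a short script. The function name, the default length bounds (taken from the broadest "3 to 50" range recited above) and the toy sequence are assumptions; the sketch looks only at alanine and does not attempt to detect other hydrogen bonding residues or to predict secondary structure.

```python
import re


def find_poly_ala_runs(sequence, min_len=3, max_len=50):
    """Return (start_index, run_length) for each run of consecutive alanines.

    Indices are 0-based. Runs shorter than min_len or longer than max_len
    are skipped.
    """
    runs = []
    for match in re.finditer(r"A+", sequence.upper()):
        length = match.end() - match.start()
        if min_len <= length <= max_len:
            runs.append((match.start(), length))
    return runs


# Toy sequence with one nine-alanine run, not taken from the sequence listing.
toy = "GG" + "A" * 9 + "GG"
print(find_poly_ala_runs(toy))
```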
[0082] In certain embodiments, the spider silk peptide domains are derived from, i.e., are at
least 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, or 98% identical to spider silk fibroin sequences. Also, a SASP may comprise a single polypeptide having a mixture of different spider silk polypeptide domains, or analogs thereof, either from the same or different species.
[0083] SASPs may be generated using molecular techniques. For example, PCR mutagenesis of DNA encoding the spider silk peptide analogs can be used. Or RNA based mutagenesis techniques may be used. An example of a PCR technique for making mutations in DNA is described in WO 92/22653. Another method for making analogs, muteins, and derivatives, is cassette mutagenesis based on the technique described by Wells, Gene, (1985) 34:315. Or, chemical modification of the peptides may be performed.
[0084] Thus, the SASPs may contain amino acid substitutions, deletions, or insertions. The amino acid substitutions can be conservative amino acid substitutions or substitutions to eliminate non-essential amino acid residues such as to alter a glycosylation site, a
phosphorylation site, an acetylation site, or to minimize misfolding by substitution or deletion of one or more cysteine residues that are not necessary for function.
[0085] As used herein, the term "conserved residues" refers to amino acids that are the same among a plurality of proteins having the same structure and/or function. A region of conserved residues may be important for protein structure or function. Thus, contiguous conserved residues as identified in a three-dimensional protein may be important for protein structure or function. To find conserved residues, or conserved regions of 3-D structure, a comparison of sequences for the same or similar proteins from different species, or of individuals of the same species, may be made.
[0086] Conservative amino acid replacements are those that take place within a family of amino acids that are related in their side chains. Genetically encoded amino acids can be divided into four families: (1) acidic=aspartate, glutamate; (2) basic=lysine, arginine, histidine; (3) nonpolar=alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar=glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified jointly as aromatic amino acids. In similar fashion, the amino acid repertoire can be grouped as (1) acidic=aspartate, glutamate; (2) basic=lysine, arginine, histidine; (3) aliphatic=glycine, alanine, valine, leucine, isoleucine, serine, threonine, with serine and threonine optionally grouped separately as aliphatic-hydroxyl; (4) aromatic=phenylalanine, tyrosine, tryptophan; (5) amide=asparagine, glutamine; and (6) sulfur-containing=cysteine and methionine (see, for example, Biochemistry, 2nd ed., Ed. by L. Stryer, WH Freeman and Co., 1981). Conservative amino acid substitutions are generally those that preserve the general charge, hydrophobicity/hydrophilicity and/or steric bulk of the amino acid substituted, for example, substitutions between the members of the following groups are conservative substitutions: Gly/Ala, Val/Ile/Leu, Asp/Glu, Lys/Arg, Asn/Gln, Ser/Cys/Thr and Phe/Trp/Tyr.
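The seven exchange groups listed at the end of this paragraph can be expressed as a simple membership test, sketched below using one-letter codes. The names are hypothetical, and the check covers only the Gly/Ala, Val/Ile/Leu, Asp/Glu, Lys/Arg, Asn/Gln, Ser/Cys/Thr and Phe/Trp/Tyr groups; the broader four-family and six-family groupings described above would classify some pairs differently.

```python
# One-letter codes for the exchange groups listed above.
CONSERVATIVE_GROUPS = [
    set("GA"),   # Gly/Ala
    set("VIL"),  # Val/Ile/Leu
    set("DE"),   # Asp/Glu
    set("KR"),   # Lys/Arg
    set("NQ"),   # Asn/Gln
    set("SCT"),  # Ser/Cys/Thr
    set("FWY"),  # Phe/Trp/Tyr
]


def is_conservative(original, replacement):
    """True if replacing `original` with `replacement` stays within one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


print(is_conservative("G", "A"))  # Gly to Ala, same group
print(is_conservative("D", "K"))  # Asp to Lys, different groups
```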
[0087] Whether a change in the amino acid sequence of a peptide results in a structural SASP of the present invention with the mechanical attributes can be readily determined by assessing the ability of the recombinant SASP to produce a fiber with the desired mechanical properties in a fashion similar to the wild-type spider aciniform silk protein. Polypeptides in which more than one amino acid replacement has taken place can readily be spun into a fiber and tested in the same manner.
[0088] "Percent (%) amino acid sequence identity" with respect to a peptide or polypeptide sequence is defined as the percentage of amino acid residues in a candidate sequence that are identical with the amino acid residues in the specific peptide or polypeptide sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. For purposes herein, % amino acid sequence identity values are generated using the sequence comparison computer program ALIGN-2, as described in U.S. Pat. No. 6,828,146.
A. ISOLATED SPIDER ACINIFORM SILK PROTEINS (SASPs)
[0089] In various embodiments of the present invention, the inventors have discovered a method to express, purify, and isolate modified recombinant SASPs. In some embodiments, the recombinant SASPs of the present invention are produced in prokaryotic organisms, such as bacteria.
[0090] In some embodiments, the present invention provides a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, or from about 100 amino acids to about 3,000 amino acids. In some embodiments, the SASP comprises: at least one W subunit, each W subunit ranging from about 150 to 250 amino acid residues in length, and at least one of: (a) a non-repetitive N-terminal fragment and (b) a non-repetitive C-terminal fragment. In various embodiments, the N-terminal (NT) non-repetitive fragment ranges from about 100 amino acids to about 150 amino acid residues in length, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2). In various embodiments, the C-terminal (CT) non-repetitive fragment ranges from about 100 to about 150 amino acid residues in length and may be derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (c) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2). In various embodiments, the at least one W subunit, the N-terminal (NT) non-repetitive fragment, and the C-terminal (CT) non-repetitive fragment have an amino acid sequence as shown in one or more of SEQ ID NOs: 1-131, or one or more of SEQ ID NOs: 1-76, and 106-131, for example, one or more of SEQ ID NOs: 1-84, and 106-131, or for example one or more of SEQ ID NOs: 1-44.
[0091] In these various examples, the at least one W repeat unit, the non-repetitive N-terminal (NT) fragment and the non-repetitive C-terminal (CT) fragment have an amino acid sequence as shown in any one or more of SEQ ID NOs: 1-123. In some preferred exemplary SASPs, the isolated recombinant SASP or a homolog of any thereof, is derived from the arachnid genus and species Argiope trifasciata or a combination of both Argiope trifasciata and
Euprosthenops australis or Argiope trifasciata and Argiope amoena. As used herein "W unit", "W subunit" and "W repeat unit" are used interchangeably.
[0092] In various embodiments, the recombinant SASPs comprise an amino acid sequence as set forth in any one or more of SEQ ID NOs: 1-44 & 47-84, for example, SEQ ID NOs: 1-44, or any homolog thereof. In some embodiments, the isolated SASPs of the present invention comprise or consist of one or more polypeptides having an amino acid sequence as set forth in SEQ ID NOs: 1-84 & 106-131. In various embodiments, the recombinant SASPs of the present invention also contemplate homolog proteins having an amino acid sequence as set forth in SEQ ID NO: 1-131, for example, SEQ ID NOs: 1-44 & 47-84, wherein one to ten, for example, one to
seven, or one to five amino acids are substituted with a conservative amino acid substitution. In other embodiments, recombinant SASPs of the present invention also contemplate allelic variants of the recombinant SASPs expressed in various non-arachnid organisms.
[0093] In some preferred embodiments, the isolated recombinant SASP comprises or consists of one or more polypeptides having an amino acid sequence as set forth in any one or more of SEQ ID NOs: 1-84 & 106-131, for example, SEQ ID NOs: 1-84, or for example, SEQ ID NOs: 2, 3, 5, 6, 37, 38, 45-58 and 59-84. In some embodiments, an exemplary SASP of the present invention, comprises or consists of one or more polypeptides having an amino acid sequence as set forth in any one or more of SEQ ID NOs: 2, 3, 5, 6, 37, 38, 45-58 and 59-84. In each case recited herein, where the exemplary SASP comprises two or more polypeptides having an amino acid sequence set forth in two or more of SEQ ID NOs: 1-131, the polypeptides are fused together to form one contiguous SASP.
[0094] In other exemplary embodiments, an isolated recombinant spider aciniform silk protein (SASP) or a homolog of any thereof, comprises or consists of one or more peptides having an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-84, or one or more of SEQ ID NOs: 1-44 & 47-84, or one or more peptides having an amino acid sequence of SEQ ID NOs: 15-43, or SEQ ID NO: 2, 3, 5, 6, 37, 38, 47-84 or a homolog of any thereof, each of which is optionally fused to a purification tag molecule. As used herein, the term "purification tag molecule" intends to mean a tag such as glutathione-S-transferase (GST), c-Myc, biotin, streptavidin, Small Ubiquitin-like Modifier (SUMO), hexa-histidine (H6), H6-SUMO, Maltose Binding Protein (MBP), Thioredoxin (Trx), FLAG tag, streptavidin-binding peptide (SBP), calmodulin-binding peptide (CBP), S-tag, and hemagglutinin (HA). In various embodiments, the purification tag molecule can be operably fused in frame to either the N-terminus, the C-terminus or both termini of the SASP.
[0095] In some embodiments, exemplary recombinant SASPs comprising or consisting of an amino acid sequence of any one or more of SEQ ID NOs: 1-131, for example, SEQ ID NOs: 1-84, or for example, SEQ ID NOs: 1-44 & 47-84, or for example, SEQ ID NOs: 2, 3, 5, 6, 37, 38, 45-84, are provided in Table 1 below.
[0096] The W repeat unit protein refers to a protein having an amino acid sequence as shown in SEQ ID NOs: 45 or 46. In some embodiments, W is a wild-type repeat unit comprising 199 residues (SEQ ID NO: 46) if it is a one-repeat-unit protein (W1) or if it is the first repeat unit in a protein that comprises two to eight W units (e.g. W2-W8). The W repeat unit may also contain 200 amino acid residues (SEQ ID NO: 45) if it is not the first repeat unit. W# refers to a number of W units, as indicated by the subscript #; for example, W2 means a protein comprising two W units, for example, 199 aa + 200 aa = 399 aa in length. The W unit may be the wild-type W1 as shown in SEQ ID NOs: 45 or 46, or it may have an amino acid sequence as shown in SEQ ID NOs: 45 or 46 with one or more amino acid mutations in either of these W units. Illustrative SASPs which may be recombinantly produced, and/or isolated, may have one or more amino acid mutations in the native sequence of W (SEQ ID NOs: 45 or 46) and are provided in Table 1.
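The length arithmetic in paragraph [0096] can be sketched as a short calculation. The following is a minimal illustration only and not part of the disclosed methods; the function name `w_repeat_length` and the constant names are hypothetical, and the sketch assumes, as stated above, that the first W unit has 199 residues and every later W unit has 200.

```python
# Hypothetical helper illustrating the W# length arithmetic of paragraph [0096].
W_FIRST_LEN = 199  # first (or only) W unit, per SEQ ID NO: 46
W_LATER_LEN = 200  # each subsequent W unit, per SEQ ID NO: 45


def w_repeat_length(n_units: int) -> int:
    """Return the length in residues of a W# protein with n_units W units."""
    if n_units < 1:
        raise ValueError("a W# protein needs at least one W unit")
    return W_FIRST_LEN + W_LATER_LEN * (n_units - 1)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"W{n}: {w_repeat_length(n)} aa")
```

The count covers the W units alone; any N-terminal fragment, C-terminal fragment or purification tag fused to the construct adds to it.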
[0097] In some embodiments, W units also include mutated W units having polypeptide amino acid sequences as provided in SEQ ID NOs: 108-123. In addition, exemplary W units also include mutated W units having polypeptide amino acid sequences as provided in SEQ ID NOs: 108-123 wherein each polypeptide has an additional serine (S) positioned as the first N-terminal amino acid.
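Mutation labels used in Table 1, such as F7A or T10S, can be read as original residue, 1-based position within a W unit, and replacement residue. The sketch below is a hypothetical helper (`apply_point_mutations` is not a name from this disclosure) that applies such labels to a single W unit and rejects any label whose original residue does not match the sequence. For brevity it is shown on the first ten residues of SEQ ID NO: 46 rather than on the full unit.

```python
import re

# Label format assumed here: <original residue><1-based position><new residue>,
# e.g. "F7A". This mirrors the labels in Table 1; the helper itself is illustrative.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutations(unit: str, labels: list[str]) -> str:
    """Apply point-mutation labels to one W unit and return the mutated sequence."""
    residues = list(unit)
    for label in labels:
        match = _MUTATION.match(label)
        if match is None:
            raise ValueError(f"unrecognised mutation label: {label!r}")
        original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        index = position - 1  # labels are 1-based, Python strings are 0-based
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position {position} is outside the sequence")
        if residues[index] != original:
            raise ValueError(f"{label}: found {residues[index]} at position {position}")
        residues[index] = replacement
    return "".join(residues)


if __name__ == "__main__":
    w_start = "AGPQGGFGAT"  # first ten residues of SEQ ID NO: 46
    print(apply_point_mutations(w_start, ["F7A"]))
    print(apply_point_mutations(w_start, ["T10S"]))
```

For a multi-unit protein described as mutated "on both W units", the same labels would be applied to each unit separately before the units are joined, since the positions are counted within a unit.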
[0098] Each SASP contains at least one W unit (either wild-type or a wild-type amino acid sequence having one or more amino acid mutations) and at least one of: (a) a non-repetitive N-terminal fragment and (b) a non-repetitive C-terminal fragment, wherein the N-terminal (NT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues and is derived from spider silk protein major ampullate spidroin 2 (MaSp2), and wherein the C-terminal (CT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues and may be derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (c) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2).
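The composition rule of paragraph [0098] and the construct labels used in Table 1 (for example W2Cac or NW4Cma2) can be modelled in a few lines. This is a sketch under stated assumptions: the class `SaspDesign` and its method names are hypothetical, and the label layout (optional N, then W with the unit count, then the C-terminal fragment name) is inferred from the Table 1 labels rather than defined in the text.

```python
from dataclasses import dataclass
from typing import Optional

# C-terminal fragment names taken from paragraph [0098].
C_TERMINAL_FRAGMENTS = {"Cac", "Cma1", "Cma2"}


@dataclass(frozen=True)
class SaspDesign:
    """Hypothetical description of a construct: W units plus optional NT/CT fragments."""
    w_units: int
    has_nt: bool = False
    ct: Optional[str] = None

    def __post_init__(self) -> None:
        if self.ct is not None and self.ct not in C_TERMINAL_FRAGMENTS:
            raise ValueError(f"unknown C-terminal fragment: {self.ct!r}")

    def meets_composition_rule(self) -> bool:
        """At least one W unit and at least one of the NT or CT fragments."""
        return self.w_units >= 1 and (self.has_nt or self.ct is not None)

    def label(self) -> str:
        """Build a Table 1 style label such as 'NW4Cma2' (layout is an assumption)."""
        return ("N" if self.has_nt else "") + f"W{self.w_units}" + (self.ct or "")


if __name__ == "__main__":
    design = SaspDesign(w_units=4, has_nt=True, ct="Cma2")
    print(design.label(), design.meets_composition_rule())
```

The sketch checks only which parts are present; it does not check the fragment lengths of about 100 to about 150 residues, because no numeric tolerance for "about" is given.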
Table 1. Native and Recombinant SASPs amino acid sequences and nucleic acids.
SEQ ID NO    Amino Acid or Nucleotide Sequence
1 SEQ#1: W1Cac
AGPQGGFGATGGASAGLI SRVANALANTSTLRTVLRTGVSQQIAS SWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I S S S S S FLSTS S S SASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSQTSAFSASGA GQSAGVSVI S SLNS PVGLRSASAASRLSQLTS S ITNAVGANGVDANSLARSLQS S FSALRS SGMS S SDAK I EVLLETIVGLLQLLSNTQVRG PATAS SVANSAARS FELVLA
2 SEQ#2: W2Cac
AGPQGGFGATGGASAGLI SRVANALANTSTLRTVLRTGVSQQIAS SWQRAAQSLASTLGVDGNNLARFA
VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGQTSAFSASGAGQSAGVSVISSL NSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSFSALRSSGMSSSDAKIEVLLETIVGLL QLLSNTQVRG PATASSVANSAARSFELVLA
3 SEQ#3: W2Cma1
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGSASASAAASAASTVANSVSRLS SPSAVSRVSSAVSSLVSNGQVNMAALPNIISNISSSVSASAPGASGCEVIVQALLEVITALVQIVSSSSV GYI N P S AVNQ I TNWANAMAQVMG
4 SEQ#4: W1Cma2
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGQRGPRSQGPGSG GQQGPGGQGPYGPSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSGP TNPASLSNAISSWSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSLT QALG
5 SEQ#5: W2Cma2
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGQRGPRSQGPGSGGQQGPGGQGP YGPSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSGPTNPASLSNAI SSWSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSLTQALG
6 SEQ#6: W4Cma2
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISR VANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDT SAYAQAF SSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQAS ASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTST LRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVL NASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYT GPSGPSTGPSGYPGPLGGGAPFGQSGFGQRGPRSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGPGYGPGA GQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSGPTNPASLSNAISSWSQVSASNPGLSGCDVL VQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSLTQALG
7 SEQ#7: GL
GASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGS
DTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSS SSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATG
8 SEQ#8: LG
SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST
9 SEQ#9: GLG
GASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGS DTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSS SSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRV ANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFS SALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST
10 SEQ#10: LGL
SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATG
11 SEQ#11: GLGL
GASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGS DTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSS SSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRV ANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFS SALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASA SSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATG
12 SEQ#12: LGLG
SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST
13 SEQ#13: GLGLG
GASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGS DTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSS SSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRV ANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFS SALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASA SSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTL RTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLN ASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST
14 SEQ#14: LGLGL
SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY
TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATG
15 SEQ#15: W2 with a Cys mutation at the N- and C-terminus
CAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSD ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGC
16 SEQ#16: W2 with mutations at S29C and S143C in both W units
AGPQGGFGATGGASAGLISRVANALANTCTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSCSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTCTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSCSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
17 SEQ#17: W2 with mutations at S29C and S143C in the first W unit
AGPQGGFGATGGASAGLISRVANALANTCTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSCSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
18 SEQ#18: W2 with mutations at S29C and S143C in the second W unit
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTCTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSCSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
19 SEQ#19: W2 with mutations at F7A, F69A and F146A on both W units
AGPQGGAGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARAA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSALSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGAGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARAAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSALSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
20 SEQ#20: W2 with mutations at L17A and V21A on both W units
AGPQGGFGATGGASAGAISRAANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGAISRAANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
21 SEQ#21: W2 with mutations at L17T and V21T on both W units
AGPQGGFGATGGASAGTISRTANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT
GGASAGTISRTANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
22 SEQ#22: W2 with mutations at T10S, I18S and V21S on both W units
AGPQGGFGASGGASAGLSSRSANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAS GGASAGLSSRSANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
23 SEQ#23: W2 with a V47P mutation on both W units
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSPVQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSPVQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
24 SEQ#24: W2 with a V61P mutation on both W units
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGPDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGPDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
25 SEQ#25: W2 with mutations at V61S, V71S and V74S on both W units
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGSDGNNLARFA SQASSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGSDGNNLARFASQASSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
26 SEQ#26: W2 with a F90W mutation on both W units
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAWSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAWSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
27 SEQ#27: W2 with a F95W mutation on both W units
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALWNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALWNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
28 SEQ#28: W2 with a R36A mutation on both W units
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLATGVSQQIASSWQRAAQSLASTLGVDGNNLARFA
VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLATGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
29 SEQ#29: W2 with a R36W mutation on both W units
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLWTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLWTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
30 SEQ#30: W2 with a F146W mutation on both W units
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSWLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSWLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
31 SEQ#31: W2 with a Y169W mutation on both W units
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGWTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGWTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
32 SEQ#32: W3 with a F95W mutation on the last W unit
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISR VANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAF SSALWNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQAS ASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
33 SEQ#33: W3 with a Y169W mutation on the last W unit
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISR VANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAF SSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQAS ASSTSGAGWTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
34 SEQ#34: W3 with a F95W mutation on the middle W unit
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA
VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALWNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISR VANALANTSTLRTVLRTGVSQQIASSVVQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAF SSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQAS ASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
35 SEQ#35: W3 with a Y169W mutation on the middle W unit
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGAT GGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAG SDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTS SSSASYSQASASSTSGAGWTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISR VANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAF SSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQAS ASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
36 SEQ#36: AT tag-W2Cac
KYYKAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNL ARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSV QSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGG FGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSR LPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSF LSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGQTSAFSASGAGQSAGVSV ISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSFSALRSSGMSSSDAKIEVLLETI VGLLQLLSNTQVRGVNPATASSVANSAARSFELVLA
37 SEQ#37: H6-W3
HHHHHHAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGN NLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSG SVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQ GGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAV SRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSS SFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGAS AGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTS AYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSA SYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
38 SEQ#38: H6-W4
HHHHHHAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGN NLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSG SVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQ GGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAV SRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSS SFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGAS AGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTS AYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSA SYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANA LANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSAL FNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASST SGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
39 SEQ#39: W1Cac with one mutation in the Cac (R37G)
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSQTSAFSASGA GQSAGVSVISSLNSPVGLRSASAASGLSQLTSSITNAVGANGVDANSLARSLQSSFSALRSSGMSSSDAK IEVLLETIVGLLQLLSNTQVRG PATASSVANSAARSFELVLA
40 SEQ#40: W1Cac with one mutation in the Cac F67W
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSQTSAFSASGA GQSAGVSVISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSWSALRSSGMSSSDAK IEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSFELVLA
41 SEQ#41: W1Cac with one mutation in the Cac S78C
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSQTSAFSASGA GQSAGVSVISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSFSALRSSGMSSCDAK IEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSFELVLA
42 SEQ#42: W1Cac with one mutation in the Cac S78W
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSQTSAFSASGA GQSAGVSVISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSFSALRSSGMSSWDAK IEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSFELVLA
43 SEQ#43: W1Cac with one mutation in the Cac F120W
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D I SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSQTSAFSASGA GQSAGVSVISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSFSALRSSGMSSSDAK IEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSWELVLA
44 SEQ#44: C8Cma2
AGSSAAAAAAAASGPGGYGPENQGPSGPGGYGPGGPGSSAAAAAAAASGPGGYGPENQGPSGPGGYGPGG PGSSAAAAAAAASGPGGYGPENQGPSGPGGYGPGGPGSSAAAAAAAASGPGGYGPENQGPSGPGGYGPGG PGSSAAAAAAAASGPGGYGPENQGPSGPGGYGPGGPGSSAAAAAAAASGPGGYGPENQGPSGPGGYGPGG PGSSAAAAAAAASGPGGYGPENQGPSGPGGYGPGGPGSSAAAAAAAASGPGGYGPENQGPSGPGGYGPGG PGQRGPRSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRLSSPQASS RVSSAVSTLVSSGPTNPASLSNAISSWSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSSIGQINYA ASSQYAQLVGQSLTQALG
45 SEQ#45: W repeat unit (200 aa, wild-type W starting with S)
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSD ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG or
46 SEQ#46: W repeat unit (199 aa, wild-type W starting with A)
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
47 SEQ#47: NW2
AGEQGGLSLEAKTNAIASALSAAFLETTGY QQF EIKTLIFMIAQASSNEISGSAAAAGGSSGGGGG SGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQQ GPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQ SLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSA AQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFG QSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDG NNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDS GSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
48 SEQ#48: NW4
SAGEQGGLSLEAKTNAIASALSAAFLETTGYVNQQFVNEIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA QSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSS AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
49 SEQ#49: NW4Cma2
SAGEQGGLSLEAKTNAIASALSAAFLETTGYVNQQFVNEIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA QSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSS AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGQRGPRSQGPGSGGQQGPGGQGPYG PSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSGPTNPASLSNAISS WSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSLTQALG
50 SEQ#50: W6
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSD ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA
GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN G VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGG
51 SEQ#51: NW6
SAGEQGGLSLEAKTNAIASALSAAFLETTGY QQF EIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVA NALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSS ALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLR TVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNA SNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGP SGPSTGPSGYPGPLGGGAPFGQSGFGG
52 SEQ#52: W6Cma2
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN G VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGQRGPRSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRL SSPQASSRVSSAVSTLVSSGPTNPASLSNAISSWSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSS IGQINYAASSQYAQLVGQSLTQALG
54 SEQ#54: W8
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN G VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTL GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I NVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
SEQ ID NO    Amino Acid or Nucleotide Sequence
55 SEQ#55: NW8
SAGEQGGLSLEAKTNAIASALSAAFLETTGY QQF EIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVA NALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSS ALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLR TVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNA SNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGP SGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQ IASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRV LSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGY PGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGG
56 SEQ#56: W8Cma2
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN G VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTL GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I NVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGQRGPRSQGPGS GGQQGPGGQGPYGPSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSG PTNPASLSNAISSWSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSL
TQALG
57 SEQ#57: NW8Cma2
SAGEQGGLSLEAKTNAIASALSAAFLETTGY QQF EIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA QSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSS AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVA NALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSS ALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLR TVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNA SNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGP SGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQ IASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRV LSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGY PGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA QSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSS AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGQRGPRSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRLSSP QASSRVSSAVSTLVSSGPTNPASLSNAISSWSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSSIGQ INYAASSQYAQLVGQSLTQALG
58 SEQ#58: H6-W3Cma2
HHHHHHSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDG NNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDS GSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGP QGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQA VSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSS SSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGA SAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDT SAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSS ASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGQRGPRSQGPGSGGQQGPGGQGPYG PSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSGPTNPASLSNAISS WSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSLTQALG
Silk-intein fusion proteins
59 SEQ#59: W1RBnH (W15NH6)
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSD ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGCLAGDTLITL ADGRRVPIRELVSQQNFSVWALNPQTYRLERARVSRAFCTGIKPVYRLTTRLGRSIRATANHRFLTPQGW KRVDELQPGDYLALPRRIPRVLASLEHHHHHH
60 SEQ#60: W1SGnH (W14NH6)
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGCFSGDTLVAL TDGRSVSFEQLVEEEKQGKQNFCYTIRHDGSIGVEKIINARKTKTNAKVIKVTLDNGESIICTPDHKFML RDGSYKCAMDLTLDDSLMPLHRKISTTEDSGLEHHHHHH
61 SEQ#61: W2RBnH (W25NH6)
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGCLAGDTLITLADGRRVPIRE LVSQQNFSVWALNPQTYRLERARVSRAFCTGIKPVYRLTTRLGRSIRATANHRFLTPQGWKRVDELQPGD YLALPRRIPRVLASLEHHHHHH
62 SEQ#62: W2SGnH (W24NH6)
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGCFSGDTLVALTDGRSVSFEQ LVEEEKQGKQNFCYTIRHDGSIGVEKIINARKTKTNAKVIKVTLDNGESIICTPDHKFMLRDGSYKCAMD LTLDDSLMPLHRKISTTEDSGLEHHHHHH
63 SEQ#63: NW2SGnH (NW24NH6)
SAGEQGGLSLEAKTNAIASALSAAFLETTGYVNQQFVNEIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGCFS GDTLVALTDGRSVSFEQLVEEEKQGKQNFCYTIRHDGSIGVEKIINARKTKTNAKVIKVTLDNGESIICT PDHKFMLRDGSYKCAMDLTLDDSLMPLHRKISTTEDSGLEHHHHHH
64 SEQ#64: NW2RBnH (NW25NH6)
SAGEQGGLSLEAKTNAIASALSAAFLETTGYVNQQFVNEIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGCLA GDTLITLADGRRVPIRELVSQQNFSVWALNPQTYRLERARVSRAFCTGIKPVYRLTTRLGRSIRATANHR FLTPQGWKRVDELQPGDYLALPRRIPRVLASLEHHHHHH
65 SEQ#65: W4RBnH (W45NH6)
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGCLAGDTLITLADGRRVPIRELVSQQNFSVWALNPQTYRLE RARVSRAFCTGIKPVYRLTTRLGRSIRATANHRFLTPQGWKRVDELQPGDYLALPRRIPRVLASLEHHHH HH
66 SEQ#66: NW4RBnH (NW45NH6)
SAGEQGGLSLEAKTNAIASALSAAFLETTGY QQF EIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGCLAGDTLITLADGRRVPIRELVS QQNFSVWALNPQTYRLERARVSRAFCTGIKPVYRLTTRLGRSIRATANHRFLTPQGWKRVDELQPGDYLA LPRRIPRVLASLEHHHHHH
67 SEQ#67: HRBcW1 (H65cW1)
HHHHHHSMAAQSDVYWDPIVSIEPDGVEEVFDLTVPGPHNFVANDIIAHNSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
68 SEQ#68: HHSGcW1 (H124cW1)
HHHHHHSMHHHHHHHSMEAVLNYNHRIVNIEAVSETIDVYDIEVPHTHNFALASGVFVHNSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
69 SEQ#69: HSGcW2 (H64CW2)
HHHHHHSMEAVLNYNHRIVNIEAVSETIDVYDIEVPHTHNFALASGVFVHNSAGPQGGFGATGGASAGLI SRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQ AFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQ ASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANT STLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAG VLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAG YTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
70 SEQ#70: HRBcW2 (H65CW2)
HHHHHHSMAAQSDVYWDPIVSIEPDGVEEVFDLTVPGPHNFVANDIIAHNSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
71 SEQ#71: HRBcW2Cma2 (H65cW2Cma2)
HHHHHHSMAAQSDVYWDPIVSIEPDGVEEVFDLTVPGPHNFVANDIIAHNSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGQRGPRSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGPGYGPG AGQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSGPTNPASLSNAISSWSQVSASNPGLSGCDV LVQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSLTQALG
72 SEQ#72: HRBcW4 (H65CW4)
HHHHHHSMAAQSDVYWDPIVSIEPDGVEEVFDLTVPGPHNFVANDIIAHNSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNG VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGG
73 SEQ#73: HRBcW4Cma2 (H65cW4Cma2)
HHHHHHSMAAQSDVYWDPIVSIEPDGVEEVFDLTVPGPHNFVANDIIAHNSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP
SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN G VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGQRGPRSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRL SSPQASSRVSSAVSTLVSSGPTNPASLSNAISSWSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSS IGQINYAASSQYAQLVGQSLTQALG
74 SEQ#74: H66NW44CH6
HHHHHHGGSVRHERPSTSKLDTTLLRINSIELEDEPTKWSGFWDKDSLYLRHDYLVLHNSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGCFSGDTLVALTDGRSVSFEQLVEEEKQGKQNFCYTIRHDGSIGVEKIINA RKTKTNAKVIKVTLDNGESIICTPDHKFMLRDGSYKCAMDLTLDDSLMPLHRKISTTEDSGLEHHHHHH
75 SEQ#75: 4NW45cH6
EAVLNYNHRI IEAVSETIDVYDIEVPHTHNFALASGVFVHNSAGPQGGFGATGGASAGLISRVANALA NTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFN AGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSG AGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLR TGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNID TLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPS TGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASS WQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL SAL LNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPL GGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLA STL GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS SAAQ G LGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSG FGGCLAGDTLITLADGRRVPIRELVSQQNFSVWALNPQTYRLERARVSRAFCTGIKPVYRLTTRLGRSIR ATANHRFLTPQGWKRVDELQPGDYLALPRRIPRVLASLEHHHHHH
76 SEQ#76: 5NW4lcH6
AAQSDVYWDPIVSIEPDGVEEVFDLTVPGPHNFVANDIIAHNSAGPQGGFGATGGASAGLISRVANALAN TSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNA GVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGA GYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRT GVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDT LGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPST GPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSV VQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL L NGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLG GGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL GINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGF GGCLTGDSQVLTRNGLMSIDNPQIKGREVLSYNETLQQWEYKKVLRWLDRGEKQTLSIKTKNSTVRCTAN HLIRTEQGWTRAENITPGMKILSPASLEHHHHHH
77 SEQ#77: W10
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN G VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTL GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I NVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS 
RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
78 SEQ#78: W12
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN G VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTL GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I NVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS 
RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
79 SEQ#79: NW10
SAGEQGGLSLEAKTNAIASALSAAFLETTGYVNQQFVNEIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVA NALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSS ALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLR TVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNA SNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGP SGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQ IASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRV
LSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGY PGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
80 SEQ#80: W10Cma2
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN G VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTL GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I NVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS 
RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGQRGPRSQGPGSGGQQGPGGQGPYGPSAAAAA AAAGPGYGPGAGQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSGPTNPASLSNAISSWSQVSA SNPGLSGCDVLVQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSLTQALG
81 SEQ#81: NW10Cma2
SAGEQGGLSLEAKTNAIASALSAAFLETTGY QQF EIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVA NALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSS ALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLR TVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNA SNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGP SGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQ IASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRV LSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGY PGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS 
N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGQRGPRSQGPGSGGQ QGPGGQGPYGPSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSGPTN PASLSNAISSWSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSLTQA LG
82 SEQ#82: NW12
SAGEQGGLSLEAKTNAIASALSAAFLETTGYVNQQFVNEIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT SAYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVA NALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSS ALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLR
SEQID Amino Acid or Nucleotide Sequence
NO
TVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNA SNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGP SGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQ IASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRV LSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGY PGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVA NALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSS ALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
83 SEQ#83: Wi2Cma2
SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQ RAAQ S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN G VSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGG APFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTL GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I NVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG SAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD S G S VQ S D ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGA TGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPA GSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLST SSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLIS 
RVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQA FSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQA SASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTS TLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGV LNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGY
TGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGV SQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLG SRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGP SGYPGPLGGGAPFGQSGFGQRGPRSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGPGYGPGAGQQGPGSQA P VAS AAAS RL S S P QAS S RVS S AVS T L VS S G P TN P AS L S NAI S S WS Q VS AS N P GL S GC D VL VQAL L E I VS ALVHILGSSSIGQINYAASSQYAQLVGQSLTQALG
84 SEQ#84: NWi2Cma2
SAGEQGGLSLEAKTNAIASALSAAFLETTGYVNQQFVNEIKTLIFMIAQASSNEISGSAAAAGGSSGGGG GSGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQ QGPSQQGPGQQGPGGRGPYGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVA NALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSS ALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLR TVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNA SNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGP SGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQ IASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRV LSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGY PGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAA Q S LAS T L GVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPF GQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVD GNN LARFAVQAVS RL PAG S DT S AYAQAF S S AL 
FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAG PQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQ AVS RL PAG S DT S AYAQAF S S AL FNAGVLNAS N I DT L G S RVL S AL LN GVS S AAQ GL G I N VD SGSVQSDISS SSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGG ASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSD TSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSS SASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVA NALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSS ALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASAS STSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGQRGPRSQGPGSGGQQGPGGQGPYGPSAAAAAAAA GPGYGPGAGQQGPGSQAPVASAAASRLSSPQASSRVSSAVSTLVSSGPTNPASLSNAISSWSQVSASNP GLSGCDVLVQALLEIVSALVHILGSSSIGQINYAASSQYAQLVGQSLTQALG
85 SEQ#85: Nma2 (N-terminal non-repetitive domain of MaSp2 from Argiope trifasciata)
AGEQGGLSLEAKTNAIASALSAAFLETTGYVNQQFVNEIKTLIFMIAQASSNEISGSAAAAGGSSGGGGG SGQGGYGQGAYASASAAAAYGSAPQGTGGPASQGPSQQGPVSQPSYGPSATVAVTAVGGRPQGPSAPRQQ GPSQQGPGQQGPGGRGPY
86 SEQ#86: Cma1 (C-terminal non-repetitive domain of MaSp1 from Euprosthenops australis)
GSASASAAASAASTVANSVSRLSSPSAVSRVSSAVSSLVSNGQVNMAALPNI ISNISSSVSASAPGASGC EVIVQALLEVITALVQIVSSSSVGYINPSAVNQITNWANAMAQVMG
87 SEQ#87: Cma2 (C-terminal non-repetitive domain of MaSp2 from Argiope trifasciata)
QRGPRSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGPGYGPGAGQQGPGSQAPVASAAASRLSSPQASSRV SSAVSTLVSSGPTNPASLSNAISSWSQVSASNPGLSGCDVLVQALLEIVSALVHILGSSSIGQINYAAS SQYAQLVGQSLTQALG
88 SEQ#88: Cac (C-terminal non-repetitive domain of AcSp1 from Argiope trifasciata)
QTSAFSASGAGQSAGVSVISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSFSALR SSGMSSWDAKIEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSFELVLA
89 SEQ#89: Ctu (C-terminal non-repetitive domain of TuSp1 from Nephila antipodiana)
I SVGVPGYLRTPSSTILAPSNAQ11 SLGLQTTLAPVLSSSGLSSASASARVSSLAQSLASALSTSRGTLS LSTFLNLLSSISSEIRASTSLDGTQATVEVLLEALAALLQVINGAQITDVNVSSVPSVNAALVSALVA
90 SEQ#90:
Intein #1: Ssp DnaX (SX) from Synechocystis species, strain PCC6803
1N: N-fragment of SX intein
CLTGDSQVLTRNGLMSIDNPQIKGREVLSYNETLQQWEYKKVLRWLDRGEKQTLSIKTKNSTVRCTANHL IRTEQGWTRAENITPGMKILSPASLE
91 SEQ#91:
Intein #1: Ssp DnaX (SX) from Synechocystis species, strain PCC6803
1C: C-fragment of SX intein
GGSPQWHTNFEEVESVTKGQVEKVYDLEVEDNHNFVANGLLVHN
92 SEQ#92 :
Intein #2: Ter ThyX (TX) from Trichodesmium erythraeum IMS101
2N: N-fragment of TX intein
CLSGNTKVRFRYSSSSQEAKYYEETIEKLANLWHYGSKNQYTSKDAKCMQENISSRNIFTLDTQTNQIVS SKITNIYINGEKETYTIKTVSGKEIRATLEHQFWTNQGWKRLKDFNNSTQLCEVQLASLE
93 SEQ#93:
Intein #2: Ter ThyX (TX) from Trichodesmium erythraeum IMS101
2C: C-fragment of TX intein
GGSGVFVEIESIEKFGKEITYDLEVEHPEHNFIANGLWHN
94 SEQ#94 :
Intein #3: Ter DnaE3 (TE3) from Trichodesmium erythraeum IMS101
3N: N-fragment of TE3 intein
CLTYETEIMTVEYGPLPIGKIVEYRIECTVYTVDKNGYIYTQPIAQWHNRGMQEVYEYSLEDGTVIRATP EHKFMTEDGQMLPIDEIFERNLDLKCLGTLELEASLE
95 SEQ#95:
Intein #3: Ter DnaE3 (TE3) from Trichodesmium erythraeum IMS101
3C: C-fragment of TE3 intein
GGSVKIVSHKLAKTENVYDIGVTKDHN
96 SEQ#96:
Intein #4: Ssp GyrB (SG) from Synechocystis species, strain PCC6803
4N: N-fragment of SG intein
CFSGDTLVALTDGRSVSFEQLVEEEKQGKQNFCYTIRHDGSIGVEKIINARKTKTNAKVIKVTLDNGESI ICTPDHKFMLRDGSYKCAMDLTLDDSLMPLHRKISTTEDSGLE
97 SEQ#97 :
Intein #4: Ssp GyrB (SG) from Synechocystis species, strain PCC6803
4C: C-fragment of SG intein
SMEAVLNYNHRI IEAVSETIDVYDIEVPHTHNFALASGVFVHN
98 SEQ#98 :
Intein #5: Rma DnaB (RB) from Rhodothermus marinus
5N: N-fragment of RB intein
CLAGDTLITLADGRRVPIRELVSQQNFSVWALNPQTYRLERARVSRAFCTGIKPVYRLTTRLGRSIRATA NHRFLTPQGWKRVDELQPGDYLALPRRIP
99 SEQ#99 :
Intein #5: Rma DnaB (RB) from Rhodothermus marinus
5C: C-fragment of RB intein
AQSDVYWDPIVSIEPDGVEEVFDLTVPGPHNFVANDIIAHN
100 SEQ#100 :
Intein #6: Cne-AD PRP8 (CP) from Cryptococcus neoformans
6N: N-fragment of CP intein
CLQNGTRLLRADGSEVLVEDVQEGDQLLGPDGTSRTASKIVRGEERLYRIKTHEGLEDLVCTHNHILSMY KERSGSERAHSPSADLSLTDSHERVDVTVDDFVRLPQQEQQKYQLFRSTASLE
101 SEQ#101:
Intein #6: Cne-AD PRP8 (CP) from Cryptococcus neoformans
6C: C-fragment of CP intein
GGSVRHERPSTSKLDTTLLRINSIELEDEPTKWSGFWDKDSLYLRHDYLVLHN
102 SEQ#102 :
Intein #7: SG-E: evolved intein
7N: N-fragment
CFSGDTLVALTDGRSVSFEQLVEEEKQGKQNFCYTIRHDGSIGVEKIINARKTKTNAKVIKVTLDNGESI ICTPDHKFMLRDGSYKCAMELTHDDSLMPFHRKISTTEDSGLE
103 SEQ#103 :
Intein #7: SG-E: evolved intein
7C: C-fragment
EAVLNYNRRI IEAVSETIDVYDIEVPHTHNFALASGVFVHN
104 SEQ#104 :
Intein #8: CP-E: evolved intein
8N: N-fragment
CLQNGTRLLRADGSEVLVEDVQEGDQLLGPDGTSRTASKIVRGEERLYRIKIHEGLEDLVCTHNHILPMY KERSGSERAHSPSADLSLTDSHERVDVTVDDFVRLPQQEQQKYRLFRSTASLE
105 SEQ#105 :
Intein #8: CP-E: evolved intein
8C: C-fragment
GGSGRHERPSTSKLDITLLRIDSIELEDEPTKWSGFWDKDRLYLRHDYLVLHN
106 SEQ#106:
W: Cys mutation at N-terminus
CAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARF AVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSD ISSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGS
107 SEQ#107 :
W: Cys mutation at C-terminus
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGC
108 SEQ#108 :
W: Cys mutation at 29
AGPQGGFGATGGASAGLISRVANALANTCTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
109 SEQ#109 :
W: Cys mutation at 143
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSCSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
110 SEQ#110:
W: Cys mutation at 29 and 143
AGPQGGFGATGGASAGLISRVANALANTCTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSCSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
111 SEQ#111:
W: mutation at F7A, F69A and F146A
AGPQGGAGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARAA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSALSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
112 SEQ#112 :
W: mutation at L17A and V21A
AGPQGGFGATGGASAGAISRAANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
113 SEQ#113:
W: mutation at L17T and V21T
AGPQGGFGATGGASAGTISRTANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
114 SEQ#114 :
W: mutation at T10S, I18S, V21S
AGPQGGFGASGGASAGLSSRSANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
115 SEQ#115:
W: mutation at V47P
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSPVQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
116 SEQ#116:
W: mutation at V61P
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGPDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFG
117 SEQ#117 :
W: mutation at V61S, V71S, V74S
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGSDGNNLARFA SQASSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
118 SEQ#118 :
W: mutation at F90W
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAWSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
119 SEQ#119:
W: mutation at F95W
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALWNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
120 SEQ#120 :
W: mutation at R36A
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLATGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
121 SEQ#121:
W: mutation at R36W
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLWTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
122 SEQ#122 :
W: mutation at F146W
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSWLSTSSSSASYSQASASSTSGAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
123 SEQ#123 :
W: mutation at Y169W
AGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSWQRAAQSLASTLGVDGNNLARFA VQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDI SSSSSFLSTSSSSASYSQASASSTSGAGWTGPSGPSTGPSGYPGPLGGGAPFGQSGFGG
124 SEQ#124 :
Cac : mutation at R37G
SQTSAFSASGAGQSAGVSVISSLNSPVGLRSASAASGLSQLTSSITNAVGANGVDANSLARSLQSSFSAL RSSGMSSSDAKIEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSFELVLA
125 SEQ#125 :
Cac : mutation at F67W
SQTSAFSASGAGQSAGVSVISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSWSAL RSSGMSSSDAKIEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSFELVLA
126 SEQ#126:
Cac : mutation at S78C
SQTSAFSASGAGQSAGVSVISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSFSAL RSSGMSSCDAKIEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSFELVLA
127 SEQ#127 :
Cac : mutation at S78W
SQTSAFSASGAGQSAGVSVISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSFSAL RSSGMSSWDAKIEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSFELVLA
128 SEQ#128 :
Cac: mutation at F120W
SQTSAFSASGAGQSAGVSVISSLNSPVGLRSASAASRLSQLTSSITNAVGANGVDANSLARSLQSSFSAL RSSGMSSSDAKIEVLLETIVGLLQLLSNTQVRGVNPATASSVANSAARSWELVLA
129 SEQ#129: Nma1 (N-terminal non-repetitive domain of MaSp1 from Euprosthenops australis)
SHTTPWTNPGLAENFMNSFMQGLSSMPGFTASQLDDMSTIAQSMVQSIQSLAAQGRTSPNKLQALNMAFA SSMAEIAASEEGGGSLSTKTSSIASAMSNAFLQTTGWNQPFINEITQLVSMFAQAGMNDVSA
130 SEQ#130: Nac (N-terminal non-repetitive domain of AcSp1 from Latrodectus hesperus)
NWLTSLSLIFILAFVQNVQVEGRKGHHHSSGSSKSPWANPAKANAFMKCLIQKISTSPVFPQQEKEDMEE IVETMMSAFSSMSTSGGSNAAKLQAMNMAFASSMAELVIAEDADNPDSISIKTEALAKSLQQCFKSTLGS
RHFIAEIKDLIGMFAREAAAMEEAGDEEEETYPSAFEIPDQSISVPSADFISGMDTFIG
131 SEQ#131: Ntu (N-terminal non-repetitive domain of TuSp1 from Nephila antipodiana)
GSEQQDLDDLAQVILSAVTSNTDTSKSARAQALSTALASSLADLLISESSGSSYQTQISALTNILSDCFV TTTGSNNPAFVSRVQTLIAVLSQSSSNAISGATGGSAFAQSQAFQQSA
132 DNA sequence for SEQ ID NO: 2
GCTGGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACG CGCTGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGT GGTACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCT GTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCC TGTTCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCT GAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATC TCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCA CCTCCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGG TGGCGGTGCACCGTTCGGTCAATCTGGCTTTGGTGGCAGCGCTGGTCCTCAAGGTGGTTTCGGTGCAACT GGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCG TTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATC CACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGT AGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCA ACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCT GGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCC AGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCG GCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTT TGGTCAAACCTCTGCTTTTTCAGCTTCTGGCGCTGGCCAAAGTGCTGGTGTTTCCGTGATCAGTTCCCTG AACAGCCCGGTCGGCCTGCGCTCTGCTTCGGCTGCTTCTCGTCTGTCCCAACTGACCTCCTCAATTACCA ATGCTGTCGGTGCTAACGGCGTGGACGCGAACTCTCTGGCCCGTTCCCTGCAGAGCTCTTTTAGCGCACT GCGCAGTTCCGGCATGTCATCGAGCGATGCGAAAATTGAAGTCCTGCTGGAAACCATCGTTGGCCTGCTG CAGCTGCTGAGTAACACGCAAGTGCGTGGTGTTAATCCGGCGACGGCCTCATCGGTGGCGAACTCGGCTG CCCGCTCGTTTGAACTGGTCCTGGCG
133 DNA sequence for SEQ ID NO: 3
GCTGGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACG CGCTGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGT GGTACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCT GTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCC TGTTCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCT GAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATC TCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCA CCTCCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGG TGGCGGTGCACCGTTCGGTCAATCTGGCTTTGGTGGCAGCGCTGGTCCTCAAGGTGGTTTCGGTGCAACT GGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCG TTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATC CACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGT AGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCA ACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCT GGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCC AGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCG GCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTT TGGTAGCGCAAGTGCAAGCGCAGCAGCAAGCGCAGCATCTACCGTGGCAAACTCTGTTAGTCGTCTGTCT AGTCCGAGTGCGGTGAGCCGCGTTAGCTCTGCCGTGAGTAGCCTGGTTTCTAACGGCCAGGTGAACATGG CCGCACTGCCGAACATTATCAGCAATATTTCTAGTAGCGTTAGCGCATCTGCACCGGGTGCAAGCGGTTG CGAAGTGATCGTTCAGGCGCTGCTGGAAGTGATTACCGCCCTGGTGCAGATCGTTTCTAGTAGCTCTGTG
GGCTACATTAATCCGAGCGCGGTTAACCAGATCACGAATGTGGTTGCCAACGCAATGGCCCAGGTTATGG GT
134 DNA sequence for SEQ ID NO: 5
GCTGGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACG CGCTGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGT GGTACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCT GTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCC TGTTCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCT GAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATC TCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCA CCTCCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGG TGGCGGTGCACCGTTCGGTCAATCTGGCTTTGGTGGCAGCGCTGGTCCTCAAGGTGGTTTCGGTGCAACT GGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCG TTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATC CACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGT AGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCA ACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCT GGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCC AGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCG GCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTT TGGTCAACGTGGTCCTCGCTCTCAAGGTCCTGGTTCTGGCGGTCAGCAGGGTCCGGGTGGTCAGGGTCCT TATGGTCCTAGCGCGGCTGCAGCAGCAGCTGCAGCTGGCCCGGGTTACGGTCCGGGCGCTGGCCAGCAGG GTCCGGGTAGCCAGGCACCGGTGGCTTCTGCGGCAGCGAGCCGTCTGTCTTCCCCGCAGGCGTCCAGCCG TGTCTCCAGCGCTGTCTCCACTCTGGTATCTTCCGGTCCGACGAATCCGGCAAGCCTGTCTAACGCAATC AGCTCCGTTGTGTCCCAAGTCTCCGCAAGCAATCCGGGTCTGTCCGGTTGCGATGTGCTGGTCCAGGCAC TGCTGGAAATCGTGTCTGCTCTGGTCCACATTCTGGGTTCTAGCTCCATCGGCCAGATCAACTACGCGGC GAGCTCTCAGTACGCGCAGCTGGTAGGTCAGTCCCTGACTCAGGCTCTGGGT
135 DNA sequence for SEQ ID NO: 6
GCTGGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACG CGCTGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGT GGTACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCT GTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCC TGTTCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCT GAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATC TCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCA CCTCCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGG TGGCGGTGCACCGTTCGGTCAATCTGGCTTTGGTGGCAGCGCTGGTCCTCAAGGTGGTTTCGGTGCAACT GGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCG TTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATC CACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGT AGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCA ACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCT GGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCC AGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCG GCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTT TGGTCAAAGCGCTGGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTGCAGGCCTGATTTCTCGT GTCGCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGGCGTTTCTCAGCAGATCG CGTCTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTTGACGGCAACAACCTGGC GCGTTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCGCGTACGCACAGGCTTTC TCTTCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCTGGGTTCTCGTGTTCTGT CTGCGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTTGATTCCGGTTCTGTGCA AAGCGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGTCTTACAGCCAGGCTTCT
GCTTCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGGTCCTTCTGGCTACCCTG GTCCTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTTTGGTCAAAGCGCTGGTCCTCAAGGTGGTTT CGGTGCAACTGGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACGCGCTGGCAAACACTAGCACT CTGCGTACCGTTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGTGGTACAACGCGCAGCTCAAT CTCTGGCATCCACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCTGTTCAGGCTGTTTCTCGTCT GCCTGCGGGTAGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCCTGTTCAACGCTGGCGTACTG AACGCAAGCAACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCTGAACGGTGTGTCTTCTGCGG CTCAGGGTCTGGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATCTCTTCTTCCTCTTCTTTCCT GAGCACCTCCAGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCACCTCCGGTGCTGGCTATACG GGCCCTAGCGGCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGGTGGCGGTGCACCGTTCGGTC AATCTGGCTTTGGTCAACGTGGTCCTCGCTCTCAAGGTCCTGGTTCTGGCGGTCAGCAGGGTCCGGGTGG TCAGGGTCCTTATGGTCCTAGCGCGGCTGCAGCAGCAGCTGCAGCTGGCCCGGGTTACGGTCCGGGCGCT GGCCAGCAGGGTCCGGGTAGCCAGGCACCGGTGGCTTCTGCGGCAGCGAGCCGTCTGTCTTCCCCGCAGG CGTCCAGCCGTGTCTCCAGCGCTGTCTCCACTCTGGTATCTTCCGGTCCGACGAATCCGGCAAGCCTGTC TAACGCAATCAGCTCCGTTGTGTCCCAAGTCTCCGCAAGCAATCCGGGTCTGTCCGGTTGCGATGTGCTG GTCCAGGCACTGCTGGAAATCGTGTCTGCTCTGGTCCACATTCTGGGTTCTAGCTCCATCGGCCAGATCA ACTACGCGGCGAGCTCTCAGTACGCGCAGCTGGTAGGTCAGTCCCTGACTCAGGCTCTGGGT
136 DNA sequence for SEQ ID NO: 37
CATATGGGTCACCATCATCACCACCATgctGGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTG CAGGCCTGATTTCTCGTGTCGCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGG CGTTTCTCAGCAGATCGCGTCTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTT GACGGCAACAACCTGGCGCGTTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCG CGTACGCACAGGCTTTCTCTTCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCT GGGTTCTCGTGTTCTGTCTGCGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTT GATTCCGGTTCTGTGCAAAGCGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGT CTTACAGCCAGGCTTCTGCTTCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGG TCCTTCTGGCTACCCTGGTCCTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTTTGGTGGCAGCGCT GGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACGCGC TGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGTGGT ACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCTGTT CAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCCTGT TCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCTGAA CGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATCTCT TCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCACCT CCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGGTGG CGGTGCACCGTTCGGTCAATCTGGCTTTGGTGGCAGCGCTGGTCCTCAAGGTGGTTTCGGTGCAACTGGT GGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCGTTC TGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATCCAC CCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGC GACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCAACA TCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGG TATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGC TCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCGGCC CAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTTTGG CGGT
137 DNA sequence for SEQ ID NO: 38
CATATGGGTCACCATCATCACCACCATGCTGGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTG CAGGCCTGATTTCTCGTGTCGCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGG CGTTTCTCAGCAGATCGCGTCTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTT GACGGCAACAACCTGGCGCGTTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCG CGTACGCACAGGCTTTCTCTTCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCT GGGTTCTCGTGTTCTGTCTGCGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTT
GATTCCGGTTCTGTGCAAAGCGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGT CTTACAGCCAGGCTTCTGCTTCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGG TCCTTCTGGCTACCCTGGTCCTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTTTGGTGGCAGCGCT GGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACGCGC TGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGTGGT ACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCTGTT CAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCCTGT TCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCTGAA CGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATCTCT TCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCACCT CCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGGTGG CGGTGCACCGTTCGGTCAATCTGGCTTTGGTGGCAGCGCTGGTCCTCAAGGTGGTTTCGGTGCAACTGGT GGTGCATCTGCAGGCCTGATTTCTCGTGTCGCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCGTTC TGCGTACGGGCGTTTCTCAGCAGATCGCGTCTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATCCAC CCTGGGCGTTGACGGCAACAACCTGGCGCGTTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGC GACACCAGCGCGTACGCACAGGCTTTCTCTTCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCAACA TCGACACCCTGGGTTCTCGTGTTCTGTCTGCGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGG TATCAATGTTGATTCCGGTTCTGTGCAAAGCGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGC TCCAGTGCGTCTTACAGCCAGGCTTCTGCTTCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCGGCC CAAGCACCGGTCCTTCTGGCTACCCTGGTCCTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTTTGG CGGTAGCGCTGGTCCTCAAGGTGGTTTCGGTGCAACTGGTGGTGCATCTGCAGGCCTGATTTCTCGTGTC GCAAACGCGCTGGCAAACACTAGCACTCTGCGTACCGTTCTGCGTACGGGCGTTTCTCAGCAGATCGCGT CTTCCGTGGTACAACGCGCAGCTCAATCTCTGGCATCCACCCTGGGCGTTGACGGCAACAACCTGGCGCG TTTCGCTGTTCAGGCTGTTTCTCGTCTGCCTGCGGGTAGCGACACCAGCGCGTACGCACAGGCTTTCTCT TCTGCCCTGTTCAACGCTGGCGTACTGAACGCAAGCAACATCGACACCCTGGGTTCTCGTGTTCTGTCTG CGCTGCTGAACGGTGTGTCTTCTGCGGCTCAGGGTCTGGGTATCAATGTTGATTCCGGTTCTGTGCAAAG CGATATCTCTTCTTCCTCTTCTTTCCTGAGCACCTCCAGCTCCAGTGCGTCTTACAGCCAGGCTTCTGCT TCCAGCACCTCCGGTGCTGGCTATACGGGCCCTAGCGGCCCAAGCACCGGTCCTTCTGGCTACCCTGGTC 
CTTTGGGTGGCGGTGCACCGTTCGGTCAATCTGGCTTTGGCGGT
[0099] In some embodiments, the SASPs encompass full-length polypeptides (from about 100 amino acids to about 5,000 amino acids, and any number therebetween, for example, 1 to 3,000 amino acids), wherein each polypeptide comprises two or more amino acid sequences selected from SEQ ID NOs: 1-131. In each case, an exemplary SASP comprises: at least one W subunit, each W subunit ranging from about 150 to 250 amino acid residues in length, and at least one non-repetitive fragment selected from: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment. In various embodiments, the N-terminal (NT) non-repetitive fragment ranges from about 100 to about 150 amino acid residues in length and is derived from spider silk protein major ampullate spidroin 2 (MaSp2). In various embodiments, the C-terminal (CT) non-repetitive fragment, ranging from about 100 to about 150 amino acid residues in length, may be derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1), or (c) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2).
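The composition rules in the preceding paragraph can be sketched as a small validation routine. This is an illustrative sketch only, not part of the disclosure: the names `SaspConstruct` and `check_construct` are hypothetical, the "about" ranges are treated as hard bounds for simplicity, and the sequences in the usage note are placeholders rather than sequences from the listing.

```python
from dataclasses import dataclass
from typing import List, Optional

# Length ranges taken from the text; treating "about" as a hard bound is an assumption.
W_RANGE = (150, 250)
TERMINAL_RANGE = (100, 150)


@dataclass
class SaspConstruct:
    """Hypothetical container for one SASP design: [NT] - W units - [CT]."""
    w_units: List[str]
    n_term: Optional[str] = None
    c_term: Optional[str] = None

    def sequence(self) -> str:
        """Concatenate the parts in N-to-C order."""
        return (self.n_term or "") + "".join(self.w_units) + (self.c_term or "")


def check_construct(construct: SaspConstruct) -> List[str]:
    """Return a list of rule violations; an empty list means the design passes."""
    problems = []
    if not construct.w_units:
        problems.append("at least one W subunit is required")
    for index, unit in enumerate(construct.w_units, start=1):
        if not W_RANGE[0] <= len(unit) <= W_RANGE[1]:
            problems.append(f"W subunit {index} has length {len(unit)}")
    if construct.n_term is None and construct.c_term is None:
        problems.append("at least one non-repetitive terminal fragment is required")
    for name, fragment in (("N-terminal", construct.n_term),
                           ("C-terminal", construct.c_term)):
        if fragment is not None and not (
            TERMINAL_RANGE[0] <= len(fragment) <= TERMINAL_RANGE[1]
        ):
            problems.append(f"{name} fragment has length {len(fragment)}")
    return problems
```

For example, `check_construct(SaspConstruct(w_units=["A" * 199] * 2, c_term="S" * 124))` returns an empty list, whereas a design with a single 50-residue W unit and no terminal fragment is reported with two violations.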
[00100] In some embodiments, an exemplary SASP will have at least one W repeat unit, wherein each W repeat unit (whether wild-type or mutated) or its circularly permuted analog (e.g., GLG, LGL, etc.) is repeated 0-22 times and is flanked, if present, by a non-repetitive N- and/or C-terminal fragment derived from AcSp1, MaSp1 or MaSp2, and/or a mutated W subunit and/or a mutated C-terminal fragment thereof. In some embodiments, SASPs of the present invention comprise one to fourteen (and all numbers therebetween, e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, and 14) W units (either wild-type or mutated, or combinations thereof), for example one to four W units, wherein at least one of the W units is mutated compared to the W subunit derived from AcSp1 (SEQ ID NOs: 45 and/or 46). In other embodiments, SASPs of the present invention comprise one to fourteen W unit(s), wherein at least one of the W units is mutated compared to the W unit derived from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a non-repetitive C-terminal fragment derived from AcSp1, MaSp1, MaSp2 or TuSp1. In other embodiments, SASPs of the present invention comprise one to fourteen W unit(s), wherein at least one of the W units is mutated compared to the W unit derived from AcSp1 (SEQ ID NOs: 45 or 46) in combination with a mutated non-repetitive C-terminal fragment derived from AcSp1, MaSp1 or MaSp2. In other embodiments, SASPs of the present invention comprise a single W unit from AcSp1 (SEQ ID NOs: 45 and/or 46, which may either be wild-type or mutated) in combination with a non-repetitive C-terminal fragment derived from AcSp1, MaSp1, MaSp2 or TuSp1. In other embodiments, SASPs of the present invention comprise a single mutated W unit from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a non-repetitive C-terminal fragment derived from AcSp1, MaSp1 or MaSp2. In other embodiments, SASPs of the present invention comprise a single mutated W unit from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a mutated non-repetitive C-terminal fragment derived from AcSp1, MaSp1 or MaSp2. In other embodiments, SASPs of the present invention comprise one to four W units from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a mutated non-repetitive C-terminal fragment derived from AcSp1, MaSp1 or MaSp2. In other embodiments, SASPs of the present invention comprise a mutated W unit derived from AcSp1 (SEQ ID NOs: 45 and/or 46) in combination with a mutated non-repetitive C-terminal fragment derived from AcSp1, MaSp1 or MaSp2.
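By way of illustration only, the length ranges recited above for the W subunits and the non-repetitive terminal fragments can be expressed as a simple design check. The following is a minimal sketch, not a definitive implementation: the class name `SaspDesign`, the function `check_design`, and the treatment of the "about" ranges as hard bounds are assumptions made for the example, and the sketch covers only those embodiments having one to fourteen W units and at least one terminal fragment.

```python
from dataclasses import dataclass
from typing import List, Optional

# Length ranges quoted in the text. The text says "about", so treating
# them as hard bounds is a simplifying assumption of this sketch.
W_RANGE = (150, 250)         # residues per W subunit
TERMINAL_RANGE = (100, 150)  # residues per non-repetitive NT or CT fragment
MAX_W_UNITS = 14             # "one to fourteen" W units in some embodiments


@dataclass
class SaspDesign:
    """Hypothetical container for a SASP design (lengths only, no sequences)."""
    w_lengths: List[int]
    nt_length: Optional[int] = None  # None means no N-terminal fragment
    ct_length: Optional[int] = None  # None means no C-terminal fragment


def check_design(design: SaspDesign) -> List[str]:
    """Return a list of problems; an empty list means the design fits the ranges."""
    problems = []
    if not 1 <= len(design.w_lengths) <= MAX_W_UNITS:
        problems.append(
            f"expected 1-{MAX_W_UNITS} W units, got {len(design.w_lengths)}")
    for i, n in enumerate(design.w_lengths, start=1):
        if not W_RANGE[0] <= n <= W_RANGE[1]:
            problems.append(f"W unit {i} has {n} residues, outside {W_RANGE}")
    if design.nt_length is None and design.ct_length is None:
        problems.append("at least one non-repetitive terminal fragment is required")
    for name, n in (("NT", design.nt_length), ("CT", design.ct_length)):
        if n is not None and not TERMINAL_RANGE[0] <= n <= TERMINAL_RANGE[1]:
            problems.append(
                f"{name} fragment has {n} residues, outside {TERMINAL_RANGE}")
    return problems
```

For example, `check_design(SaspDesign([200, 200], ct_length=120))` returns an empty list, whereas a design with a 300-residue W unit and a 90-residue N-terminal fragment is reported with two problems.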
B. Recombinant Constructs and Vectors
[00101] In various embodiments, SASPs of the present invention are recombinantly produced.
[00102] In some embodiments, the method of synthesizing a recombinant SASP provides an isolated and purified SASP protein having a purity (on a % wt basis versus protein contaminants) that is typically greater than 90%, or greater than 95%, or greater than 96%, or greater than 97%, or greater than 98%, or greater than 99%, or greater than 99.5%, or greater than 99.9%.
[00103] In certain aspects, the present invention also provides isolated and/or recombinant nucleic acids encoding a recombinant spider aciniform silk protein (SASP). The subject nucleic acids may be single-stranded or double-stranded, DNA or RNA molecules. These nucleic acids are useful as therapeutic agents. For example, these nucleic acids are useful in making recombinant spider aciniform silk proteins which are spun into a variety of articles of manufacture.
[00104] In certain embodiments, the invention provides isolated or recombinant nucleic acid sequences that are at least 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% identical to a region of the nucleotide sequence depicted in SEQ ID NOs: 124-129 in which the nucleotide sequence encodes a recombinant spider aciniform silk protein as described herein. One of ordinary skill in the art will appreciate that nucleic acid sequences complementary to the subject nucleic acids, and variants of the subject nucleic acids are also within the scope of this invention. In further embodiments, the nucleic acid sequences of the invention can be isolated, recombinant, and/or fused with a heterologous nucleotide sequence, or in a DNA library.
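For illustration, the percent-identity thresholds recited above (80% through 100%) can be evaluated on a pair of pre-aligned nucleotide sequences as sketched below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the two sequences are assumed to have already been aligned to equal length with `-` as the gap character, and identity is counted over aligned columns, which is only one of several conventions in use. The example sequences in the usage note are arbitrary and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns of two pre-aligned sequences.

    Columns in which both sequences carry a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True when the identity is at least the given percentage."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With the arbitrary ten-nucleotide strings `"ATGGCTGCTG"` and `"ATGGCAGCTG"`, which differ at one position, `meets_threshold` is true at a 90% threshold and false at a 95% threshold.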
[00105] In other embodiments, nucleic acids of the invention also include nucleotide sequences that hybridize under highly stringent conditions to the nucleotide sequence depicted in SEQ ID NOs: 132-137, or a complement sequence thereof. As discussed above, one of ordinary skill in the art will understand readily that appropriate stringency conditions which promote DNA hybridization can be varied. For example, one could perform the hybridization at 6.0x sodium chloride/sodium citrate (SSC) at about 45°C, followed by a wash of 2.0x SSC at 50°C. For example, the salt concentration in the wash step can be selected from a low stringency of about 2.0x SSC at 50°C to a high stringency of about 0.2x SSC at 50°C. In addition, the temperature in the wash step can be increased from low stringency conditions at room temperature, about 22°C, to high stringency conditions at about 65°C. Both temperature and salt may be varied, or temperature or salt concentration may be held constant while the other variable is changed. In one embodiment, the invention provides nucleic acids which hybridize under low stringency conditions of 6x SSC at room temperature followed by a wash at 2x SSC at room temperature.
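As a worked illustration of the wash conditions discussed above, the following sketch converts an SSC strength into salt concentrations and compares two wash conditions. It relies on the conventional composition of 1x SSC (150 mM sodium chloride and 15 mM sodium citrate), which is not stated in the text. The function names and the partial-order comparison (lower salt and/or higher temperature meaning higher stringency, with mixed cases left undecided) are assumptions of the example.

```python
# Conventional 1x SSC composition (assumption: standard laboratory recipe).
NACL_M_PER_1X = 0.150
CITRATE_M_PER_1X = 0.015


def ssc_composition(fold: float) -> dict:
    """Molar NaCl and sodium citrate concentrations for a given SSC strength."""
    if fold <= 0:
        raise ValueError("SSC strength must be positive")
    return {
        "NaCl_M": round(fold * NACL_M_PER_1X, 4),
        "citrate_M": round(fold * CITRATE_M_PER_1X, 4),
    }


def is_more_stringent(wash_a: tuple, wash_b: tuple) -> bool:
    """Compare two washes given as (ssc_fold, temperature_c).

    wash_a is reported as more stringent only when it has no more salt and no
    lower temperature than wash_b, and differs in at least one of the two.
    Mixed cases (less salt but colder, for example) return False because this
    sketch does not attempt to rank them.
    """
    salt_a, temp_a = wash_a
    salt_b, temp_b = wash_b
    no_worse = salt_a <= salt_b and temp_a >= temp_b
    differs = salt_a < salt_b or temp_a > temp_b
    return no_worse and differs
```

Under these assumptions, a 0.2x SSC wash at 50°C is more stringent than a 2.0x SSC wash at 50°C, and a 2.0x SSC wash at 65°C is more stringent than the same salt concentration at about 22°C.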
[00106] In some embodiments, the recombinant nucleic acids of the invention may be operably linked to one or more regulatory nucleotide sequences in an expression construct. Regulatory nucleotide sequences will generally be appropriate for a host cell used for expression. Numerous types of appropriate expression vectors and suitable regulatory sequences are known in the art for a variety of host cells. Typically, one or more regulatory nucleotide sequences may include, but are not limited to, promoter sequences, leader or signal sequences, ribosomal binding sites, transcriptional start and termination sequences, translational start and termination sequences, and enhancer or activator sequences. Constitutive or inducible promoters as known in the art are contemplated by the invention. The promoters may be either naturally occurring promoters, or hybrid promoters that combine elements of more than one promoter. An expression construct may be present in a cell on an episome, such as a plasmid, or the expression construct may be inserted in a chromosome. In a preferred embodiment, the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selectable marker genes are well known in the art and will vary with the host cell used.
[00107] In some embodiments, the nucleotide sequence encoding a recombinant SASP is operably fused (in frame) to a signal peptide, or to an expression or purification aid. For example, an illustrative recombinant SASP (comprising an amino acid sequence of one or more of SEQ ID NOs: 1-131, for example, SEQ ID NOs: 1-44 & 47-84, or for example, SEQ ID NOs: 2, 3, 5, 6, 37, 38, 45-84, or SEQ ID NOs: 1-44 provided in Table 1 herein) is fused to a purification tag, for example, an enzyme label, a peptide, or a marker, for example, glutathione-S-transferase (GST), c-Myc, biotin, streptavidin, Small Ubiquitin-like Modifier (SUMO), hexa-histidine (H6), H6-SUMO, Maltose Binding Protein (MBP), Thioredoxin (Trx), FLAG tag, streptavidin-binding peptide (SBP), calmodulin-binding peptide (CBP), S-tag, or hemagglutinin (HA), at the N-terminus of the SASP, at the C-terminus, or both.
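The requirement that a tag be fused "in frame" can be illustrated with a short check on the coding segments. The sketch below is illustrative only: the function names are hypothetical, the rule that every segment must be a whole number of codons is a sufficient (not necessary) condition chosen for simplicity, and the short sequences in the usage note are toy examples rather than sequences from the sequence listing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(dna: str) -> list:
    """Split a coding sequence into consecutive three-nucleotide codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def fuse_in_frame(*segments: str) -> str:
    """Concatenate coding segments 5' to 3' and confirm the frame is preserved.

    Each segment must be a whole number of codons, and no stop codon may occur
    before the final codon of the fused sequence.
    """
    for segment in segments:
        if len(segment) % 3 != 0:
            raise ValueError(f"segment of length {len(segment)} shifts the frame")
    fused = "".join(segments).upper()
    internal = split_codons(fused)[:-1]
    if any(codon in STOP_CODONS for codon in internal):
        raise ValueError("premature stop codon inside the fusion")
    return fused
```

As a usage example, `fuse_in_frame("ATG", "CATCACCATCACCATCAC", "GGTTCTGGT", "TAA")` joins a start codon, six histidine codons, a toy three-codon insert, and a stop codon into a single 33-nucleotide open reading frame; supplying a segment whose length is not a multiple of three raises an error.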
[00108] In some embodiments, the subject nucleic acid is provided in an expression vector comprising a nucleotide sequence encoding a SASP and operably linked to at least one regulatory sequence. Regulatory sequences are art-recognized and are selected to direct expression of the soluble polypeptide. Accordingly, the term "regulatory sequence" includes promoters, enhancers, and other expression control elements. Exemplary regulatory sequences are described in Goeddel; Gene Expression Technology: Methods in Enzymology, Academic Press, San Diego, Calif. (1990). For instance, any of a wide variety of expression control sequences that control the expression of a DNA sequence when operatively linked to it may be used in these vectors to express DNA sequences encoding a soluble polypeptide. Such useful expression control sequences include, for example, the early and late promoters of SV40, tet promoter, adenovirus or cytomegalovirus immediate early promoter, the lac system, the trp system, the TAC or TRC system, T7 promoter whose expression is directed by T7 RNA polymerase, the major operator and promoter regions of phage lambda, the control regions for fd coat protein, the promoter for 3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid phosphatase, e.g., Pho5, the promoters of the yeast α-mating factors, the polyhedron promoter of the baculovirus system and other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various combinations thereof. It should be understood that the design of the expression vector may depend on such factors as the choice of the host cell to be transformed and/or the type of protein desired to be expressed. Moreover, the vector's copy number, the ability to control that copy number and the expression of any other protein encoded by the vector, such as antibiotic markers, should also be considered.
[00109] This invention also pertains to a host cell transfected with a recombinant gene including a coding sequence for one or more of the subject SASPs. The host cell may be any prokaryotic or eukaryotic cell. For example, a soluble polypeptide of the invention may be expressed in bacterial cells such as E. coli, insect cells (e.g., using a baculovirus expression system), yeast, or mammalian cells. Other suitable host cells are known to those skilled in the art. Large numbers of suitable vectors are known to those of skill in the art, and are commercially available. Such vectors include, but are not limited to, the following vectors: 1) Bacterial: pET, pQE70, pQE60, pQE-9 (Qiagen), pBS, pD10, phagescript, psiX174, pbluescript SK, pETDuet™, pBSKS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); 2) Eukaryotic: pWLNEO, pSV2CAT, pOG44, PXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); and 3) Baculovirus: pPbac and pMbac (Stratagene). Any other plasmid or vector may be used as long as it is replicable and viable in the host. In some preferred embodiments of the present invention, mammalian expression vectors comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation sites, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking non-transcribed sequences. In other embodiments, DNA sequences derived from the SV40 splice and polyadenylation sites may be used to provide the required non-transcribed genetic elements. In some embodiments of the present invention, transcription of the DNA encoding the SASPs by higher eukaryotes is increased by inserting an enhancer sequence into the vector. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp, that act on a promoter to increase its transcription. Enhancers useful in the present invention include, but are not limited to, the SV40 enhancer on the late side of the replication origin (bp 100 to 270), a cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.
[00110] In certain embodiments of the present invention, the DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (for example, a promoter) to direct mRNA synthesis. Promoters useful in the present invention include, but are not limited to, the LTR or SV40 promoter, the E. coli lac or trp, the phage lambda PL and PR, T3 and T7 promoters, and the cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, and mouse metallothionein-I promoters and other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses. In other embodiments of the present invention, recombinant expression vectors include origins of replication and selectable markers permitting transformation of the host cell (e.g., dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or selectable antibiotic markers, for example, tetracycline or ampicillin resistance in E. coli).
[00111] In other embodiments, the expression vector may also contain a ribosome binding site for translation initiation (IRES) and a transcription terminator. In still other embodiments of the present invention, the vector may also include appropriate sequences for amplifying expression.
[00112] In a further embodiment, the present invention provides host cells containing the above-described vector constructs. In some embodiments of the present invention, the host cell is a higher eukaryotic cell (e.g., a mammalian or insect cell). In other embodiments of the present invention, the host cell is a lower eukaryotic cell (e.g., a yeast cell). In still other embodiments of the present invention, the host cell can be a prokaryotic cell (e.g., a bacterial cell). Specific examples of host cells include, but are not limited to, Escherichia coli, Salmonella typhimurium, Bacillus subtilis, species within the genera Pseudomonas, Streptomyces, and Staphylococcus, as well as eukaryotic host cells Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila S2 cells, Spodoptera Sf9 cells, Chinese hamster ovary (CHO) cells, COS-7 lines of monkey kidney fibroblasts, C127, 3T3, 293, 293T, HeLa, epithelial cell lines (for example, A549, BEAS-2B, PtK1, NCI H441), BHK cell lines, T-1 (tobacco cell culture line), root cells and cultured plant cells.
[00113] The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. In some embodiments, introduction of the construct into the host cell can be accomplished by calcium phosphate transfection, DEAE-dextran mediated transfection, electroporation, a gene gun approach, or other known methods for introducing DNA into cells (See e.g., Davis et al. [1986] Basic Methods in Molecular Biology). Alternatively, in some embodiments of the present invention, the polypeptides and polynucleotides, including nucleic acid probes of the invention, can be synthetically produced by conventional peptide and oligonucleotide synthesizers.
[00114] SASPs can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such SASPs using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook, et al. (1989) Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, N. Y. Exemplary methods for expressing SASPs, and SASPs fusion proteins are provided in further detail in the Examples below.
[00115] In some embodiments of the present invention, following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter is induced by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period. In other embodiments of the present invention, cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification. In still other embodiments of the present invention, microbial cells employed in expression of SASPs can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.
[00116] The protein can be expressed in bacterial cells using plasmid-based expression vectors, in insect cells using baculoviral vectors, or in mammalian cells using vaccinia virus or specialized eukaryotic expression vectors. For expression in mammalian cells, the cDNA sequence may be ligated to heterologous promoters, such as the simian virus 40 (SV40) promoter in the pSV2 vector or other similar vectors, and introduced into cultured eukaryotic cells such as COS cells to achieve transient or long-term expression. The stable integration of the chimeric gene construct may be maintained in mammalian cells by biochemical selection, such as neomycin and mycophenolic acid.
[00117] The DNA sequence can be altered using procedures such as restriction enzyme digestion, fill-in with DNA polymerase, deletion by exonuclease, extension by terminal deoxynucleotide transferase, ligation of synthetic or cloned DNA sequences and site-directed sequence alteration with the use of specific oligonucleotides together with PCR.
[00118] The cDNA sequence or portions thereof can be introduced into eukaryotic expression vectors by conventional techniques. These vectors permit the transcription of the cDNA in eukaryotic cells by providing regulatory sequences that initiate and enhance the transcription of the cDNA and ensure its proper splicing and polyadenylation. Different promoters within vectors have different activities, which alters the level of expression of the cDNA. In addition, certain promoters can also modulate function such as the glucocorticoid-responsive promoter from the mouse mammary tumor virus.
[00119] Cell lines can also be produced which have integrated the vector into the genomic DNA. In this manner, the gene product is produced on a continuous basis. Vectors are introduced into recipient cells by various methods including calcium phosphate, strontium phosphate, electroporation, lipofection, DEAE dextran, microinjection, or by protoplast fusion.
Alternatively, the cDNA can be introduced by infection using viral vectors. Using the techniques mentioned, the expression vectors containing the SASP gene or portions thereof can be introduced into a variety of mammalian cells from other species or into non-mammalian cells.
[00120] The recombinant expression vector, according to this invention, comprises the selected DNA of the DNA sequences of this invention for expression in a suitable host. The DNA is operatively joined in the vector to an expression control sequence in the recombinant DNA molecule so that normal or mutant protein can be expressed. The expression control sequence may be selected from the group consisting of sequences that control the expression of genes of
prokaryotic or eukaryotic cells and their viruses and combinations thereof. The expression control sequence may be selected from the group consisting of the lac system, the trp system, the tac system, the trc system, major operator and promoter regions of phage lambda, the control region of the fd coat protein, early and late promoters of SV40, promoters derived from polyoma, adenovirus, retrovirus, baculovirus, simian virus, 3-phosphoglycerate kinase promoter, yeast acid phosphatase promoters, yeast alpha-mating factors and combinations thereof.
[00121] The host cells to be transfected with the vectors of this invention may be from a host selected from the group consisting of viruses, bacteria, yeasts, fungi, insects, mice or other animals or plant hosts. For the mutant DNA sequence, similar systems are employed to express and produce the SASPs.
[00122] C. Methods Of Making And Purifying The Recombinant Spider Aciniform Silk Proteins
[00123] The invention further contemplates methods for producing SASPs of the present invention using methods of recombinant molecular biology. In various embodiments, SASPs of the present invention are produced by and isolated from, eukaryotic or prokaryotic organisms after the polynucleotide sequences encoding the SASPs of the present invention are transfected or transformed into said eukaryotic or prokaryotic organisms. In some exemplary methods for producing the SASPs of the present invention, the SASPs are expressed recombinantly in prokaryotic organisms, such as bacteria. In some illustrative examples, methods for producing the recombinant SASPs include the steps of:
[00124] (i) providing a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, said SASP comprising: (a) at least one W subunit, each W subunit ranging from about 150 to 250 amino acids; (b) at least one of: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment, wherein the N-terminal (NT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2), and wherein the C-terminal (CT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues and may be derived from: (1) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (2) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (3) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2), wherein the at least one W subunit, and the at least one of the N-terminal (NT) and C-terminal (CT) non-repetitive fragments, have an amino acid sequence as shown in SEQ ID NOs: 1-131, or a SASP comprising or consisting of one or more amino acid sequence(s) selected from SEQ ID NOs: 1-44 & 47-84, or SEQ ID NO: 1-44, or SEQ ID NOs: 2, 3, 5, 6, 37, 38, 58, 59 and 59-76, or a homolog thereof; and (c) a purification tag;
(ii) removing the purification tag from the recombinant SASP;
(iii) passing a solution containing the recombinant SASP through an affinity chromatography column; and
(iv) isolating the SASP from said affinity chromatography column.
[00125] In various embodiments, methods for producing the recombinant SASP include providing an expression vector containing a polynucleotide encoding an illustrative SASP and transforming the expression vector into a competent expression host cell. In various embodiments, the polynucleotide encoding the SASP is also operably linked, and in frame, with a purification tag, for example, glutathione-S-transferase (GST), c-Myc, streptavidin, Small Ubiquitin-like Modifier (SUMO), hexa-histidine (H6), H6-SUMO, Maltose Binding Protein (MBP), Thioredoxin (Trx), FLAG tag, streptavidin-binding peptide (SBP), calmodulin-binding peptide (CBP), S-tag, and hemagglutinin (HA). The purification tag fusion portion can be used to isolate the recombinant SASP from a complex mixture using affinity column chromatography.
[00126] In some embodiments, the purification tag can be either the H6-SUMO purification tag or the H6 purification tag. The former is removed by cleaving off the H6-SUMO purification tag, whereas the latter is not removed (therefore saving time and purification steps). In some embodiments, the recombinant SASP may be recombinantly made having the H6 purification tag fused in frame to a SASP comprising one or more amino acid sequence(s) selected from SEQ ID NOs: 1-44 & 47-84, or SEQ ID NO: 1-44, or SEQ ID NOs: 2, 3, 5, 6, 37, 38, 58, 59 and 59-76.
[00127] In other embodiments, methods for producing and isolating the recombinant SASPs do not involve the use of a purification tag and affinity chromatography. In these exemplary embodiments, the recombinant SASPs are produced in a competent host cell transformed or transfected with a polynucleotide encoding a recombinant SASP or a homolog thereof. The host cells are then induced to express the SASP and are then treated to disrupt the cell wall or cell membrane to release the content within the cell. A cell lysate is thereby produced and is used as the source material for extracting and isolating the recombinant SASP. Methods for disruption of host cells for the purpose of extracting intracellular recombinantly produced products are well known in the art. In some embodiments, the soluble recombinant SASPs contained within the cell lysate are precipitated by adding a precipitating solution, such as a lower-alkyl alcohol, for example ethanol, or alternatively an ammonium sulfate salt can be used. Once the soluble recombinant SASP is precipitated, it can be isolated by centrifugation. The pellet containing the precipitated recombinant SASP is then treated with a solubilizing agent such as a surfactant or a detergent. In some embodiments, the surfactant can be a non-ionic surfactant. In some illustrative embodiments, the solubilizing agent can include a non-ionic surfactant, for example, a hydrophilic polyethylene oxide polymer, for example, a Triton series surfactant, for example, Triton X-100 or Triton X-114. Other examples of non-ionic detergents useful as solubilizing agents can include NP-40, the Brij series of surfactants, for example, Brij-35 or Brij-58, Tween 20, Tween 80, octyl glucoside, octyl thioglucoside, and the like. After one or more cycles of surfactant or detergent washes followed by centrifugation, the recombinant SASPs can be found as an insoluble precipitate and can be isolated using centrifugation. Ultimately, the surfactant or detergent can be removed by washing the precipitate with pure water, and the washed SASP can subsequently be frozen at -80 °C prior to an optional final lyophilization step.
[00128] In some embodiments, the SASP is produced by splicing two smaller SASP precursors together using intein trans-splicing. Methods for producing the recombinant SASPs include the steps of:
(i) each of the two precursor SASPs is linked or fused in frame with an intein N-fragment (denoted as IN) and/or an intein C-fragment (denoted Ic) and a purification tag and expressed separately (SASPN-IN and SASPc-Ic, as defined above);
(ii) the precursors are purified using affinity column chromatography or surfactant washes;
(iii) the two precursors are mixed together and DTT is added to the mixture to allow splicing to occur;
(iv) passing the reaction mixture containing the spliced SASP, unreacted precursors and intein fragments through an affinity chromatography column; and
(v) isolating the SASP from said affinity chromatography column.
[00129] D. Methods For Making SASP fibers
[00130] In various embodiments, methods for producing SASP fibers are also provided. In illustrative exemplary methods, the present invention provides the following steps for making fibers comprising the SASPs of the present invention: (i) providing a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, for example, from 1-3,000 amino acids, said SASP comprising: (a) at least one W subunit, each W subunit ranging from about 150 to 250 amino acids; (b) at least one of: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment, wherein the N-terminal (NT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2), and wherein the C-terminal (CT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues and may be derived from: (1) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (2) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (3) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2), wherein the at least one W subunit, and the at least one of the N-terminal (NT) and C-terminal (CT) non-repetitive fragments, have an amino acid sequence selected from SEQ ID NOs: 1-131, or from SEQ ID NOs: 1-44 & 47-84, or SEQ ID NO: 1-44, or SEQ ID NOs: 2, 3, 5, 6, 37, 38, 58, 59 and 59-76, or a homolog thereof; and (c) a purification tag; (ii) removing the purification tag from the recombinant SASP; (iii) passing a solution containing the recombinant SASP through an affinity chromatography column; (iv) isolating the SASP from said affinity chromatography column; and (v) wet-spinning the collected recombinant SASP.
[00131] In various embodiments, once the spinning dopes comprising the recombinant SASPs have been produced, wet-spinning is performed on the recombinant SASPs to produce the fibers. In some embodiments, the present invention provides a method for producing a spider silk fiber. The steps used to produce the fibers are as follows: (i) providing a recombinant SASP comprising from about 100 amino acids to about 5,000 amino acids, for example, from 1-3,000 amino acids, said SASP comprising: (a) at least one W subunit, each W subunit ranging from about 150 to 250 amino acids; (b) at least one of: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment, wherein the N-terminal (NT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues, and is derived from spider silk protein major ampullate spidroin 2 (MaSp2), and wherein the C-terminal (CT) non-repetitive fragment comprises from about 100 to about 150 amino acid residues and may be derived from: (1) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (2) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (3) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2), wherein the at least one W subunit and the optional N-terminal (NT) and C-terminal (CT) non-repetitive fragments have an amino acid sequence as shown in any one or more of SEQ ID NO: 1-131; and (c) a purification tag; (ii) removing the purification tag from the recombinant SASP; (iii) passing a solution containing the recombinant SASP through an affinity chromatography column; (iv) isolating the SASP from said affinity chromatography column; and (v) wet-spinning the collected recombinant SASP. In these exemplary methods, the amino acid sequence of the recombinant SASP comprises any one or more of SEQ ID NO: 1-44 & 47-84, or SEQ ID NO: 1-44, or SEQ ID NOs: 2, 3, 5, 6, 37, 38, 58, 59 and 59-76.
[00132] The present invention provides novel reagents for producing concentrated spinning dope containing concentrated SASP solutions useful for making fibers using the wet-spinning method. In some embodiments, wet-spinning the recombinant SASP comprises dissolving the recombinant SASP in at least two solvents selected from the group consisting of water, phosphoric acid, acetic acid, formic acid, hydrochloric acid, sulfuric acid, nitric acid, hexafluoroisopropanol (HFIP), hexafluoropropanol (HFP), hexafluoroacetone (HFA), trifluoroacetic acid (TFA), trifluoroethanol (TFE), and methylimidazolium chloride, for example, HFIP and water or a mixture of TFA/TFE/water. In some preferred embodiments, the solubilizing solvent is 70% HFIP/30% dH2O (v/v), or 40-60% TFA/20-40% TFE/20% dH2O.
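By way of example, the volume of each component needed to prepare a given volume of the solvent mixtures named above can be computed as sketched below. This is a minimal sketch: the function names are hypothetical, percentages are assumed to be volume/volume and additive (volume change on mixing is ignored), and the range check simply restates the 40-60% TFA/20-40% TFE/20% water figures.

```python
def component_volumes(total_ml: float, percents: dict) -> dict:
    """Volume (mL) of each component for a v/v mixture of the given total volume."""
    if abs(sum(percents.values()) - 100.0) > 1e-9:
        raise ValueError("component percentages must sum to 100")
    return {name: round(total_ml * pct / 100.0, 6) for name, pct in percents.items()}


def in_preferred_tfa_range(tfa_pct: float, tfe_pct: float, water_pct: float) -> bool:
    """True when a TFA/TFE/water mixture falls within 40-60% / 20-40% / 20%."""
    sums_to_100 = abs(tfa_pct + tfe_pct + water_pct - 100.0) < 1e-9
    return (
        sums_to_100
        and 40.0 <= tfa_pct <= 60.0
        and 20.0 <= tfe_pct <= 40.0
        and abs(water_pct - 20.0) < 1e-9
    )
```

For instance, 10 mL of 70% HFIP/30% dH2O calls for 7 mL of HFIP and 3 mL of water, and a 60/20/20 TFA/TFE/water mixture falls within the preferred range, whereas a 50/20/20 mixture is rejected because its components do not sum to 100%.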
[00133] In various embodiments, the spinning dope containing concentrated recombinant silk protein is wet-spun, i.e. the wet-spinning method subjects a spinning dope to shear force followed by extrusion, at a rate of 0.3 to 20 mL/hr, into a coagulation bath, which serves to amalgamate the protein into a solid fiber (as occurs in the spider). Typically, the coagulation bath contains a single solvent or a plurality of solvents selected from the group consisting of methanol, ethanol, isopropanol, acetone, ammonium sulfate, and water. Preferably, the coagulation bath contains a mixture of water with a dehydrating agent such as methanol, ethanol, isopropanol, acetone, or ammonium sulfate. In some preferred embodiments, the coagulation bath contains 80-95% ethanol/5-20% dH2O or aqueous solutions of ammonium sulfate. The fiber thus induced (referred to as "as-spun" (AS) at this stage) may then be wound onto a spool/collector.
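The extrusion rate of 0.3 to 20 mL/hr can be related to run time and to the mean velocity of the dope in the tubing, as in the following illustrative sketch. The function names and the tubing inner diameter parameter are assumptions introduced for the example (no inner diameter is given in the text), and the velocity is a simple mean obtained by dividing volumetric flow by the tubing cross-section.

```python
import math

# Extrusion rate range quoted in the text.
MIN_RATE_ML_PER_HR = 0.3
MAX_RATE_ML_PER_HR = 20.0


def _check_rate(rate_ml_per_hr: float) -> None:
    if not MIN_RATE_ML_PER_HR <= rate_ml_per_hr <= MAX_RATE_ML_PER_HR:
        raise ValueError("extrusion rate outside the 0.3 to 20 mL/hr range")


def extrusion_time_min(dope_volume_ml: float, rate_ml_per_hr: float) -> float:
    """Minutes needed to extrude a dope volume at a constant pump rate."""
    _check_rate(rate_ml_per_hr)
    return 60.0 * dope_volume_ml / rate_ml_per_hr


def mean_exit_velocity_mm_per_s(rate_ml_per_hr: float, inner_diameter_mm: float) -> float:
    """Mean dope velocity in tubing of the given (hypothetical) inner diameter."""
    _check_rate(rate_ml_per_hr)
    flow_mm3_per_s = rate_ml_per_hr * 1000.0 / 3600.0  # 1 mL = 1000 mm^3
    area_mm2 = math.pi * (inner_diameter_mm / 2.0) ** 2
    return flow_mm3_per_s / area_mm2
```

For example, extruding 1 mL of dope at 20 mL/hr takes 3 minutes, and the same volume at 0.3 mL/hr takes 200 minutes.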
[00134] E. Apparatus For Making SASP Fibers
[00135] In various embodiments, the apparatus is fully automated and consists of a pump (for a controlled extrusion rate), a glass syringe (to store and allow for controlled delivery of the spinning dope), PEEK tubing or needle (for shear force application), a coagulation bath (to amalgamate protein in the soluble dope into an insoluble fiber form), a first set of rollers to guide the as-spun fiber into the post-spin stretching bath, a second set of rollers to facilitate stretching in the bath and a smooth transition of the post-spun fibers to the collector. Referring now to FIGS. 21 and 22, a spinning apparatus 10 is used to form silk fibers. Spinning dope 16 is loaded into syringe 12. The spinning dope 16 may comprise recombinant spider silk protein powder and one or more solvents. For example, the spinning dope 16 may comprise HFIP, TFA, TFE, water, or any combination thereof, in addition to mixtures of different recombinant spider silk protein powders, metal ions including, but not limited to, aluminum (Al3+), titanium (Ti3+ or Ti4+), iron (Fe2+ or Fe3+), and zinc (Zn2+). The syringe 12 is connected to syringe pump 14. The syringe pump 14 pumps the spinning dope 16 out of the syringe 12. The spinning dope 16 flows out of the syringe 12, is extruded through syringe tubing 18 and exits into coagulation bath 20. The syringe 12 is connected to and in fluid communication with the syringe tubing 18 at a proximal end of the syringe tubing 18; a distal end of the syringe tubing 18 is below the surface of the coagulation bath 20. The coagulation bath 20 comprises one or more solvents to aid in coagulation of the spinning dope 16. For example, the coagulation bath 20 may comprise a mixture of water and ethanol. 
In some embodiments, the coagulation bath 20 can also include organic solvents such as isopropanol and methanol, combinations of these organic solvents (including ethanol), and mixtures of this with water, metal ions including, but not limited to aluminum (Al3+), titanium (Ti3+ or Ti4+), iron (Fe2+ or Fe3+), and zinc (Zn2+). The spinning dope can be immersed in the coagulation bath from about 1 second to about thirty seconds. The coagulation bath 20 amalgamates protein in the soluble spinning dope 16 into an insoluble fiber form. The coagulation bath 20 is contained within coagulation vessel 22. The coagulated spinning dope 16 forms as-spun fiber (AS fiber) 24 in the coagulation bath 20. The AS fiber 24 exits the coagulation bath 20 and is connected to a first roller 26A. The AS fiber 24 may be initially connected to the first roller 26A by manual means, for example, using tweezers. The AS fiber 24 is then rolled across a first set of rollers 26 that are attached to a first roller mount 28.
The rollers are driven rotationally by a motor so as to pull the AS fiber 24 circumferentially across a portion of their surface. A continuous length of fiber is rolled over each of the rollers. After the AS fiber 24 travels across second roller 26B and as it travels across third roller 26C, the AS fiber 24 is soaked in a post-spin stretching bath 30. The AS fiber 24 undergoes post-spin stretching while in the post-spin stretching bath 30. The post-spin stretching bath 30 comprises one or more solvents to aid in the stretching of the AS fiber 24. For example, the post-spin stretching bath 30 comprises water in combination with organic solvents such as isopropanol and ethanol, metal ions including, but not limited to, aluminum (Al3+), titanium (Ti3+ or Ti4+), iron (Fe2+ or Fe3+), and zinc (Zn2+). The post-spin stretching bath 30 is contained within post-spin stretching vessel 32. Post-spun fiber (PS fiber) 38 travels across fourth roller 34A, exits the post-spin stretching bath 30 and continues to travel across a second set of rollers 34. The third roller 26C and fourth roller 34A are fully or partially submerged in the post-spin stretching bath 30. A length of the fiber between the third roller 26C and fourth roller 34A is submerged in the post-spin stretching bath 30. The second set of rollers 34 is attached to a second roller mount 36. The PS fiber 38 continues to be rolled across fifth roller 34B and sixth roller 34C. The PS fiber 38 is collected on the sixth roller 34C, also referred to as the collector. Collection of the PS fiber 38 on the collector occurs by continuously winding the PS fiber 38 around the full circumference of the collector. In various embodiments, the second set of rollers 34 rotates at a faster rate than the first set of rollers 26 to stretch the fiber. Typically, different stretching ratios provide PS fibers with differing mechanical properties. Typically, the PS fibers may be stretched from 3x to 6x. The speed ratio (second set of rollers:first set of rollers) may be 3:1, 4:1, 5:1, 6:1, or 7:1.
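The relation between roller speeds and stretch described above can be sketched numerically. The following is an illustrative sketch only, not part of the disclosed apparatus: the function names and the default roller diameter are assumptions, and the stretch is approximated as the ratio of roller surface speeds, ignoring slip and fiber relaxation.

```python
import math


def surface_speed_mm_per_s(rpm: float, diameter_mm: float) -> float:
    """Linear speed of a roller surface for a given rotation rate and diameter."""
    return math.pi * diameter_mm * rpm / 60.0


def draw_ratio(first_set_rpm: float, second_set_rpm: float,
               first_diameter_mm: float = 60.0,
               second_diameter_mm: float = 60.0) -> float:
    """Approximate stretch as (second set surface speed) / (first set surface speed).

    The 60 mm default diameter is a hypothetical value chosen for illustration.
    """
    v_first = surface_speed_mm_per_s(first_set_rpm, first_diameter_mm)
    v_second = surface_speed_mm_per_s(second_set_rpm, second_diameter_mm)
    return v_second / v_first


def second_set_rpm_for(target_ratio: float, first_set_rpm: float) -> float:
    """Rotation rate of the second set needed for a target ratio (equal diameters)."""
    return target_ratio * first_set_rpm
```

With equal roller diameters the ratio reduces to the ratio of rotation rates, so `draw_ratio(10, 40)` corresponds to a 4:1 speed ratio.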
[00136] In some embodiments, the rollers may be grooved rollers or planar cylindrical rollers. In some embodiments, the first roller 26A, second roller 26B, fifth roller 34B and sixth roller 34C are planar cylindrical rollers and the third roller 26C and fourth roller 34A are grooved rollers. In some embodiments, rollers 26A-B and 34B-C are planar cylindrical rollers having diameters ranging from about 3 cm to about 9 cm, preferably around 6 cm. In some embodiments, the rollers 26A-B and 34B-C can have lengths ranging from about 4 cm to about 12 cm, preferably 8 cm. In various embodiments, rollers 26C and 34A are grooved rollers wherein the diameter in the middle of the grooved portion of the rollers 26C and 34A can range from about 3 cm to about 9 cm, preferably 6 cm, and the diameter at the ends can range from about 6 cm to about 12 cm, preferably about 9 cm. The lengths of rollers 26C and 34A can measure from about 3 cm to about 11 cm, preferably 7 cm.
[00137] The grooved rollers may have a flared cylindrical shape with a diameter that is smallest at or near the middle of the cylinder and largest at or near the ends of the cylinder. In some embodiments, the rollers are mounted on the roller mounts in a configuration that allows the fiber to be rolled across the rollers in a zig-zagging manner. This allows for tension in the fiber and greater surface contact of the fiber on the rollers. In some embodiments the coagulation vessel 22 and post-spin stretching vessel 32 are adjacent to one another. For example, the coagulation vessel 22 and post-spin stretching vessel 32 are physically attached to one another or are formed from a single vessel with a divider to separate the coagulation bath 20 and post-spin stretching bath 30. The syringe pump 14 has a power supply 40 that may be connected to a power source. The syringe pump 14 may have an arm 42 that is driven by the syringe pump 14 to push a syringe plunger 46 of the syringe 12. The syringe plunger 46 may be connected to the syringe pump arm 42 at a syringe holder 44. The syringe 12 may also be connected to a syringe needle (not shown). The syringe pump 14 may be controlled to produce a desired flow rate based on the diameter of the syringe 12 and the drive rate of the syringe pump arm 42. Correspondingly, this flow rate can be used to control the velocity of the spinning dope 16 into the coagulation bath 20 based on the diameter of the syringe tubing 18. The velocity and diameter of the flow of spinning dope 16 into the coagulation bath 20 may be used to control production of the AS fiber 24, which is fed into the remainder of the apparatus.
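The dependence of extrusion velocity on pump flow rate and tubing diameter can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the function name is hypothetical, the tubing is treated as a circular bore running full, and the input values in the usage note are illustrative only.

```python
import math


def extrusion_velocity_mm_per_s(flow_ul_per_min: float, tubing_id_mm: float) -> float:
    """Mean velocity of dope leaving the tubing: volumetric flow / bore cross-section."""
    flow_mm3_per_s = flow_ul_per_min / 60.0          # 1 uL equals 1 mm^3
    bore_area_mm2 = math.pi * (tubing_id_mm / 2.0) ** 2
    return flow_mm3_per_s / bore_area_mm2
```

For example, `extrusion_velocity_mm_per_s(16, 0.127)` gives the mean exit velocity for a 16 μL/min flow through a 0.127 mm bore. Because the velocity scales with the inverse square of the bore diameter, doubling the inner diameter at a fixed flow rate reduces the velocity to one quarter.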
[00138] Referring now to FIG. 23, the spinning apparatus 10 is driven by a series of motors. First stepper motor board and keypad 70 and second stepper motor board and keypad 72 are used to control a first set of stepper motors 74 and second set of stepper motors 76, respectively. Each stepper motor board and keypad (70, 72) has a corresponding power supply 80, 82 that may be connected to a power source. Each stepper motor board and keypad (70, 72) is in electrical communication with a series of stepper motor drivers (not shown). Each stepper motor board and keypad (70, 72) is also in electrical communication with its corresponding set of stepper motors (74, 76). The first set of stepper motors 74 consists of a first, second, and third stepper motor (74A, 74B, and 74C) that each drive rotation of the corresponding first, second, and third rollers (26A, 26B, and 26C) (third roller not shown in FIG. 23). Likewise, the second set of stepper motors 76 consists of a fourth, fifth, and sixth stepper motor (76A, 76B, and 76C) that each drive rotation of the corresponding fourth, fifth, and sixth rollers (34A, 34B, and 34C)
(fourth roller not shown in FIG. 23). The rotation rate of the rollers as driven by the stepper motors can be used to control the stretching of the fibers. For example, the rotation rate of the second set of rollers 34 may be greater than the rotation rate of the first set of rollers 26 in order to stretch the fiber as disclosed above. Materials, components, and process conditions familiar to persons of ordinary skill in the art may be used in constructing and using the spinning apparatus 10. The following are some specific exemplary embodiments that may be used in the spinning apparatus. The coagulation vessel 22 may be made from glass or other solid material, for example, hardened plastic. Rollers and roller mounts may be made of a non-warping metal, such as stainless steel or 6061 aluminum. The post-spin stretching vessel 32 may be made from poly(methyl methacrylate). The following motor components may be used: NEMA 17 Stepper motor 2P 1.33 A CNC ROUTER MILL ROBOT REPRAP MAKERBOT PRUSA + PLUG; Splitter: DC Power Cable 5.5x2.1mm Female to Male Plug for CCTV Camera; EasyDriver - Stepper Motor Driver A3967 Microstepping Driver; SainSmart LCD Keypad Shield for Arduino Duemilanove UNO MEGA2560 MEGA1280; SainSmart MEGA2560 R3 Development Board Compatible with Arduino MEGA2560 R3; and Super Power Supply® AC / DC Adapter Charger Cord 12V 1.5A 5.5x2.1mm Wall Barrel Plug. The syringe 12 may be a 250 μL Hamilton reversible needle (RN) syringe (Hamilton, Reno, NV) or a 250 μL Hamilton luer tip special cemented needle (LTSN) syringe (Hamilton, Reno, NV). The syringe tubing 18 may be a 6-10 cm long polyetheretherketone (PEEK) tube (inner diameter: 0.127-0.254 mm) (Sigma-Aldrich; Oakville, ON). The syringe tubing 18 may be attached to the syringe 12 by RN compression fittings (1/16 inch; Hamilton; Reno, NV). A 26s gauge needle (inner diameter: 0.13 mm) may also be used with the syringe 12. The syringe pump 14 may be a KD Scientific model 100 series syringe pump (Holliston, MA). The syringe pump 14 may be set to a flow rate of 5 to 50 μL/min.
[00139] In some embodiments, physical characteristics of exemplary fibers produced using the apparatus of the present invention are described in FIG. 24. Table 2 provides an exemplary set of physical characteristics of a SASP of the present invention.
[00140] Table 2. Physical characteristics of exemplary SASPs of the present invention spun using the apparatus described in FIGs. 21-23.
[00141] The wet-spinning apparatus as described herein was optimized to incorporate an automated post-spin stretching treatment. Since it was demonstrated that a post-spin stretching treatment improves fiber mechanical properties (mechanical testing), increases fiber birefringence/overall molecular orientation (polarized light microscopy), and improves fiber uniformity (optical light microscopy), it was desirable to incorporate this treatment in an automated fashion. The fully automated wet-spinning apparatus of the present invention allows for more robust fiber production, is less manually intensive, and is suitable for larger-scale production.
[00142] F. Articles Of Manufacture
[00143] The present invention also provides SASP fibers useful in the manufacture of various articles. In some exemplary embodiments, articles of manufacture incorporating the SASP fibers of the present invention may be useful in the manufacture of articles for use in the field of biotechnology and/or medicine. In some illustrative examples, such articles could include: wound closure or coverage systems, suture materials for use in neurosurgery or ophthalmic surgery, and replacement materials, preferably artificial cartilage or tendon materials.
[00144] Additionally, the threads/fibers of the present invention can be used in the manufacture of medical devices such as medical adhesive strips, skin grafts, replacement ligaments, and surgical mesh; and in a wide range of industrial and commercial products, such as clothing fabric, bullet-proof vest lining, container fabric, bag or purse straps, cable, rope, adhesive binding material, non-adhesive binding material, strapping material, automotive covers and parts, aircraft construction material, weatherproofing material, flexible partition material, sports equipment; and, in fact, in nearly any use of fiber or fabric for which high tensile strength and elasticity are desired characteristics, or a combination of high tensile strength and extensibility - mechanical features which enable the fibers produced using the compositions disclosed herein to exhibit high toughness (a unique mechanical feature of the described recombinant SASP protein). Adaptability and use of the stable fiber product in other forms, such as a dry spray coating, bead-like particles, or use in a mixture with other compositions is also contemplated by the present invention.
[00145] Other illustrative uses of the SASPs of the present invention are in the manufacture and processing of clothing fabric (textiles) and leather, automotive covers and parts, aircraft construction materials as well as in the manufacture and processing of paper.
[00146] The recombinant SASPs of the present invention may be added to cellulose and keratin and collagen products and thus, the present invention is also directed to a paper or a skin care and hair care product, comprising cellulose and/or keratin and/or collagen and the spider silk proteins of the present invention. Papers and skin care and hair care products, in which the SASPs of the present invention can be incorporated can improve or enhance the tensile strength or tear strength of various woven materials.
[00147] Furthermore, the recombinant SASPs of the present invention can be used as a coating for metals, plastics, textile and leather products, thereby conferring stability and durability to the coated product.
[00148] G. EXAMPLES
Example 1
Experimental procedures
[00149] 1. Protein expression
[00150] (1) Expression in LB medium:
[00151] Expression plasmid was transformed into E. coli BL21(DE3) (Novagen, Darmstadt, Germany). Luria-Bertani (LB) medium was prepared (Fisher Scientific; Ottawa, ON), and a starter culture was formed by inoculating a single colony of cells into LB medium containing 50 μg/mL ampicillin (Fisher Scientific; Ottawa, ON), which was incubated with shaking at 37 °C overnight. The resulting overnight culture was stored at 4 °C and, following 6 h storage at 4 °C, was transferred into 1.6 L LB medium containing 50 μg/mL ampicillin. The cells were allowed to incubate with shaking at 37 °C until the optical density at 600 nm (OD600) reached 0.8-1.0. Subsequently, expression of the fusion protein was induced with 0.8 mM isopropyl β-D-1-thiogalactopyranoside (IPTG; Fisher Scientific; Ottawa, ON) and the culture was incubated with shaking at room temperature (22±2 °C) overnight.
[00152] (2) Expression in rich-auto-induction medium:
[00153] Expression plasmid was transformed into E. coli BL21(DE3) (Novagen, Darmstadt, Germany). A starter culture (100 mL) was formed as described above at 37 °C overnight. Cells were harvested and resuspended in 1 L ZYP auto-induction medium containing 10 g/L tryptone, 5 g/L yeast extract, 25 mM sodium succinate, 2% glycerol, 2 mM MgSO4, 50 mM Na2HPO4, 50 mM KH2PO4, 25 mM (NH4)2SO4, 0.05% glucose, 0.2% alpha-lactose monohydrate and trace metals (50 μM FeCl3, 20 μM CaCl2, 10 μM MnCl2, 10 μM ZnSO4, 2 μM CoCl2, 2 μM CuCl2, 2 μM NiCl2, 2 μM Na2MoO4, 2 μM H3BO3 in 60 μM HCl). Cells were allowed to incubate for 7 hours at 37 °C.
[00154] 2. Protein purification
[00155] Cells were harvested by centrifuging at 7000 rcf for 5 min at 4 °C. The supernatant was discarded and the pellet was resuspended by vortexing in native wash buffer (50 mM NaH2PO4, 300 mM NaCl, 20 mM imidazole, pH 8.0). A French Pressure Cell Press (American Instrument Company) was used to lyse the cells. The resulting cell lysate was centrifuged at 12,000 rcf for 30 min at 4 °C to remove insoluble debris, with the target proteins remaining in the soluble supernatant. Subsequently, the target proteins were purified from the soluble supernatant in two different ways:
[00156] (1) Purification using affinity chromatography method:
[00157] The supernatant was loaded onto a column with immobilized Ni-NTA Sepharose (Qiagen, Germany), flowing through twice at room temperature. As per the manufacturer's instructions, the column was then washed using wash buffer and then the bound recombinant proteins with a hexahistidine (H6)-SUMO protein tag or just an H6 tag were eluted using elution buffer (50 mM NaH2PO4, 300 mM NaCl, 250 mM imidazole, pH 8.0). For H6-SASPs, the eluted protein is the final purified target protein. H6-SUMO-silk proteins were further digested by SUMO protease in the presence of 1 mM dithiothreitol (Fisher Scientific; Ottawa, ON). The reaction mixture was then transferred to dialysis tubing and dialyzed against 50 mM K3PO4 (pH 7.5). SUMO cleavage and dialysis were carried out at 4 °C overnight.
[00158] After overnight incubation, SUMO protease digestion was completed and reverse purification was performed by passing the reaction mixture through an immobilized Ni-NTA Sepharose column 3-5 times. This allowed the tag-free SASP to flow through the column, while all the other remaining proteins were left bound to the column.
[00159] (2) Purification using non-chromatography method:
[00160] The soluble fraction of cell lysate (supernatant) was also used for protein purification in this method. First, the soluble target protein was precipitated by adding 20% (NH4)2SO4 (20% of its saturation concentration) to the supernatant at 4 °C overnight. Centrifugation at 12,000 rcf for 30 min was conducted to pellet the precipitated silk protein. The pellet was suspended in 0.1-1% Triton X-100 (~0.5 mL/1 mg protein) with a tissue-grinder homogenizer until all clumps were disrupted. The suspension was centrifuged at 12,000 rcf for 10 min to collect the pellet, and the Triton wash and centrifugation steps were repeated twice. After the Triton wash, silk protein should remain in the pellet. In order to remove Triton from the pellet, dH2O was used to wash the pellet using the same suspending and centrifuging method used for Triton. The silk protein (in the pellet) was finally suspended in dH2O, frozen at -80 °C, and lyophilized.
[00161] 3. Fiber formation
[00162] (1) Fiber formation using wet-spinning method
[00163] Step 1 : Solubilization of recombinant spider silk protein
[00164] Dope #1: Spinning dope type #1 was made by dissolving ~8% (w/v) lyophilized recombinant spider silk protein powder into 70% HFIP/30% dH2O (v/v) in glass vials. The protein-solvent mixture was vortexed until homogeneous, and then sonicated twice at 37 °C for 5 min, with vortexing in between. Subsequently, the glass vials were wrapped with aluminum foil to prevent exposure of the protein-solvent mixtures to light, and incubated for ~48 h at room temperature, with occasional vortexing. After 48 h, the protein-solvent mixture was centrifuged at 14,000 rcf for 30 min at 20 °C and transferred into a new glass vial. This was repeated until the protein-solvent mixture (spinning dope) was transparent with no visible insoluble components remaining.
[00165] Dope #2: Spinning dope type #2 was made by dissolving ~10-15% (w/v) lyophilized recombinant spider silk protein powder into 40-60% TFA/20-40% TFE/20% dH2O. The protein-solvent mixture was vortexed for ~2 min followed by incubation for ~30 min at room temperature. The mixture was then centrifuged at 14,000 rcf for 30 min at 20 °C and transferred into a new glass vial. This was repeated with shorter time increments if necessary until the protein-solvent mixture (spinning dope) was transparent with no visible insoluble components remaining.
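As an illustration of the dope compositions described above, the quantities for a given dope volume can be computed as follows. This is a rough sketch under stated assumptions: the function name is hypothetical, % (w/v) is taken as grams of protein per 100 mL of final dope, and solvent volumes are approximated as fractions of the total volume, ignoring the volume occupied by the protein and any non-ideal mixing.

```python
def dope_recipe(total_volume_ul: float, protein_percent_w_v: float,
                solvent_fractions: dict) -> tuple:
    """Return (protein mass in mg, {solvent name: volume in uL}).

    solvent_fractions maps each solvent to its v/v fraction and must sum to 1.
    """
    if abs(sum(solvent_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("solvent fractions must sum to 1")
    # 1% (w/v) = 1 g per 100 mL = 0.01 mg per uL
    protein_mg = protein_percent_w_v / 100.0 * total_volume_ul
    volumes_ul = {name: frac * total_volume_ul
                  for name, frac in solvent_fractions.items()}
    return protein_mg, volumes_ul
```

For instance, `dope_recipe(250, 8, {"HFIP": 0.7, "dH2O": 0.3})` corresponds to a 250 μL batch of the type #1 dope, and `dope_recipe(250, 10, {"TFA": 0.6, "TFE": 0.2, "dH2O": 0.2})` to one composition within the type #2 ranges.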
[00166] Step 2: Wet-spinning
[00167] The spinning dopes were loaded into a 250 μL Hamilton reversible needle (RN) syringe (Hamilton, Reno, NV), which was attached to a 6-10 cm long polyetheretherketone (PEEK) tube (inner diameter: 0.127-0.254 mm; Sigma-Aldrich; Oakville, ON) by RN compression fittings (1/16 inch; Hamilton; Reno, NV). A 26s gauge needle (inner diameter: 0.13 mm) was also used for this syringe. The syringe containing the spinning dope was securely attached to a KD Scientific model 100 series syringe pump (Holliston, MA). The dope was extruded through the PEEK tube into a coagulation bath of 80-95% ethanol/5-20% dH2O at a constant speed of 16 μL/min. Fibers formed in the coagulation bath were as-spun (AS) fibers and were carefully picked up by tweezers and guided onto a collector. A schematic of this can be seen in FIG. 1.
[00168] FIG. 1 : Schematic of the wet-spinning fiber production method. This apparatus consists of a pump (for a controlled extrusion rate), a glass syringe (to store and allow for controlled delivery of the spinning dope), PEEK tubing (for shear force application), a coagulation bath (to amalgamate protein in the soluble dope into an insoluble fiber form), and a collector for the AS fibers.
[00169] Step 3 : Post-spin stretching
[00170] A custom-designed apparatus was built at the Department of Physics and Atmospheric Science Machine Shop (Dalhousie University; Halifax, NS) to allow consistent stretching of fibers in H2O (FIG. 2). This has been effectively applied for post-spin stretching on all fiber conditions. The container material was aluminum, the metric ruler and the bolts were stainless steel, and the single axis translation stage (Thor Labs, 50 mm; Montreal, QC) was anodized aluminum. AS fiber samples were macroscopically examined for defects before post-spin stretching treatment. 2-3 cm long AS fiber pieces without visible defects were cut and placed at the edge and on top of the translational stage and at the edge and on top of the container mounting surface using Scotch® Double Sided Tape (1/2 inch). Scotch® Magic™ Tape (3/4 inch) was then carefully placed on top to firmly secure both ends of the AS fibers.
[00171] Following fiber affixing, the post-spin stretching apparatus was slowly filled with dH2O until the fibers were fully immersed. Using the control knob, the fibers were smoothly stretched to 4x their original length and allowed to rest in the dH2O bath for 3 min. The dH2O bath was then drained with simultaneous misting using 95% EtOH until the fibers were no longer in contact with dH2O. Subsequently, the resulting "PS stretched" fibers were allowed to dry at room temperature for 5-10 min. A schematic of the post-spin stretching procedure with this apparatus can be seen in FIG. 2.
[00172] FIG. 2. An image of the post-spin stretching apparatus. The translational control knob allows for controlled motion of the stage, the metric ruler allows for a more consistent and precise measurement of fiber stretching, and the drain plug allows the dH2O bath to steadily drain.
[00173] (2) Fiber formation using hand-drawing method
[00174] Fibers were pulled from solutions of purified spider silk proteins in different buffers (10 mM Tris.HCl, pH 8.0 or 50 mM K3PO4, pH 7.5) at room temperature. Typically, a 5-20 μL protein solution (~0.1-1 mg/mL) was placed on a glass slide at room temperature, and fibers were pulled from the protein solution using a plastic 200 μL pipette tip, where one end of the fiber apparently attached to the pipette tip. Each pulling action was a continuous motion at a speed of ~2-10 mm/s.
[00175] 4. Protein modifications and ligation using inteins
[00176] (1) Screening intein trans-splicing activity
[00177] One split-intein can ligate two proteins together to make a larger protein. In order to make large spidroins, the ability to splice more silk proteins together is desired and intein tandem splicing is required, meaning that more than one split-intein is needed. Intein splicing is also protein dependent, depending especially on the residues adjacent to the intein. We therefore selected 6 split-inteins to test splicing activities on the W4 protein having four W repeat units. In total, 30 constructs (Table 3) were made, with each one containing H6 tag + intein C-fragment (IC) + W4 + intein N-fragment (IN) + H6 tag.
[00178] Table 3. Protein constructs for splicing activity test. Inteins are numbered as 1: Ssp DnaX (SX) from Synechocystis species, strain PCC6803, SEQ ID NO 90; 2: Ter ThyX (TX) from Trichodesmium erythraeum FMS101, SEQ ID NO 91; 3: Ter DnaE3 (TE3) from Trichodesmium erythraeum FMS101, SEQ ID NO 92; 4: Ssp GyrB (SG) from Synechocystis species, strain PCC6803, SEQ ID NO 93; 5: Rma DnaB (RB) from Rhodothermus marinus, SEQ ID NO 94; 6: Cne-AD PRP8 (CP) from Cryptococcus neoformans, SEQ ID NO 95. W4 is four W repeat units, which include the first W repeat unit of SEQ ID NO: 46 and three W repeat units containing the amino acid sequence of SEQ ID NO 45. For purposes of illustration, the N-fragment of intein No. 1 is referred to as 1N, and the C-fragment of intein No. 3 is referred to as 3C, etc.
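The construct count can be checked with a short enumeration. The sketch below rests on an assumption that is not stated explicitly in the text: that the 30 constructs are all ordered combinations of a C-fragment from one of the six inteins with an N-fragment from a different one (6 x 5 = 30). The hyphenated labels (e.g. "4C-W4-3N") and function names are hypothetical and chosen only for readability.

```python
from itertools import permutations

# Intein numbers and abbreviations as listed in the Table 3 caption.
INTEINS = {1: "SX", 2: "TX", 3: "TE3", 4: "SG", 5: "RB", 6: "CP"}


def construct_names():
    """Labels for H6 + xC + W4 + yN + H6 constructs with x != y (assumed design)."""
    return [f"{c}C-W4-{n}N" for c, n in permutations(INTEINS, 2)]


def partners_for(intein: int):
    """Pairs of constructs that together supply the N- and C-fragment of one intein."""
    names = construct_names()
    n_side = [s for s in names if s.endswith(f"-{intein}N")]
    c_side = [s for s in names if s.startswith(f"{intein}C-")]
    return [(a, b) for a in n_side for b in c_side]
```

Under this assumption, `len(construct_names())` is 30, and `partners_for(3)` lists the precursor pairs that could be mixed to test intein 3, one of which carries the 3N fragment and the other the 3C fragment.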
[00179] Plasmid was transformed into E. coli BL21(DE3), a starter culture and a larger culture was then prepared in the same manner as was described in [00140]. The cells were allowed to incubate with shaking at 37 °C until the optical density at 600 nm (OD600) reached 0.8-1 and expression was induced with 0.8 mM IPTG at 37 °C for 2-5 hours.
[00180] Cells were harvested and lysed the same way as described in [00144]. After lysis with the French press, the supernatant was discarded; the target protein remained in the pellet, which was saved for further steps. The insoluble pellet was then washed with inclusion body wash buffer (1% Triton-X100 in PBS or 50 mM potassium phosphate buffer, pH 7.5) 2-3 times using a tissue homogenizer, followed by washing with water another 2-3 times. The pellet was dissolved in a buffer (10 mM Tris.Cl, pH 8.0 or 50 mM potassium phosphate, pH 7.5) containing 8 M urea. Centrifugation at 13,000 rcf was performed to separate soluble and insoluble proteins. Most of the target protein dissolved and thus the supernatant was saved. The urea concentration in the supernatant was reduced by dialysis from 8 M to 2-4 M.
[00181] Intein trans-splicing was performed by mixing two precursor proteins together in 50 mM potassium phosphate, pH 7.5, with 1 mM DTT added; for example, 4CW43N and 3CW45N were mixed together in order to test the splicing activity of intein 3. The splicing reaction was carried out at room temperature or at 4 °C for 1-16 hours.
[00182] (2) Modification and ligation using intein-mediated protein splicing
[00183] After screening intein splicing activity, the two best-performing inteins (4: Ssp GyrB and 5: Rma DnaB) were selected for protein ligation to make larger proteins, e.g., proteins with more than four W units.
[00184] Two constructs containing the N-precursor (W1/2/4IN or NW2/4IN: W1, W2 or W4 + IN + H6 tag; or NTD + W2 or W4 + IN + H6 tag) and C-precursor (ICW1/2/4 or ICW2/4C: H6 tag + intein IC + W1, W2 or W4; or H6 tag + IC + W2 or W4 + CTD) were constructed. N- and C-precursors were purified by Ni-NTA affinity chromatography. The two precursors were mixed together and the splicing reaction was carried out in purification elution buffer (50 mM sodium phosphate, 300 mM NaCl, 250 mM imidazole, pH 8.0) with 1 mM DTT at 4 °C for > 6 hours. For some of the precursors, urea was needed to solubilize the protein and was added to the purification buffers; it had to be reduced to <4 M in the final splicing mixture by dialysis. After splicing, the mixture was dialyzed against 50 mM potassium phosphate, pH 7.5 at 4 °C for > 2 hours and reverse purified using Ni-NTA Sepharose as described above. Any remaining unreacted precursors and all intein fragments had H6 tags and were thus trapped, leaving the tag-free spliced (N)W2(C), (N)W4(C), (N)W6(C) or (N)W8(C) proteins to flow through the column in purified form.
[00185] Results
[00186] 1. Protein expression and purification
[00187] Proteins whose expression has been tested in E. coli include: proteins of SEQ ID NO: 1-47 and 58, and precursor proteins used to produce SEQ ID NO: 48-57.
[00188] All proteins tested could be purified using the affinity chromatography method. H-W3 (SEQ ID NO: 37) has also been purified using the non-chromatography method. All proteins can be purified to > 80% purity.
[00189] FIGs. 3A-3D provide photomicrograph examples of protein purification of GLGL (SEQ ID NO: 11), LGLG (SEQ ID NO: 12), GLGLG (SEQ ID NO: 13), LGLGL (SEQ ID NO: 14), H-W3 (SEQ ID NO: 37) and H-W (SEQ ID NO: 38) using affinity chromatography. GLGL, LGLG, GLGLG and LGLGL were fused with H6-SUMO and purification was carried out in three steps: initial precursor purification using affinity chromatography, SUMO cleavage and reverse purification. The resulting silk proteins are tag free. H6-W3, H6-W4 and H6-W3Cma2 are the only proteins fused to an H6 tag, and their purification was carried out in one step using affinity chromatography. The resulting silk proteins have an H6 tag at the N-terminus. Non: noninduced total cell lysate; T: induced total cell lysate; S: soluble fraction of cell lysate; P: insoluble fraction of cell lysate; U: unattached proteins without H6 tag; E: elution of H6-silk proteins; C: after SUMO protease cleavage and overnight dialysis; FT1-3: reverse purification through a series of flow-throughs carried out by nickel affinity chromatography; Ni: after reverse purification, the other H6-proteins bound to Ni-NTA Sepharose. The purified target proteins are labeled with red boxes. Samples were resolved by SDS-PAGE and visualized with Coomassie blue staining.
[00190] 2. Fiber formation
[00191] (1) Fibers formed using wet-spinning method
[00192] Fiber surface morphology:
[00193] Without wishing to be bound to any particular theory, it is believed that all SASPs should be able to form fibers using the wet-spinning method. At least 10 exemplary SASPs have been shown to form fibers, including W2, W3, W4, H6-W3, H6-W4, H6-W3Cma2, GLGL, LGLG, GLGLG and LGLGL. Fibers formed using the wet-spinning method are homogeneous, smooth and continuous. FIGs. 4A-4B show an example of wet-spun AS fiber formed by W3.
[00194] FIGs. 4A-4B show light microscope images of representative AS W3 fibers formed using the wet-spinning method. Images were taken at 100X magnification (FIG. 4A) and at 400X magnification (FIG. 4B).
[00195] Following the initial wet-spinning, a post-spin stretching procedure is usually performed in order to improve fiber mechanical properties. Post-spun (PS) fibers are thinner and more uniform in surface morphology than AS fibers (FIGs. 5A & 5B).
[00196] FIGs. 5A & 5B show light microscope images of a representative PS 4x W3 fiber (an AS fiber stretched to 4 times its original length) formed by wet-spinning. Images were taken at 100X magnification (FIG. 5A) and at 400X magnification (FIG. 5B).
[00197] Fiber mechanical properties:
[00198] The mechanical behavior of each material is related to its response to applied force, and can be represented in a stress-strain distribution, revealing many important mechanical features. The most important features are stress, strain, toughness and elastic modulus (Young's modulus), where stress and strain are used interchangeably with strength and extensibility, respectively.
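To illustrate how these features relate to a stress-strain curve, the sketch below extracts strength, extensibility and toughness from sampled data, taking toughness as the area under the curve (trapezoidal rule). It is an illustrative sketch only: the function names are hypothetical, strain is assumed to be expressed as a fraction rather than a percentage, and no measured data from the present disclosure is used.

```python
def toughness_mj_per_m3(strain, stress_mpa):
    """Area under the stress-strain curve by the trapezoidal rule.

    With stress in MPa and strain as a dimensionless fraction,
    the area is in MJ per cubic metre.
    """
    if len(strain) != len(stress_mpa) or len(strain) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    area = 0.0
    for i in range(1, len(strain)):
        width = strain[i] - strain[i - 1]
        area += 0.5 * (stress_mpa[i] + stress_mpa[i - 1]) * width
    return area


def strength_and_extensibility(strain, stress_mpa):
    """Maximum stress reached and the final (breaking) strain of the curve."""
    return max(stress_mpa), strain[-1]
```

For a purely linear (synthetic) curve rising to 100 MPa at a strain of 0.2, `toughness_mj_per_m3([0.0, 0.1, 0.2], [0.0, 50.0, 100.0])` gives the triangle area of 10 MJ m⁻³.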
[00199] Representative stress-strain curves for AS and PS 4x W3 fibers from both HFIP/H2O and TFA/TFE/H2O spinning dopes can be seen in FIG. 6. Mechanical properties acquired from these fibers can be seen in Table 4. As expected, the strength was dramatically improved when comparing AS fibers to PS 4x fibers, which coincides with previous results on post-spin stretching of wet-spun recombinant spider silk-based fibers. This is likely the result of promoting further molecular alignment and favorable structural changes within the fiber.
[00200] Post-spin stretching was not the only factor that influenced fiber mechanical properties; the properties also differed drastically between the two types of spinning dopes (FIG. 6). Compared to fibres from the TFA/TFE/H2O spinning dope, PS 4x W3 fibres spun from the HFIP/H2O spinning dope were weaker, with an engineering stress of 92.4 MPa; more brittle, with an engineering strain of 2.6±1%; and less tough, with a toughness of 1.3 MJ m⁻³ (Table 4). Conversely, PS 4x W3 fibres spun from the TFA/TFE/H2O spinning dope were stronger, more extensible, and tougher, with an engineering stress of 139 MPa, an engineering strain of 47.3%, and a toughness of 58.8±33 MJ m⁻³.
[00201] As shown in FIG. 6, stress-strain curves are shown for W3 fibers, where fibres spun from the HFIP/H2O spinning dope are represented as dotted lines and fibers spun from the TFA/TFE/H2O spinning dope are represented as solid lines.
[00202] Table 4. Mechanical properties of AS and PS 4x W3 fibers spun from 10% (w/v) in TFA/TFE/H2O or 8% (w/v) in HFIP/H2O.
[00203] Protein secondary structure in solution and fiber:
[00204] With the same W3 construct, it is clear that spinning dope solvent composition plays an important role in the final mechanical properties when applied to a wet-spinning/post-spin stretching method. This could be because of different pre-structuring and resulting pre-assembly within the two spinning dopes giving rise to different mechanical properties. This is reflected by the notable differences observed in the CD spectra for soluble protein in spinning dope (FIGs. 7 and 8).
[00205] FIG. 7 provides an analysis of W3 protein secondary structure in HFIP/H2O: far-UV CD spectra of either 8% (w/v) W3 in HFIP/H2O (black) or 0.8% (w/v) W3 in HFIP/H2O (red). FIG. 8 provides an analysis of W3 protein secondary structure in TFA/TFE/H2O: far-UV CD spectra of either 10% (w/v) W3 in TFA/TFE/H2O (black) or 1% (w/v) W3 in TFA/TFE/H2O (red). As indicated by the drastic difference in the mechanical properties of the fibers, protein secondary structures differ between AS and PS fibers and between fibers formed from different dopes. Raman spectra of AS and PS 4x W3 fibers from both HFIP/H2O and TFA/TFE/H2O spinning dopes can be seen in FIGs. 9A-9D and 10A-10D, respectively.
[00206] A conformational change in protein secondary structure occurs in the fibers formed from the HFIP/H2O spinning dope following post-spin stretching (FIGs. 9A-9D). Particularly, in the amide I region it can be observed that the AS fibers formed from the HFIP/H2O spinning dope are in an α-helical/β-sheet conformation, which then shifts to an enrichment in β-sheet conformation in the PS 4x fibers. This conformational change into a predominant β-sheet structure reinforces the idea that a more complete transition of the crystalline phase is likely occurring. Specifically, this could reflect an increase in β-sheet crystallization, resulting in an increase in local protein concentration and therefore promoting greater mechanical strength (Table 4).
[00207] As with the fibers formed from the HFIP/H2O spinning dope, following post-spin stretching, a conformational change in protein secondary structure also occurs in the fibers formed from the TFA/TFE/H2O spinning dope (FIG. 10). However, there are differences in conformational behavior. Specifically, in the amide I region, compared to the AS fibers formed from the HFIP/H2O spinning dope, AS fibers formed from the TFA/TFE/H2O spinning dope are already in a more pronounced β-sheet conformation (FIG. 10). Following post-spin stretching, this pronounced β-sheet conformation in the PS 4x fibers becomes even more prominent as compared to the PS 4x fibers formed from the HFIP/H2O spinning dope.
[00208] As shown in FIGs. 9A-9D, Raman spectra of W3 fibers formed from the HFIP/H2O spinning dope in perpendicular (blue) or parallel (red) alignment relative to the incident polarized scattered light are depicted. The full spectral range of AS fibers is shown in FIG. 9A. FIG. 9B depicts the amide I region of AS fibers. FIG. 9C illustrates the full spectral range of PS 4x fibers, while in FIG. 9D, the amide I region of PS 4x fibers is shown.
[00209] As shown in FIGs. 10A-10D, Raman spectra of W3 fibers formed from the TFA/TFE/H2O spinning dope in perpendicular (blue) or parallel (red) alignment relative to the incident polarized scattered light are compared. The full spectral range of AS fibers is shown in FIG. 10A, the amide I region of AS fibers is shown in FIG. 10B, FIG. 10C depicts the full spectral range of PS 4x fibers, while FIG. 10D depicts the amide I region of PS 4x fibers.
[00210] Protein secondary structure also plays an important role in fiber mechanical properties. Better alignment leads to greater strength and can be reflected by brighter birefringence. As expected, notable differences can be observed between AS W3 fibers from the two spinning dopes (FIGs. 11 and 12). In addition, in both conditions, birefringence increased following post-spin stretching, suggesting more alignment of protein molecules within PS fibers (FIGs. 11B and 12B).
[00211] FIGs. 11A & 11B provide micrographs depicting the birefringence of W3 fibers spun from HFIP/H2O spinning dope, visualized by polarized light microscopy. FIG. 11A depicts AS fibers and FIG. 11B depicts PS 4x fibers.
[00212] FIG. 12 provides micrographs depicting birefringence of W3 fibers spun from TFA/TFE/H2O spinning dope visualized by polarized light microscopy. FIG. 12A depicts AS fibers and FIG. 12B depicts PS 4x fibers.
[00213] (2) Fibers formed using hand-pulling method
[00214] Before the wet-spinning method was developed, fibers were produced by using a hand-pulling method. Although hand-pulling is a manual process and cannot be applied for large-scale fiber production, the mechanical properties of hand-pulled fibers can be a good indication of a given protein's potential mechanical properties prior to using a wet-spinning method. So far, we have tested the mechanical properties of W3, W2Cma2, H6-W3, H6-W4 and H6-W3Cma2 wet-spun fibers. We have tested the mechanical properties of more types of hand-pulled fibers and found two trends: (1) the bigger the protein, the better the mechanical properties of the fibers; and (2) proteins with the C-terminal non-repetitive domain (CTD) have better mechanical properties. We have tested the mechanical properties of W2, W3, W4, W2Cma1, W2Cma2 and W2Cac, and the best mechanical properties are from fibers formed by the W2Cac (SEQ ID NO: 2) protein construct. Since the automated wet-spinning method has been developed recently, we can easily test more types of wet-spun fibers and scale up for different applications.
[00215] Fiber surface morphology:
[00216] Fibers formed using the hand-pulling method are thin (diameter ~1.5-4 μm), homogeneous and smooth. FIG. 13 shows two examples of hand-pulled fibers formed by W3 and
[00217] As shown in FIGs. 13A & 13B, optical microscopy images of indicated fiber types (one in FIG. 13A and the other in FIG. 13B) are provided.
[00218] Fiber mechanical properties:
[00219] Adding a CTD significantly increased fiber strength and toughness (Table 5 and FIGs. 14 and 15). The average breaking strength of W2Cac fibers was ~2.6x, ~2.2x and 1.5x greater than those observed for W2, W3 and W4 fibers, respectively. The Young's modulus of W2Cac fiber was ~2-3x those of the W2, W3, and W4 fibers. Although adding a CTD did not significantly increase fiber extensibility, the average toughness observed for W2Cac fibers is more than double those for W2 and W3 fibers and ~1.2x that for W4 fibers.
[00220] Table 5. Summary of the Mechanical Data for Recombinant Hand-drawn Fibers and Native Fibers
[00221] As shown in FIG. 14, representative stress-strain curves of W3 and W2Cac fibers are shown.
[00222] As shown in FIG. 15, mechanical properties of indicated fibers are shown.
[00223] Protein secondary structure in solution and fiber:
[00224] Far-UV CD spectroscopy was used to compare secondary structure for each recombinant spidroin in solution. Each protein investigated exhibited nearly identical CD spectra (FIG. 16). Prominent negative bands at 208 and 220 nm, along with strongly positive ellipticity at 195 nm, indicate that all proteins contain significant α-helical content in solution. The differences in fiber mechanical properties therefore likely arise during or after fiber formation.
[00225] As shown in FIG. 16, far-UV circular dichroism spectroscopy of indicated protein in 50 mM potassium phosphate, pH 7.5, at 22 °C is exemplified.
[00226] In the transition from the soluble to fibrous state, silk protein usually undergoes a partial conversion from α-helix to β-sheet. Polarized Raman spectromicroscopy was applied to investigate protein composition, structure content and orientation in each type of fiber. In comparing intensity and shape of these bands, all fibers demonstrated very close agreement in amino acid composition.
[00227] Secondary structure composition of each fiber type was determined by amide I band decomposition of the orientation-insensitive spectra (FIG. 17A). Each fiber type demonstrates ~32-35% α-helical and ~26-28% β-sheet character (Table 6). This composition is highly consistent with aciniform silk fibers produced by Argiope aurantia, which contain ~33% α-helix and ~27% β-sheet.
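For illustration, once the amide I band has been decomposed into component bands, the content of each secondary structure can be expressed as the area of its band relative to the total fitted area. The sketch below assumes a five-band decomposition (α-helix, β-sheet and three further bands for random coil, turns, etc.) and uses hypothetical band areas; it does not reproduce the curve-fitting procedure itself, and the names are not taken from the analysis described herein.

```python
def band_fractions(areas):
    """Convert fitted band areas into percentages of the total amide I area."""
    total = sum(areas.values())
    return {name: 100.0 * area / total for name, area in areas.items()}


# Hypothetical fitted areas (arbitrary units) for a five-band decomposition.
fitted_areas = {
    "alpha_helix": 33.0,
    "beta_sheet": 27.0,
    "other_1": 16.0,
    "other_2": 14.0,
    "other_3": 10.0,
}

content = band_fractions(fitted_areas)
print(f"alpha-helix: {content['alpha_helix']:.0f}%")
print(f"beta-sheet: {content['beta_sheet']:.0f}%")
```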
[00228] As mentioned above, molecular orientation is also known to affect fiber mechanical properties, with greater β-sheet alignment typically being consistent with increased fiber strength. Recombinant and native aciniform silks were analyzed by polarized Raman spectromicroscopy recorded perpendicular (XX) and parallel (ZZ) to the long axis of the fibers. The overlay of amide I and amide III bands for each fiber in the ZZ and XX directions clearly demonstrates greater similarity in band shape and intensities between W2Cac fibers and native aciniform silk than between W3, W2Cma1 or W2Cma2 fibers and native silk (FIG. 17B). Across all fiber types, the Izz/Ixx ratio for the amide I α-helical band at 1658 cm⁻¹ was extremely similar (FIG. 17C). In contrast, the Izz/Ixx ratios for the amide III β-sheet band at 1240 cm⁻¹ were significantly higher for W2Cac and native fibers than the other fibers (FIG. 17C). Also notably, this ratio was statistically indistinguishable between W2Cac and native fibers. These results imply that α-helices are aligned highly similarly in all cases, while β-sheets show greater alignment in W2Cac and native fibers. The difference in mechanical properties between W2Cac and the other three recombinant fibers (W3, W2Cma1 and W2Cma2) may, therefore, arise from a greater degree of β-sheet alignment.
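The Izz/Ixx comparison can be sketched as follows, purely as an illustration. Each spectrum is represented as a mapping from wavenumber to intensity, the intensity nearest to the band position is read from the ZZ and XX spectra, and the ratio is averaged over replicate fibers. The replicate values and function names are hypothetical; only the band positions (1658 and 1240 cm⁻¹) come from the description above, and any baseline correction or normalization is assumed to have been applied beforehand.

```python
from statistics import mean


def intensity_at(spectrum, band_cm1):
    """Intensity of the sampled point closest to the requested band position."""
    nearest = min(spectrum, key=lambda wavenumber: abs(wavenumber - band_cm1))
    return spectrum[nearest]


def orientation_ratio(zz, xx, band_cm1):
    """Izz/Ixx at one band for a pair of polarized spectra."""
    return intensity_at(zz, band_cm1) / intensity_at(xx, band_cm1)


# Hypothetical replicate fibers: (ZZ spectrum, XX spectrum),
# each a dict of wavenumber (cm^-1) -> intensity (arbitrary units).
replicates = [
    ({1239.0: 1.50, 1658.0: 0.95}, {1239.0: 1.00, 1658.0: 1.00}),
    ({1241.0: 1.60, 1657.0: 1.05}, {1241.0: 1.00, 1657.0: 1.00}),
    ({1240.0: 1.40, 1659.0: 1.00}, {1240.0: 1.00, 1659.0: 1.00}),
]

beta_sheet_ratios = [orientation_ratio(zz, xx, 1240.0) for zz, xx in replicates]
alpha_helix_ratios = [orientation_ratio(zz, xx, 1658.0) for zz, xx in replicates]

print(f"amide III beta-sheet Izz/Ixx: {mean(beta_sheet_ratios):.2f}")
print(f"amide I alpha-helix Izz/Ixx: {mean(alpha_helix_ratios):.2f}")
```

Comparing these per-fiber ratios between fiber types would then require a statistical test of the kind underlying FIG. 17C, which this sketch does not attempt.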
[00229] FIGs. 17A-17C show the secondary structure and structural orientation of different types of fiber. FIG. 17A shows orientation-insensitive Raman spectra of the indicated fibers. Amide I decomposition was based on five bands: α-helix (red); β-sheet (purple); random coil, turns, etc. (3 x green). FIG. 17B provides an overlay of amide I and amide III bands of the indicated fibers from the XX (perpendicular) and ZZ (parallel) directions. FIG. 17C shows Izz/Ixx ratios of the α-helix and β-sheet bands indicated in FIG. 17B. The β-sheet structure of W2Cac and native aciniform silk is statistically more oriented, as indicated by "*".
[00230] Table 6. Content of α-helix and β-sheet in all four artificial and native aciniform silk fibers based upon Raman spectral decomposition of amide I band. The uncertainty on the values is ± 3% based on the reproducibility of the experiments and curve-fitting procedure.
[00231] OTHER EMBODIMENTS
[00232] It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the claims.
Claims
1. An isolated recombinant spider aciniform silk protein (SASP) comprising an amino acid sequence as set forth in SEQ ID NOs: 1-84 and 106-131, or any homolog thereof.
2. The isolated recombinant SASP according to claim 1, wherein the amino acid sequence of said protein comprises any one of SEQ ID NOs: 1-44 and 47-84.
3. The isolated recombinant SASP according to claim 1, wherein the amino acid sequence of said protein comprises SEQ ID NO: 2, 3, 5, 6, 37, 38, 45-84.
4. The isolated recombinant SASP according to claim 1, wherein the SASP comprises from about 100 to about 5,000 amino acid residues, said SASP comprising: at least one W subunit, wherein each W subunit ranges from about 150 to 250 amino acid residues in length, and at least one non-repetitive fragment selected from: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment; wherein, the N-terminal (NT) non-repetitive fragment ranges from about 100 amino acids to about 150 amino acid residues, derived from spider silk protein major ampullate spidroin 2 (MaSp2); and wherein the C-terminal (CT) non-repetitive fragment ranges from about 100 to about 150 amino acids derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (c) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2), wherein the at least one W subunit and C-terminal (CT) non-repetitive fragment have an amino acid sequence as shown in SEQ ID NO: 1-131.
5. The isolated recombinant SASP of claim 4, wherein the SASP is derived from SEQ ID NO: 2, 3, 5, 6, 37, 38 and 47-84.
6. An isolated recombinant spider aciniform silk protein (SASP) comprising a peptide having an amino acid sequence selected from the group consisting of SEQ ID NO: 15-44 and 47-84, or a homolog of any thereof, each of which is fused to a tag molecule.
7. The isolated recombinant SASP of claim 6, wherein the amino acid sequence of said protein comprises SEQ ID NO: 2, 3, 5, 6, 37, 38 and 47-84.
8. The isolated recombinant SASP of claim 6, wherein the spider is Argiope trifasciata, Argiope amoena, Euprosthenops australis, and Araneus diadematus.
9. The isolated recombinant SASP according to claim 1, wherein the SASP comprises an amino acid sequence as set forth in any one of SEQ ID NOs: 59-76 and 76-83.
10. An isolated nucleic acid molecule encoding the SASP or a homolog of claim 1 and naturally occurring allelic variants thereof.
11. An isolated nucleic acid molecule encoding the SASP or a homolog thereof of claim 6 and naturally occurring allelic variants thereof.
12. An expression vector comprising at least one of the nucleic acids of claim 1.
13. An expression vector comprising at least one of the nucleic acids of claim 6.
14. A recombinant host cell, comprising the nucleic acid molecule of claim 10.
15. A recombinant host cell, comprising the nucleic acid molecule of claim 11.
16. An artificially produced fiber comprising the SASP or a homolog thereof of claim 1.
17. An artificially produced fiber comprising the SASP or a homolog thereof of claim 6.
18. An artificially produced fiber, comprising one or more SASP selected from the group consisting of: a) SASP comprising a peptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-44 and 47-84, or a homolog of any thereof.
19. A method for producing a polymeric form of an isolated spider aciniform silk protein (SASP), comprising the steps of:
(i) providing a recombinant SASP comprising from about 150 to about 5000 amino acid residues, said SASP comprising: (a) at least one W subunit, each W subunit ranging from about 150 to 250 amino acids; (b) no N-terminal, or a N-terminal (NT) non-repetitive fragment ranging from about 100 to about 150 amino acid residues in length derived from spider silk protein major ampullate spidroin 2 (MaSp2) and (c) no C-terminal, or a C-terminal (CT) non-repetitive fragment ranging from about 100 to about 150 amino acid residues in length derived from at least one of a C-terminal fragment of the spider silk protein aciniform spidroin 1 (AcSp1), a C-terminal fragment of the spider silk protein major ampullate spidroin 1 (MaSp1) or a C-terminal fragment of the spider silk protein major ampullate spidroin 2 (MaSp2), wherein the at least one W subunit and optional C-terminal (CT) non-repetitive fragment have an amino acid sequence as shown in SEQ ID NO: 1-83 or a homolog; and (d) a purification tag; The said SASP can be expressed as one protein or as two fragments which are connected to intein sequences and joined together by intein splicing.
(ii) removing the purification tag from the recombinant SASP;
(iii) passing a solution containing the recombinant SASP through an affinity chromatography column; and
(iv) isolating the SASP from said affinity chromatography column.
20. The method according to claim 19, wherein the recombinant SASP comprises an amino acid sequence of SEQ ID NOs: 1-83.
21. The method according to claim 19, wherein the purification tag is glutathione-S-transferase (GST), c-Myc, biotin, streptavidin, Small Ubiquitin-like Modifier (SUMO), hexa-histidine (H6), H6-SUMO, Maltose Binding Protein (MBP), Thioredoxin (Trx), FLAG tag, streptavidin-binding peptide (SBP), calmodulin-binding peptide (CBP), S-tag, and hemagglutinin (HA).
22. The method according to claim 21, wherein the purification tag is Small Ubiquitin-like Modifier (SUMO), hexa-histidine (H6), or H6-SUMO.
23. A method for producing a spider silk fiber, the method comprising:
(i) providing a recombinant SASP comprising from about 150 to about 5,000 amino acid residues, said SASP comprising: at least one W subunit, wherein each W subunit ranges from about 150 to 250 amino acid residues in length, and at least one non-repetitive fragment selected from: (i) a non-repetitive N-terminal fragment and (ii) a non-repetitive C-terminal fragment; wherein, the N-terminal (NT) non-repetitive fragment ranges from about 100 amino acids to about 150 amino acid residues, derived from spider silk protein major ampullate spidroin 2 (MaSp2); and wherein the C-terminal (CT) non-repetitive fragment ranges from about 100 to about 150 amino acids derived from: (a) the C-terminal fragment (Cac) of the spider silk protein aciniform spidroin 1 (AcSp1), (b) a C-terminal fragment (Cma1) of the spider silk protein major ampullate spidroin 1 (MaSp1) or (c) a C-terminal fragment (Cma2) of the spider silk protein major ampullate spidroin 2 (MaSp2), wherein the at least one W subunit and optional C-terminal (CT) non-repetitive fragment have an amino acid sequence as shown in SEQ ID NO: 1-84; and (d) a purification tag;
(ii) removing the purification tag from the recombinant SASP;
(iii) passing a solution containing the recombinant SASP through an affinity chromatography column; and
(iv) isolating the SASP from said affinity chromatography column and
(v) wet-spinning the collected recombinant SASP.
24. The method of claim 23, wherein the recombinant SASP comprises an amino acid sequence as provided in any one of SEQ ID NO: 1-44 and 47-84.
25. The isolated recombinant SASP of claim 24, wherein the amino acid sequence of said recombinant SASP comprises any one of SEQ ID NO: 2, 3, 5, 6, 37, 38 and 47-84.
26. The method of claim 23, wherein wet-spinning the recombinant SASP comprises, dissolving the recombinant SASP in at least two solvents selected from the group consisting of water, phosphoric acid, acetic acid, formic acid, hydrochloric acid, sulfuric acid, nitric acid, hexafluoroisopropanol (HFIP), hexafluoropropanol (HFP), hexafluoroacetone (HFA), trifluoroacetic acid (TFA), trifluoroethanol (TFE), and methylimidazolium chloride, and wherein the recombinant silk protein is wet-spun at a rate of 0.3 to 20 mL/hr in at least one coagulation-inducing solvent selected from the group consisting of methanol, ethanol, isopropanol, acetone, ammonium sulfate, and water.
27. The method of claim 26, wherein the at least two solvents is selected from the group consisting of water, trifluoroacetic acid (TFA) and trifluoroethanol (TFE).
28. The method of claim 27, wherein the solvent is 70% HFIP/30% dH2O (v/v), or 40-60% TFA/20-40% TFE/20% dH2O.
29. The method of claim 23, further comprising drawing the spun recombinant silk protein.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662287564P | 2016-01-27 | 2016-01-27 | |
US62/287,564 | 2016-01-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017127940A1 (en) | 2017-08-03 |
Family
ID=59396933
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CA2017/050099 WO2017127940A1 (en) | 2016-01-27 | 2017-01-27 | Artificial spider aciniform silk proteins, methods of making and uses thereof |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2017127940A1 (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011063990A2 (en) * | 2009-11-30 | 2011-06-03 | Ludwig-Maximilians-Universität München | Silk particles for controlled and sustained delivery of compounds |
-
2017
- 2017-01-27 WO PCT/CA2017/050099 patent/WO2017127940A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
HAYASHI CY ET AL.: "Molecular and mechanical characterization of aciniform silk: uniformity of iterated sequence modules in a novel member of the spider silk fibroin gene family", MOL. BIOL. EVOL., vol. 21, no. 10, 2004, pages 1950 - 1959, XP055401913 * |
TREMBLAY ML ET AL.: "Characterizing aciniform silk repetitive domain backbone dynamics and hydrodynamic modularity", INT. J. MOL. SCI., vol. 17, no. 8, 10 August 2016 (2016-08-10), pages E1305, XP055401909 * |
TREMBLAY ML ET AL.: "Spider wrapping silk fibre architecture arising from its modular soluble protein precursor", SCI. REP., vol. 5, 26 June 2015 (2015-06-26), pages 11502, XP055401908 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111655913A (en) * | 2018-01-31 | 2020-09-11 | 丝芭博株式会社 | Method for producing protein fiber |
WO2019151429A1 (en) * | 2018-01-31 | 2019-08-08 | Spiber株式会社 | Method for manufacturing protein fiber |
JPWO2019151429A1 (en) * | 2018-01-31 | 2021-01-14 | Spiber株式会社 | Method for producing protein fiber |
JPWO2020022395A1 (en) * | 2018-07-25 | 2021-08-26 | Spiber株式会社 | Fibers for artificial hair, their manufacturing methods, and artificial hair |
JP7320790B2 (en) | 2018-07-25 | 2023-08-04 | Spiber株式会社 | Fiber for artificial hair, method for producing the same, and artificial hair |
US20210214404A1 (en) * | 2018-07-25 | 2021-07-15 | Spiber Inc. | Artificial Hair Fiber, Method for Manufacturing Same, and Artificial Hair |
WO2020022395A1 (en) * | 2018-07-25 | 2020-01-30 | Spiber株式会社 | Artificial hair fiber, method for manufacturing same, and artificial hair |
US11667682B2 (en) | 2018-12-13 | 2023-06-06 | Washington University | Split intein mediated protein polymerization for microbial production of materials |
WO2020158900A1 (en) * | 2019-01-31 | 2020-08-06 | 株式会社アデランス | Method for producing synthetic hair fibres, method for producing synthetic hair, synthetic hair fibres, and synthetic hair |
WO2021035184A1 (en) * | 2019-08-22 | 2021-02-25 | Bolt Threads, Inc. | Methods for improved extraction of spider silk proteins |
CN114269398A (en) * | 2019-09-16 | 2022-04-01 | 保尔特纺织品公司 | Method for isolating spider silk proteins by high shear solubilization |
WO2021055440A1 (en) * | 2019-09-16 | 2021-03-25 | Bolt Threads, Inc. | Methods for isolating spider silk proteins via high shear solubilization |
EP4218843A3 (en) * | 2019-09-16 | 2023-08-23 | Bolt Threads, Inc. | Methods for isolating spider silk proteins via high shear solubilization |
WO2022016052A1 (en) * | 2020-07-17 | 2022-01-20 | Kraig Biocraft Laboratories, Inc. | Synthesis of high molecular weight proteins using inteins |
WO2022104233A1 (en) * | 2020-11-13 | 2022-05-19 | Washington University | Microbial production of titin fibers with exceptional mechanical performance |
WO2022211939A3 (en) * | 2021-02-25 | 2023-01-05 | Washington University | Microbial production of polymeric amyloid fibers |
CN116064486A (en) * | 2021-10-20 | 2023-05-05 | 深圳市灵蛛科技有限公司 | Fusion protein, modified cellulose thereof and preparation method |
EP4361326A1 (en) * | 2022-10-31 | 2024-05-01 | Universidad de Sevilla | Method for generating a plurality of biocompatible fibres |
WO2024094556A1 (en) * | 2022-10-31 | 2024-05-10 | Universidad De Sevilla | Method for generating a plurality of biocompatible fibres |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17743535 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17743535 Country of ref document: EP Kind code of ref document: A1 |