CA2410974A1 - Polypeptides cleaving telomeric repeats - Google Patents

Info

Publication number
CA2410974A1
Authority
CA
Canada
Prior art keywords
leu
lys
asp
arg
glu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002410974A
Other languages
French (fr)
Inventor
Haruhiko Fujiwara
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dnavec Research Inc
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241 Nucleotidyltransferases (2.7.7)
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Microbiology (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Peptides Or Proteins (AREA)
  • Enzymes And Modification Thereof (AREA)

Abstract

DNAs encoding polypeptides which cleave telomeric repeats are successfully isolated from retrotransposon-like sequences present in the telomeric repeats of genomic DNA. The polypeptides encoded by the above DNAs show an activity of cleaving telomeric repeats of insects and vertebrates. Thus, polypeptides cleaving telomeric repeats, DNAs encoding these polypeptides and the production and use thereof are provided. These polypeptides are usable in specifically cleaving telomeric repeats. Moreover, these polypeptides are applicable to the control of cell aging and treatment of cancer.

Description

DESCRIPTION
POLYPEPTIDES THAT CLEAVE TELOMERIC REPEAT SEQUENCES
Technical Field
The present invention relates to polypeptides that cleave chromosomal-end (telomere) sequences and DNAs encoding them.
Background Art
The structures that form the termini of eukaryotic chromosomes are called "telomeres." Sequences at chromosomal ends tend to become lost due to the failure of DNA polymerase to replicate the very ends of DNA. Telomeres have a function of protecting the chromosomes from losing their ends (Watson, J. D., 1972, Nat. New Biol. 239:197-201).
Telomeric DNAs of most eukaryotes are composed of tandem arrays of 5 to 8 bp repetitive units, which are elongated by the reverse transcriptase called telomerase (Blackburn, E. H., 1994, Cell 77:621-623; Lingner, J. et al., 1995, Science 269:1533-1534).
The length of a telomere is presumed to be closely associated with senescence. The length of a human telomere has been estimated to fall in the range of approximately several kb to a dozen kb. The telomeres in the chromosomes of adult or elderly tissues are shorter than those of fetuses or newborns, and thus the telomere tends to become shorter with age. The telomere protects various genes in the chromosome. Hence, there is a possibility that telomere shortening results in impaired chromosomal stability and interferes with normal gene expression.
The telomere is also associated with cancer. Telomerase, which is an enzyme that elongates the telomeres, is not expressed or is barely expressed in cells of normal tissues that are not actively dividing. However, cancer cells proliferate unlimitedly, and hence the cells must express telomerase that compensates for telomere shortening associated with cell proliferation. Indeed, various cancer cells have been found to express telomerase (Shay, J. W. and Bacchetti, S., 1997, Eur. J. Cancer 33:787-791; Kim, N. W. et al., 1994, Science 266:2011-2015). Thus, the induction of telomerase expression may play key roles in oncogenesis and advancement of cancer.
Based on this hypothesis, studies are now in progress to develop anti-cancer agents which target telomerase (McKenzie, K. L. et al., 1999, Mol. Med. Today 5:114-22; Lichtsteiner, S. P. et al., 1999, Ann. N. Y. Acad. Sci. 886:1-11; Oulton, R. and Harrington, L., 2000, Curr. Opin. Oncol. 12:74-81; Hahn et al., 1999, Nat. Med. 5, 1164; Zhang, X. et al., 1999, Genes Dev. 13, 2388-2399). However, even with the inhibition of telomerase, immediate anti-cancer effects cannot be expected, because it takes some time before the existing telomere sequence is shortened in cells which originally contain a long telomeric repeat sequence. In addition, cells are known to have a telomerase-independent telomere-maintaining mechanism (alternative lengthening of telomeres; ALT) (Bryan, T. M. et al., 1997, Nature Med. 3:1271-1274). Thus, there is a possibility that telomere shortening may not be effectively achieved by inhibiting telomerase.
The telomere of the silkworm Bombyx mori comprises the (TTAGG)n repeat, which is conserved widely among insects (Okazaki, S. et al., 1993, Mol. Cell. Biol. 13:1424-1432; Sahara, K. F. et al., 1999, Chromosome Res. 7:449-460). In addition, the Bombyx telomeres harbor telomeric repeat-associated retrotransposons, such as TRAS1 and SART1. TRAS and SART are classified into families of non-long-terminal-repeat (non-LTR) retrotransposons, which have poly(A) tracts at the 3'-ends, and are inserted into the (TTAGG)n telomeric repeat in opposite directions (Okazaki, S. et al., 1995, Mol. Cell. Biol. 15:4545-4552; Takahashi, H. et al., 1997, Nucleic Acids Res. 25:1578-1584). A member of these families, TRAS1, has a poly(A) tail inserted facing the centromere. TRAS1 is a unique retrotransposon that exists at a specific position within the telomeric repeat. The structure of TRAS1 is highly conserved among most of the 300-odd copies present in the telomeres of a chromosome (Okazaki, S. et al., 1995, Mol. Cell. Biol. 15:4545-4552; accession number D38414).
Non-LTR retrotransposons make up a very abundant family and are widely distributed throughout the genome in most eukaryotes (Gabriel, A. and Boeke, J. D., 1993, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N. Y.). Some of them, however, are inserted within specific sites on the chromosome. For example, Tx1 of Xenopus laevis is inserted within another transposable element (Garrett, J. F. et al., 1989, Mol. Cell. Biol. 9:3018-3027). CRE1, SLACS and CZAR are found in spliced leader exons of trypanosomes (Aksoy, S. et al., 1990, Nucleic Acids Res. 18:785-792; Gabriel, A. et al., 1990, Mol. Cell. Biol. 10:615-624; Villanueva, M. S. et al., 1991, Mol. Cell. Biol. 11:6139-6148). R1 and R2 are located at specific sites of 28S rDNA in most insects (Eickbush, T. H., and B. Robins, 1985, EMBO J. 4:2281-2285; Fujiwara, H. et al., 1984, Nucleic Acids Res. 12:6861-6869; Jakubczak, J. L. et al., 1991, Proc. Natl. Acad. Sci. USA 88:3295-3299). In two mosquito species, RT1 and RT2 are inserted at the same position about 630 bp downstream of the R1 insertion site (Besansky, N. et al., 1992, Mol. Cell. Biol. 12:5102-5110; Paskewitz, S. M., and F. H. Collins, 1989, Nucleic Acids Res. 17:8125-8133).
Recent reports have shown that two different types of endonucleases encoded in non-LTR retrotransposons function to cleave target site DNA. R2 and some other families such as CRE1 and CZAR have a single open reading frame (ORF) that encodes an endonuclease domain near its C-terminal end. This type of endonuclease is similar in some amino acid residue motifs to those seen in various prokaryotic restriction endonucleases (Yang, J. et al., 1999, Proc. Natl. Acad. Sci. USA 96:7847-7852). The R2 protein first makes a specific nick on the non-coding bottom strand (i.e. the antisense strand of the integrated retrotransposon). Reverse transcription of the RNA template is then primed by the exposed 3'-hydroxyl group, a process termed target-primed reverse transcription (TPRT), before cleavage of the top strand (i.e. the sense strand of the integrated retrotransposon) (Luan, D. D. et al., 1993, Cell 72:595-605). The second type of endonuclease is an apurinic/apyrimidinic (AP) endonuclease-like domain, encoded in the N-terminus of ORF2 of many non-LTR retrotransposon groups, such as L1, R1 and Tx1L, which have two ORFs (Cost, G. J., and J. D. Boeke, 1998, Biochemistry 37:18081-18093; Feng, Q. et al., 1996, Cell 87:905-916; Christensen, S. et al., 2000, Mol. Cell. Biol. 20:1219-1226; Feng, Q. et al., 1998, Proc. Natl. Acad. Sci. USA 95:2033-2088; Malik, H. S. et al., 1999, Mol. Biol. Evol. 16:793-805).
However, there is no proof that the retrotransposon-like sequences present in telomere sequences, including those belonging to the TRAS family and the SART family, are transposable. In addition, it remains to be clarified whether the sequences encode proteins that are actually associated with the transposing function.
Recently, it was reported that human topoisomerase II cleaves the tandem repeats of telomeric DNA (TTAGGG)n only in the presence of topoisomerase II poison (Moon, H. J. et al., 1998, Biochim. Biophys. Acta. 1395:110-120). Topoisomerase II is known as a double-strand endonuclease. This enzyme also cleaves sequences other than telomeric repeats, and the consensus sequences cleaved by the enzyme include a broad range of sequences (Spitzner, J. R. et al., 1995, Mol. Pharmacol. 48:238-249).
Disclosure of the Invention
An objective of the present invention is to provide polypeptides cleaving telomeric repeat sequences. Another objective of the present invention is to provide DNAs encoding these polypeptides. Still another objective of the present invention is to provide methods for producing these polypeptides and DNAs and uses of these polypeptides and DNAs.
The present inventor isolated several retrotransposon-like sequences present adjacent to the telomeric repeat sequences of silkworm and other insects. These sequences, named "the TRAS family", exhibit significant homology to one another. The present inventor identified an apurinic/apyrimidinic (AP) endonuclease (EN)-like domain in the TRAS family. Then, only this domain was specifically amplified by PCR from a TRAS member belonging to the family, and inserted into an expression vector. The vector was introduced into E. coli to express and purify a polypeptide comprising the EN domain (TRAS1 EN). Then, whether the purified polypeptide cleaved the sequence of the insect telomeric repeat 5'-(TTAGG)n/5'-(CCTAA)n was examined. The result demonstrated that the TRAS1 EN polypeptide interacted with a DNA sequence at a particular position upstream of the 5'-(TTAGG)n strand (also referred to as the "G strand" or "bottom strand") and had an activity of nicking every 5 bp in the insect telomeric repeat sequence arranged in tandem. In addition, the polypeptide showed an activity of nicking the 5'-(CCTAA)n strand (also referred to as the "C strand" or "top strand") in a site-specific fashion. The cleavage site of TRAS1 EN was examined using as a substrate a DNA having long non-telomeric repeat sequences at both ends of a short telomeric repeat sequence. As a result, the TRAS1 EN polypeptide was found to specifically cleave the telomeric repeat sequence.
TRAS1 EN exhibited an activity of cleaving 5'-(CTAGG)n/5'-(CCTAG)n and 5'-(TCAGG)n/5'-(CCTGA)n, which are similar to the silkworm telomeric repeat sequence 5'-(TTAGG)n/5'-(CCTAA)n and which have a single nucleotide substitution T->C conserving purine and pyrimidine, although the cleaving specificity was lower compared to that of 5'-(TTAGG)n/5'-(CCTAA)n. However, TRAS1 EN showed almost no cleavage activity to sequences with other substitutions, namely 5'-(TTCGG)n/5'-(CCGAA)n, 5'-(TTACG)n/5'-(CGTAA)n, and 5'-(TTAGC)n/5'-(GCTAA)n. In addition, TRAS1 EN did not cleave the AP DNA (Torres-Ramos, C. A. et al., 2000, Mol. Cell. Biol. 20:3522-3528) (data not shown). Thus, TRAS1 EN has an activity to specifically cleave a telomeric repeat sequence. Further, surprisingly, TRAS1 EN showed an activity to cleave not only the telomeric repeat sequence 5'-(TTAGG)n/5'-(CCTAA)n conserved among arthropods, but also the telomeric repeat sequence 5'-(TTAGGG)n/5'-(CCCTAA)n conserved among vertebrates. Thus, the present inventor succeeded for the first time in isolating a polypeptide having an activity to specifically cleave a number of the eukaryotic telomeric repeat sequences. The present invention also demonstrates for the first time that the expression product of a particular fragment of DNA derived from a retrotransposon-like sequence present in a telomeric repeat sequence has the activity of cleaving a eukaryotic telomeric repeat sequence.
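Each substrate above is written as a G-strand unit paired with its reverse complement on the C strand. As a minimal sketch (the function names and the `TESTED_UNITS` list are illustrative assumptions, not part of the disclosure), the pairs and the number of substitutions relative to the silkworm unit TTAGG can be derived as follows:

```python
# Illustrative helper names; only the repeat units come from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def strand_pair(unit: str) -> str:
    """Format a repeat unit in the 5'-(G strand)n/5'-(C strand)n notation."""
    return f"5'-({unit})n/5'-({reverse_complement(unit)})n"

def mismatches(unit: str, reference: str = "TTAGG") -> int:
    """Count substituted positions between two units of equal length."""
    if len(unit) != len(reference):
        raise ValueError("units must have the same length")
    return sum(a != b for a, b in zip(unit.upper(), reference.upper()))

# G-strand units of the five-base substrates mentioned in the text.
TESTED_UNITS = ["TTAGG", "CTAGG", "TCAGG", "TTCGG", "TTACG", "TTAGC"]

if __name__ == "__main__":
    for unit in TESTED_UNITS:
        print(strand_pair(unit), "substitutions vs TTAGG:", mismatches(unit))
```

The sketch only checks strand notation and substitution counts; it says nothing about cleavage efficiency, which the text reports as an experimental result.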
In general, during retrotransposition by non-LTR-type retroelements, it is thought that the first-strand cleavage primes reverse transcription of the RNA template, which is followed by the second-strand cleavage on the top strand. For site-specific integration, retrotransposons have to make a precise nick on the top strand. However, it was still unclear whether the AP endonuclease-like protein (EN) encoded by site-specific retrotransposons is responsible for the second-strand cleavage reaction. Although R1 EN was shown to cleave its target 28S rDNA sequence at a precise site on the bottom strand, top-strand cleavage seemed to be less specific, with a few nonspecific products (Feng, Q. et al., 1998, Proc. Natl. Acad. Sci. USA 95:2033-2088). Similarly, overlapping of target sites due to site-specific cleavage of the top strand has not been clearly observed for Tx1L EN so far (Christensen, S. et al., 2000, Mol. Cell. Biol. 20:1219-1226). The present invention reveals that the EN polypeptide from the site-specific non-LTR retrotransposon TRAS1 cleaves both strands of double-stranded DNA, wherein the site-specific cleavage on the second strand is also executed by the EN polypeptide. The present invention also demonstrates that the polypeptide of the invention is capable of carrying out the second-strand cleavage even in the absence of the RNA template from which the polypeptide was derived or the reverse transcription reaction. This is in striking contrast to a site-specific endonuclease encoded in the C-terminal region of the ORF in the R2 element, which requires an RNA transcript for cleavage of the second DNA strand (Luan, D. D. and T. H. Eickbush, 1996, Mol. Cell. Biol. 16:4726-4734; Yang, J. H. and T. H. Eickbush, 1998, Mol. Cell. Biol. 18:3455-3465).
The polypeptides of the present invention and the DNAs encoding the polypeptides can be used as reagents and pharmaceuticals for specifically cleaving a telomeric repeat sequence. As described above, the loss of the telomeric repeat sequence is associated with physiological phenomena including suppression of cell proliferation, chromosome destabilization, cell death such as apoptosis, malignant transformation, and senescence. Chromosome destabilization and cell senescence can be induced by expressing a polypeptide of the present invention to cleave the telomeric repeat sequence in cells. Such cells can be used as a model for senescence, and thus are useful to develop preventives for senescence and methods for preventing senescence. Further, since the maintenance of the telomeric repeat sequence plays a key role in the growth of cancer cells, the use of polypeptides of the present invention against cancer cells can result in the suppression of cancer cell proliferation or induction of cell death in cancer cells due to loss of the telomeric repeat sequence. Thus, the present invention allows a novel anti-cancer gene therapy.
The present invention relates to polypeptides cleaving telomeric repeat sequences and DNAs encoding the polypeptides, and production and use thereof, and more specifically relates to:
(1) a DNA derived from a retrotransposon-like sequence present adjacent to a telomeric repeat sequence, wherein the DNA encodes a polypeptide having an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence;
(2) the DNA of (1), which is derived from the genomic DNA of an insect;
(3) a DNA selected from the group consisting of (a) to (e):
(a) a DNA comprising a nucleotide sequence selected from the group consisting of SEQ ID NOs: 38, 42, 45, 48, 51, 54, 57 and 60;
(b) a DNA encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 36, 37, 39, 40, 41, 43, 44, 46, 47, 49, 50, 52, 53, 55, 56, 58, 59, 61 and 62;
(c) a DNA encoding a polypeptide that (i) comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62, wherein one or more amino acids have been substituted, deleted, inserted and/or added, and (ii) has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence;
(d) a DNA encoding a polypeptide that (i) is encoded by a DNA hybridizing to a DNA encoding an amino acid sequence selected from the group consisting of SEQ ID NOs: 36, 41, 44, 47, 50, 53, 56, 59 and 62, or a partial sequence thereof, and (ii) has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence; and
(e) a DNA encoding a fusion polypeptide between the polypeptide encoded by a DNA according to any one of (a) to (d) and another polypeptide;
(4) a DNA of any one of (1) to (3), which is used to cleave a telomeric repeat sequence in a DNA;
(5) the DNA of (4), wherein the telomeric repeat sequence is a repetition of the sequence (5'-TTAGGG/5'-CCCTAA);
(6) a polypeptide encoded by a DNA according to any one of (1) to (5);
(7) the polypeptide of (6), which is used to cleave a telomeric repeat sequence in a DNA;
(8) the polypeptide of (7), wherein the telomeric repeat sequence is a repetition of the sequence (5'-TTAGGG/5'-CCCTAA);
(9) a vector comprising a DNA of any one of (1) to (5), or a transcriptional product thereof;
(10) a host cell comprising the vector of (9);
(11) a method for producing a polypeptide of any one of (6) to (8), wherein the method comprises the steps of culturing the host cell of (10) and recovering the expressed polypeptide from the host cell, or culture supernatant thereof;
(12) a partial polypeptide of a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39, 40, 43, 46, 49, 52, 55, 58 and 61;
(13) a DNA encoding the partial polypeptide of (12);
(14) a polynucleotide comprising at least 15 nucleotides complementary to a nucleotide sequence selected from SEQ ID NOs: 15 to 28, 38, 42, 45, 48, 51, 54, 57 and 60 or the complementary strand thereof;
(15) a DNA or polynucleotide of any one of (1) to (3), (13) or (14), which is used to isolate a DNA encoding a polypeptide having an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence;
(16) a method for cleaving a telomeric repeat sequence using a polypeptide of any one of (6) to (8), wherein the method comprises the step of contacting the polypeptide with a DNA comprising the telomeric repeat sequence;
(17) a method for cleaving a telomeric repeat sequence using a DNA of any one of (1) to (5), wherein the method comprises the step of expressing the DNA of any one of (1) to (5) under conditions in which the expression product can come into contact with the DNA comprising the telomeric repeat sequence;
(18) the method of (16) or (17), wherein the telomeric repeat sequence is a repetition of the sequence (5'-TTAGGG/5'-CCCTAA);
(19) a transgenic nonhuman vertebrate that harbors a foreign DNA of any one of (1) to (5) in an expressible manner;
(20) the transgenic nonhuman vertebrate according to (19), which is a mouse or rat;
(21) an agent for cleaving a DNA containing a telomeric repeat sequence, wherein the agent comprises as an active ingredient a DNA of any one of (1) to (5), or the vector of (9);
(22) an agent for cleaving a DNA containing a telomeric repeat sequence, wherein the agent comprises as an active ingredient a polypeptide of any one of (6) to (8);
(23) the agent of (21) or (22), wherein the telomeric repeat sequence is a repetition of the sequence (5'-TTAGGG/5'-CCCTAA);
(24) a method for isolating a DNA encoding a polypeptide having an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence, wherein the method comprises the step of screening for a polynucleotide that hybridizes with at least one probe selected from the DNAs or polynucleotides of (15); and
(25) a method for isolating a DNA encoding a polypeptide having an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence, wherein the method comprises the step of amplifying a DNA using as a primer at least one of the DNAs or polynucleotides of (15).
As used herein, the term "telomeric repeat sequence" refers to the repetitive sequence present at chromosomal ends (telomeres) of genomic DNA of eukaryotes. The telomeric repeat sequence is highly conserved throughout the eukaryotic species, and typically comprises a sequence of tandem repeats of a characteristic unit in which one strand contains T and G abundantly and the other A and C abundantly. For example, major units of the telomeric repeat sequence that are highly conserved are: [5'-TTGGGG/5'-CCCCAA] in Tetrahymena, which is a protist; [5'-TCTGGGTG(TG)1-3/5'-(CA)1-3CACCCAGA] and others in yeast (Saccharomyces) (Cohn, M. et al., 1998, Curr. Genet. 33:83-91); [5'-TTAGGC/5'-GCCTAA] in nematodes; [5'-TTTAGGG/5'-CCCTAAA] in plants; [5'-TTAGG/5'-CCTAA] in insects; and [5'-TTAGGG/5'-CCCTAA] in vertebrates ("Telomeres" (Eds. E. H. Blackburn and C. W. Greider) CSHL Press, 1995, Chapter 2 by E. Henderson "Telomere DNA structure" pp. 11-34; "The Telomere" by D. Kipling, Oxford Univ. Press, 1995, Chapter 3 "Telomere structure" 31-69; Zakian, V. A. (1995) "Telomeres: beginning to understand the end". Science, 270, 1601-1607). There are some variant telomeric repeat sequences in addition to the major repeat unit sequences. The telomeric repeat sequences of the present invention can include such variants. Repetitive sequences comprising different types of units in combination are included in the telomeric repeat sequences of the present invention. In the present invention, preferred telomeric repeat sequences are those derived from a vertebrate or an arthropod. In the present invention, telomeric repeat sequences derived from a human or a silkworm are more preferred. Alternatively, in the present invention, preferred telomeric repeat sequences are the sequences of the [5'-TTAGGG/5'-CCCTAA] repeat and the [5'-TTAGG/5'-CCTAA] repeat.
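The major repeat units listed above lend themselves to a simple lookup. The following is a minimal sketch, assuming exact tandem copies on the G strand only; the dictionary labels and function names are illustrative, and the variable yeast unit and variant repeats are deliberately left out:

```python
import re

# G-strand units taken from the definition above; the labels are illustrative.
TELOMERIC_UNITS = {
    "Tetrahymena": "TTGGGG",
    "nematode": "TTAGGC",
    "plant": "TTTAGGG",
    "insect": "TTAGG",
    "vertebrate": "TTAGGG",
}

def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of consecutive exact copies of unit in seq."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)

def best_matching_unit(seq: str) -> str:
    """Return the label whose unit forms the longest exact tandem run in seq."""
    return max(
        TELOMERIC_UNITS,
        key=lambda name: longest_tandem_run(seq, TELOMERIC_UNITS[name]),
    )

if __name__ == "__main__":
    sample = "AC" + "TTAGG" * 4 + "AC"
    print(longest_tandem_run(sample, "TTAGG"), best_matching_unit(sample))
```

Because only exact copies are counted, a sequence carrying variant units would need a more tolerant matcher than this one.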
As used herein, the term "cleavage of DNA" includes cleavage of double-stranded DNA as well as single-stranded DNA. The term "cleavage of double-stranded DNA" includes cleavage of one strand in the double-stranded DNA and also cleavage of both strands. For example, in the present invention, a nick generated in a double-stranded DNA is also referred to as "cleavage of DNA".
As used herein, the term "isolated" refers to being placed apart from an original environment. When a substance has been separated from the whole or part of a co-existing material in a natural system, it can be said to be "isolated". For example, unlike the natural genomic DNA in organisms, an isolated DNA does not maintain the original continuity. Artificially synthesized DNAs and polypeptides are also considered as "isolated". For example, a product amplified by a polymerase chain reaction is also considered as "isolated". Isolated DNAs can be contained in a vector. Further, an isolated DNA or polypeptide can be a component of a composition. Even when such a vector or composition is returned to the original natural environment, it is considered as "isolated".
The term "recombinant DNA" refers to DNA comprising nucleotide sequences that are arranged in a manner different to that found in nature. Such recombinant DNAs can be produced by genetic recombination techniques. A DNA comprising an isolated DNA is a recombinant DNA. The term "recombinant polypeptide" refers to a polypeptide that has been produced in a form different from the natural form. An artificially synthesized polypeptide is a recombinant polypeptide. A polypeptide expressed from a recombinant DNA is also a recombinant polypeptide.
"Polypeptide" refers to any peptide or protein comprising two or more amino acids linked by peptide bonds or modified peptide bonds. "Polypeptides" may include peptide isosteres. "Polypeptide" refers to both short chains, commonly referred to as peptides, oligopeptides or oligomers, and to longer chains, generally referred to as proteins. "Polypeptides" may be modified either by natural processes, such as posttranslational processing, or by artificial techniques. Such modifications include those in the peptide backbone, the amino acid side-chains and the amino or carboxyl termini, etc. Polypeptides may be branched, or they may be cyclic. Examples of modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment (such as of flavin, a nucleotide, a nucleotide derivative, a lipid, a lipid derivative, or phosphatidylinositol), cross-linking, cyclization, disulfide bond formation, demethylation, formation of pyroglutamate, gamma-carboxylation, glycosylation, hydroxylation, iodination, methylation, myristoylation, oxidation, phosphorylation, ubiquitination, etc. (Creighton, T. E., "Proteins - Structure and Molecular Properties", 2nd ed., W. H. Freeman and Company, New York (1993)).
As used herein, the term "DNA derived from" a DNA refers to a DNA whose material or informational origin is the DNA from which it is derived (original DNA). A DNA comprising a sequence identical to the original DNA or a DNA whose sequence has been modified by introducing a mutation into the original DNA is also a DNA derived from the original DNA. A DNA obtained by adding another DNA sequence to the original DNA or a partial sequence thereof is also a DNA derived from the original DNA. A DNA derived from an original DNA can be prepared by introducing one or more nucleotide substitutions, deletions, insertions and/or additions to the nucleotide sequence of the original DNA. The replication product of the original DNA is a DNA derived from the original DNA. These can be DNAs amplified by a vector system such as a plasmid, phage or virus, or alternatively can be DNAs synthesized in vitro by the polymerase chain reaction (PCR) method, or the like. For example, a mutant DNA produced by PCR using an original DNA as the template under particular conditions is a DNA derived from the original DNA. A DNA artificially prepared based on the sequence information of the original DNA is also a DNA derived from the original DNA.
The present invention is illustrated in detail below. All publications and other references cited herein are incorporated by reference.
The present invention provides polypeptides having an activity to cleave double-stranded DNAs comprising telomeric repeat sequences and DNAs encoding the polypeptides, in which the DNAs are derived from a sequence adjacent to a telomeric repeat sequence. The present inventor succeeded in isolating a polypeptide having an activity to specifically cleave a eukaryotic telomeric repeat sequence and a DNA encoding the polypeptide from a retrotransposon-like sequence linked to the telomeric repeat sequence. The polypeptide named TRAS1 EN exhibited an activity to specifically cleave a number of eukaryotic telomeric repeat sequences. A polypeptide of the present invention and the DNA encoding the polypeptide are useful to cleave a telomeric repeat sequence.
The term "sequence adjacent to a telomeric repeat sequence" refers to a sequence linked directly to the telomeric repeat sequence or a sequence present near the telomeric repeat sequence. In general, a sequence located within 50 kb, preferably within 30 kb, from a telomeric repeat sequence in the genome is said to be adjacent to the telomeric repeat sequence.
In the present invention, a sequence adjacent to a telomeric repeat sequence is preferably a sequence present in the telomeric repeat sequence. The term "present in the telomeric repeat sequence" refers to a sequence flanked by a telomeric repeat sequence on both sides. Such a sequence is typically a sequence flanked by a telomeric repeat sequence on both sides, with a size that is 50 kb or less, more preferably 30 kb or less, even more preferably ~S kb or less. There is no limitation on the length of the flanking telomeric repeat sequence; preferably, a telomeric repeat sequence consisting of 10 nucleotides or more is linked directly to at least one end; more preferably, a telomeric repeat sequence consisting of 15 nucleotides or more, and even more preferably 20 nucleotides or more, is linked directly to at least one end. The telomeric repeat sequences on both sides should be arranged in the same orientation.
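The "present in the telomeric repeat sequence" criterion can be expressed as a small search over a sequence. This is a minimal sketch under simplifying assumptions: the function name and parameters are hypothetical, only exact repeat copies are recognized, and it is stricter than the text in requiring a repeat run of at least `min_flank_nt` nucleotides on both sides rather than on at least one end. Searching for the same unit on the same strand on both sides enforces the same-orientation condition.

```python
import math
import re

def flanked_segments(seq, unit="TTAGG", min_flank_nt=10, max_size=50_000):
    """Return (start, end) spans of segments flanked on both sides by
    tandem copies of unit in the same orientation.

    Hypothetical helper: exact matches only, and both flanks must be at
    least min_flank_nt long (a stricter rule than the text's).
    """
    copies = math.ceil(min_flank_nt / len(unit))
    run = re.compile(f"(?:{re.escape(unit)}){{{copies},}}")
    runs = [m.span() for m in run.finditer(seq.upper())]
    segments = []
    # Each pair of neighbouring repeat runs brackets one candidate segment.
    for (_, left_end), (right_start, _) in zip(runs, runs[1:]):
        if 0 < right_start - left_end <= max_size:
            segments.append((left_end, right_start))
    return segments

if __name__ == "__main__":
    demo = "TTAGG" * 3 + "ACGTACGTAC" + "TTAGG" * 2
    print(flanked_segments(demo))
```

A flank in the opposite orientation (for example CCTAA copies on one side) is not reported, since it does not match the unit on the searched strand.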
A method for isolating such DNAs comprises the step of screening a genomic library of a eukaryote using a telomeric repeat sequence as a probe. There is no limitation on the type of genomic library; for example, a lambda phage genomic library can be used. When a λ phage library into which restriction fragments of genomic DNA have been inserted is used, each clone typically contains a genomic fragment of approximately several kb to 20 kb. Thus, such a library provides a DNA adjacent to a telomeric repeat sequence, the length of which falls within the above-mentioned range. A sequence present away from a telomeric repeat sequence can be obtained by re-screening the library using as a probe a portion of the sequence obtained first.
The amino acid sequence encoded by an open reading frame in the sequence can be deduced by determining the DNA sequence of the insert fragment in the isolated clone. For example, a DNA encoding a polypeptide having an activity to cleave a telomeric repeat sequence can be obtained by identifying a DNA region encoding the amino acid sequence in a range homologous to an EN polypeptide by comparing the deduced amino acid sequence and the amino acid sequence of an EN polypeptide of the TRAS family (for example, SEQ ID NO: 36). Such a comparison with TRAS EN can be carried out by a known method using a computer program. The comparison can be carried out simply, for example, by BLAST (Karlin, S. and S. F. Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2264-68; Karlin, S. and S. F. Altschul, 1993, Proc. Natl. Acad. Sci. USA 90:5873-7) or CLUSTAL W (Thompson, J. D. et al., 1994, Nucleic Acids Res. 22:4673-80) described below. When the DNA region is specified, then a DNA encoding a polypeptide cleaving a telomeric repeat sequence can be prepared by amplifying the region by PCR.
A polypeptide of the present invention can be prepared by inserting the DNA into an expression vector and expressing the polypeptide.
By the procedure described above, a DNA encoding a polypeptide having an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence can be prepared from a portion of a retrotransposon-like sequence adjacent to a telomeric repeat sequence. The term "retrotransposon-like sequence" refers to retrotransposon sequences and sequences exhibiting a structural similarity to known retrotransposons. Retrotransposons are sometimes referred to as "retroposons" or "retroelements" (retrofactors or retrosequences). A retrotransposon-like sequence includes, regardless of the presence or absence of transposing activity, a sequence that has presumably functioned as a retrotransposon previously, but has recently degenerated, and a sequence derived from a retrotransposon such as a portion of a retrotransposon. Retrotransposons are categorized roughly into two groups: LTR type containing a long repeat sequence at its end (long terminal repeat; LTR); and non-LTR type without such a repeat. Each contains an open reading frame (ORF) characteristic of a retrotransposon in its sequence (Malik, H. S. et al., 1999, Mol. Biol. Evol. 16:793-805). The non-LTR type retrotransposon sometimes has a poly(A) sequence at its end. Retroposons are similar to one another in the amino acid sequence encoded by the ORF. Based on these characteristics, it can be determined whether a sequence of interest is a retrotransposon. Known retrotransposons include, for example, the elements described by Malik, H. S. et al. (1999, Mol. Biol. Evol. 16:793-805) and Xiong, Y. and Eickbush, T. H. (1988, Mol. Biol. Evol. 5:675-690). Retrotransposon-like sequences of the present invention include nucleotide sequences of known retrotransposons, including the retrotransposons described in the above references, or sequences having significant homology to the amino acid sequence of polypeptides encoded by any ORF of these retrotransposons.
A retrotransposon sometimes has a single open reading frame and sometimes multiple open reading frames. In general, there are ORFs encoding gag-like polypeptides and ORFs encoding pol-like polypeptides; a pol-like ORF encodes a polypeptide comprising a relatively highly conserved amino acid sequence similar to that of reverse transcriptase. When comparing amino acid sequences, for example, an amino acid sequence of interest is compared with any of the amino acid sequences of polypeptides encoded by known retrotransposons, and whether or not the sequence of interest has a significant homology with the known retrotransposon is determined to assess whether the sequence of interest is a retrotransposon-like sequence. When one intends to compare nucleotide sequences, for example, the nucleotide sequence of the ORF region or untranslated region of a DNA sequence of interest is compared with the nucleotide sequences of known retrotransposons to evaluate whether the sequences have a significant homology. For example, it can be safe to conclude that a sequence has significant homology when the E value (Expect value) to the nucleotide sequence or amino acid sequence of a known retrotransposon is typically about 1 or less, preferably about 0.5 or less, more preferably about 0.1 or less, in the assessment by BLAST (Altschul, S. F. et al. (1997) Nucleic Acids Res. 25:3389-3402) described below. It is also possible to examine whether the sequence is a retrotransposon-like sequence by phylogenetic analysis. If phylogenetic analysis reveals that in a phylogenetic tree prepared using a DNA sequence of interest (or the amino acid sequence encoded by an ORF), the sequence is located on a branch corresponding to sequences comprising any one of the known retrotransposons and/or retrotransposon-like sequences, but not non-retrotransposon sequences (for example, reverse transcriptase and cellular DNA/RNA polymerase of retrovirus), the sequence of interest is predicted to be a retrotransposon-like sequence. The bootstrap probability determined for the branch is preferably 50% or higher, more preferably 80% or higher, even more preferably 90% or higher, most preferably 95% or higher. Such phylogenetic analyses can be carried out, for example, by a known phylogenetic analysis method such as the neighbor-joining method (Saitou, N. and Nei, M., 1987, Mol. Biol. Evol. 4:406-425). Among retrotransposons, some lose their transposing activity due to accumulation of mutations or such, and thus their sequences are degenerated. A DNA encoding a polypeptide that cleaves a telomeric repeat sequence can be isolated from a partial sequence of such a sequence. The DNA encoding a polypeptide of the present invention that cleaves a telomeric repeat sequence is preferably derived from the retrotransposon-like sequence belonging to a non-LTR type retrotransposon.
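The E value cut-offs given above for BLAST comparisons lend themselves to a small lookup. The sketch below is illustrative only: the function name `e_value_tier` and the tier labels are assumptions introduced here, and the "about" in the stated thresholds is treated as an exact boundary for simplicity.

```python
def e_value_tier(e_value):
    """Map a BLAST Expect value to the preference tiers named in the text."""
    if e_value < 0:
        raise ValueError("E value cannot be negative")
    if e_value <= 0.1:
        return "more preferred"      # about 0.1 or less
    if e_value <= 0.5:
        return "preferred"           # about 0.5 or less
    if e_value <= 1.0:
        return "typical"             # about 1 or less
    return "not significant"

# Hypothetical E values for three candidate hits.
for e in (1e-20, 0.3, 5.0):
    print(e, e_value_tier(e))
```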
A sequence derived from a retrotransposon-like sequence normally comprises at least a portion of the original retrotransposon-like sequence. Such a sequence typically contains at least 15 consecutive nucleotides, preferably 20 nucleotides or more, more preferably 30 nucleotides or more, even more preferably 40 nucleotides or more, most preferably 50 nucleotides or more (for example, 100 nucleotides or more) of the original retrotransposon-like sequence. Alternatively, the sequence derived from the retrotransposon-like sequence typically encodes at least a portion of the amino acid sequence encoded by the original retrotransposon-like sequence. Such a sequence typically contains a nucleotide sequence encoding at least 5 consecutive amino acids, preferably 6 amino acids or more, more preferably 7 amino acids or more, even more preferably 8 amino acids or more, most preferably 10 amino acids or more (for example, 15 amino acids or more) of the amino acid sequence encoded by the original retrotransposon-like sequence.
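Whether a candidate shares the required number of consecutive residues with an original sequence can be checked with a longest-common-substring computation. This is a minimal sketch under stated assumptions: sequences are plain strings compared on one strand only, and the names `longest_shared_run` and `is_derived` are hypothetical. The same function applies to amino acid strings with a threshold of 5.

```python
def longest_shared_run(a, b):
    """Length of the longest stretch of consecutive characters present in both a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the run that ended at the previous position of both strings.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def is_derived(candidate, original, min_run=15):
    """True if candidate contains at least min_run consecutive residues of original."""
    return longest_shared_run(candidate.upper(), original.upper()) >= min_run

# Hypothetical toy sequences sharing only the stretch "ACGT".
print(longest_shared_run("AAAACGTTT", "GGACGTCC"))
```

The dynamic programming runs in time proportional to the product of the two lengths, which is adequate for fragments of a few kilobases; longer inputs would call for an indexed search.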
As shown in Examples, the present inventor isolated a number of retrotransposon-like sequences adjacent to telomeric repeat sequences, which were collectively named the TRAS family (Figs. 1 and 3; SEQ ID NOs: 15 to 28, and 38) by screening a genomic library using a telomere sequence (TTAGG)5 as a probe. These DNAs are useful to express the polypeptide of the present invention and as a probe or primer to isolate DNAs encoding the polypeptides of the present invention.
A DNA encoding a polypeptide that cleaves a telomeric repeat sequence can be isolated by screening a genomic library of a eukaryote using a retrotransposon-like sequence or the above-mentioned telomeric repeat sequence as a probe. In one embodiment of the method, a genomic library of a eukaryote is screened using an isolated retrotransposon sequence as a probe. There is no limitation on the type of genomic library; for example, a lambda phage genomic library can be used. When the telomeric repeat sequence is used as a probe at the same time, it is also possible to judge whether a sequence is linked adjacent to the telomeric repeat sequence. The DNA sequence of the insert fragment of an isolated clone is determined, and the open reading frame is identified from the sequence. Then, the amino acid sequence deduced from the identified open reading frame is compared with the amino acid sequence of an EN polypeptide (for example, SEQ ID NO: 36) of the TRAS family to identify a location of the DNA encoding the amino acid sequence homologous to the sequence. The comparison with the amino acid sequence of an EN polypeptide of the TRAS family can be carried out by a known method using BLAST or CLUSTAL W, etc. Once the DNA region is specified, then the DNA encoding a polypeptide cleaving a telomeric repeat sequence can be obtained by amplifying the region, for example, by PCR. A polypeptide of the present invention can be prepared by inserting the amplification product into an expression vector and expressing the polypeptide.
It is also possible to isolate a DNA encoding a polypeptide that cleaves a telomeric repeat sequence by DNA amplification using PCR instead of hybridization. For example, the present inventor isolated a number of sequences of the TRAS-like family (Figs. 14 to 17; SEQ ID NOs: 42, 45, 48, 51, 54, 57 and 60) by the CODEHOP method using a pair of primers to amplify the TRAS-like family. A DNA encoding a polypeptide cleaving a telomeric repeat sequence can be isolated by such a method. As mentioned above, the isolated DNA is useful to express a polypeptide of the present invention. In addition, a partial sequence of the isolate, and such, can be used as a probe or primer to isolate a DNA encoding another polypeptide of the present invention.
When one intends to screen a genomic DNA of a eukaryote to isolate a DNA of the present invention, genomic DNA of an arthropod, particularly, genomic DNA of an insect is preferred; genomic DNA of an insect of Lepidoptera, including the silkworm, is particularly preferred. For example, DNA comprising SEQ ID NO: 15 to 28, 38, 42, 45, 48, 51, 54, 57 or 60, or a portion thereof, and such can be used as the probe, primer, and such. In addition, it is possible to use a DNA which has been designed to encode the amino acid sequence of SEQ ID NO: 36, 39, 40, 43, 46, 49, 52, 55, 58 or 61, or a portion thereof. It is preferable to design the probe or primer based on a specific conserved region in the amino acid sequences of the TRAS-like family to isolate many DNA species. Such a region includes, for example, a region containing amino acid residues conserved in the TRAS-like family as shown in Figs. 14 to 17 (residues in the solid-line open boxes marked by asterisks in each figure).
It can be confirmed whether a polypeptide encoded by an isolated DNA actually cleaves a telomeric repeat sequence by using the assay systems as described in Examples. Specifically, a labeled double-stranded oligonucleotide containing a telomeric repeat sequence (substrate) is incubated with the polypeptide, and the reaction product is electrophoresed in a polyacrylamide gel to detect substrate cleavage. The substrate includes plasmid DNA, phage DNA and other DNAs in addition to the oligonucleotide. The DNA can comprise a sequence in addition to the telomeric repeat sequence; when such a DNA is used, it is tested whether the moiety of the telomeric repeat sequence is cleaved. Specifically, the polypeptide of the present invention includes a polypeptide having an activity to cleave a telomeric repeat sequence of a double-stranded DNA. A polypeptide of the present invention has an activity to cleave a double-stranded DNA comprising at least any one of the eukaryotic telomeric repeat sequences. Preferably, a polypeptide of the present invention has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence from a vertebrate and/or an arthropod. More preferably, a polypeptide of the present invention has an activity to cleave a double-stranded DNA comprising the repetitive sequence [5'-TTAGGG/5'-CCCTAA] and/or the repetitive sequence [5'-TTAGG/5'-CCTAA]. Further, a polypeptide of the present invention preferably has an activity to cleave both strands of a double-stranded DNA comprising a telomeric repeat sequence.
It is also preferred that the DNA cleavage by a polypeptide of the present invention is specific to the telomeric repeat sequence. In general, the term "polypeptide having an activity to specifically cleave a double-stranded DNA comprising a telomeric repeat sequence" refers to a polypeptide whose cleavage frequency or cleaving activity for a double-stranded DNA comprising a telomeric repeat sequence is significantly higher than the cleavage frequency or cleaving activity for a double-stranded DNA comprising a sequence other than a telomeric repeat sequence (non-telomeric repeat sequence). Such polypeptides are not limited to those cleaving only double-stranded DNAs comprising telomeric repeat sequences, but include those cleaving sequences similar to telomeric repeat sequences. For example, the cleavage site is determined by a polypeptide cleavage assay using as the substrate a DNA comprising a telomeric repeat sequence and additional sequences (for example, the sequence contained in a vector). When the distribution frequency of cleavage sites in the telomeric repeat sequence is significantly higher compared to that of other sequences in the substrate, it is safe to conclude that the polypeptide specifically cleaves the telomeric repeat sequence. Some examples of such assay systems are described below in the Examples. Thus, the polypeptides of the present invention include polypeptides having an activity to specifically cleave a telomeric repeat sequence in a double-stranded DNA.
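The comparison of cleavage-site distribution described above can be illustrated numerically. The sketch below is an assumption-laden example, not the assay of the Examples: it takes mapped cleavage positions as integers, treats the telomeric repeat as a single half-open span on the substrate, and reports sites per nucleotide inside and outside that span. It does not perform a significance test, which the text leaves unspecified; the function name and the data are hypothetical.

```python
def cleavage_densities(sites, telomeric_span, substrate_len):
    """Return (sites per nt inside the telomeric span, sites per nt elsewhere).

    sites          -- iterable of 0-based cleavage positions on the substrate
    telomeric_span -- (start, end) of the telomeric repeat, end exclusive
    substrate_len  -- total substrate length in nucleotides
    """
    start, end = telomeric_span
    tel_len = end - start
    other_len = substrate_len - tel_len
    if tel_len <= 0 or other_len <= 0:
        raise ValueError("substrate must contain telomeric and non-telomeric sequence")
    sites = list(sites)
    inside = sum(start <= s < end for s in sites)
    outside = len(sites) - inside
    return inside / tel_len, outside / other_len

# Hypothetical data: a 3050-nt substrate with a 50-nt telomeric repeat at 100-149.
sites = [105, 110, 115, 120, 125, 130, 135, 140, 900, 2500]
tel, other = cleavage_densities(sites, (100, 150), 3050)
print(f"telomeric: {tel:.4f} sites/nt, other: {other:.6f} sites/nt")
```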
Alternatively, as described in Examples, the specificity can be evaluated, for example, by an assay system using a labeled oligonucleotide. A cleavage assay is carried out using an oligonucleotide comprising a telomeric repeat sequence and a control oligonucleotide as substrates; when the cleavage frequency or cleaving activity for the oligonucleotide comprising the telomeric repeat is significantly higher than for the control, it is safe to conclude that the polypeptide is specific to the telomeric repeat sequence. Such a control oligonucleotide includes, for example, an oligonucleotide comprising random sequences or a mixture thereof, vector DNAs such as plasmids or phages, or genomic DNAs other than telomeres, or a fragment of such genomic DNA.
A polypeptide cleaving a telomeric repeat sequence of the present invention can be a naturally occurring polypeptide, a partial polypeptide thereof, or an artificial polypeptide. When a polypeptide of the present invention is a partial polypeptide of a naturally occurring polypeptide, it is preferred that the polypeptide does not have any activity that is not associated with recognition, binding and/or cleavage of a telomeric repeat sequence (reverse transcriptase activity, etc.). Methods for assaying reverse transcriptase activity using nucleic acids of DNA/RNA, labeled nucleotides, and such are known. There is no limitation on the length of the polypeptide; when it is a partial polypeptide from a naturally occurring polypeptide, the length is typically 1000 amino acids or less, preferably 800 amino acids or less, more preferably 500 amino acids or less (for example, 450 amino acids or less). A DNA encoding a polypeptide of the present invention may contain the 5'-untranslated sequence and 3'-untranslated sequence in addition to the coding region of the polypeptide of the present invention. The DNA can contain another gene or intron. The DNA can also contain transcriptional regulatory sequences such as a promoter and enhancer, a sequence participating in translation such as IRES, RNA-stabilizing signal, poly(A) signal, etc.
In a preferred embodiment, the present invention provides a DNA derived from a sequence belonging to the TRAS-like family, which has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence and a DNA encoding the polypeptide. As used herein, the term "sequence belonging to the TRAS-like family" refers to a sequence, which, along with at least any one of the identified TRAS-like family members, forms a group containing no R1 members related to the TRAS-like family. In preferred embodiments, the term refers to a sequence, which, along with any one of the TRAS-like family members (for example, TRAS1, TRAS3, TRAS4, TRAS5, TRAS6, TRASDJ, TRASSC3, TRASSC4, TRASSC9; see Figs. 14 to 17) described in Examples, forms a group without R1Bm. The grouping can be achieved by a known method based on the nucleotide sequence of the DNA or the amino acid sequence encoded by the nucleotide sequence. As seen in the Examples, the grouping can be achieved by preparing a phylogenetic tree based, for example, on the amino acid sequence of the region from the EN domain to the RT domain encoded by the pol ORF (for example, the region shown in Figs. 14 to 17) or a sub-region thereof. Such a phylogenetic tree can be prepared based on the amino acid sequence of the EN domain. A desired stratifying method including the neighbor-joining method, furthest neighbor method and maximum likelihood method can be used to prepare a phylogenetic tree. A preferable example is the neighbor-joining method (Saitou, N. and Nei, M., 1987, Mol. Biol. Evol. 4:406-425). The reliability for the group can be evaluated based on the bootstrap probability.
Preferably, the bootstrap probability for the branch is 50% or higher, more preferably 80% or higher, even more preferably 90% or higher, most preferably 95% or higher (for example, 99.0% or higher). The number of samplings may be, for example, 1000.
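The bootstrap probability mentioned above is conventionally obtained by resampling alignment columns with replacement, rebuilding the tree for each replicate, and counting how often the group of interest reappears. The sketch below shows only the resampling and counting steps, with 1000 samplings as the default. Tree construction is deliberately left to a caller-supplied function, `group_recovered`, which is a hypothetical placeholder and not part of this description.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment (list of equal-length strings) with replacement."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def bootstrap_support(alignment, group_recovered, n_samples=1000, seed=0):
    """Percentage of replicates for which group_recovered(replicate) is true.

    group_recovered is assumed to build a tree (for example by neighbor joining)
    from the replicate and report whether the group of interest is present.
    """
    rng = random.Random(seed)
    hits = sum(bool(group_recovered(bootstrap_alignment(alignment, rng)))
               for _ in range(n_samples))
    return 100.0 * hits / n_samples

# Hypothetical toy alignment; each replicate keeps the row count and column count.
toy = ["ACGT", "AGGT", "TCGA"]
replicate = bootstrap_alignment(toy, random.Random(0))
print(replicate)
```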
Alternatively, the TRAS-like family can be identified based on a significant homology among the nucleotide sequences or the amino acid sequences encoded by the nucleotide sequences. As seen in Examples, the amino acid sequences encoded by members of the TRAS-like family isolated by the present inventor were distinguished based on the fact that these amino acid sequences are highly homologous to one another compared to amino acid sequences encoded by elements other than the TRAS-like family, such as R1Bm. Based on such homology, one can evaluate whether a sequence belongs to the TRAS-like family.
For example, homology is determined between the amino acid sequence of the region from the EN domain to the RT domain and any one of the identified members of the TRAS-like family. When the homology value (for example, amino acid sequence identity) for a member of the TRAS-like family is significantly higher than that for an R1 related to the TRAS-like family, it is safe to conclude that the sequence belongs to the TRAS-like family. Known members of the TRAS-like family include those of the TRAS-like family described in Examples (for example, TRAS1, TRAS3, TRAS4, TRAS5, TRAS6, TRASDJ, TRASSC3, TRASSC4, TRASSC9; see Figs. 14 to 17). Typically, when compared with any one of the identified sequences of the TRAS-like family, the amino acid sequence identity of the TRAS-like family in the region indicated in Figs. 14 to 17 (the region from EN to RT) or a region similar thereto is presumed to be about 31% or higher, preferably about 33% or higher, more preferably about 35% or higher, even more preferably about 37% or higher (for example, about 40% or higher). When compared with any one of the identified sequences of the TRAS-like family, the nucleotide sequence identity in a coding region in the region indicated in Figs. 14 to 17 (the region from EN to RT) or a region similar thereto is presumed to be about 45% or higher, preferably about 47% or higher, more preferably about 50% or higher, even more preferably about 52% or higher. The calculation of the identity can be carried out according to the method described herein.
The polypeptides of the present invention and DNAs encoding the polypeptides also include a polypeptide from a region of the endonuclease (EN) belonging to the retrotransposon element TRAS-like family, which has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence, and the DNA encoding the polypeptide. The amino acid sequences of the EN polypeptides of the TRAS-like family isolated by the present inventor are shown in SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62. These polypeptides are encoded by the DNAs of SEQ ID NOs: 35, 38, 42, 45, 48, 51, 54, 57 and 60, and such. A DNA encoding the EN of the TRAS-like family can be isolated, for example, according to the procedure described in Examples. Specifically, the DNA can be isolated by screening a genomic DNA library of the silkworm or other insect or by directly amplifying by PCR and such from the genomic DNA. When it is amplified by PCR, degenerated primers can be designed based on the DNA sequences or the encoded amino acid sequences conserved in the TRAS-like family.
A polypeptide can be prepared from an isolated DNA, for example, by expressing the recombinant polypeptide from the DNA and recovering and purifying the expression product, as shown in Examples.
Specifically, EN polypeptides of the TRAS-like family include, for example, a polypeptide encoded by the entire sequence of SEQ ID NO: 35 (the amino acid sequence is shown in SEQ ID NO: 36), a polypeptide encoded by nucleotides 3661 to 4366 of SEQ ID NO: 38 (amino acids 1 to 235 of SEQ ID NO: 40; SEQ ID NO: 41), a polypeptide encoded by nucleotides 1 to 537 of SEQ ID NO: 42 (the segment of amino acids 1 to 179 of SEQ ID NO: 43; SEQ ID NO: 44), a polypeptide encoded by nucleotides 1 to 588 of SEQ ID NO: 45 (amino acids 1 to 196 of SEQ ID NO: 46; SEQ ID NO: 47), a polypeptide encoded by nucleotides 1 to 492 of SEQ ID NO: 48 (amino acids 1 to 164 of SEQ ID NO: 49; SEQ ID NO: 50), a polypeptide encoded by nucleotides 1 to 426 of SEQ ID NO: 51 (amino acids 1 to 142 of SEQ ID NO: 52; SEQ ID NO: 53), a polypeptide encoded by nucleotides 1 to 492 of SEQ ID NO: 54 (amino acids 1 to 164 of SEQ ID NO: 55; SEQ ID NO: 56), a polypeptide encoded by nucleotides 1 to 492 of SEQ ID NO: 57 (amino acids 1 to 164 of SEQ ID NO: 58; SEQ ID NO: 59) and a polypeptide encoded by nucleotides 1 to 492 of SEQ ID NO: 60 (amino acids 1 to 164 of SEQ ID NO: 61; SEQ ID NO: 62). The EN polypeptide can also be a polypeptide comprising one of the amino acid sequences of these polypeptides. It is also possible to appropriately attach another sequence (such as methionine as the start codon, tag, signal peptide, etc.) to the N and/or C termini.
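The nucleotide ranges and amino acid ranges listed above are related by the triplet code: a coding stretch of n nucleotides in frame encodes n/3 residues. The short sketch below, whose function name is hypothetical, makes that arithmetic explicit and can serve as a consistency check on such ranges. It assumes 1-based inclusive coordinates and no stop codon inside the range.

```python
def encoded_residues(nt_start, nt_end):
    """Number of amino acids encoded by nucleotides nt_start..nt_end (1-based, inclusive)."""
    length = nt_end - nt_start + 1
    if length <= 0:
        raise ValueError("nt_end must not precede nt_start")
    if length % 3:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return length // 3

# Ranges quoted in the text: 1-537, 1-492 and 1-426.
for start, end in [(1, 537), (1, 492), (1, 426)]:
    print(f"nucleotides {start}-{end}: {encoded_residues(start, end)} amino acids")
```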
These regions can be amplified by PCR or the like. For example, TRAS1 EN (SEQ ID NO: 36) can be amplified using primers:
5'-AAAAAAAACATATGCACGGCGAGCAGTGGAA-3' (SEQ ID NO: 2) and 5'-AAAAAACTCGAGTTATTTTTGGAGTCTAATATTG11ATACCATACC-3' (SEQ ID NO: 3).
The amplified DNA fragment can be digested with NdeI and XhoI and then cloned, for example, into the expression vector pET16b (Novagen) at the NdeI-XhoI site. TRAS1 EN containing (His)10 at its N terminus can be expressed using this vector. Further, for example, TRAS3 EN can be expressed by carrying out amplification using DNA containing TRAS3 as a template with primers:
5'-AAAAAAAACATATGCGTATTGCTAAGGGCAG-3' (SEQ ID NO: 63) and 5'-AAAAAACTCGAGTTATCGTTTTGTATGAATATTAAATAGGATGGC-3' (SEQ ID NO: 64) and cloning the product into an expression vector by the same method as described above. Other clones can be amplified with primers designed in a similar way to construct expression vectors.
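The primer layout used above (a short 5' overhang, a restriction site, then the template-annealing region) can be made explicit by splitting a primer at its restriction site. This is an illustrative sketch with hypothetical names. It relies only on the standard recognition sequences of NdeI (CATATG) and XhoI (CTCGAG) and is shown on the forward primer of SEQ ID NO: 2 as printed in the text.

```python
# Recognition sequences of the two enzymes used for cloning.
SITES = {"NdeI": "CATATG", "XhoI": "CTCGAG"}

def split_primer(primer, enzyme):
    """Split a primer into (5' overhang, restriction site, 3' remainder)."""
    site = SITES[enzyme]
    idx = primer.upper().find(site)
    if idx < 0:
        raise ValueError(f"no {enzyme} site ({site}) in primer")
    return primer[:idx], site, primer[idx + len(site):]

forward = "AAAAAAAACATATGCACGGCGAGCAGTGGAA"  # SEQ ID NO: 2 as given in the text
overhang, site, remainder = split_primer(forward, "NdeI")
print(overhang, site, remainder)
# The ATG inside CATATG is in frame with the remainder, so the insert starts at a Met codon.
print("ATG" + remainder)
```

Placing a few extra bases 5' of the site, as these primers do, is a common precaution because many restriction enzymes cut poorly at the very end of a fragment.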
The present invention also relates to a polypeptide structurally similar to an EN polypeptide of the above-mentioned TRAS-like family (for example, SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62), which has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence and a DNA encoding the polypeptide. Such a polypeptide includes, for example, a polypeptide comprising the amino acid sequence of an above-mentioned EN polypeptide in which one or more amino acids have been substituted, deleted, inserted and/or added. For example, TRAS1 EN of SEQ ID NO: 36 is a partial amino acid sequence corresponding to ORF2 of TRAS1. However, the region from which a sequence (CDS) encoding a partial polypeptide from ORF2 is taken is not limited to only this region, as long as it has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence. For example, the N and/or C termini may contain a partial polypeptide derived from a larger ORF2, or the partial polypeptide may be one whose N and/or C termini are narrower than the TRAS1 EN of SEQ ID NO: 36.
The polypeptide of the present invention can contain a sequence of a part other than the EN of a TRAS-like family member. For example, there is a possibility that a Myb-like DNA binding domain (Figs. 14 to 17) contained in the members of the TRAS-like family, the cysteine-histidine motif in the C-terminal region of ORF1 or ORF2 of the TRAS family, and the TRAS-specific region (TSR) consisting of 12 amino acids, which is boxed in Fig. 15, are associated with a more strict recognition of the telomeric repeat sequence. The polypeptides of the present invention include a polypeptide to which any one of these peptides has been added. Further, the amino acid sequence of a polypeptide of the present invention can be a chimeric sequence resulting from a fusion with another amino acid sequence.
For example, it is easy to construct a chimeric polypeptide between the C-terminal region of one of the above-mentioned EN polypeptides of the TRAS family and the N-terminal region (for example, 100 to 200 amino acids) of another member of the TRAS family (for example, TRAS1 or TRAS3). When preparing a chimeric polypeptide, corresponding amino acid sequences can be rearranged at arbitrary positions based on the amino acid alignments in Figs. 14 to 17.
Furthermore, such a chimeric polypeptide can be prepared from 3 or more sequences.
The polypeptides of the present invention also include a polypeptide that comprises the amino acid sequence of an EN polypeptide of the TRAS-like family (SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62) into which a mutation(s) has been introduced, as long as it has the activity to cleave a double-stranded DNA comprising a telomeric repeat sequence. For example, those skilled in the art can introduce a mutation(s) into an amino acid sequence by appropriately introducing a mutation(s) into the DNA encoding the polypeptide by site-directed mutagenesis, such as Kunkel's method (Kunkel, T. A., 1985, Proc. Natl. Acad. Sci. USA 82, 488; Kunkel, T. A. et al., 1987, Methods Enzymol. 154, 367); the Gapped duplex method (Kramer, W. et al., 1984, Nucleic Acids Res. 12, 9441; Kramer, W. and Fritz, H. J., 1987, Methods Enzymol. 154, 350); Eckstein's method (Sayers, J. R. et al., 1992, BioTechniques, 13, 592); the Altered Site method (Lesley, S. A. & Bohnsack, R. N., 1994, Promega Notes Magazine, 46, 6-10); Ito's method (Ito, W. et al., 1991, Gene, 102, 67); the PCR method (Cormack, B., in "Current Protocols in Molecular Biology" (Ausubel, F. M. et al., eds.), 8.5.1-8.5.9, 1987); or the oligonucleotide ligation method (Uhlmann, E., 1988, Gene, 71, 29-40; Moore, D. D., 1987, in "Current Protocols in Molecular Biology" (Ausubel, F. M. et al., eds.), 8.2.8-8.2.13, 1987), and can also prepare mutant polypeptides by introducing a mutation(s) at random according to the deletion method (Ausubel, F. M. et al., eds. in "Current Protocols in Molecular Biology", 1.02-5.10.2, 1987; Sambrook, J. et al., in "Molecular Cloning A Laboratory Manual", 2nd ed., 5.1-6.62, 1987); linker-insertion method (Ausubel, F. M. et al., eds. in "Current Protocols in Molecular Biology", 1.02-5.10.2, 1987; Sambrook, J. et al., in "Molecular Cloning A Laboratory Manual", 2nd ed., 5.1-6.62, 1987); chemical mutagenesis (Myers, R. M., in "Current Protocols in Molecular Biology" (Ausubel, F. M. et al., eds.), 8.3.1-8.3.6, 1987); degenerated oligonucleotide method (Hill, D. E. et al., Methods Enzymol., 155, 558-568, 1987; Hill, D. E., in "Current Protocols in Molecular Biology" (Ausubel, F. M. et al., eds.), 8.2.1-8.2.7, 1987); linker scanning method (Greene, J. M. et al., Mol. Cell. Biol. 7, 3646-3655, 1987), or such. Amino acid mutations can be spontaneous mutations.
The present invention includes a mutant polypeptide, whose mutation may be artificial or spontaneous, comprising the amino acid sequence of an above-mentioned polypeptide (SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62) in which one or more amino acids have been substituted, deleted, inserted and/or added, and which has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence.
There is no limitation on the number of amino acids that are mutated in such a mutant, but typically the number may be 200 amino acids or less when an amino acid sequence is mutated artificially.
Preferably, the number of mutated amino acids is 150 amino acids or less, more preferably 100 amino acids or less, even more preferably 50 amino acids or less (for example, 10 amino acids). However, there is no limitation on the number of amino acids when amino acids are added to the sequence. When one intends to substitute amino acids in a sequence, the amino acids can be replaced, for example, with amino acids located at a corresponding position in another retrotransposon-like sequence. It is also possible to create a chimeric sequence by replacing a particular region with a region of another retrotransposon, as described above, instead of substituting individual amino acids. The chimeric amino acid sequence can be further modified by amino acid substitutions.
In the case of artificial substitution, it is thought that the activity of the original polypeptide tends to be retained when an amino acid is replaced with an amino acid with a side chain whose chemical properties are similar to that of the original amino acid.
Such conservative substitutions of amino acids are well known to those skilled in the art. Groups of amino acids between which conservative substitutions can be achieved are: (1) basic amino acids (for example, lysine, arginine, and histidine); (2) acidic amino acids (for example, aspartic acid, and glutamic acid); (3) uncharged polar amino acids (for example, glycine, asparagine, glutamine, serine, threonine, tyrosine, and cysteine); (4) non-polar amino acids (for example, alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, and tryptophan); (5) beta-branched amino acids (for example, threonine, valine, and isoleucine); (6) aromatic amino acids (for example, tyrosine, phenylalanine, tryptophan, and histidine), and such. There is a possibility that amino acid mutations result in polypeptides having improved activity of cleaving the telomeric repeat sequence or improved specificity to the sequence to be cleaved. In addition, amino acid mutations can improve the thermostability of the polypeptide.
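The six groups listed above can be encoded as sets of one-letter codes, so that a proposed substitution is classed as conservative when both residues share at least one group. This is a minimal sketch: the dictionary keys and the function name are labels chosen here, and the group memberships simply transcribe the list in the text.

```python
# Groups (1) to (6) from the text, as one-letter amino acid codes.
CONSERVATIVE_GROUPS = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "non-polar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}

def shared_groups(a, b):
    """Names of the groups that contain both residues a and b."""
    a, b = a.upper(), b.upper()
    return [name for name, members in CONSERVATIVE_GROUPS.items()
            if a in members and b in members]

def is_conservative(a, b):
    """True if residues a and b fall in at least one common group."""
    return bool(shared_groups(a, b))

print(is_conservative("K", "R"), shared_groups("T", "V"), is_conservative("D", "K"))
```

Because the groups overlap (threonine is both uncharged polar and beta-branched, histidine both basic and aromatic), a residue can have conservative partners in more than one group.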

Preferred polypeptides of the present invention are polypeptides that have high homology to the amino acid sequences of the above-mentioned polypeptides (SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62) and that have an activity to cleave double-stranded DNAs comprising the telomeric repeat sequences.
Specifically, in the present invention, polypeptides comprising the amino acid sequence of any one of these polypeptides (SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62) in which one or more amino acids have been substituted, deleted, inserted and/or added include polypeptides having high homology to the amino acid sequence selected from SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62. The term "high homology" typically means at least 20% or higher identity at the amino acid level. The amino acid sequence identity is preferably 30% or higher, more preferably 40% or higher, even more preferably 60% or higher, still more preferably 80% or higher (for example, 90% or higher).
The amino acid sequence identity can be determined by a known computer program. For example, as described in Examples, the value can be calculated by counting the number of identical amino acid residues in an alignment of amino acid sequences that are aligned using an alignment program such as CLUSTAL W. Alternatively, for example, it is possible to use the blastp program (BLASTP 2.0.1 [Aug-20-1997]; Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402). For example, values for the identity can be obtained as Identities (%) by carrying out a search using BLOSUM62 as the scoring matrix (Henikoff, Steven and Jorja G. Henikoff (1992), "Amino acid substitution matrices from protein blocks", Proc. Natl. Acad. Sci. USA 89:10915-19) (open gap penalty = 11; extension gap penalty = 1) without using FILTER (filtering treatment for low-complexity sequences) in BLAST 2 SEQUENCES (Tatiana A. Tatusova, Thomas L. Madden (1999), "Blast 2 sequences - a new tool for comparing protein and nucleotide sequences", FEMS Microbiol Lett. 174:247-250; http://www.ncbi.nlm.nih.gov/gorf/bl2.html), which compares two amino acid sequences using blastp.
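The counting approach described above (identical residues in an alignment) can be sketched as follows. This is a minimal illustration, not the procedure of the Examples: the function name is hypothetical, the two input strings are assumed to be already aligned (for example, by CLUSTAL W) with "-" marking gaps, and the choice to ignore gapped columns in the denominator is an assumption. BLAST reports Identities (%) over the full alignment length, so values can differ.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences): 5 identical residues in 6 compared columns.
    print(round(percent_identity("MKT-LLV", "MKSALLV"), 1))
```

A value computed this way can then be compared against the identity thresholds given above (20%, 30%, 40%, 60%, 80%, 90%).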
Other methods for preparing polypeptides structurally similar to a particular polypeptide, which methods are well known to those skilled in the art, include the method using the hybridization technique (Sambrook, J. et al., Molecular Cloning 2nd ed., 9.47-9.58, Cold Spring Harbor Lab. press, 1989). In other words, those skilled in the art can readily isolate the DNAs exhibiting high homology to the DNA sequences encoding the amino acid sequences of the above-mentioned polypeptides of the TRAS-like family (SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62) (for example, SEQ ID NOs: 35, 38, 42, 45, 48, 51, 54, 57 and 60) or a portion thereof based on the sequences, and then isolate polypeptides having an activity to cleave the telomeric repeat sequences from the DNAs. The present invention includes polypeptides that are encoded by DNAs hybridizing to the DNAs encoding the polypeptides of the present invention and that have an activity to cleave double-stranded DNAs comprising the telomeric repeat sequences.
There is no limitation on the material from which DNAs encoding such polypeptides are isolated. Preferably, the material is genomic DNAs or cDNAs of a eukaryote. A preferred eukaryote is, for example, an arthropod, particularly an insect. More specifically, insects of Lepidoptera, including silkworm, are suitable. As shown in Examples, many retrotransposons structurally similar to the members of the TRAS family are distributed over a wide range of insect species including silkworm and other insects. These DNAs can be isolated from insect genomic DNAs by using the hybridization technique.
There is no limitation on conditions of hybridization, and those skilled in the art can properly select such conditions. For example, pre-hybridization is carried out using a mixture of 0.9 M NaCl, 90 mM Tris-HCl (pH 7.9), 6 mM EDTA, 0.5% sodium dodecyl sulfate (SDS) and 1.6% skim milk, and then hybridization is carried out by overnight incubation in a solution of the same composition containing a probe. Hybridization temperature can be adjusted appropriately. When a probe labeled by random prime labeling is used, hybridization under low stringency conditions is carried out at about 40°C to about 45°C (for example, at 40°C, 42°C or 45°C) and hybridization under high stringency conditions is carried out at about 50°C to about 65°C (for example, at 50°C, 55°C, 60°C or 62°C). When an oligonucleotide is used as the probe, hybridization temperature is adjusted based on the Tm of the oligonucleotide. Hybridization under low stringency conditions is carried out at room temperature or at about 37°C to about 40°C, and hybridization under high stringency conditions is carried out at a temperature within a range of about (Tm-30)°C to about (Tm-10)°C. Tm can be determined, for example, by immobilizing a template which is perfectly complementary to the probe on a filter and by testing the association or dissociation of the probe at various temperatures. Alternatively, Tm of an oligonucleotide comprising about 14 to about 70 nucleotides can be roughly estimated, for example, by using the following formula: Tm = 81.5 + 16.6 (log10[Na+]) + 0.41 (%G+C) - (500/N) (where N indicates the number of nucleotides in a strand; [Na+], Na+ concentration (M); (%G+C), G+C content (%)) (Bolton, E. T. and McCarthy, B. J., 1962, Proc. Natl. Acad. Sci. USA 48: 1390). Tm of an oligonucleotide whose length is up to about 18 nucleotides can be roughly estimated, for example, by the following formula: Tm = (A+T) x 2 + (G+C) x 4 (Itakura, K. et al., 1984, Annu. Rev. Biochem. 53: 323). After hybridization under either low stringency or high stringency conditions, washing is carried out, for example, using 4 x SSC at the same temperature as in hybridization, and the washing solution is changed several times during the washing. It is preferable to add SDS to the washing solution at a final concentration of about 0.1 to about 0.5%. When one intends to carry out the washing under more stringent conditions, washing is then continued using a diluted washing solution in which the SSC concentration is lower than that of the original washing solution. Preferably, 4 x SSC is diluted two-fold and washing is carried out with the 2 x SSC (when one intends to wash under higher stringency conditions). More preferably, the 2 x SSC is then diluted two-fold to 1 x SSC and washing is carried out with the 1 x SSC (when one intends to wash under much higher stringency conditions). Optionally, further diluted SSC can be used for washing. However, there are various factors influencing hybridization stringency, including probe concentration, length and sequence of probe, reaction time in hybridization, temperature, salt concentration and composition of solution. Those skilled in the art can achieve the same stringency by appropriately selecting these factors. As hybridization stringency is higher, DNAs having higher homology to a probe sequence can be isolated.
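The two rough Tm estimates given in this paragraph can be written out as a short sketch. The function names and the example probe are hypothetical; the formulas are the ones stated above, with the salt-adjusted estimate taking the Na+ concentration in mol/l and the second estimate intended for oligonucleotides up to about 18 nucleotides.

```python
import math


def tm_salt_adjusted(seq: str, na_molar: float) -> float:
    """Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 500/N (about 14 to 70 nt)."""
    s = seq.upper()
    n = len(s)
    gc_percent = 100.0 * (s.count("G") + s.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 500.0 / n


def tm_short_oligo(seq: str) -> int:
    """Tm = (A+T) x 2 + (G+C) x 4, for oligonucleotides up to about 18 nt."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    probe = "TTAGGGTTAGGGTTAGGG"  # hypothetical 18-mer, three TTAGGG repeats
    tm = tm_short_oligo(probe)
    # High-stringency window suggested in the text: about (Tm-30) to (Tm-10) degrees C.
    print(tm, (tm - 30, tm - 10))
    print(round(tm_salt_adjusted(probe, 0.9), 1))
```

Both values are only rough estimates; as noted above, Tm can also be determined experimentally with a filter-immobilized template.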
As probes, for example, DNA fragments of the TRAS-like family described in Examples can be used. For example, the DNAs of SEQ ID NOs: 15 to 28, 35, 38, 42, 45, 48, 51, 54, 57 and 60, and a partial fragment thereof (which may be a double-stranded DNA, or either strand of the two or the complementary strand thereof) can be used as probes. It is also possible to use a probe which has been designed to encode a portion of any one of the amino acid sequences encoded by the TRAS-like family (SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62). Degenerated oligonucleotides can also be used as probes. The degenerated DNAs can be designed to contain all triplets encoding amino acids, or to contain only particular triplets or all triplets with varying mixing proportion based on the codon usage in an organism from which DNAs to be screened are derived. The DNAs can also be degenerated to encode multiple amino acids. Once the DNAs are isolated, the coding region for the polypeptides that cleave the telomeric repeat sequences can be identified by comparing the DNA sequences or the amino acid sequences encoded by the open reading frame with TRAS1 EN (SEQ ID NO: 36) or the like, as described above.
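The idea of a degenerated oligonucleotide "containing all triplets" for a stretch of amino acids can be illustrated with a small back-translation sketch. Everything here is an assumption for illustration: the function names are hypothetical, the standard genetic code is used, and each codon position is collapsed into a single IUPAC ambiguity code, which for amino acids such as leucine or serine covers more triplets than strictly encode the residue. Codon-usage weighting, also mentioned above, is not modeled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for aa, triplet in zip(AMINO_ACIDS, product(BASES, repeat=3)):
    CODONS.setdefault(aa, []).append("".join(triplet))

# IUPAC ambiguity codes keyed by the sorted set of bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
POOL = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_oligo(peptide: str) -> str:
    """Back-translate a peptide into one IUPAC-coded degenerate oligonucleotide."""
    out = []
    for aa in peptide.upper():
        codons = CODONS[aa]
        for pos in range(3):
            bases = "".join(sorted({codon[pos] for codon in codons}))
            out.append(IUPAC[bases])
    return "".join(out)


def pool_size(oligo: str) -> int:
    """Number of distinct sequences represented by an IUPAC-coded oligonucleotide."""
    size = 1
    for code in oligo:
        size *= POOL[code]
    return size


if __name__ == "__main__":
    oligo = degenerate_oligo("MKD")  # hypothetical tripeptide
    print(oligo, pool_size(oligo))
```

Choosing peptide stretches rich in residues with few codons (Met, Trp, Lys, Asp and the like) keeps the pool size small, which is one reason conserved regions are screened for such stretches.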
The DNAs can be isolated by using, instead of hybridization, gene amplification methods in which primers synthesized based on the information about the amino acid sequences of polypeptides encoded by the TRAS-like family are used, for example, polymerase chain reaction (PCR) (Sambrook, J. et al., Molecular Cloning 2nd ed., 9.47-9.58, Cold Spring Harbor Lab. press, 1989; "The PCR Technique: DNA sequencing" (Eds. J. Ellingboe and U. Gyllensten), "BioTechniques Update Series", Eaton Publishing, 1999; "The PCR Technique: DNA sequencing II" (Eds. U. Gyllensten and J. Ellingboe), "BioTechniques Update Series", Eaton Publishing, 1999; "PCR Technology: principles and application for DNA amplification", Ed. by H. A. Erlich, 1989, Stockton Press). Degenerated oligonucleotides can be used as the primers.
Such methods include the CODEHOP method (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) (Rose, T. M. et al., 1998, Nucleic Acids Res. 26:1628-35), described in Examples. The primers can be designed based on the amino acid sequences of conserved regions in the alignments in Figs. 14 to 17 (for example, the regions of EN domain, TSR, Myb-like region, RT domain, etc.). Preferred regions include, for example, a region containing the amino acid residues conserved in the TRAS-like family in Figs. 14 to 17 (such residues of the TRAS-like family are enclosed in an open box with an asterisk in each figure). In addition, a portion of the sequences of SEQ ID NOs: 15 to 28, 35, 38, 42, 45, 48, 51, 54, 57 and 60 can be used for this purpose. It is not necessary to place the probe or primer in a region encoding EN. The sequence of an adjacent region or of another region within the TRAS-like family member, and preferably, among such sequences, the sequence of a region conserved in the TRAS-like family, and such can be used as the probe or primer.
The present invention also relates to polynucleotides comprising a nucleotide sequence complementary to the nucleotide sequence selected from SEQ ID NOs: 15 to 28, 38, 42, 45, 48, 51, 54, 57 and 60 or the complementary strand thereof. Such polynucleotides contain polynucleotides complementary to a sequence comprising at least 15 consecutive nucleotides, preferably 17 nucleotides or more, and more preferably 20 nucleotides or more, or to a complementary sequence thereto, where the sequence is derived from the above-mentioned nucleotide sequences and excludes polynucleotides comprising single-nucleotide repeat sequences, such as poly(A), or dinucleotide repeat sequences, such as CA repeats. The polynucleotides include DNAs, RNAs, and modified forms thereof. The term "complementary strand" refers to one strand complementary to the other strand of a double-stranded nucleic acid comprising the base pairs of A:T (A:U) and/or G:C. The term "complementary" is used not only when the two strands are perfectly complementary to each other in the region of at least 15 consecutive nucleotides, but also when the identity is at least 64%, preferably at least 80%, more preferably 90%, yet more preferably 95% or higher (for example, 99% or higher) between the two nucleotide sequences. The algorithm described herein can be used to determine the identity. These DNAs include (1) probes and primers capable of hybridizing to nucleic acids encoding the polypeptides of the present invention or to nucleic acids complementary thereto and (2) antisense oligonucleotides or ribozymes, or polynucleotides encoding them. Such polynucleotides are useful as probes and primers to isolate nucleic acids encoding polypeptides that cleave the telomeric repeat sequences. In addition, the polynucleotides can be used for a DNA chip or DNA microarray.
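As a small sketch of the definitions in this paragraph, the code below derives a complementary strand by A:T (A:U) and G:C pairing and screens a candidate probe for the two exclusions mentioned (single-nucleotide repeats such as poly(A) and dinucleotide repeats such as CA repeats). The function names and the way the 15-nucleotide minimum is enforced are illustrative assumptions, not part of the specification.

```python
# A:T (A:U) and G:C pairing; the output is written as DNA.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_simple_repeat(seq: str) -> bool:
    """True if seq consists only of a repeated 1-nt or 2-nt unit."""
    s = seq.upper()
    for unit_len in (1, 2):
        if len(s) < unit_len:
            continue
        unit = s[:unit_len]
        repeated = (unit * (len(s) // unit_len + 1))[: len(s)]
        if repeated == s:
            return True
    return False


def is_acceptable_probe(seq: str, min_len: int = 15) -> bool:
    """At least min_len nucleotides and not a mono- or dinucleotide repeat."""
    return len(seq) >= min_len and not is_simple_repeat(seq)


if __name__ == "__main__":
    print(reverse_complement("TTAGG"))
    print(is_acceptable_probe("A" * 20), is_acceptable_probe("TTAGG" * 3))
```

Note that `reverse_complement("TTAGG")` gives the CCTAA unit of the opposite strand, consistent with the "5'-TTAGG/5'-CCTAA" notation used for the insect telomeric repeat in this document.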
Such polynucleotides include polynucleotides that comprise at least 15 nucleotides and that hybridize to a nucleic acid comprising the nucleotide sequence selected from SEQ ID NOs: 15 to 28, 38, 42, 45, 48, 51, 54, 57 and 60 or the complementary strands thereto. Preferably, such polynucleotides hybridize specifically to a nucleic acid comprising the nucleotide sequence selected from SEQ ID NOs: 15 to 28, 38, 42, 45, 48, 51, 54, 57 and 60 or the complementary strands thereto. The term "hybridizing specifically to" means that a nucleic acid does not significantly cross-hybridize to other nucleic acids under any one of the high stringency hybridization conditions described above, preferably under any one of the low stringency hybridization conditions described above.
The present invention also relates to polynucleotides encoding partial polypeptides of the polypeptides comprising the amino acid sequence selected from SEQ ID NOs: 39, 40, 43, 46, 49, 52, 55, 58 and 61. Such polynucleotides may be polynucleotides encoding polypeptides that cleave the telomeric repeat sequences, or polynucleotides encoding fragments of the above-mentioned polypeptides. These polynucleotides can be used to construct nucleic acids encoding the polypeptides of the present invention. In addition, these polynucleotides can also be used as probes or primers to isolate the DNAs encoding polypeptides that cleave the telomeric repeat sequences, and can be used as probes and primers to detect the expression of DNAs encoding polypeptides that cleave the telomeric repeat sequences.
The present invention also provides partial polypeptides of the polypeptides comprising the amino acid sequence selected from SEQ ID NOs: 39, 40, 43, 46, 49, 52, 55, 58 and 61. Such partial polypeptides can be used as the polypeptides that cleave the telomeric repeat sequences, and can be used to construct the polypeptides of the present invention. In addition, the partial polypeptides are also useful to prepare fusion polypeptides with other polypeptides. Furthermore, antibodies can be prepared by using these peptides as antigens. Such antibodies can be used to detect or quantify the polypeptides of the present invention that cleave the telomeric repeat sequences. The partial polypeptides of the present invention are polypeptides comprising at least 7 amino acids, preferably 8 amino acids or more, and more preferably 9 amino acids or more.
The polypeptides of the present invention can be fusion polypeptides with other polypeptides. The term "fusion polypeptides" refers to polypeptides in which two or more polypeptides which are originally separate without direct linkage between them are linked to one another. Such fusion polypeptides can be prepared, for example, by inserting both DNA encoding an EN polypeptide (for example, SEQ ID NO: 36, 37, 41, 44, 47, 50, 53, 56, 59 or 62) of the TRAS-like family and DNA encoding other polypeptides into an expression vector in frame, by introducing the resulting vector into a host, and by allowing the host to express the fusion polypeptides. Alternatively, it is also possible to design the DNAs encoding the two polypeptides to be in frame through RNA splicing. Furthermore, it is possible to design the DNAs encoding these polypeptides to be in frame by RNA editing through ribonucleotide insertion or deletion, or other post-transcriptional modifications. Those described above can be achieved by the methods known to those skilled in the art. The fusion polypeptides of the present invention also include polypeptides comprising the amino acid sequence designed to contain an artificial sequence.
An example of polypeptides to be fused is a peptide that functions as a tag. In Examples, EN polypeptide (SEQ ID NO: 37) which has been fused at its N-terminus with a peptide containing a His-tag was expressed. Polypeptides comprising a His-tag can be purified conveniently with a nickel column. In addition to tags, for example, a secretory signal can be used for fusion. This signal allows secretion of the fusion polypeptides from the cells producing the fusion polypeptides. Other polypeptides to be used for fusion include GST (glutathione-S-transferase), HA (influenza hemagglutinin), immunoglobulin constant region, β-galactosidase, MBP (maltose-binding protein), GFP (green fluorescent protein), and other functional proteins. In addition, a peptide that functions in cells, for example, a nuclear localization signal (NLS) or a DNA-binding polypeptide, can be used as the fusion partner.
Furthermore, the polypeptide of the present invention can be inserted as a module (functional unit) into a heterologous polypeptide in order to provide the heterologous polypeptide with the function to cleave the telomeric repeat sequences. Such a module can be applied, for example, to chromosome-integrating vectors (retroviral vector, etc.) and transposable elements. In the present invention, the polypeptides comprising the amino acid sequence of SEQ ID NO: 36 also include the following: (a) polypeptides comprising the amino acid sequence of SEQ ID NO: 36; (b) polypeptides that comprise the amino acid sequence of SEQ ID NO: 36 in which one or more amino acids have been substituted, deleted, inserted and/or added and that have an activity to cleave double-stranded DNAs comprising the telomeric repeat sequences; (c) polypeptides that are encoded by DNAs hybridizing to the DNA encoding the amino acid sequence of SEQ ID NO: 36 and that have an activity to cleave double-stranded DNAs comprising the telomeric repeat sequences; and (d) fusion polypeptides comprising the polypeptides according to any one of (a) through (c). DNAs encoding these polypeptides are also included in the present invention. Likewise, the present invention, when concerning the polypeptides comprising the amino acid sequence of any one of SEQ ID NOs: 37, 41, 44, 47, 50, 53, 56, 59 and 62, includes: the polypeptides comprising the amino acid sequences indicated above; polypeptides comprising amino acid sequences derived from the original ones, in which one or more amino acids have been substituted, deleted, inserted and/or added, which have an activity to cleave double-stranded DNAs comprising the telomeric repeat sequences; polypeptides encoded by DNAs hybridizing to the DNAs encoding the amino acid sequences of the original polypeptides, which have an activity to cleave double-stranded DNAs comprising the telomeric repeat sequences; and fusion polypeptides comprising any one of the above-mentioned polypeptides. DNAs encoding these polypeptides are also included in the present invention.
The cleavage of the telomeric repeat sequences using the polypeptides of the present invention can be achieved by contacting the polypeptides of the present invention with DNAs comprising the telomeric repeat sequences under physiological conditions. In the case of in-vitro cleavage, the pH is adjusted to about 6.0, specifically about 5 to about 7, and preferably about 5.8 to about 6.2; the concentration of monovalent cation (sodium ion, etc.) is adjusted to 200 mM or less, preferably 100 mM or less, more preferably 50 mM or less, and yet more preferably about 0 to about 10 mM. In addition, a divalent ion is also added to the reaction. Preferably, magnesium is added at a final concentration of about 2 mM, specifically about 0.1 to about 70 mM, preferably about 0.5 to about 10 mM, more preferably about 1 to about 5 mM, and yet more preferably about 1.5 to about 3.0 mM. Furthermore, it is preferable to add BSA and such to stabilize proteins. An exemplary composition of the reaction solution is as follows:
50 mM Pipes-HCl (pH 6.0)
10 mM NaCl
2 mM MgCl2
100 µg/ml BSA
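As a worked illustration of preparing such a reaction solution, the sketch below computes the volume of each stock needed for a given final volume using V = V_final x C_final / C_stock. The stock concentrations, the 50 µl reaction volume, and the function and component names are all assumptions chosen for the example; only the final concentrations come from the exemplary composition.

```python
def volumes_for_mix(final_volume_ul, components):
    """Return the volume (in microliters) of each stock plus water to reach final_volume_ul.

    components maps a name to (stock_concentration, final_concentration),
    both expressed in the same unit for that component.
    """
    volumes = {
        name: final_volume_ul * final / stock
        for name, (stock, final) in components.items()
    }
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    # Final concentrations follow the exemplary composition; stocks are hypothetical.
    reaction = {
        "Pipes-HCl (mM)": (500.0, 50.0),
        "NaCl (mM)": (1000.0, 10.0),
        "MgCl2 (mM)": (100.0, 2.0),
        "albumin (ug/ml)": (10000.0, 100.0),
    }
    for name, vol in volumes_for_mix(50.0, reaction).items():
        print(f"{name}: {vol:.2f} ul")
```

In practice the water volume would be reduced by the volumes of substrate DNA and polypeptide added to the reaction.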
Substrate DNAs and the polypeptides of the present invention are added to the reaction solution, followed by incubation. The incubation is carried out at a temperature within about 4 to about 45°C, preferably about 15 to about 37°C, more preferably at about 25°C. The cleavage of DNAs comprising the telomeric repeat sequences using the DNAs of the present invention can be achieved by expressing the DNAs of the present invention under conditions which ensure the contact of the expression product with the DNAs comprising the telomeric repeat sequences. For example, the cleavage of the telomeric repeat sequences using the DNAs of the present invention in cells can be achieved by introducing the expression vector for the DNAs of the present invention, which is described below, into cells and expressing the polypeptides of the present invention. The substrate may be exogenously introduced DNAs comprising the telomeric repeat sequences or endogenous DNAs (genomic DNAs, etc.) comprising the telomeric repeat sequences.
Polypeptides other than the polypeptides of the present invention can coexist during the cleavage reaction for the telomeric repeat sequences. Such proteins include, for example, telomere-specific chromatin components and telomere-binding proteins.
The present invention also provides a cleavage assay system for the telomeric repeat sequences. The cleavage reaction for the telomeric repeat sequences using the polypeptides of the present invention can be assayed easily by using oligonucleotides. Such oligonucleotides can be labeled appropriately. The labeling can be achieved using a label such as a radioisotope or fluorescent substance. For instance, as shown in Examples, a labeled double-stranded oligonucleotide is incubated with the polypeptides, the reaction product is electrophoresed in a denaturing polyacrylamide gel, and detecting the label makes it possible to determine the presence or absence of cleavage of the substrate oligonucleotide, or the cleavage site. Furthermore, the oligonucleotide can be immobilized on a support. For example, one end of the oligonucleotide is immobilized, and the other is labeled with a fluorescent substance or such. The complementary strands are annealed together to form a duplex and then the polypeptides are contacted with it. When the polypeptides cleave the oligonucleotide, the fluorescent substance is released from the nucleotide; the release can be used as an index to detect the cleavage. Alternatively, when two types of fluorescent substances are used, the respective fluorescent substances move away from each other upon cleavage; the accompanying spectral change is detected to assess the cleavage. Alternatively, a fluorescent group is associated to one end of the oligonucleotide, and a quenching group is associated to the other end; cleavage-dependent fluorescence or color development can be detected. Alternatively, the cleavage product can be detected by mass spectrometry, or such. For example, in an assay, using microtiter plates on which oligonucleotides have been immobilized allows treatment of many samples at the same time.

Alternatively, oligonucleotides comprising various telomeric repeat sequences derived from eukaryotes and other control oligonucleotides are immobilized on a DNA chip or DNA microarray; an assay using such a DNA chip or DNA microarray allows rapid assessment of substrate specificity of the cleavage by polypeptides. For example, it is possible to use the assay systems described above to screen libraries of polypeptides comprising mutated amino acid sequences in the EN region of the TRAS-like family, such as TRAS1 EN.
The polypeptides of the present invention are characterized in that they cleave DNAs comprising the telomeric repeat sequences. Alternatively, the polypeptides of the present invention are characterized in that they cleave the telomeric repeat sequences in DNAs. The polypeptides of the present invention and the DNAs encoding the polypeptides can be used for cleaving the telomeric repeat sequences. The use of cleavage of the telomeric repeat sequences includes arbitrary use that comprises the process of cleaving the telomeric repeat sequences. The polypeptides of the present invention can be used as polypeptides for cleaving the telomeric repeat sequences in DNAs; the DNAs encoding the polypeptides of the present invention can be used as DNAs for cleaving the telomeric repeat sequences in DNAs. The present invention also provides uses of the polypeptides of the present invention and DNAs encoding the polypeptides of the present invention for the purpose of cleaving the telomeric repeat sequences.
The polypeptides of the present invention and the DNAs encoding the polypeptides can be used suitably, for example, for cleaving telomeric repeat sequences comprising AGG. Alternatively, the polypeptides of the present invention and the DNAs encoding the polypeptides can be used suitably for cleaving telomeric repeat sequences particularly comprising GTTAG. Such telomeric repeat sequences include, for example, the "5'-TTAGG/5'-CCTAA" repeat sequence that is a telomeric repeat sequence in insects and others (Okazaki, S. et al., 1993, Mol. Cell. Biol. 13: 1424-1432; Sahara, K. F. et al., 1999, Chromosome Res. 7: 449-460), and the "5'-TTAGGG/5'-CCCTAA" repeat sequence that is a telomeric repeat sequence in many species including human and other vertebrates (Meyne, J. et al., 1989, Proc. Natl. Acad. Sci. USA 86: 7049-7053).
Preferred polypeptides encoded by DNAs of the present invention are polypeptides cleaving the telomeric repeat sequences derived from vertebrate and/or arthropod. Such polypeptides can be used to cleave the telomeric repeat sequences derived from vertebrate and/or arthropod. Specifically, for example, the polypeptides encoded by DNAs of the present invention include those cleaving the sequences comprising the "5'-TTAGG/5'-CCTAA" repeats and/or the "5'-TTAGGG/5'-CCCTAA" repeats. More preferably, the polypeptides encoded by DNAs of the present invention are polypeptides cleaving the human telomeric repeat sequences [sequences comprising the (5'-TTAGGG/5'-CCCTAA) repeats]. The polypeptides can be used to cleave the human telomeric repeat sequences.
The polypeptides encoded by DNAs of the present invention may also include, for example, polypeptides having activity of at least cleaving the pyrimidine-purine junction of telomeric repeat sequences. The polypeptides of the present invention may also be, for example, polypeptides cleaving the T-A junction of telomeric repeat sequences.
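To make the notions of "pyrimidine-purine junction" and "T-A junction" concrete, the sketch below lists where such junctions occur in a tract of telomeric repeats. It only locates candidate dinucleotide steps in the sequence; it does not model the enzyme, and the function name and position convention (number of nucleotides 5' of the junction) are assumptions for illustration.

```python
PYRIMIDINES = "CT"
PURINES = "AG"


def junction_positions(seq: str, left: str, right: str) -> list:
    """Positions p such that base p (1-based) is in `left` and base p+1 is in `right`.

    Equivalently, p nucleotides lie 5' of the junction.
    """
    s = seq.upper()
    return [i + 1 for i in range(len(s) - 1) if s[i] in left and s[i + 1] in right]


if __name__ == "__main__":
    insect_top = "TTAGG" * 2      # 5'-TTAGG repeats
    insect_bottom = "CCTAA" * 2   # 5'-CCTAA repeats on the opposite strand
    print(junction_positions(insect_top, "T", "A"))
    print(junction_positions(insect_top, PYRIMIDINES, PURINES))
    print(junction_positions(insect_bottom, "T", "A"))
```

In both the TTAGG and CCTAA strands the only pyrimidine-purine step is the T-A step, so each repeat unit offers exactly one such junction per strand.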
The DNAs encoding the polypeptides of the present invention can be used to cleave the telomeric repeat sequences in cells as well as to produce the polypeptides of the present invention in vivo and in vitro. The cleavage of telomeric repeat sequences in the chromosome of cells can result in suppression of cell proliferation. The DNAs of the present invention can be used for gene therapy to treat cell-proliferative diseases, for example, cancer. There is no limitation on the type of DNAs of the present invention, as long as they encode the polypeptides of the present invention; the DNAs can be cDNAs synthesized from RNA, genomic DNAs, chemically synthesized DNAs, or such. The DNAs can also include DNAs comprising an appropriately altered nucleotide sequence due to the degeneracy of genetic code, which is derived from the amino acid sequence of polypeptides of the present invention, as long as they encode the polypeptides of the present invention.
The present invention also provides vectors in which the DNAs of the present invention have been inserted and host cells comprising such a vector (transformants). In the present invention, there is no limitation on the types of vector and host cell. The vector of the present invention can be used to allow host cells to contain the DNAs of the present invention and to express the polypeptides of the present invention in the host cells as well as to amplify DNAs of the present invention. Systems for producing the polypeptides include in-vitro and in-vivo production systems. The present invention also provides hosts in the systems for producing the polypeptides.
For example, when the host is E. coli, vectors include plasmid vectors (for example, pUC18 and pUC19), phagemid vectors (pBluescript (Invitrogen), etc.), phage vectors (for example, Lambda ZAP II, SurfZAP, ZAP Express (STRATAGENE)) and cosmid vectors (for example, pWE15, SuperCos I (STRATAGENE)). E. coli expression vectors include, for example, pCAL system (STRATAGENE) and pET16 (Novagen). To express the polypeptides of the present invention, the polypeptides can be secreted to the periplasm or the outside of cells by adding an appropriate secretory signal to the DNAs. When producing the polypeptides into E. coli periplasm, the secretory signal sequence to secrete the polypeptides can be any of the signal sequences of ompA, pelB, and LTB. In addition to E. coli, Bacillus subtilis and others can be used as the host cell. The introduction of the vector into the host can be achieved by a known transformation method, such as the calcium chloride method and electroporation.
When yeast is used as the host, the expression vector system includes, for example, YEX Yeast Expression Systems (Clontech), MATCHMAKER Yeast Expression Vectors (Clontech), ESP Yeast Protein Expression and Purification System (for Schizosaccharomyces pombe; STRATAGENE), and pESC Vectors (for Saccharomyces cerevisiae; STRATAGENE). When the expression is carried out in Pichia, pPIC, pGAP (Invitrogen), or such can be used. These vectors can be transformed into host cells, for example, by the lithium chloride method.
When using an insect or an insect cell as the host, the expression systems include, for example, systems using BacPAK™ Baculovirus Expression System, BacPAK™ Rapid Titer Kit, pBacPAK8 Transfer Vector, pBacPAK9 Transfer Vector, pAcUW31 Transfer Vector, and BacPAK6 Viral DNA (Clontech); and Bac-to-Bac™ baculovirus expression systems using pFASTBac™ 1, pFASTBac™ HTa, b, c, and pFASTBac™ DUAL (GIBCO-BRL). The host cells include cell lines such as BmN, Sf9, Sf21 and Tn5; and larvae of silkworm.
When the host is a plant cell, the foreign DNAs can be expressed with an expression vector using the CaMV 35S promoter or such. Methods of introducing DNAs into cells include the Agrobacterium method, particle gun method, electroporation, and polyethylene glycol method. A plant that expresses the polypeptides of the present invention can be grown from plant cells by regeneration. Plant cells into which foreign DNAs are introduced generally include, but are not limited to, those of Arabidopsis, tobacco and rice.
There are many usable expression vector systems for mammalian cells. For example, such systems include pCMV-Script and pCMV-Tag (STRATAGENE), pSG5 (STRATAGENE), ecdysone-inducible vectors (for example, pERV3, pEGSH: STRATAGENE; pIND system vector: Invitrogen), pcDNA3.1 system vector (Invitrogen) and MATCHMAKER vector (Clontech). There is no limitation on the type of host cell; such cell lines include, for example, CHO cell, COS cell, 3T3 cell, myeloma, BHK cell, HeLa cell and Vero cell. It is also possible to use primary culture cells and Xenopus oocytes. The vectors can be introduced into host cells, for example, by the calcium phosphate method, DEAE-dextran method, cationic liposome method, electroporation, lipofection, or such. Polypeptides can also be expressed in vitro by using an in-vitro translation vector (for example, pSPUTK; STRATAGENE).
When the DNAs of the present invention are expressed in eukaryotic cells, the produced polypeptides act on the chromosomal DNAs in the nucleus of the host cells, and therefore cleave the telomeric repeat sequences. This can lead to the loss of chromosomal telomeric repeat sequences and thus inhibition of telomere function. Indeed, it has been demonstrated that cell death and suppression of cell proliferation are induced in cells in which the original length of telomeric repeat sequences is hardly maintained because of mutations in the telomerase system (Hahn, W. C. et al., Nat. Med., 1999, 5: 1164-70; Herbert, B. et al., Proc. Natl. Acad. Sci. USA, 1999, 96: 14276-81). Eukaryotic cells in which the vector of the present invention has been introduced are useful particularly as a cell model for senescence. The use of such cells allows screening for agents for compensating for the loss of telomeric repeat sequences due to the expression of DNAs of the present invention, and agents for suppression of chromosomal destabilization, cell proliferation, cell death, or such; thereby, it is expected that agents to prevent aging or rejuvenating agents are obtained. Preferred host cells to be used as a senescence model include mammalian cells; particularly preferred are human cells. Not only cell lines but also cells unestablished as cell lines, for example, primary culture cells, can be used preferably for this purpose; normal cells can also be used preferably.
A cancer cell can be used with special benefits as the host cell. As described above, generally, the elongation of the chromosomal telomeric repeat sequences by telomerase is considered to contribute to the maintenance of cell proliferation in cancer cells. The DNAs of the present invention are expressed in cancer cells to cleave the chromosomal telomeric repeat sequences; this can result in suppression of cancer cell proliferation and recovery of contact inhibition to suppress the malignant alteration. Thus, cancer cells in which the DNAs of the present invention have been introduced are useful as a model for cancer therapy. Such cancer cells that can be used for this purpose include known cell lines, cells clinically isolated from living tissues, cells artificially cancerated or changed malignantly by introducing a carcinogen or oncogene, and so on.
By the use of the transgenic technique described below, the DNAs of the present invention can be expressed in the body of an individual.
The host cells of the present invention include sells in the body of an individual. When using mammalian species as the host, such hosts include goat, pig, sheep, mouse, rat, rabbit, bovine, etc.
It is also possible to induce the expression of DNAs of the present invention by external stimulation. For example, a tetracycline-inducible promoter, ecdysone-inducible promoter, temperature-sensitive or -inducible promoter, Cre-loxP system, or such can be used for this purpose.
To achieve in-vivo expression of the DNAs of the present invention in an animal body, the vector of the present invention can be administered in combination with a transfection reagent such as cationic lipid and liposome. Alternatively, the naked DNAs can be administered directly. A viral vector can also be used for this purpose. Such viral vectors include retrovirus (for example, pFB and pFB-Neo, STRATAGENE), adenovirus (for example, pShuttle, Clontech), adeno-associated virus, lentivirus, Sendai virus, SV40, vaccinia virus and Epstein-Barr virus. In particular, since viral vectors can be administered easily and allow high-level expression of foreign genes in vivo, they can be used preferably for gene therapy using the DNAs (or RNAs) of the present invention. The administration of such DNAs to the body can be achieved by either an ex-vivo method or an in-vivo method. In the in-vivo method, the DNAs (or the transcriptional product thereof) of the present invention or a vector comprising them can be administered directly into the body. In the ex-vivo method, the DNAs are introduced into cells outside the body, and then the cells are administered into the body. In the ex-vivo method, for example, it is possible to administer cells producing a viral vector capable of expressing the polypeptides of the present invention.
In the case of local administration to a target tissue, vectors or cells can be administered directly to the target tissue or via fiberscope or catheter. Alternatively, the vector can be introduced into the target tissue using a carrier capable of delivering the vector to the particular tissue. This method allows specific expression of the DNAs of the present invention in tumor cells or such.
The polypeptides of the present invention can be prepared from the host into which an expression vector for DNAs of the present invention has been introduced, or from the culture supernatant thereof. The polypeptides of the present invention can be purified to substantial homogeneity by known techniques. Such polypeptides can be separated and purified, for example, by appropriately selecting and using in combination: ammonium sulfate precipitation; various chromatographic procedures, such as ion-exchange chromatography, reverse-phase chromatography, hydrophobic chromatography, affinity chromatography and gel filtration; filtration; ultrafiltration; salting out; solvent precipitation; solvent extraction; distillation; immunoprecipitation; SDS-polyacrylamide gel electrophoresis; isoelectric focusing; dialysis; recrystallization, and so on. Chromatographic methods also include high performance liquid chromatography (HPLC) and fast protein liquid chromatography (FPLC). It is possible to use affinity chromatography in which an antibody against the polypeptides of the present invention or against a tag linked to the polypeptides has been immobilized in a column.
The antibody to be used for affinity purification may be a polyclonal antibody or a monoclonal antibody. The polypeptides of the present invention can also be prepared by in-vitro translation.
The polypeptides of the present invention can be expressed in the form in which the polypeptides are attached to a tag peptide or fused with other polypeptides. Such fusion with a tag or other polypeptides allows not only convenient recovery and purification but also easy detection of the polypeptides of the present invention.
For example, polypeptides containing a His-tag can be purified with a nickel column. Other known tags, such as HA tag, FLAG tag, c-Myc tag, and HSV tag, can be linked to the polypeptides. The fusion partner polypeptides include glutathione S-transferase (GST), maltose-binding protein (MBP), β-galactosidase, and GFP (green fluorescent protein). For example, a GST-fusion polypeptide can be purified using a glutathione column.
The DNAs of the present invention, the vector of the present invention, and the polypeptides of the present invention can be used as agents for cleaving DNAs comprising the telomeric repeat sequences.
The DNAs of the present invention, the vectors of the present invention, or the polypeptides of the present invention can be combined with a known carrier and solvent (for example, physiological saline, pH buffer, stabilizer, preservative, suspension, salt, etc.) to prepare a composition. An agent for cleaving DNAs comprising the telomeric repeat sequences, which comprises as the active ingredient the DNAs of the present invention, the vectors of the present invention, or the polypeptides of the present invention, can be used as a reagent for cleaving the telomeric repeat sequences. When such a cleaving agent is used for cells, the telomeric repeat sequences in the chromosomes in the cells are cleaved. This allows the suppression of cell proliferation, which can induce cell senescence and cell death. An agent of the present invention for cleaving DNAs comprising the telomeric repeat sequences can be used as a pharmaceutical. The DNAs of the present invention, the vectors of the present invention, and the polypeptides of the present invention can be administered as pharmaceutical compositions that are formulated by a known pharmaceutical method. For example, they can be combined with a pharmaceutically acceptable carrier or solvent, specifically sterile water, physiological saline, salt, vegetable oil, stabilizer, preservative, suspension, emulsifier, and others to formulate a pharmaceutical composition. For example, the polypeptides of the present invention can be combined with liposome, cationic lipid, or such to prepare a composition to be introduced into cells. The pharmaceutical composition of the present invention can be used to induce cell senescence and/or cell death. In particular, the pharmaceutical composition of the present invention is useful to prevent and treat cell-proliferative diseases, specifically cancer or such.
In particular, the vectors comprising the DNAs of the present invention and the compositions comprising the vectors can be used for gene therapy because they make it easy to supply the polypeptides of the present invention locally or systemically. Gene therapy using the vector of the present invention and the pharmaceutical composition comprising the vector can be achieved, for example, by an in-vivo method or ex-vivo method. The vector to be used for this purpose includes, for example, viral vectors, such as retroviral vector, lentiviral vector, adenoviral vector, and adeno-associated viral vector.
When the agent of the present invention for cleaving DNAs comprising the telomeric repeat sequences is administered into the body as a pharmaceutical, it can be administered locally or systemically, typically by a method known to those skilled in the art, for example, intra-arterial injection, intravenous injection, subcutaneous injection, or intramuscular injection. The agent can be administered locally via syringe, catheter, fiberscope or such.
The dosage varies according to the body weight and age of a patient, method of administration, condition, efficiency of the gene transduction, expression level, metabolizing rate, and such, but one skilled in the art can suitably select the dosage considering them. The administration can be carried out once or several times.
The vector comprising the DNAs of the present invention can be administered, for example, according to a known gene therapy protocol.
This invention also relates to a transgenic non-human vertebrate retaining the DNA of this invention in an exogenously expressible state. Such animals include animals expressing the exogenously retained DNA of this invention, and animals in which the expression can be induced. Expression may be systemic or may be cell, tissue, or organ specific. It may also be period specific.
Furthermore, the animals may be those in which expression is induced by external stimuli, or those in which expression is altered (induced or suppressed) in a later generation due to crossing. The transgenic animal of this invention is especially useful as an aging model. These transgenic animals may be utilized for molecular analyses of the aging mechanism and for development of preventive drugs for aging.
For the transgenic animals of this invention, the use of mammals, especially rodents such as mice and rats, is preferred.
The transgenic non-human vertebrate of this invention can be produced, for example, by transfecting a vector expressing the DNA of this invention into a fertilized egg. Transfection of DNA can be performed by mixing the vector and the egg and then treating the mixture with calcium phosphate, or by microinjection under an inverted microscope. Alternatively, the vector of this invention can be transfected into an embryonic stem cell (ES cell) and the selected ES cell can be introduced into a fertilized egg (blastocyst) by microinjection (Gordon, J. W. et al., 1980, Proc. Natl. Acad. Sci. USA 77: 7380-7384). The obtained fertilized egg can be transplanted into the oviduct of a recipient that has undergone pseudopregnancy by mating with a vas deferens-ligated male individual, and an offspring can be obtained. By preparing DNA from the tail of the offspring, retention of the transfected DNA can be confirmed by PCR (Katsuki, M. ed. "Hassei Kogaku Jikken Manyuaru (Developmental Engineering Experiment Manual)" Kodansha (1989); Japanese Biochemical Society ed. "Shin Seikagaku Jikken Koza Dobutsu Jikkenho (New Biochemistry Experiments, Animal Experiment Methods)" Tokyo Kagaku Dojin (1991); Hogan, B. et al., 1986, "Manipulating the mouse embryo", in "A laboratory manual", Cold Spring Harbor Laboratory Press). From chimeric animals in which gene transduction has been achieved in a germ cell line, heterozygotes can be obtained by mating with a normal animal. A homozygote can be obtained by mating heterozygotes.
Examples of promoters used to express a polypeptide of this invention in vivo are systemically expressing promoters, as well as tissue-specific and period-specific promoters (Sauer, B., 1998, Methods 14: 381-92).
When producing a transgenic animal expressing a DNA of this invention in a site-specific or period-specific manner, the Cre-loxP system and the FLP system of yeast may be used. For example, a transgenic animal having a Cre recombinase gene downstream of a site-specific or period-specific promoter is produced, and separately, a transgenic animal retaining a vector in which DNA encoding a polypeptide of this invention is linked downstream of a universal promoter is prepared. A stop codon or a transcription termination signal and such, flanked by loxP sites, is placed between the promoter and the DNA encoding the polypeptide of this invention. By crossing the two individuals, the polypeptide of this invention can be expressed along with Cre (Sauer, B., 1998, Methods 14: 381-92).
Brief Description of the Drawings
Fig. 1 shows clustering structures of different families of TRAS and SART in isolated clones of B. mori. Schematic structures of eight clones of λ phage containing B. mori genomic fragments screened using (TTAGG)n, and sub-clones of plasmids derived from each phage clone, are shown. Solid triangles represent telomeric repeats in the direction of 5'-(TTAGG)n-3'. The transcription of the retrotransposon unit in the direction of 5' to 3' (i.e. the ORF direction) is shown by an open arrow. "N.D." represents an undetermined sequence. Six different TRAS family members (TRAS1, 3, 4, Y, Z, and W) and two SART family members (SART1 and 2) are inserted into the telomeric repeats (depicted by solid triangles) in opposite directions and are tandemly clustered in the sub-telomere region of B. mori.
Fig. 2 shows a comparison of the whole unit of TRAS3 and TRAS1.
Restriction maps of TRAS3 and TRAS1 are shown on each line.
Abbreviations of restriction sites used in the map are as follows: E, EcoRI; H, HindIII; X, XbaI; S, SalI; V, EcoRV; K, KpnI; B, BamHI; P, PstI; Xh, XhoI; Sc, SacI. The restriction sites shown in parentheses are missing in the data obtained by genomic Southern hybridization. (CCTAA)n, telomeric repeat; An, poly A tail. Solid box and open circle indicate dinucleotide (CA) repeats in the 5'-UTR and poly A, respectively. ORF1 (gag-like ORF) and ORF2 (pol-like ORF) are indicated by open boxes. Vertical lines in the ORFs show cysteine-histidine motifs (zinc finger domains, ZFs). Endonuclease domain (EN), Myb-like DNA binding domain (Myb), reverse transcriptase region (RT), and RNaseH region (R/H) are also shown in the figure (also see Examples). The regions of TRAS3 fragments (Probe 1 and 2) used as probes for genomic hybridization are shown by solid lines.
The amino acid sequences for the putative frame-shift region between two ORFs are shown below the ORF structures. The TGCTAA sequence (boxed in the figure) is conserved near the C-terminal end of the gag-ORF of both TRAS family members.
Fig. 3 shows junction regions between TRAS members and (CCTAA)n telomeric repeats in clones encoding TRAS members. (A) The 5'-end sequences of TRAS members. Most of the TRAS sequences start at the same position, i.e. at the nucleotide C just after the CC sequence in the telomeric repeat (CCTAA). The consensus sequence of TRAS1 is shown at the bottom. Dots denote nucleotides identical with those in the pBT3-3 (TRAS3) clone. Hyphens indicate gaps introduced for the alignment. Nucleotide sequences of the 5' terminal region (AGTCTGC and its derivatives) and those around the +100 site (AAGTG and its derivatives), which are conserved among TRAS family members, are underlined. These are putative sequences involved in the transcription initiation of non-LTR retrotransposons (see Examples).
(B) The 3'-end sequences of TRAS elements. Dots denote nucleotides identical with TRAS3-1. Each end of the TRAS clone had poly (A) tails of various lengths. The extreme 3' end of TRAS1 (41 bp*) is not shown in the figure. "N.D.", undetermined. Nucleotide sequences (except for the telomeric repeats) of the 5'-end junction region of pBT3-3, pB1, pSF4, pTAB2, pBT-HK, pX5-1, pSau, pB2, and pB2-L are shown in SEQ ID NOs: 15 to 23, respectively. Nucleotide sequences (except for poly A and telomeric repeats) of the 3'-end junction region of TRAS3, p3-3L, pX5-3, pTAB2, and pB2-L are shown in SEQ ID NOs: 24 to 28, respectively.
Fig. 4 shows telomeric sequence-specific retrotransposons of the silkworm.
(A) Chromosomal structure of the telomere of the silkworm Bombyx mori. Triangle indicates the telomeric repeat sequence (CCTAA/TTAGG)n. Open boxes indicate telomere-specific retrotransposons TRAS and SART, which are inserted into the telomeric repeats in opposite directions. Arrows show the directions of transcription by these members. The 5'- and 3'-junction regions between TRAS1 and the telomeric repeat are shown below. Since three possible 3' junction sites are envisioned from the sequence information, the precise boundary between poly A and telomeric repeat cannot be specified. Dotted boxes indicate possible junction sites.
(B) Schematic structure of TRAS1. Open reading frames (ORFs) and untranslated regions (UTRs) of TRAS1 are depicted by open boxes.
ORF2 (from nucleotide 3788 to 7462) overlaps ORF1 (from nucleotide 2434 to 3813) with a shift of only the +1 reading frame. Vertical lines near the C-terminus of both ORFs represent cysteine-histidine motifs. Endonuclease (EN) domain and reverse transcriptase (RT) domain are also indicated. The amino acid sequence of TRAS1 EN expressed in E. coli is shown below (SEQ ID NO: 37). The N-terminal His tag sequence derived from pET-16b is boxed. The putative active site (His-258) for endonuclease activity, which was changed to Ala in the mutagenesis experiment, is underlined.
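The +1 frame relationship stated for TRAS1 can be checked from the coordinates given in the Fig. 4(B) legend. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive; the function names are hypothetical and introduced only for illustration.

```python
# Coordinates quoted in the Fig. 4(B) legend (assumed 1-based, inclusive).
ORF1_START, ORF1_END = 2434, 3813
ORF2_START = 3788


def overlap_length(end_a: int, start_b: int) -> int:
    """Nucleotides shared by two ORFs when ORF B starts before ORF A ends."""
    return max(0, end_a - start_b + 1)


def frame_offset(start_a: int, start_b: int) -> int:
    """Reading-frame offset (0, 1 or 2) of ORF B relative to ORF A."""
    return (start_b - start_a) % 3


if __name__ == "__main__":
    print("overlap (nt):", overlap_length(ORF1_END, ORF2_START))
    print("frame offset:", frame_offset(ORF1_START, ORF2_START))
```

Under these assumptions the two ORFs share 26 nucleotides and ORF2 lies in the +1 frame relative to ORF1, which agrees with the legend.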
Fig. 5 shows the expression and nicking activities of TRAS1 EN.
(A) Purified TRAS1 EN and mutant protein H258A were separated on an SDS-8% PAGE gel, and stained with Coomassie blue. Identical bands could be observed at the estimated molecular weight of 30.3 kDa. (B) The nicking activity of various amounts of TRAS1 EN. Double-stranded 32P-(TTAGG)5 substrate was digested with TRAS1 EN (see Fig. 6), and the radioactivity of the cleaved product of 17 bp (5'-32P-(TTAGG)3-TT-3') was quantified with the BAS system (see Fig. 8). Reactions were carried out at 25°C for 60 min. The percentage of the cleaved products relative to the initial substrates was plotted against the increasing amount of TRAS1 EN (μg). (C and D) Influence of pH and temperature on TRAS1 EN endonuclease activity. The pH influence was tested in two buffer systems. Namely, two reaction solutions containing 10 mM NaCl, 2 mM MgCl2, 100 μM BSA and 0.2 μg of TRAS1 EN were used together with 50 mM PIPES-HCl for pH 5.5 to 7.0, and 50 mM HEPES-NaOH for pH above 7.0. Quantitative titration of TRAS1 EN activity in (C) and (D) was performed as in (B).
Fig. 6 shows the nicking activity of TRAS1 EN for a double-stranded telomeric repeat sequence. (A) Activity for double-stranded (TTAGG)5 substrates. 32P-labeled (TTAGG)5 (also called the "G-strand" or "bottom strand") was annealed to the complementary non-labeled (CCTAA)5 (also called the "C-strand" or "top strand").
Double-stranded (TTAGG/CCTAA)5 was treated with no protein (lane 1), 0.2 μg of H258A mutant protein (lane 2), or 0.2 μg of TRAS1 EN (lane 3) at 25°C for 60 min. The resulting fragments were separated on a 28% denaturing gel. (B) Activity for single-stranded (TTAGG)5 substrates. 32P-labeled (TTAGG)5 oligonucleotides were prepared, and then treated with TRAS1 EN under the same conditions as in (A). ds, double-stranded substrates; ss, single-stranded substrates.
Fig. 7 shows the cleavage activity of TRAS1 EN for the vertebrate telomeric repeat (TTAGGG)n. Bottom strands were end-labeled and annealed to the complementary strands. Double-stranded oligo DNAs were treated without (-) or with (+) 0.2 μg TRAS1 EN as in Fig. 6 and then separated on a 28% denaturing gel. [Left part of the panel] Single-stranded (TTAGG)5 were prepared, and incubated with no proteins (lane 1), or with the TRAS1 EN protein (lane 2). [Middle part of the panel] Double-stranded (TTAGG)5 were incubated with no protein (lane 3), with 0.2 μg H258A mutant protein (lane 4), or with TRAS1 EN protein (lane 5). [Right part of the panel] Double-stranded vertebrate telomeric repeats (TTAGGG)5 were also incubated with no proteins (lane 6) or with the TRAS1 EN protein (lane 7). ss, single-stranded DNA; ds, double-stranded DNA.
Fig. 8 shows specific cleavage on both strands of telomeric repeats by TRAS1 EN. The end-labeled (TTAGG)5 (bottom strand) and (CCTAA)5 (top strand) were each annealed to its complementary oligonucleotide. Reactions with TRAS1 EN were conducted for 60 min at 25°C as in Fig. 6, except that twice the amount of TRAS1 EN (0.4 μg) was added. Various lengths of single-stranded oligonucleotides were end-labeled and loaded for electrophoresis as molecular weight standards (dG10, 32P-5'-TTAGGTTAGG; dG12, 32P-5'-TTAGGTTAGGTT; dG15, 32P-5'-TTAGGTTAGGTTAGG; dG17, 32P-5'-TTAGGTTAGGTTAGGTT; dC10, 32P-5'-CCTAACCTAA; dC12, 32P-5'-CCTAACCTAACC; dC15, 32P-5'-CCTAACCTAACCTAA; dC17, 32P-5'-CCTAACCTAACCTAACC). The end-labeled strand is indicated at the top. The nucleotide sequences around the nicking sites are shown alongside the lanes. The major cleavage sites are shown by arrows.
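The size standards listed for Fig. 8 are 5' prefixes of the two repeat strands, so they can be regenerated from the repeat unit alone. The sketch below is illustrative only; `reverse_complement` and `labeled_fragment` are hypothetical helper names, and the assumption is that a 5'-end-labeled strand nicked after n nucleotides yields a labeled product equal to its first n nucleotides.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def labeled_fragment(strand: str, length: int) -> str:
    """Labeled product of a 5'-end-labeled strand nicked after `length` nt."""
    return strand[:length]


bottom = "TTAGG" * 5  # G-strand, (TTAGG)5
top = "CCTAA" * 5     # C-strand, (CCTAA)5

# Marker sets named as in the Fig. 8 legend (dG10 ... dG17, dC10 ... dC17).
markers = {}
for n in (10, 12, 15, 17):
    markers[f"dG{n}"] = labeled_fragment(bottom, n)
    markers[f"dC{n}"] = labeled_fragment(top, n)

if __name__ == "__main__":
    print("strands complementary:", reverse_complement(bottom) == top)
    for name in sorted(markers):
        print(name, markers[name])
```

The 17-nucleotide G-strand marker ends in TT and the 12-nucleotide C-strand marker ends in CC, matching the cleaved products named in the legends of Figs. 5 and 9.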
Fig. 9 shows the kinetics of the cleavage reaction of TRAS1 EN.
(A and B) Pattern change of the 5-nucleotide ladder during the reaction for 60 min. TRAS1 EN was incubated with double-stranded substrates 32P-(TTAGG)5/(CCTAA)5 or 32P-(CCTAA)5/(TTAGG)5, under the same conditions as in Fig. 6 at 25°C. Note that the bottom-strand cleavage precedes the top-strand cleavage. dG12, dG17, dC12, and dC17 were used as size markers (see Fig. 8). (C) Quantitation of the cleavage reaction shown in panels (A) and (B). The radioactivity of the 17-bp cleaved product (*5'-[TTAGG]3-TT↓-3') in the bottom-strand reaction as well as the 12-bp cleaved product (*5'-[CCTAA]2-CC↓-3') in the top-strand reaction were quantified as in Fig. 5. The percentage of nicked substrate was plotted against incubation time (minutes).
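The quantitation described for Figs. 5 and 9 expresses the radioactivity of a cleaved product as a percentage of the initial substrate. A minimal sketch of that calculation follows; the function name and all numeric readings are hypothetical placeholders, not data from the specification.

```python
def percent_nicked(product_signal: float, initial_signal: float) -> float:
    """Cleaved-product signal as a percentage of the initial substrate signal."""
    if initial_signal <= 0:
        raise ValueError("initial substrate signal must be positive")
    return 100.0 * product_signal / initial_signal


# Hypothetical time course: minutes -> signal of the cleaved product band.
INITIAL_SIGNAL = 1000.0
product_by_minute = {0: 0.0, 15: 250.0, 30: 400.0, 60: 550.0}

time_course = {
    minute: percent_nicked(signal, INITIAL_SIGNAL)
    for minute, signal in product_by_minute.items()
}

if __name__ == "__main__":
    for minute, pct in time_course.items():
        print(f"{minute:>3} min: {pct:.1f}% nicked")
```

Plotting `time_course` for each labeled strand would give curves of the kind described for panel (C).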
Fig. 10 shows target specificity of TRAS1 EN against a long DNA substrate. (A and B) In the assay for site-specificity, 60 bp of a non-telomeric repeat sequence was randomly selected from the pBR322 sequence (accession number J01749, position 81 to 140). A 75-bp long DNA substrate comprising a 15-bp telomeric repeat sequence (boxed region in C) flanked by 30 bp of the selected non-telomeric repeat sequence at both ends was synthesized. The entire sequence of this substrate (5'-CAGCATCCAGGGTGACGGTGCCGAGGATGACCTAACCTAACCTAACGATGAGCGCATTGTTAGATTTGATAGAGG-3' / SEQ ID NO: 65) is shown in panel C. The end-labeled 75-bp DNA was annealed to its complementary strand and cleaved with TRAS1 EN under the same conditions as in Fig. 6.
Incubation time (in minutes) at 25°C is indicated at the top of the panel. Various lengths of end-labeled oligonucleotides (b32, b37, b42, t32, t37, and t42) were electrophoresed with the DNA as exact size markers as shown in panel C. (C) Schematic representation of the cleavage sites on both top and bottom strands of the 75-bp substrate.
The nucleotide numbers are counted from the 5' end of each strand.
The solid arrowheads indicate cleavage sites on the bottom strand.
The open arrowhead marks the cleavage site on the top strand. A putative cleavage pattern on the double-stranded DNA is shown by the dotted line.
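The layout of the 75-bp substrate (30 bp of flank, 15 bp of telomeric repeat, 30 bp of flank) can be verified directly from the sequence given in the Fig. 10 legend. In the sketch below the sequence is copied from the legend as printed; reading the marker names t32 and b32 as the first 32 nucleotides of the top and bottom strands is an assumption made for illustration, and the helper names are hypothetical.

```python
# Top strand as printed in the Fig. 10 legend (SEQ ID NO: 65), 5' to 3'.
TOP = (
    "CAGCATCCAGGGTGACGGTGCCGAGGATGA"  # 30-nt flank
    "CCTAACCTAACCTAA"                 # 15-nt telomeric repeat
    "CGATGAGCGCATTGTTAGATTTGATAGAGG"  # 30-nt flank
)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def repeat_span(strand: str, repeat: str) -> tuple:
    """1-based inclusive (start, end) of the first occurrence of `repeat`."""
    index = strand.find(repeat)
    if index < 0:
        raise ValueError("repeat not found")
    return index + 1, index + len(repeat)


BOTTOM = reverse_complement(TOP)

if __name__ == "__main__":
    print("length:", len(TOP))
    print("repeat on top strand:", repeat_span(TOP, "CCTAA" * 3))
    print("repeat on bottom strand:", repeat_span(BOTTOM, "TTAGG" * 3))
    print("first 32 nt of top ends with:", TOP[:32][-2:])
    print("first 32 nt of bottom ends with:", BOTTOM[:32][-2:])
```

Because the flanks are of equal length, the repeat occupies nucleotides 31 to 45 when counted from the 5' end of either strand, so prefixes of 32, 37, and 42 nucleotides end two nucleotides into successive repeat units.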
Fig. 11 shows endonuclease activities of TRAS1 EN for mutated telomeric repeats. (A) Each nucleotide in the (TTAGG) unit of the bottom strand was substituted with a cytosine residue. Mutated nucleotides are underlined. (TTAGG)5 or the mutated double-stranded substrates were treated with TRAS1 EN as described in Fig. 6. "EN-", no TRAS1 EN; "EN+", with TRAS1 EN. (B) Nicking of the double-stranded dTATA oligonucleotide 5'-(TTAGG)2-TTATAGG-(TTAGG)2-3'. The major cleavage site on the TATA sequence is indicated schematically on the right.
dG12, a size marker (see Fig. 8).
Fig. 12 shows the result of the analyses of nucleotides involved in the cleavage reaction of TRAS1 EN. Treatment with TRAS1 EN protein was performed as described for Fig. 6. Each base from -7 to +8 (relative to the cleavage site which produces the 12-bp cleaved product) in the 25-bp (TTAGG)5 bottom strand was systematically changed to a cytosine (boxed).
The cleavage patterns around the 12-bp cleaved product in the gel are also shown on the top. "Cont.": control. Signal intensity of the 12-bp band in each reaction was quantified, and the intensity relative to that of the control is shown on the right (control = 100).
Each value represents the average of six independent experiments, and error bars represent the standard error.
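The Fig. 12 legend reports band intensities relative to a control set to 100, averaged over six experiments with the standard error as error bars. The following sketch shows one way such values could be computed; the six readings are hypothetical placeholders, and the use of the sample standard deviation divided by the square root of n for the standard error is an assumption about the method.

```python
import math
import statistics


def relative_intensity(band_signal: float, control_signal: float) -> float:
    """Band intensity expressed relative to the control (control = 100)."""
    return 100.0 * band_signal / control_signal


def mean_and_standard_error(values):
    """Mean and standard error (sample SD / sqrt(n)) of replicate values."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are required")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical relative intensities from six independent experiments.
replicates = [40.0, 50.0, 60.0, 50.0, 45.0, 55.0]
mean_value, standard_error = mean_and_standard_error(replicates)

if __name__ == "__main__":
    print(f"mean = {mean_value:.1f}, standard error = {standard_error:.2f}")
```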
Fig. 13 shows the phylogenetic tree of TRAS-like families of the silkworm and other insects and other known non-LTR retrotransposons. The tree was constructed by the neighbor-joining method using CLUSTAL W based on amino acid sequences from EN to RT domains. The region compared is shown below the tree (corresponding to regions shown in Figs. 14 to 17). Nine TRAS-like family members from lepidopteran insects (TRAS1, TRAS3, TRAS4, TRAS5, and TRAS6 from B. mori; TRASDJ from D. japonica; TRASSC3, TRASSC4, and TRASSC9 from S. cynthia) constitute a single phylogenetic group (enclosed within broken lines). Values in the figure indicate the reliability (reproducible number of times) of the tree in 1,000 trials using the bootstrap method. "1000" means that all the data obtained by the 1,000 trials were identical, suggesting that the reliability of the branch is extremely high. In general, a value of about 80% (about "800" in 1,000 trials) is the standard of reliability, and the reliability of the branch is thought to be relatively low where the value is below 50% ("500" in 1,000 trials).
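The bootstrap values in Fig. 13 are counts out of 1,000 trials, with about 80% treated as the standard of reliability and values below 50% as relatively low. A minimal sketch of that reading follows; the function names and the label "intermediate" for values between the two thresholds are assumptions added for illustration.

```python
def bootstrap_percent(count: int, trials: int = 1000) -> float:
    """Convert a bootstrap count into a percentage of trials."""
    if not 0 <= count <= trials:
        raise ValueError("count must lie between 0 and the number of trials")
    return 100.0 * count / trials


def classify_branch(count: int, trials: int = 1000) -> str:
    """Label a branch using the 80% and 50% guideline values from the legend."""
    percent = bootstrap_percent(count, trials)
    if percent >= 80.0:
        return "reliable"
    if percent < 50.0:
        return "low"
    return "intermediate"  # label assumed here; the legend names no category


if __name__ == "__main__":
    for value in (1000, 800, 650, 499):
        print(value, f"{bootstrap_percent(value):.1f}%", classify_branch(value))
```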
Fig. 14 shows a comparison of amino acid sequences among TRAS-like family members of the silkworm and other insects and other known site-specific non-LTR retrotransposons. The amino acid sequences from EN to RT domains in ORF2 of each member were aligned using CLUSTAL W. Nine TRAS-like family members isolated from three lepidopteran insects (TRAS1, TRAS3, TRAS4, TRAS5, and TRAS6 from the silkworm (B. mori); TRASDJ from D. japonica; TRASSC3, TRASSC4, and TRASSC9 from S. cynthia) are shown. R1Bm is a site-specific retrotransposon integrated into 28S rDNA of B. mori. SART1 is a telomeric repeat-specific retrotransposon of B. mori. RT1Ag is integrated specifically into another site of 28S rDNA of a mosquito (A. gambiae).
TARTDm is the telomere-forming retrotransposon of D. melanogaster.
L1Hs is a human retrotransposon that has preferable target sequences.
Hyphens indicate gaps introduced for alignment. The EN and RT domains are indicated in the figure. In the boxes enclosing amino acid residues, the residues conserved among TRAS-like family members are shown by narrow open boxes (marked by asterisks). The residues conserved among retrotransposons are shown by shaded boxes (marked by dots). In the EN domain, regions which are highly conserved only among TRAS-like families, A (EN-A) and B (EN-B), are indicated by dotted boxes. TRAS specific region (TSR) is a region that shows significant homology only among the TRAS sequences. These regions may be involved in the site-specificity of TRAS. There are three helices in the Myb-like sequence that is highly conserved basically only among TRAS-like family members. Although the helices may be formed in other retrotransposons, judging from the amino acid properties, the Myb-structure is highly significant among TRAS-like family members. Conserved motif structure in the RT domain is shown based on conventional nomenclature for the reverse transcriptase region (Figs. 14 to 17).
Fig. 15 shows a comparison of amino acid sequences among TRAS-like family members of the silkworm and other insects and other known site-specific non-LTR retrotransposons (Figs. 14 to 17).
Fig. 16 shows a comparison of amino acid sequences among TRAS-like family members of the silkworm and other insects and other known site-specific non-LTR retrotransposons (Figs. 14 to 17).
Fig. 17 shows a comparison of amino acid sequences among TRAS-like family members of the silkworm and other insects and other known site-specific non-LTR retrotransposons (Figs. 14 to 17).
Fig. 18 shows the amino acid sequence alignment of the putative Myb domain. Upper section shows several known Myb DNA-binding motifs.
Lower section shows the putative Myb-like domain identified between the EN and RT domains of TRAS members and R1Bm. RAP1Sc is a telomere-binding protein of Saccharomyces cerevisiae. MYB is a transcriptional activator that binds to the specific DNA sequence AACYG (Biedenkapp, H. et al., 1988, Nature 335: 835-837). ENGRAILED (Drosophila) is a transcription factor that binds to TAATTA (Ades, S. E., and Sauer, R. T., 1994, Biochemistry 33: 9187-9194). The human telomeric repeat binding factor hTRF binds to TTAGGG. Important amino acid residues that form the core region of the DNA-binding domain in three helices are shown in the figure. Hydrophobic residues: black shaded; charged residues that are involved in the interhelix interaction: gray hatched; possible residues that specifically interact with DNA nucleotides: open boxed.
Best Mode for Carrying out the Invention
The present invention is specifically illustrated below with reference to Examples, but it is not to be construed as being limited thereto.
[Example 1] Identification of TRAS family members of the silkworm
Multiple TRAS family members inserted within the telomeric repeats of the silkworm were identified and characterized as shown below. When 32P-labeled telomeric repeats (TTAGG)5 of the silkworm were used as a probe for screening a genomic library of the silkworm (λ EMBL-3 phage library), about 0.5% of the total plaques showed positive signals. These clones include the internal telomeric repeats, which lie inside a 7 kb to 8 kb stretch of (TTAGG/CCTAA)n at the extreme ends of chromosomes. Most of the positive clones screened by 32P-labeled (TTAGG)5 include poly (A), which is a hallmark of non-LTR retrotransposons, suggesting that they are clones of various classes of telomeric repeat-associated retrotransposons.
From 100 positive clones, several clones were selected and sub-cloned into plasmid vectors for sequence analysis (Fig. 1). From the analysis of phage clones containing (TTAGG)n, six new family members were identified based on the structural difference in the junction regions between the 5'-end of retrotransposons and (TTAGG/CCTAA)n. These six family members can be classified into two large groups (Figs. 1 and 3A). Five members, named TRAS3, TRAS4, TRASY, TRASW, and TRASZ, were oriented proximal to the centromere and were inserted between the A and C nucleotides of (CCTAA)n. In contrast, one member, named SART2, was oriented in the reverse direction of the TRAS groups and was inserted between the T and A nucleotides of (TTAGG)n, similar to SART1. Restriction mapping, partial sequencing, and hybridization studies on these isolated phage clones revealed that different classes of TRAS and SART family members are clustered between short stretches of telomeric repeats. Based on this analysis, it was estimated that there are more than 2000 copies of TRAS and SART (SART1, 600; TRAS1, 300; TRAS3, 300; other TRAS and SART, 50 to 200) per haploid genome, each of which is about 7.5 kb on average, occupying about 3% of the silkworm genome. As shown schematically in Fig. 1, the sub-telomeric region near the chromosomal end of the silkworm may consist of alternate tandem arrays of retroelements and short (TTAGG/CCTAA)n sequences.
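The copy-number estimate in Example 1 can be followed as simple arithmetic. The sketch below uses the lower bound of 2000 copies, the average element length of about 7.5 kb, and the stated genome share of about 3%; the haploid genome size it derives is only what those three figures imply together and is not a value given in the text. The function names are hypothetical.

```python
def total_element_length_kb(copies: int, mean_length_kb: float) -> float:
    """Combined length of all retrotransposon copies, in kilobases."""
    return copies * mean_length_kb


def implied_genome_size_kb(element_total_kb: float, genome_percent: float) -> float:
    """Genome size implied when the elements occupy `genome_percent` of it."""
    return element_total_kb * 100.0 / genome_percent


COPIES = 2000          # "more than 2000 copies", used here as a lower bound
MEAN_LENGTH_KB = 7.5   # "about 7.5 kb on average"
GENOME_PERCENT = 3.0   # "about 3% of the silkworm genome"

total_kb = total_element_length_kb(COPIES, MEAN_LENGTH_KB)
genome_kb = implied_genome_size_kb(total_kb, GENOME_PERCENT)

if __name__ == "__main__":
    print(f"TRAS and SART elements: about {total_kb / 1000:.0f} Mb in total")
    print(f"implied haploid genome: about {genome_kb / 1000:.0f} Mb")
```

With these inputs the elements total roughly 15 Mb, which corresponds to about 3% of a genome on the order of 500 Mb.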
[Example 2] Complete unit of TRAS3 As shown in Fig. 3A, the 5'-end regions of TRAS3 were highly conserved in many sub-clones isolated from multiple ~. clones. To clarify the structural features of TRAS3 , whichpxesuznably constitutes a major class of TRAS, the sequence of the complete unit of TRAS3 was analyzed. Using as a probe the 5'-end region of TRAS3 unit in 7l, B1 that has a genomic DNA fragment of silkworm that includes telomeric repeats and TRAS1 , the phage clone ~ TRAS3-1 that includes the complete unit of TRAS3 was isolated. Several clones comprising overlapping parts of the TRAS3 unit were isolated separately and sequenced. The fu~,~. length of TRAS3 (SEQ ID NO: 3$) was 79$6 by (excluding the poly A at both ends) and included go g- and pot- like ORFs (SEQ ID NOs:
39 and 40, respectively), and ended with a poly A tail (fig. 2).
Putative functional domains encoded by these ORFs, including three zinc finger domains (CCHCs) in ORF1 (gag-like ORF) , endonuclease (ELI) domain (see below) , reverse transariptase (RT) domain, RNaseH (R/H) domain, and GGHC domains in ORF2 (pot-like ORF) , were well conserved among TRAS1 and TRAS3. The overall amino acid sequence similarity in ORF1 and ORF2 between TRAS3 and TRAS1 was 43% and 60%, respectively.
When the frame-shift region was compared between TRAS1 and TRAS3, the two ORFs of both elements overlapped, and the reading frame was shifted by one nucleotide (+1 frame-shift). Near the C-terminal end of ORF1, only the TGCTAA sequence (shown boxed in the figure) was conserved between TRAS1 and TRAS3, suggesting that this sequence may be involved in the frame-shifting mechanism (Fig. 2). The overall structures of TRAS1 and TRAS3 resemble each other, but they exist as distinct groups of retrotransposon families in the silkworm genome. Southern hybridization conducted using several parts of TRAS3 as probes showed only one or two prominent bands of silkworm genomic DNA, even though various restriction enzymes (6-bp cutters) were used for the digestion, reflecting the conserved structure of the major copies of TRAS3 (data not shown). The result of genomic Southern hybridization is consistent with the restriction map predicted from the sequence data of the TRAS3 clone, except that two restriction sites, the HindIII and SalI sites in the TRAS3 sequence (in parentheses in Fig. 2), were missing in the data based on Southern hybridization (Fig. 2). Like TRAS3, most of the copies of TRAS1 in the genome are also highly conserved in structure, and 5' truncation is likewise not seen (Fig. 3 and Okazaki, S. et al., 1995, Mol. Cell. Biol. 15: 4545-4552). Comparison of sequences and restriction maps between TRAS1 and TRAS3 demonstrates that they can be classified as distinct families of retrotransposons.
[Example 3] Multiple families of TRAS: their structural features
Fig. 3 shows sequence alignments of the junction region between the telomeric repeats and the 5'-end (A) or 3'-end (B) of the respective TRAS clones (see Fig. 1 for the original λ clones). Based on the first 120 bp of the 5'-end regions, the present inventor classified the clones into six different TRAS families. This classification should be reasonable because genomic Southern hybridization with the 5'-region of the clones of each family as probes confirmed that the structure of the genomic copies of TRASY, TRASZ, and TRASW was highly conserved, as was that of TRAS1 and TRAS3. The structure of these families is so highly conserved that most of the clones of each family showed no truncation at their 5'-ends. Like TRAS1, all of the TRAS3 clones so far analyzed, and most of the other TRAS clones, start at nucleotide G, just after CC following the (CCTAA)n telomeric repeat (Fig. 3A).
The first four nucleotides, CAGT (GAGT in TRAS1 and TRASW), the following four nucleotides (mostly CTGC), and CGTG (or CGTT in TRAS1, TRASW, and TRASZ) around +40 are widely conserved among other non-LTR retrotransposons. These sequences are thought to be essential for transcription initiation (underlined in Fig. 3A; Takahashi, H. and Fujiwara, H., 1999, Nucleic Acids Res. 27: 2015-2021). AAGTG (or AAGTG in TRASZ) around +100 was also conserved among all TRAS families. This region might be involved in transcriptional regulation or other functions.
[Example 4] Distribution patterns of TRAS-like members in insects
To confirm the existence of TRAS-like members in the silkworm and various other insect genomes, genomic Southern hybridization was performed using the silkworm TRAS1 DNA as a probe. The silkworm (Bombyx mori) Ps'" 788 strain was maintained on an artificial diet. Other insects were collected in the field in a suburban area near Tokyo in 1991 (Okazaki, S. et al., 1993, Mol. Cell. Biol. 13: 1424-1432). In most cases, genomic DNAs were prepared from whole bodies of insects by conventional procedures. Insects were frozen in liquid nitrogen and ground in a mortar into powder. The tissue powder was homogenized in a solution containing 0.1 mg/ml proteinase K (Merck), 250 mM EDTA, and 0.5% N-lauroylsarcosine. The homogenate was incubated at 50°C for 16 hr or more. Phenol extraction and phenol-chloroform extraction were each performed twice, then nucleic acids were precipitated with ethanol. After RNase treatment, DNA was extracted twice with phenol-chloroform, precipitated with ethanol, and dissolved in sterilized water. Genomic DNA of D. melanogaster was obtained from CLONTECH.
For genomic Southern hybridization, 1.5 µg of insect genomic DNA was digested with EcoRI, electrophoresed on 0.8% agarose gels, and blotted onto nitrocellulose filters (Schleicher & Schuell) in 20x SSC (3 M NaCl, 0.3 M sodium citrate) by the capillary transfer method of Southern (Southern, E. M., 1975, J. Mol. Biol. 98: 503-517).
Pre-hybridization was performed at 60°C for 1 hr in hybridization buffer (0.9 M NaCl, 90 mM Tris-HCl pH 7.9, 6 mM EDTA, 0.5% SDS, 1.6% skim milk). DNA probes were amplified by polymerase chain reaction (PCR) using the silkworm genomic DNA as a template and labeled with [α-32P]dCTP (ICN) by random priming using the BcaBEST DNA labeling kit (Takara). Hybridization was performed at 60°C in the hybridization buffer containing the probe DNA for 16 hr or more.
Filters were washed successively in 4x SSC plus 0.5% SDS, 2x SSC plus 0.25% SDS, and 1x SSC plus 0.125% SDS, at 60°C for 20 min. The washed filters were dried and exposed for autoradiography.
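The wash series above is stated in multiples of SSC. As a rough aid, the sketch below converts each wash into molar concentrations from the 20x SSC definition given in the text (3 M NaCl, 0.3 M sodium citrate). The names `ssc_molarity` and `WASHES` are hypothetical and introduced only for illustration.

```python
STOCK_FOLD = 20
STOCK_MOLAR = {"NaCl": 3.0, "sodium citrate": 0.3}  # 20x SSC as defined in the text


def ssc_molarity(fold):
    """Molar composition of an n-fold SSC solution diluted from the 20x stock."""
    return {solute: conc * fold / STOCK_FOLD for solute, conc in STOCK_MOLAR.items()}


WASHES = [(4, 0.5), (2, 0.25), (1, 0.125)]  # (SSC fold, % SDS), in wash order

for fold, sds in WASHES:
    comp = ssc_molarity(fold)
    print(f"{fold}x SSC: {comp['NaCl'] * 1000:.0f} mM NaCl, "
          f"{comp['sodium citrate'] * 1000:.0f} mM sodium citrate, {sds}% SDS")
```

The SDS-to-SSC ratio is the same in all three washes, so each step amounts to a two-fold dilution of the previous wash solution.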
Two different probes from TRAS1, probe 1 (0.6 kb, mainly containing EN) and probe 2 (1.4 kb, mainly containing Myb and RT) (see Fig. 2), were used for hybridization with EcoRI-digested genomic DNA of the respective insects from 8 different orders [B. mori (Lepidoptera); D. melanogaster, N. angusticornis, T. trigonus, S. pleuralis, and E. tenax (Diptera); S. hypocrita, H. undosus, A. rustics, M. legatus, H. affinis, and S. buprestoides (Coleoptera); T. nigricosta and H. brevis (Hemiptera); P. japonicus (Mecoptera); S. griseipennis (Trichoptera); F. scudderi (Dermaptera); and P. fuliginosa (Dictyoptera)]. The Bombyx mori genome, which includes TRAS and SART, was used as a positive control. Strong hybridization signals were observed in many insect species in Diptera (T. trigonus and E. tenax), Coleoptera (S. buprestoides), Dictyoptera (P. fuliginosa), and Hemiptera (T. nigricosta) with each of the two probes, except that E. tenax samples showed weak signals with probe 2. Under the same hybridization conditions, no signal for R1 (R1Bm) was detected with either probe. Since the EN-RT domain of TRAS1 has 42.4% identity at the nucleic acid level to the corresponding region of R1Bm (Table 1), the signals observed above may represent TRAS-like members which have higher identities (about 50% or more) to the silkworm TRAS1.
No or very weak signal was observed in Trichoptera (S. griseipennis), Diptera (N. angusticornis and S. pleuralis), Coleoptera (H. undosus), and Dermaptera (F. scudderi) genomes with either probe. The (TTAGG)n sequence was detected in the genomes of S. griseipennis and H. undosus. Therefore, the sequences resembling TRAS1 of the silkworm might have been eliminated from the genomes of these insects.
The wide distribution of TRAS-like members in many distantly related insects suggests that the progenitor of TRAS may have existed before the insects branched off. The cockroach P. fuliginosa (Blattaria, Dictyoptera) is one of the oldest insects and is speculated to have emerged during the later Carboniferous period. Horizontal transfer of non-LTR retrotransposons has not been observed so far (D. G. Eickbush and T. H. Eickbush, 1995, Genetics 139: 671-684; H. S. Malik et al., 1999, Mol. Biol. Evol. 16: 793-805). The present inventor detected the signals of TRAS in P. fuliginosa, suggesting that the TRAS-like members in this species have existed for 280 million years. The (TTAGG)n telomeric repeat sequence was found not only in insects but also in Crustacea (W. K. Klapper et al., 1998, FEBS Lett. 439: 143-146). Thus, at the beginning of the emergence of the oldest insects, or even before, the symbiotic relationship between TRAS and the target (TTAGG)n sequence may have been established and then evolved cooperatively.
[Example 5] Expression and purification of EN of a TRAS-like member
The recombinant polypeptide containing an endonuclease derived from a TRAS family member was expressed as follows.
Construction of the expression vector for TRAS1 EN
It was difficult to produce the full-length polypeptide encoded by TRAS1 ORF2 in E. coli or using a baculovirus system. The present inventor therefore produced partial polypeptides consisting of only the endonuclease (EN) domain of TRAS1 ORF2. Amino acid sequence alignments were constructed between the AP endonuclease-like domains of various non-LTR retrotransposons and the amino acid sequence of TRAS1. Based on the alignments, the EN domain of TRAS1 (TRAS1 EN) was estimated (Fig. 4). The TRAS1 EN domain was amplified by polymerase chain reaction (PCR) with Pfu Turbo™ DNA polymerase (Stratagene) using s3788 (5'-AAAAAAAACATATGCACGGCGAGCAGTGGAA-3' / SEQ ID NO: 2) and a4528 (5'-AAAAAACTCGAGttaTTTTTGGAGTCTAATATTGAATACCATACC-3' / SEQ ID NO: 3) as primers and the λ B1 clone as a template (Fig. 1). The primers were designed to contain NdeI and XhoI sites, respectively (underlined in the primers above). The amplified DNA fragment was digested with NdeI and XhoI and cloned into the NdeI and XhoI sites of the pET16b expression vector (Novagen), then the insert sequence in a candidate clone was confirmed.
The resulting plasmid, named pHisTIEN, contains the first 741 bp, corresponding to the first 247 amino acids of TRAS1 ORF2, and expresses TRAS1 EN which has an MGHHHHHHHHHHSSGHIEGRHM tag (SEQ ID NO: 1) comprising (His)10 at the N-terminus (Fig. 4B). This clone was designed to stop translation at the nucleotides TTA, the complement of the stop codon TAA (lowercase letters in the a4528 primer above).
The nucleotide sequence of the coding region of TRAS1 EN is shown in SEQ ID NO: 35, and the encoded amino acid sequence is shown in SEQ ID NO: 36. The amino acid sequence of TRAS1 EN with the additional 22 amino acids (SEQ ID NO: 1) comprising ten His residues, as expressed from pET16b, is shown in SEQ ID NO: 37.
A mutant with a point mutation in the putative active site of EN (the 258th residue, His, in SEQ ID NO: 37 changed to Ala) was generated with a QuikChange Site-Directed Mutagenesis kit (Stratagene). The primers used for constructing the mutant clone, named pHisTIEN (H258A), were s747 (5'-CGATGAAGACCTAGTGAGCTCGGATGCCAAGGGTATGG-3' / SEQ ID NO: 4) and a784 (5'-CGATACCGTTGGCATCCGAGCTGACTAGGTCTTCATCG-3' / SEQ ID NO: 5).
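The lengths quoted for the expression construct can be cross-checked against each other. The sketch below is a bookkeeping aid only: the helper names are hypothetical, and the last entry assumes that the 22-residue tag is fused directly upstream of ORF2 residue 1, which the text implies but does not state outright.

```python
from itertools import groupby

TAG = "MG" + "H" * 10 + "SSGHIEGRHM"  # the tag of SEQ ID NO: 1, spelled out from the text
INSERT_BP = 741                       # cloned TRAS1 ORF2 fragment
MUTATED_POS = 258                     # His position in the tagged protein (SEQ ID NO: 37)


def longest_run(seq, residue):
    """Length of the longest uninterrupted run of `residue` in `seq`."""
    return max((len(list(g)) for r, g in groupby(seq) if r == residue), default=0)


def construct_summary(tag=TAG, insert_bp=INSERT_BP, mutated_pos=MUTATED_POS):
    en_len = insert_bp // 3  # codons in the cloned fragment
    return {
        "tag_length": len(tag),
        "his_run": longest_run(tag, "H"),
        "en_length": en_len,
        "tagged_length": len(tag) + en_len,
        # Assumption: the tag sits immediately N-terminal to ORF2 residue 1.
        "orf2_position": mutated_pos - len(tag),
    }


print(construct_summary())
```

Under that assumption, the histidine numbered 258 in the tagged protein would correspond to residue 236 of the untagged ORF2 fragment.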
Expression and purification of TRAS1 EN
The pHisTIEN and pHisTIEN (H258A) plasmids were introduced into the E. coli BL21 (DE3) pLys strain (Novagen). A 50-ml culture of the transformant was grown at 37°C until ODsao reached O.Ca. Isopropyl β-D-thiogalactopyranoside (IPTG) was then added to a final concentration of 1 mM, and incubation was continued at 25°C for 8 hr. The accumulated protein was run on SDS-polyacrylamide gel electrophoresis (PAGE) and detected as a band at the calculated molecular weight of 30.3 kDa. Western blotting analysis with anti-(His)6 antibody (Boehringer Mannheim) confirmed that this band corresponds to the target protein. Cells were pelleted by centrifugation and stored at -80°C. Purification was conducted by nickel chelation chromatography according to the protocol of the His-Bind kit from Novagen (catalog number: 70156). Freeze-thawed cell pellets of E. coli were suspended in 6 ml of binding buffer (5 mM imidazole, 500 mM NaCl, 20 mM Tris-HCl pH 7.9), Triton X-100 was added to a final concentration of 0.1%, and the cells were incubated at 0°C for 10 min. After the cell extracts were centrifuged at 39,000 x g at 4°C for 20 min, the supernatants containing soluble proteins were filtered twice through a 0.45-µm (pore size) membrane (Millipore) and applied to a Quick 900 cartridge (Novagen). The cartridge was then washed with 20 ml of binding buffer and with 10 ml of washing buffer (60 mM imidazole, 500 mM NaCl, and 20 mM Tris-HCl pH 7.9). Most of the target protein was eluted with 2 ml of elution buffer (300 mM imidazole, 500 mM NaCl, and 20 mM Tris-HCl pH 7.9). The eluted proteins were concentrated by ultrafiltration in storage buffer (500 mM NaCl, 50 mM Tris-HCl pH 7.9, 20% glycerol, and 10 mM 2-mercaptoethanol) with an Ultrafree-MC centrifugal filter unit (Millipore). A single band was observed at the predicted size, 30.3 kDa, in SDS-PAGE (Fig. 5A).
The resulting proteins (about 2 ml) were stored at -80°C in 10-µl aliquots.
The concentration of the proteins was determined to be approximately 0.5 mg/ml by comparing the intensity of the band on a Coomassie blue-stained SDS-PAGE gel with that of known amounts of bovine serum albumin (BSA).
[Example 6] Assays for endonucleolytic activities against telomeric repeats
To investigate the cleavage activity of TRAS1 EN, a sensitive assay was performed using the double-stranded oligonucleotide 5'-(TTAGG)n / 5'-(CCTAA)n as a substrate. The synthetic oligonucleotides (Nisshinbo), (TTAGG)n (G-strand) or (CCTAA)n (C-strand), were 5'-end-labeled with T4 polynucleotide kinase (Toyobo) and [γ-32P]ATP (ICN). The end-labeled oligonucleotide was incubated with the non-labeled complementary strand of the same length at 95°C for 2 min and gradually cooled to room temperature, thereby annealing the two oligonucleotides. The samples were separated on 28% polyacrylamide gels, and the precise location of the annealed oligonucleotide was identified by autoradiography. The band was cut out from the gel and incubated in elution buffer (500 mM ammonium acetate, 10 mM magnesium acetate, 1 mM EDTA, and 0.1% SDS) at 37°C for 4 hr, and the eluted oligonucleotides were ethanol-precipitated. The purified double-stranded oligonucleotides were dissolved in 0.5 ml of TE buffer (1 mM EDTA, 10 mM Tris-HCl pH 7.6) and stored at -30°C.
About 1 ng of the substrate DNA in which n is 5 (5 repeats) was incubated with 0.2 µg of the purified TRAS1 EN protein in reaction buffer (50 mM PIPES [piperazine-N,N'-bis(2-ethanesulfonic acid)]-HCl (pH 6.0), 10 mM NaCl, 2 mM MgCl2, and 100 µg/ml BSA) at 25°C for 60 min. The reaction was stopped by adding EDTA to a final concentration of 50 mM. The reaction mixture was denatured in loading buffer containing 75% formamide at 95°C for 5 min, immediately chilled on ice, and run on a 28% polyacrylamide denaturing sequencing gel.
When the reaction products were detected with the BAS 2500 imaging analyzer system (FUJIFILM), ladder patterns representing cleavage sites were clearly observed at 5-nucleotide intervals (Fig. 6A). Similar ladder patterns were also detected for substrates of various lengths (n = 6, 9, and 12; data not shown). These results demonstrate that TRAS1 EN cleaves the telomeric repeats at specific sites.
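The assay above states enzyme and substrate by mass (0.2 µg of protein, about 1 ng of a 25-bp duplex). A rough conversion to molar amounts is sketched below. It is an estimate under stated assumptions: 660 g/mol per base pair is a common rule-of-thumb average and is not taken from the text, the protein mass uses the 30.3 kDa calculated for the tagged protein, and the function names are hypothetical.

```python
BP_MASS = 660.0      # g/mol per base pair: rule-of-thumb average (assumption)
ENZYME_MW = 30300.0  # g/mol, from the 30.3 kDa calculated for tagged TRAS1 EN


def picomoles(mass_g, molar_mass):
    """Convert a mass in grams to picomoles for the given molar mass (g/mol)."""
    return mass_g / molar_mass * 1e12


def enzyme_to_substrate_ratio(enzyme_ug=0.2, substrate_ng=1.0, substrate_bp=25):
    """Return (enzyme pmol, substrate pmol, molar ratio) for one reaction."""
    enzyme_pmol = picomoles(enzyme_ug * 1e-6, ENZYME_MW)
    substrate_pmol = picomoles(substrate_ng * 1e-9, substrate_bp * BP_MASS)
    return enzyme_pmol, substrate_pmol, enzyme_pmol / substrate_pmol


enzyme_pmol, substrate_pmol, ratio = enzyme_to_substrate_ratio()
print(f"enzyme {enzyme_pmol:.1f} pmol, substrate {substrate_pmol:.3f} pmol, ratio {ratio:.0f}:1")
```

With these assumptions the protein is present in roughly a hundred-fold molar excess over the duplex substrate.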
A DNA fragment of the silkworm telomeric repeat sequence (TTAGG)a9 with adapter sequences at both ends was inserted into the pGL3 vector (Promega). The plasmid (0.1 µg) was digested with 50 ng of TRAS1 EN in the above-described reaction buffer at 25°C for 60 min.
Agarose gel electrophoresis of the reaction product confirmed that the plasmid was digested with TRAS1 EN. The mutant protein H258A (described below) showed no cleavage activity.
In order to determine the optimal conditions for the cleavage reaction of TRAS1 EN, the present inventor quantified the nicking activities against the (TTAGG)n substrate by measuring the radioactivity of a specifically cleaved product under various conditions of pH, salt concentration, and temperature. Under the standard condition, about 1 ng of the substrate DNA was treated with 0.2 µg of purified TRAS1 EN protein in 10 µl of the above reaction buffer at 25°C for 60 min, followed by separation on a sequencing gel as described above. Using the (TTAGG)5 sequence as a substrate, the product generated by nicking at the 17th nucleotide, (TTAGG)3-TT, was measured.
Quantitation of the reaction products was carried out with the BAS 2500 imaging analyzer system (FUJIFILM).
First, the nicking activity of gradually increasing amounts of TRAS1 EN protein was investigated. The radiolabeled DNA substrate (1 ng) was cleaved most efficiently (approximately 13% of the substrate DNA was cleaved) with 0.2 µg of TRAS1 EN (Fig. 5B). The endonucleolytic activity of TRAS1 EN was affected by the pH of the reaction buffer and by temperature (Fig. 5C and D). The nicking activity peaked at pH 6.0. Optimum cleavage of the substrate was observed at 25°C, and a sufficient level of activity was observed up to 30°C. The activity decreased at 37°C (to approximately 7% of the activity at 25°C). Furthermore, the concentrations of Na+, Mg2+, and BSA in the reaction buffer were optimized. The endonucleolytic activity of TRAS1 EN was determined in reaction mixtures containing various concentrations of NaCl. The activity was maximal at NaCl concentrations of 0 to 10 mM, and it decreased to approx. 30 to 40% at 50 mM and to approx. 10% at 100 mM. The activity was almost lost at 200 mM. For MgCl2, the activity was maximal at 2 mM, approx. 40 to 50% at 1 mM, approx. 50 to 60% at 5 mM, and approx. ~D to 60% at 10 mM. Further repeated measurements of the enzymatic activity under various conditions revealed that the highest nicking activity of TRAS1 EN was obtained in the reaction buffer containing 50 mM PIPES-HCl (pH 6.0), 10 mM NaCl, 2 mM MgCl2, and 100 µg/ml BSA.
To determine whether these nicking activities indeed result from TRAS1 EN itself, a similar experiment was performed using a mutant protein, H258A, which has a single-amino-acid substitution (His to Ala). The residue corresponding to the histidine mutated in H258A is known to be essential for the catalytic activity of the L1 EN domain (Feng, Q. et al., 1996, Cell 87: 905-916), Exo III (Mol, C. D. et al., 1995, Nature 374: 381-386), and DNase I (Suck, D. et al., 1988, Nature 332: 464-468). No nicking activity on the telomeric repeats was detected with H258A in the assay, indicating that the nicking activity detected using the purified protein was not due to contaminating endonucleolytic activities from E. coli (Fig. 5A and 6A). Therefore, it was demonstrated that TRAS1 EN itself has the nicking activity on the (TTAGG/CCTAA) repeats.
[Example 7] Enzymatic properties of TRAS1 EN
Telomeres are known to have a longer G-strand than the complementary C-strand and to have a single-stranded 3'-overhang at the end of the chromosome (Henderson, E., 1995, "Telomere structure", p. 11-34, in E. H. Blackburn and C. W. Greider (ed.), "Telomeres", Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Wright, W. E. et al., 1997, Genes Dev. 11: 2801-2809). To confirm that TRAS1 EN does not act on single-stranded DNA such as the telomeric G-strand overhang, the present inventor assayed a single-stranded oligonucleotide as a substrate (Fig. 6B). No ladder pattern was observed for the 5'-end-labeled (TTAGG)5, indicating that TRAS1 EN has no nicking activity on single-stranded DNA. This result suggests that TRAS1 EN acts on double-stranded DNA as a target.
In order to confirm that the lack of activity on the single-stranded telomeric repeat is not due to the influence of the buffer used for the assays, the following experiment was conducted.
After the TRAS1 EN reaction procedure on the single-stranded substrate was completed, its complementary strand was added and annealed. When the cleavage reaction was then repeated under the same conditions, 5-nucleotide ladder bands were observed (data not shown). This result demonstrates that the activity that was not observed in the reaction with the single-stranded substrate appeared once the complementary strand was added, and shows that TRAS1 EN acts on double-stranded DNA as a substrate. It also strongly suggests that the target of TRAS1 EN in vivo is double-stranded DNA in the telomeric region.
The 5'-labeled 30-mer telomeric repeat (TTAGGG)5, which is representative of the telomeres of vertebrates including human, was used as a substrate in the above-described endonuclease assay of TRAS1 EN for the double-stranded substrate. The result revealed that TRAS1 EN cleaves the telomeric repeat sequence of vertebrates.
Six-nucleotide ladder patterns, in which one rung corresponds to one repeat unit, were observed by electrophoretic analysis of the cleaved products (Fig. 7). Electrophoresis on sequencing gels identified the cleavage site between T and A on the G-strand and between C and T on the C-strand, which is the same as for the silkworm telomeric repeat (see below).
[Example 8] Determination of TRAS1 EN cleavage site
The precise cleavage site of (TTAGG)n by TRAS1 EN was determined by comparing the sizes of the cleaved DNA products with those of (TTAGG)n sequences of known sizes (Fig. 8). Telomeric repeat oligonucleotides of various sizes, shown below, were end-labeled with T4 polynucleotide kinase and [γ-32P]ATP and used as precise molecular size markers: dG10, 5'-(TTAGG)2-3' (SEQ ID NO: 6); dG12, 5'-(TTAGG)2-TT-3' (SEQ ID NO: 7); dG15, 5'-(TTAGG)3-3' (SEQ ID NO: 8); and dG17, 5'-(TTAGG)3-TT-3' (SEQ ID NO: 9). A double-stranded substrate, in which the 5'-end of the bottom (TTAGG)n strand (G-strand) was labeled, was cleaved by TRAS1 EN (0.4 µg) under the optimal condition described above. Reaction products were run on PAGE alongside the four DNA size markers mentioned above.

Major bands detected after digestion with TRAS1 EN were located 2 bp above the dG10 and dG15 markers, and at positions identical to the dG12 and dG17 markers (Fig. 8, left).
This indicates that all of the cleaved bottom strands terminate with the same structure, 5'-...TTAGGTT, at their 3'-ends. Thus, it is concluded that TRAS1 EN cleaves the (TTAGG) bottom strand specifically between T and A (T-A junction; TT↓AGG). This site corresponds to the boundary between the pyrimidine tract TT and the following purine tract AGG (pyrimidine-purine junction). When (TTAGG)5 was digested with TRAS1 EN, the resulting major bands were observed at the 7, 12, 17, and 22 bp positions but not at the 2 bp position (data not shown). This suggests that a tract of only 2 bp (TT) upstream of the cleavage site is insufficient for TRAS1 EN to exhibit its endonucleolytic activity on the bottom strand. In contrast, the upstream 7-nucleotide sequence (TTAGGTT) or less as well as the downstream 3-nucleotide sequence (AGG) or less from the cleavage site are long enough to ensure the cleavage reaction.
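The size-marker reasoning above can be reproduced in a few lines. The sketch below lists the 5'-labeled product lengths expected if every TT↓AGG site in (TTAGG)5 were nicked, then compares them with the marker lengths and with the bands reported in the text. The function and variable names are hypothetical, and the code is an illustration of the bookkeeping rather than the inventor's analysis method.

```python
def nick_product_lengths(strand, unit="TTAGG", cut_offset=2):
    """Lengths of 5'-end-labeled products if every copy of `unit` were nicked
    `cut_offset` nucleotides after its first base (TT|AGG for the defaults)."""
    lengths = []
    start = strand.find(unit)
    while start != -1:
        lengths.append(start + cut_offset)
        start = strand.find(unit, start + 1)
    return lengths


MARKERS = {
    "dG10": "TTAGG" * 2,
    "dG12": "TTAGG" * 2 + "TT",
    "dG15": "TTAGG" * 3,
    "dG17": "TTAGG" * 3 + "TT",
}

predicted = nick_product_lengths("TTAGG" * 5)
observed = [7, 12, 17, 22]  # major bands reported in the text
marker_lengths = {name: len(seq) for name, seq in MARKERS.items()}
comigrating = [name for name, n in marker_lengths.items() if n in predicted]

print("predicted:", predicted)
print("markers matching a predicted product:", comigrating)
print("predicted but not observed:", sorted(set(predicted) - set(observed)))
```

Only the dG12 and dG17 markers coincide with predicted products, and the single predicted length missing from the observed bands is the 2-nucleotide product, matching the description above.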
When the double-stranded substrate composed of the labeled top (CCTAA) strand (C-strand) of the telomeric repeats was treated with TRAS1 EN, the 5-nucleotide interval ladder pattern was also observed, suggesting that the top strand was also cleaved at a specific site in the repeat unit. Analysis of the cleavage site by the method described above using four DNA size markers, dC10: 5'-(CCTAA)2-3' (SEQ ID NO: 10); dC12: 5'-(CCTAA)2-CC-3' (SEQ ID NO: 11); dC15: 5'-(CCTAA)3-3' (SEQ ID NO: 12); and dC17: 5'-(CCTAA)3-CC-3' (SEQ ID NO: 13), revealed that the top strand was specifically cleaved between C and T (C-T junction; CC↓TAA) (Fig. 8, right). These results, showing specific digestion of (TTAGG/CCTAA)n repeats by TRAS1 EN, are consistent with the structure of the insertion sites of TRAS1 in the silkworm genome. The 5'-end of TRAS1 in the genome appears to start between C and T of the top (CCTAA) strand, and the 3'-end of its poly (A)n/(T)n appears to terminate between G and T of the (TTAGG) bottom strand (Fig. 4A). However, three possible 3'-junction structures, i.e., (I) 5'-AGG↓TT(T)n-3' (5'-(A)nAA↓CCT-3' on the top strand), (II) 5'-AGGT↓T(T)n-3' (5'-(A)nA↓ACCT-3' on the top strand), and (III) 5'-AGGTT↓(T)n-3' (5'-(A)n↓AACCT-3' on the top strand) (↓; junction site), were proposed since the oligo (A)/oligo (T) sequence connects directly to the telomeric (CCTAA/TTAGG)n sequence in vivo. The 5'- and 3'-ends of TRAS1 correspond to the cleavage sites on the top and bottom strands, respectively. Therefore, it is expected that the specific cleavage site between C and T on the top (CCTAA) strand as shown above is consistent with the 5'-junction structure of TRAS1 in the genome, and that the specific cleavage between T and A of the bottom (TTAGG) strand is consistent with the 3'-junction structure of TRAS1 in the genome. In conclusion, the 3' junction site between TRAS1 and telomeric repeats is 5'-AGGTT↓(T)n-3'.
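The three candidate 3'-junction structures are each written twice above, once per strand. A reverse-complement helper makes it easy to confirm that the two notations describe the same junction. This is a minimal sketch: the function names are hypothetical, and a poly(T) tract of three residues stands in for (T)n purely for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def top_strand_junction(bottom_5prime, bottom_3prime):
    """Given the bottom-strand sequence on each side of a junction (5'->3'),
    return the top-strand sequence on each side, also written 5'->3'."""
    return revcomp(bottom_3prime), revcomp(bottom_5prime)


POLY_T = "TTT"  # stand-in for (T)n; the length is arbitrary
for label, split in (("I", 3), ("II", 4), ("III", 5)):
    left, right = "AGGTT"[:split], "AGGTT"[split:] + POLY_T
    top_left, top_right = top_strand_junction(left, right)
    print(f"({label}) bottom 5'-{left}|{right}-3'   top 5'-{top_left}|{top_right}-3'")
```

The same helper confirms that (CCTAA)n is the reverse complement of (TTAGG)n, which is why a nick at TT↓AGG and a nick at CC↓TAA are counted on opposite strands of one duplex.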
[Example 9] TRAS1 EN cleaves the G- (bottom) strand before it cleaves the C- (top) strand
To determine whether TRAS1 EN has a strand preference for nicking, the time course of TRAS1 EN endonucleolytic activities against the telomeric C-strand and G-strand was analyzed. The double-stranded oligonucleotides, in which only one strand was end-labeled, were incubated with 0.2 µg of TRAS1 EN protein for 60 min. The change of the 5-nucleotide ladder patterns during the 60-min reaction is shown in Fig. 9A and B. Cleaved products from the C-strand gradually increased over 60 min. On the other hand, the cleavage reaction on the G-strand appeared to reach a maximum within the first 30 min, indicating that cleavage on the G-strand precedes that on the C-strand.
To demonstrate this result more clearly, the amounts of nicked substrates were quantified with the BAS2500 imaging analyzer and plotted against the reaction time (Fig. 9C). The result clearly showed that the G-strand was cleaved in preference to the C-strand. Judging from the structure in which TRAS1 is inserted, the bottom and top strands correspond to the G- and C-strand, respectively. The TPRT model suggests that cleavage on the bottom strand generates an exposed 3'-hydroxyl group that serves as a primer for the reverse transcription of the RNA template prior to the top-strand cleavage (Luan, D. D. et al., 1993, Cell 72: 595-605). Thus, the finding that the G-strand is cleaved before the C-strand is consistent with the proposed TPRT model.
[Example 10] TRAS1 EN specifically cleaves the target site of a long DNA substrate
All genomic copies of TRAS1 identified so far exist at a specific site in the telomeric repeat (Okazaki, S. et al., 1995, Mol. Cell. Biol. 15: 4545-4552). To investigate whether the site-specificity of TRAS1 for the telomere is defined by the EN domain itself, the nicking activities of TRAS1 EN on a long DNA substrate including both telomeric repeat and non-telomeric sequences were assayed. The substrate contained three repeats of the telomeric sequence unit (15 nucleotides) flanked on both sides by 30 bp of non-telomeric sequences which were randomly selected from pBR322 (Fig. 10C). Based on comparison of the sizes of the bands of the nicked substrates with those of several 32P-labeled oligonucleotides (b32, b37, b42, t32, t37, and t42), the major bands resulting from the reaction were thought to be produced by specific cleavage within the telomeric sequence but not in the non-telomeric sequences. The TTAGG bottom strand was mainly cleaved between T and A at positions 37 and 42 from the 5'-end of the telomeric repeat unit (shown as solid arrowheads) (Fig. 10A). The CCTAA top strand was cleaved between C and T at position 32 from the 5'-end (shown as an open arrowhead) (Fig. 10B). These cleaved products gradually increased during the 50-min reaction. It was also shown that the cleavage reaction on the top strand occurred more slowly than that on the bottom strand. Cleavage at the other possible target sites (b32, t37, and t42) in the (TTAGG) unit was not observed, suggesting that essential sequences recognized by TRAS1 EN are absent around these regions.
Several minor bands were observed in both the bottom- and top-strand reactions. The non-telomeric sequence ([TTAGG]-46TCA48) adjacent to the 3'-end of the telomeric repeat is similar to TTA of the TTAGG unit and may cause the minor cleavages downstream of b42 in the bottom-strand reaction. The non-specific bands in the middle of the gel in the top-strand reaction seemed to appear due to incomplete denaturation of the oligonucleotides, because these bands were also observed before the reaction with TRAS1 EN (time zero). When another long substrate including (TTAGG)3 flanked by 125-nucleotide non-telomeric sequences at both ends was used, TRAS1 EN cleaved DNA only within the telomeric repeats with the same cleavage patterns shown above (data not shown). Thus, it is concluded that the EN domain of TRAS1 specifically targets the telomeric repeat and is mainly responsible for the target specificity of TRAS1 elements for the telomeric repeat sequence.
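The marker names b32, b37, and b42 follow directly from the substrate layout described above. The sketch below enumerates the candidate nick positions and tests one possible explanation for why b32 is not cleaved. Several things here are assumptions: positions are counted from the labeled 5' end of the 75-nt strand, 'N' replaces the pBR322-derived flanks because their sequence is not given in this section, and the "AGGTT immediately upstream" rule is a hypothetical simplification, not a rule stated by the inventor.

```python
FLANK = 30  # nt of non-telomeric sequence on each side, per the text


def candidate_sites(unit_len=5, cut_offset=2, repeats=3, flank=FLANK):
    """Product lengths (from the labeled 5' end) for a nick in each repeat unit."""
    return [flank + i * unit_len + cut_offset for i in range(repeats)]


def has_upstream(strand, site, required):
    """True if `required` lies immediately 5' of the nick at `site`."""
    return strand[max(0, site - len(required)):site] == required


# 'N' stands in for the flanking sequences, which are not given here.
bottom = "N" * FLANK + "TTAGG" * 3 + "N" * FLANK

sites = candidate_sites()                     # same numbers for either strand
observed = {"bottom": {37, 42}, "top": {32}}  # sites reported as cleaved
# Hypothetical rule: a bottom-strand nick needs AGGTT just upstream.
rule_hits = [s for s in sites if has_upstream(bottom, s, "AGGTT")]

print("candidate sites:", sites)
print("bottom-strand sites passing the rule:", rule_hits)
```

Under this simplification the first TT↓AGG site (position 32) fails because only non-telomeric sequence precedes it, which agrees with the bottom-strand bands reported at 37 and 42. The sketch does not attempt to model the top strand, where the cleaved site is the first one rather than the later two.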
[Example 11] Sequence involved in target-site specificity of TRAS1 EN
To investigate how the TRAS1 EN protein recognizes a target site, several substrates having mutant bottom strands were prepared and annealed to the complementary (top) strands to produce double-stranded substrates, then the TRAS1 EN nicking activities for each substrate were compared on PAGE. One of the nucleotides of the TTAGG unit was altered to a cytosine (C) residue to produce five different substrates, (CTAGG)5, (TCAGG)5, (TTCGG)5, (TTACG)5, and (TTAGC)5 (mutated nucleotides are underlined) (Fig. 11A). When either of the first two thymines of TTAGG was mutated, some non-specific bands were observed, but the nicking reaction of TRAS1 EN itself was not inhibited.
This result suggests that the first and the second T's of the TTAGG unit are involved in cleavage site recognition by TRAS1 EN. The purine-pyrimidine tract of the original (TTAGG)5 is conserved in the two substrates. Major cleavage sites of the (CTAGG)5 and (TCAGG)5 substrates are the T-A junction and the C-A junction, respectively, which are consistent with the nicked position in the positive control (TTAGG)5 substrate. When the other three substrates, (TTCGG)5, (TTACG)5, and (TTAGC)5, were used, the TRAS1 EN cleavage products were greatly reduced, indicating that the third A, the fourth G, and the fifth G of the TTAGG unit are important for the endonucleolytic activity of TRAS1 EN.
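Because each substrate above carries the same substitution in every repeat unit, a single change lands at several positions around any one TT↓AGG site. The sketch below lists those positions for each of the five mutants. The numbering convention is an assumption made for this illustration: -1 is the base immediately 5' of the nick and +1 the base immediately 3' of it, with no base numbered 0. The function name is hypothetical.

```python
UNIT = "TTAGG"
CUT = 2  # nick between unit positions 2 and 3 (TT|AGG)


def relative_positions(unit_index, span=7, unit_len=len(UNIT), cut=CUT):
    """Positions, relative to one nick, occupied by the base at `unit_index`
    (0-based within the unit) across a tandem array."""
    hits = []
    for rel in range(-span, span + 1):
        if rel == 0:
            continue  # the nick itself lies between -1 and +1
        # Offset of this base from the first base 3' of the nick.
        offset = rel - 1 if rel > 0 else rel
        if (cut + offset) % unit_len == unit_index:
            hits.append(rel)
    return hits


for i, base in enumerate(UNIT):
    mutant = UNIT[:i] + "C" + UNIT[i + 1:]
    print(f"({mutant})5: {base}{i + 1}->C falls at {relative_positions(i)}")
```

For (TTAGC)5, for example, the substituted base sits at both -3 and +3 of every site, so a whole-repeat mutant tests more than one contact at a time, unlike a substrate with a single substituted nucleotide.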
For further characterization, a double-stranded dTATA [5'-(TTAGG)2-TTATAGG-(TTAGG)2-3'] substrate (SEQ ID NO: 14) was prepared. The substrate contains duplicated cleavage sites (underlined), with the TATA sequence placed between intact telomeric repeats. The upstream TA site in the TATA sequence was cleaved more effectively than the downstream TA site (T↓ATA, Fig. 11B). This result suggests that the telomeric repeat sequence upstream of the cleavage site in the TATA sequence is more important for the nicking activity of TRAS1 EN, while the sequence downstream of the TATA may have little effect on the cleavage reaction. The TT just adjacent to the TATA sequence (TT↓ATA, underlined), however, may have little effect on the endonucleolytic activity of TRAS1 EN, as shown in Fig. 11A. Furthermore, the results presented in Fig. 8 demonstrate that TRAS1 EN cleaves the oligonucleotide consisting of only the upstream 7 bp and downstream 3 bp of the TA cleavage site, i.e., TTAGGTT↓AGG (↓, cleavage site). Combining these results with the results shown in Fig. 11, it appears that the sequence upstream of the cleavage site, especially the upstream AGG (AGGTT↓ATA, underlined) sequence, is important for the sequence-specific digestion of the bottom (TTAGG) strand by TRAS1 EN.
Further analysis was performed with respect to the 7 nucleotides upstream (5'-side) of the cleavage site, which may contain the first recognition region of TRAS1 EN. TRAS1 EN could recognize and cut another type of substrate (5'-(dC)5-AGGTT↓AGG-(dC)5-3') that includes the 5-nucleotide sequence upstream of the T-A junction, but did not cleave substrates (5'-(dC)5-TTAGG-3' or 5'-(dC)5-GTTAGG-3') including only the 2-nucleotide or 3-nucleotide upstream sequence (data not shown). These results also indicate the importance of the upstream AGG sequence.
The results shown above (Fig. 11A) indicate that all nucleotides in the TTAGG unit are somehow involved in site-specific cleavage of the bottom strand by TRAS1 EN. In order to identify the nucleotides essential for TRAS1 EN activity more specifically, a series of (TTAGG)5 mutant substrates with a single-nucleotide substitution was produced and used to determine how TRAS1 EN recognizes the nucleotides around the cleavage site. In these mutant substrates, a nucleotide at one of the positions from -7 to approximately +8 (where 0 is the cleavage site) is substituted with cytosine (C) (Fig. 12). No significant difference was observed in the nicking activities for the mutant substrates with a mutation in any of positions +8 to +4 (Fig. 12) or in positions upstream of -8 (data not shown). When a nucleotide at positions -7 to about -3 in the 5'-flanking region of the cleavage site was changed, the nicking activity of TRAS1 EN was reduced to about 40% to 80% of that for the non-mutated substrate ("Cont." in Fig. 12). Among them, the mutation at -3 had a severe effect on TRAS1 EN activity (38% of control). The nicking activity was also greatly inhibited by substitutions in the 3'-flanking region (+1 to +3). In particular, substitutions at +1 [(TTAGG)2TT↓CGG(TTAGG)2] and +2 [(TTAGG)2TT↓ACG(TTAGG)2] resulted in a reduction in nicking activity to 20% and 25% of control, respectively. The results in Fig. 11 showing the great reduction in nicking activity for the three substrates (TT↓CGG)n, (TT↓ACG)n, and (TT↓AGC)n support the above observation. (TT↓AGC)n has mutations at both the +3 and -3 positions from the cleavage sites, and thus would inhibit the nicking activity more strongly than a single-substitution mutant at either +3 or -3 in Fig. 12.
In contrast, the TRAS1 EN activity apparently increased for the substrates mutated at position -1 or -2. Both substrates were, however, cleaved at non-specific sites upstream of the cleavage site (Fig. 12, top), suggesting that the target site specificity of TRAS1 EN is relaxed by the -2 or -1 mutations. These results suggest that, further considering the aberrant cleavage patterns of (CT↓AGG)n and (TC↓AGG)n in Fig. 11, the TT sequence of (TT↓AGG) is involved in determining the precise cleavage site by TRAS1 EN.
Summarizing the results of the single-nucleotide substitutions, the regions upstream of -8 (data not shown) and downstream of +4 have little effect on the nicking activity of TRAS1 EN. This finding is consistent with the result of Fig. 8, which shows that TRAS1 EN cleaves even the short sequence TTAGGTT↓AGG (↓, cleavage site). The TRAS1 EN cleavage reaction was influenced by mutations at positions -7 to +3 from the cleavage site, and was influenced especially strongly by the -3 to +2 mutations (5'-...TTAGGTT↓AGGTT...-3'; underlined). These residues in the (TTAGG)n bottom strand would be essential for the sequence-specific recognition and digestion by TRAS1 EN. This suggests that the GTTAG pentamer on the bottom strand of the telomeric repeat sequence contains the first recognition region of TRAS1 EN.
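The position numbering used for the single-nucleotide substitutions can be made concrete with a small sketch. It assumes, as one reading of the text, that the nick lies between positions -1 and +1 and that the (TTAGG)5 substrate is cut after two full repeats plus TT. All function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the function names and the coordinate convention
# (no nucleotide at position 0; -1 is 5' of the nick, +1 is 3' of it) are
# assumptions, not part of the original disclosure.
REPEAT = "TTAGG"  # bottom-strand telomeric repeat unit


def make_substrate(n_repeats: int = 5) -> str:
    """Return the (TTAGG)n bottom-strand sequence."""
    return REPEAT * n_repeats


def cut_index(upstream_repeats: int = 2) -> int:
    """String index of the TT|AGG nick that follows `upstream_repeats` full units."""
    return upstream_repeats * len(REPEAT) + 2


def to_index(position: int, cut: int) -> int:
    """Map a signed position relative to the nick to a 0-based string index."""
    if position == 0:
        raise ValueError("position 0 is the nick itself, not a nucleotide")
    return cut + position - 1 if position > 0 else cut + position


def substitute_c(seq: str, position: int, cut: int) -> str:
    """Replace the nucleotide at the given signed position with cytosine."""
    i = to_index(position, cut)
    return seq[:i] + "C" + seq[i + 1:]


def window(seq: str, cut: int, first: int, last: int) -> str:
    """Nucleotides from signed position `first` to `last`, skipping the nick."""
    return "".join(seq[to_index(p, cut)] for p in range(first, last + 1) if p != 0)


wild_type = make_substrate()
cut = cut_index()
# One mutant per position from -7 to +8, mirroring the series described above.
mutants = {p: substitute_c(wild_type, p, cut) for p in range(-7, 9) if p != 0}

print(mutants[1])                      # the +1 mutant
print(window(wild_type, cut, -3, 2))   # the -3 to +2 window
```

Under these assumptions the +1 and +2 mutants reproduce the bracketed substrates (TTAGG)2TTCGG(TTAGG)2 and (TTAGG)2TTACG(TTAGG)2, and the -3 to +2 window is the GTTAG pentamer named in the text.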
[Example 12] Cloning of TRAS-like members from insects other than silkworm
To analyze the detailed structure of TRAS-like members from various insect species, the present inventor amplified unknown targets by PCR using the Consensus-Degenerate Hybrid Oligonucleotide Primer (CODEHOP) strategy (Rose, T. M. et al., 1998, Nucleic Acids Res. 26: 1628-1635), which allows efficient amplification of unknown DNAs corresponding to multiply-aligned protein sequences.
In general, the reverse transcriptase (RT) domain of non-LTR retrotransposons consists of seven conserved regions (Xiong, Y. and Eickbush, T. H., 1990, EMBO J. 9: 3353-3362; Nakamura, T. M. et al., 1997, Science 277: 955-959). The endonuclease (EN) domain in the N-terminal region of ORF2 is thought to be responsible for target digestion, and contains residues that are relatively conserved among many non-LTR retroelements. Using PCR, the inventor amplified the region from the EN to RT domains of TRAS-like members (approximately 70% of the RT domain between the C and D domains of RT). At the beginning of the experiment, sequence information of TRAS1, TRAS3, and TRAS4 (SEQ ID NO: 42) of the silkworm was analyzed. Based on the sequence comparison between these TRAS members and other non-LTR elements, the inventor designed a primer set within the conserved regions of EN and RT, to amplify TRAS-like members specifically, but not other retroelements.
Further, to amplify TRAS-like members from insects that are phylogenetically distant from the silkworm, the inventor designed and used two CODE hybrid oligonucleotide primers, CH-VVGI and CH-FADD. They consist of a 3'-degenerate core region and a longer 5'-consensus clamp region. Primers used for PCR are as follows: TR4QSAG, 5'-AGGGCGCAAGATCTTCCAAAGCGCTGGCCC-3' (SEQ ID NO: 29); TR5GYKG, 5'-GTTCTTCAACAGTGAGGGGATATAAAGGAGC-3' (SEQ ID NO: 30); TR6SLEN, 5'-GCCTGGCAATACTCTCCACTGTTTTCGAGAC-3' (SEQ ID NO: 31); GTVK, 5'-GGGACTGTNAAAGCNGCNAT-3' (SEQ ID NO: 32); CH-VVGI, 5'-ACCACCAACAACATCGTCGTRGTNGRRRTC-3' (SEQ ID NO: 33); CH-FADD, 5'-GTCTCCGTGGAAAACCAGGACGACRTCRTCNGCRAA-3' (SEQ ID NO: 34).
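The degenerate 3' cores of these primers use the standard IUPAC ambiguity codes (R = A or G, N = any base). As a minimal sketch, the degeneracy of a core can be counted and its concrete sequences enumerated as follows. The helper names are hypothetical, and the 12-nucleotide string used here is simply the degenerate 3' end of CH-FADD as printed above.

```python
# Illustrative sketch; `degeneracy` and `expand` are hypothetical helper names.
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct concrete sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


fadd_core = "RTCRTCNGCRAA"  # degenerate 3' end of CH-FADD as given in the text
print(degeneracy(fadd_core))
print(expand(fadd_core)[:4])
```

Counting degeneracy this way shows why a CODEHOP primer keeps the degenerate region short: each R doubles and each N quadruples the number of primer species in the pool, while the non-degenerate 5' clamp adds none.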
To amplify each of the TRAS-like members, the following primer sets were used in the PCR reaction: TRAS4, TR4QSAG + CH-FADD (annealing at 61°C); TRAS5, TR5GYKG + CH-FADD (51°C); TRAS6, GTVK + TR6SLEN (51°C); TRAS of D. japonica, CH-VVGI + CH-FADD (51°C); TRAS of S. cynthia, GTVK + CH-FADD (51°C). PCR was performed for 35 cycles of 94°C for sec, 51°C to 61°C for 45 sec, and 72°C for 90 sec. Amplified PCR products were cloned into the pGEM(R)-T Easy vector (Promega) and sequenced with an ABI-310 DNA sequencer (PE Applied Biosystems).
TRAS-like sequences were successfully isolated from two Lepidoptera, Samia cynthia (saturniid silkworm) and Dictyoploca japonica (giant silkmoth). Seven clones were isolated from S. cynthia, which were divided into three families, TRASSC3 (SEQ ID NO: 54), TRASSC4 (SEQ ID NO: 57), and TRASSC9 (SEQ ID NO: 60), based on sequence comparison. From D. japonica, one clone, named TRASDJ (SEQ ID NO: 51), was isolated. Furthermore, EN-RT regions for two additional families, TRAS5 and TRAS6 (SEQ ID NOs: 45 and 48, respectively), were successfully isolated from the silkworm, Bombyx mori.
Tables 1 and 2 show the sequence identities at the nucleotide and amino acid levels among various TRAS-like families in Bombyx mori (Table 1) and in Lepidoptera (Table 2), when the sequences of their isolated regions (from the EN to RT region, see Figs. 14 to 17) are aligned to allow maximum matching. The regions from EN to RT of B. mori TRAS members are highly conserved (about 50% to 72% identity in their amino acid sequences and about 56% to 70% identity in nucleic acid sequences) (Table 1). The amino acid identity between these TRAS members and R1Bm is only 25% to 27%, probably reflecting structural differences in functional domains, such as the region involved in sequence specificity, between R1 and the TRAS family. Similar results were also obtained in a comparison of amino acid sequences among TRAS-like members of Lepidoptera (about 37% to about 55% amino acid identities within TRAS-like members, about 25% to about 30% identities among TRAS-like members and R1Bm; Table 2). Among known families, R1 is phylogenetically closest to TRAS-like members. The critical difference between the amino acid sequence identity of the region from the EN to RT domains among TRAS-like members (about 37% to about 70%) and that between TRAS-like members and R1Bm (about 25% to about 30%) indicates that the TRAS-like members isolated here can be classified into the TRAS family. This is also supported by the phylogenetic tree constructed based on the amino acid sequence from EN to RT by the neighbor-joining method (Saitou, N. and Nei, M., 1987, Mol. Biol. Evol. 4: 406-425), as shown in Fig. 13. As shown in the figure, all TRAS-like members isolated from lepidopteran insects constitute a single phylogenetic group and are close to other site-specific members, R1, RT1, and SART1.
Table 1
Identical nucleotides / Consensus nucleotides
Nucleotide identity
TRAS1 TRAS3 TRAS4 TRAS5 TRAS6 TRASDJ R1Bm
1248/ 1245/ 1182/ 1279/ 1079/ 839/
r~ 1 1950 1962 1931 1911 1787 1977 64 0% 63.5% 61296 66,9% 80.4% 42~X
b 400/650 1322!19631238!19a71318119111034/18468D3/1973 s1.5% 87.3% 638% 891% 660% 407X

4 398/654425/855 1218119641348/t9221060/184784611988 809% 849% 621% 70.1% 66~% 42.8%

~0 3811649402 397 853 U ~~ 5 849 ! ~ 1221119081 D38/ 818/
.~ l 1842 t 984 .0 58..7% 61.9% 80.8% 64.0% 58.4~C 41 1 %

' 412/6374351837481/641 404!838 1074/ 82911947 184a sa7% 683% 71.5% 629% 58.3% 42b%

310/815308/615322/619 308!614318/814 807/1883 50 4% SD 116 52.0% 502% 518% 42~~, i~ ~1~~ 1831670173(669187(882 1801670188/866176/886 27.3% 25.9% 27116 26,9% 27.896 2T 7%

Table 2
Identical nucleotides / Consensus nucleotides
Nucleotide identity
TRASSC3 TRASSC4 TRASSC9 TRAS1 TRASDJ R1Bm

TRASSC3 1005! 1087/ 1008! 1004'/ 850/

52.0% 56.5% 52.3% 54,0% 435%

2841644 1041/192410$2 ~ / 1025/1858804!1947 41_0% 541.6 56.3% 552% 41.3%

: TRAS5C9 358/ 3 1 8/ 1027/ 1036/ 797/

U 55.4% 49.3% 53.8% 55,991 41,191, .~y 298/x41 3~g1841 304/x38 107911787839/197y TRAS

455, 49.9% 47.6% 60.4% 4$.4%

~
~

. J 299/620 2511678 2941676 3101815 807!1883 482% 37.0% 43.5% 60.46 42.9%

RlBm 198/$7o 169/663 192/664 183!670 178/sss 29.8% 25.5% 28.6% 27.a% 27..7%
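Each entry in Tables 1 and 2 pairs a count of identical positions with a count of compared positions and the resulting percentage. The sketch below shows that arithmetic, using the TRAS1 row values 1248/1950 and 400/650 from Table 1. It rests on an assumption: "consensus" is taken to mean alignment columns in which neither sequence has a gap, which the source does not define. The function names and the toy alignment are hypothetical.

```python
# Illustrative sketch; the gap-handling rule and function names are assumptions.
def percent_identity(identical: int, compared: int) -> float:
    """Percentage of identical positions, rounded to one decimal place."""
    return round(100.0 * identical / compared, 1)


def pairwise_counts(a: str, b: str, gap: str = "-") -> tuple[int, int]:
    """Count (identical, compared) columns of two equal-length aligned sequences.

    Columns in which either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = compared = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    return identical, compared


# Ratios taken from the TRAS1 row and column of Table 1.
print(percent_identity(1248, 1950))  # nucleotide level
print(percent_identity(400, 650))    # amino acid level

# Toy alignment (hypothetical, not from the sequence listing).
same, total = pairwise_counts("TTAGG-TT", "TTACGATT")
print(same, total, percent_identity(same, total))
```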

[Example 13] Regions highly conserved only within the TRAS-like members
Figs. 14 to 17 show sequence alignments of the amino acid sequences from the EN to RT domains of nine TRAS-like families of Lepidoptera, three site-specific retroelements, R1Bm, SART1, and RT1Ag, the Drosophila telomeric retrotransposon TART, and human L1. The amino acid sequences were aligned using CLUSTAL W multiple alignment options (Thompson, J. D. et al., 1994, Nucleic Acids Res. 22: 4673-4680). The alignments were arranged for maximum matching. Highly conserved amino acids were observed mostly within the EN and RT domains, but rarely in the region between EN and RT. These highly conserved amino acid residues are probably involved in the respective enzymatic activity of each domain, some of which have already been reported (Feng, Q. et al., 1996, Cell 87: 905-916; Lingner, J. et al., 1997, Science 276: 561-567). Interestingly, many amino acid residues are conserved only within the region between EN and RT of TRAS-like members. In particular, 12 amino acids in the boxed region (TRAS Specific Region, TSR), and about 80 amino acids, which may form a putative Myb-like domain (see below), are highly conserved (Fig. 15).
The endonuclease domain is believed to make the first nick on the target DNA and initiate target-primed reverse transcription (TPRT) (Yang, J. and Eickbush, T. H., 1998, Mol. Cell. Biol. 18: 3455-3465). As shown above, the endonuclease domain of TRAS1 expressed in E. coli could cleave a stretch of (TTAGG/CCTAA)n double-stranded substrate in a very specific manner. Thus, the TRAS member endonuclease itself is initially responsible for recognition of the target (TTAGG)n sequence. Based on the primary sequence alignments in Figs. 14 to 17, amino acid residues conserved specifically within the EN domain of TRAS-like members are not apparent. However, two regions named En-A and En-B in Fig. 14 are relatively conserved within TRAS-like members (dotted box). The En-B region is also conserved in R1Bm. To search further for such regions, analysis based on the secondary or finer structure of the EN domain will be needed. Strict target specificity may also be ensured by region(s) other than the EN domain. One possible candidate for such a region is the above conserved regions between the EN and RT domains.

Regarding the (TTAGG)n recognition domain in the TRAS-like members, it is suggestive that the human telomeric repeat binding factor hTRF1 binds to (TTAGGG)n through its Myb domain (König, P. et al., 1998, Nucleic Acids Res. 25: 1731-1740). In addition, a recent report suggested that a site-specific retroelement, R2, also retained the Myb-like domain near the N-terminal region of the ORF protein (Burke, W. D. et al., 1999, Mol. Biol. Evol. 15: 502-511). The Myb domain is usually composed of 50 to 60 amino acids forming three helices, and is found in many DNA binding proteins, such as the MYB oncogene (Ogata, K. et al., 1994, Cell 79: 639-648) and Engrailed (Ades, S. E. and Sauer, R. T., 1994, Biochemistry 33: 9187-9194). In order to assess the Myb-like domain in the (TTAGG)n-specific retrotransposon TRAS, the present inventor searched for helix structures within the EN to RT domains using a secondary structure prediction program (Hitachi, DNASIS version 3.7). A helix-turn-helix motif between the EN and RT domains was found not only in all TRAS-like members, but also in other non-LTR retrotransposons, R1, SART1, RT1, TART, and L1.
In the Myb-related proteins so far reported, amino acid residues are not strictly conserved in the three helices, but hydrophobic or charged amino acids are conserved at specific positions (König, P. and Rhodes, D., 1997, Trends Biochem. Sci. 22: 43-47). To investigate whether these charged or hydrophobic residues are conserved within the putative helix-turn-helix regions, amino acid sequences in the corresponding regions of TRAS and other retrotransposons were compared. The amino acids conserved among known Myb domains were also highly conserved among TRAS-like members, while some of these amino acids were not conserved in other retrotransposons (Fig. 18). The highly conserved regions between the EN and RT domains of the TRAS-like members may interact with DNA through a Myb-like function. There is a possibility that some cysteine-histidine motifs found in both ORF1 and ORF2, and/or this Myb-like domain, may contribute to recognizing the longer arrays of telomeric repeats or telomeres.
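The comparison described above asks whether the class of a residue (hydrophobic or charged), rather than its exact identity, is preserved at a given alignment column. A minimal sketch of such a check follows. The residue groupings are one common convention and are an assumption here, the function names are hypothetical, and the three short sequences are toy data, not the sequences of Fig. 18.

```python
# Illustrative sketch; residue classes, names and the toy alignment are assumptions.
HYDROPHOBIC = set("AVLIMFWYC")
CHARGED = set("DEKRH")


def residue_class(aa: str) -> str:
    """Classify a one-letter amino acid code as hydrophobic, charged or other."""
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    if aa in CHARGED:
        return "charged"
    return "other"  # polar residues, glycine, proline, gaps, etc.


def conserved_class_columns(alignment: list[str]) -> list[tuple[int, str]]:
    """Return (column index, class) for columns sharing one informative class."""
    hits = []
    for i, column in enumerate(zip(*alignment)):
        classes = {residue_class(aa) for aa in column}
        if len(classes) == 1 and classes != {"other"}:
            hits.append((i, next(iter(classes))))
    return hits


toy_alignment = ["WKEL", "FRDI", "YKSV"]  # hypothetical aligned fragments
print(conserved_class_columns(toy_alignment))
```

In the toy data, column 2 (E, D, S) is not reported because serine falls outside the charged class, which mirrors the situation where a position is conserved in some retrotransposons but not in others.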
Industrial Applicability
The present invention provides polypeptides cleaving the telomeric repeat sequence and DNAs encoding the polypeptides. The DNAs and polypeptides of the present invention enable genetic manipulation of the telomeric repeat sequence. Loss of the telomeric repeat sequence is believed to result in suppression of cell proliferation, senescence, or cell crisis. In addition, elongation of the chromosomal telomeric repeat sequence in cancer cells due to activation of telomerase is believed to be important in the proliferation of cancer cells. The DNAs and polypeptides of the present invention can be used as new tools to control cell functions.

<110> FUJIWARA, Haruhiko
<120> POLYPEPTIDES THAT CLEAVE TELOMERIC REPEAT SEQUENCES
<130> SEN-A0002P
<140>
<141>
<X50> JP 2000-1,~8fiXl.
« 51> 2040-OS-X6 <1.fi0> 65 <170~ Patentln Vex. 2.0 <210> 1 <211> 22 <212> PRT
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: His tag
<400> 1
Met Gly His His His His His His His His His His Ser Ser Gly His
1               5                   10                  15
Ile Glu Gly Arg His Met
<210> 2
<211> 31
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 2
aaaaaaaaca tatgcacggc gagcagtgga a 31
<210> 3
<211> 45
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 3
aaaaaactcg agttattttt ggagtctaat attgaatacc atacc 45
<210> 4
<211> 38
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 4
cgatgaagac ctagtgagct cggatgccaa cggtatgg 38
<210> 5
<211> 38
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 5
ccataccgtt ggcatccgag ctcactaggt cttcatcg 38
<210> 6
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 6
ttaggttagg 10
<210> 7
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 7
ttaggttagg tt 12
<210> 8
<211> 15
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 8
ttaggttagg ttagg 15
<210> 9
<211> 17
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 9
ttaggttagg ttaggtt 17
<210> 10
<211> 10
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 10
cctaacctaa 10
<210> 11
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 11
cctaacctaa cc 12
<210> 12
<211> 15
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 12
cctaacctaa cctaa 15
<210> 13
<211> 17
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 13
cctaacctaa cctaacc 17
<210> 14
<211> 27
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 14
ttaggttagg ttataggtta ggttagg 27
<210> 15
<211> 119
<212> DNA
C213> Bombya mori C400> 15 cagtctgccg tggccctccg cagcgaacac gcgcgtccCa agttgctccg cagtgattta 60 cgctcgasas atcattaaaa attccgaaag tggtgaaaag tgtaccaaaa taacagtgc 119 C210> 16 C211> 1I8 C212> DNA
C213> Bombya mori C~aa> 1s agtcagccga ggccctccgc agcgaacacg cgcgtcccaa gttgctccgc agtgatttac 80 gctcgaaaaa tcattaaaaa ttccgaaagt ggtgaaaagt gtaccaaaat tgcagtgt 1x8 <2X0> 1.?
C211> X1.9 C212> DNA
C213> Botobyrz nuox~x C400> 17 cagtcagccg tggccctccg cagcgaacac gcgcgtccca agttgctccg cagtgattta 60 cgctcgaaaa atcattasaa attccgaaag tggtgaaaag tgtaccaaaa ttgcagtgt 119 C210> 18 C211> 120 C212> DNA
C213> Bomblrx mori <400> 18 cagtctgccg tggccctccg cagcgaacac gcgcgtccca agttcctccg cacgtgattt fi0 acgttcgaaa aatcattaaa aaattccgag agtggcaaaa gtgcatcgat tttaatgtcc 120 C210> 19 C211> 120 C212> DNA
C213> Bombyx ~aoxi.
C400> 19 cagtctgccg tggtccctcc gcagcgaaca cgcgcgtccc aagcgctccg cacgtgattt 60 acgttcgaaa aatccttaaa aaattccgag agtggcaaaa gtgcatcgat tttaatgtcc 1.2Q
<210>20 C211>1,15 <212>DNA

<213>Bombpx mori C400> 20 agtctgctgg cagccacgac gcgagcgcca cgaaaataca cgtccgcgtt taaaaaatcc 60 gaaaaaatag tgtcaagtgc atggattgca gtgaagtggt cttaaaagaa gcgga 115 C210> 2~
C2XX> 1X4 C212> DNA, <213> bomby~ moxx C~00> 2X
ctgctggcag ccacaaacgc gagcgccacg aaaatacatc gttccgcgtt taaaaaatcc 60 gaaaaatagt gtcaagtgca tggattgcag tgaagtggtc ttaaaagaag cgga 114 C2~0> 22 C21X> 1.X7 <212> DNA
<213> Bomby~ ~uoxx C400> 22 agtctgctgg cagccacgac gcgaacacgc gcttccttcc actgccgcac ttaaaattac fi0 gaagcctggc gaccaagagc aatgaaagaa cacaactgct atgacgtgGg tatattt 117 C2X0> 23 C21.1> XZX
C2X2> ANA
C213> Botnby~ Maori C400> 23 ggagtctgct ggcagccacc gacgcgaaca cgcgcttccc agcacgctcc gtctattaaa 60 ttcacgaaas aagaattttt agtcaaaata tacaagtgtt tggtgtctcc gtacgcgtct 124 a 12X
C210> 24 C211> 120 C212? DNA
C213> Bombyx mori C400> 24 agtatg$agC 8tgaaagtgC gagaagCata &a$gagagaa Ctag8atg8a tgagtBaGtt 6O
agactaaaaa agacccggtg atctcacgat cggggaaagg cattaaaaaa aaatcttatt X20 C210> 25 C2x1> x05 C212> DNA
C213> Bombyx mori C400> 25 agtatgcggC ttgaaagtgc gsgasgaata catgtgagaa cttgaatgaa tg$gtatata 60 gatastagga ccgtgttctc atgatcggga agcagaacaa taaag 106 C210> 26 <211> 26 C212> DNA
<213> Bombyx mori C400> 26 caacgtcagg ggaaagcggg atgtat 26 C2~0> 27 8/7$
C211> 26 C212> DNA
C213> Bomby~ ~aoxi C9~00> 27 gcgagatggt ggactcgcag aagtcg 26 C210> 28 C211> 14 C212> DNA
C213> Bombyx mori <400> 28 gcagsacaa't 888g 1~
<210> 29
<211> 30
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 29
agggcgcaag atcttccaaa gcgctggccc 30
<210> 30
<211> 31
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 30
gttcttcaac agtgagggga tataaaggag c 31
<210> 31
<211> 31
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 31
gcctggcaat actctccact gttttcgaga c 31
<210> 32
<211> 20
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 32
gggactgtna aagcngcnat 20
<210> 33
<211> 30
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 33
accaccaaca acatcgtcgt rgtngrrrtc 30
<210> 34
<211> 36
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: artificially synthesized sequence
<400> 34
gtctccgtcg aaaaccagga ccacrtcrtc ngcraa 36
<210> 35
<211> 741
<212> DNA
<213> Artificial Sequence C220>
<223> Description of Artificial Sequence: TRASI EN
<Z20>
<221> CDS
<222> (1) . . (?41) <400> 35 cac ggc gag cag tgg aat acc act get aaa gtt aga cca sag eat gga 48 His Gly Glu Glz~ Trp Asn Thx Tbur AJ.a Lys Val A~rg Pro Lys Asn Gly ccc cca tca cec ccc tac aga gtt ttg caa gca aac ctc caa agg aaa 9fi Pro Pro Ser Pro Pro Tyr Arg Val Leu Gln A1a Asn Leu Gln Arg Lys aaa tta gca acc get gag ttg gcc att gaa gcc get act cgg aaa get 144 Lys Leu A1a Thr A1a G1u Leu A1a I1e Glu Ala Ala Thr Arg Lys Ala gca ata gcc tta att caa gag cca tac gtg ggc ggg gca aag agt atg 192 Ala Ile Ala Leu Ile Gln Glu Pro Tyr Yal Gly Gly Ala Lys Ser Met aaa gga ttc cgg ggc gta agg gtc ttc caa ~agc act gca caa gga gat 240 Lys Gly Phe Arg Gly Val Arg Val Phe Gln Ser Thr Ala Glc; Gly Asp ggg act gtc aaa get geg ata get gtc ttt gat cac gac ttg gae gtg 288 Gly Tbx Val Lys Ala Ala Zle Ala Val Phe Asp Ni,s Asp Leu Asp Val ata cag tac ccg caa ctc acc acc aat aac atc gtg gtg gtg ggg atc 336 Ile Gln Tyx' Pro Gln Leu Thr T~ Asn Asn Ile Val, Vat. Val Gly Ile 100 x05 110 cgg acc agg gce tgg gag atc acg ctg gtg tcc tat tac ttc gag cca 384 Axg Thx~ Axg Ala Trp Glu Ile Thr Leu Val Ser Tyr Tyr Phe G1u Pro gac aag ccc ata gag tct tat ctt gaa cag atc aaa agg gta gsg aga 432 Asp Lys Pxo Ile G1u Ser Tyr Leu Glu Gln Ile Lys Arg Va1 G1u Arg x30 135 140 aaa atg gga ccc aaa agg cta atc ttt gga ggt gac gcg aat gcc aag 480 Lys Met G1y Pro Lys Arg Leu Ile Phe Gly Gly Asp Ala Asn Ala Lys 145 150 1.55 1C0 agt acc tgg tgg ggg tgc aag gas gat gat gca cga gga gat caa ttg 628 Sex Thx Tarp Txp Gly Cys Lys Glu Asp Asp Ala Axg Gly Asp Gln Leu Ifi5 170 1?5 atg ggg act ctc gga gag ttg ggc cta oat att cta aac gag gga gat 576 stet Gly Tbur Leu Gly Glu Leu GIy Leu HLs Zle Leu Asn Glu Gly Asp 1$0 1.85 X90 gtc ccg acs ttt gat acg atc aga gga ggt aag agg tac caa agc cgc 824 Va1 Pxo T~ Phe Asp Thr Ile Arg Gly GZy Lys Arg TYr GIn Sex~,Axg gtg gat gtg acg ttc tgt acc gaa gac atg ctg gac ctg ata gat gga 672 Va1 Asp Va1 Thr Phe Cys Thr Glu Asp Met Leu Asp Leu Ile Asp G1y tgg cga gtc gat gaa gac cta gtg agc tcg gat 
cac aac ggt atg gta 720 Trp Arg Yal Asp Glu Asp Leu Va1 Ser Ser Asp His Asn Gly Met Val ttc aat att aga ctc caa aaa 741.
fhe Asn Ile Arg Leu Gln Lys C210> 36 C21I> 247 C212> PRf C213> Artificial Sequence C400> 3B
His G1y Glu GIn Trp Asn Thr Thr AIa Lys Va1 Arg Pro Lys Asn G1y Pro Pro Ser Pro Pro Tyr Arg Va1 Leu G1n A1a Asn Leu Gln Arg Lys Lys Leu Ala Thr A1a GIu Leu Ala I1e Glu A1a Ala Thr Arg Lys Ala A1a Ile Ala Leu I1e GIn GIu Pro Tyr VaI GIy Gly Ala Lys Ser Met Lys Gly Phe Arg Gly Vai Arg Val Phe Gln Ser 'rhr Ala Gln Gly Asp 12/?8 Gly Thr Val Lys Al.a Ala Ile Ala Va3. Pie Asp His Asp Leu Asp Val Ile Glz~ Tyr Pro Glz~ Leu Thr Tbur Asn Asn zle Val Val Val Gly Zle Arg Tbur Arg Ala Trp Glu Zle Tic l,eu Val. Sec Tyx~ Tyr Phe Glu Pro Asp Lys Pxo Ii,e Glu Ser Tyr Leu Glu Gln IIe Lys Arg Va1 Glu Arg Lys Met Gly Pro Lys Arg Leu I1e Phe G1y Gly Asp A1a Asn A1a Lys Ser Thr Trp Trp Gly Cys Lys Glu Asp Asp Ala Arg Gly Asp G1n Leu let Gly Thr Leu Gly G1u Leu Gly Leu His Ile Leu Asn Glu Gly Asp 180 185 1.90 Va1 Pro Thr Phe Asp Tar Ile Axg Gly Gly Lys Axg Tyx Glza Ser Axg l95 200 205 Val Asp Val Tlt~x' PIZe Cys Thx Glu Asp Met Leu Asp Leu Ile Asp Gly 210 21.5 220 Tx-p Axg Val Asp Glu Asp Leu Val Ser Ser Asp His Asn Gly Met Val Phe Asn Ile Arg Leu Gln Lys C210> 37 C211> 269 C212> PRT
C213> Artificial Sequence C220>
<223> Description of Artificial Sequence: lOxHis-TRAS1 EN
C400> 37 Met Gly His His His His His His His His His His Sex Sex Gly His Ile Gl.u GIy Arg Hi.s Met His Gly GIu Gln Trp Asn Thr Th~r Ala Lys VaI Arg Pro Lys Asn Gly Pro Pro Ser Pro Pro Tyr Arg Val Leu Gln A1a Asn Leu G1n Arg Lys Lys Leu A1a Thr A1a Glu Leu A1a Ile G1u A1a Ala Thr Arg Lys A1a Ala IIe Ala Leu Ile Gln Glu Pra Tyr Val 65 'l0 75 80 Gly G1y Ala Lys Ser Met Lys G1y Phe Arg Gly Val Arg Va1 Phe G1n Sex Thr Ala Glz~ Gly Asp Gly Thx Val Lys Ala Ala Ile Ala Va1 Phe X00 1,05 110 Asp His Asp Leu Asp Val Zle GIn Tyx Pro Glz~ Leu Th~r Thx Asz~ Asn 1.15 120 125 Il.e Val Yal, Val GIy Ile Arg Thr Arg Ala Trp Glu Il,e Thr Leu VaJ.

Ser Tyr Tyr Phe Glu Pro Asp Lys Pro I1e Glu Ser Tyr Leu Glu G1n 145 ~ 150 165 I60 I1e Lys Arg Va1 Glu Arg Lys Met Glyr Pro Lys Arg Leu Ile Phe Gly Gly Asp Ala Asn Ala Lys Ser Thr Trp Trp Gly Cys Lys Glu Asp Asp Ala Arg Gly Asp G1n Leu Met GIy Thr Leu Gly Glu Leu Gly Leu His Ile Leu Asn Glu Gly Asp Val Pro Thr Phe Asp Thx zle Axg Gly Gly 21o a15 220 Lys Arg Tyr Gln Sex Axg Val Asp Val Thx Phe Cys Thr Glu Asp Met Leu Asp Leu Ile Asp Gly Trp Arg Val Asp Glu Asp Leu Yal Sex Sex Asp His Asn Gly Met Val Phe ,Asz~ Zle Axg Leu GIz~ Lys <210> 38 <211> 8403 <212> DNA
<213> ~ouaby~z moxx <220>
<221> CDS
<222> (2205) . . (3668) <220>
<221> CDS
<222> (3661).. (7383) <400> 38 cagtctgccg tggccctccg cagcgaacac gcgcgtccca agttgctccg cagtgattta 60 cgctcgaaaa atcattaaaa attccgaaaa gtggtgasaa gtgtaccaas ataacagtgc 120 gatacaccaa aatttgtgga aaagtgcggc gaacctgttg gtgaaacttt attttggatc 180 gcagtgaaat ttggacgatt ttccgtggaa aaactcggaa attgcagaat tcgaaacttt 240 gaccgcgtgc gccgcgggtt ttatcgctgt gacatacagt ttttctactg ctgtcctgac 300 gggcttgttg aaccctacaa tccaatacca aatttggaat ttttcggcca gagacggccg 360 agaaatcacc gaaaaaaaca aaaaceacgt ggcacgtggt cagtgatcgg cggccatttt 420 ggaaattttg aaaatagtga ctggaggagg cagacaagag aagcccagaa ccgtttcaag 480 acctgaaaaa agttaaggcg tttttaagtt attgaatttt gaaagtttgg caatattcga 540 aattttctat ctttttataa cgatcgataa catatagttt tttcatagtc atattaaagc 600 ttacatacac ctgcaacatt gttgaaasaa tcagaaattt otatcagaas attccggaga 664 aaataatttt tcaaaatttt tcaaattttc aattgcattt aaaaaataat tattaaaaat 7Z0 ctagaacata ttaatttaaa tattattgta gagttaaaaa tttagcatag ctttgaCgcc 780 ttatttgaaa ttattggacc gaaattgtac ttttataatc actttasgga aaaaatttat 840 tacttttaag aatattaaat cggacgtcgg ttggttttta ttgttaaata aatattcgtg 900 accctgagag aagcccttac aaaacgaacc caaactcgac tagtttcgga ccactttttg 960 aattttttaa tttttggctt attttatttg tgtttgtatt attaagtagg aactttaaaa 1,020 atttagaact gtacatactt tggctgtgcg tgttgtgtat gtcttagaat aggttatgaa 1,080 tttgattgat tcgtgaatca cttgtatttt tgacccgcat cgaaagcgga acaatataaa 119ra acatttatct taagccgtaa gaaaccgagc gaacttgtcg accsatagaa cagcgcggcg X200 gtgacgtgat atcgatgttg tttacatgca ggtgagcgga tgagtcacgc gccgcaccat X260 ctggtcccgt cttttgtttc tttatgtttt acaacattta aatttgaagt aaaataacat 1320 cgaacataac acagccagcg tattgaatag cgaatagaac gaaatagaac gaasttgaat 1380 gCttggtttg gt'~aCtt$'C~ gtC8GtgaGt tagCtgtgGa tatttagtgC at&ataatat 1440 tagttttagt gtgaatgtga aatacat~a~ tgtattactt agaaattgtg ga~agtgtgt lsoo aactgagctg acgttattgt tttgtttcgt cttgtttttt ctctaatttt tattttgtat X680 cactatttta ttttttatat atttcatatt ataattcaga actgcgaata atttaattgt X620 gtctgttatg ttttaaaatt ttttaacagt ttgactgctt attttagtta 
gttggtcaat X680 atcgaaagcn naatrrantta aattaaatca ttatatataa tatacacaca cacacacaca 1740 cacacataca CaCacaCaCg cgcacacaca cacacacaca cacaagtaaa taaataaata 1800 atttcttacc gacatccgac gagtccagta ctcgtgtagt ctgacgaaat ctgcaacaaa 1860 asacagagac ataaactttc atatgcagag tacagctcct gtaaagtaas ataaaatatt 120 taaaaaaata gaaagttaca tatacatttc aataagacat atctatctgg gtagtataga 1980 cagcacgggt ttatattcga ctgaccctat tgggcaaata aacaagcctt ttttgggcgg 2040 aataagtgtg ttgcgacgac ttaattatta ttaagctaag taagacattg acaattattt 2100 ttacaatata tacattttac acatttacac ctatacattt atttaagcca taggacaact 2X60 trtccccgata tactcgcata acacattaca ttgcatcata ctaactaact gggctgaatc 2220 actatcggac tagacagtcg ttcgctagaa atacttcasc acctctgaca acatgttcgg 2280 aatacgatcc cccgtecaga aaagcgatga cgagtgataa atcaccccga aaccctaaat 2340 ccccagaagg cggcgagaga cgcagcatag gcttgagaat agatgaatgg gaaacctcca 2400 ifi/78 aaggtaaaac catcccatca tccactaagt tagccgcgca gactgtcgta gccgccasac 2460 caacaacttc gcaagccggc aaaacaacaa gtccaggcca atgccaggac agatctgcgg 2520 aagcaaaagc ctgcctttta aaggccaagc tgcacttaag caactccggg aatattaaaa 2580 gagaaataaa aatcgaggtt acagcggcac ttgacaaact ttaccaattg gtgaaggatg 2640 ccgaaatgga ccggaaasaa gggaaatcga aagaagagaa gacgggtgat gtggggcaaa 2700 gagcgactga taggaccttt acggtaagcc caatggacgc gaatatttat gtagccaaaa 2760 tggaggagca cgccaagctc ttgcaagaga gtaaasaaat gatgggagac ctgaaggaag 2$20 caatggagaa agcaacagtg acagcagcaa cgtacgccag cgtagccgcg atacagccag 2$$0 ctgctacaaa tgagaagcct gaagttatga ggcaaacact gcactccgtc gtgatcacet 2940 caaaggacga atgtgagact ggggaaaagg tgctcgacag agtaagaaag gcagtagatg 3000 caaaagaggg atggat8gag gt8888aaCg t~agg88ggC 888ag8t8g8 888gttatca 3060 taggcctagg taccaaggcg gaaagggaca aattaaaaaa taggttggag aaggcggaga 3120 ctcagctcca cgtcgaggaa gtggaaaacc gggatccctt aatgatgctt aggagtgttc 3180 ttaccataca ttcggatgag gacatcctta aagctctgag aaaccaaaat cgagatattt 3240 tcCgCgaGCt ctgcgaggga gaggacagag tggtgatCCg gtaCagaCgg 2gggcgagaS 3300 acecacacac asaccacgtg gtggtcageg tctcgeccac cgtetggcaa 
agagcaaccg 3360 gaaaaggaag cgtgcacata gatctgcgaa ggattaaagt agaagaccaa tctcctctgg 3420 tgcaatgcac gcgctgccta ggctatggac acagcaagag attttgcgtc gaatctgtag 3480 acctgtgtag ccattgcggg ggtccgcatt tgaaaactga atgttctgac tggttggcta 3640 aggtaccacc caaatgtagg aattgcacaa aggcagatat agataacgca gagcacasCg 3600 cttttgactc gaactgtcag gtgaggaaaa gatgggatga tttggcccga tcgactgtag 36fi0 cgtattgcta agggcagcca ggaggatcaa gtcccttatc gggtagtsca agcaaacctc 3720 caaagaaata aactagcgac aaacgaggtt cttgtggagg cggcaaggct caaaatcgcc 3780 gtgggccttc tacaggaacc atatgtgggt ggggcgaaag aaatgaaaac tcsaagggga 3$40 atgcgtgtgt tccaaaacgc tgatgtgagt ggtgggactg taaaagcagc gatagttgta 3900 ttcgaccata acatcaacgt agtgcagtac ccgaaactca ccaccaacaa catctgcgtg 3960 gtggggatCa aCaCCagCgC gtggagCatC aCgCtagtct ccttttattt CgagCCagaC 4020 catcceatag agccctatct tgaacatcta gggaaaatca aagaagaaat aggaagaagc 4080 aaaataatct acggaggaga ctcgaacgca aagagcacct ggtggggaag ccctagaata 4140 gataacaggg gtacaagtat gttgggaaca ctggaggaac tggaactgaa catattaaat 4200 acaggggaaa ttccgacctt cgacacgatt aggggaggaa agcgctacaa aagttatgtc 4260 gacgttacag cctgctcaac agacttgatg gatctggtga gcgactggag agttgatgaa 4320 ggactgacga cctcagacca caacgccatc ctatttaata ttcatacaaa acgagcaata 4380 ggaataaasa tacaaagaac cacaagaata tacaacacaa agaaagccaa ctggtcaaat 4440 tttcatgaga aaatgcgaca gttgatacaa gaaaaacaat tgaccattga aastataaaa 4500 caaataaata caatagcaga aattgaaata gcagaaaaca aatacacaaa cataattaaa 4560 acagtatgta accaaaccat acctaaga$a aaaacacaag aaaaatttac cttgcegtgg 9:$20 tggtctgatg agctagccgc aatgaaacgt gaagtcgcca ccagaaagcg cag$atccga 4680 tgcgctgcgc caatccgaag gtcacgggtc gtcgaaga~gt acctgaaact aaaacaagaa 4740 tatgagttaa aagcagctag tgcccagata gaaagttgga agaactattg tcatagacaa 4800 gataaggaag gagtgtggga gggastctat agggttattg gaagagtgac taaacgggsa 4860 gaagacttgc cactggaaaa agacggaaac attctagatg ctaagcagtc agtcaaattg 4920 ttgtcggaga cattctatcc aaaggattct accgacggcg ataacgacta ccatcgccaa 4980 atcagggasg aagccgaaaa agtgaattgt ggcaagcaaa ataataatat 
tttcgaaccg 5040 caattcacca tgtcagaatt gaaatgggca agtaactcct tcaaccccaa aaasgcaccc b100 ggagcagatg gctttacagc ggatatatgt catcatgcca taaacagtag ccctcatgta 5160 tttctcacgc tcctcaacaa atgtctggaa caaagctact tcccaaaggc ctggaaggaa 5220 gctaccgtgg tggtgrttgcg gaagccgggt aaagagtcat aeacaaacca caagtcgtat 5280 agaccaatcg gtttgcacac aatactgggc aaaatatacg aaaaaatgct gatttcacgc 5340 gtaaaatacc atttgatccc aaggacaagc acaaggcagt tcgggttcat gccacaaagg 5400 agcaccgagg actccctcta taccatgatg caacatattt ctaacaaaag gaaggaaaag 54$0 aaaatagtaa cgttggtgtc attagatata gagggagcct ttgattgcgc ctggtggcet 5520 gccatcagag tccgattagc ccaggaaaac tgtccactga acctgcggaa agtaatggac 5580 agctatctca cggatcgaaa agtccgagtc agatacgcag gggaagagca cagcgtgaat 5640 accagcaaag gctgtgtgca gggctcaatc ggtggccctg tgctgtggaa cctcctgttg s700 gacccactcc tgaaaagtct ggacacccaa aaagtgtact gtcaggcatt cgcagacgat 5760 gttgtccttg ttttcgacgg agacacggcg ttggaaattg aaaaccgggc caatgcggct 6$2O
ctcgaacatg ttcaggaatg gggtatcaat aacaaactga agttcgcacc acaaaaaact 5880 aaagctatgg tcattacaag gagattgaaa tatgatatcc cacggctgaa catgggcggg 5940 acagtcattc ccatgtctga agacattaag attctagggg taaccgtcga caacaagttg 6000 acatttaacg cgcacgtctc gaatgtttgc agaagagcga ttgaggtgta taaacaacta 6060 gccagagcag ccagggccag ttggggtcta caccccgagg tcattaaatt aatatatacc 6120 gccaccatag agcccatagt cttgtacgcc gccagtgtat gggtatcggc agtcgccaaa 6180 ctgggcgtaa ttaaacaatt agccgctgtg cagaggggea ttgcacaaaa ggtatgcaaa 6240 gcgtatcgca ccgtatctct taactcagct ctgatcctag cgggtatgct ccecctagac 6300 ctccgagttc gtgaggcggc etcattatac gaagecaaga agggacaact gctgccggg~ 6$60 ctggctgacg cggagattga gcaaatgaca ccttttgcag agatgccaca ccccgtggaa 6420 cgtgcggatc tgcagatagt ctgcttggag gaccaagaac aagtcgacgg taacagcgac 6480 tacgacgaat gtatttttac agacggaagt aaaatcggag gcaaagtggg ggccgcgctg 6540 tcgatttgga aaggggacac agagactaag acccgcaaac ttgccctgtc aaactactgc 8600 acggtctacc aagcagagct gctggcactg tgtgtggcga cgacggaagt caggaagagt 6660 aaaagcaaat cttttggagt ttatagcgat tccatgtcgg ccctccaaac cataacaaac 6?20 tatgatagcc cccatccact ggcagtcgaa gctagacaaa atattaaagc ctcgttactc 8780 caaggcaagg ctgtcacctt gcattggata asagctcacg cagggctgaa gggcaatgag 6840 agggccgacg gacttgcaaa ggaagccgct gaaaactcca ggaaaagacc agactacgat 6900 cgctgcccga tctcattcgt caagcgaagc ctacgaatga ccacgcttga ggaatggaac 6960 cggcgctata caactggcga gacggcatcc gtcactaagt tgtttttccc agatgcattg 7020 gtggcgtaca gaatagtgag aaagatacag cccagtaaca tactcacaca aatcatgacg 7080 gggcatggcg ggttctcgga atacttatgt cggtttaagt gtaaagagag cccgtcatgc 7140 atttgcgacc cagcagtgaa agaaaccgtt cctcatgtgc tggtggaatg tcccatcttt 7200 gcacaggcta gacatgacat cgagcaaaag ctggacgtaa aaattggact tgacacgctg 72$0 catgaaa~taa taatagacac asatagaaat cagtttttga aatactgcat agctatcatt 7320 ggaatagtaa taasasgasa taaatagaaa ataaagtatg tttacaatsa tattaagata 7380 tattagatat aagcatacaa tatattaagc aaaaacaaaa gtatatatat acgcagtaaa 7440 aataagaaaa caaaggaaat gggcttgaca taaattaaaa cttaccttcc tcttctcctg 7600 tccagctccc 
tggaaaataa aattaaaatt gaaaagtatt gttagaatag aactagaaga ?5$0 gaaatgtaac aastaacata gasaaaagta gaataagcac gtaatataat aagcaataga 7$20 ataagaagct aaaatgtaaa ccataggaat aagataataa tattgaaatt gttaaaatta 7680 tataaagata agatccaaat agtataagct tcaaaataga cataagtttc aaacaatatt 7740 tttgtaattg atttatgaat aaatgagcaa aatgctattt accccgaaaa ataaaacaat 7800 agtaggttaa agtagcggga gtcccacccg gctaagtatg acagatgatg aatgcaagaa 7860 agaaacagta tgaagcatga aagtgcgaga agcataaaag agagaactag aatgaatgag 7920 taacttagac taaaaaagac ccggtgatct cacgatcggg gaaaggcatt aaaaaaaaat 79$0 cttattaaaa aaaaaanasa aaa 8003 <210> 39 <2I1> 488 <212? PKT

<213> Bombya mori <400> 39 Leu Thr G1y Leu Asn His Tyr Arg Thr Arg Gln Ser Phe Ala Arg Asn Tar Ser Thr Pro Leu Thr Thr Cys Ser Glu Tyr Asp Pro Pro Ser Arg Lys Ala Met TLx Ser Asp Lys Ser Pro Arg Asn Pra Lys Sex Pro Glu Gly Gly Gl.u Axg Axg Sex lJ.e Gly Leu Ax~g Zle Asp Glu Txp Glu Thx' Ser I~ys G1y Lys Thr Ile Pro Ser Ser Thr Lys Leu Ala Ala Gln TS,r Val Val Ala A1a Lys Pro Thr Thr Ser Gln Ala G1y Lys Thr Thr Sex Pro Gly G1n Cys Gln Asp Arg Ser Ala Glu A1a Lys Ala Cys Leu Leu Lys A1a Lys Leu His Leu Sex Asn Ser Gly Asn I1e Lys Arg G1u Ile lls 120 125 Lys Ile Glu Val T5r Ala Ala Leu Asp Lys Leu Tyx Gln, Leu Val Lys Asp Ala GZu Met Asp Axg Lys Lys Gly Lys Ser Lys Glu GJ.u Lys T~
1.45 X50 1.55 X60 Gl,y Asp Val Giy Glrr Arg Ala Thx~ Asp Arg Thx Phe Thr Val Sex Pxo Met Asp Ala Asn I1e Tyr Va1 Ala Lys Met Glu Glu His Ala Lys Leu Leu Gln G1u Ser Lys Lys Met Met Gly Asp Leu Lys G1u A1a Met Glu Lys A1a Thr Va1 Thr A1a Ala Thr Tyr A1a Ser Va1 A1a A1a Ile Gln 2i0 215 220 Pro A1a A1a Thr Asn G1u Lys Pro G1u Val Met Arg Gln Thr Leu His Sex Val. Val Zle Thx~ Sex Lys Asp Glu Cys Glu Thr Gly Glu Lys Val Leu Asp Arg Va1 Arg Lys A1a Val Asp Ala Lys Glu Gly Trp Ile Glu Ya1 Lys Asn Val Arg Lys Als Lys Asp Arg Lys Val TIe IIe G1y Leu G1y Thr Lys A1a Glu Arg Asp Lys Leu Lys Asn Arg Leu G1u Lys Ala Glu Thr Gln Leu His Va1 G1u Glu Val Glu Asn Arg Asp Pro Leu Met Met Leu Axg Ser Val Leu Thx Ile His Sex Asp Glu Asp Zle Leu Lys Ala Leu Arg ,Asz~ G].n Asrt Ax'g Asp Ile Phe A~rg Asp Leu Cys GZu Gly Glu Asp Arg Ya1 Val Ile Arg Tyr Arg Arg Arg Ala Arg Asn Pro His Thr Asn His Val Va1 Val Ser Va1 Ser Prr~ Thr Val Trp Gln Arg A1a Thx Gly Lys Gly Ser Yal His I1e Asp Leu Arg Arg I1e Lys Va1 G1u Asp Gln Ser Pro Leu Val Gln Cys Tar Arg Cys Leu Gly Tyx Gly His 4os ~l0 415 Ser Lys Arg Phe Cys Val Glu Ser Yal Asp Leu Cys Sex His Cys Gly Gly Pro His Leu Lys Tax Glu Cys Sex Asp Txp Leu Ala Lys Val Pra 435 440 4~;5 Pro Lys Cys Axg Asn Cys Thx Lys Ala Asp Zle Asp Asn Ala Glu His 4&0 455 4B0 Asn Ala Ph~e Asp Sex Asn Cys Gln Ya7, A~'g Lys Arg Trp Asp Asp Leu Ala Arg Set- Thr Val Ala Tyr Cys <210> 40 <211> 1228 <212> PRT
<213> ~ombyx mori <400> 40 Arg Ile Ala Lys Gly Ser Gln Glu Aep Gln Val Pro Tyr Arg Va1 Yal Gln A1a Asn Leu G1n Arg Asn Lys Leu A1a Thr Asn Glu Yal Leu Val Glu Ala Ala Arg Leu Lys Ile Als Val Gly Leu Leu GZz~ Glu Pra Tyr 35 ~0 ~5 Val Gly Gly Ala Lys Glu Met Lys Thx GJ,n Arg Gly Met ,Axg Val Qhe 50 55 fi0 Gl~z~ A,sz~ Ala Asp Val Ser Gl,y Gly Thr Val Lys Ala A1a I1e Val Val Phe Asp His Asn I1e Asn Va1 Val Gln Tyr Pro Lys Leu Thr Thr Asn 85 90 ~ 95 Asn Ile Cys Val Val Gly Ile Asn Thr Ser Ala Trp Ser Ile Thr Leu Va1 Ser Phe Tyr Phe G1u Pro Asp His Pro Ile Glu I'xo Tyx Leu Glu 115 120 x26 His Leu Gly Lys Ile Lys Glu Glu Ile Gly Arg Ser Lys Zle Ile Tyr 130 1.35 1.~0 Gly Gly Asp Sex Asn Ala Lys Sex Thr Trp Txp Gl.y Ser Pro Sex Ile 195 x50 1,55 160 Asp Asn Axg Gly '~ Ser ~det Leu Gly Thr Leu GIu Glu Leu G1u Leu lfi5 170 175 Asx~ Zle Leu Asn Thr Gly Gl.u Ile Pxo Th~r Phe Asp Thr I1e Arg Gly 7,80 185 190 Gly Lys Axg Tyr Lys Sex Tyr Val Asp Val Thr Ala Cys Set Thr Asp Leu Met Asp Leu Val Ser Asp Trp Arg Val Asp G1u Gly Leu Thr Thr zs~rs Ser Asp His Asn Ala Ile Leu Phe Asn Ile His Thr Lys Arg A1a I1e Gly Tle Lys Ile Gln Arg Thr Thr Axg Ile Tyr Asn Thr Lys Lys Ala 2~5 250 255 Asn Txp Sex Asn Phe Hls Glu Lys Met Arg Gln Leu Ile Gln Glu Lys Glz~ Leu Thr Ile Glu Asn Zle Lys Gln. Zle Asn Thlc Zle Ala Glu Ile Glu Zle Ala Glu Asn Lys Tyr Thr Asn I1e Ile Lys Thr Val Cys Asn Gln Thr I1e Pro Lys Lys Lys Thr Gln Glu Lys Phe Thr Leu Pro Trp Trp Ser Asp Glu Leu Ala Ala Met Lys Arg Glu Val Ala Thr Arg Lys Arg Arg Ile Arg Cys A1a A1a Pro I1e Arg Arg Ser Arg Ya1 Yal G1u Glu Tyr Leu Lys Leu Lys Glz~ Glu Tyr Glu Leu Lys Ala Ala Ser Ala Gln Ile Glu Sex Tx~p Lys Asz~ Tyr Cys Hxs Axg Gln Asp Lys Glu G1Y

Val Tx~p Glu Gly Ile Tyr Arg Vai, Tie GJ,y Ax~g Val Thr Lys Arg Glu 385 390 ~ 395 400 Glu Asp Leu Pro Leu Glu Lys AsQ G1y Asn I1e Leu Asp A1a Lys Gln Ser Yal Lys Leu Leu Ser Glu Thr Phe Tyr Pro Lys Asp Ser Thr Asp G1y Asp Asn Asp Tyr His Arg G1n Ile Arg Glu G1u Ala Glu Lys Ya1 Asn Cys Gly Lys Gln Asn Asn Asn Ile Phe G1u Pro Gln Phe Thr Met Ser G1u Leu Lys Trp Ala Ser Asn Ser Phe Asn Pro Lys Lys A1a Pro 465 474 475 .e~$0 GLy Ala Asp GIy Phe Thr AIa Asp Ile Cys His His Ala Zle Asn Ser Ser Pro His Val Phe Leu Thr Leu Leu Asn Lys Cys Leu G1u Gln Sex Tyr Phe Fro Lys A1a Trp Lys G1u A1a Thr VaI Va1 Va1 Leu Arg Lys Pro G1y Lys G1u Ser Tyr Thr Asn His Lys Ser Tyr Arg Pro Ila G1y 530 535 54a Leu His Thr Ile Leu GIy Lys T1e Tyr G1u Lys Met Leu I1e Ser Arg 545 55a 555 560 Val Lys Tyx His Leu ZIe Pxo Arg Thr Ser Thr Arg Gln Phe Gly Phe Met Pxo GIn Axg Sex Thr GIu Asp Sex Leu Tyr TIZx Met Met Gl.z~ His Zle Sex Asn Lys Arg Lys GIu Lys Lys IIe Val Thr Leu Val Ser Leu Asp Ile Glu G1y A1a Ph$ Asp Cys A1a Trp Trp Fro Ala I1e Arg Yal Arg Leu A1a G1n G1u Asn Cys Pro Leu Asn Leu Arg Lys Yal Met Asp s25 630 635 644 Ser l~rr Leu Thr Asp Arg Lys Va1 Arg VaI Arg Tyr AIa GIy GIu Glu 645 65a 655 His Ser Ya1 Asn 1'hx Ser Lys GIy Cys VaI Gln Gly Ser IIe GIy GIy Pro Va1 Leu Trp Asn Leu Leu Leu Asp fro Leu Leu Lys Sex Leu Asp Thr Gln Lys VaI Tyx Cys Glz~ AIa Phe Ala Asp Asp VaI VaI Leu VaI
~9a 69s 700 Phe Asp GIy Asp Thr AIa Leu Glu Zle Glu Asn Axg AIa Asn Ala Ala ?05 7x0 715 720 Leu GIu His Yak, GJ~n Glu firp Gly Ile Asn Asn Lys Leu Lys Phe Ala Pro Gln Lys Thx~ Lys Al,a Met Val Ile Thr Arg Arg Leu Lys Tyx Asp 744 ?45 ?50 IJ,e Pxo Axg Leu Asn Met Gly Gly Thr Val I1e Pro Met Sex G1u Asp I1e Lys Ile Leu Gly Ya1 Thr Val Asp Asn Lys Leu Thr Phe Asn Ala 770 ~ 775 780 His Va1 Ser Asn Val Cys Arg Arg A1a I1e Glu Val Tyr Lys Gln Leu Ala Arg Ala Ala Arg Ala Ser Trp Gly Leu His Pro Glu Ya1 Ile Lys 805 81(1 815 Leu Ile Tyr Thx Ala Tax- Zle Glu Pro Zle Val Leu Tyr Ala Ala Ser Val Trp Val Ser Ala Val Ala Lys Leu Gly Val Zle Lys Gl.n Leu A'la Ala Va]. Gln Arg Gly Ile Ala Gln Lys Val Cys Lys Ala Tyr Arg Thr Val Ser Leu Asn Ser Ala Leu Ile Leu A1a G1y Met Leu Pro Leu Asp Leu Arg Val Arg Glu Ala Ala Ser Leu Tyr Glu Ala Lys Lys Gly Gln $85 890 896 Leu Leu Pro Gly Leu A1a Asp Ala Glu Ile Glu Gln Met Thx Pro Phe 900 9a5 910 Ala Glu Met Pro His Pro Val Glu Arg Ala Asp Leu Gln Ile Val Cys Leu Glu Asp Gln Glu Gln Yal Asp Gly Asn Sex Asp Tyr Asp Glu Cys Ile Phe Thr Asp Gly Ser Lys Zle Gly Gly Lys Val GJ.y Ala Ala Leu 9.~5 950 955 960 Sex Zle Trp Lys Gly Asp Thx~ Glu Thur Lys Thx~ Arg Lys Leu AZa Leu Ser Asn Tyx Cys Thr YaI. 
Tyr Gln Ala Glu Leu Leu Ala Leu Cys Val Al.a Thr Thr Glu Val Arg Lys Ser Lys Ser Lys Sex Phe Gly Val Tyr Ser Asp Ser Met Ser A1a Leu G1n Thr I1e Thr Asn Tyr Asp Ser Pro His Pra Leu Ala Val Glu A1a Arg Gln Asn Ile Lys A1a Ser Leu Leu Gln Gly Lys Ala Val Thr Leu His Trp Ile Lys Ala His Ala G1y Leu 1045 loco loss Lys Gly Asn Glu Arg Ala Asp Gly Leu Ala Lys Glu Ala Ala G1u Asn logo loss lo7a Ser Arg Lys Arg Pro Asp Tyr Asp Arg Cys Pro Ile Ser Phe Val Lys Axg Sex Leu Axg Met Thr Thx~ l,eu Gl.u GZu Trip Asn Arg Axg Tyx Thr Thr Gly Glu Thr Ala Ser Yal Thr Lys Leu Phe Phe Pro Asp A1a Leu Yal Ala Tyr Arg Ile Vnl Arg Lys Ile Gln Pro Ser Asn Ile Leu Thr G1n Ile Met Thr Gly His Gly Gly Phe Ser Glu Tyr Leu Cys Arg Phe Lys Cys Lys G1u Ser Pro Ser Cys Ile Cys Asp Pro Ala Val Lys Glu 1165 11.60 1, X65 Thr Val Pro His Val Leu Val Glu Cys Pro Zle Phe Ala Gln Ala Arg 117o x 17s x lso His Asp Ile Glu Glx~ Lys Leu Asp VaI Lys Zl.e Gly Leu Asp Thx Leu 1185 1.1.90 1.195 1200 His Glu Zle Ile Il.e Asp Thx Asn Axg Asn Gln, Phe Leu Lys Tyx Cys 1205 121,0 1216 1).e Ala Zle Ile Gl.y Zl.e Val Z].e Lys Arg Asn Lys x220 2225 <210> 41 <211> 235 <212> PRT
<213> Bouuby~ more <400> 41 A~rg lle Ala Lys Gly Ser Gln Glu Asp Gln Val Prv Tyx~ Axg Val Val x ~ Ia xs Gln Ala Asn Leu G1n Arg Asn Lys Leu Ala Thr Asn G1u Va1 Leu Val G1u A1a Ala Arg Leu Lys Ile Ala Va1 Gly Leu Leu G1n Glu Pro Tyr Val Gly Gly Ala Lys G1u Met Lys Thr Gln Arg Gly Met Arg Val Phe Gln Asn Ala Asp Val Ser Gly Gly Tk~ur Val Lys Ala Ala I1e Val Val 65 7a 75 so Phe Asp Hzs Asn Ile Asz~ Val Val Gln Tyx Pxo Lys Leu Thr Thx Asn Asn Ile Cys Val Val Gly Ile Asz~ Thr Ser Ala Trp Sex Ile Th~c Leu Val Ser Phe Tyr Phe Glu Pro Asp His Pro Ila Glu Pro Tyr Leu Glu His Leu G1y Lys Ile Lys G1u G1u ile Gly Arg Ser Lys Ile Ile Tyr Gly Gly Asp Ser Asn Ala Lys Ser Thr Trp Trp Gly Ser Pro Ser Ile Asp Asn Arg Gly Thr Ser Met Leu Gly Thr Leu G1u Glu Leu Glu Leu 1S5 170 ~ 1T5 Asn Ile Leu Asn Thx Gly Glu Ile Pro Thr Phe Asp Thr Zle Arg Gly lso 1~~ 190 Gly Lys Arg Tyr Lys Ser Tyr Val Asp Val Thr Ala Cys Ser Thr Asp Leu Met Asp Leu Val Ser Asp Trp Arg Val Asp Glu Gly Leu Thr T~
210 215 220 Ser Asp His Asn Ala Ile Leu Phe Asn Ile His <210> 42 <211> 1959 <212> DNA <213> Bombyx mori <220> <221> CDS
<222> (1) . - (X959) <400> 42 ggg cgc aag atc ttc caa agc get ggc cca aga gac gga aca gtt aaa 48 Gly Arg Lys Ile Phe Gln Ser Ala Gly Pro Arg Ash Gly Thx VaJ. Lys 1 5 10 x5 get gcg ata get gt~ ttt aac cct gac cta aac gta gtc caa tac cog 98 Ala A1a I1e A1a Val Phe Asn Fro Ash Leu Asn Val Val Gln Tyr Pro aag ctc acc aca aac aat atc gtg gtg gtg gga ctc cgt ace ggy get 144 Lys Leu Thr Thr Asn Asn Ile Val Ya1 Yal G1y Leu Arg Thr Xaa A1a tgg gag atc acg cta gtc tct ttt tac ttt gag aac tca gag ggt ggt 192 Txp Glu Zl~e Thx Leu VaJ. Sex Phe Tyx Phe Glu Asn Ser GJ.u Gly Gly 50 55 fi0 cag ccc atc act cct gac tta gas tat cta gtg cga ate gac aaa gaa 29:0 Glz~ Pxo Zle Thx Fx~o Asp Leu Glu Tyx Leu Val Axg Ile Asp Lys Glu atc ggg tct aaa aag tgg att att gga ggt gac gcc sac get aaa agc 288 Ile Gly Ser Lys Lys Trp Ile Ile Gly Gly Asp Ala Ast~ Ala Lys Sex tca tgg tgg ggg agt cca tta aac gac cac cgs ggt gag gag atg ctg 336 Ser Trp Trp Gly Ser Pro Leu Asn Asp His Arg G1y Glu Glu Met Leu 100 10b I10 ggt get ctc aat gag ctg ggg cta cac ata caa aac aga ggg gaa act 384 Gly A1a Leu Asn G1u Leu Gly Leu His Ile G1n Asn Arg G1y G1u Thr 115 12,0 125 ccg acc ttt gat acc atc cga ggg ggt aag cga tac eaa agt ttt gtg 432 Pro Thr Ph~ Asp Thr Ile Arg Gly G1y Lys Arg Tyr Gln Ser Phe Va1 29!.78 gac gta aca get tgt tct gcg gac tta atc ggc gta gtg gaa gac tgg 4$4 Asp Val Thx Ala Cys Ser Ala Asp Leu Ile Gly Va1 Yal Glu Asp Trp 145 150 155 x,60 aga gtt gac gag ggc ctg acg agc get gac cae aac ggc ata gta ttt 528 Arg Val Asp Glu G1y Leu Thr Ser A1a Asp His Asn Gly I1e Val Phe agt att cgg ctg cag aaa ttt ttg ggc aca aaa ata agt agg acc act 578 Ser I1e Arg Leu G1n Lys Phe Leu Gly Thr Lys Ile Ser Arg Thr Thr 180 1$5 190 agg ttg ttc aac aca aaa aaa gcc aat tgg acg agt ttt cgt gag aaa 624 Arg Leu Phe Asn Thr Lys Lys Ala Asn Trp Thx Sex Phe Arg G1u Lys cta aat caa aaa tta gta gaa aat aaa ttt ace acc get gaa ctc acg 6?2 Leu Asn Gln Lys Leu Yal Glu Asz~ Lys Phe Thx Thx Ala Glu Leu Thr 2x0 215 220 aat gta cga agt get 
gas caa ctt aac aac acc aca cta ggt ctt acg 720 Asz~ Val Arg Sex Ala G~,u G~,z~ Leu Asn Asn Thx Thx~ Leu Gly L.eu Thr aaa tca att aeg gac get tgc gta gag tcg atg cca atc aaa aaa tca 768 Lys Ser I1e Thr Asks Ala Cys Ya1 G1u Ser Met Pro Ile Lys Lys Ser aat gaa aca ctt acc ctg ccg tgg tgg tcc gag gaa ctc gca gat ctg 81fi Asn G1u Thr Leu Thr Leu Pro Trp Trp Ser Glu Glu Leu Ala Asp Leu aag agg aac gtt gcc act aaa aaa cgc aga atc agg aac get gcc cca 864 Lys Arg Asn Va1 Ala Thr Lys Lys Arg Arg Ile Arg Asn Ala Ala Fro atc cgg agg tca aag gtc atc gag gag tac ctg a~a caa aaa gas aag 9x2 ale Arg Arg Ser Lys Val ZJ.e Glu Glu Tyr Leu Lys Glz~ Lys Glu Lys tat gaa tta gas gea gca aaa get cag acg gaa agt tgg aag gag ttt 960 Tyr Glu Leu Glu Ala Ala Lys AJ.a Gln Tar Glu Sex Trp Lys Glu Phe 305 3x0 3X5 320 tgc tgt aag cag gac agg gag ggt gtg tgg gag ggt att tac agg gtg 1008 Cys Cys Lys Gln Asp Axg Glu Gly Val Txp Glu Gly Ile Tyr Axg Vat.

atc gga agg aca acg aat agg gag gag gat cag ccg ctg gta aag gaa 1056 I ~.e Gly Arg Thr Thr Asn Arg Gl,u Glu Asp Gln Pro Leu Val Lys Glu ggg gag gtt ctg gat gag aag ggc tct gtc aag ttt ctg gca gag acc 1104 Gly Gl,u Val Leu Asp Glu Lys Gly Sex Val Lys Phe Leu Al.a Glu Thx ttc tat cca gac gat ctc act gac gcc gaa aat agt gac cac cgt caa 1152 Phe Tyr Pro Asp Asp Leu Thr Asp Ala Glu Asn Ser Asp His Axg Gln acg cga gcc gaa gcc gaa aas gtg aat gat ggc aag caa gtt gag ccc 1200 Thr Arg Ala Glu Ala G1u Lys Va1 Asn Asp Gly Lys Gln Val G1u Pro tgc gac cca cca atc acg atg gcg gag cta caa cag gcc agc gaa tcc 1248 Cys Asp Pro Pro Ile Thr Met Ala Glu Leu G1n Gln A1a Ser G1u Ser ttt aac ccg aaa aaa gcc ccg gga gcg gat ggc ttt act gca gat att 1296 Phe Asn Pro Lys Lys Ala Pro Gly Ala Asp Gly Phe Thr Ala Asp Zle 420 425 .130 tgc caa cac gtc att aga aac cac agt gat gca ttt tta ata ttt ttg X344 Cys G1n His Val Ile Arg Asr~ His Ser Asp Ala Phe Leu Ile Phe Leu sac aag tgt etg gaa tae cat cac tt~t cca gag ctt tgg aag aag get 1392 Asn Lys Cys Leu Glu Tyr His His Phe Pro Glu Leu Trp Lys Lys Ala acg gtc gtg gtg ttg aaa aag ccg gga agg act gac tat acc acc ccc 1440 Thr Va1 Val Val Leu Lys Lys Pro Gly Arg Thr Asp Tyr Thr Thr Pro 465 ' 4?0 475 480 aaa gca tat aga cca att ggt cta ctg cca ata cta ggc aaa ata tat 1488 Lys A1a Tyx Arg Pro Zle Gly Leu Leu Pro rte Leu Gly Lys Ile Tyr gaa aaa atg ttg gtg tca cgc ctc aaa ttt cac cta ctg cct aaa atg 1536 Glu Lys Met Leu Val Ser Arg Leu Lys Phe His Leu Leu Pro Lys Met agt act cgc cag tac gga ttc atg ccc cag aga agc acc gag gac tcc x584 Ser Thr Arg Gln Tyr Gly Phe Met Pro Gln Arg Ser Thr Glu Asp Ser ctc tat aat cta atg caa cac att cac agg aaa tta gat gaa aag aaa 1632 Leu Tyr Asn Leu Met Glr~ His Zle His Arg Lys Leu Asp Glu Lys Lys ata att gtt att gtt ctg gtt tct ttg gac ata gag gga gcc ttc gat 1680 Ile I1e Val I1e Val Leu Va1 Ser Leu Asp I1e Glu G1y A1a Phe Asp agc gcc tgg tgg cca get ata cga gte cga etg get gag gas aag tgc 172$
Ser Ala Trp Trp Pro Ala Ile Arg Val Arg Leu Ala Glu Glu Lys Cys 565 570 575 cca ctg aac ctc agg aag gta ttt gat agc tac ctg agg aac aga gag 1776 Pro Leu Asn Leu Arg Lys Val Phe Asp Ser Tyr Leu Arg Asn Arg Glu ata gtg gtt agg tat gcg gga gag gag tgc acc aaa atc act acc aag 1824 Ile Val Val Arg Tyr Ala Gly Glu Glu Cys Thr Lys Ile Thr Thr Lys ggg tgc gtc cag ggt tcc atc gga gga cca atc ttg tgg aac ctc ctc 1872 Gly Cys Val Gln Gly Ser Ile Gly Gly Pro Ile Leu Trp Asn Leu Leu 610 615 620 ctg gac cct ctc tta aaa agc cta gag aat tgg ggt gag tat ggt cag 1920 Leu Asp Pro Leu Leu Lys Ser Leu Glu Asn Trp Gly Glu Tyr Gly Gln 625 630 635 640 gcc ttc gca gac gat gtg gtc ctg gtt ttc gac gga gac 1959 Ala Phe Ala Asp Asp Val Val Leu Val Phe Asp Gly Asp 645 650 <210> 43 <211> 653 <212> PRT
<213> Bomby~ m~ori <400> 43 Gly Arg Lys T~.e Phe Gln Sex Ala Gly Pro Arg Asp GZy Th~r Val Lys x 5 1.0 x 5 Ala Ala Tle Ala Yal Phe ,A,sz~ fro Asp Leu Asn Vat, Val Glz~ Tyx Pro Lys l.eu Thr Thr Asn Asn Ile Yal Val Val. Gly Leu Arg Thr Xaa Ala 32/.78 Trp G~.u Zle Thr Leu Va1 Ser Phs Tyr Phe GIu Asn Ser G1u G1y G1y Gln Pro I1e Thr Pro Asp Leu G1u Tyr Leu Val Arg I1e Asp Lys Glu 65 70 ?6 80 I1e G1y Ser Lys Lys Trp Ile Ile Gly G1y Asp Ala Asn AIa Lys Ser Ser Trp Trp G1y Ser Pxo Leu Asn Asp His Axg Gly Glu Glu Met Leu X00 x05 1X0 Gly A1a Leu Asn Glu Leu Gly Leu Hxs Ile G].n ,A.sz~ Ax~g Gly Glu Thr Pro Thr Phe Asp Thx Zle Arg Gly Gly Lys Arg Tyr Gln Ser Phe Val Asp Val Thr Ala Cys Ser Ala Asp Leu I1e Gly Val Va1 G1u Asp Trp Arg Val Asp G1u Gly Leu Thr Ser Ala Asp His Asn Gly Ile Va1 Phe Ser Ile Arg Leu Gln Lys Phe Leu Gly Thr Lys Ile Sex Arg Thr Thr loo x~~ 190 Arg Leu Phe Asn Thr Lys Lys Ala Asn Trp Thr Sex Pk~e Axg Glu Lys Leu Asn Gln Lys Leu Yal Glu Asz~ Lys Phe Thx Thr Ala Glu Leu Thr 210 2x5 220 Asn Val Arg Se;r Ala Glu G~.n Leu Asn Asn Thr Thr Leu Gly Leu Thr Lys Sex Zle Thx~ Asp Ala Cys Val Glu Ser Met Pro IIe Lys Lys Ser Asn Glu Tht Leu Thr Leu Pro Try Trp Ser Glu Glu Leu Ala Asp Leu Lys Arg Asn Val Ala Thr Lys Lys Arg Arg I1e Arg Asn A1a Ala Pro I1e Arg Ar$ Ser Lys Val I1e G1u G1u Tyr Leu Lys Gln Lys Glu Lys 290 295 30p 33/.7$
Tyr Glu Leu Glu AIa Aia Lys Ala G~,n Thx' Glu Sax Txp Lys Glu Phe Cys Cys Lys Gln Asp Arg Giu Gly Val Trp GIu G1y Ile Tyr Arg Val Ile Gly Arg Thr Thr Asn Arg Glu Glu Asp G1n Pro Leu Ya1 Lys Glu Gly Glu Val Leu Asp Glu Lys G1y Ser VaI Lys Phe Leu A1a G1u Thr Phe Tlrr Pro Asp Asp Leu Thr Asp Ala Glu Asn Ser Asp His Arg Gln 370 3?5 380 Thr Arg Ala Glu Ala Glu Lys Va1 Asn Asp G1y Lys Glz~ Val Glu Pro Cys Asp Pxo Pxo Zle Thr Met Ala GIu Leu Gln Gln Ala Ser Glu Ser Phe Asn Pro Lys Lys AIa Pro Gly Ala Asp Giy Phe Thr Aia Asp IIe Cys Gln His Val Ile Arg Asn His Ser Asp A1a Phe Leu Ile Phe Leu Asn Lys Cys Leu Glu Tyr His His Phs Pro G1u Leu Trp Lys Lys Ala Thr Va1 Va1 Va1 Leu Lys Lys Pro G1y Arg Thr Asp Tyr ?hr Thx Pro 4s5 470 4?5 480 Lys Ala Tyr Arg Pro IIe Gly Leu Leu Pro Ile Leu Gly Lys Zle Tyx G1u Lys let Leu Val Sex Arg Leu Lys Phe His Leu Leu Pro Lys Met Ser Thr Arg Gln Tyr Gly Phe Met Pro Gln Arg Seer Thx Glu Asp Sex Leu Tyr Asn Leu Met Glz~ His Ile His Arg Lys Leu Asp Glu Lys Lys Ile Zle Val Ile Val ~,eu Val Sex' Leu Asp Zle Glu Gly Ala Phe Asp 545 550 555 5fi0 34/.78 Ser Ala Trp Trp Pro Ala I1e Arg Val Arg Leu Ala G1u Glu Lys Cys Pro Leu Asn Leu Arg Lys Val Phe Asp Ser Tyr Leu Arg Asn Arg G1u 580 5s5 590 I1e Va1 Yal Arg Tyr Ala Gly G1u Glu Cys Thr Lys Ile Thr Thr Lys Gly Cys Val Gln Gly Ser I1e Gly Gly Pro lle Leu Trp Asn Leu Leu Leu Asp Pro Leu Leu Lys Ser Leu Glu Asn Trp Gly Glu Tyr Gly Gln Ala Phe Ala Asp Asp Val Val Leu Val Phe Asp Gly ,Asp <210> 44 <211> 179 <212> PRT
<213> Bombyx mori <400> 44 Gly Arg Lys Ile Phe Gln Ser Ala Gly Pro Arg Asp Gly Thr Val Lys Ala Ala Ile Ala Val Phe Asn Pro Asp Leu Asn Val Val Gln Tyr Pro Lys Leu Thr Thr Asn Asn Ile Val Val Val Gly Leu Arg Thr Gly Ala Trp Glu Ile Thr Leu Val Ser Phe Tyr Phe Glu Asn Ser Glu Gly Gly 50 55 60 Gln Pro Ile Thr Pro Asp Leu Glu Tyr Leu Val Arg Ile Asp Lys Glu Ile Gly Ser Lys Lys Trp Ile Ile Gly Gly Asp Ala Asn Ala Lys Ser Ser Trp Trp Gly Ser Pro Leu Asn Asp His Arg Gly Glu Glu Met Leu Gly Ala Leu Asn Glu Leu Gly Leu His Ile Gln Asn Arg Gly Glu Thr 115 120 125 Pro Thr Phe Asp Thr Ile Arg Gly Gly Lys Arg Tyr Gln Ser Phe Val Asp Val Thr Ala Cys Ser Ala Asp Leu Ile Gly Val Val Glu Asp Trp 145 150 155 160 Arg Val Asp Glu Gly Leu Thr Ser Ala Asp His Asn Gly Ile Val Phe Ser Ile Arg <210> 45 <211> 2004 <212> DNA
<213> Bombyx mori <220>
<221> CDS
<222> (1) . . (2004) <400> 45 gca gta gcc ctt att cag gag ccg tac gtg ggt ggt tct tca aca gtg 48 Ala Val A1a Leu Ile Gln Glu Pro Tyr Val G1y G1y Ser Ser Thx Val agg g8a tat aaa gga get agg atc tac cag aat acc eat act gga ggt 96 Arg G1y Tyr Lys Gly Ala Arg Ile Tyr Gln Asn Thr Asn Thr G1y G1y ggg aCt gtC aaa gCg gCg ata gCt gta taC gaC agt gCC Cta gac gta 144 Gly Thr Yal Lys Ala Ala I1e Ala Yal Tyr Asp Ser Ala Leu Asp Ya1 agg cag tae cca aaa ctc ace aca aac sac atc get gtg gtg ggg atc 192 Arg Gln Tyr Pro Lys Leu Thr Thr Asn Asn rte Ala Val Va1 Gly Ile cag acg gtg gcy tgg gaa atc get ctt gta tcc ctc tac ttc gag cct 240 G1n Thr Va1 Xaa Trp Glu Ile Ala Leu Va1 Ser Leu Tyr Phe Glu Pro s5 70 7s so gat sag ccc att gaa cca tac ctc gat cat tta aag aaa att gag gag 288 Asp Lys Pro Zle Glu Pro Tyx Leu Asp His Leu Lys Lys Zle Glu Glu 36/7$
gcg att act aca aaa sac tgg ata atc ggt ggt gac gcc aat tct aaa 336 Ala Ile Thr Thr Lys Asz~ Trp z1e Ile Gly Gly Asp Ala Asn Ser Lys 100 i05 110 agt ttg tgg tgg ggt agc cgg atc aca gaa gac agg ggt gag gag tta 384 Ser Leu Trp Trp GIy Ser Arg lle Thr Glu Asp Arg Gly Glu G1u Leu x15 X20 125 ata ggt acs tta aat gaa cta gac atg sac gtg cta aac tct ggt acg 432 Zle Gly Tar Leu Asn Glu ~,eu Asp Met Asz~ Val 1"eu Asn Sex Gly '~hr 130 x35 140 act cct acc tat gat act atc aga gga gga aaa agg tac agc agt cat 480 Thr Pro Thr Tyr Asp Thr I1e Arg Giy G1y Lys Arg Tyr Ser Ser His 145 150 155 x80 gtg gac atc act gcg tgt tca aca aat atg ctc ga.a ctt att tcg gac 528 Val Asp I1e Thr Ala Cys Ser Thr Asn Met Leu Glu Leu I1e Ser Asp tgg aaa gtg gtg gaa ggc ctc acg agc teg gac cac aac ggc att gcc 576 Trp Lys Val Val G1u Gly Leu Thr Ser Ser Asp His Asn Gly I1e A1a 1$0 1$5 lsa ttt aat gtG Sgt cta aGt aa& tGt aaa ggt atg Sat gt8 gCt 8ga aCa 624 Phe Asn Val Sex ~,eu Thr ~,ys Sex Lys GIy Met Asn Val. Ala Ax~g Thr aea aga ata tat aae aeg aaa aaa get sac tgg aca aaa ttt cgc gas 672 Thr Ax~g Zl.e Tyr Asr~ Thr Lys Lys Ala Asn Tx~ Thr Lys Phe Arg Glu aaa cta aas tca aca atg cag act sat tta ata aat att gag gta ata 720 Lys Leu Lys Ser Thr Met G1n Thr Asn Leu IIe Asn Ile Glu Val IIe aat aaa att tct aag ata aac gat ata gat gat ata acg gat aaa tat 768 Asn Lys Ile Ser Lys Iie Asn Asp Ile Asp Asp Ile Thr Asp Lys Tyr ata aag get ata acg gag gea tgt act gag tca atg cct ttg aag aaa 816 Ile Lys A1s I1e Thr G1u A1a Gys Thr Glu Ser Met Pro Leu Lys Lys aag act gaa aaa ctt aca ttg ccg tgg tgg tcc gag gaa ctc gcg cgg 864 Lys Thr G1u Lys Leu Thr Leu Pro Trp Trp Ser Glu G1u Leu Ala Arg cta aag aag gaa gtc tct acc ttg aag cga cgt atc aga tgt gcg gca 912 Leu Lys Lys Glu Val Ser Thr Leu Lys Arg Arg Ile Arg Cps A1a A1a ecc atc cga cga gag agg gta gta gca gcc tac ctg get aag aaa cat 960 Pro I1e Arg Arg Glu Arg Val Yal Ala Ala Tyr Leu A1a Lys Lys His gcg tat gag ttg caa gcc gec aat get caa acc ata agc tgg aag eaa 1008 Ala 
Tyr G1u Leu Gln Ala Ala Asn A1a Gln Thr Ile Ser xrp Lys Gln ttc tgt gaa aaa cag gat aaa gaa ggt tta tgg gaa ggg ata tat aga 1058 Phe Cys Glu Lys Gln Asp Lys Glu Gly Leu Tx~p Glu Gl.y Ile Tyr Arg gta ata ggg aga acc act aca aga gaa gag gat ata ctt cta gta sag x104 Val Z],e G~,y A;~g Thx Thr Thx Axg Glu Glu Asp Ile Leu Leu VaI Lys aac gga atc acg ctg gac gcg gag aaa tcc gtg aga ttc ctc gcc gat 1152 Asn Gly Ile Thr Leu Asp Ala Glu Lys Ser Va1 Arg Phe Leu A1a Asp act ttt tac cca caa gac tcc aca aat acc gac aac ccg caa cac acc 1200 Thr Phe Tyr Pro G1n Asp Ser Thr Asn Thr Asp Asn Pro Gln His Thr ctg atc aga cac aga gcg gag ttc ata aac saa ata gta tta aat gaa 1248 Leu Ile Arg His Arg Ala Glu Phe Ile Asn Lys Zle Val Leu Asn Glu acc atc gac cca cct ttt act tta stg gaa tta aac tcc gca tat aaa X296 Thr xle Asp Pro Pro Phe Thx Leu let Glu Leu Asn Sex Ala Tyr Lys tca ttt aat get aag aaa get cct gge tca gat ggt ttc aca get gac 1344 Ser Phe Asz~ Ala Lys Lys Ala Pxo Gly Sex Asp Gly Phe Thr A1a Asp 435 4g;0 ata tgc tgc caa gec atc tct aac gac ccg ctg gta ttc ctg gcc ctc 1392 Zle Cys Cys Gln Ala Zle Ser Asn Asp Pxo Leu Val Phe Leu Ala Leu 450 455 ~0 ctg aac aaa tgt ctc gcc tac aat tat ttt cca tac gcc tgg aaa gaa 1440 Leu Asz~ Lys Cys Leu Ala 1'yr Asn Tyx Phe Pro Tyr A1a Trp Lys Glu ~5 470 476 480 gca acc gtc gtt gta ctt agg aaa cca ggg aaa gac gat tac tcc act 1488 A1a Thr Val Val Val Leu Arg Lys Pro Gly Lys Asp Asp Tyr Ser Thr cct asg tct tac cgg cct att ggg ctg ttg ccg gta ctc ggc aaa att 1536 Pro Lys Ser Tyr Arg Pro zle Gly Leu Leu Pro Val Leu Gly Lys Ile ttg gig aaa atg ctg gtg gcc cgc ctt aag cac tac tta ctt cca aaa 1584 Leu Glu Lys Met Leu Vat. 
Ala Arg Leu Lys Hxs Tyr Leu Leu Pxo Lys 53,5 520 525 ttc tgc gta aaa cag ttt ggg ttt ctg cca cag aaa agc acc gaa gac X632 Phe Cys VaI Lys G1n Phe G1y Fhe Leu Pro G1n Lys Ser Thr Glu Asp tcc ctc tac atc ctg atg aga cat gtt caa agt saa ctt gag cag aaa 1680 Ser Leu Tyr I1e Leu Met Arg His Val G1n Ser Lys Leu G1u Gln Lys aaa ata gtc acc ttg gtc tcg cta gat ata gag ggg gcg ttc gac agt 1728 Lys I1e Va1 Thr Leu Val Ser Leu Asp I1e Glu Gly Ala Phe Asp Ser get tgg tgg cea aaa att aag atc egc cta gcc gag atg aag tgt ccg 1776 Ala Trp Trp Pro Lys Zle Lys Ile Arg Leu Ala Glu Met Lys Cys Pro cca aat ttg agg cgt act atc gac agt tat cta act gac agg agg gtc 1824 Pro Asn Leu Arg Arg Thr Zle Asp Sex Tyr Leu Thx Asp Arg Arg Val agg gtt agg tat gcg gga aaa gag ttt ggc aaa aat act gat aag ggg 1872 Arg Val Ax~g Tyx~ Ala Gly Lys G].u Phe GJ,y Lys Asn Thr Asp Lys Gly 6x0 615 620 tgt gtt caa ggg tcc att gcg gga ccg cta ctg tgg sac ctt ctc ctt 1920 Cys Val Gln G1y Ser I1e A1a Gly Pro Leu Leu Trp Asn Leu Leu Leu gac cca atc ctt cac gag ctg agt gga atg gga ctg cat tgt cag gca 1968 Asp Pro Ile Leu His Glu Leu Ser Gly Met Gly Leu His Cys Gln Ala tty gcn gay gay gtg gtc ctg gtt ttc gac gga gac 2004 Xaa Xaa Xaa Xaa Val Val Leu Val Phe Asp G1y Asp sso 665 39/x$
<210> 46 <211> 668 <212> PRT
<213> Bombyx mori <400> 46 Ala Val Ala Leu Zle Glz~ Glu Pro Tyr Val Gly Gly Sex Sex Thr Val 1 5 10 la Ax~g Gly Tyx~ Lys Gly Ala Axg Ile Tyx Gl,n Asn Thr Asn Thr Gly Gly Gly Thr Va1 Lys A1a A1a Ile Ala Va1 Tyr Asp Ser Ala Leu Asp Val Arg Gln Tyr Pro Lys Leu Thr Thr Asn Asrr I1e Ala Ya1 Va1 G1y I1e Gln Thr Yal X,aa Trp G~lu Ile A1s Leu Val Ser Leu Tyr Phe Glu Pro Asp Lys Pro I1e Glu Pro Tyr Leu Asp His Leu Lys Lys x1e Glu Glu Ala Zle Thr Thx~ Lys Asz~ Trp Zle Zle Gly Gly Asp Ala Asn. Sex Gys loo l05 110 Sex leu Txp Trp Gly Seer Arg ZZe Thx- Glu Asp Ax~g Gly Gl.u Glu heu 115 X20 x25 lle Gly Thx~ Leu Asn Glu Leu Asp Met Asn Val Leu Asn Ser Gly Thr Thr Pro Thr Tyr Asp Thr Ile Arg Gly Gly Lys Arg Tyr Ser Ser His Val Asp I1e Thr Ala Cys Ser Thr Asn Met Leu G1u Leu I1e Ser Asp 1fi5 170 175 Trp Lys Va1 Val G1u G1y Leu Thr Ser Ser Asp His Asn Gly Ile Ala Phe Asn Va1 Ser Leu Thr Lys Ser Lys G1y Met Asn Va1 Ala Arg Thr Thr Ars Ile Tyr Asn Thr Lys Lys Ala Asn Trp Thr Lys Phe Arg Glu 40/'.T8 Lys Leu Lys Ser Thr 3Net Gln Thr Asn Leu Zle Asn xle Glu Va1 rte Asn Lys Ile Ser Lys Ile Asn Asp Ile Asp Asp Zle Thr Asp Lys Tyr lle Lys Ala Zle Thr Glu Ala Cys Thx Glu Sex Met Pxo Leu Lys Lys Lys Thx Glu Lys Leu Thr Leu Pxo Trp Trp Ser Glu Glu Leu Ala Axg Leu Lys LYs Glu Val Ser Thr Leu Lys Arg Arg I1e Arg Cys Ala Ala Pro Ile Arg Arg G1u Arg Val Val AIa Ala Tyr Leu A1a Lys Lys His Ala Tyr Glu Leu Gln A1a Ala Asn A1a G1n 'Fhr Ile Ser Trp Lys Gln Phe Cys Glu Lys Gln Asp Lys Glu Gly Leu Tx~p Glu Gly Ile Tyr Arg Val T1e G1y Arg Thx Thr Thr Arg Glu Glu Asp Zle Leu Leu VaZ Lys Asn Gly Zle Thx Leu Asp Ala Glu Lys Sex Val Arg Phe Leu A1a Asp Th~,r Phe Tyx Pro Gln Asp S.ex Thx Asn Thr Asp Asn Pro G1n His Thr Leu Ile Arg His Arg A1a Glu Phe Ile Asn Lys I1e Va1 Leu Asn Glu Thr Ile Asp Pro Pro Phe Thr Leu Met Glu Leu Asn Ser A1a Tyr Lys Ser Phe Asn A1a Lys Lys Ala Pro G1y Ser Asp Gly Phe Thr Ala Asp Ile Cys Cys G1n A1a I1e Ser Asn Asp Pro Leu Val Phe Leu Ala Leu Leu Asn Lys Cys Leu A1a Tyr Asn Tyr Phe 
Pra Tyr Ala Tarp Lys Glu 465 470 475 4$0 Ala Thr Val Val Val Leu Axg i.ys Pxo G~.y Lys Asp Asp Tyx~ Sex Thx Pxo ~,ys Sex Tyx Ax~g Pro Ile Gly Leu Leu Pro Va1 Leu Gly Lys Zle Leu GIu l.ys Met Leu Va1 Ala Arg Leu Lys His Tyr Leu Leu Pro Lys Phe Cys Val Lys G1n Phe G1y Phe Leu Pro Gln Lys Ser Thr Glu Asp Ser Leu Tyr Ile Leu Met Arg His Va1 Gln Ser Lys Leu G1u Gln Lys Lys Ile Val Thr Leu Val Sex Leu Asp Zle Glu Gly Ala Phe Asp Ser 5fi5 570 575 A1a Trp Trp Pxo Lys Ile Lys Zle ,Axg Leu Ala Glu Met Lys Cys Pxo 5$0 5$5 590 Pro Asn Leu Axg Axg Thx Zle Asp Sex Tyr Leu Thr Asp Arg Arg Val Axg Val Axg Tyr A1a G1y Lys Glu Phe Gly Lys Asn Thr Asp Lys G1y 6x0 615 620 Cys Val Gln Gly Ser Ile Ala G1y Pro Leu Leu Txp Asn Leu Leu Leu 625 630 636 f>40 Asp Pro I1e Leu His Glu Leu Ser Gly Met Gly Leu His Cys Gln A1a Xaa Xaa Xaa Xaa Val Yal Leu Val Phe Asp G1y Asp fis0 6s5 <210> 47 <211> 196 <212? PRT
<213> Bombyx mori <400> 47 Ala Val A1a Leu rle G1n Glu Pxa Tyx Val G7.y Gly Sex Sex- Thr Val Arg Gly Tyx Lys Gly Ala Axg Ile Tyr Gln Asn Thr Asn Thr G1y GIy GIy Thr Val. Lys AIa Ala I1e A1a Val Tyr Asp Ser A1a Lsu Asp Val Arg Gln Tyr Pro Lys Leu Thr Thr Asn Asn Ile Ala Yal Ya1 Gly I1e Gln Thr Ya1 Ala Trp G1u I1e Ala Leu Yal Ser Leu Tyr Phe Glu Pro Asp Lys Pro Ile G1u Pro Tyr Leu Asp Ilis Leu Lys Lys Ile Glu Glu ss ~o ~s Ala Ile Thx Thr Lys Asn Txp Ile IIe Gly Gly Asp AIa Asn Ser Lys 140 X06 x10 Ser Leu Trp Txp Gly Sex Axg Ile Thx~ G~,u Asp Arg Gly Glu Glu Leu x15 124 125 Zle Gly Leu Asn Glu Leu Asp Met Asn Ya1 Leu Asn Ser Gly Thr Thr Pro Thr Tyr Asp Thr Ile Arg Gly Gly Lys Arg Tyr Ser Ser His Val Asp Ile Thr A1a Cys Ser Thr Asn Met Leu G1u Leu Ile Ser Asp Trp Lys Val Val Glu Gly Leu Thr Ser Ser Asp H~.s Asn Gly IIe Ala Phe Asn Val Ser <2I0~ 48 t211> 1948 <212> ANA
<213> Bombyx mori <220>
<221> CDS <222> (1)..(1908) <400> 48
ggg act gtt aaa gct gcg ata gtt gtc ttc aac aac gac ttc aaa gtt 48 Gly Thr Val Lys Ala Ala Ile Val Val Phe Asn Asn Asp Phe Lys Val ata cag tat cca aaa ctc atc acc aaa aac atc atg gtg gtg ggg atc 96 Ile Gln Tyr Pro Lys Leu Ile Thr Lys Asn Ile Met Val Val Gly Ile caa acg gtt gct tgg gag atc aca cta gtc tcc ttt tac ttt gag ccg 144 Gln Thr Val Ala Trp Glu Ile Thr Leu Val Ser Phe Tyr Phe Glu Pro gac tcc cct ata gaa ccc tac ctg gag cat ctg aaa agg ata gag ctc 192 Asp Ser Pro Ile Glu Pro Tyr Leu Glu His Leu Lys Arg Ile Glu Leu
gag att ggg tct ata aaa ttg ctc gtt gga gga gat acg aac gcg aaa 240 Glu Ile Gly Ser Ile Lys Leu Leu Val Gly Gly Asp Thr Asn Ala Lys agc tcg tgg tgg gga agc cca ata ata gat cat agg ggt gaa gac ctt 288 Ser Ser Trp Trp Gly Ser Pro Ile Ile Asp His Arg Gly Glu Asp Leu tct ggg acg ctt gag gaa atg ggc cta cat ata ctc aac gca ggt gaa 336 Ser Gly Thr Leu Glu Glu Met Gly Leu His Ile Leu Asn Ala Gly Glu act ccg act ttc gac tgc gtt cga gga ggc aag cgg tat aca agc tat 384 Thr Pro Thr Phe Asp Cys Val Arg Gly Gly Lys Arg Tyr Thr Ser Tyr ata gac atc acc gcg tgt tca gtc gac cta cta gac ttg gtg gac ggc 432 Ile Asp Ile Thr Ala Cys Ser Val Asp Leu Leu Asp Leu Val Asp Gly tgg aaa att gac gaa ggt ctc acg agc tcg gat cac aac ggt att gta 480
Trp Lys Ile Asp Glu Gly Leu Thr Ser Ser Asp His Asn Gly Ile Val
145 150 155 160 ttt aat att cgg cta caa agg tca aaa ggc att aat atc tca aga aca 528 Phe Asn Ile Arg Leu Gln Arg Ser Lys Gly Ile Asn Ile Ser Arg Thr 165 170 175 act agg aaa ttt aac aca aaa aaa gca aat tgg cct aag ttt cat gag 576 Thr Arg Lys Phe Asn Thr Lys Lys Ala Asn Trp Pro Lys Phe His Glu 180 185 190 aag ctt agc caa tta atg caa gaa aac aaa cta aca gca gca gaa ata 624
Lys Leu Ser G7.n Leu Met Gln Glu Asn Lys Leu Tlar Ala Ala Glu Ile aat aaC ata aac act ata gaa Caa ttg gaa aaa aca ata aat aC8 Ctt 6'~2 Asn .A,sr~ Zle Asn Tlar Ile Glu Gln Leu Glu Lys Tlur lle Asn Thx Leu acc aaa aca ata gat aat aca tgt aca atc tca ata cca att aaa caa ?20 Thx Lys Thx Ile Asp Asn Thx~ Cys Tk~x~ Zle Sex Ile Pro Zle Lys Gln aea aaa gaa aaa ctt acc ttg ccg tgg tgg tcc gag aaa cta get ggg ?B$
Thr Lys Glu Lys Leu Thr Leu Pro Tx~a Trp Ser Glu Lys Leu Ala Gly w atg aag aag gaa gtc gcc acc agg aaa cgt aga gtg cga aac gcc get $18 Met Lys Lys Glu Val A1a Thr Arg Lys Arg Arg Val Arg Asn Ala Ala cca att cgt agg tcc sag gtc gtt gaa gag tac cta aaa aag aaa gae~ $64 Pro Ile Arg Arg Ser Lys Val Val Glu Glu Tyr Leu Lys Lys Lys Glu 2?5 280 285 gag tat gag gag gag gca gcc aaa gcg cag aca gat agc tgg aaa gat 912 Glu Tyr Glu Glu Glu Ala Ala Lys Ala G1n Thr Asp Ser Tx~p Lys Asp ttt tgt tgt agg caa ggc ggg gag ggg gtt tgg agc gga ata tat aga 960 Pk~e Cys Cys Arg Gln Gly Gly Glu Gly Val Txp Sex Gly lle Tyr Arg gta ata tcg aga acg act act agg gag gaa gac tct ata ctg gta aag X008 Vat, Ile Ser Arg Thr Thr Thx Axg Glu Glu Asp Ser Zle Leu Val Lys gac gga gag ttc ctg gac gcg aag ggg tcc gca aag tta cta gcg gat 1066 Asp Gly Glu Phe Leu Asp Ala Lys Gly Ser AJ.a Lys Leu Leu Ala Ash aac ttc tat ccg gag gat ctg agg acc aac gat aac gcc tat cac cgc 1104 Asn Phe Tyr Pro Glu Asp Leu Arg Thr As,n Asp Asn Ala Tyr His Arg eag att aga agt gag get aat att gtg aat gtt ggt aaa caa act gag 1152 Gln I1e Arg Ser Glu Ala Asn Ile Va1 Asn Val G1y Lys G1n Thr Glu tat tgc gac cca cct ttc acg atg gcc gaa ttg aga cag geg agt gga 1200 45/?8 Tyr Cys Asp Pro Pro Phe Thr Met AIa Glu Leu Arg GIn Ala Ser Gly tcc ttc aac cca aaa aag gcc ccg ggc atg gac ggt ttc act gcg gac 1248 Ser Phe Asn Pro Lys Lys A1a Pro Gly Met Asp Gly Phe xhr Ala Asp atc tgc tgc cat acc ata gaa gcc aat cca gaa ctc ttt ttg tcg ttg 1296 xle Cys Cys His Thr Zle Glu Ala Asz~ Pro Glu Leu Phe ~,eu Sex Leu ctc aat aaa tgc ctg gag cta tat cat ttc ccc atg get tgg aag gta X344 Leu Asn Lys Cys Leu Glu Leu Tyr Hzs Phe Pxo diet Ala Trp Lys Val get aca gtt gta atg ctg agg aag cca gga aaa gga gac tac acc act 1392 Ala Thr Val. Vat. 
Met Leu Axg Lys Pro Gly Lys Gly Asp Tyr Thr Thr cca aag gca tac aga cca att gga cta ctg cct ata cta ggc sag att 1440 Pro Lys A1a Tyr Arg Pro Ile Gly Leu Leu Pro Ile Leu G1y Lys Ile tac gaa aag atg ctg gtg acc cgc ctc aaa ttc cat cta tta cca agg 1488 Tyr Glu Lys Met Leu Yal Thr Arg Leu Lys Phe His Leu Leu Pro Arg atg agt act cgc cag tac gga ttc atg cca cag agg ggt gcc gaa gac X536 Met Ser Thr Arg Gln Tyr Gly Phe Met Pro G1n Arg Gly Ala Glu Asp 500 505 51.0 tcc ctc tat att ctg atg caa cat atc cgc aag aag cta aaa gaa aag 1,584 Ser Leu Tyr Zle Leu Met Gln His lle Axg Lys ~,ys Leu Lys G~.u Lys aaa ata att gca tta gta tcg ttg gat ata gag gga gcc ttc gac agt 1632 Lys Ile Ile Ala Leu Val Ser Leu Asp Zle Glu G~.y Ala Phe Asp Ser gce tgg tgg cca gca ata aga gte cga ctg get gag gaa aag tgt cca 1680 Ala Trp Trp Pro Ala I~.e Arg Val Arg l.eu Ala Glu Glu Lys Cys Pro gta aat ctg agg cgg gtc ata gac agc tat ctt agt aac agg aag gtg 1728 Val Asz~ ~,eu A7rg Ax~g Val Ile Asp Ser Tyyr Leu Ser Ash Arg Lys Ya1 gtg gtc aag tac get ggg gag gaa tat gat aag gga acg aat a$g gga 1776 Yal Va1 Lys Tyr Ala Gly G1u Glu Tyr Asp Lys Gly Thr Asn Lys Gly tgt gtt caa ggc tca atc ggg ggc cca att tta tgg aac ctg ctg ctc 1824 Cys Val GIz~ Gly Sex Ile Gly GIy Pro Ile Leu Txp Asn Leu Leu Leu gac cct ctc ctg aaa agt ctc gaa aac agt gga gag tat tgc cag gcg 1872 Asp Pxo Leu Leu Lys Seer Leu Glu Asn Se;~ Gly GIu Tyx Cys Gln Ala 6x0 815 620 ttc gag gat gat gtg gtc ctg gtt ttc gac gga gac 1,908 Phe Ala Asp Asp Val Val Leu VaI Phe Asp G~,y Asp y. ::
<210> 49 <211> 636 <212> PRT
<213> Bombyx mori C400> 49 Gly Thr Val Lys Ala Ala Ile Val Val Phe Asn Asn Asp Phe Lys Va1 I1e GIn Tyr Pro Lys Leu Ile Thr Lys Asn I1e Met Val Val Gly rle Gln Thx Val Ala Txp Glu IIe Thx Leu VaI Sex Phe Tyx Phe Glu fro Asp Sex Pxo IIe GIu Pxo '~yx Leu Glu H;is Leu Lys Ax'g Ile Glu Leu Glu Z~,e GIy Sex Zle Lys Leu Leu VaJ, Gly Gly Asp Thr Asz~ AIa Lys Sex Sex Txp Trp GIy Sex Px~o I~,e Ile Asp Hxs Axg Gly Glu Asp Leu Sex GIy Thx Leu Glu Glu Met Gl,y Leu Hxs Ile Leu Asn, Ala Gly Glu 'Ihx~ Pro Thr Phe Asp Cys Val Arg GIy Gly Lys Arg Tyr Thr Ser Tyx Ile Asp I1e Thr Ala Cys Ser Val Asp Leu Leu Asp Leu Va1 Asp G1y 47/.78 Tx~p Lys Zle Asp Glu Gly Leu Thx Ser Sex Asp His A,sn Gly Il,e Val Phe Asn Ile Arg Leu Gln Arg Ser Lys Gly Ile Asn Ile Ser Arg Thx Thr Arg Lys Phe Asn Thr Lys Lys Ala Asn Trp Pro Lys Phe His Glu Lys Leu Ser Gln Leu Met Gln Glu Asn Lys Leu Thr Ala Als Glu Ile Asn Asn Ile Asn Thr Ile Glu G1n Leu G1u Lys Thr I1e Asn Thr Leu y. ; 210 215 220 .. .
Thr Lys Thr Tle Asp Asn Thr Cys Thr Ile Sex Ile Pro Zle Lys G1n Thr Lys Glu Lys Leu I'~ Leu Pro Txp Trp Sex Glu Lys Leu Ala Gly Met ~,ys hys G~,u Val AZa Thx- Axg Lys Ax~g A~rg Val Aarg Asn Ala Al.a Pro Xl,e Arg Arg Ser Lys Val Val Ghu Glu Tyr Leu Lys Lys Lys Glu G1u Tyr Glu Glu G1u A1a Ala Lys A1a G1n Thr Ash Ser Trp Lys Asp Phe Cys Cys Arg Gln G1y G1y Glu Gly Val Trp Ser Giy I1e Tyr Arg Val Ile Ser Arg Thr Thr Thr Arg Glu Glu Asp Ser Ile Leu Vai Lys Asp Gly G1u Phe Leu Asp Ala Lys GIy Ser Ala Lys Leu Leu Ala Asp Asn Phe Tyr Pro Glu Asp Leu Arg Thr Asn Asp Asn A1a Tyr His Arg Gln Ile Arg Ser G1u Ala Asn Ile Val Asn Val Gly Lys Gln Thr G2u Tyr Cys Asp Pro Pro Phe Thr Met Ala Glu Leu Arg Gln Ala Sex Gly 48/.78 Ser Phe Asn Pro Lys Lys Ala Pro G1y Met Ash G1y Phe Thr Ala Asp Ile Cys Cys His Thr Ile G1u A1a Asn Pro Glu Leu Fhe Leu Ser Leu Leu Asn Lys Cys Leu Glu Leu Tyr His Phe Pro Met Ala Trp Lys Val Ala Thr Val Val Met Leu ,Axg Lys Pro G1y Lys Gly Asp Tyr Thr Thr Pro Lys Ala Tyx Arg Pro Ile Gly Leu Leu Pxo Ile Leu Gly Lys Ile ~, .,.. 486 470 475 480 Tyr Glu Lys Met Leu Val Thr Arg Leu Lys Phe His Leu Leu Pro Arg Met Ser Thx~ Arg Gln Tyr Gl,y Phe Met Pro Gln Arg Gly Ala Glu Asp Ser Leu Tyr IIe Leu Met G1n His Ile Arg Lys Lys Leu Lys G~.u Lys Lys I1e I1e A1a Leu Val Ser Leu Asp I1e G1u G1y A1a Phe Asp Ser A1s Trp Trp Pro Ala Ile Arg Ya~1 Arg Leu A1a G1u Glu Lys Cys Pro Va1 Asn Leu Arg Arg Val I1e Asp Ser Tyr Leu 5er Asn Arg Lys Va1 Yal Val Lys Tyr Ale Gly Glu G1u Tyr Asp Lys GIy Thr Asn Lys Gly Cys Val Gln Gly Ser Ile Gly Gly Pro Iie Leu Trp Asn Leu Leu ~,eu Asp Pro Leu Leu Lys Ser Leu G1u Asn Ser Gly Glu 1'yr Cys Gln Ala 61.0 61.5 620 Phe Ala Asp Asp Val Val Leu Val Phe Asp Gly Asp <210> 50 49/.78 <21X> 164 <212> PRT
<213> Bombyx mori <400> 50 Gly Thr Val Lys Ala Ala Ile Val Val Phe Asn Asn Asp Phe Lys Val
1 5 10 15 Ile Gln Tyr Pro Lys Leu Ile Thr Lys Asn Ile Met Val Val Gly Ile Gln Thr Val Ala Trp Glu Ile Thr Leu Val Ser Phe Tyr Phe Glu Pro Asp Ser Pro Ile Glu Pro Tyr Leu Glu His Leu Lys Arg Ile Glu Leu Glu Ile Gly Ser Ile Lys Leu Leu Val Gly Gly Asp Thr Asn Ala Lys Ser Ser Trp Trp Gly Ser Pro Ile Ile Asp His Arg Gly Glu Asp Leu Ser Gly Thr Leu Glu Glu Met Gly Leu His Ile Leu Asn Ala Gly Glu 100 105 110 Thr Pro Thr Phe Asp Cys Val Arg Gly Gly Lys Arg Tyr Thr Ser Tyr 115 120 125 Ile Asp Ile Thr Ala Cys Ser Val Asp Leu Leu Asp Leu Val Asp Gly Trp Lys Ile Asp Glu Gly Leu Thr Ser Ser Asp His Asn Gly Ile Val Phe Asn Ile Arg <210> 51 <211> 1842 <212> DNA
<213> Dictyoploca japonica <220>
<221> CDS
<222> (1) . . (1842) 50/.78 «00> 51 acc acc aac aac atc gtc gta gtt gag gtc gga tcc aaa a~a tgg aag 48 Thr Thr Asn Asn Ile Va1 Val Val Glu Val. Gly Ser I,ys Lys Txp Lys 1 5 1.0 X 5 acc acs atc atc tct g~c tac ttc gaa cca gac cag caa cta gac gac 96 Thr Thr Ile I1e Ser Val Tyr Phe Glu Fro Asp Gln Gln. Leu Asp Asp tat ctc cac cac ctg agc cgg gtt gtc ggg gag ctt ggg cca aaa gcc 1~4 Tyr Leu His His Leu Ser Arg Va1 Va1 G1y Glu Leu Gly Pro Lys Ala ata ata ata gga ggc gac ata aac gcg aaa agt att tgg tgg ggg agc 192 Ile I1e Ile Gly Gly Asp Ile Asn Ala Lys Ser Ile Trp Trp Gly Ser gab aGC tCa gaG agG aga ggg gag gCC gta Ctg ggt ttC gCa 8aC Ca8 24O
Glu Thr Ser Asp Ser Arg Gly Glu Ala Val Leu Gly Phe Ala Asn Gln 65 ?'0 75 80 cat ggg ttg gaa ata cta aac CgC ggC ggg gtt CGa aCa ttG Ca8 aCC 2$8 His Gly Leu Glu Zle Leu Asn Axg Gly Gly Val Pro Thr Phe Gln Thr gta aga gga agc agg atg ttg agc agc atc atc gat gtc acc ttc tgc 33fi Val Arg Gly Ser Arg Met Leu Ser Sex Zle ZZe Asp Val, Thr Phe Cys 100 x05 1,10 tcc cct gac ctg cta agc ggg ata gag gag tgg agg gtg atc gat gat 384 ,~. Ser Pro Asp Leu Leu Ser Gly I1e Glu Glu Trp Arg Va1 I1e Asp Asp r.
..: 115 120 125 cta att agt '~cc gat cac aac tgc atc atg ttt aca ctc agg sag gag 432 Leu Ile Ser Ser Asp His Asn Cys Ile Met Phe Thr Leu Arg Lys Glu ggg tea agt gae aag cca agg get aac aca acc agg aaa tat sac acc 480 Gly Ser Ser Asp Lys Pro Arg Ala Asn Thr Thr Arg Lys Tyr Asn Thr agg aat gtg aat tgg caa acc ttt att gaa aag gtt tcc caa ctc aag 628 Arg Asn Va1 Asn Trp Gln Thr Phe I1e Glu Lys Va1 Ser Gln Leu Lys aag aaa aga aca ctt gac act cga agg ttg gaa aac gtc aat aca aag 676 Lys Lys Arg Thr Leu Asp Thr Arg Arg Leu G1u Asn Val Asn Thr Lys 51/7$
gag gag tta gac aac aca ata aac gag tat gat gca gtt gtg acc gaa 624 Glu Glu Leu Asp Asn Thr Ile Asn G1u Tyr Asp Ala Va1 Va1 Thr Glu get tgt gac gaa gtg cta cct aaa ttt aag tgt aga ccc aaa ctg aas 672 Ala Cys Asp Glu Va1 ~.eu Pro Lys ~'he Lys Cys Arg Pxo Lys Leu Lys att cct tgg tgg tcc aag gaa ttg gaa gtt aaa aag aaa aga aca ctc 720 xle Pro Txp Trp Sex Lys Glu Leu Glu Val Lys Lys Lys Arg Thx Leu acc ctt aag cgc cga atc cgc agc aca gca cct agc agg agg aac aCC 7$$
t..'.v:, . 'fhx Leu Lys A~rg Axg I7.e Axg Ser Thr Ala Pxo Ser Arg Axg Asz~ Thr gta gtg gcc agg tac cta gas cag aag gaa gas tat gag aac gat gtt 8X6 Val Val Ala Arg Tyr Leu Glu Gln Lys Glu Glu Tyr G1u Asn Asp Val aga aaa get aaa ata acg agt 'tgg aaa g88 ttc tgt gas agg cag gaa $64 Arg Lys Ala Lys I1e Thr Ser Trp Lys G1y Phe Cys G1u Arg Gln Glu aag gaa gga atg tgg gat gga ata tat agg ata ata agg aca gtg acg 912 Lys Glu G1y Met Trp Asp G1y Ile Tyr Arg I1e Ile Arg Thr Yal T'hr agg aga cag gag gat gaa agg ttg gtg aaa ggg gac agg gtt ctt tct 9$0 Arg Arg Gln Glu Asp Glu Arg Leu Val Lys Gly Asp Axg Val Leu Sex .. . ' 305 3x0 315 320 cct aaa gag tcg gcg gaa gaa ctc gcg gtc acc ttc ttt ccg gac gac 1008 Pxo Lys Glu Sex Ala Glu Glu Leu Ala Val Thr Phe Phe Pxo Asp Asp gac agc caa act gac ctc gac cac cac agg aag gta agg ggg gcg gtg 1056 Asp Sex Gl.n Thx Asp Leu Asp H;is Hxs Axg Lys Val Axg Gly Al.a Val gag caa ata aga gac cag gac ttg gga gag gaa cat gac cct cct ttt 1.104 Glu Glz~ Zle Axg Asp Gln Asp Leu Gly Glu Glu Hxs Asp Pxo Pxa Phe act aag gag gaa atg cta aac gcc gcg agg gca ttt aat cca sag aag X152 Thx Lys Glu Glu Met Leu Asn AIa Ala Axg Ala Phe Asn Pxo Lys Lys gca cca gga gag gac gga ata act gcg gac att gtc cga gcg ttg gtc 1200 AIa Pxo GIy GIu Asp GIy ZIe Thx AIa Asp ZIe VaI Az~g AIa Leu VaI

gaa gga gac acg gac ttc cac ctg acg atg gtc aac agg tgt cta caa 1248 Glu Gly Asp Thx~ Asp the His ~,eu Thx Met Val. Asn A;~g Cys Leu Gln 405 4x0 415 tta agc ctt ttc ccc acc att tgg aag gag gcc acg. gta ata gta ctc 129$
Leu Ser Leu Phe Pro Thr Zle Txp Lys Glu Ala Thx Val, IIe VaJ. Leu cgg aaa ccg gat agg gaa acc tat aca aca gcg aaa tct tat agg cct 1344 ~ ... Arg Lys Pro Asp Arg Glu Thr Tyr Thr Thr A1a Lys Ser Tyr Arg Pro '~, . 435 440 445 att ggc ctc ctc cca gtg atg ggg aaa cta cte gag.agg atg att gtg 1392 Ile Gly Leu Leu Pro Val Met G1y Lys Leu Leu Glu Arg Met Ile Va1 ggc cgt ctC agg tgg cac ctg gtc ccg aga ctg agt ccg agg cag tat 1440 Gly Arg Leu Arg Txp His Leu Val Pro Arg Leu Ser Pro Arg GIn Tyr 4$5 470 475 ,4g0 ggt ttc gta cca caa aag agc acg gag gac gcc ctc tat gat ttg atg 1488 GIy Phe VaI Pxo GZr~ Lys Sex Thr GIu Asp AIa Leu Tyx Asp Leu Met aag cac ata cgc gaa aag ctg gaa caa aaa caa ata atc acc atg gtt 1536 Lys Hxs Ile Arg Glu Lys Leu Glu Gln. Lys GLn. I7,e Z~,e Thar Met Val '=. 500 505 510 tca ttg gac ata gag ggg gcg ttc gac agc gcc tgg tgg cca gcc att 1584 Sex Leu Asp Ile Glu Gly Ala Phe Asp Sex Ala Trp Tx~p ~xo Ala Ile agg ctg aga ctc gcg gag gag aga tgc ccg ata aat att aga cat ata 1632 Arg Leu Arg Leu A1a Glu Glu Arg Cys Pro I1e Asn I1e Arg His Ile gtg gat aat tac tta atg gac agg aga gtg aga ttg aga tat gcg ggg 1680 Ya1 Asp Asn Tyr Leu Met Asp Arg Arg Val Arg Leu Arg Tyr A1a Gly 8a8 gag 8t8 gta agg agc act acc aag gga tgt gtc cag ggt tcg ata 1728 G1u Glu Yal Val Arg Ser Thr Thr Lys Gly Cys Va1 Gln Gly Ser Ile ggg gga ccc att ttg tgg aac ctc ctt atc gac cca ttg ttg aag gac 1776 Gly Gl.y Pro Ile Leu Trp Asn Leu Leu Ile Asp Pro Leu Leu Lys ,Asp ttg ggg cgg aaa gga tac cac gcg caa get ttt gCa gat gac gtg gtc 1.829:
Leu Gly Arg Lys Gly Tyr His Ala G1n A1a Phe Ala Asp Asp Val Val ctg gtt ttc gac aga gac 1842 Leu Va1 Phe Asp Arg Asp r...':.;.. <210? 52 <211> 614 <212> PRT
<213> Dictyoploca japonica <400> 52 Thx Thr Asn Asn Zle Val Yal Val Glu Val Gly Sex ~,ys Lys Txp Lys x 5 10 X5 Tbx' Thx' Zle Zle Sex Val Tyx Phe Glu Px~o Asp Gln Glz~ Leu Asp Asp Tyr Leu His His Leu Ser Arg Val Yal Gly Glu Leu G1y Pro Lys Ala I1e I1e Ile G1y G1y Asp I1e Asn A1a Lys Ser I1e Trp Trp Gly Ser 50 55 fi0 GluThr Ser Asp Ser Arg G1y Va1Leu Gly Phe Ala Gln G1u Ala Asn 65 70 75 g0 HisG1y Leu Glu Ile Leu Asn GlyVa1 Pro Thr Phe Thr Arg Gly G1n Va1Arg G1y Ser Arg Met Leu IleI1e Asp Val Thr Cys Ser Ser Phe SerPro Asp Leu Leu Ser Gly G1uTrp Arg Val Ile Asp I1e Glu Asp LeuIle Ser Ser Asp His Asn MetPhe Thr Leu Arg G1u Cys Ile Lys Gly Ser Ser Asp Lys Pro Arg A1a Asn Thr Thr Axg Lys Tyx Asn Thr Arg Asn Ya1 Asn Trp Gln Thr Phe I1e G1u Lys Val Ser Gln Leu Lys Lys Lys Arg Thr Leu Asp Thr Arg Arg Leu Glu Asn Val Asn Thr Lys Glu Glu Leu Asp Asn Thr Ile Asn GIu Tyr Asp A1a Val Yal Thr G1u Ala Cys Asp GIu Val Leu Pro Lys Phe Lys Cys Arg Pro Lys Leu Lys t.:....y;, IIe Pro Trp Trp Ser Lys Glu Leu GIu VaI Lys Lys Lys Arg Thr Leu Thr Leu Lys Arg Arg IIe Arg Sex Thx Ala Pxo Sex Arg Arg Asn Thr Val VaI. AJ.a Axg Tyr Leu Glu Gln Lys Glu Glu Tyr Glu Asn Asp Val Arg Lys Ala Lys I1e Thr Ser Trp Lys Gly Phe Cys G1u Arg G1n Glu Lys Glu G1y Met Trp Asp Gly Ile Tyr Arg I1e I1e Arg Thr Yal Thr Arg Arg Gln G1u Asp Glu Arg Leu Val Lys Gly Asp Arg Ya1 Leu Ser i.
Pro Lys Glu Ser Ala Glu Glu Leu Ala Val Thr Phe Phe Pro Asp Asp Asp Ser GTn Thr Asp Leu Asp His His Arg Lys Yal Arg Gly Ala Val Glu G1n I1e Arg Asp Gln Asp Leu GIy Glu Glu His Asp Pro Pro Phe Thr Lys Glu Glu Met Leu Asn Ala Ala Arg Ala Phe Asn Pro Lys Lys Ala Pro G1y GIu Asp Gly IIe Thr Ala Asp Ile VaI Arg AIa Leu Val Glu GIy Asp Thx Asp Phe His Leu Thr Met Val Asn Arg Cys Leu Gln 6s/7s Leu Ser Leu Phe Pro Thr I1e Trp Lys G1u Ala Thr Val Ile Val Leu 4a0 426 4so Arg Lys Pro Asp Arg Glu Thr Tyr Thr Thr Ala hys Sex Tyr Arg 1'ro 435 440 ~&
Ile G1y Leu Leu Pro Val Met Gly Lys Leu Leu Glu A~rg Met Zle Val Gly Axg Leu Arg Trp His ~,eu Val Pro Arg heu Ser Pxo Arg Gln. Tyr r,;:..." Gly Phe Val, Pxo Glz~ hys Sex Thx Glu Asp Ala Leu Tyr Asp Leu I~et ' ' 485 490 495 hys Hi,s Ile Arg Glu Lys Leu Glu Gln Lys G1n I1e Ile Thr Met Val Ser Leu Asp Ile Glu Gly Ala Phe Asp Ser A1a Trp Trp Pro Ala Ile Arg Leu Arg Leu Ala Glu Glu Arg Cys Pro Ile Asn I1e Arg His Ile Val Asp Asn Tyr Leu Met Asp Arg Arg Val Arg Leu Arg Tyr Ala Gly b45 550 556 560 Glu G1u Val Val Arg Sex Thr Thr lys Gly Cys Val Glzz Gly Sex Zle Gly Gly Pro Zle l.eu Trp Asn Leu Leu Ile Asp Pro Leu ~,eu Lys Asp 6$0 685 590 Leu Gly Axg Lys Gly Tyr His Ala Glzz Ala Phe Ala Asp Asp Val Val l.eu Val Phe Asp Arg Asp C210> 63 C211> 142 C212> PRT
<213> Dictyoploca japonica <400> 53 Thr Thr Asn Asn Ile Val Val Val Glu Val Gly Ser Lys Lys Trp Lys Thr Thr Ile Ile Ser Val Tyr Phe Glu Pro Asp Gln Gln Leu Asp Asp Tyr Leu His His Leu Ser Arg Val Val Gly Glu Leu Gly Pro Lys Ala Ile Ile Ile Gly Gly Asp Ile Asn Ala Lys Ser Ile Trp Trp Gly Ser Glu Thr Ser Asp Ser Arg Gly Glu Ala Val Leu Gly Phe Ala Asn Gln 65 70 75 80 His Gly Leu Glu Ile Leu Asn Arg Gly Gly Val Pro Thr Phe Gln Thr 85 90 95 Val Arg Gly Ser Arg Met Leu Ser Ser Ile Ile Asp Val Thr Phe Cys 100 105 110 Ser Pro Asp Leu Leu Ser Gly Ile Glu Glu Trp Arg Val Ile Asp Asp 115 120 125 Leu Ile Ser Ser Asp His Asn Cys Ile Met Phe Thr Leu Arg 130 135 140 <210> 54 <211> 1923
<212> DNA
<213> Samia cynthia <220>
<221> CDS
<222> (1)..(1923) <400> 54 ggg act gtt aaa gcg gct atc gtt gtt tat gga gat aag ttc ggg gtc 48
Gly Thr Val Lys Ala Ala Ile Val Ya1 Tyr G1y Asp Lys Phe Gly Yal act gtt gat cce gga cte gte gac aaa aae atc gee get gca gtg ctg 96 Thr Va1 Asp Pro Gly Leu Val Asp Lys Asn Ile Ala Ala Ala Val Leu cac gag ggc cac ctg tcg cta ggg gtg atc tcc gtc tac ttc gag ccg 144 His Ala Gly His Leu Sex Leu Gly Val Tle Sex Val T'yr Phe Glu Pro 57/.78 aac gaa ccc ata gag acg tac att gta cga ctt gag aga atc tgc gac 192 Asn G1u Pro ile Glu Thr Tyr I1e Val Arg Leu Glu Arg Ile Cys Asp aaa tta gga gcg ctg aac ctg ata atc ggt ggc gac gtt aat gcc aaa 240 Lys Leu Gly Ala Leu Asn Leu Ile Ile G1y Gly Asp Val Asn A1a Lys age ctg tgg tgg gga tcc agt tcc gaa ggt cat agg ggt gag gcg tae ass Ser Leu Trp Trp Gly Ser Sex Ser Glu Gly his Axg Gay Glu Ala Tyr cgc tcc ttt ctg gat gcc act gga ctg cag atc ctc aat gaa ggc gac 33$
Arg Ser Phe Leu Asp A3.a Thx Gly Leu Gln Zle ~,eu Asz~ Glu Gly Asp X00 lOb 110 tta ccc aca ttc cag gtc gtt agg ggg ggc cgc ttg ttc aca agc att 384 I,eu Pxo Thx~ Phe G~,n Val Val Axg Gly Gly Arg Leu Phe Thx Ser Ile gta gac gtt acc gtc tgc agc ccc act ctc ctc ggc agg atc gat gac 432 Va1 Asp Val Thr Yal Cys Ser Pro Thr Leu Leu G1y Arg Ile Asp Asp tgg aag gtc gat atg aat tta aca tct tcc gac cat aac tcc atc acc 480 Trp Lys Va1 Asp Met Asn Leu Thr Ser Ser Asp His Asn Ser I1e Thr ttc tcg ata cgc gtg gac cag cct ctg cct agt cag agg ccg gtc act 52$
Phe Ser Ile Arg Yal Asp G1n Pro Leu Pro Ser Gln Arg Pro Val Thr X65 x70 x75 acg cgt atc tac aac aca aag aag gtt gta tgg tcg gaa ttc ata tcc 576 Thr Arg Zle Tyr Asn Thr Lys hys Val Val Txp Ser Gl.u Phe Ile Sex 1$0 1$5 190 aca ttc caa gag aaa ctg tcc gag agg agt ctg acg gca tgt aat gtg 624 Thr Phe Glxz Glu Lys Leu Ser Glu Arg Ser Leu Thr Ala Cys Asn Val gat aag gtc gag gac att gag ggt ctg gag acs gtg gtg tcc gat tac 672 Asp hys Val Glu Asp ile Glu Gly Leu Glu ~'hr Val Val Ser Asp Tyx gtc atc tgc att gag gaa gcc tgt aat aag gta gtt ccc aag gca ggt 720 Va3. ale Cys Zle Glu Glu Ala Cys Asn Lys Val Val Pro Lys Ala Gly Jr"$~7 ggc gta sag aaa gtg gca cgt cca ccc tgg tgg tct gaa gat ctg gac 768 Gly Val Lys Lys Val Ala Arg Pro Pro Trp Trp Sex Glu Asp Leu Asp cgc cta aaa agg gag gcg acg cga cgg aag cgc agg atc cgc tgc gcc 816 Arg Leu Lys Arg Glu Ala Thr Axg Arg lys Arg Arg Ile Arg Cys A1a gcc cct agc agg agg cag tac gtc gtc aag gat tat ctt gat get ctt $64 Ala Px~o Ser Arg Arg Gln Tyx~ Val Val Lys Asp Tyr Leu Asp Ala Leu rw.v gag ttg tac aag agg cag gca gcg gac gcc cag acg agg agc tgg aag 912 Glu Leu Tyr Lys Arg G1n Ala Ala Asp A1a Gln Thr Arg Ser Trp Lys gag ttt tgt $cg aca cag gas agg gab agc ctc tgg gsc gga atc tac 960 Glu Phe Cys Thr Thr Gln Glu Arg Glu Ser Leu Trp Asp Gly Ile Tyr agg gta ctt aga cgg aca gag cgg aga cgt gag gaa gtg ctg ctc aag 1008 Arg Va1 Leu Arg Arg Thr G1u Arg Arg Arg G1u Glu Val Leu Leu Lys gac ccc aac ggg gtc att ctt gac ccg cag ggg tcc gtg gag cgg cta 1058 Asp Pro Asn Gly Val Zle Leu Asp Pro Glz~ Gly Ser Val Glu Arg Leu gcc tcc gtg ctc ttt ccg gag gat act gta gag gat gas aca gag gat XX04 Ala Ser Val Leu Phe Pro GZu Asp Tar Val Glu Asp Glu Thr Glu Asp cac cgg gcc gtg agg gaa gga acg gag ggg atc ctc ccg gcc gat att x152 Hi.s Arg Ala Val. 
Arg Glu G,ly Thr Glu Gly Ile Leu Pro Ala Asp ile cgg ggg cta tcc gcg gac gat ccg ccg ttt acc cgg gaa gas gtg atg 1200 Ax~g Gl.y Leu Ser Ala Asp Asp Pro Pro Phe Th;r Arg Glu Glu Va1 Met cgg gtg tgt aaa gag att cac ceg etg aag get cct ggg aac gac gga 1248 Ax~g Vat. Cys Lys GIu Il,e Hxs Pro Leu Lys AJ.a Pxo Gly Asn Asp G~,y tta acc gcg gac ata tgc aca cgt gcg att cta ggg ggg gga gag gac 1296 Leu Thr A1a Asp I1e Cys Thr Arg Ala Ile Leu Gly Gly G1y Glu Asp gtg ttc etg get ett gca aac aag tgc ctg gag ctg tct cac ttc cet 1344 Val Phe Leu Ala Leu Ala Asz~ Lys Cys ~.eu Glu Leu Ser His Phe Pro aga ccc tgg aag gta gcc cac gtg tgc ata ctc cgg aaa ccc ggc agg 1392 A~'g Px'o T~ Lys Val Ala His Vat, Cys Zle heu Arg ~,ys Pro Gly Arg gag gac tac tgc gac cct aag tcg tac cgc ccg atc ggc cta ctt cct 1440 G1u Asp Tyr Cys Asp Pro Lys Ser Tyr Arg Pro Ile Gly Leu Leu Pro 465 4?0 475 480 gtg ttg gga aag ctc ctg gaa aag ctc ttc gtc cga cgt tta cgc tgg 1488 v Val Leu Gly Lys Leu Leu Glu Lys Leu Phe Yal Arg Arg Leu Arg Trp cat ctg ctg cca aaa ctc agc gtg cgc caa tat ggg ttt atg ccc cag 1636 His Leu Leu Pro Lys Leu Ser Val Arg G1n Tyr Gly Phe Met Pro Gln cgt ggg aca gag gac tcg ctc tat gat ctg gtg aat cat atc agg acc 1584 Arg Gly.Thx~ Glu Asp Ser Leu ~'yr Asp Leu Val Asn His Ile Axg Thr cgt gtg gtg gcc aag gag gtc gtt acc ctg gta tcg cta gac ata gag 1632 Axg Val. Val Ala Lys Glu Val Val I'hx Leu Val Sex Leu Asp Z],e Glu y ggg gcc ttt gat aae get tgg tgg ccg gge ttg aag tce eaa ctc ata 1680 '~. Gly Al,a Phe Asp Asn Ala Trp Txp Pxo Gly Leu Lys Sex Gln Leu Ile gcc aag gag tgc cca agg aat ttg tac ggt ata gtc tct tca tac cta 1728 Ala Lys G1u Cys Pro Arg Asn Leu Tyr Gly I1e Va1 Ser Ser Tyr Leu gag gac cgg aga gta gag ctc aat tac gcc ggt ata aag gtg agt cgg 177$
G1u Asp Arg Arg Val Glu Leu Asn Tyr Ala Gly Ile Lys Yal Ser Arg gaa agt tcc asg gga tgt gte cag ggt tct ata get gga ccg tcc ttt 1824 G1u Ser Ser Lys Gly Cys Val Gln Gly Ser Ile A1a Gly Pro Ser Phe 595 s00 say tgg gaC gtt atC Cta gaC tCC Ctt ttg gtg gag CtC gaC tCC gCg gga 1872 Trp Asp Ya1 I1e Leu Asp Ser Leu Leu Yal G1u Leu Asp Ser Ala G1y gtg tac tgt eag get ttt get gac gsc gtg gtc ctg gtt ttc gac gga 1920 Ya1 Tyr Cys Gln Ala Phe Ala Asp Asp Va1 Val Leu Va1 Phs Asp GIy gac 1923 Asp <210>55 <211>641 <212>PRT

<213>Samia cynthi.a <400> 55 Gly Thr Val Lys AIa Ala Zle Val Val Tyr Gly Asp Lys Phe Gly Val 1 5 1.0 15 Thr Val Asp Pro Gl.y Leu Val Asp Lys Asn Ile Ala Ala AIa Val Leu has Ala Gly His Leu Ser Leu Gly Val IIe Ser Val Tyr Phe Glu Pro Asn Glu Pro I1e GIu Thr Tyr Ile Va1 Arg Leu Glu Arg Ile Cys Asp Lys Leu Gly Ala Leu Asn Leu I1e I1e Gly G1y Asp Val Asn Ala Lys 65 70 75 g0 (.
Ser Leu Trp Trp Gly Ser Ser Ser Glu G1y His Arg G1y GIu Ala Tyr g5 90 95 Arg Ser Phe Leu Asp Ala Thr Gly Leu Gln I1e Leu Asn G1u Gly Asp x04 I05 110 Leu Pro Thr Phe Gln VaI Val Arg G1y Gly Arg Leu Phe Thr Sex Ire Va1 Asp Va1 Thr Val Cys Ser Pro Thr Leu Leu G1y Arg Ile Asp Asp Trp Lys Val Asp Met Asn Leu Thr Ser Ser Asp His Asn Ser Ile Thr I~5 150 15b 160 Phe Ser zle Arg Val_Asp Glz~ Pxo Leu Pro Sex Gln Arg Pro Val Thr 165 170 1.75 61/.78 Thr Arg Yle Tyr Asn Thx Lys Lys Val Val~ Tarp Sex Glu Phe Zle Ser Thr Phe GJ.z~ Glu Lys Leu Sex Glu Axg Sex Leu Thx Ala Cys Asn Val Asp Lys Val Glu Asp Zl.e Glu Gly Leu Glu Thr Val Val Ser Asp Tyx~

Val Zle Cys Ile Glu Glu Ala Cys Asn Lys Va1 Ya1 Pro Lys Ala Gly Gly Val Lys Lys Yal Ala Arg Pro Pro Trp Trp Ser GIu Asp Leu Asp Arg Leu Lys Arg G1u A1a Thr Arg Arg Lys Arg Arg Ile Arg Cys A1a A1a Pro Ser Arg Arg G1n Tyr Va1 Val Lys Asp Tyr Leu Asp A1a Leu Glu Leu Tyr Lys Arg G1n Ala Ala Asp Ala Gln Thx Arg Sex Txp Lys 2g0 295 300 Glu Phe Cys Thr Thx Gln Glu Axg Glu Sex Leu Txp Asp Gly Zle Tyx 305 31.0 3 i 5 320 Arg Val Leu Arg Arg Thx~ Glu Arg Axg Arg Glu Glu Ya1 Leu Leu Lys Asp Pxo Asn Gly Val Ile Leu Asp Pro Gln Gly Ser Va1 Glu Arg Leu Ala Sex Yal Leu Phe Pro Glu Asp Thr Val Glu Asp Glu Thr Glu Asp His Arg Ala Va1 Arg Glu G1y Thr Glu G1y I1e Leu Pro Ala Asp I1e Arg Gly Leu Ser A1a Asp Asp Pro Pro Phe Thr Arg G1u Glu Val Met 38b 390 395 400 Arg Val Cys Lys G1u Ile His Pro Leu Lys A1a Pra G1y Asn Asp G1y Leu Thr A1s Asp I1e Cys Thr Arg Ala Ile Leu Gly Gly Gly Glu Asp ' CA 02410974 2002-11-15 62/.78 Val Phe Leu Ala Leu Ala Asn Lys Cys Leu Glu Leu Ser Hxs ~'he Pro 435 444 44b Axg Pro Trp Lys Val Ala His Val Cys Zle Leu Arg Lys Pro Gly Axg Glu Asp Tyr Cys Asp Pro Lys Ser Tyr Arg Pro Ile Gly Leu Leu Pro Val Leu G1y Lys Leu Leu Glu Lys Leu Phe Ya1 Arg Arg Leu Arg Trp His Leu Leu Pro Lys Leu Ser Val Arg Gln Tyr Gly Phe Met Pro G1n Arg G1y Thr G1u Asp Ser Leu Tyr Asp Leu Ya1 Asn kris xle Arg Thr Arg Val Ya1 Ala Lys Glu Val Val Thr Leu Val Sex heu Asp Ile Glu Gly Ala Phe Asp Asn Ala Tx~p 1'rp Pro Gly Leu Lys Ser Glz~ Leu ZJ.e Ala Lys Glu Cys Pxo Axg Asn Leu Tyr Gly Ile Val Ser Ser Tyr Leu Glu Asp Arg Arg Val GIu Leu Asn Tyr A1a Gly I1e Lys Ya1 Ser Arg Glu Ser Ser Lys G1y Cys Va1 Gln G1y Ser I1e A1a Gly Pro Ser Phe 595 600 fi05 Trp Asp Va1 IIe Leu Asp Ser Leu Leu Yal G1u Leu Asp Ser Ala Gly fiI0 fi15 620 Ya1 Tyr Cys Gln A1a Phe Ala Asp Asp Ya1 Ya1 Leu Val Phe Asp Gly Asp <210>56 <211>164 <212>PRT

<213> Samia cynthia <400> 56 Gly Thr Val Lys Ala Ala Ile Val Val Tyr Gly Asp Lys Phe Gly Val Thr Val Asp Pro Gly Leu Val Asp Lys Asn Ile Ala Ala Ala Val Leu His Ala Gly His Leu Ser Leu Gly Val Ile Ser Val Tyr Phe Glu Pro Asn Glu Pro Ile Glu Thr Tyr Ile Val Arg Leu Glu Arg Ile Cys Asp Lys Leu Gly Ala Leu Asn Leu Ile Ile Gly Gly Asp Val Asn Ala Lys Ser Leu Trp Trp Gly Ser Ser Ser Glu Gly His Arg Gly Glu Ala Tyr Arg Ser Phe Leu Asp Ala Thr Gly Leu Gln Ile Leu Asn Glu Gly Asp Leu Pro Thr Phe Gln Val Val Arg Gly Gly Arg Leu Phe Thr Ser Ile Val Asp Val Thr Val Cys Ser Pro Thr Leu Leu Gly Arg Ile Asp Asp Trp Lys Val Asp Met Asn Leu Thr Ser Ser Asp His Asn Ser Ile Thr Phe Ser Ile Arg <210> 57 <211> 1920 <212> DNA

<213> Samia cynthia <220>
<221> CDS
<222> (1)..(1920)
<.400> 57 ggg act gtg aaa gcc get ata gca ata ttt gat gaa aga ttg agt att 48 Gly Thr Va1 Lys A1a A1a I1e Ala Ile Phe Asp Glu Arg Leu Ser Ile ata gag cat tct cag cta aca acg cac aat gta gca gta gcc aca ctt 96 I1e G1u His Ser Gln Leu Thr Thr His Asn Ya1 Ala Va1 A1a T'hr Leu gac aca gga cac aca aaa att ggt atc ata tcg gtc tat ttt gag gac 144 Asp Thr Gly His Thr Lys rte G1y Tle Ile Ser Val Tyx Phe Glu Asp acc aaa cca tta acg cca tac ttg gac aaa ata aaa aca ata ata gaa 192 Tar Lys Pxo Leu Th~r fro Tyr Leu Asp Lys Ile Lys Th~r Zl,e Zle Gl.u ;'.~,,.::: aaa ttg gac aca aaa aaa gtt att ata gga ggg gac gta aat gcc tgg 240 Lys Leu Asp Thx' Lys Lys Val. Zl.e Zl.e Gly Gly Asp Val Asn Ala Tx~p s5 70 75 8a agt agt tgg tgg gga agt agg aga gag aat gat agg ggg gaa gag ata 288 Sex Sex Trp Trp Gly Ser Arg Arg Glu Asn Asp Arg G1y Glu G1u Ile acg gga tgg att aca gaa gaa gga tat cat gtc ctg aat caa ggt agt 338 Thr G1y Trp I1e Thr Glu Glu G1y Tyr His Va1 Leu Asn Gln Gly Ser ata ccg aca ttt tac acg ata aga gga ggg aaa gag tac cag agc tgt 384 Ile Pro Thr Phe Tyr Thr Ile Arg Gly G1y Lys Glu Tyr Gln Ser Cys gtg gac atc acg atc tgt tca gac caa ata cta agt aag ata cgt ggc 432 Val Asp Ile Thr Ile Cys Ser Asp G1n rte Leu Sex Lys Zle Axg Gly 130 x36 140 tgg acg att gac cag gaa ttg gta aat tcg gac cat aat tgc att aaa 480 Tx~p Tar Ile Asp Gln Glu Leu Val Asn Sex Asp His Asn Cys ZJ,e Lys 145 150 1.55 164 ttc caa ata ata aca gga gag cta aaa aca aga aca cag aaa aag aca 528 ~'~ae Gln Ile Ile Thr Gly Glu Leu Lys Thx Arg Tour Gln Lys Lys Thr lfi5 X70 X75 aca aga ata tat aaa aca caa aaa gcg aag tgg aca gat ttt aga tta 576 Thr Arg Ile Tyr Lys Thx Gln Lys Ala Lys Trp Tbur Asp Pie Axg Leu 18o xs5 19a acg att agt aaa aaa ata gaa gca aac aaa ata aca aet aat gca atc 624 Tbur Il.e Sex Lys Lys Ile Gl.u Al.a Asn Lys Ile Thr Thr Asn Ala Ile aga gas att aca gaa aca caa gaa tta gat agt ata ata gaa asa tac 672 Ax~g Glu Ile Thr Glu Thr G1n G1u Leu Asp Ser I1e Ile Glu Lys Tyr aac aac att ata aca cag gca tgt aaa tta aat 
atc cct aaa ata sat 720 Asn Asn Ile Ile Thr G1n Ala Cys Lys Leu Asn Ile Pro Lys I1e Asn aga aat aca act aat aaa caa aaa aat aac tta ccg tgg tgg acc gtc 768 Arg Asn Thr Thr Asn Lys Gln Lys Asn Asn Leu Pro Trp Trp Thr Va1 y:.. gag ttg gag gaa gaa aag agg aga gta ctt acg atg aag cga aga atc 816 G1u Leu Glu Glu G1u Lys Arg Arg Val Leu Thr Met Lys Arg Arg Ile 26o z65 270 egt tgc gca gca cag caa agg aag tcc cac gtc gtc gaa gaa tac ctg 864 Arg Cys A1a Ala Gln Gln Arg Lys Sex His Val Val Glu Glu Tyr Leu aaa agg aaa gaa aas tat gag tta gca gca aat gaa gca cgt aca sat 912 Lys Axg Lys GJ.u l.ys Tyr Gl,u Leu Ala Ala Asn G1u Ala Arg Thr Asn agc tgg aga gaa ttt tgc acc aag cag asa agg gaa acs atg tgg gag 960 Sex Tx~p Arg Glu Phe Cys Thr Lys Gln Lys Arg G1u Thr Met Trp Glu ggt ata tat aga gta atc cgg aaa gca gcc cca caa tac gaa gac cag 1008 ... Gly Ile Tyr Arg Va1 I1e Arg Lys A1a Ala Pro G1n Tyr Glu Asp G1n cta ctg agc cag aac gga cag asc cta aat ccg gaa gat tcg gta aaa 105f Leu Leu Ser G1n Asn Gly Gln Asn Leu Asn Pro Glu Asp Ser Val Lys ctg ctt ggt gca acc ttc ttt.cca gac gac tgt acc aca gat gat acg 1X04 Leu Leu Gly Ala Thr Phe Phe Pro Asp Asp Cys Thr Thr Asp Asp Thr gta gaa cat acs aas ata agg gat gat gcg aag gta acc aac ata gaa 11.52 Va1 Glu His Thr Lys Ile Arg Asp Asp Ala Lys Va1 Thr Asn Ile Glu gtc gat gat acg gaa gat gac cct cca ata acc gaa gcc gag atg atc 1200 Va1 Asp Asp Thx G1u Asp Asp Pro Pro Ile Thr Glu Ala Glu ,Met Ile 68.78 cac gca gca cga tca ttt aac aaa aag aaa gca cct gga aaa gat gga 1248 His Ala A1a Arg Ser Fhe Asn Lys Lys Lys Ala Fro Gly Lys Asp Gly ttc act gca gat atc tgt ttt aac gca ata aaa gcc aac agt gag aca 1296 Phe Thr Ala Asp Ile Cys Phe Asn Ala Zle Lys Ala Asn Ser Glu Thr ttc ctg gaa ata ata aat aag tgt atg gaa ttg tca tgg tat ccg aca 1344 Fhe Leu Glu Zle Zle Asn Lys Cys Met Glu Leu Ser Trp Tyr Pxa Thx teg tgg aag agt gca ttc ata cta att ttg cgg asa cea aat aaa get 1392 Sex Tip Lys Ser AIa Phe I1e Leu Ile Leu Arg Lys Pro Ash Lys Ala agt tac..gaa aac cca 
aga.gca tac aga ccg ata gga ttg ctg cca gtg 1440 Ser Tyr Glu Asn Pro Arg A-la Tyr Arg Pro I1e G1y Leu Leu Fro Va1 tta ggg aag att atg gaa aaa ata ata gta aaa aga ata aga tgg cac 1488 Leu Gly Lys I1e Met G1u Lys Ile Ile Ya1 Lys Arg Ile Arg Trp His aca gca ccg aaa ttg aat cca cga cag tac ggg ttt aca cca cag cgc 1536 Thr Ala Fxo Lys Leu Asn Px~o Axg Gln Tyr Gly the Thx fro Gln Axg 50o soy 510 tgt acg gag gac tcc ctc tat gat cta atg aca cac atc atg aac aac x584 ... Cys Thr Glu Asp Sex Leu Tyx Asp Leu Met Thx Hips Zle Met Asn Asn 51.5 520 525 tta aca cag aga aag ata aac att gtt gtg tcg ttg gac ata gag ggg 1632 Leu Thx Gln Ax'g Lys Zle Asn Zle Val Val Sex Leu Asp Il,e Glu Gly gcc ttc gac agc gcg tgg tgg ccc gtg ttg aag tgt aga tta aaa gas 1680 Ala Phe Asp Sex Ala Txp Trp Fxo Val. Leu Lys Cys Axg Leu Lys Glu cta aaa tgt cct agg aat ctc agg aaa ata gta gac agc tac cta gac 1728 Leu Lys Cys Fxo Axg Asn Leu Axg Lys Zl.e Val Asp Sex Tyx Leu Asp aat aga cag gtt gaa atg aat tat gcg gga gcc tca tac agt saa ata 1776 Asr~ Arg G1n Val G1u Met Asn Tyr Ala Gly A1a Ser Tyr Ser Lys I1e acg acc aaa gga tgt gta cag gga tcc atc agc ggc cca ,gtt ttc tgg X82 Thr Thr Lys G1y Gys Val G1n Gly Ser Ile Ser Gly Pro Val Phe Txp sat ata ata ata gac cca ctt ata gat cga ttg gca gac aaa sac att 1.8'72 Asn Ile Ile Ile Asp Pro Leu I1e Asp Arg Leu Ala Asp Lys Asn Zle tat tgt cag gcg ttc gcg gac gat gtg gtc ctg gtt ttc gaC gga gac 1920 'fyr Cys Gln Ala Phe Ala Asp Asp Val Va1 Leu Val Phe Asp Gly Asp :.::..
<210> 58 <211> 640 <212> PRT
<21.3> Samia Cynthia <400? 58 Gly Thr Vai, Lys Ala Ala Ile Ala Zle Phe Asp Glu Axg Leu Sex Zle 1 5 l.0 l.5 Ile Glu His Ser Gln Leu Thr Thr His Asn Va1 Ala Val Ala Thr Leu Asp Thr G1y His Thr Lys Ile Gly I1e Ile Ser Ya1 Tyr Phe Glu Asp Thr Lys Pro Leu Thr Pro Tyr Leu Asp Lys Ile Lys Thr Ile Ile G1u Lys Leu Asp Thr Lys Lys Yal I1e Ile Gly Gly Asp Val Asn Ala Trp Ser Ser Trp Trp G1y Ser Arg Arg Glu Asn Asp Arg G1y Glu Glu rte Thr Gly Trp Ile Thr Glu Glu Gly Tyr His Va1 Leu Asn GIn Gly Ser Ile Pro Thr Phe Tyx Thr lle Arg Gly G1y Lys G1u Tyr Gln Ser Cys 115 Zzo las Val Asp Zle Thr Ile Cys Ser Asp Gln Ile Leu Sex Lys Zle Arg Gly _ 68.78 Trp Thr I1e Asp Gln G1u Leu Va1 Asn Ser Asp His Asn Cys Ile Lys Phe G1n I1e Ile Thr G1y G1u Leu Lys Thr Arg Thr GIn Lys Lys Thr Thr Arg I1e Tyr Lys Thr Gln Lys Ala Lys Trp Thr Asp Phe Arg Leu 180 185 i90 Thr Ile Ser Lys Lys Zle Glu Ala Asn Lys IIe Thr Thr Asn A1a IIe Arg Glu Zle Thr Glu Thr GIn Glu Leu Asp Sex Zle Ile Glu Lys Tyr Asn Asn zl.e ZIe Than Gl.n AIa Cys Lys Leu Asn IIe Pro Lys Zle Asn Ax~g Asn Thr Thr Asn Lys Gln Lys As>« Asn Leu Pro Tarp Tarp Thr VaI

Glu Leu Glu G1u Glu Lys Arg Arg Val Leu Thr Met Lys Arg Arg Ile Arg Cys Ala Ala Gln G1n Arg Lys Ser His Val Va1 Glu G1u Tyr Leu Lys Arg Lys Glu Lys Tyr Glu Leu Ala Ala Asn Glu Ala Arg Thr Asn ,. Ser Trp Arg Glu Phe Cys Thr Lys Gln Lys Arg G1u Thr Met Trp GIu Gly I1e Tyr Axg Val Ile Arg Lys Ala A1a Pro G1n Tyr GIu Asp Gln Leu Leu Ser Gln Asn Gly Gln Asn Leu Asn ~'ro GIu Asp Ser VaI Lys Leu Leu Gly AIa Tour Phe ~'he ~'ro Asp Asp Cys Thr Thr Asp Asp Thr Val Glu His Thr Lys zle Arg Asp Asp Ala Lys Val Thr Asn Ile GIu 3?0 3?5 380 Vat. Asp Asp Thx GIu Asp Asp Qro Pro Zle Than Glu AIa Glu Met Zle 385 390 3g5 400 69/.'T8 His A1a AIa Arg Ser Phe Asn Lys Lys Lys Ala Pro Gly Lys Asp G1y Phe Thr Ala Asp Ile Cys Phe Asn A1a Ile Lys Ala Asn Ser G1u Thr Phe Leu Glu Ile Ile Asn Lys Cys Met Glu Leu Sex Trp 'I~r Pro Thr Sex Txp Lys Sex Ala Phe Zle Leu lle Leu Axg Lys Pxo Asn Lys A1a 454 455 4so Sex Tyx Glu Asn Pxo Axg Ala Tyx Axg Pxo Zle Gly Leu Leu Pxo Val 465 470 475 4s0 w Leu Gly Lys Il.e Met Glu Lys IIe Ile VaJ. Lys Axg Ile Axg Txp Hzs Thr Ala Pro Lys Leu Asn Pro Arg Gln Tyr Gly Phe Thr Pro Gln Axg Cys Thr G1u Abp Ser Leu Tyr Asp Leu Met Thr His Ile Met Asn Asn Leu Thr Gln Arg Lys I1e Asn Ile Val Va1 Ser Leu Asp I1e G1u Gly A1a Phe Asp Ser A1a Trp Trp Pro Val Leu Lys Cys Arg Leu Lys G1u Leu Lys Cys Pro Arg Asn Leu Axg Lys Xle Val Asp Sex Tyx Leu Asp 565 5'I0 575 Asn Axg Gln Val Glu bet Asn Tyr Ala Gly Ala Sex Tyx Ser Lys Zle 580 585 59a Thr Txzr Lys Gly Cys Val Gln Gly Ser Ile Sex Gly Pxo Val Phe Tx~p Asn zle Ile Zle Asp Pxo Leu Ile Asp Axg Leu Ala Asp Lys Asn Ile fi10 6x5 620 Tyx Cys GJ.z~ Ala Phe Ala Asp Asp Val Val Leu Val Plte Asp Gly Asp <2~.0> 59 <211> 164 <212> PRT
<213> Samia cynthia

<400> 59
Gly Thr Val Lys Ala Ala Ile Ala Ile Phe Asp Glu Arg Leu Ser Ile
1               5                   10                  15
Ile Glu His Ser Gln Leu Thr Thr His Asn Val Ala Val Ala Thr Leu
            20                  25                  30
Asp Thr Gly His Thr Lys Ile Gly Ile Ile Ser Val Tyr Phe Glu Asp
        35                  40                  45
Thr Lys Pro Leu Thr Pro Tyr Leu Asp Lys Ile Lys Thr Ile Ile Glu
    50                  55                  60
Lys Leu Asp Thr Lys Lys Val Ile Ile Gly Gly Asp Val Asn Ala Trp
65                  70                  75                  80
Ser Ser Trp Trp Gly Ser Arg Arg Glu Asn Asp Arg Gly Glu Glu Ile
                85                  90                  95
Thr Gly Trp Ile Thr Glu Glu Gly Tyr His Val Leu Asn Gln Gly Ser
            100                 105                 110
Ile Pro Thr Phe Tyr Thr Ile Arg Gly Gly Lys Glu Tyr Gln Ser Cys
        115                 120                 125
Val Asp Ile Thr Ile Cys Ser Asp Gln Ile Leu Ser Lys Ile Arg Gly
    130                 135                 140
Trp Thr Ile Asp Gln Glu Leu Val Asn Ser Asp His Asn Cys Ile Lys
145                 150                 155                 160
Phe Gln Ile Ile

<210> 60
<211> 1908
<212> DNA
<213> Samia cynthia

<220>
<221> CDS
<222> (1).. (19U$) «00> 60 ggg act gtt aaa gcg get atc ate gtt ttc ggt gac cta ctg aac gtc 48 G1y Thr Va1 Lys Ala Ala Ile Ile Va1 Phe Gly Asp Leu Leu Asn Yal att cat gac ect cag ctg gtg acc gag acc gag gcg get gtc ttg ctg 96 Zle His Asp Pro G~ Leu Val Th~r Glu Thx Glu Al.a A1a Va1 Leu Leu gaa ggg gga ggc ctg aaa ctt gga gta gtg tcc gtc tac ctt gag gga 144 Glu Gly Gly Gly Leu Lys Leu Gly Val Val Sex Val, Tyx l.eu Glu Gly 3fi 40 45 aac aat gac atc gag ccc tae cta cat agg ata aag ctg acc tgc gga x92 Asn Asn Asp Il,e G1u Pxo Tyr Leu His Arg Z~.e Lys Leu Thr Cys GZy ~.. ;~," 50 55 60 aaa ctc gac acc gga cac ctc atc gtg gca ggg gat gta aat gcc tgg 240 Lys Leu Asp Thx GIy His Leu IIe Va1 Ala Gly Asp Yal Asn A1a Trp agc cac tgg tgg ggg agc agc tcg gag gac ggc aga ggc gta gcg tac 288 Ser His Trp Trp Gly Ser Ser Ser Glu Asp G1y Arg Gly Val Ala Tyr cat tcc ttc ctg aac gaa atg gga ctg gaa atc cta aac act gga acc 338 His Ser Phe.Leu Asn Glu Met G1y Leu Glu Ile Leu Asn Thx Gly Thr 144 105 1 l.0 act cca aca ttc gag gtc ctt agg ggt gac agg ttg tac aca agt tgc 3$4 Tour Pro Thx P~,e Glu Val Leu Axg Gly Asp Axg Leu Tyx Thr Ser Cys r~ 116 120 X25 att gat gta aca gcg tgc agc ggg tea ctc cta ggg agg gtc gag gac 432 Ile Asp Val Thx Al.a Cys Sex Gly Sex Leu Leu Gly Axg Val Glu Asp tgg agg gtc gat cga gga ctg aca act tcc gac cat aat act att acg 480 Txp Axg Val Asp A~rg Gly Leu Tb~x~ Th~r Sex ,Asp ~xs Asn Thr I1e Thr 145 x50 ~ 155 160 ttt tca ctg cgc gta gag agg gca ctg aca cct ttg ccc tca atc acc 528 Phe Sex Leu Axg Yal Glu Arg Ala Leu Thr Pro Leu Pro Ser Ile Thr acg cgc aag tat aac acg aag aaa gca aat tgg acg gac ttc ggt cga 576 Tbur Ax~g Lys Tyr Asn Thr Lys Lys Ala Asn Trp Thr Asp Phe Gly Arg ctc ttt ggt gcc atg ttg gag gaa aat aac atc tcg gag tcc att agg fi24 Leu Phe Gly Al.a Met Leu Glu GIu Asn Asn IIe Ser GXu.Se~r Zle Axg x95 200 205 aaa tcc aga acc cca gaa gat ctt gag atc gca ata caa gcc tao acc 672 Lys Ser Arg Thr Pro Glu Asp Leu G1u Ile AIa Ile Gln A1a Tyr Thr agc gca atc cat gaa 
gCC tgt tCC aCa aCC atC CCa Cag gCa agB GCa 720 Ser Ala I1e His Glu Ala Cys Ser Thr Thr Ile Pro Gln Ala Arg Pro tgg aaa gga aat CGG gtt CCt CCC tgg tgg aat agg Caa ttg gaa gaC '~$$
Trp Lys G1y Asn Pro Val Pro Pro Trp Trp Asn.Arg Gln Leu Glu Asp .'..: 245 250 255 GtC aag aga g&a gtt Ctt aga agg aaa Gga aga ata agg aaC gCC gcg $16 Leu Lys Arg GIu Val Leu Axg Arg Lys Axg Axg lle Axg Asz~,A,Ia Ala 260 265 2?0 ecg tta.cgt aaa eaa ttt gtt atc gat gag tac ctc cga get aaa ctg 864 Pro Leu Arg Lys Gln Phe Val Z~.e ,Asp Gl.u Tyr Leu Arg Ala Lys Leu gag tae get geg gag gce agg aaa gee caa aca gag agt tgg aag gag 912 Glu Tyr Ala Ala G].u Ala Arg Lys Ala Gln Thr GIu Ser Trp Lys Glu ttc tgc tcg acg caa aag aga gag agt atg 'egg gac aaa atc tac agg 960 Phe Cys Ser Thr Gln Lys Arg GIu Ser Met Trp Asp Lys Ile Tyr Arg gtc atc agg aag tcg tca agg agg cag gag gac gtg ctC ctg aga gac x008 Val I1e Arg Lys Ser Ser Arg Arg G1n G1u Asp YaI Leu Leu Arg Asp cag gta ggt aac acc ttg tcc ccc caa Gag tcg gca gaa ctt ctc gcc x056 G1n Va1 G1y Asn Thr Leu Ser Pro Gln Gln Ser A1a Glu Leu Leu Ala 340 34s 3so aaa tct ttc tac cca gac gac ttg gaa get tCC gae gac caa tac cac X,04 Lys Ser Phe Tyr Pro Asp Asp Leu G1u A1a 5er Asp Asp Glz~ Tyx Hxs aag gat ctc agg aca acg gta gaa gtt ggg cag tct ggg ttt ttc tca 1X52 Lys Asp Leu Arg Thr Thr Yal Glu Va1 Gly Gln Ser Gly Phe Phe Sex 370 375 3$0 73f 78 gag gac gac ccc ctt ata aca tct acg gaa ctg gac aca gtg ctt agg 1200 Glu Asp Asp Pro Leu I1e Thr Ser Thr G1u Leu Asp Thr Val Leu Arg get caa aat ccg aaa aaa gca ccg ggt cca gat ggt ctg acc tcg gat 1248 A1a Gln Asn Pra Lys Lys Ala Pro G1y Pra Asp G1y Leu Thr Ser Asp ata tgt acg gcg gca atc aac tgt gat cgg gtg gtg ttc cta gcg ctg 1296 Ile Cys Thr Ala Ala Ile Asn Cys Asp Arg Va1 Val Phe Leu A1a Leu gcc aac aag tgc cta gcg ctg tca cat ttc cec cga gca tgg aag gta 1344 Ala Asn Lys Cys Leu Ala Leu Sex His Phe Pro Arg Ala Trp Lys Val 435 44a 445 gcg cac gtc att atc ctg aga aaa ccg ggc aaa gat gae tat acc agc 1392 Ala His Val lle Ile Leu Ax~g Lys Px'o Gly Lys Asp Asp Tyx Thr Ser cct aaa tcc tat aga cct ata ggc ctt cta cca gtc cta ggg aaa att 1440 Pro Lys Sex Tyr Arg Pro Ile Gly Leu Leu Pro Va1 Leu G1y 
Lys I1e gtg gaa aaa ctg atc ata ggt cgc ctc caa tgg cac att atg cca gcc 1488 Val Glu Lys Leu Ile I1e Gly Arg Leu Gln Trp His I1e Met Pro A1a tta aac cgt agg caa tac ggt ttt atg ccg caa agc agc acc gag gat 1536 Leu Asn Arg Arg Gln Tyr Gly Phe Met Pro Gln Ser Ser T'hr Glu Asp w 500 505 510 \., gcc ctc tat gac tta gta cac cat att agg aca gag ctg cag gac aaa 1.584 Ala Leu Tyr Asp.Leu Val His His I1e Arg Thr G1u Leu Gln Asp Lys aag tca gtg ctc gtc ata tca ctg gac ata gag gga gcc ttc gac sac X632 Lys Ser Val Leu Va1 Ile Ser Leu Asp Ile Glu Gly Ala Phe Asp Asn gca tgg tgg cca get cta aaa ctt caa ttg cag gag agg agg att cct X680 A1a Trp Trp Pro Ala Leu Lys Leu Gln Leu Gln Glu Arg Arg Zle Pro cga aac cta tac aag ctg gtg gac tcg tac ctt aga gac cgc aag atc X728 Arg Asn Leu xyr Lys Leu Val Asp Ser Tyr Leu Arg Asp Arg Lys lle 5s5 s7o s7s 74/.78 acg gtc aac tat gca cga gcg aca tac gag aag ggt act acc aaa ggt lTl6 Thr Val Asn Tyr Ala Arg Ala Thr Tyr Glu Lys Gly Thr Thr Lys Gly tgt gtc caa ggt tcc ata agt gga ccc acc ttt tgg aac atc ata ctC 1$24 Gys Va], G~,n G~,y Sex Zle Se:~ Gly Pro Thx~ Phe Trp Asn ZZe Ile Leu gac ccg ttg ttg caa cta ctt gca agg g~a ggg atc cac get caa get 1$72 Asp Pro Leu Leu Gln Leu Leu Ala Arg Glu Gly Ile His Ala Gln A~.a ttt gca gac gac gtg gtc ctg gtt ttc gac gga gac x908 Phe A1a Asp Asp Val Va1 Leu Yal Phe Asp Gly Asp (,;.;, . . 625 630 635 <210> 61 <211> 636 <212> PRT
<213> Samia Cynthia <400> 61 Gly Thr Val Lys Ala Ala lle Ile Val Phe Gly Asp Leu Leu Asn Val Ile His Asp Px~o Gln Leu Val Thx Glu T~nur Glu Ala Al.a Val Leu Leu G~,u Gly Gly Gl.y Leu ~ys Leu Gly Val, Val Ser Val Tyr Leu Glu Gly ,~ 35 40 ~5 Asn Asr~ Asp Ile G1u Pro Tyx Leu His Arg Z~.e Lys Leu Thr Gys G1y Lys Leu Asp Thr Gly His Leu Ile VaI Ala Gly Asp Val Asn Ala Trp Ser His Trp Trp Gly Ser Ser Ser Glu Asp Gly Arg Gly Va1 A1a Tyr His Ser Phe Leu Asn Glu Met G1y Leu Glu Ile Leu Asn Thr G1y Thr Thr Pro Thr Phe G1u Yal Leu Arg Gly Asp Arg Leu Tyr Thr Ser Cys I1e Asp Yal Thr Ala Cys Ser Gly Ser Leu Leu Gly Arg Yal G1u Asp 75/.7$

Trp Arg Val Asp Arg Gly Leu Tar Thr Sex Asp Hxs Asn Thur lle Thr P~ae Ser Leu Arg VaZ G3.u Arg Ala leu Thr Pxa Leu Pro Ser Zle Thr Thr Axg Lys Tyx Asn Thx~ LYs LYs Ala Asn Trp Thr Asp Phe G1.Y Arg Leu Phe Gay Ala diet Leu G1u Glu Asn Asn Ile Ser Glu Ser Ile Arg ,,, Lys Ser Arg Thr Pro Glu Asp Leu Glu I1e A1a Ile Gln A1a Tyr Thr Ser A1a Ile His G1u Ala Cys Ser Thr Thr Ile Pro G1n Ala Arg Pra Trp Lys G1y Asn Pra VaI Pro Pro Trp Trp Asn Arg G1n Leu Glu Asp Leu Lys Arg G1u Val l,eu Arg Arg Lys Arg Axg Ile Arg Asz~ Ala Ala Pro l,eu Arg Lys . Gln Phe Val IJ.e Asp Glu Tyr ~,eu Arg Ala I,ys Leu GZu Tyr Ala A~.a Glu Ala Arg Lys Ala Gln Thx Glu Ser Trp Lys G1u Phe Cys Sex Thx~ Gl,n LYs Ax~g Glu Ser Met Trp Asp Lys Zle Tyr Arg 305 3x0 31.5 320 Val lle Axg Lys Sex' Ser Arg Axg Gln G1u Asp Val Leu Leu Arg Asp GJ,n Val. G~,y Asn Thr Leu Ser Pro Gln Gln Ser Ala Glu Leu L~u A1a LYs Sex Phe Tyr Pro Asp Asp Leu Glu A1a Ser Asp Asp Gln Tyr His Lys Asp Leu Arg Thr Thr Val Glu Val G1y Gln Ser Gly Phe Phe Ser Glu Asp Asp Pro Leu Iie Thr Ser Thr G1u Leu Asp Thr Val Leu Arg A1a Gln Asn Pro Lys Lys A1a Pro Gly Pro Asp G1y Leu Thx Ser Asp 405 410 41,5 Ile Cys Thr Ala Ala Ile Asn Cys Asp Arg Va1 Val Phe Leu Ala Leu Ala Asn Lys Cys Leu Ala Leu Ser His Phe Pra Arg A1a Trp Lys Val A1a His Ya1 I1e rle Leu Arg Lys Pra G1y Lys Asp Asp Tyr Thr Ser Pro Lys Ser Tyr Arg Pro IIe Gly Leu Leu Pro Va1 Leu G1y Lys Ile VaI GIu Lys Leu Zle Ile GIy Arg Leu Gln Txp His ZIe Met Pro A1a Leu Asz~ Arg Ax~g GZn Tyx Gly Phe Met Pro Glx~ Sex Ser Thr Glu Asp Ala Leu Tyr Asp Leu Val His Hi,s Ile Axg Thr Glu Leu Gln ,Asp Lys Lys Ser Val Leu Va1 Ile Sex Leu Asp Ile G1u G1y A1a Phe Asp Asn Ala Trg 'IYp Pra Ala Leu Lys Leu G1n Leu Gln G1u Arg Arg I1e Pra Arg Asn Leu Tyr Lys Leu Val Asp Ser Tyr Leu Arg Asp Arg Lys Ile Thr Val Asn Tyr AIa Arg Ala Thr Tar Glu Lys Gly Thr Thr Lys Gly Cys VaI Gln Gly Ser T1e Ser Gly Pro Thr Phe Trp Asn I1e ZIe Leu Asp Pro Leu Leu Gln Leu Leu A1a Arg~Glu Gly Ile His A1a G1n Ala Phe AIa Asp Asp 
Val Val Leu Val Phe Asp Gly Asp

<210> 62
<211> 164
<212> PRT
<213> Samia cynthia

<400> 62
Gly Thr Val Lys Ala Ala Ile Ile Val Phe Gly Asp Leu Leu Asn Val
1               5                   10                  15
Ile His Asp Pro Gln Leu Val Thr Glu Thr Glu Ala Ala Val Leu Leu
            20                  25                  30
Glu Gly Gly Gly Leu Lys Leu Gly Val Val Ser Val Tyr Leu Glu Gly
        35                  40                  45
Asn Asn Asp Ile Glu Pro Tyr Leu His Arg Ile Lys Leu Thr Cys Gly
    50                  55                  60
Lys Leu Asp Thr Gly His Leu Ile Val Ala Gly Asp Val Asn Ala Trp
65                  70                  75                  80
Ser His Trp Trp Gly Ser Ser Ser Glu Asp Gly Arg Gly Val Ala Tyr
                85                  90                  95
His Ser Phe Leu Asn Glu Met Gly Leu Glu Ile Leu Asn Thr Gly Thr
            100                 105                 110
Thr Pro Thr Phe Glu Val Leu Arg Gly Asp Arg Leu Tyr Thr Ser Cys
        115                 120                 125
Ile Asp Val Thr Ala Cys Ser Gly Ser Leu Leu Gly Arg Val Glu Asp
    130                 135                 140
Trp Arg Val Asp Arg Gly Leu Thr Thr Ser Asp His Asn Thr Ile Thr
145                 150                 155                 160
Phe Ser Leu Arg

<210> 63
<211> 31
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: artificially synthesized sequence

<400> 63
aaaaaaaaca tatgcgtatt gctaagggca g                                    31

<210> 64
<211> 45
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: artificially synthesized sequence

<400> 64
aaaaaactcg agttatcgtt ttgtatgaat attaaatagg atggc                     45

<210> 65
<211> 75
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: artificially synthesized sequence

<400> 65
cagcatccag ggtgacggtg ccgaggatga cctaacctaa cctaacgatg agcgcattgt     60
tagatttcat acacg                                                      75

Claims (25)

1. A DNA derived from a retrotransposon-like sequence present adjacent to a telomeric repeat sequence, wherein the DNA encodes a polypeptide having an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence.
2. The DNA of claim 1, which is derived from the genomic DNA of an insect.
3. A DNA selected from the group consisting of (a) to (e):
(a) a DNA comprising a nucleotide sequence selected from the group consisting of SEQ ID NOs: 38, 42, 45, 48, 51, 54, 57 and 60;
(b) a DNA encoding a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 36, 37, 39, 40, 41, 43, 44, 46, 47, 49, 50, 52, 53, 55, 56, 58, 59, 61 and 62;
(c) a DNA encoding a polypeptide that (i) comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 36, 37, 41, 44, 47, 50, 53, 56, 59 and 62, wherein one or more amino acids have been substituted, deleted, inserted and/or added, and (ii) has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence;
(d) a DNA encoding a polypeptide that (i) is encoded by a DNA hybridizing to a DNA encoding an amino acid sequence selected from the group consisting of SEQ ID NOs: 36, 41, 44, 47, 50, 53, 56, 59 and 62, or a partial sequence thereof, and (ii) has an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence; and
(e) a DNA encoding a fusion polypeptide between the polypeptide encoded by a DNA according to any one of (a) to (d) and another polypeptide.
4. A DNA of any one of claims 1 to 3, which is used to cleave a telomeric repeat sequence in a DNA.
5. The DNA of claim 4, wherein the telomeric repeat sequence is a repetition of the sequence (5'-TTAGGG/5'-CCCTAA).
6. A polypeptide encoded by a DNA according to any one of claims 1 to 5.
7. The polypeptide of claim 6, which is used to cleave a telomeric repeat sequence in a DNA.
8. The polypeptide of claim 7, wherein the telomeric repeat sequence is a repetition of the sequence (5'-TTAGGG/5'-CCCTAA).
9. A vector comprising a DNA of any one of claims 1 to 5, or a transcriptional product thereof.
10. A host cell comprising the vector of claim 9.
11. A method for producing a polypeptide of any one of claims 6 to 8, wherein the method comprises the steps of culturing the host cell of claim 10 and recovering the expressed polypeptide from the host cell, or culture supernatant thereof.
12. A partial polypeptide of a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NOs: 39, 40, 43, 46, 49, 52, 55, 58 and 61.
13. A DNA encoding the partial polypeptide of claim 12.
14. A polynucleotide comprising at least 15 nucleotides complementary to a nucleotide sequence selected from SEQ ID NOs: 15 to 28, 38, 42, 45, 48, 51, 54, 57 and 60 or the complementary strand thereof.
15. A DNA or polynucleotide of any one of claims 1 to 3, 13 or 14, which is used to isolate a DNA encoding a polypeptide having an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence.
16. A method for cleaving a telomeric repeat sequence using a polypeptide of any one of claims 6 to 8, wherein the method comprises the step of contacting the polypeptide with a DNA comprising the telomeric repeat sequence.
17. A method for cleaving a telomeric repeat sequence using a DNA of any one of claims 1 to 5, wherein the method comprises the step of expressing the DNA of any one of claims 1 to 5 under conditions in which the expression product can come into contact with the DNA comprising the telomeric repeat sequence.
18. The method of claim 16 or 17, wherein the telomeric repeat sequence is a repetition of the sequence (5'-TTAGGG/5'-CCCTAA).
19. A transgenic nonhuman vertebrate that harbors a foreign DNA of any one of claims 1 to 5 in an expressible manner.
20. The transgenic nonhuman vertebrate according to claim 19, which is a mouse or rat.
21. An agent for cleaving a DNA containing a telomeric repeat sequence, wherein the agent comprises as an active ingredient a DNA of any one of claims 1 to 5, or the vector of claim 9.
22. An agent for cleaving a DNA containing a telomeric repeat sequence, wherein the agent comprises as an active ingredient a polypeptide of any one of claims 6 to 8.
23. The agent of claim 21 or 22, wherein the telomeric repeat sequence is a repetition of the sequence (5'-TTAGGG/5'-CCCTAA).
24. A method for isolating a DNA encoding a polypeptide having an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence, wherein the method comprises the step of screening for a polynucleotide that hybridizes with at least one probe selected from the DNAs or polynucleotides of claim 15.
25. A method for isolating a DNA encoding a polypeptide having an activity to cleave a double-stranded DNA comprising a telomeric repeat sequence, wherein the method comprises the step of amplifying a DNA using as a primer at least one of the DNAs or polynucleotides of claim 15.
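For illustration only, and not as part of the claims: the repeat notation (5'-TTAGGG/5'-CCCTAA) used in claims 5, 8, 18 and 23 names the two strands of one double-stranded repeat, each written 5' to 3'. The Python sketch below shows that CCCTAA is the reverse complement of TTAGGG and locates tandem runs of the repeat on either strand of a given sequence. The function names, the `min_copies` parameter and the test sequences are assumptions introduced for this example; they do not appear in the specification.

```python
import re

# Translation table for Watson-Crick complements (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_repeat_tracts(seq: str, unit: str = "TTAGGG", min_copies: int = 2):
    """Find tandem runs of `unit` or of its reverse complement in `seq`.

    Returns a sorted list of (start, end, unit_found, copies) tuples,
    where start/end are 0-based, end-exclusive positions in `seq`.
    `min_copies` is a hypothetical threshold chosen for this sketch.
    """
    seq = seq.upper()
    tracts = []
    for strand_unit in (unit, reverse_complement(unit)):
        # e.g. (?:TTAGGG){2,} matches two or more consecutive copies
        pattern = re.compile(f"(?:{strand_unit}){{{min_copies},}}")
        for match in pattern.finditer(seq):
            copies = (match.end() - match.start()) // len(strand_unit)
            tracts.append((match.start(), match.end(), strand_unit, copies))
    return sorted(tracts)


if __name__ == "__main__":
    print(reverse_complement("TTAGGG"))
    print(find_repeat_tracts("ACGT" + "TTAGGG" * 3 + "GATC"))
```

The same function could be called with a different `unit` argument to search for another repeat; which repeat units are relevant to a given organism is outside the scope of this sketch.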
CA002410974A 2000-05-16 2000-11-22 Polypeptides cleaving telomeric repeats Abandoned CA2410974A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2000148611 2000-05-16
JP2000-148611 2000-05-16
PCT/JP2000/008242 WO2001088149A1 (en) 2000-05-16 2000-11-22 Polypeptides cleaving telomeric repeats

Publications (1)

Publication Number Publication Date
CA2410974A1 true CA2410974A1 (en) 2002-11-15

Family

ID=18654612

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002410974A Abandoned CA2410974A1 (en) 2000-05-16 2000-11-22 Polypeptides cleaving telomeric repeats

Country Status (2)

Country Link
CA (1) CA2410974A1 (en)
WO (1) WO2001088149A1 (en)


Also Published As

Publication number Publication date
WO2001088149A1 (en) 2001-11-22


Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued