AU2490801A - Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors - Google Patents

Publication number
AU2490801A
Authority
AU
Australia
Prior art keywords
fiber
polypeptide
protein
adenovirus
plasmid
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU24908/01A
Inventor
Glen R. Nemerow
Daniel J. Von Seggern
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Novartis AG
Scripps Research Institute
Original Assignee
Novartis AG
Scripps Research Institute
Application filed by Novartis AG and Scripps Research Institute
Priority to AU24908/01A
Publication of AU2490801A
Priority to AU2004202701A (AU2004202701A1)
Current legal status: Abandoned

Landscapes

  • Medicines Containing Material From Animals Or Micro-Organisms (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Description

AUSTRALIA
PATENTS ACT 1990
DIVISIONAL APPLICATION
NAME OF APPLICANTS: Novartis AG AND The Scripps Research Institute
ADDRESS FOR SERVICE: DAVIES COLLISON CAVE, Patent Attorneys, 1 Little Collins Street, Melbourne, 3000.
INVENTION TITLE: "Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors"
The following statement is a full description of this invention, including the best method of performing it known to us:
PACKAGING CELL LINES FOR USE IN FACILITATING THE DEVELOPMENT OF HIGH-CAPACITY ADENOVIRAL VECTORS
This is a divisional of Australian patent application No. 46241/97, the disclosure of which is included herein in its entirety by way of reference.
This invention was made with U.S. government support under NIH Grant No. HL 54352. The government has certain rights in the invention.
The present invention relates to gene therapy, especially to adenovirus-based gene therapy. In particular, novel packaging cell lines are disclosed, for use in facilitating the development of high-capacity vectors. High-capacity adenovirus vectors are also disclosed herein, as are related compositions, kits, and methods of preparation and use of the disclosed vectors, cell lines and kits.
Enhanced transfer of DNA conjugates into cells has been achieved with adenovirus, a human DNA virus which readily infects epithelial cells (Horwitz, "Adenoviridae and their replication", in Virology, Fields and Knipe, eds., Raven Press, NY (1990) pp. 1679-1740).
Although adenovirus-mediated gene therapy represents an improved method of DNA transfer into cells, a potential limitation of this approach is that adenovirus replication results in disruption of the host cell. In addition, adenovirus also possesses oncogenic properties, including the ability of one of its proteins to bind to tumor suppressor gene products. The use of so-called replication-defective strains of adenovirus (which typically possess E1A and/or E1B deletions that render the virus unable to replicate in host cells) is in principle more suitable for in vivo therapy; however, the potential for co-infection of epithelial cells with wild-type strains of virus, resulting in transactivation of the recombinant virus, may represent a significant safety concern for in vivo applications. Furthermore, it is not yet known which recombinant adenoviruses are capable of integrating their genome into host cell DNA, allowing for long-term stable expression of any foreign genes they may be transporting.
Another undesirable aspect of using intact or replication-competent adenovirus as a gene transfer means is that it is an oncogenic virus whose gene products are known to interfere with the function of host cell tumor suppressor proteins as well as immune recognition molecules, such as the major histocompatibility complex (MHC). In addition, pre-existing circulating antibodies to adenovirus may significantly reduce the efficiency of in vivo gene delivery. Lastly, only a foreign gene of 6 kilobases (kb) or less can be incorporated into the intact adenovirus genome for gene transfer experiments, whereas DNA segments of greater than 15 kb can be transferred using the methods of this invention.
In order to make Ad vectors more replication-incompetent, some investigators have attempted to construct recombinant Ad-derived vectors which have nearly all of their genome deleted, except for portions known to be required for packaging of virus particles. For example, helper-dependent vectors lacking all viral ORFs but including essential cis elements (the inverted terminal repeats, or ITRs, and the contiguous packaging sequence) have been constructed, but the virions package less efficiently than the helper and package as multimers part of the time, which suggests that the virus may "want" to package a fuller DNA complement (see Fisher, et al., Virology 217: 11-22 (1996)). Mitani et al. (PNAS USA 92: 3854-3858 (1995)) also describe a helper-dependent Ad vector that was apparently not completely replication-defective.
Amalfitano, et al. (PNAS USA 93: 3352-3356 (1996)) describe the construction of Ad packaging cell lines that support the growth of E1- and polymerase-deleted Ad vectors, in an effort to block the replication of Ad vectors in vivo. Similarly, Armentano, et al. (Hum. Gene Ther. 6: 1343-53 (1995)) describe Ad vectors with most but not all of the E4 sequence deleted therefrom. However, since such a small amount of genetic material is deleted from the vectors, their ability to transport therapeutic sequences is rather limited.
In view of the aforementioned problems, the design and construction of the within-disclosed packaging cell lines and systems provides a novel and elegant solution, as described further herein. The use of the recombinant sequences and vectors of this invention to mediate the transfer of foreign genes into recipient cells both in vitro and in vivo overcomes the limitations of the above-described gene transfer systems. This invention utilizes recombinant constructs which duplicate the cell receptor binding and DNA delivery properties of intact adenovirus virions and thus represents an improved method for gene therapy as well as for antisense-based antiviral therapy.
In contrast to the disadvantages of using intact adenovirus, modified adenovirus vectors requiring a helper plasmid or virus, or so-called replication-deficient adenovirus, the use of recombinant adenovirus-derived vectors according to the present invention provides certain advantages for gene delivery. First, the Ad-derived vectors of the present invention possess all of the functional properties required for gene therapy, including binding to epithelial cell receptors and penetration of endocytic vesicles. Therapeutic viral vectors of the present invention may also be engineered to target the receptors of, and achieve penetration of, non-epithelial cells; means of engineering viral vectors to accomplish these ends are described in detail hereinbelow.
Second, the vectors of the present invention have deletions of substantial portions of the Ad genome, which not only limits the ability of the Ad-derived vectors to "spread" to other host cells or tissues, but allows significant amounts of "foreign" (or non-native) nucleic acids to be incorporated into the viral genome without interfering with the reproduction and packaging of the viral genome. Therefore, the vectors of the present invention are ideal for use in a wide variety of therapeutic applications.
Third, while the vectors disclosed herein are safe for use as therapeutic agents in the treatment of a variety of human afflictions, they do not require the presence of any "helpers" for propagation and packaging, largely because of the novel cell lines in which they are reproduced. Such cell lines, referred to herein as packaging cell lines, comprise yet another aspect of the invention.
To reduce the frequency of contamination with wild-type adenovirus, it is desirable to improve either the viral vector or the cell line to reduce the probability of recombination. For example, an adenovirus from a group with less homology to the group C viruses may be used to engineer recombinant viruses with little propensity for recombination with the Ad5 sequence in 293 cells. Similarly, an epithelial cell line (293 or another) may be prepared according to within-disclosed methods which stably expresses adenovirus proteins or polypeptides from Ad3 and/or proteins or polypeptides from another non-group-C or group C serotype; such a cell line would be useful for supporting adenovirus-derived viral vectors bearing deletions of regulatory and/or structural genes, irrespective of the serotype from which such a vector was derived.
It is also contemplated that the constructs and methods of the present invention will support the design and engineering of chimeric viral vectors which express amino acid residue sequences derived from two or more Ad serotypes. Thus, unlike methods and constructs available prior to the advent of the present disclosure, this invention allows the greatest possible flexibility in the design and preparation of useful viral vectors and of cell lines which support their construction and propagation, all with a decreased risk of recombining with wild-type Ad to produce potentially harmful recombinants.
In part, the present invention discloses a simpler, alternative means of reducing the recombination between viral and cellular sequences than those discussed in the art. One such means is to increase the size of the deletion in the recombinant virus and thereby reduce the extent of shared sequences between that virus and any Ad genes present in a packaging cell line (for example, the Ad5 genes in 293 cells, or the various Ad genes in the novel cell lines of the present invention).
Deletions of all or portions of structural genes of the adenovirus have been considered undesirable because of the anticipated deleterious effects such deletions would have on viral reproduction and packaging. Indeed, the use of "helper" viruses or plasmids has often been recommended when using Ad-derived vectors containing large deletions in structural protein sequences precisely for this reason.
Contrary to what has been suggested in the art, however, this invention discloses and claims the preparation, propagation and use of recombinant Ad-derived vectors having deletions of all or part of various gene sequences encoding Ad structural proteins, both as a means of reducing the risk of wild-type adenovirus contamination in virus preparations and as a means of allowing foreign DNA to be packaged in such vectors for a variety of diagnostic and therapeutic applications.
Thus, in one embodiment of the present invention, a packaging cell line wherein DNA sequences encoding one or more adenovirus regulatory polypeptides and DNA sequences encoding one or more adenovirus structural polypeptides have been stably integrated into the cellular genome is disclosed.
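The complementation idea behind such a packaging cell line can be sketched as a simple set comparison: a vector should be able to propagate without a helper only if every viral function deleted from the vector is supplied by the cell line. The Python sketch below is purely illustrative. The gene names come from the text, but the function names, the example cell line and the example vectors are hypothetical, and the model deliberately ignores expression levels, packaging-size limits and every other biological detail.

```python
# Illustrative sketch only. Gene names (E1, E4, fiber) come from the text;
# the cell line and vectors defined below are hypothetical examples.

def uncomplemented(vector_deletions, cell_line_supplies):
    """Return the deleted viral functions that the cell line does not supply."""
    return sorted(set(vector_deletions) - set(cell_line_supplies))


def can_propagate_without_helper(vector_deletions, cell_line_supplies):
    """True when every function deleted from the vector is supplied by the cell line."""
    return not uncomplemented(vector_deletions, cell_line_supplies)


# Hypothetical packaging cell line with one regulatory gene (E1) and
# one structural gene (fiber) stably integrated.
cell_line = {"E1", "fiber"}

vector_a = {"E1", "fiber"}        # both deletions are complemented
vector_b = {"E1", "E4", "fiber"}  # the E4 deletion is not complemented

print(can_propagate_without_helper(vector_a, cell_line))  # True
print(uncomplemented(vector_b, cell_line))                # ['E4']
```

Under these assumptions, a vector with a larger set of deletions simply calls for a cell line that supplies a correspondingly larger set of regulatory and structural functions.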
Thus, in a further embodiment of the present invention, a packaging cell line expressing one or more adenovirus structural proteins, polypeptides, or fragments thereof is disclosed, wherein said structural protein is selected from the group consisting of: (a) penton base; (b) hexon; (c) fiber; (d) polypeptide IIIa; (e) polypeptide V; (f) polypeptide VI; (g) polypeptide VII; (h) polypeptide VIII; and (i) biologically active fragments thereof.
In one variation, the sequences are constitutively expressed; in another, one or more sequences is under the control of a regulatable promoter. In a preferred embodiment expression is constitutive. In various preferred embodiments, the polypeptides expressed by the DNA sequences are biologically active.
In a further and preferred embodiment the packaging cell line of the present invention supports the production of a viral vector. In a preferred embodiment the viral vector is a therapeutic vector.
In one aspect of the present invention, each DNA sequence is introduced into the genome of the within-disclosed cell lines via a separate complementing plasmid. In other embodiments, two or more DNA sequences are introduced into the genome via a single complementing plasmid. In one variation, the complementing plasmid comprises a DNA sequence encoding adenovirus fiber protein, polypeptide or fragment thereof. An example of a useful complementing plasmid according to the present invention is a plasmid having the characteristics of pCLF (for deposit details, see Example 3).
In another aspect of the present invention, the complementing plasmid used to transform a cell line of the present invention further comprises a DNA sequence encoding an adenovirus regulatory protein, polypeptide or fragment thereof. In one variation, the regulatory protein is selected from the group consisting of E1A, E1B, E2A, E2B, E3, E4 and L4 (also referred to as "the 100K protein"); an exemplary complementing plasmid has the characteristics of pE4/Hygro (for deposit details, see the Examples).
In another aspect, the complementing plasmid used to transform a cell line of the present invention further comprises a DNA sequence encoding two or more of the above-mentioned adenovirus regulatory proteins, polypeptides or fragments thereof.
In one variation, the two or more regulatory proteins, polypeptides or fragments thereof are selected from the group consisting of E1A, E1B, E2A, E2B, E3, E4 and L4 (also referred to as "the 100K protein"). In another variation, the structural protein is selected from the group consisting of penton base; hexon; fiber; polypeptide IIIa; polypeptide V; polypeptide VI; polypeptide VII; polypeptide VIII; and biologically active fragments thereof.
In one variation of the present invention, a packaging cell line expresses fiber protein.
In one embodiment, the fiber protein has been modified to include a non-native amino acid residue sequence which targets a specific receptor, but which does not disrupt trimer formation or transport of fiber into the nucleus. In another variation, the non-native amino acid residue sequence alters the binding specificity of the fiber for a targeted cell type. In still another embodiment, the structural protein is fiber comprising amino acid residue sequences from more than one adenovirus serotype. As disclosed herein, the nucleotide sequences encoding fiber protein or polypeptide need not be modified solely at one or both termini; fiber protein (and indeed, any of the adenovirus structural proteins, as taught herein) may be modified "internally" as well as at the termini.
The present invention also discloses a packaging cell line wherein the viral vector produced in that cell line comprises a nucleic acid sequence having a deletion or mutation of a DNA sequence encoding an adenovirus structural protein, polypeptide, or fragment thereof. In one variation, the viral vector further comprises a nucleic acid sequence having a deletion or mutation of the DNA sequences encoding regulatory polypeptides E1A and E1B. In another variation, the viral vector further comprises a nucleic acid sequence having a deletion or mutation of a DNA sequence encoding one or more of the following regulatory proteins or polypeptides: E2A, E2B, E3, E4, L4, or fragments thereof. Yet another variation discloses that a foreign DNA sequence encoding one or more foreign proteins, polypeptides or fragments thereof has been inserted in place of any of the deletions in the therapeutic viral vector. In one embodiment, the foreign DNA encodes a tumor-suppressor protein or a biologically active fragment thereof. In another embodiment, the foreign DNA encodes a suicide protein or a biologically active fragment thereof. As before, cell lines as described herein may be procaryotic or eucaryotic in origin, with mammalian cell lines often being preferred. Epithelial and non-epithelial cell lines are useful in the aforementioned variations; some particularly useful cell lines include 293, A549, W162, HeLa, Vero, 211, and 211A cell lines.
The invention further contemplates that the aforementioned cell lines support the production of viral vectors in which foreign DNA sequences encoding one or more foreign proteins, polypeptides or fragments thereof have been inserted in place of any structural and/or regulatory proteins (or portions thereof) that have been deleted. Thus, in one embodiment, the foreign DNA encodes a tumor-suppressor protein; a suicide protein; a cystic fibrosis transmembrane conductance regulator (CFTR) protein; or a biologically active fragment of any of them.
Any of the within-disclosed cell lines may have a DNA sequence encoding all or part of a fiber protein including modified or chimeric proteins stably integrated into the genome.
Thus, in one variation, the fiber protein has been modified to include a non-native amino acid residue sequence which targets a specific receptor, but which does not disrupt trimer formation or transport of fiber into the nucleus. In one variation, the non-native amino acid residue sequence is coupled to the carboxyl terminus of the fiber. In yet another, the non-native amino acid residue sequence further includes a linker sequence. Alternatively, the fiber protein further comprises a ligand coupled to the linker. A suitable ligand may be selected from the group consisting of ligands that specifically bind to a cell surface receptor and ligands that can be used to couple other proteins or nucleic acid molecules.
In yet another embodiment, the non-native amino acid residue sequence is incorporated into the fiber amino acid residue sequence at a location other than one of the fiber termini.
Alternatively, the non-native amino acid residue sequence alters the binding specificity of the fiber for a targeted cell type. In other embodiments, the linker sequence alters the binding specificity of the fiber for a targeted cell type. The expressed fiber may, in various embodiments, bind to a specific targeted cell type not usually targeted by adenovirus and/or may comprise amino acid residue sequences from more than one adenovirus serotype.
In one aspect of the present invention, a packaging cell line of the present invention is derived from a procaryotic cell line; in another, it is derived from a eucaryotic cell line. While various embodiments suggest the use of mammalian cells, and more particularly, epithelial cell lines, a variety of other, non-epithelial cell lines are used in various embodiments.
Thus, while various embodiments disclose the use of a cell line selected from the group consisting of 293, A549, W162, HeLa, Vero, 211, and 211A cell lines, it is understood that various other cell lines are likewise contemplated for use as disclosed herein.
The invention further discloses a wide variety of nucleic acid sequences and viral vectors. Thus, in one embodiment, the invention discloses a nucleic acid sequence encoding any one of the aforementioned adenovirus fiber proteins, polypeptides or fragments thereof including, without limitation, those that include deletions or other mutations; those that are chimeric; and those that have linkers, foreign amino acid residues, or other molecules attached for various purposes as disclosed herein. Nucleic acid sequences encoding various other adenovirus structural and/or regulatory proteins or polypeptides are also within the scope of the present invention.
A wide variety of therapeutic viral vectors are also embodiments of the present invention. In one embodiment, a therapeutic viral vector is disclosed which lacks a DNA sequence encoding fiber protein, or a portion thereof. In another variation, a therapeutic viral vector may further or alternatively comprise deletion of a DNA sequence encoding one or more regulatory proteins, polypeptides, or fragments thereof. In various embodiments, foreign DNA sequences are inserted in place of the DNA sequence encoding fiber protein in the viral vectors of the present invention. In other embodiments, the therapeutic viral vectors further comprise foreign DNA sequences inserted in place of the DNA sequences encoding one or more regulatory proteins, polypeptides, or fragments thereof, and/or one or more structural proteins, polypeptides, or fragments thereof.
The present invention further discloses a number of viral vectors. In one variation, a viral vector comprises a deletion or mutation of a DNA sequence encoding an adenovirus structural protein, polypeptide, or fragment thereof. A vector may further comprise deletion or mutation of the DNA sequences encoding regulatory polypeptides E1A and E1B; and it may still further comprise deletion or mutation of the DNA sequence encoding one or more of the following regulatory proteins or polypeptides: E2A, E2B, E3, E4, L4, or fragments thereof.
In another variation, in a viral vector of the present invention, the structural protein comprises fiber. Any combination of the foregoing is also contemplated by the present invention. The viral vectors of the present invention are suitable for the preparation of pharmaceutical compositions; pharmaceutical compositions comprising any of the therapeutic viral vectors disclosed herein, including combinations thereof, are also disclosed herein. A further use of the viral vectors of the present invention is for targeting specific cells in a cell population comprising different cell types.
The invention further discloses complementing plasmids and methods of making same.
In one embodiment, a complementing plasmid comprises a promoter nucleotide sequence operatively linked to a nucleotide sequence encoding an adenovirus structural polypeptide. In one variation, the complementing plasmid comprises pCLF. In another variation, a complementing plasmid further comprises a nucleotide sequence encoding a first adenovirus regulatory polypeptide, a nucleotide sequence encoding a second regulatory polypeptide, a nucleotide sequence encoding a third regulatory polypeptide; or any combination of the foregoing. In still another embodiment, the adenovirus structural polypeptide is selected from the group consisting of penton base; hexon; fiber; polypeptide IIIa; polypeptide V; polypeptide VI; polypeptide VII; polypeptide VIII; and biologically active fragments thereof.
The present invention also discloses a complementing plasmid comprising a promoter nucleotide sequence operatively linked to a nucleotide sequence encoding an adenovirus structural protein, polypeptide or fragment thereof and a nucleotide sequence encoding an adenovirus regulatory protein, polypeptide or fragment thereof. In one variation, the early region polypeptide is E4; in another, the plasmid comprises pE4/Hygro. In still another variation, the early region polypeptides are E1 and E4. Complementing plasmids further comprising a nucleotide sequence encoding an adenovirus structural protein, polypeptide or fragment thereof are also contemplated, as are plasmids wherein the promoter nucleotide sequence is selected from the group consisting of MMTV, CMV and E4 promoter nucleotide sequences.
Viral vectors are also disclosed which comprise nucleotide sequences encoding a packaging signal and a foreign protein or polypeptide, wherein the nucleotide sequence encoding an adenovirus structural protein has been deleted. In one variation, the nucleotide sequence encoding the foreign protein or polypeptide is a DNA molecule up to about 3 kb in length; in another, the nucleotide sequence encoding the foreign protein or polypeptide is a DNA molecule up to about 9.5 kb in length; in still another, the nucleotide sequence encoding the foreign protein or polypeptide is a DNA molecule up to about 12.5 kb in length.
Nucleotide sequences of intermediate lengths are also contemplated by the present invention, as are sequences in excess of 12.5 kb.
The invention also discloses viral vectors wherein the sequence encoding a foreign protein or polypeptide is a sequence encoding an anti-tumor agent, a tumor suppressor protein, a suicide protein, or a fragment or functional equivalent thereof. In one variation, nucleotide sequences encoding one or more regulatory proteins have also been deleted from the vector.
In another, the regulatory proteins are selected from the group consisting of E1A, E1B, E2A, E2B, E3, E4, and L4 (100K protein).
In various embodiments, the adenovirus is a Group C adenovirus selected from serotypes 1, 2, 5, or 6; in other embodiments, adenoviruses selected from other serotypes are useful as disclosed herein. The invention also discloses useful vaccines comprising a viral vector according to any of the foregoing specifications, and a pharmaceutically acceptable carrier or excipient.
Various useful compositions are also disclosed herein. One embodiment discloses a composition useful in the preparation of recombinant adenovirus viral vectors comprising a cell containing a delivery plasmid comprising an adenovirus genome lacking a nucleotide sequence encoding fiber. In one variation, the cell further comprises a complementing plasmid containing a nucleotide sequence encoding fiber, the plasmid being stably integrated into the cellular genome of the cell. In another variation, the delivery plasmid further comprises a nucleotide sequence encoding a foreign polypeptide. In one variation, the delivery plasmid is pDV44, pΔE1Bβgal, or pΔE1sp1B.
In another embodiment, the polypeptide is a therapeutic molecule.
Compositions useful in the preparation of recombinant adenovirus viral vectors are also disclosed. In one embodiment, a composition comprises a cell containing a first delivery plasmid comprising an adenovirus genome lacking a nucleotide sequence encoding fiber and incapable of directing the packaging of new viral particles in the absence of a second delivery plasmid; and a second delivery plasmid comprising an adenoviral genome capable of directing the packaging of new viral particles in the presence of the first delivery plasmid.
In another variation, the first and second delivery plasmids interact within the cell to produce a therapeutic viral vector. In yet another, the cell further comprises a complementing plasmid containing a nucleotide sequence encoding fiber, the plasmid being stably integrated into the cellular genome of the cell. In still another, the first or second delivery plasmid further comprises a nucleotide sequence encoding a foreign polypeptide. In various embodiments, the polypeptide is a therapeutic molecule.
Another embodiment discloses a composition as before, wherein the first delivery plasmid lacks adenovirus packaging signal sequences. In another aspect, the second delivery plasmid contains a LacZ reporter construct. Another variation discloses that the second delivery plasmid further lacks a nucleotide sequence encoding an adenovirus regulatory protein. In one variation, the regulatory protein is E1. In one embodiment of the above-noted compositions, the complementing plasmid has the characteristics of pCLF.
In another embodiment, a composition is disclosed wherein the first delivery plasmid lacks a nucleotide sequence encoding an adenovirus structural protein and the second delivery plasmid lacks a nucleotide sequence encoding adenovirus E1 protein. In another, the first delivery plasmid lacks a nucleotide sequence encoding adenovirus E4 protein and the second delivery plasmid lacks a nucleotide sequence encoding adenovirus E1 protein. In still another, the cell contains at least one complementing plasmid encoding an adenoviral regulatory protein and a structural protein.
In alternative embodiments, the regulatory protein is E4 and the structural protein is fiber; or the regulatory protein is E1 and the structural protein is fiber. In still another embodiment, the adenoviral regulatory protein and the structural protein are encoded by separate complementing plasmids.
Another variation discloses a composition wherein the cell is selected from the group consisting of 293, A549, W162, HeLa, Vero, 211, and 211A. In another embodiment, the delivery plasmid is DV1, pΔE1Bβgal, or pΔE1sp1B.
Various methods of making and using the vectors, plasmids, cell lines and other compositions and constructs of the present invention are also disclosed herein. The following methods are considered exemplary and not limiting.
Thus, in one variation, the invention discloses a method of constructing therapeutic viral vectors, comprising introducing a delivery plasmid into an Ad fiber-expressing complementing cell line, wherein the DNA sequence encoding Ad fiber protein has been deleted from the delivery plasmid. In one variation, the delivery plasmid further includes a DNA sequence encoding a foreign protein, polypeptide, or fragment thereof. In other embodiments, the delivery plasmid is DV1, pΔE1Bβgal, pΔE1sp1B, or a similar construct.
The invention also discloses methods of transforming a pathologic hyperproliferative mammalian cell comprising contacting the cell with any of the vectors described herein. In another embodiment, methods of infecting a mammalian target cell with a viral vector containing a preselected foreign nucleotide sequence are disclosed. One such variation comprises the following steps: (a) infecting the target cell with a viral vector of the present invention, the viral vector carrying a preselected foreign nucleotide sequence; and (b) expressing the foreign nucleotide sequence in the targeted cell.
The invention also encompasses mammalian target cells infected with a preselected foreign nucleotide sequence produced by the methods disclosed herein. In one variation, the target cells are selected from the group consisting of replicating, slow-replicating and non-replicating human cells.
Methods of treating an acquired or hereditary disease are also disclosed. One method comprises (a) administering a pharmaceutically acceptable dose of a viral vector to a target cell, wherein the vector comprises a preselected therapeutic nucleotide sequence; and (b) expressing the therapeutic sequence in the target cell for a time period sufficient to ameliorate the acquired or hereditary disease in the cell. Methods of gene therapy comprising administering to a subject an effective amount of a therapeutic viral vector produced by a packaging cell line of the present invention are also disclosed.
Also contemplated by the present invention are various methods of inhibiting the proliferation of a tumor in a subject comprising administering an effective amount of a therapeutic viral vector of the present invention under suitable conditions to the subject. In one variation, the gene encodes an anti-tumor agent. In another variation, the agent is a tumor-suppressor gene. In still another embodiment, the agent is a suicide gene or a functional equivalent thereof. In another variation, the vector is administered via intra-tumoral injection.
The invention also discloses systems or kits for use in any of the aforementioned methods. The systems or kits may contain any appropriate combination of the within-described vectors, plasmids, cell lines, and additional therapeutic agents as disclosed.
Preferably, each such kit or system includes a quantity of the appropriate therapeutic substance or sequence sufficient for at least one administration, and instructions for administration and use. Thus, one system further comprises an effective amount of a therapeutic agent which enhances the therapeutic effect of the therapeutic viral vector-containing composition.
Another variation discloses that the composition and the therapeutic agent are each included in a separate receptacle or container.
It will also be appreciated that any combination of the preceding elements may also be efficacious as described herein, and that all related methods are also within the scope of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic diagram of the entire adenoviral E4 transcriptional unit with the open reading frames (ORF) indicated by blocked segments along with the promoter and terminator sequences. The locations of primers for amplifying specific portions of E4 are also indicated, as further described in Example 1A.
Figure 2 is a schematic map of plasmid pE4/Hygro as further described in Example 1B.
Figure 3 is a schematic map of plasmid pCDNA3/Fiber as further described in Example 1B.
Figure 4 is a schematic map of plasmid pCLF as further described in Example 1B.
Figure 5 is a photograph of a Southern blot showing the presence of intact adenovirus E4 3.1 kilobase (kb) insert in the 211 cell line as further described in Example 1C.
Figure 6 is a photograph of a Western blot showing labeled fiber protein detected under native and denaturing electrophoresis conditions as described in Example 1C. The 293 cells lack fiber while the sublines 211A, 211B and 211R contain fiber protein detectable in functional trimerized form and denatured monomeric form.
Figure 7 is a schematic map of plasmid pDEX/E as further described in Example 1D.
Figure 8 is a schematic map of plasmid pE1/Fiber as further described in Example 1F(1).
Figure 9 is a schematic map of plasmid pE4/Fiber as further described in Example 1F(2).
Figure 10 is a schematic illustration of linearized pAElBgal delivery plasmids for use in cotransfection and recombination to form a recombinant adenoviral vector having multiple adenoviral gene deletions. The plasmids and recombination event are more fully described in Example 2A.
Figure 11 is a schematic of plasmid p11.3 as further described in Example 2A used in the construction of pDV44 delivery plasmid with plasmid p8.2.
Figure 12 is a schematic of plasmid p8.2 as further described in Example 2A used in the construction of pDV44 delivery plasmid with plasmid p11.3.
Figure 13. Trimeric structure of the recombinant fiber: 293, 211A, 211B, or 211R cells as indicated were metabolically labeled with [35S]methionine, soluble protein extracts were prepared, and fiber was immunoprecipitated. A portion of the precipitated protein was electrophoresed on an 8% SDS-PAGE gel under either semi-native or denaturing conditions. The positions of trimeric and monomeric fiber are indicated. As a control for electrophoretic conditions, recombinant Ad2 fiber produced in baculovirus-infected cells was run under identical conditions and stained with Coomassie blue.
Fig. 14. Complementation of a fiber mutant adenovirus by fiber-producing cells: The cell lines indicated (2 x 10^6 cells per sample) were infected with the temperature-sensitive fiber mutant adenovirus H5ts142 at 10 PFU/cell and incubated at either the permissive (32.5 °C, stippled bars) or the restrictive (39.5 °C, solid bars) temperature. 48 hours post-infection, virus was isolated by freeze-thaw lysis and yields were determined by fluorescent focus assay on SW480 cells. Each value represents the mean of duplicate samples, and the data shown are representative of multiple experiments.
Fig. 15. Incorporation of the recombinant Ad5 fiber into Ad3 particles: A) Alignment of the N-terminal (penton base-binding) domains of fiber proteins from several different adenovirus serotypes.
B) Type 3 adenovirus was propagated in 293, 211B, or 211R cells as indicated and purified by two sequential CsCl centrifugations. 10 mg of the purified viral particles was then electrophoresed under denaturing conditions and transferred to a PVDF membrane. Ad5 fiber was detected with a polyclonal rabbit antibody raised against recombinant Ad2 fiber. As a positive control for detection, 400 ng of wild-type Ad2 was run in the lane marked 'Ad2'. Under these conditions, the mobilities of the Ad2 and Ad5 fibers are indistinguishable and the antibody reacts with both proteins.
Fig. 16. Nuclear localization of the recombinant fiber protein in three packaging cell lines: Cells were grown on 8-well chamber slides, stained with a rabbit anti-fiber polyclonal antibody and visualized with a FITC-conjugated goat anti-rabbit antibody. A) Line 211A. B) Line 211B. C) Line 211R. D) 293 cells (negative control). E) 293 cells infected with Ad.RSVbgal at 1 pfu/cell and stained 24 hours post-infection (positive control). F) Infected cells prepared as in E) but stained without the primary antibody.
DETAILED DESCRIPTION
To reduce the frequency of contamination with wild-type adenovirus, it is considered desirable to improve either the viral vector or the cell line to reduce the probability of recombination. For example, an adenovirus from a group with less homology to the group C viruses may be used to engineer recombinant viruses with little propensity for recombination with the Ad5 sequence in 293 cells. Similarly, an epithelial cell line (e.g., the cell line known as 293) may be used or further modified according to within-disclosed methods so that it stably expresses adenovirus proteins or polypeptides from Ad3 and/or proteins or polypeptides from another non-group-C or group C serotype; such a cell line would be useful to support adenovirus-derived viral vectors bearing deletions of regulatory and/or structural genes, irrespective of the serotype from which such a vector was derived.
It is also contemplated that the constructs and methods of the present invention will support the design and engineering of chimeric viral vectors which express amino acid residue sequences derived from two or more Ad serotypes. Thus, unlike methods and constructs available prior to the advent of the present disclosure, this invention allows the greatest possible flexibility in the design and preparation of useful viral vectors and cell lines which support their construction and propagation all with a decreased risk of recombining with wild-type Ad to produce potentially-harmful recombinants.
In part, the present invention discloses a simpler, alternative means of reducing the recombination between viral and cellular sequences than those discussed in the art. One such means is to increase the size of the deletion in the recombinant virus and thereby reduce the extent of shared sequences between that virus and any Ad genes present in a packaging cell line (e.g., the Ad5 genes in 293 cells, or the various Ad genes in the novel cell lines of the present invention).
By the term "substantially homologous" is meant having at least 80%, preferably at least 90%, most preferably at least 95% homology therewith.
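The thresholds above can be illustrated with a minimal sketch. It assumes that "homology" is measured as simple percent identity over two pre-aligned sequences of equal length; the specification does not say how homology is computed, and the function names and tier labels below are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences match.

    Assumption: the sequences are already aligned; gaps are not handled.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def homology_tier(pct: float) -> str:
    """Map a percent value onto the 80 / 90 / 95 thresholds quoted in the text."""
    if pct >= 95.0:
        return "most preferred"
    if pct >= 90.0:
        return "preferred"
    if pct >= 80.0:
        return "substantially homologous"
    return "not substantially homologous"


if __name__ == "__main__":
    pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")  # 9 of 10 positions match
    print(pct, homology_tier(pct))
```

A real comparison would first align the sequences with an alignment tool; this sketch only shows how the three thresholds nest.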
The amino acid residues described herein are preferably in the "L" isomeric form. However, residues in the "D" isomeric form can be substituted for any L-amino acid residue, as long as the desired functional property is retained by the polypeptide. NH2 refers to the free amino group present at the amino terminus of a polypeptide.
DNA Homolog: A nucleic acid having a preselected conserved nucleotide sequence and a sequence encoding a preferred polypeptide according to the present invention.
Foreign Gene: This term is used to identify a DNA molecule not present in the exact orientation and position as the counterpart DNA molecule found in wild-type adenovirus. It may also refer to a DNA molecule from another Ad serotype or from an entirely different species (e.g., a human DNA sequence).
Penton: The terms "penton" or "penton complex" are preferentially used herein to designate a complex of penton base and fiber. The term "penton" may also be used to indicate penton base, as well as penton complex. The meaning of the term "penton" alone should be clear from the context within which it is used.
Polypeptide and Peptide: These terms are used interchangeably herein to designate a series of no more than about 50 amino acid residues connected one to the other by peptide bonds between the alpha-amino and carboxy groups of adjacent residues.
Receptor: Receptor is a term used herein to indicate a biologically active molecule that specifically binds to (or with) other molecules. The term "receptor protein" may be used to more specifically indicate the proteinaceous nature of a specific receptor.
Transgene or Therapeutic Nucleotide Sequence: As described and claimed herein, such a sequence includes DNA and RNA sequences encoding an RNA or polypeptide. Such sequences may be "native" or naturally-derived sequences; they may also be "non-native" or "foreign" sequences which are naturally- or recombinantly-derived. The term "transgene," which may be used interchangeably herein with the term "therapeutic nucleotide sequence," is often used to describe a heterologous or foreign (exogenous) gene that is carried by a viral vector and transduced into a host cell.
Therefore, therapeutic nucleotide sequences include antisense sequences or nucleotide sequences which may be transcribed into antisense sequences. Therapeutic nucleotide sequences (or transgenes) further comprise sequences which function to produce a desired effect in the cell or cell nucleus into which said therapeutic sequences are delivered. For example, a therapeutic nucleotide sequence may encode a functional protein intended for delivery into a cell which is unable to produce that functional protein.
Expression or Delivery Vector: Any plasmid or virus into which a foreign DNA may be inserted for expression in a suitable host cell (i.e., the protein or polypeptide encoded by the DNA is synthesized in the host cell's system). Vectors capable of directing the expression of DNA segments (genes) encoding one or more proteins are referred to herein as "expression vectors". Also included are vectors which allow cloning of cDNA (complementary DNA) from mRNAs produced using reverse transcriptase.
Adenoviral Vector or Ad-Derived Vector: Any adenovirus-derived plasmid or virus into which a foreign DNA may be inserted or expressed. This term may also be used interchangeably with "viral vector". This "type" of vector may be utilized to carry nucleotide sequences encoding therapeutic proteins or polypeptides to specific cells or cell types in a subject in need of treatment, as described further hereinbelow.
Complementing Plasmid: This term is generally used herein to describe plasmid vectors used to deliver particular nucleotide sequences into a packaging cell line, with the intent of having said sequences stably integrate into the cellular genome.
Delivery Plasmid: This term is generally used herein to describe a plasmid vector that carries or delivers nucleotide sequences in or into a cell line (e.g., a packaging cell line) for the purpose of propagating therapeutic viral vectors of the present invention.
The adenovirus (Ad) particle is relatively complex and may be resolved into various substructures. The outer shell is strikingly icosahedral in shape and, at first glance, appears to have a triangulation number of 25. The structures at the fivefold positions ("pentons") are different from the rest ("hexons"), however, and the hexons are chemically trimers rather than hexamers. Thus, the structure really does not correspond to a simple sub-triangulated icosahedral design. (See, Fields, et al., Virology, Vol. I, Raven Press, NY, pp. 54-56 (1990).)
Fiber plays a crucial role in adenovirus infection by attaching the virus to a specific receptor on the cell surface. The fiber consists of three domains: an N-terminal tail that interacts with penton base; a shaft composed of 22 repeats of a 15-amino-acid segment that forms β-sheet and β-bends; and a knob at the C-terminus that contains the type-specific antigen and is responsible for binding to the cell surface receptor. The fiber protein is also responsible for transport of viral nucleic acids into the nucleus. The gene encoding the fiber protein from Ad2 has been expressed in human cells and has been shown to be correctly assembled into trimers, glycosylated and transported to the nucleus. (See, Hong and Engler, Virology 185: 758-761 (1991).) Thus, alteration of gene delivery mediated by recombinant adenovirus vectors to specific cell types has great utility for a variety of gene therapy applications and is thus one of the objects of the present invention.
Hexon and penton capsomeres are the major components on the surface of the virion. Their constituent polypeptides, nos. II, III and IV, contain tyrosine residues that are exposed on the surface of the virion and can be labeled by iodination of intact particles.
The fiber is an elongated protein which exists as a trimer of three identical polypeptides (polypeptide IV) of 582 amino acids in length. The N-terminus of the fiber mediates binding to the penton base to form what is generally called the penton capsomere. The C-terminus of the fiber is involved in initial binding of the virus to cellular receptors.
The 35,000+ base pair (bp) genome of adenovirus type 2 has been sequenced and the predicted amino acid sequences of the major coat proteins (hexon, fiber and penton base) have been described. (See, Neumann et al., Gene 69: 153-157 (1988); Herisse et al., Nuc. Acids Res. 9: 4023-4041 (1981); Roberts et al., J. Biol. Chem. 259: 13968-13975 (1984); Kinloch et al., J. Biol. Chem. 259: 6431-6436 (1984); and Chroboczek et al., Virol. 161: 549-554 (1987).) The sequence of Ad5 DNA was completed more recently; its sequence includes a total of 35,935 bp. Portions of many other adenovirus genomes have also been sequenced. It is presently understood that the upper packaging limit for adenovirus virions is about 105% of the wild-type genome length. (See, Bett, et al., J. Virol. 67(10): 5911-21 (1993).) Thus, for Ad2 and Ad5, this would be an upper packaging limit of about 38 kb of DNA.
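The arithmetic behind the "about 38 kb" figure can be sketched as follows. The 35,935 bp genome length and the 105% limit come from the text above; the function names and the 3,000 bp deletion used in the example are illustrative assumptions only.

```python
AD5_GENOME_BP = 35_935          # Ad5 genome length quoted in the text
PACKAGING_LIMIT_PERCENT = 105   # upper packaging limit, as a percent of wild type


def packaging_limit_bp(genome_bp: int, percent: int = PACKAGING_LIMIT_PERCENT) -> int:
    """Largest genome (in bp) that fits under the limit, rounded down."""
    return genome_bp * percent // 100


def insert_capacity_bp(genome_bp: int, deleted_bp: int,
                       percent: int = PACKAGING_LIMIT_PERCENT) -> int:
    """Room for foreign DNA after removing `deleted_bp` of viral sequence."""
    if not 0 <= deleted_bp <= genome_bp:
        raise ValueError("deleted_bp must lie between 0 and genome_bp")
    return packaging_limit_bp(genome_bp, percent) - (genome_bp - deleted_bp)


if __name__ == "__main__":
    print(packaging_limit_bp(AD5_GENOME_BP))         # 37731 bp, i.e. about 38 kb
    print(insert_capacity_bp(AD5_GENOME_BP, 3_000))  # hypothetical 3 kb deletion
```

With no deletion the spare capacity is only about 1.8 kb, which is why larger deletions in the vector translate directly into room for larger foreign sequences.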
Adenovirus DNA also includes inverted terminal repeat sequences (ITRs) ranging in size from about 100 to 150 bp, depending on the serotype. The inverted repeats enable single strands of viral DNA to circularize by base-pairing of their terminal sequences, and the resulting base-paired "panhandle" structures are thought to be important for replication of the viral DNA.
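The panhandle idea can be illustrated with a small sketch: a single strand can base-pair with itself at its ends when its last bases are the reverse complement of its first bases. The toy sequence below is far shorter than a real ITR of about 100 to 150 bp, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def can_form_panhandle(strand: str, itr_len: int) -> bool:
    """True if the two ends of a single strand can pair over `itr_len` bases."""
    if itr_len <= 0 or 2 * itr_len > len(strand):
        raise ValueError("itr_len must be positive and at most half the strand")
    left_end = strand[:itr_len]
    right_end = strand[-itr_len:]
    return right_end == reverse_complement(left_end)


if __name__ == "__main__":
    toy_itr = "CATCATCAATAA"   # illustrative 12-base terminal repeat
    internal = "GGGTTTCCC"     # stand-in for the rest of the genome
    strand = toy_itr + internal + reverse_complement(toy_itr)
    print(can_form_panhandle(strand, len(toy_itr)))
```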
For efficient packaging, the ITRs and the packaging signal (a few hundred bp in length) appear to comprise the "minimum requirement." Helper-dependent vectors lacking all viral ORFs but including these essential cis elements (the ITRs and contiguous packaging sequence) have been constructed, but the virions package less efficiently than the helper and package as multimers part of the time, which suggests that the virus may "want" to package a fuller DNA complement (see, Fisher, et al., Virology 217: 11-22 (1996)).
While some prefer to use replication-defective Ad viral vectors for fear that replication-competent vectors raise safety issues, the viral vectors of the present invention may retain their ability to express the genome packaged within (i.e., they may retain their "infectivity"), but they do not act as infectious agents to the extent that they cause disease in the subjects to whom they are administered for therapeutic purposes.
It is to be appreciated that the viral vectors of the present invention have several distinct advantages over adenoviral and Ad-derived vectors described in the art. For example, recombination of such vectors is rare; there are no known associations of human malignancies with adenoviral infections despite common human infection with adenoviruses; the genome may be manipulated to accommodate foreign genes of a fairly substantial size; and host proliferation is not required for expression of adenoviral proteins.
An extension of this invention is that the Ad-derived viral vectors disclosed herein may be used to target and deliver genes into specific cells by incorporating the attachment sequence for other receptors (such as CD4) onto the fiber protein by recombinant DNA techniques, thus producing a chimeric molecule. This should result in the ability to target and deliver genes into a wide range of cell types with the advantage of evading recognition by the host's immune system. The within-disclosed delivery systems thus provide for increased flexibility in gene design to enable stable integration into proliferating and nonproliferating cell types.
For example, published International App. No. WO 95/26412 and Krasnykh, et al. (J. Virol. 70: 6839-46 (1996)), the disclosures of which are incorporated by reference herein, describe modifications that may be made to the adenovirus fiber protein. Such modifications are useful in altering the targeting mechanism and specificity of adenovirus and could readily be utilized in conjunction with the constructs of the present invention to target the novel viral vectors disclosed herein to different receptors and different cells. Moreover, modifications to fiber protein which alter its tropism may permit greater control over the localization of viral vectors in therapeutic applications.
Similarly, incorporation of various structural proteins into cell lines of the present invention, whether or not those proteins are modified, is also contemplated by the present invention. Thus, for example, modified penton base polypeptides such as those described in Wickham, et al. (J. Virol. 70: 6831-8 (1996)) may have therapeutic utility when used according to the within-disclosed methods.
C. Packaging Cell Lines
The first generation of recombinant adenoviral vectors currently available tend to have a deletion in the first viral early gene region, which is generally referred to as E1 and which comprises the E1a and E1b regions. (These regions typically span genetic map units 1.30 to 9.24.) Figure 3 in chapter 67 of Fields Virology, 3d Ed. (Fields et al., Lippincott-Raven Publ., Philadelphia, (1996), p. 2116) illustrates a transcription and translation map of adenovirus type 2 (Ad2) that is a helpful example.
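To give a sense of scale for the map-unit coordinates quoted above, the following sketch converts map units to approximate base-pair positions. It assumes the usual convention that one map unit equals 1% of the genome length, and it uses the 35,935 bp Ad5 length given elsewhere in this description as the reference; exact coordinates differ between serotypes, and the function name is hypothetical.

```python
def map_units_to_bp(map_units: float, genome_bp: int) -> int:
    """Approximate bp position for a map-unit coordinate (1 unit = 1% of genome)."""
    if not 0.0 <= map_units <= 100.0:
        raise ValueError("map units run from 0 to 100")
    return round(genome_bp * map_units / 100.0)


if __name__ == "__main__":
    genome_bp = 35_935                        # assumed reference length
    start = map_units_to_bp(1.30, genome_bp)  # left boundary quoted in the text
    end = map_units_to_bp(9.24, genome_bp)    # right boundary quoted in the text
    print(start, end, end - start)            # approximate span in bp
```

On this assumption the quoted region spans roughly 2.9 kb near the left end of the genome.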
According to various published reports, deletion of the viral E1 region renders the recombinant adenovirus defective for replication and incapable of producing infectious viral particles in the subsequently-infected target cells. Thus, the ability to generate E1-deleted adenovirus is often based on the availability of the human embryonic kidney packaging cell line called 293. This cell line contains the E1 region of adenovirus, which provides E1 gene region products to "support" the growth of E1-deleted virus in the cell line (see, Graham et al., J. Gen. Virol. 36: 59-71 (1977)).
Nevertheless, the inherent problems with current first-generation recombinant adenoviruses have raised increasing concerns about their use in patients. For example, several recent studies have shown that E1-deleted adenoviruses are not completely replication-incompetent (see Rich, Hum. Gene. Ther. 4: 461-476 (1993); Engelhardt, et al., Nature Genet. 4: 27-34 (1993)).
Three general limitations are associated with the adenoviral vector technology. First, infection both in vivo and in vitro with the adenoviral vector at high multiplicity of infection has resulted in cytotoxicity to the target cells, due to the accumulation of penton protein, which is itself toxic to mammalian cells (Kay, Cell Biochem. 17E: 207 (1993)).
Second, host immune responses against adenoviral late gene products, including penton protein, cause the inflammatory response and destruction of the infected tissue which received the vectors (Yang, et al., PNAS USA 92: 4407-4411 (1994)). Lastly, host immune responses and cytotoxic effects together prevent the long-term expression of transgenes and cause decreased levels of gene expression following subsequent administration of adenoviral vectors (Mittal, et al., Virus Res. 28: 67-90 (1993)).
The packaging cell lines disclosed herein support viral vectors with deletions of major portions of the viral genome, without the need for helper viruses.
D. Therapeutic Viral Vectors and Related Systems
1. Nucleic Acid Segments
A therapeutic viral vector or composition of the present invention comprises a nucleotide sequence encoding a protein or polypeptide molecule or a biologically active fragment thereof which may be used for therapeutic applications, as described herein. A therapeutic viral vector or composition may further comprise an enhancer element or a promoter located 5' to and controlling the expression of such a therapeutic nucleotide sequence or gene.
In general, promoters are DNA segments that contain a DNA sequence that controls the expression of a gene located 3' or downstream of the promoter. The promoter is the DNA sequence to which RNA polymerase specifically binds and initiates RNA synthesis (transcription) of that gene, typically located 3' of the promoter. If more than one nucleic acid sequence encoding a particular polypeptide or protein is included in a therapeutic viral vector or nucleotide sequence, more than one promoter or enhancer element may be included, particularly if that would enhance efficiency of expression. For purposes of the present invention, regulatable (inducible) as well as constitutive promoters may be used, either on separate vectors or on the same vector.
A subject therapeutic nucleotide composition or vector consists of a nucleic acid molecule that comprises at least 2 different operatively linked DNA segments. The DNA can be manipulated and amplified by PCR as described herein and by using standard techniques, such as those described in Molecular Cloning: A Laboratory Manual, 2nd Ed, Sambrook et al., eds., Cold Spring Harbor, New York (1989). Typically, to produce a therapeutic viral vector of the present invention, the sequence encoding the selected therapeutic composition and the promoter or enhancer are operatively linked to a DNA molecule capable of autonomous replication in a cell either in vivo or in vitro. By operatively linking the enhancer element or promoter and the nucleotide sequence encoding the therapeutic nucleotide composition to the vector, the attached segments are replicated along with the vector sequences.
Thus, a recombinant DNA molecule (rDNA) of the present invention is a hybrid DNA molecule comprising at least 2 nucleotide sequences not normally found together in nature. In various preferred embodiments, one of the sequences is a sequence encoding an Ad-derived polypeptide, protein, or fragment thereof. Stated another way, a therapeutic nucleotide sequence of the present invention is one that encodes an expressible protein, polypeptide or fragment thereof, and it may further include an active constitutive or regulatable (e.g. inducible) promoter sequence.
A therapeutic viral vector or composition of the present invention is optimally from about 20 base pairs to about 40,000 base pairs in length. Preferably the nucleic acid molecule is from about 50 bp to about 38,000 bp in length. In various embodiments, the nucleic acid molecule is of sufficient length to encode one or more adenovirus proteins or functional polypeptide portions thereof. Since individual Ad polypeptides vary in length from about 19 amino acid residues to about 967 amino acid residues, corresponding nucleotide sequences will range from about 50 bp up to about 3000 bp, depending on the number and size of individual polypeptide-encoding sequences that are "replaced" in the viral vectors by therapeutic nucleotide sequences of the present invention.
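The relationship between the polypeptide lengths and the nucleotide ranges quoted above follows from three nucleotides per codon. A minimal sketch, in which the function name and the optional stop-codon flag are illustrative assumptions:

```python
NUCLEOTIDES_PER_CODON = 3


def coding_length_bp(residues: int, include_stop: bool = False) -> int:
    """Length in bp of a sequence encoding `residues` amino acids.

    Untranslated regions, introns and regulatory sequences are not counted.
    """
    if residues < 1:
        raise ValueError("residues must be a positive integer")
    codons = residues + (1 if include_stop else 0)
    return codons * NUCLEOTIDES_PER_CODON


if __name__ == "__main__":
    # The shortest and longest Ad polypeptides mentioned in the text.
    print(coding_length_bp(19))    # 57 bp, i.e. "about 50 bp"
    print(coding_length_bp(967))   # 2901 bp, i.e. "about 3000 bp"
```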
Various Ad proteins are comprised of more than one polypeptide sequence. Deletion of the corresponding genes from an Ad vector as taught herein will thus allow the vector to accommodate even larger "foreign" DNA segments. Thus, if the sequences encoding one or more adenovirus polypeptides or proteins are supplanted by a recombinant nucleotide sequence of the present invention, the length of the recombinant sequence can conceivably extend nearly to the packaging limit of the relevant adenovirus-derived vector.
In view of the fact that preferred embodiments disclosed herein are helper-independent Ad-derived vectors, the entire wild-type Ad genome cannot be completely supplanted by recombinant nucleic acid molecules without transforming such a vector into a vector requiring "help" of some kind. However, the Ad-derived vectors of the present invention do not depend on a helper virus; instead, the vectors of the present invention are propagated in cell lines stably expressing proteins or polypeptides that have been removed from said vectors to allow the addition of "foreign" DNA into the vectors. In various disclosed embodiments, specific early region and structural polypeptides are deleted from the vectors of the present invention, thereby enabling the vectors to accommodate recombinant nucleic acid sequences (or cassettes) of various lengths. For example, Ad-derived vectors of the present invention may easily include 12 kb or more of foreign (or "therapeutic") DNA sequences.
The therapeutic (or foreign) nucleotide sequence can be a gene or gene fragment that encodes a protein or polypeptide or a biologically active fragment thereof that provides a desired therapeutic effect such as replacement of alpha 1-antitrypsin or cystic fibrosis transmembrane conductance regulator protein (CFTR) and the like. (See, Crystal, et al., Nature Genetics 8: 42-51 (1994); Zabner, et al., Cell 75: 207-216 (1993); Knowles, et al., NEJM 333(13): 823-831 (1995); and Rosenfeld, et al., Cell 68: 143-155 (1992), the disclosures of which are incorporated by reference herein.) An Ad-derived vector of the present invention may also comprise a nucleotide sequence encoding a protein, polypeptide or fragment thereof that is effective in regulating the cell cycle, such as p53, Rb, or mitosin, or which is effective in inducing cell death, such as thymidine kinase. (See, published International App. No. WO 95/11984, the disclosures of which are incorporated by reference herein.) It is further contemplated that a therapeutic protein or polypeptide expressed by a therapeutic viral vector of the present invention may be used in conjunction with another therapeutic agent when appropriate (e.g., a thymidine kinase metabolite may be used in conjunction with the gene encoding thymidine kinase and its gene product) in order to be even more effective.
Alternatively, a therapeutic viral vector can include a DNA or RNA oligonucleotide sequence that exhibits enzymatic therapeutic activity without needing to be translated into a polypeptide product before exerting a therapeutic effect. Examples of the latter include antisense oligonucleotides that will inhibit the transcription of deleterious genes or ribozymes that act as site-specific ribonucleases for cleaving selected mutated gene sequences. In another variation, a therapeutic nucleotide sequence of the present invention may comprise a DNA construct capable of generating therapeutic nucleotide molecules, including ribozymes and antisense DNA, in high copy numbers in target cells, as described in published PCT application No. WO 92/06693 (the disclosure of which is incorporated herein by reference). Other preferred therapeutic nucleotide sequences according to the present invention are capable of delivering HIV antisense nucleotides to latently-infected T cells via CD4. Similarly, delivery of Epstein-Barr Virus (EBV) EBNA-1 antisense nucleotides to B cells via CR2 is capable of effecting therapeutic results.
As noted elsewhere herein, an Ad-derived vector of the present invention may also include a promoter sequence. Both constitutive and regulatable (often called "inducible") promoters are useful in constructs and methods of the present invention. For example, some useful regulatable promoters are those of the CREB-regulated gene family and include and inhibin, gonadotropin, cytochrome c, glucagon, and the like. (See, published International App. No. WO 96/14061, the disclosures of which are incorporated by reference herein.) A regulatable or inducible promoter may be described as a promoter wherein the rate of RNA polymerase binding and initiation is modulated by external stimuli. Such stimuli include various compounds or compositions, light, heat, stress, chemical energy sources, and the like.
Inducible, suppressible and repressible promoters are considered regulatable promoters.
Regulatable promoters may also include tissue-specific promoters. Tissue-specific promoters direct the expression of the gene to which they are operably linked to a specific cell type. A tissue-specific promoter causes the gene located 3' of it to be expressed predominantly, if not exclusively, in the specific cells where the promoter expressed its endogenous gene.
Typically, it appears that if a tissue-specific promoter expresses the gene located 3' of it at all, then it is expressed appropriately in the correct cell types (see, Palmiter et al., Ann. Rev. Genet. 20: 465-499 (1986)).
When a tissue-specific promoter controls the expression of a gene, that gene will be expressed in a small number of tissues or cell types rather than in substantially all tissues and cell types. Examples of tissue-specific promoters include the immunoglobulin promoter described by Brinster et al., Nature 306: 332-336 (1983) and Storb et al., Nature 310: 238-231 (1984); the elastase-I promoter described by Swift et al., Cell 38: 639-646 (1984); the globin promoter described by Townes et al., Mol. Cell. Biol. 5: 1977-1983 (1985), and Magram et al., Mol. Cell. Biol. 9: 4581-4584 (1989); the insulin promoter described by Bucchini et al., PNAS USA 83: 2511-2515 (1986) and Edwards et al., Cell 58: 161 (1989); the immunoglobulin promoter described by Ruscon et al., Nature 314: 330-334 (1985) and Grosscheld et al., Cell 38: 647-658 (1984); the alpha actin promoter described by Shani, Mol. Cell. Biol. 6: 2624-2631 (1986); the alpha crystalline promoter described by Overbeek et al., PNAS USA 82: 7815-7819 (1985); the prolactin promoter described by Crenshaw et al., Genes and Development 3: 959-972 (1989); the proopiomelanocortin promoter described by Tremblay et al., PNAS USA 85: 8890-8894 (1988); the beta-thyroid stimulating hormone (BTSH) promoter described by Tatsumi et al., Nippon Rinsho 47: 2213-2220 (1989); the mouse mammary tumor virus (MMTV) promoter described by Muller et al., Cell 54: 105 (1988); the albumin promoter described by Palmiter et al., Ann. Rev. Genet. 20: 465-499 (1986); the keratin promoter described by Vassar et al., PNAS USA 86: 8565-8569 (1989); the osteonectin promoter described by McVey et al., J. Biol. Chem. 263: 11,111-11,116 (1988); the prostate-specific promoter described by Allison et al., Mol. Cell. Biol. 9: 2254-2257 (1989); the opsin promoter described by Nathans et al., PNAS USA 81: 4851-4855 (1984); the olfactory marker protein promoter described by Danciger et al., PNAS USA 86: 8565-8569 (1989); the neuron-specific enolase (NSE) promoter described by Forss-Pelter et al., J. Neurosci. Res. 16: 141-151 (1986); the L-7 promoter described by Sutcliffe, Trends in Genetics 3: 73-76 (1987); and the protamine 1 promoter described by Peschon et al., Ann. New York Acad. Sci. 564: 186-197 (1989) and Braun et al., Genes and Development 3: 793-802 (1989). (The disclosures of all references cited are incorporated by reference herein.)
2. Compositions
In various alternative embodiments of the present invention, therapeutic sequences and compositions useful for practicing the therapeutic methods described herein are contemplated.
Therapeutic compositions of the present invention may contain a physiologically tolerable carrier together with one or more therapeutic nucleotide sequences of this invention, dissolved or dispersed therein as an active ingredient. In a preferred embodiment, the composition is not immunogenic or otherwise able to cause undesirable side effects when administered to a subject for therapeutic purposes.
As used herein, the terms "pharmaceutically acceptable", "physiologically tolerable" and grammatical variations thereof, as they refer to compositions, carriers, diluents and reagents, are used interchangeably and represent that the materials are capable of administration to or upon a subject (e.g., a mammal) without the production of undesirable physiological effects such as nausea, dizziness, gastric upset and the like.
For example, the present invention comprises therapeutic compositions useful in the specific targeting of epithelial or non-epithelial cells as well as in delivering a therapeutic nucleotide sequence to those cells. Therapeutic compositions designed to preferentially target epithelial cells may comprise an adenovirus-derived vector including a therapeutic nucleotide sequence. As described herein, a number of adenovirus-derived moieties are useful in the presently-disclosed therapeutic compositions and methods.
While some of the Examples appearing below specifically recite fiber proteins, polypeptides, and fragments thereof, it is expressly provided herein that other structural and non-structural Ad proteins and polypeptides (e.g., regulatory proteins and polypeptides) may be used as components of the various disclosed vectors and cell lines. Moreover, chimeric molecules comprised of proteins, polypeptides, and/or fragments thereof which are derived from different Ad serotypes may be used in any of the within-disclosed methods, constructs and compositions. Similarly, recombinant DNA sequences of the present invention may be prepared using nucleic acid sequences derived from different Ad serotypes, in order to design useful constructs with broad applicability, as disclosed and claimed herein.
It should also be appreciated that, while the members of Group C adenovirus (Ad serotypes 1, 2, 5, and 6) are specifically recited in various examples herein, the present invention is in no way limited to those serotypes alone. In view of the fact that the adenovirus serotypes are all closely related in structure and functionality, therapeutic viral vectors, packaging cell lines, and plasmids of the present invention may be constructed from components of any and all Ad serotypes, and the within-disclosed methods of making and using the various constructs and cell lines of the present invention apply to all of said serotypes.
The preparation of a pharmacological composition that contains active ingredients dissolved or dispersed therein is well understood in the art. Typically such compositions are prepared as injectables, either as liquid solutions or suspensions; however, solid forms suitable for solution or suspension in liquid prior to use can also be prepared. A preparation can also be emulsified, or formulated into suppositories, ointments, creams, dermal patches, or the like, depending on the desired route of administration.
The active ingredient can be mixed with excipients which are pharmaceutically acceptable and compatible with the active ingredient and in amounts suitable for use in the therapeutic methods described herein. Suitable excipients are, for example, water, saline, dextrose, glycerol, ethanol or the like and combinations thereof, including vegetable oils, propylene glycol, polyethylene glycol and benzyl alcohol (for injection or liquid preparations); and petrolatum (e.g., VASELINE), vegetable oil, animal fat and polyethylene glycol (for externally applicable preparations). In addition, if desired, the composition can contain wetting or emulsifying agents, isotonic agents, dissolution promoting agents, stabilizers, colorants, antiseptic agents, soothing agents and the like additives (as usual auxiliary additives to pharmaceutical preparations), pH buffering agents and the like which enhance the effectiveness of the active ingredient.
The therapeutic compositions of the present invention can include pharmaceutically acceptable salts of the components therein. Pharmaceutically acceptable salts include the acid addition salts (formed with the free amino groups of the polypeptide) that are formed with inorganic acids such as, for example, hydrochloric or phosphoric acids, or such organic acids as acetic, tartaric, mandelic and the like. Salts formed with the free carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium, ammonium, calcium or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, procaine and the like.
Physiologically tolerable carriers are well known in the art. Exemplary of liquid carriers are sterile aqueous solutions that contain no materials in addition to the active ingredients and water, or contain a buffer such as sodium phosphate at physiological pH value, physiological saline or both, such as phosphate-buffered saline. Still further, aqueous carriers can contain more than one buffer salt, as well as salts such as sodium and potassium chlorides, dextrose, polyethylene glycol and other solutes.
Liquid compositions can also contain liquid phases in addition to and to the exclusion of water. Exemplary of such additional liquid phases are glycerin, vegetable oils such as cottonseed oil, and water-oil emulsions.
A therapeutic composition typically contains an amount of a therapeutic nucleotide sequence of the present invention sufficient to deliver a therapeutically effective amount to the target tissue, typically an amount of at least 0.1 weight percent to about 90 weight percent of therapeutic nucleotide sequence per weight of total therapeutic composition. A weight percent is a ratio by weight of therapeutic nucleotide sequence to total composition. Thus, for example, 0.1 weight percent is 0.1 grams of DNA segment per 100 grams of total composition.
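The weight-percent definition above can be illustrated with a minimal Python sketch. The function names and the range check are hypothetical helpers introduced only for illustration; the 0.1 to 90 weight percent bounds are the typical figures recited in the preceding paragraph.

```python
def weight_percent(active_grams: float, total_grams: float) -> float:
    """Weight percent = mass of therapeutic nucleotide sequence / total mass, times 100."""
    if total_grams <= 0:
        raise ValueError("total composition mass must be positive")
    if not 0 <= active_grams <= total_grams:
        raise ValueError("active mass must lie between 0 and the total mass")
    return 100.0 * active_grams / total_grams


def within_typical_range(percent: float, low: float = 0.1, high: float = 90.0) -> bool:
    """Check a value against the typical range recited in the text (illustrative only)."""
    return low <= percent <= high


# The example from the text: 0.1 grams of DNA segment per 100 grams of composition.
example = weight_percent(0.1, 100.0)
print(example, within_typical_range(example))
```

The sketch simply restates the ratio by weight; it says nothing about what amount is therapeutically effective for a given target tissue.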
The therapeutic nucleotide compositions comprising synthetic oligonucleotide sequences of the present invention can be prepared using any suitable method, such as the phosphotriester or phosphodiester methods. See Narang et al., Meth. Enzymol., 68:90 (1979); U.S. Patent No. 4,356,270; and Brown et al., Meth. Enzymol., 68:109 (1979), the disclosures of which are incorporated by reference herein.
For therapeutic oligonucleotides sequence compositions in which a family of variants is preferred, the synthesis of the family members can be conducted simultaneously in a single reaction vessel, or can be synthesized independently and later admixed in preselected molar ratios. For simultaneous synthesis, the nucleotide residues that are conserved at preselected positions of the sequence of the family member can be introduced in a chemical synthesis protocol simultaneously to the variants by the addition of a single preselected nucleotide precursor to the solid phase oligonucleotide reaction admixture when that position number of the oligonucleotide is being chemically added to the growing oligonucleotide polymer. The addition of nucleotide residues to those positions in the sequence that vary can be introduced simultaneously by the addition of amounts, preferably equimolar amounts, of multiple preselected nucleotide precursors to the solid phase oligonucleotide reaction admixture during chemical synthesis. For example, where all four possible natural nucleotides (A,T,G and C) are to be added at a preselected position, their precursors are added to the oligonucleotide synthesis reaction at that step to simultaneously form four variants.
This manner of simultaneous synthesis of a family of related oligonucleotides has been previously described for the preparation of "Degenerate Oligonucleotides" by Ausubel et al. (Current Protocols in Molecular Biology, Suppl. 8, p. 2.11.7, John Wiley & Sons, Inc., New York (1991)), and can readily be applied to the preparation of the therapeutic oligonucleotide compositions described herein.
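As a rough illustration of the family of variants produced when all four natural nucleotides are added at a preselected position, the following Python sketch enumerates the resulting sequences. The template strings and the use of the symbol "N" to mark a variable position are assumptions made for this example only; they are not sequences from the specification.

```python
from itertools import product


def expand_family(template: str, variable_symbol: str = "N",
                  alphabet: str = "ATGC") -> list[str]:
    """Enumerate every family member encoded by a template.

    Conserved positions contribute a single nucleotide; each position marked
    with `variable_symbol` contributes every nucleotide in `alphabet`,
    mirroring the addition of multiple precursors at that synthesis step.
    """
    pools = [alphabet if base == variable_symbol else base for base in template]
    return ["".join(member) for member in product(*pools)]


# One variable position yields four variants; two yield sixteen.
print(expand_family("ACNT"))
print(len(expand_family("ANGNC")))
```

The enumeration only lists which sequences are possible; the molar ratio of each member in a real synthesis depends on the precursor amounts, which the text prefers to be equimolar.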
Nucleotide bases other than the common four nucleotides (A, T, G or C), or the RNA equivalent nucleotide uracil (U), can also be used in the present invention. For example, it is well known that inosine (I) is capable of hybridizing with A, T and G, but not C. Examples of other useful nucleotide analogs are known in the art; many may be found listed in 37 C.F.R. §1.822.
Thus, where all four common nucleotides are to occupy a single position of a family of oligonucleotides, that is, where the preselected therapeutic nucleotide composition is designed to contain oligonucleotides that can hybridize to four sequences that vary at one position, several different oligonucleotide structures are contemplated. The composition can contain four members, where a preselected position contains A, T, G or C. Alternatively, the composition can contain two members, where a preselected position contains I or C, and has the capacity to hybridize at that position to all four possible common nucleotides. Finally, other nucleotides may be included at the preselected position that have the capacity to hybridize in a non-destabilizing manner with more than one of the common nucleotides in a manner similar to inosine.
3. Expression Vector Systems
The introduction of exogenous DNA into eucaryotic cells has become one of the most powerful tools of the molecular biologist. The term "exogenous" encompasses any therapeutic composition of this invention which is administered by the therapeutic methods of this invention. Thus, "exogenous" may also be referred to herein as "foreign," "non-native," and the like. The methods of this invention preferably require efficient delivery of the DNA into the nucleus of the recipient cell and subsequent identification of cells that are expressing the foreign DNA.
A widely-used plasmid is pBR322, a vector whose nucleotide sequence and endonuclease cleavage sites are well known. Various other useful plasmid vectors are described in the Examples that follow.
A vector of the present invention comprises a nucleic acid (preferably DNA) molecule capable of autonomous replication in a cell and to which a DNA segment (e.g., a gene or polynucleotide) can be operatively linked so as to bring about replication of the attached segment. In the present invention, one of the nucleotide segments to be operatively linked to vector sequences encodes at least a portion of a therapeutic nucleic acid molecule; in effect, a nucleic acid sequence that encodes one or more therapeutic proteins or polypeptides, or fragments thereof.
In various embodiments, the entire peptide-coding sequence of the therapeutic gene is inserted into the vector and expressed; however, it is also feasible to construct a vector which includes some non-coding sequences as well. Preferably, however, non-coding sequences are excluded. Alternatively, a nucleotide sequence for a soluble form of a polypeptide may be utilized. Another preferred therapeutic viral vector includes a nucleotide sequence encoding at least a portion of a therapeutic nucleotide sequence operatively linked to the vector for expression. As used herein with regard to DNA sequences or segments, the phrase "operatively linked" generally means the sequences or segments have been covalently joined into one piece of DNA, whether in single or double stranded form.
The choice of viral vector into which a therapeutic nucleotide sequence of this invention is operatively linked depends directly, as is well known in the art, on the functional properties desired (e.g., vector replication and protein expression) and the host cell to be transformed, these being limitations inherent in the art of constructing recombinant DNA molecules. Although certain adenovirus serotypes are recited herein in the form of specific examples, it should be understood that the present invention contemplates the use of any adenovirus serotype, including hybrids and derivatives thereof. As one will observe, it is not unusual or outside the scope of the present invention to utilize nucleotide and/or amino acid residue sequences of two or more serotypes in constructs, compositions and methods of the invention.
As one of skill in the art will note, in various embodiments of the present invention, different "types" of vectors are disclosed. For example, one "type" of vector is used to deliver particular nucleotide sequences into a packaging cell line, with the intent of having said sequences stably integrate into the cellular genome; these "types" of vectors are generally identified herein as complementing plasmids. A further "type" of vector described herein carries or delivers nucleotide sequences in or into a cell line (e.g., a packaging cell line) for the purpose of propagating therapeutic viral vectors of the present invention; hence, these vectors are generally referred to herein as delivery plasmids. A third "type" of vector described herein is utilized to carry nucleotide sequences encoding therapeutic proteins or polypeptides to specific cells or cell types in a subject in need of treatment; these vectors are generally identified herein as therapeutic viral vectors or Ad-derived vectors.
In one embodiment, the directional ligation means is provided by nucleotides present in the upstream nucleotide sequence, downstream nucleotide sequence, or both. In another embodiment, the sequence of nucleotides adapted for directional ligation comprises a sequence of nucleotides that defines multiple directional cloning means. Where the sequence of nucleotides adapted for directional ligation defines numerous restriction sites, it is referred to as a multiple cloning site.
A translatable nucleotide sequence is a linear series of nucleotides that provide an uninterrupted series of at least 8 codons that encode a polypeptide in one reading frame.
Preferably, the nucleotide sequence is a DNA sequence. The vector itself may be of any suitable type, such as a viral vector (RNA or DNA), naked straight-chain or circular DNA, or a vesicle or envelope containing the nucleic acid material and any polypeptides that are to be inserted into the cell.
A preferred viral vector in which therapeutic nucleotide compositions of this invention are present is derived from adenovirus (Ad). It is also desirable that the vector contain a promoter sequence. As taught herein, viral vectors of this invention may be designed and constructed in such a way that they specifically target a preselected recipient cell type, depending on the nature of therapy one seeks to administer. Methods of making and using therapeutic viral vectors that target specific cells are further described in the Examples that follow.
Novel vectors and compositions may also be designed and prepared to preferentially target cells that might not otherwise be targeted by wild-type adenovirus virions. For example, in order to target non-epithelial cells, one following the teachings of the present specification may be able to prepare a therapeutic vector including a nucleotide sequence encoding a foreign protein, polypeptide or other ligand directed to a non-epithelial cell or to a different receptor than that generally targeted by a particular adenovirus. Examples of useful ligands directed to specific receptors (identified in parentheses) include the V3 loop of HIV gp120 (CD4); transferrin (transferrin receptor); LDL (LDL receptors); and deglycosylated proteins (asialoglycoprotein receptor). Various useful ligands which may be added to adenovirus fiber and methods for preparing and attaching same are set forth in published International App. No. WO 95/26412, the disclosures of which are incorporated by reference herein.
Useful ligands which may be encoded by a foreign nucleotide sequence contained within a viral vector of the present invention, or which may be linked to proteins or polypeptides expressed thereby after said vectors are administered to a subject, also include antibodies and attachment sequences, as well as receptors themselves. For example, antibodies to cell receptor molecules such as integrins and the like, MHC Class I and Class II, asialoglycoprotein receptor, transferrin receptors, LDL receptors, CD4, and CR2 are but a few which are useful according to the present invention. It is also understood that the ligands typically bound by receptors, as well as analogs to those ligands, may be used as cellular targeting agents, as disclosed herein.
E. Therapeutic Methods
The vectors of the present invention are particularly suited for gene therapy. Thus, various therapeutic methods are contemplated by the present invention.
For example, it has now been discovered that Ad-derived viral vectors are capable of delivering a therapeutic nucleotide sequence to a specific cell or tissue, thereby expanding and enhancing treatment options available in numerous conditions in which more conventional therapies are of limited efficacy. Accordingly, methods of gene therapy utilizing these vectors are within the scope of the invention. Vectors are typically purified and then an effective amount is administered in vivo or ex vivo into the subject.
For example, the compositions may be used prophylactically or therapeutically in vivo to disrupt HIV infection and mechanisms of action by inhibiting gene expression or activation, via delivery of antisense HIV sequences or ribozymes to T cells or monocytes. Using methods of the present invention, one may target therapeutic viral vectors as disclosed herein to specific cells and tissues, including hematopoietic cells, as infection of such cells appears to be mediated by distinct integrins to which viral vectors of the present invention may readily be targeted. (See Huang et al., J. Virol. 70: 4502-8 (1996).) Other useful therapeutic nucleotide sequences include antisense nucleotide sequences complementary to the EBV EBNA-1 gene. Use of such therapeutic sequences may remediate or prevent latent infection of B cells with EBV. As discussed herein and in the Examples below, targeting and delivery may be accomplished via the use of various ligands, receptors, and other appropriate targeting agents.
Thus, in one embodiment, a therapeutic method of the present invention comprises contacting the cells of a subject infected with EBV or HIV with a therapeutically effective amount of a pharmaceutically acceptable composition comprising a therapeutic nucleotide sequence of this invention. In a related embodiment, the contacting involves introducing the therapeutic nucleotide sequence composition into cells having an EBV or HIV-mediated infection.
Methods of gene therapy are well known in the art (see Larrick and Burck, Gene Therapy: Application of Molecular Biology, Elsevier Science Publ. Co., Inc., New York, NY (1991); Kriegler, Gene Transfer and Expression: A Laboratory Manual, W. H. Freeman and Company, New York (1990)). The term "subject" should be understood to include any animal, particularly a mammalian patient, such as any murine, rat, bovine, porcine, canine, feline, equine, ursine, or human patient.
When the foreign gene carried in the vector encodes a tumor suppressor gene or another anti-tumor protein, the vector is useful to treat or reduce hyperproliferative cells in a subject, to inhibit tumor proliferation in a subject or to ameliorate a particular, related pathology. Pathologic hyperproliferative cells are characteristic of various disease states, such as thyroid hyperplasia, psoriasis, eczema, benign prostatic hypertrophy, Li-Fraumeni syndrome including breast cancer, sarcomas and other neoplasms, bladder cancer, colon cancer, lung cancer, various leukemias, and lymphomas.
Non-pathologic hyperproliferative cells are found, for example, in cells associated with wound repair. Pathologic hyperproliferative cells, however, characteristically exhibit loss of contact inhibition and a decline in their ability to selectively adhere which implies a change in the surface properties of the cell and a further breakdown in intercellular communication.
These changes include stimulation to divide and the ability to secrete proteolytic enzymes.
The present invention also contemplates methods of depleting suitable samples of pathologic mammalian hyperproliferative cells contaminating hematopoietic precursors during bone marrow reconstitution via the introduction of a wild-type tumor suppressor gene into the cell preparation using a vector of this invention. As used herein, a suitable sample is defined as a heterogeneous cell preparation obtained from a patient, e.g., a mixed population of cells containing both phenotypically normal and pathogenic cells.
Administration includes but is not limited to the introduction of therapeutic agents of the present invention into a cell or subject via various means, including direct injection, intravenously, intraperitoneally, via intra-tumor injection, via aerosols, or topically.
Therapeutic agents as disclosed herein may also be combined for administration of an effective amount of the agents with a pharmaceutically-acceptable carrier, as described herein.
As used herein, "effective amount" generally means the amount of vector (or proteins produced/released thereby) which achieves a positive outcome in the subject to whom the vector is administered. The total volume administered will necessarily vary depending on the mode of administration, as those of skill in the relevant art will appreciate, and dosages may vary as well.
The dose of a biologic vector is somewhat complex and may be described in terms of the concentration (in plaque-forming units per milliliter (pfu/ml)), the total dose (in pfu), and the estimated number of vectors administered per cell (the estimated multiplicity of infection or MOI). Thus, if a vector is administered via infusion (say, across nasal epithelium) at a constant total volume, the respective concentration, etc. may be described as follows:
Concentration (pfu/ml)   Volume (ml)   Dose (pfu)   Estimated MOI
10^7                     2             2 x 10^7     1
10^8                     2             2 x 10^8     10
10^9                     2             2 x 10^9     100
10^10                    2             2 x 10^10    1000
In general, when adenoviral vectors are administered via infusion across the nasal epithelium, administered amounts producing an estimated MOI of about 10 or greater are much more effective than lower dosages. (See Knowles et al., New Eng. J. Med. 333: 823-831 (1995).) Similarly, when direct injection is the preferred treatment modality (e.g., direct injection of a viral vector into a tumor), doses of 1 x 10^9 pfu or greater are generally preferred. (See published International App. No. WO 95/11984.) Thus, depending on the mode of administration, an effective amount administered in a single dose preferably contains from about 10^6 to about 10^15 infectious units. A typical course of treatment would be one such dose per day over a period of five days. As those of skill in the art will appreciate, an effective amount may vary depending on the pathology or other condition to be treated, the status and sensitivity of the patient, and various other factors well known to those of skill in the art, such as the patient's tolerance to other courses of treatment that may have been applied previously. Thus, those of skill in the art may easily and precisely determine effective amounts of the agents/vectors of the present invention which may be administered to a particular patient, based on their understanding of and evaluation of such factors.
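The relationship among concentration, volume, total dose and estimated MOI in the dose table above can be sketched in Python as follows. The target-cell count of 2 x 10^7 is an assumption inferred from the first table row (a dose of 2 x 10^7 pfu giving an estimated MOI of 1); the specification does not state a cell number, and the function names are hypothetical.

```python
def total_dose_pfu(concentration_pfu_per_ml: float, volume_ml: float) -> float:
    """Total dose (pfu) = concentration (pfu/ml) x administered volume (ml)."""
    return concentration_pfu_per_ml * volume_ml


def estimated_moi(dose_pfu: float, target_cells: float) -> float:
    """Estimated multiplicity of infection = vectors administered per target cell."""
    if target_cells <= 0:
        raise ValueError("target cell count must be positive")
    return dose_pfu / target_cells


# Assumption for illustration only: roughly 2 x 10^7 exposed cells,
# back-calculated from the first row of the table.
ASSUMED_TARGET_CELLS = 2 * 10**7

for exponent in range(7, 11):
    concentration = 10**exponent          # pfu/ml
    dose = total_dose_pfu(concentration, 2)  # constant 2 ml volume, as in the table
    moi = estimated_moi(dose, ASSUMED_TARGET_CELLS)
    print(f"10^{exponent} pfu/ml x 2 ml -> dose {dose:.0e} pfu, estimated MOI {moi:g}")
```

Because the number of cells actually exposed in vivo is itself an estimate, the MOI computed this way is only as reliable as that assumed cell count.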
The present invention also contemplates methods of ameliorating pathologies characterized by hyperproliferative cells or genetic defects in a subject, by administering to the subject an effective amount of a vector as described herein. Such vectors preferably contain a foreign gene encoding a gene product (e.g., a polypeptide or protein) having the ability to ameliorate the pathology, under suitable conditions. As used herein, the term "genetic defect" means any disease, condition or abnormality which results from inherited factors, e.g., Huntington's Disease, Tay-Sachs Disease, or Sickle Cell Disease.
The present invention further provides methods for reducing the proliferation of tumor cells in a subject by introducing into the tumor mass an effective amount of an adenoviral expression vector containing an anti-tumor gene other than a tumor suppressor gene. The anti-tumor gene can encode, for example, thymidine kinase (TK). An effective amount of a therapeutic agent is then administered to the subject; the therapeutic agent, in the presence of the anti-tumor gene, is toxic to the cell.
Using thymidine kinase as exemplary, the therapeutic agent is a thymidine kinase metabolite such as ganciclovir (GCV), 6-methoxypurine arabinonucleoside (araM), or a functional equivalent thereof. Both the thymidine kinase gene and the thymidine kinase metabolite must be used concurrently in order to exert a toxic effect on the host cell. In the presence of the TK gene, GCV is phosphorylated and becomes a potent inhibitor of DNA synthesis, whereas araM is converted to the cytotoxic anabolite araATP. Thus, the precise method of action or synergism is not relevant to therapeutic efficacy; what is relevant is the fact that the concurrent use of appropriate genes and therapeutic agents may effectively ameliorate a specific disease condition.
Another useful example contemplates use of a vector of the present invention which expresses the enzyme cytosine deaminase. Such a vector could be used in conjunction with administration of the drug 5-fluorouracil (Austin and Huber, Mol. Pharm. 43: 380-387 (1993)) or the recently-described E. coli Deo gene in combination with 6-methyl-purine-2'-deoxyribonucleoside (Sorscher et al., Gene Therapy 1: 233-238 (1994)).
As with the use of the tumor suppressor genes described previously, the use of other anti-tumor genes, either alone or in combination with the appropriate therapeutic agent, provides a treatment for the uncontrolled cell growth or proliferation characteristic of tumors and malignancies. Thus, the present invention provides therapies to halt the uncontrolled cellular growth in a patient, thereby alleviating the symptoms of the disease or cachexia present in the patient. The effect of this treatment includes, but is not limited to, prolonged survival time of the patient, reduction in tumor mass or burden, apoptosis of tumor cells, or the reduction in the number of circulating tumor cells. Means of quantifying the beneficial effects of this therapy are well known to those of skill in the art.
The present invention provides a recombinant adenovirus expression vector characterized by the partial or total deletion of one or more adenoviral structural protein genes, such as the gene encoding fiber, which allows the vector to accommodate a therapeutic, foreign nucleic acid sequence encoding a functional foreign polypeptide, protein, or biologically active fragment thereof. For example, such a functional polypeptide moiety may be a suicide gene or a functional equivalent thereof, of which the anti-cancer gene TK is but one example. TK genes, when expressed, produce a gene product which is lethal to the cell, particularly in the presence of GCV. One source of the TK gene is the herpes simplex virus (HSV), albeit other sources are known as well and may be used as taught herein. The TK gene may readily be obtained from HSV by methods well known to those of skill in the art. For example, the plasmid pMLBKTK in E. coli HB101 (from ATCC #39369) is a source of the HSV-1 TK gene, which may be used as disclosed herein. (See, e.g., published International Application No. WO 95/11984, the disclosures of which are incorporated by reference herein.)
A therapeutic gene sequence may be introduced into a tumor mass by combining the adenoviral expression vector with a suitable pharmaceutically acceptable carrier. Introduction can be accomplished, for example, via direct injection of the recombinant Ad vector into the tumor mass. In the specific case of a cancer such as hepatocellular carcinoma (HCC), direct injection into the hepatic artery can be used for delivery, because most HCCs derive their circulation from this artery. Similar techniques of administration may be applied to other specific types of tumors and malignancies, as is known to those of skill in the art.
A method of tumor-specific delivery of a tumor-suppressor gene is accomplished by contacting target tissue in a subject with an effective amount of a recombinant Ad-derived vector of this invention. In the case of anti-tumor therapy, the gene is intended to encode an anti-tumor agent, such as a functional tumor suppressor gene product or suicide gene product.
The term "contacting" is intended to encompass any delivery method for the efficient transfer of the vector, such as via intra-tumoral injection.
In another example, adenovirus vectors of the present invention can be used to transfer genes to central nervous system (CNS) tumors in vivo. Using stereotactic delivery, Ad-derived vectors can transfer genes intended for tumor therapy into the CNS. For example, Badie et al. (Neurosurgery 35(5): 910-916 (1994), incorporated by reference herein) reported that 50% and 90% transduction were observed in in vitro experiments at vector titers of approximately 10^7 and 10^8 plaque-forming units/ml (pfu/ml), respectively. In their in vivo studies using appropriate animal brain tumor models, titers above 10^7 were observed to have a cytopathic effect; more than 50% reduction in tumor cell growth was noted at 10^8 pfu/ml; no toxic effects were noted when titers as high as 10^10 pfu/ml were injected into the brain tissue of subject animals. Thus, the use of titers greater than 10^7 pfu/ml appears appropriate when challenging CNS tumors.
The present invention also contemplates methods for determining the efficacy of the within-disclosed therapeutic compositions and methods. One such method for confirming efficacy utilizes the human/SCID (severe combined immunodeficient) mouse model of EBV-induced LPD (lymphoproliferative disease) to ascertain whether EBV-antisense therapeutic nucleotide sequences block tumor formation. (See Pisa et al., Blood 79: 173-179 (1992); Rowe et al., Curr. Top. Microbiol. Immunol. 166: 325 (1990); and Cannon et al., Clin. Invest. 85: 1333-1337 (1990), the disclosures of which are incorporated by reference herein.)
Finally, the use of Ad vectors of the present invention to prepare medicaments for the treatment, therapy and/or diagnosis of various diseases is also contemplated by this invention.
Moreover, other anti-tumor genes may be used in combination with the corresponding therapeutic agent to reduce the proliferation of tumor cells. Such other gene-and-therapeutic-agent combinations are known to those of skill in the art and may be applied as taught herein.
F. Construction of Therapeutic Viral Vectors for Gene Delivery
For in vitro gene transfer, administration is often accomplished by first isolating a selected cell population from a patient (such as lung epithelial cells, lymphocytes and the like), followed by in vitro gene transfer of the therapeutic compositions of this invention and the replacement of the cells into the patient. In vivo therapy is also contemplated, via the administration of therapeutic compositions of this invention by various delivery means. For example, aerosol administration and administration via subcutaneous, intravenous, intraperitoneal, intramuscular, ocular means and the like are also within the scope of the present invention.
Other gene-delivery methods are also useful in conjunction with the methods, compositions and constructs of the present invention; see, published International Application No. WO 95/11984, the disclosures of which are incorporated by reference herein.
Similarly, various non-human animals having inserted therein the vectors or transformed cells of this invention are contemplated. These "transgenic" animals are made using methods well known to those of skill in the art. For example, see U.S. Patent No. 5,175,384 (the disclosures of which are incorporated by reference herein).
The present invention also contemplates various methods of targeting specific cells in a subject in need of diagnosis and/or treatment. As discussed herein, the present invention contemplates that the viral vectors and compositions of the present invention may be directed to specific receptors or cells, for the ultimate purpose of delivering those vectors and compositions to specific cells or cell types. The viral vectors and constructs of the present invention are particularly useful in this regard.
In general, adenovirus attachment and uptake into cells are separate but cooperative events that result from the interaction of distinct viral coat proteins with a receptor for attachment and αv integrin receptors for internalization. Adenovirus attachment to the cell surface via the fiber coat proteins has been discovered to be dissociable and distinct from the subsequent step of internalization, and the present invention is able to take advantage of and function in conjunction with these differing receptors.
G. Other Applications
The cell lines, viral vectors and methods of the present invention may also be used for purposes other than the direct administration of therapeutic nucleotide sequences. In one such application, the production of large quantities of biologically active proteins or polypeptides in cells transfected with the within-disclosed viral vectors is contemplated herein. For example, human lymphoblastoid cells may be transfected with an integrative viral vector of the present invention carrying a human hematopoietic growth factor such as the gene for erythropoietin (EPO); cells so transfected are thus able to produce biologically active EPO. (See, Lopez et al., Gene 148: 285-91 (1994).) Various other applications and uses of the within-described methods, cell lines, plasmids, vectors, and compositions of the present invention shall become apparent upon closer examination of the Examples that follow.
EXAMPLES
The following examples are intended to illustrate, but not limit, the present invention.
As such, the following description provides details of the manner in which particular embodiments of the present invention may be made and used. This description, while exemplary of the present invention, is not to be construed as specifically limiting the invention.
Variations and equivalents, now known or later developed, which would be within the understanding and technical competence of one skilled in this art are to be considered as falling within the scope of this invention.
Example 1 Preparation of Adenovirus Packaging Cell Lines
Cell lines that are commonly used for growing adenovirus are useful as host cells for the preparation of adenovirus packaging cell lines. Preferred cells include 293 cells, an adenovirus-transformed human embryonic kidney cell line obtained from the ATCC, having Accession Number CRL 1573; HeLa, a human epithelial carcinoma cell line (ATCC Accession Number CCL A549, a human lung carcinoma cell line (ATCC Accession Number CCL 1889); and the like epithelial-derived cell lines. As a result of the adenovirus transformation, the 293 cells contain the E1 early region regulatory gene. All cells were maintained in complete DMEM with 10% fetal calf serum unless otherwise noted.
The cell lines of this invention allow for the production and propagation of novel adenovirus-based gene delivery vectors having deletions in preselected gene regions by cellular complementation of adenoviral genes. To provide the desired complementation of such deleted adenoviral genomes in order to generate a novel viral vector of the present invention, plasmid vectors that contain preselected functional units were designed as described herein.
Such units include, but are not limited to, the E1 early region and the viral fiber gene. The preparation of plasmids providing such complementation, thereby being "complementary plasmids or constructs", that are stably inserted into host cell chromosomes is described below.
A. Preparation of an E4-Expressing Plasmid for Complementation of E4-Gene-Deleted Adenoviruses
The viral E4 regulatory region contains a single transcription unit which is alternately spliced to produce several different mRNAs. The E4-expressing plasmid prepared as described herein and used to transfect the 293 cell line contains the entire E4 transcriptional unit as shown in Figure 1. A DNA fragment extending from 175 nucleotides upstream of the E4 transcriptional start site, including the natural E4 promoter, to 153 nucleotides downstream of the E4 polyadenylation signal, including the natural E4 terminator signal, corresponding to nucleotides 32667-35780 of the adenovirus type 5 (hereinafter referred to as Ad5) genome as described in Chroboczek et al. (Virol., 186:280-285 (1992), GenBank Accession Number M73260), was amplified from Ad5 genomic DNA, obtained from the ATCC, via the polymerase chain reaction (PCR). Sequences of the primers used were 5'CGGTACACAGAATTCAGGAGACACAACTCC3' (forward or 5' primer referred to as E4L) (SEQ ID NO 1) and 5'GCCTGGATCCGGGAAGTTACGTAACGTGGGAAAAC3' (SEQ ID NO 2) (backward or 3' primer referred to as E4R). To facilitate cloning of the PCR fragment, these oligonucleotides were designed to create novel sites for the restriction enzymes EcoRI and BamHI, respectively, as indicated with underlined nucleotides. DNA was amplified via PCR using 30 cycles of 92°C for 1 minute, 50°C for 1 minute, and 72°C for 3 minutes, resulting in amplified full-length E4 gene products.
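The primer design and cycling program just described can be checked with a minimal sketch. It assumes only the primer sequences and cycle parameters stated above, together with the standard EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences; the helper names are illustrative, and ramp times between temperature steps are ignored.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

# Primer sequences as given for SEQ ID NO 1 (E4L) and SEQ ID NO 2 (E4R).
PRIMERS = {
    "E4L": "CGGTACACAGAATTCAGGAGACACAACTCC",
    "E4R": "GCCTGGATCCGGGAAGTTACGTAACGTGGGAAAAC",
}


def site_position(primer, site):
    """Return the 1-based start of `site` within `primer`, or 0 if absent."""
    return primer.find(site) + 1


def program_minutes(cycles, step_minutes):
    """Total hold time of a cycling program, ignoring ramp times."""
    return cycles * sum(step_minutes)


ecori_pos = site_position(PRIMERS["E4L"], SITES["EcoRI"])
bamhi_pos = site_position(PRIMERS["E4R"], SITES["BamHI"])
total_min = program_minutes(30, [1, 1, 3])  # 92°C, 50°C, 72°C steps

print(f"EcoRI site starts at base {ecori_pos} of E4L")
print(f"BamHI site starts at base {bamhi_pos} of E4R")
print(f"Cycling hold time: {total_min} minutes")
```

Both engineered sites sit a few bases in from the 5' end of each primer, which leaves flanking nucleotides for the enzymes to cut the PCR product.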
The amplified DNA E4 products were then digested with EcoRI and BamHI for cloning into the compatible sites of pBluescript/SK+ by standard techniques to create the plasmid pBS/E4. A 2603 base pair (bp) cassette including the herpes simplex virus thymidine kinase promoter, the hygromycin resistance gene, and the thymidine kinase polyadenylation signal was excised from the plasmid pMEP4 (Invitrogen, San Diego, CA) by digestion with FspI followed by addition of BamHI linkers (5'CGCGGATCCGCG3') (SEQ ID NO 3) for subsequent digestion with BamHI to isolate the hygromycin-containing fragment.
The isolated BamHI-modified fragment was then cloned into the BamHI site of pBS/E4 containing the E4 region to create the plasmid pE4/Hygro containing 8710 bp (Figure 2). The pE4/Hygro plasmid has been deposited with the ATCC as described in Example 3. The complete nucleotide sequence of pE4/Hygro is listed in SEQ ID NO 4. Position number 1 of the linearized vector corresponds to approximately the middle portion of the pBS/SK+ backbone as shown in Figure 2 as a thin line between the 3' BamHI site in the hygromycin insert and the 3' EcoRI site in the E4 insert. The 5' and 3' ends of the E4 gene are located at respective nucleotide positions 3820 and 707 of SEQ ID NO 4 while the 5' and 3' ends of the hygromycin insert are located at respective nucleotide positions 3830 and 6470. In the clone that was selected for use, the E4 and hygromycin resistance genes were divergently transcribed.
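The coordinates given for the E4 fragment can be cross-checked with simple arithmetic. The sketch below assumes inclusive numbering at both ends, which the text does not state explicitly; the function and variable names are illustrative.

```python
def span_length(end_a, end_b):
    """Inclusive length of a region from its two end coordinates.

    The order of the arguments does not matter, so a region that runs on
    the reverse strand (5' coordinate larger than 3') is handled as well.
    """
    return abs(end_a - end_b) + 1


# Ad5 genome coordinates of the amplified E4 fragment, from the text.
e4_in_genome = span_length(32667, 35780)

# 5' and 3' ends of the E4 insert in pE4/Hygro (SEQ ID NO 4), from the text.
e4_in_plasmid = span_length(3820, 707)

print(e4_in_genome, e4_in_plasmid)
```

Under that assumption the two spans agree, and the 5' end lying at the higher coordinate reflects that the E4 insert runs in the reverse orientation relative to the plasmid numbering.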
B. Preparation of a Fiber-Expressing Plasmid for Complementation of Fiber-Gene-Deleted Adenoviruses
To prepare a fiber-encoding construct, primers were designed to amplify the fiber coding region from Ad5 genomic DNA with the addition of unique BamHI and NotI sites at the 5' and 3' ends of the fragment, respectively. The Ad5 nucleotide sequence is available with the GenBank Accession Number M18369. The 5' and 3' primers had the respective nucleotide sequences of 5'ATGGGATCCAAGATGAAGCGCGCAAGACCG3' (SEQ ID NO 5) and 5'CATAACGCGGCCGCTTCTTTATTCTTGGGC3' (SEQ ID NO where the inserted BamHI and NotI sites are indicated by underlining. The 5' primer also contained a nucleotide substitution 3 nucleotides 5' of the second ATG codon (C to A) that is the initiation site. The nucleotide substitution was included so as to improve the consensus for initiation of fiber protein translation.
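The features of the two fiber primers can be located programmatically. The following is a minimal sketch that assumes the primer sequences as printed above and the standard BamHI (GGATCC) and NotI (GCGGCCGC) recognition sequences; the variable names are illustrative and indices are 0-based.

```python
FIBER_5P = "ATGGGATCCAAGATGAAGCGCGCAAGACCG"  # 5' primer, SEQ ID NO 5
FIBER_3P = "CATAACGCGGCCGCTTCTTTATTCTTGGGC"  # 3' primer

# 0-based start of each engineered restriction site.
bamhi_idx = FIBER_5P.find("GGATCC")
noti_idx = FIBER_3P.find("GCGGCCGC")

# The text identifies the second ATG in the 5' primer as the initiation site.
first_atg = FIBER_5P.find("ATG")
second_atg = FIBER_5P.find("ATG", first_atg + 1)

# Base located 3 nucleotides 5' of the initiating ATG (the substituted position).
minus3_base = FIBER_5P[second_atg - 3]

print(f"BamHI at index {bamhi_idx}, NotI at index {noti_idx}")
print(f"Initiating ATG at index {second_atg}, -3 base is {minus3_base}")
```

The base found at the -3 position is the A that the text describes as replacing the native C.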
The amplified DNA fragment was inserted into the BamHI and NotI sites of pcDNA 3 (Invitrogen) to create the plasmid designated pCDNA3/Fiber having 7148 bp, the plasmid map of which is shown in Figure 3. The parent plasmid contained the CMV promoter, the bovine growth hormone (BGH) terminator and the gene for conferring neomycin resistance. The viral sequence included in this construct corresponds to nucleotides 31040-32791 of the genome.
The complete nucleotide sequence of pCDNA3/Fiber is listed in SEQ ID NO 7 where the nucleotide position 1 corresponds to approximately the middle of the pcDNA 3 vector sequence. The 5' and 3' ends of the fiber gene are located at respective nucleotide positions 916 with ATG and 2661 with TAA.
To enhance expression of fiber protein by the constitutive CMV promoter provided by the pcDNA vector, a BglII fragment containing the tripartite leader (TPL) of adenovirus type 2 was excised from pRDI12a (Sheay et al., BioTechniques, 15:856-862 (1993)) and inserted into the BamHI site of pCDNA3/Fiber to create the plasmid pCLF having 7469 bp, the plasmid map of which is shown in Figure 4. The adenovirus tripartite leader sequence, present at the 5' end of all major late adenoviral mRNAs as described by Logan et al., Proc. Natl. Acad. Sci., USA, 81:3655-3659 (1984) and Berkner, BioTechniques, 6:616-629 (1988), is encoded by three spatially separated exons corresponding to nucleotide positions 6071-6079 (the 3' end of the first leader segment), 7101-7172 (the entire second leader segment), and 9634-9721 (the third leader segment) in the adenovirus type 2 genome. The tripartite sequence, however, also shows correspondence with the Ad5 leader sequence having three spatially separated exons corresponding to nucleotide positions 6081-6089 (the 3' end of the first leader segment), 7111-7182 (the entire second leader segment), and 9644-9845 (the third leader segment and sequence downstream of that segment). The corresponding cDNA sequence of the tripartite leader sequence present in pCLF is listed in SEQ ID NO 8 bordered by BamHI/BglII 5' and 3' sites at respective nucleotide positions 907-912 to 1228-1233.
The pCLF plasmid has been deposited with the ATCC as described in Example 3. The complete nucleotide sequence of pCLF is listed in SEQ ID NO 8 where the nucleotide position 1 corresponds to approximately the middle of the pcDNA 3 parent vector sequence.
The 5' and 3' ends of the Ad5 fiber gene are located at respective nucleotide positions 1237-1239 with ATG and 2980-2982 with TAA. The rest of the vector construct has been described above.
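The plasmid coordinates reported for pCDNA3/Fiber and pCLF can be checked against one another. The sketch below assumes inclusive numbering and takes every number from the text; the names are illustrative. If the tripartite leader insertion is the only difference between the two plasmids, the shift in the fiber start codon should equal the difference in plasmid size.

```python
def inclusive_length(first, last):
    """Number of nucleotides from `first` to `last`, counting both ends."""
    return last - first + 1


# Fiber open reading frame, first base of ATG to last base of TAA.
orf_in_pcdna3_fiber = inclusive_length(916, 2661)
orf_in_pclf = inclusive_length(1237, 2982)

# Shift of the fiber start codon caused by inserting the leader fragment.
start_shift = 1237 - 916

# Difference in reported plasmid sizes (pCLF minus pCDNA3/Fiber).
size_difference = 7469 - 7148

# Codons in the reading frame, including the TAA stop codon.
codons = orf_in_pclf // 3
residues = codons - 1

print(orf_in_pcdna3_fiber, orf_in_pclf, start_shift, size_difference, residues)
```

The residue count derived this way matches the Ad5 fiber length implied elsewhere in the text, where the head domain is said to end at residue 581.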
C. Generation of an Adenovirus Packaging Cell Line Carrying Plasmids Encoding Functional E4 and Fiber Proteins
The 293 cell line was selected for preparing the first adenovirus packaging line as it already contains the E1 gene as prepared by Graham et al., J. Gen. Virol., 36:59-74 (1977) and as further characterized by Spector, Virol., 130:533-538 (1983). Before electroporation, 293 cells were grown in RPMI medium with 10% fetal calf serum. Four x 10^6 cells were electroporated with 20 µg each of pE4/Hygro DNA and pCLF DNA using a BioRad GenePulser and settings of 300 V, 25 µF. DNA for electroporation was prepared using the Qiagen system according to the manufacturer's instructions (Bio-Rad, Richmond, CA).
Following electroporation, cells were split into fresh complete DMEM with 10% fetal calf serum containing 200 µg/ml Hygromycin B (Sigma, St. Louis, MO).
From expanded colonies, genomic DNA was isolated using the "MICROTURBOGEN" system (Invitrogen) according to manufacturer's instructions. The presence of integrated E4 DNA was assessed by PCR using the primer pair E4R and ORF6L (5'TGCTTAAGCGGCCGCGAAGGAGAAGTCC3') (SEQ ID NO the latter of which is a forward primer near adenovirus 5 open reading frame 6. Refer to Figure 1 for position of the primers relative to the E4 genes.
One clone, designated 211, was selected that exhibited altered growth properties relative to those seen in the parent cell line 293. The 211 clone contained the expected product, indicating the presence of inserted DNA corresponding to most, if not all, of the E4 fragment contained in the pE4/Hygro plasmid. The 211 cell line has been deposited with the ATCC as described in Example 3. This line was further evaluated by amplification using the primer pair E4L/E4R described above, and a product corresponding to the full-length E4 insert was detected.
Genomic Southern blotting was performed on DNA restricted with EcoRI and BamHI. The E4 fragment was then detected at approximately one copy/genome compared to standards with the EcoRI/BamHI E4 fragment as cloned into pBS/E4 for use as a labeled probe with the Genius system according to manufacturer's instructions (Boehringer Mannheim, Indianapolis, IN). In DNA from the 211 cell line, the expected labeled internal fragment pE4/Hygro hybridized with the isolated E4 sequences. In addition, the probe hybridized to a larger fragment which may be the result of a second insertion event (Figure Although the 211 cell line was not selected by neomycin resistance, thus indicating the absence of fiber gene, to confirm the lack of fiber gene, the 211 cell line was analyzed for expression of fiber protein by indirect immunofluorescence with an anti-fiber polyclonal antibody and a FITC-labeled anti-rabbit IgG (KPL) as secondary. No immunoreactivity was detected. Therefore, to generate 211 clones containing recombinant fiber genes, the 211 clone was expanded by growing in RPMI medium and subjected to additional electroporation with the fiber-encoding pCLF plasmid as described above.
Following electroporation, cells were plated in DMEM with 10% fetal calf serum and colonies were selected with 200 µg/ml G418 (Gibco, Gaithersburg, MD). Positive cell lines remained hygromycin resistant. These candidate sublines of 211 were then screened for fiber protein expression by indirect immunofluorescence as described above. The three sublines screened, 211A, 211B and 211R, along with a number of other sublines, all exhibited nuclear staining qualitatively comparable to the positive control of 293 cells infected with AdRSVβgal (1 pfu/cell) and stained 24 hours post-infection.
Lines positive for nuclear staining in this assay were then subjected to Western blot analysis under denaturing conditions using the same antibody. Several lines in which the antibody detected a protein of the expected molecular weight (62 kd for the Ad5 fiber protein) were selected for further study including 211A, 211B and 211R. The 211A cell line has been deposited with ATCC as described in Example 3.
Western blot analysis using soluble nuclear extracts from these three cell lines and a semi-native electrophoresis system demonstrated that the fiber protein expressed is in the functional trimeric form characteristic of the native fiber protein as shown in Figure 6. The predicted molecular weight of a trimerized fiber is 186 kd. The lane marked 293 lacks fiber while the sublines contain detectable fiber. Under denaturing conditions, the trimeric form was destroyed, resulting in detectable fiber monomers as shown in Figure 6. Those clones containing endogenous E1, newly expressed recombinant E4 and fiber proteins were selected for use in complementing adenovirus gene delivery vectors having the corresponding adenoviral genes deleted as described in Example 2.
D. Preparation of an E1-Expressing Plasmid for Complementation of E1-Gene-Deleted Adenoviruses
In order to prepare adenoviral packaging cell lines other than those based on the E1-gene-containing 293 cell line as described in Example 1C above, plasmid vectors containing E1 alone or in various combinations with E4 and fiber genes are constructed as described below.
The region of the adenovirus genome containing the E1a and E1b gene is amplified from viral genomic DNA by PCR as previously described. The primers used are E1L, the 5' or forward primer, and E1R, the 3' or backward primer, having the respective nucleotide sequences 5'CCGAGCTAGCGACTGAAAATGAG3' (SEQ ID NO 10) and 5'CCTCTCGAGAGACAGCAAGACAC3' (SEQ ID NO 11). The E1L and E1R primers include the respective restriction sites NheI and XhoI as indicated by the underlines. The sites are used to clone the amplified E1 gene fragment into the NheI/XhoI sites in pMAM commercially available from Clontech (Palo Alto, CA) to form the plasmid pDEX/E1 having 11152 bp, the plasmid map of which is shown in Figure 7.
The complete nucleotide sequence of pDEX/E1 is listed in SEQ ID NO 12 where the nucleotide position 1 corresponds to approximately 1454 nucleotides from the 3' end of the pMAM backbone vector sequence. The pDEX/E1 plasmid includes nucleotides 552 to 4090 of the adenovirus genome positioned downstream (beginning at nucleotide position 1460 and ending at 4998 in the pDEX/E1 plasmid) of the glucocorticoid-inducible mouse mammary tumor virus (MMTV) promoter of pMAM. The pMAM vector contains the E. coli gpt gene that allows stable transfectants to be isolated using hypoxanthine/aminopterin/thymidine (HAT) selection. The pMAM backbone occupies nucleotide positions 1-1454 and 5005-11152 of SEQ ID NO 12.
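The coordinates reported for pDEX/E1 can be tallied to confirm that the segments account for the whole plasmid. The sketch below assumes inclusive numbering and uses only numbers given in the text; interpreting the unassigned positions as cloning-site junction nucleotides is an inference, not something the text states.

```python
def inclusive_length(first, last):
    """Number of nucleotides from `first` to `last`, counting both ends."""
    return last - first + 1


PLASMID_SIZE = 11152  # reported size of pDEX/E1

# E1 region: adenovirus genome coordinates and plasmid coordinates.
e1_in_genome = inclusive_length(552, 4090)
e1_in_plasmid = inclusive_length(1460, 4998)

# The two stretches of pMAM backbone in SEQ ID NO 12.
backbone = inclusive_length(1, 1454) + inclusive_length(5005, 11152)

# Positions assigned to neither E1 nor backbone (1455-1459 and 4999-5004).
unassigned = PLASMID_SIZE - backbone - e1_in_plasmid

print(e1_in_genome, e1_in_plasmid, backbone, unassigned)
```

The E1 span is the same length in genome and plasmid coordinates, and the few unassigned positions flank the insert on either side, where the NheI and XhoI junctions would be expected.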
E. Generation of an Adenovirus Packaging Cell Line Carrying Plasmids Encoding Functional E1 and Fiber Proteins
To create separate adenovirus packaging cell lines equivalent to that of the 211 sublines, 211A, 211B and 211R, as described in Example 1C, alternative cell lines lacking adenoviral genomes are selected for transfection with the plasmid constructs as described below. Acceptable host cells include A549, HeLa, Vero and the like cell lines as described in Example 1. The selected cell line is transfected with the separate plasmids, pDEX/E1 and pCLF, respectively for expressing E1 and fiber complementary proteins. Following transfection procedures as previously described, clones containing stable insertions of the two plasmids are isolated by selection with neomycin and HAT. Integration of a full-length copy of the E1 gene is assessed by PCR amplification from genomic DNA using the primer set E1L/E1R as described above. Functional insertion of the fiber gene is assayed by staining with the anti-fiber antibody as previously described.
The resultant stably integrated cell line is then used as a packaging cell system to complement adenoviral gene delivery vectors having the corresponding adenoviral gene deletions as described in Example 2.
F. Preparation of a Plasmid Containing Two or More Adenoviral Genes for Complementing Gene-Deleted Adenoviruses
The methods described in the preceding Examples rely on the use of two plasmids, pE4/Hygro and pCLF, or, pCLF and pDEX/E1, for generating adenoviral cell packaging systems. In alternative embodiments contemplated for use with the methods of this invention, complementing plasmids containing two or more adenoviral genes for expression of encoded proteins in various combinations are also prepared as described below. The resultant plasmids are then used in various cell systems with delivery plasmids having the corresponding adenoviral gene deletions. The selection of packaging cell, content of the delivery plasmids and content of the complementing plasmids for use in generating recombinant adenovirus viral vectors of this invention thus depends on whether other adenoviral genes are deleted along with the adenoviral fiber gene, and, if so, which ones.
1. Preparation of a Complementing Plasmid Containing Fiber and E1 Adenoviral Genes
A DNA fragment containing sequences for the CMV promoter, adenovirus tripartite leader, fiber gene and bovine growth hormone terminator is amplified from pCLF prepared in Example 1B using the forward primer 5'GACGGATCGGGAGATCTCC3' (SEQ ID NO 13), that anneals to the nucleotides 1-19 of the pCDNA3 vector backbone in pCLF, and the backward primer 5'CCGCCTCAGAAGCCATAGAGCC3' (SEQ ID NO 14) that anneals to nucleotides 1278-1257 of the pCDNA3 vector backbone. The fragment is amplified as previously described and then cloned into the pDEX/E1 plasmid, prepared in Example 1D.
For cloning in the DNA fragment, the pDEX/E1 vector is first digested with NdeI, that cuts at a unique site in the pMAM vector backbone in pDEX/E1, then the ends are repaired by treatment with bacteriophage T4 polymerase and dNTPs.
The resulting plasmid containing E1 and fiber genes, designated pE1/Fiber, provides both dexamethasone-inducible E1 function as described for pDEX/E1 and expression of fiber protein as described above. A schematic plasmid map of pE1/Fiber, having 14455 bp, is shown in Figure 8.
The complete nucleotide sequence of pE1/Fiber is listed in SEQ ID NO 15 where the nucleotide position 1 corresponds to approximately 1459 nucleotides from the 3' end of the parent vector pMAM sequence. The 5' and 3' ends of the Ad5 E1 gene are located at respective nucleotide positions 1460 and 4998 followed by pMAM backbone and then separated from the Ad5 fiber from pCLF by the filled-in blunt ended NdeI site. The 5' and 3' ends of the pCLF fiber gene fragment are located at respective nucleotide positions 10922-14223 containing elements as previously described for pCLF.
The resultant pE1/Fiber plasmid is then used to complement one or more delivery plasmids expressing E1 and fiber.
The pE1/Fiber construct is then used to transfect a selected host cell as described in Example 1E to generate stable chromosomal insertions performed as previously described followed by selection on HAT medium. The stable cells are then used as packaging cells as described in Example 2.
2. Preparation of a Complementing Plasmid Containing E4 and Fiber Adenoviral Genes
pCLF prepared as described in Example 1B is partially digested with BglII to cut only at the site in the pCDNA3 backbone. The pE4/Hygro plasmid prepared in Example 1A is digested with BamHI to produce a fragment containing E4. The E4 fragment is then inserted into the BamHI site of pCLF to form plasmid pE4/Fiber. The resultant plasmid provides expression of the fiber gene as described for pCLF and E4 function as described for pE4/Hygro.
A schematic plasmid map of pE4/Fiber, having 10610 bp, is shown in Figure 9. The complete nucleotide sequence of pE4/Fiber is listed in SEQ ID NO 16 where the nucleotide position 1 corresponds to approximately 14 bp from the 3' end of the parent vector pCDNA3 backbone sequence. The 5' and 3' ends of the Ad5 E4 gene are located at respective nucleotide positions 21 and 3149 followed by fused BglII/BamHI sites and pCDNA3 backbone including the CMV promoter again followed by BglII/BamHI sites. The adenovirus leader sequence begins at nucleotide position 4051 and extends to 4366 followed by fused BamHI/BglII sites and the 5' and 3' ends of the fiber gene located at respective nucleotide positions 4372 and 6124.
Stable chromosomal insertions of pE4/Fiber in host cells are obtained as described above.
Example 2 Preparation of Adenoviral Gene Delivery Vectors Using Adenoviral Packaging Cell Lines
Adenoviral delivery vectors of this invention are prepared to separately lack the combinations of E1/fiber and E4/fiber. Such vectors are more replication-defective than those previously in use due to the absence of multiple viral genes. A preferred adenoviral delivery vector of this invention that is replication competent but only via a non-fiber means is one that only lacks the fiber gene but contains the remaining functional adenoviral regulatory and structural genes. Furthermore, the adenovirus delivery vectors of this invention have a higher capacity for insertion of foreign DNA.
A. Preparation of Adenoviral Gene Delivery Vectors Having Specific Gene Deletions and Methods of Use
To construct the E1/fiber-deleted viral vector containing the LacZ reporter gene construct, two new plasmids were constructed. The plasmid pΔE1Bβgal was constructed as follows. By digestion of pSVβgal (Promega Corp., Madison, WI) with VspI, a DNA fragment containing the SV40 regulatory sequences and the E. coli β-galactosidase gene was isolated. The resulting fragment having overhanging ends was then filled in with Klenow fragment of DNA polymerase I in the presence of dNTPs followed by digestion with BamHI.
The resulting fragment was cloned into the EcoRV and BamHI sites in the polylinker of pΔE1sp1B (Microbix Biosystems, Hamilton, Ontario) to form pΔE1Bβgal that therefore contained the left end of the adenovirus genome with the E1a region replaced by the LacZ cassette (nucleotides 6690 to 4151) of pSVβgal. Plasmid DNA was prepared by the alkaline lysis method as described by Birnboim and Doly, Nuc. Acids Res., 7:1513-1523 (1978) from transformed cells used to expand the plasmid. DNA was then purified by CsCl-ethidium bromide density gradient centrifugation.
The second plasmid (pDV44), prepared as described herein, is derived from pBHG10, a vector prepared as described by Bett et al., Proc. Natl. Acad. Sci., USA, 91:8802-8806 (1994) and commercially available from Microbix, which contains an Ad5 genome with the packaging signals at the left end deleted and the E3 region (nucleotides 28133:30818) replaced by a linker with a unique site for the restriction enzyme PacI. An 11.3 kb BamHI fragment, which contains the right end of the adenovirus genome, is isolated from pBHG10 and cloned into the BamHI site of pBS/SK(+) to create plasmid p11.3 having approximately 14,658 bp. A schematic of the plasmid map is shown in Figure 13. The p11.3 plasmid was then digested with PacI and SalI to remove the fiber, E4, and inverted terminal repeat (ITR) sequences.
This fragment is replaced with a fragment containing the ITR segments and the E4 gene which is generated by PCR amplification from pBHG10 using the following oligonucleotide sequences (SEQ ID NO 17) (SEQ ID NO 18). These primers incorporate sites for PacI and BamHI, respectively. Cloning this fragment into the PacI and blunt ended SalI sites of the p11.3 backbone resulted in a substitution of the fused ITRs, E4 region and fiber gene present in pBHG10, by the ITRs and E4 region alone.
In general, the method for virus production by recombination of plasmids followed by complementation in cell culture involves the isolation of recombinant viruses by cotransfection of any one of the adenovirus packaging cell systems prepared in Example 1, namely 211A, 211B, 211R, A549, Vero cells, and the like, with plasmids carrying sequences corresponding to viral gene delivery vectors.
A selected cell line is plated in dishes and cotransfected with pDV44 and pΔE1Bβgal using the calcium phosphate method as described by Bett et al., Proc. Natl. Acad. Sci., USA, 91:8802-8806 (1994). Recombination between the overlapping adenovirus sequences in the two plasmids leads to the creation of a full-length viral chromosome where pDV44 and pΔE1Bβgal recombine to form a recombinant adenovirus vector having multiple deletions. The deletion of E1 and of the fiber gene from the viral chromosome is compensated for by the sequences integrated into the packaging cell genome, and infectious virus particles are produced. The plaques thus generated are isolated and stocks of the recombinant virus are produced by standard methods.
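The complementation principle described above, in which every gene deleted from the vector must be supplied by the packaging cell, can be expressed as a simple subset test. The sketch below is a simplified illustrative model and not part of the disclosed method: the gene sets for 293, 211 and 211A are taken from Example 1, regions that are not complemented in the text (such as E3) are ignored, and the function names are hypothetical.

```python
# Adenoviral functions supplied by each cell line, per Example 1.
PACKAGING_LINES = {
    "293": {"E1"},
    "211": {"E1", "E4"},
    "211A": {"E1", "E4", "fiber"},
}


def can_complement(cell_line, deleted_genes):
    """True if the cell line supplies every gene deleted from the vector."""
    return set(deleted_genes) <= PACKAGING_LINES[cell_line]


def suitable_lines(deleted_genes):
    """Names of all modeled cell lines able to complement the deletions."""
    return sorted(
        name
        for name, supplied in PACKAGING_LINES.items()
        if set(deleted_genes) <= supplied
    )


# The vector formed by recombination above lacks E1 and fiber.
print(suitable_lines({"E1", "fiber"}))
print(can_complement("293", {"E1", "fiber"}))
```

In this model only a line that expresses fiber in addition to E1 can propagate the E1/fiber-deleted vector, which is why the 211 sublines rather than the parental 293 cells are used.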
In a preferred embodiment of this invention, a delivery plasmid is prepared that does not require the above-described recombination events to prepare a therapeutic viral vector having a fiber gene deletion. A single delivery plasmid containing all the adenoviral genome necessary for packaging but lacking the fiber gene is prepared from plasmid pFG140 containing full-length Ad5 that is commercially available from Microbix. The resultant delivery plasmid referred to as pFG140-f is then used with pCLF stably integrated cells as described above to prepare a therapeutic viral vector lacking fiber. In a preferred aspect of this invention, the fiber gene is replaced with a therapeutic gene of interest for preparing a therapeutic delivery adenoviral vector.
Vectors for the delivery of any desired therapeutic gene are prepared by cloning the gene of interest into the multiple cloning sites in the polylinker of commercially available pΔE1sp1B (Microbix Biosystems), in an analogous manner as performed for preparing pΔE1Bβgal as described above. The same cotransfection and recombination procedure is then followed as described herein to obtain viral gene delivery vectors.
The recombinant viruses thus produced are used as gene delivery tools both in cultured cells and in vivo. For studies of the effectiveness and relative immunogenicity of multiply-deleted vectors, virus particles are produced by growth in the packaging lines described in Example 1 and are purified by CsCl gradient centrifugation. Following titering, virus particles are administered to mice via systemic or local injection or by aerosol delivery to lung. The LacZ reporter gene allows the number and type of cells which are successfully transduced to be evaluated. The duration of transgene expression is evaluated in order to determine the long-term effectiveness of treatment with multiply-deleted recombinant adenoviruses relative to the standard technologies which have been used in clinical trials to date. The immune response to the improved vectors described here is determined by assessing parameters such as inflammation, production of cytotoxic T lymphocytes directed against the vector, and the nature and magnitude of the antibody response directed against viral proteins.
Versions of the vectors which contain therapeutic genes such as CFTR for treatment of cystic fibrosis or tumor suppressor genes for cancer treatment are evaluated in the animal system for safety and efficiency of gene transfer and expression. Following this evaluation, they are used as experimental therapeutic agents in human clinical trials.
B. Retargeting of Adenoviral Gene Delivery Vectors by Producing Viral Particles Containing Different or Altered Fiber Proteins
As the specificity of adenovirus binding to target cells is largely determined by the fiber protein, viral particles that incorporate modified fiber proteins or fiber proteins from different adenoviral serotypes (pseudotyped vectors) have different specificities. Thus, the expression of the native Ad5 fiber protein in adenovirus packaging cells as described above is also applicable to production of different fiber proteins.
In one aspect of the invention, chimeric fiber proteins are produced according to the methods of Stevenson et al., J. Virol., 69:2850-2857 (1995). The authors showed that the determinants for fiber receptor binding activity are located in the head domain of the fiber and that isolated head domain is capable of trimerization and binding to cellular receptors. The head domains of adenovirus type 3 (Ad3) and Ad5 were exchanged in order to produce chimeric fiber proteins. Similar constructs for encoding chimeric fiber proteins for use in the methods of this invention are contemplated. Thus, instead of using the intact Ad5 fiber-encoding construct prepared in Example 1 as a complementing viral vector in adenoviral packaging cells, the constructs described herein are used to transfect cells along with E4 and/or E1-encoding constructs.
Briefly, full-length Ad5 and Ad3 were amplified from purified adenovirus genomic DNA as a template. The Ad5 and Ad3 nucleotide sequences are available with the respective GenBank Accession Numbers M18369 and M1241. Oligonucleotide primers are designed to amplify the entire coding sequence of the full-length fiber genes, starting from the start codon, ATG, and ending with the termination codon TAA. For cloning purposes, the 5' and 3' primers contain the respective restriction sites BamHI and NotI for cloning into pcDNA plasmid as described in Example 1A. PCR is performed as described above.
The resultant products are then used to construct chimeric fiber constructs by PCR gene overlap extension, as described by Horton et al., BioTechniues, 8:525-535 (1990). The fiber tail and shaft regions (5TS; the nucleotide region encoding amino acid residue positions 1 to 403) are connected to the Ad3 fiber head region (3H; the nucleotide region encoding amino acid residue positions 136 to 319) to form the 5TS3H fiber chimera.
Conversely, the Ad3 fiber tail and shaft regions (3TS; the nucleotide region encoding amino acid residue positions 1 to 135) are connected to the Ad5 fiber head region (5H; the nucleotide region encoding amino acid residue positions 404 to 581) to form the 3TS5H fiber chimera. The fusions are made at the conserved TLWT (SEQ ID NO 21) sequence at the fiber shaft-head junction.
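The residue ranges given for the tail/shaft and head segments translate directly into nucleotide coordinates within each fiber coding sequence. The following sketch works through that arithmetic; it assumes 1-based numbering with the A of the ATG start codon as nucleotide 1 and ignores the stop codon, and the function names are hypothetical.

```python
def codon_span(first_residue: int, last_residue: int) -> tuple[int, int]:
    """1-based inclusive nucleotide span encoding residues first..last,
    counting the A of the ATG start codon as nucleotide 1 (an assumption)."""
    start = 3 * (first_residue - 1) + 1
    end = 3 * last_residue
    return start, end


def residue_count(first_residue: int, last_residue: int) -> int:
    """Number of residues in an inclusive range."""
    return last_residue - first_residue + 1


# Residue ranges exactly as stated in the text.
RANGES = {
    "5TS": (1, 403),    # Ad5 tail and shaft
    "3H": (136, 319),   # Ad3 head
    "3TS": (1, 135),    # Ad3 tail and shaft
    "5H": (404, 581),   # Ad5 head
}

if __name__ == "__main__":
    for name, (first, last) in RANGES.items():
        print(name, "nucleotides", codon_span(first, last),
              "residues", residue_count(first, last))
    # Lengths of the two chimeras, in residues.
    print("5TS + 3H:", residue_count(*RANGES["5TS"]) + residue_count(*RANGES["3H"]))
    print("3TS + 5H:", residue_count(*RANGES["3TS"]) + residue_count(*RANGES["5H"]))
```

On these numbers the Ad5 tail/shaft joined to the Ad3 head gives a 587-residue chimera, and the converse construct gives 313 residues.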
The resultant chimeric fiber PCR products are then digested with BamHI and NotI for separate directional ligation into a similarly digested pcDNA vector. The Ad2 leader sequence is then subcloned into the BamHI site as described in Example 1A for preparing an expression vector for subsequent transfection into 211 cells as described above or into the alternative packaging cell systems as previously described. The resultant chimeric fiber construct-containing adenoviral packaging cell lines are then used to complement adenoviral delivery vectors as previously described. Other fiber chimeric constructs are obtained using a similar approach with the various adenovirus serotypes known.
In an alternative embodiment, the methods of this invention contemplate the use of the modified proteins including novel epitopes as described by Michael et al., Gene Therapy, 2:660-668 (1995) and in International Publication WO 95/26412, the disclosures of which are incorporated by reference herein. Both publications describe the construction of a cell-type specific therapeutic viral vector having a new binding specificity incorporated into the virus concurrent with the destruction of the endogenous viral binding specificity. In particular, the authors described the production of an adenoviral vector encoding a gastrin releasing peptide (GRP) at the 3' end of the coding sequence of the Ad5 fiber gene. The resulting fiber-GRP fusion protein was expressed and shown to assemble functional fiber trimers that were correctly transported to the nucleus of HeLa cells following synthesis.
Based on the teachings in the paper and International Publication, similar constructs are contemplated for use in the complementing adenoviral packaging cell systems of this invention for generating new adenoviral gene delivery vectors that are replication-deficient and less immunogenic. Heterologous ligands contemplated for use herein to redirect fiber specificity range from as few as 10 amino acids in size to large globular structures, some of which necessitate the addition of a spacer region so as to reduce or preclude steric hindrance between the heterologous ligand and the fiber, or interference with trimerization of the fiber protein. The ligands are inserted at the end or within the linker region. Preferred ligands include those that target specific cell receptors or those that are used for coupling to other moieties such as biotin and avidin. The type of cell signaling that results from binding by a ligand depends upon the specificity of that ligand, i.e., receptor internalization or lack thereof.
A preferred spacer includes a short 12 amino acid peptide linker composed of a series of serines and alanines flanked by a proline residue at each end. One of ordinary skill in the art is familiar with the preparation of linkers to accomplish sufficient protein presentation and for altering the binding specificity of the fiber protein without compromising the cellular events that follow viral internalization. Moreover, within the context of this invention, preparation of modified fibers having ligands positioned internally within the fiber protein, at the amino terminus and at the carboxy terminus as described below are contemplated for use with the methods described herein.
A fiber having a heterologous binding ligand is prepared essentially as described in the above-cited paper. Briefly, for the ligand of choice, site-directed mutagenesis is used to insert the coding sequence for a linker into the NotI site at the 3' end of the Ad5 fiber construct in pCLF as prepared in Example 1. The 3' or antisense oligonucleotide encodes a preferred linker sequence of ProSerAlaSerAlaSerAlaSerAlaProGlySer (SEQ ID NO 22) followed by a unique restriction site and two stop codons, respectively, to allow the insertion of a coding sequence for a selected heterologous ligand and to ensure proper translation termination. The 3' end of the antisense oligonucleotide includes sequences that overlap with the vector sequence into which the oligonucleotide is inserted via site-directed mutagenesis. Following mutagenesis of the pCLF sequence adding the linker and stop codon sequences, a nucleotide sequence encoding a preselected ligand is obtained, linkers corresponding to the unique restriction site are attached, and the sequence is then cloned into the corresponding linearized restriction site.
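As a quick consistency check on the linker given as SEQ ID NO 22, the three-letter string can be split into residues, counted, and converted to one-letter code. This is a minimal sketch; the helper name is hypothetical, and the nucleotide count is simply three bases per residue rather than any particular codon choice.

```python
# Three-letter to one-letter codes for the four residues used in the linker.
THREE_TO_ONE = {"Pro": "P", "Ser": "S", "Ala": "A", "Gly": "G"}

# Linker exactly as written in the text (SEQ ID NO 22).
LINKER = "ProSerAlaSerAlaSerAlaSerAlaProGlySer"


def split_residues(three_letter: str) -> list[str]:
    """Split a run-together three-letter amino acid string into residues."""
    if len(three_letter) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return [three_letter[i:i + 3] for i in range(0, len(three_letter), 3)]


RESIDUES = split_residues(LINKER)
ONE_LETTER = "".join(THREE_TO_ONE[res] for res in RESIDUES)
CODING_NT = 3 * len(RESIDUES)  # bases needed to encode the linker alone

if __name__ == "__main__":
    print(len(RESIDUES), "residues:", ONE_LETTER)
    print("coding length:", CODING_NT, "nt (before restriction site and stop codons)")
```

The count confirms the 12-residue length stated for the preferred spacer.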
Into the resultant pCLF vector containing an Ad5 fiber gene sequence with 3' nucleotides encoding a linker and a ligand, the Ad2 leader sequence is inserted as previously described. The resultant fiber-ligand construct is then used to transfect 211 or the alternative cell packaging systems previously described to produce complementing viral vector packaging systems for use with the methods of this invention.
In a further embodiment, fiber proteins encoded by fiber genes isolated from different adenoviral serotypes are used intact for transfection into 211 or an alternative cell packaging system as previously described.
A gene encoding the fiber protein of interest is first cloned to create a plasmid analogous to pCLF, and stable cell lines producing the fiber protein are generated as described above for Ad5 fiber. The adenovirus vector described which lacks the fiber gene is then propagated in the cell line producing the fiber protein relevant for the purpose at hand. As the only fiber gene present is the one in the packaging cells, the adenoviruses produced contain only the fiber protein of interest and therefore have the binding specificity conferred by the complementing protein. Such viral particles are used in studies such as those described above to determine their properties in experimental animal systems.
C. Targeted Gene Delivery Using Viral Vector Particles Lacking Fiber Protein
An alternative mode of entry for adenoviral infection of hematopoietic cells has been described by Huang, et al., J. Virol., 69:2257-2263 (1995) which does not involve the fiber protein-host cell receptor interaction. As infection of most other cell types does require the presence of fiber protein, vector particles which lack fiber may preferentially infect hematopoietic cells, such as monocytes or macrophages.
To produce a fiber-free adenovirus vector particle, a vector lacking the fiber gene as described above in Example 2A but containing a gene of interest for delivery is amplified by growth in cells which do not produce a fiber protein, such as the 211 cells prepared in Example 1, thereby producing large numbers of particles lacking fiber protein. The recovered fiber-free viral particles are then used to deliver the inserted gene of interest following the methods of this invention via targeting mechanisms provided by other regions of the adenoviral vector, i.e., via the native penton base.
Example 3 Deposit of Materials
The following cell lines and plasmids were deposited on September 25, 1996, with the American Type Culture Collection, 1301 Parklawn Drive, Rockville, MD, USA (ATCC):

Material             ATCC Accession No.
Plasmid pE4/Hygro    97739
Plasmid pCLF         97737
211 Cell Line        CRL-12193
211A Cell Line       CRL-12194

The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the invention. The present invention is not to be limited in scope by the cell lines and plasmids deposited, since the deposited embodiment is intended as a single illustration of one aspect of the invention and any cell lines or plasmid vectors that are functionally equivalent are within the scope of this invention.
The foregoing specification, including the specific embodiments and examples, is intended to be illustrative of the present invention and is not to be taken as limiting. Numerous other variations and modifications can be effected without departing from the true spirit and scope of the present invention.
Example 4
The native fiber protein is a homotrimer (Henry L.J. et al 1994 Characterization of the knob domain of the adenovirus Type 5 fiber protein expressed in Escherichia coli. J. Virol 68:5239-5246), and trimerization is essential for assembly of the penton/fiber complex (Novelli A et al 1991 Assembly of adenovirus type 2 fiber synthesized in cell-free translation system. J. Biol. Chem 266:9299-9303). To assess the multimeric structure of the recombinant fiber protein produced by the cell lines, cells were labeled with 50 µCi/ml Translabel (ICN) for two hours at 37 °C, lysed in RIPA buffer, and fiber protein was immunoprecipitated as described (Harlow E et al 1988 Antibodies. Cold Spring Harbor Laboratory, Cold Spring Harbor). Immune complexes were collected on Protein A-Sepharose beads (Pierce), extensively washed with RIPA buffer, and incubated at room temperature in 0.1 M triethylamine, pH 11.5 to release bound fiber protein. A portion of the precipitated fiber was electrophoresed on an 8% SDS-PAGE gel under denaturing (SDS in loading buffer, samples boiled for 5 minutes) or semi-native (SDS in loading buffer, samples not heated) conditions.
As seen in Fig. 13, lines 211A, 211B, and 211R, but not the control 293 cells, expressed an immunologically reactive protein which migrated at the predicted molecular weight for trimer (186 kD) under semi-native conditions and for monomer (62 kD) under denaturing conditions. The behavior of the precipitated fiber was indistinguishable from that of purified baculovirus-produced recombinant Ad2 fiber (Wickham T et al 1993 Cell 73:309-319) (the 58 kD Ad2 and 62 kD Ad5 fibers have very similar mobilities under these conditions).
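The predicted trimer size follows directly from the monomer size, since the native fiber is a homotrimer. The short sketch below spells out that arithmetic for the two monomer masses quoted in the text; the function name is hypothetical, and the calculation is a simple multiple that does not account for any change in gel mobility caused by folding.

```python
def homotrimer_mass_kd(monomer_kd: float, subunits: int = 3) -> float:
    """Predicted mass of a homo-oligomer as a simple multiple of the monomer."""
    return monomer_kd * subunits


# Monomer masses quoted in the text, in kD.
AD5_FIBER_KD = 62
AD2_FIBER_KD = 58

if __name__ == "__main__":
    print("Ad5 fiber trimer:", homotrimer_mass_kd(AD5_FIBER_KD), "kD")
    print("Ad2 fiber trimer:", homotrimer_mass_kd(AD2_FIBER_KD), "kD")
```

The Ad5 value matches the 186 kD figure given for the semi-native band.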
To determine whether the fiber-expressing lines could support the growth of a fiber-defective adenovirus, we performed one-step growth experiments using the temperature-sensitive fiber mutant Ad H5ts142 (the gift of Harold Ginsberg). At the restrictive temperature (39.5 °C), this mutant produces an underglycosylated fiber protein which is not incorporated into mature virions (Chee-Sheung C. C et al 1982 J. Virol 42: 932-950). This results in the accumulation of non-infectious viral particles. We asked whether the recombinant fiber protein expressed by our cell lines could complement the H5ts142 defect and rescue viral growth.
Cell lines 293, 211A, 211B and 211R (2 x 10^6 cells/sample) were infected with H5ts142 at pfu/cell. 48 hours later, cells were detached with 25 mM EDTA and virus was harvested by four rapid freeze-thaw cycles. Debris was removed by a 10 minute spin at 1500 x g, and viral titers were determined by fluorescent focus assay (Thiel J.F et al 1967 Proc. Soc. Exp. Biol. Med. 125:892-895) on SW480 cells with a polyclonal anti-penton base Ab (Wickham T et al 1993 Cell 73:309-319). As shown in Fig. 14, the fiber mutant virus replicated to high titers in 293 cells at 32.5 °C (the permissive temperature), but to a much lower extent at the restrictive temperature of 39.5 °C. The fiber-producing packaging lines 211A, 211B, and 211R supported virus production at 39.5 °C to levels within two- to three-fold of those seen at the permissive temperature in 293 cells, indicating that these cells provided partial complementation of the fiber defect.
Interestingly, virus yields from the fiber-producing cell lines were also somewhat higher than those from 293 cells at 32.5 °C (the 'permissive' temperature). This suggests that fiber produced by the ts142 virus may be partially defective even at the permissive temperature. Alternatively, a nonspecific increase in adenoviral titer could result when viruses are grown in our packaging cells, by a mechanism not involving fiber complementation. However, we have found that viruses with wild type fiber genes (such as Ad.RSVbgal) replicate to identical levels either in our packaging lines or in 293 cells (data not shown). Taken together, these results demonstrate that the observed increase in H5ts142 growth is due to specific complementation of the fiber mutation.
Even in the fiber-expressing cell lines, the fiber mutant grows to higher titers at 32 °C than at 39.5 °C. This incomplete complementation may be due to the packaging lines' expression of fiber at a level somewhat below that seen in a wild-type infection (Fig. 16). A recent study reported an E4-deleted vector which coincidentally reduced fiber protein expression, resulting in a large reduction in the titer of virus produced (Brough et al 1996 J. Virol. 70: 6497-6501). Another possibility is that the defective ts142 fiber protein produced at the restrictive temperature might form complexes with some of the wild type protein produced by the cells and prevent its assembly into particles.
Although the fiber proteins of different Ad serotypes differ in the length of their shaft domains and in their receptor-binding knob domains, the N-terminal regions responsible for interaction with the viral penton base are highly conserved (Arnberg N et al 1997 Virology 227:239-244) (Fig. ). This suggests that fibers from many viral serotypes, with their different cell-binding specificities, may be amenable for use in producing gene delivery vectors.
We asked whether the recombinant Ad5 fiber produced by our packaging cells could be incorporated into particles of another adenovirus serotype. Adenovirus type 3 was grown either in fiber-producing cell lines or in 293 cells. Viral particles were purified by two sequential centrifugations (3 h at 111,000 x g) on preformed 15-40% CsCl gradients to remove soluble cellular proteins and then dialyzed extensively against 10 mM Tris-HCl, pH 8.1, 150 mM NaCl, 10% glycerol.
Ad5 fiber protein was detected by immunoblotting using the polyclonal anti-fiber serum, followed by detection with a horseradish peroxidase-conjugated goat anti-rabbit antibody (Kirkegaard and Perry Laboratories) and the ECL chemiluminescence substrate (Amersham). The purified Ad3 particles contained Ad5 fiber protein after a single passage through a fiber-expressing cell line but not after passage through 293 cells (Fig. 15B). Previous work has demonstrated that Ad2 fiber is capable of interacting in vitro with Ad3 penton base (Fender et al 1997 Nature Biotech. 15:52-56), and our result demonstrates that the type 5 fiber protein produced by the cells is capable of assembling into complete Ad3 particles.
A vector based on Ad5 but containing the gene for the Ad7 fiber protein has been described (Gall J. et al 1996 J. Virol. 70:2116-2123), as well as Ads containing chimeric fiber genes (Krasnykh V. N et al J. Virol. 70:6839-6846). Addition of a short peptide linker to the fiber in order to confer binding to a different cellular protein has also been reported. By using packaging technology such as that presented here, Ad vectors equipped with different fiber proteins may be produced simply by growth in cells expressing the fiber of interest, without the time-consuming step of generating a new vector genome for each application.
Replacing or modifying the fiber gene in the vector chromosome would also require that the new fiber protein bind a receptor on the surface of the cells in which it is to be grown. The packaging cell approach will allow the generation of Ad particles containing a fiber which can no longer bind to its host cells, by a single round of growth in cells expressing the desired fiber gene. This will greatly expand the repertoire of fiber proteins which can be incorporated into particles, as well as simplifying the process of retargeting gene delivery vectors.
Finally, a novel fiber-independent pathway of infection has recently been described in hematopoietic cells, in which penton base provides the initial virus-cell interaction by binding to integrin αMβ2 (Huang S. et al 1996 J. Virol 70: 4502-4508). This suggests that viral particles lacking fiber protein may be useful in targeting gene delivery to specific cell types via this pathway.
SEQUENCE LISTING
GENERAL INFORMATION:
APPLICANT: Nemerow, Glen R.
Von Seggern, Daniel J.
(ii) TITLE OF INVENTION: PACKAGING CELL LINES, ADENOVIRUS VECTORS, AND METHODS OF USING SAME
(iii) NUMBER OF SEQUENCES:
(iv) CORRESPONDENCE ADDRESS:
ADDRESSEE: THE SCRIPPS RESEARCH INSTITUTE
STREET: 10550 North Torrey Pines Road
CITY: La Jolla
STATE: California
COUNTRY: United States
ZIP: 92037
COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release Version #1.25
(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER: US
FILING DATE: 25-SEP-1996
CLASSIFICATION:
(viii) ATTORNEY/AGENT INFORMATION:
NAME: Logan, April C.
REGISTRATION NUMBER: 33,950 REFERENCE/DOCKET NUMBER: TSRI 554.0 (ix) TELECOMMUNICATION INFORMATION: TELEPHONE: (619) 554-2937 -61- TELEFAX: (619) 554-6312 INFORMATION FOR SEQ ID NO:1: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO S(iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: CGGTACACAG AATTCAGGAG ACACAACTCC INFORMATION FOR SEQ ID NO:2: SEQUENCE CHARACTERISTICS: LENGTH: 35 base pairs TYPE: nucleic acid S(C) STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO -62- (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: GCCTGGATCC GGGAAGTTAC GTAACGTGGG AAAAC INFORMATION FOR SEQ ID NO:3: SEQUENCE CHARACTERISTICS: LENGTH: 12 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: o* CGCGGATCCG CG 12 INFORMATION FOR SEQ ID NO:4: SEQUENCE CHARACTERISTICS: LENGTH: 8710 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: circular (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO 53- (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: CACCTAAATT GTAAGCGTTA ATATTTTGTT AAAATTCGC!G TTAAATTTTT GTTAAATCAG CTCATTTTTT AACCAATAGG CCGAAATCGG CAAAATCCCT TATAAATCAA AAGAATAGAC 120 CGAGATAGGG TTGAGTGTTG TTCCAGTTTG GAALCAAGAGT CCACTATTAA AGAACGTGGA 180 CTCCAACGTC AAAGGGCGAA AAACCGTCTA TCAGGGCGAT GGCCCACTAC GTGAACCATC 240 ACCCTAATCA AGTTTTTTGG GGTCGAGGTG CCGTAAAGCA C-TAAATCGGA ACCCTAAAGG 300 GAGCCCCCGA TTTAGAGCTT GACGGGGAAA GCCGGCGA.AC GTGGCGAGAA AGGAAGGGAA 360 GAAAGCGAAA GGAGCGGGCG CTAGGGCGCT GGCAAGTGTA GCGGTCACGC TGCGCGTAAC 420 V0*.
0.000* CACCACACCC GCCGCGCTTA ATGCGCCGCT ACAGGGCGCG TCCCATTCGC CATTCAGGCT 480 GCGCAACTGT TGGGAAGGGC GATCGGTGCG GGCCTCTTCG CTATTACGCC AGCTGGCGAA 540 AGGGGGATGT GCTGCAAGGC GATTAAGTTG GGTAACGCC!A GGGTTTTCCC AGTCACGACG 600 TTGTAAAACG ACGGCCAGTG .AATTGTAATA CGACTCACTA TAGGGCGAAT TGGGTACCGG 660 GCCCCCCCTC GAGGTCGACG GTATCGATAA GCTTGATATC GAATTCAGGA GACACAACTC 720 -64- 0 .0 0 0
CAAGTGCATA
780
TATTTGCCAC
840
TTATGTTTCA
900
AGTATAGCCC
960
ACCCTAGTAT
1020
CGGCTGGCCT
1080
CACACGGTTT
1140
TCACTTAAGT
1200
TGCTTAACGG
1260
ATCAGGATAG
1320
GTCCTGCAGG
1380
ATAAGGCGCC
1440
TAACTGCAGC
1500 CTCTATGTCA TTTTCATGGG ACTGGTCTGG CCACAACTAC ATTAATGAAA ATCCTCTTAC ACTTTTTCAT ACATTGCCCA AGAATAAAGA ATCGTTTGTG ACGTGTTTAT TTTTCAATTG CAGAAAATTT CAAGTCATTT TTCATTCAGT CACCACCACA TAGCTTATAC AGATCACCGT ACCTTAATCA AACTCACAGA TCAACCTGCC ACCTCCCTCC CAACACACAG AGTACACAGT CCTTTCTCCC TAA.AAAGCAT CATATCATGG GTAACAGACA TATTCTTAGG TGTTATATTC CCTGTCGAGC CAAACGCTCA TCAGTGATAT TAATAAACTC CCCGGGCAGC TCATGTCGCT GTCCAGCTGC TGAGCCACAG GCTGCTGTCC AACTTGCGGT GCGGCGAAGG AGAAGTCCAC GCCTACATGG GGGTAGAGTC ATAATCGTGC GGCGGTGGTG CTGCAGCAGC GCGCGAATAA ACTGCTGCCG CCGCCGCTCC AATACAACAT GGCAGTGGTC TCCTCAGCGA TGATTCGCAC CGCCCGCAGC TTGTCCTCCG GGCACAGCAG CGCACCCTGA TCTCACTTAA ATCAGCACAG ACAGCACCAC AATATTGTTC AAAATCCCAC AGTGCAAGGC GCTGTATCCA *0 0 00 0 0 9 00 0 0S 9* I 0006 0000
AAGCTCATGG
1560
AAGTGGCGAC
1620
TTCACCACCT
1680
CTAXACCAGC
1740
CAATGACAGT
1800
ATGTTGGCAC
1860
GTTAGAACCA
1920
GGAAGACCTC
1980
GGATGATCCT
2040
CTGTACGGAG
2100
ACGCCGGACG
2160
GCGTCTCCGG
2220
CAAAGCATCC
2280 CGGGGACCAC AGAACCCACG TGGCCATCAT ACCACAAGCG CAGGTAGATT CCCTCATAAA CACGCTGGAC ATAAACATTA CCTCTTTTGG CATGTTGTAA CCCGGTACCA TATAAACCTC TGATTAAACA TGGCGCCATC CACCACCATC TGGCCAAAAC CTGCCCGCCG GCTATACACT GCAGGGAACC GGGACTGGAA GGAGAGCCCA GGACTCGTAA CCATGGATCA TCATGCTCGT CATGATATCA AACACAGGCA CACGTGCATA CACTTCCTCA GGATTACAAG CTCCTCCCGC TATCCCAGGG AACAACCCAT TCCTGAATCA GCGTAAATCC CACACTGCAG GCACGTAACT CACGTTGTGC ATTGTCAAAG TGTTACATTC GGGCAGCAGC CCAGTATGGT AGCGCGGGTT TCTGTCTCAA AAGGAGGTAG ACGATCCCTA TGCGCCGAGA CAACCGAGAT CGTGTTGGTC GTAGTGTCAT GCCAAATGGA TAGTCATATT TCCTGAAGCA AAACCAGGTG CGGGCGTGAC AAACAGATCT TCTCGCCGCT TAGATCGCTC TGTGTAGTAG TTGTAGTATA TCCACTCTCT AGGCGCCCCC TGGCTTCGGG TTCTATGTAA ACTCCTTCAT GCGCCGCTGC -66-
CCTGATAACA
2340
CGAGTCACAC
2400
AAAGATTATC
2460
CGTGGTCAAA
2520
CTTCCAAALAG
2580
GAATCTCCTC
2640
ACCTTCTCAA
2700
GCTCCAGAGC
2760
TTCCTCACAG
2820
TAGGTCCCTT
2880
CACTTCCCCG
2940
AGCTATGCTA
3000
GCAAGGTGCT
3060 TCCACCACCG CAGAATAAGC CACACCCAGC CAACCTACAC ATTCGTTCTG ACGGGAGGAG CGGGAAGAGC TGGAAGAACC ATGTTTTTTT TTTTATTCCA CAAAACCTCA AAATGAAGAT CTATTAAGTG AACGCGCTCC CCTCCGGTGG CTCTACAGCC AAAGAACAGA TAATGGCATT TGTAAGATGT TGCACAATGG GCAA.ACGGCC CTCACGTCCA AGTGGACGTA AAGGCTAAAC CCTTCAGGGT TATAAACATT CCAGCACCTT CAACCATGCC CAAATAATTC TCATCTCGCC TATATCTCTA AGCAAATCCC GAATATTAAG TCCGGCCATT GTAAAAATCT GCCCTCCACC TTCAGCCTCA AGCAGCGAAT CATGATTGCA AAAATTCAGG ACCTGTATAA GATTCAAAAG CGGAACATTA ACAAAAA.TAC CGCGATCCCG CGCAGGGCCA GCTGAACATA ATCGTGCAGG TCTGCACGGA CCAGCGCGGC CCAGGAACCT TGACAAAAGA ACCCACACTG ATTATGACAC GCATACTCGG ACCAGCGTAG CCCCGATGTA AGCTTTGTTG CATGGGCGGC GATATAAAAT GCTCAAAAAA TCAGGCAAAG CCTCGCGCAA AAAAGAAAGC ACATCGTAGT 67-
CATGCTCATG
3120
TTCTCTCAAA
3180
TTAAACATTA
3240
TACGGCCATG
3300
CAGCTCCTCG
3360
CATCGGTCAG
3420
GAGACAACAT
3480
AAACACCTGA
3540
ACAGCGCTTC
3600
AAAAAACACC
3660
GCAGAGCGAG
3720
CCAGAAAACC
3780
CAAATCGTCA
3840 CAGATAAAGG CAGGTAAGCT CCGGAACCAC CACAGAAAAA GACACCATTT CATGTCTGCG GGTTTCTGCA TAAACACAAA ATAAAATAAC AAAAAAAcAT GAAGCCTGTC TTACAACAGG AAAAACAACC CTTATAAGCA TAAGACGGAC CCGGCGTGAC CGTAAAAAAA CTGGTCACCG TGATTAAAAA GCACCACCGA GTCATGTCCG GAGTCATAAT GTAAGACTCG GTAAACACAT CAGGTTGATT TGCTAAAAAG CGACCGAAAT AGCCCGGGGG AATACATACC CGCAGGCGTA TACAGCCCCC ATAGGAGGTA TAACAAAATT AATAGGAGAG AAAAACACAT AAAACCCTCC TGCCTAGGCA AAATAGCACC CTCCCGCTCC AGAACAACAT ACAGCGGCAG CCTAACAGTC AGCCTTACCA GTAAAAAAGA AAACCTATTA ACTCGACACG GCACCAGCTC AATCAGTCAC AGTGTAAAAA AGGGCCAAGT TATATATAGG ACTAAAAAAT GACGTAACGG TTAk-AGTCCA CAAAAAACAC GCACGCGAAC CTACGCCCAG AAACGAAAGC CAAAAAACCC ACAACTTCCT CTTCCGTTTT CCCACGTTAC GTAACTTCCC GGATCCGCGG CATTCACAGT -68 TCTCCGCAAG AATTGATTGG CTCCAATTCT TGGAGTGGTG AATCCGTTAG CGAGGTGCCG 3900
CCGGCTTCCA
3960
CAGACAAGGT
4020
GCGGCATAAA
4080
GCGAGCGATC
4140
TGCAACGCGG
4200
CAGCCTCGCG
4260
TTCATCCCCG
4320
CATGTCTTTA
4380
AACACGCAGA
4440
CGTGTGGCCT
4500
CCGCAGATCC
4560
AGTTTCTGAT
4620 TTCAGGTCGA GGTGGCCCGG CTCCATGCAC CGCGACGCAA CGCGGGGAGG ATAGGGCGGC GCCTACAATC CATGCCAACC CGTTCCATGT GCTCGCCGAG TCGCCGTGAC GATCAGCGGT CCAGTGATCG AAGTTAGGCT GGTAAGAGCC CTTGAAGCTG TCCCTGATGG TCGTCATCTA CCTGCCTGGA CAGCATGGCC GCATCCCGAT GCCGCCGGAA GCGAGAAGAA TCATAATGGG GAAGGCCATC TCGCGAACGC CAGCAAGACG TAGCCCAGCG CGTCGGCCGC CATGCCCTGC TGGCCCGTTG CTCGCGTTTG CTGGCGGTGT CCCCGGAAGA AATATATTTG GTTCTATGAT GACACAAACC CCGCCCAGCG TCTTGTCATT GGCGAATTCG TGCAGTCGGG GCGGCGCGGT CCCAGGTCCA CTTCGCATAT TAAGGTGACG CGAACACCGA GCGACCCTGC AGCGACCCGC TTAACAGCGT CAACAGCGTG CGGGCAATGA GATATGAAAA AGCCTGAACT CACCGCGACG TCTGTCGAGA CGAAAAGTTC GACAGCGTCT CCGACCTGAT GCAGCTCTCG GAGGGCGA6AG 69
AATCTCGTGC
4680
GCGCCGATGG
4740
CGATTCCGGA
4800
GCCGTGCACA
4860
AGCCGGTCGC
4920
TCGGCCCATT
4980
CGATTGCTGA
5040
CCGTCGCGCA
5100
ACCTCGTGCA
5160
CGGTCATTGA
5220
TCTTCTGGAG
5280
ATCCGGAGCT
5340 TTTCAGCTTC GATGTAGGAG GGCGTGGATA TGTCCTGCGG GTAAATAGCT TTTCTACAAA GATCGTTATG TTTATCGGCA CTTTGCATCG GCCGCGCTCC AGTGCTTGAC ATTGGGGAAT TCAGCGAGAG CCTGACCTAT TGCATCTCCC GGGTGTCACG TTGCAAGACC TGCCTGAAAC CGAACTGCCC GCTGTTCTGC GGAGGCCATG GATGCGATCG CTGCGGCCGA TCTTAGCCAG ACGAGCGGGT CGGACCGCAA GGAATCGGTC AATACACTAC ATGGCGTGAT TTCATATGCG TCCCCATGTG TATCACTGGC AAACTGTGAT GGACGACACC GTCAGTGCGT GGCTCTCGAT GAGCTGATGC TTTGGGCCGA GGACTGCCCC GAAGTCCGGC CGCGGATTTC GGCTCCAACA ATGTCCTGAC GGACAATGGC CGCATAACAG CTGGAGCGAG GCGATGTTCG GGGATTCCCA ATACGAGGTC GCCAACATCT GCCGTGGTTG GCTTGTATGG AGCAGCAGAC GCGCTACTTC GAGCGGAGGC TGCAGGATCG CCGCGGCTCC GGGCGTATAT GCTCCGCATT GGTCTTGACC AACTCTATCA GAGCTTGGTT GACGGCAATT TCGATGATGC AGCTTGGGCG CAGGGTCGAT 5400 GCGACGCAAT CGTCCGATCC GGAGCCGGGA CTGTCGGGCG TACACAAATC GCCCGCAGAA 5460 0*
GCGCGGCCGT
5520
CCAGCACTCG
5580
AGGAGACAAT
5640
ACGGGTGTTG
5700
GATACCCCAC
5760
CACCCCCCAA
5820
GCCATAGCCA
5880
TCGTGGGGGT
5940
CAGACCCATG
6000
CCGGGCGTCT
6060
CGCCCAGTGC
6120
GCCCTGACGG
6180 CTGGACCGAT GGCTGTGTAG AAGTACTCGC CGATAGTGGA AACCGACGCC TCCGAGGGCA AAGGAATAGG GGAGATGGGG GAGGCTAACT GAAACACGGA ACCGGAAGGA ACCCGCGCTA TGACGGCAAT AAAAAGACAG AATAAAACGC GGTCGTTTGT TCATAAACGC GGGGTTCGGT CCCAGGGCTG GCACTCTGTC CGAGACCCCA TTGGGGCCAA TACGCCCGCG TTTCTTCCTT TTCCCCACCC GTTCGGGTGA AGGCCCAGGG CTCGCAGCCA ACGTCGGGGC GGCAGGCCCT CTGGCCCCGT GGGTTAGGGA CGGGGTCCCC CATGGGGAAT GGTTTATGGT TATTATTTTG GGCGTTGCGT GGGGTCTGGT CCACGACTGG ACTGAGCAGA GTTTTTGGAT GGCCTGGGCA TGGACCGCAT GTACTGGCGC GACACGAACA GTGGCTGCCA AACACCCCCG ACCCCCAAAA ACCACCGCGC GGATTTCTGG CGTCGACCGG TCATGGCTGC GCCCCGACAC CCGCCAACAC CCGCTGACGC GCTTGTCTGC TCCCGGCATC CGCTTACAGA CAAGCTGTGA CCGTCTCCGG -71 GAGCTGCATG TGTCAGAGGT TTTCACCGTC ATCACCGAAA CGCGCGAGGC AGCCGGATCA 6240
TAATCAGCCA
6300
CCTGAACCTG
6360
TAATGGTTAC
6420
GCATTCTAGT
6480
GTTCTAGAGC
6540
ATTTCGAGCT
6600
ACAATTCCAC
6660
GTGAGCTAAC
6720
TCGTGCCAGC
6780
CGCTCTTCCG
6840
GTATCAGCTC
6900
AAGAACATGT
6960 TACCACATTT GTAGAGGTTT TACTTGCTTT AAAAAACCTC CCCACCTCCC AAACATAAAA TGAATGCAAT TGTTGTTGTT AACTTGTTTA TTGCAGCTTA AAATAAAGCA ATAGCATCAC AAATTTCACA AATAAAGCAT TTTTTTCACT TGTGGTTTGT CCAAACTCAT CAATGTATCT TATCATGTCT GGATCCACTA GGCCGCCACC GCGGTGGAGC TCCAGCTTTT GTTCCCTTTA GTGAGGGTTA TGGCGTAATC ATGGTCATAG CTGTTTCCTG TGTGAAATTG TTATCCGCTC ACAACATACG AGCCGGAAGC ATAAAGTGTA AAGCCTGGGG TGCCTAATGA TCACATTAAT TGCGTTGCGC TCACTGCCCG CTTTCCAGTC GGGAAACCTG TGCATTAATG AATCGGCCAA CGCGCGGGGA GAGGCGGTTT GCGTATTGGG CTTCCTCGCT CACTGACTCG CTGCGCTCGG TCGTTCGGCT GCGGCGAGCG ACTCAAAGGC GGTAATACGG TTATCCACAG AATCAGGGGA TAACGCAGGA GAGCAAAAGG CCAGCAAAAG GCCAGGAACC GTAAAAAGGC CGCGTTGCTG -72 GCGTTTTTCC ATAGGCTCCG CCCCCCTGAC GAGCATCACA AAAATCGACG CTCAAGTCAG 7020 AGGTGGCGAA ACCCGACAGG ACTATAAAGA TACCAGGCGT TTCCCCCTGG AAGCTCCCTC 7080 GTGCGCTCTC CTGTTCCGAC CCTGCCGCTT ACCGGATACC TGTCCGCCTT TCTCCCTTCG 7140 GGAAGCGTGG CGCTTTCTCA TAGCTCACGC TGTAGGTATC TCAGTTCGGT GTAGG'rCGTT 7200 CGCTCCALAGC TGGGCTGTGT GCACGAACCC CCCGTTCAGC CCGACCGCTG CGCCTTATCC 7260 GGTAACTATC GTCTTGAGTC CAACCCGGTA AGACACGACT TATCGCCACT GGCAGCAGCC 7320 ACTGGTAACA GGATTAGCAG AGCGAGGTAT GTAGGCGGTG CTACAGAGTT CTTGAAGTGG 7380 TGGCCTAACT ACGGCTACAC TAGAAGGACA GTATTTGGTA TCTGCGCTCT GCTGAAGCCA 7440 GTTACCTTCG GAAAA.AGAGT TGGTAGCTCT TGATCCGGCA AACAAACCAC CGCTGGTAGC 7500 GGTGGTTTTT TTGTTTGCAA GCAGCAGATT ACGCGCAGAA AAAAAGGATC TCAAGAAGAT 000::0* 7560 CCTTTGATCT TTTCTACGGG GTCTGACGCT CAGTGGAACG AAAACTCACG TTA.AGGGATT 7620 TTGGTCATGA GATTATCAAA AAGGATCTTC ACCTAGATCC TTTTAAATTA AA.AATGAAGT 7680 TTTAAATCAA TCTAAAGTAT ATATGAGTAA ACTTGGTCTG ACAGTTACCA ATGCTTAATC 7740 -73-
AGTGAGGCAC
7800
GTCGTGTAGA
7860
CCGCGAGACC
7920
GCCGAGCGCA
7980
CGGGAAGCTA
8040
ACAGGCATCG
8100
CGATCAAGGC
8160
CCTCCGATCG
8220
CTGCATAATT
8280
TCAACCAAGT
8340
ATACGGGATA
8400
TCTTCGGGGC
8460
ACTCGTGCAC
8520 CTATCTCAGC GATCTGTCTA TTTCGTTCAT CCATAGTTGC CTGACTCCCC TAACTACGAT ACGGGAGGGC TTACCATCTG GCCCCAGTGC TGCAATGATA CACGCTCACC GGCTCCAGAT TTATCAGCAA TAAACCAGCC AGCCGGAAGG GAAGTGGTCC TGCAACTTTA TCCGCCTCCA TCCAGTCTAT TAATTGTTGC GAGTAAGTAG TTCGCCAGTT AATAGTTTGC GCAACGTTGT TGCCATTGCT TGGTGTCACG CTCGTCGTTT GGTATGGCTT CATTCAGCTC CGGTTCCCAA GAGTTACATG ATCCCCCATG TTGTGCAAAA AAGCGGTTAG CTCCTTCGGT TTGTCAGAAG TAAGTTGGCC GCAGTGTTAT CACTCATGGT TATGGCAGCA CTCTTACTGT CATGCCATCC GTAAQATGCT TTTCTGTGAC TGGTGAGTAC CATTCTGAGA ATAGTGTATG CGGCGACCGA GTTGCTCTTG CCCGGCGTCA ATACCGCGCC ACATAGCAGA ACTTTAAAAG TGCTCATCAT TGGAAAACGT GAAAACTCTC AAGGATCTTA CCGCTGTTGA GATCCAGTTC GATGTA.ACCC CCAACTGATC TTCAGCATCT TTTACTTTCA CCAGCGTTTC TGGGTGAGCA -74- AAAACAGGAA GGCAAAATGC CGCAAAAAAG GGAATAAGGG CGACACGGAA ATGTTGAATA 8580 CTCATACTCT TCCTTTTTCA ATATTATTGA AGCATTTATC AGGGTTATTG TCTCATGAGC 8640 GGATACATAT TTGAATGTAT TTAGAAAAAT AAACAAATAG GGGTTCCGCG CACATTTCCC 8700
CGAA.AAGTGC
8710 INFORMATION FOR SEQ ID SEQUE'{CE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genoinic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID ATGGGATCCA AGATGAAGCG CGCAAGACCG INFORMATION FOR SEQ ID NO:6: SEQUENCE CHARACTERISTICS: LENGTH: 30 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: CATAACGCGG CCGCTTCTTT ATTCTTGGGC INFORMATION FOR SEQ ID NO:7: SEQUENCE CHARACTERISTICS: o LENGTH: 7148 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: circular (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC TGCTCTGATG CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG 120 CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC 180 -76 TTAGGGTTAG GCGTTTTGCG CTGCTTCGCG ATGTACGGGC 240 CAGATATACG CGTTGACATT GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 300 TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC 360 TGGCTGACCG CCCAACGACC CCCGCCCATT GACGTCAATA ATGACGTATG TCCCATAGT AACGCCAATA GGGACTTTCC 420 ATTGACGTCA ATGGGTGGAC TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 480 ATCATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG 540 TAAATGGCCC GCCTGGCATT ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 600 TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA 660 ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA 720 AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC 780 TGGGCGTGGA TAGCGGTTTG TGGGAGTTTG TTTTGGCACC CCCATTGACG CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCT CTGGCTAACT AGAGAACCCA 840 CTGCTTACTG GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTTGGTACC 900 GAGCTCGGAT CCAAGATGAA GCGCGCAAGA CCGTCTGAAG 960 ATACCTTCAA CCCCGTGTAT -77- 0 00 0
CCATATGACA
1020
CCCAATGGGT
1080
GTTACCTCCA
1140
GGCAACCTTA
1200
AACATAAACC
1260
GCCGCCGCAC
1320
ACCGTGCACG
1380
AAGCTAGCCC
1440
ACTGCCTCAC
1500
ATTTATACAC
1560
GACCTAAACA
1620
CAAACTAAAG
1680
GCAGGAGGAC
1740 CGGAAACCGG TCCTCCAACT GTGCCTTTTC TTACTCCTCC CTTTGTATCC TTCAAGAGAG TCCCCCTGGG GTACTCTCTT TGCGCCTATC CGAACCTCTA ATGGCATGCT TGCGCTCAAA ATGGGCAACG GCCTCTCTCT GGACGAGGCC CCTCCCAAAA TGTAACCACT GTGAGCCCAC CTCTCAAAAkA AACCAAGTCA TGGAAATATC TGCACCCCTC ACAGT'TACCT CAGAAGCCCT AACTGTGGCT CTCTAATGGT CGCGGGCAAC ACACTCACCA TGCAATCACA GGCCCCGCTA ACTCCAAACT TAGCATTGCC ACCCAAGGAC CCCTCACAGT GTCAGAAGGA TGCAAACATC AGGCCCCCTC ACCACCACCG ATAGCAGTAC CCTTACTATC CCCCTCTAAC TACTGCCACT GGTAGCTTGG GCATTGACTT GAAA.GAGCCC AAAATGGAAA ACTAGGACTA AAGTACGGGG CTCCTTTGCA TGTAACAGAC CTTTGACCGT AGCAACTGGT CCAGGTGTGA CTATTAATAA TACTTCCTTG TTACTGGAGC CTTGGGTTTT GATTCACAAG GCAATATGCA ACTTAATGTA TAAGGATTGA TTCTCAAAAC AGACGCCTTA TACTTGATGT TAGTTATCCG 78 a.
a a.
a a.
a. a a. a a a a.
a a
S
a.
a
TTTGATGCTC
1800
GCCCACAACT
1860
TCCAAAAAGC
1920
ATAGCCATTA
1980
CCCCTCAAAA
2040
AAACTAGGAA
2100
AATGATAAGC
2160
GAGAAAGATG
2220
GTTTCAGTTT
2280
CATCTTATTA
2340
GAATATTGGA
2400
GGATTTATGC
2460
ATTGTCAGTC
2520 AAAACCAACT AAATCTAAGA CTAGGACAGG GCCCTCTTTT TATAAACTCA TGGATATTAA CTACAACAAA GGCCTTTACT TGTTTACAGC TTCAAACAAT TTGAGGTTAA CCTAAGCACT GCCAAGGGGT TGATGTTTGA CGCTACAGCC ATGCAGGAGA TGGGCTTGAA TTTGGTTCAC CTAATGCACC AAACACAAAT CAAAAATTGG CCATGGCCTA GAATTTGATT CAAACAAGGC TATGGTTCCT CTGGCCTTAG TTTTGACAGC ACAGGTGCCA TTACAGTAGG AAACAAAAAT TAACTTTGTG GACCACACCA GCTCCATCTC CTAACTGTAG ACTAAATGCA CTAAACTCAC TTTGGTCTTA ACAAAATGTG GCAGTCAAAT ACTTGCTACA TGGCTGTTAA AGGCAGTTTG GCTCCAATAT CTGGAACAGT TCAAAGTGCT TAAGATTTGA CGAA.AATGGA GTGCTACTAA ACAATTCCTT CCTGGACCCA ACTTTAGAAA TGGAGATCTT ACTGAAGGCA CAGCCTATAC AAACGCTGTT CTAACCTATC AGCTTATCCA AAATCTCACG GTAAAACTGC CAAAAGTAAC AAGTTTACTT AAACGGAGAC AAAACTAAAC CTGTAACACT AACCATTACA -79- CTAAACGGTA CACAGGAAAC AGGAGACACA ACTCCAAGTG CATACTC TAT GTCATTTTCA 2580 *00 6900 too 9.6.0 0.60
TGGGACTGGT
2640
TCATACATTG
2700
CTATAGTGTC
2760
CAGCCATCTG
2820
ACTGTCCTTT
2880
ATTCTGGGGG
2940
CATGCTGGGG
3000
AGGGGGTATC
3060
CGCAGCGTGA
3120
TCCTTTCTCG
3180
GGGTTCCGAT
3240 CTGGCCACAA CTACATTAAT GAAATATTTG CCACATCCTC TTACACTTTT CCCAAGAATA AAGAAGCGGC CGCTCGAGCA TGCATCTAGA GGGCCCTATT ACCTAAATGC TAGAGCTCGC TGATCAGCCT CGACTGTGCC TTCTAGTTGC TTGTTTGCCC CTCCCCCGTG CCTTCCTTGA CCCTGGAAGG TGCCACTCCC CCTAATAAAA TGAGGAAATT GCATCGCATT GTCTGAGTAG GTGTCATTCT GTGGGGTGGG GCAGGACAGC AAGGGGGAGG ATTGGGAAGA CAATAGCAGG ATGCGGTGGG CTCTATGGCT TCTGAGGCGG AAAGAACCAG CTGGGGCTCT CCCACGCGCC CTGTAGCGGC GCATTAAGCG CGGCGGGTGT GGTGGTTACG CCGCTACACT TGCCAGCGCC CTAGCGCCCG CTCCTTTCGC TTTCTTCCCT CCACGTTCGC CGGCTTTCCC CGTCAAGCTC TAAATCGGGG CATCCCTTTA TTAGTGCTTT ACGGCACCTC GACCCCAAAA AACTTGATTA GGGTGATGGT TCACGTAGTG GGCCATCGCC CTGATAGACG GTTTTTCGCC CTTTGACGTT GGAGTCCACG 3300
TTCTTTAATA
3360
TCTTTTGATT
3420
TAACAAAAAT
3480
CCCCAGGCTC
3540 GTGGACTCTT GTTCCAAACT GGAACAACAC TCAACCCTAT CTCGGTCTAT TATAAGGGAT TTTGGGGATT TCGGCCTATT GGTTAAAAAA TGAGCTGATT TTAACGCGAA TTAATTCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT CCCAGGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA GTCAGCAACC AGGTGTGGAA AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA TGCAAAGCAT GCATCTCAAT 3600 0 0 .0 0 0000
TAGTCAGCAA
3660
TCCGCCCATT
3720
GCCTCTGCCT
3780 CCATAGTCCC GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG CCTAGGCTTT TGCAAAAAGC TCCCGGGAGC TTGTATATCC ATTTTCGGAT CTGATCAAGA GACAGGATGA 3840
GGATCGTTTC
3900
GAGAGGCTAT
3960
TTCCGGCTGT
4020
CTGAATGAAC
4080 GCATGATTGA ACAAGATGGA TTGCACGCAG GTTCTCCGGC CGCTTGGGTG TCGGCTATGA CTGGGCACAA CAGACA.ATCG GCTGCTCTGA TGCCGCCGTG CAGCGCAGGG GCGCCCGGTT CTTTTTGTCA AGACCGACCT GTCCGGTGCC TGCAGGACGA GGCAGCGCGG CTATCGTGGC TGGCCACGAC GGGCGTTCCT -81 TGCGCAGCTG TGCTCGACGT TGTCACTGAA GCGGGAAGGG ACTGGCTGCT ATTGGGCGAA 4140
GTGCCGGGGC
4200
GCTGATGCAA
4260
GCGAAACATC
4320
GATCTGGACG
4380
CGCATGCCCG
4440
ATGGTGGAAA
4500
CGCTATCAGG
4560
GCTGACCGCT
4620
TATCGCCTTC
4680
CGACGCCCAA
4740
GCTTCGGAAT
4800
TGGAGTTCTT
4860 AGGATCTCCT GTCATCTCAC CTTGCTCCTG CCGAGAAAGT ATCCATCATG TGCGGCGGCT GCATACGCTT GATCCGGCTA CCTGCCCATT CGACCACCAA GCATCGAGCG AGCACGTACT CGGATGGAAG CCGGTCTTGT CGATCAGGAT AAGAGCATCA GGGGCTCGCG CCAGCCGAAC TGTTCGCCAG GCTCAAGGCG ACGGCGAGGA TCTCGTCGTG ACCCATGGCG ATGCCTGCTT GCCGAATATC ATGGCCGCTT TTCTGGATTC ATCGACTGTG GCCGGCTGGG TGTGGCGGAC ACATAGCGTT GGCTACCCGT GATATTGCTG AAGAGCTTGG CGGCGAATGG TCCTCGTGCT TTACGGTATC GCCGCTCCCG ATTCGCAGCG CATCGCCTTC TTGACGAGTT CTTCTGAGCG GGACTCTGGG GTTCGAAATG ACCGACCAAG CCTGCCATCA CGAGATTTCG ATTCCACCGC CGCCTTCTAT GAAAGGTTGG CGTTTTCCGG GACGCCGGCT GGATGATCCT CCAGCGCGGG GATCTCATGC CGCCCACCCC AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA -82-
ATAGCATCAC
4920
CCAAACTCAT
4980
CGTAATCATG
5040
ACATACGAGC
5100
CATTAATTGC
5160
ATTAATGAAT
5220
CCTCGCTCAC
5280
CAAAGGCGGT
5340
CAAAAGGCCA
5400
GGCTCCGCCC
5460
CGACAGGACT
5520
TTCCGACCCT
5580
TTTCTCAATG
5640 AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CAATGTATCT TATCATGTCT GTATACCGTC GACCTCTAGC TAGAGCTTGG GTCATAGCTG TTTCCTGTGT GAAATTGTTA TCCGCTCACA ATTCCACACA CGGAAGCATA AAGTGTAAAG CCTGGGGTGC CTAATGAGTG AGCTAACTCA GTTGCGCTCA CTGCCCGCTT TCCAGTCGGG AA.ACCTGTCG TGCCAGCTGC CGGCCAACGC GCGGGGAGAG GCGGTTTGCG TATTGGGCGC TCTTCCGCTT TGACTCGCTG CGCTCGGTCG TTCGGCTGCG GCGAGCGGTA TCAGCTCACT AATACGGTTA TCCACAGAAT CAGGGGATAA CGCAGGAAAG AACATGTGAG GCAAAALGGCC AGGAACCGTA AAAAGGCCGC GTTGCTGGCG TTTTTCCATA CCCTGACGAG CATCACAAAA ATCGACGCTC AAGTCAGAGG TGGCGAAACC ATAAAGATAC CAGGCGTTTC CCCCTGGAAG CTCCCTCGTG CGCTCTCCTG GCCGCTTACC GGATACCTGT CCGCCTTTCT CCCTTCGGGA AGCGTGGCGC CTCACGCTGT AGGTATCTCA GTTCGGTGTA GGTCGTTCGC TCCAAGCTGG 83-
GCTGTGTGCA
5700
TTGAGTCCAA
5760
TTAGCAGAGC
5820
GCTACACTAG
5880
AAAGAGTTGG
5940
TTTGCAAGCA
6000
CTACGGGGTC
6060
TATCAAAAAG
6120
AAAGTATATA
6180
TCTCAGCGAT
6240
CTACGATACG
6300
GCTCACCGGC
6360
GTGGTCCTGC
6420 CGAACCCCCC GTTCAGCCCG ACCGCTGCGC CTTATCCGGT AACTATCGTC CCCGGTAAGA CACGACTTAT CGCCACTGGC AGCAGCCACT GGTAACAGGA GAGGTATGTA GGCGGTGCTA CAGAGTTCTT GAAGTGGTGG CCTAACTACG AAGGACAGTA TTTGGTATCT GCGCTCTGCT G.AAGCCAGTT ACCTTCGGAA TAGCTCTTGA TCCGGCAAAC AAACCACCGC TGGTAGCGGT GGTTTTTTTG GCAGATTACG CGCAGAAAAA AAGGATCTCA AGAAGATCCT TTGATCTTTT TGACGCTCAG TGGAACGAAA ACTCACGTTA AGGGATTTTG GTCATGAGAT GATCTTCACC TAGATCCTTT TAAATTAAAA ATGAAGTTTT AAATCAATCT TGAGTAAACT TGGTCTGACA GTTACCAATG CTTAATCAGT GAGGCACCTA CTGTCTATTT CGTTCATCCA TAGTTGCCTG ACTCCCCGTC GTGTAGATAA GGAGGGCTTA CCATCTGGCC CCAGTGCTGC AATGATACCG CGAGACCCAC TCCAGATTTA TCAGCAATAA ACCAGCCAGC CGGAAGGGCC GAGCGCAGAA AACTTTATCC GCCTCCATCC AGTCTATTA.A TTGTTGCCGG GAAGCTAGAG 84- 9*
TAAGTAGTTC
6480
TGTCACGCTC
6540
TTACATGATC
6600
TCAGAAGTAA
6660
TTACTGTCAT
6720
TCTGAGAATA
6780
CCGCGCCACA
6840
AACTCTCAAG
6900
ACTGATCTTC
6960
AAAATGCCGC
7020
TTTTTCAATA
7080
AATGTATTTA
7140
CTGACGTC
7148 GCCAGTTAAT AGTTTGCGCA ACGTTGTTGC CATTGCTACA GGCATCGTGG GTCGTTTGGT ATGGCTTCAT TCAGCTCCGG TTCCCAACGA TCAAGGCGAG CCCCATGTTG TGCAAAAAAG CGGTTAGCTC CTTCGGTCCT CCGATCGTTG GTTGGCCGCA GTGTTATCAC TCATGGTTAT GGCAGCACTG CATAATTCTC GCCATCCGTA AGATGCTTTT CTGTGACTGG TGAGTACTCA ACCAAGTCAT GTGTATGCGG CGACCGAGTT GCTCTTGCCC GGCGTCAATA CGGGATAATA TAGCAGAACT TTAAAAGTGC TCATCATTGG AAAACGTTCT TCGGGGCGAA GATCTTACCG CTGTTGAGAT CCAGTTCGAT GTAACCCACT CGTGCACCCA AGCATCTTTT ACTTTCACCA GCGTTTCTGG GTGAGCAAAA ACAGGAAGGC AAAAAAGGGA ATAAGGGCGA CACGGAAATG TTGAkATACTC ATACTCTTCC TTATTGAAGC ATTTATCAGG GTTATTGTCT CATGAGCGGA TACATATTTG GAAAAATAAA CAAATAGGGG TTCCGCGCAC ATTTCCCCGA AAAGTGCCAC INFORMATION FOR SEQ ID NO:8: SEQUENCE CHARACTERISTICS: LENGTH: 7469 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: circular (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi SEUNEDSRPIO:SQI 4B *AGATG .AACTC GACCTTGTGCC AT*ACTCCG
CCCTGTAGCATTCGTCCGCTTTGTGAGCCTGGATC
*120 *feSACAA TAGTC CAGAA CTACACA.GAGAGACG S.180 TTGGTA GCTTGGCGTCC TT.GG AAAAGCTGCT 240 GACGGCATG GAGTCTCCA AGACCTAT GTCGATCGT ACGCAATC GCTTGT 420 -86 ATTGACGTCA ATGGGTGGAC TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 480
ATCATATGCC
540
ATGCCCAGTA
600
TCGCTATTAC
660
ACTCACGGGG
720
AAAATCAACG
780
GTAGGCGTGT
840
CTGCTTACTG
900
GAGCTCGGAT
960
GTCTTTCCAG
1020
GGGACCTGAG
1080
AGTCACAGTC
1140
TGTTTCTGGC
1200 AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG ACGGTGGGAG GTCTATATAA GCAGAGCTCT CTGGCTAACT AGAGAACCCA GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTTGGTACC CTGAATTCGA GCTCGCTGTT GGGCTCGCGG TTGAGGACAA ACTCTTCGCG TACTCTTGGA TCGGAAACCC GTCGGCCTCC GAACGGTACT CCGCCACCGA CGAGTCCGCA TCGACCGGAT CGGAAAACCT CTCGAGAAAG GCGTCTAACC GCAAGGTAGG CTGAGCACCG TGGCGGGCGG CAGCGGGTGG CGGTCGGGGT GGAGGTGCTG CTGATGATGT AATTAAAGTA GGCGGTCTTG AGACGGCGGA 87 TGGTCGAGGT GAGGTGTGGC AGGCTTGAGA TCCAAGATGA AGCGCGCAAG 1260 GATACCTTCA ACCCCGTGTA TCCATATGAC ACGGAAACCG GTCCTCCAAC 1320 CTTACTCCTC CCTTTGTATC CCCCAATGGG TTTCAAGAGA GTCCCCCTGG 1380 TTGCGCCTAT CCGAACCTCT AGTTACCTCC AATGGCATGC TTGCGCTCAA 1440 GGCCTCTCTC TGGACGAGGC CGGCAACCTT ACCTCCCAAA ATGTAACCAC 1500 CCTCTCAAAA AAACCAAGTC .AAACATAAAC CTGGAAATAT CTGCACCCCT 1560 TCAGAAGCCC TAACTGTGGC TGCCGCCGCA CCTCTAATGG TCGCGGGCAA 1620 ATGCAATCAC AGGCCCCGCT AACCGTGCAC GACTCCAAAC TTAGCATTGC 1680 CCCCTCACAG TGTCAGAAGG AAAGCTAGCC CTGCAAACAT CAGGCCCCCT 1740 GATAGCAGTA CCCTTACTAT CACTGCCTCA CCCCCTCTAA CTACTGCCAC 1800 GGCATTGACT TGAAAGAGCC CATTTATACA CAAAATGGAA AACTAGGACT 1860 GCTCCTTTGC ATGTAACAGA CGACCTAAAC ACTTTGACCG TAGCAACTGG 1920 ACTATTAATA ATACTTCCTT GCAAACTAAA GTTACTGGAG CCTTGGGTTT 1980
ACCGTCTGAA
TGTGCCTTTT
GGTACTCTCT
AATGGGCAAC
TGTGAGCCCA
CACAGTTACC
CACACTCACC
CACCCAAGGA
CACCACCACC
TGGTAGCTTG
AAAGTACGGG
TCCAGGTGTG
TGATTCACAA
GGCAATATGC
2040
ATACTTGATG
2100
GGCCCTCTTT
2160
TTGTTTACAG
2220
TTGATGTTTG
2280
CCTAATGCAC
2340
TCAAACAAGG
2400
ATTACAGTAG
2460
CCTAACTGTA
2520
GGCAGTCAAA
2580
TCTGGAACAG
2640
AACAATTCCT
2700
ACAGCCTATA
2760 AACTTAATGT AGCAGGAGGA CTAAGGATTG ATTCTCAAAA CAGACGCCTT TTAGTTATCC GTTTGATGCT CAAAACCAAC TAAATCTAAG ACTAGGACAG TTATAAACTC AGCCCACAAC TTGGATATTA ACTACAACAA AGGCCTTTAC CTTCAAACAA TTCCAAAAAG CTTGAGGTTA ACCTAAGCAC TGCCAAGGGG ACGCTACAGC CATAGCCATT AATGCAGGAG ATGGGCTTGA ATTTGGTTCA CAAACACAAA TCCCCTCAAA ACAAAAATTG GCCATGGCCT AGAATTTGAT CTATGGTTCC TAA.ACTAGGA ACTGGCCTTA GTTTTGACAG CACAGGTGCC GAAACAAAAA TAATGATAAG CTAACTTTGT GGACCACACC AGCTCCATCT GACTAAATGC AGAGAAAGAT GCTAAACTCA CTTTGGTCTT AACAAAATGT TACTTGCTAC AGTTTCAGTT TTGGCTGTTA AAGGCAGTTT GGCTCCAATA TTCAAAGTGC TCATCTTATT ATAAGATTTG ACGAAAATGG AGTGCTACTA TCCTGGACCC AGAATATTGG AACTTTAGAA ATGGAGATCT TACTGAAGGC CAAACGCTGT TGGATTTATG CCTAACCTAT CAGCTTATCC AAAATCTCAC 89-
GGTAAAACTG
2820
CCTGTAACAC
2880
GCATACTCTA
2940
GCCACATCCT
3000
ATGCATCTAG
3060
TCGACTGTGC
3120
ACCCTGGAAG
3180
TGTCTGAGTA
3240
GATTGGGAAG
3300
GAAAGAACCA
3360
GCGGCGGGTG
3420
GCTCCTTTCG
3480
CTAAATCGGG
3540 CCAAAAGTAA CATTGTCAGT CAAGTTTACT TAAACGGAGA CAAAkACTAAA TAACCATTAC ACTAAACGGT ACACAGGAAA CAGGAGACAC AACTCCAAGT TGTCATTTTC ATGGGACTGG TCTGGCCACA ACTACATTAA. TGAAATATTT CTTACACTTT TTCATACATT GCCCAAGAAT AAAGAAGCGG CCGCTCGAGC AGGGCCCTAT TCTATAGTGT CACCTAAATG CTAGAGCTCG CTGATCAGCC CTTCTAGTTG CCAGCCATCT GTTGTTTGCC CCTCCCCCGT GCCT'TCCTTG GTGCCACTCC CACTGTCCTT TCCTAATAAA ATGAGGAAAT TGCATCGCAT GGTGTCATTC TATTCTGGGG GGTGGGGTGG GGCAGGACAG CAAGGGGGAG ACAATAGCAG GCATGCTGGG GATGCGGTGG GCTCTATGGC TTCTGAGGCG GCTGGGGCTC TAGGGGGTAT CCCCACGCGC CCTGTAGCGG CGCATTAAGC TGGTGGTTAC GCGCAGCGTG ACCGCTACAC TTGCCAGCGC CCTAGCGCCC CTTTCTTCCC TTCCTTTCTC GCCACGTTCG CCGGCTTTCC CCGTCAAGCT GCATCCCTTT AGGGTTCCGA TTTAGTGCTT TACGGCACCT CGACCCCAAA 90 AAACTTGATT AGGGTGATGG TTCACGTAGT GGGCCATCGC CCTGATAGAC GGTTTTTCGC 3600
CCTTTGACGT
3660
CTCAACCCTA
3720
TGGTTAAAAA
3780
GTCAGTTAGG
3840
CATCTCAATT
3900
ATGCAAAGCA
3960
CCGCCCCTAA
4020
ATTTATGCAG
4080
TTTTTTGGAG
4140
TCTGATCAAG
4200
GGTTCTCCGG
4260
GGCTGCTCTG
4320 TGGAGTCCAC GTTCTTTAAT AGTGGACTCT TGTTCCAAAC TGGAACAACA TCTCGGTCTA TTCTTTTGAT TTATAAGGGA TTTTGGGGAT TTCGGCCTAT ATGAGCTGAT TTAACAAAAA TTTAACGCGA ATTAATTCTG TGGAATGTGT GTGTGGAAAG TCCCCAGGCT CCCCAGGCAG GCAGAAGTAT GCAAAGCATG AGTCAGCAAC CAGGTGTGGA AAGTCCCCAG GCTCCCCAGC AGGCAGAAGT TGCATCTCAA TTAGTCAGCA ACCATAGTCC CGCCCCTA-AC TCCGCCCATC CTCCGCCCAG TTCCGCCCAT TCTCCGCCCC ATGGCTGACT AATTTTTTTT AGGCCGAGGC CGCCTCTGCC TCTGAGCTAT TCCAGAAGTA GTGAGGAGGC GCCTAGGCTT TTGCAAAAAG CTCCCGGGAG CTTGTATATC CATTTTCGGA AGACAGGATG AGGATCGTTT CGCATGATTG AACAAGATGG ATTGCACGCA CCGCTTGGGT GGAGAGGCTA TTCGGCTATG ACTGGGCACA ACAGACAATC ATGCCGCCGT GTTCCGGCTG TCAGCGCAGG GGCGCCCGGT TCTTTTTGTC 91
AAGACCGACC
4380
CTGGCCACGA
4440
GACTGGCTGC
4500
GCCGAGAAAG
4560
ACCTGCCCAT
4620
GCCGGTCTTG
4680
CTGTTCGCCA
4740
GATGCCTGCT
4800
GGCCGGCTGG
4860
GAAGAGCTTG
4920
GATTCGCAGC
4980
GGTTCGAAAT
5040 TGTCCGGTGC CCTGAATGAA CTGCAGGACG AGGCAGCGCG GCTATCGTGG CGGGCGTTCC TTGCGCAGCT GTGCTCGACG TTGTCACTGA AGCGGGAAGG TATTGGGCGA AGTGCCGGGG CAGGATCTCC TGTCATCTCA CCTTGCTCCT TATCCATCAT GGCTGATGCA ATGCGGCGGC TGCATACGCT TGATCCGGCT TCGACCACCA AGCGAAACAT CGCATCGAGC GAGCACGTAC TCGGATGGAA TCGATCAGGA TGATCTGGAC GAAGAGCATC AGGGGCTCGC GCCAGCCGAA GGCTCAAGGC GCGCATGCCC GACGGCGA'% ATCTCGTCGT GACCCATGGC TGCCGAATAT CATGGTGGAA AATGGCCGCT TTTCTGGATT CATCGACTGT GTGTGGCGGA CCGCTATCAG GACATAGCGT TGGCTACCCG TGATATTGCT GCGGCGAATG GGCTGACCGC TTCCTCGTGC TTTACGGTAT CGCCGCTCCC GCATCGCCTT CTATCGCCTT CTTGACGAGT TCTTCTGAGC GGGACTCTGG GACCGACCAA GCGACGCCCA ACCTGCCATC ACGAGATTTC GATTCCACCG CCGCCTTCTA TGAA.AGGTTG GGCTTCGGAA TCGTTTTCCG GGACGCCGGC TGGATGATCC 5100
TCCAGCGCGG
5160
ATAATGGTTA
5220
TGCATTCTAG
5280
CGACCTCTAG
5340
ATCCGCTCAC
5400 GGATCTCATG CTGGAGTTCT TCGCCCACCC CAACTTGTTT ATTGCAGCTT CAAATAAAGC AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC TTGTGGTTTG TCCAAACTCA TCAATGTATC TTATCATGTC TGTATkCCGT CTAGAGCTTG GCGTAATCAT GGTCATAGCT GTTTCCTGTG TGAAATTGTT AATTCCACAC AACATACGAG CCGGAAGCAT AAAGTGTAAA GCCTGGGGTG 0 .0.
CCTAATGAGT GAGCTAACTC ACATTAATTG CGTTGCGCTC ACTGCCCGCT TTCCAGTCGG 5460
GAAACCTGTC
5520
GTATTGGGCG
5580
GGCGAGCGGT
5640
ACGCAGGAAA
5700
CGTTGCTGGC
5760
CAAGTCAGAG
5820 CTCTTCCGCT TCCTCGCTCA CTGACTCGCT GCGCTCGGTC GTTCGGCTGC ATCAGCTCAC TCAAAGGCGG TAATACGGTT ATCCACAGAA TCAGGGGATA GAACATGTGA GCAAAAGGCC AGCAAAAGGC CAGGAACCGT AAAAAGGCCG GTGCCAGCTG CATTAATGIA TCGGCCALACG CGCGGGGAGA GGCGGTTTGC GTTTTTCCAT AGGCTCCGCC CCCCTGACGA GCATCACAAA AATCGACGCT GTGGCGAAAC CCGACAGGAC TATAAAGATA CCAGGCGTTT CCCCCTGGAA GCTCCCTCGT GCGCTCTCCT GTTCCGACCC TGCCGCTTAC CGGATACCTG TCCGCCTTTC 5880 93 *0~4
TCCCTTCGGG
5940
AGGTCGTTCG
6000
CCTTATCCGG
6060
CAGCAGCCAC
6120
TGAAGTGGTG
6180
TGAAGCCAGT
6240
CTGGTAGCGG
6300
AAGAAGATCC
6360
AAGGGATTTT
6420
AATGAAGTTT
6480
GCTTAATCAG
6540
GACTCCCCGT
6600 AAGCGTGGCG CTTTCTCAAT GCTCACGCTG TAGGTATCTC AGTTCGGTGT CTCCAAGCTG GGCTGTGTGC ACGAACCCCC CGTTCAGCCC GACCGCTGCG TAACTATCGT CTTGAGTCCA ACCCGGTAAG ACACGACTTA TCGCCACTGG TGGTAACAGG ATTAGCAGAG CGAGGTATGT AGGCGGTGCT ACAGAGTTCT GCCTAACTAC GGCTACACTA GAAGGACAGT ATTTGGTATC TGCGCTCTGC TACCTTCGGA AAAAGAGTTG GTAGCTCTTG ATCCGGCAAA CA.AACCACCG TGGTTTTTTT GTTTGCAAGC AGCAGATTAC GCGCAGAAAA AAAGGATCTC TTTGATCTTT TCTACGGGGT CTGACGCTCA GTGGAACGAA AACTCACGTT GGTCATGAGA. TTATCAAAAA GGATCTTCAC CTAGATCCTT. 'TTAAATTAA.A TAAATCAATC TAAAGTATAT ATGAGTAAAC TTGGTCTGAC AGTTACCAAT TGAGGCACCT ATCTCAGCGA TCTGTCTATT TCGTTCATCC ATAGTTGCCT CGTGTAGATA ACTACGATAC GGGAGGGCTT ACCATCTGGC CCCAGTGCTG CAATGATACC GCGAGACCCA CGCTCACCGG CTCCAGATTT ATCAGCAATA AACCAGCCAG 6660 -94
CCGGAAGGGC
6720
ATTGTTGCCG
6780
CCATTGCTAC
6840
GTTCCCAACG
6900
CCTTCGGTCC
6960 CGAGCGCAGA AGTGGTCCTG CAACTTTATC CGCCTCCATC CAGTCTATTA GGAAGCTAGA GTAAGTAGTT CGCCAGTTAA TAGTTTGCGC AACGTTGTTG AGGCATCGTG GTGTCACGCT CGTCGTTTGG TATGGCTTCA TTCAGCTCCG ATCAAGGCGA GTTACATGAT CCCCCATGTT GTGCAAAAAA GCGGTTAGCT TCCGATCGTT GTCAGAAGTA AGTTGGCCGC AGTGTTATCA CTCATGGTTA
TGGCAGCACT GCATAATTCT CTTACTGTCA TGCCATCCGT AAGATGCTTT TCTGTGACTG 7020 GTGAGTACTC AACCAAGTCA TTCTGAGAAT AGTGTATGCG GCGACCGAGT TGCTCTTGCC 7080
CGGCGTCAAT
7140 GAAAACGTTC 7200
TGTAACCCAC
7260
GGTGAGCAAA
7320
GTTGAATACT
7380
TCATGAGCGG
7440 ACGGGATAAT ACCGCGCCAC ATAGCAGAAC TTTAAAAGTG CTCATCATTG TTCGGGGCGA AAACTCTCAA GGATCTTACC GCTGTTGAGA TCCAGTTCGA TCGTGCACCC AACTGATCTT CAGCATCTTT TACTTTCACC AGCGTTTCTG AACAGGAAGG CAAAATGCCG CAAAAAAGGG AATAAGGGCG ACACGGAAAT CATACTCTTC CTTTTTCAAT ATTATTGAAG CATTTATCAG GGTTATTGTC ATACATATTT GAATGTATTT AGAAAAATAA ACAAATAGGG GTTCCGCGCA CATTTCCCCG AAAAGTGCCA CCTGACGTC 7469
INFORMATION FOR SEQ ID NO:9: SEQUENCE CHARACTERISTICS: LENGTH: 28 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: TGCTTAAGCG GCCGCGAAGG AGAAGTCC 28
INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID CCGAGCTAGC GACTGAAAAT GAG 23
INFORMATION FOR SEQ ID NO:11: SEQUENCE CHARACTERISTICS: LENGTH: 23 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: CCTCTCGAGA GACAGCAAGA CAC 23
INFORMATION FOR SEQ ID NO:12: SEQUENCE CHARACTERISTICS: LENGTH: 11152 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: circular (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: AAGCTTGGGC AGAAATGGTT GAACTCCCGA GAGTGTCCTA CACCTAGGGG AGAAGCAGCC
AAGGGGTTGT
120
AAAGACATAT
180
GGGGAAGTTG
240
GTGCAAGATT
300
CAGCCAACTT
360
CTTGCTAAAA
420
ATGTTAAGAA
480
ATGGGAATAG
540
TTGACCACAG
600
AGGTGGTGGC
660 TTCCCACCAA GGACGACCCG TCTGCGCACA AACGGATGAG CCCATCAGAC TCATTCTCTG CTGCAAACTT GGCATAGCTC TGCTTTGCCT GGGGCTATTG CGGTTCGTGC TCGCAGGGCT CTCACCCTTG ACTCTTTTAA. TAGCTCTTCT ACAATCTAAA CAATTCGGAG AACTCGACCT TCCTCCTGAG GCAAGGACCA CCTCTTACAA GCCGCATCGA TTTTGTCCTT CAGAAATAGA AATAAGAATG ATTATATTTT TACCAATAA.G ACCAATCCAA TAGGTAGATT ATTAGTTACT ATGAATCATT ATCTTTTAGT ACTATTTTTA CTCAAATTCA GAAGTTAGAA AAAATAGAAA GAGACGCTCA ACCTCAATTG AAGAACAGGT GCAAGGACTA GCCTAGAAGT AAAAAAGGGA AAAAAGAGTG TTTTTGTCAA AATAGGAGAC AACCAGGGAC TTATAGGGGA CCTTACATCT ACAGACCAAC AGATGCCCCC TTACCATATA. CAGGAAGATA TGACTTAAA.T TGGGATAGGT GGGTTACAGT CAATGGCTAT 720 98-
AAAGTGTTAT
780
TGTATGTTGT
840
CTAGGAACAG
900
GGACTAATAG
960
TTGGCCCAAC
1020
AACGAGGATG
1080
GCTCTGAGTG
1140
TTATGTAAAC
1200
CATTCACCTC
1260
TCACTTTCCA
1320
GCTGGCGCCC
1380
TGTTTGTCTT
1440
CCATAGGGAC
1500 ATAGATCCCT CCCTTTTCGT GAAAGACTCG CCAGAGCTAG ACCTCCTTGG CTCA6AGAAGA AAAAGACGAC ATGAAACAAC AGGTACATGA TTATATTTAT GAATGCACTT TTGGGGAAAG AT'ITTCCATA CCAAGGAGGG GACAGTGGCT AACATTATTC TGCAAAAACT CATGGCATGA GTTATTATGA ATAGCCTTTA CTTGCGGTTC CCAGGGCTTA AGTAAGTTTT TGGTTACAAA CTGTTCTTAA TGAGACAAGT GGTTTCCTGA CTTGGTTTGG TATCAAAGGT TCTGATCTGA TTCTATTTTC CTATGTTCTT TTGGAATTTA TCCAAATCTT ATGTAAATGC CAAGATATAA AAGAGTGCTG ATTTTTTGAG TAAACTTGCA ACAGTCCTAA TTGTGTGTTT GTGTCTGTTC GCCATCCCGT CTCCGCTCGT CACTTATCCT GAGGGTCCCC CCGCAGACCC CGGCGACCCT CAGGTCGGCC GACTGCGGCA GAACAGGGAC CCTCGGATAA GTGACCCTTG TCTCTATTTC TACTATTTGG GTATTGTCTC TTTCTTGTCT GGCTATCATC ACAAGAGCGG AACGGACTCA CAAGCTAGCG ACTGAAAATG AGACATATTA TCTGCCACGG AGGTGTTATT 99- ACCGAAGAAA TGGCCGCCAG TCTTTTGGAC 1560 CTTCCACCTC CTAGCCATTT TGAACCACCT 1620 ACGGCCCCCG AAGATCCCAA CGAGGAGGCG 1680 TTGGCGGTGC AGGAAGGGAT TGACTTACTC 1740 CCGCCTCACC TTTCCCGGCA GCCCGAGCAG 1800 ATGCCAAACC TTGTACCGGA GGTGATCGAT 1860 AGTGACGACG AGGATGAAGA GGGTGAGGAG 1920 CACGGTTGCA GGTCTTGTCA TTATCACCGG 1980 TCGCTTTGCT ATATGAGGAC CTGTGGCATG 2040 GTGGGTGATA GAGTGGTGGG TTTGGTGTGG 2100 GGTTTAAAGA ATTTTGTATT GTGATTTTTT 2160 AGCCCGAGCC AGAACCGGAG CCTGCAAGAC 2220 CAGCTGATCG AAGAGGTACT GGCTGATAAT ACCCTTCACG AACTGTATGA TTTAGACGTG GTTTrCGCAGA TTTTTCCCGA CTCTGTAATG ACTTTTCCGC CGGCGCCCGG TTCTCCGGAG CCGGAGCAGA GAGCCTTGGG TCCGGTTTCT CTTACCTGCC ACGAGGCTGG CTTTCCACCC TTTGTGTTAG ATTATGTGGA GCACCCCGGG AGGAATACGG GGGACCCAGA TATTATGTGT TTTGTCTACA GTAAGTGAAA ATTATGGGCA TAATTTTTTT TTTAATTTTT ACAGTTTTGT TAAAAGGTCC TGTGTCTGAA CCTGAGCCTG CTACCCGCCG TCCTAAAATG GCGCCTGCTA TCCTGAGACG CCCGACATCA CCTGTGTCTA GAGAATGCAA TAGTAGTACG GATAGCTGTG 2280 -100- ACTCCGGTCC TTCTAACACA CCTCCTGAGA TACACCCGGT GGTCCCGCTG TGCCCCATTA 2340 A.ACCAGTTGC CGTGAGAGTT GGTGGGCGTC 2400 GCCAGGCTGT GGAATGTATC GAGGACTTGC TTAACGAGCC TGGGC.AACCT TTGGACTTGA GCTGTAAACG CCCCAGGCCA TAAGGTGTAA 2460 ACCTGTGATT GCGTGTGTGG TTAACGCCTT TGTTTGCTGA ATGAGTTGAT GTAAGTTTAA 2520 .*0 011.
0 0 0 TAAAGGGTGA GATAATGTTT AACTTGCATG 2580 ATATAATGCG CCGTGGGCTA ATCTTGGTTA 2640 TGGAAGATTT -TTCTGCTGTG CGTAACTTGC 2700 TTTGGAGGTT TCTGTGGGGC TCATCCCAGG 2760 ACAAGTGGGA ATTTGAAGAG CTTTTGAAAT 2820 TGGGTCACCA GGCGCTTTTC CAAGAGAAGG 2880 GGCGCGCTGC GGCTGCTGTT GCTTTTTTGA 2940 GCGTGTTAAA TGGGGCGGGG CTTAAAGGGT CATCTGACCT CATGGAGGCT TGGGAGTGTT TGGAACAGAG CTCTAACAGT ACCTCTTGGT CAAAGTTAGT CTGCAGAATT AAGGAGGATT CCTGTGGTGA GCTGTTTGAT TCTTTGAATC TCATCAAGAC TTTGGATTTT TCCACACCGG GTTTTATAAA GGATAAATGG AGCGAAGAAA CCCATCTGAG CGGGGGGTAC CTGCTGGATT TTCTGGCCAT GCATCTGTGG AGAGCGGTTG 3000 TGAGACACAA GAATCGCCTG CTACTGTTGT CTTCCGTCCG CCCGGCGATA ATACCGACGG 3060 101 0 O* 0 0 0 00.0 S00...
AGGAGCAGCA
3120
ACCCGAGAGC
3180
AGAACTGAGA
3240
GGAGCGGGGG
3300
CAGACACCGT
3360
TGATCTGCTG
3420
GGATGATTTT
3480
GTACAAGATC
3540
CGAGGTGGAG
3600
GCCGGGGGTG
3660
TTTTAGCGGT
3720
TGGGTTTAAC
3780
TTACTGCTGC
3840 GCAGCAGCAG GAGGAAGCCA GGCGGCGGCG GCAGGAGCAG AGCCCATGGA CGGCCTGGAC CCTCGGGAAT GAATGTTGTA CAGGTGGCTG ALACTGTATCC CGCATTTTGA CAATTACAGA GGATGGGCAG GGGCTAAAGG GGGTAAAGAG GCTTGTGAGG CTACAGAGGA GGCTAGGAAT CTAGCTTTTA GCTTAATGAC CCTGAGTGTA TTACTTTTCA ACAGATCAAG GATAATTGCG CTAATGAGCT GCGCAGAAGT ATTCCATAGA GCAGCTGACC ACTTACTGGC TGCAGCCAGG GAGGAGGCTA TTAGGGTATA TGCAAAGGTG GCACTTAGGC CAGATTGCAA AGCAAACTTG TAAATATCAG GAATTGTTGC TACATTTCTG GGAACGGGGC ATAGATACGG AGGATAGGGT GGCCTTTAGA TGTAGCATGA TAAATATGTG CTTGGCATGG ACGGGGTGGT TATTATGAAT GTAAhGGTTTA CTGGCCCCAA ACGGTTTTCC TGGCCAATAC CAACCTTATC CTACACGGTG TAAGCTTCTA AATACCTGTG TGGAAGCCTG GACCGATGTA AGGGTTCGGG GCTGTGCCTT TGGAAGGGGG TGGTGTGTCG CCCCAAAAGC AGGGCT'TCAA TTAAGAAATG 102
5*SS CCTCTTTGAA AGGTGTACCT TGGGTATCCT GTCTGAGGGT AACTCCAGGG TGCGCCACAA 3900 TGTGGCCTCC GACTGTGGTT GCTTCATGCT AGTGAAAAGC GTGGCTGTGA TTAAGCATAA 3960 CATGGTATGT GGCAACTGCG AGGACAGGGC CTCTCAGATG CTGACCTGCT CGGACGGCAA 4020 CTGTCACCTG CTGAAGACCA TTCACGTAGC CAGCCACTCT CGCAAGGCCT GGCCAGTGTT 4080 TGAGCATAAC ATACTGACCC GCTGTTCCTT GCATTTGGGT AACAGGAGGG GGGTGTTCCT 4140 ACCTTACCAA TGCAATTTGA GTCACACTAA GATATTGCTT GAGCCCGAGA GCATGTCCAA 4200 GGTGAACCTG AACGGGGTGT TTGACATGAC CATGAAGATC TGGAAGGTGC TGAGGTACGA 4260 TGAGACCCGC ACCAGGTGCA GACCCTGCGA GTGTGGCGGT AAACATATTA GGAACCAGCC 4320 TGTGATGCTG GATGTGACCG AGGAGCTGAG GCCCGATCAC TTGGTGCTGG CCTGCACCCG 4380 CGCTGAGTTT GGCTCTAGCG ATGAAGATAC AGATTGAGGT ACTGAAATGT GTGGGCGTGG 4440 CTTAAGGGTG GGAAAGAATA TATAAGGTGG GGGTCTTATG TAGTTTTGTA TCTGTTTTGC 4500 AGCAGCCGCC GCCGCCATGA GCACCAACTC GTTTGATGGA AGCATTGTGA GCTCATATTT 4560 GACAACGCGC ATGCCCCCAT GGGCCGGGGT GCGTCAGAAT GTGATGGGCT CCAGCATTGA 4620 103- TGGTCGCCCC GTCCTGCCCG CAAACTCTAC TACCTTGACC 4680 GCCGTTGGAG ACTGCAGCCT CCGCCGCCGC TTCAGCCGCT 4740 TGTGACTGAC TTTGCTTTCC TGAGCCCGCT TGCAAGCAGT 4800 CCGCGATGAC AAGTTGACGG CTCTTTTGGC ACAATTGGAT 4860 TGTCGTTTCT CAGCAGCTGT TGGATCTGCG CCAGCAGGTT 4920 CCCTCCCAAT GCGGTTTAAA ACATAAATAA AAAACCAGAC 4980 GCAAGTGTCT TGCTGTCTCT CGAGGGATCT TTGTGAAGGA 5040 CATAATTGGA CAAACTACCT ACAGAGATTT AAAGCTCTAA TACGAGACCG TGTCTGGAAC GCAGCCACCG CCCGCGGGAT GCAGCTTCCC GTTCATCCGC TCTTTGACCC GGGAACTTAA TCTGCCCTGA AGGCTTCCTC TCTGTTTGGA TTTGGATCAA ACCTTACTTC TGTGGTGTGA GGTAAATATA AAATTTTTAA
5100 GTGTATAATG TGTTAAACTA CTGATTCTAA TTGTTTGTGT ATTTTAGATT CCAACCTATG 5160 GAACTGATGA ATGGGAGCAG TGGTGGAATG CCTTTAATGA GGAAAACCTG TTTTGCTCAG 5220 AAGAAATGCC ATCTAGTGAT GATGAGGCTA CTGCTGACTC TCAACATTCT ACTCCTCCAA 5280 AA.AAGAAGAG AAAGGTAGAA GACCCCAAGG ACTTTCCTTC AGAATTGCTA AGTTTTTTGA 5340 GTCATGCTGT GTTTAGTAAT AGAACTCTTG CTTGCTTTGC TATTTACACC ACAAAGGAAA 5400 -104- AAGCTGCACT GCTATACAAG AAAATTATGG AAAATATTC TGTAACCTTT ATAAGTAGGC 5460 ATAACAGTTA TAATCATAAC ATACTGTTTT TTCTTACTCC ACACAGGCAT AGAGTGTCTG 5520 CTATTAATAA CTATGCTCAA AAATTGTGTA CCTTTAGCTT TTTAATTTGT AAAGGGGTTA 5580 ATAAGGAATA TTTGATGTAT AGTGCCTTGA CTAGAGATCA TAATCAGCCA TACCACATTT 5640 GTAGAGGTTT TACTTGCTTT AAAAAACCTC CCACACCTCC CCCTGAACCT GAAACATAAA 5700 ATGAATGCAA TTGTTGTTGT TAACTTGTTT ATTGCAGCTT ATAATGGTTA CAAATAAAGc 5760 AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC TGCATTCTAG TTGTGGTTTG 5820 TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCCGGC TGTGGAATGT GTGTCAGTTA 5880 GGGTGTGGAA AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA TGCAAAGCAT GCATCTCAAT 5940 TAGTCAGCAA CCAGGTGTGG AAAGTCCCCA GGCTCCCCAG CAGGCAGAAG TATGCAAAGC 6000 ATGCATCTCA ATTAGTCAGC AACCATAGTC CCGCCCCTAA CTCCGCCCAT CCCGCCCCTA 6060 ACTCCGCCCA GTTCCGCCCA TTCTCCGCCC CATGGCTGAC TAAT'rTTTTT TATTTATGCA 6120 GAGGCCGAGG CCGCCTCGGC CTCTGAGCTA TTCCAGAAGT AGTGAGGAGG CTTTTTTGGA 6180 -105- GGCCTAGGCT TTTGCAAAAA GCTTGGACAC AAGACAGGCT TGCGAGATAT GTTTGAGAAT 6240
ACCACTTTAT
6300
ACTGGTTTTT
6360 CCCGCGTCAG GGAGAGGCAG TGCGTAAAAA GACGCGGACT CATGTGAAAT AGTGCGCCAG ATCTCTATAA TCTCGCGCAA CCTATTTTCC CCTCGAACAC TTTTTAAGCC GTAGATAAAC AGGCTGGGAC ACTTCACATG AGCGAAA.AAT ACATCGTCAC 6420
CTGGGACATG
6480
ATGGAAAGGC
6540
TGAACTGGGT
6600
GCGCGAGCTT
6660
TGACCTGGTG
6720
CTTTGTCACC
6780
TATCCCGCAA
6840
AATCTCCGGT
6900
CAGGCGGGTT
6960 TTGCAGATCC ATGCACGTAA ACTCGCAAGC CGACTGATGC CTTCTGAACA ATTATTGCCG TAAGCCGTGG CGGTCTGGTA CCGGGTGCGT TACTGGCGCG ATTCGTCATG TCGATACCGT TTGTATTTCC AGCTACGATC ACGACAACCA AAAGTGCTGA AACGCGCAGA AGGCGATGGC GAAGGCTTCA TCGTTATTGA GATACCGGTG GTACTGCGGT TGCGATTCGT GAAATGTATC CAAAAGCGCA ATCTTCGCAA AACCGGCTGG TCGTCCGCTG GTTGATGACT ATGTTGTTGA GATACCTGGA TTGAACAGCC GTGGGATATG GGCGTCGTAT TCGTCCCGCC CGCTAATCTT TTCAACGCCT GGCACTGCCG GGCGTTGTTC TTTTTAACTT ACAATAGTTT CCAGTAAGTA TTCTrGGAGGC TGCATCCATG ACACAGGCAA 106 ACCTGAGCGA AACCCTGTTC AAACCCCGCT TTAAACATCC TGAAACCTCG ACGCTAGTCC 7020 GCCGCTTTAA TCACGGCGCA CAACCGCCTG TGCAGTCGGC CCTTGATGGT AAAACCATCC 7080 CTCACTGGTA TCGCATGATT AACCGTCTGA TGTGGATCTG GCGCGGCATT GACCCACGCG 7140 AAATCCTCGA CGTCCAGGCA 7200 TATACGATAC GGTGATTGGC 7260 TTTGTGAAGG AACCTTACTT 7320 TAA.AGCTCTA AGGTAAATAT 7380 ATTGTTTGTG TATTTTAGAT 7440 GCCTTTAATG AGGAAAACCT 7500 ACTGCTGACT CTCAACATTC 7560 GACTTTCCTT CAGAATTGCT 7620 CGTATTGTGA TGAGCGATGC CGAACGTACC GACGATGATT TACCGTGGCG GCAACTGGAT TTATGAGTGG GCCCCGGATC CTGTGGTGTG ACATAATTGG ACAAACTACC TACAGAGATT AAAATTTTTA AGTGTATAAT GTGTTAAACT ACTGAT'TCTA TCCAACCTAT GGAACTGATG AATGGGAGCA GTGGTGGAAT GTTTTGCTCA GAAGAA.ATGC CATCTAGTGA TGATGAGGCT TACTCCTCCA AAAAAGAAGA GAAAGGTAGA AGACCCCAAG AAGTTTTTTG AGTCATGCTG TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 7680 GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 7740 TGCTATACAA GAAAATTATG ATAATCATAA CATACTGTTT -107
TTTCTTACTC
7800
ACCTTTAGCT
7860
ACTAGAGATC
7920
CCCACACCTC
7980
TATTGCAGCT
8040 CACACAGGCA TAGAGTGTCT GCTATTAATA ACTATGCTCA AAAATTGTGT TTTTAATTTG TAAAGGGGTT AATAAGGAAT ATTTGATGTA TAGTGCCTTG ATAATCAGCC ATACCACATT TGTAGAGGTT TTACTTGCTT TAAA;AACCT CCCCTGAACC TGAAACATAA AATGAATGCA ATTGTTGTTG TTAACTTGTT 'TATAATGGTT ACAAATAAAG CAATAGCATC ACAAATTTCA CAAATAAAGC 0.0* *o ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC ATCAAVGTAT CTTATCATGT 8100
CTGGATCCCC
8160
ACATTCCAAT
8220
ACAAAAAGGA
8280
ATCTGGGAAG
8340
CAACAGCAGA
8400
GAAGCACTGT
8460 AGGAAGCTCC TCTGTGTCCT CATAAACCCT AACCTCCTCT ACTTGAGAGG CATAGGCTGC CCATCCACCC TCTGTGTCCT CCTGTTAATT AGGTCACTTA AATTGGGTAG GGGTTTTTCA CAGACCGCTT TCTAAGGGTA ATTTTAAAAT TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT.GGTAAACAGC
CCACAAATGT
AACATACAAG CTGTCAGCTT TGCACAAGGG CCCAACACCC TGCTCATCALA GGTTGCTGTG TTAGTAATGT GCAAAACAGG AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACT'rGG ATCAGGAACC CAGCACTCCA 8520 108-
CTGGATAAGC
8580
CAACTGTAGC
8640
ACACACCCTG
8700
GAATGGGTTT
8760
TAGCAGTTAC
8820
CACAGGTTAA
8880
GATACGCCTA
8940
CACTTTTCGG
9000
TATGTATCCG
9060
GAGTATGAGT
9120
TCCTGTTTTT
9180
TGCACGAGTG
9240 ATTATCCTTA TCCAAAACAG CCTTGTGGTC AGTGTTCATC TGCTGACTGT ATTTTTTGGG GTTACAGTTT GAGCAGGATA TTTGGTCCTG TAGTTTGCTA CAGCTCCAAA GGTTCCCCAC CAACAGCAAA AAAATGAAAA TTTGACCCTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT CCCTGAATGC AAGTTTAACA CCCAATAACC TCAGTTTTAA CAGTAACAGC TTCCCACATC AAAATATTTC GTCCTCATTT AAATTAGGCA AAGGAATTCT TGAAGACGAA AGGGCCTCGT TTTTTATAGG TTAATGTCAT GATAATAATG GTTTCTTAGA CGTCAGGTGG GGAAATGTGC GCGGAACCCC TATTTGTTTA TTTTTCTAAA TACATTCAAA CTC.ATGAGAC AATAACCCTG ATAAATGCTT C.ATAATATT GAAAAAGGAA ATTCAACATT TCCGTGTCGC CCTTATTCCC TTTTTTGCGG CATTTTGCCT GCTCACCCAG AAACGCTGGT GAAAGTAAAA GATGCTGAAG-ATCAGTTGGG GGTTACATCG AACTGGATCT CAACAGCGGT AAGATCCTTG AGAGTTTTCG CCCCGAAGAA CGTTTTCCAA TGATGAGCAC TTTTAAAGTT CTGCTATGTG GCGCGGTATT 9300 -109- ATCCCGTGTT GACGCCGGGC AAGAGCAACT CGGTCGCCGC ATACACTATT CTCAGAATGA 9360 CTTGGTTGAG TACTCACCAG TCACAGAAAA GCATCTTACG GATGGCATGA CAGTAAGAGA 9420 ATTATGCAGT GCTGCCATAA CCATGAGTGA TAACACTGCG GCCAACTTAC TTCTGACAAC 9480 GATCGGAGGA CCGAAGGAGC TAACCGCTTT TTTGCACAAC ATGGGGGATC ATGTAACTCG 9540 CCTTGATCGT TGGGAACCGG AGCTGAATGA AGCCATACCA AACGACGAGC GTGACACCAC 9600 GATGCCTGCA GCA.ATGGCAA CAACGTTGCG CAAACTATTA. 
ACTGGCGAAC TACTTACTCT 9660 AGQ.TTCCCGG CAACAATTAA TAGACTGGAT GGAGGCGGAT AAAGTTGCAG GACCACTTCT 9720 GCGCTCGGCC CTTCCGGCTG GCTGGTTTAT TGCTGATAAA TCTGGAGCCG GTGAGCGTGG 9780 GTCTCGCGGT ATCATTGCAG CACTGGGGCC AGATGGTAAG CCCTCCCGTA TCGTAGTTAT 9840 CTACACGACG GGGAGTCAGG CAACTATGGA TGAACGAAAT AGACAGATCG CTGAGATAGG 9900 TGCCTCACTG ATTAAGCATT GGTAACTGTC AGACCAAGTT TACTCATATA TACTTTAGAT 9960 TGATTTAAAA CTTCATTTTT AATTTAAAAG GATCTAGGTG AAGATCCTTT TTGATAATCT 10020 CATGACCAAA ATCCCTTAAC GTGAGTTTTC GTTCCACTGA GCGTCAGACC CCGTAGAAAA 10080 -110- GATCAAAGGA TCTTCTTGAG ATCCTTTTTT TCTGCGCGTA ATCTGCTGCT TGCAAACAAA 10140 AAAACCACCG CTACCAGCGG TGGTTTGTTT GCCGGATCAA GAGCTACCAA CTCTTTTTCC 10200 GAAGGTAA.CT GGCTTCAGCA GAGCGCAGAT ACCAAATACT GTCCTTCTAG TGTAGCCGTA 10260 GTTAGGCCAC CACTTCAAGA ACTCTGTAGC ACCGCCTACA TACCTCGCTC TGCTAATCCT 10320 GTTACCAGTG GCTGCTGCCA GTGGCGATAA GTCGTGTCTT ACCGGGTTGG ACTCAAGACG 10380 ATAGTTACCG GATAAGGCGC AGCGGTCGGG CTGAACGGGG GGTTCGTGCA CACAGCCCAG 10440 CTTGGAGCGA ACGACCTACA CCGAACTGAG ATACCTACAG CGTGAGCTAT GAGAAAGCGC 10500 CACGCTTCCC GAAGGGAGAA AGGCGGACAG GTATCCGGTA AGCGGCAGGG TCGGAACAGG 10560k AGAGCGCACG AGGGAGCTTC CAGGGGGAAA CGCCTGGTAT CTTTATAGTC CTGTCGGGTT 10620 TCGCCACCTC TGACTTGAGC GTCGATTTTT GTGATGCTCG TCAGGGGGGC GGAGCCTATG 10680 GAA.AAACGCC AGCAACGCGG 10740 CATGTTCTTT CCTGCGTTAT 10800 AGCTGATACC GCTCGCCGCA 10860 CCTTTTTACG GTTCCTGGCC TTTTGCTGGC CTTTTGCTCA CCCCTGATTC TGTGGATAAC CGTATTACCG CCTTTGAGTG GCCGAACGAC CGAGCGCAGC GAGTCAGTGA GCGAGGAAGC ill GGAAGAGCGC CTGATGCGGT ATTTTCTCCT TACGCATCTG TGCGGTATTT CACACCGCAT 10920 ATGGTGCACT CTCAGTACAA TCTGCTCTGA TGCCGCATAG TTAAGCC!AGT
ATACACTCCG
10980 CTATCGCTAC GTGACTGGGT CATGGCTGCG CCCCGACACC CGCCAACACC
CGCTGACGCG
11040 CCCTGACGGG CTTGTCTGCT CCCGGCATCC GCTTACAGAC AAGCTGTGAC
CGTCTCCGGG
11100 AGCTGCATGT GTCAGAGGTT TTCACCGTCA TCACCGAAAC GCGCGAGGCA
GC
11152
INFORMATION FOR SEQ ID NO:13: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: GACGGATCGG GAGATCTCC 19
INFORMATION FOR SEQ ID NO:14: SEQUENCE CHARACTERISTICS: LENGTH: 22 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: CCGCCTCAGA AGCCATAGAG CC 22
INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 14455 base pairs TYPE: nucleic acid STRANDEDNESS: double TOPOLOGY: circular (ii) MOLECULE TYPE: DNA (genomic) (iii) HYPOTHETICAL: NO (iv) ANTI-SENSE: NO (xi) SEQUENCE DESCRIPTION: SEQ ID AAGCTTGGGC AGAAATGGTT GAACTCCCGA GAGTGTCCTA CACCTAGGGG
AGAAGCAGCC
AAGGGGTTGT TTCCCACCAA GGACGACCCG TCTGCGCACA AACGGATGAG CCCATCAGAC 120
AAAGACATAT
180
GGGGAAGTTG
240
GTGCAAGATT
300
CAGCCAACTT
360
CTTGCTAAAA
420
ATGTTAAGAA
480
ATGGGAATAG
540
TTGACCACAG
600
AGGTGGTGGC
660
TTACCATATA
720 TCATTCTCTG CTGCAAACTT GGCATAGCTC TGCTTTGCCT GGGGCTATTG CGGTTCGTGC TCGCAGGGCT CTCACCCTTG ACTCTTTTAA TAGCTCTTCT ACAATCTAAA CAATTCGGAG AACTCGACCT TCCTCCTGAG GCAAGGACCA CCTCTTACAA GCCGCATCGA TTTTGTCCTT CAGAAATAGA AATAAGAATG ATTATATTTT TACCAATAAG ACCAATCCAA TAGGTAGATT ATTAGTTACT
ATGAATCATT
AAAATAGAAA
ATCTTTTAGT
GAGACGCTCA
ACTATTTTTA
ACCTCAATTG
CTCAAATTCA
AAGAACAGGT
GAAGTTAGAA
GCAAGGACTA
GCCTAGAAGT AAAAAAGGGA AAAAAGAGTG TTTTTGTCAA AATAGGAGAC AACCAGGGAC TTATAGGGGA CCTTACATCT ACAGACCAAC AGATGCCCCC CAGGAAGATA TGACTTAAAT TGGGATAGGT GGGTTACAGT CAATGGCTAT AAAGTGTTAT ATAGATCCCT CCCTTTTCGT GAAAGACTCG CCAGAGCTAG ACCTCCTTGG 780 TGTATGTTGT CTCAAGAAGA AAAAGACGAC ATGAAACAAC AGGTACATGA TTATATTTAT 840 CTAGGAACAG GAATGCACTT TTGGGGAAAG ATTTTCCATA CCAAGGAGGG GACAGTGGCT 900
GGACTAATAG
960
TTGGCCCAAC
1020
AACGAGGATG
1080
GCTCTGAGTG
1140
TTATGTAAAC
1200
CATTCACCTC
1260
TCACTTTCCA
1320
GCTGGCGCCC
1380
TGTTTGTCTT
1440
CCATAGGGAC
1500
ACCGAAGAAA
1560
CTTCCACCTC
1620 AACATTATTC TGCAAAAACT CATGGCATGA GTTATTATGA ATAGCCTTTA CTTGCGGTTC CCAGGGCTTA AGTAAGTTTT TGGTTACAAA CTGTTCTTAA TGAGACAAGT GGTTTCCTGA CTTGGTTTGG TATCAAAGGT TCTGATCTGA TTCTATTTTC CTATGTTCTT TTGGAATTTA TCCAAATCTT ATGTAAATGC CAAGATATAA AAGAGTGCTG ATTTTTTGAG TAAACTTGCA ACAGTCCTAA TTGTGTGTTT GTGTCTGTTC GCCATCCCGT CTCCGCTCGT CACTTATCCT GAGGGTCCCC CCGCAGACCC CGGCGACCCT CAGGTCGGCC GACTGCGGCA GAACAGGGAC CCTCGGATAA GTGACCCTTG TCTCTATTTC TACTATTTGG GTATTGTCTC TTTCTTGTCT GGCTATCATC ACAAGAGCGG AACGGACTCA CAAGCTAGCG ACTGAAAATG AGACATATTA TCTGCCACGG AGGTGTTATT TGGCCGCCAG TCTTTTGGAC CAGCTGATCG AAGAGGTACT GGCTGATAAT CTAGCCATTT TGAACCACCT ACCCTTCACG AACTGTATGA TTTAGACGTG ACGGCCCCCG AAGATCCCAA CGAGGAGGCG GTTTCGCAGA TTTTTCCCGA CTCTGTAATG 1680 -115-
0O* see* 0000 o. 0 490 TTGGCGGTGC AGGAAGGGAT 1740 CCGCCTCACC TTTCCCGGCA 1800 ATGCCAAACC TTGTACCGGA 1860 AGTGACGACG AGGATGAAGA 1920 CACGGTTGCA GGTCTTGTCA 1980 TCGCTTTGCT ATATGAGGAC 2040 GTGGGTGATA GAGTGGTGGG 2100 GGTTTAAAGA ATTTTGTATT 2160 AGCCCGAGCC AGAACCGGAG 2220 TCCTGAGACG CCCGACATCA 2280 ACTCCGGTCC TTCTAACACA 2340 AACCAGTTGC CGTGAGAGTT 2400 TTAACGAGCC TGGGCAACCT 2460 TGACTTACTC ACTTTTCCGC CGGCGCCCGG TTCTCCGGAG GCCCGAGCAG CCGGAGCAGA GAGCCTTGGG TCCGGTTTCT GGTGATCGAT CTTACCTGCC ACGAGGCTGG CT'rTCCACCC GGGTGAGGAG TTTGTGTTAG ATTATGTGGA GCACCCCGGG TTATCACCGG AGGAATACGG GGGACCCAGA TATTATGTGT CTGTGGCATG TTTGTCTACA GTAAGTGAAA ATTATGGGCA TTTGGTGTGG TAATTTTTTT TTTAATTTTT ACAGTTTTGT GTGATTTTTT TAAAAGGTCC TGTGTCTGAA CCTGAGCCTG CCTGCAAGAC CTACCCGCCG TCCTAAAATG GCGCCTGCTA CCTGTGTCTA GAGAATGCA6A TAGTAGTACG GATAGCTGTG CCTCCTGAGA TACACCCGGT GGTCCCGCTG TGCCCCATTA GGTGGGCGTC GCCAGGCTGT GGAATGTATC GAGGACTTGC TTGGACTTGA GCTGTAAACG CCCCAGGCCA TAAGGTGTAA 116- ACCTGTGATT GCGTGTGTGG TTAACGCCTT TGTTTGCTGA ATGAGTTGAT GTAAGTTTAA 2520 TAAAGGGTGA GATAATGTTT AACX'TGCATG GCGTGTTAAA TGGGGCGGGG CTTAAAGGGT 2580 ATATAATGCG CCGTGGGCTA ATCTTGGTTA CATCTGACCT CATGGAGGCT TGGGAGTGTT 2640 TGGAAGATTT TTCTGCTGTG CGTAACTTGC TGGAACAGAG CTCTAACAGT ACCTCTTGGT 2700 TTTGGAGGTT TCTGTGGGGC TCATCCCAGG CAAAGTTAGT CTGCAGAATT AAGGAGGAT 2760 ACAAGTGGGA ATTTGAAGAG CTTTTGAAAT CCTGTGGTGA GCTGTTTGAT TCTTTGAATC 2820 TGGGTCACCA GGCGCTTTTC CAAGAGAAGG TCATCAAGAC TTTGGATTTT TCCACACCGG 2880 GGCGCGCTGC GGCTGCTGTT GCTTTTTTGA GTTTTATAAA GGATAAATGG AGCGAAGAAA 2940 CCCATCTGAG CGGGGGGTAC CTGCTGGATT TTCTGGCCAT GCATCTGTGG AGAGCGGTTG 3000 TGAGACACAA GAATCGCCTG CTACTGTTGT CTTCCGTCCG CCCGGCGATA ATACCGACGG 3060 AGGAGCAGCA GCAGCAGCAG GAGGAAGCCA GGCGGCGGCG GCAGGAGCAG AGCCCATGGA 3120 ACCCGAGAGC CGGCCTGGAC CCTCGGGAAT GAATGTTGTA CAGGTGGCTG AACTGTATCC 3180 AGAACTGAGA CGCATTTTGA CAATTACAGA GGATGGGCAG GGGCTAAAGG GGGTAAAGAG 3240 -117- GGAGCGGGGG GCTTGTGAGG CTACAGAGGA GGCTAGGAAT CTAGCTTTTA GCTTAATGAC 3300
CAGACACCGT
3360
TGATCTGCTG
3420
GGATGATTTT
3480
GTACAAGATC
3540
CGAGGTGGAG
3600
GCCGGGGGTG
3660
TTTTAGCGGT
3720
TGGGTTTAAC
3780
TTACTGCTGC
3840
CCTCTTTGAA
3900
TGTGGCCTCC
3960
CATGGTATGT
4020 CCTGAGTGTA TTACTTTTCA ACAGATCAAG GATAATTGCG CTAATGAGCT GCGCAGAAGT ATTCCATAGA GCAGCTGACC ACTTACTGGC TGCAGCCAGG GAGGAGGCTA TTAGGGTATA TGCAA.AGGTG GCACTTAGGC CAGATTGCAA AGCAAACTTG TAAATATCAG GAATTGTTGC TACATTTCTG GGAACGGGGC ATAGATACGG AGGATAGGGT GGCCTTTAGA TGTAGCATGA TAAATATGTG CTTGGCATGG ACGGGGTGGT TATTATGAAT GTAAGGTTTA CTGGCCCCAA ACGGTTTTCC TGGCCAATAC CAACCTTATC CTACACGGTG TAAGCTTCTA AATACCTGTG TGGAAGCCTG GACCGATGTA AGGGTTCGGG GCTGTGCCTT TGGAAGGGGG TGGTGTGTCG CCCCAAAAGC AGGGCTTCAA TTAAGAAATG AGGTGTACCT TGGGTATCCT GTCTGAGGGT AACTCCAGGG TGCGCCACAA GACTGTGGTT GCTTCATGCT AGTGAAAAGC GTGGCTGTGA TTAAGCATAA GGCAACTGCG AGGACAGGGC CTCTCAGATG CTGACCTGCT CGGACGGCAA -118-
CTGTCACCTG
4080 TGAdCATAAC 4140
ACCTTACCAA
4200
GGTGAACCTG
4260
TGAGACCCGC
4320
TGTGATGCTG
4380
CGCTGAGTTT
4440
CTTAAGGGTG
4500
AGCAGCCGCC
4560
GACAACGCGC
4620
TGGTCGCCCC
4680
GCCGTTGGAG
4740
TGTGACTGAC
4800 CTGAAGACCA TTCACGTAGC CAGCCACTCT CGCAAGGCCT GGCCAGTGTT ATACTGACCC GCTGTTCCTT GCATTTGGGT AACAGGAGGG GGGTGTTCCT TGCAATTTGA GTCACACTAA GATATTGCTT GAGCCCGAGA GCATGTCCAA AACGGGGTGT TTGACATGAC CATGAAGATC TGGAAGGTGC TGAGGTACGA ACCAGGTGCA GACCCTGCGA GTGTGGCGGT AAACATATTA GGAACCAGCC GATGTGACCG AGGAGCTGAG GCCCGATCAC TTGGTGCTGG CCTGCACCCG GGCTCTAGCG ATGAAGATAC AGATTGAGGT ACTGAAATGT GTGGGCGTGG GGAAAGAATA TATAAGGTGG GGGTCTTATG TAGTTTTGTA TCTGTTTTGC GCCGCCATGA GCACCA.ACTC GTTTGATGGA AGCATTGTGA GCTCATAT'TT ATGCCCCCAT GGGCCGGGGT GCGTCAGAAT GTGATGGGCT CCAGCATTGA GTCCTGCCCG CAAACTCTAC TACCTTGACC TACGAGACCG TGTCTGGAAC ACTGCAGCCT CCGCCGCCGC TTCAGCCGCT GCAGCCACCG CCCGCGGGAT TTTGCTTTCC TGAGCCCGCT TGCAAGCAGT GCAGCTTCCC GTTCATCCGC 119- CCGCGATGAC AAGTTGACGG CTCTTTTGGC ACAATTGGAT TCTTTGACCC GGGAACTTA.A 4860
TGTCGTTTCT
4920 CAGCAGCTGT TGGATCTGCG CCAGCAGGTT TCTGCCCTGA AGGCTTCCTC CCCTCCCAAT GCGGTTTAAA ACATAAATAA AAAACCAGAC TCTGTTTGGA TTTGGATCAA 4980
GCAAGTGTCT
5040
CATAATTGGA
5100
GTGTATAATG
5160
GAACTGATGA
5220
AAGAAATGCC
5280
AAAAGAAGAG
5340
GTCATGCTGT
5400
AAGCTGCACT
5460
ATAACAGTTA
5520
CTATTAATAA
5580 TGCTGTCTCT CGAGGGATCT TTGTGAAGGA ACCTTACTTC TGTGGTGTGA CAAACTACCT ACAGAGATTT AAAGCTCTAA GGTAAATATA AAATTTTTAA TGTTAAACTA CTGATTCTAA TTGTTTGTGT ATTTTAGATT CCAACCTATG ATGGGAGCAG TGGTGGAATG CCTTTAATGA GGAAAACCTG TTTTGCTCAG ATCTAGTGAT GATGAGGCTA CTGCTGACTC TCAACATTCT ACTCCTCCAA AAAGGTAGAA GACCCCAAGG ACTTTCCTTC AGAATTGCTA AGTTTTTTGA GTTTAGTAAT AGAACTCTTG CTTGCTTTGC TATTTACACC ACAAAGGAAA GCTATACAAG AAAATTATGG AAAAATATTC TGTAACCTTT ATAAGTAGGC TAATCATAAC ATACTGTTTT TTCTTACTCC ACACAGGCAT AGAGTGTCTG CTATGCTCAA AAATTGTGTA CCTTTAGCTT TTTAATTTGT AAAGGGGTTA -120 ATAAGGAATA TTTGATGTAT AGTGCCTTGA CTAGAGATCA TAATCAGCCA TACCACATTT 5640 GTAGAGGTTT TACTTGCTTT AA.AAAACCTC CCACACCTCC CCCTGAACCT GAAACATAAA 5700 ATGAATGCAA TTGTTGTTGT TAACTTGTTT ATTGCAGCTT ATAATGGTTA CAAATAAAGC 5760 AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC TGCATTCTAG TTGTGGTTTG 5820 TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCCGGC TGTGGAATGT GTGTCAGTTA 5880 GGGTGTGGAA AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA TGCAAAGCAT GCATCTCAAT 5940 TAGTCAGCAA CCAGGTGTGG AAAGTCCCCA GGCTCCCCAG CAGGCAGAAG TATGCAAAGC 6000 ATGCATCTCA ATTAGTCAGC AACCATAGTC CCGCCCCTAA CTCCGCCCAT CCCGCCCCTA 6060 ACTCCGCCCA GTTCCGCCCA TTCTCCGCCC CATGGCTGAC TAATTTTTTT TATTTATGCA 6120 GAGGCCGAGG CCGCCTCGGC CTCTGAGCTA TTCCAGAAGT AGTGAGGAGG CTTTTTTGGA 6180 GGCCTAGGCT TTTGCAAAAA GCTTGGACAC AAGACAGGCT TGCGAGATAT GTTTGAGAAT 6240 ACCACTTTAT CCCGCGTCAG GGAGAGGCAG TGCGTAAAAA GACGCGGACT CATGTGAAAT 6300 ACTGGTTTTT AGTGCGCCAG ATCTCTATAA TCTCGCGCAA CCTATTTTCC CCTCGAACAC 6360 -121 t* A S.
TTTTTAAGCC
6420
CTGGGACATG
6480
ATGGAAAGGC
6540
TGAACTGGGT
6600
GCGCGAGCTT
6660
TGACCTGGTG
6720
CTTTGTCACC
6780
TATCCCGCAA
6840
AATCTCCGGT
6900
CAGGCGGGTT
6960
ACCTGAGCGA
7020
GCCGCTTTAA
7080
CTCACTGGTA
7140 GTAGATAAAC AGGCTGGGAC ACTTCACATG AGCGAAAAAT ACATCGTCAC TTGCAGATCC ATGCACGTAA ACTCGCAAGC CGACTGATGC CTTCTGALACA ATTATTGCCG TAAGCCGTGG CGGTCTGGTA CCGGGTGCGT TACTGGCGCG ATTCGTCATG TCGATACCGT TTGTATTTCC AGCTACGATC ACGACAACCA AAAGTGCTGA AACGCGCAGA AGGCGATGGC GAAGGCTTCA TCGTTATTGA GATACCGGTG GTACTGCGGT TGCGATTCGT GAAATGTATC CAAAAGCGCA ATCTTCGCAA AACCGGCTGG TCGTCCGCTG GTTGATGACT ATGTTGTTGA GATACCTGGA TTGAACAGCC GTGGGATATG GGCGTCGTAT TCGTCCCGCC CGCTAATCTT TTCAACGCCT GGCACTGCCG GGCGTTGTTC TTTTTAACTT 5555 ACAATAGTTT CCAGTAAGTA TTCTGGAGGC TGCATCCATG ACACAGGCAA AACCCTGTTC AAACCCCGCT TTAAACATCC TGAAACCTCG ACGCTAGTCC TCACGGCGCA CAACCGCCTG TGCAGTCGGC CCTTGATGGT AAAACCATCC TCGCATGATT AACCGTCTGA TGTGGATCTG GCGCGGCATT GACCCACGCG -122- AAATCCTCGA CGTCCAGGCA CGTATTGTGA TGAGCGATGC CGAACGTACC GACGATGATT 7200
TATACGATAC
7260
TTTGTGAAGG
7320
TAAAGCTCTA
7380 GGTGATTGGC TACCGTGGCG GCAACTGGAT TTATGAGTGG GCCCCGGATC AACCTTACTT CTGTGGTGTG ACATAATTGG ACAA.ACTACC TACAGAGATT AGGTAAATAT AAAATTTTTA AGTGTATAAT GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG AATGGGAGCA GTGGTGGAAT 7440 0 0* 0*S* 0 0**0 0000 Oe t* 0 0000 000@ 00 0 0* to
GCCTTTAATG
7500
ACTGCTGACT
7560
GACTTTCCTT
7620
GCTTGCTTTG
7680
GAAAAATATT
7740
TTTCTTACTC
7800
ACCTTTAGCT
7860
ACTAGAGATC
7920 AGGAAAACCT GTTTTGCTCA GAAGAAATGC CATCTAGTGA TGATGAGGCT CTCAACATTC TACTCCTCCA AAAAAGAAGA GAAAGGTAGA AGACCCCAAG CAGAATTGCT AAGTTTTTTG AGTCATGCTG TGTTTAGTAA TAGAACTCTT CTATTTACAC CACAAAGGAA AAAGCTGCAC TGCTATACAA GAAAATTATG CTGTAACCTT TATAAGTAGG CATAACAGTT ATAATCATAA CATACTGTTT CACACAGGCA TAGAGTGTCT GCTATTAATA ACTATGCTCA AAAATTGTGT TTTTAATTTG TAAAGGGGTT AATAAGGAAT ATTTGATGTA TAGTGCCTTG ATAATCAGCC ATACCACATT TGTAGAGGTT TTACTTGCTT TAAAAAACCT -123
CCCACACCTC
7980
TATTGCAGCT
8040
ATTTTTTTCA
8100
CTGGATCCCC
8160
ACATTCCAAT
8220
ACAAAAAGGA
8280
ATCTGGGAAG
8340
CAACAGCAGA
8400
GAAGCACTGT
8460
TGTAGGTTCC
8520
CTGGATAAGC
8580
CAACTGTAGC
8640
ACACACCCTG
8700 CCCCTGAACC TGAA.ACATAA AATGAATGCA ATTGTTGTTG TTAACTTGTT TATAATGGTT ACAAATAAAG CAATAGCATC ACAAATTTCA CAAATAAAGC CTGCATTCTA GTTGTGGTTT GTCCAAACTC ATCAATGTAT CTTATCATGT AGGAAGCTCC TCTGTGTCCT CATAAACCCT AACCTCCTCT ACTTGAGAGG CATAGGCTGC CCATCCACCC TCTGTGTCCT CCTGTTAATT AGGTCACTTA AATTGGGTAG GGGTTTTTCA CAGACCGCTT TCTAAGGGTA ATTTTAAAAT TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT GGTAAACAGC CCACAAATGT AACATACAAG CTGTCAGCTT TGCACAAGGG CCCAACACCC TGCTCATCAA GGTTGCTGTG TTAGTAATGT GCAA.AACAGG AGGCACATTT TCCCCACCTG AAAATATCTA GTGTTTTCAT TTTTACTTGG ATCAGGAACC CAGCACTCCA ATTATCCTTA TCCAAAACAG CCTTGTGGTC AGTGTTCATC TGCTGACTGT ATTTTTTGGG GTTACAGTTT GAGCAGGATA TTTGGTCCTG TAGTTTGCTA CAGCTCCAAA GGTTCCCCAC CAACAGCAAA AAAATGAAAA TTTGACCCTT -124-
GAATGGGTTT
8760
TAGCAGTTAC
8820
CACAGGTTAA
8880
GATACGCCTA
8940
CACTTTTCGG
9000
TATGTATCCG
9060
GAGTATGAGT
9120
TCCTGTTTTT
9180
TGCACGAGTG
9240
CCCCGAAGAA
9300
ATCCCGTGTT
9360
CTTGGTTGAG
9420
ATTATGCAGT
9480 TCCAGCACCA TTTTCATGAG TTTTTTGTGT CCCTGAATGC AAGTTTAACA CCCAATAACC TCAGTTTTAA CAGTAACAGC TTCCCACATC AA.AATATTTC GTCCTCATTT AAATTAGGCA AAGGAATTCT TGAAGACGAA AGGGCCTCGT TTTTTATAGG TTAATGTCAT GATAATAATG GTTTCTTAGA CGTCAGGTGG GGAAATGTGC GCGGAACCCC TATTTGTTTA TTTTTCTAAA TACATTCAAA CTCATGAGAC AATAACCCTG ATAAATGCTT CAATAATATT GAAAAAGGAA ATTCAACATT TCCGTGTCGC CCTTATTCCC TTTTTTGCGG CATTTTGCCT GCTCACCCAG AAACGCTGGT GAAAGTAAAA GATGCTGAAG ATCAGTTGGG GGTTACATCG AACTGGATCT CAACAGCGGT AAGATCCTTG AGAGTTTTCG CGTTTTCCAA TGATGAGCAC TTTTAAAGTT CTGCTATGTG GCGCGGTATT GACGCCGGGC AAGAGCAACT CGGTCGCCGC ATACACTATT CTCAGAATGA TACTCACCAG TCACAGAAAA GCATCT'rACG GATGGCATGA CAGTAAGAGA GCTGCCATAA CCATGAGTGA TAACACTGCG GCCAACTTAC TTCTGACAAC 125 GATCGGAGGA CCGAAGGAGC TAACCGCTTT TTTGCACAAC ATGGGGGATC A'rGTAACTCG 9540 CCTTGATCGT TGGGAACCGG AGCTGAATGA AGOCATACCA AACGACGAGC GTGACACCAC 9600 GATGCCTGCA GCAATGGCAA CAACGTTGCG CAAACTATTA ACTGGCGAAC TACTTACTCT 9660 AGCTTCCCGG CAACAATTAA TAGACTGGAT GGAGGCGGAT AAAGTVGCAG GACCACTTCT 9720 GCGCTCGGCC CTTCCGGCTG GCTGGTTTAT TGCTGATAAA TCTGGAGCCG GTGAGCGTGG 9780 GTCTCGCGGT ATCATTGCAG CACTGGGGCC AGATGGTAAG CCCTCCCGTA TCGTAGTTAT 9840 CTACACGACG GGGAGTCAGG CAACTATGGA TGAACGAAAT AGACAGATCG CTGAGATAGG 9900 TGCCTCACTG ATTAAGCATT GGTAACTGTC AGACCAAGTT TACTCATATA TACTTTAGAT 9960 TGATTTAAAA CTTCATTTTT AATTTAAAAG GATCTAGGTG AAGATCCTTT TTGATAATCT 10020 CATGACCAAA ATCCCTTAAC GTGAGTTTTC GTTCCACTGA GCGTCAGACC CCGTAGAAAA 10080 GATC.AAAGGA TCTTCTTGAG ATCCTTTTTT TCTGCGCGTA ATCTGCTGCT TGCAAACAAA 10140 AAAACCACCG CTACCAGCGG TGGTTTGTTT GCCGGATCAA GAGCTACCAA CTCTTTTTCC 10200 GAAGGTAACT GGCTTCAGCA GAGCGCAGAT ACCAAATACT GTCCTTCTAG TGTAGCCGTA 10260 126- G TTAGGCCAC 10320
GTTACCAGTG
10380
ATAGTTACCG
10440
CTTGGAGCGA
10500
CACGCTTCCC
10560
AGAGCGCACG
10620
TCGCCACCTC
10680
GAAAAACGCC
10740
CATGTTCTTT
10800
AGCTGATACC
10860
GGAAGAGCGC
10920
ACCGCCTCAG
10980 CACTTCAAGA ACTCTGTAGC ACCGCCTACA TACCTCGCTC TGCTAATCCT GCTGCTGCCA GTGGCGATAA GTCGTGTCTT ACCGGGTTGG ACTCAAGACG GATAAGGCGC AGCGGTCGGG CTGAACGGGG GGTTCGTGCA CACAGCCCAG ACGACCTACA CCGAACTGAG ATACCTACAG CGTGAGCTAT GAGAAAGCGC GAAGGGAGAA AGGCGGACAG GTATCCGGTA AGCGGCAGGG TCGGAACAGG AGGGAGCTTC CAGGGGG.AAA CGCCTGGTAT CTTTATAG.TC CTGTCGGGTT TGACTTGAGC GTCGATTTTT GTGATGCTCG TCAGGGGGGC GGAGCCTATG AGCAACGCGG CCTTTTTACG GTTCCTGGCC TTTTGCTGGC CTTTTGCTCA CCTGCGTTAT CCCCTGATTC TGTGGATAAC CGTATTACCG CCTTTGAGTG GCTCGCCGCA GCCGAACGAC CGAGCGCAGC GAGTCAGTGA GCGAGGAAGC CTGATGCGGT ATTTTCTCCT TACGCATCTG TGCGGTATTT CACACCGCAT AAGCCATAGA GCCCACCGCA TCCCCAGCAT GCCTGCTATT GTCTTCCCAA TCCTCCCCCT TGCTGTCCTG CCCCACCCCA CCCCCCAGAA TAGAATGACA CCTACTCAGA 11040 127- CAATGCGATG CAATTTCCTC ATTTTATTAG G.AAAGGACAG TGGGAGTGGC 11100 GTCAAGGAAG GCACGGGGGA GGGGCAAACA ACAGATGGCT GGCAACTAGA 11160 GAGGCTGATC AGCGAGCTCT AGCATTTAGG TGACACTATA GAATAGGGCC 11220
ACCTTCCAGG
AGGCACAGTC
CTCTAGATGC
ATGCTCGAGC GGCCGCTTCT 11280 GCAAATATTT CATTAATGTA 11340 GCACTTGGAG TTGTGTCTCC 11400 GGTTTAGTTT TGTCTCCGTT 11460 CCGTGAGATT TTGGATAAGC 11520 GTGCCTTCAG TAAGATCTCC 11580 TTTAGTAGCA CTCCATTTTC 11640 GATATTGGAG CCAAACTGCC 11700 CCACATTTTG TTAAGACCAA 117 GGAGATGGAG CTGGTGTGGT 11820 TTATTCTTGG GCAATGTATG AAAALAGTGTA AGAGGATGTG GTTGTGGCCA GACCAGTCCC ATGAAAATGA CATAGACTAT TGTTTCCTGT GTACCGTTTA GTGTAATGGT TAGTGTTACA TAAGTAAACT TGACTGACAA TGTTACTTTT GGCAGTTTTA TGATAGGTTA GGCATAAATC CAACAGCGTT TGTATAGGCT ATTTCTAAAG TTCCAATATT CTGGGTCCAG GAAGGAATTG GTCAAATCTT.ATAATAAGAT GAGCACTTTG AACTGTTCCA TTTAACAGCC AAAACTGAAA CTGTAGCAAG TATTTGACTG AGTGAGTTTA GCATCTTTCT CTGCATTTAG TCTACAGTTA CCACAAAGTT AGCTTATCAT TATTTTTGTT TCCTACTGTA -128- ATGGCACCTG TGCTGTCAAA ACTAAGGCCA GTTCCTAGTT TAGGAACCAT AGCCTTGTTT 11880
GAATCAAATT
11940
GGTGAACCAA
12000
AACCCCTTGG
12060
AAGTAAAGGC
12120
CCCTGTCCTA
12180
ATAAGGCGTC
12240
CCTTGTGAAT
12300
GTCACACCTG
12360
GCCCCGTACT
12420
CCCAAGCTAC
12480
TCGGTGGTGG
12540
GGTCCTTGGG
12600
ATGGTGAGTG
12660
GAGGTAACTG
12720
GGTGGGCTCA
12780
CCGTTGCCCA
12840
AAAGAGAGTA
12900
AGAAAAGGCA
12960
TCTTCAGACG
13020
CATCCGCCGT
13080
CAACCCCGAC
13140
CTGGTTAGAC
13200
CCTCGGTGGC
13260
ACCGCGAAGA
13320 TGTTGCCCGC GACCATTAGA GGTGCGGCGG CAGCCACAGT TAGGGCTTCT TGAGGGGTGC AGATATTTCC AGGTTTATGT TTGACTTGGT TTTTTTGAGA CAGTGGTTAC ATTTTGGGAG GTAAGGTTGC CGGCCTCGTC CAGAGAGAGO TTTTGAGCGC AAGCATGCCA TTGGAGGTAA CTAGAGGTTC GGATAGGCGC CCCCAGGGGG ACTCTCTTGA AACCCATTGG GGGATACAAA GGGAGGAGTA CAGTTGGAGG ACCGGTTTCC GTGTCATATG GATACACGGG GTTGAAGGTA GTCTTGCGCG CTTCATCTTG GATCTCAAGC CTGCCACACC TCACCTCGAC CTCAAGACCG CCTACTTTAA TTACATCATC AGCAGCACCT CCGCCAGAAA CGCCACCCGC TGCCGCCCGC CACGGTGCTC AGCCTACCTT GCGACTGTGA GCCTTTCTCG AGAGGTTTTC CGATCCGGTC GATGCGGACT CGCTCAGGTC GGAGTACCGT TCGGAGGCCG ACGGGTTTCC GATCCAAGAG TACTGGAAAG GTTTGTCCTC AACCGCGAGC CCAACAGCGA GCTCGAATTC AGATCCGAGC TCGGTACCAA GCTTGGGTCT CCCTATAGTG AGTCGTATTA ATTTCGATAA GCCAGTAAGC 13380 130
AGTGGGTTCT
13440
ACCGCCCATT
13500
TTGGTGCCAA
13560
GTCAAACCGC
13620
GATGACTAAT
13680
ATAATGCCAG
13740
ATACACTTGA
13800
ATGGAAAGTC
13860
GGGGTCGTTG
13920
CATATATGGG
13980
TCAATGTCAA
14040
AAGCAGATTC
14100
CGCGCACTAC
14160 CTAGTTAGCC AGAGAGCTCT GCTTATATAG ACCTCCCACC GTACACGCCT TGCGTCAATG GGGCGGAGTT GTTACGACAT TTTGGAAAGT CCCGTTGATT AACAAACTCC CATTGACGTC AATGGGGTGG AGACTTGGAA ATCCCCGTGA TATCCACGCC CATTGATGTA CTGCCAAAAC CGCATCACCA TGGTAATAGC ACGTAGATGT ACTGCCAAGT AGGAAAGTCC CATAAGGTCA TGTACTGGGC GCGGGCCATT TACCGTCATT GACGTCAATA GGGGGCGTAC TTGGCATATG TGTACTGCCA AGTGGGCAGT TTACCGTAAA TAGTCCACCC ATTGACGTCA CCTATTGGCG TTACTATGGG AACATACGTC ATTATTGACG TCAATGGGCG GGCGGTCAGC CAGGCGGGCC ATTTACCGTA AGTTATGTAA CGCGGAACTC CTATGAACTA ATGACCCCGT AATTGATTAC TATTAATAAC TAGTCAATAA CGCGTATATC TGGCCCGTAC ATCGCGAAGC AGCGCAAAAC GCCTAACCCT TTCATGCAAT TGTCGGTCAA GCCTTGCCTT GTTGTAGCTT AAATTTTGCT TCAGCGACCT CCAACACACA AGCAGGGAGC AGATACTGGC TTAACTATGC -131- GGCATCAGAG CAGATTGTAC TGAGAGTCGA CCATAGGGGA TCGGGAGATC TCCCGATCCG 14220 TCTATGGTGC ACTCTCAGTA CAATCTGCTC TGATGCCGCA TAGTTAAGCC AGTATACACT 14280 CCGCTATCGC TACGTGACTG GGTCATGGCT GCGCCCCGAC ACCCGCCAAC ACCCGCTGAC 143-40 GCGCCCTGAC GGGCTTGTCT GCTCCCGGCA. TCCGCTTACA GACAAGCTGT GACCGTCTCC 14400 GGGAGCTGCA TGTGTCAGAG GTTTTCACCG TCATCACCGA AACGCGCGAG GCAGC 0 CID 14455 (C SOADDES:dul N RATI-SN FSEQ N DO O1 TCTTGCA TTCTGACGTTG AACTC TAGATATGC .5120 132-
TCCTCTTACA
180
CGTGTTTATT
240
ACCACCACAT
300
CAACCTGCCA
360
AAAAAGCATC
420
CTGTCGAGCC
480
CATGTCGCTG
540
CGGCGAAGGA
600
GCGGTGGTGC
660
ATACAACATG
720
TGTCCTCCGG
780
CAGCACCACA
840
GGGGACCACA
900 CTTTTTCATA CATTGCCCAA GAATAAAGAA TCGTTTGTGT TATGTTTCAA.
TTTCAATTGC AGAAAATTTC AAGTCATTTT TCATTCAGTA GTATAGCCCC AGOTTATACA GATCACCGTA CCTTAATCAA ACTCACAGAA CCCTAGTATT CCTCCCTCCC AACACACAGA GTACACAGTC CTTTCTCCCC GGCTGGCCTT ATATCATGGG TAACAGACAT ATTCTTAGGT GTTATATTCC ACACGGTTTC AAACGCTCAT CAGTGATATT AATAAACTCC CCGGGCAGCT CACTTAAGTT TCCAGCTGCT GAGCCACAGG CTGCTGTCCA ACTTGCGGTT GCTTAACGGG GAAGTCCACG CCTACATGGG GGTAGAGTCA TAATCGTGCA TCAGGATAGG TGCAGCAGCG CGCGAATAAA CTGCTGCCGC CGCCGCTCCG TCCTGCAGGA GCAGTGGTCT CCTCAGCGAT GATTCGCACC GCCCGCAGCA TAAGGCGCCT GCACAGCAGC GCACCCTGAT CTCACTTAAA TCAGCACAGT AACTGCAGCA ATATTGTTCA AAATCCCACA GTGCAAGGCG CTGTATCCAA AGCTCATGGC GAACCCACGT GGCCATCATA CCACAAGCGC AGGTAGATTA AGTGGCGACC -133 CCTCATAAAC ACGCTGGACA TAAACATTAC CTCTTTTGGC ATGTTGTAAT TCACCACCTC 960 CCGGTACCAT ATAAACCTCT GATTAAACAT GGCGCCATCC ACCACCATCC TAAACCAGCT 1020 GGCCAAAACC TGCCCGCCGG CTATACACTG CAGGGAACCG GGACTGGAAC AATGACAGTG 1080 GAGAGCCCAG GACTCGTAAC CATGGATCAT CATGCTCGTC ATGATATCAA TGTTGGCACA 1140 ACACAGGCAC ACGTGCATAC ACTTCCTCAG GATTACAAGC TCCTCCCGCG TTAGAACCAT 1200 ATCCCAGGGA ACAACCCATT CCTGAATCAG CGTAAATCCC ACACTGCAGG GAAGACCTCG 1260 CACGTAACTC ACGTTGTGCA TTGTCAAAGT GTTACATTCG GGCAGCAGCG GATGATCCTC 1320 CAGTATGGTA GCGCGGGTTT CTGTCTCAAA AGGAGGTAGA CGATCCCTAC TGTACGGAGT 1380 GCGCCGAGAC AACCGAGATC GTGTTGGTCG TAGTGTCATG CCAAATGGAA CGCCGGACGT 1440 AGTCATATTT CCTGAAGCAA A.ACCAGGTGC GGGCGTGACA AACAGATCTG CGTCTCCGGT 1500 CTCGCCGCTT AGATCGCTCT GTGTAGTAGT TGTAGTATAT CCACTCTCTC AAAGCATCCA 1560 GGCGCCCCCT GGCTTCGGGT TCTATGTAAA CTCCTTCATG CGCCGCTGCC CTGATAACAT 1620 CCACCACCGC AGAATAAGCC ACACCCAGCC AACCTACACA TTCGTTCTGC GAGTCACACA 1680 -134 CGGGAGGAGC GGGAAGAGCT GGAAGAACCA TGTTTTTTTT TTTATTCCAA AAGATTATCC 1740 AAAACCTCAA AATGAAGATC TATTAAGTGA ACGCGCTCCC CTCCGGTGGC GTGGTCAAAC 1800 TCTACAGCCA AAGAACAGAT AATGGCATTT GTAAGATGTT GCACAATGGC TTCCAAAAGG 1860 CAAACGGCCC TCACGTCCAA GTGGACGTAA AGGCTAAACC CTTCAGGGTG AATCTCCTCT 1920 ATAAACATTC CAGCACCTTC AACCATGCCC AAATAATTCT CATCTCGCCA CCTTCTCAAT 1980 ATATCTCTAA GCAAATCCCG AATATTAAGT CCGGCCATTG 
TAA;AATCTG CTCCAGAGCG 2040 CCCTCCACCT TCAGCCTCAA GCAGCGAATC ATGATTGCAA AAATTCAGGT TCCTCACAGA 2100 CCTGTATAAG ATTCAAAAGC GGAACATTAA CAAAAATACC GCGATCCCGT AGGTCCCTTC 2160 GCAGGGCCAG CTGAACATAA TCGTGCAGGT CTGCACGGAC CAGCGCGGCC ACTTCCCCGC 2220 CAGGAACCTT GACAAAAGAA CCCACACTGA TTATGACACG CATACTCGGA GCTATGCTAA 2280 CCAGCGTAGC CCCGATGTAA GCTTTGTTGC ATGGGCGGCG ATAT.AAAATG CAAGGTGCTG 2340 CTCAAAAAAT CAGGCAAAGC CTCGCGCAAA AAAGAAAGCA CATCGTAGTC ATGCTCATGC 2400 AGATAAAGGC AGGTAAGCTC CGGAACCACC ACAGAAAAAG ACACCATTTT TCTCTCAAAC 2460 135- ATGTCTGCGG GTTTCTGCAT AAACACAAAA 2520 AAGCCTGTCT TACAACAGGA AAAACAACCC 2580 CGGCGTGACC GTAAAAAAAC TGGTCACCGT 2640 TCATGTCCGG AGTCATAATG TAAGACTCGG 2700 GCTAAAAAGC GACCGAAATA GCCCGGGGGA 2760 ACAGCCCCCA TAGGAGGTAT AACAAAATTA 2820 AAACCCTCCT GCCTAGGCAA AATAGCACCC 2880 CAGCGGCAGC CTAACAGTCA GCCTTACCAG 2940 CTCGACACGG CACCAGCTCA ATCAGTCACA 3000 ATATATAGGA CTAAAAAATG ACGTAACGGT 3060 TAAAATAACA AAAAAACATT TAAACATTAG TTATAAGCAT AAGACGQACT ACGGCCATGC GATTAAAAAG CACCACCGAC .AGCTCCTCGG TAAACACATC AGGTTGATTC ATCGGTCAGT ATACATACCC GCAGGCGTAG AGACAACATT ATAGGAGAGA AAAACACATA AACACCTGAA TCCCGCTCCA GAACAACATA CAGCGCTTCA TAAAAAAGAA AACCTATTAA AAAAACACCA GTGTAAAAAA GGGCCAAGTG CAGAGCGAGT TAAAGTCCAC AAAAAACACC CAGAAAACCG CACGCGAACC TACGCCCAGA AACGAAAGCC AAAAAACCCA CAACTTCCTC AAATCGTCAC 3120 TTCCGTTTTC CCACGTTACG TAACTTCCCG GATCCTCTCC CGATCCCCTA TGGTCGACTC 3180 TCAGTACAAT CTGCTCTGAT GCCGCATAGT TAAGCCAGTA TCTGCTCCCT GCTTGTGTGT 3240 -136 a.
TGGAGGTCGC
3300
ACAATTGCAT
3360
CCAGATATAC
3420
CATTAGTTCA
3480
CTGGCTGACC
3540
TAACGCCAAT
3600
ACTTGGCAGT
3660
GTAAATGGCC
3720
AGTACATCTA
3780
ATGGGCGTGG
3840
ATGGGAGTTT
3900
CCCCATTGAC
3960
TCTGGCTAAC
4020 TGAGTAGTGC GCGAGCAAAA TTTAAGCTAC AACAAGGCAA GGCTTGACCG GAAGAATCTG CTTAGGGTTA GGCGTTTTGC GCTGCTTCGC GATGTACGGG GCGTTGACAT TGATTATTGA CTAGTTATTA ATAGTAATCA ATTACGGGGT TAGCCCATAT ATGGAGTTCC GCGTTACATA ACTTACGGTA AATGGCCCGC GCCCAACGAC CCCCGCCCAT TGACGTCAAT AATGACGTAT GTTCCCATAG AGGGACTTTC CATTGACGTC AATGGGTGGA CTATTTACGG TAAACTGCCC ACATCAAGTG TATCATATGC CAAGTACGCC CCCTATTGAC GTCAATGACG CGCCTGGCAT TATGCCCAGT ACATGACCTT ATGGGACTTT CCTACTTGGC CGTATTAGTC ATCGCTATTA CCATGGTGAT GCGGTTTTGG CAGTACATCA ATAGCGGTTT GACTCACGGG GATTTCCAAG TCTCCACCCC ATTGACGTCA GTTTTGGCAC CAAAATCAAC GGGACTTTCC AAAATGTCGT AACAACTCCG GCAAATGGGC GGTAGGCGTG TACGGTGGGA GGTCTATATA AGCAGAGCTC TAGAGAACCC ACTGCTTACT GGCTTATCGA AATTAATACG ACTCACTATA -137 GGGAGACCCA AGCTTGGTAC CGAGCTCGGA TCTGAATTCG AGCTCGCTGT TGGGCTCGCG 4080
GTTGAGGACA
4140
CGAACGGTAC
4200
TCTCGAGAAA
4260
GCAGCGGGTG
4320
AGGCGGTCTT
4380
AAGCGCGCAA
4440
GGTCCTCCAA
4500
AGTCCCCCTG
4560
CTTGCGCTCA
4620
AATGTAACCA
4680
TCTGCACCCC
4740
GTCGCGGGCA
4800 AACTCTTCGC GGTCTTTCCA GTACTCTTGG ATCGGAAACC CGTCGGCCTC TCCGCCACCG AGGGACCTGA GCGAGTCCGC ATCGACCGGA TCGGAAAACC GGCGTCTAAC CAGTCACAGT CGCAAGGTAG GCTGAGCACC GTGGCGGGCG GCGGTCGGGG TTGTTTCTGG CGGAGGTGCT GCTGATGATG TAATTAAAGT GAGACGGCGG ATGGTCGAGG TGAGGTGTGG CAGGCTTGAG ATCCAAGATG GACCGTCTGA AGATACCTTC AACCCCGTGT ATCCATATGA CACGGAAACC CTGTGCCTTT TCTTACTCCT CCCTTTGTAT CCCCCAATGG GTTTCAAGAG GGGTACTCTC TTTGCGCCTA TCCGAACCTC TAGTTACCTC CAATGGCATG AA.ATGGGCAA CGGCCTCTCT CTGGACGAGG CCGGCAACCT TACCTCCCAA CTGTGAGCCC ACCTCTCAAA AAAACCAAGT CAAACATAAA CCTGGAAATA VCACAGTTAC CTCAGAAGCC CTAACTGTGG CTGCCGCCGC ACCTCTAATG ACACACTCAC CATGCAATCA CAGGCCCCGC TAACCGTGCA CGACTCCAAA 138 CTTAGCATTG CCACCCAAGG ACCCCTCACA 4860 TCAGGCCCCC TCACCACCAC CGATAGCAGT 4920 ACTACTGCCA CTGGTAGCTT GGGCATTGAC 4980 GTGTCAGAAG GAAAGCTAGC CCTGCAAACA ACCCTTACTA TCACTGCCTC ACCCCCTCTA TTGAAAGAGC CCATTTATAC ACAAAATGGA AAACTAGGAC TAAAGTACGG GGCTCCTTTG CATGTAACAG ACGACCTAAA CACTTTGACC 5040 99 9* 9 99 9 ~9 9 99 99 9 9* 9 9 .9 9.
9 9 9 .9 9 99..
GTAGCAACTG GTCCAGGTGT GACTATTAAT 5100 GCCTTGGGTT TTGATTCACA AGGCAATATG 5160 GATTCTCAAA ACAGACGCCT TATACTTGAT 5220 CTAAATCTAA GACTAGGACA GGGCCCTCTT 5280 AATACTTCCT TGCA.AACTAA AGTTACTGGA CAACTTAATG TAGCAGGAGG ACTAAGGATT GTTAGTTATC CGTTTGATGC TCAAAACCAA TTTATAAACT CAGCCCACAA CTTGGATATT AACTACAACA: AAGGCCTTTA CTTGTTTACA GCTTCAAACA- ATTCCAAAAA .GCTTGAGGTT 5340 AACCTA6AGCA CTGCCAAGGG GTTGATGTTT 5400 GATGGGCTTG AATTTGGTTC ACCTAATGCA 5460 GACGCTACAG CCATAGC CAT TAATGCAGGA CCAAACACAA ATCCCCTCAA AACAAAAATT GGCCATGGCC TAGAATTTGA TTCAA6ACAAG GCTATGGTTC CTAAACTAGG AACTGGCCTT 5520 AGTTTTGACA GCACAGGTGC CATTACAGTA GGAAACAAAA ATAATGATAA GCTAACTTTG 5580 139-
TGGACCACAC
5640
ACTTTGGTCT
5700
AAAGGCAGTT
5760
GACGAAAATG
5820
AATGGAGATC
5880 CAGCTCCATC TCCTAACTGT AGACTAAATG CAGAGAAAGA TGCTAAACTC TAACAAAATG TGGCAGTCAA ATACTTGCTA CAGTTTCAGT TTTGGCTGTT TGGCTCCAAT ATCTGGAACA GTTCAAAGTG CTCATCTTAT TATAAGATTT GAGTGCTACT AAACAATTCC TTCCTGGACC CAGAATATTG GAACTTTAGA TTACTGAAGG CACAGCCTAT ACAAACGCTG TTGGATTTAT GCCTAACCTA s00 0:.9 0 0 :.0 00000 10 0sees 0see* TCAGCTTATC CAAAATCTCA CGGTAAAACT GCCAAAAGTA ACATTGTCAG TCAAGTTTAC 5940
TTAAACGGAG
6000
ACAGGAGACA
6060
AACTACATTA
6120
TAAAGAAGCG
6180
GCTAGAGCTC
6240
CCCTCCCCCG
6300
AATGAGGAAA
6360 ACAAAACTAA ACCTGTAACA- CTAACCATTA CACTAAACGG TACACAGGAA CAACTCCAAG TGCATACTCT ATGTCATTTT CATGGGACTG GTCTGGCCAC ATGAAATATT TGCCACATCC -TCTTACACTT TTTCATACAT TGCCCAAGAA GCCGCTCGAG CATGCATCTA GAGGGCCCTA TTCTATAGTG TCACCTAAAT GCTGATCAGC CTCGACTGTG CCTTCTAGTT GCCAGCCATC TGTTGTTTGC TGCCTTCCTT GACCCTGGAA GGTGCCACTC CCACTGTCCT TTCCTAATAA TTGCATCGCA TTGTCTGAGT AGGTGTCATT CTATTCTGGG GGGTGGGGTG -140-
GGGCAGGACA
6420
GGCTCTATGG
6480
CCCTGTAGCG
6540
CTTGCCAGCG
6600
GCCGGCTTTC
6660
TTACGGCACC
6720 GCAAGGGGGA GGATTGGGAA GACAATAGCA GGCATGCTGG GGATGCGGTG CTTCTGAGGC GGAA.AGAACC AGCTGGGGCT CTAGGGGGTA TCCCCACGCG GCGCATTAAG CGCGGCGGGT GTGGTGGTTA CGCGCAGCGT GACCGCTACA CCCTAGCGCC CGCTCCTTTC GCTTTCTTCC CTTCCTTTCT CGCCACGTTC CCCGTCAAGC TCTAAATCGG GGCATCCCTT TAGGGTTCCG ATTTAGTGCT TCGACCCCAA AAAACTTGAT TAGGGTGATG GTTCACGTAG TGGGCCATCG CCCTGATAGA CGGTTTTTCG CCCTTTGACG TTGGAGTCCA CGTTCTTTAA TAGTGGACTC 6780
TTGTTCCAAA
6840
ATTTTGGGGA
6900 CTGGAACAAC ACTCAACCCT ATCTCGGTCT ATTCTTTTGA TTTATAAGGG TTTCGGCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA ATTTAACGCG AATTAATTCT GTGGAATGTG TGTCAGTTAG GGTGTGGAAA GTCCCCAGGC TCCCCAGGCA 6960
GGCAGAAGTA
7020
GGCTCCCCAG
7080
CCGCCCCTAA
7140 TGCAAAGCAT GCATCTCAAT TAGTCAGCAA CCAGGTGTGG AAAGTCCCCA CAGGCAGAAG TATGCAAAGC ATGCATCTCA ATTAGTCAGC AACCATAGTC CTCCGCCCAT CCCGCCCCTA ACTCCGCCCA GTTCCGCCCA TTCTCCGCCC 141 CATGGCTGAC TAATTTTTTT TATTTATGCA GAGGCCGAGG CCGCCTCTGC CTCTGAGCTA 7200 TTCCAGAAGT AGTGAGGAGG CTTTTTTGGA GGCCTAGGCT TTTGCAAAAA GCTCCCGGGA 7260 GCTTGTATAT CCATTTTCGG ATCTGATCAA GAGACAGGAT GAGGATCGTT TCGCATGATT 7320 GAACAAGATG GATTGCACGC AGGTTCTCCG GCCGCTTGGG TGGAGAGGCT ATTCGGCTAT 7380 GACTGGGCAC AACAGACAAT CGGCTGCTCT GATGCCGCCG TGTTCCGGCT GTCAGCGCAG 7440 GGGCGCCCGG TTCTTTTTGT CAAGACCGAC CTGTCCGGTG CCCTGAATGA ACTGCAGGAC 7500 GAGGCAGCGC GGCTATCGTG GCTGGCCACG ACGGGCGTTC CTTGCGCAGC TGTGCTCGAC 7560 GTTGTCACTG AAGCGGGAAG GGACTGGCTG CTATTGGGCG AAGTGCCGGG GCAGGATCTC 7620 CTGTCATCTC ACCTTGCTCC TGCCGAGAAA GTATCCATCA TGGCTGATGC AATGCGGCGG 7680 CTGCATACGC TTGATCCGGC TACCTGCCCA TTCGACCACC AAGCGAAACA TCGCATCGAG 7740 CGAGCACGTA CTCGGATGGA AGCCGGTCTT GTCGATCAGG ATGATCTGGA CGAAGAGCAT 7800 CAGGGGCTCG CGCCAGCCGA ACTGTTCGCC AGGCTCAAGG CGCGCATGCC CGACGGCGAG 7860 GATCTCGTCG TGACCCATGG CGATGCCTGC TTGCCGAATA TCATGGTGGA AAATGGCCGC 7920 -142- 4* TT'TTCTGGAT TCATCGACTG TGGCCGGCTG GGTGTGGCGG ACCGCTATCA GGACATAGCG 7980 TTGGCTACCC GTGATATTGC TGAAGAGCTT GGCGGCGAAT GGGCTGACCG CTTCCTCGTG 8040 CTTTACGGTA TCGCCGCTCC CGATTCGCAG CGCATCGCCT TCTATCGCCT TCTTGACGAG 8100 TTCTTCTGAG CGGGACTCTG GGGTTCGAAA TGACCGACCA AGCGACGCCC AACCTGCCAT 8160 CACGAGATTT CGATTCCACC GCCGCCTTCT ATGAAAGGTT GGGCTTCGGA ATCGTTTTCC 8220 GGGACGCCGG CTGGATGATC CTCCAGCGCG GGGATCTCAT GCTGGAGTTC TTCGCCCACC 8280 CCAACTTGTT TATTGCAGCT TATAATGGTT ACA.AATAAAG CAATAGCATC ACAAATTTCA 8340 CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC ATCAATGTAT 8400 CTTATCATGT CTGTATACCG TCGACCTCTA GCTAGAGCTT GGCGTAATCA TGGTCATAGC 8460 TGTTTCCTGT GTGAAATTGT TATCCGCTCA CAATTCCACA CAACATACGA GCCGGAAGCA 8520 TAAAGTGTAA AGCCTGGGGT GCCTAATGAG TGAGCTAACT CACATTAATT GCGTTGCGCT 8580 CACTGCCCGC TTTCCAGTCG GGAAACCTGT CGTGCCAGCT GCATTAATGA ATCGGCCAAC 8640 GCGCGGGGAG AGGCGGTTTG CGTATTGGC-C 
GCTCTTCCGC TTCCTCGCTC ACTGACTCGC 8700 143-
TGCGCTCGGT
8760
TATCCACAGA
8820
CCAGGAACCG
8880
AGCATCACAA
8940
ACCAGGCGTT
9000
CCGGATACCT
9060
GTAGGTATCT
9120
CCGTTCAGCC
9180
GACACGACTT
9240 CGTTCGGCTG CGGCGAGCGG TATCAGCTCA CTCAA.AGGCG GTAkATACGGT ATCAGGGGAT AACGCAGGAA AGAACATGTG AGCAAAAGGC CAGCAAAAGG TAAAALAGGCC GCGTTGCTGG CGTITTTTCCA TAGGCTCCGC CCCCCTGACG AAATCGACGC TCAAGTCAGA GGTGGCGAAA CCCGACAGGA CTATAALAGAT TCCCCCTGGA AGCTCCCTCG TGCGCTCTCC TGTTCCGACC CTGCCGCTTA GTCCGCCTTT CTCCCTTCGG GAAGCGTGGC GCTTTCTCAA TGCTCACGCT CAGTTCGGTG TAGGTCGTTC GCTCCAAGCT GGGCTGTGTG CACGAACCCC CGACCGCTGC GCCTTATCCG GTAACTATCG TCTTGAGTCC AACCCGGTAA ATCGCCACTG GCAGCAGCCA CTGGTAACAG GATTAGCAGA GCGAGGTATG TAGGCGGTGC TACAGAGTTC TTGAAGTGGT GGCCTAACTA CGGCTACACT AGAAGGACAG 9300 TATTTGGTAT CTGCGCTCTG CTGAAGCCAG TTACCTTCGG AAAAAGkGTT GGTAGCTCTT 9360
GATCCGGCAA
9420 ACAAACCACC GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG CAGCAGATTA CGCGCAGAAA AAAAGGATCT 9480 CAAGAAGATC CTTTGATCTT TTCTACGGGG TCTGACGCTC -144- AGTGGAACGA AAACTCACGT TAAGGGATTT TGGTCATGAG ATTATCAAAA AGGATCTTCA 9540
CCTAGATCCT
9600
CTTGGTCTGA
9660
TTCGTTCATC
9720
TACCATCTGG
9780
TATCAGCAAT
9840
CCGCCTCCAT
9900
ATAGTTTGCG
9960
GTATGGCTTC
10020
TGTGCAAAAA
10080
CAGTGTTATC
10140
TAAGATGCTT
10200 TTTAAAkTTAA AAATGAAGTT TTAAATCAAT CTAAAGTATA TATGAGTAAA CAGTTACCAA TGCTTAATCA GTGAGGCACC TATCTCAGCG ATCTGTCTAT CATAGTTGCC TGACTCCCCG TCGTGTAGAT AACTACGATA CGGGAGGGCT CCCCAGTGCT GCAATGATAC CGCGAGACCC ACGCTCACCG GCTCCAGATT AAACCAGCCA GCCGGAAGGG CCGAGCGCAG AAGTGGTCCT GCAACTTTAT CCAGTCTATT AATTGTTGCC GGGAAGCTAG AGTAAGTAGT TCGCCAGTTA CAACGTTGTT GCCATTGCTA CAGGCATCGT GGTGTCACGC TCGTCGTTTG ATTCAGCTCC GGTTCCCAAC GATCAAGGCG AGTTACATGA TCCCCCATGT AGCGGTTAGC TCCTTCGGTC CTCCGATCGT TGTCAGAAGT AAGTTGGCCG ACTCATGGTT ATGGCAGCAC TGCATAATTC TCTTACTGTC ATGCCATCCG TTCTGTGACT GGTGAGTACT CAACCAAGTC ATTCTGAGAA TAGTGTATGC GGCGACCGAG TTGCTCTTGC CCGGCGTCAA TACGGGATAA TACCGCGCCA CATAGCAGAA 10260 145- CTTTAAAAGT GCTCATCATT GGAAAACGTT CTTCGGGGCG AAAACTCTCA
AGGATCTTAC
10320 CGCTGTTGAG ATCCAGTTCG ATGTAACCCA CTCGTGCACC CAACTGATCT
TCAGCATCTT
10380 TTACTTTCAC CAGCGTTTCT GGGTGAGCAA AA.ACAGGAAG GCAAAATGCC
GCAAAAAAGG
10440 GAATAAGGGC GACACGGAAA TGTTGAATAC TCATACTCTT CCTTTTTCAA
TATTATTGAA
10500 GCATTTATCA GGGTTATTGT CTCATGAGCG GATACATATT TGAATGTATT
TAGAAAAATA
10560 AACAAATAGG GGTTCCGCGC ACATTTCCCC GAAAAGTGCC
ACCTGACGTC
10610

INFORMATION FOR SEQ ID NO:17:
SEQUENCE CHARACTERISTICS:
LENGTH: 24 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
TGTACACCGG ATCCGGCGCA CACC 24

INFORMATION FOR SEQ ID NO:18:
SEQUENCE CHARACTERISTICS:
LENGTH: 35 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
CACAACGAGC TCAATTAATT AATTGCCACA TCCTC 35

INFORMATION FOR SEQ ID NO:19:
SEQUENCE CHARACTERISTICS:
LENGTH: 4 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
Thr Leu Trp Thr

INFORMATION FOR SEQ ID NO:20:
SEQUENCE CHARACTERISTICS:
LENGTH: 12 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
Pro Ser Ala Ser Ala Ser Ala Ser Ala Pro Gly Ser
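The short records that close the listing (SEQ ID NO:17 onward) each state a length alongside the sequence itself, so the two can be cross-checked mechanically. The following is a minimal Python sketch of such a check, not part of the specification: the record table is transcribed by hand from the entries above, and the names `RECORDS`, `residue_count` and `check_records` are hypothetical helpers introduced only for illustration.

```python
import re

# Hand-transcribed from the listing: label -> (stated length, sequence text).
# The labels are illustrative; the last entry is the final 12-residue peptide record.
RECORDS = {
    "SEQ ID NO:17": (24, "TGTACACCGG ATCCGGCGCA CACC"),
    "SEQ ID NO:18": (35, "CACAACGAGC TCAATTAATT AATTGCCACA TCCTC"),
    "SEQ ID NO:19": (4, "Thr Leu Trp Thr"),
    "final peptide record": (12, "Pro Ser Ala Ser Ala Ser Ala Ser Ala Pro Gly Ser"),
}


def residue_count(text: str) -> int:
    """Count residues in a whitespace-grouped sequence string.

    Three-letter amino acid codes count one residue per token;
    anything else is treated as nucleotides and counted per base.
    """
    tokens = text.split()
    if tokens and all(re.fullmatch(r"[A-Z][a-z]{2}", t) for t in tokens):
        return len(tokens)
    joined = "".join(tokens)
    if not re.fullmatch(r"[ACGT]+", joined):
        # Stray characters usually indicate OCR damage in the listing.
        raise ValueError(f"unexpected characters in sequence: {text!r}")
    return len(joined)


def check_records(records):
    """Return {label: True/False} for whether the stated length matches the count."""
    return {
        label: residue_count(seq) == stated
        for label, (stated, seq) in records.items()
    }


if __name__ == "__main__":
    for label, ok in check_records(RECORDS).items():
        print(label, "length matches" if ok else "LENGTH MISMATCH")
```

Because `residue_count` rejects any character outside A, C, G and T, the same helper would also flag scanning damage if it were pointed at the longer plasmid sequences earlier in the listing.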

Claims (8)

1. A packaging cell line expressing one or more adenovirus structural proteins, polypeptides, or fragments thereof, wherein said structural protein is selected from the group consisting of: a. penton base; b. hexon; c. fiber; d. polypeptide ma; e. polypeptide V; f. polypeptide VI; g. polypeptide VII; h. polypeptide VII; and i. biologically active fragments thereof.
2. A packaging cell line according to claim 1, which supports the production of a viral vector.
3. A packaging cell line according to claim 2, wherein the viral vector comprises a tripartite leader sequence or a sequence substantially homologous thereto.
4. A packaging cell line according to any one of the preceding claims wherein said structural protein is fiber and wherein said fiber protein has been modified to include a non-native amino acid residue sequence which targets a specific receptor, but which does not disrupt trimer formation or transport of fiber into the nucleus.
5. A packaging cell line according to claim 4 wherein said non-native amino acid residue sequence alters the binding specificity of the fiber for a targeted cell type.
6. A packaging cell line according to, any one of the preceding claims wherein said structural protein is fiber comprising amino acid residue sequences from more than one adenovirus serotype.
7. A packaging cell line according to any one of claims 2 to 6, wherein said viral vector includes a nucleic acid sequence having a deletion or mutation of a DNA sequence encoding an adenovirus structural protein, polypeptide, or fragment thereof.
8. A packaging cell line according to claim 7, wherein said viral vector includes a nucleic acid sequence having a deletion or mutation of the DNA sequences encoding regulatory polypeptides E1A and E1B.
9. A packaging cell line according to claim 7, wherein said viral vector further includes a nucleic acid sequence having a deletion or mutation of a DNA sequence encoding one or more of the following regulatory proteins or polypeptides: E2A, E2B, E3, E4, L4, or fragments thereof.
10. A packaging cell line according to claim 7, wherein a foreign DNA sequence encoding one or more foreign proteins, polypeptides or fragments thereof has been inserted in place of any of said deletions in said therapeutic viral vector.
11. A packaging cell line according to claim 10, wherein said foreign DNA encodes a tumor-suppressor protein or a biologically active fragment thereof.
12. A packaging cell line according to claim 10, wherein said foreign DNA encodes a suicide protein or a biologically active fragment thereof.
13. A packaging cell line according to any one of the preceding claims, wherein said cell line is an epithelial cell line.
14. A packaging cell line according to claim 12, wherein said cell line is selected from the group consisting of 293, A549, W162, HeLa, Vero, 211, and 211A cell lines.
15. A viral vector comprising deletion or mutation of a DNA sequence encoding an adenovirus structural protein, polypeptide, or fragment thereof, wherein said structural protein, polypeptide or fragment thereof is selected from the group consisting of:
   a. penton base;
   b. hexon;
   c. fiber;
   d. polypeptide IIIa;
   e. polypeptide V;
   f. polypeptide VI;
   g. polypeptide VII;
   h. polypeptide VIII; and
   i. biologically active fragments thereof.
16. A viral vector according to claim 15, further comprising deletion or mutation of the DNA sequences encoding regulatory polypeptides E1A and E1B, or fragments thereof.
17. A viral vector according to claim 15, further comprising deletion or mutation of the DNA sequence encoding one or more of the following regulatory proteins or polypeptides: E2A, E2B, E3, E4, L4, or fragments thereof.
18. A viral vector according to claim 15, further comprising a foreign DNA sequence inserted in place of the DNA sequence encoding said structural protein, polypeptide, or fragment thereof.
19. A viral vector according to any one of claims 16, 17 or 18, further comprising foreign DNA sequences inserted in place of the DNA sequences encoding said regulatory polypeptides or fragments thereof.
20. A viral vector lacking all or part of a DNA sequence encoding adenovirus fiber protein, wherein said DNA sequence has been replaced by a foreign DNA sequence encoding a therapeutic molecule.
21. A viral vector having a mutation in a DNA sequence encoding adenovirus fiber protein.
22. A viral vector according to any one of claims 15 to 21, wherein the vector is a therapeutic vector.
23. A complementing plasmid comprising a promoter nucleotide sequence operatively linked to a nucleotide sequence encoding an adenovirus structural protein, polypeptide, or fragment thereof.
24. A complementing plasmid according to claim 23, further comprising a nucleotide sequence encoding:
   a. a first adenovirus regulatory protein, polypeptide, or fragment thereof; or
   b. a second regulatory protein, polypeptide, or fragment thereof; or
   c. a third regulatory protein, polypeptide, or fragment thereof; or
   d. any combination of the foregoing.
25. A complementing plasmid according to claim 23, wherein said adenovirus structural protein or polypeptide is selected from the group consisting of:
   a. penton base;
   b. hexon;
   c. fiber;
   d. polypeptide IIIa;
   e. polypeptide V;
   f. polypeptide VI;
   g. polypeptide VII;
   h. polypeptide VIII; and
   i. biologically active fragments thereof.
26. A complementing plasmid according to claim 24, wherein said regulatory proteins, polypeptides or fragments thereof are selected from the group consisting of E1A, E1B, E2A, E2B, E3, E4, and L4.
27. A complementing plasmid comprising a promoter nucleotide sequence operatively linked to a nucleotide sequence encoding an adenovirus regulatory protein, polypeptide, or fragment thereof.
28. A composition useful in the preparation of therapeutic viral vectors, the composition comprising a cell containing a delivery plasmid comprising an adenovirus genome lacking a nucleotide sequence encoding fiber.
29. A composition according to claim 28, wherein said delivery plasmid further comprises a nucleotide sequence encoding a foreign polypeptide.
30. A composition according to claim 29, wherein said polypeptide is a therapeutic molecule.
31. A composition according to claim 28, wherein said delivery plasmid is selected from the group consisting of pDV1, pΔE1Bβgal, pΔE1sp1B, and pFG140-f.
32. A composition according to claim 28, wherein said cell further comprises a complementing plasmid containing a nucleotide sequence encoding fiber, said plasmid being stably integrated into the cellular genome of said cell.
33. A composition according to claim 32, wherein said complementing plasmid has the characteristics of pCLF having ATCC Accession Number 97737.
34. A composition useful in the preparation of therapeutic viral vectors, said composition comprising a cell containing:
   a. a first delivery plasmid comprising an adenovirus genome lacking a nucleotide sequence encoding fiber and incapable of directing the packaging of new viral particles in the absence of a second delivery plasmid; and
   b. a second delivery plasmid comprising an adenoviral genome capable of directing the packaging of new viral particles in the presence of said first delivery plasmid.
35. A composition according to claim 34, wherein said first and second delivery plasmids interact within said cell to produce a therapeutic viral vector.
36. A composition according to claim 34, wherein said cell further comprises a complementing plasmid containing a nucleotide sequence encoding fiber, said plasmid being stably integrated into the cellular genome of said cell.
37. A composition according to claim 34, wherein said first or second delivery plasmid further comprises a nucleotide sequence encoding a foreign polypeptide.
38. A composition according to claim 37, wherein said polypeptide is a therapeutic molecule.
39. A composition according to claim 34, wherein said first delivery plasmid lacks adenovirus packaging signal sequences.
40. A composition according to claim 34, wherein said second delivery plasmid contains a LacZ reporter construct.
41. A composition according to claim 34, wherein said second delivery plasmid further lacks a nucleotide sequence encoding an adenovirus regulatory protein.
42. A composition according to claim 41, wherein said regulatory protein is E1.
43. A composition according to claim 36, wherein said complementing plasmid has the characteristics of pCLF having ATCC Accession Number 97737.
44. A composition according to claim 34, wherein said first delivery plasmid lacks a nucleotide sequence encoding adenovirus E4 protein and said second delivery plasmid lacks a nucleotide sequence encoding adenovirus E1 protein.
45. A composition according to claim 44, wherein said cell contains at least one complementing plasmid encoding an adenoviral regulatory protein and a structural protein.
46. A composition according to claim 45, wherein said regulatory protein is E4 and said structural protein is fiber.
47. A composition according to claim 45, wherein said regulatory protein is E1 and said structural protein is fiber.
48. A composition according to claim 45, wherein said regulatory protein is both E1 and E4 and said structural protein is fiber.
49. A composition according to claim 45, wherein said adenoviral regulatory protein and said structural protein are encoded by separate complementing plasmids.
50. A composition according to claim 45, wherein said cell is selected from the group consisting of 293, A549, W162, HeLa, Vero, 211, and 211A.

DATED this 7th day of March, 2001
Novartis AG AND The Scripps Research Institute
By DAVIES COLLISON CAVE
Patent Attorneys for the applicants
AU24908/01A 1996-09-25 2001-03-07 Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors Abandoned AU2490801A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU24908/01A AU2490801A (en) 1996-09-25 2001-03-07 Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors
AU2004202701A AU2004202701A1 (en) 1996-09-25 2004-06-18 Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08719806 1996-09-25
AU24908/01A AU2490801A (en) 1996-09-25 2001-03-07 Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU46241/97A Division AU4624197A (en) 1996-09-25 1997-09-24 Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2004202701A Division AU2004202701A1 (en) 1996-09-25 2004-06-18 Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors

Publications (1)

Publication Number Publication Date
AU2490801A (en) 2001-05-17

Family

ID=3713783

Family Applications (2)

Application Number Title Priority Date Filing Date
AU24908/01A Abandoned AU2490801A (en) 1996-09-25 2001-03-07 Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors
AU2004202701A Abandoned AU2004202701A1 (en) 1996-09-25 2004-06-18 Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors

Family Applications After (1)

Application Number Title Priority Date Filing Date
AU2004202701A Abandoned AU2004202701A1 (en) 1996-09-25 2004-06-18 Packaging cell lines for use in facilitating the development of high-capacity adenoviral vectors

Country Status (1)

Country Link
AU (2) AU2490801A (en)

Also Published As

Publication number Publication date
AU2004202701A1 (en) 2004-07-15

Legal Events

Date Code Title Description
MK5 Application lapsed section 142(2)(e) - patent request and compl. specification not accepted