EP1034258A2 - DNA-based transposon system for the introduction of nucleic acid into DNA of a cell - Google Patents

DNA-based transposon system for the introduction of nucleic acid into DNA of a cell

Info

Publication number
EP1034258A2
EP1034258A2 EP98957974A EP98957974A EP1034258A2 EP 1034258 A2 EP1034258 A2 EP 1034258A2 EP 98957974 A EP98957974 A EP 98957974A EP 98957974 A EP98957974 A EP 98957974A EP 1034258 A2 EP1034258 A2 EP 1034258A2
Authority
EP
European Patent Office
Prior art keywords
nucleic acid
cell
coding sequence
sequence
transposase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP98957974A
Other languages
German (de)
English (en)
Inventor
Perry B. Hackett
Karl J. Clark
Adam J. Dupuy
Stephen C. Ekker
David A. Largaespada
Zoltan Ivics
Zsuzsanna Izsvak
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Minnesota
Original Assignee
University of Minnesota
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Minnesota
Publication of EP1034258A2
Legal status: Withdrawn

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/60 Vectors containing traps for, e.g. exons, promoters
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/90 Vectors containing a transposable element
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2830/00 Vector systems having a special element relevant for transcription
    • C12N2830/001 Vector systems having a special element relevant for transcription controllable enhancer/promoter combination
    • C12N2830/005 Vector systems having a special element relevant for transcription controllable enhancer/promoter combination repressible enhancer/promoter combination, e.g. KRAB
    • C12N2830/006 Vector systems having a special element relevant for transcription controllable enhancer/promoter combination repressible enhancer/promoter combination, e.g. KRAB tet repressible
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2840/00 Vectors comprising a special translation-regulating system
    • C12N2840/20 Vectors comprising a special translation-regulating system translation of more than one cistron
    • C12N2840/203 Vectors comprising a special translation-regulating system translation of more than one cistron having an IRES
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2840/00 Vectors comprising a special translation-regulating system
    • C12N2840/44 Vectors comprising a special translation-regulating system being a specific part of the splice mechanism, e.g. donor, acceptor

Definitions

  • This invention relates to methods for functional genomics including identifying expression control sequences, coding sequences and the function of coding sequences in the genomic DNA of a cell.
  • the invention also relates to transposons and transposases.
  • Transposons or transposable elements include a short piece of nucleic acid bounded by inverted repeat sequences. Active transposons encode enzymes that facilitate the insertion of the nucleic acid into DNA sequences.
  • DNA transposable elements transpose through a cut-and-paste mechanism; the element-encoded transposase catalyzes the excision of the transposon from its original location and promotes its reintegration elsewhere in the genome (Plasterk, 1996 Curr. Top. Microbiol. Immunol. 204, 125-143).
  • Autonomous members of a transposon family can express an active transposase, the trans-acting factor for transposition, and thus are capable of transposing on their own.
  • Nonautonomous elements have mutated transposase genes but may retain cis-acting DNA sequences. These cis-acting DNA sequences are also referred to as inverted terminal repeats.
  • Some inverted repeat sequences include one or more direct repeat sequences. These sequences usually are embedded in the terminal inverted repeats (IRs) of the elements, which are required for mobilization in the presence of a complementary transposase from another element or from itself.
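  • As a minimal sketch of what "bounded by inverted repeat sequences" means at the sequence level, the Python below checks whether the two ends of an element are reverse complements of each other. The helper names and the toy sequence are hypothetical and are not taken from the SB inverted repeats; real terminal inverted repeats are often imperfect, so an exact-match test like this is only an illustration.

```python
# Illustrative only: helper names and toy sequences are assumptions, not SB data.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_flanked_by_inverted_repeats(element: str, repeat_len: int) -> bool:
    """True if the first repeat_len bases equal the reverse complement
    of the last repeat_len bases (a perfect inverted repeat)."""
    element = element.upper()
    if repeat_len <= 0 or len(element) < 2 * repeat_len:
        return False
    return element[:repeat_len] == reverse_complement(element[-repeat_len:])


# Toy element: a 9-base left repeat, an arbitrary payload, and the
# reverse complement of the left repeat at the right end.
left_repeat = "CAGTTGAAG"
toy_element = left_repeat + "ATGCCCGGGTTT" + reverse_complement(left_repeat)

print(is_flanked_by_inverted_repeats(toy_element, 9))
```

A real analysis would tolerate mismatches and would also look for the embedded direct repeats described above.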
  • DNA-transposons can be viewed as transitory components of genomes which, in order to avoid extinction, must find ways to establish themselves in a new host. Indeed, horizontal gene transmission between species is thought to be one of the important processes in the evolution of transposons (Lohe et al., 1995 Mol. Biol. Evol. 12, 62-72 and Kidwell, 1992. Curr. Opin. Genet. Dev. 2, 868-873).
  • TcEs can be classified into three major types: zebrafish-, salmonid- and Xenopus TXr-type elements, of which the salmonid subfamily is probably the youngest and thus most recently active (Ivics et al., 1996, Proc. Natl. Acad. Sci. USA 93, 5008-5013).
  • Examination of the phylogeny of salmonid TcEs and that of their host species provides important clues about the ability of this particular subfamily of elements to invade and establish permanent residences in naive genomes through horizontal transfer, even over relatively large evolutionary distances.
  • TcEs from teleost fish (Goodier and Davidson, 1994 J. Mol. Biol. 241, 26-34), including Tdr1 in zebrafish (Izsvak et al., 1995 Mol. Gen. Genet. 247,
  • the interrupted genetic locus can be isolated based on the inserted genetic tag and the gene can be correlated with a phenotype, i.e., a physical result due to the loss of function of the interrupted gene.
  • Genetic tags called gene-traps have been devised wherein a marker gene is inserted randomly into a genome (reviewed in Mountford, P. S., et al. Trends
  • a variation of the gene trap is to employ a splice acceptor site followed by an internal ribosome entry site (IRES) placed in front of a marker gene.
  • IRES internal ribosome entry site
  • Splice acceptor sites provide signals to target the sequences following the splice acceptor site to be expressed as mRNA provided there is an intron upstream of the splice acceptor site (Padgett, T., et al., Ann. Rev. Biochem. J., 55, 1119-1150 (1988)).
  • An IRES allows ribosomal access to mRNA without a requirement for cap recognition and subsequent scanning to the initiator AUG (Pelletier, J.A., et al., Nature, 334, 320-325 (1988)). This increases the probability that the marker gene will be expressed when inserted into a gene.
  • When a construct containing a splice acceptor site followed by an IRES is placed in front of a marker gene, it is possible to get expression of the marker gene even if the construct integrates in an intron or if it integrates out of frame with respect to the interrupted gene.
  • the splice acceptor increases the likelihood that the inserted sequences will be present in the resulting mRNA, and the IRES increases the likelihood of translation of the inserted sequences.
  • the encephalomyocarditis virus (EMCV) IRES has been used for gene-trapping (von Melchner et al., J. Virol., 63, 3227-3233 (1989)), is well characterized (Jang, S. K., et al., Genes Dev. 4, 1560-1572 (1990); Kaminski, A., et al., EMBO J. 13, 1673-1681 (1994); Hellen, C. U., et al., Curr. Top. Microbiol. Immunol. 203, 31-63 (1995)) and has been shown to function efficiently in mammalian (Borman, A. M., et al., Nucleic Acids Res. 25, 925-32 (1997), Borman, A.
  • IRESs have been adapted into dicistronic vectors for the expression of two open reading frames. For instance, using an IRES in a dicistronic vector can result in more than 90% of transfected cells producing both the biological gene of interest and the selectable marker (Ghattas et al. Mol. Cell. Biol, ⁇ ,
  • Another strategy results in the "trapping" of sequences 3' of the inserted marker gene.
  • This entails the use of a retrovirus to deliver a marker gene that is placed between a promoter and a splice donor site (Zambrowicz, B.P., et al., Nature, 392, 608-611 (1998)).
  • Splice donor sites provide signals to target the RNA sequences encoding the marker gene to be spliced to the next downstream splice acceptor site.
  • the mRNA may contain a poly(A) tail and therefore be more stable and more efficiently translated. This increases the probability that the marker gene will be expressed only when inserted into a gene.
  • an enhancer-trap (Weber, F., et al., Cell, 36, 983-992 (1984)).
  • the marker gene is placed behind a weak promoter to give a minimal promoter-marker gene construct.
  • the minimal promoter by itself does not have the ability to direct high expression of the marker gene.
  • the enhancer-trap tag does not have to insert only within a coding sequence; it can be activated by insertion outside of the transcription unit.
  • An enhancer-trap may direct higher levels of expression than a gene-trap vector, which may increase the ability of a researcher to detect the insertion of the molecular tag.
  • DNA condensing reagents such as calcium phosphate, polyethylene glycol, and the like
  • lipid-containing reagents such as liposomes, multi-lamellar vesicles, and the like
  • virus-mediated strategies, ballistic methods, microinjection, and the like.
  • a nucleic acid fragment is provided that includes a nucleic acid positioned between at least two inverted repeats wherein the inverted repeats can bind to a transposase, preferably an SB protein.
  • the nucleic acid sequence includes a coding sequence.
  • the coding sequence is a detectable marker coding sequence that encodes a detectable marker or a selectable marker, such as green fluorescent protein, luciferase or neomycin.
  • the nucleic acid sequence optionally includes at least one of (i) a weak promoter, for instance a carp β-actin promoter, (ii) a splice acceptor site and (iii) an internal ribosome entry site, each of which is operably linked to the detectable marker coding sequence.
  • the nucleic acid sequence can include an analyte coding sequence located 5' of the detectable marker coding sequence and an internal ribosome entry site located therebetween, the internal ribosome entry site being operably linked to the detectable marker coding sequence.
  • the analyte coding sequence is operably linked to a promoter.
  • the present invention further provides a method for identifying an expression control region, such as an enhancer, in a cell.
  • a nucleic acid fragment of the invention containing a nucleic acid sequence that includes a detectable marker coding sequence is introduced into a cell, together with a source of transposase.
  • the detectable marker coding sequence is operably linked to a weak promoter, and the nucleic acid sequence is positioned between at least two inverted repeats, wherein the inverted repeats can bind to transposase.
  • the detectable marker or the selectable marker is then detected in the cell or its progeny containing the nucleic acid fragment, wherein the expression of the detectable marker or the selectable marker indicates that the nucleic acid fragment has integrated into the DNA of the cell or its progeny within a domain that contains an enhancer.
  • the transformed cell or its progeny can be evaluated for any changes in phenotype resulting from the insertion.
  • the DNA of the cell can be cleaved with a restriction endonuclease to yield one or more restriction fragments that contain at least a portion of the inverted repeat and genomic DNA of the cell that is adjacent to the inverted repeat.
  • the restriction fragment can be sequenced to determine the nucleotide sequence of the adjacent genomic DNA, and this sequence can then be compared with sequence information in a computer database.
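  • The cleave-and-sequence step above can be illustrated with a toy in-silico digest. This is a sketch under stated assumptions: the function names, the made-up sequences, and the choice of the EcoRI site (GAATTC, cut after the first G) are illustrative only, and just the top strand is modelled. It cuts a sequence at a restriction site that does not occur inside the inserted fragment, then reports the genomic DNA lying next to the inverted repeat in the fragment that carries it.

```python
def digest(dna: str, site: str, cut_offset: int) -> list[str]:
    """Cut dna at every occurrence of site (top strand only).
    cut_offset is the position inside the site where the cut falls."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


def flanks_next_to_repeat(fragments: list[str], inverted_repeat: str) -> list[str]:
    """For each fragment containing the inverted repeat, return the
    sequence upstream of it (the adjacent genomic DNA)."""
    flanks = []
    for fragment in fragments:
        idx = fragment.find(inverted_repeat)
        if idx != -1:
            flanks.append(fragment[:idx])
    return flanks


# Toy data (hypothetical): genomic DNA with an inserted element whose
# left end is LEFT_IR; the element itself contains no GAATTC site.
LEFT_IR = "CAGTTGAAG"
RIGHT_IR = "CTTCAACTG"
inserted = LEFT_IR + "ATGAAACCC" + RIGHT_IR
dna = "TTGAATTCGGCCTA" + inserted + "TACCGGAATTCAA"

fragments = digest(dna, "GAATTC", 1)
flanks = flanks_next_to_repeat(fragments, LEFT_IR)
print(fragments)
print(flanks)
```

The recovered flank is what would then be compared against a sequence database.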
  • Also provided by the invention is a method for identifying a genomic coding sequence in a cell.
  • a nucleic acid fragment of the invention containing a detectable marker coding sequence, a splice acceptor site and an internal ribosome entry site is introduced into a cell along with a source of transposase.
  • the splice acceptor site and internal ribosome entry site are each operably linked to the detectable marker coding sequence, and the nucleic acid sequence is positioned between at least two inverted repeats wherein the inverted repeats can bind to the transposase.
  • the detectable marker or the selectable marker is detected in the cell or its progeny containing the nucleic acid fragment, wherein expression of the detectable marker or the selectable marker indicates that the nucleic acid fragment has integrated within a genomic coding sequence of the cell or its progeny.
  • the detectable marker or the selectable marker can be expressed spatially and temporally in the same way as the genomic coding sequence is expressed when not interrupted.
  • the cell or its progeny can be evaluated for any change in phenotype resulting from the insertion.
  • the DNA of the cell can be cleaved with a restriction endonuclease and the resulting restriction fragments sequenced in order to determine the location in the cell DNA into which the nucleic acid fragment has inserted.
  • Another aspect of the invention provides a method for identifying the function of an analyte coding sequence.
  • a nucleic acid fragment containing a detectable marker coding sequence, an analyte coding sequence located 5' of the detectable marker coding sequence, and an internal ribosome entry site located therebetween is introduced into a cell along with a source of transposase.
  • the internal ribosome entry site is operably linked to the detectable marker coding sequence, and the nucleic acid fragment is positioned between at least two inverted repeats that can bind to a transposase.
  • the detectable marker or the selectable marker is detected in the cell or its progeny containing the nucleic acid fragment, wherein the expression of the detectable marker or the selectable marker indicates that the nucleic acid fragment has integrated into the DNA of the cell and that the analyte coding sequence is expressed.
  • the cell or its progeny can be evaluated for any change in phenotype resulting from the insertion, wherein an altered phenotype indicates that the analyte coding sequence plays a function in the phenotype.
  • the DNA of the cell can be cleaved with a restriction endonuclease and the resulting restriction fragments sequenced in order to determine the location in the cell DNA into which the nucleic acid fragment has inserted.
  • the invention also provides a gene transfer system to introduce a nucleic acid sequence into the DNA of a cell.
  • the system includes a nucleic acid fragment and a source of transposase, wherein the nucleic acid fragment includes a nucleic acid sequence that contains a coding sequence and is positioned between at least two inverted repeats that can bind the transposase.
  • the coding sequence is a detectable marker coding sequence that encodes a detectable marker or a selectable marker, including green fluorescent protein, luciferase or neomycin.
  • the nucleic acid sequence of the gene transfer system can include one or more of (i) a weak promoter, for instance a carp β-actin promoter, (ii) a splice acceptor site and (iii) an internal ribosome entry site, each being operably linked to the detectable marker coding sequence.
  • the nucleic acid sequence of the gene transfer system can include an analyte coding sequence located 5' of the detectable marker coding sequence and an internal ribosome entry site located therebetween, the internal ribosome entry site being operably linked to the detectable marker coding sequence.
  • the analyte coding sequence is operably linked to a promoter.
  • the nucleic acid fragment of the gene transfer system can be part of a plasmid or a recombinant viral vector.
  • the invention provides a method for producing a transgenic animal including introducing a nucleic acid fragment and a transposase source into a cell wherein the nucleic acid fragment includes a nucleic acid sequence that contains a heterologous coding sequence.
  • the nucleic acid sequence is positioned between at least two inverted repeats wherein the inverted repeats can bind to the transposase to yield a transgenic cell.
  • the cell is grown into a transgenic animal, and progeny can be derived from the transgenic animal.
  • a gene transfer system to introduce a nucleic acid sequence into the DNA of a fish, preferably a zebrafish, which includes a nucleic acid fragment containing a nucleic acid sequence that includes an internal ribosome entry site, wherein the nucleic acid fragment is capable of integrating into the genomic DNA of a fish.
  • the nucleic acid sequence of the gene transfer system can further include a first coding sequence located 3' to and operably linked to the internal ribosome entry site and a second coding sequence located 5' to both the first coding sequence and the internal ribosome entry site.
  • transgenic fish or fish cell preferably a zebrafish or zebrafish cell, that comprises a heterologous internal ribosome entry site.
  • Fig. 1 illustrates the molecular reconstruction of a salmonid Tc1-like transposase gene.
  • Fig. 1(A) is a schematic map of a salmonid TcE.
  • the TcE includes inverted repeat/direct repeat (IR/DR) flanking sequences. Depicted on the nucleotide sequence between the inverted repeat/direct repeat sequences is the location of conserved domains in the transposase encoded by the nucleotide sequence.
  • the numbers 1 and 340 refer to the amino acids of the transposase encoded by the nucleotide sequence.
  • DNA-recognition a DNA-recognition binding domain
  • NLS a bipartite nuclear localization signal
  • the boxes marked D and E comprising the DDE domain (Doak, et al., Proc. Natl.
  • Fig. 1(B) provides an exemplary strategy for constructing an open reading frame for a salmonid transposase (SB1-SB3) and then systematically introducing amino acid replacements into this gene (SB4-SB10). Amino acid residues are shown using single letter code, typed black when different from the consensus. Positions within the transposase polypeptide that were modified by site-specific mutagenesis are indicated with arrows. Translational termination codons appear as asterisks, frameshift mutations are shown as #. Residues changed to the consensus are check-marked and typed in white italics. In the right margin, the results of various functional tests that were done at various stages of the reconstruction are indicated.
  • Fig. 2(A) is a double-stranded nucleic acid sequence encoding the SB protein (SEQ ID NO:3).
  • Fig. 2(B) is the amino acid sequence (SEQ ID NO:1) of an SB transposase. The major functional domains are highlighted; see the legend to Fig. 1A for abbreviations.
  • Fig. 3 illustrates the DNA-binding activities of an N-terminal derivative (N123) of the SB transposase.
  • Fig. 3(A) provides the SDS-PAGE analysis illustrating the steps in the expression and purification of N123.
  • Molecular weights in kDa are indicated on the right.
  • Fig. 3(B) illustrates the results of mobility-shift analysis studies to determine whether N123 bound to the inverted repeats of fish transposons.
  • Fig. 4 provides the DNase I footprinting of deoxyribonucleoprotein complexes formed by N123.
  • Fig. 4(A) is a photograph of a DNase I footprinting gel containing a 500-fold dilution of the N123 preparation shown in lane 4 of Fig. 3A using the same transposon inverted repeat DNA probe as in Fig. 3B. Reactions were run in the absence (lane 3) or presence (lane 2) of N123. Maxam-Gilbert sequencing of purine bases in the same DNA was used as a marker (lane 1).
  • Fig. 4(B) provides a sequence comparison of the salmonid transposase-binding sites illustrated in Panel A with the corresponding sequences in the zebrafish Tdr1 elements.
  • Fig. 4(C) is a sequence comparison between the outer and internal transposase-binding sites in the SB transposons.
  • Fig. 5 illustrates the integration activity of SB in human HeLa cells.
  • Fig. 5(A) is a schematic illustrating the genetic assay strategy for SB-mediated transgene integration in cultured cells.
  • Fig. 5(B) demonstrates HeLa cell integration using Petri dishes of HeLa cells with stained colonies of G418- resistant HeLa cells that were transfected with different combinations of donor and helper plasmids.
  • Fig. 6 summarizes the results of transgene integration in human HeLa cells. Integration was dependent on the presence of an active SB transposase and a transgene flanked by transposon inverted repeats. Different combinations of the indicated donor and helper plasmids were cotransfected into cultured HeLa cells and one tenth of the cells, as compared to the experiments shown in Fig. 5, were plated under selection to count transformants. The efficiency of transgene integration was scored as the number of transformants surviving antibiotic selection. Numbers of transformants at right represent the numbers of
  • Fig. 7 illustrates the integration of neomycin resistance-marked transposons into the chromosomes of HeLa cells.
  • Fig. 7(A) illustrates the results of a Southern hybridization of HeLa cell genomic DNA with neomycin-specific radiolabeled probe from 8 individual HeLa cell clones that had been cotransfected with pT/neo and pSB10 and survived G418 selection.
  • Genomic DNA was digested with the restriction enzymes NheI, XhoI, BglII, SpeI and XbaI, enzymes that do not cut within the neo-marked transposon, prior to agarose gel electrophoresis and blotting.
  • Fig. 7(B) is a diagram of the junction sequences of T/neo transposons integrated into human genomic DNA.
  • the donor site is illustrated on top with plasmid vector sequences that originally flanked the transposon (black arrows) in pT/neo.
  • Human genomic DNA serving as target for transposon insertion is illustrated as a white box containing the base pairs TA, i.e., the site of DNA integration mediated by the SB transposase.
  • IR sequences and the flanking TA base pairs are uppercase, and the flanking genomic sequences are in lowercase.
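  • The junction layout described in this legend can be sketched as follows. The code lists TA dinucleotides in a target sequence and builds an integrated product in which the inserted element is flanked by TA on both sides, using the legend's convention of uppercase for the element and flanking TA and lowercase for genomic DNA. The function names and sequences are hypothetical, and treating the target TA as duplicated on both sides of the insert is an assumption made for this illustration based on the "flanking TA base pairs" wording.

```python
def ta_sites(target: str) -> list[int]:
    """Return the start index of every TA dinucleotide in target."""
    t = target.upper()
    return [i for i in range(len(t) - 1) if t[i:i + 2] == "TA"]


def integrate(target: str, element: str, site: int) -> str:
    """Insert element at the TA starting at index site.
    Assumption for this sketch: the TA appears on both sides of the insert.
    Output convention: element and flanking TA uppercase, genomic lowercase."""
    if target[site:site + 2].upper() != "TA":
        raise ValueError("integration site must be a TA dinucleotide")
    left = target[:site].lower()
    right = target[site + 2:].lower()
    return left + "TA" + element.upper() + "TA" + right


# Toy target and element (made-up sequences).
target = "ggctagcctacg"
sites = ta_sites(target)
product = integrate(target, "cagttgaag", sites[0])
print(sites)
print(product)
```

Reading the product from left to right reproduces the pattern in the figure: lowercase genomic flank, uppercase TA, the element, uppercase TA, lowercase genomic flank.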
  • Fig. 8 is a schematic demonstrating an interplasmid assay for excision and integration of a transposon.
  • the assay was used to evaluate transposase activity in zebrafish embryos.
  • Two plasmids plus an RNA encoding an SB transposase protein were coinjected into the one-cell zebrafish embryo.
  • One of the plasmids had an ampicillin resistance gene (Ap) flanked by IR/DR sequences (black arrows) recognizable by the SB transposase.
  • Ampicillin resistance gene Ap
  • IR/DR sequences black arrows
  • the bacteria were grown on media containing ampicillin and kanamycin (Km) to select for bacteria harboring single plasmids containing both the Km and Ap antibiotic-resistance markers.
  • Km kanamycin
  • the plasmids from doubly resistant cells were examined to confirm that the Ap-transposon was excised and reintegrated into the Km target plasmid.
  • Ap-transposons that moved into either another indicator Ap-plasmid or into the zebrafish genome were not scored. Because the amount of DNA in injected plasmid was almost equal to that of the genome, the number of integrations of Ap-transposons into target plasmids should approximate the number of integrations into the genome.
  • Fig. 9 illustrates two preferred methods for using the gene transfer system of this invention.
  • the effect can be either a loss-of-function or a gain-of- function mutation.
  • Integrations of a transposon carrying functional coding sequences, as depicted, typically result in gain-of-function gene transfer.
  • A subset of these integrations are also loss-of-function or gene inactivation events. Both types of activity can be exploited, for example, for gene discovery and/or functional genomics or gene delivery, e.g., human gene therapy.
  • Fig. 10 illustrates a preferred screening strategy using IRS-PCR (interspersed repetitive sequence polymerase chain reaction).
  • Fig. 10(A) illustrates a chromosomal region in the zebrafish genome containing the retroposon DANA (D), Tdr1 transposons (T1 and T2), and the highly reiterated miniature inverted-repeat transposable element Angel (A).
  • D retroposon DANA
  • T1, T2 Tdr1 transposons
  • A miniature inverted-repeat transposable element Angel
  • the various amplified sequence tagged sites (STSs) are identified by lowercase letter (a through g), beginning with the longest detectable PCR product.
  • the products marked with an X are not produced in the PCR reaction if genomes with defective "X-DNA" are amplified. Elements separated by more than about 2000 base pairs (bp) and elements having the wrong orientation relative to each other are not amplified efficiently.
  • Fig. 10(B) is a schematic of the two sets of DNA amplification products from both genomes with (lane 1) and without (lane 2) the DANA element marked with an X. Note that bands "a" and "d" are missing when the marked DANA sequence is not present.
  • Fig. 11 illustrates a preferred method for using an expression control sequence-trap transposon vector.
  • Fig. 12 illustrates a preferred method for using a gene-trap transposon vector.
  • Fig. 12(A) is a gene-trap that contains a GFP operably linked to a splice acceptor site and an IRES.
  • Fig. 12(B) is a gene trap similar to Fig. 12(A), but encodes an activator which activates expression of a GFP coding sequence, elsewhere in the genome, thereby amplifying the level of GFP expression over what it would be were the GFP coding sequence in the gene trap vector.
  • I intron
  • E exon.
  • Fig. 13 illustrates the dicistronic vectors pBeL, phBeL, and pBL.
  • the promoters are indicated by the large arrows on the left; the smaller raised arrows indicate the transcriptional initiation sites for the dicistronic mRNAs.
  • the IRES is depicted by a set of stem-loops. Changes in the control vectors phBeL and pBL are circled.
  • Fig. 14 The expression levels of β-galactosidase and luciferase are shown for embryos at 6 hours after injection with pBeL, phBeL, or pBL mRNA. The error bars indicate 95% confidence intervals. Abbreviation: RLU, relative light units.
  • Fig. 15 illustrates a strategy for using dicistronic coding sequence expression transposon vectors.
  • Fig. 16 illustrates an inverse PCR strategy to identify genomic DNA adjacent to an inserted nucleic acid fragment.
  • the present invention relates to novel transposases and the transposons that are used to introduce nucleic acid sequences into the DNA of a cell.
  • a transposase is an enzyme that is capable of binding to DNA at regions of DNA termed inverted repeats.
  • a transposon contains two inverted repeats that flank an intervening nucleic acid sequence, i.e., there is an inverted repeat 5' to and 3' to the intervening nucleic acid sequence.
  • Inverted repeats of an SB transposon can include two direct repeats and preferably include at least one direct repeat.
  • the transposase binds to recognition sites in the inverted repeats and catalyzes the incorporation of the transposon into DNA.
  • Transposons are mobile, in that they can move from one position on DNA to a second position on DNA in the presence of a transposase.
  • DNA-transposons, including members of the Tc1/mariner superfamily, are ancient residents of vertebrate genomes (Radice et al., 1994 Mol. Gen. Genet., 244, 606-612; Smit and Riggs, 1996 Proc. Natl. Acad. Sci. USA 93,
  • transposase pseudogenes from a single organism may simply reflect the mutations that had occurred during vertical inactivation that have subsequently been fixed in the genome as a result of amplification of the mutated element. For instance, most Tdr1 elements isolated from zebrafish contain a conserved, 350-bp deletion in the transposase gene (Izsvak et al., 1995, supra). Therefore, their consensus is expected to encode an inactive element. In the present invention, because independent fixation of the same mutation in different species is unlikely, a consensus from inactive elements of the same subfamily of transposons from several organisms is derived to provide a sequence for an active transposon.
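  • The reasoning above, that independently acquired mutations are unlikely to coincide across species, can be illustrated with a simple majority-rule consensus over pre-aligned sequences. This is only a sketch: the function name and the three short peptide strings are made up for illustration, the sequences are assumed to be already aligned to equal length, and the actual reconstruction described in Example 1 relied on phylogenetic data and functional tests rather than on this simple vote.

```python
from collections import Counter


def majority_consensus(aligned: list[str]) -> str:
    """Return the most common character in each alignment column.
    Ties go to the character seen first in that column."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        residue, _count = Counter(column).most_common(1)[0]
        consensus.append(residue)
    return "".join(consensus)


# Toy alignment (hypothetical): each copy carries a different private
# mutation, including a premature stop codon shown as '*'.
copies = [
    "MGRSKEISQD",  # K -> R at position 3
    "MGKSKE*SQD",  # stop codon at position 7
    "MGKSKEISHD",  # Q -> H at position 9
]
print(majority_consensus(copies))
```

Because each toy copy is damaged at a different position, the column-wise vote recovers a sequence free of all three changes.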
  • Example 1 describes the methods that were used to reconstruct a transposase gene of the salmonid subfamily offish elements using the accumulated phylogenetic data. This analysis is provided in the EMBL database as DS30090 from FTP.EBI.AC.AK in directory/pub/databases/embl/align and the product of this analysis was a consensus sequence for an inactive SB protein. All the elements that were examined were inactive due to deletions and other mutations.
  • a salmonid transposase gene of the SB transposase family was created using PCR-mutagenesis through the creation of 10 constructs as provided in Fig. 1 and described in Example 1.
  • the SB protein typically recognizes nucleotide sequences located within inverted repeats on a nucleic acid fragment and each inverted repeat includes at least one direct repeat.
  • the gene transfer system of this aspect of the invention therefore, comprises two components: a transposase and a cloned, nonautonomous (i.e., non-self inserting) salmonid-type element or transposon (referred to herein as a nucleic acid fragment having at least two inverted repeats) that carries the inverted repeats of the transposon substrate DNA. When put together these two components provide active transposon activity.
  • the transposase binds to the direct repeats in the inverted repeats and promotes integration of the intervening nucleic acid sequence into DNA of a cell, including chromosomes and extrachromosomal DNA of fish as well as mammalian cells.
  • the transposase that was reconstructed using the methods of Example 1 represents one member of a family of proteins that can bind to the inverted repeat region of a transposon to effect integration of the intervening nucleic acid sequence into DNA, preferably DNA in a cell.
  • One example of the family of proteins of this invention is provided as SEQ ID NO:1 (see Fig. 2B). This family of proteins is referred to herein as SB proteins.
  • the proteins of this invention are provided as a schematic in Fig. 1A.
  • the proteins include, from the amino-terminus moving to the carboxy-terminus, a paired-like domain with leucine zipper, one or more nuclear localizing signal (NLS) domains, and a catalytic domain including a DD(34)E box (i.e., a catalytic domain containing two invariable aspartic acid residues, D(153) and D(244), and a glutamic acid residue, E(279), the latter two separated by 34 amino acids) and a glycine-rich box as detailed in an example in Fig. 2.
  • the SB family of proteins includes the protein having the amino acid sequence of SEQ ID NO: 1.
  • a member of the SB family of proteins also includes proteins with an amino acid sequence that shares at least an 80% amino acid identity to SEQ ID NO:1.
  • Amino acid identity is defined in the context of a homology comparison between the member of the SB family of proteins and SEQ ID NO: 1. The two amino acid sequences are aligned in a way that maximizes the number of amino acids that they have in common along the lengths of their sequences; gaps in either or both sequences are permitted in making the alignment in order to maximize the number of shared amino acids, although the amino acids in each sequence must nonetheless remain in their proper order.
  • the percentage amino acid identity is the higher of the following two numbers: (a) the number of amino acids that the two polypeptides have in common within the alignment, divided by the number of amino acids in the member of the SB family of proteins, multiplied by 100; or (b) the number of amino acids that the two polypeptides have in common within the alignment, divided by the number of amino acids in the reference SB protein, i.e., SEQ ID NO: 1, multiplied by 100.
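The "higher of the following two numbers" rule above can be sketched in a few lines. This is a minimal illustration, assuming the two sequences have already been aligned and that gaps are written as "-"; the function name and the toy sequences are hypothetical and are not taken from SEQ ID NO: 1.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity for two pre-aligned sequences.

    Applies the 'higher of two numbers' rule: shared positions divided by
    the ungapped length of each sequence in turn, multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    # Count alignment columns where both sequences carry the same residue.
    shared = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    len_a = len(aligned_a.replace(gap, ""))
    len_b = len(aligned_b.replace(gap, ""))
    if len_a == 0 or len_b == 0:
        raise ValueError("each sequence must contain at least one residue")
    return max(100.0 * shared / len_a, 100.0 * shared / len_b)


# Hypothetical toy alignment (assumption: not a real SB sequence).
candidate = "MGKS-EIS"
reference = "MGKSKEIS"
print(percent_identity(candidate, reference))
```

The sketch does not compute the alignment itself; it only scores one. The same function applies unchanged to aligned nucleotide sequences.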
  • Proteins of the SB family are transposases, that is, they are able to catalyze the integration of nucleic acid into DNA of a cell.
  • the proteins of this invention are able to bind to the inverted repeat sequences of SEQ ID NOs:4-5 and direct repeat sequences (SEQ ID NOs:6-9) from a transposon as well as a consensus direct repeat sequence (SEQ ID NO: 10).
  • the SB proteins preferably have a molecular weight range of about 35 kD to about 40 kD on about a 10% SDS-polyacrylamide gel.
  • chromosomal fragments were sequenced and identified by their homology to the zebrafish transposon-like sequence Tdr1, from eleven species of fish (Ivics et al., 1996, supra). Next these and other homologous sequences were compiled and aligned. The sequences were identified in either GenBank or the EMBL database.
  • amino acid residues described herein employ either the single letter amino acid designator or the three-letter abbreviation. Abbreviations used herein are in keeping with the standard polypeptide nomenclature. All amino acid residue sequences are represented herein by formulae with left-to-right orientation in the conventional direction of amino-terminus to carboxy-terminus.
  • amino acid sequences encoding the transposases of this invention have been described, there are a variety of conservative changes that can be made to the amino acid sequence of the SB protein without altering SB activity. These changes are termed conservative mutations, that is, an amino acid belonging to a grouping of amino acids having a particular size or characteristic can be substituted for another amino acid, particularly in regions of the protein that are not associated with catalytic activity or DNA binding activity, for example.
  • Other amino acid sequences of the SB protein include amino acid sequences containing conservative changes that do not significantly alter the activity or binding characteristics of the resulting protein. Substitutes for an amino acid sequence may be selected from other members of the class to which the amino acid belongs.
  • nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan, and tyrosine.
  • the polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine and glutamine. The positively charged (basic) amino acids include arginine, lysine and histidine.
  • the negatively charged (acidic) amino acids include aspartic acid and glutamic acid. Such alterations are not expected to substantially affect apparent molecular weight as determined by polyacrylamide gel electrophoresis or isoelectric point. Particularly preferred conservative substitutions include, but are not limited to,
  • Lys for Arg and vice versa to maintain a positive charge
  • Glu for Asp and vice versa to maintain a negative charge
  • Ser for Thr so that a free -OH is maintained
  • Gln for Asn to maintain a free NH2.
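The amino acid groupings above can be expressed as a small lookup, which makes it easy to ask whether a proposed substitution stays within one class. This is only a sketch: the class names, the function names, and the choice to treat "shares any listed class" as conservative are assumptions for illustration. It says nothing about whether a given substitution preserves SB activity.

```python
# Amino acid classes as listed in the text, in one-letter codes.
# Tyrosine (Y) appears in two classes because the text lists it twice.
CLASSES = {
    "nonpolar": set("ALIVPFWY"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}


def shared_classes(original: str, substitute: str) -> list:
    """Return the names of every class that contains both residues."""
    a, b = original.upper(), substitute.upper()
    return [name for name, members in CLASSES.items() if a in members and b in members]


def is_conservative(original: str, substitute: str) -> bool:
    """True when the two residues share at least one listed class."""
    return bool(shared_classes(original, substitute))


# The preferred substitutions named in the text.
for pair in (("K", "R"), ("E", "D"), ("S", "T"), ("Q", "N")):
    print(pair, is_conservative(*pair))
```

Residues that the text does not assign to any class (for example methionine) are reported as non-conservative by this sketch simply because no class contains them.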
  • the SB protein has catalytic activity to mediate the transposition of a nucleic acid fragment containing recognition sites that are recognized by the SB protein.
  • the source of the SB protein can be the protein introduced into a cell, or a nucleic acid introduced into the cell.
  • the SB protein can be introduced into the cell as ribonucleic acid, including mRNA; as DNA present in the cell as extrachromosomal DNA including, but not limited to, episomal DNA, as plasmid DNA, or as viral nucleic acid.
  • an mRNA typically includes a guanine added to the 5' end of the mRNA to form a 5' cap.
  • the 5' cap region can be methylated at several locations as described by Lewin, B., Genes VI, Oxford University Press, pp. 171-172 (1997).
  • An mRNA also typically includes a sequence of polyadenylic acid (i.e., a poly(A) tail) at the 3' end of the mRNA.
  • DNA encoding the SB protein can be stably integrated into the genome of the cell for constitutive or inducible expression.
  • the SB encoding sequence is preferably operably linked to a promoter.
  • There are a variety of promoters that could be used including, but not limited to, constitutive promoters, tissue-specific promoters, inducible promoters, and the like. Promoters are regulatory signals that bind RNA polymerase in a cell to initiate transcription of a downstream (3' direction) coding sequence.
  • a DNA sequence is operably linked to an expression control sequence, such as a promoter when the expression control sequence controls and regulates the transcription and translation of that DNA sequence.
  • the term "operably linked" includes having an appropriate start signal (e.g., ATG) in front of the DNA sequence to be expressed and maintaining the correct reading frame to permit expression of the DNA sequence under the control of the expression control sequence to yield production of the desired protein product.
  • nucleic acid sequence encoding the SB protein is provided as SEQ ID NO:3.
  • DNA or RNA sequences encoding an SB protein having the same amino acid sequence as an SB protein such as SEQ ID NO:3, but which take advantage of the degeneracy of the three letter codons used to specify a particular amino acid.
  • RNA codons can be used interchangeably to code for each specific amino acid: Phenylalanine (Phe or F) UUU or UUC; Leucine (Leu or L) UUA, UUG, CUU, CUC, CUA or CUG; Isoleucine (Ile or I) AUU, AUC or AUA; Methionine (Met or M) AUG; Valine (Val or V) GUU, GUC, GUA, GUG; Serine (Ser or S) UCU, UCC, UCA, UCG, AGU, AGC; Proline (Pro or P) CCU, CCC, CCA, CCG; Threonine (Thr or T) ACU, ACC, ACA, ACG; Alanine (Ala or A) GCU, GCG, GCA, GCC; Tyrosine (Tyr or Y) UAU or UAC; Histidine (His or H) CAU
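Codon degeneracy means that many different RNA sequences can encode the same peptide. The sketch below counts and enumerates them using only the complete codon entries listed above (the histidine entry is cut off in this text, so it is left out). The function names and the toy peptides are hypothetical.

```python
from itertools import product
from math import prod

# Codons per amino acid, restricted to the complete entries in the text.
CODONS = {
    "F": ["UUU", "UUC"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "I": ["AUU", "AUC", "AUA"],
    "M": ["AUG"],
    "V": ["GUU", "GUC", "GUA", "GUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "T": ["ACU", "ACC", "ACA", "ACG"],
    "A": ["GCU", "GCG", "GCA", "GCC"],
    "Y": ["UAU", "UAC"],
}


def count_encodings(peptide: str) -> int:
    """Number of distinct RNA sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def encodings(peptide: str) -> list:
    """Every RNA sequence that encodes the peptide (grows quickly with length)."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


print(count_encodings("MFL"))
print(encodings("MF"))
```

A peptide containing a residue outside this partial table raises a KeyError, which is intentional here so that missing entries are not silently guessed.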
  • antibodies directed to an SB protein of this invention are also contemplated in this invention.
  • An "antibody" for purposes of this invention is any immunoglobulin, including antibodies and fragments thereof, that specifically binds to an SB protein.
  • the antibodies can be polyclonal, monoclonal and chimeric antibodies.
  • Various methods are known in the art that can be used for the production of polyclonal or monoclonal antibodies to SB protein. See, for example, Antibodies: A Laboratory Manual, Harlow and Lane, eds., Cold Spring Harbor Laboratory Press: Cold Spring Harbor, New York (1988).
  • Nucleic acid encoding the SB protein can be introduced into a cell as a nucleic acid vector such as a plasmid, or as a gene expression vector, including a viral vector.
  • the nucleic acid can be circular or linear. Methods for manipulating DNA and protein are known in the art and are explained in detail in the literature such as Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, or Ausubel, R.M., ed. (1994) Current Protocols in Molecular Biology.
  • a vector, as used herein, refers to a plasmid, a viral vector or a cosmid that can incorporate nucleic acid encoding the SB protein or the nucleic acid fragment of this invention.
  • the term "coding sequence" or "open reading frame" refers to a region of nucleic acid that can be transcribed and/or translated into a polypeptide in vivo when placed under the control of the appropriate regulatory sequences.
  • a nucleic acid fragment sometimes referred to as a transposon or transposon element, that includes a nucleic acid sequence positioned between at least two inverted repeats.
  • Each inverted repeat preferably includes at least two direct repeats (hence, the name IR/DR).
  • a direct repeat is typically between about 25 and about 35 base pairs in length, preferably about 29-31 base pairs in length.
  • an inverted repeat can contain only one direct "repeat," in which event it is not actually a "repeat" but is nonetheless a nucleotide sequence having at least about 80% identity to a consensus direct repeat sequence as described more fully below.
  • the transposon element is a linear nucleic acid fragment
  • an inverted repeat on the 5' or "left" side of a nucleic acid fragment of this embodiment typically comprises a direct repeat (i.e., a left outer repeat), an intervening region, and a second direct repeat (i.e., a left inner repeat).
  • An inverted repeat on the 3' or "right" side of a nucleic acid fragment of this embodiment comprises a direct repeat (i.e., a right inner repeat), an intervening region, and a second direct repeat (i.e., a right outer repeat).
  • the direct repeats in the 5' inverted repeat of the nucleic acid fragment are in a reverse orientation compared to the direct repeats in the 3' inverted repeat of the nucleic acid fragment.
  • the intervening region within an inverted repeat is generally at least about 150 base pairs in length, preferably at least about 160 base pairs in length.
  • the intervening region is preferably no greater than about 200 base pairs in length, more preferably no greater than about 180 base pairs in length.
  • the nucleotide sequence of the intervening region of one inverted repeat may or may not be similar to the nucleotide sequence of an intervening region in another inverted repeat.
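The layout of one inverted repeat (outer direct repeat, intervening region, inner direct repeat) and the length ranges given above can be captured in a small record type. This is a sketch under stated assumptions: the class and method names are hypothetical, the "about" ranges in the text are treated as hard bounds purely for illustration, and the placeholder sequences are not real repeats.

```python
from dataclasses import dataclass


@dataclass
class InvertedRepeat:
    """One IR/DR unit: outer direct repeat, intervening region, inner direct repeat."""

    outer_dr: str
    intervening_length: int  # base pairs between the two direct repeats
    inner_dr: str

    def notes(self) -> list:
        """Return a note for each value outside the ranges stated in the text."""
        found = []
        for label, dr in (("outer", self.outer_dr), ("inner", self.inner_dr)):
            if not 25 <= len(dr) <= 35:
                found.append(f"{label} direct repeat is {len(dr)} bp (typical: about 25-35)")
        if self.intervening_length < 150:
            found.append("intervening region is shorter than about 150 bp")
        elif self.intervening_length > 200:
            found.append("intervening region is longer than about 200 bp")
        return found


# Placeholder sequences of plausible length (assumption: not real repeats).
left_ir = InvertedRepeat(outer_dr="N" * 29, intervening_length=165, inner_dr="N" * 30)
print(left_ir.notes())
```

An empty list means every value falls inside the stated ranges; it does not imply that the repeat would bind SB protein.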
  • transposons have perfect inverted repeats, whereas the inverted repeats that bind SB protein generally have at least about 80% identity to a consensus direct repeat, preferably about 90% identity to a consensus direct repeat.
  • a preferred consensus direct repeat is 5'-CAKTGRGTCRGAAGTTTACATACACTTAAG-3' (SEQ ID NO: 10) where K is G or T, and R is G or A.
  • the presumed core binding site of SB protein is nucleotides 4 through 22 of SEQ ID NO: 10.
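Because the consensus direct repeat uses the degenerate codes K and R, comparing a candidate repeat against it means testing each position against a set of allowed bases. The sketch below does this position by position for two sequences of equal length, without gaps, and extracts nucleotides 4 through 22 as the presumed core site. The helper names are hypothetical; the candidate is the left inner repeat quoted in this document as SEQ ID NO: 7.

```python
# Allowed bases per symbol; K and R are defined in the text (K = G or T, R = G or A).
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "GT", "R": "AG"}

CONSENSUS_DR = "CAKTGRGTCRGAAGTTTACATACACTTAAG"  # SEQ ID NO: 10
LEFT_INNER_DR = "CAGTGGGTCAGAAGTTTACATACACTAAGG"  # SEQ ID NO: 7


def position_matches(consensus: str, candidate: str) -> list:
    """Per-position True/False for an ungapped, equal-length comparison."""
    if len(consensus) != len(candidate):
        raise ValueError("this sketch only handles equal-length sequences")
    return [base in ALLOWED[code] for code, base in zip(consensus, candidate)]


def core_site(sequence: str) -> str:
    """Nucleotides 4 through 22 (1-based, inclusive)."""
    return sequence[3:22]


matches = position_matches(CONSENSUS_DR, LEFT_INNER_DR)
print(sum(matches), "of", len(matches), "positions match")
print(all(position_matches(core_site(CONSENSUS_DR), core_site(LEFT_INNER_DR))))
```

A gapped alignment, as the identity definition in this document allows, could score higher than this ungapped comparison; the sketch gives only the simple lower bound.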
  • Nucleotide identity is defined in the context of a homology comparison between a direct repeat and SEQ ID NO: 10. The two nucleotide sequences are aligned in a way that maximizes the number of nucleotides that they have in common along the lengths of their sequences; gaps in either or both sequences are permitted in making the alignment in order to maximize the number of shared nucleotides, although the nucleotides in each sequence must nonetheless remain in their proper order.
  • the percentage nucleotide identity is the higher of the following two numbers: (a) the number of nucleotides that the two sequences have in common within the alignment, divided by the number of nucleotides in the direct repeat, multiplied by 100; or (b) the number of nucleotides that the two sequences have in common within the alignment, divided by the number of nucleotides in the reference direct repeat, i.e., SEQ ID NO: 10, multiplied by 100.
  • Examples of direct repeat sequences that bind to SB protein include: a left outer repeat 5'-GTTGAAGTCGGAAGTTTACATACACTTAG-3' (SEQ ID NO:6); a left inner repeat 5'-CAGTGGGTCAGAAGTTTACATACACTAAGG-3' (SEQ ID NO:7); a right inner repeat 5'-TTAACTCACATACAATTGAAGACTGGGTGAC-3' (SEQ ID NO:8); and a right outer repeat 5'-GATTCCACATACATTTGAAGGCTAAGTTGA-3' (SEQ ID NO:9).
  • the right side direct repeats (SEQ ID NOs:8 and 9) are depicted as they would appear on the transposon, i.e., the nucleotides are in a reverse complement order when compared for homology to the nucleotide sequence of the left side repeats (SEQ ID NOs:6 and 7).
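"Reverse complement order" means reading the opposite strand in its own 5' to 3' direction. A minimal sketch of that operation follows, applied to the left outer repeat quoted in this document as SEQ ID NO:6. The function name is hypothetical, and the example makes no claim about how the listed right-side sequences compare with the result.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse, giving the opposite strand 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


LEFT_OUTER_DR = "GTTGAAGTCGGAAGTTTACATACACTTAG"  # SEQ ID NO:6
print(reverse_complement(LEFT_OUTER_DR))
```

Applying the function twice returns the original sequence, which is a quick sanity check when orienting repeats before an identity comparison.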
  • the direct repeat sequence includes at least the following sequence: ACATACAC (SEQ ID NO:11).
  • One preferred inverted repeat sequence of this invention is SEQ ID NO:4
  • the inverted repeat contains the poly(A) signal AATAAA at nucleotides 104-109.
  • This poly(A) signal can be utilized by a coding sequence present in the nucleic acid fragment to result in addition of a poly(A) tail to an mRNA.
  • the addition of a poly(A) tail to an mRNA typically results in increased stability of that mRNA relative to the same mRNA without the poly(A) tail.
  • the inverted repeat (SEQ ID NO:5) is present on the 3' or "right side" of a nucleic acid fragment that comprises two direct repeats in each inverted repeat sequence.
  • the direct repeats are preferably the portion of the inverted repeat that bind to the SB protein to permit insertion and integration of the nucleic acid fragment into the cell.
  • the site of DNA integration for the SB proteins occurs at TA base pairs (see Figure 7B).
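Since integration occurs at TA base pairs, candidate target positions in a DNA sequence can be listed by scanning for the TA dinucleotide. This is a sketch only: the function name and the toy sequence are hypothetical, and finding a TA says nothing about how likely integration is at that position.

```python
def ta_sites(dna: str) -> list:
    """Return 0-based start positions of every TA dinucleotide in the sequence."""
    dna = dna.upper()
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == "TA"]


# Hypothetical toy target sequence.
print(ta_sites("GGTACCTATA"))
```

Because TA is its own reverse complement, a single scan of one strand covers both strands of double-stranded target DNA.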
  • the inverted repeats flank a nucleic acid sequence which is inserted into the DNA in a cell.
  • the nucleic acid sequence can include all or part of an open reading frame of a gene (i.e., that part of a gene encoding protein), one or more expression control sequences (i.e., regulatory regions in nucleic acid) alone or together with all or part of an open reading frame.
  • Preferred expression control sequences include, but are not limited to promoters, enhancers, border control elements, locus-control regions or silencers.
  • the nucleic acid sequence comprises a promoter operably linked to at least a portion of an open reading frame.
  • the nucleic acid fragment of this invention, comprising a nucleic acid sequence positioned between at least two inverted repeats wherein the inverted repeats can bind to an SB protein and wherein the nucleic acid fragment is capable of integrating into DNA in a cell, in combination with an SB protein (or nucleic acid encoding the SB protein to deliver SB protein to a cell) results in the integration of the nucleic acid sequence into the cell.
  • it is possible for the nucleic acid fragment of this invention to be incorporated into DNA in a cell through non-homologous recombination through a variety of as yet undefined, but reproducible mechanisms. In either event the nucleic acid fragment can be used for gene transfer.
  • the SB family of proteins mediates integration in a variety of cell types and a variety of species.
  • the SB protein facilitates integration of the nucleic acid fragment of this invention with inverted repeats into both pluripotent (i.e., a cell whose descendants can differentiate into several restricted cell types, such as hematopoietic stem cells or other stem cells) and totipotent cells (i.e., a cell whose descendants can become any cell type in an organism, e.g., embryonic stem cells).
  • the gene transfer system of this invention can be used in a variety of cells including animal cells, bacteria, fungi (e.g., yeast) or plants. Animal cells can be vertebrate or invertebrate
  • Cells such as oocytes, eggs, and one or more cells of an embryo are also considered in this invention.
  • Mature cells from a variety of organs or tissues can receive the nucleic acid fragment of this invention separately, alone, or together with the SB protein or nucleic acid encoding the SB protein.
  • Cells receiving the nucleic acid fragment or the SB protein and capable of receiving the nucleic acid fragment into the DNA of that cell include, but are not limited to, lymphocytes, hepatocytes, neural cells, muscle cells, a variety of blood cells, and a variety of cells of an organism.
  • Example 4 provides methods for determining whether a particular cell is amenable to gene transfer using this invention.
  • the cells can be obtained from vertebrates or invertebrates.
  • Preferred invertebrates include crustaceans or mollusks including, but not limited to shrimp, scallops, lobster, clams, or oysters.
  • Vertebrate cells also incorporate the nucleic acid fragment of this invention in the presence of the SB protein.
  • Cells from fish, birds and other animals can be used, as can cells from mammals including, but not limited to, rodents, such as rats or mice, ungulates, such as cows or goats, sheep, swine or cells from a human.
  • the DNA of a cell that acts as a recipient of the nucleic acid fragment of this invention includes any DNA in contact with the nucleic acid fragment of this invention in the presence of an SB protein.
  • the DNA can be part of the cell genome or it can be extrachromosomal, such as an episome, a plasmid, a circular or linear DNA fragment.
  • Targets for integration are double-stranded DNA.
  • nucleic acid fragment of this invention including a nucleic acid sequence positioned between at least two inverted repeats wherein the inverted repeats can bind to an SB protein and wherein the nucleic acid fragment is capable of integrating into DNA of a cell and a transposase or nucleic acid encoding a transposase, wherein the transposase is an SB protein, including SB proteins that include an amino acid sequence that is at least about
  • the SB protein comprises the amino acid sequence of SEQ ID NO:1 and in another preferred embodiment the DNA encoding the transposase can hybridize to the DNA of SEQ ID NO:3 under the following hybridization conditions: in 30%
  • Gene transfer vectors for gene therapy can be broadly classified as viral vectors or non-viral vectors.
  • the use of the nucleic acid fragment of this invention as a transposon in combination with an SB protein represents a tremendous advancement in the field of non-viral DNA-mediated gene transfer.
  • Up to the present time, viral vectors have been found to be more efficient at introducing and expressing genes in cells. Even so, there are several reasons why non-viral gene transfer is superior to virus-mediated gene transfer for the development of new gene therapies. For example, adapting viruses as agents for gene therapy restricts genetic design to the constraints of that virus genome in terms of size, structure and regulation of expression. Non-viral vectors are generated largely from synthetic starting materials and are therefore more easily manufactured than viral vectors. Non-viral reagents are less likely to be immunogenic than viral agents, making repeat administration possible. Non-viral vectors are more stable than viral vectors and therefore better suited for pharmaceutical formulation and application than are viral vectors.
  • the SB protein can be introduced into the cell as a protein or as nucleic acid encoding the protein.
  • the nucleic acid encoding the protein is RNA and in another, the nucleic acid is DNA.
  • nucleic acid encoding the SB protein can be incorporated into a cell through a viral vector, anionic or cationic lipid, or other standard transfection mechanisms including electroporation, particle bombardment or microinjection used for eukaryotic cells.
  • the nucleic acid fragment of this invention can be introduced into the same cell.
  • the nucleic acid fragment can be introduced into the cell as a linear fragment or as a circularized fragment, preferably as a plasmid or as recombinant viral DNA.
  • the nucleic acid sequence comprises at least a portion of an open reading frame to produce an amino-acid containing product.
  • the nucleic acid sequence encodes at least one protein and includes at least one promoter selected to direct expression of the open reading frame or coding region of the nucleic acid sequence.
  • the protein encoded by the nucleic acid sequence can be any of a variety of recombinant proteins new or known in the art.
  • the protein encoded by the nucleic acid sequence is a marker protein such as GFP, chloramphenicol acetyltransferase (CAT), β-galactosidase (lacZ), and luciferase (LUC).
  • the protein encoded by the nucleic acid is a growth hormone, for example to promote growth in a transgenic animal, or insulin-like growth factors (IGFs).
  • the protein encoded by the nucleic acid fragment is a product for isolation from a cell.
  • Transgenic animals as bioreactors are known. Protein can be produced in quantity in milk, urine, blood or eggs. Promoters are known that promote expression in milk, urine, blood or eggs and these include, but are not limited to, casein promoter, the mouse urinary protein promoter, β-globin promoter and the ovalbumin promoter respectively. Recombinant growth hormone, recombinant insulin, and a variety of other recombinant proteins have been produced using other methods for producing protein in a cell. Nucleic acid encoding these or other proteins can be incorporated into the nucleic acid fragment of this invention and introduced into a cell.
  • Transgenic zebrafish were made, as described in Example 6. The system has also been tested through the introduction of the nucleic acid with a marker protein into mouse embryonic stem cells (ES) and it is known that these cells can be used to produce transgenic mice (A. Bradley et al., Nature, 309, 255-256
  • the first is classical breeding, which has worked well for land animals, but it takes decades to make major changes. With controlled breeding, growth rates in coho salmon (Oncorhynchus kisutch) increased 60% over four generations and body weights of two strains of channel catfish (Ictalurus punctatus) were increased 21 to 29% over three generations.
  • the second method is genetic engineering, a selective process by which genes are introduced into the chromosomes of animals or plants to give these organisms a new trait or characteristic, like improved growth or greater resistance to disease. The results of genetic engineering have exceeded those of breeding in some cases.
  • the advantage of genetic engineering in fish is that an organism can be altered directly in a very short period of time if the appropriate gene has been identified.
  • the disadvantage of genetic engineering in fish is that few of the many genes that are involved in growth and development have been identified and the interactions of their protein products are poorly understood. Procedures for genetic manipulation are lacking in many economically important animals.
  • the present invention provides an efficient system for performing insertional mutagenesis (gene tagging) and efficient procedures for producing transgenic animals.
  • transgenic DNA is not efficiently incorporated into chromosomes. Only about one in a million of the foreign DNA molecules integrates into the cellular genome, generally several cleavage cycles into development. Consequently, most transgenic animals are mosaic. As a result, animals raised from embryos into which transgenic DNA has been delivered must be cultured until gametes can be assayed for the presence of integrated foreign DNA. Many transgenic animals fail to express the transgene due to position effects. A simple, reliable procedure that directs early integration of exogenous DNA into the chromosomes of animals at the one-cell stage is needed. The present system helps to fill this need, as described in more detail below.
  • the transposon system of this invention has applications to many areas of biotechnology. Development of transposable elements for vectors in animals permits the following: 1) efficient insertion of genetic material into animal chromosomes using the methods given in this application. 2) identification, isolation, and characterization of genes involved with growth and development through the use of transposons as insertional mutagens (e.g., see Kaiser et al, 1995, "Eukaryotic transposable elements as tools to study gene structure and function.” In Mobile Genetic Elements, IRL Press, pp. 69-100). 3) identification, isolation and characterization of transcriptional regulatory sequences controlling growth and development. 4) use of marker constructs for quantitative trait loci (QTL) analysis.
  • the system of this invention can be used to produce sterile transgenic fish. Broodstock with inactivated genes could be mated to produce sterile offspring for either biological containment or for maximizing growth rates in aquacultured fish.
  • the nucleic acid fragment is modified to incorporate a gene to provide a gene therapy to a cell.
  • the gene is placed under the control of a tissue specific promoter or of a ubiquitous promoter or one or more other expression control sequences for the expression of a gene in a cell in need of that gene.
  • genes are being tested for a variety of gene therapies including, but not limited to, the CFTR gene for cystic fibrosis, adenosine deaminase (ADA) for immune system disorders, factor IX, globins and interleukin-2 (IL-2) genes for blood cell diseases, alpha-1-antitrypsin for lung disease, and tumor necrosis factors (TNFs), phenylalanine hydroxylase for PKU (phenylketonuria), and multiple drug resistance (MDR) proteins for cancer therapies.
  • the gene transfer system of this invention can be used as part of a process for working with or for screening a library of recombinant sequences, for example, to assess the function of the sequences or to screen for protein expression, or to assess the effect of a particular protein or a particular expression control sequence on a particular cell type.
  • a library of recombinant sequences such as the product of a combinatorial library or the product of gene shuffling, both techniques now known in the art and not the focus of this invention, can be incorporated into the nucleic acid fragment of this invention to produce a library of nucleic acid fragments with varying nucleic acid sequences positioned between constant inverted repeat sequences.
  • the library is then introduced into cells together with the SB protein as discussed above.
  • An advantage of this system is that it is not limited to a great extent by the size of the intervening nucleic acid sequence positioned between the inverted repeats.
  • the SB protein has been used to incorporate transposons ranging from 1.3 kilobases (kb) to about 5.0 kb and the mariner transposase has mobilized transposons up to about 13 kb. There is no known limit on the size of the nucleic acid sequence that can be incorporated into DNA of a cell using the SB protein.
  • the two-part SB transposon system can be delivered to cells via viruses, including retroviruses (including lentiviruses), adenoviruses, adeno-associated viruses, herpesviruses, and others.
  • transposon portion containing the transgene of interest flanked by the inverted terminal repeats (IRs) and the gene encoding the transposase.
  • both the transposon and the transposase gene can be contained together on the same recombinant viral genome; a single infection
  • the transposase and the transposon can be delivered separately by a combination of viruses and/or non-viral systems such as lipid-containing reagents.
  • either the transposon and/or the transposase gene can be delivered by a recombinant virus.
  • the expressed transposase gene directs liberation of the transposon from its carrier DNA (viral genome) for integration into chromosomal DNA.
  • the invention also relates to methods for using the gene transfer system of this invention.
  • the invention relates to the introduction of a nucleic acid fragment comprising a nucleic acid sequence positioned between at least two inverted repeats into a cell.
  • efficient incorporation of the nucleic acid fragment into the DNA of a cell occurs when the cell also contains an SB protein.
  • the SB protein can be provided to the cell as SB protein or as nucleic acid encoding the SB protein.
  • Nucleic acid encoding the SB protein can take the form of RNA or DNA.
  • the protein can be introduced into the cell alone or in a vector, such as a plasmid or a viral vector.
  • the nucleic acid encoding the SB protein can be stably or transiently incorporated into the genome of the cell to facilitate temporary or prolonged expression of the SB protein in the cell.
  • promoters or other expression control sequences can be operably linked with the nucleic acid encoding the SB protein to regulate expression of the protein in a quantitative or in a tissue-specific manner.
  • the SB protein is a member of a family of SB proteins preferably having at least an 80% amino acid sequence identity to SEQ ID NO:l and more preferably at least a 90% amino acid sequence identity to SEQ ID NO: 1.
  • the SB protein contains a DNA-binding domain, a catalytic domain (having transposase activity) and an NLS signal.
  • the nucleic acid fragment of this invention is introduced into one or more cells using any of a variety of techniques known in the art such as, but not limited to, microinjection, combining the nucleic acid fragment with lipid vesicles, such as anionic or cationic lipid vesicles, particle bombardment, electroporation, DNA condensing reagents (e.g., calcium phosphate, polylysine or polyethyleneimine) or incorporating the nucleic acid fragment into a viral vector and contacting the viral vector with the cell.
  • the viral vector can include any of a variety of viral vectors known in the art including viral vectors selected from the group consisting of a retroviral vector, an adenovirus vector or an adeno-associated viral vector.
  • the gene transfer system of this invention can readily be used to produce transgenic animals that carry a particular marker or express a particular protein in one or more cells of the animal.
  • Methods for producing transgenic animals are known in the art and the incorporation of the gene transfer system of this invention into these techniques does not require undue experimentation.
  • the examples provided below teach methods for creating transgenic fish by microinjecting the gene transfer system into a cell of an embryo of the fish. Further, the examples also describe a method for introducing the gene transfer system into mouse embryonic stem cells. Methods for producing transgenic mice from embryonic stem cells are well known in the art.
  • use of the nucleic acid fragments of this invention in combination with the SB protein or nucleic acid encoding the SB protein is a powerful tool for germline transformation, for the production of transgenic animals, for introducing nucleic acid into DNA in a cell, for insertional mutagenesis, and for gene-tagging in a variety of species.
  • Two strategies are diagramed in Figure 9. Due to their inherent ability to move from one chromosomal location to another within and between genomes, transposable elements have been exploited as genetic vectors for genetic manipulations in several organisms.
  • Transposon-tagging is a technique in which transposons are mobilized to "hop" into genes, thereby inactivating them by insertional mutagenesis.
  • the inactivated genes are "tagged" by the transposable element which then can be used to recover the mutated allele.
  • the ability of the human and other genome projects to acquire gene sequence data has outpaced the ability of scientists to ascribe biological function to the new genes. Therefore, the present invention provides an efficient method for introducing a tag into the genome of a cell.
  • nucleic acid fragment functions as a tag. Primers designed to sequence the genomic DNA flanking the nucleic acid fragment of this invention can be used to obtain sequence information about the disrupted gene.
  • the invention provides a method for mobilizing a nucleic acid sequence in a cell.
  • the nucleic acid fragment of this invention is incorporated into DNA in a cell, as provided in the discussion above.
  • Additional SB protein or nucleic acid encoding the SB protein is introduced into the cell and the protein is able to mobilize (i.e. move) the nucleic acid fragment from a first position within the DNA of the cell to a second position within the DNA of the cell.
  • the DNA of the cell can be chromosomal DNA or extrachromosomal DNA.
  • the term "genomic DNA" is used herein to include both chromosomal DNA and extrachromosomal DNA.
  • the method permits the movement of the nucleic acid fragment from one location in the genome to another location in the genome, or for example, from a plasmid in a cell to the genome of that cell.
  • transposable elements disclosed herein can further increase the efficiency of insertion of genetic material into animal chromosomes so as to allow the identification, isolation, and characterization of genes involved with growth, development and disease, and the identification, isolation and characterization of transcriptional regulatory sequences controlling growth, development and disease.
  • Examples of the types of modifications that can be made to the transposable elements disclosed herein include the construction of transposable elements taking the form of expression control sequence-trap transposon vectors, gene-trap transposon vectors, and dicistronic gene expression transposon vectors.
  • the nucleic acid sequence that is flanked by the inverted repeats comprises at least one coding sequence.
  • the coding sequence encodes a detectable and/or selectable marker.
  • a coding sequence that encodes a detectable and/or selectable marker will be referred to as a "detectable marker coding sequence," however it is to be understood that this coding sequence can encode any type of detectable or selectable marker, or a protein that activates a detectable or selectable marker supplied in trans or in cis.
  • detectable marker is neomycin.
  • Preferred detectable markers include luciferase, β-galactosidase, fluorescent proteins, chloramphenicol acetyl transferase (CAT) and other exogenous proteins detectable by their fluorescence, enzymatic activity or immunological properties.
  • Non-limiting examples of fluorescent proteins include GFP, Yellow Fluorescent Protein and Blue Fluorescent Protein.
  • a detectable marker coding sequence is operably linked to a poly(A) signal that is present 3' to the detectable marker coding sequence.
  • the cells into which the nucleic acid fragment is introduced preferably also contain a detectable marker coding sequence operably linked to a promoter that can be activated by an activator protein.
  • a protein encoded by a detectable marker coding sequence of an expression control sequence-trap vector or a gene-trap vector is the trans-acting activator protein tTA (tetracycline controlled transactivator) (Clontech, Palo Alto, CA), which interacts with a tetracycline response element to which a detectable marker coding sequence is operably linked.
  • the intervening nucleic acid sequence of the nucleic acid fragment of the invention further comprises at least one expression control sequence that is operably linked to the detectable marker coding sequence.
  • the expression control sequence comprises a promoter, more preferably a weak promoter.
  • the terms "weak promoter" and "minimal promoter" refer to a promoter that by itself does not have the ability to direct high expression of the coding sequence to which it is operably linked.
  • the nucleic acid fragment inserts into a cell's genomic DNA so that the weak promoter is operably linked to at least one expression control sequence already present in the cell's DNA, preferably at least one of which is an enhancer (see, for instance, Fig.
  • the weak promoter can direct the expression of the detectable marker coding sequence in tissues in which the enhancer is active and at levels higher than the weak promoter would direct expression when not operably linked to the enhancer.
  • An enhancer is a cis-acting nucleotide sequence that generally increases the activity of promoters and typically can function in either orientation and either upstream or downstream of a promoter. Examples of suitable weak promoters useful in vertebrate cells are the promoter for the carp β-actin coding sequence (Liu et al.,
  • the invention includes a method for using the nucleic acid fragment of the invention to identify or "trap" expression control sequences present in genomic DNA.
  • the coding sequence of the nucleic acid fragment encodes a detectable marker and is operably linked to at least one expression control sequence present in the nucleic acid sequence of the nucleic acid fragment.
  • the detectable marker is preferably a fluorescent protein or a selectable marker.
  • the intervening nucleic acid sequence comprises a detectable marker coding sequence operably linked to a promoter, preferably a weak promoter.
  • an expression control sequence-trap transposon vector comprising the nucleic acid fragment is introduced into a cell, preferably along with a source of transposase, such that the nucleic acid fragment inserts into the DNA of the cell.
  • the transposase source can be a nucleic acid and/or a protein as described in detail hereinbelow.
  • a vector containing the nucleic acid fragment can contain a second coding sequence encoding a transposase.
  • the cell can contain a coding sequence that encodes an SB transposase.
  • an mRNA encoding an SB transposase or an SB transposase itself can be introduced into the cell.
  • the nucleic acid fragment can insert within a coding sequence present in a cell's DNA that can result in the insertional inactivation of that coding sequence, or the nucleic acid fragment can insert into DNA outside of a coding sequence. Either type of insertion can result in expression of the detectable marker provided the nucleic acid fragment inserts near an appropriate expression control sequence.
  • the nucleic acid fragment integrates into the DNA of the cell or its progeny within a domain that contains an expression control sequence, more preferably an enhancer. It is possible that the nucleic acid fragment of this embodiment will insert in-frame into a coding sequence in a cell's DNA and be expressed by virtue of the endogenous promoter and not the weak promoter.
  • the nucleic acid fragment will be operating as a gene trap.
  • the nucleic acid fragment comprising a detectable marker operably linked to a weak promoter can be used to detect the presence of an expression control sequence that regulates the expression of the promoter.
  • enhancers are detected. As enhancers activate promoters located within the same domain defined by border elements as the enhancer, the expression of the detectable marker generally indicates that the nucleic acid fragment has inserted within the same domain as an enhancer.
  • Expression control sequences can be detected in accordance with the invention in any type of cell, without limitation.
  • Preferred cells are pluripotent or totipotent cells, including an oocyte, a cell of an embryo, an egg and a stem cell.
  • cells can be derived from any type of tissue, differentiated or undifferentiated.
  • Cells from fish, birds and other animals can be used, as can cells from mammals including, but not limited to, rodents, such as rats or mice, ungulates, such as cows or goats, sheep, swine or cells from a human.
  • it is possible for enhancers to be active only at specific times or in specific tissues within an animal.
  • evaluation of expression of the detectable marker encoded by an inserted nucleic acid fragment in an animal can result in identification of enhancers that have distinct spatial and/or temporal expression. For instance, detection of the detectable marker only at specific times during the cell cycle or during development of the animal indicates that the enhancer is active only at specific times (i.e., developmental stage-specific expression).
  • Detection of the detectable marker only in specific tissues of the whole animal indicates that the trapped enhancer is a tissue-specific enhancer.
  • the cells are grown into an animal and the cells assayed for expression of the detectable marker are present in an animal.
  • cells that can be detected include progeny of a cell that contain the nucleic acid fragment comprising the detectable marker coding sequence.
  • the animal can be an embryo, an adult, or at a developmental phase between embryo and adult.
  • the animal is an embryo.
  • Expression of the detectable marker in the animal can be assayed by methods known to the art. For instance, assay of β-galactosidase expression or immunological detection of a foreign protein like CAT can be used.
  • Another example of evaluating expression of a detectable marker in an embryo is the expression of fluorescent proteins in the optically clear zebrafish embryo.
  • the expression control sequence detection method includes observing at least one phenotype of a cell that contains the integrated nucleic acid fragment, and comparing it to a cell that does not contain the nucleic acid fragment to determine whether the phenotype of the first cell is altered.
  • An altered phenotype can be detected by methods known to the art.
  • the cell that contains the integrated nucleic acid fragment can be grown into an animal, and animal phenotypes similarly compared.
  • the method can be used to make a transgenic animal having tissue-specific expression of a preselected coding sequence.
  • a first transgenic animal can be produced that contains an expression control sequence-trap that is expressed in a particular tissue, and the detectable marker coding sequence encodes a trans-acting activator.
  • a second and independent transgenic animal can be produced that contains a preselected coding sequence that is operably linked to a promoter that is activated by the activator encoded by the expression control sequence-trap that is present in the first transgenic animal.
  • Crossing the two transgenic animals can result in transgenic progeny that contain i) the expression control sequence-trap that is expressed in a particular tissue and ii) the preselected coding sequence operably linked to a promoter that is activated by the activator encoded by the expression control sequence-trap.
  • Tissue-specific expression of the activator protein will cause tissue-specific expression of the preselected coding sequence.
  • This aspect of the invention is particularly useful in those animals where tissue-specific promoters have not yet been identified.
  • the method optionally includes cleaving the DNA of the cell with a restriction endonuclease capable of cleaving at a restriction site within the intervening nucleic acid sequence of the nucleic acid fragment to yield at least one restriction fragment containing at least a portion of the integrated nucleic acid fragment, which portion comprises at least a portion of an inverted repeat sequence along with an amount of genomic DNA of the cell, which genomic DNA is adjacent to the inverted repeat sequence.
  • the intervening nucleic acid sequence thus preferably includes a restriction endonuclease recognition site, preferably a 6-base recognition sequence.
  • the cell DNA is isolated and digested with the restriction endonuclease.
  • when a restriction endonuclease is used that employs a 6-base recognition sequence, the cell DNA is cut into about 4000-base pair restriction fragments on average. Since the site of DNA integration mediated by the SB proteins generally occurs at TA base pairs and the TA base pairs are typically duplicated such that an integrated nucleic acid fragment is flanked by TA base pairs, TA base pairs will be immediately adjacent to an integrated nucleic acid fragment.
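The figure of about 4000 base pairs for a 6-base recognition sequence follows from simple counting, sketched below in Python. The sketch assumes a random DNA sequence in which the four bases are equally frequent and independent, which real genomes only approximate; the function name is illustrative.

```python
def expected_fragment_length(site_length: int, alphabet_size: int = 4) -> int:
    """Average spacing between occurrences of one specific recognition site.

    Assumption: bases are independent and equally frequent, so a given site of
    `site_length` bases occurs once every alphabet_size ** site_length positions.
    """
    if site_length < 1:
        raise ValueError("site_length must be at least 1")
    return alphabet_size ** site_length


# A 6-base cutter is expected to cut once every 4**6 base pairs on average.
six_cutter = expected_fragment_length(6)
# Shorter or longer recognition sites give shorter or longer fragments.
four_cutter = expected_fragment_length(4)
eight_cutter = expected_fragment_length(8)
```

Under this assumption a 6-base cutter yields fragments of 4096 base pairs on average, consistent with the approximate value given in the text.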
  • the genomic DNA of the genomic fragment is typically immediately adjacent to the TA base pairs on either side of the integrated nucleic acid fragment.
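The TA target-site duplication described above can be sketched in Python as a string operation. This is a toy model on one DNA strand only; the function names and the example sequences are hypothetical and chosen purely for illustration.

```python
def ta_sites(genome: str) -> list:
    """Return the 0-based start index of every TA dinucleotide in the genome."""
    return [i for i in range(len(genome) - 1) if genome[i:i + 2] == "TA"]


def integrate_at_ta(genome: str, element: str, ta_index: int) -> str:
    """Insert `element` at the TA starting at `ta_index`, duplicating the TA.

    After integration the element is flanked by TA on both sides, as the
    text describes for SB-mediated integration.
    """
    if genome[ta_index:ta_index + 2] != "TA":
        raise ValueError("integration site must be a TA dinucleotide")
    left = genome[:ta_index + 2]   # genomic DNA up to and including the TA
    right = genome[ta_index:]      # duplicated TA followed by the remaining DNA
    return left + element + right


# Toy genome with a single TA; the element is written in lower case so that
# it can be told apart from the genomic DNA in the result.
genome = "GGCCTAGGCC"
element = "acgtacgt"
site = ta_sites(genome)[0]
integrated = integrate_at_ta(genome, element, site)
```

In the resulting string the genomic sequence immediately adjacent to each flanking TA is the DNA that a flanking-sequence analysis would recover.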
  • the genomic fragments can be cloned in a vector using methods well known to the art allowing individual clones containing genomic fragments comprising at least a portion of the integrated nucleic acid fragment and genomic DNA of the cell adjacent to the inserted nucleic acid fragment to be identified.
  • a non-limiting example of identifying the desired genomic fragments includes hybridization with a probe complementary to the sequence of the inverted repeats.
  • linkers can be added to the ends of the digested fragments to provide complementary sequence for PCR primers. Where linkers are added, PCR reactions are used to amplify fragments using primers from the linkers and primers binding to a nucleotide sequence within the inverted repeats.
  • Nucleotide sequences of the genomic DNA on either or both sides of the inserted nucleic acid fragment can be determined by nucleotide sequencing using methods well known to the art.
  • the resulting nucleotide sequences are then used to search computer databases such as GenBank or EMBL for related sequences; if the nucleotide sequences encode a putative protein, the encoded amino acid sequences can also be used to search protein data bases such as SwissProt for related or homologous polypeptide sequences.
  • the restriction endonuclease used to cleave the cell DNA is one that is incapable of cleaving the nucleic acid sequence of the nucleic acid fragment.
  • Non-limiting examples of characterizing the resulting restriction fragments include adding linkers to the ends of the digested fragments to provide complementary sequences for PCR primers described above or for inverse PCR. For instance, to identify fragments that contain nucleotides on either or both sides of the inserted nucleic acid fragment using inverse PCR, genomic DNA is isolated from cells that express a detectable marker such as GFP or show a consequential phenotypic response after mutagenesis with a transposon of the present invention (see, e.g., Fig. 16).
  • the DNA is then cleaved with one or more restriction endonucleases that cut outside of the transposon and the resulting fragments of DNA are circularized using DNA ligase. About one in a million genomic fragments may contain the transposon.
  • the genomic sequence can then be PCR amplified in two steps. The first PCR amplification uses the P2 external primers IR/DR(L)-P2 CCACAGGTACACCTCCAATTGACTC (SEQ ID NO:72) and IR/DR(R)-P2 GTGGTGATCCTAACTGACCTTAAGAC (SEQ ID NO:73).
  • the products of round 1 of amplification are reamplified using internal P1 primers that further augment the number of copies of the interrupted genetic sequence.
  • the internal primers are IR/DR(L)-P1 GTGTCATGCACAAAGTAGATGTCC (SEQ ID NO:74) and IR/DR(R)-P1 CTCGGATTAAATGTCAGGAATTGTG (SEQ ID NO:75).
  • Primers P1 and P2 are complementary to sequences within the DR elements of the SB transposon.
  • the amplified DNA sequences are isolated for sequencing and/or other analysis, and nucleotide sequences of the genomic DNA on either or both sides of the inserted nucleic acid fragment can thus be determined.
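The logic of inverse PCR on a circularized restriction fragment can be sketched in Python as follows. This is a simplified single-strand model under stated assumptions: the fragment is written as left flank, transposon, right flank; the two anchor sequences are hypothetical stand-ins for the primer binding sites inside the transposon ends (they are not the SEQ ID NO:72-75 primers); and the function names are illustrative.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper or lower case A, C, G, T)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def inverse_pcr_product(fragment: str, left_anchor: str, right_anchor: str) -> str:
    """Sequence amplified from a circularized fragment by outward-facing primers.

    `left_anchor` and `right_anchor` are top-strand sequences near the left and
    right ends of the transposon. After ligating the fragment ends into a
    circle, amplification runs from the right anchor, through the right flank,
    across the ligation junction, through the left flank, to the left anchor.
    """
    left = fragment.find(left_anchor)
    right = fragment.rfind(right_anchor)
    if left == -1 or right == -1 or right < left + len(left_anchor):
        raise ValueError("anchors not found in the expected order")
    return fragment[right:] + fragment[: left + len(left_anchor)]


# Toy restriction fragment (all sequences hypothetical).
LEFT_FLANK, RIGHT_FLANK = "AAACCC", "TTTGGG"
LEFT_END, CARGO, RIGHT_END = "TTGACG", "GGGGGGGG", "CAGTTC"
fragment = LEFT_FLANK + LEFT_END + CARGO + RIGHT_END + RIGHT_FLANK

product = inverse_pcr_product(fragment, LEFT_END, RIGHT_END)
# The primer annealing at the left end points outward, so it is the reverse
# complement of the top-strand anchor.
left_primer = reverse_complement(LEFT_END)
```

The product contains both genomic flanks joined at the ligation junction and none of the transposon cargo, which is why sequencing it reveals the DNA on either side of the insertion.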
  • the intervening nucleic acid sequence includes a splice acceptor site and/or an internal ribosomal entry site (IRES), each of these expression control sequences being operably linked to the coding sequence, preferably a detectable marker coding sequence.
  • the intervening nucleic acid sequence comprises both a splice acceptor site and an IRES, and the IRES is positioned between the splice acceptor site and the detectable marker coding sequence so as to ultimately permit ribosome binding to the detectable marker mRNA and thereby initiate translation of the detectable marker nucleotide sequence (see, for instance, Fig. 12(A), 12(B)).
  • the splice acceptor site and/or an IRES are considered operably linked to a coding sequence when the splice acceptor site and/or the IRES is located 5' of the detectable marker coding sequence and is present in an mRNA containing the detectable marker coding sequence prior to processing of the mRNA.
  • the splice acceptor site is located 5' to the IRES
  • the IRES is located 5' to the coding sequence to which a splice acceptor site and an IRES are operably linked.
  • the splice acceptor site acts to provide signals to target the sequences 3' to, i.e., following, the splice acceptor site, including the detectable marker coding sequence, to be present in the mRNA containing the detectable marker coding sequence provided there is an intron upstream of the splice acceptor site
  • a splice acceptor site typically includes a branch site and a 3' splice site.
  • the consensus sequence of a branch site is typically a nucleotide sequence 5'-Py80 N Py80 Py87 Py87 Pu75 A Py, where Py is T or C, Pu is A or G, and the subscripted number is the approximate percent occurrence of the appropriate nucleotide (see, for instance,
  • the branch site is typically located 10 to 60 nucleotides 5' to the splice site, preferably 15 to 50 nucleotides 5' to the splice site.
  • the 3' splice site is typically the nucleotide sequence C 55 AG, where the subscripted number is the percent occurrence of the C, and the intron is cleaved after the G.
  • the splice acceptor site is derived from the 3' end (i.e., the splice acceptor end) of the first intron (intron A) of the β-actin coding sequence of carp (Liu, Z., et al., DNA Sequence - J. DNA Sequencing and Mapping, vol. 1, pp. 125-136 (1990)).
  • nucleotides 1335-1571 of the nucleotide sequence available at GenBank Accession No. M24113, more preferably nucleotides 1485-1571, are particularly suitable for use in the present invention.
  • the maximum distance between splice acceptor site and IRES is unknown.
  • the overall size of the nucleic acid fragment can have an effect on the efficiency of transposition of the nucleic acid fragment.
  • the SB protein has been used to incorporate transposons ranging from
  • the IRES is typically positioned within about 0 to 7 bases of the translation initiation codon, e.g., ATG, of the coding sequence to which the IRES is operably linked. Typically, an IRES contains at least two translation initiation codons. Preferably, the IRES includes at least one translation initiation codon, and the IRES is ligated to the translation coding region such that an IRES translation initiation codon replaces the translation initiation codon of the coding sequence.
  • An IRES allows ribosomal access to mRNA without a requirement for cap recognition and subsequent scanning to the initiator AUG (Pelletier, J.A., et al, Nature, 334, 320-325 (1988)).
  • An IRES that can be used in the invention typically includes a viral IRES, preferably a picornavirus IRES, poliovirus IRES, mengovirus IRES, or EMCV IRES, more preferably a poliovirus IRES, mengovirus IRES, or EMCV IRES, and most preferably an EMCV IRES.
  • An example of an EMCV IRES that can be used in the invention is nucleotides 234-848 of the nucleotide sequence available at GenBank Accession No. M81861. In some embodiments nucleotides 827-831 (GAT A) are replaced with TGCT. This 615 base pair nucleotide sequence contains ATG codons at nucleotides 834-836 and 846-848.
  • the ATG codon at nucleotides 834-836 is used as the translation initiation codon by the ribosome and a coding sequence (for instance a GFP coding sequence) can be fused to this ATG codon.
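The coordinates given for the EMCV IRES can be checked with a short Python sketch. It uses only the numbers stated in the text and assumes 1-based, inclusive coordinates as is conventional for GenBank records; the function names are illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def to_fragment_coordinate(position: int, fragment_start: int) -> int:
    """Convert a record coordinate to a 1-based position within the fragment."""
    return position - fragment_start + 1


IRES_START, IRES_END = 234, 848       # nucleotides of the record used as the IRES
ATG_CODONS = [(834, 836), (846, 848)]  # ATG codons named in the text

ires_length = span_length(IRES_START, IRES_END)
atg_in_fragment = [
    (to_fragment_coordinate(s, IRES_START), to_fragment_coordinate(e, IRES_START))
    for s, e in ATG_CODONS
]
```

The range 234-848 is 615 nucleotides long, matching the stated fragment size, and the second ATG ends exactly at the last nucleotide of the fragment. The ATG at 834-836, to which a coding sequence can be fused, lies at positions 601-603 of the fragment.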
  • although the coding sequence included in the intervening nucleic acid sequence of the nucleic acid fragment of the invention typically contains a polyadenylation signal, it need not.
  • the detectable marker coding sequence is preferably operably linked to a promoter located 5' to the coding sequence and a splice donor site located 3' of the coding sequence.
  • the mRNA containing the detectable marker may be stabilized when the splice donor splices with a downstream exon that encodes a poly(A) tail. This is known as a poly(A) trap.
  • the invention includes a method for using the nucleic acid fragment of the invention to identify or "trap" coding sequences present in genomic DNA, i.e. a "gene trap" transposon method that allows for gene discovery and functional analysis. Insertion of a transposon into genomic DNA can interrupt or mutate a genomic coding sequence.
  • when a genomic coding sequence present in a cell is interrupted, and the detectable marker coding sequence is inserted in just the right way (in the correct direction, in-frame, and in an exon of the interrupted coding sequence), typically the detectable marker coding sequence is expressed spatially and temporally in the same way as the interrupted genomic coding sequence is expressed when not interrupted.
  • This aspect of the invention can be used, for example, in gene discovery by providing for a method to insert a nucleic acid fragment into genomic DNA so that a genomic coding sequence no longer expresses a functional product, i.e., the insertion results in a loss-of-function mutation.
  • Successful utilization of the transposon-derived vectors in the gene-trap and enhancer-trap methods of the invention without further modification was surprising in view of the possibility that the IR/DR sequences might contain cryptic promoter or splicing signals that would have interfered with the use of these vectors.
  • a genomic coding sequence in a cell's DNA can be identified according to the present invention by introducing a nucleic acid fragment comprising a coding sequence, preferably a detectable marker coding sequence, into a cell, preferably along with a source of transposase as described above, then detecting the detectable marker in the cell or its progeny.
  • the intervening nucleic acid sequence of the nucleic acid fragment includes a splice acceptor site and/or an IRES, each of which is operably linked to the coding sequence.
  • the IRES is preferably located between the splice acceptor site and the detectable marker coding sequence.
  • the detectable marker coding sequence is preferably not operably linked to a promoter.
  • the presence of a splice acceptor site and an internal ribosome entry site operably linked to the detectable marker coding sequence increases the probability that the detectable marker coding sequence will be expressed when inserted into a genomic coding sequence: it is possible to get expression of the detectable marker coding sequence even if the transposon integrates in an intron or if it integrates out of frame with respect to the interrupted genomic coding sequence. Detection of the detectable marker in the cell or in progeny of the cell containing the nucleic acid fragment is indicative that the nucleic acid fragment has integrated within a genomic coding sequence of the cell.
  • Genomic coding sequences can be detected in any type of cell as generally described above, including but not limited to an oocyte, a cell of an embryo, an egg cell or a stem cell, and in any type of tissue, differentiated or undifferentiated.
  • the detectable marker is expressed spatially and temporally in the same way as the genomic coding sequence is expressed when not interrupted.
  • the genomic coding sequence detection method includes observing at least one phenotype of a cell that contains the integrated nucleic acid fragment, and comparing it to a cell that does not contain the nucleic acid fragment to determine whether the phenotype of the first cell is altered.
  • the cell that contains the integrated nucleic acid fragment can be grown into an animal, and animal phenotypes similarly compared.
  • the method optionally comprises cleaving the DNA of the cell with a restriction endonuclease to yield at least one restriction fragment containing at least a portion of the integrated nucleic acid fragment, which portion comprises an inverted repeat sequence along with an amount of genomic DNA of the cell, which genomic DNA is adjacent to the inverted repeat sequence.
  • the intervening nucleic acid sequence thus preferably includes a restriction endonuclease recognition site, as described above in connection with the expression control region detection method. Restriction fragments containing portions of the inverted repeats and genomic DNA are sequenced, and the DNA flanking the inverted repeats and/or the amino acid sequences encoded thereby are used to search computer databases such as GenBank or SwissProt.
  • the intervening nucleic acid sequence comprises a coding sequence, preferably a detectable marker coding sequence, and a second coding sequence located 5', i.e., upstream, of the detectable marker coding sequence.
  • the detectable marker coding sequence typically is not operably linked to a promoter.
  • the intervening nucleic acid sequence further comprises an IRES located between the detectable marker coding sequence and the second coding sequence, wherein the IRES is operably linked to the detectable marker coding sequence (see, for instance, Fig. 15).
  • the second coding sequence is operably linked to at least one expression control sequence.
  • the expression control sequence to which the second coding sequence is optionally operably linked can include a splice acceptor site, an IRES or a promoter, preferably a promoter.
  • the analyte coding sequence can include any coding sequence of interest including, for instance, a randomly inserted coding sequence from a library of DNA fragments or a preselected coding sequence.
  • the nucleic acid sequence comprising the analyte coding sequence preferably includes at least one expression control sequence, including but not limited to expression control sequences that are associated with the analyte coding sequence in its wild type or native state, i.e., those expression control sequences operably linked to the coding sequence as it naturally exists in a cell.
  • at least one of the expression control sequences is a promoter.
  • Useful promoters include constitutive and inducible promoters.
  • the promoter can be the native promoter, i.e., the promoter that is normally operably linked to the analyte coding region.
  • the detectable marker coding sequence can be operably linked to a splice acceptor site and/or an IRES.
  • the detectable marker coding sequence is operably linked to an IRES (see, e.g., Fig. 15).
  • the analyte coding sequence can encode a protein that is biologically active, thereby allowing, for example, the evaluation and/or verification of the function of coding sequences and/or their protein products, as well as mutant rescue and transgenic analysis.
  • insertion of a dicistronic vector that has an analyte coding sequence that encodes a biologically active protein can cause a gain-of-function mutation.
  • the analyte coding sequence can encode a protein that is incapable of performing the function of the wild-type, i.e., native, protein.
  • This type of protein is typically inactive by virtue of an amino acid sequence altered relative to the native protein and can be used for the functional analysis of proteins using, for example, dominant-negative mutant analysis.
  • the nucleic acid sequence of this aspect of the invention can encode two mRNAs, one encoded by the detectable marker coding sequence and a second mRNA encoded by the analyte coding sequence.
  • the nucleic acid sequence of this aspect of the invention encodes one mRNA that includes two coding sequences, i.e., a dicistronic mRNA.
  • a dicistronic vector of this aspect of the invention generally provides for the expression of a detectable marker coding sequence primarily when the analyte coding sequence is expressed.
  • the invention includes a method for identifying or analyzing the function of an analyte coding sequence that involves introducing into a host cell a dicistronic nucleic acid fragment of the invention that includes the analyte coding sequence and a detectable marker coding sequence, preferably together with a source of transposase, followed by detection of the detectable marker.
  • the development of transposable elements for vectors in animals thus makes possible the identification, isolation, and characterization of coding sequences involved with growth, development and disease, and also the transcriptional regulatory sequences that control growth, development and disease.
  • the nucleic acid fragment used in this method of the invention, when read from left to right, contains at least the following elements in the following order: inverted repeats, the analyte coding sequence, the detectable marker coding sequence, and inverted repeats.
  • the analyte coding sequence is located 5' of the detectable marker coding sequence, and the analyte coding sequence is transcribed first, followed by the detectable marker coding sequence.
  • transcription of the two coding sequences in the nucleic acid fragment can result in a dicistronic mRNA.
  • the analyte coding sequence is not operably linked to either a splice acceptor site or an IRES, although it can be. While it is anticipated that insertion of the nucleic acid fragment into genomic DNA can result in the interruption of a genomic coding sequence, identification of an analyte coding sequence does not require the interruption of a genomic coding sequence.
  • the analyte coding sequence is operably linked to a promoter, as described above.
  • the detectable marker coding sequence can be operably linked to a splice acceptor site and/or an IRES.
  • the detectable marker coding sequence is operably linked to an IRES (see, e.g., Fig. 15).
  • the two coding sequences present in the transposon are transcribed, and a dicistronic mRNA is typically produced.
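The element order described above (inverted repeats, analyte coding sequence, detectable marker coding sequence, inverted repeats) can be sketched as a simple ordering check. This is a minimal illustration only: the element labels (`IR`, `analyte_cds`, `marker_cds`, `promoter`, `IRES`) and the function name are hypothetical and do not come from the specification.

```python
# Illustrative labels only; the specification does not define these identifiers.
REQUIRED_ORDER = ["IR", "analyte_cds", "marker_cds", "IR"]


def has_required_order(elements):
    """Return True if REQUIRED_ORDER occurs, in order, within `elements`.

    Other elements (e.g. a promoter or an IRES) may sit in between, since the
    text says the fragment contains "at least" the listed elements.
    """
    remaining = iter(elements)
    # Each `any` call consumes the iterator up to the next matching element,
    # so later requirements can only be met further to the right.
    return all(any(e == req for e in remaining) for req in REQUIRED_ORDER)


dicistronic = ["IR", "promoter", "analyte_cds", "IRES", "marker_cds", "IR"]
swapped = ["IR", "marker_cds", "analyte_cds", "IR"]

print(has_required_order(dicistronic))
print(has_required_order(swapped))
```

The check treats the required elements as an ordered subsequence, so a construct with the marker placed 5' of the analyte coding sequence is rejected.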
  • the analyte coding sequence of the nucleic acid fragment will be translated by virtue of ribosome initiation via scanning from the 5' end of the mRNA.
  • the detectable marker coding sequence of the nucleic acid fragment will be translated by virtue of internal initiation mediated by the
  • the translation of the detectable marker i.e., the second coding sequence of the dicistronic mRNA
  • the translation of the detectable marker provides a method to detect expression of the analyte coding sequence of the nucleic acid fragment. This is a significant advantage, as the expression of some biological coding sequences of interest can be difficult to monitor directly.
  • the dicistronic gene expression transposon vectors of the invention will generally allow the expression of a biological coding sequence of interest to be detected.
  • a use of a dicistronic transposon vector of this aspect of the invention is depicted schematically in Fig. 15.
  • the dicistronic transposon vector and mRNA encoding SB transposase can be microinjected into zebrafish embryos which are allowed to mature.
  • Expression of GFP marks cells in which "Gene X" is also expressed. This allows analysis of the effects of "Gene X" on specific tissues; a form of mosaic analysis.
  • Gene X may encode a protein or a portion of a protein, and the encoded protein can be beneficial or deleterious to the cells.
  • analyte coding sequences can be analyzed in any type of cell, including but not limited to an oocyte, a cell of an embryo, an egg cell or a stem cell, and in any type of tissue, differentiated or undifferentiated.
  • An alternative use of the dicistronic vector is to inject dicistronic mRNA encoded by a vector containing a nucleic acid fragment comprising the analyte coding sequence and the detectable marker coding sequence. An example of this embodiment is described in Example 9.
  • the method for identifying or analyzing the function of an analyte coding sequence includes observing the phenotype of a cell that contains the integrated nucleic acid fragment, and comparing it to a cell that does not contain the nucleic acid fragment to determine whether the phenotype of the first cell is altered, wherein an altered phenotype is indicative that the analyte coding sequence plays a function in the identified phenotype.
  • the cell that contains the integrated nucleic acid fragment can be grown into an animal, and animal phenotypes similarly compared.
  • the nucleic acid fragments of the invention have applications to many areas of biotechnology and functional genomics.
  • the invention allows efficient insertion of genetic material into the genomic DNA of a cell of animals, preferably vertebrate animals, for the mutation, evaluation of function, and subsequent cloning of a genomic coding sequence and/or genomic expression control sequences.
  • the invention has the property of allowing identification of organisms in which the detectable marker that is encoded by the inserted nucleic acid fragment is expressed in specific tissues or at specific times in development.
  • Another property of the invention is the ability to insert a biological coding sequence of interest into a cell's genomic DNA and evaluate the location and time of expression of the biological coding sequence of interest by assaying for the co-expressed downstream detectable marker coding sequence.
  • the system has two components: a nucleic acid fragment that comprises a nucleic acid sequence comprising a coding sequence, wherein the nucleic acid sequence is positioned between at least two inverted repeats that can bind to an SB protein, and a source of transposase.
  • the intervening nucleic acid sequence of the nucleic acid fragment can include any variation or feature herein disclosed, without limitation, and the nucleic acid fragment is one that is capable of integrating into DNA of a cell, as described more fully hereinabove.
  • the nucleic acid fragment is preferably part of a plasmid or a recombinant viral vector.
  • the transposase source can be either a nucleic acid encoding the transposase or the transposase protein itself, and the transposase is preferably an SB protein.
  • Another embodiment of the gene transfer system is directed to the introduction of a nucleic acid fragment into the DNA of a human or a fish.
  • This embodiment of the gene transfer system includes a nucleic acid fragment comprising a nucleic acid sequence that comprises an IRES, and the nucleic acid fragment is capable of integrating into the fish or human DNA.
  • the nucleic acid sequence of this embodiment further comprises a coding sequence located 3' to and operably linked to the IRES.
  • the nucleic acid sequence of this embodiment comprises a first coding sequence located 3' to and operably linked to the IRES, and a second coding sequence located 5' to both the first coding sequence and the IRES.
  • the nucleic acid sequence of the nucleic acid fragment need not be flanked by inverted repeats that bind an SB protein, nor is a source of transposase necessary, although these features are optionally included.
  • the invention is further directed to a transgenic human or fish, preferably zebrafish, whose cells contain a nucleic acid fragment comprising an IRES as described, and its progeny.
  • the invention is directed to a transgenic fish or fish cell comprising an IRES that is heterologous with respect to the fish genome, for example, a viral IRES.
  • the invention also includes a method for producing a transgenic animal.
  • a nucleic acid fragment of the invention including any variation or feature herein disclosed, without limitation, and a source of transposase as described above are introduced into a cell.
  • the nucleic acid fragment preferably contains a coding sequence that is heterologous with respect to the animal, i.e., it is not found in the animal's genome.
  • the coding sequence can also be one that is endogenous to the animal.
  • the cell or cells containing the nucleic acid fragment are then grown into an animal.
  • the resulting animal can be transgenic, including a mosaic.
  • the nucleic acid fragment is integrated into both somatic and germline cells of the transgenic animal, and the transgenic animal is capable of transmitting the nucleic acid fragment to its progeny.
  • the invention is further directed to a transgenic animal whose cells contain a nucleic acid fragment of the invention, and its progeny.
  • Gene reconstruction, Phase 1: Reconstruction of a transposase open reading frame.
  • the Tss1.1 element from Atlantic salmon (GenBank accession number L12206) was PCR-amplified using a primer pair flanking the defective transposase gene, FTC-Start and FTC-Stop, to yield product SB1.
  • a segment of the defective transposase gene of the Tss1.2 element (L12207) was PCR-amplified using PCR primers FTC-3 and FTC-4, then further amplified with FTC-3 and FTC-5.
  • the PCR product was digested with restriction enzymes NcoI and BlpI, underlined in the primer sequences, and cloned to replace the corresponding fragment in SB1 to yield SB2. Then, an approximately 250 bp
  • FTC-3 5'-AACACCATGGGACCACGCAGCCGTCA (SEQ ID NO: 19)
  • FTC-4 5'-CAGGTTATGTCGATATAGGACTCGTTTTAC (SEQ ID NO:20)
  • FTC-5 5'-CCTTGCTGAGCGGCCTTTCAGGTTATGTCG (SEQ ID NO:21)
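Because the underlining that marked the restriction sites did not survive in this text, the sites can be located by pattern matching. The sketch below assumes the standard recognition sequences for NcoI (CCATGG) and BlpI (GCTNAGC, where N is any base); the dictionary and function names are illustrative.

```python
import re

# Primer sequences as printed above.
PRIMERS = {
    "FTC-3": "AACACCATGGGACCACGCAGCCGTCA",
    "FTC-5": "CCTTGCTGAGCGGCCTTTCAGGTTATGTCG",
}

# Recognition sequences written as regular expressions (assumed standard sites).
SITES = {
    "NcoI": "CCATGG",
    "BlpI": "GCT[ACGT]AGC",  # GCTNAGC, N = any base
}


def find_sites(seq):
    """Map enzyme name -> (0-based start, matched site) for the first hit in seq."""
    hits = {}
    for enzyme, pattern in SITES.items():
        match = re.search(pattern, seq)
        if match:
            hits[enzyme] = (match.start(), match.group())
    return hits


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

Under these assumptions the NcoI site falls in FTC-3 and the BlpI site in FTC-5, which is consistent with the NcoI/BlpI digestion of the PCR product described above.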
  • Gene reconstruction, Phase 2: Site-specific PCR mutagenesis of the SB3 open reading frame to introduce consensus amino acids.
  • PCR mutagenesis two methods have been used: megaprimer PCR (Sarkar and
  • Oligonucleotide primers for product SB4 were the following:
  • FTC-7 5'-TTGCACTTTTCGCACCAA for Gln->Arg(74) and Asn->Lys(75)
  • FTC-13 5'-GTACCTGTTTCCTCCAGCATC for Ala->Glu(93) (SEQ ID NO:23);
  • FTC-8 5'-GAGCAGTGGCTTCTTCCT for Leu->Pro(121) (SEQ ID NO:24);
  • FTC-9 5'-CCACAACATGATGCTGCC for Leu->Met(193) (SEQ ID NO:
  • FTC-10 5'-TGGCCACTCCAATACCTTGAC for Ala->Val(265) and Cys->Trp(268) (SEQ ID NO:26);
  • FTC-11 5'-ACACTCTAGACTAGTATTTGGTAGCATTGCC for Ser-
  • Oligonucleotide primers for product SB5:
  • B5-PTV 5'-GTGCTTCACGGTTGGGATGGTG for Leu->Pro(183), Asn->Thr(184) and Met->Val(185) (SEQ ID NO:28).
  • Oligonucleotide primers for product SB6:
  • FTC-DDE 5'-ATTTTCTATAGGATTGAGGTCAGGGC for Asp->Glu(279) (SEQ ID NO:29).
  • PR-GAIS 5'-GTCTGGTTCATCCTTGGGAGCAATTTCCAAACGCC for Asn->Ile(28), His->Arg(31) and Phe->Ser(21) (SEQ ID NO:30).
  • Oligonucleotide primers for product SB9:
  • VEGYP 5'-GTGGAAGGCTACCCGAAACGTTTGACC for Leu->Pro(324)
  • Oligonucleotide primers for product SB10:
  • FATAH 5'-GACAAAGATCGTACTTTTTGGAGAAATGTC for Cys-
  • plasmid pSB10 was cut with MscI, which removes 322 bp of the transposase-coding region, and recircularized. Removal of the MscI fragment from the transposase gene deleted much of the catalytic DDE domain and disrupted the reading frame by introducing a premature translational termination codon.
  • Conceptual translation of the mutated transposase open reading frames and comparison with functional motifs in other proteins allowed us to identify five regions that are highly conserved in the SB transposase family (Fig. 1A): i) a paired box/leucine zipper motif at the N-terminus; ii) a DNA-binding domain; iii) a bipartite nuclear localization signal (NLS); iv) a glycine-rich motif close to the center of the transposase without any known function at present; and v) a catalytic domain consisting of three segments in the C-terminal half comprising the DDE domain that catalyzes the transposition. DDE domains were identified by Doak et al.
  • the first step of reactivating the transposase gene was to restore an open reading frame (SB1 through SB3 in Fig. 1B) from bits and pieces of two inactive TcEs from Atlantic salmon (Salmo salar) and a single element from rainbow trout (Oncorhynchus mykiss) (Radice et al., 1994, supra). SB3, which has a complete open reading frame after removal of stop codons and frameshifts, was tested in an excision assay similar to that described by Handler et al.
  • the SB3 polypeptide differs from the consensus transposase sequence in 24 positions (Fig. 1B), which can be sorted into two groups: nine residues that are probably essential for transposase activity because they are in the presumed functional domains and/or conserved in the entire Tc1 family, and another fifteen residues whose relative importance could not be predicted. Consequently, a dual gene reconstruction strategy was undertaken. First, the putative functional protein domains of the transposase were systematically rebuilt one at a time by correcting the former group of mutations. Each domain for a biochemical activity was tested independently when possible. Second, in parallel with the first approach, a full-length, putative transposase gene was synthesized by extending the reconstruction procedure to all of the 24 mutant amino acids in the putative transposase.
  • the reconstituted functional transposase domains were tested for activity.
  • a short segment of the SB4 transposase gene (Fig. 1B) encoding an NLS-like protein motif was fused to the lacZ gene.
  • the transposase NLS was able to mediate the transfer of the cytoplasmic marker-protein, β-galactosidase, into the nuclei of cultured mouse cells (Ivics et al., 1996, supra), supporting predictions that a bipartite NLS was a functional motif in SB and that our approach to resurrect a full-length, multifunctional enzyme was viable.
  • a TcE from Tanichthys albonubes was cloned into the SmaI site of pUC19 to result in pT.
  • the donor construct for the integration assays, pT/neo, was made by cloning, after Klenow fill-in, an EcoRI/BamHI fragment of the plasmid pRc-CMV (Invitrogen, San Diego, CA) containing the SV40 promoter/enhancer, the neomycin resistance gene and an SV40 poly(A) signal into the StuI/MscI sites of pT.
  • there are at least two distinct subfamilies of TcEs in the genomes of Atlantic salmon and zebrafish, Tss1/Tdr1 and Tss2/Tdr2, respectively. Elements from the same subfamily are more alike, having about 70% nucleic acid identity, even when they are from two different species (e.g., Tss1 and Tdr1) than members of two different subfamilies in the same species.
  • Tdr1 and Tdr2 are characteristically different in their encoded transposases and their inverted repeat sequences, and share only about 30% nucleic acid identity. It may be that certain subfamilies of transposons must be significantly different from each other in order to avoid cross-mobilization. A major question is whether substrate recognition of transposases is sufficiently specific to prevent activation of transposons of closely related subfamilies.
  • the 12-bp DRs of salmonid-type elements are part of the binding sites for SB. However, these binding sites are 30 bp long. Thus, specific DNA-binding also involves DNA sequences around the DRs that are variable between TcE subfamilies in fish. Such a difference in the sequences of transposase binding sites might explain the inability of N123 to bind efficiently to zebrafish Tdr1 IRs, and may enable the transposase to distinguish even between closely related TcE subfamilies.
  • TcEs serve one or more regulatory purposes affecting transposition and/or gene expression.
  • Once in the nucleus, a transposase must bind specifically to its recognition sequences in the transposon.
  • the specific DNA-binding domains of both the Tc1 and Tc3 transposases have been mapped to their N-terminal regions.
  • the N-terminal region of SB has significant structural and sequence similarities to the paired DNA-binding domain, found in the Pax family of transcription factors, in a novel combination with a leucine zipper-like motif (Ivics et al., 1996, supra).
  • a gene segment encoding the first 123 amino acids of SB (N123), which presumably contains all the necessary information for specific DNA-binding and includes the NLS, was reconstructed (SB8 in Fig. 1B), and expressed in E. coli. N123 was purified via a C-terminal histidine tag as a 16 kDa polypeptide (Fig. 3A).
  • Expression of N123 was induced in E. coli strain BL21(DE3) (Novagen) by the addition of 0.4 mM IPTG at 0.5 O.D. at 600 nm and continued for 2.5 h at 30°C.
  • Cells were sonicated in 25 mM HEPES (pH 7.5), 1 M NaCl, 15% glycerol, 0.25% Tween 20, 2 mM β-mercaptoethanol and 1 mM PMSF, and 10 mM imidazole (pH 8.0) was added to the soluble fraction before it was mixed with Ni2+-NTA resin (Qiagen) according to the recommendations of the manufacturer.
  • the resin was washed with 25 mM HEPES (pH 7.5), 1 M NaCl, 30% glycerol, 0.25% Tween 20, 2 mM β-mercaptoethanol, 1 mM PMSF and 50 mM imidazole (pH 8.0), and bound proteins were eluted with sonication buffer containing 300 mM imidazole and dialyzed overnight at 4°C against sonication buffer without imidazole.
  • N123 also contains the specific DNA-binding domain of SB, as tested in a mobility-shift assay (Fig. 3B).
  • a 300 bp EcoRI/HindIII fragment of pT comprising the left inverted repeat of the element was end-labeled using [α-32P]dCTP and Klenow.
  • Nucleoprotein complexes were formed in 20 mM HEPES (pH 7.5), 0.1 mM EDTA, 0.1 mg/ml BSA, 150 mM NaCl, 1 mM DTT in a total volume of 10 μl. Reactions contained 100 pg labeled probe, 2 μg poly[dI][dC] and 1.5 μl N123.
  • N123 is able to distinguish between salmonid-type and zebrafish-type TcE substrates.
  • the number of the deoxyribonucleoprotein complexes detected by the mobility-shift assay at increasingly higher N123 concentrations indicated two protein molecules bound per IR (Fig. 3B, right panel), consistent with either two binding sites for transposase within the IR or a transposase dimer bound to a single site.
  • Transposase-binding sites were further analyzed and mapped in a DNase I footprinting experiment. Using the same fragment of pT as above, two protected regions close to the ends of the IR probe were observed (Fig. 4). The two 30-bp footprints cover the subterminal DR motifs within the IRs.
  • DRs are the core sequences for DNA-binding by N123.
  • the DR motifs are almost identical between salmonid- and zebrafish-type TcEs (Ivics et al., 1997).
  • the 30-bp transposase binding-sites are longer than the DR motifs and contain 8 base pairs and 7 base pairs in the outer and internal binding sites, respectively, that are different between the zebrafish- and the salmonid-type IRs
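The stated differences (8 and 7 base pairs out of 30) can be restated as percent identity between the salmonid-type and zebrafish-type binding sites. The following is a small arithmetic sketch; the function name is illustrative and the calculation assumes the differences are simple substitutions over the 30-bp footprint.

```python
def percent_identity(length_bp, mismatches):
    """Percent of positions that are identical over a site of length_bp."""
    return 100.0 * (length_bp - mismatches) / length_bp


outer = percent_identity(30, 8)     # outer binding site
internal = percent_identity(30, 7)  # internal binding site

print(f"outer site:    {outer:.1f}% identical")
print(f"internal site: {internal:.1f}% identical")
```

On this reading roughly a quarter of each binding site differs between the two subfamilies, even though the 12-bp DR cores are almost identical.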
  • a gene segment of SB8 was PCR-amplified using primers FTC-Start and FTC-8, 5'-phosphorylated with T4 polynucleotide kinase, digested with BamHI, filled in with Klenow, and cloned into the NdeI/EcoRI digested expression vector pET21a (Novagen) after Klenow fill-in.
  • This plasmid, pET21a/N123, expresses the first 123 amino acids of the transposase (N123) with a C-terminal histidine tag.
  • transposase to a donor construct and subsequent active transport of these nucleoprotein complexes into the nuclei of transfected cells could have resulted in elevated integration rates, as observed for transgenic zebrafish embryos using an SV40 NLS peptide (Collas et al., 1996 Transgenic Res. 5, 451-458).
  • DNA-binding and nuclear targeting activities alone did not increase transformation frequency, which occurred only in the presence of full-length transposase. Although not sufficient, these functions are probably necessary for transposase activity. Indeed, a single amino acid replacement in the NLS of mariner is detrimental to overall transposase function (Lohe et al., 1997 Proc. Natl. Acad. Sci.
  • SB6 a mutated version of the transposase gene, to catalyze transposition demonstrates the importance of the sequences of the conserved motifs. Notably, three of the 11 amino acid substitutions that SB6 contains, F(21), N(28) and H(31) are within the specific
  • the transposase-producing helper construct is often a "wings-clipped" transposase gene which lacks one of the inverted repeats of P which prevents the element from jumping (Cooley et al.,
  • transposition can only occur if both components of the SB system are present in the same cell. Once that happens, multiple integrations can take place as demonstrated by the finding of up to 11 integrated transgenes in neomycin-resistant cell clones (Fig. 7A). In contrast to spontaneous integration of plasmid DNA in cultured mammalian cells that often occurs in the form of concatemeric multimers into a single genomic site (Perucho et al., 1980 Cell 22, 309-317), these multiple insertions appear to have occurred in distinct chromosomal locations. Integration of the synthetic, salmonid transposons was observed in fish as well as in mouse and human cells.
  • In the C-terminal half of the SB transposase, three protein motifs make up the DD(34)E catalytic domain: the two invariable aspartic acid residues, D(153) and D(244), and a glutamic acid residue, E(279), the latter two being separated by 34 amino acids (Fig. 2).
  • An intact DD(34)E box is essential for catalytic functions of Tc1 and Tc3 transposases (van Luenen et al., 1994 Cell 79, 293-301; Vos and Plasterk, 1994, supra).
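The "34" in DD(34)E follows from the residue positions given above, if "separated by 34 amino acids" is read as the number of residues lying between D(244) and E(279). A minimal arithmetic check, with an illustrative function name:

```python
def intervening_residues(pos_a, pos_b):
    """Number of residues strictly between two 1-based positions."""
    return abs(pos_b - pos_a) - 1


# Positions as stated in the text: D(153), D(244), E(279).
print(intervening_residues(244, 279))  # spacing between the second D and the E
print(intervening_residues(153, 244))  # spacing between the two D residues
```

The second value is derived here from the stated positions only; the text itself does not quote a spacing for the two aspartic acid residues.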
  • a first assay was designed to detect integration events into the chromosomes of cultured cells.
  • the assay is based on trans-complementation of two nonautonomous transposable elements, one containing a selectable marker gene (donor) and another that expresses the transposase (helper) (Fig. 5A).
  • the donor, pT/neo, is an engineered, T-based element which contains an SV40 promoter-driven neo gene flanked by the terminal IRs of the transposon containing binding sites for the transposase.
  • the helper construct expresses the full-length SB10 transposase gene driven by a human cytomegalovirus (CMV) enhancer/promoter.
  • the donor plasmid is cotransfected with the helper or control constructs into cultured vertebrate cells, and the number of cell clones that are resistant to the neomycin analog drug G418 due to chromosomal integration and expression of the neo transgene serves as an indicator of the efficiency of gene transfer. If SB is not strictly host-specific, transposition should also occur in phylogenetically distant vertebrate species. Using the assay system shown in Fig.
  • Fig. 5B shows five plates of transfected HeLa cells that were placed under G418 selection, and were stained with methylene blue two weeks post-transfection.
  • the staining patterns clearly demonstrate a significant increase in integration of neo-marked transposons into the chromosomes of HeLa cells when the SB transposase-expressing helper construct was cotransfected (plate 2), as compared to a control cotransfection of the donor plasmid plus the SB transposase gene cloned in an antisense orientation (pSB10-AS; plate 1).
  • an indicator plasmid containing the transposase recognition sequence and a marker gene was co-injected with a target plasmid containing a kanamycin gene and SB transposase.
  • Resulting plasmids were isolated and used to transform E. coli. Colonies were selected for ampicillin and kanamycin resistance (see Figure 8). While SB transposase was co-microinjected in these assays, mRNA encoding the SB transposase could also be co-microinjected in place of or in addition to, the SB transposase protein.
  • Cells were cultured in DMEM supplemented with 10% fetal bovine serum, seeded onto 6 cm plates one day prior to transfection and transfected with 5 μg Elutip (Schleicher and Schuell)-purified plasmid DNA using Lipofectin from BRL. After 5 hrs of incubation with the DNA-lipid complexes, the cells were "glycerol-shocked" for 30 sec with 15% glycerol in phosphate buffered saline (PBS), washed once with PBS and then refed with serum-containing medium.
  • Two days post-transfection, the transfected cells were trypsinized, resuspended in 2 ml of serum-containing DMEM and either 1 ml or 0.1 ml aliquots of this cell suspension were seeded onto several 10 cm plates in medium containing 600 μg/ml G418 (BRL). After two weeks of selection, cell clones were either picked and expanded into individual cultures or fixed with 10% formaldehyde in PBS for 15 min, stained with methylene blue in PBS for 30 min, washed extensively with deionized water, air dried and photographed. These assays can also be used to map transposase domains necessary for chromosomal integration.
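Because each selection plate receives only part of the 2 ml cell suspension, colony counts from a plate can be scaled back to an estimate per transfection. The sketch below shows that scaling; the colony count in the example is hypothetical and the function name is illustrative, not a procedure stated in the text.

```python
def clones_per_transfection(colonies_on_plate, aliquot_ml, suspension_ml=2.0):
    """Scale a plate's G418-resistant colony count to the whole cell suspension.

    aliquot_ml is the volume seeded onto the plate (1 ml or 0.1 ml in the text);
    suspension_ml is the total resuspension volume (2 ml in the text).
    """
    fraction_plated = aliquot_ml / suspension_ml
    return colonies_on_plate / fraction_plated


# Hypothetical example: 50 colonies counted on a plate seeded with 0.1 ml.
print(clones_per_transfection(50, aliquot_ml=0.1))
# The same count on a 1 ml plate represents half of the suspension.
print(clones_per_transfection(50, aliquot_ml=1.0))
```

Plating both a 1 ml and a 0.1 ml aliquot gives a tenfold dilution series, so that at least one plate is likely to carry a countable number of colonies.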
  • junction fragments of integrated transposons and human genomic DNA were isolated using a ligation- mediated PCR assay (Devon et al., Nucl. Acids. Res., 23, 1644-1645 (1995), Izsvak, et al., BioTechniques, 15, 814-816 (1993)). Junction fragments of five integrated transposons were cloned and sequenced.
  • transposase activity was assessed using five different vertebrate cells: NIH 3T3, LMTK and embryonic stem cells from mouse, HeLa cells from human and embryonic cells from the zebrafish.
  • the assay was designed to demonstrate that the transposase worked in a functioning set of cells (i.e., embryonic cells that were differentiating and growing in a natural environment).
  • the assay involved inter-plasmid transfer where the transposon in one plasmid is removed and inserted into a target plasmid and the transposase construct was injected into 1-cell stage zebrafish embryos.
  • the Indicator (donor) plasmids for monitoring transposon excision and/or integration included: 1) a marker gene that when recovered in E. coli or in fish cells, could be screened by virtue of either the loss or the gain of a function, and 2) transposase-recognition sequences in the IRs flanking the marker gene.
  • the total size of the marked transposons was kept to about 1.6 kb, the natural size of the TcEs found in teleost genomes.
  • the transposition activity of Ts1 transposase was evaluated by co-microinjecting 200 ng/μl of Ts1 mRNA, made in vitro by T7 RNA polymerase from a Bluescript expression vector, plus about 250 ng/μl each of target and donor plasmids into 1-cell stage zebrafish embryos.
  • Low molecular weight DNA was prepared from the embryos at about 5 hrs post-injection, transformed into E. coli cells, and colonies selected by replica plating on agar containing 50 μg/ml kanamycin and/or ampicillin.
  • transposition frequency into the target plasmid was about 0.041% in experimental cells as compared to 0.002% in control cells. This level did not include transpositions that occurred in the zebrafish genome. In these experiments we found that about 40% to 50% of the embryos did not survive beyond 4 days. Insertional mutagenesis studies in the mouse have suggested that the rate of recessive lethality is about 0.05 (i.e., on average about one insertion in 20 will be lethal). Assuming that this rate is applicable to zebrafish, the observed mortality can be accounted for if the microinjection conditions used in these experiments produced about 20 insertions per genome.
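The two figures in this passage reduce to simple ratios: the enrichment of transposition over background, and the number of insertions expected per lethal event at a lethality rate of 0.05. The sketch below only restates that arithmetic from the quoted values; the variable names are illustrative and no additional model of embryo mortality is implied.

```python
# Values quoted in the text.
experimental_freq_pct = 0.041  # transposition frequency with transposase, in percent
control_freq_pct = 0.002       # background frequency in controls, in percent
lethality_per_insertion = 0.05  # rate of recessive lethality suggested by mouse studies

# Fold increase of transposition over background.
fold_over_control = experimental_freq_pct / control_freq_pct

# Average number of insertions per lethal insertion (the reciprocal of the rate).
insertions_per_lethal = 1.0 / lethality_per_insertion

print(f"fold over control: {fold_over_control:.1f}")
print(f"insertions per lethal event: {insertions_per_lethal:.0f}")
```

Both ratios come out at about 20, which matches the "about 20 insertions" figure used in the passage.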
  • a transposon system will be functional for gene transfer, for such purposes as gene therapy and gene delivery to animal chromosomes for bioreactor systems, only if the delivered genes are reliably expressed.
  • the numbers in the columns for fish A-D show the numbers of GFP expressing fish followed by the total number of offspring examined.
  • GFP-expressing offspring are given in parentheses.
  • Due to their inherent ability to move from one chromosomal location to another within and between genomes, transposable elements have revolutionized genetic manipulation of certain organisms including bacteria (Gonzales et al.,
  • An SB-type transposable element can integrate into either of two types of chromatin, functional DNA sequences where it may have a deleterious effect due to insertional mutagenesis or non-functional chromatin where it may not have much of a consequence (Fig. 9).
  • This power of "transposon tagging" has been exploited in simpler model systems for nearly two decades (Bingham et al., Cell, 25, 693-704 (1981); Bellen et al., 1989, supra).
  • Transposon tagging is an old technique in which transgenic DNA is delivered to cells so that it will integrate into genes, thereby inactivating them by insertional mutagenesis.
  • the inactivated genes are tagged by the transposable element which then can be used to recover the mutated allele.
  • Insertion of a transposable element may disrupt the function of a gene which can lead to a characteristic phenotype.
  • Because insertion is approximately random (Fig. 9), the same procedures that generate insertional, loss-of-function mutants can often be used to deliver genes that will confer new phenotypes to cells. Gain-of-function mutants can be used to understand the roles that gene products play in growth and development as well as the importance of their regulation.
  • Isolating the tagged gene: in all cases, genomic DNA is isolated from cells from one or more tissues of the mutated animal by conventional techniques (which vary for different tissues and animals).
  • the DNA is cleaved by a restriction endonuclease that may or may not cut in the transposon tag (more often than not it does cleave at a known site).
  • the resulting fragments can then either be directly cloned into plasmids or phage vectors for identification using probes to the transposon DNA (see Kim et al., 1995 for references in Mobile Genetic Elements, IRL Press, D. L. Sheratt eds.).
  • the DNA can be PCR amplified in any of many ways, including the LM-PCR procedure of Izsvak and Ivics (1993, supra) and a modification by Devon et al. (1995, supra), and identified by its hybridization to the transposon probe.
  • An alternative method is inverse-PCR
  • the identified clone is then sequenced.
  • the sequences that flank the transposon (or other inserted DNA) can be identified by their non-identity to the insertional element.
  • the sequences can be combined and then used to search the nucleic acid databases for either homology with other previously characterized gene(s), or partial homology to a gene or sequence motif that encodes some function. In some cases the gene has no homology to any known protein. It becomes a new sequence to which others will be compared. The encoded protein will be the center of further investigation of its role in causing the phenotype that induced its recovery.
  • mRNA can be used to determine the nucleotide sequence of the genomic DNA flanking the inserted nucleic acid fragment. For instance, sequence-specific primers that hybridize to nucleotide sequences of the inserted nucleic acid fragment that would be present in a resulting mRNA can be used, followed by reverse transcription and 5' or 3' RACE (rapid amplification of cDNA ends).
  • DANA is a retroposon with an unusual substructure of distinct cassettes that appears to have been assembled by insertions of short sequences into a progenitor SINE element. DANA has been amplified in the
  • Primers that can be used in IRS-PCR to detect polymorphic DNA include 5'-GGCGACRCAGTGGCGCAGTRGG (SEQ ID NO:13), where R is G or A, and 5'-GAAYRTGCAAACTCCACACAGA (SEQ ID NO:14), where Y is T or C and R is G or A, each of which anneals to nucleotides present in the retroposon DANA (D); 5'-TCCATCAGACCACAGGACAT (SEQ ID NO:15) and 5'-TGTCAGGAGGAATGGGCCAAAATTC (SEQ ID NO:16), each of which anneals to nucleotides present in Tdr1 transposons; and 5'-TTTCAGTTTTGGGTGAACTATCC (SEQ ID NO:12).
  • Angel is a highly reiterated miniature inverted-repeat transposable element.
  • Polymorphic DNA fragments can be generated by DANA or Angel specific primers in IRS-PCR and the number of detectable polymorphic bands can be significantly increased by the combination of various primers to repetitive sequences in the zebrafish genome, including SB-like transposons.
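The DANA primers listed above contain degenerate positions (R = G or A, Y = T or C), so each written primer stands for a small pool of concrete oligonucleotides. The sketch below enumerates that pool; the function name and lookup table are illustrative, and only the degenerate codes used in these primers are included.

```python
from itertools import product

# Degenerate codes used in the primers above (subset of the IUPAC nucleotide code).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_degenerate(primer):
    """Return every concrete sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer)
    return ["".join(combo) for combo in product(*choices)]


dana_primers = {
    "SEQ ID NO:13": "GGCGACRCAGTGGCGCAGTRGG",
    "SEQ ID NO:14": "GAAYRTGCAAACTCCACACAGA",
}

for name, primer in dana_primers.items():
    variants = expand_degenerate(primer)
    print(name, len(variants), "variants")
    for seq in variants:
        print("  ", seq)
```

Each of these two primers has two degenerate positions and therefore represents four distinct sequences.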
  • Polymorphic fragments can be recovered from gels and cloned to provide sequence tagged sites (STSs) for mapping mutations.
  • Fig. 10B illustrates the general principles and constraints for using IRS-PCR to generate STSs. It is estimated that about 0.1% of the zebrafish genome can be directly analyzed by IRS-PCR using only 4 primers. The four conserved (Cl-4) regions of DANA seem to have different degrees of conservation and representation in the zebrafish genome and this is taken into account when designing PCR primers. The same method has a potential application in fingerprinting fish stocks and other animal populations.
  • the method can facilitate obtaining subclones of large DNAs cloned in yeast, bacterial and bacteriophage P1-derived artificial chromosomes (YACs, BACs and PACs, respectively) and can be used for the detection of integrated transgenic sequences.
  • yeast bacterial and bacteriophage PI -derived artificial chromosomes
  • PACs bacteriophage PI -derived artificial chromosomes
  • Dicistronic vectors in zebrafish would allow researchers to track the expression of a biological gene of interest in living embryos simply by using a reporter molecule, like GFP. Knowing where and when an introduced DNA or mRNA construct is expressed could greatly facilitate interpretation of over-expression and mutant-rescue experiments.
  • In order for a dicistronic vector to be useful for all of these purposes, it must be determined in which cells and tissues, and at what developmental stages, detectable expression of a second cistron encoding a marker gene occurs.
  • EMCV IRES can function in developing zebrafish from early cleavage stages to larval stages.
  • the products of both genes in mRNAs co-localize within the embryo, indicating that both products are made in many cell types within the embryo.
  • phBeL was constructed from component fragments of pRC/CMV (Invitrogen), pCMVβ (Clontech), pGem/eLuc, and CMV4 (Andersson et al. 1989).
  • The vector backbone consists of the 3.15-kilobase (kb) fragment obtained after digestion with the restriction endonucleases XhoI and NotI.
  • The XhoI to NotI fragment contains the ColE1 origin of replication, ampicillin resistance gene (amp), and CMV promoter found in the complete pRC/CMV vector.
  • Fused to the NotI site of the pRC/CMV XhoI/NotI fragment is the 3.74-kb fragment obtained after digestion of pCMVβ with the restriction endonuclease NotI.
  • This NotI fragment contains the complete β-galactosidase (βgal) coding region found in pCMVβ.
  • βgal β-galactosidase
  • The NotI/StuI fragment contains the EMCV IRES and luciferase coding regions.
  • A 1.11-kb fragment of CMV4 was obtained after digestion with the restriction endonucleases SmaI and SalI.
  • This SmaI/SalI fragment contains the human growth hormone poly(A) signal, the SV40 origin of replication, and the SV40 early enhancer/promoter region.
  • The SmaI/SalI CMV4 fragment completes the vector, since StuI and SmaI create blunt cuts while XhoI and SalI have compatible single-stranded overhangs.
  • The hairpin structure found in the mRNA of this vector arises from the large number of restriction sites upstream of the β-galactosidase coding region, which result from the incorporation of partial multiple cloning sites from both pRC/CMV and pCMVβ.
  • pGem/eLuc was created from component fragments of pGem Luc (Promega) and SK/EMCV IRES.
  • The vector pGem Luc was digested with BamHI; the single-stranded overhang left by digestion with BamHI was removed by treatment with S1 nuclease.
  • The linearized vector was then cut with NotI, which cuts within 20 base pairs of the BamHI site.
  • SK/EMCV was digested first with XhoI, and the single-stranded overhang left after digestion was removed by treatment with S1 nuclease.
  • The linearized SK/EMCV was then digested with NotI.
  • The 0.64-kb NotI/XhoI (S1 nuclease treated) fragment was then cloned into pGem Luc modified as above.
  • SK/EMCV IRES was created from component fragments of pBluescriptSK- (Stratagene) and pED4 (R. J. Kaufman et al., Nucleic Acids Res., 19(16), 4485-90 (1991)).
  • pBluescriptSK- was digested with the restriction endonucleases EcoRI and XhoI, both of which cut within the multiple cloning site of pBluescriptSK-.
  • pED4 was digested with EcoRI and XhoI to obtain the 0.60-kb fragment corresponding to the EMCV IRES. The EcoRI/XhoI fragment was then ligated into pBluescriptSK- modified as above.
  • pBL was created from phBeL.
  • phBeL was digested with the restriction endonucleases KpnI and NotI. The single-stranded overhangs left by these restriction enzymes were then removed by treatment with S1 nuclease. The two large fragments, the 6.02-kb KpnI/KpnI fragment and the 3.47-kb NotI/NotI fragment, were ligated together. This resulted in the loss of a 70-base pair fragment within the multiple cloning site, which disrupted the hairpin structure found in phBeL, and the loss of a 0.51-kb fragment corresponding to all but 100 base pairs of the EMCV IRES.
  • pBeL was made from phBeL and pBL. Both vectors were cut with the restriction endonucleases ScaI and BssHII. The ScaI recognition site is within the amp resistance gene and the BssHII recognition site is within the β-galactosidase coding region. The 7.03-kb fragment of phBeL was combined with the 2.97-kb fragment of pBL. This resulted in a loss of the hairpin structure of phBeL while maintaining the complete EMCV IRES.
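The fragment sizes quoted for pBL and pBeL can be cross-checked with simple bookkeeping. The sketch below works in base pairs and uses only the rounded sizes stated in the text; the variable names are illustrative, and the few bases trimmed by S1 nuclease are ignored as an assumption.

```python
# Fragment sizes in base pairs, as stated (rounded) in the text.
KPNI_KPNI_FRAGMENT = 6020      # kept when phBeL is cut with KpnI and NotI
NOTI_NOTI_FRAGMENT = 3470      # kept when phBeL is cut with KpnI and NotI
LOST_MCS_FRAGMENT = 70         # hairpin-forming piece of the multiple cloning site
LOST_IRES_FRAGMENT = 510       # all but 100 bp of the EMCV IRES
PHBEL_SCAI_BSSHII = 7030       # phBeL fragment used to build pBeL
PBL_SCAI_BSSHII = 2970         # pBL fragment used to build pBeL

# pBL is the ligation of the two large fragments.
pbl_size = KPNI_KPNI_FRAGMENT + NOTI_NOTI_FRAGMENT

# phBeL is pBL plus the two pieces that were lost.
phbel_size = pbl_size + LOST_MCS_FRAGMENT + LOST_IRES_FRAGMENT

# pBeL combines one fragment from each parent.
pbel_size = PHBEL_SCAI_BSSHII + PBL_SCAI_BSSHII

print("pBL  ", pbl_size, "bp")
print("phBeL", phbel_size, "bp (implied)")
print("pBeL ", pbel_size, "bp")

# pBeL should equal phBeL minus only the hairpin-forming fragment,
# because it keeps the complete IRES.
print("consistent:", pbel_size == phbel_size - LOST_MCS_FRAGMENT)
```

Under these assumptions the numbers agree: the implied size of phBeL less the 70-bp hairpin fragment equals the size of pBeL assembled from the two stated fragments.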
  • pnBeG was constructed from component fragments of SK/nBeG(afmx) and pBL.
  • Both vectors were digested with the restriction endonucleases SacI and XmaI. SacI cuts within the amp resistance gene and XmaI cuts just upstream of the β-galactosidase gene in either vector.
  • The 6.67-kb XmaI/SacI fragment of SK/nBeG(afmx) was ligated to the 1.35-kb SacI/XmaI fragment of pBL. This regenerated the amp resistance gene and replaced the T7 promoter region of SK/nBeG(afmx) with the CMV/T7 promoters located within pBL.
  • pnBeG was further optimized by PCR mutagenesis of the IRES-GFP junction to GAAAAACACGATTGCTATATGGCCACA ACCATGGCTA GC
  • GM2 is a GFP that has been modified to fluoresce more than Affymax GFP. The construct with the optimal spacing between the EMCV
  • SK/nBeG(afmx) was constructed from component fragments of SK/eG(afmx) and KS/NCOnlsβgal.
  • SK/eG(afmx) was digested with the restriction endonuclease EcoRV. This linearized the vector upstream of the EMCV IRES and Affymax GFP.
  • KS/NCOnlsβgal was digested with the restriction endonucleases DraI and SpeI. Following this digestion, the single-stranded overhangs created by these enzymes were completely filled in using T4 polymerase.
  • The 3.28-kb SpeI/DraI fragment, which contained a nuclear localized variant of β-galactosidase, was ligated into the EcoRV-digested SK/eG(afmx). Recombinants with the β-galactosidase coding region on the same coding strand as the GFP were selected.
  • SK/eG(afmx) was created from component fragments of SK/β-globin 3'UTR 2a, SK/EMCV IRES, and pBAD-GFP (A. Crameri et al., Nat. Biotechnol., 14(3), 315-9 (1996), available from Affymax).
  • SK/β-globin 3'UTR 2a was digested with EcoRI. This linearized SK/β-globin 3'UTR 2a 5' of the Xenopus β-globin 3'UTR.
  • SK/EMCV IRES was digested first with
  • SK/β-globin 3'UTR 2a was created from component fragments of pBluescriptSK- (Stratagene) and XenB3UTR (a gift of H. Joseph Yost, Huntsman Cancer Center, University of Utah, Salt Lake City, UT).
  • The XenB3UTR was digested with EcoRI and XbaI. The single-stranded overhangs resulting from digestion with these enzymes were completely filled in using Klenow polymerase.
  • This fragment, containing the Xenopus β-globin 3' UTR cDNA in the orientation from EcoRI to XbaI, was cloned into the SmaI site of pBluescriptSK- (Stratagene).
  • KS/NCOnlsβgal was constructed from component fragments of pBluescript KS-, pPD1.27 (A. Fire et al., Gene, 93(2), 189-98 (1990)), and a short adapter (AGCCATGGCT) (SEQ ID NO:65).
  • pBluescript KS- was cut with XbaI and NotI. Both of these enzymes cut within the multiple cloning site of pBluescript KS-, and the digest therefore linearizes pBluescript KS-.
  • pPD1.27 was also cut with XbaI and
  • Capped synthetic mRNA was prepared using Ambion's mMessage machine and diluted to 200 µg/ml with DEPC-treated H2O prior to injection. Purified supercoiled DNA was diluted to 50 µg/ml with H2O prior to injection. One nanoliter of capped mRNA or DNA was injected into or just under the cytoplasm of single-cell embryos. Post-injection embryos were incubated at 28.5°C. β-galactosidase and Luciferase Expression Levels.
  • Embryos injected with pBeL, pBL, or phBeL mRNAs were collected in groups of five embryos at 0.5, 2, 4, 6, 8, 10, and 12 hours postinjection.
  • The embryos were lysed with 50 µl of 1x reporter lysis buffer (Promega) and a micropestle.
  • Embryonic lysates were stored at -80°C prior to further analysis. Frozen lysates were thawed by hand, and microfuged at 8,000 x g at 4°C for 5 minutes. Lysates were kept on ice at all times during preparation.
  • pBeL (Figure 13) encodes β-galactosidase in the first cistron and luciferase in the second cistron.
  • The two cistrons are separated by the EMCV IRES. β-galactosidase was expected to be translated by standard cap-dependent scanning, whereas luciferase was expected to be translated only if the EMCV IRES directs internal initiation in a developing zebrafish.
  • Luciferase activity detected from pBeL could be due to leaky scanning through the β-galactosidase initiation codon or reinitiation of ribosomes at the luciferase initiation codon.
  • phBeL a dicistronic control vector
  • pBL pBL
  • An additional sequence in the 5' UTR forms a stable hairpin structure in the mRNA that should prevent ribosomal scanning to the β-galactosidase open reading frame. If the luciferase activity observed in the test vector, pBeL, is due to leaky scanning, the luciferase activity observed in phBeL should be reduced to the same extent as the β-galactosidase expression.
  • If translation is instead initiated internally, luciferase expression levels should be unaffected by the incorporation of a hairpin structure in the 5' UTR of phBeL.
  • In pBL, the majority of the EMCV IRES was removed.
  • If the luciferase activity observed in the test vector, pBeL, is from ribosomes that have translated the β-galactosidase open reading frame and then reinitiated at the luciferase initiation codon, luciferase levels from pBL should be comparable to those in pBeL.
  • If the expression of luciferase in pBeL is due to internal initiation directed by the EMCV IRES, there should be little to no luciferase activity in pBL-injected embryos.
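The predictions for the two control vectors amount to a small decision table. The sketch below encodes that logic as it is stated in the text; the function name, the use of activities normalized to pBeL, and the 0.5 cutoff are all hypothetical choices made for illustration, and the example values are invented placeholders rather than measured data.

```python
def infer_mechanism(bgal_phbel, luc_phbel, luc_pbl, cutoff=0.5):
    """Interpret control-vector results relative to pBeL (pBeL = 1.0).

    bgal_phbel: beta-galactosidase activity from phBeL (hairpin control)
    luc_phbel:  luciferase activity from phBeL
    luc_pbl:    luciferase activity from pBL (IRES mostly deleted)
    cutoff:     hypothetical threshold separating "retained" from "reduced"
    """
    if bgal_phbel >= cutoff:
        # The hairpin did not block the first cistron, so the control failed.
        return "inconclusive"
    if luc_phbel < cutoff:
        # Luciferase fell together with beta-galactosidase.
        return "leaky scanning"
    if luc_pbl >= cutoff:
        # Luciferase persists without the IRES.
        return "reinitiation"
    # Luciferase survives the hairpin but needs the IRES.
    return "internal initiation"

# Invented illustrative inputs, not experimental values.
print(infer_mechanism(bgal_phbel=0.05, luc_phbel=1.0, luc_pbl=0.02))
print(infer_mechanism(bgal_phbel=0.05, luc_phbel=0.05, luc_pbl=0.02))
print(infer_mechanism(bgal_phbel=0.05, luc_phbel=1.0, luc_pbl=0.9))
```

Only the outcome in which luciferase survives the hairpin but is lost on IRES deletion points to internal initiation.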
  • mRNA was transcribed in vitro using the T7 promoter present in pBeL, pBL, and phBeL (Fig. 13). Shown in Fig. 14 are the β-galactosidase and luciferase activities of pBeL, phBeL, or pBL mRNA-injected embryos at 6 hours postinjection. pBeL-injected embryos expressed significant amounts of both β-galactosidase and luciferase. This was the first indication that a dicistronic message could produce protein from both of its open reading frames in developing zebrafish embryos.
  • In phBeL-injected embryos, a hairpin structure in front of the first open reading frame, β-galactosidase, blocked its production but did not affect production of luciferase in the second cistron. Deletion of the EMCV IRES blocked the production of luciferase from the second cistron in pBL-injected embryos, but did not affect β-galactosidase production. Thus, the EMCV IRES is required for translation of luciferase from the dicistronic message pBeL in zebrafish embryos, and translation of the second cistron is not occurring by a leaky scanning or reinitiation mechanism. Immunolocalization of Dicistronic Reporters.
  • β-galactosidase and luciferase were localized by immunohistochemistry.
  • pBeL plasmid DNA was injected into or just under the cytoplasm of single cell embryos. The embryos were then fixed and immunostained. The embryos displayed highly mosaic expression patterns characteristic of DNA injections. Cells positive for β-galactosidase also stained for luciferase. Occasionally, weakly expressing cells were observed to express only one of the two reporters.
  • GFP is a powerful reporter in the optically clear embryos of the zebrafish because it allows non-invasive analysis of expression in living embryos. Embryos injected with pnBeG DNA were examined for GFP expression at 24 hours postinjection. Although the observed GFP expression was only 5-15% of what is seen when standard monocistronic GFP expression cassettes are injected into zebrafish, its expression was readily detectable. GFP was expressed in a wide variety of cells derived from ectoderm, mesoderm, and endoderm.
  • GFP Gene-trap vector construction
  • a gene-trap transposon vector has been constructed and injected into zebrafish embryos (see, e.g., Fig. 12(A)). At least one specific cell in several embryos at approximately 28 hours post-injection tested positive for the detectable marker encoded by the gene-trap, indicating that the gene-trap had transposed into a coding sequence present in the zebrafish genome.
  • pFGT/eGFP-b was formed from component fragments of pT/HB and pFV/e(nls)G.
  • The parental vector, pT/HB, was cut with the restriction endonucleases BglII and EagI. Prior to cloning, the BglII and EagI recessed ends were completely filled in using Klenow polymerase.
  • pFV/e(nls)G was cut with the restriction endonuclease NaeI and the fragment containing approximately the last 200 nucleotides of the carp β-actin intron 1, the EMCV
  • pFGT/eGFP-b has IR/DR(R) of the Sleeping Beauty transposon followed by the remnant BglII site, the 3' end of carp β-actin intron 1, the EMCV IRES, GFP, the CSGH poly(A) signal, the remnant EagI site, and the IR/DR(L) of the Sleeping Beauty transposon.
  • pT/HB was constructed from components of pBluescript KS- (Stratagene) and pT/SVNeo (Z. Ivics et al., Cell, 91(4), 501-10 (1997)).
  • pBluescript KS- was digested with the restriction endonucleases SacI and AccI; this digest removes most of the multiple cloning site found within pBluescript KS-.
  • pT/SVNeo was also cut with SacI and AccI. This digest gave two products, one of them being the SVNeo Sleeping Beauty transposon complete with both IR/DRs. The transposon piece was then cloned into pBluescript KS-. This vector, pT/HindIII-precursor, was then digested with the restriction endonuclease HindIII. This digest removed the internal portion of the transposon containing the SV40 promoter, neomycin resistance gene, and SV40 poly(A) signal. The remaining vector piece was ligated to create the plasmid pT/HindIII, a vector containing a single HindIII site between the IR/DRs of the Sleeping Beauty transposon system. pT/HindIII was then cut with XbaI. XbaI cut pT/HindIII once, and the recessed ends of this digestion were completely filled in using
  • pT/MCS-precursor was then cut with HindIII.
  • A short double-stranded oligo was ligated to produce a multiple cloning region containing restriction endonuclease sites for HindIII, EcoRV, EcoRI, SpeI, EagI, NotI, XbaI, and BglII.
  • pT/HB has the multiple cloning oligo inserted so that the sites go from HindIII to BglII with respect to the orientation of IR/DR(L) to IR/DR(R).
  • pFV/e(nls)G was formed from components of pFV3 (L. Caldovic et al., Mol. Mar. Biol. Biotechnol., 4, 51-61 (1995)) and pnBeG*.
  • pFV3 was first digested with EcoRI. This linearized pFV3 just 3' of the CSGH poly(A) signal. After digestion with EcoRI, the recessed ends of pFV3 were completely filled in using Klenow polymerase. The resultant fragment was self-ligated to form pFV3ΔRI.
  • pFV7a has the SwaI site preceding the EcoRI site in relationship to the carp β-actin promoter, carp β-actin exon 1, carp β-actin intron 1, and the CSGH poly(A) signal.
  • pnBeG* was then cut with EcoRI.
  • One of the resulting fragments of this digest contained only the EMCV IRES and GFP.
  • This fragment was then cloned into pFV7a digested with EcoRI.
  • the product that contained the EMCV IRES and GFP in the proper orientation with respect to the fish elements (i.e., promoter, exon, intron, poly(A) signal)
  • pFV/eG was then digested with the restriction endonuclease NheI, which cuts just after the initiation codon of GFP.
  • NLS2 TACTCCACCAAAGAAGAGAAAGGTGGAGGACG (SEQ ID NO:67) with CTAG 5' end overhangs
  • pFV/e(nls)G has an additional 12 amino acids (TPPKKRKVEDAS) (SEQ ID NO:68) comprising the SV40 nuclear localization signal.
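The relationship between the NLS2 oligonucleotide and the 12 added amino acids can be checked by translating the insert in frame. The sketch below assumes that the GFP start context in pFV/eG is ATGGCTAGC (the end of the IRES-GFP junction sequence quoted for pnBeG) and that NheI cuts G^CTAGC; the helper names and the way the flanks are spliced together are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Top strand of the NLS2 oligo with its CTAG 5' overhang (SEQ ID NO:67).
NLS2_TOP = "CTAG" + "TACTCCACCAAAGAAGAGAAAGGTGGAGGACG"

# Assumed flanks of the NheI cut in ATG GCTAGC: ATGG | CTAGC.
UPSTREAM = "ATGG"
DOWNSTREAM = "CTAGC"

junction = UPSTREAM + NLS2_TOP + DOWNSTREAM
protein = translate(junction)

print(junction)
print(protein)
print("contains SEQ ID NO:68:", "TPPKKRKVEDAS" in protein)
```

Under these assumptions the 36-nucleotide insert adds exactly 12 codons, and the translated junction contains the sequence given as SEQ ID NO:68.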
  • pFGT/etTA was formed from component fragments of pFGT/eGFP-b and pTet-Off (Clontech).
  • The parental vector, pFGT/eGFP-b, was cut with NcoI and SpeI. This digest removed the GFP and CSGH poly(A) signal from the remaining pFGT/eGFP-b vector.
  • the tetracycline responsive transcriptional activator (tTA) of pTet-Off was PCR mutagenized to create an
  • The pSBRNAX vector was made with component fragments from SK/β-globin 3'UTR 2a and SB10 transposase (Z. Ivics et al., Cell, 91(4), 501-10 (1997)).
  • SK/β-globin 3'UTR 2a was digested with the restriction endonuclease EcoRV.
  • The SB10 transposase was amplified by a polymerase chain reaction that incorporated a BamHI restriction site upstream of the SB10-coding sequence and an EcoRI restriction site downstream of the SB10-coding sequence, as described by Z. Ivics et al., Cell, 91(4), 501-10 (1997). This fragment was digested with BamHI and EcoRI, and the resulting single-stranded DNA overhangs were completely filled in using Klenow polymerase. The resulting 1.03-kb fragment was then ligated into the linearized SK/β-globin 3'UTR 2a. Microinjection of Zebrafish.
  • Embryos from wild-type zebrafish were used for all experiments as described (M. Westerfield, The Zebrafish Book, University of Oregon Press, Eugene, OR (1995)).
  • 3 µl of 50 µg/ml pFGT/eGFP-b DNA was mixed with 1 µl of 100 µg/ml Sleeping Beauty mRNA.
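The final concentrations in the injection mix follow from the stated volumes and stock concentrations. The sketch below is plain dilution arithmetic; the function name is illustrative, and it assumes the two solutions are simply combined with no further diluent.

```python
def mix_concentrations(components):
    """Return final concentrations (ug/ml) after pooling the given components.

    components maps a label to (volume_ul, stock_concentration_ug_per_ml).
    """
    total_volume = sum(volume for volume, _ in components.values())
    return {
        label: volume * conc / total_volume
        for label, (volume, conc) in components.items()
    }

# Volumes and stock concentrations as stated in the text.
injection_mix = {
    "pFGT/eGFP-b DNA": (3, 50),
    "Sleeping Beauty mRNA": (1, 100),
}

final = mix_concentrations(injection_mix)
for label, conc in final.items():
    print(f"{label}: {conc} ug/ml")
```

With a 4 µl total volume this works out to 37.5 µg/ml of DNA and 25 µg/ml of mRNA in the mix.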
  • pFGT/eGFP-b was injected as a supercoiled plasmid or as linear DNA.
  • The linear form of pFGT/eGFP-b was obtained by digestion with the restriction endonuclease BspHI, which has two recognition sites within the vector backbone.
  • The two resultant fragments were separated by gel electrophoresis, and the transposon-containing fragment was purified using Qiagen's gel extraction kit.
  • the Sleeping Beauty mRNA was produced using
  • Embryos were injected with linear pFGT/eGFP-b DNA and Sleeping Beauty mRNA, grown to about the 28-hour stage and illuminated with blue light.
  • GFP in selective cells (for instance, muscle pioneer cells and myotomes) emitted a green fluorescence, indicating that the transposon had integrated into a gene that was expressed in these cells.
  • SEQ ID NO:1 An SB transposase.
  • SEQ ID NO:2 Junction sequence of T/neo transposon integrated into human genomic DNA.
  • SEQ ID NO:3 Nucleic acid sequence encoding an SB protein.
  • SEQ ID NO:6 5' outer direct repeat.
  • SEQ ID NO:7 5' inner direct repeat.
  • SEQ ID NO:10 A consensus direct repeat.
  • SEQ ID NO:11 A portion of a direct repeat sequence.
  • SEQ ID NO:12-36 Oligonucleotide primer.
  • SEQ ID NO:37 Salmonid transposase-binding sites.
  • SEQ ID NO:38 Zebrafish Tdrl transposase-binding sites.
  • SEQ ID NO:39 Salmonid transposase-binding sites.
  • SEQ ID NO:40 Zebrafish Tdrl transposase-binding sites.
  • SEQ ID NO:41 Outer transposase-binding site in SB transposon
  • SEQ ID NO:42 Internal transposase-binding site in SB transposon.
  • SEQ ID NO:45-63 Junction sequence of T/neo transposon integrated into human genomic DNA.
  • SEQ ID NO:65 An adaptor.
  • SEQ ID NO:66-67 A double stranded oligonucleotide.
  • SEQ ID NO:68 SV40 nuclear localization signal.
  • SEQ ID NO:69 Oligonucleotide primer.
  • SEQ ID NO:70 Amino acids of EMCV polypeptide.
  • SEQ ID NO:71-75 Oligonucleotide primer.

Abstract

The present invention relates to transposon vectors encoding gene traps and expression control region traps. The present invention also relates to dicistronic vectors. In certain embodiments, the invention further relates to internal ribosome entry sites.
EP98957974A 1997-11-13 1998-11-13 DNA-based transposon system for the introduction of nucleic acid into the DNA of a cell Withdrawn EP1034258A2 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US6530397P 1997-11-13 1997-11-13
US65303P 1997-11-13
PCT/US1998/024348 WO1999025817A2 (fr) 1997-11-13 1998-11-13 DNA-based transposon system for the introduction of nucleic acid into the DNA of a cell

Publications (1)

Publication Number Publication Date
EP1034258A2 true EP1034258A2 (fr) 2000-09-13

Family

ID=22061760

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98957974A Withdrawn EP1034258A2 (fr) 1997-11-13 1998-11-13 DNA-based transposon system for the introduction of nucleic acid into the DNA of a cell

Country Status (5)

Country Link
EP (1) EP1034258A2 (fr)
JP (1) JP2001523450A (fr)
AU (1) AU1410399A (fr)
CA (1) CA2309000A1 (fr)
WO (1) WO1999025817A2 (fr)

Also Published As

Publication number Publication date
CA2309000A1 (fr) 1999-05-27
WO1999025817A2 (fr) 1999-05-27
AU1410399A (en) 1999-06-07
JP2001523450A (ja) 2001-11-27
WO1999025817A3 (fr) 1999-08-05
WO1999025817A9 (fr) 1999-09-10

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000609

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

RBV Designated contracting states (corrected)

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20030601