WO1998020031A1 - Method for producing tagged genes, transcripts and proteins - Google Patents

Method for producing tagged genes, transcripts and proteins

Info

Publication number
WO1998020031A1
Authority
WO
WIPO (PCT)
Prior art keywords
gene
sequence
group
dna
tagged
Prior art date
Application number
PCT/US1997/020150
Other languages
English (en)
Inventor
Jonathan W. Jarvik
Original Assignee
Jarvik Jonathan W
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jarvik Jonathan W
Priority to AU51685/98A (AU5168598A)
Priority to CA002271228A (CA2271228A1)
Publication of WO1998020031A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1065: Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags

Definitions

  • This invention relates to the fields of Molecular Biology and Molecular Genetics with specific reference to the identification and isolation of proteins and of the genes and transcripts that encode them.
  • Transposon Tagging. Another technique for cloning genes that has been developed relatively recently goes by the name transposon tagging.
  • In this technique (3), mutations due to the insertion of transposable elements into new sites in the genome are identified, and the genes in which the transposons lie can then be cloned using transposon DNA as a molecular probe.
  • Transposon tagging, like RFLP/linkage analysis, identifies genes, not proteins.
  • Enhancer Trapping. Another method for identifying genes, enhancer trapping (4), involves the random insertion into a eucaryotic genome of a promoter-less foreign gene (the reporter) whose expression can be detected at the cellular level. Expression of the reporter gene indicates that it has been fused to an active transcription unit or that it has been inserted into the genome in proximity to cis-acting elements that promote transcription. This approach has been important in identifying genes that are expressed in a cell type-specific or developmental stage-specific manner. Enhancer trapping, like RFLP/linkage analysis and transposon tagging, identifies genes, not proteins, and it does not directly reveal anything about the nature of the protein product of a gene.
  • Guest Peptides and Epitope Tagging. A number of studies have been performed in which new peptides have been inserted into proteins at a variety of positions by modifying the genes encoding the proteins using recombinant DNA technology.
  • The term "guest peptide" has been used to describe the foreign peptides in these cases. It is clear that in many cases the presence of such peptides is relatively innocuous and does not substantially compromise protein function - especially in those cases
  • Epitope tagging (5) is a method that utilizes antibodies against guest peptides to study protein localization at the cellular and subcellular levels.
  • Epitope tagging begins with a cloned gene and an antibody that recognizes a known peptide (the epitope).
  • a sequence of nucleotides encoding the epitope is inserted into the coding region of the cloned gene, and the hybrid gene is introduced into a
  • When the hybrid gene is expressed, the result is a chimeric protein containing the epitope as a guest peptide. If the epitope is exposed on the surface of the protein, it is available for recognition by the epitope-specific antibody, allowing
  • Epitope tagging serves to mark proteins of already-cloned genes but does not serve to identify genes.
  • Isolating Genes Beginning with the Proteins They Encode. A number of procedures have been developed for isolating genes beginning with the proteins that they encode. Some, such as expression library screening (6), involve the use of specific antibodies that react to the protein of interest. Others involve sequencing all or part of the protein and designing oligonucleotide probes that can be used to identify the gene by DNA/DNA hybridization. In all of these cases, one must have specific knowledge about a protein before it is possible to take steps to clone and characterize the gene that encodes it.
  • cDNA Cloning and Sequencing. A method of gene identification that has received a great deal of attention in the recent past is the cloning (and in many instances, sequencing) of so-called expressed sequence tags (ESTs) from cDNA libraries made from mRNA extracted from a given tissue or cell type (7).
  • Information about the proteins encoded by the mRNAs can be derived from the cDNA sequences by identifying and analyzing their open reading frames. In many cases such cDNAs are not full length, however, and so information about the amino-terminal portion of the protein is lacking. And, more significantly, the method tags transcript sequences and not the proteins that the transcripts encode.
  • RNA splicing is the natural phenomenon, characteristic of all eucaryotic cells, whereby introns are removed from primary RNA transcripts.
  • A large body of research has revealed that an intron is functionally defined by three components - a 5' donor site, a branch site and a 3' acceptor site (8). If these sites are present, and if the intron is not too large (it can be at least as large as 2 kb in many organisms), and if the distance between the branch and 3' acceptor sites is appropriate, the cellular splicing machinery is activated and the intron is removed from the transcript.
  • Many different natural DNA sequences are known to have splice site function; consensus sites for mammalian splicing are indicated in Figure 1 below.
  • R: purine
  • Y: pyrimidine
  • N: any base
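  • The R/Y/N shorthand above can be expanded mechanically when scanning a sequence for candidate splice sites. The following is a minimal sketch, not the inventor's procedure: Figure 1 is not reproduced in this text, so the motif `GTRAGT` (a commonly cited core of the 5' donor consensus) and the test sequence are assumptions chosen for illustration, and the function names are hypothetical.

```python
import re

# Degenerate-base codes from the key above; plain bases map to themselves.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}


def consensus_to_regex(consensus):
    """Translate a consensus string such as 'GTRAGT' into a regex source string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(consensus, dna):
    """Return 0-based start positions of every match, including overlapping ones."""
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    lookahead = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in lookahead.finditer(dna.upper())]


if __name__ == "__main__":
    # Hypothetical sequence; 'GTRAGT' is an assumed donor-like motif.
    print(find_sites("GTRAGT", "ccaggtaagtccgtgagtaa"))
```

  • A match only flags a candidate; as the text notes, splicing also depends on intron size and on the spacing between the branch and acceptor sites, which this sketch does not model.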
  • Gene Trapping. Gene trapping is a method used to identify transcribed genes.
  • Gene trapping vectors carry splice acceptor sites directly upstream of the coding sequence for a reporter protein such as β-galactosidase.
  • When the vector inserts into an intron of an actively transcribed gene, the result is a protein fusion between an N-terminal fragment of the target gene product and the reporter protein, the activity of which is used as an indicator that integration into an active gene has occurred (9).
  • Gene trapping seeks to identify transcribed genes - not to tag proteins, and to inactivate genes - not to produce an active tagged gene product.
  • The so-called central dogma of genetics states that information flows from DNA to RNA to protein.
  • The method of this invention tags each of the classes of macromolecule included in the central dogma. Accordingly, the method is referred to herein as "CD-tagging."
  • CD-DNA is used herein to refer to a DNA molecule that is inserted into the genome using the method of this invention.
  • CD-tagging has just this feature.
  • cells can be treated with a CD-DNA, or with DNA constructs containing a CD-DNA, and then subjected to immunological screens or selections to identify the epitope tag. Many different screens or selections are possible, each of which has its own particular advantages.
  • Screens and selections for tagged proteins include direct or indirect immunofluorescence, by which tagged proteins can be localized to particular regions or subcellular structures within a cell; immunoblot analysis, by which the abundance, molecular weight and isoelectric points of tagged proteins can be determined; enzyme-linked immunoassays (ELISAs), by which internal or secreted tagged proteins can be distinguished; and fluorescence-activated cell sorting (FACS), by which living cells with tagged proteins at their surfaces can be obtained.
  • Once proteins and genes of interest have been identified, they can be efficiently purified using standard hybridization and/or affinity-purification methods that take advantage of their specific tags.
  • CD-tagging depends on the insertion of a CD-DNA into an intron. Since higher eucaryotic genes contain much more intron than exon sequence, the target size is large relative to any other tagging method in which the DNA must insert into an exon. Further, since the typical gene contains numerous introns, the boundaries of which determine the sites at which amino acid insertions in the protein can be produced by CD-tagging, it is likely that for a given protein there exist multiple sites at which peptide tags produced by CD-DNA insertions would not seriously compromise protein function.
  • epitope fusion proteins have normal, or nearly normal, activity. But even this is not a requirement in order for CD-tagging to be useful in identifying proteins and their genes because in many applications one or more copies of the normal gene will be present in addition to the tag-containing gene (e.g., when diploid cells are tagged); here the tagged protein need not be fully functional as long as it can, for example, co-assemble at its normal location along with the protein encoded by the unaltered gene.
  • A DNA representing a portion of mRNA encoding the protein can be obtained by standard techniques such as plasmid rescue or amplifying the sequence of interest from cDNA by means of the polymerase chain reaction (PCR) using poly-dT as one primer and a DNA complementary to the tag-encoding sequence as the other.
  • The amplified DNA can then be sequenced by standard methods. Knowledge of the sequence can then be used to design primers for amplification from genomic DNA in order to obtain genomic sequence information.
  • One important application for CD-tagging is to identify proteins, and the genes encoding them, that are present in particular subcellular structures. This can be done by screening CD-DNA recipients for those that express the protein tag in the structure of interest. A significant advantage of this approach is that it does not depend upon the purification of the structure of interest, or even on the prior existence of a method for such purification, as traditional methods for characterizing subcellular structures do.
  • CD-tagging holds the promise of identifying new structures, and the proteins they contain, that have not been explicitly recognized before.
  • CD-tagging can be used to identify proteins, and the genes encoding them, whose synthesis is stimulated by a particular treatment, such as the administration of a particular hormone or growth factor to a particular cell type. This can be accomplished by comparing treated and untreated cells to identify proteins whose levels change in response to the treatment. And, using standard immunocytochemical methods, one can discriminate among such proteins to identify those that are secreted, localized to the cell surface, or present in particular subcellular compartments.
  • Viral infection often leads to specific changes in cellular gene expression.
  • Cellular genes whose expression is up- or down-regulated can be identified by comparing the levels of tagged proteins in infected versus uninfected cells.
  • If the viral genome is tagged, the expression of viral proteins during the viral life cycle can be observed.
  • CD-tagging provides an efficient general method to directly identify new genes on the basis of their expression as proteins and on the basis of the location of those proteins in particular cellular or extracellular structures.
  • CD-tagging provides a method for efficient physical and/or RFLP mapping of genes, as well as a method for the isolation of genes and transcripts via their nucleic acid tags and for the efficient purification of proteins via their epitope tags.
  • CD-tagging has specific advantages over the prior art method for identifying and mapping genes using expressed sequence tags.
  • ESTs are cDNA sequences, not genomic sequences.
  • An EST probe will hybridize not only to the true gene but to any pseudogenes that are present in the genome, thereby limiting its usefulness for mapping and cloning the true gene.
  • An EST probe may hybridize with closely related members of a gene family, again limiting its usefulness as a probe for a unique sequence.
  • CD-tagging has broad application to the analysis and diagnosis of disease.
  • CD-tagging makes it possible to demonstrate, through linkage analysis, that a defect with respect to a given protein represents the primary defect for a given genetic disease or cancer. The function of the protein can then be examined in detail to gain new understanding of the biology of the disease.
  • Genes that are isolated using CD-tagging can provide probes to identify disease-associated restriction fragment length polymorphisms, and they can provide primers by which mutations responsible for genetic diseases could be precisely identified. Once such polymorphisms or mutations have been identified, diagnostic tests for the presence of mutant alleles in homozygous or heterozygous individuals can be developed using standard approaches.
  • Proteins that are isolated using the invention can be used as antigens to develop antibodies that can be used to make molecular diagnoses for a particular genetic disease.
  • Genes or proteins that are identified using CD-tagging could be used to treat a wide variety of infectious and non-infectious diseases.
  • The invention utilizes a "CD-DNA" molecule that contains acceptor and donor sites for RNA splicing. Between the acceptor and donor sites is a sequence of nucleotides that encodes a particular peptide (or set of three peptides, one for each possible reading frame).
  • When the CD-DNA is inserted into an existing intron, it creates a new peptide-encoding exon surrounded by two hybrid, but functional, introns. The result is that, after transcription, RNA splicing and translation, a protein is produced that contains the peptide located precisely between the amino acids encoded by the exons that surrounded the target intron.
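  • The effect of inserting a CD-DNA into an intron can be pictured with a toy model. This is only a sketch under simplifying assumptions: a gene is represented as a list of labelled segments, splicing is reduced to "keep the exons, drop the introns", and the splice sites carried by the CD-DNA are not modelled separately (they are treated as part of the two hybrid introns). All sequences and function names are hypothetical.

```python
def insert_cd_dna(gene, intron_index, guest_exon):
    """Insert a guest exon into one intron, turning it into two hybrid introns.

    `gene` is a list of (kind, sequence) tuples, with kind "exon" or "intron".
    """
    kind, seq = gene[intron_index]
    if kind != "intron":
        raise ValueError("in this model the CD-DNA must land in an intron")
    cut = len(seq) // 2  # arbitrary insertion point inside the intron
    return (
        gene[:intron_index]
        + [("intron", seq[:cut]), ("exon", guest_exon), ("intron", seq[cut:])]
        + gene[intron_index + 1:]
    )


def splice(gene):
    """Return the mature transcript: exons joined in order, introns removed."""
    return "".join(seq for kind, seq in gene if kind == "exon")


if __name__ == "__main__":
    # Hypothetical two-exon gene with a single intron.
    gene = [("exon", "ATGGCC"), ("intron", "gtaagtttttcag"), ("exon", "GGTTAA")]
    tagged = insert_cd_dna(gene, 1, "GACTAC")
    print(splice(gene))    # untagged transcript
    print(splice(tagged))  # guest exon now sits between the two original exons
```

  • In the model, the guest sequence always ends up exactly at the junction of the two exons that flanked the target intron, regardless of where inside the intron the insertion occurred, which is the property the paragraph above describes.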
  • the peptide encoding segments will generally be between 24 and 75 nucleotides in length so as to encode
  • This invention provides a method for tagging proteins and the genes and transcripts that encode them in a single recombinational event.
  • The method involves the insertion by in vitro or in vivo recombination of a specially chosen and/or designed DNA sequence into an intron that is expressed within the genome of a cell or organism.
  • This DNA sequence carries: 1) coding information for one or more specific peptides, typically, but not necessarily, from eight to twenty-five amino acids in length, and 2) appropriately placed branch, acceptor and donor sites for RNA splicing.
  • the nucleotide sequences representing the branch, acceptor and donor sites may represent natural sites taken from known genes or they may be rationally designed based on current knowledge of the nucleotide compositions of such sites (8) .
  • Figures 2-8 show the structures of a number of different embodiments of the invention.
  • a key and essential feature of these embodiments is that, when inserted into existing introns, they instruct the splicing machinery of the cell to recognize more than one intron where there was previously one, with these new introns flanking a new exon, or exons, encoding a peptide, or peptides, of determined amino acid sequence.
  • All of these embodiments can be readily produced by an individual skilled in the arts of molecular biology. I have not specified the specific means by which the embodiments are constructed because there are numerous ways, well known to an individual skilled in the arts of molecular biology, by which this can be accomplished.
  • Figure 2 represents a simple embodiment of the invention.
  • The DNA is designed to function when inserted into an intron that is transcribed from left to right. It has a peptide-encoding segment between splice acceptor and donor sites. Within the left arm is a splice branch site.
  • The size and nucleotide sequence of the peptide-encoding region determine the size and amino acid sequence of the encoded peptide, with the amino acid sequence of the peptide determined by the rules of the genetic code.
  • The number of nucleotide pairs in the peptide-encoding region must be an exact multiple of three to ensure that the reading frame is maintained with respect to the surrounding exons.
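  • The multiple-of-three requirement, together with the typical sizes quoted elsewhere in this description (eight to twenty-five amino acids, 24 to 75 nucleotides), reduces to simple arithmetic. A minimal sketch follows; the function name is hypothetical and the check covers only the length, not the sequence content.

```python
def guest_peptide_length(n_nucleotides):
    """Return the number of amino acids encoded by a frame-preserving guest exon.

    Raises ValueError if the length is not a multiple of three, because such an
    exon would shift the reading frame of the downstream exons.
    """
    if n_nucleotides <= 0 or n_nucleotides % 3 != 0:
        raise ValueError(f"{n_nucleotides} nt does not preserve the reading frame")
    return n_nucleotides // 3


if __name__ == "__main__":
    for nt in (24, 75):
        print(nt, "nt ->", guest_peptide_length(nt), "amino acids")
```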
  • Figures 3, 4 and 5 represent embodiments designed to function when inserted into an intron in either orientation.
  • Figure 6 represents a circular embodiment of the invention.
  • This embodiment could, for example, be a plasmid that contains DNA encoding the guest peptide.
  • Figure 7 represents an embodiment incorporating a gene, or genes, that could allow for selection in a target cell.
  • the gene is intron-less so that it does not contribute splice sites.
  • Figure 8 represents a circular embodiment of the invention containing two peptide-encoding segments.
  • Figures 2 through 8 represent some, but by no means all, possible embodiments of the invention.
  • CD-DNAs containing more than two segments encoding guest peptides can be designed; such CD-DNAs could be relatively large and yet not lead to the generation, in the target gene, of new introns that are excessively large for efficient splicing.
  • branch sites are less critical to splicing function than the acceptor and donor sites, in which case an effective embodiment of the invention might be created without specific branch sites.
  • the design of the CD-DNA is such that when it is inserted into an existing intron, it creates, within the intron, a new peptide-encoding exon.
  • The result is that, after transcription, RNA splicing and translation, a protein is produced that contains the peptide located precisely between the amino acids encoded by the exons that surrounded the target intron.
  • 1) the gene encoding the protein is tagged by the CD-DNA sequence for recognition by a DNA probe or primer;
  • 2) the RNA transcript encoding the protein is tagged by the peptide-encoding sequence for recognition by a DNA probe or primer;
  • 3) the protein is tagged by the peptide for recognition by a specific antibody or other reagent.
  • Figure 9 illustrates the structure of the DNA that results from the integration of a linear CD-DNA within an intron by recombination at its ends. When transcribed, this DNA yields an RNA that is spliced to produce an mRNA encoding a protein that contains a guest peptide located precisely between the protein segments encoded by the exons that bound the target intron.
  • Figure 10 illustrates the structure of the DNA that results from the integration of a circular CD-DNA within an intron by a single crossover.
  • When transcribed, this integrated DNA yields an RNA that is spliced to produce an mRNA encoding a protein that also contains a guest peptide (in this case encoded in two guest exons) located precisely between the protein segments encoded by the exons that bound the target intron.
  • Integration of a CD-DNA can be accomplished in a number of ways.
  • One approach involves the introduction of CD-DNA into cells by standard methods such as transformation, electroporation, transfection, bulk loading, or liposome fusion, followed by nonhomologous recombination of the CD-DNA into the genome.
  • the occurrence of such recombination is well known in many cell types; sometimes the integration of foreign DNA is accompanied by a small deletion of the target sequence, but, as long as such a deletion remains within the intron, it will present no problem.
  • In another approach, the CD-DNA is inserted by standard in vitro recombination methods into a genomic library in a viral or plasmid vector, and the recombinant plasmids or viruses are then introduced into cells where the recombinant genes are expressed.
  • Yet another approach takes advantage of the mobility of transposons; in this case the CD-DNA is located on a transposon that moves it to new sites in the genome via transposon insertion.
  • the peptide that is introduced into a protein is an epitope that is recognized by a specific monoclonal or polyclonal antibody.
  • In principle, almost any amino acid sequence not present in the cells of interest could serve as such an epitope. And, while there may not be a single "optimal" epitope, epitope design could still follow a rational basis. In most cases, it would be valuable for the epitope to be on the surface of the protein where: 1) it would be readily available to the antibody combining site, and 2) it would minimally disrupt the tertiary structure of the protein as a whole.
  • hydrophilic epitopes except in the case of integral membrane proteins, where hydrophobic epitopes can be employed. If a single repeating nucleotide is used to encode the epitope, it will yield the same poly-amino acid epitope in all three reading frames; a repeating dinucleotide will encode two potential poly-amino acid epitopes, and a repeating trinucleotide, three such epitopes.
  • a somewhat more complex repeating sequence can be used to encode repeating di-amino acid epitopes, and still more informationally complex sequences can be used to create epitopes of a very wide variety of amino acid sequences, with the only obvious requirement being the absence of stop codons in the reading frames.
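  • The behaviour of repeating sequences across reading frames, and the requirement that no frame contain a stop codon, can be checked by translating a candidate sequence in all three frames. The sketch below assumes the standard genetic code and uses hypothetical function names and example repeats; it illustrates the mononucleotide and trinucleotide cases only.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate the complete codons of `dna` starting at offset `frame` (0, 1 or 2)."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def frame_peptides(dna):
    """Return the peptides read in each of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


def has_stop(dna):
    """True if any forward reading frame contains a stop codon ('*')."""
    return any("*" in peptide for peptide in frame_peptides(dna))


if __name__ == "__main__":
    for repeat in ("A" * 30, "CAG" * 10, "TAG" * 10):
        peptides = frame_peptides(repeat)
        print(repeat[:6] + "...", [sorted(set(p)) for p in peptides], has_stop(repeat))
```

  • Under these assumptions a single repeated nucleotide gives the same homopolymer in every frame, a repeated trinucleotide gives a different homopolymer in each frame, and a repeat such as TAG is ruled out because one of its frames reads as a stop codon.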
  • Some CD-DNAs (Figures 3, 4, 5) contain peptide-encoding sequences that can be read in both directions; in these cases as many as six distinct epitopes can be encoded on the same CD-DNA. Which epitope appears in the protein will then depend on the orientation of the CD-DNA as well as the reading frame that is dictated by the specifics of the intron/exon boundaries of the target intron.
  • In addition to epitopes that are designed according to the principles outlined above, other epitopes exist, such as hemagglutinin sequences from influenza virus, micro-exon 1 encoded sequence from the Ubx gene of Drosophila, or sequences encoded by the myc oncogene, that have already proved their worth in epitope tagging. These very sequences can be used in embodiments of CD-tagging, thereby ensuring that the guest peptides can be identified by standard procedures.
  • RNA splicing is a universal characteristic of eucaryotic cells
  • CD-tagging is applicable to a very wide variety of cells and organisms, including yeasts, protozoans, algae, metazoans (both plant and animal), and somatic and germline cells derived from metazoan organisms.
  • Because the nucleotide sequences that are necessary and sufficient for splicing are highly conserved across the eucaryotes, it is likely that in many cases the same CD-DNA will function in a variety of cell types and organisms. This is not to say, however, that a given CD-DNA will function optimally in every cell type or organism, and so it may prove useful to develop different CD-DNAs for use in different backgrounds.
  • The optimal CD-DNA would typically be one in which splicing of the hybrid transcript always occurs.
  • One way to maximize the likelihood of this is to construct the CD-DNA using nucleotide sequences that are known to function in the very background in which the tagging is to be performed.
  • CD-tagging is a molecular-genetic method that adds specific tags to gene, mRNA and protein in a single recombinational event.
  • the CD-cassette or cassettes can be delivered directly to cells by transfection or transformation, or they may be incorporated into delivery vectors such as viruses or transposons. Using the CD-tagging method, establishing the correspondence between gene and protein in gene discovery is dramatically simplified due to the fact that gene and gene product are discovered together.
  • CD-tagging targets introns using one or more CD-cassettes that contain intronic splice branch, acceptor and donor sites surrounding an internal exon (Jarvik et al. 1996).
  • Jarvik et al. 1996) describes the structure and use of genetic elements that, when incorporated in the appropriate intronic portions of tandem CD-cassettes, allow one to create or remove frameshift mutations and thereby gain critical information about gene function.
  • Recombinase. Two extensively characterized site-specific recombination systems are the Cre recombinase and its target lox site (ataacttcgtataatgtatgctatacgaagttat), and the FLP recombinase and its target FRT site (GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTC), but other systems exist as well, such as pSR1 from Zygosaccharomyces rouxii. Recombinase can be provided to CD-tagged cells in a number of ways.
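  • The two target sites quoted above can be sanity-checked before they are ordered as oligonucleotides. The sketch below assumes the commonly described layout of such sites, two 13-base inverted-repeat arms flanking an 8-base spacer; that layout is general background and is not stated in this text, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Site sequences as quoted in the text (whitespace removed).
LOX = "ataacttcgtataatgtatgctatacgaagttat"
FRT = "GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTC"


def revcomp(seq):
    """Return the reverse complement of a DNA sequence, in upper case."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_site(site, arm=13, spacer=8):
    """Split a site into (left arm, spacer, right arm); the 13-8-13 layout is assumed."""
    site = site.replace(" ", "").upper()
    if len(site) != 2 * arm + spacer:
        raise ValueError(f"expected {2 * arm + spacer} nt, got {len(site)}")
    return site[:arm], site[arm:arm + spacer], site[arm + spacer:]


def arm_mismatches(site):
    """Count positions where the left arm differs from the reverse-complemented right arm."""
    left, _, right = split_site(site)
    return sum(a != b for a, b in zip(left, revcomp(right)))


if __name__ == "__main__":
    for name, site in (("lox", LOX), ("FRT", FRT)):
        left, spacer, right = split_site(site)
        print(name, left, spacer, right, "arm mismatches:", arm_mismatches(site))
```

  • A length other than 34 nt, or an unexpected number of arm mismatches, is a quick signal that a site was mistyped; the asymmetric spacer is also what gives each site its orientation.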
  • A gene encoding the recombinase can be delivered to the tagged cell by transfection or by infection with a recombinant virus containing the gene (e.g., pAdv/Cre, Wang et al. (1996)).
  • The recombinase gene can be provided by crossing a transgenic animal carrying the CD-tagged gene to an animal that expresses recombinase; excision of the exon will then occur in those cells of the zygote in which recombinase is expressed (Lasko et al. (1992, 1996); Gu et al. (1994); Rajewsky et al. (1996)).
  • When tandem CD-cassettes are present in a gene, and when one of the CD-cassettes contains a pair of site-specific recombinase target sites surrounding its guest exon, then when recombinase activity is expressed in a cell containing the tagged gene, the result is excision of the exon surrounded by the sites.
  • the two guest exons are designed to encode compensatory frameshift mutations; in particular, one guest exon contains 3N+1 nucleotides (where N is a whole number) and the other contains 3N-1 nucleotides.
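  • The compensating-frameshift logic is modular arithmetic on exon lengths. The sketch below is illustrative only: the function names are hypothetical, and the 65 nt and 64 nt values are the guest exon lengths reported elsewhere in this description for the two GeneFinder-1 cassettes.

```python
def frame_offset(guest_exon_lengths):
    """Return the net reading-frame offset (0, 1 or 2) added by the guest exons."""
    return sum(guest_exon_lengths) % 3


def in_frame(guest_exon_lengths):
    """True if the downstream exons are still read in their original frame."""
    return frame_offset(guest_exon_lengths) == 0


if __name__ == "__main__":
    tandem = [65, 64]        # 3N-1 and 3N+1 guest exons present together
    after_excision = [65]    # one guest exon removed by the recombinase
    print("tandem cassettes in frame:", in_frame(tandem))
    print("after excision in frame:", in_frame(after_excision))
```

  • With both exons present the offsets cancel (129 nt is a multiple of three); removing either one leaves a net shift, which is the frameshift mutation the recombinase step is meant to create.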
  • In CD-1+, an adenosine is added between C-168 and T-169.
  • In CD-1-, C-168 is deleted.
  • Oligonucleotides containing lox sites are synthesized by standard methods and inserted into CD-1+ at the ClaI site.
  • Chlamydomonas pf14 gene in plasmid pKE-RS3 following the procedures described in Jarvik et al. (1996).
  • A doubly tagged plasmid with the CD-1+ cassette upstream of the CD-1- cassette at the NsiI site is identified and named pRS03+/-.
  • pRS03+/- is transformed into the cre-expressing E. coli strain NS3516, and plasmid is isolated from a clone of transformed cells and shown by sequencing analysis to have lost the CD-1+ cassette and to retain a single lox site.
  • This plasmid is named pRS03-/cre.
  • Chlamydomonas cells carrying a pf14 ochre mutation are transformed with plasmid pKE-RS3 (Jarvik et al. (1996)), plasmid pRS03+/- and plasmid pRS03-/cre. Cells that contain the plasmid DNA are identified by PCR analysis.
  • The cells containing the wild type pf14 gene (plasmid pKE-RS3) and those that contain plasmid pRS03+/- are observed to have acquired motile flagella, indicating that the tagged RSP3 protein expressed from the pRS03+/- DNA is functional.
  • Immunofluorescence analysis with antibody 12CA5 shows immunostaining of the flagella in the transformants, and Western blot analysis shows the presence of a protein about 4 kD larger than native RSP3 (pf14 gene product).
  • The transformants that contain the pRS03-/cre plasmid are not motile and their flagella are not immunostained with antibody 12CA5, indicating that the cells do not contain functional RSP3.
  • The new vector GeneFinder-1 is designed so that once a gene is tagged one can readily produce a frameshift mutation in it in vivo using FLP-recombinase.
  • GeneFinder-1 carries two epitope-encoding CD-cassettes, with the 5' exon 3N-1 nucleotides in length and the 3' exon 3N+1 nucleotides.
  • Surrounding the downstream exon are FRT sites that serve to delete the exon in vivo when the strain is crossed to one expressing FLP-recombinase (Golic and Lindquist (1989)).
  • The result will be a frameshift mutation in the gene.
  • The FRT sites will be situated so that the vermilion gene is deleted as well, allowing us to readily identify individuals that have deleted the DNA between the FRT sites on the basis of eye color.
  • Jarvik et al. (1996)) is opened at its SacI site near the 5' end of the guest exon and ligated to a 20-fold molar excess of the two synthetic 11-mers, 5' caattggagct 3' and 5' ccaattgagct 3' (which base pair to form a SacI-to-SacI linker with an internal MunI site).
  • The ligated DNA is cut with MunI, religated, and transformed into E. coli.
  • Plasmids are prepared from Ampr colonies and tested for the presence of a MunI restriction site at the former SacI site.
  • The guest exon in the 5' CD-cassette is 65 nt (3N-1) in length.
  • pCD-1 (Jarvik et al. (1996)) is opened at the BglII site near the 3' end of its guest exon and ligated to a 20-fold molar excess of the synthetic 10-mer oligonucleotide 5' gatcccatgg 3' (which base pairs with itself to form a BglII-to-BglII linker with an internal NcoI site).
  • The ligated DNA is cut with NcoI, religated, and transformed into E. coli.
  • Plasmids are prepared from Ampr colonies and tested for the presence of an NcoI restriction site at the former BglII site.
  • The guest exon in the 3' CD-cassette is 64 nt (3N+1) in length.
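  • The linker oligonucleotides above can be checked on paper or in a few lines of code before synthesis. The sketch below tests that each pair anneals into a duplex core with the stated single-stranded ends and that the core carries the intended internal site. The recognition sequences (MunI CAATTG, NcoI CCATGG) and the overhangs left by SacI (3' AGCT) and BglII (5' GATC) are standard reference values supplied here as assumptions, not taken from this text; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence, in lower case."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def duplex_core(top, bottom, overhang_len, overhang_end):
    """Return the double-stranded core formed by two annealed oligos, or None.

    Both oligos are written 5' to 3'. `overhang_end` is "3" or "5" and names the
    end of each strand that is left single-stranded after annealing.
    """
    top, bottom = top.lower(), bottom.lower()
    if overhang_end == "3":
        core_top, core_bottom = top[:-overhang_len], bottom[:-overhang_len]
    elif overhang_end == "5":
        core_top, core_bottom = top[overhang_len:], bottom[overhang_len:]
    else:
        raise ValueError("overhang_end must be '3' or '5'")
    return core_top if revcomp(core_top) == core_bottom else None


if __name__ == "__main__":
    # SacI-to-SacI linker from the two 11-mers (3' overhangs).
    sac_core = duplex_core("caattggagct", "ccaattgagct", 4, "3")
    print("SacI linker core:", sac_core, "MunI site present:", "caattg" in sac_core)

    # BglII-to-BglII linker from the self-complementary 10-mer (5' overhangs).
    bgl_core = duplex_core("gatcccatgg", "gatcccatgg", 4, "5")
    print("BglII linker core:", bgl_core, "NcoI site present:", "ccatgg" in bgl_core)
```

  • Each linker adds its full oligonucleotide length to the guest exon (11 nt and 10 nt respectively), which is how two cassettes built from the same starting exon end up differing by one nucleotide.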
  • GeneFinder-1 element begins with plasmid pYC1.8, which contains a 1.8 kb vermilion gene surrounded by P element ends (Fridell and Searles (1991)).
  • A 34 nt FRT sequence with HindIII sticky ends, obtained by combining two 38 nt oligonucleotides, is inserted in the HindIII site upstream of the vermilion insert.
  • The 5' CD-cassette is cut with EcoRI and the fragment is inserted at the polylinker EcoRI site of the pYC1.8 derivative. Recombinant plasmids are recovered and tested to identify one with the CD-cassette oriented opposite to the direction of transcription of vermilion.
  • This plasmid is opened with SalI and the SalI fragment of the 3' cassette inserted to produce a plasmid with tandem CD-cassettes.
  • An FRT site is inserted into the 3' cassette at the PacI site.
  • nptI (Kanr) ColE1-ori fragment from plasmid pUC4K is inserted into the NotI site to produce the complete GeneFinder-1 vector.
  • The plasmid is injected along with the transposase-donating plasmid pp25.7 into two sets of Drosophila embryos: a white mutant to identify transformants on the basis of white expression, and a vermilion, rosy mutant (v36f ry506) to identify transformants on the basis of vermilion expression.
  • White+ transformant embryos are tested for expression of epitope-tagged Ubx protein after crossing to a GAL4-expressing strain (Jarvik et al. (1996)).
  • Epitope-tagged Ubx protein is observed, indicating that GeneFinder is a functional CD-tagging vector.
  • vermilion+ transformants are also observed, indicating that the vermilion gene is expressed from GeneFinder.
  • v+ transformants express epitope-tagged Ubx protein, indicating that expression of vermilion does not interfere with expression of the guest exons from the opposite DNA strand.
  • GeneFinder-1 DNA is injected into v36f y1 ry506 embryos along with the transposase-donating plasmid pπ25.7.
  • v36f y+ ry506 males are crossed to v36f y+ ry506 males.
  • The male progeny (which can be recognized immediately by their yellow body color due to the X-linked y1 mutation in addition to the normal sexually dimorphic characters) are primarily of two types: those that have received the TMS second chromosome and have scarlet eyes due to the v36f allele on X and the ry+ gene in the Δ2-3 P element, and those that have received the second chromosome with the ry506 mutation and therefore have peach eyes.
  • Sb+ males that express v+.
  • GeneFinder to X are tested to find one that transposes to other chromosomes with relatively high frequency under the influence of Δ2-3.
  • The first cross is performed as described above and several virgin female progeny are placed in individual bottles along with a similar number of v36f y+ ry506 males. Approximately fifteen days later the progeny in each bottle are examined for yellow non-stubble males with dark red eyes. Individual lines are established from these animals by crossing to a v36f strain. To ensure that all transposition lines are independent, only one line is established from each bottle.
  • Frameshift mutations are created beginning with the homozygous GeneFinder transposition lines, as follows. First they are crossed to a strain that carries v- and ry- mutations and FLP38, a chromosome 3 MKRS balancer within which resides a P element with an ry+ marker gene and a FLP-recombinase gene under the control of the hsp70 heat shock promoter (Chou and Perrimon (1992)). The recombinase used here is FLP, acting on its target FRT site (GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTC), but other systems exist as well, such as pSR1 from Zygosaccharomyces rouxii.
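The FRT sequence quoted above can be examined programmatically. This is a minimal sketch under stated assumptions: the 13/8/13 split into two inverted repeats flanking an 8-nt spacer is the commonly described layout of the minimal FRT site, not something the text states, and the helper names are hypothetical. The 4-nt figure assumes the standard HindIII 5' overhang (a^agctt).

```python
# The FRT sequence is quoted from the text; the 13/8/13 split is an
# assumption based on the usual description of the minimal FRT site.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


FRT = "GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTC"
left_arm, spacer, right_arm = FRT[:13], FRT[13:21], FRT[21:]

# Inverted repeats: the left arm should match the reverse complement of
# the right arm, apart from any mismatches counted here.
mismatches = sum(a != b for a, b in zip(left_arm, reverse_complement(right_arm)))

# A 34-nt site plus one 4-nt HindIII 5' overhang per oligonucleotide.
HINDIII_OVERHANG = 4
oligo_length = len(FRT) + HINDIII_OVERHANG

print(len(FRT), left_arm, spacer, right_arm, mismatches, oligo_length)
```

With the sequence as quoted, the site is 34 nt long and the two arms are inverted repeats that differ at one position. Adding one 4-nt overhang gives 38 nt, which agrees with the "two 38 nt oligonucleotides" used to build the HindIII-ended FRT fragment.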
  • Recombinase can be provided to CD-tagged cells in a number of ways.
  • A gene encoding the recombinase can be delivered to the tagged cell by transfection or by infection with a recombinant virus containing the gene.
  • The recombinase gene can be provided by crossing a transgenic animal carrying the CD-tagged gene to an animal that expresses recombinase; excision of the exon will then occur in those cells of the zygote in which recombinase is expressed.
  • When tandem CD-cassettes are present in a gene, and one of the CD-cassettes contains a pair of site-specific recombinase target sites surrounding its guest exon, expression of recombinase activity in a cell containing the tagged gene results in excision of the exon surrounded by the sites.
  • the two guest exons are designed to encode compensatory frameshift mutations; in particular, one guest exon contains 3N+1 nucleotides (where N is a whole number) and the other contains 3N-1 nucleotides.
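The compensatory-frameshift design reduces to arithmetic modulo 3. The sketch below is illustrative only: the function name is hypothetical, and the lengths are the 64 nt (3' cassette) and 65 nt (5' cassette) guest exons mentioned elsewhere in this description.

```python
def reading_frame_shift(guest_exon_lengths):
    """Net shift (0, 1 or 2 nt) imposed on codons downstream of the inserted exons."""
    return sum(guest_exon_lengths) % 3


FIVE_PRIME_EXON = 65   # 3N-1 with N = 22
THREE_PRIME_EXON = 64  # 3N+1 with N = 21

# Both guest exons present: 65 + 64 = 129 = 3 * 43, so the frame is preserved.
shift_tagged = reading_frame_shift([FIVE_PRIME_EXON, THREE_PRIME_EXON])

# After recombinase excises the 3' guest exon, only the 65-nt exon remains.
shift_after_excision = reading_frame_shift([FIVE_PRIME_EXON])

print(shift_tagged, shift_after_excision)
```

A result of 0 means downstream codons are read in their original frame; any other value means a frameshift. The same function covers the reversed arrangement, where one guest exon has 3N nucleotides and the excisable one has 3N+1 or 3N-1, by changing the input lengths.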
  • Oligonucleotides containing lox sites are synthesized by standard methods and inserted into CD-1- at the ClaI site (position 74) and at the SphI site (position 203) by standard methods.
  • The modified CD-1+ and CD-1- are inserted in tandem into the NsiI site in intron 3 of the Chlamydomonas pf14 gene in plasmid pKE-RS3 following the procedures described in Jarvik et al. (1996).
  • A doubly tagged plasmid with the CD-1+ cassette upstream of the CD-1- cassette at the NsiI site is identified and named pRS03+/-.
  • pRS03+/- is transformed into the cre-expressing E. coli strain NS3516, and plasmid is isolated from a clone of transformed cells.
  • Progeny carrying GeneFinder and the FLP38 chromosome are identified on the basis of their wild type eye color and the dominant markers on the balancer. These animals are subjected to a heat shock regime (Golic and Lindquist (1989)) and allowed to mate inter se. F2 progeny are screened to identify those with ry+ v- eye color; these animals have lost the 3' guest exon by recombination between their FRT sites. As a result, the CD-tagged mRNA is now frameshifted beginning immediately downstream of the 65 nt 5' guest exon; in the great majority of cases this inactivates the gene product and produces a recessive mutation.
  • If the frameshift mutation is a dominant lethal, of course, there is a failure to obtain ry+ v- F2 animals altogether, and if it has a dominant visible phenotype it is apparent by inspection of the mutants.
  • the ry+ v- flies are crossed inter se and their progeny inspected.
  • the presence of ry- v- progeny in the F3 indicates that the mutation is not homozygous lethal; in such cases adults, larvae and embryos are observed closely to see if there is a visible nonlethal phenotype.
  • the fertility of these flies is also examined, because some mutations are male or female sterile.
  • The absence of ry- v- progeny in the F3 indicates that the mutation is a lethal. In these cases the pupal, larval and embryonic stages are examined closely to identify the lethal stage and to determine the way in which the defect is expressed morphologically.
  • Because the truncated protein resulting from the frameshift mutation retains a guest epitope, it is worthwhile to immunostain the mutant organisms, including, in the recessive lethal cases, those that are dying or destined to die. Immunostaining is particularly informative in the cases where the original CD-tagged protein showed tissue or organ specific expression, since the truncated protein, though inactive, serves to mark the very tissues in which its function is required.
  • the mutant analysis has an additional formal virtue. For each gene for which FLP-recombinase creates a recessive lethal mutation, it can be concluded that the original CD-tagged gene did in fact retain activity. Thus, the mutant data will allow us to reach explicit conclusions about the frequency with which CD-tagging a gene does, or does not, destroy its function.
  • the CD-tagged gene is initially tagged with a construct that does not alter the translational reading frame, and, by subsequent provision of recombinase activity, a frameshift is created. But the situation can be readily reversed, i.e., the tagging construct can create a frameshift, and subsequent provision of recombinase can remove it, leaving a functional CD-tagged gene. This is accomplished in the following manner.
  • the CD-tagging construct has two tandem CD-cassettes, as before, but now one of the guest exons has 3N+1 or 3N-1 nucleotides and the other has 3N. Recombinase target sites are provided flanking the 3N+1 or 3N-1 exon.
  • The guest exon that is excised by recombinase could encode an enzymatic activity (e.g., neomycin phosphotransferase or beta-galactosidase) or some other function (e.g., Green Fluorescent Protein or a substrate for biotin ligase activity), or it could contain translational stop codons.
  • This invention describes a method for tagging a gene, its transcript and its protein in a single recombinational event. This method has unique and highly useful advantages over all other methods with similar aims in the prior art.
  • The specific description of my invention presented above should not be construed as limiting its scope, but rather as an exemplification of certain embodiments thereof. Many other variations and applications are possible.
  • Peptides could be designed that have sites that lead to specific covalent modification of the tagged protein, either by a small molecule or a macromolecule.
  • The peptide tag could contain a site for hydrolysis of a peptide bond by an inducible protease, thereby making it possible to assess the function of the tagged gene in vivo.
  • CD-DNAs could contain cis-acting sites for the inducible activation of transcription arranged so that inhibitory anti-sense transcripts from the target gene are produced, thereby making it possible to assess the function of the tagged gene in vivo.
  • the peptide-encoding sequence could contain nucleotides that are hypermutable in vivo so as to promote mutations such as frameshifts that could inactivate protein function.
  • An enhancer of transcription could be included within the CD-DNA so that expression of the target gene is stimulated by the CD-DNA. Accordingly, the scope of the invention should be determined not by the embodiments illustrated here but by the appended claims and their legal equivalents.

Abstract

This invention provides a method by which a molecular tag is placed on a gene, its transcript and its protein in a single recombinational event. The protein tag takes the form of a unique peptide that can be recognized by an antibody or other specific reagent; the transcript tag takes the form of the nucleotide sequence encoding the peptide, which can be recognized by a specific polynucleotide probe; and the gene tag takes the form of a longer nucleotide sequence that includes the peptide-encoding sequence and other associated nucleotide sequences. The central feature of the invention is that the tagging DNA is structured so that, when it is inserted into an intron within a gene, it creates two hybrid introns separated by a new exon encoding the protein tag. A major strength of this method is that it allows new proteins or protein-containing structures to be identified and, once that is done, the genes encoding those proteins to be readily identified and analyzed.
PCT/US1997/020150 1996-11-08 1997-11-07 Method for producing tagged genes, transcripts and proteins WO1998020031A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU51685/98A AU5168598A (en) 1996-11-08 1997-11-07 Method for producing tagged genes, transcripts and proteins
CA002271228A CA2271228A1 (en) 1996-11-08 1997-11-07 Method for producing tagged genes, transcripts and proteins

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US70540496A 1996-11-08 1996-11-08
US08/705,404 1996-11-08

Publications (1)

Publication Number Publication Date
WO1998020031A1 (en) 1998-05-14

Family

ID=24833313

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/020150 WO1998020031A1 (en) 1996-11-08 1997-11-07 Method for producing tagged genes, transcripts and proteins

Country Status (3)

Country Link
AU (1) AU5168598A (en)
CA (1) CA2271228A1 (en)
WO (1) WO1998020031A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5652128A (en) * 1993-01-05 1997-07-29 Jarvik; Jonathan Wallace Method for producing tagged genes, transcripts, and proteins

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BIOTECHNIQUES, July 1997, Vol. 23, No. 1, SMITH D.J., "Mini-Exon Epitope Tagging for Analysis of the Protein Coding Potential of Genomic Sequence", pages 116-120. *
BIOTECHNIQUES, May 1996, Vol. 20, No. 5, JARVIK et al., "CD-Tagging: A New Approach to Gene and Protein Discovery and Analysis", pages 896-898, 900-904. *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6207371B1 (en) 1996-10-04 2001-03-27 Lexicon Genetics Incorporated Indexed library of cells containing genomic modifications and methods of making and utilizing the same
US7332338B2 (en) 1996-10-04 2008-02-19 Lexicon Pharmaceuticals, Inc. Vectors for making genomic modifications
US6855545B1 (en) 1996-10-04 2005-02-15 Lexicon Genetics Inc. Indexed library of cells containing genomic modifications and methods of making and utilizing the same
US6136566A (en) * 1996-10-04 2000-10-24 Lexicon Graphics Incorporated Indexed library of cells containing genomic modifications and methods of making and utilizing the same
US6436707B1 (en) 1998-03-27 2002-08-20 Lexicon Genetics Incorporated Vectors for gene mutagenesis and gene discovery
US6776988B2 (en) 1998-03-27 2004-08-17 Lexicon Genetics Incorporated Vectors for gene mutagenesis and gene discovery
US6808921B1 (en) 1998-03-27 2004-10-26 Lexicon Genetics Incorporated Vectors for gene mutagenesis and gene discovery
US6080576A (en) * 1998-03-27 2000-06-27 Lexicon Genetics Incorporated Vectors for gene trapping and gene activation
WO1999061604A2 (en) * 1998-05-22 1999-12-02 Japan Science And Technology Corporation Gene trap vector and method for trapping a gene using this vector
WO1999061604A3 (en) * 1998-05-22 2000-03-02 Japan Science & Tech Corp Gene trap vector and method for trapping a gene using this vector
WO2002053732A2 (en) * 2000-12-28 2002-07-11 Pangenex Method for making polynucleotide libraries, polynucleotide arrays and cell libraries for high-throughput genomics analysis
WO2002053732A3 (en) * 2000-12-28 2003-04-10 Pangenex Method for making polynucleotide libraries, polynucleotide arrays and cell libraries for high-throughput genomics analysis
WO2009093242A2 (en) * 2008-01-24 2009-07-30 Yeda Research And Development Co. Ltd. Cell populations for polypeptide analysis and uses of same
WO2009093242A3 (en) * 2008-01-24 2009-11-19 Yeda Research And Development Co. Ltd. Cell populations for polypeptide analysis and uses of same

Also Published As

Publication number Publication date
AU5168598A (en) 1998-05-29
CA2271228A1 (en) 1998-05-14

Similar Documents

Publication Publication Date Title
US6096717A (en) Method for producing tagged genes transcripts and proteins
US11111506B2 (en) Compositions and methods of engineered CRISPR-Cas9 systems using split-nexus Cas9-associated polynucleotides
EP1984512B1 (en) Gene expression system using alternative splicing in insects
CA3111432A1 (en) Novel CRISPR enzymes and systems
JP6665088B2 (en) Optimized CRISPR-Cas double nickase systems, methods and compositions for sequence manipulation
Jarvik et al. CD-tagging: a new approach to gene and protein discovery and analysis
EP0955364A2 (en) Eukaryotic transposable elements
EP1187938B1 (en) Method for expressing transgenes in the germline of Caenorhabditis elegans
Arcà et al. Mobilization of a Minos transposon in Drosophila melanogaster chromosomes and chromatid repair by heteroduplex formation
WO1998020031A1 (en) Method for producing tagged genes, transcripts and proteins
Sievert et al. Sequence conservation and expression of the Sex-lethal homologue in the fly Megaselia scalaris
EP1885856B1 (en) Transposition of maize Ac/Ds elements in vertebrates
CN112334004A (en) Gene drive targeting female doublesex splicing in arthropods
EP3594339A1 (en) Composition containing c2cl endonuclease for dielectric calibration, and method for dielectric calibration using same
Reid et al. An Anopheles stephensi promoter-trap: augmenting genome annotation and functional genomics
JP4364474B2 (en) Transposon functional in mammals
US5252475A (en) Methods and vectors for selectively cloning exons
Twyman Recombinant DNA and molecular cloning
EP1661992B1 (en) Method for screening homologous recombination events
US20200131487A1 (en) Composition containing c2cl endonuclease for dielectric calibration and method for dielectric calibration using same
WO2024076688A2 (en) Synthetic genomic safe harbors and related methods
O'Brochta et al. Hobo-like transposable elements as non-drosophilid gene vectors
US20050186677A1 (en) Novel mutated mammalian cells and animals
Keplinger Identification and characterization of promoter and enhancer elements controlling glucose dehydrogenase expression in the Drosophila reproductive tract
Sundararajan Transposable element interactions and transposase functional analysis. Critical questions for development of insect gene vectors

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AT AU AZ BA BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE GH HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK SL TJ TM TR TT UA UG UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref document number: 2271228

Country of ref document: CA

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase