WO2016113733A1

WO2016113733A1 - Vectors, compositions and methods for endogenous epitope tagging of target genes

Info

Publication number: WO2016113733A1
Application number: PCT/IL2016/050035
Authority: WO
Inventors: Yardena Samuels; Rafi EMMANUEL
Original assignee: Yeda Research And Development Co. Ltd.
Priority date: 2015-01-13
Filing date: 2016-01-13
Publication date: 2016-07-21
Also published as: WO2016113733A9

Abstract

There are provided tagging vectors, compositions comprising the same and methods of using the same for specific and efficient endogenous epitope tagging of target genes in target cells.

Description

VECTORS, COMPOSITIONS AND METHODS FOR ENDOGENOUS EPITOPE TAGGING OF TARGET GENES

FIELD OF THE INVENTION

The present invention relates to vectors, compositions and methods for endogenous epitope tagging of target genes in target cells.

BACKGROUND OF THE INVENTION

The study of genes and products thereof (such as proteins) may be performed by various means. Some methods include exogenously introducing and overexpressing the gene in test cells, whereby the test gene is exogenous to the genome of the cells. However, such heterologous methods, although they allow studying some properties of the test gene, do not exactly emulate the physiological conditions under which the gene is expressed in the native cells under different conditions.

A different approach to studying genes and products thereof in their physiological context is endogenous epitope tagging, which allows tagging one of the alleles of the tested gene, for example by a homologous recombination (HR) mediated pathway, to provide a gene product (for example, a protein encoded by the tested gene) which includes the tag. Generally, the translational stop codon of the endogenous gene of interest is replaced with a tag and a resistance gene cassette. This substitution is directed by ~1 kb of homologous sequence flanking the stop codon of the gene of interest on each side. The tagged protein may then be further studied, for example, to identify binding partners of the tested tagged protein under physiological conditions. Using antibodies directed to the tag added to the protein of interest, the tagged protein can be immunoprecipitated and subjected to further analysis and characterization. Kim et al. (Nucleic Acids Research 36, e127, 2008) disclose epitope tagging of endogenous genes in diverse human cell lines. US patent application publication No. US 2009/0305272 discloses methods of characterizing endogenous polynucleotide-polypeptide interactions.

Nevertheless, the methods, approaches and constructs currently used for endogenous epitope tagging result in high background levels, with many false positive colonies resulting from random integration (i.e., colonies not harboring the tagged proteins). Further, the time required to obtain positive colonies is very long. Additionally, the methods and constructs currently used require the use of Cre recombinase to obtain the cDNA of the tagged allele (i.e., to obtain the tagged protein). Moreover, if the tested cells harbor a mutated allele, it is not always possible to identify whether the wild-type or the mutated allele was tagged.

Thus, there remains an unmet need in the art for compositions and methods enabling specific and efficient tagging of endogenous target gene products in various target cells to allow for their characterization and investigation under physiological conditions.

SUMMARY OF THE INVENTION

The present invention provides compositions and methods for endogenous epitope tagging (EET) of endogenous target genes and products thereof in target cells. In some embodiments, the compositions include tagging vectors (constructs) that enable the specific endogenous tagging of target genes in target cells, whereby the target genes may be wild type genes and/or mutated alleles of the gene. In some embodiments, the compositions and methods provided herein are advantageous over currently used methods for tagging genes. In some embodiments, the compositions and methods provided herein advantageously allow a very sensitive, efficient, and time- and cost-effective tagging of wild type and/or mutated genes in various target cells, while reducing background tagging of undesired genes. Further, the compositions and methods provided herein surprisingly and advantageously allow tagging desired target genes in any type of target cell, including, for example, melanoma cells. Further, the compositions and methods provided herein surprisingly and advantageously allow reducing the tagging time, thereby reducing the time to obtain the tagged genes. In addition, the compositions and methods provided herein allow the tagging of the N-terminus and/or the C-terminus of the gene product (protein). In some embodiments, the compositions and methods provided herein allow the endogenous epitope tagging of genes of interest regardless of the endogenous expression levels of the genes of interest. In some embodiments, the vectors, compositions and methods disclosed herein allow the creation of knock-in mutations located up to 200 nucleotides upstream or downstream to the start or stop codon of genes of interest. In some embodiments, the compositions and methods disclosed herein allow the characterization of the endogenous tagged gene products, their interactions with other molecules (such as polypeptides, polynucleotides, and the like), and the like. The tagged genes and products thereof obtained by the compositions and methods disclosed herein can be studied by various methods known in the art, such as proteomics, biochemical methods (such as immunoprecipitation, immunofluorescence, and chromatin immunoprecipitation), functional assays, knockdown, and the like.

According to some embodiments, the advantageous compositions and methods disclosed herein are embodied in a tagging vector which allows the formation of one reading frame for both the tagged gene of interest and a selectable marker (such as a resistance gene/cassette), the expression of both being controlled by the endogenous promoter (promoter trap). To this aim, the tagging vector may include a self-cleaving peptide and may optionally exclude the ATG translation start codon of the resistance gene, which allows the formation of a polycistronic transcript that is regulated by the endogenous promoter of the tagged gene of interest, while maintaining only one reading frame for both the tagged gene and the resistance cassette. Maintaining one reading frame results in extensive reduction (up to complete elimination) of background, since resistant colonies are only obtained if site-specific integration into the gene of interest (by homologous recombination (HR)) is achieved. Furthermore, in order to further reduce background, a poly-adenylation (polyA) signal may be added upstream to the left homology arm (LHA) of the tagging vector, to terminate transcription in case the tagging vector is randomly integrated in-frame into a transcriptionally active region. Additionally, a selection reporter marker (such as a fluorescent marker, for example, EGFP) may be further included in the tagging vector, to facilitate the identification of single positive colonies (alternatively to or in addition to a selectable (resistance) marker). In some embodiments, the tagging vector includes sequences required to make a recombinant adeno-associated virus (AAV) virion.
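The single-reading-frame requirement described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions: the function names are hypothetical, and all nucleotide segments other than the FLAG codons are short toy placeholders, not sequences taken from this disclosure. The check passes only when every segment keeps the frame (length divisible by three) and no in-frame stop codon appears before the final codon of the marker.

```python
# Toy check that gene end + tag + self-cleavable peptide + marker form ONE reading frame.
# All segments are illustrative placeholders (assumptions), not sequences from the patent.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_in_frame_stop(seq):
    """Return the codon index of the first in-frame stop codon, or None."""
    for index, codon in enumerate(codons(seq.upper())):
        if codon in STOP_CODONS:
            return index
    return None


def is_single_reading_frame(segments):
    """True if all segments are in frame and the only stop codon (if any) is the last codon."""
    if any(len(segment) % 3 for segment in segments):
        return False  # a segment shifts the frame
    joined = "".join(segments)
    stop = first_in_frame_stop(joined)
    return stop is None or stop == len(joined) // 3 - 1


gene_end = "ATGGCTGCA"                  # toy end of the target gene, stop codon removed
flag_tag = "GACTACAAGGACGACGATGACAAG"   # one possible encoding of DYKDDDDK, no stop codon
cleavable = "GGAAGCGGA"                 # placeholder standing in for the self-cleavable peptide
marker_no_atg = "ACCGAGTACAAGTGA"       # placeholder marker without ATG, ending in a stop codon

in_frame = is_single_reading_frame([gene_end, flag_tag, cleavable, marker_no_atg])
tag_with_stop = is_single_reading_frame([gene_end, flag_tag + "TAA", cleavable, marker_no_atg])
frameshift = is_single_reading_frame([gene_end, flag_tag, cleavable[:-1], marker_no_atg])
print(in_frame, tag_with_stop, frameshift)
```

In this toy setting, a tag that retains its stop codon or a segment that shifts the frame fails the check, which mirrors why the tag is described as lacking a stop codon.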

According to some embodiments, there is thus provided a tagging vector for endogenous epitope tagging of a gene of interest in a target cell, the tagging vector comprising a left homology arm (LHA) nucleotide sequence comprising a nucleotide sequence that is substantially homologous to the 5' region flanking the target gene locus; a right homology arm (RHA) nucleotide sequence, comprising a nucleotide sequence that is substantially homologous to the 3' region flanking the target gene locus; a nucleotide sequence encoding for a tag; a nucleotide sequence encoding for a self-cleavable peptide; and a nucleotide sequence encoding for a selectable marker, optionally without start (ATG) codon.

In some embodiments, the left homology arm (LHA) nucleotide sequence may include a nucleotide sequence that is substantially homologous to the 5' region flanking the start codon of the target gene locus. In some embodiments, the right homology arm (RHA) nucleotide sequence may include a nucleotide sequence that is substantially homologous to the 3' region flanking the stop codon of the target gene locus.

In some embodiments, the vector may further include a left inverted terminal repeat (L-ITR) sequence of adeno-associated virus (AAV) and a right inverted terminal repeat (R-ITR) sequence of AAV.

In some embodiments, the vector may further include a poly adenylation (PolyA) nucleotide sequence located upstream to the left homology arm.

In some embodiments, the vector may further include recombination site sequences and/or recognition site sequences flanking the selectable marker sequence. In some embodiments, the recombination site sequences may be LoxP sequences. In some embodiments, the recognition site sequences may be KSi sequence, T3 sequence, T7 sequence, SP6 sequence, and the like, or combinations thereof.

In some embodiments, the tag may be selected from FLAG, 3XFLAG, His-tag, Myc-Tag, HA tag, poly-Arg tag, Strep-tag, S-tag, HAT tag, Calmodulin-binding peptide-tag, Cellulose-binding domain-tag, Streptavidin-binding peptide tag, Chitin-binding domain tag, Glutathione S-transferase tag, Maltose-binding protein (MBP) tag, fragment crystallizable (Fc) region tag, and the like, or combinations thereof. In some embodiments, the tag is devoid of a STOP codon.

In some embodiments, the selectable marker is a resistance gene that may be selected from, but not limited to: Puromycin-N-acetyl-transferase (puro), neomycin phosphotransferase (neo), hygromycin phosphotransferase (hygro), dihydrofolate reductase, Thymidine Kinase, or combinations thereof.

In some embodiments, the self-cleavable peptide is T2A peptide. In some embodiments, the self-cleavable peptide may be located downstream to the sequence encoding the tag. In some embodiments, the vector may further include a nucleotide sequence encoding for a second selectable marker. In some embodiments, the second selectable marker may be selected from: LacZ, Green Fluorescent Protein (GFP), mCherry, mApple, DsRed, Red Fluorescent Protein (RFP), Blue Fluorescent Protein (BFP), EGFP, CFP, YFP, AmCyan1, ZsGreen1, ZsYellow1, DsRed2, AsRed2, and HcRed1.

In additional embodiments, the vector may be capable of being packaged in an AAV virion.

In some embodiments, the vector may further include a sequence recognizable by a shRNA. In some embodiments, the sequence recognizable by the shRNA is any sequence which does not exist in the genome of the targeted cells. In some embodiments, the sequence recognizable by the shRNA is derived from EGFP. In some embodiments, the EGFP sequence comprises the sequence AACTACAACAGCCACAACGTCTATATC. In some embodiments, the sequence is recognized by an shRNA with the sequence TACAACAGCCACAACGTCTAT.
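As a small consistency check, the following Python sketch confirms that the 21-nucleotide shRNA sequence lies within the 27-nucleotide EGFP-derived sequence given above (written here without internal spaces), and shows one way to test whether a site is absent from a genome on either strand. The helper names and the toy genome string are hypothetical and serve illustration only; a real screen would search an actual genome assembly.

```python
# Sequences as given in the text (internal spaces removed); everything else is illustrative.
EGFP_TARGET = "AACTACAACAGCCACAACGTCTATATC"
SHRNA_SITE = "TACAACAGCCACAACGTCTAT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def occurs_in(genome, site):
    """True if the site is found on either strand of the genome string."""
    genome, site = genome.upper(), site.upper()
    return site in genome or reverse_complement(site) in genome


# Position of the shRNA site inside the EGFP-derived sequence (0-based).
offset = EGFP_TARGET.find(SHRNA_SITE)

# Hypothetical toy "genome" used only to demonstrate the absence check.
toy_genome = "GATTACAGATTACAGGCCTTAACCGG"
absent_from_toy_genome = not occurs_in(toy_genome, SHRNA_SITE)
print(offset, absent_from_toy_genome)
```

The site starts three nucleotides into the EGFP-derived sequence, so the longer sequence carries the shRNA site with three flanking nucleotides on each side.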

According to some embodiments, the target cells may be of human, animal or plant source. In some embodiments, the cells are cancer cells, such as, melanoma cells.

According to some embodiments, there is thus provided a tagging vector for endogenous epitope tagging of a gene of interest, the tagging vector comprising left and right inverted terminal repeats (ITRs) of AAV (L-ITR and R-ITR, respectively) nucleotide sequences, a polyA signal nucleotide sequence located upstream to a left homology arm (LHA) nucleotide sequence, a nucleotide sequence encoding for (or serving as) a Tag, a nucleotide sequence encoding for a cleavable peptide, and a right homology arm (RHA) nucleotide sequence.

In some embodiments, there is provided a composition for endogenously tagging a target gene of interest in a target cell, the composition comprising a tagging vector.

In some embodiments, there is provided a method for endogenously tagging a target gene of interest in a target cell, the method comprising transfecting the target cell with a virion comprising a tagging vector, whereby following site specific integration of the tagging vector, a tagged gene is formed in the cell.

In some embodiments, there is provided the use of a tagging vector for endogenously tagging a target gene of interest in a target cell. In further embodiments, there is provided the use of a tagging vector for the creation/preparation of a target cell having an endogenous target gene comprising an epitope tag.

Some embodiments, features, advantages and the full scope of applicability of the present invention will become apparent from the detailed description and drawings given hereinafter. However, it should be understood that the detailed description, while indicating preferred embodiments of the invention, is given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

Figs. 1A-E - schematic illustrations of tagging vectors, showing the various sequence regions, according to some embodiments;

Figs. 2A-K - schematic illustrations of various constructs made and used herein, according to some embodiments.

Fig. 2A - pTK-Puro-User construct, which includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a 3XFlag tag, a loxP site, a thymidine kinase promoter (pTK) and Puromycin resistance gene (Puro);

Fig. 2B - pTK-Puro-EGFP-User construct includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a 3XFlag tag, a loxP site, a thymidine kinase promoter (pTK) sequence, Puromycin resistance gene (Puro), EGFP encoding sequence, SV40 polyA signal (SV40), additional loxP site, right homology arm (RHA) and a right inverted repeat region (R-ITR);

Fig. 2C - PolyA-3Flag W/O stop codon-pTK-Puro-User construct includes a left inverted repeat region (L-ITR), a polyA sequence, Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a thymidine kinase promoter (pTK) sequence, Puromycin resistance gene (Puro), a LoxP site, right Homology arm (RHA) and a right inverted repeat region (R-ITR);

Fig. 2D - T2A-Puro-User construct includes a left inverted repeat region (L-ITR), a polyA sequence, Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a T2A (self-cleavable peptide) sequence, a LoxP site, Puromycin resistance gene (Puro), a second LoxP site, right Homology arm (RHA) and a right inverted repeat region (R-ITR);

Fig. 2E - T2A-Puro-EGFP-User construct includes a left inverted repeat region (L-ITR), a polyA sequence, Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a T2A (self-cleavable peptide) sequence, a LoxP site, Puromycin resistance gene (Puro), an EGFP encoding sequence, a second LoxP site, right Homology arm (RHA) and a right inverted repeat region (R-ITR);

Fig. 2F - pT2A-Puro-User W/O LoxP (pT2A-Puro-User 2nd generation) construct includes a left inverted repeat region (L-ITR), a polyA sequence, Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a T2A (self-cleavable peptide) sequence, a Ksi site, Puromycin resistance gene (Puro) lacking a start codon ("Met"), a T3 site, right Homology arm (RHA) and a right inverted repeat region (R-ITR);

Fig. 2G - pT2A-Puro-User 2nd generation (pT2A-puro-User-C) construct includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a T2A (self-cleavable peptide) sequence, a Ksi site, Puromycin resistance gene (Puro) lacking a start codon ("Met"), a T3 site, right Homology arm (RHA) and a right inverted repeat region (R-ITR);

Fig. 2H - pT2A-Puro-User-N' construct includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a Puromycin resistance gene (Puro), a T2A (self-cleavable peptide) sequence, a 3XFlag tag lacking a STOP codon, right Homology arm (RHA) and a right inverted repeat region (R-ITR);

Fig. 2I - illustration of the expected end products formed in the cells when using the pT2A-Puro-User-N' construct;

Fig. 2J - C'-3Xflag-sheGFP-User construct includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a T2A (self-cleavable peptide) sequence, a KSi recognition sequence (site), Puromycin resistance gene (Puro) (excluding a start codon (AUG)), EGFP sequence recognized by a specific shRNA (shg), a T3 recognition sequence (site), Right Homology Arm (RHA) and a right inverted repeat region (R-ITR); and

Fig. 2K - N'-3Xflag-sheGFP-User construct includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a Puromycin resistance gene (Puro), EGFP sequence recognized by a specific shRNA (shg), a T2A (self-cleavable peptide) sequence, a 3XFlag tag lacking a STOP codon, Right Homology Arm (RHA) and a right inverted repeat region (R-ITR).

Fig. 3 - Fluorescent microscope pictograms of 293 cells transfected with pTK-Puro^r-User and pTK-Puro^r-EGFP-User vectors, as well as control cells transfected with pCMV-puroGFP and non-transfected cells. 48h following transfection, the cells were treated with 3 μg/ml puromycin. Expression of the EGFP in the cells was determined;

Fig. 4 - pictograms of 293T control cells or 293T cells transfected with the pCMV-GFP-T2A-Puro-User construct, and treated with 3 μg/ml puromycin;

Fig. 5 - pictograms of Western blot analysis (left panel - short exposure and right panel - long exposure) of 293 cell lysates transfected with either pEGFP-N1 (G) or pCMV-Puro-GFP (P-G) as controls for the size of the EGFP and the Puro-GFP fusion protein or with G-T2A-P vector (which mimics an endogenous tagged gene using the pT2A-Puro-User construct). Cell lysates were prepared and subjected to Western Blot analysis with anti-GFP Ab or immunoprecipitated (IP) with anti-Flag Antibody. Arrows mark the expected size of the cleaved GFP (left hand panel) and the uncleaved GFP (right hand panel), which can only be detected under long exposure;

Fig. 6 - pictograms of Western blot analysis of cell lysates (described in Fig. 5) that were subjected to an immunoprecipitation (IP) experiment using anti-Flag Antibody. The resulting IP product was then blotted with either anti-GFP antibody (left hand panel) or anti-Flag antibody (right hand panel);

Fig. 7 - genomic screening of positive tagged colonies of cells transfected with a recombinant AAV vector (rAAV) packaging the pT2A-Puro-User construct having homology arms to the RGS7 target gene. Top panel shows a schematic representation of the regions amplified by PCR in order to identify genomic integration of the tag into the gene of interest (RGS7) in target cells (53T and 67T melanoma cells). Depicted are the locations of the amplified regions and the primers used to identify the left homology arm (LHA) region and the right homology arm (RHA) region. The regions were amplified using both primers from the Flag-Puro cassette and primers from genomic regions located outside the homology arms, to distinguish between homologous and random integration of the Flag-tag. The arrows represent the primer locations used to amplify the LHA and RHA. Solid and dashed arrows designate the Forward and Reverse primers, respectively. The middle panel of Fig. 7 shows the PCR products obtained with primers for the LHA region (arrow marks the expected PCR product of the LHA region) and the lower panel of Fig. 7 shows PCR products obtained with primers for the RHA region (arrow marks the expected PCR product of the RHA region);

Fig. 8 - mRNA expression levels in cells found to be positive in the genomic screening (Fig. 7). Total RNA was extracted from colonies that were suspected to be positive, based on the genomic screening. After a reverse transcription (RT) reaction, a PCR reaction was performed using a forward primer from an exon of the target gene that is not included in the LHA region, and a reverse primer located within the selectable marker gene (Puro). The rectangles mark the expected size of the PCR products;

Figs. 9A-C - C-Terminus tagging of NRAS target gene. Identifying positive tagged colonies of cells transfected with a pT2A-Puro-User 2nd generation construct comprising the homology arms (HAs) of NRAS target gene and expression of a tagged gene within the cells.

Fig. 9A - A representative gel of the genomic screen PCR products of the RHA of NRAS. The positive colony (Col5) is marked with an arrow;

Fig. 9B - A representative gel of the cDNA screening of PCR products of tagged NRAS colonies. The positive colony (Col5) is marked by an arrow;

Fig. 9C - Expression of the tagged NRAS allele at the protein level. A representative Western Blot of Col5 and A375 cells following IP with anti-Flag beads and blotting with either anti-Flag or anti-NRAS antibodies. The pulled-down tagged protein is pointed by the red arrow;

Fig. 10A - schematic illustration of an N'-tagging construct having the tagging cassette cloned upstream to an EGFP sequence (N'-3xFlag tagged EGFP construct);

Fig. 10B - 293T cells were transfected with either Puro-EGFP or the N'-3xFlag-GFP constructs. 24h post transfection, the expression of the GFP was monitored by fluorescent microscope. In addition, 3μg of puromycin (puro) was added to the cells and 24h later, cell viability was estimated under the microscope;

Fig. 10C - Recognition of the native tagged EGFP by Flag antibody. The constructs schematically described in the upper panel of Fig. 10C were transfected into 293T cells. 24h post transfection, cell lysate was prepared and incubated with agarose anti-Flag beads. The beads were washed and the eluents were loaded on an acrylamide (A.A.) gel and then blotted with either anti-GFP (left panel) or anti-Flag (right panel). The expected sizes of the EGFP, the uncleaved protein and the cleaved Flag-EGFP are indicated in the lower panel. The transfection of the N'-3xFlag-GFP was performed twice (N'-Flag-GFP1 and N'-Flag-GFP2). The expected bands are marked by arrows.

Figs. 11A-C - Tagging of N'-Flag-NRAS target gene in cells. Potential positive colonies obtained following a genomic screen of cells transfected with the pT2A-Puro-User-N' construct, which includes the homology arms (HAs) of NRAS flanking the ATG translation start codon, were further validated at the mRNA level. Shown in the top panel of Fig. 11A is a representative gel of the cDNA screening of PCR products of tagged NRAS colonies. The expected PCR products are pointed by an arrow. The PCR products were sequenced to validate the integration of the tagging cassette upstream and in frame to the NRAS gene. The sequencing results are presented in the lower panel of Fig. 11A. Figs. 11B-C show expression of the tagged NRAS protein. Cell extract was prepared from two colonies and incubated with anti-Flag agarose beads. The beads were washed and the eluents were loaded on an A.A. gel and then blotted with either anti-NRAS antibody (Fig. 11B) or anti-Flag antibody (Fig. 11C). The bright arrows mark the tagged protein and the dark arrows mark the light chain of the anti-Flag antibody of the agarose beads. The asterisk designates an unspecific band.

Figs. 12A-B - Estimating the silencing efficiency of sheGFP sequence. 293T cells were co-transfected with either ^g of sheGFP or pLKO-1 and with 1ng or 5ng pEGFP-N1. 48 hours post transfection, the EGFP intensity was monitored by fluorescent microscope (Fig. 12A). Thereafter, the cells were lysed and loaded on an acrylamide gel and blotted with anti-GFP and anti-GAPDH to normalize loading quantity (Fig. 12B);

Figs. 13A-C - N-Terminus tagging of a target gene with a construct which includes sheGFP target sequence. The tagging of the NRAS target protein was performed essentially as described in Example 5, while using the N'-3Xflag-sheGFP-User construct. Fig. 13A - A representative gel of the genomic screen PCR of the N'-Flag-NRAS-sheGFP. The size of the positive colonies is pointed by an arrow; Fig. 13B - Potential positive colonies obtained following the genomic screen were further validated at the mRNA level. The positive colonies are indicated by an arrow. A cDNA of a positive colony from Example 5 was used as a positive control for the PCR reaction and designated as +C (Upper panel). PCR products were sequenced to validate the integration of the tagging cassette upstream and in frame to the NRAS gene (lower panel);

Fig. 13C - Cell extract was prepared from two colonies and incubated with anti-Flag agarose beads. The beads were washed and the eluents were loaded on an acrylamide gel and then blotted with anti-NRAS. The arrows mark the tagged protein.

Fig. 14 - C-Terminus tagging of a target gene (TRRAP) with a construct which includes sheGFP target sequence. A representative gel of the genomic screen PCR of the C'-Flag-TRRAP-sheGFP transfected cells. The positive colony is marked by an arrow (Upper panel). Sequencing validation of the PCR product is shown in the Lower panel of Fig. 14. The highlighted sequence is the Flag sequence. The non-highlighted sequence is the sequence of the LHA.

Fig. 15 - Efficient knockdown of the tagged NRAS allele. A representative blot of NRAS-tagged A375 cells transfected with a control plasmid or with pLKO-sheGFP is shown. The cell lysates were blotted with anti-NRAS antibody and with tubulin to normalize the amount of protein loaded.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides tagging vectors for specific endogenous epitope tagging of target genes, compositions comprising the same and methods of using the same.

In the following detailed description of the invention, when a reference term such as "said", "the", "the last" or "the former" is used, it refers to the exact term that is mentioned above.

The following are terms which are used throughout the description and which should be understood in accordance with the various embodiments to mean as follows:

As referred to herein, the terms "polynucleotide molecules", "oligonucleotide", "polynucleotide", "nucleic acid" and "nucleotide" sequences may interchangeably be used herein. The terms are directed to polymers of deoxyribonucleotides (DNA), ribonucleotides (RNA), and modified forms thereof in the form of a separate fragment or as a component of a larger construct, linear or branched, single stranded, double stranded, triple stranded, or hybrids thereof. The term also encompasses RNA/DNA hybrids. The polynucleotides may include sense and antisense oligonucleotide or polynucleotide sequences of DNA or RNA. The DNA or RNA molecules may be, for example, but not limited to: complementary DNA (cDNA), genomic DNA, synthesized DNA, recombinant DNA, or a hybrid thereof or an RNA molecule such as, for example, mRNA, shRNA, siRNA, miRNA, and the like. Accordingly, as used herein, the terms "polynucleotide molecules", "oligonucleotide", "polynucleotide", "nucleic acid" and "nucleotide" sequences are meant to refer to both DNA and RNA molecules. The terms further include oligonucleotides composed of naturally occurring bases, sugars, and covalent internucleoside linkages, as well as oligonucleotides having non-naturally occurring portions, which function similarly to respective naturally occurring portions.

The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to naturally occurring amino acid polymers, to amino acid polymers in which one or more amino acid residue is an artificial chemical analogue of a corresponding naturally occurring amino acid, as well as to amino acid polymers having one or more tags.

The terms "epitope" and "tag" may interchangeably be used and are directed to a molecule (or a portion thereof), that may be recognized by the immune system (for example, by a specific antibody). In some embodiments, the tag may be selected from various types of molecules, including such molecules as, but not limited to: an amino acid, a stretch of amino acids, a peptide, a protein, a polynucleotide, a carbohydrate, a lipid, a polysaccharide, a lipopolysaccharide, a glycolipid, a glycoprotein, viral particles, and the like, or combinations thereof. Exemplary tags comprised of amino acids include such tags as, but not limited to: FLAG tag (comprising amino acid sequence DYKDDDDK (SEQ ID NO: 1)), 3XFLAG (comprising amino acid sequence DYKDHDGDYKDHDIDYKDDDDK (SEQ ID NO: 2), in which the Asp-Tyr-Lys-Xaa-Xaa-Asp (SEQ ID NO: 3) motif is repeated three times), His-tag (comprising amino acid sequence HHHHHHH (SEQ ID NO: 4) (2-10 Histidines)), Myc-Tag (comprising amino acid sequence EQKLISEEDL (SEQ ID NO: 5)), HA tag (comprising amino acid sequence YPYDVPDYASLGGP (SEQ ID NO: 6)), poly-Arg (comprising amino acid sequence RRRRR (SEQ ID NO: 7) (5-6 repeats of Arginine)), Strep-tag (comprising amino acid sequence WSHPQFEK (SEQ ID NO: 8)), S-tag (comprising amino acid sequence KETAAAKFERQHMDS (SEQ ID NO: 9)), HAT (comprising amino acid sequence KDHLIHNVHKEFHAHAHNK (SEQ ID NO: 35)), Calmodulin-binding peptide (comprising amino acid sequence KRRWKKNFIAVSAANRFKKISSSGAL (SEQ ID NO: 10)), Cellulose-binding domain tags, Streptavidin-binding peptide tag (comprising amino acid sequence MDEKTTGWRGGHVVEGLAGELEQLRARLEHHPQGQREP (SEQ ID NO: 11)), Chitin-binding domain tag (comprising amino acid sequence TNPGVSAWQVNTAYTAGQLVTYNGKTYKCLQPHTSLAGWEPSNVPALWQLQ (SEQ ID NO: 12)), Glutathione S-transferase (GST) tag, Maltose-binding protein (MBP) tag, fragment crystallizable region (Fc region of antibodies), and the like, or combinations thereof. In some embodiments, tandem arrays of the tags may be used. In some embodiments, a combination of tags may be used. In some embodiments, a combination of tags may be used such that one or more tags are on the C-terminus and one or more tags are on the N-terminus. Each possibility is a separate embodiment.
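The statement that the Asp-Tyr-Lys-Xaa-Xaa-Asp motif is repeated three times in the 3XFLAG sequence can be checked with a short Python sketch. The regular expression below is simply one way to express the motif (with "." standing for Xaa, any residue); the variable names are illustrative and not part of the disclosure.

```python
import re

# Amino acid sequences as listed in the text (SEQ ID NO: 1 and SEQ ID NO: 2).
FLAG = "DYKDDDDK"
THREE_X_FLAG = "DYKDHDGDYKDHDIDYKDDDDK"

# Asp-Tyr-Lys-Xaa-Xaa-Asp written as a regular expression; "." matches any residue.
MOTIF = re.compile(r"DYK..D")

# Non-overlapping occurrences of the motif, with their start positions.
repeats = [(match.start(), match.group()) for match in MOTIF.finditer(THREE_X_FLAG)]
ends_with_flag = THREE_X_FLAG.endswith(FLAG)

for start, text in repeats:
    print(start, text)
print(len(repeats), ends_with_flag)
```

The three matches start at positions 0, 7 and 14, and the last eight residues of 3XFLAG are identical to the single FLAG tag.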

The term "construct", as used herein refers to an artificially assembled or isolated nucleic acid molecule which may include one or more nucleic acid sequences, wherein the nucleic acid sequences may include coding sequences (that is, sequence which encodes for an end product), regulatory sequences, non-coding sequences, or any combination thereof. The term construct includes, for example, vectors, but should not be seen as being limited thereto.

As used herein, the terms "tagging vector" and "vector" may interchangeably be used and refer to a polynucleotide construct that includes various nucleotide sequences that allow endogenous epitope tagging of target genes. In some embodiments, the vector may include such sequences as, but not limited to: sequences that are homologous to endogenous chromosomal polynucleotide sequences flanking the target gene locus ("homology arms", such as left homology arm and right homology arm), to direct the vector to the target gene; sequences that encode for a tag; sequences that encode for selectable markers, such as resistance markers and reporter markers; sequences that allow packaging or creation of infectious virions (such as AAV virions); sequences that serve as restriction sites for restriction enzymes (multiple cloning site (MCS) sequences); sequences that serve as transcriptional regulators (for example, promoters); sequences that serve as translational regulators (for example, poly A signals); sequences that encode for self-cleavable peptides; and the like, or any combination thereof. Each possibility is a separate embodiment.

The terms "homologous recombination" and "HR" may interchangeably be used and refer to the process of DNA recombination based on sequence homology.

As referred to herein, the term "complementarity" is directed to base pairing between strands of nucleic acids. As known in the art, each strand of a nucleic acid may be complementary to another strand in that the base pairs between the strands are non-covalently connected via two or three hydrogen bonds. Two nucleotides on opposite complementary nucleic acid strands that are connected by hydrogen bonds are called a base pair. According to the Watson-Crick DNA base pairing, adenine (A) forms a base pair with thymine (T) and guanine (G) with cytosine (C). In RNA, thymine is replaced by uracil (U). The degree of complementarity between two strands of nucleic acid may vary, according to the number (or percentage) of nucleotides that form base pairs between the strands. For example, "100% complementarity" indicates that all the nucleotides in each strand form base pairs with the complement strand. For example, "95% complementarity" indicates that 95% of the nucleotides in each strand form base pairs with the complement strand. The term sufficient complementarity may include any percentage of complementarity from about 30% to about 100%.
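The percentage-based definition above can be made concrete with a minimal Python sketch. It assumes two strands of equal length, both written 5' to 3' and aligned antiparallel without gaps; the function name and the 20-nucleotide example strands are hypothetical and chosen only so that the 100% and 95% cases from the text are reproduced.

```python
# Watson-Crick pairs, including A-U for RNA strands.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementarity(strand, other):
    """Percentage of positions forming a base pair when the strands are aligned antiparallel.

    Both strands are given 5'->3'; `other` is reversed so that position i of
    `strand` faces its pairing partner. Equal lengths are assumed (no gaps).
    """
    if len(strand) != len(other):
        raise ValueError("this sketch assumes strands of equal length")
    facing = other.upper()[::-1]
    paired = sum((a, b) in PAIRS for a, b in zip(strand.upper(), facing))
    return 100.0 * paired / len(strand)


strand = "ATGC" * 5              # hypothetical 20-nucleotide strand
perfect = "GCAT" * 5             # its exact reverse complement
one_mismatch = "A" + perfect[1:]  # same, with a single mismatching nucleotide

full = percent_complementarity(strand, perfect)
partial = percent_complementarity(strand, one_mismatch)
print(full, partial)
```

With 20 nucleotides, one mismatch leaves 19 of 20 positions paired, which corresponds to the "95% complementarity" example.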

The term "exogenous" refers to nucleic acid sequences which are introduced to and/or expressed within a target cell.

As used herein, the term "target cell" refers to any cell or group of cells, such as but not limited to: human cells, animal cells, and plant cells, which include, harbor and/or express the gene of interest.

As referred to herein, the terms "gene of interest" or "target gene" may interchangeably be used and refer to an endogenous nucleic acid sequence which may encode for any structural or functional molecule subsequently expressed in the target cell. In some embodiments, the target gene is located at a genomic locus. The term "gene product" refers to the end product of the gene in the cell. In some embodiments, the gene product is a protein or a peptide encoded by the gene and expressed (translated) in the cell. In some embodiments, the gene product is a "protein of interest", i.e. a protein or a peptide encoded by the gene of interest and expressed (translated) in the target cell.

The terms "recombination site" and "recombination sequence" are used herein interchangeably and refer to a recognition sequence on a nucleic acid molecule participating in an integration or recombination reaction by recombination proteins.

As used herein, the term "flanked" includes each end/side of a nucleic acid sequence.

The terms "Upstream" and "Downstream", as used herein, refer to a relative position in a nucleotide sequence, such as, for example, a DNA sequence or an RNA sequence. As known, a nucleotide sequence has a 5' end and a 3' end, so called for the carbons on the sugar (deoxyribose or ribose) ring of the nucleotide backbone. Hence, relative to the position on the nucleotide sequence, the term downstream relates to the region towards the 3' end of the sequence. The term upstream relates to the region towards the 5' end of the strand.

As referred to herein, the term, "Open Reading Frame" ("ORF") is directed to a coding region which contains a start codon and a stop codon.

The term "Start codon" is directed to include the codon from which translation initiates. The start codon is usually ATG (AUG), encoding for Methionine (Met).

The terms "STOP", "stop codon" and "STOP codon" are used herein interchangeably and refer to a sequence that does not encode an amino acid. Typically, stop codons include, but are not limited to TAA, TAG and TGA. Usually a stop codon terminates translation of a protein or peptide.

The terms "promoter element", "promoter" or "promoter sequence" as used herein, refer to a nucleotide sequence that is generally located at the 5' end (that is, precedes, located upstream) of the coding sequence and functions as a switch, activating the expression of a coding sequence. If the coding sequence is activated, it is said to be transcribed. Transcription generally involves the synthesis of an RNA molecule (such as, for example, an mRNA) from a coding sequence. The promoter, therefore, serves as a transcriptional regulatory element and also provides a site for initiation of transcription of the coding sequence into mRNA. Promoters may be derived in their entirety from a native source, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic nucleotide segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions, or at various expression levels. Promoters which cause a gene to be expressed in most cell types at most times are commonly referred to as "constitutive promoters". Promoters that drive gene expression in a specific tissue are called "tissue specific promoters".

The term "expression", as used herein, refers to the production of a desired end-product molecule in a target cell. The end-product molecule may include, for example an RNA molecule; a peptide or a protein; and the like; or combinations thereof.

As used herein, the terms "introducing" and "transfection" may interchangeably be used and refer to the transfer of molecules, such as, for example, nucleic acids, polynucleotide molecules, vectors, and the like into a target cell(s), and more specifically into the interior of a membrane-enclosed space of a target cell(s). The molecules can be "introduced" into the target cell(s) by any means known to those of skill in the art, for example as taught by Sambrook et al. (Sambrook et al., 1989), the contents of which are incorporated by reference herein. Means of "introducing" molecules into a cell include, for example, but are not limited to: heat shock, calcium phosphate transfection, Poly-ethylenimine (PEI) transfection, electroporation, lipofection, transfection reagent(s), viral-mediated transfer, and the like, or combinations thereof. The transfection of the cell may be performed on any type of cell, of any origin, such as, for example, human cells, animal cells, plant cells, and the like. The cells may include isolated cells, tissue cultured cells, cell lines, cells present within an organism body, and the like. Transfection may be "stable", where the introduced DNA is incorporated into the genome of the cell, or "transient", where the introduced DNA is not incorporated into the genome of the cell and may eventually disappear.

The term "selectable marker" is directed to include a marker that can allow the identification and/or isolation of cells expressing the marker, out of a population of cells. In some embodiments, the selectable marker is a resistance gene (resistance marker), which may confer the cells resistance to antibiotics, such as, but not limited to: puromycin-N-acetyl-transferase (puro), neomycin phosphotransferase (neo), hygromycin phosphotransferase (hygro), dihydrofolate reductase, Thymidine Kinase, and the like. In some embodiments, the selectable marker is a reporter marker, which may encode for a detectable marker, such as, for example, a detectable protein, such as, for example, but not limited to: LacZ, Green Fluorescent Protein (GFP), mCherry, mApple, DsRed, Red Fluorescent Protein (RFP), Blue Fluorescent Protein (BFP), EGFP, CFP, YFP, AmCyan1, ZsGreen1, ZsYellow1, DsRed2, AsRed2, and HcRed1.

The terms "self-cleavable peptide" and "self-cleaving peptide" may interchangeably be used. The terms refer to peptides which allow multiple proteins to be encoded as polyproteins, which can then dissociate into component proteins upon translation, for example, by a mechanism of ribosomal skipping. In some embodiments, the self-cleavable peptide is a T2A peptide. In some embodiments, the T2A peptide may be derived from various viruses or strains of viruses. In some embodiments, more than one copy of a T2A peptide may be used in the tagging vector. In some embodiments, the T2A peptide may include one or more modifications or changes in the sequence thereof (compared to a wild-type sequence). In some embodiments, the sequence of the T2A may include the nucleic acids: GGCAGTGGTGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTGGCCCA (SEQ ID NO: 13).
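As an illustration, the nucleotide sequence of SEQ ID NO: 13 as printed above can be translated with the standard genetic code to inspect the encoded peptide. The following Python sketch is not part of the disclosure; the names `CODON_TABLE`, `translate` and `T2A_DNA` are assumptions introduced here, and the sequence is taken verbatim from the text with the line-break space removed.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO: 13 as given in the text (63 nucleotides)
T2A_DNA = (
    "GGCAGTGGTGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGA"
    "GAATCCTGGCCCA"
)

def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 0; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

peptide = translate(T2A_DNA)
print(len(T2A_DNA), peptide)  # 63 GSGEGRGSLLTCGDVEENPGP
```

The translated peptide contains no stop codon and ends in the NPGP motif characteristic of 2A peptides, where ribosomal skipping occurs between the final glycine and proline.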

The term "recombinant virus" refers to a virus that has been altered, for example, by the insertion of exogenous polynucleotides into the viral particle.

The term "AAV virion" refers to an Adeno-Associated virus particle that is capable of infecting target cells. In some embodiments, the AAV virions may be a recombinant virus. In some embodiments, the AAV virions may include a capsid encapsulating an exogenous (heterologous) polynucleotide sequence (such as, for example, a tagging vector). In some embodiments, the exogenous (heterologous) polynucleotide sequence in the AAV virion may be flanked by inverted terminal repeats (ITR) (such as a left ITR (L-ITR) and right ITR (R-ITR)).

The present invention in embodiments thereof provides for a tagging vector for endogenous epitope tagging of a target gene in a target cell, compositions comprising the same and methods of using the same.

In some embodiments, the tagging vector comprises various nucleotide sequences that allow endogenous epitope tagging of target genes. In some embodiments, the vector may include such sequences as, but not limited to: sequences that are homologous to endogenous chromosomal polynucleotide sequences flanking the target gene locus ("homology arms", such as left homology arm and right homology arm, the sequences of which may be determined based on the sequences flanking the endogenous gene locus), to direct the vector to the target gene; sequences that encode for a tag, such as a peptide tag; sequences that encode for selectable markers, such as resistance markers (for example, resistance to antibiotics) and/or reporter markers (such as fluorescent proteins); sequences that allow packaging or creation of infectious virions (such as AAV virions, including such sequences as inverted terminal repeats (ITRs) sequences); sequences that serve as restriction sites for restriction enzymes (multiple cloning sites (MCS) sequences); sequences that serve as transcriptional regulators (for example, promoters, IRES sequences); sequences that serve as translational regulators (for example, poly A signals); sequences that encode for self-cleavable peptides, to allow creation of polycistronic gene products; or any combination thereof.

At a minimum, the tagging vector comprises:

(i) a left homology arm (LHA) nucleotide sequence comprising a nucleotide sequence that is substantially homologous to the 5' region flanking the target gene locus;

(ii) a right homology arm (RHA) nucleotide sequence, comprising a nucleotide sequence that is substantially homologous to the 3' region flanking the target gene locus;

(iii) a nucleotide sequence encoding for a tag;

(iv) a nucleotide sequence encoding for a self cleavable peptide; and

(v) a nucleotide sequence encoding for a selectable marker.

The LHA is typically situated 5' to components iii, iv and v, whereas the RHA is typically positioned 3' to components iii, iv and v.

Components (iii), (iv) and (v) are not operationally linked to promoter sequences. In this way, once integrated into the target cell, both the selectable marker and the tag are only under control of the endogenous promoter of the target gene and never at any stage under control of an exogenous promoter.

A L-ITR and a R-ITR may be added to the vector, if the vector is introduced into the cell via a virus. The L-ITR is typically located 5' to the LHA and the R-ITR is typically located 3' to the RHA.

In one embodiment, the vector is devoid of recombination sites. In another embodiment, the selectable marker of the vector is flanked by recombination sites.

Reference is now made to Fig. 1A, which is a schematic illustration of an exemplary tagging vector, according to some embodiments. As shown in Fig. 1A, the tagging vector (shown in the form of a circular vector) may include the following nucleotide sequences (in the direction of 5'-3'): a left inverted terminal repeat ("L-ITR"), a poly A signal ("PolyA"), a Left homology arm ("LHA"), a sequence encoding for or serving as a tag ("Tag"); a self-cleavable peptide sequence ("SCP"), a recombination sequence/site and/or a recognition sequence/site (shown as "1^st rec. site"), a sequence encoding for a selectable marker ("Selectable marker"), a second recombination sequence and/or a recognition sequence/site (shown as "2^nd rec. site"), a right homology arm (RHA) and a right inverted terminal repeat ("R-ITR"). The tagging vector may be introduced into cells capable of making an infective virion that may then be used to infect/transfect target cells harboring the gene of interest that is to be tagged. When the tagging vector is introduced into the target cell, it is capable of being integrated into the genome of the target cell at a location dictated by sequence homology between the homology arms (left and right arms) of the tagging vector and the genomic sequences flanking the gene of interest. Following integration, the Tag is added at a correct location to the gene, such that the gene product would result, for example, in a chimeric protein comprising the Tag at the C-terminus. Similarly, by changing the relative location of the various sequences of the tagging vector, the Tag may be added to the N-terminus of the resulting gene product.

In order for the tag to be expressed at the N terminus of the protein encoded by the target gene, the homology arms should flank the translation initiation codon of the target gene and the selectable marker should be positioned upstream to the self-cleavable peptide sequence. In order for the tag to be expressed at the C terminus of the protein encoded by the target gene, the homology arms should flank the stop codon of the target gene and the selectable marker should be positioned downstream to the self-cleavable peptide sequence.
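The two arrangements described above can be summarized in a minimal Python sketch. The function name `cassette_layout` and the element labels are illustrative assumptions; optional elements such as ITRs, the poly A signal and recombination sites are omitted for brevity.

```python
def cassette_layout(terminus: str) -> dict:
    """Return the 5'->3' element order and the codon flanked by the
    homology arms for N- or C-terminal tagging (hypothetical helper)."""
    terminus = terminus.upper()
    if terminus == "C":
        # C-terminal tagging: arms flank the stop codon,
        # selectable marker downstream of the self-cleavable peptide (SCP)
        return {
            "arms_flank": "stop codon",
            "order": ["LHA", "Tag", "SCP", "Selectable marker", "RHA"],
        }
    if terminus == "N":
        # N-terminal tagging: arms flank the translation initiation codon,
        # selectable marker upstream of the SCP
        return {
            "arms_flank": "translation initiation codon",
            "order": ["LHA", "Selectable marker", "SCP", "Tag", "RHA"],
        }
    raise ValueError("terminus must be 'N' or 'C'")

for t in ("N", "C"):
    layout = cassette_layout(t)
    print(t, "|", layout["arms_flank"], "|", " - ".join(layout["order"]))
```

In both layouts the tag remains directly adjacent to the coding sequence of the target gene, while the self-cleavable peptide separates the tagged protein from the selectable marker.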

In addition to the Tag being expressed in the cell, a selectable marker (such as, antibiotic resistance gene, a fluorescent protein, and the like) is also expressed in the cell, allowing the selection of positive cells expressing the Tag-protein. The Tag and the selectable marker would both be expressed from a single reading frame, whereby the reading frame of the Tag and the selectable marker are separated by the cleavable peptide. The use of the cleavable peptide allows upon translation of the resulting transcript for the tagged protein and the selectable marker to be translated into two separate proteins. In some embodiments, the Tag sequence does not include a STOP codon. In some embodiments, the selectable marker does not include a stop codon. Optionally, for C terminus tagging, the selectable marker may be devoid of a start codon - see Figure 2G for example. In some embodiments, more than one selectable marker may be encoded by the tagging vector. In some embodiments, a poly A sequence is added upstream to the Left homology arm, to terminate transcription in the case the construct was randomly integrated in frame into a transcriptionally active region.
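The single-reading-frame requirement can be checked computationally before cloning. The following Python sketch is illustrative only: the function name `reading_frame_problems` is an assumption, the FLAG coding sequence shown is one common encoding of DYKDDDDK chosen for the example, and the marker is a short hypothetical placeholder rather than a real resistance gene. The self-cleavable peptide sequence is SEQ ID NO: 13 as printed in this description.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq: str) -> list:
    """Split a sequence into complete frame-0 codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def reading_frame_problems(tag: str, scp: str, marker: str) -> list:
    """List reasons why Tag-SCP-marker would not form one continuous frame."""
    problems = []
    for name, seq in (("tag", tag), ("SCP", scp), ("marker", marker)):
        if len(seq) % 3:
            problems.append(f"{name} length is not a multiple of 3")
    for name, seq in (("tag", tag), ("SCP", scp)):
        # the tag and the SCP must be read through, so no stop codon is allowed
        if any(c in STOP_CODONS for c in codons(seq)):
            problems.append(f"{name} contains an in-frame stop codon")
    # only a terminal stop codon is tolerated in the (last) marker
    if any(c in STOP_CODONS for c in codons(marker)[:-1]):
        problems.append("marker contains an internal stop codon")
    return problems

FLAG = "GACTACAAAGACGATGACGACAAG"   # assumed encoding of DYKDDDDK
T2A = ("GGCAGTGGTGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGA"
       "GAATCCTGGCCCA")            # SEQ ID NO: 13
TOY_MARKER = "ATGACCGAGTACAAGTGA"   # hypothetical placeholder ORF

print(reading_frame_problems(FLAG, T2A, TOY_MARKER))          # []
print(reading_frame_problems(FLAG + "TAA", T2A, TOY_MARKER))  # stop codon in tag
```

When a second selectable marker follows the first, the same check would be applied with the first marker treated like the tag, that is, without any stop codon.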

Reference is now made to Fig. 1B, which is a schematic illustration of an exemplary tagging vector, according to some embodiments. As shown in Fig. 1B, the tagging vector (shown in the form of a circular vector) may include the following nucleotide sequences (in the direction of 5'-3'): a left inverted terminal repeat ("L-ITR"), a Left homology arm ("LHA"), a sequence encoding for or serving as a tag ("Tag"), optionally without a stop codon; a self-cleavable peptide sequence ("SCP"), a recombination sequence/site and/or a recognition sequence/site (shown as "1^st rec. site"), a sequence encoding for a selectable marker ("Selectable marker"), optionally without a start codon, a second recombination sequence and/or a recognition sequence/site (shown as "2^nd rec. site"), a right homology arm (RHA) and a right inverted terminal repeat ("R-ITR"). In some embodiments, the tagging vector may further include an shRNA recognizable site, located between the selectable marker and the second recombination site.

Reference is now made to Fig. 1C, which is a schematic illustration of an exemplary tagging vector, for tagging at the N-terminus, according to some embodiments. As shown in Fig. 1C, the tagging vector (shown in the form of a circular vector) may include the following nucleotide sequences (in the direction of 5'-3'): a left inverted terminal repeat ("L-ITR"), a Left homology arm ("LHA"), a sequence encoding for a selectable marker ("Selectable marker"); a self-cleavable peptide sequence ("SCP"), a sequence encoding for or serving as a tag ("Tag"), optionally without a stop codon, a right homology arm (RHA) and a right inverted terminal repeat ("R-ITR"). Such a vector, when expressed in the cells, may be used to tag the N-terminus of a gene product, by forming at least two proteins in the cells: a selectable marker having an SCP sequence at its C terminus, and a tagged product having a tag at its N-terminus. In some embodiments, the tagging vector may further include an shRNA recognizable site, located between the selectable marker and the self-cleavable peptide sequence ("SCP").

Reference is now made to Fig. 1D, which is a schematic illustration of an exemplary tagging vector, according to some embodiments. As shown in Fig. 1D, the exemplary tagging vector (shown in the form of a circular vector) may include the following nucleotide sequences (in the direction of 5'-3'): a left inverted terminal repeat ("L-ITR"), a poly A signal ("PolyA"), a Left homology arm ("LHA"), a sequence encoding for or serving as a tag (shown as "FLAG Tag"); a self-cleavable peptide sequence (shown as "T2A"), a recombination sequence/site (shown as "LoxP"), a sequence encoding for a selectable marker (shown as resistance marker "Puro", conferring resistance to puromycin), a second recombination sequence (shown as "LoxP"), a right homology arm (RHA) and a right inverted terminal repeat ("R-ITR").

The tagging vector may be introduced into cells capable of making an infective virion that may then be used to infect/transfect target cells harboring the gene of interest that is to be tagged. When the tagging vector is introduced into the target cell, it is capable of being integrated into the genome of the target cell at a location dictated by sequence homology between the homology arms (left and right arms) of the tagging vector and the genomic sequences flanking the gene of interest. Following integration, the FLAG Tag (for example, a 3XFLAG tag) is added at a correct location to the gene, such that the gene product would result, for example, in a chimeric protein comprising the Tag at the C-terminus. Similarly, by changing the relative location of the various sequences of the tagging vector, the FLAG Tag may be added to the N-terminus of the resulting gene product. In addition to the FLAG Tag being expressed in the cell, an antibiotic resistance selectable marker (such as a puromycin resistance gene) is also expressed in the cell, allowing the selection of positive cells expressing the FLAG-Tag-protein. The FLAG-Tag and the antibiotic resistance selectable marker would both be expressed from a single reading frame, whereby the reading frames of the Tag and the antibiotic resistance selectable marker are separated by the cleavable peptide. The use of the T2A cleavable peptide allows, upon translation of the resulting transcript, for the tagged protein and the antibiotic resistance selectable marker to be translated into two separate proteins. In some embodiments, the FLAG-Tag sequence does not include a STOP codon. In some embodiments, a poly A sequence is added upstream to the Left homology arm, to terminate transcription in the case the construct was randomly integrated in frame into a transcriptionally active region. This is further illustrated in Fig. 2I, below.

Reference is now made to Fig. 1E, which is a schematic illustration of an exemplary tagging vector, according to some embodiments. As shown in Fig. 1E, the exemplary tagging vector (shown in the form of a circular vector) may include the following nucleotide sequences (in the direction of 5'-3'): a left inverted terminal repeat ("L-ITR"), a poly A signal ("PolyA"), a Left homology arm ("LHA"), a sequence encoding for or serving as a tag (shown as "FLAG Tag"); a self-cleavable peptide sequence (shown as "T2A"), a recombination sequence/site (shown as "Lox"), a sequence encoding for a first selectable marker (shown as resistance marker "Puro", conferring resistance to puromycin), a sequence encoding for a second selectable marker (shown as sequence encoding for a fluorescent protein, "EGFP"), a second recombination sequence (shown as "Lox"), a right homology arm (RHA) and a right inverted terminal repeat ("R-ITR"). The tagging vector may be introduced into cells capable of making an infective virion that may then be used to infect/transfect target cells harboring the gene of interest that is to be tagged. When the tagging vector is introduced into the target cell, it is capable of being integrated into the genome of the target cell at a location dictated by sequence homology between the homology arms (left and right arms) of the tagging vector and the genomic sequences flanking the gene of interest. Following integration, the FLAG Tag (for example, a 3XFLAG tag) is added at a correct location to the gene, such that the gene product would result, for example, in a chimeric protein comprising the Tag at the C-terminus. Similarly, by changing the relative location of the various sequences of the tagging vector, the FLAG Tag may be added to the N-terminus of the resulting gene product.
In addition to the FLAG Tag being expressed in the cell, an antibiotic resistance selectable marker (such as a puromycin resistance gene) and a fluorescent protein (such as EGFP) are also expressed in the cell, allowing the selection of positive cells expressing the FLAG-Tag-protein. The FLAG-Tag and the selectable markers would all be expressed from a single reading frame, whereby the reading frames of the FLAG Tag and the selectable markers are separated by the cleavable peptide. The use of the T2A cleavable peptide allows, upon translation of the resulting transcript, for the tagged protein and the antibiotic resistance selectable marker to be translated into two separate proteins. In some embodiments, the FLAG-Tag sequence does not include a STOP codon. In some embodiments, the first selectable marker (for example, the antibiotic resistance gene) does not include a STOP codon, so as to allow for expression of the second selectable marker. In some embodiments, a polyA sequence is added upstream to the Left homology arm, to terminate transcription in the case the construct was randomly integrated in frame into a transcriptionally active region.

In some embodiments, the tag includes a stretch of one or more amino acids that are inserted in-frame to the target gene product (protein) at the N-terminus or C-terminus of the protein, depending on the construction of the tagging vector. In some embodiments, the tag may be selected from, but not limited to: FLAG, 3XFLAG, His-tag, Myc-Tag, HA tag, poly-Arg tag, Strep-tag, S-tag, HAT tag, Calmodulin-binding peptide-tag, Cellulose-binding domain-tag, Streptavidin-binding peptide tag, Chitin-binding domain tag, Glutathione S-transferase tag, Maltose-binding protein (MBP) tag, fragment crystallizable (fc) region tag, and the like, or combinations thereof. In some embodiments, the tag may be identifiable by a specific antibody (such as, for example, an anti-FLAG antibody in the case of a FLAG tag). Hence, when the chimeric protein of interest is formed in the target cell (i.e. a target protein bearing a Tag), it may be identifiable and subjected to further research by using the Tag-specific antibody. The research may include, for example, immunoprecipitating the chimeric Tag-protein to identify endogenous binding partners of the target protein; performing biochemical analysis of the protein (for example, Western blotting, immunocytochemistry, immunohistochemistry, immunofluorescence, Northern blots, Southern blots, ELISA, radioimmunoassay, flow cytometry, and the like); performing Chromatin-IP experiments for proteins that may interact with or participate in transcriptional regulation; and the like.

In some embodiments, the selectable marker is a resistance gene (resistance marker), which may confer the cells resistance to antibiotics, such as, but not limited to: puromycin-N-acetyl-transferase (puro), neomycin phosphotransferase (neo), hygromycin phosphotransferase (hygro), dihydrofolate reductase, Thymidine Kinase, and the like, or combinations thereof. In some embodiments, the selectable marker is a reporter marker, which may encode for a detectable marker, such as, for example, a detectable protein, such as, for example, but not limited to: LacZ, Green Fluorescent Protein (GFP), mCherry, mApple, DsRed, Red Fluorescent Protein (RFP), Blue Fluorescent Protein (BFP), EGFP, CFP, YFP, AmCyan1, ZsGreen1, ZsYellow1, DsRed2, AsRed2, and HcRed1. In some embodiments, more than one selectable marker may be used. In some embodiments, at least one of the selectable markers is devoid of a STOP codon.

According to some embodiments, the recombination sites may include any appropriate recombination sequences, such as, but not limited to LoxP sequence, sequence, or combinations thereof. In some embodiments, the recombination sites may include any recognition site, which includes any nucleotide sequence that can function as a complementary binding site for oligonucleotides that may serve for nucleic acid molecule amplification (for example, PCR amplification), reverse transcription and/or sequencing, such as, for example, KSi sequence (comprising the nucleotide sequence CCTCGAGGTGGACGGTATCG (SEQ ID NO: 14)), T3 (comprising the nucleotide sequence GCGCAATTAACCCTCACTAAAG (SEQ ID NO: 15)), T7, SP6 or combinations thereof. Such recognition sequences may be used for identification of recombination, insertion and the like, and/or for in-vitro transcription. In some embodiments, the recombination sites may include any appropriate recombination sequences and/or recognition sites.

In some embodiments, the vector may further include a sequence recognizable by a shRNA. In some embodiments, the sequence of the shRNA can be of any gene that is not expressed in mammals. In some embodiments, introducing an shRNA recognition sequence to the tagging construct may be used to differentially knock down the expression of the tagged allele to decipher the role of recurrent mutations without ectopic expression. In some embodiments, the sequence recognizable by the shRNA is derived from EGFP. In some embodiments, the EGFP sequence comprises the sequence AACTACAACAGCCACAACGTCTATATC (SEQ ID NO: 16). In some embodiments, the sequence is recognized by an shRNA with the sequence TACAACAGCCACAACGTCTAT (SEQ ID NO: 17).
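The relationship between the two sequences can be verified directly. The short Python sketch below, offered as an illustration only, locates the shRNA sequence (SEQ ID NO: 17) within the EGFP-derived recognition sequence (SEQ ID NO: 16); the variable names are assumptions introduced here.

```python
EGFP_SITE = "AACTACAACAGCCACAACGTCTATATC"  # SEQ ID NO: 16 (27 nt)
SHRNA_SEQ = "TACAACAGCCACAACGTCTAT"        # SEQ ID NO: 17 (21 nt)

# 0-based offset of the shRNA sequence within the recognition site,
# or -1 if the shRNA sequence does not occur in it
offset = EGFP_SITE.find(SHRNA_SEQ)

print(len(EGFP_SITE), len(SHRNA_SEQ), offset)  # 27 21 3
```

The 21-nucleotide shRNA sequence is fully contained in the 27-nucleotide recognition sequence, with three flanking nucleotides on each side.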

According to some embodiments, the tagging vector may include various restriction enzyme recognition sites, to aid in the accommodation and adjustment of the tagging vector to various target genes. For example, the tagging vector may include one or more multiple cloning sites (MCS) to aid in the addition/insertion of the appropriate homology arm sequences so as to conform to the desired target gene. The target gene locus can be an intact gene, an exon, an intron, a regulatory sequence, any region between genes, or a combination thereof. The target gene locus may be located in the nucleus or mitochondria.

According to some embodiments, the tagging vector may be packaged in a delivery vehicle such as a virion. In some embodiments, the virion may be an AAV virion. In some embodiments, the AAV virions may be derived from any strain or serotype of AAV, such as, AAV-1, AAV-2, AAV-3, AAV-4, AAV-5, AAV-6, AAV-7 and/or AAV-8. The AAV virion can have one or more of the AAV wild-type genes deleted or truncated, so as to accommodate the tagging vector and form an infective virion that can be used to infect/transfect target cells, such as, somatic cells, including hard to transfect cells, such as, melanoma cells. The cells in which the tagging vector has been introduced successfully can be selected based on the appropriate selectable marker, for example, by exposure to the appropriate antibiotics (if such selectable marker was used). Further, cells which have been genetically modified (that is, in which integration at the target gene locus has occurred) can be identified by various means known in the art, such as, for example, PCR (by using specific primers that anneal to regions outside the homology regions), in situ hybridization, and the like.
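For the PCR-based identification mentioned above, primers annealing outside the homology regions are expected to yield amplicons of different sizes from the unmodified and the targeted allele. The following Python sketch only illustrates that arithmetic; the function name `expected_amplicons`, its parameters and all numeric values in the example are hypothetical and are not taken from the disclosure.

```python
def expected_amplicons(primer_to_lha: int, lha_len: int, replaced_len: int,
                       insert_len: int, rha_len: int, rha_to_primer: int) -> dict:
    """Expected PCR product sizes (bp) for primers located outside the
    homology arms. All lengths are supplied by the user (hypothetical helper).

    replaced_len: genomic bases replaced by the cassette (e.g. 3 for a stop codon)
    insert_len:   bases inserted between the arms (tag, SCP, marker, etc.)
    """
    flank = primer_to_lha + lha_len + rha_len + rha_to_primer
    return {
        "wild_type": flank + replaced_len,
        "targeted": flank + insert_len,
    }

# Hypothetical example: ~1 kb arms, primers 100 bp outside each arm,
# a stop codon replaced by an 800 bp cassette
sizes = expected_amplicons(100, 1000, 3, 800, 1000, 100)
print(sizes)  # {'wild_type': 2203, 'targeted': 3000}
```

A randomly integrated vector would not be amplified as the targeted product by such primers, since at least one primer site lies in genomic sequence absent from the vector.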

According to some embodiments, there is thus provided a tagging vector for endogenous epitope tagging of a gene of interest in a target cell, the tagging vector comprising a left homology arm (LHA) nucleotide sequence comprising a nucleotide sequence that is substantially homologous to the 5' region flanking the target gene locus; a right homology arm (RHA) nucleotide sequence, comprising a nucleotide sequence that is substantially homologous to the 3' region flanking the target gene locus; a nucleotide sequence encoding for a tag; a nucleotide sequence encoding for a self-cleavable peptide; and a nucleotide sequence encoding for a selectable marker. In some embodiments, the vector may further include a left inverted terminal repeat (L-ITR) sequence of adeno associated virus (AAV) and a right inverted terminal repeat (R-ITR) sequence of AAV. In some embodiments, the vector may further include a poly adenylation (PolyA) nucleotide sequence located upstream to the left homology arm. In further embodiments, the vector may further include recombination sites and/or recognition sequences flanking the selectable marker sequence (such as, LoxP sites, KSi site, T3 site, T7 site, SP6 site, and the like). In some embodiments, the tag may be selected from FLAG, 3XFLAG, His-tag, Myc-Tag, HA tag, poly-Arg tag, Strep-tag, S-tag, HAT tag, Calmodulin-binding peptide-tag, Cellulose-binding domain-tag, Streptavidin-binding peptide tag, Chitin-binding domain tag, Glutathione S-transferase tag, Maltose-binding protein (MBP) tag, fragment crystallizable (fc) region tag, and the like, or combinations thereof. In some embodiments, the tag is devoid of a STOP codon. In some embodiments, the selectable marker is a resistance gene that may be selected from, but not limited to: Puromycin-N-acetyl-transferase (puro), neomycin phosphotransferase (neo), hygromycin phosphotransferase (hygro), dihydrofolate reductase, Thymidine Kinase, or combinations thereof.
In some embodiments, the self-cleavable peptide is T2A peptide. In some embodiments, the self-cleavable peptide may be located downstream to the sequence encoding the tag. In some embodiments, the vector may further include a nucleotide sequence encoding for a second selectable marker. In some embodiments, the second selectable marker may be selected from: LacZ, Green Fluorescent Protein (GFP), mCherry, mApple, DsRed, Red Fluorescent Protein (RFP), Blue Fluorescent Protein (BFP), EGFP, CFP, YFP, AmCyan1, ZsGreen1, ZsYellow1, DsRed2, AsRed2, and HcRed1. In additional embodiments, the vector may be capable of being packaged in an AAV virion. According to some embodiments, the target cells may be of human, animal or plant source. In some embodiments, the cells are cancer cells, such as, melanoma cells.

According to some embodiments, there is provided a composition for endogenous epitope tagging of a target gene in a target cell, the composition comprising a tagging vector capable of endogenously epitope tagging a target gene.

According to some embodiments, there is provided a composition for endogenous epitope tagging of a target gene in a target cell, the composition comprising a tagging vector comprising: a left homology arm and a right homology arm, each of said homology arms comprising a polynucleotide sequence that is substantially homologous to the 5' and 3' regions flanking the target gene locus, respectively; a nucleotide sequence encoding for a tag; a nucleotide sequence encoding for a cleavable peptide; and a nucleotide sequence encoding for a selectable marker. In some embodiments, the tagging vector may further include a left inverted terminal repeat of AAV (L-ITR) and a right inverted terminal repeat of AAV (R-ITR) to aid in the formation of AAV virions capable of infecting target cells. In some embodiments, the tagging vector may further include a polyA sequence located upstream to the Left homology arm. In some embodiments, the tagging vector further comprises recombination sequences flanking the selectable marker. In some embodiments, the tagging vector may further include an additional selectable marker, located downstream to the first selectable marker. In some embodiments, the tag sequence does not include a STOP codon. In some embodiments, the sequence encoding for the self-cleavable peptide is located downstream to the tag encoding sequence. In some embodiments, when more than one selectable marker is used, the sequence encoding for the first (upstream) marker does not include a STOP codon.

According to some embodiments, there is provided a method for endogenous epitope tagging of a target gene in a target cell, the method comprising introducing the target cell with a tagging vector (encapsulated in a recombinant virus, such as, rAAV), comprising a left homology arm and a right homology arm, each of said homology arms comprise a polynucleotide sequence that is substantially homologous to the 5' and 3' regions flanking the target gene locus, respectively; a nucleotide sequence encoding for a tag; a nucleotide sequence encoding for a cleavable peptide; and a nucleotide sequence encoding for a selectable marker, whereby following introduction of the tagging vector to the target cell, the tag sequence is integrated into the target gene and the selection marker is expressed in the cells. In some embodiments, the homology arms comprise a polynucleotide sequence that is substantially homologous to the 5' and 3' regions flanking the start codon (ATG) and/or the STOP codon of the target gene locus, respectively. In some embodiments, the tagged gene is expressed in the cell to result in a gene product comprising a tagged polypeptide. In some embodiments, the epitope-tagged polypeptide endogenously formed by the method can be further characterized. Characterization of the tagged polypeptide formed by the method may include, for example, but not limited to: identify/determine protein-protein interactions: identify subcellular localization and cellular movement; identify functions and/or endogenous processes in which the tagged polypeptide is involved.

According to some exemplary embodiments, there is provided a method for endogenous epitope tagging of a target gene in a target cell, the method comprising one or more of the following steps:

Cloning of the homology arms (HAs) in a tagging vector (for example, any of the tagging vectors in Example 1).

Encapsulating the tagging vector in a recombinant virus (for example, in rAAV) in packaging cells, such as, for example, 293T cells.

Infection of the target cells harboring the target gene to be tagged.

Seeding the cells (for example, onto 96-well plates).

Performing selection with a selection agent corresponding to the selectable marker(s) of the tagging vector. In one example, the selection agent may be puromycin.

Consolidating colonies and identifying positive colonies by various screening methods (for example, by performing genomic PCR using primers directed to nucleotide regions within the selectable marker region of the tagging vector and primers located outside the HAs).

Extracting RNA from colonies identified to be positive, preparing cDNA and amplifying/sequencing the tagged allele to determine the genotype of the tagged allele.

Further checking cells identified to be positive, by various methods, such as, for example, by performing immunoprecipitation (IP) and/or Western Blot to test for tagged protein expression, for example using a specific antibody to the tag used.

Performing any desired analysis on the cells (for example, proteomics or any relevant studies).

According to some embodiments, any type of cell may be used as a target cell, including hard to transfect cells, such as, human cells, animal cells, plant cells, and the like.

In some embodiments, the cells are cancer cells, such as melanoma cells. In some embodiments, the cells are low passage cancer cells. In some embodiments, the cells are low passage cancer cells obtained from subjects afflicted with any type of cancer. In some embodiments, the cells are white blood cells (such as, for example, but not limited to: lymphocytes, myeloid, and the like). In some embodiments, the cells are stem cells.

Although the compositions and methods of the invention are exemplified for cancer cells, the compositions are applicable, for the tagging of any desired gene product in any target cell.

In some embodiments, there is provided a use of a tagging vector for endogenously tagging a target gene of interest in a target cell.

In further embodiments, there is provided the use of a tagging vector for the creation/preparation of a target cell having an endogenous target gene comprising an epitope tag.

In some embodiments, upon tagging the target gene, it may be further investigated, for example, by identifying binding partners (for example, by immunoprecipitation assays), testing its activity, testing its effect on various cellular traits, reducing or eliminating its activity or expression, and the like.

In some embodiments, the compositions and methods disclosed herein may be used to study the in-vivo knockdown of specific tagged alleles in cells.

In some embodiments, the compositions and methods disclosed herein may be used to double tag target genes in the same cell.

The following examples are presented to provide a more complete understanding of the invention. The specific techniques, conditions, materials, proportions and reported data set forth to illustrate the principles of the invention are exemplary and should not be construed as limiting the scope of the invention.

EXAMPLES

Example 1: Construction of a Tagging vector

In order to construct the tagging vector, various intermediate constructs were made and tested for their activity. The constructs were made by standard molecular cloning methods known in the art. A schematic description of the various constructs is depicted in Figures 2A-2K.

1. The Construct pTK-Puro-User (Fig. 2A) includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a 3XFlag tag, a loxP site, a thymidine kinase promoter (pTK) and Puromycin resistance gene (Puro). Restriction enzyme sites are further indicated in the figure. The pTK-Puro-User construct was constructed using two-step PCR: The pCDF1 vector (System Biosciences CD100A-1) was used as a template for the amplification of the Puro^r cassette. The cloning was performed via MluI and EcoRV sites. In addition, an NdeI site was inserted immediately downstream to the stop codon of the Puro^r cassette. The products were cloned into pTK-Neo-User digested with MluI and EcoRI. The following primers were used:

Step 1 PCR:

Fragment 1:

MluI_Puro_FW:

AGGTGACGCGTGTGGCCTCGAACACCGAGCGACCCTGCAGCCAATATGACCGAGTACAAGCCCAC (SEQ ID NO: 18)

NdeI_Puro_Rev: TATTGCCGATCCCCCATATGTCAGGCACCGGGCTTGCGGG (SEQ ID NO: 19)

Fragment 2:

NdeI_PolyA_LoxP_Fw: CATATGGGGGATCGGCAATAAAAAGAC (SEQ ID NO: 20)

pTK_seq_Rev: CGAGCTGAGGCATAGTCTAG (SEQ ID NO: 21)

Step 2 PCR:

MluI_Puro_FW:

AGGTGACGCGTGTGGCCTCGAACACCGAGCGACCCTGCAGCCAATATGACCGAGTACAAGCCCAC (SEQ ID NO: 22)

pTK_seq_Rev: CGAGCTGAGGCATAGTCTAG (SEQ ID NO: 23)
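The primer pairs listed for construct 1 are consistent with an overlap-extension design: the 3' end of Fragment 1 (defined by NdeI_Puro_Rev) and the 5' end of Fragment 2 (defined by NdeI_PolyA_LoxP_Fw) share sequence, so the two Step 1 products can be joined in Step 2 with the outer primers. The following Python sketch checks this on the primer sequences given above. The helper names (`revcomp`, `longest_overlap`) are illustrative only and are not part of the described method; the recognition sequences assumed for MluI (ACGCGT) and NdeI (CATATG) are the standard ones.

```python
# Illustrative check of the two-step PCR primers (SEQ ID NOs: 18-20).
# Helper names are hypothetical; only the primer sequences come from the text.

MLUI_PURO_FW = "AGGTGACGCGTGTGGCCTCGAACACCGAGCGACCCTGCAGCCAATATGACCGAGTACAAGCCCAC"
NDEI_PURO_REV = "TATTGCCGATCCCCCATATGTCAGGCACCGGGCTTGCGGG"
NDEI_POLYA_LOXP_FW = "CATATGGGGGATCGGCAATAAAAAGAC"

MLUI_SITE = "ACGCGT"  # standard MluI recognition sequence
NDEI_SITE = "CATATG"  # standard NdeI recognition sequence


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_overlap(left: str, right: str) -> str:
    """Return the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return right[:k]
    return ""


# The top-strand 3' end of Fragment 1 is the reverse complement of its reverse primer.
fragment1_end = revcomp(NDEI_PURO_REV)
overlap = longest_overlap(fragment1_end, NDEI_POLYA_LOXP_FW)

print("Fragment 1 / Fragment 2 overlap:", overlap, f"({len(overlap)} nt)")
print("NdeI site inside the overlap:", NDEI_SITE in overlap)
print("MluI site in MluI_Puro_FW:", MLUI_SITE in MLUI_PURO_FW)
# A TGA stop codon sits directly upstream of the NdeI site, as the text describes.
print("TGA directly before NdeI site:", "TGA" + NDEI_SITE in fragment1_end)
```

The shared region carries the NdeI site itself, which matches the statement that an NdeI site was placed immediately downstream of the Puro^r stop codon.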

2. The Construct pTK-Puro-EGFP-User (Fig. 2B) includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a 3XFlag tag, a loxP site, a thymidine kinase promoter (pTK) sequence, Puromycin resistance gene (Puro), EGFP encoding sequence, SV40 polyA signal (SV40), additional loxP site, right homology arm (RHA) and a right inverted repeat region (R-ITR). A fused Puro-EGFP sequence was amplified from the pCMV-Puro-GFP vector, including the polyA signal of the construct (SV40 polyA signal). This construct is based on the pEGFP-N1 (Clontech 6085-1). The Puro^r was cloned upstream to the EGFP via XhoI and HindIII sites. The MCS of the plasmid serves as a linker between both proteins. The cloning was performed via MluI and NdeI sites using the following primers:

MluI_Puro_FW:

CCCCCAGGTGACGCGTGTGGCCTCGAACACCGAGCGACCCTGCAGCCAATATGACCGAGTACAAGCCCAC (SEQ ID NO: 24)

EGFP_Rev: CCCCCCATATGAACTTGTGGCCGTTTACGTC (SEQ ID NO: 25)

3. The Construct PolyA-3Flag W/O stop codon-pTK-Puro-User (Fig. 2C) includes a left inverted repeat region (L-ITR), a polyA sequence, Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a thymidine kinase promoter (pTK) sequence, Puromycin resistance gene (Puro), a LoxP site, Right Homology arm (RHA) and a right inverted repeat region (R-ITR).

The pTK-Puro^r-User vector was digested with AgeI and filled in with Klenow. Then the construct was digested with EcoRI. The insert was amplified using pTK-Neo-User as a template. The PolyA signal sequence was included in the forward primer between two AgeI sites (sequences below). The PCR product was digested with EcoRI prior to the ligation.

AgeI_PolyA_FW:

ACCGGTAATAAAACGATAATAAACCGGTGCTGAGGGAAAGTC (SEQ ID NO: 26)

EcoRI_Flag_WO_TGA_Rev:CCCCCGAATTCCTTGTCATCGTCATCCTTGT (SEQ ID NO: 27)

4. The Construct T2A-Puro-User (Fig. 2D) includes a left inverted repeat region (L-ITR), a polyA sequence, Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a T2A (self-cleavable peptide) sequence, a LoxP site, Puromycin resistance gene (Puro), a second LoxP site, Right Homology arm (RHA) and a right inverted repeat region (R-ITR). To prepare this construct, the T2A and LoxP sequences were annealed and extended using a Taq Polymerase. A BglII site was added between the T2A and LoxP sequences. Then the insert was digested with EcoRI and MluI prior to the ligation to the vector. The sequences of the insert oligos were:

T2A_LoxP FW:

CCCCCGAATTCGGCAGTGGTGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTG (SEQ ID NO: 28)

T2A_LoxP Rev:

CCCCCACGCGTGATAACTTCGTATAATGTATGCTATACGAAGTTATGGATAGATCTGCTGGGCCAGGATTCTCCTCGACGTCA (SEQ ID NO: 29)

T2A sequence:

GGCAGTGGTGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTGGCCC (SEQ ID NO: 30)

5. The Construct T2A-Puro-User (Fig. 2E) includes a left inverted repeat region (L-ITR), a polyA sequence, Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a T2A (self-cleavable peptide) sequence, an EGFP encoding sequence, a LoxP site, Puromycin resistance gene (Puro), a second LoxP site, Right Homology arm (RHA) and a right inverted repeat region (R-ITR).

To prepare this construct, the fused Puro^r-EGFP is digested from the pTK-Puro^r-EGFP-User vector (Fig. 2B) by MluI and NdeI and ligated into the T2A-Puro^r-User construct (Fig. 2D) that is digested with the corresponding restriction enzymes.

6. The Construct pT2A-Puro-User W/O LoxP (also named herein pT2A-Puro-User 2^nd generation (Figs. 2F-2G)) includes a left inverted repeat region (L-ITR), an optional polyA sequence (excluded in Fig. 2G), Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a T2A (self-cleavable peptide) sequence, a KSi recognition sequence (site), Puromycin resistance gene (Puro) (excluding the start codon (AUG), encoding for Met), a T3 recognition sequence (site), Right Homology Arm (RHA) and a right inverted repeat region (R-ITR).

For excluding the AUG of the puro^r gene, the pT2A-Puro^r-User was digested with MluI and EcoRV. The puro^r resistance gene was amplified from the pCDF1 (System Biosciences CD100A-1) vector using the following primers:

Forward (5'-3'):

TATCGCACGCGTGTGGCCTCGAACACCGAGCGACCCTGCAGCCAATACCGAGTACAAGCCCACGGT (SEQ ID NO: 31)

Reverse (5'-3'):

TCAGCGATATCCCTAGGCCGCGGCTTTAGTGAGGGTTAATTGCGCCATATGTCAGGCACCGGGCTTGCGGGT (SEQ ID NO: 32)

The reverse primer includes a T3 sequence, which substitutes the LoxP located downstream to the RHA, to be used for genome PCR screen of the RHA to identify positive clones.

This intermediate plasmid was digested with BglII and MluI, and treated with Shrimp Alkaline Phosphatase, to exclude the LoxP site located downstream to the LHA. 100 pmole of each oligo were treated with T4 polynucleotide kinase, then annealed and ligated to the digested plasmid, to generate the pT2A-Puro^r-User 2^nd generation. The LoxP site was substituted with KSi sequence oligos as follows:

KSi FW: GATCTCCTCGAGGTGGACGGTATCGCA (SEQ ID NO: 33)

KSi Rev: CGCGTGCGATACCGTCCACCTCGAGGA (SEQ ID NO: 34)
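The KSi oligo pair can be checked for compatibility with the BglII/MluI-digested plasmid. The sketch below assumes that each oligo carries a 4-nucleotide 5' overhang once annealed (an assumption made here for illustration, not stated in the text) and verifies that the remaining bases pair exactly. BglII (A^GATCT) and MluI (A^CGCGT) are known to leave 5'-GATC and 5'-CGCG overhangs, respectively.

```python
# Illustrative check of the annealed KSi oligos (SEQ ID NOs: 33-34).
# OVERHANG is an assumption; the oligo sequences come from the text.

KSI_FW = "GATCTCCTCGAGGTGGACGGTATCGCA"
KSI_REV = "CGCGTGCGATACCGTCCACCTCGAGGA"
OVERHANG = 4  # assumed length of each single-stranded 5' end


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# After removing the assumed overhangs, the two oligos should be exact reverse complements.
duplex_top = KSI_FW[OVERHANG:]
duplex_bottom = KSI_REV[OVERHANG:]
pairs_exactly = duplex_top == revcomp(duplex_bottom)

print("Double-stranded core:", duplex_top, f"({len(duplex_top)} bp)")
print("Core pairs exactly:", pairs_exactly)
print("FW 5' overhang:", KSI_FW[:OVERHANG], "(matches a BglII-cut end)")
print("Rev 5' overhang:", KSI_REV[:OVERHANG], "(matches an MluI-cut end)")
```

Under this assumption the annealed duplex can ligate directionally into the BglII/MluI-cut plasmid without further digestion, which is consistent with the kinase-anneal-ligate procedure described.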

7. The Construct pT2A-Puro-User-N' (Figs. 2H-2I) is used for tagging the N-terminus of a target gene. This construct includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a Puromycin resistance gene (Puro), a T2A (self-cleavable peptide) sequence, a 3XFlag tag lacking a STOP codon, Right Homology Arm (RHA) and a right inverted repeat region (R-ITR).

The construction of the N'-tagging construct was performed by introducing minor modifications in the pT2A-Puro-User 2^nd generation. The Puro gene with a Kozak translation element was cloned upstream to the T2A peptide followed by the 3XFlag epitope. In this construct, the tagging cassette is inserted instead of the ATG of the target gene of interest. This construct is considered a polyadenylation signal trap, which should contribute to reducing the background caused by random integration into the genome. This is illustrated in Fig. 2I, which shows the expected end products formed in the cells.

8. The Constructs C'-3Xflag-sheGFP-User and N'-3Xflax-sheGFP-User (Figs. 2J-2K). The C'-3Xflag-sheGFP-User construct (Fig. 2J) includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a 3XFlag tag lacking a STOP codon, a T2A (self-cleavable peptide) sequence, a KSi recognition sequence (site), Puromycin resistance gene (Puro) (excluding a start codon (AUG)), EGFP sequence recognized by a specific shRNA (shg), a T3 recognition sequence (site), Right Homology Arm (RHA) and a right inverted repeat region (R-ITR).

The N'-3Xflax-sheGFP-User construct (Fig. 2K) includes a left inverted repeat region (L-ITR), Left Homology arm (LHA), a Puromycin resistance gene (Puro), EGFP sequence recognized by a specific shRNA (shg), a T2A (self-cleavable peptide) sequence, a 3XFlag tag lacking a STOP codon, Right Homology Arm (RHA) and a right inverted repeat region (R-ITR).

These constructs introduce an EGFP sequence, which is recognized by a specific shRNA, into the various tagging constructs described herein. Such constructs can be used for functional assays, to test the effect of various mutations on the function of target genes. Following tagging of the WT or the mutated alleles, the expression of the tagged allele may be selectively down-regulated, which allows identifying and studying the consequences on the cells (for example, proliferation, viability, signaling pathways, gene expression, and the like). The EGFP sequence is AACTACAACAGCCACAACGTCTATATC (SEQ ID NO: 16), and it is recognized by a shRNA with the sequence TACAACAGCCACAACGTCTAT (SEQ ID NO: 17) (sheGFP, "shg").
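As a quick consistency check on the two sequences given above, the following Python sketch locates the shRNA sequence (SEQ ID NO: 17) within the EGFP-derived target (SEQ ID NO: 16). The variable names are illustrative only.

```python
# Locate the sheGFP sequence inside the EGFP-derived target placed in the tagging cassette.
EGFP_TARGET = "AACTACAACAGCCACAACGTCTATATC"  # SEQ ID NO: 16
SHEGFP = "TACAACAGCCACAACGTCTAT"             # SEQ ID NO: 17

offset = EGFP_TARGET.find(SHEGFP)  # -1 would mean the shRNA sequence is absent
flank_5 = EGFP_TARGET[:offset]
flank_3 = EGFP_TARGET[offset + len(SHEGFP):]

print(f"shRNA sequence ({len(SHEGFP)} nt) starts at position {offset + 1} of the target")
print("5' flank:", flank_5, "| 3' flank:", flank_3)
```

The 21-nucleotide shRNA sequence is fully contained in the 27-nucleotide target, flanked by three nucleotides on each side.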

Example 2: Validating the function of the different elements of the T2A-Puro^r-User vector

The activity of the Puro^r cassette was first tested by transfecting the pTK-Puro^r-User and pTK-Puro^r-EGFP-User constructs into 293T cells. 293T cells were seeded onto a 6-well plate at a density of 5x10⁵ cells/well. 24 hours later, cells were transfected with 1 μg of either pEGFP-N1, pCMV-Puro-GFP or G-T2A-Puro, using 2 μl of the transfection reagent Turbofect (Thermo Scientific, R0531). 24 hours post transfection, cells were treated with 3 μg puromycin. Cell viability was estimated 24hrs later.

Hence, 48hrs following transfection, the cells were treated with 3 μg/ml puromycin (IC₅₀ concentration in 293T cells following 48h of treatment). The pCMV-Puro-GFP construct was used as positive control and non-transfected cells were used as negative control. The expression of the EGFP marker was monitored by fluorescent microscope. The results presented in Fig. 3 show that the tested constructs are able to express GFP in the cells, albeit to lower levels as compared to the control, due to the difference in the promoter used (TK promoter compared to CMV promoter).

Next, the cleaving activity of the T2A peptide was tested and the results are presented in Fig. 4. The cells were harvested 24 hours post transfection using RIPA buffer (10 mM Tris-Cl (pH 8.0), 1 mM EDTA, 0.5 mM EGTA, 1% Triton X-100, 0.1% sodium deoxycholate, 0.1% SDS, 140 mM NaCl). Protease inhibitors (Sigma) were freshly added to the buffer. The lysates were loaded on an acrylamide gel, then blotted using anti-GFP antibody.

The CMV enhancer and promoter region along with the EGFP were amplified from the pEGFP-N1 (Clontech (6085-1)) and cloned in-frame to the 3xFlag-T2A-Puro gene in the T2A-Puro-User vector ("G-T2A-P"). This construct mimics an endogenous tagged gene using the pT2A-Puro-User construct. The construct was transfected into 293T cells. To one plate, puromycin was added on the cells at a concentration of 3 μg/ml. The results are shown in Fig. 4, which shows pictograms of control cells or transfected cells, while demonstrating that the cells transfected with the G-T2A-P construct are resistant to puromycin.

In order to test the expression of the tag (3XFLAG) and the identification thereof by a specific antibody, cell lysate was prepared and subjected to Western Blot analysis with anti-GFP Ab or immunoprecipitated (IP) with anti-Flag antibody. The results are presented in Fig. 5: Cells were transfected with either pEGFP-N1 (G) or pCMV-Puro-GFP (P-G) as controls for the size of the EGFP and the Puro-GFP fusion protein. If the EGFP is cleaved from the Puro gene, a 37 kDa protein is obtained and identified. The uncleaved protein has a size of ~65 kDa. To check the recognition of the native 3xFlag, the transfected cells were washed with ice-cold 1x PBS and then resuspended with IP buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 1% NP-40, 1 mM Na3VO4, 1 mM NaF, 0.1% β-mercaptoethanol and 1x protease inhibitor). The cell lysates were incubated with 30 μl of Anti-Flag M2 Affinity gel beads (Sigma, A2220) at 4°C overnight. The beads were then washed with the binding buffer, boiled with loading buffer, loaded on an acrylamide gel and blotted with Rabbit anti-Flag antibody (Sigma, F7425).

As shown in Fig. 5, most of the T2A peptide was cleaved. The uncleaved protein is detectable only following long exposure (arrow, Figure 5, right panel). Since the cleavage of the T2A peptide adds 20 amino acids (a.a.) to the C-terminus of the 3xFlag, it was tested whether this addition may interfere with the binding of the anti-Flag antibody. To this aim, an immunoprecipitation (IP) experiment was performed on the cell lysates (detailed above), using anti-Flag Antibody (Ab). The resulting IP product was then blotted with either anti-GFP or anti-Flag antibodies. The results presented in Fig. 6 demonstrate that the recognition of the Flag peptide was not disrupted by the addition of the residual amino acids of the cleaved T2A peptide.
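The figure of 20 residual amino acids can be reproduced from the T2A nucleotide sequence given in Example 1 (SEQ ID NO: 30). The following Python sketch translates that sequence with the standard genetic code; the helper `translate` and the table construction are illustrative and not part of the described method. It relies on the known property of 2A peptides that separation occurs between the final glycine and proline, so every residue up to and including that glycine stays on the upstream (tagged) protein.

```python
# Translate the T2A coding sequence (SEQ ID NO: 30) with the standard genetic code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

T2A_DNA = "GGCAGTGGTGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTGGCCC"


def translate(dna: str) -> str:
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


residual = translate(T2A_DNA)          # residues left on the upstream protein
partial_codon = T2A_DNA[len(T2A_DNA) - len(T2A_DNA) % 3:]

print("Residual peptide:", residual, f"({len(residual)} a.a.)")
# The sequence ends in a partial codon; CCN encodes proline whatever the third base is.
print("Trailing partial codon:", partial_codon,
      "->", {CODON_TABLE[partial_codon + n] for n in BASES})
```

The translated sequence ends in the NPG motif followed by a proline codon, and its length agrees with the 20 amino acids stated above.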

Altogether, the results provided herein demonstrate that the various elements of the pT2A-Puro^r-User are functional (i.e. the Puro^r gene, the T2A cleaving peptide and the 3xFlag Tag).

Example 3 - Functional validation of the efficiency of the pT2A-Puro^r-User construct

The efficiency of the T2A-Puro^r-User vector was tested by tagging the RGS7 gene in two different melanoma cells, 53T and 67T cells. The homology arms of the gene were cloned into pT2A-Puro-User and the construct was packaged in a recombinant AAV vector (rAAV). The cells were infected with the virus and seeded onto 96-well plates (5000 cells/well) and then subjected to puromycin selection. The cells were monitored until large enough colonies were obtained for screening for positive colonies at both the genome and the mRNA levels by PCR. The results are presented in Figs. 7-8. In Fig. 7, the results of the genomic PCR are depicted. The top panel of Fig. 7 includes a schematic depiction of the locations of the amplified regions and the primers used to identify the left homology arm (LHA) region and the right homology arm (RHA) region. The regions were amplified using primers from the Flag-Puro cassette together with primers from genomic regions located outside the homology arms, to distinguish between homologous and random integration of the Flag-tag. The arrows represent the primer locations used to amplify the LHA and RHA. Solid and dashed arrows designate the Forward and Reverse primers, respectively. The middle panel of Fig. 7 shows the PCR products obtained with primers for the LHA region (arrow marks the expected PCR product of the LHA region) and the lower panel of Fig. 7 shows PCR products obtained with primers for the RHA region (arrow marks the expected PCR product of the RHA region). The PCR products of the genomic analysis were verified by sequencing. In Fig. 8, the results of the mRNA expression levels are depicted. Total RNA was extracted from colonies that were suspected to be positive, according to the genomic screening. After the reverse transcription reaction (RT), a PCR reaction was performed using a forward primer from an exon in the target gene that is not included in the LHA region and a reverse primer located within the selectable marker gene (Puro). The rectangles mark the expected PCR products. The PCR products of the expression analysis were verified by sequencing (using a primer from the tag (Flag) sequence).
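The logic of the genomic screen, pairing one primer inside the cassette with one primer outside the homology arm, can be illustrated with a toy in-silico PCR. In the Python sketch below all sequences are made-up placeholders (they are not the RGS7 locus or the actual cassette), primers are matched by exact string search, and the function name `amplicon_length` is hypothetical. It is meant only to show why a product is expected for a correctly targeted allele and not for a wild-type allele or a randomly integrated vector.

```python
# Toy in-silico junction PCR; every sequence here is an invented placeholder.

def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template: str, fwd: str, rev: str):
    """Product length if fwd lies upstream of the rev binding site on template, else None."""
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = revcomp(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


GENOMIC_UP = "GGATCCTTAGCAGTCAAGCT"      # genomic region outside the LHA
LHA = "TTGACCGGTAACTGGCATCA"
CASSETTE = "GACTACAAAGACCATGACGGTGATTATAAAGATCATGAC"  # stand-in for the Flag-Puro cassette
RHA = "CCTTAGGACTTGCAAGGTCA"
GENOMIC_DOWN = "AAGCTTGGCACTGATCCGTA"    # genomic region outside the RHA
OTHER_UP = "TCTCTCAGAGAGTCTCAGAG"        # unrelated locus (random integration)
OTHER_DOWN = "CACACAGTGTGTCACAGTGT"

templates = {
    "wild type": GENOMIC_UP + LHA + "TGA" + RHA + GENOMIC_DOWN,
    "targeted": GENOMIC_UP + LHA + CASSETTE + RHA + GENOMIC_DOWN,
    "random integration": OTHER_UP + LHA + CASSETTE + RHA + OTHER_DOWN,
}

# LHA screen: forward primer outside the LHA, reverse primer inside the cassette.
lha_fwd, lha_rev = GENOMIC_UP, revcomp(CASSETTE[:20])
# RHA screen: forward primer inside the cassette, reverse primer outside the RHA.
rha_fwd, rha_rev = CASSETTE[-20:], revcomp(GENOMIC_DOWN)

results = {
    name: (amplicon_length(seq, lha_fwd, lha_rev), amplicon_length(seq, rha_fwd, rha_rev))
    for name, seq in templates.items()
}
for name, (lha_product, rha_product) in results.items():
    print(f"{name:>18}: LHA product = {lha_product}, RHA product = {rha_product}")
```

Only the targeted allele yields both junction products, because each primer pair needs the cassette and the flanking genomic sequence on the same molecule.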

Example 4 - C-Terminus tagging of a target gene

To test the efficiency of the pT2A-Puro-User 2nd generation construct, the homology arms (HAs) of the NRAS target gene were cloned and the construct was infected into the A375 melanoma cell line, which expresses the WT form of NRAS. These cells were chosen as a proof of concept to test the efficiency of the construct because of their high infectability. Colonies were obtained and screened at the genomic DNA level 10 days following infection. As shown in Fig. 9A, using the RHA primers, a product of the expected length was identified (arrow).

Next, a total of 10 colonies were screened to obtain a positive colony. For further validation of the positive colony, RNA was extracted from the cells and cDNA was prepared and used as a template to amplify the tagged NRAS allele, as further demonstrated in Fig. 9B. The PCR product of colony 5 was sequenced.

Next, it was validated that the tagged NRAS allele is expressed at the protein level. Lysates were prepared from parental A375 and Col5 cells. The lysates were incubated with anti-Flag beads. The beads were then washed and loaded on an acrylamide gel and blotted with either anti-Flag or anti-NRAS antibodies. As shown in Fig. 9C, the tagged NRAS allele was pulled down with the anti-Flag beads and was enriched in the IP (immunoprecipitation) fraction only in the Col5 cell lysate and not in the parental A375 lysate, and was recognized by both the anti-Flag and anti-NRAS antibodies.

The results presented in Fig. 9A-C demonstrate that the constructs used can successfully tag an endogenous gene and reduce the background of false positive colonies. In addition, for rapidly propagating cells, the time for obtaining a tagged protein is reduced from at least three months to a month and a half from the day of infection. This was achieved by using the promoter trap system and avoiding the use of Cre-recombinase.

Example 5 - N-Terminus tagging of a target gene

In order to validate that the N'-tagging construct (pT2A-Puro-User-N') is functional, the tagging cassette was cloned upstream to an EGFP sequence (Fig. 10A). The expression of the EGFP and the puro gene were tested in 293T cells, compared to a Puro-EGFP fused construct as positive control. The results are shown in Fig. 10B. Additionally, an immunoprecipitation (IP) using anti-Flag beads was performed, and the protein membrane was blotted with anti-GFP and anti-Flag antibodies. The results are presented in Fig. 10C, which shows a ~30 kDa band obtained for the N'-Flag-GFP and a weak band of ~54 kDa (for the uncleaved protein), indicating that the native fused protein is recognized by the Flag antibody and that the cleaving efficiency of the T2A is higher than 90%.

Next, an endogenous target gene, NRAS, was used for N-tagging using the pT2A-Puro-User-N' construct. The NRAS gene was chosen as a candidate for N'-tagging due to the CAAX motif that is located before the stop codon, which is essential for post translational modifications and localization to the cell membrane. Homology arms (HAs) of NRAS, flanking the ATG translation start codon, were cloned into pT2A-Puro^r-User-N'. The tagging cassette was packaged for virus extraction, then A375 cells were infected with the virus at an MOI of 12000. 72h post infection, 2000 cells/well were seeded into 96-well plates. Once colonies covered 70% of the well surface, genomic DNA (gDNA) was extracted to screen for positive colonies at the genomic level. Thereafter, the suspected colonies were further analyzed for the tagged allele at the mRNA level, by sequencing the PCR product obtained using a Forward primer located within the tagging cassette and a Reverse primer located in an exon which is not included in the RHA of the tagging construct. The results are presented in Fig. 11A, wherein in the top panel, the expected PCR products are identified by the arrows. The lower panel of Fig. 11A shows the sequence of PCR products that were sequenced to validate the integration of the tagging cassette upstream and in frame to the NRAS gene. Next, the tagged protein itself was detected in the cells by Western Blotting (WB), following IP using anti-Flag agarose beads. Cell extract was prepared from two colonies and incubated with anti-Flag agarose beads. The beads were washed and the eluates were loaded on an acrylamide gel, then blotted with either anti-NRAS (Fig. 11B) or anti-Flag (Fig. 11C) antibodies. The results are shown in Fig. 11B-C, where bright arrows indicate the tagged protein and the dark arrows point to the light chain of the anti-Flag antibody of the agarose beads. The asterisk designates an unspecific band.

Example 6 - estimating the silencing efficiency of sheGFP sequence

In order to estimate the silencing efficiency of the sheGFP sequence, sheGFP was co-transfected with pEGFP-Nl plasmid expressing EGFP reporter gene, into 293T cells. The expression of the EGFP was monitored by fluorescent microscope (Fig. 12A) and by Western Blotting (Fig. 12B) using GFP antibody. As can be seen, transfection with sheGFP caused a reduction in the expression of the EGFP.

Example 7 - N-Terminus tagging of a target gene with a construct which includes sheGFP target sequence

The tagging of the NRAS target protein was performed essentially as described in Example 5, while using the N'-3Xflax-sheGFP-User construct. According to the genomic screen of the cells (shown in Fig. 13A), about 50% of the screened pools were positive. Then, various pools were validated at the mRNA level and sequenced (Fig. 13B), followed by immunoprecipitation to validate the expression of the tagged protein (Fig. 13C).

The results demonstrate the ability of the N'-3Xflax-sheGFP-User construct to successfully N-tag a target protein within the cell.

Example 8 - C-Terminus tagging of a target gene with a construct which includes sheGFP target sequence

TRRAP (transformation/transcription domain-associated protein) is part of the phosphoinositide 3-kinase-related kinases (PIKK) family (ATM, SMG-1, mTOR/FRAP and DNA-PKcs). TRRAP is a common component of many histone acetyltransferase (HAT) complexes and plays a role in transcription and DNA repair by recruiting HAT (STAGA) complexes to chromatin. The TRRAP gene was chosen for C-terminus tagging because it encodes a large protein of ~400 kDa, which makes it hard to clone and to study by ectopic expression.

For the tagging of the TRRAP gene, A375 cells harboring a somatic recurrent mutation in this gene, p.Ser722Phe, were infected with the C'-3xFlag-sheGFP-User tagging construct containing HAs flanking the stop codon of the gene. Three days post infection the cells were seeded into 96-well plates, 2000 cells/well. 24h later puromycin was added till colonies were obtained. 34 single colonies were screened at the genomic level. According to the PCR product size and the sequencing results one colony was found to be positive, demonstrating a tagging efficiency of 3%. The results of the genomic PCR screen are shown in Fig. 14, top panel. Sequencing validation of the positive colony PCR product is presented in the lower panel of Fig. 14.

The results demonstrate the ability of the C'-3xFlag-sheGFP-User construct to C-tag a target protein within the target cell.

Altogether, the results presented herein demonstrate that the provided constructs can indeed be used for specific endogenous tagging of target genes in various cells, including hard to transfect cells, such as melanoma cells. Furthermore, the construct enables the analysis of tested colonies at the mRNA levels without the need for Cre-recombinase (for example, by performing genomic PCR with specific primers). Additionally, in order to reduce background of non-specific expression, the stop codon of the Selectable Marker may be deleted to prevent the formation of cryptic promoters that may increase non-specific expression in cells that have not undergone site specific integration. The results further demonstrate the use of the constructs in tagging target genes both at the C-terminus and N-terminus and further allow studying the interactions of such target proteins, as well as their function, within the target cell.

Example 9 - Tagging low passage melanoma cells

Four different cell lines with different infection efficiencies were tagged. The results are summarized in Table 1. The tagging efficiency was approximately 20% even for polyploid cells like 12T and for the low infectable cells 17T and HOT.

Table 1

Cell | Genotype (NRAS) | NRAS alleles copy number | % of Infection | MOI | # of colonies screened (genome) | # of positive colonies | % of positive clones
A375 | WT | 2n | ~100 | 12x10⁴ | 70 | 28* | 40
106T | WT | 2n | ~100 | 12x10⁴ | 80 | 17* | 21
12T | Q61R | 4n | ~15 | 105 | 44 | 10* | 22
17T | Q61K | 2n | <1 | 105 | 96 | 13* | 13
HOT | Q61K | 2n | <1 | 105 | 96 | 17* | 18

Table 1: Efficient N-terminus tagging of NRAS in low passage melanoma cells. The table describes the allele copy number of NRAS in the cells, the infection efficiency, the multiplicity of infection (MOI), the number of colonies screened and the number of positive colonies. (*pools, ^#single colony).
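The percentages in Table 1 can be recomputed from the colony counts. The Python sketch below assumes that the last three numeric columns of the table are, in order, colonies screened, positive colonies and percent positive (this column assignment is an interpretation of the extracted table), and uses the cell names as they appear in the text.

```python
# Recompute percent-positive values from the colony counts in Table 1.
# Tuple layout (assumed): cell line, colonies screened, positive colonies, reported % positive.
TABLE_1 = [
    ("A375", 70, 28, 40),
    ("106T", 80, 17, 21),
    ("12T", 44, 10, 22),
    ("17T", 96, 13, 13),
    ("HOT", 96, 17, 18),
]


def percent_positive(screened: int, positive: int) -> float:
    """Tagging efficiency as a percentage of screened colonies (or pools)."""
    return 100.0 * positive / screened


computed = {cell: percent_positive(screened, positive)
            for cell, screened, positive, _ in TABLE_1}
mean_efficiency = sum(computed.values()) / len(computed)

for cell, _, _, reported in TABLE_1:
    print(f"{cell:>5}: computed {computed[cell]:5.1f}%  (reported {reported}%)")
print(f"Mean across cell lines: {mean_efficiency:.1f}%")

# Same calculation for the TRRAP screen of Example 8 (1 positive of 34 colonies).
print(f"TRRAP: {percent_positive(34, 1):.1f}%")
```

Each recomputed value falls within one percentage point of the reported one, and the mean across the five lines is in line with the approximately 20% efficiency stated above.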

To estimate the knockdown efficiency of the tagged allele, the cells were transfected with either a control plasmid (pLKO) or with a plasmid expressing sheGFP. The sequence recognized by the sheGFP was cloned in the tagging cassette. 72 hours post transfection, Western Blot analysis was performed on the cell lysates using anti-NRAS antibody. As illustrated in Fig. 15, the sheGFP caused a reduction of ~80% in the expression of the tagged allele.

Claims

WHAT IS CLAIMED IS:

1. A tagging vector for generating an endogenously tagged target gene of interest in a target cell, the vector comprising:

a left homology arm (LHA) nucleotide sequence comprising a nucleotide sequence that is substantially homologous to the 5' region flanking the target gene locus;

a right homology arm (RHA) nucleotide sequence, comprising a nucleotide sequence that is substantially homologous to the 3' region flanking the target gene locus;

a nucleotide sequence encoding for a tag;

a nucleotide sequence encoding for a self cleavable peptide; and

a nucleotide sequence encoding for a selectable marker, wherein following site specific integration of the tagging vector into the target cell, a polycistronic transcript is generated that is regulated by the endogenous promoter of said target gene, said polycistronic transcript comprising the tagged gene and said selectable marker.

2. The vector of claim 1, wherein neither said nucleotide sequence encoding for a selectable marker nor said nucleotide sequence encoding for a tag are operationally fused to a promoter sequence.

3. The vector of claims 1 or 2, being devoid of a promoter sequence.

4. The vector of claim 1 further comprising a left inverted terminal repeat (L-ITR) sequence of adeno associated virus (AAV) and a right inverted terminal repeat (R-ITR) sequence of AAV.

5. The vector of claim 1, further comprising a poly adenylation (PolyA) nucleotide sequence located upstream to the left homology arm.

6. The vector of claim 1, further comprising recombination site sequences and/or recognition site sequences flanking the selectable marker sequence.

7. The vector of claim 6, wherein the recombination sites comprise LoxP sequences.

8. The vector of claim 6, wherein the recognition sequences are selected from KSi, T3, T7, SP6, or combination thereof.

9. The vector of claim 1, wherein the left homology arm (LHA) nucleotide sequence comprises a nucleotide sequence that is substantially homologous to the 5' region flanking the start codon (ATG) of the target gene locus; and/or the right homology arm (RHA) nucleotide sequence comprises a nucleotide sequence that is substantially homologous to the 3' region flanking the stop codon of the target gene locus.

10. The vector of claim 1, further comprising a sequence recognizable by a shRNA.

11. The vector of claim 1, wherein the tag is selected from a FLAG tag, 3XFLAG tag, His-tag, Myc-tag, HA tag, poly-Arg tag, Strep-tag, S-tag, HAT tag, Calmodulin-binding peptide-tag, Cellulose-binding domain-tag, Streptavidin-binding peptide tag, Chitin-binding domain tag, Glutathione S-transferase tag, Maltose-binding protein (MBP) tag, fragment crystallizable (Fc) region tag, or any combinations thereof.

12. The vector of claim 1, wherein the selectable marker is a resistance marker.

13. The vector of claim 12, wherein the resistance marker encodes for a gene that confers resistance to antibiotics, selected from Puromycin-N-acetyl-transferase (puro), neomycin phosphotransferase (neo), hygromycin phosphotransferase (hygro), dihydrofolate reductase, Thymidine Kinase, or combinations thereof.

14. The vector of claim 1, wherein the self-cleavable peptide is T2A peptide.

15. The vector of claim 1, wherein the nucleotide sequence encoding for the self-cleavable peptide is located downstream to the sequence encoding the tag.

16. The vector of claim 1, wherein the sequence encoding for the tag is devoid of a STOP codon.

17. The vector of claim 1, further comprising a nucleotide sequence encoding for a second selectable marker.

18. The vector of claim 17, wherein the second selectable marker is selected from: LacZ, Green Fluorescent Protein (GFP), mCherry, mApple, DsRed, Red Fluorescent Protein (RFP), Blue Fluorescent Protein (BFP), EGFP, CFP, YFP, AmCyan1, ZsGreen1, ZsYellow1, DsRed2, AsRed2, and HcRed1.

19. The vector of claim 4, configured to be packaged in an AAV virion.

20. The vector of claim 1, wherein the cells are of human, animal or plant source.

21. The vector of claim 1, wherein the cells are cancer cells.

22. The vector of claim 21, wherein the cells are melanoma cells.

23. The vector of claim 21, wherein the cancer cells are low passage cancer cells.

24. The vector of claim 18, wherein the cells are stem cells.

25. A composition for endogenously tagging a target gene of interest in a target cell, the composition comprising the tagging vector of claim 1.

26. A method for endogenously tagging a target gene of interest in a target cell, the method comprising transfecting the target cell with the vector of claims 1 or 4, whereby following site specific integration of the tagging vector, a tagged gene is formed in the cell.

27. The method of claim 26, wherein the tagging is effected in the absence of a recombinase enzyme.

28. Use of the tagging vector of claim 1 for endogenously tagging a target gene of interest in a target cell.

29. Use of the tagging vector of claim 1, for the preparation of a target cell having an endogenous target gene comprising an epitope tag.