WO2023196220A2 - Method for genome-wide functional perturbation of human microsatellites using engineered zinc fingers - Google Patents

Publication number
WO2023196220A2
Authority
WO
WIPO (PCT)
Prior art keywords
ews
seq
zinc finger
ggaa
bold
Application number
PCT/US2023/017257
Other languages
French (fr)
Other versions
WO2023196220A9 (en)
WO2023196220A3 (en)
Inventor
J. Keith Joung
Y. Esther TAK
Miguel N. RIVERA
Gaylor Boulay
Original Assignee
The General Hospital Corporation
Application filed by The General Hospital Corporation

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85 Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C07K14/4701 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals not used
    • C07K14/4702 Regulators; Modulating activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/80 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
    • C07K2319/81 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor containing a Zn-finger domain for DNA binding

Definitions

  • the present invention relates to compositions comprising zinc fingers and methods of use thereof for the treatment of nucleotide repeat expansion disorders such as Ewing Sarcoma.
  • Nucleotide repeat expansion disorders involve the localized expansion of unstable repeats of sets of three, four, five, or more nucleotides and can result in loss of function of the gene in which the repeat resides, a gain of toxic function, or both. Expanded repeat regions within non-coding sequences can lead to aberrant expression of the gene while expanded repeats within coding regions (also known as codon reiteration disorders) may cause mis-folding and protein aggregation. The exact cause of the pathophysiology associated with the aberrant proteins is often not known.
  • Ewing sarcoma is an aggressive pediatric malignancy that likely arises from neural crest- or mesoderm-derived mesenchymal stem cells (MSCs). It is driven by oncogenic fusions between EWS and genes in the ETS family (mostly FLI1). EWS-FLI1 binds DNA either at ETS-like consensus sites containing a GGAA core motif or, more specifically with respect to other ETS family members, at GGAA microsatellites, where the enhancer activity increases with the number of consecutive GGAA motifs.
  • The human genome contains thousands of GGAA-microsatellites. As such, Ewing sarcoma is caused by the widespread activation of GGAA-microsatellites, which illustrates the need for therapeutic agents that are able to perturb these elements.
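  • Because enhancer activity is described as scaling with the number of consecutive GGAA motifs, it can help to see how such runs might be located and binned by repeat count. The following is a minimal sketch, not the method used in the disclosure: the function names, the toy sequence, and the minimum run length of two repeats are all illustrative assumptions, and only the forward strand is scanned.

```python
import re
from collections import Counter

# Two or more back-to-back GGAA units are treated as a microsatellite here.
# The minimum of 2 is an illustrative assumption, not a value from the text.
GGAA_RUN = re.compile(r"(?:GGAA){2,}")


def find_ggaa_runs(seq):
    """Return (start, n_repeats) for every run of consecutive GGAA units."""
    seq = seq.upper()
    return [(m.start(), len(m.group()) // 4) for m in GGAA_RUN.finditer(seq)]


def bin_by_repeat_count(runs):
    """Count how many runs contain each number of consecutive repeats."""
    return Counter(n for _, n in runs)


# Toy sequence: one isolated GGAA motif, a 5-repeat run and a 12-repeat run.
toy = "TTGGAATTC" + "GGAA" * 5 + "ACGT" + "GGAA" * 12 + "TT"
runs = find_ggaa_runs(toy)
print(runs)
print(bin_by_repeat_count(runs))
```

  • The isolated GGAA motif near the start of the toy sequence is deliberately not reported, which mirrors the distinction drawn above between single GGAA core motifs and GGAA microsatellites.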
  • Repeat elements can be dysregulated at genome-wide scale in human diseases.
  • In Ewing sarcoma, hundreds of normally inert GGAA tandem repeats can be converted into de novo transcriptional enhancers when bound by the EWS-FLI1 oncogenic fusion protein.
  • zinc finger arrays (ZFAs)
  • a fusion of a KRAB repression domain to a GGAA repeat-targeted ZFA could silence GGAA microsatellite enhancers genome-wide in Ewing sarcoma cells, thereby reducing expression of EWS-FLI1-activated genes.
  • this KRAB-ZFA fusion showed selective toxicity against Ewing sarcoma cell lines compared with other non-Ewing cancer cell lines, consistent with its Ewing sarcoma-specific impact on the transcriptome.
  • engineered zinc finger arrays comprising 6 zinc finger recognition regions, wherein the zinc finger array binds a target sequence of GGAAGGAAGGAAGGAAGG (SEQ ID NO:4) or AAGGAAGGAAGGAAGGAA (SEQ ID NO: 5).
  • the engineered zinc finger array comprises at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to an amino acid sequence set forth in any one of SEQ ID NOs:24-39.
  • the engineered zinc finger array comprises the amino acid sequence set forth in SEQ ID NO: 30.
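  • The sequence identity thresholds listed above (70% through 99%) can be illustrated with a simple calculation. This is only a sketch under stated assumptions: the text does not specify an alignment method, so the function below compares two sequences that are already aligned and of equal length, and the two short peptide strings are hypothetical placeholders rather than any of SEQ ID NOs: 24-39.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that are identical between two pre-aligned,
    equal-length sequences (no gap handling; an illustrative simplification)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a, seq_b, threshold):
    """True if the identity is at least the given percentage."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 7-residue strings used purely as placeholders.
reference = "RSDHLTT"
variant = "RSDHLTQ"
print(round(percent_identity(reference, variant), 1))
print(meets_threshold(reference, variant, 85))
```

  • Sequences of different lengths would first need to be aligned with a dedicated alignment tool; that step is outside the scope of this sketch.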
  • isolated cells comprising the zinc finger array according to any one of the aforementioned embodiments.
  • provided herein are isolated nucleic acids encoding the zinc finger array according to any one of the aforementioned embodiments.
  • provided herein is a vector comprising the isolated nucleic acid described above.
  • fusion protein comprising the zinc finger arrays according to any one of the aforementioned embodiments fused to a heterologous functional domain, with an optional intervening linker, wherein the linker does not interfere with activity of the fusion protein.
  • the heterologous functional domain is a transcriptional silencer or transcriptional repression domain.
  • the transcriptional repression domain is a Krueppel-associated box (KRAB) domain, ERF repressor domain (ERD), or mSin3A interaction domain (SID).
  • the transcriptional silencer is Heterochromatin Protein 1 (HP1).
  • isolated cells comprising the fusion protein according to any one of the aforementioned embodiments.
  • provided herein are isolated nucleic acids encoding the fusion according to any one of the aforementioned embodiments.
  • provided herein is a vector comprising the isolated nucleic acid described above.
  • Also provided herein are methods of reducing aberrant gene expression driven by activation of GGAA-microsatellites in a cell comprising contacting the cell with an effective amount of the fusion proteins as described above, or the isolated nucleic acid as described above. Also provided herein are methods of treating a subject who has a disease associated with aberrant gene expression driven by activation of GGAA-microsatellites in a cell, the method comprising administering to the subject an effective amount of a composition comprising the fusion proteins as described above, or the isolated nucleic acid as described above. In some embodiments of the methods described above, the subject has Ewing sarcoma.
  • the composition is administered by injection into or near a tumor, or by application after surgical resection. In some embodiments of the methods described above, the composition is administered by injection into or near a tumor, or by application before surgical resection. In some embodiments, the method of treating a subject further comprises treating a subject with one or more chemotherapy agents. In some embodiments, the chemotherapy is one of vincristine, doxorubicin, cyclophosphamide, ifosfamide, etoposide, or a combination thereof. In some embodiments, the composition is administered before radiation.
  • FIGs. 1A-1D Engineering ZFAs to bind GGAA microsatellites in the human genome and efficient activation of a target gene by engineered ZFAs fused to EWS.
  • FIG. 1A Schematic of 16 ZFAs, each engineered to bind ~4.5 GGAA microsatellites.
  • the ZFAs have six zinc fingers, and each finger recognizes three nucleotides.
  • the target sequences of ZFA 1 through 8 start with GGA, and ZFA 9 through 16 with AAG.
  • the amino acid compositions of recognition helices for each zinc finger are shown on the right. Multiple zinc fingers with different recognition helices can recognize the same nucleotides.
  • FIG. 1B
  • FIG. 1C 32 fusions of EWS and ZFAs that target GGAA repeats were tested for UGT3A2 gene activation in U2OS cells by nucleofection.
  • EWS-ZFA7 closely mimicked the activation level of EWS-FLI1 and therefore was selected for further experiments.
  • FIG. 1D mRNA expression of UGT3A2 in U2OS cells nucleofected with EWS-ZFA7, EWS-dCas9, dCas9-EWS or dCas9-based bi-partite EWS activators (dCas9-DmrA and DmrC-EWS).
  • the bi-partite system increases the density of EWS molecules recruited to a target site.
  • FIGs. 2A-2B Gene activation by dCas9-based EWS activators targeting specific promoters in the human genome.
  • FIG. 2A mRNA expression levels of the endogenous IL2RA, CD69, HBB, and HBG promoters in the presence of EWS-dCas9, dCas9-EWS or dCas9-based bi-partite EWS with single gRNAs (1, 2, and 3) or pooled gRNAs (all) targeting promoter sequences in U2OS cells. Relative expression of each gene was measured by RT-qPCR, normalized to HPRT levels and calculated relative to that of a control sample expressing a non-targeting gRNA.
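  • The caption above describes expression normalized to HPRT and expressed relative to a non-targeting gRNA control. The text does not state the formula used; a common convention for RT-qPCR is the 2^-ΔΔCt method, which is assumed in the sketch below. The function name and all Ct values are hypothetical, and the calculation assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change by the 2^-ddCt convention (assumed, not stated in the text).

    ct_target / ct_ref: Ct of the gene of interest and of the reference gene
    (e.g. HPRT) in the treated sample.
    ct_target_ctrl / ct_ref_ctrl: the same two Ct values in the control sample
    (e.g. cells expressing a non-targeting gRNA).
    """
    delta_sample = ct_target - ct_ref              # normalize to reference gene
    delta_control = ct_target_ctrl - ct_ref_ctrl   # same for the control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target amplifies 5 cycles earlier than in the
# control while the reference gene is unchanged, i.e. a 32-fold increase.
fold = relative_expression(ct_target=24.0, ct_ref=20.0,
                           ct_target_ctrl=29.0, ct_ref_ctrl=20.0)
print(fold)
```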
  • FIGs. 3A-3H Efficient and specific binding of EWS-ZFA at GGAA repeats in MSCs induces active chromatin and activation of GGAA repeat associated genes.
  • FIG. 3A GGAA repeat motifs identified at sites bound by EWS-ZFA in MSCs (AGGAAGGAAGGAAGGAAGGAAGGA, SEQ ID NO: 134).
  • FIG. 3C Bar plots showing the fraction of GGAA repeats in the genome bound by EWS-ZFA (left) and EWS-FLI1 (right) upon lentiviral transduction in MSCs. Data from one of two biological replicate experiments is shown. The number of consecutive GGAA repeats in each category is shown on the x-axis.
  • FIG. 3E Example showing the binding of 3xHA-tagged EWS-FLI1 or EWS-ZFA and accompanying H3K27ac levels in MSCs at the IGF2BP1 locus containing a GGAA repeat element and a canonical ETS binding site.
  • FIG. 3F Example showing the binding of 3xHA-tagged EWS-FLI1 or EWS-ZFA and accompanying H3K27ac levels in MSCs at the IGF2BP1 locus containing a GGAA repeat element and a canonical ETS binding site.
  • FIG. 3G Example showing the binding of 3xHA-tagged EWS-FLI1 or EWS-ZFA and accompanying H3K27ac levels in MSCs at the NIBAN3 and COLGALT1 loci containing a canonical ETS binding site.
  • FIG. 3H Example showing the binding of 3xHA-tagged EWS-FLI1 or EWS-ZFA and accompanying H3K27ac levels in MSCs at the NIBAN3 and COLGALT1 loci containing a canonical ETS binding site.
  • FIGs. 4A-4G Efficient binding of EWS-ZFA at GGAA repeats in MSCs and comparison of changes in H3K27ac ChIP-seq signals in MSCs after treatment with EWS-FLI1 and EWS-ZFA.
  • FIG. 4A GGAA repeat motifs identified at sites bound by EWS-ZFA from a second biological replicate experiment in MSCs (GAAGGAAGGAAGGAAGGAAGGAAG, SEQ ID NO: 135).
  • FIG. 4C Bar plot showing the number of GGAA repeat microsatellites genome-wide based on the number of consecutive GGAA repeats.
  • FIG. 4D Bar plots showing the fraction of GGAA repeats in the genome bound by EWS-ZFA (left) and EWS-FLI1 (right) upon lentiviral transduction in MSCs. The number of consecutive GGAA repeats in each category is shown on the x-axis. The data shown corresponds to the second of two biological replicate experiments.
  • FIG. 4E Bar plot showing the number of GGAA repeat microsatellites genome-wide based on the number of consecutive GGAA repeats.
  • FIG. 4F Protein expression levels of Control (Empty Vector), EWS-ZFA and EWS-FLI1 in MSCs. Protein levels were determined by immunoblotting using specific antibodies directed against HA. GAPDH was used as loading control.
  • FIG. 4G Protein expression levels of Control (Empty Vector), EWS-ZFA and EWS-FLI1 in MSCs. Protein levels were determined by immunoblotting using specific antibodies directed against HA. GAPDH was used as loading control.
  • FIGs. 5A-5H Binding of KRAB-ZFA to GGAA repeats induces selective toxicity in Ewing sarcoma cell lines by repressing target gene expression.
  • FIG. 5B Composite plot showing EWS-FLI1 occupancy of GGAA repeats after introduction of KRAB-ZFA or GFP (control) in SKNMC. The x axis represents a 10-Kb window centered on 812 GGAA repeats.
  • FIG. 5C The x axis represents a 10-Kb window centered on 812 GGAA repeats.
  • FIG. 5D Example showing the binding of KRAB-ZFA (3xHA-tagged), endogenous EWS-FLI1, H3K9me3 and H3K27ac at a GGAA repeat element associated with the CCND1 locus, after treatment of SKNMC cells with KRAB-ZFA constructs. GFP was used as control.
  • FIG. 5E Example showing the binding of KRAB-ZFA (3xHA-tagged), endogenous EWS-FLI1, H3K9me3 and H3K27ac at a GGAA repeat element associated with the CCND1 locus, after treatment of SKNMC cells with KRAB-ZFA constructs. GFP was used as control.
  • FIG. 5F Example showing the binding of KRAB-ZFA (3xHA-tagged), H3K9me3 and H3K27ac at a GGAA repeat element associated with the CCND1 locus, after the treatment of HEK293T cells with KRAB-ZFA construct. GFP was used as control.
  • FIG. 5G Example showing the binding of KRAB-ZFA (3xHA-tagged), H3K9me3 and H3K27ac at a GGAA repeat element associated with the CCND1 locus, after the treatment of HEK293T cells with KRAB-ZFA construct. GFP was used as control.
  • FIG. 5H Viability of Ewing sarcoma and non-Ewing cell lines 8 days post lentiviral transduction of KRAB-ZFA and GFP (control). Open circles indicate two biological replicates with three technical replicates, error bars show the s.e.m.
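  • The error bars in the viability panel are described as the s.e.m. across biological replicates, each with three technical replicates. The caption does not say how the replicates were combined; the sketch below assumes technical replicates are averaged within each biological replicate first, and the numbers are made up for illustration.

```python
import math
import statistics


def mean_and_sem(biological_replicates):
    """Mean and standard error of the mean across biological replicates.

    Each inner list holds the technical replicates of one biological replicate;
    they are averaged first (an assumption, not stated in the caption).
    """
    per_replicate = [statistics.mean(tech) for tech in biological_replicates]
    n = len(per_replicate)
    if n < 2:
        raise ValueError("need at least two biological replicates for an s.e.m.")
    sem = statistics.stdev(per_replicate) / math.sqrt(n)
    return statistics.mean(per_replicate), sem


# Hypothetical relative viability values: 2 biological x 3 technical replicates.
viability = [[0.38, 0.40, 0.42], [0.48, 0.50, 0.52]]
mean, sem = mean_and_sem(viability)
print(round(mean, 3), round(sem, 3))
```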
  • FIGs. 6A-6F Changes in EWS-FLI1 occupancy and chromatin states upon binding of KRAB-ZFA at GGAA repeats and ETS canonical binding sites in Ewing Sarcoma cell lines.
  • FIG. 6B Composite plot showing decreased EWS-FLI1 occupancy at GGAA repeat enhancers after introduction of KRAB-ZFA in A673. GFP was used as control.
  • the x-axis represents a 10-Kb window centered on 812 GGAA repeats.
  • FIG. 6C The x-axis represents a 10-Kb window centered on 812 GGAA repeats.
  • FIG. 6D Composite plots showing maintained EWS-FLI1 occupancy at canonical ETS binding sites after introduction of KRAB-ZFA in SKNMC and A673 cells. GFP was used as control.
  • FIG. 6F Boxplots showing changes in FLI1 (EWS-FLI1) ChIP-seq signals upon lentiviral induction of KRAB-ZFA in SKNMC and A673 cells at GGAA repeat microsatellites (blue, n
  • FIGs. 7A-7E KRAB-ZFAs can silence GGAA repeat-associated genes in Ewing Sarcoma cells but not in HEK293T.
  • FIG. 7B The results from two biological replicates.
  • FIG. 7D
  • FIG. 7E Protein levels of KRAB-ZFA and EWS-FLI1 across all cell lines tested (Figure 3a) were determined by immunoblotting using specific antibodies directed against HA (KRAB-ZFA) and FLI1 (EWS-FLI1). GAPDH was used as loading control.
  • Microsatellite repeats are a class of simple tandem repeats that previous studies have shown can be dysregulated in multiple disease states (Subramanian, Mishra, and Singh 2003; Malik et al. 2021; Trost et al. 2020; Usdin 2008).
  • Large-scale epigenetic dysregulation of microsatellite repeats has been observed in Ewing sarcoma, a pediatric bone tumor where the EWS-FLI1 translocation fusion protein operates as a transcriptional pioneer factor (Delattre et al. 1992; Riggi et al. 2014). This fusion includes both the N-terminal transactivation domain of EWS and the C-terminal DNA binding domain of FLI1.
  • EWS-FLI1 can bind to both non-repeat GGAA motifs and GGAA microsatellite repeats.
  • binding of EWS-FLI1 to the hundreds of GGAA microsatellites present throughout the human genome converts them into transcriptional enhancers, thereby inducing a tumor-specific gene regulatory program (Gangwal et al. 2008; Guillon et al. 2009; Riggi et al. 2014; Boulay et al. 2017).
  • This example together with the dysregulated expression of other repeat classes in other tumor types (Ting et al. 2011; Burns 2017), illustrates how aberrant transcriptional programs in cancer and other diseases can be caused by the widespread activation of specific repeat categories and highlights the need for robust tools to conduct genome-wide studies and perturbation of these elements.
  • Described herein are engineered ZFAs that can be used to efficiently target and alter the chromatin state of a class of microsatellite repeats in human cells.
  • EWS-FLI1 GGAA microsatellite repeats bound by EWS-FLI1
  • engineered EWS-ZFA fusion proteins targeted to these repeats can be over an order of magnitude more efficient than an EWS-dCas9-targeted fusion for activating a GGAA repeat previously shown to be converted into a de novo enhancer by EWS-FLI1.
  • EWS-ZFA fusions can effectively phenocopy the pioneer function of EWS-FLI1 at GGAA microsatellites and recapitulate the GGAA repeat-dependent chromatin landscape and gene expression profiles of Ewing sarcoma.
  • coupling of a GGAA repeat-targeted ZFA to a transcriptional repressor KRAB domain resulted in genome-wide silencing of GGAA microsatellites and cytotoxicity that was selective for Ewing sarcoma cells through the targeted inactivation of oncogenic gene expression programs.
  • Our results validate the power and efficacy of engineered ZF technology for targeting and altering the functional state of microsatellite repeats and illustrate how this platform can be deployed to interrogate the function of microsatellite repetitive elements at genome-scale.
  • exogenous nucleic acid sequence is a nucleic acid sequence that is not normally present in a cell, but can be introduced into a cell by one or more genetic, biochemical or other methods. Normal presence in the cell is determined with respect to the particular developmental stage and environmental conditions of the cell. Thus, for example, as used herein, an extrachromosomal DNA sequence that is introduced into the cell is an exogenous nucleic acid (even if part or all of that sequence is also present in the genome of the cell). Similarly, a nucleic acid sequence that is present only during embryonic development of muscle is an exogenous nucleic acid sequence with respect to an adult muscle cell.
  • a nucleic acid sequence induced by heat shock is an exogenous molecule with respect to a non-heat-shocked cell.
  • An exogenous nucleic acid sequence can comprise, for example, a functioning version of a malfunctioning endogenous gene.
  • an “endogenous” nucleic acid sequence is one that is normally present in a particular cell at a particular developmental stage under particular environmental conditions.
  • an endogenous nucleic acid can comprise a chromosome, the genome of a mitochondrion, chloroplast or other organelle, or a naturally-occurring episomal nucleic acid.
  • Nucleic acid refers to deoxyribonucleotides or ribonucleotides in either single- or double-stranded form.
  • the term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which can be synthetic, naturally occurring, or non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs).
  • nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide.
  • a “gene,” for the purposes of the present disclosure, includes a DNA region encoding a gene product (see infra), as well as all DNA regions which regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences.
  • a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions.
  • The terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues.
  • the terms apply to amino acid polymers in which one or more amino acid residue is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers.
  • amino acid refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids.
  • Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine.
  • Amino acid analog refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α-carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, and methionine methyl sulfonium.
  • Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid.
  • Ranges provided herein are understood to be shorthand for all of the values within the range.
  • a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 (as well as fractions thereof unless the context clearly dictates otherwise).
  • compositions comprising a zinc finger DNA-binding domain that specifically binds to a target site in any gene comprising a tetra-nucleotide repeat, e.g., GGAA.
  • zinc finger refers to a polypeptide comprising a DNA binding domain that is stabilized by zinc.
  • the individual DNA binding domains are typically referred to as “fingers.”
  • a zinc finger protein has at least one finger, preferably two fingers, three fingers, four fingers, five fingers, or six fingers.
  • a zinc finger protein having two or more zinc fingers is referred to as a “multi-finger” or “multi-zinc finger” protein or “multi-finger array” or “zinc finger array.”
  • Each finger typically comprises an approximately 30 amino acid, zinc-chelating, DNA-binding domain.
  • An exemplary motif characterizing one class of these proteins is X(2)-Cys-X(2,4)-Cys-X(12)-His-X(3-5)-His (SEQ ID NO: 1), where X is any amino acid, which is known as the “C(2)H(2)” class.
  • Zinc finger units are joined together by non-canonical (non-TGEKP) linkers such as TGSQKP (SEQ ID NO:2) or CGSQKP (SEQ ID NO:3).
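  • The C(2)H(2) motif above maps naturally onto a regular expression, with each X(n) or X(m,n) spacer becoming a bounded wildcard. The sketch below is one possible transcription; the function name is hypothetical and the three-finger test sequence is an illustrative example chosen for this sketch, not a sequence from the disclosure.

```python
import re

# X(2)-Cys-X(2,4)-Cys-X(12)-His-X(3-5)-His written as a regular expression,
# with "." standing in for X (any amino acid, one-letter code).
C2H2_MOTIF = re.compile(r".{2}C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(protein):
    """Return (start, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in C2H2_MOTIF.finditer(protein.upper())]


# Illustrative three-finger test sequence (an assumption, not from the text).
EXAMPLE = ("MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDHLTTHIRTH"
           "TGEKPFACDICGRKFARSDERKRHTKIHLRQKD")

for start, finger in find_c2h2_fingers(EXAMPLE):
    print(start, finger)
```

  • A pattern match of this kind only flags candidate fingers by spacing of the Cys and His residues; it says nothing about which nucleotides a finger recognizes.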
  • a single zinc finger of this C(2)H(2) class consists of an alpha helix containing the two invariant histidine residues coordinated with zinc along with the two cysteine residues of a single beta turn (Berg and Shi, Science 271:1081-1085 (1996)).
  • Each finger within a zinc finger array binds to about two to about five nucleotides within a DNA sequence.
  • A zinc finger array that includes three fingers typically recognizes a target site of 9 or 10 nucleotides; an array that includes four fingers typically recognizes a target site of 12 to 14 nucleotides; and an array having six fingers can recognize target sites of 18 to 21 nucleotides.
  • the zinc finger protein/array is a non-naturally occurring protein, in that it is engineered to bind to a target site of choice.
  • An engineered zinc finger binding domain can have a novel binding specificity, compared to a naturally-occurring zinc finger.
  • Engineering methods include, but are not limited to, rational design and various types of selection. Rational design includes, for example, using databases comprising tri-nucleotide sequences and individual zinc finger amino acid sequences, in which each tri-nucleotide sequence is associated with one or more amino acid sequences of zinc fingers which bind the particular tri-nucleotide sequence.
  • Engineered zinc finger proteins are non-naturally occurring zinc finger proteins whose recognition helices have been altered (e.g., by selection and/or rational design) to bind to a pre-selected target site.
  • Any of the zinc finger arrays described herein may include 1, 2, 3, 4, 5, 6 or more zinc fingers, each zinc finger having a recognition helix that binds to a target subsite in the selected sequence(s) (e.g., gene(s)).
  • the recognition helix is non-naturally occurring.
  • the zinc finger proteins have the recognition helices shown in FIG. 1A.
  • the DNA binding domain is an engineered zinc finger array including four to six fingers that is capable of recognizing target sites of 12 to 18 nucleotides (e.g., a zinc finger array having 6 fingers that recognizes target sites of 18 nucleotides).
  • Each zinc finger within the array is designed to target a trinucleotide sequence.
  • each zinc finger is designed to recognize GGA, AGG, AAG, or GAA. Therefore, when the zinc finger array is appropriately assembled, the zinc finger array can recognize sequences such as GGAAGGAAGGAAGGAAGG (SEQ ID NO:4) or AAGGAAGGAAGGAAGGAA (SEQ ID NO: 5).
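  • To make the assembly rule above concrete, the 18-nucleotide targets can be split into consecutive trinucleotide subsites and each subsite checked against the four triplets named in the text. This is a minimal sketch with hypothetical function names; it works purely on the DNA string and does not model which finger in the array contacts which subsite or in what orientation.

```python
# The four trinucleotide subsites named in the text.
ALLOWED_SUBSITES = {"GGA", "AGG", "AAG", "GAA"}


def split_into_subsites(target):
    """Split a target site into consecutive 3-nucleotide subsites."""
    target = target.upper()
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


def is_assemblable(target, n_fingers=6):
    """True if the target has one allowed subsite per finger."""
    subsites = split_into_subsites(target)
    return len(subsites) == n_fingers and all(s in ALLOWED_SUBSITES for s in subsites)


SEQ_ID_NO_4 = "GGAAGGAAGGAAGGAAGG"
SEQ_ID_NO_5 = "AAGGAAGGAAGGAAGGAA"

for name, seq in (("SEQ ID NO:4", SEQ_ID_NO_4), ("SEQ ID NO:5", SEQ_ID_NO_5)):
    print(name, split_into_subsites(seq), is_assemblable(seq))
```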
  • FIG. 1A is a schematic of 16 different ZFAs, each engineered to bind ~4.5 GGAA microsatellites.
  • the ZFAs each have six zinc fingers, and each finger recognizes three nucleotides.
  • the target sequences of ZFAs 1 through 8 start with GGA, and ZFAs 9 through 16 with AAG.
  • the amino acid compositions of recognition helices for each zinc finger are shown on the right side of FIG. 1A. Multiple zinc fingers with different recognition helices can in certain instances recognize the same nucleotides.
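  • The two families of target sequences (starting with GGA or with AAG) correspond to reading the same GGAA tandem repeat in different frames, and six fingers at three nucleotides each span 18 nucleotides, i.e. 4.5 repeat units. The sketch below illustrates this; the function names and the 8-repeat example are assumptions for illustration only.

```python
def target_from_repeat(offset, n_fingers=6, unit="GGAA"):
    """Read a 3*n_fingers-nucleotide window from a tandem repeat, starting
    `offset` nucleotides into the first repeat unit."""
    length = 3 * n_fingers
    tandem = unit * (length // len(unit) + 2)  # long enough for any offset < len(unit)
    return tandem[offset:offset + length]


def repeats_spanned(n_fingers=6, unit="GGAA"):
    """Number of repeat units covered by the array footprint."""
    return 3 * n_fingers / len(unit)


def count_sites(microsatellite, target):
    """Count (possibly overlapping) occurrences of target in the microsatellite."""
    last_start = len(microsatellite) - len(target)
    return sum(microsatellite.startswith(target, i) for i in range(last_start + 1))


print(target_from_repeat(0))   # frame starting with GGA
print(target_from_repeat(2))   # frame starting with AAG
print(repeats_spanned())

# Hypothetical microsatellite with 8 consecutive GGAA repeats.
eight = "GGAA" * 8
print(count_sites(eight, target_from_repeat(0)), count_sites(eight, target_from_repeat(2)))
```

  • Under these assumptions a longer microsatellite offers more overlapping copies of the same 18-nucleotide site, which is one simple way to picture why longer repeats could accommodate more bound molecules.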
  • Fusion proteins comprising DNA-binding proteins as described herein and a heterologous regulatory (functional) domain (or functional fragment thereof) are also provided.
  • Common domains include, e.g., transcriptional repressors (e.g., KRAB, ERD, SID, TGF-β-inducible early gene (TIEG), v-erbA, MBD2, MBD3, Rb, MeCP2, ROM2, AtHD2A, and others, e.g., amino acids 473-530 of the ets2 repressor factor (ERF) repressor domain (ERD), amino acids 1-97 of the KRAB domain of KOX1, or amino acids 1-36 of the Mad mSIN3 interaction domain (SID); see Beerli et al., PNAS USA 95:14628-14633 (1998)) or silencers such as Heterochromatin Protein 1 (HP1, also known as swi6), e.g., HP1α or HP1β; proteins or peptides that could
  • the fusion proteins include a linker between the zinc finger array and the heterologous functional domains. Domains could also be proteins that recruit (either directly or indirectly) other proteins in the cell that in turn can modulate gene expression.
  • linkers that can be used in these fusion proteins (or between fusion proteins in a concatenated structure) can include any sequence that does not interfere with the function of the fusion proteins.
  • the linkers are short, e.g., 2-20 amino acids, and are typically flexible (i.e., comprising amino acids with a high degree of freedom such as glycine, alanine, and serine).
  • the linker comprises one or more units consisting of GGGS (SEQ ID NO:6) or GGGGS (SEQ ID NO:7), e.g., two, three, four, or more repeats of the GGGS (SEQ ID NO:6) or GGGGS (SEQ ID NO:7) unit.
  • Other linker sequences can also be used.
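  • The linker description above (one or more GGGS or GGGGS units, typically 2-20 amino acids in total) can be sketched as a small helper that builds a linker and joins two domains. The function names are hypothetical, the length check simply mirrors the "e.g., 2-20 amino acids" guidance, and the domain strings are placeholders rather than real KRAB or ZFA sequences.

```python
LINKER_UNITS = ("GGGS", "GGGGS")  # SEQ ID NO:6 and SEQ ID NO:7 in the text


def build_linker(unit="GGGGS", repeats=3, max_length=20):
    """Concatenate `repeats` copies of a linker unit, enforcing a length cap
    taken from the 'e.g., 2-20 amino acids' guidance (an illustrative choice)."""
    if unit not in LINKER_UNITS:
        raise ValueError(f"unit must be one of {LINKER_UNITS}")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    linker = unit * repeats
    if len(linker) > max_length:
        raise ValueError("linker longer than the illustrative 20-residue cap")
    return linker


def join_with_linker(functional_domain, zinc_finger_array, linker=""):
    """Place an optional linker between a functional domain and a ZFA."""
    return functional_domain + linker + zinc_finger_array


linker = build_linker("GGGGS", 3)
# Placeholder strings stand in for real domain sequences.
fusion = join_with_linker("<KRAB>", "<ZFA>", linker)
print(linker, len(linker))
print(fusion)
```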
  • Indirect fusions include one or more dimerization systems (e.g., heterodimer systems containing DmrA and DmrC) that mediate coupling of different domains (e.g., DNA-binding domains and gene expression modulating domains), for example, by addition of a drug that induces activation of the dimerization systems.
  • the zinc finger fusion protein e.g., a zinc finger that targets GGAA repeats and a repressor domain
  • the nucleic acid encoding the zinc finger fusion protein can be cloned into an intermediate vector for transformation into prokaryotic or eukaryotic cells for replication and/or expression.
  • Intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle vectors, or insect vectors, for storage or manipulation of the nucleic acid encoding the zinc finger fusion protein for production of the zinc finger fusion protein.
  • the nucleic acid encoding the zinc finger fusion protein can also be cloned into an expression vector, for administration to a plant cell, animal cell, preferably a mammalian cell or a human cell, fungal cell, bacterial cell, or protozoan cell.
  • a sequence encoding a zinc finger fusion protein is typically subcloned into an expression vector that contains a promoter to direct transcription.
  • Suitable bacterial and eukaryotic promoters are well known in the art and described, e.g., in Sambrook et al., Molecular Cloning, A Laboratory Manual (3d ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 2010).
  • Bacterial expression systems for expressing the engineered protein are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., 1983, Gene 22:229-235). Kits for such expression systems are commercially available.
  • Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available.
  • the promoter used to direct expression of a nucleic acid depends on the particular application. For example, a strong constitutive promoter is typically used for expression and purification of fusion proteins. In contrast, when the zinc finger fusion protein is to be administered in vivo for gene regulation, either a constitutive or an inducible promoter can be used, depending on the particular use of the zinc finger fusion protein. In addition, a preferred promoter for administration of the zinc finger fusion protein can be a weak promoter, such as HSV TK or a promoter having similar activity.
  • the promoter can also include elements that are responsive to transactivation, e.g., hypoxia response elements, Gal4 response elements, lac repressor response element, and small molecule control systems such as tetracycline-regulated systems and the RU-486 system (see, e.g., Gossen & Bujard, 1992, Proc. Natl. Acad. Sci. USA, 89:5547; Oligino et al., 1998, Gene Ther., 5:491-496; Wang et al., 1997, Gene Ther., 4:432-441; Neering et al., 1996, Blood, 88:1147-55; and Rendahl et al., 1998, Nat. Biotechnol., 16:757-761).
  • the expression vector typically contains a transcription unit or expression cassette that contains all the additional elements required for the expression of the nucleic acid in host cells, either prokaryotic or eukaryotic.
  • a typical expression cassette thus contains a promoter operably linked, e.g., to the nucleic acid sequence encoding the zinc finger fusion protein, and any signals required, e.g., for efficient polyadenylation of the transcript, transcriptional termination, ribosome binding sites, or translation termination.
  • Additional elements of the cassette may include, e.g., enhancers, and heterologous spliced intronic signals.
  • the particular expression vector used to transport the genetic information into the cell is selected with regard to the intended use of the zinc finger fusion protein, e.g., expression in plants, animals, bacteria, fungus, protozoa, etc.
  • Standard bacterial expression vectors include plasmids such as pBR322 based plasmids, pSKF, pET23D, and commercially available tag-fusion expression systems such as GST and LacZ.
  • Expression vectors containing regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus.
  • eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
  • the vectors for expressing the zinc finger fusion protein can include RNA Pol III promoters to drive expression of the guide RNAs, e.g., the H1, U6 or 7SK promoters. These human promoters allow for expression of zinc finger fusion proteins in mammalian cells following plasmid transfection.
  • Some expression systems have markers for selection of stably transfected cell lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase.
  • High yield expression systems are also suitable, such as using a baculovirus vector in insect cells, with the gRNA encoding sequence under the direction of the polyhedrin promoter or other strong baculovirus promoters.
  • the elements that are typically included in expression vectors also include a replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of recombinant sequences.
  • Standard transfection methods are used to produce bacterial, mammalian, yeast or insect cell lines that express large quantities of protein, which are then purified using standard techniques (see, e.g., Colley et al., 1989, J. Biol. Chem., 264:17619-22; Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, 1977, J. Bacteriol.
  • nucleic acids encoding the fusion proteins, as well as cells, tissues, and transgenic animals comprising the nucleic acids and optionally expressing the fusion proteins.
  • Any nucleic acid construct capable of directing expression and/or which can transfer sequences to target cells can be used to administer the nucleic acid sequences described herein encoding either the exogenous nucleic acid sequence to be inserted within the target site or the zinc finger nuclease fusion proteins.
  • Nucleic acid sequences described herein can be delivered to cells with vector delivery systems, including viral vector delivery systems comprising DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
  • vector refers to nucleic acid molecules, usually double-stranded DNA, which may have inserted into it another nucleic acid molecule, such as a sequence encoding a nuclease fusion protein.
  • the vector is used to transport the inserted nucleic acid molecule into a suitable host cell.
  • a vector may contain the necessary elements that permit transcribing the inserted nucleic acid molecule, and translating the transcript into a polypeptide.
  • Once in the host cell, the vector may for instance replicate independently of, or coincidental with, the host chromosomal DNA, and several copies of the vector and its inserted nucleic acid molecule may be generated.
  • vector may thus also be defined as a gene delivery vehicle that facilitates gene transfer into a target cell.
  • This definition includes both non-viral and viral vectors.
  • gene delivery systems can be used to combine viral and non-viral components, such as nanoparticles or virosomes (Yamada et al. (2003) Nat Biotechnol. 21, 885-890).
  • Non-viral vectors include but are not limited to cationic lipids, liposomes, nanoparticles, PEG, PEI, etc.
  • Viral vectors are derived from viruses including but not limited to: retrovirus, lentivirus, adeno-associated virus, adenovirus, herpesvirus, hepatitis virus or the like.
  • viral vectors are replication-deficient as they have lost the ability to propagate in a given cell since viral genes essential for replication have been eliminated from the viral vector.
  • RNA or DNA viral based systems for the delivery of nucleic acids take advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus.
  • Viral vectors can be derived from lentivirus, adeno-associated virus, adenovirus, retroviruses and lentiviruses.
  • Conventional viral based systems for the delivery of nucleic acid sequences could include retroviral, lentiviral, adenoviral, adeno-associated, herpes simplex virus, and TMV-like viral vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
  • Retroviruses and lentiviruses are RNA viruses that have the ability to insert their genes into host cell chromosomes after infection. Retroviral and lentiviral vectors have been developed that lack the genes encoding viral proteins, but retain the ability to infect cells and insert their genes into the chromosomes of the target cell (Miller (1990) Mol Cell Biol. 10, 4239-4242; Naldini et al. (1996) Science 272, 263-267; VandenDriessche et al., (1999) Proc Natl Acad Sci USA. 96, 10379-10384).
  • lentiviral vectors can transduce both dividing and non-dividing cells whereas MLV-based retroviral vectors can only transduce dividing cells.
  • Adenoviral vectors are designed to be administered directly to a living subject. Unlike retroviral vectors, most of the adenoviral vector genomes do not integrate into the chromosome of the host cell. Instead, genes introduced into cells using adenoviral vectors are maintained in the nucleus as an extrachromosomal element (episome) that persists for an extended period of time. Adenoviral vectors will transduce dividing and nondividing cells in many different tissues (Chuah et al. (2003) Blood. 101, 1734-1743). Another viral vector is derived from the herpes simplex virus, a large, double-stranded DNA virus. Recombinant forms of the vaccinia virus, another dsDNA virus, can accommodate large inserts and are generated by homologous recombination.
  • Adeno-associated virus is a small ssDNA virus which infects humans and some other primate species, not known to cause disease and consequently causing only a very mild immune response. AAV can infect both dividing and non-dividing cells and may incorporate its genome into that of the host cell. These features make AAV a very attractive candidate for creating viral vectors for gene therapy, although the cloning capacity of the vector is relatively limited. In a specific embodiment described herein, the vector used is therefore derived from adeno associated virus.
  • the zinc finger fusion proteins described herein can be delivered to cells by conventional protein transduction methods known in the art.
  • one or more Nuclear Localization Signals (NLS) or protein transduction domains e.g., penetratin or transportan
  • Such methods are described, for example by Liu, J. et al, Molecular Therapy-Nucleic Acids (2015) 4, e232 and Gaj, T. et al, ACS Chem. Biol. 2014, 9, 1662-1667.
  • Cys2His2 zinc fingers themselves harbor intrinsic cell transduction properties. See, e.g., Gaj T, Guo J, Kato Y, Sirk SJ, Barbas CF 3rd. Nat Methods.
  • the zinc finger fusion proteins include a cell-penetrating peptide sequence that facilitates delivery to the intracellular space, e.g., HIV-derived TAT peptide or hCT derived cell-penetrating peptides, see, e.g., Caron et al., (2001) Mol Ther. 3(3):310-8; Langel, Cell-Penetrating Peptides: Processes and Applications (CRC Press, Boca Raton FL 2002); El-Andaloussi et al., (2005) Curr Pharm Des. 11(28):3597-611; and Deshayes et al., (2005) Cell Mol Life Sci. 62(16):1839-49.
  • CPPs Cell penetrating peptides
  • cytoplasm or other organelles e.g. the mitochondria and the nucleus.
  • molecules that can be delivered by CPPs include therapeutic drugs, plasmid DNA, oligonucleotides, siRNA, peptide-nucleic acid (PNA), proteins, peptides, nanoparticles, and liposomes.
  • CPPs are generally 30 amino acids or less, are derived from naturally or non-naturally occurring protein or chimeric sequences, and contain either a high relative abundance of positively charged amino acids, e.g.
  • CPPs that are commonly used in the art include Tat (Frankel et al., (1988) Cell. 55:1189-1193, Vives et al., (1997) J. Biol. Chem. 272:16010-16017), penetratin (Derossi et al., (1994) J. Biol. Chem. 269:10444-10450), polyarginine peptide sequences (Wender et al., (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008, Futaki et al., (2001) J. Biol. Chem. 276:5836-5840), and transportan (Pooga et al., (1998) Nat. Biotechnol. 16:857-861).
  • CPPs can be linked with their cargo through covalent or non-covalent strategies.
  • Methods for covalently joining a CPP and its cargo are known in the art, e.g. chemical cross-linking (Stetsenko et al., (2000) J. Org. Chem. 65:4900-4909, Gait et al. (2003) Cell. Mol. Life. Sci. 60:844-853) or cloning a fusion protein (Nagahara et al., (1998) Nat. Med. 4:1449-1453).
  • Non-covalent coupling between the cargo and short amphipathic CPPs comprising polar and non-polar domains is established through electrostatic and hydrophobic interactions.
  • CPPs have been utilized in the art to deliver potentially therapeutic biomolecules into cells. Examples include cyclosporine linked to polyarginine for immunosuppression (Rothbard et al., (2000) Nature Medicine 6(11):1253-1257), siRNA against cyclin B1 linked to a CPP called MPG for inhibiting tumorigenesis (Crombez et al., (2007) Biochem Soc. Trans. 35:44-46), tumor suppressor p53 peptides linked to CPPs to reduce cancer cell growth (Takenobu et al., (2002) Mol. Cancer Ther. 1(12):1043-1049, Snyder et al., (2004) PLoS Biol.
  • CPPs have been utilized in the art to transport contrast agents into cells for imaging and biosensing applications.
  • green fluorescent protein (GFP) attached to Tat has been used to label cancer cells (Shokolenko et al., (2005) DNA Repair 4(4): 511-518).
  • Tat conjugated to quantum dots has been used to successfully cross the blood-brain barrier for visualization of the rat brain (Santra et al., (2005) Chem. Commun. 3144-3146).
  • CPPs have also been combined with magnetic resonance imaging techniques for cell imaging (Liu et al., (2006) Biochem. and Biophys. Res. Comm.
  • zinc finger fusion proteins include a moiety that has a high affinity for a ligand, for example GST, FLAG or hexahistidine sequences (one or more hexahistidine sequences).
  • affinity tags can facilitate the purification of recombinant zinc finger fusion proteins.
  • the zinc finger fusion proteins do not include a NLS or hexahistidine sequence.
  • compositions and kits comprising the zinc finger fusion protein described herein.
  • the kits can also include one or more additional reagents, e.g., additional enzymes (such as RNA polymerases) and buffers, e.g., for use in a method described herein.
  • compositions comprising the zinc finger fusion proteins described herein as an active ingredient.
  • compositions typically include a pharmaceutically acceptable carrier.
  • pharmaceutically acceptable carrier includes saline, solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. Supplementary active compounds can also be incorporated into the compositions.
  • compositions are typically formulated to be compatible with their intended route of administration.
  • routes of administration include intrathecal, intraperitoneal, intraocular, oral, intravenous, intradermal, subcutaneous, intratumoral injection, administration by a gel for slow release, or an infusion pump.
  • solutions or suspensions used for administration to the eye, parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates and agents for the adjustment of tonicity such as sodium chloride or dextrose. pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide.
  • the parenteral preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of glass or plastic.
  • compositions suitable for injectable use can include sterile aqueous solutions (where water-soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersion.
  • suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, NJ) or phosphate buffered saline (PBS).
  • the composition must be sterile and should be fluid to the extent that easy syringability exists. It should be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi.
  • the carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof.
  • the proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use of surfactants.
  • Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like.
  • isotonic agents for example, sugars, polyalcohols such as mannitol, sorbitol, sodium chloride in the composition.
  • Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent that delays absorption, for example, aluminum monostearate and gelatin.
  • Sterile injectable solutions can be prepared by incorporating the active compound in the required amount in an appropriate solvent with one or a combination of ingredients enumerated above, as required, followed by filtered sterilization.
  • dispersions are prepared by incorporating the active compound into a sterile vehicle, which contains a basic dispersion medium and the required other ingredients from those enumerated above.
  • the preferred methods of preparation are vacuum drying and freeze-drying, which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.
  • Oral compositions generally include an inert diluent or an edible carrier.
  • the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules, e.g., gelatin capsules.
  • Oral compositions can also be prepared using a fluid carrier for use as a mouthwash.
  • Pharmaceutically compatible binding agents, and/or adjuvant materials can be included as part of the composition.
  • the tablets, pills, capsules, troches and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or lactose, a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.
  • the compounds can be delivered in the form of an aerosol spray from a pressured container or dispenser that contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer.
  • compositions may also be formulated to provide slow, controlled or sustained release of the active agent using, by way of example, hydroxypropyl methyl cellulose in varying proportions or other polymer matrices, liposomes and/or microspheres.
  • the pharmaceutical compositions described herein may contain opacifying agents and may be formulated so that they release the active agent only, or preferentially, in a certain portion of the gastrointestinal tract, optionally, in a delayed manner.
  • the active agent can also be in micro-encapsulated form, if appropriate, with one or more of the above-described excipients.
  • the methods described herein include methods for the treatment of disorders associated with GGAA tandem repeats.
  • the disorder is Ewing Sarcoma.
  • the disorder is prostate cancer (see, e.g., Kedage et al An Interaction with Ewing's Sarcoma Breakpoint Protein EWS Defines a Specific Oncogenic Mechanism of ETS Factors Rearranged in Prostate Cancer, Cell Reports 2016 Oct 25;17(5): 1289-1301, where dysregulation of GGAA repeats in prostate cancer due to TMPRSS2-ERG fusions is described).
  • the disorder is a tumor in which ETS factors have abnormal functions that may involve dysregulation of GGAA repeats (including hematopoietic malignancies with high levels of FLI1).
  • the methods include administering a therapeutically effective amount of the compositions comprising a zinc finger fusion protein as described herein, to a subject who is in need of, or who has been determined to be in need of, such treatment.
  • patient or “subject” refers to members of the animal kingdom including but not limited to human beings and “mammal” refers to all mammals, including, but not limited to human beings.
  • the terms “treat,” “treating,” “treatment,” and the like refer to reducing or ameliorating a disorder and/or symptoms associated therewith by any suitable dosage regimen, procedure and/or administration route of a composition, device or structure with the object of achieving a desirable clinical/medical end-point. It will be appreciated that, although not precluded, treating a disorder or condition does not require that the disorder, condition or symptoms associated therewith be completely eliminated. In specific embodiments, the terms “treat,” “treatment,” and “treating” refer to the amelioration of at least one measurable physical parameter of a proliferative disorder, such as growth of a tumor, not necessarily discernible by the patient.
  • the terms “treat,” “treatment,” and “treating” refer to the inhibition of the progression of a proliferative disorder, either physically by, e.g., stabilization of a discernible symptom, physiologically by, e.g., stabilization of a physical parameter, or both. In other embodiments the terms “treat,” “treatment,” and “treating” refer to the reduction or stabilization of tumor size or cancerous cell count.
  • Ewing's Sarcoma is a type of cancerous tumor that grows in the bones or the soft tissue around bones, such as cartilage or the nerves. It results from a translocation which fuses the EWS gene on chromosome 22 with the FLI1 gene on chromosome 11. The resultant fusion, EWS-FLI1, functions as a transcriptional activator.
  • Treatment for Ewing sarcoma usually begins with chemotherapy. The drugs may shrink the tumor and make it easier to remove the cancer with surgery or target with radiation therapy. After surgery or radiation therapy, chemotherapy treatments might continue in order to kill any cancer cells that might remain.
  • compositions described herein are administered to a subject in need thereof (e.g., intravenous (similar to other chemotherapy treatments currently used for Ewing’s Sarcoma), through infusion pump, or intratumoral injection) in a therapeutically sufficient amount to reduce tumor size or to kill tumor cells.
  • the compositions described herein are administered in a therapeutically sufficient amount to reduce the aberrant gene expression driven by activation of GGAA-microsatellites in a cell, which results because of the activity of EWS-FLI1.
  • compositions described herein can be used in combination with one or more other treatments that are typically used to treat Ewing’s Sarcoma.
  • chemotherapy agents e.g., vincristine, doxorubicin, cyclophosphamide, ifosfamide, and etoposide
  • radiation, surgery, or any combination thereof.
  • engineered ZFAs can be used to efficiently target and alter the chromatin state of a class of microsatellite repeats in human cells.
  • Primary bone marrow-derived MSCs were collected with approval from the Institutional Review Board of the Centre Hospitalier Universitaire Vaudois. Samples were de-identified prior to our analysis. MSCs were cultured in IMDM (Life Technologies) containing 10% fetal calf serum (FCS) and 10 ng/ml platelet-derived growth factor BB (PeproTech). U2OS were obtained from Toni Cathomen (Freiburg). All other cell lines were obtained from ATCC and media from Life Technologies. Ewing sarcoma cell lines SKNMC, A673, EW7 were grown in RPMI 1640 and CHP100 in McCoy’s 5a Medium.
  • HEK293T, HeLa and U2OS were grown in DMEM and MRC5 in EMEM. All media were supplemented with 1% penicillin and streptomycin (Life Technologies). McCoy’s 5a medium was supplemented with 15% FBS and all other media were supplemented with 10% FBS. Cells were cultured at 37 °C with 5% CO2. Media supernatant was analyzed biweekly for the presence of Mycoplasma using MycoAlert™ PLUS (Lonza). Cell lines were authenticated by ATCC STR profiling.
  • Each of the 16 different ZFAs that recognize ~4.5 GGAA tandem repeats was generated by assembling pre-selected 2-ZF units from an unpublished Joung lab archive. Although we used an unpublished archive of engineered zinc finger modules to provide the various 2-ZF units for constructing our ZFAs, there are other published public sources of zinc finger units as well as protocols that can be used to create customized zinc finger arrays (Sander et al. 2010; Fu et al. 2009; Wright et al. 2006; Sander et al. 2011; Maeder et al. 2008, 2009). The assembled ZFAs were inserted into the pENTR3C vector and EWS N-terminus (Riggi et al.
  • dCas9-EWS (NP173) was constructed by cloning EWS into BPK1179 digested with XhoI and NotI by Gibson assembly, and EWS-dCas9 (YET3486) was constructed by cloning EWS into pSQT digested with AgeI and BstZ17I by Gibson assembly.
  • DmrC-EWS was generated by inserting EWS into DmrC entry vector digested with NruI, using Gibson assembly. Sequences of gRNAs used in this study are provided in Table 2A.
  • Lentivirus was produced in HEK293T LentiX cells (Clontech) by LT1 (Mirus Bio) transfection with gene delivery vector and packaging vectors GAG/POL and VSV plasmids (Boulay et al. 2017). Viral supernatants were collected 72 h after transfection and concentrated using the LentiX concentrator (Clontech). Virus-containing pellets were resuspended in PBS and added dropwise on cells in the presence of growth media supplemented with 6 µg/ml polybrene.
  • Cells infected with lentivirus were selected using puromycin (Invivogen) at a concentration of 1 µg/ml for SKNMC, EW7, CHP100, HEK293T, HeLa and U2OS or 2 µg/ml for A673 and MRC5 in the growth medium. MSCs were selected with 0.75 µg/ml puromycin. Overexpression efficiency was determined by immunoblot analysis.
  • Immunoblot analyses were performed using standard protocols (Boulay et al. 2017). Primary antibodies were used at the following concentrations: rat anti-HA (Roche, 1 µg/ml), rabbit anti-FLI1 (Abcam, 1 µg/ml), and mouse anti-GAPDH (Millipore, 0.1 µg/ml). Secondary antibodies were goat anti-rabbit, goat anti-rat, and goat anti-mouse IgG respectively conjugated with horseradish peroxidase (Bio-Rad, 1:10,000 dilution). Membranes were developed using Western Lightning Plus-ECL enhanced chemiluminescence substrate (PerkinElmer) and visualized using photographic film.
  • qPCR was performed using Roche LightCycler480 with the following cycling protocol: initial denaturation at 95 °C for 20 seconds (s) followed by 45 cycles of 95 °C for 3 s and 60 °C for 30 s. Ct values over 35 were treated as 35. Relative quantification of each target, normalized to an endogenous control (GAPDH or HPRT1), was performed using the comparative Ct method (Applied Biosystems).
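A minimal sketch of the comparative Ct calculation follows, assuming the standard 2^(-ΔΔCt) form of the method with roughly 100% amplification efficiency. The function names and the example Ct values are hypothetical; only the cap at 35 comes from the text.

```python
# Illustrative sketch of the comparative Ct (2^-ddCt) calculation.
# Assumption: ~100% amplification efficiency, as in the standard method.
# Ct values above 35 are treated as 35, as stated in the text.

CT_CAP = 35.0


def cap_ct(ct: float) -> float:
    """Treat any Ct value over the cap as the cap itself."""
    return min(ct, CT_CAP)


def relative_quantity(ct_target_sample: float, ct_ref_sample: float,
                      ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of target in sample vs. control, normalized to a
    reference gene (e.g. GAPDH) measured in the same sample."""
    d_ct_sample = cap_ct(ct_target_sample) - cap_ct(ct_ref_sample)
    d_ct_control = cap_ct(ct_target_control) - cap_ct(ct_ref_control)
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    print(relative_quantity(25.0, 20.0, 28.0, 20.0))
```

Capping matters mostly for the control condition: a target that is undetected in control cells would otherwise yield an arbitrarily large fold change.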
  • Cells that were transduced with lentiviral KRAB-ZFA plasmid or GFP control plasmid were grown for 8 days and cell viability was measured using the CellTiter-Glo luminescent assay (Promega) as described by the manufacturer. Endpoint luminescence was measured on a SpectraMax M5 plate reader (Molecular Devices).
  • ChIP assays of MSCs, SKNMC, A673 and HEK293T cells were carried out using 2-5 x 10^6 cells per sample and per epitope, following the procedures described previously (Mikkelsen et al. 2007).
  • chromatin from formaldehyde-fixed cells was fragmented to 200-700 bp with a Branson 250 sonifier. Solubilized chromatin was immunoprecipitated overnight at 4 °C with 3 µg of target-specific antibodies (rat anti-HA (Roche), rabbit anti-FLI1 (Abcam), rabbit anti-H3K27ac (Active Motif), and rabbit anti-H3K9me3 (Abcam)).
  • Antibody-chromatin complexes were pulled down with protein G-Dynabeads (Life Technologies), washed, and then eluted. After crosslink reversal, RNase A, and proteinase K treatment, immunoprecipitated DNA was extracted with AMPure beads (Beckman Coulter). ChIP DNA was quantified with Qubit. Sequencing libraries were prepared with 1-5 ng of ChIP DNA samples and input samples using the Ovation Ultralow System V2 kit (Nugen). Libraries were sequenced with single-end (SE) 50-75 cycles on an Illumina NextSeq 500 instrument.
  • Reads were aligned to human reference genome hg19 using bwa (Li and Durbin 2009). Aligned reads were then filtered to exclude PCR duplicates and were extended to 200 bp to approximate fragment sizes. Density maps were generated by counting the number of fragments overlapping each position using igvtools, and normalized to 10 million reads. We used MACS2 (Zhang et al. 2008) to call peaks using matching input controls with a q-value threshold of 0.01. Peaks were filtered to exclude blacklisted regions as defined by the ENCODE consortium (ENCODE Project Consortium 2012). Peaks within 200 bp of each other were merged. Genome-wide GGAA microsatellite repeats were previously annotated (Boulay et al.
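Two of the post-processing steps described above, merging nearby peaks and scaling densities to 10 million reads, can be sketched as follows. This is not the pipeline actually used. The function names are hypothetical, coordinates are assumed to be half-open (start, end) integers, and "within 200 bp" is assumed to mean a gap of at most 200 bp between the end of one peak and the start of the next.

```python
# Illustrative sketch: merge peaks separated by at most `max_gap` bp and
# scale a raw fragment count to a fixed library size.
from collections import defaultdict


def merge_peaks(peaks, max_gap=200):
    """Merge (chrom, start, end) peaks on the same chromosome whose gap
    is <= max_gap. Returns a list sorted by chromosome name and start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start - cur_end <= max_gap:
                # Close enough (or overlapping): extend the current peak.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def normalize_density(fragment_count, total_reads, target=10_000_000):
    """Scale a per-position fragment count to `target` total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return fragment_count * target / total_reads


if __name__ == "__main__":
    demo = [("chr1", 100, 300), ("chr1", 450, 600),
            ("chr1", 1000, 1200), ("chr2", 50, 80)]
    print(merge_peaks(demo))
    print(normalize_density(12, 20_000_000))
```

Sorting within each chromosome first means a single left-to-right pass is enough, since any peak that could merge with the current one must come next in the sorted order.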
  • RNA libraries were prepared from 500 ng of total RNA treated with Ribo-Zero Gold to remove ribosomal RNA, using TruSeq Stranded Total RNA Library Prep Gold kit (Illumina, 20020599) and TruSeq RNA Single Indexes.
  • the RNA libraries were sequenced with PE 32 cycles on an Illumina NextSeq 500 system.
  • RNA samples were sent to Novogene Corporation for mRNA sequencing.
  • RNA libraries were sequenced with PE150 cycles on an Illumina NovaSeq 6000 system. Reads were aligned to hg19 using STAR (Dobin et al. 2013).
  • Mapped reads were filtered to exclude PCR duplicates and reads mapping to known ribosomal RNA coordinates, obtained from the rmsk table in the UCSC database (genome .
  • Gene expression was calculated using featureCounts (Liao, Smyth, and Shi 2014). Only primary alignments with mapping quality of 10 or more were counted. Counts were then normalized to 1 million reads. Signal tracks were generated using bedtools (Quinlan et al. 2010). Differential expression was calculated using DESeq2 (Love, Huber, and Anders 2014).
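The normalization of counts to 1 million reads can be illustrated with a short sketch. It assumes the library size is the sum of the counts passed in, which the text does not specify. The function name and the gene labels are hypothetical.

```python
# Illustrative sketch: scale raw per-gene counts to counts per million.
# Assumption: library size = sum of the supplied counts.


def counts_per_million(counts):
    """Map {gene: raw_count} to {gene: count scaled to 1e6 total reads}."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical gene labels and counts.
    print(counts_per_million({"geneA": 50, "geneB": 150}))
```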
  • Gene set overlaps were computed using Gene Set Enrichment Analysis (GSEA, gsea-msigdb.org/gsea/msigdb/annotate.jsp). Gene lists for GSEA analysis were selected using a log2 fold change of 0.6 for upregulated genes and -0.6 for downregulated genes. An adjusted p-value threshold of < 0.1 was also applied. Gene lists were then analyzed for overlaps with C2 (curated gene sets) and BP (GO biological process), with an FDR q-value < 0.05.
  • GSEA Gene Set Enrichment Analysis
  • Example 1.1 Engineering sequence-specific DNA-binding domains to target GGAA microsatellite repeats
  • dCas9 programmed by guide RNAs have similarly been used to create gene regulatory proteins that function efficiently in human cells (Qi et al. 2013; Maeder et al. 2013; Perez-Pinera et al. 2013) and offer the substantial additional advantage of simple targetability by altering the gRNA sequence.
  • TALE repeats have also been used to build customized DNA-binding domains, utilizing assembled arrays of four TALE repeat domains as “building blocks”, with each recognizing one of the four different DNA bases (Scholze and Boch 2011; Boch et al. 2009; Moscou and Bogdanove 2009).
  • EWS-ZFA7 EWS-ZFA7 fusion
  • dCas9-DmrA fusions harboring two, three or four DmrA domains could mediate modest activation of the UGT3A2 gene (mean fold-activation of 3.1, 3, or 1.1, respectively), levels much lower compared to the activation observed using the EWS-ZFA fusion (FIG. 1D).
  • these same dCas9-based EWS constructs were effective at activating various other genes when using different gRNAs directed to non-repetitive target sites within the promoters of those genes in U2OS cells and HEK293 cells (FIGs. 2A-2B).
  • Example 1.2 An EWS-ZFA fusion recapitulates genome-wide activation of microsatellite repeats observed in Ewing sarcoma
  • EWS-ZFA could target and activate GGAA microsatellites genome-wide by comparing its activity to EWS-FLI1 in mesenchymal stem cells (MSCs).
  • MSCs are a model for the cell of origin of Ewing sarcoma and EWS-FLI1 has previously been shown to operate as a pioneer factor at GGAA repeats in these cells to induce a chromatin landscape and gene expression pattern similar to that of tumor cells (Riggi et al. 2014).
  • GGAA repeats as the dominant motif found in EWS-ZFA peaks (more than 80% of EWS-ZFA binding sites contained more than four consecutive GGAA units, FIG. 3A, FIG. 4A, note alternate motifs include: GGAAGGAAGGAAGGAAGGAAGGAA (SEQ ID NO: 136) and AAGGAAGGAAGGAAGGAAGGAAGG (SEQ ID NO: 137)).
  • EWS-ZFA also bound nearly all of the GGAA repeats in the genome bound by EWS-FLI1 (FIG. 3B, FIG.
  • Example 1.3 KRAB-ZFAs can selectively silence the microsatellite-driven Ewing Sarcoma gene expression program
  • EWS-ZFA can efficiently target and activate GGAA microsatellites in MSCs
  • a fusion of our engineered ZFA to a repressive KRAB domain might conversely silence active GGAA microsatellites bound by endogenous EWS-FLI1 in Ewing sarcoma cells, thereby inactivating its downstream oncogenic gene expression program.
  • This approach offers the possibility to delineate the precise functional role of GGAA repeats in Ewing Sarcoma cells, in isolation from the non-repeat GGAA target sites of EWS-FLI1.
  • KRAB-ZFA binding was also associated with striking changes in chromatin states and the induction of repressive marks with increased H3K9me3 and decreased H3K27Ac signals at GGAA microsatellites (FIGs. 5A, 5C - 5D, FIGs. 6A, 6C). As expected, these changes were observed uniquely at GGAA repeats and not at non-repeat GGAA EWS-FLI1 binding sites, confirming the specificity of KRAB-ZFA (FIGs. 6D-6F).
  • HEK293T cells were largely devoid of active chromatin marks at GGAA repeats and that there were no major changes in H3K27Ac signals induced with KRAB-ZFA expression (FIG. 7C).
  • GGAA repeats in HEK293T cells accumulated strong repressive H3K9me3 signals after expression of KRAB-ZFA in the same manner as Ewing sarcoma cells (FIGs. 5E - 5F).
  • In contrast to Ewing sarcoma cells, HEK293T cells transduced with KRAB-ZFA displayed minimal transcriptional changes, which only included a handful of genes with GGAA repeats located within their promoters (FIG. 7D, Table 6).
  • engineered ZFAs are highly effective and specific tools for targeting widely distributed repetitive elements and altering their chromatin states.
  • Engineered ZFAs have distinct advantages for this purpose given their high DNA binding affinities, small size, and similarities to endogenous transcription factors.
  • Our findings further demonstrate that engineered ZFAs can greatly facilitate the functional assessment of the important but challenging-to-study repetitive elements of the human genome and may provide a strategy for therapeutically modifying the non-coding function of these repeats.
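The peak post-processing step mentioned in the methods excerpts above ("Peaks within 200 bp of each other were merged") can be illustrated with a minimal Python sketch. This is not the pipeline used in the application; the function name `merge_nearby_peaks`, the tuple layout, the toy coordinates, and the choice to treat a gap of exactly 200 bp as mergeable are all assumptions made for illustration.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end); the coordinate convention is an assumption.
Peak = Tuple[str, int, int]


def merge_nearby_peaks(peaks: List[Peak], max_gap: int = 200) -> List[Peak]:
    """Merge peaks on the same chromosome whose gap is at most max_gap bp."""
    merged: List[Peak] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Close enough to the previous peak: extend it instead of adding a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical toy peaks, not data from the application.
toy_peaks = [
    ("chr1", 100, 300),
    ("chr1", 450, 600),    # 150 bp after the first peak: merged
    ("chr1", 1000, 1200),  # 400 bp after the merged peak: kept separate
    ("chr2", 50, 150),
]
merged_peaks = merge_nearby_peaks(toy_peaks)
```

Sorting first keeps the pass linear after the sort and guarantees that only the most recently emitted peak needs to be compared with the next one.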


Abstract

Described herein are compositions comprising zinc fingers and methods of use thereof for the treatment of nucleotide repeat expansion disorders such as Ewing Sarcoma.

Description

Method for genome-wide functional perturbation of human microsatellites using engineered zinc fingers
CLAIM OF PRIORITY
This application claims priority under 35 USC §119(e) to U.S. Patent Application Serial No. 63/327,175, filed on April 4, 2022, the entire contents of which are hereby incorporated by reference.
FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
This invention was made with Government support under Grant Nos. GM118158, CA211707, CA204954, OD006862, GM105378, and CA231637 awarded by the National Institutes of Health. The Government has certain rights in the invention.
TECHNICAL FIELD
The present invention relates to compositions comprising zinc fingers and methods of use thereof for the treatment of nucleotide repeat expansion disorders such as Ewing Sarcoma.
BACKGROUND
Nucleotide repeat expansion disorders involve the localized expansion of unstable repeats of sets of three, four, five, or more nucleotides and can result in loss of function of the gene in which the repeat resides, a gain of toxic function, or both. Expanded repeat regions within non-coding sequences can lead to aberrant expression of the gene while expanded repeats within coding regions (also known as codon reiteration disorders) may cause mis-folding and protein aggregation. The exact cause of the pathophysiology associated with the aberrant proteins is often not known.
Ewing sarcoma is an aggressive pediatric malignancy that likely arises from neural crest- or mesoderm-derived mesenchymal stem cells (MSCs). It is driven by oncogenic fusions between EWS and genes in the ETS family (mostly FLI1). EWS-FLI1 binds DNA either at ETS-like consensus sites containing a GGAA core motif or, more specifically with respect to other ETS family members, at GGAA microsatellites, where the enhancer activity increases with the number of consecutive GGAA motifs.
The human genome contains thousands of GGAA-microsatellites. Ewing Sarcoma is caused by the widespread activation of GGAA microsatellites, which illustrates the need for therapeutic agents that are able to perturb these elements.
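As an illustration of what "consecutive GGAA motifs" means in sequence terms, the following Python sketch scans a DNA string for runs of tandem GGAA units and reports how many units each run contains. It is a simplified, forward-strand-only example; the function name `find_ggaa_repeats`, the default threshold of four units, and the toy sequence are assumptions and do not reproduce the annotation used in the application.

```python
import re
from typing import List, Tuple


def find_ggaa_repeats(seq: str, min_units: int = 4) -> List[Tuple[int, int]]:
    """Return (start, unit_count) for each run of at least min_units tandem GGAA units."""
    pattern = re.compile(r"(?:GGAA){%d,}" % min_units)
    return [
        (m.start(), (m.end() - m.start()) // 4)
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical toy sequence: a 5-unit run, a 2-unit run (below threshold), and an 11-unit run.
toy_sequence = "TTT" + "GGAA" * 5 + "CCC" + "GGAA" * 2 + "T" + "GGAA" * 11
ggaa_runs = find_ggaa_repeats(toy_sequence)
```

A genome-scale annotation would also need to handle the reverse strand (TTCC runs) and imperfect repeats, which this sketch deliberately leaves out.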
SUMMARY
Repeat elements can be dysregulated at genome-wide scale in human diseases. For example, in Ewing sarcoma, hundreds of normally inert GGAA tandem repeats can be converted into de novo transcriptional enhancers when bound by the EWS-FLI1 oncogenic fusion protein. Here we show that fusions of GGAA repeat-targeted engineered zinc finger arrays (ZFAs) to the EWS domain can function at least as efficiently as EWS-FLI1 for converting hundreds of GGAA repeats into active enhancers in an Ewing sarcoma precursor cell model. Furthermore, a fusion of a KRAB repression domain to a GGAA repeat-targeted ZFA could silence GGAA microsatellite enhancers genome-wide in Ewing sarcoma cells, thereby reducing expression of EWS-FLI1-activated genes. Remarkably, this KRAB-ZFA fusion showed selective toxicity against Ewing sarcoma cell lines compared with other non-Ewing cancer cell lines, consistent with its Ewing sarcoma-specific impact on the transcriptome. These findings demonstrate the value of ZFAs for functional annotation of repeats and illustrate how aberrant microsatellite activities might be regulated for potential therapeutic applications.
In some embodiments, provided herein are engineered zinc finger arrays comprising 6 zinc finger recognition regions, wherein the zinc finger array binds a target sequence of GGAAGGAAGGAAGGAAGG (SEQ ID NO:4) or AAGGAAGGAAGGAAGGAA (SEQ ID NO: 5). In some embodiments, the engineered zinc finger array comprises at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to an amino acid sequence set forth in any one of SEQ ID NOs:24-39. In some embodiments, the engineered zinc finger array comprises the amino acid sequence set forth in SEQ ID NO: 30. In some embodiments, provided herein are isolated cells comprising the zinc finger array according to any one of the aforementioned embodiments.
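The relationship between the two 18-nucleotide target sequences and the six recognition regions can be checked with a short Python sketch. It only performs arithmetic on the sequences quoted above (SEQ ID NO:4 and SEQ ID NO:5): it splits each into six 3-nucleotide sub-sites and confirms that each is a window of a perfect GGAA tandem repeat. The helper names are hypothetical, and the split is not a statement about which finger contacts which triplet or in which orientation.

```python
from typing import Dict, List

# Target sequences as quoted in the text above.
TARGETS: Dict[str, str] = {
    "SEQ ID NO:4": "GGAAGGAAGGAAGGAAGG",
    "SEQ ID NO:5": "AAGGAAGGAAGGAAGGAA",
}


def triplets(site: str) -> List[str]:
    """Split a target site into consecutive 3-nucleotide sub-sites."""
    return [site[i:i + 3] for i in range(0, len(site), 3)]


def is_ggaa_repeat_window(site: str) -> bool:
    """True if the site occurs inside a perfect GGAA tandem repeat."""
    reference = "GGAA" * (len(site) // 4 + 2)  # long enough for any reading frame
    return site in reference


summary = {
    name: {
        "length": len(site),
        "triplets": triplets(site),
        "ggaa_units": len(site) / 4,  # 18 nt corresponds to 4.5 GGAA units
        "repeat_window": is_ggaa_repeat_window(site),
    }
    for name, site in TARGETS.items()
}
```

Both sites are 18 nucleotides long, i.e. six 3-nucleotide sub-sites spanning 4.5 GGAA units, and they differ only in the reading frame at which the window starts (GGA versus AAG).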
In some embodiments, provided herein are isolated nucleic acids encoding the zinc finger array according to any one of the aforementioned embodiments. In some embodiments, provided herein is a vector comprising the isolated nucleic acid described above.
In some embodiments, provided herein are fusion proteins comprising the zinc finger arrays according to any one of the aforementioned embodiments fused to a heterologous functional domain, with an optional intervening linker, wherein the linker does not interfere with activity of the fusion protein. In some embodiments, the heterologous functional domain is a transcriptional silencer or transcriptional repression domain. In some embodiments, the transcriptional repression domain is a Krueppel-associated box (KRAB) domain, ERF repressor domain (ERD), or mSin3A interaction domain (SID). In some embodiments, the transcriptional silencer is Heterochromatin Protein 1 (HP1).
In some embodiments, provided herein are isolated cells comprising the fusion protein according to any one of the aforementioned embodiments.
In some embodiments, provided herein are isolated nucleic acids encoding the fusion protein according to any one of the aforementioned embodiments. In some embodiments, provided herein is a vector comprising the isolated nucleic acid described above.
Also provided herein are methods of reducing aberrant gene expression driven by activation of GGAA-microsatellites in a cell, the method comprising contacting the cell with an effective amount of the fusion proteins as described above, or the isolated nucleic acid as described above. Also provided herein are methods of treating a subject who has a disease associated with aberrant gene expression driven by activation of GGAA-microsatellites in a cell, the method comprising administering to the subject an effective amount of a composition comprising the fusion proteins as described above, or the isolated nucleic acid as described above. In some embodiments of the methods described above, the subject has Ewing sarcoma. In some embodiments of the methods described above, the composition is administered by injection into or near a tumor, or by application after surgical resection. In some embodiments of the methods described above, the composition is administered by injection into or near a tumor, or by application before surgical resection. In some embodiments, the method of treating a subject further comprises treating a subject with one or more chemotherapy agents. In some embodiments, the chemotherapy is one of vincristine, doxorubicin, cyclophosphamide, ifosfamide, etoposide, or a combination thereof. In some embodiments, the composition is administered before radiation.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.
Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
DESCRIPTION OF DRAWINGS
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
FIGs. 1A-1D. Engineering ZFAs to bind GGAA microsatellites in the human genome and efficient activation of a target gene by engineered ZFAs fused to EWS. FIG. 1A. Schematic of 16 ZFAs, each engineered to bind ~4.5 GGAA microsatellites. The ZFAs have six zinc fingers, and each finger recognizes three nucleotides. The target sequences of ZFA 1 through 8 start with GGA, and ZFA 9 through 16 with AAG. The amino acid compositions of recognition helices for each zinc finger are shown on the right. Multiple zinc fingers with different recognition helices can recognize the same nucleotides. FIG. 1B. Schematic of ZFAs fused to EWS activating UGT3A2 by binding to an 11-unit GGAA microsatellite located ~2 kb upstream of the TSS. EWS is fused to the N-terminus (left panel) or C-terminus (right panel) of ZFAs. FIG. 1C. 32 fusions of EWS and ZFAs that target GGAA repeats were tested for UGT3A2 gene activation in U2OS cells by nucleofection. EWS-ZFA7 closely mimicked the activation level of EWS-FLI1 and therefore was selected for further experiments. FIG. 1D. mRNA expression of UGT3A2 in U2OS cells nucleofected with EWS-ZFA7, EWS-dCas9, dCas9-EWS or dCas9-based bi-partite EWS activators (dCas9-DmrA and DmrC-EWS). The bi-partite system increases the density of EWS molecules recruited to a target site.
FIGs. 2A-2B. Gene activation by dCas9-based EWS activators targeting specific promoters in the human genome. FIG. 2A. mRNA expression levels of the endogenous IL2RA, CD69, HBB, and HBG promoters in the presence of EWS-dCas9, dCas9-EWS or dCas9-based bi-partite EWS with single gRNAs (1, 2, and 3) or pooled gRNAs (all) targeting promoter sequences in U2OS cells. Relative expression of each gene was measured by RT-qPCR, normalized to HPRT levels and calculated relative to that of a control sample expressing a non-targeting gRNA. Open circles indicate biological replicates (n=3), bars the mean of replicates, and error bars the s.e.m. FIG. 2B. mRNA expression levels of the endogenous IL2RA, CD69, HBB, and HBG promoters in the presence of dCas9-EWS or dCas9-based bi-partite EWS with single gRNAs (1, 2, and 3) or pooled gRNAs (all) targeting promoter sequences in HEK293 cells. Relative expression of each gene was measured by RT-qPCR, normalized to HPRT levels and calculated relative to that of a control sample expressing a non-targeting gRNA. Open circles indicate biological replicates (n=3), bars the mean of replicates, and error bars the s.e.m.
FIGs. 3A-3H. Efficient and specific binding of EWS-ZFA at GGAA repeats in MSCs induces active chromatin and activation of GGAA repeat associated genes. FIG. 3A. GGAA repeat motifs identified at sites bound by EWS-ZFA in MSCs (AGGAAGGAAGGAAGGAAGGAAGGA, SEQ ID NO: 134). FIG. 3B. Scatterplot showing binding of 3xHA-tagged EWS-FLI1 and EWS-ZFA to GGAA repeats genome-wide (n=13029) in MSCs determined using HA ChIP-seq. ChIP-seq signals are in a log2 scale. The Spearman correlation coefficient is 0.68 with p-value < 2.2e-16. Data from one of two biological replicate experiments is shown. FIG. 3C. Bar plots showing the fraction of GGAA repeats in the genome bound by EWS-ZFA (left) and EWS-FLI1 (right) upon lentiviral transduction in MSCs. Data from one of two biological replicate experiments is shown. The number of consecutive GGAA repeats in each category is shown on the x-axis. FIG. 3D. Heatmaps showing HA and H3K27ac ChIP-seq signals in MSCs at EWS-FLI1 bound GGAA repeats identified in Ewing sarcoma (n=812) upon lentiviral transduction of either 3xHA-tagged EWS-FLI1 or EWS-ZFA. 3xHA-tagged GFP was used as control. 10-kb windows in each panel are centered on EWS-FLI1 binding sites in Ewing sarcoma. FIG. 3E. Example showing the binding of 3xHA-tagged EWS-FLI1 or EWS-ZFA and accompanying H3K27ac levels in MSC at the IGF2BP1 locus containing a GGAA repeat element and a canonical ETS binding site. FIG. 3F. Heatmaps showing HA and H3K27ac ChIP-seq signals in MSCs at EWS-FLI1 bound canonical ETS binding sites identified in Ewing sarcoma (n=973) upon lentiviral transduction of either 3xHA-tagged EWS-FLI1 or EWS-ZFA. GFP was used as control. 10-kb windows in each panel are centered on EWS-FLI1 binding sites in Ewing sarcoma. FIG. 3G. Example showing the binding of 3xHA-tagged EWS-FLI1 or EWS-ZFA and accompanying H3K27ac levels in MSC at the NIBAN3 and COLGALT1 loci containing a canonical ETS binding site. FIG. 3H. Heatmaps of log2 fold changes in expression of GGAA-repeat-associated genes (n=126) in MSCs treated with EWS-FLI1 or EWS-ZFA constructs compared to a GFP control, determined by RNA-seq. Two biological replicates are shown. Spearman correlation of log2 fold changes in EWS-FLI1 and EWS-ZFA is 0.58 (p-value < 2.22e-16).
FIGs. 4A-4G. Efficient binding of EWS-ZFA at GGAA repeats in MSCs and comparison of changes in H3K27ac ChIP-seq signals in MSCs after treatment with EWS-FLI1 and EWS-ZFA. FIG. 4A. GGAA repeat motifs identified at sites bound by EWS-ZFA from a second biological replicate experiment in MSCs (GAAGGAAGGAAGGAAGGAAGGAAG, SEQ ID NO: 135). FIG. 4B. Scatterplot showing binding of 3xHA-tagged EWS-FLI1 and EWS-ZFA to GGAA repeats genome-wide (n=13092) in MSCs determined using HA ChIP-seq. The ChIP-seq signals are in a log2 scale. The Spearman’s correlation coefficient is 0.68 with p-value < 2.2e-16. The data shown corresponds to the second of two biological replicate experiments. FIG. 4C. Bar plot showing the number of GGAA repeat microsatellites genome-wide based on the number of consecutive GGAA repeats. FIG. 4D. Bar plots showing the fraction of GGAA repeats in the genome bound by EWS-ZFA (left) and EWS-FLI1 (right) upon lentiviral transduction in MSCs. The number of consecutive GGAA repeats in each category is shown on the x-axis. The data shown corresponds to the second of two biological replicate experiments. FIG. 4E. Boxplots show changes in H3K27ac ChIP-seq signals in MSCs expressing either EWS-FLI1 or EWS-ZFA at GGAA repeat microsatellites (a, n=812). The results of two biological replicate experiments are shown. FIG. 4F. Protein expression levels of Control (Empty Vector), EWS-ZFA and EWS-FLI1 in MSCs. Protein levels were determined by immunoblotting using specific antibodies directed against HA. GAPDH was used as loading control. FIG. 4G. Boxplots show changes in H3K27ac ChIP-seq signals in MSCs expressing either EWS-FLI1 or EWS-ZFA at canonical ETS-binding sites (b, n=973) bound by EWS-FLI1 in Ewing sarcoma. The results of two biological replicate experiments are shown.
FIGs. 5A-5H. Binding of KRAB-ZFA to GGAA repeats induces selective toxicity in Ewing sarcoma cell lines by repressing target gene expression. FIG. 5A. Heatmaps showing binding of 3xHA tagged KRAB-ZFA and H3K9me3 deposition at EWS-FLI1 bound GGAA repeats (n=812) in SKNMC cells as determined using ChIP-seq. FIG. 5B. Composite plot showing EWS-FLI1 occupancy of GGAA repeats after introduction of KRAB-ZFA or GFP (control) in SKNMC. The x-axis represents a 10-kb window centered on 812 GGAA repeats. FIG. 5C. Histograms showing changes in H3K27ac at 812 EWS-FLI1 bound GGAA repeats upon treatment of SKNMC cells with KRAB-ZFA. FIG. 5D. Example showing the binding of KRAB-ZFA (3xHA-tagged), endogenous EWS-FLI1, H3K9me3 and H3K27ac at a GGAA repeat element associated with the CCND1 locus, after treatment of SKNMC cells with KRAB-ZFA constructs. GFP was used as control. FIG. 5E. Heatmaps showing binding of KRAB-ZFA (3xHA tagged) and H3K9me3 deposition in HEK293T cells at GGAA repeats bound by EWS-FLI1 in Ewing sarcoma (n=812) as determined using ChIP-seq. FIG. 5F. Example showing the binding of KRAB-ZFA (3xHA-tagged), H3K9me3 and H3K27ac at a GGAA repeat element associated with the CCND1 locus, after the treatment of HEK293T cells with KRAB-ZFA construct. GFP was used as control. FIG. 5G. Heatmaps showing expression (row-normalized counts) of GGAA repeat-associated genes (n=235), in SKNMC and HEK293T cells treated with KRAB-ZFA or GFP (control) determined by RNA-seq. Data are from two biological replicates. FIG. 5H. Viability of Ewing sarcoma and non-Ewing cell lines 8 days post lentiviral transduction of KRAB-ZFA and GFP (control). Open circles indicate two biological replicates with three technical replicates, error bars show the s.e.m.
FIGs. 6A-6F. Changes in EWS-FLI1 occupancy and chromatin states upon binding of KRAB-ZFA at GGAA repeats and ETS canonical binding sites in Ewing Sarcoma cell lines. FIG. 6A. Heatmaps showing HA (KRAB-ZFA) and H3K9me3 ChIP-seq signals at EWS-FLI1 bound GGAA repeats (n=812) in A673 cells upon lentiviral transduction. FIG. 6B. Composite plot showing decreased EWS-FLI1 occupancy at GGAA repeat enhancers after introduction of KRAB-ZFA in A673. GFP was used as control. The x-axis represents a 10-kb window centered on 812 GGAA repeats. FIG. 6C. Histogram showing changes in H3K27ac ChIP-seq signals at 812 EWS-FLI1 bound GGAA repeats upon treatment of A673 cells with KRAB-ZFA. FIG. 6D. Composite plots showing maintained EWS-FLI1 occupancy at canonical ETS binding sites after introduction of KRAB-ZFA in SKNMC and A673 cells. GFP was used as control. The x-axis represents a 10-kb window centered on EWS-FLI1-bound canonical ETS binding sites (n=973). FIG. 6E. Boxplots showing changes in FLI1 (EWS-FLI1) ChIP-seq signals upon lentiviral induction of KRAB-ZFA in SKNMC and A673 cells at GGAA repeat microsatellites (blue, n=812) and canonical ETS-binding sites (gray, n=973). FIG. 6F. Scatterplots showing changes in H3K9me3 and H3K27ac ChIP-seq signals at EWS-FLI1 binding sites (GGAA repeats (top, n=812) and canonical ETS binding sites (bottom, n=973)) upon lentiviral induction of KRAB-ZFA in Ewing sarcoma cell lines SKNMC and A673 as well as the control cell line HEK293T. Log2 fold changes are shown.
FIGs. 7A-7E. KRAB-ZFAs can silence GGAA repeat-associated genes in Ewing Sarcoma cells but not in HEK293T. FIG. 7A. Heatmap showing row-normalized expression levels of GGAA repeat-associated genes (n=235) in A673 and HEK293T cells treated with KRAB-ZFA or GFP (control) determined by RNA-seq. Data are from two biological replicates. FIG. 7B. Bar plot showing the number of genes up or downregulated by 1.5-fold (p-value < 0.1) upon treatment with KRAB-ZFA construct targeting GGAA repeat microsatellites, in Ewing sarcoma cell lines SKNMC and A673 as well as the control cell line HEK293T. FIG. 7C. Heatmaps showing the absence of activity and the lack of changes in H3K27ac ChIP-seq signals in HEK293T cells at EWS-FLI1 bound GGAA repeats in Ewing sarcoma (n=812) upon KRAB-ZFA lentiviral transduction. GFP was used as control. 10-kb windows in each panel are centered on EWS-FLI1 binding sites. FIG. 7D. Binding of KRAB-ZFA accompanied by chromatin changes (H3K9me3 and H3K27ac) at a GGAA repeat located within the promoter of BCL2L2, a gene downregulated by KRAB-ZFA in HEK293T cells. FIG. 7E. Protein levels of KRAB-ZFA and EWS-FLI1 across all cell lines tested (Figure 3a) were determined by immunoblotting using specific antibodies directed against HA (KRAB-ZFA) and FLI1 (EWS-FLI1). GAPDH was used as loading control.
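The gene counts in FIG. 7B depend on a fold-change cutoff of 1.5 and a p-value cutoff of 0.1. A 1.5-fold change corresponds to a log2 fold change of about 0.585. The Python sketch below shows one way such a split into up- and downregulated genes could be computed from a differential-expression table. The table, gene names, function name, and the choice of inclusive fold-change and strict p-value comparisons are assumptions for illustration, not the analysis code behind the figure.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical results: gene -> (log2 fold change, p-value). Not data from the application.
TOY_RESULTS: Dict[str, Tuple[float, float]] = {
    "GENE_A": (-1.20, 0.001),
    "GENE_B": (0.80, 0.040),
    "GENE_C": (-0.30, 0.010),  # significant but below the fold-change cutoff
    "GENE_D": (-0.90, 0.300),  # large change but not significant
    "GENE_E": (0.61, 0.090),
}


def split_by_fold_change(
    results: Dict[str, Tuple[float, float]],
    fold: float = 1.5,
    p_cutoff: float = 0.1,
) -> Tuple[List[str], List[str]]:
    """Return (upregulated, downregulated) gene names passing both cutoffs."""
    lfc_cutoff = math.log2(fold)  # 1.5-fold is roughly 0.585 on a log2 scale
    up = sorted(g for g, (lfc, p) in results.items() if p < p_cutoff and lfc >= lfc_cutoff)
    down = sorted(g for g, (lfc, p) in results.items() if p < p_cutoff and lfc <= -lfc_cutoff)
    return up, down


up_genes, down_genes = split_by_fold_change(TOY_RESULTS)
```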
DETAILED DESCRIPTION
Microsatellite repeats are a class of simple tandem repeats that previous studies have shown can be dysregulated in multiple disease states (Subramanian, Mishra, and Singh 2003; Malik et al. 2021; Trost et al. 2020; Usdin 2008). For example, large scale epigenetic dysregulation of microsatellite repeats has been observed in Ewing sarcoma, a pediatric bone tumor where the EWS-FLI1 translocation fusion protein operates as a transcriptional pioneer factor (Delattre et al. 1992; Riggi et al. 2014). This fusion includes both the N-terminal transactivation domain of EWS and the C-terminal DNA binding domain of FLI1. In contrast to FLI1, which stably binds to only non-repeat GGAA sites, EWS-FLI1 can bind to both non-repeat GGAA motifs and GGAA microsatellite repeats. Notably, binding of EWS-FLI1 to the hundreds of GGAA microsatellites present throughout the human genome converts them into transcriptional enhancers, thereby inducing a tumor-specific gene regulatory program (Gangwal et al. 2008; Guillon et al. 2009; Riggi et al. 2014; Boulay et al. 2017). This example, together with the dysregulated expression of other repeat classes in other tumor types (Ting et al. 2011; Burns 2017), illustrates how aberrant transcriptional programs in cancer and other diseases can be caused by the widespread activation of specific repeat categories and highlights the need for robust tools to conduct genome-wide studies and perturbation of these elements.
Described herein are engineered ZFAs that can be used to efficiently target and alter the chromatin state of a class of microsatellite repeats in human cells. Using GGAA microsatellite repeats bound by EWS-FLI1, we showed that engineered EWS-ZFA fusion proteins targeted to these repeats can be over an order of magnitude more efficient than an EWS-dCas9-targeted fusion for activating a GGAA repeat previously shown to be converted into a de novo enhancer by EWS-FLI1. In addition, EWS-ZFA fusions can effectively phenocopy the pioneer function of EWS-FLI1 at GGAA microsatellites and recapitulate the GGAA repeat-dependent chromatin landscape and gene expression profiles of Ewing sarcoma. Remarkably, coupling of a GGAA repeat-targeted ZFA to a transcriptional repressor KRAB domain resulted in genome-wide silencing of GGAA microsatellites and cytotoxicity that was selective for Ewing sarcoma cells through the targeted inactivation of oncogenic gene expression programs. Our results validate the power and efficacy of engineered ZF technology for targeting and altering the functional state of microsatellite repeats and illustrate how this platform can be deployed to interrogate the function of microsatellite repetitive elements at genome-scale.
Definitions
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In case of conflict, the present application, including definitions will control.
An “exogenous” nucleic acid sequence is a nucleic acid sequence that is not normally present in a cell, but can be introduced into a cell by one or more genetic, biochemical or other methods. Normal presence in the cell is determined with respect to the particular developmental stage and environmental conditions of the cell. Thus, for example, as used herein, an extrachromosomal DNA sequence that is introduced into the cell is an exogenous nucleic acid (even if part or all of that sequence is also present in the genome of the cell). Similarly, a nucleic acid sequence that is present only during embryonic development of muscle is an exogenous nucleic acid sequence with respect to an adult muscle cell. Alternatively, a nucleic acid sequence induced by heat shock is an exogenous molecule with respect to a non-heat-shocked cell. An exogenous nucleic acid sequence can comprise, for example, a functioning version of a malfunctioning endogenous gene. By contrast, an “endogenous” nucleic acid sequence is one that is normally present in a particular cell at a particular developmental stage under particular environmental conditions. For example, an endogenous nucleic acid can comprise a chromosome, the genome of a mitochondrion, chloroplast or other organelle, or a naturally occurring episomal nucleic acid.
“Nucleic acid” refers to deoxyribonucleotides or ribonucleotides in either single- or double-stranded form. The term encompasses nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which can be synthetic, naturally occurring, or non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs include, without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs). Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. The term nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide. A “gene,” for the purposes of the present disclosure, includes a DNA region encoding a gene product (see infra), as well as all DNA regions which regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Accordingly, a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions.
The terms “polypeptide,” “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers.
The term “amino acid” refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, y-carboxyglutamate, and O-phosphoserine. Amino acid analog refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an a-carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine, and methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid.
Ranges provided herein are understood to be shorthand for all of the values within the range. For example, a range of 1 to 50 is understood to include any number, combination of numbers, or sub-range from the group consisting 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 (as well as fractions thereof unless the context clearly dictates otherwise).
In this disclosure, “comprises,” “comprising,” “containing” and “having” and the like can have the meaning ascribed to them in U.S. Patent law and can mean “includes,” “including,” and the like; “consisting essentially of” or “consists essentially” likewise has the meaning ascribed in U.S. Patent law and the term is open-ended, allowing for the presence of more than that which is recited so long as basic or novel characteristics of that which is recited are not changed by the presence of more than that which is recited, but excludes prior art embodiments.
Other definitions appear in context throughout this disclosure.
Compositions
Described herein are compositions comprising a zinc finger DNA-binding domain that specifically binds to a target site in any gene comprising a tetra-nucleotide repeat, e.g., GGAA.
As used herein, the term zinc finger refers to a polypeptide comprising a DNA binding domain that is stabilized by zinc. The individual DNA binding domains are typically referred to as “fingers.” A zinc finger protein has at least one finger, preferably two fingers, three fingers, four fingers, five fingers, or six fingers. A zinc finger protein having two or more zinc fingers is referred to as a “multi-finger” or “multi-zinc finger” protein or “multi-finger array” or “zinc finger array.” Each finger typically comprises an approximately 30 amino acid, zinc-chelating, DNA-binding domain. An exemplary motif characterizing one class of these proteins is X(2)-Cys-X(2,4)-Cys-X(12)-His-X(3-5)-His (SEQ ID NO: 1), where X is any amino acid, which is known as the “C(2)H(2)” class. Zinc finger units are joined together by non-canonical (non-TGEKP) linkers such as TGSQKP (SEQ ID NO:2) or CGSQKP (SEQ ID NO:3). Studies have demonstrated that a single zinc finger of this C(2)H(2) class consists of an alpha helix containing the two invariant histidine residues coordinated with zinc along with the two cysteine residues of a single beta turn (Berg and Shi, Science 271:1081-1085 (1996)). Each finger within a zinc finger array binds to about two to about five nucleotides within a DNA sequence. A zinc finger array that includes three fingers typically recognizes a target site that includes 9 or 10 nucleotides; a zinc finger array that includes four fingers typically recognizes a target site that includes 12 to 14 nucleotides; while a zinc finger array having six fingers can recognize target sites that include 18 to 21 nucleotides.
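As an illustration only, the C(2)H(2) motif quoted above can be written as a regular expression and used to scan a protein sequence. The following is a minimal sketch, assuming one-letter amino acid codes; the function name and the example sequence are hypothetical and were constructed solely to satisfy the motif, not taken from this disclosure.

```python
import re

# Motif from the text: X(2)-Cys-X(2,4)-Cys-X(12)-His-X(3-5)-His,
# where X is any amino acid (one-letter codes assumed here).
C2H2_MOTIF = re.compile(r"[A-Z]{2}C[A-Z]{2,4}C[A-Z]{12}H[A-Z]{3,5}H")


def find_c2h2_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (start index, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in C2H2_MOTIF.finditer(protein.upper())]


# Hypothetical finger-like sequence, built only to satisfy the motif.
example_finger = "YACPVESCDRRFSRSDELTRHIRIH"
hits = find_c2h2_motifs(example_finger)
```

A motif match of this kind indicates only that the spacing of cysteine and histidine residues is consistent with the C(2)H(2) class; it says nothing about DNA-binding specificity, which depends on the recognition helix.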
In some embodiments the zinc finger protein/array is a non-naturally occurring protein, in that it is engineered to bind to a target site of choice. An engineered zinc finger binding domain can have a novel binding specificity, compared to a naturally occurring zinc finger. Engineering methods include, but are not limited to, rational design and various types of selection. Rational design includes, for example, using databases comprising tri-nucleotide sequences and individual zinc finger amino acid sequences, in which each tri-nucleotide sequence is associated with one or more amino acid sequences of zinc fingers which bind the particular tri-nucleotide sequence.
Engineered zinc finger proteins are non-naturally occurring zinc finger proteins whose recognition helices have been altered (e.g., by selection and/or rational design) to bind to a pre-selected target site. Any of the zinc finger arrays described herein may include 1, 2, 3, 4, 5, 6 or more zinc fingers, each zinc finger having a recognition helix that binds to a target subsite in the selected sequence(s) (e.g., gene(s)). In some embodiments, the recognition helix is non-naturally occurring. In certain embodiments, the zinc finger proteins have the recognition helices shown in FIG. 1A.
In certain embodiments, the DNA binding domain is an engineered zinc finger array including four to six fingers that is capable of recognizing target sites of 12 to 18 nucleotides (e.g., a zinc finger array having 6 fingers that recognizes target sites of 18 nucleotides). Each zinc finger within the array is designed to target a trinucleotide sequence. For example, each zinc finger is designed to recognize GGA, AGG, AAG, or GAA. Therefore, when the zinc finger array is appropriately assembled, the zinc finger array can recognize sequences such as GGAAGGAAGGAAGGAAGG (SEQ ID NO:4) or AAGGAAGGAAGGAAGGAA (SEQ ID NO: 5).
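The relationship between an 18-nucleotide target and its six trinucleotide subsites can be sketched as follows. This is a minimal illustration, assuming one finger per three nucleotides read left to right along the sequence as written; it does not model finger orientation or binding affinity, and the function names are hypothetical.

```python
# Trinucleotide subsites named in the text for GGAA-repeat targets.
ALLOWED_TRIPLETS = {"GGA", "AGG", "AAG", "GAA"}

SEQ_ID_NO_4 = "GGAAGGAAGGAAGGAAGG"
SEQ_ID_NO_5 = "AAGGAAGGAAGGAAGGAA"


def split_into_subsites(target: str, finger_width: int = 3) -> list[str]:
    """Split a target site into consecutive subsites, one per zinc finger."""
    target = target.upper()
    if len(target) % finger_width != 0:
        raise ValueError("target length must be a multiple of the finger width")
    return [target[i:i + finger_width] for i in range(0, len(target), finger_width)]


def is_ggaa_repeat_target(target: str) -> bool:
    """True if every subsite is one of the four GGAA-derived triplets."""
    return all(t in ALLOWED_TRIPLETS for t in split_into_subsites(target))


subsites_4 = split_into_subsites(SEQ_ID_NO_4)
subsites_5 = split_into_subsites(SEQ_ID_NO_5)
repeat_units = len(SEQ_ID_NO_4) / len("GGAA")  # 18 nt / 4 nt per repeat
```

Because the repeat unit is four nucleotides and each finger reads three, the triplet register shifts by one position per repeat, which is why a six-finger array cycles through all four triplets and spans 4.5 repeat units.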
See also, FIG. 1A, which is a schematic of 16 different ZFAs, each engineered to bind ~4.5 GGAA microsatellites. The ZFAs each have six zinc fingers, and each finger recognizes three nucleotides. The target sequences of ZFAs 1 through 8 start with GGA, and ZFAs 9 through 16 with AAG. The amino acid compositions of recognition helices for each zinc finger are shown on the right side of FIG. 1A. Multiple zinc fingers with different recognition helices can in certain instances recognize the same nucleotides.
Fusion Proteins
Fusion proteins comprising DNA-binding proteins as described herein and a heterologous regulatory (functional) domain (or functional fragment thereof) are also provided. Common domains include, e.g., transcriptional repressors (e.g., KRAB, ERD, SID, TGF-β-inducible early gene (TIEG), v-erbA, MBD2, MBD3, Rb, MeCP2, ROM2, AtHD2A, and others, e.g., amino acids 473-530 of the ets2 repressor factor (ERF) repressor domain (ERD), amino acids 1-97 of the KRAB domain of KOX1, or amino acids 1-36 of the Mad mSIN3 interaction domain (SID); see Beerli et al., PNAS USA 95:14628-14633 (1998)) or silencers such as Heterochromatin Protein 1 (HP1, also known as swi6), e.g., HP1α or HP1β; proteins or peptides that could recruit long noncoding RNAs (lncRNAs) fused to a fixed RNA binding sequence such as those bound by the MS2 coat protein, endoribonuclease Csy4, or the lambda N protein; enzymes that modify the methylation state of DNA (e.g., DNA methyltransferase (DNMT) or TET proteins); enzymes that modify histone subunits (e.g., histone acetyltransferases (HAT), histone deacetylases (HDAC), histone methyltransferases (e.g., for methylation of lysine or arginine residues) or histone demethylases (e.g., for demethylation of lysine or arginine residues)); transcriptional activators (e.g., activation domains of NF-κB (e.g., p65), VP64, VPR, or p300).
In some embodiments, the fusion proteins include a linker between the zinc finger array and the heterologous functional domains. Domains could also be proteins that recruit (either directly or indirectly) other proteins in the cell that in turn can modulate gene expression. For direct fusions, linkers that can be used in these fusion proteins (or between fusion proteins in a concatenated structure) can include any sequence that does not interfere with the function of the fusion proteins. In preferred embodiments, the linkers are short, e.g., 2-20 amino acids, and are typically flexible (i.e., comprising amino acids with a high degree of freedom such as glycine, alanine, and serine). In some embodiments, the linker comprises one or more units consisting of GGGS (SEQ ID NO:6) or GGGGS (SEQ ID NO:7), e.g., two, three, four, or more repeats of the GGGS (SEQ ID NO:6) or GGGGS (SEQ ID NO:7) unit. Other linker sequences can also be used. Indirect fusions include one or more dimerization systems (e.g., heterodimer systems containing DmrA and DmrC) that mediate coupling of different domains (e.g., DNA-binding domains and gene expression modulating domains), for example, by addition of a drug that induces activation of the dimerization systems.
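As a simple illustration of the repeat-unit linkers described above, the sketch below builds a linker from GGGS or GGGGS units and checks it against the preferred 2-20 amino acid length. The function name, default arguments, and length check are assumptions made for this example rather than requirements of the disclosure.

```python
LINKER_UNITS = ("GGGS", "GGGGS")  # SEQ ID NO:6 and SEQ ID NO:7


def build_linker(unit: str = "GGGGS", repeats: int = 3,
                 min_len: int = 2, max_len: int = 20) -> str:
    """Concatenate `repeats` copies of a linker unit and check the length."""
    if unit not in LINKER_UNITS:
        raise ValueError(f"unit must be one of {LINKER_UNITS}")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    linker = unit * repeats
    if not (min_len <= len(linker) <= max_len):
        raise ValueError(
            f"linker length {len(linker)} is outside {min_len}-{max_len} amino acids"
        )
    return linker


linker_a = build_linker("GGGGS", 3)  # 15 amino acids
linker_b = build_linker("GGGS", 4)   # 16 amino acids
```

Longer linkers fall outside the preferred range used here, so the default limits would need to be relaxed explicitly for such designs.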
Delivery and Expression Systems
To use the zinc finger fusion protein (e.g., a zinc finger that targets GGAA repeats and a repressor domain) described herein, it may be desirable to express them from a nucleic acid that encodes them. This can be performed in a variety of ways. For example, the nucleic acid encoding the zinc finger fusion protein can be cloned into an intermediate vector for transformation into prokaryotic or eukaryotic cells for replication and/or expression. Intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle vectors, or insect vectors, for storage or manipulation of the nucleic acid encoding the zinc finger fusion protein for production of the zinc finger fusion protein. The nucleic acid encoding the zinc finger fusion protein can also be cloned into an expression vector, for administration to a plant cell, animal cell, preferably a mammalian cell or a human cell, fungal cell, bacterial cell, or protozoan cell.
To obtain expression, a sequence encoding a zinc finger fusion protein is typically subcloned into an expression vector that contains a promoter to direct transcription. Suitable bacterial and eukaryotic promoters are well known in the art and described, e.g., in Sambrook et al., Molecular Cloning, A Laboratory Manual (3d ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 2010). Bacterial expression systems for expressing the engineered protein are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., 1983, Gene 22:229-235). Kits for such expression systems are commercially available. Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available.
The promoter used to direct expression of a nucleic acid depends on the particular application. For example, a strong constitutive promoter is typically used for expression and purification of fusion proteins. In contrast, when the zinc finger fusion protein is to be administered in vivo for gene regulation, either a constitutive or an inducible promoter can be used, depending on the particular use of the zinc finger fusion protein. In addition, a preferred promoter for administration of the zinc finger fusion protein can be a weak promoter, such as HSV TK or a promoter having similar activity. The promoter can also include elements that are responsive to transactivation, e.g., hypoxia response elements, Gal4 response elements, lac repressor response element, and small molecule control systems such as tetracycline-regulated systems and the RU-486 system (see, e.g., Gossen & Bujard, 1992, Proc. Natl. Acad. Sci. USA, 89:5547; Oligino et al., 1998, Gene Ther., 5:491-496; Wang et al., 1997, Gene Ther., 4:432-441; Neering et al., 1996, Blood, 88:1147-55; and Rendahl et al., 1998, Nat. Biotechnol., 16:757-761).
In addition to the promoter, the expression vector typically contains a transcription unit or expression cassette that contains all the additional elements required for the expression of the nucleic acid in host cells, either prokaryotic or eukaryotic. A typical expression cassette thus contains a promoter operably linked, e.g., to the nucleic acid sequence encoding the zinc finger fusion protein, and any signals required, e.g., for efficient polyadenylation of the transcript, transcriptional termination, ribosome binding sites, or translation termination. Additional elements of the cassette may include, e.g., enhancers, and heterologous spliced intronic signals.
The particular expression vector used to transport the genetic information into the cell is selected with regard to the intended use of the zinc finger fusion protein, e.g., expression in plants, animals, bacteria, fungus, protozoa, etc. Standard bacterial expression vectors include plasmids such as pBR322 based plasmids, pSKF, pET23D, and commercially available tag-fusion expression systems such as GST and LacZ.
Expression vectors containing regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
The vectors for expressing the zinc finger fusion protein can include RNA Pol III promoters to drive expression of the guide RNAs, e.g., the H1, U6 or 7SK promoters. These human promoters allow for expression of zinc finger fusion proteins in mammalian cells following plasmid transfection.
Some expression systems have markers for selection of stably transfected cell lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase. High yield expression systems are also suitable, such as using a baculovirus vector in insect cells, with the gRNA encoding sequence under the direction of the polyhedrin promoter or other strong baculovirus promoters.
The elements that are typically included in expression vectors also include a replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of recombinant sequences.
Standard transfection methods are used to produce bacterial, mammalian, yeast or insect cell lines that express large quantities of protein, which are then purified using standard techniques (see, e.g., Colley et al., 1989, J. Biol. Chem., 264:17619-22; Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, 1977, J. Bacteriol. 132:349-351; Clark-Curtiss & Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds, 1983)). Any of the known procedures for introducing foreign nucleotide sequences into host cells may be used. These include the use of calcium phosphate transfection, polybrene, protoplast fusion, electroporation, nucleofection, liposomes, microinjection, naked DNA, plasmid vectors, viral vectors, both episomal and integrative, and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that the particular genetic engineering procedure used be capable of successfully introducing at least one gene into the host cell capable of expressing the zinc finger fusion protein.
Also provided herein are nucleic acids encoding the fusion proteins, as well as cells, tissues, and transgenic animals comprising the nucleic acids and optionally expressing the fusion proteins. Any nucleic acid construct capable of directing expression and/or which can transfer sequences to target cells can be used to administer the nucleic acid sequences described herein encoding either the exogenous nucleic acid sequence to be inserted within the target site or the zinc finger nuclease fusion proteins. Nucleic acid sequences described herein can be delivered to cells with vector delivery systems, including viral vector delivery systems comprising DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.
The term “vector” as used herein refers to nucleic acid molecules, usually double-stranded DNA, which may have inserted into it another nucleic acid molecule, such as a sequence encoding a nuclease fusion protein. The vector is used to transport the inserted nucleic acid molecule into a suitable host cell. A vector may contain the necessary elements that permit transcribing the inserted nucleic acid molecule, and translating the transcript into a polypeptide. Once in the host cell, the vector may for instance replicate independently of, or coincidental with, the host chromosomal DNA, and several copies of the vector and its inserted nucleic acid molecule may be generated. The term “vector” may thus also be defined as a gene delivery vehicle that facilitates gene transfer into a target cell. This definition includes both non-viral and viral vectors. Alternatively, gene delivery systems can be used to combine viral and non-viral components, such as nanoparticles or virosomes (Yamada et al. (2003) Nat Biotechnol. 21, 885-890). Non-viral vectors include but are not limited to cationic lipids, liposomes, nanoparticles, PEG, PEI, etc. Viral vectors are derived from viruses including but not limited to: retrovirus, lentivirus, adeno-associated virus, adenovirus, herpesvirus, hepatitis virus or the like. Typically, but not necessarily, viral vectors are replication-deficient as they have lost the ability to propagate in a given cell since viral genes essential for replication have been eliminated from the viral vector.
The use of RNA or DNA viral based systems for the delivery of nucleic acids takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be derived from lentivirus, adeno-associated virus, adenovirus, retroviruses and lentiviruses. Conventional viral based systems for the delivery of nucleic acid sequences could include retroviral, lentiviral, adenoviral, adeno-associated, herpes simplex virus, and TMV-like viral vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.
Retroviruses and lentiviruses are RNA viruses that have the ability to insert their genes into host cell chromosomes after infection. Retroviral and lentiviral vectors have been developed that lack the genes encoding viral proteins, but retain the ability to infect cells and insert their genes into the chromosomes of the target cell (Miller (1990) Mol Cell Biol. 10, 4239-4242; Naldini et al. (1996) Science 272, 263-267; VandenDriessche et al., (1999) Proc Natl Acad Sci USA. 96, 10379-10384). The difference between a lentiviral and a classical Moloney-murine leukemia-virus (MLV) based retroviral vector is that lentiviral vectors can transduce both dividing and non-dividing cells whereas MLV-based retroviral vectors can only transduce dividing cells.
Adenoviral vectors are designed to be administered directly to a living subject. Unlike retroviral vectors, most of the adenoviral vector genomes do not integrate into the chromosome of the host cell. Instead, genes introduced into cells using adenoviral vectors are maintained in the nucleus as an extrachromosomal element (episome) that persists for an extended period of time. Adenoviral vectors will transduce dividing and nondividing cells in many different tissues (Chuah et al. (2003) Blood. 101, 1734-1743). Another viral vector is derived from the herpes simplex virus, a large, double-stranded DNA virus. Recombinant forms of the vaccinia virus, another dsDNA virus, can accommodate large inserts and are generated by homologous recombination.
Adeno-associated virus (AAV) is a small ssDNA virus which infects humans and some other primate species. It is not known to cause disease and consequently causes only a very mild immune response. AAV can infect both dividing and non-dividing cells and may incorporate its genome into that of the host cell. These features make AAV a very attractive candidate for creating viral vectors for gene therapy, although the cloning capacity of the vector is relatively limited. In a specific embodiment described herein, the vector used is therefore derived from adeno-associated virus.
The zinc finger fusion proteins described herein can be delivered to cells by conventional protein transduction methods known in the art. In specific embodiments, one or more Nuclear Localization Signals (NLS) or protein transduction domains (e.g., penetratin or transportan) can be optionally added to the fusion protein. Such methods are described, for example by Liu, J. et al., Molecular Therapy-Nucleic Acids (2015) 4, e232 and Gaj, T. et al., ACS Chem. Biol. 2014, 9, 1662-1667. In some instances, Cys2His2 zinc fingers themselves harbor intrinsic cell transduction properties. See, e.g., Gaj T, Guo J, Kato Y, Sirk SJ, Barbas CF 3rd. Nat Methods. 2012 Jul 1;9(8):805-7; Gaj T, Liu J, Anderson KE, Sirk SJ, Barbas CF 3rd. ACS Chem Biol. 2014 Aug 15;9(8):1662-7; Liu J, Gaj T, Wallen MC, Barbas CF 3rd. Mol Ther Nucleic Acids. 2015 Mar 10;4(3):e232; Liu J, et al. Nat Protoc. 2015 Nov;10(11):1842-59; Perdigao PRL, Cunha-Santos C, Barbas CF 3rd, Santa-Marta M, Goncalves J. Mol Ther Methods Clin Dev. 2020 May 22;18:145-158.
In other embodiments, the zinc finger fusion proteins include a cell-penetrating peptide sequence that facilitates delivery to the intracellular space, e.g., HIV-derived TAT peptide or hCT derived cell-penetrating peptides, see, e.g., Caron et al., (2001) Mol Ther. 3(3):310-8; Langel, Cell-Penetrating Peptides: Processes and Applications (CRC Press, Boca Raton FL 2002); El-Andaloussi et al., (2005) Curr Pharm Des. 11(28):3597-611; and Deshayes et al., (2005) Cell Mol Life Sci. 62(16):1839-49.
Cell penetrating peptides (CPPs) are short peptides that facilitate the movement of a wide range of biomolecules across the cell membrane into the cytoplasm or other organelles, e.g. the mitochondria and the nucleus. Examples of molecules that can be delivered by CPPs include therapeutic drugs, plasmid DNA, oligonucleotides, siRNA, peptide-nucleic acid (PNA), proteins, peptides, nanoparticles, and liposomes. CPPs are generally 30 amino acids or less, are derived from naturally or non-naturally occurring protein or chimeric sequences, and contain either a high relative abundance of positively charged amino acids, e.g. lysine or arginine, or an alternating pattern of polar and nonpolar amino acids. CPPs that are commonly used in the art include Tat (Frankel et al., (1988) Cell. 55:1189-1193, Vives et al., (1997) J. Biol. Chem. 272:16010-16017), penetratin (Derossi et al., (1994) J. Biol. Chem. 269:10444-10450), polyarginine peptide sequences (Wender et al., (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008, Futaki et al., (2001) J. Biol. Chem. 276:5836-5840), and transportan (Pooga et al., (1998) Nat. Biotechnol. 16:857-861).
CPPs can be linked with their cargo through covalent or non-covalent strategies. Methods for covalently joining a CPP and its cargo are known in the art, e.g. chemical cross-linking (Stetsenko et al., (2000) J. Org. Chem. 65:4900-4909, Gait et al. (2003) Cell. Mol. Life. Sci. 60:844-853) or cloning a fusion protein (Nagahara et al., (1998) Nat. Med. 4:1449-1453). Non-covalent coupling between the cargo and short amphipathic CPPs comprising polar and non-polar domains is established through electrostatic and hydrophobic interactions.
CPPs have been utilized in the art to deliver potentially therapeutic biomolecules into cells. Examples include cyclosporine linked to polyarginine for immunosuppression (Rothbard et al., (2000) Nature Medicine 6(11):1253-1257), siRNA against cyclin B1 linked to a CPP called MPG for inhibiting tumorigenesis (Crombez et al., (2007) Biochem Soc. Trans. 35:44-46), tumor suppressor p53 peptides linked to CPPs to reduce cancer cell growth (Takenobu et al., (2002) Mol. Cancer Ther. 1(12):1043-1049, Snyder et al., (2004) PLoS Biol. 2:E36), and dominant negative forms of Ras or phosphoinositol 3 kinase (PI3K) fused to Tat to treat asthma (Myou et al., (2003) J. Immunol. 171:4399-4405).
CPPs have been utilized in the art to transport contrast agents into cells for imaging and biosensing applications. For example, green fluorescent protein (GFP) attached to Tat has been used to label cancer cells (Shokolenko et al., (2005) DNA Repair 4(4):511-518). Tat conjugated to quantum dots has been used to successfully cross the blood-brain barrier for visualization of the rat brain (Santra et al., (2005) Chem. Commun. 3144-3146). CPPs have also been combined with magnetic resonance imaging techniques for cell imaging (Liu et al., (2006) Biochem. and Biophys. Res. Comm. 347(1):133-140). See also Ramsey and Flynn, Pharmacol Ther. 2015 Jul 22. pii: S0163-7258(15)00141-2.
In some embodiments, zinc finger fusion proteins include a moiety that has a high affinity for a ligand, for example GST, FLAG or hexahistidine sequences (one or more hexahistidine sequences). Such affinity tags can facilitate the purification of recombinant zinc finger fusion proteins.
In some embodiments, the zinc finger fusion proteins do not include a NLS or hexahistidine sequence.
Also provided herein are compositions and kits comprising the zinc finger fusion protein described herein. The kits can also include one or more additional reagents, e.g., additional enzymes (such as RNA polymerases) and buffers, e.g., for use in a method described herein.
Pharmaceutical Compositions and Methods of Administration
The methods described herein include the use of pharmaceutical compositions comprising the zinc finger fusion proteins described herein as an active ingredient.
Pharmaceutical compositions typically include a pharmaceutically acceptable carrier. As used herein the language “pharmaceutically acceptable carrier” includes saline, solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. Supplementary active compounds can also be incorporated into the compositions.
Pharmaceutical compositions are typically formulated to be compatible with their intended route of administration. Examples of routes of administration include intrathecal, intraperitoneal, intraocular, oral, intravenous, intradermal, subcutaneous, intratumoral injection, administration by a gel for slow release, or an infusion pump.
Methods of formulating suitable pharmaceutical compositions are known in the art, see, e.g., Remington: The Science and Practice of Pharmacy, 21st ed., 2005; and the books in the series Drugs and the Pharmaceutical Sciences: a Series of Textbooks and Monographs (Dekker, NY). For example, solutions or suspensions used for administration to the eye, parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates and agents for the adjustment of tonicity such as sodium chloride or dextrose. pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of glass or plastic.
Pharmaceutical compositions suitable for injectable use can include sterile aqueous solutions (where water-soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersion. For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, NJ) or phosphate buffered saline (PBS). In all cases, the composition must be sterile and should be fluid to the extent that easy syringability exists. It should be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as mannitol and sorbitol, and sodium chloride in the composition. Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent that delays absorption, for example, aluminum monostearate and gelatin.
Sterile injectable solutions can be prepared by incorporating the active compound in the required amount in an appropriate solvent with one or a combination of ingredients enumerated above, as required, followed by filtered sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle, which contains a basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, the preferred methods of preparation are vacuum drying and freeze-drying, which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.
Oral compositions generally include an inert diluent or an edible carrier. For the purpose of oral therapeutic administration, the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules, e.g., gelatin capsules. Oral compositions can also be prepared using a fluid carrier for use as a mouthwash. Pharmaceutically compatible binding agents, and/or adjuvant materials can be included as part of the composition. The tablets, pills, capsules, troches and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or lactose, a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.
For administration by inhalation, the compounds can be delivered in the form of an aerosol spray from a pressured container or dispenser that contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer. Such methods include those described in U.S. Patent No. 6,468,798.
Pharmaceutical compositions may also be formulated to provide slow, controlled or sustained release of the active agent using, by way of example, hydroxypropyl methyl cellulose in varying proportions or other polymer matrices, liposomes and/or microspheres. In addition, the pharmaceutical compositions described herein may contain opacifying agents and may be formulated so that they release the active agent only, or preferentially, in a certain portion of the gastrointestinal tract, optionally, in a delayed manner. The active agent can also be in micro-encapsulated form, if appropriate, with one or more of the above-described excipients.
Methods of Treatment
The methods described herein include methods for the treatment of disorders associated with GGAA tandem repeats. In some embodiments, the disorder is Ewing Sarcoma. In other embodiments, the disorder is prostate cancer (see, e.g., Kedage et al., “An Interaction with Ewing's Sarcoma Breakpoint Protein EWS Defines a Specific Oncogenic Mechanism of ETS Factors Rearranged in Prostate Cancer,” Cell Reports 2016 Oct 25;17(5):1289-1301, where dysregulation of GGAA repeats in prostate cancer due to TMPRSS2-ERG fusions is described). In other embodiments, the disorder is a tumor in which ETS factors have abnormal functions that may involve dysregulation of GGAA repeats (including hematopoietic malignancies with high levels of FLI1). Generally, the methods include administering a therapeutically effective amount of the compositions comprising a zinc finger fusion protein as described herein, to a subject who is in need of, or who has been determined to be in need of, such treatment.
As used herein, the term “patient” or “subject” refers to members of the animal kingdom including but not limited to human beings and “mammal” refers to all mammals, including, but not limited to human beings.
As used herein, the terms “treat,” “treating,” “treatment,” and the like refer to reducing or ameliorating a disorder and/or symptoms associated therewith by any suitable dosage regimen, procedure and/or administration route of a composition, device or structure with the object of achieving a desirable clinical/medical end-point. It will be appreciated that, although not precluded, treating a disorder or condition does not require that the disorder, condition or symptoms associated therewith be completely eliminated. In specific embodiments, the terms “treat,” “treatment,” and “treating” refer to the amelioration of at least one measurable physical parameter of a proliferative disorder, such as growth of a tumor, not necessarily discernible by the patient. In other embodiments the terms “treat,” “treatment,” and “treating” refer to the inhibition of the progression of a proliferative disorder, either physically by, e.g., stabilization of a discernible symptom, physiologically by, e.g., stabilization of a physical parameter, or both. In other embodiments the terms “treat,” “treatment,” and “treating” refer to the reduction or stabilization of tumor size or cancerous cell count.
Ewing’s Sarcoma is a type of cancerous tumor that grows in the bones or the soft tissue around bones, such as cartilage or the nerves. It results from a translocation which fuses the EWS gene on chromosome 22 with the FLI1 gene on chromosome 11. The resultant fusion, EWS-FLI1, functions as a transcriptional activator. Treatment for Ewing sarcoma usually begins with chemotherapy. The drugs may shrink the tumor and make it easier to remove the cancer with surgery or target it with radiation therapy. After surgery or radiation therapy, chemotherapy treatments might continue in order to kill any cancer cells that might remain. Accordingly, in some examples, the compositions described herein are administered to a subject in need thereof (e.g., intravenously (similar to other chemotherapy treatments currently used for Ewing’s Sarcoma), through an infusion pump, or by intratumoral injection) in a therapeutically sufficient amount to reduce tumor size or to kill tumor cells. In some instances, the compositions described herein are administered in a therapeutically sufficient amount to reduce the aberrant gene expression driven by activation of GGAA-microsatellites in a cell, which results from the activity of EWS-FLI1.
In some instances, the compositions described herein can be used in combination with one or more other treatments that are typically used to treat Ewing’s Sarcoma. For example, the compositions can be used in combination with chemotherapy agents (e.g., vincristine, doxorubicin, cyclophosphamide, ifosfamide, and etoposide), radiation, surgery, or any combination thereof.
EXAMPLES
The present invention is additionally described by way of the following illustrative, non-limiting Examples that provide a better understanding of the present invention and of its many advantages.
As shown herein, engineered ZFAs can be used to efficiently target and alter the chromatin state of a class of microsatellite repeats in human cells.
Materials and Methods
The following materials and methods were used in the examples below.
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Cell lines
Primary bone marrow-derived MSCs were collected with approval from the Institutional Review Board of the Centre Hospitalier Universitaire Vaudois. Samples were de-identified prior to our analysis. MSCs were cultured in IMDM (Life Technologies) containing 10% fetal calf serum (FCS) and 10 ng/ml platelet-derived growth factor BB (PeproTech). U2OS cells were obtained from Toni Cathomen (Freiburg). All other cell lines were obtained from ATCC and media from Life Technologies. Ewing sarcoma cell lines SKNMC, A673, and EW7 were grown in RPMI 1640 and CHP100 in McCoy’s 5a Medium. HEK293T, HeLa and U2OS were grown in DMEM and MRC5 in EMEM. All media were supplemented with 1% penicillin and streptomycin (Life Technologies). McCoy’s 5a medium was supplemented with 15% FBS and all other media were supplemented with 10% FBS. Cells were cultured at 37 °C with 5% CO2. Media supernatant was analyzed biweekly for the presence of Mycoplasma using MycoAlert™ PLUS (Lonza). Cell lines were authenticated by ATCC STR profiling.
METHODS DETAILS
Plasmids and oligonucleotides
Each of the 16 different ZFAs that recognize ~4.5 GGAA tandem repeats was generated by assembling pre-selected 2-ZF units from an unpublished Joung lab archive. Although we used an unpublished archive of engineered zinc finger modules to provide the various 2-ZF units for constructing our ZFAs, there are other published public sources of zinc finger units as well as protocols that can be used to create customized zinc finger arrays (Sander et al. 2010; Fu et al. 2009; Wright et al. 2006; Sander et al. 2011; Maeder et al. 2008, 2009). The assembled ZFAs were inserted into the pENTR3C vector, and the EWS N-terminus (Riggi et al. 2014) or KRAB (from BPK1407) was cloned into pENTR3C-ZFAs by Gibson assembly. The EWS-ZFA or KRAB-ZFA fusions thus generated were transferred to the lentiviral pLIV vector containing the EF1-alpha promoter via LR reactions using Gateway LR Clonase II Enzyme Mix (Invitrogen). dCas9-EWS (NP173) was constructed by cloning EWS into BPK1179 digested with XhoI and NotI by Gibson assembly, and EWS-dCas9 (YET3486) was constructed by cloning EWS into pSQT digested with AgeI and BstZ17I by Gibson assembly. DmrC-EWS was generated by inserting EWS into the DmrC entry vector digested with NruI, using Gibson assembly. Sequences of gRNAs used in this study are provided in Table 2A.
Transfection
For EWS-ZFA experiments in U2OS cells, 2 x 10^5 cells were transfected with 1 µg of plasmids by nucleofection using the DN-100 program on a Lonza 4-D Nucleofector with the SE Cell Line Kit (Lonza) and transfected cells were plated in 24-well plates. For dCas9-based EWS constructs, we used the nucleofection method described in detail previously (Tak et al. 2017).
Lentiviral Generation
Lentivirus was produced in HEK293T LentiX cells (Clontech) by LT1 (Mirus Bio) transfection with gene delivery vector and packaging vectors GAG/POL and VSV plasmids (Boulay et al. 2017). Viral supernatants were collected 72 h after transfection and concentrated using the LentiX concentrator (Clontech). Virus-containing pellets were resuspended in PBS and added dropwise on cells in the presence of growth media supplemented with 6 µg/ml polybrene. Cells infected with lentivirus were selected using puromycin (Invivogen) at a concentration of 1 µg/ml for SKNMC, EW7, CHP100, HEK293T, HeLa and U2OS or 2 µg/ml for A673 and MRC5 in the growth medium. MSCs were selected with 0.75 µg/ml puromycin. Overexpression efficiency was determined by immunoblot analysis.
Immunoblot Analysis
Immunoblot analyses were performed using standard protocols (Boulay et al. 2017). Primary antibodies were used at the following concentrations: rat anti-HA (Roche, 1 µg/ml), rabbit anti-FLI1 (Abcam, 1 µg/ml), and mouse anti-GAPDH (Millipore, 0.1 µg/ml). Secondary antibodies were goat anti-rabbit, goat anti-rat, and goat anti-mouse IgG, respectively, conjugated with horseradish peroxidase (Bio-Rad, 1:10,000 dilution). Membranes were developed using Western Lightning Plus-ECL enhanced chemiluminescence substrate (PerkinElmer) and visualized using photographic film.
Real-Time Quantitative reverse transcription PCR
Total RNA was extracted from the transfected cells 72 hours post-transfection using the NucleoSpin® RNA Plus (Clontech), and 250 ng of purified RNA was used for cDNA synthesis in a total reaction volume of 20 µl using the High-Capacity RNA-cDNA kit (ThermoFisher). cDNA was diluted 1:20 and 3 µl of cDNA was used for quantitative PCR (qPCR) using SYBR Green Real-Time PCR Master Mix (ThermoFisher) and primers specific for the target transcript (Table 2B). qPCR was performed using a Roche LightCycler480 with the following cycling protocol: initial denaturation at 95 °C for 20 seconds (s) followed by 45 cycles of 95 °C for 3 s and 60 °C for 30 s. Ct values over 35 were considered as 35. Relative quantification of each target, normalized to an endogenous control (GAPDH or HPRT1), was performed using the comparative Ct method (Applied Biosystems).
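The comparative Ct calculation described above can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study: the function name and arguments are hypothetical, a 100% amplification efficiency is assumed (fold change = 2^-ddCt), and applying the Ct cap of 35 to every input value (including the endogenous control) is an assumption.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control, ct_cap=35.0):
    """Fold change of a target transcript by the comparative Ct (2^-ddCt) method.

    Hypothetical helper: Ct values above ct_cap are treated as ct_cap,
    and 100% PCR efficiency is assumed.
    """
    def cap(ct):
        return min(ct, ct_cap)

    # dCt: target Ct normalized to the endogenous control (e.g., GAPDH or HPRT1)
    d_ct_sample = cap(ct_target_sample) - cap(ct_ref_sample)
    d_ct_control = cap(ct_target_control) - cap(ct_ref_control)

    # ddCt: treated sample relative to the control condition
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Target Ct drops from 30 (control) to 25 (sample) with an unchanged reference Ct.
print(relative_expression(25.0, 20.0, 30.0, 20.0))  # 32.0
```

Because undetected transcripts are capped at Ct 35 rather than excluded, fold-activation values for genes that are silent in the control condition are conservative lower bounds under this sketch.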
Cell Viability Assays
Cells that were transduced with lentiviral KRAB-ZFA plasmid or GFP control plasmid were grown for 8 days and cell viability was measured using the CellTiter-Glo luminescent assay (Promega) as described by the manufacturer. Endpoint luminescence was measured on a SpectraMax M5 plate reader (Molecular Devices).
Definition of target genes associated with EWS-FLI1 bound GGAA-repeats
In FIG. 3H, 126 GGAA repeat-associated genes were selected based on a maximum distance of 100 kb from EWS-FLI1 bound GGAA repeats (n=812) and upregulation upon EWS-FLI1 induction in MSCs (greater than 2-fold). In FIG. 5H, 235 GGAA repeat-associated genes were selected based on a maximum distance of 100 kb from EWS-FLI1 bound GGAA repeats (n=812) and downregulation upon EWS-FLI1 knockdown in both SKNMC and A673 Ewing sarcoma cell lines (greater than 2-fold) (Riggi et al. 2014).
ChIP-seq
ChIP assays of MSCs, SKNMC, A673 and HEK293T cells were carried out using 2-5 x 10^6 cells per sample and per epitope, following the procedures described previously (Mikkelsen et al. 2007). In brief, chromatin from formaldehyde-fixed cells was fragmented to 200-700 bp with a Branson 250 sonifier. Solubilized chromatin was immunoprecipitated overnight at 4 °C with 3 µg of target-specific antibodies (rat anti-HA (Roche), rabbit anti-FLI1 (Abcam), rabbit anti-H3K27ac (Active Motif), and rabbit anti-H3K9me3 (Abcam)). Antibody-chromatin complexes were pulled down with protein G-Dynabeads (Life Technologies), washed, and then eluted. After crosslink reversal, RNase A, and proteinase K treatment, immunoprecipitated DNA was extracted with AMPure beads (Beckman Coulter). ChIP DNA was quantified with Qubit. Sequencing libraries were prepared with 1-5 ng of ChIP DNA samples and input samples using the Ovation Ultralow System V2 kit (Nugen). Libraries were sequenced with single-end (SE) 50-75 cycles on an Illumina NextSeq 500 genome analyzer.
ChIP-seq Bioinformatic Analysis
Reads were aligned to human reference genome hg19 using bwa (Li and Durbin 2009). Aligned reads were then filtered to exclude PCR duplicates and were extended to 200 bp to approximate fragment sizes. Density maps were generated by counting the number of fragments overlapping each position using igvtools, and normalized to 10 million reads. We used MACS2 (Zhang et al. 2008) to call peaks using matching input controls with a q-value threshold of 0.01. Peaks were filtered to exclude blacklisted regions as defined by the ENCODE consortium (ENCODE Project Consortium 2012). Peaks within 200 bp of each other were merged. Genome-wide GGAA microsatellite repeats were previously annotated (Boulay et al. 2017, 2018). Peak intersections were identified using bedtools (Quinlan et al. 2010). Average ChIP-seq signals across intervals were calculated using bwtool (Pohl and Beato 2014). findMotifsGenome.pl was used to identify de novo DNA motifs between 8 and 20 bp from all sites bound by EWS-ZFA with the Homer suite of tools (Heinz et al. 2010). Signals shown in heatmaps (100 bp windows) and composite plots (10 bp window) were calculated using bwtool (Pohl and Beato 2014). Heatmap signals are in log2 scale, centered around EWS-FLI1 binding sites (Riggi et al. 2014) and are capped at the 99th percentile.
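The step in which peaks within 200 bp of each other were merged can be illustrated with a short sketch. The function below is hypothetical and is not the tool used in the study; it assumes peaks are given as (chromosome, start, end) tuples and that a gap of exactly 200 bp still counts as "within 200 bp".

```python
def merge_peaks(peaks, max_gap=200):
    """Merge intervals on the same chromosome separated by at most max_gap bp.

    peaks: iterable of (chrom, start, end) tuples (hypothetical input format).
    Returns a sorted list of merged (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Close enough to the previous peak: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


example = [("chr1", 350, 500), ("chr1", 100, 200),
           ("chr1", 800, 900), ("chr2", 100, 150)]
print(merge_peaks(example))
# [('chr1', 100, 500), ('chr1', 800, 900), ('chr2', 100, 150)]
```

Merging nearby peaks in this way prevents a single long GGAA microsatellite that yields several adjacent peak calls from being counted as multiple binding sites.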
RNA-Seq
Total RNA was isolated from cells using NucleoSpin RNA Plus (Clontech). For Fig. 2h, RNA libraries were prepared from 500 ng of total RNA treated with Ribo-Zero Gold to remove ribosomal RNA, using the TruSeq Stranded Total RNA Library Prep Gold kit (Illumina, 20020599) and TruSeq RNA Single Indexes. The RNA libraries were sequenced with PE 32 cycles on an Illumina NextSeq 500 system. For Fig. 3g, RNA samples were sent to Novogene Corporation for mRNA sequencing. RNA libraries were sequenced with PE150 cycles on an Illumina NovaSeq 6000 system. Reads were aligned to hg19 using STAR (Dobin et al. 2013). Mapped reads were filtered to exclude PCR duplicates and reads mapping to known ribosomal RNA coordinates, obtained from the rmsk table in the UCSC database (genome . Gene expression was calculated using featureCounts (Liao,
Figure imgf000033_0001
Smyth, and Shi 2014). Only primary alignments with mapping quality of 10 or more were counted. Counts were then normalized to 1 million reads. Signal tracks were generated using bedtools (Quinlan et al. 2010). Differential expression was calculated using DESeq2 (Love, Huber, and Anders 2014).
GSEA analysis
Gene set overlaps were computed using Gene Set Enrichment Analysis (GSEA, gsea-msigdb.org/gsea/msigdb/annotate.jsp). Gene lists for GSEA analysis were selected using a log2 fold change of 0.6 for upregulated genes and -0.6 for downregulated genes. An adjusted p-value threshold of < 0.1 was also applied. Gene lists were then analyzed for overlaps with C2 (curated gene sets) and BP (GO biological process), with an FDR q-value < 0.05.
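The gene list selection described above can be sketched as a simple filter over differential expression results. This is an illustrative sketch only: the input format (a mapping from gene name to log2 fold change and adjusted p-value), the function name, and the gene names in the usage example are hypothetical, and treating the fold-change cutoffs as inclusive is an assumption.

```python
def select_gene_lists(results, lfc_cutoff=0.6, padj_cutoff=0.1):
    """Split differential expression results into up- and downregulated gene lists.

    results: dict mapping gene name -> (log2_fold_change, adjusted_p_value);
             adjusted_p_value may be None when no value was computed.
    """
    up, down = [], []
    for gene, (log2fc, padj) in results.items():
        if padj is None or padj >= padj_cutoff:
            continue  # fails the adjusted p-value threshold of < 0.1
        if log2fc >= lfc_cutoff:
            up.append(gene)
        elif log2fc <= -lfc_cutoff:
            down.append(gene)
    return sorted(up), sorted(down)


# Hypothetical values for illustration only.
demo = {
    "GENE_A": (1.2, 0.01),
    "GENE_B": (-0.9, 0.05),
    "GENE_C": (2.0, 0.50),
    "GENE_D": (0.3, 0.001),
    "GENE_E": (-1.5, None),
}
print(select_gene_lists(demo))  # (['GENE_A'], ['GENE_B'])
```

A log2 fold change of 0.6 corresponds to roughly a 1.5-fold change in expression, which is less stringent than the 2-fold cutoff used elsewhere in the Methods to define GGAA repeat-associated target genes.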
QUANTIFICATION AND STATISTICAL ANALYSIS
Information on the number of biological replicates, statistical tests and p-values is provided in the figure legends.
Example 1.1: Engineering sequence-specific DNA-binding domains to target GGAA microsatellite repeats
Although multiple platforms are available to create DNA-binding modules that might be capable of recognizing GGAA microsatellite repeats bound by the EWS-FLI1 fusion protein, we chose to focus on using engineered Zinc Finger Arrays (ZFAs) and RNA-targeted dCas9 from Streptococcus pyogenes. Cys2His2 ZFAs can be engineered to recognize novel DNA sequences of interest and have been used successfully to build artificial transcription factors capable of influencing gene expression in human cells (Graslund et al. 2005; Beerli et al. 1998). Alternatively, dCas9 programmed by guide RNAs (gRNAs) has similarly been used to create gene regulatory proteins that function efficiently in human cells (Qi et al. 2013; Maeder et al. 2013; Perez-Pinera et al. 2013) and offers the substantial additional advantage of simple targetability by altering the gRNA sequence. TALE repeats have also been used to build customized DNA-binding domains, utilizing assembled arrays of four TALE repeat domains as “building blocks”, with each recognizing one of the four different DNA bases (Scholze and Boch 2011; Boch et al. 2009; Moscou and Bogdanove 2009). However, because the NN TALE repeat typically used to recognize guanine (G) has also been reported to recognize adenine (A) (albeit with less efficiency) (Streubel et al. 2012; Deng et al. 2012; Christian et al. 2012), we elected not to engineer TALE repeat arrays designed to recognize GGAA microsatellite repeat sequences.
We engineered ZFAs to recognize two 18 bp sequences that align within different registers of 4.5 GGAA repeats: 5’-GGAAGGAAGGAAGGAAGG and 5’-AAGGAAGGAAGGAAGGAA. A single ZF recognizes ~3 bp of DNA and previous work has shown that highly active arrays of six ZFs that recognize 18 bp target sites can be assembled by using pre-selected 2-ZF units joined together by non-canonical (non-TGEKP) linkers such as TGSQKP or CGSQKP (Sander et al. 2010; Fu et al. 2009; Wright et al. 2006; Sander et al. 2011; Maeder et al. 2009, 2008; Joung, Voytas, and Kamens 2015; Pearson 2008; Moore, Klug, and Choo 2001; Joung lab (unpublished data)). Using this strategy and an archive of pre-selected 2-ZF units engineered to bind to various specific target sequences, we assembled eight different 6-ZF arrays for each of the two 18 bp target sites (FIG. 1A) (Methods, Table 1). To test the abilities of these 16 ZFAs to bind to GGAA microsatellite repeats, we fused the disordered prion-like N-terminal domain of EWSR1 (Chong et al. 2018; Boulay et al. 2017) (hereafter referred to as the EWS domain) to the N-terminus or C-terminus of each of the ZFAs (FIG. 1B). We then assessed the abilities of each of these 32 fusions to activate the UGT3A2 gene (an EWS-FLI1 target gene that has 11 GGAA repeats positioned ~2 kb upstream of its promoter) in human U2OS cells. We found that all of these ZF-based fusions activated UGT3A2 with varying levels of efficiency (mean fold-activation ranging from 14- to 190-fold) (FIG. 1C). Because the ZF array ZFA7 exhibited approximately equivalent activity regardless of the position of the EWS domain (mean fold activation of 83- and 70-fold) and this level of activation was similar to that observed with EWS-FLI1 (mean fold activation of 121-fold) (FIG. 1C), we selected the EWS-ZFA7 fusion (hereafter referred to as EWS-ZFA) for use in further experiments.
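The relationship between the two 18 bp target sites, the GGAA repeat register, and the 3 bp subsites recognized by individual zinc fingers can be checked with a short sketch. The helper names below are hypothetical, and the sketch only partitions the target strand as written 5' to 3'; it does not model which finger of an array contacts which subsite or the orientation of the protein on the DNA.

```python
TARGET_SITES = ["GGAAGGAAGGAAGGAAGG", "AAGGAAGGAAGGAAGGAA"]


def split_target(site, n_fingers=6, bp_per_finger=3):
    """Partition a target site into 3 bp subsites and 6 bp (2-ZF unit) subsites."""
    if len(site) != n_fingers * bp_per_finger:
        raise ValueError("site length does not match a six-finger array")
    triplets = [site[i:i + bp_per_finger] for i in range(0, len(site), bp_per_finger)]
    unit_len = 2 * bp_per_finger
    two_zf_subsites = [site[i:i + unit_len] for i in range(0, len(site), unit_len)]
    return triplets, two_zf_subsites


def repeat_register(site, unit="GGAA"):
    """Offset within the repeat unit at which the site starts, or None if the
    site is not contained in a perfect tandem repeat of the unit."""
    tract = unit * (len(site) // len(unit) + 2)
    index = tract.find(site)
    return None if index < 0 else index % len(unit)


for site in TARGET_SITES:
    triplets, pairs = split_target(site)
    print(site, "register", repeat_register(site), triplets, pairs)
```

Under these assumptions the first site starts at offset 0 of the GGAA unit and the second at offset 2, and each 18 bp site spans 18/4 = 4.5 repeat units, consistent with the description of two registers above.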
Table 1 - Zinc Finger Arrays
Figure imgf000035_0001
Figure imgf000036_0001
Figure imgf000037_0001
Figure imgf000038_0001
Figure imgf000039_0001
Figure imgf000040_0001
Figure imgf000041_0001
Figure imgf000042_0001
Figure imgf000043_0001
#, SEQ ID NO:
Table 2A - SpCas9 gRNAs used in this study
Figure imgf000043_0002
Figure imgf000044_0001
#, SEQ ID NO:
Table 2B. RT-qPCR primers used in this study
Figure imgf000044_0002
#, SEQ ID NO:
To enable binding of GGAA repeats by dCas9, we also designed a gRNA that would target a 23 bp sequence composed of ~5.5 GGAA repeats: 5’-AGGAAGGAAGGAAGGAAGGAagg, consisting of a 20 nt spacer (bold) and an NGG PAM (lower case). To our surprise, expression of this gRNA together with a fusion protein in which the EWS domain was fused to the N-terminal or C-terminal end of dCas9 (hereafter referred to as EWS-dCas9 or dCas9-EWS, respectively) failed to activate the UGT3A2 gene in U2OS cells (FIG. 1D). To increase the number of EWS domains recruited by our dCas9-gRNA complex, we used single and multimerized configurations of two domains called DmrA and DmrC that only interact in the presence of a small molecule A/C heterodimerizer. We co-expressed our gRNA and a DmrC-EWS domain fusion together with a dCas9-DmrA fusion protein harboring one, two, three, or four DmrA domains in U2OS cells (FIG. 1D). In the presence of heterodimerizer, we found that dCas9-DmrA fusions harboring two, three or four DmrA domains could mediate modest activation of the UGT3A2 gene (mean fold-activation of 3.1, 3, or 1.1, respectively), levels much lower than the activation observed using the EWS-ZFA fusion (FIG. 1D). In contrast, these same dCas9-based EWS constructs were effective at activating various other genes when using different gRNAs directed to non-repetitive target sites within the promoters of those genes in U2OS cells and HEK293 cells (FIGs. 2A-2B). The ability of our dCas9-based EWS constructs to mediate activation from unique sites in the genome but not from GGAA repeats suggests that it may be challenging for these fusions to recognize and/or bind to these repeats that are present at over 13,000 loci in the human genome including the UGT3A2 promoter. Taken together, our results show that an engineered EWS-ZFA fusion could more effectively activate an EWS-FLI1 target gene with upstream GGAA repeats than analogous dCas9-based fusions to the EWS domain.
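The structure of the 23 bp dCas9 target (20 nt spacer followed by an NGG PAM) and the number of overlapping copies of that target within a repeat tract can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, only perfect GGAA repeats on the written strand are considered, and flanking genomic sequence is ignored. The 11-repeat tract is modeled on the UGT3A2 upstream repeats described in Example 1.1.

```python
import re

GRNA_TARGET = "AGGAAGGAAGGAAGGAAGGAagg"  # 20 nt spacer + NGG PAM (lower case)


def split_protospacer(target, spacer_len=20):
    """Return (spacer, PAM) and check that the PAM matches NGG."""
    spacer = target[:spacer_len].upper()
    pam = target[spacer_len:].upper()
    if len(spacer) != spacer_len or not re.fullmatch(r"[ACGT]GG", pam):
        raise ValueError("target is not a 20 nt spacer followed by an NGG PAM")
    return spacer, pam


def count_overlapping_sites(target, tract):
    """Count (possibly overlapping) occurrences of target in tract."""
    pattern = "(?=" + re.escape(target.upper()) + ")"
    return len(re.findall(pattern, tract.upper()))


spacer, pam = split_protospacer(GRNA_TARGET)
print(spacer, pam)  # AGGAAGGAAGGAAGGAAGGA AGG

tract = "GGAA" * 11  # perfect 11-unit GGAA tract, flanks ignored
print(count_overlapping_sites(GRNA_TARGET, tract))  # 5
```

In this simplified model, a single 11-unit tract contains five overlapping copies of the full 23 bp target, each offset by one repeat unit, so the poor activation seen with dCas9 is not explained by a lack of matching sites in the tract.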
Example 1.2: An EWS-ZFA fusion recapitulates genome-wide activation of microsatellite repeats observed in Ewing sarcoma
We next tested whether EWS-ZFA could target and activate GGAA microsatellites genome-wide by comparing its activity to EWS-FLI1 in mesenchymal stem cells (MSCs). MSCs are a model for the cell of origin of Ewing sarcoma and EWS-FLI1 has previously been shown to operate as a pioneer factor at GGAA repeats in these cells to induce a chromatin landscape and gene expression pattern similar to that of tumor cells (Riggi et al. 2014). Transduction of MSCs with lentiviral vectors expressing EWS-ZFA followed by ChIP-seq and unbiased sequence analysis identified GGAA repeats as the dominant motif found in EWS-ZFA peaks (more than 80% of EWS-ZFA binding sites contained more than four consecutive GGAA units, FIG. 3A, FIG. 4A; note alternate motifs include: GGAAGGAAGGAAGGAAGGAAGGAA (SEQ ID NO: 136) and AAGGAAGGAAGGAAGGAAGGAAGG (SEQ ID NO: 137)). EWS-ZFA also bound nearly all of the GGAA repeats in the genome bound by EWS-FLI1 (FIG. 3B, FIG. 4B). We categorized 13,029 GGAA microsatellites with more than 4 consecutive GGAA units based on their length (FIG. 4C) and found that the EWS-ZFA binds a higher fraction of GGAA microsatellites (10-20%) than EWS-FLI1 at each length interval (FIG. 3C, FIG. 4D).
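The criterion of "more than four consecutive GGAA units" can be illustrated with a simple scan for perfect tandem repeats. The microsatellite annotation used in this work was taken from earlier studies (Boulay et al. 2017, 2018), whose exact criteria are not restated here, so the sketch below is an assumption-laden illustration: it considers only perfect, uninterrupted GGAA repeats on the given strand, and the function name and example sequence are hypothetical.

```python
import re


def find_ggaa_repeats(sequence, min_units=5):
    """Locate perfect GGAA tandem repeats with at least min_units consecutive units.

    Returns a list of (start, end, n_units) with 0-based, end-exclusive coordinates.
    min_units=5 corresponds to "more than four consecutive GGAA units".
    """
    pattern = re.compile(r"(?:GGAA){%d,}" % min_units)
    hits = []
    for match in pattern.finditer(sequence.upper()):
        n_units = (match.end() - match.start()) // 4
        hits.append((match.start(), match.end(), n_units))
    return hits


# Hypothetical sequence: a 6-unit repeat, a 3-unit repeat (too short), a 5-unit repeat.
example = "TTT" + "GGAA" * 6 + "CCC" + "GGAA" * 3 + "T" + "GGAA" * 5
print(find_ggaa_repeats(example))  # [(3, 27, 6), (43, 63, 5)]
```

The unit counts returned by such a scan could then be binned into length intervals to compare the fraction of repeats bound by each fusion protein, as in the length-based categorization described above.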
We further tested whether EWS-ZFA binding would lead to the induction of active chromatin states at GGAA repeats in MSCs. To do this, we used ChIP-seq to measure the active chromatin mark H3K27ac at 812 GGAA microsatellites that are consistently bound by endogenous EWS-FLI1 in Ewing sarcoma cell lines (Riggi et al. 2014). We observed strong binding of EWS-ZFA at these same sites and de novo deposition of H3K27ac, often at higher levels than that induced by EWS-FLI1 (FIGS. 3D - 3E; FIG. 4E). This higher activity and the higher binding of GGAA repeats by EWS-ZFA compared to EWS-FLI1 (FIG. 3C, FIG. 4D) may be due to higher protein expression levels observed upon lentiviral induction (FIG. 4F). However, we cannot rule out other possible explanations such as structural and functional differences between the ZFA and FLI1 DNA-binding domains (which may provide distinct DNA stability profiles to each fusion protein) or differing numbers of fusion proteins recruited to a given GGAA repeat (which may result in variable recruitment of chromatin co-factors involved in H3K27ac deposition). By contrast, canonical non-repeat GGAA sites bound by EWS-FLI1 showed no evidence of EWS-ZFA binding or chromatin state changes, thereby demonstrating the specificity of the engineered EWS-ZFA fusion for GGAA repeats relative to non-repeat GGAA sites as expected (FIGs. 3F - 3G, FIG. 4G). In addition to changes in chromatin activity, we also measured transcriptional changes for genes in the vicinity of EWS-FLI1-bound GGAA repeats. Transcript analysis showed that 72% of the genes that are within 100 kb of EWS-FLI1-bound GGAA repeats (FIG. 3H, Table 3, shown below) and that are induced > 2-fold by EWS-FLI1 were also upregulated to a similar degree by EWS-ZFA.
Taken together, these data show that EWS-ZFA was able to phenocopy the chromatin and transcriptional activation observed in Ewing sarcoma, suggesting that localizing the N-terminal EWS domain to GGAA repeats via an engineered ZFA instead of the FLI1 DNA-binding domain was sufficient to initiate the recruitment of chromatin regulators required for pioneer function, enhancer activation and target gene expression. These results provide an important proof-of-concept for how engineered ZFAs can be an effective tool to target and alter the functional state of GGAA microsatellites genome-wide.
Table 3:
Figure imgf000047_0001
Figure imgf000048_0001
Figure imgf000049_0001
Figure imgf000050_0001
Example 1.3: KRAB-ZFAs can selectively silence the microsatellite-driven Ewing Sarcoma gene expression program
Given that EWS-ZFA can efficiently target and activate GGAA microsatellites in MSCs, we hypothesized that a fusion of our engineered ZFA to a repressive KRAB domain (Margolin et al. 1994; Groner et al. 2010) might conversely silence active GGAA microsatellites bound by endogenous EWS-FLI1 in Ewing sarcoma cells, thereby inactivating its downstream oncogenic gene expression program. This approach offers the possibility to delineate the precise functional role of GGAA repeats in Ewing Sarcoma cells, in isolation from the non-repeat GGAA target sites of EWS-FLI1. We expressed a KRAB-ZFA fusion protein and found that it bound efficiently to GGAA microsatellites in two Ewing sarcoma cell lines (SKNMC and A673). Interestingly, KRAB-ZFA binding was followed by EWS-FLI1 eviction from the same genomic sites, as assessed by FLI1 ChIP-seq performed in SKNMC and A673 cells (FLI1 ChIP-seq can be used to detect the binding of EWS-FLI1 because these two cell lines do not express endogenous wild type FLI1) (FIGs. 5A - 5B, FIGs. 6A-6B). KRAB-ZFA binding was also associated with striking changes in chromatin states and the induction of repressive marks with increased H3K9me3 and decreased H3K27Ac signals at GGAA microsatellites (FIGs. 5A, 5C - 5D, FIGs. 6A, 6C). As expected, these changes were observed uniquely at GGAA repeats and not at non-repeat GGAA EWS-FLI1 binding sites, confirming the specificity of KRAB-ZFA (FIGs. 6D-6F). Among the genes located within 100 kb of EWS-FLI1-bound GGAA repeats that showed > 2-fold decreases in EWS-FLI1-depleted cell lines, 49% and 47% showed a similar decrease due to KRAB-ZFA expression in SKNMC and A673 cells, respectively (FIG. 5G, FIG. 7A, Table 4, shown below). Genes involved in specific functional categories (e.g., cell cycle regulation and neurogenesis) that have previously been identified after EWS-FLI1 knockdown (Riggi et al. 2014) and are linked to Ewing sarcoma cell survival were enriched among the genes downregulated by KRAB-ZFA in SKNMC and A673 cells (FIG. 7B).
Table 4:
Figure imgf000051_0001
Figure imgf000052_0001
Figure imgf000053_0001
Figure imgf000054_0001
Figure imgf000055_0001
Figure imgf000056_0001
Figure imgf000057_0001
Figure imgf000058_0001
Figure imgf000059_0001
Figure imgf000060_0001
Figure imgf000061_0001
Because the KRAB-ZFA fusion would only be expected to alter the function of GGAA repeats in Ewing sarcoma cells in which EWS-FLI1 is expressed (and in which these repeats function as enhancers), we were interested in evaluating the effects of KRAB-ZFA expression in non-Ewing sarcoma cells. To do this, we analyzed genome-wide chromatin state changes in HEK293T cells upon expression of KRAB-ZFA. Similar to what has been observed in most non-Ewing cell types previously examined (Riggi et al. 2014), we found that HEK293T cells were largely devoid of active chromatin marks at GGAA repeats and that there were no major changes in H3K27Ac signals induced with KRAB-ZFA expression (FIG. 7C). However, GGAA repeats in HEK293T cells accumulated strong repressive H3K9me3 signals after expression of KRAB-ZFA in the same manner as Ewing sarcoma cells (FIGs. 5E - 5F). In contrast to Ewing sarcoma cells, HEK293T cells transduced with KRAB-ZFA displayed minimal transcriptional changes, which only included a handful of genes with GGAA repeats located within their promoters (FIG. 7D, Table 6).
Finally, we tested whether the selective antagonistic effect exerted by the KRAB-ZFA fusion on the EWS-FLI1-induced transcriptional program in Ewing sarcoma cells might also translate into a cell-type-specific impact on cell viability. To this end, we quantitatively compared the viability of four different Ewing sarcoma cell lines to four non-Ewing sarcoma control lines upon the expression of KRAB-ZFA or GFP (as a negative control) (FIG. 5H). Strikingly, despite similar KRAB-ZFA protein expression levels (FIG. 7E), only the viability of Ewing sarcoma cells was affected by KRAB-ZFA, with a reduction exceeding 80%, whereas minimal toxicity was observed in all negative control non-Ewing sarcoma cell lines (FIG. 5H).
Discussion
Our results show that engineered ZFAs are highly effective and specific tools for targeting widely distributed repetitive elements and altering their chromatin states. Engineered ZFAs have distinct advantages for this purpose given their high DNA binding affinities, small size, and similarities to endogenous transcription factors. Our findings further demonstrate that engineered ZFAs can greatly facilitate the functional assessment of the important but challenging-to-study repetitive elements of the human genome and may provide a strategy for therapeutically modifying the non-coding function of these repeats.
In the case of GGAA microsatellites that are activated genome-wide in Ewing sarcoma, the high degree of specificity conferred by ZFAs allowed us to isolate their function and determine that these elements are in fact responsible for large-scale gene activation in this tumor type. By recruiting specific regulatory domains without the involvement of endogenous DNA binding proteins, engineered ZFAs also make it possible to study the contribution of poorly understood proteins as shown by our finding that the N-terminus of EWSR1 is sufficient to activate GGAA microsatellites in the absence of the ETS DNA binding domain contained in EWS-FLI1.
Intriguingly, we observed large differences between ZFA and RNA-guided dCas9 approaches for targeting GGAA repeats. These observations suggest that ZFAs may have advantages over dCas9 for studying the function of tandem repeats that occur at a high number of different locations in the human genome.
Sequences for Zinc-finger array fusion proteins
KRAB-ZFA1 (SEQ ID NO:66)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: KRAB, Underlined: ZFA1
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGA4GCTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGC
ATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATAC
AGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTG
CACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCA
GTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATC
TACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAA
CTTCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCC
AGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCAT
TTGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGA
TCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACG
CACCTGAGGGGATCCTAA
KRAB-ZFA2 (SEQ ID NO:67)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: KRAB, Underlined: ZFA2
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGC
ATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATAC
AGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTG
CACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCA
GTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATC
TACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAA
CTTCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCC
AGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACAT
TTGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGAT
CTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGC
ACCTGAGGGGATCCTAA
KRAB-ZFA3 (SEQ ID NO:68)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: KRAB, Underlined: ZFA3
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGC
ATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATAC
AGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTG
CACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCA
GTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATC
TACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAA
CTTCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCC
AGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCAT
TTGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGA
TCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACG
CACCTGAGGGGATCCTAA
KRAB-ZFA4 (SEQ ID NO:69)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA4
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGC
ATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATAC
AGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTG
CACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCA
GTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATC
TACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAA
CTTCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCC
AGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACAT
TTGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGAT
CTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGC
ACCTGAGGGGATCCTAA
KRAB-ZFA5 (SEQ ID NO: 70)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA5
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGC
ATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATAC
AGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTG
CACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCA
GTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATC
TACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAA
CTTCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCC
AGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCAT
TTGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGA
TCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACG
CACCTGAGGGGATCCTAA
KRAB-ZFA6 (SEQ ID NO:71)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA6
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGC
ATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATAC
AGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTG
CACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCA
GTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATC
TACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAA
CTTCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCC
AGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACAT
TTGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGAT
CTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGC
ACCTGAGGGGATCCTAA
KRAB-ZFA7 (SEQ ID NO: 72)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA7
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGC
ATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATAC
AGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTG
CACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCA
GTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATC
TACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAA
CTTCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCC
AGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCAT
TTGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGA
TCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACG
CACCTGAGGGGATCCTAA
KRAB-ZFA8 (SEQ ID NO: 73)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA8
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGC
ATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATAC
AGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTG
CACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCA
GTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATC
TACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAA
CTTCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCC
AGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACAT
TTGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGAT
CTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGC
ACCTGAGGGGATCCTAA
KRAB-ZFA9 (SEQ ID NO: 74)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA9
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTA
TGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACAC
CGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAG
GACAACTTGTCTTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCC
AGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACAT
ACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAA
ATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCC
CAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTA
ACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCG
AATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTA
CGCACCTGAGGGGATCCTAA
KRAB-ZFA10 (SEQ ID NO: 75)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA10
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTA
TGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACAC
CGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAG
GACAACTTGTCTTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCC
AGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACAT
ACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAA
ATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCC
CAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAA
CTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGA
ATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTAC
GCACCTGAGGGGATCCTAA
KRAB-ZFA11 (SEQ ID NO: 76)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA11
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTA
TGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACAC
CGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAG
GACAACTTGTCTTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCC
AGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCAT
ACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAA
ATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCC
CAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTA
ACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCG
AATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTA
CGCACCTGAGGGGATCCTAA
KRAB-ZFA12 (SEQ ID NO:77)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA12
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTA
TGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACAC
CGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAG
GACAACTTGTCTTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCC
AGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCAT
ACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAA
ATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCC
CAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAA
CTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGA
ATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTAC
GCACCTGAGGGGATCCTAA
KRAB-ZFA13 (SEQ ID NO:78)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA13
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTA
TGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACC
GGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACT
CTTATTTGCAGTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCA
GTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATA
CCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAAT
TTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCA
GAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACT
TGCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAAT
ATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGC
ACCTGAGGGGATCCTAA
KRAB-ZFA14 (SEQ ID NO:79)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA14
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTA
TGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACC
GGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACT
CTTATTTGCAGTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCA
GTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATA
CCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAAT
TTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCA
GAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACT
TGTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAAT
ATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGC
ACCTGAGGGGATCCTAA
KRAB-ZFA15 (SEQ ID NO: 80)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA15
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTA
TGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACC
GGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACT
CTTATTTGCAGTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCA
GTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATA
CCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAAT
TTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCA
GAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACT
TGCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAAT
ATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGC
ACCTGAGGGGATCCTAA
KRAB-ZFA16 (SEQ ID NO:81)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA16
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCCGATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTT
CAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGAC
ACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGA
ACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGG
TTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGAGAAATTCACCAA
GAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAAATCATCAGTTGG
AGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTA
TGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACC
GGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACT
CTTATTTGCAGTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCA
GTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATA
CCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAAT
TTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCA
GAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACT
TGTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAAT
ATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGC
ACCTGAGGGGATCCTAA
ZFA1-KRAB (SEQ ID NO: 82)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA1
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAA
CCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAA
ACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATC
TGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
CAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCCAGAAGCCAT
TCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCA
CATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCG
AAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGG
ACACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGT
GGAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCT
GGAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCA
GATGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAG
AGAGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAA
TCAAATCATCAGTTTAA
ZFA2-KRAB (SEQ ID NO: 83)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA2
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAA
CCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAA
ACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATC
TGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
CAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCCAGAAGCCAT
TCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAAC
CATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCG
AAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGG
ACACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGT
GGAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCT
GGAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCA
GATGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAG
AGAGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAA
TCAAATCATCAGTTTAA
ZFA3-KRAB (SEQ ID NO: 84)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA3
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAA
CCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAA
ACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATC
TGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
AACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCCAGAAGCCATT
CCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCAC
ATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGA
AATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGGA
CACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTG
GAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTG
GAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGA
TGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAG
AGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAATCA
AATCATCAGTTTAA
ZFA4-KRAB (SEQ ID NO: 85)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA4
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAA
CCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAA
ACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATC
TGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
AACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCCAGAAGCCATT
CCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACC
ATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGA
AATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGGA
CACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTG
GAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTG
GAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGA
TGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAG
AGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAATCA
AATCATCAGTTTAA
ZFA5-KRAB (SEQ ID NO: 86)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA5
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAAC
CCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAA
CGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATCT
GTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
CAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCCAGAAGCCAT
TCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCA
CATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCG
AAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGG
ACACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGT
GGAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCT
GGAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCA
GATGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAG
AGAGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAA
TCAAATCATCAGTTTAA
ZFA6-KRAB (SEQ ID NO: 87)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA6
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAAC
CCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAA
CGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATCT
GTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
CAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCCAGAAGCCAT
TCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAAC
CATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCG
AAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGG
ACACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGT
GGAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCT
GGAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCA
GATGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAG
AGAGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAA
TCAAATCATCAGTTTAA
ZFA7-KRAB (SEQ ID NO: 88)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA7
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAAC
CCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAA
CGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATCT
GTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCAC
ACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTA
ACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCCAGAAGCCATTC
CAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACA
TACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAA
ATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGGGA
TCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGGAC
ACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGG
AAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGG
AGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGAT
GTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGA
GAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAA
ATCATCAGTTTAA
ZFA8-KRAB (SEQ ID NO: 89)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA8
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAAC
CCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAA
CGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATCT
GTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCAC
ACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTA
ACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCCAGAAGCCATTC
CAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCA
TACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAA
ATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGGGA
TCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGGAC
ACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTGG
AAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTGG
AGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGAT
GTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAGA
GAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAATCAA
ATCATCAGTTTAA
ZFA9-KRAB (SEQ ID NO:90)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA9
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTC
TTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGG
ACACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGT
GGAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCT
GGAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCA
GATGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAG
AGAGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAA
TCAAATCATCAGTTTAA
ZFA10-KRAB (SEQ ID NO:91)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA10
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTC
TTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGGA
CACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTG
GAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTG
GAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGA
TGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAG
AGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAATCA
AATCATCAGTTTAA
ZFA11-KRAB (SEQ ID NO:92)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA11
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTC
TTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGG
ACACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGT
GGAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCT
GGAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCA
GATGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAG
AGAGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAA
TCAAATCATCAGTTTAA
ZFA12-KRAB (SEQ ID NO:93)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA12
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTC
TTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGGA
CACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTG
GAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTG
GAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGA
TGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAG
AGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAATCA
AATCATCAGTTTAA
ZFA13-KRAB (SEQ ID NO:94)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA13
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCA
GTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGG
ACACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGT
GGAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCT
GGAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCA
GATGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAG
AGAGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAA
TCAAATCATCAGTTTAA
ZFA14-KRAB (SEQ ID NO:95)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA14
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCA
GTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGGA
CACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTG
GAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTG
GAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGA
TGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAG
AGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAATCA
AATCATCAGTTTAA
ZFA15-KRAB (SEQ ID NO:96)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA15
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCA
GTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGG
ACACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGT
GGAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCT
GGAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCA
GATGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAG
AGAGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAA
TCAAATCATCAGTTTAA
ZFA16-KRAB (SEQ ID NO:97)
Bold and italic: NLS; italic and underlined: 3X HA; bold: KRAB; underlined: ZFA16
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCA
GTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGATGCTAAGTCACTAACTGCCTGGTCCCGGA
CACTGGTGACCTTCAAGGATGTATTTGTGGACTTCACCAGGGAGGAGTG
GAAGCTGCTGGACACTGCTCAGCAGATCGTGTACAGAAATGTGATGCTG
GAGAACTATAAGAACCTGGTTTCCTTGGGTTATCAGCTTACTAAGCCAGA
TGTGATCCTCCGGTTGGAGAAGGGAGAAGAGCCCTGGCTGGTGGAGAG
AGAAATTCACCAAGAGACCCATCCTGATTCAGAGACTGCATTTGAAATCA
AATCATCAGTTTAA
EWS-ZFA1 (SEQ ID NO:98)
Bold and italic: NLS; italic and underlined: 3X HA; bold: EWS; underlined: ZFA1
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGT
GTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACC
CGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTT
CTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAG
AAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTT
GCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATA
TGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCC
ACACCGGTTCCCAGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCG
CGTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAACCCT
TTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGT
CACCTGCGTACGCACCTGAGGGGATCCTAA
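As an illustrative check on the annotated layout (not part of the original disclosure), the following minimal Python sketch translates the 54-nucleotide first line shared by the constructs listed above, together with the 15-nucleotide linker that precedes the zinc finger array. The names `translate`, `N_TERMINUS`, and `LINKER` are hypothetical and introduced only for this example; the codon table is the standard genetic code. Under those assumptions, the translated N-terminus contains the PKKKRKV motif of the SV40 large T antigen NLS followed by the start of an HA epitope (YPYDVPDYA), consistent with the NLS and 3X HA annotations.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 1.

    Any trailing partial codon is ignored. Hypothetical helper for
    illustration only.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First sequence line of the constructs listed above (54 nt, 18 codons).
N_TERMINUS = "ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT"

# Linker between the EWS or KRAB domain and the zinc finger array.
LINKER = "GGAGGCGGTGGAAGC"

peptide = translate(N_TERMINUS)
print(peptide)            # MDPKKKRKVSSYPYDVPD
print(translate(LINKER))  # GGGGS
```

Translating a full entry the same way, after joining its wrapped lines into one string, is one way to spot extraction damage, since a character outside A, C, G, and T raises a `KeyError` in this sketch.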
EWS-ZFA2 (SEQ ID NO:99)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA2
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGT
GTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACC
CGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTT
CTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAG
AAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTT
GCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATA
TGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCC
ACACCGGTTCCCAGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCG
CGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTT
TCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTC
ACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA3 (SEQ ID NO: 100)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA3
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGT
GTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACC
CGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTT
CTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAG
AAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTT
GTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATA
TGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCA
CACCGGTTCCCAGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGC
GTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTT
CAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCA
CCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA4 (SEQ ID NO: 101)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA4
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGT
GTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACC
CGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTT
CTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAG
AAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTT
GTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATA
TGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCA
CACCGGTTCCCAGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGC
GTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTT
CAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCA
CCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA5 (SEQ ID NO: 102)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA5
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGT
GTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACC
CGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTT
CTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAG
AAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTT
GCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATA
TGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCC
ACACCGGTTCCCAGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCG
CGTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAACCCT
TTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGT
CACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA6 (SEQ ID NO: 103)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA6
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGT
GTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACC
CGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTT
CTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAG
AAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTT
GCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATA
TGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCC
ACACCGGTTCCCAGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCG
CGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTT
TCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTC
ACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA7 (SEQ ID NO: 104)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA7
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGT
GTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACC
CGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTT
CTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAG
AAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTT
GTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATA
TGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCA
CACCGGTTCCCAGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGC
GTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTT
CAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCA
CCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA8 (SEQ ID NO: 105)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA8
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTCCAGT
GTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACC
CGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTT
CTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCACTGCGGCAGCCAG
AAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTT
GTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATA
TGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCA
CACCGGTTCCCAGAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGC
GTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTT
CAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCA
CCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA9 (SEQ ID NO: 106)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA9
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGT
GTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTA
CGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACT
TCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGCAGCCA
GAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATT
TGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGAT
CTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCC
ACACCGGTTCCCAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCC
CAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCAT
TCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGG
CACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA10 (SEQ ID NO: 107)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA10
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGT
GTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTA
CGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACT
TCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGCAGCCA
GAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATT
TGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGAT
CTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCC
ACACCGGTTCCCAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCC
CAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCAT
TCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTAT
CACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA11 (SEQ ID NO: 108)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA11
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGT
GTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTA
CGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACT
TCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGCAGCCA
GAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATT
TGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATC
TGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCA
CACCGGTTCCCAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCC
AGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATT
CCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGC
ACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA12 (SEQ ID NO: 109)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA12
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGT
GTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTA
CGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACT
TCAGTCGTCAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGCAGCCA
GAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATT
TGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATC
TGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCA
CACCGGTTCCCAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCC
AGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATT
CCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATC
ACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA13 (SEQ ID NO: 110)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA13
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGT
GTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTA
CGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACT
TCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGCAGCCA
GAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATT
TGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGAT
CTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCC
ACACCGGTTCCCAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCC
CAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCAT
TCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGG
CACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA14 (SEQ ID NO: 111)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA14
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGT
GTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTA
CGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACT
TCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGCAGCCA
GAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATT
TGGACGCACATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGAT
CTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCC
ACACCGGTTCCCAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCC
CAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCAT
TCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTAT
CACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA15 (SEQ ID NO: 112)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA15
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGT
GTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTA
CGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACT
TCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGCAGCCA
GAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATT
TGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATC
TGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCA
CACCGGTTCCCAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCC
AGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAGCCATT
CCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTCTTGGC
ACCTGCGTACGCACCTGAGGGGATCCTAA
EWS-ZFA16 (SEQ ID NO: 113)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA16
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCAGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCA
GGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC
ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTG
ATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGC
CTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTG
CCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTA
TGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCT
CAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGC
CAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGA
GACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA
GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACC
CCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACA
CCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACA
AAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACG
GGCAGCAGGGAGGCGGTGGAAGCTCTAGACCCGGAGAGCGCCCATTTCAGT
GTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTA
CGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACT
TCAGTCGTAACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGCAGCCA
GAAGCCATTCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATT
TGTTGAACCATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATC
TGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAACGTCATCTACGTACCCA
CACCGGTTCCCAGAAGCCATTTCAGTGTCGGATCTGTATGCGGAACTTCTCCC
AGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAGCCATT
CCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCAGTATC
ACCTGCGTACGCACCTGAGGGGATCCTAA
ZFA1-EWS (SEQ ID NO: 114)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA1
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAA
CCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAA
ACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATC
TGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
CAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCCAGAAGCCAT
TCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCA
CATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCG
AAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAA
GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAG
GATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTA
TGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACC
TATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTT
ATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTA
TGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAG
GCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGC
CTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGA
AACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACA
ACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGT
ACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT
CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTA
CTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGT
AGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACC
CACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACA
GAGCAGCAGCTACGGGCAGCAGTAA
ZFA2-EWS (SEQ ID NO: 115)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA2
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAAC
CCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAA
CGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATCT
GTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
CAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCCAGAAGCCAT
TCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCA
CATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCG
AAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAA
GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAG
GATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTA
TGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACC
TATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTT
ATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTA
TGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAG
GCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGC
CTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGA
AACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACA
ACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGT
ACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT
CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTA
CTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGT
AGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACC
CACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACA
GAGCAGCAGCTACGGGCAGCAGTAA
ZFA3-EWS (SEQ ID NO: 116)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA3
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAA
CCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAA
ACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATC
TGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
AACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCCAGAAGCCATT
CCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCAC
ATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGA
AATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAAG
CTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGG
ATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTAT
GGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCT
ATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTA
TACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTAT
GGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGG
CCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCC
TATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAA
ACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAA
CCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA
CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTC
CTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTAC
TCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTA
GCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCC
ACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAG
AGCAGCAGCTACGGGCAGCAGTAA
ZFA4-EWS (SEQ ID NO: 117)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA4
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCCACATCATTTGGACGCACATACCCGTACTCATACAGGTGAAAAA
CCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAA
ACGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATC
TGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
AACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCCAGAAGCCATT
CCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACC
ATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGA
AATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAAG
CTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGG
ATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTAT
GGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCT
ATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTA
TACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTAT
GGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGG
CCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCC
TATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAA
ACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAA
CCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA
CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTC
CTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTAC
TCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTA
GCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCC
ACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAG
AGCAGCAGCTACGGGCAGCAGTAA
ZFA5-EWS (SEQ ID NO: 118)
Bold and Italic: NLS, Italic and underlined: 3X HA, Bold: EWS, Underlined: ZFA5
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAAC
CCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAA
CGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATCT
GTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCAC
ACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTA
ACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCCAGAAGCCATTC
CAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACA
TACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAA
ATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGGGA
TCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAAGC
TGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGA
TATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATG
GACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTA
TGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTAT
ACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATG
GCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGC
CTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCT
ATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAA
CAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAAC
CAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTAC
CTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCC
TACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACT
CTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAG
CTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCA
CCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGA
GCAGCAGCTACGGGCAGCAGTAA
ZFA6-EWS (SEQ ID NO: 119)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA6
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAAC
CCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAA
CGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATCT
GTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCA
CACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGT
CAGGACAACTTGTCTTGGCACCTAAAAACCCACACCGGTTCCCAGAAGCCAT
TCCAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAAC
CATACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCG
AAATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAA
GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAG
GATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTA
TGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACC
TATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTT
ATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTA
TGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAG
GCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGC
CTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGA
AACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACA
ACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGT
ACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT
CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTA
CTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGT
AGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACC
CACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACA
GAGCAGCAGCTACGGGCAGCAGTAA
ZFA7-EWS (SEQ ID NO: 120)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA7
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAAC
CCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAA
CGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATCT
GTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCAC
ACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTA
ACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCCAGAAGCCATTC
CAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACA
TACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAA
ATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGGGA
TCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAAGC
TGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGA
TATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATG
GACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTA
TGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTAT
ACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATG
GCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGC
CTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCT
ATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAA
CAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAAC
CAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTAC
CTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCC
TACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACT
CTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAG
CTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCA
CCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGA
GCAGCAGCTACGGGCAGCAGTAA
ZFA8-EWS (SEQ ID NO: 121)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA8
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTCCAGTGTCGGATTTGCATGCGGAACTT
TTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCATACAGGTGAAAAAC
CCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGTCTGCACATTTGAAA
CGTCATCTACGTACCCACTGCGGCAGCCAGAAGCCATTTCAGTGTCGGATCT
GTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCAC
ACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTA
ACTCTTATTTGCAGTATCACCTAAAAACCCACACCGGTTCCCAGAAGCCATTC
CAGTGTCGGATTTGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCA
TACCCGTACTCATACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAA
ATTTCTCCCAGTCTGCACATTTGAAACGTCACCTGCGTACGCACCTGAGGGGA
TCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAAGC
TGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGA
TATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATG
GACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTA
TGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTAT
ACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATG
GCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGC
CTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCT
ATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAA
CAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAAC
CAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTAC
CTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCC
TACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACT
CTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAG
CTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCA
CCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGA
GCAGCAGCTACGGGCAGCAGTAA
ZFA9-EWS (SEQ ID NO: 122)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA9
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTC
TTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAA
GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAG
GATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTA
TGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACC
TATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTT
ATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTA
TGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAG
GCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGC
CTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGA
AACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACA
ACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGT
ACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT
CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTA
CTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGT
AGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACC
CACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACA
GAGCAGCAGCTACGGGCAGCAGTAA
ZFA10-EWS (SEQ ID NO: 123)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA10
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTC
TTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAAG
CTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGG
ATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTAT
GGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCT
ATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTA
TACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTAT
GGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGG
CCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCC
TATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAA
ACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAA
CCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA
CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTC
CTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTAC
TCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTA
GCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCC
ACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAG
AGCAGCAGCTACGGGCAGCAGTAA
ZFA11-EWS (SEQ ID NO: 124)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA11
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTC
TTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAA
GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAG
GATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTA
TGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACC
TATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTT
ATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTA
TGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAG
GCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGC
CTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGA
AACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACA
ACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGT
ACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT
CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTA
CTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGT
AGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACC
CACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACA
GAGCAGCAGCTACGGGCAGCAGTAA
ZFA12-EWS (SEQ ID NO: 125)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA12
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCCAGGTAACTTGCAGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCAGGACAACTTGTC
TTGGCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAAG
CTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGG
ATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTAT
GGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCT
ATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTA
TACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTAT
GGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGG
CCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCC
TATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAA
ACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAA
CCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA
CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTC
CTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTAC
TCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTA
GCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCC
ACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAG
AGCAGCAGCTACGGGCAGCAGTAA
ZFA13-EWS (SEQ ID NO: 126)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA13
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCA
GTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAA
GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAG
GATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTA
TGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACC
TATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTT
ATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTA
TGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAG
GCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGC
CTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGA
AACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACA
ACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGT
ACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT
CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTA
CTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGT
AGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACC
CACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACA
GAGCAGCAGCTACGGGCAGCAGTAA
ZFA14-EWS (SEQ ID NO: 127)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA14
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCA
GTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCCACATCATTTGGACGCACATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAAG
CTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGG
ATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTAT
GGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCT
ATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTA
TACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTAT
GGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGG
CCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCC
TATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAA
ACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAA
CCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA
CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTC
CTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTAC
TCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTA
GCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCC
ACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAG
AGCAGCAGCTACGGGCAGCAGTAA
ZFA15-EWS (SEQ ID NO: 128)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA15
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCA
GTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCCAGGTAACTTGCAGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTCAGGACAACTTGTCTTGGCACCTGCGTACGCACCTGAGGG
GATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAA
GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAG
GATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTA
TGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACC
TATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTT
ATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTA
TGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAG
GCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGC
CTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGA
AACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACA
ACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGT
ACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT
CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTA
CTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGT
AGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACC
CACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACA
GAGCAGCAGCTACGGGCAGCAGTAA
ZFA16-EWS (SEQ ID NO: 129)
Bold and Italic: NLS; Italic and underlined: 3X HA; Bold: EWS; Underlined: ZFA16
ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT
TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC
GCTTCTAGACCCGGAGAGCGCCCATTTCAGTGTCGGATCTGTATGCGGAACTT
CTCCCAGCGTGGTAACTTGTTGCGTCATCTACGTACGCACACCGGAGAGAAG
CCATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTAACTCTTATTTGCA
GTATCACCTAAAAACCCACACCGGCAGCCAGAAGCCATTCCAGTGTCGGATT
TGCATGCGGAACTTTTCGCGTCGTGCACATTTGTTGAACCATACCCGTACTCA
TACAGGTGAAAAACCCTTTCAGTGTCGGATCTGTATGCGAAATTTCTCCCAGT
CTGCACATTTGAAACGTCATCTACGTACCCACACCGGTTCCCAGAAGCCATTT
CAGTGTCGGATCTGTATGCGGAACTTCTCCCAGCGTGGTAACTTGTTGCGTCA
TCTACGTACGCACACCGGAGAGAAGCCATTCCAATGCCGAATATGCATGCGC
AACTTCAGTCGTAACTCTTATTTGCAGTATCACCTGCGTACGCACCTGAGGGG
ATCCGGAGGCGGTGGAAGCGCGTCCACGGATTACAGTACCTATAGCCAAG
CTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGG
ATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTAT
GGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCT
ATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTA
TACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTAT
GGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGG
CCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCC
TATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAA
ACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAA
CCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA
CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTC
CTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTAC
TCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTA
GCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCC
ACCCCAAACTGGTTCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAG
AGCAGCAGCTACGGGCAGCAGTAA
KRAB (SEQ ID NO: 130)
GATGCTAAGTCACTAACTGCCTGGTCCCGGACACTGGTGACCTTCAAGGATG
TATTTGTGGACTTCACCAGGGAGGAGTGGAAGCTGCTGGACACTGCTCAGCA
GATCGTGTACAGAAATGTGATGCTGGAGAACTATAAGAACCTGGTTTCCTTG
GGTTATCAGCTTACTAAGCCAGATGTGATCCTCCGGTTGGAGAAGGGAGAAG
AGCCCTGGCTGGTGGAGAGAGAAATTCACCAAGAGACCCATCCTGATTCAGA
GACTGCATTTGAAATCAAATCATCAGTT
3X HA (SEQ ID NO: 131)
TACCCATACGATGTTCCAGATTACGCTTATCCTTATGACGTACCTGACTATGC
ATACCCTTATGATGTACCAGACTACGCT
NLS (SEQ ID NO: 132)
CCCAAGAAGAAGAGGAAAGTC
EWS (SEQ ID NO: 133)
GCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACA
GTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATA
TGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACC
CAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATG
GACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCA
GCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCA
CCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCT
TATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGG
ATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTA
CAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA
CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTAC
CAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGC
AGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCA
ACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGT
TCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGC
AGCAG
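The typographic annotation (bold, italic, underline) that marks each component in the construct listings does not survive in plain text, but the component boundaries can be recovered by searching a construct for the standalone component sequences listed above. The following Python sketch is illustrative only and is not part of the disclosed methods. The function name `locate_components` and the 0-based, end-exclusive coordinates are assumptions chosen for this example. It uses the first 120 nucleotides shared by the ZFA5-EWS through ZFA16-EWS constructs together with the NLS (SEQ ID NO: 132) and 3X HA (SEQ ID NO: 131) sequences.

```python
# Component sequences as listed above.
NLS = "CCCAAGAAGAAGAGGAAAGTC"  # SEQ ID NO: 132
HA_3X = (
    "TACCCATACGATGTTCCAGATTACGCTTATCCTTATGACGTACCTGACTATGC"
    "ATACCCTTATGATGTACCAGACTACGCT"
)  # SEQ ID NO: 131

# First 120 nt shared by the ZFA5-EWS to ZFA16-EWS listings
# (start codon through the TCTAGA that precedes the zinc finger array).
CONSTRUCT_START = (
    "ATGGACCCCAAGAAGAAGAGGAAAGTCTCGAGCTACCCATACGATGTTCCAGAT"
    "TACGCTTATCCTTATGACGTACCTGACTATGCATACCCTTATGATGTACCAGACTAC"
    "GCTTCTAGA"
)


def locate_components(construct, components):
    """Return {name: (start, end)} for the first exact match of each component.

    Coordinates are 0-based and end-exclusive (hypothetical convention);
    a component that is not found maps to None.
    """
    hits = {}
    for name, seq in components.items():
        start = construct.find(seq)
        hits[name] = (start, start + len(seq)) if start >= 0 else None
    return hits


annotations = locate_components(CONSTRUCT_START, {"NLS": NLS, "3X HA": HA_3X})
for name, span in annotations.items():
    print(name, span)
```

The same search could be applied to a full construct with the EWS sequence (SEQ ID NO: 133) added to the dictionary; exact matching is used here because the component sequences appear verbatim in the listings.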
ZFA recognition helices:
QSAHLKR (SEQ ID NO: 138)
RPHHLDA (SEQ ID NO: 139)
RQDNLSW (SEQ ID NO: 140)
QPGNLQR (SEQ ID NO: 141)
RRAHLLN (SEQ ID NO: 142)
RNSYLQY (SEQ ID NO: 143)
QRGNLLR (SEQ ID NO: 144)
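Each recognition helix listed above is seven residues long and is therefore encoded by 21 nucleotides within the constructs. As an illustrative check (a sketch, not part of the disclosed methods), the Python snippet below translates 21-nucleotide stretches taken from the construct listings using the standard genetic code and compares the result with the corresponding helix. The helper name `translate` and the dictionary `HELIX_DNA` are assumptions introduced for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# 21-nt stretches found in the construct listings, keyed by the
# recognition helix each one is expected to encode.
HELIX_DNA = {
    "QSAHLKR": "CAGTCTGCACATTTGAAACGT",
    "RPHHLDA": "CGTCCACATCATTTGGACGCA",
    "RQDNLSW": "CGTCAGGACAACTTGTCTTGG",
    "QPGNLQR": "CAGCCAGGTAACTTGCAGCGT",
    "RRAHLLN": "CGTCGTGCACATTTGTTGAAC",
    "RNSYLQY": "CGTAACTCTTATTTGCAGTAT",
    "QRGNLLR": "CAGCGTGGTAACTTGTTGCGT",
}

for helix, dna in HELIX_DNA.items():
    print(helix, translate(dna) == helix)
```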
References
Aksenova, Anna Y., and Sergei M. Mirkin. 2019. “At the Beginning of the End and in the Middle of the Beginning: Structure and Maintenance of Telomeric DNA Repeats and Interstitial Telomeric Sequences.” Genes
Beerli, R. R., D. J. Segal, B. Dreier, and C. F. Barbas. 1998. “Toward Controlling Gene Expression at Will: Specific Regulation of the erbB-2/HER-2 Promoter by Using Polydactyl Zinc Finger Proteins Constructed from Modular Building Blocks.” Proceedings of the National Academy of Sciences
Boch, Jens, Heidi Scholze, Sebastian Schornack, Angelika Landgraf, Simone Hahn, Sabine Kay, Thomas Lahaye, Anja Nickstadt, and Ulla Bonas. 2009. “Breaking the Code of DNA Binding Specificity of TAL-Type III Effectors.” Science 326 (5959): 1509-12.
Boulay, Gaylor, Gabriel J. Sandoval, Nicolo Riggi, Sowmya Iyer, Remi Buisson, Beverly Naigles, Mary E. Awad, et al. 2017. “Cancer-Specific Retargeting of BAF Complexes by a Prion-like Domain.” Cell 171 (1): 163-78.e19.
Boulay, Gaylor, Angela Volorio, Sowmya Iyer, Liliane C. Broye, Ivan Stamenkovic, Nicolo Riggi, and Miguel N. Rivera. 2018. “Epigenome Editing of Microsatellite Repeats Defines Tumor-Specific Enhancer Functions and Dependencies.” Genes & Development 32 (15-16): 1008-19.
“Mechanisms of action of key genetic abnormality in Ewing sarcoma.” August 20, 2018. sciencedaily.com/releases/2018/08/180802115649.htm
Burns, Kathleen H. 2017. “Transposable Elements in Cancer.” Nature Reviews. Cancer 17 (7): 415-24.
Chong, Shasha, Claire Dugast-Darzacq, Zhe Liu, Peng Dong, Gina M. Dailey, Claudia Cattoglio, Alec Heckert, et al. 2018. “Imaging Dynamic and Selective Low-Complexity Domain Interactions That Control Gene Transcription.” Science 361 (6400): eaar2555.
Christian, Michelle L., Zachary L. Demorest, Colby G. Starker, Mark J. Osborn, Michael D. Nyquist, Yong Zhang, Daniel F. Carlson, Philip Bradley, Adam J. Bogdanove, and Daniel F. Voytas. 2012. “Targeting G with TAL Effectors: A Comparison of Activities of TALENs Constructed with NN and NK Repeat Variable Di-Residues.” PloS One 7 (9): e45383.
Delattre, O., J. Zucman, B. Plougastel, C. Desmaze, T. Melot, M. Peter, H. Kovar, I. Joubert, P. de Jong, and G. Rouleau. 1992. “Gene Fusion with an ETS DNA-Binding Domain Caused by Chromosome Translocation in Human Tumours.” Nature 359 (6391): 162-65.
Deng, D., C. Yan, X. Pan, M. Mahfouz, and J. Wang. 2012. “Structural Basis for Sequence-Specific Recognition of DNA by TAL Effectors.” science.sciencemag.org/content/335/6069/720.abstract.
Dobin, Alexander, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, and Thomas R. Gingeras. 2013. “STAR: Ultrafast Universal RNA-Seq Aligner.” Bioinformatics 29 (1): 15-21.
ENCODE Project Consortium. 2012. “An Integrated Encyclopedia of DNA Elements in the Human Genome.” Nature 489 (7414): 57-74.
Fuentes, Daniel R., Tomek Swigut, and Joanna Wysocka. 2018. “Systematic Perturbation of Retroviral LTRs Reveals Widespread Long-Range Effects on Human Gene Regulation.” eLife 7 (August): e35989.
Fu, Fengli, Jeffry D. Sander, Morgan Maeder, Stacey Thibodeau-Beganny, J. Keith Joung, Drena Dobbs, Leslie Miller, and Daniel F. Voytas. 2009. “Zinc Finger Database (ZiFDB): A Repository for Information on C2H2 Zinc Fingers and Engineered Zinc-Finger Arrays.” Nucleic Acids Research 37 (Database issue): D279-83.
Gangwal, Kunal, Savita Sankar, Peter C. Hollenhorst, Michelle Kinsey, Stephen C. Haroldsen, Atul A. Shah, Kenneth M. Boucher, et al. 2008. “Microsatellites as EWS/FLI Response Elements in Ewing’s Sarcoma.” Proceedings of the National Academy of Sciences of the United States of America 105 (29): 10149-54.
Gao, Xuefei, Jason C. H. Tsang, Fortis Gaba, Donghai Wu, Liming Lu, and Pentao Liu. 2014. “Comparison of TALE Designer Transcription Factors and the CRISPR/dCas9 in Regulation of Gene Expression by Targeting Enhancers.” Nucleic Acids Research 42 (20): e155.
Gersbach, Charles A., Thomas Gaj, and Carlos F. Barbas 3rd. 2014. “Synthetic Zinc Finger Proteins: The Advent of Targeted Gene Regulation and Genome Modification Technologies.” Accounts of Chemical Research 47 (8): 2309-18.
Gräslund, Torbjörn, Xuelin Li, Laurent Magnenat, Mikhail Popkov, and Carlos F. Barbas 3rd. 2005. “Exploring Strategies for the Design of Artificial Transcription Factors: Targeting Sites Proximal to Known Regulatory Regions for the Induction of Gamma-Globin Expression and the Treatment of Sickle Cell Disease.” The Journal of Biological Chemistry 280 (5): 3707-14.
Groner, Anna C., Sylvain Meylan, Angela Ciuffi, Nadine Zangger, Giovanna Ambrosini, Nicolas Denervaud, Philipp Bucher, and Didier Trono. 2010. “KRAB-Zinc Finger Proteins and KAP1 Can Mediate Long-Range Transcriptional Repression through Heterochromatin Spreading.” PLoS Genetics 6 (3): e1000869.
Guillon, Noelle, Franck Tirode, Valentina Boeva, Andrei Zynovyev, Emmanuel Barillot, and Olivier Delattre. 2009. “The Oncogenic EWS-FLI1 Protein Binds in Vivo GGAA Microsatellite Sequences with Potential Transcriptional Activation Function.” PloS One 4 (3): e4932.
Heinz, Sven, Christopher Benner, Nathanael Spann, Eric Bertolino, Yin C. Lin, Peter Laslo, Jason X. Cheng, Cornelis Murre, Harinder Singh, and Christopher K. Glass. 2010. “Simple Combinations of Lineage-Determining Transcription Factors Prime Cis-Regulatory Elements Required for Macrophage and B Cell Identities.” Molecular Cell 38 (4): 576-89.
Holtzman, Liad, and Charles A. Gersbach. 2018. “Editing the Epigenome: Reshaping the Genomic Landscape.” Annual Review of Genomics and Human Genetics 19 (August): 43-71.
Jachowicz, Joanna W., Xinyang Bing, Julien Pontabry, Ana Boskovic, Oliver J. Rando, and Maria-Elena Torres-Padilla. 2017. “LINE-1 Activation after Fertilization Regulates Global Chromatin Accessibility in the Early Mouse Embryo.” Nature Genetics 49 (10): 1502-10.
Joung, J. Keith, Daniel F. Voytas, and Joanne Kamens. 2015. “Accelerating Research through Reagent Repositories: The Genome Editing Example.” Genome Biology 16 (November): 255.
Lander, E. S., L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, et al. 2001. “Initial Sequencing and Analysis of the Human Genome.” Nature 409 (6822): 860-921.
Liao, Yang, Gordon K. Smyth, and Wei Shi. 2014. “featureCounts: An Efficient General Purpose Program for Assigning Sequence Reads to Genomic Features.” Bioinformatics 30 (7): 923-30.
Li, Heng, and Richard Durbin. 2009. “Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform.” Bioinformatics 25 (14): 1754-60.
Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12): 550.
Maeder, Morgan L., Samantha J. Linder, Vincent M. Cascio, Yanfang Fu, Quan H. Ho, and J. Keith Joung. 2013. “CRISPR RNA-Guided Activation of Endogenous Human Genes.” Nature Methods 10 (10): 977-79.
Maeder, Morgan L., Stacey Thibodeau-Beganny, Anna Osiak, David A. Wright, Reshma M. Anthony, Magdalena Eichtinger, Tao Jiang, et al. 2008. “Rapid ‘Open-Source’ Engineering of Customized Zinc-Finger Nucleases for Highly Efficient Gene Modification.” Molecular Cell 31 (2): 294-301.
Maeder, Morgan L., Stacey Thibodeau-Beganny, Jeffry D. Sander, Daniel F. Voytas, and J. Keith Joung. 2009. “Oligomerized Pool Engineering (OPEN): An ‘Open-Source’ Protocol for Making Customized Zinc-Finger Arrays.” Nature Protocols 4 (10): 1471-1501.
Malik, Indranil, Chase P. Kelley, Eric T. Wang, and Peter K. Todd. 2021. “Molecular Mechanisms Underlying Nucleotide Repeat Expansion Disorders.” Nature Reviews. Molecular Cell Biology 22 (9): 589-607.
Margolin, J. F., J. R. Friedman, W. K. Meyer, H. Vissing, H. J. Thiesen, and F. J. Rauscher 3rd. 1994. “Krüppel-Associated Boxes Are Potent Transcriptional Repression Domains.” Proceedings of the National Academy of Sciences of the United States of America 91 (10): 4509-13.
Mikkelsen, Tarjei S., Manching Ku, David B. Jaffe, Biju Issac, Erez Lieberman, Georgia Giannoukos, Pablo Alvarez, et al. 2007. “Genome-Wide Maps of Chromatin State in Pluripotent and Lineage-Committed Cells.” Nature 448 (7153): 553-60.
Moore, M., A. Klug, and Y. Choo. 2001. “Improved DNA Binding Specificity from Polyzinc Finger Peptides by Using Strings of Two-Finger Units.” Proceedings of the National Academy of Sciences of the United States of America 98 (4): 1437-41.
Moscou, Matthew J., and Adam J. Bogdanove. 2009. “A Simple Cipher Governs DNA Recognition by TAL Effectors.” Science 326 (5959): 1501.
Payer, Lindsay M., and Kathleen H. Burns. 2019. “Transposable Elements in Human Genetic Disease.” Nature Reviews. Genetics 20 (12): 760-72.
Pearson, Helen. 2008. “Protein Engineering: The Fate of Fingers.” Nature 455 (7210): 160-64.
Pehrsson, Erica C., Mayank N. K. Choudhary, Vasavi Sundaram, and Ting Wang. 2019. “The Epigenomic Landscape of Transposable Elements across Normal Human Development and Anatomy.” Nature Communications 10 (1): 5640.
Perez-Pinera, Pablo, D. Dewran Kocak, Christopher M. Vockley, Andrew F. Adler, Ami M. Kabadi, Lauren R. Polstein, Pratiksha I. Thakore, et al. 2013. “RNA-Guided Gene Activation by CRISPR-Cas9-Based Transcription Factors.” Nature Methods 10 (10): 973-76.
Pohl, Andy, and Miguel Beato. 2014. “Bwtool: A Tool for bigWig Files.” Bioinformatics 30 (11): 1618-19.
Qi, Lei S., Matthew H. Larson, Luke A. Gilbert, Jennifer A. Doudna, Jonathan S. Weissman, Adam P. Arkin, and Wendell A. Lim. 2013. “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression.” Cell 152 (5): 1173-83.
Quinlan, Aaron R., Royden A. Clark, Svetlana Sokolova, Mitchell L. Leibowitz, Yujun Zhang, Matthew E. Hurles, Joshua C. Mell, and Ira M. Hall. 2010. “Genome-Wide Mapping and Assembly of Structural Variant Breakpoints in the Mouse Genome.” Genome Research 20 (5): 623-35.
Riggi, Nicolo, Birgit Knoechel, Shawn M. Gillespie, Esther Rheinbay, Gaylor Boulay, Mario L. Suva, Nikki E. Rossetti, et al. 2014. “EWS-FLI1 Utilizes Divergent Chromatin Remodeling Mechanisms to Directly Activate or Repress Enhancer Elements in Ewing Sarcoma.” Cancer Cell 26 (5): 668-81.
Sander, Jeffry D., Elizabeth J. Dahlborg, Mathew J. Goodwin, Lindsay Cade, Feng Zhang, Daniel Cifuentes, Shaun J. Curtin, et al. 2011. “Selection-Free Zinc-Finger-Nuclease Engineering by Context-Dependent Assembly (CoDA).” Nature Methods 8 (1): 67-69.
Sander, Jeffry D., and J. Keith Joung. 2014. “CRISPR-Cas Systems for Editing, Regulating and Targeting Genomes.” Nature Biotechnology 32 (4): 347-55.
Sander, Jeffry D., Morgan L. Maeder, Deepak Reyon, Daniel F. Voytas, J. Keith Joung, and Drena Dobbs. 2010. “ZiFiT (Zinc Finger Targeter): An Updated Zinc Finger Engineering Tool.” Nucleic Acids Research 38 (Web Server issue): W462-68.
Sawaya, Sterling, Andrew Bagshaw, Emmanuel Buschiazzo, Pankaj Kumar, Shantanu Chowdhury, Michael A. Black, and Neil Gemmell. 2013. “Microsatellite Tandem Repeats Are Abundant in Human Promoters and Are Associated with Regulatory Elements.” PloS One 8 (2): e54710.
Scholze, Heidi, and Jens Boch. 2011. “TAL Effectors Are Remote Controls for Gene Activation.” Current Opinion in Microbiology 14 (1): 47-53.
Streubel, Jana, Christina Blucher, Angelika Landgraf, and Jens Boch. 2012. “TAL Effector RVD Specificities and Efficiencies.” Nature Biotechnology 30 (7): 593-95.
Subramanian, Subbaya, Rakesh K. Mishra, and Lalji Singh. 2003. “Genome-Wide Analysis of Microsatellite Repeats in Humans: Their Abundance and Density in Specific Genomic Regions.” Genome Biology 4 (2): R13.
Tak, Y. Esther, Benjamin P. Kleinstiver, James K. Nunez, Jonathan Y. Hsu, Joy E. Horng, Jingyi Gong, Jonathan S. Weissman, and J. Keith Joung. 2017. “Inducible and Multiplex Gene Regulation Using CRISPR-Cpf1-Based Transcription Factors.” Nature Methods 14 (12): 1163-66.
Ting, David T., Doron Lipson, Suchismita Paul, Brian W. Brannigan, Sara Akhavanfard, Erik J. Coffman, Gianmarco Contino, et al. 2011. “Aberrant Overexpression of Satellite Repeats in Pancreatic and Other Epithelial Cancers.” Science 331 (6017): 593-96.
Trost, Brett, Worrawat Engchuan, Charlotte M. Nguyen, Bhooma Thiruvahindrapuram, Egor Dolzhenko, Ian Backstrom, Mila Mirceta, et al. 2020. “Genome-Wide Detection of Tandem DNA Repeats That Are Expanded in Autism.” Nature 586 (7827): 80-86.
Usdin, Karen. 2008. “The Biological Effects of Simple Tandem Repeats: Lessons from the Repeat Expansion Diseases.” Genome Research 18 (7): 1011-19.
Wright, David A., Stacey Thibodeau-Beganny, Jeffry D. Sander, Ronnie J. Winfrey, Andrew S. Hirsh, Magdalena Eichtinger, Fengli Fu, et al. 2006. “Standardized Reagents and Protocols for Engineering Zinc Finger Nucleases by Modular Assembly.” Nature Protocols 1 (3): 1637-52.
Yarrington, Robert M., Surbhi Verma, Shaina Schwartz, Jonathan K. Trautman, and Dana Carroll. 2018. “Nucleosomes Inhibit Target Cleavage by CRISPR-Cas9 in Vivo.” Proceedings of the National Academy of Sciences of the United States of America 115 (38): 9351-58.
Zeitler, Bryan, Steven Froelich, Kimberly Marlen, David A. Shivak, Qi Yu, Davis Li, Jocelynn R. Pearl, et al. 2019. “Allele-Selective Transcriptional Repression of Mutant HTT for the Treatment of Huntington’s Disease.” Nature Medicine 25 (7): 1131-42.
Zhang, Yong, Tao Liu, Clifford A. Meyer, Jerome Eeckhoute, David S. Johnson, Bradley E. Bernstein, Chad Nusbaum, et al. 2008. “Model-Based Analysis of ChIP-Seq (MACS).” Genome Biology 9 (9): R137.
OTHER EMBODIMENTS
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

Claims

WHAT IS CLAIMED IS:
1. An engineered zinc finger array comprising 6 zinc finger recognition regions, wherein the zinc finger array binds a target sequence of GGAAGGAAGGAAGGAAGG (SEQ ID NO:4) or AAGGAAGGAAGGAAGGAA (SEQ ID NO: 5).
2. The engineered zinc finger array of claim 1, wherein the engineered zinc finger array comprises at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% sequence identity to an amino acid sequence set forth in any one of SEQ ID NOs:24-39.
3. The engineered zinc finger array of claim 1, wherein the engineered zinc finger array comprises the amino acid sequence set forth in SEQ ID NO:30.
4. An isolated cell comprising the zinc finger array according to any one of claims 1 to 3.
5. An isolated nucleic acid encoding the zinc finger array according to any one of claims 1 to 3.
6. A vector comprising the isolated nucleic acid of claim 5.
7. A fusion protein comprising the zinc finger array according to any one of claims 1 to 3 fused to a heterologous functional domain, with an optional intervening linker, wherein the linker does not interfere with activity of the fusion protein.
8. The fusion protein of claim 7, wherein the heterologous functional domain is a transcriptional silencer or transcriptional repression domain.
9. The fusion protein of claim 8, wherein the transcriptional repression domain is a Krueppel-associated box (KRAB) domain, ERF repressor domain (ERD), or mSin3A interaction domain (SID).
10. The fusion protein of claim 8, wherein the transcriptional silencer is Heterochromatin Protein 1 (HP1).
11. An isolated cell comprising the fusion protein according to any one of claims 7 to 10.
12. An isolated nucleic acid encoding the fusion protein according to any one of claims 7 to 10.
13. A vector comprising the isolated nucleic acid of claim 12.
14. A method of reducing aberrant gene expression driven by activation of GGAA-microsatellites in a cell, the method comprising contacting the cell with an effective amount of the fusion protein of claims 7-10, or the isolated nucleic acid of claim 12.
15. A method of treating a subject who has a disease associated with aberrant gene expression driven by activation of GGAA-microsatellites in a cell, the method comprising administering to the subject an effective amount of a composition comprising the fusion protein of claims 7-10, or the isolated nucleic acid of claim 12.
16. The method of claims 14 or 15, wherein the subject has Ewing sarcoma.
17. The method of claim 15, wherein the composition is administered by injection into or near a tumor, or by application after surgical resection.
18. The method of claim 15, wherein the composition is administered by injection into or near a tumor, or by application before surgical resection.
19. The method of any one of claims 15-18, further comprising treating a subject with one or more chemotherapy agents.
20. The method of claim 19, wherein the chemotherapy is one of vincristine, doxorubicin, cyclophosphamide, ifosfamide, etoposide, or a combination thereof.
21. The method of any one of claims 15-20, wherein the composition is administered before radiation.
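As an illustrative aside that is not part of the claims, the two 18-nucleotide target sequences recited in claim 1 can be located in a DNA sequence with a simple exact-match scan of both strands. The sketch below is only an assumption about how one might do this; the function names and the example sequence are hypothetical and do not come from the application. It does not model zinc finger binding affinity or mismatch tolerance.

```python
# Illustrative sketch only: names and the example sequence are hypothetical.
# Target sequences are copied from claim 1 (SEQ ID NO:4 and SEQ ID NO:5).
TARGETS = {
    "SEQ ID NO:4": "GGAAGGAAGGAAGGAAGG",
    "SEQ ID NO:5": "AAGGAAGGAAGGAAGGAA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_target_sites(sequence):
    """Return sorted (start, strand, label) tuples for every exact match.

    Start positions are 0-based on the given sequence. Overlapping matches
    are reported. A "-" strand hit means the reverse complement of the
    target occurs at that position.
    """
    seq = sequence.upper()
    hits = []
    for label, target in TARGETS.items():
        for strand, query in (("+", target), ("-", reverse_complement(target))):
            start = seq.find(query)
            while start != -1:
                hits.append((start, strand, label))
                start = seq.find(query, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical example: five GGAA repeats flanked by TTT on each side.
example = "TTT" + "GGAA" * 5 + "TTT"
for hit in find_target_sites(example):
    print(hit)
```

In this hypothetical example the two targets overlap within the same repeat tract, offset by two nucleotides, which is consistent with both sequences being registers of the same GGAA repeat.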
PCT/US2023/017257 2022-04-04 2023-04-03 Method for genome-wide functional perturbation of human microsatellites using engineered zinc fingers WO2023196220A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263327175P 2022-04-04 2022-04-04
US63/327,175 2022-04-04

Publications (3)

Publication Number Publication Date
WO2023196220A2 WO2023196220A2 (en) 2023-10-12
WO2023196220A3 WO2023196220A3 (en) 2023-11-23
WO2023196220A9 WO2023196220A9 (en) 2024-02-01

Family

ID=88243374

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/017257 WO2023196220A2 (en) 2022-04-04 2023-04-03 Method for genome-wide functional perturbation of human microsatellites using engineered zinc fingers

Country Status (1)

Country Link
WO (1) WO2023196220A2 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10465187B2 (en) * 2017-02-06 2019-11-05 Trustees Of Boston University Integrated system for programmable DNA methylation
US20220193210A1 (en) * 2018-02-02 2022-06-23 Danmarks Tekniske Universitet Therapeutics for autoimmune kidney disease: synthetic antigens
US11041155B2 (en) * 2018-05-17 2021-06-22 The General Hospital Corporation CCCTC-binding factor variants

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23785217

Country of ref document: EP

Kind code of ref document: A2