US20210102213A1 - CCCTC-Binding Factor Variants - Google Patents

CCCTC-Binding Factor Variants

Publication number
US20210102213A1
Authority
US
United States
Prior art keywords
seq
cbs
ctcf
amino acid
acid sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/118,378
Inventor
Rebecca Tayler COTTMAN
J. Keith Joung
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
General Hospital Corp
Original Assignee
General Hospital Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by General Hospital Corp
Priority to US17/118,378
Publication of US20210102213A1
Assigned to THE GENERAL HOSPITAL CORPORATION. Assignment of assignors interest (see document for details). Assignors: JOUNG, J. KEITH; COTTMAN, Rebecca Tayler

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/1135 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing against oncogenes or tumor suppressor genes
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/46 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
    • C07K14/47 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
    • C07K14/4701 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals not used
    • C07K14/4702 Regulators; Modulating activity
    • C07K14/4703 Inhibitors; Suppressors
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1044 Preparation or screening of libraries displayed on scaffold proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/40 Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation
    • C07K2319/43 Fusion polypeptide containing a tag for immunodetection, or an epitope for immunisation containing a FLAG-tag
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/80 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor
    • C07K2319/81 Fusion polypeptide containing a DNA binding domain, e.g. LacI or Tet-repressor containing a Zn-finger domain for DNA binding

Definitions

  • the invention relates, at least in part, to engineered CCCTC-binding factor variants with altered DNA-binding specificities.
  • CCCTC-binding factor is a multi-domain protein that acts as an essential genome organizer by maintaining higher-order chromatin structure while also having a role in cell differentiation and the promotion or repression of gene expression (Ong and Corces, Nature Reviews Genetics (2014); Phillips and Corces, Cell (2009)).
  • CTCF maintains topologically associated domains (TADs) spanning megabases of the genome, as well as smaller-scale sub-TADs, leading to fine-tuned gene insulation or gene activation within gene clusters (Ali et al., Current Opinion in Genetics & Development (2016); Nora et al., Nature (2012); Rao et al., Cell (2014)).
  • CTCF has been found to regulate mRNA splicing by influencing the rate of transcription and has more recently been implicated in promoting homologous recombination repair at double-strand breaks (Shukla et al., Nature (2011); Hilmi et al., Science Advances (2017); Han et al., Scientific Reports (2016)).
  • CTCF binds throughout the genome via an 11-finger zinc finger (ZF) array that recognizes CTCF binding sites (CBSs).
  • the CBS is typically 40 bp in length with a highly conserved 15 bp core sequence.
  • the present invention is based, at least in part, on the development of engineered CTCF variants that can bind to mutant CBSs with higher affinity than a wild-type CTCF.
  • the present invention relates to an engineered CCCTC-binding factor (CTCF) variant including at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF, where the engineered CTCF variant binds to a mutant CTCF binding sequence (CBS) with a higher affinity than wild-type CTCF, the mutant CBS including at least one nucleotide base that differs in sequence from the nucleotide sequence of a consensus CBS, where the at least one amino acid residue that differs in sequence from the amino acid sequence of a wild-type CTCF is selected from the group consisting of the amino acid residues at the position(s) −1, +1, +2, +3, +5, and +6 of any of ZF7, ZF6, ZF5, ZF4, and ZF3 of the engineered CTCF variant.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CTCF binding sequence (CBS) that has a Thymine (T), Adenine (A), or Guanine (G) residue at position 2 of the consensus CBS motif, the engineered CTCF including an amino acid residue threonine, asparagine, or histidine at ZF7 position +3.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a G residue at position 2 of the consensus CBS motif, the engineered CTCF including the amino acid sequence DHLQT (SEQ ID NO: 8), EHLNV (SEQ ID NO: 9), AHLQV (SEQ ID NO: 10), EHLRE (SEQ ID NO: 11), DHLQV (SEQ ID NO: 12), EHLKV (SEQ ID NO: 13), EHLVV (SEQ ID NO: 15), DHLRT (SEQ ID NO: 16), or DHLAT (SEQ ID NO: 17) at ZF7 positions +2 to +6.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, G, or C residue at position 3 of the consensus CBS motif, the engineered CTCF at ZF7 positions −1 to +3 including: the amino acid sequence RKHD (SEQ ID NO: 173) or RRSD (SEQ ID NO: 174), where the mutant CBS has a T residue at position 3 of the consensus CBS motif; the amino acid sequence RKAD (SEQ ID NO: 175), IPRI (SEQ ID NO: 176), RKHD (SEQ ID NO: 173), or RKDD (SEQ ID NO: 177), where the mutant CBS has a G residue at position 3 of the consensus CBS motif; or the amino acid sequence GIVN (SEQ ID NO: 178), ELLN (SEQ ID NO: 179), QALL (SEQ ID NO: 180) or PHRM (SEQ ID NO: 181), where the mutant CBS has a C residue at position 3 of the consensus CBS motif
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, G, or A residue at position 5 of the consensus CBS motif, the engineered CTCF at ZF6 positions +2 to +6 including: the amino acid sequence NAMKR (SEQ ID NO: 30), GNMAR (SEQ ID NO: 182), EGMTR (SEQ ID NO: 183), SNMVR (SEQ ID NO: 184), or NAMRG (SEQ ID NO: 185), where the mutant CBS has a T residue at position 5 of the consensus CBS motif; or the amino acid sequence EHMGR (SEQ ID NO: 31), DHMNR (SEQ ID NO: 32), THMKR (SEQ ID NO: 33), EHMRR (SEQ ID NO: 34), or THMNR (SEQ ID NO: 35), where the mutant CBS has a G residue at position 5 of the consensus CBS motif.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, G, or C residue at position 6 of the consensus CBS motif, the engineered CTCF at ZF6 positions −1 to +3 including: the amino acid sequence MNES (SEQ ID NO: 36) or HRES (SEQ ID NO: 37), where the mutant CBS has a T residue at position 6 of the consensus CBS motif; or the amino acid sequence RPDT (SEQ ID NO: 38), RTDI (SEQ ID NO: 39), or RHDT (SEQ ID NO: 40), where the mutant CBS has a G residue at position 6 of the consensus CBS motif.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a C, A, or T residue at position 7 of the consensus CBS motif, the engineered CTCF at ZF5 positions +2 to +6 including: the amino acid sequence HGLKV (SEQ ID NO: 41), HRLKE (SEQ ID NO: 42), HALKV (SEQ ID NO: 43), SRLKE (SEQ ID NO: 44), or DGLRV (SEQ ID NO: 45), where the mutant CBS has a T residue at position 7 of the consensus CBS motif; the amino acid sequence HTLKV (SEQ ID NO: 46), or HGLKV (SEQ ID NO: 41), where the mutant CBS has an A residue at position 7 of the consensus CBS motif; or the amino acid sequence SRLKE (SEQ ID NO: 44), HRLKE (SEQ ID NO: 42) or NRLKE (SEQ ID NO: 47), where the mutant CBS has a C residue at position 7 of the consensus CBS motif.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a C, A, or T residue at position 8 of the consensus CBS motif, the engineered CTCF at ZF5 positions +2 to +6 including: the amino acid sequence ATLKR (SEQ ID NO: 48), QALRR (SEQ ID NO: 49), GGLVR (SEQ ID NO: 50), or HGLIR (SEQ ID NO: 51), where the mutant CBS has a T residue at position 8 of the consensus CBS motif; the amino acid sequence ANLSR (SEQ ID NO: 52), TGLTR (SEQ ID NO: 53), HGLVR (SEQ ID NO: 54), or GGLTR (SEQ ID NO: 55), where the mutant CBS has an A residue at position 8 of the consensus CBS motif; the amino acid sequence HTLRR (SEQ ID NO: 56), TVLKR (SEQ ID NO: 57), ADLKR (SEQ ID NO: 58), or HGLRR (SEQ ID NO: 59), where
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, A, or C residue at position 10 of the consensus CBS motif, the engineered CTCF at ZF4 positions +2 to +6 including: the amino acid sequence AHLRK (SEQ ID NO: 60), wherein the mutant CBS has a T residue at position 10 of the consensus CBS motif; the amino acid sequence AKLRV (SEQ ID NO: 61), EKLRI (SEQ ID NO: 186), or AKLRI (SEQ ID NO: 63), where the mutant CBS has an A residue at position 10 of the consensus CBS motif; or the amino acid sequence TKLKV (SEQ ID NO: 64), wherein the mutant CBS has a C residue at position 10 of the consensus CBS motif.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, A, or C residue at position 11 of the consensus CBS motif, the engineered CTCF at ZF4 positions +2 to +6 including: the amino acid sequence ATLRR (SEQ ID NO: 66) or RRLDR (SEQ ID NO: 67), where the mutant CBS has a T residue at position 11 of the consensus CBS motif; the amino acid sequence TNLRR (SEQ ID NO: 68), ANLRR (SEQ ID NO: 69), or GNLTR (SEQ ID NO: 70), where the mutant CBS has an A residue at position 11 of the consensus CBS motif; or the amino acid sequence AMLKR (SEQ ID NO: 71), HMLTR (SEQ ID NO: 72), AMLRR (SEQ ID NO: 73), or TMLRR (SEQ ID NO: 74), where the mutant CBS has a C residue at position 11 of the consensus CBS motif.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, A, or C residue at position 13 of the consensus CBS motif, the engineered CTCF at ZF3 positions +2 to +6 including: the amino acid sequence QQLIV (SEQ ID NO: 75), SQLIV (SEQ ID NO: 76), QQLLV (SEQ ID NO: 77), GELVV (SEQ ID NO: 78), or QQLLI (SEQ ID NO: 79), where the mutant CBS has a T residue at position 13 of the consensus CBS motif; the amino acid sequence GQLIV (SEQ ID NO: 80), GQLTV (SEQ ID NO: 81), GKLVT (SEQ ID NO: 187), TELII (SEQ ID NO: 82) or QGLLV (SEQ ID NO: 83), where the mutant CBS has an A residue at position 13 of the consensus CBS motif; or the amino acid sequence QQLLT (SEQ ID NO: 84
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has A, G, T, and T residues at positions 2, 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence AKLKK (SEQ ID NO: 88), AKLRK (SEQ ID NO: 89), AHLRV (SEQ ID NO: 90), AKLRV (SEQ ID NO: 61), or SKLRL (SEQ ID NO: 92) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence ERLRV (SEQ ID NO: 93), NRLKV (SEQ ID NO: 94), SRLKE (SEQ ID NO: 44), or NRLKV (SEQ ID NO: 94) at ZF5 positions +2 to +6 of the engineered CTCF; (iii) the amino acid sequence RPDT (SEQ ID NO: 38), RTET (SEQ ID NO:
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has G, G, T, and T residues at positions 2, 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLRK (SEQ ID NO: 60), or GKLRI (SEQ ID NO: 106) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence SRLKE (SEQ ID NO: 44), DALRR (SEQ ID NO: 108), DGLKR (SEQ ID NO: 109), or TRLRE (SEQ ID NO: 110) at ZF5 positions +2 to +6 of the engineered CTCF; (iii) the amino acid sequence RPDTMKR (SEQ ID NO: 188) or RTENMKM (SEQ ID NO: 189) at ZF6 positions −1 to +6 of the engineered CTCF; and (i) the amino
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has A, G, and A residues at positions 2, 5, and 11 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence SNLRR (SEQ ID NO: 116), GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLKR (SEQ ID NO: 119), ANLRR (SEQ ID NO: 69), NNLRR (SEQ ID NO: 121), or TNLRR (SEQ ID NO: 68) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), THMKR (SEQ ID NO: 33), EHMNR (SEQ ID NO: 126), or EHMAR (SEQ ID NO: 127) at ZF6 positions +2 to +6 of the engineered CTCF;
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has G, G, and A residues at positions 2, 5, and 11 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLAR (SEQ ID NO: 138), GNLMR (SEQ ID NO: 139), ANLRR (SEQ ID NO: 69), SNLRR (SEQ ID NO: 116), or NNLRR (SEQ ID NO: 121) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence EHMNR (SEQ ID NO: 126), EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), SHMNR (SEQ ID NO: 146), SHMRR (SEQ ID NO: 147), THMKR (SEQ ID NO: 33), or DHIVI
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has G, T, and T residues at positions 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLKK (SEQ ID NO: 159), TKLRL (SEQ ID NO: 160), TKLKL (SEQ ID NO: 161), GHLRK (SEQ ID NO: 162), THLKK (SEQ ID NO: 163), or AHLRK (SEQ ID NO: 60) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence TRLKE (SEQ ID NO: 165) or SRLKE (SEQ ID NO: 44) at ZF5 positions +2 to +6 of the engineered CTCF; and (iii) the amino acid sequence RADN (SEQ ID NO: 167), RHDT (SEQ ID NO: 40),
  • the engineered CTCF variant includes at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF, where the engineered CTCF variant binds to a mutant CTCF binding sequence (CBS) with a higher affinity than wild-type CTCF, the mutant CBS including at least one nucleotide base that differs in sequence from the nucleotide sequence of a consensus CBS, where the at least one amino acid residue that differs in sequence from the amino acid sequence of a wild-type CTCF is selected from the group consisting of the amino acid residues at the position(s) −1, +1, +2, +3, +5, and +6 of any of ZF7, ZF6, ZF5, ZF4, and ZF3 of the engineered CTCF variant.
  • the engineered CCCTC-binding factor (CTCF) variant that binds with a higher affinity than a wild-type CTCF to a mutant CTCF binding sequence (CBS) that differs from a consensus CBS at position 2 of the consensus CBS motif, the engineered CTCF including an amino acid residue threonine, asparagine, or histidine at ZF7 +3 position.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a C-to-G mutation at position 2 of the consensus CBS motif, the engineered CTCF including the amino acid sequence DHLQT (SEQ ID NO: 8), EHLNV (SEQ ID NO: 9), AHLQV (SEQ ID NO: 10), EHLRE (SEQ ID NO: 11), DHLQV (SEQ ID NO: 12), EHLKV (SEQ ID NO: 13), DHLQV (SEQ ID NO: 12), EHLVV (SEQ ID NO: 15), DHLRT (SEQ ID NO: 16), DHLAT (SEQ ID NO: 17), or DHLQT (SEQ ID NO: 8) at ZF7 positions +2 to +6.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 3 of the consensus CBS motif, the engineered CTCF including the amino acid sequence RKHD (SEQ ID NO: 173), RRSD (SEQ ID NO: 174), GIVN (SEQ ID NO: 178), ELLN (SEQ ID NO: 179), or PHRM (SEQ ID NO: 181) at ZF7 positions −1 to +3.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 5 of the consensus CBS motif, the engineered CTCF including the amino acid sequence NAMKR (SEQ ID NO: 30), EHMGR (SEQ ID NO: 31), DHMNR (SEQ ID NO: 32), THMKR (SEQ ID NO: 33), EHMRR (SEQ ID NO: 34), or THMNR (SEQ ID NO: 35) at ZF6 positions +2 to +6.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 6 of the consensus CBS motif, the engineered CTCF including the amino acid sequence MNES (SEQ ID NO: 36), HRES (SEQ ID NO: 37), RPDT (SEQ ID NO: 38), RTDI (SEQ ID NO: 39), or RHDT (SEQ ID NO: 40) at ZF6 positions −1 to +3.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 7 of the consensus CBS motif, the engineered CTCF including the amino acid sequence HGLKV (SEQ ID NO: 41), HRLKE (SEQ ID NO: 42), HALKV (SEQ ID NO: 43), SRLKE (SEQ ID NO: 44), DGLRV (SEQ ID NO: 45), HTLKV (SEQ ID NO: 46), or NRLKE (SEQ ID NO: 47) at ZF5 positions +2 to +6.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 8 of the consensus CBS motif, the engineered CTCF including the amino acid sequence ATLKR (SEQ ID NO: 48), QALRR (SEQ ID NO: 49), GGLVR (SEQ ID NO: 50), HGLIR (SEQ ID NO: 51), ANLSR (SEQ ID NO: 52), TGLTR (SEQ ID NO: 53), HGLVR (SEQ ID NO: 54), GGLTR (SEQ ID NO: 55), HTLRR (SEQ ID NO: 56), TVLKR (SEQ ID NO: 57), ADLKR (SEQ ID NO: 58), or HGLRR (SEQ ID NO: 59) at ZF5 positions +2 to +6.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 10 of the consensus CBS motif, the engineered CTCF including the amino acid sequence AHLRK (SEQ ID NO: 60), AKLRV (SEQ ID NO: 61), GGLGL (SEQ ID NO: 62), AKLRI (SEQ ID NO: 63), TKLKV (SEQ ID NO: 64), or SKLRV (SEQ ID NO: 65) at ZF4 positions +2 to +6.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 11 of the consensus CBS motif, the engineered CTCF including the amino acid sequence ATLRR (SEQ ID NO: 66), RRLDR (SEQ ID NO: 67), TNLRR (SEQ ID NO: 68), ANLRR (SEQ ID NO: 69), GNLTR (SEQ ID NO: 70), AMLKR (SEQ ID NO: 71), HMLTR (SEQ ID NO: 72), AMLRR (SEQ ID NO: 73), or TMLRR (SEQ ID NO: 74) at ZF4 positions +2 to +6.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 13 of the consensus CBS motif, the engineered CTCF including the amino acid sequence QQLIV (SEQ ID NO: 75), SQLIV (SEQ ID NO: 76), QQLLV (SEQ ID NO: 77), GELVV (SEQ ID NO: 78), QQLLI (SEQ ID NO: 79), GQLIV (SEQ ID NO: 80), GQLTV (SEQ ID NO: 81), TELII (SEQ ID NO: 82), QGLLV (SEQ ID NO: 83), QQLLT (SEQ ID NO: 84), GQLLT (SEQ ID NO: 85), GELLT (SEQ ID NO: 86), or QQLLI (SEQ ID NO: 79) at ZF3 positions +2 to +6.
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 6, 7, and 10 of the consensus CBS motif, the engineered CTCF including:
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 5, and 11 of the consensus CBS motif, the engineered CTCF including:
  • the amino acid sequence EHMNR (SEQ ID NO: 126), EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), SHMNR (SEQ ID NO: 146), SHMRR (SEQ ID NO: 147), THMKR (SEQ ID NO: 33), or DHMNR (SEQ ID NO: 32) at ZF6 positions +2 to +6;
  • the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 6, 7, and 10 of the consensus CBS motif, the engineered CTCF including:
  • the engineered CTCF variant interacts with cohesin to mediate the formation of an enhancer-promoter loop to modulate gene expression.
  • the invention features a method of treating a subject in need thereof, the method including administering to the subject a therapeutically effective amount of an engineered CTCF variant described herein.
  • the subject can have cancer.
  • the invention features a method of activating or repressing expression of a gene which is under the control of a CBS bearing one or more mutations, the method including contacting the engineered CTCF according to any one of claims 1-15 with a sequence of interest in the gene, such that the expression of the gene is regulated.
  • the invention features a pharmaceutical composition including an engineered CTCF variant described herein.
  • the invention features a gene expression system for regulation of a gene, the system including a nucleic acid encoding an engineered CTCF variant described herein.
  • the invention features a method of altering the structure of chromatin including contacting an engineered CTCF variant according to any one of claims 1-15 with a sequence of interest to form a binding complex, such that the structure of the chromatin is altered.
  • the invention features a method of activating or repressing expression of a gene which is under the control of a CBS bearing one or more mutations, the method including contacting the CBS bearing one or more mutations with an engineered CTCF variant described herein.
  • the invention features a kit including an engineered CTCF variant described herein.
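The single-substitution embodiments listed earlier each pair a position in the consensus CBS motif and a substituted base with a zinc finger and a set of candidate helix sequences. A minimal Python sketch of that pairing as a lookup table follows. Only positions 6, 10, and 11 are transcribed, and the names `HelixOptions`, `VARIANTS`, and `candidate_helices` are hypothetical, chosen for illustration rather than taken from the application.

```python
from typing import NamedTuple, Optional, Tuple


class HelixOptions(NamedTuple):
    finger: str                 # zinc finger that carries the substitutions
    window: str                 # helix positions occupied by the listed residues
    sequences: Tuple[str, ...]  # candidate residue strings quoted in the text


# Keys are (position in the consensus CBS motif, base present in the mutant CBS).
# Illustrative subset only: positions 6, 10 and 11 as quoted in the text.
VARIANTS = {
    (6, "T"): HelixOptions("ZF6", "-1 to +3", ("MNES", "HRES")),
    (6, "G"): HelixOptions("ZF6", "-1 to +3", ("RPDT", "RTDI", "RHDT")),
    (10, "T"): HelixOptions("ZF4", "+2 to +6", ("AHLRK",)),
    (10, "A"): HelixOptions("ZF4", "+2 to +6", ("AKLRV", "EKLRI", "AKLRI")),
    (10, "C"): HelixOptions("ZF4", "+2 to +6", ("TKLKV",)),
    (11, "T"): HelixOptions("ZF4", "+2 to +6", ("ATLRR", "RRLDR")),
    (11, "A"): HelixOptions("ZF4", "+2 to +6", ("TNLRR", "ANLRR", "GNLTR")),
    (11, "C"): HelixOptions("ZF4", "+2 to +6", ("AMLKR", "HMLTR", "AMLRR", "TMLRR")),
}


def candidate_helices(position: int, base: str) -> Optional[HelixOptions]:
    """Return the listed helix options for a mutant base, or None if not transcribed."""
    return VARIANTS.get((position, base.upper()))


print(candidate_helices(10, "a"))
print(candidate_helices(2, "G"))  # None: position 2 is not part of this subset
```

A tuple key keeps the position and the base together, so a substitution that is absent from the table returns `None` instead of silently matching a different base.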
  • FIG. 1 Diagram of the protein-DNA interactions of an exemplary 11-finger CTCF zinc finger array at the CTCF binding site.
  • Each zinc finger of the 11-finger array contains a recognition alpha-helix in which protein-DNA base contacts are made by the amino acids at positions −1, 2, 3, and 6 of each alpha-helix.
  • Only positions −1, 3, and 6 are depicted, because position 2 makes a cross-strand contact with the opposite strand of the binding site, which is not shown here.
  • The sequence for the binding site was derived from ChIP-seq data (Nakahashi et al., 2013). The binding site was partitioned into three segments: 5′ flanking (gray line), core (black line), and 3′ flanking (light gray line).
  • The nucleotides within each segment are numbered. Dashes indicate known DNA-protein contacts (black) and theoretical DNA-protein contacts (gray) between the zinc finger array and the binding site. Zinc fingers 3-7 of the array (white) make protein-DNA contacts with the core sequence (bold, black line). There is a possible 5-6 base pair gap (represented by horizontal dashed lines) between zinc finger 8 and zinc fingers 9-11, as suggested by ChIP-exo and DNase I footprinting of CTCF-bound DNA fragments (Hashimoto, H. et al., 2017). Note that CTCF binds to its target site in the 3′-5′ direction, with the N-terminal side of the protein binding to the 3′ end of the binding site. FIG. 1 discloses SEQ ID NO: 5544.
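The helix numbering used throughout (positions −1 and +1 to +6, with no position 0) determines which residues a window such as "−1 to +3" or "+2 to +6" refers to. The sketch below shows that indexing in Python, assuming a 7-residue string that covers positions −1 to +6; the function name `helix_window` is hypothetical. The example string RPDTMKR is the one the text lists for ZF6 positions −1 to +6 (SEQ ID NO: 188).

```python
# Helix positions as numbered in the text: -1, +1, +2, +3, +4, +5, +6.
# There is no position 0, so position p does not map to string index p.
HELIX_POSITIONS = (-1, 1, 2, 3, 4, 5, 6)


def helix_window(helix: str, start: int, end: int) -> str:
    """Return the residues from position start to position end, inclusive."""
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError("expected a 7-residue helix covering positions -1 to +6")
    first = HELIX_POSITIONS.index(start)  # raises ValueError for an invalid position
    last = HELIX_POSITIONS.index(end)
    if first > last:
        raise ValueError("start position must not come after end position")
    return helix[first:last + 1]


helix = "RPDTMKR"  # ZF6 positions -1 to +6, as quoted in the text
print(helix_window(helix, -1, 3))  # RPDT
print(helix_window(helix, 2, 6))   # DTMKR
```

The first window reproduces RPDT, which the text lists separately for ZF6 positions −1 to +3 (SEQ ID NO: 38), so the two window conventions are consistent with each other.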
  • FIG. 2 Diagram of the bacterial two-hybrid (B2H) β-galactosidase reporter assay.
  • The B2H reporter assay used Gal11P-mediated recruitment of Gal4 to indicate binding.
  • E. coli was transformed with two plasmids: one plasmid encoded both a zinc finger-Gal11P fusion and an alpha N-terminal domain of RNA polymerase (α-NTD)-Gal4 fusion; the second plasmid contained a modifiable binding sequence upstream of a weak promoter that drives expression of the lacZ gene, which encodes β-galactosidase.
  • A zinc finger-Gal11P fusion that was able to bind to the target sequence recruited the α-NTD-Gal4 fusion to the promoter, thereby inducing the expression of lacZ.
  • This increase in β-galactosidase levels was detected by a simple colorimetric ONPG-based assay.
  • In this diagram, the CTCF zinc finger array-Gal11P fusion is bound to a CTCF binding site, recruiting the α-NTD-Gal4 fusion to the promoter region upstream of lacZ and leading to expression.
  • FIG. 3 Fold activation in the B2H β-galactosidase assay was greatest when zinc fingers 1-11 of the 11-finger CTCF array interacted with the full-length target site.
  • Five target sites (sequences indicated in the legend) were tested with the full CTCF zinc finger array and four different subsets of it (indicated on the x-axis).
  • The core sequence (black and bold), which is the most highly conserved sequence of CTCF binding sites, was tested independently and with different amounts of flanking sequence derived from Hashimoto, H. et al., Mol. Cell 2017 (black and light gray); Persikov, A. and Singh, M., NAR 2014 (medium gray); and Nakahashi, H. et al., Cell Rep. 2013 (very light gray and dark gray).
  • The positive control reflects the binding activity of a known 3-finger zinc finger array that binds strongly to a known sequence in bacterial and human contexts.
  • The negative control reflects baseline β-galactosidase levels when the alpha N-terminal domain of RNA polymerase (α-NTD)-Gal4 fusion is not directly recruited to the promoter of lacZ. This baseline was used to calculate fold activation when the CTCF zinc finger array is fused to Gal11P.
  • FIG. 3 discloses SEQ ID NOS 5545-5548 and 5544, respectively, in order of appearance.
  • FIG. 4 CTCF zinc finger array is sensitive to sequence changes at certain positions of the core region within the CTCF binding site.
  • Each of the four possible nucleotides at each position of the 40 bp reference CBS was tested for the ability to bind the CTCF zinc finger array in the B2H assay.
  • Fold activation reflects binding activity above background β-galactosidase levels (background β-gal levels are obtained from the levels of β-gal from samples with each binding site in the presence of the Gal4-RNA polymerase fusion with no zinc finger array fused to Gal11P).
  • The reference sequence above is partitioned into three segments: 5′ flanking (dark gray lined), core (black lined), and 3′ flanking (gray lined).
  • The nucleotides within each segment are numbered. Dashes indicate known DNA-protein contacts (black) and theoretical DNA-protein contacts (gray) between the zinc finger array and the binding site. Core sequence positions 1-15 of the binding site (black, bold) interact with zinc fingers 3-7 of the array (white, black outline) and appear to be most sensitive to changes in the binding sequence. Alterations to the 5′ flanking sequence as well as the 3′ flanking sequence did not negatively impact binding.
  • FIG. 4 discloses SEQ ID NO: 5544.
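  • The scan described for FIG. 4, in which every position is tested with each possible nucleotide, can be sketched by enumerating all single-nucleotide substitutions of a reference sequence. As an assumption made for brevity, the example uses the 15 bp consensus core (SEQ ID NO: 6) rather than the full 40 bp reference CBS; the function name is hypothetical.

```python
CORE = "CCAGCAGGGGGCGCT"  # 15 bp consensus core motif (SEQ ID NO: 6)


def single_nucleotide_variants(reference):
    """Yield (position, ref_base, new_base, sequence); positions are 1-based."""
    for i, ref_base in enumerate(reference):
        for new_base in "ACGT":
            if new_base != ref_base:
                yield (i + 1, ref_base, new_base,
                       reference[:i] + new_base + reference[i + 1:])


variants = list(single_nucleotide_variants(CORE))
print(len(variants))   # three substitutions per position
print(variants[3])     # first substitution at core position 2
```

  • Together with the unchanged reference, this gives all four nucleotides at each position, which is the set of binding sites such a scan would need to cover.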
  • FIG. 5 Maximizing binding potential of the CTCF binding site. Modifications were made to the reference binding site (bottom) to combine nucleotide changes that, individually, showed increased binding activity of the CTCF zinc finger array. The core sequence motif is bold while changes made are underlined. Binding activity of the 11-finger CTCF zinc finger array was quantified in the B2H Beta-galactosidase reporter assay in triplicate. Fold activation reflects binding activity above background levels when no DNA binding protein is present.
  • FIG. 5 discloses SEQ ID NOS 5549-5550 and 5544, respectively, in order of appearance.
  • FIG. 6 Diagram of B2H Beta-lactamase inhibitor selection.
  • The selection system contained the same components as the reporter system, except that successful binding of the zinc finger array to the CBS drove expression of BlaC, a beta-lactamase that inactivates beta-lactam antibiotics, instead of lacZ. Expression of BlaC allowed for growth on carbenicillin plates. The selection was driven by the addition of clavulanic acid, a beta-lactamase inhibitor. Low-level expression of BlaC can result in growth on carbenicillin plates, but the addition of clavulanic acid inhibits BlaC activity, which depletes false positives and further enriches strong binders to any modification made to the binding site.
  • FIGS. 7A-7C Binding activity of variants on altered CTCF binding sites. Variants picked from the high stringency gradient of the selective plates were tested for binding activity on sequences representing all four possible nucleotides at position 2 of the core sequence (gray star). Amino acid sequences of variants pulled out of the selection are listed above the heat map, and the nucleotide present at position 2 of the core sequence is indicated on the y-axis.
  • FIG. 7A The nucleotide at position 2 is T.
  • FIG. 7B The nucleotide at position 2 is A.
  • FIG. 7C The nucleotide at Binding was quantified by the beta-galactosidase reporter system and colorimetric ONPG assay.
  • FIGS. 7A-C disclose “RKSXLGV” as SEQ ID NO: 5551.
  • FIG. 8 Increasing the variation within the recognition helix produced stronger binders.
  • Four amino acids were targeted for variance in the library to allow for more flexibility in the selection and generate stronger binders to the modified binding site of choice.
  • ZF7 targeting a C:G change at position 2 (gray star) of the core sequence was selected for variants using the expanded approach.
  • Each amino acid codon was replaced with ‘VNS’ codons at the indicated sites (‘X’). Twelve colonies were picked from the high-stringency end of the selection and tested for their ability to bind to the CTCF binding site when the indicated nucleotide is at position 2 of the core sequence.
  • Amino acid sequences of the variants selected are listed on the x-axis and the nucleotide at position two of the core sequence is on the y-axis. Wild-type zinc finger array binding activity on wild-type binding sequence is indicated by the white dot.
  • FIG. 8 discloses “RKSXLGV” as SEQ ID NO: 5551, “AHLQV” as SEQ ID NO: 10, “DHLRT” as SEQ ID NO: 16, “DHLAT” as SEQ ID NO: 17, “DHLQT” as SEQ ID NO: 8, “DHLQV” as SEQ ID NO: 12, “SDLGV” as SEQ ID NO: 5552, “EHLKV” as SEQ ID NO: 13, “EHLVV” as SEQ ID NO: 15, “EHLNV” as SEQ ID NO: 9 and “EHLRE” as SEQ ID NO: 11.
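  • To make the ‘VNS’ randomization used for FIG. 8 concrete, the following sketch expands the degenerate codon with the standard IUPAC codes (V = A/C/G, N = any base, S = C/G) and translates it with the standard genetic code. The helper names are hypothetical, and the library-size figure assumes four independently randomized positions, as stated for this library.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate bases needed for a VNS codon.
IUPAC = {"V": "ACG", "N": "ACGT", "S": "CG"}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


vns_codons = expand("VNS")
encoded = {CODON_TABLE[c] for c in vns_codons}
missing = sorted(set(AMINO_ACIDS) - {"*"} - encoded)

print(len(vns_codons), "codons,", len(encoded), "amino acids")
print("stop codons present:", "*" in encoded)
print("amino acids not encoded:", missing)
print("DNA-level diversity for 4 randomized positions:", len(vns_codons) ** 4)
```

  • Because every stop codon in the standard code begins with T, restricting the first base to A, C, or G excludes stops from the library, at the cost of not encoding Cys, Phe, Trp, or Tyr at randomized positions.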
  • FIGS. 9A-9C Selected variants binding altered binding site sequences at position 3 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position −1 to 3 of recognition helix in ZF7 of the 11 finger CTCF zinc finger array. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 3 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by dashed lines.
  • (A) Selections performed on A:T change in the binding site,
  • (B) A:G change,
  • (C) A:C change. Most variants pulled out had relaxed binding specificity instead of altered specificity.
  • FIGS. 9A-9C disclose “RKSD” as SEQ ID NO: 711, “RKHD” as SEQ ID NO: 173, “RRSD” as SEQ ID NO: 174, “RKAD” as SEQ ID NO: 175, “IPRI” as SEQ ID NO: 176, “RKDD” as SEQ ID NO: 177, “QALL” as SEQ ID NO: 180, “PHRM” as SEQ ID NO: 181, “ELLN” as SEQ ID NO: 179 and “GIVN” as SEQ ID NO: 178.
  • FIGS. 10A-10B Selections performed targeting sequence changes at position 5 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of the ZF6 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 5 of the core motif in the CBS (gray star). Direct protein-DNA contacts were indicated by dashed lines.
  • (A) Selections performed on C:T change in the binding site
  • (B) C:G change. No variants grew beyond the low stringency end of the gradient on selection plates for C:A change and were considered weak/insufficient binders.
  • FIGS. 10A-B disclose “GNMAR” as SEQ ID NO: 182, “NAMKR” as SEQ ID NO: 30, “EGMTR” as SEQ ID NO: 183, “NAMRG” as SEQ ID NO: 185, “GTMKM” as SEQ ID NO: 1255, “SNMVR” as SEQ ID NO: 184, “DHMNR” as SEQ ID NO: 32, “EHMRR” as SEQ ID NO: 34, “EHMGR” as SEQ ID NO: 31, “THMNR” as SEQ ID NO: 35 and “THMKR” as SEQ ID NO: 33.
  • FIGS. 11A-11C Selections performed targeting sequence changes at position 6 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position −1 to 3 of ZF6 recognition helix. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 6 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by dashed lines.
  • (A) Selections performed on A:T change in the binding site
  • (B) A:G change
  • (C) A:C change. Variants analyzed from the A:T selection had relaxed binding profile while variants from A:G selection showed strong binding for only the changed nucleotide. No good binders were identified in the A:C selection.
  • FIGS. 11A-C disclose “MMES” as SEQ ID NO: 36, “QSGT” as SEQ ID NO: 1582, “HRES” as SEQ ID NO: 37, “RHDT” as SEQ ID NO: 40, “RPDT” as SEQ ID NO: 38, “RTDI” as SEQ ID NO: 39, “RADN” as SEQ ID NO: 167 and “ERKS” as SEQ ID NO: 1479.
  • FIGS. 12A-12C Selections performed targeting sequence changes at position 7 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 4 to 6 of ZF5 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 7 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change.
  • FIGS. 12A-C disclose “DGLRV” as SEQ ID NO: 45, “HGLKV” as SEQ ID NO: 41, “HRLKE” as SEQ ID NO: 42, “HALKV” as SEQ ID NO: 43, “YKLKR” as SEQ ID NO: 5553, “SRLKE” as SEQ ID NO: 44, “HTLKV” as SEQ ID NO: 46 and “NRLKE” as SEQ ID NO: 47.
  • FIGS. 13A-13C Selections performed targeting sequence changes at position 8 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF5 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 8 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line.
  • (A) Selections performed on G:T change in the binding site
  • (B) G:A change
  • (C) G:C change. Note the different variants that appear with the same library being used to bind to the same changes in the sequence, but in a different position on the binding site.
  • FIGS. 13A-13C disclose “GGLVR” as SEQ ID NO: 50, “QALRR” as SEQ ID NO: 49, “HGLIR” as SEQ ID NO: 51, “YKLKR” as SEQ ID NO: 5553, “ATLKR” as SEQ ID NO: 48, “GGLTR” as SEQ ID NO: 55, “HGLVR” as SEQ ID NO: 54, “ANLSR” as SEQ ID NO: 52, “TGLTR” as SEQ ID NO: 53, “HGLRR” as SEQ ID NO: 59, “ADLKR” as SEQ ID NO: 58, “HTLRR” as SEQ ID NO: 56 and “TVLKR” as SEQ ID NO: 57.
  • FIGS. 14A-14C Selections performed targeting sequence changes at position 10 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF4 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 10 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line.
  • (A) Selections performed on G:T change in the binding site,
  • (B) G:A change,
  • (C) G:C change. G:C selection did not produce any growth at the high stringency end of the gradient selective plates.
  • FIGS. 14A-C disclose “GHLRK” as SEQ ID NO: 162, “AKLRL” as SEQ ID NO: 3311, “AHLRK” as SEQ ID NO: 60, “SKLKR” as SEQ ID NO: 3470, “GGLGL” as SEQ ID NO: 62, “AKLRI” as SEQ ID NO: 63, “AKLRV” as SEQ ID NO: 61, “EKLRI” as SEQ ID NO: 186, “SKLRV” as SEQ ID NO: 65 and “TKLKV” as SEQ ID NO: 64.
  • FIGS. 15A-15C Selections performed targeting sequence changes at position 11 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF4 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 11 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change.
  • FIGS. 16A-16C Selections performed targeting sequence changes at position 13 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF3 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 13 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line.
  • (A) Selections performed on G:T change in the binding site
  • (B) G:A change
  • (C) G:C change.
  • FIGS. 16A-C disclose “QQLLI” as SEQ ID NO: 79, “QQLLV” as SEQ ID NO: 77, “QQLIV” as SEQ ID NO: 75, “GELVV” as SEQ ID NO: 78, “GELVR” as SEQ ID NO: 5554, “SQLIV” as SEQ ID NO: 76, “QGLLV” as SEQ ID NO: 83, “GQLTV” as SEQ ID NO: 81, “GQLIV” as SEQ ID NO: 80, “GKLVT” as SEQ ID NO: 187, “TELII” as SEQ ID NO: 82, “GQLLT” as SEQ ID NO: 85, “QQLLT” as SEQ ID NO: 84, “GELLT” as SEQ ID NO: 86 and “ATLAD” as SEQ ID NO: 5555.
  • FIG. 17 Binding activity of multi-finger variants on multiple sequence changes to the CBS.
  • ZF1-3 and ZF8-11 were unmodified in this library.
  • Protein-DNA contacts are indicated by lines between the ZF recognition helices and the CBS sequence. Wild-type CTCF 11-finger zinc finger array binding strength to wild-type CBS is indicated by a white dot.
  • FIG. 17 discloses “CGTGGTGCGAAC” as SEQ ID NO: 5556, “CAAGCGTGGTGCGCT” as SEQ ID NO: 5557, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “ERLRV” as SEQ ID NO: 93, “RPDT” as SEQ ID NO: 38, “DNLLA” as SEQ ID NO: 100, “AKLKK” as SEQ ID NO: 88, “AKLRK” as SEQ ID NO: 89, “NRLKV” as SEQ ID NO: 94, “RTET” as SEQ ID NO: 98, “SNLLV” as SEQ ID NO: 101, “AHLRV” as SEQ ID NO: 90, “SRLKE” as SEQ ID NO: 44, “DNL
  • FIG. 18 Binding activity of multi-finger variants on multiple sequence changes to the CBS. The same selection as before except now there is a C:G change at position 2 of the CBS, where previously there was a C:A change. Variants pulled out of this selection had binding activity on the modified CBS without binding to the wild-type CBS. Wild-type 11-finger ZF array only showed binding activity on wild-type CBS (white dot) and no ability to bind to the modified CBS. Interestingly, the dominant variant selected for in the library contained a mutation that occurs at position 9 of the recognition helix that was either introduced during oligo synthesis (0.05% chance of the wrong nucleotide at each position) or through PCR while constructing these libraries.
  • FIG. 19 Binding activity of multi-finger variants on multiple sequence changes to the CBS. Variants from individual pooled high stringency selections were stitched together and selected against three changes to the wild-type CBS (indicated by gray stars or bolded). Variants were assayed for binding on the modified CBS and the wild-type CBS alongside wild-type CTCF zinc finger array. The variants picked out of the selection were able to bind to the modified CBS, but not the wild-type sequence. Inversely, the wild-type zinc finger array was able to bind to the wild-type CBS, but not the modified one.
  • FIG. 20 Binding activity of multi-finger variants on multiple sequence changes to the CBS. Variants from individual pooled high stringency selections were stitched together and selected against three changes to the wild-type CBS (indicated by gray stars or bolded). Variants were assayed for binding on the modified CBS and the wild-type CBS alongside wild-type CTCF zinc finger array. The variants picked out of the selection were able to bind to the modified CBS, but not the wild-type sequence. Inversely, the wild-type zinc finger array was able to bind to the wild-type CBS, but not the modified one.
  • FIG. 21 Binding activity of multi-finger variants on multiple sequence changes to the CBS. Variants from individual pooled high stringency selections were stitched together and selected against three changes to the wild-type CBS (indicated by gray stars or bolded). Variants were assayed for binding on the modified CBS and the wild-type CBS alongside wild-type CTCF zinc finger array. The variants picked out of the selection were able to bind to the modified CBS, but not the wild-type sequence. Inversely, the wild-type zinc finger array was able to bind to the wild-type CBS (white dot), but not the modified one.
  • FIG. 22 Wild-type CTCF has binding activity to wild-type CTCF target site and no binding activity to two variant target sites.
  • In a human cell context, endogenous CTCF binds to the wild-type CBSs and not the variant binding sites, as seen in the B2H assay.
  • CTCF was assayed for binding to a known CTCF target site and to two endogenous variant binding site sequences using a CTCF-specific antibody to enrich for genomic DNA crosslinked to CTCF.
  • Two sets of qPCR primers were designed for each binding site (indicated by 1.1, 1.2, etc).
  • Binding was determined by enrichment of the target site above a 1% input of crosslinked and sonicated sample not treated with antibody, which represents the level of the site of interest as a fold increase over its frequency in a sample not enriched with antibody.
  • Antibody-based enrichment of each sample is quantified by fold enrichment above untreated, and therefore unenriched, input.
  • The negative control reflects background qPCR amplification levels of a target site that CTCF does not bind to. Anything above this negative level is considered enriched, indicating CTCF binding, while anything below is considered not enriched, indicating no binding by CTCF. Wild-type CTCF binds to the wild-type target site with no detectable binding to the variant binding sites, as predicted by the bacterial B2H reporter assay.
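  • The enrichment-over-input readout described for FIG. 22 can be sketched from qPCR cycle-threshold (Ct) values. This is an illustrative calculation only: it assumes the common convention that one cycle corresponds to a two-fold difference in template (100% amplification efficiency), and the function names and Ct values are hypothetical rather than taken from the source.

```python
def fold_over_input(ct_input, ct_ip, efficiency=2.0):
    """Enrichment of the antibody-treated sample relative to the input aliquot.

    A lower Ct means more template, so the exponent is Ct(input) - Ct(IP).
    """
    return efficiency ** (ct_input - ct_ip)


def is_enriched(site_fold, negative_control_fold):
    """Call a site bound when it exceeds the negative-control background."""
    return site_fold > negative_control_fold


# Hypothetical Ct values: (input aliquot, antibody-enriched sample).
sites = {
    "wild-type site": (25.0, 23.0),
    "variant site": (25.0, 27.5),
    "negative control": (25.0, 27.0),
}

folds = {name: fold_over_input(*cts) for name, cts in sites.items()}
background = folds["negative control"]
for name, fold in folds.items():
    print(f"{name}: {fold:.2f}x input, enriched={is_enriched(fold, background)}")
```

  • Comparing each site to the negative control, rather than to a fixed cutoff, mirrors the thresholding rule stated in the figure description.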
  • FIGS. 23A-23B Exogenous wild-type and variant CTCF binding activity in human cells. Two endogenous variant binding site sequences, matching one of the five variant binding sites that CTCF variants were selected on, were identified in the human genome (Variant site 1 and Variant site 2). Both wild-type CTCF with a 3×HA tag and one of the 3×HA tagged engineered CTCF variants, selected to bind to the variant binding site sequence of Variant site 1 and Variant site 2, were assayed for binding in human cells through ChIP-qPCR.
  • FIG. 23A 3×HA tagged wild-type CTCF binds to wild-type CTCF binding site and does not bind to either variant binding site.
  • FIG. 23B 3×HA tagged variant CTCF binds to variant binding sites and does not bind to wild-type CTCF binding site.
  • K562 cells expressing variant CTCF tagged with 3×HA were analyzed by ChIP-qPCR and treated with HA-specific antibody. The same sites as in FIGS. 22 and 23A were investigated for variant CTCF binding.
  • The variant CTCF could bind to the variant sites, as indicated by enrichment with the HA-specific antibody, and no detectable binding was seen at the wild-type binding site, as indicated by lack of HA antibody-based enrichment.
  • FIGS. 24A-24B Changes in gene expression relative to wild-type control of genes located around variant binding sites.
  • A variant CTCF selected to the G3 binding site sequence and a variant CTCF selected to the Other binding site sequence were expressed in wild-type K562s.
  • The variant CTCFs were fused to GFP and RNA was isolated from GFP+ cells 72 hours post nucleofection. cDNA was generated from the RNA and quantified by RT-qPCR. Gene expression levels across samples were normalized to a housekeeping gene (HPRT). Changes in gene expression are relative to gene expression levels in wild-type K562s expressing wild-type CTCF tagged with GFP.
  • FIG. 24A Changes in gene expression of genes around G3 variant binding site in the presence of variant CTCF relative to the wild-type CTCF control.
  • FIG. 24B Changes in gene expression of genes around Other variant binding site relative to the wild-type control.
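  • The normalization described for FIGS. 24A-24B (target gene normalized to HPRT, then expressed relative to the wild-type CTCF control) is often computed with the comparative Ct (2^-ΔΔCt) approach. The source does not state which calculation was used, so the sketch below is an assumption, with a hypothetical function name and hypothetical Ct values, and it presumes 100% amplification efficiency.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene versus control by the 2^-ddCt approach.

    Each target Ct is first normalized to the reference (housekeeping) gene
    in the same sample; the normalized values are then compared.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: gene near a variant binding site, with HPRT as reference.
fold = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=20.0,    # cells expressing variant CTCF
    ct_target_control=25.0, ct_ref_control=20.0,  # cells expressing wild-type CTCF
)
print(f"expression relative to wild-type control: {fold:.2f}x")
```

  • A value above 1 indicates higher expression than the control and a value below 1 indicates lower expression.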
  • FIG. 25 Introduction of variant binding sites upstream of MYC leads to reduction of endogenous MYC expression.
  • The CTCF binding site ~2 kb upstream of the MYC TSS was replaced with one of six different sequences used for CTCF variant selections (listed in table).
  • The introduction of these sequences, with 4-6 nucleotide changes from the wild-type CTCF binding site sequence, results in a reduction of endogenous MYC expression to the same levels as when the CTCF binding site is deleted and loop formation is disrupted.
  • The WT_6 sequence has 4 point mutations from the native CTCF binding site, but these changes should have no impact on wild-type CTCF binding, as indicated by results from the B2H reporter assay.
  • FIG. 25 discloses SEQ ID NOS 5568-5573, respectively, in order of appearance.
  • FIGS. 26A-26B Variant CTCFs are able to bind the engineered G3 variant binding site and recover MYC expression.
  • CTCF variants selected to bind to the G3 variant binding site sequence were expressed in the G3_3.K562 cell line. Cells were analyzed for MYC expression and CTCF occupancy on the DNA 72 hours post nucleofection. Residues of the ZF helix of the variant and wild-type (indicated by (wt)) are listed in the legend. The G3 binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagrammed.
  • FIG. 26A Endogenous MYC levels are recovered to wild-type levels in the G3_3 cell line when CTCF variants are expressed.
  • Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of the G3_3 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines).
  • FIG. 26B CTCF variants are able to bind to the introduced variant binding site in G3_3 cell line while the wild-type CTCF does not.
  • CTCF Ab specific enrichment captures both wild-type and variant CTCF while HA Ab will only detect HA-tagged CTCF (transiently expressed).
  • exoMYC.K562 is included as a control for ChIP-qPCR and is separated by dashed line.
  • exoMYC.K562 has the native sequence at the CTCF binding site upstream of MYC and should demonstrate wild-type CTCF binding.
  • The exogenously expressed CTCFs (variant and wild-type) are HA tagged and expressed in the G3_3 cell line.
  • ChIP-qPCR was performed to investigate CTCF binding to the variant CTCF site replacing the wild-type site ~2 kb upstream of MYC (MYC site).
  • An endogenous G3 site elsewhere in the genome and a region with no known CTCF binding served as a positive and negative control respectively.
  • The variant CTCFs are able to bind to the variant site as indicated by enrichment with both CTCF and HA antibody, while the wild-type CTCF does not.
  • FIGS. 26A-B disclose “CAGGGGAGGAGC” as SEQ ID NO: 5564, “DTYKLKR” as SEQ ID NO: 3, “SNLRR” as SEQ ID NO: 116, “GNLRR” as SEQ ID NO: 118, “GNLVR” as SEQ ID NO: 117, “ANLRR” as SEQ ID NO: 69, “GNLMR” as SEQ ID NO: 139, “NNLRR” as SEQ ID NO: 121, “GNLAR” as SEQ ID NO: 138, “SKLKR” as SEQ ID NO: 3470, “EHMKR” as SEQ ID NO: 123, “EHMRR” as SEQ ID NO: 34, “EHMNR” as SEQ ID NO: 126, “SHMNR” as SEQ ID NO: 147, “SHMNR” as SEQ ID NO: 146, “THMKR” as SEQ ID NO: 33, “DHMNR” as SEQ ID NO: 32, “GTMKM” as SEQ ID NO
  • FIGS. 27A-27B Variant CTCFs are able to bind the engineered A3 variant binding site and recover MYC expression.
  • CTCF variants selected to bind to the A3 variant binding site sequence were expressed in the A3_4.K562 cell line. Cells were analyzed for MYC expression and CTCF occupancy on the DNA 72 hours post nucleofection. Residues of the ZF helix of the variant and wild-type (indicated by (wt)) are listed in the legend. The A3 binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagrammed.
  • FIG. 27A Endogenous MYC levels are recovered to wild-type levels in the A3_4 cell line when CTCF variants are expressed.
  • Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of the A3_4 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines).
  • FIG. 27B CTCF variants are able to bind to the introduced variant binding site in A3_4 cell line while the wild-type CTCF does not.
  • CTCF Ab specific enrichment captures both wild-type and variant CTCF while HA Ab will only detect HA-tagged CTCF (transiently expressed).
  • exoMYC.K562 is included as a control for ChIP-qPCR and is separated by dashed line.
  • exoMYC.K562 has the native sequence at the CTCF binding site upstream of MYC and should demonstrate wild-type CTCF binding.
  • The exogenously expressed CTCFs (variant and wild-type) are HA tagged and expressed in the A3_4 cell line.
  • ChIP-qPCR was performed to investigate CTCF binding to the variant CTCF site replacing the wild-type site ~2 kb upstream of MYC (MYC site).
  • An endogenous A3 site elsewhere in the genome and a region with no known CTCF binding served as a positive and negative control respectively.
  • The variant CTCFs are able to bind to the variant site as indicated by enrichment with both CTCF and HA antibody above the negative control, while the wild-type CTCF does not bind.
  • FIGS. 27A-B disclose “CAGGGGAGGAAC” as SEQ ID NO: 5562, “DTYKLKR” as SEQ ID NO: 3, “GNLKR” as SEQ ID NO: 119, “GNLVR” as SEQ ID NO: 117, “SNLRR” as SEQ ID NO: 116, “ANLRR” as SEQ ID NO: 69, “GNLRR” as SEQ ID NO: 118, “NNLRR” as SEQ ID NO: 121, “TNLRR” as SEQ ID NO: 68, “SKLKR” as SEQ ID NO: 3470, “EHMNR” as SEQ ID NO: 126, “EHMRR” as SEQ ID NO: 34, “EHMKR” as SEQ ID NO: 123, “THMKR” as SEQ ID NO: 33, “EHMAR” as SEQ ID NO: 127, “GTMKM” as SEQ ID NO: 1255, “DNLLA” as SEQ ID NO: 100, “DNLLV” as SEQ ID
  • FIG. 28 Variant CTCFs recover MYC expression of the Other 10 variant binding site cell line.
  • CTCF variants selected to bind to the Other variant binding site sequence were expressed in the Other 10.K562 cell line.
  • Cells were analyzed for MYC expression 72 hours post nucleofection.
  • Residues of ZF helix of the variant and wild-type CTCFs are listed in the legend.
  • The Other binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagrammed.
  • A. Endogenous MYC levels are recovered to wild-type levels in the Other 10 cell line when CTCF variants are expressed.
  • Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of the Other 10 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines).
  • FIG. 28 discloses “RKSDLGV” as SEQ ID NO: 5, “CGTGGTGCGACC” as SEQ ID NO: 5574, “TKLRL” as SEQ ID NO: 160, “TNLKK” as SEQ ID NO: 163, “GHLRK” as SEQ ID NO: 162, “TKLKL” as SEQ ID NO: 161, “AHLRK” as SEQ ID NO: 60, “AHLKK” as SEQ ID NO: 159, “SKLKR” as SEQ ID NO: 3470, “SRLKE” as SEQ ID NO: 44, “TRLKE” as SEQ ID NO: 165, “YKLKR” as SEQ ID NO: 5553, “RRDT” as SEQ ID NO: 169, “RPDT” as SEQ ID NO: 38, “RRNDT” as SEQ ID NO: 172, “RADN” as SEQ ID NO: 167, “RHDT” as SEQ ID NO: 40 and “QSGT” as SEQ ID NO: 1582.
  • FIG. 29 Variant CTCFs recover MYC expression of the Aother_2 variant binding site cell line.
  • CTCF variants selected to bind to the Aother variant binding site sequence were expressed in the Aother_2.K562 cell line.
  • Cells were analyzed for MYC expression 72 hours post nucleofection.
  • Residues of ZF helix of the variant and wild-type CTCFs are listed in the legend.
  • The Aother binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagrammed.
  • Endogenous MYC levels are recovered to wild-type levels in the Aother_2 cell line when CTCF variants are expressed.
  • Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of the Aother_2 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines).
  • CTCF variants with alterations in the zinc finger array can be engineered to recognize CBSs that harbor one or more point mutations, i.e., mutant CBSs.
  • CCCTC-binding factor is a multi-domain protein that acts as an essential genome organizer by maintaining higher-order chromatin structure while also having a role in cell differentiation and the promotion or repression of gene expression.
  • CTCF maintains topologically associated domains (TADs) spanning megabases of the genome as well as smaller scale Sub-TADs leading to fine-tuned gene insulation or gene activation within gene clusters.
  • CTCF has been found to regulate mRNA splicing by influencing the rate of transcription and has more recently been implicated in promoting homologous recombination repair at double-strand breaks. Wild-type CTCF binds throughout the genome via an 11-finger zinc finger array that recognizes canonical CTCF binding sites (CBSs).
  • Wild-type CTCF ZF arrays comprise the following sequences at ZFs 3-7 positions −1 to +6:
  • ZF3 positions −1 to +6: TSGELVR (SEQ ID NO: 1)
  • ZF4 positions −1 to +6: EVSKLKR (SEQ ID NO: 2)
  • ZF5 positions −1 to +6: DTYKLKR (SEQ ID NO: 3)
  • ZF6 positions −1 to +6: QSGTMKM (SEQ ID NO: 4)
  • ZF7 positions −1 to +6: RKSDLGV (SEQ ID NO: 5)
  • a wild-type CTCF has an amino acid sequence that has greater than 80%, greater than 85%, greater than 90%, greater than 95%, greater than 96%, greater than 97%, greater than 98% or greater than 99% sequence identity as compared to the amino acid sequence shown below:
  • For the purpose of comparing two different nucleic acid or polypeptide sequences, one sequence (test sequence) may be described to be a specific percentage identical to another sequence (comparison sequence).
  • The percentage identity can be determined by the algorithm of Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993), which is incorporated into various BLAST programs.
  • The percentage identity can be determined by the “BLAST 2 Sequences” tool, which is available at the National Center for Biotechnology Information (NCBI) website. See Tatusova and Madden, FEMS Microbiol. Lett., 174(2):247-250 (1999).
  • The BLASTN program is used with default parameters (e.g., Match: 1; Mismatch: −2; Open gap: 5 penalties; extension gap: 2 penalties; gap x_dropoff: 50; expect: 10; and word size: 11, with filter).
  • The BLASTP program can be employed using default parameters (e.g., Matrix: BLOSUM62; gap open: 11; gap extension: 1; x_dropoff: 15; expect: 10.0; and wordsize: 3, with filter).
  • Percent identity of two sequences is calculated by aligning a test sequence with a comparison sequence using BLAST, determining the number of amino acids or nucleotides in the aligned test sequence that are identical to amino acids or nucleotides in the same position of the comparison sequence, and dividing the number of identical amino acids or nucleotides by the number of amino acids or nucleotides in the comparison sequence.
  • When BLAST is used to compare two sequences, it aligns the sequences and yields the percent identity over defined, aligned regions. If the two sequences are aligned across their entire length, the percent identity yielded by BLAST is the percent identity of the two sequences.
  • If BLAST does not align the two sequences over their entire length, then the number of identical amino acids or nucleotides in the unaligned regions of the test sequence and comparison sequence is considered to be zero, and the percent identity is calculated by adding the number of identical amino acids or nucleotides in the aligned regions and dividing that number by the length of the comparison sequence.
  • BLAST programs can be used to compare sequences, e.g., BLAST 2.1.2 or BLAST+ 2.2.22.
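  • The percent-identity rule above (identical positions in the aligned regions, divided by the length of the comparison sequence, with unaligned regions contributing zero) can be sketched as follows. The sketch does not run BLAST; it assumes the aligned regions have already been obtained and are supplied as pairs of equal-length strings with '-' marking gaps. The function name and example strings are hypothetical.

```python
def percent_identity(aligned_regions, comparison_length):
    """Percent identity over the full comparison sequence.

    aligned_regions: iterable of (test_aligned, comparison_aligned) string
    pairs, each pair of equal length, with '-' marking alignment gaps.
    Positions outside these regions count as zero identities.
    """
    if comparison_length <= 0:
        raise ValueError("comparison_length must be positive")
    identical = 0
    for test_aln, comp_aln in aligned_regions:
        if len(test_aln) != len(comp_aln):
            raise ValueError("aligned strings must have equal length")
        identical += sum(
            1 for t, c in zip(test_aln, comp_aln) if t == c and t != "-"
        )
    return 100.0 * identical / comparison_length


# One aligned region covering part of a 10-residue comparison sequence.
print(percent_identity([("ACGT-A", "ACGTTA")], comparison_length=10))
# Full-length, gap-free alignment with a single mismatch.
print(round(percent_identity([("EVSKLKR", "EVSRLKR")], comparison_length=7), 1))
```

  • Dividing by the comparison sequence length, rather than by the alignment length, is what penalizes partial alignments under this definition.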
  • CTCF Binding Sites (CBSs)
  • The CBS is typically 40 bp in length with a highly conserved 15 bp core sequence (or core motif). Sequence flanking the core sequence is significantly less well conserved, but still important for CTCF binding at sites throughout the genome ( FIG. 1 ).
  • Wild-type CTCF binds to a “consensus CBS motif,” which contains the following core sequence: 5′-CCAGCAGGGGGCGCT-3′ (SEQ ID NO:6). Other core sequences are known in the art.
  • CTCF binding is sensitive to changes in the conserved 15 bp core motif of the CBS, where, in mice, single nucleotide changes at certain positions can lead to loss of CTCF binding (Nakahashi et al., Cell Reports (2013)).
  • CTCF binding sites have been reported to be mutational hotspots in cancer with cancer-associated mutations localized to the core sequence of the CTCF binding site in primary samples from gastrointestinal cancer patients and with accompanying atypical gene expression profiles of oncogenic and tumor suppressor genes (Guo et al., Nature Communications (2018)).
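  • Since the figures refer to changes by their 1-based position in the 15 bp core motif, a short sketch of how a candidate site can be compared against the consensus core may help. The function name is hypothetical; the second sequence is one of the 15-nt sequences listed in the FIG. 17 disclosure and is used here only as sample input.

```python
CONSENSUS_CORE = "CCAGCAGGGGGCGCT"  # SEQ ID NO: 6


def core_differences(site, reference=CONSENSUS_CORE):
    """Return (position, reference_base, site_base) for each mismatch.

    Positions are 1-based to match the numbering used in the figures.
    """
    if len(site) != len(reference):
        raise ValueError("site and reference must have the same length")
    return [
        (i + 1, ref_base, site_base)
        for i, (ref_base, site_base) in enumerate(zip(reference, site))
        if ref_base != site_base
    ]


print(core_differences("CCAGCAGGGGGCGCT"))  # identical to the consensus
print(core_differences("CAAGCGTGGTGCGCT"))  # sequence listed for FIG. 17
```

  • Listing mismatches this way makes it easy to see which core positions, and therefore which zinc fingers, a given variant site is likely to affect.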
  • Methods described herein can be used to select and generate engineered CTCF variants comprising a plurality of zinc fingers, where the selected polypeptide has at least one amino acid residue in at least one zinc finger that differs in sequence from a wild-type CTCF, and where the engineered CTCF variant binds to a DNA sequence of interest (e.g., CBS harboring at least one mutation in the consensus CBS sequence) but does not bind to a consensus CBS.
  • a scaffold polypeptide is re-engineered into a new scaffold-based zinc-finger polypeptide that has different structural and functional features, such that the new polypeptide binds to a sequence of interest but does not bind to a naturally occurring DNA binding site of the scaffold protein.
  • Zf refers to a polypeptide having DNA binding domains that are stabilized by zinc.
  • the individual DNA binding domains are typically referred to as “fingers.”
  • a Zf protein has at least one finger, preferably 2 fingers, 3 fingers, or 6 fingers.
  • a Zf protein having two or more Zfs is referred to as a “multi-finger” or “multi-Zf” protein.
  • Each finger typically comprises an approximately 30 amino acid, zinc-chelating, DNA-binding domain.
  • An exemplary motif characterizing one class of these proteins is -Cys-(X)(2-4)-Cys-(X)(12)-His-(X)(3-5)-His (SEQ ID NO:7), where X is any amino acid, which is known as the “C(2)H(2) class.”
  • a single Zf of this class typically consists of an alpha helix containing the two invariant histidine residues co-ordinated with zinc along with the two cysteine residues.
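  • The C(2)H(2) motif given above translates directly into a regular expression over one-letter amino acid codes. The following is a minimal sketch. The function name is an assumption, and a pattern match only indicates that the spacing of cysteines and histidines fits the motif, not that the segment folds into a functional finger.

```python
import re

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His, where X is any amino acid.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(protein):
    """Return (start, matched_text) for each non-overlapping motif match.

    protein: amino acid sequence in one-letter code.
    start is the 0-based index of the first cysteine of the match.
    """
    return [(m.start(), m.group())
            for m in C2H2_PATTERN.finditer(protein.upper())]
```

  • Applied to a multi-finger protein, the number of matches gives a rough count of candidate C(2)H(2) fingers. Because `finditer` returns non-overlapping matches, closely packed or atypical fingers may need a more permissive search.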
  • “Bind to” or “binding,” with respect to a nucleic acid binding factor and its target nucleic acid, e.g., CTCF (variant or wild-type) and CBS, refers to sequence-dependent binding of the nucleic acid binding factor to the target nucleic acid sequence of a nucleic acid through intermolecular interactions, e.g., ionic, covalent, London dispersion, dipole-dipole, or hydrogen bonding, in such a way that the binding allows the nucleic acid binding factor to mediate a biologically significant function, e.g., transcriptional activation, recruitment of other proteins to the binding site, and/or alteration of chromatin structure.
  • Such binding can result in modulation of expression of genes, such as activation, overexpression, suppression, or inactivation of gene expression.
  • Non-binding, with respect to a nucleic acid binding factor and its target nucleic acid, e.g., CTCF (variant or wild-type) and CBS, refers to the lack of sequence-specific binding of the nucleic acid binding factor to a nucleic acid through intermolecular interactions, e.g., ionic, covalent, London dispersion, dipole-dipole, or hydrogen bonding, as a result of the absence of a target sequence in the nucleic acid (e.g., due to one or more point mutations in the CBS).
  • Such non-binding does not allow the nucleic acid binding factor to mediate a biologically significant function, e.g., transcriptional activation, DNA modification, DNA cleavage, recruitment of other proteins to the binding site, and/or alteration of chromatin structure.
  • Each finger within a Zf protein binds to from about two to about five base pairs within a DNA sequence.
  • a single Zf within a Zf protein binds to a three or four base pair “subsite” within a DNA sequence.
  • a “subsite” is a DNA sequence that is bound by a single zinc finger.
  • a “multi-subsite” is a DNA sequence that is bound by more than one zinc finger, and comprises at least 4 bp, preferably 6 bp or more.
  • a multi-Zf protein binds at least two, and typically three, four, five, six or more subsites, i.e., one for each finger of the protein.
  • Described herein are engineered CTCF variants that can bind to mutant CBSs with higher affinity than wild-type CTCF.
  • the engineered CTCF variants can be used in regulating genes that are under the control of mutant CBSs (CBSs having at least one nucleotide that differs in sequence from the nucleotide sequence of a consensus CBS).
  • the CTCF variants have at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF.
  • Exemplary engineered CTCF variants include those that contain:
  • amino acid sequence DHLQT (SEQ ID NO:8), EHLNV (SEQ ID NO:9), AHLQV (SEQ ID NO:10), EHLRE (SEQ ID NO:11), DHLQV (SEQ ID NO:12), EHLKV (SEQ ID NO:13), DHLQV (SEQ ID NO:14), EHLVV (SEQ ID NO:15), DHLRT (SEQ ID NO:16), DHLAT (SEQ ID NO:17), or DHLQT (SEQ ID NO:18) at ZF7 positions +2 to +6;
  • amino acid sequence DHLQT (SEQ ID NO:19), EHLNV (SEQ ID NO:20), AHLQV (SEQ ID NO:21), EHLRE (SEQ ID NO:22), DHLQV (SEQ ID NO:23), EHLKV (SEQ ID NO:24), DHLQV (SEQ ID NO:25), EHLVV (SEQ ID NO:26), DHLRT (SEQ ID NO:27), DHLAT (SEQ ID NO:28), or DHLQT (SEQ ID NO:29) at ZF7 positions +2 to +6;
  • amino acid sequence NAMKR (SEQ ID NO:30), EHMGR (SEQ ID NO:31), DHIVINR (SEQ ID NO:32), THMKR (SEQ ID NO:33), EHMRR (SEQ ID NO:34), or THIVINR (SEQ ID NO:35) at ZF6 positions +2 to +6;
  • amino acid sequence HGLKV (SEQ ID NO:41), HRLKE (SEQ ID NO:42), HALKV (SEQ ID NO:43), SRLKE (SEQ ID NO:44), DGLRV (SEQ ID NO:45), HTLKV (SEQ ID NO:46), or NRLKE (SEQ ID NO:47) at ZF5 positions +2 to +6;
  • amino acid sequence AHLRK (SEQ ID NO:60), AKLRV (SEQ ID NO:61), GGLGL (SEQ ID NO:62), AKLRI (SEQ ID NO:63), TKLKV (SEQ ID NO:64), or SKLRV (SEQ ID NO:65) at ZF4 positions +2 to +6;
  • the engineered CTCF variants contain two or more combinations of the above-listed amino acid sequences.
  • CTCF zinc finger arrays capable of recognizing CBSs harboring multiple point mutations.
  • CTCF proteins harboring these zinc finger array variants are used to restore CTCF binding activity at sites bearing one or more mutations within a CBS (i.e., non-canonical CBSs).
  • CTCF variants capable of recognizing alternative non-CBS sites in the genome can be used to create artificial TADs and/or enhancer-promoter loops that can purposefully insulate genes and/or perturb the higher order structure of the genome and thereby alter expression of certain target genes of interest.
  • the engineered CTCF variants described herein can be used for treating diseases where aberrant gene regulation due to mutant CBS is an underlying factor.
  • the engineered CTCF variants described herein can, for example, bind to mutant CBSs that do not bind wild-type CTCFs, thereby altering or restoring gene regulation that can reverse or slow down progression of diseases.
  • CTCF binding has been shown to regulate expression of oncogenes, such as MYC. Mutations accumulated in CTCF binding sites and loss of wild-type CTCF binding are associated with dysregulation of oncogenes and increased risk of carcinogenesis. Transcriptional dysregulation of MYC is one of the most frequent events in aggressive tumor cells, and the dysregulation is a result of mutations in a CTCF binding site that disrupt an enhancer-promoter loop.
  • Engineered CTCF variants can bind to the mutated sites and restore normal gene expression levels, reducing risk of cancer development.
  • Fragile X Syndrome is the result of a duplication in a repetitive region and the loss of FMR1 expression.
  • Duplication of a repeat region in the X chromosome disrupts a CTCF binding site, leading to the loss of an enhancer-promoter loop driving the expression of FMR1.
  • the engineered CTCF variants could restore the enhancer-promoter loop, leading to restoration of FMR1 expression.
  • Human Papilloma Virus (HPV) and other integrating viruses (such as HIV) are often silenced by CTCF-mediated insulation of the viral genome from nearby enhancers.
  • In HPV18, there is a CTCF binding site in the promoter region of the viral genome.
  • HPV18 genomes that have mutations in the CTCF binding site are not silenced because the mutated binding site can no longer be recognized by CTCF.
  • Engineered CTCF variants would be able to bind to the mutated HPV integrated genomes and restore the insulating loop.
  • kits comprising the engineered CTCF variant, and/or nucleic acids encoding an engineered CTCF variant as described herein and instructions for use.
  • the engineered CTCF variant, or nucleic acids encoding such engineered CTCF variant can be used to further elucidate the complex interactions of CTCF and other chromatin organization proteins.
  • the structural maintenance of chromosomes is tightly regulated within cells and CTCF plays a major role. It remains unclear how higher order structures are inherited across cell division and maintained through cell differentiation; the use of CTCF variants can help clarify that role.
  • CTCF variants might be used to investigate how loops are formed across the genome and to modify or restore normal genomic architecture in a manner that impacts endogenous gene expression for research and therapeutic applications. They might also be used to re-establish ancestral CTCF binding sites so that we may better understand the evolutionary implications of TAD-based genome organization and epigenetic regulation of gene expression or to create alternative genomic architectures that impact endogenous gene expression for research and therapeutic applications.
  • the zinc-finger bacterial expression plasmid contained the CTCF zinc finger array (or variants) fused to gal11P.
  • the amino-terminal end of all or part of the CTCF 11-finger zinc finger array was fused to the carboxy-terminal end of gal11P with a Flag tag linker between them.
  • the zinc finger expression plasmid contains a Kanamycin resistance gene.
  • the second plasmid, known as the bacterial reporter plasmid, contained a CTCF binding site sequence that was introduced via BsaI restriction digest followed by T4-mediated ligation of annealed oligos containing the CTCF binding site.
  • the reporter plasmid contained a bacterial lac promoter that promoted the expression of lacZ when the CTCF binding site was bound.
  • the reporter plasmid also has a Chloramphenicol resistance gene.
  • oligos were synthesized by IDT with ‘VNS’ or ‘NNS’ variation introduced in the sequence by design. Oligos were annealed and ligated into the zinc finger expression plasmid (previously digested with XbaI and BamHI) using T4 ligase. The ligation reaction was purified using a Qiagen MinElute column and the purified substrate was electro-transformed into the electro-competent XL1-Blue E. coli strain. After 1 hour of recovery in SOC at 37° C., the transformation was inoculated into 150 mL of Luria broth (LB) with 50 ug/mL of Kanamycin. After the culture reached an OD600 of 0.400-0.600 (about 10 hours of growth at 37° C.), the culture was spun down and the library was harvested using a Qiagen Maxiprep kit.
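  • The ‘NNS’ and ‘VNS’ degenerate codons mentioned above differ in which residues they can encode. The sketch below expands each degenerate codon with the standard IUPAC nucleotide codes (N = A/C/G/T, S = C/G, V = A/C/G) and translates the results with the standard genetic code. The function names and the `positions` parameter are assumptions for illustration; the text above does not state how many codons were randomized in each library.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "S": "CG", "V": "ACG"}


def expand_codon(degenerate):
    """List every concrete codon covered by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in degenerate))]


def encoded_residues(degenerate):
    """Set of amino acids ('*' = stop) a degenerate codon can encode."""
    return {CODON_TABLE[c] for c in expand_codon(degenerate)}


def library_size(degenerate, positions):
    """Number of distinct DNA sequences when `positions` codons
    (a hypothetical count) are randomized with the same scheme."""
    return len(expand_codon(degenerate)) ** positions
```

  • Under these definitions NNS covers 32 codons encoding all 20 amino acids plus the TAG stop codon, while VNS covers 24 codons encoding 16 amino acids with no stop codon, because every codon for Phe, Tyr, Cys, Trp, and stop that ends in C or G begins with T.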
  • 600 ng of gal11P-zinc finger expression plasmid and 600 ng of reporter plasmid with the CTCF binding site of interest were chemically transformed into 150 uL of ⁇ E. coli strain with an alpha N-terminal domain of RNA polymerase (α-NTD)-Gal4 fusion. The plasmid and cell mixture was incubated on ice for 30 minutes, heat shocked at 42° C. for 1 minute, recovered on ice for 2 minutes, followed by recovery in 500 uL of Luria Broth for 1 hour. Post-recovery, the transformation was plated on Kanamycin (50 ug/mL), Chloramphenicol (12.5 ug/mL) selective LB agar plates.
  • the reporter plasmid is made to be a selective plasmid by swapping LacZ with BlaC, an antibiotic resistance gene for the β-lactam ring class of antibiotics, such as Carbenicillin. Selections are carried out by constructing libraries of variants from a pool of oligos ligated into the zinc finger-gal11P expression plasmid. These are electro-transformed into electro-competent ⁇ E. coli strain containing the selective plasmid with the CTCF binding site of interest. Cells are recovered in 1 mL of SOC for 1 hour at 37° C. followed by induction of the selective plasmid for 3 additional hours at 37° C.
  • transformations are plated on low stringency plates (LB agar with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of zinc chloride, 200 ug/mL of IPTG, and 0.45 ug/mL of Clavulanic acid). Plates are grown overnight at 37° C. for 20-24 hours and then colonies are harvested off the surface with 2 mL of LB.
  • plasmid is harvested from the overnight cultures and chemically transformed into chemically competent ⁇ E. coli strain containing the same selective plasmid with the CTCF binding site of interest as before.
  • the chemical transformation is performed as previously described with the addition of 2 hour growth in induction media following a 1 hour recovery at 37° C. After a total of 3 hours of growth, cells are plated on high stringency selective gradient plates.
  • the high stringency gradient plates contain 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, and 200 ug/mL of IPTG, with a gradient of Clavulanic acid starting from ⁇1 up to 40 ug/mL in concentration. Plates were incubated 20-24 hours at 37° C. Colonies that grew on the gradient with the highest levels of Clavulanic acid were picked and grown overnight in 1 mL of TB with 50 ug/mL of Kanamycin in order to harvest the plasmid. The variant plasmid was then Sanger sequenced as well as analyzed for binding activity in the B2H β-gal reporter assay.
  • rectangular plates are elevated using a pipette tip so as to have a ~25° slope (enough of an angle so that the thin end of the wedge is only barely covered with LB agar).
  • the plates are laid flat and 20-25 mL of LB agar with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, and 200 ug/mL of IPTG (with no Clavulanic acid) is poured on top. This creates plates with a gradient of Clavulanic acid ranging from ⁇1 ug/mL up to 4.0 ug/mL.
  • K562 cells were seeded 18-24 hours in advance of transfection at a density of 3×10⁵ cells/mL. 3 million K562s per variant were transfected using Lonza Kit V using the provided optimized protocol and pooled in a 10 cm dish. 5 ug of plasmid expressing HA epitope tagged CTCF (wild-type or variant) expressed by a pCAG promoter was used for each 1 million cell reaction. 72 hours post transfection, approximately 10 million cells were crosslinked with 1% Formaldehyde at 37° C. for 10 mins. Reaction was quenched with 1.2 mL of 2.5M Glycine for 5 mins at 37° C.
  • the antibody chromatin complex was eluted from the beads in 100 uL of Elution Buffer (10 mM Tris-HCl pH 8, 0.1% SDS, 150 mM NaCl) with 5 mM DTT added fresh. Beads were incubated with elution buffer at 65° C. for 1 hour, shaking at 900 rpm. Beads were pelleted by magnet and supernatant was moved to a clean tube where, after cooling to room temp, 1 uL of RNAse (Roche 11119915001) was added to the sample and incubated at 37° C. for 30 mins at 600 rpm. 3 uL of Proteinase K [20 mg/mL] was added to samples and incubated overnight at 65° C.
  • exoMYC.K562 is a K562 cell line transduced with an exogenous MYC construct expressed from a PGK promoter. This was necessary because any reduction of endogenous MYC expression can impact the survival of K562 cells.
  • GFP+ cells were sorted at a high dilution into a 96 well plate for single-cell clonal expansion.
  • gDNA and RNA were extracted to genotype and phenotype the clonal cell population. Clonal lines that had a reduction of endogenous MYC and also appeared homozygous at the target site for the desired HDR event were used in the study.
  • the binding of a DNA-binding zinc finger array to a target site of interest can be configured to result in increased transcription of a reporter gene (e.g., beta-galactosidase or an antibiotic resistance gene) ( FIG. 2 ).
  • two fusions are expressed in an E. coli cell bearing a reporter construct.
  • the first fusion consists of a zinc finger array fused to a fragment of the yeast Gal11P protein, which interacts with a fragment of the yeast Gal4 fusion.
  • the second fusion consists of a fusion of the N-terminal domain of the E. coli RNA polymerase alpha subunit to the yeast Gal4 fragment (the α-Gal4 fusion).
  • the reporter construct consists of a weak E. coli promoter that drives expression of the reporter gene of interest, with a binding site for the zinc finger array positioned upstream of the promoter. Binding of the zinc finger-Gal11P fusion to the zinc finger binding site results in recruitment of RNA polymerase complexes harboring the alpha-Gal4 fusion, resulting in increased transcription of the reporter gene. If the reporter gene is lacZ, which encodes β-galactosidase (β-gal), the level of β-gal expression can be easily quantified using a well-established colorimetric ONPG-based assay (FIG. 2).
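  • The text does not specify how ONPG readings were converted to activity values. As one possible sketch, assuming the classic Miller unit calculation is used, β-gal activity and the fold activation over a control reporter could be computed as shown below. The function names, the choice of Miller units, and the idea of normalizing to a control are assumptions made for illustration.

```python
def miller_units(od420, od550, od600, minutes, volume_ml):
    """Classic Miller units for an ONPG assay (assumed here, not stated
    in the text): 1000 * (OD420 - 1.75 * OD550) / (t * v * OD600).

    od420: absorbance of the reaction (o-nitrophenol signal).
    od550: absorbance used to correct for light scattering by debris.
    od600: culture density at the time of assay.
    minutes: reaction time; volume_ml: culture volume assayed.
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("minutes, volume_ml and od600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


def fold_activation(test_units, control_units):
    """Activity of a test reporter relative to a control reporter."""
    if control_units <= 0:
        raise ValueError("control_units must be positive")
    return test_units / control_units
```

  • A fold activation near 1 would indicate no detectable recruitment of RNA polymerase by the zinc finger array at the tested site, while larger values would indicate binding.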
  • blaC expression can be selected for by using media containing carbenicillin and increasingly higher concentrations of the beta-lactamase inhibitor clavulanic acid. Gradients of clavulanic acid can be created within a single agar plate ( FIG. 6 ; see Materials and Methods), thereby enabling sampling of cells at various concentrations of the inhibitor.
  • CTCF zinc finger array variants that can bind to CBSs bearing single point mutations that abolish binding by the wild-type CTCF zinc finger array in this system.
  • CTCF zinc finger array variants that could bind to mutant CBSs bearing mutations of the C that is contacted by an aspartic acid (D) present at the third position (+3) of the alpha-helical recognition helix of ZF7 (shown by previously published co-crystal structures cited above).
  • CTCF zinc finger array variants that showed preferential binding activity (as judged by the B2H reporter assay) for the mutated CBS compared with the original consensus CBS.
  • These clones also showed selection for a particular amino acid at the ZF7 +3 position: for the C to T site, a threonine (T) was selected, for the C to A site, an asparagine (N) was selected, and for the C to G site a histidine (H) was selected.
  • Exogenous wild-type 3×HA-CTCF could bind to the wild-type CBSs and could not bind to the variant binding sites, the same as endogenous wild-type CTCF, suggesting that overexpression of CTCF by plasmid delivery reflects biologically relevant behavior (FIG. 23A).
  • the variant chosen was one isolated in the B2H selection assay and shown by the B2H reporter assay to bind to a variant site with the same sequence as variant sites 1 and 2 used in FIGS. 22-23B.
  • K562s were transfected with the 3×HA-tagged CTCF variant and the same sites as before were examined for binding activity by ChIP-qPCR.
  • Variant-specific HA enrichment was present at the variant binding sites and lacking at the wild-type sites, suggesting that we successfully evolved a variant that can specifically bind to mutant CBSs with as few as three nucleotide changes without binding native CBSs (FIG. 23B).
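  • The text does not state how ChIP-qPCR enrichment was calculated. One common convention, shown here purely as an assumed sketch, is the percent-of-input method, in which the input Ct is first adjusted for the fraction of chromatin saved as input and the immunoprecipitated signal is then expressed relative to it. The function name, the parameters, and the assumption of perfect doubling per cycle are illustrative.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-of-input for one ChIP-qPCR amplicon.

    ct_ip: Ct of the immunoprecipitated sample.
    ct_input: Ct of the input sample.
    input_fraction: fraction of chromatin kept as input (e.g. 0.01).
    Assumes 100% amplification efficiency (product doubles each cycle).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Shift the input Ct to what it would be for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)
```

  • Comparing the percent input obtained with the HA antibody at a variant site and at a wild-type site, for cells expressing either the variant or wild-type CTCF, would give the kind of site-specific enrichment comparison described above.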
  • CTCF has the capacity to alter gene expression through CTCF-Cohesin mediated looping of the genome.
  • variant CTCFs could reproduce the gene regulatory capacity of wild-type CTCF when bound to the endogenous variant binding sites.
  • K562s were nucleofected with variant CTCFs fused to GFP that had the capacity to bind to Variant site 1 and Variant site 2.
  • 6 genes showed a change in gene expression relative to cells nucleofected with the wild-type CTCF control (JJ388) ( FIG. 24A ).
  • 2 of the 10 genes identified for Variant site 2.1 and 2.2 had altered gene expression levels relative to wild-type control ( FIG. 24B ).
  • These data suggest that not only do the variant CTCF proteins bind to their target sequences in human cells, but they also reproduce the biological role of native CTCF in regulating gene expression, possibly through the formation of loops or sub-TADs.
  • CTCF variants could replicate the biological function of wild-type CTCF at a known CTCF binding site that creates an enhancer-promoter loop.
  • MYC expression is maintained by a loop formed between a CTCF binding site ~2 kb upstream of the transcriptional start site (TSS) of MYC and a CTCF binding site ~1 kb downstream of the MYC TSS (14).
  • cohesin links both CTCFs via the CTCF's cohesin-interaction domain, creating a loop that maintains the expression of MYC. If one or both of the CTCF binding sites is disrupted, the CTCF-mediated loop is lost and there is a reduction in MYC expression (14).
  • HA tagged wild-type CTCF and HA tagged CTCF variants were expressed in the cell line that contained their matching variant binding site.
  • Variants selected to bind to the G3 variant binding site were expressed in the G3_3 cell line, A3 variants in the A3_4 cell line, etc.
  • HA-tagged wild-type CTCF was also tested in each of the variant cell lines for binding and for recovery of endogenous MYC expression.
  • the level of endogenous MYC expression in exoMYC.K562 served as wild-type control as there is no alteration to the CTCF binding site upstream of the MYC TSS.
  • CTCF variants expressed in the engineered cell lines recovered endogenous MYC expression while expression of wild-type CTCF in these cell lines failed to recover MYC expression ( FIGS. 26A-29 ).
  • the same samples were analyzed for occupancy of the variant binding sites by wild-type CTCF or the variant CTCFs by ChIP-qPCR, enriching for CTCF-bound DNA fragments with CTCF or HA antibody. Wild-type CTCF had a reduced occupancy of the variant binding sites, consistent with continued reduction of MYC expression, while variant CTCF proteins could bind to the variant site they were selected for as well as rescue MYC expression (FIGS. 26A-29). Together, these data suggest that we have evolved CTCF variants that can bind to novel sequences and still interact with cohesin to form loops that maintain gene expression profiles.
  • Amino acid sequences of variants selected on different CTCF binding sites. All amino acid sequences are listed from N- to C-terminus. Colonies growing at the highest stringency of selection were scraped off and pooled, and the plasmid encoding the zinc finger was isolated and deep sequenced. The number of reads reflects how prominent the variant was in the population pooled from selections performed in triplicate.
  • ERPRM 1 2026 GGLKQ 1 2028 GMLKV 1 2217 GRLKA 1 2218 GRLKV 1 2030 GTLKQ 1 2219 GVLKE 1 2220 GVLTG 1 2221 HALDV 1 2031 HALKA 1 2222 HELKV 1 2223 HGLEA 1 2036 HGLKQ 1 2224 HGLRG 1 2225 HGMKA 1 2226 HGPKV 1 2044 HILIA 1 2227 HILKE 1 2228 HILKV 1 2229 HILNA 1 2230 HKLKG 1 2231 HKLKQ 1 2046 HKLRV 1 2048 HMLRE 1 1933 HPEG...
  • GDLSG 1 3573 GGLDQ 1 3878 GGLKD 1 3879 GGLKI 1 2659 GGLNR 1 3575 GGLPE 1 GH*R . . . 1 3393 GHLKR 1 3446 GHLLR 1 3580 GHLMV 1 3330 GHLRR 1 3363 GHLRV 1 3419 GHLTL 1 3448 GHLVG 1 3582 GILAG 1 3880 GILRM 1 3881 GK*RG 1 3584 GKLKA 1 3382 GKLKM 1 3882 GKLML 1 3883 GKLQV 1 3588 GKLRA 1 3884 GKLRQ 1 3885 GKLRT 1 3394 GKLTL 1 3593 GLLLD 1 3594 GLLMG 1 3364 GLLPG 1 3595 GLLRG 1 3886 GPLGQ 1 3597 GPLGV 1 3887 GPLMG 1 3888 GQLKA 1 3889 GRLAV 1 3890 GRLNA 1 3601 GSLST 1 3602 GSLVK 1 3603 GVLAG
  • RGLDM 1 3924 RGLDM 1 3925 RGLDR 1 3691 RGLTA 1 3926 RGLVA 1 2953 RGLVR 1 3692 RGLVV 1 3694 RHLRE 1 3697 RILPR 1 3698 RKLIV 1 3927 RKLKA 1 3928 RKLKV 1 3929 RKLRE 1 3930 RKLRV 1 3931 RKVRV 1 3700 RLLGA 1 3701 RLLMP 1 3932 RMLQE 1 3703 RMLVP 1 3933 RPLEV 1 3705 RRLVN 1 3706 RTLML 1 3707 RTLTQ 1 S*G . . .
  • TTLGR 1 2858 TTLGR 1 2859 TTLIR 1 5441 TTLRS 1 5442 TVLNR 1 3308 VSLRR 1 2995 VTLKR 1 5443 VTLQR 1 5444 VVLGN 1 5445 WRLDR 1 5446 WTLRR 1
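  • Read tallies of the kind listed above can be produced by counting how often each recognition-helix sequence appears among the deep-sequencing reads. The following is a minimal sketch that assumes the helix residues have already been extracted and translated from each read; the function name and input format are hypothetical and do not describe the pipeline actually used.

```python
from collections import Counter


def rank_variants(helix_calls):
    """Rank helix sequences by read count.

    helix_calls: iterable of amino acid strings, one per read
        (hypothetical pre-processed input).
    Returns a list of (sequence, reads, fraction_of_reads), most
    abundant first; ties keep their first-seen order.
    """
    counts = Counter(helix_calls)
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common()]
```

  • Variants with a single read, like most entries in the excerpt above, sit at the bottom of such a ranking, and the most prominent variants in the pooled population appear first.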

Abstract

Described herein are engineered CCCTC-binding factor (CTCF) variants that can bind to mutant CTCF binding sequences and methods of using the same.

Description

    CLAIM OF PRIORITY
  • This application is a divisional of U.S. patent application Ser. No. 16/415,989, filed May 17, 2019, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/672,682, filed on May 17, 2018 and U.S. Provisional Patent Application Ser. No. 62/828,277, filed on Apr. 2, 2019. The entire contents of the foregoing are hereby incorporated by reference.
  • FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
  • This invention was made with Government support under Grant No. GM118158 awarded by the National Institutes of Health. The Government has certain rights in the invention.
  • SEQUENCE LISTING
  • The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Dec. 10, 2020, is named Sequence Listing.txt and is 1,104,397 bytes in size.
  • TECHNICAL FIELD
  • The invention relates, at least in part, to engineered CCCTC-binding factor variants with altered DNA-binding specificities.
  • BACKGROUND
  • CCCTC-binding factor (CTCF) is a multi-domain protein that acts as an essential genome organizer by maintaining higher-order chromatin structure while also having a role in cell differentiation and the promotion or repression of gene expression (Ong and Corces, Nature Reviews Genetics (2014); Phillips and Corces, Cell (2009)). CTCF maintains topologically associated domains (TADs) spanning MBs of the genome as well as smaller scale Sub-TADs leading to fine-tuned gene insulation or gene activation within gene clusters (Ali et al., Current Opinion in Genetics & Development (2016); Nora et al., Nature (2012); Rao et al., Cell (2014)). In addition, CTCF has been found to regulate mRNA splicing by influencing the rate of transcription and more recently been implicated in promoting homologous recombination repair at double-strand breaks (Shukla et al., Nature (2011); Hilmi et al., Science Advances (2017); Han et al., Scientific Reports (2016)). CTCF binds throughout the genome via an 11 finger zinc finger (ZF) array that recognizes CTCF binding sites (CBSs). The CBS is typically 40 bp in length with a highly conserved 15 bp core sequence.
  • SUMMARY
  • The present invention is based, at least in part, on the development of engineered CTCF variants that can bind to mutant CBSs with higher affinity than a wild-type CTCF.
  • The present invention relates to an engineered CCCTC-binding factor (CTCF) variant including at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF, where the engineered CTCF variant binds to a mutant CTCF binding sequence (CBS) with a higher affinity than wild-type CTCF, the mutant CBS including at least one nucleotide base that differs in sequence from the nucleotide sequence of a consensus CBS, where the at least one amino acid residue that differs in sequence from the amino acid sequence of a wild-type CTCF is selected from the group consisting of the amino acid residues at the position(s) −1, +1, +2, +3, +5, and +6 of any of ZF7, ZF6, ZF5, ZF4, and ZF3 of the engineered CTCF variant.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CTCF binding sequence (CBS) that has a Thymine (T), Adenine (A), or Guanine (G) residue at position 2 of the consensus CBS motif, the engineered CTCF including an amino acid residue threonine, asparagine, or histidine at ZF7 position +3.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a G residue at position 2 of the consensus CBS motif, the engineered CTCF including the amino acid sequence DHLQT (SEQ ID NO: 8), EHLNV (SEQ ID NO: 9), AHLQV (SEQ ID NO: 10), EHLRE (SEQ ID NO: 11), DHLQV (SEQ ID NO: 12), EHLKV (SEQ ID NO: 13), EHLVV (SEQ ID NO: 15), DHLRT (SEQ ID NO: 16), or DHLAT (SEQ ID NO: 17) at ZF7 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, G, or C residue at position 3 of the consensus CBS motif, the engineered CTCF at ZF7 positions −1 to +3 including: the amino acid sequence RKHD (SEQ ID NO: 173) or RRSD (SEQ ID NO: 174), where the mutant CBS has a T residue at position 3 of the consensus CBS motif; the amino acid sequence RKAD (SEQ ID NO: 175), IPRI (SEQ ID NO: 176), RKHD (SEQ ID NO: 173), or RKDD (SEQ ID NO: 177), where the mutant CBS has a G residue at position 3 of the consensus CBS motif; or the amino acid sequence GIVN (SEQ ID NO: 178), ELLN (SEQ ID NO: 179), QALL (SEQ ID NO: 180) or PHRM (SEQ ID NO: 181), where the mutant CBS has a C residue at position 3 of the consensus CBS motif.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, G, or A residue at position 5 of the consensus CBS motif, the engineered CTCF at ZF6 positions +2 to +6 including: the amino acid sequence NAMKR (SEQ ID NO: 30), GNMAR (SEQ ID NO: 182), EGMTR (SEQ ID NO: 183), SNMVR (SEQ ID NO: 184), or NAMRG (SEQ ID NO: 185), where the mutant CBS has a T residue at position 5 of the consensus CBS motif; or the amino acid sequence EHMGR (SEQ ID NO: 31), DHIVINR (SEQ ID NO: 32), THMKR (SEQ ID NO: 33), EHMRR (SEQ ID NO: 34), or THIVINR (SEQ ID NO: 35), where the mutant CBS has a G residue at position 5 of the consensus CBS motif.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, G, or C residue at position 6 of the consensus CBS motif, the engineered CTCF at ZF6 positions −1 to +3 including: the amino acid sequence MNES (SEQ ID NO: 36) or HRES (SEQ ID NO: 37), where the mutant CBS has a T residue at position 6 of the consensus CBS motif; or the amino acid sequence RPDT (SEQ ID NO: 38), RTDI (SEQ ID NO: 39), or RHDT (SEQ ID NO: 40), where the mutant CBS has a G residue at position 6 of the consensus CBS motif.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a C, A, or T residue at position 7 of the consensus CBS motif, the engineered CTCF at ZF5 positions +2 to +6 including: the amino acid sequence HGLKV (SEQ ID NO: 41), HRLKE (SEQ ID NO: 42), HALKV (SEQ ID NO: 43), SRLKE (SEQ ID NO: 44), or DGLRV (SEQ ID NO: 45), where the mutant CBS has a T residue at position 7 of the consensus CBS motif; the amino acid sequence HTLKV (SEQ ID NO: 46), or HGLKV (SEQ ID NO: 41), where the mutant CBS has an A residue at position 7 of the consensus CBS motif; or the amino acid sequence SRLKE (SEQ ID NO: 44), HRLKE (SEQ ID NO: 42) or NRLKE (SEQ ID NO: 47), where the mutant CBS has a C residue at position 7 of the consensus CBS motif.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a C, A, or T residue at position 8 of the consensus CBS motif, the engineered CTCF at ZF5 positions +2 to +6 including: the amino acid sequence ATLKR (SEQ ID NO: 48), QALRR (SEQ ID NO: 49), GGLVR (SEQ ID NO: 50), or HGLIR (SEQ ID NO: 51), where the mutant CBS has a T residue at position 8 of the consensus CBS motif; the amino acid sequence ANLSR (SEQ ID NO: 52), TGLTR (SEQ ID NO: 53), HGLVR (SEQ ID NO: 54), or GGLTR (SEQ ID NO: 55), where the mutant CBS has an A residue at position 8 of the consensus CBS motif; the amino acid sequence HTLRR (SEQ ID NO: 56), TVLKR (SEQ ID NO: 57), ADLKR (SEQ ID NO: 58), or HGLRR (SEQ ID NO: 59), where the mutant CBS has a C residue at position 8 of the consensus CBS motif.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, A, or C residue at position 10 of the consensus CBS motif, the engineered CTCF at ZF4 positions +2 to +6 including: the amino acid sequence AHLRK (SEQ ID NO: 60), wherein the mutant CBS has a T residue at position 10 of the consensus CBS motif; the amino acid sequence AKLRV (SEQ ID NO: 61), EKLRI (SEQ ID NO: 186), or AKLRI (SEQ ID NO: 63), where the mutant CBS has an A residue at position 10 of the consensus CBS motif; or the amino acid sequence TKLKV (SEQ ID NO: 64), wherein the mutant CBS has a C residue at position 10 of the consensus CBS motif.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, A, or C residue at position 11 of the consensus CBS motif, the engineered CTCF at ZF4 positions +2 to +6 including: the amino acid sequence ATLRR (SEQ ID NO: 66) or RRLDR (SEQ ID NO: 67), where the mutant CBS has a T residue at position 11 of the consensus CBS motif; the amino acid sequence TNLRR (SEQ ID NO: 68), ANLRR (SEQ ID NO: 69), or GNLTR (SEQ ID NO: 70), where the mutant CBS has an A residue at position 11 of the consensus CBS motif; or the amino acid sequence AMLKR (SEQ ID NO: 71), HMLTR (SEQ ID NO: 72), AMLRR (SEQ ID NO: 73), or TMLRR (SEQ ID NO: 74), where the mutant CBS has a C residue at position 11 of the consensus CBS motif.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, A, or C residue at position 13 of the consensus CBS motif, the engineered CTCF at ZF3 positions +2 to +6 including: the amino acid sequence QQLIV (SEQ ID NO: 75), SQLIV (SEQ ID NO: 76), QQLLV (SEQ ID NO: 77), GELVV (SEQ ID NO: 78), or QQLLI (SEQ ID NO: 79), where the mutant CBS has a T residue at position 13 of the consensus CBS motif; the amino acid sequence GQLIV (SEQ ID NO: 80), GQLTV (SEQ ID NO: 81), GKLVT (SEQ ID NO: 187), TELII (SEQ ID NO: 82) or QGLLV (SEQ ID NO: 83), where the mutant CBS has an A residue at position 13 of the consensus CBS motif; or the amino acid sequence QQLLT (SEQ ID NO: 84), GQLLT (SEQ ID NO: 85), GELLT (SEQ ID NO: 86), or QQLLI (SEQ ID NO: 79), where the mutant CBS has a C residue at position 13 of the consensus CBS motif.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has A, G, T, and T residues at positions 2, 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence AKLKK (SEQ ID NO: 88), AKLRK (SEQ ID NO: 89), AHLRV (SEQ ID NO: 90), AKLRV (SEQ ID NO: 61), or SKLRL (SEQ ID NO: 92) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence ERLRV (SEQ ID NO: 93), NRLKV (SEQ ID NO: 94), SRLKE (SEQ ID NO: 44), or NRLKV (SEQ ID NO: 94) at ZF5 positions +2 to +6 of the engineered CTCF; (iii) the amino acid sequence RPDT (SEQ ID NO: 38), RTET (SEQ ID NO: 98), or RADV (SEQ ID NO: 99) at ZF6 positions −1 to +3 of the engineered CTCF; and (iv) the amino acid sequence DNLLA (SEQ ID NO: 100), SNLLV (SEQ ID NO: 101), DNLMA (SEQ ID NO: 102), or DNLRV (SEQ ID NO: 103) at ZF7 positions +2 to +6 of the engineered CTCF.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has G, G, T, and T residues at positions 2, 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLRK (SEQ ID NO: 60), or GKLRI (SEQ ID NO: 106) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence SRLKE (SEQ ID NO: 44), DALRR (SEQ ID NO: 108), DGLKR (SEQ ID NO: 109), or TRLRE (SEQ ID NO: 110) at ZF5 positions +2 to +6 of the engineered CTCF; (iii) the amino acid sequence RPDTMKR (SEQ ID NO: 188) or RTENMKM (SEQ ID NO: 189) at ZF6 positions −1 to +6 of the engineered CTCF; and (iv) the amino acid sequence EHLKV (SEQ ID NO: 13), DHLLA (SEQ ID NO: 114), or HHLDV (SEQ ID NO: 115) at ZF7 positions +2 to +6 of the engineered CTCF.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has A, G, and A residues at positions 2, 5, and 11 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence SNLRR (SEQ ID NO: 116), GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLKR (SEQ ID NO: 119), ANLRR (SEQ ID NO: 69), NNLRR (SEQ ID NO: 121), or TNLRR (SEQ ID NO: 68) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), THMKR (SEQ ID NO: 33), EHMNR (SEQ ID NO: 126), or EHMAR (SEQ ID NO: 127) at ZF6 positions +2 to +6 of the engineered CTCF; and (iii) the amino acid sequence DNLLT (SEQ ID NO: 128), DNLLV (SEQ ID NO: 129), DNLQT (SEQ ID NO: 130), DNLLA (SEQ ID NO: 100), DNLAT (SEQ ID NO: 132), DNLQA (SEQ ID NO: 133), DNLMA (SEQ ID NO: 102), or DNLMT (SEQ ID NO: 135) at ZF7 positions +2 to +6 of the engineered CTCF.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has G, G, and A residues at positions 2, 5, and 11 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLAR (SEQ ID NO: 138), GNLMR (SEQ ID NO: 139), ANLRR (SEQ ID NO: 69), SNLRR (SEQ ID NO: 116), or NNLRR (SEQ ID NO: 121) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence EHMNR (SEQ ID NO: 126), EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), SHMNR (SEQ ID NO: 146), SHMRR (SEQ ID NO: 147), THMKR (SEQ ID NO: 33), or DHMNR (SEQ ID NO: 32) at ZF6 positions +2 to +6 of the engineered CTCF; and (iii) the amino acid sequence EHLKV (SEQ ID NO: 13), EHLAE (SEQ ID NO: 151), STLNE (SEQ ID NO: 152), DHLQV (SEQ ID NO: 12), EHLNV (SEQ ID NO: 9), DHLNT (SEQ ID NO: 155), EHLQA (SEQ ID NO: 156), or HHLMH (SEQ ID NO: 157) at ZF7 positions +2 to +6 of the engineered CTCF.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has G, T, and T residues at positions 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLKK (SEQ ID NO: 159), TKLRL (SEQ ID NO: 160), TKLKL (SEQ ID NO: 161), GHLRK (SEQ ID NO: 162), THLKK (SEQ ID NO: 163), or AHLRK (SEQ ID NO: 60) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence TRLKE (SEQ ID NO: 165) or SRLKE (SEQ ID NO: 44) at ZF5 positions +2 to +6 of the engineered CTCF; and (iii) the amino acid sequence RADN (SEQ ID NO: 167), RHDT (SEQ ID NO: 40), RRDT (SEQ ID NO: 169), RPDT (SEQ ID NO: 38), RTSS (SEQ ID NO: 171), or RNDT (SEQ ID NO: 172) at ZF6 positions −1 to +3 of the engineered CTCF.
  • In some embodiments, the engineered CTCF variant includes at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF, where the engineered CTCF variant binds to a mutant CTCF binding sequence (CBS) with a higher affinity than wild-type CTCF, the mutant CBS including at least one nucleotide base that differs in sequence from the nucleotide sequence of a consensus CBS, where the at least one amino acid residue that differs in sequence from the amino acid sequence of a wild-type CTCF is selected from the group consisting of the amino acid residues at the position(s) −1, +1, +2, +3, +5, and +6 of any of ZF7, ZF6, ZF5, ZF4, and ZF3 of the engineered CTCF variant.
  • In some embodiments, the engineered CCCTC-binding factor (CTCF) variant binds with a higher affinity than a wild-type CTCF to a mutant CTCF binding sequence (CBS) that differs from a consensus CBS at position 2 of the consensus CBS motif, the engineered CTCF including a threonine, asparagine, or histidine residue at the ZF7 +3 position.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a C-to-G mutation at position 2 of the consensus CBS motif, the engineered CTCF including the amino acid sequence DHLQT (SEQ ID NO: 8), EHLNV (SEQ ID NO: 9), AHLQV (SEQ ID NO: 10), EHLRE (SEQ ID NO: 11), DHLQV (SEQ ID NO: 12), EHLKV (SEQ ID NO: 13), DHLQV (SEQ ID NO: 12), EHLVV (SEQ ID NO: 15), DHLRT (SEQ ID NO: 16), DHLAT (SEQ ID NO: 17), or DHLQT (SEQ ID NO: 8) at ZF7 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 3 of the consensus CBS motif, the engineered CTCF including the amino acid sequence RKHD (SEQ ID NO: 173), RRSD (SEQ ID NO: 174), GIVN (SEQ ID NO: 178), ELLN (SEQ ID NO: 179), or PHRM (SEQ ID NO: 181) at ZF7 positions −1 to +3.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 5 of the consensus CBS motif, the engineered CTCF including the amino acid sequence NAMKR (SEQ ID NO: 30), EHMGR (SEQ ID NO: 31), DHMNR (SEQ ID NO: 32), THMKR (SEQ ID NO: 33), EHMRR (SEQ ID NO: 34), or THMNR (SEQ ID NO: 35) at ZF6 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 6 of the consensus CBS motif, the engineered CTCF including the amino acid sequence MNES (SEQ ID NO: 36), HRES (SEQ ID NO: 37), RPDT (SEQ ID NO: 38), RTDI (SEQ ID NO: 39), or RHDT (SEQ ID NO: 40) at ZF6 positions −1 to +3.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 7 of the consensus CBS motif, the engineered CTCF including the amino acid sequence HGLKV (SEQ ID NO: 41), HRLKE (SEQ ID NO: 42), HALKV (SEQ ID NO: 43), SRLKE (SEQ ID NO: 44), DGLRV (SEQ ID NO: 45), HTLKV (SEQ ID NO: 46), or NRLKE (SEQ ID NO: 47) at ZF5 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 8 of the consensus CBS motif, the engineered CTCF including the amino acid sequence ATLKR (SEQ ID NO: 48), QALRR (SEQ ID NO: 49), GGLVR (SEQ ID NO: 50), HGLIR (SEQ ID NO: 51), ANLSR (SEQ ID NO: 52), TGLTR (SEQ ID NO: 53), HGLVR (SEQ ID NO: 54), GGLTR (SEQ ID NO: 55), HTLRR (SEQ ID NO: 56), TVLKR (SEQ ID NO: 57), ADLKR (SEQ ID NO: 58), or HGLRR (SEQ ID NO: 59) at ZF5 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 10 of the consensus CBS motif, the engineered CTCF including the amino acid sequence AHLRK (SEQ ID NO: 60), AKLRV (SEQ ID NO: 61), GGLGL (SEQ ID NO: 62), AKLRI (SEQ ID NO: 63), TKLKV (SEQ ID NO: 64), or SKLRV (SEQ ID NO: 65) at ZF4 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 11 of the consensus CBS motif, the engineered CTCF including the amino acid sequence ATLRR (SEQ ID NO: 66), RRLDR (SEQ ID NO: 67), TNLRR (SEQ ID NO: 68), ANLRR (SEQ ID NO: 69), GNLTR (SEQ ID NO: 70), AMLKR (SEQ ID NO: 71), HMLTR (SEQ ID NO: 72), AMLRR (SEQ ID NO: 73), or TMLRR (SEQ ID NO: 74) at ZF4 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 13 of the consensus CBS motif, the engineered CTCF including the amino acid sequence QQLIV (SEQ ID NO: 75), SQLIV (SEQ ID NO: 76), QQLLV (SEQ ID NO: 77), GELVV (SEQ ID NO: 78), QQLLI (SEQ ID NO: 79), GQLIV (SEQ ID NO: 80), GQLTV (SEQ ID NO: 81), TELII (SEQ ID NO: 82), QGLLV (SEQ ID NO: 83), QQLLT (SEQ ID NO: 84), GQLLT (SEQ ID NO: 85), GELLT (SEQ ID NO: 86), or QQLLI (SEQ ID NO: 79) at ZF3 positions +2 to +6.
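  • The single-position embodiments above pair each mutated CBS core position with one zinc finger and one window of recognition-helix positions. A minimal Python sketch of that pairing, transcribed from those embodiments, is given here for orientation only; the names `HELIX_WINDOW` and `fingers_for_positions` are hypothetical and are not part of the disclosure.

```python
# CBS core position -> (zinc finger, first helix position, last helix position),
# transcribed from the single-position embodiments listed above.
HELIX_WINDOW = {
    2: ("ZF7", 2, 6),
    3: ("ZF7", -1, 3),
    5: ("ZF6", 2, 6),
    6: ("ZF6", -1, 3),
    7: ("ZF5", 2, 6),
    8: ("ZF5", 2, 6),
    10: ("ZF4", 2, 6),
    11: ("ZF4", 2, 6),
    13: ("ZF3", 2, 6),
}


def fingers_for_positions(positions):
    """Group mutated CBS positions by the zinc finger whose helix window is varied."""
    grouped = {}
    for pos in positions:
        if pos not in HELIX_WINDOW:
            raise KeyError(f"no helix window listed for CBS position {pos}")
        finger, start, end = HELIX_WINDOW[pos]
        grouped.setdefault(finger, []).append((pos, start, end))
    return grouped


# Example: the two multi-position combinations recited in the embodiments.
print(fingers_for_positions([2, 6, 7, 10]))
print(fingers_for_positions([2, 5, 11]))
```

  • Under this mapping, a CBS mutated at positions 2, 6, 7, and 10 touches ZF7, ZF6, ZF5, and ZF4, and a CBS mutated at positions 2, 5, and 11 touches ZF7, ZF6, and ZF4, which matches the fingers recited in the corresponding multi-position embodiments.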
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 6, 7, and 10 of the consensus CBS motif, the engineered CTCF including:
  • (i) the amino acid sequence AKLKK (SEQ ID NO: 88), AKLRK (SEQ ID NO: 89), AHLRV (SEQ ID NO: 90), AKLRV (SEQ ID NO: 61), or SKLRL (SEQ ID NO: 92) at ZF4 positions +2 to +6;
  • (ii) the amino acid sequence ERLRV (SEQ ID NO: 93), NRLKV (SEQ ID NO: 94), SRLKE (SEQ ID NO: 44), or NRLKV (SEQ ID NO: 94) at ZF5 positions +2 to +6;
  • (iii) the amino acid sequence RPDT (SEQ ID NO: 38), RTET (SEQ ID NO: 98), or RADV (SEQ ID NO: 99) at ZF6 positions −1 to +3; and
  • (iv) the amino acid sequence DNLLA (SEQ ID NO: 100), SNLLV (SEQ ID NO: 101), DNLMA (SEQ ID NO: 102), or DNLRV (SEQ ID NO: 103) at ZF7 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 6, 7, and 10 of the consensus CBS motif, the engineered CTCF including:
  • (i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLRK (SEQ ID NO: 60), or GKLRI (SEQ ID NO: 106) at ZF4 positions +2 to +6;
  • (ii) the amino acid sequence SRLKE (SEQ ID NO: 44), DALRR (SEQ ID NO: 108), DGLKR (SEQ ID NO: 109), or TRLRE (SEQ ID NO: 110) at ZF5 positions +2 to +6;
  • (iii) the amino acid sequence RPDTMKR (SEQ ID NO: 188) or RTENMKM (SEQ ID NO: 189) at ZF6 positions −1 to +6; and
  • (iv) the amino acid sequence EHLKV (SEQ ID NO: 13), DHLLA (SEQ ID NO: 114), or HHLDV (SEQ ID NO: 115) at ZF7 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 5, and 11 of the consensus CBS motif, the engineered CTCF including:
  • (i) the amino acid sequence SNLRR (SEQ ID NO: 116), GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLKR (SEQ ID NO: 119), ANLRR (SEQ ID NO: 69), NNLRR (SEQ ID NO: 121), or TNLRR (SEQ ID NO: 68) at ZF4 positions +2 to +6;
  • (ii) the amino acid sequence EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), THMKR (SEQ ID NO: 33), EHMNR (SEQ ID NO: 126), or EHMAR (SEQ ID NO: 127) at ZF6 positions +2 to +6; and
  • (iii) the amino acid sequence DNLLT (SEQ ID NO: 128), DNLLV (SEQ ID NO: 129), DNLQT (SEQ ID NO: 130), DNLLA (SEQ ID NO: 100), DNLAT (SEQ ID NO: 132), DNLQA (SEQ ID NO: 133), DNLMA (SEQ ID NO: 102), or DNLMT (SEQ ID NO: 135) at ZF7 positions +2 to +6.
  • In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 5, and 11 of the consensus CBS motif, the engineered CTCF including:
  • (i) the amino acid sequence GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLAR (SEQ ID NO: 138), GNLMR (SEQ ID NO: 139), ANLRR (SEQ ID NO: 69), SNLRR (SEQ ID NO: 116), or NNLRR (SEQ ID NO: 121) at ZF4 positions +2 to +6;
  • (ii) the amino acid sequence EHMNR (SEQ ID NO: 126), EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), SHMNR (SEQ ID NO: 146), SHMRR (SEQ ID NO: 147), THMKR (SEQ ID NO: 33), or DHMNR (SEQ ID NO: 32) at ZF6 positions +2 to +6; and
  • (iii) the amino acid sequence EHLKV (SEQ ID NO: 13), EHLAE (SEQ ID NO: 151), STLNE (SEQ ID NO: 152), DHLQV (SEQ ID NO: 12), EHLNV (SEQ ID NO: 9), DHLNT (SEQ ID NO: 155), EHLQA (SEQ ID NO: 156), or HHLMH (SEQ ID NO: 157) at ZF7 positions +2 to +6.
  • In one embodiment, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 6, 7, and 10 of the consensus CBS motif, the engineered CTCF including:
  • (i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLKK (SEQ ID NO: 159), TKLRL (SEQ ID NO: 160), TKLKL (SEQ ID NO: 161), GHLRK (SEQ ID NO: 162), THLKK (SEQ ID NO: 163), or AHLRK (SEQ ID NO: 60) at ZF4 positions +2 to +6;
  • (ii) the amino acid sequence TRLKE (SEQ ID NO: 165) or SRLKE (SEQ ID NO: 44) at ZF5 positions +2 to +6; and
  • (iii) the amino acid sequence RADN (SEQ ID NO: 167), RHDT (SEQ ID NO: 40), RRDT (SEQ ID NO: 169), RPDT (SEQ ID NO: 38), RTSS (SEQ ID NO: 171), or RNDT (SEQ ID NO: 172) at ZF6 positions −1 to +3.
  • In some embodiments, the engineered CTCF variant interacts with cohesin to mediate the formation of an enhancer-promoter loop to modulate gene expression.
  • In another aspect, the invention features a method of treating a subject in need thereof, the method including administering to the subject a therapeutically effective amount of an engineered CTCF variant described herein.
  • In some embodiments, the subject can have cancer.
  • In another aspect, the invention features a method of activating or repressing expression of a gene which is under the control of a CBS bearing one or more mutations, the method including contacting the engineered CTCF according to any one of claims 1-15 with a sequence of interest in the gene, such that the expression of the gene is regulated.
  • In another aspect, the invention features a pharmaceutical composition including an engineered CTCF variant described herein.
  • In another aspect, the invention features a gene expression system for regulation of a gene, the system including a nucleic acid encoding an engineered CTCF variant described herein.
  • In another aspect, the invention features a method of altering the structure of chromatin including contacting an engineered CTCF variant according to any one of claims 1-15 with a sequence of interest to form a binding complex, such that the structure of the chromatin is altered.
  • In another aspect, the invention features a method of activating or repressing expression of a gene which is under the control of a CBS bearing one or more mutations, the method including contacting the CBS bearing one or more mutations with an engineered CTCF variant described herein.
  • In another aspect, the invention features a kit including an engineered CTCF variant described herein.
  • In this disclosure, “comprises,” “comprising,” “containing” and “having” and the like can have the meaning ascribed to them in U.S. Patent law and can mean “includes,” “including,” and the like; “consisting essentially of” or “consists essentially” likewise has the meaning ascribed in U.S. Patent law and the term is open-ended, allowing for the presence of more than that which is recited so long as basic or novel characteristics of that which is recited is not changed by the presence of more than that which is recited, but excludes prior art embodiments.
  • Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.
  • Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
  • DESCRIPTION OF DRAWINGS
  • The following Detailed Description, given by way of example, but not intended to limit the invention to the specific embodiments described, may be understood in conjunction with the accompanying figures, incorporated herein by reference.
  • FIG. 1: Diagram of the protein-DNA interactions of an exemplary 11-finger CTCF zinc finger array at the CTCF binding site. Each zinc finger of the 11-finger array contained a recognition alpha-helix where protein-DNA base contacts were made by amino acids in positions −1, 2, 3 and 6 of each alpha-helix. Here, only positions −1, 3, and 6 are depicted, because position 2 makes a cross-strand contact with the opposite strand of the binding site, which is not shown. The sequence for the binding site was derived from ChIP-seq data (Nakahashi et al., 2013). The binding site was partitioned into three segments: 5′ flanking (gray line), core (black line), and 3′ flanking (light gray line). The positions of the nucleotides within each segment are numbered. Dashes indicate known DNA-protein contacts (black) and theoretical DNA-protein contacts (gray) between the zinc finger array and the binding site. Zinc fingers 3-7 of the array (white) make protein-DNA contacts with the core sequence (bold, black lined). There was a possible 5-6 base pair gap (represented by horizontal dashed lines) between zinc finger 8 and zinc fingers 9-11 as suggested by ChIP-exo and DNase I footprinting of CTCF-bound DNA fragments (Hashimoto, H. et al., 2017). Note that CTCF binds to its target site in the 3′-5′ direction, with the N-terminal side of the protein binding to the 3′ end of the binding site. FIG. 1 discloses SEQ ID NO: 5544.
  • FIG. 2: Diagram of B2H Beta-galactosidase reporter assay. The B2H reporter assay used Gal11P-mediated recruitment of Gal4 to indicate binding. E. coli is transformed with two plasmids: one plasmid encoded for both a zinc finger-Gal11P fusion and an alpha N-terminal domain of RNA polymerase (α-NTD)-Gal4 fusion; the second plasmid contained a modifiable binding sequence upstream of a weak promoter that drives the expression of the lacZ gene, which encodes for β-galactosidase. A zinc finger-Gal11P fusion that was able to bind to the target sequence recruited the α-NTD-Gal4 fusion to the promoter, thereby inducing the expression of lacZ. This increase in β-galactosidase levels was detected by a simple colorimetric ONPG-based assay. The CTCF zinc finger array-gal11P fusion was bound to a CTCF binding site in this diagram, recruiting the α-NTD-Gal4 fusion to the promoter region upstream of lacZ, leading to expression.
  • FIG. 3: Fold activation in the B2H β-gal assay was greatest when CTCF zinc fingers 1-11 of the 11-finger array interacted with the full-length target site. Five target sites (sequences indicated in the legend) were tested along with the full CTCF zinc finger array and four different subsets (indicated on the x-axis). The core sequence (black and bolded), which is the most highly conserved sequence of CTCF binding sites, was tested independently and with different quantities of flanking sequence as derived from Hashimoto, H. et al. Mol. Cell. 2017 (black and light gray); Persikov, A and Singh, M. NAR. 2014 (medium gray); and Nakahashi, H. et al., Cell Rep. 2013 (very light gray and dark gray). The positive control reflects the binding activity of a known 3-finger zinc finger array that binds strongly in bacterial and human contexts to a known sequence. The negative control reflects baseline beta-galactosidase levels when the alpha N-terminal domain of RNA polymerase (α-NTD)-Gal4 fusion is not directly recruited to the promoter of lacZ. This baseline was used to calculate fold activation when the CTCF zinc finger array is fused to Gal11P. FIG. 3 discloses SEQ ID NOS 5545-5548 and 5544, respectively, in order of appearance.
  • FIG. 4: The CTCF zinc finger array is sensitive to sequence changes at certain positions of the core region within the CTCF binding site. Each of the four possible nucleotides at each position of the 40 bp reference CBS was tested for its ability to bind the CTCF zinc finger array in the B2H assay. Fold activation reflects binding activity above background β-galactosidase levels (background β-gal levels are obtained from samples with each binding site in the presence of the Gal4-RNA polymerase fusion with no zinc finger array fused to Gal11P). The reference sequence above is partitioned into three segments: 5′ flanking (dark gray lined), core (black lined), and 3′ flanking (gray lined). The positions of the nucleotides within each segment are numbered. Dashes indicate known DNA-protein contacts (black) and theoretical DNA-protein contacts (gray) between the zinc finger array and the binding site. Core sequence positions 1-15 of the binding site (black, bold) interact with zinc fingers 3-7 of the array (white, black outline) and appear to be most sensitive to changes in the binding sequence. Alterations to the 5′ flanking sequence as well as the 3′ flanking sequence did not negatively impact binding. FIG. 4 discloses SEQ ID NO: 5544.
  • FIG. 5: Maximizing binding potential of the CTCF binding site. Modifications were made to the reference binding site (bottom) to combine nucleotide changes that, individually, showed increased binding activity of the CTCF zinc finger array. The core sequence motif is bold while changes made are underlined. Binding activity of the 11-finger CTCF zinc finger array was quantified in the B2H Beta-galactosidase reporter assay in triplicate. Fold activation reflects binding activity above background levels when no DNA binding protein is present. FIG. 5 discloses SEQ ID NOS 5549-5550 and 5544, respectively, in order of appearance.
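  • The figure legends describe fold activation as reporter signal relative to a background measured without a DNA-binding protein, but they do not give an explicit formula. The sketch below assumes the simplest reading, the mean of replicate readings divided by the mean background. The function name and the numeric readings are hypothetical placeholders, not data from this disclosure.

```python
from statistics import mean


def fold_activation(sample_readings, background_readings):
    """Ratio of mean reporter signal to mean background signal (assumed definition)."""
    background = mean(background_readings)
    if background <= 0:
        raise ValueError("background signal must be positive")
    return mean(sample_readings) / background


# Placeholder triplicate readings in arbitrary units; NOT measurements from the disclosure.
sample = [30.0, 33.0, 27.0]
background = [10.0, 9.0, 11.0]
print(fold_activation(sample, background))
```

  • A value near 1 under this definition would indicate no binding above background; other normalizations (for example, per-replicate ratios) are equally plausible readings of the legends.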
  • FIG. 6: Diagram of B2H Beta-lactamase inhibitor selection. The selection system contained the same components as the reporter system except successful binding of the zinc finger array to the CBS drove BlaC expression, an inhibitor of the beta-lactamase class of antibiotics, instead of lacZ. Expression of BlaC allowed for growth on Carbenicillin plates. The selection was driven by the addition of Clavulanic acid, an inhibitor of beta lactamase inhibitors. Low level expression of BlaC can result in growth on Carbenicillin plates, but the addition of clavulanic acid inhibits BlaC activity and results in the depletion of false positives and further enrichment of strong binders to any modification made to the binding site. Libraries of mutations in the zinc finger array fused to gal11P were selected for binders to an altered binding sequence through low stringency conditions followed by selection on a gradient of clavulanic acid. Growth on the highest stringency end of the gradient indicated variants in the zinc finger array that are strong binders to the new binding sequence.
  • FIGS. 7A-7C: Binding activity of variants on altered CTCF binding sites. Variants picked from the high stringency gradient of the selective plates were tested for binding activity on sequences representing all four possible nucleotides at position 2 of the core sequence (gray star). Amino acid sequences of variants pulled out of the selection are listed above the heat map, and the nucleotide present at position 2 of the core sequence is indicated on the y-axis. FIG. 7A: The nucleotide at position 2 is T. FIG. 7B: The nucleotide at position 2 is A. FIG. 7C: The nucleotide at position 2 is G. Binding was quantified by the beta-galactosidase reporter system and colorimetric ONPG assay. Binding activity of the wild-type CTCF zinc finger array on the wild-type binding site sequence is indicated by the white dot. A diagram of the ZF7 alpha recognition helix for each nucleotide change is on the left. It includes the amino acid residues interacting with the triplet in the binding sequence. The amino acid at position 3 of the alpha helix was varied in the library and is indicated by an ‘X’. FIGS. 7A-C disclose “RKSXLGV” as SEQ ID NO: 5551.
  • FIG. 8: Increasing the variation within the recognition helix produced stronger binders. Four amino acids were targeted for variance in the library to allow for more flexibility in the selection and generate stronger binders to the modified binding site of choice. ZF7 targeting a C:G change at position 2 (gray star) of the core sequence was selected for variants using the expanded approach. Each amino acid codon was replaced with ‘VNS’ codons at the indicated sites (‘X’). Twelve colonies were picked from the high-stringency end of the selection and tested for their ability to bind to the CTCF binding site when the indicated nucleotide is at position 2 of the core sequence. Amino acid sequences of the variants selected are listed on the x-axis and the nucleotide at position 2 of the core sequence is on the y-axis. Wild-type zinc finger array binding activity on the wild-type binding sequence is indicated by the white dot. FIG. 8 discloses “RKSXLGV” as SEQ ID NO: 5551, “AHLQV” as SEQ ID NO: 10, “DHLRT” as SEQ ID NO: 16, “DHLAT” as SEQ ID NO: 17, “DHLQT” as SEQ ID NO: 8, “DHLQV” as SEQ ID NO: 12, “SDLGV” as SEQ ID NO: 5552, “EHLKV” as SEQ ID NO: 13, “EHLVV” as SEQ ID NO: 15, “EHLNV” as SEQ ID NO: 9 and “EHLRE” as SEQ ID NO: 11.
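  • To make concrete what a ‘VNS’ codon can encode, the sketch below expands it under the standard IUPAC nucleotide codes (V = A, C, or G; N = any base; S = C or G) and translates the result with the standard genetic code. It is an illustration only: the helper name `expand_degenerate_codon` is hypothetical, and the disclosure does not describe how the library oligonucleotides were designed beyond naming the codon.

```python
from itertools import product

# IUPAC degenerate nucleotide codes used in a 'VNS' codon.
IUPAC = {"V": "ACG", "N": "ACGT", "S": "CG"}

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_degenerate_codon(codon):
    """Return every concrete codon matched by a degenerate codon such as 'VNS'."""
    choices = [IUPAC.get(base, base) for base in codon]
    return ["".join(combo) for combo in product(*choices)]


vns_codons = expand_degenerate_codon("VNS")
encoded = sorted({CODON_TABLE[c] for c in vns_codons})
missing = sorted(set(AMINO_ACIDS) - set(encoded) - {"*"})

print(len(vns_codons))    # number of concrete codons
print("".join(encoded))   # amino acids reachable with VNS
print("".join(missing))   # amino acids VNS cannot encode
```

  • Because every stop codon begins with T, which V excludes, a VNS position cannot introduce a premature stop; under the standard code it also cannot encode Cys, Phe, Trp, or Tyr.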
  • FIGS. 9A-9C: Selected variants binding altered binding sites sequence at position 3 of core motif in CBS. Selections performed on library of variants centered around alterations in position −1 to 3 of recognition helix in ZF7 of the 11 finger CTCF zinc finger array. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 3 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by dashed lines. (A) Selections performed on A:T change in the binding site, (B) A:G change, (C) A:C change. Most variants pulled out had relaxed binding specificity instead of altered specificity. FIGS. 9A-C disclose “RKSD” as SEQ ID NO: 711, “RKHD” as SEQ ID NO: 173, “RRSD” as SEQ ID NO: 174, “RKAD” as SEQ ID NO: 175, “IPRI” as SEQ ID NO: 176, “RKDD” as SEQ ID NO: 177, “QALL” as SEQ ID NO: 180, “PHRM” as SEQ ID NO: 181, “ELLN” as SEQ ID NO: 179 and “GIVN” as SEQ ID NO: 178.
  • FIGS. 10A-10B: Selections performed targeting sequence changes at position 5 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of the ZF6 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 5 of the core motif of the CBS (gray star). Direct protein-DNA contacts were indicated by dashed lines. (A) Selections performed on C:T change in the binding site, (B) C:G change. No variants grew beyond the low stringency end of the gradient on selection plates for the C:A change, and these were considered weak/insufficient binders. Most variants pulled out had relaxed binding specificity instead of altered specificity, with the exception of THMKR (SEQ ID NO: 33) targeting the C:G change in the binding sequence. FIGS. 10A-B disclose “GNMAR” as SEQ ID NO: 182, “NAMKR” as SEQ ID NO: 30, “EGMTR” as SEQ ID NO: 183, “NAMRG” as SEQ ID NO: 185, “GTMKM” as SEQ ID NO: 1255, “SNMVR” as SEQ ID NO: 184, “DHMNR” as SEQ ID NO: 32, “EHMRR” as SEQ ID NO: 34, “EHMGR” as SEQ ID NO: 31, “THMNR” as SEQ ID NO: 35 and “THMKR” as SEQ ID NO: 33.
  • FIGS. 11A-11C: Selections performed targeting sequence changes at position 6 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position −1 to 3 of ZF6 recognition helix. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 6 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by dashed lines. (A) Selections performed on A:T change in the binding site, (B) A:G change, (C) A:C change. Variants analyzed from the A:T selection had relaxed binding profile while variants from A:G selection showed strong binding for only the changed nucleotide. No good binders were identified in the A:C selection. FIGS. 11A-C disclose “MMES” as SEQ ID NO: 36, “QSGT” as SEQ ID NO: 1582, “HRES” as SEQ ID NO: 37, “RHDT” as SEQ ID NO: 40, “RPDT” as SEQ ID NO: 38, “RTDI” as SEQ ID NO: 39, “RADN” as SEQ ID NO: 167 and “ERKS” as SEQ ID NO: 1479.
  • FIGS. 12A-12C: Selections performed targeting sequence changes at position 7 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF5 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 7 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. FIGS. 12A-C disclose “DGLRV” as SEQ ID NO: 45, “HGLKV” as SEQ ID NO: 41, “HRLKE” as SEQ ID NO: 42, “HALKV” as SEQ ID NO: 43, “YKLKR” as SEQ ID NO: 5553, “SRLKE” as SEQ ID NO: 44, “HTLKV” as SEQ ID NO: 46 and “NRLKE” as SEQ ID NO: 47.
  • FIGS. 13A-13C: Selections performed targeting sequence changes at position 8 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF5 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 8 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. Note the different variants that appear with the same library being used to bind to the same changes in the sequence, but in a different position on the binding site. FIGS. 13A-13C disclose “GGLVR” as SEQ ID NO: 50, “QALRR” as SEQ ID NO: 49, “HGLIR” as SEQ ID NO: 51, “YKLKR” as SEQ ID NO: 5553, “ATLKR” as SEQ ID NO: 48, “GGLTR” as SEQ ID NO: 55, “HGLVR” as SEQ ID NO: 54, “ANLSR” as SEQ ID NO: 52, “TGLTR” as SEQ ID NO: 53, “HGLRR” as SEQ ID NO: 59, “ADLKR” as SEQ ID NO: 58, “HTLRR” as SEQ ID NO: 56 and “TVLKR” as SEQ ID NO: 57.
  • FIGS. 14A-14C: Selections performed targeting sequence changes at position 10 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF4 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 10 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. G:C selection did not produce any growth at the high stringency end of the gradient selective plates. Binding data reflects colonies picked from mid-tier region, which is why they did not perform well as binders. White dot indicates wild-type CTCF zinc finger array binding activity on wild-type binding sequence. FIGS. 14A-C disclose “GHLRK” as SEQ ID NO: 162, “AKLRL” as SEQ ID NO: 3311, “AHLRK” as SEQ ID NO: 60, “SKLKR” as SEQ ID NO: 3470, “GGLGL” as SEQ ID NO: 62, “AKLRI” as SEQ ID NO: 63, “AKLRV” as SEQ ID NO: 61, “EKLRI” as SEQ ID NO: 186, “SKLRV” as SEQ ID NO: 65 and “TKLKV” as SEQ ID NO: 64.
  • FIGS. 15A-15C: Selections performed targeting sequence changes at position 11 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF4 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 11 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. FIGS. 15A-C disclose “RRLDR” as SEQ ID NO: 67, “SKLKR” as SEQ ID NO: 3470, “ATLRR” as SEQ ID NO: 66, “GNLTR” as SEQ ID NO: 70, “ANLRR” as SEQ ID NO: 69, “TNLRR” as SEQ ID NO: 68, “AMLRR” as SEQ ID NO: 73, “AMLKR” as SEQ ID NO: 71, “HMLTR” as SEQ ID NO: 72 and “TMLRR” as SEQ ID NO: 74.
  • FIGS. 16A-16C: Selections performed targeting sequence changes at position 13 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF3 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 13 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. FIGS. 16A-C disclose “QQLLI” as SEQ ID NO: 79, “QQLLV” as SEQ ID NO: 77, “QQLIV” as SEQ ID NO: 75, “GELVV” as SEQ ID NO: 78, “GELVR” as SEQ ID NO: 5554, “SQLIV” as SEQ ID NO: 76, “QGLLV” as SEQ ID NO: 83, “GQLTV” as SEQ ID NO: 81, “GQLIV” as SEQ ID NO: 80, “GKLVT” as SEQ ID NO: 187, “TELII” as SEQ ID NO: 82, “GQLLT” as SEQ ID NO: 85, “QQLLT” as SEQ ID NO: 84, “GELLT” as SEQ ID NO: 86 and “ATLAD” as SEQ ID NO: 5555.
  • FIG. 17: Binding activity of multi-finger variants on multiple sequence changes to the CBS. Diagram of the recognition helices of zinc fingers 4-7 out of the 11 finger array, binding to their respective triplets in the core motif of the CBS. Altered amino acids are indicated by ‘X’ and nucleotide changes to the wild-type CBS are indicated by a gray star in the diagram and by bolded letters. ZF1-3 and ZF8-11 were unmodified in this library. Protein-DNA contacts are indicated by lines between the ZF recognition helices and the CBS sequence. Wild-type CTCF 11-finger zinc finger array binding strength to wild-type CBS is indicated by a white dot. The amino acid sequences of each variant recognition helix in ZF4-7 are listed on the y-axis and binding activity on the modified CBS (changes in red) or the wild-type CBS is reflected by B2H β-gal reporter assay. FIG. 17 discloses “CGTGGTGCGAAC” as SEQ ID NO: 5556, “CAAGCGTGGTGCGCT” as SEQ ID NO: 5557, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “ERLRV” as SEQ ID NO: 93, “RPDT” as SEQ ID NO: 38, “DNLLA” as SEQ ID NO: 100, “AKLKK” as SEQ ID NO: 88, “AKLRK” as SEQ ID NO: 89, “NRLKV” as SEQ ID NO: 94, “RTET” as SEQ ID NO: 98, “SNLLV” as SEQ ID NO: 101, “AHLRV” as SEQ ID NO: 90, “SRLKE” as SEQ ID NO: 44, “DNLMA” as SEQ ID NO: 102, “AKLRV” as SEQ ID NO: 61, “SKLRL” as SEQ ID NO: 92, “RADV” as SEQ ID NO: 99 and “DNLRV” as SEQ ID NO: 103.
  • FIG. 18: Binding activity of multi-finger variants on multiple sequence changes to the CBS. The same selection as before except now there is a C:G change at position 2 of the CBS, where previously there was a C:A change. Variants pulled out of this selection had binding activity on the modified CBS without binding to the wild-type CBS. Wild-type 11-finger ZF array only showed binding activity on wild-type CBS (white dot) and no ability to bind to the modified CBS. Interestingly, the dominant variant selected for in the library contained a mutation that occurs at position 9 of the recognition helix that was either introduced during oligo synthesis (0.05% chance of the wrong nucleotide at each position) or through PCR while constructing these libraries. FIG. 18 discloses “CGTGGTGCGAGC” as SEQ ID NO: 5559, “CGAGCGTGGTGCGCT” as SEQ ID NO: 5560, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “GHLKK” as SEQ ID NO: 158, “SRLKE” as SEQ ID NO: 44, “EHLKV” as SEQ ID NO: 13, “RPDT(MK)R” as SEQ ID NO: 5561, “AHLRK” as SEQ ID NO: 60, “DALRR” as SEQ ID NO: 108, “RTEN” as SEQ ID NO: 112, “DHLLA” as SEQ ID NO: 114, “DGLKR” as SEQ ID NO: 109, “RPDT” as SEQ ID NO: 38, “HHLDV” as SEQ ID NO: 115, “GKLRI” as SEQ ID NO: 106 and “TRLRE” as SEQ ID NO: 110.
  • FIG. 19: Binding activity of multi-finger variants on multiple sequence changes to the CBS. Variants from individual pooled high stringency selections were stitched together and selected against three changes to the wild-type CBS (indicated by gray stars or bolded). Variants were assayed for binding on the modified CBS and the wild-type CBS alongside wild-type CTCF zinc finger array. The variants picked out of the selection were able to bind to the modified CBS, but not the wild-type sequence. Inversely, the wild-type zinc finger array was able to bind to the wild-type CBS, but not the modified one. FIG. 19 discloses “DTYKLKR” as SEQ ID NO: 3, “CAGGGGAGGAAC” as SEQ ID NO: 5562, “CAAGGAGGGGACGCT” as SEQ ID NO: 5563, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “SNLRR” as SEQ ID NO: 116, “EHMKR” as SEQ ID NO: 123, “DNLLT” as SEQ ID NO: 128, “GNLVR” as SEQ ID NO: 117, “EHMRR” as SEQ ID NO: 34, “DNLLV” as SEQ ID NO: 129, “GNLRR” as SEQ ID NO: 118, “THMKR” as SEQ ID NO: 33, “DNLQT” as SEQ ID NO: 130, “GNLKR” as SEQ ID NO: 119, “EHMNR” as SEQ ID NO: 126, “DNLLA” as SEQ ID NO: 100, “ANLRR” as SEQ ID NO: 69, “DNLAT” as SEQ ID NO: 132, “DNLQA” as SEQ ID NO: 133, “NNLRR” as SEQ ID NO: 121, “DNLMA” as SEQ ID NO: 102, “TNLRR” as SEQ ID NO: 68, “EHMAR” as SEQ ID NO: 127 and “DNLMT” as SEQ ID NO: 135.
  • FIG. 20: Binding activity of multi-finger variants on multiple sequence changes to the CBS. Variants from individual pooled high stringency selections were stitched together and selected against three changes to the wild-type CBS (indicated by gray stars or bolded). Variants were assayed for binding on the modified CBS and the wild-type CBS alongside wild-type CTCF zinc finger array. The variants picked out of the selection were able to bind to the modified CBS, but not the wild-type sequence. Inversely, the wild-type zinc finger array was able to bind to the wild-type CBS, but not the modified one. FIG. 20 discloses “DTYKLKR” as SEQ ID NO: 3, “CAGGGGAGGAGC” as SEQ ID NO: 5564, “CGAGGAGGGGACGCT” as SEQ ID NO: 5565, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “GNLVR” as SEQ ID NO: 117, “EHMNR” as SEQ ID NO: 126, “EHLKV” as SEQ ID NO: 13, “GNLRR” as SEQ ID NO: 118, “EHMKR” as SEQ ID NO: 123, “EHLAE” as SEQ ID NO: 151, “GNLAR” as SEQ ID NO: 138, “EHMRR” as SEQ ID NO: 34, “STLNE” as SEQ ID NO: 152, “GNLMR” as SEQ ID NO: 139, “SHMNR” as SEQ ID NO: 146, “DHLQV” as SEQ ID NO: 12, “ANLRR” as SEQ ID NO: 69, “SHMRR” as SEQ ID NO: 147, “EHLNV” as SEQ ID NO: 9, “SNLRR” as SEQ ID NO: 116, “DHLNT” as SEQ ID NO: 155, “EHLQA” as SEQ ID NO: 156, “NNLRR” as SEQ ID NO: 121, “THMKR” as SEQ ID NO: 33, “DHMNR” as SEQ ID NO: 32 and “HHLMH” as SEQ ID NO: 157.
  • FIG. 21: Binding activity of multi-finger variants on multiple sequence changes to the CBS. Variants from individual pooled high stringency selections were stitched together and selected against three changes to the wild-type CBS (indicated by gray stars or bolded). Variants were assayed for binding on the modified CBS and the wild-type CBS alongside wild-type CTCF zinc finger array. The variants picked out of the selection were able to bind to the modified CBS, but not the wild-type sequence. Inversely, the wild-type zinc finger array was able to bind to the wild-type CBS (white dot), but not the modified one. FIG. 21 discloses “CGTGGTGCGACC” as SEQ ID NO: 5566, “RKSDLGV” as SEQ ID NO: 5, “CCAGCGTGGTGCGCT” as SEQ ID NO: 5567, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “GHLKK” as SEQ ID NO: 158, “TRLKE” as SEQ ID NO: 165, “RADN” as SEQ ID NO: 167, “AHLKK” as SEQ ID NO: 159, “RHDT” as SEQ ID NO: 40, “TKLRL” as SEQ ID NO: 160, “SRLKE” as SEQ ID NO: 44, “RRDT” as SEQ ID NO: 169, “TKLKL” as SEQ ID NO: 161, “RPDT” as SEQ ID NO: 38, “GHLRK” as SEQ ID NO: 162, “RTSS” as SEQ ID NO: 171, “RNDT” as SEQ ID NO: 172, “THLKK” as SEQ ID NO: 163 and “AHLRK” as SEQ ID NO: 60.
  • FIG. 22: Wild-type CTCF has binding activity to wild-type CTCF target site and no binding activity to two variant target sites. To confirm in a human cell context that endogenous CTCF binds to the wild-type CBSs and not the variant binding sites, as seen in the B2H assay, we harvested K562 cells, a human erythroleukemia cell line, and examined CTCF binding through ChIP-qPCR. CTCF was assayed for binding to a known CTCF target site and to two endogenous variant binding site sequences using a CTCF specific antibody to enrich for genomic DNA crosslinked to CTCF. Two sets of qPCR primers were designed for each binding site (indicated by 1.1, 1.2, etc). Binding was determined by enrichment of the target site above 1% input of crosslinked and sonicated sample not treated with antibody, which represents the level of the site of interest as a fold increase over its frequency in a sample not enriched with antibody. Antibody based enrichment of each sample is quantified by fold enrichment above untreated, and therefore unenriched, input. The negative control reflects background qPCR amplification levels of a target site that CTCF does not bind to. Anything above this negative level is considered enriched, indicating CTCF binding, while anything below is considered not enriched, indicating no binding by CTCF. Wild-type CTCF binds to the wild-type target site with no detectable binding to the variant binding sites, as predicted by the bacterial B2H reporter assay.
  • FIGS. 23A-23B: Exogenous wild-type and variant CTCF binding activity in human cells. Two endogenous variant binding site sequences, matching one of the five variant binding sites that CTCF variants were selected on, were identified in the human genome (Variant site 1 and Variant site 2). Both wild-type CTCF with a 3×HA tag and one of the 3×HA tagged engineered CTCF variants, selected to bind to the variant binding site sequence of Variant site 1 and Variant site 2, were assayed for binding in human cells through ChIP-qPCR. FIG. 23A: 3×HA tagged wild-type CTCF binds to wild-type CTCF binding site and does not bind to either variant binding site. Human K562 cells were transfected with plasmid expressing 3×HA tagged CTCF and processed with HA antibody to enrich specifically for the exogenous CTCF (3×HA tagged) and not endogenous CTCF (no tag) binding. A negative control is provided to show ChIP-qPCR levels with no enrichment for a region that is not occupied by CTCF. These results demonstrate exogenous wild-type CTCF has the same binding activity as endogenous CTCF. FIG. 23B: 3×HA tagged variant CTCF binds to variant binding sites and does not bind to wild-type CTCF binding site. K562 cells expressing variant CTCF tagged with 3×HA were analyzed by ChIP-qPCR and treated with HA specific antibody. The same sites as in FIGS. 22 and 23A were investigated for variant CTCF binding. The variant CTCF could bind to the variant sites as indicated by enrichment with variant specific HA antibody and no detectable binding was seen at the wild-type binding site as indicated by lack of HA antibody-based enrichment.
  • FIGS. 24A-24B: Changes in gene expression relative to wild-type control of genes located around variant binding sites. A variant CTCF selected to the G3 binding site sequence and variant CTCF selected to the Other binding site sequence were expressed in wild-type K562s. The variant CTCFs were fused to GFP and RNA was isolated from GFP+ cells 72 hours post nucleofection. cDNA was generated from the RNA and quantified by RT-qPCR. Gene expression levels across samples were normalized to a housekeeping gene (HPRT). Changes in gene expression are relative to gene expression levels in wild-type K562s expressing wild-type CTCF tagged with GFP. FIG. 24A. Changes in gene expression of genes around G3 variant binding site in the presence of variant CTCF relative to the wild-type CTCF control. FIG. 24B. Changes in gene expression of genes around Other variant binding site relative to the wild-type control.
  • FIG. 25: Introduction of variant binding sites upstream of MYC leads to reduction of endogenous MYC expression. The CTCF binding site ˜2 kb upstream of the MYC TSS was replaced with one of six different sequences used for CTCF variant selections (listed in table). The introduction of these sequences with 4-6 nucleotide changes from the wild-type CTCF binding site sequence results in a reduction of endogenous MYC expression to the same levels as when the CTCF binding site is deleted and loop formation is disrupted. WT_6 sequence has 4 point mutations from the native CTCF binding site, but these changes should have no impact on wild-type CTCF binding as indicated by results from the B2H reporter assay. This appears to be the case as MYC expression levels in the WT_6 cell line are comparable to wild-type K562 MYC expression levels. Because K562 vitality is linked to MYC expression, all variant cell lines were generated in a K562 cell line with exogenous MYC expressed off of a separate PGK promoter (exoMYC.K562). FIG. 25 discloses SEQ ID NOS 5568-5573, respectively, in order of appearance.
  • FIGS. 26A-26B: Variant CTCFs are able to bind the engineered G3 variant binding site and recover MYC expression. CTCF variants selected to bind to the G3 variant binding site sequence were expressed in the G3_3.K562 cell line. Cells were analyzed for MYC expression and CTCF occupancy on the DNA 72 hours post nucleofection. Residues of ZF helix of the variant and wild-type (indicated by (wt)) are listed in the legend. G3 binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagrammed. FIG. 26A. Endogenous MYC levels are recovered to wild-type levels in the G3_3 cell line when CTCF variants are expressed. Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of G3_3 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines). FIG. 26B. CTCF variants are able to bind to the introduced variant binding site in G3_3 cell line while the wild-type CTCF does not. CTCF Ab specific enrichment captures both wild-type and variant CTCF while HA Ab will only detect HA-tagged CTCF (transiently expressed). exoMYC.K562 is included as a control for ChIP-qPCR and is separated by dashed line. exoMYC.K562 has the native sequence at the CTCF binding site upstream of MYC and should demonstrate wild-type CTCF binding. The exogenously expressed CTCFs (variant and wild-type) are HA tagged and expressed in the G3_3 cell line. ChIP-qPCR was performed to investigate CTCF binding to the variant CTCF site replacing the wild-type site ˜2 kb upstream of MYC (MYC site). An endogenous G3 site elsewhere in the genome and a region with no known CTCF binding served as a positive and negative control respectively. The variant CTCFs are able to bind to the variant site as indicated by enrichment with both CTCF and HA antibody, while the wild-type CTCF does not. FIGS. 26A-B disclose “CAGGGGAGGAGC” as SEQ ID NO: 5564, “DTYKLKR” as SEQ ID NO: 3, “SNLRR” as SEQ ID NO: 116, “GNLRR” as SEQ ID NO: 118, “GNLVR” as SEQ ID NO: 117, “ANLRR” as SEQ ID NO: 69, “GNLMR” as SEQ ID NO: 139, “NNLRR” as SEQ ID NO: 121, “GNLAR” as SEQ ID NO: 138, “SKLKR” as SEQ ID NO: 3470, “EHMKR” as SEQ ID NO: 123, “EHMRR” as SEQ ID NO: 34, “EHMNR” as SEQ ID NO: 126, “SHMRR” as SEQ ID NO: 147, “SHMNR” as SEQ ID NO: 146, “THMKR” as SEQ ID NO: 33, “DHMNR” as SEQ ID NO: 32, “GTMKM” as SEQ ID NO: 1255, “DHLNT” as SEQ ID NO: 155, “EHLAE” as SEQ ID NO: 151, “DHLQV” as SEQ ID NO: 12, “EHLKV” as SEQ ID NO: 13, “STLQE” as SEQ ID NO: 225, “EHLNV” as SEQ ID NO: 9, “STLNE” as SEQ ID NO: 152, “EHLQA” as SEQ ID NO: 156, “HHLMH” as SEQ ID NO: 157 and “SDLGV” as SEQ ID NO: 5552.
  • FIGS. 27A-27B: Variant CTCFs are able to bind the engineered A3 variant binding site and recover MYC expression. CTCF variants selected to bind to the A3 variant binding site sequence were expressed in the A3_4.K562 cell line. Cells were analyzed for MYC expression and CTCF occupancy on the DNA 72 hours post nucleofection. Residues of ZF helix of the variant and wild-type (indicated by (wt)) are listed in the legend. A3 binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagrammed. FIG. 27A. Endogenous MYC levels are recovered to wild-type levels in the A3_4 cell line when CTCF variants are expressed. Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of A3_4 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines). FIG. 27B. CTCF variants are able to bind to the introduced variant binding site in A3_4 cell line while the wild-type CTCF does not. CTCF Ab specific enrichment captures both wild-type and variant CTCF while HA Ab will only detect HA-tagged CTCF (transiently expressed). exoMYC.K562 is included as a control for ChIP-qPCR and is separated by dashed line. exoMYC.K562 has the native sequence at the CTCF binding site upstream of MYC and should demonstrate wild-type CTCF binding. The exogenously expressed CTCFs (variant and wild-type) are HA tagged and expressed in the A3_4 cell line. ChIP-qPCR was performed to investigate CTCF binding to the variant CTCF site replacing the wild-type site ˜2 kb upstream of MYC (MYC site). An endogenous A3 site elsewhere in the genome and a region with no known CTCF binding served as a positive and negative control respectively. The variant CTCFs are able to bind to the variant site as indicated by enrichment with both CTCF and HA antibody above the negative control, while the wild-type CTCF does not bind. FIGS. 27A-B disclose “CAGGGGAGGAAC” as SEQ ID NO: 5562, “DTYKLKR” as SEQ ID NO: 3, “GNLKR” as SEQ ID NO: 119, “GNLVR” as SEQ ID NO: 117, “SNLRR” as SEQ ID NO: 116, “ANLRR” as SEQ ID NO: 69, “GNLRR” as SEQ ID NO: 118, “NNLRR” as SEQ ID NO: 121, “TNLRR” as SEQ ID NO: 68, “SKLKR” as SEQ ID NO: 3470, “EHMNR” as SEQ ID NO: 126, “EHMRR” as SEQ ID NO: 34, “EHMKR” as SEQ ID NO: 123, “THMKR” as SEQ ID NO: 33, “EHMAR” as SEQ ID NO: 127, “GTMKM” as SEQ ID NO: 1255, “DNLLA” as SEQ ID NO: 100, “DNLLV” as SEQ ID NO: 129, “DNLQA” as SEQ ID NO: 133, “DNLLT” as SEQ ID NO: 128, “DNLAT” as SEQ ID NO: 132, “DNLQT” as SEQ ID NO: 130, “DNLMA” as SEQ ID NO: 102, “DNLMT” as SEQ ID NO: 135 and “SDLGV” as SEQ ID NO: 5552.
  • FIG. 28: Variant CTCFs recover MYC expression of the Other 10 variant binding site cell line. CTCF variants selected to bind to the Other variant binding site sequence were expressed in the Other 10.K562 cell line. Cells were analyzed for MYC expression 72 hours post nucleofection. Residues of ZF helix of the variant and wild-type CTCFs (indicated by (wt)) are listed in the legend. Other binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagrammed. A. Endogenous MYC levels are recovered to wild-type levels in the Other 10 cell line when CTCF variants are expressed. Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of Other 10 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines). FIG. 28 discloses “RKSDLGV” as SEQ ID NO: 5, “CGTGGTGCGACC” as SEQ ID NO: 5574, “TKLRL” as SEQ ID NO: 160, “THLKK” as SEQ ID NO: 163, “GHLRK” as SEQ ID NO: 162, “TKLKL” as SEQ ID NO: 161, “AHLRK” as SEQ ID NO: 60, “AHLKK” as SEQ ID NO: 159, “SKLKR” as SEQ ID NO: 3470, “SRLKE” as SEQ ID NO: 44, “TRLKE” as SEQ ID NO: 165, “YKLKR” as SEQ ID NO: 5553, “RRDT” as SEQ ID NO: 169, “RPDT” as SEQ ID NO: 38, “RNDT” as SEQ ID NO: 172, “RADN” as SEQ ID NO: 167, “RHDT” as SEQ ID NO: 40 and “QSGT” as SEQ ID NO: 1582.
  • FIG. 29: Variant CTCFs recover MYC expression of the Aother_2 variant binding site cell line. CTCF variants selected to bind to the Aother variant binding site sequence were expressed in the Aother_2.K562 cell line. Cells were analyzed for MYC expression 72 hours post nucleofection. Residues of ZF helix of the variant and wild-type CTCFs (indicated by (wt)) are listed in the legend. Aother binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagrammed. A. Endogenous MYC levels are recovered to wild-type levels in the Aother_2 cell line when CTCF variants are expressed. Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of Aother_2 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines). FIG. 29 discloses “CGTGGTGCGAAC” as SEQ ID NO: 5575, “AKLRK” as SEQ ID NO: 89, “AKLRV” as SEQ ID NO: 61, “SKLRL” as SEQ ID NO: 92, “SKLKR” as SEQ ID NO: 3470, “NRLKV” as SEQ ID NO: 94, “SRLKE” as SEQ ID NO: 44, “YKLKR” as SEQ ID NO: 5553, “RTET” as SEQ ID NO: 98, “RPDT” as SEQ ID NO: 38, “RADV” as SEQ ID NO: 99, “QSGT” as SEQ ID NO: 1582, “SNLLV” as SEQ ID NO: 101, “DNLMA” as SEQ ID NO: 102, “DNLRV” as SEQ ID NO: 103 and “SDLGV” as SEQ ID NO: 5552.
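  • The ‘VNS’ codons referred to in the selections of FIGS. 9-16 can be illustrated with a minimal sketch. It assumes that ‘VNS’ follows the IUPAC degenerate-base convention (V = A, C or G; N = any base; S = C or G) and that the standard genetic code applies; the function and variable names are illustrative only and do not appear in the disclosure.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate codes needed for a 'VNS' codon (assumed convention)
IUPAC = {"V": "ACG", "N": "ACGT", "S": "CG"}


def expand_degenerate_codon(codon):
    """Return every concrete codon covered by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in codon))]


vns_codons = expand_degenerate_codon("VNS")
vns_residues = sorted({CODON_TABLE[c] for c in vns_codons})

print(len(vns_codons))        # number of distinct codons
print("".join(vns_residues))  # amino acids reachable at a randomized position
```

  • Under these assumptions a ‘VNS’ position covers 24 codons encoding 16 amino acids, with no stop codons and no Cys, Phe, Trp or Tyr, so a library randomized at four helix positions would contain 16^4 distinct protein sequences.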
  • DETAILED DESCRIPTION
  • To date, there are no engineered CTCF variants available that are designed to bind to mutant CBSs with higher affinity than wild-type CTCF. Therefore, there is a need for engineered CTCF variants that can bind to mutant CBSs with higher affinity than wild-type CTCF.
  • The present disclosure is based, at least in part, on the discovery that CTCF variants with alterations in the zinc finger array can be engineered to recognize CBSs that harbor one or more point mutations, i.e., mutant CBSs.
  • CTCF
  • CCCTC-binding factor (CTCF) is a multi-domain protein that acts as an essential genome organizer by maintaining higher-order chromatin structure while also having a role in cell differentiation and the promotion or repression of gene expression. CTCF maintains topologically associated domains (TADs) spanning megabases of the genome as well as smaller scale sub-TADs leading to fine-tuned gene insulation or gene activation within gene clusters. In addition, CTCF has been found to regulate mRNA splicing by influencing the rate of transcription and has more recently been implicated in promoting homologous recombination repair at double-strand breaks. Wild-type CTCF binds throughout the genome via an 11 finger zinc finger array that recognizes canonical CTCF binding sites (CBSs).
  • Wild-type CTCF ZF arrays comprise the following sequences at ZFs 3-7 positions −1 to +6:
  • ZF3 positions −1 to +6:
    (SEQ ID NO: 1)
    TSGELVR
    ZF4 positions −1 to +6:
    (SEQ ID NO: 2)
    EVSKLKR
    ZF5 positions −1 to +6:
    (SEQ ID NO: 3)
    DTYKLKR
    ZF6 positions −1 to +6:
    (SEQ ID NO: 4)
    QSGTMKM
    ZF7 positions −1 to +6:
    (SEQ ID NO: 5)
    RKSDLGV
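  • The relationship between these seven-residue helices and the shorter variant sequences named in the figure descriptions can be sketched as follows. The sketch assumes the conventional numbering in which the seven residues occupy positions −1, 1, 2, 3, 4, 5 and 6 (there is no position 0); the helper names are hypothetical and are not part of the disclosure.

```python
# Assumed numbering of the seven recognition-helix residues (no position 0)
HELIX_POSITIONS = (-1, 1, 2, 3, 4, 5, 6)

# Wild-type helices as listed above (SEQ ID NOs: 1-5)
WILD_TYPE_HELICES = {
    "ZF3": "TSGELVR",
    "ZF4": "EVSKLKR",
    "ZF5": "DTYKLKR",
    "ZF6": "QSGTMKM",
    "ZF7": "RKSDLGV",
}


def helix_window(helix, first, last):
    """Return the residues from helix position `first` to `last`, inclusive."""
    i = HELIX_POSITIONS.index(first)
    j = HELIX_POSITIONS.index(last)
    return helix[i:j + 1]


def substitute_window(helix, first, residues):
    """Replace residues starting at helix position `first` with `residues`."""
    i = HELIX_POSITIONS.index(first)
    if i + len(residues) > len(helix):
        raise ValueError("substitution runs past the end of the helix")
    return helix[:i] + residues + helix[i + len(residues):]


# Positions 2 to 6 of wild-type ZF5, and positions -1 to 3 of wild-type ZF6
print(helix_window(WILD_TYPE_HELICES["ZF5"], 2, 6))
print(helix_window(WILD_TYPE_HELICES["ZF6"], -1, 3))
# A ZF5 variant carrying "HGLKV" at positions 2 to 6
print(substitute_window(WILD_TYPE_HELICES["ZF5"], 2, "HGLKV"))
```

  • With this numbering, the wild-type windows reproduce the five- and four-residue wild-type entries given in the figure descriptions, for example “YKLKR” for ZF5 positions 2 to 6 and “QSGT” for ZF6 positions −1 to 3.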
  • A wild-type CTCF has an amino acid sequence that has greater than 80%, greater than 85%, greater than 90%, greater than 95%, greater than 96%, greater than 97%, greater than 98% or greater than 99% sequence identity as compared to the amino acid sequence shown below:
  • (SEQ ID NO: 190)
    MEGDAVEAIVEESETFIKGKERKTYQRRREGGQEEDACHLPQNQTDGGEV
    VQDVNSSVQMVMMEQLDPTLLQMKTEVMEGTVAPEAEAAVDDTQIITLQV
    VNMEEQPINIGELQLVQVPVPVTVPVATTSVEELQGAYENEVSKEGLAES
    EPMICHTLPLPEGFQVVKVGANGEVETLEQGELPPQEDPSWQKDPDYQPP
    AKKTKKTKKSKLRYTEEGKDVDVSVYDFEEEQQEGLLSEVNAEKVVGNMK
    PPKPTKIKKKGVKKTFQCELCSYTCPRRSNLDRHMKSHTDERPHKCHLCG
    RAFRTVTLLRNHLNTHTGTRPHKCPDCDMAFVTSGELVRHRRYKHTHEKP
    FKCSMCDYASVEVSKLKRHIRSHTGERPFQCSLCSYASRDTYKLKRHMRT
    HSGEKPYECYICHARFTQSGTMKMHILQKHTENVAKFHCPHCDTVIARKS
    DLGVHLRKQHSYIEQGKKCRYCDAVFHERYALIQHQKSHKNEKRFKCDQC
    DYACRQERHMEVIHKRTHTGEKPYACSHCDKTFRQKQLLDMHFKRYHDPN
    FVPAAFVCSKCGKTFTRRNTMARHADNCAGPDGVEGENGGETKKSKRGRK
    RKMRSKKEDSSDSENAEPDLDDNEDEEEPAVEIEPEPEPQPVTPAPPPAK
    KRRGRPPGRTNQPKQNQPTAIIQVEDQNTGAIENIIVEVKKEPDAEPAEG
    EEEEAQPAATDAPNGDLTPEMILSMMDR
  • For the purpose of comparing two different nucleic acid or polypeptide sequences, one sequence (test sequence) may be described to be a specific percentage identical to another sequence (comparison sequence). The percentage identity can be determined by the algorithm of Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993), which is incorporated into various BLAST programs. The percentage identity can be determined by the “BLAST 2 Sequences” tool, which is available at the National Center for Biotechnology Information (NCBI) website. See Tatusova and Madden, FEMS Microbiol. Lett., 174(2):247-250 (1999). For pairwise DNA-DNA comparison, the BLASTN program is used with default parameters (e.g., Match: 1; Mismatch: −2; Open gap: 5 penalties; extension gap: 2 penalties; gap x_dropoff: 50; expect: 10; and word size: 11, with filter). For pairwise protein-protein sequence comparison, the BLASTP program can be employed using default parameters (e.g., Matrix: BLOSUM62; gap open: 11; gap extension: 1; x_dropoff: 15; expect: 10.0; and wordsize: 3, with filter). Percent identity of two sequences is calculated by aligning a test sequence with a comparison sequence using BLAST, determining the number of amino acids or nucleotides in the aligned test sequence that are identical to amino acids or nucleotides in the same position of the comparison sequence, and dividing the number of identical amino acids or nucleotides by the number of amino acids or nucleotides in the comparison sequence. When BLAST is used to compare two sequences, it aligns the sequences and yields the percent identity over defined, aligned regions. If the two sequences are aligned across their entire length, the percent identity yielded by the BLAST is the percent identity of the two sequences. 
If BLAST does not align the two sequences over their entire length, then the number of identical amino acids or nucleotides in the unaligned regions of the test sequence and comparison sequence is considered to be zero and the percent identity is calculated by adding the number of identical amino acids or nucleotides in the aligned regions and dividing that number by the length of the comparison sequence. Various versions of the BLAST programs can be used to compare sequences, e.g., BLAST 2.1.2 or BLAST+ 2.2.22.
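  • The percent identity arithmetic described above can be sketched as follows. This is a minimal illustration that assumes the alignment has already been produced (for example by BLAST) and is supplied as two equal-length strings with “-” marking gaps; it does not perform the alignment itself, and the function name and parameters are hypothetical.

```python
def percent_identity(aligned_test, aligned_comparison, comparison_length=None):
    """Percent identity relative to the length of the comparison sequence.

    aligned_test / aligned_comparison: equal-length aligned strings, '-' = gap.
    comparison_length: full length of the comparison sequence; supply it when
    the alignment covers only part of that sequence, so that unaligned
    regions contribute zero identical positions.
    """
    if len(aligned_test) != len(aligned_comparison):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1
        for a, b in zip(aligned_test, aligned_comparison)
        if a == b and a != "-"
    )
    if comparison_length is None:
        comparison_length = len(aligned_comparison.replace("-", ""))
    return 100.0 * identical / comparison_length


# Full-length alignment of two seven-residue sequences (4 of 7 identical)
print(round(percent_identity("DTHGLKV", "DTYKLKR"), 2))
# Alignment covering only 5 residues of a 7-residue comparison sequence
print(round(percent_identity("GELVR", "GELVR", comparison_length=7), 2))
```

  • In the second call the two unaligned residues count as non-identical, which mirrors the rule above that unaligned regions contribute zero to the numerator while the denominator remains the full length of the comparison sequence.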
  • CTCF Binding Sites (CBSs)
  • The CBS is typically 40 bp in length with a highly conserved 15 bp core sequence (or core motif). Sequence flanking the core sequence is significantly less well conserved, but still important for CTCF binding at sites throughout the genome (FIG. 1).
  • Wild-type CTCF binds to a “consensus CBS motif” that contains the following core sequence:
  • (SEQ ID NO: 191)
    5′-NCDNHNGRNGDNNNN-3′.
  • In one embodiment, the consensus CBS motif contains the following core sequence: 5′-CCAGCAGGGGGCGCT-3′ (SEQ ID NO:6). Other core sequences are known in the art.
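  • As an illustration of how the degenerate consensus relates to a specific core sequence, the following sketch converts the motif of SEQ ID NO: 191 into a regular expression. It assumes the motif is written in standard IUPAC nucleotide codes (e.g., D = A, G or T; H = A, C or T; R = A or G; N = any base); the function names are illustrative only.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

CONSENSUS_CORE = "NCDNHNGRNGDNNNN"  # SEQ ID NO: 191


def iupac_to_regex(motif):
    """Compile an IUPAC-coded DNA motif into a regular expression."""
    return re.compile("".join(IUPAC_DNA[base] for base in motif))


CORE_PATTERN = iupac_to_regex(CONSENSUS_CORE)


def matches_consensus_core(sequence):
    """True if a 15-nt sequence satisfies every position of the consensus."""
    return CORE_PATTERN.fullmatch(sequence) is not None


print(matches_consensus_core("CCAGCAGGGGGCGCT"))  # core of SEQ ID NO: 6
print(matches_consensus_core("CAAGCAGGGGGCGCT"))  # C-to-A change at position 2
```

  • A match against the degenerate motif only indicates that each position carries an allowed base; it is not, by itself, a prediction of binding strength.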
  • It is not known if the nucleotides flanking the core sequence are bound by the 11 finger ZF array present within CTCF. Co-crystal structures of the 11-finger Zinc Finger (ZF) array bound to a consensus CTCF Binding Sequence (CBS) suggest that only ZFs 3-7 of the 11-finger ZF array appear to bind directly to the highly conserved core sequence, while ZFs 8-11 and 1-2 do not appear to mediate sequence-specific contacts. Progressive truncations of the ZF array suggest that ZFs 8-11 and ZFs 1-2 may improve DNA-binding of CTCF to CBSs, and DNase I footprinting, as well as ChIP-Seq and ChIP-Exo data, suggests that ZFs 9-11 may make important protein-DNA contacts (Rhee and Pugh, Cell (2011); Nakahashi et al., Cell Reports (2013)). Interestingly, the co-crystal structure of the CTCF ZF array bound to a CBS only contains zinc fingers 2-9 with the other fingers not visible in the structure, consistent with the idea that zinc fingers interacting with flanking regions of the motif may not make stable contacts with the DNA (Hashimoto, et al., Molecular Cell (2017)). Thus, it remains unclear what impact all 11 fingers of the array have on DNA binding activity of CTCF and if all zinc fingers, or a subset, contact the DNA.
  • CTCF binding is sensitive to changes in the conserved 15 bp core motif of the CBS, where, in mice, single nucleotide changes at certain positions can lead to loss of CTCF binding (Nakahashi et al., Cell Reports (2013)). CTCF binding sites have been reported to be mutational hotspots in cancer with cancer-associated mutations localized to the core sequence of the CTCF binding site in primary samples from gastrointestinal cancer patients and with accompanying atypical gene expression profiles of oncogenic and tumor suppressor genes (Guo et al., Nature Communications (2018)). Small deletions of CTCF binding sites have also been shown to lead to loss of expression of genes such as MYC and PTGS2, which both play a role in cancer development (Schuijers et al., Cell Reports (2018); Kang et al., Oncogene (2015)).
  • Methods described herein can be used to select and generate engineered CTCF variants comprising a plurality of zinc fingers, where the selected polypeptide has at least one amino acid residue in at least one zinc finger that differs in sequence from a wild-type CTCF, and where the engineered CTCF variant binds to a DNA sequence of interest (e.g., CBS harboring at least one mutation in the consensus CBS sequence) but does not bind to a consensus CBS. Using methods of the present invention, a scaffold polypeptide is re-engineered into a new scaffold-based zinc-finger polypeptide that has different structural and functional features, such that the new polypeptide binds to a sequence of interest but does not bind to a naturally occurring DNA binding site of the scaffold protein.
  • The term “zinc finger” or “Zf” refers to a polypeptide having DNA binding domains that are stabilized by zinc. The individual DNA binding domains are typically referred to as “fingers.” A Zf protein has at least one finger, preferably 2 fingers, 3 fingers, or 6 fingers. A Zf protein having two or more Zfs is referred to as a “multi-finger” or “multi-Zf” protein. Each finger typically comprises an approximately 30 amino acid, zinc-chelating, DNA-binding domain. An exemplary motif characterizing one class of these proteins is -Cys-(X)(2-4)-Cys-(X)(12)-His-(X)(3-5)-His (SEQ ID NO:7), where X is any amino acid, which is known as the “C(2)H(2) class.” A single Zf of this class typically consists of an alpha helix containing the two invariant histidine residues co-ordinated with zinc along with the two cysteine residues.
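  • The C(2)H(2) motif above can be expressed as a regular expression for scanning a protein sequence. The sketch below is illustrative only; the function name `find_c2h2_fingers` and the short test sequence are made up for the example and are not CTCF sequence.

```python
import re

# Cys-X(2-4)-Cys-X(12)-His-X(3-5)-His, where X is any amino acid.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(protein):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [
        (m.start(), m.group())
        for m in C2H2_PATTERN.finditer(protein.upper())
    ]


# Synthetic sequence with spacers of 2, 12 and 3 residues.
synthetic = "MK" + "C" + "PE" + "C" + "GKSFSQSSNLQK" + "H" + "QRT" + "H" + "TG"
print(find_c2h2_fingers(synthetic))
```

  • A pattern match only indicates that the cysteine and histidine spacing fits the motif; it does not by itself show that the region folds or binds DNA.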
  • The term “bind to” or “binding” with respect to a nucleic acid binding factor and its target nucleic acid, e.g., CTCF (variant or wild-type) and CBS, refers to sequence-dependent binding of the nucleic acid binding factor to the target nucleic acid sequence of a nucleic acid through intermolecular interactions, e.g., ionic, covalent, London dispersion, dipole-dipole, or hydrogen bonding, in such a way that the binding allows the nucleic acid binding factor to mediate a biologically significant function, e.g., transcriptional activation, recruitment of other proteins to the binding site, and/or alteration of chromatin structure. Such binding can result in modulation of expression of genes, such as activation, overexpression, suppression, or inactivation of gene expression.
  • The term “does not bind to” with respect to a nucleic acid binding factor and its target nucleic acid, e.g., CTCF (variant or wild-type) and CBS, refers to the lack of sequence-specific binding of the nucleic acid binding factor to a nucleic acid through intermolecular interactions, e.g., ionic, covalent, London dispersion, dipole-dipole, or hydrogen bonding, as a result of the absence of a target sequence in the nucleic acid (e.g., due to one or more point mutations in the CBS). Such non-binding does not allow the nucleic acid binding factor to mediate a biologically significant function, e.g., transcriptional activation, DNA modification, DNA cleavage, recruitment of other proteins to the binding site, and/or alteration of chromatin structure.
  • Each finger within a Zf protein binds to from about two to about five base pairs within a DNA sequence. Typically a single Zf within a Zf protein binds to a three or four base pair “subsite” within a DNA sequence. Accordingly, a “subsite” is a DNA sequence that is bound by a single zinc finger. A “multi-subsite” is a DNA sequence that is bound by more than one zinc finger, and comprises at least 4 bp, preferably 6 bp or more. A multi-Zf protein binds at least two, and typically three, four, five, six or more subsites, i.e., one for each finger of the protein.
  • Compositions and Methods
  • Described herein are engineered CTCF variants that can bind to mutant CBSs with higher affinity than wild-type CTCF. The engineered CTCF variants can be used in regulating genes that are under the control of mutant CBSs (CBSs having at least one nucleic acid that is different in sequence from the nucleic acid sequence of a consensus CBS). The CTCF variants have at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF.
  • Exemplary engineered CTCF variants include those that contain:
  • (1) the amino acid sequence DHLQT (SEQ ID NO:8), EHLNV (SEQ ID NO:9), AHLQV (SEQ ID NO:10), EHLRE (SEQ ID NO:11), DHLQV (SEQ ID NO:12), EHLKV (SEQ ID NO:13), DHLQV (SEQ ID NO:14), EHLVV (SEQ ID NO:15), DHLRT (SEQ ID NO:16), DHLAT (SEQ ID NO:17), or DHLQT (SEQ ID NO:18) at ZF7 positions +2 to +6;
  • (2) the amino acid sequence DHLQT (SEQ ID NO:19), EHLNV (SEQ ID NO:20), AHLQV (SEQ ID NO:21), EHLRE (SEQ ID NO:22), DHLQV (SEQ ID NO:23), EHLKV (SEQ ID NO:24), DHLQV (SEQ ID NO:25), EHLVV (SEQ ID NO:26), DHLRT (SEQ ID NO:27), DHLAT (SEQ ID NO:28), or DHLQT (SEQ ID NO:29) at ZF7 positions +2 to +6;
  • (3) the amino acid sequence NAMKR (SEQ ID NO:30), EHMGR (SEQ ID NO:31), DHMNR (SEQ ID NO:32), THMKR (SEQ ID NO:33), EHMRR (SEQ ID NO:34), or THMNR (SEQ ID NO:35) at ZF6 positions +2 to +6;
  • (4) the amino acid sequence MNES (SEQ ID NO:36), HRES (SEQ ID NO:37), RPDT (SEQ ID NO:38), RTDI (SEQ ID NO:39), or RHDT (SEQ ID NO:40) at ZF6 positions −1 to +3;
  • (5) the amino acid sequence HGLKV (SEQ ID NO:41), HRLKE (SEQ ID NO:42), HALKV (SEQ ID NO:43), SRLKE (SEQ ID NO:44), DGLRV (SEQ ID NO:45), HTLKV (SEQ ID NO:46), or NRLKE (SEQ ID NO:47) at ZF5 positions +2 to +6;
  • (6) the amino acid sequence ATLKR (SEQ ID NO:48), QALRR (SEQ ID NO:49), GGLVR (SEQ ID NO:50), HGLIR (SEQ ID NO:51), ANLSR (SEQ ID NO:52), TGLTR (SEQ ID NO:53), HGLVR (SEQ ID NO:54), GGLTR (SEQ ID NO:55), HTLRR (SEQ ID NO:56), TVLKR (SEQ ID NO:57), ADLKR (SEQ ID NO:58), or HGLRR (SEQ ID NO:59) at ZF5 positions +2 to +6;
  • (7) the amino acid sequence AHLRK (SEQ ID NO:60), AKLRV (SEQ ID NO:61), GGLGL (SEQ ID NO:62), AKLRI (SEQ ID NO:63), TKLKV (SEQ ID NO:64), or SKLRV (SEQ ID NO:65) at ZF4 positions +2 to +6;
  • (8) the amino acid sequence ATLRR (SEQ ID NO:66), RRLDR (SEQ ID NO:67), TNLRR (SEQ ID NO:68), ANLRR (SEQ ID NO:69), GNLTR (SEQ ID NO:70), AMLKR (SEQ ID NO:71), HMLTR (SEQ ID NO:72), AMLRR (SEQ ID NO:73), or TMLRR (SEQ ID NO:74) at ZF4 positions +2 to +6;
  • (9) the amino acid sequence QQLIV (SEQ ID NO:75), SQLIV (SEQ ID NO:76), QQLLV (SEQ ID NO:77), GELVV (SEQ ID NO:78), QQLLI (SEQ ID NO:79), GQLIV (SEQ ID NO:80), GQLTV (SEQ ID NO:81), TELII (SEQ ID NO:82), QGLLV (SEQ ID NO:83), QQLLT (SEQ ID NO:84), GQLLT (SEQ ID NO:85), GELLT (SEQ ID NO:86), or QQLLI (SEQ ID NO:87) at ZF3 positions +2 to +6;
  • (10) the amino acid sequence AKLKK (SEQ ID NO:88), AKLRK (SEQ ID NO:89), AHLRV (SEQ ID NO:90), AKLRV (SEQ ID NO:91), or SKLRL (SEQ ID NO:92) at ZF4 positions +2 to +6; the amino acid sequence ERLRV (SEQ ID NO:93), NRLKV (SEQ ID NO:94), SRLKE (SEQ ID NO:95), or NRLKV (SEQ ID NO:96) at ZF5 positions +2 to +6; the amino acid sequence RPDT (SEQ ID NO:97), RTET (SEQ ID NO:98), or RADV (SEQ ID NO:99) at ZF6 positions −1 to +3; and the amino acid sequence DNLLA (SEQ ID NO:100), SNLLV (SEQ ID NO:101), DNLMA (SEQ ID NO:102), or DNLRV (SEQ ID NO:103) at ZF7 positions +2 to +6;
  • (11) the amino acid sequence GHLKK (SEQ ID NO:104), AHLRK (SEQ ID NO:105), or GKLRI (SEQ ID NO:106) at ZF4 positions +2 to +6; the amino acid sequence SRLKE (SEQ ID NO:107), DALRR (SEQ ID NO:108), DGLKR (SEQ ID NO:109), or TRLRE (SEQ ID NO:110) at ZF5 positions +2 to +6; the amino acid sequence RPDT (SEQ ID NO:111) or RTEN (SEQ ID NO:112) at ZF6 positions −1 to +3; and the amino acid sequence EHLKV (SEQ ID NO:113), DHLLA (SEQ ID NO:114), or HHLDV (SEQ ID NO:115) at ZF7 positions +2 to +6;
  • (12) the amino acid sequence SNLRR (SEQ ID NO:116), GNLVR (SEQ ID NO:117), GNLRR (SEQ ID NO:118), GNLKR (SEQ ID NO:119), ANLRR (SEQ ID NO:120), NNLRR (SEQ ID NO:121), or TNLRR (SEQ ID NO:122) at ZF4 positions +2 to +6; the amino acid sequence EHMKR (SEQ ID NO:123), EHMRR (SEQ ID NO:124), THMKR (SEQ ID NO:125), EHMNR (SEQ ID NO:126), or EHMAR (SEQ ID NO:127) at ZF6 positions +2 to +6; and the amino acid sequence DNLLT (SEQ ID NO:128), DNLLV (SEQ ID NO:129), DNLQT (SEQ ID NO:130), DNLLA (SEQ ID NO:131), DNLAT (SEQ ID NO:132), DNLQA (SEQ ID NO:133), DNLMA (SEQ ID NO:134), or DNLMT (SEQ ID NO:135) at ZF7 positions +2 to +6;
  • (13) the amino acid sequence GNLVR (SEQ ID NO:136), GNLRR (SEQ ID NO:137), GNLAR (SEQ ID NO:138), GNLMR (SEQ ID NO:139), ANLRR (SEQ ID NO:140), SNLRR (SEQ ID NO:141), or NNLRR (SEQ ID NO:142) at ZF4 positions +2 to +6; the amino acid sequence EHMNR (SEQ ID NO:143), EHMKR (SEQ ID NO:144), EHMRR (SEQ ID NO:145), SHMNR (SEQ ID NO:146), SHMRR (SEQ ID NO:147), THMKR (SEQ ID NO:148), or DHMNR (SEQ ID NO:149) at ZF6 positions +2 to +6; and the amino acid sequence EHLKV (SEQ ID NO:150), EHLAE (SEQ ID NO:151), STLNE (SEQ ID NO:152), DHLQV (SEQ ID NO:153), EHLNV (SEQ ID NO:154), DHLNT (SEQ ID NO:155), EHLQA (SEQ ID NO:156), or HHLMH (SEQ ID NO:157) at ZF7 positions +2 to +6; or
  • (14) the amino acid sequence GHLKK (SEQ ID NO:158), AHLKK (SEQ ID NO:159), TKLRL (SEQ ID NO:160), TKLKL (SEQ ID NO:161), GHLRK (SEQ ID NO:162), THLKK (SEQ ID NO:163), or AHLRK (SEQ ID NO:164) at ZF4 positions +2 to +6; the amino acid sequence TRLKE (SEQ ID NO:165) or SRLKE (SEQ ID NO:166) at ZF5 positions +2 to +6; and the amino acid sequence RADN (SEQ ID NO:167), RHDT (SEQ ID NO:168), RRDT (SEQ ID NO:169), RPDT (SEQ ID NO:170), RTSS (SEQ ID NO:171), or RNDT (SEQ ID NO:172) at ZF6 positions −1 to +3.
  • In some embodiments, the engineered CTCF variants contain two or more combinations of the above-listed amino acid sequences.
  • In one embodiment of the present disclosure, mutations at certain positions within the consensus CBS substantially reduced binding by the wild-type CTCF zinc finger array in a bacterial two-hybrid system, and this system was used to select variants from randomized libraries that are capable of recognizing the mutated CBS sequence. Combining fingers together can be used to generate variant CTCF zinc finger arrays capable of recognizing CBSs harboring multiple point mutations. In some embodiments of the present disclosure, CTCF proteins harboring these zinc finger array variants are used to restore CTCF binding activity at sites bearing one or more mutations within a CBS (i.e., non-canonical CBSs). In some embodiments of the present disclosure, CTCF variants are capable of recognizing alternative non-CBS sites in the genome. In some embodiments, such CTCF variants can be used to create artificial TADs and/or enhancer-promoter loops that can purposefully insulate genes and/or perturb the higher order structure of the genome and thereby alter expression of certain target genes of interest.
  • Diagnosis and Treatment of Diseases
  • The engineered CTCF variants described herein can be used for treating diseases where aberrant gene regulation due to a mutant CBS is an underlying factor. The engineered CTCF variants described herein can, for example, bind to mutant CBSs that do not bind wild-type CTCF, thereby altering or restoring gene regulation in a way that can reverse or slow down progression of disease. CTCF binding has been shown to regulate expression of oncogenes, such as MYC. Mutations accumulated in CTCF binding sites and loss of wild-type CTCF binding are associated with dysregulation of oncogenes and increased risk of carcinogenesis. Transcriptional dysregulation of MYC is one of the most frequent events in aggressive tumor cells, and the dysregulation is a result of mutations in a CTCF binding site that disrupt an enhancer-promoter loop. Engineered CTCF variants can bind to the mutated sites and restore normal gene expression levels, reducing the risk of cancer development. In another case, Fragile X Syndrome is the result of a duplication in a repetitive region and the loss of FMR1 expression. Duplication of a repeat region in the X chromosome disrupts a CTCF binding site, leading to the loss of an enhancer-promoter loop driving the expression of FMR1. The engineered CTCF variants could restore the enhancer-promoter loop, leading to restoration of FMR1 expression. Human Papilloma Virus (HPV) and other integrating viruses (such as HIV) are often silenced by CTCF-mediated insulation of the viral genome from nearby enhancers. In the case of HPV18, there is a CTCF binding site in the promoter region of the viral genome. HPV18 genomes that have mutations in the CTCF binding site are not silenced because the mutated binding site can no longer be recognized by CTCF. Engineered CTCF variants would be able to bind to the mutated integrated HPV genomes and restore the insulating loop.
  • Kits
  • Also provided herein are kits comprising the engineered CTCF variant, and/or nucleic acids encoding an engineered CTCF variant as described herein and instructions for use.
  • Other Applications for the Engineered CTCF Variants
  • The engineered CTCF variants described herein can be used in a number of other applications, some of which are disclosed herein.
  • In some embodiments, the engineered CTCF variant, or nucleic acids encoding such an engineered CTCF variant, can be used to further elucidate the complex interactions of CTCF and other chromatin organization proteins. The structural maintenance of chromosomes is tightly regulated within cells, and CTCF plays a major role. It remains unclear how higher order structures are inherited across cell division and maintained through cell differentiation; the use of CTCF variants can help clarify that role. CTCF variants might be used to investigate how loops are formed across the genome and to modify or restore normal genomic architecture in a manner that impacts endogenous gene expression for research and therapeutic applications. They might also be used to re-establish ancestral CTCF binding sites so that we may better understand the evolutionary implications of TAD-based genome organization and epigenetic regulation of gene expression, or to create alternative genomic architectures that impact endogenous gene expression for research and therapeutic applications.
  • Examples
  • Materials and Methods
  • The following materials and methods were used in the examples set forth below.
  • Construction of B2H Reporter Assay Components
  • The zinc-finger bacterial expression plasmid contained the CTCF zinc finger array (or variants) fused to gal11P. The amino-terminal end of all or part of the CTCF 11-finger zinc finger array was fused to the carboxy-terminal end of gal11P with a Flag tag linker between them. The zinc finger expression plasmid contained a Kanamycin resistance gene. The second plasmid, known as the bacterial reporter plasmid, contained a CTCF binding site sequence that was introduced via BsaI restriction digest followed by T4 ligase-mediated ligation of annealed oligos containing the CTCF binding site. The reporter plasmid contained a bacterial lac promoter that drove expression of lacZ when the CTCF binding site was bound. The reporter plasmid also carried a Chloramphenicol resistance gene.
  • Bacterial-Two-Hybrid (B2H) Randomized Library Construction
  • Complementary oligos were synthesized by IDT with ‘VNS’ or ‘NNS’ variation introduced in the sequence by design. Oligos were annealed and ligated into the zinc finger expression plasmid (previously digested with XbaI and BamHI) using T4 ligase. The ligation reaction was purified using a Qiagen Minelute column, and the purified product was electro-transformed into the electro-competent XL1blue E. coli strain. After 1 hour of recovery in SOC at 37° C., the transformation was inoculated into 150 mL of Luria broth (LB) with 50 ug/mL of Kanamycin. After the culture reached an OD600 of 0.400-0.600 (about 10 hours of growth at 37° C.), the culture was spun down and the library was harvested using a Qiagen Maxiprep kit.
  • Bacterial-Two-Hybrid (B2H) Reporter Assay
  • 600 ng of gal11P-zinc finger expression plasmid and 600 ng of reporter plasmid with the CTCF binding site of interest were chemically transformed into 150 uL of Δλ E. coli strain with an alpha N-terminal domain of RNA polymerase (α-NTD)-Gal4 fusion. The plasmid and cell mixture was incubated on ice for 30 minutes, heat shocked at 42° C. for 1 minute, and recovered on ice for 2 minutes, followed by recovery in 500 uL of Luria broth for 1 hour. Post-recovery, the transformation was plated on Kanamycin (50 ug/mL), Chloramphenicol (12.5 ug/mL) selective LB agar plates. After 14-16 hours of growth at 37° C., colonies were picked and grown overnight in 1 mL of induction media (Luria broth with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 10 ug/mL of ZnCl2, and 500 ug/mL of IPTG). After 15-17 hours of growth, 25 uL of the overnight culture was sub-cultured into 1 mL of fresh induction media and grown for 2 hours at 37° C. or until cultures were between OD595 0.157-0.268 as measured by spectrophotometer. 100 uL of the subculture was then lysed for a minimum of 15 minutes using 11 uL of a 1:10 mixture of lysozyme and PopCulture soap. 15 uL of the lysis mixture was then analyzed for fold activation of LacZ by a previously described colorimetric ONPG assay. Binding was quantified by fold activation of LacZ. Fold activation was determined by calculating the fold increase of β-gal levels of a sample above the β-gal levels of the negative control (no zinc finger protein fused to gal11P).
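  • The fold-activation calculation described above reduces to a ratio of β-gal levels. The sketch below assumes replicate readings are averaged before taking the ratio; the averaging step, the function name `fold_activation`, and the numbers are illustrative assumptions rather than details stated in this disclosure.

```python
from statistics import mean


def fold_activation(sample_bgal, negative_control_bgal):
    """Fold increase of sample beta-gal over the negative control.

    Both arguments are lists of replicate beta-gal readings in the same
    units. The negative control is the strain expressing gal11P with no
    zinc finger protein fused to it.
    """
    control = mean(negative_control_bgal)
    if control <= 0:
        raise ValueError("negative control signal must be positive")
    return mean(sample_bgal) / control


# Hypothetical readings: sample replicates vs. control replicates.
print(fold_activation([10, 14], [3, 3]))
```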
  • Bacterial-Two-Hybrid (B2H) Selection Assay
  • Plasmids involved in the selection assay are the same as before with only one difference: the reporter plasmid is made to be a selective plasmid by swapping lacZ with blaC, a gene conferring resistance to the β-lactam class of antibiotics, such as Carbenicillin. Selections are carried out by constructing libraries of variants from a pool of oligos ligated into the zinc finger-gal11P expression plasmid. These are electro-transformed into the electro-competent Δλ E. coli strain containing the selective plasmid with the CTCF binding site of interest. Cells are recovered in 1 mL of SOC for 1 hour at 37° C., followed by induction of the selective plasmid for 3 additional hours at 37° C. in 4 mL of induction media (previously described). After four total hours, transformations are plated on low stringency plates (LB agar with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of zinc chloride, 200 ug/mL of IPTG, and 0.45 ug/mL of Clavulanic acid). Plates are grown at 37° C. for 20-24 hours and then colonies are harvested off the surface with 2 mL of LB. 50 uL of the scraped colonies are sub-cultured into 1 mL of terrific broth (TB) with 50 ug/mL of Kanamycin and 12.5 ug/mL of Chloramphenicol and grown 14-16 hours at 37° C. The next day, plasmid is harvested from the overnight cultures and chemically transformed into the chemically competent Δλ E. coli strain containing the same selective plasmid with the CTCF binding site of interest as before. The chemical transformation is performed as previously described, with the addition of 2 hours of growth in induction media following a 1 hour recovery at 37° C. After a total of 3 hours of growth, cells are plated on high stringency selective gradient plates.
The high stringency gradient plates contain 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, and 200 ug/mL of IPTG, with a gradient of Clavulanic acid from ~1 up to 4.0 ug/mL in concentration. Plates were incubated 20-24 hours at 37° C. Colonies that grew on the part of the gradient with the highest levels of Clavulanic acid were picked and grown overnight in 1 mL of TB with 50 ug/mL of Kanamycin in order to harvest the plasmid. The variant plasmid was then Sanger sequenced and analyzed for binding activity in the B2H β-gal reporter assay.
  • High Stringency Gradient Plates
  • The high stringency gradient plates contain 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, and 200 ug/mL of IPTG, with a gradient of Clavulanic acid from ~1 to 4.0 ug/mL in concentration. To obtain a gradient of Clavulanic acid, rectangular plates are elevated using a pipette tip so as to have a ~25° slope (enough of an angle so that the thin end of the wedge is only barely covered with LB agar). 20-25 mL of LB agar with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, 200 ug/mL of IPTG and 4 ug/mL of Clavulanic acid is added to the inclined plate to form the bottom wedge. Once the agar has solidified, the plates are laid flat and 20-25 mL of LB agar with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, and 200 ug/mL of IPTG (with no Clavulanic acid) is poured on top. This creates plates with a gradient of Clavulanic acid ranging from ~1 ug/mL up to 4.0 ug/mL.
  • CTCF Binding Assay Using ChIP-qPCR
  • K562 cells were seeded 18-24 hours in advance of transfection at a density of 3×10^5 cells/mL. Three million K562 cells per variant were transfected using Lonza Kit V following the provided optimized protocol and pooled in a 10 cm dish. 5 ug of plasmid expressing HA epitope-tagged CTCF (wild-type or variant) from a pCAG promoter was used for each 1 million cell reaction. 72 hours post transfection, approximately 10 million cells were crosslinked with 1% Formaldehyde at 37° C. for 10 mins. The reaction was quenched with 1.2 mL of 2.5M Glycine for 5 mins at 37° C. Cells were pelleted at 430 g for 10 mins and sonicated on an SFX250 Branson sonifier for 5.5 mins, 32% Amplitude, 1.3 s off, 0.7 s on. The samples were then split in half, one half precipitated overnight, rotating at 4° C., with antibody specific to CTCF and the other half precipitated overnight with HA-specific antibody. The next day, antibody-bound chromatin complexes were incubated with G-dynabeads for 2 hours at 4° C., rotating. Beads were washed three times in 1 mL of ice-cold RIPA 150 wash buffer (0.1% SDS, 0.1% DOC, 1% Triton X-100, 1 mM EDTA, 10 mM Tris-HCl pH 8, 150 mM NaCl), three times in 1 mL of ice-cold RIPA 500 wash buffer (0.1% SDS, 0.1% DOC, 1% Triton X-100, 1 mM EDTA, 10 mM Tris-HCl pH 8, 500 mM NaCl), three times in 1 mL of ice-cold LiCl wash buffer (10 mM Tris-HCl pH 8, 250 mM LiCl, 0.5% Triton X-100, 0.5% DOC), and once in 1 mL of ice-cold 10 mM Tris-HCl pH 8.5. The antibody-chromatin complex was eluted from the beads in 100 uL of Elution Buffer (10 mM Tris-HCl pH 8, 0.1% SDS, 150 mM NaCl) with 5 mM DTT added fresh. Beads were incubated with elution buffer at 65° C. for 1 hour, shaking at 900 rpm. Beads were pelleted by magnet and the supernatant was moved to a clean tube where, after cooling to room temp, 1 uL of RNAse (Roche 11119915001) was added to the sample and incubated at 37° C. for 30 mins at 600 rpm. 3 uL of Proteinase K [20 mg/mL] was added to samples and incubated overnight at 65° C. (Lifetech #100005393). The next day, 100 uL of SPRI beads with 160 uL of PEG/NaCl (20% PEG, 2.5M NaCl) were added to samples, vortexed and incubated at room temp for 5 minutes before pelleting beads on a magnet. The pellet was washed twice with 80% ethanol and air dried for 5 minutes before final elution in 150 uL of 10 mM Tris-HCl pH 8. 3 uL of recovered supernatant was mixed with 5 uL of SYBR qPCR master mix and 2 uL of primer mix for quantification, by real-time qPCR, of fragment enrichment over a 1% input sample not treated with antibody.
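  • One common way to express ChIP-qPCR enrichment relative to an input sample is the percent-input calculation sketched below. This is offered as an assumption about how such data may be analyzed, not as the specific analysis used here; it further assumes ideal amplification efficiency (a doubling per cycle), and the function name `percent_input` and the Ct values are hypothetical.

```python
def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Immunoprecipitated signal as a percentage of total chromatin.

    ct_ip: qPCR threshold cycle of the immunoprecipitated sample.
    ct_input: threshold cycle of the input sample (no antibody).
    input_fraction: fraction of chromatin saved as input (0.01 for 1%).
    Assumes product doubles every cycle (100% primer efficiency).
    """
    # An IP sample that crosses threshold at the same cycle as the input
    # contains the same amount of target as the saved input fraction.
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


# Hypothetical Ct values: the IP sample amplifies one cycle earlier.
print(percent_input(ct_ip=24.0, ct_input=25.0))
```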
  • Generation of Variant Binding Site Cell Lines
  • Cell lines with the variant binding site introduced at the CTCF binding site ~2 kb upstream of the MYC TSS were generated by nucleofecting exoMYC.K562 with SpCas9-P2A-GFP, a gRNA targeting the CTCF binding site, and one of 6 distinct ssODNs as HDR templates to introduce the 6 different variant binding sites. exoMYC.K562 is a K562 cell line transduced with an exogenous MYC construct expressed from a PGK promoter. This was necessary because any reduction of endogenous MYC expression can impact the survival of K562 cells. GFP+ cells were sorted at a high dilution into a 96 well plate for single-cell clonal expansion. Once the clones had expanded, gDNA and RNA were extracted to genotype and phenotype the clonal cell population. Clonal lines that had a reduction of endogenous MYC and also appeared homozygous at the target site for the desired HDR event were used in the study.
  • Quantifying MYC Expression by RT-qPCR
  • Three million K562 cells genome edited to harbor the variant binding site upstream of MYC were nucleofected with 5 ug of plasmid expressing a variant CTCF following the Lonza Kit V protocol. 72 hours post nucleofection, 1 million cells were isolated for RNA extraction following the NucleoSpin RNA Plus RNA isolation protocol. The RNA was converted to cDNA using the Thermo High-Capacity RNA-to-cDNA Kit. 3 uL of a 1:20 dilution of cDNA was mixed with 5 uL of Thermo Fast SYBRgreen Master Mix and run on an RT-qPCR machine following a standard PCR amplification protocol.
  • Results
  • Single Nucleotide Substitution at CBS Affecting CTCF Binding Efficiency
  • We reasoned we could use a bacterial two-hybrid (B2H) system to evolve the zinc finger array of CTCF to bind to mutated CBSs bearing single or multiple sequence changes that disrupt wild-type CTCF binding (Wright et al., Nature Protocols (2006); Sander et al., Nature Methods (2010); Maeder et al., Molecular Cell (2008)). We used a previously described B2H system to systematically define the impact of single nucleotide substitutions within a previously defined consensus CBS site (Joung et al., PNAS (2000)). In the B2H system, the binding of a DNA-binding zinc finger array to a target site of interest can be configured to result in increased transcription of a reporter gene (e.g., beta-galactosidase or an antibiotic resistance gene) (FIG. 2). To do this, two fusions are expressed in an E. coli cell bearing a reporter construct. The first fusion consists of a zinc finger array fused to a fragment of the yeast Gal11P protein, which interacts with a fragment of the yeast Gal4 protein. The second fusion consists of the N-terminal domain of the E. coli RNA polymerase alpha subunit fused to the yeast Gal4 fragment (the α-Gal4 fusion). The reporter construct consists of a weak E. coli promoter that drives expression of the reporter gene of interest, with a binding site for the zinc finger array positioned upstream of the promoter. Binding of the zinc finger-Gal11P fusion to the zinc finger binding site results in recruitment of RNA polymerase complexes harboring the α-Gal4 fusion, resulting in increased transcription of the reporter gene. If the reporter gene is lacZ, which encodes β-galactosidase (β-gal), the level of β-gal expression can be easily quantified using a well-established colorimetric ONPG-based assay (FIG. 2).
  • In this B2H reporter assay, we determined that the entire zinc finger array (ZF1-11) and the full CTCF binding site (CBS), not just the 15 bp consensus CBS sequence, were required for optimal expression of the lacZ gene (FIG. 3), which mimics observed CTCF binding requirements in human cells (refs. 10, 11). After optimizing the positioning of the CBS site relative to the transcription start site, we systematically introduced point mutations into the CBS and tested their impact on lacZ expression. These results demonstrated that mutation of nucleotides outside the 15 bp core sequence had little impact on lacZ expression. By contrast, certain substitutions at certain positions within the core sequence resulted in no or reduced binding (FIG. 4). Our results closely match ChIP-Seq data for CTCF binding sites in human cells and reflect other studies in the literature in which point mutations in the CTCF core lead to loss of CTCF binding. Taken together, these results strongly suggest that binding activity of the CTCF zinc finger array in the B2H system mimics the binding activity of intact CTCF protein in human cells.
  • Although most sequence changes in the flanking regions of the binding site had little impact on binding efficiency, certain alterations appeared to slightly improve the fold-activation of lacZ expression. Therefore, we tested whether a more “optimized” CBS bearing the “best” nucleotides as defined in the B2H assay might lead to higher-fold activation of lacZ expression but we did not observe any higher activity compared with the original consensus sequence (derived from Nakahashi et al. ChIP-seq data) (FIG. 5).
  • Generation of Engineered CTCF Variants that Bind to Mutated CBSs with Single Altered Nucleotide
  • Next, we sought to determine if we could use the B2H system to select for CTCF zinc finger array variants capable of recognizing mutated CBSs not recognized by the wild-type CTCF zinc finger array. To do this, we modified the B2H reporter construct, replacing the lacZ gene with the blaC gene (FIG. 6), which encodes beta-lactamase and therefore confers resistance to beta-lactam antibiotics (e.g., carbenicillin). This modification enables us to select for cells that express a CTCF zinc finger array variant that can efficiently bind a mutant CBS positioned upstream of the weak promoter driving blaC expression. Increasingly higher levels of blaC expression can be selected for by using media containing carbenicillin and increasingly higher concentrations of the beta-lactamase inhibitor clavulanic acid. Gradients of clavulanic acid can be created within a single agar plate (FIG. 6; see Materials and Methods), thereby enabling sampling of cells at various concentrations of the inhibitor.
  • With this modified B2H selection system, we first sought to identify CTCF zinc finger array variants that can bind to CBSs bearing single point mutations that abolish binding by the wild-type CTCF zinc finger array in this system. In an initial set of selection experiments, we sought to identify CTCF zinc finger array variants that could bind to mutant CBSs bearing mutations of the C that is contacted by an aspartic acid (D) present at the third position (+3) of the alpha-helical recognition helix of ZF7 (shown by previously published co-crystal structures cited above). We created a randomized library of CTCF zinc finger array variants in which the codon encoding the ZF7 +3 position was randomized using a degenerate NNS codon (where N=G, A, C, or T and S=G or C). We then used the B2H selection system to interrogate this library to identify variants capable of recognizing CBSs bearing C to T, C to G, and C to A substitutions at the position contacted by ZF7 +3. Selections were initially performed on low stringency plates (with clavulanic acid gradients ranging from 0 to 0.45 ug/ml); surviving colonies were harvested and plasmids encoding the variant zinc finger arrays were purified. This selected subset of variants was then subjected to high stringency selection in the B2H system on plates with carbenicillin and gradients of clavulanic acid ranging from 0 to 4 ug/ml. Plasmids encoding variant zinc finger arrays were purified from colonies that grew on the end of the gradient plate with the highest concentration of clavulanic acid, sequenced, and then tested in the B2H reporter assay by beta-galactosidase assay.
  • As can be seen in FIGS. 7A-C, we obtained CTCF zinc finger array variants that showed preferential binding activity (as judged by the B2H reporter assay) for the mutated CBS compared with the original consensus CBS. These clones also showed selection for a particular amino acid at the ZF7 +3 position: for the C to T site, a threonine (T) was selected; for the C to A site, an asparagine (N) was selected; and for the C to G site, a histidine (H) was selected. The identities of these amino acids are consistent with what might be expected to recognize the mutant nucleotide based on previous zinc finger selections using the Zif268 zinc finger array. However, although we successfully selected for mutants that had altered binding activity, in most cases the binding activity of the variant for the mutated CBS was not as strong (as judged by the B2H reporter assay) as that of the wild-type CTCF zinc finger array for the consensus CBS (FIGS. 7A-C).
  • Based on our previous experience with re-engineering the DNA-binding specificities of the Zif268 zinc finger array, we hypothesized that obtaining stronger binding variants might require alteration of amino acids flanking the +3 position in ZF7. To test this idea, we created a larger library of variants in which we randomized positions +2, +3, +5 and +6 of ZF7 using degenerate VNS codons (where V=G, A, or C). Position +4 of ZF7 was not altered because it faces the internal core of the ZF domain and is not expected to make contacts to the DNA. We then performed B2H selections as described above using this library to identify variants that could recognize a mutant CBS with a C to G mutation at the position contacted by ZF7 +3 in the wild-type CTCF zinc finger array. These selections identified variants that showed stronger binding activity for the mutant CBS and showed some degree of consensus in the identities of amino acids selected (FIG. 8).
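As a rough illustration of the size of the four-position VNS library described above, the following sketch enumerates the VNS codons and computes the theoretical diversity. It assumes the standard genetic code; the source does not report library sizes, so the numbers are theoretical maxima rather than measured values, and all variable names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# VNS: V = G, A or C; N = G, A, C or T; S = G or C.
vns_codons = ["".join(c) for c in product("GAC", "GACT", "GC")]
vns_amino_acids = sorted({CODON_TABLE[c] for c in vns_codons} - {"*"})
has_stop = any(CODON_TABLE[c] == "*" for c in vns_codons)

RANDOMIZED_POSITIONS = 4  # positions +2, +3, +5 and +6 of ZF7
dna_diversity = len(vns_codons) ** RANDOMIZED_POSITIONS
protein_diversity = len(vns_amino_acids) ** RANDOMIZED_POSITIONS

print(len(vns_codons), "codons per position")
print("amino acids:", "".join(vns_amino_acids))
print("stop codons present:", has_stop)
print("theoretical DNA diversity:", dna_diversity)
print("theoretical protein diversity:", protein_diversity)
```

Because V excludes T at the first codon position, a VNS codon cannot encode a stop, and it also cannot encode F, Y, C or W.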
  • Based on this success, we generated additional randomized libraries in which we randomized positions −1, +1, +2, and +3 or positions +2, +3, +5, and +6 of ZF7, ZF6, ZF5, ZF4, and ZF3. We then performed selections as described above using these libraries against various matched mutant CBSs harboring nucleotide substitutions at positions expected to be contacted by residues randomized in the libraries (FIGS. 9-16). Analysis of variants from individual surviving colonies at the most selective end of the high stringency selection plates showed that many of these selections yielded variants with high activity for the mutant CBS of interest, and sequencing of these clones showed that there was generally a degree of consensus in the amino acid sequences, suggesting that selection was successfully occurring (FIGS. 9-16).
  • Generation of Engineered CTCF Variants that Bind to Mutated CBSs with Multiple Altered Nucleotides
  • Having successfully identified CTCF zinc finger variants that could recognize CBSs with a single altered nucleotide position, we next sought to identify variants that could recognize CBSs bearing multiple mutated nucleotides. To do this, we sought to recombine ZF variants each selected to bind to different “subsites” within the CBS that bear individual mutations. However, because of well-known context-dependent effects that exist between ZFs in a multi-finger array, we undertook a strategy in which we recombined pools of selected ZF variants (rather than a single variant) for any given altered subsite to identify the combinations of mutated ZFs that work best together to recognize a CBS bearing multiple mutations. To isolate pools of ZF variants for various mutated CBS subsites, we harvested all remaining clones from the high stringency selection plates used in the selections with CBSs bearing single mutations (depicted in FIGS. 9-16). Deep sequencing of the various selected clones in these pools yielded a variety of sequences with some degree of consensus within each selection, as expected (Table 1).
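One way to quantify the "degree of consensus" in such a pool is to weight each recognition-helix sequence by its read count and tally residues position by position. The sketch below does this for the five most abundant entries of Table 1 only. The function names are illustrative, and the source does not state that consensus was computed this way.

```python
from collections import Counter

# Five most abundant rows of Table 1: (sequence for positions 2 through 6, reads).
TABLE1_TOP = [
    ("DHLQT", 2981),
    ("EHLVV", 2413),
    ("DHLNT", 1517),
    ("DHLRT", 1442),
    ("EHLKV", 1434),
]


def position_counts(rows):
    """Sum read counts per residue at each position of equal-length sequences."""
    length = len(rows[0][0])
    counts = [Counter() for _ in range(length)]
    for seq, reads in rows:
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for i, residue in enumerate(seq):
            counts[i][residue] += reads
    return counts


def consensus(rows):
    """Return the read-weighted most frequent residue at each position."""
    return "".join(c.most_common(1)[0][0] for c in position_counts(rows))


counts = position_counts(TABLE1_TOP)
total_reads = sum(reads for _, reads in TABLE1_TOP)

print("consensus:", consensus(TABLE1_TOP))
for i, c in enumerate(counts):
    # Index 0 corresponds to the first listed position (position 2).
    print(i, {aa: round(n / total_reads, 3) for aa, n in c.most_common()})
```

The same tally can be run on a complete table by extending the input list with its remaining rows.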
  • We then recombined pools of variants for ZFs 4, 5, 6, and 7 to create CTCF zinc finger arrays that harbored various altered recognition helices for these positions and then performed B2H selections (see Materials and Methods) against five different mutated CBSs bearing combinations of various nucleotide substitutions in subsites for ZFs 4, 5, 6, and 7 (FIGS. 17-21). Sequencing of clones from these selections showed that certain recognition helix sequences for each finger were selected multiple times, suggesting that the selections were identifying combinations that work well together. Importantly, for all five of the multiply mutated CBSs, several of the CTCF zinc finger array variants identified showed good binding activity on the site for which they were selected as judged by B2H assay (FIGS. 17-21). In addition, for four of the five mutant CBS sites, we were able to identify variants that not only bind to the mutant CBS but also fail to bind to the original unmutated (consensus) CBS. Thus, we conclude that using the approach described here we are able to identify CTCF ZF array variants capable of recognizing multiply mutated CBSs that are not efficiently bound by the original wild-type CTCF zinc finger array.
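The combinatorial space created by recombining pools can be sketched as a Cartesian product over per-finger pools. This is an illustration only: the example pools below contain three entries each from Tables 1 and 7, pools for ZF4 and ZF5 are omitted, and the actual recombination was performed at the DNA level as described in Materials and Methods. The names POOLS and recombine are hypothetical.

```python
from itertools import product

# Illustrative pools only (top entries of Tables 1 and 7); the source
# recombined pools for ZFs 4, 5, 6 and 7.
POOLS = {
    "ZF7": ["DHLQT", "EHLVV", "DHLNT"],
    "ZF6": ["GHMRR", "GHMNR", "EHMRR"],
}


def recombine(pools):
    """Yield every combination of one recognition helix per finger."""
    fingers = list(pools)
    for combo in product(*(pools[f] for f in fingers)):
        yield dict(zip(fingers, combo))


library = list(recombine(POOLS))
print(len(library), "combinations")
print(library[0])
```

The number of combinations is the product of the pool sizes, which is why selection, rather than one-by-one testing, is used to find fingers that work well together.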
  • Binding Specificity of Engineered CTCF Variants to Mutant and Wild-Type CBSs in Human Cells
  • Having successfully engineered variants that can recognize CBSs with multiple sequence changes across the motif, we next wanted to investigate whether the variants can bind to these same mutant binding sites in a human cell context while not binding to wild-type CBSs. First, we found a collection of sites in the human genome that matched the 15 bp core sequence for each of the five mutated binding sites that we had selected CTCF variants to bind (described in FIGS. 17-21). We then looked at two variant binding sites with a sequence that matched one of the five mutated binding sites (sequence depicted in FIG. 20), as well as known CBSs, to determine whether endogenous CTCF could bind to the wild-type CBSs and not bind to the variant binding sites, as the B2H reporter assay would suggest (FIG. 20). Human K562 cells, an erythroleukemia cell line, were harvested and analyzed by ChIP-qPCR using a CTCF-specific antibody to detect CTCF-DNA binding. Wild-type CTCF showed no detectable binding to two different target sites that matched the mutated CBS but showed strong enrichment at the wild-type CTCF binding site, supporting the results of the B2H reporter assay (FIG. 22). Next, we wanted to see whether overexpressed, exogenous, 3×HA-tagged wild-type CTCF delivered by plasmid transfection in K562 cells had the same binding profile observed with endogenous CTCF. Wild-type K562 cells were transfected with 3×HA-CTCF and 72 hours later were harvested and processed for ChIP-qPCR analysis with HA-specific antibodies. Exogenous wild-type 3×HA-CTCF could bind to the wild-type CBSs and could not bind to the variant binding sites, the same as endogenous wild-type CTCF, suggesting that overexpression of CTCF by plasmid delivery reflects biologically relevant behavior (FIG. 23A). Based on these results, we next examined the ability of a variant CTCF to bind to the variant binding sites native to the human genome. The variant chosen was one recovered from the B2H selection assay and shown by the B2H reporter assay to bind to the variant site with the same sequence as variant sites 1 and 2 used in FIGS. 22-23B. K562 cells were transfected with the 3×HA-tagged CTCF variant and the same sites as before were examined for binding activity by ChIP-qPCR. Variant-specific HA enrichment was present at the variant binding sites and lacking at the wild-type sites, suggesting that we successfully evolved a variant that can specifically bind to a mutant CBS with as few as three nucleotide changes without binding native CBSs (FIG. 23B).
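The source does not state how ChIP-qPCR enrichment was quantified. One commonly used option is the percent-input method, sketched below under the assumptions of roughly 100% amplification efficiency and a known input fraction. The function name and all Ct values are hypothetical and are not data from the source.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-input for one ChIP-qPCR amplicon.

    ct_ip:          Ct of the immunoprecipitated sample
    ct_input:       Ct of the (diluted) input sample
    input_fraction: fraction of chromatin saved as input, e.g. 0.01 for 1%
    Assumes the amount of product doubles every cycle.
    """
    # Correct the input Ct to what 100% of the chromatin would have given.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values for illustration only.
bound_site = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01)
unbound_site = percent_input(ct_ip=30.0, ct_input=25.0, input_fraction=0.01)
print(round(bound_site, 3), round(unbound_site, 5))
```

A site would be read as enriched when its percent-input clearly exceeds that of a negative control region processed in the same experiment.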
  • Gene Expression Regulation by Engineered CTCF Variants Via Looping
  • CTCF has the capacity to alter gene expression through CTCF-cohesin-mediated looping of the genome. We wanted to determine whether the variant CTCFs could reproduce the gene regulatory capacity of wild-type CTCF when bound to the endogenous variant binding sites. To investigate gene expression changes, we focused on genes within a 1 Mb region of the variant binding sites. Eleven genes were identified within the 1 Mb region for Variant sites 1.1 and 1.2, and another 10 genes were identified for Variant sites 2.1 and 2.2. K562 cells were nucleofected with variant CTCFs fused to GFP that had the capacity to bind to Variant site 1 and Variant site 2. At 72 hours post nucleofection, RNA was isolated from GFP+ cells and gene expression levels were compared to those of RNA extracted from K562 cells nucleofected with a wild-type CTCF control. Of the 11 genes for Variant sites 1.1 and 1.2, 6 genes showed a change in gene expression relative to cells nucleofected with the wild-type CTCF control (JJ388) (FIG. 24A). Two of the 10 genes identified for Variant sites 2.1 and 2.2 had altered gene expression levels relative to the wild-type control (FIG. 24B). These data suggest that the variant CTCF proteins not only bind to their target sequences in human cells but also reproduce the biological role of native CTCF in regulating gene expression, possibly through the formation of loops or sub-TADs.
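Selecting candidate genes near a binding site can be sketched as a simple interval filter. The sketch assumes that "within a 1 Mb region" means no more than 1 Mb from the site on the same chromosome, which the source does not specify. The gene names and coordinates are placeholders, not real annotations.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_near_site(genes, chrom, site_pos, window=1_000_000):
    """Return names of genes lying within `window` bp of a binding site."""
    hits = []
    for g in genes:
        if g.chrom != chrom:
            continue
        # Distance is 0 when the site falls inside the gene body.
        distance = max(g.start - site_pos, site_pos - g.end, 0)
        if distance <= window:
            hits.append(g.name)
    return hits


# Placeholder annotations for illustration only.
GENES = [
    Gene("geneA", "chr1", 1_500_000, 1_520_000),
    Gene("geneB", "chr1", 3_000_001, 3_010_000),
    Gene("geneC", "chr2", 1_600_000, 1_610_000),
]
nearby = genes_near_site(GENES, "chr1", 2_000_000)
print(nearby)
```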
  • Next, we wanted to demonstrate that the CTCF variants could replicate the biological function of wild-type CTCF at a known CTCF binding site that creates an enhancer-promoter loop. MYC expression is maintained by a loop formed between a CTCF binding site ˜2 kb upstream of the transcriptional start site (TSS) of MYC and a CTCF binding site ˜1 kb downstream of the MYC TSS (reference 14). When CTCF is bound to both sites, cohesin links both CTCFs via CTCF's cohesin-interaction domain, creating a loop that maintains the expression of MYC. If one or both of the CTCF binding sites is disrupted, the CTCF-mediated loop is lost and there is a reduction in MYC expression (reference 14). Five cell lines were generated containing the 5 different variant binding site sequences (defined in FIG. 25) at the CTCF binding site ˜2 kb upstream of the MYC TSS. This was done in a K562 background transduced with a lentiviral construct expressing exogenous MYC via a phosphoglycerate kinase (PGK) promoter (exoMYC.K562) to compensate for any reduced cell fitness that a reduction of endogenous MYC expression may cause. An additional sixth cell line was generated in which point mutations were made to the CTCF binding site that should have no effect on wild-type CTCF binding, as indicated by results from the B2H reporter assay. RNA was isolated from the clonal cell lines homozygous for the variant binding sites, and endogenous MYC gene expression levels were assayed by reverse transcriptase real-time qPCR (RT-qPCR). Each of the isolated cell lines with a variant CTCF binding site demonstrated a reduced level of MYC expression, suggesting that the CTCF-mediated loop is disrupted (FIG. 25).
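The source does not state how relative MYC expression was calculated from the RT-qPCR data. A widely used option is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumptions of a reference gene measured in both samples and roughly 100% amplification efficiency. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample versus a control (2^-ddCt)."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt, edited cell line
    delta_control = ct_target_control - ct_ref_control   # dCt, control cell line
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles later in
# the edited line than in the control, with an unchanged reference gene.
fold = relative_expression(
    ct_target_sample=26.0, ct_ref_sample=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(fold)
```

A fold change below 1 corresponds to reduced expression relative to the control, and a value near 1 to unchanged expression.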
  • Based on this result, we wanted to see whether the variant CTCFs, when expressed in these modified cell lines, could bind to the engineered sites and restore MYC expression. HA-tagged wild-type CTCF and HA-tagged CTCF variants were expressed in the cell line that contained their matching variant binding site. Variants selected to bind to the G3 variant binding site were expressed in the G3_3 cell line, A3 variants in the A3_4 cell line, etc. HA-tagged wild-type CTCF was also tested in each of the variant cell lines for binding and for recovery of endogenous MYC expression. The level of endogenous MYC expression in exoMYC.K562 served as the wild-type control, as there is no alteration to the CTCF binding site upstream of the MYC TSS. CTCF variants expressed in the engineered cell lines recovered endogenous MYC expression, while expression of wild-type CTCF in these cell lines failed to recover MYC expression (FIGS. 26A-29). The same samples were analyzed for occupancy of the variant binding sites by wild-type CTCF or the variant CTCFs by ChIP-qPCR, enriching for CTCF-bound DNA fragments with CTCF or HA antibody. Wild-type CTCF had reduced occupancy of the variant binding sites, consistent with continued reduction of MYC expression, while variant CTCF proteins could bind to the variant site they were selected for as well as rescue MYC expression (FIGS. 26A-29). Together, these data suggest that we have evolved CTCF variants that can bind to novel sequences and still interact with cohesin to form loops that maintain gene expression profiles.
  • Tables
  • Amino acid sequences of variants selected on different CTCF binding sites. All amino acid sequences are listed from N to C terminus. Colonies growing at the highest stringency of selection were scraped off and pooled, and plasmid encoding the zinc finger array was isolated and deep sequenced. The number of reads reflects how prominent the variant was in the population pooled from selections performed in triplicate.
  • TABLE 1
    ZF7 selection on C:G change at nt 2 of core motif in CBS. Sequences reflect position 2 through 6.
    SEQ ID NO:  Sequence  # reads
    8 DHLQT 2981
    15 EHLVV 2413
    155 DHLNT 1517
    16 DHLRT 1442
    13 EHLKV 1434
    192 KDLVV 1357
    193 DHLQA 1114
    194 DHLLV 1076
    195 DHLLT 881
    196 EHLTV 803
    197 STLME 786
    17 DHLAT 777
    9 EHLNV 736
    12 DHLQV 574
    198 DHLKT 541
    199 EHLKE 517
    200 DHLLE 506
    201 EHLRV 503
    202 STLRE 498
    203 DHLMV 431
    204 DHLKV 427
    205 DHLRV 394
    206 DHLNV 389
    114 DHLLA 380
    207 DHLKE 368
    208 DHLNE 330
    11 EHLRE 330
    209 STLLE 323
    210 DHLMA 305
    211 KDLTV 296
    212 DHLVT 284
    213 AHLNV 278
    214 AHLTV 268
    215 HTLME 245
    216 DHLRA 237
    217 DHLAV 221
    218 HHLAE 221
    219 GHLMD 207
    220 DHLST 199
    221 EHLMV 197
    222 AHLVV 196
    223 EHLAV 192
    224 HTLAE 187
    225 STLQE 181
    226 DHLAE 167
    227 AHLQE 163
    228 SSLNE 158
    229 GHLNV 155
    230 EHLVE 144
    231 DHLME 143
    232 DHLRE 134
    233 AHLNA 120
    234 HTLVE 120
    235 STLKE 112
    236 EHLQV 107
    237 GTLME 106
    238 HHLAV 102
    239 HSLME 101
    240 HSLTE 97
    241 EHLMA 97
    242 DHLHT 94
    10 AHLQV 94
    243 DHLTV 93
    244 EHLIV 90
    245 SGLNE 89
    246 AHLLV 85
    247 EHLLV 84
    248 VKLKI 83
    249 DHLQE 80
    250 HTLTE 77
    251 STLHE 76
    252 DHLVV 76
    253 AGLAL 70
    254 STLND 69
    255 DHLKA 68
    256 KDLTQ 66
    257 DKLMN 66
    258 GTLRE 66
    259 GHLTV 66
    260 RLLTA 65
    261 SSLRE 63
    262 HTLKE 62
    263 GHLAV 60
    264 RLLAQ 58
    265 KDLAV 57
    266 EHLQE 57
    267 SHLNV 57
    268 AGLPI 57
    269 TTLME 56
    90 AHLRV 56
    270 AHLMV 55
    271 EHLME 55
    272 EHLQT 55
    273 EVLNR 55
    274 HHLVV 54
    275 KDLSV 54
    276 RHLVM 53
    277 THLNE 50
    278 RDLRT 49
    279 LLLGS 49
    280 MVLGN 48
    281 KTLIE 47
    282 AHLGV 46
    283 SGLLA 46
    284 DHLHV 45
    285 EHLNT 45
    286 STLLQ 44
    287 AHLKV 44
    288 AHLAV 42
    289 TNLID 41
    290 GTLNE 41
    291 QVLTQ 40
    292 SSLME 39
    293 GHLVE 38
    294 HSLLE 38
    295 SGLLE 38
    296 GGLLE 36
    297 STLRV 36
    298 HTLAD 35
    299 SHLME 35
    300 DHLAI 35
    301 EHLLA 35
    302 HNLLL 34
    303 PHLVV 34
    304 KALGT 33
    305 PHLVI 31
    306 VLLII 30
    307 HHLRE 29
    308 GALRM 29
    309 RGLHE 29
    310 AHLLE 28
    311 EHLKA 28
    312 DTLLV 27
    313 EHLRT 26
    314 SSLRD 24
    156 EHLQA 23
    315 EHLAT 23
    316 SGLGE 22
    317 ATLQE 22
    318 DHLSA 22
    101 SNLLV 22
    319 SHLLV 21
    320 KDLMV 21
    321 DHLQQ 20
    322 ATLME 20
    323 GHLQA 20
    324 RTLTE 20
    325 RRLAH 20
    326 DTLQA 20
    327 GHLEV 19
    328 HQLKL 19
    329 EHLLT 19
    330 DGLRT 18
    331 THLRP 18
    132 DNLAT 18
    332 EHLNA 17
    333 STLVV 17
    135 DNLMT 17
    334 DTLLA 17
    335 STLDE 16
    336 KDLVA 15
    337 AHLHA 15
    338 KDLQV 15
    339 HHLTV 15
    340 SGLLD 15
    341 ANLME 14
    129 DNLLV 14
    342 EHLKT 13
    343 GSLAI 13
    344 EHLSV 13
    345 EHLNE 13
    346 EHLVI 13
    347 KDLKV 13
    348 EGLGT 13
    130 DNLQT 12
    349 STLMS 12
    350 AHLMM 12
    351 IKLDG 12
    352 VLLGA 12
    353 PGLSA 12
    354 AELNR 12
    355 HQLVI 12
    356 GHLVV 12
    357 PHLLV 11
    358 PRLAL 11
    359 DHLNA 11
    360 KDLDV 11
    361 AHLHV 11
    362 RVLGG 11
    363 AHLQA 11
    364 RQLRT 10
    365 AHLQT 10
    100 DNLLA 10
    151 EHLAE 10
    366 EHLAM 10
    367 DRLSI 10
    368 GGLGA 10
    369 GHLNT 10
    370 AHLRT 10
    371 DTLRV 18
    372 MSLRG 9
    373 DHLTI 9
    374 THLIV 9
    375 DTLMA 9
    376 MKLQE 9
    377 TALGT 9
    378 GHLLV 9
    379 GQLAI 8
    380 ANLES 8
    381 AHLNT 8
    382 EHLLE 8
    383 SNLTV 8
    384 STLLV 8
    385 STLMV 8
    386 GTLVS 7
    387 DNLKT 7
    388 GHLQT 7
    128 DNLLT 7
    389 EHLVT 7
    390 GALRE 7
    391 SSLAE 7
    392 DTLRQ 7
    393 KALLG 7
    394 AMLNP 6
    395 DTLHQ 6
    396 DNLLQ 6
    397 EHLAH 6
    398 AHLKE 6
    399 ATLAE 6
    400 EHLMD 6
    401 STLHM 6
    402 DTLAV 6
    403 DHLVE 6
    404 PTLGE 6
    405 KGLPL 6
    406 DTLLQ 6
    407 AHLNE 6
    408 AHLAE 6
    409 GHLKV 6
    410 SGLQV 5
    411 HHLLV 5
    412 EPLLP 5
    413 DNLAV 5
    414 AHLLT 5
    415 AHLST 5
    133 DNLQA 5
    416 DNLRT 5
    417 DTLAL 5
    418 DTLQV 5
    419 EHLRA 5
    420 SNLQV 5
    421 KDLRV 5
    422 DTLAT 5
    423 DTLRA 5
    424 QHLRV 4
    425 SSLLE 4
    426 SNLMV 4
    427 SDLGG 4
    428 DNLHT 4
    429 DNLTA 4
    430 DTLMV 4
    431 EHLST 4
    432 DTLSV 4
    102 DNLMA 4
    433 EHLVM 4
    434 STLAE 4
    435 KDLAE 4
    436 SSLNV 4
    437 SSLLV 4
    438 AHLKT 4
    439 AHLRE 4
    440 KDLLV 4
  • TABLE 2
    ZF7 selection on C:T change at nt 2 of core motif in CBS. Sequences reflect position 2 through 6.
    SEQ ID NO:  Sequence  Read #
    312 DTLLV 3772
    334 DTLLA 1720
    406 DTLLQ 1681
    326 DTLQA 1340
    371 DTLRV 1048
    418 DTLQV 715
    423 DTLRA 643
    375 DTLMA 620
    430 DTLMV 538
    402 DTLAV 451
    422 DTLAT 406
    441 DSLLV 373
    432 DTLSV 359
    442 DTLLM 339
    392 DTLRQ 334
    443 DTLLI 306
    444 DTLTQ 300
    434 STLAE 269
    445 DTLAA 268
    395 DTLHQ 246
    446 DTLSA 227
    447 DTLKA 216
    384 STLLV 213
    448 STLQQ 201
    449 DTLQQ 200
    450 DTLLL 194
    451 DTLMQ 189
    225 STLQE 189
    452 DTLNA 180
    453 STLLA 176
    454 DTLKV 163
    455 STLNA 162
    456 DTLRE 161
    457 DTLTA 152
    458 DTLQD 146
    459 DTLVA 137
    460 DTLLS 123
    461 STLTQ 122
    462 DSLLA 116
    463 DTLRT 116
    464 DTLQI 115
    465 DTLMN 114
    466 STLSE 114
    467 SSLQV 112
    468 TNLAV 109
    469 DTLVV 108
    470 DTLHA 107
    471 DTLMT 107
    437 SSLLV 107
    209 STLLE 107
    472 DSLRV 106
    473 DTLAE 105
    474 STLNV 105
    475 DTLRN 101
    476 DTLNV 100
    477 DTLRD 99
    478 DSLAV 94
    479 DTLVQ 94
    480 DTLQE 93
    481 STLLD 92
    482 DTLTH 89
    483 SSLND 88
    484 STLTV 88
    385 STLMV 87
    485 DTLML 86
    286 STLLQ 85
    202 STLRE 85
    486 STLQA 84
    487 DTLLD 83
    488 DTLKQ 82
    489 DTLLT 81
    417 DTLAL 76
    490 DTLII 75
    491 DTLLN 75
    492 DSLLQ 73
    493 STLEQ 73
    494 DTLGV 71
    495 DVLRE 67
    496 STLSA 66
    497 DSLSV 65
    498 DTLLE 63
    499 STLAA 63
    500 DTLKI 62
    501 DTLKM 62
    502 DTLQN 60
    197 STLME 60
    503 TTLMT 60
    504 TTLAE 59
    505 STLTE 58
    506 VELVQ 57
    507 TTLNQ 56
    508 DTLMI 54
    509 TTLMD 54
    510 STLMA 51
    511 DVLLA 50
    512 DVLLT 49
    235 STLKE 49
    513 TTLNE 49
    514 MTLPT 48
    292 SSLME 48
    251 STLHE 48
    515 HTLVV 47
    269 TTLME 46
    516 ATLTQ 45
    517 STLAS 45
    333 STLVV 44
    425 SSLLE 43
    518 SSLVE 42
    519 DALQA 41
    520 DVLDA 41
    521 GSLMQ 41
    522 DTLTM 40
    523 STLAQ 39
    524 STLMI 38
    525 DTLAM 37
    526 DTLHT 37
    527 DTLQL 37
    528 DSLKQ 36
    529 DSLRA 36
    530 STLHV 35
    531 STLMQ 35
    532 DGLMA 34
    533 DTLRL 34
    534 SSLLT 34
    535 DSLQA 33
    536 DTLRI 33
    537 STLGE 33
    538 DALKE 32
    539 STLRA 31
    540 DTLHH 30
    541 DTLRG 30
    542 DTLRM 30
    543 DVLMT 30
    544 DTLEI 29
    228 SSLNE 29
    545 DTLHV 28
    546 GTLDE 28
    547 SSLAV 28
    548 STLKQ 28
    549 DTLMD 27
    550 GTLQT 27
    551 SSLVQ 27
    297 STLRV 27
    552 LMLMG 25
    553 STLRQ 25
    554 STLTA 25
    8 DHLQT 24
    555 DSLVA 23
    556 SSLRV 23
    557 DSLRE 22
    558 GRLQD 22
    559 MALQD 22
    560 STLLH 21
    561 STLVQ 21
    562 VRLTA 21
    563 AVLGD 20
    564 PILVT 20
    565 STLDD 20
    566 DSLMI 19
    567 STLID 19
    568 TKLDT 19
    569 ATLVA 18
    570 DTLIA 18
    571 DTLTE 18
    572 GTLNH 17
    573 STLAI 17
    282 AHLGV 16
    129 DNLLV 16
    574 DQLVQ 16
    575 MPLIL 16
    576 TTLHQ 16
    577 TTLQV 16
    578 ATLLE 15
    579 DVLHE 15
    580 ETLRA 15
    581 KVLRS 15
    101 SNLLV 15
    135 DNLMT 14
    582 DSLRQ 14
    583 DTLAN 14
    584 GTLNV 14
    585 HNLMV 14
    586 QTLQA 14
    587 RQLTT 14
    588 DTLSI 13
    589 DRLVG 12
    590 ETLRQ 12
    591 SSLGE 12
    592 SSLVV 12
    193 DHLQA 11
    128 DNLLT 11
    593 DTLME 11
    594 DTLTV 11
    595 DTLVG 11
    596 ETLKA 11
    597 GVLSQ 11
    598 LALMR 11
    599 RTLVE 11
    600 TTLLI 11
    601 TTLNV 11
    602 DTLSE 10
    391 SSLAE 10
    603 STLAV 10
  • TABLE 3
    ZF7 selection on C:A change at nt 2 of core motif in CBS. Sequences reflect position 2 through 6.
    SEQ ID NO:  Sequence  # read
    100 DNLLA 2659
    101 SNLLV 2616
    135 DNLMT 2555
    130 DNLQT 1983
    129 DNLLV 1945
    128 DNLLT 1922
    132 DNLAT 1457
    604 DNLRA 1117
    102 DNLMA 1038
    605 DNLMV 901
    606 DNLQV 845
    607 DNLQQ 841
    396 DNLLQ 813
    387 DNLKT 582
    133 DNLQA 571
    420 SNLQV 565
    608 DNLRQ 494
    426 SNLMV 459
    383 SNLTV 458
    609 DNLNT 412
    428 DNLHT 389
    610 SNLVV 349
    611 SNLQQ 334
    429 DNLTA 323
    612 DNLLS 322
    413 DNLAV 316
    416 DNLRT 309
    613 DNLTT 300
    614 DNLAA 295
    615 SNLLA 295
    616 SNLLQ 278
    617 SNLAV 257
    618 DNLNA 240
    619 DNLGT 240
    103 DNLRV 239
    620 DNLKA 167
    621 DNLMQ 156
    622 DNLKV 148
    623 SNLNV 132
    624 SNLMA 128
    625 SVLQD 113
    626 DNLQS 110
    627 DNLSA 105
    628 DNLAQ 103
    629 DNLMS 98
    630 DNLSQ 95
    631 DNLNV 87
    632 DNLGV 87
    633 SNLLT 87
    634 DNLIA 83
    635 DNLNQ 83
    636 SNLQT 80
    637 SNLRV 79
    638 SNLIV 79
    639 DNLSV 74
    640 SNLQA 60
    641 SNLLL 57
    642 SNLDV 56
    643 DNLVQ 54
    644 SNLLI 54
    645 TGLAL 52
    646 SNLMQ 51
    647 DQLKI 40
    648 GDLGT 40
    649 SNLKV 39
    650 VPLVD 38
    651 DNLRI 37
    652 DNLLI 37
    653 TNLDV 36
    654 HDLKI 35
    655 DNLVV 35
    312 DTLLV 32
    656 DNLTV 31
    657 DNLVT 31
    658 SNLAQ 30
    659 DNLIV 28
    660 SNLMT 27
    465 DTLMN 25
    661 SNLTQ 23
    662 EILRI 23
    663 IGLEA 22
    664 HRLGG 22
    8 DHLQT 21
    665 DNLST 20
    666 MRLHV 19
    667 SNLTT 18
    668 SNLGV 16
    669 SNLAT 16
    15 EHLVV 16
    670 ANLMV 14
    671 HVLVG 14
    672 SNLRA 13
    673 HNLQL 12
    674 DNLVA 12
    675 SNLTA 12
    676 KGLRM 12
    334 DTLLA 12
    677 PMLGV 11
    678 GVLVA 11
    679 DNLQD 11
    680 MKLGT 11
    406 DTLLQ 11
  • TABLE 4
    ZF7 selection on A:T change at nt 3 of core motif in CBS. Sequences reflect position −1 to 3.
    SEQ ID NO:  Sequence  # Reads
    173 RKHD 4641
    175 RKAD 1938
    174 RRSD 1299
    681 RRHD 868
    682 RKTD 182
    683 NVSM 146
    684 RQSD 76
    685 RKND 69
    686 SENV 69
    687 VDHR 60
    688 AQIV 58
    689 KTPH 56
    690 PKIV 51
    691 GAEP 42
    692 MLVE 40
    693 VVGN 40
    694 KGPE 36
    695 GKVM 33
    696 TEPG 33
    697 TPHN 32
    698 MPGG 31
    699 DLEK 28
    700 GTDN 27
    701 ISRL 25
    702 ATGL 21
    703 ASNP 19
    704 GAPT 17
    705 HSPN 17
    706 RPVA 16
    177 RKDD 6
    707 MLVD 4
    708 RHRK 3
    709 RKHV 3
    710 RKQD 3
    711 RKSD 3
    712 DHHT 2
    713 GKHD 2
    714 MKAD 2
    715 RKAE 2
    716 RRAD 2
    717 APIG 1
    718 AQNR 1
    719 DMDA 1
    720 EAPM 1
    721 EEMM 1
    722 EPIR 1
    723 GALE 1
    724 GENV 1
    725 GKAD 1
    726 GKVD 1
    727 GPLA 1
    728 GRIE 1
    729 IEKL 1
    730 KAAS 1
    731 KEEH 1
    732 LKVD 1
    733 LLVE 1
    734 LMTQ 1
    735 MASL 1
    736 MGIG 1
    737 MPGD 1
    738 MSLG 1
    739 NDMT 1
    740 NMHT 1
    741 NRIV 1
    742 PENA 1
    743 QKHD 1
    744 QVPD 1
    745 RASD 1
    746 REHD 1
    747 RGHD 1
    748 RKHA 1
    749 RKHY 1
    750 RKLD 1
    751 RKPD 1
    752 RKVD 1
    753 RKYD 1
    754 RMSD 1
    755 RRLD 1
    756 RRND 1
    757 RRRD 1
    758 RRSG 1
    759 RWHD 1
    760 SHRL 1
    761 SQHV 1
    762 SSHD 1
    763 TTHV 1
    764 VHHV 1
    765 WKAD 1
    766 WKHD 1
  • TABLE 5
    ZF7 selection on A:G change at nt 3 of core motif in CBS. Sequences reflect position −1 to 3.
    SEQ ID NO:  Sequence  Read #
    174 RRSD 2997
    173 RKHD 2731
    175 RKAD 1867
    177 RKDD 667
    682 RKTD 475
    767 HADA 411
    710 RKQD 376
    768 RKWD 296
    745 RASD 265
    681 RRHD 169
    685 RKND 126
    754 RMSD 40
    769 RKGD 5
    743 QKHD 3
    757 RRRD 3
    711 RKSD 3
    752 RKVD 2
    180 QALL 2
    753 RKYD 2
    756 RRND 2
    720 EAPM 1
    770 RRCD 1
    771 MLPA 1
    772 RATD 1
    773 RKDV 1
    774 KKPV 1
    775 GEHG 1
    776 HPVR 1
    777 RQHD 1
    778 RMMQ 1
    779 RRGD 1
    780 GREV 1
    781 REQD 1
    782 DRDM 1
    783 SKHD 1
    784 RLSD 1
    785 VPTV 1
    786 HKWD 1
    787 KKND 1
    788 RRSE 1
    749 RKHY 1
    789 READ 1
    790 RNTD 1
    791 MVRA 1
    792 RKED 1
    793 KTMG 1
    794 NEPN 1
    795 RGSD 1
    796 RKRD 1
    797 RWSD 1
    798 TPLP 1
    799 RKAN 1
    800 RKAY 1
    801 QLPL 1
    709 RKHV 1
    802 QGTS 1
    803 DTMV 1
    804 LKWD 1
    805 MNTL 1
    806 HADV 1
    697 TPHN 1
    750 RKLD 1
    807 GRAH 1
    704 GAPT 1
    808 MKHD 1
    809 HEDA 1
    712 DHHT 1
    810 RMLS 1
    811 WRSD 1
    812 DDAT 1
    735 MASL 1
    730 KAAS 1
  • TABLE 6
    ZF7 selection on A:C change at nt 3 of core motif in CBS. Sequences reflect position −1 to 3.
    SEQ ID NO:  Sequence  Read #
    173 RKHD 9
    813 DTEN 6
    775 GEHG 5
    814 STKN 5
    815 NIEI 5
    801 QLPL 4
    780 GREV 4
    712 DHHT 4
    782 DRDM 4
    816 MVIN 4
    817 VPDT 4
    818 NIVP 4
    819 MVPS 4
    820 PNHP 4
    821 KTDV 4
    794 NEPN 3
    760 SHRL 3
    736 MGIG 3
    822 HIKM 3
    823 ILQI 3
    741 NRIV 3
    824 IVMQ 3
    825 QTNS 3
    826 ENMD 3
    827 TVER 3
    828 THDR 3
    829 IRSP 3
    771 MLPA 3
    721 EEMM 2
    830 ARIA 2
    785 VPTV 2
    831 EELI 2
    832 KPLR 2
    812 DDAT 2
    833 NRLS 2
    834 PTLR 2
    835 MHIL 2
    836 GGGP 2
    837 MVEN 2
    719 DMDA 2
    838 IVAT 2
    839 TLDR 2
    840 MEPL 2
    841 DTGV 2
    842 TSRS 2
    843 VLSI 2
    844 STVQ 2
    845 GPAQ 2
    846 VEQP 2
    847 MTKK 2
    848 PLIM 2
    802 QGTS 2
    849 AMTV 2
    850 SPMR 2
    851 EPNV 2
    735 MASL 2
    852 MQIN 2
    853 ALDE 2
    728 GRIE 2
    854 ALEH 2
    855 REKD 2
    856 ELLA 2
    857 GVAR 2
    858 VDTL 2
    859 GHEN 2
    730 KAAS 2
    860 ELES 2
    861 DPDT 2
    862 SLEL 2
    863 TMNV 2
    764 VHHV 2
    864 IQPV 2
    865 MLQE 1
    866 VMTV 1
    867 MVEE 1
    868 VARP... 1
    869 KAIG 1
    870 DRSM 1
    871 KNSI 1
    872 DDVS 1
    873 KPQP 1
    874 PHVP 1
    875 DTLQ 1
    876 KLGT 1
    877 IDPH 1
    878 HPNT 1
    879 KSRG 1
    880 RQMA 1
    881 KKEN 1
    882 QVLD 1
    722 EPIR 1
    883 RRQM 1
    798 TPLP 1
    884 ILKN 1
    885 HQMK 1
    179 ELLN 1
    886 MDGG 1
    887 AAGS 1
    888 STVV 1
    889 PARA 1
    890 ALQG 1
    891 SAPG 1
    892 PVLN 1
    742 PENA 1
    893 TSLL 1
    731 KEEH 1
    894 HLDV 1
    895 IHIR 1
    896 SVTL 1
    897 VKDR 1
    898 KMTI 1
    899 AGEM 1
    900 GDSE 1
    901 QPVK 1
    902 KVEA 1
    903 EQER 1
    729 IEKL 1
    904 GHHV 1
    905 GMHL 1
    906 RLRR 1
    907 ATIR 1
    908 RMDI 1
    909 SVIH 1
    910 MDIG 1
    911 LART 1
    912 RLMA 1
    913 RQPP 1
    914 MTMT 1
    915 EDTR 1
    739 NDMT 1
    916 MRGR 1
    917 ELHA 1
    918 TNGQ 1
    919 VNLT 1
    920 MHIR 1
    921 MLLQ 1
    922 GRGE 1
    923 NLRG 1
    924 HIML 1
    807 GRAH 1
    805 MNTL 1
    763 TTHV 1
    793 KTMG 1
    925 MTSV 1
    926 RLSM 1
    803 DTMV 1
    720 EAPM 1
    927 DMGM 1
    928 MLMM 1
    929 LMEM 1
    930 QAVS 1
    931 SRVL 1
    932 DEDP 1
    933 SGDR 1
    934 MMNC 1
    935 NIGM 1
    936 MVQR 1
    937 APHR 1
    938 LDAG 1
    939 RLAN 1
    940 MKGS 1
    941 KKLV 1
    942 VNQE 1
    943 ILKQ 1
    944 PVIP 1
    945 VESL 1
    946 IKQN 1
    947 EDNI 1
    948 THRD 1
    949 IPAG 1
    950 GLNH 1
    951 VDGR 1
    181 PHRM 1
    952 RTGA 1
    953 VSPD 1
    954 KVGD 1
  • TABLE 7
    ZF6 selection on C:T change at nt 5 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO:  Sequence  Read #
    955 GHMRR 29
    956 GHMNR 23
    34 EHMRR 23
    957 THMRR 19
    33 THMKR 17
    126 EHMNR 17
    958 GHMKR 12
    127 EHMAR 11
    959 EHMQR 10
    147 SHMRR 10
    960 SAMRR 9
    961 ENMGR 8
    962 SHMKR 8
    35 THMNR 7
    963 NHMRR 7
    964 EGMRR 7
    965 GNMGR 7
    146 SHMNR 6
    966 NGMRI 6
    967 EGMAR 6
    968 ESMRR 6
    969 GHMSR 5
    970 EGMHR 5
    971 TAMRR 5
    972 TNMQR 5
    973 VNMRR 5
    974 AHMKR 4
    975 NGMTA 4
    976 DGMRR 4
    977 GHMTR 4
    978 EHMSR 4
    123 EHMKR 4
    979 GSMRR 4
    980 TNMLR 4
    981 NHMKR 4
    982 ENMLR 4
    983 SPMGV 3
    984 TNMGR 3
    985 SSMAR 3
    986 GGMRR 3
    987 GGMKL 3
    988 SGMVR 3
    989 EHMHR 3
    990 THMSR 3
    991 GSMKI 3
    992 EKMKE 3
    993 NGMAR 3
    994 QNMVR 3
    995 DNMRR 3
    996 ENMER 3
    997 NSMRR 3
    998 SGMKR 3
    999 ANMQR 3
    1000 GHMQR 3
    1001 ANMGR 3
    1002 DNMVR 3
    1003 QAMRE 2
    1004 GNMSR 2
    1005 ESMQR 2
    1006 TPMKV 2
    1007 SNMGR 2
    1008 GAMRI 2
    1009 ANMNR 2
    1010 DNMMR 2
    1011 GSMKM 2
    31 EHMGR 2
    1012 GNMAQ 2
    1013 EGMKG 2
    1014 SSMKI 2
    1015 TSMRR 2
    1016 DGMKR 2
    1017 DNMAR 2
    1018 SSMRR 2
    1019 GNMMR 2
    185 NAMRG 2
    1020 THMKL 2
    1021 ENMAR 2
    1022 NNMVR 2
    1023 TGMKR 2
    1024 TAMKR 2
    1025 AHMNR 2
    1026 QNMGR 2
    1027 TNMVR 2
    1028 NHMNR 2
    1029 EHMTR 2
    1030 GNMIR 2
    1031 SGMRR 2
    1032 NHMSR 2
    1033 GGMRL 2
    1034 SPMKV 2
    1035 TNMRR 2
    1036 GNMRE 2
    1037 ENMMR 2
    1038 THMER 1
    1039 QKMRT 1
    1040 GAMRR 1
    1041 TPMEV 1
    1042 GGMRE 1
    1043 GDMDR 1
    1044 GAMRA 1
    1045 PNMSR 1
    1046 EGMGR 1
    1047 EGTHR 1
    1048 QSMRE 1
    1049 THMKG 1
    1050 NNMGR 1
    1051 GHMNS 1
    1052 IDMKG 1
    1053 ESMTR 1
    1054 SHMKI 1
    1055 HNMMR 1
    184 SNMVR 1
    1056 TAMKV 1
    1057 DSMKR 1
    1058 SNMAR 1
    1059 ESMGR 1
    1060 EAMRR 1
    1061 GNMVR 1
    1062 ANMRR 1
    1063 DGMKI 1
    1064 SHMHR 1
    1065 GAMKE 1
    1066 ESMRE 1
    1067 GSMLR 1
    1068 THMEV 1
    1069 TSMGR 1
    1070 EAMSK 1
    1071 NAMRQ 1
    1072 EGMRT 1
    1073 SHMQR 1
    1074 NGMKR 1
    1075 ESMKE 1
    1076 ANMHR 1
    1077 DHTKR 1
    1078 NGMRE 1
    1079 GSMRA 1
    1080 EGMNQ 1
    1081 GGMRM 1
    1082 PNMKR 1
    1083 NGMKI 1
    1084 SNMLR 1
    1085 SNMRR 1
    1086 SHMTR 1
    1087 TGMRR 1
    1088 SGMRI 1
    1089 DNMGR 1
    183 EGMTR 1
  • TABLE 8
    ZF6 selection on C:A change at nt 5 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO:  Sequence  Read #
    965 GNMGR 873
    968 ESMRR 784
    964 EGMRR 772
    967 EGMAR 672
    970 EGMHR 648
    994 QNMVR 597
    980 TNMLR 556
    998 SGMKR 486
    975 NGMTA 479
    979 GSMRR 453
    1003 QAMRE 452
    961 ENMGR 434
    960 SAMRR 431
    993 NGMAR 401
    1079 GSMRA 390
    996 ENMER 389
    1007 SNMGR 378
    1046 EGMGR 376
    1017 DNMAR 368
    1063 DGMKI 347
    999 ANMQR 342
    1040 GAMRR 322
    973 VNMRR 297
    997 NSMRR 295
    1005 ESMQR 293
    1018 SSMRR 289
    1087 TGMRR 289
    1009 ANMNR 279
    1044 GAMRA 275
    183 EGMTR 273
    126 EHMNR 265
    1004 GNMSR 263
    971 TAMRR 260
    972 TNMQR 257
    1010 DNMMR 253
    976 DGMRR 241
    1026 QNMGR 240
    1082 PNMKR 228
    1089 DNMGR 226
    1090 ETMRR 225
    1091 DNMKI 224
    1014 SSMKI 224
    995 DNMRR 221
    1053 ESMTR 214
    1042 GGMRE 214
    984 TNMGR 211
    1031 SGMRR 204
    986 GGMRR 203
    1022 NNMVR 201
    1092 TNMER 197
    1083 NGMKI 195
    1021 ENMAR 194
    1059 ESMGR 194
    1019 GNMMR 193
    1036 GNMRE 193
    1002 DNMVR 187
    1093 TNMAR 186
    34 EHMRR 182
    1066 ESMRE 181
    1027 TNMVR 181
    1015 TSMRR 175
    988 SGMVR 173
    1024 TAMKR 170
    1030 GNMIR 169
    985 SSMAR 163
    991 GSMKI 159
    1094 EHMKQ 149
    982 ENMLR 149
    1016 DGMKR 144
    1012 GNMAQ 139
    1095 SGMQR 138
    1084 SNMLR 133
    1061 GNMVR 130
    1001 ANMGR 129
    1096 HNMRR 129
    1050 NNMGR 128
    1081 GGMRM 127
    1033 GGMRL 124
    1097 QNMER 124
    1057 DSMKR 122
    1035 TNMRR 122
    1008 GAMRI 115
    1058 SNMAR 115
    1056 TAMKV 114
    1098 VSMKR 113
    966 NGMRI 112
    1099 TNMMR 110
    1013 EGMKG 109
    1071 NAMRQ 108
    123 EHMKR 107
    1032 NHMSR 106
    1100 GAMRM 102
    1070 EAMSK 100
    1101 TAMNQ 99
    1102 ESMSR 96
    1103 GGMNQ 95
    1048 QSMRE 95
    185 NAMRG 92
    1104 GGMKR 89
    184 SNMVR 84
    1105 ESMRL 83
    1075 ESMKE 81
    1106 SAMRE 80
    1107 GGMQM 76
    1023 TGMKR 73
    1037 ENMMR 69
    1108 NSMKM 69
    1109 ESMKN 66
    1072 EGMRT 64
    987 GGMKL 64
    1110 TNMSR 63
    1111 DAMRV 61
    1112 GNMER 60
    1113 GAMRE 59
    182 GNMAR 54
    1114 EGMRK 53
    1011 GSMKM 50
    1115 SGMAR 58
  • TABLE 9
    ZF6 selection on C:G change at nt 5 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO:  Sequence  # Reads
    34 EHMRR 3207
    955 GHMRR 2397
    957 THMRR 2025
    956 GHMNR 1880
    33 THMKR 1415
    35 THMNR 1341
    958 GHMKR 1208
    978 EHMSR 1038
    127 EHMAR 927
    962 SHMKR 771
    959 EHMQR 764
    126 EHMNR 676
    146 SHMNR 646
    147 SHMRR 579
    123 EHMKR 511
    1029 EHMTR 460
    963 NHMRR 436
    992 EKMKE 381
    32 DHMNR 374
    981 NHMKR 342
    983 SPMGV 322
    977 GHMTR 318
    1028 NHMNR 285
    1116 DHMKR 264
    969 GHMSR 258
    1025 AHMNR 247
    989 EHMHR 232
    974 AHMKR 227
    31 EHMGR 210
    1117 GHMHR 129
    1118 THMKV 129
    1020 THMKL 117
    1006 TPMKV 110
    1000 GHMQR 105
    1119 DHMRR 105
    990 THMSR 97
    1120 AHMRR 92
    1121 EKMRE 86
    1122 GHMAR 84
    1074 NGMKR 81
    1123 VHMNR 77
    1052 IDMKG 72
    1124 NHMTR 65
    1032 NHMSR 64
    964 EGMRR 57
    1125 THMTR 57
    1126 GHMKI 56
    1073 SHMQR 52
    1127 EHMVR 43
    1086 SHMTR 43
    1128 TKMKE 42
    1129 EHMER 38
    1130 THMKT 37
    1043 GDMDR 36
    1131 NGMRR 35
    1132 EPMLM 34
    1133 GHMVR 31
    1134 THMRT 29
    968 ESMRR 28
    1135 PHMKR 26
    1136 EHMRQ 24
    1137 EHMRT 23
    1138 DHMSR 22
    1039 QKMRT 22
    1139 ETMMI 21
    1034 SPMKV 21
    1140 SHMKL 21
    1141 TPMKL 21
    1142 GHMKM 20
    965 GNMGR 19
    1143 RQMLI 19
    1144 GHMRM 18
    1145 EGMKR 17
    1146 EHMKA 17
    1147 QIMPL 17
    1148 SHMKV 16
    1149 SGMNR 16
    1150 THMAR 16
    1151 QGMKR 15
    960 SAMRR 14
    1152 TKMEG 14
    1153 RPMGR 14
    1154 VHMRR 13
    1155 THMRV 13
    1068 THMEV 12
    1156 NHMKS 11
    1049 THMKG 11
    1157 AAMST 11
    980 TNMLR 11
    996 ENMER 10
    1158 GKMRD 10
    1159 THMEL 10
    998 SGMKR 10
    1160 TPMRV 10
    1161 SPMRV 10
    1104 GGMKR 10
    967 EGMAR 10
    1162 THMGV 9
    971 TAMRR 9
    995 DNMRR 9
    966 NGMRI 9
    961 ENMGR 9
    1163 MGMGR 8
    973 VNMRR 8
    1164 GKPSM 8
    975 NGMTA 8
    1165 SHMRV 8
    1166 SPMNR 8
    1167 SAMNR 8
    1168 SHMSR 8
    1169 NGMPR 8
    972 TNMQR 8
    1170 SPMRR 8
    994 QNMVR 8
    970 EGMHR 8
    1017 DNMAR 7
    1026 QNMGR 7
    1171 GHMGV 7
    1172 THMRL 7
    979 GSMRR 7
    1173 QHMKR 7
    1174 THMGR 7
    976 DGMRR 7
    1175 THMQR 6
    1038 THMER 6
    1021 ENMAR 6
    1176 RHMKR 6
    1018 SSMRR 6
    1177 EHMRV 6
    1178 KHMKR 6
    1179 QHMNR 6
    1180 RAMKV 6
    993 NGMAR 6
    984 TNMGR 6
    1002 DNMVR 6
    1066 ESMRE 6
    1181 GHMRV 6
    982 ENMLR 6
    185 NAMRG 5
    1014 SSMKI 5
    1182 TPMGV 5
    1040 GAMRR 5
    1183 GHMKV 5
    1184 RHMNR 5
    1009 ANMNR 5
    1185 TPMEL 5
    1022 NNMVR 5
    988 SGMVR 5
    1186 SPMKL 5
    1187 SPMKR 5
    1035 TNMRR 5
    1082 PNMKR 5
    1188 LAMEE 5
    1044 GAMRA 5
    1100 GAMRM 5
    1046 EGMGR 5
    1033 GGMRL 5
    1189 PGMMS 5
    986 GGMRR 5
    991 GSMKI 5
    1089 DNMGR 5
    183 EGMTR 4
    1190 SHMEV 4
    1004 GNMSR 4
    1191 GMMLT 4
    1003 QAMRE 4
    997 NSMRR 4
    1087 TGMRR 4
    1192 TPMKG 4
    1041 TPMEV 4
    1193 THMHR 4
    1194 SHMGV 4
    1063 DGMKI 4
    1016 DGMKR 4
    1195 THMKS 4
    1196 THMRG 4
    1197 GHMKT 4
    1015 TSMRR 4
    1019 GNMMR 4
    999 ANMQR 4
    1079 GSMRA 4
    1036 GNMRE 4
    1083 NGMKI 4
    1008 GAMRI 4
    1050 NNMGR 4
    1198 THMRS 4
    1013 EGMKG 4
    1199 NHMQR 4
    1007 SNMGR 4
    1200 SHMAR 3
    1061 GNMVR 3
    1201 EAMKR 3
    1202 GSMRE 3
    1203 SPMEL 3
    1204 AHMAR 3
    1057 DSMKR 3
    1205 PPMMV 3
    1027 TNMVR 3
    1096 HNMRR 3
    1206 KHMNR 3
    1030 GNMIR 3
    1084 SNMLR 3
    1207 TPMKR 3
    1208 QSMKR 3
    1209 RHMRR 3
    1075 ESMKE 3
    1210 DHMQR 3
    1056 TAMKV 3
    1211 AHMSR 3
    1212 EHMRS 3
    1213 AHMTR 3
    1214 GHINR 3
    1048 QSMRE 3
    1093 TNMAR 3
    1215 EYMRR 3
    1216 GQMNR 3
    1217 GHMKE 3
    1011 GSMKM 3
    1064 SHMHR 3
    1059 ESMGR 3
    1005 ESMQR 3
    1051 GHMNS 3
    1058 SNMAR 3
    1012 GNMAQ 3
    1023 TGMKR 3
    1031 SGMRR 3
    1001 ANMGR 3
    987 GGMKL 3
    1218 EHMMR 2
    1219 SHMRL 2
    1072 EGMRT 2
    1107 GGMQM 2
    1220 GGMKA 2
    1070 EAMSK 2
    1221 EHMPR 2
    1222 AHMKS 2
    1223 AHMQR 2
    1224 GHTRR 2
    1225 GHMKG 2
    1226 EPMKV 2
    1227 EHMAK 2
    1228 GYMNR 2
    1229 THMSS 2
    1230 GDMNR 2
    1231 GHMRT 2
    1094 EHMKQ 2
    1232 QRMGV 2
    1233 GSMRQ 2
    1234 DHMTR 2
    1235 VEMER 2
    1236 SPMEV 2
    1237 GPMKV 2
    1238 TPMER 2
    1239 EHMDR 2
    1240 EHVRR 2
    1091 DNMKI 2
    1241 GGMAR 2
    1242 HHMKR 2
    1243 GHMRS 2
    1244 EYMAR 2
    1245 KHMRR 2
    1246 EHMSS 2
    1247 TPMRL 2
    1248 GHMSL 2
    1249 VHMKR 2
    1250 GHTNR 2
    1251 GPMRT 2
    1081 GGMRM 2
    1092 TNMER 2
    1109 ESMKN 2
    1252 EQMRR 2
    1053 ESMTR 2
    1253 EHMKS 2
    1254 THMKM 2
    1065 GAMKE 2
    1024 TAMKR 2
    1010 DNMMR 2
    985 SSMAR 2
    1037 ENMMR 2
    1255 GTMKM 1
    1256 VHRIR 1
    1257 DHMNK 1
    1258 TPMNM 1
    1259 RQMII 1
    1260 EHMRW 1
    1261 SPMRL 1
    1262 GVMRA 1
    1263 GHMQV 1
    1264 GPMKL 1
    1265 IDMKR 1
    1266 PGMMG 1
    1267 KHMER 1
    1268 TPMNV 1
    1269 EHVQR 1
    1270 ENMKE 1
    1271 DHMKM 1
    1272 SHMNQ 1
    1108 NSMKM 1
    1273 GLMKR 1
    1274 APMNL 1
    1275 RHMSR 1
    1276 EHMRG 1
    1277 DWMRR 1
    1278 GHMRH 1
    1279 QNMHR 1
    1280 CHMRR 1
    1281 ERMRR 1
    1282 EHMKE 1
    1283 EPMKR 1
    1284 AHINR 1
    1285 SHMRT 1
    1286 PHMNR 1
    1287 AHMKV 1
    1288 THMGM 1
    1289 NGMKM 1
    1290 EKMKR 1
    1291 EHMIR 1
    1292 NNMHR 1
    1293 GNMNR 1
    1294 KRMQR 1
    1295 EKMRR 1
    1296 TQMKQ 1
    1297 EHMKV 1
    1298 DHMKE 1
    1299 EHTTR 1
    1300 SPMRM 1
    1301 GKMNR 1
    1302 TNMKR 1
    1303 THKRR 1
    1304 SQTNR 1
    1305 THLKR 1
    1306 SHMQS 1
    1307 THMSV 1
    1308 THMRH 1
    1309 DPMKV 1
    1310 PHMMS 1
    1311 SHVKR 1
    1102 ESMSR 1
    1312 SHMGL 1
    1313 TDMVA 1
    1314 PQMMS 1
    1315 KHMQR 1
    1316 EHMQL 1
    1317 EHISR 1
    1318 SHMKK 1
    1319 EQMTR 1
    1320 TPMRG 1
    1321 GHISR 1
    1322 GPMGV 1
    1323 GYMRR 1
    1324 GHMTV 1
    1325 APMIM 1
    1326 THINR 1
    1327 DHMMS 1
    1328 GHMKL 1
    1329 EKMEE 1
    1330 DPMRM 1
    1331 SHMKT 1
    1332 SPMGL 1
    1333 SPMGE 1
    1334 DHISR 1
    1335 TPMKQ 1
    1336 GHMKW 1
    1337 EHMCR 1
    1338 NNMKR 1
    1339 ESMKR 1
    1340 TEMLI 1
    1341 SHMKM 1
    1342 EHVNR 1
    1343 GHMER 1
    1344 NHMDR 1
    1345 GHMWR 1
    1346 THMKI 1
    1347 QKMKE 1
    1348 THMNK 1
    1349 AHMKQ 1
    1350 DHMGR 1
    1351 EGMKW 1
    1352 TQMKE 1
    1353 TRMRR 1
    1354 AHMGR 1
    1355 TRMKR 1
    1356 KNLTR 1
    1357 PEMMS 1
    1358 EHLTL 1
    1359 RHMKV 1
    1360 PGMIR 1
    1361 THTKR 1
    1362 EHIRR 1
    1363 THMPR 1
    1364 GKMKQ 1
    1365 GPMRV 1
    1366 AHVNR 1
    1367 EPMSR 1
    1368 PRMMV 1
    1369 ELMSR 1
    1090 ETMRR 1
    1370 SNMNR 1
    1371 TSMKT 1
    1372 GNMHR 1
    1373 TQMRR 1
    1374 SHMKG 1
    1375 DHMRT 1
    1376 EHMRE 1
    1377 SQLNR 1
    1378 SHMGR 1
    1379 GHKNR 1
    1380 THMNL 1
    1381 GYMKR 1
    1382 SNMKV 1
    1383 GHMRC 1
    1384 NHMRV 1
    1385 SGMKT 1
    1386 EHLRR 1
    1387 VPMRR 1
    1388 DLMKR 1
    1389 TSMKL 1
    1390 APMTV 1
    1105 ESMRL 1
    1391 EHMLM 1
    1392 EKMNR 1
    1393 THRRR 1
    1111 DAMRV 1
    1394 ERMNR 1
    1395 NHMHR 1
    1396 DLMNR 1
    1397 GQMQR 1
    1398 RGMMI 1
    1399 TQMKR 1
    1400 EHMGV 1
    1401 AHMTQ 1
    1402 TPMMV 1
    1403 GHKRR 1
    1404 GPMER 1
    1405 EPMQV 1
    1101 TAMNQ 1
    1406 GDMRR 1
    1407 EHLKR 1
    1408 DHMKK 1
    1409 GDIDR 1
    1410 GHMKK 1
    1411 TQMMI 1
    1412 SGMKA 1
    1413 TPMRM 1
    1414 SPMKG 1
    1415 KQLNR 1
    1416 NHMKT 1
    1417 TKMRE 1
    1098 VSMKR 1
    1418 EHMAV 1
    1419 EHMNS 1
    1420 DHMHR 1
    1421 AHMVR 1
    1422 GRMRR 1
    1423 GHMNV 1
    1424 GHMNL 1
    1425 GHVSR 1
    1426 GQMHR 1
    1427 EKMAR 1
    1428 NHMGL 1
    1429 EHMKG 1
    1430 EPMAL 1
    1431 AHLTR 1
    1432 KHMTR 1
    1433 GHMTM 1
    1434 EPMSG 1
    1435 NHMNM 1
    1436 GQMKR 1
    1437 TPMEG 1
    1438 KHMRV 1
    1439 SLMKR 1
    1440 DGMRN 1
    1441 RQMHI 1
    1442 EPMRV 1
    1113 GAMRE 1
    1443 SHMRM 1
    1444 EQMAR 1
    1445 SHMRS 1
    1446 EHMQV 1
    1447 EPMPM 1
    1448 IDMNR 1
    1449 TKMKQ 1
    1450 RQMLS 1
    1451 ATMML 1
    1452 PQMMI 1
    1453 NAMKI 1
    1454 GHMQS 1
    1455 EAMKK 1
    1456 THMRK 1
    1457 PHMRR 1
    1458 GHMKA 1
    1459 AHMNH 1
    1460 EYMSR 1
    1461 EHMAW 1
    1462 NHMGR 1
    1463 GHMKS 1
    1464 EHMRL 1
    1465 ENMTR 1
    1099 TNMMR 1
    1466 QAMRV 1
    1467 EHMQP 1
    1468 THMSM 1
    1469 IDMKE 1
    1047 EGTHR 1
    1055 HNMMR 1
    1045 PNMSR 1
    184 SNMVR 1
    1062 ANMRR 1
    1042 GGMRE 1
    1060 EAMRR 1
    1067 GSMLR 1
    1054 SHMKI 1
    1076 ANMHR 1
    1069 TSMGR 1
    1077 DHTKR 1
    1078 NGMRE 1
    1071 NAMRQ 1
    1080 EGMNQ 1
    1085 SNMRR 1
    1088 SGMRI 1
  • TABLE 10
    ZF6 selection on A:C change at nt 6 of core motif in CBS. Sequences reflect position −1 to 3.
    SEQ ID NO:  Sequence  # Reads
    37 HRES 6362
    36 MNES 5959
    1470 VKES 3337
    1471 LRDS 2986
    1472 HLES 1799
    1473 TRES 1285
    1474 MREA 648
    1475 VRET 601
    1476 MRET 284
    1477 LLES 222
    1478 MRTS 192
    1479 ERKS 122
    1480 IKES 111
    38 RPDT 95
    1481 VRVT 61
    1482 RNES 51
    1483 HVES 41
    98 RTET 40
    1484 LSHT 33
    1485 RPES 33
    1486 SRES 32
    1487 ENKA 25
    167 RADN 24
    1488 TREN 23
    1489 DSPQ 21
    1490 RRES 20
    1491 RGEN 17
    1492 VRES 17
    1493 HRDS 15
    1494 HREA 15
    1495 LRDT 15
    1496 RVES 15
    1497 EKKS 14
    1498 GRES 13
    1499 RMES 13
    1500 LRES 12
    1501 RTDN 12
    1502 HADH 12
    1503 VNES 12
    1504 ANES 12
    112 RTEN 12
    1505 RNEH 11
    1506 MNET 11
    1507 RLDT 11
    99 RADV 10
    1508 RLET 9
    1509 HRET 9
    HMR... 9
    1510 NRES 8
    1511 TGEA 8
    1512 TGES 8
    1513 RHET 8
    1514 MRES 7
    172 RNDT 7
    1515 LVES 7
    1516 VGSS 7
    40 RHDT 7
    1517 RIDT 7
    1518 VREA 6
    1519 HMES 6
    1520 ERKN 5
    1521 RPEA 5
    1522 TPPI 5
    1523 RREA 5
    1524 RQEN 5
    1525 VKDS 4
    1526 RKES 4
    1527 MLGL... 4
    1528 DRPN 4
    1529 RKEA 4
    1530 VMLGL... 4
    1531 TRDS 4
    1532 HLET 4
    1533 HLDS 4
    1534 PPAT 4
    1535 ENAS 4
    1536 VKET 4
    1537 GREA 4
    1538 TREA 4
    H... 4
    1539 IRDS 3
    1540 MNDS 3
    1541 LLDS 3
    1542 RTES 3
    1543 RPET 3
    1544 IDVH 3
    1545 RTEH 3
    1546 TRET 3
    1547 HGES 3
    1548 TMES 3
    1549 LRVS 2
    1550 PREA 2
    1551 EGKN 2
    1552 TSES 2
    1553 VKFGHIFCVLL*NV... 2
    1554 YRES 2
    1555 MKES 2
    39 RTDI 2
    1556 MNEG 2
    1557 MIES 2
    1558 QRES 2
    1559 MMEA 2
    1560 MNER 2
    RGS 2
    171 RTSS 2
    1561 RNAS 2
    1562 RTDT 2
    1563 TRVS 1
    1564 TFNV 1
    1565 VRVS 1
    1566 FRDS 1
    1567 IKER 1
    1568 RLEN 1
    1569 IKET 1
    1570 HRVS 1
    1571 DRKG 1
    1572 VKEC 1
    1573 MSEA 1
    1574 LRDR 1
    1575 INES 1
    1576 MSES 1
    1577 NLES 1
    1578 LQDS 1
    1579 HAPT 1
    HRR... 1
    1580 HRKA 1
    1581 LRGS 1
    1582 QSGT 1
    1583 HUES 1
    1584 ETGS 1
    SGT... 1
    1585 MLGF... 1
    1586 MNGS 1
    1587 MRED 1
    1588 TKES 1
    1589 RPDH 1
    1590 HRGS 1
    1591 GNES 1
    1592 LWDS 1
    1593 MRDS 1
    1594 IHES 1
    1595 LRDG 1
    1596 LRDC 1
    1597 MYES 1
    1598 RPNI 1
    1599 EGRS 1
    TRR... 1
    1600 RLES 1
    1601 LGLPTGR... 1
    1602 ARES 1
    1603 HLGS 1
    1604 HSES 1
    1605 PRTS 1
    1606 MNKS 1
    1607 RRDS 1
    1608 RREN 1
    1609 QGES 1
    1610 LREA 1
    1611 LLET 1
    1612 MREV 1
    1613 VEES 1
    1614 MNEA 1
    1615 RNEN 1
    1616 HWES 1
    1617 RHEA 1
    1618 MTES 1
    1619 GRDS 1
    1620 VSET 1
    1621 MRKA 1
    1622 EKES 1
    1623 ERKG 1
    VKR... 1
    1624 RNDH 1
    1625 VPDA 1
    TGR... 1
    1626 RKDA 1
    1627 SPDT 1
    1628 TTTL 1
    1629 RKDS 1
    1630 RRLT 1
    1631 RTSN 1
    LRT... 1
    1632 RQSA 1
    1633 ARFT 1
    1634 DRKS 1
    169 RRDT 1
    1635 RMDS 1
    1636 HRKS 1
    1637 GTTP 1
    1638 DKRN 1
    1639 RPERE... 1
    1640 SGDS 1
    TAG 1
    GR... 1
    T... 1
    1582 ...QSGT... 0
  • TABLE 11
    ZF6 selection on A:G change at nt 6 of core motif in CBS. Sequences reflect position −1 to 3.
    SEQ ID NO:  Sequence  # Reads
    38 RPDT 6216
    1482 RNES 2750
    98 RTET 1736
    1485 RPES 1565
    167 RADN 1412
    112 RTEN 973
    1499 RMES 860
    1507 RLDT 734
    1490 RRES 690
    1501 RTDN 588
    1496 RVES 584
    1505 RNEH 575
    1517 RIDT 557
    1521 RPEA 516
    1491 RGEN 467
    99 RADV 455
    172 RNDT 452
    1513 RHET 413
    1529 RKEA 340
    1508 RLET 297
    1543 RPET 263
    1523 RREA 252
    40 RHDT 247
    37 HRES 239
    1526 RKES 231
    1524 RQEN 199
    1641 RGSA 186
    171 RTSS 154
    39 RTDI 152
    1479 ERKS 123
    36 MNES 104
    1561 RNAS 90
    1608 RREN 88
    1642 RLDP 82
    169 RRDT 80
    1545 RTEH 80
    1626 RKDA 63
    1470 VKES 61
    1643 RRET 53
    1471 LRDS 44
    1562 RTDT 36
    1568 RLEN 35
    1564 TFNV 29
    1644 RADT 28
    1472 HLES 28
    1473 TRES 27
    1645 RKET 24
    1646 ATNM 23
    1647 RREH 22
    1648 RTDH 21
    1632 RQSA 21
    1542 RTES 20
    1649 RNET 20
    1650 RPDN 19
    1651 THVP 19
    1633 ARFT 18
    1487 ENKA 18
    1637 GTTP 17
    1652 EASN 16
    1653 RMTG 14
    1654 RTAA 14
    1589 RPDH 14
    1627 SPDT 14
    1489 DSPQ 14
    1497 EKKS 13
    1474 MREA 13
    1655 RNEP 12
    1656 VHDN 12
    1657 RKEN 12
    1658 RPYT 12
    1659 ROTS 11
    1660 RSGS 11
    1661 RPDS 10
    1475 VRET 10
    1662 MTGN 7
    1530 VMLGL... 7
    1615 RNEN 7
    1663 RGET 6
    1664 RKGS 6
    1600 RLES 5
    1476 MRET 5
    1624 RNDH 5
    1665 RNDS 5
    1666 STET 5
    1537 GREA 5
    1667 SNES 5
    1668 RPDA 4
    1669 RNER 4
    1670 RPEN 4
    1671 RVET 4
    1672 RAET 4
    1673 SHET 4
    1674 RSDT 4
    Q... 4
    1535 ENAS 3
    1675 LPDT 3
    1676 MMES 3
    1677 SPES 3
    1678 RMTN 3
    1679 RVEI 3
    1607 RRDS 3
    1680 RMTT 3
    1681 SADN 3
    1682 RAES 3
    1683 RPDV 3
    1684 RTEA 3
    1685 RHES 3
    1686 ROTA 3
    1478 MRTS 3
    1520 ERKN 3
    1687 RNRS 2
    1688 RAEA 2
    1689 RVDN 2
    1690 RNEG 2
    1691 RVEG 2
    1692 RAEN 2
    1693 RVDT 2
    1694 RDDN 2
    1695 RLEA 2
    1696 RPNT 2
    1697 RGES 2
    1698 SPEA 2
    1699 RTAG 2
    1700 MKEA 2
    1486 SRES 2
    1701 WNES 2
    1591 GNES 2
    1629 RKDS 2
    1628 TTTL 2
    1702 RVEN 2
    1635 RMDS 2
    1703 RMEH 2
    1630 RRLT 2
    1704 RKEH 1
    1705 ENRS 1
    1706 RNKS 1
    1707 RPGE... 1
    1708 RKDT 1
    1625 VPDA 1
    1709 RGEA 1
    1710 WIDT 1
    1711 RNEY 1
    1712 RADI 1
    1713 RADY 1
    1714 RTDD 1
    1715 RVDS 1
    1716 HTET 1
    1717 HTEN 1
    1718 SGEN 1
    1719 RTST 1
    1720 RAGR... 1
    1721 SNAS 1
    1722 RPGT 1
    1723 RAEH 1
    1724 MHDT 1
    1725 REDN 1
    1726 REEV 1
    RRR... 1
    1727 RMEW 1
    1728 RRER 1
    1729 RLDN 1
    RPT... 1
    1730 MVES 1
    1510 NRES 1
    1731 RIPA 1
    1732 RMEA 1
    1733 RHNT 1
    1734 RNSS 1
    1735 LPES 1
    1736 SLDP 1
    1737 STEN 1
    1738 RPKS 1
    ATS... 1
    1739 MIDT 1
    1740 PPDT 1
    1741 GLDA 1
    1742 RPEGE... 1
    1743 RHYT 1
    1744 RTEI 1
    1745 SPEN 1
    APR... 1
    LSL... 1
    1746 RHEN 1
    1747 REDV 1
    1748 RLKT 1
    1749 RIET 1
    1750 RIES 1
    1477 LLES 1
    1751 RPDI 1
    1752 MNDT 1
    1753 RLYT 1
    1504 ANES 1
    1754 RAYN 1
    1755 RADS 1
    1756 KNES 1
    1757 RVSA 1
    1758 RPED 1
    1759 RGEH 1
    1728 RRER... 1
    1760 LTET 1
    1761 LADN 1
    GTR... 1
    1762 RPER... 1
    1763 MLGLPGTR... 1
    1764 RPDP 1
    1765 QADV 1
    1599 EGRS 1
    RGR... 1
    1766 MADV 1
    1767 HTDN 1
    1768 RKEV 1
    1769 RADA 1
    1770 RDAS 1
    1771 MLDT 1
    1772 RPGS 1
    1773 RTEY 1
    1774 SLDT 1
    1775 RWES 1
    1776 ERKA 1
    1777 RIYT 1
    1778 TPVP 1
    1779 RQDA 1
    1780 RMER 1
    1631 RTSN 1
    LRT... 1
    1559 MMEA 1
    1481 VRVT 1
    1634 DRKS 1
    1488 TREN 1
    1636 HRKS 1
    1500 LRES 1
    1639 RPERE... 1
    1638 DKRN 1
    1781 VGTV 1
    1582 ...QSGT... 0
  • TABLE 12
    ZF6 selection on A:T change at nt 6 of core motif in CBS. Sequences reflect position −1 to 3.
    SEQ ID NO:  Sequence  # Reads
    37 HRES 7487
    1479 ERKS 7125
    1489 DSPQ 876
    1487 ENKA 801
    1497 EKKS 508
    1473 TRES 141
    38 RPDT 126
    1520 ERKN 120
    1537 GREA 112
    1535 ENAS 103
    1471 LRDS 95
    36 MNES 89
    1504 ANES 84
    1571 DRKG 73
    1634 DRKS 72
    1599 EGRS 69
    1584 ETGS 67
    1482 RNES 60
    1470 VKES 57
    1486 SRES 50
    98 RTET 42
    1625 VPDA 39
    1630 RRLT 37
    167 RADN 30
    1485 RPES 30
    1782 ERGG 27
    1472 HLES 25
    1638 DKRN 25
    112 RTEN 21
    1628 TTTL 19
    1636 HRKS 19
    1490 RRES 19
    1499 RMES 18
    1551 EGKN 17
    1623 ERKG 16
    1491 RGEN 16
    1705 ENRS 15
    1498 GRES 15
    1501 RTDN 15
    1507 RLDT 13
    1496 RVES 13
    1517 RIDT 13
    1510 NRES 13
    1505 RNEH 12
    1783 EKGT 11
    1513 RHET 11
    1474 MREA 10
    1543 RPET 9
    QGK 9
    1519 HMES 9
    1475 VRET 9
    99 RADV 9
    HMR... 9
    1784 ERNS 8
    1524 RQEN 8
    172 RNDT 8
    40 RHDT 8
    1493 HRDS 7
    171 RTSS 7
    1529 RKEA 7
    1785 ENNS 6
    1776 ERKA 6
    1523 RREA 5
    RGS 5
    QEK... 5
    1478 MRTS 5
    1500 LRES 4
    1526 RKES 4
    1786 HREN 4
    1521 RPEA 4
    1547 HGES 4
    39 RTDI 4
    1508 RLET 4
    1477 LLES 3
    1626 RKDA 3
    1476 MRET 3
    1590 HRGS 3
    1787 ERKR 3
    1561 RNAS 3
    1788 ERKI 3
    1789 ERRS 2
    1642 RLDP 2
    1604 HSES 2
    1790 YSPQ 2
    1791 EGKS 2
    1792 HRER 2
    QVK... 2
    1793 DRKA 2
    1794 ESGN 2
    QG... 2
    1795 ERES 2
    1796 HKES 2
    1797 ESKS 2
    1558 QRES 2
    1798 WKS 2
    1627 SPDT 2
    169 RRDT 2
    1527 MLGL 2
    1633 ARFT 2
    1562 RTDT 2
    1799 KRKS 1
    1652 EASN 1
    1800 TGDA 1
    1801 NRKS 1
    RGK 1
    1802 EKNS 1
    HRE... 1
    1803 QGKS 1
    1662 MTGN 1
    1804 DSPD. 1
    TGE... 1
    1805 VRKS 1
    1509 HRET 1
    1806 ENKV 1
    1568 RLEN 1
    1732 RMEA 1
    1494 HREA 1
    1692 RAEN 1
    1774 SLDT 1
    R... 1
    1512 TGES 1
    1644 RADT 1
    QAK... 1
    1807 DIPQ 1
    QGT... 1
    1808 ERKC 1
    1809 HSPQ 1
    1542 RTES 1
    1538 TREA 1
    1810 RTAT 1
    QGR... 1
    1811 TRKS 1
    1812 GRKS 1
    1813 ESKA 1
    ERK... 1
    1554 YRES 1
    1814 EKRN 1
    MGK... 1
    1815 DSPH 1
    1816 ERNG 1
    1817 VSPQ 1
    QWK... 1
    1818 EKKC 1
    1601 LGLPTGR... 1
    1819 ERNN 1
    1643 RRET 1
    1820 TNES 1
    1821 HRKN 1
    RLF... 1
    1822 DKSN 1
    1823 DRNS 1
    KRN 1
    1824 ERMS 1
    1608 RREN 1
    1825 +IAS 1
    1826 HREC 1
    1827 ERKT 1
    1828 ETGN 1
    1632 RQSA 1
    1631 RTSN 1
    1635 RMDS 1
    1545 RTEH 1
    1559 MMEA 1
    1629 RKDS 1
    LRT... 1
    1481 VRVT 1
    1488 TREN 1
    1639 RPERE... 1
    1637 GTTP 1
    1640 SGDS 1
    1582 ...QSGT... 0
  • TABLE 13
    ZF5 selection on G:T change at nt 7 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO:  Sequence  # Reads
    165 TRLKE 2129
    42 HRLKE 1938
    44 SRLKE 1530
    110 TRLRE 1078
    1829 HRLRE 1073
    47 NRLKE 1015
    1830 QRLRE 769
    1831 DALKR 700
    109 DGLKR 681
    1832 SRLRE 534
    43 HALKV 389
    94 NRLKV 381
    93 ERLRV 375
    1833 DGLKK 374
    41 HGLKV 335
    1834 HRLKV 315
    1835 ERLRM 295
    1836 QRLKE 243
    1837 DGLVR 235
    46 HTLKV 233
    1838 NRLRE 195
    1839 ARLRE 168
    108 DALRR 168
    1840 ERLRQ 141
    1841 ARLKE 135
    1842 TRLRD 125
    1843 DGLRR 118
    1844 SRLNE 118
    1845 TGLKV 92
    1846 HRLSE 91
    1847 HRLNE 78
    1848 SHLKV 75
    1849 TTLKV 75
    1850 HRLGE 68
    1851 STLKV 66
    1852 DGLKV 65
    1853 DGLRK 61
    1854 HRLTE 60
    1855 DRLKV 59
    1856 HSLKV 56
    45 DGLRV 47
    1857 SRLKV 45
    1858 QRLKV 44
    1859 HGLTV 43
    1860 HRLME 43
    1861 RLLPN 42
    1862 ERLKV 41
    1863 NRLRV 35
    1864 TRLKV 34
    1865 DGLKE 29
    454 DTLKV 29
    1866 HGLRV 29
    1867 SALKT 28
    1868 HRLAE 25
    1869 ERLIS 23
    1870 DGLTR 22
    1871 DALVR 21
    1872 HRLKR 21
    1873 ERLRE 20
    1874 HQLKV 20
    1875 TTLKQ 18
    1876 SRLKR 17
    1877 DRLKQ 16
    1878 HRLRV 16
    1879 TRLKR 16
    1880 TRLNE 16
    1881 NRLKQ 15
    1882 TRLKD 14
    1883 TRLRV 14
    1884 EALKR 13
    1885 HTLKQ 13
    1886 NALKV 13
    1887 SALKV 13
    1888 SRLKD 13
    1889 DGLRE 12
    1890 ERLKE 12
    488 DTLKQ 11
    1891 HKLKV 11
    1892 GTLKV 10
    1893 ERLRR 9
    1894 HALKT 9
    1895 HGLKE 9
    1896 HHLVQ 9
    1897 NGLKV 9
    538 DALKE 8
    1898 DALKV 8
    1899 HALKE 8
    1900 HHLKQ 8
    1901 HHLKV 8
    1902 TRLKK 8
    1903 DRLRT 7
    1904 DRLRV 7
    371 DTLRV 7
    1905 HRLKK 7
    262 HTLKE 7
    1906 NRLKK 7
    235 STLKE 7
    1907 SRLIE 6
    1908 TRLME 6
    1909 ATLKV 5
    1910 HGLVV 5
    1911 HRLRM 5
    1912 HRLRQ 5
    1913 HTLKA 5
    1914 NRLRD 5
    1915 TGLKE 5
    1916 TGLKT 5
    1917 TRLRQ 5
    1918 TTLKI 5
    1919 TTLRV 5
    1920 DRLKE 4
    1921 HRLKA 4
    1922 HRLKD 4
    1923 HSLKE 4
    1924 NRLKI 4
    1925 NRLKR 4
    1926 STLKA 4
    548 STLKQ 4
    1927 TRLKA 4
    1928 TRLKQ 4
    1929 TRLRR 4
    447 DTLKA 3
    1930 HALKR 3
    1931 HGLKA 3
    1932 HGLKR 3
    1933 HPEG... 3
    1934 HRLK... 3
    1935 HRLRK 3
    1936 HTLRV 3
    1937 NTLKQ 3
    1938 QRLRV 3
    1939 SRLME 3
    1940 SRPKE 3
    1941 TQLKV 3
    1942 TRLQE 3
    1943 TRLR... 3
    1944 ARLKR 2
    1945 ARLKV 2
    1946 ARLR... 2
    1947 ARLRV 2
    1948 ARLVR 2
    1949 DALKK 2
    1950 DALRV 2
    1951 DAPKR 2
    1952 DRLRE 2
    1953 EGLKV 2
    1954 ERLLV 2
    1955 ERLRA 2
    1956 ERMRM 2
    1957 GGLKV 2
    1958 GGLVT 2
    1959 HALRE 2
    1960 HGLRE 2
    1961 HHLKE 2
    1962 HILKA 2
    1963 HRLQE 2
    1964 HRLRR 2
    1965 KRLKE 2
    1966 KTLKQ 2
    1967 NALKE 2
    1968 NRLNE 2
    1969 NTLKV 2
    1970 QRLKR 2
    1971 QRLRQ 2
    1972 QSLIA 2
    1973 QTLKV 2
    1974 RKLRS 2
    1975 RRLRE 2
    1976 SALKE 2
    1977 SRLKK 2
    1978 SRLRK 2
    1979 SRLRV 2
    297 STLRV 2
    1980 TMLKE 2
    1981 TRLKG 2
    1982 TRLRM 2
    1983 TRLTE 2
    1984 TRRKE 2
    1985 AALKR 1
    1986 AGLKR 1
    1987 AGLKV 1
    1988 AGLVR 1
    1989 ARLGE 1
    1990 ARLME 1
    1991 ARLNE 1
    1992 ARLRD 1
    1993 ARLRM 1
    1994 CRLKE 1
    1995 DALDR 1
    1996 DALKT 1
    1997 DALKW 1
    1998 DALRK 1
    1999 DALTV 1
    2000 DELKR 1
    2001 DELPG 1
    2002 DGLK... 1
    2003 DGLKG 1
    2004 DGLKW 1
    2005 DGLLR 1
    2006 DGLRQ 1
    2007 DGLTV 1
    2008 DGLVW 1
    1016 DGMKR 1
    2009 DKLKQ 1
    2010 DKLRQ 1
    2011 DRLRK 1
    2012 DTHAG... 1
    2013 DTLKT 1
    2014 DVLKK 1
    2015 EAAG... 1
    2016 EHLRQ 1
    2017 ELLKV 1
    2018 EPLRV 1
    2019 ERLCV 1
    2020 ERLKK 1
    1893 ERLRR... 1
    2021 ERLVR 1
    2022 ERLWE 1
    2023 ERPRM 1
    2024 ERPRV 1
    2025 ERQRM 1
    2026 GGLKQ 1
    2027 GGLKR 1
    2028 GMLKV 1
    2029 GRLKE 1
    2030 GTLKQ 1
    2031 HALKA 1
    2032 HALKG 1
    2033 HALPV 1
    2034 HAPEV 1
    2035 HGLKK 1
    2036 HGLKQ 1
    2037 HGLMV 1
    2038 HGLPV 1
    2039 HGLRD 1
    54 HGLVR 1
    2040 HGQKE 1
    2041 HGRKV 1
    2042 HGRRG 1
    2043 HHLRV 1
    2044 HILIA 1
    2045 HKLKE 1
    2046 HKLRV 1
    2047 HMLKR 1
    2048 HMLRE 1
    2049 HNLKV 1
    2050 HPLKV 1
    2051 HQLKE 1
    2052 HQLRE 1
    2053 HQLRV 1
    HR*A... 1
    2054 HRGCG... 1
    2055 HRLDE 1
    2056 HRLIE 1
    2057 HRLKF 1
    2058 HRLKG 1
    2059 HRLKL 1
    2060 HRLMV 1
    2061 HRLN... 1
    2062 HRLR... 1
    2063 HRLRA 1
    2064 HRLS... 1
    2065 HRLVR 1
    2066 HRMRE 1
    2067 HRPKE 1
    2068 HRPNE 1
    2069 HRQRE 1
    2070 HRRKE 1
    2071 HRRME 1
    2072 HRRRE 1
    2073 HRVRE 1
    2074 HSACG... 1
    2075 HSLNV 1
    2076 HSLRV 1
    2077 HTLAQ 1
    2078 HTLNV 1
    2079 HTMKV 1
    2080 HVLKV 1
    2081 HWLRE 1
    2082 KGLKQ 1
    2083 MHLRS 1
    2084 MRLRE 1
    2085 MRLRM 1
    2086 NALKR 1
    2087 NGLKE 1
    2088 NLLRE 1
    2089 NMLKE 1
    2090 NMLNV 1
    2091 NPLRE 1
    2092 NRFKE 1
    2093 NRLIE 1
    2094 NRLKA 1
    2095 NRLKF 1
    2096 NRLKL 1
    2097 NRLKT 1
    2098 NRLME 1
    2099 NRLND 1
    2100 NRLNV 1
    2101 NRLQE 1
    2102 NRLR... 1
    2103 NRLRM 1
    2104 NRLRQ 1
    2105 NRMKE 1
    2106 NRPKE 1
    2107 NRPKV 1
    2108 NRQKE 1
    2109 NSLKE 1
    2110 NTLTV 1
    2111 PRLKE 1
    2112 PRLLP 1
    2113 PRLRE 1
    2114 PRLTE 1
    2115 QAEG... 1
    2116 QRLIS 1
    2117 QRLKK 1
    2118 QRLME 1
    2119 QRLRG 1
    2120 QRLRM 1
    2121 QRLTE 1
    2122 QTA*R... 1
    2123 QTAW... 1
    2124 QTG*S... 1
    R... 1
    2125 RGLKV 1
    2126 RRLGD 1
    2127 RRLKE 1
    2128 RRLNE 1
    2129 RRLTK 1
    2130 SALKK 1
    2131 SALKR 1
    2132 SCLKE 1
    2133 SGLAM 1
    2134 SGLAV 1
    2135 SGLKV 1
    2136 SHLKE 1
    2137 SKLKV 1
    649 SNLKV 1
    2138 SQLKV 1
    2139 SRLIG 1
    2140 SRLK... 1
    2141 SRLKA 1
    2142 SRLKG 1
    2143 SRLQE 1
    2144 SRLR... 1
    2145 SRLRA 1
    2146 SRLRM 1
    2147 SRLRQ 1
    2148 SRLTE 1
    2149 SRQRE 1
    2150 SSLKE 1
    2151 SSLKV 1
    2152 SSQRE 1
    2153 STLKR 1
    TAG... 1
    2154 TGLKG 1
    2155 TGLKQ 1
    2156 TGLKS 1
    2157 TGLRV 1
    2158 TGRRG 1
    2159 TLLRE 1
    2160 TMQKE 1
    2161 TRL*L 1
    2162 TRLAE 1
    2163 TRLE... 1
    2164 TRLEE 1
    2165 TRLGE 1
    2166 TRLK... 1
    2167 TRLKY 1
    2168 TRLRG 1
    2169 TRLRK 1
    2170 TRLSE 1
    2171 TRPKE 1
    2172 TRQRD 1
    2173 TRRRD 1
    2174 TRVRE 1
    2175 TSLRE 1
    2176 TTLKA 1
    2177 TTLKE 1
    2178 TTLKL 1
    2179 TTLKT 1
    2180 TTPRG 1
    2181 TTRKQ 1
    2182 TWLRE 1
    2183 VRRKV 1
    2184 YGLKR 1
    2185 YRLKE 1
    2186 YTLKV 1
  • TABLE 14
    ZF5 selection on G:C change at nt 7 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO:  Sequence  # Reads
    44 SRLKE 2533
    165 TRLKE 2146
    42 HRLKE 1984
    47 NRLKE 1528
    1829 HRLRE 1001
    1832 SRLRE 799
    110 TRLRE 625
    46 HTLKV 499
    41 HGLKV 320
    1830 QRLRE 299
    1851 STLKV 249
    1841 ARLKE 238
    1836 QRLKE 135
    235 STLKE 126
    1849 TTLKV 102
    447 DTLKA 95
    1891 HKLKV 87
    454 DTLKV 84
    43 HALKV 82
    1962 HILKA 80
    1845 TGLKV 80
    1839 ARLRE 78
    1850 HRLGE 75
    1838 NRLRE 75
    1854 HRLTE 61
    1861 RLLPN 55
    1852 DGLKV 50
    1834 HRLKV 46
    1856 HSLKV 43
    1931 HGLKA 37
    94 NRLKV 30
    1901 HHLKV 27
    1972 QSLIA 26
    371 DTLRV 25
    1864 TRLKV 25
    2177 TTLKE 25
    262 HTLKE 24
    1888 SRLKD 23
    1948 ARLVR 20
    2187 SKLKE 20
    1855 DRLKV 19
    93 ERLRV 19
    1857 SRLKV 19
    1831 DALKR 18
    109 DGLKR 18
    2029 GRLKE 18
    1892 GTLKV 18
    1842 TRLRD 17
    1913 HTLKA 16
    1868 HRLAE 15
    488 DTLKQ 14
    1895 HGLKE 14
    2188 HILKT 14
    1974 RKLRS 14
    2133 SGLAM 12
    1875 TTLKQ 12
    1926 STLKA 11
    1833 DGLKK 10
    2126 RRLGD 10
    1882 TRLKD 10
    2189 TSLKV 10
    1837 DGLVR 9
    1835 ERLRM 9
    1961 HHLKE 9
    1896 HHLVQ 9
    1847 HRLNE 9
    1885 HTLKQ 9
    1880 TRLNE 9
    2190 HRLHE 8
    1848 SHLKV 8
    2191 SKLRM 8
    45 DGLRV 7
    1862 ERLKV 7
    2192 GTLRV 7
    1921 HRLKA 7
    2193 HTLKS 7
    1844 SRLNE 7
    1915 TGLKE 7
    108 DALRR 6
    2194 HGLKT 6
    1859 HGLTV 6
    2045 HKLKE 6
    1860 HRLME 6
    1887 SALKV 6
    1909 ATLKV 5
    2195 DTLKE 5
    2196 GILND 5
    2135 SGLKV 5
    2141 SRLKA 5
    1871 DALVR 4
    2197 ETLKV 4
    1846 HRLSE 4
    1923 HSLKE 4
    1936 HTLRV 4
    1969 NTLKV 4
    1858 QRLKV 4
    2140 SRLK... 4
    2198 THLKE 4
    1928 TRLKQ 4
    1945 ARLKV 3
    1853 DGLRK 3
    1843 DGLRR 3
    1840 ERLRQ 3
    1957 GGLKV 3
    1960 HGLRE 3
    1900 HHLKQ 3
    1965 KRLKE 3
    2199 NALRV 3
    1897 NGLKV 3
    2200 NRLGE 3
    1906 NRLKK 3
    1975 RRLRE 3
    2132 SCLKE 3
    2137 SKLKV 3
    2201 SRLRD 3
    1979 SRLRV 3
    548 STLKQ 3
    1927 TRLKA 3
    1942 TRLQE 3
    2186 YTLKV 3
    2202 APLLR 2
    2009 DKLKQ 2
    2203 DKLKV 2
    1920 DRLKE 2
    1873 ERLRE 2
    1899 HALKE 2
    2043 HHLRV 2
    2051 HQLKE 2
    2204 HRLEE 2
    1878 HRLRV 2
    2205 HTLKG 2
    1966 KTLKQ 2
    2206 MVLVV 2
    2094 NRLKA 2
    2207 NRLKD 2
    1881 NRLKQ 2
    2101 NRLQE 2
    2108 NRQKE 2
    2208 NTLKA 2
    1938 QRLRV 2
    1973 QTLKV 2
    2127 RRLKE 2
    2209 SRLKQ 2
    2151 SSLKV 2
    553 STLRQ 2
    297 STLRV 2
    1983 TRLTE 2
    2175 TSLRE 2
    1987 AGLKV 1
    2210 AQMKE 1
    1991 ARLNE 1
    1992 ARLRD 1
    2211 ARRRE 1
    2212 CRLM... 1
    2213 CRLMV 1
    538 DALKE 1
    1898 DALKV 1
    2001 DELPG 1
    1865 DGLKE 1
    2010 DKLRQ 1
    2214 DRLKA 1
    2215 DRLKT 1
    1952 DRLRE 1
    1903 DRLRT 1
    2013 DTLKT 1
    2216 DTPKA 1
    1869 ERLIS 1
    1893 ERLRR... 1
    2023 ERPRM 1
    2026 GGLKQ 1
    2028 GMLKV 1
    2217 GRLKA 1
    2218 GRLKV 1
    2030 GTLKQ 1
    2219 GVLKE 1
    2220 GVLTG 1
    2221 HALDV 1
    2031 HALKA 1
    2222 HELKV 1
    2223 HGLEA 1
    2036 HGLKQ 1
    2224 HGLRG 1
    2225 HGMKA 1
    2226 HGPKV 1
    2044 HILIA 1
    2227 HILKE 1
    2228 HILKV 1
    2229 HILNA 1
    2230 HKLKG 1
    2231 HKLKQ 1
    2046 HKLRV 1
    2048 HMLRE 1
    1933 HPEG... 1
    2232 HPLKE 1
    1874 HQLKV 1
    2233 HRLGV 1
    1922 HRLKD 1
    2058 HRLKG 1
    2059 HRLKL 1
    1872 HRLKR 1
    2234 HRLLE 1
    2235 HRLQG 1
    2063 HRLRA 1
    2236 HRLRS 1
    2237 HRLTV 1
    2065 HRLVR 1
    2066 HRMRE 1
    2072 HRRRE 1
    2238 HSG*G... 1
    2239 HSLKQ 1
    2240 HSLRE 1
    2241 HSVKA 1
    2242 HTG*R... 1
    2077 HTLAQ 1
    2243 HTLEV 1
    215 HTLME 1
    2244 HTLMV 1
    2245 HTLQE 1
    2246 HTLRQ 1
    2080 HVLKV 1
    2247 IRLKE 1
    2248 IRQEE 1
    2082 KGLKQ 1
    2249 KRLKV 1
    2250 LRLKK 1
    2251 NKLKE 1
    2252 NKLKG 1
    2092 NRFKE 1
    2253 NRLAE 1
    2254 NRLEE 1
    1925 NRLKR 1
    2255 NRLKS 1
    2097 NRLKT 1
    1914 NRLRD 1
    2256 NRLRG 1
    1863 NRLRV 1
    2257 NRLTE 1
    2109 NSLKE 1
    1937 NTLKQ 1
    2258 PAEG... 1
    2259 PPPPE 1
    2113 PRLRE 1
    2115 QAEG... 1
    2260 QGRRE 1
    2261 QRLEE 1
    2119 QRLRG 1
    2262 QSLGR 1
    2134 SGLAV 1
    2263 SKLK... 1
    2264 SMLRE 1
    2265 SRLAE 1
    2266 SRLCE 1
    2142 SRLKG 1
    2267 SRLLE 1
    2143 SRLQE 1
    2145 SRLRA 1
    1978 SRLRK 1
    1940 SRPKE 1
    2149 SRQRE 1
    2268 SRRKE 1
    2150 SSLKE 1
    2152 SSQRE 1
    539 STLRA 1
    202 STLRE 1
    2155 TGLKQ 1
    2269 TGLRE 1
    2270 THLKV 1
    2271 TILYE 1
    2272 TLLKE 1
    1981 TRLKG 1
    1908 TRLME 1
    1883 TRLRV 1
    2273 TRLTV 1
    2274 TRMGE 1
    2275 TRMKQ 1
    2176 TTLKA 1
    1918 TTLKI 1
    2178 TTLKL 1
    2276 YTLKE 1
  • TABLE 15
    ZF5 selection on G:A change at nt 7 of core motif in CBS.
    Sequences reflect positions 2 to 6.
    SEQ ID NO: Sequence Read #
    46 HTLKV 3934
    41 HGLKV 2682
    1851 STLKV 2167
    1861 RLLPN 1887
    1849 TTLKV 1471
    43 HALKV 923
    454 DTLKV 888
    1875 TTLKQ 754
    1891 HKLKV 571
    1885 HTLKQ 513
    1845 TGLKV 482
    1892 GTLKV 473
    488 DTLKQ 462
    1852 DGLKV 443
    1856 HSLKV 352
    1896 HHLVQ 298
    1901 HHLKV 259
    1834 HRLKV 210
    42 HRLKE 190
    371 DTLRV 189
    44 SRLKE 186
    165 TRLKE 178
    1887 SALKV 177
    1909 ATLKV 155
    1900 HHLKQ 149
    1926 STLKA 140
    1897 NGLKV 136
    47 NRLKE 124
    548 STLKQ 118
    1973 QTLKV 112
    1874 HQLKV 94
    2135 SGLKV 91
    1829 HRLRE 89
    1936 HTLRV 88
    297 STLRV 78
    447 DTLKA 75
    1957 GGLKV 75
    1928 TRLKQ 75
    1966 KTLKQ 69
    2277 HTL*A 66
    1913 HTLKA 64
    1832 SRLRE 61
    110 TRLRE 58
    1937 NTLKQ 56
    2278 SKLKQ 55
    1830 QRLRE 53
    2203 DKLKV 51
    1919 TTLRV 48
    2151 SSLKV 43
    1848 SHLKV 42
    2030 GTLKQ 40
    1864 TRLKV 40
    2270 THLKV 38
    1969 NTLKV 37
    553 STLRQ 35
    2279 HALRV 34
    1931 HGLKA 33
    2009 DKLKQ 32
    109 DGLKR 29
    1953 EGLKV 29
    2197 ETLKV 29
    2280 GILKV 28
    1855 DRLKV 26
    1866 HGLRV 24
    2281 SVLKQ 23
    1831 DALKR 22
    93 ERLRV 22
    2282 GQLHV 21
    2283 TTLRQ 21
    45 DGLRV 20
    2284 DTLKN 20
    2179 TTLKT 20
    2285 GVLKV 17
    2010 DKLRQ 16
    2286 GTLKA 16
    2026 GGLKQ 15
    2036 HGLKQ 15
    2043 HHLRV 15
    94 NRLKV 15
    2192 GTLRV 14
    262 HTLKE 14
    2287 SVLKV 14
    2155 TGLKQ 14
    1835 ERLRM 13
    1838 NRLRE 13
    2137 SKLKV 13
    649 SNLKV 13
    2288 TVLKV 13
    1841 ARLKE 12
    1839 ARLRE 12
    1833 DGLKK 12
    2289 HHLRQ 12
    2205 HTLKG 12
    2080 HVLKV 12
    1917 TRLRQ 12
    2290 NTLRQ 11
    2134 SGLAV 11
    108 DALRR 10
    2291 QTLKQ 10
    2292 RTLKQ 10
    235 STLKE 10
    1987 AGLKV 9
    2013 DTLKT 9
    274 HHLVV 9
    2049 HNLKV 9
    1836 QRLKE 9
    2293 STLKG 9
    2294 TVLKQ 9
    1837 DGLVR 8
    2295 GGLVV 8
    2296 HGLQV 8
    1850 HRLGE 8
    1854 HRLTE 8
    2246 HTLRQ 8
    1857 SRLKV 8
    2297 DTLKG 7
    2298 GGLTV 7
    2299 GVLKA 7
    2031 HALKA 7
    2194 HGLKT 7
    2176 TTLKA 7
    2300 GTLRQ 6
    2301 HALKQ 6
    1844 SRLNE 6
    2302 STLKT 6
    1842 TRLRD 6
    2303 ATLKA 5
    2304 ATLKQ 5
    2305 DGLKQ 5
    1843 DGLRR 5
    1862 ERLKV 5
    2306 GTLNA 5
    2307 GVLKN 5
    1895 HGLKE 5
    1910 HGLVV 5
    2308 TTLKG 5
    1853 DGLRK 4
    1840 ERLRQ 4
    2309 ETLRV 4
    2310 HGLKG 4
    2311 HGLNV 4
    1859 HGLTV 4
    1961 HHLKE 4
    1846 HRLSE 4
    1886 NALKV 4
    484 STLTV 4
    2312 VGLGE 4
    2186 YTLKV 4
    2313 AGLAT 3
    1948 ARLVR 3
    2314 D*LPG 3
    2003 DGLKG 3
    2315 DKLRV 3
    1899 HALKE 3
    1860 HRLME 3
    2239 HSLKQ 3
    2078 HTLNV 3
    2079 HTMKV 3
    2316 HTQKV 3
    2262 QSLGR 3
    1974 RKLRS 3
    474 STLNV 3
    2177 TTLKE 3
    1871 DALVR 2
    2001 DELPG 2
    2317 DGLRA 2
    2318 DVLKV 2
    2319 GALRV 2
    2320 GGLVQ 2
    2321 GNLKV 2
    2322 GPLKV 2
    2323 GTLKG 2
    2324 GVLKQ 2
    2325 GVLRV 2
    678 GVLVA 2
    2032 HALKG 2
    2326 HDLKV 2
    2327 HGLEV 2
    2226 HGPKV 2
    2328 HHMVQ 2
    1962 HILKA 2
    2329 HKLKA 2
    2045 HKLKE 2
    2231 HKLKQ 2
    1921 HRLKA 2
    2330 HRLKQ 2
    1847 HRLNE 2
    2082 KGLKQ 2
    2331 KTLKV 2
    2332 PTLKV 2
    1972 QSLIA 2
    2333 RLLPY 2
    2334 RLRPN 2
    2335 RTLAQ 2
    2336 RTLKV 2
    2337 SALTV 2
    2338 STLKL 2
    1916 TGLKT 2
    2339 TKLKQ 2
    1918 TTLKI 2
    2340 TTPKV 2
    2341 AGLAS 1
    2342 AGLKM 1
    2343 APLKV 1
    1945 ARLKV 1
    1992 ARLRD 1
    2344 ATLKG 1
    538 DALKE 1
    1898 DALKV 1
    2345 DELRQ 1
    2346 DGLKA 1
    1865 DGLKE 1
    2347 DGLKL 1
    2348 DKLKG 1
    1877 DRLKQ 1
    1952 DRLRE 1
    1904 DRLRV 1
    2349 DSLKV 1
    2195 DTLKE 1
    2350 DTLNQ 1
    326 DTLQA 1
    423 DTLRA 1
    533 DTLRL 1
    2351 DTLWQ 1
    2352 DTMKV 1
    2353 EGLKQ 1
    1955 ERLRA 1
    1873 ERLRE 1
    2023 ERPRM 1
    2354 ETLKE 1
    2355 ETRRV 1
    2356 GGLAV 1
    2357 GGLRG 1
    2358 GGLRV 1
    2359 GHLKA 1
    2196 GILND 1
    2028 GMLKV 1
    2360 GPLRA 1
    2361 GQQHV 1
    2362 GTLQA 1
    2363 GTPKV 1
    2364 HALES 1
    2365 HALKF 1
    2366 HALMV 1
    2033 HALPV 1
    2367 HAMKV 1
    2368 HARKV 1
    2222 HELKV 1
    2369 HGLKD 1
    2370 HGLKL 1
    2371 HGLKM 1
    2372 HGLKW 1
    2373 HGRKI 1
    2041 HGRKV 1
    2374 HHLAQ 1
    2375 HHLGQ 1
    2376 HHLMQ 1
    2377 HHMKV 1
    2044 HILIA 1
    2228 HILKV 1
    2230 HKLKG 1
    2378 HKLKM 1
    2379 HKLNV 1
    2380 HKLQE 1
    2046 HKLRV 1
    2381 HMLNV 1
    2382 HPLDV 1
    2050 HPLKV 1
    2383 HPLQV 1
    2384 HQLKA 1
    2385 HQLKG 1
    2386 HQLKT 1
    1868 HRLAE 1
    2058 HRLKG 1
    2059 HRLKL 1
    1872 HRLKR 1
    1912 HRLRQ 1
    2065 HRLVR 1
    2067 HRPKE 1
    2387 HSLKA 1
    1923 HSLKE 1
    2388 HSLKG 1
    2389 HSLKL 1
    2241 HSVKA 1
    2077 HTLAQ 1
    2390 HTLAV 1
    2243 HTLEV 1
    2391 HTLKN 1
    2244 HTLMV 1
    2392 HTLNA 1
    2393 HTLQV 1
    250 HTLTE 1
    2394 HTLTV 1
    2395 HTPKV 1
    2396 HTRKQ 1
    2397 HVLKF 1
    2398 HVMKV 1
    2399 HWLKV 1
    2400 KADTV 1
    2401 KGLKG 1
    2402 KRLKQ 1
    2403 KTLAQ 1
    2404 KTLRV 1
    2405 KTLTQ 1
    2406 LHLKV 1
    2407 LTLKQ 1
    2408 LTLKV 1
    2409 MGLKV 1
    2410 MPPK 1
    2411 MRLKQ 1
    2412 NAVTE 1
    2413 NGLKG 1
    2414 NGLKL 1
    2415 NRLKG 1
    1914 NRLRD 1
    1863 NRLRV 1
    2416 NTLRV 1
    2417 PGLKV 1
    2418 QGLKV 1
    1858 QRLKV 1
    1938 QRLRV 1
    2419 QRQRV 1
    2420 QTLKA 1
    2421 QTLKG 1
    2422 QTLKK 1
    2423 QTLKM 1
    2424 QTLMV 1
    2125 RGLKV 1
    2425 RHLVQ 1
    2426 RLLPT 1
    2427 RLLSN 1
    2428 RLMPD 1
    2429 RMLPN 1
    2126 RRLGD 1
    2430 RSLKV 1
    2431 RTLKG 1
    2432 SALKQ 1
    2433 SALRQ 1
    2434 SELKV 1
    2435 SFLKV 1
    2133 SGLAM 1
    2436 SGLKQ 1
    2437 SHLKQ 1
    2438 SKLKA 1
    2187 SKLKE 1
    1888 SRLKD 1
    2145 SRLRA 1
    556 SSLRV 1
    2152 SSQRE 1
    2439 STLKK 1
    2440 STLKM 1
    385 STLMV 1
    448 STLQQ 1
    554 STLTA 1
    2441 STMKA 1
    2442 STMKV 1
    2443 TALKV 1
    2444 TGLKA 1
    2445 TGLKD 1
    1915 TGLKE 1
    2154 TGLKG 1
    2446 TGLMV 1
    2198 THLKE 1
    2447 THLKG 1
    2448 THLKL 1
    2449 THLKQ 1
    2450 THLMV 1
    64 TKLKV 1
    2451 TPLQV 1
    1882 TRLKD 1
    1981 TRLKG 1
    2452 TRLPQ 1
    1942 TRLQE 1
    2453 TTLEV 1
    2454 TTLHV 1
    507 TTLNQ 1
    577 TTLQV 1
    2455 TTLRG 1
    2456 TTLYV 1
    2457 TTMKV 1
    2458 TVLRQ 1
    2459 VGLGG 1
    2460 VTLKV 1
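    The rows of a selection table such as Table 15 can be summarized as read-weighted residue frequencies at each of the five listed positions. The following is a minimal sketch, not part of the original disclosure: the function names `parse_rows` and `position_frequencies` are hypothetical, the row layout (SEQ ID NO, sequence, read count, separated by whitespace) is assumed from the table as printed, and only the first five rows of Table 15 are used as sample input. Entries containing a stop codon (`*`), truncated entries, and entries that are not exactly five residues long are skipped rather than interpreted.

```python
from collections import defaultdict

def parse_rows(text):
    """Parse lines of the assumed form '<SEQ ID NO> <sequence> <read count>'."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip anything that is not exactly: integer, token, integer.
        if len(parts) != 3 or not parts[0].isdigit() or not parts[2].isdigit():
            continue
        rows.append((int(parts[0]), parts[1], int(parts[2])))
    return rows

def position_frequencies(rows, length=5):
    """Return (per-position residue frequencies weighted by reads, total reads)."""
    counts = [defaultdict(int) for _ in range(length)]
    total = 0
    for _, seq, reads in rows:
        # Ignore truncated entries and entries with '*' (stop) or other symbols.
        if len(seq) != length or not seq.isalpha():
            continue
        total += reads
        for i, residue in enumerate(seq):
            counts[i][residue] += reads
    if total == 0:
        return [{} for _ in range(length)], 0
    freqs = [{aa: n / total for aa, n in c.items()} for c in counts]
    return freqs, total

# First five rows of Table 15, copied as printed.
sample = """
46 HTLKV 3934
41 HGLKV 2682
1851 STLKV 2167
1861 RLLPN 1887
1849 TTLKV 1471
"""

rows = parse_rows(sample)
freqs, total = position_frequencies(rows)
# Index 0 corresponds to the first listed position (position 2), index 4 to position 6.
for i, table in enumerate(freqs):
    best = max(table, key=table.get)
    print(f"position {i + 2}: most frequent residue {best} ({table[best]:.3f})")
```

    Weighting by read count rather than counting each distinct sequence once keeps highly enriched clones from being diluted by the long tail of single-read entries; whether that weighting is appropriate depends on how the reads were generated, which the table itself does not state.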
  • TABLE 16
    ZF5 selection on G:A change at position 8 of the CBS core motif.
    Sequences reflect positions 2 to 6.
    SEQ ID NO: Sequence Read #
    2461 GGLRR 341
    50 GGLVR 336
    2462 TGLRR 274
    2463 EGLRR 267
    1843 DGLRR 232
    2464 SGLRR 206
    2465 AGLAR 179
    2466 SGLAR 178
    2467 GGLAR 177
    55 GGLTR 168
    2468 DGLAR 152
    1986 AGLKR 148
    2469 TGLAR 135
    1837 DGLVR 129
    2470 GGLQR 127
    70 GNLTR 124
    117 GNLVR 123
    2471 HGLAR 123
    2027 GGLKR 111
    2472 TGLVR 108
    2473 AGLTR 105
    2474 SGLSR 102
    2475 AGLRR 100
    2476 GGLSR 94
    59 HGLRR 91
    54 HGLVR 87
    2477 SGLTR 84
    2478 NGLVR 80
    2479 AGLQR 79
    118 GNLRR 79
    2480 AGLHR 76
    2481 GNLER 76
    2482 HNLLR 76
    138 GNLAR 73
    1870 DGLTR 72
    2483 HALRR 69
    2484 HGLQR 69
    2485 NGLRR 69
    2486 SGLVR 68
    2487 SNLDR 67
    68 TNLRR 66
    2488 HGLTR 63
    2489 SSLRR 63
    108 DALRR 61
    2490 EGLTR 61
    2491 GGLER 61
    109 DGLKR 60
    2492 TGLQR 60
    56 HTLRR 59
    1985 AALKR 58
    1988 AGLVR 55
    2493 AGLIR 54
    1932 HGLKR 54
    2494 ANLVR 53
    2495 EGLKR 53
    2496 SNLLR 51
    2497 EGLAR 50
    2498 AGLSR 49
    2499 DGLIR 48
    2500 TGLKR 48
    2501 SGLQR 46
    2502 ETLKR 45
    2503 HGLLR 45
    2504 NGLQR 45
    2505 TGLMR 45
    69 ANLRR 43
    2506 DNLVR 42
    2507 TGLLR 42
    2508 DGLMR 41
    2509 ASLKR 39
    2510 QGLRR 38
    2511 TNLVR 38
    2512 NGLTR 37
    2513 SGLDR 37
    2514 SGLHR 37
    2515 TGLNR 37
    2516 TGLSR 37
    2517 GNLLR 36
    2518 NNLVR 36
    2519 TGLIR 36
    2520 DMLRR 35
    2521 GALKR 35
    2522 GNLDR 35
    2523 SALRR 35
    2524 SNLAR 35
    2525 SGLLR 34
    2526 TNLNR 33
    2527 AGLLR 31
    2528 GGLIR 31
    2529 DGLHR 30
    2530 DTLRR 30
    2531 HLLKR 30
    2532 SALAR 30
    2533 SMLAR 30
    2534 VGLKR 30
    2535 DNLLR 28
    2536 GGLMR 28
    2537 SGLMR 28
    2538 AALRR 27
    2539 ETLRR 27
    2540 NGLAR 27
    2157 TGLRV 27
    53 TGLTR 27
    2541 TNLQR 27
    2542 ANLAR 26
    2543 NNLAR 26
    2544 SNLSR 26
    2545 STLSR 26
    2546 AALAR 25
    2547 HALVR 25
    2548 HGLSR 25
    2549 SGLNR 25
    2550 STLAR 25
    2551 ANLIR 24
    2552 DGLDR 24
    2553 DGLSR 24
    2554 GTLKR 24
    1884 EALKR 23
    2555 NGLSR 23
    2556 SMLRR 23
    2557 HNLHR 22
    2558 HNLRR 22
    2559 SGLKR 22
    2560 TGLGR 22
    2561 TNLMR 22
    1871 DALVR 21
    2562 GTLTR 21
    2563 DGLNR 20
    2564 SSLVR 20
    2565 TGLER 20
    2566 DTLKR 19
    2567 GNLSR 19
    51 HGLIR 19
    2568 HSLVR 19
    2569 AGLNR 18
    2570 DALAR 18
    2571 GGLHR 18
    2572 NGLIR 18
    2573 QGLTR 18
    2574 QMLKR 18
    2575 QNLRR 18
    1845 TGLKV 18
    2576 AILKR 17
    119 GNLKR 17
    139 GNLMR 17
    2577 HNLTR 17
    2578 HTLAR 17
    2579 QGLKR 17
    2580 SGLER 17
    2581 SGLGR 17
    2582 SNLVR 17
    2583 EALRR 16
    2584 GTLRR 16
    2585 HGLGR 16
    2586 HTLMR 16
    2587 NTLRR 16
    2588 TGLHR 16
    2589 TSLRR 16
    2590 TTLQR 16
    2591 DNLKR 15
    2592 GALTR 15
    2593 QTLRR 15
    2594 SGLIR 15
    2595 TNLKR 15
    2596 DGLGR 14
    2597 DSLQR 14
    2598 EGLNR 14
    2599 ENLRR 14
    2600 GSLRR 14
    2601 NGLNR 14
    2602 QALKR 14
    2603 SALSR 14
    2604 SSLGR 14
    2605 VNLKR 14
    66 ATLRR 13
    2005 DGLLR 13
    2606 EMLKR 13
    2607 GALVR 13
    2608 GNLGR 13
    2609 GNLQR 13
    2610 HALAR 13
    2611 HSLIR 13
    2612 HTLER 13
    2613 HTLQR 13
    2614 NGLER 13
    2615 NGLMR 13
    2616 QGLVR 13
    2617 TALKR 13
    2618 TTLMR 13
    2619 VGLRR 13
    2620 ANLKR 12
    2621 ANLNR 12
    2622 ATLTR 12
    2623 DNLRR 12
    2624 ENLKR 12
    2625 GGLLR 12
    2626 GTLVR 12
    2627 HNLSR 12
    2628 NTLKR 12
    2629 SALER 12
    2630 SSLTR 12
    2631 TALVR 12
    52 ANLSR 11
    2632 DNLAR 11
    2633 ENLSR 11
    2634 ESLRR 11
    2635 NALRR 11
    2636 NGLKR 11
    2637 NNLLR 11
    2418 QGLKV 11
    116 SNLRR 11
    2638 STLRR 11
    2639 VNLSR 11
    2640 DMLKR 10
    2641 GALRR 10
    2642 GGLDR 10
    2643 HGLMR 10
    2644 HNLVR 10
    2645 HQLIR 10
    2086 NALKR 10
    1969 NTLKV 10
    2646 QNLQR 10
    1887 SALKV 10
    2647 SMLIR 10
    2648 TALRV 10
    2649 TNLAR 10
    2650 TQLKR 10
    1849 TTLKV 10
    2651 TTLTR 10
    2652 VGLQR 10
    2653 AALSR 9
    2654 ATLAR 9
    2655 DALGR 9
    2656 DTLNR 9
    2657 EILKR 9
    2658 ESLKR 9
    2659 GGLNR 9
    2660 GSLTR 9
    2661 HNLAR 9
    2662 MGLKR 9
    2663 NGLHR 9
    2664 NMLKR 9
    2665 PNLKR 9
    2666 SALTR 9
    2667 SDLKR 9
    2668 STLGR 9
    2669 AGLER 8
    2670 DILRR 8
    2671 DMLNR 8
    2672 DTLAR 8
    2673 HALLR 8
    2674 HALSR 8
    2675 HNLGR 8
    2676 NALVR 8
    2677 SMLTR 8
    2678 TALAR 8
    2679 TNLER 8
    2680 TNLGR 8
    2681 TTLNR 8
    2682 DALLR 7
    2683 DSLAR 7
    2684 GTLAR 7
    2685 GTLLV 7
    2686 HALIR 7
    2687 HGLDR 7
    2688 HGLER 7
    2689 HTLLR 7
    2690 NNLIR 7
    2691 NNLMR 7
    2692 QSLKR 7
    2693 SALGR 7
    2694 SALVR 7
    2695 SNLMR 7
    2696 SQLRR 7
    2697 STLQR 7
    2698 STLVR 7
    2699 SVLKR 7
    2189 TSLKV 7
    2700 AALTR 6
    2701 DSLKR 6
    2702 DSLRR 6
    2703 DTLMR 6
    2704 EGLLR 6
    2705 ENLAR 6
    2706 GNLNR 6
    2707 GTLQR 6
    2708 HALDR 6
    2709 HVLER 6
    2710 IGLRR 6
    2711 INLTR 6
    2712 NMLRR 6
    2713 QMLRR 6
    2714 TNLHR 6
    2715 TSLHR 6
    2716 VGLAR 6
    2717 AALQR 5
    2718 AGLDR 5
    48 ATLKR 5
    1833 DGLKK 5
    2719 DTLQR 5
    2720 DVLKR 5
    2721 GALSR 5
    2722 GMLKR 5
    2723 GTLSR 5
    2724 HNLER 5
    2725 NGLLV 5
    2726 NNLTR 5
    2727 QALAV 5
    2728 QGLAR 5
    2729 QNLHR 5
    2730 SALMR 5
    2731 SLLLR 5
    2732 SVLAR 5
    2733 SVLTR 5
    2734 TALRR 5
    74 TMLRR 5
    2735 TQLRV 5
    2736 TTLLR 5
    2737 TTLRR 5
    2738 AALNR 4
    2739 ATLVR 4
    2740 DALHR 4
    2741 DALMR 4
    2742 DGLER 4
    2743 DGLQR 4
    45 DGLRV 4
    2744 DLLRR 4
    1855 DRLKV 4
    2745 GGLGR 4
    2746 GNLHR 4
    1892 GTLKV 4
    2747 GTLNR 4
    2748 HALHR 4
    2749 HALMR 4
    2750 HILTR 4
    2751 HLLLR 4
    2752 HNLQR 4
    2753 HTLGR 4
    2754 IGLTG 4
    2755 NGLLR 4
    2756 NSLRR 4
    2757 PNLIR 4
    2758 PNLRR 4
    2759 SALIR 4
    2760 SILGR 4
    2761 SPLVR 4
    2762 STLTR 4
    2763 TALKT 4
    2764 TALTR 4
    2765 TGLDR 4
    2766 TSLKR 4
    2767 TTLVR 4
    2768 VGLQN 4
    2769 VNLRR 4
    2770 AALVR 3
    58 ADLKR 3
    2771 ANLGR 3
    2772 ATLSR 3
    2773 DNLQR 3
    2774 DNLTR 3
    2775 DRLRR 3
    2776 DTLVR 3
    2777 EGLVR 3
    2778 GALNR 3
    2779 GDLKR 3
    2780 GDLTR 3
    62 GGLGL 3
    2781 GSLQR 3
    1930 HALKR 3
    2782 HGLHR 3
    1866 HGLRV 3
    2783 HTLKR 3
    2784 HVLKR 3
    2785 NGLDR 3
    2786 NMLAR 3
    2787 NSLAR 3
    2788 NTLAR 3
    2789 QGLHR 3
    2134 SGLAV 3
    2790 SILTR 3
    2791 SILVR 3
    2792 SQLKR 3
    2793 SSLQR 3
    2794 TALHR 3
    2795 TALNR 3
    2796 TALSR 3
    2797 AGLGR 2
    2798 AGLMR 2
    2799 ASLQR 2
    2800 ASLVR 2
    2801 ATLMR 2
    2802 AVLKR 2
    2803 DALNR 2
    2804 DALQR 2
    2805 DALSR 2
    1853 DGLRK 2
    2806 DHLHR 2
    2807 DHLVR 2
    2808 DNLSR 2
    2809 DTLSR 2
    2810 DTLTR 2
    2811 DVLRR 2
    2812 EGLIR 2
    2813 EGLSR 2
    2814 GAEE . . .  2
    2815 GALQR 2
    2319 GALRV 2
    2816 GDLRR 2
    2817 GDLVR 2
    1957 GGLKV 2
    2358 GGLRV 2
    2818 GSLAR 2
    2819 GSLKR 2
    2820 HDLRR 2
    2821 HGLNR 2
    2822 HHLIR 2
    2047 HMLKR 2
    2823 HMLRR 2
    2824 HQLVR 2
    2825 HSLAR 2
    2826 HSLHR 2
    2827 HSLRR 2
    46 HTLKV 2
    2828 HTLNR 2
    2829 HTLTR 2
    2830 HTLVR 2
    2831 IGLKR 2
    2832 ITLKR 2
    2833 MTLKR 2
    2834 NALHR 2
    2835 NALSR 2
    2836 NGLGR 2
    2837 NTLHR 2
    2838 QDLKR 2
    2839 QGLLR 2
    2840 QNLLR 2
    2841 QNLRW 2
    2842 QSLRR 2
    2843 QTLKR 2
    2131 SALKR 2
    2844 SALRV 2
    2845 SSLAR 2
    2846 SSLSR 2
    2847 STLDR 2
    2848 STLER 2
    2849 STLHR 2
    1851 STLKV 2
    2850 STLMR 2
    2851 TALGR 2
    2852 TGLAT 2
    2853 TGLSV 2
    2854 TGLVT 2
    2855 TNLKV 2
    2856 TNLSR 2
    2857 TTLAR 2
    2858 TTLGR 2
    2859 TTLIR 2
    2860 TTLKR 2
    2179 TTLKT 2
    2861 TVLRM 2
    2862 VQLAM 2
    2863 VTLTR 2
    A*S . . .  1
    2864 AALLR 1
    2865 AALMR 1
    2866 AAPER 1
    2867 ADLRR 1
    2868 AGLAW 1
    2869 AGLRW 1
    2870 AGLTS 1
    2871 AILTR 1
    71 AMLKR 1
    2872 ANLPR 1
    1944 ARLKR 1
    2873 ARLQR 1
    2874 ARLTR 1
    2875 ASLRR 1
    2876 ASLTR 1
    2877 ATLDR 1
    2878 ATLER 1
    2879 ATLIR 1
    2880 ATLLR 1
    2881 ATLQR 1
    2882 AVLRR 1
    1831 DALKR 1
    1950 DALRV 1
    2883 DGLSV 1
    2884 DILHR 1
    2885 DQLRR 1
    2886 DSLSR 1
    2887 DTLAK 1
    2888 DVLLR 1
    2889 EALNR 1
    2890 EALTR 1
    1953 EGLKV 1
    2891 EGLMR 1
    2892 EGLQR 1
    2893 EGLRL 1
    2894 EGLRV 1
    2895 EGVRR 1
    2896 ELLRR 1
    2897 ENLER 1
    2898 ETLLR 1
    2899 GALHR 1
    2900 GGHRR 1
    2901 GGLAG 1
    2356 GGLAV 1
    2902 GGLDV 1
    2903 GGLGS 1
    2904 GGLQE 1
    2905 GGLVL 1
    1958 GGLVT 1
    2906 GGPSH 1
    2907 GGPSR 1
    2908 GGQRR 1
    2909 GGVRR 1
    2910 GGWR . . .  1
    2911 GILER 1
    2912 GKLRR 1
    2913 GMLAR 1
    2914 GNLIR 1
    2915 GSLER 1
    2916 GSLVR 1
    2917 GTLER 1
    2918 GTLGR 1
    2919 GTLHR 1
    2920 GTQVR 1
    2921 GVLRR 1
    2922 GVLTR 1
    2923 HALGR 1
    43 HALKV 1
    2924 HDLAK 1
    2925 HGAAR 1
    2035 HGLKK 1
    2371 HGLKM 1
    41 HGLKV 1
    2926 HGLSV 1
    2927 HGLTW 1
    2928 HGPAR 1
    2929 HKLAR 1
    2930 HNLLS 1
    2931 HRLSR 1
    2932 HSLNR 1
    2933 HSLSR 1
    2934 HTLHR 1
    2935 HVLAR 1
    2936 INLSR 1
    2937 NALAR 1
    2938 NHLVQ 1
    2939 NTLIR 1
    2940 NTLNR 1
    2941 NTLQR 1
    2942 NVLKR 1
    2943 PALKR 1
    2944 PGLLR 1
    PWS . . .  1
    2945 QAAWG . . .  1
    2946 QALAR 1
    2947 QALTR 1
    2948 QDLIR 1
    2949 QTLAR 1
    2950 QTLQR 1
    2951 QVLRR 1
    2952 RGLTR 1
    2953 RGLVR 1
    2954 SALDR 1
    2955 SALMC 1
    2956 SALNR 1
    2957 SDLAR 1
    2958 SDLQR 1
    2959 SDLRR 1
    2960 SGPRR 1
    2961 SLLSD 1
    2962 SMLHR 1
    2963 SNLQR 1
    2964 SSLIR 1
    2965 SSLKR 1
    2966 STLLR 1
    2967 STLNR 1
    2968 STLRK 1
    2969 SVLGR 1
    2970 SVLRR 1
    2971 TALER 1
    2972 TALRT 1
    2973 TDLAR 1
    2974 TDLRR 1
    2975 TGLQV 1
    2976 TGLVRR 1
    2977 TGPAR 1
    2978 TMLKR 1
    2979 TNLPR 1
    2980 TSLAR 1
    2981 TSLGG 1
    2982 TSLGR 1
    2983 TSLQR 1
    2984 TSLVR 1
    2985 VALAR 1
    2986 VALKR 1
    2987 VALSR 1
    2988 VGLKC 1
    2989 VGLSR 1
    2990 VGLTM 1
    2991 VNLAR 1
    2992 VNLIR 1
    2993 VNLNR 1
    2994 VTLGR 1
    2995 VTLKR 1
    2996 VTLMR 1
    2997 VTLRR 1
    2998 WGLER 1
  • TABLE 17
    ZF5 selection on G:C change at nt 8 of core motif in CBS.
    Sequences reflect positions 2 to 6.
    SEQ ID NO: Sequence Read #
    1843 DGLRR 498
    108 DALRR 388
    2463 EGLRR 348
    1871 DALVR 288
    1837 DGLVR 262
    2468 DGLAR 261
    1986 AGLKR 257
    1870 DGLTR 255
    2462 TGLRR 237
    2530 DTLRR 196
    59 HGLRR 192
    66 ATLRR 176
    2539 ETLRR 149
    2464 SGLRR 142
    2584 GTLRR 136
    50 GGLVR 132
    2545 STLSR 132
    2707 GTLQR 131
    2553 DGLSR 127
    2027 GGLKR 126
    2684 GTLAR 123
    2578 HTLAR 114
    2486 SGLVR 111
    2779 GDLKR 109
    2593 QTLRR 107
    2472 TGLVR 106
    2668 STLGR 103
    2776 DTLVR 102
    2563 DGLNR 100
    2811 DVLRR 100
    2698 STLVR 100
    2720 DVLKR 99
    48 ATLKR 96
    2461 GGLRR 93
    2638 STLRR 93
    2802 AVLKR 91
    2816 GDLRR 90
    2554 GTLKR 89
    1932 HGLKR 89
    56 HTLRR 89
    2492 TGLQR 87
    2559 SGLKR 86
    2672 DTLAR 84
    2654 ATLAR 83
    2848 STLER 81
    2737 TTLRR 80
    2495 EGLKR 79
    2562 GTLTR 79
    2469 TGLAR 75
    2529 DGLHR 74
    54 HGLVR 74
    2828 HTLNR 73
    2967 STLNR 71
    2489 SSLRR 69
    2516 TGLSR 68
    2772 ATLSR 67
    2656 DTLNR 67
    2788 NTLAR 66
    58 ADLKR 65
    2570 DALAR 65
    2626 GTLVR 64
    2719 DTLQR 62
    2739 ATLVR 61
    2478 NGLVR 61
    109 DGLKR 59
    2467 GGLAR 59
    2568 HSLVR 59
    2804 DALQR 58
    2507 TGLLR 58
    2640 DMLKR 57
    55 GGLTR 56
    2867 ADLRR 55
    2474 SGLSR 55
    2564 SSLVR 54
    2500 TGLKR 53
    2475 AGLRR 52
    2550 STLAR 52
    2783 HTLKR 51
    2587 NTLRR 51
    2857 TTLAR 51
    2622 ATLTR 49
    2817 GDLVR 49
    2667 SDLKR 49
    2767 TTLVR 49
    2466 SGLAR 48
    2847 STLDR 48
    2850 STLMR 48
    2515 TGLNR 48
    2502 ETLKR 47
    2970 SVLRR 47
    2849 STLHR 46
    2959 SDLRR 45
    2699 SVLKR 44
    2488 HGLTR 43
    2702 DSLRR 42
    2974 TDLRR 42
    2471 HGLAR 40
    2586 HTLMR 40
    2477 SGLTR 40
    2966 STLLR 40
    2736 TTLLR 40
    2636 NGLKR 39
    2810 DTLTR 38
    2598 EGLNR 37
    2723 GTLSR 37
    2978 TMLKR 37
    2589 TSLRR 37
    2801 ATLMR 36
    2999 DALTR 36
    2697 STLQR 36
    2762 STLTR 36
    2780 GDLTR 35
    2476 GGLSR 35
    51 HGLIR 35
    2509 ASLKR 34
    2630 SSLTR 34
    1985 AALKR 33
    3000 DALIR 33
    2859 TTLIR 33
    2490 EGLTR 32
    2753 HTLGR 32
    2613 HTLQR 32
    2692 QSLKR 32
    2701 DSLKR 31
    2131 SALKR 31
    2845 SSLAR 31
    2618 TTLMR 31
    2878 ATLER 30
    2086 NALKR 30
    2594 SGLIR 30
    2556 SMLRR 30
    3001 GVLKR 29
    53 TGLTR 29
    2497 EGLAR 28
    2612 HTLER 28
    2766 TSLKR 28
    3002 GDLHR 27
    2644 HNLVR 27
    1936 HTLRV 27
    2465 AGLAR 26
    3003 GDLNR 26
    2503 HGLLR 26
    3004 SILKR 26
    2858 TTLGR 26
    2499 DGLIR 25
    2732 SVLAR 25
    2590 TTLQR 25
    2473 AGLTR 24
    1988 AGLVR 24
    2805 DALSR 24
    3005 DTLIR 24
    2777 EGLVR 24
    2579 QGLKR 24
    2820 HDLRR 23
    2784 HVLKR 23
    3006 NTLTR 23
    2957 SDLAR 23
    2965 SSLKR 23
    2973 TDLAR 23
    2803 DALNR 22
    3007 HTLIR 22
    2628 NTLKR 22
    2838 QDLKR 22
    2860 TTLKR 22
    3008 EVLRR 21
    3009 GDLSR 21
    3010 HVLRR 21
    2837 NTLHR 21
    3011 TDLTR 21
    2681 TTLNR 21
    1833 DGLKK 20
    2520 DMLRR 20
    2919 GTLHR 20
    2833 MTLKR 20
    2980 TSLAR 20
    3012 ATLHR 19
    3013 DSLVR 19
    3014 GTLDR 19
    2830 HTLVR 19
    3015 NTLLR 19
    2843 QTLKR 19
    2634 ESLRR 18
    3016 HDLQR 18
    2821 HGLNR 18
    2823 HMLRR 18
    57 TVLKR 18
    3017 ATLNR 17
    2596 DGLGR 17
    2485 NGLRR 17
    2549 SGLNR 17
    2501 SGLQR 17
    3018 STLIR 16
    2617 TALKR 16
    2519 TGLIR 16
    3019 TTLSR 16
    3020 DILKR 15
    3021 ETLNR 15
    2916 GSLVR 15
    3022 MDLKR 15
    2504 NGLQR 15
    2949 QTLAR 15
    2964 SSLIR 15
    2538 AALRR 14
    2818 GSLAR 14
    2484 HGLQR 14
    2512 NGLTR 14
    3023 QDLRR 14
    2588 TGLHR 14
    3024 TSLTR 14
    71 AMLKR 13
    3025 ATLGR 13
    3026 GDLQR 13
    2470 GGLQR 13
    2819 GSLKR 13
    3027 NTLVR 13
    3028 SILRR 13
    2582 SNLVR 13
    2846 SSLSR 13
    2995 VTLKR 13
    2880 ATLLR 12
    2597 DSLQR 12
    2659 GGLNR 12
    2548 HGLSR 12
    2525 SGLLR 12
    2792 SQLKR 12
    2505 TGLMR 12
    2982 TSLGR 12
    2479 AGLQR 11
    2670 DILRR 11
    3029 DTLER 11
    3030 DTLLR 11
    2917 GTLER 11
    2689 HTLLR 11
    2540 NGLAR 11
    2663 NGLHR 11
    3031 SDLTR 11
    3032 SMLKR 11
    1849 TTLKV 11
    2879 ATLIR 10
    2722 GMLKR 10
    2600 GSLRR 10
    3033 GTLLR 10
    2510 QGLRR 10
    2480 AGLHR 9
    2498 AGLSR 9
    2740 DALHR 9
    2005 DGLLR 9
    3034 DTLGR 9
    3035 GDLAR 9
    1930 HALKR 9
    2782 HGLHR 9
    46 HTLKV 9
    3036 HVLVR 9
    2664 NMLKR 9
    2939 NTLIR 9
    3037 QDLAR 9
    2560 TGLGR 9
    2875 ASLRR 8
    2881 ATLQR 8
    3038 ETLAR 8
    2592 GALTR 8
    2607 GALVR 8
    2547 HALVR 8
    2643 HGLMR 8
    3039 HILKR 8
    3040 HMLVR 8
    2827 HSLRR 8
    3041 NTLSR 8
    2948 QDLIR 8
    3042 SDLVR 8
    2537 SGLMR 8
    2677 SMLTR 8
    2189 TSLKV 8
    2651 TTLTR 8
    2700 AALTR 7
    3043 ETLQR 7
    2521 GALKR 7
    2641 GALRR 7
    2528 GGLIR 7
    117 GNLVR 7
    3044 HDLGR 7
    3045 HDLTR 7
    2826 HSLHR 7
    2934 HTLHR 7
    2942 NVLKR 7
    2678 TALAR 7
    3046 TDLKR 7
    1845 TGLKV 7
    3047 TSLNR 7
    2983 TSLQR 7
    3048 VDLKR 7
    2014 DVLKK 6
    3049 GILKR 6
    2921 GVLRR 6
    2610 HALAR 6
    2483 HALRR 6
    2531 HLLKR 6
    3050 HNLKR 6
    2834 NALHR 6
    3051 QDLQR 6
    2616 QGLVR 6
    2532 SALAR 6
    3052 SDLGR 6
    2514 SGLHR 6
    2302 STLKT 6
    3053 TDLSR 6
    2565 TGLER 6
    2742 DGLER 5
    3054 DILVR 5
    2566 DTLKR 5
    1884 EALKR 5
    2657 EILKR 5
    3055 GVLVG 5
    3056 HSLTR 5
    3057 HTLDR 5
    2937 NALAR 5
    2572 NGLIR 5
    2555 NGLSR 5
    3058 QQLQR 5
    2523 SALRR 5
    2694 SALVR 5
    2513 SGLDR 5
    2581 SGLGR 5
    2496 SNLLR 5
    3059 SVLLR 5
    3060 TDLGR 5
    3061 TDLQR 5
    2534 VGLKR 5
    2493 AGLIR 4
    2576 AILKR 4
    3062 ALLKR 4
    2683 DSLAR 4
    2886 DSLSR 4
    3063 DTLRK 4
    3064 ETLTR 4
    3065 GELTR 4
    70 GNLTR 4
    2660 GSLTR 4
    2918 GTLGR 4
    2748 HALHR 4
    3066 HDLNR 4
    2482 HNLLR 4
    3067 MTLRR 4
    2615 NGLMR 4
    3068 NTLER 4
    2956 SALNR 4
    2958 SDLQR 4
    3069 SELKR 4
    2580 SGLER 4
    2604 SSLGR 4
    3070 STLSM 4
    3071 TDLMR 4
    68 TNLRR 4
    2650 TQLKR 4
    3072 TSLLR 4
    3073 TSLMR 4
    2984 TSLVR 4
    3074 TTLER 4
    3075 TVLRR 4
    2738 AALNR 3
    3076 ADLTR 3
    2669 AGLER 3
    2542 ANLAR 3
    69 ANLRR 3
    2877 ATLDR 3
    2741 DALMR 3
    3077 DILTR 3
    3078 DMLQR 3
    2632 DNLAR 3
    2591 DNLKR 3
    2809 DTLSR 3
    3079 DVLVR 3
    2583 EALRR 3
    2813 EGLSR 3
    3080 ETLRK 3
    2481 GNLER 3
    3081 GTLMR 3
    2747 GTLNR 3
    3082 HAEG . . .  3
    3083 HDLMR 3
    3084 HMLQR 3
    2577 HNLTR 3
    3085 HSLKR 3
    2829 HTLTR 3
    2935 HVLAR 3
    2835 NALSR 3
    2518 NNLVR 3
    3086 QSLNR 3
    3087 SILAR 3
    2962 SMLHR 3
    297 STLRV 3
    2733 SVLTR 3
    3088 SVLVR 3
    2734 TALRR 3
    2981 TSLGG 3
    2994 VTLGR 3
    2546 AALAR 2
    2864 AALLR 2
    2770 AALVR 2
    3089 ADLVR 2
    2569 AGLNR 2
    2494 ANLVR 2
    3090 ASLAR 2
    3091 ASLIR 2
    2800 ASLVR 2
    2655 DALGR 2
    2552 DGLDR 2
    2743 DGLQR 2
    1853 DGLRK 2
    2506 DNLVR 2
    3092 DVLMR 2
    3093 DVLQR 2
    3094 EGLGR 2
    3095 EGLHR 2
    2892 EGLQR 2
    2658 ESLKR 2
    2536 GGLMR 2
    138 GNLAR 2
    139 GNLMR 2
    3096 HDLSR 2
    2687 HGLDR 2
    2585 HGLGR 2
    2371 HGLKM 2
    3097 HILMR 2
    2557 HNLHR 2
    2627 HNLSR 2
    2611 HSLIR 2
    3098 HSLQR 2
    3099 HVLHR 2
    3100 IDLKR 2
    2755 NGLLR 2
    3101 NILVR 2
    2943 PALKR 2
    3102 PGLAR 2
    3103 PTLMR 2
    2573 QGLTR 2
    2574 QMLKR 2
    2842 QSLRR 2
    3104 QTLSR 2
    2759 SALIR 2
    2603 SALSR 2
    3105 SELRR 2
    2487 SNLDR 2
    116 SNLRR 2
    2544 SNLSR 2
    2696 SQLRR 2
    2153 STLKR 2
    2968 STLRK 2
    3106 TDLHR 2
    3107 TDLVR 2
    3108 TGLKL 2
    2157 TGLRV 2
    3109 TMLNR 2
    2649 TNLAR 2
    2595 TNLKR 2
    2511 TNLVR 2
    3110 TSLIR 2
    2176 TTLKA 2
    3111 VDLRR 2
    3112 VTLAR 2
    3113 AALHR 1
    2717 AALQR 1
    2866 AAPER 1
    3114 ADLNR 1
    3115 ADLRV 1
    2868 AGLAW 1
    3116 AGLKK 1
    2527 AGLLR 1
    3117 AILRR 1
    2621 ANLNR 1
    3118 ASLKS 1
    2799 ASLQR 1
    2876 ASLTR 1
    3119 ASMKR 1
    3120 ATPVP 1
    2882 AVLRR 1
    3121 AVLTR 1
    3122 CGLRR 1
    3123 DAEA . . .  1
    3124 DALER 1
    1831 DALKR 1
    2682 DALLR 1
    3125 DALPR 1
    3126 DARRR 1
    3127 DDLNR 1
    3128 DGAAE . . .  1
    1852 DGLKV 1
    3129 DGLWR 1
    3130 DGPAR 1
    3131 DGPKK 1
    3132 DGRRR 1
    3133 DGVRR 1
    3134 DMLTR 1
    2535 DNLLR 1
    2808 DNLSR 1
    3135 DSLNR 1
    3136 DTLDR 1
    371 DTLRV 1
    3137 DVLRK 1
    3138 DVLRS 1
    3139 DVLSR 1
    3140 DVQKR 1
    3141 EALVR 1
    2812 EGLIR 1
    3142 EGLKM 1
    2704 EGLLR 1
    2891 EGLMR 1
    3143 EGLQC 1
    3144 EGLRS 1
    2894 EGLRV 1
    3145 EGRRR 1
    2895 EGVRR 1
    3146 EGWS . . .  1
    2705 ENLAR 1
    2633 ENLSR 1
    3147 ESLAR 1
    3148 ETGWG . . .  1
    3149 ETLER 1
    3150 ETLHR 1
    3151 ETLVR 1
    3152 ETRRR 1
    3153 EVLKR 1
    2814 GAEE . . .  1
    3154 GALAR 1
    2778 GALNR 1
    3155 GDLYR 1
    3156 GDPAP . . .  1
    2642 GGLDR 1
    2745 GGLGR 1
    2904 GGLQE 1
    3157 GGQTR 1
    3158 GGVVR 1
    3159 GHLQR 1
    3160 GILRR 1
    3161 GMLRR 1
    2522 GNLDR 1
    3162 GNLLL 1
    2517 GNLLR 1
    2609 GNLQR 1
    3163 GNLVM 1
    2685 GTLLV 1
    2192 GTLRV 1
    3164 GTLRW 1
    3165 GTPHR 1
    3166 GVLAR 1
    3167 GVLNR 1
    3168 GVLVR 1
    3169 GWLSR 1
    3170 HAEA . . .  1
    43 HALKV 1
    3171 HDLKR 1
    3172 HELTR 1
    3173 HGLRW 1
    3174 HGMRR 1
    3175 HILIR 1
    3176 HLLNR 1
    2661 HNLAR 1
    3177 HPAP . . .  1
    2645 HQLIR 1
    2825 HSLAR 1
    2933 HSLSR 1
    3178 HTLNK 1
    3179 HTLRA 1
    3180 HTLRG 1
    3181 HTLSR 1
    2709 HVLER 1
    3182 HWLLR 1
    2710 IGLRR 1
    2754 IGLTG 1
    2711 INLTR 1
    3183 ITLTR 1
    3184 KGLPG 1
    3185 MDVKG 1
    3186 MTLIR 1
    2635 NALRR 1
    2676 NALVR 1
    2614 NGLER 1
    2938 NHLVQ 1
    2786 NMLAR 1
    2543 NNLAR 1
    2637 NNLLR 1
    2787 NSLAR 1
    2940 NTLNR 1
    2941 NTLQR 1
    3187 P*MGS 1
    3188 PALKP 1
    3189 PGWAG 1
    3190 PTLKR 1
    3191 PTLRR 1
    PWS . . .  1
    2602 QALKR 1
    2947 QALTR 1
    3192 QDLAT 1
    3193 QDLVR 1
    2728 QGLAR 1
    2729 QNLHR 1
    2646 QNLQR 1
    2575 QNLRR 1
    2841 QNLRW 1
    3194 QPACV 1
    3195 QTLHR 1
    2950 QTLQR 1
    3196 QTLTR 1
    3197 RGLKR 1
    3198 RPAA . . .  1
    2336 RTLKV 1
    3199 SALHR 1
    1887 SALKV 1
    2955 SALMC 1
    2730 SALMR 1
    3200 SDLKS 1
    3201 SILKV 1
    3202 SILNR 1
    2791 SILVR 1
    2533 SMLAR 1
    3203 SMLLR 1
    3204 SMLR 1
    2524 SNLAR 1
    3205 SNLHR 1
    2963 SNLQR 1
    3206 SPLHR 1
    3207 SSLKW 1
    3208 STPER 1
    3209 STQVR 1
    3210 SVLQR 1
    3211 SVLSR 1
    2795 TALNR 1
    2631 TALVR 1
    2765 TGLDR 1
    3212 TGLKW 1
    3213 TGLNV 1
    3214 TGLQC 1
    3215 TGLRQ 1
    2977 TGPAR 1
    3216 TGPNR 1
    3217 TGQRR 1
    74 TMLRR 1
    2561 TNLMR 1
    2526 TNLNR 1
    3218 TRLVR 1
    3219 TSLIS 1
    3220 TTLDR 1
    3221 TTLKK 1
    3222 TTLRT 1
    1919 TTLRV 1
    2861 TVLRM 1
    2985 VALAR 1
    3223 VALRR 1
    3224 VGLHR 1
    3225 VGLNR 1
    2652 VGLQR 1
    2619 VGLRR 1
    2990 VGLTM 1
    2605 VNLKR 1
    3226 YGLAR 1
    3227 YGLVR 1
    3228 YILRR 1
  • TABLE 18
    ZF5 selection on G:T change at nt 8 of core motif in CBS.
    Sequences reflect positions 2 to 6.
    SEQ ID NO: Sequence Read #
    50 GGLVR 178
    2538 AALRR 174
    2607 GALVR 170
    2462 TGLRR 162
    2464 SGLRR 158
    2461 GGLRR 152
    2463 EGLRR 148
    2475 AGLRR 143
    2641 GALRR 126
    56 HTLRR 125
    2027 GGLKR 117
    2700 AALTR 111
    2473 AGLTR 108
    2521 GALKR 104
    2465 AGLAR 102
    54 HGLVR 101
    1932 HGLKR 99
    2610 HALAR 97
    1986 AGLKR 96
    59 HGLRR 96
    1985 AALKR 94
    2466 SGLAR 93
    66 ATLRR 90
    2539 ETLRR 90
    2471 HGLAR 90
    2495 EGLKR 83
    2477 SGLTR 82
    2488 HGLTR 79
    1843 DGLRR 77
    2592 GALTR 75
    2467 GGLAR 74
    2483 HALRR 74
    2523 SALRR 71
    2486 SGLVR 70
    2734 TALRR 69
    3154 GALAR 66
    2500 TGLKR 66
    55 GGLTR 63
    2694 SALVR 61
    2875 ASLRR 57
    108 DALRR 57
    2530 DTLRR 52
    2819 GSLKR 50
    2748 HALHR 46
    2568 HSLVR 46
    2546 AALAR 45
    2131 SALKR 45
    2583 EALRR 44
    2770 AALVR 42
    1884 EALKR 42
    2827 HSLRR 42
    2532 SALAR 42
    2666 SALTR 42
    2489 SSLRR 41
    2654 ATLAR 40
    1930 HALKR 40
    2587 NTLRR 40
    2956 SALNR 40
    2479 AGLQR 39
    1837 DGLVR 38
    2502 ETLKR 38
    49 QALRR 38
    2678 TALAR 36
    2857 TTLAR 36
    2737 TTLRR 36
    2547 HALVR 35
    2578 HTLAR 35
    2476 GGLSR 34
    2738 AALNR 33
    2470 GGLQR 33
    2564 SSLVR 33
    2656 DTLNR 31
    2600 GSLRR 31
    2586 HTLMR 30
    2559 SGLKR 30
    2550 STLAR 30
    2498 AGLSR 29
    1988 AGLVR 29
    2509 ASLKR 29
    2684 GTLAR 29
    3229 QALVR 29
    2594 SGLIR 29
    2545 STLSR 29
    2472 TGLVR 29
    2468 DGLAR 28
    2701 DSLKR 28
    2762 STLTR 28
    2653 AALSR 27
    2674 HALSR 27
    2603 SALSR 27
    2850 STLMR 26
    2828 HTLNR 25
    1870 DGLTR 24
    51 HGLIR 24
    2628 NTLKR 24
    2589 TSLRR 24
    2997 VTLRR 24
    2569 AGLNR 23
    2721 GALSR 23
    2630 SSLTR 22
    2480 AGLHR 21
    2778 GALNR 21
    2753 HTLGR 21
    2593 QTLRR 21
    53 TGLTR 21
    2717 AALQR 20
    2562 GTLTR 20
    2643 HGLMR 20
    2617 TALKR 20
    2799 ASLQR 19
    2739 ATLVR 19
    1831 DALKR 19
    2634 ESLRR 19
    2659 GGLNR 19
    2622 ATLTR 18
    2528 GGLIR 18
    2660 GSLTR 18
    2554 GTLKR 18
    2707 GTLQR 18
    2636 NGLKR 18
    2667 SDLKR 18
    2698 STLVR 18
    2584 GTLRR 17
    2525 SGLLR 17
    2493 AGLIR 16
    2800 ASLVR 16
    2818 GSLAR 16
    2934 HTLHR 16
    2549 SGLNR 16
    2474 SGLSR 16
    1871 DALVR 15
    2916 GSLVR 15
    2782 HGLHR 15
    2878 ATLER 14
    3098 HSLQR 14
    2501 SGLQR 14
    2519 TGLIR 14
    2516 TGLSR 14
    2858 TTLGR 14
    2767 TTLVR 14
    2995 VTLKR 14
    2772 ATLSR 13
    2702 DSLRR 13
    2759 SALIR 13
    2631 TALVR 13
    2736 TTLLR 13
    2864 AALLR 12
    3230 HALTR 12
    2616 QGLVR 12
    2469 TGLAR 12
    2880 ATLLR 11
    2563 DGLNR 11
    2626 GTLVR 11
    2602 QALKR 11
    3231 SALLR 11
    3232 SSLHR 11
    2967 STLNR 11
    2492 TGLQR 11
    2590 TTLQR 11
    2876 ASLTR 10
    109 DGLKR 10
    2756 NSLRR 10
    2692 QSLKR 10
    2537 SGLMR 10
    2849 STLHR 10
    2638 STLRR 10
    3113 AALHR 9
    2879 ATLIR 9
    3017 ATLNR 9
    2672 DTLAR 9
    2566 DTLKR 9
    2484 HGLQR 9
    2933 HSLSR 9
    2943 PALKR 9
    2964 SSLIR 9
    2764 TALTR 9
    2588 TGLHR 9
    2881 ATLQR 8
    3007 HTLIR 8
    2829 HTLTR 8
    2941 NTLQR 8
    2579 QGLKR 8
    2699 SVLKR 8
    3047 TSLNR 8
    3233 AALIR 7
    2865 AALMR 7
    2999 DALTR 7
    2719 DTLQR 7
    3234 GSLHR 7
    2781 GSLQR 7
    2548 HGLSR 7
    2478 NGLVR 7
    2965 SSLKR 7
    2848 STLER 7
    2795 TALNR 7
    48 ATLKR 6
    2802 AVLKR 6
    3038 ETLAR 6
    2503 HGLLR 6
    2830 HTLVR 6
    2784 HVLKR 6
    3235 NALQR 6
    2485 NGLRR 6
    3236 NSLVR 6
    2580 SGLER 6
    2514 SGLHR 6
    2860 TTLKR 6
    3237 AALER 5
    3238 AALGR 5
    3025 ATLGR 5
    2598 EGLNR 5
    2904 GGLQE 5
    70 GNLTR 5
    2086 NALKR 5
    2788 NTLAR 5
    2843 QTLKR 5
    2950 QTLQR 5
    2505 TGLMR 5
    2515 TGLNR 5
    2980 TSLAR 5
    2743 DGLQR 4
    2703 DTLMR 4
    2777 EGLVR 4
    2745 GGLGR 4
    2536 GGLMR 4
    3239 GSLIR 4
    3240 GSLNR 4
    2673 HALLR 4
    2783 HTLKR 4
    46 HTLKV 4
    2938 NHLVQ 4
    2510 QGLRR 4
    3241 QVLKR 4
    3199 SALHR 4
    2845 SSLAR 4
    2668 STLGR 4
    3018 STLIR 4
    2966 STLLR 4
    3242 TALQR 4
    3073 TSLMR 4
    3243 AALDR 3
    2527 AGLLR 3
    2542 ANLAR 3
    69 ANLRR 3
    3244 ASLSR 3
    3012 ATLHR 3
    2570 DALAR 3
    2804 DALQR 3
    2499 DGLIR 3
    2553 DGLSR 3
    2520 DMLRR 3
    2497 EGLAR 3
    2490 EGLTR 3
    2658 ESLKR 3
    2491 GGLER 3
    2625 GGLLR 3
    138 GNLAR 3
    117 GNLVR 3
    3245 GSLSR 3
    3246 HALQR 3
    2577 HNLTR 3
    3085 HSLKR 3
    2613 HTLQR 3
    2832 ITLKR 3
    2833 MTLKR 3
    2787 NSLAR 3
    3247 NSLSR 3
    2940 NTLNR 3
    2947 QALTR 3
    2573 QGLTR 3
    3195 QTLHR 3
    3248 QTLVR 3
    2730 SALMR 3
    2496 SNLLR 3
    2604 SSLGR 3
    2847 STLDR 3
    2970 SVLRR 3
    2507 TGLLR 3
    2561 TNLMR 3
    68 TNLRR 3
    3249 TSLER 3
    2618 TTLMR 3
    2534 VGLKR 3
    2718 AGLDR 2
    2669 AGLER 2
    2797 AGLGR 2
    3250 ASLMR 2
    3251 ASLNR 2
    2552 DGLDR 2
    2529 DGLHR 2
    2591 DNLKR 2
    2535 DNLLR 2
    2623 DNLRR 2
    2506 DNLVR 2
    2683 DSLAR 2
    3030 DTLLR 2
    2809 DTLSR 2
    2810 DTLTR 2
    2720 DVLKR 2
    2811 DVLRR 2
    2890 EALTR 2
    3043 ETLQR 2
    3252 GALDR 2
    2779 GDLKR 2
    2780 GDLTR 2
    3253 GGPRR 2
    2917 GTLER 2
    3254 HALNR 2
    2820 HDLRR 2
    2687 HGLDR 2
    2585 HGLGR 2
    2821 HGLNR 2
    2482 HNLLR 2
    2826 HSLHR 2
    3255 MPLTR 2
    2834 NALHR 2
    2540 NGLAR 2
    2572 NGLIR 2
    2755 NGLLR 2
    2504 NGLQR 2
    2512 NGLTR 2
    2837 NTLHR 2
    2939 NTLIR 2
    2942 NVLKR 2
    2948 QDLIR 2
    2838 QDLKR 2
    2842 QSLRR 2
    3004 SILKR 2
    2556 SMLRR 2
    2793 SSLQR 2
    2697 STLQR 2
    2971 TALER 2
    2851 TALGR 2
    2157 TGLRV 2
    2978 TMLKR 2
    2511 TNLVR 2
    2715 TSLHR 2
    3019 TTLSR 2
    2651 TTLTR 2
    3256 AALTG 1
    2866 AAPER 1
    58 ADLKR 1
    2868 AGLAW 1
    3257 AGVIR 1
    3258 AGVTR 1
    71 AMLKR 1
    2621 ANLNR 1
    3090 ASLAR 1
    3259 ASLRG 1
    2801 ATLMR 1
    3260 ATLRM 1
    3261 ATPRR 1
    3262 AVLAR 1
    2882 AVLRR 1
    3263 AVLVR 1
    2803 DALNR 1
    2596 DGLGR 1
    1833 DGLKK 1
    1853 DGLRK 1
    3129 DGLWR 1
    3264 DGPAA . . .  1
    2640 DMLKR 1
    2597 DSLQR 1
    2776 DTLVR 1
    2014 DVLKK 1
    3265 EALHR 1
    3266 EALSR 1
    3095 EGLHR 1
    2891 EGLMR 1
    3267 EGLRG 1
    2894 EGLRV 1
    2705 ENLAR 1
    2633 ENLSR 1
    2814 GAEE . . .  1
    3268 GALER 1
    3269 GALGK 1
    3270 GALIR 1
    3271 GALKV 1
    3272 GALMR 1
    2815 GALQR 1
    3273 GAPRR 1
    3003 GDLNR 1
    2817 GDLVR 1
    2642 GGLDR 1
    2571 GGLHR 1
    3274 GGPAR 1
    3275 GGPVR 1
    3276 GGQVR 1
    3277 GGVAR 1
    3278 GGWP . . .  1
    2913 GMLAR 1
    2481 GNLER 1
    139 GNLMR 1
    2609 GNLQR 1
    3279 GSLRV 1
    2918 GTLGR 1
    2919 GTLHR 1
    3081 GTLMR 1
    2747 GTLNR 1
    2723 GTLSR 1
    3280 HAAQ . . .  1
    3281 HALAS 1
    3282 HALER 1
    3283 HALVH 1
    3284 HAMRR 1
    3285 HAQHR 1
    3286 HGLTL 1
    3287 HGLVM 1
    2531 HLLKR 1
    2661 HNLAR 1
    2557 HNLHR 1
    3050 HNLKR 1
    2627 HNLSR 1
    2644 HNLVR 1
    3177 HPAP . . .  1
    2645 HQLIR 1
    3288 HSLGR 1
    1936 HTLRV 1
    2935 HVLAR 1
    2710 IGLRR 1
    2754 IGLTG 1
    2711 INLTR 1
    3184 KGLPG 1
    3289 MPLQR 1
    2937 NALAR 1
    2663 NGLHR 1
    2615 NGLMR 1
    2555 NGLSR 1
    2664 NMLKR 1
    2543 NNLAR 1
    2637 NNLLR 1
    3006 NTLTR 1
    PWS . . .  1
    3290 QAPWP . . .  1
    3023 QDLRR 1
    2728 QGLAR 1
    2574 QMLKR 1
    2729 QNLHR 1
    2646 QNLQR 1
    2841 QNLRW 1
    3104 QTLSR 1
    3291 RGLQR 1
    2629 SALER 1
    2693 SALGR 1
    2955 SALMC 1
    3292 SALQR 1
    3293 SAQR . . .  1
    3294 SARVR 1
    2957 SDLAR 1
    3295 SDLNR 1
    2958 SDLQR 1
    2959 SDLRR 1
    3105 SELRR 1
    3296 SGADA . . .  1
    3297 SGLR . . .  1
    3298 SGLVC 1
    3299 SGPDP . . .  1
    2533 SMLAR 1
    2487 SNLDR 1
    2963 SNLQR 1
    2544 SNLSR 1
    2696 SQLRR 1
    3300 SSLPR 1
    2302 STLKT 1
    2968 STLRK 1
    3301 STPSR 1
    2733 SVLTR 1
    3302 TALLR 1
    3303 TAPTR 1
    2973 TDLAR 1
    2974 TDLRR 1
    3304 TGLIK 1
    2977 TGPAR 1
    3217 TGQRR 1
    2595 TNLKR 1
    2526 TNLNR 1
    2766 TSLKR 1
    2983 TSLQR 1
    2859 TTLIR 1
    1849 TTLKV 1
    2681 TTLNR 1
    2861 TVLRM 1
    3305 TWLRR 1
    2985 VALAR 1
    3306 VALQR 1
    2652 VGLQR 1
    2990 VGLTM 1
    2605 VNLKR 1
    3307 VSLKR 1
    3308 VSLRR 1
    3112 VTLAR 1
    2994 VTLGR 1
  • TABLE 19
    ZF4 selection on G: T change at nt 10 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO: Sequence Read #
    60 AHLRK 4967
    158 GHLKK 1446
    3309 THLRA 1429
    1386 EHLRR 1293
    162 GHLRK 1082
    3310 HHLTK 876
    63 AKLRI 867
    61 AKLRV 641
    3311 AKLRL 625
    3312 AKLKI 599
    3313 SHLRK 566
    159 AHLKK 560
    163 THLKK 496
    160 TKLRL 486
    92 SKLRL 475
    2137 SKLKV 466
    161 TKLKL 466
    3314 QHLRK 457
    3315 AKLKL 443
    3316 GHLVK 419
    3317 GKLKI 302
    3318 THLRK 268
    3319 AKLKV 258
    106 GKLRI 246
    3320 GKLRL 224
    3321 GHLRL 213
    3322 TKLKI 199
    3323 RSLGL 178
    90 AHLRV 177
    3324 AHLRL 153
    3325 TKLRV 152
    3326 SKLKI 146
    3327 SHLVG 132
    3328 GKLKL 116
    64 TKLKV 108
    3329 THLRT 107
    3330 GHLRR 102
    *R . . .  92
    3331 SHLRL 90
    65 SKLRV 80
    3332 GALV . . .  79
    3333 GHLKM 75
    3334 SKLRI 74
    3335 GILS . . .  71
    3336 SK*VL 63
    3337 SKLVL 62
    TR . . .  61
    3338 IRLGV 59
    3339 MALGL 58
    3340 EHLRK 54
    3341 GHLRM 54
    1407 EHLKR 50
    3342 ITLM . . .  48
    3343 AHLVK 40
    3344 THLRL 40
    3345 GKLKV 38
    3346 GHLKL 34
    3347 AHLRR 32
    3348 GHLIK 30
    3349 EHLVR 28
    3350 GKLRV 27
    3351 TALSM 26
    3352 EHLQR 25
    3353 EKLKV 25
    3354 QHLVK 25
    3355 TKLNL 25
    3356 GHLRA 23
    3357 GRLPK 21
    NGR . . .  21
    3358 SKLKL 21
    3359 THLTK 21
    3360 RLLSG 20
    3361 TKLRI 19
    3362 AHLRI 18
    409 GHLKV 16
    3363 GHLRV 16
    3364 GLLPG 16
    3365 AKLRT 14
    3366 RHLRV 14
    3367 AALRK 11
    3368 AHLHK 11
    3369 GHLTK 11
    3370 QHLRR 11
    3371 RSHS . . .  11
    3372 SHLNK 11
    3373 AHLQK 10
    3374 GHLMK 10
    3375 SKLRT 10
    287 AHLKV 9
    3376 AHLRA 9
    370 AHLRT 9
    3377 EHLRL 9
    3378 GHLKI 9
    3379 SHLKL 9
    3380 EHLKK 8
    3381 GHLRT 8
    3382 GKLKM 8
    3383 HHLKK 8
    3384 SKLTI 8
    3385 THEKP . . .  8
    *G . . .  7
    3386 AKLIL 7
    3387 AKLTI 7
    3388 HALAA 7
    3389 TKLQV 7
    3390 AKLRM 6
    3391 EHLRI 6
    3392 GHLAK 6
    3393 GHLKR 6
    3394 GKLTL 6
    3395 SHLKK 6
    3396 SHLRR 6
    3397 AILKA 5
    89 AKLRK 5
    3398 AKLTL 5
    3399 ASLTG 5
    201 EHLRV 5
    3400 EVLTM 5
    3401 GHLKT 5
    3402 NGRS . . .  5
    3403 THLRR 5
    3404 AHLKL 4
    3405 GALVH 4
    3406 GKLVL 4
    3407 NGRSPV . . .  4
    3408 QALSI 4
    3409 SHLRT 4
    TRS . . .  4
    3410 AALRL 3
    3411 AHLMK 3
    439 AHLRE 3
    3412 AHLRQ 3
    3413 AKLNL 3
    3414 AKLRA 3
    3415 APLRK 3
    186 EKLRI 3
    3416 GALMG 3
    3417 GALTG 3
    3418 GHLRG 3
    3419 GHLTL 3
    3420 GKLRK 3
    3421 GKLTV 3
    187 GKLVT 3
    3422 HHLRK 3
    3423 MGLVG 3
    1848 SHLKV 3
    3424 SHLRI 3
    3425 SKLIL 3
    3426 SKLMV 3
    3427 SLLAG 3
    3428 THLKI 3
    3429 THLQK 3
    3430 VPLAG 3
    3431 AGLLG 2
    3432 AHLKM 2
    3433 AHLRN 2
    3434 AHLTK 2
    3435 AKLIV 2
    3436 AKLKA 2
    88 AKLKK 2
    3437 AKLTV 2
    3438 AKLVL 2
    3439 AKSRI 2
    3440 AMLMQ 2
    3441 AQLRI 2
    3442 DALR . . .  2
    419 EHLRA 2
    313 EHLRT 2
    3443 EKLKL 2
    3444 GGLQK 2
    3445 GGLTM 2
    GH*R . . .  2
    3446 GHLLR 2
    3447 GHLRI 2
    3448 GHLVG 2
    3449 GHLVR 2
    3450 GKLNL 2
    2912 GKLRR 2
    3451 GKLVP 2
    3452 GLLGL 2
    3453 GNLGM 2
    3454 GVLQK 2
    3455 HGLLP 2
    2043 HHLRV 2
    3456 HLLEN 2
    3457 IGLQR 2
    3458 KTLGV 2
    3459 LSLLK 2
    3460 MRLGE 2
    3461 NSLTR 2
    3462 NVLNK 2
    3463 PHLRK 2
    3464 PLLMP 2
    3465 PRLRH 2
    3466 QKLHL 2
    3467 QKLNL 2
    3468 SHLRV 2
    3469 SKLHL 2
    3470 SKLKR 2
    3471 SKLNL 2
    3472 SPLAE 2
    3473 SVLML 2
    TH*R . . .  2
    2448 THLKL 2
    3474 THLRV 2
    3475 TKLIL 2
    3476 TKLMV 2
    3477 TPLNI 2
    3478 TRLQK 2
    3024 TSLTR 2
    3479 VGLGQ 2
    3480 VHLRK 2
    3481 AALES 1
    3482 AALRI 1
    3483 ADLRK 1
    3484 AELLG 1
    3485 AELRI 1
    3486 AGLAA 1
    1986 AGLKR 1
    3487 AGLMD 1
    3488 AHLGL 1
    3489 AHLK . . .  1
    3490 AHLKA 1
    3491 AHLKI 1
    438 AHLKT 1
    3492 AHLNK 1
    3493 AHLR . . .  1
    3494 AHLSK 1
    3495 AHLSP 1
    214 AHLTV 1
    3496 AHLWK 1
    3497 AKFKI 1
    3498 AKIKH 1
    3499 AKIRI 1
    3500 AKIRL 1
    3501 AKIRV 1
    3502 AKLHT 1
    3503 AKLKE 1
    3504 AKLKG 1
    3505 AKLKM 1
    3506 AKLMN 1
    3507 AKLNI 1
    3508 AKLQL 1
    3509 AKLRG 1
    3510 AKLRR 1
    3511 AKLSM 1
    3512 AKSRV 1
    3513 AKVKL 1
    3514 AKVRI 1
    3515 ALLMA 1
    3516 ALLRR 1
    3517 AMLIM 1
    3518 AMLKI 1
    3519 AMLRG 1
    3520 AMLRL 1
    3521 ANLSN 1
    3522 ANVAQ 1
    3523 APLKK 1
    3524 AQFRK 1
    3525 AQLVD 1
    3526 ARLAG 1
    3527 ARLGT 1
    3528 ARLRA 1
    3529 ARLRK 1
    3530 ASLRM 1
    3531 ATLKL 1
    3532 ATLRV 1
    3533 C*LKI 1
    3534 DELMR 1
    3535 DELRV 1
    3536 DGLES 1
    2005 DGLLR 1
    3537 DGLMD 1
    3538 DGLVG 1
    3539 DHLKK 1
    3540 DHLRK 1
    3541 DHLRR 1
    3542 DKLRK 1
    3543 DLLGV 1
    3544 DLLLN 1
    3545 DNLRE 1
    3546 DPLAR 1
    3547 DSLGE 1
    3548 EALMA 1
    3549 EDLVK 1
    3550 EELGL 1
    3551 EELMM 1
    3267 EGLRG 1
    3552 EGLVE 1
    3553 EHLG . . .  1
    3554 EHLHK 1
    3555 EHLKL 1
    3556 EHLKM 1
    2016 EHLRQ 1
    3557 EHLRS 1
    3558 EHLSE 1
    3559 EHLSR 1
    3560 EHLTK 1
    3561 EHLVK 1
    3562 EQLGP 1
    3563 ERLAA 1
    3564 ERLGR 1
    1893 ERLRR 1
    3565 ESLMA 1
    3566 ETLSH 1
    3567 EVLGI 1
    3568 FFLRV 1
    3569 GALGR 1
    3570 GALIM 1
    3571 GDLSG 1
    3572 GGLDL 1
    3573 GGLDQ 1
    1957 GGLKV 1
    3574 GGLNM 1
    3575 GGLPE 1
    2295 GGLVV 1
    3576 GHFKT 1
    3577 GHFQN 1
    3578 GHLK . . .  1
    3579 GHLMN 1
    3580 GHLMV 1
    3159 GHLQR 1
    3581 GHLR . . .  1
    3582 GILAG 1
    3583 GKLHE 1
    3584 GKLKA 1
    3585 GKLKF 1
    3586 GKLKT 1
    3587 GKLR . . .  1
    3588 GKLRA 1
    3589 GKLRM 1
    3590 GKLVA 1
    3591 GKLVV 1
    3592 GLLGE 1
    3593 GLLLD 1
    3594 GLLMG 1
    3595 GLLRG 1
    3596 GMLGG 1
    3597 GPLGV 1
    3598 GPLRV 1
    3599 GRLKI 1
    3600 GRLKK 1
    3601 GSLST 1
    3602 GSLVK 1
    2554 GTLKR 1
    3603 GVLAG 1
    3604 GVLLV 1
    3605 GVLS . . .  1
    3606 GYLRK 1
    3607 HALRT 1
    3608 HALVN 1
    3609 HGLTG 1
    3610 HHLAK 1
    3611 HHLRR 1
    3612 HIRS . . .  1
    3613 HTHEK 1
    3614 IELVQ 1
    3615 IGLGL 1
    3616 IKLRL 1
    3617 IMLRE 1
    3618 IMLVE 1
    3619 IPLGD 1
    3620 IQLRK 1
    3621 IRLG . . .  1
    3622 IRLGG 1
    3623 IRLVV 1
    3624 IVLAA 1
    3625 KHLRA 1
    3626 KHLRL 1
    3627 KILPE 1
    3628 KKLLE 1
    3629 KMLPP 1
    3630 KNLIK 1
    3631 KSLMP 1
    3632 LALGG 1
    3633 LGLGA 1
    3634 LGLVG 1
    3635 LHLTK 1
    LQ . . .  1
    3636 LRLIG 1
    LTE . . .  1
    3637 LTLQR 1
    3638 LVLRR 1
    3639 MA*SHMK 1
    3640 MALRL 1
    3641 MALTR 1
    3642 MGLDP 1
    3643 MGLGE 1
    3644 MGLQN 1
    3645 MHLRM 1
    3646 MKLEQ 1
    3647 MLLRN 1
    3648 MLLSH 1
    3649 MLLVN 1
    3650 MPLRA 1
    3651 MQLGG 1
    3652 MRLAR 1
    3653 MRLMG 1
    3654 MRLVG 1
    3655 MSLER 1
    3656 MTLPL 1
    3657 MTLSD 1
    3658 MVLAG 1
    NG . . .  1
    2615 NGLMR 1
    2504 NGLQR 1
    3659 NKLRL 1
    3660 NLAH 1
    3661 NLLPT 1
    3662 NRLES 1
    3663 NRLGG 1
    3664 NTLPK 1
    3665 PGLHG 1
    3666 PGLRA 1
    3667 PHFTK 1
    3668 PILLQ 1
    3669 PKLGL 1
    3670 PLLKS 1
    3671 PQLTG 1
    3672 PREAM 1
    3673 PTLQR 1
    3674 QELGR 1
    3675 QGLPV 1
    3676 QHLKK 1
    3677 QHLQR 1
    3678 QHLR . . .  1
    3679 QHLRI 1
    3680 QHLRL 1
    3681 QHLTK 1
    3682 QILLH 1
    3683 QKLRI 1
    3684 QNLHK 1
    3685 QPLIK 1
    3686 QQVTA . . .  1
    3687 QTLAE 1
    3688 QVTLA 1
    3689 RALSA 1
    RGL . . .  1
    3690 RGLGA 1
    3691 RGLTA 1
    2953 RGLVR 1
    3692 RGLVV 1
    3693 RHLRA 1
    3694 RHLRE 1
    3695 RHLRM 1
    3696 RHLRR 1
    3697 RILPR 1
    3698 RKLIV 1
    3699 RKLKL 1
    3700 RLLGA 1
    3701 RLLMP 1
    3702 RLLRR 1
    3703 RMLVP 1
    3704 RRLEG 1
    3705 RRLVN 1
    3706 RTLML 1
    3707 RTLTQ 1
    3708 SDLHV 1
    3709 SDLRK 1
    2581 SGLGR 1
    3710 SGLLV 1
    2486 SGLVR 1
    3711 SHLKM 1
    3712 SHLRA 1
    3713 SHLRE 1
    3714 SHLRG 1
    3715 SHLTK 1
    3716 SHLTM 1
    3717 SHLV . . .  1
    3718 SHLVK 1
    3719 SKIRL 1
    3720 SKLEG 1
    3721 SKLGA 1
    3722 SKLKG 1
    2191 SKLRM 1
    3723 SKLRN 1
    3724 SKLRR 1
    3725 SLLEE 1
    3726 SLLGT 1
    3727 SLLNG 1
    2138 SQLKV 1
    3728 SQLLE 1
    3729 SRLMA 1
    3730 STLLM 1
    3731 STLVG 1
    3732 TALRG 1
    TG . . .  1
    2469 TGLAR 1
    3733 TGLGL 1
    3734 TGLLK 1
    2157 TGLRV 1
    3735 TGLVD 1
    3385 THEKP 1
    3736 THFRT 1
    3737 THIR . . .  1
    3738 THLAR 1
    2449 THLKQ 1
    3739 THLLK 1
    3740 THLMK 1
    331 THLRP 1
    3741 THLVK 1
    3742 THMK 1
    3743 THVKK 1
    3744 TKLKM 1
    3745 TKLKR 1
    3746 TKLNM 1
    3747 TKLRK 1
    3748 TKLRP 1
    3749 TKLS . . .  1
    3750 TKLTI 1
    3751 TMLGG 1
    3752 TMLKL 1
    3753 TMLPG 1
    3754 TPLKR 1
    3755 TPLRA 1
    3756 TQLKK 1
    3757 TQLKL 1
    1941 TQLKV 1
    3758 TR*RL 1
    3759 TRLKL 1
    110 TRLRE 1
    TS . . .  1
    3760 TTLGI 1
    3761 TYLKK 1
    3762 VELDP 1
    3763 VELVN 1
    3764 VKLQQ 1
    3765 VKLRL 1
    3766 VKLRN 1
    3767 VKLRV 1
    3768 VLLKS 1
    3769 VLLQM 1
    3770 VMLKD 1
    3771 VMLMG 1
    3772 VPLAL 1
    3773 VPLER 1
    3774 VPLNT 1
    3775 VPLSS 1
    3776 VPLVP 1
    VQ*G . . .  1
    3777 VRLEE 1
    3778 VRLQA 1
    3779 VVTA . . .  1
    3780 WHLKK 1
    YG . . .  1
  • TABLE 20
    ZF4 selection on G: C change at nt 10 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO: Sequence Read #
    61 AKLRV 5924
    3325 TKLRV 4888
    64 TKLKV 3542
    2137 SKLKV 3056
    3319 AKLKV 2451
    65 SKLRV 1583
    3375 SKLRT 474
    3350 GKLRV 320
    63 AKLRI 254
    3345 GKLKV 237
    3312 AKLKI 164
    1986 AGLKR 132
    3322 TKLKI 129
    1957 GGLKV 78
    3326 SKLKI 76
    3334 SKLRI 76
    3527 ARLGT 64
    3781 VALGS 48
    3454 GVLQK 46
    TRS . . .  39
    60 AHLRK 30
    3782 AKLVV 26
    3783 TKLRA 24
    3784 LGLRG 18
    3652 MRLAR 15
    3785 TKLKA 14
    3722 SKLKG 13
    3361 TKLRI 13
    3365 AKLRT 12
    NGR . . . . 12
    3786 PNLAV 12
    3787 GGLEV 10
    158 GHLKK 10
    3788 PREAI 10
    3789 TKLKG 10
    3790 TKLIV 9
    3791 WILRA 9
    3792 AK*RG 8
    3414 AKLRA 8
    3311 AKLRL 8
    3793 EK*KV 8
    106 GKLRI 8
    3310 HHLTK 8
    3385 THEKP . . . . 8
    3794 TK*RG 8
    3795 TKLRT 8
    3315 AKLKL 7
    3796 AKLRE 7
    3437 AKLTV 7
    3353 EKLKV 7
    2187 SKLKE 7
    3797 TKLRG 7
    3509 AKLRG 6
    1386 EHLRR 6
    3798 EKLRV 6
    3799 RALW . . .  6
    2438 SKLKA 6
    3504 AKLKG 5
    3390 AKLRM 5
    3400 EVLTM 5
    3314 QHLRK 5
    3800 SKLVV 5
    1851 STLKV 5
    3801 TKLKE 5
    3802 TKLNV 5
    3316 GHLVK 4
    3320 GKLRL 4
    3803 KDALQYESECG . . .  4
    3804 LSLVD 4
    3805 QKLKV 4
    3806 RELKE . . . . 4
    3807 RILGS 4
    163 THLKK 4
    3309 THLRA 4
    3808 TKIRV 4
    160 TKLRL 4
    3809 TKLRM 4
    3810 TKLVV 4
    3811 TKVRV 4
    3812 TRSHSR . . . . 4
    159 AHLKK 3
    3436 AKLKA 3
    3813 AKLRD 3
    1909 ATLKV 3
    3532 ATLRV 3
    3536 DGLES 3
    3814 GGLKG 3
    3418 GHLRG 3
    162 GHLRK 3
    3815 GKLIV 3
    3816 GKLKG 3
    3317 GKLKI 3
    3451 GKLVP 3
    3817 KKLHW . . .  3
    3408 QALSI 3
    3818 RTLS . . . . 3
    3819 SKLRA 3
    3820 SKVRV 3
    3427 SLLAG 3
    3821 TK*SV 3
    3822 TKLAV 3
    3823 TKLRE 3
    3824 TKSRV 3
    3825 TKVKV 3
    3826 VMLMM 3
    3430 VPLAG 3
    3431 AGLLG 2
    3827 AILQV 2
    3501 AKIRV 2
    3435 AKLIV 2
    3503 AKLKE 2
    3828 AKLMV 2
    3829 AKLSV 2
    3830 AKVKV 2
    3521 ANLSN 2
    2315 DKLRV 2
    3831 ETLMH 2
    3416 GALMG 2
    3444 GGLQK 2
    3445 GGLTM 2
    3333 GHLKM 2
    3832 GKSKV 2
    3592 GLLGE 2
    3452 GLLGL 2
    3453 GNLGM 2
    2554 GTLKR 2
    3456 HLLEN 2
    3457 IGLQR 2
    3833 IKLRV 2
    3834 KALHT 2
    3835 KGLMM 2
    3836 MELAE 2
    3423 MGLVG 2
    3460 MRLGE 2
    3656 MTLPL 2
    2615 NGLMR 2
    3402 NGRS . . .  2
    3837 NKLKV 2
    3838 PRLLA 2
    3465 PRLRH 2
    3839 PRLSR 2
    3840 QGLEA 2
    2434 SELKV 2
    3470 SKLKR 2
    3841 SKLRE 2
    3842 SKLRG 2
    TH*R . . .  2
    3843 TKIKV 2
    161 TKLKL 2
    3476 TKLMV 2
    3389 TKLQV 2
    3844 TKLRD 2
    3845 TKLSV 2
    3477 TPLNI 2
    3478 TRLQK 2
    3024 TSLTR 2
    1919 TTLRV 2
    V 2
    3481 AALES 1
    3846 AELKA 1
    3847 AELKV 1
    3484 AELLG 1
    3486 AGLAA 1
    3848 AGLKH 1
    2475 AGLRR 1
    2498 AGLSR 1
    2473 AGLTR 1
    1988 AGLVR 1
    3490 AHLKA 1
    287 AHLKV 1
    90 AHLRV 1
    3495 AHLSP 1
    3849 AKIRE 1
    3850 AKLAV 1
    3851 AKLGV 1
    3852 AKLMI 1
    3853 AKLNV 1
    3854 AKLRF 1
    3855 AKLRN 1
    3387 AKLTI 1
    3856 AKLWV 1
    3857 AKRRV 1
    3858 AKSKV 1
    3859 AKVRG 1
    3860 ALLKV 1
    3517 AMLIM 1
    3861 AMLKV 1
    3440 AMLMQ 1
    3519 AMLRG 1
    3862 AQLKV 1
    3863 AQLRV 1
    3525 AQLVD 1
    1945 ARLKV 1
    3864 ARLRI 1
    1993 ARLRM 1
    1947 ARLRV 1
    3865 ATLQV 1
    3866 AVLKV 1
    3867 AYPRE 1
    3868 CGLHW . . .  1
    3869 CKLRV 1
    1995 DALDR 1
    3535 DELRV 1
    1852 DGLKV 1
    2005 DGLLR 1
    3537 DGLMD 1
    3870 DGLTG 1
    3538 DGLVG 1
    3871 DHLKR 1
    206 DHLNV 1
    3543 DLLGV 1
    3544 DLLLN 1
    3545 DNLRE 1
    3546 DPLAR 1
    3872 DRLTI 1
    3873 DVLKG 1
    3874 DVLRG 1
    3875 EALVH 1
    3551 EELMM 1
    3267 EGLRG 1
    3552 EGLVE 1
    201 EHLRV 1
    3349 EHLVR 1
    3562 EQLGP 1
    3876 EQLMT 1
    3564 ERLGR 1
    3565 ESLMA 1
    3566 ETLSH 1
    3877 EVLAA 1
    3567 EVLGI 1
    G . . .  1
    3571 GDLSG 1
    3573 GGLDQ 1
    3878 GGLKD 1
    3879 GGLKI 1
    2659 GGLNR 1
    3575 GGLPE 1
    GH*R . . .  1
    3393 GHLKR 1
    3446 GHLLR 1
    3580 GHLMV 1
    3330 GHLRR 1
    3363 GHLRV 1
    3419 GHLTL 1
    3448 GHLVG 1
    3582 GILAG 1
    3880 GILRM 1
    3881 GK*RG 1
    3584 GKLKA 1
    3382 GKLKM 1
    3882 GKLML 1
    3883 GKLQV 1
    3588 GKLRA 1
    3884 GKLRQ 1
    3885 GKLRT 1
    3394 GKLTL 1
    3593 GLLLD 1
    3594 GLLMG 1
    3364 GLLPG 1
    3595 GLLRG 1
    3886 GPLGQ 1
    3597 GPLGV 1
    3887 GPLMG 1
    3888 GQLKA 1
    3889 GRLAV 1
    3890 GRLNA 1
    3601 GSLST 1
    3602 GSLVK 1
    3603 GVLAG 1
    3604 GVLLV 1
    3607 HALRT 1
    3455 HGLLP 1
    3612 HIRS . . .  1
    3891 HPLTV 1
    3892 HRLTR 1
    3614 IELVQ 1
    3615 IGLGL 1
    3893 IKLKV 1
    3894 IMLKS 1
    3618 IMLVE 1
    3895 IQSGE 1
    3896 IQVTLA 1
    3897 IRLAL 1
    3621 IRLG . . .  1
    3338 IRLGV 1
    3342 ITLM . . .  1
    3624 IVLAA 1
    3898 KALRG 1
    3628 KKLLE 1
    3899 KKLRE 1
    3900 KKLVR 1
    3629 KMLPP 1
    3630 KNLIK 1
    3631 KSLMP 1
    3458 KTLGV 1
    3632 LALGG 1
    3633 LGLGA 1
    3634 LGLVG 1
    LQ . . .  1
    3636 LRLIG 1
    3901 LSLDG 1
    3637 LTLQR 1
    3638 LVLRR 1
    MA . . .  1
    3339 MALGL 1
    3641 MALTR 1
    3902 MELDR 1
    3642 MGLDP 1
    3643 MGLGE 1
    3644 MGLQN 1
    3646 MKLEQ 1
    3903 MKLQA 1
    3904 MKLRV 1
    3647 MLLRN 1
    3649 MLLVN 1
    3905 MPLLA 1
    3650 MPLRA 1
    3906 MRLARHIRSHTGERP . . .  1
    3653 MRLMG 1
    3655 MSLER 1
    3907 MSLVN 1
    3657 MTLSD 1
    3658 MVLAG 1
    3908 MVLQE 1
    3909 MVLVG 1
    N . . .  1
    3910 NDALEYESECGP . . .  1
    3911 NDALQYESVCVP . . .  1
    2504 NGLQR 1
    3912 NGLVV 1
    3913 NK*NV 1
    3914 NKLRV 1
    3660 NLAH 1
    3661 NLLPT 1
    3663 NRLGG 1
    3664 NTLPK 1
    NV . . .  1
    3915 NVLGG 1
    3462 NVLNK 1
    3916 PGLAA 1
    3665 PGLHG 1
    3669 PKLGL 1
    3917 PKLRA 1
    3670 PLLKS 1
    3464 PLLMP 1
    3918 PNLAG 1
    3919 PNYW . . .  1
    3671 PQLTG 1
    3672 PREAM 1
    3673 PTLQR 1
    3920 PVLDH 1
    Q 1
    3921 QALTN 1
    3674 QELGR 1
    3675 QGLPV 1
    3682 QILLH 1
    3467 QKLNL 1
    3684 QNLHK 1
    3685 QPLIK 1
    3687 QTLAE 1
    3922 QVLRK 1
    3689 RALSA 1
    3923 RELVR 1
    RGL . . .  1
    3924 RGLDM 1
    3925 RGLDR 1
    3691 RGLTA 1
    3926 RGLVA 1
    2953 RGLVR 1
    3692 RGLVV 1
    3694 RHLRE 1
    3697 RILPR 1
    3698 RKLIV 1
    3927 RKLKA 1
    3928 RKLKV 1
    3929 RKLRE 1
    3930 RKLRV 1
    3931 RKVRV 1
    3700 RLLGA 1
    3701 RLLMP 1
    3932 RMLQE 1
    3703 RMLVP 1
    3933 RPLEV 1
    3705 RRLVN 1
    3706 RTLML 1
    3707 RTLTQ 1
    S*G . . .  1
    3708 SDLHV 1
    2581 SGLGR 1
    3710 SGLLV 1
    2486 SGLVR 1
    1848 SHLKV 1
    3331 SHLRL 1
    3934 SKFKV 1
    3935 SKFRV 1
    3936 SKIRT 1
    3469 SKLHL 1
    3937 SKLKD 1
    3358 SKLKL 1
    3938 SKLKM 1
    3939 SKLQI 1
    92 SKLRL 1
    3940 SKLSV 1
    3941 SKLTV 1
    3337 SKLVL 1
    3942 SKSRT 1
    3943 SKVKV 1
    3944 SKVRT 1
    3725 SLLEE 1
    3726 SLLGT 1
    3945 SNLKG 1
    3946 SNLTH 1
    3728 SQLLE 1
    1857 SRLKV 1
    3730 STLLM 1
    3947 TALIS 1
    3732 TALRG 1
    3948 TELIG 1
    3949 TELKV 1
    TG*S . . .  1
    2469 TGLAR 1
    3733 TGLGL 1
    2157 TGLRV 1
    3385 THEKP 1
    3737 THIR . . .  1
    3738 THLAR 1
    3429 THLQK 1
    3318 THLRK 1
    3344 THLRL 1
    3329 THLRT 1
    3950 TKLHV 1
    3951 TKLKD 1
    3744 TKLKM 1
    3745 TKLKR 1
    3952 TKLKT 1
    3953 TKLMA 1
    3746 TKLNM 1
    3954 TKLQI 1
    3955 TKLR . . .  1
    3956 TKLTV 1
    3957 TKLWV 1
    3958 TKSRD 1
    3751 TMLGG 1
    3959 TMLKV 1
    3753 TMLPG 1
    3960 TMLRV 1
    3754 TPLKR 1
    1864 TRLKV 1
    110 TRLRE 1
    2168 TRLRG 1
    1883 TRLRV 1
    3961 TRSHS . . .  1
    3962 TTIRV 1
    3760 TTLGI 1
    1849 TTLKV 1
    3963 TTLSA 1
    3964 TTLVP 1
    3965 TVLAP 1
    3966 TVLPM 1
    3967 VALTK 1
    3763 VELVN 1
    3479 VGLGQ 1
    3968 VGLLR 1
    3969 VKLLV 1
    3764 VKLQQ 1
    3766 VKLRN 1
    3767 VKLRV 1
    3768 VLLKS 1
    3970 VLLMA 1
    3971 VLLPS 1
    3770 VMLKD 1
    3771 VMLMG 1
    3972 VNLLE 1
    3772 VPLAL 1
    3773 VPLER 1
    3774 VPLNT 1
    3775 VPLSS 1
    3776 VPLVP 1
    VQ*G . . .  1
    3973 VQLPV 1
    3777 VRLEE 1
    3778 VRLQA 1
    2994 VTLGR 1
    3974 YTHMK 1
  • TABLE 21
    ZF4 selection on G: A change at nt 10 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO: Sequence Read #
    61 AKLRV 408
    3350 GKLRV 294
    TRS 180
    64 TKLKV 170
    3320 GKLRL 166
    3402 NGRS 155
    3325 TKLRV 124
    3390 AKLRM 109
    160 TKLRL 109
    3345 GKLKV 107
    3312 AKLKI 92
    3319 AKLKV 88
    186 EKLRI 84
    3655 MSLER 68
    3975 NGRSPVC 67
    3416 GALMG 66
    3976 AELIR 63
    2581 SGLGR 63
    3915 NVLGG 61
    3977 RGLT 61
    3978 TLLMG 58
    3451 GKLVP 57
    3430 VPLAG 57
    3682 QILLH 55
    3979 TLPL 55
    3980 *MLTS 54
    3981 EMLTS 53
    2137 SKLKV 53
    3615 IGLGL 52
    3322 TKLKI 52
    3495 AHLSP 51
    3828 AKLMV 51
    3982 DALRG 51
    3633 LGLGA 51
    3805 QKLKV 51
    3408 QALSI 50
    3983 PLLET 49
    3984 PSLM 49
    3452 GLLGL 48
    3985 TLLVG 48
    3766 VKLRN 48
    62 GGLGL 47
    3419 GHLTL 47
    3986 GPLHI 46
    3649 MLLVN 46
    3987 VELNS 46
    3988 AKLIT 45
    3394 GKLTL 45
    3946 SNLTH 45
    3989 AT*RR 44
    3544 DLLLN 44
    3596 GMLGG 44
    3923 RELVR 44
    3990 SPLLS 44
    3991 DKLRR 43
    3570 GALIM 43
    3992 GLLG 43
    3993 GLMM 42
    3994 IHLAD 42
    3995 TLTQ 42
    3996 TRSHSS 42
    3997 ALMQ 41
    1947 ARLRV 41
    3321 GHLRL 41
    3456 HLLEN 41
    3998 HTLNM 41
    3999 PMLVD 41
    3469 SKLHL 41
    4000 GK*KL 40
    3440 AMLMQ 39
    3546 DPLAR 39
    3328 GKLKL 39
    3914 NKLRV 39
    3732 TALRG 39
    3827 AILQV 38
    3435 AKLIV 38
    3311 AKLRL 38
    3612 HIRS 38
    3382 GKLKM 37
    3592 GLLGE 37
    3453 GNLGM 37
    3582 GILAG 36
    4001 GPLAL 36
    3908 MVLQE 36
    3669 PKLGL 36
    4002 ARLGL 35
    4003 EELLK 35
    3647 MLLRN 35
    3685 QPLIK 35
    288 AHLAV 34
    3400 EVLTM 34
    3460 MRLGE 34
    3548 EALMA 33
    4004 PLLGV 33
    3671 PQLTG 33
    3877 EVLAA 32
    4005 HPLQQ 32
    3916 PGLAA 32
    3467 QKLNL 32
    4006 SKLNN 32
    4007 TRLRN 32
    3438 AKLVL 31
    4008 DLLV 31
    462 DSLLA 31
    4009 GELRT 31
    4010 RLLGV 31
    2700 AALTR 30
    3444 GGLQK 30
    2615 NGLMR 30
    4011 NRLQ 30
    4012 PALGN 30
    4013 PLLGM 30
    4014 PPLMQ 30
    4015 TQLEE 30
    4016 VGLEG 30
    3543 DLLGV 29
    3572 GGLDL 29
    3418 GHLRG 29
    4017 KTLRE 29
    4018 PRLR 29
    4019 PSLGV 29
    4020 RR*PS 29
    3735 TGLVD 29
    3429 THLQK 29
    4021 DGLMDHIRSHTGERPF 28
    3459 LSLLK 28
    4022 MVLVP 28
    4023 SELTG 28
    4024 SGLKH 28
    3754 TPLKR 28
    4025 VGLG 28
    60 AHLRK 27
    3506 AKLMN 27
    63 AKLRI 27
    4026 DRLGP 27
    4027 GLLGR 27
    3617 IMLRE 27
    4028 KQLQP 27
    MA*S 27
    NGR 27
    3694 RHLRE 27
    4029 RPLLR 27
    4030 RSLRL 27
    65 SKLRV 27
    3427 SLLAG 27
    3760 TTLGI 27
    3484 AELLG 26
    2473 AGLTR 26
    3538 DGLVG 26
    4031 GALG 26
    4032 GDLSP 26
    3573 GGLDQ 26
    3580 GHLMV 26
    3317 GKLKI 26
    4033 GKLSL 26
    3603 GVLAG 26
    4034 LRLNL 26
    4035 MTLGN 26
    4036 PMLAA 26
    3375 SKLRT 26
    3746 TKLNM 26
    4037 ALIG 25
    4038 AQLAN 25
    4039 DGLAM 25
    3575 GGLPE 25
    4040 GLPV 25
    3631 KSLMP 25
    2601 NGLNR 25
    4041 SHMK 25
    3477 TPLNI 25
    3965 TVLAP 25
    4042 VLLME 25
    3431 AGLLG 24
    4043 GALPR 24
    4044 GKLIL 24
    3882 GKLML 24
    3604 GVLLV 24
    4045 KQLTD 24
    4046 LKLIG 24
    3636 LRLIG 24
    4047 LRLMS 24
    3663 NRLGG 24
    4048 PNYWP 24
    4049 RHLVP 24
    4050 SRLGA 24
    3855 AKLRN 23
    4051 DRLAS 23
    3547 DSLGE 23
    3563 ERLAA 23
    106 GKLRI 23
    4052 GSLS 23
    664 HRLGG 23
    4053 MDLLL 23
    4054 MTLGA 23
    4055 PPLER 23
    4056 PVLPG 23
    3674 QELGR 23
    3818 RTLS 23
    4057 SLLQG 23
    2157 TGLRV 23
    3476 TKLMV 23
    3773 VPLER 23
    4058 APLGM 22
    1386 EHLRR 22
    2607 GALVR 22
    2659 GGLNR 22
    3446 GHLLR 22
    4059 GILAK 22
    4060 GMLPD 22
    3597 GPLGV 22
    4061 GSLPM 22
    3602 GSLVK 22
    3166 GVLAR 22
    3634 LGLVG 22
    3637 LTLQR 22
    4062 NGRSPVET 22
    3666 PGLRA 22
    4063 PMLRV 22
    4064 TLML 22
    90 AHLRV 21
    3515 ALLMA 21
    4065 ASLGQ 21
    3870 DGLTG 21
    3267 EGLRG 21
    223 EHLAV 21
    4066 ELILE 21
    4067 GH*RS 21
    4068 GHLAM 21
    3589 GKLRM 21
    4069 GLLP 21
    4070 GTLAI 21
    4071 IRLKK 21
    4072 KELRR 21
    3627 KILPE 21
    4073 LHLPI 21
    3423 MGLVG 21
    3905 MPLLA 21
    4074 NELRG 21
    3462 NVLNK 21
    4075 PHLNG 21
    3464 PLLMP 21
    4076 RLLGS 21
    4077 RTLIS 21
    4078 SC*AS 21
    3708 SDLHV 21
    92 SKLRL 21
    4079 VKLMN 21
    4080 VTLIG 21
    4081 AGLQE 20
    4082 ALHT 20
    4083 DPLVD 20
    E 20
    4084 EALDA 20
    4085 GALAT 20
    4052 GSLS 20
    4086 GTLLM 20
    4087 IKLRP 20
    LQ 20
    NGP 20
    3684 QNLHK 20
    4088 RRLLD 20
    3726 SLLGT 20
    3948 TELIG 20
    4089 TGLMG 20
    4090 TKLLL 20
    4091 TTLGA 20
    4092 VE*DP 20
    3968 VGLLR 20
    4093 AGLGI 19
    4094 AGLLQ 19
    3526 ARLAG 19
    4095 AVLSH 19
    3535 DELRV 19
    4096 DRLAG 19
    4097 ERLSN 19
    4098 ETLM 19
    4099 GELRG 19
    3590 GKLVA 19
    4100 GRLNR 19
    4101 GRLRL 19
    4102 IMLAG 19
    4103 IVLDP 19
    4104 KVLAP 19
    4105 LMLGM 19
    3641 MALTR 19
    4106 MPLRE 19
    4107 RLLGP 19
    3819 SKLRA 19
    4108 SMYRS 19
    4109 THLAK 19
    3762 VELDP 19
    4110 VGLTR 19
    3775 VPLSS 19
    4111 VQLPT 19
    2538 AALRR 18
    4112 AGLD 18
    3517 AMLIM 18
    3519 AMLRG 18
    4113 DVLPG 18
    3562 EQLGP 18
    3393 GHLKR 18
    3880 GILRM 18
    4114 GLLV 18
    4115 GLMN 18
    4116 GMLVG 18
    4117 GPLTI 18
    4118 GRLE 18
    4119 GSLQS 18
    4120 GVLVS 18
    4121 HKLLK 18
    3614 IELVQ 18
    3619 IPLGD 18
    3632 LALGG 18
    3648 MLLSH 18
    4122 MRLKV 18
    4123 MRLRS 18
    4124 MSLSP 18
    4125 PALGG 18
    3665 PGLHG 18
    3673 PTLQR 18
    4126 QPLAG 18
    4127 SK*VV 18
    3842 SKLRG 18
    4128 TLIN 18
    4129 TLLTP 18
    4130 DALME 17
    4131 EALNK 17
    4132 EGLPT 17
    4133 ELLKS 17
    4134 GELTD 17
    3884 GKLRQ 17
    3161 GMLRR 17
    4135 GPLVS 17
    4136 GQLMM 17
    4137 GQLVG 17
    4138 KGLEG 17
    4139 QGLDN 17
    4140 RALVS 17
    4141 RGLAT 17
    3426 SKLMV 17
    3800 SKLVV 17
    3729 SRLMA 17
    4142 TLHE 17
    2168 TRLRG 17
    3864 ARLRI 16
    201 EHLRV 16
    4143 GHLKS 16
    4144 GLLKH 16
    3890 GRLNA 16
    4145 GVLSI 16
    4146 GVLST 16
    3607 HALRT 16
    3900 KKLVR 16
    3638 LVLRR 16
    4147 MPLVP 16
    3661 NLLPT 16
    4148 PKLQP 16
    4149 PVLMG 16
    4150 QALIG 16
    4151 RGLIT 16
    3691 RGLTA 16
    3705 RRLVN 16
    4152 RVQD 16
    3725 SLLEE 16
    4153 TELPM 16
    TGL 16
    3751 TMLGG 16
    3776 VPLVP 16
    4154 APLDL 15
    4155 ARLGR 15
    4156 DALSA 15
    4157 EGLAG 15
    50 GGLVR 15
    4158 GGLVS 15
    3363 GHLRV 15
    3815 GKLIV 15
    3595 GLLRG 15
    4159 GMLGT 15
    4160 GPLLG 15
    4161 HIRSH 15
    3457 IGLQR 15
    4162 IMLV 15
    3897 IRLAL 15
    304 KALGT 15
    3898 KALRG 15
    4163 LHLQG 15
    4164 MELMT 15
    4165 MPLGG 15
    4166 PGLAD 15
    4167 PTLEV 15
    4168 RQLGM 15
    4169 RVLRG 15
    2525 SGLLR 15
    4170 SVLRV 15
    3733 TGLGL 15
    4171 TVLAG 15
    4172 VGLA 15
    4173 VGLRG 15
    3770 VMLKD 15
    3774 VPLNT 15
    2994 VTLGR 15
    WR 15
    A 14
    4174 AALHH 14
    3490 AHLKA 14
    4175 ALLGV 14
    3525 AQLVD 14
    4176 ARLHA 14
    4177 DGLG 14
    4178 DHLVG 14
    4179 DILRG 14
    4180 DQLVE 14
    4181 DQLVG 14
    4182 EKLMM 14
    4183 ELLTP 14
    3564 ERLGR 14
    4184 GALRS 14
    3445 GGLTM 14
    3583 GKLHE 14
    4185 GKLNI 14
    3406 GKLVL 14
    4186 GRLLE 14
    3628 KKLLE 14
    3458 KTLGV 14
    4187 MALPE 14
    3653 MRLMG 14
    4188 NDALQYES 14
    3662 NRLES 14
    3461 NSLTR 14
    4189 PKLRS 14
    4190 PRLPP 14
    4191 PVLKL 14
    4192 QKLAN 14
    4193 QKLKL 14
    4194 RALPK 14
    3697 RILPR 14
    4195 THLGR 14
    3753 TMLPG 14
    4196 VALGT 14
    4197 VKLHE 14
    4198 VTLG 14
    4199 ARLLG 13
    4200 ARLTG 13
    4201 ASLGA 13
    4202 DLLSG 13
    3545 DNLRE 13
    4203 EALTI 13
    3551 EELMM 13
    4204 ETLS 13
    4205 GALGS 13
    3381 GHLRT 13
    4206 GPLVL 13
    4207 GRLGA 13
    4208 GRSYMA 13
    4209 GVLGS 13
    4210 HPLLV 13
    4211 ITLSP 13
    3642 MGLDP 13
    4212 MLLNG 13
    4213 MRLAE 13
    4214 NMLSR 13
    4215 PGLGG 13
    4216 PGLVP 13
    3670 PLLKS 13
    3468 SHLRV 13
    4217 SRLGV 13
    2469 TGLAR 13
    4218 TLMG 13
    4219 TRLMM 13
    4220 TRLREHIRSHTGERPF 13
    4221 VELGP 13
    4222 VHLAR 13
    4223 VKLVG 13
    3486 AGLAA 12
    4224 APLRV 12
    4225 EALV 12
    4226 EVLPE 12
    4227 GALMN 12
    4228 GLQA 12
    4229 GLTG 12
    4230 GTLGD 12
    4231 HLLGP 12
    4232 LKLKL 12
    4233 MALRK 12
    4234 MVLTG 12
    4235 NGLIE 12
    4236 NKLVV 12
    4237 PALNV 12
    4238 PMLRL 12
    4239 PQLLG 12
    4240 PVLRV 12
    4241 QPLKR 12
    3924 RGLDM 12
    4242 RGLEN 12
    3700 RLLGA 12
    4243 RRLMV 12
    2486 SGLVR 12
    4244 SPLSG 12
    3728 SQLLE 12
    4245 SRLGR 12
    4246 TGLVG 12
    3403 THLRR 12
    3809 TKLRM 12
    4247 TKLVM 12
    4248 TLLG 12
    4249 TMLPR 12
    4250 TNLRL 12
    4251 TPLGE 12
    4252 TPLVG 12
    4253 TRLLT 12
    4254 VGLGR 12
    4255 VKLQ 12
    3768 VLLKS 12
    4256 AGLML 11
    3398 AKLTL 11
    3521 ANLSN 11
    4257 ARLLT 11
    2880 ATLLR 11
    4258 EGLGG 11
    4259 EGLHL 11
    3333 GHLKM 11
    3889 GRLAV 11
    4260 GVLG 11
    4261 LGLEG 11
    4262 LNLQP 11
    4263 LRLRT 11
    4264 MELGD 11
    4265 MLLQR 11
    4266 MLPP 11
    4267 MSLGG 11
    4268 PKLII 11
    4269 PNLQT 11
    4270 PPLLS 11
    4271 PTLGM 11
    4272 QKLMT 11
    3687 QTLAE 11
    3701 RLLMP 11
    4273 RRLVG 11
    4274 SNLIM 11
    3730 STLLM 11
    3738 THLAR 11
    4275 TLTM 11
    4276 TRLGG 11
    3478 TRLQK 11
    4277 VGLLA 11
    4278 VKLRM 11
    4279 VLLGG 11
    4280 VQ*GG 11
    3777 VRLEE 11
    4281 AGLSG 10
    4282 AGLTE 10
    4283 AGLVA 10
    4284 ALSA 10
    4285 ATLMK 10
    2468 DGLAR 10
    206 DHLNV 10
    4286 EALAI 10
    4287 EELVE 10
    4288 EMLIP 10
    4289 EPLAA 10
    4290 ERLQE 10
    3878 GGLKD 10
    3588 GKLRA 10
    3591 GKLVV 10
    4291 GMLRV 10
    4292 GPLME 10
    4293 GVLSP 10
    4294 IKLMG 10
    4295 IPLNR 10
    4296 MLLKG 10
    4297 MRLPR 10
    4298 MSLRE 10
    3918 PNLAG 10
    4299 PPLMV 10
    4300 PTLGV 10
    4301 RGLRN 10
    3692 RGLVV 10
    4302 RSLIV 10
    4303 RTLGE 10
    4304 SSLGV 10
    3947 TALIS 10
    4305 TGLGT 10
    3344 THLRL 10
    3822 TKLAV 10
    4306 TKLLG 10
    4307 TLIG 10
    4308 TNLLR 10
    4309 TTLGG 10
    4310 VILGA 10
    3972 VNLLE 10
    3481 AALES 9
    4311 AALGL 9
    4312 AELMR 9
    4313 AGLDG 9
    1988 AGLVR 9
    3534 DELMR 9
    4314 DSLVI 9
    4315 EKLKA 9
    3798 EKLRV 9
    4316 GKLIA 9
    4317 GNLVT 9
    4318 GRLLI 9
    4319 GRLRS 9
    3239 GSLIR 9
    2554 GTLKR 9
    4320 HELMK 9
    4321 KMLGG 9
    4322 LGLIQ 9
    4323 LKLER 9
    4324 LPLNG 9
    4325 MGLGV 9
    3658 MVLAG 9
    3909 MVLVG 9
    2540 NGLAR 9
    3668 PILLQ 9
    4326 PMLTV 9
    4327 PPLII 9
    4328 QRLVE 9
    3698 RKLIV 9
    4329 RKLKE 9
    4330 RRLHE 9
    4331 RVLGA 9
    2532 SALAR 9
    4332 SC*RP 9
    4333 SGLDA 9
    4334 SQLDR 9
    2507 TGLLR 9
    3952 TKLKT 9
    4335 TSLTE 9
    2342 AGLKM 8
    4336 AGLRS 8
    4337 AHLGQ 8
    3493 AHLR 8
    4338 ALME 8
    2875 ASLRR 8
    1995 DALDR 8
    4339 DGLHG 8
    4340 DGLLQ 8
    3550 EELGL 8
    4341 EKLRS 8
    3876 EQLMT 8
    4342 ERLAR 8
    3569 GALGR 8
    4343 GELKA 8
    2295 GGLVV 8
    3341 GHLRM 8
    4344 GLML 8
    4345 GLQN 8
    4346 GLTA 8
    4347 GMLGE 8
    4348 GPLRR 8
    4349 GVLDT 8
    4350 GVLNT 8
    4351 IQLAD 8
    4352 KGLTM 8
    4353 MELGN 8
    4354 MPLMR 8
    3657 MTLSD 8
    4355 NGLAM 8
    4356 NGLQD 8
    4357 NTLDV 8
    4358 PHLSM 8
    4359 PILLG 8
    4360 PVLQG 8
    4361 QGLGG 8
    4362 QKLQI 8
    4363 QPLIA 8
    3926 RGLVA 8
    3727 SLLNG 8
    4364 SRLTD 8
    4365 TLLGD 8
    4366 TRSHSSV 8
    3024 TSLTR 8
    4367 TTLGD 8
    4368 VKLAP 8
    3973 VQLPV 8
    3367 AALRK 7
    159 AHLKK 7
    4369 AKLHP 7
    4370 AVLEN 7
    3571 GDLSG 7
    4371 GELGV 7
    187 GKLVT 7
    3593 GLLLD 7
    3594 GLLMG 7
    4372 GLMA 7
    4373 GLNR 7
    4374 GLVV 7
    4375 GPLPV 7
    4376 GSLTQ 7
    4377 GVLRG 7
    4378 HPLAV 7
    4379 HTLGM 7
    4380 IQLGG 7
    4381 KLLGD 7
    3630 KNLIK 7
    4382 MALAR 7
    4383 MELEP 7
    4384 MGLAN 7
    3643 MGLGE 7
    4385 MPLDG 7
    4386 NVLGR 7
    4387 PGLPE 7
    4388 PHLQN 7
    4389 PRLGS 7
    4390 PSLLV 7
    4391 PTLAR 7
    4392 QMLER 7
    4393 RDLGS 7
    4394 RGLGN 7
    4395 RLLEK 7
    3703 RMLVP 7
    4396 SVLSG 7
    4397 TGLVN 7
    4398 TLA*SH 7
    4399 TRLHT 7
    3967 VALTK 7
    3771 VMLMG 7
    4400 VVLAG 7
    4401 AGLVG 6
    3315 AKLKL 6
    4402 AR*PS 6
    1945 ARLKV 6
    2005 DGLLR 6
    4403 DKLHR 6
    2203 DKLKV 6
    4404 ERLPV 6
    4405 GDLVE 6
    4406 GELGE 6
    4407 GGLMQ 6
    4408 GLLT 6
    4409 GLPG 6
    4410 GSLRT 6
    4411 GTLQV 6
    4412 GVLKS 6
    4413 HGLVN 6
    4414 IELGR 6
    4415 KPLEL 6
    4416 MKLE 6
    3664 NTLPK 6
    4417 PALMR 6
    303 PHLVV 6
    4418 PPLVV 6
    4419 QALVP 6
    4420 QELGG 6
    3370 QHLRR 6
    4421 QTLGV 6
    4422 RILEP 6
    4423 RLLMN 6
    4424 RPLVG 6
    4425 RRLEP 6
    4426 SGLRA 6
    4427 SKLMA 6
    3940 SKLSV 6
    4428 TMLEP 6
    4429 TRSQ 6
    4430 VALRK 6
    4431 VDLSG 6
    4432 VMLLG 6
    4433 VPLSE 6
    2718 AGLDR 5
    4434 ARLPV 5
    4435 ARYGC 5
    1909 ATLKV 5
    2317 DGLRA 5
    4436 ERLLQ 5
    4437 ETLMG 5
    4438 GHLML 5
    4439 GHLQG 5
    4440 GKLMV 5
    4441 GPLG 5
    4442 GPLTM 5
    4443 GQLV 5
    4444 GSLTL 5
    4445 GTLRA 5
    4446 GTLTG 5
    3310 HHLTK 5
    4447 IVLVR 5
    4448 MALVR 5
    4449 MELGK 5
    4450 MGLEG 5
    4451 MGLMA 5
    4452 MPLNR 5
    4453 NMLGG 5
    4454 NPLEL 5
    4455 NSLGG 5
    4456 PRLLQ 5
    4457 PRLVK 5
    2953 RGLVR 5
    4458 RHLRS 5
    4459 RSLVV 5
    4460 RSPV*ERMWILRA 5
    4461 RTLNA 5
    4462 TELN 5
    4463 VKLRA 5
    4464 VLLQD 5
    4465 VMLG 5
    4466 AGLNG 4
    4467 AHLRM 4
    3414 AKLRA 4
    4468 AR*RA 4
    4469 ARLPE 4
    4470 AVLNK 4
    4471 DALQYESECGGLNH 4
    3030 DTLLR 4
    4472 EGLRD 4
    4473 ESLMG 4
    G 4
    4474 GELV 4
    4475 GGLRP 4
    158 GHLKK 4
    3584 GKLKA 4
    4476 GLIG 4
    4477 GLIS 4
    4478 GLLGN 4
    4479 GMLVN 4
    4480 GPLED 4
    4481 GPLQA 4
    4482 GTLTV 4
    4483 GVLGI 4
    4484 IDLGM 4
    4485 IELGG 4
    4486 IGLAT 4
    4487 KKLMP 4
    4488 KLLGE 4
    4489 KLLLG 4
    3629 KMLPP 4
    4490 MGLTL 4
    4491 MNLGM 4
    4492 MPLMV 4
    3650 MPLRA 4
    3651 MQLGG 4
    2085 MRLRM 4
    4493 PALTV 4
    4494 PGLAL 4
    4495 PGLMG 4
    4496 PHLMS 4
    4497 PQLSA 4
    4498 PRLKA 4
    4499 QKLIR 4
    4500 RELGV 4
    4501 RGLHQ 4
    4502 RGLIG 4
    4503 RGLMG 4
    4504 RTRSH 4
    4505 SQLDT 4
    4506 TELGG 4
    163 THLKK 4
    3309 THLRA 4
    4507 TKLGV 4
    4508 TMLEG 4
    4509 VSLGV 4
    4510 VSLTA 4
    4511 VSLVG 4
    1986 AGLKR 3
    4512 AGLQN 3
    4513 AGLRV 3
    3516 ALLRR 3
    4514 ARLRT 3
    4515 ASLQK 3
    4516 ASLR 3
    2772 ATLSR 3
    4517 DILGE 3
    4518 EELRM 3
    4519 EGLTG 3
    4520 EMLKE 3
    4521 ESLLG 3
    3565 ESLMA 3
    4522 ETLAG 3
    4523 EVLVQ 3
    2521 GALKR 3
    2745 GGLGR 3
    162 GHLRK 3
    4524 GKLRS 3
    4525 GLKT 3
    4526 GLLGV 3
    4527 GMLLP 3
    4528 GMLSG 3
    3887 GPLMG 3
    4529 GRLAP 3
    4530 GSLLR 3
    4531 GTLTM 3
    GVI 3
    4532 ILLQQ 3
    4533 KLLQM 3
    4534 LGLPG 3
    4535 MELVL 3
    4536 MGLAG 3
    4537 MGLPV 3
    3644 MGLQN 3
    4538 MQLAD 3
    4539 MSLLR 3
    4540 MSLPE 3
    4541 NGLKQ 3
    2504 NGLQR 3
    4542 NGRSPV*E 3
    4543 NPLSR 3
    4544 NQLVA 3
    4545 NTLGL 3
    4546 PRLRV 3
    4547 PVLLM 3
    4548 PVLTG 3
    3314 QHLRK 3
    4549 QQLL 3
    4550 RGLVN 3
    4551 RHLVV 3
    4552 RLLAE 3
    4553 RLLPG 3
    4554 RPLIT 3
    4555 RVLMN 3
    4556 RVLQR 3
    2580 SGLER 3
    161 TKLKL 3
    4557 TLLPG 3
    110 TRLRE 3
    3249 TSLER 3
    4558 VGLPA 3
    4559 VPLRP 3
    4560 VRLMP 3
    4561 VSLGE 3
    4562 AALTK 2
    4563 AALVK 2
    4564 AHLTP 2
    4565 AILRT 2
    4566 AKLNS 2
    3853 AKLNV 2
    3509 AKLRG 2
    4567 ALLGA 2
    4568 ARLLR 2
    3528 ARLRA 2
    4569 DVLG 2
    4570 EELQS 2
    3552 EGLVE 2
    4571 ELLGP 2
    4572 ERMC 2
    4573 EVLAG 2
    4574 GALGE 2
    4575 GDLVP 2
    4576 GELRI 2
    4577 GGLEL 2
    4578 GHLSP 2
    4579 GKLEA 2
    4580 GKLKR 2
    2912 GKLRR 2
    4581 GKLVI 2
    4582 GLHQ 2
    4583 GLLR 2
    4584 GLMV 2
    4585 GLTL 2
    117 GNLVR 2
    4586 GPLVG 2
    4587 GQLVD 2
    4588 GRLSV 2
    4589 GVLAV 2
    3609 HGLTG 2
    4590 HVLEL 2
    4591 IELEM 2
    4592 IGLQA 2
    4593 KGLGN 2
    4594 KILPV 2
    4595 KPLPG 2
    4596 KSLRM 2
    4597 KTLGT 2
    4598 LGLAA 2
    4599 LGLGG 2
    4600 LVLQE 2
    4601 MGLAS 2
    4602 MLLEE 2
    771 MLPA 2
    3652 MRLAR 2
    4603 MSLRQ 2
    4604 MTLGT 2
    4605 NGLIV 2
    4606 NHLRM 2
    NLA 2
    4607 PALIM 2
    4608 PGLAG 2
    4609 PLLRA 2
    4610 PPLDG 2
    4611 PPLIM 2
    4612 PPLLG 2
    4613 PQLTE 2
    4614 PVLDG 2
    4615 QGLTT 2
    4616 QRLAV 2
    4617 RELGG 2
    4618 RGLDG 2
    4619 RGLTE 2
    4620 RHLGA 2
    4621 RSLMI 2
    4622 RSLRP 2
    3721 SKLGA 2
    4623 SKLGE 2
    T*LT 2
    2443 TALKV 2
    4624 THLR 2
    1864 TRLKV 2
    4625 TRLPP 2
    4626 VELGD 2
    3763 VELVN 2
    2459 VGLGG 2
    4627 VGLKD 2
    4628 VKLHV 2
    4629 VKLLS 2
    4630 VQLTK 2
    4631 VRLK 2
    4632 VRLPP 2
    4633 AALEN 1
    4634 AALGP 1
    4635 AALGT 1
    4636 AALKI 1
    4637 AALMN 1
    4638 AALMQ 1
    2865 AALMR 1
    4639 AALRV 1
    4640 AALSS 1
    4641 AELGP 1
    4642 AELRA 1
    3485 AELRI 1
    4643 AGIAA 1
    4644 AGILQ 1
    4645 AGLDS 1
    4646 AGLG 1
    4647 AGLGG 1
    4648 AGLGN 1
    4649 AGLGP 1
    4650 AGLGQ 1
    4651 AHFRV 1
    4652 AHLRG 1
    4653 AHLRP 1
    4654 AKFRM 1
    4655 AKLE 1
    4656 AKLGE 1
    4657 AKLGL 1
    4658 AKLHA 1
    3504 AKLKG 1
    4659 AKLLG 1
    4660 AKLML 1
    4661 AKLQP 1
    3854 AKLRF 1
    4662 AKLRQ 1
    4663 AKLS 1
    4664 AKLTN 1
    4665 AKLWL 1
    4666 ALDA 1
    4667 ALIM 1
    4668 ALKG 1
    4669 ALLGE 1
    4670 ALLRS 1
    4671 ALTG 1
    4672 ALTR 1
    4673 AMLPD 1
    4674 AMLR 1
    4675 APLAG 1
    4676 APLGP 1
    4677 AQLAD 1
    4678 AQLLL 1
    4679 AR*RG 1
    4680 ARLAA 1
    3527 ARLGT 1
    4681 ARLMS 1
    4682 ARLRS 1
    4683 ARLTE 1
    4684 ARYGR 1
    4685 ASLGP 1
    4686 ASLRP 1
    4687 AT*RS 1
    4688 ATLAK 1
    4689 ATLEV 1
    4690 ATLKI 1
    4691 ATLMG 1
    4692 ATLNM 1
    4693 ATLNV 1
    4694 AVIG 1
    4695 CGLGR 1
    4696 DALQP 1
    1999 DALTV 1
    4697 DELM 1
    4698 DELMN 1
    4699 DELRA 1
    4700 DGLE 1
    4701 DGLEK 1
    3536 DGLES 1
    4702 DGLML 1
    4703 DGLTGHIRSHTGERPF 1
    4704 DGVAM 1
    4705 DHLVD 1
    4706 DILG 1
    4707 DILRT 1
    2348 DKLKG 1
    4708 DKLMM 1
    4709 DLLA 1
    4710 DLLAR 1
    103 DNLRV 1
    4711 DRLAA 1
    4712 DRLGG 1
    4713 DSLPE 1
    4714 DSLV 1
    3874 DVLRG 1
    4715 DYLNV 1
    4716 EALA 1
    4717 EALKV 1
    4718 EALMV 1
    4719 EALTN 1
    4720 EELAP 1
    4721 EELMMHIRSHTGERPF 1
    4722 EELVEHIRSHTGERPF 1
    3377 EHLRL 1
    3349 EHLVR 1
    4723 EKLIV 1
    3353 EKLKV 1
    4724 ELLAR 1
    4725 ELLPS 1
    4726 EMLVA 1
    4727 EQLGT 1
    4728 ERLAV 1
    93 ERLRV 1
    4729 ETLNS 1
    4730 ETSSH 1
    4731 EVLAV 1
    3567 EVLGI 1
    4732 EVLIQ 1
    4733 EVLQE 1
    4734 GALGL 1
    4735 GALGV 1
    4736 GALIS 1
    4737 GALMQ 1
    4738 GALRD 1
    4739 GALRG 1
    4740 GAVMN 1
    4741 GE*GI 1
    4742 GELKV 1
    4743 GELML 1
    4744 GELMR 1
    4745 GELRV 1
    4746 GELTG 1
    4747 GFLAR 1
    4748 GGFRD 1
    4749 GGLA 1
    4750 GGLAE 1
    368 GGLGA 1
    4751 GGLGE 1
    4752 GGLGP 1
    4753 GGLHP 1
    1957 GGLKV 1
    4754 GGLMD 1
    4755 GGLMT 1
    4756 GGLNI 1
    2357 GGLRG 1
    4757 GGLRL 1
    4758 GGLSG 1
    4759 GGLVG 1
    4760 GGVGL 1
    4761 GHLAI 1
    4762 GHLQC 1
    3159 GHLQR 1
    3330 GHLRR 1
    4763 GHLSV 1
    3448 GHLVG 1
    3316 GHLVK 1
    4764 GILAR 1
    4765 GILSG 1
    4766 GKLAI 1
    4767 GKLGG 1
    4768 GKLIG 1
    4769 GKLII 1
    4770 GKLIT 1
    4771 GKLKMHIRSHTGERPF 1
    4772 GKLLK 1
    4773 GKLNA 1
    4774 GKLPT 1
    4775 GKLQA 1
    3587 GKLR 1
    3588 GKLRA 1
    4776 GKLRE 1
    4777 GKLT 1
    4778 GKLTM 1
    4779 GLAA 1
    4780 GLIV 1
    4781 GLLEK 1
    4782 GLLGG 1
    4783 GLLMV 1
    3364 GLLPG 1
    4784 GLLQD 1
    4785 GLLTG 1
    4786 GLSG 1
    4787 GLSGR 1
    4788 GLSV 1
    4789 GLVN 1
    4790 GLVQ 1
    4791 GMLAG 1
    4792 GNLSN 1
    727 GPLA 1
    4793 GPLKP 1
    4794 GPLRP 1
    4795 GPLVP 1
    4796 GQLGP 1
    4797 GQLLE 1
    4798 GR*ML 1
    4799 GRLGG 1
    4800 GRLLG 1
    4801 GRLMP 1
    4802 GRLVS 1
    4803 GRYGC 1
    3279 GSLRV 1
    4804 GSLSK 1
    4805 GSLSP 1
    4806 GTLKL 1
    4807 GTLLL 1
    2685 GTLLV 1
    4808 GTLMT 1
    2192 GTLRV 1
    4809 GTLTE 1
    4810 GVIN 1
    GVL 1
    4811 GVLDN 1
    4812 GVLE 1
    4813 GVLKD 1
    3454 GVLQK 1
    4814 GVLRL 1
    4815 GVLSG 1
    2220 GVLTG 1
    4816 GVMN 1
    4817 GVPV 1
    4818 HELMR 1
    4819 HLLVP 1
    4820 HPLDR 1
    4821 HPLLS 1
    4822 HPVKE 1
    4823 HTLKM 1
    4824 HTLLK 1
    4825 HTLNI 1
    3178 HTLNK 1
    4826 HTLRP 1
    4827 IALPG 1
    4828 IELAL 1
    4829 IELG 1
    4830 IELHL 1
    4831 IGIQR 1
    4832 IGLGA 1
    4833 IGLRL 1
    4834 IHLAG 1
    4835 IHLRM 1
    4836 IKLTG 1
    4837 IMLPR 1
    4838 IQLMG 1
    4839 IQLRL 1
    4840 IRLAA 1
    4841 IRLGP 1
    3338 IRLGV 1
    4842 IRLRR 1
    4843 ISLVG 1
    4844 ITLMV 1
    4845 ITLRG 1
    4846 ITLRP 1
    4847 ITLVG 1
    4848 IVLPG 1
    KG 1
    4849 KGLAT 1
    4850 KGLDL 1
    4851 KGLMR 1
    4852 KGRSPVET 1
    4853 KIIV 1
    4854 KILLA 1
    4855 KKLAG 1
    4856 KKLGV 1
    4857 KKLRI 1
    4858 KLLAG 1
    4859 KLLRV 1
    4860 KPLAA 1
    4861 KPLMV 1
    4862 KRLEG 1
    4863 KSLVG 1
    4864 KTLEG 1
    4865 KTLRG 1
    2404 KTLRV 1
    4866 KTLVG 1
    4867 KVLPV 1
    4868 LAHGT 1
    4869 LGLGP 1
    4870 LGLGV 1
    4871 LKVKL 1
    4872 LNLHT 1
    4873 LRLIM 1
    4874 LRVIG 1
    4875 LSLSG 1
    4876 LTLQQ 1
    4877 LVLRG 1
    4878 MALRG 1
    4879 MELIG 1
    4880 MGLRV 1
    4881 MLAA 1
    4882 MLLIS 1
    4883 MLLLP 1
    4884 MLLMV 1
    4885 MLLPP 1
    4886 MLLPV 1
    4887 MLLV 1
    4888 MLLVG 1
    4889 MLVG 1
    4890 MMLDP 1
    4891 MPLGA 1
    4892 MPLGL 1
    4893 MPLLG 1
    4894 MRLEE 1
    4895 MRLGA 1
    4896 MRLGG 1
    4897 MRLGR 1
    3654 MRLVG 1
    4898 MSLHG 1
    4899 MSLQQ 1
    4900 MTLER 1
    MVL 1
    4901 MVLMN 1
    4902 MVLNT 1
    4903 MVLRG 1
    4904 MVLVT 1
    4905 MVVAS 1
    4906 NDALQYD 1
    4907 NDALQYESECGP 1
    4908 NELLR 1
    4909 NELMR 1
    4910 NELRV 1
    4911 NGLG 1
    4912 NGLIVHIRSHTGERPF 1
    NGR 1
    4913 NGRPPG*E 1
    4914 NGRSPVR 1
    4915 NILMG 1
    4916 NKLAR 1
    4917 NKLRA 1
    4918 NKLRG 1
    4919 NKLVA 1
    4920 NKLVK 1
    4921 NMLGV 1
    4922 NNLIN 1
    1838 NRLRE 1
    4923 NRLRI 1
    4924 NSLV 1
    4925 NSLVA 1
    4926 NVHP*VVGLAA 1
    4927 NVLGE 1
    4928 PALAG 1
    4929 PALGP 1
    4930 PALV 1
    4931 PASV 1
    4932 PDLRA 1
    4933 PGITE 1
    4934 PGLAP 1
    4935 PGLHE 1
    4936 PGVAA 1
    4937 PGVVP 1
    4938 PHLKR 1
    4939 PKLIF 1
    4940 PLRG 1
    4941 PMLAG 1
    4942 PMLTM 1
    4943 PNLAS 1
    3786 PNLAV 1
    3919 PNYW 1
    4944 PNYWS 1
    4945 PQLVV 1
    4946 PQSRG*RG 1
    4947 PR*GA 1
    4948 PRLRL 1
    4949 PSFQ 1
    4950 PTLAK 1
    4951 PVLKV 1
    4952 PVLMT 1
    2602 QALKR 1
    4953 QALRG 1
    4954 QALSP 1
    4955 QGLHL 1
    3675 QGLPV 1
    4956 QILLQ 1
    4957 QILLRHIRSHTGERPF 1
    4958 QILLY 1
    4959 QILPE 1
    4960 QMLAR 1
    4961 QPLAV 1
    4962 QPLTM 1
    4963 QRLGG 1
    4964 QTLAV 1
    4965 QTLGG 1
    4966 QTLGP 1
    4967 REIVR 1
    4968 RELRR 1
    4969 RGLAA 1
    4970 RGLDN 1
    4971 RGLNS 1
    4972 RGLRS 1
    4973 RGLTG 1
    4974 RGLVE 1
    4975 RGYGT 1
    RHE 1
    4976 RHLKM 1
    4977 RLLGL 1
    4978 RP*SG 1
    4979 RPLAG 1
    4980 RQLGK 1
    4981 RQLLE 1
    4982 RRLEA 1
    4983 RRLET 1
    2126 RRLGD 1
    4984 RRLGS 1
    4985 RRLSE 1
    4986 RRLTP 1
    4987 RRVVG 1
    RSH 1
    4988 RTLKL 1
    4989 RTLVG 1
    4990 RVLEP 1
    4991 RVLRE 1
    SC**A 1
    4992 SCLK 1
    4993 SGILV 1
    4994 SGLGG 1
    4995 SGLGL 1
    4996 SGLGT 1
    4997 SGLLG 1
    4998 SGLNL 1
    4999 SGLRL 1
    5000 SGLVG 1
    3331 SHLRL 1
    3425 SKLIL 1
    2438 SKLKA 1
    3722 SKLKG 1
    5001 SKLLG 1
    3334 SKLRI 1
    2191 SKLRM 1
    3337 SKLVL 1
    5002 SL*HG 1
    5003 SLLRT 1
    5004 SNLTY 1
    5005 SNYWP 1
    5006 SPLIG 1
    5007 SPLKI 1
    5008 SPLRN 1
    2138 SQLKV 1
    5009 SQMK 1
    SR*G 1
    1857 SRLKV 1
    5010 SRLMT 1
    5011 SRLVT 1
    5012 SSLGA 1
    5013 SSLGL 1
    5014 STLQK 1
    5015 SVLVG 1
    5016 SVLVS 1
    T 1
    5017 TALEA 1
    5018 TALKG 1
    5019 TELE 1
    5020 TELIR 1
    5021 TELPR 1
    5022 TELRV 1
    5023 TGLAD 1
    5024 TGLGA 1
    5025 THLAN 1
    5026 THLAV 1
    3318 THLRK 1
    3808 TKIRV 1
    3785 TKLKA 1
    5027 TKLLR 1
    5028 TKLME 1
    3802 TKLNV 1
    3955 TKLR 1
    3783 TKLRA 1
    3361 TKLRI 1
    5029 TKLRR 1
    5030 TKLVL 1
    5031 TKSGV 1
    5032 TLIS 1
    5033 TLLIR 1
    5034 TLLM 1
    5035 TLLMQ 1
    5036 TLNG 1
    5037 TLQP 1
    5038 TMLDP 1
    5039 TMLRE 1
    5040 TNLVG 1
    5041 TPLIV 1
    5042 TPLMQ 1
    5043 TPLSD 1
    5044 TPLSI 1
    5045 TQLED 1
    5046 TRLGA 1
    5047 TRLMI 1
    5048 TRLRL 1
    1883 TRLRV 1
    5049 TRLTG 1
    5050 TSLSE 1
    5051 TTLEP 1
    5052 TTLGV 1
    1849 TTLKV 1
    1919 TTLRV 1
    5053 TVLGG 1
    5054 TVLT 1
    V*KS 1
    5055 VALHT 1
    5056 VDLLL 1
    5057 VELAP 1
    5058 VELN 1
    5059 VELNN 1
    5060 VELRV 1
    5061 VGLPV 1
    5062 VGLQA 1
    2652 VGLQR 1
    5063 VGLRN 1
    5064 VGLRV 1
    5065 VGLSP 1
    5066 VGLSQ 1
    5067 VHLAL 1
    5068 VKLMA 1
    5069 VKLQN 1
    3765 VKLRL 1
    5070 VLLAA 1
    5071 VLLIE 1
    5072 VLLKI 1
    5073 VLLTP 1
    5074 VLMV 1
    5075 VLQR 1
    5076 VMLRG 1
    3772 VPLAL 1
    5077 VPLVG 1
    5078 VQLPM 1
    5079 VQLRV 1
    5080 VRLEG 1
    5081 VRLGG 1
    3778 VRLQA 1
    5082 VRLVR 1
    VTG 1
    5083 VTLER 1
    5084 VTLGS 1
    WRN 1
  • TABLE 22
    ZF4 selection on G:A change at nt 11 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO: Sequence Read #
    118 GNLRR 3407
    69 ANLRR 1937
    117 GNLVR 1794
    116 SNLRR 1771
    5085 SNLKR 1208
    68 TNLRR 862
    119 GNLKR 850
    138 GNLAR 805
    2582 SNLVR 764
    2609 GNLQR 562
    70 GNLTR 531
    121 NNLRR 486
    2914 GNLIR 475
    2494 ANLVR 455
    2706 GNLNR 373
    2517 GNLLR 360
    2620 ANLKR 326
    2524 SNLAR 269
    2963 SNLQR 261
    139 GNLMR 251
    2695 SNLMR 228
    2746 GNLHR 220
    5086 SNLTR 209
    5087 NNLKR 202
    5088 SNLIR 199
    5089 ANLMR 191
    2621 ANLNR 179
    74 TMLRR 158
    5090 SNLNR 155
    5091 ANLTR 136
    5092 ANLQR 125
    2595 TNLKR 118
    73 AMLRR 111
    2567 GNLSR 107
    2542 ANLAR 102
    66 ATLRR 96
    2558 HNLRR 90
    2538 AALRR 81
    2496 SNLLR 77
    5093 ANLER 73
    2556 SMLRR 62
    5094 ANLHR 59
    5095 ANLLR 58
    3032 SMLKR 51
    2544 SNLSR 47
    2541 TNLQR 47
    2521 GALKR 44
    2641 GALRR 44
    3347 AHLRR 42
    2823 HMLRR 40
    2047 HMLKR 36
    5096 RNLQR 35
    71 AMLKR 31
    2722 GMLKR 31
    3161 GMLRR 29
    2131 SALKR 28
    5097 SNLER 26
    5098 KNLQR 25
    5099 RNLRR 24
    2584 GTLRR 21
    2978 TMLKR 21
    2481 GNLER 20
    5100 QNLKR 19
    67 RRLDR 19
    2638 STLRR 19
    2526 TNLNR 17
    2575 QNLRR 16
    2523 SALRR 16
    2714 TNLHR 16
    2551 ANLIR 15
    1985 AALKR 14
    48 ATLKR 14
    2875 ASLRR 13
    2587 NTLRR 13
    2511 TNLVR 13
    3330 GHLRR 12
    2691 NNLMR 12
    2617 TALKR 12
    5101 KNLER 11
    2518 NNLVR 11
    3403 THLRR 11
    5102 SMLQR 10
    2561 TNLMR 10
    2737 TTLRR 10
    2475 AGLRR 9
    2622 ATLTR 9
    3050 HNLKR 9
    5103 KNLVR 9
    2464 SGLRR 9
    2769 VNLRR 9
    5104 AMLTR 8
    2882 AVLRR 8
    3393 GHLKR 8
    5105 TNLTR 8
    3017 ATLNR 7
    2739 ATLVR 7
    5106 HNLMR 7
    2734 TALRR 7
    4308 TNLLR 7
    5107 AMLQR 6
    52 ANLSR 6
    2509 ASLKR 6
    2876 ASLTR 6
    2801 ATLMR 6
    5108 GMLER 6
    5109 RLLIN 6
    5110 SGLLK 6
    2649 TNLAR 6
    5111 AHLVR 5
    3012 ATLHR 5
    2881 ATLQR 5
    2599 ENLRR 5
    3084 HMLQR 5
    72 HMLTR 5
    5112 ISLRV 5
    2543 NNLAR 5
    3205 SNLHR 5
    2153 STLKR 5
    5113 AHLKR 4
    2879 ATLIR 4
    2623 DNLRR 4
    2592 GALTR 4
    5114 GNLRK 4
    5115 KKLLR 4
    5116 MNLRR 4
    5117 MVLLR 4
    5118 NNLQR 4
    5119 QNLVR 4
    5120 RNLAR 4
    3396 SHLRR 4
    2962 SMLHR 4
    2679 TNLER 4
    5121 TVLLV 4
    2738 AALNR 3
    2770 AALVR 3
    1986 AGLKR 3
    2539 ETLRR 3
    3159 GHLQR 3
    3449 GHLVR 3
    5122 GMLNR 3
    5123 GMLTR 3
    5124 GMLVR 3
    2608 GNLGR 3
    5125 GNLRG 3
    5126 GNLVK 3
    2600 GSLRR 3
    2554 GTLKR 3
    56 HTLRR 3
    3010 HVLRR 3
    5127 KNLRR 3
    5128 MNLKR 3
    3407 NGRSPV... 3
    2712 NMLRR 3
    2757 PNLIR 3
    3370 QHLRR 3
    2956 SALNR 3
    5129 STLEV 3
    2967 STLNR 3
    5130 TALRS 3
    1305 THLKR 3
    5131 TNLIR 3
    2700 AALTR 2
    5132 AMLNR 2
    5133 ANLRL 2
    5134 ANLRW 2
    2654 ATLAR 2
    5135 DALLV 2
    2528 GGLIR 2
    4764 GILAR 2
    3160 GILRR 2
    GN*S... 2
    2522 GNLDR 2
    5136 GNLNK 2
    5137 GNLRP 2
    5138 GNLRS 2
    5139 GTLIR 2
    3081 GTLMR 2
    2626 GTLVR 2
    5140 HGLET 2
    5141 HMLNR 2
    2644 HNLVR 2
    5142 KNLMR 2
    2637 NNLLR 2
    2756 NSLRR 2
    5143 PGLLG 2
    5144 RNLVR 2
    5145 SMLNR 2
    2677 SMLTR 2
    2487 SNLDR 2
    2850 STLMR 2
    2970 SVLRR 2
    2462 TGLRR 2
    5146 TMLQR 2
    2766 TSLKR 2
    2860 TTLKR 2
    3075 TVLRR 2
    5147 AALRS 1
    5148 ADLER 1
    3089 ADLVR 1
    2798 AGLMR 1
    1431 AHLTR 1
    2871 AILTR 1
    5149 AMLAR 1
    5150 AMLHR 1
    5151 AMLIR 1
    5152 ANFRR 1
    5153 ANIQR 1
    5154 ANLDR 1
    2771 ANLGR 1
    5155 ANLVG 1
    5156 ANSRR 1
    5157 ANVRR 1
    5158 APLRR 1
    2799 ASLQR 1
    2880 ATLLR 1
    5159 ATLRS 1
    5160 AYFRR 1
    5161 CNLAR 1
    5162 CNLNR 1
    5163 CNLVR 1
    2591 DNLKR 1
    2506 DNLVR 1
    2778 GALNR 1
    3035 GDLAR 1
    2816 GDLRR 1
    2780 GDLTR 1
    2027 GGLKR 1
    2461 GGLRR 1
    2909 GGVRR 1
    5164 GHLNR 1
    5165 GNFRR 1
    5166 GNFVG 1
    5167 GNLAG 1
    5168 GNLAS 1
    5169 GNLHK 1
    5170 GNLLS 1
    5171 GNLMS 1
    5172 GNLNH 1
    5173 GNLQS 1
    5174 GNLRH 1
    5175 GNLS... 1
    5176 GNLTK 1
    5177 GNLTQ 1
    5178 GNLTW 1
    5179 GNLVW 1
    5180 GNLWR 1
    5181 GNSKR 1
    5182 GNSQR 1
    5183 GNSRR 1
    5184 GNVQR 1
    5185 GNVTR 1
    5186 GQLAL 1
    2819 GSLKR 1
    2747 GTLNR 1
    5187 GY*LR 1
    2661 HNLAR 1
    2752 HNLQR 1
    5188 ITLQR 1
    5189 KILGN 1
    5190 KNLKR 1
    1356 KNLTR 1
    5191 KSLRR 1
    5192 LNLRR 1
    5193 LNLVR 1
    2664 NMLKR 1
    2690 NNLIR 1
    5194 NNLNR 1
    2726 NNLTR 1
    5195 NNSRR 1
    2788 NTLAR 1
    2939 NTLIR 1
    2628 NTLKR 1
    2940 NTLNR 1
    5196 PRLRG 1
    5197 QHLKR 1
    2574 QMLKR 1
    2593 QTLRR 1
    5198 RLIIN 1
    5199 RNLKR 1
    3292 SALQR 1
    2559 SGLKR 1
    5200 SHLKR 1
    3202 SILNR 1
    5201 SKLTR 1
    2647 SMLIR 1
    5202 SMLVR 1
    5203 SNLFR 1
    5204 SNLIH 1
    5205 SNLRK 1
    5206 SNLRQ 1
    5207 SNLSG 1
    5208 SNLTS 1
    5209 SNLVW 1
    5210 SNSRR 1
    5211 SNVKR 1
    5212 SNVRG 1
    2698 STLVR 1
    5213 TMFRR 1
    3109 TMLNR 1
    2680 TNLGR 1
    5214 TNLLS 1
    5215 TPTRS 1
    5216 TQLVL 1
    2589 TSLRR 1
    5217 VNLTR 1
    2997 VTLRR 1
  • TABLE 23
    ZF4 selection on G:C change at nt 11 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO: Sequence Read #
    73 AMLRR 3064
    74 TMLRR 2212
    2556 SMLRR 1556
    3161 GMLRR 1320
    2722 GMLKR 1160
    3032 SMLKR 1049
    71 AMLKR 797
    2978 TMLKR 515
    2823 HMLRR 478
    2047 HMLKR 429
    66 ATLRR 261
    5102 SMLQR 248
    5107 AMLQR 212
    5132 AMLNR 125
    5104 AMLTR 124
    5146 TMLQR 123
    2712 NMLRR 119
    2664 NMLKR 102
    2677 SMLTR 98
    72 HMLTR 93
    5123 GMLTR 88
    5150 AMLHR 72
    5122 GMLNR 68
    2962 SMLHR 63
    5145 SMLNR 59
    48 ATLKR 58
    5124 GMLVR 50
    5141 HMLNR 47
    3084 HMLQR 47
    5149 AMLAR 46
    5218 AMLVR 45
    3109 TMLNR 38
    5219 GMLHR 34
    5202 SMLVR 34
    2533 SMLAR 29
    2638 STLRR 27
    2970 SVLRR 27
    67 RRLDR 26
    118 GNLRR 24
    2737 TTLRR 24
    2882 AVLRR 23
    5151 AMLIR 22
    2913 GMLAR 22
    5220 GMLQR 22
    2584 GTLRR 19
    2875 ASLRR 18
    5221 HMLAR 17
    2587 NTLRR 17
    69 ANLRR 16
    2713 QMLRR 16
    3017 ATLNR 15
    2574 QMLKR 15
    5222 RRLKN 15
    5223 AMLMR 14
    2801 ATLMR 14
    5224 GMLIR 14
    5225 EMLRR 13
    117 GNLVR 13
    5226 RTLAL 13
    5227 SMLSR 13
    116 SNLRR 13
    2647 SMLIR 12
    1986 AGLKR 11
    TRS 11
    2739 ATLVR 10
    TRS... 10
    2538 AALRR 9
    3012 ATLHR 9
    2582 SNLVR 9
    5228 TMLTR 9
    68 TNLRR 9
    5229 TMLVR 8
    3075 TVLRR 8
    2027 GGLKR 7
    2914 GNLIR 7
    2609 GNLQR 7
    3407 NGRSPV... 7
    2559 SGLKR 7
    5230 TMLMR 7
    2860 TTLKR 7
    2881 ATLQR 6
    2622 ATLTR 6
    5231 GMLMR 6
    70 GNLTR 6
    2554 GTLKR 6
    5085 SNLKR 6
    2965 SSLKR 6
    5232 AMLER 5
    5233 AMVRR 5
    2494 ANLVR 5
    119 GNLKR 5
    5086 SNLTR 5
    5234 TMLAR 5
    3987 VELNS 5
    2654 ATLAR 4
    2879 ATLIR 4
    2606 EMLKR 4
    138 GNLAR 4
    139 GNLMR 4
    5087 NNLKR 4
    5235 SMLMR 4
    2153 STLKR 4
    2462 TGLRR 4
    5093 ANLER 3
    2620 ANLKR 3
    2621 ANLNR 3
    5092 ANLQR 3
    2509 ASLKR 3
    2520 DMLRR 3
    2641 GALRR 3
    2706 GNLNR 3
    5236 HLLRR 3
    5237 HMLHR 3
    3010 HVLRR 3
    5238 KTLRR 3
    LL... 3
    121 NNLRR 3
    2477 SGLTR 3
    5239 SMLKN 3
    3203 SMLLR 3
    2963 SNLQR 3
    2967 STLNR 3
    1985 AALKR 2
    2738 AALNR 2
    3516 ALLRR 2
    5240 AMLLR 2
    5241 AMLRH 2
    5242 AMLRS 2
    5243 AMLRW 2
    5244 AMLSR 2
    5094 ANLHR 2
    2802 AVLKR 2
    5108 GMLER 2
    5245 GMLKN 2
    5246 GMLRW 2
    5247 GMVRR 2
    2600 GSLRR 2
    2921 GVLRR 2
    3039 HILKR 2
    5248 HILRR 2
    5249 HMLRS 2
    3040 HMLVR 2
    2558 HNLRR 2
    56 HTLRR 2
    5250 MGLST 2
    5251 NMLIR 2
    2628 NTLKR 2
    2593 QTLRR 2
    5252 RMLKR 2
    5253 RMLQR 2
    RN*P... 2
    5254 SMFKR 2
    2524 SNLAR 2
    2850 STLMR 2
    5255 TLLRR 2
    5256 TMIRR 2
    5257 TMVRR 2
    5258 VIKR... 2
    5259 AKLQR 1
    3062 ALLKR 1
    5260 AMFRR 1
    5261 AMIRR 1
    5262 AMITR 1
    5263 AMKTR 1
    5264 AMLCR 1
    5265 AMLHS 1
    5266 AMLPR 1
    4674 AMLR... 1
    3519 AMLRG 1
    5267 AMLRK 1
    5268 AMLTM 1
    5269 AMLWR 1
    5270 AMYT... 1
    2542 ANLAR 1
    5271 ARLRR 1
    4682 ARLRS 1
    1947 ARLRV 1
    3251 ASLNR 1
    2878 ATLER 1
    3025 ATLGR 1
    5159 ATLRS 1
    2772 ATLSR 1
    5272 CMLRR 1
    2640 DMLKR 1
    3078 DMLQR 1
    5273 DMVKR 1
    5274 EMLNS 1
    2539 ETLRR 1
    5275 GLLKR 1
    5276 GLLQS 1
    5277 GLLSR 1
    5278 GMIKR 1
    5279 GMLKT 1
    5280 GMLRM 1
    5281 GMLTW 1
    2746 GNLHR 1
    2517 GNLLR 1
    5282 GRLKR 1
    5283 GRLKS 1
    5284 GRLRV 1
    2747 GTLNR 1
    2626 GTLVR 1
    3001 GVLKR 1
    2483 HALRR 1
    2531 HLLKR 1
    5285 HLLNS... 1
    5286 HMLLR 1
    5287 HMLMR 1
    5288 HMVRR 1
    5106 HNLMR 1
    2784 HVLKR 1
    5189 KILGN 1
    5289 KMLKR 1
    5290 LMLGK 1
    5291 MLRR 1
    5292 NLLKR 1
    5293 NMLGR 1
    5294 NTFRR 1
    2939 NTLIR 1
    2940 NTLNR 1
    5295 PMLMR 1
    5296 PVVKR 1
    2692 QSLKR 1
    5297 RMFRR 1
    5298 RMLRR 1
    2956 SALNR 1
    2523 SALRR 1
    2464 SGLRR 1
    3004 SILKR 1
    3470 SKLKR 1
    5201 SKLTR 1
    5299 SLLNR 1
    5300 SMFRR 1
    5301 SMIKR 1
    5302 SMLGR 1
    5303 SMLKW 1
    5304 SMSRR 1
    5305 SMVKR 1
    2496 SNLLR 1
    5090 SNLNR 1
    2792 SQLKR 1
    1876 SRLKR 1
    5306 SRLRR 1
    2845 SSLAR 1
    2698 STLVR 1
    2699 SVLKR 1
    5307 TILRR 1
    5308 TMLER 1
    5309 TMLGR 1
    5310 TMLHR 1
    5311 TMLLR 1
    5312 TMLRH 1
    5313 TMLWR 1
    2595 TNLKR 1
    2856 TNLSR 1
    5215 TPTRS 1
    5314 VMLKR 1
    5315 VSLRK 1
    2997 VTLRR 1
    5316 WMLKR 1
    5317 WMLRR 1
    5318 YMLKR 1
    5319 YMLRR 1
  • TABLE 24
    ZF4 selection on G:T change at nt 11 of core motif in CBS. Sequences reflect position 2 to 6.
    SEQ ID NO: Sequence Read #
    66 ATLRR 6399
    67 RRLDR 1155
    2584 GTLRR 1073
    2737 TTLRR 1024
    2638 STLRR 970
    3017 ATLNR 770
    2739 ATLVR 727
    48 ATLKR 708
    2587 NTLRR 670
    2538 AALRR 657
    2801 ATLMR 456
    2654 ATLAR 418
    2554 GTLKR 399
    2875 ASLRR 366
    2622 ATLTR 363
    2593 QTLRR 298
    2539 ETLRR 292
    2881 ATLQR 291
    2879 ATLIR 261
    2153 STLKR 252
    2628 NTLKR 237
    56 HTLRR 227
    2882 AVLRR 208
    2880 ATLLR 171
    1985 AALKR 141
    2878 ATLER 134
    3012 ATLHR 130
    2860 TTLKR 125
    2509 ASLKR 95
    73 AMLRR 93
    3010 HVLRR 81
    2523 SALRR 63
    5248 HILRR 60
    74 TMLRR 59
    2967 STLNR 58
    2131 SALKR 47
    2738 AALNR 46
    2483 HALRR 44
    2641 GALRR 41
    2843 QTLKR 41
    2783 HTLKR 39
    3032 SMLKR 39
    1930 HALKR 36
    2970 SVLRR 36
    2802 AVLKR 35
    2556 SMLRR 34
    3161 GMLRR 33
    2722 GMLKR 31
    2850 STLMR 31
    2698 STLVR 31
    2626 GTLVR 28
    2521 GALKR 27
    2747 GTLNR 27
    2590 TTLQR 27
    2921 GVLRR 25
    118 GNLRR 24
    116 SNLRR 24
    2589 TSLRR 24
    69 ANLRR 23
    2997 VTLRR 23
    2700 AALTR 22
    71 AMLKR 22
    2697 STLQR 22
    5320 ATLRK 21
    117 GNLVR 21
    2823 HMLRR 20
    2772 ATLSR 17
    5321 RTLQR 17
    2734 TALRR 17
    2819 GSLKR 16
    3018 STLIR 16
    2717 AALQR 15
    2800 ASLVR 15
    2849 STLHR 15
    2489 SSLRR 14
    2978 TMLKR 14
    3075 TVLRR 14
    2876 ASLTR 13
    3081 GTLMR 13
    2047 HMLKR 13
    2966 STLLR 13
    2762 STLTR 13
    2681 TTLNR 13
    70 GNLTR 12
    5189 KILGN 12
    68 TNLRR 11
    3864 ARLRI 10
    2502 ETLKR 10
    2600 GSLRR 10
    2684 GTLAR 10
    5322 KTLER 10
    5323 QTLMR 10
    3028 SILRR 10
    5085 SNLKR 10
    2617 TALKR 10
    2799 ASLQR 9
    3001 GVLKR 9
    121 NNLRR 9
    2877 ATLDR 8
    138 GNLAR 8
    2914 GNLIR 8
    5324 KTLQR 8
    5325 RTLRR 8
    5102 SMLQR 8
    2965 SSLKR 8
    1947 ARLRV 7
    2607 GALVR 7
    5139 GTLIR 7
    2784 HVLKR 7
    3067 MTLRR 7
    5086 SNLTR 7
    2582 SNLVR 7
    2620 ANLKR 6
    119 GNLKR 6
    5326 HILNR 6
    5327 MTLMR 6
    2770 AALVR 5
    5107 ANIQR 5
    2609 GNLQR 5
    2940 NTLNR 5
    3027 NTLVR 5
    3196 QTLTR 5
    5328 RTLKR 5
    2666 SALTR 5
    2699 SVLKR 5
    5104 AMLTR 4
    2621 ANLNR 4
    2494 ANLVR 4
    5158 APLRR 4
    3025 ATLGR 4
    5329 ATVRR 4
    2530 DTLRR 4
    3160 GILRR 4
    5122 GMLNR 4
    3033 GTLLR 4
    2707 GTLQR 4
    5330 GVLSR 4
    5331 HRLKI 4
    2830 HTLVR 4
    5332 KTLIR 4
    5238 KTLRR 4
    5087 NNLKR 4
    2756 NSLRR 4
    2939 NTLIR 4
    2677 SMLTR 4
    2524 SNLAR 4
    2963 SNLQR 4
    2550 STLAR 4
    5333 TILAR 4
    2766 TSLKR 4
    2857 TTLAR 4
    2618 TTLMR 4
    3117 AILRR 3
    5089 ANLMR 3
    3090 ASLAR 3
    5334 ASLHR 3
    5335 ATLNK 3
    5336 ATLRG 3
    2583 EALRR 3
    3049 GILKR 3
    5123 GMLTR 3
    2706 GNLNR 3
    4375 GPLPV 3
    5337 GPLVR 3
    3245 GSLSR 3
    72 HMLTR 3
    2827 HSLRR 3
    5338 HVLNR 3
    5339 NSLKR 3
    5340 NTLMR 3
    5341 NVLRR 3
    2950 QTLQR 3
    5342 RRLNR 3
    2956 SALNR 3
    3292 SALQR 3
    2733 SVLTR 3
    1986 AGLKR 2
    2475 AGLRR 2
    1988 AGLVR 2
    5150 AMLHR 2
    5151 AMLIR 2
    5343 ARLKI 2
    3251 ASLNR 2
    3244 ASLSR 2
    5344 ATFRR 2
    5345 ATLNW 2
    5346 ATLRW 2
    2634 ESLRR 2
    3151 ETLVR 2
    2778 GALNR 2
    2815 GALQR 2
    5124 GMLVR 2
    2517 GNLLR 2
    3230 HALTR 2
    5141 HMLNR 2
    2558 HNLRR 2
    2586 HTLMR 2
    2613 HTLQR 2
    5347 IALAG 2
    5348 MSLRR 2
    5349 MTLLR 2
    5350 MTLVR 2
    3407 NGRSPV... 2
    2664 NMLKR 2
    2712 NMLRR 2
    3191 PTLRR 2
    5351 QRLSV 2
    4424 RPLVG 2
    5352 RRIDR 2
    5353 RRLDS 2
    5354 RRVDR 2
    5355 RSLIR 2
    5356 RTLIR 2
    5357 SDLTV 2
    2962 SMLHR 2
    5358 SRLKI 2
    2564 SSLVR 2
    5359 STVRR 2
    2651 TTLTR 2
    2767 TTLVR 2
    57 TVLKR 2
    2546 AALAR 1
    2864 AALLR 1
    5360 AALNS 1
    3367 AALRK 1
    3410 AALRL 1
    5147 AALRS 1
    5361 AAVRR 1
    5259 AKLQR 1
    3510 AKLRR 1
    3062 ALLKR 1
    5149 AMLAR 1
    5132 AMLNR 1
    5218 AMLVR 1
    5094 ANLHR 1
    5092 ANLQR 1
    5091 ANLTR 1
    AP*C... 1
    5362 APLHR 1
    5363 APLKR 1
    5364 APLMR 1
    5365 APLVR 1
    5366 APYP... 1
    5271 ARLRR 1
    2874 ARLTR 1
    5367 ARLVG 1
    5368 ASFRR 1
    5369 ASLER 1
    3250 ASLMR 1
    AT*G... 1
    5370 ATFKR 1
    5371 ATFRT 1
    5372 ATFTR 1
    5373 ATIRR 1
    5374 ATLES 1
    5375 ATLFR 1
    5376 ATLHW 1
    5377 ATLIS 1
    5378 ATLNH 1
    5379 ATLNS 1
    5380 ATLQG 1
    5381 ATLQW 1
    5382 ATLRI 1
    5383 ATLRP 1
    5384 ATLWR 1
    5385 ATSVR 1
    5386 ATVAR 1
    5387 AVLGR 1
    5388 AVLLR 1
    5389 AVLNR 1
    3121 AVLTR 1
    3991 DKLRR 1
    2640 DMLKR 1
    5390 DRLRA 1
    2656 DTLNR 1
    5391 EPLVM 1
    3038 ETLAR 1
    3043 ETLQR 1
    2592 GALTR 1
    2816 GDLRR 1
    2913 GMLAR 1
    139 GNLMR 1
    5392 GPFKR 1
    5393 GPLGL 1
    5394 GPLKR 1
    5395 GSLGA 1
    2781 GSLQR 1
    2660 GSLTR 1
    5396 GTFRR 1
    3014 GTLDR 1
    2917 GTLER 1
    2918 GTLGR 1
    5397 GTLMW 1
    5398 GTLRK 1
    2562 GTLTR 1
    386 GTLVS 1
    5399 GTSNR 1
    5400 GTSRR 1
    5401 GVLRK 1
    5402 GVVRR 1
    2749 HALMR 1
    3246 HALQR 1
    3039 HILKR 1
    5403 HILQR 1
    2578 HTLAR 1
    2689 HTLLR 1
    2828 HTLNR 1
    3180 HTLRG 1
    3181 HTLSR 1
    3099 HVLHR 1
    5404 KTLLR 1
    5405 KTLVR 1
    5406 MALRM 1
    5407 MPLAR 1
    4452 MPLNR 1
    5408 MPLVR 1
    MRS 1
    2833 MTLKR 1
    4923 NRLRI 1
    2788 NTLAR 1
    2837 NTLHR 1
    3015 NTLLR 1
    2941 NTLQR 1
    5409 NTLRW 1
    3006 NTLTR 1
    5410 NTLVS 1
    5411 NTVRR 1
    2942 NVLKR 1
    5412 PPLKR 1
    5413 PSLKR 1
    5414 PTFHR 1
    5415 QKLA... 1
    2574 QMLKR 1
    2692 QSLKR 1
    3195 QTLHR 1
    5416 QTLIR 1
    5417 QTLRQ 1
    3248 QTLVR 1
    RN*P... 1
    5418 RRLAG 1
    5419 RRLAR 1
    5420 RRLDG 1
    5421 RRLHR 1
    5422 RRLVR 1
    5423 RRSDR 1
    5424 RRVEK 1
    5425 RTLER 1
    5426 RTLNR 1
    5427 RTLRG 1
    5428 SAVKR 1
    2559 SGLKR 1
    5201 SKLTR 1
    2647 SMLIR 1
    5145 SMLNR 1
    5304 SMSRR 1
    5088 SNLIR 1
    5429 SPLRR 1
    5430 SRLRI 1
    5431 STLCR 1
    2848 STLER 1
    5432 STLKS 1
    5433 STLRI 1
    5434 STSRR 1
    5435 SVLRK 1
    5436 TALIR 1
    5437 TALMR 1
    2764 TALTR 1
    5146 TMLQR 1
    5438 TMLRG 1
    5131 TNLIR 1
    2595 TNLKR 1
    5439 TPIMM 1
    5215 TPTRS 1
    1883 TRLRV 1
    5440 TRSP... 1
    2858 TTLGR 1
    2859 TTLIR 1
    5441 TTLRS 1
    5442 TVLNR 1
    3308 VSLRR 1
    2995 VTLKR 1
    5443 VTLQR 1
    5444 VVLGN 1
    5445 WRLDR 1
    5446 WTLRR 1
  • TABLE 25
    ZF3 selection on G:A change at nt 13 of core motif in CBS. Sequences reflect positions 2 to 6.
    SEQ ID NO: Sequence Read #
    81 GQLTV 1094
    5447 GQINV 906
    78 GELVV 766
    5448 AELIV 643
    5449 TELIV 552
    5450 QELLV 528
    5451 GELIV 525
    5452 GELTV 505
    80 GQLIV 476
    5453 QELLT 457
    5454 SELIV 416
    5455 GQLLV 372
    5456 SGLIV 372
    5457 GQLII 361
    5458 AELLV 311
    5459 VELLI 277
    5460 AELVV 271
    5461 AQLIV 267
    76 SQLIV 265
    82 TELII 251
    83 QGLLV 247
    5462 SQLII 243
    79 QQLLI 224
    5463 AGLIV 221
    5464 QELVV 209
    5465 GELLV 206
    86 GELLT 202
    5466 SQLLV 199
    5467 GELVI 194
    75 QQLIV 179
    5468 QELII 177
    5469 TQLIV 176
    5470 VELII 172
    5471 VELLV 160
    5472 GELLI 151
    85 GQLLT 150
    5473 NELLI 149
    5474 GQLLI 148
    5475 SQLLI 140
    5476 AQLLV 136
    5477 GQLIT 132
    5478 GQLTI 129
    5479 TELIT 122
    5480 TELLI 118
    5481 TELLV 116
    5482 QELLI 112
    5483 AGLVV 106
    5484 GSLLV 104
    5485 AQLVV 102
    5486 HPPEE 100
    5487 SQLVV 100
    77 QQLLV 98
    5488 QELIV 95
    5489 SELII 91
    5490 AQLII 90
    5491 QQLVV 90
    5492 TGLLV 88
    5493 NQLII 88
    5494 GQLVI 81
    5495 AGLLV 80
    5496 NQLLV 73
    5497 QELGV 69
    5498 GALVV 68
    5499 SQLTV 67
    5500 GELTT 67
    5501 GELII 65
    3710 SGLLV 63
    5502 AELII 60
    5503 TQLII 59
    5504 QQLII 59
    5505 AQLIT 58
    5506 SQLIT 58
    5507 SSLIV 57
    5508 SELTV 57
    5509 NELLV 57
    5510 TQLLV 56
    5511 QGLIV 55
    5512 QELVI 55
    5513 NELIV 55
    5514 TELLT 53
  • TABLE 26
    ZF3 selection on G:T change at nt 13 of core motif in CBS. Sequences reflect positions 2 to 6.
    SEQ ID NO: Sequence Read #
    79 QQLLI 1145
    5452 GELTV 1108
    81 GQLTV 933
    5474 GQLLI 748
    5447 GQLVV 545
    5457 GQLII 518
    80 GQLIV 479
    78 GELVV 477
    5515 GELIT 438
    5466 SQLLV 432
    5462 SQLII 431
    85 GQLLT 404
    5516 SQLSM 365
    84 QQLLT 349
    75 QQLIV 312
    5486 HPPEE 308
    5453 QELLT 300
    5475 SQLLI 282
    4773 GKLNA 281
    5451 GELIV 263
    5455 GQLLV 225
    76 SQLIV 219
    5517 RALLI 216
    5518 ENLLI 201
    5476 AQLLV 174
    5519 PDLKR 174
    86 GELLT 172
    5505 AQLIT 164
    5520 GQLVT 138
    5521 GULLS 116
    5450 QELLV 112
    5522 GELNP 112
    5523 GQLIQ 98
    5524 PTLVG 98
    5525 LVLAD 95
    5526 EALRA 94
    5467 GELVI 87
    1926 STLKA 87
    5494 GQLVI 85
    5463 AGLIV 82
    5527 GQLTL 82
    5528 NVLGT 81
    5529 KGLGP 79
    5530 MQLRR 79
    3026 GDLQR 75
    5531 VLLPN 71
    5532 MRLGD 69
    5533 GQLAQ 67
    4074 NELRG 67
    5500 GELTT 66
    5534 GELVT 64
    333 STLVV 63
    5535 VDLAV 61
    5536 AQLTI 59
    5537 DALPA 57
    5538 SVLQL 57
    5539 GPLGN 56
    5540 GHLLL 52
    5541 DVLDP 51
    5542 SSLSI 50
    5543 KMLAD 50
  • TABLE 27
    ZF3 selection on G:C change at nt 13 of core motif in CBS. Sequences reflect positions 2 to 6.
    SEQ ID NO: Sequence # Reads
    173 RKHD 4641
    175 RKAD 1938
    174 RRSD 1299
    681 RRHD 868
    682 RKTD 182
    683 NVSM 146
    684 RQSD 76
    685 RKND 69
    686 SENV 69
    687 VDHR 60
    688 AQIV 58
    689 KTPH 56
    690 PKIV 51
    691 GAEP 42
    692 MIVE 40
    693 VVGN 40
    694 KGPE 36
    695 GKVM 33
    696 TEPG 33
    697 TPHN 32
    698 MPGG 31
    699 DLEK 28
    700 GTDN 27
    701 ISRL 25
    702 ATGL 21
    703 ASNP 19
    704 GAPT 17
    705 HSPN 17
    706 RPVA 16
    177 RKDD 6
    707 MIVD 4
    708 RHRK 3
    709 RKHV 3
    710 RKQD 3
    711 RKSD 3
    712 DHHT 2
    713 GKHD 2
    714 MKAD 2
    715 RKAE 2
    716 RRAD 2
    717 APIG 1
    718 AQNR 1
    719 DMDA 1
    720 EAPM 1
    721 EEMM 1
    722 EPIR 1
    723 GALE 1
    724 GENV 1
    725 GKAD 1
    726 GKVD 1
    727 GPLA 1
    728 GRIE 1
    729 IEKL 1
    730 KAAS 1
    731 KEEH 1
    732 LKVD 1
    733 LUVE 1
    734 LMTQ 1
    735 MASL 1
    736 MGIG 1
    737 MPGD 1
    738 MSLG 1
    739 NDMT 1
    740 NMHT 1
    741 NRIV 1
    742 PENA 1
    743 QKHD 1
    744 QVPD 1
    745 RASD 1
    746 REHD 1
    747 RGHD 1
    748 RKHA 1
    749 RKHY 1
    750 RKLD 1
    751 RKPD 1
    752 RKVD 1
    753 RKYD 1
    754 RMSD 1
    755 RRLD 1
    756 RRND 1
    757 RRRD 1
    758 RRSG 1
    759 RWHD 1
    760 SHRL 1
    761 SQHV 1
    762 SSHD 1
    763 TTHV 1
    764 VHHV 1
    765 WKAD 1
    766 WKHD 1
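The read counts in the selection tables above can be summarized as read-weighted residue frequencies at each position of the selected sequence. The following is a minimal sketch only, not part of the disclosed methods: the function name position_frequencies and the choice to skip entries of a different length are assumptions made for illustration, and the five rows are copied from the top of Table 27.

```python
from collections import Counter

# (sequence, read count) pairs copied from the first five rows of Table 27.
ROWS = [
    ("RKHD", 4641),
    ("RKAD", 1938),
    ("RRSD", 1299),
    ("RRHD", 868),
    ("RKTD", 182),
]


def position_frequencies(rows):
    """Return one dict per position mapping residue -> fraction of reads.

    Hypothetical helper: entries whose length differs from the first row
    (for example truncated reads) are ignored.
    """
    length = len(rows[0][0])
    usable = [(seq, reads) for seq, reads in rows if len(seq) == length]
    total = sum(reads for _, reads in usable)
    counts = [Counter() for _ in range(length)]
    for seq, reads in usable:
        for i, residue in enumerate(seq):
            counts[i][residue] += reads  # weight each residue by its read count
    return [{aa: n / total for aa, n in c.items()} for c in counts]


if __name__ == "__main__":
    for index, freqs in enumerate(position_frequencies(ROWS)):
        ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
        print(index, [(aa, round(frac, 3)) for aa, frac in ranked])
```

Weighting by read count rather than counting each distinct sequence once lets the most enriched clones dominate the summary, which is usually the intent when reading a selection table; an unweighted count would be the alternative if diversity is of more interest than enrichment.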
  • REFERENCES
    • 1. Ong, Chin-Tong & Corces, V. P., Nat Rev Genet. 2014 April; 15(4):234-46.
    • 2. Phillips, J. & Corces, V. P., Cell. 2009 Jun. 26; 137(7): 1194-1211.
    • 3. T. et al., Curr Opin Genet Dev. 2016 April; 37:17-26.
    • 4. Nora, E. P. et al., Nature. 2012 Apr. 11; 485(7398):381-5.
    • 5. Rao, S. S. et al., Cell. 2014 Dec. 18; 159(7): 1665-1680.
    • 6. Phillip, J., et al., Cell. 2013 Jun. 6; 153(6): 1281-1295.
    • 7. Shukla, S., et al., Nature. 2011 Nov. 3; 479(7371):74-9.
    • 8. Hilmi, K., et al. Sci Adv. 2017 May 24; 3(5):e1601898.
    • 9. Han, D., et al. Sci Rep. 2017 Mar. 6; 7:43530.
    • 10. Rhee, S., & Pugh, Cell. 2011 Dec. 9; 147(6):1408-19.
    • 11. Nakahashi, H., et al., Cell Rep. 2013 May 30; 3(5):1678-1689.
    • 12. Hashimoto, et al., Mol Cell. 2017 Jun. 1; 66(5):711-720.e3.
    • 13. Guo, A. et al., Nat Commun. 2018 Apr. 18; 9(1):1520.
    • 14. Schuijers, J. et al., Cell Rep. 2018 Apr. 10; 23(2):349-360.
    • 15. Kang, J. Y. et al., Oncogene. 2015 Nov. 5; 34(45):5677-84.
    • 16. Wright, D., et al. Nat Protoc. 2006; 1(3):1637-52.
    • 17. Sander, J., et al. Nat Methods. 2011 January; 8(1):67-9.
    • 18. Maeder, M., et al. Mol Cell. 2008 Jul. 25; 31(2):294-301.
    • 19. Joung J. K. et al., Proc Natl Acad Sci USA. 2000 Jun. 20; 97(13):7382-7.
    Other Embodiments
  • It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

Claims (26)

1. An engineered CCCTC-binding factor (CTCF) variant comprising at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF, wherein the engineered CTCF variant binds to a mutant CTCF binding sequence (CBS) with a higher affinity than wild-type CTCF, the mutant CBS comprising at least one nucleotide base that differs in sequence from the nucleotide sequence of a consensus CBS, wherein the at least one amino acid residue that differs in sequence from the amino acid sequence of the wild-type CTCF is selected from the group consisting of amino acid residues at position(s) −1, +1, +2, +3, +5, and +6 of any of ZF7, ZF6, ZF5, ZF4, and ZF3 of the engineered CTCF variant.
2. The engineered CTCF variant of claim 1, wherein the mutant CBS has a Thymine (T), Adenine (A), or Guanine (G) residue at position 2 of the consensus CBS motif, the engineered CTCF comprising an amino acid residue threonine, asparagine, or histidine at ZF7 position +3.
3. The engineered CTCF variant of claim 1, wherein the mutant CBS has a G residue at position 2 of the consensus CBS motif, the engineered CTCF comprising the amino acid sequence DHLQT, EHLNV, AHLQV, EHLRE, DHLQV, EHLKV, EHLVV, DHLRT, or DHLAT at ZF7 positions +2 to +6.
4. (canceled)
5. The engineered CTCF variant of claim 1, wherein the mutant CBS has a T, or A residue at position 5 of the consensus CBS motif, the engineered CTCF at ZF6 positions +2 to +6 comprising:
the amino acid sequence NAMKR, GNMAR, EGMTR, SNMVR, or NAMRG, wherein the mutant CBS has a T residue at position 5 of the consensus CBS motif; or
the amino acid sequence EHMGR, DHMNR, THMKR, EHMRR, or THMNR, wherein the mutant CBS has a G residue at position 5 of the consensus CBS motif.
6. The engineered CTCF variant of claim 1, wherein the mutant CBS has a T, or C residue at position 6 of the consensus CBS motif, the engineered CTCF at ZF6 positions −1 to +3 comprising:
the amino acid sequence MNES or HRES, wherein the mutant CBS has a T residue at position 6 of the consensus CBS motif; or
the amino acid sequence RPDT, RTDI, or RHDT, wherein the mutant CBS has a G residue at position 6 of the consensus CBS motif.
7. The engineered CTCF variant of claim 1, wherein the mutant CBS has a C, A, or T residue at position 7 of the consensus CBS motif, the engineered CTCF at ZF5 positions +2 to +6 comprising:
the amino acid sequence HGLKV, HRLKE, HALKV, SRLKE, or DGLRV, wherein the mutant CBS has a T residue at position 7 of the consensus CBS motif;
the amino acid sequence HTLKV, or HGLKV, wherein the mutant CBS has an A residue at position 7 of the consensus CBS motif; or
the amino acid sequence SRLKE, HRLKE or NRLKE, wherein the mutant CBS has a C residue at position 7 of the consensus CBS motif.
8. The engineered CTCF variant of claim 1, wherein the mutant CBS has a C, A, or T residue at position 8 of the consensus CBS motif, the engineered CTCF at ZF5 positions +2 to +6 comprising:
the amino acid sequence ATLKR, QALRR, GGLVR, or HGLIR, wherein the mutant CBS has a T residue at position 8 of the consensus CBS motif;
the amino acid sequence ANLSR, TGLTR, HGLVR, or GGLTR, wherein the mutant CBS has an A residue at position 8 of the consensus CBS motif; or
the amino acid sequence HTLRR, TVLKR, ADLKR, or HGLRR, wherein the mutant CBS has a C residue at position 8 of the consensus CBS motif.
9. The engineered CTCF variant of claim 1, wherein the mutant CBS has a T, A, or C residue at position 10 of the consensus CBS motif, the engineered CTCF at ZF4 positions +2 to +6 comprising:
the amino acid sequence AHLRK, wherein the mutant CBS has a T residue at position 10 of the consensus CBS motif;
the amino acid sequence AKLRV, EKLRI, or AKLRI, wherein the mutant CBS has an A residue at position 10 of the consensus CBS motif; or
the amino acid sequence TKLKV, wherein the mutant CBS has a C residue at position 10 of the consensus CBS motif.
10. The engineered CTCF variant of claim 1, wherein the mutant CBS has a T, A, or C residue at position 11 of the consensus CBS motif, the engineered CTCF at ZF4 positions +2 to +6 comprising:
the amino acid sequence ATLRR or RRLDR, wherein the mutant CBS has a T residue at position 11 of the consensus CBS motif;
the amino acid sequence TNLRR, ANLRR, or GNLTR, wherein the mutant CBS has an A residue at position 11 of the consensus CBS motif; or
the amino acid sequence AMLKR, HMLTR, AMLRR, or TMLRR, wherein the mutant CBS has a C residue at position 11 of the consensus CBS motif.
11. The engineered CTCF variant of claim 1, wherein the mutant CBS has a T, A, or C residue at position 13 of the consensus CBS motif, the engineered CTCF at ZF3 positions +2 to +6 comprising:
the amino acid sequence QQLIV, SQLIV, QQLLV, GELVV, or QQLLI, wherein the mutant CBS has a T residue at position 13 of the consensus CBS motif;
the amino acid sequence GQLIV, GQLTV, GKLVT, TELII or QGLLV, wherein the mutant CBS has an A residue at position 13 of the consensus CBS motif; or
the amino acid sequence QQLLT, GQLLT, GELLT, or QQLLI, wherein the mutant CBS has a C residue at position 13 of the consensus CBS motif.
12. The engineered CTCF variant of claim 1, wherein the mutant CBS has A, T, and T residues at positions 2, 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF comprising:
(i) the amino acid sequence AKLKK, AKLRK, AHLRV, AKLRV, or SKLRL at ZF4 positions +2 to +6 of the engineered CTCF;
(ii) the amino acid sequence ERLRV, NRLKV, SRLKE, or NRLKV at ZF5 positions +2 to +6 of the engineered CTCF;
(iii) the amino acid sequence RPDT, RTET, or RADV at ZF6 positions −1 to +3 of the engineered CTCF; and
(iv) the amino acid sequence DNLLA, SNLLV, DNLMA, or DNLRV at ZF7 positions +2 to +6 of the engineered CTCF.
13. The engineered CTCF variant of claim 1, wherein the mutant CBS has G, G, T, and T residues at positions 2, 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF comprising:
(i) the amino acid sequence GHLKK, AHLRK, or GKLRI at ZF4 positions +2 to +6 of the engineered CTCF;
(ii) the amino acid sequence SRLKE, DALRR, DGLKR, or TRLRE at ZF5 positions +2 to +6 of the engineered CTCF;
(iii) the amino acid sequence RPDTMKR or RTENMKM at ZF6 positions −1 to +6 of the engineered CTCF; and
(iv) the amino acid sequence EHLKV, DHLLA, or HHLDV at ZF7 positions +2 to +6 of the engineered CTCF.
14. The engineered CTCF variant of claim 1, wherein the mutant CBS has A, and A residues at positions 2, 5, and 11 of the consensus CBS motif, respectively, the engineered CTCF comprising:
(i) the amino acid sequence SNLRR, GNLVR, GNLRR, GNLKR, ANLRR, NNLRR, or TNLRR at ZF4 positions +2 to +6 of the engineered CTCF;
(ii) the amino acid sequence EHMKR, EHMRR, THMKR, EHMNR, or EHMAR at ZF6 positions +2 to +6 of the engineered CTCF; and
(iii) the amino acid sequence DNLLT, DNLLV, DNLQT, DNLLA, DNLAT, DNLQA, DNLMA, or DNLMT at ZF7 positions +2 to +6 of the engineered CTCF.
15. (canceled)
16. The engineered CTCF variant of claim 1, wherein the mutant CBS has T, and T residues at positions 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF comprising:
(i) the amino acid sequence GHLKK, AHLKK, TKLRL, TKLKL, GHLRK, THLKK, or AHLRK at ZF4 positions +2 to +6 of the engineered CTCF;
(ii) the amino acid sequence TRLKE or SRLKE at ZF5 positions +2 to +6 of the engineered CTCF; and
(iii) the amino acid sequence RADN, RHDT, RRDT, RPDT, RTSS, or RNDT at ZF6 positions −1 to +3 of the engineered CTCF.
17. The engineered CTCF variant of claim 1, wherein the engineered CTCF variant interacts with cohesin to mediate the formation of an enhancer-promoter loop to modulate gene expression.
18. A method of treating a subject in need thereof, the method comprising administering to the subject a therapeutically effective amount of an engineered CTCF variant according to claim 1.
19. The method of claim 18, wherein the subject has cancer.
20. A method of activating or repressing expression of a gene under the control of a mutant CBS of claim 1, the gene being aberrantly expressed under the control of the mutant CBS, the method comprising contacting the mutant CBS with an engineered CTCF according to any one of claims 1-17, thereby regulating the expression of the gene.
21. The method of claim 20, wherein the engineered CTCF activates or represses expression of the gene by interacting with cohesin to mediate the formation of an enhancer-promoter loop.
22. A pharmaceutical composition comprising an engineered CTCF variant according to claim 1.
23. A gene expression system for regulation of a gene, the system comprising a nucleic acid encoding an engineered CTCF variant according to claim 1.
24. A method of altering the structure of chromatin comprising contacting an engineered CTCF variant according to claim 1 with a mutant CBS to form a binding complex, such that the structure of the chromatin is altered.
25. A method of modulating expression of a gene that is under the control of a CBS bearing one or more mutations, the method comprising contacting the CBS bearing one or more mutations with an engineered CTCF variant according to claim 1.
26. A kit comprising an engineered CTCF variant according to claim 1 and instructions for use in a method described herein.
US17/118,378 2018-05-17 2020-12-10 CCCTC-Binding Factor Variants Pending US20210102213A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/118,378 US20210102213A1 (en) 2018-05-17 2020-12-10 CCCTC-Binding Factor Variants

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201862672682P 2018-05-17 2018-05-17
US201962828277P 2019-04-02 2019-04-02
US16/415,989 US11041155B2 (en) 2018-05-17 2019-05-17 CCCTC-binding factor variants
US17/118,378 US20210102213A1 (en) 2018-05-17 2020-12-10 CCCTC-Binding Factor Variants

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US16/415,989 Division US11041155B2 (en) 2018-05-17 2019-05-17 CCCTC-binding factor variants

Publications (1)

Publication Number Publication Date
US20210102213A1 (en) 2021-04-08

Family

ID=68541036

Family Applications (2)

Application Number Title Priority Date Filing Date
US16/415,989 Active 2039-07-17 US11041155B2 (en) 2018-05-17 2019-05-17 CCCTC-binding factor variants
US17/118,378 Pending US20210102213A1 (en) 2018-05-17 2020-12-10 CCCTC-Binding Factor Variants

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US16/415,989 Active 2039-07-17 US11041155B2 (en) 2018-05-17 2019-05-17 CCCTC-binding factor variants

Country Status (6)

Country Link
US (2) US11041155B2 (en)
EP (1) EP3793584A4 (en)
JP (1) JP2021523719A (en)
AU (1) AU2019269692A1 (en)
CA (1) CA3100726A1 (en)
WO (1) WO2019222670A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023196220A3 (en) * 2022-04-04 2023-11-23 The General Hospital Corporation Method for genome-wide functional perturbation of human microsatellites using engineered zinc fingers
WO2024026269A1 (en) * 2022-07-25 2024-02-01 The General Hospital Corporation Ccctc-binding factor (ctcf)-mediated gene activation

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3059208A1 (en) 2017-04-21 2018-10-25 The General Hospital Corporation Inducible, tunable, and multiplex human gene regulation using crispr-cpf1
CN114146186B (en) * 2021-11-04 2023-05-12 华中科技大学同济医学院附属协和医院 Polypeptide drug conjugate based on sulfonium salt stabilized target HDAC and application thereof

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9824544D0 (en) * 1998-11-09 1999-01-06 Medical Res Council Screening system
WO2004099367A2 (en) * 2002-10-23 2004-11-18 The General Hospital Corporation Methods for producing zinc finger proteins that bind to extended dna target sequences
JP2009520463A (en) 2005-11-28 2009-05-28 ザ スクリプス リサーチ インスティテュート Zinc finger binding domain for TNN
WO2010046493A2 (en) * 2008-10-23 2010-04-29 Université de Lausanne Gene transfer vectors comprising at least one isolated dna molecule having insulator and or boundary properties and methods to identify the same
EP2414524B1 (en) * 2009-04-03 2017-08-23 Centre National De La Recherche Scientifique Gene transfer vectors comprising genetic insulator elements and methods to identify genetic insulator elements
WO2011017293A2 (en) * 2009-08-03 2011-02-10 The General Hospital Corporation Engineering of zinc finger arrays by context-dependent assembly
SG10201801782PA (en) * 2013-09-04 2018-04-27 Csir Site-specific nuclease single-cell assay targeting gene regulatory elements to silence gene expression
WO2015138852A1 (en) * 2014-03-14 2015-09-17 University Of Washington Genomic insulator elements and uses thereof
WO2017031370A1 (en) 2015-08-18 2017-02-23 The Broad Institute, Inc. Methods and compositions for altering function and structure of chromatin loops and/or domains

Also Published As

Publication number Publication date
CA3100726A1 (en) 2019-11-21
AU2019269692A1 (en) 2020-11-19
WO2019222670A1 (en) 2019-11-21
EP3793584A4 (en) 2022-10-26
US20190382767A1 (en) 2019-12-19
US11041155B2 (en) 2021-06-22
EP3793584A1 (en) 2021-03-24
JP2021523719A (en) 2021-09-09

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED

AS Assignment

Owner name: THE GENERAL HOSPITAL CORPORATION, MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:COTTMAN, REBECCA TAYLER;JOUNG, J. KEITH;SIGNING DATES FROM 20191007 TO 20191018;REEL/FRAME:055906/0817

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION