WO2002057308A2

WO2002057308A2 - Zinc finger polypeptides and their use

Info

Publication number: WO2002057308A2
Application number: PCT/GB2002/000246
Authority: WO
Inventors: Michael Moore; Mark Isalan; Lindsey Reynolds; Christopher Ullman; John Girdlestone; Christophe Demaison; Yen Choo
Original assignee: Sangamo Biosciences, Inc.
Priority date: 2001-01-22
Filing date: 2002-01-22
Publication date: 2002-07-25
Also published as: WO2002057308A3; US7947469B2; AU2002225187A1; US20040110923A1

Abstract

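Polypeptides comprising a zinc finger motif having the consensus sequence X(0-2) C X(1-5) C X(2-7) X X X X X X X H X(3-6) H/C, where X is any amino acid and the numbers in parentheses give the permitted number of X residues, and their use to modulate the transcription of a nucleotide sequence encoding a receptor.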
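The consensus can be read as a pattern over one-letter amino-acid codes, with X standing for any residue and each range giving how many X residues may occur at that point. A minimal Python sketch, assuming X is restricted to the 20 standard residues; the function name and the test peptide (a Zif268-style C2H2 finger) are illustrative choices, not part of the specification.

```python
import re

# Assumption: "X" is any of the 20 standard one-letter amino-acid codes.
_X = "[ACDEFGHIKLMNPQRSTVWY]"

# X(0-2) C X(1-5) C X(2-7) X X X X X X X H X(3-6) H/C
ZF_CONSENSUS = re.compile(
    rf"{_X}{{0,2}}C{_X}{{1,5}}C{_X}{{2,7}}{_X}{{7}}H{_X}{{3,6}}[HC]"
)


def find_zinc_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping consensus match."""
    return [(m.start(), m.group()) for m in ZF_CONSENSUS.finditer(protein.upper())]


# Test input: a Zif268-style C2H2 finger, used purely for illustration.
print(find_zinc_fingers("PYACPVESCDRRFSRSDELTRHIRIH"))
# -> [(1, 'YACPVESCDRRFSRSDELTRHIRIH')]
```

Because the leading X(0-2) is optional and matched greedily, a reported match may begin up to two residues before the first cysteine.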

Description

NUCLEIC ACID BINDING POLYPEPTIDES

FIELD OF THE INVENTION

The present invention relates to molecules. In particular, the present invention relates to molecules capable of binding to receptor nucleotide sequences.

BACKGROUND OF THE INVENTION

Many diseases are caused by viral infections. Infection of humans with Human Immunodeficiency Virus, such as HIV-1, causes a dramatic decline in the number of white blood cells, particularly CD4+ T-lymphocytes. When the number of such cells becomes low enough, opportunistic infections and neoplasms occur, and the pathology may progress to Acquired Immune Deficiency Syndrome (AIDS). Therapeutics aimed at combating HIV and other viruses, as well as research tools for their study, are extremely important.

Many viruses subvert host proteins to gain entry into cells, exploit their metabolic functions for replication, and evade immune surveillance. Furthermore, immune reactions that may be beneficial for clearing pathogens can be deleterious to the host if they persist or if they are triggered inappropriately, such as in autoimmune syndromes.

The present invention seeks to overcome one or more problem(s) associated with the prior art.

SUMMARY OF THE INVENTION

According to one aspect of the present invention, we provide a polypeptide capable of binding to a nucleic acid comprising a receptor nucleotide sequence. Other aspects of the invention are set out below as independent claims. Preferred aspects of the invention are as set out in the subclaims, and also in the description below.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A, 1B and 1C show DNA binding by the six-finger CXCR4 constructs using an in vitro (TNT) fluorescence ELISA assay. Figure 1A. Binding of the six-zinc finger peptide CXCR4-5-1 to its preferred target site and to 5 control sites containing base mutations (underlined). Figure 1B. Binding of the six-zinc finger peptide CXCR4-10-1 to its preferred target site and to a control site containing base mutations (underlined). Figure 1C. Binding of the six-zinc finger peptide CXCR4-10-3 to its preferred target site and to a control site containing base mutations (underlined).

Figures 2A, 2B and 2C show specific down-regulation of endogenous CXCR4 by selected zinc finger proteins. Figure 2A. Expression of CXCR4 at the surface of Jurkat cells. Figure 2B. Expression of CXCR4 as assessed by FACS analysis in different populations: non-transfected Jurkat cells, which represent an internal control (R1), and subpopulations R2, R3 and R4, which exhibit 10-, 100- and 1000-fold higher GFP fluorescence intensity relative to background, respectively. Figure 2C. Down-regulation of CXCR4 observed in Jurkat cells transfected with zinc finger proteins 10-1 and 10-3.

Figures 3A, 3B and 3C show binding by the TNFR1 six-zinc finger peptides using an in vitro (TNT) fluorescence ELISA assay. Figure 3A. Binding of TNFR1-4-2 to its preferred target, a control site, and no binding site (nbs). Figure 3B. Binding of the six-zinc finger peptide TNFR1-7-9 to its preferred target, an overlapping site, a control site, and no binding site. Figure 3C. Binding of the six-zinc finger peptide TNFR1-9-12 to its preferred target, an overlapping site, a control site, and no binding site. Sequences of target sites 4-2, 7-9 and 9-12 are shown under Figure 3A. The target sites of TNFR1-9-12 and TNFR1-7-9 overlap over a distance corresponding to 4½ of the 6 fingers of the binding region.

Figures 4A and 4B show specific down-regulation of endogenous TNFR1 expression by four distinct zinc finger proteins, as assayed by FACS analysis. Figure 4A. Expression of TNFR1 on Jurkat T cells is indicated by the filled area, while the open area represents a negative staining control. Figure 4B. Expression of TNFR1 after transient transfection with expression vectors encoding GFP only (pTracer, Invitrogen), or GFP and four different zinc finger proteins targeted at the TNFR1 promoter. An expression vector encoding the CXCR4 zinc finger protein used in Fig. 2A,B is used as a control. The left panel depicts GFP fluorescence, which serves to identify transfected cells. Electronic gates are applied for GFP+ transfected cells (G1). The second panel shows TNFR1 expression on the total cell population. TNFR1 expression on the G1, GFP+ subpopulation is depicted in the third panel.

Figure 5 shows down-regulation of expression from a TNFR promoter as assayed by a CAT assay. The figure demonstrates down-regulation with TNFR1-4-2 (A), TNFR1-9-12 (B), TNFR1-7-9 (C) and TNFR1-7-10 (D), as well as with a combination of all four of these (A+B+C+D). No down-regulation is seen with a pTracer-only control.

Figure 6 shows the specific down-regulation of TNFR-1 and CXCR4 by TNFR1-4-2-Kox and CXCR4-5-1-Kox, respectively. Column 1, total cell population, with the proportion of cells expressing GFP from the transfected plasmid indicated under the horizontal line; column 2, expression levels of TNFR-1 in cells also expressing GFP; column 3, expression levels of CXCR4 in cells also expressing GFP; column 4, expression levels of ICAM in cells also expressing GFP. Data in rows A, B, C and D are for cells transfected with empty pTracer vector; pTracer containing TNFR1-4-2-Kox; pTracer containing CXCR4-5-1-Kox; and both TNFR1-4-2-Kox- and CXCR4-5-1-Kox-containing vectors, respectively.

DETAILED DESCRIPTION OF THE INVENTION

In general, the invention provides a polypeptide capable of binding to a nucleic acid comprising a receptor nucleotide sequence.

Polypeptides according to preferred embodiments of the invention include those which are capable of binding to receptor nucleotide sequences in which the receptor is involved in any aspect of viral function, for example, viral binding, viral infection, etc. Such a viral receptor may be capable of mediating or facilitating the viral function. The receptor may be involved in viral infection of any kind, including HIV infection, for example, CD4, CXCR4, CCR5, CCR3 or CCR2b.

Furthermore, the invention includes polypeptides capable of binding receptor nucleotide sequences in which the receptor is capable of binding to one or more other molecules ("ligands"). Such a receptor may be involved in any form of signal transduction and may, in particular, be involved in an immune reaction (whether appropriate or inappropriate) against any agent, including pathogens or other insults. It will be appreciated that receptors may have plural activities or functions, such that the above preferred embodiments are not to be considered mutually exclusive.

In a further preferred embodiment of the invention, the receptor is involved in transmitting signals upon binding of a ligand. Such a ligand may include proteins, or other chemical compounds that can act as agonists or antagonists of such molecules. Preferably, the receptor is involved in regulation of the immune system, preferably TNFR1 and other members of the TNF receptor family (TNFR2, Fas/Apo-1/CD95, lymphotoxin beta receptor, TRAIL receptor, osteoprotegerin, RANK), the IL-1 and IL-18 receptors, and the receptors for Type I and II Interferons.

Nucleic acid binding polypeptides according to the invention preferably comprise zinc finger polypeptides. Preferably, the nucleic acid binding polypeptide is capable of down-regulating a receptor, preferably capable of transcriptional down-regulation of a receptor. Highly preferred embodiments include zinc finger polypeptides selected from the group consisting of consensus sequences 1 to 18 (see Table 1), a variant of one or more thereof, and polypeptides comprising combinations of two or more thereof.

Preferred zinc fingers include those comprising three fingers, i.e., selected from the group consisting of zinc fingers comprising consensus sequences 1 to 3, zinc fingers comprising consensus sequences 4 to 6, zinc fingers comprising consensus sequences 7 to 9, zinc fingers comprising consensus sequences 10 to 12, zinc fingers comprising consensus sequences 13 to 15, and zinc fingers comprising consensus sequences 16 to 18 (see Table 1). Preferably, the zinc finger polypeptides comprise six zinc fingers, viz, zinc fingers comprising consensus sequences 1 to 6, consensus sequences 7 to 12 or consensus sequences 13 to 18 (see Table 1). In each of these cases, the individual finger modules may be linked by canonical, structured or flexible linkers (as described in further detail below), preferably, linkers comprising GERP, GGGGSGGSGGSERP or GGGGSGGSGGSGGSGGSERP. The nucleic acid binding polypeptides may further comprise one or more repressor domains, preferably, a KOX domain, as described below.
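Assembling a six-finger polypeptide from two three-finger units is, at the sequence level, a concatenation through one of the recited linkers. A minimal Python sketch, assuming each three-finger unit already carries its own internal linkers; `join_finger_units` and the placeholder unit strings are hypothetical, and the actual finger sequences would come from Table 1, which is not reproduced here.

```python
from typing import Sequence

# The three linker sequences recited in the description. Which one is used at
# which junction is a design choice; this sketch does not assume any mapping.
RECITED_LINKERS = ("GERP", "GGGGSGGSGGSERP", "GGGGSGGSGGSGGSGGSERP")


def join_finger_units(units: Sequence[str], linker: str) -> str:
    """Join finger units (e.g. two three-finger units) with `linker` at each junction.

    Assumption: each unit string already contains its own internal
    inter-finger linkers, so only the unit-to-unit junctions are added here.
    """
    if linker not in RECITED_LINKERS:
        raise ValueError(f"{linker!r} is not one of the recited linkers")
    if len(units) < 2:
        raise ValueError("at least two units are needed")
    return linker.join(units)


# Hypothetical placeholders stand in for the Table 1 finger sequences.
six_finger = join_finger_units(["<fingers 1-3>", "<fingers 4-6>"], "GGGGSGGSGGSERP")
print(six_finger)  # <fingers 1-3>GGGGSGGSGGSERP<fingers 4-6>
```

The longer glycine/serine-rich linkers add flexibility between units at the cost of a longer polypeptide; the sketch only checks that the chosen linker is one of those named in the text.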

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of chemistry, molecular biology, microbiology, recombinant DNA and immunology, which are within the capabilities of a person of ordinary skill in the art. Such techniques are explained in the literature. See, for example, J. Sambrook, E. F. Fritsch, and T. Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second Edition, Books 1-3, Cold Spring Harbor Laboratory Press; Ausubel, F. M. et al. (1995 and periodic supplements; Current Protocols in Molecular Biology, ch. 9, 13, and 16, John Wiley & Sons, New York, N.Y.); B. Roe, J. Crabtree, and A. Kahn, 1996, DNA Isolation and Sequencing: Essential Techniques, John Wiley & Sons; J. M. Polak and James O'D. McGee, 1990, In Situ Hybridization: Principles and Practice, Oxford University Press; M. J. Gait (Editor), 1984, Oligonucleotide Synthesis: A Practical Approach, IRL Press; and D. M. J. Lilley and J. E. Dahlberg, 1992, Methods of Enzymology: DNA Structure Part A: Synthesis and Physical Analysis of DNA, Methods in Enzymology, Academic Press. Each of these general texts is herein incorporated by reference.

RECEPTOR

The term "receptor" as used in the present document is intended to refer to a molecule which is capable of binding to another molecule (the ligand). Preferably, the receptor is located in the plasma membrane of a cell (a cell surface receptor). More preferably, the receptor is a transmembrane receptor. The receptor may be a chemokine receptor or a cytokine receptor. Most preferably, the receptor is a G-protein coupled receptor. The receptor may be involved in the regulation (repression or activation) of a downstream transcription factor. Such a transcription factor may, for example, include members of the NF-kB or STAT families of transcription factors. Transmembrane receptors, including G-protein coupled and cytokine receptors, their properties and sequences, are known in the art.

Binding of the ligand to the receptor may lead to the activation of one or more biological activities, such as a post-translational modification (for example, phosphorylation), dimerisation, nuclear (or other subcellular) localisation, binding to a third molecule, etc.

In one aspect of the invention, a nucleic acid binding polypeptide is capable of binding to a nucleic acid comprising a nucleotide sequence of a receptor involved in a viral function. The viral function is preferably selected from the group consisting of: viral titre, viral infectivity, viral replication, viral packaging, viral transcription, viral entry, viral attachment and viral penetration. Most preferably, the receptor is involved in viral infection.

A single receptor appears to be necessary and sufficient for entry of many retroviruses, but there are exceptions to this simple model. For example, HIV requires two proteins for cell entry, neither of which alone is sufficient; 10A1 murine leukemia virus can enter cells by using either of two distinct receptors; and two retroviruses can use different receptors in some cells but use the same receptor for entry into other cells. The term "co-receptor" is used here to refer to any one of two or more polypeptides which contribute to, or are necessary for, cellular entry of the virus. The invention thus encompasses use of one or more nucleic acid binding polypeptides capable of binding nucleic acid sequences comprising a co-receptor, where more than one receptor is necessary and/or sufficient for viral entry into the cell. As will be appreciated, two or more such nucleic acid binding polypeptides may be used, each one capable of targeting one or more co-receptors. For example, a nucleic acid binding polypeptide capable of targeting CD4 may be used in conjunction with a nucleic acid binding polypeptide capable of targeting CXCR4. The term "receptor", where the context admits, includes reference to a "co-receptor".

Preferably, the receptor is capable of binding a ligand which is a viral protein, preferably a viral envelope protein. Preferably, binding of the receptor to the viral protein is involved in, results in, or leads to, infection of a cell with the virus. Preferably, the receptor is capable of binding a polypeptide ligand of a lentivirus, preferably Human Immunodeficiency Virus, preferably HIV-1. Most preferably, the polypeptide ligand is a gp120 protein of HIV.

Preferably, the receptor is involved in infection by HIV, for example, CD4, CXCR4, CCR5, CCR3, CCR2b, etc. In a highly preferred embodiment of the invention, the receptor is a CXCR4 receptor, as discussed in further detail below.

Preferably, the nucleic acid binding polypeptide comprises a zinc finger polypeptide capable of binding to a CXCR4 receptor promoter sequence. Preferably, the CXCR4 receptor promoter sequence comprises a sequence selected from the group consisting of: 5'-TCCCCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTTTTTATA-3', 5'-CCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTT-3' (CXCR4-5-1), 5'-CCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTT-3' (CXCR4-10-1), and 5'-CCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTT-3' (CXCR4-10-3).

In another aspect of the invention, the receptor is involved in regulation of a biological function. Preferably, the receptor is capable of binding a ligand which is either diffusible in bodily fluids, or integrated into or bound covalently or noncovalently to cell membranes or viral envelopes, the extracellular matrix, or any other substrate. Preferably, binding of the receptor to ligand is involved in, results in, or leads to, metabolic responses of a cell to the ligand. Preferably, the receptor is capable of binding a polypeptide ligand associated with an immune response, typically classified as cytokines or interleukins.
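Because a zinc finger protein may recognise its site on either DNA strand, checking where a target sequence lies in a promoter fragment means searching both the sequence and its reverse complement. A minimal Python sketch, using the longer CXCR4 promoter sequence and the CXCR4-5-1 sequence recited above; the function names are hypothetical. As printed, the CXCR4-5-1, CXCR4-10-1 and CXCR4-10-3 sequences are identical, so each gives the same hit.

```python
# Complement table for the four unambiguous DNA bases (either case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_site(region: str, site: str) -> list[tuple[str, int]]:
    """Return every (strand, 0-based start) at which `site` occurs in `region`.

    "+" means `site` occurs as written; "-" means its reverse complement
    occurs, i.e. the site lies on the opposite strand.
    """
    region, site = region.upper(), site.upper()
    hits = []
    for strand, probe in (("+", site), ("-", reverse_complement(site))):
        start = region.find(probe)
        while start != -1:
            hits.append((strand, start))
            start = region.find(probe, start + 1)  # allow overlapping hits
    return hits


PROMOTER = "TCCCCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTTTTTATA"
CXCR4_5_1 = "CCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTT"
print(locate_site(PROMOTER, CXCR4_5_1))  # [('+', 3)]
```

A palindromic site would be reported once on each strand at the same position, which is the expected behaviour for a double-stranded target.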

The biological function which is capable of being regulated by the receptor may include an immune function, for example, an immune response. The immune function is preferably associated with activation of cells associated with the immune system. Such cells may include B, T and NK cells, macrophages, neutrophils, and other lymphoid and myeloid cell types. The receptor may also be involved in any of the various metabolic responses of cells in the body to molecules in the circulation. Such molecules may be expressed and/or released by a variety of cell types, for example, lymphoid, myeloid or stromal cells, in response to an immune stimulus. The receptor may in particular be involved in one or more antiviral responses of the immune system. Preferred receptors include those which are involved in an autoimmune disease.

Preferred receptors of this aspect of the invention include TNFR1, TNFR2, IL-1R, IFN receptors, etc. In a highly preferred embodiment of the invention, the receptor comprises a TNFR1 receptor. Preferably, the nucleic acid binding polypeptide comprises a zinc finger polypeptide capable of binding to a TNFR1 receptor promoter sequence. Preferably, the TNFR1 receptor promoter sequence comprises a sequence selected from the group consisting of: 5'-GTCGGATTGGTGGGTTGGGGGCACAAGGCA-3' (TNFR1-4-2); 5'-TGGAGGAGGAGGCCACAGGAGGCGGGAGGA-3' (TNFR1-7-9; TNFR1-7-10); and 5'-CAGGAAGAGCTGGAGGAGGAGGCCCACAGGA-3' (TNFR1-9-12).

A nucleic acid binding polypeptide according to the invention is preferably capable of down-regulating the receptor. The receptor may be down-regulated in a number of ways, including down-regulation of DNA replication of a sequence comprising the receptor, down-regulation of transcription of a gene encoding the receptor, down-regulation of RNA processing of a receptor transcript, down-regulation of transcript RNA turnover, down-regulation of translation, down-regulation of transport and/or intracellular localisation of polypeptide and/or RNA within the cell, down-regulation of post-transcriptional modification, down-regulation of protease or other activity which activates the receptor, down-regulation of receptor cofactors, down-regulation of activity of the receptor, up-regulation of breakdown of the receptor, etc. Accordingly, the invention encompasses any nucleic acid binding polypeptide capable of any of these activities through binding to its target. Preferably, down-regulation of the receptor is achieved through transcriptional regulation, as described below.
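Each finger of a C2H2 zinc finger array typically contacts about three base pairs, so a six-finger peptide such as TNFR1-9-12 or TNFR1-7-9 reads a site of roughly 18 bp. Under that assumption (3 bp per finger, contiguous sites), the 4½-finger overlap between the TNFR1-9-12 and TNFR1-7-9 target sites described for Figure 3C corresponds to about 13 to 14 bp, i.e. two 18-bp sites whose starts are offset by roughly 4 to 5 bp. A minimal Python sketch of that arithmetic; the constant and function names are hypothetical.

```python
BP_PER_FINGER = 3  # assumption: each C2H2 finger contacts about 3 bp


def site_length_bp(n_fingers: int) -> int:
    """Approximate length in bp of the site read by an array of n_fingers fingers."""
    return n_fingers * BP_PER_FINGER


def overlap_in_fingers(offset_bp: int, n_fingers: int = 6) -> float:
    """Overlap, in finger-equivalents, of two equal-length sites whose starts differ by offset_bp."""
    overlap_bp = max(0, site_length_bp(n_fingers) - abs(offset_bp))
    return overlap_bp / BP_PER_FINGER


print(site_length_bp(6))       # 18
print(overlap_in_fingers(4))   # 4.666666666666667
print(overlap_in_fingers(5))   # 4.333333333333333
```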

A "receptor nucleotide sequence" is intended to refer to any nucleotide sequence comprising, derived from, including or associated with a gene encoding a receptor. In preferred embodiments of the invention, the receptor nucleotide sequence comprises a gene, or a portion of a gene, for the receptor. The receptor nucleotide sequence may comprise a nucleic acid sequence capable of encoding a receptor, or it may comprise any of the control sequences of the receptor gene, for example, a promoter sequence, an enhancer sequence, a transcription factor binding site, a splice site, a sequence controlling transcription termination, etc. What is important is that the sequence is in some way involved in the regulation of expression of the receptor gene. Preferably, binding of the nucleic acid binding polypeptide to its target results in down-regulation of expression of the receptor.

CHEMOKINE RECEPTORS

In a particular embodiment of the invention, we provide a polypeptide capable of binding to a nucleic acid comprising a chemokine nucleotide sequence. Chemokines are a family of chemoattractive polypeptides that are classified into 4 groups depending on the position of conserved cysteine residues. Chemokines attract leukocyte subsets to inflammation sites and/or have homeostatic functions in regulating lymphocyte trafficking between and within lymphoid organs. A review of chemokines and their receptors is provided in Wells TN, Power CA, Proudfoot AE, 1998, Definition, function and pathophysiological significance of chemokine receptors, Trends Pharmacol Sci 1998 Sep;19(9):376-80; Wilkinson D, 1996, Curr Biol 1996 Sep 1;6(9):1051-3; Miller AD, 1996, Proc Natl Acad Sci U S A 1996 Oct 15;93(21):11407-13.
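The four groups are conventionally named C, CC, CXC and CX3C (the "CXXXC" chemokines of the text) after the spacing of their N-terminal cysteines. A deliberately simplified Python sketch, assuming it is enough to look at the first cysteine of an N-terminal fragment and the spacing to the next cysteine; real assignment aligns the full conserved cysteine pattern (C chemokines, for example, lack two of the four usual cysteines). The function name and the test fragments are hypothetical.

```python
import re


def chemokine_subfamily(n_terminal: str) -> str:
    """Classify a chemokine N-terminal fragment as C, CC, CXC or CX3C.

    Simplification (assumption): only the first cysteine and whether another
    cysteine follows it within the next four residues are examined.
    """
    seq = n_terminal.upper()
    i = seq.find("C")
    if i == -1:
        raise ValueError("no cysteine in fragment")
    window = seq[i:i + 5]  # first cysteine plus the next four residues
    if window.startswith("CC"):
        return "CC"
    if re.match(r"C.C", window):
        return "CXC"
    if re.match(r"C...C", window):
        return "CX3C"
    return "C"  # no nearby second cysteine, as expected for C chemokines


print([chemokine_subfamily(f) for f in ("AACCAA", "AACACAA", "AACAAACAA", "AACAAAAAA")])
# ['CC', 'CXC', 'CX3C', 'C']
```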

Preferred chemokines and receptors for them according to the invention include the following: C chemokines such as SCYC1, Lptn, CL1, Lymphotactin, SCM-1a, ATAC, SCYC2, CL2, SCM-1b; CC chemokines, CXC chemokines, CXXXC chemokines and virus-encoded chemokines. In a particularly preferred embodiment of the invention, the receptor is a CXC chemokine receptor. Examples of CXC chemokine receptors include receptors for SCYB1, SCYB2, SCYB3, SCYB4, SCYB4N1, SCYB5, SCYB6, (GCP-2, like, SCYB7, SPBPBP, SCYB8, SCYB9, SCYB10, SCYB11 (SCYB9B), SCYB12, SCYB13, SCYB14, SCYB15, SCYB16, SCYB IP, NAP-4 and LFCA-1.

In particular, the receptor may be chosen from the group consisting of: a CCR3 receptor, a CCR2b receptor, a CCR5 receptor, and a CXCR-4 receptor.

CXCR4 RECEPTOR

In a preferred embodiment of the invention, the receptor is a receptor capable of binding an SDF-1 ligand. Preferably, the receptor is a CXCR4 receptor. Stromal cell-derived factor-1 (SDF-1) is a CXC chemokine that binds to a unique transmembrane G protein-coupled receptor called CXCR4 and is chemoattractive for hematopoietic progenitor cells, B lymphocytes, T lymphocytes and monocytes. SDF-1 has been shown to regulate B lymphopoiesis and myelopoiesis by confining precursors within the supportive bone marrow microenvironment before further maturation. CXCR4 also acts as a co-receptor for human immunodeficiency virus (HIV) types 1 and 2.

The invention therefore relates in a particularly preferred embodiment to a molecule capable of binding to a nucleic acid comprising a CXCR4 receptor sequence, preferably a CXCR4 promoter sequence. Preferably, the molecule is capable of binding to and downregulating the expression of the CXCR4 gene.

CXCR4 (CXC chemokine receptor 4) is a co-receptor for T-cell-tropic HIV-1 strains, also known as X4 viruses (1, 6, reviewed in 7). CXCR4 is a G-protein-coupled, 7-transmembrane domain receptor whose physiological ligand is the Stromal Cell-Derived Factor-1 (SDF-1) chemokine (2, 13). CXCR4 is also known as CMKAR4, LCR1, NPY3R (systematic Human Genome nomenclature), fusin, HM89, LESTR, NPYRL, SDF-1R, D2S201E and CXCR4-Lo.

CXCR4 is expressed in dendritic cells (9), naive, non-memory T-cells (3), neurons and microglia (11), fresh primary monocytes (5, 16), endothelial cells (10, 15), neutrophils and B-cells (8). CXCR4 expression is induced by phytohemagglutinin (PHA) stimulation of peripheral blood mononuclear cells (PBMCs) (3). The first, second and third extracellular domains of CXCR4 have been implicated in the recognition of CXCR4 by X4 HIV-1 strains. X4 strains differ significantly in their use of CXCR4 extracellular domains (4, 12); for example, HIV-1(NDK) and HIV-2(ROD) are unable to use CXCR4 in which most of the amino-terminal domain is deleted, while HIV-1(LAI) is able to use the truncated receptor (4).

Several compounds with anti-HIV-1 activity, namely T22, AMD3100 and ALX40-4C, are thought to function through disruption of the envelope-CXCR4 interaction. In addition, the CXCR4-specific 12G5 monoclonal antibody has been shown to have anti-HIV-1 activity in vitro (12), as do the CXCR4 ligands SDF-1 and SDF-1α (2, 13). SDF-1 and phorbol 12-myristate 13-acetate (PMA) also cause rapid CXCR4 endocytosis and down-modulation, presumably rendering cells less susceptible to infection by X4 viruses (14). Accordingly, the invention comprises a method of treatment or prevention of a viral disease, comprising administering to a patient a polypeptide as described here, together with a molecule capable of disrupting the interaction of a viral envelope protein with a viral receptor.

Patients infected by human immunodeficiency virus type 1 (HIV-1) exhibit an invariable loss of CD4⁺ lymphocytes. However, primary isolates of HIV-1 show distinct differences in their biological properties, including differences in replication kinetics, tropism and syncytium-inducing capacity. Cellular entry of HIV-1 requires binding both to CD4 and to a transmembrane chemokine co-receptor. CCR5 has been identified as a co-receptor for macrophage-tropic, non-syncytium-inducing (NSI) strains of HIV-1, whereas CXCR-4 has been shown to mediate entry of T-cell-line-adapted, syncytium-inducing (SI) strains of HIV-1 into CD4⁺ T cells but does not permit entry of NSI isolates.

Isolates of HIV-1 from early in the course of infection predominantly used CCR5. In patients with disease progression, however, the virus expanded its co-receptor use to include other chemokine receptors such as CCR3, CCR2b and CXCR-4. More importantly, the emergence of variants using the CXCR-4 co-receptor is associated with a switch from the NSI to the SI phenotype, declining CD4⁺ T-cell counts and therefore progression to AIDS.
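The co-receptor switch described above is often summarised with the naming convention already used for "X4 viruses": R5 for CCR5-using isolates, X4 for CXCR4-using isolates and R5X4 for dual-tropic isolates. A minimal Python sketch, assuming co-receptor usage has already been measured for each sequential isolate; the function names and the example series are hypothetical, and use of CCR3 or CCR2b does not change the label.

```python
from typing import Iterable, Optional


def phenotype(coreceptors: set[str]) -> str:
    """Label one isolate from the set of co-receptors it can use."""
    r5 = "CCR5" in coreceptors
    x4 = "CXCR4" in coreceptors
    if r5 and x4:
        return "R5X4"
    if x4:
        return "X4"
    if r5:
        return "R5"
    return "other"


def first_x4_capable(isolates: Iterable[set[str]]) -> Optional[int]:
    """Index of the first sequential isolate able to use CXCR4, or None."""
    for i, coreceptors in enumerate(isolates):
        if "CXCR4" in coreceptors:
            return i
    return None


# Hypothetical longitudinal series of isolates from one patient.
series = [{"CCR5"}, {"CCR5", "CCR3"}, {"CCR5", "CCR2b", "CXCR4"}]
print([phenotype(s) for s in series])  # ['R5', 'R5', 'R5X4']
print(first_x4_capable(series))        # 2
```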

A highly preferred embodiment of a polypeptide according to the invention comprises a polypeptide which is capable of binding to a nucleic acid sequence comprising a CXCR4 nucleotide sequence or a nucleotide sequence from a promoter or enhancer of CXCR4. Preferably, the CXCR4 receptor is a human CXCR4 receptor. More preferably, the CXCR4 receptor is the human CXCR4 receptor having GenBank accession number X71635 (Loetscher, M., Geiser, T., O'Reilly, T., Zwahlen, R., Baggiolini, M., and Moser, B. (1994) J. Biol. Chem. 269, 232-237). The sequence of a preferred CXCR4 receptor (GenBank accession number X71635) nucleotide sequence is shown here:

1 ttgcagatat acacttcaga taactacacc gaggaaatgg gctcagggga ctatgactcc
61 atgaaggaac cctgtttccg tgaagaaaat gctaatttca ataaaatctt cctgcccacc
121 atctactcca tcatcttctt aactggcatt gtgggcaatg gattggtcat cctggtcatg
181 ggttaccaga agaaactgag aagcatgacg gacaagtaca ggctgcacct gtcagtggcc
241 gacctcctct ttgtcatcac gcttcccttc tgggcagttg atgccgtggc aaactggtac
301 tttgggaact tcctatgcaa ggcagtccat gtcatctaca cagtcaacct ctacagcagt
361 gtcctcatcc tggccttcat cagtctggac cgctacctgg ccatcgtcca cgccaccaac
421 agtcagaggc caaggaagct gttggctgaa aaggtggtct atgttggcgt ctggatccct
481 gccctcctgc tgactat cc cgacttcatc tttgccaacg tcagtgaggc agatgacaga
541 tatatctgtg accgcttcta ccccaatgac ttgtgggtgg ttgtgttcca gtttcagcac
601 atcatggttg gccttatcct gcctggtatt gtcatcctgt cctgctattg cattatcatc
661 tccaagctgt cacactccaa gggccaccag aagcgcaagg ccctcaagac cacagtcatc
721 ctcatcctgg ctttcttcgc ctgttggctg ccttactaca ttgggatcag catcgactcc
781 ttcatcctcc tggaaatcat caagcaaggg tgtgagtttg agaacactgt gcacaagtgg
841 a tccatca ccgaggccct agctttcttc cactgttgtc tgaaccccat cctctatgct
901 ttccttggag ccaaatttaa aacctctgcc cagcacgcac tcacctctgt gagcagaggg
961 tccagcctca agatcctctc caaaggaaag cgaggtggac attcatctgt ttccactgag
1021 tctgagtctt caagttttca ctccagctaa cacagatgta aaagactttt ttttatacga
1081 taaataactt ttttttaagt tacacatttt tcagatataa aagactgacc aatattgtac
1141 agtttttatt gcttgttgga tttttgtctt gtgtttcttt agtttttgtg aagtttaatt
1201 gacttattta tataaatttt ttttgtttca tattgatgtg tgtctaggca ggacctgtgg
1261 ccaagttctt agttgctgta tgtctcgtgg taggactgta gaaaagggaa ctgaacattc
1321 cagagcgtgt agtgaatcac gtaaagctag aaatgatccc cagctgttta tgcatagata
1381 atctctccat tcccgtggaa cgtttttcct gttcttaaga cgtgattttg ctgtagaaga
1441 tggcacttat aaccaaagcc caaagtggta tagaaatgct ggtttttcag ttttcaggag
1501 tgggttgatt tcagcaccta cagtgtacag tcttgtatta agttgttaat aaaagtacat
1561 gttaaactta cttagtgtta tg
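The listing above follows the GenBank ORIGIN layout: a 1-based coordinate, then bases in blocks of ten. A minimal Python sketch that rejoins such rows and uses each coordinate as a checksum; the function name is hypothetical. Run over the listing as printed here, the check stops before the end, because some rows appear to have lost bases or picked up stray characters during extraction.

```python
def parse_genbank_origin(text: str) -> str:
    """Join ORIGIN-style rows ("1 ttgcagatat acacttcaga ...") into one sequence.

    Each row's leading 1-based coordinate is checked against the number of
    bases read so far, so a dropped or extra base is caught at the next row.
    """
    parts: list[str] = []
    length = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # blank separator line
        start, *blocks = tokens
        if not start.isdigit():
            raise ValueError(f"row lacks a leading coordinate: {line!r}")
        if int(start) != length + 1:
            raise ValueError(f"row starts at {start}, expected {length + 1}")
        row = "".join(blocks).lower()
        bad = set(row) - set("acgtn")
        if bad:
            raise ValueError(f"unexpected characters {sorted(bad)} in row {start}")
        parts.append(row)
        length += len(row)
    return "".join(parts)


seq = parse_genbank_origin("1 ttgcagatat acacttcaga\n\n21 taactacacc")
print(len(seq))  # 30
```

Rows accidentally merged onto one line are also caught, because the embedded coordinate is rejected as a non-base character.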

Preferably, the nucleotide sequence is chosen from the non-coding upstream region of the CXCR4 gene, for example, positions 1 to 68 of the above sequence.

VIRUS

In highly preferred embodiments, nucleic acid binding polypeptides of the present invention are capable of binding to nucleotide sequences of receptors or co-receptors involved in viral infection, preferably leading to their down-regulation. Thus, they are capable of reducing, preventing or alleviating the spread of infection by a number of viruses, and may hence be used for treating or preventing diseases associated with or caused by such viruses.

The virus may be an RNA virus or a DNA virus. Preferably, the virus is an integrating virus. Preferably, the virus is selected from a lentivirus and a herpesvirus. More preferably, the virus is HIV or HSV. The methods described here can therefore be used to prevent the development and establishment of diseases in humans caused by or associated with any of the above viruses, including human immunodeficiency viruses, such as HIV-1 and HIV-2, and herpesviruses, for example HSV-1, HSV-2, HHV-7 and HHV-8, as well as human cytomegalovirus, varicella-zoster virus, Epstein-Barr virus and human herpesvirus 6.

Examples of viruses which may be targeted using the present invention are given in the tables below.

DNA VIRUSES

| Family | Genus or [Subfamily] | Example | Diseases |
|---|---|---|---|
| Herpesviridae | [Alphaherpesvirinae] | Herpes simplex virus type 1 (aka HHV-1) | Encephalitis, cold sores, gingivostomatitis |
| | | Herpes simplex virus type 2 (aka HHV-2) | Genital herpes, encephalitis |
| | | Varicella zoster virus (aka HHV-3) | Chickenpox, shingles |
| | [Gammaherpesvirinae] | Epstein-Barr virus (aka HHV-4) | Mononucleosis, hepatitis, tumors (BL, NPC) |
| | | Kaposi's sarcoma-associated herpesvirus, KSHV (aka human herpesvirus 8) | Probably tumors, incl. Kaposi's sarcoma (KS) and some B-cell lymphomas |
| | [Betaherpesvirinae] | Human cytomegalovirus (aka HHV-5) | Mononucleosis, hepatitis, pneumonitis, congenital |
| | | Human herpesvirus 6 | Roseola (aka E. subitum), pneumonitis |
| | | Human herpesvirus 7 | Some cases of roseola? |
| Adenoviridae | Mastadenovirus | Human adenoviruses | 50 serotypes (species); respiratory infections |
| Papovaviridae | Papillomavirus | Human papillomaviruses | 80 species; warts and tumors |
| | Polyomavirus | JC, BK viruses | Usually mild; JC causes PML in AIDS |
| Hepadnaviridae | Orthohepadnavirus | Hepatitis B virus (HBV) | Hepatitis (chronic), cirrhosis, liver tumors |
| Poxviridae | Orthopoxvirus | Vaccinia virus | Smallpox vaccine virus |
| | | Monkeypox virus | Smallpox-like disease; a rare zoonosis (recent outbreak in Congo; 92 cases from 2/96 to 2/97) |
| | Parapoxvirus | Orf virus | Skin lesions ("pocks") |
| Parvoviridae | Erythrovirus | B19 parvovirus | E. infectiosum (aka fifth disease), aplastic crisis, fetal loss |
| | Dependovirus | Adeno-associated virus | Useful for gene therapy; integrates into chromosome |
| Circoviridae | Circovirus | TT virus (TTV) | Linked to hepatitis of unknown etiology |

RNA VIRUSES

| Family | Genus or [Subfamily] | Example | Diseases |
|---|---|---|---|
| Picornaviridae | Enterovirus | Polioviruses | 3 types; aseptic meningitis, paralytic poliomyelitis |
| | | Echoviruses | 30 types; aseptic meningitis, rashes |
| | | Coxsackieviruses | 30 types; aseptic meningitis, myopericarditis |
| | Hepatovirus | Hepatitis A virus | Acute hepatitis (fecal-oral spread) |
| | Rhinovirus | Human rhinoviruses | 115 types; common cold |
| Caliciviridae | Calicivirus | Norwalk virus | Gastrointestinal illness |
| Paramyxoviridae | Paramyxovirus | Parainfluenza viruses | 4 types; common cold, bronchiolitis, pneumonia |
| | Rubulavirus | Mumps virus | Mumps: parotitis, aseptic meningitis (rare: orchitis, encephalitis) |
| | Morbillivirus | Measles virus | Measles: fever, rash (rare: encephalitis, SSPE) |
| | Pneumovirus | Respiratory syncytial virus | Common cold (adults), bronchiolitis, pneumonia (infants) |
| Orthomyxoviridae | Influenzavirus A | Influenza virus A | Flu: fever, myalgia, malaise, cough, pneumonia |
| | Influenzavirus B | Influenza virus B | Flu: fever, myalgia, malaise, cough, pneumonia |
| Rhabdoviridae | Lyssavirus | Rabies virus | Rabies: long incubation, then CNS disease, death |
| Filoviridae | Filovirus | Ebola and Marburg viruses | Hemorrhagic fever, death |
| Bornaviridae | Bornavirus | Borna disease virus | Uncertain; linked to schizophrenia-like disease in some animals |
| Retroviridae | Deltaretrovirus | Human T-lymphotropic virus type 1 | Adult T-cell leukemia (ATL), tropical spastic paraparesis (TSP) |
| | Spumavirus | Human foamy viruses | No disease known |
| | Lentivirus | Human immunodeficiency virus types 1 and 2 | AIDS, CNS disease |
| Togaviridae | Rubivirus | Rubella virus | Mild exanthem; congenital fetal defects |
| | Alphavirus | Equine encephalitis viruses (WEE, EEE, VEE) | Mosquito-borne; encephalitis |
| Flaviviridae | Flavivirus | Yellow fever virus | Mosquito-borne; fever, hepatitis (yellow fever) |
| | | Dengue virus | Mosquito-borne; hemorrhagic fever |
| | | St. Louis encephalitis virus | Mosquito-borne; encephalitis |
| | Hepacivirus | Hepatitis C virus | Hepatitis (often chronic), liver cancer |
| | | Hepatitis G virus | Hepatitis? |
| Reoviridae | Rotavirus | Human rotaviruses | Numerous serotypes; diarrhea |
| | Coltivirus | Colorado tick fever virus | Tick-borne; fever |
| | Orthoreovirus | Human reoviruses | Minimal disease |
| Bunyaviridae | Hantavirus | Hantavirus pulmonary syndrome virus | Rodent spread; pulmonary illness (can be lethal; "Four Corners" outbreak) |
| | | Hantaan virus | Rodent spread; hemorrhagic fever with renal syndrome |
| | Phlebovirus | Rift Valley fever virus | Mosquito-borne; hemorrhagic fever |
| | Nairovirus | Crimean-Congo hemorrhagic fever virus | Tick-borne; hemorrhagic fever |
| Arenaviridae | Arenavirus | Lymphocytic choriomeningitis virus | Rodent-borne; fever, aseptic meningitis |
| | | Lassa virus | Rodent-borne; severe hemorrhagic fever (BL4 agents; also Machupo, Junin) |
| (unassigned) | Deltavirus | Hepatitis delta virus | Requires HBV to grow; hepatitis, liver cancer |
| Coronaviridae | Coronavirus | Human coronaviruses | Mild common cold-like illness |
| Astroviridae | Astrovirus | Human astroviruses | Gastroenteritis |
| Unclassified | "Hepatitis E-like viruses" | Hepatitis E virus | Hepatitis (acute); fecal-oral spread |

Receptors and co-receptors involved in viral functions such as infection, for the viruses listed above and for other viruses, are known in the art. For example, receptors for HIV infection include CD4, CCR5 and CXCR4, among others. Reference is made to Rojo D, Suetomi K, Navarro J, Biol Res 1999;32(4):263-72; and D'Souza MP, Cairns JS, Plaeger SF, JAMA 2000 Jul 12;284(2):215-22.

Receptors for Herpes Simplex Virus (HSV) infection include HveA, HveB and HveC. HveC allows both HSV-1 and HSV-2 to enter skin cells at the site of infection and to spread into cells of the nervous system. In addition to serving as a co-receptor for HSV-1 and HSV-2, this receptor permits entry into cells of other closely related herpesviruses, such as porcine pseudorabies virus and bovine herpesvirus. HveA is a member of the tumor necrosis factor receptor family, while HveB and HveC belong to the immunoglobulin gene superfamily and are closely related to the poliovirus receptor (Pvr). These receptors mediate HSV-1 infection of various cell types, including lymphoid cells (HveA) and neurons (HveC); HveB mediates infection of cells by HSV-2 and by certain alphaherpesviruses of animals, but does not allow infection of cells by wild-type HSV-1. Reference is made to Ramos-Kuri JM, Envelope and membrane glycoproteins of Herpes simplex virus, Rev Latinoam Microbiol 1992 Jan-Mar;34(1):23-3; and Rajcani J, Vojvodova A, The role of herpes simplex virus glycoproteins in the virus replication cycle, Acta Virol 1998 Apr;42(2):103-18. HAVcr-1 is a simian cellular receptor for the hepatitis A virus (HAV).

Methods of identifying receptors for viruses are known in the art, and are disclosed for example in Okuma K, Yanagi Y, Approach to the identification of virus receptor, Uirusu 1999 Jun;49(1):1-9.

HUMAN IMMUNODEFICIENCY VIRUS-1 (HIV-1)

The nucleic acid binding polypeptides of the present invention are capable of binding to nucleic acid sequences capable of encoding Human Immunodeficiency Virus (HIV) receptors and/or co-receptors such as CD4, CXCR4, CCR5, CCR2b and CCR3. We also provide nucleic acid binding polypeptides capable of treating HIV infection. The methods described here can therefore be used to prevent the development and establishment of diseases caused by or associated with human immunodeficiency viruses such as HIV-1 and HIV-2.

Human Immunodeficiency Virus (HIV) is a retrovirus that infects cells of the immune system, most importantly CD4⁺ T lymphocytes. CD4⁺ T lymphocytes are important not only for their direct role in immune function but also for stimulating normal function in other components of the immune system, including CD8⁺ T lymphocytes. HIV-infected cells have their function disturbed by several mechanisms and/or are rapidly killed by viral replication. The end result of chronic HIV infection is gradual depletion of CD4⁺ T lymphocytes, reduced immune capacity and, ultimately, the development of AIDS, leading to death.

HIV infection typically begins when an HIV particle, which contains two copies of the HIV RNA genome, encounters a cell bearing the surface molecule CD4. Cells with this molecule are known as CD4-positive (CD4⁺) cells. One or more of the virus's gp120 molecules bind tightly to CD4 molecules on the cell surface. The membranes of the virus and the cell then fuse, a process that probably involves both gp41 and a second "fusion cofactor" molecule on the cell surface. Following fusion, the virus's RNA, proteins and enzymes are released into the cell. Although CD4⁺ T cells appear to be the main target of HIV, other immune system cells with CD4 molecules on their surfaces are infected as well. Among these are long-lived cells called monocytes and macrophages, which apparently can harbor large quantities of the virus without being killed, thus acting as reservoirs of HIV. HIV may also infect cells without CD4 on their surfaces, using other docking molecules. For example, cells of the central nervous system may be infected via a receptor known as galactosylceramide. The use of such receptors as targets is also encompassed by the present invention. Cell-to-cell spread of HIV can also occur through the CD4-mediated fusion of an infected cell with an uninfected cell.

CYTOKINE RECEPTORS

In a particular embodiment of the invention, we provide a polypeptide capable of binding to a nucleic acid comprising a cytokine receptor nucleotide sequence.

Cytokines are polypeptides, typically but not exclusively produced by cells involved in the immune system, and for the purposes of this invention include those referred to as lymphokines, monokines, interleukins and interferons. They are a diverse group of proteins, although sub-groups can be identified by virtue of sequence or structural homologies. The cellular responses elicited by the interaction of cytokines with their cognate receptors are also diverse, and even for a given cytokine can differ according to the responding cell type, its differentiation stage, or its history of stimulation by other cytokines. Broadly, cytokines can regulate cell survival, proliferation, differentiation, morphology, and transcriptional and translational activities. Collectively these responses serve to establish metabolic states that protect the organism against infection by pathogens such as bacteria or viruses. However, although the production of cytokines is highly regulated, inappropriate or excessive levels can cause life-threatening inflammatory episodes or lead to chronic disease.

Preferred cytokines and their receptors according to the invention include the following: members of the TNF family (TNF α/β, lymphotoxin α/β, TRANCE/RANKL, TRAIL, FasL and all combinations of subunits that compose biologically active multimeric ligands) and their receptors; interleukin-1 (IL-1) and its receptors; IL-18 and its receptors; type I interferons (all α forms, β and ω) and their receptors; and type II interferon (γ) and its receptors. In a particularly preferred embodiment of the invention, the receptor is a TNF receptor. Examples of TNF family receptors include TNFR1 (also p55, p60), TNFRII (also p75, p80), KILLER/DR5, Fas/CD95, RANK and osteoprotegerin.

A preferred TNFR1 receptor has the accession number X69810; a preferred TNFR2 receptor has the accession number U53481; and a preferred CD95/Fas receptor has the accession number HSFASX1. It will be appreciated, however, that the sequences referred to by such accession numbers may include only cDNA (expressed sequence), or only small regions of upstream sequence. In particular, they may not include upstream regions containing promoter and enhancer sequences. Such upstream sequences may be used as target sequences for designing zinc fingers capable of binding to them and thereby regulating expression of the receptors under their control or influence. It is, however, a simple matter for the skilled person to determine the nature or sequence of such control regions using methods known in the art (for example, chromosome walking or primer extension). Furthermore, determination of a corresponding genomic sequence enables identification of promoter, enhancer or other control regions. Sequences identified in the various genome sequencing projects currently being conducted, including the Human Genome Sequence Project, will also enable the skilled person to choose sequences for targeting. Thus, for example, it is a simple matter for a person skilled in the art to identify a genomic sequence corresponding to the CD4 receptor sequence, and to identify its control regions. If necessary, reference may be made to "The International Workshop on Human Leukocyte Differentiation Antigens" (Tissue Antigens 1996 Oct;48(4 Pt 2):352-508).

TNFα

Originally identified as a factor capable of inducing the death of transformed cells, TNFα has since been found to induce a range of cellular responses. TNFα is a homotrimeric molecule formed by specific association of three polypeptide chains encoded by the TNFα gene locus. The protein is produced as a transmembrane complex; when it is expressed at the cell surface, the receptor-interacting domains are exposed to the extracellular space and can interact with their cognate receptors on adjacent cells. Although biologically active in this membrane-bound form, TNFα is generally found as a secreted product, produced by specific cleavage of the membrane form by metalloproteinases.

Produced by macrophages and T cells, particularly those of the Th1 type, TNFα is regarded as a key cytokine for inducing immune responses, particularly those associated with inflammation. When it binds to its receptors it can induce expression of MHC class I molecules, which present antigen to cytotoxic T cells, and of adhesion molecules such as ICAM, which promote recruitment of cells of the immune system to sites of inflammation. Importantly, TNFα often induces transcription of the genes for these molecules synergistically with other cytokines, particularly the interferons. While TNFα serves an important function in promoting inflammation in order to neutralise pathogens, it is often associated with a range of clinical problems (19, 24). Acute over-production of TNFα in response to bacterial toxins can cause septicaemia, toxic shock syndrome and other forms of immune damage. Chronic autoimmune diseases and other syndromes, including inflammatory bowel disease, rheumatoid arthritis, psoriasis, myocarditis, myelodysplasia, multiple sclerosis and type II diabetes, are also linked to TNFα. Murine models have shown that over-expression of TNFα can lead to myocardial fibrosis, and that this could be ameliorated by adenoviral gene therapy with a decoy TNF receptor (23). The pivotal role of TNFα in rheumatoid arthritis is illustrated by the favourable clinical responses of patients to treatment with an antibody to TNFα, Infliximab (20), or a recombinant decoy receptor, Etanercept (21).

TNFR1 RECEPTOR

In a preferred embodiment of the invention, the receptor is capable of binding a ligand that is a member of the TNF family. In a highly preferred embodiment, the ligand is TNFα. Preferably, the receptor is TNFR1 (also p55, p60, CD120a). More preferably, the TNFR1 receptor is the receptor having accession number X69810.

TNFα is recognised by two distinct cell-surface transmembrane receptors: TNFR1 (p55, p60, CD120a) and TNFRII (p75, p80, CD120b). Each appears to function as a homotrimer (17). They have strong homology in their extracellular ligand-binding domains, but diverge in their intracellular domains and thereby transmit overlapping but distinct patterns of signals when engaged by TNFα (17). Consequently, TNFR1 and TNFRII have distinct immunological functions, as found in studies of mouse strains in which the genes for one or both receptors have been knocked out. Mouse strains susceptible to myocarditis do not develop inflammatory heart disease when TNFR1 is not expressed but TNFRII is still present (22). In a murine model of experimental autoimmune encephalomyelitis (EAE), knockout of TNFR1 prevented EAE, while knockout of TNFRII exacerbated the disease (18).

Accordingly, the polypeptides provided here may be used to target any cytokine receptor, including the TNFR1 receptor, and to regulate its expression. Such polypeptides may therefore be used to treat or prevent various diseases or syndromes associated with or caused by malfunction (in particular up-regulation, over-expression or inappropriate activation) of a receptor such as TNFR1. Such diseases and syndromes include inflammation and autoimmune diseases such as autoimmune encephalomyelitis, rheumatoid arthritis and myocarditis, among others.

VARIANTS

A nucleic acid binding polypeptide molecule as provided by the present invention includes splice variants encoded by mRNA generated by alternative splicing of a primary transcript, amino acid mutants, glycosylation variants and other covalent derivatives of said molecule which retain the physiological and/or physical properties of said molecule, such as its nucleic acid binding activity. Exemplary derivatives include molecules wherein the protein of the invention is covalently modified by substitution, chemical, enzymatic, or other appropriate means with a moiety other than a naturally occurring amino acid. Such a moiety may be a detectable moiety such as an enzyme or a radioisotope, or may be a molecule capable of facilitating crossing of cell membrane(s) etc.

Derivatives can be fragments of the nucleic acid binding molecule. Fragments of said molecule comprise individual domains thereof, as well as smaller polypeptides derived from the domains. Preferably, smaller polypeptides derived from the molecule according to the invention define a single epitope which is characteristic of said molecule. Fragments may in theory be almost any size, as long as they retain one characteristic of the nucleic acid binding molecule. Preferably, fragments are at least 3 amino acids in length. Derivatives of the nucleic acid binding molecule also comprise mutants thereof, which may contain amino acid deletions, additions or substitutions, subject to the requirement to maintain at least one feature characteristic of said molecule. Thus, conservative amino acid substitutions may be made substantially without altering the nature of the molecule, as may truncations from the N- or C-terminal ends, or the corresponding 5'- or 3'-ends of a nucleic acid encoding it. Deletions or substitutions may moreover be made to the fragments of the molecule comprised by the invention. Nucleic acid binding molecule mutants may be produced from a DNA encoding a nucleic acid binding protein which has been subjected to in vitro mutagenesis resulting e.g. in an addition, exchange and/or deletion of one or more amino acids. For example, substitutional, deletional or insertional variants of the molecule can be prepared by recombinant methods and screened for nucleic acid binding activity as described herein.

The fragments, mutants and other derivatives of the polypeptide nucleic acid binding molecule preferably retain substantial homology with said molecule. As used herein, "homology" means that the two entities share sufficient characteristics for the skilled person to determine that they are similar in origin and/or function. Preferably, homology is used to refer to sequence identity. Thus, the derivatives of the molecule preferably retain substantial sequence identity with the sequence of said molecule. Examples of such sequences are presented as consensus sequences 1 to 8.
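Because substantial homology is expressed as percentage sequence identity (more than 75%, and most preferably 90% or more, as stated in the next paragraph), the underlying arithmetic can be sketched in Python. This is a minimal sketch, assuming the two sequences have already been aligned to equal length; it is not BLAST, which performs its own gapped local alignment. The function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions in two pre-aligned, equal-length sequences.

    Assumption: alignment has already been done; gap characters ("-") are
    compared like any other symbol.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def substantially_homologous(seq_a: str, seq_b: str, threshold: float = 75.0) -> bool:
    """True when identity is strictly more than the threshold (75% by default)."""
    return percent_identity(seq_a, seq_b) > threshold


if __name__ == "__main__":
    # The two consensus zinc finger sequences given later in this description.
    a = "PYKCPECGKSFSQKSDLVKHQRTHT"
    b = "PYKCSECGKAFSQKSNLTRHQRIHT"
    print(percent_identity(a, b))                # 76.0 (19 of 25 positions)
    print(substantially_homologous(a, b))        # True
    print(substantially_homologous(a, b, 90.0))  # False
```

Note that the comparison is strict, so two sequences at exactly 75% identity would not meet the "more than 75%" criterion.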

"Substantial homology", where homology indicates sequence identity, means more than 75% sequence identity, and most preferably a sequence identity of 90% or more. Amino acid sequence identity may be assessed by any suitable means, including the BLAST comparison technique, which is well known in the art and is described in Ausubel et al, Short Protocols in Molecular Biology (1999) 4th Ed, John Wiley & Sons, Inc.

MUTATIONS

Mutations may be made by any method known to those of skill in the art. Site-directed mutagenesis of a nucleic acid sequence encoding the protein of interest is, however, preferred. A number of methods for site-directed mutagenesis are known in the art, from methods employing single-stranded phage such as M13 to PCR-based techniques (see "PCR Protocols: A guide to methods and applications", M.A. Innis, D.H. Gelfand, J.J. Sninsky, T.J. White (eds.), Academic Press, New York, 1990). Preferably, the commercially available Altered Sites II Mutagenesis System (Promega) may be employed, according to the manufacturer's directions.
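As a rough illustration of the design step behind site-directed mutagenesis, the following Python sketch replaces one codon of an in-frame coding sequence and returns a mutagenic oligonucleotide carrying the change with unchanged flanking sequence on each side, together with its reverse complement. The 15-nucleotide default flank and the function names are illustrative assumptions, not parameters of the Altered Sites II system or of any other kit; practical primer design also weighs melting temperature and GC content.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_oligo(cds: str, codon_index: int, new_codon: str, flank: int = 15):
    """Replace codon `codon_index` (0-based) of an in-frame coding sequence.

    Returns (mutant_cds, sense_oligo, antisense_oligo); the oligos span the new
    codon plus up to `flank` bases of unchanged sequence on each side.
    """
    cds, new_codon = cds.upper(), new_codon.upper()
    if len(new_codon) != 3 or set(new_codon) - set("ACGT"):
        raise ValueError("new_codon must be three bases from A/C/G/T")
    start = 3 * codon_index
    if codon_index < 0 or start + 3 > len(cds):
        raise IndexError("codon_index lies outside the coding sequence")
    mutant = cds[:start] + new_codon + cds[start + 3:]
    sense = mutant[max(0, start - flank): start + 3 + flank]
    return mutant, sense, reverse_complement(sense)


if __name__ == "__main__":
    # Hypothetical 5-codon sequence (ATG AAA CGT TCT GAT); change codon 2 (CGT, Arg) to GAT (Asp).
    mutant, sense, antisense = mutagenic_oligo("ATGAAACGTTCTGAT", 2, "GAT", flank=3)
    print(mutant)     # ATGAAAGATTCTGAT
    print(sense)      # AAAGATTCT
    print(antisense)  # AGAATCTTT
```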

Screening of the proteins produced by mutant genes is preferably performed by expressing the genes and assaying the binding ability of the protein product. A simple and advantageously rapid method by which this may be accomplished is phage display, in which the mutant polypeptides are expressed as fusion proteins with the coat proteins of filamentous bacteriophage, such as the minor coat protein pIII of bacteriophage M13 or gene III of bacteriophage Fd, and displayed on the capsid of bacteriophage transformed with the mutant genes. The target nucleic acid sequence is used as a probe to bind directly to the protein on the phage surface and to select, by affinity purification, the phage carrying advantageous mutants. The phage are then amplified by passage through a bacterial host, and subjected to further rounds of selection and amplification in order to enrich the mutant pool for the desired phage and eventually isolate the preferred clone(s). Detailed methodology for phage display is known in the art and set forth, for example, in US Patent 5,223,409; Choo and Klug, (1995) Current Opinion in Biotechnology 6:431-436; Smith, (1985) Science 228:1315-1317; and McCafferty et al, (1990) Nature 348:552-554; all incorporated herein by reference. Vector systems and kits for phage display are available commercially, for example from Pharmacia.
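The benefit of repeated rounds of selection and amplification can be seen with a simple deterministic model. The Python sketch below assumes, purely for illustration, that each round retains specific binders with one probability and background phage with a much lower one, and that amplification regrows the pool without changing its composition; the retention values are hypothetical, not measured figures.

```python
from typing import List


def enrichment_by_round(f0: float, keep_binder: float, keep_background: float,
                        rounds: int) -> List[float]:
    """Fraction of specific binders in the pool after each selection round.

    f0: initial fraction of binders in the library.
    keep_binder / keep_background: fraction of each phage class retained per round.
    Amplification is assumed to restore titre without biasing the ratio.
    """
    binders, background = f0, 1.0 - f0
    history = []
    for _ in range(rounds):
        binders *= keep_binder
        background *= keep_background
        total = binders + background
        # Re-normalise: amplification restores the titre but keeps the ratio.
        binders, background = binders / total, background / total
        history.append(binders)
    return history


if __name__ == "__main__":
    # One binder per million phage; binders retained 1000-fold better than background.
    for i, f in enumerate(enrichment_by_round(1e-6, 0.1, 1e-4, 3), start=1):
        print(f"round {i}: binder fraction {f:.4f}")
```

Under these assumptions each round multiplies the binder-to-background ratio by keep_binder/keep_background (1000-fold here), so a one-in-a-million clone makes up about half the pool after two rounds and almost all of it after three.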

The present invention allows the production of what are essentially artificial nucleic acid binding proteins. In these proteins, artificial analogues of amino acids may be used, to impart the proteins with desired properties or for other reasons. Thus, the term "amino acid", particularly in the context where "any amino acid" is referred to, means any sort of natural or artificial amino acid or amino acid analogue that may be employed in protein construction according to methods known in the art. Moreover, any specific amino acid referred to herein may be replaced by a functional analogue thereof, particularly an artificial functional analogue. The nomenclature used herein therefore specifically comprises within its scope functional analogues of the defined amino acids.

The polypeptides which comprise the libraries according to the invention may comprise zinc finger polypeptides. In other words, they comprise a Cys2-His2 zinc finger motif.

Molecules according to the invention may advantageously comprise multiple zinc finger motifs. For example, molecules according to the invention may comprise any number of motifs, such as three zinc finger motifs, or may comprise four or five such motifs, or may comprise six zinc finger motifs, or even more. Advantageously, molecules according to the invention may comprise zinc finger motifs in multiples of three, such as three, six, nine or even more zinc finger motifs. Preferably, molecules according to the invention may comprise about three to about six zinc finger motifs.

NUCLEIC ACID BINDING POLYPEPTIDES

This invention relates to nucleic acid binding polypeptides. The terms "polypeptide", "peptide" and "protein" are used interchangeably to refer to a polymer of amino acid residues, preferably including naturally occurring amino acid residues. Artificial analogues of amino acids may also be used in the nucleic acid binding polypeptides, to impart the proteins with desired properties or for other reasons. The term "amino acid", particularly in the context where "any amino acid" is referred to, means any sort of natural or artificial amino acid or amino acid analogue that may be employed in protein construction according to methods known in the art. Moreover, any specific amino acid referred to herein may be replaced by a functional analogue thereof, particularly an artificial functional analogue. Polypeptides may be modified, for example by the addition of carbohydrate residues to form glycoproteins.

As used herein, "nucleic acid" includes both RNA and DNA, constructed from natural nucleic acid bases or synthetic bases, or mixtures thereof. Preferably, however, the binding polypeptides of the invention are DNA binding polypeptides.

Particularly preferred examples of nucleic acid binding polypeptides are Cys2-His2 zinc finger binding proteins which, as is well known in the art, bind to target nucleic acid sequences via α-helical zinc metal atom co-ordinated binding motifs known as zinc fingers. Each zinc finger in a zinc finger nucleic acid binding protein is responsible for determining binding to a nucleic acid triplet, or an overlapping quadruplet, in a nucleic acid binding sequence. Preferably, there are 2 or more zinc fingers, for example 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or more zinc fingers, in each binding protein. Advantageously, the number of zinc fingers in each zinc finger binding protein is a multiple of 2.
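Since each finger specifies roughly one triplet, the length of the recognised site, and hence how often that site would be expected to occur by chance, grows with the number of fingers. The Python sketch below makes this arithmetic explicit under simplifying assumptions stated in the comments (three base pairs per finger with overlapping bases ignored, equal base frequencies, both strands searched); the genome size of about 3 × 10⁹ bp is an approximate figure used only for illustration, and the function names are hypothetical.

```python
def site_length(n_fingers: int) -> int:
    """Base pairs specified, assuming one non-overlapping triplet per finger."""
    return 3 * n_fingers


def expected_chance_sites(n_fingers: int, genome_bp: float) -> float:
    """Expected random occurrences of one specific site in a genome.

    Assumes equal base frequencies and counts both strands (factor of 2).
    """
    return 2 * genome_bp / 4 ** site_length(n_fingers)


if __name__ == "__main__":
    genome = 3e9  # approximate haploid human genome size (illustrative)
    for n in (2, 3, 4, 6):
        print(n, "fingers:", site_length(n), "bp,",
              f"{expected_chance_sites(n, genome):.3g} chance sites")
```

Under these assumptions a three-finger (9 bp) site is expected tens of thousands of times in a genome of this size, whereas a six-finger (18 bp) site is expected less than once.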

All of the DNA binding residue positions of zinc fingers, as referred to herein, are numbered from the first residue in the α-helix of the finger, ranging from +1 to +9. "-1" refers to the residue in the framework structure immediately preceding the α-helix in a Cys2-His2 zinc finger polypeptide. Residues referred to as "++" are residues present in an adjacent (C-terminal) finger. Where there is no C-terminal adjacent finger, "++" interactions do not operate.

The present invention is in one aspect concerned with the production of what are essentially artificial DNA binding proteins. In these proteins, artificial analogues of amino acids may be used, to impart the proteins with desired properties or for other reasons. Thus, the term "amino acid", particularly in the context where "any amino acid" is referred to, means any sort of natural or artificial amino acid or amino acid analogue that may be employed in protein construction according to methods known in the art. Moreover, any specific amino acid referred to herein may be replaced by a functional analogue thereof, particularly an artificial functional analogue. The nomenclature used herein therefore specifically comprises within its scope functional analogues or mimetics of the defined amino acids.

The α-helix of a zinc finger binding protein aligns antiparallel to the nucleic acid strand, such that the primary nucleic acid sequence is arranged 3' to 5' in order to correspond with the N-terminal to C-terminal sequence of the zinc finger. Since nucleic acid sequences are conventionally written 5' to 3', and amino acid sequences N-terminus to C-terminus, the result is that when a nucleic acid sequence and a zinc finger protein are aligned according to convention, the primary interaction of the zinc finger is with the - strand of the nucleic acid, since it is this strand which is aligned 3' to 5'. These conventions are followed in the nomenclature used herein. It should be noted, however, that in nature certain fingers, such as finger 4 of the protein GLI, bind to the + strand of nucleic acid: see Suzuki et al, (1994) NAR 22:3397-3405 and Pavletich and Pabo, (1993) Science 261:1701-1707. The incorporation of such fingers into DNA binding molecules according to the invention is envisaged.
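The antiparallel convention means that, with the target written 5' to 3', the N-terminal finger contacts the 3'-most triplet. The minimal Python sketch below assigns triplets to fingers on that basis, assuming a plain three-bases-per-finger register and ignoring the overlapping fourth base; the function name is hypothetical.

```python
from typing import List


def triplets_by_finger(target: str) -> List[str]:
    """Return the triplet contacted by each finger, finger 1 (N-terminal) first.

    `target` is written 5'->3'. Because the helix runs antiparallel to this
    strand, finger 1 takes the 3'-most triplet and the last finger the 5'-most.
    """
    target = target.upper()
    if not target or len(target) % 3:
        raise ValueError("target length must be a non-zero multiple of 3")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    return triplets[::-1]


if __name__ == "__main__":
    print(triplets_by_finger("AAACCCGGG"))  # ['GGG', 'CCC', 'AAA']
```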

The present invention may be integrated with the rules for zinc finger polypeptide design set forth in our copending European or PCT patent applications having publication numbers WO 98/53057, WO 98/53060, WO 98/53058 and WO 98/53059, which describe improved techniques for designing zinc finger polypeptides capable of binding desired nucleic acid sequences. In combination with selection procedures such as phage display, set forth for example in WO 96/06166, these techniques enable the production of zinc finger polypeptides capable of recognising practically any desired sequence.

Thus, in one embodiment, the invention provides a method for preparing a nucleic acid binding polypeptide of the Cys2-His2 zinc finger class capable of binding to a target DNA sequence comprising a receptor nucleotide sequence, in which binding to each base of a DNA triplet by an α-helical zinc finger DNA binding module in the polypeptide is determined as follows: if the 5' base in the triplet is G, then position +6 in the α-helix is Arg and/or position ++2 is Asp; if the 5' base in the triplet is A, then position +6 in the α-helix is Gln or Glu and ++2 is not Asp; if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp, or position +6 is a hydrophobic amino acid other than Ala; if the 5' base in the triplet is C, then position +6 in the α-helix may be any amino acid, provided that position ++2 in the α-helix is not Asp; if the central base in the triplet is G, then position +3 in the α-helix is His; if the central base in the triplet is A, then position +3 in the α-helix is Asn; if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser, Ile, Leu, Thr or Val, provided that if it is Ala, then one of the residues at -1 or +6 is a small residue; if the central base in the triplet is 5-meC, then position +3 in the α-helix is Ala, Ser, Ile, Leu, Thr or Val, provided that if it is Ala, then one of the residues at -1 or +6 is a small residue; if the 3' base in the triplet is G, then position -1 in the α-helix is Arg; if the 3' base in the triplet is A, then position -1 in the α-helix is Gln and position +2 is Ala; if the 3' base in the triplet is T, then position -1 in the α-helix is Asn, or position -1 is Gln and position +2 is Ser; if the 3' base in the triplet is C, then position -1 in the α-helix is Asp and position +1 is Arg; and where the central residue of a target triplet is C, the use of Asp at position +3 of a zinc finger polypeptide allows preferential binding to C over 5-meC.

The foregoing represents a set of rules which permits the design of a zinc finger binding protein specific for any given target DNA sequence, in particular a receptor nucleotide sequence.
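The triplet rules above lend themselves to a lookup table. The following Python sketch transcribes them and, as a simplification, returns only the first-listed option for each base (in the spirit of the "default" choice discussed later in this description). The table layout, the use of "M" to denote 5-methylcytosine and the function name are assumptions made for illustration; secondary conditions, such as the small-residue proviso when Ala is chosen at +3, are noted in comments but not enforced, and placeholders such as "any" are returned as-is.

```python
from typing import Dict, List

Rule = Dict[str, str]  # helix position -> residue (one-letter) or constraint

# 5' base of the triplet: position +6, plus any constraint on ++2 (next finger's +2).
FIVE_PRIME: Dict[str, List[Rule]] = {
    "G": [{"+6": "R"}, {"++2": "D"}],  # Arg at +6 and/or Asp at ++2
    "A": [{"+6": "Q", "++2": "not D"}, {"+6": "E", "++2": "not D"}],
    "T": [{"+6": "S", "++2": "D"}, {"+6": "T", "++2": "D"},
          {"+6": "hydrophobic, not A"}],
    "C": [{"+6": "any", "++2": "not D"}],
}
# Central base: position +3. If Ala is used, -1 or +6 should be small (not enforced).
CENTRAL: Dict[str, List[Rule]] = {
    "G": [{"+3": "H"}],
    "A": [{"+3": "N"}],
    "T": [{"+3": aa} for aa in "ASILTV"],
    "M": [{"+3": aa} for aa in "ASILTV"],  # "M" = 5-meC (assumed notation)
    "C": [{"+3": "D"}],                    # Asp favours C over 5-meC
}
# 3' base: position -1, sometimes together with +1 or +2.
THREE_PRIME: Dict[str, List[Rule]] = {
    "G": [{"-1": "R"}],
    "A": [{"-1": "Q", "+2": "A"}],
    "T": [{"-1": "N"}, {"-1": "Q", "+2": "S"}],
    "C": [{"-1": "D", "+1": "R"}],
}


def default_contacts(triplet: str) -> Rule:
    """Merge the first-listed option for each base of a 5'->3' triplet."""
    five, mid, three = triplet.upper()
    contacts: Rule = {}
    for table, base in ((FIVE_PRIME, five), (CENTRAL, mid), (THREE_PRIME, three)):
        contacts.update(table[base][0])
    return contacts


if __name__ == "__main__":
    print(default_contacts("GCG"))  # {'+6': 'R', '+3': 'D', '-1': 'R'}
    print(default_contacts("TAA"))
```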

A zinc finger binding motif is a structure well known to those in the art and defined in, for example, Miller et al, (1985) EMBO J. 4:1609-1614; Berg (1988) PNAS (USA) 85:99-102; Lee et al, (1989) Science 245:635-637; see International patent applications WO 96/06166 and WO 96/32475, corresponding to USSN 08/422,107, incorporated herein by reference.

In general, a preferred zinc finger framework has the structure:

(A) X₀₋₂ C X₁₋₅ C X₉₋₁₄ H X₃₋₆ H/C

where X is any amino acid, and the numbers in subscript indicate the possible numbers of residues represented by X. The above framework may be further refined to include the structure:

(A') X₀₋₂ C X₁₋₅ C X₂₋₇ X X X X X X X H X₃₋₆ H/C

-1 1 2 3 4 5 6 7

where X is any amino acid, and the numbers in subscript indicate the possible numbers of residues represented by X.
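Framework (A) can be written as a regular expression, which gives a quick way to locate candidate Cys2-His2 fingers in a protein sequence. This Python sketch is an illustration only: it assumes one-letter amino acid codes, omits the optional leading X₀₋₂ (which does not constrain a search), and relies on C X₂₋₇ followed by seven residues being equivalent to C X₉₋₁₄, so the same pattern also covers (A'). A match identifies a candidate framework, not a demonstrated fold.

```python
import re
from typing import List, Tuple

# Framework (A): X0-2 C X1-5 C X9-14 H X3-6 H/C (leading X0-2 omitted for searching).
FRAMEWORK_A = re.compile(r"C.{1,5}C.{9,14}H.{3,6}[HC]")


def find_frameworks(protein: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, match) for each non-overlapping framework match."""
    return [(m.start(), m.end(), m.group())
            for m in FRAMEWORK_A.finditer(protein.upper())]


if __name__ == "__main__":
    # Consensus finger quoted later in this description.
    print(find_frameworks("PYKCPECGKSFSQKSDLVKHQRTHT"))
    # [(3, 24, 'CPECGKSFSQKSDLVKHQRTH')]
```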

In a preferred aspect of the present invention, zinc finger nucleic acid binding motifs may be represented as motifs having the following primary structure:

(B) X^a C X₂₋₄ C X₂₋₃ F X^c X X X X L X X H X X^b H - linker

-1 1 2 3 4 5 6 7 8 9

wherein X (including X^a, X^b and X^c) is any amino acid. X₂₋₄ and X₂₋₃ refer to the presence of 2 or 4, or 2 or 3, amino acids, respectively. The Cys and His residues, which together co-ordinate the zinc metal atom, are usually invariant, as is the Leu residue at position +4 in the α-helix. The linker, as noted elsewhere, may comprise a canonical, structured or flexible linker. Structured and flexible linkers are described below, and in our UK application numbers GB 0001582.6, GB 0013103.7 and GB 0013104.5 and our International Patent Application PCT/GB01/00202 (WO 01/53480), filed 19th January 2001, all of which are hereby incorporated by reference.

Modifications to this representation may occur or be effected without necessarily abolishing zinc finger function, by insertion, mutation or deletion of amino acids. For example, it is known that the second His residue may be replaced by Cys (Krizek et al, (1991) J. Am. Chem. Soc. 113:4518-4523) and that Leu at +4 can in some circumstances be replaced with Arg. The Phe residue before X^c may be replaced by any aromatic other than Trp. Moreover, experiments have shown that departures from the preferred structure and residue assignments for the zinc finger are tolerated and may even prove beneficial in binding to certain nucleic acid sequences. Even taking this into account, however, the general structure, involving an α-helix co-ordinated by a zinc atom which contacts four Cys or His residues, does not alter. As used herein, structures (A) and (B) above are taken as exemplary structures representing all zinc finger structures of the Cys2-His2 type.

Preferably, X^a is F/Y-X or P-F/Y-X. In this context, X is any amino acid. Preferably, in this context X is E, K, T or S. Less preferred but also envisaged are Q, V, A and P. The remaining amino acids remain possible.

Preferably, X₂₋₄ consists of two amino acids rather than four. The first of these amino acids may be any amino acid, but S, E, K, T, P and R are preferred. Advantageously, it is P or R. The second of these amino acids is preferably E, although any amino acid may be used.

Preferably, X^b is T or I. Preferably, X^c is S or T.

Preferably, X₂₋₃ is G-K-A, G-K-C, G-K-S or G-K-G. However, departures from the preferred residues are possible, for example in the form of M-R-N or M-R.

As set out above, the major binding interactions occur with amino acids -1, +3 and +6. Amino acids +4 and +7 are largely invariant. The remaining amino acids may be essentially any amino acids. Preferably, position +9 is occupied by Arg or Lys. Advantageously, positions +1, +5 and +8 are not hydrophobic amino acids, that is to say are not Phe, Trp or Tyr. Preferably, position ++2 is any amino acid, and preferably serine, save where its nature is dictated by its role as a ++2 amino acid for an N-terminal zinc finger in the same nucleic acid binding molecule.

The code provided by the present invention is not entirely rigid; certain choices are provided. For example, positions +1, +5 and +8 may have any amino acid allocation, whilst other positions may have certain options: for example, the present rules provide that, for binding to a central T residue, any one of Ala, Ser or Val may be used at +3. In its broadest sense, therefore, the present invention provides a very large number of proteins which are capable of binding to every defined target DNA triplet.

Preferably, however, the number of possibilities may be significantly reduced. For example, the non-critical residues +1, +5 and +8 may be occupied by the residues Lys, Thr and Gln respectively as a default option. In the case of the other choices, for example, the first-given option may be employed as a default. Thus, the code according to the present invention allows the design of a single, defined polypeptide (a "default" polypeptide) which will bind to its target triplet. Zinc fingers may be based on naturally occurring zinc fingers and consensus zinc fingers.
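The "default" polypeptide idea can be sketched as filling a helix template: specificity residues chosen from the rules override the defaults, while the remaining positions take the defaults named in the text (Lys at +1, Thr at +5, Gln at +8), the largely invariant Leu at +4 and His at +7, and Arg at +9 as one of the preferred residues there. Positions left unspecified by both (often +2) are filled with Ser in this sketch, a hypothetical choice echoing the serine preference stated for ++2; the function name is likewise hypothetical.

```python
from typing import Dict

HELIX_POSITIONS = ["-1", "+1", "+2", "+3", "+4", "+5", "+6", "+7", "+8", "+9"]

# Non-critical defaults and largely invariant residues named in the text.
DEFAULTS = {"+1": "K", "+4": "L", "+5": "T", "+7": "H", "+8": "Q", "+9": "R"}


def default_helix(contacts: Dict[str, str], filler: str = "S") -> str:
    """Build the -1..+9 helix as a one-letter string.

    `contacts` maps positions to single residues chosen from the rules; they
    override the defaults (e.g. Arg at +1 when the 3' base is C). Keys outside
    -1..+9, such as "++2", belong to the next finger and are ignored here.
    """
    residues = dict(DEFAULTS)
    residues.update({k: v for k, v in contacts.items() if k in HELIX_POSITIONS})
    helix = "".join(residues.get(pos, filler) for pos in HELIX_POSITIONS)
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError("contacts must be single one-letter residues")
    return helix


if __name__ == "__main__":
    # Contacts for a 5'-GCG-3' triplet: Arg at -1, Asp at +3, Arg at +6.
    print(default_helix({"-1": "R", "+3": "D", "+6": "R"}))  # RKSDLTRHQR
```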

In general, naturally occurring zinc fingers may be selected from those fingers for which the DNA binding specificity is known. For example, these may be the fingers for which a crystal structure has been resolved: namely Zif 268 (Elrod-Erickson et al, (1996) Structure 4:1171-1180), GLI (Pavletich and Pabo, (1993) Science 261:1701-1707), Tramtrack (Fairall et al, (1993) Nature 366:483-487) and YY1 (Houbaviy et al, (1996) PNAS (USA) 93:13577-13582). Preferably, the modified nucleic acid binding polypeptide is derived from Zif 268, GAC, or a Zif-GAC fusion comprising three fingers from Zif linked to three fingers from GAC. By "GAC-clone", we mean a three-finger variant of ZIF268 which is capable of binding the sequence GCGGACGCG, as described in Choo & Klug (1994), Proc. Natl. Acad. Sci USA, 91, 11163-11167.

The naturally occurring zinc finger 2 in Zif268 makes an excellent starting point from which to engineer a zinc finger and is preferred.

Consensus zinc finger structures may be prepared by comparing the sequences of known zinc fingers, irrespective of whether their binding domain is known. Preferably, the consensus structure is selected from the group consisting of the consensus structure PYKCPECGKSFSQKSDLVKHQRTHT and the consensus structure PYKCSECGKAFSQKSNLTRHQRIHT. These consensuses are derived from the consensus provided by Krizek et al, (1991) J. Am. Chem. Soc. 113: 4518-4523 and from Jacobs, (1993) PhD thesis, University of Cambridge, UK. In both cases, canonical, structured or flexible linker sequences, as described below, may be added to the ends of the consensus in order to join two zinc finger domains together.
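The position numbering used in the rules above can be applied to a finger sequence mechanically. This minimal Python sketch assumes that the seven residues immediately preceding the first zinc-coordinating His are helix positions -1 to +6. It locates them with a regular expression adapted from the consensus C-X(1-5)-C-X(2-7)-X7-H-X(3-6)-H/C. The pattern and the function name `helix_positions` are illustrative and are not part of the specification.

```python
from __future__ import annotations

import re

# Adapted from the consensus C-X(1-5)-C-X(2-7)-X7-H-X(3-6)-[H/C]; the seven
# residues captured immediately before the first His are positions -1..+6.
FINGER = re.compile(r"C.{1,5}C.{2,7}(.{7})H.{3,6}[HC]")
LABELS = ["-1", "+1", "+2", "+3", "+4", "+5", "+6"]


def helix_positions(finger: str) -> dict[str, str]:
    """Map helix positions -1..+6 to residues for a single C2H2 finger."""
    match = FINGER.search(finger)
    if match is None:
        raise ValueError("no C2H2 zinc finger motif found")
    return dict(zip(LABELS, match.group(1)))


for consensus in ("PYKCPECGKSFSQKSDLVKHQRTHT", "PYKCSECGKAFSQKSNLTRHQRIHT"):
    print(consensus, helix_positions(consensus))
```

For the two consensus structures this gives QKSDLVK and QKSNLTR at positions -1 to +6. Both carry Leu at +4, and the residue following +6 is the first His, consistent with +4 and +7 being largely invariant.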

When the nucleic acid specificity of the model finger selected is known, the mutation of the finger in order to modify its specificity to bind to the target DNA may be directed to residues known to affect binding to bases at which the natural and desired targets differ. Otherwise, mutation of the model fingers should be concentrated upon residues -1, +3, +6 and ++2 as provided for in the foregoing rules.
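Where the natural target is known, the comparison described above can be sketched as follows. This minimal Python sketch assumes the commonly used single-finger contact model, in which helix position +6 contacts the 5' base of the triplet, +3 the middle base and -1 the 3' base. That mapping is not spelled out in this paragraph. Position ++2 is omitted because it contacts the neighbouring subsite. The function name `residues_to_mutate` is hypothetical.

```python
from __future__ import annotations

# Assumed single-finger model: +6 -> 5' base, +3 -> middle base, -1 -> 3' base.
TRIPLET_CONTACTS = ["+6", "+3", "-1"]  # indexed by triplet position, 5' -> 3'


def residues_to_mutate(natural: str, desired: str) -> list[str]:
    """Return helix positions whose contacted base differs between two triplets."""
    if len(natural) != 3 or len(desired) != 3:
        raise ValueError("triplets must be three bases long")
    return [
        TRIPLET_CONTACTS[i]
        for i, (a, b) in enumerate(zip(natural.upper(), desired.upper()))
        if a != b
    ]


print(residues_to_mutate("GCG", "GAC"))
```

For instance, changing a GCG subsite to GAC differs at the middle and 3' bases, so under this model mutation would be directed to +3 and -1 (`['+3', '-1']`).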

In order to produce a binding protein having improved binding, moreover, the rules provided by the present invention may be supplemented by physical or virtual modelling of the protein/DNA interface in order to assist in residue selection.

In a further embodiment, the invention provides a method for producing a zinc finger polypeptide capable of binding to a target DNA sequence comprising a receptor nucleotide sequence, the method comprising: a) providing a nucleic acid library encoding a repertoire of zinc finger domains or modules, the nucleic acid members of the library being at least partially randomised at one or more of the positions encoding residues -1, 2, 3 and 6 of the α-helix of the zinc finger modules; b) displaying the library in a selection system and screening it against the target DNA sequence; and c) isolating the nucleic acid members of the library encoding zinc finger modules or domains capable of binding to the target sequence.

Methods for the production of libraries encoding randomised polypeptides are known in the art and may be applied in the present invention. Randomisation may be total, or partial; in the case of partial randomisation, the selected codons preferably encode options for amino acids as set forth in the rules above. Zinc finger polypeptides may be designed which specifically bind to nucleic acids incorporating the base U, in preference to the equivalent base T.
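One common way of making such a library is to encode each randomised position with the degenerate codon NNK (N = any base, K = G or T). This scheme is assumed here for illustration and is not specified in the text. The self-contained Python sketch below expands NNK against the standard genetic code. It reports how many codons, amino acids and stop codons NNK covers, and the DNA-level diversity when four positions (for example -1, +2, +3 and +6) are randomised.

```python
from __future__ import annotations

from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third base); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate bases used by the assumed NNK scheme.
IUPAC = {"N": "ACGT", "K": "GT"}


def expand(degenerate: str) -> list[str]:
    """List every codon encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC.get(b, b) for b in degenerate))]


codons = expand("NNK")
amino_acids = {CODE[c] for c in codons}
stops = sum(CODE[c] == "*" for c in codons)
positions = 4  # e.g. helix residues -1, +2, +3 and +6
print(len(codons), len(amino_acids - {"*"}), stops)
print(len(codons) ** positions)  # DNA-level variants for four positions
```

NNK covers all 20 amino acids with 32 codons and a single stop codon (TAG). With four randomised positions this gives 1,048,576 DNA variants. Partial randomisation would instead use a narrower degenerate codon, chosen to encode only the amino acids permitted by the rules.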

In a further preferred aspect, the invention comprises a method for producing a zinc finger polypeptide capable of binding to a target DNA sequence comprising a receptor nucleotide sequence, the method comprising: a) providing a nucleic acid library encoding a repertoire of zinc finger polypeptides each possessing more than one zinc finger, the nucleic acid members of the library being at least partially randomised at one or more of the positions encoding residues -1, 2, 3 and 6 of the α-helix in a first zinc finger and at one or more of the positions encoding residues -1, 2, 3 and 6 of the α-helix in a further zinc finger of the zinc finger polypeptides; b) displaying the library in a selection system and screening it against the target DNA sequence; and c) isolating the nucleic acid members of the library encoding zinc finger polypeptides capable of binding to the target sequence.

In this aspect, the invention encompasses library technology described in our copending International patent application WO 98/53057, incorporated herein by reference in its entirety. WO 98/53057 describes the production of zinc finger polypeptide libraries in which each individual zinc finger polypeptide comprises more than one, for example two or three, zinc fingers; and wherein within each polypeptide partial randomisation occurs in at least two zinc fingers. This allows for the selection of the "overlap" specificity, wherein, within each triplet, the choice of residue for binding to the third nucleotide (read 3' to 5' on the + strand) is influenced by the residue present at position +2 on the subsequent zinc finger, which displays cross-strand specificity in binding. The selection of zinc finger polypeptides incorporating cross-strand specificity of adjacent zinc fingers enables the selection of nucleic acid binding proteins more quickly, and/or with a higher degree of specificity than is otherwise possible.
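The overlapping arrangement can be sketched with a simplified positional model. The Python below assumes, as is usual for Zif268-type proteins, that the N-terminal finger binds the 3'-most triplet of the target. It further assumes that each finger's +2 residue is modelled as contacting the complementary-strand partner of the base immediately 3' of its own triplet. That base is the third nucleotide (read 3' to 5') of the preceding finger's triplet. The function name is hypothetical, and the GAC-clone site GCGGACGCG is used only as an example.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def finger_subsites(target: str, n_fingers: int) -> list[tuple[str, str | None]]:
    """Split a 5'->3' target into per-finger triplets, N-terminal finger first.

    Each entry is (triplet, overlap). overlap is the complement of the base
    immediately 3' of that finger's triplet, i.e. the base on the other strand
    that the finger's +2 residue is modelled as contacting. It is None for
    finger 1, whose 3' neighbour lies outside the given target.
    """
    target = target.upper()
    if len(target) != 3 * n_fingers:
        raise ValueError("target must be 3 bp per finger")
    result = []
    for i in range(n_fingers):
        end = len(target) - 3 * i  # finger 1 binds the 3'-most triplet
        triplet = target[end - 3:end]
        overlap = target[end].translate(COMPLEMENT) if i > 0 else None
        result.append((triplet, overlap))
    return result


print(finger_subsites("GCGGACGCG", 3))
```

For the GAC-clone site this returns `[('GCG', None), ('GAC', 'C'), ('GCG', 'C')]`. In other words, the +2 residues of fingers 2 and 3 are modelled as reading the C opposite the 5' G of the triplet bound by the finger before them.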

The above rules allow the engineering of a zinc finger capable of binding to a given nucleotide sequence. Engineering of zinc fingers which involves applying rules which specify the choice of amino acid residues based on the identity of residues in a target nucleic acid sequence is referred to here as "rule based" or "rational" design. Such rational design provides a great deal of versatility in zinc finger design, and may be used instead of, or to complement, zinc finger production by selection from libraries.

Zinc finger binding motifs designed according to the invention may be combined into nucleic acid binding polypeptide molecules having a multiplicity of zinc fingers. Preferably, the proteins have at least two zinc fingers. The presence of at least three zinc fingers is preferred. Nucleic acid binding proteins may be constructed by joining the required fingers end to end, N-terminus to C-terminus, with canonical, flexible or structured linkers, as described below. Preferably, this is effected by joining together the relevant nucleic acid sequences which encode the zinc fingers to produce a composite nucleic acid coding sequence encoding the entire binding protein.

The invention therefore provides a method for producing a DNA binding protein as defined above, wherein the DNA binding protein is constructed by recombinant DNA technology, the method comprising the steps of: preparing a nucleic acid coding sequence encoding a plurality of zinc finger domains or modules defined above; inserting the nucleic acid sequence into a suitable expression vector; and expressing the nucleic acid sequence in a host organism in order to obtain the DNA binding protein. A "leader" peptide may be added to the N-terminal finger. Preferably, the leader peptide is MAEEKP.
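The end-to-end construction described above can be illustrated at the protein level. This sketch assumes, for illustration, that each finger is written from its first hydrophobic residue to the helix-capping Thr. The consensus PYKCPECGKSFSQKSDLVKHQRTHT therefore loses its leading Pro, which the canonical linker supplies. Fingers are assumed to be joined by the canonical linker GEKP. In practice the same joining is carried out on the encoding DNA. The helper name `assemble` is hypothetical.

```python
from __future__ import annotations

LEADER = "MAEEKP"   # preferred leader peptide given in the text
CANONICAL = "GEKP"  # one of the canonical linkers GEKP/GERP/GQKP/GQRP


def assemble(fingers: list[str], linker: str = CANONICAL,
             leader: str = LEADER) -> str:
    """Join finger sequences N- to C-terminus with a linker between each pair."""
    if not fingers:
        raise ValueError("need at least one finger")
    return leader + linker.join(fingers)


# Consensus finger from the text with its leading Pro removed (assumption).
consensus_finger = "YKCPECGKSFSQKSDLVKHQRTHT"
three_finger = assemble([consensus_finger] * 3)
print(three_finger, len(three_finger))
```

A three-finger protein built this way has 6 leader residues, 3 × 24 finger residues and 2 × 4 linker residues, giving 86 residues in total.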

FLEXIBLE AND STRUCTURED LINKERS

The nucleic acid binding polypeptides according to the invention may comprise one or more linker sequences. The linker sequences may comprise one or more flexible linkers, one or more structured linkers, or any combination of flexible and structured linkers. Such linkers are disclosed in our co-pending British Patent Application Numbers 0001582.6, 0013102.9, 0013103.7, 0013104.5 and International Patent Application Number PCT/GB01/00202 (WO 01/53480), which are incorporated by reference. By "linker sequence" we mean an amino acid sequence that links together two nucleic acid binding modules. For example, in a "wild type" zinc finger protein, the linker sequence is the amino acid sequence lacking secondary structure which lies between the last residue of the α-helix in a zinc finger and the first residue of the β-sheet in the next zinc finger. The linker sequence therefore joins together two zinc fingers. Typically, the last amino acid in a zinc finger is a threonine residue, which caps the α-helix of the zinc finger, while a tyrosine/phenylalanine or another hydrophobic residue is the first amino acid of the following zinc finger. Accordingly, in a "wild type" zinc finger, glycine is the first residue in the linker, and proline is the last residue of the linker. Thus, for example, in the Zif268 construct, the linker sequence is G(E/Q)(K/R)P.

A "flexible" linker is an amino acid sequence which does not have a fixed structure (secondary or tertiary structure) in solution. Such a flexible linker is therefore free to adopt a variety of conformations. An example of a flexible linker is the canonical linker sequence GERP/GEKP/GQRP/GQKP. Flexible linkers are also disclosed in WO99/45132 (Kim and Pabo). By "structured linker" we mean an amino acid sequence which adopts a relatively well-defined conformation when in solution. Structured linkers are therefore those which have a particular secondary and/or tertiary structure in solution.

Determination of whether a particular sequence adopts a structure may be done in various ways, for example, by sequence analysis to identify residues likely to participate in protein folding, by comparison to amino acid sequences which are known to adopt certain conformations (e.g., known alpha-helix, beta-sheet or zinc finger sequences), by NMR spectroscopy, or by X-ray diffraction of crystallised peptide containing the sequence, as known in the art.

The structured linkers of our invention preferably do not bind nucleic acid, but where they do, then such binding is not sequence specific. Binding specificity may be assayed for example by gel-shift as described below. The linker may comprise any amino acid sequence that does not substantially hinder interaction of the nucleic acid binding modules with their respective target subsites. Preferred amino acid residues for flexible linker sequences include, but are not limited to, glycine, alanine, serine, threonine, proline, lysine, arginine, glutamine and glutamic acid.

The linker sequences between the nucleic acid binding domains preferably comprise five or more amino acid residues. The flexible linker sequences according to our invention consist of 5 or more residues, preferably, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 or more residues. In a highly preferred embodiment of the invention, the flexible linker sequences consist of 5, 7 or 10 residues.

Once the length of the amino acid sequence has been selected, the sequence of the linker may be selected, for example by phage display technology (see for example United States Patent No. 5,260,203) or using naturally occurring or synthetic linker sequences as a scaffold (for example, GQKP and GEKP, see Liu et al., 1997, Proc. Natl. Acad. Sci. USA 94, 5525-5530 and Whitlow et al., 1991, Methods: A Companion to Methods in Enzymology 2: 97-105). The linker sequence may be provided by insertion of one or more amino acid residues into an existing linker sequence of the nucleic acid binding polypeptide. The inserted residues may include glycine and/or serine residues. Preferably, the existing linker sequence is a canonical linker sequence selected from GEKP, GERP, GQKP and GQRP. More preferably, each of the linker sequences comprises a sequence selected from GGEKP, GGQKP, GGSGEKP, GGSGQKP, GGSGGSGEKP, and GGSGGSGQKP.
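The preferred extended linkers listed above can be generated mechanically from a canonical linker. The sketch assumes that the inserted residues are drawn from a repeating Gly-Gly-Ser pattern placed N-terminal to the canonical sequence. This reproduces the 5-, 7- and 10-residue examples given in the text, but it is only one of many possible Gly/Ser insertions. The function name `extended_linker` is hypothetical.

```python
CANONICAL = {"GEKP", "GERP", "GQKP", "GQRP"}


def extended_linker(canonical: str, n_inserted: int) -> str:
    """Insert Gly/Ser residues N-terminal to a canonical linker.

    With n_inserted = 1, 3 or 6 this reproduces GGEKP, GGSGEKP and
    GGSGGSGEKP (5, 7 and 10 residues) from GEKP.
    """
    if canonical not in CANONICAL:
        raise ValueError(f"not a canonical linker: {canonical}")
    if n_inserted < 0:
        raise ValueError("n_inserted must be non-negative")
    # Enough GGS repeats to cover n_inserted residues, then truncate.
    repeats = "GGS" * (n_inserted // 3 + 1)
    return repeats[:n_inserted] + canonical


for n in (1, 3, 6):
    print(extended_linker("GEKP", n), extended_linker("GQKP", n))
```

Selecting the inserted residues by phage display, as the text also contemplates, would replace the fixed GGS pattern with library-derived sequences of the chosen length.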

Structured linker sequences are typically of a size sufficient to confer secondary or tertiary structure to the linker; such linkers may be up to 30, 40 or 50 amino acids long. In a preferred embodiment, the structured linkers are derived from known zinc fingers which do not bind nucleic acid, or are not capable of binding nucleic acid specifically. An example of a structured linker of the first type is TFIIIA finger IV; the crystal structure of TFIIIA has been solved, and this shows that finger IV does not contact the nucleic acid (Nolte et al, 1998, Proc. Natl. Acad. Sci. USA 95, 2938-2943). An example of the latter type of structured linker is a zinc finger which has been mutagenised at one or more of its base contacting residues to abolish its specific nucleic acid binding capability. Thus, for example, a Zif268 finger 2 which has residues -1, 2, 3 and 6 of the recognition helix mutated to serines, so that it no longer specifically binds DNA, may be used as a structured linker to link two nucleic acid binding domains.
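The serine-substitution approach can be sketched as follows. The sequence of Zif268 finger 2 is not given in this section, so the example uses the first consensus finger described earlier as a stand-in. The helix is located with a pattern adapted from the consensus C-X(1-5)-C-X(2-7)-X7-H-X(3-6)-H/C, treating the seven residues before the first His as positions -1 to +6. The names are hypothetical. Whether a given mutant has actually lost specific binding would still need to be checked, for example by gel shift.

```python
import re

# Seven residues captured before the first His are taken as positions -1..+6.
FINGER = re.compile(r"C.{1,5}C.{2,7}(.{7})H.{3,6}[HC]")
# Offsets of the base-contacting residues within the captured -1..+6 window.
CONTACT_OFFSETS = {"-1": 0, "+2": 2, "+3": 3, "+6": 6}


def serine_knockout(finger: str) -> str:
    """Replace helix residues -1, +2, +3 and +6 of one finger with serine."""
    match = FINGER.search(finger)
    if match is None:
        raise ValueError("no C2H2 zinc finger motif found")
    residues = list(finger)
    start = match.start(1)
    for offset in CONTACT_OFFSETS.values():
        residues[start + offset] = "S"
    return "".join(residues)


print(serine_knockout("PYKCPECGKSFSQKSDLVKHQRTHT"))
```

Applied to the consensus finger, the helix window QKSDLVK becomes SKSSLVS, while the zinc-coordinating Cys and His residues are left untouched so that the finger can still fold.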

The use of structured or rigid linkers to jump the minor groove of DNA is likely to be especially beneficial (i) in linking zinc fingers that bind to widely separated (>3 bp) DNA sequences, and (ii) in minimising the loss of binding energy due to entropic factors.

Typically, the linkers are made using recombinant nucleic acids encoding the linker and the nucleic acid binding modules, which are fused via the linker amino acid sequence. The linkers may also be made using peptide synthesis and then linked to the nucleic acid binding modules. Methods of manipulating nucleic acids and peptide synthesis methods are known in the art (see, for example, Maniatis, et al., 1991. Molecular Cloning: A Laboratory Manual Cold Spring Harbor, New York, Cold Spring Harbor Laboratory Press).

TRANSCRIPTIONAL REGULATION

The nucleic acid binding polypeptides according to our invention may be linked to one or more transcriptional effector domains, such as an activation domain or a repressor domain. Examples of transcriptional activation domains include the VP16 transactivation domain of Herpes Simplex Virus and VP64, a tetrameric repeat of the minimal VP16 activation domain. Alternative transactivation domains are various and include the maize C1 transactivation domain sequence (Sainz et al, 1997, Mol. Cell. Biol. 17: 115-22) and P1 (Goff et al, 1992, Genes Dev. 6: 864-75; Estruch et al, 1994, Nucleic Acids Res. 22: 3983-89) and a number of other domains that have been reported from plants (see Estruch et al, 1994, ibid). Instead of incorporating a transactivator of gene expression, a repressor of gene expression can be fused to the nucleic acid binding polypeptide and used to down-regulate the expression of a gene contiguous with, or incorporating, the nucleic acid binding polypeptide target sequence. Such repressors are known in the art and include, for example, the KRAB-A domain (Moosmann et al, Biol. Chem. 378: 669-677 (1997)), the engrailed domain (Han et al., Embo J. 12: 2723-2733 (1993)) and the SNAG domain (Grimes et al, Mol Cell. Biol. 16: 6263-6272 (1996)). These can be used alone or in combination to down-regulate gene expression.

It is known that zinc finger proteins may be fused to transcriptional repression domains such as the Kruppel-associated box (KRAB) domain to form powerful repressors. These fusions are known to repress expression of a reporter gene even when bound to sites a few kilobase pairs upstream from the promoter of the gene (Margolin et al., 1994, PNAS USA 91, 4509-4513). Thus, zinc finger-KRAB fusion proteins capable of binding to nucleic acid sequences comprising a receptor nucleotide sequence may be used to regulate expression, preferably enable down regulation, of receptors.

MULTIFINGER POLYPEPTIDES

According to a preferred embodiment of the present invention, the nucleic acid binding polypeptides comprise a plurality of binding domains or motifs. For example, a preferred zinc finger polypeptide according to the invention comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, etc. or more zinc finger binding domains or motifs. Highly preferred embodiments are zinc finger polypeptides which comprise three zinc finger motifs and those which comprise six finger motifs.
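One way to relate the number of fingers to target-site length and specificity is a short calculation. The sketch below is an illustration only. It assumes that each finger reads one 3 bp triplet and that bases occur at equal frequency in random sequence. `genome_bp` defaults to an assumed round figure of 3e9 bp. None of these figures comes from the text, and the function names are hypothetical.

```python
def site_length(n_fingers: int) -> int:
    """Each finger reads one 3 bp triplet, so n fingers span 3n bp."""
    return 3 * n_fingers


def expected_random_sites(n_fingers: int, genome_bp: float = 3e9) -> float:
    """Expected chance occurrences of one specific site in random DNA.

    Assumes equal base frequencies and counts both strands; genome_bp is an
    assumed round figure for a human-sized haploid genome.
    """
    return 2 * genome_bp / 4 ** site_length(n_fingers)


for n in (3, 6):
    print(n, site_length(n), expected_random_sites(n))
```

Under these assumptions a 9 bp (three-finger) site is expected to occur by chance tens of thousands of times in a genome of that size, whereas an 18 bp (six-finger) site is expected to occur less than once.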

Zinc finger polypeptides comprising multiple fingers may be constructed by joining together two or more zinc finger polypeptides (which may themselves be selected using phage display, as described elsewhere in this document) with suitable linker sequences. Means of joining polypeptide sequences, for example, by recombinant DNA technology are known in the art, and are for example disclosed in Sambrook et al (supra) and Ausubel et al (supra). Furthermore, other sequences such as nuclear localisation sequences and "tag" sequences for purification may be included as known in the art. A specific example of production of a six finger protein 6F6 is described in the Examples below.

VECTORS

The nucleic acid encoding the nucleic acid binding protein according to the invention may be incorporated into vectors for further manipulation. As used herein, vector (or plasmid) refers to discrete elements that are used to introduce heterologous nucleic acid into cells for either expression or replication thereof. Selection and use of such vehicles are well within the ability of the person of ordinary skill in the art. Many vectors are available, and selection of an appropriate vector will depend on the intended use of the vector, i.e. whether it is to be used for DNA amplification or for nucleic acid expression, the size of the DNA to be inserted into the vector, and the host cell to be transformed with the vector. Each vector contains various components depending on its function (amplification of DNA or expression of DNA) and the host cell with which it is compatible. The vector components generally include, but are not limited to, one or more of the following: an origin of replication, one or more marker genes, an enhancer element, a promoter, a transcription termination sequence and a signal sequence.

Both expression and cloning vectors generally contain a nucleic acid sequence that enables the vector to replicate in one or more selected host cells. Typically in cloning vectors, this sequence is one that enables the vector to replicate independently of the host chromosomal DNA, and includes origins of replication or autonomously replicating sequences. Such sequences are well known for a variety of bacteria, yeast and viruses. The origin of replication from the plasmid pBR322 is suitable for most Gram-negative bacteria, the 2μ plasmid origin is suitable for yeast, and various viral origins (e.g. SV40, polyoma, adenovirus) are useful for cloning vectors in mammalian cells. Generally, the origin of replication component is not needed for mammalian expression vectors unless these are used in mammalian cells competent for high level DNA replication, such as COS cells.

Most expression vectors are shuttle vectors, i.e. they are capable of replication in at least one class of organisms but can be transfected into another class of organisms for expression. For example, a vector is cloned in E. coli and then the same vector is transfected into yeast or mammalian cells even though it is not capable of replicating independently of the host cell chromosome. DNA may also be replicated by insertion into the host genome. However, the recovery of genomic DNA encoding the nucleic acid binding protein is more complex than that of exogenously replicated vector because restriction enzyme digestion is required to excise nucleic acid binding protein DNA. DNA can be amplified by PCR and be directly transfected into the host cells without any replication component.

SELECTABLE MARKERS

Advantageously, an expression and cloning vector may contain a selection gene also referred to as selectable marker. This gene encodes a protein necessary for the survival or growth of transformed host cells grown in a selective culture medium. Host cells not transformed with the vector containing the selection gene will not survive in the culture medium. Typical selection genes encode proteins that confer resistance to antibiotics and other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement auxotrophic deficiencies, or supply critical nutrients not available from complex media.

As to a selective gene marker appropriate for yeast, any marker gene can be used which facilitates the selection for transformants due to the phenotypic expression of the marker gene. Suitable markers for yeast are, for example, those conferring resistance to the antibiotics G418, hygromycin or bleomycin, or those providing for prototrophy in an auxotrophic yeast mutant, for example the URA3, LEU2, LYS2, TRP1, or HIS3 gene. Since the replication of vectors is conveniently done in E. coli, an E. coli genetic marker and an E. coli origin of replication are advantageously included. These can be obtained from E. coli plasmids, such as pBR322, Bluescript vector or a pUC plasmid, e.g. pUC18 or pUC19, which contain both E. coli replication origin and E. coli genetic marker conferring resistance to antibiotics, such as ampicillin.

Suitable selectable markers for mammalian cells are those that enable the identification of cells competent to take up nucleic acid binding protein nucleic acid, such as dihydrofolate reductase (DHFR, methotrexate resistance), thymidine kinase, or genes conferring resistance to G418 or hygromycin. The mammalian cell transformants are placed under selection pressure under which only those transformants which have taken up and are expressing the marker are uniquely adapted to survive. In the case of a DHFR or glutamine synthase (GS) marker, selection pressure can be imposed by culturing the transformants under conditions in which the pressure is progressively increased, thereby leading to amplification (at its chromosomal integration site) of both the selection gene and the linked DNA that encodes the nucleic acid binding protein. Amplification is the process by which genes in greater demand for the production of a protein critical for growth, together with closely associated genes which may encode a desired protein, are reiterated in tandem within the chromosomes of recombinant cells. Increased quantities of desired protein are usually synthesised from thus amplified DNA.

EXPRESSION

Expression and cloning vectors usually contain a promoter that is recognised by the host organism and is operably linked to nucleic acid binding protein encoding nucleic acid. Such a promoter may be inducible or constitutive. The promoters are operably linked to DNA encoding the nucleic acid binding protein by removing the promoter from the source DNA by restriction enzyme digestion and inserting the isolated promoter sequence into the vector. Both the native nucleic acid binding protein promoter sequence and many heterologous promoters may be used to direct amplification and/or expression of nucleic acid binding protein encoding DNA. Promoters suitable for use with prokaryotic hosts include, for example, the β-lactamase and lactose promoter systems, alkaline phosphatase, the tryptophan (trp) promoter system and hybrid promoters such as the tac promoter. Their nucleotide sequences have been published, thereby enabling the skilled worker operably to ligate them to DNA encoding nucleic acid binding protein, using linkers or adapters to supply any required restriction sites. Promoters for use in bacterial systems will also generally contain a Shine-Dalgarno sequence operably linked to the DNA encoding the nucleic acid binding protein.

Preferred expression vectors are bacterial expression vectors which comprise a promoter of a bacteriophage such as phage λ or T7 which is capable of functioning in the bacteria. In one of the most widely used expression systems, the nucleic acid encoding the fusion protein may be transcribed from the vector by T7 RNA polymerase (Studier et al, Methods in Enzymol. 185: 60-89, 1990). In the E. coli BL21(DE3) host strain, used in conjunction with pET vectors, the T7 RNA polymerase is produced from the λ-lysogen DE3 in the host bacterium, and its expression is under the control of the IPTG-inducible lacUV5 promoter. This system has been employed successfully for over-production of many proteins. Alternatively, the polymerase gene may be introduced on a lambda phage by infection with an int- phage such as the CE6 phage, which is commercially available (Novagen, Madison, USA). Other vectors include vectors containing the lambda PL promoter such as pLEX (Invitrogen, NL), vectors containing the trc promoter such as pTrcHisXpress™ (Invitrogen) or pTrc99 (Pharmacia Biotech, SE), or vectors containing the tac promoter such as pKK223-3 (Pharmacia Biotech) or pMAL (New England Biolabs, MA, USA).

Moreover, the nucleic acid binding protein gene according to the invention preferably includes a secretion sequence in order to facilitate secretion of the polypeptide from bacterial hosts, such that it will be produced as a soluble native peptide rather than in an inclusion body. The peptide may be recovered from the bacterial periplasmic space, or the culture medium, as appropriate. A "leader" peptide may be added to the N-terminal finger. Preferably, the leader peptide is MAEEKP.

Suitable promoting sequences for use with yeast hosts may be regulated or constitutive and are preferably derived from a highly expressed yeast gene, especially a Saccharomyces cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or ADHII gene, the acid phosphatase (PHO5) gene, a promoter of the yeast mating pheromone genes coding for the a- or α-factor, or a promoter derived from a gene encoding a glycolytic enzyme such as the promoter of the enolase, glyceraldehyde-3-phosphate dehydrogenase (GAP), 3-phosphoglycerate kinase (PGK), hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triose phosphate isomerase, phosphoglucose isomerase or glucokinase genes, or a promoter from the TATA binding protein (TBP) gene can be used. Furthermore, it is possible to use hybrid promoters comprising upstream activation sequences (UAS) of one yeast gene and downstream promoter elements including a functional TATA box of another yeast gene, for example a hybrid promoter including the UAS(s) of the yeast PHO5 gene and downstream promoter elements including a functional TATA box of the yeast GAP gene (PHO5-GAP hybrid promoter). A suitable constitutive PHO5 promoter is, for example, a shortened acid phosphatase PHO5 promoter devoid of the upstream regulatory elements (UAS), such as the PHO5 (-173) promoter element starting at nucleotide -173 and ending at nucleotide -9 of the PHO5 gene.

Nucleic acid binding protein gene transcription from vectors in mammalian hosts may be controlled by promoters derived from the genomes of viruses such as polyoma virus, adenovirus, fowlpox virus, bovine papilloma virus, avian sarcoma virus, cytomegalovirus (CMV), a retrovirus and Simian Virus 40 (SV40), from heterologous mammalian promoters such as the actin promoter or a very strong promoter, e.g. a ribosomal protein promoter, and from the promoter normally associated with nucleic acid binding protein sequence, provided such promoters are compatible with the host cell systems.

Transcription of a DNA encoding nucleic acid binding protein by higher eukaryotes may be increased by inserting an enhancer sequence into the vector. Enhancers are relatively orientation and position independent. Many enhancer sequences are known from mammalian genes (e.g. elastase and globin). However, typically one will employ an enhancer from a eukaryotic cell virus. Examples include the SV40 enhancer on the late side of the replication origin (bp 100-270) and the CMV early promoter enhancer. The enhancer may be spliced into the vector at a position 5' or 3' to nucleic acid binding protein DNA, but is preferably located at a site 5' from the promoter.

Advantageously, a eukaryotic expression vector encoding a nucleic acid binding protein according to the invention may comprise a locus control region (LCR). LCRs are capable of directing high-level, integration-site-independent expression of transgenes integrated into host cell chromatin, which is of importance especially where the nucleic acid binding protein gene is to be expressed in the context of a permanently-transfected eukaryotic cell line in which chromosomal integration of the vector has occurred, or in transgenic animals.

Eukaryotic vectors may also contain sequences necessary for the termination of transcription and for stabilising the mRNA. Such sequences are commonly available from the 5' and 3' untranslated regions of eukaryotic or viral DNAs or cDNAs. These regions contain nucleotide segments transcribed as polyadenylated fragments in the untranslated portion of the mRNA encoding nucleic acid binding protein.

An expression vector includes any vector capable of expressing nucleic acid binding protein nucleic acids that are operatively linked with regulatory sequences, such as promoter regions, that are capable of expression of such DNAs. Thus, an expression vector refers to a recombinant DNA or RNA construct, such as a plasmid, a phage, recombinant virus or other vector, that upon introduction into an appropriate host cell, results in expression of the cloned DNA. Appropriate expression vectors are well known to those with ordinary skill in the art and include those that are replicable in eukaryotic and/or prokaryotic cells and those that remain episomal or those which integrate into the host cell genome. For example, DNAs encoding nucleic acid binding protein may be inserted into a vector suitable for expression of cDNAs in mammalian cells, e.g. a CMV enhancer-based vector such as pEVRF (Matthias, et al., (1989) NAR 17, 6418).

Particularly useful for practising the present invention are expression vectors that provide for the transient expression of DNA encoding nucleic acid binding protein in mammalian cells. Transient expression usually involves the use of an expression vector that is able to replicate efficiently in a host cell, such that the host cell accumulates many copies of the expression vector, and, in turn, synthesises high levels of nucleic acid binding protein. For the purposes of the present invention, transient expression systems are useful e.g. for identifying nucleic acid binding protein mutants, to identify potential phosphorylation sites, or to characterise functional domains of the protein.

Construction of vectors according to the invention employs conventional ligation techniques. Isolated plasmids or DNA fragments are cleaved, tailored, and religated in the form desired to generate the plasmids required. If desired, analysis to confirm correct sequences in the constructed plasmids is performed in a known fashion. Suitable methods for constructing expression vectors, preparing in vitro transcripts, introducing DNA into host cells, and performing analyses for assessing nucleic acid binding protein expression and function are known to those skilled in the art. Gene presence, amplification and/or expression may be measured in a sample directly, for example, by conventional Southern blotting, Northern blotting to quantitate the transcription of mRNA, dot blotting (DNA or RNA analysis), or in situ hybridisation, using an appropriately labelled probe which may be based on a sequence provided herein. Those skilled in the art will readily envisage how these methods may be modified, if desired.

In accordance with another embodiment of the present invention, there are provided cells containing the above-described nucleic acids. Such host cells such as prokaryote, yeast and higher eukaryote cells may be used for replicating DNA and producing the nucleic acid binding protein. Suitable prokaryotes include eubacteria, such as Gram-negative or Gram-positive organisms, such as E. coli, e.g. E. coli K-12 strains, DH5α and HB101, or Bacilli. Further hosts suitable for the nucleic acid binding protein encoding vectors include eukaryotic microbes such as filamentous fungi or yeast, e.g. Saccharomyces cerevisiae. Higher eukaryotic cells include insect and vertebrate cells, particularly mammalian cells including human cells or nucleated cells from other multicellular organisms. In recent years propagation of vertebrate cells in culture (tissue culture) has become a routine procedure. Examples of useful mammalian host cell lines are epithelial or fibroblastic cell lines such as Chinese hamster ovary (CHO) cells, NIH 3T3 cells, HeLa cells or 293T cells. The host cells referred to in this disclosure comprise cells in in vitro culture as well as cells that are within a host animal.

DNA may be stably incorporated into cells or may be transiently expressed using methods known in the art. Stably transfected mammalian cells may be prepared by transfecting cells with an expression vector having a selectable marker gene, and growing the transfected cells under conditions selective for cells expressing the marker gene. To prepare transient transfectants, mammalian cells are transfected with a reporter gene to monitor transfection efficiency.

To produce such stably or transiently transfected cells, the cells should be transfected with a sufficient amount of the nucleic acid binding protein-encoding nucleic acid to form the nucleic acid binding protein. The precise amounts of DNA encoding the nucleic acid binding protein may be empirically determined and optimised for a particular cell and assay.

Host cells are transfected or, preferably, transformed with the above-described expression or cloning vectors of this invention and cultured in conventional nutrient media modified as appropriate for inducing promoters, selecting transformants, or amplifying the genes encoding the desired sequences. Heterologous DNA may be introduced into host cells by any method known in the art, such as transfection with a vector encoding a heterologous DNA by the calcium phosphate coprecipitation technique or by electroporation. Numerous methods of transfection are known to the skilled worker in the field. Successful transfection is generally recognised when any indication of the operation of this vector occurs in the host cell. Transformation is achieved using standard techniques appropriate to the particular host cells used.

Incorporation of cloned DNA into a suitable expression vector, transfection of eukaryotic cells with a plasmid vector or a combination of plasmid vectors, each encoding one or more distinct genes or with linear DNA, and selection of transfected cells are well known in the art (see, e.g. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press).

Transfected or transformed cells are cultured using media and culturing methods known in the art, preferably under conditions whereby the nucleic acid binding protein encoded by the DNA is expressed. The composition of suitable media is known to those in the art, so that they can be readily prepared. Suitable culturing media are also commercially available.

Nucleic acid binding molecules according to the invention may be employed in a wide variety of applications, including diagnostics and as research tools. Advantageously, they may be employed as diagnostic tools for identifying the presence of nucleic acid molecules in a complex mixture.

Preferred molecules according to the invention have gene-specific DNA binding activity. These may be constructed by the engineering of DNA-binding polypeptide domains with given DNA sequence-specificity, to target the appropriate gene(s).

Given the speed and convenience with which a great number of selections can be performed in parallel using the bipartite library strategy, we believe that the system is of great utility. The 'bipartite' system is a most time- and cost-effective general method of engineering zinc fingers by phage display.

Described herein is a rapid and convenient method that can be used to design zinc finger proteins against an unlimited set of DNA binding sites. This is based on a pair of pre-made zinc finger phage display libraries, which are used in parallel to select two DNA-binding domains that each recognise given 5 bp sequences, and whose products are recombined to produce a single protein that recognises a composite (10 bp) site of predefined sequence. Engineering using this system can be completed in less than two weeks and yields polypeptide molecules that bind sequence-specifically to DNA with dissociation constants (Kd) in the nanomolar range. Library selection is therefore suitable for production of zinc fingers capable of binding to sequences comprising receptor nucleotide sequences. Our methods enable the production of polypeptides capable of binding to any receptor nucleotide sequence, by identification of a motif or sequence within that sequence, and selection of one or more zinc fingers (or other nucleic acid binding polypeptides) which bind to that sequence or motif.

As used herein, the term 'region' may mean part, segment, locus, area, fragment, motif, domain, section, site or similar part of said promoter, and may even include the promoter in its entirety. Thus, the phrase 'region of the/a ... promoter' includes segment(s), fragments etc. of the promoter, and may include the whole promoter, or motifs therein such as transcription factor binding site(s), or other such parts thereof.

Presented herein is a zinc finger engineering strategy which (i) yields zinc finger polymers that bind DNA specifically, with good affinity, and without significant sequence restrictions on the generation of such polymer molecules, (ii) can be executed relatively rapidly, and (iii) can be easily adapted to a high-throughput automated format. This strategy is based on recent advances in our understanding of zinc finger function, particularly the phenomenon of synergistic DNA recognition by adjacent zinc fingers (11, 18), in combination with certain technical advances in zinc finger library design as discussed herein. The invention thus relates to the construction of a zinc finger library according to the new strategy disclosed herein.

It should be noted that it is possible for the recombinant proteins of the present invention to feature idiosyncratic combinations of amino acids that would not necessarily have been predicted by a recognition code. This is particularly true of the combinations of amino acids that are responsible for the inter-finger synergy that allows any base-pair to be specified at the interface of zinc finger DNA subsites (11).

Zinc finger domains may be made by methods described and/or referred to herein. For example, said zinc finger DNA binding domains may be made as discussed in the examples, or as described in one or more of WO96/06166, WO98/53058, WO98/53057, or WO98/53060.

LIBRARY

The nucleic acid binding polypeptides, including zinc finger polypeptides, capable of binding and preferably downregulating nucleic acid sequences comprising receptor nucleotide sequences, may be selected from a library.

The term "library" is used according to its common usage in the art, to denote a collection of polypeptides or, preferably, nucleic acids encoding polypeptides. The polypeptides of the invention contain regions of randomisation, such that each library will comprise or encode a repertoire of polypeptides, wherein individual polypeptides differ in sequence from each other. The same principle is present in virtually all libraries developed for selection, such as by phage display.

Randomisation, as used herein, refers to the variation of the sequence of the polypeptides which comprise the library, such that various amino acids may be present at any given position in different polypeptides. Randomisation may be complete, such that any amino acid may be present at a given position, or partial, such that only certain amino acids are present. Preferably, the randomisation is achieved by mutagenesis at the nucleic acid level, for example by synthesising novel genes encoding mutant proteins and expressing these to obtain a variety of different proteins. Alternatively, existing genes can themselves be mutated, such as by site-directed or random mutagenesis, in order to obtain the desired mutant genes.

THE 'BIPARTITE' LIBRARY STRATEGY

Nucleic acids encoding nucleic acid binding polypeptides according to the invention may be selected using a 'bipartite-complementary' system for the construction of DNA-binding domains by phage display. This system comprises two master libraries, Lib12 and Lib23, each of which encodes variants of a three-finger DNA-binding domain based on that of the transcription factor Zif268 (6, 19). The two libraries are complementary because Lib12 contains randomisations in all the base-contacting positions of F1 and certain base-contacting positions of F2, while Lib23 contains randomisations in the remaining base-contacting positions of F2 and all the base-contacting positions of F3. The non-randomised DNA-contacting residues carry the nucleotide specificity of the parental Zif268 DNA-binding domain.
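As a rough illustration of how a 10 bp composite target might be divided between the two libraries, the hypothetical helper below splits it into two 5 bp halves. It assumes the usual Zif268 orientation, in which F1 contacts the 3' end of the site and F3 the 5' end, so that the Lib12 selection (F1 plus part of F2) would address the 3' half and the Lib23 selection (part of F2 plus F3) the 5' half. The class and function names are illustrative only, and the exact selection sites used in the protocol are not reproduced here.

```python
from typing import NamedTuple


class BipartiteSplit(NamedTuple):
    lib23_half: str  # 5' half: assumed to be read by F3 plus part of F2 (Lib23)
    lib12_half: str  # 3' half: assumed to be read by F1 plus part of F2 (Lib12)


def split_composite_site(target: str) -> BipartiteSplit:
    """Hypothetical helper: divide a 10 bp composite site into two 5 bp halves.

    Assumes the Zif268 orientation (F1 binds at the 3' end, F3 at the 5' end).
    """
    target = target.upper()
    if len(target) != 10 or set(target) - set("ACGT"):
        raise ValueError("expected a 10 bp sequence of A, C, G and T")
    return BipartiteSplit(lib23_half=target[:5], lib12_half=target[5:])


print(split_composite_site("CCGCCCCAGC"))
# BipartiteSplit(lib23_half='CCGCC', lib12_half='CCAGC')
```

Each half would then be the subject of a separate, parallel selection, and the two selected gene fragments recombined into one three-finger coding sequence.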

The design of the bipartite system features at least two modifications to the conventional zinc finger engineering strategies. As described above, each library contains members that are randomised in the α-helical DNA-contacting residues from more than one zinc finger. We have shown that the simultaneous randomisation of positions from adjacent fingers results in selected zinc finger pairs that can achieve comprehensive DNA recognition, i.e. bind DNA without significant sequence limitations.

The proteins produced by these libraries are therefore not limited to binding DNA sequences of the form GNNGNN..., as is the case with many prior art libraries (e.g. 9, 13, 20).

The repertoire of randomisations does not encode all 20 amino acids, rather representing only those residues that most frequently function in sequence-specific DNA binding from the respective α-helical positions. Excluding the residues that do not frequently function in DNA recognition advantageously helps to reduce the library size and/or the 'noise' associated with non-specific binding members of the library. The bipartite selection strategy allows the recombination in vitro of the complementary portions of the two libraries, without the need for further purification steps. We take advantage of selective PCR, so as to amplify only the products of recombination. PCR with enzymes lacking 3'→5' exonuclease (proofreading) activity cannot proceed if primers contain one or more 3' mismatches against their template binding sites. The two complementary libraries may therefore be designed with unique sequences at their 5' and 3' termini, and the corresponding primers used to amplify any recombinants of the two libraries. Furthermore, the selection procedure is amenable to a microtitre plate format so that selections and most subsequent manipulations may be automated (e.g., be carried out using liquid handling robots).
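The selective PCR idea can be sketched with a toy model: a primer is treated as extendable only if its 3'-terminal base pairs correctly, since a polymerase that cannot remove a mismatched 3'-terminal base stalls there. The terminal tag sequences, the mismatch allowance and the function names below are all hypothetical, and real discrimination also depends on the polymerase, annealing temperature and primer design.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_extends(primer: str, strand: str, max_mismatches: int = 2) -> bool:
    """Toy model: the primer is compared with the first len(primer) bases of the
    strand it is identical to (read 5'->3'); a 3'-terminal mismatch blocks extension."""
    site = strand[: len(primer)]
    if len(site) < len(primer):
        return False
    if primer[-1] != site[-1]:
        return False
    return sum(p != s for p, s in zip(primer, site)) <= max_mismatches


def amplifiable(gene: str, fwd: str, rev: str) -> bool:
    """Exponential amplification needs both the forward and the reverse primer to extend."""
    return primer_extends(fwd, gene) and primer_extends(rev, revcomp(gene))


# Hypothetical terminal tags: the 5' tags differ only at the base matching the
# forward primer's 3' end, the 3' tags only at the base matching the reverse primer's 3' end.
LIB12_5P, LIB23_5P = "ATGGCAGA", "ATGGCAGT"
LIB12_3P, LIB23_3P = "GCTTAGGA", "ACTTAGGA"
CORE = "GGGCGGCCGC"  # stands in for the zinc finger coding region

fwd = LIB12_5P           # primes at the Lib12-derived 5' end
rev = revcomp(LIB23_3P)  # primes at the Lib23-derived 3' end

candidates = {
    "Lib12 parent": LIB12_5P + CORE + LIB12_3P,
    "Lib23 parent": LIB23_5P + CORE + LIB23_3P,
    "recombinant": LIB12_5P + CORE + LIB23_3P,
}
for name, gene in candidates.items():
    print(name, amplifiable(gene, fwd, rev))
# Lib12 parent False
# Lib23 parent False
# recombinant True
```

Only the molecule carrying the Lib12 5' terminus joined to the Lib23 3' terminus satisfies both primers, which is why a mixed population can be used directly as template.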

Most of the steps of the engineering process using our bipartite protocol - bacterial growth, phage selection, colony picking, phage ELISA, PCR and cloning - may be automated using commercially available instruments. Microtitre plates, such as 96 or 384 well microtitre plates, may be used to carry out phage selections, ELISA reactions and PCR preparation on a liquid-handling robotic platform. A robotic arm shuttles the microtitre plates between a pipetting station, a plate hotel, a plate washer, a spectrophotometer, and a PCR block. A colony picking robot may be used to inoculate micro-cultures of bacteria in microtitre plates in order to provide monoclonal phage for ELISA. A robot may be used that interfaces with the spectrophotometer and which is capable of returning to the liquid culture archive in order to 'cherry-pick' particular clones that are suitable for recombination, or which should be archived. A bar-coding system may be used to keep track of the various plates used for phage selections, phage ELISAs or for archiving interesting clones.
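One possible way to track bar-coded plates and cherry-picked clones is sketched below. The row-major well numbering matches the standard 96- and 384-well layouts; the bar-code format, the record type and the simple signal-threshold picking rule are hypothetical and stand in for whatever criteria an ELISA-driven workflow would actually use.

```python
from dataclasses import dataclass
from string import ascii_uppercase

# Standard microplate layouts: (rows, columns).
PLATE_LAYOUTS = {96: (8, 12), 384: (16, 24)}


def well_name(index: int, plate_size: int = 96) -> str:
    """Convert a 0-based, row-major well index into a name such as 'A1' or 'P24'."""
    rows, cols = PLATE_LAYOUTS[plate_size]
    if not 0 <= index < rows * cols:
        raise IndexError(f"well index {index} out of range for a {plate_size}-well plate")
    return f"{ascii_uppercase[index // cols]}{index % cols + 1}"


@dataclass(frozen=True)
class WellRef:
    barcode: str  # plate bar code (hypothetical format)
    well: str     # e.g. 'C7'


def cherry_pick(barcode: str, elisa_signals: list[float], threshold: float,
                plate_size: int = 96) -> list[WellRef]:
    """Return the wells whose ELISA signal reaches a threshold (hypothetical rule)."""
    return [WellRef(barcode, well_name(i, plate_size))
            for i, signal in enumerate(elisa_signals) if signal >= threshold]


picks = cherry_pick("PLATE-0001", [0.1, 0.9, 0.2, 1.5], threshold=0.5)
print(picks)
# [WellRef(barcode='PLATE-0001', well='A2'), WellRef(barcode='PLATE-0001', well='A4')]
```

Keying every clone by (bar code, well) lets a robot return to the liquid culture archive for exactly the wells selected from the spectrophotometer readout.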

The ability to carry out selective PCR implies that the protocol may even be adapted to selecting complementary library portions in the same tube or well. For example, both universal libraries may be co-screened in a single well, thereby increasing the efficiency of high throughput applications. The output of such combined selections may be monitored by any means, for example, by selective PCR, or by ELISA of samples of isolated clones, etc. This system is further discussed elsewhere in this application, such as in the Examples section.

In a preferred embodiment, the nucleic acid binding molecules of the invention can be incorporated into an ELISA assay. For example, phage displaying the molecules of the invention can be used to detect the presence of the target nucleic acid, and visualised using enzyme-linked anti-phage antibodies. The sites at which molecules according to the invention bind the target nucleic acid molecule may be determined by methods known in the art for example using binding assays, footprinting, truncation or mutant analysis.

ENGINEERING ZINC FINGER POLYPEPTIDE MOLECULES

The engineering of zinc finger polypeptides according to the invention preferably utilises a strategy of engineering zinc finger DNA-binding domains by phage display which has distinct advantages over the existing methods (1, 2), resulting in an advance in our ability to select and/or produce DNA-binding proteins.

Such a strategy can produce zinc fingers binding to diverse DNA sequences, while other methods yield proteins that require the presence of a G nucleotide at every third base position (13, 20). This feature is based upon an improvement of our understanding of the synergistic nature of zinc finger interactions, as discussed herein. Prior art techniques have been confined to small subsets of G-rich DNA sequences. The ability to bind a variety of DNA sequences enables targeting of any given promoter in the genome, and is an advantageous feature of at least one aspect of the present invention.

Another advantage of the methods described here is the speed with which DNA-binding domains may be produced. The main reason for the relatively fast turnover is that our new system takes advantage of pre-made phage display libraries, rather than being based on recurring library construction (2) in order to assemble a zinc finger polymer. This in turn allows for parallel (cf. serial) selection of zinc fingers from phage display libraries, thus saving time beyond that required simply for cloning. Additionally, the selective PCR protocols allow recombination to be advantageously carried out in vitro using a mixed population of zinc finger phage as starting material, thereby circumventing cumbersome clone isolation, DNA preparation and gel purification procedures. It is envisaged that the methods of the present invention may be useful in high-throughput protein engineering, such as via automation using liquid handling robotic systems.

Nucleic acid binding molecules according to the invention may comprise tag sequences to facilitate studies and/or preparation of such molecules. Tag sequences may include a FLAG tag, a myc tag, a 6xHis tag or any other suitable tag known in the art.

Another advantage of the methods described here is the ability to target nucleic acid sequences which comprise cis-acting elements. Examples of cis-acting elements include promoters, enhancers, repressors, transcription factor binding sites, initiators, and other such nucleic acid sequences. Molecules according to the invention may advantageously be targeted to bind at and/or adjacent and/or near to such cis-acting elements. Preferably, molecules according to the invention may be targeted to transcription factor binding sites. By directing or targeting the nucleic acid binding molecules of the invention to nucleic acid sequences in this manner, surprisingly high effects, such as repression effects, may be achieved. This is discussed further below. Such molecules may be advantageously targeted to bind at sites comprising all or part of, or adjacent to, transcription factor sites such as SP1 sites, NF-kB sites, or any other transcription factor binding sites. Preferably, such molecules are targeted to SP1 sites.

Preferably, the DNA-binding domains described herein are highly effective in repressing gene expression from nucleic acid molecules to which they bind. More preferably, the DNA-binding domains described herein are highly effective in repressing gene expression from the HIV-1 promoter. In a highly preferred embodiment, said repression of gene expression involves the binding of said DNA-binding domains to one or more region(s) of the HIV-1 promoter comprising or adjacent to one or more SP1 transcription factor binding site(s). Advantageously, molecules according to the invention may be used in combination. Use in combination includes both fusion of molecules into a single polypeptide as well as use of two or more discrete polypeptide molecules in solution.

The transcription factor binding site may be a binding site for a known transcription factor. The transcription factor may be an animal, preferably vertebrate, or plant transcription factor. Such transcription factors, and their putative or determined binding sites, including any consensus motifs, are known in the art, and may be found in (for example) the "Transcription Factor Database", at http://www.hsc.virginia.edu/achs/molbio/databases/tfd_dat.html. Reference is also made to Nucleic Acids Res 21, 3117-8 (1993), Gene Transcription: A Practical Approach, 321-45 (1993) and Nucleic Acids Res 24, 238-41 (1996). A list of transcription factors, together with their binding sites, is contained in the file "tfsites.dat", which is a composite of the datasets TFD (release 7.5) SITES dataset file, 3/96 and Transfac (release 2.5) SITES dataset selected entries, 1/96. The file "tfsites.dat" may be obtained using the GCG command "FETCH tfsites.dat". Any of these binding sites may be targeted according to the invention. Preferred transcription factors include those comprising homeodomains. Specific transcription factors and sites include those for NF-kB (GGGAAATTCC), Sp1 (consensus sequence G/T-GGGCGG-G/A-G/A-C/T), Oct-1 (ATTTGCAT), p53, Myc, Myb, AP1 etc.
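Candidate target sites in a promoter can be located by scanning both strands for a consensus such as those listed above. The sketch below reads the slash notation (for example "G/T-GGGCGG-G/A-G/A-C/T") as "either base at this position", which is an assumption about the notation, turns it into a regular expression, and reports overlapping matches on both strands in top-strand coordinates. The function names are hypothetical and the test sequences are synthetic, not taken from any real promoter.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_to_regex(notation: str) -> str:
    """'G/T-GGGCGG-G/A-G/A-C/T' -> '[GT]GGGCGG[GA][GA][CT]'."""
    parts = []
    for token in notation.upper().split("-"):
        parts.append("[" + token.replace("/", "") + "]" if "/" in token else token)
    return "".join(parts)


def scan_both_strands(seq: str, notation: str) -> list[tuple[int, str, str]]:
    """Return (0-based top-strand start, strand, site as read on that strand)."""
    seq = seq.upper()
    # A lookahead group lets finditer report overlapping matches.
    pattern = re.compile(f"(?=({consensus_to_regex(notation)}))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    rc = revcomp(seq)
    for m in pattern.finditer(rc):
        site = m.group(1)
        # Map the minus-strand match back to the leftmost top-strand position.
        hits.append((len(seq) - m.start() - len(site), "-", site))
    return sorted(hits)


print(scan_both_strands("AAGGGGCGGGGCTT", "G/T-GGGCGG-G/A-G/A-C/T"))
# [(2, '+', 'GGGGCGGGGC')]
print(scan_both_strands("TTGGAATTTCCCTT", "GGGAAATTCC"))
# [(2, '-', 'GGGAAATTCC')]
```

Scanning the reverse complement matters because a zinc finger protein can be designed against either strand of a site such as NF-kB, which is not palindromic.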

PHARMACEUTICALS

Moreover, the invention provides therapeutic agents and methods of therapy involving use of nucleic acid binding proteins as described herein. In particular, the invention provides the use of polypeptide fusions comprising an integrase, such as a viral integrase, and a nucleic acid binding protein according to the invention to target nucleic acid sequences in vivo (Bushman (1994) PNAS (USA) 91:9233-9237). In gene therapy applications, the method may be applied to the delivery of functional genes into defective genes, or the delivery of nonsense nucleic acid in order to disrupt undesired nucleic acid. Alternatively, genes may be delivered to known, repetitive stretches of nucleic acid, such as centromeres, together with an activating sequence such as an LCR. This would represent a route to the safe and predictable incorporation of nucleic acid into the genome.

In conventional therapeutic applications, nucleic acid binding proteins according to the invention may be used to specifically knock out cells having mutant vital proteins. For example, if cells with mutant ras are targeted, they will be destroyed because ras is essential to cellular survival. Alternatively, the action of transcription factors may be modulated, preferably reduced, by administering to the cell agents which bind to the binding site specific for the transcription factor. For example, the activity of HIV tat may be reduced by binding proteins specific for HIV TAR.

Moreover, binding proteins according to the invention may be coupled to toxic molecules, such as nucleases, which are capable of causing irreversible nucleic acid damage and cell death. Such agents are capable of selectively destroying cells which comprise a mutation in their endogenous nucleic acid.

Nucleic acid binding proteins and derivatives thereof as set forth above may also be applied to the treatment of infections and the like in the form of organism-specific antibiotic or antiviral drugs. In such applications, the binding proteins may be coupled to a nuclease or other nuclear toxin and targeted specifically to the nucleic acids of microorganisms.

The invention likewise relates to pharmaceutical preparations which contain the compounds according to the invention or pharmaceutically acceptable salts thereof as active ingredients, and to processes for their preparation.

The pharmaceutical preparations according to the invention which contain the compound according to the invention or pharmaceutically acceptable salts thereof are those for enteral, such as oral, furthermore rectal, and parenteral administration to (a) warm-blooded animal(s), the pharmacological active ingredient being present on its own or together with a pharmaceutically acceptable carrier. The daily dose of the active ingredient depends on the age and the individual condition and also on the manner of administration.

The novel pharmaceutical preparations contain, for example, from about 10 % to about 80%, preferably from about 20 % to about 60 %, of the active ingredient. Pharmaceutical preparations according to the invention for enteral or parenteral administration are, for example, those in unit dose forms, such as sugar-coated tablets, tablets, capsules or suppositories, and furthermore ampoules. These are prepared in a manner known per se, for example by means of conventional mixing, granulating, sugar-coating, dissolving or lyophilising processes. Thus, pharmaceutical preparations for oral use can be obtained by combining the active ingredient with solid carriers, if desired granulating a mixture obtained, and processing the mixture or granules, if desired or necessary, after addition of suitable excipients to give tablets or sugar-coated tablet cores.

Suitable carriers are, in particular, fillers, such as sugars, for example lactose, sucrose, mannitol or sorbitol, cellulose preparations and/or calcium phosphates, for example tricalcium phosphate or calcium hydrogen phosphate, furthermore binders, such as starch paste, using, for example, corn, wheat, rice or potato starch, gelatin, tragacanth, methylcellulose and/or polyvinylpyrrolidone, if desired, disintegrants, such as the abovementioned starches, furthermore carboxymethyl starch, crosslinked polyvinylpyrrolidone, agar, alginic acid or a salt thereof, such as sodium alginate; auxiliaries are primarily glidants, flow-regulators and lubricants, for example silicic acid, talc, stearic acid or salts thereof, such as magnesium or calcium stearate, and/or polyethylene glycol. Sugar-coated tablet cores are provided with suitable coatings which, if desired, are resistant to gastric juice, using, inter alia, concentrated sugar solutions which, if desired, contain gum arabic, talc, polyvinylpyrrolidone, polyethylene glycol and/or titanium dioxide, coating solutions in suitable organic solvents or solvent mixtures or, for the preparation of gastric juice-resistant coatings, solutions of suitable cellulose preparations, such as acetylcellulose phthalate or hydroxypropylmethylcellulose phthalate. Colorants or pigments, for example to identify or to indicate different doses of active ingredient, may be added to the tablets or sugar-coated tablet coatings.

Other orally utilisable pharmaceutical preparations are hard gelatin capsules, and also soft closed capsules made of gelatin and a plasticiser, such as glycerol or sorbitol. The hard gelatin capsules may contain the active ingredient in the form of granules, for example in a mixture with fillers, such as lactose, binders, such as starches, and/or lubricants, such as talc or magnesium stearate, and, if desired, stabilisers. In soft capsules, the active ingredient is preferably dissolved or suspended in suitable liquids, such as fatty oils, paraffin oil or liquid polyethylene glycols, it also being possible to add stabilisers.

Suitable rectally utilisable pharmaceutical preparations are, for example, suppositories, which consist of a combination of the active ingredient with a suppository base. Suitable suppository bases are, for example, natural or synthetic triglycerides, paraffin hydrocarbons, polyethylene glycols or higher alkanols. Furthermore, gelatin rectal capsules which contain a combination of the active ingredient with a base substance may also be used. Suitable base substances are, for example, liquid triglycerides, polyethylene glycols or paraffin hydrocarbons.

Suitable preparations for parenteral administration are primarily aqueous solutions of an active ingredient in water-soluble form, for example a water-soluble salt, and furthermore suspensions of the active ingredient, such as appropriate oily injection suspensions, using suitable lipophilic solvents or vehicles, such as fatty oils, for example sesame oil, or synthetic fatty acid esters, for example ethyl oleate or triglycerides, or aqueous injection suspensions which contain viscosity-increasing substances, for example sodium carboxymethylcellulose, sorbitol and/or dextran, and, if necessary, also stabilisers.

The dose of the active ingredient depends on the warm-blooded animal species, the age and the individual condition and on the manner of administration. In the normal case, an approximate daily dose of about 10 mg to about 250 mg is to be estimated in the case of oral administration for a patient weighing approximately 75 kg.

EXAMPLES

Example 1. Zinc Finger Engineering Strategy (CXCR4 and TNFR1)

Sequences situated within the previously described minimal promoter and comprising the GC I box, the NRF-1 site and the TATA box are targeted (position -73 to -30). Zinc fingers are engineered to bind to the CXCR4 promoter using the 'bipartite' method described above and in WO98/53057. The bipartite method is based on a pair of pre-made zinc finger phage display libraries, which are used in parallel to select two DNA-binding domains that each recognise given 5 bp sequences, and whose products are recombined to produce a single protein that recognises a composite (9-10 bp) site of predefined sequence. Engineering using this system can be completed in less than two weeks and yields three-zinc finger polypeptide molecules that bind sequence-specifically to DNA with Kd values in the nanomolar range. Having thus obtained three-zinc finger molecules, the genes for these peptides are linked together to make functional six-zinc finger proteins. In the present case, we generate three such six-zinc finger proteins, engineered to bind a promoter sequence of the CXCR4 receptor.

A similar strategy is used to generate three and six finger proteins engineered to bind a promoter sequence of the TNFR1 receptor.

Zinc finger proteins selected to bind to the CXCR4 or the TNFR1 promoter regions are then engineered into repressor polypeptides. These repressors contain the zinc finger DNA binding domain at the N-terminus fused in frame to the translation initiation sequence ATG. The 7 amino acid nuclear localisation sequence (NLS) of the wild-type Simian Virus 40 large-T antigen (Kalderon et al., Cell 39:499-509 (1984)) is fused to the C-terminus of the zinc finger sequence and the Kruppel-associated box (KRAB) repressor domain from human KOX1 protein (Margolin et al., PNAS 91:4509-4513 (1994)) is fused downstream of the NLS.

The sequence of the SV40-NLS-KOX1-c-myc repressor domain (NLS-KOX1-c-myc domain sequence) is as follows: AARNSGPKKKRKVDGGGALSPQHSAVTQGSIIKNKEGMDAKSLTAWSRTLVTFKDVFVDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKPDVILRLEKGEEPWLVEREIHQETHPDSETAFEIKSSVEQKLISEEDL

The KOX1 domain contains amino acids 1-97 from the human KOX1 protein (database accession code P21506) in addition to 23 amino acids which act as a linker. In addition, a 10 amino acid sequence from the c-myc protein (Evan et al., Mol. Cell. Biol. 5: 3610 (1985)) is introduced downstream of the KOX1 domain as a tag to facilitate expression studies of the fusion protein.
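The architecture of the repressor domain can be checked by locating its landmarks in the sequence. The sketch below assumes the SV40 NLS is PKKKRKV, the c-myc tag is the EQKLISEEDL epitope, and the KOX1 portion starts at the MDAKS motif and ends just before the tag; the helper names are hypothetical. The NLS-KOX1-c-myc sequence given above is typed here as one continuous string, with the few residues missing from the extracted copy restored from the repressor constructs of Example 2E.

```python
SV40_NLS = "PKKKRKV"    # SV40 large-T antigen NLS (7 residues)
MYC_TAG = "EQKLISEEDL"  # c-myc epitope tag (10 residues)
KOX1_START = "MDAKS"    # assumed N-terminus of KOX1 residues 1-97

NLS_KOX1_MYC = (
    "AARNSGPKKKRKVDGGGALSPQHSAVTQGSIIKNKEGMDAKSLTAWSRTLVTFKDVFVDF"
    "TREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKPDVILRLEKGEEPWLVEREIHQET"
    "HPDSETAFEIKSSVEQKLISEEDL"
)


def find_once(motif: str, seq: str) -> int:
    """0-based start of a motif that must occur exactly once."""
    start = seq.find(motif)
    if start < 0 or seq.find(motif, start + 1) >= 0:
        raise ValueError(f"{motif!r} not found exactly once")
    return start


def map_domain(seq: str) -> dict[str, tuple[int, int]]:
    """Half-open (start, end) coordinates of the NLS, KOX1 segment and c-myc tag."""
    nls = find_once(SV40_NLS, seq)
    kox1 = find_once(KOX1_START, seq)
    tag = find_once(MYC_TAG, seq)
    return {
        "nls": (nls, nls + len(SV40_NLS)),
        "kox1": (kox1, tag),  # assumed to run up to the start of the tag
        "myc_tag": (tag, tag + len(MYC_TAG)),
    }


regions = map_domain(NLS_KOX1_MYC)
print(regions)
# {'nls': (6, 13), 'kox1': (37, 134), 'myc_tag': (134, 144)}
```

Under these assumptions the KOX1 segment spans 97 residues and the tag sits at the very C-terminus, consistent with the description of amino acids 1-97 of KOX1 followed by a 10 amino acid c-myc tag.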

Zinc finger constructs are then tested for specific target binding using a fluorescence ELISA, and for repression activity using FACS analysis.

Example 2. Sequences of Binding-Sites and Composition of Zinc Fingers Engineered Against the CXCR4 Gene Promoter

A. DNA Binding Sites in the CXCR4 Promoter

DNA sequences in the promoter region of the CXCR4 gene showing 9-bp binding sites (underlined), that are used as targets for engineering 3-finger proteins using the 'bipartite' protocol, are shown below. The resulting three-zinc finger proteins, that bind the 9-bp sites, are paired to generate the six-zinc finger proteins, CXCR4-5-1, CXCR4-10-1 and CXCR4-10-3, that bind 18-bp sites (underlined).

Sequence targeted within the CXCR4 promoter (position -73 to -30):
5'-TCCCCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTTTTTATA-3'

CXCR4-5-1 (sites 1 and 5): 5'-CCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTT-3'

CXCR4-10-1 (sites 1 and 10): 5'-CCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTT-3'

CXCR4-10-3 (sites 3 and 10): 5'-CCGCCCCAGCGGCGCATGCGCCGCGCTCGGAGCGTGTT-3'

B. Zinc Finger Protein Sequences

Amino acid sequences of the helical regions from recombinant six-zinc finger DNA-binding domains engineered against the CXCR4 gene promoter. Residues are numbered relative to the first position in the α-helix (position 1) in each finger (F1-F6).

CXCR4-5-1 (Linker A TGGGGSGGSGGSERP between F3 and F4)

CXCR4-10-1 (Linker B TGGGGSGGSGGSGGSGGSERP between F3 and F4)

CXCR4-10-3 (Linker B TGGGGSGGSGGSGGSGGSERP between F3 and F4)
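The helix residues for each finger can be read directly off the full six-finger sequences given below. The sketch assumes that positions -1 to 6 are the seven residues between the conserved Phe-Ser or Phe-Ala pair and the first zinc-coordinating His, as in Zif268 finger 1 (RSDELTR); this pattern is an assumption that fits the Zif268-based scaffold used here and may not hold for other frameworks. It is applied to the CXCR4-5-1 sequence.

```python
import re

# Assumption: helix positions -1..6 lie between the conserved F-S / F-A pair
# and the first zinc-coordinating His of each finger.
HELIX = re.compile(r"F[SA](.{7})H")
POSITIONS = (-1, 1, 2, 3, 4, 5, 6)

CXCR4_5_1 = (
    "MAERPYACPVESCDRRFSDSATLTEHIRIHTGQKPFQCRICMRNFSRRDDLSRHIRTH"
    "TGEKPFACDICGRKFARKSDRTRHTKIHTGGGGSGGSGGSERPYACPVESCDRRFSRS"
    "DELTRHIRIHTGQKPFQCRICMRNFSRSDTLSKHIRTHTGEKPFACDICGRKFAQKHD"
    "RTQHTKIHLRQKD"
)


def recognition_helices(protein: str) -> list[str]:
    """Seven-residue helix segment of each finger, N- to C-terminal."""
    return HELIX.findall(protein)


def helix_table(protein: str) -> list[tuple[str, dict[int, str]]]:
    """Label each helix residue with its position relative to helix position 1."""
    return [(f"F{n}", dict(zip(POSITIONS, helix)))
            for n, helix in enumerate(recognition_helices(protein), start=1)]


for finger, residues in helix_table(CXCR4_5_1):
    print(finger, " ".join(f"{pos}:{aa}" for pos, aa in residues.items()))
# first line printed: F1 -1:D 1:S 2:A 3:T 4:L 5:T 6:E
```

Checking that exactly six helices are returned is a quick sanity test that the pattern has not matched spuriously in a linker or framework region.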

C. Peptide Linkers Between Fingers 3 and 4

Amino acid composition of peptide linkers (A and B), used to link the three-finger units shown above into six-finger constructs. Different lengths of linker are used to span different lengths of gap between the binding sites of the subunits.

Wild-type linker = TGERP
LINKER-A = TGGGGSGGSGGSERP
LINKER-B = TGGGGSGGSGGSGGSGGSERP

D. Sequences of CXCR4 Six Finger Domains

CXCR4-5-1

MAERPYACPVESCDRRFSDSATLTEHIRIHTGQKPFQCRICMRNFSRRDDLSRHIRTH
TGEKPFACDICGRKFARKSDRTRHTKIHTGGGGSGGSGGSERPYACPVESCDRRFSRS
DELTRHIRIHTGQKPFQCRICMRNFSRSDTLSKHIRTHTGEKPFACDICGRKFAQKHD
RTQHTKIHLRQKD

CXCR4-10-1

MAERPYACPVESCDRRFSKSNDLIRHIRIHTGQKPFQCRICMRNFSQSAHLSRHIRTH
TGEKPFACDICGRKFADNRDRTKHTKIHTGGGGSGGSGGSGGSGGSERPYACPVESCD
RRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDTLSKHIRTHTGEKPFACDICGRK
FAQKHDRTQHTKIHLRQKD

CXCR4-10-3

MAERPYACPVESCDRRFSKSNDLIRHIRIHTGQKPFQCRICMRNFSQSAHLSRHIRTH
TGEKPFACDICGRKFADNRDRTKHTKIHTGGGGSGGSGGSGGSGGSERPYACPVESCD
RRFSRSTDLIRHIRIHTGQKPFQCRICMRNFSTSSNLSAHIRTHTGEKPFACDICGRK
FARNADRTKHTKIHLRQKD
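Comparing the six-finger sequences above with the linkers in section C suggests how each construct is assembled: the first three-finger unit runs to the last His of F3, the linker follows, and the second unit resumes at the Tyr of the YACP motif in F4. The sketch below encodes that split convention (inferred from the CXCR4-5-1 sequence, not stated in the text) and also reports how many residues each linker adds relative to wild-type TGERP, which is only a length difference and not a measured span. The dictionary keys and function names are hypothetical.

```python
# Linker sequences as listed in section C.
LINKERS = {
    "wild-type": "TGERP",
    "A": "TGGGGSGGSGGSERP",
    "B": "TGGGGSGGSGGSGGSGGSERP",
}


def extra_residues(linker_name: str) -> int:
    """Residues a linker adds relative to the wild-type TGERP linker."""
    return len(LINKERS[linker_name]) - len(LINKERS["wild-type"])


def join_units(n_unit: str, linker: str, c_unit: str) -> str:
    """Join two three-finger units under the assumed split convention:
    n_unit ends at the last His of F3, c_unit starts at the 'YACP' of F4."""
    if not n_unit.endswith("H") or not c_unit.startswith("YACP"):
        raise ValueError("units do not follow the assumed split convention")
    return n_unit + linker + c_unit


# The two three-finger units of CXCR4-5-1, cut out of the published sequence.
N_UNIT = ("MAERPYACPVESCDRRFSDSATLTEHIRIHTGQKPFQCRICMRNFSRRDDLSRHIRTH"
          "TGEKPFACDICGRKFARKSDRTRHTKIH")
C_UNIT = ("YACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDTLSKHIRTH"
          "TGEKPFACDICGRKFAQKHDRTQHTKIHLRQKD")

cxcr4_5_1 = join_units(N_UNIT, LINKERS["A"], C_UNIT)
print({name: extra_residues(name) for name in ("A", "B")})  # {'A': 10, 'B': 16}
```

Swapping LINKERS["A"] for LINKERS["B"] with the appropriate units would give the CXCR4-10-1 and CXCR4-10-3 layout in the same way.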

E. Sequences of Constructs Comprising Repressor Domains

CXCR4-5-1 + Repressor

MAERPYACPVESCDRRFSDSATLTEHIRIHTGQKPFQCRICMRNFSRRDDLSRHIRTH
TGEKPFACDICGRKFARKSDRTRHTKIHTGGGGSGGSGGSERPYACPVESCDRRFSRS
DELTRHIRIHTGQKPFQCRICMRNFSRSDTLSKHIRTHTGEKPFACDICGRKFAQKHD
RTQHTKIHLRQKDAARNSGPKKKRKVDGGGALSPQHSAVTQGSIIKNKEGMDAKSLTA
WSRTLVTFKDVFVDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKPDVILRL
EKGEEPWLVEREIHQETHPDSETAFEIKSSVEQKLISEEDL

CXCR4-10-1 + Repressor

MAERPYACPVESCDRRFSKSNDLIRHIRIHTGQKPFQCRICMRNFSQSAHLSRHIRTH
TGEKPFACDICGRKFADNRDRTKHTKIHTGGGGSGGSGGSGGSGGSERPYACPVESCD
RRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDTLSKHIRTHTGEKPFACDICGRK
FAQKHDRTQHTKIHLRQKDAARNSGPKKKRKVDGGGALSPQHSAVTQGSIIKNKEGMD
AKSLTAWSRTLVTFKDVFVDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKP
DVILRLEKGEEPWLVEREIHQETHPDSETAFEIKSSVEQKLISEEDL

CXCR4-10-3 + Repressor

MAERPYACPVESCDRRFSKSNDLIRHIRIHTGQKPFQCRICMRNFSQSAHLSRHIRTH
TGEKPFACDICGRKFADNRDRTKHTKIHTGGGGSGGSGGSGGSGGSERPYACPVESCD
RRFSRSTDLIRHIRIHTGQKPFQCRICMRNFSTSSNLSAHIRTHTGEKPFACDICGRK
FARNADRTKHTKIHLRQKDAARNSGPKKKRKVDGGGALSPQHSAVTQGSIIKNKEGMD
AKSLTAWSRTLVTFKDVFVDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKP
DVILRLEKGEEPWLVEREIHQETHPDSETAFEIKSSVEQKLISEEDL

Example 3. Sequences of Binding-Sites and Composition of Zinc Fingers Engineered Against the TNFR1 Gene Promoter

A. DNA Binding Sites in the TNFR1 Promoter

DNA sequences in the promoter region of the TNFR1 gene showing 9-bp binding sites (underlined), that are used as targets for engineering 3-finger proteins using the 'bipartite' protocol, are shown below. The resulting three-zinc finger proteins that bind the 9-bp sites are paired to generate the six-zinc finger proteins TNFR1-4-2, TNFR1-7-9, TNFR1-7-10 and TNFR1-9-12.

Sequences targeted within the TNFR1 promoter

-331 GCCTCCTCCTCCCGCCTCCTGTGGCCTCCTCCTCCAGCTCTTCCTGTCCCGCTG -278 (REF. 25)

-420 TGGTCGGATTGGTGGGTTGGGGGCACAAGGCA -389 (REF. 25)

TNFR1-4-2

5'-GTCGGATTGGTGGGTTGGGGGCACAAGGCA-3' (TNFR1-4-2)

TNFR1-7-9 and TNFR1-7-10

5'-TGGAGGAGGAGGCCACAGGAGGCGGGAGGA-3' (TNFR1-7-9; TNFR1-7-10)
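The target strands listed here can be cross-checked against the two promoter fragments given above. The sketch below assumes that the stated coordinates are inclusive (both are negative, so no position 0 has to be skipped) and reports each match as the coordinate of its leftmost top-strand base; the function names are hypothetical. Under these assumptions the TNFR1-4-2 strand lies on the top strand of the -420 fragment, while the TNFR1-7-9/TNFR1-7-10 strand is the reverse complement of part of the -331 fragment.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# (sequence, coordinate of first base, coordinate of last base) as listed above.
REGION_331 = ("GCCTCCTCCTCCCGCCTCCTGTGGCCTCCTCCTCCAGCTCTTCCTGTCCCGCTG", -331, -278)
REGION_420 = ("TGGTCGGATTGGTGGGTTGGGGGCACAAGGCA", -420, -389)


def coordinates_consistent(region: tuple[str, int, int]) -> bool:
    """Sequence length should equal the inclusive coordinate span."""
    seq, first, last = region
    return len(seq) == last - first + 1


def locate(site: str, region: tuple[str, int, int]):
    """Return (strand, coordinate of leftmost top-strand base), or None if absent."""
    seq, first, _ = region
    site = site.replace(" ", "").upper()
    for strand, probe in (("+", site), ("-", revcomp(site))):
        i = seq.find(probe)
        if i >= 0:
            return strand, first + i
    return None


print(coordinates_consistent(REGION_331), coordinates_consistent(REGION_420))  # True True
print(locate("GTCGGATTGGTGGGTTGGGGGCACAAGGCA", REGION_420))  # ('+', -418)
print(locate("TGGAGGAGGAGGCCACAGGAGGCGGGAGGA", REGION_331))  # ('-', -325)
```

The same `locate` call also accepts the recognition sites quoted with internal spaces, since spaces are stripped before searching.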

TNFR1-9-12

5'-CAGGAAGAGCTGGAGGAGGAGGCCCACAGGA-3' (TNFR1-9-12)

TNFR1-4-2 recognises underlined sites in GGATTGGTGGG TTGGGGGCACA; TNFR1-7-9 and TNFR1-7-10 recognise underlined sites in CCGCCTCCTGTGGCCTCCTCC; and TNFR1-9-12 recognises underlined sites in GTGGCCTCCTCCTCCAGCTCTT.

B. Zinc Finger Protein Sequences

Amino acid sequences of the helical regions from recombinant six-zinc finger DNA-binding domains engineered against the TNFR1 gene promoter. Residues are numbered relative to the first position in the α-helix (position 1) in each finger (F1-F6).

TNFR1-4-2 (Linker TGSERP between F3 and F4)

TNFR1-7-9 (Linker TGGGGSERP between F3 and F4)

TNFR1-9-12 (Linker TGGGGSERP between F3 and F4)

TNFR1-7-10 (Linker TGGGGSGGSERP between F3 and F4)
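The effect of the linker on the size of the final domain can be illustrated by joining the two three-finger units of TNFR1-7-9 (which carries the same six fingers as TNFR1-7-10) with each of the two linkers. This is a minimal Python sketch: the constant and function names are hypothetical, and the unit boundaries (unit 1 ending at ...HTKIH, unit 2 starting at YACPV...) are our reading of the sequences in section C. The 184 residues obtained with the longer linker agree with the numbering shown for TNFR1-7-10 in section C.

```python
# Sketch: unit 1 + F3-F4 linker + unit 2 -> six-finger domain.
# Helix residues (positions -1 to 6) are written as separate string pieces.
FIRST_UNIT = (
    "MAERPYACPVESCDRRFS" "RSDELTR" "HIRIHTGQKPFQCRICMRNFS" "RSDNLSE"
    "HIRTHTGEKPFACDICGRKFA" "RSDNRKT" "HTKIH"
)
SECOND_UNIT = (
    "YACPVESCDRRFS" "DNRDLIR" "HIRIHTGQKPFQCRICMRNFS" "RSDDLSR"
    "HIRTHTGEKPFACDICGRKFA" "RSDNRTK" "HTKIHLRQKD"
)
LINKERS = {
    "TNFR1-7-9": "TGGGGSERP",
    "TNFR1-7-10": "TGGGGSGGSERP",
}


def six_finger(linker: str) -> str:
    """Concatenate the first unit, the linker and the second unit."""
    return FIRST_UNIT + linker + SECOND_UNIT


for name, linker in LINKERS.items():
    print(name, len(linker), len(six_finger(linker)))
# TNFR1-7-9 9 181
# TNFR1-7-10 12 184
```

Each extra glycine/serine residue in the linker lengthens the tether between F3 and F4 by one residue without changing either set of three fingers.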

The amino acid linkers shown above join the three-finger units into six-finger constructs. Linkers of different lengths are used to span gaps of different lengths between the binding sites of the two units.

C. Sequences of TNFR1 Six-Finger Domains

TNFR1-4-2, recognising GGATTGGTGGG TTGGGGGCACA

1 M A E R P Y A C P V E S C D R
16 R F S A S A D L T R H I R I H
31 T G Q K P F Q C R I C M R N F
46 S R R D H L S E H I R T H T G
61 E K P F A C D I C G R K F A R
76 N D S R T N H T K I H T G S E
91 R P Y A C P V E S C D R R F S
106 R S Q H L T E H I R I H T G Q
121 K P F Q C R I C M R N F S T S
136 S H L S V H I R T H T G E K P
151 F A C D I C G R K F A H S N A
166 R K T H T K I H L R Q K D

TNFR1-7-9, recognising CCGCCTCCTGTGGCCTCCTCC

1 M A E R P Y A C P V E S C D R
16 R F S R S D E L T R H I R I H
31 T G Q K P F Q C R I C M R N F
46 S R S D N L S E H I R T H T G
61 E K P F A C D I C G R K F A R
76 S D N R K T H T K I H T G G G
91 G S E R P Y A C P V E S C D R
106 R F S D N R D L I R H I R I H
121 T G Q K P F Q C R I C M R N F
136 S R S D D L S R H I R T H T G
151 E K P F A C D I C G R K F A R
166 S D N R T K H T K I H L R Q K
181 D

TNFR1-9-12, recognising GTGGCCTCCTCCTCCAGCTCTT

1 M A E R P Y A C P V E S C D R
16 R F S D N R D L I R H I R I H
31 T G Q K P F Q C R I C M R N F
46 S R S D D L S R H I R T H T G
61 E K P F A C D I C G R K F A R
76 S D N R T K H T K I H T G G G
91 G S E R P Y A C P V E S C D R
106 R F S D S A H L I R H I R I H
121 T G Q K P F Q C R I C M R N F
136 S T S S D L S R H I R T H T G
151 E K P F A C D I C G R K F A Q
166 S A H R K T H T K I H L R Q K
181 D

TNFR1-7-10, recognising CCGCCTCCTGTGGCCTCCTCC

1 M A E R P Y A C P V E S C D R
16 R F S R S D E L T R H I R I H
31 T G Q K P F Q C R I C M R N F
46 S R S D N L S E H I R T H T G
61 E K P F A C D I C G R K F A R
76 S D N R K T H T K I H T G G G
91 G S G G S E R P Y A C P V E S
106 C D R R F S D N R D L I R H I
121 R I H T G Q K P F Q C R I C M
136 R N F S R S D D L S R H I R T
151 H T G E K P F A C D I C G R K
166 F A R S D N R T K H T K I H L
181 R Q K D
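The helix residues (positions -1 to 6) can be read back out of a full six-finger sequence by matching the finger framework. The sketch below assumes the framework C X2-4 C X2-3 F X [helix] H X3 H, i.e. the additional consensus sequences of Table 1 with X^c taken to be a single residue, which holds for every finger listed in this section; the regular expression and function name are illustrative, not taken from the application. Applied to TNFR1-4-2 it recovers the helices that appear in Table 1 as consensus sequences 19 to 24.

```python
import re

# Assumed finger framework: C X2-4 C X2-3 F X [helix -1..6] H X3 H
FINGER = re.compile(r"C[A-Z]{2,4}C[A-Z]{2,3}F[A-Z]([A-Z]{7})H[A-Z]{3}H")


def helices(domain: str) -> list:
    """Return the helix (positions -1 to 6) of each finger, N- to C-terminal."""
    return [match.group(1) for match in FINGER.finditer(domain)]


# TNFR1-4-2 six-finger domain, written as the 15-residue rows listed above.
TNFR1_4_2 = (
    "MAERPYACPVESCDR" "RFSASADLTRHIRIH" "TGQKPFQCRICMRNF" "SRRDHLSEHIRTHTG"
    "EKPFACDICGRKFAR" "NDSRTNHTKIHTGSE" "RPYACPVESCDRRFS" "RSQHLTEHIRIHTGQ"
    "KPFQCRICMRNFSTS" "SHLSVHIRTHTGEKP" "FACDICGRKFAHSNA" "RKTHTKIHLRQKD"
)
print(helices(TNFR1_4_2))
# ['ASADLTR', 'RRDHLSE', 'RNDSRTN', 'RSQHLTE', 'TSSHLSV', 'HSNARKT']
```

The pattern relies on the conserved Cys, Phe and His positions of this particular backbone; fingers built on a different framework would need a looser expression.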

D. Sequences of Constructs Comprising Repressor Domains (linkers underlined)

TNFR1-4-2 + Repressor

MAERPYACPVESCDRRFSASADLTRHIRIHTGQKPFQCRICMRNFSRRDHLSEHIRTHTGEKPFACDICGRKFARNDSRTNHTKIHTGSERPYACPVESCDRRFSRSQHLTEHIRIHTGQKPFQCRICMRNFSTSSHLSVHIRTHTGEKPFACDICGRKFAHSNARKTHTKIHLRQKDAARNSGPKKKRKVDGGGALSPQHSAVTQGSIIKNKEGMDAKSLTAWSRTLVTFKDVFVDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKPDVILRLEKGEEPWLVEREIHQETHPDSETAFEIKSSVEQKLISEED

TNFR1-7-9 + Repressor

MAERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDNLSEHIRTHTGEKPFACDICGRKFARSDNRKTHTKIHTGGGGSERPYACPVESCDRRFSDNRDLIRHIRIHTGQKPFQCRICMRNFSRSDDLSRHIRTHTGEKPFACDICGRKFARSDNRTKHTKIHLRQKDAARNSGPKKKRKVDGGGALSPQHSAVTQGSIIKNKEGMDAKSLTAWSRTLVTFKDVFVDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKPDVILRLEKGEEPWLVEREIHQETHPDSETAFEIKSSVEQKLISEED

TNFR1-9-12 + Repressor

MAERPYACPVESCDRRFSDNRDLIRHIRIHTGQKPFQCRICMRNFSRSDDLSRHIRTHTGEKPFACDICGRKFARSDNRTKHTKIHTGGGGSERPYACPVESCDRRFSDSAHLIRHIRIHTGQKPFQCRICMRNFSTSSDLSRHIRTHTGEKPFACDICGRKFAQSAHRKTHTKIHLRQKDAARNSGPKKKRKVDGGGALSPQHSAVTQGSIIKNKEGMDAKSLTAWSRTLVTFKDVFVDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKPDVILRLEKGEEPWLVEREIHQETHPDSETAFEIKSSVEQKLISEEDL

TNFR1-7-10 + Repressor

MAERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDNLSEHIRTHTGEKPFACDICGRKFARSDNRKTHTKIHTGGGGSGGSERPYACPVESCDRRFSDNRDLIRHIRIHTGQKPFQCRICMRNFSRSDDLSRHIRTHTGEKPFACDICGRKFARSDNRTKHTKIHLRQKDAARNSGPKKKRKVDGGGALSPQHSAVTQGSIIKNKEGMDAKSLTAWSRTLVTFKDVFVDFTREEWKLLDTAQQIVYRNVMLENYKNLVSLGYQLTKPDVILRLEKGEEPWLVEREIHQETHPDSETAFEIKSSVEQKLISEED

Example 4. Experimental Protocol for in vitro Fluorescence ELISA

The binding properties of the six-finger proteins are assayed using an in vitro zinc finger fluorescence ELISA DNA-binding assay to assess whether the proteins bind specifically to their respective target sequences.

Preparation of Template

Zinc finger constructs are inserted into the protein expression vector pTracer (Invitrogen), downstream of the T7 RNA transcription promoter. Suitable templates for in vitro ELISA are created by PCR using a 5' primer (GCAGAGCTCTCTGGCTAACTAGAG), which binds upstream of the T7 promoter, and a 3' primer, which binds to the 3' end of the zinc finger construct and adds a sequence encoding the HA antibody epitope tag (YPYDVPDYA).

Zinc Finger Expression

In vitro transcription and translation are performed using the T7 TNT Quick Coupled Transcription/Translation System for PCR templates (Promega), according to the manufacturer's instructions, except that the medium is supplemented with 500 μM ZnCl₂.

Fluorescence ELISA

DNA-binding reactions contain the appropriate zinc finger peptide, biotinylated binding site (10 nM) and 5 μg competitor DNA (sonicated salmon sperm DNA) in a total volume of 50 μl containing 1× PBS (pH 7.0), 1.25 × 10⁻³ U high-affinity anti-HA-peroxidase antibody (Boehringer Mannheim), 50 μM ZnCl₂, 0.01 mg/ml BSA and 0.5% Tween 20. Incubations are performed at room temperature for 40 minutes. Black streptavidin-coated wells are blocked with 4% Marvel (dried milk) for 1 hour. The binding reactions are then added to the streptavidin-coated wells and incubated for a further 40 minutes at room temperature. Wells are washed 5 times in 100 μl wash buffer (1× PBS (pH 7.0), 50 μM ZnCl₂, 0.01 mg/ml BSA and 0.5% Tween 20), and finally 50 μl QuantaBlu peroxidase substrate solution (Pierce) is added to detect bound HA-tagged zinc finger peptide. ELISA signals are read in a SPECTRAmax GeminiXS spectrophotometer (Molecular Devices) and analysed using SOFTmax Pro 3.1.2 (Molecular Devices).

Example 5. Experimental Protocol for Transient Assay (FACS)

Jurkat T cells are grown according to the supplier's instructions in RPMI 1640 medium containing L-glutamine (Life Technologies), supplemented with penicillin/streptomycin and fetal calf serum. 5 × 10⁶ cells in 0.2 ml of culture medium are transfected in a 0.4 cm gap electroporation cuvette at 0.20 kV and 960 μF with 20 μg of pTracer™-CMV/Bsd zinc finger-expressing plasmid. An empty pTracer™-CMV/Bsd vector plasmid is used as a control. The electroporated samples are transferred to a 27 cm² cell culture flask containing 30 ml Jurkat T cell growth medium and incubated for 24 to 72 hours at 37°C and 5% CO₂.

Cells are harvested and pelleted at 1000 rpm for 5 minutes at room temperature. Pellets are resuspended in ice-cold 50 μl staining buffer (PBS, 0.5% BSA, 0.01% sodium azide) and incubated for 30 minutes on ice with saturating amounts of antibodies.

To assess CXCR4 expression, cells are stained with biotinylated mouse anti-human CXCR4 antibody (clone 44716.111 or clone 44717.111, R&D Systems, diluted 1:10 in staining buffer), which is detected with streptavidin-PE (R&D Systems, diluted 1:20 in staining buffer).

To assess TNFR1 expression, cells are stained with a mouse anti-human TNFR1 antibody (550514, Pharmingen, 1:25 dilution), which is then bound by a biotinylated Fab-specific anti-mouse IgG (Sigma) and detected with streptavidin-Cychrome (Pharmingen).

Between each step, cells are pelleted at 1000 rpm for 5 minutes and washed twice in 150 μl ice-cold staining buffer. Flow cytometric analysis is performed on a FACSCalibur using the CellQuest software package (Becton Dickinson). About 300,000 events, corresponding to about 15,000 events gated on GFP-positive cells, are collected per sample.

Example 6. DNA-Binding by CXCR4 Constructs Using an in Vitro Fluorescence ELISA Assay

Zinc finger peptides are expressed using the T7 TNT Quick Coupled Transcription/Translation System for PCR templates (Promega) and assayed for DNA binding by fluorescence ELISA (see above; Figure 1). Binding is plotted vertically, on a relative scale. Sequences of the DNA target sites used in each binding reaction are shown below the figure. Bases expected to lie in the linker region between F3 and F4 are indicated by brackets, e.g. (cat).

Figure 1A shows binding of the six-zinc finger peptide CXCR4-5-1 to its preferred target site and to five control sites containing base mutations (underlined). Figure 1B shows binding of the six-zinc finger peptide CXCR4-10-1 to its preferred target site and to a control site containing base mutations (underlined). Figure 1C shows binding of the six-zinc finger peptide CXCR4-10-3 to its preferred target site and to a control site containing base mutations (underlined).
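Because binding in Figure 1 is plotted on a relative scale, one simple way to express such ELISA data is to subtract a background reading and scale every site to the preferred target. This is a hedged Python sketch: the background well, the normalisation to the preferred site and the example readings are assumptions for illustration, not values or procedures stated in the application.

```python
def relative_binding(signals: dict, background: float, preferred: str) -> dict:
    """Background-subtract raw fluorescence and scale to the preferred site.

    signals: {site name: raw fluorescence}. Values below background are
    clipped to zero before scaling.
    """
    corrected = {site: max(value - background, 0.0)
                 for site, value in signals.items()}
    reference = corrected[preferred]
    if reference <= 0:
        raise ValueError("no signal above background on the preferred site")
    return {site: value / reference for site, value in corrected.items()}


# Invented readings (arbitrary fluorescence units), for illustration only.
readings = {"preferred": 5200.0, "control 1": 700.0, "control 2": 450.0}
print(relative_binding(readings, background=200.0, preferred="preferred"))
# {'preferred': 1.0, 'control 1': 0.1, 'control 2': 0.05}
```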

The above assays reveal that the proteins bind specifically to their respective target sequences.

Example 7. Specific Down-regulation of Endogenous CXCR4 By Selected Zinc Finger Proteins (FACS Analysis)

The Jurkat T-cell line is a human-derived lymphoblastic T-cell line that has been shown to express high levels of the CXCR4 receptor. It is therefore used in a transient transfection assay, read out by fluorescence-activated cell sorting (FACS), to validate the ability of the selected zinc finger proteins to control expression of the CXCR4 receptor.

The zinc finger constructs containing the repressor domain, generated as described above, are subcloned into the pTracer™-CMV/Bsd plasmid vector (Invitrogen) for expression in Jurkat T cells. Expression of the zinc finger protein in transfected cells is monitored by co-expression of the green fluorescent protein (GFP) gene, which is also encoded by the pTracer™-CMV/Bsd vector. An empty pTracer™-CMV/Bsd plasmid vector is used as a control. Jurkat T cells are transfected by electroporation and, after 24 to 72 hours, CXCR4 expression is assayed by FACS analysis.

Expression of CXCR4 at the surface of Jurkat cells is first measured, and the results are shown in Figure 2A. 10⁶ cells are stained with biotinylated mouse anti-human CXCR4 antibody, which is subsequently detected with streptavidin-PE. Unstained cells are used as a control (unfilled). 95% of Jurkat cells express CXCR4 at their surface.

Figure 2B shows the results of experiments in which Jurkat T cells are transfected by electroporation with the pTracer™-CMV/Bsd zinc finger-expressing plasmid. An empty pTracer™-CMV/Bsd plasmid is used as a control. From 24 hours post-transfection, expression of zinc finger protein 5-1 is monitored by expression of GFP. Expression of CXCR4 is assessed by FACS analysis in different populations: non-transfected Jurkat cells, which represent an internal control (R1), and subpopulations R2, R3 and R4, which exhibit 10-, 100- and 1000-fold higher GFP fluorescence intensity relative to background, respectively. As expected, population R1 expresses high levels of CXCR4, comparable to cells transduced with the empty pTracer vector (unfilled). Populations R2, R3 and R4 show specific inhibition of CXCR4 expression compared with the control (unfilled), which appears to correlate with the level of zinc finger expression. The percentage of cells expressing CXCR4 is shown.
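The R1 to R4 populations can be thought of as decade-wide gates on GFP fluorescence relative to background, with the fraction of CXCR4-positive cells reported per gate. The sketch below is a simplified illustration under stated assumptions: the gate edges at exactly 10-, 100- and 1000-fold, the CXCR4 positivity threshold and the event values are all hypothetical, and in practice the gates are drawn on the cytometry plots in CellQuest. In this simplification R1 simply collects every event below the 10-fold edge.

```python
from collections import defaultdict


def gfp_gate(fold_over_background: float) -> str:
    """Assign an event to R1-R4 (assumed gate edges at 10, 100 and 1000-fold)."""
    if fold_over_background >= 1000:
        return "R4"
    if fold_over_background >= 100:
        return "R3"
    if fold_over_background >= 10:
        return "R2"
    return "R1"


def percent_positive_by_gate(events, gfp_background: float,
                             cxcr4_threshold: float) -> dict:
    """Return the % of CXCR4-positive events in each GFP gate.

    events: iterable of (gfp_intensity, cxcr4_intensity) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # gate -> [positive, total]
    for gfp, cxcr4 in events:
        tally = counts[gfp_gate(gfp / gfp_background)]
        tally[1] += 1
        if cxcr4 > cxcr4_threshold:
            tally[0] += 1
    return {gate: 100.0 * pos / total
            for gate, (pos, total) in sorted(counts.items())}


# Invented (gfp, cxcr4) events; GFP background 100, CXCR4 threshold 300.
toy_events = [(50, 900), (60, 800), (1500, 100), (1600, 700),
              (250000, 50), (300000, 60)]
print(percent_positive_by_gate(toy_events, 100, 300))
# {'R1': 100.0, 'R2': 50.0, 'R4': 0.0}
```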

Figure 2C shows that down-regulation of CXCR4 is also observed in Jurkat cells transfected with zinc finger proteins 10-1 and 10-3, compared with cells transfected with an empty control vector (unfilled). The percentage of cells expressing CXCR4 is shown.

Expression of zinc fingers 5-1, 10-1 and 10-3 in transfected Jurkat cells, monitored by GFP expression, results in specific down-regulation of CXCR4 in comparison to the control. Moreover, down-regulation of CXCR4 in Jurkat cells transduced with zinc finger protein 5-1 is also observed at low levels of zinc finger expression (as judged by GFP fluorescence), which suggests that this zinc finger binds its endogenous target sequence with high specificity. Exposure of such transfected cells to HIV results in lower levels of infection and viral titre than in untransfected cells.

Example 8. DNA-Binding by TNFR1 Constructs Using an in Vitro Fluorescence ELISA Assay

Zinc finger peptides are expressed using the T7 TNT Quick Coupled Transcription / Translation System for PCR templates (Promega) and assayed for DNA-binding by fluorescence ELISA, as described above in Examples 4 and 6. Sequences of the DNA target sites used in each binding reaction are shown below Figure 3A.

Figure 3A shows binding of the six-zinc finger peptide TNFR1-4-2 to its preferred target site and to a control site from elsewhere in the TNFR1 promoter (9-12). Figure 3B shows binding of the six-zinc finger peptide TNFR1-7-9 to its preferred target site, to an overlapping site (9-12), and to a control site from elsewhere in the TNFR1 promoter (4-2). Figure 3C shows binding of the six-zinc finger peptide TNFR1-9-12 to its preferred target site, to an overlapping site (7-9), and to a control site from elsewhere in the TNFR1 promoter (4-2).

The above assays reveal that the proteins bind specifically to their respective target sequences.

Example 9. Specific Down-regulation of Endogenous TNFR1 By Selected Zinc Finger Proteins (FACS Analysis)

The Jurkat T-cell line is used in a transient transfection assay to validate the selected zinc finger constructs. Expression of TNFR1 at the surface of unmanipulated Jurkat cells is measured by FACS analysis as described in Example 7, and the results are shown in Figures 4A and 4B.

Figure 4 shows the results of experiments in which Jurkat T cells are transfected by electroporation with pTracer™-CMV/Bsd zinc finger-expressing plasmids. An empty pTracer plasmid is used as a general control, and the CXCR4-5-1 zinc finger expression vector described above is used as a control to demonstrate the specificity of the zinc fingers. Analysis is performed 40 hours post-electroporation, and GFP expression is used to identify the transfected cells (left column; normally 20-30% of the population). As seen in Fig. 4B (middle column), transfection with each of the TNFR1-specific zinc fingers, but not with the pTracer or CXCR4-5-1 control vectors, produces a population of cells that has lost expression of TNFR1. Electronic gating on the GFP+ cells (right column) demonstrates that the TNFR1-negative population corresponds to the cells that have been transfected with the expression vector. TNFR1 expression in GFP+ cells transfected with the pTracer or CXCR4-5-1 vectors is normal.

Example 10. Specific Downregulation of a Reporter Gene Under Control of the TNFR1 Promoter By Selected Zinc Finger Proteins

The region upstream of the TNFR1 gene from -163 to -790 (25) is obtained by standard PCR amplification of normal human genomic DNA obtained from peripheral blood. A proof-reading polymerase system is used (High Fidelity, Roche), and the recombinant product is sequenced after cloning into the reporter vector to confirm that the orientation is correct and that the product conforms to published TNFR1 sequences. The control reporter vector used is pMet-CAT (26), with a basal metallothionein promoter upstream of the CAT (chloramphenicol acetyltransferase) gene. For the specific reporter, this promoter is removed by HindIII digestion and replaced with the TNFR1 PCR product.

Jurkat T cells are used for the reporter gene analysis, with the transfection conditions described above. 15 μg of the CAT reporter is mixed with either 3 μg of pTracer, or 1 μg of a zinc finger expression vector plus 2 μg of pTracer. Cells are harvested 22 hours after electroporation, and CAT activities are measured with a radioactive assay (QuanT-CAT, Amersham). As shown in Figure 5, all four zinc fingers directed at the TNFR1 sequence substantially reduce promoter activity, and the level of repressed activity is below that generated from the metallothionein minimal promoter. Inclusion of all four zinc finger vectors does not further repress activity, but this is most likely because each construct alone already represses the promoter completely, rather than because multiple zinc finger proteins directed at distinct sites associated with the gene fail to repress additively or cooperatively.

Example 11: Down-Regulation of CXCR4 and TNFR-1 in the same Cell.

Jurkat T cells are grown according to the supplier's instructions in RPMI 1640 medium containing L-glutamine (Life Technologies), supplemented with penicillin/streptomycin and fetal calf serum. 5 × 10⁶ cells in 0.2 ml of culture medium are transfected in a 0.4 cm gap electroporation cuvette at 0.20 kV and 960 μF with a total of 20 μg plasmid DNA, comprising 15 μg of a 'carrier' plasmid, such as pBluescript (Stratagene) or equivalent, and 5 μg of the appropriate test expression vector (pTracer™-CMV/Bsd zinc finger-expressing plasmid). An empty pTracer™-CMV/Bsd vector plasmid is used as a control. The electroporated samples are transferred to a 25 cm² cell culture flask containing 10 ml Jurkat T cell growth medium and incubated for 24 to 72 hours at 37°C and 5% CO₂.

Duplicate wells are transfected with:

A. 5 μg empty pTracer™-CMV/Bsd plasmid as a control

B. 2.5 μg TNFR1-4-2-Kox-expressing pTracer plus 2.5 μg empty pTracer

C. 2.5 μg CXCR4-5-1-Kox-expressing pTracer plus 2.5 μg empty pTracer

D. 2.5 μg TNFR1-4-2-Kox-expressing pTracer plus 2.5 μg CXCR4-5-1-Kox-expressing pTracer
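The four conditions follow a simple 2 × 2 design in which the test DNA is held at 5 μg per electroporation by topping up with empty pTracer, and 15 μg carrier plasmid brings every electroporation to 20 μg. A minimal Python sketch of that bookkeeping follows; the function name and the keyword labels for the Kox constructs are hypothetical, not reagent or software names from the application.

```python
TEST_DNA_UG = 5.0   # test DNA per electroporation (micrograms)
CARRIER_UG = 15.0   # carrier plasmid, e.g. pBluescript (micrograms)


def transfection_mix(**zf_vectors_ug):
    """Return {plasmid: micrograms}, topping up with empty pTracer."""
    zf_total = sum(zf_vectors_ug.values())
    if zf_total > TEST_DNA_UG:
        raise ValueError("zinc finger vectors exceed the 5 ug test DNA")
    mix = dict(zf_vectors_ug)
    if zf_total < TEST_DNA_UG:
        mix["empty pTracer"] = TEST_DNA_UG - zf_total
    mix["carrier"] = CARRIER_UG
    return mix


# Keyword names are shorthand labels for the Kox constructs (hypothetical).
plan = {
    "A": transfection_mix(),
    "B": transfection_mix(TNFR1_4_2_Kox=2.5),
    "C": transfection_mix(CXCR4_5_1_Kox=2.5),
    "D": transfection_mix(TNFR1_4_2_Kox=2.5, CXCR4_5_1_Kox=2.5),
}
for well, mix in plan.items():
    print(well, mix, "total:", sum(mix.values()), "ug")
```

Keeping the pTracer backbone constant in this way means that any difference between wells B, C and D and control A can be attributed to the zinc finger insert rather than to the amount of vector delivered.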

Cells are harvested and pelleted at 1000 rpm for 5 minutes at room temperature in a desk-top centrifuge. Pellets are resuspended in ice-cold 50 μl staining buffer (PBS, 0.5% BSA, 0.01% sodium azide) and incubated for 30 minutes on ice with saturating amounts of antibodies.

To assess CXCR4 expression, cells are stained with biotinylated mouse anti-human CXCR4 antibody (clone 44716.111 or clone 44717.111, R&D Systems, diluted 1:10 in staining buffer), which is detected with streptavidin-PE (R&D Systems, diluted 1:20 in staining buffer).

To assess TNFR-1 expression, cells are stained with a mouse anti-human TNFR1 antibody (550514, Pharmingen, 1:25 dilution), which is then bound by a biotinylated Fab-specific anti-mouse IgG (1:100 dilution, Sigma) and detected with streptavidin-Cychrome (1:200 dilution, Pharmingen).

To assess ICAM expression, cells are stained with a mouse anti-human ICAM antibody (MCA532, Serotec, 1:25 dilution), which is then bound by a biotinylated Fab-specific anti-mouse IgG (1:100 dilution, Sigma) and detected with streptavidin-Cychrome (1:200 dilution, Pharmingen).

Between each step, cells are pelleted at 1000 rpm for 5 minutes and washed twice in 150 μl ice-cold staining buffer. Flow cytometric analysis is performed on a FACSCalibur using the CellQuest software package (Becton Dickinson). About 300,000 events, corresponding to about 15,000 events gated on GFP-positive cells, are collected per sample.

Figure 6 shows the results of the experiment described above. The first column shows the total cell population, analysed according to GFP expression (dark filled area), with the population of cells expressing GFP (and therefore containing the zinc finger construct) indicated by a horizontal line. Columns 2, 3 and 4 show the expression levels of TNFR-1, CXCR4 and ICAM (a non-target cell-surface protein, which acts as a control), respectively, in the cell populations that express GFP. The graphs show that cells expressing only the TNFR1-4-2-Kox peptide have greatly reduced expression of TNFR-1 while having normal levels of CXCR4 and ICAM. Cells expressing only the CXCR4-5-1-Kox protein display greatly reduced levels of CXCR4 and unaltered levels of TNFR-1 and ICAM. More importantly, cells expressing both the CXCR4 and TNFR-1 inhibitory proteins display greatly reduced expression of both CXCR4 and TNFR-1, but show no change in the level of the control protein, ICAM.

The above experiments show that zinc finger polypeptides engineered against receptor nucleotide sequences are capable of down-regulating expression of receptor polypeptides both in their native environment and in reporter systems.

REFERENCES

1. Berger, E.A.; Doms, R.W.; Fenyo, E.M.; Korber, B.T.; Littman, D.R.; Moore, J.P.; Sattentau, Q.J.; Schuitemaker, H.; Sodroski, J.; Weiss, R.A. A new classification for HIV-1. Nature 391(6664):240 (1998).

2. Bleul, C.C.; Farzan, M.; Choe, H.; Parolin, C.; Clark-Lewis, I.; Sodroski, J.; Springer, T.A. The lymphocyte chemoattractant SDF-1 is a ligand for LESTR/fusin and blocks HIV-1 entry. Nature 382(6594):829-833 (1996).

3. Bleul, C.C.; Wu, L.; Hoxie, J.A.; Springer, T.A.; Mackay, C.R. The HIV coreceptors CXCR4 and CCR5 are differentially expressed and regulated on human T lymphocytes. Proc Natl Acad Sci USA 94(5):1925-1930 (1997).

4. Brelot, A.; Heveker, N.; Pleskoff, O.; Sol, N.; Alizon, M. Role of the first and third extracellular domains of CXCR-4 in human immunodeficiency virus coreceptor activity. J Virol 71(6):4744-4751 (1997).

5. Di Marzio, P.; Tse, J.; Landau, N.R. Chemokine receptor regulation and HIV type 1 tropism in monocyte-macrophages. AIDS Res Hum Retroviruses 14(2):129-138 (1998).

6. D'Souza, M.P.; Harden, V.A. Chemokines and HIV-1 second receptors. Nature Medicine 2:1293-1300 (1996).

7. Feng, Y.; Broder, C.C.; Kennedy, P.E.; Berger, E.A. HIV-1 entry cofactor: functional cDNA cloning of a seven-transmembrane, G protein-coupled receptor. Science 272:872-877 (1996).

8. Forster, R.; Kremmer, E.; Schubel, A.; Breitfeld, D.; Kleinschmidt, A.; Nerl, C.; Bernhardt, G.; Lipp, M. Intracellular and surface expression of the HIV-1 coreceptor CXCR4/fusin on various leukocyte subsets: rapid internalization and recycling upon activation. J Immunol 160(3):1522-1531 (1998).

9. Granelli-Piperno, A.; Moser, B.; Pope, M.; Chen, D.; Wei, Y.; Isdell, F.; O'Doherty, U.; Paxton, W.; Koup, R.; Mojsov, S.; Bhardwaj, N.; Clark-Lewis, I.; Baggiolini, M.; Steinman, R.M. Efficient interaction of HIV-1 with purified dendritic cells via multiple chemokine coreceptors. J Exp Med 184(6):2433-2438 (1996).

10. Gupta, S.K.; Lysko, P.G.; Pillarisetti, K.; Ohlstein, E.; Stadel, J.M. Chemokine receptors in human endothelial cells: functional expression of CXCR4 and its transcriptional regulation by inflammatory cytokines. J Biol Chem 273(7):4282-4287 (1998).

11. Lavi, E.; Strizki, J.M.; Ulrich, A.M.; Zhang, W.; Fu, L.; Wang, Q.; O'Connor, M.; Hoxie, J.A.; Gonzalez-Scarano, F. CXCR-4 (fusin), a co-receptor for the type 1 human immunodeficiency virus (HIV-1), is expressed in the human brain in a variety of cell types, including microglia and neurons. Am J Pathol 151(4):1035-1042 (1997).

12. Lu, Z.H.; Berson, J.F.; Chen, Y.H.; Turner, J.D.; Zhang, T.Y.; Sharron, M.; Jenks, M.H.; Wang, Z.X.; Kim, J.; Rucker, J.; Hoxie, J.A.; Peiper, S.C.; Doms, R.W. Evolution of HIV-1 coreceptor usage through interactions with distinct CCR5 and CXCR4 domains. Proc Natl Acad Sci USA 94(12):6426-6431 (1997).

13. Oberlin, E.; Amara, A.; Bachelerie, F.; Bessia, C.; Virelizier, J.-L.; Arenzana-Seisdedos, F.; Schwartz, O.; Heard, J.-M.; Clark-Lewis, I.; Legler, D.F.; Loetscher, M.; Baggiolini, M.; Moser, B. The CXC chemokine SDF-1 is the ligand for LESTR/fusin and prevents infection by T-cell-line-adapted HIV-1. Nature 382(6594):833-835 (1996).

14. Signoret, N.; Oldridge, J.; Pelchen-Matthews, A.; Klasse, P.J.; Tran, T.; Brass, L.F.; Rosenkilde, M.M.; Schwartz, T.W.; Holmes, W.; Dallas, W.; Luther, M.A.; Wells, T.N.C.; Hoxie, J.A.; Marsh, M. Phorbol esters and SDF-1 induce rapid endocytosis and down modulation of the chemokine receptor CXCR4. J Cell Biol 139(3):651-664 (1997).

15. Volin, M.V.; Joseph, E.; Shockley, M.S.; Davies, P.F. Chemokine receptor CXCR4 expression in endothelium. Biochem Biophys Res Commun 242(1):46-53 (1998).

16. Yi, Y.J.; Rana, S.; Turner, J.D.; Gaddis, N.; Collman, R.G. CXCR-4 is expressed by primary macrophages and supports CCR5-independent infection by dual-tropic but not T-tropic isolates of human immunodeficiency virus type 1. J Virol 72(1):772-777 (1998).

17. Chan, F.K.; Siegel, R.M.; Lenardo, M.J. Signaling by the TNF receptor superfamily and T cell homeostasis. Immunity 13:419-422 (2000).

18. Suvannavejh, G.C.; Lee, H.O.; Padilla, J.; Dal Canto, M.C.; Barret, T.A.; Miller, S.D. Divergent roles for p55 and p75 tumour necrosis factor receptors in the pathogenesis of MOG(35-55)-induced experimental autoimmune encephalomyelitis. Cell Immunol 205:24-33 (2000).

19. Roitt, I.; Brostoff, J.; Male, D. (Eds.) Immunology, 4th edition. Mosby, London (1996).

20. Maini, R.N.; Taylor, P.C.; Paleolog, E.; Charles, P.; Ballara, S.; Brennan, F.M.; Feldmann, M. Anti-tumour necrosis factor specific antibody (infliximab) treatment provides insights into the pathophysiology of rheumatoid arthritis. Ann Rheum Dis 58:156-160 (1999).

21. Garrison, L.; McDonnell, N.D. Etanercept: therapeutic use in patients with rheumatoid arthritis. Ann Rheum Dis 58:165-169 (1999).

22. Bachmaier, K.; Pummerer, C.; Kozieradzki, I.; Pfeffer, K.; Mak, T.W.; Neu, N.; Penninger, J.M. Low-molecular-weight tumour necrosis factor receptor p55 controls induction of autoimmune heart disease. Circulation 95:655-661 (1997).

23. Li, Y.Y.; Feng, Y.Q.; Kadokami, T.; McTiernan, C.F.; Draviam, R.; Watkins, S.C.; Feldman, A.M. Myocardial extracellular matrix remodeling in transgenic mice overexpressing tumor necrosis factor α can be modulated by anti-tumor necrosis factor α therapy. Proc Natl Acad Sci USA 97:12746-12751 (2000).

24. Kollias, G.; Douni, E.; Kassiotis, G.; Kontoyiannis, D. On the role of tumour necrosis factor and receptors in models of multi-organ failure, rheumatoid arthritis, multiple sclerosis and inflammatory bowel disease. Immunol Rev 169:175-194 (1999).

25. Kemper, O.; Wallach, D. Cloning and partial characterization of the promoter for the human p55 tumor necrosis factor (TNF) receptor. Gene 134:209-216 (1993).

26. Luscher, B.; Mitchell, P.J.; Williams, T.; Tjian, R. Regulation of transcription factor AP-2 by the morphogen retinoic acid and by second messengers. Genes Dev 3:1507-1517 (1989).

Each of the applications and patents mentioned above, and each document cited or referenced in each of the foregoing applications and patents, including during the prosecution of each of the foregoing applications and patents ("application cited documents"), and any manufacturer's instructions or catalogues for any products cited or mentioned in each of the foregoing applications and patents and in any of the application cited documents, are hereby incorporated herein by reference. Furthermore, all documents cited in this text, and all documents cited or referenced in documents cited in this text, and any manufacturer's instructions or catalogues for any products cited or mentioned in this text, are hereby incorporated herein by reference. In particular, we hereby incorporate by reference International Patent Application Numbers PCT/GB00/02080 (WO01/00815), PCT/GB00/02071 (WO00/73434) and PCT/GB00/03765 (WO01/25417), United Kingdom Patent Application Numbers GB0001582.6, GB0001578.4 and GB9912635.1, as well as US09/478513.

Various modifications and variations of the described methods and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.

TABLE 1:

ZINC FINGER CONSENSUS SEQUENCES:

CONSENSUS SEQUENCE 1

Xo-₂ C Xμj C X₂_₇ D S A T L T E H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 2

X₀-₂ C Xj_._₅ C X₂_₇ R R D D L S R H X₃-.₆ ^H/_c

CONSENSUS SEQUENCE 3

X₀_₂ C i-,₅ C X₂_₇ R K S D R T R H X₃_₆ /_c

CONSENSUS SEQUENCE 4

X i.Q 0--2 C C XXι_._5 C C XX 2_-₇7 R R oS LD> JE_ι JL_ι T rR. iH-. ΛX₃ 3_-6S

CONSENSUS SEQUENCE 5

Xo-₂ C LS C X₂_₇ R S D T L S K H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 6

Xo-₂ C X_x_₅ C X₂_₇ Q K H D R T Q H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 7

Xo-₂ C .₅ C X₂_₇ K S N D L I R H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 8 X₀_₂ C X_x_₅ C X₂_₇ Q S A H L S R H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 9

Xo-₂ C X_x_₅ C X₂_₇ D N R D R T K H X₃_₆ ^H/_c CONSENSUS SEQUENCE 10

Xo_ C Xι_₅ C X _₇ R S D E L T R H X₃_₆ / c

CONSENSUS SEQUENCE 11

X₀-₂ C XLS C X₂_₇ R S D T L S K H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 12

Xo-₂ C Xχ.₅ C X₂_₇ Q K H D R T Q H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 13

Xo-₂ C Xi_₅ C X₂_₇ K S N D L I R H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 14 X₀_₂ C X_!_₅ C X₂_₇ Q S A H L S R H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 15

X₀_₂ C Xi_₅ C X₂_₇ D N R D R T K H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 16

XQ— 2 C Xι_5 C X₂_7 R S T D L I R H X₃_ς

CONSENSUS SEQUENCE 17

X₀_₂ C Xi_₅ C X₂_₇ T S S N L S A H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 18

X₀_₂ C Xι_₅ C X₂_₇ R N A D R T K H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 19

Xo_₂ c Xι_5 c x₂_₇ A S A D L T R H X₃_₆ ^H/_C

CONSENSUS SEQUENCE 20

Xo-₂ C Xi-s C X₂_₇ R R D H L S E H X₃_₆ ^H/_C CONSENSUS SEQUENCE 21

Xo-₂ C Xχ-,₅ C X₂_₇ R N D S R T N H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 22

X₀_₂ C Xχ-,₅ C X₂_₇ R S Q H L T E H X₃_₆ ^H/_C

CONSENSUS SEQUENCE 23 x₀_₂ c x_x_₅ c x₂.₇ T S S H L S V H X₃_₆ ^H/_C

CONSENSUS SEQUENCE 24 x₀_₂ c x_x_₅ c ^χ ₂_₇ H S N A R K T H X₃_₆ ^H/_C

CONSENSUS SEQUENCE 25 X₀_₂ C Xi_₅ C X₂_₇ R S D E L T R H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 26

Xo-₂ C X_!_₅ C X₂_₇ R S D N L S E H X₃_₆ ^H/_C

CONSENSUS SEQUENCE 27

Xo-₂ C X_!_₅ C X₂_₇ R S D N R K T H X₃_₆ ^H/_C

CONSENSUS SEQUENCE 28

Xo-₂ C X_LS C X₂_₇ D N R D L I R H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 29

Xo-₂ c Xχ_₅ c x₂_₇ R S D D L S R H X₃_₆ ^H/_C

CONSENSUS SEQUENCE 30 X₀_₂ C X_LS C X₂_₇ R S D N R T K H X₃_₆ ^H/_C

CONSENSUS SEQUENCE 31

X₀-₂ C Xχ-,5 C X₂_₇ D N R D L I R H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 32

Xo-₂ C Xχ_₅ C X₂_₇ R S D D L S R H X₃_₆ ^H/_c CONSENSUS SEQUENCE 33

X₀_₂ C X .₅ C X₂_₇ R S D N R T K H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 34

X₀_₂ C X_x_₅ C X₂_₇ D S A H L I R H X₃_₆ ^H/_c

CONSENSUS SEQUENCE 35

XQ_2 C Xχ-,₅ C X₂_₇ S S D L S R H x₃_₆ ^H/_c

CONSENSUS SEQUENCE 36

Xo-₂ C Xχ.₅ C X₂_₇ Q S A H R K T H X₃_₆ ^H/_C

ADDITIONAL ZINC FINGER CONSENSUS SEQUENCES: x^a C X₂-4 C x₂_₃ F X^c D S A T L T E H X X X^b H x^a c X₂-4 C X₂-3 F X^c R R D D L S R H X X X^b H x^a C X2-4 C X₂-3 F x^c R K S D R T R H X X X^b H x^a C X2-4 C X2-3 F x^c R S D E L T R H X X X^b H x^a c X2-4 C X₂-3 F x^c R S D T L s K H X X X^b H x^a C X2-4 C X₂-3 F x^c Q K H D R T Q H X X X^b H x^a C X₂-4 C X₂-3 F x^c K s N D L I R H X X X^b H x^a C X2-4 C X₂-3 F x^c Q s A H L s R H X X X^b H x^a C X2-4 C X₂-3 F x^c D N R D R T K H X X X^b H x^a C X2-4 C X₂-3 F x^c R S D E L T R H X X X^b H x^a C X2-4 C X2-3 F x^c R s D T L s K H X X X^b H x^a C X2-4 C X₂-3 F x^c Q K H D R T Q H X X X^b H x^a C X2-4 C X2-3 F _χc K s N D L I R H X X X^b H x^a C X2-4 C X₂-3 F x^c Q s A H L s R H X X X^b H x^a C X2-4 C X2-3 F x^c D N R D R T K H X X X^b H x^a C X2-4 C X2-3 F x^c R S T D L I R H X X X^b H x^a c X2-4 C X₂-3 F x^c T S S N L s A H X X X^b H x^a C X2-4 C X₂-3 F x^c R N A D R T K H X X X^b H x^a C X2-4 C X -3 F x^c A s A D L T R H X X X^b H x^a C X2-4 C X₂-3 F x^c R R D H L s E H X X X^b H x^a C X2-4 C X2-3 F x^c R N D s R T ¹ N H X X X^b H x^a C X2-4 C X2-3 F x^c R S Q H L T E H X X X^b H x^a C X2-4 C X₂-3 F x^c T S s H L s V H X X X^b H x^a C X2-4 C X₂-₃ F x^c H S N A R K : T H X X X^b H x^a C X2-4 C X2-3 F x^c R S D E L T ' R H X X X^b H x^a C X2-4 C X₂-3 F x^c R S D N L s E H X X X^b H x^a C X2-4 C X₂-3 F x^c R S D N R K : T H X X X^b H x^a C X2-4 C X₂-3 F x^c D N R D L I R H X X X^b H o
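Each entry in Table 1 can be read as a pattern over a single finger: a fixed seven-residue helix (positions -1 to 6) followed by the conserved His at position 7, embedded in the general structure X0-2 C X1-5 C X2-7 ... H X3-6 H/C. The Python sketch below turns that structure into a regular expression; it is an illustration only, and the function name is ours. As an example, it checks finger 1 of TNFR1-4-2 from Example 3 against consensus sequence 19.

```python
import re


def consensus_regex(helix: str) -> re.Pattern:
    """Compile X0-2 C X1-5 C X2-7 <helix> H X3-6 H/C for a single finger.

    `helix` holds the seven residues at positions -1 to 6; position 7 is
    the H that closes every entry in Table 1.
    """
    if len(helix) != 7:
        raise ValueError("expected seven helix residues (-1 to 6)")
    return re.compile(
        r"[A-Z]{0,2}C[A-Z]{1,5}C[A-Z]{2,7}"
        + re.escape(helix)
        + r"H[A-Z]{3,6}[HC]"
    )


finger = "YACPVESCDRRFSASADLTRHIRIH"  # finger 1 of TNFR1-4-2 (Example 3)
print(bool(consensus_regex("ASADLTR").fullmatch(finger)))  # True (sequence 19)
print(bool(consensus_regex("RSDELTR").fullmatch(finger)))  # False
```

Using `fullmatch` forces the whole string to be one finger; scanning a longer protein would instead use `finditer` with the same pattern.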


Claims

1. A polypeptide capable of binding to a nucleic acid comprising a receptor nucleotide sequence.

2. A polypeptide according to Claim 1, in which the polypeptide is capable of downregulating the expression of the receptor nucleotide sequence, or a nucleic acid sequence linked to the receptor nucleotide sequence.

3. A polypeptide according to Claim 1 or 2, in which the receptor nucleotide sequence comprises a promoter sequence of the receptor.

4. A polypeptide according to any preceding claim, in which the receptor is capable of functioning as a receptor for viral infection.

5. A polypeptide according to any preceding claim, in which the receptor is capable of functioning as a receptor for infection by Human Immunodeficiency Virus (HIV).

6. A polypeptide according to any preceding claim, in which the receptor is selected from the group consisting of: CD4, CXCR-4, CCR5, CCR3 and CCR2b.

7. A polypeptide according to any preceding claim, in which the polypeptide comprises a zinc finger motif having a general primary structure:

(A) X₀₋₂ C X₁₋₅ C X₂₋₇ X X X X X X X H X₃₋₆ H/C

where X is any amino acid, the numbers in subscript indicate the possible numbers of residues represented by X, and the seven consecutive residues X X X X X X X and the H that follows them occupy positions -1, 1, 2, 3, 4, 5, 6 and 7 respectively; in which the receptor comprises a CXCR-4 receptor, and in which the polypeptide comprises three zinc fingers F1, F2 and F3, at least one of the amino acids at positions -1, 3 and 6 of F1, -1, 3 and 6 of F2 and -1, 3 and 6 of F3 being selected from amino acids specified in the following table:

8. A polypeptide according to Claim 7, in which at least one of the amino acids at positions -1, 1, 2, 3, 4, 5 and 6 of F1, -1, 1, 2, 3, 4, 5 and 6 of F2 and -1, 1, 2, 3, 4, 5 and 6 of F3 is selected from amino acids specified in the following table:

Finger  Position  Amino acid(s)
F1      -1        D, R, K, R
        1         S
        2         A, D, N, D, T
        3         T, E, D
        4         L
        5         T, I
        6         E, R
F2      -1        R, Q, T
        1         R, S
        2         D, A, S
        3         D, T, H, N
        4         L
        5         S, I
        6         R, K, R, A
F3      -1        R, Q, D
        1         K, N
        2         S, H, R, A
        3         D
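The general structure (A) and the CXCR-4 residue table above can be read mechanically. The Python sketch below is illustrative only and is not a statement of claim scope: it assumes that the recognition residues are the seven immediately before the first conserved H of the first regular-expression match, it encodes the table exactly as printed (so F3 stops at position 3), and the names `MOTIF_A`, `helix_positions` and `table_matches`, as well as the framework residues in the example finger, are hypothetical.

```python
import re

# Structure (A): X0-2 C X1-5 C X2-7 X X X X X X X H X3-6 H/C.
# The leading X0-2 is not constrained because the motif is searched for
# anywhere in a longer sequence. Group 1 captures the seven residues just
# before the first conserved H (positions -1 to 6).
MOTIF_A = re.compile(r"C.{1,5}C.{2,7}(.{7})H.{3,6}[HC]")

POSITIONS = (-1, 1, 2, 3, 4, 5, 6)

# Claim 8 (CXCR-4) table as printed; repeated residues are collapsed because
# only membership matters. The F3 rows stop at position 3.
CLAIM8_TABLE = {
    "F1": {-1: "DRK", 1: "S", 2: "ADNT", 3: "TED", 4: "L", 5: "TI", 6: "ER"},
    "F2": {-1: "RQT", 1: "RS", 2: "DAS", 3: "DTHN", 4: "L", 5: "SI", 6: "RKA"},
    "F3": {-1: "RQD", 1: "KN", 2: "SHRA", 3: "D"},
}


def helix_positions(finger):
    """Return {position: residue} for the first motif-(A) match, or None."""
    m = MOTIF_A.search(finger)
    if m is None:
        return None
    return dict(zip(POSITIONS, m.group(1)))


def table_matches(helices, table):
    """Count tabulated positions whose residue is among the allowed ones.

    helices maps a finger name ("F1", "F2", "F3") to {position: residue}.
    Returns (matched, total): "at least one" means matched >= 1 and
    "each" (as in claim 13) means matched == total.
    """
    matched = total = 0
    for name, allowed in table.items():
        for pos, residues in allowed.items():
            total += 1
            if helices[name][pos] in residues:
                matched += 1
    return matched, total


if __name__ == "__main__":
    # Hypothetical finger: arbitrary framework residues around helix DSATLTE.
    helices = {
        "F1": helix_positions("YACPVESCDRRFSDSATLTEHIRIHTGQK"),
        "F2": dict(zip(POSITIONS, "RRDDLSR")),
        "F3": dict(zip(POSITIONS, "RKSDRTR")),
    }
    print(table_matches(helices, CLAIM8_TABLE))
```

With the first three additional consensus helices (DSATLTE, RRDDLSR, RKSDRTR) assigned to F1, F2 and F3, every tabulated position matches. Because the X₁₋₅ and X₂₋₇ spacers allow more than one alignment in real sequences, taking the first regular-expression match is a simplifying choice rather than the only possible reading.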

9. A polypeptide according to Claim 1, 2, or 3, in which the receptor comprises a cytokine receptor.

10. A polypeptide according to any of Claims 1 to 3 and 9, in which the receptor is selected from the group consisting of: TNFR1, TNFR2, IL-1R, and an IFN receptor.

11. A polypeptide according to any of Claims 1 to 3, 9 and 10, in which the polypeptide comprises a zinc finger motif having a general primary structure:

(A) X₀₋₂ C X₁₋₅ C X₂₋₇ X X X X X X X H X₃₋₆ H/C

where X is any amino acid, the numbers in subscript indicate the possible numbers of residues represented by X, and the seven consecutive residues X X X X X X X and the H that follows them occupy positions -1, 1, 2, 3, 4, 5, 6 and 7 respectively; in which the receptor comprises a TNF receptor, and in which the polypeptide comprises three zinc fingers F1, F2 and F3, at least one of the amino acids at positions -1, 3 and 6 of F1, -1, 3 and 6 of F2 and -1, 3 and 6 of F3 being selected from amino acids specified in the following table:

12. A polypeptide according to Claim 11, in which at least one of the amino acids at positions -1, 1, 2, 3, 4, 5 and 6 of F1, -1, 1, 2, 3, 4, 5 and 6 of F2 and -1, 1, 2, 3, 4, 5 and 6 of F3 is selected from amino acids specified in the following table:

13. A polypeptide according to any of Claims 7 to 12, in which each of the amino acids at the numbered positions is selected from amino acids specified in the table.

14. A polypeptide according to any preceding claim, in which the polypeptide comprises one or more amino acid sequences selected from the group consisting of consensus sequences 1 to 36 shown in Table 1, a variant of one or more thereof, and polypeptides comprising combinations of two or more thereof.

15. A polypeptide according to any preceding claim, in which the polypeptide comprises a zinc finger comprising six zinc finger motifs selected from the group consisting of: CXCR4-5-1, CXCR4-10-1, CXCR4-10-3, TNFR1-4-2, TNFR1-7-9, TNFR1-9-12, and TNFR1-7-10.

16. A polypeptide according to Claim 15, which comprises two three-zinc-finger motifs, each comprising a polypeptide according to any of Claims 1 to 14.

17. A polypeptide according to any preceding claim, which further comprises a transcriptional effector domain.

18. A polypeptide according to Claim 17, in which the transcriptional effector domain is a repressor domain selected from the group comprising a KRAB-A domain, an engrailed domain and a SNAG domain.

19. A polypeptide according to any preceding claim selected by phage display.

20. A polypeptide according to any preceding claim which is engineered on the basis of rational design.

21. A composition comprising a pharmaceutically effective amount of a polypeptide according to any preceding claim, together with a pharmaceutically acceptable excipient, diluent or carrier.

22. A nucleic acid molecule encoding a polypeptide according to any of Claims 1 to 20.

23. An expression vector comprising a nucleic acid molecule according to Claim 22.

24. A particle harbouring a nucleic acid according to Claim 22, an expression vector according to Claim 23, or a polypeptide according to any of Claims 1 to 20.

25. A method of modulating expression of a nucleic acid sequence, the method comprising contacting the nucleic acid sequence with a polypeptide according to any of Claims 1 to 20.

26. Use of a zinc finger polypeptide, or a nucleic acid encoding such a polypeptide, to modulate transcription of a receptor nucleotide sequence.

27. A method of treating or preventing a disease in a patient caused by a virus, the method comprising administering a nucleic acid binding polypeptide capable of binding to a nucleic acid sequence comprising a receptor nucleotide sequence, in which the receptor is capable of functioning as a receptor for infection by the virus, or a nucleic acid encoding such a polypeptide, to the patient.

28. A polypeptide capable of binding to a nucleic acid sequence comprising a receptor nucleotide sequence, in which the receptor is capable of functioning as a receptor for infection by the virus, or a nucleic acid encoding such a polypeptide, for use in a method of treatment or prevention of a disease caused by a virus.

29. Use of a nucleic acid binding polypeptide capable of binding to a nucleic acid sequence comprising a receptor nucleotide sequence, in which the receptor is capable of functioning as a receptor for infection by the virus, or a nucleic acid encoding such a polypeptide, in the preparation of a medicament for use in the treatment or prevention of a disease caused by a virus in a patient.

30. A method according to Claim 27, a polypeptide according to Claim 28 for a use as specified therein, or a use according to Claim 29, in which the virus is Human Immunodeficiency Virus (HIV).

31. A method according to Claim 27 or 30, a polypeptide according to Claim 28 or 30 for a use as specified therein, or a use according to Claim 29 or 30, in which the zinc finger polypeptide comprises a polypeptide according to any of Claims 1 to 8 and 13 to 20.

32. A method of treating or preventing a disease in a patient associated with an immune response, the method comprising administering a nucleic acid binding polypeptide capable of binding to a nucleic acid sequence comprising a receptor nucleotide sequence, in which the receptor is involved in the immune response, or a nucleic acid encoding such a polypeptide, to the patient.

33. A polypeptide capable of binding to a nucleic acid sequence comprising a receptor nucleotide sequence, in which the receptor is involved in an immune response, or a nucleic acid encoding such a polypeptide, for use in a method of treatment or prevention of a disease associated with an immune response.

34. Use of a nucleic acid binding polypeptide capable of binding to a nucleic acid sequence comprising a receptor nucleotide sequence, in which the receptor is involved in an immune response, or a nucleic acid encoding such a polypeptide, in the preparation of a medicament for use in the treatment or prevention of a disease associated with an immune response.

35. A method according to Claim 32, a polypeptide according to Claim 33 for a use as specified therein, or a use according to Claim 34, in which the disease is an autoimmune disease.

36. A method according to Claim 32 or 35, a polypeptide according to Claim 33 or 35 for a use as specified therein, or a use according to Claim 34 or 35, in which the zinc finger polypeptide comprises a polypeptide according to any of Claims 1 to 3 and 9 to 20.

37. A method of downregulating the expression of a receptor involved in viral infection or an immune response, the method comprising: (a) providing a nucleic acid binding polypeptide capable of binding to a nucleic acid sequence comprising a nucleotide sequence of the receptor; (b) providing a native nucleic acid sequence comprising one or more nucleotide sequences capable of being bound by the nucleic acid binding polypeptide; and (c) contacting the nucleic acid binding polypeptide with the native nucleic acid sequence.

38. A method of reducing viral titre in a system comprising a host cell and a virus, the method comprising administering to the system or any of its components a polypeptide capable of binding to a nucleotide sequence of a receptor involved in viral infection, or a nucleic acid encoding such a polypeptide.

39. A method of downregulating a viral function in a cell, the method comprising contacting the cell with a polypeptide capable of binding to a nucleotide sequence of a receptor involved in viral infection, or a nucleic acid encoding such a polypeptide.

40. A method according to Claim 39, in which the viral function is selected from the group consisting of: viral titre, viral infectivity, viral replication, viral packaging, viral transcription, viral entry, viral attachment and viral penetration.

41. A method of modulating a viral function or an immune response in a system comprising administering a polypeptide according to any of Claims 1 to 20, or a nucleic acid according to Claim 22 or 23 to said system.