WO2014070227A1

WO2014070227A1 - Methods and reagents for identifying proximate proteins

Info

Publication number: WO2014070227A1
Application number: PCT/US2013/030261
Authority: WO
Inventors: Kyle ROUX
Original assignee: Roux Kyle
Priority date: 2012-11-03
Filing date: 2013-03-11
Publication date: 2014-05-08

Abstract

The present invention provides methods for identifying in vivo proximate proteins and reagents to use in such methods. For example, a promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells.

Description

Methods and reagents for identifying proximate proteins

Cross Reference

This application claims priority to U.S. Provisional Patent Application Serial No. 61/722145 filed November 3, 2012, incorporated by reference herein in its entirety.

Background

The elucidation of protein-protein interactions represents a significant barrier to the understanding of complex biological processes. In recent years it has become increasingly clear that the functions of many proteins can only be fully understood in the context of networks of interactions. Furthermore, the description of such networks provides keys to our understanding of disease processes. Biochemical and genetic techniques, including affinity-capture complex purification and yeast two-hybrid strategies, have provided powerful tools in the search for new molecular associations. However, these methods also display fundamental limitations. For high-throughput genetic approaches, protein interactions are commonly assessed in a cellular environment different from that in which they would normally occur, often lacking the proper machinery for post-translational modifications and the normal complement of associated binding partners, including molecular chaperones. This can lead to incomplete or erroneous datasets. Biochemical approaches suffer loss of candidates through protein insolubility and transient or weak interactions. As a consequence of these limitations, many proteins remain refractory to conventional methods used to screen for protein interactions. These issues are more relevant than ever, as we collectively look to the daunting task of unraveling the protein "interactome."

Summary of the Invention

In a first aspect, the present invention provides methods for identifying in vivo proximate proteins, comprising (a) culturing recombinant cells comprising a recombinant nucleic acid capable of directing expression of a fusion protein comprising (i) a heterologous promiscuous biotin protein ligase (BPL), or functional equivalent thereof, and (ii) a bait polypeptide, wherein the culturing is carried out under conditions suitable for expression of the fusion protein in the eukaryotic cells; and (b) identifying biotinylated proteins, wherein the biotinylated proteins are proteins present in the cells proximate to the bait polypeptide. In one embodiment, the BPL, or functional equivalent thereof, comprises a protein with an amino acid sequence of general formula 1:

X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20-X21-X22-X23-X24-X25-X26-X27-X28-X29-X30-X31-X32 (SEQ ID NO: 13), wherein

X1 is selected from the group consisting of A, L, I, T, V, or is absent;

X2 is selected from the group consisting of C, L, V, H, A, or is absent;

X3 is selected from the group consisting of I, V, and L;

X4 is selected from the group consisting of A and G;

X5 is selected from the group consisting of E, D, R, T, N, V, and A;

X6 is selected from the group consisting of Y, R, K, E, and I;

X7 is Q;

X8 is selected from the group consisting of Q, T, N, F, V, and S;

X9 is selected from the group consisting of A, K, N, S, Q, and E;

X10 is G;

X11 is selected from the group consisting of R and K;

X12 is G;

X13 is any residue other than R;

X14 is selected from the group consisting of R, L, W, G, and S;

X15 is selected from the group consisting of G, Q, P, and K;

X16 is selected from the group consisting of R and N;

X17 is selected from the group consisting of K, Q, V, E, T, M, and A;

X18 is W;

X19 is selected from the group consisting of F, L, Y, E, I, and V;

X20 is selected from the group consisting of S, M, and N;

X21 is selected from the group consisting of P, Q, and D;

X22 is selected from the group consisting of F, E, A, K, Y, and V;

X23 is selected from the group consisting of G and A;

X24 is selected from the group consisting of V and C, or is absent;

X25 is selected from the group consisting of C and A, or is absent;

X26 is selected from the group consisting of A and L, or is absent;

X27 is selected from the group consisting of A, G, and S, or is absent;

X28 is selected from the group consisting of N, G, Q, T, and C;

X29 is selected from the group consisting of L, I, A, and F;

X30 is selected from the group consisting of Y, M, A, V, and L;

X31 is selected from the group consisting of L, G, V, I, and F; and

X32 is selected from the group consisting of S, T, and F.

In another aspect, the present invention provides recombinant nucleic acids, comprising (a) a first nucleic acid domain encoding a promiscuous biotin protein ligase (BPL); and (b) a second nucleic acid domain encoding a bait polypeptide. In one embodiment, the first nucleic acid domain encodes a protein comprising an amino acid sequence of general formula 1:

X1 is selected from the group consisting of A, L, I, T, V, or is absent;

X2 is selected from the group consisting of C, L, V, H, A, or is absent;

X3 is selected from the group consisting of I, V, and L;

X4 is selected from the group consisting of A and G;

X5 is selected from the group consisting of E, D, R, T, N, V, and A;

X6 is selected from the group consisting of Y, R, K, E, and I;

X7 is Q;

X8 is selected from the group consisting of Q, T, N, F, V, and S;

X9 is selected from the group consisting of A, K, N, S, Q, and E;

X10 is G;

X11 is selected from the group consisting of R and K;

X12 is G;

X13 is any residue other than R;

X14 is selected from the group consisting of R, L, W, G, and S;

X15 is selected from the group consisting of G, Q, P, and K;

X16 is selected from the group consisting of R and N;

X17 is selected from the group consisting of K, Q, V, E, T, M, and A;

X18 is W;

X19 is selected from the group consisting of F, L, Y, E, I, and V;

X20 is selected from the group consisting of S, M, and N;

X21 is selected from the group consisting of P, Q, and D;

X22 is selected from the group consisting of F, E, A, K, Y, and V;

X23 is selected from the group consisting of G and A;

X24 is selected from the group consisting of V and C, or is absent;

X25 is selected from the group consisting of C and A, or is absent;

X26 is selected from the group consisting of A and L, or is absent;

X27 is selected from the group consisting of A, G, and S, or is absent;

X28 is selected from the group consisting of N, G, Q, T, and C;

X29 is selected from the group consisting of L, I, A, and F;

X30 is selected from the group consisting of Y, M, A, V, and L;

X31 is selected from the group consisting of L, G, V, I, and F; and

X32 is selected from the group consisting of S, T, and F.

In another aspect, the present invention provides recombinant expression vectors comprising a recombinant nucleic acid of the invention. In another aspect, the invention provides recombinant host cells comprising a recombinant expression vector of the invention. In another aspect, the invention provides a transgenic, non-human organism comprising a recombinant host cell of the invention.

In a further aspect, the present invention provides recombinant fusion proteins, comprising (a) a first domain encoding a biotin protein ligase (BPL); and (b) a second domain encoding a bait polypeptide. In one embodiment, the first domain comprises an amino acid sequence of general formula 1:

X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20-X21-X22-X23-X24-X25-X26-X27-X28-X29-X30-X31-X32 (SEQ ID NO: 13), wherein

X1 is selected from the group consisting of A, L, I, T, V, or is absent;

X2 is selected from the group consisting of C, L, V, H, A, or is absent;

X3 is selected from the group consisting of I, V, and L;

X4 is selected from the group consisting of A and G;

X5 is selected from the group consisting of E, D, R, T, N, V, and A;

X6 is selected from the group consisting of Y, R, K, E, and I;

X7 is Q;

X8 is selected from the group consisting of Q, T, N, F, V, and S;

X9 is selected from the group consisting of A, K, N, S, Q, and E;

X10 is G;

X11 is selected from the group consisting of R and K;

X12 is G;

X13 is any residue other than R;

X14 is selected from the group consisting of R, L, W, G, and S;

X15 is selected from the group consisting of G, Q, P, and K;

X16 is selected from the group consisting of R and N;

X17 is selected from the group consisting of K, Q, V, E, T, M, and A;

X18 is W;

X19 is selected from the group consisting of F, L, Y, E, I, and V;

X20 is selected from the group consisting of S, M, and N;

X21 is selected from the group consisting of P, Q, and D;

X22 is selected from the group consisting of F, E, A, K, Y, and V;

X23 is selected from the group consisting of G and A;

X24 is selected from the group consisting of V and C, or is absent;

X25 is selected from the group consisting of C and A, or is absent;

X26 is selected from the group consisting of A and L, or is absent;

X27 is selected from the group consisting of A, G, and S, or is absent;

X28 is selected from the group consisting of N, G, Q, T, and C;

X29 is selected from the group consisting of L, I, A, and F;

X30 is selected from the group consisting of Y, M, A, V, and L;

X31 is selected from the group consisting of L, G, V, I, and F; and

X32 is selected from the group consisting of S, T, and F.

Description of the Figures

Figure 1. Model for application of BioID method. (a) Expression of a promiscuous biotin-ligase fusion protein in live cells leads to the selective biotinylation of proteins proximate to that fusion protein. After stringent cell lysis and protein denaturation, biotinylated proteins are affinity purified. These candidate proteins can be identified by mass spectrometry or immunoblot analysis. (b) In our application of BioID™ to LaA to identify candidate proteins we used HEK293 cells stably expressing inducible mycBirA*LaA. 24 h before lysis, cells were induced to express mycBirA*LaA with doxycycline and to biotinylate endogenous proteins with 50 μM biotin. Cells were lysed under stringent conditions and biotinylated.

Figure 2. BirA* promiscuously biotinylates endogenous proteins in mammalian cells. HeLa cells were analyzed 24 h after transient transfection with myc-BirA-WT or myc-BirA* (R118G). After transfection, cells were cultured either with or without supplemental biotin (50 μM). (a) By Western blot analysis similar levels of the exogenous BirA (asterisk) are detected in all samples with anti-myc. Biotinylated proteins, including both exogenous BirA (asterisk) and endogenous proteins, were detected with HRP-streptavidin. Enhanced protein biotinylation is observed in the myc-BirA* samples as compared with the WT isoform. This difference is dramatically enhanced by the presence of excess biotin. (b) By fluorescence microscopy the BirA is predominantly nuclear as observed with anti-myc. Biotinylated proteins were detected with fluorescently labeled streptavidin. Considerable biotinylation is only observed in cells expressing myc-BirA* and supplemented with excess biotin. The biotin signal predominantly co-localizes with myc-BirA*. DNA is labeled with Hoechst. Bar, 4 μm.

Figure 3. Proximity-dependent promiscuous biotinylation by BioID-LaA. HEK293 cells inducibly expressing myc-BirA*LaA, or parental controls, were analyzed 24 h after induction with or without excess biotin. (a) By immunoblot analysis the LaA fusion protein (asterisk) is detected with anti-myc. In control cells levels of endogenously biotinylated proteins are unaffected by the supplemental biotin; however, the biotinylation of endogenous proteins by myc-BirA*LaA is dramatically enhanced in the presence of excess biotin. (b) The myc-BirA*LaA is detected at the nuclear rim and to a lesser extent in the nucleoplasm. Biotinylated proteins colocalize with the LaA fusion protein. DNA is labeled with Hoechst. Bar, 5 μm. (c) To monitor the relative rate of biotinylation by BioID, myc-BirA*LaA HEK293 cells were provided with 50 μM biotin at different time points. The levels of myc-BirA*LaA (asterisk in panel a) are similar for all conditions; however, the extent of biotinylation increases with duration of biotin supplementation until it is saturated by 24 h. (d) Similar results were observed by fluorescence microscopy. Myc-BirA*LaA was detected with anti-myc and biotinylated proteins with fluorescently labeled streptavidin. DNA was labeled with Hoechst. Bar, 25 μm. (e) The identity of the proteins biotinylated by BioID-LaA was determined by mass spectrometry of proteins isolated with streptavidin-coupled magnetic beads. The relative abundance (percentage of total identified spectra adjusted for protein size, identified spectra per 10³ amino acids) and classification of proteins uniquely biotinylated by myc-BirA*LaA is depicted in the chart. Conservatively estimated, 50% of the detected proteins predominantly reside at the nuclear lamina, INM, or nucleoplasmic face of the NPCs.

Figure 4. SLAP75 is a novel peripheral membrane constituent of the NE identified with BioID™-LaA. To determine if FAM169A/SLAP75 is a novel peripheral membrane constituent of the NE, its subcellular localization was analyzed by immunofluorescence microscopy. (a) As detected with anti-SLAP75, the endogenous protein is co-localized with LaA at the NE of HEK293 cells. (b) Although endogenous SLAP75 is not detected in HeLa cells (not depicted), transiently expressed HA-SLAP75 detected with anti-FAM169A/SLAP75 is localized to the NE, labeled with anti-Nup153. DNA was labeled with Hoechst. Bar, 10 μm.

Figure 5. Comparison of G1- and G2-BioID™ by immuno- and streptavidin-blot analysis. To compare the relative efficacy of the 2nd generation (G2) of BioID™ to the 1st (G1), we transiently transfected human HEK293 cells with myc-tagged G1 (*) or G2 (**) BioID™ in the presence or absence of exogenous biotin. Protein lysates from these cells prepared 24 hours after transfection were separated by SDS-PAGE and analyzed by immuno- and streptavidin-blotting. As compared to control samples without BioID™ expression, similar levels of biotinylation of endogenous proteins were observed with G1 and G2.

Figure 6. Comparison of G1- and G2-BioID™ by fluorescence microscopy. To compare the relative efficacy of the 2nd generation (G2) of BioID™ to the 1st (G1), we transiently transfected human U2OS cells with myc-tagged G1 (*) or G2 (**) BioID™ in the presence or absence of exogenous biotin. 24 hours after transfection, we fixed and analyzed these cells by immuno- and streptavidin-fluorescence. In cells expressing either BioID™-fusion protein, biotinylation of endogenous proteins was observed.

Figure 7. Comparison of G1- and G2-BioID™-LaA by immuno- and streptavidin-blot analysis. To compare the relative efficacy of the 2nd generation (G2) of BioID™ to the 1st (G1), we generated human HEK293 cells that stably express myc-tagged G1 (*) or G2 (**) BioID™-LaA. Protein lysates from these cells prepared 24 hours after the addition of excess biotin were separated by SDS-PAGE and analyzed by immuno- and streptavidin-blotting. As compared to control samples without BioID™-LaA expression, a similar level and pattern of biotinylation of endogenous proteins was observed with G1- and G2-BioID™-LaA.

Figure 8a through 8c. Exemplary Biotin Protein Ligase Amino Acid Sequences. A list of BPL amino acid sequences from exemplary species.

Figure 9. Alignment of exemplary Biotin Protein Ligase Amino Acid Sequences. A ClustalW alignment of BPL sequences from exemplary species. The conserved BPL active site is bracketed.

Detailed Description of the Invention

All references cited are herein incorporated by reference in their entirety.

Within this application, unless otherwise stated, the techniques utilized may be found in any of several well-known references such as: Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press), Gene Expression Technology (Methods in Enzymology, Vol. 185, edited by D. Goeddel, 1991. Academic Press, San Diego, CA), "Guide to Protein Purification" in Methods in Enzymology (M.P. Deutscher, ed., (1990) Academic Press, Inc.); PCR Protocols: A Guide to Methods and Applications (Innis, et al. 1990. Academic Press, San Diego, CA), Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R.I. Freshney. 1987. Liss, Inc. New York, NY), Gene Transfer and Expression Protocols (pp. 109-128, ed. E.J. Murray, The Humana Press Inc., Clifton, N.J.), and the Ambion 1998 Catalog (Ambion, Austin, TX).

As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. "And" as used herein is interchangeably used with "or" unless expressly stated otherwise.

As used herein, the term "about" means +/- 5% of the relevant unit of measurement.
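As a worked illustration of this definition, the following Python sketch computes the interval that "about" covers for a stated value. The function name `about_range` is hypothetical and is not part of the disclosure; it simply applies the +/- 5% rule stated above.

```python
def about_range(value, percent=5.0):
    """Return the (low, high) interval for "about" a value, read as +/- 5%."""
    delta = abs(value) * percent / 100
    return value - delta, value + delta


# Example: the interval covered by "about 25 nm"
low, high = about_range(25.0)
print(f"about 25 nm spans {low} nm to {high} nm")
```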

As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V).

All embodiments of any aspect of the invention can be used in combination, unless the context clearly dictates otherwise.

In a first aspect, the present invention provides methods for identifying in vivo proximate proteins, comprising
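For readers who handle sequences programmatically, the abbreviations can be captured in a small lookup table. The sketch below uses the standard three-letter and one-letter codes for the twenty amino acids named above; the helper name `to_one_letter` and the hyphen-separated input format are assumptions made for illustration only.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Asn": "N", "Asp": "D", "Arg": "R", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq):
    """Convert a hyphen-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[code] for code in three_letter_seq.split("-"))


print(to_one_letter("Gly-Arg-Gly"))
```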

(a) culturing recombinant cells comprising recombinant nucleic acids capable of directing expression of a fusion protein comprising (i) a heterologous promiscuous biotin protein ligase (BPL), or functional equivalent thereof, and (ii) a bait polypeptide, wherein the culturing is carried out under conditions suitable for expression of the fusion protein in the eukaryotic cells; and

(b) identifying biotinylated proteins, wherein the biotinylated proteins are proteins present in the cells proximate to the bait polypeptide.

The methods and reagents of the present invention can be used, for example, for proximity-dependent labeling of proteins in live cells. Named "BioID™" for proximity-dependent biotin identification, this approach is based on fusion of a promiscuous biotin protein ligase to a targeting protein (the "bait polypeptide"). BioID™ features proximity-dependent biotinylation of proteins that are near-neighbors of the fusion protein. The examples presented herein demonstrate that the methods and reagents of the invention provide a generally applicable method to screen for both interacting and neighboring proteins in their native cellular environment.

BioID™ is effective when applied to insoluble and membrane associated proteins, two classes of proteins that may be refractory to screening with conventional approaches. Additional advantages include the ability to identify weak and/or transient interactions, to screen for interactions in a relatively natural cellular setting, and temporal inducibility of biotin labeling.

In one embodiment the proteins present in the cells proximate to the bait polypeptide are candidate proteins for interacting with the bait polypeptide.

As used herein, the term "proximate" means within at least about 25 nm or closer. In a further embodiment, the biotinylated proteins are proteins present in the cells within at least about 20 nm of the bait polypeptide. In various further embodiments, the biotinylated proteins are proteins present in the cells within at least about 19, 18, 17, 16, 15, 14, 13, 12, 11, 10 nm of the bait polypeptide, or closer.

The methods can be used with any recombinant cell type that is capable of expressing the fusion protein. The cells may be prokaryotic or eukaryotic. Non-limiting embodiments include bacterial cells, trypanosomal cells, protozoan cells, fungal cells, and mammalian cells. As used herein, "heterologous" means that the BPL is not naturally expressed in the cell (i.e., the gene encoding the promiscuous biotin protein ligase is recombinantly introduced in the cell).

The BPL is "promiscuous," in that it is not selective for its endogenous substrate, and thus shows improved proximity-dependent biotinylation of proteins. In one embodiment, the promiscuous BPL may be a naturally occurring BPL fragment; in another embodiment, the promiscuous biotin protein ligase is a mutated biotin protein ligase, or a fragment thereof.

Any suitable promiscuous BPL can be used in the methods of the invention. Virtually every organism has a BPL that has been identified and its amino acid sequence demonstrated. Any such promiscuous BPL fragment, or BPL mutation can be used in the methods of the invention, so long as it is not endogenous to the recombinant cell.

Thus, in one embodiment the BPL is a fragment of a wild-type BPL, including but not limited to a fragment of any of the BPLs in Figure 9, or functional equivalent thereof. The functional equivalent can be any modification to the promiscuous BPL that retains BPL activity (e.g., insertions, deletions, or point mutations). In one embodiment, the BPL is deleted for all or part of its DNA binding domain. The DNA binding domain is not conserved beyond some prokaryotes and is fragmented in some BPLs. In a non-limiting example, the DNA binding domain of the E. coli BPL (SEQ ID NO: 1) is found between residues 22-46. As used herein, "deleted for" includes both BPLs that naturally lack all or a part of their DNA binding domain, and BPLs that are engineered to delete all or a part of their DNA binding domain. In other embodiments, the BPL may be full length but with one or more mutations, such as mutations to increase "promiscuity" of the BPL. Exemplary such embodiments are shown in Figure 8, which possess an R to G mutation to increase promiscuity (SEQ ID NOS: 1-4, 6-12). One such embodiment is the E. coli BirA protein (SEQ ID NO: 1).
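To make the notion of a BPL "deleted for" a residue range concrete, the following Python sketch removes a 1-based, inclusive range such as residues 22-46 from a sequence string. The function name `delete_residues` and the toy sequence are hypothetical placeholders; the toy string is not a real BPL sequence and is used only to show the index arithmetic.

```python
def delete_residues(sequence, first, last):
    """Remove residues first..last (1-based, inclusive) from a sequence."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("residue range lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence first - 1 and last.
    return sequence[:first - 1] + sequence[last:]


# Toy 60-residue placeholder, NOT a real BPL sequence:
# position 1 is M, positions 2-21 are A, 22-46 are D, 47-60 are K.
toy = "M" + "A" * 20 + "D" * 25 + "K" * 14
trimmed = delete_residues(toy, 22, 46)
print(len(toy), len(trimmed))
```

Deleting only part of the range works the same way, for example `delete_residues(toy, 30, 46)`.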

In one embodiment, the BPL comprises a protein with an amino acid sequence of general formula 1:

X1 is selected from the group consisting of A, L, I, T, V, or is absent;

X2 is selected from the group consisting of C, L, V, H, A, or is absent;

X3 is selected from the group consisting of I, V, and L;

X4 is selected from the group consisting of A and G;

X5 is selected from the group consisting of E, D, R, T, N, V, and A;

X6 is selected from the group consisting of Y, R, K, E, and I;

X7 is Q;

X8 is selected from the group consisting of Q, T, N, F, V, and S;

X9 is selected from the group consisting of A, K, N, S, Q, and E;

X10 is G;

X11 is selected from the group consisting of R and K;

X12 is G;

X13 is any amino acid other than R;

X14 is selected from the group consisting of R, L, W, G, and S;

X15 is selected from the group consisting of G, Q, P, and K;

X16 is selected from the group consisting of R and N;

X17 is selected from the group consisting of K, Q, V, E, T, M, and A;

X18 is W;

X19 is selected from the group consisting of F, L, Y, E, I, and V;

X20 is selected from the group consisting of S, M, and N;

X21 is selected from the group consisting of P, Q, and D;

X22 is selected from the group consisting of F, E, A, K, Y, and V;

X23 is selected from the group consisting of G and A;

X24 is selected from the group consisting of V and C, or is absent;

X25 is selected from the group consisting of C and A, or is absent;

X26 is selected from the group consisting of A and L, or is absent;

X27 is selected from the group consisting of A, G, and S, or is absent;

X28 is selected from the group consisting of N, G, Q, T, and C;

X29 is selected from the group consisting of L, I, A, and F;

X30 is selected from the group consisting of Y, M, A, V, and L;

X31 is selected from the group consisting of L, G, V, I, and F; and

X32 is selected from the group consisting of S, T, and F.

Proteins according to general formula 1 include the BPL active site based on a number of known BPLs (see Figure 9, sequence alignment; residues 511 to 542). In many BPLs, the X13 position is R; mutating this to any other amino acid helps to make the resulting BPL more promiscuous. In an exemplary embodiment, X13 is G. As will be understood by those of skill in the art, there are many BPLs in which the active site may include residues different from those recited in general formula 1. Thus, this embodiment is an exemplary embodiment and is not intended to be limiting on the scope of the invention.
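One way to test a candidate peptide against general formula 1 is to encode the per-position residue sets as a regular expression. The sketch below is a minimal illustration, not a definitive implementation: the residue sets are transcribed from the formula, "any residue other than R" at X13 is assumed to mean any of the other nineteen standard amino acids, and the example peptide is a hypothetical string chosen only because it satisfies the pattern.

```python
import re

STANDARD = "ACDEFGHIKLMNPQRSTVWY"

# (allowed residues, may be absent) for X1..X32 of general formula 1.
FORMULA_1 = [
    ("ALITV", True), ("CLVHA", True), ("IVL", False), ("AG", False),
    ("EDRTNVA", False), ("YRKEI", False), ("Q", False), ("QTNFVS", False),
    ("AKNSQE", False), ("G", False), ("RK", False), ("G", False),
    (STANDARD.replace("R", ""), False),  # X13: any residue other than R
    ("RLWGS", False), ("GQPK", False), ("RN", False), ("KQVETMA", False),
    ("W", False), ("FLYEIV", False), ("SMN", False), ("PQD", False),
    ("FEAKYV", False), ("GA", False), ("VC", True), ("CA", True),
    ("AL", True), ("AGS", True), ("NGQTC", False), ("LIAF", False),
    ("YMAVL", False), ("LGVIF", False), ("STF", False),
]

# One character class per position; optional positions get a trailing "?".
PATTERN = re.compile("".join(
    f"[{allowed}]" + ("?" if optional else "")
    for allowed, optional in FORMULA_1
))


def matches_formula_1(peptide):
    """Return True if the whole peptide fits general formula 1."""
    return PATTERN.fullmatch(peptide) is not None


# Hypothetical peptide with G at X13 (X24, X26 and X27 absent).
example = "ACIAEYQQAGRGGRGRKWFSPFGANLYLS"
# Same peptide with R restored at X13, which the formula excludes.
r_at_x13 = example[:12] + "R" + example[13:]

print(matches_formula_1(example), matches_formula_1(r_at_x13))
```

Using `fullmatch` rather than `search` keeps the check strict: the peptide must consist of the formula positions and nothing else.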

In various embodiments, at least one of the following is true of the BPL of general formula 1 (SEQ ID NO: 17):

X4 is A;

X8 is T or S;

X13 is G;

X15 is G;

X19 is L, I, or V;

X20 is S;

X21 is P;

X23 is G;

X29 is L or I;

X30 is A, V, or L;

X31 is L, G, V, or I; and/or

X32 is S.

In another embodiment, the BPL comprises or consists of an amino acid sequence according to SEQ ID NO: 14, as represented in Table 1 below. Numbering in the Table is based on the numbering in the sequence alignment shown in Figure 9.

Proteins with an amino acid sequence according to SEQ ID NO: 14 include the BPL active site and additional residues required for BPL activity based on a number of known BPLs (see Figure 9; residues 506 to 796). As will be understood by those of skill in the art, there are many BPLs in which the amino acid sequence required for activity may include residues different from those recited in SEQ ID NO: 14.

Thus, this embodiment is an exemplary embodiment and is not intended to be limiting on the scope of the invention.

Position Residues Position Residues

506 P, T, Q, absent 507 E, N, absent

508 S, T, M, absent 509 T, G, absent

510 T, F, L, absent 511 A, L, I, V, T, absent

512 C, L, V, H, A, absent 513 I, V, L

514 A, G 515 E, D, R, T, N, V, A

516 Y, R, K, E, I 517 Q

518 T, S, Q, N, V, F 519 A, K, N, S, Q, E

520 G 521 R, K

522 G 523 Any AA other than R

524 R, L, W, G, S 525 G, K, Q, P

526 R,N 527 K, Q, V, E, T, M, A

528 W 529 L, V, I, F, Y, E

530 S,M,N 531 P,Q,D

532 F, E, A, K, Y, V 533 G, A

534 V, C, absent 535 C, A, absent

536 A, L, absent 537 A, G, S, absent

538 N,G, Q, T,C 539 L, I, A, F

540 A,V, L,Y,M 541 L, G,V,I,F

542 R, T, F 543 , F, Γ, 1

544 F, L, Y, V, P, I, S 545 L, I, V, W

546 R,N,A,D,Q 547 L, P, F, S, V, M

548 F., K, Π, Q, P 549 1., G, V, A, Q, F, P,D

550 G, F, K, Y, T 551 N, R, Q, absent

552 R, F, V, absent 553 N, S, absent

554 T, absent 555 T, I, absent

556 S, P, absent 557 V, I, absent

558 P, A, V, absent 559 A, E, K, P, L, F

560 A,V,L,N, S,K 561 A,L,I,V,Q

562 I, L, E, A, Y, H, absent 563 G, L, V, Q, absent

564 L, I, V, S, M 565 S,P,N, M, A

566 L, V, F 567 V,L,A

568 A,I,L, V,Y 569 G, P,A,C,U,T

570 I, L, V, K, R, E 571 V, A, G, S, C,E

572 V, I, L, A, M 573 A, I, L,V,S,Y,R,K

574 D, E, S, R 575 V, A, I, Y, S

576 L, A, I, C 577 R, E, N, K, G, P, H

578 K, E, N, L, G, E, absent 579 L, I, Y, A, L, F, ., absent

580 G, T, V, S, C, E, Q, absent 581 A, E, D, G, N, L, absent

582 P, absent 583 F, A, absent

584 I, A, L, D, K, absent 585 K, P, E, R, D, N, absent

586 V, L, A, F 587 R, K, S, Q, G, A, F

588 V, L, I 589 K

590 W 591 P

592 N 593 D

594 , V 595 Y, L, M

596 L,V,A,F,Y 597 Q, D, , N, L, R, S

598 D, E, V, G, S 599 N, P, D, L, absent

600 T, K, absent 601 Y, G, absent

602 Y, G, absent 603 K, absent

604 R, absent 605 , absent

606 N, absent 607 L, absent

608 K, absent 609 L, absent

610 V, absent 611 N, absent

612 T, absent 613 G, absent

614 F, absent 615 E, absent

616 H, absent 617 T, absent

618 K, absent 619 L, absent

620 P, absent 621 L, I, absent

622 G, N, absent 623 D, F, absent

624 I, Q, absent 625 E, G, absent

626 P, K, D, absent 627 N, A, Q, absent

628 Y, absent 629 R, , G, L, M

630 K 631 L, V, I

632 A,G,S 633 G

634 I, V 635 L, I

636 V, C, T 637 E, N, T

638 S, L,I 639 I, S,A,H,N

640 G, N, absent 641 K,S,H,F, Y, L, absent

642 T, K, G, I, R, V, M, absent 643 G, D, N, P, K, S, absent

644 D, , G, N, E 645 A, M, V, K, T, absent

646 A, L, D, Q, A, Y, F, absent 647 Q, Y, A, C, H, N, absent

648 I, L, V, M 649 A,V,I,L

650 I, L, A, V 651 G

652 A,1,C 653 G

654 I, V, L, F 655 N

656 V, L, M 657 A, N, S, T, D

658 N,Q,M,L, G,S 659 I^S.Q^A.D.L.

660 R, E, K, I, P, G, N 661 R, D, P, absent

662 A, T, absent 663 V, G, I, T, absent

664 E, T, A, S, C, absent 665 V, 1., I, E, P, Q, absent

666 Q,N,S,E, T,D 667 V,E,P, T, A, D

668 V, I, G, L, W 669 N, K, S, R, A, I, V, Q

670 Q,D,E,T,R 671 G, R, I, P, E

672 W, A, Y, P, ί, H 673 J, T, A, V, N, S

674 T, E, S, K, absent 675 E, N, absent

676 R, absent 677 Q, absent

678 Q, absent 679 L, V, S, Q, absent

680 Q, Y, C, K, H, D, N, absent 681 E,G,L,N, P,

682 A, I, E, D, P 683 G, T, D, L, R, A, G, E

684 I, G, L, P, S 685 N,K,D,H,T,P,E,Q,K

686 D, E, A, K, P, absent 687 L, V, I, W, F

688 D, E, K, S, R, Q 689 R, K, P, A, F

690 E,D,N,K,Q,V 691 T,E,M,K,C,Y

692 L, V, I, F 693 A, 1, L, Q

694 A,L, P, S, D,T 695 M, K, N, L, S, A, R

696 L, V, A, F, Y 697 V, A, I, L, M, Γ

698 R, K, Q, P, Y, N, T 699 E, R, H, T, A, K, N, Q, V

700 L,I,F 701 D, E, R, S, Y, A

702 A, E, T, S, R, V, K 703 A, Γ, I,, N, R, Y, W

704 L, I, Q, H, Y 705 E, K, N, A, R, D

706 L, K, I, Q, R, E 707 F, L

708 E,K,D,I,L,Q 709 Q, E, N, T, D

710 E, K, N, Y, Q 711 G, S,E

712 L, F, 1, A, P 713 A, K, D, S, Q, N

714 P, E, D, L, S, O 715 E, Y, absent

716 E, absent 717 I, L, V, Y, F

718 L, K, Q, R, E 719 S, Q, G, K, N, P, E

720 R, , A, S, E, L 721 Y, W,I

722 E, Q, K, L, Y 723 R, ,S,A,E,Q

724 L, K, Y, R, T 725 D, M, N, S, A, W

726 L,A,V,I,N 727 H, R, F, Y, T

728 I,L, S,G 729 G, N, D

730 R, E, S, K, Q 731 P, E, Q, I, T, R

732 V,I 733 K, R, H, N, I,T

734 L,V, AJ 735 G, A, I, L, R, P, Q

736 I,G,T, L,D,S,E 737 G, E, S, H, N, T, A

738 D, G, Q, N, T, S, E 739 K,G, N, A

740 E, D, absent 741 Q, absent

742 M, absent 743 V, absent

744 E, V, T, P, absent 745 S, I, Q,K, A, absent

746 F,T,S, H, A, V 747 G, I, V_: , S

748 Ι,Τ,Κ 749 S,L, E, V, F, T,Q,I

750 R, V, Q, Y, E, G 751 G, D, I

752 L, I, T 753 D, E, S, T

754 D, E, K, F, T, S 755 Q, K, R, D, S, Y, F

756 G 757 A, G, Y, L, F, absent

758 L, A, I 759 L, I, V, K, Q

760 V,A,I,L 761 E, L,G,R, K,Q,I,H

762 Q, T, C, O, E, L 763 E, G, P, L, absent

764 D,E,N,K,A,V,G 765 G,K,S,E,N

766 G, N, V, E, absent 767 S, N, absent

768 S, absent 769 T, absent

770 Q, E, absent 77 L F, P, absent

772 T, absent 773 G, T, absent

774 N, K, absent 775 V, M, absent

776 I, V, R, Y, C, absent 777 I, Q, E, N, H, S, T

778 V, L, K, M, R 779 P, E,F,I,Q,H

780 P, W, I, N 781 M, L, S, A, D, absent

782 G, S 783 G, T, E, N

784 E,C,V, T, S,R 785 F,I, V,N

786 S, H, F, D, N 787 L, V, G, I, M, absent

788 L, G, R, F, M, D 789 R, It, H, absent

790 S, N, Q, absent 791 L, V, absent

792 S, I, R, absent 793 A, V, L, K, T, absent

794 E, K, R, P, absent 795 K, A, absent

796 V, T, L, R, absent

Residues X1-X32 of general formula 1 are shown in Table 1 at residues 511- 542. In one embodiment, the BPL for use in the invention comprises any residue other than R, including but not limited to a G residue at position 523 (XI 3 in general formula 1 ). In this embodiment, the invariant R residue in wild type BPLs at position X13 is modified, which results in enhanced promiscuity of the resulting BPL
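The correspondence between the formula positions and the Table 1 numbering is a fixed offset, which the short Python sketch below makes explicit. The function names are hypothetical; the only facts used are that X1-X32 correspond to alignment positions 511-542.

```python
FORMULA_OFFSET = 510  # X1 corresponds to alignment position 511


def alignment_position(formula_index):
    """Map Xn of general formula 1 (n = 1..32) to its Table 1 position."""
    if not 1 <= formula_index <= 32:
        raise ValueError("general formula 1 has positions X1 to X32")
    return FORMULA_OFFSET + formula_index


def formula_index(alignment_pos):
    """Map a Table 1 position (511..542) back to n of Xn."""
    if not 511 <= alignment_pos <= 542:
        raise ValueError("position lies outside the general formula 1 span")
    return alignment_pos - FORMULA_OFFSET


print(alignment_position(13), formula_index(523))
```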

In one embodiment, the BPL compiises or consists of a BPL comprising or consisting of an amino acid sequence selected from the group consisting of SEQ ID NO: 1-4 and 6-12. These amino acid sequences are also shown in Figure 8a through 8c (SEQ ID NOs: 1 -4 and 6-12). In specific embodiments, the E. coli BirA* protein of SEQ ID NO: 1 or the A. eolicus BPL protein mutant of SEQ ID NO: 2, or functional equivalents thereof, are used

Similarly, the bait polypeptide may be any bait polypeptide deemed suitable for a given use. The bait polypeptide can be anything that will directly or indirectly target the fusion protein to specific cellular regions. It can be a full length protein or any fragment that imparts a targeting or binding property. In various non-limiting embodiments, the bait polypeptide may comprise lamin-A (P02545; SEQ ID NO: 18), frataxin (Q16595; SEQ ID NO: 19), sun2 (Q9UH99; SEQ ID NO: 20), Nup107 (P57740; SEQ ID NO: 21), torsin-A (O14656; SEQ ID NO: 22), or fragments or homologues thereof.

The BPL and the bait polypeptide may be immediately adjacent in the fusion protein, or may be separated by any number of additional amino acids as desirable for a given purpose. Including such a "linker" region between the BPL and the bait polypeptide may, for example, extend the range of biotinylation and/or provide the enzyme with increased accessibility to label adjacent proteins. The BPL protein may be anywhere within the fusion protein, even internal between other domains N-terminal and C-terminal to the promiscuous BPL.

The recombinant cells are capable of expressing the fusion protein; thus, the recombinant cells comprise nucleic acids encoding the fusion protein, including but not limited to recombinant nucleic acids and vectors as described below. In one embodiment, the fusion protein-encoding nucleic acids in the recombinant cells may be present in an expression vector, such as a plasmid or viral-based vector. Any expression vector suitable for an intended use can be used in the present invention. Such expression vectors can be of any type known in the art, including but not limited to plasmid and viral-based expression vectors. The construction of expression vectors for use in transfecting prokaryotic and eukaryotic cells is also well known in the art, and thus can be accomplished via standard techniques. (See, for example, Sambrook, Fritsch, and Maniatis, in: Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989; Gene Transfer and Expression Protocols, pp. 109-128, ed. E.J. Murray, The Humana Press Inc., Clifton, N.J.; and the Ambion 1998 Catalog (Ambion, Austin, TX).) The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In a preferred embodiment, the expression vector comprises a plasmid. However, the invention is intended to include other expression vectors that serve equivalent functions, such as viral vectors. Specifics of the expression vector will depend on the ultimate desired use. Designing appropriate expression vectors for an intended use is well within the level of those of skill in the art based on the teachings herein.

The fusion protein-encoding nucleic acids in the expression vector are under control of a promoter capable of directing expression of the encoded fusion protein in the recombinant cell. Any suitable promoter may be used that can direct expression (i.e.: is "operatively linked") of the encoded proteins. The term "promoter" includes any nucleic acid sequence sufficient to direct expression of the encoded protein(s), including inducible promoters, repressible promoters and constitutive promoters. If inducible, there are sequences present which mediate regulation of protein expression so that the polynucleotide is transcribed only when an inducer molecule is present. Such cis-active sequences for regulated expression of an associated polynucleotide in response to environmental signals are well known to the art. The expression vector may comprise any other control sequences as may be suitable for an intended use. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered "operably linked" to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, enhancers, termination signals, and ribosome binding sites.

The promoter sequence used to drive expression of the fusion protein may be constitutive (driven by any of a variety of promoters, including but not limited to, CMV, SV40, RSV, actin, EF) or inducible (driven by any of a number of inducible promoters including, but not limited to, tetracycline, ecdysone, steroid-responsive, etc.).

In another embodiment, the fusion protein-encoding nucleic acids may be integrated into cellular chromosomal DNA. In this embodiment, the fusion protein-encoding nucleic acids may comprise a "knock-in" of the fusion protein-encoding nucleic acids into an endogenous gene. The fusion protein-encoding nucleic acid may be knocked-in to any suitable cellular chromosomal gene. In one embodiment, the knock-in comprises a mouse knock-in to any suitable mouse cellular chromosomal gene.

In another embodiment, the method comprises supplementation of the culture medium with biotin. This embodiment is particularly useful when the recombinant cells do not endogenously produce biotin. Based on the teaching provided herein, it will be well within the level of skill in the art to determine appropriate timing and concentration of biotin supplementation of the culture medium, based on specifics of the assay to be carried out. Thus, in embodiments where biotin is added to the culture medium, it can be added at any suitable concentration, including but not limited to those described in the examples that follow. In one non-limiting embodiment, biotin can be added at concentrations ranging from about 1 μM to about 100 μM, or at about 50 μM.
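As a worked illustration of the supplementation arithmetic, the following minimal Python sketch computes the volume of a biotin stock solution needed to bring a culture to a target concentration, such as the approximately 50 μM value mentioned above. The stock concentration, the medium volume, and the function name are illustrative assumptions and are not taken from this disclosure.

```python
# Illustrative sketch only: the stock concentration and medium volume used
# in the example call are assumed values, not values from the disclosure.

def biotin_stock_volume_ul(target_um, medium_ml, stock_um):
    """Return the stock volume (in microliters) to add to medium_ml of medium.

    Solves stock_um * v = target_um * (medium_ml + v) for v, so the dilution
    caused by the added stock itself is taken into account.
    """
    if medium_ml <= 0:
        raise ValueError("medium volume must be positive")
    if not 0 < target_um < stock_um:
        raise ValueError("target must be positive and below the stock concentration")
    volume_ml = target_um * medium_ml / (stock_um - target_um)
    return volume_ml * 1000.0  # convert mL to microliters


if __name__ == "__main__":
    # Hypothetical case: 10 mL of medium, 1 mM (1000 uM) stock, 50 uM target.
    volume = biotin_stock_volume_ul(50.0, 10.0, 1000.0)
    print(f"add about {volume:.1f} uL of stock")
```

Because the added stock slightly increases the total volume, the sketch solves for the final concentration rather than using the simpler approximation that ignores the added volume.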

Suitable culture conditions will depend on the cell type used and all other specifics of a given assay, as will be understood by those of skill in the art. In one non-limiting embodiment, the recombinant cells are cultured in the presence of biotin (endogenous or exogenously added) for between about 0.5 and about 96 hours. In various other non-limiting embodiments, the recombinant cells are cultured in the presence of biotin (endogenous or exogenously added) for between about 1 and about 96 hours, about 2 and 96 hours, about 6 and 96 hours, about 12 and 96 hours, about 24 and 96 hours, about 0.5 and 72 hours, about 0.5 and 48 hours, about 0.5 and 24 hours, about 1 and 96 hours, about 1 and 72 hours, about 1 and 48 hours, about 1 and 24 hours, about 2 and 96 hours, about 2 and 72 hours, about 2 and 48 hours, about 2 and 24 hours, about 6 and 96 hours, about 6 and 72 hours, about 6 and 48 hours, about 6 and 24 hours, about 12 and 96 hours, about 12 and 72 hours, about 12 and 48 hours, about 12 and 24 hours, about 24 and 96 hours, about 24 and 72 hours, or about 24 and 48 hours.

In another embodiment, identifying the biotinylated proteins comprises isolating the biotinylated polypeptides. This embodiment is optional, as one can use any technique for identifying the biotinylated proteins, including but not limited to by mass spectrometry analysis of crude cell lysates. In embodiments where the biotinylated proteins are isolated, any suitable method can be used, including but not limited to affinity purification, mass spectrometry, and Western blotting.

In one exemplary embodiment, the method comprises the following:

1. Prepare an expression plasmid encoding a fusion protein of interest.

2. For each experimental condition (e.g., control, fusion protein), plate at least two wells of cells, one with glass coverslips and the other without.

3. Express the fusion protein in the cells of choice by transient transfection.

4. Process mock or non-transfected cells in parallel. Apply a suitable concentration of biotin to the cells at the time of transfection, or shortly after transfection if the transfection protocol used requires replacement of media shortly after transfection. Culture the cells in the presence of biotin for a suitable period of time, such as 18-24 hours.

5. Process the cells on coverslips (using any suitable technique) for analysis by immunofluorescence microscopy following transfection and culture.

6. Process the cells for immunoblot analysis following transfection and culture. Following whole cell lysis, electrophoretic separation of proteins, and Western blotting, agitate the resulting membrane under suitable blocking conditions. Agitate the membrane in streptavidin-horseradish peroxidase (HRP) at a suitable dilution in blocking buffer under suitable conditions.

7. Wash the membrane to remove unbound streptavidin-HRP.

8. Add enhanced chemiluminescence (ECL) reagent to observe biotinylated proteins.

9. Following successful analysis of biotinylated proteins, quench the HRP signal on the membrane.

10. Wash the membrane to remove quenching solution.

11. Proceed to immunoblot the membrane with antibodies specific to the fusion protein to confirm its expression and migration by SDS-PAGE.

In another exemplary embodiment, the method comprises the following:

1. Begin with four 10 cm dishes for each experimental condition (cells expressing fusion protein or control cells).

2. When cells reach approximately 80% confluency, change medium to fresh complete medium containing a suitable concentration of biotin.

3. Incubate cells for a suitable time period, such as 18-24 hours.

4. Remove medium and rinse the cells.

5. Add lysis buffer and scrape cells gently to harvest the cells.

6. Transfer lysed cells to a conical tube.

7. Add a suitable amount of Triton X-100 (such as a final concentration of about 2%) and mix.

8. Sonicate under suitable conditions.

9. Aliquot the sample to separate tubes.

10. Spin down at a suitable speed (such as 16,500 x g relative centrifugal force (RCF)).

11. Place new tubes in a Magnetic Separation Stand and add a suitable amount of lysis buffer to each tube.

12. Equilibrate the beads in the binding buffer.

13. Remove the buffer the beads equilibrated in.

14. Resuspend the samples and beads.

15. Incubate the tube on a rotator under suitable conditions.

16. Place the tubes on the Magnetic Separation Stand and collect beads.

17. Remove the supernatant.

18. Add wash buffer to each tube and resuspend beads gently by pipetting.

19. Place tubes on a rotator under suitable conditions.

20. Remove the supernatant.

21. Pool beads in wash buffer and resuspend gently.

22. Remove detergents in the sample that may interfere with mass-spectrometry.

23. Prepare samples for mass-spec analysis.
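The centrifugation step in the protocol above is stated as a relative centrifugal force (such as 16,500 x g) rather than as a rotor speed. A minimal sketch of the standard conversion, RCF = 1.118 × 10⁻⁵ × r × N² (r being the rotor radius in cm and N the speed in revolutions per minute), is given below. The rotor radius in the example call is a hypothetical value and would need to be replaced with that of the rotor actually used.

```python
import math

# Standard conversion constant for radius in centimeters and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    # Hypothetical rotor radius of 8.0 cm; 16,500 x g is the example value
    # given in the protocol.
    print(f"{rpm_from_rcf(16500.0, 8.0):.0f} rpm")
```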

In another aspect, the present invention provides recombinant nucleic acids, comprising:

(a) a first nucleic acid domain encoding a biotin protein ligase (BPL); and

(b) a second nucleic acid domain encoding a bait polypeptide.

The BPL may be any BPL as deemed appropriate for a given use, including but not limited to any of those disclosed herein. In one embodiment, the BPL comprises a protein with an amino acid sequence of general formula 1:

X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20-X21-X22-X23-X24-X25-X26-X27-X28-X29-X30-X31-X32 (SEQ ID NO: 13), wherein

X1 is selected from the group consisting of A, L, I, T, and V, or is absent;

X2 is selected from the group consisting of C, L, V, H, and A, or is absent;

X3 is selected from the group consisting of I, V, and L;

X4 is selected from the group consisting of A and G;

X5 is selected from the group consisting of E, D, R, T, N, V, and A;

X6 is selected from the group consisting of Y, R, K, E, and I;

X7 is Q;

X8 is selected from the group consisting of Q, T, N, F, V, and S;

X9 is selected from the group consisting of A, K, N, S, Q, and E;

X10 is G;

X11 is selected from the group consisting of R and K;

X12 is G;

X13 is any residue other than R;

X14 is selected from the group consisting of R, L, W, G, and S;

X15 is selected from the group consisting of G, Q, P, and K;

X16 is selected from the group consisting of R and N;

X17 is selected from the group consisting of K, Q, V, E, T, M, and A;

X18 is W;

X19 is selected from the group consisting of F, L, Y, E, I, and V;

X20 is selected from the group consisting of S, M, and N;

X21 is selected from the group consisting of P, Q, and D;

X22 is selected from the group consisting of F, E, A, , Y, and V;

X23 is selected from the group consisting of G and A;

X24 is selected from the group consisting of V and C, or is absent;

X25 is selected from the group consisting of C and A, or is absent;

X26 is selected from the group consisting of A and L, or is absent;

X27 is selected from the group consisting of A, G, and S, or is absent;

X28 is selected from the group consisting of N, G, Q, T, and C;

X29 is selected from the group consisting of L, I, A, and F;

X30 is selected from the group consisting of Y, M, A, V, and L;

X31 is selected from the group consisting of L, G, V, I, and F; and

X32 is selected from the group consisting of S, T, and F.
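As an illustration of how the residue definitions above could be applied, the following minimal Python sketch tests whether a candidate peptide is consistent with general formula 1. It is a sketch under stated assumptions rather than a definitive implementation: the function and variable names are hypothetical, X13 is restricted to the nineteen standard amino acids other than R, X22 uses only the residues that are legible in the list above, and the example peptides are assembled from the allowed residue sets for demonstration and are not taken from the sequence listing.

```python
import re

# Allowed residues per position, transcribed from the definitions above.
ALLOWED = {
    1: "ALITV", 2: "CLVHA", 3: "IVL", 4: "AG", 5: "EDRTNVA", 6: "YRKEI",
    7: "Q", 8: "QTNFVS", 9: "AKNSQE", 10: "G", 11: "RK", 12: "G",
    13: "ACDEFGHIKLMNPQSTVWY",  # assumption: any standard residue except R
    14: "RLWGS", 15: "GQPK", 16: "RN", 17: "KQVETMA", 18: "W",
    19: "FLYEIV", 20: "SMN", 21: "PQD",
    22: "FEAYV",  # one residue in the source list is illegible and omitted
    23: "GA", 24: "VC", 25: "CA", 26: "AL", 27: "AGS", 28: "NGQTC",
    29: "LIAF", 30: "YMAVL", 31: "LGVIF", 32: "STF",
}

# Positions the definitions above allow to be absent.
OPTIONAL = {1, 2, 24, 25, 26, 27}


def build_pattern():
    """Compile one character class per position; optional positions get '?'."""
    parts = []
    for position in range(1, 33):
        char_class = "[" + ALLOWED[position] + "]"
        parts.append(char_class + "?" if position in OPTIONAL else char_class)
    return re.compile("".join(parts))


PATTERN = build_pattern()


def matches_general_formula_1(peptide):
    """Return True if the whole peptide fits the X1-X32 definitions."""
    return PATTERN.fullmatch(peptide.upper()) is not None


if __name__ == "__main__":
    # Demonstration peptides built from the allowed sets (hypothetical).
    for candidate in (
        "ACIAEYQQAGRGGRGRKWFSPFGVCAANLYLS",  # G at X13
        "ACIAEYQQAGRGRRGRKWFSPFGVCAANLYLS",  # R at X13
        "IAEYQQAGRGGRGRKWFSPFGNLYLS",        # optional positions absent
    ):
        print(candidate, matches_general_formula_1(candidate))
```

A regular expression is used here because the positions that may be absent make a fixed-offset comparison unreliable; the regex engine resolves which optional positions are present by backtracking.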

Proteins according to general formula 1 include the BPL active site based on a number of known BPLs (see the sequence alignment figure, residues 511 to 542). As will be understood by those of skill in the art, there are many BPLs in which the active site may include residues different from those recited in general formula 1. Thus, this embodiment is an exemplary embodiment and is not intended to be limiting on the scope of the invention.

X4 is A;

X8 is T or S;

X13 is G;

X15 is G;

X1 is L, I, or V;

X20 is S;

X21 is P;

X23 is G;

X29 is L or I;

X30 is A, V, or L;

X31 is L, G, V, or I; and/or

X32 is S.

In a preferred embodiment, X13 is G. In this embodiment, the invariant R residue in wild type BPLs at position X13 is modified to any other amino acid residue, including but not limited to G, which results in enhanced promiscuity of the resulting BPL.

Proteins with an amino acid sequence according to SEQ ID NO: 14 include the BPL active site and additional residues required for BPL activity based on a number of known BPLs (See Figure 9; residues 506 to 796). As will be understood by those of skill in the art, there are many BPLs in which the amino acid sequence required for activity may include residues different from those recited in SEQ ID NO: 14.

In one embodiment, the BPL for use in the invention comprises any residue other than R, including but not limited to a G residue, at position 523 (X13 in general formula 1). In this embodiment, the invariant R residue in wild type BPLs at position X13 is modified, which results in enhanced promiscuity of the resulting BPL.

In one embodiment, the BPL comprises or consists of an amino acid sequence selected from the group consisting of SEQ ID NOs: 1-4 and 6-12. These amino acid sequences are also shown in Figures 8a through 8c (SEQ ID NOs: 1-4 and 6-12). In specific embodiments, the E. coli BirA* protein of SEQ ID NO: 1 or the A. aeolicus BPL mutant protein of SEQ ID NO: 2, or functional equivalents thereof, are used.

Similarly, the second nucleic acid domain may encode any bait polypeptide deemed suitable for a given use. In various non-limiting embodiments, the bait polypeptide may comprise lamin-A (SEQ ID NO: 18), frataxin (SEQ ID NO: 19), sun2 (SEQ ID NO: 20), Nup107 (SEQ ID NO: 21), torsin-A (SEQ ID NO: 22), or fragments or homologues thereof.

The coding regions for the BPL and the bait polypeptide may be immediately adjacent in the fusion protein, or may be separated by any number of additional nucleic acids as desirable for a given purpose. Including such a "linker" region between the BPL and the bait polypeptide coding regions may, for example, extend the range of biotinylation and/or provide the encoded fusion protein with increased accessibility to label adjacent proteins.

The recombinant nucleic acid may be of any type, including DNA and RNA, and may be single stranded or double stranded. It will be understood by those of skill in the art that the first domain and the second domain can be present in any order in the recombinant nucleic acid (i.e.: the first domain may be 5' or 3' to the second domain); similarly the BPL protein may be anywhere within the fusion protein, even internal between other domains N-terminal and C-terminal to the BPL. The recombinant nucleic acid may further comprise "spacer" nucleotides between the first domain and the second domain; such spacer sequences may be of any length suitable for a given purpose, as will be understood by those of skill in the art.

In one embodiment, the fusion protein-encoding nucleic acids in the recombinant cells may be present in an expression vector, such as a plasmid or viral-based vector. Any expression vector suitable for an intended use can be used in the present invention. Such expression vectors can be of any type known in the art, including but not limited to plasmid and viral-based expression vectors. The construction of expression vectors for use in transfecting prokaryotic or eukaryotic cells is also well known in the art, and thus can be accomplished via standard techniques. (See, for example, Sambrook, Fritsch, and Maniatis, in: Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989; Gene Transfer and Expression Protocols, pp. 109-128, ed. E.J. Murray, The Humana Press Inc., Clifton, N.J.; and the Ambion 1998 Catalog (Ambion, Austin, TX).) The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In a preferred embodiment, the expression vector comprises a plasmid. However, the invention is intended to include other expression vectors that serve equivalent functions, such as viral vectors. Specifics of the expression vector will depend on the ultimate desired use. Designing appropriate expression vectors for an intended use is well within the level of those of skill in the art based on the teachings herein.

In expression vectors of the invention, the first domain and the second domain are operatively linked to a promoter. Any suitable promoter may be used that can direct expression (i.e.: is "operatively linked") of the encoded fusion protein. The term "promoter" includes any nucleic acid sequence sufficient to direct expression of the encoded protein(s), including inducible promoters, repressible promoters and constitutive promoters. If inducible, there are sequences present which mediate regulation of protein expression so that the polynucleotide is transcribed only when an inducer molecule is present. Such cis-active sequences for regulated expression of an associated polynucleotide in response to environmental signals are well known to the art. The expression vector may comprise any other control sequences as may be suitable for an intended use. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences and the promoter sequence can still be considered "operably linked" to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, enhancers, termination signals, and ribosome binding sites.

In a further aspect, the present invention provides recombinant fusion proteins expressed from the recombinant nucleic acids of the invention. Thus, in one embodiment, the fusion protein comprises a BPL according to any embodiment or combination of the embodiments disclosed herein and a bait polypeptide, including but not limited to lamin-A, frataxin, sun2, Nup107, torsin-A, or portions thereof.

The isolated polypeptides can be used, for example, in contacting candidate proteins in an extracellular environment to verify candidate protein interactions with the bait polypeptide.

In a further aspect, the present invention provides recombinant host cells comprising an expression vector of the present invention. The cells may be prokaryotic or eukaryotic. Non-limiting embodiments include bacterial cells, trypanosomal cells, protozoan cells, fungal cells, vertebrate cells, and mammalian cells (such as mouse cells). The expression vectors may be transiently or stably present in the host cell. As will be apparent to those of skill in the art, the recombinant host cells of the invention are capable of expressing the fusion protein, and thus can be used in the methods of the invention.

In one embodiment, the recombinant host cell comprises a recombinant nucleic acid of any embodiment or combination of embodiments of the present invention knocked-in to an endogenous host cell gene. As is understood by those of skill in the art, a "knock-in" refers to insertion of the recombinant nucleic acid encoding the fusion protein at a specific chromosomal locus (i.e., a targeted insertion). The fusion protein-encoding nucleic acid may be knocked-in to any suitable cellular chromosomal gene. In a further embodiment, the knock-in is generated in a cell (such as an embryonic stem cell) of a non-human organism, such as a mammal including but not limited to a mouse. Knock-in procedures are well defined in a variety of organisms where embryonic stem cells are easily obtained and manipulated.

Thus, in another aspect, the present invention provides a transgenic, non-human organism comprising a recombinant host cell of the present invention. In one embodiment, the organism is a mammal. In a specific embodiment, the organism is a mouse. In one embodiment, recombinant host cells of the present invention comprise embryonic stem cells, which are then used to produce the transgenic non-human organisms of the present invention using standard techniques in the art.

Example 1

Here we describe an approach to screen for proximate proteins in a relatively natural cellular environment. Our method to identify neighboring and potentially interacting proteins is based on the use of a promiscuous prokaryotic biotin protein ligase. The biotin ligase is fused to a protein of interest, and then introduced into mammalian (or other) cells where it will biotinylate vicinal proteins upon supplementation of the culture medium with biotin. Biotinylated proteins can then be selectively isolated and identified by conventional methods, most notably mass spectrometry. We have applied this strategy, which we call BioID™, to identify candidate proteins that are proximate to and/or interact with human lamin A (LaA), a well-characterized component of the nuclear envelope (NE), a specialized extension of the endoplasmic reticulum that surrounds the nuclear contents during interphase.

LaA is an intermediate filament protein and member of the A-type lamin family that is encoded by the LMNA gene (Gerace and Huber, 2012). Together with B-type lamins, the A-type lamins are constituents of the nuclear lamina, a filamentous protein meshwork that is intimately associated with the inner nuclear membrane (INM), the membranous portion of the NE that faces the interior of the nucleus. This association is mediated, at least in part, by multiple interactions with integral INM proteins. In addition, nuclear pore complexes (NPCs), large multi-protein channels that span the nuclear membranes and which mediate nucleocytoplasmic trafficking of macromolecules, are anchored to the nuclear lamina (Aaronson and Blobel, 1975; Dwyer and Blobel, 1976). Although the bulk of the A- and B-type lamins are localized to the nuclear lamina, a nucleoplasmic population is thought to function in various aspects of nuclear metabolism, including transcription and replication (Moir et al., 2000; Goldman et al., 2002).

In mammalian somatic cells, the nuclear lamina is roughly 15-20 nm thick and is considered to represent an important structural element of the NE (Gerace and Huber, 2012). Indeed, the role of the nuclear lamina as a determinant of both NE and global nuclear architecture has been highlighted by findings that mutations in the LMNA gene are linked to multiple human diseases including muscular dystrophy, lipodystrophy, and premature aging syndromes (Worman et al., 2009; Worman, 2012). Many of these disorders, known as laminopathies, are associated with often gross perturbations in nuclear and NE organization. To better understand the etiology of the laminopathies, much effort has been focused on identifying lamin-interacting proteins. However, both A- and B-type lamins are highly insoluble and consequently it has proven extremely difficult to define their molecular associations using conventional approaches. For these reasons we felt that LaA represented an ideal candidate with which to evaluate the utility of BioID™ as a general proximity-based approach to screen for potential protein-protein interactions. At the same time BioID™ introduces a new strategy with which to further explore LaA function.

Our development and use of BioID™ to identify LaA-proximal proteins has revealed a number of abundant candidates among which are known interactors of LaA. These include integral proteins of the INM as well as NPC components. Less abundant candidates fall into functional categories that include transcription, chromatin regulation, RNA processing, and DNA repair. An uncharacterized protein was also among the more prominent candidates revealed by LaA BioID™. We demonstrate that this protein, which we have named SLAP75, is a novel constituent of the NE that appears to be expressed in a cell type-specific fashion. Taken together, these findings demonstrate that BioID™ is an effective method to screen for proximate and interacting proteins. This relatively simple and rapid technique has broad applicability to monitor protein behavior in live cells, providing a number of advantages over existing methods.

We sought to generate a method for labeling proteins in a proximity-dependent manner in mammalian cells. We envisioned a system based on the fusion of a protein of interest to an enzyme that could selectively modify vicinal proteins in vivo (Fig. 1 a). There are two requirements for such a system. The first is that the fusion protein must be targeted appropriately when expressed in cells. The second is that the modification itself must facilitate isolation of the specifically labeled proteins. Because it is relatively uncommon in vivo and amenable to selective isolation, we chose to focus on biotinylation.

BirA is a 35-kD DNA-binding biotin protein ligase in Escherichia coli that regulates the biotinylation of a subunit of acetyl-CoA carboxylase and acts as a transcriptional repressor for the biotin biosynthetic operon (Chapman-Smith and Cronan, 1999). BirA has been harnessed for experimental applications, including use in eukaryotic cells. The BirA acceptor-peptide system takes advantage of the extreme specificity of BirA in biotinylating its substrate peptide (Beckett et al., 1999). With this system, a minimal recognition sequence, a biotin acceptor tag (BAT), is fused to a protein of interest and coexpressed with BirA. This leads to the biotinylation of the BAT sequence, permitting one-step high affinity (Kd = 10⁻¹⁴ M; Green, 1963) avidin/streptavidin-mediated purification of the tagged protein. Because biotinylation is a rare modification, restricted in mammalian cells primarily to only a few carboxylases (Chapman-Smith and Cronan, 1999), BAT-independent binding is minimal.

Biotinylation by BirA is a two-step process. The first of these combines biotin and ATP to form biotinoyl-5'-AMP (bioAMP; Lane et al., 1964). This activated biotin is held within the BirA active site until it reacts with a specific lysine residue of the BAT sequence in the second step. For our purposes, the problem with BirA lies with its stringent selectivity for its endogenous substrate. What we desired was a far more promiscuous biotin ligase. This requirement led us to certain BirA mutants that prematurely release the highly reactive yet labile bioAMP (Kwon and Beckett, 2000; Streaker and Beckett, 2006). One such BirA mutant (R118G, hereafter called BirA*) is defective in both self-association and DNA binding (Kwon et al., 2000) and displays an affinity for bioAMP two orders of magnitude less than that of the wild-type enzyme (BirA-WT; Kwon and Beckett, 2000). In E. coli, BirA* expression results in promiscuous protein biotinylation because free bioAMP will readily react with primary amines. More significantly, however, it has been demonstrated in vitro that BirA* will promiscuously biotinylate proteins in a proximity-dependent fashion (Choi-Rhee et al., 2004; Cronan, 2005).

We explored the possibility that BirA* would promiscuously biotinylate proteins in live mammalian cells. To this end we generated myc epitope-tagged, humanized BirA-WT and BirA* for transient expression in HeLa cells. Western blot analysis using streptavidin-HRP revealed modest levels of biotinylated proteins with BirA* as compared with BirA-WT (Fig. 2 a). Addition of 50 μM biotin to tissue culture medium, however, results in a massive stimulation of promiscuous biotinylation by BirA* but not BirA-WT (Fig. 2 a). By fluorescence microscopy the distribution of biotinylated proteins appears similar to that of myc-BirA* itself, which is predominantly nuclear with a subpopulation found in the cytoplasm (Fig. 2 b). These results indicate that BirA* promiscuously biotinylates proteins in mammalian cells. Furthermore, the level of that biotinylation is primarily regulated by the concentration of available free biotin. In conventional tissue culture media formulations fetal calf serum is the source of biotin. Our results indicate that the concentrations of biotin in standard complete media are insufficient to fuel significant biotinylation by BirA*. This has also been demonstrated for BAT biotinylation by BirA-WT (Nesbeth et al., 2006; Kulman et al., 2007), suggesting that it is not a BirA*-specific phenomenon.

Application of BioID™ to the nuclear lamina

We next wished to determine whether BirA* could be used as a tool to identify vicinal proteins in vivo. To this end we fused myc-BirA* to the N terminus of human LaA, a well-characterized constituent of the nuclear lamina. During interphase LaA has a relatively restricted distribution within the cell. It is detected for the most part at the NE with a subpopulation found throughout the nucleoplasm (Goldman et al., 2002). To provide consistent and controllable expression levels, we generated HEK293 cells that stably and inducibly express myc-BirA*LaA (Fig. 1 b). In these cells, myc-BirA*LaA localizes predominantly to the nuclear envelope, similar to both endogenous LaA and LaA harboring an N-terminal GFP tag, a modification that does not appear to alter the function of LaA (Broers et al., 1999; Shumaker et al., 2006). Biotinylation of endogenous proteins in cells expressing myc-BirA*LaA, either in the presence or absence of exogenous biotin, was monitored on Western blots probed with streptavidin-HRP. As is the case with myc-BirA* alone, the presence of 50 μM biotin in the culture medium strongly stimulates biotinylation of a wide range of endogenous proteins, in addition to myc-BirA*LaA itself (Fig. 3 a). Microscopy using fluorescent streptavidin reveals that the bulk of these biotinylated proteins must reside at the NE and colocalize with myc-BirA*LaA (Fig. 3 b). The implication is that proteins in the vicinity of myc-BirA*LaA are preferentially biotinylated. It should be noted that not only does the intracellular localization of these biotinylated proteins differ between myc-BirA* (predominantly nucleoplasmic) and myc-BirA*LaA (predominantly at the NE), but their electrophoretic mobilities and hence identities also differ as revealed by Western blot analysis (Figs. 2 a and 3 a). These results suggest that BirA* can be targeted to a specific cellular location and will biotinylate endogenous proteins in a proximity-dependent manner.

Temporal regulation of BioID™

The requirement for exogenous biotin suggested to us a means to modulate BirA* activity. To explore this further, HEK293 cells expressing myc-BirA*LaA were analyzed by Western blot at various times after addition of 50 μM biotin to their culture medium (Fig. 3 c). Levels of biotinylated proteins increase in parallel with the duration of biotin exposure. This effect reaches saturation within 6 to 24 h, with no obvious increase observed at later time points. A similar increase in biotinylation can also be observed by fluorescence microscopy (Fig. 3 d). Both methods reveal a time-dependent accumulation of biotinylated proteins, the majority of which appear to be endogenous and which evidently colocalize with myc-BirA*LaA. These studies indicate that by controlling access to biotin we can temporally regulate biotinylation by BirA*. This opens up the possibility of performing pulse-chase type experiments using this technique.

Identification of vicinal proteins with BioID™-LaA

We next set out to test our hypothesis that proteins biotinylated by myc-BirA*LaA should be enriched with known interactors of LaA as well as with near neighbors within the nuclear lamina and INM, and to a lesser extent within the nucleoplasm. To accomplish this we induced myc-BirA*LaA expression in HEK293 cells with doxycycline in the presence of 50 μM biotin for 24 h and then lysed the cells under stringent denaturing conditions using an SDS-containing buffer (Fig. 1 b). Parental HEK293 cells, processed in parallel, were used as controls. For these experiments 4 × 10⁷ cells (four confluent 10-cm dishes) were analyzed. Biotinylated proteins were captured with streptavidin immobilized on paramagnetic beads, rigorously washed, and bound proteins analyzed by mass spectrometry. Proteins unique to the BioID™-LaA (myc-BirA*LaA) pull-down, and not detected with identical pull-downs from control cells, were categorized based on localization and function (Fig. 3 e). The relative abundance of the identified proteins within each category is given as a percentage of the total. The bulk of the proteins identified by BioID™-LaA are known NE components, including a number of INM proteins. The most abundant of these are the β and γ isoforms of lamina-associated polypeptide 2 (LAP2, TMPO) and lamina-associated polypeptide 1 (LAP1, TOR1AIP1). LAP1 has a documented association with LaA (Foisner and Gerace, 1993), as does LAP2α, a soluble LAP2 isoform that also appeared prominently in the dataset (Dechat et al., 2000). Two other INM proteins identified by BioID™-LaA, emerin (EMD; Lee et al., 2001) and MAN1 (LEMD3; Mansharamani and Wilson, 2005), are also known to interact with LaA. An additional INM protein detected in our screen was SAMP1 (TMEM201). Also known as NET5, SAMP1 was originally identified in a proteomic analysis of rat liver nuclear membrane proteins (Schirmer et al., 2003). A recent study (Gudise et al., 2011) suggests that SAMP1 is part of a protein network that includes A-type lamins and LINC complexes. The latter are evolutionarily conserved protein assemblies that span the NE and couple nucleoskeletal and cytoskeletal structures (Burke and Roux, 2009).
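
The filtering and categorization described above (keep proteins found only in the BioID™-LaA pull-down, then report each category as a percentage of the total) might be sketched in Python as follows. This is an illustration only: the spectral counts, the category dictionary, and the function names `unique_to_bait` and `relative_abundance_by_category` are hypothetical, and the text does not state which abundance measure was used.

```python
from collections import defaultdict

def unique_to_bait(bait_counts, control_ids):
    """Keep proteins detected in the bait pull-down but absent from the control."""
    return {p: c for p, c in bait_counts.items() if p not in control_ids}

def relative_abundance_by_category(counts, categories):
    """Return each category's share of the total signal, as a percentage."""
    totals = defaultdict(float)
    for protein, count in counts.items():
        totals[categories.get(protein, "uncategorized")] += count
    grand_total = sum(totals.values())
    return {cat: 100.0 * value / grand_total for cat, value in totals.items()}

# Hypothetical spectral counts (illustrative only, not data from the study).
bait = {"TMPO": 40, "EMD": 25, "NUP153": 20, "FLNA": 5, "ACACA": 10}
control = {"ACACA"}  # assumed here to stand in for an endogenously biotinylated carboxylase
categories = {"TMPO": "INM", "EMD": "INM", "NUP153": "NPC", "FLNA": "nucleoskeleton"}

specific = unique_to_bait(bait, control)
shares = relative_abundance_by_category(specific, categories)
```

With a real dataset, `bait` and `control` would come from the search-engine output for the myc-BirA*LaA and parental HEK293 samples, respectively.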

Twelve proteins associated with nucleocytoplasmic transport were detected by BioID™-LaA. The three most prominent of these, Nup153, Nup50, and ELYS, have been localized to the nucleoplasmic face of NPCs, where they would be situated in the vicinity of the nuclear lamina (Sukegawa and Blobel, 1993; Guan et al., 2000; Walther et al., 2001; Rasala et al., 2008). At least one of these, Nup153, has previously been shown to interact directly with LaA (Al-Haboubi et al., 2011). A fourth NPC protein, Tpr, which is itself associated with Nup153 (Hase and Cordes, 2003; Krull et al., 2004), also appeared in the BioID™ screen. The detection of these NPC proteins by BioID™-LaA is consistent with an NPC anchorage function for the nuclear lamina.

Several additional classes of proteins were represented among the BioID™-LaA candidates, albeit at lower levels. These included proteins associated with DNA repair, transcription, chromatin regulation, and RNA processing. Proteins considered to be components of a nucleoskeleton were also detected, the most abundant of which was filamin A (FLNA; Castano et al., 2010).

Identification of a novel NE constituent detected by BioID™-LaA

An uncharacterized protein of 75 kD, FAM169A (KIAA0888), featured prominently in the BioID™-LaA dataset. FAM169A has no predicted transmembrane domain and lacks any sequence motifs that might provide clues to its function. To test the possibility that FAM169A is a novel NE constituent we examined the localization of the endogenous protein in HEK293 cells by immunofluorescence microscopy. Fig. 4 a shows that FAM169A is concentrated at the NE. Differential permeabilization of HEK293 cells with digitonin versus Triton X-100 indicates that FAM169A resides on the nuclear face of the NE. We also introduced human HA epitope-tagged FAM169A into HeLa cells, which do not appear to express this protein. Consistent with the findings in HEK293 cells, recombinant FAM169A, detected using the anti-FAM169A antibody, localizes predominantly to the NE (Fig. 4 b), although in both cell types we could always observe what appeared to be a nucleoplasmic population. In neither cell line was there any obvious association with NPCs. Taken together, these findings indicate that FAM169A is a novel NE component that is enriched at the nuclear lamina or at the interface of the lamina and INM. We therefore propose to name this protein SLAP75 (for soluble lamina-associated protein of 75 kD). Besides SLAP75, only two other soluble proteins (other than the lamins themselves), barrier to autointegration factor (BAF; Segura-Totten et al., 2002) and germ cell-less (GCL; Holaska and Wilson, 2006), have been shown to accumulate at the nuclear lamina. Proteomic screens have identified scores of membrane proteins that are enriched at the nuclear periphery (Schirmer et al., 2003). Our identification of an entirely new peripheral membrane constituent of the NE highlights the use of BioID™ as a valuable complement to these earlier studies. Furthermore, it confirms the use of BioID™ as an effective proximity-based tool to screen for neighboring and potentially interacting proteins.
With this in mind, we have identified 10 other uncharacterized proteins, including UPF0428 protein CXorf56, UPF0414 transmembrane protein C20orf30, UPF0552 protein C15orf38, and uncharacterized protein C9orf78.

Discussion

We have devised a simple and rapid technique, BioID™, which provides a means of identifying neighboring and potentially interacting proteins in vivo. The method takes advantage of BirA*, a highly promiscuous form of the E. coli BirA biotin protein ligase. BirA* may be targeted to specific subcellular locations by fusion to a "bait" protein. Nearby proteins, biotinylated by BirA*, can then be recovered in a single step on streptavidin-coated beads and identified by mass spectrometry. The only requirement for BioID™ is the expression of a single fusion protein.

Consequently, BioID™ is applicable to map protein associations in essentially any accessible cell type, mammalian or otherwise.

There are currently two strategies that are widely used to detect protein interactions. The first of these involves the yeast two-hybrid (Y2H) system and takes advantage of the ability of hybrid transcription factor domains to functionally associate, thereby driving expression of reporter genes. The second strategy is based upon co-immunoprecipitation or pull-down, frequently involving expression of single- or double-tagged bait proteins. Immunoprecipitated proteins are then identified by mass spectrometry. A significant attribute of the Y2H approach is that, because it is based on a cDNA library screen, it is more likely to detect weak interactions or interactions between low-abundance proteins. Furthermore, it is the method of choice where the focus is on proteins that may only be expressed in rare cell types. On the other hand, it is contingent upon proteins or protein fragments maintaining their ability to fold correctly and to associate when removed from their normal cellular environment. By definition, these interactions must take place within yeast, often in subcellular regions unlike those that the proteins normally inhabit and without their normal complement of associated proteins and post-translational modifications. In many situations this may present a significant problem, especially when membrane proteins enter the equation. The other side of the coin is that incorrectly folded "bait" or "prey" proteins, while failing to interact with their cognate partners, may display other spurious interactions and hence give rise to false positives.

The pull-down approach has provided valuable data in a variety of systems. However, it has two limitations. The first of these is the problem of scale when dealing with low-abundance proteins. Simply put, such proteins may not be detected where it proves impractical to prepare or manipulate sufficient starting material. The second limitation concerns solubility. Conditions required to solubilize many bait proteins may not be compatible with preserving interactions with partner proteins and vice versa. This becomes especially significant when considering weak interactions. In the case of lamin A, a highly insoluble protein, this has proved to be a serious stumbling block in the reliable identification of interacting proteins. Recently, Kubben et al. (2010) have introduced a work-around for this problem. They have used chemical cross-linking to stabilize lamin complexes before solubilization and pull-down. Significantly, this approach detected many of the same putative LaA interactors that we have now identified using BioID™-LaA. Cross-linking certainly represents a valuable enhancement to the pull-down strategy. However, as an added variable it may in turn introduce additional artifacts such as aggregation.

BioID™ provides an improvement to these prior approaches in the characterization of potential protein-protein interactions and near-neighbor analyses. BioID™ uniquely combines two important attributes. The first of these is that it detects potential interactions in their normal cellular context. The second is that it sidesteps issues associated with bait or prey protein solubility. Because the key step of biotinylation occurs before solubilization it will detect both weak and transient interactions. Both of these features are highlighted in the present example using BioID™-LaA data where both soluble and membrane proteins were efficiently detected.

BioID™ relies on the expression of an exogenous protein that is fused to BirA*, a protein slightly larger than GFP. With respect to myc-BirA*LaA, the fusion protein appears to be targeted appropriately to the nuclear lamina where it shares the same solubility properties as both wild-type and GFP-tagged LaA. A more subtle issue may arise through biotinylation. Although we observed no evidence of a detrimental effect, our studies have used the addition of excess biotin to cell culture media to enhance the biotinylation of vicinal proteins. The covalent attachment of biotin to primary amines, predominantly lysines, leads to the loss of charge on these sites and at the same time could inhibit other secondary modifications. These effects might in turn alter the behaviors of both the fusion protein and neighboring proteins. The efficacy of BioID™ is contingent upon the ability to biotinylate neighboring proteins, which is in turn dependent on the number and availability of primary amines in these proteins. Consequently, the abundance of the biotinylated proteins should not be used to indicate the strength or abundance of an association. Similarly, the absence of biotinylation does not rule out interaction or proximity. BioID™-mediated biotinylation is not generally used to validate an actual protein interaction, but instead is used as a screen to identify candidates that can be subsequently investigated systematically or in a hypothesis-based manner.

Based on our mass spectrometry results we see clear evidence that BioID™ identifies well-characterized protein interactors of LaA, including a number of proteins detected by Kubben et al. (2010) using complex purification in combination with chemical cross-linking (LAP1, LAP2 isoforms, emerin, and MAN1). It is clear that most (if not all) of the more abundant proteins identified with BioID™-LaA, amounting to more than 50% of those detected, largely reside in close proximity to the INM. These could fall into the transient, indirect, or vicinal categories.

Furthermore, BioID™ identified SLAP75, a previously uncharacterized protein that is clearly enriched at the nuclear lamina. Other NE proteins in the dataset are lamins B1 and B2 and lamin B receptor (LBR). The relatively low level of detection of B-type lamins could be a reflection of findings that A- and B-type lamins may be segregated into separate filament systems (Shimi et al., 2008). LBR is not known to interact with A-type lamins and its appearance could be simply a consequence of indirect interactions and/or proximity. However, it was detected by Kubben et al. (2010) using their approach of LaA affinity-capture combined with chemical cross-linking.

Also included in the list of identified proteins, albeit at reduced levels, are many nuclear proteins associated with DNA repair, transcription, chromatin regulation, and RNA processing. These proteins are not predominantly enriched at the NE, raising the question of how they were biotinylated by BioID™-LaA. We propose that these represent a subpopulation of nuclear proteins that transiently associate with LaA at the NE and/or proteins that were biotinylated by nucleoplasmic BioID™-LaA (Goldman et al., 2002). Several of these proteins have what could be described as a circumstantial connection to LaA and might therefore be part of a LaA interaction network. PARP1, MDC1, NUMA, and NONO were all detected as part of the BAF proteome (Montes de Oca et al., 2009), i.e., they associate either directly or indirectly with BAF, while BAF itself is known to associate directly with LaA (Holaska et al., 2003). Consequently, the detection of these proteins by BioID™-LaA could be a reflection of these reciprocal associations. However, BAF itself was not picked up in the BioID™-LaA screen, potentially due to its small size (89 residues) and/or due to limited association in these cells.

Mutations in LMNA and EMD, the gene encoding emerin, both give rise to Emery-Dreifuss muscular dystrophy (EDMD2, 3, and EDMD1, respectively; Bione et al., 1994; Bonne et al., 1999, 2000; Raffaele Di Barletta et al., 2000). LaA and emerin are known to interact (Lee et al., 2001); indeed, emerin was one of the more abundant proteins detected in the BioID™-LaA screen. Defects in the genes encoding at least three other proteins, nesprin-1, nesprin-2, and FHL1 (four-and-a-half LIM protein 1), are also known to cause EDMD (EDMD4-6, respectively; Zhang et al., 2007; Gueneau et al., 2009). Both nesprin-1 and nesprin-2 are LINC complex and NE components. FHL1, although apparently nucleoplasmic and cytoplasmic (there are three splice isoforms), was detected by BioID™-LaA. This raises the possibility that these proteins may constitute an interaction network that, if disrupted, gives rise to the common phenotype of EDMD (Simon and Wilson, 2011).

Some of the proteins identified by BioID™-LaA are classified as either cytoplasmic or ER residents. The latter are all membrane proteins. It is possible that at least some of these could have access to the INM, although they do not concentrate there. Certainly there is precedent for this (Torrisi and Bonatti, 1985; Torrisi et al., 1987). Alternatively, these cytoplasmic and ER proteins might become biotinylated during mitosis when the NE breaks down and lamins are dispersed throughout the cytoplasm. We are currently investigating the application of BioID™ in synchronized cell populations, which may shed light on these possibilities. It should be noted that ACE (angiotensin-converting enzyme), a type-I membrane protein synthesized in the ER, was likely identified due to nonspecific binding, as there are no available primary amines for biotinylation in its cytoplasmic domain.

Several cytoplasmically oriented NPC proteins, including Nup214 and Nup358, were found to be biotinylated. As with the cytoplasmic and ER proteins, it is possible that this biotinylation occurs during mitosis. It is also possible that this might occur during nuclear import of myc-BirA*LaA. However, the fact that import receptors were not detected in the screen places a question mark over this. On the other hand, the large size of these nucleoporins may have biased their identification by mass spectrometry. The significantly more abundantly represented nucleoplasmic NPC proteins such as Nup153, Nup50, ELYS, and TPR likely reflect the close association between the NPCs and the lamina (Daigle et al., 2001). Certainly, Nup153 has already been shown to interact with LaA (Al-Haboubi et al., 2011). These interactions could explain the altered distribution of NPCs observed in LMNA-deficient cells (Sullivan et al., 1999).

Based on our results here, we conservatively estimate that 50% of the proteins detected by BioID™-LaA predominantly reside in the INM, the nuclear lamina, or the nucleoplasmic face of NPCs. The nuclear lamina in mammalian somatic cells is generally agreed to be ~15-20 nm thick (Aaronson and Blobel, 1975; Dwyer and Blobel, 1976) and is closely apposed to the INM (Gerace and Huber, 2012). Nup153 and Nup50 appear to be associated with the nuclear ring of NPCs, as does the N-terminal region of TPR (Guan et al., 2000; Krull et al., 2004). This would place these nucleoporins at about the level of the nuclear lamina. Taken together, these findings suggest that roughly 50% of detected proteins likely reside within ~20-30 nm of the nearest LaA molecule, and could well be much closer. More accurate measurements are limited by the population of nucleoplasmic myc-BirA*LaA and the considerable mobility of most proteins over the 24-h labeling period.

A significant finding that lends additional credence to the utility of BioID™ is the observation that histones, which are lysine rich and highly abundant in the nucleus, constitute a disproportionately small fraction of the identified proteins. This indicates that BioID™ is not generating widespread biotinylation, but is more selectively labeling only those proteins in immediate proximity to the fusion protein. This could also be inferred from our fluorescence microscopy data where the streptavidin labeling is colocalized with the myc-BirA*LaA and restricted largely to the NE. It should also be noted that low levels of histones are reported to be biotinylated in vivo (Kuroishi et al., 2011). We detected biotinylated histones (H1.3/H1.4 and H1.0) in our control preparations at levels substantially lower than the four endogenously biotinylated mammalian carboxylases.

In summary, BioID™ provides a powerful new approach to probe protein interactions and proximity in a variety of cell types. It is a technique that is accessible to a broad range of researchers comfortable working with conventional molecular and cell biology techniques, and it does not require specialized equipment other than the proteomic analysis that has become a commonly available service. While the present example focuses on mammalian cells, BioID™ can be applied in cells from a wide variety of species as well as in model organisms.

Materials and methods

Plasmids

Humanized BirA (Mechold et al., 2005) was mutated to R118G by overlap extension PCR. Products for both the WT and R118G contain a 5' SalI site and, at the 3' end, an XhoI site, a stop codon, and an AflII site. These were digested with SalI and AflII and inserted, C-terminal to a myc epitope, into pcDNA3.1 digested with XhoI and AflII. Human LaA was excised from pcDNA3.1 by XhoI and AflII and inserted in frame with the myc-BirA* in pcDNA3.1 using the same restriction sites. The entire myc-BirA*LaA sequence was removed from pcDNA3.1 by NheI and AflII, blunted, and inserted into pRetroX™.Tight.puro that was digested with EcoRI and blunted. Clones were screened for proper directionality. pRetroX™.Tight.puro is a puromycin-selectable mammalian expression vector that contains a Tet-on-based tetracycline-inducible promoter to inducibly regulate expression.

Cell culture and generation of stable cell lines

pRetroX™ Tet-ON Advanced HEK293 cells (Takara Bio Inc.) that stably express the doxycycline-regulated transactivator protein were transiently transfected with pRetroX™-Tight.puro myc-BirA*LaA with Lipofectamine 2000 (Invitrogen; Roux et al., 2009). Selection with 0.5 μg/ml puromycin began 48 h after transfection. Upon colony formation, subclones were isolated and screened by immunofluorescence after induction by the addition of 1 μg/ml doxycycline for 24 h.

Immunofluorescence

Cells were fixed with 3% paraformaldehyde/PBS and permeabilized in 0.4% Triton X-100/PBS (Roux et al., 2009). Differential permeabilization was performed after paraformaldehyde fixation with 0.001% digitonin at 4°C for 10 min (Crisp et al., 2006; Liu et al., 2007; Roux et al., 2009). Mouse anti-myc (1:10, 9E10; American Type Culture Collection) and streptavidin-568 (1:1,000; Invitrogen) were used to identify myc fusion proteins and biotinylated proteins, respectively. Other antibodies include rabbit anti-FAM169A/SLAP75 (1:200; Sigma-Aldrich), mouse anti-HA (1:200, 12CA5; Covance), mouse anti-Nup153 (1:2, SA1; Bodoor et al., 1999), and mouse anti-LaA (1:100, XB10; Horton et al., 1992). Proteins were visualized with goat anti-mouse, goat anti-rabbit, or streptavidin coupled to Alexa Fluor 488 or -568 (1:1,000; Invitrogen). DNA was detected with Hoechst dye 33258. Coverslips were mounted in 10% Mowiol 4-88. The majority of images were obtained at 25°C using either a Leica DM B microscope (40x/1.00 PL FLUOTAR™ oil PH3 and 63x/1.32 HCL PL APO oil PH3 Leica objectives) running IPLab IVisian software, or an Applied Precision DeltaVision Core system based on an Olympus IX71 microscope equipped with a 60x NA 1.42 lens. Image acquisition and processing was accomplished using DeltaVision Resolve3D and Softworx 4.1.0 software. Both microscope systems were equipped with Photometrics CoolSnap HQ cameras. Some conventional epifluorescence images of HA-SLAP75-transfected HeLa cells were acquired using a Zeiss Axioimager.Z1 equipped with a 63x NA 1.4 lens and CoolSnap HQ camera.

Western blotting

Cells were lysed in Laemmli SDS-sample buffer, separated by SDS-PAGE and transferred to nitrocellulose (Liu et al., 2007). Immunoblotting was performed (Liu et al., 2007) with rabbit anti-myc (1:50,000; Abcam). Biotinylated proteins were detected similarly with the following modifications. Membranes were blocked in 2.5% bovine serum albumin in PBS with 0.4% Triton X-100 and incubated in the same buffer with HRP-conjugated streptavidin (1:40,000; Invitrogen).

Affinity capture of biotinylated proteins

Cells were incubated for 24 h in complete media supplemented with 1 μg/ml doxycycline and 50 μM biotin. After three PBS washes, cells (for small-scale analysis, <10⁷; for large-scale analysis, 4 × 10⁷) were lysed at 25°C in 1 ml lysis buffer (50 mM Tris, pH 7.4, 500 mM NaCl, 0.4% SDS, 5 mM EDTA, 1 mM DTT, and 1x Complete protease inhibitor [Roche]) and sonicated. Triton X-100 was added to 2% final concentration. After further sonication, an equal volume of 4°C 50 mM Tris (pH 7.4) was added before additional sonication (subsequent steps at 4°C) and centrifugation at 16,000 relative centrifugal force. Supernatants were incubated with 600 μl Dynabeads™ (MyOne Streptavidin C1; Invitrogen) overnight. Beads were collected and washed twice for 8 min at 25°C (all subsequent steps at 25°C) in 1 ml wash buffer 1 (2% SDS in dH₂O). This was repeated once with wash buffer 2 (0.1% deoxycholate, 1% Triton X-100, 500 mM NaCl, 1 mM EDTA, and 50 mM Hepes, pH 7.5), once with wash buffer 3 (250 mM LiCl, 0.5% NP-40, 0.5% deoxycholate, 1 mM EDTA, and 10 mM Tris, pH 8.1) and twice with wash buffer 4 (50 mM Tris, pH 7.4, and 50 mM NaCl). 10% of the sample was reserved for Western blot analysis. Bound proteins were removed from the magnetic beads with 50 μl of Laemmli SDS-sample buffer saturated with biotin at 98°C. For the larger-scale preparation, 90% of the sample to be analyzed by mass spectrometry was washed twice in 50 mM NH₄HCO₃.

Protein identification by mass spectrometry

For reduced-scale experiments, proteins eluted from the streptavidin beads by SDS-sample buffer were reduced and alkylated and separated by 1D SDS-PAGE. Separated proteins were visualized by colloidal Coomassie blue staining. The whole gel lane was cut into 24 equal-sized gel bands, destained, and submitted to tryptic in-gel digestion, all using perforated microtiter plates (Proxeon) with exchange of solvents by low-speed centrifugation. Peptides were eluted into V-bottom polypropylene microtiter plates, freeze-dried, dissolved in 0.1% formic acid in water, and submitted to nano-flow HPLC coupled to a QTOF mass spectrometer (1260 nanoHPLC [Agilent Technologies] and QTOF 6554 with ChipCube [Agilent Technologies]). Separation of peptides was performed on a 150-mm × 75-μm C18 Reprosil column in a chip (Chip II; Agilent Technologies). The applied gradient was from 8% acetonitrile in water with 0.2% formic acid to 35% acetonitrile in water with 0.2% formic acid over 35 min. The mass spectrometer calibration was maintained by continuous submission of a calibrant solution and recalibration of the acquired spectra after the analytical run. The LC-MS/MS system was controlled by MassHunter™ Acquisition software (Agilent Technologies). Four MS spectra per second and three MS/MS spectra per second were collected. Switching from MS to MS/MS was data dependent, with a threshold of 1,000 counts and a charge of 2-4 for the peptides. Raw data were converted into mzdata.xml using MassHunter™ Qualitative Analysis software (Agilent Technologies), and the database search was performed using MASCOT™ 3.2 (MatrixScience) and the human IPI database (version 3.65). Carboxymethylated Cys was set as a fixed modification, and oxidized Met, deamidation of Asn and Gln, pyroGlu formation of the N terminus, and acetylation of the N terminus as variable modifications. The resulting .dat files were loaded into SCAFFOLD Q+ (Proteome Software). The acceptance level for proteins was two identified peptides with minimum 95% probability each. Spectra of candidates were verified visually.
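
The protein acceptance rule stated above (at least two identified peptides, each with a probability of at least 95%) can be expressed as a simple filter. The sketch below is illustrative only: it does not reproduce how SCAFFOLD applies the criterion internally, and the function name `accept_protein` and the example probabilities are hypothetical.

```python
def accept_protein(peptide_probabilities, min_peptides=2, min_probability=0.95):
    """Accept a protein if enough of its peptides reach the probability cutoff."""
    confident = [p for p in peptide_probabilities if p >= min_probability]
    return len(confident) >= min_peptides

# Hypothetical peptide probabilities per protein (illustrative only).
candidates = {
    "protein_A": [0.99, 0.97, 0.60],  # two peptides pass the cutoff
    "protein_B": [0.99, 0.80],        # only one peptide passes
    "protein_C": [0.95, 0.95],        # both peptides sit exactly at the cutoff
}
accepted = sorted(name for name, probs in candidates.items() if accept_protein(probs))
```

In this sketch a peptide exactly at the cutoff counts as passing, which is one reading of "minimum 95% probability".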

For large-scale analysis, on-bead tryptic digests were analyzed by 1D LC-MS/MS by the Sanford-Burnham Proteomic Facility (La Jolla, CA). Tris(2-carboxyethyl)phosphine (TCEP) was added to 100 μl of bead suspension and proteins were reduced at 37°C for 30 min. Iodoacetamide was added (to 20 mM) and proteins were alkylated at 37°C for 40 min in the dark. Mass spectrometry grade trypsin (Promega) was added (~1:50 ratio) for overnight digestion at 37°C. Magnetic beads were removed by centrifugation. Formic acid was added to the peptide solution (to 2%) before on-line analysis of peptides by high-resolution, high-accuracy LC-MS/MS, consisting of a Michrom™ HPLC, a 15-cm Michrom Magic™ C18 column, a low-flow ADVANCED Michrom™ MS source, and an LTQ-Orbitrap™ XL (Thermo Fisher Scientific). A 120-min gradient of 10-30% B (0.1% formic acid, 100% acetonitrile) was used to separate the peptides. The total LC time was 141 min. The LTQ-Orbitrap™ XL was set to scan precursors in the Orbitrap at a resolution of 60,000, followed by data-dependent MS/MS of the top four precursors. Raw LC-MS/MS data were submitted to Sorcerer Enterprise™ (Sage-N Research Inc.) for protein identification against the IPI human protein database, which contains semi-tryptic peptide sequences, with the allowance of up to two missed cleavages and a precursor mass tolerance of 50.0 ppm. A molecular mass of 57 D was added to all cysteines to account for carboxyamidomethylation. The differential search included 16 D for methionine oxidation, and 226 D on the N terminus and lysine for biotinylation. Search results were sorted, filtered, statistically analyzed, and displayed using PeptideProphet™ and ProteinProphet™ (Institute for Systems Biology). The minimum trans-proteomic pipeline (TPP) probability score for proteins was set to 0.95, to assure a TPP error rate of lower than 0.01.
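
The nominal mass additions used in the search (57 D, 16 D, and 226 D) and the 50 ppm precursor tolerance can be checked with a short calculation. The sketch below is not taken from the search software: the elemental compositions are the standard ones for carbamidomethylation, oxidation, and biotinylation, which are assumptions not stated in the text, and the helper names are hypothetical.

```python
# Monoisotopic atomic masses in daltons (standard values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}

def monoisotopic_mass(composition):
    """Sum the monoisotopic masses for an elemental composition such as {"C": 2, "H": 3}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())

# Net elemental composition added by each modification (assumed standard values).
MODIFICATIONS = {
    "carbamidomethyl (Cys)": {"C": 2, "H": 3, "N": 1, "O": 1},
    "oxidation (Met)": {"O": 1},
    "biotin (Lys, N terminus)": {"C": 10, "H": 14, "N": 2, "O": 2, "S": 1},
}

def ppm_window(mass, tolerance_ppm):
    """Return the (low, high) mass bounds for a tolerance given in parts per million."""
    delta = mass * tolerance_ppm / 1e6
    return mass - delta, mass + delta

shifts = {name: monoisotopic_mass(comp) for name, comp in MODIFICATIONS.items()}
nominal = {name: round(value) for name, value in shifts.items()}
low, high = ppm_window(1000.0, 50.0)  # a 1,000 D precursor at 50 ppm spans 0.1 D in total
```

Rounded to the nearest dalton, the three shifts come out as 57, 16, and 226, matching the values quoted for the search.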

References

Aaronson, R.P., and G. Blobel. 1975. Isolation of nuclear pore complexes in association with a lamina. Proceedings of the National Academy of Sciences of the United States of America. 72:1007-11.

Al-Haboubi, T., D.K. Shumaker, J. Köser, M. Wehnert, and B. Fahrenkrog. 2011. Distinct association of the nuclear pore protein Nup153 with A- and B-type lamins. Nucleus. 2.

Beckett, D., E. Kovaleva, and P.J. Schatz. 1999. A minimal peptide substrate in biotin holoenzyme synthetase-catalyzed biotinylation. Protein Sci. 8:921-9.

Bione, S., E. Maestrini, S. Rivella, M. Mancini, S. Regis, G. Romeo, and D. Toniolo. 1994. Identification of a novel X-linked gene responsible for Emery-Dreifuss muscular dystrophy. Nature genetics. 8:323-7.

Bodoor, K., S. Shaikh, D. Salina, W.H. Raharjo, R. Bastos, M. Lohka, and B. Burke. 1999. Sequential recruitment of NPC proteins to the nuclear periphery at the end of mitosis. Journal of cell science. 112 (Pt 13):2253-64.

Bonne, G., M.R. Di Barletta, S. Varnous, H.M. Becane, E.H. Hammouda, L. Merlini, F. Muntoni, C.R. Greenberg, F. Gary, J.A. Urtizberea, D. Duboc, M. Fardeau, D. Toniolo, and K. Schwartz. 1999. Mutations in the gene encoding lamin A/C cause autosomal dominant Emery-Dreifuss muscular dystrophy. Nature genetics. 21:285-8.

Bonne, G., E. Mercuri, A. Muchir, A. Urtizberea, H.M. Becane, D. Recan, L. Merlini, M. Wehnert, R. Boor, U. Reuner, M. Vorgerd, E.M. Wicklein, B. Eymard, D. Duboc, I. Penisson-Besnier, J.M. Cuisset, X. Ferrer, I. Desguerre, D. Lacombe, K. Bushby, C. Pollitt, D. Toniolo, M. Fardeau, K. Schwartz, and F. Muntoni. 2000. Clinical and molecular genetic spectrum of autosomal dominant Emery-Dreifuss muscular dystrophy due to mutations of the lamin A/C gene. Annals of neurology. 48:170-80.

Broers, J.L., B.M. Machiels, G.J. van Eys, H.J. Kuijpers, E.M. Manders, R. van Driel, and F.C. Ramaekers. 1999. Dynamics of the nuclear lamina as monitored by GFP-tagged A-type lamins. Journal of cell science. 112 (Pt 20):3463-75.

Burke, B., and K.J. Roux. 2009. Nuclei take a position: managing nuclear location. Developmental cell. 17:587-97.

Castano, E., V.V. Philimonenko, M. Kahle, J. Fukalova, A. Kalendova, S. Yildirim, R. Dzijak, H. Dingova-Krasna, and P. Hozak. 2010. Actin complexes in the cell nucleus: new stones in an old field. Histochemistry and cell biology. 133:607-26.

Chapman-Smith, A., and J.E. Cronan, Jr. 1999. Molecular biology of biotin attachment to proteins. J Nutr. 129:477S-484S.

Choi-Rhee, E., H. Schulman, and J.E. Cronan. 2004. Promiscuous protein

biotinylation by Escherichia coli biotin protein Hgase. Protein Sci. 13:3043-50. Crisp, M., Q. Liu, K. Roux, J.B. Rattner, C. Shanahan, B. Burke, P.D. Stahl, and D.

Hodzic. 2006. Coupling of the nucleus and cytoplasm: role of the LINC complex. The Journal of cell biology. 172:41-53.

Cronan, J.E. 2005. Targeted and proximity-dependent promiscuous protein

biotinylation by a mutant Escherichia coli biotin protein ligase. JNutr

Biochem. 16:416-8. Daigle, N., J. Beaudouin, L. Hartnell, G. Imreh, E. Hallberg, J. Lippincott-Schwartz, and J. Ellenberg. 2001. Nuclear pore complexes form immobile networks and have a very low turnover in live mammalian cells. The Journal of cell biology. 154:71-84.

Dechat, T., B. Korbei, OA. Vaughan, S. Vlcek, C.J. Hutchison, and R. Foisner. 2000.

Lamina-associated polypeptide 2alpha binds intranuclear A-type lamins.

Journal of cell science. 113 Pt 19:3473-84.

Dwyer, N., and G. Blobel. 1 76. A modified procedure for the isolation of a pore complex-lamina fraction from rat liver nuclei. The Journal of cell biology. 70:581-91.

Foisner, R., and L. Gerace. 1993. Integral membrane proteins of the nuclear envelope interact with lamins and chromosomes, and binding is modulated by mitotic phosphorylation. Cell. 73:1267-79.

Gerace, L., and M.D. Huber. 2012. Nuclear lamina at the crossroads of the cytoplasm and nucleus. Journal of structural biology. 177:24-31.

Goldman, R.D., Y. Gruenbaum, R.D. Moir, D.K. Shumaker, and T.P. Spann. 2002. Nuclear lamins: building blocks of nuclear architecture. Genes & development. 16:533-47.

Green, N.M. 1963. Avidin. 1. The Use of (14-C)Biotin for Kinetic Studies and for Assay. Biochem J. 89:585-91.

Guan, T., R.H. Kehlenbach, E.C. Schirmer, A. Kehlenbach, F. Fan, B.E. Clurman, N. Arnheim, and L. Gerace. 2000. Nup50, a nucleoplasmically oriented nucleoporin with a role in nuclear protein export. Molecular and cellular biology. 20:5619-30.

Gudise, S., R.A. Figueroa, R. Lindberg, V. Larsson, and E. Hallberg. 2011. Samp1 is functionally associated with the LINC complex and A-type lamina networks. Journal of cell science. 124:2077-85.

Gueneau, L., A.T. Bertrand, J.P. Jais, M.A. Salih, T. Stojkovic, M. Wehnert, M. Hoeltzenbein, S. Spuler, S. Saitoh, A. Verschueren, C. Tranchant, M. Beuvin, E. Lacene, N.B. Romero, S. Heath, D. Zelenika, T. Voit, B. Eymard, R. Ben Yaou, and G. Bonne. 2009. Mutations of the FHL1 gene cause Emery-Dreifuss muscular dystrophy. American journal of human genetics. 85:338-53.

Hase, M.E., and V.C. Cordes. 2003. Direct interaction with Nup153 mediates binding of Tpr to the periphery of the nuclear pore complex. Molecular biology of the cell. 14:1923-40.

Holaska, J.M., K.K. Lee, A.K. Kowalski, and K.L. Wilson. 2003. Transcriptional repressor germ cell-less (GCL) and barrier to autointegration factor (BAF) compete for binding to emerin in vitro. The Journal of biological chemistry. 278:6969-75.

Holaska, J.M., and K.L. Wilson. 2006. Multiple roles for emerin: implications for Emery-Dreifuss muscular dystrophy. The anatomical record. Part A, Discoveries in molecular, cellular, and evolutionary biology. 288:676-80.

Horton, H., I. McMorrow, and B. Burke. 1992. Independent expression and assembly properties of heterologous lamins A and C in murine embryonal carcinomas. European journal of cell biology. 57:172-83.

Krull, S., J. Thyberg, B. Bjorkroth, H.R. Rackwitz, and V.C. Cordes. 2004. Nucleoporins as components of the nuclear pore complex core structure and Tpr as the architectural element of the nuclear basket. Molecular biology of the cell. 15:4261-77.

Kubben, N., J.W. Voncken, J. Demmers, C. Calis, G. van Almen, Y. Pinto, and T. Misteli. 2010. Identification of differential protein interactors of lamin A and progerin. Nucleus. 1:513-25.

Kulman, J.D., M. Satake, and J.E. Harris. 2007. A versatile system for site-specific enzymatic biotinylation and regulated expression of proteins in cultured mammalian cells. Protein expression and purification. 52:320-8.

Kuroishi, T., L. Rios-Avila, V. Pestinger, S.S. Wijeratne, and J. Zempleni. 2011. Biotinylation is a natural, albeit rare, modification of human histones. Molecular genetics and metabolism. 104:537-45.

Kwon, K., and D. Beckett. 2000. Function of a conserved sequence motif in biotin holoenzyme synthetases. Protein science : a publication of the Protein Society. 9:1530-9.

Kwon, K., E.D. Streaker, S. Ruparelia, and D. Beckett. 2000. Multiple disordered loops function in corepressor-induced dimerization of the biotin repressor. J Mol Biol. 304:821-33.

Lane, M.D., K.L. Rominger, D.L. Young, and F. Lynen. 1964. The Enzymatic Synthesis of Holotranscarboxylase from Apotranscarboxylase and (+)-Biotin. II. Investigation of the Reaction Mechanism. J Biol Chem. 239:2865-71.

Lee, K.K., T. Haraguchi, R.S. Lee, T. Koujin, Y. Hiraoka, and K.L. Wilson. 2001. Distinct functional domains in emerin bind lamin A and DNA-bridging protein BAF. Journal of cell science. 114:4567-73.

Liu, Q., N. Pante, T. Misteli, M. Elsagga, M. Crisp, D. Hodzic, B. Burke, and K.J. Roux. 2007. Functional association of Sun1 with nuclear pore complexes. The Journal of cell biology. 178:785-98.

Mansharamani, M., and K.L. Wilson. 2005. Direct binding of nuclear membrane protein MAN1 to emerin in vitro and two modes of binding to barrier-to-autointegration factor. The Journal of biological chemistry. 280:13863-70.

Mechold, U., C. Gilbert, and V. Ogryzko. 2005. Codon optimization of the BirA enzyme gene leads to higher expression and an improved efficiency of biotinylation of target proteins in mammalian cells. Journal of biotechnology. 116:245-9.

Moir, R.D., M. Yoon, S. Khuon, and R.D. Goldman. 2000. Nuclear lamins A and B1: different pathways of assembly during nuclear envelope formation in living cells. The Journal of cell biology. 151:1155-68.

Montes de Oca, R., C.J. Shoemaker, M. Gucek, R.N. Cole, and K.L. Wilson. 2009. Barrier-to-autointegration factor proteome reveals chromatin-regulatory partners. PloS one. 4:e7050.

Nesbeth, D., S.L. Williams, L. Chan, T. Brain, N.K. Slater, F. Farzaneh, and D. Darling. 2006. Metabolic biotinylation of lentiviral pseudotypes for scalable paramagnetic microparticle-dependent manipulation. Molecular therapy : the journal of the American Society of Gene Therapy. 13:814-22.

Raffaele Di Barletta, M., E. Ricci, G. Galluzzi, P. Tonali, M. Mora, L. Morandi, A. Romorini, T. Voit, K.H. Orstavik, L. Merlini, C. Trevisan, V. Biancalana, I. Housmanowa-Petrusewicz, S. Bione, R. Ricotti, K. Schwartz, G. Bonne, and D. Toniolo. 2000. Different mutations in the LMNA gene cause autosomal dominant and autosomal recessive Emery-Dreifuss muscular dystrophy. American journal of human genetics. 66:1407-12.

Rasala, B.A., C. Ramos, A. Harel, and D.J. Forbes. 2008. Capture of AT-rich chromatin by ELYS recruits POM121 and NDC1 to initiate nuclear pore assembly. Molecular biology of the cell. 19:3982-96.

Roux, K.J., M.L. Crisp, Q. Liu, D. Kim, S. Kozlov, C.L. Stewart, and B. Burke. 2009. Nesprin 4 is an outer nuclear membrane protein that can induce kinesin-mediated cell polarization. Proceedings of the National Academy of Sciences of the United States of America. 106:2194-9.

Sang, L., J.J. Miller, K.C. Corbit, R.H. Giles, M.J. Brauer, E.A. Otto, L.M. Baye, X. Wen, S.J. Scales, M. Kwong, E.G. Huntzicker, M.K. Sfakianos, W. Sandoval, J.F. Bazan, P. Kulkarni, F.R. Garcia-Gonzalo, A.D. Seol, J.F. O'Toole, S. Held, H.M. Reutter, W.S. Lane, M.A. Rafiq, A. Noor, M. Ansar, A.R. Devi, V.C. Sheffield, D.C. Slusarski, J.B. Vincent, D.A. Doherty, F. Hildebrandt, J.F. Reiter, and P.K. Jackson. 2011. Mapping the NPHP-JBTS-MKS protein network reveals ciliopathy disease genes and pathways. Cell. 145:513-28.

Schirmer, E.C., L. Florens, T. Guan, J.R. Yates, 3rd, and L. Gerace. 2003. Nuclear membrane proteins with potential disease links found by subtractive proteomics. Science. 301:1380-2.

Segura-Totten, M., A.K. Kowalski, R. Craigie, and K.L. Wilson. 2002. Barrier-to-autointegration factor: major roles in chromatin decondensation and nuclear assembly. The Journal of cell biology. 158:475-85.

Shimi, T., K. Pfleghaar, S. Kojima, C.G. Pack, I. Solovei, A.E. Goldman, S.A. Adam, D.K. Shumaker, M. Kinjo, T. Cremer, and R.D. Goldman. 2008. The A- and B-type nuclear lamin networks: microdomains involved in chromatin organization and transcription. Genes & development. 22:3409-21.

Shumaker, D.K., T. Dechat, A. Kohlmaier, S.A. Adam, M.R. Bozovsky, M.R. Erdos, M. Eriksson, A.E. Goldman, S. Khuon, F.S. Collins, T. Jenuwein, and R.D. Goldman. 2006. Mutant nuclear lamin A leads to progressive alterations of epigenetic control in premature aging. Proceedings of the National Academy of Sciences of the United States of America. 103:8703-8.

Simon, D.N., and K.L. Wilson. 2011. The nucleoskeleton as a genome-associated dynamic 'network of networks'. Nature reviews. Molecular cell biology. 12:695-708.

Streaker, E.D., and D. Beckett. 2006. Nonenzymatic biotinylation of a biotin carboxyl carrier protein: unusual reactivity of the physiological target lysine. Protein Sci. 15:1928-35.

Sukegawa, J., and G. Blobel. 1993. A nuclear pore complex protein that contains zinc finger motifs, binds DNA, and faces the nucleoplasm. Cell. 72:29-38.

Sullivan, T., D. Escalante-Alcalde, H. Bhatt, M. Anver, N. Bhat, K. Nagashima, C.L. Stewart, and B. Burke. 1999. Loss of A-type lamin expression compromises nuclear envelope integrity leading to muscular dystrophy. The Journal of cell biology. 147:913-20.

Torrisi, M.R., and S. Bonatti. 1985. Immunocytochemical study of the partition and distribution of Sindbis virus glycoproteins in freeze-fractured membranes of infected baby hamster kidney cells. The Journal of cell biology. 101:1300-6.

Torrisi, M.R., L.V. Lotti, A. Pavan, G. Migliaccio, and S. Bonatti. 1987. Free diffusion to and from the inner nuclear membrane of newly synthesized plasma membrane glycoproteins. The Journal of cell biology. 104:733-7.

van Steensel, B., and S. Henikoff. 2000. Identification of in vivo DNA targets of chromatin proteins using tethered dam methyltransferase. Nature biotechnology. 18:424-8.

Walther, T.C., M. Fornerod, H. Pickersgill, M. Goldberg, T.D. Allen, and I.W. Mattaj. 2001. The nucleoporin Nup153 is required for nuclear pore basket formation, nuclear pore complex anchoring and import of a subset of nuclear proteins. The EMBO journal. 20:5703-14.

Worman, H.J. 2012. Nuclear lamins and laminopathies. The Journal of pathology. 226:316-25.

Worman, H.J., L.G. Fong, A. Muchir, and S.G. Young. 2009. Laminopathies and the long strange trip from basic cell biology to therapy. The Journal of clinical investigation. 119:1825-36.

Zhang, Q., C. Bethmann, N.F. Worth, J.D. Davies, C. Wasner, A. Feuer, C.D. Ragnauth, Q. Yi, J.A. Mellad, D.T. Warren, M.A. Wheeler, J.A. Ellis, J.N. Skepper, M. Vorgerd, B. Schlotter-Weigel, P.L. Weissberg, R.G. Roberts, M. Wehnert, and C.M. Shanahan. 2007. Nesprin-1 and -2 are involved in the pathogenesis of Emery Dreifuss muscular dystrophy and are critical for nuclear envelope integrity. Human molecular genetics. 16:2816-33.

Example 2. 2nd Generation BioID™

To optimize the original 1st generation BioID™ method (G1-BioID™) we have developed a 2nd generation BioID™ (G2-BioID™) whose primary improvement is a considerable reduction in the size of the promiscuous biotin ligase. Whereas G1-BioID™ is based on a mutation (R118G) of the E. coli BirA gene, G2-BioID™ is based on a mutation (R40G) of biotin protein ligase (BPL) from A. aeolicus. A humanized nucleic acid sequence for G2-BioID™ was generated by gene synthesis such that it could be inserted into a mammalian expression vector (pcDNA3.1). The original version has an N-terminal myc-epitope tag for protein detection. G2-BioID™ is 234 amino acids (26.56 kDa), whereas G1-BioID™ is 322 amino acids (35.22 kDa), both calculated without the 9 amino acid myc-epitope tag. The wild type ligases that are the basis for G1-BioID™ and G2-BioID™ have been structurally characterized. The dimensions (length and width) of G1-BioID™ are 7.4 x 4.0 nm and those of G2-BioID™ are 5.2 x 4.2 nm. This considerable size difference (~30% difference) is due to the presence of a naturally occurring DNA-binding domain that is found in G1-BioID™ but not in G2-BioID™.
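As an illustrative check of the size comparison above, the short Python sketch below computes the relative reduction from the 1st to the 2nd generation ligase using only the figures quoted in the preceding paragraph. The dictionary layout and the helper name `percent_reduction` are assumptions made for this sketch and are not part of the described method.

```python
# Figures quoted in the preceding paragraph (myc-epitope tag excluded).
g1 = {"residues": 322, "mass_kda": 35.22, "length_nm": 7.4}
g2 = {"residues": 234, "mass_kda": 26.56, "length_nm": 5.2}


def percent_reduction(before, after):
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return 100.0 * (before - after) / before


# Relative reduction for each quoted property.
reductions = {key: percent_reduction(g1[key], g2[key]) for key in g1}

for key, value in reductions.items():
    print(f"{key}: {value:.1f}% smaller in the 2nd generation ligase")
```

On these figures the reduction in length (about 30%) is the one closest to the quoted "~30% difference"; the reductions in residue count and mass are somewhat smaller.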

Nucleic acid sequence for G2-BioID (untagged)

ATGTTCAAGAACCTGATCTGGCTGAAGGAGGTGGACAGCACCCAGGAGAG
ACTGAAGGAGTGGAACGTGAGCTACGGCACCGCCCTGGTGGCCGACAGAC
AGACCAAGGGCAGAGGCGGCCTGGGCAGAAAGTGGCTGAGCCAGGAGGGC
GGCCTGTACTTCAGCTTCCTGCTGAACCCCAAGGAGTTCGAGAACCTGCT
GCAGCTGCCCCTGGTGCTGGGCCTGAGCGTGAGCGAGGCCCTGGAGGAGA
TCACCGAGATCCCCTTCAGCCTGAAGTGGCCCAACGACGTGTACTTCCAG
GAGAAGAAGGTGAGCGGCGTGCTGTGCGAGCTGAGCAAGGACAAGCTGAT
CGTGGGCATCGGCATCAACGTGAACCAGAGAGAGATCCCCGAGGAGATCA
AGGACAGAGCCACCACCCTGTACGAGATCACCGGCAAGGACTGGGACAGA
AAGGAGGTGCTGCTGAAGGTGCTGAAGAGAATCAGCGAGAACCTGAAGAA
GTTCAAGGAGAAGAGCTTCAAGGAGTTCAAGGGCAAGATCGAGAGCAAGA
TGCTGTACCTGGGCGAGGAGGTGAAGCTGCTGGGCGAGGGCAAGATCACC
GGCAAGCTGGTGGGCCTGAGCGAGAAGGGCGGCGCCCTGATCCTGACCGA
GGAGGGCATCAAGGAGATCCTGAGCGGCGAGTTCAGCCTGAGAAGAAGCT
GA (SEQ ID NO: 15)

Amino acid sequence for G2-BioID (untagged)

MFKNLIWLKEVDSTQERLKEWNVSYGTALVADRQTKGRGGLGRKWLSQEG
GLYFSFLLNPKEFENLLQLPLVLGLSVSEALEEITEIPFSLKWPNDVYFQ
EKKVSGVLCELSKDKLIVGIGINVNQREIPEEIKDRATTLYEITGKDWDR
KEVLLKVLKRISENLKKFKEKSFKEFKGKIESKMLYLGEEVKLLGEGKIT
GKLVGLSEKGGALILTEEGIKEILSGEFSLRRS (SEQ ID NO: 16)
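The relationship between the two listed sequences can be checked with a few lines of Python. The sketch below is illustrative only: it assumes the standard genetic code, strips all whitespace from the nucleic acid listing (SEQ ID NO: 15), and translates it in frame from the first ATG. The names `G2_BIOID_DNA`, `CODON_TABLE` and `translate` are chosen for this example.

```python
from itertools import product

# SEQ ID NO: 15 as listed above; line breaks and spaces are removed before use.
G2_BIOID_DNA = "".join("""
ATGTTCAAGAACCTGATCTGGCTGAAGGAGGTGGACAGCACCCAGGAGAG
ACTGAAGGAGTGGAACGTGAGCTACGGCACCGCCCTGGTGGCCGACAGAC
AGACCAAGGGCAGAGGCGGCCTGGGCAGAAAGTGGCTGAGCCAGGAGGGC
GGCCTGTACTTCAGCTTCCTGCTGAACCCCAAGGAGTTCGAGAACCTGCT
GCAGCTGCCCCTGGTGCTGGGCCTGAGCGTGAGCGAGGCCCTGGAGGAGA
TCACCGAGATCCCCTTCAGCCTGAAGTGGCCCAACGACGTGTACTTCCAG
GAGAAGAAGGTGAGCGGCGTGCTGTGCGAGCTGAGCAAGGACAAGCTGAT
CGTGGGCATCGGCATCAACGTGAACCAGAGAGAGATCCCCGAGGAGATCA
AGGACAGAGCCACCACCCTGTACGAGATCACCGGCAAGGACTGGGACAGA
AAGGAGGTGCTGCTGAAGGTGCTGAAGAGAATCAGCGAGAACCTGAAGAA
GTTCAAGGAGAAGAGCTTCAAGGAGTTCAAGGGCAAGATCGAGAGCAAGA
TGCTGTACCTGGGCGAGGAGGTGAAGCTGCTGGGCGAGGGCAAGATCACC
GGCAAGCTGGTGGGCCTGAGCGAGAAGGGCGGCGCCCTGATCCTGACCGA
GGAGGGCATCAAGGAGATCCTGAGCGGCGAGTTCAGCCTGAGAAGAAGCT
GA
""".split())

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


protein = translate(G2_BIOID_DNA)
print(len(G2_BIOID_DNA), "nt ->", len(protein), "residues")
print("Residue 40:", protein[39])  # position of the R40G substitution
```

For the sequence as listed here, the open reading frame is 702 nt long, that is, 233 sense codons followed by a TGA stop codon (234 codons in total), and residue 40 is glycine, consistent with the R40G mutation described above.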

HEK293 cells were either mock transfected (control) or transfected with myc-G1- or myc-G2-BioID™, in the presence or absence of 50 μM biotin. Experiments were carried out as described in Example 1. Results are shown in Figure 5. As compared to the control lanes, the expression of the G1- (*) and G2-BioID™ (*) proteins is detected by anti-myc immunoblotting. Promiscuous biotinylation of endogenous proteins is observed with streptavidin-HRP for both G1- and G2-BioID™. This effect is enhanced by the presence of exogenous biotin.

We then compared G1- and G2-BioID™ by fluorescence microscopy. U2OS cells were transfected with myc-G1- or myc-G2-BioID™, in the presence or absence of 50 μM biotin. G1- (*) and G2-BioID™ (*) proteins were detected with anti-myc. Promiscuous biotinylation (biotin) of endogenous proteins was observed with streptavidin-488 for both G1- and G2-BioID™ (see Figure 6). This promiscuous biotinylation is enhanced by the presence of exogenous biotin. Nuclei are labeled with Hoechst dye. Scale bar is 20 μm.

We then compared G1- and G2-BioID™-LaA by immuno- and streptavidin-blot analysis. HEK293 cells were either mock transfected (control) or transfected with myc-G1- or myc-G2-BioID™-LaA in the presence of 50 μM biotin. Similar levels of promiscuous biotinylation of endogenous proteins are observed with streptavidin-HRP for both G1- and G2-BioID™-LaA. Expression of the G1- and G2-BioID™-LaA proteins is detected by anti-myc immunoblotting.

Example 3. Exemplary BioID™ protocols

Introduction

The overall process of BioID™ can be broken down into two stages: 1) generation and characterization of a BioID™-fusion protein in a mammalian (or other) expression vector for stable expression in a cell line and 2) use of that cell line for a large-scale BioID™ pull-down to identify protein candidates by mass spectrometry. Note that this exemplary protocol is specific to the application of BioID™ in common mammalian cell lines. Based on the teachings herein, it will be well within the level of skill in the art to apply BioID™ to a variety of cell types and species.

Design of the BioID fusion protein

The fundamental component of this system is the BioID™ fusion protein. If the purpose of BioID™ is to screen for physiologically relevant protein interactions for a protein of interest, then the ideal fusion protein will represent a functional replacement of the original protein. Thus care should be taken in deciding where in the protein the biotin ligase will be incorporated. Prior evidence of successful fusions with similarly sized proteins, such as GFP, can be a good starting point to guide this decision. Posttranslational modifications at either the N- or C-terminus must be considered as well. For example, C-terminal prenylation or an N-terminal signal peptide should not be disrupted. This may complicate the molecular cloning process of generating the expression plasmid for the fusion protein, but is worth the effort. If there is little information as to how N- or C-terminal fusions might affect the protein of interest, it is recommended to try both in parallel, at least to the point of validating each fusion protein for proper targeting.

Functional validation of the fusion protein

Another consideration is how to ensure that the fusion protein is functional. In many cases this is not possible unless the function of the protein is well known, some form of knockout/knockdown system exists and there is a phenotypic outcome to rescue. However, it is typically possible to at least compare the localization of the fusion protein to a more minimally tagged, or preferably endogenous, protein by immunofluorescence microscopy. Attention should be paid to expression levels as overexpression of proteins can often lead to mis-localization. In most cases low levels of BioID™ fusion proteins, at or below the level of the endogenous protein, are sufficient for the identification of candidates. Thus when validating by transient transfection, attention should be paid to cells expressing lower levels of the fusion protein as this is the desired, and often likely, outcome for the cells that stably express the fusion protein.

Choice of cells and expression method

Once the BioID™ fusion protein has been designed and an expression plasmid generated, it is necessary to consider the cell type in which to express the BioID™ fusion protein. This is a situation-specific decision; however, there are a few concepts to consider. Preferably the expression will be similar in all cells within the population to be used for large-scale BioID™ pull-down. Thus traditional transient transfection is best limited to preliminary functional validation. A population of cells stably expressing similar amounts of the fusion protein is preferred. This can be accomplished with standard random integration or various viral expression systems. Since it is usually sufficient to screen for BioID™ candidates more likely to be physiologically relevant, an extremely low level of protein expression is ideal.

Inducible expression of the fusion protein can be utilized, especially if expression of the bait is toxic. However, inducible expression is not necessary in most instances since biotinylation itself is induced by the addition of excess biotin.

BASIC PROTOCOL 1

GENERATION OF CELLS EXPRESSING A BioID FUSION PROTEIN

This protocol describes exemplary techniques for creation and characterization of a BioID™-fusion protein expression plasmid followed by its stable expression in mammalian cells. Techniques such as PCR cloning and generation of stably expressing cells will not be described as they vary depending on the situation. The methods needed to test the targeting of the fusion protein and its expression and activity in cells prior to a large-scale BioID™ pull-down will be described.

Materials

1 mM Biotin (see recipe)

PBS

BSA blocking buffer (see recipe)

Streptavidin-Alexa Fluor® (Invitrogen)

Streptavidin-HRP (High Sensitivity Streptavidin-HRP, Thermo Scientific)

ABS Blocking buffer (see recipe)

Quenching solution (see recipe)

SDS-PAGE Electrophoresis Unit (Mini-PROTEAN® II Electrophoresis Cell, Bio- Rad)

Semi-Dry Transfer Cell (Trans-Blot®, Bio-Rad)

Validation of BioID™ fusion protein

1. Prepare the expression plasmid for the BioID™ fusion protein.

2. For each experimental condition (e.g., control, BioID™ fusion protein) plate two wells of cells in a standard 6-well plate, one with glass coverslips and the other without.

3. Express the fusion protein in the cells of choice by transient transfection. Process mock or non-transfected cells in parallel. Apply biotin (50 μM final concentration) to cells at the time of transfection. If the transfection protocol requires replacement of media shortly after transfection, add biotin at that time.

a. The presence of excess biotin promotes biotinylation by the BioID™ fusion protein (usually over a period of 18-24 hrs). This permits simultaneous analysis of expression, targeting and function of the BioID™ fusion protein by immunofluorescence microscopy and immunoblot.

4. Process the cells on coverslips for analysis by immunofluorescence microscopy 1 day following transient transfection.

a. It is recommended to use PFA fixation for 10 min followed by TX-100 permeabilization for 15 min. Use of MeOH fixation will result in a strong mitochondrial signal with fluorescently labeled streptavidin. Use the labeled streptavidin at 1:1,000 with the secondary antibody and DNA labeling reagent (e.g., Hoechst, DAPI).

5. Process the cells for immunoblot analysis 1 day following transient transfection and addition of biotin.

6. Following whole cell lysis, SDS-PAGE separation and protein transfer, agitate membrane in BSA blocking buffer for 20-30 min at RT.

a. BSA is preferred for blocking to eliminate any free biotin that is potentially present in milk or serum. Free biotin will compete with biotinylated proteins for binding to streptavidin-HRP.

7. Agitate membrane in streptavidin-HRP at 1:40,000 in blocking buffer for 40 min at room temperature (RT).

a. May need to optimize concentration depending on source of streptavidin- HRP. This incubation can be performed overnight at 4 °C for convenience.

8. Quickly wash membrane in PBS 2-3 times to wash away unbound streptavidin-HRP.

9. Agitate membrane in ABS blocking buffer (see recipe) for 5 min.

a. This step can be used to reduce background signal on membrane.

10. Quickly wash membrane in PBS 2-3 times.

11. Agitate membrane in PBS for 5 min.

12. Add enhanced chemiluminescence (ECL) reagent to observe biotinylated proteins.

13. Following successful analysis of biotinylated proteins, quench HRP signal on membrane with 20 min agitation in quenching solution.

a. Removal of streptavidin-HRP with standard membrane stripping methods is futile given the strength of the biotin-streptavidin interaction. The quenching permanently inactivates the HRP allowing further probing with additional antibodies. Consider reapplication of ECL to the membrane and visualization to confirm success of quenching.

14. Wash membrane with PBS multiple times to remove quenching solution.

a. This step removes residual sodium azide and hydrogen peroxide that may affect the following Western blot analysis.

15. Proceed to immunoblot membrane with antibodies specific to the BioID™ fusion protein (e.g., anti-myc/HA) to confirm its expression and migration by SDS-PAGE.

a. Use standard protocol for this, beginning with blocking in ABS or equivalent (BSA blocking not required). As an alternative to HA/myc tag detection of the BioID™ fusion protein, an anti-BirA antibody is effective by immunofluorescence and immunoblot (chicken anti-BirA ab14002, Abcam).

16. Generation of cells stably expressing BioID™ fusion protein

17. Initiate generation of stable cell lines with validated cell line.

a. This is a highly variable process depending on the strategy and cell type.

18. If subcloning of cells is performed, screen subclones first by immunofluorescence. Perform immunoblot analysis on subclones that pass the immunofluorescence screening. If viral infection is utilized, screen the population of infected cells by immunofluorescence and immunoblot.

a. 50 μM biotin can be added to the cells as in Step 3 to monitor biotinylation function of the BioID™ fusion protein.

19. Freeze down multiple vials of stably expressing cells for future BioID™ experiments.

BASIC PROTOCOL 2

BioID™ PULL-DOWN TO IDENTIFY CANDIDATE PROTEINS

This protocol describes an exemplary embodiment of the invention using cells stably expressing a BioID™ fusion protein (along with non-expressing control cells) to perform large-scale BioID™ pull-down experiments. The purpose of these experiments is to isolate sufficient amounts of proteins biotinylated by the BioID™ fusion protein to be identified by mass spectrometry. The starting material for these experiments may vary depending on a number of factors. These include the efficiency of biotinylation by the BioID™ fusion protein and the number of desired candidate proteins. This protocol describes the analysis of four confluent 10 cm plates of cells/condition (~4×10⁷ cells). The protocol ends immediately prior to analysis by mass spectrometry, a service typically performed by a core facility.

Materials

1 mM Biotin (see recipe)

PBS

Protease inhibitor (Halt Protease Inhibitor Cocktail, EDTA-Free, Thermo Scientific)

20% Triton X-100

Lysis buffer (see recipe)

50 mM Tris (pH 7.4)

Dynabeads® (MyOne Streptavidin C1, Life Technologies)

Wash buffer 1 (see recipe)

Wash buffer 2 (see recipe)

Wash buffer 3 (see recipe)

50 mM Ammonium bicarbonate (NH4HCO3)

Sonicator (Sonifier-250, Branson)

Centrifuge (Legend Micro 21R, Thermo Scientific)

MagneSphere® Technology Magnetic Separation Stand (Promega)

Note: Unless otherwise specified, lysis and wash steps can be carried out at RT to avoid precipitation of SDS and deoxycholic acid. To reduce keratin contamination, use unopened DNase/RNase-free tubes, wear gloves and use a tube cap opener.

Perform cell lysis

1. Begin with four 10 cm dishes for each experimental condition (cells expressing BioID™ constructs or control cells).

The number of cells utilized will correlate with the number of candidates identified.

2. When cells reach approximately 80% confluency, change medium to fresh complete medium containing 50 μM biotin (1X).

3. Incubate cells for 24 hr.

Incubation time may vary depending on the goals of the experiment.

4. Remove medium completely by aspiration and rinse the cells twice at room temperature with 5 ml/dish of PBS.

This step helps to remove residual free-biotin from the medium.

5. Add 600 μl of lysis buffer/dish and scrape cells gently to harvest the cells.

This can be done at RT. The purpose of this harsh lysis is to try to disrupt all protein interactions and completely denature/solubilize the proteins.

6. Transfer lysed cells to a 15 ml conical tube.

7. Add 240 μl of 20% Triton X-100 (final concentration ~2%) and mix by trituration. This tube can be kept on ice during subsequent sonication.

Adding a five fold excess of Triton X-100 dilutes out the SDS and prevents its precipitation at 4°C.

8. Position the probe tip in the sample just above the tube bottom.

9. Apply sonication for two sessions with 30 pulses using a Branson Sonifier 250 (or equivalent) at 30% duty cycle and an output level of 3. Let the tube sit on ice for 2 min between each session to prevent overheating.

If the sample is still viscous and cloudy after sonication, apply an additional period of sonication. Sonication also functions to shear DNA.

10. Add 2.16 ml of pre-chilled 50 mM Tris (pH 7.4) and mix well.

This dilution provides more favorable conditions for affinity capture.

11. Apply one session of sonication (30 pulses using a Branson Sonifier 250 at 30% duty cycle and an output level of 3).

This step helps solubilize any precipitated proteins that may be present due to reduced salt and detergent concentrations from the previous step and assists in mixing the sample.

12. Aliquot the sample evenly to three pre-chilled 2 ml tubes.

13. Spin down at 16,500 x g relative centrifugal force (RCF) for 10 min at 4°C.

Perform affinity purification of biotinylated proteins

14. During the centrifugation in step 13, place three new 2 ml tubes in the Magnetic Separation Stand and add 0.75 ml of RT lysis buffer and 0.75 ml of RT 50 mM Tris (pH 7.5) to each tube.

15. Mix the stock of magnetic streptavidin beads well with gentle tapping to resuspend and add 200 μl of beads to each tube prepared in step 14. Wait for 3 min at RT.

This step functions to equilibrate the beads in the binding buffer.

16. Once the beads accumulate at one side of the tube wall, remove the supernatant gently by pipetting.

This step removes the buffer the beads equilibrated in.

17. After sample centrifugation, carefully transfer supernatant to the tubes prepared in step 16. Do not disturb the small insoluble pellet on the tube wall when removing the supernatant.

Step 17 can be performed quickly to prevent the beads from drying out.

18. Resuspend the samples and beads with gentle pipetting.

19. Incubate the tube on a rotator at 4°C overnight.

20. Place the tubes on the Magnetic Separation Stand and wait 3 min to collect beads at RT.

21. Remove the supernatant gently by pipetting. Try not to disturb beads on the wall of the tube.

22. Add 1.5 ml of wash buffer 1 to each tube and resuspend beads gently by pipetting.

23. Place tubes on a rotator for 8 min at RT.

24. Remove the supernatant as described in steps 20-21.

25. To pool beads into one tube, add 1.5 ml of wash buffer 1 to one of the three tubes and resuspend gently.

26. Transfer resuspended sample in step 25 to another tube. Repeat this step to combine three samples into one tube in a final volume of 1.5 ml wash buffer 1.

27. Place tubes on a rotator for 8 min at RT.

28. Wash once with 1.5 ml of wash buffer 2 as described in steps 20-24.

29. Wash once with 1.5 ml of wash buffer 3 as described in steps 20-24.

30. Add 1.5 ml of 50 mM Tris (pH 7.4) and resuspend gently with pipetting.

This step can be used to remove detergents in the sample that may interfere with mass spectrometry.

31. Save 150 μl (10% of total) of resuspended beads for further analysis by Western blot.

32. Spin beads (both 1.35 ml and 150 μl samples) at 6,000 x g RCF for 5 min to collect beads.

33. Remove the supernatant completely. Do not disrupt beads at the tube bottom.

34. Add 50 μl of 50 mM ammonium bicarbonate to the tube that contained the 1.35 ml sample (destined for mass-spec analysis) and resuspend gently with pipetting.

Final wash buffer may vary depending on analysis method. If necessary, freeze the sample quickly in liquid nitrogen. Frozen samples can be stored at -

35. For the 150 μl sample, after removing supernatant, add 100 μl of 1X SDS-PAGE sample buffer and resuspend gently by pipetting. Heat samples at 98°C for 5 min. Samples can be stored at -20°C for further analysis by immunoblot as described in Basic Protocol 1.

Prior to sending samples for analysis by mass spectrometry it is recommended to perform immunoblot analysis of ~5 μl from the protein in the SDS-PAGE sample buffer. This should follow the same protocol in Basic Protocol 1 to check for evidence of biotinylated proteins from BioID™ fusion proteins. A considerable difference in biotinylated proteins between the control and BioID™ fusion protein samples justifies proceeding to mass spectrometry.
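The volumes used in the lysis steps of this protocol can be followed with a short calculation. The Python sketch below is illustrative only and assumes that the lysates from the four dishes of one condition are pooled into a single tube before Triton X-100 is added, and that the Triton X-100 stock and the Tris diluent contribute no SDS or NaCl. The variable names are chosen for this example.

```python
# Volumes taken from the protocol (per experimental condition).
DISHES = 4
LYSIS_BUFFER_PER_DISH_ML = 0.600   # lysis buffer added to each dish
TRITON_STOCK_ML = 0.240            # 20% Triton X-100 added to the lysate
TRIS_DILUENT_ML = 2.16             # 50 mM Tris (pH 7.4) added after sonication
ALIQUOTS = 3                       # 2 ml tubes used for centrifugation

# Lysis buffer composition from the recipe section.
SDS_PCT = 0.2
NACL_MM = 500.0
TRITON_STOCK_PCT = 20.0

# Assumption: all four dishes of one condition are pooled into one tube.
lysate_ml = DISHES * LYSIS_BUFFER_PER_DISH_ML
after_triton_ml = lysate_ml + TRITON_STOCK_ML
final_ml = after_triton_ml + TRIS_DILUENT_ML

# Concentrations right after the Triton X-100 addition.
triton_after_addition_pct = TRITON_STOCK_PCT * TRITON_STOCK_ML / after_triton_ml

# Concentrations after dilution with Tris, i.e. the affinity-capture conditions.
final_triton_pct = TRITON_STOCK_PCT * TRITON_STOCK_ML / final_ml
final_sds_pct = SDS_PCT * lysate_ml / final_ml
final_nacl_mm = NACL_MM * lysate_ml / final_ml
per_tube_ml = final_ml / ALIQUOTS

print(f"Triton X-100 after addition: {triton_after_addition_pct:.2f}%")
print(f"Final volume: {final_ml:.2f} ml ({per_tube_ml:.2f} ml per tube)")
print(f"Final Triton X-100 {final_triton_pct:.2f}%, SDS {final_sds_pct:.2f}%, "
      f"NaCl {final_nacl_mm:.0f} mM")
```

Under the pooling assumption, the final lysate volume divides evenly into the three 2 ml tubes used for centrifugation, which is consistent with the aliquoting step.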

REAGENTS AND SOLUTIONS

Use Milli-Q purified water or equivalent in all recipes.

1 mM Biotin (20X)

Dissolve 12.2 mg Biotin (Sigma, B4501) in 50 ml of serum-free DMEM (or standard tissue culture media). Pipetting may be required to dissolve biotin completely.

Sterilize by passing through a 0.22 μm syringe-driven filter unit (Millex). Dispense into sterile 50 ml tube, cap tightly. Store at 4°C.
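The biotin stock recipe can be checked with a short calculation. The Python sketch below is illustrative; it assumes a molar mass of 244.31 g/mol for biotin (a value not stated in this recipe), and the 10 ml culture volume is a hypothetical example.

```python
# Assumed molar mass of biotin (C10H16N2O3S); not stated in the recipe.
BIOTIN_MW_G_PER_MOL = 244.31

mass_mg = 12.2          # biotin weighed out, from the recipe
volume_ml = 50.0        # volume of medium it is dissolved in, from the recipe
dilution_factor = 20    # the stock is labeled 20X

# mg divided by g/mol gives mmol; mmol per ml times 1000 gives mmol per liter (mM).
stock_mm = (mass_mg / BIOTIN_MW_G_PER_MOL) / volume_ml * 1000.0
working_um = stock_mm * 1000.0 / dilution_factor

# Hypothetical example: stock volume needed for 10 ml of culture medium.
culture_ml = 10.0
stock_needed_ml = culture_ml / dilution_factor

print(f"Stock: {stock_mm:.3f} mM; working concentration: {working_um:.1f} uM")
print(f"Add {stock_needed_ml:.2f} ml of stock per {culture_ml:.0f} ml of medium")
```

With the assumed molar mass, the recipe gives a stock of approximately 1 mM, and a 20-fold dilution gives approximately the 50 μM working concentration used in the protocols.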

Lysis Buffer

50 mM Tris (pH 7.4)

500 mM NaCl

0.2% SDS (w/v)

Store at RT

1X protease inhibitor (add just before use)

1 mM DTT (add just before use)

Note: This lysis buffer composition differs from that described in the initial publication of BioID (Roux et al., 2012).

Wash buffer 1

2% SDS (w/v)

Store at RT

Wash buffer 2

0.1% (w/v) deoxycholic acid

1% (w/v) Triton X-100

1 mM EDTA

500 mM NaCl

50 mM Hepes (pH 7.5)

Deoxycholic acid stock solution must be protected from light.

Deoxycholic acid does not solubilize well under pH 7.1. Add Hepes last.

Store at RT

Wash buffer 3

0.5% (w/v) deoxycholic acid

0.5% (w/v) NP-40

1 mM EDTA

250 mM LiCl

Store at RT

BSA blocking buffer

1% bovine serum albumin, fraction V

0.2% (w/v) Triton X-100

Bring up to final volume with 1X PBS.

Store at 4°C

ABS blocking buffer

10% (v/v) adult bovine serum

1% (w/v) Triton X-100

Bring up to final volume with 1X PBS

Store at 4°C

Quenching solution

3% (v/v) sodium azide

4.5% (v/v) hydrogen peroxide

Bring up to volume with 1X PBS

Make this solution fresh with each use

The subcellular locations in which BioID™ has successfully been applied to date include the nucleus, cytoplasm, endoplasmic reticulum (ER) and mitochondrial matrix. This protocol does not provide specific details for identification of proteins by mass spectrometry. BioID™ experiments performed to date have utilized on-bead tryptic digestion to release peptides for analysis by 1D LC-MS/MS (Roux et al., 2012). Advantages of this approach include circumvention of the difficulty in removing biotinylated proteins/peptides from the streptavidin matrix without also removing the streptavidin itself. An additional benefit of this approach is elimination of the SDS-PAGE separation of proteins, a common source of keratin contamination and of mass-spectrometry interference due to residual SDS.

The expected outcome is the identification of proteins that interact with and/or are proximate to the BioID™ fusion protein. For a full-scale BioID™ pull-down the results will likely arrive as a list of proteins that were identified by mass-spectrometry. This list should contain not only the names of the proteins but also the peptide sequences and the number of times each peptide was identified. This provides some information as to the relative abundance of each protein in the sample. In parallel there will be results from the control cells that do not express a BioID™ fusion protein. This list usually contains the five naturally biotinylated carboxylases (pyruvate carboxylase, PC; methylcrotonoyl-CoA carboxylase subunit alpha, MCC1; acetyl-CoA carboxylase 1, ACACA; acetyl-CoA carboxylase 2, ACACB; propionyl-CoA carboxylase alpha chain, PCCA), limited contaminating keratins and ribosomal proteins, and a small number of histones. These can be subtracted from the mass-spectrometry results from the BioID™-fusion protein samples. After this subtraction, the remaining proteins are candidate interactors or proximate proteins that can be validated by other methods. It is cost-effective to coordinate experiments so that a single control is used for several BioID™ pull-downs. This is acceptable if the control pull-down is performed in parallel under the same conditions and in the same cellular background as the cells expressing BioID™ fusion proteins. BioID™ typically results in a relatively large list of candidates, from ~50 to 250, depending on the experiment. These can be ranked by abundance and/or based on what is already known about those proteins. It should be noted that abundance, or lack thereof, does not necessarily imply anything about biological relevance, and the user will have to decide how best to utilize the results.
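The control subtraction and abundance ranking described above can be sketched as follows. This is a minimal illustration, assuming each sample has already been reduced to a mapping from protein name to the number of times its peptides were identified. The function name `candidate_proteins`, the placeholder proteins `PROT_A`, `PROT_B` and `PROT_C`, and all counts are hypothetical and do not come from any experiment.

```python
def candidate_proteins(bait_counts, control_counts):
    """Drop every protein seen in the control, then rank the rest by count.

    Both arguments map protein name -> number of peptide identifications.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    remaining = {
        protein: count
        for protein, count in bait_counts.items()
        if protein not in control_counts
    }
    return sorted(remaining.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical control sample: the naturally biotinylated carboxylases.
control = {"PC": 40, "MCC1": 22, "ACACA": 35, "ACACB": 10, "PCCA": 18}

# Hypothetical BioID fusion-protein sample.
bait = {"PC": 38, "ACACA": 30, "PROT_A": 12, "PROT_B": 25, "PROT_C": 3}

if __name__ == "__main__":
    for protein, count in candidate_proteins(bait, control):
        print(protein, count)
```

Removing every protein that appears in the control is the simplest reading of the subtraction step. As noted above, a high rank indicates only relative abundance in the sample and does not by itself imply biological relevance.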

The process of generating the BioID™-fusion protein expression plasmid by PCR cloning and validation of the fusion protein by transient transfection can be accomplished in as little as a week. Generation and selection of stably expressing cells could take as little as a week if viral expression is utilized or 3-4 weeks if relying on random integration following transient transfection and selection. A large-scale BioID™ pull-down can be accomplished in under a week. This includes the time required for the generation of sufficient cell numbers (typically 4×10⁷ cells).

Identification of candidates by mass-spectrometry typically takes 1-2 weeks. Thus a standard BioID™ experiment can be accomplished in 1-2 months from start to finish.

LITERATURE CITED

Choi-Rhee, E., H. Schulman, and J.E. Cronan. 2004. Promiscuous protein biotinylation by Escherichia coli biotin protein ligase. Protein science : a publication of the Protein Society. 13:3043-50.

Cronan, J.E. 2005. Targeted and proximity-dependent promiscuous protein biotinylation by a mutant Escherichia coli biotin protein ligase. The Journal of nutritional biochemistry. 16:416-8.

Kwon, K., and D. Beckett. 2000. Function of a conserved sequence motif in biotin holoenzyme synthetases. Protein science : a publication of the Protein Society. 9:1530-9.

Roux, K.J., D.I. Kim, M. Raida, and B. Burke. 2012. A promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells. The Journal of cell biology.

van Steensel, B., and S. Henikoff. 2000. Identification of in vivo DNA targets of chromatin proteins using tethered dam methyltransferase. Nature biotechnology. 18:424-8.

Claims

I claim:

1. A method for identifying in vivo proximate proteins, comprising

(a) culturing recombinant cells comprising a recombinant nucleic acid capable of directing expression of a fusion protein comprising (i) a heterologous promiscuous biotin protein ligase (BPL), or functional equivalent thereof, and (ii) a bait polypeptide, wherein the culturing is carried out under conditions suitable for expression of the fusion protein in the eukaryotic cells; and

(b) identifying biotinylated proteins, wherein the biotinylated proteins are proteins present in the cells proximate to the bait polypeptide.

2. The method of claim 1, wherein the BPL, or functional equivalent thereof, comprises a protein with an amino acid sequence of general formula 1:

X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20-X21-X22-X23-X24-X25-X26-X27-X28-X29-X30-X31-X32 (SEQ ID NO: 13), wherein

X1 is selected from the group consisting of A, L, I, T, V, or is absent;

X2 is selected from the group consisting of C, L, V, H, A, or is absent;

X3 is selected from the group consisting of I, V, and L;

X4 is selected from the group consisting of A and G;

X5 is selected from the group consisting of E, D, R, T, N, V, and A;

X6 is selected from the group consisting of Y, R, K, E, and I;

X7 is Q;

X8 is selected from the group consisting of Q, T, N, F, V, and S;

X9 is selected from the group consisting of A, , M, S, Q, and E;

X10 is G;

X11 is selected from the group consisting of R and K;

X12 is G;

X13 is any residue other than R;

X14 is selected from the group consisting of R, L, W, G, and S;

X15 is selected from the group consisting of G, Q, P, and K;

X16 is selected from the group consisting of R and N;

X17 is selected from the group consisting of K, Q, V, E, T, M, and A;

X18 is W;

X19 is selected from the group consisting of F, L, Y, E, I, and V;

X22 is selected from the group consisting of F, E, A, R, Y, and V;

X23 is selected from the group consisting of G and A;

X24 is selected from the group consisting of V and C, or is absent;

X25 is selected from the group consisting of C and A, or is absent;

X26 is selected from the group consisting of A and L, or is absent;

X27 is selected from the group consisting of A, G, and S, or is absent;

X28 is selected from the group consisting of N, G, Q, T, and C;

X29 is selected from the group consisting of L, I, A, and F;

X30 is selected from the group consisting of Y, M, A, V, and L;

X31 is selected from the group consisting of L, G, V, I, and F; and

X32 is selected from the group consisting of S, T, and F.

3. The method of claim 2, wherein at least one of the following is true of the BPL of general formula 1 (SEQ ID NO: 17):

X4 is A;

X8 is T or S;

X13 is G;

X15 is G;

X19 is L, I, or V;

X20 is S;

X21 is P;

X23 is G;

X29 is L or I;

X30 is A, V, or L;

X31 is L, G, V, or I; and/or

X32 is S.

4. The method of claim 3, wherein X13 is G.

5. The method of any one of claims 1-4, wherein the BPL, or functional equivalent thereof, comprises an amino acid sequence according to SEQ ID NO: 14.

6. The method of claim 5, wherein the BPL comprises a G residue at position 523.

7. The method of any one of claims 1-6, wherein the BPL comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 1-4 and 6-12.

8. The method of any one of claims 1-7, wherein the recombinant nucleic acid is present in an expression vector.

9. The method of any one of claims 1-7, wherein the recombinant nucleic acid is integrated into cellular chromosomal DNA.

10. The method of any one of claims 1-9, wherein the culturing comprises adding biotin to culture medium that the recombinant cells are cultured in.

11. The method of any one of claims 1-10, wherein the culturing is carried out for between about 0.5 hours and about 96 hours.

12. The method of any one of claims 1-11, wherein the identifying the biotinylated proteins comprises isolating the biotinylated polypeptides.

13. A recombinant nucleic acid, comprising

(a) a first nucleic acid domain encoding a promiscuous biotin protein ligase (BPL); and

(b) a second nucleic acid domain encoding a bait polypeptide.

14. The recombinant nucleic acid of claim 13, wherein the first nucleic acid domain encodes a protein comprising an amino acid sequence of general formula 1:

X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20-X21-X22-X23-X24-X25-X26-X27-X28-X29-X30-X31-X32 (SEQ ID NO: 13), wherein

X1 is selected from the group consisting of A, L, I, T, V, or is absent;

X2 is selected from the group consisting of C, L, V, H, A, or is absent;

X3 is selected from the group consisting of I, V, and L;

X4 is selected from the group consisting of A and G;

X5 is selected from the group consisting of E, D, R, T, N, V, and A;

X6 is selected from the group consisting of Y, R, K, E, and I;

X7 is Q;

X8 is selected from the group consisting of Q, T, N, F, V, and S;

X9 is selected from the group consisting of A, , , S, Q, and E;

X10 is G;

X11 is selected from the group consisting of R and K;

X12 is G;

X13 is any residue other than R;

X14 is selected from the group consisting of R, L, W, G, and S;

X15 is selected from the group consisting of G, Q, P, and K;

X16 is selected

X17 is selected

X18 is W;

X19 is selected

X20 is selected

X21 is selected

X22 is selected

X23 is selected

X25 is selected

X26 is selected

X27 is selected

X28 is selected

X29 is selected

X30 is selected

X31 is selected

X32 is selected

.

15. The recombinant nucleic acid of claim 14, wherein X13 is G.

16. The recombinant nucleic acid of any one of claims 13-15, wherein the first nucleic acid domain encodes a BPL protein comprising an amino acid sequence according to SEQ ID NO: 14.

17. The recombinant nucleic acid of claim 16, wherein the first nucleic acid domain encodes a BPL protein comprising a G residue at position 523.

18. The recombinant nucleic acid of any one of claims 13-17, wherein the first nucleic acid domain encodes a BPL protein comprising an amino acid sequence selected from the group consisting of SEQ ID NO: 1-4 and 6-12.

19. An expression vector comprising the recombinant nucleic acid of any one of claims 13-18 operatively linked to a promoter.

20. A recombinant host cell comprising the recombinant nucleic acid of any one of claims 13-18 or the expression vector of claim 19.

21. The recombinant host cell of claim 20, wherein the recombinant host cell is an embryonic stem cell.

22. A transgenic, non-human organism comprising a recombinant host cell of claim 20 or 21.

23. The transgenic, non-human organism of claim 22, wherein the organism is a mammal.

24. The transgenic, non-human organism of claim 23, wherein the mammal is a mouse.

25. A recombinant fusion protein, comprising

(a) a first domain encoding a biotin protein ligase (BPL); and

(b) a second domain encoding a bait polypeptide.

26. The recombinant fusion protein of claim 25, wherein the first domain comprises an amino acid sequence of general formula 1:

X1-X2-X3-X4-X5-X6-X7-X8-X9-X10-X11-X12-X13-X14-X15-X16-X17-X18-X19-X20-X21-X22-X23-X24-X25-X26-X27-X28-X29-X30-X31-X32 (SEQ ID NO: 13), wherein

X1 is selected from the group consisting of A, L, I, T, V, or is absent;

X2 is selected from the group consisting of C, L, V, H, A, or is absent;

X3 is selected from the group consisting of I, V, and L;

X4 is selected from the group consisting of A and G;

X5 is selected from the group consisting of E, D, R, T, N, V, and A;

X6 is selected from the group consisting of Y, R, K, E, and I;

X7 is Q;

X8 is selected from the group consisting of Q, T, N, F, V, and S;

X9 is selected from the group consisting of A, , N, S, Q, and E;

X10 is G;

X11 is selected from the group consisting of R and K;

X12 is G;

X13 is any residue other than R;

X14 is selected from the group consisting of R, L, W, G, and S;

X15 is selected from the group consisting of G, Q, P, and K;

X16 is selected from the group consisting of R and N;

X17 is selected from the group consisting of K, Q, V, E, T, M, and A;

X18 is W;

X19 is selected from the group consisting of F, L, Y, E, I, and V;

X20 is selected from the group consisting of S, , and N;

X21 is selected from the group consisting of P, Q, and D;

X22 is selected from the group consisting of F, E, A, R, Y, and V;

X23 is selected from the group consisting of G and A;

X24 is selected from the group consisting of V and C, or is absent;

X25 is selected from the group consisting of C and A, or is absent;

X26 is selected from the group consisting of A and L, or is absent;

X27 is selected from the group consisting of A, G, and S, or is absent;

X28 is selected from the group consisting of N, G, Q, T, and C;

X29 is selected from the group consisting of L, I, A, and P;

X30 is selected from the group consisting of Y, M, A, V, and L;

X31 is selected from the group consisting of L, G, V, I, and F; and

X32 is selected from the group consisting of S, T, and F.

27. The recombinant fusion protein of claim 26, wherein X13 is G.

28. The recombinant fusion protein of any one of claims 25-27, wherein the first domain comprises an amino acid sequence according to SEQ ID NO: 14.

29. The recombinant fusion protein of claim 28, wherein the first domain comprises a G residue at position 523.

30. The recombinant fusion protein of any one of claims 25-29, wherein the first domain comprises an amino acid sequence selected from the group consisting of SEQ ID NO: 1-4 and 6-12.