WO2018104962A1

WO2018104962A1 - Hemiaminal-tag for protein labeling and purification

Info

Publication number: WO2018104962A1
Application number: PCT/IN2017/050570
Authority: WO
Inventors: Vishal Rai; Landa PURUSHOTTAM
Original assignee: Indian Institute Of Science Education And Research Bhopal
Priority date: 2016-12-07
Filing date: 2017-12-05
Publication date: 2018-06-14

Abstract

The invention pertains to the synthesis, isolation, and characterization of hemiaminal for selective labeling of peptides, proteins, antibodies, and organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ group over -C(=O)CHRNH₂ group (where R≠H). The invention also pertains to the method of single-site immobilization of proteins through N-terminus Gly on solid phase. The invention includes late-stage tagging of N-terminus Gly with an affinity tag, ¹⁹F NMR probe, and a fluorophore and a method for metal-free protein purification and isolation of analytically pure proteins.

Description

HEMIAMINAL-TAG FOR PROTEIN LABELING AND PURIFICATION

FIELD OF INVENTION:

The invention is in the field of biotechnology with specific reference to single-site labeling, late-stage tagging, and purification of proteins.

BACKGROUND OF THE INVENTION:

Single-site labeling of proteins facilitates insight into biological processes through biophysical probes, imaging probes, and toxins. Such labeling emerges through (a) pre-engineered protein equipped with unnatural amino acids, (b) an amino acid sequence recognized by enzymes or single residues such as cysteine, (c) chemoselective labeling of low-frequency residues in native proteins, and (d) chemical labeling of side chain functionalities enabled by selective ligand-protein interaction.

Attempts have been made to label the N^ε-NH₂ group of lysine and the N-terminus α-amine (N^α-NH₂) in proteins, which can serve as reactivity hotspots. The biomimetic transformation of the N-terminus residue renders a carbonyl group for subsequent bio-orthogonal reactions. The labeling of N^α-NH₂ by an electrophile is achieved through chemoselective transformations, and a latent electrophile generated in the form of an imine from N^α-NH₂ and an aldehyde undergoes subsequent nucleophilic addition. In general, the site-selectivity is compromised very early in the chemical transformation in the presence of multiple primary amines as nucleophiles or their latent electrophiles. However, the nucleophilic attack of the backbone amide on the imine generated from N^α-NH₂ and an aldehyde can render an isolable imidazolidinone with high site-specificity. But none of the methodologies mentioned above distinguish one N-terminus residue from the other. Also, N-terminal Cys containing proteins can render unique reactivity to form thiazolidine with an aldehyde as well as with 2-cyanobenzothiazole.

The N-terminal amino acid of proteins varies, and it is a challenge to identify unique reactivity for the other N-terminus residues. In particular, selective targeting of N-terminus Gly poses a prominent complexity as there is no assistance available from the side-chain residue. Hence, tools are needed for such labeling of N-terminus glycine. Hemiaminals (synonym: carbinolamines) are formed spontaneously in the reaction between an amine and an aldehyde. Their subsequent dehydration results in an imine or iminium that serves as the key intermediate in several reactions. Owing to this inherent instability, a native hemiaminal is difficult to isolate without protecting groups for the hydroxyl group, the secondary amine, or both. This has limited the isolation of stable hemiaminals from primary amines and aldehydes. Thus, there is a need for techniques that can stabilize the hemiaminal in physiological conditions to render unique tools for protein labeling.

OBJECT OF THE INVENTION

The object of the invention is the synthesis, isolation, and characterization of hemiaminal for selective labeling of peptides, proteins, and organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ group over -C(=O)CHRNH₂ group (where R≠H).

Another object of the invention is to design an appropriate intramolecular hydrogen bond promoter for the formation of stable hemiaminal.

Another object of the invention is the labeling of N-terminus Gly in proteins with stable hemiaminals with remarkable efficiency and selectivity, where the -C(=O)CH₂NH₂ group is differentiated from the -C(=O)CHRNH₂ group (where R is not equal to H).

Another object of the invention is to enable late-stage tagging of N-terminus Gly with an affinity tag, ¹⁹F NMR probe, and a fluorophore.

Another object of the invention is to develop a method for metal-free protein purification and isolation of analytically pure proteins.

BRIEF DESCRIPTION OF THE DRAWINGS AND FIGURES:

Figure 1 (Scheme a) outlines the reaction of primary amine and aldehyde to give stable hemiaminal.

Figure 1 (Scheme b) shows the principles regulating the stability and selectivity of hemiaminal.

Figure 2 (Scheme a) illustrates the design of aldehyde with hydrogen bond promoters for stable hemiaminal formation with glycine.

Figure 2 (Scheme b) outlines the selective formation of hemiaminal with amide derivative of Gly (1a).

Figure 2 (Scheme c) depicts the H-bond that locks the hemiaminal in conformation disfavored for dehydration in unsubstituted amino acids but not for substituted amino acids.

Figure 3 depicts LC-MS spectrum for 3a.

Figure 4 (Scheme a) outlines N-terminus Gly labeling of proteins.

Figure 4 (Scheme b) outlines the extension of methodology for labeling insulin in a mixture of proteins.

Figure 5 depicts (a) ESI-MS spectra for insulin 6d and mono-labeled insulin 7d, (b) MS-MS spectrum of labeled GLSDGEWQQVLNVWGK (G1-K16). Site of modification is N-terminus glycine (G1) in labeled insulin 7d.

Figures 6A and 6B depict (a) ESI-MS for insulin 6d and mono-labeled insulin after oxime formation, (b) Peptide mapping: MS spectra of mono-labeled insulin after DTT reduction, (c) MS-MS spectrum of labeled GIVEQCCTSICSLYQLENYCN (G1-N21). Site of modification is N-terminus glycine (G1, chain A) in labeled insulin.

Figure 7 illustrates the late-stage tagging and isolation of analytically pure tagged proteins.

Figure 8 depicts ESI-MS spectra (a) for insulin 6d and mono-labeled insulin 11a after oxime formation, (b) Peptide mapping of ¹⁹F-NMR probe tagged insulin 11a after DTT reduction; ¹⁹F-NMR probe tag is present in chain A of labeled insulin 11a.

Figure 9 depicts ESI-MS spectra (a) for insulin 6d (1 equiv.) and mono-labeled insulin 11b after oxime formation, (b) Peptide mapping of biotin tagged insulin 11b after DTT reduction; biotin tag is present in chain A of labeled insulin 11b.

Figure 10 depicts ESI-MS spectra (a) for insulin 6d (1 equiv.) and mono-labeled insulin 11c after oxime formation, (b) Peptide mapping of coumarin tagged insulin 11c after DTT reduction; coumarin tag is presented in chain A of labeled insulin 11c.

Figure 11 illustrates late-stage bio-orthogonal reactions that allow installation of an NMR tag, affinity tag, and a fluorophore. The protein 6d (20 μM) is vortexed with reagent 2i (10 mM) for 24 h. Subsequently, unreacted reagent 2i is removed from the reaction mixture and 9 is vortexed with derivatives of O-hydroxylamine (11a-11c, 10 mM) for 3-6 h.

Figures 12A and 12B illustrate purification of single-site tagged insulin from the reaction mixture. (a) Step 1: Hemiaminal (9) formation. Step 2: Single-site immobilization of the labeled insulin 9 on hydrazide resin. Step 3: Transoximization of 13 with derivatives of O-hydroxylamine (10a-10c) releases the tagged labeled protein (11a-11c) in analytically pure form. ESI-MS spectra of (b) purified insulin tagged with O-hydroxylamine, (c) ¹⁹F NMR probe tagged insulin 11a, (d) biotin tagged insulin 11b, and (e) coumarin tagged insulin 11c.

Figure 13 (Scheme a) outlines insulin bioactivity assay using western blot analysis of phospho-Akt (pAkt) and GAPDH in HEK293T cells lysates.

Figure 13 (Scheme b) illustrates quantification of pAkt signal relative to GAPDH.

Figure 13 (Scheme c) shows uptake of tagged insulin (green) and a mixture of untagged and tagged insulin in cells. Blue signal is for chromatin. (Scheme d) Activation of IR signaling and pAkt (red) accumulation in HEK293T cells after insulin treatment (scale bar: 10 μm).

Figure 14 outlines the synthesis of hemiaminal precursor-resin conjugate.

DETAILED DESCRIPTION OF THE INVENTION:

Accordingly, the present invention is for synthesis, isolation, and characterization of hemiaminal for selective labeling of peptides, proteins, and organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ group over -C(=O)CHRNH₂ group (where R≠H).

The stable hemiaminal formation is with native proteins or recombinant proteins or any peptide with Gly at the N-terminus, i.e., selective hemiaminal formation with peptides, proteins, and organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ group over -C(=O)CHRNH₂ group (where R≠H).

The invention is a method for single-site modification of native proteins or recombinant proteins, antibodies or any peptide with Gly at the N-terminus, i.e., selective hemiaminal formation with peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ group over -C(=O)CHRNH₂ group (where R≠H), where reactivity parameters are regulated by the chemical agents for single-site bench-stable hemiaminal formation.

In one embodiment, the stable hemiaminals synthesized are used for the single-site modification of N-terminus α-NH₂ of Gly without interference from Lys ε-NH₂.

Stable hemiaminal with Gly

The ortho-substituted benzaldehyde garners the advantage of geometrical constraint imposed by the aromatic ring. To assess the stability of the hemiaminal, a hydrogen bond promoter in the form of a methoxy group was placed ortho to the aldehyde (Figure 1). The reaction of Gly amide 1a results in 0% stable hemiaminal with reagent 2a, 27% desired product with reagent 2b, 33% product with reagent 2c, 63% conversion with reagent 2d, and >95% conversion with reagent 2e. The method operates under physiological conditions. By the method of the invention, the stability of the hemiaminal is derived from co-operative intramolecular bis-H-bonding. These interactions ensure a high potential energy barrier for imine formation and thereby prevent the degradation of the hemiaminal into the imine. Further, such stabilization survives the challenges imposed by the physiological reaction conditions. The hemiaminals are stable in water even though water molecules can substantially lower the free energy barrier to imine formation.

The single-site modification is also a chemoselective modification. The single-site modification is also a site-specific modification.

The single-site modification also enables late-stage tagging with the probe of interest. The single-site modification also enables purification of proteins from mixture of proteins.

The success of chemical agent for stabilization of hemiaminal relies on appropriately designed hydrogen bond promoters.

In one embodiment the chemical agent for the selective stabilization of -C(=O)CH₂NH₂ group in the form of hemiaminal is selected from the aldehyde reagents of general structure (I and II).

These chemical agents offer a method for modification of proteins and can be extended to any functionally similar molecule.

In one embodiment the chemical agents for purification of proteins can be reagents of general structure (III) that includes immobilized I/II on a resin.

R¹ = natural and un-natural amino acid derivatives with -CH₂NH₂ or -C(=O)CH₂NH₂ group

peptides, proteins, organic fragments with -C(=O)CH₂NH₂ group at the termini

For R¹⁻⁹: n, m, l = 1-10. The R¹ group is selected as stated in the figure or can be selected from aryl; heteroaryl; heterocycle; cycloalkyl; alkyl; lower alkyl; and alkenyl. R⁶ is selected as outlined in the figure. R²⁻⁵ and R⁷⁻⁹ are independently selected from H; alkyl; lower alkyl; cycloalkyl; aryl; heteroaryl; alkenyl; heterocycle; halides; nitro; -C(O)OR*, wherein R* is selected from H, alkyl, cycloalkyl and aryl; -C(O)NR**R***, wherein R** and R*** are independently selected from H, alkyl, cycloalkyl and aryl; -CH₂C(O)R_a, wherein R_a is selected from -OH, lower alkyl, cycloalkyl, aryl, -lower alkyl-aryl, -cycloalkyl-aryl, or -NR_bR_c, where R_b and R_c are independently selected from H, lower alkyl, cycloalkyl, aryl or -lower alkyl-aryl; -C(O)R_d, wherein R_d is selected from lower alkyl, cycloalkyl, aryl or -lower alkyl-aryl; or -lower alkyl-OR_e, wherein R_e is a suitable protecting group or OH group. The R¹ group can also be selected from an amino acid, small peptide, large peptide, a protein, an antibody, their unnatural derivatives or other biomolecules bearing a -CH₂NH₂ group. Here a small peptide is a 2-mer to 10-mer peptide, a large peptide is an 11-mer to 30-mer peptide, and 31-mer or larger amino acid sequences are considered as proteins. All the Rⁿ groups are optionally substituted at one or more substitutable positions with one or more suitable substituents.
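The size ranges stated above for R¹ can be expressed as a small lookup. The following is a minimal Python sketch, not part of the disclosure; the function name `classify_by_length` is hypothetical, and treating a single residue as an "amino acid" is an assumption drawn from the list of R¹ options.

```python
def classify_by_length(n_residues: int) -> str:
    """Classify a sequence by residue count using the ranges stated in the text.

    Assumption: a single residue is reported as "amino acid".
    """
    if n_residues < 1:
        raise ValueError("residue count must be at least 1")
    if n_residues == 1:
        return "amino acid"
    if n_residues <= 10:      # 2-mer to 10-mer
        return "small peptide"
    if n_residues <= 30:      # 11-mer to 30-mer
        return "large peptide"
    return "protein"          # 31-mer or larger


# Boundary values of each range
for n in (1, 2, 10, 11, 30, 31):
    print(n, classify_by_length(n))
```

Under these ranges, the 21-residue insulin chain A sequence quoted in the figure descriptions would fall in the "large peptide" class when considered on its own.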

The term "suitable substituent" is meant to include independently H; hydroxyl; cyano; alkyl, such as lower alkyl, such as methyl, ethyl, propyl, n-butyl, t-butyl, hexyl and the like; alkoxy, such as lower alkoxy such as methoxy, ethoxy, and the like; aryloxy, such as phenoxy and the like; vinyl; alkenyl, such as hexenyl and the like; alkynyl; formyl; haloalkyl, such as lower haloalkyl which includes CF₃, CCl₃ and the like; halide; aryl, such as phenyl and naphthyl; heteroaryl, such as thienyl and furanyl and the like; amide such as C(O)NR**R***, where R** and R*** are independently selected from lower alkyl, aryl or benzyl, and the like; acyl, such as C(O)-C₆H₅, and the like; ester such as -C(O)OCH₃ and the like; ethers and thioethers, such as O-Bn and the like; thioalkoxy; phosphino; and -NR_bR_c, where R_b and R_c are independently selected from lower alkyl, aryl or benzyl, and the like. It is to be understood that a suitable substituent as used in the context of the present invention is meant to denote a substituent that does not interfere with the formation of the desired product by the processes of the present invention.

As used in the context of the present invention, the term "lower alkyl", either alone or in combination with another substituent, means an acyclic, straight or branched chain alkyl substituent containing from one to six carbons and includes, for example, methyl, ethyl, 1-methylethyl, 1-methylpropyl, 2-methylpropyl, and the like. A similar use of the term is to be understood for "lower alkoxy", "lower thioalkyl", "lower alkenyl" and the like in respect of the number of carbon atoms. For example, "lower alkoxy" as used herein includes methoxy, ethoxy, t-butoxy.

The term "alkyl" encompasses lower alkyl, and also includes alkyl groups having more than six carbon atoms, such as, for example, acyclic, straight or branched chain alkyl substituents having seven to ten carbon atoms.

The term "aryl" as used herein, either alone or in combination with another substituent, means an aromatic monocyclic system or an aromatic polycyclic system. For example, the term "aryl" includes a phenyl or a naphthyl ring, and may also include larger aromatic polycyclic systems, such as fluorescent (e.g., anthracene) or radioactive labels and their derivatives.

The term "heteroaryl" as used herein, either alone or in combination with another substituent means a 5, 6, or 7-membered unsaturated heterocycle containing from one to 4 heteroatoms selected from nitrogen, oxygen, and sulphur and which form an aromatic system. The term "heteroaryl" also includes a polycyclic aromatic system comprising a 5, 6, or 7-membered unsaturated heterocycle containing from one to 4 heteroatoms selected from nitrogen, oxygen, and sulphur.

The term "cycloalkyl" as used herein, either alone or in combination with another substituent, means a cycloalkyl substituent that includes for example, but is not limited to, cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl and cycloheptyl. The term also involves "cycloalkyl-alkyl-" that means an alkyl radical to which a cycloalkyl radical is directly linked; and includes, but is not limited to, cyclopropylmethyl, cyclobutylmethyl, cyclopentylmethyl, 1-cyclopentylethyl, 2-cyclopentylethyl, cyclohexylmethyl, 1-cyclohexylethyl and 2-cyclohexylethyl. A similar use of the "alkyl" or "lower alkyl" terms is to be understood for aryl-alkyl-, aryl-lower alkyl- (e.g., benzyl), -lower alkyl-alkenyl (e.g., allyl), heteroaryl-alkyl-, and the like as used herein. For example, the term "aryl-alkyl-" means an alkyl radical, to which an aryl is bonded. Examples of aryl-alkyl- include, but are not limited to, benzyl (phenylmethyl), 1-phenylethyl, 2-phenylethyl and phenylpropyl.

As used herein, the term "heterocycle", either alone or in combination with another radical, means a monovalent radical derived by removal of a hydrogen from a three- to seven-membered saturated or unsaturated (including aromatic) heterocycle containing from one to four heteroatoms selected from nitrogen, oxygen and sulfur. Examples of such heterocycles include, but are not limited to, pyrrolidine, tetrahydrofuran, thiazolidine, pyrrole, thiophene, hydantoin, diazepine, imidazole, isoxazole, thiazole, tetrazole, piperidine, piperazine, homopiperidine, homopiperazine, 1,4-dioxane, 4-morpholine, 4-thiomorpholine, pyridine, pyridine-N-oxide or pyrimidine, and the like.

The term "alkenyl", as used herein, either alone or in combination with another radical, is intended to mean an unsaturated, acyclic straight chain radical containing two or more carbon atoms, at least two of which are bonded to each other by a double bond. Examples of such radicals include, but are not limited to, ethenyl (vinyl), 1-propenyl, 2-propenyl, and 1-butenyl.

The term "alkynyl", as used herein is intended to mean an unsaturated, acyclic straight chain radical containing two or more carbon atoms, at least two of which are bonded to each other by a triple bond. Examples of such radicals include, but are not limited to, ethynyl, 1-propynyl, 2-propynyl, and 1-butynyl.

The term "alkoxy" as used herein, either alone or in combination with another radical, means the radical -O-(C₁₋ₓ)alkyl wherein alkyl is as defined above containing 1 or more carbon atoms, and includes for example methoxy, ethoxy, propoxy, 1-methylethoxy, butoxy and 1,1-dimethylethoxy. Where x is 1 to 6, the term "lower alkoxy" applies, as noted above, whereas the term "alkoxy" encompasses "lower alkoxy" as well as alkoxy groups where x is greater than 6 (for example, x = 7 to 10). The term "aryloxy" as used herein alone or in combination with another radical means -O-aryl, wherein aryl is defined as noted above.

Typically, the structure of the chemical agent is as shown in the structure drawing (not reproduced here), in which the annotated position enables late-stage modification.

A representative example of such a hemiaminal forming reagent is 2-formylphenoxyacetate (FPA).

The chemical agents of the invention allow selective stabilization of the -C(=O)CH₂NH₂ group in the form of hemiaminal, whereas the intermediate derived from a -C(=O)CHRNH₂ group (where R≠H) rapidly converts to an imine. Further, the bis-H-bond stabilized hemiaminal of the invention is stable in physiological conditions even in the presence of several competing H-bond acceptors and donors in the protein. The bis-H-bond stabilized hemiaminal of the invention is selected for single-site modification of a residue having a -C(=O)CH₂NH₂ or -CH₂NH₂ group. The examples include a Gly residue at the N-terminus of a protein, -C(=O)CH₂NH₂ or -CH₂NH₂ bearing groups, peptides, and other biomolecules.

The FPA-sepharose conjugate has been used to purify the N-terminus Gly protein from all the other proteins. The wide range potential has been demonstrated by purifying a protein expressed through recombinant protein expression in E. coli.

The stable hemiaminal of the invention is used for labeling or making several probes including biophysical probes, protein-based materials, and protein-based therapeutics. The hemiaminal is selective in the stabilization of peptides, proteins, and organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ group over -C(=O)CHRNH₂ group (where R≠H).

The hemiaminal is for the stabilization of recombinant proteins modified with N-terminal glycine. The percent stabilization of the hemiaminal of peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ group is in the range 25%-100%.

The invention also pertains to the method of obtaining single-site labeled peptides, proteins, and organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ with the hemiaminal, comprising: reacting the peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ with the hydrogen bond promoter; removing the unreacted hydrogen bond promoters; and purification to obtain the single-site labeled proteins and peptides with the stable hemiaminal formed.

Another embodiment is for a method of obtaining single-site labeled proteins from a mixture of proteins with -C(=O)CH₂NH₂ and -C(=O)CHRNH₂ group (where R≠H) with the hemiaminal, comprising: reacting the mixture of proteins with the hydrogen bond promoter;

removing the unreacted hydrogen bond promoters;

removing the unreacted proteins;

and purification to obtain the single site labeled proteins with the stable hemiaminal formed from the protein mixture.

In addition, the invention is for a method of obtaining N-terminal glycine introduced recombinant proteins for single site labeled proteins with the hemiaminal.

In addition, the invention is for a method of obtaining single-site (N-terminal glycine) ordered immobilization of proteins on solid phase.

Another embodiment is for a method of obtaining hemiaminal resin conjugate with the hemiaminal, comprising of:

synthesis of the hydrogen bond promoter with functional groups required for its installation on the resin;

installation of the hydrogen bond promoter on the resin; reacting the peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ with the hydrogen bond promoter immobilized on resin.

The stable hemiaminal tagged molecules of the invention are selected from proteins conjugated to drugs, polymer chains, glycosyl groups, chromophores, and biohybrid materials.

Further, the Gly-hemiaminal tag is used in metal-free protein purification. The technology provides purification protocol complementary to widely used immobilized metal affinity chromatography (IMAC) through a His-tag. Metal leaching has been one of the challenges associated with the IMAC.

[A] Stable hemiaminal with Gly

The ortho-substituted benzaldehyde garners the advantage of geometrical constraint imposed by the aromatic ring. To assess the stability of the hemiaminal, a hydrogen bond promoter in the form of a methoxy group was placed ortho to the aldehyde (Figure 1). The reaction of Gly amide 1a results in 0% stable hemiaminal with reagent 2a, 27% desired product with reagent 2b, 33% product with reagent 2c, 63% conversion with reagent 2d, and >95% conversion with reagent 2e. These results substantiate our hypothesis.

[B] Selective hemiaminal formation with the unsubstituted amino acid

We also observed that the hemiaminal formed by Ala 1b underwent spontaneous dehydration to equilibrate with the imine, contrary to Gly 1a. The appropriate alignment of N-H···O is essential for the dehydration step. Under the experimental conditions, the hemiaminal 3a exhibits remarkable stability even under silica-gel chromatographic conditions. All the other proteinogenic amino acids (1c-1t) behave similarly to Ala 1b.

[C] Single-site hemiaminal formation with proteins

The translation of hemiaminal chemistry to proteins was checked with melittin 6a. The reaction of reagent 2e with 6a led to N^α-NH₂ (Gly) mono-labeled melittin 7a (52% conversion). After that, myoglobin 6b was selected and vortexed with aldehyde 2e. The N^α-NH₂ (Gly) mono-labeled myoglobin 7b is formed exclusively in this case (Figure 4a). Peptide mapping of the enzymatic digest and MS-MS confirmed the site of labeling at N-terminus Gly. The installation of Gly at the N-terminus of recombinant proteins is convenient and promises a broad reach for the hemiaminal labeling platform. In bacteria, the N-terminal Met is excised from proteins, exposing the penultimate amino acid. This removal is highly efficient when Gly is the second residue. Also, recombinant proteins expressed with a recognition sequence for PreScission protease render the desired N-terminus after proteolytic digestion. For recombinant proteins, the latter route was adopted, which generated small ubiquitin-like modifier (SUMO) protein 6c with a Gly residue introduced at the N-terminus. The labeling of 6c with 2e resulted in N-terminus mono-labeled SUMO (7c, 43% conversion). Next, insulin 6d (Figures 5 and 6), a protein containing two N-termini, was selected. The N^α-NH₂ (Phe) of chain B is established to exhibit the highest reactivity amongst primary amines. When insulin 6d was vortexed with reagent 2e, it resulted in the mono-labeled N^α-NH₂ (Gly) modification in chain A of insulin 7d in 71% conversion. Next, a mixture of proteins was selected to examine the capability of the methodology to label a single site in a single protein selectively. The protein mixture comprised insulin (6d), aprotinin (6e), ubiquitin (6f), cytochrome C (6g), lysozyme C (6h), β-lactoglobulin (6i), and chymotrypsinogen A (6j), each having diverse chemical composition at the N-terminus (Figure 4). The labeling experiment with reagent 2e resulted in exclusive labeling of N-terminus Gly bearing insulin 6d to render mono-labeled product 7d (42% conversion). The methodology exhibits remarkable selectivity as none of the other residues in any protein interfere or participate in the labeling.
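The selection criterion described above (a chain is a candidate for hemiaminal labeling only if its N-terminal residue is Gly) can be sketched in Python as a simple sequence filter. This is an illustrative aid only and not a model of the chemistry. The insulin chain A sequence is the one quoted in the figure descriptions; the chain B entry is an illustrative four-residue stub (the text states only that its N-terminus is Phe), and the Ala-start peptide is hypothetical.

```python
def n_terminal_residue(sequence: str) -> str:
    """Return the one-letter code of the N-terminal residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    return seq[0]


def has_n_terminal_gly(sequence: str) -> bool:
    """True when the chain starts with Gly (one-letter code G)."""
    return n_terminal_residue(sequence) == "G"


chains = {
    # Sequence quoted in the peptide-mapping figure description (G1-N21)
    "insulin chain A": "GIVEQCCTSICSLYQLENYCN",
    # Illustrative stub: only the N-terminal Phe is stated in the text
    "insulin chain B (N-terminal stub)": "FVNQ",
    # Hypothetical peptide standing in for any non-Gly N-terminus
    "hypothetical Ala-start peptide": "AKLT",
}

candidates = [name for name, seq in chains.items() if has_n_terminal_gly(seq)]
print(candidates)
```

Only the chain A entry passes the filter, consistent with the reported labeling of insulin at chain A rather than at the N-terminal Phe of chain B.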

[D] Preparation of analytically pure tagged proteins

The methodology was extended for its compatibility with (a) late-stage installation of tags and (b) purification of tagged proteins. The tag could be pre-installed in the reagent for hemiaminal formation; however, late-stage installation of the tag is also convenient and provides a downstream advantage in the purification of the final adduct. Further, a systematic redesign of the reagent 2e into symmetric bis-aldehyde (2i) with the orthogonal functional group for late-stage modification was attempted. Initially, the reaction of insulin with bis-aldehyde 2i resulted in excellent (88%) conversion. The efficiency of late-stage installation with benzyloxyamine was examined and the product characterization involved disulfide reduction, peptide mapping, and MS-MS. The potential of this transformation was shown with derivatives of O-hydroxylamine such as ¹⁹F NMR probe 11a, biotin 11b, and a fluorophore 11c (the overall conversions over the two steps were 85%, 87%, and 87% respectively) while retaining the selectivity. ¹⁹F NMR probe attached insulin 11a shows a sharp NMR signal at -62.80 ppm [TFA (20 mM), internal standard at -75.45 ppm] (Figure 8). The incubated mixture of biotinylated insulin 11b and streptavidin (Stv-n) renders a sharp band at ~70 kDa for the Stv-11b complex (Figure 9). The coumarin attached insulin 11c results in a fluorescence emission at 428 nm (excitation at 333 nm) (Figure 10).

A reliable purification protocol with the hemiaminal chemistry is developed. The mono-labeled proteins enabled by hemiaminal chemistry were purified using hydrazide activated resin. Initially, the labeled protein (9) was reacted with hydrazide functionalized resin. This step exhibits excellent efficiency and results in >95% single-site immobilization of the labeled protein. Also, the unreacted protein was recovered with >95% efficiency and recycled as it demonstrated no structural perturbations. Finally, the immobilized protein was released from the resin with O-hydroxylamine through transoximization. The centrifugal concentration results in the analytically pure single-site tagged protein. We recovered and reused the hydrazide activated resin in multiple cycles. The single-site immobilized labeled insulin rendered efficient parallel installation of ¹⁹F NMR probe, biotin, and fluorophore. The protocol is highly convenient for single-site labeling, purification, tagging, and purification. Notably, the overall isolated yields (three steps, two purifications) of analytically pure tagged insulin 11a, 11b, and 11c are 70% (88% brsm), 75% (90% brsm), and 77% (91% brsm) respectively (Figure 13).
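The relationship between the isolated yields and the brsm (based on recovered starting material) yields reported above can be illustrated with a short Python calculation. This is a sketch under the usual convention, assumed here rather than stated in the text, that brsm yield equals isolated yield divided by the fraction of starting material consumed; the function name `consumed_fraction` is hypothetical.

```python
def consumed_fraction(isolated_yield_pct: float, brsm_yield_pct: float) -> float:
    """Fraction of starting material consumed, implied by the two yields.

    Assumes: brsm yield = isolated yield / fraction of starting material consumed.
    """
    if not 0 < isolated_yield_pct <= brsm_yield_pct <= 100:
        raise ValueError("expected 0 < isolated yield <= brsm yield <= 100")
    return isolated_yield_pct / brsm_yield_pct


# (isolated %, brsm %) as reported for the tagged insulins
reported = {"11a": (70, 88), "11b": (75, 90), "11c": (77, 91)}

for compound, (isolated, brsm) in reported.items():
    consumed = consumed_fraction(isolated, brsm)
    print(f"{compound}: about {consumed:.0%} consumed, {1 - consumed:.0%} recovered")
```

Under this convention, the reported pairs imply that roughly four fifths of the insulin was consumed in each case, with the remainder recovered.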

[E] Tagging of Insulin and its bioactivity assay

The bioactivity of the N-terminal hemiaminal-tagged proteins was examined. The circular dichroism of coumarin tagged insulin (11c) confirmed the conservation of structure. The activity was examined in cell-based assays using coumarin tagged insulin through its ability to activate insulin receptor (IR) mediated signaling. Activation of IR mediated signaling was assessed by an increase in levels of phospho-Akt (pSer-473) and subsequent uptake of insulin inside the cells. The former is detected by phospho-Akt specific antibodies on western blotting and immunofluorescence and the latter through its fluorophore signal. Cells treated with the tagged (11c) or untagged insulin (6d) render enhanced phospho-Akt reactivity relative to loading control protein (GAPDH). Here, the untreated (mock) cells exhibit basal phospho-Akt reactivity. All the outlined observations unambiguously establish that hemiaminal-tagging of N^α-NH₂ (Gly) in insulin results in no adverse effect on its bioactivity. The tagged insulin (11c) is as competent as untagged insulin (6d) in activating the insulin receptor signaling (Figure 14).

[F] Synthesis of resin conjugate and purification of proteins using hemiaminal platform

One of the applications involves "metal-free protein purification" using "Gly-tag". The technology provides purification protocol complementary to widely used immobilized metal affinity chromatography (IMAC) through a His-tag. The technique is used for protein immobilization, detection, fractionation, and proteomics. Metal leaching has been one of the challenges associated with IMAC. However, the absence of efficient small tag metal-free alternatives has restricted the users to this technique. This application is enabled by the retro-hemiaminal process and access to resin conjugates.

Retro-hemiaminal process: Next, the chemical platform for protein labeling was extended to protein purification. To enable such a technology, it was essential to develop a protocol for dissociating the hemiaminal back into its precursors. The desired process is required to be mild and leave the structure and function of proteins unperturbed. After optimization, it was identified that if the hemiaminal is incubated in glycine buffer at pH 6.0, >95% dissociation can be achieved within 12-48 hours.

Resin conjugate: With the protocols for hemiaminal formation and its dissociation in hand, the hemiaminal precursor-resin conjugate (Figure 15) was synthesized. The screening of protein mixture with the hemiaminal precursor-resin conjugate established the convenient extension of the chemical platform to a solid phase for protein purification.

EXAMPLES:

The following examples are for the purpose of illustration of the invention and are not intended in any way to limit the scope of the invention.

Example 1: General Procedures

Synthesis of hydrogen bond promoters

a) Synthesis of ethyl 2-(2-formylphenoxy) acetate

Procedure: In a 25 ml round bottom flask, 2-hydroxybenzaldehyde (366 mg, 3 mmol), ethyl 2-bromoacetate (1 g, 6 mmol) and K₂CO₃ (828 mg, 6 mmol) were combined in acetone (6 ml). The reaction mixture was allowed to reflux for 6 h. The reaction was monitored using thin layer chromatography and upon completion, the reaction mixture was filtered to remove potassium carbonate. The solution was concentrated under vacuum and the product was purified using flash column chromatography (ethyl acetate:n-hexane 2:98) to afford ethyl 2-(2-formylphenoxy)acetate (77% yield, 480 mg). TLC (ethyl acetate:n-hexane 10:90, Rf 0.52), ¹H NMR (500 MHz, CDCl₃) δ 10.56 (s, 1H), 7.85 (dd, J = 7.7, 1.8 Hz, 1H), 7.56 - 7.46 (m, 1H), 7.14 - 7.01 (m, 1H), 6.87 (t, J = 11.7 Hz, 1H), 4.74 (s, 2H), 4.26 (q, J = 7.1 Hz, 2H), 1.29 (t, J = 10.9, 3.7 Hz, 3H) ppm. ¹³C NMR (125 MHz, CDCl₃) δ 189.5, 168.1, 160.1, 135.7, 128.6, 125.4, 121.8, 112.6, 65.6, 61.6, 14.1 ppm. MS (ESI) [M+H]⁺ calcd. C₁₁H₁₂O₄ 209.1, found 209.0.
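The stoichiometry and yield quoted in procedure (a) can be checked with a short Python calculation. This is a minimal sketch, not part of the disclosed procedure: the helper names are hypothetical, the atomic masses are standard average values, and 2-hydroxybenzaldehyde is taken as the limiting reagent because the other components are used in two-fold excess.

```python
# Standard average atomic masses (g/mol)
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(composition: dict) -> float:
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in composition.items())


def millimoles(mass_mg: float, mw: float) -> float:
    """Amount in mmol from a mass in mg and a molar mass in g/mol."""
    return mass_mg / mw


def percent_yield(product_mg: float, limiting_mmol: float, product_mw: float) -> float:
    """Isolated yield as a percentage of the theoretical mass."""
    return 100.0 * product_mg / (limiting_mmol * product_mw)


salicylaldehyde = {"C": 7, "H": 6, "O": 2}   # 2-hydroxybenzaldehyde, C7H6O2
ester_product = {"C": 11, "H": 12, "O": 4}   # ethyl 2-(2-formylphenoxy)acetate, C11H12O4

limiting = millimoles(366, molar_mass(salicylaldehyde))
yield_pct = percent_yield(480, 3.0, molar_mass(ester_product))

print(f"limiting reagent: {limiting:.2f} mmol")
print(f"isolated yield: {yield_pct:.1f}%")
```

With these inputs the calculation agrees with the stated 3 mmol of aldehyde and the stated 77% yield after rounding. The same helpers can be applied to the other procedures in this example.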

(b) Synthesis of 2-(2-formylphenoxy)acetic acid (2e)

Procedure: In a 25 ml round bottom flask, ethyl 2-(2-formylphenoxy)acetate (440 mg, 2.1 mmol) was mixed with water (1 ml). To this solution, trifluoroacetic acid (964 mg, 8.4 mmol) was added dropwise at 0-5 °C. The reaction mixture was allowed to reflux for 12 h. The reaction was monitored using thin layer chromatography and, upon completion, the solution was concentrated under vacuum and the product was purified using flash column chromatography (ethyl acetate:n-hexane 50:50) to afford 2-(2-formylphenoxy)acetic acid 2e (63% yield, 240 mg). TLC (ethyl acetate:n-hexane 40:60, Rf 0.52), ¹H NMR [500 MHz, (CD₃)₂CO] δ 10.55 (s, 1H), 7.78 (dd, J = 7.7, 1.8 Hz, 1H), 7.63 (ddd, J = 8.5, 7.3, 1.8 Hz, 1H), 7.19 (d, J = 8.5 Hz, 1H), 7.15 - 7.09 (m, 1H), 4.94 (s, 2H) ppm. ¹³C NMR [125 MHz, (CD₃)₂CO] δ 189.6, 169.7, 161.4, 136.7, 128.5, 126.3, 122.3, 114.5, 65.9 ppm. HRMS (ESI) [M+Na]⁺ calcd. C₉H₈NaO₄ 203.0320, found 203.0330.
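The agreement between calculated and found HRMS values is commonly expressed as a mass error in parts per million. The sketch below is illustrative only; the function name is hypothetical, and the two m/z values are those reported above for the [M+Na]⁺ ion of 2e.

```python
def ppm_error(calculated_mz: float, found_mz: float) -> float:
    """Signed mass error in parts per million (found relative to calculated)."""
    return (found_mz - calculated_mz) / calculated_mz * 1e6


# Values reported for the [M+Na]+ ion of compound 2e.
error_2e = ppm_error(calculated_mz=203.0320, found_mz=203.0330)
print(f"{error_2e:.1f} ppm")
```

A positive value means the found mass is higher than the calculated one; the same function can be applied to the other HRMS entries in the examples.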

(c) Synthesis of 3-(2-formylphenyl)propanoic acid (2b)

Procedure: In a 25 ml Ace pressure tube, β-tetralone (146 mg, 1 mmol), FeCl₃ (16.2 mg, 1 mmol), H₂O (1 mmol), and DMSO (2 ml) were added. The tube was then pressurized with air and stirred at 110 °C for 20 h. The reaction was monitored using thin layer chromatography and, upon completion, the reaction mixture was purified by column chromatography (MeOH:DCM 0.5:99.5) to afford 2b (36% yield, 64 mg). TLC (MeOH:DCM 10:90, Rf 0.52), ¹H NMR (500 MHz, CDCl₃) δ 10.19 (s, 1H), 7.82 (dd, J = 7.6, 1.3 Hz, 1H), 7.53 (t, J = 7.5, 1.4 Hz, 1H), 7.44 (t, J = 7.5, 1.0 Hz, 1H), 7.34 (d, J = 7.5 Hz, 1H), 3.36 (t, J = 7.6 Hz, 2H), 2.70 (t, J = 7.6 Hz, 2H) ppm. ¹³C NMR (125 MHz, CDCl₃) δ 192.9, 178.0, 142.4, 134.1, 133.9, 133.8, 131.2, 127.2, 35.1, 27.9 ppm. MS (ESI) [M+H]⁺ calcd. C10H10O3 179.0, found 179.1.

(d) Synthesis of 3-(2-formylphenoxy)propanoic acid (2d)

Procedure: In a 25 ml round bottom flask, 3-bromopropanoic acid (619 mg, 4 mmol) and salicylaldehyde (500 mg, 4 mmol) were mixed with water (8 ml). To this, sodium hydroxide solution (964 mg, 8 mmol) was added dropwise at 0-5 °C with vigorous stirring. The reaction mixture was allowed to reflux for 6 h. The reaction was monitored using thin layer chromatography and, upon completion, the aqueous layer was neutralized by slow addition of conc. HCl with constant stirring. The solution was concentrated under vacuum and the product was purified by column chromatography (CHCl₃) to afford 3-(2-formylphenoxy)propanoic acid 2d (39% yield, 300 mg). TLC (MeOH:CHCl₃ 5:95, Rf 0.52), ¹H NMR (500 MHz, CDCl₃) δ 10.44 (s, 1H), 7.83 (dd, J = 7.7, 1.7 Hz, 1H), 7.58 - 7.53 (m, 1H), 7.05 (t, J = 7.5 Hz, 1H), 7.01 (d, J = 8.4 Hz, 1H), 4.37 (t, J = 6.1 Hz, 2H), 2.93 (t, J = 6.1 Hz, 2H) ppm. ¹³C NMR (125 MHz, CDCl₃) δ 189.9, 175.8, 160.7, 136.0, 128.5, 124.9, 121.2, 112.6, 63.8, 34.1 ppm. MS (ESI) [M+H]⁺ calcd. C10H10O4 195.0, found 195.1.

(e) Synthesis of ethyl 4-(2-formylphenoxy)butanoate

Procedure: This compound is synthesized according to the synthesis of compound 2e.

60% yield; MeOH:CHCl₃ 0.5:99.5, Rf 0.52, ¹H NMR (500 MHz, CDCl₃) δ 10.47 (s, 1H), 7.83 (dd, J = 7.7, 1.7 Hz, 1H), 7.56 - 7.51 (m, 1H), 7.02 (t, J = 7.5 Hz, 1H), 6.97 (d, J = 8.4 Hz, 1H), 4.16 (t, J = 6.0 Hz, 2H), 2.62 (t, J = 7.0 Hz, 2H), 2.25 - 2.18 (m, 2H) ppm. ¹³C NMR (125 MHz, CDCl₃) δ 189.8, 178.3, 161.1, 136.0, 128.5, 124.8, 120.8, 112.3, 67.2, 30.6, 24.2 ppm. MS (ESI) [M+H]⁺ calcd. C11H12O4 209.0, found 209.1.

(f) Synthesis of N,N'-(ethane-1,2-diyl)bis(2-bromoacetamide)

Procedure: Ethane-1,2-diamine (10 mmol, 670 mg) was dissolved in DCM (10 ml) in a 100 ml round bottom flask, and K₂CO₃ (4.1 g, 30 mmol) in 20 ml of H₂O was added to it. Bromoacetyl bromide (30 mmol, 6 g), dissolved in DCM (20 ml), was added dropwise to the mixture at 0-5 °C. The reaction mixture was stirred for 12 h and the reaction progress was analyzed using thin layer chromatography. On completion of the reaction, the reaction mixture was extracted with DCM. The collected organic fractions were dried over anhydrous sodium sulfate and filtered, and the filtrate was concentrated under reduced pressure to afford N,N'-(ethane-1,2-diyl)bis(2-bromoacetamide) as a white solid (95% yield, 2.85 g). ¹H NMR (500 MHz, CDCl₃) δ 3.88 (s, 2H), 3.50 - 3.47 (m, 2H) ppm. ¹³C NMR (125 MHz, CDCl₃) δ 166.7, 40.2, 28.8 ppm. MS (ESI) [M+H]⁺ calcd. C₆H₁₀Br₂N₂O₂ 302.0, found 302.1.

(g) Synthesis of N,N'-(ethane-1,2-diyl)bis(2-(2-formylphenoxy)acetamide) (2h)

Procedure: In a 100 ml round bottom flask, 2-hydroxybenzaldehyde (1.97 g, 16.2 mmol), N,N'-(ethane-1,2-diyl)bis(2-bromoacetamide) (1.6 g, 5.4 mmol) and K₂CO₃ (2.23 g, 16.2 mmol) were dissolved in acetonitrile (54 ml). The reaction mixture was allowed to reflux for 12 h. The reaction was monitored using thin layer chromatography and, upon completion, the reaction mixture was filtered to remove potassium carbonate. The solution was concentrated under vacuum and the product was purified using flash column chromatography (MeOH:DCM 3:97) to afford N,N'-(ethane-1,2-diyl)bis(2-(2-formylphenoxy)acetamide) 2h (78% yield, 1.56 g). TLC (MeOH:DCM 10:90, Rf 0.52), ¹H NMR (500 MHz, CDCl₃) δ 10.11 (s, 1H), 7.72 (dd, J = 7.6, 1.7 Hz, 1H), 7.58 (t, J = 8.4, 7.5, 1.8 Hz, 1H), 7.14 (t, J = 7.2 Hz, 1H), 6.92 (d, J = 8.3 Hz, 1H), 4.58 (s, 2H), 3.66 - 3.62 (m, 2H) ppm. ¹³C NMR (125 MHz, CDCl₃) δ 190.5, 168.3, 158.1, 136.1, 133.3, 124.9, 121.9, 113.0, 67.5, 39.3 ppm. HRMS (ESI) [M+Na]⁺ calcd. C₂₀H₂₀N₂NaO₆ 407.1219, found 407.1185.

(h) Synthesis of N,N'-(((oxybis(ethane-2,1-diyl))bis(oxy))bis(propane-3,1-diyl))bis(2-bromoacetamide)

Procedure: This compound is synthesized according to the synthesis of compound. Yield 72%; TLC (MeOH:DCM 10:90, Rf 0.52), ¹H NMR (500 MHz, CDCl₃) δ 3.85 (s, 4H), 3.68 - 3.64 (m, 4H), 3.64 - 3.56 (m, 8H), 3.41 (dd, J = 12.2, 5.8 Hz, 4H), 1.81 (dd, J = 11.5, 5.8 Hz, 4H) ppm. ¹³C NMR (125 MHz, CDCl₃) δ 165.4, 70.5, 70.3, 70.3, 38.9, 29.3, 28.5 ppm. MS (ESI) [M+H]⁺ calcd. C₁₄H₂₆Br₂N₂O₅ 463.0, found 463.1.

(i) Synthesis of N,N'-(((oxybis(ethane-2,1-diyl))bis(oxy))bis(propane-3,1-diyl))bis(2-(2-formylphenoxy)acetamide) (2i)

Procedure: This compound is synthesized according to the synthesis of compound 2h. Yield 73%; MeOH:DCM 10:90, Rf 0.52, ¹H NMR (500 MHz, CDCl₃) δ 10.25 (s, 2H), 7.77 (d, J = 7.6 Hz, 2H), 7.59 - 7.53 (t, 2H), 7.13 (t, J = 7.5 Hz, 2H), 6.92 (d, J = 8.4 Hz, 2H), 4.55 (s, 4H), 3.56 - 3.51 (m, 10H), 3.48 - 3.42 (m, 4H), 1.87 - 1.80 (m, 4H) ppm. ¹³C NMR (125 MHz, CDCl₃) δ 190.0, 167.3, 158.4, 136.1, 132.6, 125.0, 121.9, 113.0, 70.4, 70.2, 69.3, 67.6, 37.0, 29.2 ppm. HRMS (ESI) [M+H]⁺ calcd. C28H36N2O9 545.2421, found 545.2508.

Example 2:

Tagging reagents: O-alkoxyamine derivatives

(a) Synthesis of 2-(3-bromopropoxy)isoindoline-1,3-dione

Procedure: In a 250 ml round bottom flask, N-hydroxyphthalimide (4894 mg, 30 mmol) and triethylamine (6.09 ml, 60 mmol) were dissolved in ACN (60 ml). To this solution, 1,3-dibromopropane (8.34 ml, 60 mmol) was added and the mixture was stirred at 25 °C for 16 h. The reaction mixture was concentrated in vacuo, and 1 N NaOH solution and ethyl acetate were added. The organic layer was separated, dried over anhydrous sodium sulfate, filtered and concentrated in vacuo. Purification of the crude mixture by flash column chromatography using ethyl acetate:hexane (3:97) gave the product in 50% yield. ¹H NMR (400 MHz, CDCl₃) δ 7.89-7.81 (m, 2H), 7.80-7.73 (m, 2H), 4.37 (t, J = 5.8 Hz, 2H), 3.71 (t, J = 6.5 Hz, 2H), 2.31 (p, J = 6.2 Hz, 2H) ppm. ¹³C NMR (100 MHz, CDCl₃) δ 163.7, 134.7, 129.0, 123.7, 76.2, 31.6, 29.4 ppm. MS (ESI) [MH]⁺ calcd. C₁₁H₁₁⁷⁹BrNO₃ 283.9, found 283.9 and calcd. C₁₁H₁₁⁸¹BrNO₃ 285.9, found 285.9.

(b) Synthesis of 2-(3-((4-methyl-2-oxo-2H-chromen-7-yl)thio)propoxy)isoindoline-1,3-dione

Procedure: In a 25 ml round bottom flask, 7-mercapto-4-methylcoumarin (192 mg, 1 mmol), K₂CO₃ (276 mg, 2 mmol) and 2-(3-bromopropoxy)isoindoline-1,3-dione (568 mg, 2 mmol) were dissolved in degassed acetonitrile (5 ml) and refluxed for 16 h. The reaction mixture was concentrated in vacuo and purified by silica gel flash column chromatography using ethyl acetate:hexane (7:3) to give S17 in 95% yield. ¹H NMR (400 MHz, CDCl₃) δ 7.90-7.82 (m, 2H), 7.81-7.73 (m, 2H), 7.50 (d, J = 8.2 Hz, 1H), 7.26-7.20 (m, 2H), 6.22 (d, J = 0.8 Hz, 1H), 4.36 (t, J = 5.8 Hz, 2H), 3.35 (t, J = 7.1 Hz, 2H), 2.41 (d, J = 0.9 Hz, 3H), 2.23-2.08 (m, 2H). ¹³C NMR (100 MHz, CDCl₃) δ 163.8, 160.7, 154.0, 152.3, 142.6, 134.7, 129.0, 124.9, 123.8, 123.4, 117.5, 114.8, 114.1, 76.6, 28.7, 27.8, 18.7 ppm. HRMS (ESI) [MH]⁺ calcd. C₂₁H₁₈NO₅S 396.0906, found 396.0925.

(c) Synthesis of 7-((3-(aminooxy)propyl)thio)-4-methyl-2H-chromen-2-one (10c)

Procedure: 2-(3-((4-methyl-2-oxo-2H-chromen-7-yl)thio)propoxy)isoindoline-1,3-dione S17 (237 mg, 0.6 mmol) in a 5 ml round bottom flask was dissolved in CH₂Cl₂ (12 ml). To this solution, hydrazine monohydrate (80%, 29 μl, 0.6 mmol) was added and the mixture was stirred at 25 °C for 3 h. The reaction mixture was filtered and the filtrate was concentrated. Purification of the crude mixture by reverse phase preparative HPLC gave 10c (76 mg, 45% yield). ¹H NMR (400 MHz, CDCl₃) δ 7.44 (d, J = 8.3 Hz, 1H), 7.20-7.11 (m, 2H), 6.18 (d, J = 0.9 Hz, 1H), 3.77 (t, J = 5.9 Hz, 2H), 3.05 (t, J = 7.3 Hz, 2H), 2.38 (d, J = 0.8 Hz, 3H), 2.02-1.90 (m, 2H) ppm. ¹³C NMR (100 MHz, CDCl₃) δ 160.7, 154.0, 152.3, 143.3, 124.7, 123.1, 117.2, 114.1, 113.9, 73.9, 29.0, 27.8, 18.7 ppm. HRMS (ESI) [MH]⁺ calcd. C₁₃H₁₆NO₃S 266.0851, found 266.0841.

(d) Synthesis of 3-((1,3-dioxoisoindolin-2-yl)oxy)propyl 3,5-bis(trifluoromethyl)benzoate

Procedure: In a 25 ml round bottom flask, 3,5-bis(trifluoromethyl)benzoic acid (258 mg, 1 mmol), 2-(3-bromopropoxy)isoindoline-1,3-dione S15 (312 mg, 1.1 mmol) and TEA (418 μl, 3 mmol) were dissolved in acetonitrile (5 ml) and heated to reflux. Progress of the reaction was followed by TLC. After 8 h, the reaction mixture was concentrated and purification of the crude by flash column chromatography (ethyl acetate:n-hexane, 2:98) gave 3-((1,3-dioxoisoindolin-2-yl)oxy)propyl 3,5-bis(trifluoromethyl)benzoate (335 mg, 73% yield). ¹H NMR (400 MHz, CDCl₃) δ 8.51 (s, 2H), 8.05 (s, 1H), 7.90-7.70 (m, 4H), 4.70 (t, J = 6.3 Hz, 2H), 4.39 (t, J = 6.0 Hz, 2H), 2.29 (p, J = 6.1 Hz, 2H) ppm. ¹³C NMR (101 MHz, CDCl₃) δ 164.0, 163.7, 134.7, 132.5, 132.3 (q, J = 34.1 Hz, 2C), 130.1-129.8 (m, 2C), 129.0, 126.6-126.6 (m, 1C), 123.7, 123.0 (q, J = 272.8 Hz, 2C), 74.9, 62.7, 27.8 ppm. ¹⁹F NMR (376 MHz, CDCl₃) δ -62.94 ppm [trifluoroacetic acid (TFA) was used as an internal standard, -75.70 ppm]. HRMS (ESI) [MH]⁺ calcd. C20H14F6NO5 462.0776, found 462.0775.

(e) Synthesis of 3-(aminooxy)propyl 3,5-bis(trifluoromethyl)benzoate (10a)

Procedure: In a 5 ml round bottom flask, hydrazine monohydrate (80%, 37 μl, 0.75 mmol) was added to 3-((1,3-dioxoisoindolin-2-yl)oxy)propyl 3,5-bis(trifluoromethyl)benzoate (138 mg, 0.3 mmol) in DCM (3 ml) and the mixture was stirred at room temperature. The progress of the reaction was followed by TLC. After 3 h, the reaction mixture was filtered and concentration of the filtrate in vacuo gave 3-(aminooxy)propyl 3,5-bis(trifluoromethyl)benzoate 10a (80 mg, 81% yield). ¹H NMR (400 MHz, CDCl₃) δ 8.48 (s, 2H), 8.07 (s, 1H), 5.42 (bs, 2H), 4.50 (t, J = 6.5 Hz, 2H), 3.83 (t, J = 6.1 Hz, 2H), 2.11 (p, J = 6.3 Hz, 2H) ppm. ¹³C NMR (101 MHz, CDCl₃) δ 164.1, 132.6, 132.4 (q, J = 33.9 Hz, 2C), 130.0-129.7 (m, 2C), 126.6-126.3 (m, 1C), 123.02 (q, J = 273.0 Hz, 2C), 72.2, 63.6, 27.9 ppm. ¹⁹F NMR (376 MHz, CDCl₃) δ -62.54 ppm (TFA was used as an internal standard, -75.70 ppm). HRMS (ESI) [MH]⁺ calcd. C12H12F6NO3 332.0731, found 332.0699.

(f) Synthesis of 3-((1,3-dioxoisoindolin-2-yl)oxy)propyl 5-((3aS,4S,6aR)-2-oxohexahydro-1H-thieno[3,4-d]imidazol-4-yl)pentanoate

Procedure: In a 5 ml round bottom flask, biotin (244 mg, 1 mmol), 2-(3-bromopropoxy)isoindoline-1,3-dione (568 mg, 2 mmol) and DBU (304 μl, 2 mmol) were dissolved in acetonitrile (20 ml) and heated to reflux. The progress of the reaction was analyzed by TLC. After 16 h, the reaction mixture was concentrated under vacuum and subjected to an ethyl acetate/water work-up. The collected organic fractions were dried over anhydrous sodium sulfate, filtered and concentrated on a rotary evaporator. Purification of the crude reaction mixture by flash chromatography (MeOH/DCM, 0.5-5%) gave 3-((1,3-dioxoisoindolin-2-yl)oxy)propyl 5-(2-oxohexahydro-1H-thieno[3,4-d]imidazol-4-yl)pentanoate (224 mg, 50% yield). ¹H NMR (500 MHz, CDCl₃) δ 7.87-7.80 (m, 2H), 7.78-7.71 (m, 2H), 5.95 (s, 1H), 5.46 (s, 1H), 4.48 (dd, J = 15.0, 9.8 Hz, 1H), 4.38-4.23 (m, J = 14.0, 6.2 Hz, 5H), 3.20-3.11 (m, 1H), 2.89 (dd, J = 12.8, 5.0 Hz, 1H), 2.72 (d, J = 12.8 Hz, 1H), 2.34 (t, J = 7.5 Hz, 2H), 2.16-2.05 (m, 2H), 1.80-1.60 (m, 4H), 1.53-1.37 (m, 2H) ppm. ¹³C NMR (126 MHz, CDCl₃) δ 173.7, 163.7, 163.7, 134.7, 128.9, 123.7, 75.1, 62.0, 60.7, 60.2, 55.5, 40.7, 33.9, 28.4, 28.3, 27.8, 24.9 ppm. HRMS (ESI) [MH]⁺ calcd. C21H26N3O6S 448.1542, found 448.1548.

(g) Synthesis of 3-(aminooxy)propyl 5-(2-oxohexahydro-1H-thieno[3,4-d]imidazol-4-yl)pentanoate

¹H NMR (500 MHz, D₂O) δ 4.63 (dd, J = 7.9, 4.9 Hz, 1H), 4.45 (dd, J = 7.9, 4.5 Hz, 1H), 4.21 (t, J = 6.3 Hz, 2H), 3.90 (t, J = 6.2 Hz, 2H), 3.42-3.31 (m, 1H), 3.02 (dd, J = 13.1, 5.0 Hz, 1H), 2.80 (d, J = 13.0 Hz, 1H), 2.44 (t, J = 7.3 Hz, 2H), 2.08-1.94 (m, 2H), 1.84-1.55 (m, 4H), 1.53-1.37 (m, 2H) ppm. ¹³C NMR (126 MHz, D₂O) δ 176.9, 165.3, 72.4, 62.1, 62.0, 60.3, 55.3, 39.7, 33.6, 27.9, 27.6, 26.7, 24.1 ppm. HRMS (ESI) [MH]⁺ calcd. C13H24N3O4S 318.1488, found 318.1467.

Example 3 :

Synthesis of amino acid amides 1

Procedure: The amino acid (4 mmol) was dissolved in methanol (10 ml), the solution was cooled to 0 °C and thionyl chloride (8 mmol) was added dropwise. The reaction mixture was heated to reflux, stirred for 6-12 h and cooled to room temperature. Solvents were evaporated under reduced pressure, and the resulting product was used in the next step without further purification (95% yield). The amino acid ester hydrochloride (4 mmol) was dissolved in ammonia solution (2 ml) and the reaction mixture was stirred at room temperature for 2-4 h. Solvents were evaporated under reduced pressure to give amino acid amide 1.

¹H NMR (500 MHz, D₂O) δ 3.75 (d, J = 7.5 Hz, 1H) ppm. ¹³C NMR (125 MHz, D₂O) δ 169.3, 40.0 ppm. MS (ESI) [M+H]⁺ calcd. C₂H₆N₂O 75.0, found 75.1.

¹H NMR (500 MHz, D₂O) δ 4.13 (q, J = 6.0 Hz, 1H), 1.55 (d, J = 18.1 Hz, 3H) ppm. ¹³C NMR (125 MHz, D₂O) δ 173.1, 48.8, 16.4 ppm. MS (ESI) [M+H]⁺ calcd. C₃H₈N₂O 89.0, found 89.1.

¹H NMR (500 MHz, D₂O) δ 4.07 (t, J = 6.5 Hz, 1H), 3.28 (t, J = 6.8 Hz, 2H), 2.01-1.90 (m, 2H), 1.68 (m, 2H) ppm. ¹³C NMR (125 MHz, D₂O) δ 172.1, 156.7, 52.6, 40.3, 28.0, 23.5 ppm. MS (ESI) [M+H]⁺ calcd. C₆H₁₅N₅O 174.1, found 174.1.

¹H NMR (500 MHz, D₂O) δ 3.92 (ddd, J = 5.6, 4.1, 1.5 Hz, 1H), 3.08 - 2.92 (m, 2H) ppm. ¹³C NMR (125 MHz, D₂O) δ 172.3, 55.8, 24.7 ppm. MS (ESI) [M+H]⁺ calcd. C₃H₈N₂OS 121.0, found 121.1.

¹H NMR (500 MHz, D₂O) δ 4.34 (dd, J = 9.1, 5.2 Hz, 1H), 2.64 - 2.37 (m, 3H), 2.18 - 2.06 (m, 1H) ppm. ¹³C NMR (125 MHz, D₂O) δ 182.2, 177.8, 56.6, 29.2, 25.2 ppm. (ESI) [M+H]⁺ calcd. C5H11N3O2 146.0, found 146.1.

¹H NMR (500 MHz, D₂O) δ 8.10 (s, 1H), 7.23 (s, 1H), 4.22 (t, J = 6.6 Hz, 1H), 3.25 (m, J = 15.4, 6.7 Hz, 2H) ppm. ¹³C NMR (125 MHz, D₂O) δ 172.0, 135.6, 129.7, 117.1, 52.7, 28.1 ppm. (ESI) [M+H]⁺ calcd. C₆H₁₀N₄O 155.0, found 155.1.

¹H NMR (500 MHz, D₂O) δ 3.82 (d, J = 5.5 Hz, 1H), 1.96 - 1.86 (m, 1H), 1.51 - 1.35 (m, 1H), 1.24 - 1.12 (m, 1H), 0.94 (dd, J = 12.1, 7.0 Hz, 3H), 0.87 (q, J = 7.3 Hz, 3H) ppm. (ESI) [M+H]⁺ calcd. C₆H₁₄N₂O 130.1, found 130.1.

¹H NMR (500 MHz, D₂O) δ 3.96 (t, 1H), 1.67 (s, J = 4.7 Hz, 4H), 0.89 (s, 9H) ppm. ¹³C NMR (125 MHz, D₂O) δ 173.0, 51.6, 39.8, 23.8, 21.8, 20.9 ppm. (ESI) [M+H]⁺ calcd.

found 130.1

¹H NMR (500 MHz, D₂O) δ 4.07 (t, J = 6.5 Hz, 1H), 3.08 - 3.01 (m, 2H), 2.01 - 1.90 (m, 2H), 1.80 - 1.71 (m, 2H), 1.58 - 1.45 (m, 2H) ppm. ¹³C NMR (125 MHz, D₂O) δ 172.0, 52.7, 39.0, 30.2, 26.3, 21.1 ppm. (ESI) [M+H]⁺ calcd. C₆H₁₅N₃O 146.1, found 146.1.

1m ¹H NMR (500 MHz, D₂O) δ 4.19 (t, J = 6.6 Hz, 1H), 2.67 (m, J = 8.8, 6.9, 2.2 Hz, 2H), 2.26 - 2.19 (m, 2H), 2.15 (s, J = 2.4 Hz, 3H) ppm. ¹³C NMR (125 MHz, D₂O) δ 171.6, 52.1, 29.9, 28.1, 13.9 ppm. (ESI) [M+H]⁺ calcd. C5H12N2OS 149.0, found 149.1.

¹H NMR (500 MHz, D₂O) δ 7.40 - 7.30 (m, 3H), 7.27 (d, J = 7.2 Hz, 2H), 4.21 (t, J = 7.1 Hz, 1H), 3.24 - 3.02 (m, 2H) ppm. ¹³C NMR (125 MHz, D₂O) δ 171.4, 135.0, 133.8, 129.4,

0, 127.6, 54.1, 36.7 ppm. (ESI) [M+H]⁺ calcd. C9H12N2O 165.1, found 165.1.

¹H NMR (500 MHz, D₂O) δ 4.24 (t, J = 7.0 Hz, 1H), 3.38 - 3.23 (m, 2H), 2.38 (t, J = 18.4 Hz, 1H), 2.00 (s, 3H) ppm. ¹³C NMR (125 MHz, D₂O) δ 173.7, 59.6, 46.4, 29.8, 24.1 ppm. (ESI) [M+H]⁺ calcd. C5H10N2O 115.0, found 115.1.

¹H NMR (500 MHz, D₂O) δ 4.19 (t, 1H), 4.08 - 3.96 (m, 2H) ppm. ¹³C NMR (125 MHz, D₂O) δ 169.9, 60.1, 54.4 ppm. (ESI) [M+H]⁺ calcd. C3H8N2O2 105.0, found 105.1.

¹H NMR (500 MHz, D₂O) δ 4.13 - 4.05 (m, 1H), 3.70 (d, J = 3.9 Hz, 1H), 1.22 (d, J = 6.1

[M+H]⁺ calcd. C4H10N2O2 119.0, found 119.1.

¹H NMR (500 MHz, D₂O) δ 7.71 (d, J = 8.1 Hz, 1H), 7.54 (dd, J = 8.1, 3.6 Hz, 1H), 7.32 - 7.25 (m, 1H), 7.20 (t, J = 7.5 Hz, 1H), 4.25 (t, J = 6.8 Hz, 1H), 3.43 - 3.32 (m, 2H) ppm. ¹³C NMR (125 MHz, D₂O) δ 172.9, 136.2, 126.5, 125.2, 122.1, 119.4, 118.2, 111.9, 106.6, 53.4, 27.2 ppm. (ESI) [M+H]⁺ calcd. C11H13N3O 204.1, found 204.1.

¹H NMR (500 MHz, D₂O) δ 7.10 (t, J = 8.9 Hz, 1H), 6.85 - 6.77 (m, 1H), 4.12 (t, J = 7.0 Hz, 1H), 3.12 - 2.98 (m, 1H) ppm. ¹³C NMR (125 MHz, D₂O) δ 171.6, 155.0, 130.8, 125.6, 115.8, 54.2, 35.9 ppm. (ESI) [M+H]⁺ calcd. C9H12N2O2 181.0, found 181.1.

¹H NMR (500 MHz, D₂O) δ 3.37 (t, J = 62.3 Hz, 1H), 2.23 - 1.85 (m, 1H), 0.90 (d, J = 24.5 Hz, 6H) ppm. (ESI) [M+H]⁺ calcd. C₄H₁₀N₂O 117.0, found 117.1.

Example 4: General Procedures

The experiments are typically performed with a few μg to several mg of protein. The products, their purity, and site of modification are validated by gel electrophoresis, NMR spectroscopy, and mass spectrometry (MS, peptide mapping, and MS-MS).

[A] Single-site labeling of a protein

In a 1.5 ml Eppendorf tube, protein (3 nmol) was dissolved in sodium bicarbonate buffer (120 μl, 0.1 M, pH 7.8). To this solution, 2-(2-formylphenoxy)acetic acid (1.5 μmol) in DMSO (30 μl) from a freshly prepared stock solution was added and vortexed at 25 °C. The overall concentration of protein and 2-(2-formylphenoxy)acetic acid was 20 μM and 10 mM respectively. After 24-48 h, the reaction mixture was diluted with acetonitrile:water (10:90, 3000 μl). The unreacted 2-(2-formylphenoxy)acetic acid and salts were removed by Amicon® Ultra-0.5 mL 3-kDa or 10-kDa MWCO centrifugal filter spin concentrators. The protein mixture was further washed with Millipore Grade I water (5 x 0.4 ml). The desalted sample was analyzed by ESI-MS or MALDI-ToF-MS. The aqueous sample was concentrated by lyophilization before subjecting it to digestion, peptide mapping, and sequencing by MS-MS.
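The concentrations stated in procedure [A] follow directly from the amounts and volumes given. The short sketch below reproduces that arithmetic; the function and variable names are hypothetical, and the total volume is assumed to be the simple sum of the buffer and DMSO volumes (120 μl + 30 μl).

```python
def concentration_uM(amount_nmol: float, volume_ul: float) -> float:
    """Concentration in micromolar; 1 nmol per microliter equals 1 mM, i.e. 1000 uM."""
    return amount_nmol * 1000.0 / volume_ul


total_volume_ul = 120 + 30  # bicarbonate buffer + DMSO stock, assumed additive

protein_uM = concentration_uM(3, total_volume_ul)             # 3 nmol protein
reagent_mM = concentration_uM(1500, total_volume_ul) / 1000   # 1.5 umol reagent
reagent_equivalents = 1500 / 3                                # reagent per protein

print(protein_uM, reagent_mM, reagent_equivalents)
```

Under these assumptions the protein is at 20 μM and the aldehyde at 10 mM, a 500-fold molar excess of reagent over protein.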

[B] Single-site immobilization, late-stage tagging, and purification of a protein

(i) Protein (3 nmol) in sodium bicarbonate buffer (120 μl, 0.1 M, pH 7.8) was taken in a 1.5 ml Eppendorf tube. To this solution, N,N'-(((oxybis(ethane-2,1-diyl))bis(oxy))bis(propane-3,1-diyl))bis(2-(2-formylphenoxy)acetamide) 2i (1500 nmol) in DMSO (30 μl) from a freshly prepared stock solution was added and vortexed at 25 °C. After 24 h, the reaction mixture was diluted with acetonitrile:buffer (10:90, 1500 μl). Unreacted N,N'-(((oxybis(ethane-2,1-diyl))bis(oxy))bis(propane-3,1-diyl))bis(2-(2-formylphenoxy)acetamide) and salts were removed by spin concentrator (0.5 ml 3-kDa MWCO). The protein mixture was further washed with sodium bicarbonate buffer (0.1 M, pH 7.8) and concentrated to 160 μl. To the concentrated sample in sodium bicarbonate buffer, an O-hydroxylamine derivative such as 3-(aminooxy)propyl 3,5-bis(trifluoromethyl)benzoate, 3-(aminooxy)propyl 5-((3aS,4S,6aR)-2-oxohexahydro-1H-thieno[3,4-d]imidazol-4-yl)pentanoate or 7-((3-(aminooxy)propyl)thio)-4-methyl-2H-chromen-2-one (2 μmol) in DMSO (40 μl) from a freshly prepared stock solution was added, and the mixture was left for 3-6 h to convert the mono-labeled protein to its oxime derivative. The excess O-alkoxyamine and salts were removed by the spin concentrator. The sample was analyzed by ESI-MS. The salt-free sample was concentrated by lyophilization before subjecting it to digestion, peptide mapping, and sequencing by MS-MS.

(ii) Hydrazide beads (200 μl, hydrazide resin loading: 16 μmol/ml) were taken in a 5 ml fritted polypropylene chromatography column with end tip closures. Phosphate buffer (0.1 M, pH 7.0, 5 x 1 ml) was used to wash the beads. The beads were re-suspended in phosphate buffer (100 μl, 0.1 M, pH 7.0). Protein mixture (250 μM) in phosphate buffer (150 μl, 0.1 M, pH 7.0) and aniline (100 mM) in phosphate buffer (100 μl, 0.1 M, pH 7.0) were added to the beads, followed by end-to-end rotation (30 rpm, rotary mixer) at 25 °C. The progress of the immobilization of the labeled protein on hydrazide resin was monitored (8-10 h) by UV-absorbance of the supernatant. Subsequently, the supernatant was removed and the beads were washed with phosphate buffer (0.3 M, pH 7.3, 4 x 1 ml) and KCl (1 M, 3 x 1 ml) to remove the adsorbed protein from the resin. The beads were further washed with phosphate buffer (0.3 M, pH 7.0, 4 x 1 ml) and re-suspended (phosphate buffer, 200 μl, 0.3 M, pH 7.0). To release the labeled protein from its immobilized derivative, aniline (100 mM) in phosphate buffer (100 μl, 0.3 M, pH 7.0) and an O-hydroxylamine derivative of a tagging reagent (10a or 10b or 10c, 50 μl, 150 mM in DMSO) were added, followed by vortexing at 25 °C for 6-8 h. The supernatant was collected while the salts, aniline, and O-hydroxylamine were removed using the spin concentrator (3 kDa MWCO). The purity of the tagged protein was confirmed by ESI-MS.
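To see how much hydrazide capacity is available relative to the protein loaded in step (ii), the quantities above can be combined as follows. This is a rough sketch under stated assumptions: the names are hypothetical, the 200 μl bead volume is taken as the volume to which the quoted loading (16 μmol/ml) applies, and the 250 μM figure is taken as the total protein concentration in the 150 μl added.

```python
def resin_capacity_nmol(bead_volume_ul: float, loading_umol_per_ml: float) -> float:
    """Nominal number of reactive groups on the resin, in nmol."""
    return bead_volume_ul / 1000.0 * loading_umol_per_ml * 1000.0


def amount_nmol(concentration_uM: float, volume_ul: float) -> float:
    """Amount of solute in nmol (1 uM x 1 ul = 1e-3 nmol)."""
    return concentration_uM * volume_ul / 1000.0


hydrazide_nmol = resin_capacity_nmol(200, 16)   # about 3200 nmol (3.2 umol)
protein_nmol = amount_nmol(250, 150)            # 37.5 nmol
excess = hydrazide_nmol / protein_nmol          # roughly 85-fold

print(round(hydrazide_nmol), protein_nmol, round(excess))
```

On these assumptions the resin offers hydrazide groups in large excess over the protein, which is consistent with following the immobilization by depletion of protein from the supernatant.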

[C] Preparation of resin-hydrogen bond promoter conjugate

N-hydroxysuccinimidyl resin beads (200 μl, resin loading: 23 μmol/ml) were taken in a 5 ml fritted polypropylene chromatography column with end tip closures. Sodium bicarbonate buffer (0.1 M, pH 7.8, 3 x 1 ml) was used to wash the beads, which were then re-suspended (sodium bicarbonate buffer, 360 μl, 0.1 M, pH 7.8). To this suspension, N-(3-(2-(2-(3-aminopropoxy)ethoxy)ethoxy)propyl)-2-(2-formylphenoxy)acetamide (13.8 μM) in DMSO (40 μl) from a freshly prepared stock solution was added and vortexed at 25 °C. The progress of the immobilization of the reagent on the resin was monitored (2-6 h) by UV-absorbance of the supernatant. Subsequently, the supernatant was removed and the beads were washed with aqueous buffer (0.1 M NaHCO₃/0.5 M NaCl pH 8.0, 3 x 1 ml; 0.1 M acetate/0.5 M NaCl pH 4.0, 3 x 1 ml) and H₂O (3 x 1 ml) to remove the adsorbed reagent from the resin. The hydrogen bond promoter immobilized resin was further washed with sodium bicarbonate buffer (0.1 M, pH 7.8, 3 x 1 ml) and re-suspended (sodium bicarbonate buffer, 375 μl, 0.1 M, pH 7.8).

[D] Purification of unlabeled protein from mixture of proteins by forming resin-hemiaminal conjugate

To the above suspension of hydrogen bond promoter immobilized resin, a mixture of proteins (10 nmol) dissolved in sodium bicarbonate buffer (25 μl, 0.1 M, pH 7.8) was added and vortexed at 25 °C. The protein with Gly at the N-terminus is immobilized selectively. To release the labeled protein from its immobilized derivative, glycine buffer (400 μl, 1 M, pH 3.6) was added and vortexed at 25 °C for 12-24 h. The supernatant was collected while the salts were removed using the spin concentrator (3 kDa MWCO). The purity of the native protein was confirmed by ESI-MS.

[E] Labeling of single protein in a mixture of proteins

Representative mixture of seven proteins - insulin, aprotinin, ubiquitin, cytochrome C, lysozyme C, β-lactoglobulin, and chymotrypsinogen A.

In a 1.5 ml Eppendorf tube, each protein (3 nmol) was mixed with sodium bicarbonate buffer (120 μl, 0.1 M, pH 7.8). To this solution, 2-(2-formylphenoxy)acetic acid 2e (1500 nmol) in DMSO (30 μl) from a freshly prepared stock solution was added and vortexed at 25 °C. After 48 h, the reaction mixture was diluted with acetonitrile:buffer (10:90, 1500 μl). Unreacted 2-(2-formylphenoxy)acetic acid and salts were removed by spin concentrator (0.5 ml 3-kDa MWCO). The modification of protein was analyzed by ESI-MS.

[F] Expression of SUMO protein with N-terminus glycine

Bacterial transformation

The desired E. coli strain was thawed [DH5α for plasmid replication and BL21(DE3) for protein expression]. The plasmid (1 μl) was added to the competent cells (50-100 μl) and incubated on ice for 20 min. Subsequently, heat shock was given at 42 °C for 40 seconds. The cells were kept on ice for 1 min, and 1 ml of LB was added to the cells for recovery. The cells were incubated at 37 °C, 180 rpm for 45 min. The recovered cells were plated on LB plates containing the desired antibiotics. The plates were incubated at 37 °C overnight.

Protein purification

Primary culture was grown in LB overnight at 37 °C. A 1:100 inoculation was done from the primary culture into LB media for the secondary culture. At approximately 0.6 OD (600 nm), the secondary culture was induced with IPTG (200 μM) for 4 h at 30 °C. The induced culture was spun at 8000 rpm for 10 min to pellet the cells, and the pellet was stored at -80 °C.

For lysis, the cells were thawed. The pellet was resuspended in lysis buffer [20 mM Tris (pH 7.5), 150 mM NaCl, 1 mM EDTA, 50 μg/ml lysozyme, 0.2% Triton X-100, 1 mM PMSF, 1X LPA mix, 5 mM β-ME] and incubated for 10-15 min on ice with intermittent shaking. This was followed by sonication (45% amplitude, 10 sec ON, 10 sec OFF) until the solution became clear. The supernatant was collected after spinning for 30 min at 11000 rpm, 4 °C. For binding and elution, the supernatant was transferred to a column containing washed GSH beads. Protein-bead binding was facilitated at 4 °C on the tumbler for 1 h. The beads were washed thrice with wash buffer [20 mM Tris (pH 7.5), 400 mM NaCl, 1 mM EDTA, 5 mM β-ME]. The protein was eluted in elution buffer [20 mM Tris (pH 8.0), 150 mM NaCl, 1 mM EDTA, 20 mM glutathione] and the concentration was determined using the Bradford assay. For clipping, protein-bound beads were washed thrice with prescission protease buffer [50 mM Tris (pH 7.5), 1 mM EDTA, 1 mM DTT, 150 mM NaCl, 0.1% Triton]. Prescission protease buffer and prescission protease were added to the column (1:50). The column was incubated at 4 °C overnight and the supernatant containing the clipped protein was eluted.

[G] Digestion of protein

All solutions were made freshly prior to use.

Tryptic digestion of modified melittin and myoglobin

Protein (0.1 mg) in 100 mM Tris (10 μl, pH 7.8) with urea (6 M) was taken in a 1.5 ml Eppendorf tube. Tert-butanol (10 μl) was added to this solution and the mixture was incubated for 3 h at 37 °C. Grade I water was used to reduce the urea concentration of the sample to 0.6 M. Trypsin (10 μg) dissolved in aqueous medium (10 μl) was added to this solution and the mixture was incubated at 37 °C for 18 h. The pH of the digested solution was adjusted to < 6 (verified by pH paper) with trifluoroacetic acid (0.5%). Afterwards, the sample was used for peptide mapping by MS and sequencing by MS-MS investigations.
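The dilution step before trypsin addition can be estimated with the usual C1·V1 = C2·V2 relation. The sketch below is an illustration only and rests on assumptions not spelled out in the procedure: that the 0.6 M target refers to urea, that 6 M is the urea concentration in the initial 10 μl, and that volumes are additive. All names are hypothetical.

```python
def final_volume_ul(c_start_M: float, v_start_ul: float, c_target_M: float) -> float:
    """Total volume needed to dilute a solute from c_start to c_target (C1*V1 = C2*V2)."""
    return c_start_M * v_start_ul / c_target_M


sample_ul = 10.0        # protein in Tris with 6 M urea
tert_butanol_ul = 10.0  # added before the 3 h incubation

total_ul = final_volume_ul(c_start_M=6.0, v_start_ul=sample_ul, c_target_M=0.6)
water_to_add_ul = total_ul - (sample_ul + tert_butanol_ul)

print(f"total {total_ul:.0f} ul, water to add {water_to_add_ul:.0f} ul")
```

With these inputs the sample needs a ten-fold dilution overall, i.e. a total volume of about 100 μl before the trypsin solution is added.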

Reduction of disulfide bond in modified insulin

Protein (0.1 mg) in 100 μl of H₂O was taken in a 1.5 ml Eppendorf tube. To reduce the disulfide bond between chain A and chain B, dithiothreitol (10 μl, 0.2 M DTT in H₂O) was added to the modified insulin solution. The reaction mixture was incubated at 37 °C for 3-6 h. The N-terminal modification of chain A was confirmed by LC-MS/MS analysis.

[H] Western blot analysis

Activated Akt (pAkt) and GAPDH (loading control) were determined by western blot and quantified. For sample preparation, cells were washed twice with PBS and lysed directly in Laemmli buffer by boiling at 100 °C for 10 min. The cell lysate thus obtained was analyzed on 8% SDS-polyacrylamide gel and transferred onto polyvinylidene fluoride (PVDF) membrane (0.45 μm pore size). The membrane was blocked with 3% BSA for 30 minutes and incubated overnight with primary antibodies, pAkt antibody (1:2000) and GAPDH (1:6000), at 4 °C. After primary antibody incubation, the membrane was washed thrice with TBST (5 min each at room temperature) and then incubated with the HRP-conjugated anti-rabbit secondary antibody (Genie) (1:10,000) for 1 h at room temperature. Protein bands were detected by chemiluminescence using ECL plus Western Blotting Detection System (Thermo Pierce).

[I] Imaging assays

The HEK293T cells were grown in a six-well plate with coverslips in DMEM media containing 1% serum for 24 h. Subsequently, the cells were washed twice with PBS and treated with coumarin tagged 11c and untagged insulin 6d for 30 min in 10% FBS containing DMEM media. Post-treatment, cells were again washed twice with PBS and fixed using 100% chilled methanol for 15 min at -20 °C. The cells were then rehydrated and permeabilized with rehydration buffer (10 mM Tris, 150 mM NaCl, 0.1% Triton X-100) for 10 min. For coumarin tagged insulin imaging, nuclei were stained with Hoechst 33342 (Invitrogen) directly after permeabilization and images were taken. For pAkt imaging, cells were blocked with 5% Normal Goat Serum (NGS) for 30 min at room temperature after rehydration. The cells were stained overnight with pAkt antibodies (1:200) at 4 °C. After primary antibody incubation, cells were washed three times with PBS-T (5 min each). Alexa Fluor-568 conjugated goat anti-rabbit IgG (1:800, Life Technologies) secondary antibody was used against pAkt. After this, the nuclei were stained with Hoechst 33342 (Invitrogen), and fluorescence images were captured on an APOTOME/Zeiss LSM 780 confocal microscope. All image analysis was performed using ZEN (Zeiss) or ImageJ software.

ADVANTAGES:

Synthesis of stable hemiaminals which do not exhibit signs of decomposition during purification, isolation, and characterization.

They are air and water stable and can be stored at room temperature.

Single-site chemical modification of native proteins or recombinant proteins or any polypeptide with Gly at the N-terminus.

Selective modification of proteins with Nα-NH₂ Gly at the N-terminus in a mixture of proteins.

Chemoselective and site-specific chemical modification of native proteins or recombinant proteins or any polypeptide with Gly at the N-terminus.

The technique offers predictability and diversity in single-site protein modification.

The technique offers opportunity for late stage modification of labeled site and purification of tagged protein.

The technique offers opportunity for single-site ordered immobilization of proteins on solid phase.

The hemiaminal process enables a simple protocol for protein purification from a mixture of proteins through the hemiaminal precursor-resin conjugate.

The derivatives of native proteins thus obtained by site-selective modification have a wide range of applications in probing biological interactions, ligand discovery, protein purification, disease diagnosis, and high-throughput screening.

The technique can be used for protein conjugation with drugs, proteins, and other biomolecules.

Claims

The Claims:

1. A hemiaminal for selective labeling of N-terminal glycine in proteins and peptides for stabilization and of general formula I and II;

For R¹⁻⁹: n, m, l = 1-10. The R¹ group is selected from natural and unnatural amino acid derivatives with a -CH₂NH₂ group, peptides, proteins, organic fragments with -C(=O)CH₂NH₂ or can be selected from aryl; heteroaryl; heterocycle; cycloalkyl; alkyl; lower alkyl; and alkenyl. R⁶ is selected as outlined in the figure. R²⁻⁵ and R⁷⁻⁹ are independently selected from H; alkyl; lower alkyl; cycloalkyl; aryl; heteroaryl; alkenyl; heterocycle; halides; nitro; -C(O)OR*, wherein R* is selected from H, alkyl; cycloalkyl and aryl; -C(O)NR**R***, wherein R** and R*** are independently selected from H, alkyl; cycloalkyl and aryl; -CH₂C(O)R_a, wherein R_a is selected from -OH, lower alkyl, cycloalkyl; aryl, -lower alkyl-aryl, -cycloalkyl-aryl; or -NR_bR_c, where R_b and R_c are independently selected from H, lower alkyl, cycloalkyl; aryl or -lower alkyl-aryl; -C(O)R_d, wherein R_d is selected from lower alkyl, cycloalkyl; aryl or -lower alkyl-aryl; or -lower alkyl-OR_e, wherein R_e is a suitable protecting group or OH group. The R¹ group can also be selected from an amino acid, small peptide, large peptide, a protein, an antibody, their unnatural derivatives or other biomolecules bearing a -CH₂NH₂ group. A small peptide is a 2-mer to 10-mer peptide and a large peptide is an 11-mer to 30-mer peptide.
All the Rⁿ groups are optionally substituted at one or more substitutable positions with one or more suitable substituents; the "suitable substituent" includes independently H; hydroxyl; cyano; alkyl, such as lower alkyl, such as methyl, ethyl, propyl, n-butyl, t-butyl, hexyl and the like; alkoxy, such as lower alkoxy such as methoxy, ethoxy, and the like; aryloxy, such as phenoxy and the like; vinyl; alkenyl, such as hexenyl and the like; alkynyl; formyl; haloalkyl, such as lower haloalkyl which includes CF₃, CCl₃ and the like; halide; aryl, such as phenyl and naphthyl; heteroaryl, such as thienyl and furanyl and the like; amide such as C(O)NR**R***, where R** and R*** are independently selected from lower alkyl, aryl or benzyl, and the like; acyl, such as C(O)-C₆H₅, and the like; ester such as -C(O)OCH₃ and the like; ethers and thioethers, such as O-Bn and the like; thioalkoxy; phosphino; and -NR_bR_c, where R_b and R_c are independently selected from lower alkyl, aryl or benzyl, and the like. The term "lower alkyl" as used herein either alone or in combination with another substituent means an acyclic, straight or branched chain alkyl substituent containing from one to six carbons and includes, for example, methyl, ethyl, 1-methylethyl, 1-methylpropyl, 2-methylpropyl, and the like. A similar use of the term is to be understood for "lower alkoxy", "lower thioalkyl", "lower alkenyl" and the like in respect of the number of carbon atoms. For example, "lower alkoxy" as used herein includes methoxy, ethoxy, t-butoxy;

the term "alkyl" encompasses lower alkyl, and also includes alkyl groups having more than six carbon atoms, such as, for example, acyclic, straight or branched chain alkyl substituents having seven to ten carbon atoms;

the term "aryl" as used herein, either alone or in combination with another substituent, means an aromatic monocyclic system or an aromatic polycyclic system. For example, the term "aryl" includes a phenyl or a naphthyl ring, and may also include larger aromatic polycyclic systems, such as fluorescent (e.g. anthracene) or radioactive labels and their derivatives;

the term "heteroaryl" as used herein, either alone or in combination with another substituent means a 5, 6, or 7-membered unsaturated heterocycle containing from one to 4 heteroatoms selected from nitrogen, oxygen, and sulphur and which form an aromatic system. The term "heteroaryl" also includes a polycyclic aromatic system comprising a 5, 6, or 7-membered unsaturated heterocycle containing from one to 4 heteroatoms selected from nitrogen, oxygen, and sulphur;

the term "cycloalkyl" as used herein, either alone or in combination with another substituent, means a cycloalkyl substituent that includes for example, but is not limited to, cyclopropyl, cyclobutyl, cyclopentyl, cyclohexyl and cycloheptyl. The term also involves "cycloalkyl-alkyl-" that means an alkyl radical to which a cycloalkyl radical is directly linked; and includes, but is not limited to, cyclopropylmethyl, cyclobutylmethyl, cyclopentylmethyl, 1-cyclopentylethyl, 2-cyclopentylethyl, cyclohexylmethyl, 1-cyclohexylethyl and 2-cyclohexylethyl. A similar use of the "alkyl" or "lower alkyl" terms is to be understood for aryl-alkyl-, aryl-lower alkyl- (e.g. benzyl), -lower alkyl-alkenyl (e.g. allyl), heteroaryl-alkyl-, and the like as used herein. For example, the term "aryl-alkyl-" means an alkyl radical, to which an aryl is bonded. Examples of aryl-alkyl- include, but are not limited to, benzyl (phenylmethyl), 1-phenylethyl, 2-phenylethyl and phenylpropyl. As used herein, the term "heterocycle", either alone or in combination with another radical, means a monovalent radical derived by removal of a hydrogen from a three- to seven-membered saturated or unsaturated (including aromatic) heterocycle containing from one to four heteroatoms selected from nitrogen, oxygen and sulfur. Examples of such heterocycles include, but are not limited to, pyrrolidine, tetrahydrofuran, thiazolidine, pyrrole, thiophene, hydantoin, diazepine, imidazole, isoxazole, thiazole, tetrazole, piperidine, piperazine, homopiperidine, homopiperazine, 1,4-dioxane, 4-morpholine, 4-thiomorpholine, pyridine, pyridine-N-oxide or pyrimidine, and the like;

the term "alkenyl", as used herein, either alone or in combination with another radical, is intended to mean an unsaturated, acyclic straight chain radical containing two or more carbon atoms, at least two of which are bonded to each other by a double bond. Examples of such radicals include, but are not limited to, ethenyl (vinyl), 1-propenyl, 2-propenyl, and 1-butenyl. The term "alkynyl", as used herein is intended to mean an unsaturated, acyclic straight chain radical containing two or more carbon atoms, at least two of which are bonded to each other by a triple bond. Examples of such radicals include, but are not limited to, ethynyl, 1-propynyl, 2-propynyl, and 1-butynyl;

the term "alkoxy" as used herein, either alone or in combination with another radical, means the radical -O-(C₁₋ₓ)alkyl wherein alkyl is as defined above containing 1 or more carbon atoms, and includes for example methoxy, ethoxy, propoxy, 1-methylethoxy, butoxy and 1,1-dimethylethoxy. Where x is 1 to 6, the term "lower alkoxy" applies, as noted above, whereas the term "alkoxy" encompasses "lower alkoxy" as well as alkoxy groups where x is greater than 6 (for example, x = 7 to 10). The term "aryloxy" as used herein alone or in combination with another radical means -O-aryl, wherein aryl is defined as noted above.

2. The hemiaminal as claimed in claim 1, wherein the R⁶ is a hydrogen bond donor or acceptor.

3. The hemiaminal as claimed in claim 2, wherein the hydrogen bond donor is selected from

4. The hemiaminal as claimed in claim 2, wherein the hydrogen bond acceptor is

5. The hemiaminal as claimed in claim 1, wherein the hemiaminal is selective in stabilization of peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ group over -C(=O)CHRNH₂ group (where R≠H).

6. The hemiaminal as claimed in claim 1, wherein the hemiaminal is for the stabilization of recombinant proteins modified with N-terminal glycine.

7. The hemiaminal as claimed in claim 1, wherein the percent stabilization of the hemiaminal of peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ group is in the range 25%-100%.

8. Stabilized peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ with the hemiaminal as claimed in claim 1.

9. A hemiaminal resin conjugate with the hemiaminal as claimed in claim 1.

10. A method of obtaining single-site labeled peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ with the hemiaminal as claimed in claim 1, comprising:

reacting the peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ with the hydrogen bond promoter;

removing the unreacted hydrogen bond promoters and purification to obtain the single-site labeled proteins and peptides with the stable hemiaminal formed.

11. A method of obtaining single-site labeled proteins from a mixture of proteins with -C(=O)CH₂NH₂ and -C(=O)CHRNH₂ group (where R≠H) with the hemiaminal as claimed in claim 1, comprising: reacting the mixture of proteins with the hydrogen bond promoter; removing the unreacted hydrogen bond promoters; removing the unreacted proteins; and purification to obtain the single-site labeled proteins with the stable hemiaminal formed from the protein mixture.

12. A method of obtaining N-terminal glycine introduced recombinant proteins for single site labeled proteins with the hemiaminal as claimed in claim 1.

13. A method of obtaining a hemiaminal resin conjugate with the hemiaminal as claimed in claim 1, comprising: synthesis of the hydrogen bond promoter with functional groups required for its installation on the resin; installation of the hydrogen bond promoter on the resin; reacting the peptides, proteins, organic fragments with -C(=O)CH₂NH₂ and derivatives with -CH₂NH₂ with the hydrogen bond promoter immobilized on resin.

14. A method of obtaining single-site ordered immobilization of proteins through N-terminus Gly on solid phase with the hemiaminal as claimed in claim 1.