CN101124581A - Identification and characterization of proteins using new database search modes - Google Patents


Publication number
CN101124581A
Authority
CN (China)
Prior art keywords
polypeptide
candidate
sample
mass
protein
Legal status
Pending
Application number
CNA2005800070925A
Other languages
Chinese (zh)
Inventor
Neil L. Kelleher
Current Assignee
University of Illinois
Original Assignee
University of Illinois
Application filed by the University of Illinois
Priority claimed from PCT/US2005/007344 (WO2005088303A2)


Abstract

A method of selecting a set of candidate polypeptides for a sample polypeptide that includes a first refining of a collection of candidate polypeptides from differences in mass of fragments of the sample polypeptide produced by mass spectrometry and a second refining of the collection of candidate polypeptides from the absolute mass of the sample polypeptide and the absolute mass of the fragments.

Description

Identification and characterization of proteins using new database search modes
Statement of government-funded research
This invention was made with government support under grant CHE-0134953 from the National Science Foundation and grant GM 067193-01 from the National Institutes of Health. The government has certain rights in the invention.
Appendix material
The appendix comprises three duplicate copies of a CD that provide software and database files. The contents of the CD are incorporated herein by reference.
Background
One goal of molecular biology is to characterize the structure and biochemical activity of the proteins encoded by gene sequences. To a large extent, characterizing a protein's structure depends on determining its primary structure (amino acid sequence) as expressed under native cellular conditions. Once a protein has been translated from mRNA, its primary structure is often modified by enzymes. These modifications include adding a new moiety to an amino acid side chain, such as a phosphate to serine, and proteolytic cleavage, such as removal of the initiating methionine or a signal sequence. Characterizing a protein's structure therefore includes both its linear amino acid sequence (as affected by alternative splicing and polymorphisms) and any modifications present along that sequence.
For this reason, a primary goal of proteomics research is to understand in detail the modifications that occur on proteins. This information is critical not only for understanding the biological activity of proteins but also for developing drugs that control cell proliferation and differentiation in processes associated with human disease.
Mass spectrometry (MS) is an analytical technique used to identify unknown compounds, quantify known compounds, and determine molecular structure. A mass spectrometer measures the masses of ions formed from individual molecules. It measures molecular mass indirectly, from the mass-to-charge ratio of each ion. The charge on an ion is expressed as z, the number of elementary charges, which gives the mass-to-charge ratio m/z. Typically, ions in mass spectrometry carry a single charge (z = 1), so the m/z value is numerically equal to the ion's mass expressed in daltons (Da). For singly charged ions, m/z is the mass of that particular ion.
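As a small illustration of the m/z relationship above, the sketch below converts between a neutral mass and the m/z of an ion formed by adding z protons. It assumes positive-mode protonation, [M + zH]^z+, and a proton mass of 1.007276 Da; the function names are illustrative only.

```python
PROTON_MASS = 1.007276  # Da; assumes ions are formed by protonation, [M + zH]^z+


def mz_from_neutral_mass(mass: float, z: int) -> float:
    """m/z of the ion formed by adding z protons to a neutral molecule of `mass` Da."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return (mass + z * PROTON_MASS) / z


def neutral_mass_from_mz(mz: float, z: int) -> float:
    """Inverse conversion: neutral mass recovered from an observed m/z and charge."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return mz * z - z * PROTON_MASS


# A 10,000 Da protein carrying 10 protons appears near m/z 1001.007.
print(round(mz_from_neutral_mass(10000.0, 10), 3))
```

For z = 1 the m/z is the mass of the protonated ion, which is why singly charged m/z values can be read directly as ion masses.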
Typically, MS bombards sample molecules with high-energy protons, electrons, or neutral gas atoms, breaking bonds so that fragment ions form from the molecular ion of the intact molecule. Although MS produces both positive and negative ions, a given instrument is set to detect ions of only one polarity. Forming the sample ions in the gas phase allows them to be sorted by mass and detected. The sample can be a solid, liquid, or vapor, and it enters the vacuum chamber of the instrument through an inlet. Electrostatic and/or magnetic filters select ions according to their m/z ratios and focus them onto a detector, where the ion flux is converted into a proportional electric current. The instrument then records the magnitude of these electrical signals as a function of m/z and converts this information into a mass spectrum.
Absolute mass searching uses the combination of intact mass and fragment ion masses to identify a protein unambiguously from a sequence database (see Fig. 1). Identification is achieved by selecting, from an annotated database, all sequences that fall within a user-specified tolerance of an observed average or monoisotopic intact mass. Preferably, candidate proteins are retrieved from a database of protein forms indexed by mass.
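A mass-indexed lookup of this kind can be sketched with a sorted list and binary search. This is one assumed way to implement the tolerance window, not the patent's implementation; the `MassIndex` class and the example identifiers and masses are hypothetical.

```python
from bisect import bisect_left, bisect_right


class MassIndex:
    """Protein forms kept sorted by intact mass so a tolerance query is a binary search."""

    def __init__(self, forms):
        ordered = sorted(forms, key=lambda form: form[1])  # forms: (identifier, mass in Da)
        self.ids = [form_id for form_id, _ in ordered]
        self.masses = [mass for _, mass in ordered]

    def within(self, observed_mass, tol_da):
        """Identifiers of every form whose mass lies in [observed - tol, observed + tol]."""
        lo = bisect_left(self.masses, observed_mass - tol_da)
        hi = bisect_right(self.masses, observed_mass + tol_da)
        return self.ids[lo:hi]


index = MassIndex([("form_C", 15870.9), ("form_A", 11201.3), ("form_B", 11243.4)])
print(index.within(11245.0, 2.0))
```

Each query costs O(log n) plus the number of hits, which is what makes mass indexing attractive for large databases. A ppm tolerance can be handled by passing `observed_mass * ppm * 1e-6` as `tol_da`.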
Each candidate sequence is then scored against the observed fragment ions. This involves calculating all theoretical b/y- or c/z-type fragment ion masses (average or monoisotopic) for each candidate sequence and counting the observed fragment ions that fall within a user-defined tolerance (absolute, or in parts per million) of any theoretical fragment ion. The number of observed fragment ions and the number of them that match theoretical fragment ions are used to calculate the probability of a false identification. Each calculated score is multiplied by the number of candidate sequences considered to obtain a probability-based score. The candidate protein with the lowest score (and therefore the lowest probability of a false identification) is considered the most likely candidate.
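The scoring steps above can be sketched as follows. The b/y calculation uses standard monoisotopic residue masses and neutral (deconvoluted) fragment masses. The text does not give the probability formula, so the binomial chance-match model here, with `tol_da` and `mass_range` as hypothetical parameters, is an assumed stand-in; only the final multiplication by the number of candidates follows the description directly.

```python
from itertools import accumulate
from math import comb

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.010565


def b_y_neutral_masses(seq):
    """Neutral masses of all b-type (N-terminal) and y-type (C-terminal) fragments."""
    prefix = list(accumulate(MONO[aa] for aa in seq))
    total = prefix[-1]
    b = prefix[:-1]                               # b1 .. b(n-1)
    y = [total - p + WATER for p in prefix[:-1]]  # complementary y fragments
    return b + y


def count_matches(observed, theoretical, tol_da):
    """Number of observed masses within tol_da of at least one theoretical mass."""
    return sum(any(abs(o - t) <= tol_da for t in theoretical) for o in observed)


def chance_probability(n_obs, n_matched, n_theo, tol_da, mass_range):
    """Assumed binomial model: P(at least n_matched of n_obs random masses match)."""
    p = min(1.0, n_theo * 2 * tol_da / mass_range)  # chance a random mass hits a window
    return sum(comb(n_obs, k) * p**k * (1 - p) ** (n_obs - k)
               for k in range(n_matched, n_obs + 1))


def score_candidates(observed, candidates, tol_da, mass_range):
    """(score, sequence) pairs; the lowest score is the most likely identification."""
    scored = []
    for seq in candidates:
        theo = b_y_neutral_masses(seq)
        k = count_matches(observed, theo, tol_da)
        p_false = chance_probability(len(observed), k, len(theo), tol_da, mass_range)
        scored.append((p_false * len(candidates), seq))  # scale by candidates considered
    return sorted(scored)


observed = [226.095, 147.053, 500.0]
print(score_candidates(observed, ["PEPTIDE", "GGGGGGG"], 0.01, 1000.0)[0][1])
```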
MS has been used to determine the primary amino acid sequences of proteins. Mass differences between observed protein fragment ions can be used to deduce the amino acid composition of a portion of the protein sequence. These sequence tags can be used to identify protein sequences, provided that MS data are available for a sufficient number of related protein fragment ions.
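One way to read sequence tags from fragment mass differences is to connect every pair of fragment masses whose difference matches a residue mass and then walk the resulting graph. The sketch below assumes neutral fragment masses from a single ion series; `sequence_tags` and its default tolerance are hypothetical, and because Ile and Leu are isobaric, "L" in a tag stands for either residue.

```python
# Monoisotopic residue masses (Da); Ile is omitted because it is isobaric with Leu,
# so an "L" in a tag means Ile or Leu.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
           "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
           "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
           "R": 156.10111, "Y": 163.06333, "W": 186.07931}


def sequence_tags(fragment_masses, tol_da=0.02, min_len=2):
    """Maximal runs of residues read from mass differences between fragments."""
    masses = sorted(fragment_masses)
    edges = {i: [] for i in range(len(masses))}
    has_incoming = set()
    for i, lighter in enumerate(masses):
        for j in range(i + 1, len(masses)):
            for aa, m in RESIDUE.items():
                if abs(masses[j] - lighter - m) <= tol_da:
                    edges[i].append((j, aa))
                    has_incoming.add(j)

    tags = set()

    def walk(node, tag):
        if not edges[node]:            # end of a path: keep it if long enough
            if len(tag) >= min_len:
                tags.add(tag)
            return
        for nxt, aa in edges[node]:
            walk(nxt, tag + aa)

    for start in range(len(masses)):
        if start not in has_incoming:  # start only where no residue leads in
            walk(start, "")
    return tags


# Neutral b2-b5 masses of PEPTIDE plus one unrelated peak.
print(sequence_tags([226.09535, 323.14811, 424.19579, 537.27985, 700.0]))
```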
MS-based strategies are now being developed to improve the efficiency and reliability of detecting protein modifications on a proteome scale. Although mammalian genomes contain fewer genes than was previously thought (Lander et al., 2001), each gene can give rise to different protein forms through nucleotide polymorphisms, alternative RNA splicing, RNA editing, and posttranslational modifications. In addition to modifications that regulate protein function, environmental signals also cause chemical modification of proteins. Detecting modifications provides an important opportunity to understand the basic regulatory mechanisms of eukaryotic cells and to diagnose human disease.
The most common form of MS-based protein structure determination uses a "bottom-up" approach: the intact protein is first digested with a protease of known specificity to produce short peptide fragments (see Fig. 2). These fragments are then purified and characterized by MS. From the observed absolute mass of each peptide fragment, the amino acid composition can be inferred, and the identity of the protein can be deduced using search algorithms and databases of known protein composition. With this approach, modifications on a single protein are routinely detected by generating peptide maps with close to 100% sequence coverage (Biemann and Papayannopoulos, 1994). However, this approach can leave gaps when characterizing modifications, because protease-derived fragments may undergo additional chemical changes and therefore fail to provide sufficiently redundant information about the original protein. Search algorithms for this approach now support detection and localization of some types of modification and are routinely available (Clauser et al., 1999; Perkins et al., 1999; Wilkins et al., 1999; Zhang et al., 2000).
Measurement techniques are now being developed that target modifications directly, based on analysis of peptide fragments derived from trypsin digestion of intact proteins. For example, a variety of procedures have been used to enhance detection of phosphorylation and glycosylation: selective purification of peptide fragments that carry a modification (e.g., separation based on the modified peptide), MS detection of specific modifications (e.g., scanning for marker ions of modified peptides), or both (Goshe et al., 2001; Oda et al., 2001; Steen et al., 2001; Zhou et al., 2001; Ficarro et al., 2002). Finally, bottom-up methods have been used to detect differences in protein modification profiles between two biological samples (e.g., phosphoproteomics) (Oda et al., 1999; Goshe et al., 2001; Oda et al., 2001; Zhou et al., 2001; Ficarro et al., 2002; Gerber et al., 2002). Although some of these techniques are being scaled up to analyze hundreds of proteins, none is applicable to all types of modification.
An alternative approach, called "top-down", has been developed to identify and characterize modifications of intact proteins (see Fig. 2). This approach uses tandem mass spectrometry (MS/MS or MS^n) first to fragment the intact protein, then to collect the fragments and subject them to further rounds of fragmentation and mass measurement. The top-down approach therefore determines the absolute masses of both the intact protein and its fragment ions. Because the intact protein is subjected to MS, no structural information is left out of the analysis, so the top-down approach has the potential to identify every modification present in the intact protein. The top-down approach has been used to obtain modification information for up to 32 proteins from 4 organisms (Kelleher et al., 1998; Pineda et al., 2000; Reid et al., 2002; Meng et al., 2001).
The top-down approach is general for all modifications. Modifications characterized by the top-down approach to date include glycosylation (Reid et al., 2002; Ge et al., 2003), cysteine alkylation (Kelleher et al., 1995), disulfide bond formation (Ge et al., 2002), oxidation (Ge et al., 2003), and phosphorylation (Meng et al., 2001). The major obstacles to this approach have been lowered by improved protein purification procedures (Kachman et al., 2002; Meng et al., 2002), automation of Fourier transform MS (FTMS) (Johnson et al., 2002), development of quadrupole-FTMS hybrid instruments (Belov et al., 2001), and improved software for identifying intact proteins from MS/MS data (Reid et al., 2002; Meng et al., 2001). However, major obstacles remain in the data processing and search software needed to fully characterize proteins carrying modifications.
Summary
In one aspect, the invention provides a method of selecting a set of candidate polypeptides for a sample polypeptide, comprising a first refining of a collection of candidate polypeptides according to mass differences between fragments of the sample polypeptide produced by mass spectrometry, and a second refining of the collection of candidate polypeptides according to the absolute mass of the sample polypeptide and the absolute masses of the fragments.
In a second aspect, the invention provides a computer program product for use with a computer. The computer program product comprises a computer-usable medium having computer-readable program code embodied therein for selecting a set of candidate polypeptides for a sample polypeptide. The computer program product comprises computer-readable program code for directing the computer to select a set of candidate polypeptides for a sample polypeptide, including a first refining of a collection of candidate polypeptides according to mass differences between fragments of the sample polypeptide produced by mass spectrometry, and a second refining of the collection of candidate polypeptides according to the absolute mass of the sample polypeptide and the absolute masses of the fragments.
In a third aspect, the invention provides a system for selecting a set of candidate polypeptides for a sample polypeptide, comprising means for performing a first refining of a collection of candidate polypeptides according to mass differences between fragments of the sample polypeptide produced by mass spectrometry, means for performing a second refining of the collection of candidate polypeptides according to the absolute mass of the sample polypeptide produced by mass spectrometry and the absolute masses of the fragments, and a computer.
Definitions
The terms "fragments" and "fragment ions" are used interchangeably in this specification to refer to fragments of an intact polypeptide produced by mass spectrometry.
The term "nascent polypeptide" refers to the initial translation product of an mRNA.
The term "modification" as used herein refers to any chemical change to the primary structure of a nascent polypeptide. "Modifications" of a protein include: (i) a polymorphism at a codon position that produces a different amino acid in the primary structure of the protein; (ii) alternative splicing or RNA editing of an mRNA transcript, which produces a different primary structure when the spliced or edited mRNA is translated; and (iii) chemical modification after translation that changes the mass of the protein molecule. Chemical modifications include posttranslational modifications that occur naturally in the cell (e.g., proteolysis, protein splicing, removal of N-terminal methionine and signal sequences, ribosylation, phosphorylation, alkylation, hydroxylation, glycosylation, oxidation, reduction, myristoylation, biotinylation, ubiquitination, iodination, nitrosylation, amination, sulfur addition, peptide ligation, cyclization, nucleotide addition, fatty acid addition, acylation, etc.) and modifications arising from sources exogenous to the cell (e.g., environmental mutagens, chemical carcinogens, experimentally induced artificial modifications, etc.).
The term "shotgun annotation" refers to the description of a specific modification of an amino acid residue in a polypeptide (e.g., phosphorylation of a serine hydroxyl). Typically, shotgun annotation can be limited to a specific modification of an amino acid residue occurring within a defined sequence context (e.g., phosphorylation of the hydroxyl of serine or threonine in the sequence RXXS/TXRX, where X is any amino acid). Shotgun annotation expands a database to include protein forms containing the specified modifications. Shotgun annotation encompasses any type of modification denoted by the term "modification" as used herein.
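Shotgun annotation of a single sequence can be sketched as enumerating every combination of motif-matched sites. For brevity the sketch uses a simplified, hypothetical R-X-X-S/T motif rather than the full motif quoted above, caps the number of simultaneous modifications with a hypothetical `max_mods` parameter, and adds the monoisotopic mass of HPO3 (79.96633 Da) per phosphorylation. The base mass passed in the example is illustrative.

```python
import re
from itertools import combinations

PHOSPHO = 79.96633  # monoisotopic mass added by phosphorylation (HPO3), Da

# Hypothetical, simplified motif: S or T three residues after an R (R-X-X-S/T).
# The lookahead lets overlapping motifs each be reported.
MOTIF = re.compile(r"(?=R..([ST]))")


def motif_sites(seq):
    """0-based positions of the S/T residues that complete the motif."""
    return [m.start(1) for m in MOTIF.finditer(seq)]


def shotgun_annotate(name, seq, base_mass, max_mods=2):
    """Expand one unmodified form into all forms with up to max_mods phosphorylated sites."""
    sites = motif_sites(seq)
    forms = [(name, (), base_mass)]
    for k in range(1, min(max_mods, len(sites)) + 1):
        for combo in combinations(sites, k):
            forms.append((name, combo, base_mass + k * PHOSPHO))
    return forms


for form in shotgun_annotate("P1", "MRAASKRLLTE", 1000.0):
    print(form)
```

The number of forms grows combinatorially with the number of sites, which is why a cap such as `max_mods` (or restricting expansion to a small candidate set) matters in practice.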
The phrase "dynamic modification" refers to a change made in a software program or database while a search is being performed.
The phrase "dynamic shotgun annotation" refers to shotgun annotation of protein structures in a database performed while a search is being carried out.
The term "expanding" refers to the increase in the number of protein forms in a set after shotgun annotation of a smaller set.
The phrase "expanded set" refers to the set of protein forms obtained after shotgun annotation of a smaller set.
The term "refining" refers to the reduction in the number of protein forms in a set after a larger set is queried with a sequence tag mode search or an absolute mass mode search.
The phrase "refined set" refers to the set of protein forms obtained after a larger set is queried with a sequence tag mode search or an absolute mass mode search.
The term "peptide" as used herein refers to a compound consisting of a single chain of D-amino acids, L-amino acids, or a mixture of D- and L-amino acids joined by peptide bonds. Preferably, a peptide contains at least 2 amino acid residues and is less than 50 amino acids in length.
The term "polypeptide" as used herein refers to a polymer of at least two amino acid residues containing one or more peptide bonds. "Polypeptide" includes peptides and proteins, regardless of whether the polypeptide has a defined conformation. Preferably, the polypeptide is a naturally occurring protein.
The term "protein" as used herein refers to a compound consisting of linearly arranged amino acids joined by peptide bonds which, in contrast to a peptide, has a defined conformation. Unlike a peptide, a protein preferably contains a chain of 50 or more amino acids. Although proteins are referred to herein, it is understood that the invention applies to all polypeptides.
The phrase "protein form" refers to a single species of polypeptide or protein, including any modifications. Thus, depending on the gene structure, the structure of the transcribed mRNA, and the nature of any modifications, a single gene can encode numerous protein forms.
The phrase "RNA splicing" refers to the removal of at least one intervening RNA sequence by cleavage of two non-adjacent phosphodiester bonds in a given RNA, and the joining of the flanking exon RNA sequences through a phosphodiester bond.
The phrase "RNA editing" refers to a change in the nucleotide composition of an RNA sequence in which at least one nucleobase of the transcribed RNA is replaced by a nucleobase with a different hydrogen-bonding specificity. The resulting edited RNA can encode a polymorphic polypeptide sequence, an extended polypeptide sequence (e.g., by eliminating a stop codon or introducing a start codon), or a truncated polypeptide sequence (e.g., by introducing a stop codon).
The phrase "RNA processing" refers to any reaction that results in covalent modification of an RNA sequence. "RNA processing" includes RNA splicing and RNA editing.
The phrase "search mode" refers to a method of identifying and retrieving candidate protein forms from a warehouse database.
The phrase "sequence tag" refers to a short partial sequence of at least two consecutive amino acids of a peptide fragment, which can be inferred from the mass difference between two related fragments of a polypeptide produced by mass spectrometry.
The term "structure" as used herein, when applied to a protein, refers to the primary amino acid sequence of the protein, including modifications. The term "structure" as used herein has the same meaning as the phrase "primary structure".
The phrase "warehouse database" refers to a collection of two or more protein forms.
Brief description of the drawings
Fig. 1 is a flowchart of the system architecture of an absolute mass mode search procedure that uses MS data to obtain candidate proteins.
Fig. 2 illustrates the "top-down" and "bottom-up" approaches to identifying and characterizing proteins by MS, in which modifications (e.g., posttranslational modifications ("PTMs")) can be identified and localized.
Fig. 3 is a flowchart of the hybrid search mode methodology.
Fig. 4 is a flowchart of a software system comprising a search algorithm (ProSight Retriever), a warehouse database of protein forms (ProSight PTM Warehouse), and primary utilities.
Fig. 5 shows an embodiment in which the database is searched in "Delta m" mode.
Fig. 6 is a schematic diagram of shotgun annotation.
Fig. 7 shows an MS/MS example from an ALS-PAGE/RPLC fraction of Saccharomyces cerevisiae (S. cerevisiae).
Detailed description
The present invention makes use of the discovery of a hybrid search mode methodology and software platform for determining protein structures, including modifications. The hybrid search mode methodology for determining the structure of a protein containing modifications combines a sequence tag mode search with one or more absolute mass mode searches to select a refined set of candidate polypeptides for a sample polypeptide. The methodology and the associated software platform are described below.
Hybrid search mode methodology
The hybrid search mode combines the sequence discrimination power of sequence tag searching with the modification detection and characterization power of absolute mass searching (see Fig. 3). This hybrid approach refines sets of proteins more efficiently than was previously possible with either sequence tag or absolute mass searching alone. In a hybrid search, sequence tags are collected from the fragmentation data and used to select a set of candidate proteins. The candidate proteins can be derived from a warehouse database. The nature of each modification and its position in the protein are then determined using the absolute mass method, which takes into account the masses of the intact protein ion and the fragment ions. Any mass not accounted for by the theoretical mass of a protein form can usually be attributed to the presence of a modification in the intact protein or in a protein fragment.
Preferably, the database of protein forms initially consists of a large collection of proteins. Preferably, the initial database contains unannotated sequence information. Preferably, this database forms the initial collection of candidate polypeptides. In a preferred embodiment, the sequence tag search refines a collection of candidate proteins consisting of unmodified polypeptides. Optionally, the collection of candidate proteins can then be expanded by annotating the candidate polypeptides so that modifications are taken into account. Preferably, after the sequence tag search, an absolute mass mode search is performed on this collection to obtain the final set of candidate polypeptides. If the refined set contains only one protein form, the absolute mass search mode uniquely identifies the modifications in the protein.
The hybrid search mode methodology always employs a sequence tag mode search followed by at least one absolute mass mode search. Optionally, an absolute mass mode search can also precede the sequence tag mode search. For example, a "three-stage" search can be performed in the hybrid search mode. This approach first uses the absolute masses of the fragments with non-stringent search parameters (e.g., minimal consideration of modifications, a large mass accuracy tolerance, or both) to identify a collection of candidate sequences, which is then refined by a sequence tag mode search. An absolute mass mode search is then performed to refine the collection further.
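The ordering described above (tag refinement, optional expansion, then absolute mass refinement) can be sketched end to end. Everything here is a simplified assumption: tags are matched as plain substrings with Ile/Leu merged, the expansion considers at most one modification drawn from a small illustrative `MOD_SHIFTS` table, and the 1 Da intact-mass tolerance is only an example value. The database contents are hypothetical.

```python
# Monoisotopic residue masses (Da) and the water added for the free termini.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.010565
# Illustrative modification mass shifts (Da); a real run would draw these from RESID.
MOD_SHIFTS = {"acetyl": 42.010565, "phospho": 79.96633}


def intact_mass(seq):
    return sum(MONO[aa] for aa in seq) + WATER


def merge_isobaric(seq):
    return seq.replace("I", "L")  # Ile and Leu cannot be told apart by mass


def hybrid_search(database, tags, observed_mass, tol_da=1.0):
    """(name, modification) pairs consistent with both the tags and the intact mass."""
    # Stage 1: sequence tag refinement of the unmodified sequences.
    refined = {name: seq for name, seq in database.items()
               if all(merge_isobaric(t) in merge_isobaric(seq) for t in tags)}
    # Expansion: add singly modified forms of the surviving candidates only.
    expanded = []
    for name, seq in refined.items():
        base = intact_mass(seq)
        expanded.append((name, None, base))
        expanded.extend((name, mod, base + shift) for mod, shift in MOD_SHIFTS.items())
    # Stage 2: absolute mass refinement against the observed intact mass.
    return [(name, mod) for name, mod, mass in expanded
            if abs(mass - observed_mass) <= tol_da]


database = {"A": "PEPTIDE", "B": "GGGGG", "C": "PEPTLDEK"}
print(hybrid_search(database, ["PTL"], 879.33))
```

Expanding only after the tag filter keeps the annotated set small, which is the performance argument the text makes for the hybrid ordering.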
Software platforms
Computer software and systems are described that include a search algorithm, a warehouse database of protein forms, and other tools (see Fig. 4). The search algorithm supports searches based on the absolute mass values of observed b/y and/or c/z fragment ions, as well as sequence tag searches. The warehouse database of protein forms can include sequence annotations and modification annotations. Other useful utilities include a database management system, an ion predictor, a data reduction tool, and a graphical viewer interface tool.
Search algorithm
By using a hybrid search method that combines the sequence tag search mode with the absolute mass search mode, the search algorithm facilitates top-down identification of proteins, including modification information. Referring to Fig. 3, a sequence tag search query of the warehouse database of protein forms is first performed using the MS data obtained for the intact protein and for the protein fragment ions produced from it. In a sequence tag search, the user determines a partial sequence of the protein from fragment ion mass differences. When sequence tags are generated, amino acids with identical nominal mass values (e.g., Ile and Leu; Lys and Gln) are supported. A graph representing all possible sequence tags that the data may contain is generated. The graph is then analyzed to produce a regular expression for each sequence tag represented. This partial sequence information can then be used to select candidate proteins from a database of unannotated protein sequences. Optionally, the user can search with a manually entered list of sequence tags. Each candidate sequence receives a score, calculated by multiplying together the lengths of all sequence tags that match the sequence. For convenience, only sequences with scores above a specified threshold are selected for data output.
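The tag-based filter and its product-of-lengths score can be sketched with Python's `re` module. Here the tags are supplied as strings (as with the manual-list option), the graph-construction step is omitted, and the database contents and threshold are hypothetical. Note that Lys and Gln share a nominal mass of 128 Da but differ by about 0.036 Da, so merging them is only appropriate when the data cannot resolve that difference.

```python
import re
from math import prod

# Residues with the same nominal mass are treated as interchangeable inside a tag.
NOMINAL_EQUIV = {"I": "[IL]", "L": "[IL]", "K": "[KQ]", "Q": "[KQ]"}


def tag_to_regex(tag):
    """Regular expression for one sequence tag, e.g. 'PTL' -> 'PT[IL]'."""
    return re.compile("".join(NOMINAL_EQUIV.get(aa, aa) for aa in tag))


def tag_score(seq, tags):
    """Product of the lengths of all tags found in seq; 0 when no tag matches."""
    lengths = [len(t) for t in tags if tag_to_regex(t).search(seq)]
    return prod(lengths) if lengths else 0


def tag_search(database, tags, threshold):
    """Keep only sequences whose tag score exceeds the threshold."""
    scores = {name: tag_score(seq, tags) for name, seq in database.items()}
    return {name: s for name, s in scores.items() if s > threshold}


database = {"A": "MKPEPTIDEQR", "B": "MSTLGGQAR", "C": "WWWW"}
print(tag_search(database, ["PTL", "EK", "GGQ"], threshold=2))
```

A single linear pass over the database with precompiled patterns is the kind of "robust linear search" the next paragraph refers to.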
When a search is performed in sequence tag search mode, annotated sequence tags are generally not supported. This is reasonable because a sequence tag is unlikely to overlap a modification site, and because the graph representation of the data would become very complicated if every possible modification that could arise in a given annotated sequence were considered when collecting sequence tags. With this restriction, robust linear searches over protein databases can be performed with acceptable performance (for example, for real queries, search times are typically under 3 seconds).
Optionally, an absolute mass search mode called delta m mode ("Δm mode") allows searching for proteins that carry a modification of unknown nature or mass, by considering the mass difference between the input intact molecular weight (MW) value and the theoretical values contained in the database (see Fig. 5). If the search is performed with an intact mass error of about ± 1 Da, a disparity in mass accuracy results: the Δm value is then accurate only to ± 1 Da, whereas the fragment ion accuracy can be in parts per million (ppm). Depending on the input settings selected, the Δm value can have varying accuracy.
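Δm mode can be sketched as follows: compute Δm for each candidate, then count observed fragments that are explained either directly or after shifting by Δm, since a fragment that contains the unknown modification carries the extra mass. The `max_delta` cutoff, the data layout, and the example masses are hypothetical assumptions, not details from the patent.

```python
def delta_m_search(observed_intact, observed_fragments, candidates,
                   frag_tol=0.02, max_delta=500.0):
    """Rank candidates by how many fragments they explain, allowing one unknown mass shift.

    candidates maps a name to (theoretical intact mass, list of theoretical fragment masses).
    """
    results = []
    for name, (theo_intact, theo_frags) in candidates.items():
        delta = observed_intact - theo_intact  # the unexplained mass, i.e. Δm
        if abs(delta) > max_delta:
            continue
        # A fragment lacking the modification matches unshifted; one carrying it matches at t + Δm.
        explained = sum(
            any(abs(o - t) <= frag_tol or abs(o - (t + delta)) <= frag_tol
                for t in theo_frags)
            for o in observed_fragments)
        results.append((name, round(delta, 2), explained))
    return sorted(results, key=lambda r: r[2], reverse=True)


candidates = {"X": (10000.0, [500.0, 1200.0, 2500.0]), "Y": (9000.0, [700.0])}
print(delta_m_search(10042.0, [500.0, 1242.0, 2542.0, 800.0], candidates))
```

Because Δm inherits the intact-mass error, fragments that carry the shift will only match when the fragment tolerance is at least as wide as that error; the example uses exact masses for clarity.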
Warehouse database of protein forms
All identification algorithms that use the top-down approach initially select a collection of candidate sequences from a database. Unannotated protein forms can be obtained as FASTA files from public databases worldwide, such as SWISS-PROT and GenBank. These databases can be processed to create a warehouse database of the protein forms required for a specific project. Preferably, a PERL script is used to convert the FASTA files into files that are easily assembled into the warehouse database. When the FASTA files are converted, the necessary information, such as the calculated average and monoisotopic masses and the number of amino acids in the sequence, is added to the base sequences from the FASTA files.
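The patent prefers a PERL script for this conversion; the sketch below does the equivalent in Python for consistency with the other examples. It assumes standard residue masses (average masses from common reference tables), takes the first whitespace-delimited token of each FASTA header as the identifier, and emits one dictionary per entry; the record layout is hypothetical. Non-standard residue letters (e.g., X or U) would raise a `KeyError` here.

```python
import io

# Monoisotopic and average residue masses (Da) of the 20 standard amino acids.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
AVG = {"G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
       "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
       "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
       "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
       "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132}
WATER_MONO = 18.010565
WATER_AVG = 18.01528


def read_fasta(handle):
    """Yield (identifier, sequence) pairs; the identifier is the first token of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def warehouse_record(name, seq):
    """Base sequence plus the derived fields the warehouse needs for mass-indexed queries."""
    return {"id": name, "sequence": seq, "length": len(seq),
            "mono_mass": sum(MONO[aa] for aa in seq) + WATER_MONO,
            "avg_mass": sum(AVG[aa] for aa in seq) + WATER_AVG}


fasta_text = ">sp|P1|TEST example entry\nPEPT\nIDE\n>sp|P2|TEST2\nGG\n"
records = [warehouse_record(n, s) for n, s in read_fasta(io.StringIO(fasta_text))]
print(records[0]["id"], records[0]["length"], round(records[0]["mono_mass"], 3))
```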
Shotgun annotation of the warehouse database
Because the absence of the correct protein form from the database can prevent its identification, a data warehouse of annotated sequences was established using RESID nomenclature; RESID is an authoritative database of known modification types (Garavelli, 2003). A database of protein forms allows known and putative modifications, indicated by the occurrence of unique sequence motifs, to be taken into account. The aim of this approach is to link the identification of a protein, made by searching the known proteins in the protein form database, with the partial or complete characterization of that protein form (see Fig. 6).
Posttranslational modification events that can be annotated in the database include N-terminal acetylation, signal peptide prediction, phosphorylation, lipoylation, GPI anchoring, ribosylation, alkylation, hydroxylation, glycosylation, oxidation, reduction, myristoylation, biotinylation, ubiquitination, nitrosylation, amination, sulfur addition, peptide ligation, cyclization, nucleotide addition, fatty acid addition, acylation, proteolysis, etc. (about 150-200 posttranslational modifications are known (Garavelli, 2003) and can be considered for annotating polypeptides). Modification annotations can be obtained from public databases such as SWISS-PROT or entered manually into the warehouse database.
Preferably, each warehouse database has three tables holding gene attributes, protein form attributes, and modification attributes. Gene attributes include gene identification information and a detailed description of the gene structure. Protein form attributes include the gene identifier, the protein form identifier, the monoisotopic mass, the average mass, the number of amino acids, and markers for any known attributes, such as a signal sequence or an initiating methionine. Modification attributes include the modification (RESID) identifier, average mass, monoisotopic mass, and RESID code attributes.
The main task of the warehouse database is to handle queries from the search algorithm. Preferably, the search algorithm always queries the warehouse database by mass (average or monoisotopic). The database should therefore be indexed by mass and should return the corresponding sequences quickly so as not to slow down the whole system. The protein form table contains most of the information required by the search algorithm. Because the protein form table already contains all annotated sequences and their masses, queries from the search algorithm can be answered rapidly from the database.
Although modification sites can be predicted theoretically from a protein's genetic information, it is usually desirable to consider all potentially possible annotations. Including all of them in the warehouse database, however, would give the database an unwieldy size and prolong retrieval times.
Once the search algorithm has identified a selected set of candidate proteins through the sequence tag search, an expanded set containing all possible annotations for those specific proteins can be generated. This modification of the warehouse database does not impair the performance of the search algorithm, because the retrieval query is restricted to a small set of possible protein forms. Dynamic shotgun annotation of the warehouse database can therefore be included in the hybrid search method. Once this candidate protein set has been refined to the final candidate polypeptides and their associated modifications, the shotgun annotations dynamically entered into the warehouse database can be removed before another sample polypeptide is identified.
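As a minimal sketch of the dynamic expansion step, assuming each putative modification is an independent optional event with a fixed mass shift, every subset of modifications on one candidate can be enumerated and then filtered against the observed intact mass. The function names, the example modifications, and the tolerance are hypothetical.

```python
from itertools import combinations


def expand_forms(base_mass, mods):
    """Enumerate every subset of putative modifications on one candidate protein.

    base_mass: mass of the unmodified protein form (Da).
    mods: list of (name, delta_mass) pairs for putative modification events.
    Returns 2**len(mods) entries of (tuple_of_names, total_mass).
    """
    forms = []
    for k in range(len(mods) + 1):
        for subset in combinations(mods, k):
            names = tuple(name for name, _ in subset)
            mass = base_mass + sum(delta for _, delta in subset)
            forms.append((names, mass))
    return forms


def match_forms(forms, observed_mass, tol_da):
    """Keep only expanded forms whose mass lies within tol_da of the observation."""
    return [f for f in forms if abs(f[1] - observed_mass) <= tol_da]
```

Because the expansion is run only for the few candidates that survive the sequence tag step, its exponential size stays manageable, and the temporary forms can be discarded once the sample is identified.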
Ion predictor
The ion predictor predicts theoretical b/y and c/z ions and is included in the software and system. These calculations can be used to compute errors, expressed in daltons or in parts per million (ppm) (see, e.g., Example 1, Table I).
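The following sketch predicts b and y ions and computes a ppm error. It assumes neutral (uncharged) fragment masses, as would be compared against deconvolved top-down data, and standard monoisotopic residue masses; c/z ions and protonated m/z values are omitted, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def b_ions(seq):
    """Neutral b-ion masses b1..b(n-1): sum of the first i residues."""
    out, total = [], 0.0
    for aa in seq[:-1]:
        total += RESIDUE[aa]
        out.append(total)
    return out


def y_ions(seq):
    """Neutral y-ion masses y1..y(n-1): sum of the last i residues plus water."""
    out, total = [], WATER
    for aa in reversed(seq[1:]):
        total += RESIDUE[aa]
        out.append(total)
    return out


def error_ppm(observed, theoretical):
    """Mass error in parts per million, signed as observed minus theoretical."""
    return (observed - theoretical) / theoretical * 1e6
```

The error in daltons is simply `observed - theoretical`; dividing by the theoretical mass and scaling by 10^6 gives the ppm column of Table I.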
Data reduction tool
A data reduction tool is included in the software and system to remove, from the reduced fragmentation data, redundant peaks arising from multiple charge states and from water/ammonia loss. Such a tool can be used to rapidly analyze the acquired MS data before they are passed to the search algorithm.
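A minimal sketch of this kind of reduction, assuming the input is a list of already deconvolved neutral masses and a fixed tolerance in daltons, is shown below. The function name and the 0.02-Da default are hypothetical choices, not the tool's actual parameters.

```python
WATER = 18.010565
AMMONIA = 17.026549


def reduce_peaks(masses, tol=0.02):
    """Collapse duplicate neutral masses and drop water/ammonia-loss satellites.

    masses: deconvolved neutral fragment masses (Da); the same fragment may
            appear several times if it was observed in several charge states.
    """
    # 1) Merge masses that agree within tol (same fragment, different charge states).
    unique = []
    for m in sorted(masses):
        if unique and abs(m - unique[-1]) <= tol:
            continue
        unique.append(m)
    # 2) Drop a peak if another peak lies one H2O or NH3 above it.
    kept = []
    for m in unique:
        is_loss = any(
            abs((m + loss) - other) <= tol
            for other in unique
            for loss in (WATER, AMMONIA)
        )
        if not is_loss:
            kept.append(m)
    return kept
```

The trade-off in step 2 is that a genuine fragment that happens to sit exactly 18.011 or 17.027 Da below another fragment would also be removed.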
Database management system
Any database management system can be used for the warehouse database. Preferably, the database management system is MySQL, which has many practical support utilities and APIs and is readily available to the public. The software provided in the appendix uses MySQL Ver 11.18 Distrib 3.23.52 for Linux.
Graphical viewer interface tool
In all search modes, the candidate sequences are returned with different scores. A graphical viewer interface tool for viewing the candidate sequences from all search modes is included in the software and system. Optionally, the graphical viewer interface tool is included on a local workstation that also includes other features of the invention. Optionally, the graphical viewer interface tool is adapted to view data obtained from a remote server over the Internet.
For absolute mass mode searches, the user is given the gene description, sequence, sequence length, theoretical mass, mass difference (absolute and ppm), number of matching b (or c) type ions, number of matching y (or z) type ions, total number of matching fragments, and the calculated probability value. The user can then sort the candidate protein set by any of the listed headers and view the fragmentation details of any retrieved sequence. The fragmentation details view gives the user details of each fragment matched to that sequence: the ion identity, the observed mass, the theoretical mass, the simple mass difference (i.e., before any mass shift is taken into account, as derived in "delta M" mode), the shifted mass difference (i.e., after the mass shift in "delta M" mode has been taken into account), and the shifted difference expressed in parts per million. The graphical viewer interface tool also allows the fragmentation details to be visualized, which is useful for determining sequence coverage and recognizing fragmentation patterns, and so increases the user's confidence in a correct identification.
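As a simplified, text-only illustration of a fragment map and sequence coverage (the actual viewer's rendering is not described here), matched b_i and y_j ions can be mapped to inter-residue bonds. The function names and the "|" notation are hypothetical.

```python
def cleavage_sites(n_residues, b_numbers, y_numbers):
    """Map matched b_i / y_j ions to inter-residue bond positions.

    Bond k lies between residue k and residue k+1 (1-based), k = 1..n-1.
    A b_i ion implies cleavage at bond i; a y_j ion implies bond n - j.
    """
    sites = set(b_numbers)
    sites |= {n_residues - j for j in y_numbers}
    return sorted(s for s in sites if 1 <= s < n_residues)


def bond_coverage(n_residues, b_numbers, y_numbers):
    """Fraction of the n-1 inter-residue bonds explained by at least one fragment."""
    return len(cleavage_sites(n_residues, b_numbers, y_numbers)) / (n_residues - 1)


def fragment_map(seq, b_numbers, y_numbers):
    """Text map: a '|' after each residue whose following bond was observed cleaved."""
    sites = set(cleavage_sites(len(seq), b_numbers, y_numbers))
    return "".join(aa + ("|" if i + 1 in sites else "") for i, aa in enumerate(seq))
```

For example, `fragment_map("PEPTIDE", [2], [3])` marks bonds 2 and 4 and gives `PE|PT|IDE`. Applied to the ions of Table I (n = 331), the nine b ions and three y ions map to 12 distinct bonds.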
Supported databases
The supported databases can be configured for any organism. One embodiment supports databases for nine organisms: Saccharomyces cerevisiae, Escherichia coli, Arabidopsis thaliana, Bacillus subtilis, Methanococcus jannaschii, Mycoplasma pneumoniae, Shewanella oneidensis, Mus musculus (house mouse), and Homo sapiens (human). The Saccharomyces cerevisiae database contains the most annotation, including known and predicted modification information.
Database scalability
Of particular interest is how database size and retrieval time scale as modification information increases. A given gene with a set of putative modifications gives rise to an exponential number of protein forms, each carrying a subset of the possible modifications. Thus, with n proteins and m possible processing events per protein, one embodiment comprises a database containing O(n·2^m) protein forms. Given that the search algorithm runs in O(m log2 n) at constant tolerance, the absolute mass search scales almost linearly with respect to m. With a database of known and putative protein forms, observed protein forms can be identified and characterized, provided that some of the modifications are correctly predicted. An increase in false information in publicly accessible databases can make some searches based on sparse MS/MS data ambiguous. However, the number of matching fragment ion masses will increase as more extensive and accurate modification information is used in the query step.
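Assuming each of the n proteins carries m independent optional processing events, and that all forms are held in a single mass-sorted index searched by bisection (an illustrative assumption about the index), the counts work out as follows:

```latex
N_{\text{forms}} = n \sum_{k=0}^{m} \binom{m}{k} = n\,2^{m},
\qquad
\log_2 N_{\text{forms}} = \log_2 n + m .
```

Each added processing event doubles the number of stored forms but adds only about one comparison to a bisection lookup, so lookup cost grows linearly in m, in line with the near-linear scaling stated above.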
Computer interface with mass spectrometer
Optionally, each component is installed on a computer system that communicates with the mass spectrometer. In one embodiment, the computer is a local workstation. In another embodiment, the computer is an off-site server; in this latter embodiment, the components can be stored on the server and accessed using Internet-based interface tools. MS data generated by the mass spectrometer are transferred to the computer for data acquisition and storage. The computer's central processing unit coordinates analysis of the acquired MS data, which in a preferred embodiment is performed by running the search algorithm against the protein form warehouse database. The operator can specify tolerances and modifications using options provided by the search algorithm software, for further analysis of the candidate proteins collected from the protein form warehouse database.
Medical application
One can determine the influence of environmental signals on the degree of modification of particular target proteins in vivo. For example, many human disease states are regulated by modifications such as phosphorylation. One can diagnose epigenetic diseases that involve modification-based changes in specific genes within a family. Particular proteins can be assayed to find unusual modifications and to provide new insight into how disease states correlate with changes in known sequences. The system therefore provides a powerful platform for screening for disease, or for individuals predisposed to a particular disease.
When changes in the modification of individual proteins are implicated in the etiology of a disease, the system can be configured as a research tool to facilitate discovery of pharmaceutical compounds that control or regulate the addition or removal of modifications on a specific protein. In one embodiment disclosed herein, the system is implemented as an integral component of a high-throughput screening strategy, in which a combinatorial library of candidate drug compounds is evaluated for the ability to promote or inhibit enzyme-catalyzed modification of a specific protein substrate by the relevant modifying activity. MS is used to query whether the modification is present (or absent) in the protein substrate. Compounds with desirable pharmacological effects can then be advanced to a secondary drug development program for the specific disease.
The system can be configured for clinical use to evaluate the effect of pharmaceutical compounds that control or regulate the addition or removal of modifications on a specific protein. In one embodiment, the system can be used to determine whether a particular protein from a patient sample carries a modification in response to drug treatment. For example, a target protein of interest can be purified to homogeneity from a lysate prepared from the patient sample and subjected to MS/MS analysis according to the methods, software, and system described herein. Differences between the MS data obtained from the sample protein and the corresponding protein form in the warehouse database, annotated with all of its natural shotgun modifications, will be readily obtained and will be meaningful for assessing the pharmacological activity of the therapeutic regimen.
Those skilled in the art will understand that the present invention can be used to detect multiple modifications in a protein, whatever their mechanism. For example, one can use the invention to identify and characterize polymorphisms in a single protein, the effects of mRNA splicing or RNA editing on the resulting protein sequence, post-translational modifications, and environmentally induced chemical modifications. In addition, those skilled in the art will understand that the hybrid search methodology makes it possible to detect any biological event, or any bioinformatic inaccuracy, that produces a mass difference between the theoretically predicted and the actually measured polypeptide forms.
ProSight PTM: software and structure
The appendix includes a CD that provides all the software tools required to implement the aspects and embodiments disclosed herein, together with a sample annotated warehouse database of protein forms. The system called "ProSight PTM" is a preferred embodiment. It comprises four primary components, all with Internet-based interfaces: a protein database (ProSight Warehouse), a database retrieval algorithm (Retriever), a data management system (job tracker), and other useful tools (see Fig. 4; Taylor et al., 2003).
Time-critical tasks, such as database searching and scoring, are written in C++ on Linux using object-oriented design, and use the iODBC library for database connectivity.
The data reduction tool is written in OCaml (chosen for its performance), and the visualization tool is written in Perl using the GD module for rendering images.
Absolute mass searching requires ProSight Warehouse to run on an ODBC-enabled database management system (DBMS). The web application is written as Perl CGI scripts served by an Apache HTTP server running on a dual-processor Athlon 2200+ MP.
Examples
Several examples are disclosed, specifically illustrating MS/MS analysis of modifications associated with a 36-kDa Saccharomyces cerevisiae protein that was later identified as glyceraldehyde-3-phosphate dehydrogenase isozyme 3 (GAPDH3). Although Q-FTMS was used, whole-protein data from any type of mass spectrometer can be substituted. The database strategies described use known and putative modification information to improve search scores and modification identification rates as desired for the particular application.
Example 1: Automated top-down analysis of a native yeast protein
In one ALS-PAGE/RPLC fraction, a yeast protein with an Mr value of 35,758.3 Da was observed (Fig. 7A). Three other components were also present in the same sample, one of which corresponded to a phosphoric acid adduct (+98 Da) of this 35.8-kDa species. The on-line deconvolution algorithm selected this 35.8-kDa protein and generated an appropriate SWIFT waveform to isolate the five charge states shown in Fig. 7B. Using the IR laser, the MS/MS spectrum of Fig. 7C was generated automatically; it contains 39 isotopic distributions corresponding to 27 distinct fragment ion mass values detected automatically by the THRASH algorithm. After a filter removed spurious peaks (e.g., water-loss peaks), 20 ion masses were used as the final input for the database search. The protein was identified as glyceraldehyde-3-phosphate dehydrogenase (GAPDH3), with 9 matching b-type ions and 3 matching y-type ions (Tables I and II). The P value for this search was 4 × 10^-8, indicating that the identification is unlikely to be a false event.
Table I: Fragment ion data for GAPDH3 (SEQ ID NO:1)¹
Ion Observed mass (Da) Theoretical mass (Da) Error (Da) Error (ppm)
B26 3072.81 3072.8 0.02 5
B29 3143.85 3143.83 0.02 6
B30 3256.91 3256.92 0 -1
B31 3370.98 3370.96 0.02 5
B32 3486.01 3485.99 0.02 6
B33 3583.06 3583.04 0.02 6
B34 3730.12 3730.11 0.01 3
B82 9227.73 9227.75 -0.02 -3
B89 9955.03 9955.08 -0.06 -6
Y52 5733.78 5733.83 -0.05 -9
Y53 5832.82 5832.9 -0.08 -13
Y139 14810.62 14810.81 -0.19 -13
¹ GAPDH3 has 331 amino acids; theoretical mass 35,615.5 Da; Δm 142.8 Da.
Table II: Graphical fragment map of GAPDH3 (SEQ ID NO:1)¹
[Image: fragment map of GAPDH3]
¹ Underlined Cys residues were identified as carrying the acrylamide modification. Separate cleavage symbols in the map denote fragment ions derived from the amino terminus and fragment ions derived from the carboxyl terminus.
This gene product (GAPDH3; SEQ ID NO:1) was successfully distinguished from the other members of the GAPDH gene family, GAPDH2 (SEQ ID NO:2) and GAPDH1 (SEQ ID NO:3), with which it shares 96% and 80% sequence identity, respectively. These data also distinguish this protein form from the sequence reported by ExPASy, which differs at only 3 of 331 amino acids. In addition, the observed molecular mass of the GAPDH3 gene product is 142 Da larger than the theoretical value calculated from the database sequence (without the initiator Met). The fragment map localizes this mass difference (Δm) between Asp90 and Asp192, an interval of the sequence that contains only two Cys residues (Cys149 and Cys153) (see Table II).
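The localization follows from the matched ions in Table I, which agree with the unmodified sequence to within 13 ppm: the largest such b ion is b89 and the largest y ion is y139. The sketch below, with hypothetical function names, bounds the shifted interval under the assumption that any ion matching without the Δm must lie entirely outside the modified region.

```python
def localize_shift(n_residues, max_unshifted_b, max_unshifted_y):
    """Bound the residue interval that must carry an unexplained mass shift.

    If b_i matches the unmodified sequence, residues 1..i carry no shift.
    If y_j matches, residues n-j+1..n carry no shift.
    The shift therefore lies within residues i+1 .. n-j (inclusive, 1-based).
    """
    start = max_unshifted_b + 1
    end = n_residues - max_unshifted_y
    if start > end:
        raise ValueError("unshifted b and y ions overlap; no room for the shift")
    return start, end


def residues_in(seq, start, end, residue):
    """1-based positions of a one-letter residue code within seq[start..end]."""
    return [i for i in range(start, end + 1) if seq[i - 1] == residue]
```

With n = 331, b89, and y139, `localize_shift(331, 89, 139)` returns `(90, 192)`, which corresponds to Asp90 through Asp192 in SEQ ID NO:1. Scanning that interval of the sequence for "C" then picks out the candidate sites Cys149 and Cys153.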
Subsequent interrogation of this protein form by manual Q-FTMS/MS, using collisional dissociation outside the superconducting magnet, produced the spectrum of Fig. 7D, which contains 98 isotopic distributions. Using these data as input to the search algorithm further restricted the +142-Da Δm to the Pro126-Leu154 region. These data are consistent with both Cys residues having been alkylated by acrylamide during gel electrophoresis (+71 Da each). Although the modification was not localized precisely to Cys149 and Cys153, in-gel modification of this kind has precedent and is expected for free thiols during PAGE-based fractionation. The overall approach therefore involves initial detection of covalent modifications by the top-down method.
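The arithmetic behind "two acrylamide adducts" can be checked directly from Table I's footnote and the observed Mr. This sketch assumes both masses are average masses and uses approximately 71.08 Da as the average mass of one propionamide (acrylamide) adduct; the function name is hypothetical.

```python
ACRYLAMIDE_AVG = 71.08  # approximate average mass added by one acrylamide adduct (Da)


def count_adducts(observed_mr, theoretical_mr, adduct_mass):
    """Return (delta_m, nearest whole number of adducts, residual in Da)."""
    delta = observed_mr - theoretical_mr
    n = round(delta / adduct_mass)
    return delta, n, delta - n * adduct_mass


delta_m, n_adducts, residual = count_adducts(35758.3, 35615.5, ACRYLAMIDE_AVG)
```

Here Δm is 142.8 Da, which rounds to 2 adducts with a residual of about 0.64 Da, matching the two Cys residues in the localized region.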
Because absolute mass search time depends linearly on the number of candidate sequences scored, a smaller intact-mass tolerance shortens search time. A simple yeast search with a ±2-kDa tolerance completed in 400 ms for 200 candidates, and the same search with a 200-Da tolerance took 6 s for 1500 candidates. Hybrid search time depends linearly on the number of FASTA entries and on the number of sequence tags considered; a search with 5 sequence tags completed in 4 s. Currently, about half of the fragmented yeast proteins can be identified by the search algorithm using the absolute masses of the observed fragment ions. Of the remainder, 20% can be identified via sequence tags generated from the relative mass differences between observed fragment ions. In sequence tag mode, automated processing of the Fig. 7C data produced 4 tags (two true and two false, each 4 amino acids long). Restricting tag assembly to fragment ions of the same charge yielded only the 2 correct tags. With the Fig. 7D data, 5 of 8 tags were false (1-4 amino acids long); with the charge-state restriction, 4 of 6 were false (1-3 amino acids long).
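A minimal sketch of sequence tag generation from relative mass differences is shown below. It assumes neutral fragment masses from a single ion series, a fixed tolerance, and greedy chaining of gaps that match exactly one residue mass. Names and defaults are hypothetical, and this is not the ProSight tag algorithm itself.

```python
# Monoisotopic residue masses; I and L are isobaric, so one entry covers both.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_for(delta, tol):
    """Return the single residue whose mass matches delta, else None."""
    hits = [aa for aa, m in RESIDUE.items() if abs(delta - m) <= tol]
    return hits[0] if len(hits) == 1 else None


def sequence_tags(masses, tol=0.02, min_len=2):
    """Chain fragment-mass gaps that match residue masses into tags.

    Tags are read in order of increasing fragment mass.
    """
    ms = sorted(masses)
    # edges[i] holds (j, aa) pairs where ms[j] - ms[i] matches residue aa.
    edges = {i: [] for i in range(len(ms))}
    for i in range(len(ms)):
        for j in range(i + 1, len(ms)):
            aa = residue_for(ms[j] - ms[i], tol)
            if aa:
                edges[i].append((j, aa))
    tags = set()

    def walk(i, tag):
        # Extend the tag along every outgoing edge; record it at a dead end.
        if not edges[i]:
            if len(tag) >= min_len:
                tags.add(tag)
            return
        for j, aa in edges[i]:
            walk(j, tag + aa)

    # Start only from peaks that are not the end point of any edge.
    targets = {j for i in edges for j, _ in edges[i]}
    for i in range(len(ms)):
        if i not in targets:
            walk(i, "")
    return sorted(tags)
```

False tags arise naturally: for fragments spaced by G, A, and S, the combined G+A gap (128.05857 Da) also matches Q, so both `GAS` and `QS` are reported. This is one reason restricting assembly (for example, by charge state) reduces the false-tag rate.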
Example 2: Screening for compounds that regulate enzymes with modifying activity (prophetic example)
The purpose of this example is to outline a high-throughput strategy for identifying, from a combinatorial library, compounds that positively or negatively regulate enzymes exhibiting modifying activity. Although this specific example is described in an in vitro setting, adapting it to in vivo applications will be readily apparent.
A recombinant form of the human Src kinase oncoprotein containing an N-terminal histidine tag (UpState Biotechnology, Inc.; Lake Placid, NY) is immobilized in 96-well plates coated with Ni-NTA resin, in Src kinase buffer (100 mM Tris-HCl (pH 7.2), 125 mM MgCl2, 25 mM MnCl2, 2 mM EGTA, 500 μM ATP, 0.25 mM sodium orthovanadate, and 2 mM dithiothreitol). After test compounds dissolved in Src kinase buffer are added, preferably one homogeneous compound per well, a Src protein substrate of known sequence (at 100-300 μM) is added to each well for phosphorylation. After incubation, the substrate is recovered and subjected to top-down mass spectrometric analysis with the ProSight PTM system.
The ability of a particular compound to inhibit Src activity is indicated by the absence of the modification associated with phosphotyrosine in the protein. Such compounds are suitable for further characterization with other assays to confirm the top-down analysis. For example, phosphorylation activity can be monitored in an assay using [γ-32P]ATP with TCA precipitation on P81 paper.
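Per well, the readout reduces to asking how many phosphate groups explain the intact-mass shift of the recovered substrate. This minimal sketch assumes monoisotopic masses, a +79.96633 Da shift per phosphorylation, and a hypothetical 0.05-Da tolerance. A result of 0 suggests inhibition, and None flags a shift that phosphorylation alone does not explain.

```python
PHOSPHO = 79.96633  # monoisotopic mass of HPO3 added per phosphorylation (Da)


def phospho_state(observed, unmodified, tol=0.05, max_sites=5):
    """Return the number of phosphate groups explaining observed - unmodified,
    or None if no whole number from 0 to max_sites fits within tol."""
    delta = observed - unmodified
    for k in range(max_sites + 1):
        if abs(delta - k * PHOSPHO) <= tol:
            return k
    return None
```

The intact mass alone cannot tell phosphotyrosine from phosphoserine or phosphothreonine. Assigning the modification to tyrosine still relies on the fragment ion data.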
Example 3: Detecting epigenetic disease in an individual (prophetic example)
The purpose of this example is to demonstrate use of the ProSight PTM system to detect modifications associated with epigenetic disease by top-down mass spectrometry. Sample tissue is obtained from chickens infected with avian sarcoma virus and from uninfected chickens. The samples are homogenized and clarified to yield soluble lysates. γ-Catenin (a known in vivo substrate of avian Src kinase) is affinity-purified from the lysates with an anti-γ-catenin antibody. The recovered γ-catenin samples are then analyzed by top-down mass spectrometry and ProSight PTM. The expected result is that γ-catenin recovered from normal tissue will display the normal modification pattern of the protein form stored in the ProSight warehouse database, whereas γ-catenin recovered from infected chickens will carry additional modifications associated with tyrosine phosphorylation.
Example 4: Experimental procedures for Examples 1-3
Cell culture and lysate fractionation
Saccharomyces cerevisiae cells (strain S288C) were grown under anaerobic conditions. About 2 g of cells (wet mass) were resuspended in 10 mL of lysis buffer (25 mM Tris, 1 mM EDTA, 1 mM TCEP, pH 7.0, with 1 mL DNase added) containing two protease inhibitors. After French press lysis, cell debris was removed by centrifugation at 10,000 × g for 30 minutes. The supernatant was then mixed with acid-labile surfactant (ALS) sample buffer and loaded onto a Model 491 Prep Cell gel apparatus (Bio-Rad), with 0.1% ALS-I replacing 0.1% SDS. A 4%T stacking gel was used with a 12%T resolving gel eluted at a flow rate of 0.50 mL/min. Of the 80 fractions collected (2 mL each), 2 fractions were processed further: they were precipitated with cold acetone, resuspended in 6 M guanidine hydrochloride (pH 2), and subjected to reversed-phase liquid chromatography on a Symmetry300 C4 column (4.6 × 50 mm; Waters Inc., Milford, MA) using a 15-minute linear gradient of standard solvents (H2O, CH3CN, and 0.1% TFA).
The ESI-Q-FTMS device
The RPLC-fractionated proteins were dried and resuspended in 80 μL of ESI solution (50% ACN, 49% H2O, and 1% formic acid); 5-10 μL of sample was then loaded on a nanospray robot (Advion BioSciences, Ithaca, NY) and analyzed at ~100 nL/min. The 8.5-T Q-FTMS instrument used in this study was built in-house, as described elsewhere. Briefly, protein ions are first stored in an octopole, then transferred through a quadrupole and accumulated in a second octopole before final analysis in the ICR cell. The quadrupole can be operated in mass-selective or "rf-only" mode. An automated script written in Tcl acquired the intact protein mass spectrum and then called the on-line deconvolution algorithm to calculate the Mr value, after which SWIFT isolated the 5 most abundant charge states. After 5 scans of the isolated charge states, the IR laser was triggered for 25 or 50 scans (0.45 s, 75% power, 40-W laser). The Q-FTMS/MS spectrum of Fig. 7D was acquired manually by collisionally dissociating specific charge states as they were transferred from the quadrupole into the second octopole.
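The relation between the deconvolved Mr and the charge-state windows that SWIFT must isolate is the standard [M + zH]^z+ m/z formula. The sketch below treats Mr as the neutral mass, picks the 5 most abundant charge states from a hypothetical abundance table, and does not attempt to model the SWIFT waveform design itself. Function names are hypothetical.

```python
PROTON = 1.007276  # proton mass (Da)


def mz(neutral_mass, z):
    """m/z of the [M + zH]^z+ ion."""
    return (neutral_mass + z * PROTON) / z


def top_charge_states(abundance_by_z, k=5):
    """Pick the k most abundant charge states (ties broken by higher z), sorted by z."""
    ranked = sorted(abundance_by_z.items(), key=lambda item: (item[1], item[0]), reverse=True)
    return sorted(z for z, _ in ranked[:k])


# Example: m/z windows for the 35,758.3-Da protein of Example 1 (abundances invented).
abundances = {25: 10, 26: 50, 27: 80, 28: 90, 29: 70, 30: 40, 31: 5}
windows = {z: mz(35758.3, z) for z in top_charge_states(abundances)}
```

For z = 30 the window centers near m/z 1192.95. Each selected state yields one such center around which an isolation window would be placed.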
References
Belov ME, Nikolaev EN, Anderson GA, Auberry KJ, Harkewicz R, Smith RD. "Electrospray ionization-Fourier transform ion cyclotron mass spectrometry using ion preselection and external accumulation for ultrahigh sensitivity," J. Am. Soc. Mass Spectrom. 12:38-48 (2001).
Biemann K, Papayannopoulos I. Acc. Chem. Res. 27:370-78 (1994).
Clauser KR, Baker P, Burlingame AL. "Role of accurate mass measurement (+/-10 ppm) in protein identification strategies employing MS or MS/MS and database searching," Anal. Chem. 71:2871-82 (1999).
Ficarro S, McCleland M, Stukenberg P, Burke D, Ross M, Shabanowitz J, Hunt D, White F. "Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae," Nat. Biotechnol. 20:301-305 (2002).
Garavelli JS. "The RESID Database of Protein Modifications: 2003 developments," Nucleic Acids Res. 31:499-501 (2003).
Ge Y, Lawhorn BG, ElNaggar M, Strauss E, Park JH, Begley TP, McLafferty FW. "Top down characterization of larger proteins (45 kDa) by electron capture dissociation mass spectrometry," J. Am. Chem. Soc. 124:672-78 (2002).
Ge Y, ElNaggar M, Sze SK, Bin OH, Begley TP, McLafferty FW, Boshoff H, Barry CE. J. Am. Soc. Mass Spectrom. 14:253-61 (2003).
Gerber SA, Rush J, Stemmann O, Steen H, Kirschner MW, Gygi SP. In: 50th ASMS Conference on Mass Spectrometry and Allied Topics, Orlando, FL (2002).
Goshe MB, Conrads TP, Panisko EA, Angell NH, Veenstra TD, Smith RD. "Phosphoprotein isotope-coded affinity tag approach for isolating and quantitating phosphopeptides in proteome-wide analyses," Anal. Chem. 73:2578-86 (2001).
Johnson JR, Meng F, Forbes AJ, Cargile BJ, Kelleher NL. "Fourier-transform mass spectrometry for automated fragmentation and identification of 5-20 kDa proteins in mixtures," Electrophoresis 23:3217-23 (2002).
Kachman MT, Wang H, Schwartz DR, Cho KR, Lubman DM. "A 2-D liquid separations/mass mapping method for interlysate comparison of ovarian cancers," Anal. Chem. 74:1779-91 (2002).
Kelleher NL, Costello CA, Begley TP, McLafferty FW. J. Am. Soc. Mass Spectrom. 6:981-84 (1995).
Kelleher NL, Taylor SV, Grannis D, Kinsland C, Chiu HJ, Begley TP, McLafferty FW. "Efficient sequence analysis of the six gene products (7-74 kDa) from the Escherichia coli thiamin biosynthetic operon by tandem high-resolution mass spectrometry," Protein Sci. 7:1796-1801 (1998).
Lander ES et al. "Initial sequencing and analysis of the human genome," Nature 409:860-921 (2001).
MacCoss MJ, McDonald WH, Saraf A, Sadygov R, Clark JM, Tasto JJ, Gould KL, Wolters D, Washburn M, Weiss A, Clark JI, Yates JR III. "Shotgun identification of protein modifications from protein complexes and lens tissue," Proc. Natl. Acad. Sci. U.S.A. 99:7900-7905 (2002).
Meng F, Cargile BJ, Miller LM, Forbes AJ, Johnson JR, Kelleher NL. "Informatics and multiplexing of intact protein identification in bacteria and the archaea," Nat. Biotechnol. 19:952-57 (2001).
Meng F, Cargile BJ, Patrie SM, Johnson JR, McLoughlin SM, Kelleher NL. "Processing complex mixtures of intact proteins for direct analysis by mass spectrometry," Anal. Chem. 74:2923-29 (2002).
Oda Y, Huang K, Cross FR, Cowburn D, Chait BT. "Accurate quantitation of protein expression and site-specific phosphorylation," Proc. Natl. Acad. Sci. U.S.A. 96:6591-96 (1999).
Oda Y, Nagasu T, Chait BT. "Enrichment analysis of phosphorylated proteins as a tool for probing the phosphoproteome," Nat. Biotechnol. 19:379-82 (2001).
Perkins D, Pappin D, Creasy D, Cottrell J. "Probability-based protein identification by searching sequence databases using mass spectrometry data," Electrophoresis 20:3551-67 (1999).
Pineda FJ, Lin JS, Fenselau C, Demirev PA. "Testing the significance of microorganism identification by mass spectrometry and proteome database search," Anal. Chem. 72:3739-44 (2000).
Reid GE, Shang H, Hogan JM, Lee GU, McLuckey SA. "Gas-phase concentration, purification, and identification of whole proteins from complex mixtures," J. Am. Chem. Soc. 124:7353-62 (2002).
Reid GE, Stephenson JL, McLuckey SA. "Tandem mass spectrometry of ribonuclease A and B: N-linked glycosylation site analysis of whole protein ions," Anal. Chem. 74:577-83 (2002).
Steen H, Kuster B, Fernandez M, Pandey A, Mann M. "Detection of tyrosine phosphorylated peptides by precursor ion scanning quadrupole TOF mass spectrometry in positive ion mode," Anal. Chem. 73:1440-48 (2001).
Taylor GK, Kim YB, Forbes AJ, Meng F, McCarthy R, Kelleher NL. "Web and database software for identification of intact proteins using top down mass spectrometry," Anal. Chem. 75:4081-86 (2003).
Wilkins MR, Gasteiger E, Gooley AA, Herbert BR, Molloy MP, Binz PA, Ou K, Sanchez JC, Bairoch A, Williams KL, Hochstrasser DF. "High-throughput mass spectrometric discovery of protein post-translational modifications," J. Mol. Biol. 289:645-57 (1999).
Zhang W, Chait BT. "ProFound: an expert system for protein identification using mass spectrometric peptide mapping information," Anal. Chem. 72:2482-89 (2000).
Zhou H, Watts JD, Aebersold R. "A systematic approach to the analysis of protein phosphorylation," Nat. Biotechnol. 19:375-78 (2001).
Sequence listing
<110> The Board of Trustees of the University of Illinois
<120> Identification and characterization of proteins using new database search modes
<130>ILL06-046-US
<140>10/794,431
<141>2004-03-05
<160>1
<170>PatentIn Ver.3.2
<210>1
<211>331
<212>PRT
<213>Saccharomyces cerevisiae
<400>1
Val Arg Val Ala Ile Asn Gly Phe Gly Arg Ile Gly Arg Leu Val Met
1 5 10 15
Arg Ile Ala Leu Ser Arg Pro Asn Val Glu Val Val Ala Leu Asn Asp
20 25 30
Pro Phe Ile Thr Asn Asp Tyr Ala Ala Tyr Met Phe Lys Tyr Asp Ser
35 40 45
Thr His Gly Arg Tyr Ala Gly Glu Val Ser His Asp Asp Lys His Ile
50 55 60
Ile Val Asp Gly Lys Lys Ile Ala Thr Tyr Gln Glu Arg Asp Pro Ala
65 70 75 80
Asn Leu Pro Trp Gly Ser Ser Asn Val Asp Ile Ala Ile Asp Ser Thr
85 90 95
Gly Val Phe Lys Glu Leu Asp Thr Ala Gln Lys His Ile Asp Ala Gly
100 105 110
Ala Lys Lys Val Val Ile Thr Ala Pro Ser Ser Thr Ala Pro Met Phe
115 120 125
Val Met Gly Val Asn Glu Glu Lys Tyr Thr Ser Asp Leu Lys Ile Val
130 135 140
Ser Asn Ala Ser Cys Thr Thr Asn Cys Leu Ala Pro Leu Ala Lys Val
145 150 155 160
Ile Asn Asp Ala Phe Gly Ile Glu Glu Gly Leu Met Thr Thr Val His
165 170 175
Ser Leu Thr Ala Thr Gln Lys Thr Val Asp Gly Pro Ser His Lys Asp
180 185 190
Trp Arg Gly Gly Arg Thr Ala Ser Gly Asn Ile Ile Pro Ser Ser Thr
195 200 205
Gly Ala Ala Lys Ala Val Gly Lys Val Leu Pro Glu Leu Gln Gly Lys
210 215 220
Leu Thr Gly Met Ala Phe Arg Val Pro Thr Val Asp Val Ser Val Val
225 230 235 240
Asp Leu Thr Val Lys Leu Asn Lys Glu Thr Thr Tyr Asp Glu Ile Lys
245 250 255
Lys Val Val Lys Ala Ala Ala Glu Gly Lys Leu Lys Gly Val Leu Gly
260 265 270
Tyr Thr Glu Asp Ala Val Val Ser Ser Asp Phe Leu Gly Asp Ser His
275 280 285
Ser Ser Ile Phe Asp Ala Ser Ala Gly Ile Gln Leu Ser Pro Lys Phe
290 295 300
Val Lys Leu Val Ser Trp Tyr Asp Asn Glu Tyr Gly Tyr Ser Thr Arg
305 310 315 320
Val Val Asp Leu Val Glu His Val Ala Lys Ala
325 330

Claims (33)

1. A method of selecting a set of candidate polypeptides for a sample polypeptide, comprising:
a first refining of a collection of candidate polypeptides based on differences in mass of fragments of the sample polypeptide produced by mass spectrometry; and
a second refining of the collection of candidate polypeptides based on the absolute mass of the sample polypeptide and the absolute mass of the fragments.
2. The method of claim 1, wherein the first refining comprises determining at least a partial amino acid sequence of the sample polypeptide from the differences in fragment mass.
3. the method for claim 2 further comprises:
Determine the absolute mass of sample polypeptide of complete form and the absolute mass of sample peptide fragment.
4. the method for claim 2 further comprises:
Comprised a depot data bank by selected set; With
Described partial amino-acid series at least based on the sample polypeptide is selected candidate's polypeptide from described depot data bank.
5. A method of determining the primary structure of a sample polypeptide, comprising:
selecting a set of candidate polypeptides by the method of claim 1;
obtaining probability values of matching by comparing the absolute mass of the sample polypeptide with theoretical absolute mass data of the candidate polypeptides; and
ranking the matching probability values and identifying the primary structure of the sample polypeptide based on the most probable match with one of the candidate polypeptides.
6. the method for claim 4, wherein said depot data bank further comprises at least a air gun note of at least a polypeptide in the depot data bank.
7. the method for claim 6, wherein said air gun note comprises posttranslational modification.
8. the method for claim 7; wherein said posttranslational modification comprises at least one member who is selected from as next group, described group by ribosylation, phosphorylation, alkylation, hydroxylation, glycosylation, oxidation, reduction, myristylation, biotinylation, ubiquitination, iodate, nitrosylation, amination, sulphur interpolation, cyclisation, nucleotide add, fatty acid adds and acidylate is formed.
9. the method for claim 4, wherein said depot data bank is stored in the electronic memory of computing machine.
10. the method for claim 9, wherein the user can be by searching algorithm through the telecommunications access computer and from described depot data bank retrieving information.
11. the method for claim 10, wherein said searching algorithm further comprise the internet works software application.
12. A method of screening a compound for inhibitory activity against an enzyme that post-translationally modifies a peptide substrate, comprising:
contacting the enzyme with the compound to form a premix;
adding the peptide substrate to the premix to form a reaction mixture; and
analyzing the peptide substrate by the method of claim 5.
13. the method for claim 12, further comprise the co-factor of adding with the enzyme catalytic reaction, wherein said co-factor comprises and is selected from one group at least one member who is made up of ATP, ADP, AMP, GTP, GDP, GMP, CTP, CDP, CMP, UTP, UDP and UMP.
14. the method for claim 12, wherein said enzyme is fixed on the solid support.
15. A computer program product for use with a computer, the computer program product comprising a computer usable medium having computer readable program code embodied therein for selecting a set of candidate polypeptides for a sample polypeptide, the computer program product comprising:
Computer readable program code for causing the computer to select a set of candidate polypeptides for a sample polypeptide, comprising:
A first refining of a collection of candidate polypeptides from differences in the masses of sample polypeptide fragments produced by mass spectrometry; and
A second refining of the collection of candidate polypeptides from the absolute mass of the sample polypeptide and the absolute masses of the fragments.
16. The computer program product of claim 15, wherein the computer readable program code causes the computer to perform the first refining of the collection, the first refining comprising determining at least a partial amino acid sequence of the sample polypeptide from the differences in fragment masses.
17. The computer program product of claim 16, further comprising computer readable program code for causing the computer to determine the absolute mass of the intact sample polypeptide and the absolute masses of the sample polypeptide fragments.
18. The computer program product of claim 16, further comprising computer readable program code for causing the computer to select candidate polypeptides from a collection of protein forms based on the at least partial amino acid sequence of the sample polypeptide.
19. The computer program product of claim 16, further comprising computer readable program code for causing the computer to select a set of candidate polypeptides by the method of claim 1; to obtain match probability values by comparing the absolute mass of the sample polypeptide with theoretical absolute mass data of the candidate polypeptides; and to identify the primary structure of the sample polypeptide by ranking the match probability values and selecting the most probable match to one of the candidate polypeptides.
20. The computer program product of claim 15, further comprising a system, wherein the system comprises:
A computer;
A warehouse database of protein forms; and
A main tool.
21. The computer program product of claim 20, wherein the main tool comprises at least one member selected from the group consisting of a data management system, an ion predictor, a data reduction tool, and a graphical viewer interface tool.
22. The computer program product of claim 20, wherein the warehouse database further comprises shotgun annotation.
23. The computer program product of claim 20, wherein the warehouse database further comprises dynamic shotgun annotation.
24. The computer program product of claim 20, wherein the system further comprises a search algorithm, and the search algorithm comprises an absolute mass search mode and a sequence tag search mode.
25. The computer program product of claim 24, wherein the absolute mass search mode further comprises a Δm search mode.
26. The computer program product of claim 20, further comprising a mass spectrometer in communication with the computer.
27. The computer program product of claim 20, wherein the computer communicates with a user through Internet application software.
28. The computer program product of claim 20, further comprising:
A computer;
A warehouse database of protein forms;
A search algorithm for searching the warehouse database;
A data management system;
An ion predictor;
A data reduction tool; and
A graphical viewer interface tool.
29. A system for selecting a set of candidate polypeptides for a sample polypeptide, comprising:
Means for a first refining of a collection of candidate polypeptides from differences in the masses of sample polypeptide fragments produced by mass spectrometry;
Means for a second refining of the collection of candidate polypeptides from the absolute mass of the sample polypeptide and the absolute masses of the fragments; and
A computer.
30. The system of claim 29, wherein the computer is in communication with a mass spectrometer.
31. The system of claim 29, wherein the computer communicates with a user through Internet application software.
32. A system for selecting a set of candidate polypeptides for a sample polypeptide, comprising:
The computer program product of claim 15; and
A computer.
33. The method of claim 1, further comprising a third refining of the collection from the absolute masses of the sample polypeptide and the sample polypeptide fragments, wherein the third refining of the collection occurs before the first refining of the collection.

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2005/007344 WO2005088303A2 (en) 2004-03-05 2005-03-03 Identification and characterization of proteins using new database search modes

Publications (1)

Publication Number Publication Date
CN101124581A true CN101124581A (en) 2008-02-13

Family

ID=39086091

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2005800070925A Pending CN101124581A (en) 2005-03-03 2005-03-03 Identification and characterization of proteins using new database search modes

Country Status (1)

Country Link
CN (1) CN101124581A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103389335A (en) * 2012-05-11 2013-11-13 Dalian Institute of Chemical Physics, Chinese Academy of Sciences Analysis device and method for identifying biomacromolecules
CN103776891A (en) * 2013-09-04 2014-05-07 Institute of Computing Technology, Chinese Academy of Sciences Method for detecting differentially expressed proteins
CN105117620A (en) * 2015-07-27 2015-12-02 Graduate School at Shenzhen, Tsinghua University Proteome database and application thereof
CN107924429A (en) * 2015-04-14 2018-04-17 Peaccel Method and electronic system for predicting at least one fitness value of a protein, and related computer program product
CN108038353A (en) * 2017-12-26 2018-05-15 Chongqing Bainuoji Biotechnology Co., Ltd. Web page display method for genomic data
CN109661574A (en) * 2016-09-01 2019-04-19 Shimadzu Corporation Mass spectrometry data processing device
CN111788633A (en) * 2017-12-29 2020-10-16 Nautilus Biotechnology Decoding method for protein identification

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103389335A (en) * 2012-05-11 2013-11-13 Dalian Institute of Chemical Physics, Chinese Academy of Sciences Analysis device and method for identifying biomacromolecules
CN103776891A (en) * 2013-09-04 2014-05-07 Institute of Computing Technology, Chinese Academy of Sciences Method for detecting differentially expressed proteins
CN103776891B (en) * 2013-09-04 2017-03-29 Institute of Computing Technology, Chinese Academy of Sciences Method for detecting differentially expressed proteins
CN107924429A (en) * 2015-04-14 2018-04-17 Peaccel Method and electronic system for predicting at least one fitness value of a protein, and related computer program product
CN107924429B (en) * 2015-04-14 2022-12-09 Peaccel Method and electronic system for predicting at least one fitness value of a protein
CN105117620A (en) * 2015-07-27 2015-12-02 Graduate School at Shenzhen, Tsinghua University Proteome database and application thereof
CN105117620B (en) * 2015-07-27 2018-03-02 Graduate School at Shenzhen, Tsinghua University Proteome database and application thereof
CN109661574A (en) * 2016-09-01 2019-04-19 Shimadzu Corporation Mass spectrometry data processing device
CN108038353A (en) * 2017-12-26 2018-05-15 Chongqing Bainuoji Biotechnology Co., Ltd. Web page display method for genomic data
CN111788633A (en) * 2017-12-29 2020-10-16 Nautilus Biotechnology Decoding method for protein identification

Similar Documents

Publication Publication Date Title
Li et al. Database searching and accounting of multiplexed precursor and product ion spectra from the data independent analysis of simple and complex peptide mixtures
Weatherly et al. A Heuristic method for assigning a false-discovery rate for protein identifications from Mascot database search results
Hernandez et al. Automated protein identification by tandem mass spectrometry: issues and strategies
Xu et al. MassMatrix: a database search program for rapid characterization of proteins and peptides from tandem mass spectrometry data
Nesvizhskii et al. Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS
Yates et al. Automated protein identification using microcolumn liquid chromatography-tandem mass spectrometry
US20110136675A1 (en) Identification and characterization of proteins using new database search modes
Blueggel et al. Bioinformatics in proteomics
US20100137151A1 (en) Protein Expression Profile Database
IL150840A (en) Method of non-targeted complex sample analysis
CN101124581A (en) Identification and characterization of proteins using new database search modes
Eddes et al. CHOMPER: A bioinformatic tool for rapid validation of tandem mass spectrometry search results associated with high‐throughput proteomic strategies
WO2003006951A9 (en) System and method of determining proteomic differences
Liska et al. Combining mass spectrometry with database interrogation strategies in proteomics
Na et al. Computational methods in mass spectrometry-based structural proteomics for studying protein structure, dynamics, and interactions
Salzano et al. Mass spectrometry for protein identification and the study of post translational modifications
US20060003460A1 (en) Method for comparing proteomes
Pardanani et al. Primer on medical genomics part IV: expression proteomics
Thavarajah et al. Re-evaluation of the 18 non-human protein standards used to create the Empirical Statistical Model for Decoy Library Searching
Ossipova et al. Optimizing search conditions for the mass fingerprint‐based identification of proteins
Fridman et al. The probability distribution for a random match between an experimental-theoretical spectral pair in tandem mass spectrometry
US20060172430A1 (en) Identification and characterization of protein fragments
Bennett et al. Analysis of large-scale MS data sets: the dramas and the delights
Graves et al. Proteomics and the molecular biologist
AU2002324503B2 (en) System and method of determining proteomic differences

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication