WO2003038728A2 - Computer system and method using mass spectrometry data and a protein database for the identification of unknown proteins

Publication number
WO2003038728A2
Authority
WO
WIPO (PCT)
Prior art keywords
peak
peaks
protein
probability
database
Prior art date
Application number
PCT/IB2002/004839
Other languages
English (en)
Other versions
WO2003038728A3 (fr
Inventor
Jari HÄKKINEN
Thorsteinn RÖGNVALDSSON
Jim Samuelsson
Original Assignee
Biobridge Computing Ab
Priority date
Filing date
Publication date
Application filed by Biobridge Computing Ab
Priority to AU2002347462A
Publication of WO2003038728A2
Publication of WO2003038728A3

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803: General methods of protein analysis not limited to specific proteins or families of proteins

Definitions

  • the present invention relates to a computer system and a method for selecting one or more candidate proteins from a plurality of proteins stored in a database.
  • the known methods are based on semi-computerised comparisons between numerical representations of peptide peaks of known proteins and peptide peaks of the unknown protein.
  • an object of the present invention is to provide a method and a computer system for selecting at least one candidate protein from a database, in such a manner that the human factor is minimised.
  • a method for providing a measure applicable in selecting proteins in a database storing a numerical representation of a theoretical mass spectrum for a plurality of proteins. Said method preferably comprises, based on a numerical representation of a mass spectrum of a protein to be identified in the form of corresponding values of entity mass/electric charge and intensity: extracting noise-free mono-isotopic peptide peaks from the numerical representation of the mass spectrum to be identified, thereby providing, if possible, a list of selected peaks; selecting at least one candidate protein from the database, said candidate(s) being selected based on a closeness-of-fit algorithm providing a set of peak matches between the list of selected peaks and peaks of the candidate protein(s); and determining the probability for the set of peak matches to occur, thereby providing a measure applicable in selecting protein candidates.
  • the set of peak matches preferably consists of all those peaks that satisfy the condition that the distance between the two peak masses is less than a certain value. This value is in many preferred embodiments user defined but may, of course, be derived from the actual data.
  • the method preferably comprises the step of determining a score value relating to the probability for the set of peak matches to occur, said score value being preferably determined as the negative logarithm of that probability.
  • the method preferably always computes a score value for every protein stored in the database.
  • the present invention may preferably further comprise the step of determining the probability for getting a score value equal to or above a predefined score, wherein this probability is the probability of randomly reaching a score value at least as large as the score value in question and/or the probability of reaching from the database a score value at least as large as the score value in question.
  • a comparison of these two probabilities will normally show that the top-scoring candidates from the database search have score values much higher than can be expected from random matching alone. When they do not, the assessment may lead to the conclusion that none of the proteins represented in the database is a likely candidate, which is especially valuable if the unknown protein is not in the database.
  • the probability for the set of peak matches to occur preferably reflects the probability of having a predetermined number (r) of matches. Furthermore, the probability for the set of peak matches to occur may preferably be determined so that many matches are rewarded while the propensity of large proteins to have many matches is taken into account.
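The relation between probability and score stated above can be illustrated with a minimal sketch. The function name `score_from_probability` and the choice of the natural logarithm are assumptions made here for illustration; the text only says "negative logarithm" and does not fix the base.

```python
import math

def score_from_probability(p_match_set: float) -> float:
    """Return -ln(p) for a probability p in (0, 1] (log base is an assumption)."""
    if not 0.0 < p_match_set <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(p_match_set)

# A less probable set of peak matches gives a larger score.
for p in (0.5, 1e-3, 1e-9):
    print(p, round(score_from_probability(p), 3))
```

With this convention, a set of matches that is very unlikely to arise by chance receives a high score, so ranking by descending score puts the most significant candidates first.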
  • the candidate protein(s) is preferably represented by a first set of peptide masses being a theoretical spectrum wherein each peak has the same intensity as all other peaks.
  • the extracting of noise-free mono-isotopic peptide peaks by the method according to the present invention may preferably comprise: determining the intensity level of the spectrum where the signal-to-noise ratio is unity, or substantially unity, preferably by use of equation 1 disclosed herein; determining the intensity level of the spectrum where the signal-to-noise ratio is zero, or substantially zero, preferably by use of equation 2 disclosed herein; and locating peak candidates by locating all parts, or substantially all parts, of the spectrum above the noise and extracting, preferably by copying, those ranges to form peak candidates.
  • the extracting of noise-free mono-isotopic peptide peaks may preferably further comprise: determining the peak properties mass/electric charge, intensity, width and signal-to-noise ratio for the peak candidates; bundling peak candidates into clusters; deconvolving compound peak clusters into single-peptide peak clusters; and resolving single-peptide peak clusters into mono-isotopic peptide peaks.
  • the extracting of noise-free mono-isotopic peptide peaks may preferably comprise, or further comprise: determining a baseline of the spectrum and smoothing the baseline; determining a noise level and smoothing the noise level; and locating peak candidates by locating all parts, or substantially all parts, of the spectrum above the noise and extracting, preferably by copying, those ranges to form peak candidates.
  • the extracting of noise-free mono-isotopic peptide peaks may further comprise: fitting function parameters for the peak candidates; bundling peak candidates into clusters; deconvolving compound peak clusters into single-peptide peak clusters; and resolving single-peptide peak clusters into mono-isotopic peptide peaks.
  • the method according to the present invention allows peak extraction of peptide peaks representing peptides having any value of electrical charge.
  • the method according to the present invention may preferably further comprise the step of filtering the list of selected peaks. Additionally, the filtering may comprise discarding one or more peaks, preferably based on input from a user of the method, which input may designate one or more specific peaks to be discarded.
  • the list is preferably filtered based on one or more of the strategies: echoes to consider, intensity cut, low and mass, aldi, peaks to exclude, peaks to keep, width cut.
  • a peak match is defined as a situation where the distance between the two peak masses is less than a predefined match value according to equation 6 disclosed herein and the predefined match value may preferably be user defined.
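A minimal sketch of the peak-match criterion follows. Equation 6 is not reproduced in this text, so the sketch assumes the simplest reading: an absolute mass difference strictly below the user-defined match value. The function name `match_peaks` and the sample masses are hypothetical.

```python
from bisect import bisect_left

def match_peaks(experimental, theoretical, match_value):
    """Return (experimental, theoretical) mass pairs closer than match_value (Da)."""
    theo = sorted(theoretical)
    matches = []
    for m in experimental:
        # Start at the first theoretical mass that could fall inside the window.
        i = bisect_left(theo, m - match_value)
        while i < len(theo) and theo[i] < m + match_value:
            if abs(theo[i] - m) < match_value:
                matches.append((m, theo[i]))
            i += 1
    return matches

print(match_peaks([1000.50, 1500.00], [1000.45, 1200.0, 1500.3], 0.1))
```

Sorting the theoretical masses once and using a binary search keeps the comparison cheap when the same spectrum is compared against every protein in a database.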
  • the method according to the present invention may preferably further comprise the step of determining a score value relating to the probability for the set of peak matches to occur, said score value being determined as the negative logarithm of that probability.
  • a score value is determined for all proteins in the data base.
  • the method may preferably further comprise the step of determining the probability for getting a score value equal to or above a predefined score.
  • the step of determining the probability may preferably be based on the list of peptide peaks, the predefined match value and the probability density for the list of peptide peaks.
  • the probability for getting a score value equal to or above a predefined score is the probability of randomly reaching a score value at least as large as the score value in question. Additionally or in combination thereto, it is preferably the probability of reaching from the database a score value at least as large as the score value in question.
  • the probability for getting a score value equal to or above a predefined score is calculated by equation 24 disclosed herein.
  • the database stores a list of peptide masses and the corresponding parent proteins.
  • the database results from a digestion of proteins, and the digestion may preferably have been performed by the method according to the second aspect of the present invention.
  • the present invention preferably relates to a method for in silico digestion of proteins, comprising: establishing a plurality of protein sequences; checking, for each amino acid in the sequences, whether the amino acid acquires a post-translational modification and, if so, modifying the amino acid; checking whether the current position coincides with a pre-specified cleavage site or is the right-most amino acid and, if so, modifying the amino acid accordingly; and computing and registering the masses for all possible combinations of minimal peptide masses.
  • the present invention preferably relates to a computer system for providing a measure applicable in selecting proteins in a database storing a numerical representation of a theoretical mass spectrum for a plurality of proteins.
  • said computer system preferably comprises means for: extracting noise-free mono-isotopic peptide peaks from a numerical representation of the mass spectrum to be identified, thereby providing, if possible, a list of selected peaks, the extracting being based on a numerical representation of the mass spectrum of a protein to be identified in the form of corresponding values of entity mass/electric charge and intensity; selecting at least one candidate protein from the database, said candidate(s) being selected based on a closeness-of-fit algorithm providing a set of peak matches between the list of selected peaks and peaks of the candidate protein(s); and determining, for example by computing means, the probability for the set of peak matches to occur, thereby providing a measure applicable in selecting protein candidates.
  • the computer system according to the present invention preferably comprises means for performing some or all of the steps of the method according to the first aspect of the present invention.
  • Such means preferably comprise one or more computer processors, memory, disk storage, one or more data connections and/or the like.
  • a graphical user interface is provided in a fourth aspect of the present invention.
  • This graphical user interface is preferably particularly useful for guiding a user through a protein identification process and for presenting the result of the identification process, as the interface preferably comprises a number of module fields, each representing one or more modules adapted to perform one or more of the steps of the method according to the first aspect and/or the second aspect of the present invention.
  • these fields are preferably graphically arranged and graphically linked so as to reflect a predefined executing order of the modules, the executing order being preferably the calculation flow governed by the underlying algorithms.
  • the graphical user interface is preferably adapted to initiate execution of a module in response to input to the computer system.
  • the graphical user interface is preferably also adapted to change the appearance of a field corresponding to the module being executed.
  • the input is preferably provided by a pointing device, such as a computer mouse, and an associated button press. Additionally or in combination thereto, one or more of these fields change appearance during execution of their corresponding module(s).
  • the graphical user interface according to the present invention may preferably further comprise one or more result fields that display the results and appear after results have been generated, and/or while results are being generated, by one or more of said modules.
  • the graphical user interface typically and preferably further comprises input fields, preferably appearing when a user input is required. Furthermore, the graphical user interface may preferably further comprise one or more dialog windows through which user input may be entered and/or edited. In connection thereto or in general, one or more of the dialog windows may preferably allow a user to edit values stored in a database, and the dialog windows are preferably accessible via button(s) appearing on the interface.
  • the graphical user interface further comprises a tool bar via which actions can be executed by pushing buttons appearing on said tool bar, said tool bar preferably further comprising curtains, wherein each curtain represents a category of action and wherein each curtain comprises buttons for actions belonging to a particular category.
  • the graphical user interface may preferably further comprise a set of windows that communicate the results of the protein identification process.
  • Fig. 1 shows a raw experimental spectrum.
  • Fig. 2 shows the algorithmic flow of the present invention (a: automatic input; u: user input).
  • Fig. 3 shows an example of a cluster that might be mistaken for a single, very broad, peak.
  • Fig. 4 shows a peak cluster.
  • Fig. 5 shows the isotope abundance distributions for the four lowest-lying isotopes $I(0, m)$, $I(1, m)$, $I(2, m)$ and $I(3, m)$ (horizontal axis: m; vertical axis: isotope abundance).
  • Fig. 6 shows a simple illustration of the chemistry of post-translational modifications (PTMs) and missed cleavages, and their potential effect on the peaks of a mass spectrum: a. no PTMs, no missed cleavage; b. PTMs, both fixed and variable; c. missed cleavage.
  • Fig. 7 shows an example of the distribution of the number of peptides per parent protein. On the y-axis is shown the values of the frequency distribution for the number of peptides per parent protein. Note that the distribution is dependent on choice of database, post-translational modifications, and allowed number of missed cleavages.
  • Fig. 8 shows an example of the distribution of the peptide masses. On the y-axis is shown the values of the peptide mass frequency distribution. Note that the distribution is dependent on choice of database, post-translational modifications, and allowed number of missed cleavages.
  • Fig. 9 shows an example of a clear case of protein identification. On the y-axis is shown the values of the "random" and "database" functions defined in equations eq. 25 and eq. 26 respectively.
  • the top-scoring candidates from the database search have score values much higher than what can be expected from just random matching.
  • Fig. 10 shows an example of a case with no clear protein identification.
  • the top-scoring candidates from the database search have score values not higher than what can be expected from just random matching. Further experimental measures have to be taken.
  • Fig. 11 shows the workspace as it appears on startup.
  • the arrows between the analysis boxes illustrate the flow of data in the analysis, the boxes represent key steps in the analysis, as described in the sections "Algorithms". Graphs of the data and results will occupy the empty space on the left-hand side, once the analysis is started.
  • Fig. 12 shows the File menu.
  • Fig. 13 shows the Edit menu.
  • Fig. 14 shows the Actions menu.
  • Fig. 15 shows the dialogue window that comes up when the MSFiles box is activated. By clicking on the relevant fields the user chooses which spectrum files are to be analyzed. Note that a user, by clicking on the "Masslist" button, can also send in a list of already extracted peak mass values. Given that choice the present invention will skip the peak extraction step and move directly to the peak filtering step.
  • Fig. 16 shows the options menu for the peak filtering step, corresponding to the box Pepfil.
  • Fig. 17 shows the options menu related to the database digestion as well as the scoring and the validation steps; corresponding to the box Matcher.
  • Fig. 18 shows the window in which a user can specify his/her own post-translational modification.
  • Fig. 19 shows the mass spectrum during different stages in the analysis.
  • the top graph is the unprocessed spectrum.
  • the middle graph shows the extracted mono-isotopic peaks.
  • the bottom graph shows the mono-isotopic peaks after the filtering step. Zooming in any of the three graphs will automatically zoom the two other graphs as well.
  • the table at the bottom left of the workspace shows the top-scoring proteins from the database.
  • Fig. 20 shows a table presenting the top-scoring proteins of the database.
  • Fig. 21 shows a table presenting the peaks that were extracted and met the peak criteria in the filtering step. This interactive window enables a user to manually add or remove peaks that he/she thinks can affect the database search.
  • Fig. 22 shows detailed information from the database search about a particular database protein.
  • Fig. 23 shows the search result using the default parameters of the present invention. No likely protein candidate is found.
  • Fig. 24 shows the search result when the parameter values have been chosen judiciously, taking into account known information about the experimental spectrum, assumed post-translational modifications etc, leading to one very likely protein candidate, marked by a green flag.
  • Fig. 25 shows the workspace after a batch run. The result for each individual spectrum can be studied in the same way as after processing a single spectrum.
  • the present invention takes as input a spectrum from a mass spectrometer, a protein database, and a set of user options.
  • the output is a table of those proteins (from the database) that are the most likely candidates for having generated the mass spectrum.
  • h is a header row, which may or may not be empty. If it is not empty it may contain information about the experimental setup that generated the spectrum. If it is empty, it may also be left out.
  • the first column should contain values of the entity mass/electric charge (m/z), measured in units of Dalton (Da).
  • the second column is a field that separates the first and the third column. It can, for example, consist of a comma or white space.
  • the third column should contain values of the intensity measured at the mass spectrometer.
  • N is the number of datapoints in the spectrum.
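A minimal sketch of a reader for the spectrum layout described above (optional header row, then m/z, a comma or white-space separator, and intensity). The function name `parse_spectrum` and the sample text are hypothetical, and treating the first non-numeric row as the header is an assumption of this sketch.

```python
import re

def parse_spectrum(text):
    """Parse 'm/z <separator> intensity' rows; return (header, datapoints)."""
    header = None
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = [f for f in re.split(r"[,\s]+", line) if f]
        try:
            mz, intensity = float(fields[0]), float(fields[1])
        except (ValueError, IndexError):
            if not points and header is None:
                header = line  # first non-numeric row is treated as the header h
                continue
            raise ValueError(f"malformed data row: {line!r}")
        points.append((mz, intensity))
    return header, points

header, points = parse_spectrum("sample run\n1000.1, 12.0\n1000.2 15.5\n")
print(header, len(points))  # len(points) plays the role of N
```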
  • a protein database consists of a table of proteins. Depending on how well annotated the database is, the amount of information available varies. A minimum requirement for the present invention to work, is that for each database entry there is an identification tag (number or name) of the protein, and that its amino acid sequence is presented, where the amino acids are represented by their one-letter code. Additional but not necessary information is, for example, information about species, protein weight etc. Examples of protein databases that can be used are SWISSPROT [1], and NCBI non-redundant peptide sequence database.
  • the output is a table presenting the top candidates, that is, those proteins in the database whose theoretical spectra show the strongest resemblance to the unknown experimental spectrum.
  • the measure of resemblance is a score value, defined and described in the section "The Algorithms" below.
  • the proteins in the table are presented in descending order with respect to resemblance, hence having the most likely candidate on the top row, the second most likely candidate on the second row etc.
  • For each protein top candidate in the table its score value and other parameters related to the search statistics, as well as its particular amino acid sequence, are presented.
  • Given the contents of the output table a graphical illustration of the output can also be given. This is further described in the section "The Graphical User Interface" below.
  • the peak extraction is divided into a set of five sub-processes.
  • a user can influence which peaks are extracted through the choice of signal-to-noise ratio s/n.
  • the user's choice is denoted by ⁇
  • Peak extraction 1: Separating signal from noise
  • the separation of signal from noise is done in a series of steps.
  • V ,fl ⁇ ⁇ - w ⁇ ⁇ z - wh • ere w ⁇ ( 2 )
  • Peak extraction 2: Classification of the datapoints and peak determination
  • a peak V is built up by datapoints ($d_n = (x_n, y_n)$) according to the following criteria:
  • In Fig. 3 is shown a portion of a raw experimental spectrum.
  • the non-constant line marked 1 (solid) connects experimental datapoints.
  • Crosses marked 3 are those datapoints that belong to the set $D_{signal}$.
  • the dots on line 1 between lines 2 and 4 are those datapoints that belong to the set $D_{support}$.
  • This one peak will have an x range, $r_x(V)$, covering approximately 6 Da. Since peaks representing different isotopes of the same singly charged peptide should be separated by a distance of approximately 1 Da, a peak cannot have an x range very much larger than 1 Da. Therefore a fourth criterion is added to the three above:
  • the present invention employs a procedure that systematically checks the value of $r_x(V)$:
  • Peak extraction 3: Computing peak properties
  • the intensity of a peak V is in the present invention taken as the maximum intensity value of those datapoints that build up the peak.
  • the center of mass and width of a peak are determined by a centroid calculation. Furthermore, the signal-to-noise ratio of the peak is determined through its maximum intensity value, using the definition of signal-to-noise ratio for an individual datapoint as described in step 8 in the passage "Separating signal from noise" above.
  • the vertical impulses marked 6 indicate the central x values ($x_c(V)$) and maximum y values ($y(V)$) for the peaks that have been extracted.
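The peak properties described above can be sketched as follows. The peak intensity is the maximum datapoint intensity, as stated in the text; taking the width as the intensity-weighted standard deviation around the centroid is an assumption of this sketch, since the exact width formula is not given here. The name `peak_properties` and the sample datapoints are hypothetical.

```python
import math

def peak_properties(points):
    """points: list of (x, y) datapoints, x = m/z and y = intensity."""
    total = sum(y for _, y in points)
    if total <= 0:
        raise ValueError("peak needs positive total intensity")
    # Centroid: intensity-weighted mean of the m/z values.
    center = sum(x * y for x, y in points) / total
    # Width (assumed definition): intensity-weighted standard deviation.
    variance = sum(y * (x - center) ** 2 for x, y in points) / total
    return {
        "intensity": max(y for _, y in points),
        "center": center,
        "width": math.sqrt(variance),
    }

print(peak_properties([(999.9, 1.0), (1000.0, 2.0), (1000.1, 1.0)]))
```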
  • Peak extraction 4: Determining peak clusters
  • the next step is to partition the peaks into peak clusters.
  • Two peaks, V and V', are defined to be neighbours in, and members of, the same peak cluster if $1 - T < r < 1 + T$, where $T \approx 0.1$ Da.
  • the reason for having the value of r in that particular range is that consecutive isotopes belonging to the same singly charged chemical compound should be separated by the mass of the neutron, approximately 1 Da.
  • An additional criterion is imposed on a cluster: at least one peak in a cluster needs to have at least one datapoint in $D_{signal}$. Such a peak is said to be in $D_{signal}$.
  • the p:th cluster can, in turn, be defined as a set of peaks:
  • $V_p(r)$ is peak number r in the p:th cluster $C_p$, and r(p) (> 1) is the number of peaks in that cluster.
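The neighbour rule can be sketched as a single pass over sorted peak centres. This sketch assumes that r is the spacing between consecutive peak centres in Da and that the bounds are strict; it does not apply the additional signal-set criterion. The name `cluster_peaks` and the sample centres are hypothetical.

```python
def cluster_peaks(centers, tol=0.1):
    """Group peak centres (Da) whose consecutive spacing r satisfies 1 - tol < r < 1 + tol."""
    clusters = []
    for x in sorted(centers):
        if clusters and 1 - tol < x - clusters[-1][-1] < 1 + tol:
            clusters[-1].append(x)   # neighbour of the previous peak
        else:
            clusters.append([x])     # start a new cluster
    return clusters

print(cluster_peaks([1000.5, 1001.5, 1002.52, 1010.0]))
```

Peaks roughly 1 Da apart end up in the same cluster, while an isolated peak forms a cluster of its own.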
  • Peak extraction 5: Finding mono-isotopic peaks in a cluster
  • the present invention proceeds now to select the mono-isotopic peaks in a peak cluster.
  • Consider the p:th cluster $C_p$ defined above, an example of which is shown in Fig. 4.
  • the present invention computes the peptide abundances that build up the cluster.
  • In Fig. 5 are shown the distributions $I(i, m)$ for the four lowest-lying isotopes. Due to statistical variations there is a width in each distribution, indicated by error bars.
  • the values of the isotopic distributions are in the present invention kept in a table.
  • the solution a can in general contain components $a_r < 0$, a non-realistic solution in the present context.
  • the way to approach the problem will instead be to find a solution a such that it is the best possible under the constraint that $a_r \ge 0$ for all r.
  • By "best possible" solution is meant the following:
  • $d(a, p, r) = y(p, r) - \left(a_1\, I(r-1, m) + \ldots + a_r\, I(0, m)\right)$
  • the present invention employs the well-known technique of quadratic programming to solve this constrained minimization problem. So, for each cluster $C_p$ there will be mono-isotopic peptides in, and only in, those positions where $a_r > 0$.
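The constrained fit can be illustrated with a small non-negative least-squares sketch. The text uses quadratic programming; the sketch below substitutes projected coordinate descent, a simple stand-in that minimises the same sum of squared residuals under the same non-negativity constraint. It also assumes one fixed isotope pattern `iso` for the whole cluster, whereas in the text the distributions depend on the mass m. The name `fit_abundances` and all numbers are hypothetical.

```python
def fit_abundances(y, iso, sweeps=500):
    """Minimise sum_r (y[r] - sum_s a[s] * iso[r - s])**2 subject to a[s] >= 0.

    y:   peak intensities of one cluster, ordered by mass
    iso: isotope pattern iso[0], iso[1], ... (assumed fixed for the cluster)
    """
    n = len(y)
    # Column s is the isotope pattern of a peptide whose mono-isotopic peak is peak s.
    cols = [[iso[r - s] if 0 <= r - s < len(iso) else 0.0 for r in range(n)]
            for s in range(n)]
    a = [0.0] * n
    resid = list(y)
    for _ in range(sweeps):
        for s, col in enumerate(cols):
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue
            step = sum(c * e for c, e in zip(col, resid)) / norm
            new = max(0.0, a[s] + step)      # project onto a[s] >= 0
            delta = new - a[s]
            if delta:
                resid = [e - delta * c for e, c in zip(resid, col)]
                a[s] = new
    return a

# Two overlapping peptides: abundance 10 starting at peak 0, abundance 5 at peak 2.
print([round(v, 3) for v in fit_abundances([6.0, 3.0, 4.0, 1.5], [0.6, 0.3, 0.1])])
```

Positions with a strictly positive fitted abundance are the ones read as mono-isotopic peaks; a production implementation would more likely call an established quadratic programming or NNLS routine.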
  • a peak in a spectrum that may contain peptides with higher charges can represent peptides of different charges. This means that the systems of equations that describe the peak intensities in terms of isotopic distributions $I(i, m)$ and peptide abundances $a_r$ may get contributions from more than one peak cluster. In this general case it is therefore necessary to introduce a procedure that creates a set of disjoint systems of equations, where a disjoint system may or may not get contributions from more than one peak cluster, and a peak cluster contributes to one and only one such disjoint system.
  • the baseline of the spectrum is calculated. This is done by finding, over the given raw data, the smallest intensity within a running window (user option: -bcr [baseline constant range]). This calculation is performed by analysis of the measured intensity values by means of histograms, and the smallest-valued bin in the histogram is chosen as the baseline value. The baseline is smoothed after the analysis to make sure that the curve is continuous.
  • the last fit is to try fitting more than one Gaussian to the peak candidate. This is done to resolve a possible superposition of peaks. If this is successful, that is, more than one peak was fitted, then these peak parameters are used, and the candidate results in more than one peak.
  • the single peptide clusters are analysed; mono-isotopic mass, charge and a quality measure are calculated for each cluster.
  • the user option -mir [min_isotopic_reduction] is used in the quality analysis of whether the single cluster is a superposition of more than one peptide differing by an integer multiple of the mass of a neutron; -mir sets the acceptable deviation from the cluster model used.
  • User option -mi [mono_isotopes] defines whether to report all extracted peaks or mono-isotopes only.
  • User option -of [overflow] sets the intensity measurement limit of the mass spectrometer. This parameter is needed since the measured isotopic clusters become corrupt if the measurements overflow.
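A minimal sketch of a running-window baseline follows. It simplifies the histogram analysis described above to a plain minimum over the window, and uses a moving average for the smoothing step; both simplifications are assumptions, as are the function names. The `half_window` argument plays the role of the baseline constant range, expressed here in samples.

```python
def running_min_baseline(intensities, half_window):
    """Baseline estimate: smallest intensity within +/- half_window samples."""
    n = len(intensities)
    return [min(intensities[max(0, i - half_window): i + half_window + 1])
            for i in range(n)]

def smooth(values, half_window):
    """Moving-average smoothing so that the baseline varies gradually."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_window): i + half_window + 1]
        out.append(sum(window) / len(window))
    return out

raw = [5, 3, 9, 4, 8]
baseline = smooth(running_min_baseline(raw, 1), 1)
print(baseline)
```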
  • the user options -im and -ic [signal-to-noise or absolute intensity] set the model to use and the cut-off value.
  • the peak extraction step described above has resulted in a set of selected peaks.
  • a user of the present invention may want to discard some peaks. Those unwanted peaks may or may not be part of the selected peaks; if they are, they will be filtered out; that is removed, in this step.
  • the invention supports the following filters and any combination of them:
  • echoes to consider: In a mass spectrum, so-called peak echoes sometimes occur. By this is meant that the experimental mass spectrum sometimes contains peaks that are false doublets of true peaks. These false doublets often appear at certain well-established distances from the true peaks. This filter handles that problem. Assume the user chooses a value d, measured in Dalton.
  • peaks to exclude: In a mass spectrum some particular m/z values often represent calibrants, parts of the digestion enzyme, or known contaminants. These m/z values can be specified by the user, and if they appear in the set of selected peaks, within a tolerance window, the peaks are removed.
  • peaks to keep: Suppose the number of selected peaks is S and the user only wants U peaks to go into the spectrum matching. In this filter those U peaks with the highest signal-to-noise ratio are kept, and the other S-U are removed.
  • width cut: A threshold for the width of peaks. Peaks with a width above the width cut value, specified by the user, are removed.
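Three of the filters described above (peaks to exclude, width cut, peaks to keep) can be sketched as one function. The echo filter is left out because its rule is not fully stated in this text. The dictionary keys, the order in which the filters are applied, and the sample values are assumptions of this sketch.

```python
def filter_peaks(peaks, exclude=(), tolerance=0.1, width_cut=None, keep=None):
    """peaks: list of dicts with keys 'mz', 'sn' (signal-to-noise) and 'width'."""
    # peaks to exclude: drop peaks within the tolerance window of a listed m/z value
    kept = [p for p in peaks
            if not any(abs(p["mz"] - x) <= tolerance for x in exclude)]
    # width cut: drop peaks wider than the threshold
    if width_cut is not None:
        kept = [p for p in kept if p["width"] <= width_cut]
    # peaks to keep: retain only the U peaks with the highest signal-to-noise ratio
    if keep is not None:
        kept = sorted(kept, key=lambda p: p["sn"], reverse=True)[:keep]
    return sorted(kept, key=lambda p: p["mz"])

peaks = [
    {"mz": 842.51, "sn": 50, "width": 0.2},
    {"mz": 1000.0, "sn": 10, "width": 0.2},
    {"mz": 1200.0, "sn": 30, "width": 0.9},
    {"mz": 1500.0, "sn": 20, "width": 0.3},
]
print(filter_peaks(peaks, exclude=[842.5], tolerance=0.05, width_cut=0.5, keep=1))
```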
  • each row contains information about an extracted peak that has survived the filter step.
  • a row consists of values for the parameters of peak properties such as m/z, intensity, peak width, signal-to-noise ratio and peptide abundancy.
  • Digestion in silico: The purpose of digestion of proteins in a protein database, so-called digestion in silico, is to mimic the enzymatic digestion of the real unknown protein that takes place in the laboratory, and hence to compute theoretical spectra, one theoretical spectrum for each database protein. Having that, the present invention can compare the experimental spectrum with each theoretical spectrum.
  • digestion has been carried out by a site-specific enzyme.
  • the enzyme cleaves, that is, cuts, the protein only at certain cleavage sites.
  • Among the digestion enzymes there is trypsin, which cleaves only on the C-terminal side of the amino acids arginine and lysine, unless there is a neighbouring proline amino acid on the C-terminal side of the arginine or lysine. If that is the case, no cleavage will be done by the trypsin enzyme at that particular site.
  • Post-translational modifications: It may happen that a protein gets modified by more or less complicated chemical compounds that cannot be predicted by only studying the nucleotide base-pair sequence of its corresponding gene.
  • One example of a post-translational modification is methionine oxidation, where the amino acid methionine acquires an extra oxygen atom.
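The trypsin rule stated above (cut on the C-terminal side of arginine or lysine unless the next residue is proline) can be sketched for one-letter sequences as follows. The function names and the sample sequence are hypothetical, and no missed cleavages or modifications are considered here.

```python
def tryptic_cleavage_sites(sequence):
    """Indices i such that trypsin cuts between sequence[i] and sequence[i + 1]."""
    return [i for i in range(len(sequence) - 1)
            if sequence[i] in "KR" and sequence[i + 1] != "P"]

def digest(sequence):
    """Split a one-letter amino acid sequence into minimal tryptic peptides."""
    peptides, start = [], 0
    for i in tryptic_cleavage_sites(sequence):
        peptides.append(sequence[start:i + 1])
        start = i + 1
    peptides.append(sequence[start:])  # the C-terminal peptide
    return peptides

# The R at index 6 is followed by P, so no cut is made there.
print(digest("MKWVTFRPLLKAA"))
```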
  • a peptide containing methionine therefore gets its mass shifted upwards by the mass of an oxygen atom.
  • Other examples of post-translational modifications can be found in [4]. Post-translational modifications can be divided into variable and fixed modifications.
  • a variable modification is such that it may or may not occur; in the example above it would mean that some of the methionine amino acids acquire an extra oxygen atom while others do not.
  • a fixed modification on the other hand, always occurs; in the example above it would mean that every methionine amino acid in each peptide acquires an extra oxygen atom.
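The difference between fixed and variable methionine oxidation can be sketched in terms of the peptide masses each produces. A peptide with n methionines yields one mass under a fixed modification and n + 1 candidate masses under a variable one. The function name, the `mode` argument and the sample base mass are hypothetical; the oxygen mass is the standard monoisotopic value.

```python
OXYGEN = 15.99491  # monoisotopic mass of one oxygen atom, in Da

def modified_masses(base_mass, sequence, mode):
    """Masses of a peptide under fixed or variable methionine oxidation."""
    n_met = sequence.count("M")
    if mode == "fixed":
        # every methionine carries the extra oxygen
        return [base_mass + n_met * OXYGEN]
    if mode == "variable":
        # anywhere from 0 to n_met methionines may be oxidised
        return [base_mass + k * OXYGEN for k in range(n_met + 1)]
    raise ValueError("mode must be 'fixed' or 'variable'")

print(modified_masses(1000.0, "AMMK", "variable"))
print(modified_masses(1000.0, "AMMK", "fixed"))
```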
  • Post-translational modifications are not really part of the protein digestion process, but need to be included when computing the theoretical spectra. It is therefore natural to incorporate this circumstance at the digestion stage.
  • 2: Missed cleavages: It may also happen that an enzyme does not cut at a site on the protein where it is allowed to cut.
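Missed cleavages can be sketched by joining runs of consecutive minimal peptides. This assumes that only adjacent minimal peptides may be combined and that the user sets a maximum number of missed cleavages; the function name and sample peptides are hypothetical.

```python
def with_missed_cleavages(minimal_peptides, max_missed):
    """Join up to max_missed + 1 consecutive minimal peptides."""
    n = len(minimal_peptides)
    out = []
    for start in range(n):
        last = min(n, start + max_missed + 1)
        for end in range(start + 1, last + 1):
            out.append("".join(minimal_peptides[start:end]))
    return out

# One allowed missed cleavage adds every pair of neighbouring minimal peptides.
print(with_missed_cleavages(["MK", "WVTFRPLLK", "AA"], 1))
```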
  • the general procedure of digestion in silico is the following: In a protein database the present invention will process entry j, where $j = 1, \ldots, N_{database}$ and $N_{database}$ is the number of proteins registered in the database. Entry j contains a unique identification tag T[j], and an amino acid sequence
  • $A(j) = a_1(j)\, a_2(j) \ldots a_k(j) \ldots a_{K(j)}(j)$
  • $a_k(j)$ is the k:th amino acid residue, counted from the N-terminal side of the protein j.
  • $a_k(j)$ can be any of the 20 amino acids found in proteins, and is represented by a one-letter code. (For a reference of their code-letters, chemical composition and mass, see [3].)
  • $K(j)$ is the number of amino acids in protein j.
  • $Y$ = trypsin, $M[Y] = 2$
  • $S_1[Y]$ = (the C-terminal side of arginine, unless there is a neighbouring proline on the C-terminal side of the arginine)
  • $S_2[Y]$ = (the C-terminal side of lysine, unless there is a neighbouring proline on the C-terminal side of the lysine).
  • $M = \{FM_1, \ldots, FM_f, \ldots, FM_F;\ VM_1, \ldots, VM_v, \ldots, VM_V\}$
  • FM and VM denote fixed and variable modifications respectively.
  • To every fixed and variable modification is assigned a mass, $m(FM_f)$ and $m(VM_v)$ respectively.
  • Counters for all members of the set $M$ are set to $n(FM_f) = 0$ and $n(VM_v) = 0$.
  • The invention reads off the protein sequence $A(j)$ from left to right, that is, from the N-terminal to the C-terminal side.
  • IV. For each amino acid that is read off in the sequence, the invention checks whether (a) the current amino acid shall acquire a post-translational modification specified by $M$.
  • Read off the next amino acid; that is, go to step III.
  • $m(p[j, c, x(c)]) \leftarrow m(p[j, c, x(c)]) + \mathrm{mass}(\text{current amino acid residue})$
  • (there are $\prod_{v=1}^{V} [n(VM_v) + 1]$ such combinations) $m(p[j, c, x(c)]) \leftarrow m(p[j, c, x(c)]) + \sum_{v=1}^{V} n'_v\, m(VM_v)$ for each value $n'_v$; $0 \le n'_v \le n(VM_v)$.
  • The integer $x(c)$ runs between 1 and $X(c)$.
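The way fixed and variable modifications multiply the number of candidate masses for one peptide can be illustrated with a short Python sketch. It assumes that each modification is described by a mass shift and a count of modifiable sites in the peptide; the function name `modified_masses` and the example numbers are hypothetical, and the oxygen mass is the standard monoisotopic value rather than a figure taken from the text.

```python
from itertools import product

OXIDATION = 15.9949  # monoisotopic mass of one oxygen atom in Da (assumed value)


def modified_masses(base_mass, fixed, variable):
    """Enumerate peptide masses.

    fixed, variable: lists of (mass_shift, n_sites) pairs.
    A fixed modification is applied at every site; a variable one may be
    applied at 0..n_sites of its sites.
    """
    mass = base_mass + sum(shift * n for shift, n in fixed)
    masses = []
    # One counter per variable modification, each running from 0 to n_sites.
    for counts in product(*(range(n + 1) for _, n in variable)):
        extra = sum(c * shift for c, (shift, _) in zip(counts, variable))
        masses.append(mass + extra)
    return masses


# A peptide of unmodified mass 1000.0 Da with two methionines:
print(modified_masses(1000.0, fixed=[], variable=[(OXIDATION, 2)]))
print(modified_masses(1000.0, fixed=[(OXIDATION, 2)], variable=[]))
```

With oxidation as a variable modification the peptide contributes three masses (zero, one or two oxidised methionines); as a fixed modification it contributes exactly one.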
  • V. Take into account missed cleavages: Having performed in silico digestion at every allowed cleavage site of protein $T[j]$, there is a set of minimal peptides $p[j, c', x(c')]$, where $1 \le c' \le c$ and $1 \le x(c') \le X(c')$.
  • The invention now computes and registers the masses for all possible combinations of minimal peptides, with the restriction that for each member of the combination, $p[j, c', x(c')]$ say, there has to be at least one other member, $p[j, c'', x(c'')]$, such that
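A hedged sketch of the missed-cleavage step follows. The restriction in the text is cut off in this copy, so the sketch assumes the usual reading that only neighbouring minimal peptides may be joined, which makes every combination a contiguous run. The function name `with_missed_cleavages` and the optional `max_missed` limit are hypothetical additions.

```python
def with_missed_cleavages(minimal_peptides, max_missed=None):
    """Join contiguous runs of minimal peptides.

    Returns (peptide, number_of_missed_cleavages) pairs. Assumes that only
    neighbouring minimal peptides may be combined.
    """
    n = len(minimal_peptides)
    result = []
    for start in range(n):
        for end in range(start, n):
            missed = end - start            # cleavage sites skipped inside the run
            if max_missed is not None and missed > max_missed:
                break
            result.append(("".join(minimal_peptides[start:end + 1]), missed))
    return result


for peptide, missed in with_missed_cleavages(["AK", "RPGK", "MR"], max_missed=1):
    print(peptide, missed)
```

With three minimal peptides and at most one missed cleavage this yields five peptides; without the limit it also yields the full sequence with two missed cleavages.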
  • Distribution 3: the frequency distribution of peptide masses for peptides whose parent proteins have given rise to $N$ peptides.
  • Distributions 1 and 2 are shown in figures 7 and 8.
  • Distribution 2 is the sum of distributions 3 over all values of $N$, keeping the peptide mass fixed. In the present invention these three distributions are kept in memory and are used in the spectrum matching and validation algorithms described below.
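The relation between the mass distribution conditioned on N and the overall mass distribution can be illustrated by counting peptides in mass bins. This is only a sketch: the bin width, the function name `mass_distributions` and the toy masses are assumptions, and the text does not say how the invention bins or stores the distributions.

```python
from collections import Counter


def mass_distributions(proteins, bin_width=1.0):
    """proteins: one list of peptide masses per digested database protein.

    Returns (joint, marginal):
      joint[(mass_bin, N)] counts peptides in a mass bin whose parent protein
                           gave rise to N peptides;
      marginal[mass_bin]   is the same count summed over all N.
    """
    joint = Counter()
    for peptide_masses in proteins:
        n = len(peptide_masses)                 # N for this parent protein
        for mass in peptide_masses:
            joint[(int(mass // bin_width), n)] += 1
    marginal = Counter()
    for (mass_bin, _n), count in joint.items():
        marginal[mass_bin] += count             # sum over N at fixed mass
    return joint, marginal


joint, marginal = mass_distributions([[1000.2, 1500.7], [1000.9, 1200.1, 1500.3]])
print(joint[(1000, 2)], joint[(1000, 3)], marginal[1000])
```

Summing the per-N counts at a fixed mass bin reproduces the overall count for that bin, which is the relationship the text describes.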
  • The present invention computes the probability for the set of peak matches to occur.
  • The score is then taken as the negative logarithm of that probability, so that a high score value reflects an unlikely event, and hence a high degree of spectrum resemblance; that is, a good match.
  • There are, of course, different ways to do this, which basically means different ways to define the set of peak matches and different ways of computing probabilities.
  • The aim in the present invention is to reward many matches, and at the same time to take into account the propensity for large database proteins to have many matches.
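To make the idea of a negative-logarithm score concrete, the following sketch uses a simple binomial model. This model is an assumption chosen for illustration and is not the score definition of the invention: each of the N theoretical peptides is taken to match by chance with the same probability `p_match`, and the score is the negative natural logarithm of the probability of seeing at least `k` matches.

```python
import math


def chance_probability(k, n, p_match):
    """Probability of at least k random matches among n peptides (binomial tail)."""
    return sum(
        math.comb(n, i) * p_match**i * (1.0 - p_match) ** (n - i)
        for i in range(k, n + 1)
    )


def score(k, n, p_match):
    """Negative log of the chance probability: unlikely events score high."""
    return -math.log(chance_probability(k, n, p_match))


# Five matches are far more surprising for a small protein than for a large one.
print(score(5, 20, 0.1))
print(score(5, 100, 0.1))
```

Under this toy model the same five matches give a much higher score for a protein with 20 peptides than for one with 100, which is the kind of correction for large proteins that the text aims for.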
  • $z_i(j)$ is the mass of the i-th peak in the theoretical spectrum of protein $T[j]$, and $N(j)$ is the number of peptides that resulted from in silico digestion of protein $T[j]$.
  • The present invention calculates the probability for the set of matches between the peak list $x$ and the theoretical spectrum $z(j)$ as described by $M[j]$.
  • Eq. 8 is taken as the general definition of a score value in the context of the present invention. There are now different ways to specialize that general expression. Here two examples of such specializations will be given.
  • is taken to be the probability that a peptide whose parent protein has given rise to N(j) peptides will find a match with one of the L peaks in x. K is then the probability for no match.
  • [Equation (33): the size a random database needs] in order to contain $n_{random}(s_c)$ random proteins that reach a score value of at least $s_c$.
  • $n_{database}(s_c)$ is the number of real database proteins that reached a score value of at least $s_c$.
  • $N_{database}$ is the size of the real protein database.
  • $p = p(s_c)$ is defined as the random probability of getting at least one protein with a score value of at least $s_c$, given the size of the real database, $N_{database}$. This implies
  • The parameters score value, quality measure and p-value are in the present invention calculated and reported for every database protein.
  • GUI: Graphical User Interface
  • The GUI is designed to be platform independent. By this is meant that the computer code is written such that the GUI can run on any computer irrespective of the computer's operating system.
  • The invention is designed so that the Main Application can run independently of the GUI.
  • The invention can, through the GUI, be run in a stepwise manner. This means that each algorithmic step, described above in the section "The Algorithms", can be executed such that the following step in the algorithmic flow will not be executed before the user chooses to do so.
  • GUI workspace: As illustrated, the workspace is divided into
  • The first two and the fourth workspace areas are interactive. By this is meant that when a user clicks on one of the items in those areas, the user can either select input for the algorithms or start a process.
  • The menu bar at the top of the workspace, see figure 11, consists of five menus with the following features:
  • Edit: The Edit menu contains one item, Boxes, that gives the user access to all the option menus for the analysis boxes in the workspace. It is shown in figure 13.
  • View: The View menu is used to specify the items a user wants to see on the desktop. It has only one option, Hide icon toolbar, which specifies whether the icons at the top of the workspace should be shown or not.
  • The Preferences menu is divided into
  • Actions: The Actions menu, see figure 14, has the following items:
  • Run step: Runs only the highlighted analysis step.
  • By highlighted is meant that the title bar of the corresponding analysis box is coloured.
  • Run batch: Runs the entire analysis on every spectrum selected by the user.
  • Run spectrum: Runs the entire analysis on a single spectrum selected by the user.
  • Halt process: Halts a batch run.
  • The icon bar, see figure 11, consists of a set of often-used icons that correspond to features and functions controlled in the menu bar, described above.
  • The status bar is located at the bottom of the workspace, see figure 11. It contains updated information about what is currently being processed by the present invention.
  • MSFiles: Selects the files containing the raw spectra to be analysed. Input from: the user. Output to: Pepex/the screen.
  • Pepex: Extracts mono-isotopic peaks from the raw spectrum. Input from: MSFiles/the user. Output to: Pepfil/the screen/file.
  • Pepfil: Filters peaks from the peak list created by Pepex. Possibility for recalibration of a spectrum. Input from: Pepex/the user. Output to: Matcher/the screen/file.
  • The MSFiles box: When clicking on the box MSFiles, or on the MSFiles field in the Edit menu, a window appears, see figure 15. There the user can indicate the file (containing data for a raw experimental spectrum) or set of files to be analysed. By clicking on "Add Files" a browser appears and the user can choose to run one spectrum, or many spectra as a batch job. Note that a user, by clicking on the "Masslist" button, can also send in a list of already extracted peak mass values. In that case the present invention skips the peak extraction step and moves directly to the peak filtering step.
  • The left side of the window contains a list of the filters that are supported in the present invention.
  • The right side contains the filters that are currently active. By writing in the Parameter fields, the user can choose values that correspond to the active filters.
  • The supported filters, which were described above in the section "The Algorithms", are the following:
  • buttons that control which filters are to be used:
  • Deactivating a filter is done by marking a filter on the right side, followed by clicking on the ⁇ button.
  • The floppy disk button enables the user to save the current filter setup to a file.
  • This window contains a set of option fields.
  • Validation database size: This is the number of random proteins that is used when calculating the function $F_{random}$ defined above in the section "The Algorithms".
  • The GUI output from a run of the present invention can be divided into
  • In FIG 19 is shown the GUI workspace after a run of the present invention.
  • Three spectrum graphs are presented. These graphs are related to, from top to bottom, the experimental raw spectrum, the extracted peptide peaks, and the peptide peaks left after filtering.
  • A user can visually study the extracted peaks and relate them to the experimental spectrum.
  • A zooming function enables detailed study over the whole m/z range.
  • The Results window
  • In the Results window a user gets information about the top-scoring protein candidates as well as about the peaks that were used in the database search.
  • In FIG 20 is illustrated a list of the top-scoring protein candidates.
  • The proteins are listed in descending order of spectrum resemblance in comparison with the filtered peak list extracted from the experimental spectrum.
  • The list contains a number of rows (as many as chosen by the user in the Matcher box) where each row holds search results for one database protein. In the present illustration there are five columns for each row:
  • quality flag: The flag is a small rectangle whose colour depends on the quality measure of the fifth column and on chosen cut-off values. In the present illustration it is implemented such that a quality value above 7 gives a green flag, a quality value between 3 and 7 gives a yellow flag, and a quality value below 3 gives a red flag. The statistical significance of the quality measure was discussed in the "Algorithms" section above.
  • protein id: In this column is reported the name and database identification tag of the database protein. Denoted "Protein id" in the present illustration.
  • This window contains detailed information about that particular protein, and is described below.
  • Peaks
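The colour rule for the quality flag described above can be written as a small function. The cut-off values 3 and 7 come from the text; the function name `quality_flag` and the treatment of values exactly equal to a cut-off (counted as yellow here) are assumptions.

```python
def quality_flag(quality, low=3.0, high=7.0):
    """Map a quality value to a flag colour using two cut-off values."""
    if quality > high:
        return "green"
    if quality >= low:      # boundary handling is an assumption
        return "yellow"
    return "red"


for q in (1.0, 3.89, 12.1):
    print(q, quality_flag(q))
```

Keeping the cut-offs as parameters mirrors the remark that the colour depends on chosen cut-off values.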
  • In FIG 21 is illustrated a list of the peaks that were used in the database search.
  • The upper field contains only the m/z values for the peaks.
  • By clicking on the "Copy" button a user can copy and then paste the set of m/z values into any desired application.
  • rows where more detailed information about every selected peak is given. In the present illustration there are six columns for each row; that is, for each peak V that is used in the database search:
  • intensity: The absolute intensity value of the peak. It is calculated as y(V), described in the passage "Computing peak properties" of the "Algorithms" section. It is denoted by "Intensity" in the present illustration.
  • In FIG 22 is shown an example of such a window, where a user, among other things, can find information about
  • PC hardware requirements: PCs, that is, personal computers, or workstations.
  • a client-server solution in which the binary files executing the algorithms are run on a server and the GUI is run on one or many clients.
  • The protein database should be formatted in the so-called FASTA format and be stored on the server if a client-server solution is the user's choice. If a stand-alone version is used, the formatted protein database is stored on the computer at hand.
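For readers unfamiliar with FASTA, each entry consists of a header line starting with `>` followed by one or more sequence lines. The sketch below reads such entries into (tag, sequence) pairs; the function name `read_fasta` and the example entries are hypothetical, and the text does not describe how the invention itself parses the file.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) pairs."""
    entries = []
    tag, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if tag is not None:                 # close the previous entry
                entries.append((tag, "".join(chunks)))
            tag, chunks = line[1:], []
        elif tag is not None:
            chunks.append(line)                 # sequence may span several lines
    if tag is not None:
        entries.append((tag, "".join(chunks)))
    return entries


example = [
    ">prot1 hypothetical example protein",
    "AKRPGK",
    "MR",
    ">prot2 another made-up entry",
    "GGKLLR",
]
print(read_fasta(example))
```

The header of each entry plays the role of the identification tag, and the joined sequence lines give the amino acid sequence in one-letter code.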
  • A rough outline of the method is the following.
  • The whole chosen database is digested. Then, for every peptide mass, the two numbers i and j are calculated, such that
  • The score value for an entry in the protein database is now defined as
  • The spectrum resemblance to the peak list x is computed. This resemblance is based on how well peaks from z(j) and x match each other. The criterion for a match between two peaks is that their mutual distance has to be less than e. The spectrum resemblance is now based on a score that is written as
  • contains a scoring method based on the probability for matches and misses between the experimental spectrum and a theoretical spectrum.
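The match criterion used in these outlines, that two peaks match when their distance is below a tolerance, can be sketched as follows. The function name `count_matches` and the example masses are hypothetical, and the sketch only counts matches and misses; it does not reproduce any of the score formulas.

```python
def count_matches(theoretical, experimental, tolerance):
    """Count theoretical peaks with an experimental peak closer than tolerance.

    Returns (matches, misses) for the theoretical spectrum.
    """
    matches = 0
    for z in theoretical:
        if any(abs(z - x) < tolerance for x in experimental):
            matches += 1
    return matches, len(theoretical) - matches


theoretical = [1000.50, 1500.75, 2000.10]
experimental = [1000.52, 1999.00]
print(count_matches(theoretical, experimental, tolerance=0.05))
```

For sorted peak lists a two-pointer sweep would avoid the nested scan, but the quadratic version keeps the criterion easy to read.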
  • Peak extraction that is not followed by the identification steps is valuable in many circumstances.
  • One such case is when a user is only interested in visually inspecting spectrum differences, for example comparing a protein spectrum from a healthy cell sample with a spectrum from a diseased cell sample. It is, however, also valuable if a user wants to perform protein identification using, in parallel with the present invention, some other protein identification software. This can easily be done using the "Copy" function available for the peak list in the GUI Results window, described above. Running different methods in parallel, in order to raise the statistical significance, is not uncommon when doing protein identification.
  • The example spectrum file is called spectrum.txt. It is loaded by choosing the box MSFiles under the Edit menu, clicking on the "Add Files" button, and then using the browser to select the spectrum file.
  • The present invention will process the spectrum, with all user options set to their default values.
  • The search result is shown in figure 23.
  • The best scoring protein candidate gets a score value ("Score") of 14.52, a p-value ("Probability") of 0.15 and a quality value ("Quality") of 3.89.
  • Score: score value
  • Probability: p-value
  • Q: quality value
  • The quality value Q is directly related to the size needed of a random database in order to expect a certain score value. That size is in the present case exp(3.89) ≈ 50 times the size of the actual random database used for the search, which is perhaps not a very convincing number.
  • The colour-coded flags in the left column also help a user to quickly assess the statistical significance of the search result.
  • A user can choose to change the values of the option parameters from their default values.
  • The user uses the filter option to remove known peaks from the digestive enzyme, and takes into account the possibility of missed cleavages and known post-translational modifications.
  • The present invention is rerun, and the result is shown in figure 24.
  • As can be seen there is a new top-scoring candidate. It gets a score value ("Score") of 16.42, a p-value ("Probability") ≈ 0, and a quality value Q = 12.1.
  • The needed size of a random database for expecting the reported score value is in the present case exp(12.1) ≈ 180000 times the size of the actual random database used for the search.
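The two size factors quoted in this example follow directly from exponentiating the quality value, as a quick check shows. The function name `random_database_factor` is hypothetical.

```python
import math


def random_database_factor(quality):
    """Factor by which the random database would have to grow: exp(Q)."""
    return math.exp(quality)


for q in (3.89, 12.1):
    print(q, round(random_database_factor(q)))
```

The first value comes out at about 49, which the text rounds to 50, and the second at just under 180000.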
  • The top candidate is also the correct protein.
  • A quick glance by the user at the colour-coding of the flags in the left column gives strong support for the top-scoring candidate actually being the unknown protein. Additional support is also given by the following three candidates, all of the same type as the top candidate.
  • The present invention allows for running many experimental spectra in one go, in so-called batch jobs. This is extremely valuable, and basically the only feasible method when a user wants to perform high-throughput screening of many spectra in a fast and automated fashion.
  • A user selects all the desired spectrum files in the MSFiles box, and selects "Run batch" in the "Actions" menu. After a batch run a user clicks anywhere on the workspace, and a list of the processed spectra appears, as shown in figure 25. The result for each individual spectrum can then be studied in the same way as after processing a single spectrum, with access to spectrum graphs, lists of top-scoring protein candidates etc.

Abstract

The invention relates to a computer system and a method for selecting one or more candidate proteins from a plurality of proteins stored in a database. In particular, the invention relates to a method providing a measure applicable to the selection of proteins in a database storing a digital representation of a theoretical mass spectrum for a plurality of proteins. The method comprises the following steps: extracting, without background noise, mono-isotopic peptide peaks from the digital representation of the mass spectrum to be identified, based on a digital representation of a mass spectrum of a protein to be identified in the form of corresponding values of entity mass/electric charge and intensity, providing, if possible, a list of selected peaks; selecting at least one candidate protein from the database, said candidates being selected on the basis of a goodness-of-fit algorithm providing a set of peak matches between the list of selected peaks and peaks of the candidate proteins, and determining the probability that said set of matches occurs, which yields a measure applicable in the selection of candidate proteins. The invention also relates to a graphical user interface facilitating the use of said method.
PCT/IB2002/004839 2001-11-01 2002-11-01 Computer system and method using mass spectrometry data and a protein database for identifying unknown proteins WO2003038728A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002347462A AU2002347462A1 (en) 2001-11-01 2002-11-01 A computer system and method using mass spectrometry data and a protein database for identifying unknown proteins

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DKPA200101616 2001-11-01
DKPA200101616 2001-11-01

Publications (2)

Publication Number Publication Date
WO2003038728A2 true WO2003038728A2 (fr) 2003-05-08
WO2003038728A3 WO2003038728A3 (fr) 2003-11-06

Family

ID=8160803

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2002/004839 WO2003038728A2 (fr) 2001-11-01 2002-11-01 Computer system and method using mass spectrometry data and a protein database for identifying unknown proteins

Country Status (2)

Country Link
AU (1) AU2002347462A1 (fr)
WO (1) WO2003038728A2 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103389335A (zh) * 2012-05-11 2013-11-13 中国科学院大连化学物理研究所 An analysis device and method for identifying biological macromolecules

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5538897A (en) * 1994-03-14 1996-07-23 University Of Washington Use of mass spectrometry fragmentation patterns of peptides to identify amino acid sequences in databases
WO2000073787A1 (fr) * 1999-05-27 2000-12-07 Rockefeller University Expert system for protein identification using mass spectrometric information combined with database searching
WO2001057519A2 (fr) * 2000-02-07 2001-08-09 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. Method for identifying and/or characterising a (poly)peptide
WO2002021139A2 (fr) * 2000-09-08 2002-03-14 Oxford Glycosciences (Uk) Ltd. Automated identification of peptides

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
BERNDT P ET AL: "Reliable automatic protein identification from matrix-assisted laser desorption/ionization mass spectrometric peptide fingerprints" ELECTROPHORESIS, WEINHEIM, DE, vol. 20, 18 December 1999 (1999-12-18), pages 3521-3526, XP002223132 ISSN: 0173-0835 *
BREEN E J ET AL: "AUTOMATIC POISSON PEAK HARVESTING FOR HIGH THROUGHPUT PROTEIN IDENTIFICATION" , ELECTROPHORESIS, VOL. 21, PAGE(S) 2243-2251 , WILEY-VCH VERLAG GMBH, WEINHEIM DE, XP001040008 ISSN: 0173-0835 page 2243, column 2, line 10 - line 12 page 2250, column 2, paragraph 2 abstract *
MARCO WEHOFSKY ET AL: "Automated deconvolution and deisotoping of electrospray mass spectra." JOURNAL OF MASS SPECTROMETRY, vol. 37, 2002, pages 223-229, XP002902843 John Wiley & Sons ltd 2002 *
MARTIN ETHIER ET AL: "Automated structural assignment of derivatized complex N-linked oligosaccharides from tandem mass spectra." RAPID COMMUNICATIONS IN MASS SPECTROMETRY, vol. 16, 2002, pages 1743-1754, XP002902846 2002 John Wiley & sons ltd *
OLE N JENSEN ET AL: "Delayed extraction improves specificity in database searches by matrix-assisted laser desorption / ionization peptide maps" RAPID COMMUNICATIONS IN MASS SPECTROMETRY, vol. 10, 23 July 1996 (1996-07-23), pages 1371-1378, XP002902844 1996 John Wiley & Sons ltd ISSN: 0951-4198 *
PAPPIN D J C ET AL: "RAPID IDENTIFICATION OF PROTEINS BY PEPTIDE-MASS FINGERPRINTING." 1993 , CURRENT BIOLOGY, CURRENT SCIENCE,, GB, VOL. 3, NR. 6, PAGE(S) 327-332 XP000856937 ISSN: 0960-9822 the whole document *
PERKINS D N ET AL: "PROBABILITY-BASED PROTEIN IDENTIFICATION BY SEARCHING SEQUENCE DATABASES USING MASS SPECTROMETRY DATA." , ELECTROPHORESIS, VOL. 20, PAGE(S) 3551-3567 , WILEY-VCH VERRLAG GMBH , WEINHEIM (DE) XP001051561 ISSN: 0173-0835 the whole document *
ROBIN GRAS ET AL: "Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimized peak detection" ELECTROPHORESIS, vol. 20, 1999, pages 3535-3550, XP002902845 Wiley-VCH Verlag GMBH 1999 ISSN: 0173-0835 *
WENZHU ZHANG ET AL: "Profound: An expert system for protein identification using mass spectrometric peptide mapping information." ANALYTICAL CHEMISTRY, vol. 72, no. 11, 1 June 2000 (2000-06-01), pages 2482-2489, XP002902847 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2410608A (en) * 2004-02-02 2005-08-03 Agilent Technologies Inc System and methods for mass spectrometry analysis and dynamic library searching
GB2410608B (en) * 2004-02-02 2010-07-21 Agilent Technologies Inc Dynamic library searching
WO2006133568A1 (fr) * 2005-06-16 2006-12-21 Caprion Pharmaceuticals Inc. Virtual mass spectrometry
US7894650B2 (en) * 2005-11-10 2011-02-22 Microsoft Corporation Discover biological features using composite images
US8275185B2 (en) 2005-11-10 2012-09-25 Microsoft Corporation Discover biological features using composite images
WO2008007821A1 (fr) * 2006-07-12 2008-01-17 Korea Basic Science Institute Method for reconstructing a protein database and method for identifying proteins using that method
US8296300B2 (en) 2006-07-12 2012-10-23 Korea Basic Science Institute Method for reconstructing protein database and a method for screening proteins by using the same method

Also Published As

Publication number Publication date
AU2002347462A1 (en) 2003-05-12
WO2003038728A3 (fr) 2003-11-06

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: COMMUNICATION PURSUANT TO RULE 69 EPC (EPO FORM 1205A OF 240804)

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP