Background
Identifying proteins from tandem mass spectra is one of the most widely used techniques in current proteomics research (reference: Aebersold, R. and Mann, M. Mass spectrometry-based proteomics. Nature, 2003, 422:198-207). A central problem is how to automatically identify, from an experimental tandem mass spectrum, the sequence of the peptide that produced it. Database search methods are widely used for this purpose (references: Eng, J.K., McCormack, A.L. and Yates, J.R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom, 1994, 5:976-989; Perkins, D.N., Pappin, D.J., Creasy, D.M. and Cottrell, J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 1999, 20:3551-3567; Field, H.I., Fenyö, D. and Beavis, R.C. RADARS, a bioinformatics solution that automates proteome mass spectral analysis, optimises protein identification, and archives data in a relational database. Proteomics, 2002, 2:36-47). In this approach, each peptide sequence in the database is fragmented theoretically into fragment ions to generate a theoretical tandem mass spectrum, while the peptide to be identified is fragmented in the mass spectrometer to generate an experimental tandem mass spectrum. Each theoretical spectrum is compared with the experimental spectrum to score the corresponding candidate peptide, and the peptide whose theoretical spectrum is most similar to the experimental spectrum is finally selected as the identification result.
A key problem in database search is therefore computing an appropriate similarity between the theoretical and experimental tandem mass spectra, that is, choosing an appropriate peptide scoring algorithm. An unsuitable similarity measure (equivalently, an unsuitable peptide scoring algorithm) increases the number of incorrect peptide identifications, i.e., false positives, whereas a suitable scoring algorithm reduces them.
The scoring functions in existing peptide scoring algorithms usually assume that fragment ions occur in a tandem mass spectrum independently of one another, and therefore take a linear form. Linear scoring completely ignores any correlation that may exist between fragment ions: every ion match between the experimental and theoretical spectra is weighted equally when the total score is computed. In practice, the incomplete predictability of peptide fragmentation patterns, the unrecoverable information lost during fragmentation, and the enormous number of candidate peptides all cause random, erroneous matches to occur frequently, which can ultimately lead to incorrect peptide identifications, i.e., false positives.
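The independence assumption can be made concrete with a minimal Python sketch. It assumes that matched intensities and theoretical intensities are stored as equally sized matrices (one row per fragment ion type, one column per cleavage position) and uses a plain sum of products as a stand-in for a generic linear score; `linear_score` is a hypothetical name, not a specific published scoring function.

```python
def linear_score(C, T):
    """Sum of c_ij * t_ij over all entries (a stand-in for a generic linear score).

    Every matched ion contributes on its own, so the score cannot tell
    whether matches are adjacent within a row or scattered.
    """
    return sum(c * t for row_c, row_t in zip(C, T) for c, t in zip(row_c, row_t))


# One ion-type row with 6 cleavage positions; theoretical intensities all 1.
T = [[1, 1, 1, 1, 1, 1]]
consecutive = [[1, 1, 1, 0, 0, 0]]  # three adjacent matched ions
scattered = [[1, 0, 1, 0, 0, 1]]    # three isolated matched ions
print(linear_score(consecutive, T), linear_score(scattered, T))  # 3 3
```

Both patterns receive the same score, which is exactly the behavior the next paragraph argues against.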
In fact, when a peptide is fragmented, whether theoretically or experimentally, the consecutive fragment ions it produces are potentially positively correlated. When positively correlated ions are matched together, these matches, taken as a whole, should intuitively carry higher confidence than the same matches taken individually. Such positively correlated ions should therefore be emphasized to some extent, which calls for a nonlinear peptide scoring function.
Summary of the invention
One object of the present invention is to provide a method for identifying peptides using tandem mass spectrometry data that adopts a new peptide scoring method. Another object of the present invention is to provide a method for identifying peptides using tandem mass spectrometry data that takes into account the correlation between consecutive fragment ions.
To achieve these objects, the invention provides a method for identifying peptides using tandem mass spectrometry data, comprising the steps of:
fragmenting the peptide to be identified experimentally to generate an experimental tandem mass spectrum;
fragmenting a plurality of candidate peptides in a database theoretically to generate a plurality of theoretical tandem mass spectra;
computing the similarity of each of the theoretical tandem mass spectra to the experimental tandem mass spectrum with a radial basis function (RBF) kernel, the RBF kernel comprising an exponential part;
selecting, according to the computed similarities, the peptide corresponding to the theoretical tandem mass spectrum most similar to the experimental tandem mass spectrum as the identification result.
The method further comprises denoising the experimental tandem mass spectrum.
The step of generating the theoretical tandem mass spectra further comprises selecting the fragment ion types.
The exponential part of the RBF kernel comprises a summation over consecutive fragment ions.
The step of computing the similarity between the theoretical tandem mass spectra and the experimental tandem mass spectrum further comprises:
arranging the theoretical tandem mass spectrum and the experimental tandem mass spectrum into a matrix T and a matrix C, respectively, according to the selected fragment ion types and the cleavage positions of the fragment ions, such that consecutive fragment ions occupy consecutive positions in one row of the matrix;
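The RBF kernel has the form

$$\mathrm{RBF\text{-}KSDP}(C,T)=\sum_{i=1}^{m}\sum_{j=1}^{n}\exp\left(-\gamma\sum_{k=j-l_1}^{j+l_2}\left(c_{ik}-t_{ik}\right)^{2}\right)$$

where $c_{ik}$ and $t_{ik}$ are the elements of matrix C and matrix T, respectively, and are set to 0 when $k \le 0$ or $k > n$; m is the number of selected fragment ion types and n is the number of cleavage positions; the integers $l_1$ and $l_2$ equal $\lfloor (l-1)/2 \rfloor$ and $\lceil (l-1)/2 \rceil$, respectively; the integer l is the number of consecutive fragment ions to be considered; and $\gamma$ is an adjustable parameter. Preferably, l = 5 and $0.8 \le \gamma \le 1$.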
The method of the present invention evaluates the similarity between each theoretical tandem mass spectrum and the experimental tandem mass spectrum with an RBF kernel and, by summing over consecutive fragment ions within the exponential part of the kernel, emphasizes the positive correlation between consecutive fragment ions. It therefore achieves higher accuracy than prior-art peptide identification methods and markedly reduces false positives.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, two amino acids can be joined by a peptide bond formed between the C-terminus of one and the N-terminus of the other, with the loss of one water molecule. A peptide is a sequence of amino acid residues linked to one another by peptide bonds, and this sequence determines the identity of the peptide.
To identify the amino acid sequence of a peptide, the peptide is ionized and enters the mass spectrometer. In the mass spectrometer, peptide ions with a specific mass-to-charge ratio (m/z), which usually also share the same amino acid sequence, are fragmented by collision-induced dissociation (CID). Under low-energy CID, the peptide backbone usually breaks in one of three ways, generating six series of fragment ions: the N-terminal a, b and c series and the C-terminal x, y and z series, as shown in Fig. 2. Fig. 2 shows an example of the fragment ions formed under CID from a peptide consisting of four amino acid residues. The indices 1 to 3 on the letters a, b, c, x, y and z, which denote the fragment ion series, indicate the cleavage position of the peptide that generates each fragment ion, and the symbol H+ in the upper right corner of Fig. 2 denotes the positive charge carried by the peptide.
The m/z values of these fragment ions are measured experimentally, forming a tandem mass spectrum, also called the experimental tandem mass spectrum. Fig. 3 shows an exemplary experimental tandem mass spectrum: the abscissa is the m/z of the detected fragment ions and the ordinate is their relative intensity. Besides predictable fragment ions, peaks in the spectrum may also come from unpredictable fragment ions (such as internal ions) or from physical or chemical noise, so the experimental tandem mass spectrum usually needs to be denoised. A simple approach is to keep a certain portion of the most intense peaks and remove the rest; for example, one embodiment keeps only the 200 most intense peaks.
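The simple denoising rule of the embodiment (keep the 200 most intense peaks) can be sketched in a few lines of Python. The representation of a spectrum as a list of `(mz, intensity)` tuples and the function name `denoise` are assumptions made for illustration.

```python
def denoise(peaks, keep=200):
    """Keep the `keep` most intense peaks of a spectrum.

    peaks: list of (mz, intensity) tuples (an assumed representation).
    Returns the retained peaks re-sorted by m/z.
    """
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:keep]
    return sorted(strongest)  # tuples sort by m/z first


spectrum = [(100.0, 5.0), (200.0, 50.0), (300.0, 1.0), (400.0, 20.0)]
print(denoise(spectrum, keep=2))  # [(200.0, 50.0), (400.0, 20.0)]
```

A fixed peak count is the rule named in the embodiment; keeping a fraction of peaks instead would only change how `keep` is computed.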
To identify a peptide sequence from a tandem mass spectrum, the process that generates a tandem mass spectrum must be simulated for each candidate peptide sequence in a database of known peptides. The simulated spectrum is called a theoretical tandem mass spectrum, and each candidate peptide sequence corresponds to one theoretical tandem mass spectrum. When generating theoretical spectra, the fragment ion types to be considered are first selected according to the type and characteristics of the mass spectrometer. For example, one embodiment considers only the a, b and y series of Fig. 2, because a, b and y ions (including singly and multiply charged ions and ions that have lost water or ammonia) usually dominate. Those skilled in the art will readily appreciate that fragment ion types different from those of this embodiment can be selected according to actual conditions. After the fragment ion types are selected, the peptide sequence is fragmented in silico, and the m/z and intensity of every fragment ion of the selected types are predicted to form the theoretical mass spectrum. The m/z of a fragment ion equals the mass of the ion divided by its charge number. Predicting the theoretical intensity of fragment ions is a separate research problem; in the simple case, all intensities can be set to 1, i.e., all ions are assumed to be equally likely to appear.
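As an illustration of in-silico fragmentation, the following Python sketch computes singly charged b and y ion m/z values for a peptide, with all theoretical intensities implicitly equal to 1 as in the simple case. It uses standard monoisotopic residue masses (unmodified cysteine) and indexes column j by the cleavage between residue j and residue j+1; the a series, other charge states, and the function name `b_y_ions` are simplifications or assumptions of this sketch, not part of the described method.

```python
# Standard monoisotopic residue masses in Da (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton, Da
WATER = 18.010565  # monoisotopic mass of H2O, Da


def b_y_ions(peptide):
    """Singly charged b and y m/z values, one per cleavage position.

    For a peptide with n+1 residues there are n cleavage positions; at
    position j the b ion holds the first j residues and the complementary
    y ion holds the remaining n+1-j residues (plus water).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    b, y = [], []
    prefix = 0.0
    for j in range(1, len(masses)):
        prefix += masses[j - 1]
        b.append(prefix + PROTON)                  # (M_prefix + H+) / 1
        y.append(total - prefix + WATER + PROTON)  # (M_suffix + H2O + H+) / 1
    return b, y
```

For example, `b_y_ions("GA")` yields one cleavage position with a b ion near 58.0287 and a y ion near 90.0550. Storing both series by cleavage position lines them up column by column, which is the arrangement the predicted ion array relies on.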
The predicted ions are arranged into an array according to the selected fragment ion types and the cleavage positions corresponding to the fragment ions; this array is called the predicted ion array. Fig. 4 shows an embodiment of a predicted ion array in which the selected fragment ion types are the b and y series, specifically $b$, $b^{0}$, $b^{*}$ and $b^{++}$ and $y$, $y^{0}$, $y^{*}$ and $y^{++}$. The superscript ++ indicates that the ion carries two positive charges, no superscript indicates that it carries one positive charge, the superscript * indicates that the ion has lost one ammonia molecule, and the superscript 0 indicates that it has lost one water molecule. The indices 1 to n on $b$, $b^{0}$, $b^{*}$, $b^{++}$ and $y$, $y^{0}$, $y^{*}$, $y^{++}$ denote the cleavage position of the peptide that generates the fragment ion. In Fig. 4, the fragment ion types are arranged vertically and the cleavage positions that generate the fragment ions are arranged horizontally to form the predicted ion array.
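A predicted ion array of the kind shown in Fig. 4 can be sketched by expanding the singly charged b and y values into their neutral-loss and doubly charged variants. This Python sketch assumes the singly charged m/z lists are already computed (one value per cleavage position) and orders the rows as they are listed in the text; the actual row order in Fig. 4 may differ, and the function names are hypothetical.

```python
PROTON = 1.007276    # Da
WATER = 18.010565    # H2O, Da
AMMONIA = 17.026549  # NH3, Da


def ion_variants(singly_charged, series):
    """Expand one series X into X, X0 (-H2O), X* (-NH3) and X++ rows."""
    return {
        series: list(singly_charged),
        series + "0": [mz - WATER for mz in singly_charged],
        series + "*": [mz - AMMONIA for mz in singly_charged],
        # (M + H)/1 -> (M + 2H)/2 for the doubly charged ion
        series + "++": [(mz + PROTON) / 2 for mz in singly_charged],
    }


def predicted_ion_array(b, y):
    """Return (row names, m x n array of m/z); columns are cleavage positions 1..n."""
    rows = {**ion_variants(b, "b"), **ion_variants(y, "y")}
    order = ["b", "b0", "b*", "b++", "y", "y0", "y*", "y++"]
    return order, [rows[name] for name in order]
```

With eight rows, m = 8 in the matrices that follow; consecutive ions of one type are simply adjacent entries of a single row.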
The fragment ion intensities in the theoretical tandem mass spectrum are represented, in the order of the predicted ion array, as the matrix

$$T=\begin{pmatrix} t_{1,1} & t_{1,2} & \cdots & t_{1,n}\\ t_{2,1} & t_{2,2} & \cdots & t_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ t_{m,1} & t_{m,2} & \cdots & t_{m,n}\end{pmatrix}$$

where, consistent with the predicted ion array, the subscript i of element $t_{i,j}$ distinguishes fragment ion types and the subscript j distinguishes cleavage positions, and $t_{i,j}$ is the intensity, in the theoretical tandem mass spectrum, of the fragment ion at position (i, j) of the predicted ion array. For example, $t_{2,3}$ corresponds to the intensity of the $b_3^{*}$ ion of Fig. 4 in the theoretical tandem mass spectrum. m is the number of selected fragment ion types, and n+1 is the number of amino acid residues in the peptide sequence, so that the peptide has n cleavage positions.
The intensities of the peaks in the experimental tandem mass spectrum are likewise represented, in the order of the predicted ion array, as the matrix

$$C=\begin{pmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,n}\\ c_{2,1} & c_{2,2} & \cdots & c_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ c_{m,1} & c_{m,2} & \cdots & c_{m,n}\end{pmatrix}$$

where, if the m/z of one or more peaks in the experimental tandem mass spectrum matches the m/z of the fragment ion at position (i, j) of the predicted ion array, $c_{i,j}$ equals the sum of the intensities of the matching peaks; otherwise $c_{i,j} = 0$. As with the predicted ion array and the theoretical matrix T, the subscript i distinguishes fragment ion types and the subscript j distinguishes cleavage positions. Here, "matching" means that the difference between the m/z of a peak in the experimental spectrum and the m/z of the fragment ion at a given position of the predicted ion array lies within a specified error tolerance. The tolerance is typically about 1 Da for ion trap data and about 0.4 Da for Q-TOF data.
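Building the two matrices can be sketched as follows in Python. The sketch assumes the predicted ion array is an m×n list of m/z values, that the experimental spectrum is a list of `(mz, intensity)` tuples, and that every theoretical intensity is 1 (the simple case described earlier); the function names are hypothetical.

```python
def theoretical_matrix(pred_mz):
    """T: every predicted fragment ion gets intensity 1 (simple case)."""
    return [[1.0] * len(row) for row in pred_mz]


def experimental_matrix(pred_mz, peaks, tol):
    """C: c_ij = summed intensity of peaks within tol (Da) of pred_mz[i][j], else 0."""
    return [
        [sum((inten for mz, inten in peaks if abs(mz - target) <= tol), 0.0)
         for target in row]
        for row in pred_mz
    ]


pred = [[100.0, 200.0], [150.0, 250.0]]
peaks = [(100.3, 5.0), (99.8, 2.0), (250.9, 4.0), (400.0, 9.0)]
print(experimental_matrix(pred, peaks, tol=0.4))  # [[7.0, 0.0], [0.0, 0.0]]
print(experimental_matrix(pred, peaks, tol=1.0))  # [[7.0, 0.0], [0.0, 4.0]]
```

The two calls mirror the Q-TOF-like and ion-trap-like tolerances: the peak at 250.9 matches only under the wider 1 Da window. This brute-force scan is quadratic; sorting the peaks and using binary search would be a natural optimization.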
The similarity between the experimental and theoretical spectra is measured by formula (1); this method may be called the RBF-KSDP scoring algorithm:

$$\mathrm{RBF\text{-}KSDP}(C,T)=\sum_{i=1}^{m}\sum_{j=1}^{n}\exp\left(-\gamma\sum_{k=j-l_1}^{j+l_2}\left(c_{ik}-t_{ik}\right)^{2}\right)\tag{1}$$

where the integers $l_1$ and $l_2$ equal $\lfloor (l-1)/2 \rfloor$ and $\lceil (l-1)/2 \rceil$, respectively ($\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ denote rounding down and rounding up), the integer l (< n) is the number of consecutive fragment ions to be considered, i.e., the correlation window length, and $\gamma$ is the parameter of the RBF kernel function. For $k \le 0$ and $k > n$, $c_{ik}$ and $t_{ik}$ are set to 0.
Formula (1) is a specific form of the RBF kernel $\exp(-\gamma\|x-y\|^{2})$. It comprises a summation over the fragment ion types (over subscript i) and over the cleavage positions (over subscript j). Further, its exponential part also comprises a summation over k, which runs over a window of length l around position j (from $j-l_1$ to $j+l_2$). Formula (1) therefore takes the behavior of consecutive fragment ions into account when scoring. Here, consecutive fragment ions are fragment ions of the same type at consecutive cleavage positions; for example, each of the three dashed boxes in Fig. 4 encloses one group of consecutive fragment ions (the number of consecutive ions in a box is l in formula (1)). Consecutive fragment ions occupy consecutive positions in one row of the predicted ion array.
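Formula (1) translates directly into a short Python function. The sketch below assumes C and T are m×n lists of numbers, maps 1-based cleavage positions to 0-based list indices, and treats positions outside 1..n as zero in both matrices, as the text specifies. The description does not say how experimental intensities are scaled relative to T, so the example uses 0/1 values; `rbf_ksdp` is a hypothetical name.

```python
import math


def rbf_ksdp(C, T, l=5, gamma=0.9):
    """RBF-KSDP score of formula (1) for m x n matrices C (experimental) and T (theoretical)."""
    n = len(T[0])
    l1 = (l - 1) // 2   # floor((l-1)/2)
    l2 = l - 1 - l1     # ceil((l-1)/2), so the window holds l positions
    score = 0.0
    for c_row, t_row in zip(C, T):            # sum over ion types i
        for j in range(n):                    # sum over cleavage positions j (0-based)
            dist2 = 0.0
            for k in range(j - l1, j + l2 + 1):
                if 0 <= k < n:                # c_ik = t_ik = 0 outside 1..n
                    dist2 += (c_row[k] - t_row[k]) ** 2
            score += math.exp(-gamma * dist2)  # one local RBF kernel per window
    return score


T = [[1, 1, 1, 1, 1, 1]]
consecutive = [[1, 1, 1, 0, 0, 0]]
scattered = [[1, 0, 1, 0, 0, 1]]
print(rbf_ksdp(consecutive, T, l=3, gamma=1.0) > rbf_ksdp(scattered, T, l=3, gamma=1.0))  # True
```

Because the squared differences are summed inside the exponent, a window scores close to 1 only when all l ions in it match, so three adjacent matches score about 2.69 here against about 1.51 for three scattered ones, whereas a plain linear sum gives both patterns the same value.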
All peptide sequences in the database can be ranked by their RBF-KSDP scores against the experimental spectrum, thereby identifying the peptide sequence most likely to have generated the experimental tandem mass spectrum.
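The ranking step amounts to scoring every candidate and sorting. In this Python sketch, `score` stands for any callable that returns a candidate's RBF-KSDP value against the experimental spectrum; the candidate sequences and their scores are made-up placeholders, not results.

```python
from typing import Callable, Iterable, List, Tuple


def rank_candidates(candidates: Iterable[str],
                    score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Score every candidate peptide and sort by descending score."""
    return sorted(((p, score(p)) for p in candidates),
                  key=lambda item: item[1], reverse=True)


# Hypothetical precomputed scores for three candidate sequences.
toy_scores = {"PEPTIDE": 3.2, "PEPTLDE": 2.1, "SAMPLER": 0.4}
ranked = rank_candidates(toy_scores, toy_scores.__getitem__)
best_peptide = ranked[0][0]  # "PEPTIDE"
```

Keeping the full ranked list, rather than only the top hit, leaves room for inspecting near-ties between candidates.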
Fig. 5 shows an experimental result obtained with the identification method of the present invention. The abscissa of Fig. 5 is the value of γ in formula (1) and the ordinate is the identification error rate; the curves show how the error rate varies with γ for l = 2 to 6. Fig. 5 indicates that l = 5 and 0.8 ≤ γ ≤ 1 are preferred.
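Choosing l and γ as in Fig. 5 can be framed as a grid search that minimizes the identification error rate on data with known answers. In this Python sketch, `error_rate(l, gamma)` is a hypothetical user-supplied callable (for example, the fraction of wrong top hits on a validation set), and the default γ grid of 0.1 to 2.0 is an assumption, not the range plotted in Fig. 5.

```python
def select_parameters(error_rate, ls=range(2, 7), gammas=None):
    """Return (l, gamma, rate) minimizing a user-supplied error_rate(l, gamma)."""
    if gammas is None:
        gammas = [round(0.1 * s, 1) for s in range(1, 21)]  # assumed grid 0.1 .. 2.0
    best = None
    for l in ls:
        for g in gammas:
            rate = error_rate(l, g)
            if best is None or rate < best[2]:
                best = (l, g, rate)
    return best


# Toy error surface with its minimum at l = 5, gamma = 0.9 (illustration only).
print(select_parameters(lambda l, g: (l - 5) ** 2 + (g - 0.9) ** 2))  # (5, 0.9, 0.0)
```

Because each evaluation of `error_rate` means re-scoring a whole validation set, a coarse grid is usually refined only around the best region.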