CN117746993A - Mirror image peptide fragment mass spectrogram pair identification method - Google Patents

Info

Publication number
CN117746993A
Authority
CN
China
Prior art keywords
spectrogram
pair
pairs
mirror image
mass
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311757260.9A
Other languages
Chinese (zh)
Inventor
付岩
曹子璇
周丕宇
彭学力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Academy of Mathematics and Systems Science of CAS
Original Assignee
Academy of Mathematics and Systems Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Academy of Mathematics and Systems Science of CAS
Priority to CN202311757260.9A
Publication of CN117746993A

Landscapes

  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention discloses a mirror image peptide fragment mass spectrogram pair identification method, which comprises the following steps: obtaining two groups of mass spectrograms obtained through paired mirror image enzyme digestion respectively, and generating a spectrogram pair set; screening candidate mirror image spectrogram pairs from the spectrogram pair set to obtain a candidate mirror image spectrogram pair list; directly utilizing information differences among spectrogram pairs to calculate the matching degree score of each pair of candidate mirror spectrogram pairs in the candidate mirror spectrogram pair list; and determining the mirror spectrogram pair according to the matching degree score of the candidate mirror spectrogram pair. By utilizing the scheme of the invention, the identification efficiency can be improved, and the time cost can be saved; and the accuracy of the identification result is effectively improved.

Description

Mirror image peptide fragment mass spectrogram pair identification method
Technical Field
The invention relates to the technical field of biology, in particular to a mirror image peptide fragment mass spectrogram pair identification method.
Background
In mass-spectrometry-based proteomics, de novo peptide sequencing has received wide attention for its flexibility and efficiency. However, its accuracy is limited by low-quality mass spectrometry data, especially low fragment ion coverage. Mirror image enzyme digestion is a common way to address low ion coverage. Mirror image digestion means cleaving at the C-terminal side and the N-terminal side of the same amino acid with two different enzymes, which produces peptide fragments that respectively end and begin with that amino acid and share the same intermediate sequence. Such peptide fragments show mutually mirrored features in their mass spectra and are therefore called mirror image peptide fragments; the two corresponding spectrograms are called a mirror image spectrogram pair.
For the spectrograms obtained from the two digests, the key task is to identify which of them form mirror image spectrogram pairs. The current mainstream identification method first pre-sequences the peptide fragments and then matches spectrogram pairs whose sequences are mirror images of each other. Its accuracy depends to a great extent on the accuracy of the pre-sequencing result, and the time cost of sequencing is not negligible. It is therefore necessary to develop a mirror image spectrogram pair identification algorithm that does not depend on sequencing. In addition, the identification result can help achieve complete fragment ion peak coverage and ion type identification for some spectrograms, which supports the development of more accurate de novo peptide sequencing methods.
Disclosure of Invention
The invention provides a mirror image peptide mass spectrogram pair identification method, which can accurately identify the mirror image peptide mass spectrogram pair without pre-sequencing, and improves the identification efficiency.
Therefore, the invention provides the following technical scheme:
a method of mirror image peptide fragment mass spectrum pair identification, the method comprising:
obtaining two groups of mass spectrograms obtained through paired mirror image enzyme digestion respectively, and generating a spectrogram pair set;
screening candidate mirror image spectrogram pairs from the spectrogram pair set to obtain a candidate mirror image spectrogram pair list;
directly utilizing information differences among spectrogram pairs to calculate the matching degree score of each pair of candidate mirror spectrogram pairs in the candidate mirror spectrogram pair list;
and determining the mirror spectrogram pair according to the matching degree score of the candidate mirror spectrogram pair.
Optionally, the pair of mirror image enzymes comprises: any pair of mirror image enzymes, or any combination of pairs of mirror image enzymes.
Optionally, the screening candidate mirror spectrogram pairs from the spectrogram pair set, and obtaining a candidate mirror spectrogram pair list includes:
for each pair of spectrogram pairs in the spectrogram pair set, calculating a parent ion mass difference of the spectrogram pairs;
and if the parent ion mass difference is within the error range set by the theoretical mass difference, the pair of spectrograms is taken as a candidate mirror image spectrogram pair.
Optionally, the method further comprises: and preprocessing the candidate mirror spectrogram pairs before calculating the matching degree score of each pair of candidate mirror spectrogram pairs in the candidate mirror spectrogram pair list.
Optionally, the preprocessing includes any one or more of: removing isotope peaks, converting peaks to the singly charged form, removing water-loss and ammonia-loss peaks, removing immonium ion peaks, removing noise peaks, normalizing spectral peak intensity, and generating complementary ion peaks.
Optionally, the information difference between the spectrogram pairs includes: peak mass to charge ratio differences in the fragment ion spectra and parent ion mass differences;
the calculating the matching degree score of each pair of candidate mirror image spectrogram pairs in the candidate mirror image spectrogram pair list by directly utilizing the information difference between spectrogram pairs comprises:
and calculating the matching degree score of the candidate mirror image spectrogram pair by utilizing the peak mass-to-charge ratio difference of the fragment ion spectrum and the mass difference of the parent ion.
Optionally, the calculating the matching degree score of the candidate mirror image spectrogram pair by using the peak mass-to-charge ratio difference of the fragment ion spectrum and the mass difference of the parent ion comprises:
uniformly dividing the range of the peak mass-to-charge ratio differences of the fragment ion spectrum into a plurality of intervals, counting the sum of the fragment ion intensities and the number of fragment ion pairs falling in each interval, and taking the product of the sum of the fragment ion intensities and the number of fragment ion pairs in each interval as the mass difference statistic of that interval;
calculating the statistical score of candidate mirror spectrogram pairs according to the mass difference statistic and the parent ion mass difference;
and determining the matching degree score of the candidate mirror spectrogram pair according to the statistical score of the candidate mirror spectrogram pair.
Optionally, said calculating a statistical score for candidate mirror spectrogram pairs based on the mass difference statistic and the parent ion mass difference comprises:
determining one or more intervals in which the theoretical mass difference of the fragment ions exists according to the mass difference of the parent ions;
calculating the maximum value of the mass difference statistics of the intervals in which the theoretical mass differences of the fragment ions are located, and taking the rank of that maximum value among the mass difference statistics of all the intervals as the statistical score of the candidate mirror spectrogram pair; or
calculating, from the distribution of the mass difference statistics of all the intervals, the minimum e-value of the intervals in which the theoretical mass differences of the fragment ions are located, and taking the minimum e-value as the statistical score of the candidate mirror spectrogram pair.
Optionally, the information difference between the spectrogram pairs further includes: retention time differences;
the method further comprises the steps of:
before screening candidate mirror image spectrogram pairs from the spectrogram pair set, calculating retention time differences of the spectrogram pairs for each pair of spectrogram pairs in the spectrogram pair set;
and removing candidate mirror image spectrogram pairs, wherein the retention time difference between spectrogram pairs in the candidate mirror image spectrogram pair list is larger than a set time threshold.
Optionally, the information difference includes: fragment ion spectrum peak mass to charge ratio difference, parent ion mass difference, retention time difference;
the calculating the matching degree score of each pair of candidate mirror image spectrogram pairs in the candidate mirror image spectrogram pair list by directly utilizing the information difference between spectrogram pairs comprises:
and inputting the fragment ion mass and intensity, the parent ion mass, the fragment ion spectrum peak mass-to-charge ratio difference, the parent ion mass difference and the retention time difference of the candidate mirror image spectrogram pairs into a matching degree identification model which is manually constructed or is obtained based on machine learning training, so as to obtain the matching degree score of the candidate mirror image spectrogram pairs.
Optionally, the determining the mirror spectrogram pair according to the matching degree score of the candidate mirror spectrogram pair includes:
and screening candidate mirror spectrogram pairs with the matching degree score larger than a set matching degree threshold value as mirror spectrogram pairs.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the mirrored peptide fragment mass spectrogram pair identification method described hereinbefore.
According to the mirror image peptide fragment mass spectrogram pair identification method provided by the invention, two groups of mass spectrograms respectively obtained through paired mirror image enzyme digestion are obtained, a spectrogram pair set is generated, and then a candidate mirror image spectrogram pair list is screened out from the spectrogram pair set; then calculating the matching degree score of each pair of candidate mirror spectrogram pairs in the candidate mirror spectrogram pair list; and determining the mirror spectrogram pair according to the matching degree score of the candidate mirror spectrogram pair. The calculation of the matching degree score of the candidate mirror image spectrogram is only related to the information provided by the spectrogram itself and does not depend on the prediction sequence results of other software, so that the recognition efficiency can be improved, and the time cost can be saved; and the accuracy of the identification result is effectively improved.
Drawings
FIG. 1 is a flow chart of a mirror image peptide fragment mass spectrogram pair identification method provided by the invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
The present invention is described in detail below with reference to the drawing and specific embodiments; the embodiments of the present invention are not limited to those described below.
The current mainstream method for identifying mirror image peptide mass spectrogram pairs must first pre-sequence the peptide fragments and then match spectrogram pairs whose sequences are mirror images of each other, so its accuracy depends to a great extent on the accuracy of the predicted sequences. To address this problem, the invention provides an identification method that works directly on the spectrograms and requires no pre-sequencing.
FIG. 1 shows a flow chart of the mirror image peptide mass spectrogram pair identification method provided by the invention, which comprises the following steps:
Step 101: obtaining two groups of mass spectrograms obtained through paired mirror image enzyme digestion respectively, and generating a spectrogram pair set.
A mass spectrogram is a plot describing the mass-to-charge ratio and intensity of the detected ions, and may be simply called a "spectrogram".
The spectrogram pair set is the set of all spectrogram pairs formed by taking one spectrogram from each of the two spectrogram sets.
For convenience of description, the two spectrogram sets are called spectrogram set 1 and spectrogram set 2, so the spectrogram pair set contains every pair consisting of one spectrogram from set 1 and one spectrogram from set 2.
Step 102: screening candidate mirror image spectrogram pairs from the spectrogram pair set to obtain a candidate mirror image spectrogram pair list.
The basic principle of screening candidate mirror spectrogram pairs is that the intermediate sequences of the peptide fragments corresponding to each type of mirror spectrogram pair are identical, so the theoretical parent ion mass difference of each type is a fixed, known value.
Therefore, in the embodiment of the present invention, each spectrogram pair in the spectrogram pair set (that is, each spectrogram in spectrogram set 1 paired with every spectrogram in spectrogram set 2) may be examined in turn and its parent ion mass difference calculated. If the parent ion mass difference is within the set error range of a theoretical mass difference, the pair is taken as a candidate mirror image spectrogram pair, which yields the candidate mirror image spectrogram pair list.
There may be several theoretical parent ion mass differences, and they are determined by the mirror image enzymes used for the sample (i.e., the spectrograms). For example, when the mirror image enzymes are Trypsin and LysargiNase, which cleave at the C-terminal side and the N-terminal side of the amino acids K and R respectively, the theoretical parent ion mass differences of the different types of mirror image spectrogram pairs are approximately 0, ±28, ±128 and ±156 Da. The error range e is specified by the user; for example, e = 10 (in ppm) can be set.
Specifically, denote the spectrogram pair currently being judged as spectrogram t and spectrogram l, with parent ion masses $mass_t$ and $mass_l$ respectively. Let the theoretical parent ion mass difference of the type being judged be $M$ daltons and the set error tolerance be $e$ ppm. Then compute:
$D_1 = [mass_t (1 - e \cdot 10^{-6})] - [mass_l (1 + e \cdot 10^{-6})]$
$D_2 = [mass_t (1 + e \cdot 10^{-6})] - [mass_l (1 - e \cdot 10^{-6})]$
If $M \in [D_1, D_2]$, then spectrogram t and spectrogram l are a candidate mirror image spectrogram pair of this type; otherwise they are not.
It should be noted that the types refer to types of spectrogram pairs; for example, in a non-limiting embodiment, there are 7 types of spectrogram pairs in total (refer to Table 1 below for details). For each spectrogram pair, the 7 types are judged in turn. When judging whether the pair is of type A, type A corresponds to a theoretical parent ion mass difference $M(A)$. After $D_1$ and $D_2$ have been calculated, if $M(A) \in [D_1, D_2]$, the pair is determined to be of type A; if not, the next type is judged, until all 7 types have been judged.
Step 103: calculating the matching degree score of each pair of candidate mirror image spectrogram pairs in the candidate mirror image spectrogram pair list directly by utilizing information differences among spectrogram pairs.
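The screening rule can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: because Table 1 is not reproduced here, the list `THEORETICAL_DIFFS` is an assumed stand-in built from the monoisotopic residue masses of K and R, and the function names `mass_diff_window` and `matching_diffs` are hypothetical.

```python
# Monoisotopic residue masses of lysine (K) and arginine (R), in daltons.
K_MASS = 128.09496
R_MASS = 156.10111

# Assumed stand-in for Table 1: signed theoretical parent ion mass
# differences (spectrogram t minus spectrogram l) for Trypsin/LysargiNase.
THEORETICAL_DIFFS = [
    0.0,
    R_MASS - K_MASS, K_MASS - R_MASS,
    K_MASS, -K_MASS,
    R_MASS, -R_MASS,
]


def mass_diff_window(mass_t, mass_l, e_ppm):
    """Return (D1, D2), the lower and upper bounds of mass_t - mass_l."""
    tol = e_ppm * 1e-6
    d1 = mass_t * (1 - tol) - mass_l * (1 + tol)
    d2 = mass_t * (1 + tol) - mass_l * (1 - tol)
    return d1, d2


def matching_diffs(mass_t, mass_l, e_ppm=10.0, diffs=THEORETICAL_DIFFS):
    """Return every theoretical difference M with D1 <= M <= D2."""
    d1, d2 = mass_diff_window(mass_t, mass_l, e_ppm)
    return [m for m in diffs if d1 <= m <= d2]


# A pair whose parent masses differ by R - K is kept as a candidate.
hits = matching_diffs(1028.00615, 1000.0)
```

A pair is added to the candidate list whenever `matching_diffs` returns a non-empty list; the returned value identifies which type of pair it may be.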
The information differences may include the fragment ion spectrum peak mass-to-charge ratio differences and the parent ion mass difference, and may further include the retention time difference. The peak mass-to-charge ratio differences are obtained by subtracting the fragment ion peak mass-to-charge ratios of one spectrogram of the candidate pair from those of the other; the parent ion mass difference and the retention time difference are obtained in the same way from the parent ion masses and the retention times of the two spectrograms.
The matching degree calculation of the spectrogram pairs is only related to the information provided by the spectrogram, and is independent of the prediction sequence results of other software. The basic principle is that the theoretical mass difference of the fragment ions of each type of mirror spectrogram pair is determined, and the error range can be set according to the requirement, wherein the theoretical mass difference of the fragment ions can be multiple. Specifically, the matching score can be designed according to the spectral peaks of each pair of candidate mirror image spectrogram pairs, and combining the mass and intensity information of fragment ions, the mass and retention time information of parent ions.
In the embodiment of the present invention, the matching degree may be calculated in various ways, as long as it reflects the confidence that the current spectrogram pair is a mirror spectrogram pair. For example, statistical scoring or machine learning scoring may be adopted; the embodiment of the present invention does not limit this.
For example, the peak mass-to-charge ratio differences of the fragment ion spectra and the parent ion mass difference may be used to calculate the matching degree score of a candidate mirror spectrogram pair. Calculating the matching degree score with a statistical scoring method may include: uniformly dividing the range of the peak mass-to-charge ratio differences of the fragment ion spectra into a plurality of intervals, counting the sum of the fragment ion intensities and the number of fragment ion pairs falling in each interval, and multiplying the sum of the fragment ion intensities in each interval by the number of fragment ion pairs to obtain the mass difference statistic of that interval; calculating the statistical score of the candidate mirror spectrogram pair from the mass difference statistics and the parent ion mass difference; and determining the matching degree score of the candidate mirror spectrogram pair from its statistical score.
The step of calculating the statistical score may comprise: determining, from the parent ion mass difference, the one or more intervals in which the theoretical fragment ion mass differences are located; then either calculating the maximum value of the mass difference statistics of those intervals and taking the rank of that maximum value among the mass difference statistics of all the intervals as the statistical score of the candidate mirror spectrogram pair, or calculating, from the distribution of the mass difference statistics of all the intervals, the minimum e-value of those intervals and taking the minimum e-value as the statistical score. The matching degree score may be determined from the statistical score as follows: denoting the statistical score as $x$, take the matching degree score as $1/x$, or as $\mathrm{sigmoid}(x)$, or as $e^{-x}$.
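The rank-based variant of this statistical scoring can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the intensity of a fragment ion pair is defined, so the sketch assumes it is the sum of the two peak intensities; the range [-200, 200], the 1 Da step, and the names `bin_index`, `mass_diff_statistics`, `rank_score` and `to_match_score` are likewise hypothetical.

```python
import math


def bin_index(diff, lo=-200.0, step=1.0):
    """Map a mass difference to the index of its interval."""
    return int((diff - lo) // step)


def mass_diff_statistics(peaks_t, peaks_l, lo=-200.0, hi=200.0, step=1.0):
    """Per interval: (sum of pair intensities) * (number of peak pairs).

    peaks_t and peaks_l are lists of (mz, intensity) tuples.
    """
    n_bins = int(round((hi - lo) / step))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for mz_t, inten_t in peaks_t:
        for mz_l, inten_l in peaks_l:
            diff = mz_t - mz_l
            if lo <= diff < hi:
                i = min(bin_index(diff, lo, step), n_bins - 1)
                sums[i] += inten_t + inten_l  # assumed pair intensity
                counts[i] += 1
    return [s * c for s, c in zip(sums, counts)]


def rank_score(stats, target_bins):
    """Rank (1 = highest) of the best target interval among all intervals."""
    best = max(stats[i] for i in target_bins)
    return 1 + sum(1 for s in stats if s > best)


def to_match_score(x, how="reciprocal"):
    """Convert a statistical score x into a matching degree score."""
    if how == "reciprocal":
        return 1.0 / x
    if how == "sigmoid":
        return 1.0 / (1.0 + math.exp(-x))
    if how == "exp":
        return math.exp(-x)
    raise ValueError(how)


# Toy peak lists whose differences include -128 and 156.
peaks_t = [(100.0, 1.0), (300.0, 2.0)]
peaks_l = [(228.0, 1.0), (144.0, 3.0)]
stats = mass_diff_statistics(peaks_t, peaks_l)
targets = [bin_index(-128.0), bin_index(156.0)]
rank = rank_score(stats, targets)
score = to_match_score(rank)
```

With real data, measured differences scatter around the theoretical values, so the target list would normally also include the neighbouring intervals within the chosen error range.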
The matching degree score is calculated by a machine learning scoring method, namely the parent ion mass, the fragment ion mass and the intensity of the candidate mirror image spectrogram pairs, the peak-to-charge ratio difference of the fragment ion spectrum and the parent ion mass difference are input into a matching degree recognition model obtained based on machine learning training, and the matching degree score of the candidate mirror image spectrogram pairs is obtained. Further, the input parameters of the matching degree identification model may further include: retention time differences.
The matching degree recognition model can be constructed manually or obtained based on machine learning training, and the construction and the training process are not limited in the embodiment of the invention.
Step 104: determining the mirror spectrogram pair according to the matching degree score of the candidate mirror spectrogram pair.
Without loss of generality, assume that the higher the matching degree score of a spectrogram pair, the more plausible it is that the pair is a mirror spectrogram pair. A candidate mirror spectrogram pair whose matching degree score is greater than a set threshold may therefore be regarded as a mirror spectrogram pair.
In another non-limiting embodiment, the candidate mirror spectrogram pairs may also be preprocessed before the matching degree score of each candidate mirror spectrogram pair in the candidate mirror spectrogram pair list is calculated in step 103. The preprocessing may include, but is not limited to, any one or more of the following: removing isotope peaks, converting peaks to the singly charged form, removing water-loss and ammonia-loss peaks, removing immonium ion peaks, removing noise peaks, normalizing spectral peak intensity, and generating complementary ion peaks.
The method of removing noise peaks may include global denoising and/or local denoising. Global denoising keeps only the spectral peaks whose intensity is greater than a set value, or only a set number of the strongest spectral peaks. Local denoising applies the same rule within each window of x daltons: it keeps only the peaks whose intensity is greater than a set value in each window, or only a set number of the strongest peaks in each window.
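A minimal sketch of the "keep the strongest peaks" form of both denoising modes is shown below. It assumes a spectrum is a list of `(mz, intensity)` tuples and that windows are aligned to multiples of the window width; the function names `global_top_k` and `local_top_k` are hypothetical.

```python
def global_top_k(peaks, k):
    """Keep the k most intense peaks of the whole spectrum, sorted by m/z."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:k]
    return sorted(strongest)


def local_top_k(peaks, k, window):
    """Keep the k most intense peaks inside each m/z window of given width."""
    buckets = {}
    for mz, inten in peaks:
        buckets.setdefault(int(mz // window), []).append((mz, inten))
    kept = []
    for group in buckets.values():
        kept.extend(sorted(group, key=lambda p: p[1], reverse=True)[:k])
    return sorted(kept)


peaks = [(50.0, 1.0), (60.0, 5.0), (70.0, 3.0), (150.0, 2.0), (160.0, 0.5)]
kept_global = global_top_k(peaks, 2)
kept_local = local_top_k(peaks, 1, 100.0)
```

Local filtering preserves weaker peaks in sparsely populated mass regions that a single global cut-off would discard, which is why the two modes can give different results on the same spectrum.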
The method for normalizing the spectrum peak intensity can comprise the following steps: maximum minimum normalization, zero mean normalization, etc.
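The two normalizations named above can be sketched as follows. The sketch assumes that "zero mean normalization" means the usual z-score (subtract the mean, divide by the standard deviation); the function names are hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(intensities):
    """Scale intensities linearly so the minimum is 0 and the maximum is 1."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        return [0.0 for _ in intensities]
    return [(x - lo) / (hi - lo) for x in intensities]


def zero_mean_normalize(intensities):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(intensities)
    sd = pstdev(intensities)
    if sd == 0:
        return [0.0 for _ in intensities]
    return [(x - mu) / sd for x in intensities]


scaled = min_max_normalize([1.0, 2.0, 3.0])
standardized = zero_mean_normalize([1.0, 2.0, 3.0])
```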
In another non-limiting embodiment, before candidate mirror image spectrogram pairs are screened from the spectrogram pair set in step 102 above, the retention time difference of each spectrogram pair in the set may be calculated, and spectrogram pairs whose retention time difference is greater than a set time threshold are removed so that they do not enter the candidate mirror image spectrogram pair list. This can further improve the efficiency and accuracy of the subsequent determination of mirror spectrogram pairs.
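A retention time pre-filter of this kind might look like the sketch below. The dictionary layout with an `"rt"` key, the threshold of 5.0 and the function name `rt_filter` are assumptions for illustration; the text does not specify data structures, units or threshold values.

```python
def rt_filter(pairs, max_rt_diff):
    """Keep pairs whose retention times differ by at most max_rt_diff."""
    return [(t, l) for t, l in pairs if abs(t["rt"] - l["rt"]) <= max_rt_diff]


spectra_t = [{"id": "t1", "rt": 10.0}, {"id": "t2", "rt": 55.0}]
spectra_l = [{"id": "l1", "rt": 12.0}, {"id": "l2", "rt": 50.0}]

# The spectrogram pair set: every spectrogram of set 1 with every one of set 2.
all_pairs = [(t, l) for t in spectra_t for l in spectra_l]
kept = rt_filter(all_pairs, max_rt_diff=5.0)
kept_ids = [(t["id"], l["id"]) for t, l in kept]
```

Because this test is a single subtraction per pair, running it first reduces the number of pairs that reach the more expensive fragment ion scoring.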
According to the mirror image peptide fragment mass spectrogram pair identification method provided by the invention, two groups of mass spectrograms obtained through paired mirror image enzyme digestion are obtained, and a spectrogram pair set is generated; screening a candidate mirror image spectrogram pair list from the spectrogram pair set; then calculating the matching degree score of each pair of candidate mirror spectrogram pairs in the candidate mirror spectrogram pair list; and determining the mirror spectrogram pair according to the matching degree score of the candidate mirror spectrogram pair. The calculation of the matching degree score of the candidate mirror image spectrogram is only related to the information provided by the spectrogram itself and does not depend on the prediction sequence results of other software, so that the recognition efficiency can be improved, and the time cost can be saved; and the accuracy of the identification result is effectively improved.
Compared with existing mirror image peptide fragment mass spectrogram pair identification methods, the scheme of the invention does not depend on the predicted sequence results of other software. By contrast, the comparable algorithm pMerge must first sequence the two groups of mass spectrograms separately with the pNovo software. After the sequencing results are obtained, it matches spectrograms whose sequences are mirror images of each other as mirror image spectrogram pairs, designs a matching degree score from the sequencing results and spectrogram information, and filters the resulting mirror image spectrogram pairs.
The mirror image peptide fragment mass spectrogram pair identification method provided by the scheme of the invention is suitable for any pair of mirror image enzymes or combination of the pair of mirror image enzymes.
Taking the single pair of mirror image enzymes Trypsin and LysargiNase as an example, assuming a protein sequence … ABCKEPEPTIDEDDE …, the mirror image peptide fragments PEPTIDER and KPEPTIDE can be produced after digestion with Trypsin and LysargiNase respectively, with a parent ion mass difference of approximately 28 (R-K). Ideally, if the peptide fragments are completely fragmented into fragment ions, then:
for the Trypsin peptide, the b ions are P, PE, PEP, PEPT, PEPTI, PEPTID, PEPTIDE; the y ions are R, ER, DER, IDER, TIDER, PTIDER, EPTIDER;
for the LysargiNase peptide, the b ions are K, KP, KPE, KPEP, KPEPT, KPEPTI, KPEPTID; the y ions are E, DE, IDE, TIDE, PTIDE, EPTIDE, PEPTIDE. The fragment ion mass differences between the two spectrograms therefore tend to concentrate at -128 (-K, b ions) and 156 (R, y ions).
For other protein sequences, the parent ion and fragment ion mass differences of the peptide fragments produced by Trypsin and LysargiNase digestion follow similar rules. Table 1 below lists the approximate theoretical mass differences of parent ions and fragment ions for the different types of Trypsin and LysargiNase peptide fragment pairs.
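The concentration of mass differences at -128 and 156 can be checked numerically for the two example peptides. The sketch below assumes singly charged b and y ions and uses standard monoisotopic residue masses; the helper names `b_ions` and `y_ions` are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues in the example peptides.
RESIDUE = {
    "P": 97.05276, "E": 129.04259, "T": 101.04768, "I": 113.08406,
    "D": 115.02694, "K": 128.09496, "R": 156.10111,
}
PROTON = 1.00728
WATER = 18.01056


def b_ions(peptide):
    """Singly charged b ion m/z values for prefixes of length 1..n-1."""
    out, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE[aa]
        out.append(total + PROTON)
    return out


def y_ions(peptide):
    """Singly charged y ion m/z values for suffixes of length 1..n-1."""
    out, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE[aa]
        out.append(total + WATER + PROTON)
    return out


tryp, lys = "PEPTIDER", "KPEPTIDE"
b_t, y_t = b_ions(tryp), y_ions(tryp)
b_l, y_l = b_ions(lys), y_ions(lys)

# Pair each Trypsin b ion with the LysargiNase b ion carrying the extra K,
# and each Trypsin y ion carrying R with the matching LysargiNase y ion.
b_diffs = [b_t[i] - b_l[i + 1] for i in range(len(b_t) - 1)]
y_diffs = [y_t[i + 1] - y_l[i] for i in range(len(y_t) - 1)]
parent_diff = sum(RESIDUE[a] for a in tryp) - sum(RESIDUE[a] for a in lys)
```

Every entry of `b_diffs` equals minus the residue mass of K and every entry of `y_diffs` equals the residue mass of R, while `parent_diff` is the R minus K difference of about 28 Da.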
TABLE 1
Suppose a Trypsin spectrogram t and a LysargiNase spectrogram l are given.
First, judge whether the retention time difference is within the allowed error. Next, calculate the parent ion mass difference and, according to the definition above, its lower and upper limits $D_1$ and $D_2$. If $D_1 < 0 < D_2$, the spectrogram pair is a candidate spectrogram pair of type A; if $D_1 < 28 < D_2$, the spectrogram pair is a candidate spectrogram pair of type B; and so on for the remaining types. In this way, a list of candidate mirror spectrogram pairs of all 7 types can be obtained.
For each candidate spectrogram pair, a matching degree score is calculated to characterize the confidence that the pair is a mirror image spectrogram pair. Assuming that spectrograms t and l are a candidate pair of type B, the fragment ion mass differences should in theory concentrate around -128 and 156.
A specific implementation of scoring is given below:
First, the spectral peak intensities are squared to make the overall scoring more balanced, and the intensities at different mass-to-charge ratios are then each given a weight, where x is the mass-to-charge ratio.
Next, the mass differences of the fragment ions are calculated pairwise. With a suitable interval step, the mass difference range $[-200, 200]$ is divided into $N$ subintervals and each mass difference is mapped to an interval index. Spectral peaks with the same index are merged, giving for each index $i$ the weighted intensity sum $sum\_intensity_i$ and the peak count $count_i$. The mass difference statistic of each index is then $s = [s_1, \ldots, s_i, \ldots, s_N]$, where $s_i = sum\_intensity_i \cdot count_i$. Assume that the non-negative statistic $S$ follows an extreme value distribution of maxima. In the high-value region of $S$, $\log s$ and $\log F(s)$, where $F(s) = \Pr(S > s)$, are linearly related: $\log F(s) = a_1 \log s + a_2$. The parameters $a_1$ and $a_2$ are unknown but can be estimated from the high-value part of the distribution of $s$, which gives an estimate of $F(s)$. The e-value is then calculated as $escore(s) = N \cdot F(s)$, which gives the statistical score escore of the spectrogram pair, and the score final_score of the spectrogram pair is obtained by taking the reciprocal of escore.
By the definition of final_score, a higher score indicates a more reliable result. A threshold α is therefore set, and all spectrogram pairs whose matching degree score satisfies final_score ≥ α are retained as the identified mirror image spectrogram pairs.
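The e-value step can be illustrated with a simple tail fit. The sketch below is one possible reading, not the patent's exact procedure: it assumes the survival function is estimated empirically as rank divided by N, that the "high-value region" is the top half of the positive statistics, and that the line is fitted by ordinary least squares. The names `fit_tail`, `escore`, `final_score` and the parameter `top_fraction` are hypothetical.

```python
import math


def fit_tail(stats, top_fraction=0.5):
    """Fit log F(s) = a1 * log s + a2 on the largest positive statistics."""
    n = len(stats)
    positive = sorted((s for s in stats if s > 0), reverse=True)
    m = max(2, int(len(positive) * top_fraction))
    tail = positive[:m]
    if len(tail) < 2:
        raise ValueError("need at least two positive statistics")
    xs = [math.log(s) for s in tail]
    # Empirical survival: the k-th largest value has k values at or above it.
    ys = [math.log((k + 1) / n) for k in range(len(tail))]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("tail values are all identical")
    a1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a2 = my - a1 * mx
    return a1, a2


def escore(s, a1, a2, n):
    """e-value N * F(s), with F(s) taken from the fitted log-linear tail."""
    return n * math.exp(a1 * math.log(s) + a2)


def final_score(s, a1, a2, n):
    """Reciprocal of the e-value: larger means a more confident match."""
    return 1.0 / escore(s, a1, a2, n)


# Synthetic statistics whose tail is an exact power law, F(s) = 1 / s.
stats = [100.0 / k for k in range(1, 101)]
a1, a2 = fit_tail(stats)
e = escore(200.0, a1, a2, len(stats))
```

In this synthetic case the fit recovers a slope of -1 and an intercept of 0, so a target statistic of 200 gets an e-value of 0.5 and a final score of 2.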
The term "plurality" as used in the embodiments of the present invention means two or more.
It is noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and in the foregoing figures, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
Accordingly, embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the mirror image peptide mass spectrogram pair identification method of fig. 1.
The embodiments of the present invention have been described in detail above using specific examples. The description of the embodiments is intended only to aid understanding of the method and system of the present invention and is given by way of example, not by way of limitation. All other embodiments that can be obtained by those skilled in the art from the embodiments of the present invention without inventive effort fall within the scope of the present invention, and the content of this description should not be construed as limiting the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention therefore falls within the scope of the invention.

Claims (12)

1. A method for identifying a mirror image peptide fragment mass spectrum pair, the method comprising:
obtaining two groups of mass spectrograms obtained through paired mirror image enzyme digestion respectively, and generating a spectrogram pair set;
screening candidate mirror image spectrogram pairs from the spectrogram pair set to obtain a candidate mirror image spectrogram pair list;
directly utilizing information differences among spectrogram pairs to calculate the matching degree score of each pair of candidate mirror spectrogram pairs in the candidate mirror spectrogram pair list;
and determining the mirror spectrogram pair according to the matching degree score of the candidate mirror spectrogram pair.
2. The method for identifying a pair of mirror image peptide fragment mass spectrograms according to claim 1, wherein the pair of mirror image enzymes comprises: any pair of mirror image enzymes, or any combination of pairs of mirror image enzymes.
3. The method for identifying a mirror image peptide fragment mass spectrum pair according to claim 1, wherein said screening candidate mirror image spectrum pairs from the spectrum pair set to obtain a list of candidate mirror image spectrum pairs comprises:
for each spectrogram pair in the spectrogram pair set, calculating a parent ion mass difference of the spectrogram pair;
and if the parent ion mass difference is within a set error range of the theoretical mass difference, taking the spectrogram pair as a candidate mirror image spectrogram pair.
4. The method of claim 1, further comprising:
and preprocessing the candidate mirror spectrogram pairs before calculating the matching degree score of each pair of candidate mirror spectrogram pairs in the candidate mirror spectrogram pair list.
5. The method of claim 4, wherein the preprocessing comprises any one or more of the following: removing isotope peaks, converting peaks to the singly charged state, removing water-loss and ammonia-loss peaks, removing immonium ion peaks, removing noise peaks, normalizing spectral peak intensities, and generating complementary ion peaks.
6. The method of claim 1, wherein the information differences between the pairs of spectra comprise: peak mass to charge ratio differences in the fragment ion spectra and parent ion mass differences;
the calculating the matching degree score of each pair of candidate mirror image spectrogram pairs in the candidate mirror image spectrogram pair list by directly utilizing the information difference between spectrogram pairs comprises:
and calculating the matching degree score of the candidate mirror image spectrogram pair by utilizing the fragment ion spectrum peak mass-to-charge ratio difference and the parent ion mass difference.
7. The method of claim 6, wherein calculating the matching degree score of the candidate mirror image spectrogram pair by utilizing the fragment ion spectrum peak mass-to-charge ratio difference and the parent ion mass difference comprises:
uniformly dividing the range of fragment ion spectrum peak mass-to-charge ratio differences into a plurality of intervals, counting the sum of the fragment ion intensities and the number of fragment ion pairs falling in each interval, and taking the product of the sum of the fragment ion intensities and the number of fragment ion pairs in each interval as the mass difference statistic of that interval;
calculating the statistical score of candidate mirror spectrogram pairs according to the mass difference statistic and the parent ion mass difference;
and determining the matching degree score of the candidate mirror spectrogram pair according to the statistical score of the candidate mirror spectrogram pair.
8. The method of claim 7, wherein said calculating a statistical score for candidate mirror spectrogram pairs based on the mass difference statistic and the parent ion mass difference comprises:
determining one or more intervals in which the theoretical mass difference of the fragment ions exists according to the mass difference of the parent ions;
calculating the maximum value of the mass difference statistics of the intervals in which the theoretical mass differences of the fragment ions are located, and taking the rank of the maximum value among the mass difference statistics of all the intervals as the statistical score of the candidate mirror image spectrogram pair; or
calculating, from the distribution of the mass difference statistics of all the intervals, the minimum e-value of the intervals in which the theoretical mass differences of the fragment ions are located, and taking the minimum e-value as the statistical score of the candidate mirror image spectrogram pair.
9. The method of identifying pairs of mirror image peptide fragment mass spectra according to any one of claims 6 to 8, wherein the information differences between the pairs of spectra further comprise: retention time differences;
the method further comprises the steps of:
before screening candidate mirror image spectrogram pairs from the spectrogram pair set, calculating, for each spectrogram pair in the spectrogram pair set, a retention time difference of the spectrogram pair;
and removing from the candidate mirror image spectrogram pair list those candidate mirror image spectrogram pairs whose retention time difference is larger than a set time threshold.
10. The method for identifying a mirror image peptide fragment mass spectrum pair according to claim 1, wherein the information difference comprises: fragment ion spectrum peak mass to charge ratio difference, parent ion mass difference, retention time difference;
the calculating the matching degree score of each pair of candidate mirror image spectrogram pairs in the candidate mirror image spectrogram pair list by directly utilizing the information difference between spectrogram pairs comprises:
and inputting the fragment ion mass and intensity, the parent ion mass, the fragment ion spectrum peak mass-to-charge ratio difference, the parent ion mass difference and the retention time difference of the candidate mirror image spectrogram pairs into a matching degree identification model which is manually constructed or is obtained based on machine learning training, so as to obtain the matching degree score of the candidate mirror image spectrogram pairs.
11. The method of claim 1, wherein determining a mirror spectrogram pair from a matching score of the candidate mirror spectrogram pair comprises:
and taking candidate mirror image spectrogram pairs whose matching degree score is larger than a set matching degree threshold as mirror image spectrogram pairs.
12. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the mirror image peptide fragment mass spectrogram pair identification method of any one of claims 1 to 11.
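By way of non-limiting illustration only, the candidate screening of claim 3, the mass difference statistic of claim 7, and the rank-based statistical score of the first alternative of claim 8 might be sketched in Python as follows. The function names, the dictionary layout of a spectrogram, the tolerance and interval width values, and the choice to take the intensity of a fragment ion pair as the sum of its two peak intensities are all assumptions made for this sketch and are not specified by the claims; the theoretical mass differences are supplied by the caller.

```python
from collections import defaultdict
from itertools import product


def screen_candidates(spectra_a, spectra_b, theoretical_diffs, tol=0.02):
    """Claim 3 sketch: keep pairs whose parent ion mass difference lies
    within `tol` (assumed unit: Da) of any theoretical mass difference."""
    candidates = []
    for a, b in product(spectra_a, spectra_b):
        diff = a["parent_mass"] - b["parent_mass"]
        if any(abs(diff - t) <= tol for t in theoretical_diffs):
            candidates.append((a, b))
    return candidates


def mass_difference_statistics(peaks_a, peaks_b, bin_width=0.02):
    """Claim 7 sketch: bin all pairwise peak m/z differences into uniform
    intervals; statistic = (intensity sum) * (pair count) per interval.
    Peaks are (mz, intensity) tuples; pair intensity is assumed to be
    the sum of the two peak intensities."""
    intensity_sum = defaultdict(float)
    pair_count = defaultdict(int)
    for (mz_a, int_a), (mz_b, int_b) in product(peaks_a, peaks_b):
        bin_index = round((mz_a - mz_b) / bin_width)
        intensity_sum[bin_index] += int_a + int_b
        pair_count[bin_index] += 1
    return {i: intensity_sum[i] * pair_count[i] for i in intensity_sum}


def rank_score(stats, theoretical_fragment_diffs, bin_width=0.02):
    """Claim 8 sketch (first alternative): rank of the largest statistic
    found in the intervals holding the theoretical fragment mass
    differences, among the statistics of all intervals (1 = highest)."""
    target_bins = {round(d / bin_width) for d in theoretical_fragment_diffs}
    best = max((stats.get(i, 0.0) for i in target_bins), default=0.0)
    return 1 + sum(1 for value in stats.values() if value > best)
```

In this sketch a smaller rank indicates a better match; how the rank (or the e-value of the second alternative of claim 8, which is not shown) is converted into the matching degree score compared against the threshold of claim 11 is left open here.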
CN202311757260.9A 2023-12-19 2023-12-19 Mirror image peptide fragment mass spectrogram pair identification method Pending CN117746993A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311757260.9A CN117746993A (en) 2023-12-19 2023-12-19 Mirror image peptide fragment mass spectrogram pair identification method

Publications (1)

Publication Number Publication Date
CN117746993A (en) 2024-03-22

Family

ID=90278994

Country Status (1)

Country Link
CN (1) CN117746993A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination