CN106872554B - The analysis method of protein enzymatic hydrolyzate based on fuzzy discrimination and reasoning from logic - Google Patents
- Publication number
- CN106872554B (application number CN201510919595.5A)
- Authority
- CN
- China
- Prior art keywords
- ion
- peptide fragment
- spectrogram
- peptide
- fragment
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/68—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
- G01N33/6803—General methods of protein analysis not limited to specific proteins or families of proteins
- G01N33/6848—Methods of protein analysis involving mass spectrometry
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N2570/00—Omics, e.g. proteomics, glycomics or lipidomics; Methods of analysis focusing on the entire complement of classes of biological molecules or subsets thereof, i.e. focusing on proteomes, glycomes or lipidomes
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Chemical & Material Sciences (AREA)
- Urology & Nephrology (AREA)
- Immunology (AREA)
- Biomedical Technology (AREA)
- Hematology (AREA)
- Cell Biology (AREA)
- Medicinal Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Microbiology (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Food Science & Technology (AREA)
- Biotechnology (AREA)
- Analytical Chemistry (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Pathology (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The invention discloses a method for analyzing protein enzymatic hydrolysates based on fuzzy discrimination and logical reasoning. Specifically, it relates to a method for matching and analyzing peptides against tandem mass spectra (MS/MS) based on fuzzy discrimination and logical reasoning. The algorithm uses a logistic function to simulate a human analyst's fuzzy judgment of spectrum quality, and uses a matrix inner-product computation to reproduce the analyst's logical deduction of the peptide sequence. In a Null-test comparison with existing algorithms/software (Mascot, Morpheus, Pfind, MaxQuant), the algorithm of the invention passes the Null-test and shows higher intelligence than the comparison methods. In addition, the algorithm does not describe the match between mass spectral data and a peptide in the conventional ion space; instead, it performs the matching computation against the spectrum in a peptide information space.
Description
Technical field
The present invention relates to a method, used in proteomics, for scoring the degree of match between a candidate peptide and a tandem mass spectrum (MS/MS). MS/MS spectra acquired from a protein enzymatic hydrolysate are matched and scored against all candidate peptides from a specified protein sequence database, in order to determine which peptides are present in the hydrolysate.
Background art
At present, the shotgun approach is the main method by which proteomics identifies large numbers of proteins in complex systems. In the shotgun strategy, the proteins in the system under study are digested with a protease to obtain peptides, whose spectra are then matched and scored against a target protein database. The performance of the scoring algorithm determines how reliable the identification results are. Existing scoring algorithms are based on probabilistic models, on the ion space of the peptide, or on more complex evaluation strategies; manual screening shows that the false-positive rate of their identification results remains high, which can undermine proteomics findings at their root. Starting from the way a human analyst interprets spectra, the present invention develops a new scoring algorithm that simulates human fuzzy judgment and logical reasoning, with the aim of achieving the robustness and reliability of manual spectrum interpretation.
Summary of the invention
To avoid manually screening tens of thousands of scoring results one by one for quality, the Null-test scheme is used for testing and comparison. In the Null-test, a target protein database containing more than ten thousand random-sequence proteins is constructed by a randomization strategy, and the database search is performed with a Target-Decoy scheme based on the reversed (decoy) database. With the FDR (False Discovery Rate) set to 20%, if the algorithm identifies 0 or 1 peptides, it passes the Null-test, is robust, and shows a degree of intelligence. If it identifies more than 1 peptide, the algorithm is distinguishing between two sequence libraries of the same nature (both are random: the reverse of a random library is still a random library), which indicates overfitting and a relatively high false-positive rate.
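The Null-test setup and its pass rule can be sketched as follows. This is a minimal illustration under assumptions: the helper names (`random_protein_library`, `reversed_decoys`, `passes_null_test`), uniform sampling over the 20 standard amino acids, and the fixed seed are hypothetical; the text only specifies random target sequences, a reversed decoy library, and the 0-or-1-peptide criterion at FDR = 20%.

```python
import random

# 20 standard amino acids; uniform sampling is an illustrative assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein_library(num_proteins, min_len=100, max_len=1000, seed=0):
    """Hypothetical helper: build a 'Null' target library of random protein sequences."""
    rng = random.Random(seed)
    return [
        "".join(rng.choices(AMINO_ACIDS, k=rng.randint(min_len, max_len)))
        for _ in range(num_proteins)
    ]


def reversed_decoys(proteins):
    """Hypothetical helper: the decoy library is each target sequence reversed."""
    return [seq[::-1] for seq in proteins]


def passes_null_test(num_identified_peptides):
    """Pass rule from the text: at FDR = 20%, identifying 0 or 1 peptides passes."""
    return num_identified_peptides <= 1


if __name__ == "__main__":
    targets = random_protein_library(12000)  # Embodiment 2 uses 12,000 proteins
    decoys = reversed_decoys(targets)
    print(len(targets), len(decoys))
    # Peptide counts reported in Embodiment 2
    for tool, hits in [("invention", 1), ("Mascot", 2), ("MaxQuant", 33)]:
        print(tool, passes_null_test(hits))
```

The point of reversing a random library is that target and decoy are statistically indistinguishable, so any confident "target" identification beyond chance is a false positive.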
Technical solution:
The analysis method for a protein enzymatic hydrolysate based on fuzzy discrimination and logical reasoning is as follows.
Following the requirements of shotgun proteomics, the protein enzymatic hydrolysate is analyzed by tandem mass spectrometry (MS/MS) to obtain one or more MS/MS spectra. All protein sequences in an existing target protein database (Target database) are reversed to obtain a Decoy database, and the protein sequences in both databases are digested in silico according to the same process used to prepare the hydrolysate, yielding a candidate peptide sequence library. For a given MS/MS spectrum, candidate peptides are selected from the peptide sequence library according to the precursor (parent ion) mass, within a preset mass tolerance of 0-50 ppm. If no peptide passes this filter, the MS/MS spectrum is invalid; if one or more peptides pass, the spectrum is valid, the score of each candidate peptide against the spectrum is computed with the scoring method below, and the highest-scoring peptide is the best-matching peptide for the spectrum. The same scoring is applied to the other MS/MS spectra, all best peptide-spectrum matches are sorted by score in descending order, and with a preset FDR (False Discovery Rate) of 0-5% a score cutoff is calculated; Target database peptides scoring above the cutoff are judged to be present in the protein enzymatic hydrolysate.
During peptide identification, the candidate peptide sequences S at the corresponding mass in the protein sequence database are determined from the MS1 precursor ion, and the degree of match between S and the MS/MS spectrum is scored.
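The precursor-mass filter and the FDR-based score cutoff described above can be sketched as below. Assumptions: masses are theoretical and observed precursor masses in daltons, the tolerance is applied symmetrically in ppm, and the FDR at a threshold is estimated with the common target-decoy ratio (decoy hits / target hits at or above that threshold); the text does not state its exact FDR estimator, and all function names are hypothetical.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def candidates_within_tolerance(precursor_mass, peptide_masses, tol_ppm=10.0):
    """Return candidate peptides whose theoretical mass lies within tol_ppm of the precursor.

    peptide_masses maps peptide sequence -> theoretical mass (hypothetical input format).
    An empty result means the spectrum is treated as invalid.
    """
    return [
        peptide
        for peptide, mass in peptide_masses.items()
        if abs(ppm_error(precursor_mass, mass)) <= tol_ppm
    ]


def fdr_score_cutoff(best_matches, fdr=0.01):
    """best_matches: one (score, is_decoy) pair per spectrum (its best-scoring peptide).

    Walk down the list sorted by score and return the lowest score at which the
    estimated FDR (decoys / targets seen so far) is still <= fdr, or None.
    """
    cutoff = None
    targets = decoys = 0
    for score, is_decoy in sorted(best_matches, key=lambda m: m[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr:
            cutoff = score
    return cutoff
```

Target matches at the returned cutoff or above would then be reported; how ties exactly at the cutoff are handled is not specified in the text.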
Scoring process is as follows:
1) Calculate the effective fragment-ion ratio index u of the MS/MS spectrum. For spectra without isotope labeling: u = (sum of the intensities of fragment-ion peaks matched to the candidate peptide)/(sum of all peak intensities in the current spectrum). For spectra with isotope labeling: u = (sum of the intensities of validly labeled fragment-ion peaks matched to the candidate peptide)/(sum of the intensities of all validly labeled peaks in the current spectrum). u is then converted by the logistic Formula 1 into the fuzzy evaluation index w, which simulates a human analyst's judgment of spectrum quality;
2) sequence information that the fragment ion in sequence S is included is encoded, default behavior are as follows: remember that peptide segment length is
N, n are the positive integer more than or equal to 2, and peptide section sequence arranges from top to bottom according to from aminoterminal (N-terminal) to c-terminus (C-terminal), and
And it is corresponding with hereafter column vector;
b ions are encoded as follows: if a b1 ion is present, it corresponds to an n×1 column vector whose first entry is 1 and whose remaining entries are 0; if a b2 ion is present, its first and second entries are 1 and the remaining entries are 0; and so on, so that if a b(n-1) ion is present, entries 1 through n-1 of its length-n column vector are 1 and the remaining entry is 0;
y ions are encoded as follows: if a y1 ion is present, the last entry of its length-n column vector is 1 and the remaining entries are 0; if a y2 ion is present, the last and second-to-last entries are 1 and the remaining entries are 0; and so on, so that if a y(n-1) ion is present, entries 2 through n of its length-n column vector are 1 and the remaining entry is 0;
Other N-terminal ion types (such as a ions) are encoded in the same way as b ions, and other C-terminal ion types in the same way as y ions. Finally, the column vectors obtained above are concatenated side by side to form the information representation matrix C;
3) Compute the matrix inner product of C with Formula 2 to obtain the information representation matrix X. The inner-product computation implements logical-reasoning functions such as self-proof of the same ion, proof that a shorter ion is present by a longer ion from the same terminus (e.g., the C-terminus), and handshake proof between ions from different termini (C-terminal and N-terminal), such as a-y and b-y pairs;
4) Sum all elements of X and divide by the peptide length n to obtain the check-information total expressed by the peptide. Let p denote the number of complementary ion pairs in the spectrum; p represents the amount of complementary information and is a non-negative integer. Finally, add the check-information total and the amount of complementary information, and multiply the sum by the fuzzy discrimination index w to obtain the algorithm's score for the match between candidate peptide S and the MS/MS spectrum (Formula 3);
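Steps 1) to 4) can be strung together in a short sketch. It is a minimal illustration under stated assumptions, not the patented implementation: Formula 1 is not reproduced in this text, so `fuzzy_index` uses a generic logistic curve with hypothetical `steepness` and `midpoint` parameters; fragment ions are passed as lists of ion indices (e.g. `[2, 4]` for b2 and b4); and p counts only (b_i, y_{n-i}) pairs. All function names are hypothetical.

```python
import math

# ---- Step 1: effective ratio u and fuzzy index w ----

def effective_ratio(peak_intensities, matched):
    """u = matched fragment-peak intensity / total peak intensity.
    For isotope-labeled spectra, pass only the validly labeled peaks."""
    total = sum(peak_intensities)
    if total == 0:
        return 0.0
    return sum(x for x, hit in zip(peak_intensities, matched) if hit) / total


def fuzzy_index(u, steepness=10.0, midpoint=0.5):
    """Logistic map u -> w in (0, 1). steepness/midpoint are HYPOTHETICAL, not Formula 1."""
    return 1.0 / (1.0 + math.exp(-steepness * (u - midpoint)))


# ---- Step 2: information representation matrix C (n rows x m ion columns) ----

def information_matrix(n, n_terminal_ions, c_terminal_ions):
    """N-terminal ion i -> 1s in positions 1..i; C-terminal ion j -> 1s in the last j.
    Row 0 is the N-terminal residue, row n-1 the C-terminal residue."""
    cols = [[1 if pos < i else 0 for pos in range(n)] for i in n_terminal_ions]
    cols += [[1 if pos >= n - j else 0 for pos in range(n)] for j in c_terminal_ions]
    return [[col[pos] for col in cols] for pos in range(n)]


# ---- Step 3: X = C^T C ----

def inner_product_matrix(C):
    """X[a][b] = dot product of ion columns a and b (shared residue positions)."""
    m = len(C[0]) if C else 0
    return [[sum(row[a] * row[b] for row in C) for b in range(m)] for a in range(m)]


# ---- Step 4: complementary pairs p and the final score ----

def complementary_pairs(n, b_ions, y_ions):
    """p = number of observed (b_i, y_{n-i}) pairs; limiting p to b/y is an assumption."""
    ys = set(y_ions)
    return sum(1 for i in set(b_ions) if n - i in ys)


def match_score(n, b_ions, y_ions, u):
    """score = w * (sum of all entries of X / n + p), as described in step 4)."""
    C = information_matrix(n, b_ions, y_ions)
    X = inner_product_matrix(C)
    check_information = sum(map(sum, X)) / n
    p = complementary_pairs(n, b_ions, y_ions)
    return fuzzy_index(u) * (check_information + p)


if __name__ == "__main__":
    u = effective_ratio([10.0, 30.0, 60.0], [True, False, True])  # u = 0.7
    # Length-6 peptide with b2, b4, y3, y4 observed: sum(X) = 29, p = 1 (b2/y4)
    print(match_score(6, [2, 4], [3, 4], u))  # fuzzy_index(0.7) * (29 / 6 + 1)
```

Note that a complementary pair such as b2/y4 contributes 0 to X, because the two ions cover disjoint residues; in this sketch such a pair is credited only through p. A candidate with no matched ions scores exactly 0.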
The inner-product computation provides three logical-reasoning functions: self-proof of the same ion, proof by a longer ion that a shorter ion from the same terminus is present, and handshake proof between ions from different termini.
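Given the encoding of step 2), the entries of X = C^T C have a simple closed form that makes the three reasoning functions concrete. The identities below are our own derivation from the encoding rules for a peptide of length n (b stands for any N-terminal ion type, y for any C-terminal type); they are not formulas stated in the patent, and reading the b-y overlap as the "handshake" is an interpretation.

```latex
\begin{aligned}
\mathbf{b}_i^{\mathsf T}\mathbf{b}_i &= i, \quad \mathbf{y}_j^{\mathsf T}\mathbf{y}_j = j
  && \text{(same ion: self-proof)}\\
\mathbf{b}_i^{\mathsf T}\mathbf{b}_k &= \min(i,k), \quad \mathbf{y}_j^{\mathsf T}\mathbf{y}_l = \min(j,l)
  && \text{(same terminus: the longer ion covers the shorter)}\\
\mathbf{b}_i^{\mathsf T}\mathbf{y}_j &= \max(0,\; i + j - n)
  && \text{(different termini: shared residues)}
\end{aligned}
```

In particular, a complementary pair (b_i, y_{n-i}) gives i + (n - i) - n = 0, so such pairs add nothing to X; in the scoring function they are credited through p instead.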
The algorithm uses the logistic function to simulate a human analyst's fuzzy judgment of spectrum quality, and uses a matrix inner-product scheme to reproduce the analyst's logical deduction of the peptide sequence. In the Null-test, compared with existing algorithms/software (Mascot, Morpheus, Pfind, MaxQuant), the algorithm of the invention passes the test and shows higher intelligence than the comparison methods. In addition, the algorithm does not describe the mass-spectral match in the conventional ion space; it performs the matching computation against the spectrum in a peptide information space.
The present invention has the following advantages:
1. The present invention passes the Null-test with better results than the comparison methods; the algorithm itself has a degree of intelligence.
2. The present invention uses a peptide information coding scheme rather than the traditional ion-space scheme.
3. The present invention can identify a large number of erroneous matches (assigning them a score of 0), giving it stronger discriminating power.
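Description of the formulas:
Formula 1 is the logistic formula that converts the spectrum quality index u into the fuzzy evaluation index w.
Formula 2 is the matrix inner-product operation.
Formula 3 is the complete form of the scoring function of the invention.
Formula 1: w as a logistic function of u (the formula image is not reproduced in this text).
Formula 2: $X = C^{\mathsf{T}} C$
Formula 3: $\mathrm{score} = w\left(\frac{1}{n}\sum_{i,j} X_{ij} + p\right)$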
Description of the drawings
Fig. 1 shows the encoding scheme for peptide information.
Fig. 2 shows the three logical-reasoning schemes implemented by the matrix inner product.
Specific embodiments
The method provided by the invention is described in detail below by way of embodiments, but the invention is not limited in any way by them.
Embodiment 1:
The scoring algorithm was validated with human hepatocellular carcinoma cell digest data acquired on a Thermo Scientific Q Exactive mass spectrometer. The data contain 77,979 MS/MS spectra. With an MS1 tolerance of 10 ppm, an MS2 tolerance of 20 ppm, and FDR set to 1%, the scoring analysis method of the invention identified 14,909 PSMs (Peptide-Spectrum Matches), 8,813 unique peptides, and 1,752 proteins. Among comparable software, Morpheus identified 14,903 PSMs, 9,038 unique peptides, and 1,880 proteins; Mascot identified 16,648 PSMs, 10,247 unique peptides, and 1,975 proteins. The performance of the method of the invention is therefore comparable to that of current software.
Fig. 1 illustrates how a peptide of length 6, whose a, b, and y ions were all detected by the mass spectrometer, is encoded. Fig. 2 illustrates, for the same length-6 peptide, the logical-reasoning functions realized by the matrix inner product: self-proof of the same ion, proof that a shorter ion is present by a longer ion from the same terminus (e.g., the C-terminus), and handshake proof between ions from different termini (C-terminal and N-terminal), such as a-y and b-y pairs.
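Since Figs. 1 and 2 are not reproduced here, a small worked example may help. It uses a hypothetical length-6 peptide in which only b2, b4, y3 and y4 are observed (the figure itself also encodes a ions); the columns of C are ordered b2, b4, y3, y4, following the encoding rules of step 2).

```latex
C = \bigl[\,\mathbf{b}_2\ \ \mathbf{b}_4\ \ \mathbf{y}_3\ \ \mathbf{y}_4\,\bigr] =
\begin{pmatrix}
1 & 1 & 0 & 0\\
1 & 1 & 0 & 0\\
0 & 1 & 0 & 1\\
0 & 1 & 1 & 1\\
0 & 0 & 1 & 1\\
0 & 0 & 1 & 1
\end{pmatrix},
\qquad
X = C^{\mathsf T} C =
\begin{pmatrix}
2 & 2 & 0 & 0\\
2 & 4 & 1 & 2\\
0 & 1 & 3 & 3\\
0 & 2 & 3 & 4
\end{pmatrix}
```

The diagonal of X is the self-proof of each ion; X_{12} = 2 (b4 covers b2) and X_{34} = 3 (y4 covers y3) are same-terminus proofs; X_{23} = 1 and X_{24} = 2 are b-y overlaps; X_{14} = 0 because b2 and y4 are complementary. The entries sum to 29, so the check-information total is 29/6, and with p = 1 (the b2/y4 pair) the score described in step 4) is w(29/6 + 1).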
Embodiment 2:
Using the Null-test scheme, random proteins with sequence lengths of 100-1000 residues were generated, 12,000 proteins in total, forming a Null library together with its corresponding reversed (decoy) library, and the FDR was set to 20%. Under these conditions, the algorithm of the invention matched 1 peptide in the Null library and therefore passed the Null-test. Mascot matched 2 peptides and failed the Null-test; Morpheus and Pfind each matched 6 peptides and failed; MaxQuant matched 33 peptides and likewise failed. This demonstrates that the algorithm of the invention does exhibit intelligence and that its results are more reliable and robust.
Claims (2)
1. An analysis method for a protein enzymatic hydrolysate based on fuzzy evaluation and logical reasoning, being a method for analyzing the peptide-to-tandem mass spectrum (MS/MS) matching relationship in a protein enzymatic hydrolysate on the basis of fuzzy evaluation and logical reasoning, characterized in that:
following the requirements of shotgun proteomics, the protein enzymatic hydrolysate is analyzed by tandem mass spectrometry (MS/MS) to obtain one or more MS/MS spectra; all protein sequences in an existing target protein database (Target database) are reversed to obtain a Decoy database, and the protein sequences in both databases are digested in silico according to the preparation process of the protein enzymatic hydrolysate to obtain a candidate peptide sequence library; for a given MS/MS spectrum, several candidate peptides are selected from the peptide sequence library according to the precursor ion mass within a preset mass tolerance of 0-50 ppm; if the number of selected peptides is 0, the MS/MS spectrum is invalid; if it is 1 or more, the spectrum is valid, the score of each peptide against the MS/MS spectrum is calculated by the scoring method below, and the highest-scoring peptide is the best-matching peptide of the spectrum; the scoring is performed on the other MS/MS spectra, all best peptide-spectrum matches are sorted by score in descending order, and with a preset FDR (False Discovery Rate) of 0-5% a score cutoff is calculated; Target database peptides scoring above the cutoff are present in the protein enzymatic hydrolysate;
the scoring method scores the degree of match between the MS/MS spectrum and a candidate peptide sequence S determined in the protein sequence database at the mass corresponding to the MS1 precursor ion; it comprises calculating a fuzzy evaluation index w, generating an information representation matrix C, computing the matrix inner product of C to obtain an information representation matrix X, and adding the check-information total to the amount of complementary information p and multiplying the sum by the fuzzy evaluation index w to obtain the final score;
the fuzzy evaluation index w is calculated as follows: calculate the effective fragment-ion signal ratio index u of the MS/MS spectrum; for spectra without isotope labeling, u = (sum of the intensities of fragment-ion peaks matched to the candidate peptide)/(sum of all peak intensities in the current spectrum); for spectra with isotope labeling, u = (sum of the intensities of validly labeled fragment-ion peaks matched to the candidate peptide)/(sum of the intensities of all validly labeled peaks in the current spectrum); u is converted by the logistic Formula 1 into the fuzzy evaluation index w, which simulates a human analyst's evaluation of spectrum quality;
Formula 1;
the information representation matrix C is generated as follows: encode the sequence information carried by the fragment ions of sequence S; by default, let the peptide length be n, a positive integer of at least 2, with the peptide sequence arranged from top to bottom from the amino terminus (N-terminus) to the carboxyl terminus (C-terminus), corresponding to the positions of the column vectors below;
b ions are encoded as follows: if a b1 ion is present, it corresponds to an n×1 column vector whose first entry is 1 and whose remaining entries are 0; if a b2 ion is present, its first and second entries are 1 and the remaining entries are 0; and so on, so that if a b(n-1) ion is present, entries 1 through n-1 of its length-n column vector are 1 and the remaining entry is 0;
y ions are encoded as follows: if a y1 ion is present, the last entry of its length-n column vector is 1 and the remaining entries are 0; if a y2 ion is present, the last and second-to-last entries are 1 and the remaining entries are 0; and so on, so that if a y(n-1) ion is present, entries 2 through n of its length-n column vector are 1 and the remaining entry is 0;
other N-terminal ion types are encoded in the same way as b ions, and other C-terminal ion types in the same way as y ions; finally, the column vectors obtained above are concatenated side by side to form the information representation matrix C;
the information representation matrix X is obtained by computing the matrix inner product of C according to Formula 2;
Formula 2: $X = C^{\mathsf{T}} C$;
the final score is calculated as follows: sum all elements of matrix X and divide by the peptide length n to obtain the check-information total expressed by the peptide; let p be the number of complementary ion pairs in the spectrum, representing the amount of complementary information, where p is a non-negative integer; finally, add the check-information total and the amount of complementary information and multiply the sum by the fuzzy evaluation index w to obtain the algorithm's score for the match between candidate peptide S and the MS/MS spectrum;
Formula 3: $\mathrm{score} = w\left(\frac{1}{n}\sum_{i,j} X_{ij} + p\right)$.
2. The analysis method according to claim 1, characterized in that the inner-product computation provides three logical-reasoning functions: self-proof of the same ion, proof by a longer ion that a shorter ion from the same terminus is present, and handshake proof between ions from different termini.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510919595.5A CN106872554B (en) | 2015-12-13 | 2015-12-13 | The analysis method of protein enzymatic hydrolyzate based on fuzzy discrimination and reasoning from logic |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510919595.5A CN106872554B (en) | 2015-12-13 | 2015-12-13 | The analysis method of protein enzymatic hydrolyzate based on fuzzy discrimination and reasoning from logic |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106872554A (en) | 2017-06-20 |
CN106872554B (en) | 2019-06-11 |
Family ID: 59177269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510919595.5A (Active; CN106872554B) | The analysis method of protein enzymatic hydrolyzate based on fuzzy discrimination and reasoning from logic | 2015-12-13 | 2015-12-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106872554B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103245714A (en) * | 2013-03-25 | 2013-08-14 | Jinan University (暨南大学) | Protein secondary mass spectrum identification method of marker loci based on candidate peptide fragment discrimination |
CN103852513A (en) * | 2012-11-29 | 2014-06-11 | Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所) | Method and system based on HCD mass spectrogram and ETD mass spectrogram for peptide fragment de novo sequencing |
CN104034792A (en) * | 2014-06-26 | 2014-09-10 | Yunnan Minzu University (云南民族大学) | Secondary protein mass spectrum identification method based on mass-to-charge ratio error recognition capability |
Non-Patent Citations (2)
Title |
---|
A Bayesian Approach to Protein Inference Problem in Shotgun Proteomics; Yong Fuga Li et al.; Journal of Computational Biology; 2009-12-31; full text * |
Protein Analysis by Shotgun/Bottom-up Proteomics; Yaoyang Zhang et al.; Chemical Reviews; 2013-12-31; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN106872554A (en) | 2017-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mason et al. | A guide for using functional diversity indices to reveal changes in assembly processes along ecological gradients | |
CN102495127B (en) | Protein secondary mass spectrometric identification method based on probability statistic model | |
Colak et al. | Automated McIntosh-based classification of sunspot groups using MDI images | |
CN109902018B (en) | Method for acquiring test case of intelligent driving system | |
CN105527359B (en) | Protein secondary Mass Spectrometric Identification method based on positive and negative planting modes on sink characteristic information matches | |
CN109933656A (en) | Public sentiment polarity prediction technique, device, computer equipment and storage medium | |
Liu et al. | Motif discoveries in unaligned molecular sequences using self-organizing neural networks | |
Harvey et al. | Phylogenetic extinction rates and comparative methodology | |
WO2019083351A2 (en) | Method and system for disease prediction and control | |
Wang et al. | Radial basis function neural network ensemble for predicting protein-protein interaction sites in heterocomplexes | |
Earl et al. | Spatial phylogenetics of butterflies in relation to environmental drivers and angiosperm diversity across North America | |
Peters et al. | Why is the biological hydrophobicity scale more accurate than earlier experimental hydrophobicity scales? | |
Fang et al. | Identifying short disorder-to-order binding regions in disordered proteins with a deep convolutional neural network method | |
JPH02195473A (en) | Method for forecasting attribute value in learning system | |
CN113762417A (en) | Method for enhancing HLA antigen presentation prediction system based on deep migration | |
Heinze-Deml et al. | Think before you act: A simple baseline for compositional generalization | |
CN116106878A (en) | Big data analysis system and method | |
Yilmaz et al. | Sequence-to-sequence translation from mass spectra to peptides with a transformer model | |
CN115312118A (en) | Single-sequence protein contact map prediction method based on map neural network | |
Worachartcheewan et al. | Quantitative population-health relationship (QPHR) for assessing metabolic syndrome | |
CN106872554B (en) | The analysis method of protein enzymatic hydrolyzate based on fuzzy discrimination and reasoning from logic | |
CN108805280A (en) | A kind of method and apparatus of image retrieval | |
Sadowski et al. | On the evolutionary origins of “Fold Space Continuity”: A study of topological convergence and divergence in mixed alpha-beta domains | |
CN107729719B (en) | De novo sequencing method | |
CN115620818A (en) | Protein mass spectrum peptide fragment verification method based on natural language processing |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |