CN106872554B - Analysis method for protein enzymatic hydrolysates based on fuzzy discrimination and logical reasoning

Analysis method for protein enzymatic hydrolysates based on fuzzy discrimination and logical reasoning

Info

Publication number
CN106872554B
Authority
CN
China
Prior art keywords
ion
peptide fragment
spectrogram
peptide
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510919595.5A
Other languages
Chinese (zh)
Other versions
CN106872554A (en)
Inventor
张丽华
张树荣
单亦初
张玉奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Institute of Chemical Physics of CAS
Original Assignee
Dalian Institute of Chemical Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Institute of Chemical Physics of CAS
Priority to CN201510919595.5A
Publication of CN106872554A
Application granted
Publication of CN106872554B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2570/00 Omics, e.g. proteomics, glycomics or lipidomics; Methods of analysis focusing on the entire complement of classes of biological molecules or subsets thereof, i.e. focusing on proteomes, glycomes or lipidomes

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Urology & Nephrology (AREA)
  • Immunology (AREA)
  • Biomedical Technology (AREA)
  • Hematology (AREA)
  • Cell Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Food Science & Technology (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention discloses a method for analyzing protein enzymatic hydrolysates based on fuzzy discrimination and logical reasoning. It relates to an algorithm, based on fuzzy discrimination and logical reasoning, for analyzing the match between peptides and tandem mass (MS/MS) spectra. The algorithm uses a logistic function to simulate a human analyst's fuzzy judgment of spectrum quality, and uses a matrix inner product calculation to reproduce the analyst's logical deduction of the peptide sequence. In a Null-test comparison with existing algorithms and software (Mascot, Morpheus, pFind, MaxQuant), the algorithm of the invention passes the Null-test and shows a higher degree of intelligence than the compared methods. In addition, the algorithm does not describe the match to the mass spectrometry data in the conventional ion space; instead, it performs the matching against the spectrum in a peptide information space.

Description

Analysis method for protein enzymatic hydrolysates based on fuzzy discrimination and logical reasoning
Technical field
The present invention is a scoring analysis method in proteomics for judging the degree of match between candidate peptides and tandem mass (MS/MS) spectra. MS/MS spectra acquired from a protein enzymatic hydrolysate are scored against all candidate peptides in a specified protein sequence database, in order to determine which peptides are present in the hydrolysate.
Background art
At present, the shotgun approach is an important method in proteomics research for the large-scale identification of the proteins present in a complex system. In the shotgun strategy, the proteins in the system under study are hydrolyzed with a protease to obtain peptides, which are then matched and scored against a target protein database. The performance of the scoring algorithm determines the reliability of the identification results. Existing scoring algorithms are based on probabilistic models, on the ion space of the peptide, or on more complex evaluation strategies. Manual screening shows that the false-positive rate of the identifications produced by existing algorithms is still high, which can fundamentally affect the results of proteomics research. Starting from the perspective of simulating manual spectrum analysis, the present invention develops a new scoring algorithm that simulates human fuzzy judgment and logical reasoning, so as to achieve the robustness and reliability of manual spectrum interpretation.
Summary of the invention
To avoid having to assess tens of thousands of scoring results manually one by one, the Null-test scheme is used for testing and comparison. The Null-test uses a randomization strategy to construct a target protein database containing on the order of ten thousand random-sequence proteins, and the database search is carried out with the Target-Decoy scheme based on the reversed database. With the FDR (False Discovery Rate) set to 20%, if the algorithm identifies 0 or 1 peptides, it passes the Null-test: its performance is robust and it has a certain degree of intelligence. If the algorithm identifies more than 1 peptide, it is distinguishing between two sequence databases of the same nature that are both random (the reverse of a random database is still a random database); the algorithm is over-fitting and its false-positive rate is relatively high.
Technical solution:
Analysis method for protein enzymatic hydrolysates based on fuzzy discrimination and logical reasoning:
According to the requirements of shotgun proteomics, the protein enzymatic hydrolysate is subjected to tandem mass spectrometry (MS/MS) analysis to obtain one or more MS/MS spectra. All protein sequences in the existing target protein database (Target database) are reversed to obtain the Decoy database, and the protein sequences in both databases are digested in silico according to the procedure used to obtain the protein enzymatic hydrolysate, giving the candidate peptide sequence library. For a given MS/MS spectrum, candidate peptides are selected from the peptide sequence library by precursor ion mass, within the set mass tolerance of 0-50 ppm. If no peptide is selected, the MS/MS spectrum is invalid. If one or more peptides are selected, the spectrum is valid; the score of each peptide against the MS/MS spectrum is calculated by the scoring method below, and the highest-scoring peptide is the best-matching peptide for that spectrum. The same scoring is carried out for the other MS/MS spectra, and all best peptide-spectrum matches are sorted by score in descending order. With a preset FDR (False Discovery Rate) of 0-5%, the score cutoff value can be calculated, and peptides from the Target database whose scores exceed the cutoff value are considered present in the protein enzymatic hydrolysate;
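The calculation of the score cutoff at a preset FDR can be sketched as follows. This is a minimal sketch, not the patented implementation: it assumes the common target-decoy estimate FDR = (decoy matches) / (target matches) among all matches at or above a threshold, which the text does not spell out, and the function name `fdr_cutoff` and the `(score, is_decoy)` input layout are hypothetical.

```python
from typing import List, Optional, Tuple


def fdr_cutoff(psms: List[Tuple[float, bool]], max_fdr: float) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    psms holds one (score, is_decoy) pair per spectrum (its best match).
    The FDR estimate decoys / targets is an assumption of this sketch.
    Returns None when no threshold satisfies the limit.
    """
    ranked = sorted(psms, key=lambda item: item[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate a threshold once every match with this score is counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

Target peptides whose best match scores at or above the returned value would then be reported; a return value of `None` means that no score threshold meets the requested FDR.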
During peptide identification, the candidate peptide sequences S of the corresponding mass are determined in the protein sequence database from the precursor ion of the first-stage mass spectrum, and the degree of match between S and the MS/MS spectrum is scored;
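The precursor-mass screening of candidate peptides can be illustrated with a short sketch. The helper names, the dictionary of peptide masses, and the peptide labels are hypothetical, and the masses are placeholder numbers rather than real peptide masses; the 10 ppm default is only one value inside the 0-50 ppm range stated above.

```python
from typing import Dict, List


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Relative mass deviation in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def select_candidates(precursor_mass: float,
                      peptide_masses: Dict[str, float],
                      tol_ppm: float = 10.0) -> List[str]:
    """Keep the peptides whose theoretical mass lies within tol_ppm of the precursor."""
    return [peptide for peptide, mass in peptide_masses.items()
            if abs(ppm_error(precursor_mass, mass)) <= tol_ppm]


# Placeholder library: labels and masses are illustrative only.
library = {"PEP_A": 1000.0000, "PEP_B": 1000.0080, "PEP_C": 1000.0300}
candidates = select_candidates(1000.0000, library, tol_ppm=10.0)
# PEP_A (0 ppm) and PEP_B (about 8 ppm) are kept; PEP_C (about 30 ppm) is not.
```

A spectrum for which `select_candidates` returns an empty list corresponds to the invalid-spectrum case described above.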
The scoring process is as follows:
1) Calculate the effective fragment-ion signal ratio index u of the MS/MS spectrum. For spectra without isotope labeling: u = (sum of the intensities of the fragment-ion peaks matched to the candidate peptide) / (sum of all peak intensities in the current spectrum). For isotope-labeled spectra: u = (sum of the intensities of the validly labeled fragment-ion peaks matched to the candidate peptide) / (sum of the intensities of the validly labeled peaks in the current spectrum). The logistic function of Formula 1 converts u into the fuzzy evaluation index w, which simulates a human analyst's assessment of spectrum quality;
2) sequence information that the fragment ion in sequence S is included is encoded, default behavior are as follows: remember that peptide segment length is N, n are the positive integer more than or equal to 2, and peptide section sequence arranges from top to bottom according to from aminoterminal (N-terminal) to c-terminus (C-terminal), and And it is corresponding with hereafter column vector;
Peptide fragment b ion coding mode are as follows: if there is b1Ion, b1The column vector of ion corresponding specification n*1, first is 1, remaining position is 0;If there is b2Ion, b2 ion then first to second be 1, remaining position be 0;So analogize, such as There are b for fruitn-1Ion, bn-1Ion corresponds to length then to be all 1 in first to n-1 in the column vector of n, and remaining position is 0;
To the mode of y ion coding are as follows: if there is y-1Ion, y1Ion pair is answered last in the column vector that length is n One is 1, remaining position is 0;If there is y2Ion, y2Ion pair answers in the column vector that length is n last position to inverse the Two be 1, remaining position is 0;So analogize, if there is yn-1Ion, yn-1It is last of the column vector of n that ion pair, which answers length, Position is 1 to the 2nd, remaining position is 0;
The N-terminal ion of remaining type is encoded by b ion coding mode, remaining C-terminal ion carries out in such a way that y ion encodes Coding;Finally information representation Matrix C is merged by column vector obtained above along row;
3) Apply the matrix inner product of Formula 2 to the information expression matrix C to obtain the information expression matrix X. The inner product implements the following logical reasoning functions: an ion proving its own presence (self-proof); a long ion proving the presence of a shorter ion from the same terminus (for example, the C-terminus); and ions from different termini (C-terminus and N-terminus, such as a-y and b-y pairs) proving each other by handshake;
4) Sum all elements of X and divide by the peptide length n to obtain the total check information expressed by the peptide. Let p denote the number of complementary ion pairs in the spectrum, which represents the amount of complementary information; p is an integer greater than or equal to 0. Finally, add the total check information to the amount of complementary information and multiply the sum by the fuzzy discrimination index w to obtain the algorithm's score for the match between candidate peptide S and the MS/MS spectrum (Formula 3);
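Steps 2) to 4) can be sketched in a few lines of Python. This is an illustrative reading of the text, not the patented implementation: the function names and the `(terminus, k)` ion representation are hypothetical, the fuzzy index w and the complementary pair count p are taken as inputs rather than derived from a spectrum, and returning 0 when no fragment ion is matched is an assumption.

```python
from typing import List, Tuple

Ion = Tuple[str, int]  # ("N" or "C" terminus, number of residues covered)


def encode_ion(terminus: str, k: int, n: int) -> List[int]:
    """Length-n 0/1 column for a fragment covering k residues from one terminus."""
    if not 1 <= k <= n - 1:
        raise ValueError("k must lie between 1 and n - 1")
    if terminus == "N":          # b-type and other N-terminal ions
        return [1] * k + [0] * (n - k)
    if terminus == "C":          # y-type and other C-terminal ions
        return [0] * (n - k) + [1] * k
    raise ValueError("terminus must be 'N' or 'C'")


def inner_product_matrix(columns: List[List[int]]) -> List[List[int]]:
    """X = C^T C, with C stored as a list of its columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def match_score(ions: List[Ion], n: int, p: int, w: float) -> float:
    """score = w * (sum of all elements of X / n + p), following step 4)."""
    if not ions:
        return 0.0  # assumption: no matched fragment ion gives a score of 0
    columns = [encode_ion(terminus, k, n) for terminus, k in ions]
    x = inner_product_matrix(columns)
    check_information = sum(sum(row) for row in x) / n
    return w * (check_information + p)


# Peptide of length 6 with b2, y4 and y5 observed; b2 and y4 are complementary.
example = match_score([("N", 2), ("C", 4), ("C", 5)], n=6, p=1, w=1.0)
```

For this example X is [[2, 0, 1], [0, 4, 4], [1, 4, 5]], whose elements sum to 21, so the score is 1.0 × (21/6 + 1) = 4.5.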
The inner product calculation comprises three logical reasoning functions: self-proof of an ion, proof of a shorter ion by a longer ion from the same terminus, and handshake proof between ions from different termini.
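One way to read the three reasoning functions is as overlaps between the 0/1 ion vectors, since each entry of the inner product matrix counts the residues two ions have in common. The sketch below works through a peptide of length 6; the helper names are hypothetical, and the interpretation attached to each number is an assumption, because the figures are not reproduced in this text.

```python
from typing import List


def n_ion(k: int, n: int) -> List[int]:
    """N-terminal ion (b-type coding): the first k entries are 1."""
    return [1] * k + [0] * (n - k)


def c_ion(k: int, n: int) -> List[int]:
    """C-terminal ion (y-type coding): the last k entries are 1."""
    return [0] * (n - k) + [1] * k


def dot(u: List[int], v: List[int]) -> int:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


n = 6
b2, b3, b4 = n_ion(2, n), n_ion(3, n), n_ion(4, n)
y3, y5 = c_ion(3, n), c_ion(5, n)

self_proof = dot(b3, b3)   # an ion with itself: its own length, 3
same_end = dot(y5, y3)     # long and short ion, same terminus: the short length, 3
handshake = dot(b4, y3)    # ions from opposite termini that overlap: 1 shared residue
no_overlap = dot(b2, y3)   # ions from opposite termini that do not overlap: 0
```

Under this reading, ions from opposite termini contribute to X only when together they span more than the full peptide, which is consistent with the handshake wording above.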
The algorithm uses a logistic function to simulate a human analyst's fuzzy judgment of spectrum quality, and uses a matrix inner product calculation to reproduce the analyst's logical deduction of the peptide sequence. In a Null-test comparison with existing algorithms and software (Mascot, Morpheus, pFind, MaxQuant), the algorithm of the invention passes the Null-test and shows a higher degree of intelligence than the compared methods. In addition, the algorithm does not describe the match to the mass spectrometry data in the conventional ion space; instead, it performs the matching against the spectrum in a peptide information space.
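The fuzzy judgment of spectrum quality can be sketched as a ratio followed by a logistic mapping. The ratio u follows the definition given for unlabeled spectra. The logistic parameters are placeholders: the expression of Formula 1 is not reproduced in this text, so `steepness` and `midpoint` are hypothetical values chosen only to show the shape of such a mapping, and both function names are illustrative.

```python
import math
from typing import Sequence


def effective_ratio(matched_intensities: Sequence[float],
                    all_intensities: Sequence[float]) -> float:
    """u = matched fragment-ion intensity / total intensity of the spectrum."""
    total = sum(all_intensities)
    if total <= 0:
        return 0.0  # assumption: an empty spectrum carries no usable signal
    return sum(matched_intensities) / total


def fuzzy_quality(u: float, steepness: float = 10.0, midpoint: float = 0.5) -> float:
    """Generic logistic mapping of u to (0, 1); the parameters are hypothetical."""
    return 1.0 / (1.0 + math.exp(-steepness * (u - midpoint)))


u = effective_ratio([30.0, 20.0], [30.0, 20.0, 50.0])  # half of the signal is matched
w = fuzzy_quality(u)
```

With these placeholder parameters a spectrum in which half of the signal is explained maps to w = 0.5, and w rises towards 1 as more of the signal is matched; the patent's own parameters may differ.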
The present invention has the following advantages:
1. The invention passes the Null-test with better results, and the algorithm itself has a certain degree of intelligence.
2. The invention uses a peptide information coding scheme instead of the traditional ion space scheme.
3. The invention can identify a large number of erroneous matches (assigned a score of 0) and therefore has strong discriminating power.
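Explanation of the formulas:
Formula 1 is the logistic function that converts the spectrum quality index u into the fuzzy evaluation index w.
Formula 2 is the matrix inner product operation.
Formula 3 is the complete form of the scoring function of the invention.
Formula 1: w is a logistic function of u (the expression is not reproduced in this text)
Formula 2: $X = C^{T}C$
Formula 3: $\mathrm{score} = w\left(\frac{1}{n}\sum_{i,j} X_{ij} + p\right)$, as described in step 4) of the scoring process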
Brief description of the drawings
Fig. 1 shows the encoding scheme for peptide information.
Fig. 2 shows the three logical reasoning schemes realized by the matrix inner product.
Detailed description of the embodiments
The method provided by the invention is described in detail below by way of embodiments, which do not limit the invention in any way.
Embodiment 1:
The scoring algorithm was verified with data from an enzymatic digest of human hepatocellular carcinoma cells acquired on a Thermo Scientific Q Exactive mass spectrometer. The data contain 77979 MS/MS spectra. With an MS1 tolerance of 10 ppm, an MS/MS tolerance of 20 ppm, and the FDR set to 1%, the scoring method of the invention identified 14909 PSMs (Peptide-Spectrum Matches), 8813 unique peptides, and 1752 proteins. Among software of the same type, Morpheus identified 14903 PSMs, 9038 unique peptides, and 1880 proteins; Mascot identified 16648 PSMs, 10247 unique peptides, and 1975 proteins. The performance of the method of the invention is comparable to that of current software.
Fig. 1 illustrates the encoding of a peptide of length 6 for which all a, b, and y ions were detected by the mass spectrometer. Fig. 2 illustrates, for a peptide of length 6, the logical reasoning functions realized by the matrix inner product: self-proof of an ion, proof of a shorter ion by a longer ion from the same terminus (for example, the C-terminus), and handshake proof between ions from different termini (C-terminus and N-terminus, such as a-y and b-y pairs).
Embodiment 2:
Using the Null-test scheme, 12000 random proteins with sequence lengths of 100 to 1000 were generated to form the Null database and its corresponding reversed database, and the FDR was set to 20%. Under these conditions, the algorithm of the invention matched 1 peptide and passed the Null-test. Mascot matched 2 peptides and did not pass the Null-test; Morpheus and pFind each matched 6 peptides and did not pass; MaxQuant matched 33 peptides and likewise did not pass. This shows that the algorithm of the invention has a degree of intelligence and that its results are more reliable and robust.
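The construction of the Null database and its reversed counterpart can be sketched as follows. The protein count and length range come from this embodiment; sampling uniformly from the 20 standard amino acids, the fixed seed, and the function names are assumptions of the sketch, since the text only states that the sequences are random.

```python
import random
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def random_protein(rng: random.Random, min_len: int = 100, max_len: int = 1000) -> str:
    """One random sequence with a length drawn uniformly from [min_len, max_len]."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def build_null_library(n_proteins: int = 12000, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return (targets, decoys); each decoy is the reversed target sequence."""
    rng = random.Random(seed)
    targets = [random_protein(rng) for _ in range(n_proteins)]
    decoys = [sequence[::-1] for sequence in targets]
    return targets, decoys


targets, decoys = build_null_library(n_proteins=50, seed=1)
```

Because the targets are already random, their reversed copies are statistically indistinguishable from them, which is why an algorithm that reports more than one peptide at 20% FDR is judged to be over-fitting.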

Claims (2)

1. A method for analyzing protein enzymatic hydrolysates based on fuzzy evaluation and logical reasoning, being a method that analyzes the matching relationship between peptides and tandem mass (MS/MS) spectra in a protein enzymatic hydrolysate based on fuzzy evaluation and logical reasoning, characterized in that:
According to the requirements of shotgun proteomics, the protein enzymatic hydrolysate is subjected to tandem mass spectrometry (MS/MS) analysis to obtain one or more MS/MS spectra; all protein sequences in the existing target protein database (Target database) are reversed to obtain the Decoy database, and the protein sequences in both databases are digested in silico according to the procedure used to obtain the protein enzymatic hydrolysate, giving the candidate peptide sequence library; for a given MS/MS spectrum, candidate peptides are selected from the peptide sequence library by precursor ion mass, within the set mass tolerance of 0-50 ppm; if no peptide is selected, the MS/MS spectrum is invalid; if one or more peptides are selected, the spectrum is valid, the score of each peptide against the MS/MS spectrum is calculated by the scoring method below, and the highest-scoring peptide is the best-matching peptide for that spectrum; the same scoring is carried out for the other MS/MS spectra, and all best peptide-spectrum matches are sorted by score in descending order; with a preset FDR (False Discovery Rate) of 0-5%, the score cutoff value can be calculated, and peptides from the Target database whose scores exceed the cutoff value are present in the protein enzymatic hydrolysate;
The scoring method scores the degree of match between the MS/MS spectrum and a candidate peptide sequence S of the corresponding mass, determined in the protein sequence database from the precursor ion of the first-stage mass spectrum; it comprises calculating the fuzzy evaluation index w, generating the information expression matrix C, applying the matrix inner product to the information expression matrix C to obtain the information expression matrix X, and adding the total check information to the amount of complementary information p and multiplying the sum by the fuzzy evaluation index w to obtain the final score;
The fuzzy evaluation index w is calculated as follows: calculate the effective fragment-ion signal ratio index u of the MS/MS spectrum; for spectra without isotope labeling: u = (sum of the intensities of the fragment-ion peaks matched to the candidate peptide) / (sum of all peak intensities in the current spectrum); for isotope-labeled spectra: u = (sum of the intensities of the validly labeled fragment-ion peaks matched to the candidate peptide) / (sum of the intensities of the validly labeled peaks in the current spectrum); the logistic function of Formula 1 converts u into the fuzzy evaluation index w, which simulates a human analyst's assessment of spectrum quality;
Formula 1;
The information expression matrix C is generated as follows: the sequence information contained in the fragment ions of sequence S is encoded, the default scheme being: let the peptide length be n, where n is a positive integer greater than or equal to 2; the peptide sequence is arranged from top to bottom from the amino terminus (N-terminus) to the carboxyl terminus (C-terminus), in correspondence with the entries of the column vectors below;
b ions are encoded as follows: if a b1 ion is present, it corresponds to an n×1 column vector whose first entry is 1 and whose remaining entries are 0; if a b2 ion is present, the first and second entries are 1 and the remaining entries are 0; and so on, so that if a b(n-1) ion is present, entries 1 through n-1 of its length-n column vector are 1 and the remaining entry is 0;
y ions are encoded as follows: if a y1 ion is present, it corresponds to a length-n column vector whose last entry is 1 and whose remaining entries are 0; if a y2 ion is present, the last two entries are 1 and the remaining entries are 0; and so on, so that if a y(n-1) ion is present, entries 2 through n of its length-n column vector are 1 and the remaining entry is 0;
N-terminal ions of other types are encoded in the same way as b ions, and C-terminal ions of other types are encoded in the same way as y ions; finally, the column vectors obtained above are concatenated along the row direction to form the information expression matrix C;
The information expression matrix X is calculated as follows: the matrix inner product of Formula 2 is applied to the information expression matrix C to obtain the information expression matrix X;
$X = C^{T}C$ (Formula 2);
The final score is calculated as follows: all elements of X are summed and the sum is divided by the peptide length n to obtain the total check information expressed by the peptide; p denotes the number of complementary ion pairs in the spectrum and represents the amount of complementary information, p being an integer greater than or equal to 0; finally, the total check information is added to the amount of complementary information and the sum is multiplied by the fuzzy evaluation index w to obtain the algorithm's score for the match between candidate peptide S and the MS/MS spectrum;
$\mathrm{score} = w\left(\frac{1}{n}\sum_{i,j} X_{ij} + p\right)$ (Formula 3).
2. The analysis method according to claim 1, characterized in that: the inner product calculation comprises three logical reasoning functions: self-proof of an ion, proof of a shorter ion by a longer ion from the same terminus, and handshake proof between ions from different termini.
CN201510919595.5A 2015-12-13 2015-12-13 The analysis method of protein enzymatic hydrolyzate based on fuzzy discrimination and reasoning from logic Active CN106872554B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510919595.5A CN106872554B (en) 2015-12-13 2015-12-13 The analysis method of protein enzymatic hydrolyzate based on fuzzy discrimination and reasoning from logic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510919595.5A CN106872554B (en) 2015-12-13 2015-12-13 The analysis method of protein enzymatic hydrolyzate based on fuzzy discrimination and reasoning from logic

Publications (2)

Publication Number Publication Date
CN106872554A (en) 2017-06-20
CN106872554B (en) 2019-06-11

Family

ID=59177269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510919595.5A Active CN106872554B (en) 2015-12-13 2015-12-13 The analysis method of protein enzymatic hydrolyzate based on fuzzy discrimination and reasoning from logic

Country Status (1)

Country Link
CN (1) CN106872554B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103245714A (en) * 2013-03-25 2013-08-14 暨南大学 Protein secondary mass spectrum identification method of marker loci based on candidate peptide fragment discrimination
CN103852513A (en) * 2012-11-29 2014-06-11 中国科学院计算技术研究所 Method and system based on HCD mass spectrogram and ETD mass spectrogram for peptide fragment de novo sequencing
CN104034792A (en) * 2014-06-26 2014-09-10 云南民族大学 Secondary protein mass spectrum identification method based on mass-to-charge ratio error recognition capability

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Bayesian Approach to Protein Inference Problem in Shotgun Proteomics; Yong Fuga Li et al.; Journal of Computational Biology; 2009-12-31; full text *
Protein Analysis by Shotgun/Bottom-up Proteomics; Yaoyang Zhang et al.; Chemical Reviews; 2013-12-31; full text *

Also Published As

Publication number Publication date
CN106872554A (en) 2017-06-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant