CN114596912A - Short peptide omics identification method based on polypeptide length and application thereof - Google Patents

Short peptide omics identification method based on polypeptide length and application thereof

Info

Publication number
CN114596912A
Authority
CN
China
Prior art keywords
mass
polypeptide
theoretical
ion
charge ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210151752.2A
Other languages
Chinese (zh)
Other versions
CN114596912B (en)
Inventor
徐巨才
黄峻洪
陈雅君
严嘉慧
梁姚顺
陈静
刘万顺
范丽琪
江桂玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuyi University
Original Assignee
Wuyi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuyi University filed Critical Wuyi University
Priority to CN202210151752.2A priority Critical patent/CN114596912B/en
Publication of CN114596912A publication Critical patent/CN114596912A/en
Application granted granted Critical
Publication of CN114596912B publication Critical patent/CN114596912B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20Protein or domain folding
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention discloses a polypeptide length-based short peptide omics identification method and an application thereof. The method enumerates all polypeptide fragments within a specified polypeptide length range, generates the theoretical primary parent ions and secondary child ions of each fragment one by one, compares these theoretical primary parent ions and theoretical secondary child ions with the actual ion spectrum data acquired for the substances in a sample, associates each polypeptide that meets the matching requirements with the corresponding substance, and thereby identifies the polypeptide sequence structure of the substances in the sample. The method can search and identify protein enzymolysis products directly, without blind spots or omissions and without relying on a protein sequence database. Because it evaluates short peptides by complete matching, the identification results are accurate and reliable, and the shortcomings of existing proteomics analysis methods and tools can be overcome. The invention is significant for raising the level of peptidomics analysis technology and promoting the development of the industry.

Description

Short peptide omics identification method based on polypeptide length and application thereof
Technical Field
The invention relates to the field of polypeptide omics identification, in particular to a polypeptide length-based short peptide omics identification method and application thereof.
Background
After food-derived protein is converted into food-derived polypeptides by modern enzymatic hydrolysis technology, its molecular weight decreases, its digestion and absorption characteristics are greatly enhanced, and the enzymolysis products acquire various biological activities such as antioxidant, anti-aging and memory-improving effects. Food-derived polypeptides have therefore attracted wide attention from researchers, consumers and food enterprises. Notably, because most food proteases are crude enzymes with broad cleavage specificity, the enzymolysis products often contain a large amount of small-molecule short peptides, and these short peptides give the enzymolysis products stronger digestion, absorption and functional characteristics. However, the composition and distribution of short peptides in enzymolysis products have not yet been resolved, owing to the lack of efficient analytical methods and tools.
In the related art, researchers mainly analyze the polypeptide composition of food-derived protein hydrolysates with proteomics methods and software such as Mascot, MaxQuant, Sequest and Peaks. However, the objects of these proteomics methods and software are mainly proteins, not polypeptides. Because short peptides occur at high frequency across protein sequences, they often lack the specificity required for protein sequence identification and are therefore frequently overlooked by proteomic analysis methods and tools. Mascot identification results are mainly distributed among peptides of 7 or more residues, and its ability to identify short peptides is essentially absent. MaxQuant is mainly used for proteomics analysis after specific enzymolysis and performs relatively poorly on non-specific enzymatic degradation products: the analysis time is long and few short peptides appear in the results. In addition, tools such as Mascot, MaxQuant and Sequest must rely on a known protein database when performing polypeptide analysis, whereas food protein raw materials often cannot provide a complete or accurate sequence database, which greatly increases the difficulty of short peptide analysis. Peaks, as a de novo sequencing tool, can interpret polypeptide spectra without relying on a protein database, but practice shows that it yields few spectrum interpretation results and cannot meet the analysis requirements of food short peptide omics. Meanwhile, the related art contains little research on the identification and application of short peptide omics, so a complete short peptide omics identification method is urgently needed.
Disclosure of Invention
The present invention aims to solve at least one of the problems of the prior art. To this end, the invention provides a short peptide omics identification method based on polypeptide length, which can search and identify protein enzymolysis products directly, without blind spots or omissions and without relying on a protein sequence database.
The invention also provides an application of the polypeptide length-based short peptide omics identification method in polypeptide omics.
In a first aspect of the invention, a polypeptide length-based short peptide omics identification method is provided, which comprises the following steps:
S1, presetting a polypeptide fragment length range, an amino acid residue list and the secondary child ion cluster types to be analyzed, and acquiring the mass spectrum data of the primary parent ions and secondary child ions of the substances collected from the sample to be detected by mass spectrometry;
S2, presetting a primary parent ion signal response intensity threshold and a primary parent ion mass-to-charge ratio range, and screening the mass spectrum data obtained in step S1 against them to obtain the substance set C2 in the sample to be detected that meets the requirements;
S3, obtaining a polypeptide fragment length value L within the preset polypeptide fragment length range, calculating the number of all theoretical polypeptide fragments of that length, N^L, from the number N of amino acid residue types in the amino acid residue list of step S1, and obtaining the sequence set of the theoretical polypeptide fragments together with their primary theoretical parent ion mass-to-charge ratios and secondary theoretical child ion cluster mass-to-charge ratios;
S4, screening, for the substances in the substance set C2 of step S2, the candidate polypeptide fragments that meet the conditions, according to the primary theoretical parent ion mass-to-charge ratios and secondary theoretical child ion cluster mass-to-charge ratios of the theoretical polypeptide fragments;
S5, scoring the candidate polypeptide fragments, and taking the candidate polypeptide fragment with the highest score as the polypeptide fragment sequence, within the preset length range, of the substance in the sample to be detected.
According to some embodiments of the invention, the predetermined polypeptide fragment length range in step S1 is any length range, preferably 2-7.
According to some embodiments of the invention, the amino acid residues in the amino acid residue list are any non-repeating amino acid residues, and the amino acid residue numbers are numbered from 0 and are required to be unique and consecutive.
According to some embodiments of the invention, the amino acid residues may be the common 20 amino acid residues, and may also be unusual amino acid residues.
According to some embodiments of the invention, the unusual amino acid residue comprises at least one of a hydroxyproline residue and a selenocysteine residue.
According to some embodiments of the present invention, the secondary ion cluster type in step S1 may be any one or more of an a ion cluster, a b ion cluster, and a y ion cluster.
According to some embodiments of the invention, the threshold in step S2 is 0-0.02Da or the threshold is 0-40 ppm.
According to some embodiments of the invention, the secondary child ion cluster mass-to-charge ratio deviation value in step S2 is in a range of 0-0.05 Da, or the secondary child ion cluster mass-to-charge ratio deviation value is in a range of 0-80 ppm.
According to some embodiments of the invention, the method for calculating the mass-to-charge ratio of the secondary theoretical ion cluster in step S4 is as follows:
[Formula images BDA0003510866230000031 to BDA0003510866230000033, giving mz(a_k), mz(b_k) and mz(y_k), are not reproduced in this text.]
In the formulas: mz(a_k) is the mass-to-charge ratio of the a_k ion; mz(b_k) is the mass-to-charge ratio of the b_k ion; mz(y_k) is the mass-to-charge ratio of the y_k ion; L is the length value of the polypeptide fragment to be retrieved; k is the ion number, an integer from 1 to L; M(H+) is the molar mass of the hydrogen ion; M(A_j) is the molar mass of the j-th amino acid residue A_j in the polypeptide fragment; M(CO) is the molar mass of CO (carbonyl); M(H2O) is the molar mass of a water molecule.
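The three formula images referenced above did not survive in this text. A reconstruction that is consistent with the symbol definitions given here and with the standard a/b/y fragment-ion convention for singly protonated ions might read as follows. It is an assumption based on that convention, not a transcription of the original figures:

```latex
\begin{aligned}
mz(a_k) &= \sum_{j=1}^{k} M(A_j) - M(\mathrm{CO}) + M(\mathrm{H}^+) \\
mz(b_k) &= \sum_{j=1}^{k} M(A_j) + M(\mathrm{H}^+) \\
mz(y_k) &= \sum_{j=L-k+1}^{L} M(A_j) + M(\mathrm{H_2O}) + M(\mathrm{H}^+)
\end{aligned}
```

Under this reading, a and b ions sum residues from the N-terminus, y ions sum residues from the C-terminus, and every symbol in the definition list (L, k, M(H+), M(A_j), M(CO), M(H2O)) is used.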
According to some embodiments of the present invention, the specific steps of screening to obtain the qualified substance set C2 in the test sample in step S2 are as follows:
S11, according to the mass spectrum data of the primary parent ions and secondary child ion clusters of the substances collected from the sample to be detected in step S1, acquiring the charge information of each ion in the primary and secondary mass spectra and performing ion normalization to obtain the primary parent ion mass-to-charge ratio and secondary child ion cluster mass-to-charge ratios of each substance;
S12, filtering the substances of step S11 by primary parent ion signal response intensity to obtain the substance set C1 whose intensity is higher than the primary parent ion signal response intensity threshold of step S2;
S13, filtering the substance set C1 of step S12 by the primary parent ion mass-to-charge ratio range of step S2 to obtain the substance set C2 whose primary parent ion mass-to-charge ratio meets the range requirement.
According to some embodiments of the invention, the ion normalization process refers to converting multiply charged ions in a mass spectrum into ions with a unit positive charge by mass-to-charge ratio calculation, wherein ions of unknown charge are treated as singly charged ions by default.
According to some embodiments of the invention, the conversion in the ion normalization process is calculated as follows:
for a substance carrying z positive charges with a measured mass-to-charge ratio X, the mass-to-charge ratio of the corresponding ion with a unit positive charge is (X × z - z × H + H)/1, where H is the molar mass of a hydrogen ion with a unit positive charge.
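The conversion above can be sketched in a few lines of Python. The function name and the numeric proton mass (1.007276 Da) are assumptions introduced for illustration; the source only specifies the expression and the default of treating unknown charges as single charges.

```python
from typing import Optional

# Assumed monoisotopic mass of a proton (H+) in Da; the source only calls it "H".
PROTON_MASS = 1.007276


def to_singly_charged(mz: float, z: Optional[int]) -> float:
    """Convert an observed m/z at charge z to the m/z of the singly charged ion.

    Implements (X * z - z * H + H) / 1. Ions of unknown charge (None or < 1)
    are treated as singly charged, as the text specifies.
    """
    if z is None or z < 1:
        z = 1
    return mz * z - z * PROTON_MASS + PROTON_MASS


if __name__ == "__main__":
    # A doubly charged ion observed at m/z 500.0
    print(round(to_singly_charged(500.0, 2), 6))
```

For z = 1 the expression reduces to X, so singly charged and unknown-charge ions pass through unchanged.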
According to some embodiments of the invention, the primary parent ion signal response intensity threshold is 3 times or more the instrument noise intensity; the mass-to-charge ratio range of the primary parent ions is 70-2000 Da.
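As a minimal sketch of the two filters that produce C1 and C2, assuming each substance is held in a small record with a normalized parent ion m/z and an intensity (the `Substance` class, its field names and `screen_substances` are hypothetical, not part of the source):

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Substance:
    precursor_mz: float              # singly charged primary parent ion m/z
    intensity: float                 # primary parent ion signal response intensity
    fragment_mzs: List[float] = field(default_factory=list)


def screen_substances(
    substances: List[Substance],
    noise: float,
    snr: float = 3.0,
    mz_range: Tuple[float, float] = (70.0, 2000.0),
) -> Tuple[List[Substance], List[Substance]]:
    """Return (C1, C2): intensity-filtered set, then m/z-range-filtered set."""
    threshold = snr * noise          # threshold is at least 3x the instrument noise
    c1 = [s for s in substances if s.intensity > threshold]
    low, high = mz_range
    c2 = [s for s in c1 if low <= s.precursor_mz <= high]
    return c1, c2


if __name__ == "__main__":
    subs = [Substance(308.09, 900.0), Substance(50.0, 900.0), Substance(500.0, 100.0)]
    c1, c2 = screen_substances(subs, noise=100.0)
    print(len(c1), len(c2))
```

The defaults mirror the values stated in the text (3 times the noise, 70-2000 Da); a larger `snr` can be passed when a stricter intensity threshold is wanted.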
According to some embodiments of the invention, the screening of the candidate polypeptide fragment in step S4 comprises the steps of:
S21, obtaining a not-yet-searched length value L within the preset polypeptide fragment length range of step S1;
S22, initializing the polypeptide coding value U = 0;
S23, converting the polypeptide coding value U to base N, where N is the number of amino acid residue types, and expressing the result with L digits to obtain a sequence codon X;
S24, associating the digit at each position of the sequence codon X with the amino acid residue number, and translating the sequence codon X into a theoretical polypeptide fragment P;
S25, obtaining the primary theoretical parent ion mass-to-charge ratio Z and the secondary theoretical child ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P;
S26, calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in the substance set C2 and the primary theoretical parent ion mass-to-charge ratio Z of the theoretical polypeptide fragment P, denoted E_1, obtaining the substance set F1 whose absolute value meets the set threshold, and judging whether the substance set F1 is empty:
if the substance set F1 is empty, obtaining the next theoretical polypeptide fragment from the sequence set of theoretical polypeptide fragments of step S3, that is, increasing the polypeptide coding value by 1;
if the substance set F1 is not empty, comparing the secondary child ion cluster mass-to-charge ratios of the substances in the substance set F1 one by one with the secondary theoretical child ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P, calculating the absolute value of each difference, obtaining the substance set F2 to be identified whose absolute values fall within the secondary child ion cluster mass-to-charge ratio deviation range, and judging whether the substance set F2 is empty:
if the substance set F2 is not empty, marking the theoretical polypeptide fragment P as a candidate polypeptide fragment, i.e. a candidate identification result, for the substances in the substance set F2, and then obtaining the next theoretical polypeptide fragment from the sequence set of theoretical polypeptide fragments of step S3, that is, increasing the polypeptide coding value by 1; if the substance set F2 is empty, proceeding directly to the next theoretical polypeptide fragment, that is, increasing the polypeptide coding value by 1;
S27, repeating steps S22-S26 until the polypeptide coding value U equals the number of all possible polypeptides of that length, i.e. U = N^L;
S28, obtaining the next not-yet-searched polypeptide fragment length value, and repeating steps S21-S27 until all polypeptide fragment length values within the preset polypeptide fragment length range of step S1 have been searched.
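The base-N encoding in steps S22-S24 can be illustrated with a short Python sketch. The function names and the one-letter residue codes are assumptions for illustration; the source only requires that residues be numbered from 0 and that U be written as an L-digit base-N number.

```python
from typing import Iterator, Sequence


def decode_peptide(u: int, length: int, residues: Sequence[str]) -> str:
    """Translate coding value U into a peptide of the given length.

    U is written as an L-digit base-N number (N = number of residue types),
    and each digit is looked up in the zero-based residue list.
    """
    n = len(residues)
    digits = []
    for _ in range(length):
        u, d = divmod(u, n)      # peel off the least significant base-N digit
        digits.append(d)
    digits.reverse()             # most significant digit first
    return "".join(residues[d] for d in digits)


def enumerate_peptides(length: int, residues: Sequence[str]) -> Iterator[str]:
    """Yield all N**L theoretical peptides of one length, one at a time."""
    for u in range(len(residues) ** length):
        yield decode_peptide(u, length, residues)


if __name__ == "__main__":
    print(list(enumerate_peptides(2, ["A", "G"])))
```

Because each peptide is derived from a single integer, the search never has to hold the full sequence set in memory, and a range of U values can be handed to a separate worker if the search is parallelized.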
According to some embodiments of the invention, the set threshold in step S26 is 0-0.02Da or the set threshold is 0-40 ppm.
According to some embodiments of the invention, the secondary child ion cluster mass-to-charge ratio deviation value in step S26 is in a range of 0-0.05 Da, or the secondary child ion cluster mass-to-charge ratio deviation value is in a range of 0-80 ppm.
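A tolerance check that accepts either unit might look like the sketch below. The function name and signature are hypothetical, and taking the theoretical m/z as the reference for the ppm conversion is an assumption, since the text does not say which value is used.

```python
def within_tolerance(observed: float, theoretical: float,
                     tol: float, unit: str = "Da") -> bool:
    """Return True if |observed - theoretical| is within the tolerance.

    unit="Da"  : tol is an absolute mass-to-charge deviation.
    unit="ppm" : tol is relative to the theoretical m/z (assumed reference).
    """
    delta = abs(observed - theoretical)
    if unit == "Da":
        return delta <= tol
    if unit == "ppm":
        return delta <= abs(theoretical) * tol * 1e-6
    raise ValueError("unit must be 'Da' or 'ppm'")


if __name__ == "__main__":
    # Parent ion check at 0.02 Da, child ion check at 80 ppm
    print(within_tolerance(500.01, 500.0, 0.02))
    print(within_tolerance(500.03, 500.0, 80, unit="ppm"))
```

At m/z 500, 40 ppm corresponds to 0.02 Da, so the two parent ion limits quoted in the text coincide there; below m/z 500 the ppm limit is the tighter of the two.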
According to some embodiments of the invention, the scoring method in step S5 includes two kinds:
The first score is calculated as follows: the secondary child ion cluster types of the candidate polypeptide fragment are compared with those of the actual substance in the sample to be detected, and 10 points are awarded for each complete ion cluster detected; the score is denoted S_A, and by default the candidate polypeptide with the highest score in the output is selected as the final identification result.
The second score is calculated as follows: the number of secondary child ion types of the substance in the sample to be detected that match the candidate polypeptide fragment is recorded as N_M; the mean of the absolute deviations between the secondary child ion mass-to-charge ratios of the substance and the corresponding secondary theoretical child ion mass-to-charge ratios of the candidate polypeptide fragment is calculated and denoted E_2; the second score S_B is then calculated by the following formula:
[Formula image BDA0003510866230000051, giving S_B, is not reproduced in this text.]
where C is the number of theoretical secondary child ion cluster types and L is the length value of the polypeptide.
According to some embodiments of the invention, the scoring output of step S5 is sorted in the priority order S_A > S_B > E_1 > E_2, where a higher S_A or S_B indicates higher confidence in the result and a smaller error E_1 or E_2 indicates higher confidence; by default, the candidate polypeptide ranked first in this ordering is selected as the final preferred identification.
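The priority ordering described here maps naturally onto a multi-key sort. The following Python sketch assumes the four quantities have already been computed for each candidate; the `Candidate` record and `rank_candidates` function are hypothetical names, and the numbers in the usage example are made up purely to show the tie-breaking.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    sequence: str
    s_a: float   # first score, higher is better
    s_b: float   # second score, higher is better
    e1: float    # parent ion deviation, lower is better
    e2: float    # mean child ion deviation, lower is better


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by S_A, then S_B (both descending), then E_1, then E_2 (both ascending)."""
    return sorted(candidates, key=lambda c: (-c.s_a, -c.s_b, c.e1, c.e2))


if __name__ == "__main__":
    ranked = rank_candidates([
        Candidate("CEG", 20, 0.50, 0.004, 0.010),
        Candidate("ECG", 20, 0.75, 0.004, 0.006),
        Candidate("EGC", 10, 0.90, 0.001, 0.002),
    ])
    print([c.sequence for c in ranked])
```

The first element of the returned list corresponds to the default final identification; later keys only matter when all earlier keys tie.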
According to some embodiments of the present invention, in the first score calculating method, the secondary ion cluster type may be any one of a, b, or y ion clusters or a combination thereof.
In a second aspect of the invention, the application of a polypeptide length-based short peptide omics identification method in the field of polypeptide omics or proteomics is provided.
According to some embodiments of the invention, at least the following benefits are achieved:
(1) The polypeptide length-based short peptide omics identification method enumerates, retrieves and matches polypeptides within a specified search range without relying on any protein database, achieving unbiased and omission-free short peptide retrieval and matching, greatly improving the efficiency and throughput of short peptide identification, and significantly lightening the workload of researchers.
(2) The polypeptide length-based short peptide omics identification method can search short peptides efficiently and, if necessary, long peptides as well; it supports parallel computation and is highly flexible and extensible.
(3) The polypeptide length-based short peptide omics identification method can search short peptides without limit by one-by-one enumeration, which greatly reduces the requirements and load on computer hardware and improves working efficiency and simplicity.
(4) The polypeptide length-based short peptide omics identification method evaluates identification results by complete matching, so the identification results are reliable and accurate.
Drawings
The invention is further described with reference to the following figures and examples, in which:
FIG. 1 is a flow chart of short peptide identification according to an embodiment of the present invention.
FIG. 2 is a mass spectrum of the identification result of the sample in example 1 of the present invention.
FIG. 3 is a statistical chart of the identification results of the samples in example 2 of the present invention.
FIG. 4 is a statistical chart of the results of identification of the sample in comparative example 1 of the present invention.
Detailed Description
The concept and technical effects of the present invention will be clearly and completely described below in conjunction with the embodiments to fully understand the objects, features and effects of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and those skilled in the art can obtain other embodiments without inventive effort based on the embodiments of the present invention, and all embodiments are within the protection scope of the present invention.
To fully demonstrate the novelty and advancement of the present invention, the following examples and comparative examples were all verified with high-resolution liquid chromatography-mass spectrometry data of the same samples, namely a glutathione standard or a soy protein hydrolysate (prepared with Alcalase, a broad-spectrum alkaline protease), acquired on an ACQUITY UPLC I-Class (Waters, USA) coupled to an ESI-Q-TOF mass spectrometer (Bruker, Germany). The mobile phases for liquid-phase separation were mobile phase A (acetonitrile) and mobile phase B (water containing 0.1% formic acid by volume). The gradient elution program was: 0-60 min, volume fraction of mobile phase B from 95% to 60%; 60-64 min, from 60% to 95%; 64-70 min, held at 95%. The column was an HSS T3 (1.0 mm × 100 mm, 1.8 μm, 100 Å; Waters, USA), the injection volume was 1 μL, the flow rate was 0.05 mL/min, and the column temperature was 40 °C. The ESI-Q-TOF mass spectrometer was operated in positive ion scanning mode, with Auto MS/MS automatic secondary detection of the top 3 parent ions. The ion detection range was 50-1200 m/z, and the mass spectrum acquisition rate in LC-MS/MS analysis was 10 Hz.
Examples
This embodiment discloses a polypeptide length-based short peptide omics identification method which, as shown in FIG. 1, comprises the following specific steps:
(1) Presetting the polypeptide fragment length range, the amino acid residue list and the secondary child ion cluster types, and obtaining the mass spectrum data of the primary parent ions and secondary child ions of the substances collected from the sample to be detected by mass spectrometry.
The preset polypeptide fragment length range is any length range, preferably 2-7. The amino acid residues in the amino acid residue list are any non-repeating amino acid residues, such as the 20 common amino acid residues; the residues are numbered from 0, and the numbers must be unique and consecutive. The secondary child ion cluster types include any one or more of the a ion cluster, the b ion cluster and the y ion cluster.
(2) Screening according to the preset primary parent ion signal response intensity threshold and primary parent ion mass-to-charge ratio range to obtain the substance set in the sample to be detected that meets the requirements (hereinafter referred to as substance set C2). The specific screening steps are as follows:
1) According to the mass spectrum data of the primary parent ions and secondary child ions of the substances obtained in step (1), acquiring the charge information of each ion in the primary and secondary mass spectra and performing ion normalization to obtain the primary parent ion mass-to-charge ratio and secondary child ion mass-to-charge ratios of each substance. Ion normalization refers to converting multiply charged ions in a mass spectrum into ions with a unit positive charge by mass-to-charge ratio calculation, wherein ions of unknown charge are treated as singly charged ions by default. The conversion is calculated as follows: for a substance carrying z positive charges with a measured mass-to-charge ratio X, the mass-to-charge ratio of the corresponding ion with a unit positive charge is (X × z - z × H + H)/1, where H is the molar mass of a hydrogen ion with a unit positive charge.
2) Filtering the substances by primary parent ion signal response intensity, with the threshold set to 3 times or more the instrument noise intensity, to obtain the substance set whose intensity is higher than the threshold (hereinafter referred to as substance set C1);
3) Filtering the substance set C1 from step 2) by the primary parent ion mass-to-charge ratio range, which is 70-2000 Da, to obtain the substance set C2 whose primary parent ion mass-to-charge ratio meets the range requirement.
(3) Obtaining a polypeptide fragment length value L within the preset polypeptide fragment length range, calculating the number of all theoretical polypeptide fragments of that length, N^L, from the number N of amino acid residue types in the amino acid residue list of step (1), and obtaining the sequence set of theoretical polypeptide fragments together with their primary theoretical parent ion mass-to-charge ratios and secondary theoretical child ion cluster mass-to-charge ratios. The secondary theoretical child ion cluster mass-to-charge ratios are calculated as follows:
[Formula images BDA0003510866230000081 to BDA0003510866230000083, giving mz(a_k), mz(b_k) and mz(y_k), are not reproduced in this text.]
In the formulas: mz(a_k) is the mass-to-charge ratio of the a_k ion; mz(b_k) is the mass-to-charge ratio of the b_k ion; mz(y_k) is the mass-to-charge ratio of the y_k ion; L is the length value of the polypeptide fragment to be retrieved; k is the ion number, an integer from 1 to L; M(H+) is the molar mass of the hydrogen ion; M(A_j) is the molar mass of the j-th amino acid residue A_j in the polypeptide fragment; M(CO) is the molar mass of CO (carbonyl); M(H2O) is the molar mass of a water molecule.
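To make the symbol definitions concrete, the sketch below computes theoretical parent and child ion m/z values in Python. It assumes the standard singly protonated a/b/y conventions (a_k and b_k sum the first k residues, y_k sums the last k residues plus water) because the formula images are not available in this text. The monoisotopic masses, the reduced residue table and all function names are illustrative assumptions, and the parent ion expression is likewise the conventional one rather than a formula quoted from the source.

```python
from typing import Dict, List, Sequence

# Assumed monoisotopic masses in Da (small illustrative subset of residues).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "C": 103.00919, "E": 129.04259,
}
M_H = 1.007276      # H+
M_CO = 27.994915    # CO
M_H2O = 18.010565   # H2O


def precursor_mz(peptide: str) -> float:
    """Singly protonated parent ion: all residues + H2O + H+ (assumed convention)."""
    return sum(RESIDUE_MASS[r] for r in peptide) + M_H2O + M_H


def fragment_mzs(peptide: str,
                 ion_types: Sequence[str] = ("a", "b", "y")) -> Dict[str, List[float]]:
    """Return a_k, b_k and y_k for k = 1..L, keyed by ion type."""
    masses = [RESIDUE_MASS[r] for r in peptide]
    length = len(masses)
    out: Dict[str, List[float]] = {t: [] for t in ion_types}
    for k in range(1, length + 1):
        prefix = sum(masses[:k])            # residues 1..k
        suffix = sum(masses[length - k:])   # residues L-k+1..L
        if "a" in out:
            out["a"].append(prefix - M_CO + M_H)
        if "b" in out:
            out["b"].append(prefix + M_H)
        if "y" in out:
            out["y"].append(suffix + M_H2O + M_H)
    return out


if __name__ == "__main__":
    # E-C-G has the residue composition of glutathione, used in the examples.
    print(round(precursor_mz("ECG"), 4))
    print([round(v, 4) for v in fragment_mzs("ECG")["y"]])
```

With k running up to L, y_L equals the parent ion m/z, which gives a quick internal consistency check on the residue table.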
(4) And (3) screening candidate polypeptide fragments meeting the conditions in the substance collection C2 in the step (2) according to the mass-to-charge ratio of the theoretical polypeptide fragments of the primary theoretical parent ions and the mass-to-charge ratio of the secondary theoretical ionic clusters, wherein the screening of the candidate polypeptide fragments comprises the following steps:
1) obtaining an unresearched length value L within the preset length range of the polypeptide fragment in the step (1);
2) initializing the polypeptide coding value U-0, exemplified by a polypeptide fragment;
3) converting the polypeptide coding value U to base N, where N is the number of amino acid residue types, and writing the result with L digits to obtain a sequence codon X;
4) associating the digit at each position of the sequence codon X with the amino acid residue number, and translating the sequence codon X into a theoretical polypeptide fragment P;
5) obtaining the primary theoretical parent ion mass-to-charge ratio Z and the secondary theoretical daughter ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P;
6) calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in the substance set C2 and the primary theoretical parent ion mass-to-charge ratio Z of the theoretical polypeptide fragment P, recorded as E_1; obtaining the substance set F1 whose E_1 falls within the threshold of 0-0.02 Da or 0-40 ppm; and determining whether the substance set F1 is empty:
if it is empty, the next polypeptide fragment is enumerated, i.e., the polypeptide coding value is increased by 1;
if it is not empty, the secondary daughter ion cluster mass-to-charge ratios of the substances in the substance set F1 are compared one by one with the secondary theoretical daughter ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P; the absolute value of each difference is calculated; the substance set F2 to be identified, whose absolute values fall within the secondary daughter ion cluster mass-to-charge ratio deviation range of 0-0.05 Da or 0-80 ppm, is obtained; and it is determined whether the substance set F2 is empty:
if the substance set F2 is not empty, the theoretical polypeptide fragment P is marked as a candidate polypeptide fragment (i.e., a candidate identification result) for the substance set F2, and the next theoretical polypeptide fragment is enumerated, i.e., the polypeptide coding value is increased by 1; if the substance set F2 is empty, the next theoretical polypeptide fragment is enumerated directly, i.e., the polypeptide coding value is increased by 1;
7) repeating steps 2)-6) until the polypeptide coding value U equals the number of all possible polypeptides of that length, i.e., U = N^L;
8) acquiring the next unretrieved length value, and repeating steps 1)-7) until all length values have been retrieved.
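The enumeration in steps 2) to 4) and 7) can be illustrated with a minimal Python sketch. The residue ordering string `RESIDUES` is an assumption (20 common residues in ascending residue mass, with the Leu/Ile tie broken arbitrarily); the ordering actually used in Table 1 of the patent may differ. The function names and the choice to put the most significant digit at the first position are also illustrative, not taken from the patent.

```python
# Assumed residue order (ascending residue mass); Table 1 of the patent may differ.
RESIDUES = "GASPVTCLINDQKEMHFRYW"


def encode_to_codon(u, n, length):
    """Write the coding value u in base n using exactly `length` digits.

    Digits are returned most significant first (an assumption).
    """
    digits = []
    for _ in range(length):
        u, remainder = divmod(u, n)
        digits.append(remainder)
    return digits[::-1]


def codon_to_peptide(codon, residues=RESIDUES):
    """Translate each digit of the codon into the residue with that number."""
    return "".join(residues[d] for d in codon)


def enumerate_peptides(length, residues=RESIDUES):
    """Yield every theoretical peptide of the given length, U = 0 .. N^L - 1."""
    n = len(residues)
    for u in range(n ** length):
        yield codon_to_peptide(encode_to_codon(u, n, length), residues)
```

Under these assumptions, the coding value 0 at length 3 translates to "GGG", matching the "000" example given in the embodiments.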
(5) Scoring the candidate polypeptide fragments according to the matching degree of the secondary ion clusters, and selecting the candidate polypeptide fragment with the highest score as the experimental spectrum identification result of the sample to be detected. Two scoring methods are used:
The first scoring method is as follows: the candidate polypeptide fragment is compared with the secondary ion cluster types of the actual substance in the sample to be detected, and 10 points are awarded for each complete ion cluster detected. The score is recorded as S_A, and by default the candidate identification polypeptide with the highest score in the output is selected as the final identification result.
The second scoring method is as follows: the number of secondary ion types of the substance in the sample to be detected that match the candidate polypeptide fragment is recorded as N_M; the mean absolute deviation between the secondary ion mass-to-charge ratios of the substance in the sample to be detected and the corresponding secondary theoretical ion mass-to-charge ratios of the candidate polypeptide fragment is calculated and recorded as E_2; the second score S_B is calculated as follows:
Figure BDA0003510866230000091
where C is the number of theoretical secondary ion cluster types and L is the polypeptide length value.
The output results are sorted by the priority order S_A > S_B > E_1 > E_2, where a higher S_A or S_B score indicates higher confidence in the result and a smaller E_1 or E_2 error indicates higher confidence in the result. By default, the top-ranked candidate identification polypeptide is selected as the final preferred identification result.
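The priority ordering S_A > S_B > E_1 > E_2 amounts to a multi-key sort. The sketch below is one way to express it in Python; the `Candidate` record and its field names are hypothetical, and any values passed to it are for illustration only.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one candidate identification result."""
    sequence: str
    s_a: float  # first score, higher is better
    s_b: float  # second score, higher is better
    e1: float   # precursor m/z error, smaller is better
    e2: float   # mean fragment m/z error, smaller is better


def rank_candidates(candidates):
    """Sort by S_A, then S_B (both descending), then E_1, then E_2 (ascending)."""
    return sorted(candidates, key=lambda c: (-c.s_a, -c.s_b, c.e1, c.e2))


def preferred(candidates):
    """Return the top-ranked candidate, i.e. the default preferred result."""
    return rank_candidates(candidates)[0]
```

When several candidates tie on S_A, the sort falls through to S_B, and only then to the two error terms.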
Example 1
In this embodiment, the analysis target sample is a glutathione standard, and the short peptide omics identification based on the length of the candidate polypeptide fragment comprises the following specific steps:
(1) First, according to the length range of the short peptides to be measured, the polypeptide fragment length range is preset to 2-4; the amino acid residue list contains the 20 common residues; and the preset secondary ion cluster types are the a, b and y ion clusters. The amino acid residues are numbered consecutively from 0 in ascending order of residue molecular weight, so that each amino acid residue is assigned a unique, consecutive number, as shown in Table 1.
Table 1:
Figure BDA0003510866230000101
Figure BDA0003510866230000111
Table 1 lists the common amino acid residues; uncommon amino acid residues can be added to the list according to the actual analysis requirements.
(2) Raw mass spectrometry data are acquired for the substances in the glutathione standard sample to be detected. The raw data comprise the mass spectra of the primary parent ions and the secondary daughter ions. The charge information of each ion in the primary and secondary mass spectra is then obtained, and each ion is subjected to ion normalization. Ion normalization means converting multiply charged ions in the mass spectra into ions carrying a unit positive charge by mass-to-charge ratio calculation; ions of unknown charge are treated as singly charged by default.
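The patent does not spell out the conversion formula used for ion normalization. A minimal sketch, assuming positive-mode ions whose charge comes from added protons, is shown below; the function name and the proton mass constant are choices made for this illustration.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (assumed charge carrier)


def to_singly_charged(mz, charge=None):
    """Convert an observed m/z at charge z to the equivalent m/z at z = 1.

    Ions with unknown charge (None or 0) are treated as singly charged,
    following the default described in the text.
    """
    z = charge if charge else 1
    # Neutral mass is mz*z - z*PROTON_MASS; adding one proton back gives [M+H]+.
    return mz * z - (z - 1) * PROTON_MASS
```

For example, a doubly charged ion observed at m/z 500.0 maps to about 998.9927 under this formula, while an ion of unknown charge is returned unchanged.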
The substances are then filtered by the signal response intensity of the primary parent ions. The primary parent ion signal intensity threshold is 1000 counts (about 50 times the instrument noise), the signal response intensity is taken as absolute intensity, and the substance set C1, whose primary parent ion signal response intensity is above the threshold, is retained. The substance set C1 is further filtered by the primary parent ion mass-to-charge ratio range, which is set to 70-1200 Da in this embodiment. Only substances whose primary parent ion mass-to-charge ratio falls within this range are retained, giving the sample substance set C2 to be detected.
(3) An unretrieved length value, 3, is obtained from the preset polypeptide fragment length range. From the number of amino acid residue types (20) and the polypeptide fragment length value (3), the number of all polypeptide fragments of this length is calculated as 20^3 = 8000, and the sequence set of theoretical polypeptide fragments is obtained together with their primary theoretical parent ion mass-to-charge ratios and secondary theoretical daughter ion cluster mass-to-charge ratios.
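The size of the search space follows directly from N^L. As a small illustration (the function name is hypothetical), the number of theoretical fragments per length over a preset length range can be tabulated as follows:

```python
def search_space(n_residues, min_len, max_len):
    """Return {length: number of theoretical peptides} for each length in range."""
    return {length: n_residues ** length for length in range(min_len, max_len + 1)}


# Length range 2-4 with 20 residue types, as preset in this embodiment.
sizes = search_space(20, 2, 4)
total = sum(sizes.values())
```

With 20 residue types, each additional residue multiplies the number of theoretical fragments by 20, which is why the method is aimed at short peptides.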
(4) Candidate polypeptide fragments meeting the conditions are screened from the sample substances to be detected according to the primary theoretical parent ion mass-to-charge ratio and the secondary theoretical daughter ion cluster mass-to-charge ratios of the theoretical polypeptide fragments. The specific steps are as follows:
Step 1: obtaining an unretrieved length value (for example, a length value of 3) within the preset polypeptide fragment length range;
Step 2: initializing the polypeptide coding value U = 0 and starting to enumerate polypeptide fragments;
Step 3: converting the polypeptide coding value U to base 20 according to the number of amino acid residue types (20), and writing the result with 3 digits to obtain a sequence codon X (for example, '000');
Step 4: associating the digit at each position of the sequence codon X with the amino acid residue number, and translating the sequence codon X into a theoretical polypeptide fragment (for example, translating the sequence codon X '000' into 'G-G-G', namely 'Gly-Gly-Gly');
Step 5: obtaining the primary theoretical parent ion mass-to-charge ratio and the set of secondary theoretical daughter ion cluster (a, b or y ion cluster) mass-to-charge ratios of the theoretical polypeptide fragment ('Gly-Gly-Gly');
Step 6: calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in the substance set C2 from the mass spectrum of the glutathione standard sample to be detected and the primary theoretical parent ion mass-to-charge ratio of the theoretical polypeptide fragment ('Gly-Gly-Gly'), recorded as E_1; obtaining the substance set F1 whose E_1 is smaller than the threshold of 0.005 Da; and determining whether the substance set F1 is empty:
if it is empty, the next theoretical polypeptide fragment is enumerated, i.e., the polypeptide coding value is increased by 1, giving U = 1;
if it is not empty, the secondary daughter ion cluster mass-to-charge ratios of the substances in the substance set F1 are compared one by one with the secondary theoretical daughter ion cluster mass-to-charge ratios of the theoretical polypeptide fragment ('Gly-Gly-Gly'); if the deviation of the secondary ion cluster mass-to-charge ratio is less than 0.02 Da, the theoretical polypeptide fragment is marked as a candidate identification result for the substance, i.e., a candidate polypeptide fragment; the next theoretical polypeptide fragment is then enumerated, i.e., the polypeptide coding value is increased by 1, giving U = 1;
Step 7: repeating steps 2-6 until the polypeptide coding value U equals the number of all possible polypeptides of that length, i.e., U = 20^3 = 8000;
Step 8: acquiring the next unretrieved length value, and repeating steps 1-7 until all length values have been retrieved.
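Step 5 requires theoretical mass-to-charge ratios for each enumerated fragment. The patent does not list the mass constants or formulas it uses, so the following is a sketch based on standard monoisotopic residue masses and the usual definitions of singly charged a, b and y ions; the function names are hypothetical, and generating ions for every index from 1 to L is an assumption.

```python
# Standard monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # H+
CO = 27.994915      # carbon monoxide, lost from b ions to give a ions


def precursor_mz(peptide):
    """Singly protonated parent ion [M+H]+ of an unmodified linear peptide."""
    return sum(MONO[r] for r in peptide) + WATER + PROTON


def fragment_mz(peptide):
    """Singly charged a, b and y ions for indices 1..L, keyed like 'b2'."""
    ions = {}
    prefix = 0.0
    for i, residue in enumerate(peptide, start=1):
        prefix += MONO[residue]
        ions[f"b{i}"] = prefix + PROTON
        ions[f"a{i}"] = prefix + PROTON - CO
    suffix = 0.0
    for i, residue in enumerate(reversed(peptide), start=1):
        suffix += MONO[residue]
        ions[f"y{i}"] = suffix + WATER + PROTON
    return ions
```

Under these assumptions, 'GGG' gives a parent ion near m/z 190.0822 and 'ECG' one near m/z 308.0911.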
(5) The candidate polypeptide fragments are scored according to the matching degree of the secondary ion clusters, and the candidate polypeptide fragment with the highest score is selected as the experimental spectrum identification result of the sample to be detected. Two scoring methods are used:
The first scoring method is as follows: the candidate polypeptide fragment is compared with the secondary ion cluster types of the actual substance in the sample to be detected, where the secondary ion cluster type may be any one or a combination of the a, b and y ion clusters. 10 points are awarded when one complete ion cluster is detected, 20 points when two complete ion clusters are detected, and so on. The score is recorded as S_A, and by default the candidate identification polypeptide with the highest score in the output is selected as the final identification result.
The second scoring method is as follows: the number of secondary ions of the substance in the sample to be detected that match the candidate polypeptide fragment is recorded as N_M; the mean absolute deviation between the secondary ion mass-to-charge ratios of the substance in the sample to be detected and the corresponding secondary theoretical ion mass-to-charge ratios of the candidate polypeptide fragment is calculated and recorded as E_2; the second score S_B is calculated as follows:
Figure BDA0003510866230000131
where C is the number of secondary theoretical ion cluster types and L is the polypeptide length value.
The output results are sorted by the priority order S_A > S_B > E_1 > E_2, where a higher S_A or S_B score indicates higher confidence in the result and a smaller E_1 or E_2 error indicates higher confidence in the result. By default, the top-ranked candidate identification polypeptide is selected as the final preferred identification result.
Specific output results are shown in table 2.
Table 2:
Figure BDA0003510866230000132
Figure BDA0003510866230000141
In Table 2, the substance source refers to the spectrum number assigned during data acquisition on the mass spectrometer.
As can be seen from Table 2, the S_A score of the candidate polypeptide fragment "ECG" is the same as that of the other candidates, but its S_B score is 78. The output results are sorted by the priority order S_A > S_B > E_1 > E_2, where a higher S_A or S_B score indicates higher confidence and a smaller E_1 or E_2 error indicates higher confidence, and the top-ranked candidate is selected by default as the final preferred identification result. Because the S_A scores are tied in this embodiment, the ranking is decided by S_B: the candidate polypeptide fragment "ECG" has the highest S_B score, 78, and is therefore the final preferred identification result.
FIG. 2 shows the ion cluster identification result, in which a1 and a2 are a ion cluster ions; b1, b2 and b3 are b ion cluster ions; and y2 and y3 are y ion cluster ions. As can be seen from FIG. 2, one complete b ion cluster (b1, b2 and b3 ions) was detected in this example, giving an S_A score of 10 and an S_B score of 78. The candidate polypeptide fragment is identified as ECG, i.e., the amino acid sequence of the polypeptide in the sample to be detected is identified as glutamic acid-cysteine-glycine, which is consistent with the actual sequence of the sample. FIG. 2 thus shows that the polypeptide length-based short peptide omics identification method of the present invention identified the short peptide correctly.
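The S_A value reported for FIG. 2 can be reproduced with a short sketch. It assumes that a "complete" ion cluster means every ion of that type from index 1 to L was detected, which is consistent with the description of FIG. 2 but is not defined explicitly in the text; the function name and ion labels are illustrative.

```python
def score_sa(matched_ions, length, cluster_types=("a", "b", "y"), points=10):
    """Award `points` for each ion type whose ions 1..length were all matched.

    Assumption: a cluster counts as complete only if none of its ions is missing.
    """
    matched = set(matched_ions)
    score = 0
    for ion_type in cluster_types:
        if all(f"{ion_type}{i}" in matched for i in range(1, length + 1)):
            score += points
    return score


# Ions reported as detected in FIG. 2 for the length-3 candidate "ECG".
fig2_ions = ["a1", "a2", "b1", "b2", "b3", "y2", "y3"]
s_a = score_sa(fig2_ions, length=3)
```

Only the b cluster is complete in this ion list, so the sketch yields 10 points, in line with the S_A score stated above.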
Example 2
In the embodiment, the analysis target sample is a soybean enzymolysis product, and the short peptide omics identification based on the length of the candidate polypeptide fragment comprises the following specific steps:
(1) First, the polypeptide fragment length identification range is preset to 2-6, and the amino acid residue list contains the 20 common residues. The amino acid residues are numbered consecutively from 0 in ascending order of residue molecular weight, so that each amino acid residue is assigned a unique, consecutive number, as shown in Table 1 of Example 1.
(2) Raw mass spectrometry data are acquired for the substances in the soybean enzymatic hydrolysate to be detected. The raw data comprise the mass spectra of the primary parent ions and the secondary daughter ions. The charge information of each ion in the primary and secondary mass spectra is then obtained, and each ion is subjected to ion normalization. Ion normalization means converting multiply charged ions in the mass spectra into ions carrying a unit positive charge by mass-to-charge ratio calculation; ions of unknown charge are treated as singly charged by default.
The substances are then filtered by the signal response intensity of the primary parent ions. The primary parent ion signal intensity threshold is 500 counts (i.e., more than 3 times the instrument noise), the signal response intensity is taken as absolute intensity, and the substance set C1, whose primary parent ion signal response intensity is above the threshold, is retained. The substance set C1 is further filtered by the primary parent ion mass-to-charge ratio range, which is set to 70-1500 Da in this embodiment. Only substances whose primary parent ion mass-to-charge ratio falls within this range are retained, giving the sample substance set C2 to be detected.
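The two-stage filtering that produces C1 and C2 can be sketched as follows, using the thresholds of this embodiment as defaults. The dictionary keys `intensity` and `precursor_mz` are hypothetical field names chosen for the illustration; the patent does not prescribe a data structure.

```python
def filter_substances(substances, min_intensity=500.0, mz_range=(70.0, 1500.0)):
    """Return (C1, C2): intensity-filtered set, then m/z-range-filtered set.

    Each substance is a dict with hypothetical keys 'intensity' (absolute
    counts of the primary parent ion) and 'precursor_mz' (after normalization).
    """
    # C1: primary parent ion signal above the intensity threshold.
    c1 = [s for s in substances if s["intensity"] > min_intensity]
    # C2: members of C1 whose parent ion m/z lies inside the preset range.
    low, high = mz_range
    c2 = [s for s in c1 if low <= s["precursor_mz"] <= high]
    return c1, c2
```

Applying the intensity filter first keeps the later, more expensive matching steps from running on noise-level signals.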
(3) An unretrieved length value, 2, is obtained from the preset polypeptide fragment length range. From the number of amino acid residue types (20) and the polypeptide fragment length value (2), the number of all polypeptide fragments of this length is calculated as 20^2 = 400, and the sequence set of theoretical polypeptide fragments is obtained together with their primary theoretical parent ion mass-to-charge ratios and secondary theoretical daughter ion cluster mass-to-charge ratios.
(4) Candidate polypeptide fragments meeting the conditions are screened from the sample substances to be detected according to the primary theoretical parent ion mass-to-charge ratio and the secondary theoretical daughter ion cluster mass-to-charge ratios of the theoretical polypeptide fragments. The specific steps are as follows:
Step 1: obtaining an unretrieved length value (for example, a length value of 2) within the preset polypeptide fragment length range;
Step 2: initializing the polypeptide coding value U = 0 and starting to enumerate polypeptide fragments;
Step 3: converting the polypeptide coding value U to base 20 according to the number of amino acid residue types (20), and writing the result with 2 digits to obtain a sequence codon X (for example, '00');
Step 4: associating the digit at each position of the sequence codon X with the amino acid residue number, and translating the sequence codon X into a theoretical polypeptide fragment (for example, translating the sequence codon X '00' into 'G-G', namely 'Gly-Gly');
Step 5: obtaining the primary theoretical parent ion mass-to-charge ratio and the set of secondary theoretical daughter ion cluster (a, b or y ion cluster) mass-to-charge ratios of the theoretical polypeptide fragment ('Gly-Gly');
Step 6: calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in the substance set C2 from the mass spectrum of the soybean enzymatic hydrolysate sample to be detected and the primary theoretical parent ion mass-to-charge ratio of the theoretical polypeptide fragment ('Gly-Gly'), recorded as E_1; obtaining the substance set F1 whose E_1 is smaller than the threshold of 10 ppm; and determining whether the substance set F1 is empty:
if it is empty, the next theoretical polypeptide fragment is enumerated, i.e., the polypeptide coding value is increased by 1, giving U = 1;
if it is not empty, the secondary daughter ion cluster mass-to-charge ratios of the substances in the substance set F1 are compared one by one with the secondary theoretical daughter ion cluster mass-to-charge ratios of the theoretical polypeptide fragment ('Gly-Gly'); if the deviation of the secondary ion cluster mass-to-charge ratio is less than 0.02 Da, the theoretical polypeptide fragment is marked as a candidate polypeptide fragment for the substance, i.e., a candidate identification result; the next theoretical polypeptide fragment is then enumerated, i.e., the polypeptide coding value is increased by 1, giving U = 1;
Step 7: repeating steps 2-6 until the polypeptide coding value U reaches the number of all possible polypeptides of that length, i.e., U = 20^2 = 400; then counting the polypeptides detected in the soybean enzymatic hydrolysate at a polypeptide fragment length of 2;
Step 8: acquiring the next unretrieved length value (for example, 3, 4, 5 and 6), and repeating steps 1-7 until all length values have been retrieved.
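Step 6 combines a ppm tolerance on the parent ion with a Da tolerance on the fragment ions. A minimal sketch of that two-stage match is given below. The substance dictionary layout and function names are hypothetical, and treating a substance as matched when at least one theoretical fragment is found is an assumption, since the text does not state how many fragments must agree.

```python
def within_tolerance(observed, theoretical, tol, unit="Da"):
    """True if |observed - theoretical| is below tol, in Da or in ppm."""
    diff = abs(observed - theoretical)
    if unit == "ppm":
        diff = diff / theoretical * 1e6
    return diff < tol


def screen_candidate(substances, theo_precursor, theo_fragments,
                     ms1_tol=10.0, ms1_unit="ppm", ms2_tol=0.02, ms2_unit="Da"):
    """Return [(substance id, matched fragment names)] for one theoretical peptide.

    `theo_fragments` maps ion names (e.g. 'y1') to theoretical m/z values.
    """
    hits = []
    for s in substances:
        # Stage 1 (set F1): parent ion must agree within the MS1 tolerance.
        if not within_tolerance(s["precursor_mz"], theo_precursor, ms1_tol, ms1_unit):
            continue
        # Stage 2 (set F2): collect theoretical fragments seen in the spectrum.
        matched = sorted(
            name for name, mz in theo_fragments.items()
            if any(within_tolerance(obs, mz, ms2_tol, ms2_unit) for obs in s["fragments"])
        )
        if matched:
            hits.append((s["id"], matched))
    return hits
```

A ppm tolerance scales with m/z, so 10 ppm corresponds to roughly 0.0013 Da at m/z 133 but 0.015 Da at m/z 1500.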
The detection results are shown in FIG. 3. When the length of the polypeptide fragment is 2, the number of polypeptides detected in the soybean enzymatic hydrolysate is 163; when the length is 3, the number detected is 335; when the length is 4, the number detected is 93; when the length is 5, the number detected is 31; and when the length is 6, the number detected is 23.
Comparative example 1
In this comparative example, the test was performed using Proteome Discoverer software (version 2.4, built-in Sequest engine) from Thermo (USA). A soybean protein sequence library downloaded from UniProt was used as the sequence database, and the sample was the soybean protein enzymatic hydrolysate of Example 2. The enzymatic hydrolysis mode was set to nonspecific, the parent ion response threshold to 500, the polypeptide identification length to 4-6 (4 being the minimum length the software allows), the primary ion mass-to-charge ratio deviation to 10 ppm, the secondary ion mass-to-charge ratio deviation to 0.02 Da, the secondary ion cluster types to a, b and y ions, and the false discovery rate to 1%; software defaults were used for all other parameters. The statistical distribution of the identification results is shown in FIG. 4.
As can be seen from fig. 4, when the length of the polypeptide fragment is 4, the number of detected polypeptides in the soybean enzymatic hydrolysate to be detected is 1; when the length of the polypeptide fragment is 5, the number of the detected polypeptides in the soybean enzymatic hydrolysate to be detected is 21; when the length of the polypeptide fragment is 6, the number of the detected polypeptides in the soybean enzymolysis product to be detected is 24.
It can be seen that the polypeptide length-based short peptide omics identification method of the present invention used in Example 2 covers the full polypeptide fragment length range of 2 to 6 and identifies far more short peptides overall (645 versus 46 in total), whereas Comparative Example 1 can only analyze short peptides of length 4 to 6. The advantage is largest at lengths 4 and 5 (93 versus 1 and 31 versus 21), while the counts at length 6 are comparable (23 versus 24).
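The per-length counts reported for FIG. 3 and FIG. 4 can be tallied side by side with a few lines of Python. The variable and function names are illustrative; the numbers are those stated in Example 2 and Comparative Example 1.

```python
# Detected polypeptide counts per fragment length, as reported in the text.
example_2 = {2: 163, 3: 335, 4: 93, 5: 31, 6: 23}
comparative_1 = {4: 1, 5: 21, 6: 24}


def compare(a, b):
    """Return rows of (length, count in a, count in b), using 0 where absent."""
    return [(n, a.get(n, 0), b.get(n, 0)) for n in sorted(set(a) | set(b))]


rows = compare(example_2, comparative_1)
total_example_2 = sum(example_2.values())
total_comparative_1 = sum(comparative_1.values())
```

Lengths 2 and 3, which the comparative software cannot search, account for 498 of the identifications in Example 2.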
More importantly, Example 2 uses the polypeptide length-based short peptide omics identification method without loading any protein sequence database at any point in the analysis, so that short peptides are searched and identified directly and without blind spots. This demonstrates the advancement, inventiveness and broad application prospects of the invention.
In summary, the invention discloses a polypeptide length-based short peptide omics identification method and its application. The method enumerates the polypeptide fragments within a designated polypeptide length range, generates the theoretical primary parent ions and secondary daughter ions one by one, compares them continuously with the actual ion spectrum data acquired for the substances in the sample, associates each polypeptide fragment that meets the matching requirements with the corresponding substance, and thereby identifies the polypeptide sequence structures of the substances in the sample.
The polypeptide length-based short peptide omics identification method can search and identify protein enzymatic hydrolysates directly, without blind spots or omissions and without relying on a protein sequence database. Because the method evaluates short peptides by complete matching, the identification results are accurate and reliable, and the shortcomings of existing proteomics analysis methods and tools can be overcome. In addition, the short peptide identification results obtained by the invention can be widely applied to qualitative and quantitative analysis of short peptides in food-derived protein enzymatic hydrolysates or other biological samples, which is of significance for raising the technical level of proteomics analysis and promoting the development of the industry.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention. Furthermore, the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.

Claims (10)

1. A short peptide omics identification method based on polypeptide length is characterized by comprising the following steps:
S1, presetting the length range of the polypeptide fragment, the amino acid residue list and the secondary ion cluster types, and acquiring the mass spectrum data of the primary parent ions and secondary daughter ions of the substances collected from the sample to be detected by mass spectrometry;
S2, presetting a primary parent ion signal response intensity threshold and a primary parent ion mass-to-charge ratio range, and screening the mass spectrum data obtained in step S1 according to the preset threshold and range to obtain the substance set C2 meeting the requirements in the sample to be detected;
S3, obtaining the polypeptide fragment length value L within the preset polypeptide fragment length range, calculating the number N^L of all theoretical polypeptide fragments at each polypeptide fragment length from the number N of amino acid residue types in the amino acid residue list of step S1, and obtaining the sequence set of the theoretical polypeptide fragments together with their primary theoretical parent ion mass-to-charge ratios and secondary theoretical daughter ion cluster mass-to-charge ratios;
S4, screening candidate polypeptide fragments meeting the conditions from the substance set C2 of step S2 according to the primary theoretical parent ion mass-to-charge ratio and the secondary theoretical daughter ion cluster mass-to-charge ratios of the theoretical polypeptide fragments;
S5, scoring the candidate polypeptide fragments, and determining the candidate polypeptide fragment with the highest score to be the polypeptide fragment sequence, within the preset length range, in the sample to be detected.
2. The short-peptide omics identification method of claim 1,
the length range of the preset polypeptide fragment in the step S1 is any length range, preferably 2-7;
the amino acid residues in the amino acid residue list in step S1 are arbitrary non-repetitive amino acid residues, the number of which is numbered from 0 and is required to be unique and continuous;
the secondary ion cluster types in step S1 include any one or more of an a ion cluster, a b ion cluster, and a y ion cluster.
3. The short peptide omics identification method of claim 1, wherein the primary parent ion signal response intensity threshold is at least 3 times the instrument noise intensity, and the primary parent ion mass-to-charge ratio range is 70-2000 Da.
4. The short peptide omics identification method as set forth in any one of claims 1 to 3, wherein the specific steps of obtaining the material set C2 meeting the requirements in the sample to be tested by screening in step S2 are as follows:
S11, acquiring the charge information of each ion in the primary and secondary mass spectra from the mass spectrum data of the primary parent ions and secondary daughter ion clusters of the substances collected from the sample to be detected in step S1, and performing ion normalization to obtain the primary parent ion mass-to-charge ratio and the secondary daughter ion cluster mass-to-charge ratios of each substance;
S12, filtering the substances of step S11 by primary parent ion signal response intensity to obtain the substance set C1 whose intensity is above the primary parent ion signal response intensity threshold of step S2;
S13, filtering the substance set of step S12 by the primary parent ion mass-to-charge ratio range of step S2 to obtain the substance set C2 whose primary parent ion mass-to-charge ratio falls within the range, i.e., the substance set C2 meeting the requirements in the sample to be detected.
5. The short peptide omics identification method of claim 4, wherein the ion normalization process is to convert the multiply charged ions in the mass spectrogram into ions with unit positive charge through mass-to-charge ratio calculation, wherein the ions with unknown charge are defined as ions with single charge.
6. The short peptide omics identification method of claim 1, wherein the screening of said candidate polypeptide fragments in step S4 comprises the steps of:
S21, obtaining an unretrieved length value L within the polypeptide fragment length range preset in step S1;
S22, initializing the polypeptide coding value U = 0;
S23, converting the polypeptide coding value U to base N, where N is the number of amino acid residue types, and writing the result with L digits to obtain a sequence codon X;
S24, associating the digit at each position of the sequence codon X with the amino acid residue number, and translating the sequence codon X into a theoretical polypeptide fragment P;
S25, obtaining the primary theoretical parent ion mass-to-charge ratio Z and the secondary theoretical daughter ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P;
S26, calculating the absolute value of the difference between the primary parent ion mass-to-charge ratio of each substance in the substance set C2 and the primary theoretical parent ion mass-to-charge ratio Z of the theoretical polypeptide fragment P, recorded as E_1; obtaining the substance set F1 whose E_1 meets a set threshold; and determining whether the substance set F1 is empty:
if it is empty, acquiring the next theoretical polypeptide fragment from the sequence set of theoretical polypeptide fragments of step S3, the polypeptide coding value being increased by 1;
if it is not empty, comparing the secondary ion cluster mass-to-charge ratios of the substances in the substance set F1 one by one with the secondary theoretical ion cluster mass-to-charge ratio set T of the theoretical polypeptide fragment P, calculating the absolute value of each difference, obtaining the substance set F2 to be identified whose absolute values fall within the secondary ion cluster mass-to-charge ratio deviation range, and determining whether the substance set F2 is empty:
if the substance set F2 is not empty, marking the theoretical polypeptide fragment P as a candidate polypeptide fragment of the substance set F2, i.e., a candidate identification result, and acquiring the next theoretical polypeptide fragment from the sequence set of theoretical polypeptide fragments of step S3, the polypeptide coding value being increased by 1; if the substance set F2 is empty, enumerating the next theoretical polypeptide fragment directly, the polypeptide coding value being increased by 1;
S27, repeating steps S22-S26 until the polypeptide coding value U equals the total number of possible polypeptides of that length, i.e., U = N^L;
S28, obtaining the next unretrieved polypeptide fragment length value within the polypeptide fragment length range preset in step S1, and repeating steps S21-S27 until all polypeptide fragment length values within the preset range have been retrieved.
7. The short peptide omics identification method of claim 6, wherein said set threshold value is 0-0.02Da or said set threshold value is 0-40ppm in step S26.
8. The short peptide omics identification method of claim 6, wherein the deviation value of the mass-to-charge ratio of the secondary ion clusters in step S26 is in the range of 0-0.05 Da.
9. The short peptide omics identification method of claim 1, wherein the scoring in step S5 comprises two methods:
the first scoring method is as follows: comparing the candidate polypeptide fragment with the secondary ion cluster types of the actual substance in the sample to be detected, awarding 10 points for each complete ion cluster detected, recording the score as S_A, and by default selecting the candidate identification polypeptide with the highest score in the output as the final identification result;
the second scoring method is as follows: recording, as N_M, the number of secondary ions of the substance in the sample to be detected that match the candidate polypeptide fragment under the secondary ion cluster types preset in step S1; calculating the mean absolute deviation between the secondary ion mass-to-charge ratios of the substance in the sample to be detected and the corresponding secondary theoretical ion mass-to-charge ratios of the candidate polypeptide fragment, recorded as E_2; and calculating the second score S_B as follows:
[Formula image FDA0003510866220000031: calculation formula for the second score S_B]
wherein C is the number of theoretical secondary ion cluster types, and L is the length value of the polypeptide;
preferably, the scoring output is sorted in the priority order S_A > S_B > E_1 > E_2, wherein a higher S_A or S_B score indicates higher confidence in the result and a smaller error E_1 or E_2 indicates higher reliability, and the first-ranked candidate identification polypeptide in the sorted results is selected as the final preferred identification result.
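The matching statistics and the priority ordering of claim 9 can be illustrated with a short sketch. It is hedged in several ways: the formula for the second score is available only as an image and is not reproduced here, so S_A and S_B are taken as given inputs; E_1 is not defined in this section and is simply treated as a smaller-is-better error; the 0.05 Da tolerance is borrowed from claim 8; and the names `match_stats` and `rank` and the dictionary keys are hypothetical.

```python
def match_stats(observed, theoretical, tol=0.05):
    """Return (N_M, E_2): matched-ion count and mean absolute m/z deviation."""
    errors = []
    for o in observed:
        # Deviation to the closest theoretical secondary ion, if any exists.
        best = min((abs(o - t) for t in theoretical), default=None)
        if best is not None and best <= tol:
            errors.append(best)
    n_m = len(errors)
    # With no matches the mean error is undefined; infinity ranks it last.
    e_2 = sum(errors) / n_m if n_m else float("inf")
    return n_m, e_2


def rank(candidates):
    """Sort by S_A (high first), then S_B (high first), then E_1, then E_2."""
    return sorted(
        candidates,
        key=lambda c: (-c["S_A"], -c["S_B"], c["E_1"], c["E_2"]),
    )


n_m, e_2 = match_stats([100.01, 200.0, 350.0], [100.0, 200.02, 300.0])
print(n_m, round(e_2, 3))

ranked = rank([
    {"seq": "AG", "S_A": 20, "S_B": 5.0, "E_1": 0.01, "E_2": 0.02},
    {"seq": "GA", "S_A": 20, "S_B": 7.5, "E_1": 0.02, "E_2": 0.03},
    {"seq": "SG", "S_A": 10, "S_B": 9.0, "E_1": 0.00, "E_2": 0.01},
])
print([c["seq"] for c in ranked])
```

Negating the two scores inside the sort key lets a single ascending sort express "higher score first, then smaller error first"; ties on S_A fall through to S_B, and so on down the priority order.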
10. Use of a short peptidomics identification method as defined in any one of claims 1 to 9 in peptidomics or proteomics.
CN202210151752.2A 2022-02-18 2022-02-18 Short peptide omics identification method based on polypeptide length and application thereof Active CN114596912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210151752.2A CN114596912B (en) 2022-02-18 2022-02-18 Short peptide omics identification method based on polypeptide length and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210151752.2A CN114596912B (en) 2022-02-18 2022-02-18 Short peptide omics identification method based on polypeptide length and application thereof

Publications (2)

Publication Number Publication Date
CN114596912A true CN114596912A (en) 2022-06-07
CN114596912B CN114596912B (en) 2023-08-29

Family

ID=81805260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210151752.2A Active CN114596912B (en) 2022-02-18 2022-02-18 Short peptide omics identification method based on polypeptide length and application thereof

Country Status (1)

Country Link
CN (1) CN114596912B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020168682A1 (en) * 2001-04-13 2002-11-14 Goodlett David R. Methods for quantification and de novo polypeptide sequencing by mass spectrometry
US20090215103A1 (en) * 2005-06-03 2009-08-27 Waters Investments Limited Generation and use of a catalog of polypeptide-related information for chemical analyses
JP2010256101A (en) * 2009-04-23 2010-11-11 Shimadzu Corp Method and device for analyzing glycopeptide structure
WO2012128580A1 (en) * 2011-03-22 2012-09-27 Korea Advanced Institute Of Science And Technology Water-soluble polypeptides comprised of repeat modules, method for preparing the same and method for a target-specific polypeptide and analysis of biological activity thereof
CN104034791A (en) * 2014-05-04 2014-09-10 北京大学 CID and ETD mass spectrogram fusion based polypeptide de novo sequencing method
CN104076115A (en) * 2014-06-26 2014-10-01 云南民族大学 Protein second-level mass spectrum identification method based on peak intensity recognition capability
CN106645437A (en) * 2015-10-30 2017-05-10 中国科学院大连化学物理研究所 Polypeptide amino acid sequence De novo sequencing method based on chemical modification and isotope labeling
CN110556162A (en) * 2019-08-20 2019-12-10 广州基迪奥生物科技有限公司 Detection and analysis method of cyclic RNA translation polypeptide based on translation group
CN112326769A (en) * 2020-11-04 2021-02-05 西北大学 Method for identifying N-sugar chain branch structure on complete glycopeptide

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
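Wang Haipeng, Fu Yan, Sun Ruixiang, He Simin, Zeng Rong, Gao Wen: "pepReap: a peptide identification algorithm based on support vector machines", Journal of Computer Research and Development, no. 09, pages 54-61 *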

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116217659A (en) * 2022-10-10 2023-06-06 上海市农业科学院 Stropharia rugoso-annulata mycelium flavor peptide and preparation method and application thereof
CN116217659B (en) * 2022-10-10 2024-02-27 上海市农业科学院 Stropharia rugoso-annulata mycelium flavor peptide and preparation method and application thereof

Also Published As

Publication number Publication date
CN114596912B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
Lam et al. Development and validation of a spectral library searching method for peptide identification from MS/MS
EP1472539B1 (en) Absolute quantification of proteins and modified forms thereof by multistage mass spectrometry
Becker et al. Recent developments in quantitative proteomics
US20060004525A1 (en) System and method of determining proteomic differences
US20120191685A1 (en) Method for identifying peptides and proteins from mass spectrometry data
CA2908962A1 (en) Mass labels
Umar et al. NanoLC‐FT‐ICR MS improves proteome coverage attainable for ∼3000 laser‐microdissected breast carcinoma cells
Gao et al. Protein analysis by shotgun proteomics
Vinciguerra et al. Identification of proteinaceous binders in paintings: A targeted proteomic approach for cultural heritage
Di et al. MdFDIA: a mass defect based four-plex data-independent acquisition strategy for proteome quantification
EP1346229A2 (en) Inverse labeling method for the rapid identification of marker/target proteins
CN114596912A (en) Short peptide omics identification method based on polypeptide length and application thereof
JP4317083B2 (en) Mass spectrometry method and mass spectrometry system
Mauri et al. Multidimensional protein identification technology for clinical proteomic analysis
KR100805775B1 (en) An additive scoring method for modified polypeptide
Biemann Laying the groundwork for proteomics: mass spectrometry from 1958 to 1988
Lu et al. Shotgun protein identification and quantification by mass spectrometry
CN114639445B (en) Polypeptide histology identification method based on Bayesian evaluation and sequence search library
Lee Probability-based shotgun cross-linking sites analysis
WO2010094300A1 (en) A method for determining in silico- a set of selected target epitopes
Pap et al. Assessing the reproducibility of an O‐glycopeptide enrichment method with a novel software, Pinnacle
CN115112778B (en) Disease protein biomarker identification method
US7765068B2 (en) Identification and characterization of protein fragments
Ruse et al. A tool to evaluate correspondence between extraction ion chromatographic peaks and peptide‐spectrum matches in shotgun proteomics experiments
Jung et al. Systematic analysis of yeast proteome reveals peptide detectability factors for mass spectrometry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant