CN112786105A - Macroproteome mining method and application thereof in obtaining intestinal microbial proteolysis characteristics - Google Patents

Info

Publication number
CN112786105A
Authority
CN
China
Prior art keywords
trypsin
protein
search
peptide
proteolysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011415023.0A
Other languages
Chinese (zh)
Other versions
CN112786105B (en)
Inventor
严志祥
单鸿
贺飞翔
张婷
薛可文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern Marine Science and Engineering Guangdong Laboratory Zhuhai
Fifth Affiliated Hospital of Sun Yat Sen University
Original Assignee
Southern Marine Science and Engineering Guangdong Laboratory Zhuhai
Fifth Affiliated Hospital of Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern Marine Science and Engineering Guangdong Laboratory Zhuhai, Fifth Affiliated Hospital of Sun Yat Sen University filed Critical Southern Marine Science and Engineering Guangdong Laboratory Zhuhai
Priority to CN202011415023.0A priority Critical patent/CN112786105B/en
Publication of CN112786105A publication Critical patent/CN112786105A/en
Application granted granted Critical
Publication of CN112786105B publication Critical patent/CN112786105B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention belongs to the field of biotechnology and discloses a metaproteome data mining method centered on semi-tryptic peptides. The method combines a two-step search, de novo sequencing, open search, and matching of results from multiple search engines. These strategies reduce false positive identifications caused by database incompleteness and post-translational modifications. When the method was used to analyze the Escherichia coli proteome, 93.4% of the peptides identified from a very large metaproteome database were consistent with those identified from a conventional Escherichia coli reference database.

Description

Macroproteome mining method and application thereof in obtaining intestinal microbial proteolysis characteristics
Technical Field
The invention relates to the technical field of bioinformatics analysis, in particular to a metaproteome mining method and its application in obtaining the proteolytic characteristics of intestinal microorganisms.
Background
Gut microbes live in a dynamic environment and face proteotoxic and metabolic stresses from drugs, diet, microbial competition, and endogenous chemicals of the host. Bacteria have evolved different regulatory strategies to adapt to changing environments, including changes in gene expression, cell differentiation, and motility, in which proteolysis plays a crucial role. Proteolytic regulation is an important process in all organisms: bacteria use energy-dependent proteases to degrade misfolded proteins or to activate regulatory proteins, which lets them respond rapidly to the dynamic intestinal environment. The microbial functions regulated by proteolysis are extensive and include stress response, cell growth and division, biofilm formation, and protein secretion.
Inflammatory bowel disease (IBD) is a chronic inflammatory disease influenced by both genetic and environmental factors; it primarily includes Crohn's disease (CD) and ulcerative colitis (UC). IBD has been reported to be associated with intestinal microbial dysbiosis. Studies of the IBD intestinal microbiome rely overwhelmingly on metagenomics and 16S rRNA gene sequencing. However, metatranscriptomics or metaproteomics is needed to pinpoint functional and metabolic activity by directly measuring RNA and protein, respectively. Furthermore, important modes of regulation at the protein level, such as proteolytic regulation, cannot be captured by RNA studies but can be studied using metaproteomics.
However, changes in the proteolytic characteristics of intestinal microbes in a complex disease state such as IBD have not been studied, so a method capable of capturing these characteristics in a complex disease state is needed.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the shortcomings of the prior art. The first object is to provide a metaproteome mining method centered on semi-tryptic peptides, together with a method for comparing the degree of proteolysis.
A second object of the invention is to provide the use of the above method for obtaining the proteolytic characteristics of intestinal microorganisms.
The objects of the invention are achieved by the following technical scheme:
A method of determining the degree of proteolysis, comprising the following steps:
S1, obtaining (meta)proteome data, either acquired from a sample or published in a public database;
S2, performing a first search using a large metaproteome database and PEAKS DB software to obtain the proteins for which at least one peptide is identified;
S3, searching the omics data against the protein sequences obtained in S2 using PEAKS DB, MaxQuant, and pFind software, and retaining the peptides identified by all three;
S4, distinguishing semi-tryptic peptides from fully tryptic peptides among the peptides obtained in S3;
S5, determining the degree of proteolysis from the normalized relative abundance of the semi-tryptic peptides, which is obtained by normalizing the relative abundance of the semi-tryptic peptides to the relative abundance of the fully tryptic peptides.
Preferably, in S4, semi-tryptic peptides are identified as follows: a peptide whose identified sequence does not have R or K as its first amino acid is a semi-tryptic N-terminal peptide (excluding peptides at the N-terminus of the protein), and a peptide whose identified sequence does not have R or K as its last amino acid is a semi-tryptic C-terminal peptide (excluding peptides at the C-terminus of the protein).
For peptides generated by trypsin digestion of proteins during proteomic sample preparation, the first amino acid should be K or R, and the last amino acid should also be K or R. If semi-tryptic peptides are detected in the data, proteases other than trypsin must have taken part in hydrolyzing the protein, so that the first or last amino acid of the peptide is not K or R. Semi-tryptic peptides can therefore serve as a marker that a protein has been hydrolyzed by other proteases in the organism, and fully tryptic peptides as a marker that it has not. However, the degree of proteolysis cannot be assessed from semi-tryptic peptides alone, because a change in their abundance may simply reflect a change in the total amount of the corresponding protein (increased or decreased synthesis) while the degree of proteolysis is unchanged. The relative abundance of semi-tryptic peptides is therefore normalized to that of fully tryptic peptides when comparing the degree of proteolysis between samples, which removes the effect of variation in total protein.
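The classification rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function name `classify_peptide` and the label strings are hypothetical, and the sketch judges the N-terminal side by the residue that precedes the peptide in the parent protein, which is how trypsin specificity (cleavage after K or R) is usually evaluated. Peptides at the protein termini are treated as tryptic on that side, matching the exclusions stated above.

```python
TRYPTIC_RESIDUES = {"K", "R"}


def classify_peptide(protein: str, start: int, end: int) -> str:
    """Classify the peptide protein[start:end] by its cleavage ends.

    Returns 'full', 'semi_N', 'semi_C', or 'nonspecific'.
    Assumption: the N-terminal end counts as tryptic when the residue
    just before the peptide is K or R, or when the peptide starts at
    the protein N-terminus.
    """
    if not (0 <= start < end <= len(protein)):
        raise ValueError("peptide coordinates fall outside the protein")

    # N-terminal side: protein start, or preceded by K/R.
    n_tryptic = start == 0 or protein[start - 1] in TRYPTIC_RESIDUES
    # C-terminal side: protein end, or the peptide itself ends in K/R.
    c_tryptic = end == len(protein) or protein[end - 1] in TRYPTIC_RESIDUES

    if n_tryptic and c_tryptic:
        return "full"
    if c_tryptic:
        return "semi_N"   # only the N-terminal end is non-tryptic
    if n_tryptic:
        return "semi_C"   # only the C-terminal end is non-tryptic
    return "nonspecific"


if __name__ == "__main__":
    toy_protein = "MAKQLTRSEGKDD"  # made-up sequence for illustration
    for s, e in [(3, 7), (4, 7), (3, 6)]:
        print(toy_protein[s:e], classify_peptide(toy_protein, s, e))
```

The sketch ignores refinements such as the proline rule or removal of the initiator methionine; whether the original workflow applied them is not stated.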
Preferably, the PEAKS DB search parameters are: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; at most 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavages; and a false discovery rate (FDR) of 1%.
Preferably, the MaxQuant search parameters are: a first search mass tolerance of 20 ppm and a main search mass tolerance of 4.5 ppm; trypsin as the enzyme, with semi-specific digestion and at most 2 missed cleavages; carbamidomethylation of cysteine set as a fixed modification; at most 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; an FDR of 1%, with only peptides having a posterior error probability (PEP) below 5% retained for subsequent analysis.
Preferably, the pFind search parameters are: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 20 ppm; open-search mode; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavages; and an FDR of 1%.
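To make the tolerance settings concrete, the following sketch shows how a relative tolerance in ppm and an absolute tolerance in Da translate into acceptance windows. The helper names (`ppm_window`, `within_ppm`, `within_da`) are hypothetical and the m/z values are invented; the search engines themselves apply these tolerances internally, so this is only an illustration of the arithmetic.

```python
from typing import Tuple


def ppm_window(mz: float, ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z window for a relative tolerance in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def within_ppm(observed: float, theoretical: float, ppm: float) -> bool:
    """True if the observed value lies within +/- ppm of the theoretical one."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


def within_da(observed: float, theoretical: float, tol_da: float) -> bool:
    """True if the observed value lies within an absolute tolerance in Da."""
    return abs(observed - theoretical) <= tol_da


if __name__ == "__main__":
    # A 10 ppm precursor tolerance at m/z 1000 is a window of +/- 0.01.
    print(ppm_window(1000.0, 10.0))
    # A 0.02 Da fragment tolerance is absolute and does not scale with m/z.
    print(within_da(500.015, 500.0, 0.02))
```

A ppm tolerance widens with m/z, whereas a Da tolerance is constant, which is why the two fragment settings (0.02 Da and 20 ppm) are not directly interchangeable.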
The invention also provides applications of the above method.
In particular, the above method is used to capture the proteolytic characteristics of intestinal microorganisms, which provides a layer of information beyond community structure and protein abundance. The analysis rests on the assumption that similar degrees of proteolysis yield similar relative abundances of semi-tryptic peptides. The present study found that microbial semi-tryptic peptides in the 447 fecal metaproteomes were enriched in several biological processes, including the metabolism of fatty acids, carboxylic acids, glucose, and fucose, the biosynthesis of branched-chain amino acids, protein transport, and bacterial flagellum-mediated cell motility, indicating that these processes undergo more extensive proteolytic regulation.
Alternatively, the above method is used to study the gut microbiota and host-microbe interactions.
The metaproteome mining method of the invention is also suitable for capturing the proteolytic characteristics of plants and environmental microorganisms, and can therefore be used to explore the patterns of proteolysis in plants and environmental microorganisms.
The method can also be used to study diseases related to bacterial proteases (such as bacterial infection and inflammatory bowel disease): changes in the degree of bacterial proteolysis can be investigated, so that the corresponding bacterial proteases can be taken as targets and drugs can be developed to regulate them.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a metaproteome mining method centered on semi-tryptic peptides, which combines a two-step search, de novo sequencing, open search, and matching of results from multiple software tools to carry out large-scale, semi-tryptic-peptide-centered metaproteome mining. These strategies reduce false positive identifications caused by database incompleteness and peptide modifications. Previously, semi-tryptic peptide searches were carried out on metaproteomic data sets generated by low-resolution MS/MS, which inevitably enlarged the search space and lowered the confidence of the identifications. In a previous study, only 80.2% of the identified peptides were annotated as P. furiosus sequences when the Pyrococcus furiosus proteome was searched against a large metaproteome database containing 6,162,582 sequences. In contrast, the present invention applies multi-engine searching to high-resolution MS/MS data. When the method of the invention was used to analyze the E. coli proteome, 93.4% of the peptides identified from a significantly larger metaproteome database (130,975,891 sequences) were identical to those identified from the conventional E. coli reference database, indicating better accuracy.
Drawings
FIG. 1 shows the normalized relative abundance of semi-tryptic peptides (NRASP, semi-tryptic peptide abundance / fully tryptic peptide abundance) for the main bacterial taxa, biological processes, and enzymes in 447 fecal metaproteome samples, with the bacterial taxa (A), biological processes (B), and enzymes (C) arranged in ascending order; the box plots show the median (line in the middle of the box), the 25th percentile, and the 75th percentile; the dashed lines extend to 1.5 times the interquartile range (IQR), and outliers are shown as dots;
FIG. 2 shows the heat stress-induced changes in the proteolytic characteristics of the E. coli proteome across different biological processes (P < 0.05).
Detailed Description
The following further describes the embodiments of the present invention. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The test methods used in the following experimental examples are all conventional methods unless otherwise specified; the materials, reagents and the like used are, unless otherwise specified, commercially available reagents and materials.
Data sets: two publicly available data sets were analyzed. Data set 1 (PXD008675) covers healthy and IBD intestinal metaproteomes and consists of 447 fecal metaproteomes from 89 subjects aged 6-58 years (median 22.8 years), including 24 non-IBD controls, 39 CD patients, and 26 UC patients; of these samples, 272 had a matching metagenome and 184 had a matching metaproteome. We also analyzed a proteome data set (PXS000498) to investigate the effect of heat stress on the proteolytic regulation of E. coli K-12.
Metaproteome database: a comprehensive human intestinal microbial protein database was assembled from the following components: (1) the Integrated Gene Catalog (IGC) database, based on 1267 intestinal metagenomes from 1070 individuals (760 European, 368 Chinese, and 139 American samples); (2) sequence data for 215 strains cultured from healthy adult feces; (3) the Culturable Genome Reference (CGR) database, containing 1520 non-redundant, high-quality genomes from 6000 intestinal bacteria isolated from healthy human feces; (4) all archaeal, bacterial, and fungal sequences in UniProtKB (version 2017_06) and NCBI RefSeq (version 90). This microbial sequence database was supplemented with the UniProt human reference proteome; a food database composed of dietary organisms such as Triticum aestivum, Oryza sativa subsp., Glycine max, Zea mays, Arachis hypogaea, Solanum tuberosum, Lycopersicon esculentum, Sus scrofa, Bos taurus, chicken (Gallus gallus), sheep (Ovis aries), fish (Salmo salar and Oncorhynchus mykiss), and shrimp (Artemia sp. and Litopenaeus vannamei); and a common contaminant database (http://maxquant.org/contaminants.zip). Duplicate protein sequences were removed using USEARCH v11.0.667 (-fastx_uniques), yielding 130,975,891 non-redundant sequences.
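The deduplication step can be illustrated with a small pure-Python sketch that keeps the first record for each exact sequence. It is not USEARCH and does not reproduce its implementation or options; the function names and the toy FASTA text are hypothetical. For a database of this size, an in-memory set of full sequences would be impractical, so a real pipeline would rely on a dedicated tool or on hashing.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record for every distinct sequence (exact match)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


if __name__ == "__main__":
    toy = ">igc_1\nMAKQLTR\n>cgr_7\nMAKQ\nLTR\n>uniprot_3\nMSEGK\n"
    for header, seq in unique_sequences(parse_fasta(toy)):
        print(header, seq)
```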
Statistical analysis: multivariate analysis of the amino acid frequencies near the cleavage sites was performed using principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), and missing values were imputed using Bayesian PCA (BPCA). Variables present in at least 75% of samples that differed significantly between groups were detected in R (version 3.5.3) and RStudio (version 1.1.383) using the Kruskal-Wallis test and the Dunn-Bonferroni test, with P values below 0.05 considered significant. The beta diversity of the multi-omics data was assessed using principal coordinate analysis (PCoA) of Bray-Curtis distances.
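Two of the quantities named above are simple enough to sketch directly. The analysis itself was carried out in R; the Python below is only an illustration with hypothetical function names. It shows the 75% prevalence filter, assuming that "present" means a non-missing value greater than zero, and the Bray-Curtis dissimilarity between two abundance profiles, computed as the sum of absolute differences divided by the sum of all abundances.

```python
from typing import Optional, Sequence


def is_prevalent(values: Sequence[Optional[float]], min_fraction: float = 0.75) -> bool:
    """True if the variable is present in at least min_fraction of samples.

    Assumption: a value counts as present when it is not None and > 0.
    """
    if not values:
        return False
    present = sum(1 for v in values if v is not None and v > 0)
    return present / len(values) >= min_fraction


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


if __name__ == "__main__":
    print(is_prevalent([1.2, 0.4, None, 3.0]))
    print(bray_curtis([6, 7, 4], [10, 0, 6]))
```

A matrix of such pairwise dissimilarities is the usual input to PCoA.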
Example 1 Performance of different software in performing the search
Using MLI data sets and the large metaproteome database, we compared how different commercial software (Proteome Discoverer, PEAKS, ProteinPilot, and Byonic) and open-source software (MaxQuant, MSFragger, and pFind) performed when searching for semi-tryptic peptides on several 36-core servers (each with 192 GB of memory). Proteome Discoverer, Byonic, MaxQuant, pFind, and ProteinPilot did not complete the search within one month, and MSFragger crashed with an out-of-memory error. Only PEAKS completed the analysis within one month, so further high-throughput analysis was performed on a 156-core high-performance computing cluster, which completed the database search within 2 weeks.
Example 2 database search
The database search process comprises two main steps: (1) de novo sequencing and a first search using the large metaproteome database (large database) and PEAKS software, to obtain the proteins for which at least one peptide is identified and to generate a corresponding small protein database (reduced database); (2) a second search using the reduced database and several software tools, to improve the accuracy of semi-tryptic peptide identification.
To cope with the increased search space and search time in identifying semi-tryptic peptides in metaproteomes, the first search was performed with PEAKS DB on a cluster configured with Intel(R) Xeon(R) processors (156 cores) and 1.5 TB of 2666 MHz memory. The software first performed de novo sequencing and then a database search with the following parameters: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; at most 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavages; and an FDR of 1%.
A two-step search strategy is used here to increase the sensitivity of the database search: the proteins identified by at least one peptide in the first search are retained for the second, multi-engine search, which uses PEAKS DB, MaxQuant (version 1.6.2), and pFind (version 3.1.5).
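The construction of the reduced database from the first-pass results can be sketched as follows. The data structures are assumptions made for illustration (a dictionary mapping accession to sequence, and a list of peptide-to-accession matches); PEAKS DB exports its results in its own formats, which are not reproduced here.

```python
from typing import Dict, Iterable, Tuple


def build_reduced_database(
    large_db: Dict[str, str],
    first_pass_matches: Iterable[Tuple[str, str]],
) -> Dict[str, str]:
    """Keep only proteins supported by at least one first-pass peptide.

    large_db: accession -> protein sequence (the large database).
    first_pass_matches: (peptide, accession) pairs from the first search.
    """
    supported = {accession for _peptide, accession in first_pass_matches}
    return {acc: seq for acc, seq in large_db.items() if acc in supported}


def to_fasta(db: Dict[str, str]) -> str:
    """Serialize the reduced database as FASTA text for the second search."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in db.items())


if __name__ == "__main__":
    large = {"P1": "MAKQLTR", "P2": "MSEGKDD", "P3": "MTTTK"}
    matches = [("QLTR", "P1"), ("MAK", "P1"), ("MTTTK", "P3")]
    print(to_fasta(build_reduced_database(large, matches)), end="")
```

Because the second search only has to consider the retained proteins, the search space shrinks to a size that the other engines can handle.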
The MaxQuant (version 1.6.2.10) search was performed using the Andromeda engine with the following parameters: a first search mass tolerance of 20 ppm and a main search mass tolerance of 4.5 ppm; trypsin as the enzyme, with semi-specific digestion and at most 2 missed cleavages; carbamidomethylation of cysteine set as a fixed modification; at most 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; an FDR of 1%, with only peptides having a posterior error probability below 5% retained for subsequent analysis. The "Second peptides" option was used to search for co-fragmented peptides in the MS/MS spectra. The "Match between runs" option was enabled with a match time window of 0.7 minutes and an alignment time window of 20 minutes. Proteins and peptides were quantified using the label-free quantification (LFQ) algorithm with a minimum ratio count of 1 and minimum and average numbers of neighbors of 3 and 6, respectively.
The pFind database search used a precursor ion mass tolerance of 10 ppm, a fragment ion mass tolerance of 20 ppm, open-search mode, trypsin as the enzyme, semi-specific digestion, and at most 3 missed cleavages.
Only peptides identified by all three search engines (PEAKS DB, MaxQuant, and pFind) were retained for further analysis.
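The consensus filter amounts to a set intersection. The sketch below assumes each engine's output has already been reduced to peptide sequences, with MaxQuant additionally carrying a posterior error probability (PEP) so that the 5% cutoff can be applied first; the function name and the input shapes are hypothetical.

```python
from typing import Dict, Iterable, Set


def consensus_peptides(
    peaks: Iterable[str],
    maxquant_pep: Dict[str, float],
    pfind: Iterable[str],
    max_pep: float = 0.05,
) -> Set[str]:
    """Return peptides reported by all three engines.

    maxquant_pep maps peptide -> PEP; only entries with PEP < max_pep
    are kept before intersecting.
    """
    maxquant_kept = {pep for pep, prob in maxquant_pep.items() if prob < max_pep}
    return set(peaks) & maxquant_kept & set(pfind)


if __name__ == "__main__":
    peaks = ["AAK", "LTR", "QLT"]
    maxquant = {"AAK": 0.001, "LTR": 0.20, "QLT": 0.01}
    pfind = ["AAK", "QLT", "GGR"]
    print(sorted(consensus_peptides(peaks, maxquant, pfind)))
```

How modified forms of the same sequence are matched across engines is not specified in the text, so the sketch compares plain sequences only.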
Example 3 Identification, classification, and functional analysis of semi-tryptic peptides
1. Identification principle of semi-tryptic peptides
A peptide whose identified sequence does not have R or K as its first amino acid is a semi-tryptic N-terminal peptide (excluding peptides at the N-terminus of the protein). A peptide whose identified sequence does not have R or K as its last amino acid is a semi-tryptic C-terminal peptide (excluding peptides at the C-terminus of the protein). In-source fragments (in-source CID fragments) are distinguished from semi-tryptic peptides derived from proteolysis by their elution time: most in-source fragments show retention times that differ from their theoretical retention times (predicted using SSRCalc). Microbial semi-tryptic peptides are distinguished from human-derived and food-derived peptides by the corresponding accession numbers in the FASTA sequence entries.
2. Combining semi-tryptic and fully tryptic peptide data to quantify the degree of proteolysis
We determined changes in the degree of proteolysis from the normalized relative abundance of semi-tryptic peptides (NRASP), obtained by normalizing the relative abundance of semi-tryptic peptides to the relative abundance of fully tryptic peptides. This normalization step is important because, if the abundances of the semi-tryptic and fully tryptic peptides change in proportion, the degree of proteolysis has generally not changed. In that case, comparing semi-tryptic peptides alone would nevertheless show a difference between groups.
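The normalization can be sketched as a ratio of summed abundances per group (for example per taxon, biological process, or enzyme). Summation as the aggregation rule, the record layout, and the function name `nrasp` are assumptions for illustration; the text specifies only that semi-tryptic abundance is divided by fully tryptic abundance.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def nrasp(records: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Compute semi-tryptic / fully tryptic abundance per group.

    records: (group, kind, abundance) with kind in {"semi", "full"}.
    Groups without fully tryptic abundance are skipped, because the
    ratio is undefined for them.
    """
    semi = defaultdict(float)
    full = defaultdict(float)
    for group, kind, abundance in records:
        if kind == "semi":
            semi[group] += abundance
        elif kind == "full":
            full[group] += abundance
        else:
            raise ValueError(f"unknown peptide kind: {kind!r}")
    return {g: semi.get(g, 0.0) / total for g, total in full.items() if total > 0}


if __name__ == "__main__":
    control = [("protein folding", "semi", 2.0), ("protein folding", "full", 4.0)]
    # Doubling both peptide classes (more protein, same proteolysis)
    # leaves the ratio unchanged.
    doubled = [(g, k, 2 * a) for g, k, a in control]
    print(nrasp(control), nrasp(doubled))
```

The usage example mirrors the argument above: a proportional change in both peptide classes alters the raw semi-tryptic abundance but not the NRASP.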
3. Results
To improve the sensitivity of metaproteome analysis over a large sequence space, we adopted a two-step database search strategy. This effectively reduces the size of the metaproteome database to that used in conventional proteomic analysis, which makes semi-tryptic metaproteomic searches feasible. Furthermore, the confidence of peptide identification was improved by combining three commonly used software tools. These tools use different algorithms for peak matching, identification of co-eluting peptides, and FDR calculation (MaxQuant and pFind use the target-decoy strategy, whereas PEAKS DB uses the decoy-fusion method), which significantly increases the confidence of peptide identification. Only the peptides identified by all three tools were retained for further analysis.
A total of 12,828,005 MS/MS spectra were searched, yielding 3,804,903 (29.66%) peptide-spectrum matches (PSMs), and 125,494 peptides were identified from the fecal metaproteomes, of which 108,784 (86.68%) were microbe-specific peptides (not shared with human or food sequences). In addition, 7,969 (6.35%) human-specific peptides were identified in the fecal metaproteomes, of which 5,104 (64.05%) were semi-tryptic. Gene Ontology (GO) analysis showed that 84.13% of the human semi-tryptic peptides were derived from potential extracellular proteins, compared with only 1.16% of the microbial semi-tryptic peptides.
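For orientation, the basic target-decoy estimate can be sketched as the ratio of decoy to target matches above a score cutoff. This is the generic textbook form, not the exact procedure of MaxQuant or pFind, and it does not represent the decoy-fusion method of PEAKS DB; the function name, the score scale, and the toy data are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def score_cutoff_at_fdr(
    psms: Iterable[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Find the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: (score, is_decoy) pairs, where a higher score is better.
    FDR at a cutoff is estimated as (#decoys) / (#targets) among all
    matches scoring at or above it. Returns None if no cutoff qualifies.
    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    toy = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
    print(score_cutoff_at_fdr(toy, max_fdr=0.01))
```

Because each engine estimates its error rate differently, requiring agreement among all three acts as an additional, independent filter on top of the per-engine 1% FDR.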
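The percentages reported in this paragraph follow directly from the counts, which can be checked with a few lines of arithmetic. Only the counts given in the text are used; the variable names are hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


spectra = 12_828_005          # MS/MS spectra searched
psms = 3_804_903              # peptide-spectrum matches
peptides = 125_494            # peptides identified
microbial_specific = 108_784  # not shared with human or food sequences
human_specific = 7_969
human_semi_tryptic = 5_104

if __name__ == "__main__":
    print(f"PSM rate:            {percent(psms, spectra):.2f}%")
    print(f"Microbe-specific:    {percent(microbial_specific, peptides):.2f}%")
    print(f"Human-specific:      {percent(human_specific, peptides):.2f}%")
    print(f"Human semi-tryptic:  {percent(human_semi_tryptic, human_specific):.2f}%")
```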
Example 4 Validation of the method by analyzing proteolytic characteristics in the E. coli heat shock response
We validated our method by analyzing heat shock-induced proteolytic features in the published proteome data set of E. coli K12. A total of 9937 peptides were identified using the large metaproteome database described above in combination with the three search engines, whereas 14111 peptides were identified using the UniProt E. coli K12 reference database. The number of peptides identified with the large database was thus 29.6% lower, reflecting a normal loss of sensitivity, since the large database contains about 10,000 times more sequences than the conventional reference database.
Of all 14111 peptides identified with the UniProt E. coli K12 reference database, 83.7% had PEP values below 0.01 and 61.6% had PEP values below 0.001. In contrast, of the 4783 peptides identified only with the UniProt E. coli K12 reference database (not with the metaproteome database), 60.3% had PEP values below 0.01 and 39.5% had PEP values below 0.001. The peptides identified only with the reference database therefore had higher PEP values, indicating that low-quality peptide-spectrum matches (PSMs) are more susceptible to the sensitivity loss of large database searches. It is also noteworthy that a single-organism proteome differs substantially from the intestinal metaproteome. Recent studies have shown that metaproteome databases assembled from large public databases and sample-matched reference databases produce comparable results in intestinal metaproteomics studies. Therefore, our method does not show a significant loss of sensitivity in intestinal metaproteome analysis. Of the peptides identified with the very large metaproteome database, 93.4% were consistent with those identified with the E. coli reference database, which indicates that the method identifies peptides with high accuracy.
To validate our approach, we compared the NRASP (as an indicator of proteolytic regulation) of 185 biological processes found in all samples, and found that the NRASP of 20 (about 10.8%) biological processes differed significantly between the control and heat-stressed groups (P < 0.05, FIG. 2).
Heat stress disturbs protein folding, leading to the accumulation of misfolded proteins that need to be refolded into the correct conformation. Accordingly, our method showed that the NRASP associated with protein folding decreased under heat stress, while the NRASP associated with protein refolding increased. We also observed an increase in methylation-associated NRASP under heat stress, which is consistent with recent findings. In conclusion, the biological findings on proteolytic regulation obtained with our method are highly reliable.
Example 5 Taxonomic classification and functional analysis of peptides
Analysis was performed using Unipept (version 4.3.5) with UniProt 2020.01, based on the lowest common ancestor (LCA) algorithm. All peptides were analyzed with the following options: equate I and L, filter duplicate peptides, and advanced missed cleavage handling. The taxonomic information was visualized using the sunburst view provided by Unipept.
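The preprocessing options and the lowest common ancestor idea can be illustrated with a simplified sketch. It does not call Unipept and does not reproduce its actual LCA implementation, which includes refinements not shown here; the function names, the lineage representation (root-first lists of taxon names), and the toy data are assumptions.

```python
from typing import List, Optional, Sequence


def normalize_peptides(peptides: Sequence[str], equate_il: bool = True) -> List[str]:
    """Optionally equate I and L, then drop duplicate peptides (order kept)."""
    seen = set()
    result = []
    for peptide in peptides:
        key = peptide.upper()
        if equate_il:
            key = key.replace("I", "L")  # I and L have identical mass
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the deepest taxon shared by all root-first lineages."""
    if not lineages:
        return None
    lca = None
    for ranks in zip(*lineages):
        if all(name == ranks[0] for name in ranks):
            lca = ranks[0]
        else:
            break
    return lca


if __name__ == "__main__":
    print(normalize_peptides(["AIK", "ALK", "GGR", "ggr"]))
    print(lowest_common_ancestor([
        ["Bacteria", "Firmicutes", "Clostridia"],
        ["Bacteria", "Firmicutes", "Bacilli"],
    ]))
```

A peptide that occurs in proteins from several organisms is thus assigned to the most specific taxon that all of those organisms share.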
Results
(1) Relative abundance and distribution of semi-tryptic peptides
FIG. 1 shows the NRASP of 20 major bacterial taxa, 35 major biological processes, and 32 enzyme subclasses identified in at least 75% of the 447 fecal metaproteome samples from the CD (n = 204), UC (n = 123), and control (n = 120) groups. The median NRASP of the phyla Firmicutes and Bacteroidetes, the classes Bacteroidia and Clostridia, and the Bacteroides lineage down to the genus Bacteroides was around 1, indicating that the relative abundance of the corresponding semi-tryptic peptides was comparable to that of the fully tryptic peptides (FIG. 1A). However, the median NRASP rose to about 1.25 in the families Lachnospiraceae and Ruminococcaceae and to 1.5 in the genera Roseburia and Prevotella and the species Faecalibacterium prausnitzii and Prevotella copri, whereas it fell to about 0.5 in the phylum Actinobacteria and the order Bifidobacteriales. These data indicate that different intestinal bacteria undergo different degrees of proteolysis.
The median NRASP of most biological processes also fluctuated around 1 (FIG. 1B). The NRASP of the isoleucine biosynthetic process, valine biosynthetic process, bacterial flagellum-dependent cell motility, protein transport, carboxylic acid metabolic process, fucose metabolic process, and glucose metabolic process rose to 1.75-2, and that of the fatty acid metabolic process and L-threonine catabolic process rose further to 2.5. In contrast, the NRASP of the polysaccharide catabolic process, carbohydrate transport, and transmembrane transport fell to about 0.75, and that of the metabolic process fell further to 0.3.
At the enzyme level, NRASP was highest for 3-hydroxybutyryl-CoA dehydrogenase involved in butyrate metabolism (median > 3), followed by 3-hydroxybutyryl-CoA dehydrogenase involved in fatty acid beta-oxidation, glycine C-acetyltransferase involved in L-threonine degradation, phosphoenolpyruvate carboxykinase (ATP) involved in gluconeogenesis, ketol-acid reductoisomerase involved in branched-chain amino acid (BCAA) biosynthesis, and superoxide dismutase involved in the antioxidant stress response (median NRASP 2-3, FIG. 1C).

Claims (10)

1. A method of determining the degree of proteolysis, comprising the following steps:
S1, obtaining (meta)proteome data, either acquired from a sample or published in a public database;
S2, performing a first search using a large metaproteome database and PEAKS DB software to obtain the proteins for which at least one peptide is identified;
S3, searching the omics data against the protein sequences obtained in S2 using PEAKS DB, MaxQuant, and pFind software, and retaining the peptides identified by all three;
S4, distinguishing semi-tryptic peptides from fully tryptic peptides among the peptides obtained in S3;
S5, determining the degree of proteolysis from the normalized relative abundance of the semi-tryptic peptides, which is obtained by normalizing the relative abundance of the semi-tryptic peptides to the relative abundance of the fully tryptic peptides.
2. The method of determining the degree of proteolysis of claim 1, wherein the semi-tryptic peptides in S4 are identified as follows: an identified peptide is a semi-tryptic N-terminal peptide if the first amino acid is not R or K (excluding peptides at the protein N-terminus), and a semi-tryptic C-terminal peptide if the last amino acid is not R or K (excluding peptides at the protein C-terminus).
3. The method of claim 1, wherein PEAKS DB performs the search with the following parameters: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; at most 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavages; and a false discovery rate (FDR) of 1%.
4. The method of claim 1, wherein MaxQuant performs the search with the following parameters: a first-search mass tolerance of 20 ppm and a main-search mass tolerance of 4.5 ppm; trypsin as the enzyme, with semi-specific digestion and at most 2 missed cleavages; carbamidomethylation of cysteine set as a fixed modification; at most 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; and a false discovery rate (FDR) of 1%, with peptides having a posterior error probability below 5% retained for subsequent analysis.
5. The method of determining the degree of proteolysis of claim 1, wherein pFind performs the search with the following parameters: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 20 ppm; open search as the search mode; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavages; and an FDR of 1%.
6. Use of the method of any one of claims 1 to 5.
7. Use according to claim 6, wherein the method is used to capture characteristic information of intestinal microbial proteolysis.
8. The use according to claim 6, wherein the method is used for studying interactions between gut microbes and the host.
9. Use according to claim 6, wherein the method is used for studying diseases associated with bacterial proteases.
10. The use according to claim 9, wherein said diseases include, but are not limited to, bacterial infections, inflammatory bowel disease.
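Steps S3 to S5 of claim 1 and the classification rule of claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by the inventors. First, the "first amino acid" test of claim 2 is read here as a test on the residue that precedes the peptide in the parent protein, because trypsin cleaves on the C-terminal side of K and R. Second, the relative abundance of a group (for example a taxon) is assumed to be its share of the summed intensity within each peptide class. All function names, the example protein and the intensities are hypothetical.

```python
from collections import defaultdict

TRYPTIC_RESIDUES = {"K", "R"}


def consensus_peptides(*engine_results):
    """S3: keep only peptide sequences reported by every search engine."""
    return set.intersection(*(set(r) for r in engine_results))


def classify_peptide(protein_seq, peptide):
    """S4: return 'full', 'semi' or 'nonspecific' for a peptide.

    Assumption: a terminus counts as tryptic when it follows K/R (N-side),
    ends in K/R (C-side), or coincides with a protein terminus.
    Only the first occurrence of the peptide in the protein is considered.
    """
    start = protein_seq.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    n_ok = start == 0 or protein_seq[start - 1] in TRYPTIC_RESIDUES
    c_ok = end == len(protein_seq) or peptide[-1] in TRYPTIC_RESIDUES
    if n_ok and c_ok:
        return "full"
    return "semi" if (n_ok or c_ok) else "nonspecific"


def nrasp_by_group(records):
    """S5: records is an iterable of (group, peptide_class, intensity).

    NRASP(group) = (group share of semi-tryptic intensity)
                   / (group share of fully tryptic intensity).
    Groups without fully tryptic signal are skipped.
    """
    class_total = defaultdict(float)
    group_total = defaultdict(float)
    for group, klass, intensity in records:
        if klass in ("semi", "full"):
            class_total[klass] += intensity
            group_total[(group, klass)] += intensity
    out = {}
    for group in {g for g, _ in group_total}:
        semi = group_total.get((group, "semi"), 0.0)
        full = group_total.get((group, "full"), 0.0)
        if full == 0 or class_total["semi"] == 0:
            continue
        out[group] = (semi / class_total["semi"]) / (full / class_total["full"])
    return out


# Hypothetical protein and search results from three engines.
PROTEIN = "MKAAGRLLTKVVE"
shared = consensus_peptides(
    {"AAGR", "AGR", "LLTK"}, {"AAGR", "AGR", "VVE"}, {"AAGR", "AGR", "LLTK"}
)
classes = {p: classify_peptide(PROTEIN, p) for p in shared}

# Invented intensities for two hypothetical taxa.
records = [("A", "semi", 10), ("A", "full", 30),
           ("B", "semi", 30), ("B", "full", 30)]
scores = nrasp_by_group(records)
```

With this definition, a group whose share of semi-tryptic signal equals its share of fully tryptic signal scores 1, which matches the interpretation given for FIG. 1A.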
CN202011415023.0A 2020-12-07 2020-12-07 Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms Active CN112786105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011415023.0A CN112786105B (en) 2020-12-07 2020-12-07 Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms

Publications (2)

Publication Number Publication Date
CN112786105A (en) 2021-05-11
CN112786105B (en) 2024-05-07

Family

ID=75750749

Country Status (1)

Country Link
CN (1) CN112786105B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115267033A (en) * 2022-08-05 2022-11-01 西湖大学 Macro-proteomics analysis method based on mass spectrum data and electronic equipment

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004046731A2 (en) * 2002-11-18 2004-06-03 Ludwig Institute For Cancer Research Method for analysing amino acids, peptides and proteins using mass spectroscopy of fixed charge-modified derivatives
US20050032040A1 (en) * 2002-10-11 2005-02-10 Bettina Warscheild Analyzing and distinguishing organisms such as bacterial spores by their soluble polypeptides
US20050048564A1 (en) * 2001-05-30 2005-03-03 Andrew Emili Protein expression profile database
CN1692282A (en) * 2002-04-15 2005-11-02 萨莫芬尼根有限责任公司 Quantitation of biological molecules
US20070231909A1 (en) * 2005-10-13 2007-10-04 Applera Corporation Methods for the development of a biomolecule assay
US20100047261A1 (en) * 2006-10-31 2010-02-25 Curevac Gmbh Base-modified rna for increasing the expression of a protein
US20100143912A1 (en) * 2007-01-25 2010-06-10 The Regents Of The Universuty Of California Specific n-terminal labeling of peptides and proteins in complex mixtures
US20110093205A1 (en) * 2009-10-19 2011-04-21 Palo Alto Research Center Incorporated Proteomics previewer
CN103268432A (en) * 2013-05-08 2013-08-28 中国科学院水生生物研究所 Method of identifying protein phosphorylation modification sites on the basis of tandem mass spectrometry
US20140072991A1 (en) * 2011-04-04 2014-03-13 Atlas Antibodies Ab Quantitative standard for mass spectrometry of proteins
KR20140101134A (en) * 2013-02-08 2014-08-19 건국대학교 산학협력단 Method for providing information by Proteomic Analysis of the Aqueous Humor in Age-related Macular Degeneration Patients and biomarker for Age-related Macular Degeneration
US20150248998A1 (en) * 2012-11-15 2015-09-03 Dh Technologies Development Pte. Ltd. Systems and Methods for Identifying Compounds from MS/MS Data without Precursor Ion Information
US20150309045A1 (en) * 2012-11-28 2015-10-29 Eth Zurich Method and tools for the determination of conformation and conformational changes of proteins and of derivatives thereof
WO2018165350A1 (en) * 2017-03-07 2018-09-13 Nuseed Pty Ltd. Lc-ms/ms-based methods for characterizing proteins
US20180340941A1 (en) * 2017-05-25 2018-11-29 Wisconsin Alumni Research Foundation Method to Map Protein Landscapes
CN109444313A (en) * 2018-10-23 2019-03-08 大连工业大学 Method based on LC-MS technology analysis protein-PS complex digestibility
US20190307856A1 (en) * 2016-10-12 2019-10-10 Institute For Research In Biomedicine Arginine And Its Use As A T Cell Modulator
US20200141946A1 (en) * 2017-08-25 2020-05-07 Nanjing Agricultural University Method for evaluating in vivo protein nutrition based on lc-ms-ms technique
CN111220690A (en) * 2018-11-27 2020-06-02 中国科学院大连化学物理研究所 Direct mass spectrometry detection method for low-abundance protein posttranslational modification group

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
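何明敏; 舒坤贤; 白明泽; 许睿: "Application and research progress of mass spectrum clustering network methods in identifying peptide post-translational modifications", Chinese Journal of Biotechnology, no. 10, 19 April 2018 (2018-04-19) *
吴重德; 黄钧; 周荣清: "Research progress and applications of metaproteomics", Food and Fermentation Industries, no. 05, 15 April 2016 (2016-04-15) *
齐崴; 何明霞; 何志敏; 史德青: "Chromatographic analysis of the tryptic hydrolysis of whole casein", Chinese Journal of Chromatography, no. 01, 30 January 2002 (2002-01-30) *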

Also Published As

Publication number Publication date
CN112786105B (en) 2024-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant