CN112786105B - Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms - Google Patents

Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms

Info

Publication number
CN112786105B
Authority
CN
China
Prior art keywords
trypsin
proteolysis
peptide
protein
enzyme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011415023.0A
Other languages
Chinese (zh)
Other versions
CN112786105A (en
Inventor
严志祥
单鸿
贺飞翔
张婷
薛可文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern Marine Science and Engineering Guangdong Laboratory Zhuhai
Fifth Affiliated Hospital of Sun Yat Sen University
Original Assignee
Southern Marine Science and Engineering Guangdong Laboratory Zhuhai
Fifth Affiliated Hospital of Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern Marine Science and Engineering Guangdong Laboratory Zhuhai, Fifth Affiliated Hospital of Sun Yat Sen University
Priority to CN202011415023.0A
Publication of CN112786105A
Application granted
Publication of CN112786105B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention belongs to the technical field of biology and discloses a metaproteome (macro-proteome) data mining method centered on semi-tryptic peptides. The method combines a two-step database search, de novo sequencing, open search, and matching of results from multiple search engines to perform large-scale, semi-tryptic-peptide-centric metaproteome mining of high-resolution mass spectrometry data. These strategies can reduce false-positive identifications caused by database incompleteness and post-translational modifications. When the method was used to analyze an Escherichia coli proteome, 93.4% of the peptides identified from a very large metaprotein database were consistent with those identified from a conventional E. coli reference database.

Description

Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms
Technical Field
The invention relates to the technical field of bioinformatics analysis, and in particular to a metaproteome (macro-proteome) mining method and its application in obtaining the proteolytic characteristics of intestinal microorganisms.
Background
Intestinal microorganisms live in a dynamic environment and face proteotoxic and metabolic stress from drugs, diet, microbial competition, and host endogenous chemicals. Bacteria have evolved different regulatory strategies to adapt to changing environments, including changes in gene expression, cell differentiation, and motility, and proteolysis plays a vital role in these strategies. Proteolytic regulation is an important process in all organisms: bacteria use energy-dependent proteases to degrade misfolded proteins or to activate regulatory proteins so that they can respond rapidly to the dynamic intestinal environment. A broad range of microbial functions is regulated by proteolysis, such as stress responses, cell growth and division, biofilm formation, and protein secretion.
Inflammatory bowel disease (IBD) is a chronic inflammatory disease influenced by genetic and environmental factors, and mainly comprises Crohn's disease (CD) and ulcerative colitis (UC). IBD has been reported to be associated with dysbiosis of the intestinal microbiota. Most IBD gut microbiome studies rely on metagenomics and 16S rRNA gene sequencing. However, metatranscriptomics or metaproteomics is required to pinpoint functional and metabolic activities by directly measuring RNA and protein, respectively. In addition, important regulatory mechanisms operate at the protein level, such as proteolytic regulation, which cannot be captured by RNA studies but can be studied using metaproteomics.
However, the characteristic changes in proteolysis of intestinal microorganisms in complex disease states such as IBD have not yet been studied, so a method capable of capturing the proteolytic characteristics of intestinal microorganisms in complex disease states is urgently needed.
Disclosure of Invention
The invention aims to solve the technical problems in the prior art. Its first object is to provide a metaproteome mining method centered on semi-tryptic peptides, together with a method for comparing the degree of proteolysis.
It is a second object of the present invention to provide the use of the above method for obtaining proteolytic characteristics of intestinal microorganisms.
The aim of the invention is achieved by the following technical scheme:
A method of determining the degree of proteolysis comprising the steps of:
S1, acquiring (meta)proteome data of a sample, or (meta)proteome data published in a public database;
S2, performing a first search using a large metaprotein database and PEAKS DB software to obtain the proteins for which at least one peptide is identified;
S3, searching the omics data against the protein sequences obtained in S2 using PEAKS DB software, MaxQuant software, and pFind software, and retaining the peptides identified by all three of PEAKS DB, MaxQuant, and pFind;
S4, distinguishing semi-tryptic peptides from fully tryptic peptides among the peptides obtained in S3;
S5, determining the degree of proteolysis from the normalized relative abundance of the semi-tryptic peptides, which is obtained by normalizing the relative abundance of the semi-tryptic peptides to the relative abundance of the fully tryptic peptides.
Preferably, in S4, semi-tryptic peptides are identified according to the following principle: a peptide for which the amino acid immediately preceding the identified sequence is not R or K is a semi-tryptic N-terminal peptide (excluding peptides at the protein N-terminus); a peptide whose last amino acid is not R or K is a semi-tryptic C-terminal peptide (excluding peptides at the protein C-terminus).
When proteins are digested with trypsin during proteomic sample preparation, the amino acid preceding each resulting peptide should be K or R, and the last amino acid of the peptide should also be K or R. If semi-tryptic peptides are detected in the data, proteases other than trypsin were involved in hydrolyzing the protein, so that the amino acid preceding the peptide, or its last amino acid, is not K or R. Semi-tryptic peptides can therefore serve as markers that a protein was hydrolyzed by other proteases in the organism, and fully tryptic peptides as markers that it was not. However, the degree of proteolysis cannot be studied from semi-tryptic peptides alone, because a change in the abundance of semi-tryptic peptides may simply reflect a change in the total amount of the corresponding protein (increased or decreased synthesis) while the degree of proteolysis is unchanged. The relative abundance of semi-tryptic peptides is therefore normalized to that of fully tryptic peptides when comparing the degree of proteolysis between samples, which removes the effect of changes in total protein.
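The identification principle above can be illustrated with a minimal Python sketch. The function name, the label strings, and the toy protein sequence are hypothetical and only serve the illustration; the sketch ignores refinements such as initiator-methionine removal and is not the implementation used by any of the search engines.

```python
def classify_peptide(protein: str, start: int, end: int) -> str:
    """Classify protein[start:end] (0-based, end-exclusive) by its termini.

    Returns "full", "semi_N", "semi_C", or "non" (hypothetical labels).
    """
    if not 0 <= start < end <= len(protein):
        raise ValueError("peptide coordinates fall outside the protein")
    # The N-terminal side is consistent with trypsin if the preceding residue
    # is K or R, or if the peptide starts at the protein N-terminus.
    n_tryptic = start == 0 or protein[start - 1] in "KR"
    # The C-terminal side is consistent with trypsin if the peptide ends in
    # K or R, or if it ends at the protein C-terminus.
    c_tryptic = end == len(protein) or protein[end - 1] in "KR"
    if n_tryptic and c_tryptic:
        return "full"
    if c_tryptic:
        return "semi_N"  # preceding residue is not K/R
    if n_tryptic:
        return "semi_C"  # last residue is not K/R
    return "non"


toy_protein = "MAKDEFGHRLMNPK"  # made-up sequence for illustration
for start, end in [(3, 9), (4, 9), (3, 7), (9, 14)]:
    print(toy_protein[start:end], classify_peptide(toy_protein, start, end))
```

Here `semi_N` and `semi_C` correspond to the semi-tryptic N-terminal and C-terminal peptides defined in the text, and protein-terminal peptides are never counted as semi-tryptic on the terminal side.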
Preferably, the parameters for the PEAKS DB search are: a precursor ion mass tolerance of 10 ppm and a fragment ion (product ion) mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; at most 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, with semi-specific cleavage and at most 3 missed cleavages; and a false discovery rate (FDR) of 1%.
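As a worked illustration of the two kinds of mass tolerance named above, the following Python sketch converts a ppm tolerance into an absolute window and checks whether an observed value falls within it. The function names and the example m/z values are hypothetical; the search engines apply their own internal matching logic.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Absolute half-width (in Da or m/z units) of a ppm tolerance at a given m/z."""
    return mz * ppm / 1e6


def within_tolerance(observed: float, theoretical: float,
                     tol: float, unit: str = "ppm") -> bool:
    """Check an observed value against a tolerance given in ppm or Da."""
    if unit == "ppm":
        limit = ppm_to_da(theoretical, tol)
    elif unit == "Da":
        limit = tol
    else:
        raise ValueError("unit must be 'ppm' or 'Da'")
    return abs(observed - theoretical) <= limit


# A 10 ppm precursor tolerance at m/z 1000 is a window of about +/- 0.01.
print(ppm_to_da(1000.0, 10))
print(within_tolerance(1000.008, 1000.0, 10, "ppm"))
print(within_tolerance(500.015, 500.0, 0.02, "Da"))
```

A ppm tolerance scales with mass, whereas a fixed Da tolerance does not, which is why the two values cannot be compared directly.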
Preferably, the parameters for the MaxQuant search are: a first search mass tolerance of 20 ppm and a main search mass tolerance of 4.5 ppm; trypsin as the enzyme, with semi-specific cleavage and at most 2 missed cleavages; carbamidomethylation of cysteine set as a fixed modification; at most 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; and a false discovery rate (FDR) of 1%, with peptides having a posterior error probability (PEP) below 5% retained for subsequent analysis.
Preferably, the parameters for the pFind search are: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 20 ppm; open search mode; trypsin as the enzyme, with semi-specific cleavage and at most 3 missed cleavages; and an FDR of 1%.
The invention also provides application of the method.
In particular, the above method is used to capture the proteolytic characteristics of intestinal microorganisms. The inventors found that microbial semi-tryptic peptides in 447 fecal metaproteomes are enriched in several biological processes, such as fatty acid, carboxylic acid, glucose, and fucose metabolism, branched-chain amino acid biosynthesis, protein transport, and bacterial flagellum-mediated cell motility, suggesting that these processes undergo more extensive proteolytic regulation.
Alternatively, the above method is used to study the intestinal microbiota and host-microbe interactions.
The proteome mining method of the present invention is also applicable to capturing the proteolytic characteristics of plant and environmental microorganisms, and can therefore be used to explore the proteolytic patterns of plant and environmental microorganisms.
The method can also be used to study diseases related to bacterial proteases (such as bacterial infection and inflammatory bowel disease) by examining changes in the degree of bacterial proteolysis, so that the corresponding bacterial proteases can be used as targets for developing drugs that modulate them.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a metaproteome mining method centered on semi-tryptic peptides, which combines a two-step search, de novo sequencing, open search, and matching of results from multiple software tools to perform large-scale, semi-tryptic-peptide-centric metaproteome mining. These strategies can reduce false-positive identifications caused by database incompleteness and peptide modifications. A previous study performed semi-tryptic peptide searches on metaproteomic data sets generated by low-resolution MS/MS, which inevitably increased the search space and decreased confidence in the identifications. In that study, when a Pyrococcus furiosus proteome was searched against a large metaprotein database containing 6162,582 sequences, only 80.2% of the identified peptides were annotated as P. furiosus sequences. In contrast, the present invention applies multi-engine searching to high-resolution MS/MS data. When the method was used to analyze an E. coli proteome, 93.4% of the peptides identified from a large metaprotein database (130,975,891 sequences) were consistent with those identified from a conventional E. coli reference database, indicating better accuracy.
Drawings
FIG. 1 shows the normalized relative abundance of semi-tryptic peptides (NRASP, semi-tryptic peptide abundance / fully tryptic peptide abundance) for the major bacterial taxa, biological processes, and enzymes in 447 fecal metaproteomic samples, with bacterial taxa (A), biological processes (B), and enzymes (C) arranged in ascending order; the box plots show the median (line in the middle of the box) and the 25th and 75th percentiles; the dashed lines extend to 1.5 times the interquartile range (IQR), with outliers shown as points;
FIG. 2 shows the changes in proteolytic characteristics of different biological processes in the E. coli proteome under heat stress (p < 0.05).
Detailed Description
The following describes the invention in more detail. The description of these embodiments is provided to assist understanding of the present invention, but is not intended to limit the present invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict with each other.
The test methods used in the following experimental examples are all conventional methods unless otherwise specified; the materials, reagents and the like used, unless otherwise specified, are those commercially available.
Data sets: two publicly available data sets were analyzed. Data set 1 (PXD008675) comprises 447 fecal metaproteomes of healthy and IBD populations, collected from 89 subjects aged 6-58 years (median 22.8 years), including 24 non-IBD controls, 39 CD patients, and 26 UC patients; of these samples, 272 had a matching metagenome and 184 had a matching metaproteome. We also analyzed a proteome data set (PXS 000498) to investigate the effect of heat stress on proteolytic regulation in E. coli K-12.
Metaprotein database: a comprehensive human intestinal microbial protein database was assembled from the following parts: (1) the Integrated Gene Catalog (IGC) database, based on 1267 intestinal metagenomes from 1070 individuals (760 European, 368 Chinese, and 139 American samples); (2) sequence data of 215 strains cultured from healthy adult human feces; (3) the Culturable Genome Reference (CGR) database, containing 1520 non-redundant, high-quality genomes from 6000 strains of intestinal bacteria isolated from healthy human feces; (4) all archaeal, bacterial, and fungal sequences in UniProtKB (version 2017_06) and NCBI RefSeq (version 90). The microbial sequence database was supplemented with the UniProt human reference proteome; a food database of dietary organisms such as common wheat (Triticum aestivum), rice (Oryza sativa subsp. japonica), soybean (Glycine max), corn (Zea mays), peanut (Arachis hypogaea), potato (Solanum tuberosum), tomato (Solanum lycopersicum), pig (Sus scrofa), cow (Bos taurus), chicken (Gallus gallus), sheep (Ovis aries), fish (Salmo salar and Oncorhynchus mykiss), and shrimp (Artemia sp. and Litopenaeus vannamei); and a common contaminant database (http:// maxquat. Org/contacts. Zip). Duplicate protein sequences were removed using USEARCH v11.0.667 (-fastx_uniques), yielding 130,975,891 non-redundant sequences.
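The effect of the de-duplication step can be sketched in plain Python as follows. This is only an illustration of exact-duplicate removal on a toy FASTA text, with hypothetical function names and made-up accessions; it is not how USEARCH is implemented, and a pure-Python approach would not be practical for a database of this size.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records):
    """Keep one record per distinct sequence (the first header seen wins)."""
    first_header = {}
    for header, sequence in records:
        first_header.setdefault(sequence.upper(), header)
    return [(header, seq) for seq, header in first_header.items()]


toy_fasta = """>igc_0001
MKTAYIAKQR
>cgr_0042
MKTAYIAKQR
>uniprot_X
MSLNNPEVLR
"""
for header, seq in unique_sequences(read_fasta(toy_fasta)):
    print(header, seq)
```

In this toy input the first two entries share a sequence, so only two records remain after de-duplication.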
Statistical analysis: the amino acid frequencies near the cleavage sites were subjected to multivariate analysis using principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), and missing values were imputed using Bayesian PCA (BPCA). The Kruskal-Wallis test and the Dunn-Bonferroni test were run in R (version 3.5.3) and RStudio (version 1.1.383) on variables present in at least 75% of the samples, and variables with P values below 0.05 were considered significantly different between groups. The beta diversity of the multi-omics data was determined using principal coordinate analysis (PCoA) of the Bray-Curtis distances.
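For reference, the Bray-Curtis distance that underlies the PCoA can be written in a few lines of Python. This is a minimal sketch with made-up abundance vectors; the source performed its statistics in R, and the function name here is hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two all-zero samples are treated as identical (a convention chosen here).
    return numerator / denominator if denominator else 0.0


sample_a = [6, 7, 4]   # made-up feature abundances
sample_b = [10, 0, 6]
print(bray_curtis(sample_a, sample_b))
```

The value is 0 for identical profiles and 1 for profiles that share no features; PCoA then embeds the samples from the matrix of pairwise distances.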
Example 1 Performance of different software in performing searches
Using the MLI data set and the large metaprotein database, we compared the performance of different commercial software (Proteome Discoverer, PEAKS, ProteinPilot, and Byonic) and open-source software (MaxQuant, MSFragger, and pFind) in searching for semi-tryptic peptides on several 36-core servers (with 192 GB of memory installed). Proteome Discoverer, Byonic, MaxQuant, pFind, and ProteinPilot did not complete the search within one month, and MSFragger crashed with an out-of-memory error. Only PEAKS completed the analysis within one month, so further high-throughput analysis was performed on a 156-core high-performance computing cluster, which completed the database search within 2 weeks.
Example 2 database search
The database search process comprises two main steps: (1) de novo sequencing and a first search using the large metaprotein database and PEAKS software, to obtain the proteins identified by at least one peptide and to generate a corresponding reduced protein database; (2) a second search using the reduced database and multiple software tools, to improve the accuracy of semi-tryptic peptide identification.
To address the increased search space and search time in semi-tryptic peptide identification for metaproteomics, searches were first performed using PEAKS DB on a cluster configured with 156 Intel(R) Xeon(R) processor cores and 1.5 TB of 2666 MHz memory. The software first performed de novo sequencing, followed by a database search with the following parameters: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; at most 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, with semi-specific cleavage and at most 3 missed cleavages; and a false discovery rate (FDR) of 1%.
The two-step search strategy is used here to increase search sensitivity: proteins identified by at least one peptide in the first search are retained for a second round of multi-engine searching, in which PEAKS DB, MaxQuant (version 1.6.2), and pFind (version 3.1.5) are used.
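The database reduction at the heart of the two-step strategy can be sketched as follows. The data structures (a list of peptide hits with their protein accessions, and a dictionary from accession to sequence) and all names are assumptions made for illustration; the actual first-pass output format of PEAKS DB is not described in the source.

```python
def build_reduced_database(first_pass_hits, database):
    """Keep only proteins supported by at least one first-pass peptide.

    first_pass_hits: iterable of (peptide, [protein accessions]) pairs.
    database: dict mapping accession -> protein sequence.
    """
    supported = set()
    for _peptide, accessions in first_pass_hits:
        supported.update(accessions)
    return {acc: seq for acc, seq in database.items() if acc in supported}


# Toy inputs (made-up accessions and sequences).
large_db = {"P1": "MAKDEFGHR", "P2": "MSLNNPEVLR", "P3": "MTTQAPK"}
hits = [("DEFGHR", ["P1"]), ("NNPEVLR", ["P2"]), ("EFGHR", ["P1"])]
reduced_db = build_reduced_database(hits, large_db)
print(sorted(reduced_db))
```

The second-round engines then search only `reduced_db`, which is what brings the search space back to the scale of a conventional proteomic analysis.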
The MaxQuant (version 1.6.2.10) search was performed using the Andromeda engine with the following parameters: a first search mass tolerance of 20 ppm and a main search mass tolerance of 4.5 ppm; trypsin as the enzyme, with semi-specific cleavage and at most 2 missed cleavages; carbamidomethylation of cysteine set as a fixed modification; at most 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; and a false discovery rate of 1%, with peptides having a posterior error probability below 5% retained for subsequent analysis. The "Second peptides" option was enabled to search for co-fragmented peptides in the MS/MS spectra. The "match between runs" option was enabled, with a match time window of 0.7 minutes and an alignment time window of 20 minutes. Proteins and peptides were quantified using the label-free quantification (LFQ) algorithm with a minimum ratio count of 1, and with the minimum and average numbers of neighbors set to 3 and 6, respectively.
The pFind database search used a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 20 ppm, open search mode, and trypsin as the enzyme, with semi-specific cleavage and at most 3 missed cleavages.
Only peptides identified by all three search engines (PEAKS DB, MaxQuant, and pFind) were retained for further analysis.
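The consensus filter amounts to a set intersection, as in the following minimal sketch. The function name and the peptide lists are made up, and the sketch compares plain sequences only; how modified forms are matched across engines is not specified in the source.

```python
def consensus_peptides(*engine_results):
    """Return the peptides reported by every engine."""
    if not engine_results:
        return set()
    return set.intersection(*(set(result) for result in engine_results))


peaks_db = ["DEFGHR", "EFGHR", "NNPEVLR"]
maxquant = ["DEFGHR", "NNPEVLR", "TTQAPK"]
pfind = ["NNPEVLR", "DEFGHR"]
print(sorted(consensus_peptides(peaks_db, maxquant, pfind)))
```

Requiring agreement between engines trades some sensitivity for confidence, since a peptide missed by any one engine is discarded.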
Example 3 Identification, classification and functional analysis of semi-tryptic peptides
1. Identification principle of semi-tryptic peptides
A peptide for which the amino acid immediately preceding the identified sequence was not R or K was classified as a semi-tryptic N-terminal peptide (excluding peptides at the protein N-terminus). A peptide whose last amino acid was not R or K was classified as a semi-tryptic C-terminal peptide (excluding peptides at the protein C-terminus). In-source fragments (in-source CID fragments) were distinguished from proteolytically derived semi-tryptic peptides according to elution time: most in-source fragments show retention times that differ from their theoretical retention times (predicted using SSRCalc). Microbial semi-tryptic peptides were distinguished from human-derived and food-derived peptides according to the accession number in the corresponding FASTA sequence entry.
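The retention-time criterion for in-source fragments could be applied along the lines of the sketch below. The record layout, the function name, and in particular the tolerance value are hypothetical: the source does not state what retention-time deviation was used as the cutoff, and the predicted times would come from an external predictor such as SSRCalc.

```python
def split_in_source_fragments(peptides, tolerance_min):
    """Separate peptides whose observed retention time deviates from prediction.

    peptides: iterable of dicts with 'sequence', 'observed_rt', 'predicted_rt'
              (retention times in minutes).
    tolerance_min: allowed absolute deviation in minutes (hypothetical cutoff).
    Returns (kept, flagged) lists of sequences.
    """
    kept, flagged = [], []
    for pep in peptides:
        deviation = abs(pep["observed_rt"] - pep["predicted_rt"])
        (flagged if deviation > tolerance_min else kept).append(pep["sequence"])
    return kept, flagged


candidates = [
    {"sequence": "EFGHR", "observed_rt": 21.4, "predicted_rt": 20.9},
    {"sequence": "FGHR", "observed_rt": 35.2, "predicted_rt": 18.0},
]
kept, flagged = split_in_source_fragments(candidates, tolerance_min=5.0)
print(kept, flagged)
```

The rationale is that an in-source fragment co-elutes with its longer parent peptide, so its observed retention time matches the parent rather than the value predicted for its own sequence.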
2. Combining semi-tryptic and fully tryptic peptide data to quantify the degree of proteolysis
We determined the change in the degree of proteolysis from the normalized relative abundance of semi-tryptic peptides (NRASP), obtained by normalizing the relative abundance of the semi-tryptic peptides to that of the fully tryptic peptides. This normalization step is important because a proportional change in the abundance of semi-tryptic and fully tryptic peptides usually indicates no change in the degree of proteolysis, yet comparing semi-tryptic peptides alone in that case would show a difference between groups.
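A minimal sketch of the NRASP calculation, and of why the normalization matters, is given below. The tuple layout, the function name, and the abundance values are made up for illustration; how abundances are summarized per taxon, process, or enzyme before taking the ratio is an assumption here.

```python
from collections import defaultdict


def nrasp_by_group(peptides):
    """Compute NRASP = semi-tryptic abundance / fully tryptic abundance per group.

    peptides: iterable of (group, kind, abundance), kind being "semi" or "full".
    Groups without any fully tryptic signal are skipped (ratio undefined).
    """
    totals = defaultdict(lambda: {"semi": 0.0, "full": 0.0})
    for group, kind, abundance in peptides:
        totals[group][kind] += abundance
    return {
        group: t["semi"] / t["full"]
        for group, t in totals.items()
        if t["full"] > 0
    }


# Both peptide classes double in the second sample: protein amount changed,
# but the degree of proteolysis did not.
control = [("protein transport", "semi", 10.0), ("protein transport", "full", 20.0)]
treated = [("protein transport", "semi", 20.0), ("protein transport", "full", 40.0)]
print(nrasp_by_group(control), nrasp_by_group(treated))
```

Comparing the semi-tryptic abundances alone (10 versus 20) would suggest a difference between the two samples, whereas the NRASP is the same in both.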
3. Results
To increase the sensitivity of metaproteome analysis against a large sequence space, we employed a two-step database search strategy. This effectively reduces the size of the metaprotein database to that used in conventional proteomic analysis, which makes semi-tryptic metaproteomic searching feasible. In addition, combining three commonly used software tools improves the reliability of peptide identification. These tools use different algorithms for peak matching, identification of co-eluting peptides, and FDR calculation (MaxQuant and pFind use a target-decoy strategy, whereas PEAKS DB uses the decoy-fusion method), which significantly increases the confidence of peptide identification. Only peptides identified by all three tools were retained for further analysis.
In total, 12,828,005 MS/MS spectra were searched, and 3,804,903 (29.66%) peptide-spectrum matches (PSMs) and 125,494 peptides were identified from the fecal metaproteomes, of which 108,784 (86.68%) were microbe-specific peptides (not shared with human or food sequences). 7,969 (6.35%) human-specific peptides were identified in the fecal metaproteomes, of which 5,104 (64.05%) were semi-tryptic. Gene Ontology (GO) analysis showed that 84.13% of the human tryptic peptides were derived from potential extracellular proteins, whereas only 1.16% of the microbial tryptic peptides were derived from potential extracellular proteins.
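The percentages quoted above follow directly from the reported counts, as the short check below shows. Only the counts come from the text; the helper name is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


spectra, psms = 12_828_005, 3_804_903
peptides, microbial, human, human_semi = 125_494, 108_784, 7_969, 5_104

print(percent(psms, spectra))       # identified spectra
print(percent(microbial, peptides)) # microbe-specific peptides
print(percent(human, peptides))     # human-specific peptides
print(percent(human_semi, human))   # semi-tryptic among human-specific
```

Note that the last percentage is taken relative to the human-specific peptides, not to all identified peptides.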
Example 4 Validation of the method by analyzing proteolytic characteristics in the E. coli heat shock response
We validated our method by analyzing the heat-shock-induced proteolytic profile in a published proteomic data set of E. coli K12. With the three search engines combined, 9937 peptides were identified using the large metaprotein database described above, whereas 14111 peptides were identified using the UniProt E. coli K12 reference database. The large database therefore identified 29.6% fewer peptides than the reference database, reflecting a normal loss of sensitivity, since the large database contains 10,000 times more sequences than the conventional reference database.
Of all 14111 peptides identified with the UniProt E. coli K12 reference database, 83.7% had a PEP value below 0.01 and 61.6% had a PEP value below 0.001. In contrast, of the 4783 peptides identified only with the UniProt E. coli K12 reference database (not identified with the metaprotein database), 60.3% had a PEP value below 0.01 and 39.5% had a PEP value below 0.001. The higher PEP values of the peptides identified only with the UniProt E. coli K12 reference database indicate that low-quality peptide-spectrum matches (PSMs) are more susceptible to the loss of sensitivity when large databases are searched. It is also notable that the proteome of a single microorganism differs significantly from the intestinal metaproteome. Recent studies have shown that metaprotein databases assembled from large public databases and sample-matched reference databases produce comparable results in intestinal metaproteomics studies. Our method therefore does not suffer a significant loss of sensitivity in intestinal metaproteomic analysis. 93.4% of the peptides identified with the very large metaprotein database were identical to those identified with the E. coli reference database, which shows that our method identifies peptides with high accuracy.
To verify our approach, we compared the NRASP (as an index of proteolytic regulation) of the 185 biological processes found in all samples and found that the NRASP of 20 (approximately 10.8%) biological processes differed significantly between the control and heat-stressed groups (P value < 0.05, FIG. 2).
Heat stress can disrupt protein folding, resulting in the accumulation of misfolded proteins that need to be refolded into the correct conformation. Accordingly, using our method we found that under heat stress the NRASP associated with protein folding decreased, whereas the NRASP associated with protein refolding increased. We also observed that the NRASP associated with methylation increased under heat stress, consistent with recent findings. In conclusion, the biological findings on proteolytic regulation obtained using our method have a high degree of confidence.
Example 5 Taxonomic classification and functional analysis of peptides
Analysis was performed using Unipept (version 4.3.5) with UniProt 2020.01, based on the lowest common ancestor (LCA) algorithm. All peptides were analyzed with the following parameters: equate I and L, filter duplicate peptides, and advanced missed cleavage handling. The taxonomic information was visualized using the Sunburst view provided by Unipept.
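The ideas behind two of these settings, equating I and L and assigning a peptide to the lowest common ancestor of its matching organisms, can be sketched as follows. This is a simplified illustration with hypothetical function names and hand-written lineages; it is not Unipept's implementation, which among other things handles missing ranks and invalid taxa.

```python
def equate_il(peptide: str) -> str:
    """Treat isoleucine and leucine as identical (same mass in MS/MS)."""
    return peptide.replace("I", "L")


def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of root-to-leaf taxonomic lineages."""
    lca = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca.append(ranks[0])
    return lca


# Lineages of two organisms assumed to contain the same peptide.
lineage_1 = ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
             "Bacteroidaceae", "Bacteroides"]
lineage_2 = ["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales",
             "Prevotellaceae", "Prevotella"]
print(lowest_common_ancestor([lineage_1, lineage_2])[-1])

# Peptides differing only in I/L collapse to one entry after equating.
print(len({equate_il(p) for p in ["AIGDK", "ALGDK", "AIGDK"]}))
```

A peptide shared by both organisms would thus be assigned at the order level rather than to either genus, which is why shared peptides carry less taxonomic resolution than unique ones.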
Results of the study
(1) Relative abundance and distribution of semi-tryptic peptides
FIG. 1 shows the NRASP of 447 fecal metaproteomes from the CD (n=204), UC (n=123), and control (n=120) groups; 20 major bacterial taxa, 35 major biological processes, and 32 enzyme subclasses were identified in at least 75% of the samples. The median NRASP of the phyla Firmicutes and Bacteroidetes, the classes Bacteroidia and Clostridia, the orders Bacteroidales and Clostridiales, the family Bacteroidaceae, and the genus Bacteroides was around 1, indicating that the relative abundances of the corresponding semi-tryptic and fully tryptic peptides were comparable (FIG. 1A). However, the median NRASP increased to about 1.25 in the families Lachnospiraceae and Ruminococcaceae; increased to 1.5 in the genera Roseburia and Prevotella and the species Faecalibacterium prausnitzii and Prevotella copri; and decreased to about 0.5 in the phylum Actinobacteria and the order Bifidobacteriales. These data indicate that different intestinal bacteria have different degrees of proteolysis.
The median NRASP of most biological processes also fluctuated around 1 (FIG. 1B). However, the NRASP of isoleucine biosynthesis, valine biosynthesis, bacterial flagellum-dependent cell motility, protein transport, carboxylic acid metabolism, fucose metabolism, and glucose metabolism increased to 1.75-2; the NRASP of fatty acid metabolism and L-threonine catabolism further increased to 2.5; the NRASP of polysaccharide catabolism, carbohydrate transport, and transmembrane transport decreased to about 0.75; and the NRASP of metabolism further decreased to 0.3.
At the enzyme level, NRASP was highest (median > 3) for 3-hydroxybutyryl-CoA dehydrogenase involved in butyrate metabolism, followed by 3-hydroxybutyryl-CoA dehydrogenase involved in fatty acid β-oxidation, glycine C-acetyltransferase involved in L-threonine degradation, phosphoenolpyruvate carboxykinase (ATP) involved in gluconeogenesis, ketol-acid reductoisomerase involved in branched-chain amino acid (BCAA) biosynthesis, and superoxide dismutase involved in the response to oxidative stress (NRASP median 2-3, FIG. 1C).

Claims (8)

1. A method for determining the degree of proteolysis of a microorganism in the intestinal tract, comprising the steps of:
S1, acquiring macro proteome data of a sample or macro proteome data published in a public database;
S2, performing a first search using a large macroprotein database and PEAKS DB software to obtain proteins having at least one identified peptide;
S3, searching the omics data against the protein sequences obtained in S2 using PEAKS DB software, MaxQuant software and pFind software, and retaining the peptides identified simultaneously by the PEAKS DB software, the MaxQuant software and the pFind software;
S4, distinguishing the half-trypsin polypeptides from the full-trypsin polypeptides among the peptides obtained in S3;
S5, determining the degree of proteolysis by using the normalized relative abundance of the half-trypsin polypeptide, wherein the normalized relative abundance of the half-trypsin polypeptide is obtained by normalizing the relative abundance of the half-trypsin polypeptide to the relative abundance of the full-trypsin polypeptide;
wherein, in S4, the identification principle of the half-trypsin polypeptide is as follows: an identified peptide is a half-trypsin N-terminal peptide if the amino acid preceding it is not R or K and it is not a protein N-terminal peptide, and an identified peptide is a half-trypsin C-terminal peptide if its last amino acid is not R or K and it is not a protein C-terminal peptide; in-source fragments are distinguished from proteolytically derived half-trypsin polypeptides according to elution time.
2. The method for determining the degree of proteolysis of a gut microorganism according to claim 1, wherein the parameters of the PEAKS DB database search are: the precursor ion mass tolerance is 10 ppm and the fragment ion mass tolerance is 0.02 Da; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 3, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the enzyme is trypsin, the digestion mode is semi-specific, and at most 3 missed cleavages are allowed; the false discovery rate (FDR) is set to 1%.
3. The method for determining the degree of proteolysis of a gut microorganism according to claim 1, wherein the parameters of the MaxQuant search are: the first-search mass tolerance is 20 ppm and the main-search mass tolerance is 4.5 ppm; the enzyme is trypsin, the digestion mode is semi-specific, and at most 2 missed cleavages are allowed; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 5, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the false discovery rate (FDR) is set to 1%, and peptides with a posterior error probability of less than 5% are retained for subsequent analysis.
4. The method for determining the degree of proteolysis of a gut microorganism according to claim 1, wherein the parameters of the pFind search are: the precursor ion mass tolerance is 10 ppm, the fragment ion mass tolerance is 20 ppm, the search mode is open search, the enzyme is trypsin, the digestion mode is semi-specific, and at most 3 missed cleavages are allowed; the FDR is set to 1%.
5. The method for determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4, wherein the method is used to obtain characteristic information on the proteolysis of gut microorganisms.
6. The method for determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4, wherein the method is used to study interactions between gut microorganisms and the host.
7. The method for determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4, wherein the method is used to study diseases associated with bacterial proteases.
8. The method for determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4, wherein the disease comprises bacterial infection and inflammatory bowel disease.
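Steps S3 and S4 of claim 1 can be sketched roughly as follows. This is an illustrative sketch and not the software used in the patent: the function names `consensus_peptides` and `classify_peptide`, the label strings and the toy protein sequence are all hypothetical. The sketch covers only the R/K terminus rule and the three-engine intersection; separating in-source fragments by elution time is not covered.

```python
def consensus_peptides(*engine_results):
    """Keep only the peptide sequences reported by every search engine (S3)."""
    sets = [set(result) for result in engine_results]
    return set.intersection(*sets) if sets else set()


def classify_peptide(protein, start, end):
    """Apply the S4 rule to the peptide protein[start:end].

    Returns a set that may contain "semi_N" and/or "semi_C". An empty set
    means both termini are consistent with trypsin cleavage (after R or K)
    or coincide with a terminus of the protein.
    """
    labels = set()
    # N-terminal side: the residue preceding the peptide should be R or K,
    # unless the peptide starts at the protein N-terminus.
    if start > 0 and protein[start - 1] not in "RK":
        labels.add("semi_N")
    # C-terminal side: the last residue of the peptide should be R or K,
    # unless the peptide ends at the protein C-terminus.
    if end < len(protein) and protein[end - 1] not in "RK":
        labels.add("semi_C")
    return labels


# Toy protein and hypothetical identifications from three search engines.
protein = "MAGRTLDESKVVNPQR"
peaks_hits = {"TLDESK", "LDESK", "TLDE"}
maxquant_hits = {"TLDESK", "LDESK"}
pfind_hits = {"TLDESK", "LDESK", "NPQR"}

kept = consensus_peptides(peaks_hits, maxquant_hits, pfind_hits)

classified = {}
for peptide in kept:
    start = protein.find(peptide)  # first occurrence only, for simplicity
    classified[peptide] = classify_peptide(protein, start, start + len(peptide))
```

In this toy case only the peptides reported by all three engines are kept: `TLDESK` is fully tryptic, while `LDESK` is flagged as having a non-tryptic N-terminus because it is preceded by T rather than R or K.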
CN202011415023.0A 2020-12-07 2020-12-07 Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms Active CN112786105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011415023.0A CN112786105B (en) 2020-12-07 2020-12-07 Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms

Publications (2)

Publication Number Publication Date
CN112786105A CN112786105A (en) 2021-05-11
CN112786105B true CN112786105B (en) 2024-05-07

Family

ID=75750749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011415023.0A Active CN112786105B (en) 2020-12-07 2020-12-07 Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms

Country Status (1)

Country Link
CN (1) CN112786105B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115267033A (en) * 2022-08-05 2022-11-01 西湖大学 Macro-proteomics analysis method based on mass spectrum data and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004046731A2 (en) * 2002-11-18 2004-06-03 Ludwig Institute For Cancer Research Method for analysing amino acids, peptides and proteins using mass spectroscopy of fixed charge-modified derivatives
CN1692282A (en) * 2002-04-15 2005-11-02 萨莫芬尼根有限责任公司 Quantitation of biological molecules
CN103268432A (en) * 2013-05-08 2013-08-28 中国科学院水生生物研究所 Method of identifying protein phosphorylation modification sites on the basis of tandem mass spectrometry
KR20140101134A (en) * 2013-02-08 2014-08-19 건국대학교 산학협력단 Method for providing information by Proteomic Analysis of the Aqueous Humor in Age-related Macular Degeneration Patients and biomarker for Age-related Macular Degeneration
WO2018165350A1 (en) * 2017-03-07 2018-09-13 Nuseed Pty Ltd. Lc-ms/ms-based methods for characterizing proteins
CN109444313A (en) * 2018-10-23 2019-03-08 大连工业大学 Method based on LC-MS technology analysis protein-PS complex digestibility
CN111220690A (en) * 2018-11-27 2020-06-02 中国科学院大连化学物理研究所 Direct mass spectrometry detection method for low-abundance protein posttranslational modification group

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2349265A1 (en) * 2001-05-30 2002-11-30 Andrew Emili Protein expression profile database
US8071329B2 (en) * 2002-10-11 2011-12-06 University Of Maryland Analyzing and distinguishing organisms such as bacterial spores by their soluble polypeptides
EP1941280A2 (en) * 2005-10-13 2008-07-09 Applera Corporation Methods for the development of a biomolecule assay
DE102006051516A1 (en) * 2006-10-31 2008-05-08 Curevac Gmbh (Base) modified RNA to increase the expression of a protein
US8679771B2 (en) * 2007-01-25 2014-03-25 The Regents Of The University Of California Specific N-terminal labeling of peptides and proteins in complex mixtures
US20110093205A1 (en) * 2009-10-19 2011-04-21 Palo Alto Research Center Incorporated Proteomics previewer
EP2508537A1 (en) * 2011-04-04 2012-10-10 Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. Quantitative standard for mass spectrometry of proteins
US10141169B2 (en) * 2012-11-15 2018-11-27 Dh Technologies Development Pte. Ltd. Systems and methods for identifying compounds from MS/MS data without precursor ion information
EP2738558A1 (en) * 2012-11-28 2014-06-04 ETH Zurich Method and tools for the determination of conformation and conformational changes of proteins and of derivatives thereof
EP3308778A1 (en) * 2016-10-12 2018-04-18 Institute for Research in Biomedicine Arginine and its use as a t cell modulator
US20180340941A1 (en) * 2017-05-25 2018-11-29 Wisconsin Alumni Research Foundation Method to Map Protein Landscapes
CN107655985B (en) * 2017-08-25 2020-05-26 南京农业大学 LC-MS-MS technology-based in vivo protein nutrition evaluation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
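Research progress and applications of metaproteomics;吴重德;黄钧;周荣清;;食品与发酵工业;20160415(05);full text *
Chromatographic analysis of the tryptic hydrolysis of whole casein;齐崴, 何明霞, 何志敏, 史德青;色谱;20020130(01);full text *
Application of and research progress in mass spectrum clustering network methods for identifying post-translational modifications of peptides;何明敏;舒坤贤;白明泽;许睿;;生物工程学报;20180419(10);full text *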

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant