CN112786105B

CN112786105B - Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms

Info

Publication number: CN112786105B
Application number: CN202011415023.0A
Authority: CN
Inventors: 严志祥; 单鸿; 贺飞翔; 张婷; 薛可文
Original assignee: Southern Marine Science and Engineering Guangdong Laboratory Zhuhai; Fifth Affiliated Hospital of Sun Yat Sen University
Current assignee: Southern Marine Science and Engineering Guangdong Laboratory Zhuhai; Fifth Affiliated Hospital of Sun Yat Sen University
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2024-05-07
Anticipated expiration: 2040-12-07
Also published as: CN112786105A

Abstract

The invention belongs to the field of biotechnology and discloses a semi-tryptic peptide-centric macro-proteome (metaproteome) data mining method. The method combines a two-step database search, de novo sequencing, open search and matching of the results of multiple search engines to perform large-scale, semi-tryptic peptide-centric macro-proteome mining of high-resolution mass spectrometry data. These strategies can reduce false-positive identifications caused by database incompleteness and post-translational modifications. When the method of the invention is used to analyze the Escherichia coli proteome, 93.4% of the peptides identified from a very large macro protein database are consistent with the peptides identified from the conventional E. coli reference database.

Description

Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms

Technical Field

The invention relates to the technical field of biological information analysis, in particular to a macro proteome mining method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms.

Background

Intestinal microorganisms live in a dynamic environment and face proteotoxic and metabolic stress from drugs, diet, microbial competition and host endogenous chemicals. Bacteria have evolved different regulatory strategies to adapt to changing environments, including alterations in gene expression, cell differentiation and changes in motility, in which proteolysis plays a vital role. Proteolytic regulation is an important process affecting all organisms: bacteria use energy-dependent proteases to degrade misfolded proteins or to activate regulatory proteins, so that they can respond rapidly to a dynamic intestinal environment. A very broad range of microbial functions is regulated by proteolysis, such as stress responses, cell growth and division, biofilm formation and protein secretion.

Inflammatory bowel disease (IBD) is a chronic inflammatory disease affected by genetic and environmental factors, and mainly includes Crohn's disease (CD) and ulcerative colitis (UC). IBD has been reported to be associated with dysbiosis of the intestinal microbiota. Metagenomics and 16S rRNA gene sequencing account for the vast majority of IBD gut microbiome studies. However, macro-transcriptomics or macro-proteomics is required to pinpoint functional and metabolic activities by direct measurement of RNA and protein, respectively. In addition, important regulatory patterns exist at the protein level, such as proteolytic regulation, which cannot be captured by RNA studies but can be studied using macro-proteomics.

However, the characteristic changes in the proteolysis of intestinal microorganisms in complex disease states such as IBD have not yet been studied, and a method capable of capturing the proteolytic characteristics of intestinal microorganisms in complex disease states is therefore urgently needed.

Disclosure of Invention

The invention aims to solve the above technical problems in the prior art. Its first object is to provide a semi-tryptic peptide-centric macro-proteome mining method, together with a method for comparing the degree of proteolysis.

It is a second object of the present invention to provide the use of the above method for obtaining proteolytic characteristics of intestinal microorganisms.

The aim of the invention is achieved by the following technical scheme:

A method of determining the degree of proteolysis comprising the steps of:

S1, acquiring (macro) proteome data of a sample or (macro) proteome data published in a public database;

S2, performing a first search by using a large macro protein database and PEAKS DB software to obtain the proteins identified by at least one peptide;

S3, searching the omics data against the protein sequences obtained in S2 by using PEAKS DB software, MaxQuant software and pFind software, and retaining the peptides identified by all three of PEAKS DB, MaxQuant and pFind;

S4, distinguishing the semi-tryptic peptides from the fully tryptic peptides among the peptides obtained in S3;

S5, determining the degree of proteolysis by using the normalized relative abundance of the semi-tryptic peptides, wherein the normalized relative abundance of the semi-tryptic peptides is obtained by normalizing the relative abundance of the semi-tryptic peptides to the relative abundance of the fully tryptic peptides.

Preferably, in S4, semi-tryptic peptides are identified as follows: a peptide whose preceding amino acid (the residue immediately before the identified sequence) is not R or K is a semi-tryptic N-terminal peptide, excluding peptides at the protein N-terminus; a peptide whose last amino acid is not R or K is a semi-tryptic C-terminal peptide, excluding peptides at the protein C-terminus.

When a protein is digested with trypsin during proteomic sample preparation, the amino acid preceding each resulting peptide should be K or R, and the last amino acid of the peptide should also be K or R. If semi-tryptic peptides are detected in the data, proteases other than trypsin were involved in hydrolyzing the protein, so that the amino acid preceding the peptide or its last amino acid is not K or R. Semi-tryptic peptides can therefore serve as markers of proteins hydrolyzed by other proteases in the organism, and fully tryptic peptides as markers of proteins not hydrolyzed by other proteases. However, the degree of proteolysis cannot be assessed from semi-tryptic peptides alone, because a change in their abundance may simply reflect a change in the total amount of the corresponding protein (increased or decreased synthesis) rather than a change in the degree of proteolysis. The relative abundance of the semi-tryptic peptides is therefore normalized to that of the fully tryptic peptides when comparing the degree of proteolysis between samples, which removes the effect of changes in total protein.
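The identification rule above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any of the search engines: the function name `classify_peptide`, the label strings and the 0-based coordinates are assumptions made here, and the sketch ignores refinements such as initiator-methionine removal or proline-adjacent cleavage rules.

```python
TRYPSIN_SITES = {"K", "R"}


def classify_peptide(protein: str, start: int, end: int) -> str:
    """Classify the peptide protein[start:end] (0-based, end exclusive).

    Hypothetical helper: the N-terminal side counts as tryptic when the
    preceding residue is K/R or the peptide starts at the protein N-terminus;
    the C-terminal side counts as tryptic when the last residue is K/R or the
    peptide ends at the protein C-terminus.
    """
    if not (0 <= start < end <= len(protein)):
        raise ValueError("peptide coordinates outside the protein")
    n_side_tryptic = start == 0 or protein[start - 1] in TRYPSIN_SITES
    c_side_tryptic = end == len(protein) or protein[end - 1] in TRYPSIN_SITES
    if n_side_tryptic and c_side_tryptic:
        return "fully_tryptic"
    if c_side_tryptic:
        # Preceding residue is not K/R: semi-tryptic N-terminal peptide.
        return "semi_tryptic_N"
    if n_side_tryptic:
        # Last residue is not K/R: semi-tryptic C-terminal peptide.
        return "semi_tryptic_C"
    return "non_tryptic"


protein = "MAKTLDERGSAVKLLPQ"
print(classify_peptide(protein, 3, 8))   # TLDER, both ends tryptic
print(classify_peptide(protein, 4, 8))   # LDER, preceded by T
print(classify_peptide(protein, 3, 6))   # TLD, ends in D
```

Peptides that begin at the protein N-terminus or end at the protein C-terminus are treated as tryptic on that side, matching the exclusions stated in the rule.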

Preferably, the parameters of the PEAKS DB search are: the mass tolerance of the parent ion (precursor ion) is 10 ppm and the mass tolerance of the fragment ion (product ion) is 0.02 Da; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 3, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavages is 3; the false discovery rate (FDR) is set to 1%.

Preferably, the parameters of the MaxQuant search are: the mass tolerance of the first search is 20 ppm and the mass tolerance of the main search is 4.5 ppm; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavages is 2; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 5, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the false discovery rate (FDR) is set to 1%, and peptides with a posterior error probability (PEP) of less than 5% are retained for subsequent analysis.

Preferably, the parameters of the pFind search are: the mass tolerance of the parent ion is 10 ppm and the mass tolerance of the fragment ion is 20 ppm; the search mode is open search; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavages is 3; the FDR is set to 1%.
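For side-by-side comparison, the stated settings of the three engines can be collected in one small data structure. This is only a summary sketch: the class `SearchSettings` and its field names are hypothetical and do not correspond to the configuration keys of PEAKS DB, MaxQuant or pFind; the values are those given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchSettings:
    """Hypothetical container; field names are NOT real engine config keys."""
    engine: str
    precursor_tol_ppm: float            # precursor (main-search) tolerance
    fragment_tol: Optional[str]         # fragment tolerance with unit, if stated
    max_missed_cleavages: int
    max_variable_mods: Optional[int]    # None where the text gives no limit
    fdr: float = 0.01
    digestion: str = "trypsin, semi-specific"
    note: str = ""


SETTINGS = [
    SearchSettings("PEAKS DB", 10.0, "0.02 Da", 3, 3),
    SearchSettings("MaxQuant", 4.5, None, 2, 5,
                   note="first search 20 ppm; keep PEP < 0.05"),
    SearchSettings("pFind", 10.0, "20 ppm", 3, None, note="open search"),
]

for s in SETTINGS:
    print(f"{s.engine:9s} precursor={s.precursor_tol_ppm} ppm "
          f"fragment={s.fragment_tol} missed<={s.max_missed_cleavages} "
          f"FDR={s.fdr:.0%} {s.note}")
```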

The invention also provides application of the method.

In particular, the above method is used to capture the proteolytic characteristics of intestinal microorganisms. The present inventors have found that microbial semi-tryptic peptides in the 447 fecal macro-proteomes are enriched in several biological processes, such as fatty acid, carboxylic acid, glucose and fucose metabolism, branched-chain amino acid biosynthesis, protein transport and bacterial flagellum-mediated cell motility, suggesting that these processes undergo more extensive proteolytic regulation.

Or the above method is used to study intestinal microflora and host-microorganism interactions.

The above-described proteome mining method of the present invention is also applicable to capturing proteolytic characteristics of plant and environmental microorganisms, and thus, the above-described method can be used to explore proteolytic laws of plant and environmental microorganisms.

The method can also be used to study diseases related to bacterial proteases (such as bacterial infection and inflammatory bowel disease) by examining changes in the degree of bacterial proteolysis, so that the corresponding bacterial proteases can be used as targets for developing drugs that regulate them.

Compared with the prior art, the invention has the following beneficial effects:

The invention provides a semi-tryptic peptide-centric macro-proteome mining method, which comprises a two-step database search, de novo sequencing, open search and matching of the results of multiple software tools, so as to perform large-scale, semi-tryptic peptide-centric macro-proteome mining. These strategies can reduce false-positive identifications caused by database incompleteness and peptide modifications. A previous study performed a semi-tryptic peptide search on macro-proteomics datasets generated by low-resolution MS/MS, which inevitably increased the search space and decreased confidence in the identification results. In that study, when the Pyrococcus furiosus proteome was searched against a large macro protein database containing 6,162,582 sequences, only 80.2% of the identified peptides were annotated as P. furiosus sequences. In contrast, the present invention applies multi-engine searching to high-resolution MS/MS data. When the method of the invention is used to analyze the E. coli proteome, 93.4% of the peptides identified from a large macro protein database (130,975,891 sequences) are consistent with the peptides identified from the conventional E. coli reference database, so the method has better accuracy.

Drawings

FIG. 1 shows the normalized relative abundance of semi-tryptic peptides (NRASP, semi-tryptic peptide abundance / fully tryptic peptide abundance) for the major bacterial taxa (A), biological processes (B) and enzymes (C) in 447 fecal macro-proteomics samples, arranged in ascending order; the box plots show the median (line in the middle of the box) and the 25th and 75th percentiles; the dashed whiskers extend to 1.5 times the interquartile range (IQR), with outliers shown as points;

FIG. 2 shows the changes in the proteolytic characteristics of different biological processes in the E. coli proteome under heat stress (p < 0.05).

Detailed Description

The following describes the invention in more detail. The description of these embodiments is provided to assist understanding of the present invention, but is not intended to limit the present invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The test methods used in the following experimental examples are all conventional methods unless otherwise specified; the materials, reagents and the like used, unless otherwise specified, are those commercially available.

Data sets: two publicly available datasets were analyzed. Dataset 1 (PXD008675) contains intestinal macro-proteomes from healthy and IBD populations and consists of 447 fecal macro-proteomes from 89 subjects aged 6-58 years (median 22.8 years), including 24 non-IBD controls, 39 CD patients and 26 UC patients; of these samples, 272 samples each had a matching metagenome and 184 samples had a matching metaproteome. We also analyzed a proteome dataset (PXD000498) to investigate the effect of heat stress on proteolytic regulation in E. coli K-12.

Macro protein database: a comprehensive human intestinal microbial protein database was assembled from the following parts: (1) the Integrated Gene Catalog (IGC) database, based on 1267 intestinal metagenomes from 1070 individuals (760 European, 368 Chinese and 139 American samples); (2) sequence data of 215 strains cultured from healthy adult human feces; (3) the Culturable Genome Reference (CGR) database, containing 1520 non-redundant, high-quality genomes from 6000 intestinal bacterial strains isolated from healthy human feces; (4) all archaeal, bacterial and fungal sequences in UniProtKB (version 2017_06) and NCBI RefSeq (version 90). The microbial sequence database described above was supplemented with the UniProt human reference proteome, a food database of dietary organisms including common wheat (Triticum aestivum), rice (Oryza sativa subsp. japonica), soybean (Glycine max), corn (Zea mays), peanut (Arachis hypogaea), potato (Solanum tuberosum), tomato (Solanum lycopersicum), pig (Sus scrofa), cow (Bos taurus), chicken (Gallus gallus), sheep (Ovis aries), fish (Salmo salar and Oncorhynchus mykiss) and shrimp (Artemia sp. and Litopenaeus vannamei), and a common contaminant database (http://maxquant.org/contaminants.zip). Duplicate protein sequences were removed using USEARCH v11.0.667 (-fastx_uniques) to yield 130,975,891 non-redundant sequences.
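The de-duplication step can be illustrated with a plain-Python stand-in. This is not USEARCH and makes no claim about how USEARCH works internally; it only shows the idea of keeping one record per exactly identical sequence. The function name `unique_sequences` and the case-insensitive comparison are assumptions of this sketch, and an in-memory dictionary like this would not scale to a database of more than a hundred million sequences.

```python
def unique_sequences(records):
    """Keep the first record for every exactly identical sequence.

    records: iterable of (header, sequence) pairs.
    Hypothetical helper; sequences are compared case-insensitively.
    """
    first_header = {}
    for header, seq in records:
        key = seq.upper()
        if key not in first_header:      # first occurrence wins
            first_header[key] = header
    return [(header, seq) for seq, header in first_header.items()]


records = [("igc_1", "MKTAYIAK"), ("refseq_7", "MKTAYIAK"), ("cgr_3", "MKTAYIAKQR")]
print(unique_sequences(records))
```

Only full-length identical sequences are merged; a sequence that is a substring of another is kept as a separate entry.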

Statistical analysis: the amino acid frequencies near the cleavage sites were subjected to multivariate analysis using principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), and missing values were imputed using Bayesian PCA (BPCA). The Kruskal-Wallis test and the Dunn-Bonferroni test were performed in R (version 3.5.3) and RStudio (version 1.1.383) to detect variables (present in at least 75% of the samples) that differ significantly between groups, with P values less than 0.05 considered significant. The beta diversity of the multi-omics data was determined using principal coordinate analysis (PCoA) of the Bray-Curtis distances.
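Two of the building blocks named above are simple enough to sketch directly: the 75% prevalence filter and the Bray-Curtis dissimilarity that feeds the PCoA. The original analysis was carried out in R; the Python functions below (`prevalent_features`, `bray_curtis`) are hypothetical illustrations of the standard definitions and do not reproduce the authors' scripts.

```python
def prevalent_features(table, min_fraction=0.75):
    """Return features quantified in at least min_fraction of the samples.

    table: dict mapping feature name -> list of per-sample values,
    with None marking a missing value.
    """
    kept = []
    for name, values in table.items():
        if not values:
            continue
        present = sum(v is not None for v in values)
        if present / len(values) >= min_fraction:
            kept.append(name)
    return kept


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


table = {"process_A": [1.0, 2.0, None, 3.0], "process_B": [None, None, 1.0, 2.0]}
print(prevalent_features(table))
print(bray_curtis([2, 0], [1, 1]))
```

The dissimilarity is 0 for identical profiles and 1 for profiles that share no features.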

Example 1 Performance of different search software

Using the MLI dataset and the large macro protein database, we compared the performance of different commercial software (Proteome Discoverer, PEAKS, ProteinPilot and Byonic) and open-source software (MaxQuant, MSFragger and pFind) in searching for semi-tryptic peptides on several 36-core servers (each with 192 GB of memory). Proteome Discoverer, Byonic, MaxQuant, pFind and ProteinPilot did not complete the search within one month, while MSFragger crashed with an out-of-memory error. Only PEAKS completed the analysis within one month, so further high-throughput analysis was performed with PEAKS on a 156-core high-performance computing cluster, which completed the database search within 2 weeks.

Example 2 database search

The database search process generally includes two main steps: (1) de novo sequencing and a first search using the large macro protein database (large database) and PEAKS software, to obtain the proteins identified by at least one peptide and to generate a corresponding small protein database (reduced database); (2) a second search using the reduced database and several software tools, to improve the accuracy of semi-tryptic peptide identification.

To address the increased search space and search time in macro-proteomic semi-tryptic peptide identification, searches were first performed using PEAKS DB on a cluster configured with Intel(R) Xeon(R) processors (156 cores) and 1.5 TB of 2666 MHz memory. The software first performed de novo sequencing, followed by a database search using the following parameters: the mass tolerance of the parent ion is 10 ppm and the mass tolerance of the fragment ion is 0.02 Da; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 3, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavages is 3; the false discovery rate is set to 1%.

The two-step search strategy is used here to increase the sensitivity of the database search: in the first search, the proteins identified by at least one peptide are retained for a second round of multi-engine searching, and in the second search PEAKS DB, MaxQuant (version 1.6.2) and pFind (version 3.1.5) are used.
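The hand-off between the two search rounds amounts to filtering the large FASTA database down to the proteins that received at least one peptide in the first search. A minimal sketch follows, assuming the first-pass hits are available as a set of accessions and that the accession is the first whitespace-delimited token of each FASTA header; both assumptions, and the function names, are illustrative only and not taken from the PEAKS export format.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_reduced_database(fasta_text, identified_accessions):
    """Return FASTA text containing only proteins hit in the first search."""
    wanted = set(identified_accessions)
    entries = []
    for header, seq in parse_fasta(fasta_text):
        accession = header.split()[0]        # assumed accession position
        if accession in wanted:
            entries.append(f">{header}\n{seq}")
    return "\n".join(entries) + ("\n" if entries else "")


large_db = ">P1 chaperone\nMKT\nAAR\n>P2 unknown\nGGGK\n>P3 flagellin\nLLLR\n"
print(build_reduced_database(large_db, {"P1", "P3"}), end="")
```

The reduced file would then serve as the search database for the second round in all three engines.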

The MaxQuant (version 1.6.2.10) search was performed using the Andromeda engine with the following parameters: the mass tolerance of the first search is 20 ppm and the mass tolerance of the main search is 4.5 ppm; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavages is 2; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 5, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the false discovery rate is set to 1%, and peptides with a posterior error probability of less than 5% are retained for subsequent analysis. The "Second peptides" option was used to search for co-fragmented peptides in the MS/MS spectra. The "match between runs" option was enabled, with a match time window of 0.7 minutes and an alignment time window of 20 minutes. Proteins and peptides were quantified using the label-free quantification (LFQ) algorithm, with a minimum ratio count of 1 and minimum and average numbers of neighbors of 3 and 6, respectively.

The pFind database search was performed with a mass tolerance of 10 ppm for parent ions and 20 ppm for fragment ions, in open-search mode, with trypsin as the enzyme, semi-specific digestion and a maximum of 3 missed cleavages.

Only peptides identified by all three search engines (PEAKS DB, MaxQuant and pFind) were retained for further analysis.
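The consensus filter is a set intersection over the peptide lists reported by the three engines. In the sketch below, peptides are compared by their bare amino acid sequence after stripping modification annotations in parentheses or brackets; that normalization, and the function names, are assumptions made for illustration, since each engine writes modifications in its own format.

```python
import re

_ANNOTATION = re.compile(r"\([^)]*\)|\[[^\]]*\]")


def bare_sequence(peptide: str) -> str:
    """Strip bracketed or parenthesized annotations and non-letter characters."""
    stripped = _ANNOTATION.sub("", peptide)
    return re.sub(r"[^A-Za-z]", "", stripped).upper()


def consensus_peptides(*engine_results):
    """Return the bare sequences reported by every engine."""
    sets = [{bare_sequence(p) for p in result} for result in engine_results]
    if not sets:
        return set()
    return set.intersection(*sets)


peaks = ["M(+15.99)PEPK", "AAAR", "LLLK"]
maxquant = ["MPEPK", "AAAR"]
pfind = ["MPEPK", "LLLK", "AAAR"]
print(sorted(consensus_peptides(peaks, maxquant, pfind)))
```

Requiring agreement among engines trades sensitivity for confidence: a peptide missed by any one engine is dropped.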

Example 3 Identification, taxonomic classification and functional analysis of semi-tryptic peptides

1. Identification principle of semi-tryptic peptides

A peptide whose preceding amino acid (the residue immediately before the identified sequence) is not R or K is a semi-tryptic N-terminal peptide (excluding peptides at the protein N-terminus). A peptide whose last amino acid is not R or K is a semi-tryptic C-terminal peptide (excluding peptides at the protein C-terminus). In-source fragments (in-source CID fragments) were distinguished from proteolytically derived semi-tryptic peptides according to elution time: most in-source fragments show retention times that differ from their theoretical retention times (predicted using SSRCalc). Microbial semi-tryptic peptides were distinguished from human-derived and food-derived peptides according to the accession number in the corresponding FASTA sequence entry.
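The retention-time check can be sketched as a simple deviation filter. The text does not state a cutoff, so the `tolerance_min` value below is purely hypothetical, as are the function name and the record layout; the predicted retention times are assumed to have been computed beforehand (for example with SSRCalc) and mapped onto the same time scale as the observed ones.

```python
def split_by_retention_time(peptides, tolerance_min=5.0):
    """Separate peptides whose observed RT matches the prediction from
    those that deviate and are therefore candidate in-source fragments.

    peptides: list of dicts with keys "sequence", "observed_rt", "predicted_rt"
    (both retention times in minutes). tolerance_min is a hypothetical cutoff.
    """
    kept, flagged = [], []
    for pep in peptides:
        deviation = abs(pep["observed_rt"] - pep["predicted_rt"])
        if deviation > tolerance_min:
            flagged.append(pep["sequence"])   # elutes far from its prediction
        else:
            kept.append(pep["sequence"])
    return kept, flagged


peptides = [
    {"sequence": "LDER", "observed_rt": 21.0, "predicted_rt": 20.2},
    {"sequence": "TLD", "observed_rt": 48.5, "predicted_rt": 12.0},
]
print(split_by_retention_time(peptides))
```

The rationale is that an in-source fragment co-elutes with its longer parent peptide, so its observed retention time reflects the parent rather than its own sequence.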

2. Combining data from semi-tryptic and fully tryptic peptides to quantify the degree of proteolysis

We determined changes in the degree of proteolysis from the normalized relative abundance of semi-tryptic peptides (NRASP), obtained by normalizing the relative abundance of the semi-tryptic peptides to that of the fully tryptic peptides. This normalization step is important: if the abundances of the semi-tryptic and fully tryptic peptides change proportionally, the degree of proteolysis has usually not changed, yet comparing the semi-tryptic peptides alone would in that case show an apparent difference between groups.
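A short sketch makes the effect of the normalization concrete. The function `nrasp_by_group` and the input layout are hypothetical; the ratio itself follows the definition given for NRASP (semi-tryptic peptide abundance divided by fully tryptic peptide abundance), here summed per group such as a biological process or taxon.

```python
from collections import defaultdict


def nrasp_by_group(peptides):
    """Compute NRASP per group.

    peptides: iterable of (group, kind, abundance) with kind "semi" or "full".
    Groups without any fully tryptic signal are skipped.
    """
    semi = defaultdict(float)
    full = defaultdict(float)
    for group, kind, abundance in peptides:
        if kind == "semi":
            semi[group] += abundance
        elif kind == "full":
            full[group] += abundance
        else:
            raise ValueError(f"unknown peptide kind: {kind!r}")
    return {g: semi.get(g, 0.0) / total for g, total in full.items() if total > 0}


# Protein amount doubles in the second sample, proteolysis is unchanged.
control = [("protein folding", "semi", 10.0), ("protein folding", "full", 20.0)]
stressed = [("protein folding", "semi", 20.0), ("protein folding", "full", 40.0)]
print(nrasp_by_group(control), nrasp_by_group(stressed))
```

In this toy case the semi-tryptic abundance alone doubles between samples, while the NRASP stays the same, which is the situation the normalization is meant to handle.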

3. Results

To increase the sensitivity of macro-proteome analysis over a large sequence space, we employed a two-step database search strategy. This effectively reduces the size of the macro protein database to that used in conventional proteomic analysis, which makes a semi-tryptic macro-proteomic search feasible. In addition, the reliability of peptide identification is improved by combining three commonly used software tools. These tools use different algorithms for peak matching, identification of co-eluting peptides and FDR calculation (MaxQuant and pFind use the target-decoy strategy, whereas PEAKS DB uses the decoy-fusion method), which significantly increases the confidence of peptide identification. Only peptides identified by all three software tools were retained for further analysis.

A total of 12,828,005 MS/MS spectra were searched, and 3,804,903 (29.66%) peptide-spectrum matches (PSMs) and 125,494 peptides were identified from the fecal macro-proteomes, of which 108,784 (86.68%) were microbe-specific peptides (not shared with human or food sequences). In addition, 7,969 (6.35%) human-specific peptides were identified in the fecal macro-proteomes, of which 5,104 (64.05%) were semi-tryptic peptides. Gene Ontology (GO) analysis showed that 84.13% of the human semi-tryptic peptides were derived from potential extracellular proteins, whereas only 1.16% of the microbial semi-tryptic peptides were derived from potential extracellular proteins.

Example 4 Validation of the method by analyzing proteolytic characteristics in the E. coli heat shock response

We validated our method by analyzing the heat shock-induced proteolytic profile using the published proteomic dataset of E. coli K12. With the three search engines combined, 9937 peptides were identified using the large macro protein database described above, while 14111 peptides were identified using the UniProt E. coli K12 reference database. The number of identified peptides was thus 29.6% lower with the large database, reflecting a normal loss of sensitivity, since the large database contains about 10,000 times more sequences than the conventional reference database.

Of all 14111 peptides identified with the UniProt E. coli K12 reference database, 83.7% had a PEP value below 0.01 and 61.6% had a PEP value below 0.001. In contrast, of the 4783 peptides identified only with the UniProt E. coli K12 reference database (not identified with the macro protein database), 60.3% had a PEP value below 0.01 and 39.5% had a PEP value below 0.001. The higher PEP values of the peptides identified only with the reference database indicate that low-quality peptide-spectrum matches (PSMs) are more susceptible to the loss of sensitivity when large databases are searched. It is also notable that a single microbial proteome differs significantly from the intestinal proteome. Recent studies have shown that macro protein databases assembled from large public databases and sample-matched reference databases produce comparable results in intestinal macro-proteomics studies. Thus, our method does not suffer a significant loss of sensitivity in intestinal macro-proteomic analysis. Of the peptides identified with the very large macro protein database, 93.4% are identical to those identified with the E. coli reference database, which shows that our method identifies peptides with high accuracy.
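The 29.6% figure follows directly from the two peptide counts; the arithmetic, written out as a check, is:

```latex
\[
\frac{14111 - 9937}{14111} \;=\; \frac{4174}{14111} \;\approx\; 0.296
\]
```

That is, 4174 fewer peptides were identified with the large database than with the reference database, relative to the 14111 peptides of the reference search.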

To verify our approach, we compared NRASP of the 185 biological processes found in all samples (as a proteolytic regulatory index), and found that NRASP of 20 (approximately 10.8%) biological processes differed significantly between the control and heat stressed groups (P-value <0.05, fig. 2).

Heat stress can disrupt protein folding, resulting in the accumulation of misfolded proteins that need to be refolded into the correct conformation. Accordingly, using our method we found that under heat stress the NRASP associated with protein folding decreased and the NRASP associated with protein refolding increased. We also observed that the NRASP associated with methylation increased under heat stress, consistent with recent findings. In conclusion, the biological findings on proteolytic regulation obtained with our method have a high degree of confidence.

Example 5 Taxonomic and functional analysis of peptides

All peptides were analyzed with Unipept (version 4.3.5) using UniProt 2020.01 and the lowest common ancestor (LCA) algorithm, with the following parameters: equate I and L, filter duplicate peptides, and advanced missed cleavage handling. The taxonomic information was visualized using the Sunburst view provided by Unipept.
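The idea behind an LCA assignment can be shown with a simplified sketch: a peptide found in several organisms is assigned to the deepest taxon shared by all of their lineages. The code below is not Unipept's implementation, which handles missing and invalid ranks in ways not reproduced here; the function names and the root-to-leaf list representation of lineages are assumptions of this example.

```python
def equate_il(peptide: str) -> str:
    """Treat isoleucine and leucine as identical (they are isobaric)."""
    return peptide.upper().replace("I", "L")


def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages, or None.

    lineages: list of lists ordered from root to leaf, e.g.
    ["Bacteria", "Firmicutes", "Clostridia"].
    """
    if not lineages:
        return None
    lca = None
    for ranks in zip(*lineages):             # walk down rank by rank
        if all(name == ranks[0] for name in ranks):
            lca = ranks[0]
        else:
            break                            # lineages diverge here
    return lca


lineages = [
    ["Bacteria", "Firmicutes", "Clostridia", "Clostridiales"],
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"],
]
print(lowest_common_ancestor(lineages))
print(equate_il("AIKLI"))
```

A peptide shared by the two lineages above would be reported at the phylum level rather than being assigned to either class.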

Results of the study

(1) Relative abundance and distribution of semi-tryptic peptides

Figure 1 shows the NRASP of the 447 fecal macro-proteomes from the CD (n=204), UC (n=123) and control (n=120) groups for the 20 major bacterial taxa, 35 major biological processes and 32 enzyme subclasses identified in at least 75% of the samples. The median NRASP of the phyla Firmicutes and Bacteroidetes, the classes Bacteroidia and Clostridia, the orders Bacteroidales and Clostridiales, the family Bacteroidaceae and the genus Bacteroides was around 1, indicating that the relative abundances of the corresponding semi-tryptic and fully tryptic peptides were comparable (fig. 1A). However, the median NRASP increased to about 1.25 in the families Lachnospiraceae and Ruminococcaceae and to 1.5 in the genera Roseburia and Prevotella and the species Faecalibacterium prausnitzii and Prevotella copri, whereas it decreased to about 0.5 in the phylum Actinobacteria and the order Bifidobacteriales. These data indicate that different intestinal bacteria undergo different degrees of proteolysis.

The median NRASP of most biological processes also fluctuates around 1 (fig. 1B). However, the NRASP of isoleucine biosynthesis, valine biosynthesis, bacterial flagellum-dependent cell motility, protein transport, carboxylic acid metabolism, fucose metabolism and glucose metabolism increases to 1.75-2, and that of fatty acid metabolism and L-threonine catabolism increases further to 2.5. The NRASP of polysaccharide catabolism, carbohydrate transport and transmembrane transport decreases to about 0.75, and that of metabolism decreases further to 0.3.

At the enzyme level, NRASP is highest (median > 3) for 3-hydroxybutyryl-CoA dehydrogenase involved in butyrate metabolism, followed by 3-hydroxybutyryl-CoA dehydrogenase involved in fatty acid β-oxidation, glycine C-acetyltransferase involved in L-threonine degradation, phosphoenolpyruvate carboxykinase (ATP) involved in gluconeogenesis, ketol-acid reductoisomerase involved in branched-chain amino acid (BCAA) biosynthesis, and superoxide dismutase involved in the oxidative stress response (NRASP median 2-3, fig. 1C).

Claims

1. A method for determining the degree of proteolysis of a microorganism in the intestinal tract, comprising the steps of:

S1, acquiring macro proteome data of a sample or macro proteome data published in a public database;

S2, performing a first search by using a large macro protein database and PEAKS DB software to obtain the proteins identified by at least one peptide;

S3, searching the omics data against the protein sequences obtained in S2 by using PEAKS DB software, MaxQuant software and pFind software, and retaining the peptides identified by all three of PEAKS DB, MaxQuant and pFind;

S4, distinguishing the semi-tryptic peptides from the fully tryptic peptides among the peptides obtained in S3;

S5, determining the degree of proteolysis by using the normalized relative abundance of the semi-tryptic peptides, wherein the normalized relative abundance of the semi-tryptic peptides is obtained by normalizing the relative abundance of the semi-tryptic peptides to the relative abundance of the fully tryptic peptides;

In S4, the identification principle of the semi-tryptic peptides is as follows: the identified peptide is a semi-tryptic N-terminal peptide if its preceding amino acid is not R or K and it is not a protein N-terminal peptide, and the identified peptide is a semi-tryptic C-terminal peptide if its last amino acid is not R or K and it is not a protein C-terminal peptide; in-source fragments are distinguished from proteolytically derived semi-tryptic peptides according to elution time.

2. The method for determining the degree of proteolysis of a gut microorganism according to claim 1, wherein the parameters of the PEAKS DB search are: the mass tolerance of the parent ion is 10 ppm and the mass tolerance of the fragment ion is 0.02 Da; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 3, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavages is 3; the false discovery rate is set to 1%.

3. The method for determining the degree of proteolysis of a gut microorganism according to claim 1, wherein the parameters of the MaxQuant search are: the mass tolerance of the first search is 20 ppm and the mass tolerance of the main search is 4.5 ppm; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavages is 2; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 5, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the false discovery rate is set to 1%, and peptides with a posterior error probability of less than 5% are retained for subsequent analysis.

4. The method for determining the degree of proteolysis of a gut microorganism according to claim 1, wherein the parameters of the pFind search are: the mass tolerance of the parent ion is 10 ppm and the mass tolerance of the fragment ion is 20 ppm; the search mode is open search; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavages is 3; the FDR is set to 1%.

5. A method of determining the degree of proteolysis of an intestinal microorganism according to any of claims 1 to 4 wherein the method is used to capture characteristic information of proteolysis of an intestinal microorganism.

6. The method of determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4, wherein the method is used to study gut microorganism and host interactions.

7. A method of determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4 for the study of diseases associated with bacterial proteases.

8. The method for determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4, wherein the disease comprises a bacterial infection, inflammatory bowel disease.