CN112786105A

CN112786105A - Macroproteome mining method and application thereof in obtaining intestinal microbial proteolysis characteristics

Info

Publication number: CN112786105A
Application number: CN202011415023.0A
Authority: CN
Inventors: 严志祥; 单鸿; 贺飞翔; 张婷; 薛可文
Original assignee: Southern Marine Science and Engineering Guangdong Laboratory Zhuhai; Fifth Affiliated Hospital of Sun Yat Sen University
Current assignee: Southern Marine Science and Engineering Guangdong Laboratory Zhuhai; Fifth Affiliated Hospital of Sun Yat Sen University
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2021-05-11
Anticipated expiration: 2040-12-07
Also published as: CN112786105B

Abstract

The invention belongs to the technical field of biology and discloses a metaproteome (macroproteome) data mining method centered on semi-tryptic (hemitrypsin) peptides. The method combines a two-step search, de novo sequencing, open search and matching of results across several search engines; these strategies reduce false positive identifications caused by database incompleteness and post-translational modifications. When the method is used to analyze the Escherichia coli proteome, 93.4% of the peptides identified from a very large metaproteome database are consistent with those identified from a conventional Escherichia coli reference database.

Description

Macroproteome mining method and application thereof in obtaining intestinal microbial proteolysis characteristics

Technical Field

The invention relates to the technical field of bioinformatics analysis, in particular to a metaproteome mining method and its application in obtaining the proteolysis characteristics of intestinal microorganisms.

Background

Gut microbes live in a dynamic environment and face proteotoxic and metabolic stresses from drugs, diet, microbial competition, and the endogenous chemical composition of the host. Bacteria have evolved different regulatory strategies to adapt to changing environments, including changes in gene expression, cell differentiation and motility, and proteolysis plays a crucial role in these strategies. Proteolytic regulation is an important process affecting all organisms: bacteria use energy-dependent proteases to degrade misfolded proteins or to activate regulatory proteins, which lets them react rapidly to the dynamic intestinal environment. The functions that microorganisms regulate by proteolysis are extensive and include stress response, cell growth and division, biofilm formation, and protein secretion.

Inflammatory bowel disease (IBD) is a chronic inflammatory disease that is affected by both genetic and environmental factors and primarily includes Crohn's disease (CD) and ulcerative colitis (UC). IBD has been reported to be associated with intestinal microbial dysregulation. Most studies of the IBD intestinal microbiome rely on metagenomics and 16S rRNA gene sequencing. However, metatranscriptomics or metaproteomics is needed to pinpoint functional and metabolic activity by directly measuring RNA and protein, respectively. Furthermore, important modes of regulation at the protein level, such as proteolytic regulation, cannot be observed through RNA studies but can be studied using metaproteomics.

However, changes in the proteolytic characteristics of intestinal microorganisms in a complex disease state such as IBD have not been studied, and a method capable of capturing these characteristics in a complex disease state is therefore needed.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the above problems in the prior art. The first object of the invention is to provide a metaproteome mining method centered on semi-tryptic (hemitrypsin) peptides, together with a method for comparing the degree of proteolysis.

A second object of the invention is to provide the use of the above method for obtaining proteolytic characteristics of intestinal microorganisms.

The purpose of the invention is realized by the following technical scheme:

a method of determining the degree of proteolysis, comprising the steps of:

S1, obtaining (meta)proteome data of a sample, or using (meta)proteome data published in a public database;

S2, performing a first search using a large metaproteome database and PEAKS DB software to obtain the proteins from which at least one peptide is identified;

S3, performing database search identification on the omics data against the protein sequences obtained in S2 using PEAKS DB, MaxQuant and pFind software, and retaining the peptides identified simultaneously by PEAKS DB, MaxQuant and pFind;

S4, distinguishing semi-tryptic peptides (hemitrypsin polypeptides) from fully tryptic peptides (complete trypsin polypeptides) among the peptides obtained in S3;

and S5, determining the degree of proteolysis from the normalized relative abundance of the semi-tryptic peptides, which is obtained by normalizing the relative abundance of the semi-tryptic peptides to the relative abundance of the fully tryptic peptides.

Preferably, in S4, semi-tryptic peptides are identified as follows: a peptide whose first amino acid in the identified sequence is not R or K is a semi-tryptic N-terminal peptide (peptides at the N-terminus of the protein are excluded); a peptide whose last amino acid in the identified sequence is not R or K is a semi-tryptic C-terminal peptide (peptides at the C-terminus of the protein are excluded).

During proteomic sample preparation, proteins are hydrolyzed by trypsin, so the first amino acid of a resulting peptide should be K or R and the last amino acid should also be K or R. If semi-tryptic peptides are detected in the data, proteases other than trypsin were involved in hydrolyzing the protein, which is why the first or last amino acid of the peptide is not K or R. Semi-tryptic peptides can therefore be used as a sign that the protein was hydrolyzed by other proteases in the organism, and fully tryptic peptides as a sign that it was not. However, the degree of proteolysis cannot be studied from semi-tryptic peptides alone, because a change in their abundance may be due solely to a change in the total amount of the corresponding protein (increased or decreased synthesis) while the degree of proteolysis is unchanged. The relative abundance of semi-tryptic peptides is therefore normalized to that of fully tryptic peptides when comparing the degree of proteolysis between samples, which eliminates the effect of variation in total protein.
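
As an illustration of the rule above, the following Python sketch classifies a peptide from its position in the parent protein. It is not part of the described software pipeline; the function name, the coordinate convention and the labels are assumptions made for this example. The sketch also assumes that the "first amino acid" of the rule is read in the flanking-residue sense, that is, the residue immediately preceding the peptide in the protein must be K or R for a tryptic N-terminus.

```python
TRYPTIC_RESIDUES = {"K", "R"}


def classify_peptide(protein: str, start: int, end: int) -> str:
    """Classify the peptide protein[start:end] by its cleavage ends.

    Hypothetical helper: coordinates are 0-based and end-exclusive.
    """
    if not (0 <= start < end <= len(protein)):
        raise ValueError("peptide coordinates fall outside the protein")

    # The N-terminal end is tryptic if the peptide starts the protein
    # or if the residue just before it is K or R.
    n_tryptic = start == 0 or protein[start - 1] in TRYPTIC_RESIDUES
    # The C-terminal end is tryptic if the peptide ends the protein
    # or if its own last residue is K or R.
    c_tryptic = end == len(protein) or protein[end - 1] in TRYPTIC_RESIDUES

    if n_tryptic and c_tryptic:
        return "fully_tryptic"
    if c_tryptic:
        return "semi_tryptic_N"  # non-tryptic N-terminus
    if n_tryptic:
        return "semi_tryptic_C"  # non-tryptic C-terminus
    return "non_tryptic"


if __name__ == "__main__":
    protein = "MKAAGGRCCDDKEEF"
    for start, end in [(2, 7), (3, 7), (2, 6), (12, 15)]:
        print(protein[start:end], classify_peptide(protein, start, end))
```

Peptides at the protein termini are treated as tryptic at that end, which mirrors the exclusion of protein N-terminal and C-terminal peptides in the rule.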

Preferably, the parameters for the PEAKS DB search are: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; at most 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavage sites; and a false discovery rate (FDR) of 1%.
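
To make the two kinds of mass tolerance concrete, the following Python sketch converts between a relative tolerance in ppm and an absolute tolerance in Da. The function names and the example masses are assumptions chosen for illustration; the search engines apply their own internal logic.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_ppm(observed_mass: float, theoretical_mass: float,
               tolerance_ppm: float = 10.0) -> bool:
    """True if the observed mass lies within the ppm tolerance."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


def ppm_to_da(mass: float, tolerance_ppm: float) -> float:
    """Absolute width (Da) of a ppm tolerance at a given mass."""
    return mass * tolerance_ppm / 1e6


if __name__ == "__main__":
    # Width of a 10 ppm window at a hypothetical precursor mass of 2000 Da.
    print(ppm_to_da(2000.0, 10.0))
    print(within_ppm(1000.005, 1000.0))  # 5 ppm off
    print(within_ppm(1000.02, 1000.0))   # 20 ppm off
```

A ppm tolerance scales with mass, whereas a tolerance given in Da, such as the 0.02 Da fragment tolerance, is the same at every mass.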

Preferably, the parameters for the MaxQuant search are: a first search mass tolerance of 20 ppm and a main search mass tolerance of 4.5 ppm; trypsin as the enzyme, with semi-specific digestion and at most 2 missed cleavage sites; carbamidomethylation of cysteine set as a fixed modification; at most 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; and a false discovery rate (FDR) of 1%, with peptides having a posterior error probability (PEP) below 5% retained for subsequent analysis.

Preferably, the parameters for the pFind search are: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 20 ppm; open search as the search mode; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavage sites; and an FDR of 1%.

The invention also provides the application of the method.

In particular, the above method is used to capture the proteolytic characteristics of intestinal microorganisms, which provide a level of information beyond flora structure and protein abundance. The analysis is based on the assumption that similar degrees of proteolysis should result in similar relative abundances of semi-tryptic peptides. The present study found that microbial semi-tryptic peptides in the 447 fecal metaproteomes were enriched in several biological processes, including the metabolic processes of fatty acids, carboxylic acids, glucose and fucose, the biosynthetic processes of branched-chain amino acids, protein transport, and bacterial flagellum-mediated cell motility, indicating that these processes undergo more extensive proteolytic regulation.

Alternatively, the above methods are used to study gut microflora and host-microorganism interactions.

The method of the present invention for mining the proteome is also suitable for capturing the proteolytic characteristics of plants and environmental microorganisms, and therefore, the method can be used for exploring the proteolytic laws of plants and environmental microorganisms.

The method can also be used for researching diseases (such as bacterial infection and inflammatory bowel disease) related to bacterial protease, and the change of the bacterial proteolysis degree can be researched, so that the corresponding bacterial protease is taken as a target, and the corresponding medicine is developed in a targeted manner for regulation.

Compared with the prior art, the invention has the following beneficial effects:

The invention provides a metaproteome mining method centered on semi-tryptic peptides, which combines a two-step search, de novo sequencing, open search and matching of results across several software tools to carry out large-scale metaproteome mining. These strategies reduce false positive identifications caused by database incompleteness and peptide modifications. In the past, semi-tryptic peptide searches were carried out on metaproteomics data sets generated by low-resolution MS/MS, where the inevitably enlarged search space lowered the confidence of the identification results. In that earlier work, only 80.2% of the identified peptides were annotated as P. furiosus sequences when the Pyrococcus furiosus proteome was searched against a large metaproteome database containing 6,162,582 sequences. In contrast, the present invention applies multi-engine searching to high-resolution MS/MS data. When the method of the invention was used to analyze the E. coli proteome, 93.4% of the peptides identified from a significantly larger metaproteome database (130,975,891 sequences) were identical to those identified from the conventional E. coli reference database, indicating better accuracy.

Drawings

Figure 1 shows the normalized relative abundance of semi-tryptic peptides (NRASP, semi-tryptic peptide abundance/fully tryptic peptide abundance) for the main bacterial taxa (A), biological processes (B) and enzymes (C) in 447 fecal metaproteome samples, arranged in ascending order; each box plot shows the median (line in the middle of the box), the 25th percentile and the 75th percentile; the dashed lines extend to 1.5 times the interquartile range (IQR), and outliers are shown as dots;

FIG. 2 shows the change in proteolytic characteristics of E.coli proteome in different biological processes induced by heat stress (p < 0.05).
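
The box-plot conventions named in the caption of Figure 1 can be sketched in Python as follows. The quantile method (the "inclusive" method of the standard library) and the function name are assumptions; the source does not state how the percentiles were computed.

```python
from statistics import median, quantiles


def box_stats(values):
    """Median, quartiles, 1.5 x IQR fences and outliers for one box."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed range
    # (assumed method; other conventions give slightly different values).
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


if __name__ == "__main__":
    # Made-up NRASP values, for illustration only.
    print(box_stats([1, 2, 3, 4, 5, 20]))
```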

Detailed Description

The following further describes the embodiments of the present invention. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The test methods used in the following experimental examples are all conventional methods unless otherwise specified; the materials, reagents and the like used are, unless otherwise specified, commercially available reagents and materials.

Data sets: two publicly available data sets were analyzed. Data set 1 (PXD008675) is a healthy and IBD intestinal metaproteome data set consisting of 447 fecal metaproteomes from 89 subjects aged 6-58 years (median 22.8 years), including 24 non-IBD controls, 39 CD patients and 26 UC patients; of these samples, 272 samples had a matching metagenome and 184 samples had a matching metaproteome, respectively. We also analyzed a proteome data set (PXS000498) to investigate the effect of heat stress on the proteolytic regulation of E. coli K-12.

Metaproteome database: a comprehensive human intestinal microbial protein database was assembled from the following components: (1) the Integrated Gene Catalog (IGC) database, based on 1267 intestinal metagenomes from 1070 individuals (760 European, 368 Chinese and 139 American samples); (2) sequence data for 215 strains cultured from healthy adult feces; (3) the Culturable Genome Reference (CGR) database, containing 1520 non-redundant, high-quality genomes from 6000 intestinal bacteria isolated from healthy human feces; (4) all archaeal, bacterial and fungal sequences in UniProtKB (version 2017_06) and NCBI RefSeq (version 90). This microbial sequence database was supplemented with the UniProt human reference proteome, a food database composed of dietary organisms such as Triticum aestivum, Oryza sativa subsp., Glycine max, Zea mays, Arachis hypogaea, Solanum tuberosum, Lycopersicon esculentum, Sus scrofa, Bos taurus, chicken (Gallus gallus), sheep (Ovis aries), fish (Salmo salar and Oncorhynchus mykiss) and shrimp (Artemia sp. and Litopenaeus vannamei), and a common contaminant database (http://maxquant.org/contaminants.zip). Duplicate protein sequences were removed using USEARCH v11.0.667 (-fastx_uniques), yielding 130,975,891 non-redundant sequences.
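
The idea behind the de-duplication step can be sketched in plain Python: identical sequences are collapsed and the first header seen is kept. This is only a conceptual illustration and does not reproduce USEARCH, its options or its output format; the function names are assumptions, and an in-memory dictionary like this would not scale to a database of the size described.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records):
    """Keep one record per distinct sequence (first header wins)."""
    first_header = {}
    for header, sequence in records:
        # Compare sequences case-insensitively (assumed convention).
        first_header.setdefault(sequence.upper(), header)
    return [(header, seq) for seq, header in first_header.items()]


if __name__ == "__main__":
    fasta = ">a\nMKT\nAA\n>b\nmktaa\n>c\nMKV\n"
    for header, seq in unique_sequences(parse_fasta(fasta)):
        print(f">{header}\n{seq}")
```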

Statistical analysis: multivariate analysis of the amino acid frequencies near the cleavage site was performed using principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), and missing values were imputed using Bayesian PCA (BPCA). Variables (present in at least 75% of samples) that differed significantly between groups were detected in R (version 3.5.3) and RStudio (version 1.1.383) using the Kruskal-Wallis test and the Dunn-Bonferroni test, with P values less than 0.05 considered significant. The beta diversity of the multi-omics data was determined using principal coordinate analysis (PCoA) of the Bray-Curtis distance.
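
Two of the building blocks named above, the 75% presence filter and the Bray-Curtis distance, are simple enough to sketch in Python. The original analysis was carried out in R; the function names, the use of None for a missing value and the example numbers below are assumptions for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def present_in_enough_samples(values, min_fraction=0.75):
    """True if a variable was observed in at least min_fraction of samples.

    A missing observation is represented by None (assumed convention).
    """
    observed = sum(1 for v in values if v is not None)
    return observed >= min_fraction * len(values)


if __name__ == "__main__":
    print(bray_curtis([6, 7, 4], [10, 0, 6]))
    print(present_in_enough_samples([1.2, 0.8, None, 1.1]))
```

The pairwise distances computed this way form the matrix on which a PCoA would then be run.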

Example 1 Performance of different software in performing the search

Using MLI data sets and the large metaproteome database, we compared the performance of commercial software (Proteome Discoverer, PEAKS, ProteinPilot and Byonic) and open-source software (MaxQuant, MSFragger and pFind) in searching for semi-tryptic peptides on several 36-core servers (with 192 GB of memory installed). Proteome Discoverer, Byonic, MaxQuant, pFind and ProteinPilot did not complete the search within one month, and MSFragger crashed with an out-of-memory error. Only PEAKS completed the analysis within one month, so further high-throughput analysis was performed on a 156-core high-performance computing cluster, which completed the database search within 2 weeks.

Example 2 database search

The database search process includes two main steps: (1) de novo sequencing and a first search using the large metaproteome database (large database) and PEAKS software, to obtain the proteins from which at least one peptide is identified and to generate a corresponding small protein database (reduced database); (2) a second search using the reduced database and several software tools, to improve the accuracy of semi-tryptic peptide identification.
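
The construction of the reduced database in step (1) amounts to keeping every protein that is supported by at least one identified peptide. A minimal Python sketch, assuming the first-search results are available as (peptide, protein accession) pairs; the data structures and names are hypothetical and do not correspond to any export format of the search software:

```python
def reduce_database(database, peptide_hits):
    """Keep only proteins supported by at least one identified peptide.

    database:     dict mapping protein accession -> sequence
    peptide_hits: iterable of (peptide, accession) pairs from the
                  first search
    """
    supported = {accession for _peptide, accession in peptide_hits}
    return {acc: seq for acc, seq in database.items() if acc in supported}


if __name__ == "__main__":
    large_db = {"P1": "MKAAR", "P2": "MGGK", "P3": "MCCR"}
    hits = [("AAR", "P1"), ("MCCR", "P3"), ("MK", "P1")]
    print(reduce_database(large_db, hits))
```

The second, multi-engine search then runs against this much smaller set of sequences.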

To cope with the increased search space and search time in identifying semi-tryptic peptides from metaproteomes, the first search was performed with PEAKS DB on a cluster configured with Intel(R) Xeon(R) processors (156 cores) and 1.5 TB of 2666 MHz memory. The software first performed de novo sequencing and then a database search with the following parameters: a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; at most 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavage sites; and a false discovery rate of 1%.

A two-step search strategy is used here to increase the sensitivity of the database search: proteins identified by at least one peptide in the first search are retained for the second, multi-engine search, which uses PEAKS DB, MaxQuant (version 1.6.2) and pFind (version 3.1.5).

The MaxQuant (version 1.6.2.10) search was performed using the Andromeda engine with the following parameters: a first search mass tolerance of 20 ppm and a main search mass tolerance of 4.5 ppm; trypsin as the enzyme, with semi-specific digestion and at most 2 missed cleavage sites; carbamidomethylation of cysteine set as a fixed modification; at most 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; and a false discovery rate of 1%, with peptides having a posterior error probability below 5% retained for subsequent analysis. The "Second peptides" option was enabled to search for co-fragmented peptides in the MS/MS spectra. The "Match between runs" option was enabled, with a match time window of 0.7 minutes and an alignment time window of 20 minutes. Proteins and peptides were quantified using the label-free quantification (LFQ) algorithm, with a minimum ratio count of 1 and minimum and average numbers of neighbors of 3 and 6, respectively.

The pFind database search used a precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 20 ppm; the search mode was open search, the enzyme was trypsin, digestion was semi-specific, and at most 3 missed cleavage sites were allowed.

Only peptides identified by all three search engines (PEAKS DB, MaxQuant and pFind) were retained for further analysis.
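
The consensus step is a set intersection over the peptide lists reported by the engines. A minimal Python sketch, assuming each engine's result has already been reduced to plain peptide sequence strings; harmonizing modification notation or I/L ambiguity between engines is outside this example, and the function name is hypothetical:

```python
def consensus_peptides(*engine_results):
    """Return the peptides reported by every search engine."""
    if not engine_results:
        return set()
    peptide_sets = [set(result) for result in engine_results]
    return set.intersection(*peptide_sets)


if __name__ == "__main__":
    # Made-up peptide lists, for illustration only.
    peaks = ["AAGGR", "AGGR", "EEFK"]
    maxquant = ["AAGGR", "AGGR", "LLDK"]
    pfind = ["AGGR", "AAGGR"]
    print(sorted(consensus_peptides(peaks, maxquant, pfind)))
```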

Example 3 Identification, classification and functional analysis of semi-tryptic peptides

1. Identification principle of semi-tryptic peptides

A peptide whose first amino acid in the identified sequence is not R or K is a semi-tryptic N-terminal peptide (peptides at the N-terminus of the protein are excluded). A peptide whose last amino acid in the identified sequence is not R or K is a semi-tryptic C-terminal peptide (peptides at the C-terminus of the protein are excluded). In-source fragments (in-source CID fragments) are distinguished from proteolytically derived semi-tryptic peptides by elution time: most in-source fragments show retention times that differ from their theoretical retention times (predicted using SSRCalc). Microbial semi-tryptic peptides are distinguished from human-derived and food-derived peptides by the corresponding accession numbers in the FASTA sequence entries.
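
The retention-time check for in-source fragments can be sketched in Python as a comparison of observed and predicted retention times. The tolerance value below is purely hypothetical, since the source gives no threshold, and the predicted times are assumed to have been computed beforehand by an external predictor and expressed on the same time scale as the observed ones.

```python
def flag_in_source_fragments(peptides, tolerance_min=2.0):
    """Return sequences whose observed and predicted RT disagree.

    peptides: iterable of (sequence, observed_rt, predicted_rt) in minutes
    tolerance_min: hypothetical cut-off, not taken from the source
    """
    flagged = []
    for sequence, observed_rt, predicted_rt in peptides:
        if abs(observed_rt - predicted_rt) > tolerance_min:
            flagged.append(sequence)
    return flagged


if __name__ == "__main__":
    # Made-up retention times, for illustration only.
    candidates = [
        ("AGGR", 35.1, 34.6),   # elutes where predicted
        ("GGR", 35.0, 12.3),    # co-elutes with its parent peptide instead
    ]
    print(flag_in_source_fragments(candidates))
```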

2. Combining semi-tryptic and fully tryptic peptide data to quantify the degree of proteolysis

We determined the change in the degree of proteolysis from the normalized relative abundance of semi-tryptic peptides (NRASP), obtained by normalizing the relative abundance of semi-tryptic peptides to the relative abundance of fully tryptic peptides. This normalization step is important: if the abundances of the semi-tryptic and fully tryptic peptides change in proportion, the degree of proteolysis has generally not changed, yet a comparison of semi-tryptic peptides alone would still show a difference between groups.
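
The effect of the normalization can be illustrated with a short Python sketch that computes NRASP per group as the ratio of semi-tryptic to fully tryptic abundance. Aggregating peptides by summing their abundances within a group (for example a taxon or a biological process) is an assumption of this example, as are the function name and the input layout.

```python
from collections import defaultdict


def nrasp(peptides):
    """Semi-tryptic / fully tryptic abundance ratio per group.

    peptides: iterable of (group, peptide_class, abundance), where
    peptide_class is "semi" or "full". Groups without any fully tryptic
    abundance are omitted because the ratio is undefined.
    """
    semi = defaultdict(float)
    full = defaultdict(float)
    for group, peptide_class, abundance in peptides:
        if peptide_class == "semi":
            semi[group] += abundance
        elif peptide_class == "full":
            full[group] += abundance
        else:
            raise ValueError(f"unknown peptide class: {peptide_class!r}")
    return {g: semi[g] / total for g, total in full.items() if total > 0}


if __name__ == "__main__":
    # Made-up abundances: in "stress" both classes double.
    data = [
        ("control", "semi", 10.0), ("control", "full", 20.0),
        ("stress", "semi", 20.0), ("stress", "full", 40.0),
    ]
    print(nrasp(data))
```

In this made-up example the semi-tryptic abundance doubles between the two groups, but the ratio is the same in both, which is the situation the normalization is meant to detect.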

3. Results

To improve the sensitivity of metaproteome analysis over a large sequence space, we adopted a two-step database search strategy. This effectively reduces the size of the metaproteome database to that used in conventional proteomic analysis, which makes semi-tryptic metaproteomic searches feasible. Furthermore, the confidence of peptide identification was improved by combining three commonly used software tools. These tools use different algorithms for peak matching, identification of co-eluting peptides and FDR calculation (MaxQuant and pFind use the target-decoy strategy, while PEAKS DB uses the decoy-fusion method), which significantly increases the confidence of peptide identification. Only the peptides identified by all three tools were retained for further analysis.

A total of 12,828,005 MS/MS spectra were searched and 3,804,903 (29.66%) peptide-spectrum matches (PSMs) were obtained. From the fecal metaproteomes, 125,494 peptides were identified, of which 108,784 (86.68%) are microbe-specific peptides (not shared with human or food sequences). 7,969 (6.35%) human-specific peptides were identified in the fecal metaproteomes, of which 5,104 (64.05%) are semi-tryptic. Gene Ontology (GO) analysis showed that 84.13% of the human semi-tryptic peptides are derived from potential extracellular proteins, whereas only 1.16% of the microbial semi-tryptic peptides are.

Example 4 the above procedure was verified by analyzing the proteolytic characteristics in the E.coli heat shock reaction

We validated our method by analyzing heat shock-induced proteolytic features in the published proteome data set of E. coli K12. 9937 peptides were identified using the large metaproteome database described above together with the three search engines, while 14111 peptides were identified using the UniProt E. coli K12 reference database. The number of identified peptides was thus 29.6% lower with the large database, reflecting the normal loss of sensitivity, since the large database contains 10,000 times more sequences than the conventional reference database.

Of all 14111 peptides identified with the UniProt E. coli K12 reference database, 83.7% had PEP values below 0.01 and 61.6% had PEP values below 0.001. Of the 4783 peptides identified only with the UniProt E. coli K12 reference database (not with the metaproteome database), 60.3% had PEP values below 0.01 and 39.5% below 0.001. The peptides identified only with the reference database thus had higher PEP values, indicating that low-quality peptide-spectrum matches (PSMs) are more susceptible to the loss of sensitivity in large database searches. It is also noteworthy that the proteome of a single microorganism differs significantly from the intestinal metaproteome. Recent studies have shown that metaproteome databases assembled from large public databases and sample-matched reference databases produce comparable results in intestinal metaproteomics studies. Therefore, our method does not show a significant loss of sensitivity in intestinal metaproteome analysis. Of the peptides identified with the very large metaproteome database, 93.4% are consistent with those identified with the E. coli reference database, indicating that the method identifies peptides with high accuracy.

To validate our approach, we compared the NRASP of 185 biological processes found in all samples (as a regulatory indicator of proteolysis), and found that the NRASP of 20 (about 10.8%) biological processes was significantly different between the control and heat-stressed groups (P-value <0.05, fig. 2).

Heat stress disturbs the folding of proteins, leading to the accumulation of misfolded proteins that need to be refolded into the correct conformation. Accordingly, using our method, we found that the NRASP associated with protein folding decreased under heat stress, while the NRASP associated with protein refolding increased. We also observed an increase in methylation-associated NRASP under heat stress, which is consistent with recent findings. In conclusion, the biological findings on proteolytic regulation obtained with our method are highly reliable.

Example 5 Classification and functional analysis of peptides

The analysis was performed using Unipept (version 4.3.5) with UniProt 2020.01, based on the lowest common ancestor (LCA) algorithm. All peptides were analyzed with the following options: equate I and L, filter duplicate peptides, and advanced missed cleavage handling. The taxonomic information was visualized using the Sunburst view provided by Unipept.

Results of the study

(1) Relative abundance and distribution of semi-tryptic peptides

Figure 1 shows the NRASP of 20 major bacterial taxa, 35 major biological processes and 32 enzyme subclasses identified in at least 75% of the 447 fecal metaproteome samples from the CD (n = 204), UC (n = 123) and control (n = 120) groups. The median NRASP of the phyla Firmicutes and Bacteroidetes, the classes Bacteroidia and Clostridia, the orders Bacteroidales and Clostridiales, and the genus Bacteroides was around 1, indicating that the relative abundance of the corresponding semi-tryptic peptides was comparable to that of the fully tryptic peptides (FIG. 1A). However, the median NRASP increased to about 1.25 in the families Lachnospiraceae and Ruminococcaceae and to 1.5 in the genera Roseburia and Prevotella and the species Faecalibacterium prausnitzii and Prevotella copri, while it decreased to about 0.5 in the phylum Actinobacteria and the order Bifidobacteriales. These data indicate that different intestinal bacteria undergo different degrees of proteolysis.

The median of NRASP for most biological processes also fluctuates around 1 (fig. 1B). While NRASP values of isoleucine biosynthetic process, valine biosynthetic process, bacterial flagellum-dependent cell movement, protein transport, carboxylic acid metabolic process, fucose metabolic process and glucose metabolic process are all increased to 1.75-2, NRASP of fatty acid metabolic process and L-threonine catabolic process is further increased to 2.5, NRASP of polysaccharide catabolic process, carbohydrate transport and transmembrane transport is reduced to about 0.75, and NRASP of metabolic process is further reduced to 0.3.

At the enzyme level, NRASP is highest for 3-hydroxybutyryl-CoA dehydrogenase involved in butyrate metabolism (median > 3), followed by 3-hydroxybutyryl-CoA dehydrogenase involved in fatty acid beta-oxidation, glycine C-acetyltransferase involved in L-threonine degradation, phosphoenolpyruvate carboxykinase (ATP) involved in gluconeogenesis, ketol-acid reductoisomerase involved in branched-chain amino acid (BCAA) biosynthesis, and superoxide dismutase involved in the antioxidant stress response (NRASP median 2-3, FIG. 1C).

Claims

1. A method of determining the degree of proteolysis, comprising the steps of:

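S1, obtaining (meta)proteome data of a sample, or using (meta)proteome data published in a public database;

S2, performing a first search using a large metaproteome database and PEAKS DB software to obtain the proteins from which at least one peptide is identified;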

S3, performing database search identification on the omics data against the protein sequences obtained in S2 using PEAKS DB, MaxQuant and pFind software, and retaining the peptides identified simultaneously by PEAKS DB, MaxQuant and pFind;

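S4, distinguishing semi-tryptic peptides (hemitrypsin polypeptides) from fully tryptic peptides (complete trypsin polypeptides) among the peptides obtained in S3;

and S5, determining the degree of proteolysis from the normalized relative abundance of the semi-tryptic peptides, which is obtained by normalizing the relative abundance of the semi-tryptic peptides to the relative abundance of the fully tryptic peptides.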

2. The method of determining the degree of proteolysis of claim 1, wherein the semi-tryptic peptides in S4 are identified as follows: an identified peptide is a semi-tryptic N-terminal peptide if its first amino acid is not R or K (excluding peptides at the protein N-terminus), and a semi-tryptic C-terminal peptide if its last amino acid is not R or K (excluding peptides at the protein C-terminus).

3. The method of claim 1, wherein PEAKS DB performs the search using the following parameters: the mass deviation of the parent ion is 10 ppm, and the mass deviation of the fragment ion is 0.02 Da; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 3, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavage sites is 3; the false discovery rate is set to 1%.

4. The method of claim 1, wherein MaxQuant performs the search with the following parameters: the first search mass deviation is 20 ppm, and the main search mass deviation is 4.5 ppm; the enzyme is trypsin, the digestion mode is semi-specific, and the maximum number of missed cleavage sites is 2; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 5, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the false discovery rate is set to 1%, and peptides with a posterior error probability of less than 5% are retained for subsequent analysis.

5. Method for determining the degree of proteolysis according to claim 1, characterized in that the pFind performs the search with the parameters: the mass deviation of the parent ions is 10ppm, the mass deviation of the fragment ions is 20ppm, the library searching mode is open library searching, the enzyme is trypsin, the enzyme cutting mode is semi-specificity, and the number of sites which are not cut by the enzyme is at most 3; FDR was set to 1%.

6. Use of the method of any one of claims 1 to 5.

7. Use according to claim 6, wherein the method is used to capture characteristic information of intestinal microbial proteolysis.

8. The use according to claim 6, wherein the method is used for studying gut microbial and host interaction.

9. Use according to claim 6, wherein the method is used for studying diseases associated with bacterial proteases.

10. The use according to claim 9, wherein said diseases include, but are not limited to, bacterial infections, inflammatory bowel disease.