CN112786105B - Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms - Google Patents
Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms
- Publication number
- CN112786105B (application CN202011415023.0A)
- Authority
- CN
- China
- Prior art keywords
- trypsin
- proteolysis
- peptide
- protein
- enzyme
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Abstract
The invention belongs to the field of biotechnology and discloses a tryptic-peptide-centric macroproteome (metaproteome) data mining method. The method combines a two-step database search, de novo sequencing, open search and matching of results across multiple search engines, and is intended for large-scale, tryptic-peptide-centric mining of macroproteome information from high-resolution mass spectrometry data. These strategies can reduce the false positives caused by database incompleteness and post-translational modifications. When the method of the invention is used to analyze the Escherichia coli proteome, 93.4% of the peptides identified from a very large macroprotein database are consistent with the peptides identified from the conventional E. coli reference database.
Description
Technical Field
The invention relates to the technical field of biological information analysis, in particular to a macro proteome mining method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms.
Background
Intestinal microorganisms live in a dynamic environment and face proteotoxic and metabolic stress from drugs, diet, microbial competition and host endogenous chemicals. Bacteria have evolved different regulatory strategies to adapt to changing environments, including alterations in gene expression, cell differentiation and changes in motility, and proteolysis plays a vital role in these strategies. Proteolytic regulation is an important process in all organisms: bacteria use energy-dependent proteases to degrade misfolded proteins or to activate regulatory proteins, allowing them to respond rapidly to a dynamic intestinal environment. A very broad range of microbial functions is regulated by proteolysis, such as stress responses, cell growth and division, biofilm formation and protein secretion.
Inflammatory bowel disease (IBD) is a chronic inflammatory disease influenced by genetic and environmental factors, and mainly comprises Crohn's disease (CD) and ulcerative colitis (UC). IBD has been reported to be associated with dysregulation of the intestinal microbiota. Metagenomics and 16S rRNA gene sequencing account for the vast majority of IBD gut microbiome studies. However, macro-transcriptomics or macro-proteomics is required to pinpoint functional and metabolic activities by direct measurement of RNA and protein, respectively. In addition, important modes of regulation exist at the protein level, such as proteolytic regulation, which cannot be observed in RNA studies but can be studied using macroproteomics.
However, the characteristic changes in the proteolysis of intestinal microorganisms in complex disease states such as IBD have not yet been studied, so a method capable of capturing the proteolytic characteristics of intestinal microorganisms in complex disease states is needed.
Disclosure of Invention
The invention aims to solve the technical problems in the prior art. Its first object is to provide a macroproteome mining method centered on tryptic peptides, together with a method for comparing the degree of proteolysis.
A second object of the present invention is to provide the use of the above method for obtaining proteolytic characteristics of intestinal microorganisms.
The objects of the invention are achieved by the following technical scheme:
A method of determining the degree of proteolysis comprising the steps of:
S1, acquiring (macro)proteome data of a sample, or (macro)proteome data published in a public database;
S2, performing a first search against a large macroprotein database using PEAKS DB software to obtain the proteins with at least one identified peptide;
S3, searching the omics data against the protein sequences obtained in S2 using PEAKS DB software, MaxQuant software and pFind software, and retaining the peptides identified by all three of PEAKS DB, MaxQuant and pFind;
S4, distinguishing semi-tryptic peptides from fully tryptic peptides among the peptides obtained in S3;
S5, determining the degree of proteolysis from the normalized relative abundance of the semi-tryptic peptides, which is obtained by normalizing the relative abundance of the semi-tryptic peptides to the relative abundance of the fully tryptic peptides.
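Steps S3 and S5 can be illustrated with a minimal Python sketch. It assumes that each search engine's results have already been exported as a set of peptide sequences and that abundances are plain non-negative numbers; the function names `consensus_peptides` and `nrasp` are hypothetical and are not part of PEAKS DB, MaxQuant or pFind.

```python
from typing import Set


def consensus_peptides(peaks: Set[str], maxquant: Set[str], pfind: Set[str]) -> Set[str]:
    """Keep only peptides reported by all three engines (step S3).

    Each argument is assumed to be a set of peptide sequences exported
    from the corresponding tool; the export itself is not shown here.
    """
    return peaks & maxquant & pfind


def nrasp(semi_abundance: float, full_abundance: float) -> float:
    """Normalized relative abundance of semi-tryptic peptides (step S5).

    Computed as semi-tryptic abundance divided by fully tryptic abundance.
    """
    if full_abundance <= 0:
        raise ValueError("fully tryptic abundance must be positive")
    return semi_abundance / full_abundance


# Illustrative values only, not data from the patent.
kept = consensus_peptides({"AAK", "DEFGR", "LMN"}, {"DEFGR", "LMN"}, {"DEFGR", "QQK"})
ratio = nrasp(semi_abundance=2.0, full_abundance=8.0)
```

Using a strict intersection trades sensitivity for confidence: a peptide missed by any one engine is dropped.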
Preferably, in S4, semi-tryptic peptides are identified as follows: a peptide for which the amino acid immediately preceding the identified sequence is not R or K is a semi-tryptic N-terminal peptide (excluding peptides at the N-terminus of the protein); a peptide for which the last amino acid of the identified sequence is not R or K is a semi-tryptic C-terminal peptide (excluding peptides at the C-terminus of the protein).
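The identification rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: peptide positions are given as 0-based `start` (inclusive) and `end` (exclusive) indices into the parent protein sequence, the proline exception sometimes applied to trypsin is ignored, and the function name and return labels are hypothetical.

```python
def classify_peptide(protein: str, start: int, end: int) -> str:
    """Classify a peptide by the K/R rule for trypsin cleavage.

    Returns one of:
      "full"        - both termini are consistent with trypsin cleavage
      "semi_n"      - the residue before the peptide is not K/R
      "semi_c"      - the last residue of the peptide is not K/R
      "nonspecific" - neither terminus is consistent with trypsin
    Protein termini are treated as consistent with trypsin, matching the
    exclusions for the protein N-terminus and C-terminus in the text.
    """
    if not (0 <= start < end <= len(protein)):
        raise ValueError("invalid peptide coordinates")
    # N-terminal side: protein start, or preceding residue is K or R.
    n_ok = start == 0 or protein[start - 1] in "KR"
    # C-terminal side: protein end, or last residue of the peptide is K or R.
    c_ok = end == len(protein) or protein[end - 1] in "KR"
    if n_ok and c_ok:
        return "full"
    if c_ok:
        return "semi_n"
    if n_ok:
        return "semi_c"
    return "nonspecific"


# Toy protein sequence for illustration only.
protein = "MAKDEFGRHIKLMNPQR"
labels = [
    classify_peptide(protein, 3, 8),  # DEFGR: preceded by K, ends in R
    classify_peptide(protein, 4, 8),  # EFGR: preceded by D
    classify_peptide(protein, 3, 7),  # DEFG: ends in G
]
```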
When a proteomic sample is prepared, trypsin hydrolyzes the proteins into peptides, so the amino acid immediately preceding each resulting peptide should be K or R, and the last amino acid of the peptide should also be K or R. If semi-tryptic peptides are detected in the data, proteases other than trypsin must have taken part in hydrolyzing the protein, leaving a residue other than K or R either immediately before the peptide or at its last position. Semi-tryptic peptides can therefore be used as a marker that a protein has been hydrolyzed by other proteases in the organism, and fully tryptic peptides as a marker that it has not. However, the degree of proteolysis cannot be assessed from semi-tryptic peptide abundance alone, because a change in that abundance may simply reflect a change in the total amount of the corresponding protein (increased or decreased synthesis) rather than a change in the degree of proteolysis. The relative abundance of the semi-tryptic peptides is therefore normalized to that of the fully tryptic peptides so that changes in the degree of proteolysis can be compared between samples with the effect of total protein variation removed.
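The normalization argument can be written out as a short worked equation. The symbols are introduced here for illustration and do not appear in the source: $A_{\text{semi}}$ and $A_{\text{full}}$ denote the relative abundances of the semi-tryptic and fully tryptic peptides, and $c$ is a hypothetical factor by which the total amount of the protein changes while its degree of proteolysis stays the same. The sketch assumes both peptide classes scale proportionally with the amount of protein.

```latex
\mathrm{NRASP} = \frac{A_{\text{semi}}}{A_{\text{full}}},
\qquad
A'_{\text{semi}} = c\,A_{\text{semi}},\quad
A'_{\text{full}} = c\,A_{\text{full}}
\;\Longrightarrow\;
\mathrm{NRASP}' = \frac{c\,A_{\text{semi}}}{c\,A_{\text{full}}} = \mathrm{NRASP}.
```

Under that assumption the ratio is unchanged when only the amount of protein changes, whereas the raw semi-tryptic abundance would change by the factor $c$.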
Preferably, the parameters for the PEAKS DB search are: precursor ion mass tolerance of 10 ppm and fragment ion mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; a maximum of 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, semi-specific digestion mode, and at most 3 missed cleavages; the false discovery rate (FDR) set to 1%.
Preferably, the parameters for the MaxQuant search are: mass tolerance of 20 ppm for the first search and 4.5 ppm for the main search; trypsin as the enzyme, semi-specific digestion mode, and at most 2 missed cleavages; carbamidomethylation of cysteine set as a fixed modification; a maximum of 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the false discovery rate (FDR) set to 1%, with peptides having a posterior error probability (PEP) of less than 5% retained for subsequent analysis.
Preferably, the parameters for the pFind search are: precursor ion mass tolerance of 10 ppm and fragment ion mass tolerance of 20 ppm; open search mode; trypsin as the enzyme, semi-specific digestion mode, and at most 3 missed cleavages; FDR set to 1%.
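For comparison, the numeric settings listed for the three searches can be collected in one place, as in the Python sketch below. The dictionary keys are hypothetical labels chosen for readability; they are not the actual configuration keys of PEAKS DB, MaxQuant or pFind. Missed cleavages here means sites that the enzyme did not cut. The helper functions show how a ppm tolerance translates into an absolute mass window.

```python
# Settings transcribed from the text; key names are illustrative only.
SEARCH_PARAMS = {
    "PEAKS DB": {
        "precursor_tol_ppm": 10,
        "fragment_tol": (0.02, "Da"),
        "max_variable_ptms": 3,
        "max_missed_cleavages": 3,
        "fdr": 0.01,
    },
    "MaxQuant": {
        "first_search_tol_ppm": 20,
        "main_search_tol_ppm": 4.5,
        "max_variable_ptms": 5,
        "max_missed_cleavages": 2,
        "fdr": 0.01,
        "max_pep": 0.05,
    },
    "pFind": {
        "precursor_tol_ppm": 10,
        "fragment_tol": (20, "ppm"),
        "open_search": True,
        "max_missed_cleavages": 3,
        "fdr": 0.01,
    },
}


def ppm_to_da(mz: float, ppm: float) -> float:
    """Convert a relative tolerance in ppm to an absolute window at a given m/z."""
    return mz * ppm * 1e-6


def within_tolerance(observed_mz: float, theoretical_mz: float, ppm: float) -> bool:
    """True if the observed m/z lies within +/- ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= ppm_to_da(theoretical_mz, ppm)


# At m/z 1000, a 10 ppm tolerance corresponds to a window of about 0.01.
window = ppm_to_da(1000.0, SEARCH_PARAMS["PEAKS DB"]["precursor_tol_ppm"])
```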
The invention also provides applications of the method.
In particular, the above method is used to capture proteolytic characteristics of intestinal microorganisms. The present inventors have found that microbial tryptic peptides in the 447 fecal macroproteomes are enriched in several biological processes, such as fatty acid, carboxylic acid, glucose and fucose metabolism, branched-chain amino acid biosynthesis, protein transport and bacterial flagellum-mediated cell motility, suggesting that these processes undergo more extensive proteolytic regulation.
Alternatively, the above method is used to study the intestinal microbiota and host-microorganism interactions.
The macroproteome mining method of the present invention is also applicable to capturing proteolytic characteristics of plant and environmental microorganisms, and can therefore be used to explore the patterns of proteolysis in plant and environmental microorganisms.
The method can also be used to study diseases related to bacterial proteases (such as bacterial infection and inflammatory bowel disease). By following changes in the degree of bacterial proteolysis, the corresponding bacterial proteases can be taken as targets and drugs can be developed to regulate them in a targeted manner.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a macroproteome mining method centered on semi-tryptic peptides, which combines a two-step search, de novo sequencing, open search and matching of results from multiple software tools in order to perform large-scale, semi-tryptic-peptide-centric macroproteome mining. These strategies can reduce false positive identifications caused by database incompleteness and peptide modifications. Previous studies performed semi-tryptic peptide searches on macroproteomics datasets generated by low-resolution MS/MS, which inevitably enlarged the search space and lowered confidence in the identification results. In one such study, when the Pyrococcus furiosus proteome was searched against a large macroprotein database containing 6,162,582 sequences, only 80.2% of the identified peptides were annotated as P. furiosus sequences. In contrast, the present invention applies multi-engine searching to high-resolution MS/MS data. When the method of the invention is used to analyze the E. coli proteome, 93.4% of the peptides identified from a large macroprotein database (130,975,891 sequences) are consistent with the peptides identified from the conventional E. coli reference database, so the method has better accuracy.
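A consistency figure of this kind can be computed as the fraction of peptides from the large-database search that also appear in the reference-database search. The source does not spell out the exact calculation, so the Python sketch below is one plausible reading, with a hypothetical function name and toy peptide sets.

```python
from typing import Set


def consistency(large_db_peptides: Set[str], reference_db_peptides: Set[str]) -> float:
    """Fraction of large-database identifications also found with the reference database."""
    if not large_db_peptides:
        raise ValueError("no peptides identified from the large database")
    shared = large_db_peptides & reference_db_peptides
    return len(shared) / len(large_db_peptides)


# Toy example: three of the four large-database peptides are confirmed.
fraction = consistency({"AAK", "DEFGR", "HIK", "LMNPQR"}, {"AAK", "DEFGR", "HIK", "TTTK"})
```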
Drawings
FIG. 1 shows the normalized relative abundance of half-trypsin polypeptides (NRASP, half-trypsin polypeptide abundance / full-trypsin polypeptide abundance) for the major bacterial taxa, biological processes and enzymes in 447 fecal macro proteome samples, with the bacterial taxa (A), biological processes (B) and enzymes (C) arranged in ascending order; the box plots show the median (line in the middle of the box) and the 25th and 75th percentiles; the dashed lines extend to 1.5 times the interquartile range (IQR), with outliers shown as points;
FIG. 2 shows the changes in the proteolytic characteristics of different biological processes in the E. coli proteome under heat stress (p < 0.05).
Detailed Description
The following describes the invention in more detail. The description of these embodiments is provided to assist understanding of the present invention, but is not intended to limit the present invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The test methods used in the following experimental examples are all conventional methods unless otherwise specified; the materials, reagents and the like used, unless otherwise specified, are those commercially available.
Data sets: two publicly available data sets of intestinal macroproteomes from healthy and IBD populations were analyzed. Data set 1 (PXD 008675) consists of 447 fecal macroproteomes from 89 subjects aged 6-58 years (median 22.8 years), including 24 non-IBD controls, 39 CD patients and 26 UC patients; of these samples, 272 had a matching metagenome and 184 had a matching metaproteome. We also analyzed a proteome data set (PXS 000498) to investigate the effect of heat stress on the regulation of proteolysis in E. coli K-12.
Macro protein database: a comprehensive human intestinal microbial protein database was assembled from the following parts: (1) the Integrated Gene Catalog (IGC) database, based on 1267 intestinal metagenomes from 1070 individuals (760 European, 368 Chinese and 139 American samples); (2) sequence data of 215 strains cultured from healthy adult human feces; (3) the Culturable Genome Reference (CGR) database, containing 1520 non-redundant, high-quality genomes from 6000 strains of intestinal bacteria isolated from healthy human feces; (4) all archaeal, bacterial and fungal sequences in UniProtKB (version 2017_06) and NCBI RefSeq (version 90). The microbial sequence database described above was supplemented with the UniProt human reference proteome, a food database of dietary organisms such as common wheat (Triticum aestivum), rice (Oryza sativa subsp. japonica), soybean (Glycine max), corn (Zea mays), peanut (Arachis hypogaea), potato (Solanum tuberosum), tomato (Solanum lycopersicum), pig (Sus scrofa), cow (Bos taurus), chicken (Gallus gallus), sheep (Ovis aries), fish (Salmo salar and Oncorhynchus mykiss) and shrimp (Artemia sp. and Litopenaeus vannamei), and a common contaminant database (http://maxquant.org/contaminants.zip). Duplicate protein sequences were removed using USEARCH v11.0.667 (-fastx_uniques) to yield 130,975,891 non-redundant sequences.
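The duplicate-removal step can be illustrated with a minimal sketch. It is not the USEARCH implementation; it only shows the idea of keeping one record per exact, full-length sequence. The FASTA headers and sequences below are hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records):
    """Keep the first record seen for each distinct full-length sequence."""
    seen = set()
    unique = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return unique


# Hypothetical entries: the first two share an identical sequence.
example = """>igc_0001
MKTAYIAK
>cgr_0042
MKTAYIAK
>uniprot_entry_3
MAAKDVKF
"""
deduplicated = unique_sequences(parse_fasta(example))
```

For a database of more than 10^8 sequences, a dedicated tool is the practical choice; the sketch only makes the operation explicit.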
Statistical analysis: the amino acid frequencies near the cleavage sites were subjected to multivariate analysis using principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), and missing values were imputed using Bayesian PCA (BPCA). The Kruskal-Wallis test and the Dunn-Bonferroni test were performed in R (version 3.5.3) and RStudio (version 1.1.383) to detect variables (present in at least 75% of the samples) that differ significantly between groups, with P values below 0.05 considered significant. The beta diversity of the multi-omics data was determined using principal coordinate analysis (PCoA) of the Bray-Curtis distances.
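As an illustration of the distance underlying the PCoA, the following sketch computes the Bray-Curtis dissimilarity between two abundance profiles. The analysis described above was performed in R; this Python version and its example values are assumptions made only for illustration.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two all-zero profiles are treated as identical.
    return numerator / denominator if denominator else 0.0


def distance_matrix(profiles):
    """Pairwise Bray-Curtis matrix for a list of sample profiles."""
    return [[bray_curtis(p, q) for q in profiles] for p in profiles]


# Hypothetical abundance profiles for three samples.
samples = [[6, 7, 4], [10, 0, 6], [6, 7, 4]]
matrix = distance_matrix(samples)
```

A value of 0 means the two profiles are identical and a value of 1 means they share no features; the resulting matrix is the input to PCoA.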
Example 1 Performance of different software in performing searches
Using the MLI dataset and the large macro protein database, we compared the performance of commercial software (Proteome Discoverer, PEAKS, ProteinPilot and Byonic) and open-source software (MaxQuant, MSFragger and pFind) in searching for tryptic peptides on several 36-core servers (with 192 GB of memory installed). Proteome Discoverer, Byonic, MaxQuant, pFind and ProteinPilot did not complete the search within one month, while MSFragger crashed with an out-of-memory error. Only PEAKS completed the analysis within one month, so further high-throughput analysis was performed on a 156-core high-performance computing cluster, which completed the database search within 2 weeks.
Example 2 database search
The database search process generally comprises two main steps: (1) de novo sequencing and a first search using the large macro protein database (large database) and PEAKS software, to obtain the proteins identified by at least one peptide and to generate a corresponding small protein database (reduced database); (2) a second search using the reduced database and several software tools, to improve the accuracy of half-trypsin polypeptide identification.
To address the increased search space and search time of half-trypsin polypeptide identification in macro proteomics, searches were first performed using PEAKS DB on a cluster configured with Intel(R) Xeon(R) processors (156 cores) and 1.5 TB of 2666 MHz memory. The software first performed de novo sequencing, followed by a database search using the following parameters: the parent ion mass tolerance is 10 ppm and the fragment ion mass tolerance is 0.02 Da; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 3, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the enzyme is trypsin, the digestion mode is semi-specific, and at most 3 missed cleavage sites are allowed; the false discovery rate (FDR) is set to 1%.
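The two kinds of mass tolerance above (relative, in ppm, for parent ions; absolute, in Da, for fragment ions) can be made concrete with a small sketch. The function names and example m/z values are hypothetical and do not reflect the internals of PEAKS DB.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """Relative tolerance, as used for parent ions (10 ppm in the text)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fragment_within_tolerance(observed_mz, theoretical_mz, tol_da=0.02):
    """Absolute tolerance, as used for fragment ions (0.02 Da in the text)."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# Hypothetical values: an 8 ppm and a 12 ppm deviation at m/z 500.
ok_precursor = precursor_within_tolerance(500.004, 500.0)
bad_precursor = precursor_within_tolerance(500.006, 500.0)
ok_fragment = fragment_within_tolerance(200.015, 200.0)
bad_fragment = fragment_within_tolerance(200.030, 200.0)
```

At m/z 500, a 10 ppm window corresponds to 0.005, which is why the first precursor passes and the second does not.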
A two-step search strategy is used here to increase search sensitivity: in the first search, the proteins identified by at least one peptide are retained for a second round of multi-engine searching; in the second search, PEAKS DB, MaxQuant (version 1.6.2) and pFind (version 3.1.5) are used.
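The construction of the reduced database between the two searches can be sketched as follows. This is a simplified illustration under stated assumptions: the large database is represented as a dictionary from accession to sequence, the first-search output as (peptide, accession) pairs, and all names and entries are hypothetical.

```python
def build_reduced_database(large_db, first_pass_hits, min_peptides=1):
    """Keep only proteins supported by at least `min_peptides` distinct peptides.

    large_db: dict mapping protein accession -> sequence.
    first_pass_hits: iterable of (peptide, accession) pairs from the first search.
    """
    peptides_per_protein = {}
    for peptide, accession in first_pass_hits:
        peptides_per_protein.setdefault(accession, set()).add(peptide)
    return {
        accession: sequence
        for accession, sequence in large_db.items()
        if len(peptides_per_protein.get(accession, ())) >= min_peptides
    }


# Hypothetical large database and first-search identifications.
large_db = {"P1": "MAKDEFRGHIK", "P2": "MTTLLKPR", "P3": "MSSQWERTK"}
hits = [("DEFR", "P1"), ("GHIK", "P1"), ("TTLLKPR", "P2")]
reduced_db = build_reduced_database(large_db, hits)
```

Only the reduced database is passed to the second, multi-engine search, which is what brings the search space down to a size comparable to a conventional single-organism search.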
The MaxQuant (version 1.6.2.10) search was performed using the Andromeda engine with the following parameters: the first-search mass tolerance is 20 ppm and the main-search mass tolerance is 4.5 ppm; the enzyme is trypsin, the digestion mode is semi-specific, and at most 2 missed cleavage sites are allowed; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 5, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the false discovery rate (FDR) is set to 1%, and peptide fragments with a posterior error probability (PEP) below 5% are retained for subsequent analysis; the "Second peptides" option is used to search for co-fragmented peptides in the MS/MS spectra. The "match between runs" option is enabled, with a match time window of 0.7 minutes and an alignment time window of 20 minutes. Proteins and peptides are quantified using the label-free quantification (LFQ) algorithm, with a minimum ratio count of 1 and a minimum and average number of neighbors of 3 and 6, respectively.
The pFind database search was performed with a parent ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 20 ppm, in open-search mode, with trypsin as the enzyme, semi-specific digestion, and at most 3 missed cleavage sites.
Only peptides identified by all three search engines (PEAKS DB, MaxQuant and pFind) are retained for further analysis.
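The consensus filter across search engines amounts to a set intersection. The sketch below assumes each engine's output has already been reduced to a collection of plain peptide sequences; the peptide strings are hypothetical.

```python
def consensus_peptides(*engine_results):
    """Return the peptides reported by every engine."""
    sets = [set(result) for result in engine_results]
    return set.intersection(*sets) if sets else set()


# Hypothetical peptide lists from the three engines.
peaks_db = ["DEFR", "GHIK", "EFR", "TTLLKPR"]
maxquant = ["DEFR", "EFR", "TTLLKPR"]
pfind = ["DEFR", "EFR", "GHIK"]
retained = consensus_peptides(peaks_db, maxquant, pfind)
```

Requiring agreement among all engines trades some sensitivity for higher confidence, which matches the intent stated in the text.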
Example 3 identification, classification and functional analysis of a hemitrypsin polypeptide
1. Identification principle of half-trypsin polypeptides
An identified peptide whose preceding amino acid (the residue immediately before the identified sequence in the protein) is not R or K is a half-trypsin N-terminal peptide (excluding peptides at the N-terminus of the protein). An identified peptide whose last amino acid is not R or K is a half-trypsin C-terminal peptide (excluding peptides at the C-terminus of the protein). In-source fragments (in-source CID fragments) were distinguished from proteolytically derived half-trypsin polypeptides according to elution time: most in-source fragments show retention times that differ from their theoretical retention times (predicted using SSRCalc). Microbial half-trypsin polypeptides were distinguished from human-derived and food-derived peptides according to the corresponding accession number in the FASTA sequence entry.
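The identification principle above can be sketched as a small classifier. This is a simplified illustration: it assumes the peptide's position in its parent protein is known, it ignores the retention-time check for in-source fragments, and it does not apply any special rule for proline after K/R. The protein sequence and function name are hypothetical.

```python
def classify_peptide(protein_seq, start, end):
    """Classify the peptide protein_seq[start:end] (0-based, end exclusive)."""
    at_protein_n_term = start == 0
    at_protein_c_term = end == len(protein_seq)
    # N-terminal side is tryptic if the preceding residue is K or R,
    # or if the peptide starts at the protein N-terminus.
    n_side_tryptic = at_protein_n_term or protein_seq[start - 1] in "KR"
    # C-terminal side is tryptic if the last residue is K or R,
    # or if the peptide ends at the protein C-terminus.
    c_side_tryptic = at_protein_c_term or protein_seq[end - 1] in "KR"
    if n_side_tryptic and c_side_tryptic:
        return "full-trypsin"
    if not n_side_tryptic and c_side_tryptic:
        return "half-trypsin N-terminal"
    if n_side_tryptic and not c_side_tryptic:
        return "half-trypsin C-terminal"
    return "non-tryptic"


protein = "MAKDEFRGHIK"  # hypothetical protein sequence
labels = {
    "DEFR": classify_peptide(protein, 3, 7),
    "EFR": classify_peptide(protein, 4, 7),
    "DEF": classify_peptide(protein, 3, 6),
    "GHIK": classify_peptide(protein, 7, 11),
}
```

Here "EFR" is preceded by D rather than K or R, so it is a half-trypsin N-terminal peptide, while "DEF" ends in F and is a half-trypsin C-terminal peptide.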
2. Combining half-trypsin and full-trypsin data to quantify the degree of proteolysis
We determined changes in the degree of proteolysis from the normalized relative abundance of half-trypsin polypeptides (normalized relative abundance of semi-tryptic peptides, NRASP for short), obtained by normalizing the relative abundance of the half-trypsin polypeptides to that of the full-trypsin polypeptides. This normalization step is important: if the abundances of the half-trypsin and full-trypsin polypeptides change proportionally, this usually indicates that the degree of proteolysis has not changed, yet comparing the half-trypsin polypeptides alone would in that case show an apparent difference between groups.
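A minimal sketch of the NRASP calculation follows. The text does not spell out how "relative abundance" is computed; the sketch assumes it is each group's share (for example a taxon's or a biological process's share) of the summed intensity of that peptide class within a sample. The group names and intensities are hypothetical.

```python
def nrasp(half_by_group, full_by_group):
    """NRASP per group: relative half-trypsin abundance / relative full-trypsin abundance.

    Both arguments map a group (e.g. a taxon or biological process) to the
    summed peptide intensity of that class in one sample.
    """
    total_half = sum(half_by_group.values())
    total_full = sum(full_by_group.values())
    result = {}
    if total_half == 0 or total_full == 0:
        return result
    for group in half_by_group.keys() & full_by_group.keys():
        rel_half = half_by_group[group] / total_half
        rel_full = full_by_group[group] / total_full
        if rel_full > 0:
            result[group] = rel_half / rel_full
    return result


# Proportional case: half- and full-trypsin shares match, so NRASP is 1.
proportional = nrasp({"A": 20, "B": 80}, {"A": 100, "B": 400})
# Group A carries a larger share of half-trypsin signal than of full-trypsin signal.
shifted = nrasp({"A": 40, "B": 60}, {"A": 100, "B": 400})
```

In the proportional case both groups give a value of 1, which is the "no change in proteolysis" situation described above; in the shifted case group A gives 2 and group B gives 0.75.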
3. Results
To increase the sensitivity of macro proteome analysis based on a large sequence space, we employed a two-step database search strategy. This effectively reduces the size of the macro protein database to that used in conventional proteomic analysis, thereby facilitating half-trypsin-based macro proteomic searching. In addition, the reliability of peptide identification is improved by combining three commonly used software tools. These tools use different algorithms for peak matching, identification of co-eluting peptides and FDR calculation (MaxQuant and pFind use the target-decoy strategy, whereas PEAKS DB uses the decoy-fusion method), which significantly increases the confidence of peptide identification. Only peptides identified by all three tools were retained for further analysis.
A total of 12,828,005 MS/MS spectra were searched, yielding 3,804,903 (29.66%) peptide-spectrum matches (PSMs), and 125,494 peptides were identified from the fecal macroproteome, of which 108,784 (86.68%) were microbe-specific peptides (not shared with human or food sequences). In addition, 7,969 (6.35%) human-specific peptides were identified in the fecal macroproteome, of which 5,104 (64.05%) were half-trypsin polypeptides. Gene Ontology (GO) analysis showed that 84.13% of the human tryptic peptides were derived from potential extracellular proteins, whereas only 1.16% of the microbial tryptic peptides were derived from potential extracellular proteins.
Example 4 Validation of the above method by analysis of proteolytic characteristics in the E. coli heat shock response
We validated our method by analyzing the heat-shock-induced proteolytic profile using a published proteomic data set of E. coli K12. With the three search engines combined, 9937 peptide fragments were identified using the large macro protein database described above, while 14111 peptide fragments were identified using the UniProt E. coli K12 reference database. The number of peptides identified with the large database was thus 29.6% lower than with the reference database, reflecting the expected loss of sensitivity, since the large database contains about 10,000 times more sequences than the conventional reference database.
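The 29.6% figure follows directly from the two peptide counts, as this short check shows (the variable names are chosen here for illustration):

```python
reference_peptides = 14111   # identified with the E. coli K12 reference database
large_db_peptides = 9937     # identified with the large macro protein database

# Fraction of reference-database identifications lost with the large database.
reduction = (reference_peptides - large_db_peptides) / reference_peptides
print(f"{reduction:.1%}")  # prints 29.6%
```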
Of all 14111 peptide fragments identified with the UniProt E. coli K12 reference database, 83.7% had a PEP value below 0.01 and 61.6% had a PEP value below 0.001. In contrast, of the 4783 peptide fragments identified only with the UniProt E. coli K12 reference database (not identified with the macro protein database), 60.3% had a PEP value below 0.01 and 39.5% had a PEP value below 0.001. The higher PEP values of the peptide fragments identified only with the UniProt E. coli K12 reference database indicate that low-quality peptide-spectrum matches (PSMs) are more susceptible to loss of sensitivity when large databases are searched. It is also notable that a single microbial proteome differs significantly from the intestinal proteome. Recent studies have shown that macro protein databases assembled from large public databases and sample-matched reference databases produce comparable results in intestinal macro proteomics studies. Thus, our method does not suffer a significant loss of sensitivity in intestinal metaproteomic analysis. Of the peptides identified with the large macro protein database, 93.4% are identical to those identified with the E. coli reference database, which shows that our method identifies peptides with high accuracy.
To verify our approach, we compared NRASP of the 185 biological processes found in all samples (as a proteolytic regulatory index), and found that NRASP of 20 (approximately 10.8%) biological processes differed significantly between the control and heat stressed groups (P-value <0.05, fig. 2).
Heat stress can disrupt protein folding, resulting in the accumulation of misfolded proteins that need to refold into the correct conformation. Accordingly, using our method we found that under heat stress the NRASP associated with protein folding decreased and the NRASP associated with protein refolding increased. We also observed that the NRASP associated with methylation increased under heat stress, consistent with recent findings. In conclusion, the biological findings on proteolytic regulation obtained using our method have a high degree of confidence.
Example 5 Classification and functional analysis of peptides
All peptides were analyzed using Unipept (version 4.3.5) with UniProt 2020.01, based on the lowest common ancestor (LCA) algorithm, with the following parameters: equate I and L, filter duplicate peptides, and advanced missed cleavage handling. The taxonomic information was visualized using the Sunburst view provided by Unipept.
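The idea behind the lowest common ancestor assignment and the "equate I and L" option can be illustrated with a simplified sketch. It is not Unipept's implementation, which handles missing ranks and other special cases; the lineages below are hypothetical and are given as root-to-leaf lists.

```python
def equate_il(peptide):
    """Treat isoleucine and leucine as identical (they are isobaric)."""
    return peptide.replace("I", "L")


def lowest_common_ancestor(lineages):
    """Deepest taxon shared by all lineages (each a root-to-leaf list)."""
    if not lineages:
        return None
    lca = None
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            lca = ranks[0]
        else:
            break
    return lca


# Hypothetical lineages of the proteins that contain one peptide.
lineages = [
    ["Bacteria", "Firmicutes", "Clostridia", "Clostridiales"],
    ["Bacteria", "Firmicutes", "Clostridia", "Eubacteriales"],
    ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"],
]
assigned_taxon = lowest_common_ancestor(lineages)
same_peptide = equate_il("AIDLK") == equate_il("ALDIK")
```

A peptide found in proteins from several taxa is assigned to the most specific taxon that contains all of them, here the phylum shared by the three lineages.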
Results of the study
(1) Relative abundance and distribution of half-trypsin polypeptides
FIG. 1 shows the NRASP of 447 fecal macroproteomes from the CD (n=204), UC (n=123) and control (n=120) groups; 20 major bacterial taxa, 35 major biological processes and 32 enzyme subclasses were identified in at least 75% of the samples. The median NRASP of the phyla Firmicutes and Bacteroidetes, the classes Bacteroidia and Clostridia, the orders Bacteroidales and Clostridiales, the family Bacteroidaceae and the genus Bacteroides was around 1, indicating that the relative abundances of the corresponding half-trypsin and full-trypsin polypeptides were comparable (FIG. 1A). However, the median NRASP increased to about 1.25 in the families Lachnospiraceae and Ruminococcaceae, increased to 1.5 in the genera Roseburia and Prevotella and the species Faecalibacterium prausnitzii and Prevotella copri, and decreased to about 0.5 in the phylum Actinobacteria and the order Bifidobacteriales. These data indicate that different intestinal bacteria have different degrees of proteolysis.
The median NRASP of most biological processes also fluctuates around 1 (FIG. 1B). However, the NRASP of isoleucine biosynthesis, valine biosynthesis, bacterial flagellum-dependent cell motility, protein transport, carboxylic acid metabolism, fucose metabolism and glucose metabolism increases to 1.75-2, and that of fatty acid metabolism and L-threonine catabolism increases further to 2.5, whereas the NRASP of polysaccharide catabolism, carbohydrate transport and transmembrane transport decreases to about 0.75, and that of metabolism decreases further to 0.3.
At the enzyme level, NRASP is highest (median > 3) for 3-hydroxybutyryl-CoA dehydrogenase involved in butyrate metabolism, followed by 3-hydroxybutyryl-CoA dehydrogenase involved in fatty acid β-oxidation, glycine C-acetyltransferase involved in L-threonine degradation, phosphoenolpyruvate carboxykinase (ATP) involved in gluconeogenesis, ketol-acid reductoisomerase involved in branched-chain amino acid (BCAA) biosynthesis, and superoxide dismutase involved in the antioxidant stress response (median NRASP 2-3, FIG. 1C).
Claims (8)
1. A method for determining the degree of proteolysis of a microorganism in the intestinal tract, comprising the steps of:
S1, acquiring macro proteome data of a sample or macro proteome data published in a public database;
S2, performing a first search by using a large macro protein database and PEAKS DB software to obtain the proteins identified by at least one peptide;
S3, searching the omics data against the protein sequences obtained in S2 by using PEAKS DB software, MaxQuant software and pFind software, and retaining the peptides identified simultaneously by the PEAKS DB software, the MaxQuant software and the pFind software;
S4, distinguishing the half-trypsin polypeptides and the full-trypsin polypeptides among the peptides obtained in S3;
S5, determining the degree of proteolysis by using the normalized relative abundance of the half-trypsin polypeptide, wherein the normalized relative abundance of the half-trypsin polypeptide is obtained by normalizing the relative abundance of the half-trypsin polypeptide to the relative abundance of the full-trypsin polypeptide;
In S4, the identification principle of the half-trypsin polypeptide is as follows: an identified peptide fragment is a half-trypsin N-terminal peptide if the amino acid preceding it is not R or K and it is not a protein N-terminal peptide fragment, and an identified peptide fragment is a half-trypsin C-terminal peptide if its last amino acid is not R or K and it is not a protein C-terminal peptide fragment; in-source fragments are distinguished from proteolytically derived half-trypsin polypeptides according to elution time.
2. The method for determining the degree of proteolysis of a gut microorganism according to claim 1, wherein the parameters of the PEAKS DB database search are: the parent ion mass tolerance is 10 ppm and the fragment ion mass tolerance is 0.02 Da; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 3, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the enzyme is trypsin, the digestion mode is semi-specific, and at most 3 missed cleavage sites are allowed; the false discovery rate (FDR) is set to 1%.
3. The method for determining the degree of proteolysis of a gut microorganism according to claim 1, wherein the parameters of the MaxQuant search are: the first-search mass tolerance is 20 ppm and the main-search mass tolerance is 4.5 ppm; the enzyme is trypsin, the digestion mode is semi-specific, and at most 2 missed cleavage sites are allowed; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 5, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the false discovery rate (FDR) is set to 1%, and peptide fragments with a posterior error probability of less than 5% are retained for subsequent analysis.
4. The method for determining the degree of proteolysis of a gut microorganism according to claim 1, wherein the parameters of the pFind search are: the parent ion mass tolerance is 10 ppm and the fragment ion mass tolerance is 20 ppm; the search mode is open search; the enzyme is trypsin, the digestion mode is semi-specific, and at most 3 missed cleavage sites are allowed; the FDR is set to 1%.
5. A method of determining the degree of proteolysis of an intestinal microorganism according to any of claims 1 to 4 wherein the method is used to capture characteristic information of proteolysis of an intestinal microorganism.
6. The method of determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4, wherein the method is used to study gut microorganism and host interactions.
7. A method of determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4 for the study of diseases associated with bacterial proteases.
8. The method for determining the degree of proteolysis of a gut microorganism according to any of claims 1 to 4, wherein the disease comprises a bacterial infection, inflammatory bowel disease.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011415023.0A CN112786105B (en) | 2020-12-07 | 2020-12-07 | Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112786105A CN112786105A (en) | 2021-05-11 |
CN112786105B true CN112786105B (en) | 2024-05-07 |
Family
ID=75750749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011415023.0A Active CN112786105B (en) | 2020-12-07 | 2020-12-07 | Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112786105B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115267033A (en) * | 2022-08-05 | 2022-11-01 | 西湖大学 | Macro-proteomics analysis method based on mass spectrum data and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004046731A2 (en) * | 2002-11-18 | 2004-06-03 | Ludwig Institute For Cancer Research | Method for analysing amino acids, peptides and proteins using mass spectroscopy of fixed charge-modified derivatives |
CN1692282A (en) * | 2002-04-15 | 2005-11-02 | 萨莫芬尼根有限责任公司 | Quantitation of biological molecules |
CN103268432A (en) * | 2013-05-08 | 2013-08-28 | 中国科学院水生生物研究所 | Method of identifying protein phosphorylation modification sites on the basis of tandem mass spectrometry |
KR20140101134A (en) * | 2013-02-08 | 2014-08-19 | 건국대학교 산학협력단 | Method for providing information by Proteomic Analysis of the Aqueous Humor in Age-related Macular Degeneration Patients and biomarker for Age-related Macular Degeneration |
WO2018165350A1 (en) * | 2017-03-07 | 2018-09-13 | Nuseed Pty Ltd. | Lc-ms/ms-based methods for characterizing proteins |
CN109444313A (en) * | 2018-10-23 | 2019-03-08 | 大连工业大学 | Method based on LC-MS technology analysis protein-PS complex digestibility |
CN111220690A (en) * | 2018-11-27 | 2020-06-02 | 中国科学院大连化学物理研究所 | Direct mass spectrometry detection method for low-abundance protein posttranslational modification group |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2349265A1 (en) * | 2001-05-30 | 2002-11-30 | Andrew Emili | Protein expression profile database |
US8071329B2 (en) * | 2002-10-11 | 2011-12-06 | University Of Maryland | Analyzing and distinguishing organisms such as bacterial spores by their soluble polypeptides |
EP1941280A2 (en) * | 2005-10-13 | 2008-07-09 | Applera Corporation | Methods for the development of a biomolecule assay |
DE102006051516A1 (en) * | 2006-10-31 | 2008-05-08 | Curevac Gmbh | (Base) modified RNA to increase the expression of a protein |
US8679771B2 (en) * | 2007-01-25 | 2014-03-25 | The Regents Of The University Of California | Specific N-terminal labeling of peptides and proteins in complex mixtures |
US20110093205A1 (en) * | 2009-10-19 | 2011-04-21 | Palo Alto Research Center Incorporated | Proteomics previewer |
EP2508537A1 (en) * | 2011-04-04 | 2012-10-10 | Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. | Quantitative standard for mass spectrometry of proteins |
US10141169B2 (en) * | 2012-11-15 | 2018-11-27 | Dh Technologies Development Pte. Ltd. | Systems and methods for identifying compounds from MS/MS data without precursor ion information |
EP2738558A1 (en) * | 2012-11-28 | 2014-06-04 | ETH Zurich | Method and tools for the determination of conformation and conformational changes of proteins and of derivatives thereof |
EP3308778A1 (en) * | 2016-10-12 | 2018-04-18 | Institute for Research in Biomedicine | Arginine and its use as a t cell modulator |
US20180340941A1 (en) * | 2017-05-25 | 2018-11-29 | Wisconsin Alumni Research Foundation | Method to Map Protein Landscapes |
CN107655985B (en) * | 2017-08-25 | 2020-05-26 | 南京农业大学 | LC-MS-MS technology-based in vivo protein nutrition evaluation method |
Non-Patent Citations (3)
Title |
---|
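Research progress and applications of metaproteomics; Wu Chongde; Huang Jun; Zhou Rongqing; Food and Fermentation Industries; 2016-04-15 (05); full text * |
Chromatographic analysis of the tryptic hydrolysis of whole casein; Qi Wei, He Mingxia, He Zhimin, Shi Deqing; Chinese Journal of Chromatography; 2002-01-30 (01); full text * |
Application and research progress of the spectral clustering network method in identifying post-translational modifications of peptides; He Mingmin; Shu Kunxian; Bai Mingze; Xu Rui; Chinese Journal of Biotechnology; 2018-04-19 (10); full text * |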
Also Published As
Publication number | Publication date |
---|---|
CN112786105A (en) | 2021-05-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||