CN112786105A - Macroproteome mining method and application thereof in obtaining intestinal microbial proteolysis characteristics - Google Patents
Macroproteome mining method and application thereof in obtaining intestinal microbial proteolysis characteristics
- Publication number
- CN112786105A CN202011415023.0A
- Authority
- CN
- China
- Prior art keywords
- trypsin
- protein
- search
- peptide
- proteolysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Landscapes
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention belongs to the field of biotechnology and discloses a metaproteome (macroproteome) data mining method centered on semi-tryptic (hemitrypsin) peptides. The search strategies used by the method reduce the false positive rate caused by database incompleteness and post-translational modifications. When the method was used to analyze the Escherichia coli proteome, 93.4% of the peptides identified from a very large metaproteome database were consistent with those identified from a conventional Escherichia coli reference database.
Description
Technical Field
The invention relates to the technical field of bioinformatics analysis, in particular to a metaproteome (macroproteome) mining method and its application in obtaining the proteolysis characteristics of intestinal microorganisms.
Background
Gut microbes live in a dynamic environment and face proteotoxic and metabolic stresses from drugs, diet, microbial competition and the endogenous chemical composition of the host. Bacteria have evolved different regulatory strategies to adapt to changing environments, including changes in gene expression, cell differentiation and motility, and proteolysis plays a crucial role among them. Proteolytic regulation is an important process in all organisms: bacteria use energy-dependent proteases to degrade misfolded proteins or to activate regulatory proteins, which lets them react rapidly to the dynamic intestinal environment. The microbial functions regulated by proteolysis are very broad and include the stress response, cell growth and division, biofilm formation and protein secretion.
Inflammatory bowel disease (IBD) is a chronic inflammatory disease that is affected by both genetic and environmental factors and primarily includes Crohn's disease (CD) and ulcerative colitis (UC). IBD has been reported to be associated with dysregulation of the intestinal microbiota. Most studies of the IBD gut microbiome rely on metagenomics and 16S rRNA gene sequencing. However, metatranscriptomics or metaproteomics is needed to pinpoint functional and metabolic activity by directly measuring RNA and protein, respectively. Furthermore, important modes of regulation act at the protein level, such as proteolytic regulation; these cannot be observed through RNA studies but can be studied using metaproteomics.
However, changes in the proteolysis characteristics of intestinal microorganisms in complex disease states such as IBD have not been studied, so a method capable of capturing these characteristics in a complex disease state is needed.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the problems in the prior art. The invention first provides a metaproteome mining method centered on semi-tryptic (hemitrypsin) peptides, together with a method for comparing the degree of proteolysis.
A second object of the invention is to provide the use of the above method for obtaining proteolytic characteristics of intestinal microorganisms.
The purpose of the invention is realized by the following technical scheme:
A method of determining the degree of proteolysis, comprising the following steps:
S1, obtaining (meta)proteome data, either measured from a sample or published in a public database;
S2, performing a first search of the data against a large metaproteome database using PEAKS DB software, and retaining the proteins for which at least one peptide is identified;
S3, searching the omics data against the protein sequences obtained in S2 using PEAKS DB, MaxQuant and pFind, and retaining only the peptides identified by all three programs;
S4, distinguishing semi-tryptic peptides from fully tryptic peptides among the peptides obtained in S3;
S5, determining the degree of proteolysis from the normalized relative abundance of the semi-tryptic peptides, which is obtained by normalizing the relative abundance of the semi-tryptic peptides to the relative abundance of the fully tryptic peptides.
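The data flow of steps S2 and S3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy protein identifiers and the peptide sets are hypothetical, and the real searches are performed by PEAKS DB, MaxQuant and pFind rather than by this code.

```python
from typing import Dict, Set


def reduce_database(database: Dict[str, str], first_pass_hits: Set[str]) -> Dict[str, str]:
    """S2 (sketch): keep only proteins with at least one peptide in the first search."""
    return {pid: seq for pid, seq in database.items() if pid in first_pass_hits}


def consensus_peptides(*engine_results: Set[str]) -> Set[str]:
    """S3 (sketch): keep only peptides reported by every search engine."""
    if not engine_results:
        return set()
    return set.intersection(*map(set, engine_results))


# Hypothetical toy data standing in for real search output.
database = {"P1": "MKTAYIAKQR", "P2": "MSLLNVPAGK", "P3": "MAAAAK"}
reduced = reduce_database(database, first_pass_hits={"P1", "P2"})

peaks = {"TAYIAK", "AYIAK", "SLLNVPAGK"}
maxquant = {"TAYIAK", "AYIAK", "QR"}
pfind = {"TAYIAK", "AYIAK", "SLLNVPAGK"}
retained = consensus_peptides(peaks, maxquant, pfind)

print(sorted(reduced))   # proteins carried into the second search
print(sorted(retained))  # peptides reported by all three engines
```

Requiring agreement between all engines trades sensitivity for confidence: a peptide missed by any one engine is discarded.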
Preferably, in S4, semi-tryptic peptides are identified as follows: a peptide whose identified sequence does not have R or K as its first amino acid is a semi-tryptic N-terminal peptide (peptides at the N-terminus of the protein are excluded); a peptide whose identified sequence does not have R or K as its last amino acid is a semi-tryptic C-terminal peptide (peptides at the C-terminus of the protein are excluded).
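A minimal sketch of this classification rule follows. It assumes that the "first amino acid" of the identified sequence refers to the residue flanking the N-terminal cleavage site, because trypsin cleaves on the C-terminal side of K and R; that reading is an assumption, not something the text states. The function name, the labels and the example protein are hypothetical, and the sketch ignores complications such as the proline rule or removal of the initiator methionine.

```python
TRYPTIC_RESIDUES = "KR"


def classify_peptide(protein: str, peptide: str) -> str:
    """Classify a peptide by the tryptic character of its two termini.

    Assumption: the N-terminus counts as tryptic when the residue just
    before the peptide in the protein is K or R, or when the peptide
    starts at the protein N-terminus. The C-terminus counts as tryptic
    when the peptide ends in K or R, or at the protein C-terminus.
    """
    if not peptide:
        raise ValueError("peptide must not be empty")
    start = protein.find(peptide)  # first occurrence only
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    n_tryptic = start == 0 or protein[start - 1] in TRYPTIC_RESIDUES
    c_tryptic = end == len(protein) or peptide[-1] in TRYPTIC_RESIDUES
    if n_tryptic and c_tryptic:
        return "full"
    if c_tryptic:
        return "semi-N"  # non-tryptic N-terminus
    if n_tryptic:
        return "semi-C"  # non-tryptic C-terminus
    return "non-specific"


protein = "MKTAYIAKQRLSK"  # hypothetical sequence
for pep in ("TAYIAK", "AYIAK", "TAYIA", "LSK"):
    print(pep, classify_peptide(protein, pep))
```

Peptides with two non-tryptic termini are labeled separately here because a semi-specific search, as configured in the method, would not report them.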
Because trypsin cleaves on the C-terminal side of K and R, both ends of a peptide generated by trypsin digestion during proteomic sample preparation should correspond to a K or R cleavage site. If semi-tryptic peptides are detected in the data, proteases other than trypsin must have taken part in hydrolyzing the protein, producing a peptide terminus that does not correspond to K or R. Semi-tryptic peptides can therefore be used as a sign that the protein was hydrolyzed by other proteases in the organism, and fully tryptic peptides as a sign that it was not. However, the degree of proteolysis cannot be studied from semi-tryptic peptides alone, because a change in their abundance may simply reflect a change in the total amount of the corresponding protein (increased or decreased synthesis) while the degree of proteolysis is unchanged. The relative abundance of semi-tryptic peptides is therefore normalized to that of fully tryptic peptides before the degree of proteolysis is compared between samples, which removes the effect of changes in total protein.
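The effect of the normalization can be shown with a small numeric sketch. The function name and the abundance values are hypothetical; how abundances are summed per protein, species or biological process is not specified here and is left to the caller.

```python
def nrasp(semi_abundance: float, full_abundance: float) -> float:
    """Normalized relative abundance of semi-tryptic peptides (sketch).

    Ratio of semi-tryptic to fully tryptic peptide abundance for one
    group (for example one protein or one biological process).
    """
    if full_abundance <= 0:
        raise ValueError("fully tryptic abundance must be positive")
    return semi_abundance / full_abundance


# Hypothetical abundances for the same protein in three samples.
baseline = nrasp(semi_abundance=20.0, full_abundance=100.0)
more_protein = nrasp(semi_abundance=40.0, full_abundance=200.0)      # synthesis doubled
more_proteolysis = nrasp(semi_abundance=40.0, full_abundance=100.0)  # cleavage increased

print(baseline, more_protein, more_proteolysis)
```

In the second sample the raw semi-tryptic abundance doubles but the ratio is unchanged, whereas in the third sample the ratio rises, which is the distinction the normalization is meant to capture.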
Preferably, the parameters of the PEAKS DB search are: precursor (parent) ion mass tolerance of 10 ppm and fragment (product) ion mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; at most 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavages; and a false discovery rate (FDR) of 1%.
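To make the two tolerance units concrete, the following sketch checks a precursor match in ppm and a fragment match in Da using the thresholds above. The function names and the m/z values are illustrative assumptions and are not part of PEAKS DB.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_matches(observed_mz: float, theoretical_mz: float, tol_ppm: float = 10.0) -> bool:
    """Relative tolerance: scales with the mass being measured."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fragment_matches(observed_mz: float, theoretical_mz: float, tol_da: float = 0.02) -> bool:
    """Absolute tolerance: the same window at every m/z."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# Hypothetical m/z values.
print(precursor_matches(1000.008, 1000.0))  # about 8 ppm off
print(precursor_matches(1000.015, 1000.0))  # about 15 ppm off
print(fragment_matches(500.015, 500.0))     # 0.015 Da off
print(fragment_matches(500.030, 500.0))     # 0.030 Da off
```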
Preferably, the parameters of the MaxQuant search are: a mass tolerance of 20 ppm for the first search and 4.5 ppm for the main search; trypsin as the enzyme, with semi-specific digestion and at most 2 missed cleavages; carbamidomethylation of cysteine set as a fixed modification; at most 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; a false discovery rate (FDR) of 1%; and peptides with a posterior error probability (PEP) below 5% retained for subsequent analysis.
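The additional PEP cut can be sketched as a simple filter over peptide records. The record type, field names and values below are hypothetical and do not reflect MaxQuant's output format; the 1% FDR control is assumed to have been applied already by the search engine.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PeptideHit:
    sequence: str
    pep: float  # posterior error probability, between 0 and 1


def filter_by_pep(hits: Iterable[PeptideHit], max_pep: float = 0.05) -> List[PeptideHit]:
    """Keep peptides whose PEP is strictly below the threshold."""
    return [hit for hit in hits if hit.pep < max_pep]


hits = [
    PeptideHit("TAYIAK", 0.001),
    PeptideHit("AYIAK", 0.04),
    PeptideHit("SLLNVPAGK", 0.20),
]
kept = filter_by_pep(hits)
print([hit.sequence for hit in kept])
```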
Preferably, the parameters of the pFind search are: precursor (parent) ion mass tolerance of 10 ppm and fragment ion mass tolerance of 20 ppm; open search as the search mode; trypsin as the enzyme, with semi-specific digestion and at most 3 missed cleavages; and an FDR of 1%.
The invention also provides the application of the method.
In particular, the above method is used to capture the proteolytic characteristics of intestinal microorganisms, which provide a level of information beyond community structure and protein abundance. The analysis is based on the assumption that similar degrees of proteolysis should result in similar relative abundances of semi-tryptic peptides. The present study found that microbial semi-tryptic peptides in the 447 fecal metaproteomes were enriched in several biological processes, including the metabolism of fatty acids, carboxylic acids, glucose and fucose, the biosynthesis of branched-chain amino acids, protein transport, and bacterial flagellum-mediated cell motility, indicating that these processes undergo more extensive proteolytic regulation.
Alternatively, the above methods are used to study gut microflora and host-microorganism interactions.
The method of the present invention for mining the proteome is also suitable for capturing the proteolytic characteristics of plants and environmental microorganisms, and therefore, the method can be used for exploring the proteolytic laws of plants and environmental microorganisms.
The method can also be used to study diseases related to bacterial proteases (such as bacterial infection and inflammatory bowel disease): changes in the degree of bacterial proteolysis can be examined so that the corresponding bacterial proteases can be taken as targets and drugs developed specifically to regulate them.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a metaproteome mining method centered on semi-tryptic peptides, which combines a two-step search, de novo sequencing, open search and matching of the results of several programs to carry out large-scale metaproteome mining. These strategies reduce false positive identifications caused by database incompleteness and peptide modifications. Previous work performed semi-tryptic peptide searches on metaproteomic data sets generated by low-resolution MS/MS, which inevitably enlarged the search space and lowered the confidence of the identifications. In that work, only 80.2% of the identified peptides were annotated as P. furiosus sequences when the Pyrococcus furiosus proteome was searched against a large metaproteome database containing 6,162,582 sequences. In contrast, the present invention applies multi-engine searching to high-resolution MS/MS data. When the method of the invention was used to analyze the E. coli proteome, 93.4% of the peptides identified from a much larger metaproteome database (130,975,891 sequences) were identical to those identified from the conventional E. coli reference database, indicating better accuracy.
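One way to express such a concordance figure is as the fraction of peptides from the large-database search that also appear in the reference-database search. The sketch below assumes that interpretation; the function name and the toy peptide sets are hypothetical, and the reported 93.4% may have been computed with additional criteria.

```python
from typing import Set


def concordance(large_db_peptides: Set[str], reference_peptides: Set[str]) -> float:
    """Fraction of large-database identifications confirmed by the reference search."""
    if not large_db_peptides:
        raise ValueError("no peptides identified from the large database")
    shared = large_db_peptides & reference_peptides
    return len(shared) / len(large_db_peptides)


# Hypothetical peptide sets.
large_db = {"TAYIAK", "AYIAK", "SLLNVPAGK", "QR", "LSK"}
reference = {"TAYIAK", "AYIAK", "SLLNVPAGK", "QR", "MK"}
print(f"{concordance(large_db, reference):.1%}")
```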
Drawings
FIG. 1 shows the normalized relative abundance of semi-tryptic peptides (NRASP, semi-tryptic peptide abundance / fully tryptic peptide abundance) for the main bacterial species (A), biological processes (B) and enzymes (C) in 447 fecal metaproteome samples, each sorted in ascending order; the box plots show the median (line in the middle of the box) and the 25th and 75th percentiles; the dashed lines extend to 1.5 times the interquartile range (IQR), and outliers are shown as dots;
FIG. 2 shows the changes in the proteolytic characteristics of the E. coli proteome in different biological processes induced by heat stress (p < 0.05).
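The box plot statistics named in the caption of FIG. 1 can be computed as sketched below. The quartile interpolation method used for the figure is not stated, so the "inclusive" method of Python's standard library is an assumption here, and the NRASP values are invented for illustration.

```python
import statistics
from typing import Dict, List, Sequence


def box_stats(values: Sequence[float]) -> Dict[str, object]:
    """Median, quartiles, 1.5 x IQR fences and outliers for one box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers: List[float] = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical NRASP values for one bacterial species across samples.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 50])
print(stats)
```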
Detailed Description
The following further describes the embodiments of the present invention. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The test methods used in the following experimental examples are all conventional methods unless otherwise specified; the materials, reagents and the like used are, unless otherwise specified, commercially available reagents and materials.
Data sets: two publicly available data sets were analyzed. Data set 1 (PXD008675) is an intestinal macroproteome data set of healthy and IBD populations, consisting of 447 fecal macroproteome samples from 89 subjects aged 6-58 years (median 22.8 years), including 24 non-IBD controls, 39 CD patients and 26 UC patients; of these samples, 272 had a matching metagenome and 184 had a matching metaproteome. We also analyzed a proteome data set (PXS000498) to investigate the effect of heat stress on the proteolytic regulation of E. coli K-12.
Macroprotein database: a comprehensive human intestinal microbial protein database was assembled from the following components: (1) the Integrated Gene Catalog (IGC) database, based on 1267 intestinal metagenomes from 1070 individuals (760 European, 368 Chinese and 139 American samples); (2) sequence data for 215 strains cultured from healthy adult feces; (3) the Culturable Genome Reference (CGR) database, containing 1520 non-redundant, high-quality genomes from 6000 intestinal bacteria isolated from healthy human feces; (4) all archaeal, bacterial and fungal sequences in UniProtKB (version 2017_06) and NCBI RefSeq (version 90). This microbial sequence database was supplemented with the UniProt human reference proteome, a food database composed of dietary organisms, such as Triticum aestivum, Oryza sativa subsp, Glycine max, Zea mays, Arachis hypogaea, Solanum tuberosum, Lycopersicum esculentum, Sus scrofa, Bos taurus, chicken (Gallus gallus), sheep (Ovis aries), fish (Salmo salar and Oncorhynchus mykiss) and shrimp (Artemia sp. and Litopenaeus vannamei), and a common contaminant database (http://maxquant.org/contaminants.zip). Duplicate protein sequences were removed using USEARCH v11.0.667 (-fastx_uniques), yielding 130,975,891 non-redundant sequences.
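The redundancy-removal step can be illustrated with a minimal Python sketch. It is not the USEARCH implementation; it only shows the idea of keeping one record per distinct protein sequence. The function names `read_fasta` and `unique_sequences` are hypothetical, and treating sequences case-insensitively is an assumption.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first header seen for each distinct sequence (upper-cased)."""
    seen: Dict[str, str] = {}
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen[key] = header
    return [(header, seq) for seq, header in seen.items()]


example = [">a", "MKT", "LLR", ">b", "mktllr", ">c", "MKV"]
print(unique_sequences(read_fasta(example)))
```

For a database of this size a streaming or hash-based approach on disk would be needed in practice; the in-memory dictionary here is only for illustration.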
Statistical analysis: multivariate analysis of the amino acid frequencies near the cleavage site was performed using principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), and missing values were imputed using Bayesian PCA (BPCA). Variables that differed significantly between groups (among variables present in at least 75% of samples) were detected in R (version 3.5.3) and RStudio (version 1.1.383) using the Kruskal-Wallis test and the Dunn-Bonferroni test, with P values below 0.05 considered significant. The beta diversity of the multi-omics data was determined using principal coordinate analysis (PCoA) of the Bray-Curtis distance.
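As a small illustration of the distance underlying the PCoA, the following Python sketch computes the Bray-Curtis dissimilarity between two abundance profiles. The analysis in this document was carried out in R; this standalone version, with the hypothetical function names `bray_curtis` and `distance_matrix`, is only meant to make the formula concrete. Returning 0.0 when both profiles are empty is an assumption.

```python
from typing import List, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        return 0.0  # assumption: two all-zero profiles are treated as identical
    return numerator / denominator


def distance_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Bray-Curtis distances, the usual input to a PCoA."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]


print(bray_curtis([6, 7, 4], [10, 0, 6]))
```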
Example 1 Performance of different software in performing the search
Using MLI data sets and the large macroprotein database, we compared the performance of commercial software (Proteome Discoverer, PEAKS, ProteinPilot and Byonic) and open-source software (MaxQuant, MSFragger and pFind) in searching for hemitryptic peptides on several 36-core servers (each with 192 GB of memory). Proteome Discoverer, Byonic, MaxQuant, pFind and ProteinPilot did not complete the search within one month, and MSFragger crashed with an out-of-memory error. Only PEAKS completed the analysis within one month, so further high-throughput analysis was carried out on a 156-core high-performance computing cluster, which completed the database search within 2 weeks.
Example 2 database search
The database search process comprises two main steps: (1) de novo sequencing and a first search are performed using the large macroprotein database (large database) and PEAKS software to obtain the proteins for which at least one peptide is identified, and a corresponding small protein database (reduced database) is generated from them; (2) a second search is performed using the reduced database and several software packages to improve the accuracy of hemitrypsin polypeptide identification.
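The construction of the reduced database in step (1) can be sketched as follows. This is a minimal illustration, not the PEAKS workflow itself: the function name `build_reduced_database` and the representation of first-pass results as (peptide, protein accession) pairs are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def build_reduced_database(
    large_db: Dict[str, str],
    first_pass_hits: Iterable[Tuple[str, str]],
) -> Dict[str, str]:
    """Keep only proteins supported by at least one first-pass peptide.

    large_db maps protein accession -> sequence.
    first_pass_hits holds (peptide, accession) pairs from the first search.
    """
    supported = {accession for _peptide, accession in first_pass_hits}
    return {acc: seq for acc, seq in large_db.items() if acc in supported}


large_db = {"P1": "MKTAYIAK", "P2": "MSEQNNTEMTFQIQR", "P3": "MAAAK"}
hits = [("TAYIAK", "P1"), ("MKTAYIAK", "P1"), ("MAAAK", "P3")]
print(sorted(build_reduced_database(large_db, hits)))
```

The reduced database is then the search space for the second, multi-engine step.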
To cope with the increased search space and time in identifying macroproteome hemitrypsin polypeptides, a first search was performed with PEAKS DB on a cluster configured with Intel(R) Xeon(R) processors (156 cores) and 1.5 TB of 2666 MHz memory. The software first performed de novo sequencing and then a database search with the following parameters: a parent ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 0.02 Da; carbamidomethylation of cysteine set as a fixed modification; at most 3 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; trypsin as the enzyme, with semi-specific cleavage and at most 3 missed cleavage sites; the false discovery rate (FDR) was set to 1%.
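To make the two mass tolerance settings concrete, the sketch below shows how a relative tolerance in ppm and an absolute tolerance in Da are typically evaluated. It is a generic illustration, not code from any of the search engines; the function names are hypothetical and the default values simply mirror the 10 ppm and 0.02 Da settings above.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def precursor_matches(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """Relative (ppm) tolerance, as used here for the parent ion."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_matches(observed: float, theoretical: float, tol_da: float = 0.02) -> bool:
    """Absolute (Da) tolerance, as used here for fragment ions."""
    return abs(observed - theoretical) <= tol_da


print(precursor_matches(1000.005, 1000.0))  # about 5 ppm off
print(fragment_matches(500.05, 500.0))      # 0.05 Da off
```

A ppm tolerance scales with the mass being measured, while a Da tolerance is the same for every fragment.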
A two-step search strategy is used here to increase the sensitivity of the database search: proteins identified by at least one peptide in the first search step are retained for the second, multi-engine search step, which uses PEAKS DB, MaxQuant (version 1.6.2) and pFind (version 3.1.5).
The MaxQuant (version 1.6.2.10) search was performed using the Andromeda engine with the following parameters: a first-search mass tolerance of 20 ppm and a main-search mass tolerance of 4.5 ppm; trypsin as the enzyme, with semi-specific cleavage and at most 2 missed cleavage sites; carbamidomethylation of cysteine set as a fixed modification; at most 5 variable post-translational modifications per peptide, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; FDR set to 1%, with peptides whose posterior error probability (PEP) was below 5% retained for subsequent analysis. The "Second peptides" option was enabled to search for co-fragmented peptides in the MS/MS spectra. The "Match between runs" option was enabled, with a match time window of 0.7 minutes and an alignment time window of 20 minutes. Proteins and peptides were quantified using the label-free quantification (LFQ) algorithm, with a minimum ratio count of 1 and minimum and average numbers of neighbors of 3 and 6, respectively.
The pFind database search used a parent ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 20 ppm in open-search mode, with trypsin as the enzyme, semi-specific cleavage, and at most 3 missed cleavage sites.
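The "missed cleavage sites" limit used by all three engines can be illustrated by counting the internal trypsin sites that remain uncut in a peptide. The sketch below assumes the common convention that trypsin cleaves after K or R except when the next residue is P; that proline exception is an assumption and is not stated in this document, so it is exposed as a switch. The function name is hypothetical.

```python
def count_missed_cleavages(peptide: str, proline_rule: bool = True) -> int:
    """Count internal K/R residues that trypsin could have cut but did not.

    The C-terminal residue is not counted, since cutting there produced the
    peptide. With proline_rule=True (an assumed convention), K or R followed
    by P is not treated as a cleavage site.
    """
    missed = 0
    for i, residue in enumerate(peptide[:-1]):
        if residue in "KR":
            if proline_rule and peptide[i + 1] == "P":
                continue
            missed += 1
    return missed


for pep in ("PEPTIDEK", "AKDERLLK", "AKPEPTIDER"):
    print(pep, count_missed_cleavages(pep))
```

A peptide would pass the PEAKS DB and pFind settings above if this count is at most 3, and the MaxQuant setting if it is at most 2.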
Only peptides identified by all three search engines (PEAKS DB, MaxQuant and pFind) were retained for further analysis.
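The consensus filter amounts to a set intersection over the peptide lists reported by each engine. The following sketch assumes that each engine's output has already been reduced to a collection of plain peptide sequences; the function name `consensus_peptides` and the example peptides are hypothetical.

```python
from typing import Dict, Iterable, Set


def consensus_peptides(per_engine: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the peptides reported by every engine in per_engine."""
    peptide_sets = [set(peptides) for peptides in per_engine.values()]
    if not peptide_sets:
        return set()
    return set.intersection(*peptide_sets)


results = {
    "PEAKS DB": ["AAVEK", "LLTGR", "PEPTIDEK"],
    "MaxQuant": ["LLTGR", "PEPTIDEK", "GGSVK"],
    "pFind": ["PEPTIDEK", "LLTGR"],
}
print(sorted(consensus_peptides(results)))
```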
Example 3 Identification, classification and functional analysis of hemitrypsin polypeptides
1. Identification principle of hemitrypsin polypeptides
Peptides whose identified sequence did not have R or K at the first amino acid were hemitrypsin N-terminal peptides (excluding protein N-terminal peptides). Peptides whose identified sequence did not have R or K at the last amino acid were hemitrypsin C-terminal peptides (excluding protein C-terminal peptides). In-source fragments (in-source CID fragments) were distinguished from proteolytically derived hemitrypsin polypeptides by elution time: most in-source fragments showed retention times that differed from their theoretical retention times (predicted using SSRCalc). Microbial hemitrypsin polypeptides were distinguished from human-derived and food-derived peptides by the corresponding accession numbers in the FASTA sequence entries.
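The identification rule can be sketched in Python as follows. The sketch rests on a stated assumption: the "first amino acid of the identified sequence" is taken to be the flanking residue reported immediately before the peptide (as in a `K.PEPTIDER.A` style notation), since a fully tryptic peptide is preceded by K or R rather than starting with it. The function name, the use of "-" to mark a protein terminus, and the returned labels are all hypothetical.

```python
TRYPTIC_RESIDUES = {"K", "R"}


def classify_peptide(peptide: str, prev_residue: str, next_residue: str) -> str:
    """Classify a peptide by the specificity of its two termini.

    prev_residue / next_residue are the flanking residues in the parent
    protein; "-" marks the protein N- or C-terminus, which is excluded from
    the hemitrypsin definition.
    """
    n_term_ok = prev_residue == "-" or prev_residue in TRYPTIC_RESIDUES
    c_term_ok = next_residue == "-" or peptide[-1] in TRYPTIC_RESIDUES
    if n_term_ok and c_term_ok:
        return "complete_trypsin"
    if c_term_ok:
        return "hemitrypsin_N_terminal"  # non-tryptic N-terminus
    if n_term_ok:
        return "hemitrypsin_C_terminal"  # non-tryptic C-terminus
    return "non_specific"


print(classify_peptide("PEPTIDEK", "K", "A"))
print(classify_peptide("EPTIDEK", "P", "A"))
print(classify_peptide("PEPTID", "R", "E"))
```

The retention-time check against SSRCalc predictions and the accession-based separation of microbial, human and food peptides are separate steps and are not covered by this sketch.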
2. Combining hemitrypsin and complete trypsin data to quantify the degree of protein hydrolysis
We determined changes in the degree of proteolysis from the normalized relative abundance of hemitryptic peptides (NRASP), obtained by normalizing the relative abundance of hemitryptic peptides to the relative abundance of complete tryptic peptides. This normalization step is important: if the abundances of the hemitrypsin polypeptides and the complete trypsin polypeptides change in proportion, the degree of proteolysis has generally not changed, yet comparing hemitrypsin polypeptides alone would still show a difference between groups.
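A minimal sketch of the NRASP calculation is given below. It assumes that "relative abundance" means the share of a group (a taxon, biological process or enzyme) in the summed intensity of one sample, computed separately for hemitrypsin and complete trypsin peptides; this reading and the function names are assumptions, not details confirmed by the source.

```python
from typing import Dict


def relative_abundance(intensities: Dict[str, float]) -> Dict[str, float]:
    """Share of each group in the summed intensity of one sample."""
    total = sum(intensities.values())
    return {group: value / total for group, value in intensities.items()}


def nrasp(semi: Dict[str, float], full: Dict[str, float]) -> Dict[str, float]:
    """Relative hemitrypsin abundance divided by relative complete-trypsin abundance."""
    rel_semi = relative_abundance(semi)
    rel_full = relative_abundance(full)
    return {
        group: rel_semi[group] / rel_full[group]
        for group in rel_semi
        if rel_full.get(group, 0) > 0
    }


# Hemitrypsin and complete-trypsin intensities vary in proportion: NRASP stays at 1.
print(nrasp({"A": 10, "B": 30}, {"A": 100, "B": 300}))
# Group A carries a larger share of hemitrypsin signal than of complete-trypsin signal.
print(nrasp({"A": 20, "B": 20}, {"A": 10, "B": 30}))
```

In the first call the raw hemitrypsin intensities differ threefold between the two groups, yet NRASP is 1 for both, which is the situation the normalization is meant to handle.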
3. Results
To improve the sensitivity of macroproteome analysis over a large sequence space, we adopted a two-step database search strategy. This effectively reduces the size of the macroprotein database to that used in traditional proteomic analysis, which makes hemitrypsin-based macroproteome searches feasible. Furthermore, the confidence of peptide identification was improved by combining three commonly used software packages. These packages use different algorithms for peak matching, recognition of co-eluting peptides and FDR calculation (MaxQuant and pFind use the target-decoy strategy, whereas PEAKS DB uses the decoy-fusion method), which significantly increases the confidence of peptide identification. Only the peptides identified by all three software packages were retained for further analysis.
A total of 12,828,005 MS/MS spectra were searched, yielding 3,804,903 (29.66%) peptide-spectrum matches (PSMs); 125,494 peptides were identified from the fecal macroproteomes, of which 108,784 (86.68%) were microbe-specific peptides (not shared with human or food sequences). 7,969 (6.35%) human-specific polypeptides were identified in the fecal macroproteomes, of which 5,104 (64.05%) were hemitryptic. Gene Ontology (GO) analysis showed that 84.13% of human hemitryptic peptides were derived from potential extracellular proteins, whereas only 1.16% of microbial hemitryptic peptides were derived from potential extracellular proteins.
Example 4 Verification of the method by analyzing proteolytic characteristics in the E. coli heat shock response
We validated our method by analyzing heat-shock-induced proteolytic features in the published proteome data set of E. coli K12. 9937 peptides were identified using the large macroprotein database described above in conjunction with the three search engines, while 14111 peptides were identified using the UniProt E. coli K12 reference database. The number of peptides identified with the large database was thus 29.6% lower than with the reference database, reflecting the normal loss of sensitivity, since the large database contains about 10,000 times more sequences than the conventional reference database.
Of the 14111 peptides identified with the UniProt E. coli K12 reference database, 83.7% had PEP values below 0.01 and 61.6% had PEP values below 0.001. In contrast, of the 4783 peptides identified only with the UniProt E. coli K12 reference database (not identified with the macroprotein database), 60.3% had PEP values below 0.01 and 39.5% had PEP values below 0.001. The peptides identified only with the reference database thus had higher PEP values, indicating that low-quality peptide-spectrum matches (PSMs) are more susceptible to the sensitivity loss of large-database searches. It is also noteworthy that a single microbial proteome differs markedly from the intestinal macroproteome. Recent studies have shown that macroprotein databases assembled from large public databases and sample-matched reference databases produce comparable results in intestinal macroproteomics studies. Therefore, our method does not show significant sensitivity loss in intestinal macroproteome analysis. 93.4% of the peptides identified with the large macroprotein database were consistent with those identified with the E. coli reference database, indicating that the method identifies peptides with high accuracy.
To validate our approach, we compared the NRASP (as an indicator of proteolytic regulation) of the 185 biological processes found in all samples, and found that the NRASP of 20 (about 10.8%) biological processes differed significantly between the control and heat-stressed groups (P value < 0.05, Figure 2).
Heat stress disturbs protein folding, leading to the accumulation of misfolded proteins that need to be refolded into the correct conformation. Accordingly, using our method, we found that the NRASP associated with protein folding decreased under heat stress, while the NRASP associated with protein refolding increased. We also observed an increase in methylation-associated NRASP under heat stress, which is consistent with recent findings. In conclusion, the findings on proteolytic regulation obtained with our method are biologically plausible and reliable.
Example 5 Taxonomic classification and functional analysis of peptides
All peptides were analyzed with Unipept (version 4.3.5) against UniProt 2020.01, based on the lowest common ancestor (LCA) algorithm, with the following parameters: I and L equated, duplicate peptides filtered, and advanced missed cleavage handling enabled. The taxonomic information was visualized using the Sunburst view provided by Unipept.
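The idea behind a lowest common ancestor assignment can be sketched as follows. This is a simplified illustration and not Unipept's implementation, which handles missing ranks and other special cases; the function names and the representation of each lineage as a root-to-leaf list of taxon names are assumptions. The helper `equate_il` shows the I/L equalization, which is applied because isoleucine and leucine have the same mass.

```python
from typing import List, Sequence


def equate_il(peptide: str) -> str:
    """Treat isoleucine and leucine as the same residue before matching."""
    return peptide.replace("I", "L")


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Longest shared prefix of root-to-leaf lineages.

    The last element of the returned list is the LCA taxon; an empty list
    means the lineages share no rank (or no lineage was given).
    """
    common: List[str] = []
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            common.append(taxa_at_rank[0])
        else:
            break
    return common


matches = [
    ["Bacteria", "Firmicutes", "Clostridia"],
    ["Bacteria", "Firmicutes", "Bacilli"],
]
print(lowest_common_ancestor(matches))
print(equate_il("AIGLK"))
```

A peptide matching proteins from both lineages in the example would be assigned at the phylum level, since that is the deepest rank on which all of its matches agree.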
Results of the study
(1) Relative abundance and distribution of hemitrypsin polypeptides
Figure 1 shows the NRASP of 20 major bacterial species, 35 major biological processes and 32 enzyme subclasses identified in at least 75% of the 447 fecal macroproteome samples from the CD (n = 204), UC (n = 123) and control (n = 120) groups. The median NRASP from the phyla Firmicutes and Bacteroidetes, Bacteroides and Clostridia, Bacteroides and Bacteroides, and Bacteroides (Bacteroides) was around 1, indicating that the relative abundance of the corresponding hemitryptic peptides was comparable to that of the complete tryptic peptides (Figure 1A). However, the median NRASP increased to about 1.25 in the families Lachnospiraceae and Ruminococcaceae and to about 1.5 in the genera Roseburia and Prevotella and in the species Faecalibacterium prausnitzii and Prevotella copri, while it decreased to about 0.5 in the phylum Actinobacteria and the order Bifidobacteriales. These data indicate that different intestinal bacteria undergo different degrees of proteolysis.
The median NRASP of most biological processes also fluctuated around 1 (Figure 1B). The NRASP values of the isoleucine biosynthetic process, valine biosynthetic process, bacterial flagellum-dependent cell motility, protein transport, carboxylic acid metabolic process, fucose metabolic process and glucose metabolic process all increased to 1.75-2, and the NRASP of the fatty acid metabolic process and the L-threonine catabolic process increased further to 2.5, whereas the NRASP of the polysaccharide catabolic process, carbohydrate transport and transmembrane transport decreased to about 0.75, and the NRASP of metabolic process decreased further to 0.3.
At the enzyme level, NRASP was highest for 3-hydroxybutyryl-CoA dehydrogenase involved in butyrate metabolism (median > 3), followed by 3-hydroxybutyryl-CoA dehydrogenase involved in fatty acid beta-oxidation, glycine C-acetyltransferase involved in L-threonine degradation, phosphoenolpyruvate carboxykinase (ATP) involved in gluconeogenesis, ketol-acid reductoisomerase involved in branched-chain amino acid (BCAA) biosynthesis, and superoxide dismutase involved in the antioxidant stress response (median NRASP 2-3, Figure 1C).
Claims (10)
1. A method of determining the degree of proteolysis, comprising the steps of:
S1, obtaining (macro)proteome data of a sample, either newly acquired or published in a public database;
S2, performing a first search using a large macroprotein database and PEAKS DB software to obtain the proteins for which at least one peptide is identified;
S3, performing database search identification on the omics data against the protein sequences obtained in S2 by using PEAKS DB software, MaxQuant software and pFind software, and retaining the peptides identified simultaneously by the PEAKS DB, MaxQuant and pFind software;
S4, distinguishing hemitrypsin polypeptides and complete trypsin polypeptides among the peptides obtained in S3;
and S5, determining the degree of proteolysis by using the normalized relative abundance of the hemitrypsin polypeptides, wherein the normalized relative abundance of the hemitrypsin polypeptides is obtained by normalizing the relative abundance of the hemitrypsin polypeptides to the relative abundance of the complete trypsin polypeptides.
2. The method of determining the degree of proteolysis of claim 1, wherein the identity of the hemitrypsin polypeptide in S4 is determined by: the identified peptide fragment is a hemi-trypsin N-terminal peptide if the first amino acid is not R or K (excluding the protein N-terminal peptide fragment), and the identified peptide fragment is a hemi-trypsin C-terminal peptide if the last amino acid is not R or K (excluding the protein C-terminal peptide fragment).
3. The method of claim 1, wherein the PEAKS DB software performs the search using the following parameters: the mass tolerance of the parent ion is 10 ppm, and the mass tolerance of the fragment ion is 0.02 Da; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 3, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the enzyme is trypsin, the cleavage mode is semi-specific, and the maximum number of missed cleavage sites is 3; the false discovery rate is set to 1%.
4. The method of claim 1, wherein MaxQuant performs the search using the following parameters: the first-search mass tolerance is 20 ppm, and the main-search mass tolerance is 4.5 ppm; the enzyme is trypsin, the cleavage mode is semi-specific, and the maximum number of missed cleavage sites is 2; carbamidomethylation of cysteine is set as a fixed modification; the maximum number of variable post-translational modifications per peptide is 5, including acetylation of the protein N-terminus, oxidation of methionine, deamidation of asparagine and glutamine, and conversion of glutamine to pyroglutamic acid; the false discovery rate is set to 1%, and peptides with a posterior error probability of less than 5% are retained for subsequent analysis.
5. The method of determining the degree of proteolysis of claim 1, wherein pFind performs the search using the following parameters: the mass tolerance of the parent ion is 10 ppm, the mass tolerance of the fragment ion is 20 ppm, the search mode is open search, the enzyme is trypsin, the cleavage mode is semi-specific, and the maximum number of missed cleavage sites is 3; the FDR is set to 1%.
6. Use of the method of any one of claims 1 to 5.
7. Use according to claim 6, wherein the method is used to capture characteristic information of intestinal microbial proteolysis.
8. The use according to claim 6, wherein the method is used for studying gut microbial and host interaction.
9. Use according to claim 6, wherein the method is used for studying diseases associated with bacterial proteases.
10. The use according to claim 9, wherein said diseases include, but are not limited to, bacterial infections, inflammatory bowel disease.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011415023.0A CN112786105B (en) | 2020-12-07 | 2020-12-07 | Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112786105A (en) | 2021-05-11 |
CN112786105B (en) | 2024-05-07 |
Family
ID=75750749
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011415023.0A Active CN112786105B (en) | 2020-12-07 | 2020-12-07 | Macro-proteome excavation method and application thereof in obtaining proteolytic characteristics of intestinal microorganisms |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112786105B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115267033A (en) * | 2022-08-05 | 2022-11-01 | 西湖大学 | Macro-proteomics analysis method based on mass spectrum data and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112786105B (en) | 2024-05-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||