CN114882942A - Quantitative proteomics analysis method for FLASH irradiated tissue


Info

Publication number
CN114882942A
Authority
CN
China
Legal status
Pending
Application number
CN202210349001.1A
Other languages
Chinese (zh)
Inventor
胡广
吴代
胡文涛
肖飞
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Application filed by Suzhou University
Priority to CN202210349001.1A
Publication of CN114882942A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a quantitative proteomics analysis method for FLASH-irradiated tissues. The method comprises the following steps. First, raw data are acquired and preprocessed. Differential expression analysis of the preprocessed data then yields three groups of differentially expressed proteins: FLASH irradiation group versus conventional irradiation group, FLASH irradiation group versus control group, and conventional irradiation group versus control group. Next, a protein interaction network is constructed for each of the three groups and for their union, giving four networks, and each network is divided into modules from which the important modules are screened. Subcellular localization analysis is performed on two of the groups of differentially expressed proteins (FLASH irradiation group versus control group, and conventional irradiation group versus control group). Finally, the subcellular localization results are related to the important modules, which completes the quantitative proteomics analysis of the FLASH-irradiated tissue. The method provides a reliable way to analyze the biological principles of FLASH irradiation and to explain, at the level of molecular mechanism, why FLASH irradiation is superior to conventional irradiation.

Description

Quantitative proteomics analysis method for FLASH irradiated tissue
Technical Field
The invention relates to the technical field of FLASH irradiation, in particular to a quantitative proteomics analysis method of FLASH irradiated tissues.
Background
FLASH irradiation is a novel radiotherapy technique that delivers radiation at an ultra-high dose rate within a very short irradiation time. FLASH spares normal tissues from dose-limiting toxicity while maintaining antitumor efficacy. This makes it possible to increase the therapeutic dose without further damaging the surrounding healthy tissue. However, the biological principles of this irradiation technique are currently unclear, and no streamlined data analysis method exists. The FLASH irradiation data take the form of protein expression abundances. Their biological significance can only be interpreted with bioinformatics, but no reliable analysis workflow is currently available for processing them.
Therefore, building on existing FLASH irradiation technology, those skilled in the art need a way to perform a series of bioinformatic analyses on protein abundance data from FLASH-irradiated tissue, so that the biological principles of FLASH irradiation can be determined.
Disclosure of Invention
In view of the above problems, the present invention proposes a quantitative proteomics analysis method for FLASH-irradiated tissues that solves at least some of the above technical problems. The method makes it possible to analyze the biological principles of FLASH irradiation and to explain, at the level of molecular mechanism, why FLASH irradiation is superior to conventional irradiation.
An embodiment of the invention provides a quantitative proteomics analysis method for FLASH-irradiated tissues, comprising the following steps:
S1, acquiring raw data. The raw data include protein expression abundance data of FLASH-irradiated tissues, of tissues irradiated by conventional radiotherapy, and of tissues without any irradiation.
S2, preprocessing the raw data and performing differential expression analysis on the preprocessed data to obtain three groups of differentially expressed proteins: FLASH irradiation group versus conventional irradiation group, FLASH irradiation group versus control group, and conventional irradiation group versus control group. The FLASH irradiation group consists of the protein expression abundance data of the FLASH-irradiated tissues. The conventional irradiation group consists of the protein expression abundance data of tissues irradiated by conventional radiotherapy. The control group consists of the protein expression abundance data of tissues without any irradiation.
S3, constructing a protein interaction network for each of the three groups of differentially expressed proteins and for the union of the three groups, yielding four networks. Each of the four networks is divided into modules, and important modules are screened out. The screening criteria are the number of proteins a module contains, the types of proteins it contains, and the number of important proteins it contains that are related to the mechanism of action of FLASH.
S4, performing subcellular localization analysis on two groups of differentially expressed proteins: FLASH irradiation group versus control group, and conventional irradiation group versus control group. The correspondence between the subcellular localization results and the important modules is then established, which completes the quantitative proteomics analysis of the FLASH-irradiated tissue.
Further, step S2 also includes:
performing Gene Ontology (GO) functional enrichment analysis and KEGG pathway enrichment analysis on each of the three groups of differentially expressed proteins, and visualizing the results.
Further, step S3 also includes:
performing topological and biological analysis on each of the networks constructed from the three groups of differentially expressed proteins. The topological analysis covers the degree and clustering coefficient of the protein nodes. The biological analysis covers the GO functional enrichment results, the KEGG pathway enrichment results, the tissue expression results and the protein domain results;
visualizing the KEGG pathway enrichment results and comparing them with the visualized KEGG pathway enrichment results obtained for the three groups of differentially expressed proteins before network construction, to identify differences in the KEGG pathway enrichment results before and after network construction.
Further, step S3 also includes:
performing KEGG pathway enrichment analysis and visualization on each of the important modules;
determining the functions of the important modules, and screening out the important modules that share the same function;
obtaining and visualizing the protein-protein interactions between important modules with the same function;
merging important modules with the same function into a Robust module.
Further, determining the functions of the important modules comprises:
comparing two sets of results to determine the functions of the important modules. The first set is the important modules of the network constructed from the union of the three groups of differentially expressed proteins, together with their KEGG pathway enrichment results. The second set is the important modules of the networks constructed from each of the three groups separately, together with their KEGG pathway enrichment results.
Further, determining the functions of the important modules also comprises:
for an important module whose function cannot be determined in this way, examining the biological processes reported by GO functional enrichment analysis to help determine its function.
Further, step S4 also includes:
performing KEGG pathway enrichment analysis on those proteins from the two groups of differentially expressed proteins that share the same subcellular localization in the subcellular localization results. The two groups of differentially expressed proteins are FLASH irradiation group versus control group, and conventional irradiation group versus control group.
Further, the method also includes:
S5, performing disordered protein analysis on two groups of differentially expressed proteins: FLASH irradiation group versus control group, and conventional irradiation group versus control group.
Further, the method also includes:
S6, classifying, visualizing and comparing the differentially expressed proteins of the FLASH irradiation group and of the conventional irradiation group, each relative to the control group.
Further, the method also includes:
S7, classifying, visualizing and comparing the differentially expressed proteins of the Robust modules.
The technical solution provided by the embodiments of the invention has at least the following beneficial effects:
The embodiment of the invention provides a quantitative proteomics analysis method for FLASH-irradiated tissues. Raw data are acquired and preprocessed. Differential expression analysis of the preprocessed data then yields three groups of differentially expressed proteins: FLASH irradiation group versus conventional irradiation group, FLASH irradiation group versus control group, and conventional irradiation group versus control group. A protein interaction network is constructed for each of the three groups and for their union, giving four networks, and each network is divided into modules from which the important modules are screened. Subcellular localization analysis is performed on the two groups of differentially expressed proteins compared with the control group. The subcellular localization results are then related to the important modules, which completes the quantitative proteomics analysis of the FLASH-irradiated tissue. The method provides a reliable way to analyze the biological principles of FLASH irradiation and to explain, at the level of molecular mechanism, why FLASH irradiation is superior to conventional irradiation.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of a quantitative proteomics analysis method of FLASH-irradiated tissues according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the invention provides a quantitative proteomics analysis method for FLASH-irradiated tissues which, as shown in Fig. 1, comprises the following steps:
S1, acquiring raw data. The raw data include protein expression abundance data of FLASH-irradiated tissues, of tissues irradiated by conventional radiotherapy, and of tissues without any irradiation.
S2, preprocessing the raw data and performing differential expression analysis on the preprocessed data to obtain three groups of differentially expressed proteins: FLASH irradiation group versus conventional irradiation group, FLASH irradiation group versus control group, and conventional irradiation group versus control group. The FLASH irradiation group consists of the protein expression abundance data of FLASH-irradiated tissues. The conventional irradiation group consists of the protein expression abundance data of tissues irradiated by conventional radiotherapy. The control group consists of the protein expression abundance data of tissues without any irradiation.
S3, constructing a protein interaction network for each of the three groups of differentially expressed proteins and for the union of the three groups, yielding four networks. Each of the four networks is divided into modules, and important modules are screened out. The screening criteria are the number of proteins a module contains, the types of these proteins, and the number of important proteins related to the mechanism of action of FLASH.
S4, performing subcellular localization analysis on two groups of differentially expressed proteins: FLASH irradiation group versus control group, and conventional irradiation group versus control group. The correspondence between the subcellular localization results and the important modules is then established, which completes the quantitative proteomics analysis of the FLASH-irradiated tissue.
This embodiment provides an analysis workflow for performing a series of bioinformatic analyses on protein abundance data from FLASH-irradiated tissues. Starting from the raw data, the workflow carries out multi-level, multi-perspective bioinformatic analyses. These analyses verify, to some extent, the mechanism by which FLASH irradiation exerts its effect, and they allow the overall protein expression of the organism to be analyzed more completely. For proteome data generated by FLASH irradiation, the workflow covers the whole path from the raw instrument output to the final annotation results and interprets the biological meaning of those results. This further reveals the mechanism of action of FLASH and ultimately explains, at the molecular level, why FLASH irradiation is superior to conventional irradiation.
The quantitative proteomics analysis method for FLASH-irradiated tissue is described in detail below with a specific example:
Stage one: raw data processing
Raw data are acquired: proteomics data of FLASH-irradiated mouse intestinal tissues, i.e. protein expression abundance data, were obtained from the laboratory. The expression abundance data of the proteins to be studied are then screened from the raw data. This step can be skipped if all of the data are to be studied. Here, the raw expression abundance data of the FLASH irradiation group, the conventional irradiation group and the control group were screened from the raw data.
Stage two: data preprocessing and feature inspection
The screened data are preprocessed by normalization and missing-value handling. For normalization, the overall distribution of the screened data is inspected. If the distribution is skewed, the data are log-transformed, which can be done in R with the built-in log2() function. For missing-value handling, proteins whose abundance values are missing in more than 30% of the samples are removed. For proteins with fewer than 30% missing values, each missing value is filled with the mean of that protein's observed abundance values across the samples treated under the same condition.
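The preprocessing rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code. It assumes the abundances are held in a plain dict that maps each protein to one value per sample (None for a missing value), plus a parallel list of condition labels. The function name `preprocess` is hypothetical. The log2 step is optional because the text applies it only when the distribution is skewed.

```python
import math
from statistics import fmean


def preprocess(abundance, groups, max_missing=0.30, log_transform=True):
    """Filter and impute protein abundances (hypothetical helper).

    abundance: dict mapping protein ID -> list of values, one per sample,
               with None marking a missing value.
    groups:    list of condition labels ("F", "C", "N"), one per sample.
    """
    n_samples = len(groups)
    cleaned = {}
    for protein, values in abundance.items():
        # Drop proteins missing in more than 30% of all samples.
        if sum(v is None for v in values) / n_samples > max_missing:
            continue
        # Optional log2 transform for skewed distributions.
        if log_transform:
            values = [None if v is None else math.log2(v) for v in values]
        filled = list(values)
        # Fill each gap with the mean of the observed values in the same
        # condition (a gap stays None if the whole condition is missing).
        for label in set(groups):
            idx = [i for i, g in enumerate(groups) if g == label]
            observed = [values[i] for i in idx if values[i] is not None]
            for i in idx:
                if filled[i] is None and observed:
                    filled[i] = fmean(observed)
        cleaned[protein] = filled
    return cleaned


data = {
    "P1": [1.0, 2.0, None, 4.0, 8.0, 8.0],   # 1/6 missing: kept and imputed
    "P2": [None, None, 1.0, 1.0, 1.0, 1.0],  # 2/6 missing: dropped
}
groups = ["F", "F", "C", "C", "N", "N"]
print(preprocess(data, groups))  # {'P1': [0.0, 1.0, 2.0, 2.0, 3.0, 3.0]}
```

In the toy data, P1 keeps its single gap filled from the other conventional-irradiation sample. P2 is missing in 2 of 6 samples (about 33%), which exceeds the 30% cut-off, so it is removed.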
The overall characteristics of the preprocessed data are then inspected with heat maps, principal component analysis (PCA) and sample correlation analysis. These analyses and visualizations were done in R with the pheatmap and corrplot packages and the prcomp function.
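Sample correlation analysis amounts to computing a Pearson correlation between every pair of sample columns. The following is a minimal stdlib sketch that assumes a complete (already imputed) protein-by-sample matrix given as a list of rows. The helper names are hypothetical, and the sketch does not reproduce the corrplot plotting used in the document.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_correlation_matrix(rows):
    """rows: one list per protein, one column per sample, no missing values."""
    samples = list(zip(*rows))  # transpose: one tuple per sample
    return [[pearson(a, b) for b in samples] for a in samples]


matrix = [[1.0, 2.0, 2.0],
          [2.0, 4.0, 1.0],
          [3.0, 6.0, 0.0]]
corr = sample_correlation_matrix(matrix)
```

In this toy matrix, sample 2 is exactly twice sample 1, so their correlation is 1. Sample 3 runs in the opposite direction, so its correlation with sample 1 is -1. Replicates of the same condition would be expected to show high mutual correlation.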
Stage three: differential expression analysis
Differential expression analysis of the preprocessed data yields three groups of differentially expressed proteins: FLASH irradiation group (F) versus conventional irradiation group (C) [FC], FLASH irradiation group (F) versus control group (N) [FN], and conventional irradiation group (C) versus control group (N) [CN]. This was done with the limma package in R. The FC comparison yielded 100 differentially expressed proteins, the FN comparison 345 and the CN comparison 102.
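limma itself fits linear models with empirical-Bayes moderated t-statistics, which is beyond a short sketch. The fragment below illustrates only the final selection step, under stated assumptions. It takes per-protein log2 fold changes and raw p-values such as limma would report. It then applies the Benjamini-Hochberg correction, which is the default adjustment in limma's topTable. Finally, it keeps proteins with |log2FC| >= 1 and adjusted p < 0.05. These cut-offs and the function names are hypothetical, because the document does not state the thresholds it used.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_proteins(results, fc_cutoff=1.0, fdr_cutoff=0.05):
    """results: dict protein -> (log2 fold change, raw p-value). Cut-offs are hypothetical."""
    proteins = list(results)
    adj = benjamini_hochberg([results[p][1] for p in proteins])
    return sorted(p for p, q in zip(proteins, adj)
                  if abs(results[p][0]) >= fc_cutoff and q < fdr_cutoff)


toy = {"A": (2.0, 0.01), "B": (0.5, 0.001), "C": (-1.5, 0.04), "D": (3.0, 0.5)}
print(select_de_proteins(toy))  # ['A']
```

In the toy input, B fails the fold-change cut-off. C's raw p-value of 0.04 becomes about 0.053 after adjustment, so C fails the FDR cut-off, and only A passes both.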
The overlaps among the three groups of differentially expressed proteins are then examined. Analysis and visualization were performed with the online tool Venn2.1.0 and the UpSet package in R. The FN and FC groups share 26 differentially expressed proteins and the CN and FN groups share 25. The FC and CN groups share none, and no protein is common to all three groups.
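The overlap counts reported above are plain set intersections. The minimal sketch below assumes that each comparison's differentially expressed proteins are available as a Python set of identifiers. The function name `overlaps` is hypothetical.

```python
from itertools import combinations


def overlaps(groups):
    """groups: dict name -> set of protein IDs.

    Returns every pairwise intersection plus the intersection of all groups.
    """
    result = {}
    for a, b in combinations(sorted(groups), 2):
        result[(a, b)] = groups[a] & groups[b]
    result[tuple(sorted(groups))] = set.intersection(*groups.values())
    return result


toy = {"FC": {"P1", "P2"}, "FN": {"P1", "P3", "P4"}, "CN": {"P3", "P5"}}
for key, shared in overlaps(toy).items():
    print(key, len(shared))
```

Taking the lengths of these sets gives the same kind of counts as the Venn and UpSet views.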
Stage four: enrichment and network analysis
GO functional enrichment analysis and KEGG pathway enrichment analysis are performed on each of the three groups of differentially expressed proteins. The enrichment analysis was done with the clusterProfiler package in R, and the results were visualized with the ggplot2 package.
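clusterProfiler's over-representation analysis is based on the hypergeometric test. The sketch below is a minimal illustration of that test, not clusterProfiler's code, and it omits multiple-testing correction and gene-ID mapping. It computes the probability of observing at least k pathway members among n differentially expressed proteins drawn from a background of N proteins, K of which belong to the pathway. The function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): at least k pathway members among n draws from N items, K in the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment_pvalue(de_proteins, pathway_proteins, background):
    """One-sided over-representation p-value for a single pathway (no FDR correction)."""
    background = set(background)
    de = set(de_proteins) & background
    pathway = set(pathway_proteins) & background
    k = len(de & pathway)
    return hypergeom_sf(k, len(background), len(pathway), len(de))


background = [f"g{i}" for i in range(10)]
p = enrichment_pvalue(["g0", "g1", "g2"], ["g0", "g1", "g2"], background)
print(p)  # 1/120, about 0.00833
```

Restricting both inputs to the background mirrors the usual practice of testing only against proteins that could have been detected.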
A protein interaction network is constructed for each of the three groups of differentially expressed proteins and for the union of the three groups. The networks were built with the online tool and database STRING, and the results were imported into Cytoscape. This yields four networks. The FC network has 98 nodes and 76 edges, the CN network has 97 nodes and 145 edges, the FN network has 340 nodes and 1838 edges, and the union network has 434 nodes and 2520 edges.
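Restricting one global interaction list to a given protein set is what yields one network per comparison and one for the union. The minimal sketch below assumes that the STRING interactions have been exported as (protein_a, protein_b) pairs. The function names are hypothetical. Every input protein is kept as a node, including proteins with no interactions.

```python
def build_network(edges, proteins):
    """Induced subnetwork: keep interactions whose two ends are both in `proteins`.

    Every protein in `proteins` becomes a node, even if it has no interaction.
    """
    proteins = set(proteins)
    adjacency = {p: set() for p in proteins}
    for a, b in edges:
        if a != b and a in proteins and b in proteins:  # skip self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def network_size(adjacency):
    """Return (number of nodes, number of undirected edges)."""
    n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return len(adjacency), n_edges


string_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "A")]
fc, cn = {"A", "B"}, {"C", "E"}
print(network_size(build_network(string_edges, fc)))       # (2, 1)
print(network_size(build_network(string_edges, fc | cn)))  # (4, 2)
```

The union network comes from the same call applied to the set union of the groups. This also shows why the union can contain edges, such as B-C here, that appear in neither single-group network.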
Topological and biological analyses are performed on the networks constructed from the three groups of differentially expressed proteins, and the two analyses are then integrated. The topological analysis, covering node degree and clustering coefficient, was carried out with the Analyze Network function of Cytoscape. The biological analysis used STRING's online analysis results: GO functional analysis, KEGG pathway enrichment, tissue expression and protein domains. The two sets of results are combined and interpreted as the integrated analysis. However, this strategy becomes much less feasible for large networks. As the number of proteins grows, so does the number of high-degree proteins, and the PPIs among all high-degree proteins must be tallied, which greatly increases the workload. The strategy is therefore suited only to small protein interaction networks. For this reason, only the networks constructed from the FC and CN differentially expressed proteins were subjected to topological analysis, biological analysis and their integration.
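A node's clustering coefficient is the fraction of possible links among its neighbours that actually exist. For a node with k neighbours and e links among them, C = 2e / (k(k - 1)). Cytoscape's network analysis reports this value together with the node's degree. The minimal stdlib sketch below computes both from an adjacency dict that maps each protein to its set of neighbours. The names are hypothetical, and nodes with fewer than two neighbours are assigned 0.

```python
from itertools import combinations


def degrees(adjacency):
    """Node degree = number of neighbours."""
    return {node: len(nbrs) for node, nbrs in adjacency.items()}


def clustering_coefficient(adjacency, node):
    """Local clustering coefficient 2e / (k(k - 1)); 0.0 when k < 2."""
    nbrs = adjacency[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count neighbour pairs that are themselves connected.
    e = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
    return 2.0 * e / (k * (k - 1))


# Triangle A-B-C with D hanging off A.
adjacency = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(degrees(adjacency)["A"])                 # 3
print(clustering_coefficient(adjacency, "A"))  # about 0.333
```

A has the highest degree, but only one of its three neighbour pairs (B-C) is connected. This shows how a hub protein can still have a low clustering coefficient.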
The enrichment results obtained from the STRING online tool for the three groups of differentially expressed proteins were also visualized with the R package ggplot2. They were then compared with the visualized enrichment results obtained for the three groups before network construction, to check for differences between the two. The enrichment results were found to be essentially identical before and after network construction.
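The comparison above was made visually. If a numeric check were wanted, one option is the Jaccard index between the sets of enriched pathway terms before and after network construction, where 1.0 means identical sets. This metric is an assumption and not part of the described method. The function name is hypothetical and the terms below are toy inputs, not results from this study.

```python
def jaccard(before, after):
    """Jaccard index of two collections of pathway terms (1.0 = identical)."""
    before, after = set(before), set(after)
    if not before and not after:
        return 1.0
    return len(before & after) / len(before | after)


# Toy terms for illustration only.
pre_network = ["Ribosome", "Oxidative phosphorylation", "Spliceosome"]
post_network = ["Ribosome", "Oxidative phosphorylation", "Proteasome"]
print(jaccard(pre_network, post_network))  # 0.5
```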
Each of the four networks is divided into modules, and the results are visualized. Module division was performed with ne.PCA, an R package developed in-house, whose algorithm prioritizes the edges and nodes of each network to identify functional modules. Cytoscape was then used again to visualize the module division. Because the network information extracted directly from STRING is overly cluttered, the networks were rebuilt to simplify the results so that only the useful information is displayed.
Important modules are screened by three criteria: the number of proteins a module contains, the types of these proteins, and the number of important proteins related to the mechanism of action of FLASH reported in the literature. The screening proceeds as follows. First, a module containing a large number of proteins is considered important. However, the three groups contribute different numbers of differentially expressed proteins to the constructed networks, so size alone cannot select all important modules. The modules of the union network partly resemble the modules of the networks built from each group separately. Important modules of the union network can therefore also be identified by their composition, that is, how many proteins from each group they contain. Finally, some modules are not large but contain important proteins related to the mechanism of action of FLASH reported in the literature, and such modules are also considered important.
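The screening rules can be expressed as a small filter. This is a minimal sketch under stated assumptions. The size and key-protein thresholds are hypothetical, because the document gives no numeric cut-offs. Modules are assumed to be sets of protein IDs, and the set of literature-reported key proteins is supplied by the analyst. For the composition criterion, the sketch only reports how many members come from each comparison group and leaves the decision to the analyst.

```python
from collections import Counter


def module_composition(members, group_of):
    """Count how many module members come from each comparison group.

    group_of maps protein -> set of groups in which it is differentially expressed.
    """
    return Counter(g for p in members for g in group_of.get(p, ()))


def screen_modules(modules, key_proteins, min_size=10, min_key=1):
    """Flag large modules, or smaller ones holding literature-reported key proteins.

    min_size and min_key are hypothetical thresholds.
    """
    important = []
    for module_id, members in sorted(modules.items()):
        if len(members) >= min_size or len(members & key_proteins) >= min_key:
            important.append(module_id)
    return important


modules = {1: {f"p{i}" for i in range(12)}, 2: {"x", "y"}, 3: {"m", "n"}}
print(screen_modules(modules, key_proteins={"y"}))  # [1, 2]
print(module_composition({"x", "y"}, {"x": {"FN"}, "y": {"FN", "FC"}}))
```

In the toy example, module 1 passes on size alone. Module 2 is small but passes because it contains the key protein y, which corresponds to the case that made module 6 of the FN network important.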
Important modules were identified by module size for the networks constructed from the FN, FC and CN differentially expressed proteins and from their union. These are modules 1, 2 and 4 of the FN network, modules 2 and 4 of the FC network, modules 1, 2 and 3 of the CN network, and modules 2, 4, 9 and 17 of the union network. By the types of proteins contained, modules 12 and 23 of the union network were also identified as important. By the number of literature-reported important proteins related to the FLASH mechanism of action, module 6 of the FN network was also identified as important.
KEGG pathway enrichment analysis was performed on the important modules, and the results were visualized. The enrichment analysis was implemented with the clusterProfiler package in R, and the results were visualized with the ggplot2 package in R.
Two sets of results were compared: the module division and KEGG pathway enrichment results of the union network, and those of the networks built from each group separately. This comparison determined the functions of the important modules and identified the important modules that share the same function. The important modules with the same function were finally determined as follows. Modules related to DNA damage repair: module 2 of the FN network and module 9 of the union network. Modules related to energy metabolism and neurodegenerative diseases (aging): module 3 of the CN network and module 12 of the union network. Modules related to heat shock proteins but of unknown function: module 6 of the FN network and module 2 of the union network. Modules related to ribosomes but of unknown function: module 1 of the CN network and module 23 of the union network.
For modules whose function cannot be determined from KEGG pathway enrichment alone, GO functional enrichment analysis is performed and the Biological Process (BP) terms are examined to help determine the function. The GO enrichment analysis was done with the clusterProfiler package. It is unnecessary if a module's function can be determined from KEGG pathway enrichment alone. GO enrichment analysis was applied to the heat-shock-protein-related and ribosome-related modules of unknown function. It showed that the heat-shock-protein-related modules function in protein folding, while the ribosome-related modules function in mitochondrial gene expression and maintenance.
The protein-protein interactions (PPIs) between important modules with the same function were obtained and visualized. These interactions were extracted with Python from the union network and from the FN and CN networks, and the results were then visualized with Cytoscape. In the union network, the DNA damage repair module shares 5 PPIs with the aging module, 26 with the protein folding module and 1 with the mitochondrial gene expression module. The aging module shares 12 PPIs with the protein folding module and 6 with the mitochondrial gene expression module. The protein folding module shares 1 PPI with the mitochondrial gene expression module. In the FN network, the DNA damage repair module and the protein folding module share 25 PPIs. In the CN network, the aging module and the mitochondrial gene expression module share 8 PPIs.
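Counting the PPIs between two modules means keeping the edges that have one endpoint in each module. The document says this step was done in Python but does not show the code. The minimal sketch below assumes an undirected edge list in which each interaction appears once, and modules given as non-overlapping sets of protein IDs. The function name is hypothetical.

```python
from itertools import combinations


def inter_module_ppis(edges, modules):
    """Return {(module_x, module_y): [edges with one end in each module]}.

    edges: undirected (protein_a, protein_b) pairs, each listed once.
    modules: dict module name -> set of member proteins.
    """
    result = {}
    for x, y in combinations(sorted(modules), 2):
        mx, my = modules[x], modules[y]
        result[(x, y)] = [(a, b) for a, b in edges
                          if (a in mx and b in my) or (a in my and b in mx)]
    return result


edges = [("a", "x"), ("b", "y"), ("a", "b"), ("x", "z")]
modules = {"DNA repair": {"a", "b"}, "aging": {"x", "y"}}
counts = {pair: len(ppis) for pair, ppis in inter_module_ppis(edges, modules).items()}
print(counts)  # {('DNA repair', 'aging'): 2}
```

Keeping the edge lists, rather than only their counts, makes it easy to export the linking interactions for visualization.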
The differentially expressed proteins in the important modules are then screened by three criteria to identify the relatively more important ones. The criteria are node degree in the constructed network, number of enriched pathways, and the signal transduction capability of edges. The number of enriched pathways comprises the plain count of enriched pathways and the count of pathways related to the module's function. Both counts were obtained by parsing the KEGG enrichment result files with Python. The signal transduction capability of edges is defined by the TFCs function in the R package ne. In the union network, this screening identified 1 relatively more important differentially expressed protein in the DNA damage repair module, 9 in the aging module, 4 in the protein folding module and 5 in the mitochondrial gene expression module. In the FN network, it identified 2 in the DNA damage repair module and 4 in the protein folding module. In the CN network, it identified 5 in the aging module and 5 in the mitochondrial gene expression module.
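For each protein, the parsing step counts how many enriched pathways it appears in and how many of those relate to the module's function. The sketch below assumes that the result was exported from clusterProfiler as a tab-separated table with its `Description` and `geneID` columns, where `geneID` lists member genes separated by "/". The function name and the toy rows are hypothetical. The TFCs edge score is not reproduced because its definition is not given here, and the document does not state how the three criteria are combined.

```python
import csv
import io
from collections import Counter


def pathway_counts(result_tsv, module_pathways=frozenset()):
    """Per gene: number of enriched pathways, and number related to the module's function.

    result_tsv: text of a clusterProfiler-style result table (tab-separated,
    with 'Description' and '/'-separated 'geneID' columns).
    module_pathways: pathway descriptions judged relevant to the module.
    """
    total, related = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(result_tsv), delimiter="\t"):
        for gene in row["geneID"].split("/"):
            total[gene] += 1
            if row["Description"] in module_pathways:
                related[gene] += 1
    return total, related


tsv = ("ID\tDescription\tgeneID\n"
       "mmu03430\tMismatch repair\tMsh2/Pcna\n"
       "mmu03030\tDNA replication\tPcna/Mcm2\n")
total, related = pathway_counts(tsv, {"Mismatch repair"})
print(total["Pcna"], related["Pcna"])  # 2 1
```

For a file on disk, the same function can be called with the file's contents, for example `open(path).read()`.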
Information on the relatively more important differentially expressed proteins was then collected from several databases, including the corresponding gene ID, full name, tissue expression, UniProt ID, and subcellular localization. For each protein, the enriched pathways, up- or down-regulation, node degree, TFCs score, module membership, and participating PPIs were also compiled. All of this information was summarized in a table for later lookup.
The R package ne.PCA was used to merge functionally equivalent important modules into Robust modules. The inputs were the modules from the network built on the union of the three groups of differentially expressed proteins and the modules from the networks built on each group separately. In other words, the important modules that share the same function across the four networks were combined into a single Robust module. Because the networks were constructed in different ways, functionally similar modules from different networks differ slightly. The Robust module reflects the function that is most prominent across the set. Four Robust modules were finally obtained: a DNA damage repair module, an aging module, a protein folding module, and a mitochondrial gene expression module.
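The text does not describe how ne.PCA performs the merge. As a purely illustrative stand-in, the sketch below uses a different and much simpler rule: support voting. A protein is kept in the Robust module if it appears in at least `min_support` of the functionally equivalent modules. This is not ne.PCA's algorithm, and `min_support` is a hypothetical parameter.

```python
from collections import Counter


def robust_module(modules, min_support=2):
    """Consensus of functionally equivalent modules from several networks.

    modules     -- list of protein collections, one per network
    min_support -- minimum number of modules a protein must appear in
    """
    support = Counter(p for module in modules for p in set(module))
    return sorted(p for p, n in support.items() if n >= min_support)


# Hypothetical DNA damage repair modules from three of the four networks.
repair_modules = [{"P1", "P2", "P3"}, {"P1", "P2"}, {"P2", "P4"}]
print(robust_module(repair_modules))
```

Raising `min_support` yields a smaller, stricter core, which can help when the per-network modules differ only slightly.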
Stage five: subcellular localization analysis
Subcellular localization analysis was performed on two groups of differentially expressed proteins, FN and CN. The subcellular localization of each differentially expressed protein, as recorded in the UniProt database, was extracted with Python. The study mainly aims to explain why FLASH irradiation and conventional irradiation produce different biological effects. The detailed analysis therefore focused on the differentially expressed proteins of the FN and CN groups.
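The following is a minimal sketch of the extraction step. It assumes the localization text comes from the "Subcellular location [CC]" column of a UniProt tab-separated export. The parser handles a simplified form of that text:

- evidence tags in braces are removed;
- free-text notes after "Note=" are dropped;
- membrane topology after a semicolon is ignored.

Real entries can be more varied than this, and the annotation strings below are hypothetical.

```python
import re
from collections import Counter

ECO_TAG = re.compile(r"\s*\{ECO:[^}]*\}")   # evidence codes such as {ECO:0000250}
ISOFORM_TAG = re.compile(r"\[[^\]]*\]:\s*")  # prefixes such as [Isoform 2]:


def parse_locations(cc_text):
    """Extract location terms from a UniProt-style subcellular location string.

    Simplified: free-text notes are dropped, and membrane topology after ';'
    (e.g. 'Multi-pass membrane protein') is ignored.
    """
    found = []
    for block in ECO_TAG.sub("", cc_text).split("SUBCELLULAR LOCATION:"):
        block = ISOFORM_TAG.sub("", block.split("Note=")[0])
        for part in block.split("."):
            location = part.strip().lstrip(";").strip().split(";")[0].strip()
            if location and location not in found:
                found.append(location)
    return found


def localization_counts(cc_by_protein):
    """Number of proteins annotated to each location within one group."""
    counts = Counter()
    for cc_text in cc_by_protein.values():
        counts.update(parse_locations(cc_text))
    return counts


# Hypothetical annotation strings in the UniProt style.
fn_cc = {
    "P1": "SUBCELLULAR LOCATION: Nucleus {ECO:0000269}. Cytoplasm. Note=Shuttles.",
    "P2": "SUBCELLULAR LOCATION: Cytoplasm {ECO:0000305}.",
}
cn_cc = {
    "Q1": "SUBCELLULAR LOCATION: Mitochondrion inner membrane {ECO:0000250}; "
          "Multi-pass membrane protein {ECO:0000255}.",
}
print(localization_counts(fn_cc))
print(localization_counts(cn_cc))
```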
For each subcellular localization shared by the FN-group and CN-group differentially expressed proteins, KEGG pathway enrichment analysis was performed on the proteins at that localization. The enriched pathways of the two groups were then compared. The enrichment analysis used the clusterProfiler package. The FN-group differentially expressed proteins localized mainly to the cytoplasm and nucleus. The CN-group differentially expressed proteins localized mainly to the mitochondrial inner membrane.
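clusterProfiler's KEGG over-representation analysis is based on the hypergeometric test. The following stand-alone sketch illustrates the two steps involved:

1. Find the compartments occupied by both groups and list each group's proteins there.
2. Compute the one-sided hypergeometric p-value for a single pathway.

This is not clusterProfiler's code. It applies no multiple-testing correction, whereas clusterProfiler reports adjusted p-values. The input dictionaries are hypothetical.

```python
from math import comb


def by_compartment(locations):
    """Invert {protein: [compartments]} into {compartment: set of proteins}."""
    index = {}
    for protein, compartments in locations.items():
        for compartment in compartments:
            index.setdefault(compartment, set()).add(protein)
    return index


def shared_compartments(fn_locations, cn_locations):
    """Compartments occupied by both groups, with each group's protein list."""
    fn_index, cn_index = by_compartment(fn_locations), by_compartment(cn_locations)
    return {c: (sorted(fn_index[c]), sorted(cn_index[c]))
            for c in sorted(fn_index.keys() & cn_index.keys())}


def hypergeom_upper(k, n, K, N):
    """P(X >= k): chance that an n-protein query drawn from an N-protein
    universe shares at least k proteins with a K-protein pathway."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical localizations (output of the parsing step) for the two groups.
fn_locations = {"P1": ["Nucleus", "Cytoplasm"], "P2": ["Cytoplasm"]}
cn_locations = {"Q1": ["Mitochondrion inner membrane"], "Q2": ["Cytoplasm"]}
print(shared_compartments(fn_locations, cn_locations))
print(hypergeom_upper(k=2, n=2, K=2, N=4))
```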
The subcellular localization results were then compared with the important modules to establish how the two correspond. Modules associated with aging agreed relatively well with mitochondrial inner membrane localization. Modules associated with DNA damage repair agreed relatively well with nuclear localization.
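The text does not define how "degree of agreement" is measured. One hypothetical measure is the fraction of a module's annotated proteins that are found in each compartment. The sketch below computes that measure, leaving proteins without any localization annotation out of the denominator. The protein identifiers are illustrative only.

```python
from collections import Counter


def localization_agreement(module_proteins, locations):
    """Fraction of a module's annotated proteins found in each compartment.

    Proteins without any localization annotation are left out of the
    denominator. Returns {compartment: fraction}, most frequent first.
    """
    annotated = [p for p in module_proteins if locations.get(p)]
    if not annotated:
        return {}
    counts = Counter(c for p in annotated for c in set(locations[p]))
    return {c: n / len(annotated) for c, n in counts.most_common()}


# Hypothetical aging-module members and their localizations.
aging_module = ["A1", "A2", "A3", "A4", "A5"]
locations = {
    "A1": ["Mitochondrion inner membrane"],
    "A2": ["Mitochondrion inner membrane", "Cytoplasm"],
    "A3": ["Nucleus"],
    "A4": ["Mitochondrion inner membrane"],
}
print(localization_agreement(aging_module, locations))
```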
Stage six: intrinsically disordered protein analysis
The FN-group and CN-group differentially expressed proteins were first checked against database records of intrinsically disordered proteins. All disordered proteins recorded in the DisProt database were retrieved and intersected with each group's differentially expressed proteins. This identified which of them are already known to be disordered.
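The following is a minimal sketch of the intersection step. It makes two assumptions:

- both the DisProt export and the differentially expressed protein lists are keyed by UniProt accession;
- isoform-specific accessions (with a "-N" suffix) should be collapsed to the canonical entry before matching.

The accessions below are placeholders and do not reflect real DisProt content.

```python
def base_accession(accession):
    """Strip a UniProt isoform suffix, e.g. 'P12345-2' -> 'P12345'."""
    return accession.strip().split("-")[0]


def known_disordered(group_accessions, disprot_accessions):
    """Group members (canonical accessions) that also appear in DisProt."""
    disprot = {base_accession(a) for a in disprot_accessions}
    return sorted({base_accession(a) for a in group_accessions} & disprot)


# Hypothetical accessions, not real DisProt content.
disprot = ["ACC1", "ACC2", "ACC9"]
fn_group = ["ACC1-2", "ACC3"]
cn_group = ["ACC2", "ACC4"]
print(known_disordered(fn_group, disprot), known_disordered(cn_group, disprot))
```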
Database records cover only a limited number of proteins. The degree of disorder of all FN-group and CN-group differentially expressed proteins was therefore also predicted with two online tools, PrDOS and CSpritz. The predictions were visualized with ggplot2 and compared. The FN-group differentially expressed proteins were slightly more disordered than those of the CN group.
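PrDOS and CSpritz both produce per-residue disorder predictions. The sketch below summarizes such predictions into a per-protein disorder fraction and a group mean, which is the kind of table one might pass to ggplot2 for a box plot. It makes three assumptions:

- the predictor output has already been parsed into lists of per-residue scores (the tools' file formats are not reproduced here);
- a residue counts as disordered at a score of 0.5 or above;
- the scores shown are hypothetical.

```python
from statistics import mean


def disorder_fraction(scores, threshold=0.5):
    """Fraction of residues whose predicted disorder score is >= threshold."""
    if not scores:
        raise ValueError("empty score list")
    return sum(s >= threshold for s in scores) / len(scores)


def group_disorder(scores_by_protein, threshold=0.5):
    """Per-protein disorder fractions and their mean for one group."""
    fractions = {p: disorder_fraction(s, threshold) for p, s in scores_by_protein.items()}
    return fractions, mean(fractions.values())


# Hypothetical per-residue scores, standing in for parsed predictor output.
fn_scores = {"P1": [0.9, 0.8, 0.2, 0.1], "P2": [0.6, 0.4]}
cn_scores = {"Q1": [0.1, 0.2, 0.7, 0.3]}
fn_fractions, fn_mean = group_disorder(fn_scores)
cn_fractions, cn_mean = group_disorder(cn_scores)
print(fn_mean, cn_mean)
```

Running the same summary on each tool's output separately makes it easy to check whether the two predictors agree on which group is more disordered.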
Stage seven: protein classification analysis
The FN-group and CN-group differentially expressed proteins were classified with the online PANTHER tool and database. The results were visualized with the R package ggplot2 and compared. Most FN-group differentially expressed proteins were nucleic acid metabolism proteins or metabolite interconversion enzymes. Most CN-group differentially expressed proteins were metabolite interconversion enzymes.
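The following is a minimal sketch that tabulates PANTHER protein-class assignments into (class, count, proportion) rows, a long format suitable for a ggplot2 bar chart. It assumes the assignments have already been downloaded as a protein-to-class mapping, with `None` for proteins PANTHER did not classify. That input shape and the assignments shown are hypothetical.

```python
from collections import Counter


def class_proportions(class_by_protein):
    """(class, count, proportion) rows, most frequent first.

    Proteins that received no class (None) are excluded.
    """
    counts = Counter(c for c in class_by_protein.values() if c is not None)
    total = sum(counts.values())
    return [(cls, n, n / total)
            for cls, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical PANTHER protein-class assignments.
fn_classes = {"P1": "nucleic acid metabolism protein",
              "P2": "nucleic acid metabolism protein",
              "P3": "metabolite interconversion enzyme",
              "P4": None}
for row in class_proportions(fn_classes):
    print(row)
```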
The differentially expressed proteins of the Robust modules were likewise classified with PANTHER, visualized with the R package ggplot2, and compared. The purpose was to check whether the protein classes match each module's function. The results were:

- DNA damage repair module: mostly nucleic acid metabolism proteins, followed by protein modifying enzymes;
- aging module: all proteins were metabolite interconversion enzymes;
- protein folding module: chaperones were the most abundant class;
- mitochondrial gene expression module: all differentially expressed proteins were translational proteins.

The classification results therefore closely match the module functions.
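The following is a minimal sketch of the matching check. For each module, it compares the most frequent protein class with the class one would expect from the module's function. The expected-class mapping is a hypothetical input that would have to be chosen by hand. Ties are broken alphabetically, which is an arbitrary choice. "module X" is a made-up example of a mismatch.

```python
from collections import Counter


def dominant_class(classes):
    """Most frequent non-empty class; ties broken alphabetically."""
    counts = Counter(c for c in classes if c)
    if not counts:
        return None
    top = max(counts.values())
    return min(c for c, n in counts.items() if n == top)


def matches_function(module_classes, expected_class):
    """{module: True if its dominant class equals the expected class}."""
    return {module: dominant_class(classes) == expected_class.get(module)
            for module, classes in module_classes.items()}


# Hypothetical module contents and a hypothetical function-to-class mapping.
module_classes = {
    "DNA damage repair": ["nucleic acid metabolism protein"] * 2 + ["protein modifying enzyme"],
    "protein folding": ["chaperone", "chaperone", "translational protein"],
    "module X": ["chaperone"],
}
expected_class = {
    "DNA damage repair": "nucleic acid metabolism protein",
    "protein folding": "chaperone",
    "module X": "translational protein",
}
print(matches_function(module_classes, expected_class))
```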
The quantitative proteomics analysis method for FLASH-irradiated tissue provided in this embodiment starts from proteomics data. Proteomics is a core technology of the post-genome era: it analyzes the protein composition and activity patterns of an organism, cell, or tissue as a whole. It observes protein expression directly and measures expression levels. Proteomics therefore gives a more complete picture of an organism's overall protein expression. This in turn supports a more comprehensive understanding of the protein-level changes caused by diseases or treatments.
This embodiment applies module division, a systems biology analysis method. A network is constructed from the proteomics data and annotated at the network level to find the more important differentially expressed proteins. Module division operates at the network level. During evolution, interactions among groups of proteins are relatively conserved, and so are the functions those interactions determine. Module division therefore yields groups of differentially expressed proteins with conserved interactions. This reveals the proteins' functions and ultimately helps explain the mechanisms in which they take part.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A quantitative proteomics analysis method for FLASH-irradiated tissue, characterized by comprising the following steps:
S1. Acquiring raw data, the raw data including: protein expression abundance data of FLASH-irradiated tissue, protein expression abundance data of tissue irradiated by conventional radiotherapy, and protein expression abundance data of tissue without any irradiation;
S2. Preprocessing the raw data, and performing differential expression analysis on the preprocessed data to obtain three groups of differentially expressed proteins: FLASH irradiation group versus conventional irradiation group, FLASH irradiation group versus control group, and conventional irradiation group versus control group; wherein the FLASH irradiation group consists of the protein expression abundance data of the FLASH-irradiated tissue, the conventional irradiation group consists of the protein expression abundance data of the tissue irradiated by conventional radiotherapy, and the control group consists of the protein expression abundance data without any irradiation;
S3. Constructing protein-protein interaction networks for each of the three groups of differentially expressed proteins and for their union, yielding four networks; dividing each of the four networks into modules and screening out important modules, wherein the important modules are determined by the number of proteins in a module, the types of proteins it contains, and the number of important proteins related to the mechanism of action of FLASH;
S4. Performing subcellular localization analysis on two groups of differentially expressed proteins, namely FLASH irradiation group versus control group and conventional irradiation group versus control group; and obtaining the correspondence between the subcellular localization results and the important modules, thereby completing the quantitative proteomics analysis of the FLASH-irradiated tissue.
2. The quantitative proteomics analysis method for FLASH-irradiated tissue according to claim 1, wherein step S2 further comprises:
performing Gene Ontology (GO) functional enrichment analysis and KEGG pathway enrichment analysis on each of the three groups of differentially expressed proteins, and visualizing the results.
3. The quantitative proteomics analysis method for FLASH-irradiated tissue according to claim 2, wherein step S3 further comprises:
performing topological analysis and biological analysis on each network constructed from the three groups of differentially expressed proteins, wherein the topological analysis comprises analyzing the degree and clustering coefficient of each protein node, and the biological analysis comprises analyzing the GO functional enrichment results, the KEGG pathway enrichment results, the tissue expression results, and the protein domain results;
and visualizing the KEGG pathway enrichment results and comparing them with the visualized KEGG pathway enrichment results obtained for the three groups of differentially expressed proteins before network construction, to obtain the differences in KEGG pathway enrichment before and after network construction.
4. The quantitative proteomics analysis method for FLASH-irradiated tissue according to claim 1, wherein step S3 further comprises:
performing KEGG pathway enrichment analysis on each important module and visualizing the results;
determining the function of each important module, and screening out the important modules having the same function;
obtaining and visualizing the protein-protein interactions between the important modules having the same function;
and merging the important modules having the same function into a Robust module.
5. The quantitative proteomics analysis method for FLASH-irradiated tissue according to claim 4, wherein determining the function of each important module comprises:
comparing the important modules of the network built from the union of the three groups of differentially expressed proteins, together with their KEGG pathway enrichment results, with the important modules of the networks built from each of the three groups, together with their KEGG pathway enrichment results, to determine the functions of the important modules.
6. The quantitative proteomics analysis method for FLASH-irradiated tissue according to claim 5, wherein determining the function of each important module further comprises:
for an important module whose function cannot be determined, examining the biological processes from the GO functional enrichment analysis to help determine its function.
7. The quantitative proteomics analysis method for FLASH-irradiated tissue according to claim 1, wherein step S4 further comprises:
performing KEGG pathway enrichment analysis on the proteins, from two groups of differentially expressed proteins, that share the same subcellular localization in the subcellular localization results, the two groups being FLASH irradiation group versus control group and conventional irradiation group versus control group.
8. The quantitative proteomics analysis method for FLASH-irradiated tissue according to claim 1, further comprising:
S5. Performing intrinsically disordered protein analysis on two groups of differentially expressed proteins, namely FLASH irradiation group versus control group and conventional irradiation group versus control group.
9. The quantitative proteomics analysis method for FLASH-irradiated tissue according to claim 1, further comprising:
S6. Classifying the differentially expressed proteins of the FLASH irradiation group and of the conventional irradiation group, each compared with the control group, and visualizing and comparing the classifications.
10. The quantitative proteomics analysis method for FLASH-irradiated tissue according to claim 4, further comprising:
S7. Classifying the differentially expressed proteins of the Robust module, and visualizing and comparing the classifications.
CN202210349001.1A 2022-04-01 2022-04-01 Quantitative proteomics analysis method for FLASH irradiated tissue Pending CN114882942A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210349001.1A CN114882942A (en) 2022-04-01 2022-04-01 Quantitative proteomics analysis method for FLASH irradiated tissue

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210349001.1A CN114882942A (en) 2022-04-01 2022-04-01 Quantitative proteomics analysis method for FLASH irradiated tissue

Publications (1)

Publication Number Publication Date
CN114882942A (en) 2022-08-09

Family

ID=82670670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210349001.1A Pending CN114882942A (en) 2022-04-01 2022-04-01 Quantitative proteomics analysis method for FLASH irradiated tissue

Country Status (1)

Country Link
CN (1) CN114882942A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116884478A (en) * 2023-06-20 2023-10-13 广州金墁利医药科技有限公司 Proteomics data analysis method, device, electronic equipment and storage medium
CN116884478B (en) * 2023-06-20 2024-01-05 广州金墁利医药科技有限公司 Proteomics data analysis method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Zohora et al. DeepIso: a deep learning model for peptide feature detection from LC-MS map
US20050165594A1 (en) System, method and apparatus for causal implication analysis in biological networks
CN107729721B (en) Metabolite identification and disorder pathway analysis method
Wang et al. GOMCL: a toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions
US20190180844A1 (en) Method for deep learning-based biomarker discovery with conversion data of genome sequences
CN103810200B (en) Database search method and system for open protein identification
CN106021984A (en) Whole-exome sequencing data analysis system
Sardiu et al. Topological scoring of protein interaction networks
CN115240762B (en) Multi-scale small molecule virtual screening method and system
US20240055071A1 (en) Artificial intelligence-based compound processing method and apparatus, device, storage medium, and computer program product
CN114882942A (en) Quantitative proteomics analysis method for FLASH irradiated tissue
CN107229842A (en) A kind of three generations's sequencing sequence bearing calibration based on Local map
Dong et al. An accurate de novo algorithm for glycan topology determination from mass spectra
Luo et al. A Caps-UBI model for protein ubiquitination site prediction
CN113380326B (en) Gene expression data analysis method based on PAM clustering algorithm
CN114999566B (en) Drug repositioning method and system based on word vector characterization and attention mechanism
CN115691702A (en) Compound visual classification method and system
Ucar et al. Effective pre-processing strategies for functional clustering of a protein-protein interactions network
CN114999564A (en) Protein data processing method, device, electronic device and storage medium
Gong et al. Hs-dti: Drug-target interaction prediction based on hierarchical networks and multi-order sequence effect
US20100280759A1 (en) Mass spectrometer output analysis tool for identification of proteins
Yu et al. Identification of core–attachment complexes based on maximal frequent patterns in protein–protein interaction networks
Malard et al. Constrained de novo peptide identification via multi-objective optimization
Papetti et al. Barcode demultiplexing of nanopore sequencing raw signals by unsupervised machine learning
Zaki Predicting Cell Type and Extracting Key Genes using Single Cell Multi-Omics Data and Graph Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination