CN116994653A - Sepsis diagnosis model construction method, compound screening method and electronic equipment - Google Patents

Info

Publication number
CN116994653A
Authority
CN
China
Prior art keywords
sepsis
genes
gene
obtaining
diagnostic model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311247147.6A
Other languages
Chinese (zh)
Inventor
胡亚惠
郑泽茂
葛静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern Hospital Southern Medical University
Original Assignee
Southern Hospital Southern Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern Hospital Southern Medical University
Priority to CN202311247147.6A
Publication of CN116994653A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Epidemiology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Computing Systems (AREA)
  • Analytical Chemistry (AREA)
  • Bioethics (AREA)
  • Mathematical Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application provides a sepsis diagnostic model construction method, a compound screening method and an electronic device. The sepsis diagnostic model construction method comprises the following steps: obtaining a sepsis gene expression dataset and an aging gene dataset; analyzing the sepsis gene expression dataset and the aging gene dataset by gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis to obtain sepsis-related genes; analyzing the sepsis-related genes by lasso regression analysis and a support vector machine to obtain sepsis-related hub genes; and constructing a sepsis diagnostic model by multivariable (multi-factor) logistic regression on the sepsis-related hub genes. The method can screen out, from a large number of genes, the core genes closely related to the development of sepsis, which play an important role in its pathogenesis and clinical manifestation, and construct a sepsis diagnostic model from these genes.

Description

Sepsis diagnosis model construction method, compound screening method and electronic equipment
Technical Field
The application relates to the technical field of health monitoring, in particular to a sepsis diagnosis model construction method, a compound screening method and electronic equipment.
Background
Sepsis is a severe, potentially fatal syndrome induced by infection. The systemic inflammatory response triggered by sepsis may lead to reduced blood pressure, tissue and organ damage, and multiple organ dysfunction syndrome (MODS). Because the early symptoms of sepsis are often vague and difficult to diagnose accurately and promptly, patients often seek treatment only when the illness is already serious, which delays treatment.
The widely used diagnostic standard Sepsis-3, although reasonably accurate, involves examinations of multiple organ systems. These take a long time, which may delay treatment and worsen the prognosis of the patient.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a sepsis diagnostic model construction method, a compound screening method, and an electronic device that can overcome at least one of the above drawbacks.
In a first aspect, an embodiment of the present application provides a method for constructing a sepsis diagnostic model, including: obtaining a sepsis gene expression dataset and an aging gene dataset; analyzing the sepsis gene expression dataset and the aging gene dataset by gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis to obtain sepsis-related genes; analyzing the sepsis-related genes by lasso regression analysis and a support vector machine to obtain sepsis-related hub genes; and constructing a sepsis diagnostic model by multivariable (multi-factor) logistic regression on the sepsis-related hub genes.
According to one embodiment of the present application, after the obtaining of the sepsis gene expression dataset and the aging gene dataset, the method further comprises: processing the sepsis gene expression dataset with linear models for microarray data (limma) so as to standardize and normalize the expression matrix of the sepsis gene expression dataset.
According to one embodiment of the application, the analyzing of the sepsis gene expression dataset to obtain sepsis-related genes includes: analyzing the sepsis gene expression dataset by weighted gene co-expression network analysis to obtain the sepsis-related genes, wherein the sepsis-related genes comprise genes positively correlated with sepsis and genes negatively correlated with sepsis.
According to one embodiment of the present application, the sepsis diagnostic model construction method further includes: obtaining a sepsis gene expression validation set; and validating the sepsis diagnostic model on the sepsis gene expression validation set.
According to one embodiment of the application, the sepsis-related hub genes comprise: BCL6, ETS1, ETS2, FOS, MAPK14 and MYC.
In a second aspect, embodiments of the present application provide a method for screening a compound, comprising: obtaining the main compounds in Salvia miltiorrhiza (red sage root, Danshen); obtaining target proteins of the main compounds; obtaining the genes corresponding to the target proteins; obtaining a sepsis diagnostic model constructed according to the sepsis diagnostic model construction method of the first aspect; obtaining the sepsis-related hub genes from the sepsis diagnostic model; obtaining common genes, shared between Salvia miltiorrhiza and sepsis, from the genes corresponding to the target proteins and the sepsis-related hub genes; and obtaining the sepsis-related compounds in Salvia miltiorrhiza according to the common genes.
According to one embodiment of the application, the common genes include: MYC, FOS, and MAPK14.
According to one embodiment of the application, the sepsis related compound comprises: isoprostol I, wu Ermei acid, dihydro Wu Ermei ketone, danshen new quinone B, danshen new quinone A, 2- (4-hydroxy-3-methoxyphenyl) -5- (3-hydroxypropyl) -7-methoxy-3-benzofurancarbaldehyde, tanshinone IIA and methyl rosmarinic acid.
According to one embodiment of the application, the method for screening a compound further comprises: preparing a compound for treating sepsis using at least one of the sepsis-related compounds.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the sepsis diagnostic model building method of the first aspect or the compound screening method of the second aspect when executing the instructions.
The sepsis diagnostic model construction method, the compound screening method and the electronic device provided by the embodiments of the application can screen out, from a large number of genes, the core genes closely related to the development of sepsis and construct a sepsis diagnostic model from them. These genes play an important role in the pathogenesis and clinical manifestation of sepsis, and the model is of significance for early diagnosis and prognosis evaluation.
Drawings
Fig. 1 is a flowchart of a sepsis diagnostic model construction method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of parameters of a clustering module according to an embodiment of the application.
Fig. 3a is a schematic diagram of the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis according to an embodiment of the present application.
Fig. 3b is a schematic diagram of the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis according to another embodiment of the present application.
Fig. 4 is a schematic diagram of a protein-protein network according to an embodiment of the present application.
Fig. 5 is a schematic diagram of a core network according to an embodiment of the present application.
Fig. 6 is a schematic illustration of a lasso analysis according to an embodiment of the present application.
Fig. 7 is a Venn diagram according to an embodiment of the present application.
Fig. 8 is a schematic diagram of a sepsis diagnostic model according to an embodiment of the present application.
Fig. 9 is a flowchart of a compound screening method according to an embodiment of the present application.
Fig. 10 is a Venn diagram according to another embodiment of the present application.
Fig. 11 is a schematic diagram of a compound network according to an embodiment of the present application.
Fig. 12 is a schematic diagram of an electronic device according to an embodiment of the application.
Description of main reference numerals: 20-electronic device; 21-processor; 22-memory.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the present application.
It should be noted that, in the embodiments of the present application, "at least one" refers to one or more, and a plurality refers to two or more. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
It should be noted that, in the embodiments of the present application, the terms "first," "second," and the like are used for distinguishing between the descriptions and not necessarily for indicating or implying a relative importance, or for indicating or implying a sequence. Features defining "first", "second" may include one or more of the stated features, either explicitly or implicitly. In describing embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without any inventive effort, are intended to be within the scope of the present application.
Sepsis is a severe, potentially fatal syndrome induced by infection. The systemic inflammatory response triggered by sepsis may lead to reduced blood pressure, tissue and organ damage, and multiple organ dysfunction syndrome (MODS). Because the early symptoms of sepsis are often vague and difficult to diagnose accurately and promptly, patients often seek treatment only when the illness is already serious, which delays treatment.
The widely used diagnostic standard Sepsis-3, although reasonably accurate, involves examinations of multiple organ systems. These take a long time, which may delay treatment and worsen the prognosis of the patient.
In view of the above, the application provides a sepsis diagnostic model construction method, a compound screening method and an electronic device. The sepsis diagnostic model construction method combines a sepsis gene expression dataset with an aging gene dataset and makes use of the correlation between aging and sepsis to improve the accuracy of sepsis prediction and diagnosis. Genes associated with sepsis are identified by analyzing the sepsis gene expression dataset, and the hub genes associated with sepsis are then determined with the help of gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. These hub genes may play an important role in the onset and progression of sepsis, so incorporating them into the sepsis diagnostic model is expected to improve the prediction accuracy and stability of the model. The sepsis-related hub genes are combined with other clinical indexes by multivariable (multi-factor) logistic regression to build the sepsis diagnostic model, which gives clinicians a tool for diagnosing and treating sepsis earlier and thereby improving patient survival and treatment outcomes.
Some embodiments of the application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Fig. 1 is a flowchart of a sepsis diagnostic model construction method according to an embodiment of the present application. The sepsis diagnostic model construction method shown in Fig. 1 comprises at least the following steps: S100: obtaining a sepsis gene expression dataset and an aging gene dataset; S200: analyzing the sepsis gene expression dataset and the aging gene dataset by gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis to obtain sepsis-related genes; S300: analyzing the sepsis-related genes by lasso regression analysis and a support vector machine to obtain sepsis-related hub genes; S400: constructing a sepsis diagnostic model by multivariable (multi-factor) logistic regression on the sepsis-related hub genes.
S100: obtaining a sepsis gene expression dataset and an aging gene dataset.
In the embodiment of the application, in the sepsis diagnostic model construction method, a sepsis gene expression data set and an aging gene data set are first acquired in step S100.
Specifically, the sepsis diagnostic model construction method provided by the embodiment of the application comprises obtaining sepsis-related gene expression datasets from the GEO database, namely the datasets GSE26440, GSE13904 and GSE32707. GSE26440 includes 32 control samples and 98 sepsis samples, GSE13904 includes 18 control samples and 158 sepsis samples, and GSE32707 includes 34 control samples and 89 sepsis samples. In an embodiment of the application, dataset GSE26440 is used as the training set and datasets GSE13904 and GSE32707 are used as validation sets.
In the embodiment of the application, the sepsis diagnostic model construction method further comprises the step of processing the sepsis gene expression dataset with linear models for microarray data (limma) so as to standardize and normalize the expression matrix of the sepsis gene expression dataset.
Specifically, the limma (linear models for microarray data) software package in the R language may be applied to standardize and normalize the datasets GSE26440, GSE13904 and GSE32707.
It will be appreciated that preprocessing the raw data is an important step that improves data quality and accuracy. Standardizing and normalizing the expression matrix of each dataset with the limma software package reduces technical variation between datasets, makes the data consistent and comparable, and provides a more reliable basis for subsequent analysis and mining.
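The normalization step can be illustrated with a minimal sketch. The source does not say which limma normalization is applied, so the example below assumes quantile normalization, a commonly used between-array method, and implements it in plain Python rather than with limma itself. The function name `quantile_normalize` and the toy matrix are illustrative assumptions, not part of the application.

```python
from statistics import mean

def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Assumption: this stands in for the unspecified limma normalization.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Rank genes within the sample; ties are broken by original order.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized

# Toy expression matrix: 4 genes (rows) x 3 samples (columns), made-up values.
expr = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(expr)
for row in normalized:
    print([round(v, 3) for v in row])
```

After this transformation every sample has the same distribution of values, which is the sense in which technical variation between arrays is reduced.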
It will be appreciated that the sepsis diagnostic model construction method provided by the embodiments of the present application further includes obtaining an aging gene dataset from a database (e.g., CellAge). The aging gene dataset includes 183 non-tumor senescence-associated genes. A non-tumor senescence-associated gene is a gene that plays an important role in the senescence process but is not associated with tumor progression. These genes play a key role in physiological processes such as cell aging and the decline of tissue function. Because sepsis itself does not involve a tumor, selecting non-tumor senescence-associated genes allows the association between senescence and sepsis to be analyzed more directly.
It is understood that aging is a known risk factor for sepsis. With age, the function of immune cells gradually declines, resulting in a defective immune response. In severe infectious diseases such as sepsis in particular, an impaired immune system may leave the body unable to eliminate the source of infection effectively, which exacerbates the condition. Incorporating aging-related genes into the sepsis diagnostic model construction method can therefore improve the accuracy of prediction and risk assessment, deepen the understanding of how sepsis develops, and support more personalized medical guidance.
S200: analyzing the sepsis gene expression dataset and the aging gene dataset by gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis to obtain sepsis-related genes.
In the embodiment of the application, the method for constructing the sepsis diagnostic model further comprises, in step S200, analyzing the sepsis gene expression dataset and the aging gene dataset by gene ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes enrichment analysis to obtain sepsis-related genes.
Specifically, the weighted gene co-expression network analysis (WGCNA) package in the R language was used to perform gene co-expression network analysis on the training set GSE26440 and to construct an undirected network, wherein the undirected network had a topological fitting index (Topological Overlap Measure, TOM) of 0.8 and a soft threshold of 10.
It will be appreciated that constructing an undirected network is one of the methods used in weighted gene co-expression network analysis (WGCNA). In this process, a gene co-expression network is first constructed by calculating the correlation between genes, which can be measured by different correlation indexes.
It will be appreciated that the topological fitting index is an indicator in WGCNA that measures similarity between genes: it quantifies topological similarity by taking into account how genes are connected in the network. Its value ranges from -1 to 1; the higher the value, the tighter the connection between genes and the stronger the correlation. Setting the topological fitting index to 0.8 means that highly similar genes receive more attention when the undirected network is constructed.
It will be appreciated that the similarity between nodes in an undirected network is often expressed in terms of weights. The soft threshold is a parameter used to adjust these weights (in WGCNA, the power to which pairwise similarities are raised), and it thereby controls the effective connection density of the network. Setting the soft threshold to 10 means that relatively strong connections are retained when the network is constructed, while weak similarities are suppressed toward zero rather than cut off at a hard boundary.
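To make the role of the soft threshold concrete, the following minimal sketch builds a weighted adjacency by raising absolute Pearson correlations to the power 10 and then derives a topological overlap matrix from it. It is a plain-Python illustration under the usual unsigned-network definitions, not a call into the WGCNA package. The three toy expression profiles and all function names are assumptions made for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def soft_threshold_adjacency(expr, power):
    """Unsigned weighted adjacency: a_ij = |cor(i, j)| ** power, zero diagonal."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i][j] = abs(pearson(expr[i], expr[j])) ** power
    return adj

def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j]
                             for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Made-up expression profiles: 3 genes x 5 samples.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # gene 0
    [1.1, 2.1, 2.9, 4.2, 5.1],  # gene 1, closely tracks gene 0
    [5.0, 1.0, 4.0, 2.0, 3.0],  # gene 2, only weakly related to gene 0
]
adj = soft_threshold_adjacency(expr, power=10)
tom = topological_overlap(adj)
print(round(adj[0][1], 3), adj[0][2])
```

With a power of 10 the strongly correlated pair keeps a weight close to 1, whereas a correlation of about 0.3 in absolute value is reduced to a weight on the order of 1e-6, which is how a soft threshold emphasizes strong connections without a hard cutoff.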
In an embodiment of the application, gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis may be performed on the senescence-associated genes related to sepsis by using the clusterProfiler software package in the R language. GO analysis is mainly used to identify the enrichment of genes in biological processes, cellular components and molecular functions, and thereby to reveal the role of these genes in cellular function. KEGG pathway analysis is mainly used to identify the gene sets associated with particular biological processes and signaling pathways.
Fig. 2 is a schematic diagram of parameters of a clustering module according to an embodiment of the application.
In the present example, the p-value cutoff of the GO analysis was set to 0.05 and the enrichGO function was used to identify which gene sets are significantly enriched in aging and sepsis. The results of the GO analysis are then presented using a graphics tool in the R language, for example the ggplot2 software package. It will be appreciated that one skilled in the art may select a different graphics tool to present the results of the analysis as desired; the application is not limited in this regard.
In an embodiment of the application, Fig. 2 shows the correlation of the different modules with sepsis together with the corresponding p values. The genes in the black module have the highest positive correlation with sepsis (0.54), and the genes in the green module have the highest negative correlation with sepsis (-0.57); the p values of the two modules are not reproduced in this text.
In the embodiment of the application, analyzing each module in the undirected network yields the two modules most strongly correlated with sepsis, namely the black module and the green module. The black module is the positive correlation module, that is, the module with the largest positive correlation with sepsis. The green module is the negative correlation module, that is, the module with the largest negative correlation with sepsis. The positive correlation module contains 1953 genes and the negative correlation module contains 2823 genes.
Fig. 3a and Fig. 3b are schematic diagrams of the gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis according to embodiments of the present application.
In embodiments of the present application, KEGG pathway analysis is primarily used to identify gene sets associated with specific biological processes and signaling pathways. KEGG pathway enrichment analysis was performed on the gene set by setting the p-value cutoff to 0.05 and using the enrichKEGG function. As shown in Fig. 3a and Fig. 3b, the aging gene dataset shares 22 genes with the black module and 35 genes with the green module. That is, a total of 57 sepsis-related genes were obtained.
It will be appreciated that, in embodiments of the application, KEGG pathway analysis not only identifies genes associated with sepsis but also relates these genes to known biological pathways and processes. The key pathways in the pathogenesis of sepsis can thus be analyzed further, which provides a basis for constructing the sepsis diagnostic model.
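Over-representation analyses of the kind performed by enrichGO and enrichKEGG are generally based on a hypergeometric test. The sketch below shows that test in isolation, assuming a one-sided upper tail, and is not a reimplementation of clusterProfiler. The gene universe size, pathway size and overlap are hypothetical numbers chosen for illustration; only the query size of 57 comes from the text.

```python
from math import comb

def enrichment_p_value(universe_size, pathway_size, query_size, overlap):
    """Upper-tail hypergeometric p value: P(X >= overlap).

    X is the number of pathway genes found when query_size genes are drawn
    without replacement from a universe containing pathway_size pathway genes.
    """
    total = comb(universe_size, query_size)
    upper = min(pathway_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical setting: 20000 background genes, a pathway of 100 genes,
# and 5 of the 57 sepsis-related genes falling into that pathway.
p = enrichment_p_value(universe_size=20000, pathway_size=100,
                       query_size=57, overlap=5)
print(f"p = {p:.3g}, significant at 0.05: {p < 0.05}")
```

A pathway would be reported as enriched when this p value, typically after multiple-testing adjustment, falls below the chosen cutoff of 0.05.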
S300: analyzing the sepsis-related genes by lasso regression analysis and a support vector machine to obtain sepsis-related hub genes.
In the embodiment of the application, the method for constructing the sepsis diagnostic model further includes, in step S300, analyzing the sepsis-related genes by lasso regression analysis and a support vector machine to obtain sepsis-related hub genes.
Specifically, in an embodiment of the present application, a protein-protein interaction (PPI) network is first constructed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING), with the high-confidence score threshold set to 0.7. The threshold determines which protein interactions in the network are considered reliable: interactions scoring below it are excluded, and only interactions whose confidence score reaches 0.7 or higher are included in the constructed PPI network. Setting this threshold ensures that the interaction information in the PPI network is reliable, which benefits the subsequent analysis.
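The thresholding rule described here amounts to a simple filter over scored edges. The sketch below assumes scores on a 0 to 1 scale; exported STRING tables commonly report the combined score on a 0 to 1000 scale, in which case the equivalent cutoff would be 700. The edge list and the function name `filter_interactions` are hypothetical and carry no biological meaning.

```python
def filter_interactions(edges, min_score=0.7):
    """Keep only interactions whose confidence score is at least min_score."""
    return [(a, b, score) for a, b, score in edges if score >= min_score]

# Hypothetical edges and scores, for illustration only.
edges = [
    ("GENE_A", "GENE_B", 0.92),
    ("GENE_A", "GENE_C", 0.70),
    ("GENE_B", "GENE_D", 0.55),
]
ppi_edges = filter_interactions(edges)
print(ppi_edges)
```

Note that an interaction scoring exactly 0.7 is kept, in line with the "0.7 or higher" wording above.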
Fig. 4 is a schematic diagram of the protein-protein network according to an embodiment of the application.
The data in the protein-protein network is then imported into the Cytoscape software, and the core network in the protein-protein network is extracted according to the Molecular Complex Detection (MCODE) algorithm.
It will be appreciated that when the core network is extracted with the MCODE algorithm, the default settings of the algorithm may be used, that is, a degree cutoff of 2, a k-core of 2 and a maximum depth of 100.
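As background for the k-core parameter, the sketch below computes the k-core of a small undirected graph by repeatedly removing nodes with fewer than k neighbors. It illustrates only the k-core notion and is not the MCODE algorithm, which additionally weights vertices and grows complexes from seed nodes. The toy graph and the function name `k_core` are assumptions made for this example.

```python
def k_core(adjacency, k):
    """Return the node set of the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbors.
    """
    remaining = {node: set(neigh) for node, neigh in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                # Detach the node from its neighbors, then drop it.
                for neigh in remaining[node]:
                    remaining[neigh].discard(node)
                del remaining[node]
                changed = True
    return set(remaining)

# Toy graph: a triangle A-B-C with a chain C-D-E hanging off it.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
core = k_core(graph, k=2)
print(sorted(core))
```

With k = 2 the chain is peeled away node by node and only the densely connected triangle remains, which is the intuition behind using a k-core setting to focus on the core of a PPI network.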
Fig. 5 is a schematic diagram of the core network according to an embodiment of the application. As shown in Fig. 5, the core network contains 13 genes in total.
Subsequently, the 13 genes in the core network were analyzed using lasso regression analysis (Least Absolute Shrinkage and Selection Operator, LASSO). Specifically, ten-fold cross-validation was applied to select the lambda value in the lasso analysis.
Fig. 6 is a schematic diagram of the lasso analysis according to an embodiment of the present application. As shown in Fig. 6, 11 genes have non-zero coefficients.
It will be appreciated that the lasso analysis may automatically make feature selection by adjusting the regularization parameters (lambda values) to bring coefficients of certain features to zero, thereby excluding features that contribute less or are not relevant to the model. Non-zero coefficients refer to coefficients that remain and are not zero after lasso analysis regularization. By preserving non-zero coefficients, lasso analysis can find the most important features, thereby improving the predictive performance of the model.
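The way the lambda penalty drives coefficients to exactly zero can be seen in a minimal coordinate-descent sketch. It assumes a linear lasso with squared-error loss and centered features, whereas a diagnostic setting would more likely use a penalized logistic model, and it omits the cross-validation used to choose lambda. The data, feature names and function names are made up for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][m] * w[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w

# Made-up design with three mutually orthogonal, centered features.
names = ["feature_strong", "feature_weak", "feature_noise"]
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Response built as 2.0 * feature_strong + 0.05 * feature_weak.
y = [2.05, 1.95, -1.95, -2.05]

w = lasso_coordinate_descent(X, y, lam=0.1)
selected = [name for name, coef in zip(names, w) if coef != 0.0]
print(selected)
```

Only the strongly predictive feature keeps a non-zero coefficient at this lambda; the weak and the irrelevant feature are set to zero, which corresponds to the feature exclusion described above.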
Subsequently, the 13 genes in the core network were analyzed using a support vector machine (SVM), and 8 highly relevant genes were obtained.
Finally, the highly relevant genes screened by the support vector machine were intersected with the non-zero-coefficient genes from the lasso regression analysis using a Venn diagram tool (Venn Diagram Plotter), yielding the sepsis-related hub genes.
Fig. 7 is a Venn diagram according to an embodiment of the application. As shown in Fig. 7, a total of 6 sepsis-related hub genes were screened.
In the embodiment of the application, the sepsis-related hub genes finally obtained are BCL6, ETS1, ETS2, FOS, MAPK14 and MYC.
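The intersection behind the Venn diagram can be sketched as a set operation. The six hub genes are taken from the text, but the application does not list the other genes selected by lasso (11 in total) or by the support vector machine (8 in total), so the remaining members below are placeholder names chosen only to reproduce the reported set sizes.

```python
# Hub genes as reported in the text.
HUB_GENES = {"BCL6", "ETS1", "ETS2", "FOS", "MAPK14", "MYC"}

# Placeholder names: the source does not name the genes selected by only one method.
lasso_selected = HUB_GENES | {f"LASSO_ONLY_{i}" for i in range(1, 6)}  # 11 genes
svm_selected = HUB_GENES | {f"SVM_ONLY_{i}" for i in range(1, 3)}      # 8 genes

# Genes chosen by both methods form the hub gene set.
hub = lasso_selected & svm_selected
print(sorted(hub))
```

Requiring agreement between two different selection methods is a conservative choice: a gene has to be informative under both the sparse regression and the margin-based classifier to be kept.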
It is understood that BCL6 stands for B-cell lymphoma 6 protein, a transcriptional repressor that plays a key role in regulating the immune response and cell proliferation. ETS1 and ETS2 are members of the ETS transcription factor family and play an important role in cell proliferation, differentiation and immune regulation. FOS is a member of the FOS family of transcription factors, which are involved in cell cycle regulation and inflammatory responses. MAPK14 is mitogen-activated protein kinase 14, a member of the MAPK family that plays a key role in cell signaling and inflammatory responses. MYC is a transcription factor that affects cell growth and differentiation by regulating cell proliferation and apoptosis.
S400: constructing a sepsis diagnostic model according to multi-factor logistic regression and the sepsis-related hub genes.
According to the sepsis diagnostic model construction method provided by the embodiment of the application, step S400 constructs the sepsis diagnostic model according to multi-factor logistic regression and the sepsis-related hub genes.
Specifically, in embodiments of the present application, the Regression Modeling Strategies (rms) software package for the R language may be used to perform the multi-factor logistic regression and ultimately obtain the sepsis diagnostic model.
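For orientation, a multi-factor logistic regression over the six hub genes has the general form below. The symbols are assumptions introduced here: $x_g$ denotes the expression value of gene $g$, and $\beta_0$ and $\beta_g$ are the fitted intercept and coefficients, whose values are not given in this section.

```latex
\[
\log\frac{P(\text{sepsis})}{1 - P(\text{sepsis})}
  = \beta_0 + \sum_{g \in G} \beta_g \, x_g ,
\qquad
G = \{\text{BCL6},\ \text{ETS1},\ \text{ETS2},\ \text{FOS},\ \text{MAPK14},\ \text{MYC}\}
\]
```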
Referring to fig. 8, fig. 8 is a schematic diagram of a sepsis diagnosis model according to an embodiment of the present application.
The method of using the sepsis diagnostic model is described below. The first row of the nomogram is the points scale, on which each tick represents the score of an individual item. Rows 2 to 7 list the individual items in order (BCL6, ETS1, and so on), each with its own coefficient scale. For each item, locate the point on its scale that corresponds to the measured value and draw a vertical line upward from that point until it intersects the first row; the intersection gives the score of that item. The scores of all individual items are added to obtain a total score. In row 8, find the scale point corresponding to the total score; the value on row 9 aligned with that point is the risk that the subject has sepsis at that total score.
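The read-off procedure can be mimicked numerically. The sketch below follows one common way of building nomogram scales from a logistic model: the item with the widest effect spans 0 to 100 points, and the total is mapped back to a probability. The intercept, coefficients, expression ranges, and patient values are hypothetical, and only three of the six genes are used for brevity.

```python
import math

# Hypothetical model parameters (NOT taken from the patent).
INTERCEPT = -1.0
COEF = {"BCL6": 0.8, "ETS1": -0.5, "MYC": 0.3}
RANGE = {"BCL6": (0.0, 10.0), "ETS1": (0.0, 10.0), "MYC": (0.0, 10.0)}


def min_contribution(gene):
    """Smallest value of coefficient * expression over the gene's range."""
    lo, hi = RANGE[gene]
    return min(COEF[gene] * lo, COEF[gene] * hi)


# The widest effect across all genes defines the 0-100 points scale.
MAX_SPAN = max(abs(COEF[g]) * (RANGE[g][1] - RANGE[g][0]) for g in COEF)


def points(gene, value):
    """Score of one item: its position on the row-1 points scale."""
    return 100.0 * (COEF[gene] * value - min_contribution(gene)) / MAX_SPAN


def risk_from_total(total_points):
    """Map the total score back to the linear predictor, then to a probability."""
    lp = INTERCEPT + sum(min_contribution(g) for g in COEF)
    lp += total_points * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"BCL6": 5.0, "ETS1": 2.0, "MYC": 4.0}  # hypothetical expression values
total = sum(points(g, v) for g, v in patient.items())
risk = risk_from_total(total)
print(round(total, 1), round(risk, 3))
```

Because the points scale is a linear rescaling of the model's linear predictor, the risk obtained from the total score equals the risk computed directly from the logistic formula.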
In the embodiment of the application, the hub genes are used as the diagnostic basis of the sepsis diagnostic model, which improves the accuracy and reliability of the model and provides more accurate support for the early diagnosis and treatment of sepsis patients.
In an embodiment of the present application, the sepsis diagnostic model construction method further includes validating the performance of the sepsis diagnostic model using the validation data sets GSE13904 and GSE32707.
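One way to quantify performance on a validation set is the area under the ROC curve (AUC). The text does not name the metric in this section, so AUC is an assumption, and the labels and predicted risks below are hypothetical rather than values from GSE13904 or GSE32707.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (sepsis, control) pairs ranked correctly; ties count half."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation samples: 1 = sepsis, 0 = control.
labels = [1, 1, 1, 0, 0]
predicted_risk = [0.9, 0.8, 0.4, 0.5, 0.2]

auc = roc_auc(labels, predicted_risk)
print(round(auc, 3))
```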
FIG. 9 is a schematic flow chart of a compound screening method according to another embodiment of the present application. The compound screening method shown in FIG. 9 comprises at least the following steps: S110: obtaining the main compounds in red sage root; S210: obtaining targeting proteins from the main compounds; S310: obtaining genes corresponding to the targeting proteins; S410: obtaining a sepsis diagnostic model constructed according to the sepsis diagnostic model construction method; S510: obtaining sepsis-related hub genes according to the sepsis diagnostic model; S610: obtaining common genes related to sepsis in red sage root according to the genes corresponding to the targeting proteins and the sepsis-related hub genes; S710: obtaining sepsis-related compounds in red sage root according to the common genes.
S110: obtaining the main compounds in red sage root.
According to the compound screening method provided by the embodiment of the present application, the main compounds in red sage root are first obtained in step S110.
It is understood that red sage root (scientific name: Salvia miltiorrhiza), also known as red root or Danshen, is a common herb belonging to the Labiatae family. It is an important traditional Chinese herbal medicine with a long history and wide application. Its active ingredients have antioxidant and anti-inflammatory effects, can relieve oxidative stress and inflammatory reactions, and are used to prevent and improve cardiovascular and nervous system diseases. In practice, red sage root has a good therapeutic effect on patients with sepsis, but its composition is complex, and it is not clear which specific compounds have therapeutic effects on sepsis.
In the embodiment of the application, the main components of red sage root are first obtained, 202 compounds in total. These components are then analyzed, and 65 compounds with potential effects are screened out. Specifically, the 65 compounds can be screened from the 202 compounds by requiring oral bioavailability (OB) > 30% and drug-likeness (DL) > 0.18.
It is understood that oral bioavailability refers to the percentage of a drug that is absorbed and produces a biological effect in the body after oral administration. Drug-likeness refers to how closely the chemical structure of a compound resembles that of known drug molecules. Screening on oral bioavailability and drug-likeness selects compounds with higher values of both, which are more likely to be potential drug candidates.
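The OB and DL thresholds amount to a simple filter over compound records. In this sketch the thresholds are the ones stated in the text, while the `Compound` type, the compound names, and the OB/DL values are hypothetical.

```python
from typing import NamedTuple


class Compound(NamedTuple):
    name: str
    ob: float  # oral bioavailability, in percent
    dl: float  # drug-likeness score


def screen(compounds, ob_min=30.0, dl_min=0.18):
    """Keep compounds that strictly exceed both thresholds (OB > 30, DL > 0.18)."""
    return [c for c in compounds if c.ob > ob_min and c.dl > dl_min]


# Hypothetical records; the real screen reduces 202 compounds to 65.
candidates = [
    Compound("compound_1", 45.2, 0.40),  # passes both thresholds
    Compound("compound_2", 28.0, 0.55),  # fails OB
    Compound("compound_3", 60.1, 0.10),  # fails DL
    Compound("compound_4", 30.0, 0.18),  # exactly on the thresholds, excluded by ">"
]

kept = [c.name for c in screen(candidates)]
print(kept)
```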
S210: obtaining targeting proteins from the main compounds.
According to the compound screening method provided by the embodiment of the application, step S210 obtains the targeting proteins according to the main compounds obtained in step S110.
In the embodiment of the application, 354 targeting proteins in total were obtained by analyzing the potential targeting proteins of these 65 compounds. Specifically, the targeting proteins corresponding to a compound can be queried using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING).
S310: obtaining genes corresponding to the targeting proteins.
According to the compound screening method provided by the embodiment of the application, step S310 obtains the corresponding genes according to the targeting proteins obtained in step S210.
In the embodiment of the application, the genes corresponding to the 354 targeting proteins can be obtained through STRING.
S410: obtaining a sepsis diagnostic model constructed according to a sepsis diagnostic model construction method.
According to the compound screening method provided by the embodiment of the present application, step S410 further includes obtaining a sepsis diagnosis model constructed by the sepsis diagnosis model construction method, and the specific obtaining manner is shown in fig. 1 to 8 and corresponding descriptions thereof, which are not repeated here.
S510: obtaining the sepsis-related hub genes according to the sepsis diagnostic model.
According to the compound screening method provided by the embodiment of the present application, step S510 obtains the sepsis-related hub genes from the sepsis diagnostic model; the specific manner of obtaining them is shown in FIG. 1 to FIG. 8 and the corresponding descriptions, which are not repeated here.
S610: obtaining common genes related to sepsis in red sage root according to the genes corresponding to the targeting proteins and the sepsis-related hub genes.
According to the compound screening method provided by the embodiment of the application, step S610 obtains the common genes related to sepsis in red sage root according to the genes corresponding to the targeting proteins and the sepsis-related hub genes.
Specifically, referring also to FIG. 10, FIG. 10 is a Venn diagram according to another embodiment of the present application. As shown in FIG. 10, the genes corresponding to the targeting proteins and the sepsis-related hub genes share three common genes. In the embodiment of the application, intersecting the genes of the 354 targeting proteins with the 6 sepsis hub genes obtained by the sepsis diagnostic model construction method yields 3 common genes (MYC, FOS and MAPK14). This suggests that red sage root may act on sepsis patients through these three genes. These common genes may play a key role in the onset and progression of sepsis, and the compounds in red sage root may exert a therapeutic effect by modulating the expression and function of these genes.
S710: obtaining sepsis-related compounds in red sage root according to the common genes.
According to the compound screening method provided by the embodiment of the application, step S710 obtains the sepsis-related compounds in red sage root according to the common genes.
In the embodiment of the application, a network diagram of the three genes and their corresponding compounds is drawn with Cytoscape to better understand the relationship between the red sage root compounds and the sepsis-related genes.
It will be appreciated that the network diagram is a graphical representation of the interaction relationship between genes and compounds in the form of nodes (genes and compounds) and edges (interactions). In this network diagram, genes and compounds are represented as nodes, respectively, and the targeting relationship between compounds and genes is represented as edges.
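Steps S610 and S710 can be sketched as building a compound-gene edge list and keeping only the edges that touch a common gene. The hub gene symbols come from the text; the compound names, the `OTHER_*` targets, and the compound-to-target mapping are hypothetical placeholders, since the actual mapping is not listed in this section.

```python
hub_genes = {"BCL6", "ETS1", "ETS2", "FOS", "MAPK14", "MYC"}

# Hypothetical compound -> target gene mapping.
compound_targets = {
    "compound_A": {"MYC", "OTHER_1"},
    "compound_B": {"FOS", "MAPK14"},
    "compound_C": {"OTHER_2"},
}

# S610: common genes = genes targeted by any compound that are also hub genes.
target_genes = set().union(*compound_targets.values())
common_genes = target_genes & hub_genes

# S710: edges of the network (compound node, gene node), restricted to common genes.
edges = sorted(
    (compound, gene)
    for compound, targets in compound_targets.items()
    for gene in targets
    if gene in common_genes
)
related_compounds = sorted({compound for compound, _ in edges})

print(edges)
print(related_compounds)
```

An edge list of this shape is also a convenient starting point for drawing the network in a graph tool.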
Referring to FIG. 11, FIG. 11 is a schematic diagram of a compound network according to an embodiment of the application. In the embodiment of the application, a total of nine compounds in red sage root that may be related to the treatment of sepsis were screened out. These compounds are: isotanshinone I, ursolic acid, dihydro Wu Ermei ketone (dihydroxosothashinone), danshen-neoquinone B (dan-shixinkum B), danshen-neoquinone A (dan-shixinkum A), 2-(4-hydroxy-3-methoxyphenyl)-5-(3-hydroxypropyl)-7-methoxy-3-benzofurancarboxaldehyde, apigenin, tanshinone IIA, and methyl rosmarinate.
In an embodiment of the application, the method of compound screening further comprises the step of preparing a compound for treating sepsis using at least one of the sepsis-related compounds.
It will be appreciated that the presence of these compounds makes red sage root a potential natural drug source for sepsis treatment. They may contribute to sepsis treatment through anti-inflammatory, antioxidant, antibacterial, cardiovascular-protective and antitumor effects. These compounds may provide new therapeutic directions and drug candidates for the treatment of sepsis.
Fig. 12 is an electronic device 20 according to an embodiment of the present application. As shown in fig. 12, the electronic device 20 includes at least the following: a processor 21 and a memory 22.
In an embodiment of the application, the memory 22 is used for storing instructions executable by the processor 21, the processor 21 being configured to implement a sepsis diagnostic model building method as shown in fig. 1 or a compound screening method as shown in fig. 9 when executing the instructions.
In an embodiment of the application, a computer readable storage medium comprises instructions instructing a device to perform the sepsis diagnostic model building method according to the first aspect. For example, the instructions instruct the device to perform a sepsis diagnostic model building method as shown in fig. 1 or a compound screening method as shown in fig. 9.
The program executed in the electronic device 20 according to an embodiment of the present application may be a program that controls a central processing unit (CPU) or the like so as to realize the functions of the above-described embodiments (that is, a program that causes a computer to function). Information processed by these devices is temporarily stored in a random access memory (RAM) during processing, then stored in various read-only memories (ROM) such as a flash ROM, or in a hard disk drive (HDD), and is read, modified, and written by the CPU as necessary.
Note that, a part of the electronic device 20 of the above embodiment may be implemented by a computer. In this case, the program for realizing the control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed.
The term "computer system" as used herein refers to a computer system built into the electronic device 20 and includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to a removable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, and to a storage device such as a hard disk incorporated in a computer system.
The "computer-readable recording medium" may also include: a medium that holds a program dynamically for a short time, such as a communication line used when the program is transmitted via a network such as the internet or via a communication line such as a telephone line; and a medium that holds a program for a fixed time, such as a volatile memory inside a computer system that serves as the server or client in that case. The program may realize only a part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.
The electronic device 20 in the above embodiment may be realized as an aggregate (device group) composed of a plurality of devices. Each device constituting the device group may include a part or all of each function or each functional block of the electronic apparatus 20 according to the above embodiment. The device group may have all the functions or functional blocks of the electronic apparatus 20.
It can be appreciated that the sepsis diagnostic model construction method, the compound screening method and the electronic device 20 provided in the embodiments of the present application combine a gene expression dataset with an aging gene dataset and make full use of the association between aging and sepsis to improve the accuracy of sepsis prediction and diagnosis. Genes associated with sepsis are identified by analyzing the sepsis gene expression dataset together with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, and the sepsis-related hub genes are then determined by lasso regression analysis and a support vector machine. These hub genes may play an important role in the development and progression of sepsis, so incorporating them when constructing the sepsis diagnostic model is expected to improve the prediction accuracy and stability of the model. The sepsis-related hub genes are combined with other clinical indexes through multi-factor logistic regression to build the sepsis diagnostic model, which provides clinicians with an effective tool, helps them diagnose and treat sepsis earlier, and improves patient survival and treatment outcomes.
It will be appreciated by persons skilled in the art that the above embodiments have been provided for the purpose of illustrating the application and are not to be construed as limiting the application, and that suitable modifications and variations of the above embodiments are within the scope of the application as claimed.

Claims (10)

1. The sepsis diagnosis model construction method is characterized by comprising the following steps of:
obtaining a sepsis gene expression dataset and an aging gene dataset;
analyzing the sepsis gene expression dataset and the aging gene dataset according to Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis to obtain sepsis-related genes;
analyzing the sepsis-related genes using lasso regression analysis and a support vector machine to obtain sepsis-related hub genes;
and constructing a sepsis diagnostic model according to multi-factor logistic regression and the sepsis-related hub genes.
2. The sepsis diagnostic model construction method according to claim 1, further comprising, after obtaining the sepsis gene expression dataset and the aging gene dataset:
processing the sepsis gene expression dataset using a linear model for microarray data (limma) so as to normalize and standardize the expression matrix of the sepsis gene expression dataset.
3. The sepsis diagnostic model construction method according to claim 2, wherein analyzing the sepsis gene expression dataset and the aging gene dataset according to Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis to obtain sepsis-related genes comprises:
and analyzing and processing the sepsis gene expression data set according to a weighted gene co-expression network to obtain the sepsis related genes, wherein the sepsis related genes comprise genes positively related to sepsis and genes negatively related to sepsis.
4. A method of constructing a diagnostic model of sepsis according to claim 3, wherein the method of constructing a diagnostic model of sepsis further comprises:
obtaining a sepsis gene expression validation set;
validating the sepsis diagnostic model according to the sepsis gene expression validation set.
5. The sepsis diagnostic model construction method according to claim 1, wherein the sepsis-related hub genes comprise: BCL6, ETS1, ETS2, FOS, MAPK14 and MYC.
6. A method of screening a compound, comprising:
obtaining main compounds in red sage root;
obtaining a targeting protein from the primary compound;
obtaining genes corresponding to the target proteins;
obtaining a sepsis diagnostic model constructed according to the sepsis diagnostic model construction method of any one of claims 1 to 5;
obtaining sepsis-related hub genes according to the sepsis diagnostic model;
obtaining common genes related to sepsis in red sage root according to the genes corresponding to the targeting proteins and the sepsis-related hub genes;
obtaining sepsis-related compounds in red sage root according to the common genes.
7. The method of screening compounds according to claim 6, wherein the common gene comprises: MYC, FOS, and MAPK14.
8. The compound screening method according to claim 6, wherein the sepsis-related compounds comprise: isotanshinone I, ursolic acid, dihydro Wu Ermei ketone, danshen-neoquinone B, danshen-neoquinone A, 2-(4-hydroxy-3-methoxyphenyl)-5-(3-hydroxypropyl)-7-methoxy-3-benzofurancarboxaldehyde, tanshinone IIA and methyl rosmarinate.
9. The method of compound screening according to claim 8, wherein the method of compound screening further comprises: a compound for treating sepsis is prepared using at least one of the sepsis-related compounds.
10. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the sepsis diagnostic model building method of any one of claims 1 to 5 or the compound screening method of any one of claims 6 to 8 when executing the instructions.
CN202311247147.6A 2023-09-26 2023-09-26 Sepsis diagnosis model construction method, compound screening method and electronic equipment Pending CN116994653A (en)

Publications (1)

Publication Number Publication Date
CN116994653A true CN116994653A (en) 2023-11-03

Family

ID=88534114


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination