CN107480441B - Modeling method and system for children septic shock prognosis prediction

Publication number
CN107480441B
Authority
CN
China
Prior art keywords
data
septic shock
children
prognosis
module
Legal status
Active
Application number
CN201710661510.7A
Other languages
Chinese (zh)
Other versions
CN107480441A (en)
Inventor
方芳
Current Assignee
Individual
Original Assignee
Individual

Abstract

The invention discloses a modeling method and system, based on a support vector machine, for predicting the prognosis of septic shock in children. Features are screened from high-throughput gene expression data on the prognosis of children septic shock, and the screened features are modeled with a Support Vector Machine (SVM) algorithm. This enables accurate prediction of the prognosis of children septic shock and provides molecular-level supplement and support for its clinical prognosis prediction.

Description

Modeling method and system for children septic shock prognosis prediction
Technical Field
The invention belongs to the field of bioinformatics, and relates to a modeling method and a system for children septic shock prognosis prediction based on a support vector machine.
Background
Sepsis is an inflammatory disorder with high mortality, and childhood sepsis is an important cause of death in children worldwide. Septic shock is the most severe type of sepsis, so it is important to develop a prognosis prediction technique for septic shock in children. At present, researchers mainly use biomarker decision tree models to predict septic shock in children. However, decision tree algorithms are prone to over-fitting and ignore correlations among attributes in the data set, so they handle this kind of machine learning problem poorly and the generalization error rate rises considerably.
Biomarker data mining and computer simulation are critical to the development of efficient prediction technologies. They are well suited to processing large-scale, noisy data of potential value and have become powerful tools in many research fields. Data mining and computer simulation studies of complex diseases were initially based on the interrelationships between variables, using logistic regression and network visualization techniques. In recent years, the advent of various high-throughput technologies has generated large volumes of data, and the use of complex systems methods has grown accordingly. A biomarker-based Support Vector Machine (SVM) machine learning algorithm can integrate high-dimensional, large-scale data, has strong generalization ability, can handle machine learning problems involving small sample sizes, high dimensionality and nonlinearity, and can reduce the generalization error rate. However, no SVM model for children septic shock prognosis based on expression profile data has been established so far.
Disclosure of Invention
To address the above problems, the invention provides a modeling method and system, based on a support vector machine, for predicting the prognosis of septic shock in children. Features are screened from high-throughput gene expression data on the prognosis of children septic shock, and the screened features are modeled with a Support Vector Machine (SVM) algorithm, thereby enabling accurate prediction of the prognosis of children septic shock and providing molecular-level supplement and support for its clinical prediction.
In a first aspect, the invention provides a modeling method for children septic shock prognosis prediction based on a support vector machine, which comprises the following steps:
(1) collecting high-throughput gene expression data on children septic shock from the GEO (Gene Expression Omnibus) data source;
(2) sequentially preprocessing and summarizing the high-throughput data to obtain preprocessed data;
(3) screening the preprocessed data for genes that are abnormally expressed in the death group relative to the survival group, to obtain an abnormally expressed gene data set for poor prognosis of children septic shock;
(4) performing format conversion on the abnormally expressed gene data set for poor prognosis of children septic shock to form a training biomarker data set;
(5) performing feature screening on the training biomarker data set and selecting the smallest feature set that achieves the highest prediction accuracy, which serves as the feature set for model construction;
(6) constructing a children septic shock prognosis prediction model from the feature set of step (5) and the training biomarker data set, using a Support Vector Machine (SVM) algorithm via the kernlab package in the R program.
The GEO (Gene Expression Omnibus) data source is a public repository that archives and freely distributes high-throughput gene expression data submitted by researchers. It stores about 10 billion individual gene expression measurements from over 100 organisms, and its website is www.ncbi.nih.gov/geo.
The basic principle of the Support Vector Machine (SVM) algorithm is as follows:
Given a training sample set $(x_i, y_i)$, $i = 1, 2, \ldots, N$, where $x_i \in \mathbb{R}^d$, $d$ is the dimension of the input space, $y_i \in \{-1, 1\}$ is the class label, and $N$ is the number of training samples, the general form of the linear discriminant function in the $d$-dimensional space is:
$$f(x) = w \cdot x + b,$$
and the equation of the classification plane is:
$$w \cdot x + b = 0,$$
where the coefficient $w$ is the weight vector and $b$ is the threshold.
Finding the optimal classification plane requires that the plane classify all samples correctly, i.e., that both classes of samples satisfy the constraint:
$$y_i (w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, N.$$
At the same time, to maximize generalization ability, the classification margin $2/\|w\|$ should be maximized, which is equivalent to
$$\min_{w, b} \ \frac{1}{2}\|w\|^2.$$
In the present invention, a support vector machine algorithm for the linearly inseparable case is preferably used. It requires a kernel function $K(x_i, x_j)$ to map the low-dimensional vectors into a higher-dimensional space so that the optimal classification plane can be found there. Some samples may still be inseparable after the mapping, so slack variables $\xi_i$ ($\xi_i \ge 0$), $i = 1, 2, \ldots, N$, can be introduced, and the relaxed classification plane constraint is:
$$y_i (w \cdot x_i + b) - 1 + \xi_i \ge 0, \quad i = 1, 2, \ldots, N.$$
To balance generalization ability against misclassification, a penalty term is added to
$$\min_{w, b} \ \frac{1}{2}\|w\|^2.$$
The penalty term is:
$$C \sum_{i=1}^{N} \xi_i,$$
and the objective function becomes:
$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i,$$
where $C$ is the error penalty factor, which sets how heavily misclassified sample points are penalized. Introducing the Lagrange function then gives the corresponding optimal classification function:
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{N} \alpha_i^{*} y_i K(x_i, x) + b^{*} \right),$$
where $\alpha_i^{*}$ are the optimal Lagrange multipliers and $b^{*}$ is the optimal threshold.
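As an illustration of the trade-off controlled by C, the following minimal Python sketch evaluates the standard soft-margin objective (margin term plus C times the total slack) for a fixed linear classifier. The sample points, the weight vector and the function names are hypothetical and chosen only for the example; each slack is taken as the smallest value that satisfies the relaxed constraint.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_margin_objective(w, b, C, xs, ys):
    """Return (0.5*||w||^2 + C*sum(slack), slacks) for a linear classifier.

    Each slack is max(0, 1 - y_i * (w.x_i + b)), i.e. the smallest value
    that satisfies y_i * (w.x_i + b) - 1 + slack_i >= 0.
    """
    slacks = [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(xs, ys)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks


# Hypothetical toy data: the third point lies on the wrong side of the plane.
xs = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
ys = [1, -1, -1]
objective, slacks = soft_margin_objective(w=(1.0, 0.0), b=0.0, C=10.0, xs=xs, ys=ys)
print(objective, slacks)
```

Only the misclassified third point contributes slack, so a larger C makes that single error weigh more heavily against the margin term.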
Preferably, step (1) further comprises screening the high-throughput data and downloading and extracting the screening results.
Preferably, the screening excludes animal sample data, adult sample data, data with too small a sample size and data with incomplete information, leaving sample data on septic shock in children.
Preferably, the animal sample data are sample data from non-human animal populations.
Preferably, the adult sample data are sample data with an age range above 18 years.
Preferably, the data with too small a sample size are sample data with fewer than 30 samples in total.
Preferably, the data with incomplete information are sample data that do not include both a survival group and a death group.
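The four exclusion rules can be read as a single predicate over candidate data sets. The sketch below is a minimal Python illustration under that reading; the record fields, the accession strings and the example values are all hypothetical and do not correspond to real GEO entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Series:
    accession: str            # hypothetical identifier, not a real GEO accession
    organism: str
    max_age_years: float      # oldest subject in the data set
    n_samples: int            # total number of samples
    has_survival_group: bool
    has_death_group: bool


def passes_screening(s: Series) -> bool:
    """Apply the exclusion rules: animal, adult, too small, incomplete."""
    return (
        s.organism == "Homo sapiens"                      # exclude animal data
        and s.max_age_years <= 18                         # exclude adult data
        and s.n_samples >= 30                             # exclude small data sets
        and s.has_survival_group and s.has_death_group    # exclude incomplete data
    )


candidates = [
    Series("SERIES_A", "Homo sapiens", 10.0, 98, True, True),
    Series("SERIES_B", "Mus musculus", 0.2, 40, True, True),
    Series("SERIES_C", "Homo sapiens", 65.0, 120, True, True),
    Series("SERIES_D", "Homo sapiens", 9.0, 12, True, True),
    Series("SERIES_E", "Homo sapiens", 12.0, 45, True, False),
]
kept = [s.accession for s in candidates if passes_screening(s)]
print(kept)
```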
Preferably, the high-throughput data on septic shock gene expression in step (1) are collected as follows:
the GEO database is searched with the keywords "sepsis" and/or "septic shock" to obtain high-throughput data on septic shock gene expression.
Preferably, the background correction in step (2) is performed by using RMA (Robust Multi-chip Average) function in the R program;
preferably, the normalization process is performed using a quantile method;
preferably, Median polish (Median smoothing) is used for data summarization.
Preferably, the screening in the step (3) is performed by using a limma program package in the R program;
preferably, the determination criteria for the abnormally expressed gene in step (3) are:
a gene is judged to be abnormally expressed when the absolute value of the logarithm of the fold difference in expression between the death group and the survival group is greater than or equal to 0.8 (for example 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4 or 1.5, or any value between or above these) and the corrected P-value is less than 0.05 (for example 0.04, 0.03, 0.02, 0.01 or 0.005, or any value between or below these). For brevity, the specific values within these ranges are not listed exhaustively.
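Example 1 describes the corrected P-value as a False Discovery Rate (FDR) correction. The sketch below shows one common FDR procedure, the Benjamini-Hochberg adjustment, in plain Python. Treating the correction as Benjamini-Hochberg is an assumption made for illustration (it is the default adjustment in limma's topTable), and the input P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]   # hypothetical raw P-values for four genes
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

In this toy input only the first gene stays below 0.05 after adjustment, although three raw P-values were below 0.05.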
Preferably, the format conversion in step (4) uses a Perl program to convert the abnormally expressed gene data set for poor prognosis of children septic shock into a data format accepted by the R program used for feature selection, for example the format shown in Table 2, or any other format that the R program can read.
Preferably, the characteristic screening in the step (5) is as follows:
an R program is used to construct feature ranking coefficients, and in each iteration the feature with the smallest ranking coefficient is removed, finally yielding a descending ranking of all features; the smallest feature set that achieves the highest prediction accuracy is then selected as the feature set for model construction.
Preferably, the children septic shock prognosis prediction model in step (6) is constructed using a support vector machine algorithm with a Gaussian kernel function,
wherein the formula of the Gaussian kernel function is as follows:
[Formula image GDA0002842627100000051: Gaussian kernel function K(x_i, x_j) with parameter σ]
Preferably, in step (6) the support vector machine algorithm is run on the subset of the training biomarker data set that belongs to the feature set from step (5); training yields the parameter σ of the Gaussian kernel function and the error penalty factor C of the support vector machine, and the children septic shock prognosis prediction model is then constructed.
Preferably, the parameter σ of the Gaussian kernel function is 0.05-0.5, for example 0.06, 0.07, 0.08, 0.09, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4 or 0.45, or any value in between (for brevity, the specific values within the range are not listed exhaustively), preferably 0.08-0.3, and more preferably 0.11-0.13.
Preferably, the error penalty factor C of the support vector machine is 8-15, for example 8, 9, 10, 11, 12, 13, 14 or 15, or any value in between (for brevity, the specific values within the range are not listed exhaustively), preferably 9-13, and more preferably 10-11. Within the above optimal parameter range, the AUC of the external test of the model can reach 0.722.
In particular, σ and the error penalty factor C are adjusted by the training data set to optimize the prediction of the trained children septic shock prognosis model, so that the values of the two important parameters are varied within a certain range.
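To make the role of σ concrete, the following Python sketch evaluates a Gaussian kernel and the resulting SVM decision value. Because the kernel formula itself appears only as an image in the source, the sketch assumes the parameterization documented for kernlab's rbfdot kernel, exp(-σ·||x - z||²); the support vectors, coefficients and bias are hypothetical toy values, not values from the patent's trained model.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Assumed kernlab rbfdot form: exp(-sigma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sigma * squared_distance)


def decision_value(x, support_vectors, coefficients, bias, sigma):
    """Sum over support vectors of coefficient_i * K(sv_i, x), plus the bias.

    coefficient_i stands for alpha_i * y_i of the trained model.
    """
    return sum(
        c * gaussian_kernel(sv, x, sigma)
        for sv, c in zip(support_vectors, coefficients)
    ) + bias


# Hypothetical toy model with two support vectors, one per class.
support_vectors = [(0.0, 0.0), (3.0, 3.0)]
coefficients = [1.0, -1.0]
sigma = 0.11  # value reported for the trained model in Example 1

value = decision_value((0.5, 0.5), support_vectors, coefficients, 0.0, sigma)
label = 1 if value >= 0 else -1
print(value, label)
```

Under this parameterization a larger σ makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood.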
In a second aspect, the present invention provides a modeling system for children septic shock prognosis prediction based on a support vector machine, comprising:
(1) a data collection module, used for collecting high-throughput gene expression data on children septic shock from the GEO data source;
(2) a data preprocessing module, connected to the data collection module and used for preprocessing and summarizing the high-throughput data to obtain preprocessed data;
(3) a screening module, connected to the data preprocessing module and used for screening the preprocessed data for genes that are abnormally expressed in the death group relative to the survival group, to obtain an abnormally expressed gene data set for poor prognosis of children septic shock;
(4) a data conversion module, connected to the screening module and used for performing format conversion on the abnormally expressed gene data set for poor prognosis of children septic shock to form a training biomarker data set;
(5) a feature screening module, connected to the data conversion module and used for performing feature screening on the training biomarker data set and selecting the smallest feature set that achieves the highest prediction accuracy, which serves as the feature set for model construction;
(6) a model building module, connected to the feature screening module and used for constructing a children septic shock prognosis prediction model from the feature set and the training biomarker data set, using a support vector machine algorithm via the kernlab package in the R program.
Compared with the prior art, the invention has at least the following beneficial effects:
the modeling method for the children septic shock prognosis prediction based on the support vector machine, provided by the invention, is used for carrying out feature screening according to high-throughput data of the children septic shock prognosis gene expression, and modeling a plurality of screened features by adopting a Support Vector Machine (SVM) algorithm, so that accurate prediction of the children septic shock prognosis is realized, and the supplementation and support of the molecular level are provided for the clinical prognosis prediction of the children septic shock.
Drawings
FIG. 1 is a process diagram of the modeling method for the children septic shock prognosis prediction based on the support vector machine of the invention.
FIG. 2 is a graph showing the results of feature screening in example 1.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It is further noted that, for the sake of convenience in this description, the drawings show only some of the results relevant to the present invention and not all of them.
Example 1
This embodiment provides a technical solution for the modeling method for children septic shock prognosis prediction based on a support vector machine. The method can be executed by a modeling device integrated in a computer device and comprises the following steps (the flow is shown in FIG. 1):
(1) selecting a data source: a GEO (Gene Expression Omnibus) database is selected as a data source.
(2) Searching the data source: the GEO data source is searched with the keywords "sepsis" and "septic shock", and high-throughput data on septic shock gene expression are collected.
(3) Screening the search results: the original search results are further screened to exclude adult samples (older than 18 years), data sets with too few samples (fewer than 30), and incomplete data sets that do not provide information for the complete sample population (i.e., that do not include both a survival group and a death group).
(4) Downloading and extracting the data for the screening results (the data are at the probe level) and loading them into the R program.
(5) Preprocessing the extracted data: preprocessing is performed with the RMA (Robust Multi-chip Average) function in the R program. Specifically, background correction uses the RMA method, normalization uses the quantile method, and summarization uses median polish.
The quantile normalization algorithm consists of three main steps:
a) sorting the data points of each chip;
b) calculating the average of all chips' values at the same rank position and replacing the expression value at that position with the average;
c) restoring each gene to its original position on its chip.
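Steps a) to c) can be sketched in a few lines of Python. This is a minimal illustration, not the RMA implementation used in the R program: it assumes every chip measures the same genes in the same order, it does not give tied values any special treatment, and the two small chips are hypothetical.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of chips (one list of expression values each)."""
    n_genes = len(chips[0])
    # a) sort each chip: order[r] is the gene index holding rank r on that chip
    orders = [sorted(range(n_genes), key=lambda g: chip[g]) for chip in chips]
    # b) average the values found at the same rank position across chips
    rank_means = [
        sum(chip[order[r]] for chip, order in zip(chips, orders)) / len(chips)
        for r in range(n_genes)
    ]
    # c) put each rank mean back at the gene's original position
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for r, g in enumerate(order):
            out[g] = rank_means[r]
        normalized.append(out)
    return normalized


chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # hypothetical values, 3 genes x 2 chips
normalized = quantile_normalize(chips)
print(normalized)
```

After normalization both chips contain exactly the same set of values (the rank means), while each gene keeps its within-chip rank.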
(6) Converting the processed probe-level data into gene-level data, with the following specific steps:
d) mapping the probe-level data to genes according to the probe-to-gene annotation file of the chip platform on which the raw expression profile data were generated;
e) deleting the data rows of probes that correspond to more than one gene (one-to-many) and of probes that correspond to no gene;
f) where multiple probes correspond to the same gene, taking their average as the expression level of that gene.
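Steps d) to f) amount to a filter followed by a group-wise average. The Python sketch below illustrates this under the reading that step e) drops probes mapped to several genes or to none; the probe identifiers, gene labels and expression values are hypothetical.

```python
from collections import defaultdict


def probes_to_genes(probe_values, probe_annotation):
    """Collapse probe-level rows into gene-level rows.

    probe_values:     {probe_id: [expression per sample]}
    probe_annotation: {probe_id: [gene labels the probe maps to]}
    """
    rows_by_gene = defaultdict(list)
    for probe, values in probe_values.items():
        genes = probe_annotation.get(probe, [])
        if len(genes) != 1:
            continue  # e) drop probes mapping to several genes or to none
        rows_by_gene[genes[0]].append(values)  # d) assign the row to its gene
    # f) average, sample by sample, over all probes of the same gene
    return {
        gene: [sum(column) / len(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


probe_values = {
    "probe_1": [2.0, 4.0],
    "probe_2": [4.0, 6.0],
    "probe_3": [9.0, 9.0],
    "probe_4": [7.0, 7.0],
    "probe_5": [1.0, 1.5],
}
probe_annotation = {
    "probe_1": ["GENE_A"],
    "probe_2": ["GENE_A"],
    "probe_3": ["GENE_B", "GENE_C"],  # one probe, several genes: dropped
    "probe_4": [],                    # no gene: dropped
    "probe_5": ["GENE_D"],
}
gene_values = probes_to_genes(probe_values, probe_annotation)
print(gene_values)
```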
(7) Screening of differentially expressed genes: using the limma package of the R program, genes for which the absolute value of the logarithm of the fold difference in expression between the death group and the survival group is greater than or equal to 0.8 and the False Discovery Rate (FDR) corrected P value is less than 0.05 are judged to be abnormally expressed. After summarizing, an abnormally expressed gene data set for poor prognosis of children septic shock is obtained; example data are shown in Table 1. Because there are about twenty thousand genes in total, they are not all listed, and only 5 abnormally expressed genes are shown here as examples.
TABLE 1 Examples of abnormally expressed gene data
Gene      Logarithm of fold difference in expression    FDR-corrected P value
Gene 1    1.717146732                                   0.029147425
Gene 2    1.358191894                                   0.035863019
Gene 3    1.283283163                                   0.002649534
Gene 4    -0.84291548                                   0.015277801
Gene 5    -0.837903188                                  0.022307329
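The screening criterion of step (7) can be checked directly against the five rows of Table 1. The Python sketch below does so; the function name and the split by sign are illustrative additions, and which direction of the logarithm corresponds to higher expression in the death group is not stated in the text, so the two lists are labeled only by sign.

```python
# Rows of Table 1: (gene, logarithm of fold difference, FDR-corrected P value)
TABLE_1 = [
    ("Gene 1", 1.717146732, 0.029147425),
    ("Gene 2", 1.358191894, 0.035863019),
    ("Gene 3", 1.283283163, 0.002649534),
    ("Gene 4", -0.84291548, 0.015277801),
    ("Gene 5", -0.837903188, 0.022307329),
]


def is_abnormal(log_fold, corrected_p, min_abs_log_fold=0.8, max_p=0.05):
    """Criterion of step (7): |log fold difference| >= 0.8 and corrected P < 0.05."""
    return abs(log_fold) >= min_abs_log_fold and corrected_p < max_p


selected = [row for row in TABLE_1 if is_abnormal(row[1], row[2])]
positive = [gene for gene, log_fold, _ in selected if log_fold > 0]
negative = [gene for gene, log_fold, _ in selected if log_fold < 0]
print(positive, negative)
```

All five example genes satisfy both thresholds, three with a positive and two with a negative logarithm of the fold difference.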
(8) Format conversion into the training biomarker data set: a Perl program converts the abnormally expressed gene data set for poor prognosis of children septic shock into the data format required by the R program for feature selection (shown in Table 2). The converted data set is the training biomarker data set.
Table 2 data format example
[Table 2: images GDA0002842627100000091 and GDA0002842627100000101]
(9) Feature screening: feature screening is performed on the training biomarker data set using an R program. Feature ranking coefficients (i.e., feature importance ranking coefficients) are constructed, and the feature with the smallest ranking coefficient is removed in each iteration, finally yielding a descending ranking of all features; the smallest feature set that achieves the highest prediction accuracy is selected as the feature set for model construction. The feature selection in this embodiment uses Recursive Feature Elimination (RFE). Its main idea is to build a model repeatedly, identify and exclude the worst feature, and then repeat the process on the remaining features; the order in which the features are eliminated gives the feature ranking. RFE is therefore an algorithm for finding the optimal feature subset. When RFE is used, all N features are first included in the model, and the model's performance and feature importance ranking are calculated; the N-1 most important features are retained, the model is rebuilt and its performance recalculated, and the iteration continues in this way until a suitable feature subset is found. In this embodiment, a random forest algorithm is used for model construction, performance evaluation and feature importance ranking in each iteration.
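The elimination loop itself is independent of the model used inside it. The Python sketch below shows the loop with two caller-supplied callbacks standing in for the random forest described above: `rank_features` returns an importance per feature and `score` returns an accuracy for a feature subset. Both callbacks, the feature names and the toy numbers are hypothetical.

```python
def recursive_feature_elimination(features, rank_features, score):
    """Backward elimination: drop the least important feature each round.

    Returns (ranking, best_subset, best_accuracy), where ranking lists the
    features from most to least important and best_subset is the smallest
    subset that reached the highest accuracy.
    """
    remaining = list(features)
    eliminated = []
    history = []  # (subset, accuracy) for every subset size tried
    while remaining:
        history.append((list(remaining), score(remaining)))
        importance = rank_features(remaining)
        worst = min(remaining, key=lambda f: importance[f])
        remaining.remove(worst)
        eliminated.append(worst)
    ranking = eliminated[::-1]  # last eliminated feature is the most important
    best_accuracy = max(acc for _, acc in history)
    best_subset = min(
        (subset for subset, acc in history if acc == best_accuracy), key=len
    )
    return ranking, best_subset, best_accuracy


# Hypothetical stand-ins for the model-based ranking and accuracy estimate.
TOY_IMPORTANCE = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}


def toy_rank(subset):
    return {f: TOY_IMPORTANCE[f] for f in subset}


def toy_score(subset):
    return 0.9 if {"a", "b"} <= set(subset) else 0.6


ranking, best_subset, best_accuracy = recursive_feature_elimination(
    ["a", "b", "c", "d"], toy_rank, toy_score
)
print(ranking, best_subset, best_accuracy)
```

In the toy run, accuracy stays at its maximum until only one feature is left, so the two-feature subset is chosen as the smallest set with the highest accuracy.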
The screening results are shown in FIG. 2. The highest accuracy is reached with as few as 11 genes, so these 11 genes (their Entrez ID numbers and gene names are listed in Table 3) are selected as the feature set for subsequent model construction.
TABLE 3
Entrez ID number of gene    Name of gene
54541                       DDIT4
5553                        PRG2
10875                       FGL2
55701                       ARHGEF40
5168                        ENPP2
100133941                   CD24
84419                       C15orf48
5657                        PRTN3
2867                        FFAR2
401233                      HTATSF1P2
7045                        TGFBI
(10) Constructing the early warning model: using the subset of the training biomarker data set that belongs to the 11-gene feature set, the kernlab package of the R program is used to run the Support Vector Machine (SVM) machine learning algorithm. Training yields a Gaussian kernel parameter σ of 0.11 and an error penalty factor C of 10, and the children septic shock prognosis prediction model is then constructed.
In the invention, an external verification method is adopted to respectively select an error penalty factor C and a parameter sigma of a Gaussian kernel function, and the specific steps are as follows:
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50 and 100 are respectively selected as test values of the error penalty factor C, 0.01, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.2, 0.3, 0.4, 0.5 and 1 are respectively selected as test values of the parameter sigma, and model construction is respectively performed according to the combination of the test values to obtain a series of test models.
Then an independent sample data set with known outcomes is used to verify the performance of each test model: a Receiver Operating Characteristic (ROC) curve is drawn, the prediction performance of the test model is assessed by the Area Under the Curve (AUC), and the error penalty factor C and parameter σ of the test model with the largest AUC are selected.
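The parameter selection just described is an exhaustive search over the listed test values. The Python sketch below enumerates the 16 × 17 combinations and keeps the pair with the largest score. The `external_auc` callback, which would train a test model and return its AUC on the independent data set, is left to the caller; the `toy_auc` function is a hypothetical stand-in that merely peaks at the values reported in Example 1 and is not a measured result.

```python
from itertools import product

# Test values listed in the description.
C_GRID = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 100]
SIGMA_GRID = [0.01, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12,
              0.13, 0.14, 0.15, 0.2, 0.3, 0.4, 0.5, 1]


def select_parameters(external_auc):
    """Return the (C, sigma) pair whose test model has the largest AUC."""
    return max(product(C_GRID, SIGMA_GRID), key=lambda pair: external_auc(*pair))


def toy_auc(C, sigma):
    """Hypothetical surface peaking at C = 10, sigma = 0.11 (not real data)."""
    return 0.7 - 0.001 * abs(C - 10) - 0.1 * abs(sigma - 0.11)


n_models = len(C_GRID) * len(SIGMA_GRID)
best_C, best_sigma = select_parameters(toy_auc)
print(n_models, best_C, best_sigma)
```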
Example 2 10-fold cross-validation
In 10-fold cross-validation, the training biomarker data set is randomly shuffled and divided into 10 parts; 1 part is used as test data and the other 9 parts as training data. This is repeated 10 times so that each part serves as test data exactly once, and the test results of all rounds are averaged to give the final estimate of model performance.
The cross-validation error of the model was 0.128571, and over repeated runs the error remained between 0.10 and 0.15. The error of the method is thus close to that of other models, indicating that the method has a certain reliability.
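The splitting and averaging logic of 10-fold cross-validation can be sketched as follows in Python. The `fit_and_error` callback, which would train a model on the training indices and return its error rate on the test indices, is hypothetical and supplied by the caller; the seed and the sample count in the usage lines are arbitrary.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validation_error(n_samples, fit_and_error, k=10, seed=0):
    """Average the test error over k rounds; each part is the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        errors.append(fit_and_error(train_idx, test_idx))
    return sum(errors) / k


folds = k_fold_indices(35)
fold_sizes = sorted(len(fold) for fold in folds)
print(fold_sizes)
```

Every sample appears in exactly one fold, so each sample is used for testing once and for training in the other nine rounds.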
In addition to cross-validation, which is an internal test, an external test was also performed: another independent sample data set was used to verify the performance of the model of the invention. The specific steps are as follows:
First, the independent sample data set used for verification (the external test data set) is predicted with the constructed model, giving the model-predicted prognosis for the external test data set.
Secondly, the prediction results are compared with the actual prognosis (i.e., the clinical outcome of each sample) to obtain the confusion matrix between them; an example of the confusion matrix format is shown in Table 4, where a is the number of actual positive samples predicted as positive, b is the number of actual positive samples predicted as negative, c is the number of actual negative samples predicted as positive, and d is the number of actual negative samples predicted as negative.
TABLE 4
                   Predicted positive    Predicted negative
Actual positive    a                     b
Actual negative    c                     d
Thirdly, drawing a Receiver Operating Characteristic (ROC) Curve according to the confusion matrix, and testing the prediction effect of the model according to an evaluation parameter, namely Area under the Curve (AUC).
AUC (Area Under the Curve) is the area under the Receiver Operating Characteristic (ROC) curve and typically lies between 0.5 and 1. The AUC is a probability: if one positive sample and one negative sample are drawn at random from all samples, the AUC is the probability that the constructed classification model ranks the positive sample ahead of the negative sample. The larger the AUC, the more likely the model is to rank positive samples ahead of negative samples, and the better the classification. The AUC can therefore serve as a single number for evaluating the prediction performance of a model: the larger, the better.
Thus, if samples are classified completely at random, the AUC should be close to 0.5. An AUC below 0.5 does not correspond to a realistic case and rarely occurs in practice. For AUC > 0.5, the closer the AUC is to 1, the better the predictive performance of the model. The AUC of the external test is 0.722, which lies between 0.7 and 0.8; compared with the performance indexes of prediction models for other diseases, the performance of this method is comparable or better, so the prediction capability of the model constructed by the invention for the prognosis of septic shock in children is acceptable.
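The probabilistic reading of the AUC given above translates directly into a pairwise count. The Python sketch below computes the AUC as the fraction of positive-negative pairs in which the positive sample receives the higher score, counting ties as one half (a common convention, assumed here); the score lists are hypothetical and unrelated to the reported value of 0.722.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the probability that a positive sample outranks a negative one."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores for three positive and three negative samples.
positives = [0.9, 0.6, 0.4]
negatives = [0.7, 0.3, 0.2]
auc = auc_from_scores(positives, negatives)
print(auc)
```

Seven of the nine pairs are ordered correctly in this toy example, so the AUC is 7/9; identical scores for all samples would give exactly 0.5.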
In conclusion, the support vector machine-based modeling method for the children septic shock prognosis prediction carries out feature screening according to high-throughput data of the children septic shock prognosis gene expression, models a plurality of screened features by adopting a Support Vector Machine (SVM) algorithm, realizes accurate prediction of the children septic shock prognosis, and provides supplement and support of molecular level for clinical prognosis prediction of the children septic shock.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (3)

1. A modeling system for children septic shock prognosis prediction based on a support vector machine is characterized by comprising:
(1) a data collection module, used for collecting high-throughput data on septic shock gene expression from the GEO data source;
(2) a data preprocessing module, connected to the data collection module and used for preprocessing and summarizing the high-throughput data to obtain preprocessed data;
(3) a screening module, connected to the data preprocessing module and used for screening the preprocessed data for genes that are abnormally expressed in the death group relative to the survival group, to obtain an abnormally expressed gene data set for poor prognosis of children septic shock;
(4) a data conversion module, connected to the screening module and used for performing format conversion on the abnormally expressed gene data set for poor prognosis of children septic shock to form a training biomarker data set, wherein the format conversion uses a Perl program to convert the abnormally expressed gene data set for poor prognosis of children septic shock into a data format accepted by the R program used for feature selection;
(5) a feature screening module, used for constructing feature ranking coefficients with an R program, removing the feature with the smallest ranking coefficient from the training biomarker data set in each iteration, finally obtaining a descending ranking of all features, and selecting the smallest feature set that achieves the highest prediction accuracy for children septic shock prognosis, which serves as the feature set for model construction;
(6) a model building module, connected to the feature screening module and used for constructing a children septic shock prognosis prediction model from the feature set and the training biomarker data set, using a support vector machine algorithm via the kernlab package in the R program, wherein the support vector machine algorithm is run on the subset of the training biomarker data set that belongs to the feature set, training yields the parameter σ of the Gaussian kernel function and the error penalty factor C of the support vector machine, and the children septic shock prognosis prediction model is then constructed; the parameter σ of the Gaussian kernel function is 0.05-0.5, and the error penalty factor C of the support vector machine is 8-15.
2. The system of claim 1, wherein the parameter σ of the gaussian kernel function is 0.08-0.3, and the error penalty factor C of the support vector machine is 9-13.
3. The system of claim 2, wherein the parameter σ of the gaussian kernel function is 0.11-0.13, and the error penalty factor C of the support vector machine is 10-11.
CN201710661510.7A 2017-08-04 2017-08-04 Modeling method and system for children septic shock prognosis prediction Active CN107480441B (en)


Publications (2)

Publication Number Publication Date
CN107480441A (en) 2017-12-15
CN107480441B (en) 2021-02-09

Family

ID=60597607

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant