CN112992377A - Method, device, terminal and storage medium for generating drug treatment result prediction model - Google Patents

Publication number
CN112992377A
Authority
CN
China
Prior art keywords
training data
drug treatment
clinical
prediction model
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110234102.XA
Other languages
Chinese (zh)
Inventor
赵霞
胡湛棋
廖建湘
赵彩蕾
段婧
袁碧霞
叶园珍
操德智
朱凤军
姚一
曾洪武
李德发
干芸根
王海峰
苏适
杨俊�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Childrens Hospital
Original Assignee
Shenzhen Childrens Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Childrens Hospital
Priority to CN202110234102.XA
Publication of CN112992377A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention discloses a method, a device, a terminal and a storage medium for generating a drug treatment result prediction model. The method comprises the following steps: acquiring clinical data of a plurality of patients, and generating at least one first training data set according to the clinical data of the plurality of patients, wherein each first training data set comprises a plurality of groups of training data, and each group of training data comprises sample clinical features and a corresponding drug treatment result; constructing a plurality of initial models according to at least one machine learning algorithm, and training the initial models according to the first training data sets to obtain a plurality of candidate models; and determining a drug treatment result prediction model according to the test results of the plurality of candidate models. The invention can generate a machine learning model that predicts the drug treatment result more accurately, so that the drug treatment result of a patient can be predicted by the drug treatment result prediction model to determine whether the patient is drug-resistant, which shortens the time needed to identify drug-resistant patients.

Description

Method, device, terminal and storage medium for generating drug treatment result prediction model
Technical Field
The invention relates to the technical field of medical treatment, in particular to a method, a device, a terminal and a storage medium for generating a drug treatment result prediction model.
Background
Tuberous sclerosis is an autosomal dominant hereditary disease caused by gene mutation. Most patients with tuberous sclerosis have epileptic seizures, and among the many manifestations of tuberous sclerosis, epilepsy is one of the symptoms that most affects quality of life. The main treatment for epilepsy is antiepileptic drugs; however, many epilepsy patients are drug-resistant, so early identification of patients for whom drug treatment is ineffective is very important. At present, a patient's drug resistance can be found only after the drugs have been used repeatedly over a long period without effect, and this process takes a long time.
Thus, there is a need for improvements and enhancements in the art.
Disclosure of Invention
In view of the defects in the prior art, the invention provides a method, a device, a terminal and a storage medium for generating a drug treatment result prediction model, to solve the problem that identifying drug-resistant patients takes a long time in the prior art.
In a first aspect of the present invention, a method for generating a model for predicting the outcome of a drug treatment is provided, which includes:
acquiring clinical data of a plurality of patients, and generating at least one first training data set according to the clinical data of the plurality of patients, wherein each first training data set comprises a plurality of groups of training data, and each group of training data comprises a sample clinical characteristic and a corresponding drug treatment result;
constructing a plurality of initial models according to at least one machine learning algorithm, and training the initial models according to the first training data sets to obtain a plurality of candidate models;
and determining a drug treatment result prediction model according to the test results of the plurality of candidate models.
The method for generating a model for predicting the outcome of a drug treatment, wherein the classes of the sample clinical features in the training data of each of the training data sets are consistent, and the generating at least one first training data set according to the clinical data of the plurality of patients comprises:
extracting a plurality of feature classes from the clinical data of the plurality of patients;
performing feature selection on the plurality of feature classes by using at least one preset feature selection method to determine classes of sample clinical features in the at least one first training data set;
constructing the first training data set according to the category of the sample clinical features.
The method for generating a model for predicting the outcome of drug treatment, wherein the performing feature selection on the plurality of feature classes by using at least one preset feature selection method to determine the class of the sample clinical features in the at least one first training data set, comprises:
selecting, from the plurality of feature categories, a preset number of feature categories as the categories of the sample clinical features in a target first training data set by using a target preset feature selection method.
The method for generating the drug treatment result prediction model, wherein the preset feature selection method comprises at least one of an analysis of variance test, a chi-square test and mutual information.
The method for generating the drug treatment result prediction model, wherein the machine learning algorithm comprises at least one of a decision tree, a random forest, a support vector machine, naive Bayes, logistic regression and a multilayer perceptron.
The method for generating the drug treatment result prediction model, wherein the determining the drug treatment result prediction model according to the test results of the plurality of candidate models, comprises:
acquiring a receiver operating characteristic curve of each candidate model;
taking the candidate model with the largest area under the receiver operating characteristic curve as a target model;
training the target model according to a second training data set to generate the drug treatment result prediction model;
wherein the second training data set comprises a plurality of groups of training data, the categories of the sample clinical features in each group of training data are consistent with the categories of the sample clinical features in the first training data set corresponding to the target model, and the number of groups of training data in the second training data set is larger than that in the first training data set.
The method for generating a prediction model of drug treatment outcome, wherein after determining the prediction model of drug treatment outcome from the plurality of candidate models, the method further comprises:
acquiring clinical data of a target patient, and extracting clinical features of the target patient from the clinical data of the target patient;
inputting the clinical features into the trained drug treatment result prediction model, and determining a drug treatment prediction result of the target patient through the drug treatment result prediction model;
wherein the feature categories of the clinical features of the target patient are consistent with the feature categories of the sample clinical features in the training data set used in training the drug treatment result prediction model.
In a second aspect of the present invention, there is provided a medication result prediction model generation apparatus, including:
the training data generating module is used for acquiring clinical data of a plurality of patients and generating at least one first training data set according to the clinical data of the plurality of patients, each first training data set comprises a plurality of groups of training data, and each group of training data comprises a sample clinical characteristic and a corresponding drug treatment result;
the training module is used for constructing a plurality of initial models according to at least one machine learning algorithm and respectively training the initial models according to each first training data set to obtain a plurality of candidate models;
and the determining module is used for determining a drug treatment result prediction model according to the test results of the plurality of candidate models.
In a third aspect of the present invention, a terminal is provided, which includes a processor and a storage medium in communication with the processor, wherein the storage medium is adapted to store a plurality of instructions, and the processor is adapted to call the instructions in the storage medium to execute the steps of any one of the above methods for generating a drug treatment result prediction model.
In a fourth aspect of the present invention, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs, which are executable by one or more processors to implement the steps of any one of the above methods for generating a drug treatment result prediction model.
Beneficial effects: compared with the prior art, the invention provides a method, a device, a terminal and a storage medium for generating a drug treatment result prediction model, which extract different categories of features from the existing clinical data of patients, construct different initial models by using different machine learning algorithms, and select, from the models trained on the different categories of sample features, the drug treatment result prediction model that is finally used for predicting the drug treatment result. A machine learning model that predicts the drug treatment result more accurately can thus be generated, so that the drug treatment result of a patient can be predicted by the drug treatment result prediction model from the features extracted from the patient's clinical data to determine whether the patient is drug-resistant, which shortens the time needed to identify drug-resistant patients.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for generating a model for predicting the outcome of a drug treatment provided by the present invention;
FIG. 2 is a logic diagram of the process of generating and using the drug treatment result prediction model in an embodiment of the method for generating a drug treatment result prediction model provided by the present invention;
FIG. 3 is a statistical graph of the area under the curve of the receiver operating characteristic curve of each candidate model in an embodiment of the method for generating a model for predicting the outcome of medication provided by the present invention;
FIG. 4 is a schematic diagram of a receiver operating characteristic curve of a target model in an embodiment of a method for generating a model for predicting a medication outcome provided by the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of a model generation apparatus for predicting the outcome of a drug treatment provided by the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of a terminal provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method for generating the drug treatment result prediction model provided by the invention can be applied to terminals, and the terminals can be but are not limited to various personal computers, notebook computers, mobile phones, tablet computers and the like.
Example one
As shown in FIG. 1, the method for generating a drug treatment result prediction model provided by the present invention comprises the steps of:
S100, obtaining clinical data of a plurality of patients, and generating at least one first training data set according to the clinical data of the plurality of patients, wherein each first training data set comprises a plurality of groups of training data, and each group of training data comprises sample clinical features and a corresponding drug treatment result.
The present invention generates the drug treatment result prediction model based on supervised machine learning, which learns a mapping from input to output from existing input-output data pairs. One input-output pair can be represented as a tuple (x, y), referred to as a training example, where x is the input and y is the output. A plurality of training examples constitute a training set. The supervised learning method derives a function f: x → y from the training set. This function can also be applied to an input x' that is not in the training set. Assume that the correct output for input x' is y'. In the ideal case, the prediction obtained after inputting x' to the function f,
$$\hat{y}' = f(x'),$$
is equal to the correct label, i.e.
$$\hat{y}' = y'.$$
In the supervised learning method, the categories of features input in the training set, the type and parameters of the machine learning algorithm, and the like directly influence the prediction performance of the generated model. In this embodiment, the feature categories contained in the existing clinical data are selected by different feature selection methods to generate training data sets containing different categories of features, and the categories of the sample clinical features in the training data of each training data set are consistent. Specifically, generating at least one first training data set according to the clinical data of the plurality of patients comprises:
S110, extracting a plurality of feature categories from the clinical data of the plurality of patients;
S120, performing feature selection on the plurality of feature categories by using at least one preset feature selection method to determine the categories of the sample clinical features in the at least one first training data set;
S130, constructing the first training data set according to the categories of the sample clinical features.
Specifically, the clinical data of a patient includes the patient's personal information, medical history data, genetic data, MR image data, CT image data, etc. Each feature is obtained by converting an item of data into a numerical value, that is, each feature category corresponds to a data category. After the clinical data of the plurality of patients are obtained, the data are preprocessed. When predicting drug resistance in epilepsy, the data of patients without epilepsy and of patients with epilepsy who were not treated by medication alone are removed, and the date, name, date of birth and other information irrelevant to the task are removed from the data. In practice, some data may be missing. Where a default value exists, the missing data can be filled with it; for example, the number of lesions can default to 0 for patients whose record gives no details or who were not examined. Otherwise, continuous values (such as age) can be filled with the median and discrete values (such as sex) with the mode. The treatment results in the patient data are stored separately as target values. After the preprocessed data are converted into numerical values, the feature values of each patient form a feature vector of length m, where m is the number of feature categories. For example, in the feature vector of the i-th patient,
$$x_i = [v_1, v_2, \ldots, v_m]^T,$$
the first value $v_1$ represents sex and the second value $v_2$ represents the age of onset. The feature vectors of all n patients form an m × n feature matrix $X_{m \times n} = [x_1, x_2, \ldots, x_n]$. $X_{m \times n}$ is then viewed as m column vectors, one per feature category:
$$X_{m \times n} = [f_1, f_2, \ldots, f_m]^T.$$
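The missing-value handling and the construction of the m × n feature matrix described above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the invention: the field names (`sex`, `age_of_onset`, `lesion_count`), the sample records and the helper names `impute` and `to_feature_matrix` are assumptions made for the example.

```python
from statistics import median, mode

# Hypothetical records; None marks a missing value. Field names are illustrative only.
patients = [
    {"sex": 1, "age_of_onset": 0.5, "lesion_count": 3},
    {"sex": 0, "age_of_onset": None, "lesion_count": None},
    {"sex": None, "age_of_onset": 2.0, "lesion_count": 1},
    {"sex": 1, "age_of_onset": 1.0, "lesion_count": None},
]

CONTINUOUS = ["age_of_onset"]    # missing values filled with the median
DISCRETE = ["sex"]               # missing values filled with the mode
DEFAULTS = {"lesion_count": 0}   # missing values filled with a fixed default
FEATURES = ["sex", "age_of_onset", "lesion_count"]  # fixed feature order, m = 3


def impute(records):
    """Return copies of the records with every missing value filled in."""
    fill = dict(DEFAULTS)
    for name in CONTINUOUS:
        fill[name] = median(r[name] for r in records if r[name] is not None)
    for name in DISCRETE:
        fill[name] = mode(r[name] for r in records if r[name] is not None)
    return [{k: (fill[k] if v is None else v) for k, v in r.items()} for r in records]


def to_feature_matrix(records):
    """Build the m x n matrix: one row per feature category, one column per patient."""
    return [[float(r[name]) for r in records] for name in FEATURES]


X = to_feature_matrix(impute(patients))
for name, row in zip(FEATURES, X):
    print(name, row)
```

Each row of `X` corresponds to one vector f_i, and each column to one patient vector x_i.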
In a possible implementation, in order to facilitate data processing, a normalization operation is further performed on each feature value. The normalization formula is:
$$f_i := \frac{f_i - \min(f_i)}{\max(f_i) - \min(f_i)}$$
where := denotes assignment, $\max(f_i)$ denotes the maximum value in the vector $f_i$, and $\min(f_i)$ denotes the minimum value in the vector $f_i$.
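As an illustration of this normalization, the sketch below rescales one feature vector to the range [0, 1] using its minimum and maximum. The function name `min_max_normalize` and the sample values are assumptions for the example; the handling of a constant feature (all values equal) is also an assumption, since the description does not cover that case.

```python
def min_max_normalize(f):
    """Rescale one feature vector f_i to the range [0, 1]."""
    lo, hi = min(f), max(f)
    if hi == lo:
        # Assumption: a constant feature is mapped to all zeros to avoid dividing by zero.
        return [0.0 for _ in f]
    return [(v - lo) / (hi - lo) for v in f]


age_of_onset = [0.5, 1.0, 2.0, 1.0]  # hypothetical values for one feature category
normalized = min_max_normalize(age_of_onset)
print(normalized)
```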
After each feature value is processed, feature selection is performed by using at least one preset feature selection method. Specifically, the preset feature selection method includes at least one of an analysis of variance test, a chi-square test and mutual information, and performing feature selection on the plurality of feature categories by using at least one preset feature selection method to determine the categories of the sample clinical features in the at least one first training data set includes:
selecting, from the plurality of feature categories, a preset number of feature categories as the categories of the sample clinical features in a target first training data set by using a target preset feature selection method.
The clinical data of the patients are processed to obtain a plurality of feature categories and the feature values under each feature category. A preset number of feature categories are then selected from them by at least one preset feature selection method; after selection, the size of the feature matrix X changes from m × n to k × n, where k is the preset number. There may be several preset numbers, for example 20, 25 and 30. For example, when feature selection is performed with the analysis of variance test, the top 20, top 25 and top 30 feature categories are selected respectively, giving 3 feature matrices of sizes 20 × n, 25 × n and 30 × n, from which three first training data sets can be generated. Each first training data set includes n groups of training data, the number of sample clinical features in each group is 20, 25 or 30 respectively, and each group of training data also includes the drug treatment result (whether the patient is drug-resistant) corresponding to the sample clinical features.
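The reduction from an m × n matrix to a k × n matrix can be sketched with the analysis of variance F statistic as the ranking score. This is a minimal plain-Python sketch under stated assumptions: the helper names `anova_f` and `select_top_k`, the toy matrix and the toy labels are all hypothetical, and the code assumes that at least two outcome classes are present. It is not the invention's own implementation.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature against the class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group and within-group sums of squares.
    between = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values())
    within = sum((v - sum(vs) / len(vs)) ** 2 for vs in groups.values() for v in vs)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (g - 1)) / (within / (n - g))


def select_top_k(X, labels, k):
    """Keep the k rows (feature categories) of the m x n matrix X with the highest F score."""
    ranked = sorted(range(len(X)), key=lambda i: anova_f(X[i], labels), reverse=True)
    kept = sorted(ranked[:k])
    return kept, [X[i] for i in kept]


# Toy 3 x 4 matrix: rows are feature categories, columns are patients.
X = [
    [0.1, 0.2, 0.9, 0.8],  # separates the two outcome classes well
    [0.5, 0.4, 0.5, 0.6],  # weakly related to the outcome
    [0.0, 1.0, 0.0, 1.0],  # unrelated to the outcome
]
labels = [0, 0, 1, 1]      # hypothetical drug treatment results (1 = drug-resistant)

# One first training data set per preset number k.
datasets = {k: select_top_k(X, labels, k) for k in (1, 2)}
print(datasets[2][0])  # indices of the kept feature categories
```

Running the same selection with several values of k yields one feature matrix, and therefore one first training data set, per preset number.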
Clearly, a plurality of first training data sets can be constructed by the above method. In this embodiment, the first training data sets are used to make a preliminary assessment of how well each model predicts the drug treatment result, in order to make a preliminary model selection. The number of groups of training data in each first training data set can therefore be small; after the model is selected, it is further trained on a second training data set that has more groups of training data, as described in detail later.
Referring to FIG. 1 again, the method for generating a drug treatment result prediction model further includes the following steps:
S200, constructing a plurality of initial models according to at least one machine learning algorithm, and training the initial models respectively according to the first training data sets to obtain a plurality of candidate models.
In this embodiment, model training is performed according to each first training data set, and since different machine learning algorithms may have different effects, in order to select a machine learning algorithm more suitable for predicting a medication result, in this embodiment, different initial models are constructed according to different machine learning algorithms, and then training is performed according to each first training data set, and then selection is performed.
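The pairing of first training data sets with machine learning algorithms can be enumerated as in the sketch below. The selector and algorithm names are plain string labels for the methods named in this document, and the k values 20, 35 and 50 plus the group without feature selection follow the experiment reported in this embodiment. Using exactly one hyper-parameter setting per algorithm is an assumption made to keep the example small.

```python
from itertools import product

SELECTORS = ["anova_f", "chi_square", "mutual_information"]
K_VALUES = [20, 35, 50]
ALGORITHMS = [
    "decision_tree", "random_forest", "svm",
    "naive_bayes", "logistic_regression", "mlp",
]

# Each first training data set is identified by (selector, k);
# (None, None) stands for the comparison group without feature selection.
datasets = [(None, None)] + list(product(SELECTORS, K_VALUES))

# One candidate model per (data set, algorithm) pair.
candidates = [(sel, k, algo) for (sel, k), algo in product(datasets, ALGORITHMS)]

print(len(datasets), len(candidates))  # 10 60
```

With several hyper-parameter settings per algorithm, the list of candidates would grow by the corresponding factor.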
Specifically, the machine learning algorithm includes at least one of a decision tree, a random forest, a support vector machine, naive Bayes, logistic regression and a multilayer perceptron. For each machine learning algorithm, several sets of hyper-parameters may be selected to build initial models, i.e., for each machine learning algorithm, multiple initial models may be built. As shown in FIG. 2, a plurality of different candidate models can be obtained by combining the training data sets produced by different preset feature selection methods with different machine learning methods for model training. After the candidate models are screened according to their performance in predicting the drug treatment result, the drug treatment result prediction model finally used for predicting the drug treatment result of a new patient is determined; that is, the method for generating the drug treatment result prediction model provided by this embodiment further includes the step of:
S300, determining a drug treatment result prediction model according to the test results of the plurality of candidate models.
Specifically, the determining a prediction model of the drug treatment result according to the test results of the plurality of candidate models includes:
S310, obtaining the receiver operating characteristic curve of each candidate model.
The receiver operating characteristic curve is drawn by varying the decision threshold of the classifier model, with the false positive rate of the classifier on the horizontal axis and the true positive rate on the vertical axis. The area under the curve reflects the classification performance of the classifier model: the closer it is to 1.0, the better the performance; the closer it is to 0.5, the closer the classifier is to random guessing, with no predictive value; a value below 0.5 means the classifier performs worse than random guessing. The area under the curve of a normal, effective classifier model lies between 0.5 and 1.0.
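The construction of the curve and its area can be sketched in plain Python as follows. The function names `roc_points` and `auc`, the labels and the scores are assumptions for illustration; the area is computed with the trapezoidal rule, which is one common choice and is not stated in the description.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points for a threshold sweep."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is predicted positive if score >= threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores; 1 = drug-resistant, 0 = not drug-resistant.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
area = auc(roc_points(labels, scores))
print(round(area, 4))  # 0.8889
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly, which matches the area of 8/9.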
S320, taking the candidate model with the largest area under the receiver operating characteristic curve as a target model.
The higher the area under the curve of the receiver operating characteristic curve is, the better the prediction performance of the medication result of the corresponding candidate model is, in this embodiment, the candidate model with the highest area under the curve of the receiver operating characteristic curve is selected as the target model.
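Selecting the target model then amounts to taking the maximum over the candidates. The sketch below assumes that a mean area under the curve has already been computed for each candidate; the candidate names and scores are placeholders and are not results from this document.

```python
def pick_target(mean_auc_by_candidate):
    """Return the candidate with the largest mean area under the curve, and that area."""
    best = max(mean_auc_by_candidate, key=mean_auc_by_candidate.get)
    return best, mean_auc_by_candidate[best]


# Placeholder scores for three hypothetical candidates.
scores = {"candidate_a": 0.61, "candidate_b": 0.74, "candidate_c": 0.68}
target, target_auc = pick_target(scores)
print(target, target_auc)  # candidate_b 0.74
```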
An experiment was conducted with the method provided in this embodiment, using a data set of 103 patients with 155 features and 1 target feature. After data preprocessing, 102 patients and 109 features remained. In the experiment, three methods were used for feature selection: the analysis of variance F-test, the chi-square test and mutual information. The number k of selected features was 20, 35 and 50, and a group without feature selection was added for comparison. Six machine learning methods were used: decision tree, random forest, support vector machine, naive Bayes, logistic regression and multilayer perceptron. The number of trees in the random forest was 100, the kernel function of the support vector machine was a radial basis function, and the multilayer perceptron comprised 1 hidden layer with 100 neurons, with the rectified linear unit as activation function. Stratified ten-fold cross-validation was used to validate the models built from each combination of feature selection method and machine learning method. Each experiment was repeated 50 times, the area under the receiver operating characteristic curve was recorded, and the mean and 95% confidence interval of the area under the curve were calculated. The experimental results are shown in FIG. 3, which gives the area under the curve and its 95% confidence interval for each method (for each machine learning method in FIG. 3, the bars show, from left to right, the area under the receiver operating characteristic curve for no feature selection, F-test selecting 20, F-test selecting 35, F-test selecting 50, chi-square test selecting 20, chi-square test selecting 35, chi-square test selecting 50, mutual information selecting 20, mutual information selecting 35 and mutual information selecting 50). The best result was obtained by selecting 35 features with the analysis of variance F-test and classifying with a multilayer perceptron. Its receiver operating characteristic curve is shown in FIG. 4; the area under the curve reaches 0.812, with a 95% confidence interval of (0.807, 0.817). This shows that the method provided by this embodiment is feasible.
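Two parts of this evaluation protocol can be sketched in plain Python: assigning patients to stratified folds, and summarising repeated measurements by a mean and a 95% confidence interval. The description does not state how the confidence interval was computed, so the normal approximation below is an assumption, as are the function names, the class split of the 102 patients and the sample values.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, n_folds, seed):
    """Assign each sample index to a fold so that every fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin over the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def mean_and_ci95(values):
    """Mean and normal-approximation 95% confidence interval of the mean."""
    m = mean(values)
    half = 1.96 * stdev(values) / len(values) ** 0.5
    return m, (m - half, m + half)


# Hypothetical class split for 102 patients (the real split is not given in the text).
labels = [0] * 60 + [1] * 42
folds = stratified_folds(labels, n_folds=10, seed=0)
print([len(f) for f in folds])

# Placeholder measurements from repeated runs; not results from this document.
m, (low, high) = mean_and_ci95([0.80, 0.82, 0.81, 0.79, 0.83])
print(round(m, 3), round(low, 3), round(high, 3))
```

In each round one fold serves as the test set and the remaining nine as the training set; repeating the whole procedure with different seeds gives the repeated measurements that are averaged.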
S330, training the target model according to a second training data set to generate the drug treatment result prediction model.
The second training data set comprises a plurality of groups of training data, the categories of the sample clinical features in each group of training data are consistent with the categories of the sample clinical features in the first training data set corresponding to the target model, and the number of groups of training data in the second training data set is larger than that in the first training data set. After the target model is determined, the clinical data of more patients who have undergone drug treatment for epilepsy are collected, and feature extraction and preprocessing are performed using the clinical feature categories corresponding to the target model to generate the training data in the second training data set. Like the first training data sets, the second training data set includes multiple groups of training data, and each group includes sample clinical features and the corresponding treatment result.
After the target model is trained by using the second training data set, the medication result prediction model for predicting whether a new patient is resistant to the drug is generated, that is, after the medication result prediction model is determined according to the test results of the plurality of candidate models, the method includes the following steps:
acquiring clinical data of a target patient, and extracting clinical features of the target patient from the clinical data of the target patient;
inputting the clinical features into the trained drug treatment result prediction model, and determining a drug treatment prediction result of the target patient through the drug treatment result prediction model;
wherein the feature categories of the clinical features of the target patient are consistent with the feature categories of the sample clinical features in the training data set used in training the drug treatment result prediction model.
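The requirement that the target patient's features match the training feature categories can be enforced before prediction, as in the sketch below. Everything here is hypothetical: the feature names, the helpers `build_input` and `predict_drug_resistant`, the 0.5 threshold, and `toy_model`, which merely stands in for a trained drug treatment result prediction model.

```python
# Hypothetical feature categories kept by feature selection during training.
TRAINING_FEATURES = ["sex", "age_of_onset", "lesion_count"]


def build_input(patient, feature_names=TRAINING_FEATURES):
    """Order a patient's features exactly as in training; fail if a category is missing."""
    missing = [name for name in feature_names if name not in patient]
    if missing:
        raise ValueError(f"missing feature categories: {missing}")
    return [float(patient[name]) for name in feature_names]


def predict_drug_resistant(model, patient, threshold=0.5):
    """Return True if the model's score for drug resistance reaches the threshold."""
    return model(build_input(patient)) >= threshold


def toy_model(x):
    """Stand-in for a trained model: maps a feature vector to a score in [0, 1]."""
    return 0.9 if x[2] > 2 else 0.1


new_patient = {"lesion_count": 3, "sex": 1, "age_of_onset": 0.5}
print(predict_drug_resistant(toy_model, new_patient))  # True
```

Keeping the feature list from training alongside the model is one simple way to make sure the two stay consistent.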
In summary, this embodiment provides a method for generating a drug treatment result prediction model, which extracts different categories of features from existing patient clinical data, constructs different initial models by using different machine learning algorithms, and selects, from the models trained on the different categories of sample features, the drug treatment result prediction model that is finally used for predicting the drug treatment result. A machine learning model that predicts the drug treatment result more accurately can thus be generated, so that the drug treatment result of a patient can be predicted by the drug treatment result prediction model from the features extracted from the patient's clinical data to determine whether the patient is drug-resistant, which shortens the time needed to identify drug-resistant patients.
It should be understood that, although the steps in the flowcharts shown in the figures of the present specification are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not limited to being performed in the exact order disclosed and may be performed in other orders. Moreover, at least some of the steps may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, databases, or other media used in embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Example two
Based on the above embodiment, the present invention also provides a device for generating a drug treatment result prediction model, as shown in FIG. 5, including:
a training data generation module, configured to obtain clinical data of multiple patients, and generate at least one first training data set according to the clinical data of the multiple patients, where each first training data set includes multiple groups of training data, and each group of training data includes sample clinical features and a corresponding drug treatment result, as described in detail in Example one;
a training module, configured to construct a plurality of initial models according to at least one machine learning algorithm, and train the initial models according to each of the first training data sets to obtain a plurality of candidate models, as described in detail in Example one;
a determining module, configured to determine a drug treatment result prediction model according to the test results of the plurality of candidate models, as described in detail in Example one.
EXAMPLE III
Based on the above embodiments, the present invention further provides a terminal, and a schematic block diagram thereof may be as shown in fig. 6. The terminal comprises a processor 10 and a memory 20, wherein the memory 20 stores a computer program, and the processor 10 executes the computer program to realize at least the following steps:
acquiring clinical data of a plurality of patients, and generating at least one first training data set according to the clinical data of the plurality of patients, wherein each first training data set comprises a plurality of groups of training data, and each group of training data comprises a sample clinical characteristic and a corresponding drug treatment result;
constructing a plurality of initial models according to at least one machine learning algorithm, and training the initial models on each first training data set to obtain a plurality of candidate models;
and determining a drug treatment result prediction model according to the test results of the plurality of candidate models.
The classes of sample clinical features are consistent across the groups of training data within each training data set, and generating at least one first training data set from the clinical data of the plurality of patients comprises:
extracting a plurality of feature classes from the clinical data of the plurality of patients;
performing feature selection on the plurality of feature classes by using at least one preset feature selection method to determine classes of sample clinical features in the at least one first training data set;
constructing the first training data set according to the category of the sample clinical features.
Performing feature selection on the plurality of feature classes using at least one preset feature selection method to determine the classes of the sample clinical features in the at least one first training data set comprises:
applying a target preset feature selection method to the plurality of feature classes and selecting a preset number of feature classes as the classes of the sample clinical features in a target first training data set.
The preset feature selection method comprises at least one of analysis of variance test, chi-square test and mutual information.
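As a rough illustration of the feature selection step, the following sketch scores categorical feature classes against a binary drug treatment result with the chi-square statistic and with mutual information, then keeps a preset number of top-scoring classes. The function names, the feature names (`feature_a` and so on), and the toy records are assumptions introduced for illustration and do not come from the embodiment; the analysis of variance test is omitted for brevity.

```python
import math
from collections import Counter

def chi_square_score(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    score = 0.0
    for x, cx in x_counts.items():
        for y, cy in y_counts.items():
            expected = cx * cy / n          # count expected under independence
            observed = joint.get((x, y), 0)
            score += (observed - expected) ** 2 / expected
    return score

def mutual_information(xs, ys):
    """Mutual information (in nats) between two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y) written with raw counts
        mi += p_xy * math.log(p_xy * n * n / (x_counts[x] * y_counts[y]))
    return mi

def select_feature_classes(records, outcomes, score_fn, k):
    """Hypothetical helper: return the k feature classes with the highest score."""
    names = list(records[0].keys())
    scores = {name: score_fn([r[name] for r in records], outcomes) for name in names}
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]

# Toy data, invented for illustration only.
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "feature_a": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the outcome exactly
    "feature_b": [1, 1, 1, 0, 0, 0, 0, 1],  # partly informative
    "feature_c": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the outcome
}
records = [{name: col[i] for name, col in columns.items()} for i in range(8)]

by_chi2 = select_feature_classes(records, outcomes, chi_square_score, k=2)
by_mi = select_feature_classes(records, outcomes, mutual_information, k=2)
```

Each selection method can yield its own set of feature classes, which is one way to arrive at more than one first training data set.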
The machine learning algorithm comprises at least one of a decision tree, a random forest, a support vector machine, naive Bayes, logistic regression and a multilayer perceptron.
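The pairing of algorithms with first training data sets can be sketched as follows. This is a minimal illustration only: it implements toy versions of just two of the listed algorithms (naive Bayes and logistic regression) in plain Python, and the class names, the `train_candidates` helper, the data set labels, and the toy data are all assumptions rather than details of the embodiment.

```python
import math

class BernoulliNaiveBayes:
    """Toy naive Bayes for 0/1 features with Laplace smoothing."""
    def fit(self, X, y):
        self.prior, self.cond = {}, {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.prior[label] = (len(rows) + 1) / (len(X) + 2)
            self.cond[label] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(len(X[0]))
            ]
        return self

    def predict_proba(self, x):
        """Probability that the drug treatment result is 1."""
        log_post = {}
        for label in (0, 1):
            lp = math.log(self.prior[label])
            for xj, p in zip(x, self.cond[label]):
                lp += math.log(p if xj else 1.0 - p)
            log_post[label] = lp
        m = max(log_post.values())
        e0, e1 = math.exp(log_post[0] - m), math.exp(log_post[1] - m)
        return e1 / (e0 + e1)

class LogisticRegression:
    """Toy logistic regression trained by batch gradient descent."""
    def __init__(self, lr=0.5, epochs=200):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        d = len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t
                grad_b += err
                for j in range(d):
                    grad_w[j] += err * x[j]
            self.b -= self.lr * grad_b / len(X)
            self.w = [w - self.lr * g / len(X) for w, g in zip(self.w, grad_w)]
        return self

    def predict_proba(self, x):
        z = self.b + sum(w * xj for w, xj in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

def train_candidates(algorithms, first_training_sets):
    """Train one candidate model per (algorithm, first training data set) pair."""
    candidates = {}
    for algo_name, factory in algorithms.items():
        for set_name, (X, y) in first_training_sets.items():
            candidates[(algo_name, set_name)] = factory().fit(X, y)
    return candidates

algorithms = {
    "naive_bayes": BernoulliNaiveBayes,
    "logistic_regression": LogisticRegression,
}
# Toy first training data sets, one per hypothetical feature selection method.
first_training_sets = {
    "chi_square_features": ([[1, 1], [1, 0], [0, 1], [0, 0]], [1, 1, 0, 0]),
    "mutual_info_features": ([[1], [1], [0], [0]], [1, 1, 0, 0]),
}
candidates = train_candidates(algorithms, first_training_sets)
```

With two algorithms and two first training data sets, this yields four candidate models, each tied to the feature classes of the data set it was trained on.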
Determining a drug treatment result prediction model according to the test results of the plurality of candidate models comprises:
acquiring a receiver operating characteristic (ROC) curve of each candidate model;
taking the candidate model whose ROC curve has the largest area under the curve (AUC) as a target model;
training the target model on a second training data set to generate the drug treatment result prediction model;
wherein the second training data set comprises a plurality of groups of training data, the sample clinical feature classes in each group of training data are consistent with the sample clinical feature classes in the first training data set corresponding to the target model, and the second training data set contains more groups of training data than the first training data set.
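The selection by area under the ROC curve can be sketched as below. The AUC is computed here through its rank interpretation (the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half), which equals the area under the ROC curve. The candidate names, test labels, and scores are invented for illustration, and `pick_target_model` is a hypothetical helper, not part of the embodiment.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def pick_target_model(candidate_scores, test_labels):
    """Return the name of the candidate with the largest AUC, plus all AUCs."""
    aucs = {name: roc_auc(test_labels, s) for name, s in candidate_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs

# Invented test labels and predicted scores for three hypothetical candidates.
test_labels = [1, 0, 1, 0, 1]
candidate_scores = {
    "random_forest+chi_square": [0.9, 0.2, 0.8, 0.4, 0.7],
    "svm+mutual_info": [0.6, 0.5, 0.4, 0.3, 0.7],
    "naive_bayes+anova": [0.5, 0.5, 0.5, 0.5, 0.5],
}
best_name, aucs = pick_target_model(candidate_scores, test_labels)
# The selected target model would then be retrained on the larger second
# training data set, using the same feature classes as its first training set.
```

Selecting on AUC rather than on accuracy at a fixed threshold compares the candidates on how well they rank patients, independent of where the decision threshold is later placed.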
After the drug treatment result prediction model is determined from the plurality of candidate models, the method further comprises:
acquiring clinical data of a target patient, and extracting clinical features of the target patient from the clinical data of the target patient;
inputting the clinical features into the trained drug treatment result prediction model, and determining a drug treatment prediction result for the target patient through the drug treatment result prediction model;
wherein the feature classes of the clinical features of the target patient are consistent with the feature classes of the sample clinical features in the training data set used to train the drug treatment result prediction model.
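The consistency requirement between the target patient's feature classes and the training feature classes can be enforced at prediction time, as in the sketch below. The stand-in model `toy_model` (with made-up weights), the feature names, the 0.5 threshold, and both helper functions are assumptions for illustration only.

```python
import math

def extract_features(clinical_data, feature_classes):
    """Pick the training feature classes, in training order, from patient data."""
    missing = [name for name in feature_classes if name not in clinical_data]
    if missing:
        raise KeyError(f"clinical data lacks feature classes: {missing}")
    return [clinical_data[name] for name in feature_classes]

def predict_outcome(model, feature_classes, clinical_data, threshold=0.5):
    """Run the trained model on a target patient's extracted clinical features."""
    x = extract_features(clinical_data, feature_classes)
    probability = model(x)
    return {"probability": probability, "effective": probability >= threshold}

def toy_model(x):
    """Stand-in for a trained prediction model; the weights are made up."""
    weights, bias = [1.2, -0.8], 0.1
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Feature classes used when the model was trained (hypothetical names).
feature_classes = ["feature_a", "feature_b"]
# Clinical data of a target patient; extra fields are ignored, order is irrelevant.
target_patient = {"feature_b": 0, "feature_a": 1, "unused_field": 42}
result = predict_outcome(toy_model, feature_classes, target_patient)
```

Raising an error when a feature class is missing, rather than silently filling in a default, keeps the model from being applied to inputs that differ from what it was trained on.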
EXAMPLE IV
The present invention also provides a computer readable storage medium storing one or more programs, which are executable by one or more processors, to implement the steps of the method for generating a prediction model of drug therapy outcome described in the above embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for generating a model for predicting the outcome of a drug treatment, comprising:
acquiring clinical data of a plurality of patients, and generating at least one first training data set according to the clinical data of the plurality of patients, wherein each first training data set comprises a plurality of groups of training data, and each group of training data comprises a sample clinical characteristic and a corresponding drug treatment result;
constructing a plurality of initial models according to at least one machine learning algorithm, and training the initial models on each first training data set to obtain a plurality of candidate models;
and determining a drug treatment result prediction model according to the test results of the plurality of candidate models.
2. The method of generating a medication outcome prediction model according to claim 1, wherein the classes of sample clinical features in the training data of each of the training data sets are consistent, and the generating at least one first training data set from the clinical data of the plurality of patients comprises:
extracting a plurality of feature classes from the clinical data of the plurality of patients;
performing feature selection on the plurality of feature classes by using at least one preset feature selection method to determine classes of sample clinical features in the at least one first training data set;
constructing the first training data set according to the category of the sample clinical features.
3. The method of generating a drug treatment result prediction model according to claim 2, wherein performing feature selection on the plurality of feature classes using at least one preset feature selection method to determine the classes of the sample clinical features in the at least one first training data set comprises:
applying a target preset feature selection method to the plurality of feature classes and selecting a preset number of feature classes as the classes of the sample clinical features in a target first training data set.
4. The method of claim 2, wherein the predetermined feature selection method comprises at least one of analysis of variance test, chi-square test, and mutual information.
5. The method of claim 1, wherein the machine learning algorithm comprises at least one of a decision tree, a random forest, a support vector machine, naive Bayes, logistic regression, and a multilayer perceptron.
6. The method for generating a drug treatment result prediction model according to claim 1, wherein determining a drug treatment result prediction model according to the test results of the plurality of candidate models comprises:
acquiring a receiver operating characteristic (ROC) curve of each candidate model;
taking the candidate model whose ROC curve has the largest area under the curve (AUC) as a target model;
training the target model on a second training data set to generate the drug treatment result prediction model;
wherein the second training data set comprises a plurality of groups of training data, the sample clinical feature classes in each group of training data are consistent with the sample clinical feature classes in the first training data set corresponding to the target model, and the second training data set contains more groups of training data than the first training data set.
7. The method of generating a drug treatment result prediction model according to claim 1, wherein after the drug treatment result prediction model is determined from the plurality of candidate models, the method further comprises:
acquiring clinical data of a target patient, and extracting clinical features of the target patient from the clinical data of the target patient;
inputting the clinical features into the trained drug treatment result prediction model, and determining a drug treatment prediction result for the target patient through the drug treatment result prediction model;
wherein the feature classes of the clinical features of the target patient are consistent with the feature classes of the sample clinical features in the training data set used to train the drug treatment result prediction model.
8. A medication outcome prediction model generation apparatus, comprising:
the training data generating module is used for acquiring clinical data of a plurality of patients and generating at least one first training data set according to the clinical data of the plurality of patients, each first training data set comprises a plurality of groups of training data, and each group of training data comprises a sample clinical characteristic and a corresponding drug treatment result;
the training module is used for constructing a plurality of initial models according to at least one machine learning algorithm and respectively training the initial models on each first training data set to obtain a plurality of candidate models;
and the determining module is used for determining a drug treatment result prediction model according to the test results of the plurality of candidate models.
9. A terminal, characterized in that the terminal comprises: a processor and a storage medium communicatively coupled to the processor, the storage medium being adapted to store a plurality of instructions, and the processor being adapted to invoke the instructions in the storage medium to perform the steps of the method for generating a drug treatment result prediction model according to any one of claims 1-7.
10. A computer readable storage medium, storing one or more programs, which are executable by one or more processors, to implement the steps of the method for generating a model for predicting drug therapy outcome of any of claims 1-7.
CN202110234102.XA 2021-03-03 2021-03-03 Method, device, terminal and storage medium for generating drug treatment result prediction model Pending CN112992377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110234102.XA CN112992377A (en) 2021-03-03 2021-03-03 Method, device, terminal and storage medium for generating drug treatment result prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110234102.XA CN112992377A (en) 2021-03-03 2021-03-03 Method, device, terminal and storage medium for generating drug treatment result prediction model

Publications (1)

Publication Number Publication Date
CN112992377A (en) 2021-06-18

Family

ID=76352276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110234102.XA Pending CN112992377A (en) 2021-03-03 2021-03-03 Method, device, terminal and storage medium for generating drug treatment result prediction model

Country Status (1)

Country Link
CN (1) CN112992377A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020187987A1 (en) * 2019-03-19 2020-09-24 Koninklijke Philips N.V. Population-level gaussian processes for clinical time series forecasting
CN111834017A (en) * 2020-07-09 2020-10-27 上海市精神卫生中心(上海市心理咨询培训中心) Method, system and device for predicting treatment effect of psychotropic drugs
CN111899894A (en) * 2020-08-03 2020-11-06 东南大学 System and method for evaluating prognosis drug effect of depression patient
CN112132624A (en) * 2020-09-27 2020-12-25 平安医疗健康管理股份有限公司 Medical claims data prediction system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
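卓义轩: "Prediction of common single-drug treatment regimens for hypertension based on machine learning" (基于机器学习的高血压常见单药物治疗方案预测), 《中国优秀硕士学位论文全文数据库医药卫生科技辑》 (China Master's Theses Full-text Database, Medicine and Health Sciences) *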

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410686A (en) * 2022-08-22 2022-11-29 哈尔滨医科大学 Method and device for selecting conversion treatment scheme, electronic equipment and storage medium
CN115410686B (en) * 2022-08-22 2023-07-25 哈尔滨医科大学 Method and device for selecting conversion treatment scheme, electronic equipment and storage medium
CN116825382A (en) * 2023-02-23 2023-09-29 深圳市儿童医院 Epileptic drug effectiveness detection method and device based on multi-modal fusion
CN116564514A (en) * 2023-03-30 2023-08-08 深圳市儿童医院 Multi-model-based method for predicting curative effect of epileptic caused by tuberous sclerosis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210618