CN114121296B - Data-driven clinical information rule extraction method, storage medium and equipment - Google Patents

Info

Publication number
CN114121296B
Authority
CN
China
Prior art keywords
rule
data
rule set
optimal
clinical information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111500068.2A
Other languages
Chinese (zh)
Other versions
CN114121296A (en
Inventor
张少典
马汉东
位凯
朱珉
薛颜波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Synyi Medical Technology Co ltd
Original Assignee
Shanghai Synyi Medical Technology Co ltd

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Abstract

The invention provides a data-driven clinical information rule extraction method, a storage medium and a device. The method comprises the following steps: obtaining patient sample data, the patient sample data including the individual clinical features of a patient; generating an initial rule set from the patient sample data; screening the initial rule set based on the time sequence characteristics in the initial rule set to obtain a universality rule set; and determining an optimal rule set based on the accuracy and the interpretability of each rule in the universality rule set. The invention can mine a series of high-confidence rules from clinical information while preserving accuracy, so that a clear conclusion path is obtained and doctors are assisted in decision making to a certain extent.

Description

Data-driven clinical information rule extraction method, storage medium and equipment
Technical Field
The invention belongs to the technical field of data mining, relates to a rule extraction method, and in particular relates to a data-driven clinical information rule extraction method, a storage medium and equipment.
Background
At present, with the development of intelligent medical technology, medical rules play an important role in the processes of risk prediction, clinical diagnosis and the like of diseases, wherein mining rules with high confidence in data such as clinical diagnosis information, demographic information and the like can assist doctors in decision making to a certain extent.
Existing disease risk and clinical diagnosis rules mostly come from medical scales and machine learning prediction models. (1) A medical scale quantifies a patient's clinical information, demographic information, daily habits and the like, assigns different scores to different features, and finally measures disease severity, disease risk and so on by the total score. However, most existing medical scales were developed abroad and often ignore factors such as ethnicity, daily habits and individual differences, which affects the accuracy of scale-based evaluation. (2) Machine learning models can improve prediction and diagnostic accuracy to some extent. However, most existing machine learning models cannot directly provide interpretable decision rules.
Therefore, how to provide a method, a storage medium and a device for extracting clinical information rules based on data driving, so as to solve the defects that the prior art cannot provide a rule extraction scheme with high accuracy and interpretability, and the like, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present invention is to provide a method, a storage medium and a device for extracting clinical information rules based on data driving, which are used for solving the problem that the prior art cannot provide a rule extraction scheme with high accuracy and interpretability.
To achieve the above and other related objects, an aspect of the present invention provides a data-driven based clinical information rule extraction method, including: obtaining patient sample data, the patient sample data including individual clinical features of a patient; generating an initial rule set from the patient sample data; screening the initial rule set based on the time sequence characteristics in the initial rule set to obtain a universality rule set; and determining an optimal rule set through the accuracy and the interpretability of each rule in the universality rule set.
In one embodiment of the present invention, the patient sample data is table data without missing values, wherein each row of the table data represents a patient sample and each column represents a feature of the patient.
In one embodiment of the present invention, the step of generating an initial rule set from the patient sample data comprises: preprocessing the patient sample data; for the preprocessed patient sample data, rule extraction is carried out on each node in each generated tree by utilizing a tree model; and generating the initial rule set according to the rule extraction result.
In an embodiment of the present invention, the step of screening the initial rule set based on the time sequence characteristics in the initial rule set to obtain a universality rule set includes: obtaining the time frequency with which the rule occurs at each node by using a time sequence statistical method; and screening out the rules whose time frequency meets the user's preset requirement as the universality rule set.
In an embodiment of the present invention, the step of determining the optimal rule set according to the accuracy and the interpretability of each rule in the universality rule set includes: determining an optimal solution by a multi-objective optimization algorithm aiming at each rule in the universality rule set; and determining the combination of all the optimal solution components as the optimal rule set.
In an embodiment of the present invention, the step of determining the optimal solution by the multi-objective optimization algorithm includes: taking the accuracy and the interpretability of each rule as two optimization targets; randomly initializing a particle swarm for the optimization targets; determining the fitness of each particle in the particle swarm; updating the velocity and the position of the particles according to the fitness; judging whether the maximum number of iterations has been reached or the global optimal position satisfies the minimum bound; if yes, determining the Pareto optimal solution.
In an embodiment of the present invention, after the step of determining an optimal rule set by accuracy and interpretability of each rule in the universality rule set, the data-driven clinical information rule extraction method further includes: acquiring prediction data of clinical decisions required by a user; all the obtained prediction data form a prediction data set; and comparing the predicted data with the rules in the optimal rule set one by one, and obtaining the rule which is met by the predicted data set according to the matching result of the predicted data and the optimal rule set.
In an embodiment of the present invention, the optimal rule set includes a first rule, a second rule, and a third rule; the step of comparing the predicted data with the rules in the optimal rule set one by one, and obtaining the rule which the predicted data set accords with according to the matching result of the predicted data and the optimal rule set comprises the following steps: and determining the user illness probability corresponding to the prediction data set in response to the prediction data simultaneously meeting the first rule, the second rule and the third rule, wherein the user illness probability is used for providing auxiliary judgment information for a doctor in the process of disease diagnosis of the doctor.
To achieve the above and other related objects, the present invention provides in another aspect a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data-driven based clinical information rule extraction method.
To achieve the above and other related objects, a final aspect of the present invention provides an electronic device, including: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory so as to enable the electronic equipment to execute the data-driven clinical information rule extraction method.
As described above, the data-driven clinical information rule extraction method, the storage medium and the device according to the present invention have the following beneficial effects:
According to the invention, an initial rule set is generated from patient sample data, universal rules are then screened according to time sequence characteristics, and the accuracy and the interpretability of each rule are used to determine an optimal rule set. This addresses both the low prediction accuracy of medical scales and the poor interpretability of traditional machine learning models: the data-driven rule extraction scheme provided by the invention can mine a series of high-confidence rules from clinical information while preserving accuracy. A clear conclusion path can thus be obtained, and the doctor is assisted in making decisions to a certain extent.
Drawings
FIG. 1 is a schematic flow chart of a data-driven clinical information rule extraction method according to an embodiment of the invention.
FIG. 2 is a flowchart of determining an optimal rule set according to an embodiment of the data-driven clinical information rule extraction method of the present invention.
FIG. 3 is a flowchart illustrating the calculation of the optimal solution according to an embodiment of the method for extracting clinical information rules based on data driving.
FIG. 4 is a flowchart of a method for extracting clinical information rules based on data driving according to an embodiment of the invention.
Fig. 5 is a schematic structural connection diagram of an electronic device according to an embodiment of the invention.
Description of element reference numerals
5 Electronic device
51 Processor
52 Memory
S11 to S16 Steps
S141 to S142 Steps
S141A to S141F Steps
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may be practiced or carried out in other embodiments, and the details of the present description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where there is no conflict.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the illustrations, not according to the number, shape and size of the components in actual implementation, and the form, number and proportion of each component in actual implementation may be arbitrarily changed, and the layout of the components may be more complex.
According to the data-driven-based clinical information rule extraction method, the storage medium and the device, a series of rules with high confidence and accuracy can be mined from clinical information on the premise of ensuring accuracy, so that a clear conclusion path can be effectively obtained, and a doctor can be assisted in making a decision to a certain extent.
The principle and implementation of the data-driven clinical information rule extraction method, storage medium and apparatus of the present embodiment will be described in detail below with reference to fig. 1 to 5, so that those skilled in the art can understand the data-driven clinical information rule extraction method, storage medium and apparatus of the present embodiment without creative effort.
Referring to fig. 1, a schematic flow chart of a data-driven clinical information rule extraction method according to an embodiment of the invention is shown. As shown in fig. 1, the method for extracting clinical information rules based on data driving specifically includes the following steps:
S11, acquiring patient sample data, wherein the patient sample data comprises various clinical characteristics of a patient.
In one embodiment of the present invention, the patient sample data is table data without missing values, wherein each row of the table data represents a patient sample and each column represents a feature of the patient.
In practice, taking pulmonary embolism as an example, laboratory test data for a collection of patients with outcome variables are taken from a hospital-related department as patient sample data.
S12, generating an initial rule set according to the patient sample data.
In one embodiment, S12 specifically includes the following steps:
(1) Preprocessing the patient sample data.
Specifically, the preprocessing includes existing preprocessing means such as data cleaning, data merging, data transformation, data normalization and the like, so as to improve the usability of the patient sample data.
(2) For the preprocessed patient sample data, performing rule extraction on each node in each generated tree by using a tree model.
In particular, the tree model may be any robust model such as a decision tree, random forest, GBDT (Gradient Boosting Decision Tree), XGBoost, etc.
In practical application, rule extraction is performed on each node in each generated tree by using a random forest algorithm. The random forest is a stable ensemble learning model. It adopts the idea of "bagging": it uses the bootstrap method to generate multiple training sets, constructs a decision tree for each training set, and finally combines the classification results of the multiple decision-tree-based classifiers to obtain a relatively better prediction model.
Specifically, given a data set $D$ with feature vectors $X$ and corresponding labels $y$, let $D = \{(x_i, y_i)\}$, $i = 1, 2, \dots, n$, where $x_i \in X$, $x_i = (x_{i1}, x_{i2}, \dots, x_{im})$, $m$ is the number of features, and $y_i \in y = \{0, 1, \dots\}$. $\mathrm{Gini}(D)$ is defined to measure the purity of $D$ and can be expressed as follows:
$$\mathrm{Gini}(D) = \sum_{k=1}^{K} \sum_{k' \neq k} p_k p_{k'} = 1 - \sum_{k=1}^{K} p_k^2 \tag{1}$$
In Equation 1, $p_k$ ($k = 1, 2, \dots, K$) represents the proportion of samples of the $k$-th class in the current data set, and $k'$ denotes a class other than $k$. The smaller $\mathrm{Gini}(D)$, the higher the purity of data set $D$. Assuming that feature $m$ has $V$ possible values $\{m^1, m^2, \dots, m^V\}$, dividing data set $D$ by feature $m$ generates $V$ different branch nodes, where the $v$-th branch is denoted $D^v$. $\mathrm{Gini\_index}(D, m)$ is defined to represent the uncertainty of feature $m$ in $D$ and can be expressed as:
$$\mathrm{Gini\_index}(D, m) = \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Gini}(D^v) \tag{2}$$
For training set $D$, the learning algorithm that constructs a decision tree can be represented as a mapping from $X$ to $y$ that repeatedly splits $D$ into subsets, each time using the feature with the lowest Gini index after partitioning, to form a tree. The selected feature $m^{*}$ is expressed as:
$$m^{*} = \arg\min_{m}\ \mathrm{Gini\_index}(D, m) \tag{3}$$
The classification result is then obtained by integrating the weighted outputs of all decision trees (Equation 4), where $\omega_h$ represents the weight of the $h$-th tree. A sample is classified according to Equation 5, where $S$ represents the number of trees.
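As a minimal sketch of the Gini-based feature selection described above, the following Python computes the Gini value of a label list, the Gini index of a discrete feature, and the feature with the lowest Gini index. The function names and the toy data are illustrative assumptions, not part of the patent, and only discrete feature values are handled.

```python
from collections import Counter


def gini(labels):
    """Gini value of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_index(feature_values, labels):
    """Weighted Gini value of the branches obtained by splitting on a discrete feature."""
    n = len(labels)
    branches = {}
    for value, label in zip(feature_values, labels):
        branches.setdefault(value, []).append(label)
    return sum(len(b) / n * gini(b) for b in branches.values())


def best_feature(rows, labels):
    """Index of the feature whose split yields the lowest Gini index."""
    n_features = len(rows[0])
    scores = [gini_index([r[j] for r in rows], labels) for j in range(n_features)]
    return min(range(n_features), key=scores.__getitem__)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
print(gini(labels), best_feature(rows, labels))
```

In a random forest, a selection of this kind would be repeated at every node of every tree, each tree being grown on its own bootstrap sample.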
(3) Generating the initial rule set according to the rule extraction result.
Specifically, the initial rule set is obtained as follows: the random forest algorithm traverses the path from the root node to each leaf node in each decision tree, taking the features of the nodes on each path as the conditions of the corresponding rule and the class of the leaf node as the conclusion of the corresponding rule.
In practical applications, when performing disease prediction or medical diagnosis tasks, the class output by the tree model is determined by the outputs of the individual trees. Since the tree model is a "white box model" that provides a clear path for each conclusion, the rules of all nodes on each tree in the tree model are output as the initial rule set.
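The root-to-leaf traversal can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: each tree is represented as a hypothetical nested dictionary (internal nodes carry `feature`, `threshold`, `left`, `right`; leaves carry `label`), the left branch is assumed to mean "feature <= threshold", and the feature names are made up for the example.

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, label) rule for every root-to-leaf path."""
    if "label" in node:
        # Leaf node: its class becomes the conclusion of the rule.
        return [(list(conditions), node["label"])]
    name, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + ((name, "<=", threshold),))
    right = extract_rules(node["right"], conditions + ((name, ">", threshold),))
    return left + right


def extract_forest_rules(trees):
    """Collect the rules of every tree in the forest into one initial rule set."""
    rules = []
    for tree in trees:
        rules.extend(extract_rules(tree))
    return rules


# Hypothetical two-level tree standing in for one tree of a trained forest.
tree = {
    "feature": "varicose_vein_diagnosis", "threshold": 0.5,
    "left": {"label": 0},
    "right": {
        "feature": "visit_count", "threshold": 1.5,
        "left": {"label": 1},
        "right": {"label": 0},
    },
}
rules = extract_forest_rules([tree])
for conditions, label in rules:
    print(conditions, "->", label)
```

Representing each condition as a `(feature, operator, threshold)` tuple keeps the rules readable and makes it straightforward to match them against patient records later.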
S13, screening the initial rule set based on the time sequence characteristics in the initial rule set to obtain the universality rule set. Screening the rules by indicators such as their time frequency effectively avoids non-universal rules that correspond to black swan events.
In one embodiment, S13 specifically includes the following steps:
(1) Obtaining the time frequency with which the rule occurs at each node by using a time sequence statistical method.
In particular, the time sequence statistical method may be a time sequence statistical function or any other implementation that provides such a function.
In practical application, for the statistical analysis of the time sequence data in the rules, the samples on each node are grouped and aggregated by time frequency using the Python pandas package, for example by counting information with time-frequency attributes such as the number of days, weeks, months or years in which samples appear on the node, or the start and end times of their appearance.
(2) Screening out the rules whose time frequency meets the user's preset requirement as the universality rule set.
Specifically, suppose the user's preset requirement is 1 year. If the patient sample data supporting a rule appears only within a 2-week period, the rule extracted from that data is not universal; if the patient sample data appears over a 2-year period, the rule extracted from that data is universal.
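The screening step can be illustrated with a short sketch. It assumes that each rule is associated with the visit dates of the samples falling on its node and that "time frequency" is measured as the span between the first and last date; both are interpretations for illustration. The standard library is used here instead of pandas to keep the example self-contained, and the rule identifiers and dates are hypothetical.

```python
from datetime import date, timedelta


def time_span(sample_dates):
    """Span between the earliest and latest sample date on a node."""
    return max(sample_dates) - min(sample_dates)


def screen_rules(rule_dates, min_span):
    """Keep the ids of rules whose samples span at least min_span."""
    return [
        rule_id
        for rule_id, dates in rule_dates.items()
        if dates and time_span(dates) >= min_span
    ]


# Hypothetical sample dates per rule.
rule_dates = {
    "rule_a": [date(2020, 3, 1), date(2020, 3, 14)],                    # about 2 weeks
    "rule_b": [date(2019, 1, 10), date(2020, 6, 2), date(2021, 1, 9)],  # about 2 years
}
universal = screen_rules(rule_dates, timedelta(days=365))
print(universal)
```

With a one-year requirement, the rule supported only by a two-week burst of samples is dropped and the rule supported over two years is kept.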
S14, determining an optimal rule set according to the accuracy and the interpretability of each rule in the universality rule set.
Referring to fig. 2, a flowchart of determining an optimal rule set according to an embodiment of the data-driven clinical information rule extraction method of the present invention is shown. As shown in fig. 2, S14 specifically includes the following steps:
S141, determining an optimal solution by a multi-objective optimization algorithm for each rule in the universality rule set. The multi-objective optimization algorithm is used to balance the accuracy and the interpretability of the rules.
Specifically, the multi-objective optimization algorithm may be any algorithm capable of optimizing two or more objectives, such as a multi-objective particle swarm algorithm, a non-dominated sorting genetic algorithm, a multi-objective evolutionary algorithm, and the like.
Referring to fig. 3, a flowchart of an optimal solution calculation according to an embodiment of the data-driven clinical information rule extraction method of the present invention is shown. As shown in fig. 3, S141 specifically includes the following steps:
S141A, the accuracy and the interpretability of each rule are taken as two optimization targets.
To guarantee the accuracy of the rule set, the accuracy of each rule set, namely the proportion of correctly predicted samples, is calculated. Rule accuracy is defined by Equation 6, in which $Q_{ACC}$ represents the accuracy of the rule set, $Q$ represents the number of samples, and $x_i$ represents the $i$-th sample.
The interpretability of a rule is defined by Equation 7, in which $Q_{FEA}$, $Q_{COV}$ and $Q_{CNT}$ represent the complexity of the rule, the coverage of the rule and the quality of the rule, respectively. α, β and γ are the weights of the three and can be set according to the actual situation. Specifically, $Q_{FEA}$ reflects the number of features in each rule: the smaller the average number of features involved in a rule, the larger $Q_{FEA}$. $Q_{COV}$ represents the coverage of each rule: the more widely applicable a rule, the larger $Q_{COV}$. $Q_{CNT}$ measures the quality of the rule. The three quantities are defined by Equations 8 to 10.
Equation 8 is expressed in terms of the active features in the $i$-th rule, and Equation 9 in terms of the number of samples matching the $i$-th rule. In Equation 10, $rule_{selected}$ represents the number of rules derived by the algorithm, and $Z$ is the number of candidate rules generated. $Q_{FEA} = 1$ indicates that the rule contains only one feature, and $Q_{FEA} = 0$ indicates that the rule contains all features. That is, the fewer features a rule contains, the easier it is for a physician to understand at the time of diagnosis.
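The exact forms of Equations 6 to 10 are not reproduced in this text, so the following is only a hedged sketch of how the two objectives might be computed. It assumes that rule accuracy is the share of matched samples whose label equals the rule's conclusion, that coverage is the share of samples matched, that $Q_{FEA}$ falls linearly from 1 (one feature) to 0 (all features), and that interpretability is the weighted sum of the three terms. $Q_{CNT}$ is passed in as a number because its definition cannot be recovered. All names and data are illustrative.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}


def matches(conditions, sample):
    """True if the sample satisfies every (feature, operator, threshold) condition."""
    return all(OPS[op](sample[name], t) for name, op, t in conditions)


def rule_accuracy(conditions, label, samples, labels):
    """Assumed accuracy: fraction of matched samples whose label equals the conclusion."""
    hits = [y for x, y in zip(samples, labels) if matches(conditions, x)]
    return sum(y == label for y in hits) / len(hits) if hits else 0.0


def q_cov(conditions, samples):
    """Assumed coverage: fraction of all samples matched by the rule."""
    return sum(matches(conditions, x) for x in samples) / len(samples)


def q_fea(conditions, n_features):
    """Assumed complexity term: 1 for a single feature, 0 when all features are used."""
    used = len({name for name, _, _ in conditions})
    if n_features <= 1:
        return 1.0
    return 1.0 - (used - 1) / (n_features - 1)


def interpretability(fea, cov, cnt, alpha=1 / 3, beta=1 / 3, gamma=1 / 3):
    """Assumed weighted sum of the three interpretability terms."""
    return alpha * fea + beta * cov + gamma * cnt


# Hypothetical samples with two features, "a" and "b".
samples = [{"a": 1, "b": 0}, {"a": 1, "b": 2}, {"a": 0, "b": 0}, {"a": 1, "b": 1}]
labels = [1, 1, 0, 0]
rule = [("a", ">", 0.5)]
print(rule_accuracy(rule, 1, samples, labels), q_cov(rule, samples), q_fea(rule, 2))
```

The two resulting scores, accuracy and interpretability, are the pair of objectives that the multi-objective optimization then trades off.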
S141B, randomly initializing a particle swarm aiming at the optimization target.
The invention regards the solution in the optimization problem as "particles", all of which are searched in the N-dimensional space, each particle having only two attributes: position and velocity, velocity representing the speed of movement and position representing the direction of movement. The current position of the particle is a candidate solution to the optimization problem, and the flying process of the particle is the searching process of the individual.
S141C, determining the fitness of each particle in the particle swarm.
Specifically, a fitness function is defined that can determine the individual optimal solution of each particle, and the global optimal value is found from the individual optimal solutions.
S141D, updating the velocity and the position of the particles according to the fitness.
Specifically, the flight velocity of each particle may be dynamically adjusted based on the historical optimal position of the particle and the historical optimal position of the population. The velocity and position of the particles are updated according to the fitness.
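The patent does not state the update formula, so the sketch below uses the standard textbook particle swarm update as an assumption: the new velocity combines an inertia term with pulls toward the particle's own best position and the swarm's best position, and the new position is the old position plus the new velocity. The parameter names `w`, `c1` and `c2` and their default values are illustrative choices.

```python
import random


def update_particle(position, velocity, personal_best, global_best,
                    w=0.7, c1=1.5, c2=1.5, rng=random):
    """Return the updated (position, velocity) of one particle.

    Assumed standard PSO update, per dimension:
        v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)
        x <- x + v
    """
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        v_next = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity


# A particle at rest below both best positions is pulled upward.
pos, vel = update_particle([0.0], [0.0], [1.0], [1.0], rng=random.Random(0))
print(pos, vel)
```

In a multi-objective variant there is generally no single best position; the guide used as `global_best` would have to be chosen from the current non-dominated solutions, which this sketch leaves to the caller.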
S141E, judging whether the maximum number of iterations has been reached or the global optimal position satisfies the minimum bound.
The optimal solution found by each particle is called its individual extremum, and the best individual extremum in the particle swarm is taken as the current global optimal solution. The iteration continues, updating the velocity and the position, until an optimal solution satisfying the termination condition is obtained. If the maximum number of iterations has not been reached and the global optimal position does not satisfy the minimum bound, the process returns to step S141C.
S141F, if yes, determining the Pareto optimal solution.
Once the maximum number of iterations has been reached or the global optimal position satisfies the minimum bound, the Pareto optimal solutions in the final population are determined by fast non-dominated sorting.
S142, determining the combination formed by all the optimal solutions as the optimal rule set.
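To illustrate what selecting the Pareto optimal solutions means for two objectives, the sketch below filters a list of (accuracy, interpretability) pairs down to the non-dominated ones. It is a simple quadratic-time filter for the first front only, not the full fast non-dominated sorting procedure, and the scores are made-up values.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one.

    Both objectives are assumed to be maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Indices of the solutions that no other solution dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (accuracy, interpretability) pairs for four candidate rules.
scores = [(0.90, 0.40), (0.80, 0.70), (0.70, 0.60), (0.60, 0.90)]
front = pareto_front(scores)
print(front)
```

Here the third candidate is dominated by the second on both objectives, so only the other three remain as trade-offs between accuracy and interpretability.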
Specifically, for pulmonary artery embolism, the optimal rule set is: "within 1 month_lower limb varicose vein_diagnosis_any > 0.5, within 10000 days_gender_visit_count <= 1.5, within 10000 days_age_visit_last <= 26373.0".
When all three conditions hold, that is, "within 1 month_lower limb varicose vein_diagnosis_any > 0.5", "within 10000 days_gender_visit_count <= 1.5" and "within 10000 days_age_visit_last <= 26373.0", the probability of the patient suffering from VTE is determined to be 90% or more.
Referring to fig. 4, a flowchart of the prediction data matching process according to an embodiment of the invention is shown. As shown in fig. 4, after step S14, the data-driven clinical information rule extraction method further includes the following steps:
S15, obtaining prediction data of clinical decisions required by a user; all the acquired prediction data constitute a prediction data set.
S16, comparing the predicted data with the rules in the optimal rule set one by one, and obtaining the rule which the predicted data set accords with according to the matching result of the predicted data and the optimal rule set.
In one embodiment, the optimal rule set includes a first rule, a second rule, and a third rule.
And determining the user illness probability corresponding to the prediction data set in response to the prediction data simultaneously meeting the first rule, the second rule and the third rule, wherein the user illness probability is used for providing auxiliary judgment information for a doctor in the process of disease diagnosis of the doctor.
Specifically, for pulmonary artery embolism, the optimal rule set is: "within 1 month_lower limb varicose vein_diagnosis_any > 0.5, within 10000 days_gender_visit_count <= 1.5, within 10000 days_age_visit_last <= 26373.0". The first rule is "within 1 month_lower limb varicose vein_diagnosis_any > 0.5", the second rule is "within 10000 days_gender_visit_count <= 1.5", and the third rule is "within 10000 days_age_visit_last <= 26373.0". When the prediction data of a patient satisfies all three rules simultaneously, the probability that the patient suffers from pulmonary artery embolism is determined to be more than 90%, and the doctor can take this probability into account when diagnosing the patient.
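The one-by-one comparison of prediction data against the optimal rule set can be sketched as below. The thresholds come from the example above, while the underscore-style feature keys, the record layout and the helper names are hypothetical choices made for the illustration.

```python
import operator

OPS = {">": operator.gt, "<=": operator.le}

# The three rules of the example, with hypothetical underscore-style feature keys.
OPTIMAL_RULE_SET = [
    ("within_1_month_lower_limb_varicose_vein_diagnosis_any", ">", 0.5),
    ("within_10000_days_gender_visit_count", "<=", 1.5),
    ("within_10000_days_age_visit_last", "<=", 26373.0),
]


def satisfied_rules(record, rules):
    """Return the rules that the patient record satisfies, checked one by one."""
    return [
        (name, op, t)
        for name, op, t in rules
        if name in record and OPS[op](record[name], t)
    ]


def meets_all(record, rules):
    """True only if the record satisfies every rule in the set."""
    return len(satisfied_rules(record, rules)) == len(rules)


# Hypothetical prediction data for one patient.
patient = {
    "within_1_month_lower_limb_varicose_vein_diagnosis_any": 1.0,
    "within_10000_days_gender_visit_count": 1.0,
    "within_10000_days_age_visit_last": 20000.0,
}
print(meets_all(patient, OPTIMAL_RULE_SET))
```

A record that meets all three rules would be flagged with the high illness probability described above; a record that meets only some of them can still report which rules it satisfied.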
The following compares the effect of the present invention with that of an existing machine learning model. Taking a proportional hazards regression model as an example, an existing machine learning model evaluates the influence of multiple factors on disease risk or diagnosis results simultaneously, and obtains a function usable for prediction and diagnosis by weighting the factors and applying a nonlinear mapping. For example, to predict the probability that a patient with chronic kidney disease develops renal failure within five years, such a proportional hazards regression model can be fitted.
This function can give an accurate prediction, but the rule it embodies, obtained by weighting or nonlinear operations on factors such as GFR (glomerular filtration rate), ACR (albumin-to-creatinine ratio) and age, is not interpretable. By contrast, the present invention uses a multi-objective optimization algorithm to mine a series of rules with high confidence from clinical information while preserving accuracy.
The protection scope of the data-driven clinical information rule extraction method is not limited to the execution sequence of the steps listed in the embodiment, and all the schemes of step increase, step decrease and step replacement in the prior art according to the principles of the invention are included in the protection scope of the invention.
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data-driven-based clinical information rule extraction method.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the method embodiments described above may be performed by hardware controlled by a computer program. The computer program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments described above. The computer-readable storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
Referring to fig. 5, a schematic structural connection diagram of an electronic device according to an embodiment of the invention is shown. As shown in fig. 5, the present embodiment provides an electronic device 5, specifically including: a processor 51 and a memory 52; the memory 52 is configured to store a computer program, and the processor 51 is configured to execute the computer program stored in the memory 52, so that the electronic device 5 performs the steps of the data-driven clinical information rule extraction method.
The processor 51 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field programmable gate array (Field Programmable Gate Array, FPGA for short) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 52 may include a random access memory (Random Access Memory, abbreviated as RAM) and may further include a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory.
In practical applications, the electronic device may be a computer including all or part of the components of a memory, a memory controller, one or more processing units (CPUs), a peripheral interface, an RF circuit, an audio circuit, a speaker, a microphone, an input/output (I/O) subsystem, a display screen, other output or control devices, and an external port; the computer includes, but is not limited to, a personal computer such as a desktop computer, a notebook computer, a tablet computer, a smart phone, a personal digital assistant (Personal Digital Assistant, PDA for short), and the like. In other embodiments, the electronic device may also be a server, where the server may be disposed on one or more entity servers according to multiple factors such as functions, loads, and the like, and may also be a cloud server formed by a distributed or centralized server cluster, which is not limited in this embodiment.
In summary, the data-driven clinical information rule extraction method, storage medium and device of the invention generate an initial rule set from patient sample data, screen it into a universality rule set according to time sequence features, and determine an optimal rule set using the accuracy and interpretability of each rule. This addresses the low prediction accuracy of medical scales and the poor interpretability of traditional machine learning models: the data-driven rule extraction scheme mines a series of high-confidence, high-accuracy rules from clinical information without sacrificing accuracy. It yields a clear path to each conclusion and can assist doctors in decision-making to a certain extent. The invention thus effectively overcomes various defects of the prior art and has high industrial utilization value.
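The first two stages summarized above (rule extraction from tree nodes, then frequency-based screening) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the patented implementation: the nested-dict tree `TREE`, the feature names `age` and `sbp`, the helper names `extract_rules` and `screen_rules`, and the reading of "time frequency" as the number of time windows in which a rule was extracted.

```python
from collections import Counter

# Hypothetical toy tree: each internal node tests `feature <= threshold`.
TREE = {
    "feature": "age", "threshold": 60,
    "left": {
        "feature": "sbp", "threshold": 140,
        "left": {"label": 0},
        "right": {"label": 1},
    },
    "right": {"label": 1},
}


def extract_rules(node, path=()):
    """Yield the chain of conditions leading to every non-root node."""
    if path:
        yield path
    if "feature" in node:  # internal node: descend into both branches
        feature, threshold = node["feature"], node["threshold"]
        yield from extract_rules(node["left"], path + ((feature, "<=", threshold),))
        yield from extract_rules(node["right"], path + ((feature, ">", threshold),))


def screen_rules(rules_per_window, min_windows):
    """Keep rules that were extracted in at least `min_windows` time windows."""
    counts = Counter()
    for window_rules in rules_per_window:
        counts.update(set(window_rules))  # count each rule once per window
    return {rule for rule, n in counts.items() if n >= min_windows}


# Initial rule set for the toy tree: one rule per non-root node.
rules = list(extract_rules(TREE))
```

In this sketch a rule is a tuple of `(feature, operator, threshold)` conditions, so identical rules extracted from different trees or time windows compare equal and can be counted directly. The `min_windows` parameter stands in for the user's preset requirement.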
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the invention.

Claims (9)

1. The data-driven clinical information rule extraction method is characterized by comprising the following steps of:
obtaining patient sample data, the patient sample data including individual clinical features of a patient; the patient sample data is table data without missing values;
generating an initial rule set from the patient sample data; wherein the patient sample data is pre-processed; for the preprocessed patient sample data, rule extraction is carried out on each node in each generated tree by utilizing a tree model; generating the initial rule set according to the rule extraction result;
screening the initial rule set based on the time sequence characteristics in the initial rule set to obtain a universality rule set;
and determining an optimal rule set through the accuracy and the interpretability of each rule in the universality rule set.
2. The data-driven clinical information rule extraction method according to claim 1, wherein:
the patient sample data is tabular data without missing values, wherein each row of the tabular data represents a patient sample and each column represents a characteristic of the patient.
3. The data-driven clinical information rule extraction method according to claim 1, wherein the step of screening the initial rule set based on the time sequence features in the initial rule set to obtain a universality rule set comprises:
acquiring, by a time-series statistical method, the time frequency with which rules occur at each node;
and screening out the rules whose time frequency meets the user's preset requirement as the universality rule set.
4. The data-driven clinical information rule extraction method according to claim 1, wherein the step of determining an optimal rule set by accuracy and interpretability of each rule in the universality rule set comprises:
determining an optimal solution by a multi-objective optimization algorithm aiming at each rule in the universality rule set;
and determining the combination of all the optimal solution components as the optimal rule set.
5. The method of claim 4, wherein the determining the optimal solution by a multi-objective optimization algorithm comprises:
taking the accuracy and the interpretability of each rule as two optimization targets;
randomly initializing a particle swarm aiming at the optimization target;
determining the fitness of each particle in the particle swarm;
updating the speed and the position of the particles according to the fitness;
judging whether the maximum number of iterations has been reached or the global optimal position satisfies the minimum limit;
if yes, determining the pareto optimal solution.
6. The data-driven clinical information rule extraction method according to claim 1, wherein after the step of determining an optimal rule set by accuracy and interpretability of each rule in the universality rule set, the data-driven clinical information rule extraction method further comprises:
acquiring prediction data of clinical decisions required by a user; all the obtained prediction data form a prediction data set;
and comparing the predicted data with the rules in the optimal rule set one by one, and obtaining the rule which is met by the predicted data set according to the matching result of the predicted data and the optimal rule set.
7. The data-driven based clinical information rule extraction method according to claim 6, wherein the optimal rule set includes a first rule, a second rule, and a third rule; the step of comparing the predicted data with the rules in the optimal rule set one by one, and obtaining the rule which the predicted data set accords with according to the matching result of the predicted data and the optimal rule set comprises the following steps:
and determining the user illness probability corresponding to the prediction data set in response to the prediction data simultaneously meeting the first rule, the second rule and the third rule, wherein the user illness probability is used for providing auxiliary judgment information for a doctor in the process of disease diagnosis of the doctor.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the data-driven based clinical information rule extraction method according to any one of claims 1 to 7.
9. An electronic device, comprising: a processor and a memory;
the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, to cause the electronic device to execute the data-driven-based clinical information rule extraction method according to any one of claims 1 to 7.
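The selection and matching steps recited in claims 4 to 7 can be illustrated with a small sketch. It is not the claimed multi-objective particle swarm procedure: for a handful of rules, an exhaustive non-dominated (Pareto) filter is swapped in for clarity. The scoring functions are assumptions made here: `accuracy` is taken as the fraction of covered samples with the target label, and `interpretability` as the inverse of the number of conditions. The sample data and all function names are hypothetical.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}


def matches(rule, sample):
    """True if the sample satisfies every condition of the rule."""
    return all(OPS[op](sample[feature], threshold) for feature, op, threshold in rule)


def accuracy(rule, samples, labels, target=1):
    """Assumed metric: share of covered samples that carry the target label."""
    covered = [y for x, y in zip(samples, labels) if matches(rule, x)]
    return sum(y == target for y in covered) / len(covered) if covered else 0.0


def interpretability(rule):
    """Assumed proxy: shorter rules are easier to read."""
    return 1.0 / len(rule)


def pareto_front(rules, samples, labels):
    """Return rules not dominated on (accuracy, interpretability)."""
    scored = [(r, accuracy(r, samples, labels), interpretability(r)) for r in rules]
    front = []
    for rule, acc, interp in scored:
        dominated = any(
            a >= acc and i >= interp and (a > acc or i > interp)
            for _, a, i in scored
        )
        if not dominated:
            front.append(rule)
    return front


# Hypothetical patient samples and disease labels (1 = diseased).
samples = [
    {"age": 70, "sbp": 120},
    {"age": 50, "sbp": 150},
    {"age": 50, "sbp": 120},
    {"age": 65, "sbp": 160},
]
labels = [1, 1, 0, 1]

r1 = (("age", ">", 60),)
r2 = (("age", "<=", 60), ("sbp", ">", 140))
r3 = (("sbp", ">", 140),)
r4 = (("age", "<=", 60),)

optimal = pareto_front([r1, r2, r3, r4], samples, labels)

# Compare a new record against each rule in the optimal set, one by one.
new_patient = {"age": 72, "sbp": 150}
matched = [rule for rule in optimal if matches(rule, new_patient)]
```

A swarm-based search becomes relevant when the candidate space is too large to enumerate. The dominance test used here is the same criterion that defines the Pareto optimal solutions such a search aims to find.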
CN202111500068.2A 2021-12-09 2021-12-09 Data-driven clinical information rule extraction method, storage medium and equipment Active CN114121296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111500068.2A CN114121296B (en) 2021-12-09 2021-12-09 Data-driven clinical information rule extraction method, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111500068.2A CN114121296B (en) 2021-12-09 2021-12-09 Data-driven clinical information rule extraction method, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN114121296A CN114121296A (en) 2022-03-01
CN114121296B true CN114121296B (en) 2024-02-02

Family

ID=80364078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111500068.2A Active CN114121296B (en) 2021-12-09 2021-12-09 Data-driven clinical information rule extraction method, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN114121296B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117059214A (en) * 2023-07-21 2023-11-14 南京智慧云网络科技有限公司 Clinical scientific research data integration and intelligent analysis system and method based on artificial intelligence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103326353A (en) * 2013-05-21 2013-09-25 武汉大学 Environmental economic power generation dispatching calculation method based on improved multi-objective particle swarm optimization algorithm
CN111489827A (en) * 2020-04-10 2020-08-04 吉林大学 Thyroid disease prediction modeling method based on associative decision tree
CN112071420A (en) * 2020-08-12 2020-12-11 福建中榕数据科技有限公司 Clinical aid decision making method, system, equipment and medium based on real-time data
AU2020103709A4 (en) * 2020-11-26 2021-02-11 Daqing Oilfield Design Institute Co., Ltd A modified particle swarm intelligent optimization method for solving high-dimensional optimization problems of large oil and gas production systems

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11538586B2 (en) * 2019-05-07 2022-12-27 International Business Machines Corporation Clinical decision support

Also Published As

Publication number Publication date
CN114121296A (en) 2022-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant