CN111968741B - Deep learning and integrated learning-based diabetes complication high-risk early warning system - Google Patents


Info

Publication number
CN111968741B
Authority
CN
China
Legal status
Active
Application number
CN202010677989.5A
Other languages
Chinese (zh)
Other versions
CN111968741A (en)
Inventor
陈锦泉 (Chen Jinquan)
田翔 (Tian Xiang)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202010677989.5A
Publication of CN111968741A
Application granted
Publication of CN111968741B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a high-risk early warning system for diabetic complications based on deep learning and ensemble learning, characterized by comprising: a data input module for inputting the raw electronic medical record data to be identified; a data preprocessing module for preprocessing the raw electronic medical record data to be identified to obtain an electronic medical record data set; and a data processing module for inputting the electronic medical record data set into a data mining model, which processes the data set to obtain an early warning result indicating whether the patient belongs to the high-risk group. The data mining model is obtained by training an initial data mining model. The invention enables early detection and early warning of people at high risk of diabetic complications with high early-warning accuracy and good early-warning effect, and can support effective disease monitoring and prevention.

Description

Deep learning and integrated learning-based diabetes complication high-risk early warning system
Technical Field
The invention relates to the technical field of medical data mining, and in particular to a high-risk early warning system for diabetic complications based on deep learning and ensemble learning.
Background
An electronic medical record (EMR) is a digitized medical record that is stored, managed, transmitted and reproduced using electronic equipment such as computers. It contains the patient's personal information (sex, age, etc.), physiological indicators, examination results, past medical history, genetic history, and so on. EMRs store patient data efficiently and compactly, enable the transmission and sharing of medical data, and lay a solid foundation for the development of medical data mining technology.
Diabetes is a chronic, non-communicable disease closely related to lifestyle, and it has become a major threat to health and life. It is characterized by a long disease course, complex etiology, serious damage to health, heavy social burden, lifelong duration, poor prognosis and frequent accompanying complications. The prevalence of diabetes in China continues to rise, and rapid population aging makes diabetes control more urgent and more difficult, so the demand for managing diabetes and its complications is strong and growing.
Traditional diabetes examinations mainly determine whether a person has diabetes. Typically, a patient is found to have diabetes only after serious symptoms appear and an examination such as a fasting blood glucose test is performed at a hospital outpatient clinic. This delays diagnosis, so diabetic patients may miss the window for treatment and may not receive hospital care until high-risk complications such as stroke have already occurred. The management of diabetes and its complications, and early intervention, are therefore very important.
Many existing diabetes-related screening methods are built on mathematical models. Common approaches collect data from community health populations to build a diabetes risk assessment model, or study individual diabetic complications and use laboratory test data to predict the likelihood that a diabetic patient will develop a particular complication. Overall, research on screening for people at high risk of diabetic complications is still relatively scarce, and many experiments rely on only a few physical indicators of patients. Research based on electronic medical record data, which covers the patient's various examinations, treatments and so on, would be very helpful for disease prevention and later treatment.
Disclosure of Invention
To overcome the defects and shortcomings of the prior art, the invention aims to provide a high-risk early warning system for diabetic complications based on deep learning and ensemble learning. The invention enables early detection and early warning of people at high risk of diabetic complications with high early-warning accuracy and good early-warning effect, and can support effective disease monitoring and prevention.
To achieve the above purpose, the invention adopts the following technical solution: a high-risk early warning system for diabetic complications based on deep learning and ensemble learning, characterized by comprising:
a data input module for inputting the raw electronic medical record data to be identified;
a data preprocessing module for preprocessing the raw electronic medical record data to be identified to obtain an electronic medical record data set;
a data processing module for inputting the electronic medical record data set into a data mining model, which processes the data set to obtain an early warning result indicating whether the patient belongs to the high-risk group; the data mining model is obtained by training an initial data mining model;
the training of the initial data mining model comprises the following steps:
S1: take multiple raw electronic medical records, each with an existing diagnostic conclusion on whether the patient belongs to the high-risk group, as the training set;
S2: preprocess the samples of the training set to obtain preprocessed samples;
S3: input the preprocessed samples into the data mining model to obtain an early warning result of whether each sample belongs to the high-risk group;
S4: compare the patients warned as high-risk with the patients diagnosed as high-risk to obtain the number of patients correctly warned as high-risk; divide the number of patients correctly warned as high-risk by the total number of patients warned as high-risk to obtain the early-warning precision; divide the number of patients correctly warned as high-risk by the number of patients diagnosed as high-risk to obtain the early-warning recall; compute the early-warning F1 value from the precision and the recall; record the F1 value, and adjust and update the data mining model parameters and the early-warning threshold;
judge whether the set number of rounds has been reached: if yes, take the data mining model corresponding to the maximum early-warning F1 value as the trained data mining model; otherwise, return to step S2.
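The evaluation in step S4 is the standard precision, recall and F1 computation for the high-risk class: $P = TP / (\text{patients warned as high-risk})$, $R = TP / (\text{patients diagnosed as high-risk})$ and $F_1 = 2PR/(P+R)$. The Python sketch below assumes binary labels with 1 meaning high-risk and shows one way the early-warning threshold could be chosen by F1; the function names and the candidate threshold grid are illustrative assumptions, not taken from the patent.

```python
def warning_metrics(predicted, diagnosed):
    """Precision, recall and F1 of the high-risk class (label 1)."""
    correct = sum(1 for p, d in zip(predicted, diagnosed) if p == 1 and d == 1)
    warned = sum(1 for p in predicted if p == 1)          # all patients warned as high-risk
    diagnosed_pos = sum(1 for d in diagnosed if d == 1)   # all patients diagnosed as high-risk
    precision = correct / warned if warned else 0.0
    recall = correct / diagnosed_pos if diagnosed_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def select_threshold(scores, diagnosed, thresholds):
    """Return (threshold, f1) with the highest F1 among the candidate thresholds."""
    best = None
    for t in thresholds:
        predicted = [1 if s >= t else 0 for s in scores]   # warn when the score reaches t
        f1 = warning_metrics(predicted, diagnosed)[2]
        if best is None or f1 > best[1]:
            best = (t, f1)
    return best
```

In the round-based training loop, one would record this F1 value after every round and keep the model state (and threshold) with the largest value.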
Preferably, the raw electronic medical record data is structured text data; the data mining model comprises a text structuring sub-module for extracting features from text data, a feature engineering sub-module for raising the upper limit of the data mining model's performance, a data balancing sub-module for correcting the imbalance between positive and negative samples in the data, and an ensemble learning sub-module for classifying the samples.
Preferably, the text structuring sub-module comprises a named entity recognition part and a relation extraction part; the feature engineering sub-module comprises a feature construction part and a feature screening part; the data balancing sub-module comprises an oversampling part based on k-means clustering; and the ensemble learning sub-module comprises a classifier stacking part;
step S3 comprises the following steps:
S31: input the preprocessed sample into the data mining model;
S32: pass the sample through the text structuring sub-module:
execute the named entity recognition part, and concatenate its output with the preprocessed sample;
execute the relation extraction part, and concatenate its output with the preprocessed sample;
S33: pass the result through the feature engineering sub-module:
execute the feature construction part, and concatenate its output with the preprocessed sample;
execute the feature screening part;
S34: pass the output of the feature screening part through the data balancing sub-module, executing the oversampling part based on k-means clustering;
S35: pass the output of the oversampling part based on k-means clustering through the ensemble learning sub-module:
execute the classifier stacking part to obtain the early warning result of whether the sample belongs to the high-risk group.
Preferably, in the text structuring sub-module, the named entity recognition part has a four-layer structure: word vector layer I, bidirectional gated recurrent unit (BiGRU) layer I, self-attention layer I, and a conditional random field (CRF) layer;
in step S32, executing the named entity recognition part means the following. First, word vector layer I converts the preprocessed sample into one-hot sparse vectors; these are multiplied by a custom input weight matrix and averaged to obtain a hidden-layer vector; the hidden-layer vector is multiplied by an output weight matrix and passed through a softmax function to obtain a probability distribution, and the result with the highest probability is taken. The resulting dense low-dimensional vectors form the output of the word vector layer;
these vectors are then input into BiGRU layer I, which contains a forward GRU and a backward GRU. The forward GRU reads the input from left to right, i.e., from the beginning of the text, and the backward GRU reads it from right to left, i.e., from the end of the text; the outputs of the two are concatenated as the layer output;
then, in self-attention layer I, for each element $x_i$ of the input sequence, the similarity $f(x_i, x_j)$ with every other element $x_j$ is computed and normalized with the softmax function to obtain the similarity coefficient $a_{i,j}$; the similarity coefficients are multiplied by the corresponding hidden-layer outputs to obtain the self-attention value, the hidden-layer output and the self-attention value are concatenated, and the result is passed through a tanh function to obtain the output of the self-attention layer;
finally, the result is input into the CRF layer, which computes the globally optimal sequence labeling result;
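The four layers of the named entity recognition part can be sketched as a NumPy forward pass (no training). Several details are assumptions not stated in the text: the word vector layer is treated as a CBOW-style model whose input matrix `w_in` serves as the embedding table, the similarity $f(x_i, x_j)$ is a plain dot product, and the CRF layer's global optimisation is Viterbi decoding over emission and transition scores. All function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cbow_forward(context_ids, w_in, w_out):
    """Word vector layer (CBOW-style): one-hot context -> averaged hidden vector -> softmax."""
    one_hot = np.eye(w_in.shape[0])[context_ids]   # (C, V) sparse one-hot rows
    hidden = (one_hot @ w_in).mean(axis=0)         # (D,) average of the projected context words
    return hidden, softmax(hidden @ w_out)          # probability distribution over the vocabulary


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(input_dim, hidden_dim, rng):
    """Random weights (W, U, b) for the update (z), reset (r) and candidate (h) gates."""
    return {g: (rng.normal(scale=0.1, size=(hidden_dim, input_dim)),
                rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
                np.zeros(hidden_dim)) for g in ("z", "r", "h")}


def gru_pass(seq, p):
    """Run one GRU over the rows of seq and return every hidden state."""
    h = np.zeros(p["z"][2].shape[0])
    outputs = []
    for x in seq:
        z = sigmoid(p["z"][0] @ x + p["z"][1] @ h + p["z"][2])
        r = sigmoid(p["r"][0] @ x + p["r"][1] @ h + p["r"][2])
        h_cand = np.tanh(p["h"][0] @ x + p["h"][1] @ (r * h) + p["h"][2])
        h = (1 - z) * h + z * h_cand
        outputs.append(h)
    return np.stack(outputs)


def bigru(seq, fwd, bwd):
    """Forward GRU reads left to right, backward GRU right to left; outputs are concatenated."""
    backward = gru_pass(seq[::-1], bwd)[::-1]      # re-align backward states with token order
    return np.concatenate([gru_pass(seq, fwd), backward], axis=1)


def self_attention(x, h=None):
    """a[i, j] = softmax over j of x_i . x_j; output tanh([h_i ; sum_j a[i, j] h_j])."""
    h = x if h is None else h
    a = softmax(x @ x.T, axis=1)
    return np.tanh(np.concatenate([h, a @ h], axis=1)), a


def viterbi_decode(emissions, transitions):
    """Globally best tag path: emissions (T, K), transitions[i, j] scores tag i -> tag j."""
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]   # cand[i, j]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score.max())
```

In a full model the CRF emission scores would come from a learned projection of the self-attention output, and all weights would be trained jointly; both are outside this sketch.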
the relation extraction part builds, on top of the named entity recognition part, a relation extraction neural network with a four-layer structure: word vector layer II, BiGRU layer II, self-attention layer II, and a softmax classification layer;
In step S32, executing the relation extraction part means the following. The entity pairs and sentences output by the named entity recognition part are input into word vector layer II and converted into dense low-dimensional vectors, which are concatenated with position vectors representing the position of each word relative to the entity pair; this is the output of word vector layer II. It is fed into BiGRU layer II, which extracts text features from the forward and reverse sequences and concatenates them. The result is passed to self-attention layer II, which computes similarity coefficients, multiplies them by the corresponding hidden-layer outputs, concatenates the result with the hidden-layer outputs and applies a tanh function. The output of self-attention layer II is finally input into the softmax layer for relation classification.
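A minimal sketch of the relation extraction input and output, assuming a common relative-position scheme: each token's signed distance to the head and tail entity is clipped, shifted to a non-negative index into a position-embedding table, and the two position embeddings are concatenated with the word vector. The clipping distance, the helper names and the pooled sentence vector fed to the softmax classifier are assumptions; the BiGRU and self-attention layers work as in the named entity recognition part and are omitted.

```python
import numpy as np


def distance_to_span(i, span):
    """Signed distance from token i to an inclusive (start, end) entity span; 0 inside it."""
    start, end = span
    if i < start:
        return i - start
    if i > end:
        return i - end
    return 0


def position_ids(num_tokens, head, tail, max_distance=10):
    """Clip distances to [-max_distance, max_distance] and shift them to table indices."""
    def clip(d):
        return max(-max_distance, min(max_distance, d)) + max_distance
    return [(clip(distance_to_span(i, head)), clip(distance_to_span(i, tail)))
            for i in range(num_tokens)]


def relation_input(word_vectors, pos, pos_table):
    """Concatenate each word vector with its head- and tail-position embeddings."""
    head_emb = pos_table[[h for h, _ in pos]]
    tail_emb = pos_table[[t for _, t in pos]]
    return np.concatenate([word_vectors, head_emb, tail_emb], axis=1)


def classify_relation(sentence_vector, w_rel):
    """Softmax over relation types for one pooled sentence representation."""
    logits = sentence_vector @ w_rel
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The position table would have `2 * max_distance + 1` rows so that every clipped index is valid.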
Preferably, in the feature engineering sub-module, the feature construction part divides the features into three main classes and processes each class separately: text-type features, numerical body-indicator features, and time-type features;
text-type features are state values describing a characteristic of the patient; they are processed by first encoding the state values, then one-hot encoding them, and combining diseases into new features. Numerical body-indicator features are processed by applying mathematical operations to them to construct new features. Time-type features include the patient's admission time and discharge time; they are processed by counting the number of hospitalizations from the admission and discharge times and computing the patient's length of stay, which are used as new features;
The feature screening part uses residual analysis. A value range is determined for each feature variable X: a discrete feature variable takes integer values, while a continuous feature variable is divided into equal-width intervals, and each discrete value or interval is treated as one value. For each value c of X, the proportion of high-risk patients among the samples with X = c is computed as the high-risk rate at X = c, and the proportion of high-risk patients in the whole sample is computed as the overall high-risk rate. The absolute difference between the high-risk rate and the overall high-risk rate is the relative high-risk rate, i.e., the residual. Features with large relative high-risk rate values are removed, and features with small relative high-risk rate values are retained.
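The feature construction and residual-analysis screening described above can be sketched in plain Python. The BMI and disease-combination helpers are hypothetical examples of "mathematical operations on body indicators" and "combining diseases"; labels are assumed to be 1 for high-risk and 0 otherwise, and scoring each feature by its largest residual over all values is an assumption. The screening rule keeps features with small residuals, exactly as the text states.

```python
from datetime import datetime


def one_hot_states(values):
    """Map each state value to a category index, then expand it into one-hot columns."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def combine_flags(flags_a, flags_b):
    """Hypothetical disease-combination feature: 1 only when both conditions are present."""
    return [a * b for a, b in zip(flags_a, flags_b)]


def bmi(weight_kg, height_m):
    """Hypothetical example of arithmetic on body-indicator features."""
    return weight_kg / height_m ** 2


def stay_features(stays, fmt="%Y-%m-%d"):
    """stays: list of (admission, discharge) date strings for one patient."""
    days = [(datetime.strptime(d, fmt) - datetime.strptime(a, fmt)).days for a, d in stays]
    return {"num_hospitalizations": len(stays), "total_stay_days": sum(days),
            "mean_stay_days": sum(days) / len(days) if days else 0.0}


def relative_high_risk_rates(values, labels, num_bins=None):
    """Residual |P(high-risk | X = c) - P(high-risk)| for each value or equal-width bin c."""
    if num_bins:
        lo, hi = min(values), max(values)
        width = (hi - lo) / num_bins or 1.0            # guard against a constant feature
        keys = [min(int((v - lo) / width), num_bins - 1) for v in values]
    else:
        keys = list(values)                            # discrete feature: one key per value
    overall = sum(labels) / len(labels)
    counts = {}
    for k, y in zip(keys, labels):
        n, pos = counts.get(k, (0, 0))
        counts[k] = (n + 1, pos + y)
    return {k: abs(pos / n - overall) for k, (n, pos) in counts.items()}


def screen_features(features, labels, threshold):
    """features: name -> (values, num_bins or None); keep names whose largest residual < threshold."""
    return [name for name, (values, num_bins) in features.items()
            if max(relative_high_risk_rates(values, labels, num_bins).values()) < threshold]
```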
Preferably, the oversampling part based on k-means clustering works as follows. First, k samples are selected as cluster centers; the Euclidean distance from each sample point to each cluster center is computed, and each sample is assigned to the class of its nearest center, dividing the sample data into k subclasses. New cluster centers are then computed for the k classes, the Euclidean distance from each sample to the new centers is computed again, and the sample data are re-divided; these steps are repeated until the number of iterations reaches a set value or the change in the cluster centers falls below a set value. After clustering, the in-class oversampling ratio a is determined from the proportion of samples in each subclass of the majority class and of the minority class, and the out-of-class oversampling ratio b is determined from the ratio of the number of majority-class samples to minority-class samples. Then, according to the in-class and out-of-class sampling ratios, new samples are synthesized artificially for each subclass of both the majority and minority classes: for each sample point x in a subclass, the Euclidean distance to the other samples in the subclass is computed to find its k nearest neighbors, and several of these neighbors are randomly selected according to the sampling ratio a; for each selected neighbor, a point chosen at random between x and that neighbor is taken as a new synthetic sample. After every subclass of the majority and minority classes has been oversampled at its own sampling ratio, the resulting samples have a balanced ratio of high-risk to non-high-risk patients and are balanced within each class.
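A NumPy sketch of k-means-based oversampling under stated assumptions: Lloyd's algorithm clusters each class separately; the in-class step (ratio a) grows every subclass to the size of the largest subclass in its class; the out-of-class step (ratio b) then tops up each class, spread evenly over its subclasses, until both classes have the same size; and each new sample is a SMOTE-style interpolation between a point and one of its k nearest neighbours in the same subclass. The text gives no exact formulas for a and b, so this allocation is an illustrative choice, and all names are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, max_iter=100, tol=1e-6):
    """Lloyd's algorithm with Euclidean distance; returns a cluster label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:                        # stop once the centers barely move
            break
    return labels


def synthesize(points, n_new, k_neighbors, rng):
    """SMOTE-style interpolation between a random point and one of its k nearest neighbours."""
    if n_new <= 0:
        return np.empty((0, points.shape[1]))
    if len(points) == 1:
        return np.repeat(points, n_new, axis=0)   # nothing to interpolate with
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)               # a point is not its own neighbour
    k = min(k_neighbors, len(points) - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(points), size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                  # random position on the segment x -> neighbour
    return points[base] + gap * (points[chosen] - points[base])


def kmeans_oversample(X, y, n_clusters=3, k_neighbors=5, seed=0):
    rng = np.random.default_rng(seed)
    subclasses = {}
    for c in np.unique(y):
        Xc = X[y == c]
        labels = kmeans(Xc, min(n_clusters, len(Xc)), rng)
        subclasses[c] = [Xc[labels == j] for j in np.unique(labels)]
    # In-class step (ratio a): every subclass grows to its class's largest subclass size.
    grown = {c: len(parts) * max(len(p) for p in parts) for c, parts in subclasses.items()}
    # Out-of-class step (ratio b): every class is then topped up to the largest class size.
    target = max(grown.values())
    out_X, out_y = [], []
    for c, parts in subclasses.items():
        largest = max(len(p) for p in parts)
        extra, remainder = divmod(target - grown[c], len(parts))
        for j, p in enumerate(parts):
            n_new = largest - len(p) + extra + (1 if j < remainder else 0)
            new = synthesize(p, n_new, k_neighbors, rng)
            out_X.append(np.vstack([p, new]))
            out_y.append(np.full(len(p) + len(new), c))
    return np.vstack(out_X), np.concatenate(out_y)
```

Because every class ends at the same size `target`, the output is balanced between classes, and the in-class step equalizes subclass sizes before the top-up.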
Preferably, the classifier stacking part works as follows. Five classifiers, XGBoost, CatBoost, LightGBM, GBDT and LR (logistic regression), are selected as base learners and RandomForest as the meta-learner, and the samples are randomly divided into ten parts. Nine parts are taken as the training set to train each of the five base learners, giving five different classification models; the remaining part is fed to the five classification models as the test set, and the models are tuned to obtain their early warning results on that part. The steps of training the base learners and predicting on the test part are repeated ten times, each time holding out a different part as the test set, so that the early warning results of the five classification models on all ten parts are obtained;
next, the early warning results on the ten parts, together with the actual results, are taken as new input samples, and the RandomForest meta-learner is trained with ten-fold cross-validation to obtain a RandomForest classifier stacking meta-model and its early warning result on the whole sample. The stacking operation is repeated with LightGBM and with XGBoost as the meta-learner, giving a LightGBM classifier stacking meta-model and an XGBoost classifier stacking meta-model, each with its early warning result on the whole sample. Finally, the outputs of the RandomForest, LightGBM and XGBoost stacking meta-models on the whole sample are averaged to obtain the final early warning result of whether each person is at high risk.
Preferably, in the data preprocessing module, preprocessing includes data desensitization, redundant attribute column processing, duplicate data deletion, outlier processing, type conversion and missing value filling.
Preferably, data desensitization renumbers the original patient ID numbers and hospitalization ID numbers in the electronic medical record data to obtain new mapped patient IDs and mapped hospitalization IDs, which replace the original patient ID numbers and hospitalization ID numbers;
redundant attribute column processing deletes attribute columns that are clearly irrelevant or duplicated;
duplicate data deletion removes duplicated patient data samples;
outlier processing removes data samples whose attribute value is an outlier, provided that such samples account for less than 1% of the total number of samples; an outlier is an attribute value greater than 10 times the mean of that attribute over all samples;
type conversion converts numerical data stored as characters into a real (floating-point) type and time data stored as characters into the DATETIME type;
missing value filling fills the missing values of numerical attributes whose missing rate is below 15% with the mean of that attribute over all samples; numerical attributes with a missing rate above 15% are not filled; missing values of label attributes are filled with -1. The missing rate of an attribute is the ratio of its number of missing values to the total number of samples.
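The preprocessing rules can be sketched with pandas. The column names are supplied by the caller and are hypothetical; type conversion is applied before outlier processing because the 10x-mean test needs numeric values, which reorders the steps listed above; and IDs are remapped with `pd.factorize`, which numbers distinct values in order of first appearance (starting the mapped IDs at 1 is an arbitrary choice).

```python
import pandas as pd


def preprocess(df, id_cols, drop_cols, numeric_cols, time_cols, label_cols,
               outlier_factor=10.0, max_outlier_share=0.01, max_missing_rate=0.15):
    # Redundant attribute columns and duplicated samples.
    df = df.drop(columns=drop_cols).drop_duplicates().copy()
    # Desensitization: replace original IDs by sequential mapping IDs.
    for col in id_cols:
        codes, _ = pd.factorize(df[col])
        df[col] = codes + 1
    # Type conversion: character numbers -> float, character times -> datetime.
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce").astype(float)
    for col in time_cols:
        df[col] = pd.to_datetime(df[col], errors="coerce")
    # Outliers: value > 10x the attribute mean; drop those rows only if they are < 1% of samples.
    for col in numeric_cols:
        mask = df[col] > outlier_factor * df[col].mean()
        if 0 < mask.mean() < max_outlier_share:
            df = df.loc[~mask].copy()
    # Missing values: mean-fill numeric attributes with a low missing rate, -1 for label attributes.
    for col in numeric_cols:
        if df[col].isna().mean() < max_missing_rate:
            df[col] = df[col].fillna(df[col].mean())
    for col in label_cols:
        df[col] = df[col].fillna(-1)
    return df.reset_index(drop=True)
```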
Preferably, the system further comprises a high-risk patient medical record retrieval module, which keeps the electronic medical record data sets whose early warning result is high-risk and searches the raw electronic medical record data by their patient ID numbers to obtain detailed information on the patients warned as high-risk.
Compared with the prior art, the invention has the following advantages and beneficial effects:
the invention enables early detection and early warning of people at high risk of diabetic complications, strengthening medical intervention, reducing disability and death and lowering the readmission rate, thereby supporting disease monitoring and prevention, ultimately reducing medical costs and relieving pressure on hospitals. A model obtained with this early warning model training method can be applied to early warning of people at high risk of diabetic complications, improving the accuracy and effectiveness of such warnings.
Drawings
FIG. 1 is a block diagram of the high-risk early warning system for diabetic complications based on deep learning and ensemble learning according to the present invention;
FIG. 2 is a schematic workflow diagram of the high-risk early warning system for diabetic complications based on deep learning and ensemble learning according to the present invention;
FIG. 3 is a flowchart of the data mining model training method of the high-risk early warning system for diabetic complications based on deep learning and ensemble learning.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
Examples
This embodiment provides a high-risk early warning system for diabetic complications based on deep learning and ensemble learning, which, as shown in FIG. 1, comprises:
a data input module for inputting the raw electronic medical record data to be identified;
a data preprocessing module for preprocessing the raw electronic medical record data to be identified to obtain an electronic medical record data set;
a data processing module for inputting the electronic medical record data set into a data mining model, which processes the data set to obtain an early warning result indicating whether the patient belongs to the high-risk group; the data mining model is obtained by training an initial data mining model;
the training of the initial data mining model comprises the following steps:
S1: take multiple raw electronic medical records, each with an existing diagnostic conclusion on whether the patient belongs to the high-risk group, as the training set;
S2: preprocess the samples of the training set to obtain preprocessed samples;
S3: input the preprocessed samples into the data mining model to obtain an early warning result of whether each sample belongs to the high-risk group;
S4: compare the patients warned as high-risk with the patients diagnosed as high-risk to obtain the number of patients correctly warned as high-risk; divide the number of patients correctly warned as high-risk by the total number of patients warned as high-risk to obtain the early-warning precision; divide the number of patients correctly warned as high-risk by the number of patients diagnosed as high-risk to obtain the early-warning recall; compute the early-warning F1 value from the precision and the recall; record the F1 value, and adjust and update the data mining model parameters and the early-warning threshold;
judge whether the set number of rounds has been reached: if yes, take the data mining model corresponding to the maximum early-warning F1 value as the trained data mining model; otherwise, return to step S2.
In this scheme, a model obtained with the early warning model training method can be applied to early warning of people at high risk of diabetic complications, improving the accuracy and effectiveness of such warnings. Early detection, early warning and tracking of people at high risk of diabetic complications are achieved, medical intervention is strengthened, disability and death are reduced and the readmission rate is lowered, thereby supporting disease monitoring and prevention, ultimately reducing medical costs and relieving pressure on hospitals.
The workflow of the high-risk early warning system for diabetic complications based on deep learning and ensemble learning is shown in FIG. 2.
In the data preprocessing module, preprocessing comprises data desensitization, redundant attribute column processing, repeated data deletion, outlier processing, type conversion and missing value filling.
Data desensitization renumbers the original patient ID numbers and hospitalization ID numbers in the electronic medical record data, producing new mapped patient ID numbers and mapped hospitalization ID numbers that replace the originals;
redundant attribute column processing deletes attribute columns that are obviously irrelevant or duplicated;
duplicate data deletion removes repeated patient data samples;
outlier processing removes data samples whose attribute value (feature value) is an outlier, provided such samples account for less than 1% of the total number of samples; an attribute value is treated as an outlier when it exceeds 10 times the mean of that attribute over all samples;
type conversion converts character-type numerical data into the real type and character-type time data into the DATETIME type;
missing value filling fills the missing values of a numerical attribute (blood glucose, blood pressure, etc.) with the mean of that attribute over all samples when the attribute's missing rate is below 15%; numerical attributes with a missing rate above 15% are left unfilled; missing values of label-type attributes (diabetes type, etc.) are filled with -1. The missing rate of an attribute is the number of its missing values divided by the total number of samples.
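The desensitization, outlier and missing-value rules can be sketched in Python on records held as dictionaries. This is a minimal illustration under stated assumptions: the record layout, column names and function names are hypothetical, and `None` marks a missing value; the thresholds (10 times the mean, 1% of samples, 15% missing rate, fill value -1) come from the text.

```python
from statistics import mean


def desensitize(rows, id_col):
    """Replace original IDs in `id_col` with new sequential mapped IDs (same original ID -> same new ID)."""
    mapping = {}
    for row in rows:
        row[id_col] = mapping.setdefault(row[id_col], len(mapping) + 1)
    return rows


def remove_outliers(rows, numeric_cols, factor=10.0, max_fraction=0.01):
    """Drop rows whose value exceeds `factor` times the column mean, but only
    when such rows make up less than `max_fraction` of all rows."""
    for col in numeric_cols:
        present = [row[col] for row in rows if row.get(col) is not None]
        if not present:
            continue
        limit = factor * mean(present)
        is_outlier = [row.get(col) is not None and row[col] > limit for row in rows]
        if 0 < sum(is_outlier) < max_fraction * len(rows):
            rows = [row for row, bad in zip(rows, is_outlier) if not bad]
    return rows


def fill_missing(rows, numeric_cols, label_cols, max_missing_rate=0.15):
    """Mean-fill numeric columns whose missing rate is below the limit; fill label columns with -1."""
    n = len(rows)
    for col in numeric_cols:
        present = [row[col] for row in rows if row.get(col) is not None]
        if present and (n - len(present)) / n < max_missing_rate:
            col_mean = mean(present)
            for row in rows:
                if row.get(col) is None:
                    row[col] = col_mean
    for col in label_cols:
        for row in rows:
            if row.get(col) is None:
                row[col] = -1
    return rows
```

The text does not say what happens at exactly 15%; this sketch fills only when the rate is strictly below the limit.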
The raw electronic medical record data is structured text data. The data mining model comprises a text structuring sub-module that extracts features from text data, a feature engineering sub-module that raises the upper limit of model performance, a data balancing sub-module that corrects the skew between positive and negative samples in the data, and an ensemble learning sub-module that classifies the samples.
The text structuring sub-module comprises a named entity recognition part and a relation extraction part; the feature engineering sub-module comprises a feature construction part and a feature screening part; the data balancing sub-module comprises a k-means-clustering-based oversampling part; the ensemble learning sub-module comprises a classifier stacking part;
Step S3 comprises the following steps:
S31, input the preprocessed sample into the data mining model;
S32, pass the sample through the text structuring sub-module:
execute the named entity recognition part, and concatenate its output with the preprocessed sample;
execute the relation extraction part, and concatenate its output with the preprocessed sample;
S33, pass the sample through the feature engineering sub-module:
execute the feature construction part, and concatenate its output with the preprocessed sample;
execute the feature screening part;
S34, pass the output of the feature screening part through the data balancing sub-module, executing the k-means-clustering-based oversampling part;
S35, pass the output of the k-means-clustering-based oversampling part through the ensemble learning sub-module:
execute the classifier stacking part to obtain the early warning result of whether the sample belongs to the high-risk group.
Specifically, in the text structuring sub-module, the named entity recognition part has a four-layer structure: word vector layer I, bidirectional gated recurrent unit (BiGRU) layer I, self-attention layer I, and a conditional random field (CRF) layer;
in step S32, executing the named entity recognition part means the following. First, word vector layer I converts the preprocessed sample into OneHot sparse vectors and multiplies them by a custom input weight matrix; the resulting vectors are summed and averaged to obtain a hidden layer vector, which is multiplied by an output weight matrix and passed through a SoftMax function to obtain a probability distribution, from which the most probable result is taken. The dense low-dimensional vectors obtained in this way form the output of the word vector layer.
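The word vector layer described above reads like the continuous bag-of-words (CBOW) architecture of word2vec; that is our interpretation, not something the text states. The Python sketch below shows a single CBOW forward pass only (training by gradient descent is omitted), and the weight matrices and function names are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable SoftMax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, w_in, w_out):
    """One CBOW forward pass.

    w_in:  V x D input weight matrix; multiplying a one-hot vector for word i by it
           selects row i, so the lookup is done by indexing instead of a matrix product.
    w_out: D x V output weight matrix.
    Returns the averaged hidden vector, the SoftMax distribution over the vocabulary,
    and the index of the most probable word.
    """
    dim = len(w_in[0])
    vocab = len(w_out[0])
    hidden = [sum(w_in[i][d] for i in context_ids) / len(context_ids) for d in range(dim)]
    logits = [sum(hidden[d] * w_out[d][v] for d in range(dim)) for v in range(vocab)]
    probs = softmax(logits)
    best = max(range(vocab), key=lambda v: probs[v])
    return hidden, probs, best


def word_vector(word_id, w_in):
    """After training, the dense low-dimensional vector of a word is its row of w_in."""
    return list(w_in[word_id])
```

Indexing a row instead of multiplying by the one-hot vector gives the same result and is how embedding layers are usually implemented.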
The output of the word vector layer is then fed into bidirectional gated recurrent unit (BiGRU) layer I, which comprises a forward GRU and a backward GRU: the forward GRU reads the sequence from left to right, i.e. from the beginning of the text, and the backward GRU reads it from right to left, i.e. from the end of the text; the outputs of the two GRUs are concatenated to form the layer output.
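A bidirectional GRU layer of this kind can be sketched in plain Python. The sketch uses a standard GRU cell with biases omitted; the parameter names (`Wz`, `Uz`, `Wr`, `Ur`, `Wh`, `Uh`) and the dictionary layout are assumptions for illustration, and a real model would use a framework implementation with learned weights.

```python
import math


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, p):
    """One GRU step (biases omitted for brevity).

    p holds weight matrices Wz, Uz (update gate), Wr, Ur (reset gate), Wh, Uh (candidate state).
    """
    z = [_sigmoid(a + b) for a, b in zip(_matvec(p["Wz"], x), _matvec(p["Uz"], h))]
    r = [_sigmoid(a + b) for a, b in zip(_matvec(p["Wr"], x), _matvec(p["Ur"], h))]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(_matvec(p["Wh"], x), _matvec(p["Uh"], reset_h))]
    # Interpolate between the previous state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(seq, p, hidden_size):
    """Run a GRU over a sequence of input vectors, returning the hidden state at every position."""
    h = [0.0] * hidden_size
    states = []
    for x in seq:
        h = gru_step(x, h, p)
        states.append(h)
    return states


def bigru(seq, p_fwd, p_bwd, hidden_size):
    """Forward GRU reads left-to-right, backward GRU right-to-left; outputs are concatenated per position."""
    fwd = run_gru(seq, p_fwd, hidden_size)
    bwd = run_gru(seq[::-1], p_bwd, hidden_size)[::-1]  # re-align backward states with positions
    return [f + b for f, b in zip(fwd, bwd)]
```

Each position therefore carries context from both ends of the text: its first half summarizes the tokens to its left, its second half the tokens to its right.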
Next, self-attention layer I computes, for each element $x_i$ of the input sequence, its similarity $f(x_i, x_j)$ with every other element $x_j$; SoftMax normalization turns these similarities into similarity coefficients $a_{i,j}$; the hidden layer outputs weighted by these coefficients give the self-attention result, which is concatenated with the hidden layer output and passed through a tanh function to give the output of self-attention layer I.
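A minimal Python sketch of this self-attention step follows. The text does not define the similarity function $f$, so the dot product is assumed here; it also includes position $i$ itself among the $j$ (a common choice, whereas the text says "other elements"), and it applies tanh element-wise to the concatenation without the learned projection that many implementations add. The function name is hypothetical.

```python
import math


def self_attention(hidden):
    """Self-attention over hidden-layer outputs (e.g. BiGRU states).

    Assumptions: the similarity f(x_i, x_j) is the dot product, and j ranges over all
    positions including i. Returns (outputs, weights) where weights[i][j] = a_ij and
    outputs[i] = tanh([h_i ; sum_j a_ij * h_j]) applied element-wise.
    """
    n = len(hidden)
    outputs, weights = [], []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(hidden[i], hidden[j])) for j in range(n)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # SoftMax normalization
        total = sum(exps)
        coeffs = [e / total for e in exps]
        context = [sum(coeffs[j] * hidden[j][d] for j in range(n)) for d in range(len(hidden[i]))]
        outputs.append([math.tanh(v) for v in hidden[i] + context])
        weights.append(coeffs)
    return outputs, weights
```

Each output vector is twice the hidden size, because the original hidden state and its attention-weighted context are concatenated.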
Finally, the result is fed into the conditional random field layer, which uses a globally optimal decoding method to compute the best overall sequence labeling.
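The "globally optimal method" of a CRF layer is usually Viterbi decoding; we assume that reading here. The sketch below decodes the best tag sequence from per-position emission scores and tag-to-tag transition scores; the scores and the tag scheme in the example are hypothetical, and CRF training is not shown.

```python
def viterbi(emissions, transitions, start=None):
    """Globally optimal tag sequence for a linear-chain CRF.

    emissions[t][y]: score of tag y at position t.
    transitions[y_prev][y]: score of moving from tag y_prev to tag y.
    Returns (best_path, best_score), maximizing the total score over the whole sequence.
    """
    n_tags = len(emissions[0])
    start = start or [0.0] * n_tags
    score = [start[y] + emissions[0][y] for y in range(n_tags)]
    backptr = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for y in range(n_tags):
            best_prev = max(range(n_tags), key=lambda yp: score[yp] + transitions[yp][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)
    last = max(range(n_tags), key=lambda y: score[y])
    path = [last]
    for ptr in reversed(backptr):  # follow back-pointers from the last position
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]
```

With tags 0 = O, 1 = B, 2 = I, transition O to I penalized by -10, and emissions `[[2.0, 1.5, 0.0], [0.0, 0.0, 3.0]]`, greedy per-position choice would give the invalid O, I; Viterbi returns `[1, 2]` (B, I) with score 4.5.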
Building on the named entity recognition part, the relation extraction part constructs a four-layer relation extraction neural network: word vector layer II, bidirectional gated recurrent unit layer II, self-attention layer II, and a SoftMax classification layer;
in step S32, executing the relation extraction part means: the entity pairs and sentences output by the named entity recognition part are fed into word vector layer II and converted into dense low-dimensional vectors, which are concatenated with position vectors encoding the position of each word relative to the two entities of the pair; this result of word vector layer II is passed to bidirectional gated recurrent unit layer II, which extracts text features from the forward and reverse sequences and concatenates them; self-attention layer II then computes similarity coefficients, multiplies them by the corresponding hidden layer outputs, concatenates the result with the hidden layer outputs and applies a tanh function; the output of self-attention layer II is fed into the SoftMax layer for relation classification.
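The position vectors can be built from each word's relative distance to the two entities. The Python sketch below is one common way to do this and is an assumption, not the patent's stated scheme: each entity is represented by a single token index, distances are clipped to a hypothetical window `max_dist`, and the shifted distances index a position-embedding table whose rows are concatenated onto the word vectors.

```python
def relative_positions(n_tokens, e1, e2, max_dist=10):
    """For each token, its distance to entity 1 and entity 2 (token indices e1, e2),
    clipped to [-max_dist, max_dist] and shifted to 0..2*max_dist so it can index a
    position-embedding table."""
    def clip(d):
        return max(-max_dist, min(max_dist, d)) + max_dist
    return [(clip(i - e1), clip(i - e2)) for i in range(n_tokens)]


def add_position_vectors(word_vectors, positions, pos_table):
    """Concatenate each word vector with the position vectors of its two distances."""
    return [wv + pos_table[p1] + pos_table[p2] for wv, (p1, p2) in zip(word_vectors, positions)]
```

For a five-token sentence with entities at tokens 1 and 3 and `max_dist=2`, `relative_positions(5, 1, 3, max_dist=2)` gives `[(1, 0), (2, 0), (3, 1), (4, 2), (4, 3)]`.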
In the feature engineering sub-module, the feature construction part divides the features into three main categories and treats each separately: text-type features (regular short-text features and structured text features extracted from long text, such as past medical history), numerical features of physical indicators from laboratory examinations (e.g., blood glucose, blood pressure), and time-type features (e.g., admission time);
text-type features include state values describing a characteristic of the patient, such as whether the patient has hypertension, or whether a given part of the physical examination is normal or shows some pathological change of an organ. Text-type features are processed by first encoding the state values (e.g., as 0, 1, 2), then applying OneHot encoding, and combining diseases into new features; for example, hypertension and heart disease together form the new feature "hypertension and heart disease", and so on. The numerical features of physical indicators are already complete in the original data and can be used directly; in addition, mathematical operations on them, such as the ratio of weight to height, are used to construct new features, since ratios between many physical indicators are meaningful and can considerably improve model performance. Time-type features include the patient's admission and discharge times; they are processed by counting the number of hospitalizations from the admission and discharge times and calculating the patient's length of stay, which form new features.
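The three kinds of constructed features can be sketched in Python. The helper names, the pairwise disease combination, the feature-name format and the timestamp format are assumptions for illustration; the weight-to-height ratio, the state-value OneHot encoding, the hospitalization count and the length of stay come from the text.

```python
from datetime import datetime
from itertools import combinations


def one_hot(code, n_codes):
    """OneHot-encode an integer state code (e.g. 0 = absent, 1 = present, 2 = abnormal)."""
    return [1 if code == k else 0 for k in range(n_codes)]


def disease_combinations(patient_diseases, candidates):
    """One new binary feature per pair of candidate diseases: 1 if the patient has both."""
    return {f"{a}&{b}": int(a in patient_diseases and b in patient_diseases)
            for a, b in combinations(candidates, 2)}


def ratio_feature(numerator, denominator):
    """New numerical feature from two body indicators, e.g. weight / height."""
    return numerator / denominator if denominator else None


def stay_features(stays, fmt="%Y-%m-%d %H:%M:%S"):
    """stays: list of (admission_time, discharge_time) strings.
    Returns the number of hospitalizations and the total length of stay in days."""
    days = [(datetime.strptime(out, fmt) - datetime.strptime(admit, fmt)).total_seconds() / 86400
            for admit, out in stays]
    return {"n_stays": len(stays), "total_stay_days": sum(days)}
```

For example, two stays of 3 days and 1.5 days give `{"n_stays": 2, "total_stay_days": 4.5}`.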
The feature screening part is needed because, after feature construction, the original features plus the newly constructed ones give a very large total number of features, some of which introduce noise into model training; the features are therefore screened so that only the strong features that genuinely improve model performance are retained.
The feature screening part uses residual analysis. A value range is determined for each feature variable X: a discrete feature variable takes integer values such as 0, 1, 2, while a continuous feature variable is divided into equal-width intervals, and each discrete value or interval, such as [0,1), [1,2), [2,3), is treated as one value. For each value c of X, the high-risk rate at X = c is the share of high-risk patients among the samples with X = c, i.e. the number of high-risk samples with X = c divided by the total number of samples with X = c. The overall high-risk rate is the number of high-risk samples divided by the total number of samples. The absolute difference between the high-risk rate at X = c and the overall high-risk rate is the relative high-risk rate, used as the residual. Features with large relative high-risk rates are removed, and features with small relative high-risk rates are retained.
The influence of X on the stability of the data mining model is judged from how much the relative high-risk rate varies across all values of X: if it varies widely, with some values far too small and others far too large, the feature does not help model stability, contributes little to model performance, and is removed; if the variation is small and the values are stable, the feature helps model performance and is retained. Feature screening by this residual analysis yields the features finally used to train the data mining model.
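The residual computation can be sketched in Python as follows. The bin width for continuous features and the cut-off `max_residual` are hypothetical, since the text gives no numbers, and `keep_feature` is one possible reading of "remove features with large relative high-risk rates": it keeps a feature only if no value's residual exceeds the cut-off.

```python
from collections import defaultdict


def bin_value(x, width=None):
    """Discrete features keep their value; continuous ones map to an equal-width interval index."""
    return int(x // width) if width else x


def relative_high_risk_rates(values, labels, width=None):
    """values: feature X per sample; labels: 1 = high-risk group.
    Returns {c: |high-risk rate at X = c - overall high-risk rate|}."""
    overall = sum(labels) / len(labels)
    counts, positives = defaultdict(int), defaultdict(int)
    for x, y in zip(values, labels):
        c = bin_value(x, width)
        counts[c] += 1
        positives[c] += y
    return {c: abs(positives[c] / counts[c] - overall) for c in counts}


def keep_feature(values, labels, width=None, max_residual=0.2):
    """Hypothetical decision rule: keep X only if no value's residual exceeds max_residual."""
    residuals = relative_high_risk_rates(values, labels, width)
    return max(residuals.values()) <= max_residual
```

With width 1.0, the values `[0.5, 1.5, 1.7, 2.2]` fall into intervals [0,1), [1,2), [1,2), [2,3), and labels `[1, 0, 1, 0]` give residuals `{0: 0.5, 1: 0.0, 2: 0.5}`.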
The k-means-clustering-based oversampling part performs k-means clustering on the preprocessed samples, both on the majority class of non-high-risk patients and on the smaller minority class of high-risk patients. It works as follows: first, k samples are selected as cluster centers; the Euclidean distance from each sample point to each cluster center is computed and every sample is assigned to its nearest center, dividing the data into k subclasses; new cluster centers are then computed for the k classes, the Euclidean distances from each sample to the new centers are recomputed and the samples reassigned, and this is repeated until the number of iterations reaches a set value or the change of the cluster centers falls below a set value. After clustering, an in-class oversampling ratio a is determined for each subclass of the majority and minority classes from the proportion of samples in each subclass, and an out-of-class oversampling ratio b is determined from the ratio of the majority class size to the minority class size. New samples are then synthesized from each subclass of the majority and minority classes according to the in-class and out-of-class ratios: for each sample point x in a subclass, the Euclidean distances to the other samples in the subclass are computed to find its k nearest neighbours, from which several sample points are randomly chosen according to the sampling rate a; for each chosen neighbour, a point is randomly selected between x and that neighbour as a new synthetic sample. Once every subclass of the majority and minority classes has been oversampled at its own sampling rate, the result is a sample set in which the non-high-risk and high-risk groups are balanced in number and each class is balanced internally.
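The procedure combines k-means clustering with SMOTE-style interpolation. The Python sketch below is a simplified variant under stated assumptions: it oversamples only the minority class up to a hypothetical `target_size` (playing the role of the out-of-class ratio b), approximates the in-class ratio a by splitting the shortfall across subclasses in proportion to their size, and does not oversample the majority subclasses. All function names are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-9, rng=None):
    """Plain k-means with Euclidean distance; returns a cluster label per point."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        shift = 0.0
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                new = [sum(col) / len(members) for col in zip(*members)]
                shift = max(shift, math.dist(new, centers[c]))
                centers[c] = new
        if shift < tol:  # centers stopped moving
            break
    return labels


def smote_subclass(members, n_new, k_neighbors=5, rng=None):
    """Synthesize n_new points inside one subclass by interpolating between a random
    seed point and one of its k nearest neighbours in the same subclass."""
    rng = rng or random.Random(0)
    if len(members) < 2:
        return [list(members[0]) for _ in range(n_new)] if members else []
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(members)
        others = [m for m in members if m is not x]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k_neighbors]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random point on the segment between x and nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def cluster_oversample(minority, target_size, k=2, k_neighbors=5, seed=0):
    """Cluster the minority class into k subclasses, then synthesize new samples in each
    subclass in proportion to its size until the class reaches target_size."""
    rng = random.Random(seed)
    labels = kmeans(minority, k, rng=rng)
    groups = [[p for p, lab in zip(minority, labels) if lab == c] for c in range(k)]
    shortfall = max(0, target_size - len(minority))
    quotas = [shortfall * len(g) // len(minority) for g in groups]
    largest = max(range(k), key=lambda c: len(groups[c]))
    quotas[largest] += shortfall - sum(quotas)  # hand the rounding remainder to the largest subclass
    balanced = [list(p) for p in minority]
    for group, quota in zip(groups, quotas):
        balanced.extend(smote_subclass(group, quota, k_neighbors, rng))
    return balanced
```

Interpolating only within a subclass keeps synthetic high-risk samples inside the dense regions found by k-means, rather than in the gaps between distant clusters.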
The classifier stacking part works as follows: five classifiers, XGBoost, CatBoost, LightGBM, GBDT and LR, are selected as base learners and RandomForest as the meta-learner, and the samples are randomly divided into ten parts. Nine parts are used as the training set to train each of the five base learners, giving five different classification models; the remaining part is used as the test set and fed into the five classification models for prediction, and the five models are tuned, yielding their early warning results on that part. This procedure of training the base learners and predicting the held-out part is repeated ten times, each time with a different part held out as the test set, so that the early warning results of the five classification models are obtained for all ten parts;
then the early warning results for the ten parts, together with the actual results, are used as new input samples to train the RandomForest meta-learner with ten-fold cross-validation, giving a RandomForest classifier stacking meta-model and its early warning results for the whole sample; the stacking operation is repeated with LightGBM and with XGBoost as the meta-learner, giving a LightGBM stacking meta-model and an XGBoost stacking meta-model, each with its early warning results for the whole sample; finally, the outputs of the RandomForest, LightGBM and XGBoost stacking meta-models on the whole sample are summed and averaged to obtain the final early warning result of whether each patient belongs to the high-risk group.
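The out-of-fold stacking scheme can be sketched with the standard library alone. The toy learners `NearestCentroid` and `PriorRate` are hypothetical stand-ins for the real base learners (XGBoost, CatBoost, LightGBM, GBDT, LR) and meta-learners (RandomForest, LightGBM, XGBoost); the `fit`/`predict_proba` interface, the shuffled fold assignment and the omission of model tuning are assumptions of this sketch.

```python
import random
from statistics import mean


class NearestCentroid:
    """Toy learner: probability of high risk from squared distances to the two class centroids."""
    def fit(self, X, y):
        self.c1 = [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == 1])]
        self.c0 = [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == 0])]
        return self

    def predict_proba(self, x):
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        return d0 / (d0 + d1) if d0 + d1 else 0.5


class PriorRate:
    """Toy learner: always predicts the training-set high-risk rate."""
    def fit(self, X, y):
        self.p = sum(y) / len(y)
        return self

    def predict_proba(self, x):
        return self.p


def kfold_indices(n, k=10, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold(X, y, learner_factories, k=10, seed=0):
    """Each sample is predicted by learners trained on the other k-1 folds.
    Returns one meta-feature row (one probability per learner) per sample."""
    meta = [None] * len(X)
    for fold in kfold_indices(len(X), k, seed):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        models = [make().fit([X[i] for i in train], [y[i] for i in train]) for make in learner_factories]
        for i in fold:
            meta[i] = [m.predict_proba(X[i]) for m in models]
    return meta


def stacked_prediction(X, y, base_factories, meta_factories, k=10):
    """Train each meta-learner on the out-of-fold meta-features (again with k-fold CV)
    and average the meta-models' whole-sample outputs."""
    meta_X = out_of_fold(X, y, base_factories, k)
    per_meta = [out_of_fold(meta_X, y, [make], k) for make in meta_factories]
    return [mean(p[i][0] for p in per_meta) for i in range(len(X))]
```

Because every meta-feature is produced by models that never saw that sample, the meta-learners are trained on predictions that mimic unseen data, which is the point of stacking with held-out folds.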
In step S4, the early warning F1 value is used as the evaluation index of the data mining model and is calculated as follows:
$$P = \frac{S}{T}, \qquad R = \frac{S}{U}, \qquad F1 = \frac{2PR}{P + R}$$
where S is the number of patients correctly warned as high-risk, T is the total number of patients warned as high-risk, U is the number of patients diagnosed as high-risk, P is the early warning precision, and R is the early warning recall.
Preferably, the early warning system of the present invention further comprises a high-risk group medical record acquisition module, which retains the electronic medical record data sets whose early warning result is "high-risk group", looks them up in the raw electronic medical record data by their patient ID numbers, and obtains detailed information on the patients warned as high-risk.
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.

Claims (6)

1. A diabetes complication high-risk early warning system based on deep learning and ensemble learning, characterized by comprising:
a data input module for inputting the raw electronic medical record data to be identified, the raw electronic medical record data being structured text data;
a data preprocessing module for preprocessing the raw electronic medical record data to be identified to obtain an electronic medical record data set;
a data processing module for inputting the electronic medical record data set into a data mining model, which processes it to obtain an early warning result of whether the data set belongs to the high-risk group; the data mining model is obtained by training an initial data mining model;
the data mining model comprises a text structuring sub-module for extracting features from text data, a feature engineering sub-module for raising the upper limit of the data mining model's performance, a data balancing sub-module for correcting the skew between positive and negative samples in the data, and an ensemble learning sub-module for classifying the samples;
the text structuring sub-module comprises a named entity recognition part and a relation extraction part; the named entity recognition part has a four-layer structure: word vector layer I, bidirectional gated recurrent unit layer I, self-attention layer I, and a conditional random field layer;
the feature engineering sub-module comprises a feature construction part and a feature screening part; the feature construction part divides the features into three main categories and treats each separately: text-type features, numerical features of physical indicators, and time-type features; text-type features include state values describing a characteristic of the patient and are processed by first encoding the state values, then applying OneHot encoding, and combining diseases into new features; numerical features of physical indicators are processed by applying mathematical operations to them to construct new features; time-type features include the patient's admission and discharge times and are processed by counting the number of hospitalizations from the admission and discharge times and calculating the patient's length of stay to construct new features;
the feature screening part uses residual analysis: a value range is determined for a feature variable X, where a discrete feature variable takes integer values and a continuous feature variable is divided into equal-width intervals, each discrete value or interval being treated as one value; for each value c of X, the share of high-risk patients among the samples with X = c is calculated as the high-risk rate at X = c; the share of high-risk patients in the whole sample is calculated as the overall high-risk rate; the absolute difference between the high-risk rate and the overall high-risk rate is taken as the relative high-risk rate, i.e. the residual; features with large relative high-risk rates are removed and features with small relative high-risk rates are retained;
the data balancing sub-module comprises a k-means-clustering-based oversampling part; the ensemble learning sub-module comprises a classifier stacking part;
the training of the initial data mining model comprises the following steps:
S1, take a plurality of raw electronic medical records with an existing diagnosis conclusion on whether the patient belongs to the high-risk group as a training set;
S2, preprocess the samples of the training set to obtain preprocessed samples;
S3, input the preprocessed samples into the data mining model to obtain the early warning result of whether each sample belongs to the high-risk group;
S4, compare the patients whose early warning result is "high-risk group" with the patients whose diagnosis conclusion is "high-risk group" to obtain the number of patients correctly warned as high-risk; divide this number by the total number of patients warned as high-risk to obtain the early warning precision; divide the same number by the number of patients diagnosed as high-risk to obtain the early warning recall; combine the early warning precision and recall to obtain the early warning F1 value; record the F1 value, and adjust and update the data mining model parameters and the early warning threshold;
judge whether the set number of rounds has been reached: if so, take the data mining model corresponding to the maximum early warning F1 value as the trained data mining model; otherwise, return to step S2;
step S3 comprises the following steps:
S31, input the preprocessed sample into the data mining model;
S32, pass the sample through the text structuring sub-module:
execute the named entity recognition part, and concatenate its output with the preprocessed sample;
execute the relation extraction part, and concatenate its output with the preprocessed sample;
executing the named entity recognition part means: first, word vector layer I converts the preprocessed sample into OneHot sparse vectors and multiplies them by a custom input weight matrix; the resulting vectors are summed and averaged to obtain a hidden layer vector, which is multiplied by an output weight matrix and passed through a SoftMax function to obtain a probability distribution, from which the most probable result is taken; the dense low-dimensional vectors obtained in this way form the output of word vector layer I;
then this output is fed into bidirectional gated recurrent unit layer I, which comprises a forward GRU and a backward GRU: the forward GRU reads the sequence from left to right, i.e. from the beginning of the text, and the backward GRU reads it from right to left, i.e. from the end of the text; the outputs of the two GRUs are concatenated to form the layer output;
then self-attention layer I computes, for each element $x_i$ of the input sequence, its similarity $f(x_i, x_j)$ with every other element $x_j$, and SoftMax normalization yields the similarity coefficients $a_{i,j}$; the hidden layer outputs weighted by these coefficients give the self-attention result, which is concatenated with the hidden layer output and passed through a tanh function to give the output of self-attention layer I;
finally, the result is fed into the conditional random field layer, which uses a globally optimal decoding method to compute the best overall sequence labeling;
the relation extraction part builds, on the basis of the named entity recognition part, a four-layer relation extraction neural network: word vector layer II, bidirectional gated recurrent unit layer II, self-attention layer II, and a SoftMax classification layer;
executing the relation extraction part means: the entity pairs and sentences output by the named entity recognition part are fed into word vector layer II and converted into dense low-dimensional vectors, which are concatenated with position vectors encoding the position of each word relative to the two entities of the pair; this result of word vector layer II is passed to bidirectional gated recurrent unit layer II, which extracts text features from the forward and reverse sequences and concatenates them; self-attention layer II then computes similarity coefficients, multiplies them by the corresponding hidden layer outputs, concatenates the result with the hidden layer outputs and applies a tanh function; the output of self-attention layer II is fed into the SoftMax layer for relation classification;
S33, pass the sample through the feature engineering sub-module:
execute the feature construction part, and concatenate its output with the preprocessed sample;
execute the feature screening part;
S34, pass the output of the feature screening part through the data balancing sub-module, executing the k-means-clustering-based oversampling part;
S35, pass the output of the k-means-clustering-based oversampling part through the ensemble learning sub-module:
execute the classifier stacking part to obtain the early warning result of whether the sample belongs to the high-risk group.
2. The deep learning and ensemble learning-based high-risk early warning system for diabetic complications according to claim 1, wherein the k-means-clustering-based oversampling part works as follows: first, k samples are selected as cluster centers; the Euclidean distance from each sample point to each cluster center is computed and every sample is assigned to its nearest center, dividing the data into k subclasses; new cluster centers are then computed for the k classes, and the Euclidean distances from each sample to the new centers are recomputed to reassign the samples; iteration continues until the number of iterations reaches a set value or the change of the cluster centers falls below a set value; after clustering, an in-class oversampling ratio a is determined for each subclass of the majority and minority classes from the proportion of samples in each subclass, and an out-of-class oversampling ratio b is determined from the ratio of the majority class size to the minority class size; new samples are then synthesized from each subclass of the majority and minority classes according to the in-class and out-of-class ratios: for each sample point x in a subclass, the Euclidean distances to the other samples in the subclass are computed to find its k nearest neighbours, from which several sample points are randomly chosen according to the sampling rate a; for each chosen neighbour, a point is randomly selected between x and that neighbour as a new synthetic sample; once every subclass of the majority and minority classes has been oversampled at its own sampling rate, a sample set is obtained in which the non-high-risk and high-risk groups are balanced in number and each class is balanced internally.
3. The deep learning and ensemble learning-based high-risk early warning system for diabetic complications according to claim 2, wherein the classifier stacking part works as follows: five classifiers, XGBoost, CatBoost, LightGBM, GBDT and LR, are selected as base learners and RandomForest as the meta-learner, and the samples are randomly divided into ten parts; nine parts are used as the training set to train each of the five base learners, giving five different classification models; the remaining part is used as the test set and fed into the five classification models for prediction, and the five models are tuned, yielding their early warning results on that part; this procedure of training the base learners and predicting the held-out part is repeated ten times, each time with a different part held out as the test set, so that the early warning results of the five classification models are obtained for all ten parts;
then the early warning results for the ten parts, together with the actual results, are used as new input samples to train the RandomForest meta-learner with ten-fold cross-validation, giving a RandomForest classifier stacking meta-model and its early warning results for the whole sample; the stacking operation is repeated with LightGBM and with XGBoost as the meta-learner, giving a LightGBM stacking meta-model and an XGBoost stacking meta-model, each with its early warning results for the whole sample; finally, the outputs of the RandomForest, LightGBM and XGBoost stacking meta-models on the whole sample are summed and averaged to obtain the final early warning result of whether each patient belongs to the high-risk group.
4. The deep learning and ensemble learning based high risk early warning system for diabetic complications according to any one of claims 1 to 3, wherein: in the data preprocessing module, preprocessing comprises data desensitization, redundant attribute column processing, repeated data deletion, outlier processing, type conversion and missing value filling.
5. The deep learning and ensemble learning-based high-risk early warning system for diabetic complications according to claim 4, wherein the data desensitization renumbers the original patient ID numbers and hospitalization ID numbers in the electronic medical record data to obtain new patient mapping ID numbers and hospitalization mapping ID numbers, which replace the original patient ID numbers and hospitalization ID numbers;
the redundant attribute column processing deletes obviously irrelevant or duplicated attribute columns;
the duplicate data deletion deletes duplicated patient data samples;
the outlier processing removes data samples whose attribute values are outliers, provided that such samples account for less than 1% of the total number of samples; an outlier is an attribute value greater than 10 times the mean of that attribute over all samples;
the type conversion converts character-type numerical data into a real type and character-type time data into the DATETIME type;
the missing value filling fills the missing values of numerical attributes whose missing rate is below 15% with the mean of that attribute over all samples, leaves numerical attributes whose missing rate is above 15% unfilled, and fills missing values of the label attribute with -1; the missing rate of an attribute is the ratio of its number of missing values to the total number of samples.
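The claim 5 preprocessing rules can be sketched in plain Python over a list of dict records, with `None` marking a missing value. This is a hypothetical illustration, not the patentee's code. Several details are assumptions because the claim does not fix them: the field names (`patient_id`, `visit_id`), the timestamp format, applying the 1% outlier rule attribute by attribute, and leaving an attribute with a missing rate of exactly 15% unfilled. Redundant-column deletion is omitted because "obviously irrelevant" columns are chosen by hand.

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def desensitize(records: List[Record], id_fields=("patient_id", "visit_id")) -> Tuple[List[Record], Dict[str, dict]]:
    """Replace each original ID with a sequential mapping ID (1, 2, ...) in order of first appearance."""
    mappings: Dict[str, dict] = {f: {} for f in id_fields}
    out = []
    for rec in records:
        new = dict(rec)
        for f in id_fields:
            table = mappings[f]
            new[f] = table.setdefault(rec[f], len(table) + 1)
        out.append(new)
    return out, mappings  # mappings are kept so original IDs can be recovered later


def drop_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first copy of every fully identical sample."""
    seen, out = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out


def convert_types(rec: Record, numeric_fields: Sequence[str], time_fields: Sequence[str],
                  time_format: str = "%Y-%m-%d %H:%M:%S") -> Record:
    """Character numbers -> float; character timestamps -> datetime (format is an assumption)."""
    out = dict(rec)
    for f in numeric_fields:
        if isinstance(out.get(f), str):
            out[f] = float(out[f])
    for f in time_fields:
        if isinstance(out.get(f), str):
            out[f] = datetime.strptime(out[f], time_format)
    return out


def remove_outliers(records, numeric_fields, factor=10.0, max_share=0.01):
    """Drop samples whose value exceeds factor x the attribute mean, but only when those
    samples are fewer than max_share of all samples (checked per attribute: an assumption)."""
    kept = list(records)
    for f in numeric_fields:
        values = [r[f] for r in kept if r[f] is not None]
        if not values:
            continue
        limit = factor * mean(values)
        outliers = [r for r in kept if r[f] is not None and r[f] > limit]
        if 0 < len(outliers) < max_share * len(records):
            kept = [r for r in kept if not (r[f] is not None and r[f] > limit)]
    return kept


def fill_missing(records, numeric_fields, label_fields, max_rate=0.15):
    """Mean-fill numeric attributes whose missing rate is below max_rate; label gaps become -1."""
    n = len(records)
    out = [dict(r) for r in records]  # copy so the input records are not modified
    for f in numeric_fields:
        present = [r[f] for r in out if r[f] is not None]
        if present and (n - len(present)) / n < max_rate:
            fill = mean(present)
            for r in out:
                if r[f] is None:
                    r[f] = fill
    for f in label_fields:
        for r in out:
            if r[f] is None:
                r[f] = -1
    return out
```

Note that the 1% guard means a lone extreme value is removed only when there are more than 100 samples. With fewer samples, even a single outlier already makes up at least 1% of the data, so it is kept.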
6. The deep learning and ensemble learning-based high-risk early warning system for diabetic complications according to any one of claims 1 to 3, further comprising a high-risk group medical record acquisition module, which retains the electronic medical record data set whose early warning result is the high-risk group and searches the original electronic medical record data by the patient ID numbers in that data set to obtain detailed information on the patients whose early warning result is the high-risk group.
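The claim 6 lookup can be sketched as a reverse join from desensitized mapping IDs back to the raw records. The claim states none of the following; they are assumptions: that the ID mapping produced during desensitization is kept, that warnings are a dict from patient mapping ID to a 0/1 flag, and that raw rows carry a `patient_id` field. `high_risk_records` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def high_risk_records(warnings: Dict[int, int], id_mapping: Dict[str, int],
                      raw_records: Iterable[dict], id_field: str = "patient_id") -> List[dict]:
    """Return the original EMR rows of every patient flagged as high risk (flag == 1)."""
    # Invert the desensitization mapping: mapping ID -> original patient ID.
    reverse = {new_id: orig_id for orig_id, new_id in id_mapping.items()}
    wanted = {reverse[pid] for pid, flag in warnings.items() if flag == 1 and pid in reverse}
    # Keep every raw row (all visits) belonging to a flagged patient.
    return [row for row in raw_records if row[id_field] in wanted]
```

Returning all rows per flagged patient, rather than one row, matches the claim's aim of retrieving the patient's detailed information.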
CN202010677989.5A 2020-07-15 2020-07-15 Deep learning and integrated learning-based diabetes complication high-risk early warning system Active CN111968741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010677989.5A CN111968741B (en) 2020-07-15 2020-07-15 Deep learning and integrated learning-based diabetes complication high-risk early warning system

Publications (2)

Publication Number Publication Date
CN111968741A (en) 2020-11-20
CN111968741B (en) 2023-07-18

Family

ID=73361473

Country Status (1)

Country Link
CN (1) CN111968741B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819797B (en) * 2021-02-06 2023-09-19 国药集团基因科技有限公司 Method, device, system and storage medium for analyzing diabetic retinopathy
CN113159132A (en) * 2021-03-26 2021-07-23 上海市杨浦区中心医院(同济大学附属杨浦医院) Hypertension grading method based on multi-model fusion
CN113113140B (en) * 2021-04-02 2022-09-23 中山大学 Diabetes early warning method, system, equipment and storage medium based on self-supervision DNN
CN113345581B (en) * 2021-05-14 2023-06-27 浙江工业大学 Cerebral apoplexy post thrombolysis bleeding probability prediction method based on ensemble learning
CN113421632A (en) * 2021-07-09 2021-09-21 中国人民大学 Psychological disease type diagnosis system based on time series
CN115148361B (en) * 2022-07-15 2023-10-10 深圳大学 Disease subtype determination system and method
CN117972530B (en) * 2024-03-28 2024-06-11 北京大数据先进技术研究院 Ant lion optimization-based missing unbalanced data multi-classification method and equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107451385A (en) * 2016-05-30 2017-12-08 中国科学院软件研究所 Nervous system disease monitoring and early warning system based on daily necessities
CN107680676A (en) * 2017-09-26 2018-02-09 电子科技大学 Gestational diabetes prediction method driven by electronic health record data
CN110164559A (en) * 2019-04-28 2019-08-23 万达信息股份有限公司 Early warning system for patients with lung disease based on electronic health record data

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20190287685A1 (en) * 2018-03-16 2019-09-19 Vvc Holding Corporation String classification apparatus and methods using artificial intelligence

Non-Patent Citations (1)

Title
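Research on dynamic prediction of disease severity based on electronic medical record data mining (基于电子病历数据挖掘的疾病危重度动态预测研究); Li Ji; Ding Fengyi; Li Xiangyu; Journal of Information Resources Management (信息资源管理学报), Issue 04; full text *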

Similar Documents

Publication Publication Date Title
CN111968741B (en) Deep learning and integrated learning-based diabetes complication high-risk early warning system
US20220358363A1 (en) Engine surge fault prediction system and method based on fusion neural network model
WO2021120936A1 (en) Chronic disease prediction system based on multi-task learning model
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
CN110021364B (en) Analysis and detection system for screening single-gene genetic disease pathogenic genes based on patient clinical symptom data and whole exome sequencing data
WO2016192612A1 (en) Method for analysing medical treatment data based on deep learning, and intelligent analyser thereof
CN111834012A (en) Traditional Chinese medicine syndrome diagnosis method and device based on deep learning and attention mechanism
CN109935337B (en) Medical record searching method and system based on similarity measurement
Jiang et al. A hybrid intelligent model for acute hypotensive episode prediction with large-scale data
WO2022166158A1 (en) System for performing long-term hazard prediction on hemodialysis complications on basis of convolutional survival network
CN113838577B (en) Convenient layered old people MODS early death risk assessment model, device and establishment method
Shimaa Ouf A proposed paradigm for intelligent heart disease prediction system using data mining techniques
CN117153393A (en) Cardiovascular disease risk prediction method based on multi-mode fusion
CN116959725A (en) Disease risk prediction method based on multi-mode data fusion
CN109360658A (en) A kind of the disease pattern method for digging and device of word-based vector model
Gong et al. Prognosis analysis of heart failure based on recurrent attention model
CN110400610B (en) Small sample clinical data classification method and system based on multichannel random forest
Waheeb et al. An efficient sentiment analysis based deep learning classification model to evaluate treatment quality
CN113284627A (en) Medication recommendation method based on patient characterization learning
CN115660871B (en) Unsupervised modeling method for medical clinical process, computer equipment and storage medium
CN116994751A (en) Method and device for constructing pre-eclampsia early-stage risk prediction model
CN115035346A (en) Classification method for Alzheimer disease based on cooperative learning method enhancement
CN114694787A (en) Optimization method and optimization system for health examination items and information data processing terminal
CN115019960A (en) Disease aid decision-making system based on personalized state space progress model
CN114255878A (en) Training method, system, device and storage medium of disease typing model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant