CN115148319A - Auxiliary classification method, equipment and storage medium for multi-clinical stage diseases - Google Patents


Info

Publication number
CN115148319A
Authority
CN
China
Prior art keywords
classification
data set
disease
characteristic value
medical record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210877630.1A
Other languages
Chinese (zh)
Inventor
张宏国
任涵彬
杜宇芳
方舟
白瑞
杨霄璇
宋雪
李锐
刘明鸽
齐红
何晨龙
耿瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang Network Space Research Center
Harbin University of Science and Technology
Original Assignee
Heilongjiang Network Space Research Center
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heilongjiang Network Space Research Center and Harbin University of Science and Technology

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides an auxiliary classification method, equipment and a storage medium for multi-clinical stage diseases. The method comprises the following steps: determining a medical record data set; extracting characteristic values and labels from the data to form a characteristic value set and a label set; performing binary classification on the medical record data set using a binary classification model to obtain a healthy data set and a diseased data set; analyzing the association degree of the characteristic value set to obtain an optimized characteristic value set; screening the optimized characteristic value set to obtain a key characteristic value set; searching the healthy data set for medical record data whose characteristic values meet the diagnosis confirmation conditions and adding them to the diseased data set to form a new diseased data set; and performing multi-classification on the new diseased data set to obtain predictions of the different stages of the disease. The invention predicts the disease stage through classification models and assists doctors in diagnosing the disease.

Description

Auxiliary classification method, equipment and storage medium for multi-clinical stage diseases
Technical Field
The application relates to the field of intelligent medical treatment, in particular to an auxiliary classification method, equipment and a storage medium for multi-clinical stage diseases.
Background
Disease staging initially remained at a purely clinical level, e.g., mild versus severe symptoms, and then gradually evolved toward a more advanced clinicopathological perspective, guided by progress in autopsy, imaging, and biomarkers. Staging is suited to diseases that may take a long time to heal, involve progressive functional deterioration, and/or may lead to early death. For most diseases, the early stage is relatively stable and has a higher clinical cure rate, whereas the late stage progresses quickly and has a lower cure rate. If a disease is found and treated at an early stage, before the condition worsens, the clinical cure rate improves greatly, so accurately diagnosing the stage of a disease is one of the important problems in clinical medicine. With the development of machine learning and the improvement of electronic medical records, data-driven intelligent diagnosis and treatment methods have become mainstream. Intelligent medical treatment has been a hot topic of academic research in recent years and a focal point where computer science and medicine meet, so how to support disease stage diagnosis through intelligent medical treatment is a problem to be solved.
Disclosure of Invention
In view of the above, the present application provides a method, an apparatus, and a storage medium for assisting classification of multiple clinical stage diseases, so as to solve the problem of helping disease stage diagnosis through intelligent medical treatment.
The implementation method of the technical scheme of the application comprises the following steps:
an assisted classification method for multi-clinical stage diseases, comprising:
determining a medical record data set S1, wherein the medical record data set S1 comprises medical record data of at least one patient;
extracting characteristic values and labels of medical records in the medical record data set S1 to form a characteristic value set F and a label set D, wherein the characteristic value set F comprises physical examination data and examination result data in the medical record data of patients, and the label set D comprises diseased or healthy labels determined based on the diagnosis results of doctors;
performing binary classification on the medical record data set S1 by using a binary classification model based on the characteristic value set F and the label set D to obtain a healthy data set and a diseased data set;
analyzing the association degree of the characteristic value set F to obtain an optimized characteristic value set F1;
based on the medical field information, screening the optimized characteristic value set F1 to obtain a key characteristic value set F2 and conditions corresponding to the characteristics in the key characteristic value set F2;
searching the healthy data set for medical record data whose characteristic values meet the diagnosis confirmation conditions in F2, and adding them to the diseased data set to form a new diseased data set S3;
the new diseased data set S3 is multi-classified to obtain predictions of different stages of the disease.
In the method, the physical examination data at least comprises: height, weight, pain level, smoking history, drinking history, and medical history;
the examination result data at least comprises: blood and urine biochemical test results and imaging examination results.
In the method, the binary classification of the medical record data set S1 by using a binary classification model based on the characteristic value set F and the label set D includes:
establishing a candidate binary classification model library, wherein the candidate binary classification model library comprises a plurality of binary classification models;
and executing the plurality of binary classification models simultaneously to obtain the accuracy, recall rate and F1 score of each model, comprehensively considering these three evaluation indexes, and selecting the binary classification model with the best evaluation results to perform binary classification on the medical record data set S1.
In the method, the analyzing the association degree of the characteristic value set F to obtain an optimized characteristic value set F1 includes:
and performing association degree analysis on the characteristic values in the characteristic value set F through the chi-square test, the sample variance, or mutual information between discrete categories, and deleting the characteristic values with a low association degree to obtain an optimized characteristic value set F1.
In the method, the optimized characteristic value set F1 is screened based on the medical field information to obtain a key characteristic value set F2, wherein the key characteristic value set is the characteristic value set which has decisive influence on the confirmed disease.
In the method, before the multi-classifying the new diseased data set S3, the method further includes:
filling missing feature items in the new diseased data set S3 with a specific value, or an average value, or a mode according to the corresponding medical meaning;
the data in the filled diseased data set S3 is normalized to constitute a data set S4.
In the method, the multi-classification is performed on the new diseased data set S3, specifically:
determining a new label set D 'according to the disease type, wherein the new label set D' is a stage diagnosis set corresponding to the disease;
performing multi-classification on S4 based on a deep neural network model, wherein
the number of neurons in the input layer corresponds to the number of characteristic values in the characteristic value set F1;
the number of neurons in the output layer corresponds to the number of disease stages, namely the number of values in the label set D';
the ReLU function is used as the activation function of each hidden layer, and a softmax function is used at the output layer to determine the disease stage prediction.
The invention also provides auxiliary classification equipment for multi-clinical stage diseases, which comprises: a processor and a memory;
the memory is used for storing a computer program for realizing the auxiliary classification method for multi-clinical stage diseases, and the processor is used for executing the computer program.
The invention also proposes a storage medium for storing at least one set of instructions;
the set of instructions is for being invoked and performing at least the assisted classification method for the multi-clinical stage disease.
The method provided by the invention is suitable for multi-stage disease diagnosis. First, a machine learning binary classification model classifies whether or not the disease is diagnosed; then professional knowledge in the medical field is applied to determine the key characteristic value set; finally, the diagnosed data from the binary classification result are classified by a deep learning multi-classification model to realize disease stage diagnosis.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments will be briefly introduced below, and it is apparent that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings may be obtained according to the drawings without inventive labor.
FIG. 1 is a flow chart of an embodiment of a method for assisted classification of multiple clinical stage diseases according to the present invention;
FIG. 2 is a schematic structural diagram of an auxiliary classification device for multi-clinical stage diseases according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
With the development of machine learning and the improvement of electronic medical records, data-driven intelligent diagnosis and treatment methods have become mainstream. The large number of electronic medical records now being generated provides an ample data source for intelligent medical treatment. At the same time, accurate disease staging is one of the major difficulties clinicians face in diagnosis: for most diseases, the early stage is relatively stable and has a high clinical cure rate, whereas the late stage progresses rapidly and has a low cure rate. Timely and accurate disease staging can greatly improve the patient's survival rate and quality of recovery. In view of these practical problems, the invention provides an auxiliary classification method for multi-clinical stage diseases, which is suitable for predicting diseases that have multiple clinical stages and for assisting clinicians in diagnosis.
The implementation method of the technical scheme of the application comprises the following steps:
the embodiment of the invention provides an auxiliary classification method for multi-clinical stage diseases, which comprises the following steps of:
S101: determining a medical record data set S1, wherein the medical record data set S1 comprises medical record data of at least one patient; the medical record data set S1 can take the electronic medical records of a plurality of patients in a hospital medical record library as the data set;
S102: extracting characteristic values and labels of medical records in the medical record data set S1 to form a characteristic value set F and a label set D, wherein the characteristic value set F comprises physical examination data and examination result data in the medical record data of patients, and the label set D comprises diseased or healthy labels determined based on the diagnosis results of doctors;
When a patient visits a doctor, the doctor can enter the patient's medical record through the hospital's electronic medical record information system. The electronic medical record data comprise the patient's personal information, symptom data, physical examination and biochemical test data, the doctor's diagnosis and orders, and medication data. The patient's electronic medical record data are then exported from the electronic medical record information system. All disease characteristics in the medical record data set, such as physical examination data (height, weight, pain level, smoking history, drinking history, medical history), blood and urine biochemical test results, and imaging examination results, are used as characteristic values, and the multi-stage diagnosis given by the doctor is simplified to diseased or healthy and used as the label.
S103: performing binary classification on the medical record data set S1 by using a binary classification model based on the characteristic value set F and the label set D to obtain a healthy data set and a diseased data set;
Based on preliminary observation and statistics of the electronic medical records, we found that the overall diagnostic results can be divided into two categories, "healthy" and "diseased", according to whether the patient has the disease. We then perform a preliminary screening of the fields in the medical record data set S1: the patient's personal information and the medication advice, which have no effect on disease diagnosis, are removed, and the remaining characteristic values, such as pain level, height and weight from the physical examination, and blood and urine biochemical test results, are used as the characteristic set, denoted F = {f1, f2, …, fn}. The label set D indicates whether the patient is diseased: D = {1, 0}, where 1 denotes diseased and 0 denotes healthy. Binary classification is performed on S1 on this basis;
Since medical field knowledge is not introduced at this point, and in order to better support binary diagnosis for different types of multi-clinical stage diseases, the concept of a "candidate binary classification model library" is proposed: for a specific scenario (i.e., the electronic medical records of a specific multi-clinical stage disease), a user can execute several binary classification models simultaneously and then select the most suitable one according to the actual test results. Commonly used binary classification algorithms include logistic regression, K-nearest neighbors (KNN), and support vector machines (SVM). In addition to these mainstream algorithms, random forest and XGBoost models also perform well on classification problems. Take as an example a candidate binary classification model library containing two models, SVM and XGBoost:
A support vector machine (SVM) is a typical binary classification model. Its basic form is the maximum-margin linear classifier defined on the feature space; the maximum margin is what distinguishes it from the perceptron. With the kernel trick, the SVM becomes an essentially non-linear classifier. The basic idea is to find the hyperplane that correctly separates the data set with the largest geometric margin. In the sample space, a hyperplane (ω, b) is determined by a normal vector ω and a displacement term b, and the distance from any point x in the sample space to the hyperplane can be written as:
$$r = \frac{|\omega^{T}x + b|}{\|\omega\|}$$
Among the many hyperplanes that separate the two classes of samples, the separating hyperplane with the largest margin must satisfy:
$$\max_{\omega,b}\ \frac{2}{\|\omega\|}$$
$$\text{s.t. } y_i(\omega^{T}x_i + b) \ge 1,\quad i = 1, 2, \dots, m$$
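The point-to-hyperplane distance and the margin constraint above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the example hyperplane and the sample points are hypothetical, and the labels are assumed to be encoded as +1/-1 (the usual SVM convention, so a label of 0 in D would first be mapped to -1).

```python
import math

def hyperplane_distance(w, b, x):
    """Distance |w^T x + b| / ||w|| from point x to the hyperplane (w, b)."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

def satisfies_margin(w, b, samples):
    """True if y_i (w^T x_i + b) >= 1 holds for every (x_i, y_i), y_i in {-1, +1}."""
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1
        for x, y in samples
    )

# Hypothetical hyperplane and two labelled samples (illustrative values only).
w, b = [3.0, 4.0], -5.0
samples = [([3.0, 4.0], 1), ([0.0, 0.0], -1)]

distance = hyperplane_distance(w, b, [3.0, 4.0])
feasible = satisfies_margin(w, b, samples)
```

A training procedure would search for the (ω, b) that maximizes the margin subject to this constraint; the sketch only evaluates a given hyperplane.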
XGBoost is an optimized distributed gradient boosting library. It implements machine learning algorithms under the Gradient Boosting framework and can process large-scale training samples efficiently, flexibly and conveniently. The objective function of XGBoost is:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where l is the loss function, n is the number of medical record samples, $y_i$ is the true diagnosis of the i-th medical record, and $\hat{y}_i$ is the model's predicted diagnosis for the i-th sample. K denotes the number of regression trees and $f_k$ the k-th tree. Ω is the regularization term, i.e., the complexity of a regression tree, expressed as:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}$$
where T is the number of leaf nodes and $w_j$ is the weight of the j-th leaf node; γ prunes the tree when there are too many leaf nodes, and λ controls the overfitting that occurs when the leaf weights $\|w\|^{2}$ become too large.
Our optimization objective is [equation image BDA0003763168770000067, not recoverable from the text], where [symbol image BDA0003763168770000068] is, in the optimal case, the value of the leaf node of the i-th regression tree into which the sample falls.
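The candidate binary classification model library described above can be sketched as a dictionary of models that are all evaluated on the same data, after which the best one is kept. This is a minimal illustration, not the patent's implementation: the two threshold rules stand in for trained SVM and XGBoost models, the field names `uric_acid` and `pain` and all values are hypothetical, and "comprehensively considering" the three indexes is assumed here to mean their unweighted mean.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, recall and F1 score for binary labels (1 = diseased)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}

def select_best(candidates, records, y_true):
    """Run every candidate and keep the one with the highest mean index."""
    scores = {}
    for name, predict in candidates.items():
        metrics = evaluate(y_true, [predict(r) for r in records])
        scores[name] = sum(metrics.values()) / len(metrics)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical records and labels; real models would be trained classifiers.
records = [
    {"uric_acid": 520, "pain": 7},
    {"uric_acid": 300, "pain": 1},
    {"uric_acid": 480, "pain": 2},
    {"uric_acid": 350, "pain": 6},
]
labels = [1, 0, 1, 0]
candidates = {
    "rule_a": lambda r: 1 if r["uric_acid"] > 420 else 0,
    "rule_b": lambda r: 1 if r["pain"] >= 5 else 0,
}

best_name, scores = select_best(candidates, records, labels)
```

In practice the models would be evaluated on held-out data rather than on the records used to fit them, and a weighted combination of the indexes may suit a particular disease better than the plain mean.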
S104: analyzing the association degree of the characteristic value set F to obtain an optimized characteristic value set F1;
To improve the accuracy of model prediction, the chi-square test, the sample variance and mutual information between discrete categories are considered together to analyze the association degree of the characteristic values in F. Taking the chi-square test and mutual information as examples, the chi-square statistic between a characteristic value Fi and the disease label D is:
$$\chi^{2} = \sum \frac{(A - T)^{2}}{T}$$
where A is the actual (observed) value of Fi over D and T is the theoretical (expected) value. χ² measures the magnitude of the deviation of the actual values from the theoretical values; a larger χ² indicates that Fi has more influence on whether the patient is diseased.
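As a worked illustration of the chi-square statistic, the sketch below computes it from a contingency table whose rows are the categories of one discrete characteristic value and whose columns are the two labels in D. The theoretical value T of each cell is assumed to be the usual independence estimate (row total times column total divided by the sample count); the table values are hypothetical, and all row and column totals are assumed to be non-zero.

```python
def chi_square(table):
    """Chi-square statistic sum((A - T)^2 / T) over a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, actual in enumerate(row):
            # Expected count under independence of feature and label.
            theoretical = row_totals[i] * col_totals[j] / n
            stat += (actual - theoretical) ** 2 / theoretical
    return stat

# Hypothetical counts: rows = feature present / absent, columns = diseased / healthy.
associated = chi_square([[30, 10], [10, 30]])
independent = chi_square([[20, 20], [20, 20]])
```

The first table deviates strongly from independence and gets a large statistic, while the second matches the expected counts exactly and gets zero, so the corresponding feature would be a candidate for deletion.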
Mutual information between discrete categories, "mutual information" for short, is a method for screening characteristic values in feature engineering. For discrete random variables X and Y, mutual information is computed as:
$$I(X;Y) = \sum_{x \in X}\sum_{y \in Y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$
If X and Y are independent, then p(x, y) = p(x)p(y) and I(X; Y) is 0, so a larger value of I(X; Y) indicates a stronger association between the two variables.
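The mutual information formula can be evaluated directly from paired observations by estimating the probabilities with relative frequencies. The sketch below is one way to do this with the standard library; the natural logarithm is an assumption (the text does not fix the base), and the example sequences are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X;Y) in nats from two equally long sequences of categories."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi

# Hypothetical feature column compared against the label column D.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A feature that fully determines the label reaches the entropy of the label (ln 2 for a balanced binary label), and a feature that is independent of the label scores zero.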
On this basis, the characteristic values with a low association degree are deleted to obtain the preliminarily optimized characteristic set F1.
S105: based on the medical field information, screening the optimized characteristic value set F1 to obtain a key characteristic value set F2 and conditions corresponding to the characteristics in the key characteristic value set F2;
On this basis, medical field knowledge is introduced, and a characteristic value set F2 = {fn, …, fm} that has decisive influence on the prediction result, together with its conditions, is screened from the characteristic value set F1;
this process relies on disease diagnosis knowledge from the medical field to screen the characteristic values that have decisive influence on confirming the disease; for gout, for example, the gout quantitative score has decisive influence on whether gout is confirmed;
S106: searching the healthy data set for medical record data whose characteristic values meet the diagnosis confirmation conditions in F2, and adding them to the diseased data set to form a new diseased data set S3;
With the characteristic value set F2 = {fn, …, fm} and its conditions screened from F1 as above, the healthy data set S2 with D = 0 from the binary classification is searched, the data S2' whose F2 values meet the diagnosis confirmation conditions are selected, and S2' is then added to the data set with D = 1 to form the data set S3 for multi-classification.
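Step S106 amounts to filtering the healthy set with the F2 conditions and appending the matches to the diseased set. The sketch below shows one possible shape for this, assuming records are dictionaries and each condition is a predicate on one key characteristic value; the field name `key_score`, the threshold 8 and the requirement that all conditions hold together are hypothetical choices, since the text does not specify them.

```python
def meets_conditions(record, conditions):
    """True if every key feature is present and satisfies its condition."""
    return all(
        record.get(name) is not None and check(record[name])
        for name, check in conditions.items()
    )

def build_s3(healthy_s2, diseased, conditions):
    """Form S3 from the diseased set plus S2', the re-screened healthy records."""
    s2_prime = [r for r in healthy_s2 if meets_conditions(r, conditions)]
    return diseased + s2_prime

# Hypothetical key feature and threshold supplied by medical domain knowledge.
conditions = {"key_score": lambda v: v >= 8}
healthy_s2 = [
    {"id": 1, "key_score": 9},   # classified healthy, but meets the condition
    {"id": 2, "key_score": 3},
    {"id": 3},                   # key feature missing, left in the healthy set
]
diseased = [{"id": 4, "key_score": 10}]

s3 = build_s3(healthy_s2, diseased, conditions)
```

This step recovers patients that the binary classifier placed in the healthy set even though domain knowledge indicates a confirmed diagnosis.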
S107: the new diseased data set S3 is multi-classified to obtain predictions of different stages of the disease.
In the method, the physical examination data at least comprises: height, weight, pain level, smoking history, drinking history, and medical history;
the examination result data at least comprises: blood and urine biochemical test results and imaging examination results.
In the method, the binary classification of the medical record data set S1 by using a binary classification model based on the characteristic value set F and the label set D includes:
establishing a candidate binary classification model library, wherein the candidate binary classification model library comprises a plurality of binary classification models;
and executing the plurality of binary classification models simultaneously to obtain the accuracy, recall rate and F1 score of each model, comprehensively considering these three evaluation indexes, and selecting the binary classification model with the best evaluation results to perform binary classification on the medical record data set S1.
In the method, the analyzing the association degree of the characteristic value set F to obtain an optimized characteristic value set F1 includes:
and performing association degree analysis on the characteristic values in the characteristic value set F through the chi-square test, the sample variance, or mutual information between discrete categories, and deleting the characteristic values with a low association degree to obtain an optimized characteristic value set F1.
In the method, the optimized characteristic value set F1 is screened based on the medical field information to obtain a key characteristic value set F2, wherein the key characteristic value set is the set of characteristic values which have decisive influence on confirming the disease.
In the method, before the multi-classifying the new diseased data set S3, the method further includes:
filling missing feature items in the new diseased data set S3 with a specific value, or an average value, or a mode according to the corresponding medical meaning;
For each characteristic item with missing data in S3, the missing values are filled with a specific value, the average value or the mode, according to the medical meaning of the item. For example, for the number of painful joints, a missing value indicates that the patient has no joint pain symptoms, so it is filled with 0, meaning no painful joints. If the drink type is missing from the drinking history, it is filled with the most frequent type, "beer".
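The three filling strategies can be sketched with the standard library as follows. The record layout and the field names `painful_joints`, `drink_type` and `weight` are hypothetical; which strategy applies to which field is a medical judgment that the code simply takes as an argument.

```python
from statistics import mean, mode

def fill_missing(records, feature, strategy, constant=None):
    """Return copies of the records with missing `feature` values filled."""
    observed = [r[feature] for r in records if r.get(feature) is not None]
    if strategy == "constant":
        fill = constant
    elif strategy == "mean":
        fill = mean(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [
        {**r, feature: fill} if r.get(feature) is None else dict(r)
        for r in records
    ]

# Hypothetical diseased records; None marks a missing value.
records = [
    {"painful_joints": 2, "drink_type": "beer", "weight": 70.0},
    {"painful_joints": None, "drink_type": None, "weight": None},
    {"painful_joints": 1, "drink_type": "beer", "weight": 80.0},
    {"painful_joints": 0, "drink_type": "wine", "weight": 75.0},
]

filled = fill_missing(records, "painful_joints", "constant", constant=0)
filled = fill_missing(filled, "drink_type", "mode")
filled = fill_missing(filled, "weight", "mean")
```

The missing joint count becomes 0, the missing drink type becomes the most frequent value, and the missing weight becomes the average of the observed weights.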
The data in the filled diseased data set S3 is normalized to constitute a data set S4.
Because different evaluation indexes often have different dimensions and units, the Z-Score method is used to standardize the data and remove the effect of these differences on data analysis, rescaling each characteristic value onto a common scale:
$$z = \frac{x - \mu}{\sigma}$$
where x is the actual value of a characteristic value in F1, μ is its mean, and σ is its standard deviation. The Z-Score method converts data of different magnitudes into a unified measure and improves the comparability of the data. The data set S4, obtained after missing value filling and standardization, can be used as the input of the multi-classification model for disease stage prediction.
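A minimal sketch of Z-Score standardization for one feature column is given below. It assumes the population standard deviation (the text does not say whether the population or sample form is meant) and maps a constant column to zeros to avoid division by zero; the example values are hypothetical.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize a feature column to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

# Hypothetical feature column with mean 5 and standard deviation 2.
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = z_score(column)
```

In a train and test setting, μ and σ would normally be computed on the training data only and then reused for new records.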
In the method, the multi-classification is performed on the new diseased data set S3, specifically:
determining a new label set D 'according to the disease type, wherein the new label set D' is a stage diagnosis set corresponding to the disease;
performing multi-classification on S4 based on a deep neural network model, wherein
the number of neurons in the input layer corresponds to the number of characteristic values in the characteristic value set F1;
the number of neurons in the output layer corresponds to the number of disease stages, namely the number of values in the label set D';
the ReLU function is used as the activation function of each hidden layer, and a softmax function is used at the output layer to determine the disease stage prediction.
We perform multi-classification on S4 using a deep neural network (DNN) model. A DNN is a neural network with multiple hidden layers; its layers fall into three categories: input layer, hidden layers, and output layer. The number of neurons in the input layer corresponds to the number of characteristic values in the characteristic set F1, and the number of neurons in the output layer corresponds to the number of disease stages, i.e., the number of values in the label set D' = {d_1, d_2, …, d_n | d_i ∈ N⁺}, where d_1 to d_n are the diagnoses of the different stages of the disease. The ReLU function is used as the activation function of each hidden layer, and a softmax function is used as the activation function of the output layer to solve the multi-classification problem. The softmax function is defined as follows:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$
where $z_i$ is the output value of the i-th node, namely the output value for a certain disease stage, and C is the number of output nodes, namely the number of disease stages. Cross-entropy, which performs well on multi-classification problems, is used as the loss function for the different disease types, to further improve the accuracy of predicting the different disease stages.
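The forward pass of such a network, with ReLU hidden layers, a softmax output and a cross-entropy loss, can be sketched in plain Python as follows. The layer sizes and weights are hypothetical and hand-picked rather than trained, and the softmax subtracts the largest logit before exponentiating, which is a numerical stability measure that leaves the result of the formula unchanged.

```python
import math

def relu(values):
    """ReLU activation applied element-wise."""
    return [max(0.0, v) for v in values]

def softmax(logits):
    """Softmax over the output nodes, shifted by the max logit for stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, bias):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def forward(inputs, layers):
    """ReLU on every hidden layer, softmax on the final (output) layer."""
    *hidden_layers, (out_w, out_b) = layers
    for weights, bias in hidden_layers:
        inputs = relu(dense(inputs, weights, bias))
    return softmax(dense(inputs, out_w, out_b))

def cross_entropy(probs, true_stage):
    """Cross-entropy loss for one sample whose true stage index is known."""
    return -math.log(probs[true_stage])

# Hypothetical network: 2 input features, 2 hidden neurons, 3 disease stages.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]),
]
probs = forward([2.0, -1.0], layers)
predicted_stage = probs.index(max(probs))
loss = cross_entropy(probs, 0)
```

Training would adjust the weights to minimize the average cross-entropy over S4; a deep learning framework would normally handle that step.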
In another embodiment, the present invention further provides an auxiliary classification device for multi-clinical stage diseases, comprising: a processor 201 and a memory 202;
the memory is used for storing a computer program for realizing the auxiliary classification method for multi-clinical stage diseases, and the processor is used for executing the computer program.
In yet another embodiment, the present invention further provides a storage medium for storing at least one set of instructions;
the set of instructions is for being invoked and performing at least the assisted classification method for the multi-clinical stage disease.
The method provided by the invention is suitable for multi-stage disease diagnosis. First, a machine learning binary classification model classifies whether or not the disease is diagnosed; then professional knowledge in the medical field is applied to determine the key characteristic value set; finally, the diagnosed data from the binary classification result are classified by a deep learning multi-classification model to realize disease stage diagnosis. By combining professional knowledge in the medical field, the disease characteristics in the complex and varied electronic medical record data collected by a hospital are segmented and screened, and the resulting data are used to predict diseases with multiple clinical stages and to assist clinicians in disease diagnosis.
The above embodiments further illustrate the objects, technical solutions and advantages of the present application. It should be understood that they are only examples of the present application and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present application shall fall within the scope of protection of the present application.

Claims (9)

1. An auxiliary classification method for multi-clinical stage diseases is characterized by comprising the following steps:
determining a medical record data set S1, wherein the medical record data set S1 comprises medical record data of at least one patient;
extracting characteristic values and labels of medical records in the medical record data set S1 to form a characteristic value set F and a label set D, wherein the characteristic value set F comprises physical examination data and examination result data in the medical record data of patients, and the label set D comprises two types of labels, diseased or healthy, determined based on the diagnosis results of doctors;
performing binary classification on the medical record data set S1 by using a binary classification model based on the characteristic value set F and the label set D to obtain a healthy data set and a diseased data set;
analyzing the association degree of the characteristic value set F to obtain an optimized characteristic value set F1;
based on the medical field information, screening the optimized characteristic value set F1 to obtain a key characteristic value set F2 and conditions corresponding to the characteristics in the key characteristic value set F2;
medical record data with characteristic values meeting the confirmed diagnosis conditions in the F2 are searched in the health data set and added into the diseased data set to form a new diseased data set S3;
the new diseased data set S3 is multi-classified to obtain predictions of different stages of the disease.
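The step of claim 1 that moves records from the healthy data set into the diseased data set could be sketched roughly as follows. This is an illustrative sketch only, not the applicant's implementation: the record layout, the function name `build_new_diseased_set`, the feature name `marker_a`, its threshold, and the choice that a record qualifies when any key feature meets its condition are all assumptions.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Condition = Callable[[float], bool]


def build_new_diseased_set(
    healthy: List[Record],
    diseased: List[Record],
    conditions: Dict[str, Condition],
) -> List[Record]:
    """Return S3: the diseased set plus healthy-set records meeting an F2 condition."""

    def meets(record: Record) -> bool:
        # Assumption: a record qualifies if ANY key feature meets its condition.
        return any(
            name in record and check(record[name])
            for name, check in conditions.items()
        )

    rescued = [r for r in healthy if meets(r)]
    return diseased + rescued


if __name__ == "__main__":
    # Hypothetical key feature and threshold, for illustration only.
    conditions = {"marker_a": lambda v: v > 420.0}
    healthy = [{"id": 1, "marker_a": 300.0}, {"id": 2, "marker_a": 480.0}]
    diseased = [{"id": 3, "marker_a": 510.0}]
    s3 = build_new_diseased_set(healthy, diseased, conditions)
    print([r["id"] for r in s3])  # [3, 2]
```

In this sketch the rule-based pass acts as a safety net for records that the binary classifier placed in the healthy set even though a decisive characteristic value meets its diagnosis condition.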
2. The method of claim 1, wherein the physical examination data comprises at least: height, weight, pain level, smoking history, drinking history, and medical history;
the examination result data comprises at least: blood and urine biochemical test results and imaging test results.
3. The method according to claim 1, wherein performing binary classification on the medical record data set S1 by using a binary classification model based on the characteristic value set F and the label set D comprises:
establishing a candidate binary classification model library, wherein the candidate binary classification model library comprises a plurality of binary classification models;
executing the plurality of binary classification models simultaneously to obtain the accuracy, recall rate and F1 score of each binary classification model, comprehensively considering these three classification evaluation indexes, and selecting the one binary classification model with the best evaluation results to perform binary classification on the medical record data set S1.
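The model selection of claim 3 could be sketched as below. The claim does not say how the three indexes are "comprehensively considered", so taking their unweighted mean is an assumption, as are the function names and the toy predictions; label 1 stands for diseased.

```python
from typing import Dict, List, Tuple


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (accuracy, recall, F1 score), treating label 1 as diseased."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, f1


def select_best_model(y_true: List[int], predictions: Dict[str, List[int]]) -> str:
    """Pick the model whose mean of the three metrics is highest (assumed rule)."""
    return max(
        predictions,
        key=lambda name: sum(binary_metrics(y_true, predictions[name])) / 3,
    )


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]
    predictions = {
        "model_a": [1, 0, 0, 0, 1, 0],  # misses one diseased record
        "model_b": [1, 1, 1, 0, 1, 0],  # one false alarm, no misses
    }
    print(select_best_model(y_true, predictions))  # model_b
```

A weighted combination that favours recall may be more appropriate in a screening setting, since a missed diseased record never reaches the staging model.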
4. The method according to claim 1, wherein analyzing the association degree of the characteristic value set F to obtain the optimized characteristic value set F1 comprises:
analyzing the association degree of the characteristic values in the characteristic value set F through a chi-square test, or sample variance, or mutual information for discrete categories, and deleting the characteristic values with a lower association degree to obtain the optimized characteristic value set F1.
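One of the three options in claim 4, the chi-square test, could be sketched for categorical characteristic values as follows. The feature names, the toy data, and the cut-off of 1.0 are illustrative assumptions; the claim does not fix a threshold.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi_square(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(feature)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return stat


def select_features(
    columns: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
    threshold: float,
) -> List[str]:
    """Keep the features whose chi-square statistic reaches the threshold."""
    return [name for name, col in columns.items() if chi_square(col, labels) >= threshold]


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    columns = {
        "smoker": [1, 1, 1, 0, 0, 0, 0, 1],  # associated with the label
        "ward": [0, 1, 0, 1, 0, 1, 0, 1],    # independent of the label
    }
    print(select_features(columns, labels, threshold=1.0))  # ['smoker']
```

In practice the statistic would normally be compared against a critical value for the table's degrees of freedom rather than an arbitrary cut-off.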
5. The method according to claim 1, wherein, in screening the optimized characteristic value set F1 based on the medical field information to obtain the key characteristic value set F2, the key characteristic value set F2 is a set of characteristic values that are decisive for diagnosing the disease.
6. The method according to claim 1, further comprising, prior to performing multi-classification on the new diseased data set S3:
filling missing feature items in the new diseased data set S3 with a specific value, an average value, or a mode, according to the corresponding medical meaning;
normalizing the data in the filled diseased data set S3 to form a data set S4.
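The filling and normalization of claim 6 could be sketched per feature column as follows. The claim does not name a normalization scheme, so min-max scaling is an assumption here, as are the function names and sample values.

```python
from statistics import mean, mode
from typing import List, Optional

Column = List[Optional[float]]


def fill_missing(column: Column, strategy: str, constant: float = 0.0) -> List[float]:
    """Fill None entries with the column mean, the column mode, or a fixed value."""
    present = [v for v in column if v is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "mode":
        fill = mode(present)
    elif strategy == "constant":
        fill = constant
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]


def min_max_normalize(column: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


if __name__ == "__main__":
    weight = fill_missing([60.0, None, 80.0], "mean")
    print(weight)                     # [60.0, 70.0, 80.0]
    print(min_max_normalize(weight))  # [0.0, 0.5, 1.0]
    # A yes/no item such as smoking history may suit mode or constant filling.
    print(fill_missing([0.0, 1.0, 1.0, None], "mode"))  # [0.0, 1.0, 1.0, 1.0]
```

Choosing the strategy per feature reflects the claim's "according to the corresponding medical meaning": a continuous measurement may take the mean, while a categorical history item is better served by the mode or a designated value.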
7. The method according to claim 6, wherein performing multi-classification on the new diseased data set S3 specifically comprises:
determining a new label set D based on the disease category, the new label set D being the set of staging diagnoses corresponding to the disease;
performing multi-classification on the data set S4 based on a deep neural network model, wherein:
the number of neurons in the input layer corresponds to the number of characteristic values in the characteristic value set F1;
the number of neurons in the output layer corresponds to the number of disease stages, i.e., the number of values in the new label set D;
the ReLU function is used as the activation function of each hidden layer, and a softmax function is applied to determine the disease stage prediction.
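The network shape described in claim 7 could be sketched as a plain forward pass. The layer sizes, the random untrained weights, and the function names are assumptions for illustration; a real model would be trained on the data set S4, typically with a deep learning framework.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
Layer = Tuple[Matrix, Vector]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def softmax(v: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random, untrained weights; for shape illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_stage(layers: List[Layer], x: Vector) -> Vector:
    """ReLU on every hidden layer, softmax on the output layer."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(weights, bias, h))
    weights, bias = layers[-1]
    return softmax(dense(weights, bias, h))


if __name__ == "__main__":
    rng = random.Random(0)
    n_features, n_stages = 4, 3  # hypothetical sizes of F1 and the stage label set
    layers = [init_layer(n_features, 8, rng), init_layer(8, n_stages, rng)]
    probs = predict_stage(layers, [0.1, 0.5, 0.0, 1.0])
    print(probs.index(max(probs)))  # index of the most probable stage
```

The softmax output is a probability distribution over the stages, so the predicted stage is the index with the highest probability.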
8. An auxiliary classification apparatus for multi-clinical stage disease, comprising: a processor and a memory;
the memory is for storing a computer program, and the processor is for executing the computer program to implement the auxiliary classification method for multi-clinical stage diseases according to any one of claims 1-7.
9. A storage medium storing at least one set of instructions;
the set of instructions is invoked to perform at least the auxiliary classification method for multi-clinical stage diseases according to any one of claims 1 to 7.
CN202210877630.1A 2022-07-25 2022-07-25 Auxiliary classification method, equipment and storage medium for multi-clinical stage diseases Pending CN115148319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210877630.1A CN115148319A (en) 2022-07-25 2022-07-25 Auxiliary classification method, equipment and storage medium for multi-clinical stage diseases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210877630.1A CN115148319A (en) 2022-07-25 2022-07-25 Auxiliary classification method, equipment and storage medium for multi-clinical stage diseases

Publications (1)

Publication Number Publication Date
CN115148319A true CN115148319A (en) 2022-10-04

Family

ID=83414231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210877630.1A Pending CN115148319A (en) 2022-07-25 2022-07-25 Auxiliary classification method, equipment and storage medium for multi-clinical stage diseases

Country Status (1)

Country Link
CN (1) CN115148319A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065171A (en) * 2018-11-05 2018-12-21 苏州贝斯派生物科技有限公司 Construction method and system for a Kawasaki disease risk assessment model based on ensemble learning
CN109785976A (en) * 2018-12-11 2019-05-21 青岛中科慧康科技有限公司 A gout staging prediction system based on Soft-Voting
CN110347837A (en) * 2019-07-17 2019-10-18 电子科技大学 An unplanned readmission risk prediction method for cardiovascular disease
CN112541542A (en) * 2020-12-11 2021-03-23 第四范式(北京)技术有限公司 Method and device for processing multi-classification sample data and computer readable storage medium
CN113555077A (en) * 2021-09-18 2021-10-26 北京大学第三医院(北京大学第三临床医学院) Suspected infectious disease prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination