CN111785385A - Disease classification method, device, equipment and storage medium - Google Patents
- Publication number
- CN111785385A (application number CN202010612274.1)
- Authority
- CN
- China
- Prior art keywords
- disease
- disease classification
- information
- numerical
- case
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Molecular Biology (AREA)
- Pathology (AREA)
- Biophysics (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the invention disclose a disease classification method, device, equipment, and storage medium. The method comprises: acquiring a numerical vector of the case information of a target object to be classified; inputting the numerical vector into a trained first disease classification model and a trained second disease classification model, respectively, to obtain a first disease classification probability and a second disease classification probability that the case information belongs to each disease type, wherein the first disease classification model is trained on a plurality of non-standardized historical medical record samples and the second disease classification model is trained on a plurality of structured historical case samples; and determining the target disease type of the case information of the target object based on the first and second disease classification probabilities for each disease type. The disease type of a patient can thus be determined quickly and accurately from medical record information.
Description
Technical Field
The embodiments of the invention relate to medical information processing technology, and in particular to a disease classification method, device, equipment, and storage medium.
Background
At present, after diagnosing a patient during a consultation, a doctor usually selects the disease classification name based on personal experience.
Selecting disease classification names from patient case information in this way is inefficient and depends on the doctor's experience. It is therefore subjective, and the resulting disease classification is not accurate enough.
Disclosure of Invention
The embodiments of the invention provide a disease classification method, device, equipment, and storage medium, so that the disease type of a patient can be determined quickly and accurately from medical record information.
In a first aspect, an embodiment of the present invention provides a disease classification method, including:
acquiring a numerical vector of the case information of a target object to be classified;
inputting the numerical vector of the case information into a trained first disease classification model and a trained second disease classification model, respectively, to obtain a first disease classification probability and a second disease classification probability that the case information belongs to each disease type, wherein the first disease classification model is trained on a plurality of non-standardized historical medical record samples, and the second disease classification model is trained on a plurality of structured historical case samples;
determining a target disease type of the case information of the target object based on the first and second disease classification probabilities that the case information belongs to each disease type.
In a second aspect, an embodiment of the present invention further provides a disease classification apparatus, including:
a case information acquisition module, configured to acquire a numerical vector of the case information of a target object to be classified;
a disease type classification module, configured to input the numerical vector of the case information into a trained first disease classification model and a trained second disease classification model, respectively, to obtain a first disease classification probability and a second disease classification probability that the case information belongs to each disease type, wherein the first disease classification model is trained on a plurality of non-standardized historical medical record samples, and the second disease classification model is trained on a plurality of structured historical case samples;
a target disease type determination module, configured to determine a target disease type of the case information of the target object based on the first and second disease classification probabilities that the case information belongs to each disease type.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
a storage apparatus configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the disease classification method according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the disease classification method according to any embodiment of the present invention.
In the technical solution of the embodiments, the acquired numerical vector of the case information of the target object is input into the trained first and second disease classification models to obtain the first and second disease classification probabilities that the case information belongs to each disease type. The target disease type of the case information is then determined from these two probabilities. This solves the prior-art problems of low efficiency and insufficient accuracy that arise because the disease classification name is usually selected from the doctor's experience.
Drawings
FIG. 1 is a flow chart of a disease classification method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a disease classification method according to a second embodiment of the present invention;
FIG. 3 is a flow chart of a disease classification method according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
FIG. 1 is a flowchart of a disease classification method according to the first embodiment of the present invention. This embodiment is applicable to determining the disease type of case information from medical record information. The method may be executed by a disease classification device, which may be implemented in software and/or hardware and configured on a computing device. The method specifically includes the following steps:
S110, acquiring a numerical vector of the case information of the target object to be classified.
For example, the target object is the subject to whom the medical record information to be classified belongs, and may be a human or an animal. The case information describes the condition of the target object. For example, a patient who feels unwell goes to a hospital, is found on examination to have a fever of 38 °C with a cough, and the doctor writes this down, forming the patient's case information. The case information may be obtained from the hospital's case management center. The numerical vector of the case information may be formed by converting each character of the case information into a numerical value according to a preset conversion rule and combining the resulting values into a vector.
The disease type can then be determined by analyzing the numerical vector of the case information.
S120, inputting the numerical vector of the case information into a trained first disease classification model and a trained second disease classification model, respectively, to obtain a first disease classification probability and a second disease classification probability that the case information belongs to each disease type. The first disease classification model is trained on a plurality of non-standardized historical medical record samples, and the second disease classification model is trained on a plurality of structured historical case samples.
For example, both the first and the second disease classification model take the numerical vector of the input case information and output the probability that the case information belongs to each disease type. The first disease classification model is trained on non-standardized historical medical record sample data, where non-standardized data may be medical descriptions that do not follow standard terminology. The second disease classification model is trained on structured historical case sample data, where structured data may be examination items with their corresponding values or types in the form of key-value pairs, for example blood pressure 80 mmHg / 120 mmHg, heart rate 80 beats/minute, or hepatitis B antibody positive. Examination items and values or types that can be presented as key-value pairs in this way are structured. The historical case sample data here include historical case sample information and the disease type corresponding to that information.
The first disease classification probability is the probability, output by the first disease classification model, that the case information belongs to each disease type; the second disease classification probability is the corresponding probability output by the second disease classification model.
It should be noted that the first and second disease classification models may have the same structure or different structures.
Optionally, the first disease classification model may be a recurrent neural network model that uses a 2-layer bidirectional long short-term memory (LSTM) network as its hidden layer and a normalization layer as its output layer. The second disease classification model may be a fully connected neural network model that uses 2 fully connected layers as its hidden layers and a sigmoid (S-shaped growth curve) layer as its output layer.
When the first disease classification model is a recurrent neural network with a normalization output layer, each probability that the case information belongs to a disease type lies between 0 and 1, and the probabilities over all disease types sum to 1. When the second disease classification model is a fully connected neural network with a sigmoid output layer, each probability also lies between 0 and 1, but the probabilities over all disease types do not necessarily sum to 1.
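The difference between the two output layers can be checked numerically. The sketch below is a minimal illustration, not the patented models: it applies softmax (assumed here to be the normalization layer) and a sigmoid to the same hypothetical raw scores for three disease types. The function names and the scores are illustrative assumptions.

```python
import math


def softmax(logits):
    """Normalization output: each value lies in (0, 1) and all values sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Sigmoid output: each value lies in (0, 1) independently; no constraint on the sum."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


# Hypothetical raw scores for three disease types.
logits = [2.0, 1.5, 0.5]
print(round(sum(softmax(logits)), 6))  # 1.0
print(round(sum(sigmoid(logits)), 3))  # 2.321, greater than 1
```

Because each sigmoid output is computed independently, several disease types can receive high probabilities at once. This matches the second model's probabilities in the worked example later in this embodiment (0.7, 0.8, and 0.6).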
After the numerical vector of the case information of the target object is input into the trained first and second disease classification models, the first and second disease classification probabilities that the case information belongs to each disease type are obtained. The probability for each disease type is thus obtained quickly, and the target disease type of the case information can be determined from these two sets of probabilities.
S130, determining a target disease type of the case information of the target object based on the first disease classification probability and the second disease classification probability of the case information belonging to each disease type.
For example, the target disease type of the case information of the target object may be determined from the first and second disease classification probabilities for each disease type according to a preset calculation rule. This solves the prior-art problems of low efficiency and insufficient accuracy caused by selecting the disease classification name from the doctor's experience.
Optionally, determining the target disease type of the case information of the target object based on the first and second disease classification probabilities may specifically include: adding the first and second disease classification probabilities that the case information belongs to the current disease type to obtain the target probability for that type; and sorting the target probabilities over all disease types and determining the target disease type of the case information of the target object from the sorted result.
For example, the current disease type is the disease type currently being evaluated, such as pneumonia or heart disease. The target probability is the sum of the first and second disease classification probabilities that the case information belongs to the current disease type. The target disease type is the disease type finally assigned to the case information.
The first and second disease classification probabilities for each disease type are obtained from the first and second disease classification models, respectively. Adding the two probabilities for the same type gives the target probability that the case information belongs to that type. The target probabilities over all disease types may then be sorted, for example in descending order, and the top-ranked disease type, or several top-ranked types, taken as the target disease type(s) of the case information.
For example, suppose the first and second disease classification models cover 3 disease types: pneumonia, heart disease, and hypertension. For a given piece of case information, the first disease classification model outputs first disease classification probabilities of 0.5, 0.3, and 0.2 for the 3 types, and the second disease classification model outputs second disease classification probabilities of 0.7, 0.8, and 0.6. Adding the probabilities of the same type gives: pneumonia 0.5 + 0.7 = 1.2; heart disease 0.3 + 0.8 = 1.1; hypertension 0.2 + 0.6 = 0.8. The target probabilities of the case information are therefore 1.2 for pneumonia, 1.1 for heart disease, and 0.8 for hypertension. The 3 target probabilities are sorted in descending order. If the doctor takes the disease type with the highest target probability as the target disease type, pneumonia is the target disease type of the case information; if the doctor takes the top 2, pneumonia and heart disease are both target disease types.
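The accumulation-and-ranking rule can be reproduced in a few lines. This is a minimal sketch that assumes each model's output is a dictionary from disease type to probability and that both dictionaries cover the same disease types; `fuse_and_rank` and `top_k` are hypothetical names. It uses the probabilities from the example above.

```python
def fuse_and_rank(first_probs, second_probs, top_k=1):
    """Add the two models' probabilities per disease type, sort in descending order,
    and return the top_k (disease type, target probability) pairs."""
    target = {d: first_probs[d] + second_probs[d] for d in first_probs}
    ranked = sorted(target.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Probabilities from the worked example.
first = {"pneumonia": 0.5, "heart disease": 0.3, "hypertension": 0.2}
second = {"pneumonia": 0.7, "heart disease": 0.8, "hypertension": 0.6}

for disease, prob in fuse_and_rank(first, second, top_k=2):
    print(disease, round(prob, 2))
# pneumonia 1.2
# heart disease 1.1
```

Setting `top_k=1` corresponds to taking only the highest target probability, and `top_k=2` to taking the top 2 as in the example.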
It should be noted that, instead of adding the first and second disease classification probabilities that the case information belongs to the same type, the target probability may also be taken as their average or as the larger of the two. The user may choose the rule as needed, and no limitation is imposed here. Any way of determining the target probability from the first and second disease classification probabilities for the same type falls within the protection scope of the embodiments of the present invention.
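The alternative rules can be compared on the same example. The sketch below is an illustration under stated assumptions: `fuse`, `top_disease`, `COMBINERS`, and the rule names `"sum"`, `"mean"`, and `"max"` are hypothetical, and the input probabilities are those of the pneumonia/heart disease/hypertension example.

```python
# Hypothetical table of the three combination rules described above.
COMBINERS = {
    "sum": lambda a, b: a + b,         # accumulate the two probabilities
    "mean": lambda a, b: (a + b) / 2,  # average of the two probabilities
    "max": max,                        # the larger of the two probabilities
}


def fuse(first_probs, second_probs, rule="sum"):
    """Combine the two models' probabilities per disease type with the chosen rule."""
    if rule not in COMBINERS:
        raise ValueError(f"unknown rule: {rule}")
    combine = COMBINERS[rule]
    return {d: combine(first_probs[d], second_probs[d]) for d in first_probs}


def top_disease(scores):
    """Return the disease type with the highest target probability."""
    return max(scores, key=scores.get)


first = {"pneumonia": 0.5, "heart disease": 0.3, "hypertension": 0.2}
second = {"pneumonia": 0.7, "heart disease": 0.8, "hypertension": 0.6}

for rule in ("sum", "mean", "max"):
    print(rule, top_disease(fuse(first, second, rule)))
# sum pneumonia
# mean pneumonia
# max heart disease
```

Summing and averaging always give the same ranking, because the mean is the sum divided by 2. Taking the larger probability can change the result. It does so here: under the max rule, heart disease scores 0.8 (the second model's value) and pneumonia scores 0.7, so heart disease ranks first.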
With the technical solution of this embodiment, the numerical vector of the case information of the target object is input into the trained first and second disease classification models, and the target disease type is determined from the resulting first and second disease classification probabilities for each disease type. This solves the prior-art problems of low efficiency and insufficient accuracy caused by selecting disease classification names from the doctor's experience.
Example two
FIG. 2 is a flowchart of a disease classification method according to the second embodiment of the present invention. This embodiment may be combined with any of the alternatives in the embodiment above. In this embodiment, the method for training either the first or the second disease classification model optionally includes: determining a numerical vector of each piece of historical case information; determining training samples for that model from these numerical vectors; and iteratively training the model on the training samples and their corresponding disease types.
As shown in fig. 2, the method of the embodiment of the present invention specifically includes the following steps:
S210, determining a numerical vector of each piece of historical case sample information.
For example, the numerical vector may be formed by converting the historical case sample information into numerical values according to a preset conversion rule and combining these values into a vector. When the first and second disease classification models are trained on historical case sample data, they can then process the numerical vectors directly instead of converting the historical case sample information into numerical form themselves. This reduces the workload of the two models and improves their efficiency.
Optionally, when the first disease classification model is trained, determining the numerical vector of each piece of historical case sample information may specifically include: determining the numerical code of each character in each piece of non-standardized historical case sample information from a correspondence between characters and numerical codes; and concatenating the numerical codes of the characters in order to obtain the numerical vector of that non-standardized historical case sample information.
Illustratively, each character in the historical case sample information has a corresponding numerical code, determined from the correspondence between characters and numerical codes. For example, suppose a piece of non-standardized historical case sample information is the four-character Chinese phrase for "cold on Monday". If its four characters correspond, in order, to the numerical codes 1, 2, 3, and 4, concatenating these codes gives the numerical vector [1, 2, 3, 4] for this sample.
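A character-level encoder of this kind can be sketched as follows. The patent does not say how the character-to-code correspondence is built. This sketch assumes a vocabulary that numbers characters from 1 in order of first appearance in the training texts, with 0 for characters not seen in training. `build_vocab` and `encode` are hypothetical names, and the Chinese string is only illustrative.

```python
def build_vocab(texts):
    """Assign each distinct character a code starting at 1, in order of first appearance."""
    vocab = {}
    for text in texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 1
    return vocab


def encode(text, vocab, unknown=0):
    """Map each character to its code and concatenate the codes in order.
    Characters missing from the vocabulary get the code `unknown` (an assumption)."""
    return [vocab.get(ch, unknown) for ch in text]


vocab = build_vocab(["周一感冒"])
print(encode("周一感冒", vocab))  # [1, 2, 3, 4]
```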
Optionally, when the second disease classification model is trained, determining the numerical vector of each piece of historical case sample information may specifically include two steps. First, for each piece of structured feature information in the structured historical case sample information, determine the position of its feature value in the numerical sequence. Second, determine the numerical vector of that structured historical case sample information from the feature values and their positions in the numerical sequence.
For example, structured feature information may be examination items with their corresponding values or types, in key-value form, such as blood pressure 80 mmHg / 120 mmHg, heart rate 80 beats/minute, or hepatitis B antibody positive.
The feature value of a piece of structured feature information is the value or type of the examined item. For example, for blood pressure 80 mmHg / 120 mmHg the feature values are 80 and 120, and for hepatitis B antibody positive the feature value is 1. Here a positive test result is defined as feature value 1 and a negative result as feature value 0.
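Extracting feature values from examination results might look like the sketch below. It assumes results arrive as English strings such as "80mmHg, 120mmHg" or "positive". The parsing rules follow the definitions above: every number is kept, units are dropped, "positive" becomes 1, and "negative" becomes 0. `feature_values` and the string format are hypothetical.

```python
import re

# Qualitative results as defined above: positive -> 1, negative -> 0.
QUALITATIVE = {"positive": 1, "negative": 0}


def feature_values(result):
    """Turn one examination result string into a list of feature values."""
    text = result.strip().lower()
    if text in QUALITATIVE:
        return [QUALITATIVE[text]]
    # Numeric results: keep every number in order and drop units such as "mmHg".
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    return [float(n) if "." in n else int(n) for n in numbers]


print(feature_values("80mmHg, 120mmHg"))  # [80, 120]
print(feature_values("80 beats/minute"))  # [80]
print(feature_values("Positive"))         # [1]
```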
The position of each feature value in the numerical sequence is determined by the position of its feature information in the structured case sample information. For example, take the structured historical case sample information "heart rate 80 beats/minute, hepatitis B antibody positive". First, the characters of the item names are converted into numerical values in the same way as for non-standardized historical case sample information. The two characters of the Chinese term for "heart rate" might become 1 and 2. The four characters of the Chinese term for "hepatitis B antibody" (rendered literally as "hepatitis B", "liver", "anti", and "body") might become 3, 4, 5, and 6. Next, the feature values are extracted: 80 for the heart rate and 1 for the positive antibody result. Placing each character code and feature value at its position in the sequence yields the numerical vector of this structured historical case sample information: [1, 2, 80, 3, 4, 5, 6, 1].
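The vector assembly in this example can be sketched as follows. The sketch assumes the record has already been split into (item name, feature value) pairs in their original order and that a character-to-code table is available. `encode_structured` and the code table are hypothetical, chosen to reproduce the example.

```python
def encode_structured(items, char_codes):
    """items: (item_name, feature_value) pairs in record order.
    The codes of the item name's characters come first, followed by the
    feature value, so each value sits right after the item it belongs to."""
    vector = []
    for name, value in items:
        vector.extend(char_codes[ch] for ch in name)
        vector.append(value)
    return vector


# Hypothetical code table matching the example in the text.
char_codes = {"心": 1, "率": 2, "乙": 3, "肝": 4, "抗": 5, "体": 6}
# Heart rate 80 beats/minute; hepatitis B antibody positive (feature value 1).
record = [("心率", 80), ("乙肝抗体", 1)]
print(encode_structured(record, char_codes))  # [1, 2, 80, 3, 4, 5, 6, 1]
```

As in the example, character codes and feature values share one numerical sequence. The model can tell them apart only by their positions.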
It should be noted that the first disease classification model and the second disease classification model each have a fixed sequence length for their input numerical vectors; that is, the number of values in a numerical vector input into either model must match that model's sequence length. The sequence length of the second disease classification model is set according to the structured historical case sample information when the model is constructed and is fixed, so the numerical vectors of the second disease classification model never need to be padded or truncated.
However, when the number of values in a numerical vector input into the first disease classification model does not match that model's sequence length, corrective measures are taken.
Specifically, the measures may be as follows: when the number of values in the numerical vector of the non-standardized historical case sample information is greater than a preset number threshold, the values beyond the threshold are deleted so that the number of values equals the preset number threshold; when the number of values is smaller than the preset number threshold, a preset value is appended after the last value in the numerical vector until the number of values equals the preset number threshold.
For example, the preset number threshold may be the preset sequence length of the numerical vectors accepted by the first disease classification model, and the preset value may be any predetermined value.
For example, if the first disease classification model fixes the sequence length of its numerical vectors at 10 (i.e., each numerical vector contains 10 values), every numerical vector input into the model must have a sequence length of 10. If the numerical vector to be input is [1,2,80,3,4,5,6,1], its sequence length is 8; since 8 < 10, a preset value such as 0 is appended after the last value "1" until the sequence length equals the length fixed by the model, so the numerical vector input into the model after padding is [1,2,80,3,4,5,6,1,0,0].
If the numerical vector to be input is [1,2,80,3,4,5,6,1,3,9,11,15], its sequence length is 12; since 12 > 10, the values beyond the length fixed by the model, namely the last two values "11" and "15", are deleted, and the numerical vector finally input into the model is [1,2,80,3,4,5,6,1,3,9].
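The truncation and padding rules can be sketched as follows, assuming a preset number threshold of 10 and a preset value of 0 as in the examples above; the function name `fit_to_length` is hypothetical.

```python
from typing import List


def fit_to_length(values: List[int], max_len: int, pad_value: int = 0) -> List[int]:
    """Truncate or right-pad a numerical vector to exactly `max_len` values.

    Values beyond `max_len` are dropped from the end; shorter vectors get
    `pad_value` appended after their last value until they reach `max_len`.
    """
    if len(values) > max_len:
        return values[:max_len]
    return values + [pad_value] * (max_len - len(values))


print(fit_to_length([1, 2, 80, 3, 4, 5, 6, 1], 10))
# [1, 2, 80, 3, 4, 5, 6, 1, 0, 0]
print(fit_to_length([1, 2, 80, 3, 4, 5, 6, 1, 3, 9, 11, 15], 10))
# [1, 2, 80, 3, 4, 5, 6, 1, 3, 9]
```

Choosing a padding value that is never used as a character code (here 0, with character codes starting at 1) keeps padded positions distinguishable from real characters.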
This prevents the model from malfunctioning because the numerical vectors input into it have unequal sequence lengths.
It should be noted that appending too many preset values introduces invalid information, while deleting too many values loses valid information. The preset number threshold of the model should therefore be set according to the sequence lengths of the numerical vectors of the case information, so that as few values as possible need to be appended or deleted.
S220, determining a training sample of any one of the first disease classification model and the second disease classification model based on the numerical value vector of each piece of historical case sample information.
Illustratively, the numerical vectors of the non-standardized historical case sample information are used as training samples for the first disease classification model, and the numerical vectors of the structured historical case sample information are used as training samples for the second disease classification model, so that each model can subsequently be trained on its own training samples.
S230, iteratively training each of the first disease classification model and the second disease classification model based on the training samples and the disease types corresponding to the training samples.
Illustratively, the numerical vector formed by numerically coding the characters in a piece of historical case sample information is input into the first disease classification model to obtain the probability that the sample belongs to each disease type; at the same time, the numerical vector formed from the structured feature information and its feature values in the same sample is input into the second disease classification model to obtain the probability of each disease type. In this embodiment, the probabilities output by the two models are added per disease type to determine the predicted disease type of the sample, which is then compared with the true disease type of the sample. If they agree, the first and second disease classification models have predicted correctly; if they disagree, training of the first and second disease classification models continues.
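The comparison step described above can be sketched as follows. This is a minimal illustration assuming each model's output for a sample is available as a dict from disease type to probability; the helper names are hypothetical, and the parameter updates performed during further training (e.g. back-propagation) are outside this sketch.

```python
from typing import Dict, List, Tuple

Probs = Dict[str, float]  # disease type -> probability from one model


def fused_prediction(p1: Probs, p2: Probs) -> str:
    """Add the two models' probabilities per disease type and return the highest-scoring type."""
    types = list(dict.fromkeys([*p1, *p2]))  # union of types, first-seen order
    return max(types, key=lambda t: p1.get(t, 0.0) + p2.get(t, 0.0))


def mismatched_samples(batch: List[Tuple[Probs, Probs, str]]) -> List[int]:
    """Indices of samples whose fused prediction differs from the true disease type."""
    return [
        i
        for i, (p1, p2, true_type) in enumerate(batch)
        if fused_prediction(p1, p2) != true_type
    ]


batch = [
    ({"pneumonia": 0.7, "heart disease": 0.3}, {"pneumonia": 0.6, "heart disease": 0.2}, "pneumonia"),
    ({"pneumonia": 0.4, "heart disease": 0.6}, {"pneumonia": 0.9, "heart disease": 0.3}, "heart disease"),
]
print(mismatched_samples(batch))  # [1]: the second sample is mispredicted
```

An empty result would indicate that both models predict every sample in the batch correctly; a non-empty one signals that training should continue.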
Therefore, the target disease type of the case information of the target object to be classified can be quickly and accurately determined on the basis of the trained first disease classification model and the trained second disease classification model.
S240, acquiring a numerical vector of the case information of the target object to be classified.
S250, inputting the numerical vector of the case information into the trained first disease classification model and the trained second disease classification model respectively, to obtain a first disease classification probability and a second disease classification probability that the case information belongs to each disease type, wherein the first disease classification model is trained on a plurality of pieces of non-standardized historical medical record sample data, and the second disease classification model is trained on a plurality of pieces of structured historical case sample data.
S260, determining the target disease type of the case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type.
According to the technical solution of this embodiment of the invention, the numerical vector of each piece of historical case sample information is determined first. When the historical case sample data is used to train the first and second disease classification models, the models therefore process the numerical vectors directly instead of the character-form text of the samples, which reduces the workload of both models and improves their efficiency. Training samples for each of the two models are determined from these numerical vectors, and each model is trained iteratively on its training samples and the corresponding disease types. The trained first and second disease classification models can then quickly and accurately determine the target disease type of the case information of a target object to be classified.
EXAMPLE THREE
Fig. 3 is a schematic structural diagram of a disease classification apparatus according to the third embodiment of the present invention. As shown in Fig. 3, the apparatus includes: a case information acquisition module 31, a disease type classification module 32, and a target disease type determination module 33.
The case information acquisition module 31 is configured to acquire a numerical vector of case information of a target object to be classified;
the disease type classification module 32 is configured to input the numerical vectors of the case information into a trained first disease classification model and a trained second disease classification model respectively, so as to obtain a first disease classification probability and a second disease classification probability that the case information belongs to each disease type, where the first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data;
a target disease type determination module 33, configured to determine a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type.
Optionally, the historical case sample data includes: historical case sample information and a disease type corresponding to the historical case sample information.
On the basis of the technical scheme of the embodiment, the device further comprises:
a historical case sample numerical vector determination module, configured to determine the numerical vector of each piece of historical case sample information based on that historical case sample information;
a training sample determination module, configured to determine a training sample of any one of the first disease classification model and the second disease classification model based on the numerical vector of each piece of historical case sample information;
and the model training module is used for carrying out iterative training on any one of the first disease classification model and the second disease classification model based on the training samples and the disease types corresponding to the training samples.
On the basis of the technical solution of the above embodiment, when the first disease classification model is trained, the numerical vector determination module of the historical case sample information includes:
a numerical code determination unit for determining a numerical code of each character in each piece of historical case sample information based on a correspondence between each character in each piece of non-standardized historical case sample information and the numerical code corresponding to each character in each piece of non-standardized historical case sample information;
and the numerical vector first determining unit is used for sequentially splicing the numerical codes of each character in each non-standardized historical case sample information to obtain the numerical vector of each non-standardized historical case sample information.
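The two units above (character-to-code correspondence, then splicing the codes in order) can be sketched as follows. The sketch assumes codes are assigned to characters in order of first appearance across the training texts, starting at 1, and that every character to be encoded is already in the table (`encode` raises `KeyError` otherwise); the patent does not specify how the correspondence is built, and the function names are hypothetical.

```python
from typing import Dict, List


def build_char_codes(texts: List[str]) -> Dict[str, int]:
    """Give every distinct character a code starting at 1, in order of first appearance.

    Code 0 is left unused so it can later serve as the padding value.
    """
    codes: Dict[str, int] = {}
    for text in texts:
        for ch in text:
            if ch not in codes:
                codes[ch] = len(codes) + 1
    return codes


def encode(text: str, codes: Dict[str, int]) -> List[int]:
    """Splice the codes of the characters of `text` together in their original order."""
    return [codes[ch] for ch in text]


# Two free-text records: "cough, fever" and "fever, chest pain".
codes = build_char_codes(["咳嗽发热", "发热胸痛"])
print(encode("胸痛发热", codes))  # [5, 6, 3, 4]
```

A deployed system would also need a policy for characters never seen during training, for example a reserved "unknown" code.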
On the basis of the technical solution of the above embodiment, when the second disease classification model is trained, the numerical vector determination module of the historical case sample information includes:
the characteristic value position determining unit is used for determining the position of the characteristic value corresponding to each structured characteristic information in the numerical sequence based on each structured characteristic information in each structured historical case sample information;
and the numerical vector second determining unit is used for determining a numerical vector of each structured historical case information based on the characteristic value corresponding to each structured characteristic information in each structured historical case sample information and the position of the characteristic value corresponding to each structured characteristic information in a numerical sequence.
On the basis of the technical scheme of the embodiment, the device further comprises:
the numerical value vector first adjusting module is used for deleting the numerical values exceeding a preset number threshold value when the number of the numerical values in the numerical value vector of the non-standardized historical case sample information is larger than the preset number threshold value, so that the number of the numerical values in the numerical value vector of the non-standardized historical case sample information is equal to the preset number threshold value;
and the numerical value vector second adjusting module is used for supplementing a preset numerical value after the last numerical value in the numerical value vector of the non-standardized historical case sample information when the number of the numerical values in the numerical value vector of the non-standardized historical case sample information is smaller than a preset number threshold value, so that the number of the numerical values in the numerical value vector of the non-standardized historical case sample information is equal to the preset number threshold value.
On the basis of the technical solution of the above embodiment, the target disease type determination module 33 includes:
a target probability unit, configured to accumulate the first disease classification probability and the second disease classification probability that the case information belongs to the current type of disease to obtain a target probability that the case information belongs to the current type of disease;
and a target disease type determination unit configured to rank the target probabilities of the case information belonging to the respective disease types, and determine a target disease type of the case information of the target object based on a ranking result.
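The accumulation and ranking performed by these two units can be sketched as follows, assuming each model's output is a dict from disease type to probability; the function name `rank_disease_types` and the example probabilities are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_disease_types(p1: Dict[str, float], p2: Dict[str, float]) -> List[Tuple[str, float]]:
    """Add the two models' probabilities per disease type and sort in descending order."""
    types = dict.fromkeys([*p1, *p2])  # union of disease types, first-seen order
    totals = [(t, p1.get(t, 0.0) + p2.get(t, 0.0)) for t in types]
    return sorted(totals, key=lambda item: item[1], reverse=True)


ranking = rank_disease_types(
    {"pneumonia": 0.5, "hypertension": 0.25, "heart disease": 0.25},   # first model
    {"pneumonia": 0.75, "hypertension": 0.5, "heart disease": 0.125},  # second model
)
target_type = ranking[0][0]  # disease type with the highest target probability
print(ranking)  # [('pneumonia', 1.25), ('hypertension', 0.75), ('heart disease', 0.375)]
```

Keeping the full ranking rather than only the top entry also makes it possible to report runner-up disease types alongside the target type.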
Optionally, the first disease classification model is a recurrent neural network model in which a 2-layer bidirectional long short-term memory (LSTM) network serves as the hidden layer and a normalization layer serves as the output layer; the second disease classification model is a fully connected neural network model in which 2 fully connected layers serve as the hidden layers and a sigmoid layer serves as the output layer.
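The two models end in different output layers, which matters when their probabilities are added. A minimal sketch, assuming the normalization layer is a softmax over disease types and the sigmoid (S-shaped growth curve) layer is applied to each disease type independently; the hidden layers (the 2-layer BiLSTM and the 2 fully connected layers) are not shown.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Normalization output: probabilities over mutually exclusive types, summing to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: List[float]) -> List[float]:
    """Sigmoid output: an independent probability in (0, 1) for each type."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


logits = [2.0, 0.5, -1.0]
print(sum(softmax(logits)))  # approximately 1.0
print(sigmoid(logits))       # each entry in (0, 1); the entries need not sum to 1
```

Because the sigmoid outputs are not normalized across disease types, the second model's scores can sum to more than 1 and may therefore carry more weight in the accumulated target probability; the description does not say whether the two outputs are rescaled before being added.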
The disease classification device provided by the embodiment of the invention can execute the disease classification method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE FOUR
Fig. 4 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention, as shown in fig. 4, the apparatus includes a processor 70, a memory 71, an input device 72, and an output device 73; the number of processors 70 in the device may be one or more, and one processor 70 is taken as an example in fig. 4; the processor 70, the memory 71, the input device 72 and the output device 73 of the apparatus may be connected by a bus or other means, as exemplified by the bus connection in fig. 4.
The memory 71, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the disease classification method in the embodiments of the present invention (e.g., the case information acquisition module 31, the disease type classification module 32, and the target disease type determination module 33). The processor 70 runs the software programs, instructions, and modules stored in the memory 71 to execute the various functional applications and data processing of the device, i.e., to implement the disease classification method described above.
The memory 71 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 71 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 71 may further include memory located remotely from the processor 70, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 72 may be used to receive entered numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 73 may include a display device such as a display screen.
EXAMPLE FIVE
Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method of disease classification.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the disease classification method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the disease classification apparatus, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A method of disease classification, comprising:
acquiring a numerical vector of case information of a target object to be classified;
respectively inputting the numerical vectors of the case information into a first trained disease classification model and a second trained disease classification model to obtain a first disease classification probability and a second disease classification probability of the case information belonging to each disease type, wherein the first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data;
determining a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type.
2. The method of claim 1, wherein the historical case sample data comprises: historical case sample information and disease types corresponding to the historical case sample information;
the training method of any one of the first disease classification model and the second disease classification model comprises the following steps:
determining a numerical value vector of each historical case sample information based on each historical case sample information;
determining a training sample of any one of the first disease classification model and the second disease classification model based on the numerical vector of each of the historical case sample information;
and performing iterative training on any one of the first disease classification model and the second disease classification model based on the disease types corresponding to the training samples and the training samples.
3. The method of claim 2, wherein, when training the first disease classification model,
the determining a numerical vector of each historical case sample information based on each historical case sample information includes:
determining a numerical code of each character in each piece of historical case sample information based on a corresponding relationship between each character in each piece of non-standardized historical case sample information and the numerical code corresponding to each character in each piece of non-standardized historical case sample information;
and orderly splicing the numerical codes of each character in each non-standardized historical case sample information to obtain the numerical vector of each non-standardized historical case sample information.
4. The method of claim 2, wherein when training the second disease classification model,
the determining a numerical vector of each historical case sample information based on each historical case sample information includes:
determining the position of a characteristic value corresponding to each structured characteristic information in a numerical sequence based on each structured characteristic information in each structured historical case sample information;
and determining a numerical vector of each structured historical case information based on the characteristic value corresponding to each structured characteristic information in each structured historical case sample information and the position of the characteristic value corresponding to each structured characteristic information in a numerical sequence.
5. The method of claim 3, further comprising:
when the number of numerical values in the numerical value vector of the non-standardized historical case sample information is larger than a preset number threshold, deleting the numerical values exceeding the preset number threshold so as to enable the number of numerical values in the numerical value vector of the non-standardized historical case sample information to be equal to the preset number threshold;
when the number of values in the value vector of the non-standardized historical case sample information is smaller than a preset number threshold, supplementing a preset value after the last value in the value vector of the non-standardized historical case sample information so as to enable the number of values in the value vector of the non-standardized historical case sample information to be equal to the preset number threshold.
6. The method according to claim 1, wherein the determining a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type comprises:
accumulating the first disease classification probability and the second disease classification probability that the case information belongs to the current type of disease to obtain a target probability that the case information belongs to the current type of disease;
and sequencing the target probabilities of the case information belonging to the disease types, and determining the target disease type of the case information of the target object based on the sequencing result.
7. The method of claim 1, wherein the first disease classification model is a recurrent neural network model, wherein a 2-layer bidirectional long short-term memory network model is used as a hidden layer of the recurrent neural network model, and a normalization layer is used as an output layer of the recurrent neural network model;
the second disease classification model is a fully connected neural network model, wherein 2 fully connected layers are used as hidden layers of the fully connected neural network model, and a sigmoid layer is used as an output layer of the fully connected neural network model.
8. A disease classification device, comprising:
the case information acquisition module is used for acquiring a numerical vector of case information of a target object to be classified;
the disease type classification module is used for respectively inputting the numerical vectors of the case information into a first trained disease classification model and a second trained disease classification model to obtain a first disease classification probability and a second disease classification probability of the case information belonging to each disease type, wherein the first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data;
a target disease type determination module for determining a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type.
9. An apparatus, characterized in that the apparatus comprises:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the disease classification method according to any one of claims 1-7.
10. A storage medium containing computer-executable instructions for performing the disease classification method of any one of claims 1-7 when executed by a computer processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010612274.1A CN111785385A (en) | 2020-06-29 | 2020-06-29 | Disease classification method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111785385A (en) | 2020-10-16 |
Family
ID=72760342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010612274.1A Pending CN111785385A (en) | 2020-06-29 | 2020-06-29 | Disease classification method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111785385A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108922608A (en) * | 2018-06-13 | 2018-11-30 | Ping An Medical Technology Co., Ltd. (平安医疗科技有限公司) | Intelligent hospital guide's method, apparatus, computer equipment and storage medium |
CN110249392A (en) * | 2018-08-20 | 2019-09-17 | Shenzhen Holographic Medical Technology Co., Ltd. (深圳市全息医疗科技有限公司) | Intelligent assisting in diagnosis and treatment system and method |
WO2020048264A1 (en) * | 2018-09-03 | 2020-03-12 | Ping An Medical and Healthcare Management Co., Ltd. (平安医疗健康管理股份有限公司) | Method and apparatus for processing drug data, computer device, and storage medium |
CN109460473A (en) * | 2018-11-21 | 2019-03-12 | Central South University (中南大学) | The electronic health record multi-tag classification method with character representation is extracted based on symptom |
CN109978022A (en) * | 2019-03-08 | 2019-07-05 | Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司) | A kind of medical treatment text message processing method and device, storage medium |
CN110490251A (en) * | 2019-03-08 | 2019-11-22 | Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司) | Prediction disaggregated model acquisition methods and device, storage medium based on artificial intelligence |
CN110991170A (en) * | 2019-12-05 | 2020-04-10 | Tsinghua University (清华大学) | Chinese disease name intelligent standardization method and system based on electronic medical record information |
Non-Patent Citations (4)
Title |
---|
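Zeng Xiangyang (曾向阳): "Intelligent Underwater Target Recognition" (《智能水中目标识别》), 31 March 2016, National Defense Industry Press (国防工业出版社), pages 136-138 * |
Li Linjie et al. (李林杰 等): "Applied Economic Statistics" (《经济应用统计学》), 31 July 2010, Modern Education Press (现代教育出版社), pages 82-84 * |
Liang Fanrong et al. (梁繁荣 等): "Acupuncture Data Mining and Clinical Decision-Making" (《针灸数据挖掘与临床决策》), 28 February 2010, Bashu Publishing House (巴蜀书社), pages 200-205 * |
Dong Haijun (董海军): "Social Survey and Statistics" (《社会调查与统计》), 28 February 2015, Wuhan University Press (武汉大学出版社), pages 155-160 * |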
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112800248A (en) * | 2021-01-19 | 2021-05-14 | Tianhe Supercomputing Huaihai Sub-center (天河超级计算淮海分中心) | Similar case retrieval method, similar case retrieval device, computer equipment and storage medium |
CN112885481A (en) * | 2021-03-09 | 2021-06-01 | Lianren Health Medical Big Data Technology Co., Ltd. (联仁健康医疗大数据科技股份有限公司) | Case grouping method, case grouping device, electronic equipment and storage medium |
CN113111162A (en) * | 2021-04-21 | 2021-07-13 | Kangjian Information Technology (Shenzhen) Co., Ltd. (康键信息技术(深圳)有限公司) | Department recommendation method and device, electronic equipment and storage medium |
CN115938593A (en) * | 2023-03-10 | 2023-04-07 | Renmin Hospital of Wuhan University (Hubei General Hospital) (武汉大学人民医院(湖北省人民医院)) | Medical record information processing method, device and equipment and computer readable storage medium |
CN115938593B (en) * | 2023-03-10 | 2023-06-02 | Renmin Hospital of Wuhan University (Hubei General Hospital) (武汉大学人民医院(湖北省人民医院)) | Medical record information processing method, device, equipment and computer readable storage medium |
CN117292174A (en) * | 2023-09-06 | 2023-12-26 | Sinochem Modern Agriculture Co., Ltd. (中化现代农业有限公司) | Apple disease identification method, apple disease identification device, electronic equipment and storage medium |
CN117292174B (en) * | 2023-09-06 | 2024-04-19 | Sinochem Modern Agriculture Co., Ltd. (中化现代农业有限公司) | Apple disease identification method, apple disease identification device, electronic equipment and storage medium |
Similar Documents
Publication | Title |
---|---|
CN111785385A (en) | Disease classification method, device, equipment and storage medium |
CN109189991B (en) | Duplicate video identification method, device, terminal and computer readable storage medium |
CN106951484B (en) | Picture retrieval method and device, computer equipment and computer readable medium |
CN110309874B (en) | Negative sample screening model training method, data screening method and data matching method |
EP3545470A1 (en) | Method for training neuron network and active learning system |
WO2022110444A1 (en) | Dynamic prediction method and apparatus for cloud native resources, computer device and storage medium |
CN111445968A (en) | Electronic medical record query method and device, computer equipment and storage medium |
EP2499569A1 (en) | Clustering method and system |
CN111368064A (en) | Survey information processing method, device, equipment and storage medium |
CN111008272A (en) | Knowledge graph-based question and answer method and device, computer equipment and storage medium |
CN111951943B (en) | Intelligent triage method and device, electronic equipment and storage medium |
CN111710364B (en) | Method, device, terminal and storage medium for acquiring flora marker |
CN110969172A (en) | Text classification method and related equipment |
CN111160049B (en) | Text translation method, apparatus, machine translation system, and storage medium |
CN114886404B (en) | Electronic equipment, device and storage medium |
CN111310834B (en) | Data processing method and device, processor, electronic equipment and storage medium |
Nakaya et al. | Extraction of correlated gene clusters by multiple graph comparison |
CN116955538B (en) | Medical dictionary data matching method and device, electronic equipment and storage medium |
CN114048136A (en) | Test type determination method, device, server, medium and product |
CN109635004A (en) | A kind of object factory providing method, device and the equipment of database |
CN111816306A (en) | Medical data processing method, and prediction model training method and device |
CN116680401A (en) | Document processing method, document processing device, apparatus and storage medium |
CN110957046A (en) | Medical health case knowledge matching method and system |
CN114881124B (en) | Causal relation graph construction method and device, electronic equipment and medium |
CN114579626B (en) | Data processing method, data processing device, electronic equipment and medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |