CN111785385A - Disease classification method, device, equipment and storage medium - Google Patents

Disease classification method, device, equipment and storage medium

Info

Publication number
CN111785385A
Authority
CN
China
Prior art keywords
disease
disease classification
information
numerical
case
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010612274.1A
Other languages
Chinese (zh)
Inventor
许红伟
方成
饶官军
柴鹏飞
吴边
洪叶恩
孟海忠
任宇翔
冯辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weiyiyun Hangzhou Holding Co ltd
Original Assignee
Weiyiyun Hangzhou Holding Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weiyiyun Hangzhou Holding Co ltd filed Critical Weiyiyun Hangzhou Holding Co ltd
Priority to CN202010612274.1A priority Critical patent/CN111785385A/en
Publication of CN111785385A publication Critical patent/CN111785385A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a disease classification method, a disease classification device, disease classification equipment and a storage medium. The method comprises the following steps: acquiring a numerical vector of case information of a target object to be classified; respectively inputting the numerical vectors of the case information into a first trained disease classification model and a second trained disease classification model to obtain a first disease classification probability and a second disease classification probability of the case information belonging to each disease type, wherein the first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data; determining a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type. The effect of quickly and accurately determining the disease type of the patient according to the medical record information is achieved.

Description

Disease classification method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to a medical information processing technology, in particular to a disease classification method, a device, equipment and a storage medium.
Background
At present, after diagnosing a patient's disease during a consultation, a doctor usually selects the disease classification name based on personal experience.
The above method of selecting disease classification names according to patient case information is inefficient, and relies on the experience of doctors, thus being subjective and not accurate enough for disease classification.
Disclosure of Invention
The embodiment of the invention provides a disease classification method, a disease classification device, disease classification equipment and a storage medium, so as to realize the effect of quickly and accurately determining the disease type of a patient according to medical record information.
In a first aspect, an embodiment of the present invention provides a disease classification method, including:
acquiring a numerical vector of case information of a target object to be classified;
respectively inputting the numerical vectors of the case information into a first trained disease classification model and a second trained disease classification model to obtain a first disease classification probability and a second disease classification probability of the case information belonging to each disease type, wherein the first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data;
determining a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type.
In a second aspect, an embodiment of the present invention further provides a disease classification apparatus, including:
the case information acquisition module is used for acquiring a numerical vector of case information of a target object to be classified;
the disease type classification module is used for respectively inputting the numerical vectors of the case information into a first trained disease classification model and a second trained disease classification model to obtain a first disease classification probability and a second disease classification probability of the case information belonging to each disease type, wherein the first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data;
a target disease type determination module for determining a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the disease classification method of any of the embodiments of the present invention.
In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer-executable instructions for performing the disease classification method described in any of the embodiments of the present invention when executed by a computer processor.
According to the technical scheme, the acquired numerical vectors of the case information of the target object to be classified are input into the trained first disease classification model and the trained second disease classification model, the first disease classification probability and the second disease classification probability of the case information belonging to each disease type are obtained, and the target disease type of the case information of the target object is determined based on the first disease classification probability and the second disease classification probability of the case information belonging to each disease type, so that the problems that in the prior art, the disease classification efficiency is low and the disease classification is not accurate enough due to the fact that the disease classification name is usually selected through the experience of a doctor are solved.
Drawings
FIG. 1 is a flow chart of a disease classification method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a disease classification method according to a second embodiment of the present invention;
FIG. 3 is a flow chart of a disease classification method according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus in the fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a disease classification method according to a first embodiment of the present invention. This embodiment is applicable to determining the disease type of case information from medical record information. The method may be executed by a disease classification device, which may be implemented by software and/or hardware and may be configured on a computing device. The method specifically includes the following steps:
S110, acquiring a numerical vector of case information of the target object to be classified.
For example, the target object may be the subject to which the medical record information to be classified corresponds, and may be a human or an animal. The case information may be descriptive information about the condition of the target object. For example, a patient who feels unwell goes to a hospital for examination and is found to have a fever of 38 degrees with a cough; the doctor's written description of these findings forms the case information of the patient. The case information may be acquired from a case management center of a hospital. The numerical vector of the case information may be a vector formed by converting each character in the case information into a numerical value based on a preset conversion rule and combining the numerical values.
This makes it possible to analyze the numerical vectors of the case information to determine the disease type.
S120, respectively inputting the numerical vectors of the case information into a first trained disease classification model and a second trained disease classification model to obtain a first disease classification probability and a second disease classification probability of the case information belonging to each disease type; the first disease classification model is obtained by training based on a plurality of non-standardized historical case history sample data, and the second disease classification model is obtained by training based on a plurality of structured historical case history sample data.
For example, the first disease classification model and the second disease classification model may each be a model that takes the numerical vector of the input case information and outputs the probability that the case information belongs to each disease type. The first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data, where non-standardized data may be medical descriptions that do not conform to a standard form of expression. The second disease classification model is trained based on a plurality of structured historical case sample data, where structured data may be examination items and their corresponding numerical values or types in the form of key-value pairs, for example, blood pressure 80mmHg, 120mmHg, heart rate 80 beats/minute, or hepatitis B antibody positive. The historical case sample data here includes historical case sample information and the disease type corresponding to the historical case sample information.
The first disease classification probability may be a probability that case information output based on the first disease classification model belongs to each disease type; the second disease classification probability may be a probability that case information output based on the second disease classification model belongs to each disease type.
It should be noted that the first disease classification model and the second disease classification model may have the same structure or different structures.
Optionally, the first disease classification model may be a recurrent neural network model, in which a 2-layer bidirectional long short-term memory (LSTM) network serves as the hidden layer and a normalization (softmax) layer serves as the output layer; the second disease classification model may be a fully connected neural network model, in which 2 fully connected layers serve as the hidden layers and an S-shaped growth curve (sigmoid) layer serves as the output layer.
When the first disease classification model is a recurrent neural network model with the normalization layer as its output layer, the probability that the case information belongs to each disease type is a value between 0 and 1, and the probabilities over all disease types sum to 1. When the second disease classification model is a fully connected neural network model with the S-shaped growth curve layer as its output layer, the probability that the case information belongs to each disease type is also a value between 0 and 1, but the probabilities over all disease types do not necessarily sum to 1.
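The difference between the two output layers can be illustrated with a minimal sketch. It assumes that the "normalization layer" denotes a softmax function and that the "S-shaped growth curve layer" denotes an element-wise sigmoid function; the raw scores in `logits` are hypothetical values chosen only for illustration and do not come from the patent.

```python
import math

def softmax(scores):
    """Map raw scores to values in (0, 1) that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(scores):
    """Map each raw score independently to a value in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical raw scores for three disease types (assumption for illustration).
logits = [2.0, 0.5, -1.0]
p_first = softmax(logits)   # behaves like the first model's output layer
p_second = sigmoid(logits)  # behaves like the second model's output layer
```

With softmax the three values compete for a total of 1, whereas with sigmoid each disease type is scored on its own, which is why the second set of probabilities need not sum to 1.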
After the case information of the target object to be classified is respectively input into the trained first disease classification model and the trained second disease classification model, the first disease classification probability and the second disease classification probability of the case information of the target object to be classified belonging to each disease type are obtained. Thus, the probability that the case information belongs to each disease type can be obtained quickly, so that the target disease type of the case information can be determined based on the obtained first disease classification probability and the second disease classification probability.
S130, determining a target disease type of the case information of the target object based on the first disease classification probability and the second disease classification probability of the case information belonging to each disease type.
For example, according to the first disease classification probability and the second disease classification probability that the acquired case information belongs to each disease type, the target disease type of the case information of the target object to be classified may be determined according to a preset calculation rule. The problem that in the prior art, the disease classification efficiency is low and the disease classification is not accurate enough because the disease classification name is usually selected through the experience of a doctor is solved.
Optionally, the determining the target disease type of the case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type may specifically be: accumulating the first disease classification probability and the second disease classification probability that the case information belongs to the current type of disease to obtain a target probability that the case information belongs to the current type of disease; and sequencing the target probabilities of the case information belonging to the disease types, and determining the target disease type of the case information of the target object based on the sequencing result.
For example, the current disease type may be whichever disease type is being considered for the case information, such as pneumonia or heart disease. The target probability may be the probability obtained by adding the first disease classification probability and the second disease classification probability that the case information belongs to the current disease type. The target disease type may be the disease type to which the case information is finally determined to belong.
The first disease classification probability and the second disease classification probability that the case information belongs to each disease type are obtained from the first disease classification model and the second disease classification model respectively, and the two probabilities for the same disease type are added to obtain the target probability that the case information belongs to that disease type. The target probabilities may then be sorted, for example in descending order, and the top-ranked disease type, or the several top-ranked disease types, may be used as the target disease type of the case information.
For example, suppose the first disease classification model and the second disease classification model both cover 3 disease types: pneumonia, heart disease and hypertension. For a piece of case information, the first disease classification model gives first disease classification probabilities of 0.5, 0.3 and 0.2 for the 3 disease types, and the second disease classification model gives second disease classification probabilities of 0.7, 0.8 and 0.6. Adding the first and second disease classification probabilities of the same type gives 0.5 + 0.7 = 1.2 for pneumonia, 0.3 + 0.8 = 1.1 for heart disease, and 0.2 + 0.6 = 0.8 for hypertension. The target probabilities of the case information are therefore 1.2 for pneumonia, 1.1 for heart disease and 0.8 for hypertension. The 3 target probabilities are sorted in descending order. If the doctor takes the disease type with the highest target probability as the target disease type, pneumonia is the target disease type of the case information of the target object; if the doctor takes the top 2 target probabilities, pneumonia and heart disease are the target disease types of the case information of the target object.
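The accumulation and ranking in this example can be sketched as follows. The function name `rank_by_summed_probability` and the dictionary layout are assumptions made for illustration; only the disease types and probability values come from the example above.

```python
def rank_by_summed_probability(first_probs, second_probs):
    """Add the two models' probabilities per disease type and sort in descending order."""
    totals = {d: first_probs[d] + second_probs[d] for d in first_probs}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Probabilities taken from the worked example in the text.
first_probs = {"pneumonia": 0.5, "heart disease": 0.3, "hypertension": 0.2}
second_probs = {"pneumonia": 0.7, "heart disease": 0.8, "hypertension": 0.6}

ranking = rank_by_summed_probability(first_probs, second_probs)
top1 = [disease for disease, _ in ranking[:1]]  # single target disease type
top2 = [disease for disease, _ in ranking[:2]]  # two target disease types
```

Here `top1` contains pneumonia and `top2` contains pneumonia followed by heart disease, matching the two selection choices described in the example.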
It should be noted that accumulation is not the only way to obtain the target probability. The average of the first disease classification probability and the second disease classification probability that the case information belongs to the same type may be calculated and used as the target probability, or the larger of the two probabilities may be used as the target probability. The choice may be set by a user as needed and is not limited herein. Any way of determining the target probability based on the first disease classification probability and the second disease classification probability that the case information belongs to the same type falls within the protection scope of the embodiment of the present invention.
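A minimal sketch of the three combination strategies mentioned here (sum, average, larger value) is given below. The function `combine`, its `how` parameter and the helper `best_type` are hypothetical names introduced for illustration; the probability values reuse the earlier three-disease example.

```python
def combine(p_first, p_second, how="sum"):
    """Combine two probabilities for the same disease type into a target probability."""
    if how == "sum":
        return p_first + p_second
    if how == "mean":
        return (p_first + p_second) / 2
    if how == "max":
        return max(p_first, p_second)
    raise ValueError(f"unknown strategy: {how}")

def best_type(first_probs, second_probs, how):
    """Return the disease type with the highest combined target probability."""
    return max(first_probs, key=lambda d: combine(first_probs[d], second_probs[d], how))

first_probs = {"pneumonia": 0.5, "heart disease": 0.3, "hypertension": 0.2}
second_probs = {"pneumonia": 0.7, "heart disease": 0.8, "hypertension": 0.6}

best_by_sum = best_type(first_probs, second_probs, "sum")
best_by_mean = best_type(first_probs, second_probs, "mean")
best_by_max = best_type(first_probs, second_probs, "max")
```

For these values the sum and the average both select pneumonia, while taking the larger value selects heart disease (0.8 against 0.7), so the chosen strategy can change the resulting target disease type.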
According to the technical scheme, the acquired numerical vectors of the case information of the target object to be classified are input into the trained first disease classification model and the trained second disease classification model, the first disease classification probability and the second disease classification probability of the case information belonging to each disease type are obtained, and the target disease type of the case information of the target object is determined based on the first disease classification probability and the second disease classification probability of the case information belonging to each disease type, so that the problems that in the prior art, the disease classification efficiency is low and the disease classification is not accurate enough due to the fact that the disease classification name is usually selected through the experience of a doctor are solved.
Example two
Fig. 2 is a flowchart of a disease classification method according to a second embodiment of the present invention, which may be combined with various alternatives in the above embodiments. In an embodiment of the present invention, optionally, the method for training any one of the first disease classification model and the second disease classification model includes: determining a numerical vector of each historical case information based on each historical case information; determining a training sample of any one of the first disease classification model and the second disease classification model based on the numerical vector of each of the historical case information; and performing iterative training on any one of the first disease classification model and the second disease classification model based on the disease types corresponding to the training samples and the training samples.
As shown in fig. 2, the method of the embodiment of the present invention specifically includes the following steps:
S210, determining a numerical value vector of each historical case sample information based on each historical case sample information.
For example, the numerical vector may be formed by converting the historical case sample information into numerical values based on a preset conversion rule and combining those values into the vector corresponding to the historical case sample information. In this way, when the first disease classification model and the second disease classification model are trained with historical case sample data, the models can directly process the numerical vector corresponding to the historical case sample information and do not need to process the historical case sample information in its original non-numerical form. This reduces the workload of the first disease classification model and the second disease classification model and improves their working efficiency.
Optionally, when the first disease classification model is trained, the determining a numerical vector of each piece of historical case sample information based on each piece of historical case sample information may specifically be: determining a numerical code of each character in each piece of historical case sample information based on a corresponding relationship between each character in each piece of non-standardized historical case sample information and the numerical code corresponding to each character in each piece of non-standardized historical case sample information; and orderly splicing the numerical codes of each character in each non-standardized historical case sample information to obtain the numerical vector of each non-standardized historical case sample information.
Illustratively, each character in each piece of historical case sample information has a corresponding numerical code, which can be determined from the correspondence between the characters in the non-standardized historical case sample information and their numerical codes. For example, suppose the non-standardized historical case sample information is a four-character phrase meaning "Monday cold". If the numerical code of the first character is "1", that of the second character is "2", that of the third character is "3", and that of the fourth character is "4", then splicing the numerical codes in character order gives the numerical vector [1,2,3,4] for this piece of non-standardized historical case sample information.
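The character-level encoding can be sketched as below. The patent does not specify how the correspondence between characters and numerical codes is built, so assigning codes in order of first appearance is an assumption, as are the function names `build_codebook` and `encode`; the string `周一感冒` is a hypothetical four-character stand-in for the "Monday cold" example.

```python
def build_codebook(samples):
    """Give each distinct character a code, starting at 1, in order of first appearance."""
    codebook = {}
    for text in samples:
        for ch in text:
            codebook.setdefault(ch, len(codebook) + 1)
    return codebook

def encode(text, codebook):
    """Splice the numerical codes of the characters in their original order."""
    # Characters missing from the codebook raise KeyError in this sketch.
    return [codebook[ch] for ch in text]

sample = "周一感冒"  # hypothetical stand-in for the four-character example
codebook = build_codebook([sample])
vector = encode(sample, codebook)
```

For this single sample the resulting `vector` is `[1, 2, 3, 4]`, the same numerical vector as in the example.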
Optionally, when the second disease classification model is trained, the determining a numerical vector of each piece of historical case sample information based on each piece of historical case sample information may specifically be: determining the position of a characteristic value corresponding to each structured characteristic information in a numerical sequence based on each structured characteristic information in each structured historical case sample information; and determining a numerical vector of each structured historical case information based on the characteristic value corresponding to each structured characteristic information in each structured historical case sample information and the position of the characteristic value corresponding to each structured characteristic information in a numerical sequence.
For example, the structured feature information may be information of examination items and corresponding values or types, which may be in the form of key-value pairs, such as blood pressure 80mmHg, 120mmHg, heart rate 80 beats/minute, hepatitis b antibody positive, and the like.
The feature value corresponding to the structured feature information may be the numerical value or type corresponding to the examination item in the feature information. For example, if the feature information is blood pressure 80mmHg, 120mmHg, the feature values are 80 and 120; if the feature information is hepatitis B antibody positive, the feature value is 1. Here, a positive result for an examination item is defined as a feature value of 1, and a negative result is defined as a feature value of 0.
The position of each feature value in the numerical sequence is determined according to the position of the feature information in the structured case sample information. For example, suppose one piece of structured historical case sample information is "heart rate 80 beats/minute, hepatitis B antibody positive". The characters of the examination item names are converted into numerical values using the method described above for non-standardized historical case sample information: for example, the two characters of "heart rate" may be converted into "1" and "2", and the four characters of "hepatitis B antibody" may be converted into "3", "4", "5" and "6". The feature values are then extracted from the structured historical case sample information, giving "80" and "1" respectively. The position of each feature value in the numerical sequence follows from the position of each character and feature value in the structured historical case sample information, so the numerical vector of this example is [1,2,80,3,4,5,6,1].
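The construction of the vector [1,2,80,3,4,5,6,1] can be sketched as follows. The data layout (a list of pairs holding the item-name characters and the raw value), the function `encode_structured`, and the English stand-in tokens for the individual characters are all assumptions made for illustration; the mapping of a positive result to 1 follows the text, and mapping a negative result to 0 is this sketch's reading of the text.

```python
RESULT_CODES = {"positive": 1, "negative": 0}  # type-valued results as feature values

def encode_structured(items, char_codes):
    """Encode item-name characters, then place each feature value right after its item name."""
    vector = []
    for name_chars, raw_value in items:
        vector.extend(char_codes[ch] for ch in name_chars)
        if isinstance(raw_value, str):
            vector.append(RESULT_CODES[raw_value])  # e.g. "positive" -> 1
        else:
            vector.append(raw_value)                # numeric values are kept as-is
    return vector

# English stand-ins for the six characters of the two item names (assumption).
char_codes = {"heart": 1, "rate": 2, "B": 3, "liver": 4, "anti": 5, "body": 6}
items = [
    (["heart", "rate"], 80),                         # heart rate 80 beats/minute
    (["B", "liver", "anti", "body"], "positive"),    # hepatitis B antibody positive
]
vector = encode_structured(items, char_codes)
```

The resulting `vector` is `[1, 2, 80, 3, 4, 5, 6, 1]`, with each feature value occupying the position that follows its examination item name.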
It should be noted that the first disease classification model and the second disease classification model each have their own sequence length for numerical vectors, that is, the number of values in a numerical vector input into either model must match the sequence length of that model. The length of the numerical vector for the second disease classification model is set according to the structured historical case sample information when the second disease classification model is constructed; this length is fixed, so values never need to be supplemented or deleted.
However, when the number of values in a numerical vector input into the first disease classification model does not match the sequence length of that model, corresponding measures are taken.
Specifically, the measures taken may be: when the number of numerical values in the numerical value vector of the non-standardized historical case sample information is larger than a preset number threshold, deleting the numerical values exceeding the preset number threshold so as to enable the number of numerical values in the numerical value vector of the non-standardized historical case sample information to be equal to the preset number threshold; when the number of values in the value vector of the non-standardized historical case sample information is smaller than a preset number threshold, supplementing a preset value after the last value in the value vector of the non-standardized historical case sample information so as to enable the number of values in the value vector of the non-standardized historical case sample information to be equal to the preset number threshold.
For example, the preset number threshold may be the preset sequence length of numerical vectors accepted by the first disease classification model. The preset value may be any value set in advance.
Suppose the sequence length of numerical vectors accepted by the first disease classification model is 10 (that is, each numerical vector contains 10 numerical values), so the sequence length of any numerical vector input to the model must be 10. If the numerical vector to be input is [1,2,80,3,4,5,6,1], its sequence length is 8, and 8<10. A preset value, such as "0", is therefore supplemented after the last value "1" until the sequence length equals the sequence length required by the model. The numerical vector input to the model after supplementing is: [1,2,80,3,4,5,6,1,0,0].
If the numerical vector to be input is [1,2,80,3,4,5,6,1,3,9,11,15], its sequence length is 12, and 12>10. The numerical values beyond the sequence length required by the model are deleted, that is, the last two values "11" and "15", and the numerical vector finally input into the model is: [1,2,80,3,4,5,6,1,3,9].
This prevents the model from malfunctioning because numerical vectors of unequal sequence length are input into it.
It should be noted that supplementing too many preset values introduces invalid information, and deleting too many values loses valid information. The preset number threshold of the model should therefore be set according to the typical sequence length of the numerical vectors of the case information, so that as few values as possible need to be supplemented or deleted.
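The supplementing and deleting measures described above can be sketched as a single helper. The function name `fit_to_length` and its parameter names are hypothetical; the threshold of 10 and the preset value 0 are taken from the worked example.

```python
from typing import List


def fit_to_length(vector: List[int], target_len: int, pad_value: int = 0) -> List[int]:
    """Truncate or right-pad a numerical vector to exactly target_len values."""
    if len(vector) > target_len:
        # Delete the values that exceed the preset number threshold.
        return vector[:target_len]
    # Supplement the preset value after the last value (no-op if already equal).
    return vector + [pad_value] * (target_len - len(vector))


padded = fit_to_length([1, 2, 80, 3, 4, 5, 6, 1], 10)
truncated = fit_to_length([1, 2, 80, 3, 4, 5, 6, 1, 3, 9, 11, 15], 10)
print(padded)     # [1, 2, 80, 3, 4, 5, 6, 1, 0, 0]
print(truncated)  # [1, 2, 80, 3, 4, 5, 6, 1, 3, 9]
```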
S220, determining a training sample of any one of the first disease classification model and the second disease classification model based on the numerical value vector of each piece of historical case sample information.
Illustratively, the determined numerical vector of non-standardized historical case sample information is used as a training sample for training the first disease classification model, and the determined numerical vector of structured historical case sample information is used as a training sample for training the second disease classification model, so that each model can subsequently be trained on its own training samples.
S230, carrying out iterative training on any one of the first disease classification model and the second disease classification model based on the training samples and the disease types corresponding to the training samples.
Illustratively, the numerical vector formed by numerically coding the characters in the historical case sample information is input into the first disease classification model to obtain the probability that the historical case sample information belongs to each disease type. Meanwhile, the numerical vector formed from the structured characteristic information and the characteristic values of that characteristic information is input into the second disease classification model to obtain the probability that the historical case sample information belongs to each disease type. The probability values output by the first disease classification model and those output by the second disease classification model are accumulated to determine the predicted disease type of the historical case sample information, which is compared with the real disease type corresponding to the historical case sample information. If the two are consistent, the first and second disease classification models have predicted correctly; if they are inconsistent, training of the first and second disease classification models continues.
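The comparison step in S230 can be illustrated as follows. This is a minimal sketch of the check only, not of the training procedure itself: the function names are hypothetical, disease types are represented by integer indices as an assumption, and the probability values are made-up illustrative numbers rather than model outputs.

```python
from typing import List, Sequence


def predicted_type(p_first: Sequence[float], p_second: Sequence[float]) -> int:
    """Accumulate the two models' probabilities and return the index of the
    disease type with the largest accumulated value."""
    totals = [a + b for a, b in zip(p_first, p_second)]
    return max(range(len(totals)), key=totals.__getitem__)


def mismatched_samples(first_probs: Sequence[Sequence[float]],
                       second_probs: Sequence[Sequence[float]],
                       true_types: Sequence[int]) -> List[int]:
    """Return the indices of samples whose predicted type differs from the
    real type; a non-empty result means training should continue."""
    return [
        i
        for i, (p1, p2, y) in enumerate(zip(first_probs, second_probs, true_types))
        if predicted_type(p1, p2) != y
    ]


# Illustrative values only: two samples, three disease types.
first_probs = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
second_probs = [[0.6, 0.3, 0.2], [0.5, 0.2, 0.4]]
true_types = [0, 2]

remaining = mismatched_samples(first_probs, second_probs, true_types)
print(remaining)  # [1]
```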
Therefore, the target disease type of the case information of the target object to be classified can be quickly and accurately determined on the basis of the trained first disease classification model and the trained second disease classification model.
S240, acquiring a case vector of case information of the target object to be classified.
S250, respectively inputting the case vectors of the case information into a trained first disease classification model and a trained second disease classification model to obtain a first disease classification probability and a second disease classification probability of the case information belonging to each disease type, wherein the first disease classification model is obtained by training based on a plurality of non-standardized historical case sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data.
S260, determining the target disease type of the case information of the target object based on the first disease classification probability and the second disease classification probability of the case information belonging to each disease type.
According to the technical scheme of the embodiment of the invention, the numerical vector of each piece of historical case sample information is determined based on that information. When the historical case sample data is used to train the first disease classification model and the second disease classification model, the models can therefore directly process the numerical form of the historical case sample information and do not need to process its character form. This reduces the workload of the first and second disease classification models and improves their working efficiency. A training sample of any one of the first disease classification model and the second disease classification model is determined based on the numerical vector of each piece of historical case sample information, and that model is iteratively trained based on the training samples and the disease types corresponding to the training samples. Therefore, the target disease type of the case information of the target object to be classified can be quickly and accurately determined on the basis of the trained first disease classification model and the trained second disease classification model.
Example three
Fig. 3 is a schematic structural diagram of a disease classification device according to a third embodiment of the present invention. As shown in Fig. 3, the device includes: a case information acquisition module 31, a disease type classification module 32, and a target disease type determination module 33.
The case information acquiring module 31 is configured to acquire a numerical vector of case information of a target object to be classified;
the disease type classification module 32 is configured to input the numerical vectors of the case information into a trained first disease classification model and a trained second disease classification model respectively, so as to obtain a first disease classification probability and a second disease classification probability that the case information belongs to each disease type, where the first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data;
a target disease type determination module 33, configured to determine a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type.
Optionally, the historical case sample data includes: historical case sample information and a disease type corresponding to the historical case sample information.
On the basis of the technical scheme of the embodiment, the device further comprises:
a numerical vector determination module for historical case sample information, configured to determine the numerical vector of each piece of historical case sample information based on each piece of historical case sample information;
a training sample determination module, configured to determine a training sample of any one of the first disease classification model and the second disease classification model based on the numerical vector of each piece of historical case sample information;
and the model training module is used for carrying out iterative training on any one of the first disease classification model and the second disease classification model based on the training samples and the disease types corresponding to the training samples.
On the basis of the technical solution of the above embodiment, when the first disease classification model is trained, the numerical vector determination module of the historical case sample information includes:
a numerical code determination unit for determining a numerical code of each character in each piece of historical case sample information based on a correspondence between each character in each piece of non-standardized historical case sample information and the numerical code corresponding to each character in each piece of non-standardized historical case sample information;
and the numerical vector first determining unit is used for sequentially splicing the numerical codes of each character in each non-standardized historical case sample information to obtain the numerical vector of each non-standardized historical case sample information.
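The two units above (numerical code determination and in-order splicing) can be sketched as follows. The embodiment does not state how the correspondence between characters and numerical codes is built, so the scheme here is an assumption: codes are assigned in order of first appearance starting at 1, and 0 is used for characters not in the table. The function names and the sample strings are hypothetical.

```python
from typing import Dict, Iterable, List


def build_char_codes(texts: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct character a code, in order of first appearance.
    Assumption: codes start at 1 so that 0 stays free for unknown characters."""
    codes: Dict[str, int] = {}
    for text in texts:
        for ch in text:
            if ch not in codes:
                codes[ch] = len(codes) + 1
    return codes


def encode_text(text: str, codes: Dict[str, int], unknown: int = 0) -> List[int]:
    """Look up the code of each character and splice the codes in order."""
    return [codes.get(ch, unknown) for ch in text]


corpus = ["cough", "fever"]  # stand-ins for non-standardized case texts
codes = build_char_codes(corpus)
print(encode_text("fever", codes))  # [6, 7, 8, 7, 9]
print(encode_text("flu", codes))    # [6, 0, 3]
```

The resulting vector has one code per character, so its length varies with the text, which is why a separate length adjustment is needed before input to the first disease classification model.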
On the basis of the technical solution of the above embodiment, when the second disease classification model is trained, the numerical vector determination module of the historical case sample information includes:
the characteristic value position determining unit is used for determining the position of the characteristic value corresponding to each structured characteristic information in the numerical sequence based on each structured characteristic information in each structured historical case sample information;
and the numerical vector second determining unit is used for determining a numerical vector of each structured historical case information based on the characteristic value corresponding to each structured characteristic information in each structured historical case sample information and the position of the characteristic value corresponding to each structured characteristic information in a numerical sequence.
On the basis of the technical scheme of the embodiment, the device further comprises:
the numerical value vector first adjusting module is used for deleting the numerical values exceeding a preset number threshold value when the number of the numerical values in the numerical value vector of the non-standardized historical case sample information is larger than the preset number threshold value, so that the number of the numerical values in the numerical value vector of the non-standardized historical case sample information is equal to the preset number threshold value;
and the numerical value vector second adjusting module is used for supplementing a preset numerical value after the last numerical value in the numerical value vector of the non-standardized historical case sample information when the number of the numerical values in the numerical value vector of the non-standardized historical case sample information is smaller than a preset number threshold value, so that the number of the numerical values in the numerical value vector of the non-standardized historical case sample information is equal to the preset number threshold value.
On the basis of the technical solution of the above embodiment, the target disease type determination module 33 includes:
a target probability unit, configured to accumulate the first disease classification probability and the second disease classification probability that the case information belongs to the current type of disease to obtain a target probability that the case information belongs to the current type of disease;
and a target disease type determination unit configured to rank the target probabilities of the case information belonging to the respective disease types, and determine a target disease type of the case information of the target object based on a ranking result.
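The accumulate-and-rank logic of the two units above can be sketched as follows. The function name `rank_disease_types`, the disease type labels, and the probability values are hypothetical placeholders; taking the top-ranked type as the target disease type is one possible reading of "based on a ranking result".

```python
from typing import Dict, List, Tuple


def rank_disease_types(first: Dict[str, float],
                       second: Dict[str, float]) -> List[Tuple[str, float]]:
    """Accumulate the two probabilities per disease type, then sort the
    target probabilities from highest to lowest."""
    totals = {disease: first[disease] + second[disease] for disease in first}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only.
first_probs = {"type_a": 0.6, "type_b": 0.3, "type_c": 0.1}
second_probs = {"type_a": 0.5, "type_b": 0.6, "type_c": 0.2}

ranking = rank_disease_types(first_probs, second_probs)
target_type = ranking[0][0]
print(target_type)  # type_a
```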
Optionally, the first disease classification model is a recurrent neural network model, wherein a 2-layer bidirectional long short-term memory (LSTM) network is used as the hidden layer of the recurrent neural network model, and a normalization layer is used as the output layer of the recurrent neural network model; the second disease classification model is a fully connected neural network model, wherein 2 fully connected layers are used as the hidden layers of the fully connected neural network model, and an S-shaped growth curve (sigmoid) layer is used as the output layer of the fully connected neural network model.
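To make the structure of the second disease classification model concrete, the following is a minimal forward-pass sketch using only the Python standard library. Only the overall shape (2 fully connected hidden layers followed by an output layer with an S-shaped, i.e. sigmoid, activation) comes from the text. The hidden layer width of 16, the ReLU hidden activation, the random untrained weights, and all function names are assumptions made for illustration. The first (bidirectional LSTM) model is not sketched here.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def sigmoid(x: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero bias (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: List[float], layers: List[Layer]) -> List[float]:
    """Apply the hidden layers with ReLU, then the output layer with sigmoid."""
    *hidden, output = layers
    for weights, bias in hidden:
        x = relu(dense(x, weights, bias))
    weights, bias = output
    return sigmoid(dense(x, weights, bias))


rng = random.Random(0)
# 8 input values, two hidden layers of 16 units, 3 disease types.
layers = [init_layer(8, 16, rng), init_layer(16, 16, rng), init_layer(16, 3, rng)]
scores = forward([1, 2, 80, 3, 4, 5, 6, 1], layers)
print(len(scores))  # 3
```

Note that a sigmoid output yields one independent score per disease type, each between 0 and 1, so the scores need not sum to 1.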
The disease classification device provided by the embodiment of the invention can execute the disease classification method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 4 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention, as shown in fig. 4, the apparatus includes a processor 70, a memory 71, an input device 72, and an output device 73; the number of processors 70 in the device may be one or more, and one processor 70 is taken as an example in fig. 4; the processor 70, the memory 71, the input device 72 and the output device 73 of the apparatus may be connected by a bus or other means, as exemplified by the bus connection in fig. 4.
The memory 71, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the disease classification method in the embodiment of the present invention (e.g., the case information acquisition module 31, the disease type classification module 32, and the target disease type determination module 33). The processor 70 executes the various functional applications and data processing of the device, that is, implements the above-described disease classification method, by running the software programs, instructions, and modules stored in the memory 71.
The memory 71 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 71 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 71 may further include memory located remotely from the processor 70, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 72 may be used to receive entered numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 73 may include a display device such as a display screen.
Example five
Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method of disease classification.
Of course, the computer-executable instructions contained in the storage medium provided by the embodiment of the present invention are not limited to the method operations described above, and may also perform related operations in the disease classification method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the disease classification apparatus, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method of disease classification, comprising:
acquiring a numerical vector of case information of a target object to be classified;
respectively inputting the numerical vectors of the case information into a first trained disease classification model and a second trained disease classification model to obtain a first disease classification probability and a second disease classification probability of the case information belonging to each disease type, wherein the first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data;
determining a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type.
2. The method of claim 1, wherein the historical case sample data comprises: historical case sample information and disease types corresponding to the historical case sample information;
the training method of any one of the first disease classification model and the second disease classification model comprises the following steps:
determining a numerical value vector of each historical case sample information based on each historical case sample information;
determining a training sample of any one of the first disease classification model and the second disease classification model based on the numerical vector of each of the historical case sample information;
and performing iterative training on any one of the first disease classification model and the second disease classification model based on the disease types corresponding to the training samples and the training samples.
3. The method of claim 2, wherein, when training the first disease classification model,
the determining a numerical vector of each historical case sample information based on each historical case sample information includes:
determining a numerical code of each character in each piece of historical case sample information based on a corresponding relationship between each character in each piece of non-standardized historical case sample information and the numerical code corresponding to each character in each piece of non-standardized historical case sample information;
and orderly splicing the numerical codes of each character in each non-standardized historical case sample information to obtain the numerical vector of each non-standardized historical case sample information.
4. The method of claim 2, wherein when training the second disease classification model,
the determining a numerical vector of each historical case sample information based on each historical case sample information includes:
determining the position of a characteristic value corresponding to each structured characteristic information in a numerical sequence based on each structured characteristic information in each structured historical case sample information;
and determining a numerical vector of each structured historical case information based on the characteristic value corresponding to each structured characteristic information in each structured historical case sample information and the position of the characteristic value corresponding to each structured characteristic information in a numerical sequence.
5. The method of claim 3, further comprising:
when the number of numerical values in the numerical value vector of the non-standardized historical case sample information is larger than a preset number threshold, deleting the numerical values exceeding the preset number threshold so as to enable the number of numerical values in the numerical value vector of the non-standardized historical case sample information to be equal to the preset number threshold;
when the number of values in the value vector of the non-standardized historical case sample information is smaller than a preset number threshold, supplementing a preset value after the last value in the value vector of the non-standardized historical case sample information so as to enable the number of values in the value vector of the non-standardized historical case sample information to be equal to the preset number threshold.
6. The method according to claim 1, wherein the determining a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type comprises:
accumulating the first disease classification probability and the second disease classification probability that the case information belongs to the current type of disease to obtain a target probability that the case information belongs to the current type of disease;
and sequencing the target probabilities of the case information belonging to the disease types, and determining the target disease type of the case information of the target object based on the sequencing result.
7. The method of claim 1, wherein the first disease classification model is a recurrent neural network model, wherein a 2-layer bidirectional long short-term memory network model is used as a hidden layer of the recurrent neural network model, and a normalization layer is used as an output layer of the recurrent neural network model;
the second disease classification model is a fully connected neural network model, wherein 2 fully connected layers are used as hidden layers of the fully connected neural network model, and an S-shaped growth curve layer is used as an output layer of the fully connected neural network model.
8. A disease classification device, comprising:
the case information acquisition module is used for acquiring a numerical vector of case information of a target object to be classified;
the disease type classification module is used for respectively inputting the numerical vectors of the case information into a first trained disease classification model and a second trained disease classification model to obtain a first disease classification probability and a second disease classification probability of the case information belonging to each disease type, wherein the first disease classification model is obtained by training based on a plurality of non-standardized historical medical record sample data; the second disease classification model is obtained by training based on a plurality of structured historical case sample data;
a target disease type determination module for determining a target disease type of case information of the target object based on the first disease classification probability and the second disease classification probability that the case information belongs to each disease type.
9. An apparatus, characterized in that the apparatus comprises:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the disease classification method as claimed in any one of claims 1-7.
10. A storage medium containing computer-executable instructions for performing the disease classification method of any one of claims 1-7 when executed by a computer processor.
CN202010612274.1A 2020-06-29 2020-06-29 Disease classification method, device, equipment and storage medium Pending CN111785385A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010612274.1A CN111785385A (en) 2020-06-29 2020-06-29 Disease classification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111785385A true CN111785385A (en) 2020-10-16

Family

ID=72760342

Country Status (1)

Country Link
CN (1) CN111785385A (en)

Similar Documents

Publication Publication Date Title
CN111785385A (en) Disease classification method, device, equipment and storage medium
CN109189991B (en) Duplicate video identification method, device, terminal and computer readable storage medium
CN106951484B (en) Picture retrieval method and device, computer equipment and computer readable medium
CN110309874B (en) Negative sample screening model training method, data screening method and data matching method
EP3545470A1 (en) Method for training neuron network and active learning system
WO2022110444A1 (en) Dynamic prediction method and apparatus for cloud native resources, computer device and storage medium
CN111445968A (en) Electronic medical record query method and device, computer equipment and storage medium
EP2499569A1 (en) Clustering method and system
CN111368064A (en) Survey information processing method, device, equipment and storage medium
CN111008272A (en) Knowledge graph-based question and answer method and device, computer equipment and storage medium
CN111951943B (en) Intelligent triage method and device, electronic equipment and storage medium
CN111710364B (en) Method, device, terminal and storage medium for acquiring flora marker
CN110969172A (en) Text classification method and related equipment
CN111160049B (en) Text translation method, apparatus, machine translation system, and storage medium
CN114886404B (en) Electronic equipment, device and storage medium
CN111310834B (en) Data processing method and device, processor, electronic equipment and storage medium
Nakaya et al. Extraction of correlated gene clusters by multiple graph comparison
CN116955538B (en) Medical dictionary data matching method and device, electronic equipment and storage medium
CN114048136A (en) Test type determination method, device, server, medium and product
CN109635004A (en) Method, device and equipment for providing an object factory of a database
CN111816306A (en) Medical data processing method, and prediction model training method and device
CN116680401A (en) Document processing method, document processing device, apparatus and storage medium
CN110957046A (en) Medical health case knowledge matching method and system
CN114881124B (en) Causal relation graph construction method and device, electronic equipment and medium
CN114579626B (en) Data processing method, data processing device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination