AU2020333132B2 - Method and system for disease classification coding based on deep learning, and device and medium - Google Patents

Method and system for disease classification coding based on deep learning, and device and medium

Info

Publication number
AU2020333132B2
AU2020333132B2 AU2020333132A AU2020333132A AU2020333132B2 AU 2020333132 B2 AU2020333132 B2 AU 2020333132B2 AU 2020333132 A AU2020333132 A AU 2020333132A AU 2020333132 A AU2020333132 A AU 2020333132A AU 2020333132 B2 AU2020333132 B2 AU 2020333132B2
Authority
AU
Australia
Prior art keywords
standard
icd
disease diagnosis
name
medical record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
AU2020333132A
Other versions
AU2020333132A1 (en
Inventor
Dejie FENG
Fuyou Li
Xiaomei Liu
Bo SANG
Zhao Sun
Jun Wu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shan Dong Msun Health Technology Group Co Ltd
Original Assignee
Shan Dong Msun Health Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shan Dong Msun Health Technology Group Co Ltd
Publication of AU2020333132A1
Application granted
Publication of AU2020333132B2
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Pathology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Processing (AREA)

Abstract

A method and a system for disease classification coding based on deep learning, and a device and a medium. The method comprises: constructing an ICD code knowledge base; acquiring a patient discharge record to be disease classification coded, acquiring a disease diagnosis name from the patient discharge record, and determining whether the current disease diagnosis name is an ICD standard disease diagnosis name in the ICD code knowledge base; if not, extracting, for the current disease diagnosis name, the t most similar ICD standard disease diagnosis names from among all of the ICD standard disease diagnosis names, and entering a classification step; in the classification step, finding and outputting, by a model M, the ICD standard disease diagnosis name corresponding to a maximum probability value from among the t ICD standard disease diagnosis names; and, on the basis of the ICD standard disease diagnosis name output by the model M, finding the corresponding ICD standard disease code in the ICD code knowledge base and outputting it as the final coding result.

Description

Specification
METHOD FOR CLASSIFICATION AND CODING OF DISEASES BASED ON DEEP LEARNING, SYSTEM, DEVICE AND MEDIUM THEREOF
TECHNICAL FIELD
The present invention relates to the technical field of classification and coding of
diseases, in particular to a method for classification and coding of disease based on
deep learning, a system, a device and a medium thereof.
BACKGROUND
The information in this Background section is disclosed merely to improve
understanding of the overall background of the present invention, and is not
necessarily to be regarded as acknowledging or suggesting, in any form, that the
information constitutes prior art known to a person of ordinary skill in the art.
The International Classification of Diseases (ICD) is a system in which diseases are
classified and coded according to their characteristics and specific rules. With the
application of Diagnosis Related Groups (DRGs) in the field of medical
insurance, it is increasingly urgent to standardize the information on the first page
of the medical record. It is therefore increasingly important to fill in the ICD code of
the diagnosis on the first page of the medical record quickly and accurately.
In the process of realizing the present invention, the inventor finds the technical
problems with the prior art as shown below:
To help coders fill in the ICD code of a diagnosed disease accurately, doctors
need to write the standard diagnostic names of the ICD system in the
medical records. In practice, the diagnostic names written by doctors in
medical records are often inconsistent with the standard disease names in the ICD
system; coders therefore often need to adjust the diagnostic names
according to the detailed information about the patient's disease recorded in the medical record
in order to complete the ICD coding. Coders often need to manually consult a large amount of
information, such as the patient's admission, discharge and course records, to determine the
ICD standard diagnostic name of the disease, and then find the corresponding ICD code
according to that name. With standard codes for more than 55,000 diseases,
the latest ICD-11 system presents a huge challenge for coders.
At present, intelligent coding systems are based on recognizing the
diagnostic name of the disease filled in by the doctor. There is no software system that
can intelligently 'read and analyze' the patient's condition described in the medical
record so as to give the ICD standard code. In other words, the coding work cannot be
completed accurately from the disease diagnostic names filled in by doctors alone
if those names do not meet the standard of the ICD system.
SUMMARY
To address the shortcomings of existing technologies, the present invention
provides a method for classification and coding diseases, and a system, a device and a
medium thereof; it is of great significance to use artificial intelligence technology,
especially deep learning technology, to enable computers to 'read' medical records
and intelligently infer standard disease diagnosis based on patient medical records.
In the first aspect, the present invention provides a method for classification and
coding of diseases based on deep learning.
The method for classification and coding of diseases based on deep learning
includes:
constructing an ICD code knowledge base, wherein the ICD code knowledge base
includes ICD standard diagnostic names of diseases and the ICD standard codes of
diseases corresponding to the ICD standard diagnostic names of diseases;
obtaining a discharge note of a patient whose disease is to be classified and coded,
then obtaining a diagnostic name from the discharge note of the patient, and determining
whether the current diagnostic name of disease is an ICD standard diagnostic name of
disease in the ICD code knowledge base: if so, outputting the corresponding ICD
standard code of the disease directly;
if not, extracting the t ICD standard diagnostic names of disease with the highest
similarity to the current diagnostic name of disease from all the ICD standard
diagnostic names of diseases, then proceeding to the classification steps;
the classification steps including: finding and outputting, from the t ICD standard
diagnostic names of disease, the ICD standard diagnostic name of disease
corresponding to the maximum probability value;
according to the ICD standard diagnostic name of disease output by a model M,
finding the corresponding ICD standard code of disease in the ICD code knowledge
base and outputting it as the final coding result.
In the second aspect, the present invention also provides a system for
classification and coding of diseases based on deep learning;
the system for classification and coding of diseases based on deep learning,
including:
a knowledge base building module, being configured to build the ICD code
knowledge base; the ICD code knowledge base includes the ICD standard diagnostic
names of the diseases and the corresponding ICD standard codes of diseases;
an acquisition module, being configured to obtain a discharge note of a patient
whose disease is to be classified and coded, to obtain the diagnostic name of disease
from the discharge note of the patient, and to determine whether the current diagnostic name
of disease is an ICD standard diagnostic name of disease in the ICD code knowledge
base; if yes, outputting the corresponding ICD standard code of disease directly;
if not, extracting the t ICD standard diagnostic names of disease with the highest
similarity to the current diagnostic name of disease from all the ICD standard
diagnostic names of disease, then proceeding to the classification steps;
a classification module, being configured to find and output the ICD standard
diagnostic name of disease corresponding to a maximum value of probability from the
t ICD standard diagnostic names of disease;
an output module, being configured as: according to the ICD standard diagnostic
name of disease output by a model M, finding and outputting a corresponding ICD
standard code of disease from the ICD code knowledge base as a final coding result.
In the third aspect, the present invention also provides an electronic device,
including a memory, a processor and computer instructions stored in the memory and
running on the processor. When the computer instructions are run by the processor,
the steps of method in the first aspect are completed.
In the fourth aspect, the present invention also provides a computer readable
storage medium, being used for storing the computer instructions. When the computer
instructions are run by the processor, the steps of method in the first aspect are
completed.
The beneficial effects of the present invention are:
the current technology mainly analyzes the name of the disease diagnosis in the
patient's medical record, but does not analyze the other text information in the
patient's medical record in depth. Therefore, when the disease diagnosis in the
medical record goes wrong, the doctor cannot be alerted and warned. Through deep
learning technology, the present invention comprehensively analyzes the patient's
medical record text information, accurately gives the disease diagnostic name and
code, and can timely find the disease diagnosis problems in the medical record, and
significantly improve the accuracy of the disease diagnosis made by doctors and
disease coding.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings constituting a part of the present application are
used to provide a further understanding of the present application. The exemplary
examples of the present application and descriptions thereof are used to explain the
present application, and do not constitute an improper limitation of the present
application.
FIG.1 is a flow chart of a method in embodiment 1;
FIG.2 is a schematic diagram of the working principle of the deep learning
model in embodiment 1.
DETAILED DESCRIPTION
It should be noted that, the following detailed descriptions are all exemplary, and
are intended to provide further descriptions of the present invention. Unless otherwise
specified, all technical and scientific terms used herein have the same meanings as
those usually understood by a person of ordinary skill in the art to which the present
invention belongs.
It should be noted that the terms used herein are merely used for describing
specific implementations, and are not intended to limit exemplary implementations of
the present invention. As used herein, the singular form is also intended to include the
plural form unless the context clearly dictates otherwise. In addition, it should further
be understood that, terms 'comprise' and/or 'include' used in this specification
indicate that there are features, steps, operations, devices, components, and/or
combinations thereof.
In Embodiment 1, the present embodiment provides a method for classification
and coding of diseases based on deep learning;
as shown in FIG.1, the method for classification and coding of diseases based on
deep learning, including:
S1: constructing an ICD code knowledge base, the ICD code knowledge base
including ICD standard diagnostic names of diseases and the corresponding ICD standard
codes of diseases;
S2: obtaining a discharge note of a patient whose disease is to be classified and
coded, then obtaining a diagnostic name from the discharge note of the patient, and
determining whether the current diagnostic name of disease is an ICD standard
diagnostic name of disease in the ICD code knowledge base: if so, outputting the
corresponding ICD standard code of the disease directly;
if not, extracting the t ICD standard diagnostic names of disease with the highest
similarity to the current diagnostic name of disease from all the ICD standard
diagnostic names of disease, then proceeding to the classification steps;
S3: the classification steps including: finding and outputting the ICD standard
diagnostic name of disease corresponding to a maximum value of probability from the
t ICD standard diagnostic names of disease;
S4: according to the ICD standard diagnostic name of disease output by a model
M, finding a corresponding ICD standard code of disease from the ICD code
knowledge base as a final coding result to output;
The classification steps, as one or more embodiments, specifically including:
obtaining an admission situation and a process of diagnosis and treatment from
the discharge note of the patient; combining the admission situation and the process of
diagnosis and treatment to obtain a medical record to be classified, which is marked as u;
obtaining standard medical records corresponding to each of t ICD standard
diagnostic names of disease;
inputting the medical record to be classified u and the t standard medical records
into a pre-trained model M for deep learning; the model M finds and outputs the ICD
standard diagnostic name of disease according to a maximum value of probability of
similarity from the t ICD standard diagnostic names of disease.
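The control flow of steps S1 to S4 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `code_diagnosis`, the callables `top_t` and `model_m`, and the dictionary layouts are all hypothetical names chosen for this sketch, and the candidate search and the model M are passed in as plain functions.

```python
from typing import Callable, Dict, List


def code_diagnosis(
    diagnosis: str,
    record_u: str,
    knowledge_base: Dict[str, str],
    top_t: Callable[[str, List[str]], List[str]],
    model_m: Callable[[str, List[str]], List[float]],
    standard_records: Dict[str, str],
) -> str:
    """Return the ICD standard code for one diagnosis (steps S2 to S4).

    knowledge_base maps ICD standard names to ICD standard codes (step S1).
    top_t and model_m are stand-ins (assumptions) for the text similarity
    search and the pre-trained model M described in the text.
    """
    # S2: the diagnosis is already an ICD standard name, so output its code.
    if diagnosis in knowledge_base:
        return knowledge_base[diagnosis]

    # S2 (else branch): keep the t most similar ICD standard names.
    candidates = top_t(diagnosis, list(knowledge_base))

    # S3: model M scores the record u against one standard record per candidate
    # and the candidate with the maximum probability is selected.
    probabilities = model_m(record_u, [standard_records[c] for c in candidates])
    best = max(range(len(candidates)), key=probabilities.__getitem__)

    # S4: look up the code of the selected ICD standard name.
    return knowledge_base[candidates[best]]
```

Passing the similarity search and the model as parameters keeps the sketch independent of any particular similarity algorithm or network library.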
As one or more embodiments, the ICD code knowledge base includes ICD
standard diagnostic name of disease and corresponding ICD standard code of disease,
for example:
Table 1 ICD code knowledge base
ICD standard diagnostic name of disease    ICD standard code of disease
Typhopneumonia                             AO.051+
wherein, the first column of the table 1 of the ICD code knowledge base is the
ICD standard diagnostic name of disease and the second column of the table 1 of the
ICD code knowledge base is the ICD standard code of disease corresponding to the
name of disease diagnosis in the first column.
As one or more embodiments, the discharge note of patient, including one or
more of the following information: an admission situation I, a process of diagnosis
and treatment P and a discharge diagnosis result D.
As one or more embodiments, obtaining the diagnostic name of disease from the
patient's discharge note means obtaining it from the
discharge diagnosis result D in the patient's discharge note, that is, the diagnostic
name of disease recorded by the doctor after the final diagnosis, such as
typhopneumonia, pneumonia, etc.
As one or more embodiments, the t ICD standard diagnostic names of disease
with the highest similarity calculated by a text similarity algorithm are extracted from
all the ICD standard diagnostic names of disease.
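The text does not name a specific text similarity algorithm for this candidate extraction. As one possible sketch, the example below ranks the ICD standard names with `difflib.SequenceMatcher` from the Python standard library; the choice of that matcher, the function name `top_t_similar` and the default `t = 3` are assumptions made only for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def top_t_similar(diagnosis: str, standard_names: List[str], t: int = 3) -> List[str]:
    """Return the t standard names most similar to the doctor's diagnosis.

    Similarity here is SequenceMatcher.ratio(), an assumption standing in for
    the unspecified text similarity algorithm.
    """
    def similarity(name: str) -> float:
        return SequenceMatcher(None, diagnosis.lower(), name.lower()).ratio()

    # Highest similarity first; keep only the first t names.
    return sorted(standard_names, key=similarity, reverse=True)[:t]
```

Any other string similarity measure could be substituted for `similarity` without changing the rest of the flow.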
As one or more embodiments, the standard medical records
corresponding to each of the t ICD standard diagnostic names of disease are obtained, wherein the
steps of generating the standard medical records are:
for an ICD standard diagnostic name of disease, selecting n discharge notes
containing the current ICD standard diagnostic name of disease;
for each discharge note, extracting the admission situation and the process of
diagnosis and treatment to form a standard medical record.
In this way, the n discharge notes yield n standard medical records, that is, each
ICD standard diagnostic name of disease corresponds to n standard medical
records.
As one or more embodiments, obtaining the standard medical record
corresponding to each ICD standard diagnostic name of disease means randomly
selecting one standard medical record from the n standard medical records
corresponding to that ICD standard diagnostic name of disease.
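The generation and random selection of standard medical records can be sketched as below. The `DischargeNote` fields mirror the admission situation I, the process of diagnosis and treatment P and the discharge diagnosis result D named in the text; the class itself, the function names, the choice of the first n matching notes and the newline used to join I and P are assumptions of this sketch.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class DischargeNote:
    admission_situation: str        # I in the text
    treatment_process: str          # P in the text
    discharge_diagnoses: List[str]  # diagnostic names taken from D


def standard_records_for(name: str, notes: List[DischargeNote], n: int) -> List[str]:
    """Build up to n standard medical records for one ICD standard name."""
    # Assumption: take the first n notes whose discharge diagnosis contains the name.
    matching = [note for note in notes if name in note.discharge_diagnoses][:n]
    # A standard medical record combines the admission situation and the
    # process of diagnosis and treatment of one discharge note.
    return [f"{note.admission_situation}\n{note.treatment_process}" for note in matching]


def pick_standard_record(name: str, notes: List[DischargeNote], n: int,
                         rng: random.Random) -> str:
    """Randomly select one of the n standard records for the given name."""
    return rng.choice(standard_records_for(name, notes, n))
```

Note that `pick_standard_record` raises `IndexError` when no note contains the name, so a caller would need to handle names without any matching discharge note.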
As one or more embodiments, the model M for deep learning shown in FIG. 2,
including:
an input layer, being used to input the medical records to be classified and t
standard medical records;
a vectorization representation layer, being used to carry out coarse granularity
vectorization representation of the input medical records to be classified and t
standard medical records;
a bidirectional LSTM network based on attention mechanism, being used to
extract the features of the results of coarse granularity vectorization representation,
and to extract the results of fine granularity vectorization representation of each
medical record;
a pooling layer, being used to obtain, from the results of the fine granularity
vectorization representation of each medical record, the mutual vector V between the name of
disease diagnosis to be classified and the corresponding name of disease diagnosis in
each standard medical record;
a linear regression layer, being used to map the mutual vector V to a similarity relationship index p between the medical record to be classified and each standard medical record by using a linear regression algorithm;
a Softmax layer, being used to convert the similarity relationship index p into a probability through the Softmax function;
a cross entropy layer, being used to take the cross entropy of that probability and a 0-1 probability distribution q of the real results as the loss function of the model M;
an output layer, being used to output the ICD standard diagnostic name of disease corresponding to the maximum value of the probability.
As one or more embodiments, a training process of the model M for deep learning includes:
building a training set;
inputting the training set into the model M for deep learning;
training the model M for deep learning; when the number of training rounds reaches a set number, the training ends;
outputting the trained model M.
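The layers after the bidirectional LSTM (pooling, linear regression, Softmax and output) can be illustrated with a small pure-Python sketch. It takes the fine granularity token vectors as given inputs and does not implement the LSTM or the attention mechanism. The text does not say how the mutual vector V is formed from the pooled representations, so the element-wise product used here is an assumption, as are the function names and the use of average pooling.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_vectors: Sequence[Vector]) -> Vector:
    """Average pooling over the token axis of one medical record."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def mutual_vector(u_tokens: Sequence[Vector], s_tokens: Sequence[Vector]) -> Vector:
    # Assumption: V is the element-wise product of the two pooled vectors.
    return [a * b for a, b in zip(mean_pool(u_tokens), mean_pool(s_tokens))]


def similarity_index(v: Vector, c1: Vector, c2: float) -> float:
    """Linear regression layer: p = c1 . V + c2 (c1, c2 come from training)."""
    return sum(weight * x for weight, x in zip(c1, v)) + c2


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert similarity indices into probabilities (shifted for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(u_tokens: Sequence[Vector], standard_tokens: Sequence[Sequence[Vector]],
             names: Sequence[str], c1: Vector, c2: float) -> Tuple[str, List[float]]:
    """Output layer: return the name with the maximum probability."""
    scores = [similarity_index(mutual_vector(u_tokens, s), c1, c2)
              for s in standard_tokens]
    probabilities = softmax(scores)
    best = max(range(len(names)), key=probabilities.__getitem__)
    return names[best], probabilities
```

In a real system these layers would sit on top of a trained network in a deep learning framework; the sketch only shows how the t similarity indices turn into one selected name.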
As one or more embodiments, the specific steps of building the training set are:
step (31): establishing a standard medical record library:
step (311): for each standard diagnostic name of disease in the ICD code knowledge base, selecting n (for example, n=5) discrepant discharge notes containing the diagnostic name of disease; discharge notes are discrepant when the similarity between any two of them, calculated using the Levenshtein Distance, is less than a certain threshold, such as 0.5;
step (312): extracting the admission situation and the process of diagnosis and treatment from each discharge note selected in step (311) to form a standard medical record;
step (313): obtaining the standard medical records of each standard diagnostic name of disease in the ICD code knowledge base through step (311) and step (312), and forming the standard medical record library by collecting all the standard medical records;
step (32): building a medical record library to be trained:
step (321): selecting, from outside the standard medical record library, m (for example, m=100) discrepant discharge notes that contain any of the standard diagnostic names of disease in the ICD code knowledge base; discharge notes are discrepant when the similarity between any two of them, calculated using the Levenshtein Distance, is less than a certain threshold, such as 0.5;
step (322): forming a medical record to be trained by extracting the admission situation and the process of diagnosis and treatment from each discharge note selected in step (321);
step (323): obtaining the medical records to be trained through step (321) and step (322), and forming the medical record library to be trained by collecting all of them;
step (33): establishing a training set, a verification set and a test set:
step (331): for each diagnostic name d written by doctors in any medical record to be trained r in the medical record library to be trained, randomly selecting one of the n standard medical records corresponding to d as the correct medical record for training; then randomly selecting t-1 diagnoses (the value of t is the same as the value of t in step (202)) that differ from the diagnostic name d from the medical record library to be trained, and randomly selecting one of the n standard medical records corresponding to each of the t-1 diagnoses as a control medical record for training; all t-1 control medical records for training form the control medical record set for training; the medical record to be trained r corresponding to the diagnostic name d, the correct medical record for training and the control medical record set for training together are called the medical record set for training for the diagnostic name d in the medical record to be trained r; that is, this medical record set for training contains t+1 medical records;
step (332): through step (331), obtaining the medical record set for training for each medical record to be trained and each diagnostic name of disease in the medical record library to be trained; the obtained medical record sets for training form a medical record library for training covering all the names of disease diagnosis in all the medical records in the medical record library to be trained;
step (333): 60%, 20% and 20% of the medical record sets for training in the medical record library for training are assigned to the training set, the verification set and the test set respectively.
As one or more embodiments, the detailed training process of the model M for deep learning includes:
step (401): for a medical record set for training, assuming that the medical record to be trained is r, the corresponding diagnostic name of disease is d, the correct medical record for training is z, the control medical record set for training is Q, and each control medical record for training is q_i, the set {z} ∪ Q is written as set W;
step (402): vectorizing each medical record b in the medical record set for training; the words in the medical record are represented by word vectors using word segmentation and word vector conversion, and the word vector representation of the medical record is called C_1_b;
step (403): inputting the word vector representation C_1_b obtained in step (402) of each medical record in the medical record set for training into the bidirectional LSTM model based on attention mechanism, and using the output of that model as a new word vector representation of each medical record, which is called C_2_b;
step (404): passing the word vector representation C_2_r of the medical record to be trained r and the word vector representation (marked as C_2_z and C_2_q_i) of each medical record in the set W (marked as z and q_i respectively in step (401)) through the pooling layer (e.g. 'maximum pool', 'minimum pool' or 'average pool'), and obtaining a vector V (marked as V_r_z and V_r_q_i) that represents the correlation between the diagnostic name of disease d corresponding to the medical record to be trained r and the diagnostic name of disease corresponding to each medical record (z and q_i) in the set W;
step (405): mapping the vector V to an index p representing the similarity
relationship between the medical record to be trained r and each medical record in the
set W (that is, between r and z, and between r and q_i) by using the linear regression method:
$$p(r,z) = c_1 \cdot V_{r\_z} + c_2$$
$$p(r,q_i) = c_1 \cdot V_{r\_q_i} + c_2$$
wherein c_1 and c_2 are the parameters of the linear regression relation,
the specific values of which are determined by training;
step (406): converting p to a probability $\hat{p}$ via the softmax function:
$$\hat{p}(r,z) = \frac{\exp(p(r,z))}{\exp(p(r,z)) + \sum_j \exp(p(r,q_j))}$$
$$\hat{p}(r,q_i) = \frac{\exp(p(r,q_i))}{\exp(p(r,z)) + \sum_j \exp(p(r,q_j))}$$
wherein $\hat{p}(r,z)$ represents the probability that medical records r and z have the
same disease diagnosis, and $\hat{p}(r,q_i)$ has the corresponding meaning for r and q_i;
step (407): taking the cross entropy of $\hat{p}$ and the 0-1 probability distribution q of
the real result as the loss function of model M; because r and z have the same disease
diagnosis while r and q_i do not, the real result is:
$$q(r,z) = 1$$
$$\forall i,\; q(r,q_i) = 0$$
therefore, the loss function determined by the cross entropy of $\hat{p}$ and q is:
$$L(r,d) = -\log \hat{p}(r,z);$$
step (408): obtaining the parameters of the model M by training: minimizing the
loss function for each diagnostic name of disease d corresponding to each medical
record to be trained r in the training set is taken as the training target; the model M
obtained is saved once every 1000 rounds of training on the training set;
step (409): performing a verification on the verification set using the model M:
taking the diagnosis x with the largest probability value in the
probability distribution as the inference of the model M, verifying the inference
by comparing it with the real results, and then saving the verification accuracy
obtained;
step (410): after verification for 100 times, selecting the parameters corresponding to the model with the highest verification accuracy as the parameters of the final model M.
In Embodiment 2, the present embodiment also provides a system for classification and coding of diseases based on deep learning.
The system for classification and coding of diseases based on deep learning includes:
a knowledge base building module, being configured to build the ICD code knowledge base; the ICD code knowledge base includes the ICD standard diagnostic names of diseases and the corresponding ICD standard codes of diseases;
an acquisition module, being configured to obtain the discharge note of a patient whose disease is to be classified and coded, to obtain the diagnostic name of disease from the discharge note of the patient, and to determine whether the current diagnostic name of disease is an ICD standard diagnostic name of disease in the ICD code knowledge base; if yes, outputting the corresponding ICD standard code of disease directly; if not, extracting the t ICD standard diagnostic names of disease with the highest similarity to the current diagnostic name of disease from all the ICD standard diagnostic names of disease, then proceeding to the classification steps;
a classification module, being configured to find and output the ICD standard diagnostic name of disease corresponding to the maximum probability value from the t ICD standard diagnostic names of disease;
an output module, being configured to find, according to the ICD standard diagnostic name of disease output by a model M, the corresponding ICD standard code of disease in the ICD code knowledge base and to output it as the final coding result.
The functions of the modules in the system correspond one by one to the functions of the steps in Embodiment 1, and will not be described again here.
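The loss of step (407) and the model selection of step (410) in Embodiment 1 can be checked numerically with a short sketch. Because the real distribution q is 1 for the correct record z and 0 for every control record, the cross entropy reduces to the negative log of the Softmax probability assigned to z. The function names and the dictionary of saved checkpoints below are hypothetical, and the log-sum-exp form is a standard numerically stable way to compute the same quantity, not something stated in the text.

```python
import math
from typing import Dict, List


def training_loss(p_rz: float, p_rq: List[float]) -> float:
    """L(r, d): negative log Softmax probability of the correct record z.

    p_rz is the similarity index p(r, z); p_rq holds p(r, q_i) for the
    control records.
    """
    scores = [p_rz] + p_rq
    top = max(scores)
    # log(sum(exp(s))) computed in a shifted, overflow-safe form.
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - p_rz


def best_checkpoint(verification_accuracy: Dict[int, float]) -> int:
    """Step (410): pick the saved model with the highest verification accuracy.

    Keys are hypothetical training-round counters of the saved models.
    """
    return max(verification_accuracy, key=verification_accuracy.get)
```

With one control record and equal indices, the correct record gets probability 1/2, so the loss equals log 2; raising p(r, z) relative to the control indices lowers the loss, which is the behaviour the training target in step (408) relies on.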
In Embodiment 3, the present embodiment also provides an electronic device, including a memory, a processor and computer instructions stored in the memory and running on the processor. When the computer instructions are run by the processor, the steps of the method in Embodiment 1 are completed.
In Embodiment 4, the present embodiment also provides a computer readable storage medium used for storing computer instructions. When the computer instructions are run by a processor, the steps of the method in Embodiment 1 are completed.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. A person skilled in the art may make various alterations and variations to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
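The selection of discrepant discharge notes in steps (311) and (321) and the 60%/20%/20% split of step (333) can be sketched as follows. The text states only that a similarity calculated with the Levenshtein Distance must be below a threshold such as 0.5; normalizing the distance as 1 - distance / longer length, the greedy selection order and the function names are assumptions of this sketch.

```python
import random
from typing import List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed by dynamic programming, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    # Assumption: similarity = 1 - distance / length of the longer text.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def select_discrepant(notes: Sequence[str], n: int, threshold: float = 0.5) -> List[str]:
    """Greedily keep up to n notes whose pairwise similarity is below threshold."""
    chosen: List[str] = []
    for note in notes:
        if all(similarity(note, kept) < threshold for kept in chosen):
            chosen.append(note)
            if len(chosen) == n:
                break
    return chosen


def split_60_20_20(items: Sequence, rng: random.Random) -> Tuple[list, list, list]:
    """Shuffle and split into training, verification and test sets."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    first, second = len(shuffled) * 6 // 10, len(shuffled) * 8 // 10
    return shuffled[:first], shuffled[first:second], shuffled[second:]
```

The quadratic edit distance is adequate for a sketch; for full-length discharge notes a dedicated library or a cheaper similarity measure would likely be needed.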

Claims (7)

1. A method for classification and coding of diseases based on deep learning, executed on a computer, the method comprising:
constructing an International Classification of Diseases (ICD) code knowledge base, wherein the ICD code knowledge base comprises ICD standard names of disease diagnosis and the ICD standard codes of disease corresponding to the ICD standard names of disease diagnosis;
obtaining a discharge note of a patient whose disease is to be classified and coded, then obtaining a name of disease diagnosis from the discharge note of the patient;
determining whether the current name of disease diagnosis is an ICD standard name of disease diagnosis in the ICD code knowledge base: if so, directly outputting the ICD standard code of disease corresponding to the current name of disease diagnosis; if not, extracting, from all the ICD standard names of disease diagnosis, the t ICD standard names of disease diagnosis that are most similar to the current name of disease diagnosis, and then proceeding to a classification step;
wherein the classification step comprises: finding and outputting, by a deep learning model M, the ICD standard name of disease diagnosis corresponding to a maximum similarity probability among the t ICD standard names of disease diagnosis; and, according to the ICD standard name of disease diagnosis output by the deep learning model M, finding the corresponding ICD standard code of disease in the ICD code knowledge base and outputting it as a final coding result;
the classification step specifically comprising: obtaining an admission situation and a diagnosis and treatment process from the discharge note of the patient; obtaining a medical record to be classified u by combining the admission situation and the diagnosis and treatment process; obtaining a standard medical record corresponding to each one of the t ICD standard names of disease diagnosis; inputting the medical record to be classified u and the t standard medical records into the pre-trained deep learning model M; and finding and outputting, by the deep learning model M, the ICD standard name of disease diagnosis corresponding to the maximum similarity probability among the t ICD standard names of disease diagnosis;
wherein the t ICD standard names of disease diagnosis are obtained by calculating a similarity using a text similarity algorithm;
wherein generation of the standard medical record comprises the steps of: for one ICD standard name of disease diagnosis, selecting n discharge notes containing that ICD standard name of disease diagnosis; extracting the admission situation and the diagnosis and treatment process from each one of the n discharge notes to form a standard medical record; the n discharge notes likewise yielding n standard medical records, that is, the one ICD standard name of disease diagnosis corresponds to n standard medical records;
wherein the discharge notes are selected such that, for each name of disease diagnosis in the ICD code knowledge base, n discrepant discharge notes containing the name of disease diagnosis are selected; the discrepant discharge notes being discharge notes for which the similarity, calculated by Levenshtein distance, between each two of the n discharge notes is less than a certain threshold.
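As a non-limiting illustration, the lookup-then-retrieve portion of claim 1 might be sketched as follows. The claim only requires "a text similarity algorithm", so the use of `difflib.SequenceMatcher` here is an assumption, as are the function names, the `classify` stand-in for the deep learning model M, and the four-entry knowledge base (which uses real ICD-10 category codes purely as sample data).

```python
from difflib import SequenceMatcher

# Hypothetical miniature ICD code knowledge base: standard name -> standard code.
ICD_KB = {
    "Type 2 diabetes mellitus": "E11",
    "Type 1 diabetes mellitus": "E10",
    "Essential (primary) hypertension": "I10",
    "Acute appendicitis": "K35",
}

def text_similarity(a: str, b: str) -> float:
    """Stand-in text similarity in [0, 1]; the claim does not name the algorithm."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def top_t_candidates(name: str, kb: dict, t: int) -> list:
    """Return the t standard names most similar to a non-standard diagnosis name."""
    ranked = sorted(kb, key=lambda std: text_similarity(name, std), reverse=True)
    return ranked[:t]

def code_diagnosis(name: str, kb: dict, t: int, classify) -> str:
    """Exact match is coded directly; otherwise the candidates go to the classifier.

    `classify` is a placeholder for model M: it receives the candidate
    standard names and returns the one it judges most probable.
    """
    if name in kb:
        return kb[name]
    candidates = top_t_candidates(name, kb, t)
    return kb[classify(candidates)]
```

In this sketch the classifier sees only the candidate names; in the claimed method, model M instead compares the medical record to be classified u with the t standard medical records.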
2. The method of claim 1, wherein the deep learning model M comprises:
an input layer, used to input the medical record to be classified and the t standard medical records;
a vectorization representation layer, used for a coarse-granularity vectorization representation of the input medical record to be classified and the t standard medical records;
a bidirectional LSTM network based on an attention mechanism, used to extract features from the results of the coarse-granularity vectorization representation and to produce a fine-granularity vectorization representation of each medical record;
a pooling layer, used to obtain, from the fine-granularity vectorization representation of each medical record, a mutual vector V between the name of disease diagnosis to be classified and the name of disease diagnosis corresponding to each standard medical record;
a linear regression layer, used to map the mutual vector V to a similarity relationship index p between the medical record to be classified and each standard medical record by using a linear regression algorithm;
a Softmax layer, used to convert the similarity relationship index p into a probability p through a Softmax function;
a cross entropy layer, used to take the cross entropy of the probability p and a 0-1 probability distribution q of the real results as the loss function of the model M;
an output layer, used to output the ICD standard name of disease diagnosis corresponding to the maximum value of the probability p.
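As a non-limiting illustration, the last four layers of claim 2 (linear regression, Softmax, cross entropy, output) might be sketched as below. The vectorization, attention-based bidirectional LSTM and pooling layers are omitted: the mutual vectors V are assumed to be already computed, and the weights, bias and function names are hypothetical.

```python
import math

def linear_score(v, weights, bias):
    """Linear regression layer: map one mutual vector V to a similarity index."""
    return sum(w * x for w, x in zip(weights, v)) + bias

def softmax(scores):
    """Softmax layer: turn t similarity indices into t probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(q, p):
    """Cross entropy between the 0-1 distribution q and the probabilities p."""
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0)

def classify(mutual_vectors, weights, bias):
    """Output layer: index of the candidate with the maximum probability."""
    scores = [linear_score(v, weights, bias) for v in mutual_vectors]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

With t candidates, `classify` returns the position of the winning ICD standard name in the candidate list, and `cross_entropy` gives the loss against the one-hot distribution of the true answer.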
3. The method of claim 1, wherein a training process of the deep learning model M comprises: building a training set; inputting the training set into the deep learning model M; training the deep learning model M; ending the training when the number of training iterations reaches a set number; and outputting the trained deep learning model M.
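As a non-limiting illustration, training that stops after a set number of passes might look like the sketch below. The claim does not name an optimizer, so plain gradient descent is an assumption; the model is reduced to a single weight vector scoring precomputed candidate vectors, and all names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(samples, dim, epochs, lr=0.1):
    """Train a linear scorer with softmax cross entropy for a fixed number of passes.

    samples: list of (candidate_vectors, correct_index) pairs.
    Returns the learned weights and the mean loss of each pass.
    """
    w = [0.0] * dim
    history = []
    for _ in range(epochs):  # training ends when the set number of passes is reached
        total_loss = 0.0
        for vectors, y in samples:
            scores = [sum(wi * xi for wi, xi in zip(w, v)) for v in vectors]
            probs = softmax(scores)
            total_loss += -math.log(probs[y])
            # Gradient of the loss w.r.t. each score is (p_i - q_i).
            for i, v in enumerate(vectors):
                g = probs[i] - (1.0 if i == y else 0.0)
                for k in range(dim):
                    w[k] -= lr * g * v[k]
        history.append(total_loss / len(samples))
    return w, history
```

The returned `history` makes it easy to confirm that the loss falls across passes before the trained model is output.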
4. The method of claim 1, wherein building the training set comprises the specific steps of:
step (31): establishing a standard medical record library:
step (311): for each standard name of disease diagnosis in the ICD code knowledge base, selecting n discrepant discharge notes containing the name of disease diagnosis;
step (312): extracting the admission situation and the diagnosis and treatment process from each discharge note selected in step (311) to form a standard medical record;
step (313): for each standard name of disease diagnosis in the ICD code knowledge base, forming the standard medical record library for all the standard names of disease diagnosis from the standard medical records obtained in step (311) and step (312);
step (32): establishing a medical record library to be trained:
step (321): for each standard name of disease diagnosis in the ICD code knowledge base, selecting, from outside the standard medical record library, m discrepant discharge notes containing the standard name of disease diagnosis; the discrepant discharge notes being discharge notes for which the similarity, calculated by the Levenshtein distance, between each two discharge notes is less than the certain threshold;
step (322): extracting the admission situation and the diagnosis and treatment process from each discharge note selected in step (321) to form a medical record to be trained;
step (323): for each standard name of disease diagnosis in the ICD code knowledge base, forming the medical record library to be trained for all the standard names of disease diagnosis from the medical records to be trained obtained in step (321) and step (322);
step (33): establishing a training set, a verification set and a test set:
step (331): for each medical record to be trained r in the medical record library to be trained, and for each name of disease diagnosis d written by the doctor in the medical record to be trained r: randomly selecting one of the n standard medical records corresponding to the name of disease diagnosis d as a correct medical record for training; then randomly selecting, from the medical record library to be trained, t-1 diagnoses that differ from the name of disease diagnosis d; randomly selecting one of the n standard medical records corresponding to each one of the t-1 diagnoses as a control medical record for training, the t-1 control medical records for training forming a control medical record set for training; and combining the medical record to be trained r corresponding to the name of disease diagnosis d, the correct medical record for training and the control medical record set for training into a medical record set for training of the name of disease diagnosis d in the medical record to be trained r, which thus contains t+1 medical records in total;
step (332): for each medical record to be trained in the medical record library to be trained and each name of disease diagnosis in each medical record to be trained, forming, from the medical record sets for training obtained in step (331), a medical record library for training covering the names of disease diagnosis in all the medical records in the medical record library to be trained;
step (333): dividing the medical record sets for training in the medical record library for training into the training set, the verification set and the test set according to a set proportion.
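As a non-limiting illustration, steps (331) and (333) of claim 4 might be sketched as follows. The standard medical record library is assumed to be a dictionary from diagnosis name to its n standard records. The t-1 differing diagnoses are drawn from that dictionary's keys for simplicity, whereas the claim draws them from the medical record library to be trained. The 0.8/0.1/0.1 split and all names are hypothetical.

```python
import random

def build_training_group(record, diagnosis, standard_library, t, rng):
    """Step (331): one record to be trained plus t standard records (t + 1 in total).

    Returns [record, correct_record, control_1, ..., control_{t-1}].
    """
    correct = rng.choice(standard_library[diagnosis])
    other_diagnoses = [d for d in standard_library if d != diagnosis]
    control_diagnoses = rng.sample(other_diagnoses, t - 1)
    controls = [rng.choice(standard_library[d]) for d in control_diagnoses]
    return [record, correct] + controls

def split_groups(groups, ratios=(0.8, 0.1, 0.1)):
    """Step (333): divide the groups into training, verification and test sets."""
    groups = list(groups)
    n_train = int(len(groups) * ratios[0])
    n_val = int(len(groups) * ratios[1])
    return (groups[:n_train],
            groups[n_train:n_train + n_val],
            groups[n_train + n_val:])
```

Passing an explicit `random.Random` instance as `rng` keeps the random selection reproducible, which helps when the verification and test sets must stay fixed across experiments.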
5. A system for classification and coding of diseases based on deep learning on a computer, the system comprising:
a knowledge base building module, configured to construct an ICD code knowledge base, wherein the ICD code knowledge base comprises ICD standard names of disease diagnosis and the ICD standard codes of disease corresponding to the ICD standard names of disease diagnosis;
an acquisition module, configured to obtain a discharge note of a patient whose disease is to be classified and coded, to obtain a name of disease diagnosis from the discharge note of the patient, and to determine whether the current name of disease diagnosis is an ICD standard name of disease diagnosis in the ICD code knowledge base; if so, the ICD standard code of disease corresponding to the current name of disease diagnosis is output directly; if not, the t ICD standard names of disease diagnosis that are most similar to the current name of disease diagnosis are extracted from all the ICD standard names of disease diagnosis, and processing proceeds to a classification step;
a classification module that is a deep learning model M, configured to find and output the ICD standard name of disease diagnosis corresponding to a maximum similarity probability among the t ICD standard names of disease diagnosis;
an output module, configured to find, according to the ICD standard name of disease diagnosis output by the classification module, the corresponding ICD standard code of disease in the ICD code knowledge base and to output it as a final coding result;
the classification module being specifically configured for: obtaining an admission situation and a diagnosis and treatment process from the discharge note of the patient; obtaining a medical record to be classified u by combining the admission situation and the diagnosis and treatment process; obtaining a standard medical record corresponding to each one of the t ICD standard names of disease diagnosis; inputting the medical record to be classified u and the t standard medical records into the pre-trained deep learning model M; and finding and outputting, by the deep learning model M, the ICD standard name of disease diagnosis corresponding to the maximum similarity probability among the t ICD standard names of disease diagnosis;
wherein the t ICD standard names of disease diagnosis are obtained by calculating a similarity using a text similarity algorithm;
a generation module for the standard medical record, configured to: for one ICD standard name of disease diagnosis, select n discharge notes containing that ICD standard name of disease diagnosis; extract the admission situation and the diagnosis and treatment process from each one of the n discharge notes to form a standard medical record; and likewise obtain n standard medical records corresponding to the n discharge notes, that is, the one ICD standard name of disease diagnosis corresponds to n standard medical records;
wherein the discharge notes are selected such that, for each name of disease diagnosis in the ICD code knowledge base, n discrepant discharge notes containing the name of disease diagnosis are selected; the discrepant discharge notes being discharge notes for which the similarity, calculated by Levenshtein distance, between each two of the n discharge notes is less than a certain threshold.
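As a non-limiting illustration, selecting n discrepant discharge notes might be sketched as below. The claims state only that the similarity is calculated by Levenshtein distance, so normalizing the distance by the longer note's length is an assumption, as are the greedy selection order and the function names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical notes, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def select_discrepant(notes, n, threshold):
    """Greedily keep notes whose similarity to every kept note is below the threshold."""
    chosen = []
    for note in notes:
        if all(similarity(note, kept) < threshold for kept in chosen):
            chosen.append(note)
            if len(chosen) == n:
                break
    return chosen
```

A lower threshold yields a more varied set of standard medical records per diagnosis, at the cost of needing more source notes to reach n.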
6. An electronic device, comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the steps of the method of any of claims 1-4 are completed when the computer instructions are run by the processor.
7. A computer readable storage medium, used for storing computer instructions, wherein the steps of the method of any of claims 1-4 are completed when the computer instructions are run by a processor.
AU2020333132A 2019-08-20 2020-10-19 Method and system for disease classification coding based on deep learning, and device and medium Active AU2020333132B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201910768516.3A CN110491465B (en) 2019-08-20 2019-08-20 Disease classification coding method, system, device and medium based on deep learning
CN201910768516.3 2019-08-20
PCT/CN2020/121962 WO2021032219A2 (en) 2019-08-20 2020-10-19 Method and system for disease classification coding based on deep learning, and device and medium

Publications (2)

Publication Number Publication Date
AU2020333132A1 AU2020333132A1 (en) 2021-11-18
AU2020333132B2 AU2020333132B2 (en) 2023-07-13

Family

ID=68552073

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020333132A Active AU2020333132B2 (en) 2019-08-20 2020-10-19 Method and system for disease classification coding based on deep learning, and device and medium

Country Status (3)

Country Link
CN (1) CN110491465B (en)
AU (1) AU2020333132B2 (en)
WO (1) WO2021032219A2 (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110491465B (en) * 2019-08-20 2020-09-15 山东众阳健康科技集团有限公司 Disease classification coding method, system, device and medium based on deep learning
CN111026841B (en) * 2019-11-27 2023-04-18 云知声智能科技股份有限公司 Automatic coding method and device based on retrieval and deep learning
CN110991170B (en) * 2019-12-05 2021-10-12 清华大学 Chinese disease name intelligent standardization method and system based on electronic medical record information
CN110911015B (en) * 2019-12-05 2022-12-02 清华大学 Disease name standardization rapid calculation method based on profile implicit Markov model
CN111046882B (en) * 2019-12-05 2023-01-24 清华大学 Disease name standardization method and system based on profile hidden Markov model
CN111046672B (en) * 2019-12-11 2020-07-14 山东众阳健康科技集团有限公司 Multi-scene text abstract generation method
CN110895580B (en) * 2019-12-12 2020-07-07 山东众阳健康科技集团有限公司 ICD operation and operation code automatic matching method based on deep learning
CN111180062A (en) * 2019-12-12 2020-05-19 山东众阳健康科技集团有限公司 Disease classification coding intelligent recommendation method based on original diagnosis data
CN112992303A (en) * 2019-12-15 2021-06-18 苏州市爱生生物技术有限公司 Human phenotype standard expression extraction method
CN111192667A (en) * 2019-12-16 2020-05-22 山东众阳健康科技集团有限公司 Method for prompting operation code based on intelligent operation
CN113012774A (en) * 2019-12-18 2021-06-22 医渡云(北京)技术有限公司 Automatic medical record encoding method and device, electronic equipment and storage medium
CN111370084B (en) * 2020-02-07 2023-10-03 山东师范大学 BiLSTM-based electronic health record representation learning method and system
CN111402974A (en) * 2020-03-06 2020-07-10 西南交通大学 Electronic medical record ICD automatic coding method based on deep learning
CN111462896B (en) * 2020-03-31 2023-04-18 重庆大学 Real-time intelligent auxiliary ICD coding system and method based on medical record
CN111540468B (en) * 2020-04-21 2023-05-16 重庆大学 ICD automatic coding method and system for visualizing diagnostic reasons
CN112686306B (en) * 2020-12-29 2023-03-24 山东众阳健康科技集团有限公司 ICD operation classification automatic matching method and system based on graph neural network
CN114983352A (en) * 2021-03-01 2022-09-02 浙江远图互联科技股份有限公司 Method and device for identifying new coronary pneumonia based on attention mechanism
CN113031943A (en) * 2021-03-29 2021-06-25 北京大米科技有限公司 Code generation method, device, storage medium and electronic equipment
CN113488183B (en) * 2021-06-30 2023-10-31 吾征智能技术(北京)有限公司 Heating disease multi-mode feature fusion cognitive system, equipment and storage medium
CN113569996B (en) * 2021-08-30 2024-05-07 平安医疗健康管理股份有限公司 Method, device, equipment and storage medium for classifying medical records information
CN113779179B (en) * 2021-09-29 2024-02-09 北京雅丁信息技术有限公司 ICD intelligent coding method based on deep learning and knowledge graph
CN114388085B (en) * 2021-11-23 2022-09-09 皖南医学院第一附属医院(皖南医学院弋矶山医院) Real-time intelligent auxiliary ICD coding method and system based on medical record
CN115964472A (en) * 2021-12-03 2023-04-14 奥码哈(杭州)医疗科技有限公司 ICD coding method, ICD coding query method, coding system and query system
CN115081668A (en) * 2021-12-29 2022-09-20 南方医科大学深圳医院 Disease category score prediction system and method based on disease diagnosis
CN117271804B (en) * 2023-11-21 2024-03-01 之江实验室 Method, device, equipment and medium for generating common disease feature knowledge base

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107731269A (en) * 2017-10-25 2018-02-23 山东众阳软件有限公司 Disease code method and system based on raw diagnostic data and patient file data
CN107833605A (en) * 2017-03-14 2018-03-23 北京大瑞集思技术有限公司 A kind of coding method, device, server and the system of hospital's medical record information
CN109785959A (en) * 2018-12-14 2019-05-21 平安医疗健康管理股份有限公司 A kind of disease code method and apparatus

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2229588A4 (en) * 2007-11-14 2011-05-25 Medtronic Inc Diagnostic kits and methods for scd or sca therapy selection
US20180211723A1 (en) * 2015-06-15 2018-07-26 Katie Coles Personalized nutritional and metabolic modification system
CN105069124B (en) * 2015-08-13 2018-06-15 易保互联医疗信息科技(北京)有限公司 A kind of International Classification of Diseases coding method of automation and system
CN107705839B (en) * 2017-10-25 2020-06-26 山东众阳软件有限公司 Disease automatic coding method and system
CN110019839B (en) * 2018-01-03 2021-11-05 中国科学院计算技术研究所 Medical knowledge graph construction method and system based on neural network and remote supervision
CN109273062A (en) * 2018-08-09 2019-01-25 北京爱医声科技有限公司 ICD intelligence Auxiliary Encoder System
CN109697285B (en) * 2018-12-13 2022-06-21 中南大学 Hierarchical BilSt Chinese electronic medical record disease coding and labeling method for enhancing semantic representation
CN109710761A (en) * 2018-12-21 2019-05-03 中国标准化研究院 The sentiment analysis method of two-way LSTM model based on attention enhancing
CN109994216A (en) * 2019-03-21 2019-07-09 上海市第六人民医院 A kind of ICD intelligent diagnostics coding method based on machine learning
CN109993227B (en) * 2019-03-29 2021-09-24 京东方科技集团股份有限公司 Method, system, apparatus and medium for automatically adding international disease classification code
CN110491465B (en) * 2019-08-20 2020-09-15 山东众阳健康科技集团有限公司 Disease classification coding method, system, device and medium based on deep learning

Also Published As

Publication number Publication date
WO2021032219A3 (en) 2021-04-15
WO2021032219A2 (en) 2021-02-25
AU2020333132A1 (en) 2021-11-18
CN110491465B (en) 2020-09-15
CN110491465A (en) 2019-11-22

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)