CN111814432A - Method and apparatus for determining standard diagnostic codes for diseases
- Publication number
- CN111814432A (application number CN202010631296.2A)
- Authority
- CN
- China
- Prior art keywords
- diagnostic
- standard
- description
- disease
- code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The present disclosure relates to a method of determining a standard diagnostic code for a disease, comprising: receiving a diagnostic code and a diagnostic description of a disease from an external entity; matching, by natural language processing, the diagnostic description against a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions; determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold; determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code; obtaining a maximum value of second similarities between the standard diagnostic descriptions in the subset and the diagnostic description; and, in the case that the maximum value of the second similarities is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to that maximum value as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
Description
Technical Field
The present disclosure relates to methods and apparatus for determining a standard diagnostic code for a disease.
Background
Insurance institutions often need to obtain medical information about an insured person from medical institutions during insurance claim settlement. The medical information provided by a medical institution typically includes diagnostic codes and diagnostic descriptions. The insurance institution decides whether or not to pay the insured person's claim based on the medical information provided by the medical institution.
Disclosure of Invention
According to one aspect of the present disclosure, there is provided a method of determining a standard diagnostic code for a disease, comprising: receiving a diagnostic code and a diagnostic description of a disease from an external entity; matching, by natural language processing, the diagnostic description against a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions; determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold; determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code; obtaining a maximum value of second similarities between the standard diagnostic descriptions in the subset and the diagnostic description; and, in the case that the maximum value of the second similarities is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to that maximum value as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
In some embodiments according to the present disclosure, in a case where it is determined that the maximum value of the second similarities is less than the second threshold, a predetermined number of standard diagnostic descriptions having the highest second similarities in the subset are selected as candidate standard diagnostic descriptions; and one of the candidate standard diagnostic descriptions is selected as the standard diagnostic description of the disease.
In some embodiments according to the disclosure, selecting one of the candidate standard diagnostic descriptions as the standard diagnostic description of the disease comprises: manually selecting one of the candidate standard diagnostic descriptions as the standard diagnostic description of the disease.
In some embodiments according to the present disclosure, the method further comprises: recording a correspondence between the selected standard diagnostic description and the diagnostic description.
In some embodiments according to the present disclosure, the method further comprises: receiving another diagnostic description of the disease from an external entity; and determining a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
In some embodiments according to the disclosure, the natural language processing uses a model obtained by machine learning, the method further comprising: training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
In some embodiments according to the present disclosure, the model comprises: a word vectorization model, a recurrent neural network model, or a long short-term memory network model.
In some embodiments according to the present disclosure, the second threshold is less than the first threshold.
In some embodiments according to the present disclosure, determining the subset of the plurality of standard diagnostic descriptions from the diagnostic code comprises: determining a set of standard diagnostic codes from the diagnostic codes; and obtaining standard diagnostic descriptions corresponding to the standard diagnostic codes in the set as a subset of the plurality of standard diagnostic descriptions.
In some embodiments according to the present disclosure, the diagnostic code is an International Classification of Diseases (ICD) standard code, and the standard diagnostic codes contained in the major class or the minor class to which the diagnostic code belongs are taken as the set of standard diagnostic codes.
According to another aspect of the present disclosure, there is provided an apparatus for determining a standard diagnostic code for a disease, comprising a memory and a processor, wherein, when the processor executes instructions stored in the memory, the processor is configured to: receive a diagnostic code and a diagnostic description of a disease from an external entity; match, by natural language processing, the diagnostic description against a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions; determine that a maximum value of the plurality of first similarities is less than a predetermined first threshold; determine a subset of the plurality of standard diagnostic descriptions from the diagnostic code; obtain a maximum value of second similarities between the standard diagnostic descriptions in the subset and the diagnostic description; and, in the case that the maximum value of the second similarities is determined to be greater than or equal to a predetermined second threshold, take the standard diagnostic description corresponding to that maximum value as the standard diagnostic description of the disease, and take the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
In some embodiments according to the disclosure, the processor is further configured to: select, if it is determined that the maximum value of the second similarities is less than the second threshold, a predetermined number of standard diagnostic descriptions having the highest second similarities in the subset as candidate standard diagnostic descriptions; and select one of the candidate standard diagnostic descriptions as the standard diagnostic description of the disease.
In some embodiments according to the disclosure, the processor is further configured to: receive a manual selection of one of the candidate standard diagnostic descriptions as the standard diagnostic description of the disease.
In some embodiments according to the disclosure, the processor is further configured to: record a correspondence between the selected standard diagnostic description and the diagnostic description.
In some embodiments according to the disclosure, the processor is further configured to: receive another diagnostic description of the disease from an external entity; and determine a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
In some embodiments according to the disclosure, the natural language processing uses a model obtained by machine learning, and the processor is further configured to: train the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
In some embodiments according to the present disclosure, the model comprises: a word vectorization model, a recurrent neural network model, or a long short-term memory network model.
In some embodiments according to the present disclosure, the second threshold is less than the first threshold.
In some embodiments according to the disclosure, the processor is further configured to: determine a set of standard diagnostic codes from the diagnostic code; and obtain the standard diagnostic descriptions corresponding to the standard diagnostic codes in the set as the subset of the plurality of standard diagnostic descriptions.
In some embodiments according to the disclosure, the diagnostic code is an International Classification of Diseases (ICD) standard code, and the processor is further configured to: take the standard diagnostic codes contained in the major class or the minor class to which the diagnostic code belongs as the set of standard diagnostic codes.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon instructions that, when executed by a processor, are configured to perform the above-described method of determining a standard diagnostic code for a disease.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
Fig. 1 shows a flow diagram of a method of determining a standard diagnostic code for a disease according to an embodiment of the present disclosure.
Fig. 2 is a block diagram of a computing device according to an example embodiment of the present disclosure.
Note that in the embodiments described below, the same reference numerals are used in common between different drawings to denote the same portions or portions having the same functions, and a repetitive description thereof will be omitted. In this specification, like reference numerals and letters are used to designate like items, and therefore, once an item is defined in one drawing, further discussion thereof is not required in subsequent drawings.
For ease of understanding, the positions, sizes, ranges, and the like of the respective structures shown in the drawings sometimes do not indicate actual positions, sizes, ranges, and the like. Therefore, the disclosed invention is not limited to the positions, dimensions, ranges, etc., disclosed in the drawings.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
Diagnostic codes for diseases provided by different medical institutions may be based on different coding standards. Even when the same coding standard is used, the same disease may be assigned different codes owing to differences in the skill and knowledge of the staff doing the coding. In addition, diagnostic descriptions of diseases are often written by physicians in medical institutions. Different physicians may use different descriptions, and the same disease may also be described using different terms. This increases the complexity of the claim settlement work performed by the insurance institution.
In some embodiments of the present disclosure, diagnostic codes and diagnostic descriptions of diseases provided by external entities (e.g., medical institutions) may be converted into the insurance institution's own standard diagnostic codes and standard diagnostic descriptions, thereby facilitating the insurance institution's processing of underwriting and claims.
Fig. 1 shows a flow diagram of a method of determining a standard diagnostic code for a disease according to an embodiment of the present disclosure.
As shown in Fig. 1, first, the insurance institution may receive medical information about an insured person from an external institution (e.g., a medical institution) (step 101). Such medical information may include diagnostic codes and diagnostic descriptions of diseases.
Diagnostic coding is a method of assigning codes to diseases according to their type. One commonly used system is the International Classification of Diseases (ICD), of which two versions, ICD-10 and ICD-11, are currently in wide use. Under this coding, each disease is assigned a seven-character code, in which the first three characters are referred to as the major class and the first four characters as the minor class. For example, the code A01.000B represents typhoid fever. The first three characters of the code, A01, represent the major class of typhoid and paratyphoid fevers, and the first four characters, A01.0, represent the minor class of typhoid fever.
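The major/minor class split described above can be illustrated with a minimal sketch. The function name `icd_classes` and the normalization rules (upper-casing, removing stray spaces) are assumptions made for illustration and are not part of the disclosure.

```python
def icd_classes(code: str) -> tuple[str, str]:
    """Split a code such as 'A01.000B' into (major class, minor class).

    Assumed layout: three characters, a dot, then further characters.
    """
    normalized = code.strip().upper().replace(" ", "")
    head, _, tail = normalized.partition(".")
    if len(head) != 3 or not tail:
        raise ValueError(f"unexpected code layout: {code!r}")
    major = head                 # first three characters, e.g. 'A01'
    minor = f"{head}.{tail[0]}"  # first four characters, e.g. 'A01.0'
    return major, minor


major, minor = icd_classes("A01.000B")
print(major, minor)  # A01 A01.0
```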
A diagnostic description is a description of the condition of a patient. The diagnostic description may generally include information such as a description of the condition. For example, in some exemplary embodiments according to the present disclosure, the diagnosis description may be a disease name corresponding to the diagnosis code.
After the insurance institution receives the diagnostic code and diagnostic description from the medical institution, the received diagnostic description is matched against the insurance institution's standard diagnostic descriptions (step 102).
Diagnostic codes from medical institutions may in many cases be inaccurate or erroneous. Therefore, the insurance institution needs to determine the correct diagnostic code (i.e., the standard diagnostic code) from the diagnostic description provided by the medical institution. In some exemplary embodiments, the standard diagnostic codes and the standard diagnostic descriptions corresponding thereto may be stored in a database. The insurance institution may match the diagnostic description provided by the medical institution against the standard diagnostic descriptions in the database to find the standard diagnostic description that is most similar to the diagnostic description provided by the medical institution. The standard diagnostic code corresponding to that description may then be used as the diagnostic code for the patient's disease.
In some embodiments according to the present disclosure, the diagnostic description from the medical institution may be matched against a plurality of standard diagnostic descriptions in a database of the insurance institution through natural language processing. In the natural language processing, the diagnostic description provided by the medical institution first needs to be segmented into words. To improve the accuracy of the word segmentation, a medical word-segmentation lexicon, a medical synonym lexicon, a library of segmented medical diagnostic description sentences, and the like may be established specifically for the medical field, so that the diagnostic description can be segmented accurately. The segmented diagnostic description may then be semantically matched against the standard diagnostic descriptions in the database.
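As a rough illustration of lexicon-based word segmentation, the sketch below uses forward maximum matching. The disclosure does not name a segmentation algorithm, so this technique, the function name `forward_max_match`, and the tiny sample lexicon are all assumptions. The sample text is Chinese because the segmentation step applies to unspaced text.

```python
def forward_max_match(text: str, lexicon: set[str]) -> list[str]:
    """Segment text greedily, always taking the longest lexicon entry
    that starts at the current position (single characters as fallback)."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical medical lexicon: "influenza", "Haemophilus influenzae", "sepsis".
medical_lexicon = {"流感", "流感嗜血杆菌", "败血症"}
print(forward_max_match("流感嗜血杆菌败血症", medical_lexicon))
# ['流感嗜血杆菌', '败血症']
```

Without the domain entry for the full organism name, the same text would be split after the shorter word for "influenza", which is the kind of error a dedicated medical lexicon is meant to avoid.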
Using semantic matching processing, the similarity (i.e., the first similarity) between each standard diagnostic description in the database and the diagnostic description from the medical institution can be obtained. In many cases, the standard diagnostic description with the highest similarity may be taken as the match for the diagnostic description from the medical institution, and that standard diagnostic description and the corresponding standard diagnostic code may be output as the result. For example, a predetermined threshold (first threshold) may be set. In the case where the maximum value of the similarity is greater than or equal to the first threshold, the standard diagnostic description and the standard diagnostic code corresponding to the maximum value may be used directly as the standard diagnostic description and the standard diagnostic code corresponding to the diagnostic description and the diagnostic code provided by the medical institution.
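The first-stage decision can be sketched as follows. The disclosure relies on a semantic matching model; here `difflib.SequenceMatcher` from the Python standard library is swapped in as a purely character-based stand-in, so the scores will not reproduce the values in the examples. The names `similarity` and `first_stage_match` and the sample database entries are hypothetical.

```python
from difflib import SequenceMatcher

FIRST_THRESHOLD = 0.95  # example value given in the description


def similarity(a: str, b: str) -> float:
    # Stand-in for the semantic matching model (assumption).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def first_stage_match(description, standards, threshold=FIRST_THRESHOLD):
    """standards maps standard code -> standard description.

    Returns (accepted, best_code, scores), where scores holds the
    first similarity for every standard description.
    """
    scores = {code: similarity(description, text)
              for code, text in standards.items()}
    best_code = max(scores, key=scores.get)
    return scores[best_code] >= threshold, best_code, scores


standards = {
    "M30.3": "mucocutaneous lymph node syndrome [Kawasaki disease]",
    "K72.0": "acute and subacute liver failure",
}
accepted, code, _ = first_stage_match(
    "mucocutaneous lymph node syndrome [Kawasaki disease]", standards)
print(accepted, code)  # True M30.3
```

Returning the full score dictionary alongside the decision lets a later step reuse the first similarities instead of recomputing them.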
However, in the case where the similarity of the standard diagnostic description with the highest similarity is still low, for example, the similarity is smaller than a predetermined threshold (first threshold) (step 103), the standard diagnostic description and the standard diagnostic code obtained at this time may not be accurate. Therefore, when the similarity of the standard diagnosis description with the highest similarity is smaller than the predetermined first threshold, the diagnosis code provided by the medical institution needs to be further utilized.
A subset of the standard diagnostic descriptions in the database may be determined from the diagnostic code provided by the medical institution (step 104). For example, in the case where the diagnostic code follows an international disease classification (e.g., ICD-10 or ICD-11), the first three or the first four characters of the diagnostic code represent the classification of the disease: the first three characters represent the major class and the first four characters represent the minor class. Accordingly, the standard diagnostic descriptions corresponding to all standard diagnostic codes in the major class or minor class of the diagnostic code provided by the medical institution may be used as the subset.
Then, the similarity (second similarity) between each standard diagnostic description in the subset and the diagnostic description provided by the medical institution may be obtained (step 105). In step 102 above, the similarity (first similarity) between each standard diagnostic description in the database and the diagnostic description provided by the medical institution has already been calculated. In some embodiments according to the present disclosure, these similarities may be stored, either transiently or persistently, so that they do not have to be recalculated in step 105; it suffices to read the required similarities from those already stored.
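A minimal sketch of the subset selection in step 104 follows, assuming the standard diagnostic codes share the same prefix layout as the external code. The function name `subset_by_class`, the `level` parameter, and the sample entries are hypothetical.

```python
def subset_by_class(external_code, standards, level="major"):
    """Keep the standard entries whose code falls in the same major
    (first three characters) or minor (first four characters) class."""
    head, _, tail = external_code.replace(" ", "").upper().partition(".")
    prefix = head if level == "major" else f"{head}.{tail[:1]}"
    return {code: text for code, text in standards.items()
            if code.upper().startswith(prefix)}


# Illustrative database excerpt (entries chosen for the example only).
standards = {
    "A41.3": "Haemophilus influenzae sepsis",
    "A41.9": "sepsis, unspecified",
    "J14.x00": "Haemophilus influenzae pneumonia",
}
print(sorted(subset_by_class("A41.300", standards)))           # ['A41.3', 'A41.9']
print(sorted(subset_by_class("A41.300", standards, "minor")))  # ['A41.3']
```

Choosing the major class gives a wider subset and tolerates an external code whose fourth character is wrong; the minor class is stricter.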
Of course, in some embodiments according to the present disclosure, the similarity of each standard diagnostic description in the subset to the diagnostic description provided by the medical institution may be calculated again in step 105. For example, the similarity (second similarity) may be recalculated using a different segmentation model or semantic analysis model than step 102.
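The two options for step 105 (reading stored first similarities or recalculating with a different model) might look like the sketch below. The function name `second_similarities` and the optional `rescorer` callback are assumptions for illustration; the scores 0.9488 and 0.9366 are the values quoted in example 3.

```python
def second_similarities(subset, first_scores, description, rescorer=None):
    """subset maps standard code -> standard description.

    With no rescorer, the stored first similarities are reused;
    otherwise each subset entry is scored again by the given model.
    """
    if rescorer is None:
        return {code: first_scores[code] for code in subset}
    return {code: rescorer(description, text) for code, text in subset.items()}


first_scores = {"J14.x00": 0.9488, "A41.3": 0.9366}
subset = {"A41.3": "Haemophilus influenzae sepsis"}
print(second_similarities(subset, first_scores, "Haemophilus influenzae sepsis"))
# {'A41.3': 0.9366}
```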
Next, the maximum value of the second similarities may be compared with a predetermined threshold (second threshold) (step 106).
If the maximum value of the second similarity is greater than or equal to the second threshold, the standard diagnostic description and the standard diagnostic code corresponding to the second similarity having the maximum value may be used as the standard diagnostic description and the standard diagnostic code corresponding to the diagnostic description and the diagnostic code of the medical institution.
If the maximum value of the second similarities is still less than the second threshold, a predetermined number of standard diagnostic descriptions may be taken as candidate standard diagnostic descriptions (step 107). For example, the predetermined number of standard diagnostic descriptions in the subset having the highest second similarities may be taken as the candidate standard diagnostic descriptions.
Further, in some embodiments according to the present disclosure, the second threshold may be less than the first threshold. For example, when the first threshold is 0.95, the second threshold may be set to 0.85.
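Steps 106 and 107 can be sketched as a single decision function. The name `second_stage`, the returned dictionary layout, and the default of two candidates are assumptions; the threshold 0.85 and the scores 0.9366 and 0.6233 come from the description, while the remaining scores are made up for the illustration.

```python
SECOND_THRESHOLD = 0.85  # example value given in the description


def second_stage(second_scores, threshold=SECOND_THRESHOLD, top_k=2):
    """second_scores maps standard code -> second similarity (non-empty).

    Accept the best entry if it reaches the threshold; otherwise
    return the top_k entries as candidates for further selection.
    """
    ranked = sorted(second_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_code, best_score = ranked[0]
    if best_score >= threshold:
        return {"status": "matched", "code": best_code}
    return {"status": "needs_selection", "candidates": ranked[:top_k]}


print(second_stage({"A41.3": 0.9366, "A41.9": 0.50}))
# {'status': 'matched', 'code': 'A41.3'}
print(second_stage({"K72.0": 0.6233, "K72.1": 0.41, "K72.9": 0.30}))
# {'status': 'needs_selection', 'candidates': [('K72.0', 0.6233), ('K72.1', 0.41)]}
```

A second threshold lower than the first is workable here because the subset has already been narrowed by the external diagnostic code, so a weaker textual match carries more evidence than it would against the whole database.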
Next, a standard diagnostic description corresponding to the diagnostic description provided by the medical institution may be further selected from the candidate standard diagnostic descriptions (step 108). For example, the selection may be made manually from the candidate standard diagnostic descriptions; alternatively, other natural language processing and semantic matching models may be used to select, from the candidate standard diagnostic descriptions, the one that matches the diagnostic description provided by the medical institution.
Finally, the standard diagnostic descriptions determined to match and the corresponding standard diagnostic codes are output (step 109).
Furthermore, in some embodiments according to the present disclosure, when a standard diagnostic description corresponding to a diagnostic description provided by a medical institution is selected from the candidate standard diagnostic descriptions in step 108, the selected standard diagnostic description and the diagnostic description provided by the medical institution and the corresponding relationship therebetween may be recorded. For example, it may be recorded in a table (or database). In this way, when a similar diagnosis description is provided next time from the medical institution, the records in the table can be referred to, thereby directly determining the standard diagnosis description and the standard diagnosis code corresponding to the diagnosis description provided by the medical institution. It should be appreciated that the step of matching with reference to the table may be performed at any suitable location of the flow shown in fig. 1, as desired. For example, it may be between step 101 and step 102, or between step 103 and step 104, or between step 106 and step 107, etc.
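The recorded correspondences can be sketched as a small lookup table. The class name `CorrespondenceTable` is hypothetical, and the lookup below matches only descriptions that are identical after case and whitespace normalization, which is a simplifying assumption; the description speaks more broadly of "similar" diagnostic descriptions.

```python
from typing import Optional


class CorrespondenceTable:
    """In-memory record of selected (external description -> standard entry) pairs."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[str, str]] = {}

    @staticmethod
    def _key(description: str) -> str:
        # Assumption: case and repeated whitespace do not change the meaning.
        return " ".join(description.lower().split())

    def record(self, description: str, standard_code: str,
               standard_description: str) -> None:
        self._rows[self._key(description)] = (standard_code, standard_description)

    def lookup(self, description: str) -> Optional[tuple[str, str]]:
        return self._rows.get(self._key(description))


table = CorrespondenceTable()
# Hypothetical outcome of a manual selection in step 108.
table.record("late-onset liver failure", "K72.0", "acute and subacute liver failure")
print(table.lookup("Late-onset  liver failure"))
# ('K72.0', 'acute and subacute liver failure')
```

Consulting such a table before step 102 would skip the matching entirely for descriptions that have already been resolved.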
Further, in some embodiments according to the present disclosure, the natural language processing and the semantic matching processing described above may obtain the respective models through machine learning. In this case, the standard diagnostic description corresponding to the diagnostic description provided by the medical institution, selected from the candidate standard diagnostic descriptions at step 108, may be fed back to the model as a training sample. By retraining the model with the training sample, the new model can find the matched standard diagnostic description more quickly and accurately in the subsequent natural language processing and semantic matching processing.
It should be understood that a variety of models obtained by machine learning may be used by those skilled in the art, for example, a word vectorization (WordToVector) model, a Recurrent Neural Network (RNN) model, a Long Short-Term Memory Network (LSTM) model, etc. may be used. The present disclosure is not so limited.
TABLE 1
Table 1 gives some illustrative embodiments according to the present disclosure.
In example 1, the external code name (i.e., diagnostic description) provided by an external institution (e.g., a medical institution) is "acute non-viral hepatitis" and the diagnostic code is K72.004. After receiving the external code name and the diagnostic code, the insurance institution matches the received external code name against the standard code names (i.e., standard diagnostic descriptions) of the insurance institution. Through natural language processing, the standard code name most similar to the external code name is determined in the database to be "acute hepatitis (non-viral)", with a similarity of 0.9999. In the case where the predetermined first threshold is, for example, 0.95, since the similarity is greater than the first threshold, the most similar standard code name "acute hepatitis (non-viral)" and the corresponding standard code K72.000A may be output as the result.
In example 2, the external code name (i.e., diagnostic description) provided by an external institution (e.g., a medical institution) is "mucocutaneous lymph node syndrome [Kawasaki disease]", and the diagnostic code is M30.300. After receiving the external code name and the diagnostic code, the insurance institution matches the received external code name against the standard code names (i.e., standard diagnostic descriptions) of the insurance institution. Through natural language processing, the standard code name most similar to the external code name is determined in the database to be "mucocutaneous lymph node syndrome [Kawasaki disease]", with a similarity of 0.9808. In the case where the predetermined first threshold is, for example, 0.95, since the similarity is greater than the first threshold, the most similar standard code name "mucocutaneous lymph node syndrome [Kawasaki disease]" and the corresponding standard code M30.3 can be output as the result.
In example 3, the external code name (i.e., diagnostic description) provided by an external institution (e.g., a medical institution) is "haemophilus influenzae sepsis" and the diagnostic code is A41.300. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name against its standard code names (i.e., standard diagnostic descriptions). Through natural language processing, a search and match is performed in the database. Among the results found, the standard diagnostic description with the highest similarity is "haemophilus influenzae pneumonia", with a similarity of 0.9488 and a corresponding standard diagnostic code of J14.x00. However, where the predetermined first threshold is, for example, 0.95, no matching result has a similarity greater than the first threshold. In this case, a subset must be further determined based on the diagnostic code A41.300 provided by the medical institution. For example, all standard diagnostic codes belonging to the broad class A41 may be taken as the subset. Then, the standard code name with the highest similarity within the subset, together with its standard diagnostic code, is selected for evaluation. For example, within this subset the standard code name "haemophilus influenzae sepsis" has the highest similarity, 0.9366, to the "haemophilus influenzae sepsis" provided by the external institution. Where the predetermined second threshold is 0.85, this similarity is greater than the second threshold. Thus, the standard code name "haemophilus influenzae sepsis" and the standard diagnostic code A41.3 are determined as the final matching result and output.
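The subset step of example 3 can be sketched as follows, assuming that the broad class of a code is the part before the dot (e.g., "A41"). The `STANDARD_ENTRIES` list, the function names, and the `difflib`-based similarity are hypothetical stand-ins for illustration only and are not the model of the disclosure.

```python
from difflib import SequenceMatcher

SECOND_THRESHOLD = 0.85  # example value given in the text

# Hypothetical standard entries: (standard diagnostic code, standard description).
STANDARD_ENTRIES = [
    ("A41.3", "haemophilus influenzae sepsis"),
    ("A41.9", "sepsis, unspecified"),
    ("J14.x00", "haemophilus influenzae pneumonia"),
]


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the learned model of the disclosure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def category(code: str) -> str:
    """Broad class of a code: the part before the dot, e.g. 'A41.300' -> 'A41'."""
    return code.split(".", 1)[0].strip().upper()


def match_in_subset(description: str, external_code: str):
    """Restrict the search to entries sharing the external code's broad class.

    Returns (standard description, standard code, similarity) when the best
    entry in the subset reaches the second threshold, otherwise None.
    """
    wanted = category(external_code)
    subset = [(c, d) for c, d in STANDARD_ENTRIES if category(c) == wanted]
    if not subset:
        return None
    code, std = max(subset, key=lambda entry: similarity(description, entry[1]))
    score = similarity(description, std)
    return (std, code, score) if score >= SECOND_THRESHOLD else None
```

Because the pneumonia entry belongs to class J14, it is excluded from the subset before any similarity is computed, which is why a lower second threshold can be tolerated within the subset.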
In example 4, the external code name (i.e., diagnostic description) provided by the external institution (e.g., a medical institution) is "late-onset liver failure" and the diagnostic code is K72.005. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name against its standard code names (i.e., standard diagnostic descriptions). Through natural language processing, a search and match is performed in the database. Among the results found, the standard diagnostic description with the highest similarity is "late schizophrenia", with a similarity of 0.6838 and a corresponding standard diagnostic code of F20.802. However, where the predetermined first threshold is, for example, 0.95, no matching result has a similarity greater than the first threshold. In this case, a subset must be further determined based on the diagnostic code K72.005 provided by the medical institution. For example, all standard diagnostic codes belonging to the broad class K72 may be taken as the subset. Then, the standard code name with the highest similarity within the subset, together with its standard diagnostic code, is selected for evaluation. For example, within this subset the standard code name "acute and subacute liver failure" has the highest similarity, 0.6233, to the "late-onset liver failure" provided by the external institution, and corresponds to the standard diagnostic code K72.0. However, where the predetermined second threshold is 0.85, this similarity is still less than the second threshold. In this case, several standard code names with the highest similarities are output as candidates, together with their similarities and standard diagnostic codes, and one of the candidates is selected as the final matching result, for example manually. For example, two candidates may be output. The first candidate is the standard code name "acute and subacute liver failure", with similarity 0.6233 and standard diagnostic code K72.0; the second candidate is the standard code name "liver failure", with similarity 0.5918 and standard diagnostic code K72.900. A worker may then manually select one of the two candidates (e.g., the second candidate) to be output as the final matching result. Of course, the number of candidate standard code names is not limited to two, and may be one, three, or more.
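The candidate output of example 4 can be sketched as follows. The `K72_SUBSET` list (including its third entry), the function names, and the `difflib`-based similarity are assumptions made for illustration; the disclosure itself uses a machine-learned model, so the ranking and scores produced here need not agree with those in the example.

```python
from difflib import SequenceMatcher
from heapq import nlargest

# Hypothetical subset for broad class K72: (standard code, standard description).
K72_SUBSET = [
    ("K72.0", "acute and subacute liver failure"),
    ("K72.900", "liver failure"),
    ("K72.1", "chronic liver failure"),  # illustrative extra entry
]


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the learned model of the disclosure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def top_candidates(description: str, subset, k: int = 2):
    """Return the k most similar (description, code, similarity) triples, best first."""
    scored = [(std, code, similarity(description, std)) for code, std in subset]
    return nlargest(k, scored, key=lambda triple: triple[2])


if __name__ == "__main__":
    # The candidates would be shown to a worker, who picks the final match.
    for std, code, score in top_candidates("late-onset liver failure", K72_SUBSET):
        print(f"{code}\t{std}\t{score:.4f}")
```

The parameter `k` corresponds to the predetermined number of candidates, which, as noted above, is not limited to two.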
In addition, in some embodiments, manually selected matching results may also be recorded. For example, the following may be recorded in a table or database: the standard code name that matches the external code name "late-onset liver failure" is "liver failure". When a new diagnostic description and diagnostic code are later received from an external institution and need to be matched, the matching relations recorded in the table or database can be searched first. This avoids manually matching the same external code name again and improves both matching speed and matching accuracy.
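The recording and reuse of manually selected matches can be sketched as follows. The class name `MatchRecord`, its methods, and the whitespace and case normalization of the lookup key are hypothetical choices for illustration; an in-memory dictionary stands in for the table or database mentioned above.

```python
from typing import Dict, Optional, Tuple


class MatchRecord:
    """In-memory stand-in for the table or database of manually confirmed matches."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[str, str]] = {}

    @staticmethod
    def _key(description: str) -> str:
        # Normalize case and whitespace so trivially different spellings still hit.
        return " ".join(description.lower().split())

    def record(self, external_description: str,
               standard_description: str, standard_code: str) -> None:
        """Store the correspondence chosen by the worker."""
        self._records[self._key(external_description)] = (
            standard_description, standard_code)

    def lookup(self, external_description: str) -> Optional[Tuple[str, str]]:
        """Return the recorded (standard description, standard code), or None."""
        return self._records.get(self._key(external_description))


if __name__ == "__main__":
    records = MatchRecord()
    records.record("late-onset liver failure", "liver failure", "K72.900")
    # A later request with the same external code name skips manual matching.
    print(records.lookup("Late-onset liver failure"))
```

Consulting such a record before running the similarity search is one possible ordering; the disclosure does not prescribe where in the flow the lookup takes place.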
Fig. 2 shows a block diagram of a computing device, which is one example of a hardware device applicable to aspects of the present disclosure, according to an example embodiment of the present disclosure.
With reference to fig. 2, a computing device 700, which is one example of a hardware device applicable to aspects of the present disclosure, will now be described. Computing device 700 may be any machine configured to implement processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smart phone, an in-vehicle computer, or any combination thereof. The various aforementioned apparatus/servers/client devices may be implemented in whole or at least in part by computing device 700 or similar devices or systems.
According to some embodiments of the present disclosure, the following technical solutions may be adopted:
1. A method of determining a standard diagnostic code for a disease, comprising:
receiving a diagnostic code and a diagnostic description of a disease from an external entity;
matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and
in the case that the maximum value of the second similarity is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to the maximum value of the second similarity as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
2. The method according to 1, wherein in a case where it is determined that the maximum value of the second similarity is smaller than the second threshold value, a predetermined number of standard diagnostic descriptions with larger second similarities among the standard diagnostic descriptions of the subset are selected as candidate standard diagnostic descriptions; and
selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description for the disease.
3. The method of claim 2, wherein selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description corresponding to the disease comprises: manually selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description of the disease by a human.
4. The method of claim 2 or 3, further comprising: recording a correspondence between the selected standard diagnostic description and the diagnostic description.
5. The method of 4, further comprising:
receiving another diagnostic description of the disease from an external entity; and
determining a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
6. The method of claim 2 or 3, wherein the natural language processing uses a model obtained by machine learning,
the method further comprises the following steps:
training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
7. The method of 6, wherein the model comprises: word vectorization model, recurrent neural network model, long-short term memory network model.
8. The method of 1, wherein the second threshold is less than the first threshold.
9. The method of 1, wherein determining the subset of the plurality of standard diagnostic descriptions from the diagnostic code comprises:
determining a set of standard diagnostic codes from the diagnostic codes; and
obtaining standard diagnostic descriptions corresponding to the standard diagnostic codes in the set as a subset of the plurality of standard diagnostic descriptions.
10. The method of claim 9, wherein the diagnostic code is an International Classification of Diseases (ICD) standard code, and the standard diagnostic codes contained in the major class or minor class of the diagnostic code are taken as the set of standard diagnostic codes.
11. An apparatus for determining a standard diagnostic code for a disease, comprising a memory and a processor, wherein when the processor executes instructions stored in the memory, the processor is configured to:
receiving a diagnostic code and a diagnostic description of a disease from an external entity;
matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and
in the case that the maximum value of the second similarity is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to the maximum value of the second similarity as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
12. The apparatus of claim 11, wherein the processor is further configured to:
selecting a predetermined number of standard diagnostic descriptions of the subset having a greater second similarity as candidate standard diagnostic descriptions if it is determined that the maximum value of the second similarity is less than the second threshold; and
selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description for the disease.
13. The apparatus of claim 12, wherein the processor is further configured to: receiving one of the candidate standard diagnostic descriptions manually selected as a standard diagnostic description of the disease.
14. The apparatus of claim 12 or 13, wherein the processor is further configured to:
recording a correspondence between the selected standard diagnostic description and the diagnostic description.
15. The apparatus of claim 14, wherein the processor is further configured to:
receiving another diagnostic description of the disease from an external entity; and
determining a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
16. The apparatus of claim 12 or 13, wherein the natural language processing uses a model obtained by machine learning,
the processor is further configured to:
training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
17. The apparatus of claim 16, wherein the model comprises: word vectorization model, recurrent neural network model, long-short term memory network model.
18. The apparatus of claim 11, wherein the second threshold is less than the first threshold.
19. The apparatus of claim 11, wherein the processor is further configured to:
determining a set of standard diagnostic codes from the diagnostic codes; and
obtaining standard diagnostic descriptions corresponding to the standard diagnostic codes in the set as a subset of the plurality of standard diagnostic descriptions.
20. The apparatus of claim 19, wherein the diagnostic code is an International Classification of Diseases (ICD) standard code, the processor further configured to:
taking the standard diagnostic codes contained in the major class or minor class of the diagnostic code as the set of standard diagnostic codes.
21. A non-transitory computer readable storage medium having stored thereon instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1-10.
As used herein, the word "exemplary" means "serving as an example, instance, or illustration," and not as a "model" that is to be replicated accurately. Any implementation exemplarily described herein is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the disclosure is not limited by any expressed or implied theory presented in the preceding technical field, background, brief summary or the detailed description.
In addition, certain terminology may also be used in the following description for the purpose of reference only, and thus is not intended to be limiting. For example, the terms "first," "second," and other such numerical terms referring to structures or elements do not imply a sequence or order unless clearly indicated by the context.
It will be further understood that the terms "comprises/comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the present disclosure, the term "providing" is used broadly to encompass all ways of obtaining an object, and thus "providing an object" includes, but is not limited to, "purchasing," "preparing/manufacturing," "arranging/setting," "installing/assembling," and/or "ordering" the object, and the like.
Those skilled in the art will appreciate that the boundaries between the above-described operations are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments. Other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. The various embodiments disclosed herein may be combined in any combination without departing from the spirit and scope of the present disclosure. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.
Claims (10)
1. A method of determining a standard diagnostic code for a disease, comprising:
receiving a diagnostic code and a diagnostic description of a disease from an external entity;
matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and
in the case that the maximum value of the second similarity is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to the maximum value of the second similarity as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
2. The method according to claim 1, wherein in a case where it is determined that the maximum value of the second similarity is smaller than the second threshold value, a predetermined number of standard diagnostic descriptions of the subset having a larger second similarity are selected as candidate standard diagnostic descriptions; and
selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description for the disease.
3. The method of claim 2, wherein selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description corresponding to the disease comprises: manually selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description of the disease by a human.
4. The method of claim 2 or 3, further comprising: recording a correspondence between the selected standard diagnostic description and the diagnostic description.
5. The method of claim 4, further comprising:
receiving another diagnostic description of the disease from an external entity; and
determining a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
6. The method according to claim 2 or 3, wherein the natural language processing uses a model obtained by machine learning,
the method further comprises the following steps:
training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
7. The method of claim 6, wherein the model comprises: word vectorization model, recurrent neural network model, long-short term memory network model.
8. The method of claim 1, wherein the second threshold is less than the first threshold.
9. An apparatus for determining a standard diagnostic code for a disease, comprising a memory and a processor, wherein when the processor executes instructions stored in the memory, the processor is configured to:
receiving a diagnostic code and a diagnostic description of a disease from an external entity;
matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and
in the case that the maximum value of the second similarity is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to the maximum value of the second similarity as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
10. A non-transitory computer readable storage medium having stored thereon instructions that, when executed by a processor, cause the processor to perform the method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010631296.2A CN111814432B (en) | 2020-07-03 | 2020-07-03 | Method and apparatus for determining standard diagnostic codes for disease |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111814432A (en) | 2020-10-23 |
CN111814432B (en) | 2024-04-16 |
Family
ID=72855269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010631296.2A Active CN111814432B (en) | 2020-07-03 | 2020-07-03 | Method and apparatus for determining standard diagnostic codes for disease |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814432B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723056A (en) * | 2021-08-19 | 2021-11-30 | 杭州火树科技有限公司 | ICD code conversion method, device, computing equipment and storage medium |
CN116385566A (en) * | 2022-05-27 | 2023-07-04 | 上海玄戒技术有限公司 | Light source estimation method, device, electronic equipment, chip and storage medium |
CN116631614A (en) * | 2023-07-24 | 2023-08-22 | 北京惠每云科技有限公司 | Treatment scheme generation method, treatment scheme generation device, electronic equipment and storage medium |
CN116884630A (en) * | 2023-09-06 | 2023-10-13 | 深圳达实旗云健康科技有限公司 | Method for improving disease automatic coding efficiency |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065157A (en) * | 2018-08-01 | 2018-12-21 | 中国人民解放军第二军医大学 | A kind of Disease Diagnosis Standard coded Recommendation list determines method and system |
EP3637431A1 (en) * | 2018-10-12 | 2020-04-15 | Fujitsu Limited | Medical diagnostic aid and method |
Non-Patent Citations (2)
Title |
---|
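Cheng Cheng; Huang Hao; Ou Dong: "Design and Application of an Automatic Disease Diagnosis Coding System", Chinese Medical Record (中国病案), no. 09 * |
Yang Hua; Wang Kai; Zheng Xiaohua: "Discussion of ICD-10 Intelligent Assisted Coding Methods", Chinese Medical Record (中国病案), no. 09 * |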
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723056A (en) * | 2021-08-19 | 2021-11-30 | 杭州火树科技有限公司 | ICD code conversion method, device, computing equipment and storage medium |
CN116385566A (en) * | 2022-05-27 | 2023-07-04 | 上海玄戒技术有限公司 | Light source estimation method, device, electronic equipment, chip and storage medium |
CN116385566B (en) * | 2022-05-27 | 2024-04-30 | 上海玄戒技术有限公司 | Light source estimation method, device, electronic equipment, chip and storage medium |
CN116631614A (en) * | 2023-07-24 | 2023-08-22 | 北京惠每云科技有限公司 | Treatment scheme generation method, treatment scheme generation device, electronic equipment and storage medium |
CN116884630A (en) * | 2023-09-06 | 2023-10-13 | 深圳达实旗云健康科技有限公司 | Method for improving disease automatic coding efficiency |
Also Published As
Publication number | Publication date |
---|---|
CN111814432B (en) | 2024-04-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||