CN111814432B - Method and apparatus for determining standard diagnostic codes for disease - Google Patents
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The present disclosure relates to a method of determining a standard diagnostic code for a disease, comprising: receiving a diagnostic code and a diagnostic description of a disease from an external institution; matching the diagnostic description with a plurality of standard diagnostic descriptions in a database through natural language processing to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions; determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold; determining a subset of the plurality of standard diagnostic descriptions according to the diagnostic code; obtaining a maximum value of second similarities between the standard diagnostic descriptions in the subset and the diagnostic description; and, in a case where the maximum value of the second similarities is greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to that maximum value as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
Description
Technical Field
The present disclosure relates to methods and apparatus for determining standard diagnostic codes for disease.
Background
In the course of insurance claims, insurance institutions often need to obtain medical information about insured persons from medical institutions. Medical information provided by a medical institution typically includes information such as diagnostic codes and diagnostic descriptions. The insurer decides whether to pay the insured person based on the medical information provided by the medical institution.
Disclosure of Invention
According to one aspect of the present disclosure, there is provided a method of determining a standard diagnostic code for a disease, comprising: receiving a diagnostic code and a diagnostic description of a disease from an external institution; matching the diagnostic description with a plurality of standard diagnostic descriptions in a database through natural language processing to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions; determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold; determining a subset of the plurality of standard diagnostic descriptions according to the diagnostic code; obtaining a maximum value of second similarities between the standard diagnostic descriptions in the subset and the diagnostic description; and, in a case where the maximum value of the second similarities is greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to that maximum value as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
In some embodiments according to the present disclosure, the method further comprises: in a case where it is determined that the maximum value of the second similarities is less than the second threshold, selecting, from the standard diagnostic descriptions of the subset, a predetermined number of standard diagnostic descriptions having the largest second similarities as candidate standard diagnostic descriptions; and selecting one of the candidate standard diagnostic descriptions as the standard diagnostic description of the disease.
In some embodiments according to the present disclosure, selecting one of the candidate standard diagnostic descriptions as the standard diagnostic description of the disease comprises: manually selecting, by a human, one of the candidate standard diagnostic descriptions as the standard diagnostic description of the disease.
In some embodiments according to the present disclosure, the method further comprises: recording the correspondence between the selected standard diagnostic description and the diagnostic description.
In some embodiments according to the present disclosure, the method further comprises: receiving another diagnostic description of the disease from the external institution; and determining a standard diagnostic description of the disease based on the correspondence and the other diagnostic description.
In some embodiments according to the present disclosure, the natural language processing uses a model obtained by machine learning, and the method further comprises: training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
In some embodiments according to the present disclosure, the model includes: a word vectorization model, a recurrent neural network model, or a long short-term memory network model.
In some embodiments according to the present disclosure, the second threshold is less than the first threshold.
In some embodiments according to the present disclosure, determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code comprises: determining a set of standard diagnostic codes from the diagnostic codes; and obtaining a standard diagnostic description corresponding to the standard diagnostic codes in the set as a subset of the plurality of standard diagnostic descriptions.
In some embodiments according to the present disclosure, the diagnostic code is an International Classification of Diseases (ICD) code, and the standard diagnostic codes contained in the major or minor category of the diagnostic code are taken as the set of standard diagnostic codes.
According to another aspect of the present disclosure, there is provided an apparatus for determining a standard diagnostic code for a disease, comprising a memory and a processor, wherein when the processor executes instructions stored in the memory, the processor is configured to: receive a diagnostic code and a diagnostic description of a disease from an external institution; match the diagnostic description with a plurality of standard diagnostic descriptions in a database through natural language processing to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions; determine that a maximum value of the plurality of first similarities is less than a predetermined first threshold; determine a subset of the plurality of standard diagnostic descriptions according to the diagnostic code; obtain a maximum value of second similarities between the standard diagnostic descriptions in the subset and the diagnostic description; and, in a case where the maximum value of the second similarities is greater than or equal to a predetermined second threshold, take the standard diagnostic description corresponding to that maximum value as the standard diagnostic description of the disease, and take the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
In some embodiments according to the present disclosure, the processor is further configured to: in a case where it is determined that the maximum value of the second similarities is less than the second threshold, select, from the standard diagnostic descriptions of the subset, a predetermined number of standard diagnostic descriptions having the largest second similarities as candidate standard diagnostic descriptions; and select one of the candidate standard diagnostic descriptions as the standard diagnostic description of the disease.
In some embodiments according to the present disclosure, the processor is further configured to: receive one of the candidate standard diagnostic descriptions, manually selected by a human, as the standard diagnostic description of the disease.
In some embodiments according to the present disclosure, the processor is further configured to: record the correspondence between the selected standard diagnostic description and the diagnostic description.
In some embodiments according to the present disclosure, the processor is further configured to: receive another diagnostic description of the disease from the external institution; and determine a standard diagnostic description of the disease based on the correspondence and the other diagnostic description.
In some embodiments according to the present disclosure, the natural language processing uses a model obtained by machine learning, and the processor is further configured to: train the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
In some embodiments according to the present disclosure, the model includes: a word vectorization model, a recurrent neural network model, or a long short-term memory network model.
In some embodiments according to the present disclosure, the second threshold is less than the first threshold.
In some embodiments according to the present disclosure, the processor is further configured to: determine a set of standard diagnostic codes from the diagnostic code; and obtain the standard diagnostic descriptions corresponding to the standard diagnostic codes in the set as the subset of the plurality of standard diagnostic descriptions.
In some embodiments according to the present disclosure, the diagnostic code is an International Classification of Diseases (ICD) code, and the processor is further configured to: take the standard diagnostic codes contained in the major or minor category of the diagnostic code as the set of standard diagnostic codes.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the above-described method of determining a standard diagnostic code for a disease.
Other features of the present disclosure and its advantages will become apparent from the following detailed description of exemplary embodiments of the disclosure, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 illustrates a flow chart of a method of determining standard diagnostic codes for a disease according to an embodiment of the present disclosure.
FIG. 2 is a block diagram of a computing device according to an exemplary embodiment of the present disclosure.
Note that in the embodiments described below, the same reference numerals are used in common between different drawings to denote the same parts or parts having the same functions, and a repetitive description thereof may be omitted. In this specification, like reference numerals and letters are used to designate like items, and thus once an item is defined in one drawing, no further discussion thereof is necessary in subsequent drawings.
For ease of understanding, the positions, dimensions, ranges, etc. of the respective structures shown in the drawings and the like may not represent actual positions, dimensions, ranges, etc. Accordingly, the disclosed invention is not limited to the disclosed positions, dimensions, ranges, etc. as illustrated in the drawings.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
Diagnostic codes for diseases provided by different medical institutions may be based on different coding standards. Even under the same coding standard, the same disease may be given different codes owing to differences in the skill and understanding of the coding staff. Furthermore, diagnostic descriptions of a disease are typically written by physicians at the medical institution. Different physicians may describe a disease in different ways, and the same disease may be expressed using different terms. This increases the complexity of claim processing for the insurance institution.
In some embodiments of the present disclosure, diagnostic codes and diagnostic descriptions of diseases provided by external institutions (e.g., medical institutions) may be converted into the insurance institution's own standard diagnostic codes and standard diagnostic descriptions, thereby facilitating the processing of warranties and claims by the insurance institution.
FIG. 1 illustrates a flow chart of a method of determining standard diagnostic codes for a disease according to an embodiment of the present disclosure.
As shown in FIG. 1, first, an insurance institution may receive medical information about an insured person from an external institution (e.g., a medical institution) (step 101). Such medical information may include a diagnostic code and a diagnostic description of the disease.
Diagnostic coding is a method of coding a disease according to its type. One commonly used coding scheme is the International Classification of Diseases (ICD), of which two versions, ICD-10 and ICD-11, are currently in wide use. In this coding scheme, each disease is assigned a seven-digit code, with the first three digits referred to as the major category and the first four digits referred to as the minor category. For example, code A01.000B represents typhoid fever. The first three digits of the code, A01, represent the major category of typhoid and paratyphoid, and the first four digits, A01.0, represent the minor category of typhoid.
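The category split described above can be illustrated with a minimal Python sketch. The function name `icd_categories` is hypothetical, and the sketch assumes codes are written as three characters, a dot, and further characters (as in A01.000B); it is not taken from the disclosure.

```python
def icd_categories(code: str) -> tuple[str, str]:
    """Return the (major, minor) category prefixes of an ICD-style code.

    Assumption: the code looks like "A01.000B", i.e. three characters,
    a dot, then at least one more character.
    """
    normalized = code.strip().upper()
    if len(normalized) < 5 or normalized[3] != ".":
        raise ValueError(f"unexpected code format: {code!r}")
    major = normalized[:3]  # first three digits, e.g. "A01"
    minor = normalized[:5]  # first four digits plus the dot, e.g. "A01.0"
    return major, minor


print(icd_categories("A01.000B"))  # ('A01', 'A01.0')
```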
The diagnostic description is a description of the condition of the patient. The diagnostic description may generally include information such as a description of the condition. For example, in some example embodiments according to the present disclosure, the diagnostic description may be a disease name corresponding to the diagnostic code.
After the insurance institution receives the diagnostic code and the diagnostic description from the medical institution, the received diagnostic description is matched with the standard diagnostic descriptions of the insurance institution (step 102).
Diagnostic codes from medical institutions may in many cases be inaccurate or erroneous. Thus, the insurance institution needs to determine the correct diagnostic code (i.e., the standard diagnostic code) from the diagnostic description provided by the medical institution. In some exemplary embodiments, the standard diagnostic codes and the standard diagnostic descriptions corresponding thereto may be stored in a database. The insurance institution may match the diagnostic description provided by the medical institution with the standard diagnostic descriptions in the database to find the standard diagnostic description that is most similar to the diagnostic description provided by the medical institution. The standard diagnostic code corresponding to the most similar standard diagnostic description is then taken as the diagnostic code for the patient's disease.
In some embodiments according to the present disclosure, the diagnostic description from the medical institution may be matched with the plurality of standard diagnostic descriptions in the database of the insurance institution through natural language processing. In the natural language processing, the diagnostic description provided by the medical institution is first subjected to word segmentation. To improve segmentation accuracy, a medical word segmentation library, a medical synonym library, a medical diagnostic description sentence segmentation library, and the like may be established specifically for the medical field, so that the diagnostic description can be segmented accurately. The segmented diagnostic description may then be semantically matched with the standard diagnostic descriptions in the database.
By means of the semantic matching process, a similarity (i.e., a first similarity) between each standard diagnostic description in the database and the diagnostic description from the medical institution can be obtained. In many cases, the standard diagnostic description with the highest similarity may be used as the result of matching with the diagnostic description from the medical institution, and that standard diagnostic description and the corresponding standard diagnostic code may be output as the result. For example, a predetermined threshold (first threshold) may be set. In the case where the maximum value of the similarity is greater than or equal to the first threshold, the standard diagnostic description and the standard diagnostic code corresponding to the maximum value may be directly used as the standard diagnostic description and the standard diagnostic code corresponding to the diagnostic description and the diagnostic code provided by the medical institution.
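The first-stage matching (steps 102 and 103) might be sketched in Python as follows. This is only an illustration: the disclosure uses semantic matching over segmented text, whereas this sketch substitutes the standard library's `difflib.SequenceMatcher` character-level ratio as a stand-in similarity. The names `similarity`, `best_match`, and `FIRST_THRESHOLD`, and the small `standard` dictionary, are hypothetical.

```python
from difflib import SequenceMatcher

FIRST_THRESHOLD = 0.95  # example value used later in the description


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the semantic model of the disclosure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(description: str, standard: dict[str, str]) -> tuple[str, str, float]:
    """Return (code, standard_description, score) with the highest first similarity.

    `standard` maps standard diagnostic codes to standard diagnostic descriptions
    and must not be empty.
    """
    scored = [(similarity(description, text), code, text)
              for code, text in standard.items()]
    score, code, text = max(scored)
    return code, text, score


# Illustrative database contents (hypothetical excerpt).
standard = {
    "K72.000A": "acute hepatitis (non-viral)",
    "A41.3": "haemophilus influenzae sepsis",
}

code, text, score = best_match("Haemophilus influenzae sepsis", standard)
if score >= FIRST_THRESHOLD:
    print(code, text)  # first-stage match accepted directly
```

When the best score falls below the first threshold, the flow continues with the code-based subset described next.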
However, in a case where the similarity of the standard diagnostic description with the highest similarity is still low, for example, the similarity is smaller than a predetermined threshold (first threshold) (step 103), the obtained standard diagnostic description and standard diagnostic code may not be accurate at this time. Therefore, in the case where the similarity of the standard diagnostic description with the highest similarity is smaller than the predetermined first threshold value, it is also necessary to further utilize the diagnostic code provided by the medical institution.
A subset of the standard diagnostic descriptions in the database may be determined from the diagnostic code provided by the medical institution (step 104). For example, where the diagnostic code follows an international disease classification (e.g., ICD-10 or ICD-11), the first three or four digits of the diagnostic code represent a classification of the disease, where the first three digits represent a major category and the first four digits represent a minor category. Thus, the standard diagnostic descriptions corresponding to all standard diagnostic codes in the major or minor category of the diagnostic code provided by the medical institution may be taken as the subset.
Then, a similarity (second similarity) between each standard diagnostic description in the subset and the diagnostic description provided by the medical institution may be obtained (step 105). In step 102 described above, the similarity (first similarity) between each standard diagnostic description in the database and the diagnostic description provided by the medical institution has already been calculated. In some embodiments according to the present disclosure, these similarities may be stored temporarily or non-temporarily, so that they need not be recalculated in step 105; it suffices to read, from the stored similarities, those between the standard diagnostic descriptions in the subset and the diagnostic description provided by the medical institution.
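As a rough Python sketch of step 104, the subset can be obtained by filtering the standard codes on the major or minor category prefix of the external code. The function name `subset_by_category`, the `level` parameter, and the sample entries are hypothetical; the sketch assumes codes of the form "A41.300".

```python
def subset_by_category(external_code: str,
                       standard: dict[str, str],
                       level: str = "major") -> dict[str, str]:
    """Keep standard entries sharing the external code's major or minor category.

    Assumption: codes look like "A41.300", so the major category is the
    first 3 characters and the minor category is the first 5 (dot included).
    """
    prefix_len = 3 if level == "major" else 5
    prefix = external_code.strip().upper()[:prefix_len]
    return {code: text for code, text in standard.items()
            if code.upper().startswith(prefix)}


# Illustrative database excerpt; entries are for demonstration only.
standard = {
    "A41.3": "haemophilus influenzae sepsis",
    "A41.9": "sepsis, unspecified",
    "J14.x00": "haemophilus influenzae type pneumonia",
}

print(sorted(subset_by_category("a41.300", standard)))           # ['A41.3', 'A41.9']
print(sorted(subset_by_category("a41.300", standard, "minor")))  # ['A41.3']
```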
Of course, in some embodiments according to the present disclosure, the similarity of each standard diagnostic description in the subset to the diagnostic description provided by the medical facility may be re-calculated in step 105. For example, the similarity (second similarity) may be recalculated using a word segmentation model or a semantic analysis model different from step 102.
Next, the maximum value in the second similarity may be compared with a predetermined threshold (second threshold) (step 106).
If the maximum value of the second similarity is greater than or equal to the second threshold value, the standard diagnostic description and the standard diagnostic code corresponding to the second similarity having the maximum value may be regarded as the standard diagnostic description and the standard diagnostic code corresponding to the diagnostic description and the diagnostic code of the medical institution.
If the maximum value of the second similarity is still less than the second threshold, a predetermined number of standard diagnostic descriptions may be used as candidate standard diagnostic descriptions (step 107). For example, a predetermined number of standard diagnostic descriptions having a larger similarity may be used as candidate standard diagnostic descriptions according to the similarity (second similarity) of the standard diagnostic descriptions in the subset.
Further, in some embodiments according to the present disclosure, the second threshold may be less than the first threshold. For example, when the first threshold is 0.95, the second threshold may be set to 0.85.
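Steps 106 and 107 can be sketched in Python as below. The function name `second_stage`, the status strings, and the default of three candidates are assumptions made for illustration; the disclosure only specifies "a predetermined number".

```python
def second_stage(second_scores: dict[str, float],
                 second_threshold: float = 0.85,
                 num_candidates: int = 3) -> tuple[str, list[str]]:
    """Decide between direct acceptance and a shortlist of candidates.

    `second_scores` maps each standard diagnostic code in the subset to its
    second similarity. Returns ("accepted", [best_code]) when the maximum
    reaches the threshold, otherwise ("candidates", top-N codes).
    """
    ranked = sorted(second_scores, key=second_scores.get, reverse=True)
    if ranked and second_scores[ranked[0]] >= second_threshold:
        return "accepted", ranked[:1]
    return "candidates", ranked[:num_candidates]


print(second_stage({"A41.3": 0.9366, "A41.9": 0.71}))
# ('accepted', ['A41.3'])
print(second_stage({"A41.3": 0.80, "A41.9": 0.71, "A41.0": 0.50}, num_candidates=2))
# ('candidates', ['A41.3', 'A41.9'])
```

Returning the shortlist in descending order of similarity leaves the final choice (step 108) to a human reviewer or to a further model.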
Next, a standard diagnostic description corresponding to the diagnostic description provided by the medical institution may be further selected from the candidate standard diagnostic descriptions (step 108). For example, selection from candidate standard diagnostic descriptions may be made manually; alternatively, other natural language processing and semantic matching models may be employed to select a standard diagnostic description from the candidate standard diagnostic descriptions that matches the diagnostic description provided by the medical facility.
Finally, the standard diagnostic description determined to match and the corresponding standard diagnostic code are output (step 109).
Further, in some embodiments according to the present disclosure, when a standard diagnostic description corresponding to a diagnostic description provided by a medical institution is selected from the candidate standard diagnostic descriptions in step 108, the selected standard diagnostic description and the diagnostic description provided by the medical institution and the correspondence therebetween may be recorded. For example, it may be recorded in a table (or database). In this way, when a similar diagnostic description is next provided from the medical institution, the records in the table may be referenced to directly determine the standard diagnostic description and standard diagnostic code corresponding to the diagnostic description provided by the medical institution. It should be appreciated that the step of matching with reference to the table may be performed at any suitable location in the flow shown in fig. 1, as desired. For example, it may be between step 101 and step 102, or between step 103 and step 104, or between step 106 and step 107, etc.
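The correspondence table might look like the following Python sketch. The class name `CorrespondenceTable` and its methods are hypothetical, and the sketch makes a simplifying assumption: a later description counts as "similar" only if it is identical after lower-casing and whitespace normalization, whereas the disclosure does not specify how similarity to a recorded description is judged.

```python
class CorrespondenceTable:
    """In-memory record of manually confirmed description-to-standard mappings."""

    def __init__(self) -> None:
        self._records: dict[str, tuple[str, str]] = {}

    @staticmethod
    def _key(description: str) -> str:
        # Assumption: normalize case and whitespace so trivial variants match.
        return " ".join(description.lower().split())

    def record(self, description: str, standard_code: str,
               standard_description: str) -> None:
        self._records[self._key(description)] = (standard_code, standard_description)

    def lookup(self, description: str):
        """Return (standard_code, standard_description), or None if not recorded."""
        return self._records.get(self._key(description))


table = CorrespondenceTable()
table.record("Haemophilus influenzae type sepsis", "A41.3",
             "haemophilus influenzae sepsis")
print(table.lookup("haemophilus  influenzae type sepsis"))
# ('A41.3', 'haemophilus influenzae sepsis')
```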
Furthermore, in some embodiments according to the present disclosure, the above-described natural language processing and semantic matching processing may obtain the corresponding model through machine learning. In this case, the standard diagnostic description corresponding to the diagnostic description provided by the medical institution selected from the candidate standard diagnostic descriptions at step 108 may be fed back to the model as a training sample. By retraining the model with the training samples, the new model can be made to find a matching standard diagnostic description more quickly and accurately in later natural language processing and semantic matching processing.
It should be appreciated that various models obtained through machine learning may be used by those skilled in the art, such as a word vector (WordToVector) model, a recurrent neural network (Recurrent Neural Network, RNN) model, a Long Short-Term Memory (LSTM) model, and the like. The present disclosure is not limited in this regard.
TABLE 1
Table 1 gives some exemplary embodiments according to the present disclosure.
In example 1, the external code name (i.e., diagnostic description) provided by an external institution (e.g., medical institution) is "acute non-viral hepatitis", and the diagnostic code is K72.004. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name with the standard code name (i.e., standard diagnostic description) of the insurance agency. The standard code name most similar to the external code name is determined in the database by natural language processing to be "acute hepatitis (non-viral)", and the similarity is 0.9999. In the case where the predetermined first threshold value is, for example, 0.95, since the similarity is larger than the first threshold value, the most similar standard code name "acute hepatitis (non-viral)" and the corresponding standard code K72.000A may be output as a result.
In example 2, the external code name (i.e., diagnostic description) provided by an external institution (e.g., medical institution) is "mucosal cutaneous lymph node syndrome [Kawasaki disease]", and the diagnostic code is M30.300. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name with the standard code name (i.e., standard diagnostic description) of the insurance agency. The standard code name most similar to the external code name is determined in the database by natural language processing as "mucosal cutaneous lymph node syndrome [Kawasaki disease]", and the similarity is 0.9808. In the case where the predetermined first threshold value is, for example, 0.95, since the similarity is larger than the first threshold value, the most similar standard code name "mucosal cutaneous lymph node syndrome [Kawasaki disease]" and the corresponding standard code M30.3 may be output as a result.
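The first-stage decision in examples 1 and 2 can be sketched as follows. The names `Scored` and `first_stage` are hypothetical, and the similarity values are assumed to have been computed already by whatever model is in use. The sketch accepts the best match when its similarity is greater than or equal to the first threshold, which mirrors the technical solutions, where the fallback is triggered only when the maximum is less than the first threshold.

```python
from typing import List, NamedTuple, Optional


class Scored(NamedTuple):
    """A standard entry together with its similarity to the external name."""
    name: str
    code: str
    similarity: float


def first_stage(scored: List[Scored], first_threshold: float = 0.95) -> Optional[Scored]:
    """Return the most similar entry if it reaches the first threshold, else None."""
    if not scored:
        return None
    best = max(scored, key=lambda s: s.similarity)
    return best if best.similarity >= first_threshold else None


# Values taken from example 1 above.
example_1 = [Scored("acute hepatitis (non-viral)", "K72.000A", 0.9999)]
result = first_stage(example_1)
```

A return value of `None` signals that the code-based subset search should be tried next.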
In example 3, the external code name (i.e., diagnostic description) provided by an external institution (e.g., medical institution) is "haemophilus influenzae type sepsis", and the diagnostic code is A41.300. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name with the standard code names (i.e., standard diagnostic descriptions) of the insurance agency by searching the database through natural language processing. Among the results found, the standard diagnostic description with the highest similarity is "Haemophilus influenzae type pneumonia", with a similarity of 0.9488 and a corresponding standard diagnostic code of J14.x00. However, where the predetermined first threshold is, for example, 0.95, no matching result has a similarity larger than the first threshold. In this case, a subset must be determined from the diagnostic code A41.300 provided by the medical institution. For example, all standard diagnostic codes belonging to the major class A41 may be taken as the subset. The standard code name with the highest similarity in the subset, together with its standard diagnostic code, is then selected for evaluation. For example, within this subset, the standard code name "haemophilus influenzae sepsis" has the highest similarity, 0.9366, to the external code name "haemophilus influenzae type sepsis" provided by the external institution. Where the predetermined second threshold is 0.85, this similarity is larger than the second threshold. Thus, the standard code name "haemophilus influenzae sepsis" and the standard diagnostic code A41.3 can be determined as the final matching result and output.
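The fallback used in example 3 can be sketched as a filter on the standard code followed by a second threshold test. The function name `second_stage`, the tuple layout, and the use of a simple prefix test to decide class membership are assumptions made for this sketch; the similarity values are those quoted in example 3.

```python
from typing import List, Optional, Tuple

# (standard code name, standard diagnostic code, similarity to the external name)
Entry = Tuple[str, str, float]


def second_stage(entries: List[Entry], class_prefix: str,
                 second_threshold: float = 0.85) -> Optional[Entry]:
    """Restrict the search to one code class, then apply the second threshold.

    Returns the best entry in the class if its similarity reaches the second
    threshold; otherwise returns None so that candidates can be offered instead.
    """
    subset = [e for e in entries if e[1].startswith(class_prefix)]
    if not subset:
        return None
    best = max(subset, key=lambda e: e[2])
    return best if best[2] >= second_threshold else None


example_3 = [
    ("Haemophilus influenzae type pneumonia", "J14.x00", 0.9488),
    ("haemophilus influenzae sepsis", "A41.3", 0.9366),
]
match = second_stage(example_3, "A41")
```

Restricting the search to the class of the supplied code is what lets a lower second threshold remain safe: the globally most similar name (a pneumonia entry) is excluded before the comparison is made.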
In example 4, the external code name (i.e., diagnostic description) provided by an external institution (e.g., medical institution) is "late liver failure", and the diagnostic code is K72.005. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name with the standard code names (i.e., standard diagnostic descriptions) of the insurance agency by searching the database through natural language processing. Among the results found, the standard diagnostic description with the highest similarity is "late onset schizophrenia", with a similarity of 0.6838 and a corresponding standard diagnostic code of F20.802. However, where the predetermined first threshold is, for example, 0.95, no matching result has a similarity larger than the first threshold. In this case, a subset must be determined from the diagnostic code K72.005 provided by the medical institution. For example, all standard diagnostic codes belonging to the major class K72 may be taken as the subset. The standard code name with the highest similarity in the subset, together with its standard diagnostic code, is then selected for evaluation. For example, within this subset, the standard code name "acute and subacute liver failure", corresponding to the standard diagnostic code K72.0, has the highest similarity, 0.6233, to the external code name "late liver failure" provided by the external institution. However, where the predetermined second threshold is 0.85, this similarity is still smaller than the second threshold. In this case, several standard code names with the highest similarities are output as candidates, together with their similarities and standard diagnostic codes, and one of the candidates is selected, e.g., manually, as the final matching result. For example, two candidates may be output.
The first candidate has the standard code name "acute and subacute liver failure", similarity 0.6233, and standard diagnostic code K72.0; the second candidate has the standard code name "liver failure", similarity 0.5918, and standard diagnostic code K72.900. A worker can then manually select one of the two candidates (e.g., the second candidate) as the final matching result to be output. Of course, the number of candidate standard code names is not limited to two, but may be one, three or more.
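Selecting the candidates shown to the worker in example 4 amounts to taking the entries with the largest similarities from the subset. The following sketch uses the standard library `heapq.nlargest`; the function name `top_candidates` and the parameter `k` are assumptions, and the data are the two K72 entries quoted in example 4.

```python
import heapq
from typing import List, Tuple

# (standard code name, standard diagnostic code, similarity to the external name)
Entry = Tuple[str, str, float]


def top_candidates(subset: List[Entry], k: int = 2) -> List[Entry]:
    """Return up to k entries of the subset, ordered by descending similarity."""
    return heapq.nlargest(k, subset, key=lambda e: e[2])


k72_subset = [
    ("liver failure", "K72.900", 0.5918),
    ("acute and subacute liver failure", "K72.0", 0.6233),
]
candidates = top_candidates(k72_subset, k=2)
```

The final choice among `candidates` is left to a person, as in the example, where the worker may pick the second candidate even though its similarity is lower.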
In addition, in some embodiments, the manually selected matching result may also be recorded. For example, a table or database may record that the standard code name matching the external code name "late liver failure" is "liver failure". Later, when a new diagnostic description and diagnostic code are received from an external institution and need to be matched, the matching relation recorded in the table or database can be consulted first. This avoids manually matching the same external code name again and improves matching speed and accuracy.
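Recording a manual decision and reusing it later can be sketched with an in-memory table. The class name `MatchMemory` is hypothetical, a real system would presumably persist the table in a database, and the case and whitespace normalization of the key is an assumption that the disclosure does not mention.

```python
from typing import Dict, Optional, Tuple


class MatchMemory:
    """Remembers which standard entry a person chose for an external code name."""

    def __init__(self) -> None:
        self._table: Dict[str, Tuple[str, str]] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Assumption: names differing only in case or spacing are the same name.
        return " ".join(name.lower().split())

    def record(self, external_name: str, standard_name: str, standard_code: str) -> None:
        """Store the manually selected standard name and code."""
        self._table[self._key(external_name)] = (standard_name, standard_code)

    def lookup(self, external_name: str) -> Optional[Tuple[str, str]]:
        """Return the recorded (standard name, standard code), or None if unseen."""
        return self._table.get(self._key(external_name))


memory = MatchMemory()
memory.record("late liver failure", "liver failure", "K72.900")
```

Consulting `memory.lookup(...)` before running any similarity computation is what avoids sending the same external code name to a worker twice.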
Fig. 2 illustrates a block diagram of a computing device, which is one example of a hardware device applicable to aspects of the present disclosure, according to one exemplary embodiment of the present disclosure.
With reference to fig. 2, a computing device 700, which is one example of a hardware device applicable to aspects of the present disclosure, will now be described. Computing device 700 may be any machine configured to implement processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smart phone, an in-vehicle computer, or any combination thereof. The various means/servers/client devices described previously may be implemented, in whole or at least in part, by computing device 700 or a similar device or system.
Computing device 700 may include components that may be connected to or in communication with a bus 702 via one or more interfaces. For example, computing device 700 may include a bus 702, one or more processors 704, one or more input devices 706, and one or more output devices 708. The one or more processors 704 may be any type of processor and may include, but are not limited to, one or more general purpose processors and/or one or more special purpose processors (e.g., special purpose processing chips). Input device 706 may be any type of device capable of inputting information to a computing device and may include, but is not limited to, a mouse, keyboard, touch screen, microphone, and/or remote controller. Output device 708 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. Computing device 700 may also include or be connected to a non-transitory storage device 710, which may be any storage device that is non-transitory and capable of data storage, and which may include, but is not limited to, a disk drive, an optical storage device, solid state memory, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, an optical disk or any other optical medium, a ROM (read only memory), a RAM (random access memory), a cache memory, and/or any memory chip or cartridge, and/or any other medium from which a computer may read data, instructions, and/or code. The non-transitory storage device 710 may be detachable from an interface. The non-transitory storage device 710 may have data/instructions/code for implementing the methods and steps described above. Computing device 700 may also include communication device 712.
The communication device 712 may be any type of device or system capable of enabling communication with an internal apparatus and/or with a network and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
According to some embodiments of the present disclosure, the following technical solutions may be adopted:
1. A method of determining a standard diagnostic code for a disease, comprising:
receiving a diagnostic code and a diagnostic description of a disease from an external institution;
matching the diagnostic description with a plurality of standard diagnostic descriptions in a database through natural language processing to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of a standard diagnostic description in the subset to the diagnostic description; and
in the case where it is determined that the maximum value of the second similarity is greater than or equal to a predetermined second threshold value, a standard diagnostic description corresponding to the maximum value of the second similarity is taken as a standard diagnostic description of the disease, and a standard diagnostic code corresponding to the standard diagnostic description is taken as a standard diagnostic code of the disease.
2. The method according to claim 1, wherein, in the case where it is determined that the maximum value of the second similarity is smaller than the second threshold value, a predetermined number of standard diagnostic descriptions having the largest second similarities among the standard diagnostic descriptions of the subset are selected as candidate standard diagnostic descriptions; and
one of the candidate standard diagnostic descriptions is selected as the standard diagnostic description of the disease.
3. The method of claim 2, wherein selecting one of the candidate standard diagnostic descriptions as the standard diagnostic description corresponding to the disease comprises: one of the candidate standard diagnostic descriptions is manually selected by a human as the standard diagnostic description of the disease.
4. The method of 2 or 3, further comprising: recording the correspondence between the selected standard diagnostic description and the diagnostic description.
5. The method according to claim 4, further comprising:
receiving another diagnostic description of the disease from the external institution; and
determining a standard diagnostic description of the disease according to the correspondence and the other diagnostic description.
6. The method of claim 2 or 3, wherein the natural language processing uses a model obtained by machine learning,
the method further comprises the steps of:
the model is trained using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
7. The method of claim 6, wherein the model comprises: word vector model, recurrent neural network model, long short-term memory network model.
8. The method of claim 1, wherein the second threshold is less than the first threshold.
9. The method of claim 1, wherein determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code comprises:
determining a set of standard diagnostic codes from the diagnostic codes; and
a standard diagnostic description corresponding to the standard diagnostic code in the set is obtained as a subset of the plurality of standard diagnostic descriptions.
10. The method of claim 9, wherein the diagnostic code is an international disease classification standard code, and a standard diagnostic code contained in a major or minor class of the diagnostic code is taken as the set of standard diagnostic codes.
11. An apparatus for determining standard diagnostic codes for a disease comprising a memory and a processor, wherein when the processor executes instructions stored in the memory, the processor is configured to:
receiving a diagnostic code and a diagnostic description of a disease from an external institution;
matching the diagnostic description with a plurality of standard diagnostic descriptions in a database through natural language processing to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of a standard diagnostic description in the subset to the diagnostic description; and
in the case where it is determined that the maximum value of the second similarity is greater than or equal to a predetermined second threshold value, a standard diagnostic description corresponding to the maximum value of the second similarity is taken as a standard diagnostic description of the disease, and a standard diagnostic code corresponding to the standard diagnostic description is taken as a standard diagnostic code of the disease.
12. The apparatus of claim 11, wherein the processor is further configured to:
selecting, as candidate standard diagnostic descriptions, a predetermined number of standard diagnostic descriptions having a larger second similarity among the standard diagnostic descriptions of the subset, in a case where it is determined that the maximum value of the second similarity is smaller than the second threshold; and
one of the candidate standard diagnostic descriptions is selected as the standard diagnostic description of the disease.
13. The apparatus of claim 12, wherein the processor is further configured to: one of the candidate standard diagnostic descriptions manually selected by a human is received as a standard diagnostic description of the disease.
14. The apparatus of claim 12 or 13, wherein the processor is further configured to:
recording the correspondence between the selected standard diagnostic description and the diagnostic description.
15. The apparatus of claim 14, wherein the processor is further configured to:
receiving another diagnostic description of the disease from the external institution; and
determining a standard diagnostic description of the disease according to the correspondence and the other diagnostic description.
16. The apparatus of claim 12 or 13, wherein the natural language processing uses a model obtained by machine learning,
the processor is further configured to:
the model is trained using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
17. The apparatus of claim 16, wherein the model comprises: word vector model, recurrent neural network model, long short-term memory network model.
18. The apparatus of claim 11, wherein the second threshold is less than the first threshold.
19. The apparatus of claim 11, wherein the processor is further configured to:
determining a set of standard diagnostic codes from the diagnostic codes; and
a standard diagnostic description corresponding to the standard diagnostic code in the set is obtained as a subset of the plurality of standard diagnostic descriptions.
20. The apparatus of claim 19, wherein the diagnostic code is an international disease classification code, the processor further configured to:
the standard diagnostic codes contained in the major or minor class of diagnostic codes are taken as the set of standard diagnostic codes.
21. A non-transitory computer readable storage medium storing instructions that, when executed by a processor, are configured to perform the method of any one of claims 1-10.
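Technical solutions 9 and 10 above derive a set of standard diagnostic codes from the major or minor class of the received code. A possible way to compute those classes is sketched below. It assumes that the codes follow the usual ICD-10 layout, in which the three characters before the dot name the category and the first character after the dot names the subcategory; mapping these to the "major" and "minor" class of the disclosure is an interpretation, and the function names are hypothetical.

```python
from typing import List


def major_class(code: str) -> str:
    """Return the part of the code before the dot, e.g. 'A41' for 'A41.300'."""
    return code.split(".", 1)[0].upper()


def minor_class(code: str) -> str:
    """Return the category plus the first character after the dot.

    For example 'K72.0' for 'K72.005'. Codes without a dot are returned
    unchanged apart from upper-casing.
    """
    head, _, tail = code.partition(".")
    return f"{head}.{tail[:1]}".upper() if tail else head.upper()


def codes_in_class(standard_codes: List[str], class_prefix: str) -> List[str]:
    """Keep the standard codes that fall under the given class prefix."""
    return [c for c in standard_codes if c.upper().startswith(class_prefix)]
```

Choosing the minor class yields a smaller subset than the major class, which trades recall for fewer comparisons; the disclosure allows either.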
As used herein, the word "exemplary" means "serving as an example, instance, or illustration," and not as a "model" to be replicated accurately. Any implementation described herein by way of example is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, this disclosure is not limited by any expressed or implied theory presented in the preceding technical field, background, brief summary or the detailed description.
In addition, certain terminology may be used in the following description for the purpose of reference only and is therefore not intended to be limiting. For example, the terms "first," "second," and other such numerical terms referring to structures or elements do not imply a sequence or order unless clearly indicated by the context.
It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components, and/or groups thereof.
In this disclosure, the term "providing" is used in a broad sense to cover all ways of obtaining an object, and thus "providing an object" includes, but is not limited to, "purchasing," "preparing/manufacturing," "arranging/setting," "installing/assembling," and/or "ordering" an object, etc.
Those skilled in the art will recognize that the boundaries between the above described operations are merely illustrative. The operations may be combined into a single operation, the single operation may be distributed among additional operations, and the operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in other various embodiments. However, other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. The embodiments disclosed herein may be combined in any desired manner without departing from the spirit and scope of the present disclosure. Those skilled in the art will also appreciate that various modifications might be made to the embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.
Claims (19)
1. A method of determining a standard diagnostic code for a disease, comprising:
receiving a diagnostic code and a diagnostic description of a disease from an external institution;
matching the diagnostic description with a plurality of standard diagnostic descriptions in a database through natural language processing to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of a standard diagnostic description in the subset to the diagnostic description; and
in the case where it is determined that the maximum value of the second similarity is greater than or equal to a predetermined second threshold value, a standard diagnostic description corresponding to the maximum value of the second similarity is taken as a standard diagnostic description of the disease, a standard diagnostic code corresponding to the standard diagnostic description is taken as a standard diagnostic code of the disease,
wherein determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code comprises:
determining a set of standard diagnostic codes from the diagnostic codes; and
a standard diagnostic description corresponding to the standard diagnostic code in the set is obtained as a subset of the plurality of standard diagnostic descriptions.
2. The method of claim 1, wherein, in the event that it is determined that the maximum value of the second similarity is less than the second threshold, selecting a predetermined number of standard diagnostic descriptions of the subset of standard diagnostic descriptions having greater second similarity as candidate standard diagnostic descriptions; and
one of the candidate standard diagnostic descriptions is selected as the standard diagnostic description of the disease.
3. The method of claim 2, wherein selecting one of the candidate standard diagnostic descriptions as the standard diagnostic description corresponding to the disease comprises: one of the candidate standard diagnostic descriptions is manually selected by a human as the standard diagnostic description of the disease.
4. A method according to claim 2 or 3, further comprising: recording the correspondence between the selected standard diagnostic description and the diagnostic description.
5. The method of claim 4, further comprising:
receiving another diagnostic description of the disease from the external institution; and
determining a standard diagnostic description of the disease according to the correspondence and the other diagnostic description.
6. The method of claim 2 or 3, wherein the natural language processing uses a model obtained by machine learning,
the method further comprises the steps of:
the model is trained using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
7. The method of claim 6, wherein the model comprises: word vector model, recurrent neural network model, long short-term memory network model.
8. The method of claim 1, wherein the second threshold is less than the first threshold.
9. The method of claim 1, wherein the diagnostic code is an international disease classification standard code, and a standard diagnostic code contained in a major or minor class of the diagnostic code is taken as the set of standard diagnostic codes.
10. An apparatus for determining standard diagnostic codes for a disease comprising a memory and a processor, wherein when the processor executes instructions stored in the memory, the processor is configured to:
receiving a diagnostic code and a diagnostic description of a disease from an external institution;
matching the diagnostic description with a plurality of standard diagnostic descriptions in a database through natural language processing to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of a standard diagnostic description in the subset to the diagnostic description; and
in the case where it is determined that the maximum value of the second similarity is greater than or equal to a predetermined second threshold value, a standard diagnostic description corresponding to the maximum value of the second similarity is taken as a standard diagnostic description of the disease, a standard diagnostic code corresponding to the standard diagnostic description is taken as a standard diagnostic code of the disease,
wherein the processor is further configured to:
determining a set of standard diagnostic codes from the diagnostic codes; and
a standard diagnostic description corresponding to the standard diagnostic code in the set is obtained as a subset of the plurality of standard diagnostic descriptions.
11. The apparatus of claim 10, wherein the processor is further configured to:
selecting, as candidate standard diagnostic descriptions, a predetermined number of standard diagnostic descriptions having a larger second similarity among the standard diagnostic descriptions of the subset, in a case where it is determined that the maximum value of the second similarity is smaller than the second threshold; and
one of the candidate standard diagnostic descriptions is selected as the standard diagnostic description of the disease.
12. The apparatus of claim 11, wherein the processor is further configured to: one of the candidate standard diagnostic descriptions manually selected by a human is received as a standard diagnostic description of the disease.
13. The apparatus of claim 11 or 12, wherein the processor is further configured to:
recording the correspondence between the selected standard diagnostic description and the diagnostic description.
14. The apparatus of claim 13, wherein the processor is further configured to:
receiving another diagnostic description of the disease from the external institution; and
determining a standard diagnostic description of the disease according to the correspondence and the other diagnostic description.
15. The apparatus of claim 11 or 12, wherein the natural language processing uses a model obtained by machine learning,
the processor is further configured to:
the model is trained using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
16. The apparatus of claim 15, wherein the model comprises: word vector model, recurrent neural network model, long short-term memory network model.
17. The apparatus of claim 10, wherein the second threshold is less than the first threshold.
18. The apparatus of claim 10, wherein the diagnostic code is an international disease classification standard code, the processor further configured to:
the standard diagnostic codes contained in the major or minor class of diagnostic codes are taken as the set of standard diagnostic codes.
19. A non-transitory computer readable storage medium storing instructions which, when executed by a processor, are configured to perform the method of any of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010631296.2A CN111814432B (en) | 2020-07-03 | 2020-07-03 | Method and apparatus for determining standard diagnostic codes for disease |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111814432A CN111814432A (en) | 2020-10-23 |
CN111814432B CN111814432B (en) | 2024-04-16 |
Family
ID=72855269
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010631296.2A Active CN111814432B (en) | 2020-07-03 | 2020-07-03 | Method and apparatus for determining standard diagnostic codes for disease |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814432B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723056B (en) * | 2021-08-19 | 2024-07-16 | 杭州火树科技有限公司 | ICD code conversion method, ICD code conversion device, computing equipment and storage medium |
CN116385566B (en) * | 2022-05-27 | 2024-04-30 | 上海玄戒技术有限公司 | Light source estimation method, device, electronic equipment, chip and storage medium |
CN116631614A (en) * | 2023-07-24 | 2023-08-22 | 北京惠每云科技有限公司 | Treatment scheme generation method, treatment scheme generation device, electronic equipment and storage medium |
CN116884630B (en) * | 2023-09-06 | 2024-08-23 | 深圳达实旗云健康科技有限公司 | Method for improving disease automatic coding efficiency |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065157A (en) * | 2018-08-01 | 2018-12-21 | 中国人民解放军第二军医大学 | A kind of Disease Diagnosis Standard coded Recommendation list determines method and system |
EP3637431A1 (en) * | 2018-10-12 | 2020-04-15 | Fujitsu Limited | Medical diagnostic aid and method |
Non-Patent Citations (2)
Title |
---|
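Discussion of intelligent assisted coding methods for ICD-10; Yang Hua; Wang Kai; Zheng Xiaohua; Chinese Medical Record (No. 09); full text * |
Design and application of an automatic coding system for disease diagnoses; Cheng Cheng; Huang Hao; Ou Dong; Chinese Medical Record (No. 09); full text * |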
Also Published As
Publication number | Publication date |
---|---|
CN111814432A (en) | 2020-10-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||