CN111814432A - Method and apparatus for determining standard diagnostic codes for diseases - Google Patents

Publication number
CN111814432A
Authority
CN
China
Prior art keywords
diagnostic
standard
description
disease
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010631296.2A
Other languages
Chinese (zh)
Other versions
CN111814432B (en)
Inventor
何苗
陈国锋
刘建立
苏仕颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Insurance Technology Co ltd
Original Assignee
China Insurance Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Insurance Technology Co ltd filed Critical China Insurance Technology Co ltd
Priority to CN202010631296.2A priority Critical patent/CN111814432B/en
Publication of CN111814432A publication Critical patent/CN111814432A/en
Application granted granted Critical
Publication of CN111814432B publication Critical patent/CN111814432B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The present disclosure relates to a method of determining a standard diagnostic code for a disease, comprising: receiving a diagnostic code and a diagnostic description of a disease from an external entity; matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions; determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold; determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code; obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and in the case that the maximum value of the second similarity is determined to be larger than or equal to a predetermined second threshold value, taking the standard diagnosis description corresponding to the maximum value of the second similarity as the standard diagnosis description of the disease, and taking the standard diagnosis code corresponding to the standard diagnosis description as the standard diagnosis code of the disease.

Description

Method and apparatus for determining standard diagnostic codes for diseases
Technical Field
The present disclosure relates to methods and apparatus for determining a standard diagnostic code for a disease.
Background
Insurance institutions often need to obtain medical information about insured persons from medical institutions during insurance claim settlement. The medical information provided by a medical institution typically includes diagnostic codes and diagnostic descriptions. The insurance institution decides whether or not to pay out for the insured person based on the medical information provided by the medical institution.
Disclosure of Invention
According to one aspect of the present disclosure, there is provided a method of determining a standard diagnostic code for a disease, comprising: receiving a diagnostic code and a diagnostic description of a disease from an external entity; matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions; determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold; determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code; obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and in the case that the maximum value of the second similarity is determined to be larger than or equal to a predetermined second threshold value, taking the standard diagnosis description corresponding to the maximum value of the second similarity as the standard diagnosis description of the disease, and taking the standard diagnosis code corresponding to the standard diagnosis description as the standard diagnosis code of the disease.
In some embodiments according to the present disclosure, in a case where it is determined that the maximum value of the second similarity is smaller than the second threshold value, a predetermined number of standard diagnostic descriptions of the subset having a larger second similarity are selected as candidate standard diagnostic descriptions; and selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description for the disease.
In some embodiments according to the disclosure, selecting one of the candidate standard diagnostic descriptions as the standard diagnostic description corresponding to the disease comprises: manually selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description of the disease by a human.
In some embodiments according to the present disclosure, the method further comprises: recording a correspondence between the selected standard diagnostic description and the diagnostic description.
In some embodiments according to the present disclosure, the method further comprises: receiving another diagnostic description of the disease from an external entity; and determining a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
In some embodiments according to the disclosure, the natural language processing uses a model obtained by machine learning, the method further comprising: training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
In some embodiments according to the present disclosure, the model comprises: word vectorization model, recurrent neural network model, long-short term memory network model.
In some embodiments according to the present disclosure, the second threshold is less than the first threshold.
In some embodiments according to the present disclosure, determining the subset of the plurality of standard diagnostic descriptions from the diagnostic code comprises: determining a set of standard diagnostic codes from the diagnostic codes; and obtaining standard diagnostic descriptions corresponding to the standard diagnostic codes in the set as a subset of the plurality of standard diagnostic descriptions.
In some embodiments according to the present disclosure, the diagnostic code is an International Classification of Diseases (ICD) code, and the standard diagnostic codes included in the major class or minor class of the diagnostic code are taken as the set of standard diagnostic codes.
According to another aspect of the present disclosure, there is provided an apparatus for determining a standard diagnostic code for a disease, comprising a memory and a processor, wherein when the processor executes instructions stored in the memory, the processor is configured to: receiving a diagnostic code and a diagnostic description of a disease from an external entity; matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions; determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold; determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code; obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and in the case that the maximum value of the second similarity is determined to be larger than or equal to a predetermined second threshold value, taking the standard diagnosis description corresponding to the maximum value of the second similarity as the standard diagnosis description of the disease, and taking the standard diagnosis code corresponding to the standard diagnosis description as the standard diagnosis code of the disease.
In some embodiments according to the disclosure, the processor is further configured to: selecting a predetermined number of standard diagnostic descriptions of the subset having a greater second similarity as candidate standard diagnostic descriptions if it is determined that the maximum value of the second similarity is less than the second threshold; and selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description for the disease.
In some embodiments according to the disclosure, the processor is further configured to: receiving one of the candidate standard diagnostic descriptions manually selected as a standard diagnostic description of the disease.
In some embodiments according to the disclosure, the processor is further configured to: recording a correspondence between the selected standard diagnostic description and the diagnostic description.
In some embodiments according to the disclosure, the processor is further configured to: receiving another diagnostic description of the disease from an external entity; and determining a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
In some embodiments according to the disclosure, the natural language processing uses a model obtained by machine learning, the processor further configured to: training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
In some embodiments according to the present disclosure, the model comprises: word vectorization model, recurrent neural network model, long-short term memory network model.
In some embodiments according to the present disclosure, the second threshold is less than the first threshold.
In some embodiments according to the disclosure, the processor is further configured to: determining a set of standard diagnostic codes from the diagnostic codes; and obtaining standard diagnostic descriptions corresponding to the standard diagnostic codes in the set as a subset of the plurality of standard diagnostic descriptions.
In some embodiments according to the disclosure, the diagnostic code is an International Classification of Diseases (ICD) code, the processor further configured to: take the standard diagnostic codes included in the major class or minor class of the diagnostic code as the set of standard diagnostic codes.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon instructions that, when executed by a processor, are configured to perform the above-described method of determining a standard diagnostic code for a disease.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
Fig. 1 shows a flow diagram of a method of determining a standard diagnostic code for a disease according to an embodiment of the present disclosure.
Fig. 2 is a block diagram of a computing device according to an example embodiment of the present disclosure.
Note that in the embodiments described below, the same reference numerals are used in common between different drawings to denote the same portions or portions having the same functions, and a repetitive description thereof will be omitted. In this specification, like reference numerals and letters are used to designate like items, and therefore, once an item is defined in one drawing, further discussion thereof is not required in subsequent drawings.
For ease of understanding, the positions, sizes, ranges, and the like of the respective structures shown in the drawings sometimes do not indicate the actual positions, sizes, ranges, and the like. Therefore, the disclosed invention is not limited to the positions, sizes, ranges, and the like disclosed in the drawings.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
Diagnostic codes for diseases provided by different medical institutions may be based on different coding standards. Even under the same coding standard, the same disease may be assigned different codes because of differences in the skill and knowledge of the staff who do the coding. In addition, diagnostic descriptions of diseases are usually written by physicians in medical institutions. Different physicians may use different descriptions, and the same disease may be described using different terms. This increases the complexity of the claim settlement work performed by the insurance institution.
In some embodiments of the present disclosure, diagnostic codes and diagnostic descriptions of diseases provided by external entities (e.g., medical institutions) may be converted into the insurance institution's own standard diagnostic codes and standard diagnostic descriptions, thereby facilitating the insurance institution's processing of underwriting and claims.
Fig. 1 shows a flow diagram of a method of determining a standard diagnostic code for a disease according to an embodiment of the present disclosure.
As shown in Fig. 1, first, the insurance institution may receive medical information about an insured person from an external institution (e.g., a medical institution) (step 101). Such medical information may include diagnostic codes and diagnostic descriptions of diseases.
Diagnostic coding is a method of coding diseases according to their kind. One commonly used diagnostic coding system is the International Classification of Diseases (ICD). Currently, two versions, ICD-10 and ICD-11, are in wide use. In these codes, each disease is assigned a seven-digit code, in which the first three digits are referred to as the major class and the first four digits as the minor class. For example, code A01.000B represents typhoid fever. The first three digits of the code, A01, represent the major class of typhoid and paratyphoid fevers, and the first four digits, A01.0, represent the minor class of typhoid fever.
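The major and minor classes described above can be read directly off the code string. The following is a minimal sketch, assuming codes are written as three characters, a dot, and a suffix (as in A01.000B); the function name `icd_classes` is hypothetical and not part of the disclosure.

```python
from typing import Tuple


def icd_classes(code: str) -> Tuple[str, str]:
    """Return (major_class, minor_class) for a code such as 'A01.000B'.

    Assumes the layout 'XNN.N...' used in the examples of this document.
    """
    # Normalize case and drop stray spaces such as those in 'a 41.300'.
    normalized = code.strip().upper().replace(" ", "")
    if len(normalized) < 5 or normalized[3] != ".":
        raise ValueError(f"unexpected code format: {code!r}")
    major = normalized[:3]  # first three digits, e.g. 'A01'
    minor = normalized[:5]  # first four digits plus the dot, e.g. 'A01.0'
    return major, minor


print(icd_classes("A01.000B"))  # ('A01', 'A01.0')
```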
A diagnostic description is a description of the condition of a patient. The diagnostic description may generally include information such as a description of the condition. For example, in some exemplary embodiments according to the present disclosure, the diagnosis description may be a disease name corresponding to the diagnosis code.
After the insurance institution receives the diagnostic code and diagnostic description from the medical institution, the received diagnostic description is matched against the insurance institution's standard diagnostic descriptions (step 102).
Diagnostic codes from medical institutions may in many cases be inaccurate or erroneous. Therefore, the insurance institution needs to determine the correct diagnostic code (i.e., the standard diagnostic code) from the diagnostic description provided by the medical institution. In some exemplary embodiments, the standard diagnostic codes and their corresponding standard diagnostic descriptions may be stored in a database. The insurance institution may match the diagnostic description provided by the medical institution against the standard diagnostic descriptions in the database to find the standard diagnostic description that is most similar to it. The standard diagnostic code corresponding to that closest standard diagnostic description may be used as the diagnostic code for the patient's disease.
In some embodiments according to the present disclosure, the diagnostic description from the medical institution may be matched against a plurality of standard diagnostic descriptions in the insurance institution's database through natural language processing. In the natural language processing, the diagnostic description provided by the medical institution first needs to be segmented into words. To improve segmentation accuracy, a medical word-segmentation lexicon, a medical synonym lexicon, a medical diagnostic-description sentence-segmentation library, and the like may be built specifically for the medical field, so that the diagnostic description can be segmented accurately. The segmented diagnostic description may then be semantically matched against the standard diagnostic descriptions in the database.
Using semantic matching, the similarity (i.e., the first similarity) between each standard diagnostic description in the database and the diagnostic description from the medical institution can be obtained. In many cases, the standard diagnostic description with the highest similarity may be taken as the match for the diagnostic description from the medical institution, and that standard diagnostic description and its corresponding standard diagnostic code may be output as the result. For example, a predetermined threshold (first threshold) may be set. Where the maximum similarity is greater than or equal to the first threshold, the standard diagnostic description and standard diagnostic code corresponding to that maximum may be used directly as the standard diagnostic description and standard diagnostic code for the diagnostic description and diagnostic code provided by the medical institution.
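As an illustration of lexicon-based word segmentation, the sketch below uses forward maximum matching, a common simple technique that is swapped in here for illustration; the disclosure does not state which segmentation algorithm is used. The lexicon contents and the name `forward_max_match` are assumptions. The sample strings are Chinese because the original descriptions are Chinese: 流感嗜血杆菌 is "Haemophilus influenzae", 败血症 is "sepsis", and 肺炎 is "pneumonia".

```python
from typing import List, Set

# Hypothetical medical lexicon; a real one would hold many thousands of terms.
MEDICAL_LEXICON: Set[str] = {"流感嗜血杆菌", "败血症", "肺炎"}


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 8) -> List[str]:
    """Segment text by repeatedly taking the longest lexicon entry at the cursor."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a lexicon entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


print(forward_max_match("流感嗜血杆菌败血症", MEDICAL_LEXICON))
# ['流感嗜血杆菌', '败血症']
```

Without the domain lexicon, the same input would fall apart into single characters, which is why the text recommends building medical-specific resources.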
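The first-stage comparison can be sketched as follows. This is only an illustration: the similarity function here is a character-level ratio from Python's standard `difflib` module, used as a stand-in for the semantic matching model, which the disclosure does not specify. The names `similarity` and `first_stage_match` and the sample database are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; a real system would use a semantic model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def first_stage_match(
    description: str,
    standard_db: Dict[str, str],  # standard code -> standard description
    first_threshold: float = 0.95,
) -> Tuple[str, float, bool]:
    """Return (best standard code, its similarity, whether it meets the threshold)."""
    scores = {code: similarity(description, text) for code, text in standard_db.items()}
    best_code = max(scores, key=scores.get)
    return best_code, scores[best_code], scores[best_code] >= first_threshold


db = {
    "M30.3": "mucocutaneous lymph node syndrome [Kawasaki disease]",
    "K72.0": "acute and subacute liver failure",
}
code, score, accepted = first_stage_match(
    "mucocutaneous lymph node syndrome [Kawasaki disease]", db
)
print(code, accepted)  # M30.3 True
```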
However, in the case where the similarity of the standard diagnostic description with the highest similarity is still low, for example, the similarity is smaller than a predetermined threshold (first threshold) (step 103), the standard diagnostic description and the standard diagnostic code obtained at this time may not be accurate. Therefore, when the similarity of the standard diagnosis description with the highest similarity is smaller than the predetermined first threshold, the diagnosis code provided by the medical institution needs to be further utilized.
A subset of the standard diagnostic descriptions in the database may be determined from the diagnostic code provided by the medical institution (step 104). For example, where the diagnostic code follows an ICD version (e.g., ICD-10 or ICD-11), the first three or first four digits of the diagnostic code indicate the classification of the disease: the first three digits give the major class and the first four digits give the minor class. Accordingly, the standard diagnostic descriptions corresponding to all standard diagnostic codes in the major class or minor class of the diagnostic code provided by the medical institution may be used as the subset.
Then, the similarity (second similarity) of each standard diagnostic description in the subset to the diagnostic description provided by the medical institution may be obtained (step 105). In step 102 above, the similarity (first similarity) between each standard diagnostic description in the database and the diagnostic description provided by the medical institution has already been calculated. In some embodiments according to the present disclosure, these similarities may be stored, temporarily or persistently, so that they do not have to be recalculated in step 105; it suffices to read the similarities for the required standard diagnostic descriptions from those already stored.
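Selecting the subset amounts to a prefix filter on the standard codes. A minimal sketch follows, assuming the database maps standard codes to standard descriptions; the function name `subset_by_class`, the `level` parameter, and the sample entries are illustrative assumptions.

```python
from typing import Dict


def subset_by_class(
    external_code: str,
    standard_db: Dict[str, str],  # standard code -> standard description
    level: str = "major",         # "major" = first three digits, "minor" = first four
) -> Dict[str, str]:
    """Keep only standard entries in the same major or minor class as external_code."""
    normalized = external_code.strip().upper().replace(" ", "")
    # 'A41.300' -> 'A41' (major) or 'A41.3' (minor; the dot counts as one character)
    prefix = normalized[:3] if level == "major" else normalized[:5]
    return {
        code: text
        for code, text in standard_db.items()
        if code.upper().startswith(prefix)
    }


db = {
    "A41.3": "Haemophilus influenzae sepsis",
    "A41.9": "sepsis, unspecified",
    "J14.x00": "Haemophilus influenzae pneumonia",
}
print(sorted(subset_by_class("A41.300", db)))           # ['A41.3', 'A41.9']
print(sorted(subset_by_class("A41.300", db, "minor")))  # ['A41.3']
```

Choosing the major class gives a larger subset and tolerates more coding error at the minor-class level; the minor class is stricter.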
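Reusing the stored first similarities in step 105 can be sketched as a simple dictionary lookup. The helper names below are hypothetical, and the scores are the ones quoted for Example 3 later in this document.

```python
from typing import Dict, Iterable, Optional, Tuple


def second_similarities(
    first_scores: Dict[str, float],  # standard code -> first similarity from step 102
    subset_codes: Iterable[str],     # standard codes selected in step 104
) -> Dict[str, float]:
    """Read the second similarities from the cached first similarities."""
    return {code: first_scores[code] for code in subset_codes if code in first_scores}


def best_of(scores: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (code, similarity) with the highest similarity, or None if empty."""
    if not scores:
        return None
    code = max(scores, key=scores.get)
    return code, scores[code]


cached = {"J14.x00": 0.9488, "A41.3": 0.9366}
print(best_of(second_similarities(cached, ["A41.3"])))  # ('A41.3', 0.9366)
```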
Of course, in some embodiments according to the present disclosure, the similarity of each standard diagnostic description in the subset to the diagnostic description provided by the medical institution may be calculated again in step 105. For example, the similarity (second similarity) may be recalculated using a different segmentation model or semantic analysis model than step 102.
Next, the maximum value of the second similarities may be compared with a predetermined threshold (second threshold) (step 106).
If the maximum value of the second similarity is greater than or equal to the second threshold, the standard diagnostic description and the standard diagnostic code corresponding to the second similarity having the maximum value may be used as the standard diagnostic description and the standard diagnostic code corresponding to the diagnostic description and the diagnostic code of the medical institution.
If the maximum value of the second degree of similarity is still less than the second threshold value, a predetermined number of standard diagnostic descriptions may be considered as candidate standard diagnostic descriptions (step 107). For example, a predetermined number of standard diagnostic descriptions with a larger similarity may be regarded as candidate standard diagnostic descriptions according to the similarity (second similarity) of the standard diagnostic descriptions in the subset.
Further, in some embodiments according to the present disclosure, the second threshold may be less than the first threshold. For example, when the first threshold is 0.95, the second threshold may be set to 0.85.
Next, a standard diagnostic description corresponding to a diagnostic description provided by the medical institution may be further selected from the candidate standard diagnostic descriptions (step 108). For example, a selection may be made manually from candidate standard diagnostic descriptions; or may use other natural language processing and semantic matching models to select a standard diagnostic description from the candidate standard diagnostic descriptions that matches the diagnostic description provided by the medical facility.
Finally, the standard diagnostic descriptions determined to match and the corresponding standard diagnostic codes are output (step 109).
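Steps 102 through 109 can be put together in one function. The sketch below is an illustration under several assumptions: the similarity function is pluggable and defaults to a `difflib` character ratio rather than a semantic model, the subset is taken from the major class only, and the `"no_subset"` status for an empty subset is an added convention that the disclosure does not describe. All names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Match = Tuple[str, str, float]  # (standard code, standard description, similarity)


def default_similarity(a: str, b: str) -> float:
    """Stand-in similarity; not the model used in the disclosure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(
    external_code: str,
    description: str,
    standard_db: Dict[str, str],
    sim: Callable[[str, str], float] = default_similarity,
    first_threshold: float = 0.95,
    second_threshold: float = 0.85,
    num_candidates: int = 2,
) -> Tuple[str, List[Match]]:
    """Return (status, matches) following the two-threshold flow of Fig. 1."""
    if not standard_db:
        raise ValueError("standard_db must not be empty")
    # Step 102: first similarities against the whole database.
    first = {code: sim(description, text) for code, text in standard_db.items()}
    best = max(first, key=first.get)
    # Step 103: accept directly when the first threshold is met.
    if first[best] >= first_threshold:
        return "matched_first_stage", [(best, standard_db[best], first[best])]
    # Step 104: subset sharing the major class of the external code.
    prefix = external_code.strip().upper().replace(" ", "")[:3]
    # Step 105: reuse the stored similarities for the subset, highest first.
    ranked = sorted(
        ((c, s) for c, s in first.items() if c.upper().startswith(prefix)),
        key=lambda item: item[1],
        reverse=True,
    )
    if not ranked:
        return "no_subset", []  # assumption: the source does not cover this case
    # Step 106: compare the best second similarity with the second threshold.
    if ranked[0][1] >= second_threshold:
        status, top = "matched_second_stage", ranked[:1]
    else:
        # Step 107: hand the top candidates to a selector (step 108).
        status, top = "needs_selection", ranked[:num_candidates]
    return status, [(c, standard_db[c], s) for c, s in top]
```

Passing a different `sim` makes it easy to swap in another matching model without touching the control flow.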
Furthermore, in some embodiments according to the present disclosure, when a standard diagnostic description corresponding to a diagnostic description provided by a medical institution is selected from the candidate standard diagnostic descriptions in step 108, the selected standard diagnostic description and the diagnostic description provided by the medical institution and the corresponding relationship therebetween may be recorded. For example, it may be recorded in a table (or database). In this way, when a similar diagnosis description is provided next time from the medical institution, the records in the table can be referred to, thereby directly determining the standard diagnosis description and the standard diagnosis code corresponding to the diagnosis description provided by the medical institution. It should be appreciated that the step of matching with reference to the table may be performed at any suitable location of the flow shown in fig. 1, as desired. For example, it may be between step 101 and step 102, or between step 103 and step 104, or between step 106 and step 107, etc.
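A correspondence table of this kind can be sketched as a small in-memory mapping. The sketch below is simplified: it only matches descriptions that are identical after case and whitespace normalization, whereas the text above speaks of "similar" descriptions, and a real system would likely persist the records in a database. The class name `CorrespondenceTable` is hypothetical.

```python
from typing import Dict, Optional, Tuple


class CorrespondenceTable:
    """Remembers which standard entry was selected for an external description."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[str, str]] = {}

    @staticmethod
    def _key(description: str) -> str:
        # Normalize case and collapse whitespace so trivial variants still hit.
        return " ".join(description.lower().split())

    def record(self, description: str, standard_code: str, standard_description: str) -> None:
        self._records[self._key(description)] = (standard_code, standard_description)

    def lookup(self, description: str) -> Optional[Tuple[str, str]]:
        """Return (standard code, standard description), or None if never recorded."""
        return self._records.get(self._key(description))


table = CorrespondenceTable()
table.record("late-onset liver failure", "K72.900", "liver failure")
print(table.lookup("Late-onset  liver failure"))  # ('K72.900', 'liver failure')
```

Consulting the table before step 102 skips the similarity computation entirely for descriptions that have already been resolved.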
Further, in some embodiments according to the present disclosure, the natural language processing and the semantic matching processing described above may obtain the respective models through machine learning. In this case, the standard diagnostic description corresponding to the diagnostic description provided by the medical institution, selected from the candidate standard diagnostic descriptions at step 108, may be fed back to the model as a training sample. By retraining the model with the training sample, the new model can find the matched standard diagnostic description more quickly and accurately in the subsequent natural language processing and semantic matching processing.
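One way to turn a selection from step 108 into training samples is sketched below. The positive sample (the selected pair) follows the text above; labeling the non-selected candidates as negative samples is an added assumption, as are the function name and the tuple layout.

```python
from typing import List, Tuple

Sample = Tuple[str, str, int]  # (external description, standard description, label)


def build_training_samples(
    description: str,
    candidates: List[Tuple[str, str]],  # (standard code, standard description)
    selected_code: str,
) -> List[Sample]:
    """Label the selected candidate 1 and the others 0 (negatives are an assumption)."""
    if selected_code not in {code for code, _ in candidates}:
        raise ValueError("selected_code is not among the candidates")
    return [
        (description, std_description, 1 if code == selected_code else 0)
        for code, std_description in candidates
    ]


samples = build_training_samples(
    "late-onset liver failure",
    [("K72.0", "acute and subacute liver failure"), ("K72.900", "liver failure")],
    selected_code="K72.900",
)
for sample in samples:
    print(sample)
```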
It should be understood that those skilled in the art may use a variety of models obtained by machine learning, for example, a word vectorization (Word2Vec) model, a Recurrent Neural Network (RNN) model, a Long Short-Term Memory (LSTM) network model, and so on. The present disclosure is not limited in this respect.
TABLE 1 (provided as an image in the original publication; not reproduced here)
Table 1 gives some illustrative embodiments according to the present disclosure.
In example 1, the name of the external code (i.e., diagnostic description) provided by an external entity (e.g., medical institution) is "acute non-viral hepatitis" and the diagnostic code is K72.004. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name with a standard code name (i.e., a standard diagnostic description) of the insurance agency. Through natural language processing, the standard code name most similar to the external code name is determined in the database as "acute hepatitis (non-viral)", and the similarity is 0.9999. In the case where the predetermined first threshold value is, for example, 0.95, since the degree of similarity is greater than the first threshold value, the most similar standard code name "acute hepatitis (non-viral)" and the corresponding standard code K72.000A may be output as a result.
In example 2, the external code name (i.e., diagnostic description) provided by an external institution (e.g., medical institution) is "mucocutaneous lymph node syndrome [ kawasaki disease ]", and the diagnostic code is M30.300. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name with a standard code name (i.e., a standard diagnostic description) of the insurance agency. Through natural language processing, the standard code name most similar to the external code name was determined in the database as "mucocutaneous lymph node syndrome [ kawasaki disease ]", and the similarity was 0.9808. In the case where the predetermined first threshold value is, for example, 0.95, since the degree of similarity is greater than the first threshold value, the most similar standard code name "mucocutaneous lymph node syndrome [ kawasaki disease ]" and the corresponding standard code M30.3 can be output as a result.
In example 3, the external code name (i.e., diagnostic description) provided by an external agency (e.g., medical agency) is "haemophilus influenzae sepsis" and the diagnostic code is a 41.300. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name with a standard code name (i.e., a standard diagnostic description) of the insurance agency. Through natural language processing, a search and matching is performed in the database. Among the found results, the standard diagnosis with the highest similarity was described as "haemophilus influenzae pneumonia", the similarity was 0.9488, and the corresponding standard diagnosis code was j14.x 00. However, in the case where the predetermined first threshold value is, for example, 0.95, there is no matching result whose degree of similarity is larger than the first threshold value. In this case, it is necessary to further determine a subset based on the diagnostic code a41.300 provided by the medical structure. For example, all standard diagnostic codes belonging to the broad class a41 may be considered as a subset. Then, the standard code name with the highest similarity in the subset and the corresponding standard diagnostic code are selected for judgment. For example, the standard coding name "haemophilus influenzae sepsis" has the highest similarity 0.9366 to the "haemophilus influenzae sepsis" provided by external agencies in this subset. In the case where the predetermined second threshold value is 0.85, the similarity is larger than the second threshold value. Thus, the result of the final match can be determined for the standard encoding named "haemophilus influenzae sepsis" and the standard diagnostic encoding a41.3, and output.
In example 4, the external code name (i.e., diagnostic description) provided by an external institution (e.g., a medical institution) is "late-onset liver failure", and the diagnostic code is K72.005. After receiving the external code name and the diagnostic code, the insurance agency matches the received external code name against its own standard code names (i.e., standard diagnostic descriptions). Through natural language processing, searching and matching are performed in the database. Among the results found, the standard diagnostic description with the highest similarity is "late schizophrenia", with a similarity of 0.6838 and a corresponding standard diagnostic code of F20.802. However, where the predetermined first threshold is, for example, 0.95, no matching result has a similarity greater than the first threshold. In this case, a subset must be further determined based on the diagnostic code K72.005 provided by the medical institution. For example, all standard diagnostic codes belonging to the major class K72 may be taken as the subset. Then, the standard code name with the highest similarity in the subset, together with its corresponding standard diagnostic code, is selected for evaluation. For example, within this subset the standard code name "acute and subacute liver failure" has the highest similarity, 0.6233, to the "late-onset liver failure" provided by the external institution, and corresponds to the standard diagnostic code K72.0. However, where the predetermined second threshold is 0.85, this similarity is still less than the second threshold. In this case, several standard code names with the highest similarities, together with their similarities and standard diagnostic codes, are output as candidates, and one of the candidates is selected as the final matching result, for example manually. For example, two candidates may be output. The first candidate is the standard code name "acute and subacute liver failure", with similarity 0.6233 and standard diagnostic code K72.0; the second candidate is the standard code name "liver failure", with similarity 0.5918 and standard diagnostic code K72.900. A worker may then manually select one of the two candidates (e.g., the second candidate) to be output as the final matching result. Of course, the number of candidate standard code names is not limited to two, and may be one, three, or more.
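The candidate step in example 4 amounts to ranking the standard descriptions of the subset by similarity and keeping a predetermined number of them. A minimal sketch, assuming a hypothetical helper `top_candidates` and a caller-supplied `similarity` function (neither is named in the disclosure):

```python
import heapq
from typing import Callable, Dict, List, Tuple


def top_candidates(
    description: str,
    subset: Dict[str, str],  # standard diagnostic code -> standard description
    similarity: Callable[[str, str], float],
    k: int = 2,
) -> List[Tuple[float, str, str]]:
    """Return up to k (similarity, code, description) tuples, best first."""
    scored = (
        (similarity(description, name), code, name)
        for code, name in subset.items()
    )
    # nlargest returns its result sorted from highest to lowest similarity.
    return heapq.nlargest(k, scored, key=lambda item: item[0])
```

The returned list would be shown to the worker, whose choice becomes the final matching result. The value of `k` corresponds to the predetermined number of candidates.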
In addition, in some embodiments, manually selected matching results may also be recorded. For example, the following may be recorded in a table or database: the standard code name that matches the external code name "late-onset liver failure" is "liver failure". Then, when a new diagnostic description and diagnostic code are received from an external institution and need to be matched, the matching relations recorded in the table or database can be searched first. This avoids manually matching the same external code name again and improves both matching speed and matching accuracy.
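One way to picture the recorded matching relation is as a lookup table that is consulted before any similarity computation. The sketch below is an assumption-laden illustration: the class name `MatchRecord`, the in-memory dictionary (standing in for the table or database), and the case and whitespace normalization of the key are all hypothetical choices, not details given in the disclosure.

```python
from typing import Dict, Optional, Tuple


class MatchRecord:
    """In-memory record of manually confirmed matches (stand-in for a table)."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[str, str]] = {}

    @staticmethod
    def _key(description: str) -> str:
        # Assumed normalization: lower-case and collapse runs of whitespace.
        return " ".join(description.lower().split())

    def record(self, external_description: str,
               standard_code: str, standard_description: str) -> None:
        """Store the standard entry chosen for an external description."""
        self._records[self._key(external_description)] = (
            standard_code, standard_description)

    def lookup(self, external_description: str) -> Optional[Tuple[str, str]]:
        """Return (standard code, standard description) if recorded, else None."""
        return self._records.get(self._key(external_description))
```

A matching pipeline could call `lookup` first and fall back to similarity matching only when it returns `None`.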
Fig. 2 shows a block diagram of a computing device, which is one example of a hardware device applicable to aspects of the present disclosure, according to an example embodiment of the present disclosure.
With reference to fig. 2, a computing device 700, which is one example of a hardware device applicable to aspects of the present disclosure, will now be described. Computing device 700 may be any machine configured to implement processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smart phone, an in-vehicle computer, or any combination thereof. The various aforementioned apparatus/servers/client devices may be implemented in whole or at least in part by computing device 700 or similar devices or systems.
Computing device 700 may include components connected to or in communication with bus 702, possibly via one or more interfaces. For example, computing device 700 may include a bus 702, one or more processors 704, one or more input devices 706, and one or more output devices 708. The one or more processors 704 may be any type of processor and may include, but are not limited to, one or more general purpose processors and/or one or more special purpose processors (e.g., dedicated processing chips). Input device 706 may be any type of device capable of inputting information to a computing device and may include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote controller. Output device 708 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. Computing device 700 may also include or be connected with non-transitory storage device 710, which may be any storage device that is non-transitory and that enables data storage, and which may include, but is not limited to, disk drives, optical storage devices, solid-state memory, floppy disks, hard disks, magnetic tape, or any other magnetic medium, optical disks or any other optical medium, ROM (read only memory), RAM (random access memory), cache memory, and/or any memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code. The non-transitory storage device 710 may be detached from the interface. The non-transitory storage device 710 may have data/instructions/code for implementing the above-described methods and steps. Computing device 700 may also include a communication device 712. 
The communication device 712 may be any type of device or system capable of communicating with internal apparatus and/or with a network and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
According to some embodiments of the present disclosure, the following technical solutions may be adopted:
1. A method of determining a standard diagnostic code for a disease, comprising:
receiving a diagnostic code and a diagnostic description of a disease from an external entity;
matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and
in the case that the maximum value of the second similarity is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to the maximum value of the second similarity as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
2. The method according to 1, wherein in a case where it is determined that the maximum value of the second similarity is smaller than the second threshold value, a predetermined number of standard diagnostic descriptions with larger second similarities among the standard diagnostic descriptions of the subset are selected as candidate standard diagnostic descriptions; and
selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description for the disease.
3. The method of claim 2, wherein selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description corresponding to the disease comprises: manually selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description of the disease by a human.
4. The method of claim 2 or 3, further comprising: recording a correspondence between the selected standard diagnostic description and the diagnostic description.
5. The method of 4, further comprising:
receiving another diagnostic description of the disease from an external entity; and
determining a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
6. The method of claim 2 or 3, wherein the natural language processing uses a model obtained by machine learning,
the method further comprises the following steps:
training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
7. The method of 6, wherein the model comprises: word vectorization model, recurrent neural network model, long-short term memory network model.
8. The method of 1, wherein the second threshold is less than the first threshold.
9. The method of 1, wherein determining the subset of the plurality of standard diagnostic descriptions from the diagnostic code comprises:
determining a set of standard diagnostic codes from the diagnostic codes; and
obtaining standard diagnostic descriptions corresponding to the standard diagnostic codes in the set as a subset of the plurality of standard diagnostic descriptions.
10. The method of claim 9, wherein the diagnostic codes are international disease classification standard codes, and the standard diagnostic codes contained in the major or minor classes of diagnostic codes are taken as the set of standard diagnostic codes.
11. An apparatus for determining a standard diagnostic code for a disease, comprising a memory and a processor, wherein when the processor executes instructions stored in the memory, the processor is configured to:
receiving a diagnostic code and a diagnostic description of a disease from an external entity;
matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and
in the case that the maximum value of the second similarity is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to the maximum value of the second similarity as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
12. The apparatus of claim 11, wherein the processor is further configured to:
selecting a predetermined number of standard diagnostic descriptions of the subset having a greater second similarity as candidate standard diagnostic descriptions if it is determined that the maximum value of the second similarity is less than the second threshold; and
selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description for the disease.
13. The apparatus of claim 12, wherein the processor is further configured to: receiving one of the candidate standard diagnostic descriptions manually selected as a standard diagnostic description of the disease.
14. The apparatus of claim 12 or 13, wherein the processor is further configured to:
recording a correspondence between the selected standard diagnostic description and the diagnostic description.
15. The apparatus of claim 14, wherein the processor is further configured to:
receiving another diagnostic description of the disease from an external entity; and
determining a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
16. The apparatus of claim 12 or 13, wherein the natural language processing uses a model obtained by machine learning,
the processor is further configured to:
training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
17. The apparatus of claim 16, wherein the model comprises: word vectorization model, recurrent neural network model, long-short term memory network model.
18. The apparatus of claim 11, wherein the second threshold is less than the first threshold.
19. The apparatus of claim 11, wherein the processor is further configured to:
determining a set of standard diagnostic codes from the diagnostic codes; and
obtaining standard diagnostic descriptions corresponding to the standard diagnostic codes in the set as a subset of the plurality of standard diagnostic descriptions.
20. The apparatus of claim 19, wherein the diagnostic code is an international disease classification standard code, the processor further configured to:
and taking the standard diagnostic codes contained in the large class or the small class of the diagnostic codes as the set of the standard diagnostic codes.
21. A non-transitory computer readable storage medium having stored thereon instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1-10.
As used herein, the word "exemplary" means "serving as an example, instance, or illustration," and not as a "model" that is to be replicated accurately. Any implementation exemplarily described herein is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the disclosure is not limited by any expressed or implied theory presented in the preceding technical field, background, brief summary or the detailed description.
In addition, certain terminology may also be used in the following description for the purpose of reference only, and thus is not intended to be limiting. For example, the terms "first," "second," and other such numerical terms referring to structures or elements do not imply a sequence or order unless clearly indicated by the context.
It will be further understood that the terms "comprises/comprising," "includes" and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the present disclosure, the term "providing" is used broadly to encompass all ways of obtaining an object, and thus "providing an object" includes, but is not limited to, "purchasing," "preparing/manufacturing," "arranging/setting," "installing/assembling," and/or "ordering" the object, and the like.
Those skilled in the art will appreciate that the boundaries between the above-described operations are merely illustrative. Multiple operations may be combined into a single operation, single operations may be distributed in additional operations, and operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments. Other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. The various embodiments disclosed herein may be combined in any combination without departing from the spirit and scope of the present disclosure. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (10)

1. A method of determining a standard diagnostic code for a disease, comprising:
receiving a diagnostic code and a diagnostic description of a disease from an external entity;
matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and
in the case that the maximum value of the second similarity is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to the maximum value of the second similarity as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
2. The method according to claim 1, wherein in a case where it is determined that the maximum value of the second similarity is smaller than the second threshold value, a predetermined number of standard diagnostic descriptions of the subset having a larger second similarity are selected as candidate standard diagnostic descriptions; and
selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description for the disease.
3. The method of claim 2, wherein selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description corresponding to the disease comprises: manually selecting one of the candidate standard diagnostic descriptions as a standard diagnostic description of the disease by a human.
4. The method of claim 2 or 3, further comprising: recording a correspondence between the selected standard diagnostic description and the diagnostic description.
5. The method of claim 4, further comprising:
receiving another diagnostic description of the disease from an external entity; and
determining a standard diagnostic description of the disease from the correspondence and the further diagnostic description.
6. The method according to claim 2 or 3, wherein the natural language processing uses a model obtained by machine learning,
the method further comprises the following steps:
training the model using the selected standard diagnostic description, the diagnostic description, and the correspondence between the selected standard diagnostic description and the diagnostic description as training samples.
7. The method of claim 6, wherein the model comprises: word vectorization model, recurrent neural network model, long-short term memory network model.
8. The method of claim 1, wherein the second threshold is less than the first threshold.
9. An apparatus for determining a standard diagnostic code for a disease, comprising a memory and a processor, wherein when the processor executes instructions stored in the memory, the processor is configured to:
receiving a diagnostic code and a diagnostic description of a disease from an external entity;
matching, by natural language processing, the diagnostic description with a plurality of standard diagnostic descriptions in a database to obtain a plurality of first similarities between the diagnostic description and the plurality of standard diagnostic descriptions;
determining that a maximum value of the plurality of first similarities is less than a predetermined first threshold;
determining a subset of the plurality of standard diagnostic descriptions from the diagnostic code;
obtaining a maximum value of a second similarity of the standard diagnostic description in the subset to the diagnostic description; and
in the case that the maximum value of the second similarity is determined to be greater than or equal to a predetermined second threshold, taking the standard diagnostic description corresponding to the maximum value of the second similarity as the standard diagnostic description of the disease, and taking the standard diagnostic code corresponding to that standard diagnostic description as the standard diagnostic code of the disease.
10. A non-transitory computer readable storage medium having stored thereon instructions that, when executed by a processor, are configured to perform the method of any of claims 1-8.
CN202010631296.2A 2020-07-03 2020-07-03 Method and apparatus for determining standard diagnostic codes for disease Active CN111814432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010631296.2A CN111814432B (en) 2020-07-03 2020-07-03 Method and apparatus for determining standard diagnostic codes for disease

Publications (2)

Publication Number Publication Date
CN111814432A true CN111814432A (en) 2020-10-23
CN111814432B CN111814432B (en) 2024-04-16

Family

ID=72855269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010631296.2A Active CN111814432B (en) 2020-07-03 2020-07-03 Method and apparatus for determining standard diagnostic codes for disease

Country Status (1)

Country Link
CN (1) CN111814432B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065157A (en) * 2018-08-01 2018-12-21 中国人民解放军第二军医大学 A kind of Disease Diagnosis Standard coded Recommendation list determines method and system
EP3637431A1 (en) * 2018-10-12 2020-04-15 Fujitsu Limited Medical diagnostic aid and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
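Cheng Cheng; Huang Hao; Ou Dong: "Design and application of an automatic coding system for disease diagnosis", Chinese Medical Record (中国病案), no. 09 *
Yang Hua; Wang Kai; Zheng Xiaohua: "Discussion of an intelligent assisted coding method for ICD-10", Chinese Medical Record (中国病案), no. 09 *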

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723056A (en) * 2021-08-19 2021-11-30 杭州火树科技有限公司 ICD (International Classification of Diseases) coding conversion method, device, computing equipment and storage medium
CN116385566A (en) * 2022-05-27 2023-07-04 上海玄戒技术有限公司 Light source estimation method, device, electronic equipment, chip and storage medium
CN116385566B (en) * 2022-05-27 2024-04-30 上海玄戒技术有限公司 Light source estimation method, device, electronic equipment, chip and storage medium
CN116631614A (en) * 2023-07-24 2023-08-22 北京惠每云科技有限公司 Treatment scheme generation method, treatment scheme generation device, electronic equipment and storage medium
CN116884630A (en) * 2023-09-06 2023-10-13 深圳达实旗云健康科技有限公司 Method for improving disease automatic coding efficiency

Also Published As

Publication number Publication date
CN111814432B (en) 2024-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant