CN111180060A - Automatic coding method and device for disease diagnosis - Google Patents

Automatic coding method and device for disease diagnosis

Info

Publication number
CN111180060A
Authority
CN
China
Prior art keywords
codes
category
disease
icd
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911168334.9A
Other languages
Chinese (zh)
Other versions
CN111180060B (en)
Inventor
史亚飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN201911168334.9A priority Critical patent/CN111180060B/en
Publication of CN111180060A publication Critical patent/CN111180060A/en
Application granted granted Critical
Publication of CN111180060B publication Critical patent/CN111180060B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses an automatic coding method for disease diagnosis, which comprises the following steps: acquiring a disease diagnosis from a target case; retrieving according to the disease diagnosis to obtain a preset number of candidate ICD disease names and codes ranked highest in similarity to the disease diagnosis; calculating scores between the disease diagnosis and the preset number of candidate ICD codes according to the disease diagnosis, the ICD disease names, and the chapter, section, category and sub-category codes in the ICD codes; and determining the candidate ICD code with the highest score as the code for the disease diagnosis. By exploiting the hierarchical features of the codes, namely the chapter, section, category and sub-category codes within them, the disclosed scheme improves the accuracy of the coding.

Description

Automatic coding method and device for disease diagnosis
Technical Field
The invention relates to the technical field of medical services, in particular to an automatic coding method and device for disease diagnosis.
Background
The International Classification of Diseases (ICD) is an internationally unified disease classification method established by the WHO. It classifies diseases into an ordered system according to characteristics such as etiology, pathology, clinical manifestation, and anatomical location, and represents each disease by a code. The version in common use worldwide is the 10th revision, the International Statistical Classification of Diseases and Related Health Problems, which retains the abbreviation ICD and is referred to as ICD-10.
At present, a disease data set is obtained from medical records, the text is segmented into words to obtain disease feature words, the feature words are vectorized and fed into a convolutional neural network for classification to obtain a disease type, and the disease type is coded to obtain a coding result. The codes have a hierarchical relationship, but the prior art does not use the hierarchical features of the codes, so its accuracy is low. How to use the hierarchical features of the codes to improve coding accuracy is therefore a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention provides an automatic coding method for disease diagnosis, which is used for improving the accuracy of coding by utilizing the hierarchical characteristics of the coding.
The invention provides an automatic coding method for disease diagnosis, which comprises the following steps:
acquiring disease diagnosis according to the target case;
retrieving according to the disease diagnosis to obtain ICD disease names and codes of a preset number of candidates with highest similarity ranking with the disease diagnosis;
calculating scores of the disease diagnoses and the predetermined number of candidate ICD codes according to the disease diagnoses, the ICD disease names and the chapter, section, category and sub-category codes in the ICD codes;
determining the candidate ICD code with the highest score as the code for the disease diagnosis.
In one embodiment, the retrieving according to the disease diagnosis to obtain the ICD disease names and codes of the predetermined number of candidate ICDs with the highest similarity ranking with the disease diagnosis includes:
searching the disease names of the ICD10 national clinical version 2.0 with the disease diagnosis through a retrieval model to obtain a predetermined number of candidate ICD disease names and codes ranked highest in similarity to the disease diagnosis, together with the retrieval scores respectively corresponding to them;
normalizing the retrieval scores and labeling the processed retrieval scores as score0.
In one embodiment, the automatic coding method for disease diagnosis further comprises:
acquiring data sets for the chapter, section, category, sub-category and item levels of the ICD10 disease codes;
inputting the chapter, section, category and sub-category data sets into a pre-trained medical field model for training in a fine-tuning mode to obtain a chapter classification model, a section classification model, a category classification model and a sub-category classification model, together with an accuracy index P1 corresponding to the chapter classification model, an accuracy index P2 corresponding to the section classification model, an accuracy index P3 corresponding to the category classification model and an accuracy index P4 corresponding to the sub-category classification model;
inputting the item data set into the pre-trained medical field model for training to obtain an item alignment model and an accuracy index P5 corresponding to the item alignment model.
In one embodiment, said calculating scores for said disease diagnosis and said predetermined number of candidate ICD codes based on said disease diagnosis, said ICD disease name and the chapter, section, category and sub-category codes in said ICD code comprises:
extracting chapter, section, category and sub-category codes in the ICD disease codes of the preset number of candidates;
applying the chapter classification model, section classification model, category classification model and sub-category classification model to the disease diagnosis to obtain scores corresponding to chapters, sections, categories and sub-category codes in a preset number of candidate ICD disease codes respectively;
normalizing the scores respectively corresponding to the chapter codes, section codes, category codes and sub-category codes in the ICD disease codes with the preset number of candidates;
and labeling the normalized scores respectively corresponding to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes as score1, score2, score3 and score4.
In one embodiment, said calculating scores for said disease diagnosis and said predetermined number of candidate ICD codes based on said disease diagnosis, said ICD disease name and the chapter, section, category and sub-category codes in said ICD code further comprises:
applying the item alignment model to the disease diagnosis and the ICD disease names of the preset number of candidates to obtain corresponding scores;
normalizing the corresponding scores and labeling the normalized scores as score5.
In one embodiment, the calculating of the scores for the disease diagnosis and the predetermined number of candidate ICD codes is performed by the following formula:
Figure BDA0002288053820000031
wherein βi and ξ are hyperparameters, ξ is a threshold value, and i ranges from 0 to 5.
The present invention also provides an automatic coding device for disease diagnosis, comprising:
a first obtaining module for obtaining a disease diagnosis according to a target case;
the retrieval module is used for retrieving according to the disease diagnosis to obtain a preset number of candidate ICD disease names and codes ranked highest in similarity to the disease diagnosis;
a calculating module, configured to calculate scores of the ICD codes of the disease diagnosis and the predetermined number of candidate ICD codes according to the disease diagnosis, the ICD disease name, and the chapter, section, category, and sub-category codes in the ICD codes;
a determination module for determining the candidate ICD code with the highest score as the code for the disease diagnosis.
In one embodiment, the retrieval module includes:
the retrieval submodule is used for searching the disease names of the ICD10 national clinical version 2.0 with the disease diagnosis through a retrieval model, so as to obtain a predetermined number of candidate ICD disease names and codes ranked highest in similarity to the disease diagnosis, together with the retrieval scores respectively corresponding to them;
and the first processing submodule is used for normalizing the retrieval scores and labeling the processed retrieval scores as score0.
In one embodiment, the automatic coding device for disease diagnosis further comprises:
the second acquisition module is used for acquiring data sets for the chapter, section, category, sub-category and item levels of the ICD10 disease codes;
the first training module is used for inputting the chapter, section, category and sub-category data sets into a pre-trained medical field model for training in a fine-tuning mode, so as to obtain a chapter classification model, a section classification model, a category classification model and a sub-category classification model, together with an accuracy index P1 corresponding to the chapter classification model, an accuracy index P2 corresponding to the section classification model, an accuracy index P3 corresponding to the category classification model and an accuracy index P4 corresponding to the sub-category classification model;
a second training module, configured to input the item data set into the pre-trained medical field model for training, so as to obtain an item alignment model and an accuracy index P5 corresponding to the item alignment model.
In one embodiment, the calculation module includes:
the extraction submodule is used for extracting chapter, section, category and sub-category codes in the ICD disease codes of the preset number of candidates;
a first scoring submodule, configured to apply the chapter classification model, the section classification model, the category classification model, and the sub-category classification model to the disease diagnosis to obtain scores corresponding to chapters, sections, categories, and sub-category codes in a preset number of candidate ICD disease codes respectively;
the second processing submodule is used for carrying out normalization processing on the scores respectively corresponding to the chapter, section, category and sub-category codes in the ICD disease codes with the preset number of candidates;
a marking submodule, configured to label the normalized scores respectively corresponding to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes as score1, score2, score3 and score4;
the obtaining submodule is used for applying the item alignment model to the disease diagnosis and the ICD disease names of the preset number of candidates to obtain corresponding scores;
a third processing submodule, configured to normalize the corresponding scores and label the normalized scores as score5;
a calculation sub-module for calculating scores for the disease diagnosis and the predetermined number of candidate ICD codes by the formula:
Figure BDA0002288053820000051
wherein βi and ξ are hyperparameters, ξ is a threshold value, and i ranges from 0 to 5.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of an automatic coding method for disease diagnosis according to an embodiment of the present invention;
FIG. 2 is a flow chart of an automatic coding method for disease diagnosis according to an embodiment of the present invention;
FIG. 3 is a block diagram of an automatic coding device for disease diagnosis according to an embodiment of the present invention;
FIG. 4 is a block diagram of an automatic coding device for disease diagnosis according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Fig. 1 is a flow chart of an automatic coding method for disease diagnosis according to an embodiment of the present invention, as shown in fig. 1, the method can be implemented as the following steps S11-S14:
in step S11, a disease diagnosis is acquired from the target case;
in step S12, retrieving according to the disease diagnosis to obtain a preset number of candidate ICD disease names and codes ranked highest in similarity to the disease diagnosis;
in step S13, calculating scores of the ICD codes for the disease diagnosis and the predetermined number of candidate ICD codes according to the disease diagnosis, the ICD disease name and the chapter, section, category, sub-category codes in the ICD codes;
in step S14, the ICD code candidate with the highest score is determined as the code for disease diagnosis.
It should be noted that the above steps S11-S14 can also be used to automatically assign ICD-9-CM-3 codes to surgical operations.
In the embodiment, a disease diagnosis of a patient is acquired according to a target case; searching according to the disease diagnosis to obtain ICD disease names and codes of a preset number of candidates with highest similarity ranking with the disease diagnosis; calculating scores of the disease diagnosis and a preset number of candidate ICD codes according to the disease diagnosis, the ICD disease name and the chapter, section, category and sub-category codes in the ICD codes; determining the candidate ICD code with the highest score as the code for the disease diagnosis.
The beneficial effect of this embodiment lies in: the disease diagnosis is coded by utilizing the hierarchical characteristics of the codes, namely, the codes of the chapters, sections, categories and sub-categories in the codes, so that the accuracy of the codes is improved.
In one embodiment, as shown in FIG. 2, the above step S12 can be implemented as the following steps S21-S22:
in step S21, retrieving the disease names of ICD10 national clinical version 2.0 of the disease diagnosis through a retrieval model to obtain the ICD disease names and codes of the preset number of candidates with the highest similarity ranking with the disease diagnosis and the retrieval scores corresponding to the ICD disease names and codes of the preset number of candidates;
in step S22, the retrieval scores are normalized, and the processed retrieval scores are labeled score0.
In this embodiment, the disease diagnosis is input into a retrieval model (which may be referred to as model0). The retrieval model searches the disease names of the ICD10 national clinical version 2.0 to obtain the preset number of candidate ICD disease names and codes ranked highest in similarity to the disease diagnosis, together with the retrieval score of each candidate. The retrieval scores are normalized and the processed retrieval scores are labeled score0. The preset number of candidates may be 10, but is not limited to 10.
The beneficial effect of this embodiment lies in: the preset number of ICD codes ranked highest in similarity to the disease diagnosis are obtained through the retrieval model, which helps make the coding of the disease diagnosis more accurate.
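The retrieval and normalization of steps S21-S22 can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the retrieval model or the normalization method, so character-bigram cosine similarity stands in for model0 and min-max scaling stands in for the normalization. The function names and the four-entry ICD table are hypothetical.

```python
from collections import Counter
from math import sqrt

def char_bigrams(text):
    """Bag of character bigrams; a stand-in for the unspecified retrieval model."""
    text = text.lower().replace(" ", "")
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a, b):
    """Cosine similarity between two bigram bags."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_candidates(diagnosis, icd_table, top_k=10):
    """Return the top_k (code, name, raw_score) triples ranked by similarity."""
    query = char_bigrams(diagnosis)
    scored = [(code, name, cosine(query, char_bigrams(name)))
              for code, name in icd_table.items()]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]

def min_max_normalize(values):
    """Scale values into [0, 1]; all-equal inputs map to 1.0 (an arbitrary convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Tiny illustrative table, not the ICD10 national clinical version 2.0.
ICD_TABLE = {
    "I10": "essential (primary) hypertension",
    "I25.1": "atherosclerotic heart disease",
    "E11.9": "type 2 diabetes mellitus without complications",
    "J18.9": "pneumonia, unspecified",
}

candidates = retrieve_candidates("primary hypertension", ICD_TABLE, top_k=3)
score0 = min_max_normalize([raw for _, _, raw in candidates])
```

Here `candidates` holds the preset number (3 in this toy run) of best-matching names and codes, and `score0` holds one normalized retrieval score per candidate.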
In one embodiment, a disease diagnosis automatic coding method further comprises:
acquiring data sets for the chapter, section, category, sub-category and item levels of the ICD10 disease codes;
inputting the chapter, section, category and sub-category data sets into a pre-trained medical field model for training in a fine-tuning mode to obtain a chapter classification model, a section classification model, a category classification model and a sub-category classification model, together with an accuracy index P1 corresponding to the chapter classification model, an accuracy index P2 corresponding to the section classification model, an accuracy index P3 corresponding to the category classification model and an accuracy index P4 corresponding to the sub-category classification model;
inputting the item data set into the pre-trained medical field model for training to obtain an item alignment model and an accuracy index P5 corresponding to the item alignment model.
It should be noted that the pre-trained medical field model may be a BERT model, and the chapter classification model, the section classification model, the category classification model, the sub-category classification model, and the item alignment model may be referred to as model1, model2, model3, model4, and model5, respectively.
In this embodiment, the pre-trained medical field model is trained in a fine-tuning mode to obtain the chapter, section, category and sub-category classification models and their respective accuracy indexes P1, P2, P3 and P4; the item data set is likewise input into the pre-trained medical field model to obtain the item alignment model and its accuracy index P5.
The beneficial effect of this embodiment lies in: there are more than 35,000 item-level classes, so a classification model performs poorly at that level. Alignment instead converts the task into a binary classification problem between a diagnosis and a name, so training an item alignment model rather than a classification model ensures the effect.
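To illustrate how item-level coding can be recast as a binary classification problem between a diagnosis and a name, the sketch below builds (diagnosis, ICD name, label) pairs that a pair classifier could be trained on. The source does not describe how the training pairs are constructed, so the random negative sampling, the function name, and the sample data are all assumptions.

```python
import random

def build_alignment_pairs(labeled_diagnoses, icd_names, negatives_per_positive=2, seed=0):
    """Turn (diagnosis, gold_code) examples into (diagnosis, icd_name, label) pairs.

    label is 1 when the name belongs to the gold code and 0 otherwise.
    Negative names are sampled at random (an assumed strategy).
    """
    rng = random.Random(seed)
    pairs = []
    for diagnosis, gold_code in labeled_diagnoses:
        pairs.append((diagnosis, icd_names[gold_code], 1))
        other_codes = sorted(code for code in icd_names if code != gold_code)
        k = min(negatives_per_positive, len(other_codes))
        for code in rng.sample(other_codes, k):
            pairs.append((diagnosis, icd_names[code], 0))
    return pairs

# Hypothetical sample data for illustration only.
icd_names = {
    "I10": "essential (primary) hypertension",
    "I25.1": "atherosclerotic heart disease",
    "J18.9": "pneumonia, unspecified",
}
pairs = build_alignment_pairs([("high blood pressure", "I10")], icd_names)
```

With this framing the model only ever answers "does this diagnosis match this name?", so the size of its output space no longer grows with the number of item-level codes.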
In one embodiment, the step S13 can be implemented as steps including:
extracting chapter, section, category and sub-category codes in a preset number of candidate ICD disease codes;
respectively applying a chapter classification model, a section classification model, a category classification model and a sub-category classification model to the disease diagnosis to obtain scores respectively corresponding to chapters, sections, categories and sub-category codes in the ICD disease codes with a preset number of candidates;
normalizing scores respectively corresponding to chapter, section, category and sub-category codes in the ICD disease codes with a preset number of candidates;
labeling the normalized scores respectively corresponding to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes as score1, score2, score3 and score4.
In this embodiment, scores corresponding to the chapter, section, category, and sub-category codes of the candidate ICD codes are obtained by using the chapter classification model, the section classification model, the category classification model, and the sub-category classification model, and are normalized to obtain score1, score2, score3, and score4.
The beneficial effect of this embodiment lies in: similarity scores between the disease diagnosis and the chapter, section, category and sub-category codes of the candidate ICD codes can be obtained.
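The extraction and scoring steps above can be sketched as follows, assuming plain ICD-10 codes of the form `I25.1` (three-character category, four-character sub-category). The chapter and section range tables are a small illustrative subset, the classifier outputs are hypothetical probability dictionaries, and sum-normalization across candidates is an assumption because the source does not name the normalization method. Extended national-version codes with extra digits or placeholder characters are not handled.

```python
# Illustrative subset of ICD-10 chapter and section (block) ranges.
CHAPTER_RANGES = [("IV", "E00", "E90"), ("IX", "I00", "I99")]
SECTION_RANGES = [("E10-E14", "E10", "E14"), ("I10-I15", "I10", "I15"), ("I20-I25", "I20", "I25")]

def _lookup(category, ranges):
    """Return the label of the first range containing the category, else None."""
    for label, first, last in ranges:
        if first <= category <= last:
            return label
    return None

def split_icd10(code):
    """Extract the chapter, section, category and sub-category codes of one ICD-10 code."""
    category = code[:3]
    has_sub = len(code) >= 5 and code[3] == "."
    return {
        "chapter": _lookup(category, CHAPTER_RANGES),
        "section": _lookup(category, SECTION_RANGES),
        "category": category,
        "sub_category": code[:5] if has_sub else None,
    }

def level_scores(candidate_codes, level_probs):
    """Look up each candidate's per-level classifier score and normalize across candidates."""
    scores = {}
    for level, probs in level_probs.items():
        raw = [probs.get(split_icd10(code)[level], 0.0) for code in candidate_codes]
        total = sum(raw)
        scores[level] = [r / total if total else 0.0 for r in raw]
    return scores

# Hypothetical outputs of model1..model4 for one disease diagnosis.
level_probs = {
    "chapter": {"IX": 0.9, "IV": 0.1},
    "section": {"I10-I15": 0.7, "I20-I25": 0.2, "E10-E14": 0.1},
    "category": {"I10": 0.6, "I25": 0.3, "E11": 0.1},
    "sub_category": {"I25.1": 0.5, "E11.9": 0.5},
}
scores = level_scores(["I10", "I25.1", "E11.9"], level_probs)
```

The lists `scores["chapter"]`, `scores["section"]`, `scores["category"]` and `scores["sub_category"]` play the roles of score1 to score4, with one entry per candidate.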
In one embodiment, step S13 may further include:
applying the item alignment model to the disease diagnosis and the ICD disease names of the preset number of candidates to obtain corresponding scores;
normalizing the corresponding scores and labeling the normalized scores as score5.
In this embodiment, score5 is obtained by using the item alignment model to obtain the corresponding scores and normalizing them.
The beneficial effect of this embodiment lies in: a similarity score can be derived between the disease diagnosis and the candidate ICD disease name.
In one embodiment, scores for a disease diagnosis and a predetermined number of candidate ICD codes are calculated by the following formula:
Figure BDA0002288053820000081
wherein βi and ξ are hyperparameters, ξ is a threshold value, and i ranges from 0 to 5.
In this embodiment, scores for diagnosis of a disease and a predetermined number of candidate ICD codes are calculated by the above formula.
The beneficial effect of this embodiment lies in: the scores between the disease diagnosis and the preset number of candidate ICD codes are calculated through the formula, and the candidate ICD code with the highest similarity score with the disease diagnosis is thereby obtained.
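The scoring formula appears in the source only as an image and is not reproduced in the text, so the sketch below does not attempt to reproduce it. It uses a plain weighted sum purely as a stand-in, to show where score0 to score5, the weights βi, the threshold ξ, and the final highest-score selection of step S14 fit together. The weighting scheme, the handling of ξ (component scores below ξ are ignored), the omission of the accuracy indexes P1 to P5, and all numbers are assumptions.

```python
def combine_scores(component_scores, betas, xi=0.0):
    """Combine per-component score lists into one total per candidate.

    component_scores: one list per component (score0..score5), one value per candidate.
    betas: one weight per component. xi: threshold below which a component is ignored.
    This weighted sum is a hypothetical stand-in, not the formula of the source.
    """
    n_candidates = len(component_scores[0])
    totals = []
    for j in range(n_candidates):
        total = 0.0
        for beta, scores in zip(betas, component_scores):
            if scores[j] >= xi:
                total += beta * scores[j]
        totals.append(total)
    return totals

def pick_best(candidate_codes, totals):
    """Return the candidate code with the highest total score (step S14)."""
    best = max(range(len(totals)), key=totals.__getitem__)
    return candidate_codes[best]

# Two candidates and two components, kept small so the arithmetic is easy to follow.
totals = combine_scores([[1.0, 0.2], [0.5, 0.5]], betas=[1.0, 2.0], xi=0.3)
best_code = pick_best(["I10", "I25.1"], totals)
```

In this toy run the first candidate totals 1.0 + 2.0 × 0.5 = 2.0, while the second drops its 0.2 component (below ξ = 0.3) and totals 1.0, so `I10` is selected.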
Fig. 3 is a block diagram of an automatic coding device for disease diagnosis according to an embodiment of the present invention, as shown in fig. 3, the device may include the following modules:
a first obtaining module 31 for obtaining a disease diagnosis according to a target case;
the retrieval module 32 is configured to perform retrieval according to the disease diagnosis to obtain a preset number of candidate ICD disease names and codes ranked highest in similarity to the disease diagnosis;
a calculating module 33, configured to calculate scores of the ICD codes for the disease diagnosis and a preset number of candidate ICD codes according to the disease diagnosis, the ICD disease name, and the chapter, section, category, and sub-category codes in the ICD codes;
a determination module 34 for determining the ICD code candidate with the highest score as the code for disease diagnosis.
The beneficial effect of this embodiment lies in: the disease diagnosis is coded by utilizing the hierarchical characteristics of the codes, namely, the codes of the chapters, sections, categories and sub-categories in the codes, so that the accuracy of the codes is improved.
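The four modules of FIG. 3 can be pictured as a simple composition, sketched below. The class name, field names, and the stub callables are hypothetical; the source defines only what each module is responsible for, not an interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, str]  # (ICD code, ICD disease name)

@dataclass
class DiagnosisCodingDevice:
    obtain: Callable[[Dict], str]                         # first obtaining module 31
    retrieve: Callable[[str], List[Candidate]]            # retrieval module 32
    calculate: Callable[[str, List[Candidate]], List[float]]  # calculating module 33

    def code(self, case: Dict) -> str:
        """Run the modules in order and return the highest-scoring candidate code."""
        diagnosis = self.obtain(case)
        candidates = self.retrieve(diagnosis)
        scores = self.calculate(diagnosis, candidates)
        # Determination module 34: pick the candidate with the highest score.
        best = max(range(len(scores)), key=scores.__getitem__)
        return candidates[best][0]

# Stub modules with fixed outputs, for illustration only.
device = DiagnosisCodingDevice(
    obtain=lambda case: case["diagnosis"],
    retrieve=lambda diagnosis: [
        ("I10", "essential (primary) hypertension"),
        ("I25.1", "atherosclerotic heart disease"),
    ],
    calculate=lambda diagnosis, candidates: [0.9, 0.1],
)
result = device.code({"diagnosis": "primary hypertension"})
```

Passing the modules in as callables keeps the sketch independent of any particular retrieval or scoring model.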
In one embodiment, as shown in FIG. 4, the retrieval module 32 includes:
the retrieval submodule 41 is used for searching the disease names of the ICD10 national clinical version 2.0 with the disease diagnosis through a retrieval model, so as to obtain the preset number of candidate ICD disease names and codes ranked highest in similarity to the disease diagnosis, together with the retrieval scores respectively corresponding to them;
and the first processing submodule 42 is used for normalizing the retrieval scores and labeling the processed retrieval scores as score0.
The beneficial effect of this embodiment lies in: the preset number of ICD codes ranked highest in similarity to the disease diagnosis are obtained through the retrieval model, which helps make the coding of the disease diagnosis more accurate.
In one embodiment, an automatic coding device for disease diagnosis further comprises:
the second acquisition module is used for acquiring data sets for the chapter, section, category, sub-category and item levels of the ICD10 disease codes;
the first training module is used for inputting the chapter, section, category and sub-category data sets into the pre-trained medical field model for training in a fine-tuning mode, so as to obtain a chapter classification model, a section classification model, a category classification model and a sub-category classification model, together with an accuracy index P1 corresponding to the chapter classification model, an accuracy index P2 corresponding to the section classification model, an accuracy index P3 corresponding to the category classification model and an accuracy index P4 corresponding to the sub-category classification model;
a second training module, configured to input the item data set into the pre-trained medical field model for training, so as to obtain an item alignment model and an accuracy index P5 corresponding to the item alignment model.
The beneficial effect of this embodiment lies in: there are more than 35,000 item-level classes, so a classification model performs poorly at that level. Alignment instead converts the task into a binary classification problem between a diagnosis and a name, so training an item alignment model rather than a classification model ensures the effect.
In one embodiment, the calculating module includes:
the extraction submodule is used for extracting chapter, section, category and sub-category codes in the ICD disease codes with the preset number of candidates;
the first scoring submodule is used for respectively applying a chapter classification model, a section classification model, a category classification model and a sub-category classification model to disease diagnosis so as to obtain scores respectively corresponding to chapters, sections, categories and sub-category codes in a preset number of candidate ICD disease codes;
the second processing submodule is used for carrying out normalization processing on the scores respectively corresponding to the chapter codes, section codes, category codes and sub-category codes in the ICD disease codes with the preset number of candidates;
the marking submodule is used for labeling the normalized scores respectively corresponding to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes as score1, score2, score3 and score4;
the obtaining submodule is used for applying the item alignment model to the disease diagnosis and the ICD disease names of the preset number of candidates to obtain corresponding scores;
the third processing submodule is used for normalizing the corresponding scores and labeling the normalized scores as score5;
a calculation sub-module for calculating scores for a disease diagnosis and a preset number of candidate ICD codes by the formula:
Figure BDA0002288053820000101
wherein βi and ξ are hyperparameters, ξ is a threshold value, and i ranges from 0 to 5.
The beneficial effect of this embodiment lies in: the scores between the disease diagnosis and the preset number of candidate ICD codes are calculated through the formula, and the candidate ICD code with the highest similarity score with the disease diagnosis is thereby obtained.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An automatic coding method for disease diagnosis, comprising:
acquiring a disease diagnosis according to a target case;
retrieving, according to the disease diagnosis, a preset number of candidate ICD disease names and codes that rank highest in similarity to the disease diagnosis;
calculating scores between the disease diagnosis and the preset number of candidate ICD codes according to the disease diagnosis, the ICD disease names, and the chapter, section, category and sub-category codes in the ICD codes;
determining the candidate ICD code with the highest score as the code for the disease diagnosis.
2. The method of claim 1, wherein said retrieving, according to said disease diagnosis, a preset number of candidate ICD disease names and codes that rank highest in similarity to said disease diagnosis comprises:
searching the disease names of the ICD10 National Clinical Version 2.0 for the disease diagnosis through a retrieval model, to obtain the preset number of candidate ICD disease names and codes that rank highest in similarity to the disease diagnosis, together with the retrieval score corresponding to each of the candidate ICD disease names and codes;
normalizing the retrieval scores and marking the normalized retrieval scores as score0.
3. The method of claim 1, wherein the method further comprises:
acquiring data sets of the chapter, section, category, sub-category and detail item in the ICD10 disease codes;
inputting the data sets of the chapter, section, category and sub-category into a pre-trained medical-domain model for training by fine-tuning, to obtain a chapter classification model, a section classification model, a category classification model and a sub-category classification model, together with an accuracy index P1 corresponding to the chapter classification model, an accuracy index P2 corresponding to the section classification model, an accuracy index P3 corresponding to the category classification model, and an accuracy index P4 corresponding to the sub-category classification model;
inputting the data set of the detail item into the pre-trained medical-domain model for training, to obtain a detail-item alignment model and an accuracy index P5 corresponding to the detail-item alignment model.
4. The method of claim 2 or 3, wherein said calculating scores between said disease diagnosis and said preset number of candidate ICD codes according to said disease diagnosis, said ICD disease names, and the chapter, section, category and sub-category codes in said ICD codes comprises:
extracting the chapter, section, category and sub-category codes from the preset number of candidate ICD disease codes;
applying the chapter classification model, the section classification model, the category classification model and the sub-category classification model to the disease diagnosis, to obtain scores corresponding respectively to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes;
normalizing the scores corresponding respectively to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes;
and marking the scores corresponding respectively to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes as score1, score2, score3 and score4.
5. The method of any of claims 2-4, wherein said calculating scores between said disease diagnosis and said preset number of candidate ICD codes according to said disease diagnosis, said ICD disease names, and the chapter, section, category and sub-category codes in said ICD codes further comprises:
applying the detail-item alignment model to the disease diagnosis and the preset number of candidate ICD disease names to obtain corresponding scores;
normalizing the corresponding scores and marking the normalized scores as score5.
6. The method of claim 5, wherein said calculating scores for said disease diagnosis and said predetermined number of candidate ICD codes is performed by the formula:
Figure FDA0002288053810000021
where β_i and ξ are hyperparameters, ξ is a threshold, and i ranges from 0 to 5.
7. An automatic coding device for disease diagnosis, comprising:
a first obtaining module for obtaining a disease diagnosis according to a target case;
a retrieval module, configured to retrieve, according to the disease diagnosis, a preset number of candidate ICD disease names and codes that rank highest in similarity to the disease diagnosis;
a calculation module, configured to calculate scores between the disease diagnosis and the preset number of candidate ICD codes according to the disease diagnosis, the ICD disease names, and the chapter, section, category and sub-category codes in the ICD codes;
a determination module for determining the candidate ICD code with the highest score as the code for the disease diagnosis.
8. The apparatus of claim 7, wherein the retrieval module comprises:
a retrieval submodule, configured to search the disease names of the ICD10 National Clinical Version 2.0 for the disease diagnosis through a retrieval model, so as to obtain the preset number of candidate ICD disease names and codes that rank highest in similarity to the disease diagnosis, together with the retrieval score corresponding to each of the candidate ICD disease names and codes;
a first processing submodule, configured to normalize the retrieval scores and mark the normalized retrieval scores as score0.
9. The apparatus of claim 7, wherein the apparatus further comprises:
a second acquisition module, configured to acquire data sets of the chapter, section, category, sub-category and detail item in the ICD10 disease codes;
a first training module, configured to input the data sets of the chapter, section, category and sub-category into a pre-trained medical-domain model for training by fine-tuning, so as to obtain a chapter classification model, a section classification model, a category classification model and a sub-category classification model, together with an accuracy index P1 corresponding to the chapter classification model, an accuracy index P2 corresponding to the section classification model, an accuracy index P3 corresponding to the category classification model, and an accuracy index P4 corresponding to the sub-category classification model;
a second training module, configured to input the data set of the detail item into the pre-trained medical-domain model for training, to obtain a detail-item alignment model and an accuracy index P5 corresponding to the detail-item alignment model.
10. The apparatus of claim 8 or 9, wherein the computing module comprises:
an extraction submodule, configured to extract the chapter, section, category and sub-category codes from the preset number of candidate ICD disease codes;
a first scoring submodule, configured to apply the chapter classification model, the section classification model, the category classification model and the sub-category classification model to the disease diagnosis, to obtain scores corresponding respectively to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes;
a second processing submodule, configured to normalize the scores corresponding respectively to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes;
a marking submodule, configured to mark the scores corresponding respectively to the chapter, section, category and sub-category codes in the preset number of candidate ICD disease codes as score1, score2, score3 and score4;
an obtaining submodule, configured to apply the detail-item alignment model to the disease diagnosis and the preset number of candidate ICD disease names to obtain corresponding scores;
a third processing submodule, configured to normalize the corresponding scores and mark the normalized scores as score5;
a calculation submodule, configured to calculate the scores between the disease diagnosis and the preset number of candidate ICD codes by the formula:
Figure FDA0002288053810000041
where β_i and ξ are hyperparameters, ξ is a threshold, and i ranges from 0 to 5.
CN201911168334.9A 2019-11-25 2019-11-25 Disease diagnosis automatic coding method and device Active CN111180060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911168334.9A CN111180060B (en) 2019-11-25 2019-11-25 Disease diagnosis automatic coding method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911168334.9A CN111180060B (en) 2019-11-25 2019-11-25 Disease diagnosis automatic coding method and device

Publications (2)

Publication Number Publication Date
CN111180060A (en) 2020-05-19
CN111180060B (en) 2023-07-25

Family

ID=70657292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911168334.9A Active CN111180060B (en) 2019-11-25 2019-11-25 Disease diagnosis automatic coding method and device

Country Status (1)

Country Link
CN (1) CN111180060B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190251455A1 (en) * 2018-02-13 2019-08-15 International Business Machines Corporation Combining chemical structure data with unstructured data for predictive analytics in a cognitive system
CN109065157A (en) * 2018-08-01 2018-12-21 中国人民解放军第二军医大学 A kind of Disease Diagnosis Standard coded Recommendation list determines method and system
CN109785959A (en) * 2018-12-14 2019-05-21 平安医疗健康管理股份有限公司 A kind of disease code method and apparatus
CN109994215A (en) * 2019-04-25 2019-07-09 清华大学 Disease automatic coding system, method, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Bao Qingsheng; Cheng Shaoyin; Jiang Fan: "Automated disease coding method based on text analysis", Computer Systems & Applications, no. 12 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111785387A (en) * 2020-07-02 2020-10-16 朱玮 Method and system for disease standardized mapping classification by using Bert
CN111785387B (en) * 2020-07-02 2024-06-11 朱玮 Method and system for classifying disease standardization mapping by using Bert
CN113593711A (en) * 2021-08-03 2021-11-02 中电健康云科技有限公司 Health management information pushing method based on international disease classification coding
CN115964472A (en) * 2021-12-03 2023-04-14 奥码哈(杭州)医疗科技有限公司 ICD coding method, ICD coding query method, coding system and query system

Also Published As

Publication number Publication date
CN111180060B (en) 2023-07-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant