CN110032715A - A disease code conversion method - Google Patents
- Publication number
- CN110032715A
- Authority
- CN
- China
- Prior art keywords
- disease
- test set
- standard
- disease code
- rules
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
- G06F40/16—Automatic learning of transformation rules, e.g. from examples
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- General Physics & Mathematics (AREA)
- Public Health (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- Primary Health Care (AREA)
- Epidemiology (AREA)
- Pathology (AREA)
- Biomedical Technology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The present invention relates to a disease code conversion method, comprising the following steps: S01: collecting the code versions corresponding to standard disease codes and standard diagnosis descriptions, and establishing a standard dictionary library; S02: establishing a test set from the disease codes and diagnosis descriptions to be converted; S03: forming term vectors from the standard dictionary library and the test set; S04: extracting the first N characters of the disease code to be converted to obtain preliminarily selected disease codes; S05: calculating similarity values for the term vectors, and obtaining the preliminarily selected disease code of the specific version corresponding to the maximum similarity value; S06: verifying, according to clinical rules, the mapping between the obtained preliminarily selected disease code of the specific version and the disease code to be converted, and determining the converted disease code. Beneficial effects of the invention: the accuracy of the converted disease codes is ensured, and conversion between disease codes of different versions is achieved.
Description
Technical Field
The invention relates to the technical fields of medicine and computer applications, and in particular to a disease code conversion method.
Background
The International Classification of Diseases (ICD), formally the International Statistical Classification of Diseases and Related Health Problems, is an internationally unified disease classification established by the WHO (World Health Organization). It groups diseases into classes according to characteristics such as etiology, pathology, clinical manifestation and anatomical site, turning them into an ordered system expressed by codes. ICD codes are a carrier for recording medical information and the basis for medical data mining, diagnosis-related grouping, performance evaluation, and DRG-based medical insurance payment.
In practice, domestic medical institutions in different regions extend the codes according to local clinical characteristics, and the same disease may also be described differently in different versions. For example, the GB-2016 ICD-10 edition has "A00.100 cholera, due to Vibrio cholerae O1, biotype El Tor", whereas the BJ-V6.01 edition has "A00.101 biotype El Tor"; the two differ in both code and description. Multiple inconsistent versions therefore coexist, which seriously hampers data interoperability and medical data mining applications across the industry.
Disclosure of Invention
In view of the shortcomings of the prior art, the technical problem to be solved by the present invention is to provide a disease code conversion method.
The technical solution to the above problem is as follows: a disease code conversion method, comprising the following steps:
S01: collecting the code versions corresponding to standard disease codes and standard diagnosis descriptions, establishing a standard dictionary library, and classifying entries by code version;
S02: establishing a test set from the disease codes and diagnosis descriptions to be converted;
S03: forming term vectors from the standard dictionary library and the test set, and establishing a vector space model;
S04: extracting the first N characters of the disease code to be converted, comparing them with the standard disease codes of each version in the standard dictionary library, and obtaining the preliminarily selected disease codes of each version whose first N characters match;
S05: calculating similarity values for the term vectors, and obtaining the preliminarily selected disease code of the specific version that has the maximum similarity value;
S06: verifying, according to clinical rules, the mapping between the obtained preliminarily selected disease code of the specific version and the disease code to be converted, and determining the converted disease code.
Beneficial effects of the invention: term vectors are formed from a standard dictionary library and a test set to build a vector space model; the preliminarily selected disease code of the specific version with the maximum similarity value is then obtained by calculating similarity values, which gives a preliminary converted disease code; finally, the mapping is verified according to clinical rules, which ensures the accuracy of the converted disease codes and enables conversion between disease codes of all versions.
On the basis of the above technical solution, the invention may be further improved as follows.
Further: the standard diagnosis description includes standard surgery and procedure descriptions.
Further: the test set comprises a disease code test set and a diagnostic text test set; the disease code test set corresponds to the disease codes to be converted, and the diagnostic text test set corresponds to their diagnosis descriptions.
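A minimal Python sketch of how the standard dictionary library (S01) and the two-part test set (S02) might be held in memory. The `CodeEntry` type, the function names and the layout are illustrative assumptions, not structures defined by the patent; the version labels, codes and descriptions are taken from the background example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeEntry:
    """One standard (code, description) pair from a named coding version (hypothetical type)."""
    version: str      # coding version label, e.g. "GB-2016"
    code: str         # disease code, e.g. "A00.100"
    description: str  # standard diagnosis description


def build_dictionary_library(entries):
    """S01 sketch: classify standard entries by coding version."""
    library = defaultdict(list)
    for entry in entries:
        library[entry.version].append(entry)
    return dict(library)


def build_test_set(records):
    """S02 sketch: split (code, diagnosis text) records into a disease code
    test set and a diagnostic text test set, kept index-aligned."""
    code_test_set = [code for code, _ in records]
    text_test_set = [text for _, text in records]
    return code_test_set, text_test_set


library = build_dictionary_library([
    CodeEntry("GB-2016", "A00.100", "cholera, due to Vibrio cholerae O1, biotype El Tor"),
    CodeEntry("BJ-V6.01", "A00.101", "biotype El Tor"),
])
codes, texts = build_test_set([("A00.101", "biotype El Tor")])
print(sorted(library))  # ['BJ-V6.01', 'GB-2016']
```

Keeping the two test sets index-aligned lets the code of record `k` be paired with its diagnosis text in later steps.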
Further: step S03 specifically includes the following steps:
S03.1: preprocessing the standard dictionary library according to medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, removing stop words and repeated words, and generating a standard dictionary library bag of words;
S03.2: preprocessing the test set according to the medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, removing stop words and repeated words, normalizing any synonyms against a preset synonym library, and generating a test library bag of words;
S03.3: collecting the non-repeated words of the standard dictionary library bag of words and the test library bag of words into a term bag;
S03.4: forming term vectors from the term bag, and establishing a vector space model.
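Steps S03.1 to S03.3 can be sketched in Python as follows. The patent relies on Chinese word segmentation and unspecified "medical rules"; here a lowercase split on word characters stands in for both, and the stop-word list and synonym library are hypothetical examples. Real Chinese text would need a proper segmenter, since `\w+` treats a run of Chinese characters as one token.

```python
import re

# Hypothetical stop-word list and preset synonym library, for illustration only.
STOP_WORDS = {"due", "to", "group", "of", "the"}
SYNONYMS = {"biovar": "biotype"}


def segment(text):
    """Stand-in for Chinese word segmentation: lowercase, then split into word runs."""
    return re.findall(r"\w+", text.lower())


def word_bag(texts, synonyms=None):
    """S03.1/S03.2 sketch: segment, optionally normalise synonyms, drop stop
    words and repeated words, keeping first-seen order."""
    bag, seen = [], set()
    for text in texts:
        for token in segment(text):
            if synonyms:
                token = synonyms.get(token, token)
            if token in STOP_WORDS or token in seen:
                continue
            seen.add(token)
            bag.append(token)
    return bag


def term_bag(dictionary_bag, test_bag):
    """S03.3 sketch: union of the non-repeated words from both bags."""
    known = set(dictionary_bag)
    return list(dictionary_bag) + [t for t in test_bag if t not in known]


dictionary_bag = word_bag(["Cholera, due to O1 group Vibrio cholerae, biotype El Tor"])
test_bag = word_bag(["El Tor cholera, biovar O1, severe"], SYNONYMS)  # synonyms only for the test set, as in S03.2
terms = term_bag(dictionary_bag, test_bag)
print(terms)  # ['cholera', 'o1', 'vibrio', 'cholerae', 'biotype', 'el', 'tor', 'severe']
```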
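Further: the similarity value is calculated by the following formula,

$$\mathrm{sim}(\vec{a}_i,\vec{b}_j)=\frac{\vec{a}_i\cdot\vec{b}_j}{\lVert\vec{a}_i\rVert\,\lVert\vec{b}_j\rVert}$$

where $\vec{a}_i$ is the term vector of the $i$-th standard dictionary term and $\vec{b}_j$ is the term vector of the $j$-th test set term.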
Beneficial effects of this further solution: by using algorithms such as cosine similarity, automatic conversion between different ICD (International Classification of Diseases) coding versions is achieved, greatly improving the efficiency and accuracy of code conversion.
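The similarity calculation named here, cosine similarity, can be sketched in plain Python. The vectors below are hypothetical term counts over a five-term vocabulary, and returning 0.0 for an all-zero vector is a convention assumed for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


dictionary_vec = [1, 1, 1, 1, 0]  # hypothetical term counts for a standard description
test_vec = [1, 0, 1, 1, 1]        # hypothetical term counts for a diagnosis to convert
print(round(cosine_similarity(dictionary_vec, test_vec), 4))  # 0.75
```

Because the measure depends only on the angle between vectors, a long standard description is not favoured merely for containing more terms.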
Further: the clinical rules include location rules, etiology rules, and surgical rules.
Beneficial effects of this further solution: the accuracy of verifying the mapping between the obtained preliminarily selected disease code of the specific version and the disease code to be converted is improved.
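The patent names the three rule families but not their content. The Python sketch below assumes, purely for illustration, that each rule compares one field (anatomical site, etiology, procedure) of the source record with the candidate; the field names and `field_rule` helper are hypothetical.

```python
def field_rule(field):
    """Build a hypothetical rule: if the source record states `field`, the
    candidate must state the same value."""
    def rule(source, candidate):
        wanted = source.get(field)
        return wanted is None or candidate.get(field) == wanted
    return rule


# Hypothetical encoding of the three rule families named in the text.
CLINICAL_RULES = {
    "location": field_rule("site"),
    "etiology": field_rule("etiology"),
    "surgery": field_rule("procedure"),
}


def verify_mapping(source, candidate, rules=CLINICAL_RULES):
    """S06 sketch: return the names of failed rules (an empty list means accepted)."""
    return [name for name, rule in rules.items() if not rule(source, candidate)]


source = {"code": "A00.101", "etiology": "Vibrio cholerae O1"}
good = {"code": "A00.100", "etiology": "Vibrio cholerae O1"}
bad = {"code": "A00.900", "etiology": "unspecified"}
print(verify_mapping(source, good))  # []
print(verify_mapping(source, bad))   # ['etiology']
```

Returning the names of failed rules, rather than a bare boolean, gives a reviewer a reason for each rejected mapping.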
Further: in step S04, N is a natural number greater than or equal to 3, and the N characters counted include the decimal point of the disease code.
Beneficial effects of this further solution: the degree and accuracy of matching are improved.
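Step S04 can be sketched in Python as a prefix match in which the decimal point counts as one of the N characters. The library excerpt is hypothetical, apart from the two codes from the background example, and uppercasing codes before comparison is an assumption.

```python
def code_prefix(code, n):
    """First n characters of a disease code; the decimal point counts as a character."""
    if n < 3:
        raise ValueError("N must be a natural number >= 3")
    return code.strip().upper()[:n]


def preselect(code, library, n):
    """S04 sketch: for each version, keep the standard codes sharing the first n characters."""
    prefix = code_prefix(code, n)
    matches = {}
    for version, codes in library.items():
        hits = [c for c in codes if c.upper().startswith(prefix)]
        if hits:
            matches[version] = hits
    return matches


library = {  # hypothetical excerpt of a standard dictionary library
    "GB-2016": ["A00.000", "A00.100", "A00.900", "A01.000"],
    "BJ-V6.01": ["A00.101", "A01.001"],
}
print(code_prefix("A00.101", 5))         # A00.1
print(preselect("A00.101", library, 5))  # {'GB-2016': ['A00.100'], 'BJ-V6.01': ['A00.101']}
print(preselect("A00.101", library, 3))  # {'GB-2016': ['A00.000', 'A00.100', 'A00.900'], 'BJ-V6.01': ['A00.101']}
```

A larger N narrows the candidate list; a smaller N leaves more candidates for the similarity step (S05) to rank.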
Further: after the converted disease code is determined, the method further comprises sending the converted disease code to a medical expert terminal for review.
Beneficial effects of this further solution: the code conversion results are further optimized.
Drawings
FIG. 1 is a flow chart of the disease code conversion method according to the present invention.
Detailed Description
The principles and features of the invention are described below with reference to the accompanying drawing; the examples given serve only to explain the invention and are not intended to limit its scope.
As shown in FIG. 1, a disease code conversion method comprises the following steps:
S01: collecting the code versions corresponding to standard disease codes and standard diagnosis descriptions, establishing a standard dictionary library, and classifying entries by code version;
S02: establishing a test set from the disease codes and diagnosis descriptions to be converted;
S03: forming term vectors from the standard dictionary library and the test set, and establishing a vector space model;
S04: extracting the first N characters of the disease code to be converted, comparing them with the standard disease codes of each version in the standard dictionary library, and obtaining the preliminarily selected disease codes of each version whose first N characters match;
S05: calculating similarity values for the term vectors, and obtaining the preliminarily selected disease code of the specific version that has the maximum similarity value;
S06: verifying, according to clinical rules, the mapping between the obtained preliminarily selected disease code of the specific version and the disease code to be converted, and determining the converted disease code.
The clinical rules include location rules, etiology rules, and surgical rules.
Preferably, in step S01, the standard diagnosis description includes standard surgery and procedure descriptions, i.e., the main diagnostic text written by a doctor for a patient.
In step S02, the test set comprises a disease code test set and a diagnostic text test set; the disease code test set corresponds to the disease codes to be converted, and the diagnostic text test set corresponds to their diagnosis descriptions.
Step S03 specifically includes the following steps:
S03.1: preprocessing the standard dictionary library according to medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, removing stop words and repeated words, and generating a standard dictionary library bag of words;
S03.2: preprocessing the test set according to the medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, removing stop words and repeated words, normalizing any synonyms against a preset synonym library, and generating a test library bag of words;
S03.3: collecting the non-repeated words of the standard dictionary library bag of words and the test library bag of words into a term bag,
where the term bag comprises a plurality of standard dictionary library terms and a plurality of test terms;
S03.4: forming term vectors from the term bag, and establishing a vector space model.
In step S04, N is a natural number greater than or equal to 3, and the N characters counted include the decimal point of the disease code.
Each standard dictionary library term corresponds to a standard dictionary library term vector, and each test term corresponds to a test term vector.
The term vectors are formed by one-hot encoding: a standard dictionary library term vector is formed for each standard dictionary library term and a test term vector for each test term, thereby establishing the vector space model.
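One-hot encoding over a term bag can be sketched in Python as follows. The term bag is hypothetical, and summing the one-hot vectors of a description's terms into a single text vector is an assumption of this sketch, suggested by the later mention of a text vector space model; the patent does not spell out that step.

```python
def one_hot_vectors(term_bag):
    """One one-hot vector per term in the term bag (S03.4); terms must be unique."""
    size = len(term_bag)
    return {term: [1 if i == k else 0 for i in range(size)]
            for k, term in enumerate(term_bag)}


def text_vector(tokens, vectors):
    """Assumption: represent a whole description as the element-wise sum of
    the one-hot vectors of its terms; tokens outside the term bag are ignored."""
    vec = [0] * len(vectors)
    for token in tokens:
        if token in vectors:
            vec = [a + b for a, b in zip(vec, vectors[token])]
    return vec


terms = ["cholera", "o1", "biotype", "el", "tor"]  # hypothetical term bag
vectors = one_hot_vectors(terms)
print(vectors["el"])                                    # [0, 0, 0, 1, 0]
print(text_vector(["cholera", "el", "tor"], vectors))  # [1, 0, 0, 1, 1]
```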
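Preferably, in step S05, the similarity value is calculated by the formula,

$$\mathrm{sim}(\vec{a}_i,\vec{b}_j)=\frac{\vec{a}_i\cdot\vec{b}_j}{\lVert\vec{a}_i\rVert\,\lVert\vec{b}_j\rVert}$$

where $\vec{a}_i$ is the term vector of the $i$-th standard dictionary term and $\vec{b}_j$ is the term vector of the $j$-th test set term.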
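As a worked instance, take the similarity to be cosine similarity, as the text states. Assume a hypothetical five-term vocabulary (cholera, O1, biotype, El, Tor), a standard description containing all five terms, $\vec{a}_i=(1,1,1,1,1)$, and a diagnosis "El Tor cholera", $\vec{b}_j=(1,0,0,1,1)$:

$$\mathrm{sim}(\vec{a}_i,\vec{b}_j)=\frac{1\cdot1+1\cdot0+1\cdot0+1\cdot1+1\cdot1}{\sqrt{5}\,\sqrt{3}}=\frac{3}{\sqrt{15}}\approx 0.775$$

The three shared terms make up the numerator, while the two terms that appear only in the standard description lower the score through $\lVert\vec{a}_i\rVert=\sqrt{5}$.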
The invention innovatively applies natural language processing (NLP) technology to ICD code recognition and conversion: one-hot encoding is used to construct a text vector space model, which is combined with algorithms such as cosine similarity to convert between different coding versions. This improves code conversion efficiency and lays a foundation for medical data applications such as medical research and medical cost control.
Specifically, a converter is constructed from the conversion rules configured by domain experts and the similarity algorithm. When a new free-text diagnosis needs to be coded, the converter outputs the target-version disease code for the term to be converted, realizing one-click transcoding that is simple, convenient and highly accurate.
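A compact, hypothetical converter in Python that chains prefix preselection (S04), the highest cosine similarity (S05) and optional rule checks (S06). Treating each description as a binary bag of words (a set of terms) is an assumption; for binary vectors the cosine reduces to |A ∩ B| / sqrt(|A| · |B|). The library rows are illustrative, not an authoritative code table, and `convert` and its parameters are invented names.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    version: str  # coding version label
    code: str     # standard disease code
    text: str     # standard diagnosis description


def tokens(text):
    """Crude stand-in for segmentation and de-duplication: unique lowercase words."""
    return set(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two binary bag-of-words vectors given as term sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def convert(code, text, library, target_version, n=5, rules=()):
    """Hypothetical converter: prefix preselection (S04), best cosine match
    (S05), then rule checks (S06). Returns the chosen Entry or None."""
    prefix = code.strip().upper()[:n]
    candidates = [e for e in library
                  if e.version == target_version and e.code.upper().startswith(prefix)]
    if not candidates:
        return None
    query = tokens(text)
    # max() keeps the first candidate on ties.
    best = max(candidates, key=lambda e: cosine(query, tokens(e.text)))
    if all(rule(code, text, best) for rule in rules):
        return best
    return None  # rejected by a rule; a candidate for expert review


library = [  # illustrative rows
    Entry("GB-2016", "A00.000", "Cholera due to Vibrio cholerae O1, biotype cholerae"),
    Entry("GB-2016", "A00.100", "Cholera due to Vibrio cholerae O1, biotype El Tor"),
    Entry("BJ-V6.01", "A00.101", "El Tor cholera"),
]
result = convert("A00.101", "El Tor cholera", library, "GB-2016", n=3)
print(result.code)  # A00.100
```

With `n=3` both GB-2016 A00 entries survive preselection. The El Tor entry wins because it shares three terms with the query, against one for the other candidate.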
Preferably, after the converted disease code is determined, the method further comprises sending the converted disease code to a medical expert terminal for review, thereby optimizing the code conversion results.
Specifically, the converted disease codes are sent to medical experts for review, data with obvious problems are corrected, and steps S03 to S06 are repeated, so that the code conversion results are continuously optimized and accuracy is improved.
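The review cycle can be sketched in Python as a loop. The patent says only that experts correct problematic results and that S03 to S06 are repeated. Here the assumption is that corrections are fed back as overrides that the next conversion run honours. `review_loop`, `toy_convert` and `toy_review` are hypothetical stand-ins.

```python
def review_loop(records, convert, review, max_rounds=3):
    """Hypothetical feedback loop for the expert-review step.

    records: {source_code: diagnosis_text}
    convert(code, text, overrides) -> target code (stands in for re-running S03-S06)
    review(code, target) -> corrected target, or None if the expert accepts it
    """
    overrides = {}
    results = {}
    for _ in range(max_rounds):
        results = {code: convert(code, text, overrides) for code, text in records.items()}
        corrections = {}
        for code, target in results.items():
            fixed = review(code, target)
            if fixed is not None and fixed != target:
                corrections[code] = fixed
        if not corrections:
            break  # experts accepted every result
        overrides.update(corrections)
    return results, overrides


def toy_convert(code, text, overrides):
    # Stand-in converter that stays wrong until an expert correction exists.
    return overrides.get(code, "A00.900")


def toy_review(code, target):
    # Stand-in expert who knows A00.101 should map to A00.100.
    return "A00.100" if target != "A00.100" else None


results, overrides = review_loop({"A00.101": "biotype El Tor"}, toy_convert, toy_review)
print(results)  # {'A00.101': 'A00.100'}
```

`max_rounds` bounds the loop in case experts and the converter never converge.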
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (8)
1. A disease code conversion method, comprising the following steps:
S01: collecting the code versions corresponding to standard disease codes and standard diagnosis descriptions, establishing a standard dictionary library, and classifying entries by code version;
S02: establishing a test set from the disease codes and diagnosis descriptions to be converted;
S03: forming term vectors from the standard dictionary library and the test set, and establishing a vector space model;
S04: extracting the first N characters of the disease code to be converted, comparing them with the standard disease codes of each version in the standard dictionary library, and obtaining the preliminarily selected disease codes of each version whose first N characters match;
S05: calculating similarity values for the term vectors, and obtaining the preliminarily selected disease code of the specific version that has the maximum similarity value;
S06: verifying, according to clinical rules, the mapping between the obtained preliminarily selected disease code of the specific version and the disease code to be converted, and determining the converted disease code.
2. The disease code conversion method according to claim 1, wherein the standard diagnosis description includes standard surgery and procedure descriptions.
3. The disease code conversion method according to claim 1, wherein the test set comprises a disease code test set and a diagnostic text test set, the disease code test set corresponding to the disease codes to be converted and the diagnostic text test set corresponding to their diagnosis descriptions.
4. The disease code conversion method according to claim 1, wherein step S03 specifically includes the following steps:
S03.1: preprocessing the standard dictionary library according to medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, removing stop words and repeated words, and generating a standard dictionary library bag of words;
S03.2: preprocessing the test set according to the medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, removing stop words and repeated words, normalizing any synonyms against a preset synonym library, and generating a test library bag of words;
S03.3: collecting the non-repeated words of the standard dictionary library bag of words and the test library bag of words into a term bag;
S03.4: forming term vectors from the term bag, and establishing a vector space model.
5. The disease code conversion method according to claim 4, wherein the similarity value is calculated by the following formula,

$$\mathrm{sim}(\vec{a}_i,\vec{b}_j)=\frac{\vec{a}_i\cdot\vec{b}_j}{\lVert\vec{a}_i\rVert\,\lVert\vec{b}_j\rVert}$$

where $\vec{a}_i$ is the term vector of the $i$-th standard dictionary term and $\vec{b}_j$ is the term vector of the $j$-th test set term.
6. The disease code conversion method according to claim 1, wherein the clinical rules include location rules, etiology rules, and surgical rules.
7. The disease code conversion method according to claim 1, wherein in step S04, N is a natural number greater than or equal to 3, and the N characters counted include the decimal point of the disease code.
8. The disease code conversion method according to claim 1, further comprising, after the converted disease code is determined, sending the converted disease code to a medical expert terminal for review.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910215224.7A CN110032715A (en) | 2019-03-21 | 2019-03-21 | A kind of method of disease code conversion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910215224.7A CN110032715A (en) | 2019-03-21 | 2019-03-21 | A kind of method of disease code conversion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110032715A true CN110032715A (en) | 2019-07-19 |
Family
ID=67236346
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910215224.7A Pending CN110032715A (en) | 2019-03-21 | 2019-03-21 | A kind of method of disease code conversion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110032715A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106844308A (en) * | 2017-01-20 | 2017-06-13 | 天津艾登科技有限公司 | A kind of use semantics recognition carries out the method for automating disease code conversion |
CN108182207A (en) * | 2017-12-15 | 2018-06-19 | 上海长江科技发展有限公司 | The intelligent coding method and system of Chinese surgical procedure based on participle network |
CN108446260A (en) * | 2018-02-06 | 2018-08-24 | 天津艾登科技有限公司 | The method and system of automation disease code conversion are carried out based on semantic approximate match algorithm |
- 2019-03-21: CN201910215224.7A filed in CN (CN110032715A), status: active, pending
Non-Patent Citations (1)
Title |
---|
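Bao Qingsheng et al.: "Automated disease coding method based on text analysis" (基于文本分析的自动化疾病编码方法), Computer Systems & Applications (《计算机系统应用》) * |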
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827929A (en) * | 2019-11-05 | 2020-02-21 | 中山大学 | Disease classification code recognition method and device, computer equipment and storage medium |
CN110827929B (en) * | 2019-11-05 | 2022-06-07 | 中山大学 | Disease classification code recognition method and device, computer equipment and storage medium |
CN114077837A (en) * | 2020-08-10 | 2022-02-22 | 卫宁健康科技集团股份有限公司 | Method and system for converting disease codes, electronic device and storage medium |
CN112233794A (en) * | 2020-10-20 | 2021-01-15 | 吾征智能技术(北京)有限公司 | Disease information matching system based on hematuria information |
CN112632910A (en) * | 2020-12-21 | 2021-04-09 | 北京惠及智医科技有限公司 | Operation encoding method, electronic device and storage device |
CN113705166A (en) * | 2021-07-28 | 2021-11-26 | 浙江太美医疗科技股份有限公司 | Method and device for encoding medical events |
Similar Documents
Publication | Title |
---|---|
CN110032715A (en) | A kind of method of disease code conversion |
CN109920501B (en) | Electronic medical record classification method and system based on convolutional neural network and active learning | |
CN106844308B (en) | Method for automatic disease code conversion using semantic recognition | |
CN109741806B (en) | Auxiliary generation method and device for medical image diagnosis report | |
JP5098559B2 (en) | Similar image search device and similar image search program | |
CN110047584A (en) | Hospital distributing diagnosis method, system, device and medium based on deep learning | |
WO2021046536A1 (en) | Automated information extraction and enrichment in pathology report using natural language processing | |
US20170147753A1 (en) | Method for searching for similar case of multi-dimensional health data and apparatus for the same | |
CN111180062A (en) | Disease classification coding intelligent recommendation method based on original diagnosis data | |
US20130144651A1 (en) | Determining one or more probable medical codes using medical claims | |
CN111814463B (en) | International disease classification code recommendation method and system, corresponding equipment and storage medium | |
CN111177356B (en) | Acid-base index medical big data analysis method and system | |
CN111191415A (en) | Operation classification coding method based on original operation data | |
CN113284572A (en) | Multi-modal heterogeneous medical data processing method and related device | |
WO2014130287A1 (en) | Method and system for propagating labels to patient encounter data | |
CN111259664B (en) | Method, device and equipment for determining medical text information and storage medium | |
CN114358001A (en) | Method for standardizing diagnosis result, and related device, equipment and storage medium thereof | |
CN113823414B (en) | Main diagnosis and main operation matching detection method, device, computing equipment and storage medium | |
Moldwin et al. | Empirical findings on the role of structured data, unstructured data, and their combination for automatic clinical phenotyping | |
US8473314B2 (en) | Method and system for determining precursors of health abnormalities from processing medical records | |
CN109859813B (en) | Entity modifier recognition method and device | |
CN116741358A (en) | Inquiry registration recommendation method, inquiry registration recommendation device, inquiry registration recommendation equipment and storage medium | |
CN112836006B (en) | Multi-diagnostic intelligent coding method, system, medium and equipment | |
CN112992303A (en) | Human phenotype standard expression extraction method | |
CN118016263B (en) | Digital medical assistant system based on voice recognition |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190719 |