CN110032715A - A kind of method of disease code conversion - Google Patents

A kind of method of disease code conversion

Info

Publication number
CN110032715A
Authority
CN
China
Prior art keywords
disease code
disease
code
test set
conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910215224.7A
Other languages
Chinese (zh)
Inventor
孙闯
火立龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Kindo Medical Data Technology Co Ltd
Original Assignee
Wuhan Kindo Medical Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Kindo Medical Data Technology Co Ltd
Priority to CN201910215224.7A
Publication of CN110032715A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/16 Automatic learning of transformation rules, e.g. from examples
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The present invention relates to a method for disease code conversion, comprising the following steps: S01: collecting, for each coding version, the standard disease codes and the corresponding standard diagnosis descriptions, and establishing a standard dictionary library; S02: establishing a test set from the disease codes to be converted and their diagnosis descriptions; S03: forming term vectors from the standard dictionary library and the test set; S04: extracting the first N characters of the disease code to be converted to obtain preliminary candidate disease codes; S05: calculating similarity values over the term vectors and obtaining the preliminary candidate disease code of the specific version that corresponds to the maximum similarity value; S06: verifying, according to clinical rules, the mapping between the obtained preliminary candidate disease code of the specific version and the disease code to be converted, and determining the converted disease code. The beneficial effects of the invention are that the accuracy of the converted disease code is ensured and conversion between the disease code versions is achieved.

Description

A method for disease code conversion
Technical field
The present invention relates to the fields of medicine and computer application technology, and in particular to a method for disease code conversion.
Background
The International Statistical Classification of Diseases and Related Health Problems (International Classification of Diseases, ICD) is the internationally unified disease classification method formulated by the WHO (World Health Organization). It classifies diseases according to characteristics such as etiology, pathology, clinical manifestation, and anatomical site, organizes them into an ordered system, and represents that system with codes. It is a carrier for recording medical information and the basis for medical data mining, disease diagnosis grouping and performance evaluation, and DRG-based medical insurance charging and payment.
In the practice of domestic medical institutions, different regions have extended the codes in different ways according to the characteristics of clinical diseases; in addition, the description of the same disease differs between versions. For example, "A00.100 Cholera due to Vibrio cholerae O1, biovar El Tor" in the GB-2016 ICD-10 edition and "A00.101 El Tor biotype cholera" in the BJ-V6.01 edition differ in both code and term description. The resulting inconsistency among multiple versions seriously hinders data interoperability within the industry and the mining and application of medical data.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for disease code conversion that addresses the above defects of the prior art.
The technical solution adopted by the present invention to solve the above technical problem is a method for disease code conversion, comprising the following steps:
S01: collecting, for each coding version, the standard disease codes and the corresponding standard diagnosis descriptions, establishing a standard dictionary library, and classifying its entries by coding version;
S02: establishing a test set from the disease codes to be converted and their diagnosis descriptions;
S03: forming term vectors from the standard dictionary library and the test set, and establishing a vector space model;
S04: extracting the first N characters of the disease code to be converted, comparing them with the standard disease codes of each version in the standard dictionary library, and obtaining the preliminary candidate disease codes, across multiple versions, whose first N characters are identical;
S05: calculating similarity values over the term vectors, and obtaining the preliminary candidate disease code of the specific version that corresponds to the maximum similarity value;
S06: verifying, according to clinical rules, the mapping between the obtained preliminary candidate disease code of the specific version and the disease code to be converted, and determining the converted disease code.
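Steps S01 and S02 can be pictured as plain data preparation. The following is a minimal Python sketch under that reading, not the patented implementation: the names `CodeEntry`, `build_dictionary_library`, and `build_test_set` are hypothetical, and the two entries are the examples quoted in the Background.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeEntry:
    version: str      # coding version label, e.g. "GB-2016"
    code: str         # disease code, e.g. "A00.100"
    description: str  # diagnosis description text


def build_dictionary_library(entries):
    """S01 (sketch): group standard entries by coding version."""
    library = defaultdict(list)
    for entry in entries:
        library[entry.version].append(entry)
    return dict(library)


def build_test_set(entries):
    """S02 (sketch): split into a code test set and a diagnosis text test set."""
    code_test_set = [e.code for e in entries]
    text_test_set = [e.description for e in entries]
    return code_test_set, text_test_set


standard_entries = [
    CodeEntry("GB-2016", "A00.100", "Cholera due to Vibrio cholerae O1, biovar El Tor"),
    CodeEntry("BJ-V6.01", "A00.101", "El Tor biotype cholera"),
]
library = build_dictionary_library(standard_entries)
codes, texts = build_test_set([standard_entries[1]])
```

Keeping the two test lists index-aligned means a code and its diagnosis text can always be paired again in the later steps.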
The beneficial effects of the present invention are as follows: term vectors are formed and a vector space model is established from the standard dictionary library and the test set; similarity values are then calculated to obtain the preliminary candidate disease code of the specific version that corresponds to the maximum similarity value, which preliminarily determines the converted disease code; and the mapping is verified according to clinical rules, which ensures the accuracy of the converted disease code and achieves conversion between the disease code versions.
On the basis of the above technical solution, the present invention may be further improved as follows.
Further, the standard diagnosis descriptions include standard surgery and procedure descriptions.
Further, the test set includes a disease code test set and a diagnosis text test set, wherein the disease code test set corresponds to the disease codes to be converted and the diagnosis text test set corresponds to the diagnosis descriptions.
Further, step S03 specifically comprises the following steps:
S03.1: preprocessing the standard dictionary library according to medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, and removing stop words and duplicate words to generate a standard dictionary library word bag;
S03.2: preprocessing the test set according to medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, removing stop words and duplicate words, and normalizing any synonyms that occur according to a preconfigured synonym dictionary to generate a test set word bag;
S03.3: taking the distinct words contained in the standard dictionary library word bag and the test set word bag as the term word bag;
S03.4: forming term vectors from the term word bag and establishing the vector space model.
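Steps S03.1 to S03.3 can be sketched as follows. This is an illustration under stated assumptions rather than the method itself: `make_word_bag` and `merge_vocabulary` are hypothetical names, whitespace splitting of English text stands in for a real Chinese word segmenter, and the stop-word list and synonym entry are invented for the example.

```python
def make_word_bag(descriptions, segment, stop_words, synonyms=None):
    """Segment each description, unify synonyms, drop stop words and duplicates."""
    synonyms = synonyms or {}
    bags = []
    for text in descriptions:
        seen = []
        for word in segment(text):
            word = synonyms.get(word, word)  # unify synonyms (S03.2 only)
            if word in stop_words or word in seen:
                continue                     # drop stop words and duplicate words
            seen.append(word)
        bags.append(seen)
    return bags


def merge_vocabulary(*bag_groups):
    """S03.3 (sketch): distinct words of all word bags, in first-seen order."""
    vocabulary = []
    for bags in bag_groups:
        for bag in bags:
            for word in bag:
                if word not in vocabulary:
                    vocabulary.append(word)
    return vocabulary


# Whitespace splitting stands in for a real Chinese word segmenter (assumption).
segment = lambda text: text.lower().replace(",", " ").split()
stop_words = {"due", "to", "of"}   # hypothetical stop-word list
synonyms = {"biotype": "biovar"}   # hypothetical synonym dictionary entry

standard_bags = make_word_bag(
    ["Cholera due to Vibrio cholerae O1, biovar El Tor"], segment, stop_words)
test_bags = make_word_bag(["El Tor biotype cholera"], segment, stop_words, synonyms)
vocabulary = merge_vocabulary(standard_bags, test_bags)
```

Passing the segmenter in as a parameter keeps the sketch independent of any particular segmentation library.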
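Further, the similarity value is calculated as
$$\cos\theta_{ij} = \frac{\vec{A}_i \cdot \vec{B}_j}{\lVert \vec{A}_i \rVert \, \lVert \vec{B}_j \rVert}$$
where $\vec{A}_i$ denotes the term vector of the i-th standard dictionary term and $\vec{B}_j$ denotes the term vector of the j-th test set term.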
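Since the text names cosine similarity as the similarity measure, the calculation in step S05 might be sketched as below. The function names are hypothetical, and returning 0.0 for an all-zero vector is a convention chosen for this sketch, not something the patent specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero vector (assumption)
    return dot / (norm_a * norm_b)


def most_similar(test_vector, candidate_vectors):
    """Return (index, similarity) of the candidate closest to test_vector."""
    scores = [cosine_similarity(c, test_vector) for c in candidate_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

`most_similar` expects at least one candidate vector; an empty candidate list would need to be handled by the caller.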
The beneficial effect of the above further solution is that, by using algorithms such as cosine similarity, automatic conversion between different ICD (International Classification of Diseases) coding versions is achieved, which greatly improves the efficiency and accuracy of code conversion.
Further, the clinical rules include anatomical site rules, etiology rules, and surgical procedure rules.
The beneficial effect of the above further solution is that it improves the accuracy of verifying the mapping between the obtained preliminary candidate disease code of the specific version and the disease code to be converted.
Further, in step S04, N is a natural number greater than or equal to 3, and the N characters include the decimal point of the disease code.
The beneficial effect of the above further solution is that matching degree and matching accuracy are improved.
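The prefix comparison of step S04 might look like the following sketch. `preliminary_candidates` is a hypothetical name, the default `n=5` is an arbitrary choice within the stated constraint N ≥ 3, and apart from A00.100 the codes in the sample library are illustrative only.

```python
def preliminary_candidates(code, library, n=5):
    """S04 (sketch): per version, keep standard codes sharing the first n characters.

    The decimal point counts as one of the n characters.
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    prefix = code[:n]
    return {
        version: [c for c in codes if c[:n] == prefix]
        for version, codes in library.items()
    }


library = {
    "GB-2016": ["A00.000", "A00.100", "A01.000"],  # illustrative codes only
}
narrow = preliminary_candidates("A00.101", library, n=5)  # prefix "A00.1"
wide = preliminary_candidates("A00.101", library, n=3)    # prefix "A00"
```

A larger N narrows the candidate set before any similarity is computed, while a smaller N leaves more of the decision to the similarity step.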
Further, after the converted disease code is determined, the method further comprises:
sending the converted disease code to a medical expert terminal for review.
The beneficial effect of the above further solution is that the code conversion result is optimized.
Brief description of the drawings
Fig. 1 is a flow chart of the method for disease code conversion of the present invention.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.
As shown in Fig. 1, a method for disease code conversion comprises the following steps:
S01: collecting, for each coding version, the standard disease codes and the corresponding standard diagnosis descriptions, establishing a standard dictionary library, and classifying its entries by coding version;
S02: establishing a test set from the disease codes to be converted and their diagnosis descriptions;
S03: forming term vectors from the standard dictionary library and the test set, and establishing a vector space model;
S04: extracting the first N characters of the disease code to be converted, comparing them with the standard disease codes of each version in the standard dictionary library, and obtaining the preliminary candidate disease codes, across multiple versions, whose first N characters are identical;
S05: calculating similarity values over the term vectors, and obtaining the preliminary candidate disease code of the specific version that corresponds to the maximum similarity value;
S06: verifying, according to clinical rules, the mapping between the obtained preliminary candidate disease code of the specific version and the disease code to be converted, and determining the converted disease code.
The clinical rules include anatomical site rules, etiology rules, and surgical procedure rules.
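The patent does not spell out how the clinical rules are evaluated. One possible reading, sketched below purely as an assumption, treats each rule category as a keyword list and rejects a mapping when the two descriptions name conflicting keywords of the same category. The function names and all keyword lists are hypothetical.

```python
def extract(words, keywords):
    """Return the subset of rule keywords present in a segmented description."""
    return set(words) & keywords


def passes_clinical_rules(source_words, candidate_words, rules):
    """S06 (sketch): reject a mapping if any rule category disagrees.

    A category disagrees when both descriptions mention keywords of that
    category but share none of them.
    """
    for category, keywords in rules.items():
        src = extract(source_words, keywords)
        cand = extract(candidate_words, keywords)
        if src and cand and not (src & cand):
            return False
    return True


# Hypothetical keyword lists; real rules would be configured by clinical experts.
rules = {
    "site": {"lung", "liver", "kidney"},
    "etiology": {"viral", "bacterial"},
    "procedure": {"resection", "biopsy"},
}
ok = passes_clinical_rules(
    ["viral", "pneumonia", "lung"], ["lung", "infection", "viral"], rules)
bad = passes_clinical_rules(
    ["viral", "pneumonia"], ["bacterial", "pneumonia"], rules)
```

In this reading a category that only one description mentions does not block the mapping; a stricter policy would be an equally plausible choice.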
Preferably, in step S01, the standard diagnosis descriptions include standard surgery and procedure descriptions, and are the textual descriptions of the principal diagnosis written by doctors for patients.
In step S02, the test set includes a disease code test set and a diagnosis text test set, wherein the disease code test set corresponds to the disease codes to be converted and the diagnosis text test set corresponds to the diagnosis descriptions.
Step S03 specifically comprises the following steps:
S03.1: preprocessing the standard dictionary library according to medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, and removing stop words and duplicate words to generate a standard dictionary library word bag;
S03.2: preprocessing the test set according to medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, removing stop words and duplicate words, and normalizing any synonyms that occur according to a preconfigured synonym dictionary to generate a test set word bag;
S03.3: taking the distinct words contained in the standard dictionary library word bag and the test set word bag as the term word bag;
wherein the term word bag contains a plurality of standard dictionary library terms and a plurality of test terms;
S03.4: forming term vectors from the term word bag and establishing the vector space model.
In step S04, N is a natural number greater than or equal to 3, and the N characters include the decimal point of the disease code.
Each standard dictionary library term corresponds to a standard dictionary library term vector, and each test term corresponds to a test term vector.
The term vectors are formed by one-hot encoding: each standard dictionary library term and each test term is encoded into its corresponding standard dictionary library term vector and test term vector, respectively, thereby establishing the vector space model.
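The encoding step might be sketched as follows. The patent names one-hot encoding but does not say how the words of a multi-word term are combined; the sketch assumes the term vector is the element-wise OR of the one-hot vectors of its words. The function names and the sample vocabulary are hypothetical.

```python
def one_hot(word, vocabulary):
    """One-hot vector of a single word over the vocabulary."""
    return [1 if word == v else 0 for v in vocabulary]


def term_vector(term_words, vocabulary):
    """Term vector as the element-wise OR of its words' one-hot vectors (assumption)."""
    vector = [0] * len(vocabulary)
    for word in term_words:
        for i, bit in enumerate(one_hot(word, vocabulary)):
            vector[i] |= bit
    return vector


vocabulary = ["cholera", "vibrio", "cholerae", "o1", "biovar", "el", "tor"]
standard_vec = term_vector(vocabulary, vocabulary)  # a term using all seven words
test_vec = term_vector(["el", "tor", "biovar", "cholera"], vocabulary)
```

Words outside the vocabulary contribute nothing to the vector, so every term vector has the same length as the vocabulary.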
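Preferably, in step S05, the similarity value is calculated as
$$\cos\theta_{ij} = \frac{\vec{A}_i \cdot \vec{B}_j}{\lVert \vec{A}_i \rVert \, \lVert \vec{B}_j \rVert}$$
where $\vec{A}_i$ denotes the term vector of the i-th standard dictionary term and $\vec{B}_j$ denotes the term vector of the j-th test set term.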
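As a worked illustration of a cosine similarity calculation (the vectors are made up for the example and are not taken from the patent), write $\vec{A}$ for a standard dictionary term vector and $\vec{B}$ for a test set term vector over a seven-word term word bag, where the standard term contains all seven words and the test term contains four of them:

$$\vec{A} = (1,1,1,1,1,1,1), \qquad \vec{B} = (1,0,0,0,1,1,1)$$

$$\vec{A} \cdot \vec{B} = 4, \qquad \lVert \vec{A} \rVert = \sqrt{7}, \qquad \lVert \vec{B} \rVert = \sqrt{4} = 2$$

$$\cos\theta = \frac{4}{2\sqrt{7}} = \frac{2}{\sqrt{7}} \approx 0.756$$

For binary vectors the dot product is simply the number of shared words, so the similarity grows with overlap and shrinks as either term carries more unshared words.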
The present invention innovatively applies natural language processing (NLP) technology to the recognition and conversion of ICD codes. It uses one-hot encoding to construct a text vector space model and combines it with algorithms such as cosine similarity to convert between different coding versions, which improves the efficiency of code conversion and lays a foundation for medical data applications (such as medical research and disease cost-control management).
Specifically, a converter is built from the conversion rules configured by domain experts and the similarity algorithm. When a newly arrived textual diagnosis needs code conversion, the converter outputs the target-version disease code of the term to be converted, achieving one-click transcoding that is simple, convenient, and highly accurate.
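A converter of this kind could be assembled roughly as in the sketch below, which chains prefix filtering, binary term vectors, cosine similarity, and a pluggable rule check. Everything here is an assumption made for illustration: the name `convert`, its parameters, the choice to return `None` when no candidate exists or the rule check fails, and the sample entry A00.000 with its word list.

```python
import math


def vectorize(words, vocabulary):
    """Binary term vector over the vocabulary."""
    return [1 if v in words else 0 for v in vocabulary]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def convert(code, words, target_entries, n=5, rule_check=lambda src, cand: True):
    """Return the target-version code for (code, words), or None.

    target_entries: list of (code, words) pairs of the target version.
    rule_check: stand-in for the clinical rule verification (hypothetical).
    """
    # S04: preliminary candidates share the first n characters of the code.
    candidates = [(c, w) for c, w in target_entries if c[:n] == code[:n]]
    if not candidates:
        return None
    # S03: vocabulary of distinct words across the candidates and the input.
    vocabulary = sorted({w for _, ws in candidates for w in ws} | set(words))
    source_vec = vectorize(words, vocabulary)
    # S05: pick the candidate with the maximum cosine similarity.
    best_code, best_words = max(
        candidates,
        key=lambda cw: cosine(vectorize(cw[1], vocabulary), source_vec),
    )
    # S06: keep the mapping only if it passes the rule check.
    return best_code if rule_check(words, best_words) else None


target = [  # illustrative target-version entries
    ("A00.000", ["cholera", "vibrio", "cholerae", "o1", "biovar"]),
    ("A00.100", ["cholera", "vibrio", "cholerae", "o1", "biovar", "el", "tor"]),
]
result = convert("A00.101", ["cholera", "biovar", "el", "tor"], target, n=3)
```

With `n=3` both entries survive the prefix filter, so the similarity step is what selects between them.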
Preferably, after the converted disease code is determined, the method further comprises:
sending the converted disease code to a medical expert terminal for review, so as to optimize the code conversion result.
Specifically, the converted disease codes are sent to the medical expert terminal for review; after data with obvious problems have been corrected, steps S03 to S06 above are repeated, so that the code conversion result is continuously optimized and accuracy improves.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A method for disease code conversion, characterized by comprising the following steps:
S01: collecting, for each coding version, the standard disease codes and the corresponding standard diagnosis descriptions, establishing a standard dictionary library, and classifying its entries by coding version;
S02: establishing a test set from the disease codes to be converted and their diagnosis descriptions;
S03: forming term vectors from the standard dictionary library and the test set, and establishing a vector space model;
S04: extracting the first N characters of the disease code to be converted, comparing them with the standard disease codes of each version in the standard dictionary library, and obtaining the preliminary candidate disease codes, across multiple versions, whose first N characters are identical;
S05: calculating similarity values over the term vectors, and obtaining the preliminary candidate disease code of the specific version that corresponds to the maximum similarity value;
S06: verifying, according to clinical rules, the mapping between the obtained preliminary candidate disease code of the specific version and the disease code to be converted, and determining the converted disease code.
2. The method for disease code conversion according to claim 1, characterized in that the standard diagnosis descriptions include standard surgery and procedure descriptions.
3. The method for disease code conversion according to claim 1, characterized in that the test set includes a disease code test set and a diagnosis text test set, wherein the disease code test set corresponds to the disease codes to be converted and the diagnosis text test set corresponds to the diagnosis descriptions.
4. The method for disease code conversion according to claim 1, characterized in that step S03 specifically comprises the following steps:
S03.1: preprocessing the standard dictionary library according to medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, and removing stop words and duplicate words to generate a standard dictionary library word bag;
S03.2: preprocessing the test set according to medical rules, segmenting the preprocessed data into words according to Chinese part-of-speech rules, removing stop words and duplicate words, and normalizing any synonyms that occur according to a preconfigured synonym dictionary to generate a test set word bag;
S03.3: taking the distinct words contained in the standard dictionary library word bag and the test set word bag as the term word bag;
S03.4: forming term vectors from the term word bag and establishing the vector space model.
5. The method for disease code conversion according to claim 4, characterized in that the similarity value is calculated as
$$\cos\theta_{ij} = \frac{\vec{A}_i \cdot \vec{B}_j}{\lVert \vec{A}_i \rVert \, \lVert \vec{B}_j \rVert}$$
where $\vec{A}_i$ denotes the term vector of the i-th standard dictionary term and $\vec{B}_j$ denotes the term vector of the j-th test set term.
6. The method for disease code conversion according to claim 1, characterized in that the clinical rules include anatomical site rules, etiology rules, and surgical procedure rules.
7. The method for disease code conversion according to claim 1, characterized in that, in step S04, N is a natural number greater than or equal to 3, and the N characters include the decimal point of the disease code.
8. The method for disease code conversion according to claim 1, characterized in that, after the converted disease code is determined, the method further comprises:
sending the converted disease code to a medical expert terminal for review.
CN201910215224.7A 2019-03-21 2019-03-21 A kind of method of disease code conversion Pending CN110032715A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910215224.7A CN110032715A (en) 2019-03-21 2019-03-21 A kind of method of disease code conversion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910215224.7A CN110032715A (en) 2019-03-21 2019-03-21 A kind of method of disease code conversion

Publications (1)

Publication Number Publication Date
CN110032715A true CN110032715A (en) 2019-07-19

Family

ID=67236346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910215224.7A Pending CN110032715A (en) 2019-03-21 2019-03-21 A kind of method of disease code conversion

Country Status (1)

Country Link
CN (1) CN110032715A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827929A (en) * 2019-11-05 2020-02-21 中山大学 Disease classification code recognition method and device, computer equipment and storage medium
CN112233794A (en) * 2020-10-20 2021-01-15 吾征智能技术(北京)有限公司 Disease information matching system based on hematuria information
CN112632910A (en) * 2020-12-21 2021-04-09 北京惠及智医科技有限公司 Operation encoding method, electronic device and storage device
CN113705166A (en) * 2021-07-28 2021-11-26 浙江太美医疗科技股份有限公司 Method and device for encoding medical events

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844308A (en) * 2017-01-20 2017-06-13 天津艾登科技有限公司 A kind of use semantics recognition carries out the method for automating disease code conversion
CN108182207A (en) * 2017-12-15 2018-06-19 上海长江科技发展有限公司 The intelligent coding method and system of Chinese surgical procedure based on participle network
CN108446260A (en) * 2018-02-06 2018-08-24 天津艾登科技有限公司 The method and system of automation disease code conversion are carried out based on semantic approximate match algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
鲍庆升等: "基于文本分析的自动化疾病编码方法", 《计算机系统应用》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827929A (en) * 2019-11-05 2020-02-21 中山大学 Disease classification code recognition method and device, computer equipment and storage medium
CN110827929B (en) * 2019-11-05 2022-06-07 中山大学 Disease classification code recognition method and device, computer equipment and storage medium
CN112233794A (en) * 2020-10-20 2021-01-15 吾征智能技术(北京)有限公司 Disease information matching system based on hematuria information
CN112632910A (en) * 2020-12-21 2021-04-09 北京惠及智医科技有限公司 Operation encoding method, electronic device and storage device
CN113705166A (en) * 2021-07-28 2021-11-26 浙江太美医疗科技股份有限公司 Method and device for encoding medical events

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190719)