CN109994215A - Disease automatic coding system, method, equipment and storage medium - Google Patents
- Publication number
- CN109994215A (CN201910338773.3A)
- Authority
- CN
- China
- Prior art keywords
- target object
- word
- coding
- disease
- disease name
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- General Physics & Mathematics (AREA)
- Public Health (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- Primary Health Care (AREA)
- Epidemiology (AREA)
- Pathology (AREA)
- Biomedical Technology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application provides a system, method, device and storage medium for automatic disease coding. The method includes: obtaining a target object, the target object being a disease name or disease description to be coded; screening codes related to the target object out of a disease code library and forming a candidate code set from the screened codes; and determining the code corresponding to the target object from the candidate code set, based on the semantic relationship between the target object and the disease name corresponding to each code in the candidate code set. The automatic disease coding method provided by this application can determine the code corresponding to a disease name or disease description from the disease code library automatically, accurately and efficiently.
Description
Technical field
This application relates to the technical field of medical data coding, and in particular to an automatic disease coding system, method, device and storage medium.
Background
The International Classification of Diseases (ICD) is the international statistical classification standard for diseases and related health problems, and an important component of the health information standards system. ICD has been translated into 43 languages and is used in 117 countries. It is widely used for collecting and statistically analysing information in medical institutions, medical insurance, population management, patient information and other areas, and about 70% of global health expenditure relies on ICD for medical payment and health resource allocation.
To make disease data easier to store, retrieve and analyse, the disease names or disease descriptions in clinical diagnoses need to be converted into codes according to the ICD coding rules. This conversion is called disease coding. In essence, disease coding determines, from a disease code library, the code corresponding to a disease name or disease description.
At present, disease coding is mostly done manually: a human coder determines, from the disease code library, the code corresponding to a disease name or disease description. However, manual coding is highly subjective, which affects coding accuracy, and it is time-consuming and laborious, so both labor cost and time cost are high.
Summary of the invention
In view of this, this application provides an automatic disease coding system, method, device and storage medium, to solve the problems that manual coding in the prior art is subjective, which affects coding accuracy, and is time-consuming and laborious, which leads to high labor and time costs. The technical solution is as follows:
An automatic disease coding method, comprising:
Obtaining a target object, the target object being a disease name or disease description;
Screening codes related to the target object out of a disease code library, and forming a candidate code set from the screened codes;
Determining the code corresponding to the target object from the candidate code set, based on the semantic relationship between the target object and the disease name corresponding to each candidate code in the candidate code set.
Optionally, screening codes related to the target object out of the disease code library comprises:
Determining, based on the target object and the disease names corresponding to each class of codes in the disease code library, a target text statistical feature of the target object for each class of codes, wherein the target text statistical feature of the target object for any class of codes characterizes the degree of correlation between that class of codes and the target object;
Screening codes related to the target object out of the disease code library, based on the target text statistical features of the target object for the classes of codes.
Optionally, determining the target text statistical feature of the target object for each class of codes, based on the target object and the disease names corresponding to each class of codes in the disease code library, comprises:
Classifying the codes in the disease code library to obtain multiple code sets, each code set corresponding to one code class;
Determining, based on the target object and the disease names corresponding to each code set, a first text statistical feature, and/or a second text statistical feature, and/or a third text statistical feature, and/or a fourth text statistical feature of the target object for each code set; wherein the disease names corresponding to any code set include the disease name corresponding to each code in that code set, and, for any code set, the first text statistical feature characterizes the frequency with which each word in the target object appears in the disease names corresponding to the code set, the second text statistical feature characterizes the term frequency-inverse document frequency (TF-IDF) of each word in the target object within the document composed of the disease names corresponding to the code set, the third text statistical feature characterizes the text similarity between the target object and the disease names corresponding to the code set, and the fourth text statistical feature characterizes the degree to which the target object matches the keywords and qualifiers in the disease names corresponding to the code set;
Determining the target text statistical feature of the target object for each class of codes, based on the first text statistical feature, and/or the second text statistical feature, and/or the third text statistical feature, and/or the fourth text statistical feature of the target object for each code set.
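The combination of the per-code-set features and the subsequent screening can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the weighted sum, the weight values, the `top_k` cutoff, the feature values and the names `combine_features` and `screen_code_sets` are all assumptions, since the application only states that the target feature is determined from one or more of the four features.

```python
from typing import Dict, List, Sequence


def combine_features(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the text statistical features (assumed combination rule)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def screen_code_sets(per_set_features: Dict[str, Sequence[float]],
                     weights: Sequence[float],
                     top_k: int = 2) -> List[str]:
    """Return the labels of the top_k code sets ranked by combined feature."""
    scored = {label: combine_features(feats, weights)
              for label, feats in per_set_features.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


# Hypothetical feature values (first to fourth feature) for three code sets.
example_features = {
    "set_A": [0.9, 0.8, 0.7, 1.0],
    "set_B": [0.1, 0.2, 0.3, 0.0],
    "set_C": [0.5, 0.5, 0.5, 0.5],
}
selected = screen_code_sets(example_features, weights=[0.25] * 4, top_k=2)
```

All codes under the selected code sets would then form the candidate code set.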
Optionally, for any code set, determining the first text statistical feature of the target object for the code set, based on the target object and the disease names corresponding to the code set, comprises:
Obtaining the weight of each word in a first word set, wherein the first word set is obtained by de-duplicating a second word set, the second word set is the set of words obtained by word segmentation of the disease names corresponding to the code set, and the weight of each word in the first word set is determined by the number of times that word occurs in the second word set;
Obtaining a target word set, and determining the weight of each word in the target word set based on the weights of the words in the first word set, wherein the target word set is the set of words obtained by word segmentation of the target object;
Determining the first text statistical feature of the target object for the code set from the weights of the words in the target word set.
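The steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: whitespace splitting stands in for a real word segmenter, a word's weight is taken as its relative frequency, and the feature is the mean weight of the target words. None of these choices, nor the function names, are specified by the application.

```python
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Placeholder segmentation (assumption): a real system would use a
    # proper word segmenter for the language of the disease names.
    return text.lower().split()


def word_weights(disease_names: List[str]) -> Dict[str, float]:
    # Second word set: every word from every disease name, with repeats.
    second_word_set = [w for name in disease_names for w in tokenize(name)]
    # The Counter keys form the de-duplicated first word set.
    counts = Counter(second_word_set)
    total = len(second_word_set)
    return {word: count / total for word, count in counts.items()}


def first_feature(target: str, disease_names: List[str]) -> float:
    weights = word_weights(disease_names)
    target_words = tokenize(target)
    if not target_words:
        return 0.0
    # Words absent from the code set contribute a weight of zero.
    return sum(weights.get(w, 0.0) for w in target_words) / len(target_words)


names = ["esophageal foreign body", "esophageal ulcer"]
score = first_feature("esophageal foreign body", names)
```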
Optionally, for any code set, determining the second text statistical feature of the target object for the code set, based on the target object and the disease names corresponding to the code set, comprises:
Obtaining the disease document corresponding to the code set, the disease document being composed of the disease names corresponding to the code set;
Obtaining a target word set, and determining the term frequency-inverse document frequency (TF-IDF) of each word in the target word set with respect to the disease document corresponding to the code set, wherein the target word set is the set of words obtained by word segmentation of the target object;
Determining the second text statistical feature of the target object for the code set from the TF-IDF values of the words in the target word set with respect to the disease document corresponding to the code set.
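A minimal sketch of this feature is given below. It assumes whitespace tokenization, a smoothed inverse document frequency computed over the disease documents of all code sets, and summation of the per-word TF-IDF values. The application does not fix any of these details, and the function name `second_feature` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Placeholder segmentation (assumption).
    return text.lower().split()


def second_feature(target: str, set_label: str,
                   names_by_set: Dict[str, List[str]]) -> float:
    """Sum of TF-IDF values of the target words in one code set's document."""
    # One disease document per code set: all of its disease names together.
    docs = {label: [w for name in names for w in tokenize(name)]
            for label, names in names_by_set.items()}
    tokens = docs[set_label]
    if not tokens:
        return 0.0
    tf = Counter(tokens)
    n_docs = len(docs)
    score = 0.0
    for word in tokenize(target):
        if tf[word] == 0:
            continue
        df = sum(1 for doc in docs.values() if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        score += (tf[word] / len(tokens)) * idf
    return score


names_by_set = {
    "set_A": ["esophageal foreign body", "esophageal ulcer"],
    "set_B": ["ear canal foreign body"],
}
score_a = second_feature("esophageal foreign body", "set_A", names_by_set)
score_b = second_feature("esophageal foreign body", "set_B", names_by_set)
```

In this toy data the word that appears only in `set_A` carries a higher IDF, so `set_A` scores higher than `set_B` for the same target.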
Optionally, for any code set, determining the third text statistical feature of the target object for the code set, based on the target object and the disease names corresponding to the code set, comprises:
Calculating the edit distance between the target object and each disease name corresponding to the code set;
Determining the third text statistical feature of the target object for the code set from the edit distances between the target object and the disease names corresponding to the code set.
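The edit-distance feature can be sketched as follows. The Levenshtein distance is standard. Converting it to a length-normalized similarity and taking the best match over the code set are assumptions made for illustration, as is the function name `third_feature`.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def third_feature(target: str, disease_names: List[str]) -> float:
    """Best normalized similarity between the target and any name in the set."""
    best = 0.0
    for name in disease_names:
        longest = max(len(target), len(name))
        similarity = 1.0 if longest == 0 else 1.0 - edit_distance(target, name) / longest
        best = max(best, similarity)
    return best
```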
Optionally, for any code set, determining the fourth text statistical feature of the target object for the code set, based on the target object and the disease names corresponding to the code set, comprises:
Obtaining the attribute graph corresponding to the code set, wherein the attribute graph includes main words and attribute words, a main word being a keyword in the disease names corresponding to the code set and an attribute word being a qualifier of a main word;
Matching the target object against the main words and attribute words in the attribute graph corresponding to the code set;
Determining the fourth text statistical feature of the target object for the code set, based on how the target object matches the attribute graph corresponding to the code set.
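One way to picture the attribute-graph match is sketched below. The dictionary representation of the graph, the scoring rule (half credit for a matched main word, the rest in proportion to matched qualifiers), the sample graph and the name `fourth_feature` are all assumptions, because the application describes only that main words and attribute words are matched.

```python
from typing import Dict, Set


def fourth_feature(target: str, attribute_graph: Dict[str, Set[str]]) -> float:
    """Score how well the target matches a main word and its attribute words."""
    words = set(target.lower().split())  # placeholder segmentation (assumption)
    best = 0.0
    for main_word, attribute_words in attribute_graph.items():
        if main_word not in words:
            continue
        qualifiers = words - {main_word}
        if not qualifiers:
            score = 0.5  # main word matched, nothing left to qualify it
        else:
            matched = len(qualifiers & attribute_words)
            score = 0.5 + 0.5 * matched / len(qualifiers)
        best = max(best, score)
    return best


# Hypothetical attribute graph for one code set: main word -> attribute words.
graph = {
    "fracture": {"femur", "open", "closed"},
    "pneumonia": {"bacterial", "viral"},
}
```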
Optionally, determining the code corresponding to the target object from the candidate code set, based on the semantic relationship between the target object and the disease name corresponding to each candidate code in the candidate code set, comprises:
Determining the semantic vector of the disease name corresponding to each candidate code, based on the semantic similarity information between that disease name and the target object;
Determining the code corresponding to the target object from the candidate code set, based on the semantic vectors of the disease names corresponding to the candidate codes.
Optionally, determining the semantic vector of the disease name corresponding to each candidate code, based on the semantic similarity information between that disease name and the target object, comprises:
For any candidate code:
Determining the semantic weight of each character in the disease name corresponding to the candidate code, based on the semantic similarity between each character in that disease name and each character in the target object;
Determining the semantic vector of the disease name corresponding to the candidate code, based on the semantic vector and semantic weight of each character in that disease name;
So as to obtain the semantic vector of the disease name corresponding to each candidate code.
Optionally, determining the semantic weight of each character in the disease name corresponding to the candidate code, based on the semantic similarity between each character in that disease name and each character in the target object, comprises:
Determining the semantic vector of each character in the disease name corresponding to the candidate code and the semantic vector of each character in the target object;
For any character in the disease name corresponding to the candidate code, calculating the similarity between the semantic vector of that character and the semantic vector of each character in the target object, and taking the largest of the calculated similarities as the semantic weight of that character, so as to obtain the semantic weight of each character in the disease name corresponding to the candidate code.
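The character-level weighting described above can be sketched as follows. Cosine similarity as the similarity measure, a weighted sum as the way the weighted character vectors are combined, and the toy two-dimensional vectors are assumptions for illustration. In practice the character vectors would come from a trained embedding model, which is not specified in this section.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_weights(candidate_vecs: List[Vector],
                     target_vecs: List[Vector]) -> List[float]:
    """Weight of each candidate character: its best similarity to any target character."""
    return [max((cosine(c, t) for t in target_vecs), default=0.0)
            for c in candidate_vecs]


def weighted_name_vector(candidate_vecs: List[Vector],
                         weights: List[float]) -> List[float]:
    """Weighted sum of character vectors (assumed combination rule)."""
    dim = len(candidate_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, candidate_vecs))
            for i in range(dim)]


# Toy character vectors (hypothetical): two candidate characters, one target character.
candidate = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0]]
weights = semantic_weights(candidate, target)
name_vector = weighted_name_vector(candidate, weights)
```

Characters of the candidate name that have no close counterpart in the target object receive a low weight, so they contribute little to the name's semantic vector.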
Optionally, determining the code corresponding to the target object from the candidate code set, based on the semantic vectors of the disease names corresponding to the candidate codes, comprises:
Determining a score for each candidate code from the semantic vector of the disease name corresponding to that candidate code, wherein the score of any candidate code characterizes the degree of semantic similarity between the disease name corresponding to the candidate code and the target object;
Determining the candidate code with the highest score as the code corresponding to the target object.
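The final selection can be sketched as below. How the score is derived from the semantic vector is not stated in this section, so the sketch assumes the score is the cosine similarity between each candidate's name vector and a vector for the target object. The code labels, vectors and the function name `select_code` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_code(candidate_vectors: Dict[str, Vector],
                target_vector: Vector) -> Tuple[str, Dict[str, float]]:
    """Score every candidate code and return the highest-scoring one."""
    if not candidate_vectors:
        raise ValueError("candidate code set is empty")
    scores = {code: cosine(vec, target_vector)
              for code, vec in candidate_vectors.items()}
    best_code = max(scores, key=scores.get)
    return best_code, scores


# Hypothetical semantic vectors for two candidate codes and the target object.
candidates = {"code_1": [1.0, 0.0], "code_2": [0.6, 0.8]}
best, all_scores = select_code(candidates, target_vector=[0.0, 1.0])
```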
An automatic disease coding system, comprising: an acquisition module, a code coarse-screening module and a code fine-screening module;
The acquisition module is configured to obtain a target object, the target object being a disease name or disease description;
The code coarse-screening module is configured to screen codes related to the target object out of a disease code library and form a candidate code set from the screened codes;
The code fine-screening module is configured to determine the code corresponding to the target object from the candidate code set, based on the semantic relationship between the target object and the disease name corresponding to each candidate code in the candidate code set.
Optionally, the code coarse-screening module includes a feature determination module and a related-code screening module;
The feature determination module is configured to determine, based on the target object and the disease names corresponding to each class of codes in the disease code library, a target text statistical feature of the target object for each class of codes, wherein the target text statistical feature of the target object for any class of codes characterizes the degree of correlation between that class of codes and the target object;
The related-code screening module is configured to screen codes related to the target object out of the disease code library, based on the target text statistical features of the target object for the classes of codes.
Optionally, the feature determination module includes a code classification submodule, a first feature determination submodule and a second feature determination submodule;
The code classification submodule is configured to classify the codes in the disease code library to obtain multiple code sets, each code set corresponding to one code class;
The first feature determination submodule is configured to determine, based on the target object and the disease names corresponding to each code set, a first text statistical feature, and/or a second text statistical feature, and/or a third text statistical feature, and/or a fourth text statistical feature of the target object for each code set; wherein the disease names corresponding to any code set include the disease name corresponding to each code in that code set, and, for any code set, the first text statistical feature characterizes the frequency with which each word in the target object appears in the disease names corresponding to the code set, the second text statistical feature characterizes the term frequency-inverse document frequency (TF-IDF) of each word in the target object within the document composed of the disease names corresponding to the code set, the third text statistical feature characterizes the text similarity between the target object and the disease names corresponding to the code set, and the fourth text statistical feature characterizes the degree to which the target object matches the keywords and qualifiers in the disease names corresponding to the code set;
The second feature determination submodule is configured to determine the target text statistical feature of the target object for each class of codes, based on the first text statistical feature, and/or the second text statistical feature, and/or the third text statistical feature, and/or the fourth text statistical feature of the target object for each code set.
Optionally, when determining, for any code set, the first text statistical feature of the target object for the code set based on the target object and the disease names corresponding to the code set, the first feature determination submodule is specifically configured to:
Obtain the weight of each word in a first word set, wherein the first word set is obtained by de-duplicating a second word set, the second word set is the set of words obtained by word segmentation of the disease names corresponding to the code set, and the weight of each word in the first word set is determined by the number of times that word occurs in the second word set; obtain a target word set, and determine the weight of each word in the target word set based on the weights of the words in the first word set, wherein the target word set is the set of words obtained by word segmentation of the target object; and determine the first text statistical feature of the target object for the code set from the weights of the words in the target word set.
Optionally, when determining, for any code set, the second text statistical feature of the target object for the code set based on the target object and the disease names corresponding to the code set, the first feature determination submodule is specifically configured to:
Obtain the disease document corresponding to the code set, the disease document being composed of the disease names corresponding to the code set; obtain a target word set, and determine the TF-IDF of each word in the target word set with respect to the disease document corresponding to the code set, wherein the target word set is the set of words obtained by word segmentation of the target object; and determine the second text statistical feature of the target object for the code set from the TF-IDF values of the words in the target word set with respect to the disease document corresponding to the code set.
Optionally, when determining, for any code set, the third text statistical feature of the target object for the code set based on the target object and the disease names corresponding to the code set, the first feature determination submodule is specifically configured to:
Calculate the edit distance between the target object and each disease name corresponding to the code set; and determine the third text statistical feature of the target object for the code set from the edit distances between the target object and the disease names corresponding to the code set.
Optionally, when determining, for any code set, the fourth text statistical feature of the target object for the code set based on the target object and the disease names corresponding to the code set, the first feature determination submodule is specifically configured to:
Obtain the attribute graph corresponding to the code set, wherein the attribute graph includes main words and attribute words, a main word being a keyword in the disease names corresponding to the code set and an attribute word being a qualifier of a main word; match the target object against the main words and attribute words in the attribute graph corresponding to the code set; and determine the fourth text statistical feature of the target object for the code set, based on how the target object matches the attribute graph corresponding to the code set.
Optionally, the code fine-screening module includes a semantic vector determination module and a code screening module;
The semantic vector determination module is configured to determine the semantic vector of the disease name corresponding to each candidate code, based on the semantic similarity information between that disease name and the target object;
The code screening module is configured to determine the code corresponding to the target object from the candidate code set, based on the semantic vectors of the disease names corresponding to the candidate codes.
Optionally, the semantic vector determination module includes a weight determination submodule and a semantic vector determination submodule;
The weight determination submodule is configured to, for any candidate code, determine the semantic weight of each character in the disease name corresponding to the candidate code, based on the semantic similarity between each character in that disease name and each character in the target object;
The semantic vector determination submodule is configured to determine the semantic vector of the disease name corresponding to the candidate code, based on the semantic vector and semantic weight of each character in that disease name.
Optionally, the weight determination submodule is specifically configured to determine the semantic vector of each character in the disease name corresponding to the candidate code and the semantic vector of each character in the target object; and, for any character in the disease name corresponding to the candidate code, to calculate the similarity between the semantic vector of that character and the semantic vector of each character in the target object, and take the largest of the calculated similarities as the semantic weight of that character, so as to obtain the semantic weight of each character in the disease name corresponding to the candidate code.
Optionally, the code screening module is specifically configured to determine a score for each candidate code from the semantic vector of the disease name corresponding to that candidate code, and to determine the candidate code with the highest score as the code corresponding to the target object; wherein the score of any candidate code characterizes the degree of semantic similarity between the disease name corresponding to the candidate code and the target object.
An automatic disease coding device, comprising: a memory and a processor;
The memory is configured to store a program;
The processor is configured to execute the program to implement each step of the automatic disease coding method.
A readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, each step of the automatic disease coding method is implemented.
With the automatic disease coding system, method, device and storage medium provided by this application, after the target object (that is, the disease name or disease description to be coded) is obtained, a coarse screening is performed first: codes related to the target object are screened out of the disease code library to obtain a candidate code set. A fine screening follows: the code corresponding to the target object is determined from the candidate code set, based on the semantic relationship between the target object and the disease name corresponding to each code in the candidate code set. It follows that the automatic disease coding method provided by this application can determine the code corresponding to the target object from the disease code library automatically. Compared with manual coding, it saves manpower, shortens coding time and avoids the influence of subjective factors on coding accuracy. In addition, because the decision is based on the semantic relationship between the target object and the disease name corresponding to each code in the candidate code set, the code corresponding to the target object can be determined from the candidate code set accurately.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of this application; a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flow diagram of the automatic disease coding method provided by an embodiment of this application;
Fig. 2 is a flow diagram of the process, provided by an embodiment of this application, of screening codes related to the target object out of the code library to form the candidate code set;
Fig. 3 is a schematic diagram of an example of the attribute graph corresponding to a code set, provided by an embodiment of this application;
Fig. 4 is a flow diagram of the process, provided by an embodiment of this application, of determining the code corresponding to the target object from the candidate code set, based on the semantic relationship between the target object and the disease name corresponding to each candidate code in the candidate code set;
Fig. 5 is a schematic diagram of a specific example, provided by an embodiment of this application, of determining the code corresponding to the target object from the disease code library;
Fig. 6 is a structural diagram of the automatic disease coding system provided by an embodiment of this application;
Fig. 7 is a structural diagram of the automatic disease coding device provided by an embodiment of this application.
Detailed description of the embodiments
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application, without creative effort, fall within the protection scope of this application.
In the course of realizing this application, the inventor found that existing disease coding schemes are mostly manual. The coding quality of a manual scheme depends on the professional ability of the coders, and training excellent coders is costly. Different coders also understand coding granularity, the code library and the judgment criteria differently, which makes coding results inconsistent. Furthermore, coding work is tedious, so manual coding is inefficient and prone to coding errors.
In view of the problems of manual coding, the inventor carried out the following research:
The initial idea was an automatic coding scheme based on retrieval technology. Specifically, a disease terminology library is built first, containing mainly disease names, abbreviations, common names, local names and the like. Natural language processing is then used to perform word segmentation and keyword extraction on the disease name. Finally, keyword retrieval is used to find, in the disease terminology library, the standard term most similar to the disease name or disease description to be coded, and the code corresponding to that disease name or disease description is determined from the disease code library based on the standard term.
However, an automatic coding scheme based on retrieval technology returns many results that are similar at the string level. For example, for the disease name "foreign body in the esophagus", retrieval can return several similar results such as "foreign body in the ear canal", "foreign body in the digestive tract" and "foreign body within the esophagus", and the scheme has difficulty selecting the correct one among them.
Because of these problems, the inventor carried out further research and proposed an automatic coding scheme based on classification technology. The idea is to treat disease coding as a classification task in which the codes in the disease code library serve as the labels, and to build a complex neural network model that predicts the class label of the input disease name or disease description.
Although the classification-based automatic coding scheme can determine the code corresponding to a disease name or disease description to be coded, it relies on a complex neural network model for class prediction, so its computational complexity is high. Moreover, it can only perform class prediction (that is, coding) for a small number of diseases, on the order of a few dozen. As the number of classes grows, the computational complexity becomes unbearable and the performance of the neural network model drops sharply, so the scheme lacks robustness and practicality.
In view of the problems of the above schemes, the inventor carried out further in-depth research and finally proposed an automatic disease coding method with better results. The method can determine the code corresponding to a disease name or disease description to be coded from the disease code library automatically, accurately and efficiently. It is applicable to any scenario in which the code corresponding to a disease name or disease description must be determined, and it can be applied on a terminal or on a server.
The automatic disease coding method provided by this application is described next.
Referring to Fig. 1, which shows a flow diagram of the automatic disease coding method provided by an embodiment of this application, the method may include:
Step S101: obtain a target object.
Here, the target object is a disease name or disease description to be encoded.
Step S102: screen out, from the disease code library, the codes related to the target object, and form a candidate code set from the screened-out codes.
It should be noted that the disease code library contains multiple classes of codes. The purpose of this step is to screen out, from the disease code library, the code classes related to the target object, so that all codes under those code classes form the candidate code set.
Specifically, the codes related to the target object can be screened out from the disease code library based on text statistical features of the target object with respect to the disease names corresponding to each class of codes.
Step S103: based on the semantic relationship between the target object and the disease name corresponding to each code in the candidate code set, determine the code corresponding to the target object from the candidate code set.
Specifically, the code corresponding to the target object can be determined from the candidate code set based on semantic similarity information between the target object and the disease name corresponding to each code in the candidate code set.
With the automatic disease coding method provided by the embodiments of the present application, after the target object (that is, the disease name or disease description to be encoded) is obtained, coarse screening is performed first: the codes related to the target object are screened out from the disease code library to obtain the candidate code set. Fine screening is then performed: based on the semantic relationship between the target object and the disease name corresponding to each code in the candidate code set, the code corresponding to the target object is determined from the candidate code set. It can be seen from the above process that the method can automatically determine the code corresponding to the target object from the disease code library. Compared with manual coding, this not only saves manpower and reduces coding time, but also avoids the influence of subjective factors on coding accuracy. In addition, the codes related to the target object can be screened out quickly from the disease code library based on the text statistical features of the target object with respect to the disease names corresponding to each class of codes, and the code corresponding to the target object can be determined accurately from the candidate code set based on the semantic relationship between the target object and the disease name corresponding to each code in the candidate code set. In other words, the automatic disease coding method provided by the embodiments of the present application can automatically, efficiently and accurately determine the code corresponding to the target object.
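The two-stage flow of steps S101 to S103 can be sketched as a small driver function. This is only an illustration under stated assumptions: the function name `encode_disease`, the two callables it receives, and the toy word-overlap stand-ins and placeholder codes `c1` to `c3` are all hypothetical and are not part of the application.

```python
from typing import Callable, Dict, List


def encode_disease(
    target: str,
    code_library: Dict[str, str],
    coarse_screen: Callable[[str, Dict[str, str]], List[str]],
    semantic_score: Callable[[str, str], float],
) -> str:
    """Return the code chosen by coarse screening followed by fine screening."""
    # Coarse screening (step S102): keep only codes judged related to the target.
    candidates = coarse_screen(target, code_library)
    if not candidates:
        raise ValueError("coarse screening returned no candidate codes")
    # Fine screening (step S103): pick the candidate whose disease name scores highest.
    return max(candidates, key=lambda code: semantic_score(target, code_library[code]))


# Toy stand-ins (hypothetical): plain word overlap replaces both real stages.
def toy_coarse_screen(target: str, library: Dict[str, str]) -> List[str]:
    words = set(target.split())
    return [code for code, name in library.items() if words & set(name.split())]


def toy_semantic_score(target: str, name: str) -> float:
    a, b = set(target.split()), set(name.split())
    return len(a & b) / len(a | b)


# Placeholder codes and names, not taken from a real code library.
library = {"c1": "thymic hyperplasia", "c2": "thymic abscess", "c3": "hepatic sclerosis"}
result = encode_disease("thymic hyperplasia", library, toy_coarse_screen, toy_semantic_score)
```

Passing the two stages in as callables keeps the sketch independent of how the text statistical features and the semantic scores are actually computed.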
In another embodiment of the present application, the implementation of "screening out, from the disease code library, the codes related to the target object" in step S102 of the above embodiment is introduced.
Referring to Fig. 2, which shows the process of screening out the codes related to the target object from the disease code library to form the candidate code set, the process may include:
Step S201: based on the target object and the disease names corresponding to each class of codes in the disease code library, determine the target text statistical feature of the target object for each class of codes.
Here, the target text statistical feature of the target object for any class of codes characterizes the degree of correlation between that class of codes and the target object.
In one possible implementation, step S201 may include:
Step S2011: classify the codes in the disease code library to obtain multiple code sets.
Here, each code set corresponds to one code class.
It should be noted that, in the ICD coding system, ICD codes are divided by number of digits into ICD 3-digit codes, ICD 4-digit codes and ICD 6-digit codes. The ICD 4-digit codes and ICD 6-digit codes are the subcategory codes and extension codes of ICD; they are finer subdivisions of the ICD 3-digit codes, and the first three characters of an ICD 4-digit or 6-digit code are the corresponding ICD 3-digit code.
In this embodiment, all ICD 6-digit codes can be classified by their ICD 3-digit codes: ICD 6-digit codes whose first three characters are identical are placed in one class and form one code set. For example, the ICD 6-digit codes whose first three characters are E32 can be merged into one class to obtain the code set {E32.000, E32.001, E32.002, E32.100, ...}, and the ICD 6-digit codes whose first three characters are K74 can be merged into one class to obtain the code set {K74.000, K74.001, ...}. In this way multiple code sets are obtained, each containing multiple ICD 6-digit codes with identical first three characters.
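The grouping in step S2011 amounts to bucketing codes by their first three characters. A minimal sketch follows; the function name `group_by_three_char_prefix` is an illustrative assumption, and the sample list reuses only the codes mentioned in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_three_char_prefix(codes: Iterable[str]) -> Dict[str, List[str]]:
    """Group ICD 6-digit codes into code sets keyed by their first three characters."""
    code_sets: Dict[str, List[str]] = defaultdict(list)
    for code in codes:
        code_sets[code[:3]].append(code)
    return dict(code_sets)


codes = ["E32.000", "E32.001", "E32.002", "E32.100", "K74.000", "K74.001"]
code_sets = group_by_three_char_prefix(codes)
# code_sets["E32"] holds the four E32 codes, code_sets["K74"] the two K74 codes.
```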
Step S2012: based on the target object and the disease names corresponding to each code set, determine the first text statistical feature, and/or the second text statistical feature, and/or the third text statistical feature, and/or the fourth text statistical feature of the target object for each code set.
Here, the disease names corresponding to any code set comprise the disease name corresponding to each code in that code set.
Here, the first text statistical feature of the target object for any code set characterizes the frequency with which each word in the target object appears in the disease names corresponding to that code set; the second text statistical feature of the target object for any code set characterizes the term frequency-inverse document frequency (TF-IDF) with which each word in the target object appears in the document composed of the disease names corresponding to that code set; the third text statistical feature of the target object for any code set characterizes the text similarity between the target object and the disease names corresponding to that code set; and the fourth text statistical feature of the target object for any code set characterizes the degree to which the target object matches the keywords and modifiers in the disease names corresponding to that code set.
For how the first, second, third and fourth text statistical features of the target object for each code set are determined based on the target object and the disease names corresponding to each code set, see the description of the subsequent embodiments.
Step S2013: based on the first text statistical feature, and/or the second text statistical feature, and/or the third text statistical feature, and/or the fourth text statistical feature of the target object for each code set, determine the target text statistical feature of the target object for each class of codes.
In one possible implementation, for any code set, any one or more (preferably more than one) of the first, second, third and fourth text statistical features of the target object for that code set can be input into a multilayer perceptron (MLP) to obtain the target text statistical feature of the target object for that code set. In this way the target text statistical feature of the target object for each code set is obtained, and these are the target text statistical features of the target object for each class of codes.
Step S202: based on the target text statistical feature of the target object for each class of codes, screen out the codes related to the target object from the disease code library.
Specifically, the target text statistical feature of the target object for any class of codes is the score of the target object for that class of codes. In one possible implementation, N scores can be selected from the scores of the target object for all classes of codes, such that each of the N scores is greater than every other score, and all codes under the code classes corresponding to those N scores are taken as the codes related to the target object. For example, the scores of the target object for all classes of codes can be sorted in descending order, and all codes under the code classes corresponding to the top N scores are taken as the codes related to the target object, where N can be set according to actual conditions.
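The top-N selection described above can be sketched as follows. The function name `select_candidate_codes` and the scores and code lists in the example are made up for illustration; only the selection logic reflects the text.

```python
import heapq
from typing import Dict, List


def select_candidate_codes(
    class_scores: Dict[str, float],
    code_sets: Dict[str, List[str]],
    n: int,
) -> List[str]:
    """Collect all codes under the N code classes with the highest scores."""
    # nlargest iterates over the dict keys and ranks them by their score.
    top_classes = heapq.nlargest(n, class_scores, key=class_scores.get)
    candidates: List[str] = []
    for code_class in top_classes:
        candidates.extend(code_sets[code_class])
    return candidates


# Hypothetical scores and code sets, for illustration only.
class_scores = {"J31": 0.31, "C11": 0.82, "E32": 0.05}
code_sets = {"J31": ["J31.000", "J31.100"], "C11": ["C11.900"], "E32": ["E32.000"]}
candidate_codes = select_candidate_codes(class_scores, code_sets, n=2)
```

A larger N makes the coarse stage more forgiving at the cost of more candidates for the semantic stage to score.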
The following describes, in turn, how the first, second, third and fourth text statistical features of the target object for each code set are determined based on the target object and the disease names corresponding to each code set.
Since the first, second, third and fourth text statistical features are determined in the same way for every code set, the determination of each text statistical feature is described below using one code set C as an example.
Based on the target object and the disease names corresponding to code set C, the process of determining the first text statistical feature of the target object for code set C includes:
Step a1: obtain the weight of each word in the first word set.
Here, the first word set is obtained by de-duplicating the second word set; the second word set is the set of words obtained by performing word segmentation on the disease names corresponding to code set C; and the weight of each word in the first word set is determined by the number of times that word occurs in the second word set.
It should be noted that, because the weight of each word in the first word set is usually fixed, the weights can be determined and stored in advance and then simply retrieved when a target object needs to be encoded. Of course, they can also be determined at the time the target object is encoded.
In this embodiment, the process of determining the weight of each word in the first word set may include: performing word segmentation (including removing stop words, removing punctuation, replacing synonyms, and so on) on the disease name corresponding to each code in code set C, the words obtained after word segmentation forming the second word set; de-duplicating the second word set to obtain the first word set; counting the number of times each word in the first word set occurs in the second word set; and determining the weight of each word in the first word set based on those counts.
Illustratively, the codes included in code set C, the disease name corresponding to each code, and the word segmentation result of each disease name are shown in Table 1 below:
Table 1. Information related to a code set
The word segmentation results in Table 1 form the second word set. After the words in the second word set are de-duplicated, the first word set {thymus gland, hyperplasia, thymopathy, congenital, atrophy, other, abscess, tumour, loose, duration, disease} is obtained. Counting the number of times each word in the first word set occurs in the second word set gives 7, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1 respectively. The weight of each word in the first word set is then determined from these counts: the words in the first word set occur 18 times in total in the second word set, so the weight of the word "thymus gland" is 7/18, the weight of the word "hyperplasia" is 2/18, and the weight of each other word is 1/18. Table 2 below shows the number of occurrences and the weight of each word in the first word set:
Table 2. Number of occurrences and weight of each word in the first word set
Step a2: obtain the target word set.
Here, the target word set is the set of words obtained by performing word segmentation on the target object.
Illustratively, if the target object is "mediastinal space-occupying lesion: thymic hyperplasia", word segmentation yields "mediastinum / space-occupying lesion / thymus gland / hyperplasia", so the target word set is {mediastinum, space-occupying lesion, thymus gland, hyperplasia}.
Step a3: based on the weight of each word in the first word set, determine the weight of each word in the target word set.
Illustratively, the target word set is {mediastinum, space-occupying lesion, thymus gland, hyperplasia} and the weights of the words in the first word set are as shown in Table 2. From Table 2, the weight of "thymus gland" is 0.39 and the weight of "hyperplasia" is 0.11. Table 2 contains neither "mediastinum" nor "space-occupying lesion", so the weights of these two words are 0.
Step a4: determine the first text statistical feature of the target object for code set C from the weights of the words in the target word set.
Specifically, the weights of the words in the target word set can be summed, and the resulting sum is used as the first text statistical feature of the target object for code set C. Illustratively, the weights of the words in the target word set {mediastinum, space-occupying lesion, thymus gland, hyperplasia} are 0, 0, 0.39 and 0.11 respectively, so 0 + 0 + 0.39 + 0.11 = 0.5 is the first text statistical feature of the target object for code set C.
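Steps a1 to a4 can be sketched as follows. The function names are illustrative assumptions. Because Table 1 is not reproduced here, the second word set is rebuilt from the occurrence counts stated in the text (7, 2, and 1 for each remaining word), which is all the weights depend on; the English target words are renderings of the example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def word_weights(second_word_set: Iterable[str]) -> Dict[str, float]:
    """Weight of each distinct word = its count divided by the total word count."""
    counts = Counter(second_word_set)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def first_feature(target_words: Iterable[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the distinct target words; unseen words weigh 0."""
    return sum(weights.get(word, 0.0) for word in dict.fromkeys(target_words))


# Second word set rebuilt from the counts stated in the text.
second_word_set: List[str] = (
    ["thymus gland"] * 7
    + ["hyperplasia"] * 2
    + ["thymopathy", "congenital", "atrophy", "other", "abscess",
       "tumour", "loose", "duration", "disease"]
)
weights = word_weights(second_word_set)
target_word_set = ["mediastinum", "space-occupying lesion", "thymus gland", "hyperplasia"]
feature = first_feature(target_word_set, weights)  # 7/18 + 2/18, about 0.5
```

Since the weights depend only on the code set, `word_weights` can be run once per code set and cached, as the text notes.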
Based on the target object and the disease names corresponding to code set C, the process of determining the second text statistical feature of the target object for code set C may include:
Step b1: obtain the disease document corresponding to code set C.
Here, the disease document corresponding to code set C is composed of the disease names corresponding to code set C; that is, the disease names corresponding to the codes in code set C are combined into one document, which serves as the disease document corresponding to code set C.
Step b2: obtain the target word set.
Here, the target word set is the set of words obtained by performing word segmentation on the target object.
Step b3: determine the term frequency-inverse document frequency (TF-IDF) with which each word in the target word set appears in the disease document corresponding to code set C.
Specifically, for any word in the target word set, first calculate the term frequency with which the word appears in the disease document corresponding to code set C and the inverse document frequency of the word with respect to the target corpus; then determine, from these two values, the TF-IDF with which the word appears in the disease document corresponding to code set C. Here, the target corpus contains the disease document corresponding to each code set. It should be noted that each disease document in the target corpus contains, in addition to the disease names corresponding to the code set, extended names of those disease names; for example, disease document T contains the disease name corresponding to each code in code set X and the extended names of those disease names.
Further, the term frequency with which any word t in the target word set appears in the disease document d corresponding to code set C can be determined by the following formula:
Here, n_t is the number of times word t of the target word set occurs in the disease document d corresponding to code set C, N_d is the total number of words in the disease document d corresponding to code set C, and TF(t, d) is the term frequency with which word t appears in the disease document d corresponding to code set C. It should be noted that, in actual use, TF(t, d) usually needs to be normalized:
Here, t' is a word in the disease document d corresponding to code set C.
The inverse document frequency of any word t in the target word set with respect to the target corpus D can be determined by the following formula:
Here, M is the total number of documents in the target corpus D, m_t is the number of documents in the target corpus D that contain word t, and IDF(t, D) is the inverse document frequency of word t of the target word set with respect to the target corpus D.
After TF'(t, d) and IDF(t, D) are obtained, the TF-IDF with which any word t in the target word set appears in the disease document d corresponding to code set C, TF-IDF(t, d, D), is determined by the following formula:
TF-IDF(t, d, D) = TF'(t, d) * IDF(t, D) (4)
Through the above process, the TF-IDF with which each word in the target word set appears in the disease document corresponding to code set C can be obtained.
Step b4: determine the second text statistical feature of the target object for code set C from the TF-IDF with which each word in the target word set appears in the disease document corresponding to code set C.
Specifically, the TF-IDF values with which the words appear in the disease document corresponding to code set C can be summed, and the resulting sum is used as the second text statistical feature of the target object for code set C.
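The computation of the second text statistical feature can be sketched as below. Only formula (4) and the symbol definitions survive in the text, so the other forms used here are assumptions: TF(t, d) = n_t / N_d, normalization by the largest TF in d, and IDF(t, D) = log(M / m_t) with 0 for words that occur in no document. The function names and the toy corpus are hypothetical, and the disease documents are assumed to be non-empty.

```python
import math
from collections import Counter
from typing import Dict, List


def normalized_tf(doc_words: List[str]) -> Dict[str, float]:
    """TF(t, d) = n_t / N_d, then divided by the largest TF in d (assumed normalization)."""
    counts = Counter(doc_words)
    total = len(doc_words)
    tf = {word: count / total for word, count in counts.items()}
    peak = max(tf.values())
    return {word: value / peak for word, value in tf.items()}


def idf(word: str, corpus: Dict[str, List[str]]) -> float:
    """IDF(t, D) = log(M / m_t) (assumed form); 0.0 when no document contains t."""
    m_t = sum(1 for words in corpus.values() if word in words)
    return math.log(len(corpus) / m_t) if m_t else 0.0


def second_feature(target_words: List[str], set_id: str, corpus: Dict[str, List[str]]) -> float:
    """Sum of TF'(t, d) * IDF(t, D) over the distinct target words."""
    tf = normalized_tf(corpus[set_id])
    return sum(tf.get(w, 0.0) * idf(w, corpus) for w in dict.fromkeys(target_words))


# Toy corpus (hypothetical): one segmented disease document per code set.
corpus = {
    "E32": ["thymus gland", "hyperplasia", "thymus gland", "abscess"],
    "K74": ["liver", "cirrhosis", "hyperplasia"],
    "C11": ["nasopharynx", "tumour"],
}
target = ["mediastinum", "thymus gland", "hyperplasia"]
feature_e32 = second_feature(target, "E32", corpus)
feature_k74 = second_feature(target, "K74", corpus)
```

In this toy corpus the target scores higher against E32 than against K74, because "thymus gland" is both frequent in the E32 document and absent from the other documents.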
Based on the target object and the disease names corresponding to code set C, the process of determining the third text statistical feature of the target object for code set C may include:
Step c1: calculate the edit distance between the target object and each disease name corresponding to code set C.
It should be noted that the edit distance between two texts is the minimum number of character edits needed to convert one text into the other; it is a quantitative measure of how different, or how similar, two texts are.
In this step, the edit distance between the target object and the disease name corresponding to each code in code set C is calculated in order to determine the degree of textual difference (or similarity) between the target object and each of those disease names. Preferably, the edit distance to the target object can be calculated for both the disease name and the extended disease names corresponding to each code in code set C.
Step c2: determine the third text statistical feature of the target object for code set C from the calculated edit distances.
Specifically, the smallest of the calculated edit distances can be determined as the third text statistical feature of the target object for code set C.
It should be noted that if the edit distance between the target object and the disease name or extended disease name corresponding to some code in code set C is 0, that disease name or extended disease name is identical to the target object, and the code is directly determined as the code corresponding to the target object.
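Steps c1 and c2 can be sketched as follows, assuming the Levenshtein definition of edit distance (insertions, deletions and substitutions each cost 1); the text does not say which variant is used. The function names and the sample disease names are illustrative.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def third_feature(target: str, names: Iterable[str]) -> int:
    """Smallest edit distance between the target and any name of the code set."""
    return min(edit_distance(target, name) for name in names)


names = ["thymic hyperplasia", "thymic abscess"]
exact = third_feature("thymic hyperplasia", names)    # identical name present
near = third_feature("thymic hyperplasias", names)    # one extra character
```

A result of 0 corresponds to the shortcut described above, where the matching code can be returned immediately.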
Based on the target object and the disease names corresponding to code set C, the process of determining the fourth text statistical feature of the target object for code set C includes:
Step d1: obtain the attribute graph corresponding to code set C.
Here, the attribute graph includes a main word and attribute words and reflects the relationship between them; the main word is the keyword in the disease names corresponding to code set C, and the attribute words are the modifiers of the main word.
In this embodiment, the attribute graph corresponding to code set C can be built in advance. Specifically, the process of building the attribute graph for code set C may include:
1) Determine the main word from the disease names corresponding to code set C.
Here, the main word is the keyword in a disease name. For example, "meningitis" is the main word in "tuberculous meningitis", and "respiratory failure" is the main word in "chronic respiratory failure".
2) Determine, from the disease names corresponding to code set C, the attribute words that modify the main word, and determine the weight of each attribute word.
In one possible implementation, the attribute words that modify the main word can be determined from the disease names corresponding to code set C based on predefined attributes, so that one attribute word set is obtained for each predefined attribute.
Here, the predefined attributes may include one or more of the following: cause, pathology, site, stage or severity, primary or secondary, multiple or single, acute or chronic, pathogen, age or period, sequelae, complication, and so on.
For any attribute word, its weight can be determined based on the number of times it occurs in the disease names corresponding to code set C, so that the weight of each attribute word is obtained.
3) Build the attribute graph from the determined main word, attribute words and attribute word weights.
Optionally, the attribute graph may take the following form: the main word is at the center, and the attribute word information corresponding to each predefined attribute forms a branch; that is, the attribute word information corresponding to one attribute (including the attribute word set corresponding to that attribute and the weight of each attribute word in that set) is one branch.
Referring to Fig. 3, which shows a schematic diagram of the attribute graph corresponding to the code set formed by the codes whose first three characters are K74, it can be seen that the center of the attribute graph is the main word "cirrhosis", and each branch of the main word consists of the attribute word set corresponding to one predefined attribute (such as pathology, cause, pathogen, complication, and so on) together with the weight of each attribute word in that set. For example, one branch is the attribute word set {hepatic fibrosis, biliary, necrotic, nodular, ...} corresponding to the attribute "pathology", in which the weights of the attribute words are 0.2, 0.3, 0.1, 0.1, ... respectively.
Step d2: match the target object against the main word and attribute words in the attribute graph corresponding to code set C, and determine the fourth text statistical feature of the target object for code set C based on how the target object matches the attribute graph.
Step d2 can be implemented in several ways. In one possible implementation:
Word segmentation is first performed on the target object to obtain multiple target words. A main word is then determined from the target words and matched against the main word in the attribute graph. If the match succeeds, the target object contains the main word of the attribute graph, and the target words other than the main word are matched against the attribute words in the attribute graph. The attribute words that match a target word other than the main word are taken as target attribute words, their weights are summed, and the resulting sum is used as the fourth text statistical feature of the target object for code set C. If the main word determined from the target words fails to match the main word in the attribute graph, the target object does not contain the main word of the attribute graph, and the fourth text statistical feature of the target object for code set C is determined to be 0.
The above implementation requires a main word to be chosen from the target object. The choice of main word is critical: an inappropriate choice affects the fourth text statistical feature and, in turn, the accuracy of the subsequent code screening. In view of this, in another possible implementation no main word is chosen; instead, each word in the target object is treated in turn as the main word for matching. Specifically, word segmentation is first performed on the target object to obtain multiple target words, and the target words are then traversed. The word currently being traversed is treated as the main word and matched against the main word in the attribute graph. If the match succeeds, the target words other than the main word are matched against the attribute words in the attribute graph, the weights of the attribute words that match those target words are summed, and the resulting sum is taken as one candidate feature. If the word currently being traversed fails to match the main word in the attribute graph, 0 is taken as one candidate feature. The next target word is then traversed, until all target words have been traversed. This traversal yields multiple candidate features, and the candidate feature with the largest value is used as the fourth text statistical feature of the target object for code set C.
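The traversal variant of step d2 can be sketched as follows. The attribute graph is represented as a main word plus a nested dictionary of attribute words and weights, which is one possible encoding and not a structure given in the text. The "pathology" branch reuses the weights quoted for the K74 example, with the attribute words rendered in English; the function name `fourth_feature` is hypothetical.

```python
from typing import Dict, List

# One possible encoding of an attribute graph: main word at the center,
# one branch (attribute word -> weight) per predefined attribute.
main_word = "cirrhosis"
attributes: Dict[str, Dict[str, float]] = {
    "pathology": {"hepatic fibrosis": 0.2, "biliary": 0.3, "necrotic": 0.1, "nodular": 0.1},
}


def fourth_feature(
    target_words: List[str],
    main_word: str,
    attributes: Dict[str, Dict[str, float]],
) -> float:
    """Try every target word as the main word and keep the best candidate feature."""
    # Flatten all branches into a single attribute word -> weight lookup.
    weights: Dict[str, float] = {}
    for attribute_words in attributes.values():
        weights.update(attribute_words)
    candidates: List[float] = []
    for index, word in enumerate(target_words):
        if word != main_word:
            candidates.append(0.0)  # main word match failed for this word
            continue
        others = target_words[:index] + target_words[index + 1:]
        candidates.append(sum((weights.get(other, 0.0) for other in others), 0.0))
    return max(candidates, default=0.0)


matched = fourth_feature(["biliary", "cirrhosis"], main_word, attributes)
unmatched = fourth_feature(["biliary", "hepatitis"], main_word, attributes)
```

With a single main word per graph, at most one traversal step produces a non-zero candidate, so the maximum simply selects that step when it exists.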
It should be noted that the above code screening based on the target text statistical feature of the target object for each class of codes is essentially screening by character-level features. Such screening can hardly single out the code corresponding to the target object from the disease code library; it can only screen out the codes related to the target object. In view of this, the present application further screens at the semantic level, that is, it determines the code corresponding to the target object from the candidate code set based on the semantic relationship between the target object and the disease name corresponding to each candidate code in the candidate code set.
The following describes the process of determining the code corresponding to the target object from the candidate code set based on the semantic relationship between the target object and the disease name corresponding to each candidate code in the candidate code set. The process may include:
Step S401: based on the semantic similarity information between the target object and the disease name corresponding to each candidate code, determine the semantic vector of the disease name corresponding to each candidate code.
Specifically, for any candidate code c, the process of determining the semantic vector of the disease name corresponding to candidate code c, based on the semantic similarity information between that disease name and the target object, may include:
Step S4011: based on the semantic similarity between each character in the disease name corresponding to candidate code c and each character of the target object, determine the semantic weight of each character in the disease name corresponding to candidate code c.
Specifically, determining the semantic weight of each character in the disease name corresponding to candidate code c in this way includes:
1) Determine the semantic vector of each character in the disease name corresponding to candidate code c and the semantic vector of each character in the target object.
The process of determining the semantic vector of each character in the disease name corresponding to candidate code c includes: first splitting the disease name corresponding to candidate code c into individual characters; then representing each character as a vector; and then inputting the vectors representing the characters into a network capable of extracting contextual semantic information (such as an LSTM network) to obtain the semantic vector of each character. It should be noted that such a network extracts character-level contextual information for each input character and outputs the semantic vector of each character. The process of determining the semantic vector of each character of the target object is similar and is not repeated here.
2) For any character in the disease name corresponding to candidate code c, calculate the similarity between the semantic vector of that character and the semantic vector of each character of the target object, and take the largest of the calculated similarities as the semantic weight of that character, thereby obtaining the semantic weight of each character in the disease name corresponding to candidate code c.
Specifically, the semantic similarity S(i, j) between the i-th character in the disease name corresponding to candidate code c and the j-th character of the target object is calculated by the following formula:
Here, the two vectors in the formula are, respectively, the semantic vector of the i-th character in the disease name corresponding to candidate code c and the semantic vector of the j-th character of the target object.
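The max-similarity weighting of step S4011 can be sketched as follows. The similarity formula itself is not reproduced in the text, so cosine similarity is used here purely as an assumption; the character vectors are toy values standing in for the outputs of the context network, and the function names are hypothetical. The target object is assumed to contain at least one character.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity (assumed form of S(i, j)); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_weights(name_vectors: List[Vector], target_vectors: List[Vector]) -> List[float]:
    """For each name character, the largest similarity to any target character."""
    return [max(cosine(h, g) for g in target_vectors) for h in name_vectors]


# Toy semantic vectors (hypothetical), one per character.
name_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_vectors = [[1.0, 0.0]]
weights = semantic_weights(name_vectors, target_vectors)
```

Characters of the disease name that closely match some character of the target object receive weights near 1, while unrelated characters receive weights near 0.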
Step S4012: based on the semantic vector and semantic weight of each character in the disease name corresponding to candidate code c, determine the semantic vector of the disease name corresponding to candidate code c.
Specifically, the semantic vector of each character in the disease name corresponding to candidate code c is first reconstructed from that character's semantic vector and semantic weight; the reconstructed semantic vectors are then concatenated, and the vector obtained after concatenation is used as the semantic vector of the disease name corresponding to candidate code c.
Here, the semantic vector of each character in the disease name corresponding to candidate code c can be reconstructed from that character's semantic vector and semantic weight using the following formula:
Here, w_i and the other symbol in the formula denote the semantic vector and the semantic weight of the i-th character in the disease name corresponding to candidate code c, and the left-hand side of the formula is the semantic vector of the i-th character reconstructed from them.
Through the above process, the semantic vector of the disease name corresponding to each code in the candidate code set can be obtained.
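Step S4012 can be sketched as follows, assuming that reconstruction means scaling each character's semantic vector by its semantic weight; the reconstruction formula is not reproduced in the text, so this is only one plausible reading. The function name `name_vector` and the toy values are hypothetical.

```python
from typing import List, Sequence


def name_vector(char_vectors: List[Sequence[float]], weights: List[float]) -> List[float]:
    """Scale each character vector by its semantic weight, then concatenate them."""
    if len(char_vectors) != len(weights):
        raise ValueError("one semantic weight is needed per character vector")
    # Assumed reconstruction: weight times vector, element by element.
    reconstructed = [[w * x for x in vec] for vec, w in zip(char_vectors, weights)]
    # Concatenate the reconstructed vectors in character order.
    return [x for vec in reconstructed for x in vec]


# Toy values (hypothetical): two characters with two-dimensional semantic vectors.
char_vectors = [[1.0, 2.0], [3.0, 4.0]]
weights = [0.5, 1.0]
vector = name_vector(char_vectors, weights)
```

Because the result is a concatenation, its length grows with the number of characters, so a fixed-size scorer would need the names padded or truncated to a common length; the text does not say how this is handled.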
Step S402: based on the semantic vector of the disease name corresponding to each candidate code in the candidate code set, determine the code corresponding to the target object from the candidate code set.
Specifically, the score of each candidate code in the candidate code set can be determined from the semantic vector of the disease name corresponding to that candidate code, where the score of any candidate code characterizes the degree of semantic similarity between the disease name corresponding to that candidate code and the target object. The candidate code with the highest score in the candidate code set is determined as the code corresponding to the target object.
In one possible implementation, the semantic vector of the disease name corresponding to each candidate code in the candidate code set can be input into a multilayer perceptron (MLP) to obtain the score of each candidate code in the candidate code set.
Referring to Fig. 5, which shows an example of determining the code corresponding to a target object: the target object is "differentiated non-keratinizing nasopharyngeal carcinoma". Based on its target text statistical features for each class of codes in the disease code library, 10 three-digit ICD codes can be obtained, such as "J31 - chronic rhinitis, nasopharyngitis and pharyngitis" and "C11 - malignant neoplasm of nasopharynx". All six-digit ICD codes under each of these three-digit ICD codes form the candidate code set. Then, based on the semantic similarity information between each candidate code in the candidate code set and the target object "differentiated non-keratinizing nasopharyngeal carcinoma", the semantic vector of each candidate code in the candidate code set is reconstructed, and the score of each candidate code is determined based on its semantic vector; for example, J31.100 scores 0.1257 and C11.900 scores 0.7329. The candidate code with the highest score is taken as the code corresponding to the target object. As shown in Fig. 5, the highest-scoring code is C11.900, so C11.900 is determined as the code corresponding to "differentiated non-keratinizing nasopharyngeal carcinoma".
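The scoring and selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the function names `mlp_score` and `pick_code`, the single hidden layer, the ReLU and sigmoid activations, and the weight layout are all hypothetical; only the idea of scoring each candidate's semantic vector with an MLP and taking the highest score comes from the description above.

```python
import math


def mlp_score(vec, w1, b1, w2, b2):
    """Score one semantic vector with a one-hidden-layer MLP (hypothetical layout).

    w1 holds one weight vector per hidden unit; w2 holds one weight per hidden unit.
    """
    hidden = [
        max(0.0, sum(x * w for x, w in zip(vec, unit)) + b)  # ReLU activation
        for unit, b in zip(w1, b1)
    ]
    logit = sum(h * w for h, w in zip(hidden, w2)) + b2
    return 1.0 / (1.0 + math.exp(-logit))  # squash the score into (0, 1)


def pick_code(scores):
    """Return the candidate code with the highest score."""
    if not scores:
        raise ValueError("candidate code set is empty")
    return max(scores, key=scores.get)


# The two scores quoted in the Fig. 5 example above.
best = pick_code({"J31.100": 0.1257, "C11.900": 0.7329})
```

In practice the MLP weights would be learned from labelled coding data; the sketch only shows the forward pass and the final selection.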
The disease automatic coding method provided by the embodiments of the present application can screen out, from the disease code library, codes related to the target object (i.e., the disease name or disease description to be coded) according to character-level features, so as to obtain a candidate code set; after the candidate code set is obtained, the code corresponding to the target object can be screened out from the candidate code set at the semantic level. The character-level coarse screening reduces the complexity of the subsequent fine screening while maintaining accuracy, and the semantic-level fine screening ensures that the code corresponding to the target object is accurately selected from the candidate code set. The method can automatically determine the code corresponding to the target object from the disease code library; compared with manual coding, it not only saves manpower and reduces coding time, but also avoids the influence of subjective factors on coding accuracy. Moreover, through character-level coarse screening and semantic-level fine screening, the present application can efficiently and accurately determine the code corresponding to the target object from the disease code library.
The embodiments of the present application further provide a disease automatic coding system, which is described below; the disease automatic coding system described below and the disease automatic coding method described above may be referred to in correspondence with each other.
Referring to Fig. 6, which shows a schematic structural diagram of a disease automatic coding system provided by the embodiments of the present application, the disease automatic coding system may include: an acquisition module 601, a code coarse-screening module 602 and a code fine-screening module 603.
The acquisition module 601 is configured to obtain a target object, the target object being a disease name or a disease description.
The code coarse-screening module 602 is configured to screen out codes related to the target object from the disease code library, the screened-out codes forming a candidate code set.
The code fine-screening module 603 is configured to determine the code corresponding to the target object from the candidate code set based on the semantic relationship between the disease name corresponding to each candidate code in the candidate code set and the target object.
After obtaining the target object (i.e., the disease name or disease description to be coded), the disease automatic coding system provided by the embodiments of the present application first performs coarse screening, i.e., screens out codes related to the target object from the disease code library to obtain a candidate code set, and then performs fine screening, i.e., determines the code corresponding to the target object from the candidate code set based on the semantic relationship between the disease name corresponding to each code in the candidate code set and the target object. It can be seen from the above process that the disease automatic coding system provided by the embodiments of the present application can automatically determine the code corresponding to the target object from the disease code library; compared with manual coding, it not only saves manpower and reduces coding time, but also avoids the influence of subjective factors on coding accuracy. In addition, based on the semantic relationship between the disease name corresponding to each code in the candidate code set and the target object, the code corresponding to the target object can be accurately determined from the candidate code set. That is, the disease automatic coding system provided by the embodiments of the present application can determine the code corresponding to the target object automatically, efficiently and accurately.
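The two-stage flow of the three modules can be sketched as below. This is a simplified illustration under stated assumptions: `auto_code`, `word_overlap`, `top_k`, the codes `X01` to `X03` and their names are hypothetical, the library is flattened to one name per code rather than grouped by code class, and a toy word-overlap scorer stands in for both the text statistical features and the semantic scoring.

```python
def auto_code(target, code_library, coarse_score, fine_score, top_k=10):
    """Return the code whose disease name best matches `target`.

    code_library maps code -> disease name; both scorers take (target, name).
    """
    if not code_library:
        raise ValueError("disease code library is empty")
    # Coarse screening: keep the top_k codes by the cheap text-level score.
    ranked = sorted(
        code_library,
        key=lambda code: coarse_score(target, code_library[code]),
        reverse=True,
    )
    candidates = ranked[:top_k]
    # Fine screening: choose the best candidate by the semantic-level score.
    return max(candidates, key=lambda code: fine_score(target, code_library[code]))


def word_overlap(target, name):
    """Toy scorer: number of words shared by the two strings."""
    return len(set(target.split()) & set(name.split()))


# Hypothetical toy library; the codes and names are not real ICD entries.
library = {
    "X01": "malignant tumor of nasopharynx",
    "X02": "chronic nasopharyngitis",
    "X03": "acute appendicitis",
}
result = auto_code("nasopharynx tumor", library, word_overlap, word_overlap, top_k=2)
```

Passing the two scorers as parameters mirrors the separation between the coarse-screening and fine-screening modules, so either stage can be replaced independently.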
In one possible implementation, the code coarse-screening module 602 in the disease automatic coding system provided by the above embodiment may include: a feature determination module and a relevant code screening module.
The feature determination module is configured to determine target text statistical features of the target object for each class of codes, based on the target object and the disease names corresponding to each class of codes in the disease code library, wherein the target text statistical feature of the target object for any class of codes characterizes the degree of correlation between that class of codes and the target object.
The relevant code screening module is configured to screen out codes related to the target object from the disease code library, based on the target text statistical features of the target object for each class of codes.
In one possible implementation, the above feature determination module includes: a code classification submodule, a first feature determination submodule and a second feature determination submodule.
The code classification submodule is configured to classify the codes in the disease code library to obtain multiple code sets, each code set corresponding to one code category.
The first feature determination submodule is configured to determine, based on the target object and the disease names corresponding to each code set, a first text statistical feature, and/or a second text statistical feature, and/or a third text statistical feature, and/or a fourth text statistical feature of the target object for each code set; wherein the disease names corresponding to any code set include the disease name corresponding to each code in that code set, and the first, second, third and fourth text statistical features of the target object for any code set respectively characterize: the frequency with which each word in the target object appears in the disease names corresponding to that code set; the term frequency-inverse document frequency (TF-IDF) of each word in the target object in the document composed of the disease names corresponding to that code set; the text similarity between the target object and the disease names corresponding to that code set; and the degree to which the target object matches the keywords and modifiers in the disease names corresponding to that code set.
The second feature determination submodule is configured to determine the target text statistical features of the target object for each class of codes, based on the first text statistical feature, and/or the second text statistical feature, and/or the third text statistical feature, and/or the fourth text statistical feature of the target object for each code set.
In one possible implementation, when determining, for any code set, the first text statistical feature of the target object for that code set based on the target object and the disease names corresponding to that code set, the above first feature determination submodule is specifically configured to: obtain the weight of each word in a first word set, wherein the first word set is obtained by deduplicating a second word set, the second word set is the set of words obtained by performing word segmentation on the disease names corresponding to that code set, and the weight of each word in the first word set is determined by the number of times that word appears in the second word set; obtain a target word set, and determine the weight of each word in the target word set based on the weights of the words in the first word set, wherein the target word set is the set of words obtained by performing word segmentation on the target object; and determine the first text statistical feature of the target object for that code set from the weights of the words in the target word set.
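The first text statistical feature described above can be sketched as follows. The text does not specify how counts become weights or how the target word weights are combined, so the normalisation by total count and the final sum are assumptions; `first_feature` is a hypothetical name, and the inputs are assumed to be already segmented into words.

```python
from collections import Counter


def first_feature(target_words, code_set_names_words):
    """Frequency-based feature of a target object for one code set.

    code_set_names_words: one word list per disease name in the code set.
    """
    # Second word set: every word from every disease name, duplicates kept.
    second_word_set = [w for name in code_set_names_words for w in name]
    # The Counter keys form the first (deduplicated) word set.
    counts = Counter(second_word_set)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Assumed weighting: occurrence count normalised by the total word count.
    weights = {w: c / total for w, c in counts.items()}
    # Words of the target object that never occur in the code set weigh 0.
    return sum(weights.get(w, 0.0) for w in target_words)


names = [
    ["malignant", "tumor", "nasopharynx"],
    ["malignant", "tumor", "tonsil"],
]
value = first_feature(["nasopharynx", "carcinoma"], names)
```

Here "nasopharynx" occurs once among six words and "carcinoma" does not occur, so the feature equals 1/6.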
In one possible implementation, when determining, for any code set, the second text statistical feature of the target object for that code set based on the target object and the disease names corresponding to that code set, the above first feature determination submodule is specifically configured to: obtain the disease document corresponding to that code set, the disease document being composed of the disease names corresponding to that code set; obtain a target word set, and determine the term frequency-inverse document frequency (TF-IDF) of each word in the target word set with respect to the disease document corresponding to that code set, wherein the target word set is the set of words obtained by performing word segmentation on the target object; and determine the second text statistical feature of the target object for that code set from the TF-IDF values of the words in the target word set with respect to the disease document corresponding to that code set.
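The TF-IDF based second feature can be sketched as below. The exact TF-IDF variant and the way per-word values are combined are not given in the text; the smoothed inverse document frequency and the summation used here are assumptions, and `tfidf_feature` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_feature(target_words, documents, doc_id):
    """Sum of TF-IDF values of the target words in one code set's document.

    documents maps a code-set id to the word list of its disease document.
    """
    doc = documents[doc_id]
    counts = Counter(doc)
    n_docs = len(documents)
    score = 0.0
    for word in target_words:
        # Term frequency of the word inside this code set's document.
        tf = counts[word] / len(doc) if doc else 0.0
        # Number of code-set documents that contain the word.
        df = sum(1 for d in documents.values() if word in d)
        # Smoothed inverse document frequency (assumed variant).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        score += tf * idf
    return score


documents = {
    "C11": ["malignant", "tumor", "nasopharynx"],
    "J31": ["chronic", "rhinitis", "nasopharyngitis"],
}
target = ["nasopharynx", "carcinoma"]
c11_value = tfidf_feature(target, documents, "C11")
j31_value = tfidf_feature(target, documents, "J31")
```

Treating each code set as one document means a word that is common within a single code set but rare across code sets contributes most to that code set's feature.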
In one possible implementation, when determining, for any code set, the third text statistical feature of the target object for that code set based on the target object and the disease names corresponding to that code set, the above first feature determination submodule is specifically configured to: calculate the edit distance between the target object and each disease name corresponding to that code set; and determine the third text statistical feature of the target object for that code set from the edit distances between the target object and the disease names corresponding to that code set.
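The edit-distance feature can be sketched as follows. `edit_distance` is the standard Levenshtein dynamic program; converting distances into a single feature by length normalisation and taking the best match over the code set is an assumption, since the text only states that the feature is determined from the edit distances. `third_feature` is a hypothetical name.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = cur
    return prev[-1]


def third_feature(target, disease_names):
    """Best length-normalised similarity between target and any name (assumed)."""
    return max(
        (1.0 - edit_distance(target, name) / max(len(target), len(name), 1)
         for name in disease_names),
        default=0.0,
    )
```

A value of 1.0 means that some disease name in the code set is identical to the target object, and 0.0 means that no name shares any aligned character with it.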
In one possible implementation, when determining, for any code set, the fourth text statistical feature of the target object for that code set based on the target object and the disease names corresponding to that code set, the above first feature determination submodule is specifically configured to: obtain the attribute graph corresponding to that code set, wherein the attribute graph includes main words and attribute words, a main word being a keyword in the disease names corresponding to that code set and an attribute word being a modifier of a main word; match the target object against the main words and attribute words in the attribute graph corresponding to that code set; and determine the fourth text statistical feature of the target object for that code set based on how well the target object matches the attribute graph corresponding to that code set.
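The attribute-graph matching can be sketched as below. The representation of the attribute graph as a mapping from main word to its attribute words, the substring matching, and the 0.5/0.5 scoring formula are all assumptions made for illustration; `fourth_feature` and the example graph are hypothetical.

```python
def fourth_feature(target, attribute_graph):
    """Match a target string against main words and their attribute words.

    attribute_graph maps each main word (keyword) to a set of attribute words
    (modifiers). Assumed scoring: 0.5 for a matched main word plus up to 0.5
    for the fraction of its attribute words that also appear in the target.
    """
    best = 0.0
    for main_word, attributes in attribute_graph.items():
        if main_word not in target:
            continue  # attribute words only count when their main word matches
        hits = sum(1 for attr in attributes if attr in target)
        fraction = hits / len(attributes) if attributes else 0.0
        best = max(best, 0.5 + 0.5 * fraction)
    return best


graph = {"carcinoma": {"nasopharyngeal", "squamous"}}
matched = fourth_feature("nasopharyngeal non-keratinizing carcinoma", graph)
unmatched = fourth_feature("chronic rhinitis", graph)
```

Requiring the main word to match before counting attribute words keeps a shared modifier alone from making an unrelated code set look relevant.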
In one possible implementation, the code fine-screening module 603 in the disease automatic coding system provided by the above embodiment may include: a semantic vector determination module and a code screening module.
The semantic vector determination module is configured to determine the semantic vector of the disease name corresponding to each candidate code, based on the semantic similarity information between the disease name corresponding to each candidate code and the target object.
The code screening module is configured to determine the code corresponding to the target object from the candidate code set, based on the semantic vector of the disease name corresponding to each candidate code.
In one possible implementation, the above semantic vector determination module may include: a weight determination submodule and a semantic vector determination submodule.
The weight determination submodule is configured to determine, for any candidate code, the semantic weight of each character in the disease name corresponding to that candidate code, based on the semantic similarity between each character in that disease name and each character in the target object.
The semantic vector determination submodule is configured to determine the semantic vector of the disease name corresponding to that candidate code, based on the semantic vector and the semantic weight of each character in that disease name.
In one possible implementation, the above weight determination submodule is specifically configured to: determine the semantic vector of each character in the disease name corresponding to the candidate code and the semantic vector of each character in the target object; and, for any character in the disease name corresponding to the candidate code, calculate the similarity between the semantic vector of that character and the semantic vector of each character of the target object, and take the maximum of the calculated similarities as the semantic weight of that character, so as to obtain the semantic weight of each character in the disease name corresponding to that candidate code.
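The weighting scheme can be sketched as follows. Taking the maximum similarity as a character's semantic weight follows the description above; the use of cosine similarity and of a weighted sum to combine character vectors into the name's semantic vector are assumptions, and `cosine` and `name_vector` are hypothetical names operating on toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def name_vector(name_char_vecs, target_char_vecs):
    """Semantic vector of a candidate disease name, re-weighted by the target.

    Each character's weight is its maximum similarity to any target character;
    the name vector is the weighted sum of its character vectors (assumed).
    """
    dim = len(name_char_vecs[0])
    out = [0.0] * dim
    for vec in name_char_vecs:
        weight = max(cosine(vec, t) for t in target_char_vecs)
        for k in range(dim):
            out[k] += weight * vec[k]
    return out


# Toy 2-dimensional character vectors, purely illustrative.
vector = name_vector([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
```

In this toy case the first character matches the target exactly (weight 1.0) and the second is orthogonal to it (weight 0.0), so only the first character contributes to the name vector.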
In one possible implementation, the above code screening module is specifically configured to determine the score of each candidate code from the semantic vector of the disease name corresponding to that candidate code, and to determine the candidate code with the highest score as the code corresponding to the target object; wherein the score of any candidate code characterizes the degree of semantic similarity between the disease name corresponding to that candidate code and the target object.
The embodiments of the present application further provide a disease automatic coding device. Referring to Fig. 7, which shows a schematic structural diagram of the disease automatic coding device, the device may include: at least one processor 701, at least one communication interface 702, at least one memory 703 and at least one communication bus 704;
In the embodiments of the present application, there is at least one each of the processor 701, the communication interface 702, the memory 703 and the communication bus 704, and the processor 701, the communication interface 702 and the memory 703 communicate with one another through the communication bus 704;
The processor 701 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, etc.;
The memory 703 may include high-speed RAM, and may further include non-volatile memory, for example at least one disk memory;
The memory stores a program, and the processor can call the program stored in the memory, the program being used for:
obtaining a target object, the target object being a disease name or a disease description;
screening out codes related to the target object from a disease code library, the screened-out codes forming a candidate code set;
determining the code corresponding to the target object from the candidate code set, based on the semantic relationship between the disease name corresponding to each candidate code in the candidate code set and the target object.
Optionally, for the refined and extended functions of the program, reference may be made to the above description.
The embodiments of the present application further provide a readable storage medium, which can store a program suitable for execution by a processor, the program being used for:
obtaining a target object, the target object being a disease name or a disease description;
screening out codes related to the target object from a disease code library, the screened-out codes forming a candidate code set;
determining the code corresponding to the target object from the candidate code set, based on the semantic relationship between the disease name corresponding to each candidate code in the candidate code set and the target object.
Optionally, for the refined and extended functions of the program, reference may be made to the above description.
Finally, it should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (24)
1. A disease automatic coding method, characterized by comprising:
obtaining a target object, the target object being a disease name or a disease description;
screening out codes related to the target object from a disease code library, the screened-out codes forming a candidate code set;
determining the code corresponding to the target object from the candidate code set, based on the semantic relationship between the disease name corresponding to each candidate code in the candidate code set and the target object.
2. The disease automatic coding method according to claim 1, characterized in that screening out codes related to the target object from the disease code library comprises:
determining target text statistical features of the target object for each class of codes, based on the target object and the disease names corresponding to each class of codes in the disease code library, wherein the target text statistical feature of the target object for any class of codes characterizes the degree of correlation between that class of codes and the target object;
screening out codes related to the target object from the disease code library, based on the target text statistical features of the target object for each class of codes.
3. The disease automatic coding method according to claim 2, characterized in that determining the target text statistical features of the target object for each class of codes, based on the target object and the disease names corresponding to each class of codes in the disease code library, comprises:
classifying the codes in the disease code library to obtain multiple code sets, each code set corresponding to one code category;
determining, based on the target object and the disease names corresponding to each code set, a first text statistical feature, and/or a second text statistical feature, and/or a third text statistical feature, and/or a fourth text statistical feature of the target object for each code set; wherein the disease names corresponding to any code set include the disease name corresponding to each code in that code set, and the first, second, third and fourth text statistical features of the target object for any code set respectively characterize: the frequency with which each word in the target object appears in the disease names corresponding to that code set; the term frequency-inverse document frequency (TF-IDF) of each word in the target object in the document composed of the disease names corresponding to that code set; the text similarity between the target object and the disease names corresponding to that code set; and the degree to which the target object matches the keywords and modifiers in the disease names corresponding to that code set;
determining the target text statistical features of the target object for each class of codes, based on the first text statistical feature, and/or the second text statistical feature, and/or the third text statistical feature, and/or the fourth text statistical feature of the target object for each code set.
4. The disease automatic coding method according to claim 3, characterized in that, for any code set, determining the first text statistical feature of the target object for that code set based on the target object and the disease names corresponding to that code set comprises:
obtaining the weight of each word in a first word set, wherein the first word set is obtained by deduplicating a second word set, the second word set is the set of words obtained by performing word segmentation on the disease names corresponding to that code set, and the weight of each word in the first word set is determined by the number of times that word appears in the second word set;
obtaining a target word set, and determining the weight of each word in the target word set based on the weights of the words in the first word set, wherein the target word set is the set of words obtained by performing word segmentation on the target object;
determining the first text statistical feature of the target object for that code set from the weights of the words in the target word set.
5. The disease automatic coding method according to claim 3, characterized in that, for any code set, determining the second text statistical feature of the target object for that code set based on the target object and the disease names corresponding to that code set comprises:
obtaining the disease document corresponding to that code set, the disease document being composed of the disease names corresponding to that code set;
obtaining a target word set, and determining the term frequency-inverse document frequency (TF-IDF) of each word in the target word set with respect to the disease document corresponding to that code set, wherein the target word set is the set of words obtained by performing word segmentation on the target object;
determining the second text statistical feature of the target object for that code set from the TF-IDF values of the words in the target word set with respect to the disease document corresponding to that code set.
6. The disease automatic coding method according to claim 3, characterized in that, for any code set, determining the third text statistical feature of the target object for that code set based on the target object and the disease names corresponding to that code set comprises:
calculating the edit distance between the target object and each disease name corresponding to that code set;
determining the third text statistical feature of the target object for that code set from the edit distances between the target object and the disease names corresponding to that code set.
7. The disease automatic coding method according to claim 3, characterized in that, for any code set, determining the fourth text statistical feature of the target object for that code set based on the target object and the disease names corresponding to that code set comprises:
obtaining the attribute graph corresponding to that code set, wherein the attribute graph includes main words and attribute words, a main word being a keyword in the disease names corresponding to that code set and an attribute word being a modifier of a main word;
matching the target object against the main words and attribute words in the attribute graph corresponding to that code set;
determining the fourth text statistical feature of the target object for that code set, based on how well the target object matches the attribute graph corresponding to that code set.
8. The disease automatic coding method according to any one of claims 1 to 7, characterized in that determining the code corresponding to the target object from the candidate code set, based on the semantic relationship between the disease name corresponding to each candidate code in the candidate code set and the target object, comprises:
determining the semantic vector of the disease name corresponding to each candidate code, based on the semantic similarity information between the disease name corresponding to each candidate code and the target object;
determining the code corresponding to the target object from the candidate code set, based on the semantic vector of the disease name corresponding to each candidate code.
9. The disease automatic coding method according to claim 8, characterized in that determining the semantic vector of the disease name corresponding to each candidate code, based on the semantic similarity information between the disease name corresponding to each candidate code and the target object, comprises:
for any candidate code:
determining the semantic weight of each character in the disease name corresponding to that candidate code, based on the semantic similarity between each character in that disease name and each character in the target object;
determining the semantic vector of the disease name corresponding to that candidate code, based on the semantic vector and the semantic weight of each character in that disease name;
so as to obtain the semantic vector of the disease name corresponding to each candidate code.
10. The disease automatic coding method according to claim 9, characterized in that determining the semantic weight of each character in the disease name corresponding to the candidate code, based on the semantic similarity between each character in that disease name and each character in the target object, comprises:
determining the semantic vector of each character in the disease name corresponding to the candidate code and the semantic vector of each character in the target object;
for any character in the disease name corresponding to the candidate code, calculating the similarity between the semantic vector of that character and the semantic vector of each character of the target object, and taking the maximum of the calculated similarities as the semantic weight of that character, so as to obtain the semantic weight of each character in the disease name corresponding to that candidate code.
11. The disease automatic coding method according to claim 8, characterized in that determining the code corresponding to the target object from the candidate code set, based on the semantic vector of the disease name corresponding to each candidate code, comprises:
determining the score of each candidate code from the semantic vector of the disease name corresponding to that candidate code, wherein the score of any candidate code characterizes the degree of semantic similarity between the disease name corresponding to that candidate code and the target object;
determining the candidate code with the highest score as the code corresponding to the target object.
12. A disease automatic coding system, characterized by comprising: an acquisition module, a code coarse-screening module and a code fine-screening module;
the acquisition module is configured to obtain a target object, the target object being a disease name or a disease description;
the code coarse-screening module is configured to screen out codes related to the target object from a disease code library, the screened-out codes forming a candidate code set;
the code fine-screening module is configured to determine the code corresponding to the target object from the candidate code set, based on the semantic relationship between the disease name corresponding to each candidate code in the candidate code set and the target object.
13. The disease automatic coding system according to claim 12, characterized in that the code coarse-screening module comprises: a feature determination module and a relevant code screening module;
the feature determination module is configured to determine target text statistical features of the target object for each class of codes, based on the target object and the disease names corresponding to each class of codes in the disease code library, wherein the target text statistical feature of the target object for any class of codes characterizes the degree of correlation between that class of codes and the target object;
the relevant code screening module is configured to screen out codes related to the target object from the disease code library, based on the target text statistical features of the target object for each class of codes.
14. The disease automatic coding system according to claim 13, characterized in that the feature determination module comprises: a code classification submodule, a first feature determination submodule and a second feature determination submodule;
the code classification submodule is configured to classify the codes in the disease code library to obtain multiple code sets, each code set corresponding to one code category;
the first feature determination submodule is configured to determine, based on the target object and the disease names corresponding to each code set, a first text statistical feature, and/or a second text statistical feature, and/or a third text statistical feature, and/or a fourth text statistical feature of the target object for each code set; wherein the disease names corresponding to any code set include the disease name corresponding to each code in that code set, and the first, second, third and fourth text statistical features of the target object for any code set respectively characterize: the frequency with which each word in the target object appears in the disease names corresponding to that code set; the term frequency-inverse document frequency (TF-IDF) of each word in the target object in the document composed of the disease names corresponding to that code set; the text similarity between the target object and the disease names corresponding to that code set; and the degree to which the target object matches the keywords and modifiers in the disease names corresponding to that code set;
the second feature determination submodule is configured to determine the target text statistical features of the target object for each class of codes, based on the first text statistical feature, and/or the second text statistical feature, and/or the third text statistical feature, and/or the fourth text statistical feature of the target object for each code set.
15. The disease automatic coding system according to claim 14, characterized in that the first feature determination submodule, when determining, for any code set, the first text statistical feature of the target object for the code set based on the target object and the disease names corresponding to the code set, is specifically configured to:
Obtain the weight of each word in a first word set, wherein the first word set is obtained by deduplicating a second word set, the second word set is the set of words obtained by performing word segmentation on the disease names corresponding to the code set, and the weight of each word in the first word set is determined by the number of times that word occurs in the second word set; obtain a target word set, and determine the weight of each word in the target word set based on the weights of the words in the first word set, wherein the target word set is the set of words obtained by performing word segmentation on the target object; and determine the first text statistical feature of the target object for the code set from the weights of the words in the target word set.
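The word-weighting steps of claim 15 can be sketched as follows. This is a minimal illustration, not the patented implementation: the whitespace `tokenize` helper stands in for a real word segmenter, and both the normalization of counts into weights and the use of the mean as the final feature are assumptions, since the claim only states that the weights derive from occurrence counts.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for a word segmenter: lowercases and splits on whitespace.
    return text.lower().split()


def first_text_feature(target, disease_names):
    # Second word set: all words from the code set's disease names, duplicates kept.
    second_word_set = [w for name in disease_names for w in tokenize(name)]
    # First word set: the distinct words, each weighted by its occurrence count
    # in the second word set (dividing by the total is an assumption).
    counts = Counter(second_word_set)
    total = len(second_word_set)
    weights = {w: c / total for w, c in counts.items()}
    # Target word set: words of the target object; unseen words get weight 0.
    target_weights = [weights.get(w, 0.0) for w in tokenize(target)]
    if not target_weights:
        return 0.0
    # Aggregate the per-word weights into one feature (mean is an assumption).
    return sum(target_weights) / len(target_weights)


names = ["acute bronchitis", "chronic bronchitis"]
print(first_text_feature("acute bronchitis", names))
```

A target whose words occur often in a code set's disease names scores higher for that code set than a target sharing no words with it.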
16. The disease automatic coding system according to claim 14, characterized in that the first feature determination submodule, when determining, for any code set, the second text statistical feature of the target object for the code set based on the target object and the disease names corresponding to the code set, is specifically configured to:
Obtain the disease document corresponding to the code set, the disease document being composed of the disease names corresponding to the code set; obtain a target word set, and determine the term frequency-inverse document frequency of each word in the target word set with respect to the disease document corresponding to the code set, wherein the target word set is the set of words obtained by performing word segmentation on the target object; and determine the second text statistical feature of the target object for the code set from the term frequency-inverse document frequency of each word in the target word set with respect to the disease document corresponding to the code set.
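The TF-IDF computation of claim 16 can be sketched as below, treating each code set's disease names as one document. The whitespace `tokenize` helper, the smoothed IDF formula, and summing the per-word values into a single feature are all assumptions made for illustration; the claim does not fix any of them.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for a word segmenter.
    return text.lower().split()


def second_text_feature(target, code_set_names):
    """code_set_names maps a code-set label to its list of disease names.

    Returns a dict mapping each label to the summed TF-IDF of the target's words.
    """
    # One disease document per code set, stored as word counts.
    docs = {label: Counter(w for name in names for w in tokenize(name))
            for label, names in code_set_names.items()}
    n_docs = len(docs)
    target_words = tokenize(target)
    features = {}
    for label, doc in docs.items():
        doc_len = sum(doc.values())
        score = 0.0
        for w in target_words:
            tf = doc[w] / doc_len if doc_len else 0.0
            df = sum(1 for d in docs.values() if w in d)
            # Smoothed inverse document frequency (one common variant, assumed here).
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0
            score += tf * idf
        features[label] = score
    return features


sets = {
    "set_A": ["acute bronchitis", "chronic bronchitis"],
    "set_B": ["femur fracture", "rib fracture"],
}
print(second_text_feature("acute bronchitis", sets))
```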
17. The disease automatic coding system according to claim 14, characterized in that the first feature determination submodule, when determining, for any code set, the third text statistical feature of the target object for the code set based on the target object and the disease names corresponding to the code set, is specifically configured to:
Calculate the edit distance between the target object and each disease name corresponding to the code set; and determine the third text statistical feature of the target object for the code set from the edit distances between the target object and the disease names corresponding to the code set.
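The edit-distance feature of claim 17 can be sketched with the standard Levenshtein dynamic program. Converting each distance to a length-normalized similarity and keeping the best value over the code set's disease names are assumptions; the claim only says the feature is determined from the edit distances.

```python
def edit_distance(a, b):
    # Levenshtein distance computed row by row to keep memory at O(len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def third_text_feature(target, disease_names):
    # Best normalized similarity over the code set's disease names (an assumption).
    best = 0.0
    for name in disease_names:
        longest = max(len(target), len(name))
        sim = 1.0 if longest == 0 else 1.0 - edit_distance(target, name) / longest
        best = max(best, sim)
    return best


print(edit_distance("kitten", "sitting"))
print(third_text_feature("acute bronchitis", ["acute bronchitis", "rib fracture"]))
```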
18. The disease automatic coding system according to claim 14, characterized in that the first feature determination submodule, when determining, for any code set, the fourth text statistical feature of the target object for the code set based on the target object and the disease names corresponding to the code set, is specifically configured to:
Obtain the attribute graph corresponding to the code set, wherein the attribute graph comprises main words and attribute words, a main word being a keyword in the disease names corresponding to the code set and an attribute word being a qualifier of a main word; match the target object against the main words and attribute words in the attribute graph corresponding to the code set; and determine the fourth text statistical feature of the target object for the code set based on how the target object matches the attribute graph corresponding to the code set.
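The attribute-graph matching of claim 18 can be sketched as follows. The representation of the graph as a dictionary from main word to its set of attribute words, and the scoring rule (half credit for a matched main word, the rest for the share of matched qualifiers), are hypothetical choices for illustration only; the claim does not specify how the match is scored.

```python
def fourth_text_feature(target_words, attribute_graph):
    """attribute_graph maps each main word (keyword) to its set of attribute words."""
    target = set(target_words)
    best = 0.0
    for main_word, attribute_words in attribute_graph.items():
        if main_word not in target:
            continue  # no keyword match, so this main word contributes nothing
        qualifiers = target - {main_word}
        if not qualifiers:
            score = 1.0  # the target is exactly the keyword
        else:
            matched = len(qualifiers & attribute_words)
            score = 0.5 + 0.5 * matched / len(qualifiers)
        best = max(best, score)
    return best


graph = {"bronchitis": {"acute", "chronic"}, "pneumonia": {"viral"}}
print(fourth_text_feature(["acute", "bronchitis"], graph))
print(fourth_text_feature(["viral", "bronchitis"], graph))
```

Under this scoring, a target whose keyword and qualifiers both appear in the graph outranks one that shares only the keyword.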
19. The disease automatic coding system according to any one of claims 12 to 18, characterized in that the code fine-screening module comprises: a semantic vector determination module and a code screening module;
The semantic vector determination module is configured to determine the semantic vector of the disease name corresponding to each candidate code based on the semantic similarity information between the disease name corresponding to each candidate code and the target object;
The code screening module is configured to determine, from the candidate code set, the code corresponding to the target object based on the semantic vector of the disease name corresponding to each candidate code.
20. The disease automatic coding system according to claim 19, characterized in that the semantic vector determination module comprises: a weight determination submodule and a semantic vector determination submodule;
The weight determination submodule is configured to determine, for any candidate code, the semantic weight of each character in the disease name corresponding to the candidate code based on the semantic similarity between each character in that disease name and each character in the target object;
The semantic vector determination submodule is configured to determine the semantic vector of the disease name corresponding to the candidate code based on the semantic vector and the semantic weight of each character in that disease name.
21. The disease automatic coding system according to claim 20, characterized in that the weight determination submodule is specifically configured to: determine the semantic vector of each character in the disease name corresponding to the candidate code and the semantic vector of each character in the target object; and, for any character in the disease name corresponding to the candidate code, calculate the similarity between the semantic vector of that character and the semantic vector of each character of the target object, and take the maximum of the calculated similarities as the semantic weight of that character, thereby obtaining the semantic weight of each character in the disease name corresponding to the candidate code.
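The weighting scheme of claims 20 and 21 can be sketched as below: each character of a candidate disease name receives, as its semantic weight, its maximum similarity to any character of the target object, and the name's semantic vector is then built from the weighted character vectors. Cosine similarity and a plain weighted sum are assumptions, as the claims name neither the similarity measure nor the combination rule; the character vectors are toy values standing in for learned embeddings.

```python
import math


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_weights(name_vecs, target_vecs):
    # Weight of each name character = its best similarity to any target character.
    # Assumes target_vecs is non-empty.
    return [max(cosine(c, t) for t in target_vecs) for c in name_vecs]


def name_semantic_vector(name_vecs, target_vecs):
    # Weighted sum of the character vectors (the combination rule is an assumption).
    weights = semantic_weights(name_vecs, target_vecs)
    dim = len(name_vecs[0])
    return [sum(w * vec[k] for w, vec in zip(weights, name_vecs))
            for k in range(dim)]


name_vecs = [[1.0, 0.0], [0.0, 1.0]]   # toy vectors for two name characters
target_vecs = [[1.0, 0.0]]             # toy vector for one target character
print(semantic_weights(name_vecs, target_vecs))
print(name_semantic_vector(name_vecs, target_vecs))
```

Characters of the disease name that have no close counterpart in the target object receive a low weight and contribute little to the name's semantic vector.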
22. The disease automatic coding system according to claim 19, characterized in that the code screening module is specifically configured to determine the score of each candidate code from the semantic vector of the disease name corresponding to that candidate code, and to determine the candidate code with the highest score as the code corresponding to the target object; wherein the score of any candidate code characterizes the degree of semantic similarity between the disease name corresponding to the candidate code and the target object.
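The selection step of claim 22 can be sketched as scoring every candidate and keeping the highest. The linear scorer with a hypothetical `scorer_weights` vector is an assumption standing in for whatever scoring model is actually used, and the code labels are placeholders.

```python
def pick_code(candidates, scorer_weights):
    """candidates maps a candidate code to the semantic vector of its disease name.

    Returns the highest-scoring code together with all scores.
    """
    def score(vec):
        # Hypothetical linear scorer: dot product with a fixed weight vector.
        return sum(w * x for w, x in zip(scorer_weights, vec))

    scores = {code: score(vec) for code, vec in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


candidates = {"code_A": [0.9, 0.1], "code_B": [0.4, 0.3]}
best, scores = pick_code(candidates, scorer_weights=[1.0, 0.5])
print(best, scores)
```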
23. A disease automatic coding device, characterized by comprising: a memory and a processor;
The memory is configured to store a program;
The processor is configured to execute the program to implement each step of the disease automatic coding method according to any one of claims 1 to 11.
24. A readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, each step of the disease automatic coding method according to any one of claims 1 to 11 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910338773.3A CN109994215A (en) | 2019-04-25 | 2019-04-25 | Disease automatic coding system, method, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109994215A (en) | 2019-07-09 |
Family
ID=67135249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910338773.3A Pending CN109994215A (en) | 2019-04-25 | 2019-04-25 | Disease automatic coding system, method, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109994215A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108021553A (en) * | 2017-09-30 | 2018-05-11 | 北京颐圣智能科技有限公司 | Word treatment method, device and the computer equipment of disease term |
CN107577826A (en) * | 2017-10-25 | 2018-01-12 | 山东众阳软件有限公司 | Classification of diseases coding method and system based on raw diagnostic data |
CN107705839A (en) * | 2017-10-25 | 2018-02-16 | 山东众阳软件有限公司 | Disease automatic coding and system |
CN107977361A (en) * | 2017-12-06 | 2018-05-01 | 哈尔滨工业大学深圳研究生院 | The Chinese clinical treatment entity recognition method represented based on deep semantic information |
CN109065157A (en) * | 2018-08-01 | 2018-12-21 | 中国人民解放军第二军医大学 | A kind of Disease Diagnosis Standard coded Recommendation list determines method and system |
CN109657250A (en) * | 2018-12-12 | 2019-04-19 | 科大讯飞股份有限公司 | A kind of text interpretation method, device, equipment and readable storage medium storing program for executing |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827929B (en) * | 2019-11-05 | 2022-06-07 | 中山大学 | Disease classification code recognition method and device, computer equipment and storage medium |
CN110827929A (en) * | 2019-11-05 | 2020-02-21 | 中山大学 | Disease classification code recognition method and device, computer equipment and storage medium |
CN110880362A (en) * | 2019-11-12 | 2020-03-13 | 南京航空航天大学 | Large-scale medical data knowledge mining and treatment scheme recommending system |
CN110880362B (en) * | 2019-11-12 | 2022-10-11 | 南京航空航天大学 | Large-scale medical data knowledge mining and treatment scheme recommending system |
CN111180060A (en) * | 2019-11-25 | 2020-05-19 | 云知声智能科技股份有限公司 | Automatic coding method and device for disease diagnosis |
CN110993045A (en) * | 2019-12-03 | 2020-04-10 | 中国医学科学院北京协和医院 | Rare disease registration system |
CN111081333A (en) * | 2019-12-03 | 2020-04-28 | 中国医学科学院北京协和医院 | Rare disease registration method |
CN111428477A (en) * | 2020-03-06 | 2020-07-17 | 安徽科大讯飞医疗信息技术有限公司 | Diagnostic name standardization method, device, electronic equipment and storage medium |
CN111428477B (en) * | 2020-03-06 | 2023-10-17 | 讯飞医疗科技股份有限公司 | Diagnostic name standardization method, device, electronic equipment and storage medium |
CN111652737A (en) * | 2020-04-17 | 2020-09-11 | 世纪保众(北京)网络科技有限公司 | Insurance underwriting method and device based on text semantic processing |
CN111652737B (en) * | 2020-04-17 | 2023-12-22 | 世纪保众(北京)网络科技有限公司 | Insurance verification method and apparatus based on text semantic processing |
CN112489740A (en) * | 2020-12-17 | 2021-03-12 | 北京惠及智医科技有限公司 | Medical record detection method, training method of related model, related equipment and device |
CN112632910A (en) * | 2020-12-21 | 2021-04-09 | 北京惠及智医科技有限公司 | Operation encoding method, electronic device and storage device |
CN112802566A (en) * | 2020-12-31 | 2021-05-14 | 医渡云(北京)技术有限公司 | Method and device for encoding electronic medical record |
CN113223729A (en) * | 2021-05-26 | 2021-08-06 | 广州天鹏计算机科技有限公司 | Data processing method of medical data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109994215A (en) | Disease automatic coding system, method, equipment and storage medium | |
CN109460473B (en) | Electronic medical record multi-label classification method based on symptom extraction and feature representation | |
CN107731269B (en) | Disease coding method and system based on original diagnosis data and medical record file data | |
CN105589844B (en) | It is a kind of to be used to take turns the method for lacking semantic supplement in question answering system more | |
CN111159407B (en) | Method, apparatus, device and medium for training entity recognition and relation classification model | |
CN110457696A (en) | A kind of talent towards file data and policy intelligent Matching system and method | |
WO2021073116A1 (en) | Method and apparatus for generating legal document, device and storage medium | |
CN107844559A (en) | A kind of file classifying method, device and electronic equipment | |
CN107705839A (en) | Disease automatic coding and system | |
CN110276054B (en) | Insurance text structuring realization method | |
US20160048587A1 (en) | System and method for real-time dynamic measurement of best-estimate quality levels while reviewing classified or enriched data | |
CN104573130B (en) | The entity resolution method and device calculated based on colony | |
CN108280149A (en) | A kind of doctor-patient dispute class case recommendation method based on various dimensions tag along sort | |
CN111680225B (en) | WeChat financial message analysis method and system based on machine learning | |
CN111309900B (en) | Legal class similarity judging and pushing method | |
CN103617435A (en) | Image sorting method and system for active learning | |
CN106445906A (en) | Generation method and apparatus for medium-and-long phrase in domain lexicon | |
CN111899090A (en) | Enterprise associated risk early warning method and system | |
CN116304035B (en) | Multi-notice multi-crime name relation extraction method and device in complex case | |
CN105630931A (en) | Document classification method and device | |
CN107844558A (en) | The determination method and relevant apparatus of a kind of classification information | |
CN109472021A (en) | Critical sentence screening technique and device in medical literature based on deep learning | |
CN111008262A (en) | Lawyer evaluation method and recommendation method based on knowledge graph | |
CN108710907A (en) | Handwritten form data classification method, model training method, device, equipment and medium | |
CN116644184B (en) | Human resource information management system based on data clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190709 |