CN112183104B - Code recommendation method, system, corresponding equipment and storage medium - Google Patents

Code recommendation method, system, corresponding equipment and storage medium

Info

Publication number
CN112183104B
Authority
CN
China
Prior art keywords
sub
diagnosis
text
icd
main
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010870834.3A
Other languages
Chinese (zh)
Other versions
CN112183104A (en)
Inventor
李毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wanghai Kangxin Beijing Technology Co ltd
Original Assignee
Wanghai Kangxin Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wanghai Kangxin Beijing Technology Co ltd
Priority to CN202010870834.3A
Publication of CN112183104A
Application granted
Publication of CN112183104B
Current legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3346: Query execution using probabilistic model
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/216: Parsing using statistical methods
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00: ICT specially adapted for the handling or processing of medical references

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application discloses a code recommendation method, a system, corresponding equipment and a storage medium. The method comprises: acquiring a medical record front page diagnosis text as input to a trained first sub-order inference model, wherein the diagnosis text comprises at least a main diagnosis text, the first sub-order inference model comprises a deep learning language model and a sub-order projection layer connected to the output of the deep learning language model, and the deep learning language model outputs at least a main diagnosis semantic vector based on the input diagnosis text; in response to the diagnosis text not including a secondary diagnosis text, determining, by the sub-order projection layer, a classification probability of the main diagnosis text for each International Classification of Diseases (ICD) sub-order based on the main diagnosis semantic vector; and taking the m sub-orders with the highest classification probability for the main diagnosis text as the recommended main diagnosis ICD sub-orders of the main diagnosis text. The application can automatically recommend the corresponding ICD codes from text, with high accuracy and efficiency.

Description

Code recommendation method, system, corresponding equipment and storage medium
Technical Field
The present application relates to the field of electronic digital data processing, and in particular, to a method and a system for code recommendation, and a corresponding device and a storage medium.
Background
Traditional auxiliary coding tools have existed for many years, but their intelligence has remained at a relatively shallow level. Most existing coding tools work on text and offer prompts based on keywords, guiding coding staff step by step to the final code. At their core, traditional auxiliary coding tools rely on character string search and string matching.
However, diagnosis text is in most cases written in natural language (i.e., manually entered textual content). The ICD codes of the national specifications (including the ICD-10 and ICD-9-CM3 versions) are likewise described in natural language. In general, a coder must select the appropriate code from the ICD codes that the system matches automatically against keywords or against the complete content entered by the coder. However, the way a diagnosis is written often differs considerably from the wording in the national specification, so in some cases the corresponding ICD code cannot be found by keyword search.
Disclosure of Invention
The invention provides a code recommending method, a system, corresponding equipment and a storage medium, which can intelligently and automatically recommend corresponding ICD codes based on texts and improve recommending accuracy and efficiency.
In a first aspect of the present invention, there is provided a code recommendation method, comprising:
acquiring a medical record front page diagnosis text as input to a trained first sub-order inference model, wherein the diagnosis text comprises at least a main diagnosis text, the first sub-order inference model comprises a deep learning language model and a sub-order projection layer connected to the output of the deep learning language model, and the deep learning language model is used for outputting at least a main diagnosis semantic vector based on the input diagnosis text;
in response to the diagnosis text not including a secondary diagnosis text, determining, by the sub-order projection layer, a classification probability of the main diagnosis text for each International Classification of Diseases (ICD) sub-order based on the main diagnosis semantic vector; and
taking the m sub-orders with the highest classification probability for the main diagnosis text as the recommended main diagnosis ICD sub-orders of the main diagnosis text, wherein m is greater than or equal to 1;
wherein training of the deep learning language model comprises at least: training the deep learning language model using the main diagnosis text, the secondary diagnosis text, and the corresponding ICD coding data in a medical record front page library.
In an embodiment, the method further comprises: in response to the diagnosis text including a secondary diagnosis text, outputting, by the deep learning language model, a secondary diagnosis semantic vector, and determining, by the sub-order projection layer, classification probabilities of the main diagnosis text and the secondary diagnosis text for each ICD sub-order based on the main diagnosis semantic vector and the secondary diagnosis semantic vector respectively; and taking the m' and n sub-orders with the highest classification probability for the main diagnosis text and the secondary diagnosis text as the recommended main diagnosis ICD sub-orders and the recommended secondary diagnosis ICD sub-orders respectively, wherein m' is greater than or equal to 1 and n is greater than or equal to 1.
In an embodiment, the method further comprises: outputting, by the deep learning language model, a secondary diagnosis semantic vector, and determining a classification probability of the corresponding secondary diagnosis text for each ICD sub-order based on the secondary diagnosis semantic vector; and taking the n sub-orders with the highest classification probability for the secondary diagnosis text as the recommended secondary diagnosis ICD sub-orders of the secondary diagnosis text, wherein n is greater than or equal to 1.
In an embodiment, the method further comprises: in response to the diagnosis text including a secondary diagnosis text, looking up the corresponding secondary diagnosis ICD sub-orders from a mapping table according to the secondary diagnosis text; combining the recommended secondary diagnosis ICD sub-orders with the secondary diagnosis ICD sub-orders looked up from the mapping table to form a secondary diagnosis ICD sub-order set, and merging the one-hot vector expressions of the sub-orders in that set into a secondary diagnosis multi-hot vector; and using the main diagnosis semantic vector and the secondary diagnosis multi-hot vector as input to a trained second sub-order inference model, which outputs the k main diagnosis ICD sub-orders with the highest classification probability for the main diagnosis text, wherein k is greater than or equal to 1.
In an embodiment, the method further comprises: based on an index library containing the ICD coding standard library data to be used, determining, by a text similarity algorithm, one ICD detail code corresponding to the main diagnosis text for which an ICD code is to be recommended, from the ICD detail codes under the main diagnosis ICD sub-order in the index library.
In a second aspect of the present invention, there is provided an encoding recommendation system comprising:
an input module, used for acquiring a medical record front page diagnosis text as input to a trained first sub-order inference model, wherein the diagnosis text comprises at least a main diagnosis text, the first sub-order inference model comprises a deep learning language model and a sub-order projection layer connected to the output of the deep learning language model, and the deep learning language model is used for outputting at least a main diagnosis semantic vector based on the input diagnosis text;
a projection module, used for determining, in response to the diagnosis text not including a secondary diagnosis text, a classification probability of the main diagnosis text for each International Classification of Diseases (ICD) sub-order based on the main diagnosis semantic vector by means of the sub-order projection layer; and
a recommendation module, used for taking the m sub-orders with the highest classification probability for the main diagnosis text as the recommended main diagnosis ICD sub-orders of the main diagnosis text, wherein m is greater than or equal to 1;
wherein training of the deep learning language model comprises at least: training the deep learning language model using the main diagnosis text, the secondary diagnosis text, and the corresponding ICD coding data in a medical record front page library.
In a third aspect of the invention there is provided a computer device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect of the invention when the computer program is executed by the processor.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method according to the first aspect of the present invention.
According to the invention, a deep learning language model such as the BERT model is trained on a text corpus related to medical coding and on a massive medical record front page library that reflects the writing habits of doctors. The BERT model can then perform semantic analysis on the discharge diagnosis text written by a doctor and locate a passage of diagnosis text to its sub-order very effectively, with a sub-order accuracy of more than 86%. Furthermore, the multi-hot vector expression of the secondary diagnoses and other individual features of the medical record front page are applied to model training, core words are matched, and the detail code is finally determined. This realizes a process of coarse classification followed by fine matching, which makes ICD code recommendation more efficient.
Other features and advantages of the present invention will become more apparent from the following detailed description of embodiments of the present invention, which is to be read in connection with the accompanying drawings.
Drawings
FIG. 1 is a flow chart of an embodiment of a method according to the present invention;
FIG. 2 shows a schematic structural diagram of a first sub-order inference model in accordance with one embodiment of the present invention;
FIG. 3 schematically illustrates the merging of multiple one-hot vectors into a multi-hot vector;
FIG. 4 shows a schematic structural diagram of a second sub-order inference model in accordance with one embodiment of the present invention;
Fig. 5 is a block diagram of an embodiment of a system according to the present invention.
For the sake of clarity, these figures are schematic and simplified drawings, which only give details which are necessary for an understanding of the invention, while other details are omitted.
Detailed Description
Embodiments and examples of the present invention will be described in detail below with reference to the accompanying drawings.
The scope of applicability of the present invention will become apparent from the detailed description given hereinafter. It should be understood, however, that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only.
International Classification of Diseases (ICD) coding plays an important role in medical treatment, scientific research, teaching, hospital management, medical payment, and domestic and international communication. ICD coding is the basis of Diagnosis Related Groups (DRG) and of disease classification, and the accuracy of ICD coding directly affects the operation of medical insurance funds and underpins the refined operation of hospitals. The diagnosis on the medical record front page is expressed mainly as text, comprising a main diagnosis text and/or a secondary diagnosis text. The front page also contains feature fields such as name, gender, age, discharge department, and length of hospital stay.
Fig. 1 shows a flow chart of a preferred embodiment of the code recommendation method according to the invention, i.e. the case where the medical record front page does not contain a secondary diagnosis.
In step S102, a medical record front page diagnosis text is acquired as input to a trained first sub-order inference model comprising a deep learning language model and a sub-order projection layer connected to its output. The deep learning language model may be Google's BERT or ALBERT model, or another language model such as RoBERTa or XLNet; the BERT model is preferred, and the invention is described below in connection with it. In an embodiment, as shown in Fig. 2, the sub-order projection layer includes a fully connected layer (fc) and a softmax layer.
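The projection step can be illustrated with a minimal sketch, assuming the language model has already produced a semantic vector. The vector size, the number of sub-orders, and the random weights below are hypothetical stand-ins; a trained model would supply learned weights and a much larger vector.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project_to_sub_orders(semantic_vector, weights, biases):
    """Fully connected layer followed by softmax.

    weights holds one row per ICD sub-order; each row has
    len(semantic_vector) entries.
    """
    logits = [
        sum(w * x for w, x in zip(row, semantic_vector)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Hypothetical sizes: a 4-dimensional semantic vector and 3 sub-orders.
random.seed(0)
DIM, NUM_SUB_ORDERS = 4, 3
weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_SUB_ORDERS)]
biases = [0.0] * NUM_SUB_ORDERS
semantic_vector = [0.2, -0.1, 0.7, 0.05]  # stand-in for the language model output

probabilities = project_to_sub_orders(semantic_vector, weights, biases)
```

The result is one probability per sub-order, and the probabilities sum to 1, which is what allows them to be ranked in the recommendation step.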
BERT (Bidirectional Encoder Representations from Transformers) is a language model recently proposed by Google. As a substitute for Word2Vec, BERT essentially learns a good semantic feature representation for words by running self-supervised learning on a massive corpus. Self-supervised learning is supervised learning performed on data without manual annotation. BERT provides a model for transfer learning to other tasks: depending on the task, it can be fine-tuned, or kept fixed and used as a feature extractor.
The training of the BERT model includes the following. First, the BERT model is optimized with MaskedLM (masked language model) on the collected text corpus related to medical coding; this unsupervised training on a large amount of unlabeled medical-domain text helps subsequent model training. Then, a classification data set is formed from the main diagnosis text, the secondary diagnosis text, and the corresponding ICD coding data in the massive medical record front page library, and is used to fine-tune the BERT model for the inference task. In other embodiments, training with MaskedLM may be omitted.
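To make the MaskedLM idea concrete, the following is a simplified sketch of how masked training examples might be prepared from unlabeled text. The masking rate of 0.15 and the whitespace tokenization are assumptions for illustration and are not stated in the source; the function name `mask_tokens` is hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_rate, rng):
    """Return (masked_tokens, labels).

    labels holds the original token at each masked position and None
    elsewhere, so a model can be trained to predict the hidden tokens.
    """
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_rate:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


rng = random.Random(42)
tokens = ["type", "2", "diabetes", "with", "kidney", "complications"]
masked, labels = mask_tokens(tokens, mask_rate=0.15, rng=rng)
```

No manual annotation is needed: the labels come from the text itself, which is why this stage can use a large unlabeled medical corpus.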
In an embodiment, the diagnosis text should be fewer than 512 words long; longer text is truncated. The BERT model outputs a main diagnosis semantic vector (for example, a 768-dimensional vector) based on the input main diagnosis text, and the sub-order projection layer determines the classification probability of the main diagnosis text for each ICD sub-order based on the main diagnosis semantic vector.
In step S104, the m sub-orders with the highest classification probability for the main diagnosis text are taken as the recommended main diagnosis ICD sub-orders of the main diagnosis text, where m is an integer greater than or equal to 1. In this example, the ICD sub-order prediction accuracy for the diagnosis expression on the medical record front page can reach 86%.
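Step S104 amounts to a top-m selection over the classification probabilities. A minimal sketch follows; the sub-order codes and probability values are made up for illustration and the function name `top_sub_orders` is hypothetical.

```python
def top_sub_orders(probabilities, m):
    """Return the m (sub_order, probability) pairs with the highest probability, best first."""
    if m < 1:
        raise ValueError("m must be at least 1")
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:m]


# Hypothetical classification probabilities keyed by ICD sub-order code.
probabilities = {"E11.2": 0.62, "E11.9": 0.21, "N18.9": 0.12, "I25.1": 0.05}
recommended = top_sub_orders(probabilities, m=2)
```

Returning more than one candidate (m greater than 1) leaves the final choice to the coder while still narrowing the search space.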
In another embodiment, the diagnosis expression on the medical record front page includes a main diagnosis and a secondary diagnosis. In this case, the BERT model outputs a main diagnosis semantic vector and a secondary diagnosis semantic vector based on the input diagnosis expression, and the sub-order projection layer determines the classification probabilities of the main diagnosis text and the secondary diagnosis text for each ICD sub-order based on the respective semantic vectors. The m' and n sub-orders with the highest classification probability for the main diagnosis text and the secondary diagnosis text are then taken as the recommended main diagnosis ICD sub-orders and the recommended secondary diagnosis ICD sub-orders respectively, where m' and n are integers greater than or equal to 1, and m, m' and n may be equal or unequal to one another.
In another embodiment for the case where the diagnosis expression on the medical record front page includes a main diagnosis and a secondary diagnosis, the BERT model outputs a main diagnosis semantic vector and a secondary diagnosis semantic vector based on the input diagnosis expression, the sub-order projection layer determines the classification probability of the secondary diagnosis text for each ICD sub-order based on the secondary diagnosis semantic vector, and the n sub-orders with the highest classification probability for the secondary diagnosis text are taken as the recommended secondary diagnosis ICD sub-orders of the secondary diagnosis text. In addition, rule matching, such as keyword matching, is performed on the secondary diagnosis text to look up the corresponding secondary diagnosis ICD sub-orders from the mapping table.
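The rule-matching lookup could be sketched as a simple keyword scan. The mapping table contents, the codes, and the function name below are hypothetical; the source does not specify how the mapping table is built or matched beyond keyword matching.

```python
# Hypothetical mapping table from keywords to ICD sub-order codes.
KEYWORD_TO_SUB_ORDER = {
    "coronary heart disease": "I25.1",
    "chronic kidney disease": "N18.9",
}


def lookup_sub_orders(secondary_texts, mapping=KEYWORD_TO_SUB_ORDER):
    """Return the set of sub-orders whose keyword occurs in any secondary diagnosis text."""
    found = set()
    for text in secondary_texts:
        lowered = text.lower()
        for keyword, sub_order in mapping.items():
            if keyword in lowered:
                found.add(sub_order)
    return found


looked_up = lookup_sub_orders(
    ["Coronary heart disease, stable", "Chronic kidney disease stage 3"]
)
```

Rule matching of this kind complements the model: it catches well-known phrasings deterministically, while the model handles wording that no keyword covers.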
A one-hot vector converts a categorical variable into a form that machine learning algorithms can use easily. It is a feature vector for an attribute with exactly one activation point: one element of the vector is non-zero and all other elements are 0, so the representation is sparse.
The recommended secondary diagnosis ICD sub-orders are combined with the secondary diagnosis ICD sub-orders looked up from the mapping table to form a secondary diagnosis ICD sub-order set, and every sub-order in the set is expressed as a one-hot vector over the sub-order classes. All secondary diagnosis sub-order one-hot vectors corresponding to one main diagnosis are then merged into the multi-hot vector expression of the secondary diagnoses. The merging of multiple secondary diagnosis sub-order one-hot vectors into a secondary diagnosis multi-hot vector is shown in Fig. 3.
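The merging of one-hot vectors into a multi-hot vector can be sketched as an element-wise maximum. The four-entry vocabulary and the example codes are hypothetical; a real vocabulary would cover every ICD sub-order in use.

```python
# Hypothetical vocabulary of sub-order classes; the position defines the vector index.
SUB_ORDER_VOCABULARY = ["E11.2", "I25.1", "N18.9", "J18.9"]


def one_hot(sub_order, vocabulary):
    """Vector with a single 1 at the position of sub_order."""
    return [1 if code == sub_order else 0 for code in vocabulary]


def multi_hot(sub_orders, vocabulary):
    """Merge the one-hot vectors of all given sub-orders by element-wise maximum."""
    vector = [0] * len(vocabulary)
    for sub_order in sub_orders:
        vector = [max(a, b) for a, b in zip(vector, one_hot(sub_order, vocabulary))]
    return vector


recommended = ["I25.1"]           # from the sub-order projection layer
looked_up = ["N18.9", "I25.1"]    # from the mapping table
sub_order_set = sorted(set(recommended) | set(looked_up))
secondary_multi_hot = multi_hot(sub_order_set, SUB_ORDER_VOCABULARY)
```

Because the two sources are merged as a set, a sub-order found by both the model and the mapping table still contributes a single 1.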
Then the main diagnosis semantic vector and the secondary diagnosis multi-hot vector are used together as secondary inference features and input into a trained second sub-order inference model. That is, the relation between the main diagnosis and all of its associated secondary diagnoses is treated as a feature, and the multi-hot vector expression together with the main diagnosis vector partitions the samples along multiple dimensions. The second sub-order inference model outputs the k main diagnosis ICD sub-orders with the highest classification probability for the main diagnosis text, where k is an integer greater than or equal to 1, and k, m, m' and n may be equal or unequal to one another. In an embodiment, as shown in Fig. 4, the second sub-order inference model includes a DenseFeatures layer, a Dropout layer, a BatchNormalization layer, a Dense layer, and a Softmax layer connected in sequence. The DenseFeatures layer concatenates the vectors input to the second sub-order inference model into one multi-feature vector, the Dropout layer prevents overfitting, the BatchNormalization layer normalizes the data in batches, and the Dense layer and the Softmax layer project onto the sub-order classes. The secondary diagnoses strongly assist the coding prediction of the main diagnosis: for example, among diabetes patients, more than half of complications are caused by cardiovascular and cerebrovascular diseases, and 10% are caused by nephrosis. Therefore, the multi-feature formed with the multi-hot vector of the secondary diagnosis codes can markedly improve the accuracy of main diagnosis code prediction.
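The layer sequence can be illustrated with a plain-Python inference pass. This is only a sketch under stated assumptions: it does not use the framework layers named in the text, all parameter values are invented, and the vector sizes are far smaller than a real model's. Dropout is omitted because it is active only during training.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def batch_norm(features, means, variances, eps=1e-5):
    """Inference-time batch normalization using stored statistics."""
    return [
        (x - mu) / math.sqrt(var + eps)
        for x, mu, var in zip(features, means, variances)
    ]


def second_model_top_k(main_vector, secondary_multi_hot, params, k):
    # Concatenate the input vectors into one multi-feature vector.
    features = list(main_vector) + list(secondary_multi_hot)
    normalized = batch_norm(features, params["means"], params["variances"])
    # Dense layer: one weight row and one bias per sub-order class.
    logits = [
        sum(w * x for w, x in zip(row, normalized)) + bias
        for row, bias in zip(params["weights"], params["biases"])
    ]
    probabilities = softmax(logits)
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    return [(params["sub_orders"][i], probabilities[i]) for i in ranked[:k]]


# Hypothetical parameters: 2-dimensional main vector, 3-entry multi-hot vector, 3 classes.
params = {
    "sub_orders": ["E11.2", "E11.9", "I25.1"],
    "means": [0.0] * 5,
    "variances": [1.0] * 5,
    "weights": [
        [0.5, -0.2, 0.0, 0.0, 1.5],
        [0.4, 0.1, 0.0, 0.0, -0.5],
        [-0.3, 0.2, 0.0, 1.0, 0.0],
    ],
    "biases": [0.0, 0.0, 0.0],
}
main_vector = [0.8, 0.1]
secondary_multi_hot = [0, 0, 1]
top_k = second_model_top_k(main_vector, secondary_multi_hot, params, k=2)
```

In this toy setup the last multi-hot position carries a large weight toward the first class, which mimics how a secondary diagnosis can tip the main diagnosis prediction.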
In another embodiment, because feature fields on the medical record front page such as gender, age, blood type, length of hospital stay, and discharge department correlate to some degree with the code classification, one or more of these feature fields can be extracted. Numerical fields are standardized, discretized, and so on, and categorical fields are one-hot encoded; the results are combined into one vector and input, together with the main diagnosis semantic vector and the secondary diagnosis multi-hot vector, into the trained second sub-order inference model as secondary inference features. This can further improve the prediction accuracy of the ICD sub-order by approximately 5%.
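The field preprocessing described above might look like the following sketch. The age boundaries, the mean and standard deviation for hospital days, and the field names are all hypothetical; in practice such statistics would be derived from the front page library.

```python
GENDERS = ["female", "male"]
AGE_BOUNDARIES = [18, 40, 65]  # hypothetical cut points, giving 4 age buckets


def standardize(value, mean, std):
    """Rescale a numerical value to zero mean and unit variance."""
    return (value - mean) / std


def discretize(value, boundaries):
    """Return the index of the bucket that value falls into."""
    return sum(1 for boundary in boundaries if value >= boundary)


def one_hot(index, size):
    return [1 if i == index else 0 for i in range(size)]


def encode_front_page(record):
    """Combine discretized age, one-hot gender, and standardized stay into one vector."""
    age_bucket = one_hot(discretize(record["age"], AGE_BOUNDARIES), len(AGE_BOUNDARIES) + 1)
    gender = one_hot(GENDERS.index(record["gender"]), len(GENDERS))
    # Mean and standard deviation are assumed values for illustration.
    stay = [standardize(record["hospital_days"], mean=8.0, std=4.0)]
    return age_bucket + gender + stay


features = encode_front_page({"age": 57, "gender": "male", "hospital_days": 12})
```

The resulting vector can simply be concatenated with the main diagnosis semantic vector and the secondary diagnosis multi-hot vector before it enters the second model.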
Training of the second sub-order inference model is based on the main diagnosis text, the secondary diagnosis text, and the corresponding ICD codes in the massive medical record front page library. The main diagnosis semantic vector obtained through the first sub-order inference model, the secondary diagnosis multi-hot vector formed as described above, and optionally the vector formed from the front page feature fields are used together as the secondary inference features and as the input of the second sub-order inference model, so as to realize comprehensive ICD sub-order inference.
In an embodiment, after the ICD sub-order codes are recommended, the corresponding ICD detail codes may also be recommended. For example, an Elasticsearch full-text search model, which uses the BM25 text similarity algorithm, may be used for ICD detail code recommendation. Based on an index library containing the ICD coding standard library data to be used, one ICD detail code corresponding to the main diagnosis text is determined, by a text similarity algorithm, from the ICD detail codes under the main diagnosis ICD sub-order in the index library. Multi-level ICD code recommendation is thus performed based on the contextual semantic model and the text similarity algorithm, and because the text similarity algorithm uses the sub-order as a filtering condition, the reliability and accuracy of ICD code recommendation are greatly improved.
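The filter-then-rank idea can be sketched without a search server by scoring candidates with a common BM25 variant in plain Python. This is not the Elasticsearch implementation: the index entries, detail codes, and whitespace tokenization are hypothetical, and the parameters k1=1.2 and b=0.75 are commonly used defaults assumed here.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with a common BM25 variant."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical index entries: detail code, parent sub-order, standard name.
ICD_INDEX = [
    {"code": "E11.200", "sub_order": "E11.2", "name": "type 2 diabetes with renal complication"},
    {"code": "E11.201", "sub_order": "E11.2", "name": "type 2 diabetic nephropathy"},
    {"code": "E11.900", "sub_order": "E11.9", "name": "type 2 diabetes without complication"},
]


def recommend_detail_code(main_text, sub_order, index=ICD_INDEX):
    """Filter the index by sub-order, then return the best-matching detail code."""
    candidates = [entry for entry in index if entry["sub_order"] == sub_order]
    if not candidates:
        return None
    documents = [entry["name"].lower().split() for entry in candidates]
    scores = bm25_scores(main_text.lower().split(), documents)
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]["code"]


best_code = recommend_detail_code("type 2 diabetic nephropathy", "E11.2")
```

Filtering by sub-order first keeps lexically similar names from unrelated sub-orders out of the ranking, which is the role the text assigns to the sub-order as a filtering condition.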
FIG. 5 shows a block diagram of a preferred embodiment of a coding recommendation system according to the present invention, the system comprising:
An input module 502 for acquiring a medical record front page diagnosis text as input to a trained first sub-order inference model, wherein the diagnosis text comprises at least a main diagnosis text, the first sub-order inference model comprises a deep learning language model and a sub-order projection layer connected to the output of the deep learning language model, and the deep learning language model is used for outputting at least a main diagnosis semantic vector based on the input diagnosis text;
A projection module 504 for determining, in response to the diagnostic text not including the sub-diagnosis text, a classification probability of the corresponding main diagnosis text for each international disease classification ICD sub-order based on the main diagnosis semantic vector; and
The recommendation module 506 is configured to take m sub-orders with highest classification probability for the main diagnosis text as recommended main diagnosis ICD sub-orders of the main diagnosis text, where m is greater than or equal to 1.
In another embodiment, the present invention provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the method embodiments shown and described in connection with fig. 1-4 or other corresponding method embodiments, which are not described in detail herein.
In another embodiment, the present invention provides a computer device, including a processor, a memory, and a computer program stored in the memory and capable of running on the processor, where the steps of the method embodiments shown and described in connection with fig. 1-4 or other corresponding method embodiments are implemented by the processor when the computer program is executed, and are not repeated herein.
The various embodiments described herein, or particular features, structures, or characteristics thereof, may be combined as suitable in one or more embodiments of the invention. In addition, in some cases, the order of steps described in the flowcharts and/or flow-line processes may be modified as appropriate and need not be performed in exactly the order described. Additionally, various aspects of the invention may be implemented using software, hardware, firmware, or a combination thereof and/or other computer-implemented modules or devices that perform the described functions. A software implementation of the present invention may include executable code stored in a computer readable medium and executed by one or more processors. The computer-readable medium may include a computer hard drive, ROM, RAM, flash memory, a portable computer storage medium such as CD-ROM, DVD-ROM, flash drives and/or other devices having a Universal Serial Bus (USB) interface, and/or any other suitable tangible or non-transitory computer-readable medium or computer memory on which executable code may be stored and executed by a processor. The invention may be used in connection with any suitable operating system.
As used herein, the singular forms "a", "an" and "the" include plural referents (i.e., having the meaning of "at least one") unless otherwise indicated. It will be further understood that the terms "has," "comprises," "including" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
While the foregoing is directed to some preferred embodiments of the present invention, it should be emphasized that the present invention is not limited to these embodiments, but may be embodied in other forms within the scope of the inventive subject matter. Various changes and modifications may be made by one skilled in the art without departing from the spirit of the invention, and these changes or modifications still fall within the scope of the invention.

Claims (8)

1. A code recommendation method, the method comprising:
obtaining a diagnosis text from a medical record home page as input to a trained first sub-order inference model, wherein the diagnosis text comprises a main diagnosis text, and the first sub-order inference model comprises a deep learning language model and a sub-order projection layer connected to the output of the deep learning language model, the deep learning language model being used for outputting a main diagnosis semantic vector based on the input diagnosis text;
in response to the diagnosis text not including a sub-diagnosis text, determining, by the sub-order projection layer, a classification probability of the main diagnosis text for each International Classification of Diseases (ICD) sub-order based on the main diagnosis semantic vector; and
taking the m sub-orders with the highest classification probability for the main diagnosis text as recommended main diagnosis ICD sub-orders of the main diagnosis text, wherein m ≥ 1;
wherein the training of the deep learning language model comprises: training the deep learning language model using the main diagnosis texts, the sub-diagnosis texts and the corresponding ICD coding data in a medical record home page library;
wherein the deep learning language model is further used for outputting a sub-diagnosis semantic vector, and a classification probability of the corresponding sub-diagnosis text for each ICD sub-order is determined based on the sub-diagnosis semantic vector; and
taking the n sub-orders with the highest classification probability for the sub-diagnosis text as recommended sub-diagnosis ICD sub-orders of the sub-diagnosis text, wherein n ≥ 1;
looking up a corresponding sub-diagnosis ICD sub-order in a mapping table according to the sub-diagnosis text;
combining the recommended sub-diagnosis ICD sub-orders with the sub-diagnosis ICD sub-orders looked up from the mapping table to form a sub-diagnosis ICD sub-order set, and merging the one-hot vector representations of the sub-orders in the sub-diagnosis ICD sub-order set into a sub-diagnosis multi-hot vector;
using the main diagnosis semantic vector together with the sub-diagnosis multi-hot vector as the input of a trained second sub-order inference model, the second sub-order inference model outputting the k main diagnosis ICD sub-orders with the highest classification probability for the corresponding main diagnosis text, wherein k ≥ 1.
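The top-n selection, mapping-table lookup and multi-hot merge recited in claim 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration only: the five-label space `SUB_ORDERS`, the contents of `MAPPING_TABLE`, the probability values, and the helper names `top_n`, `multi_hot` and `sub_diagnosis_multi_hot` do not come from the patent, and plain list concatenation stands in for however the second model actually receives its inputs.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Illustrative label space only: the ICD sub-orders the models can output.
SUB_ORDERS: List[str] = ["E11.9", "I10", "I25.1", "J18.9", "N18.9"]

# Hypothetical mapping table from sub-diagnosis text to an ICD sub-order.
MAPPING_TABLE: Dict[str, str] = {
    "hypertension": "I10",
    "type 2 diabetes": "E11.9",
}


def top_n(probs: Sequence[float], n: int) -> List[str]:
    """Return the n sub-orders with the highest classification probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [SUB_ORDERS[i] for i in ranked[:n]]


def multi_hot(sub_orders: Iterable[str]) -> List[int]:
    """Merge the one-hot vectors of the given sub-orders into one vector."""
    vec = [0] * len(SUB_ORDERS)
    for sub_order in sub_orders:
        vec[SUB_ORDERS.index(sub_order)] = 1
    return vec


def sub_diagnosis_multi_hot(
    sub_diagnoses: Sequence[Tuple[str, Sequence[float]]], n: int = 1
) -> List[int]:
    """Combine model-recommended and table-looked-up sub-orders, then merge."""
    merged = set()
    for text, probs in sub_diagnoses:
        merged.update(top_n(probs, n))          # recommended by the model
        if text in MAPPING_TABLE:               # looked up from the table
            merged.add(MAPPING_TABLE[text])
    return multi_hot(merged)


# Made-up projection-layer outputs for two sub-diagnosis texts.
sub_vec = sub_diagnosis_multi_hot(
    [
        ("hypertension", [0.05, 0.15, 0.60, 0.10, 0.10]),
        ("chronic kidney disease", [0.10, 0.10, 0.10, 0.10, 0.60]),
    ],
    n=1,
)

# Stand-in for the main diagnosis semantic vector from the language model.
main_vec = [0.12, -0.40, 0.88]
second_model_input = main_vec + sub_vec
```

In this toy run the model recommends I25.1 for the first sub-diagnosis while the table contributes I10, so both positions are set in the merged vector. That is one plausible reading of why the claim combines the two sources rather than relying on either alone.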
2. The method according to claim 1, wherein the method further comprises:
in response to the diagnosis text including a sub-diagnosis text, outputting, by the deep learning language model, a sub-diagnosis semantic vector, wherein the sub-order projection layer is used for determining, based on the main diagnosis semantic vector and the sub-diagnosis semantic vector respectively, the classification probability of the corresponding main diagnosis text and sub-diagnosis text for each International Classification of Diseases (ICD) sub-order; and
taking the m' and n sub-orders with the highest classification probability for the main diagnosis text and the sub-diagnosis text, respectively, as the recommended main diagnosis ICD sub-orders and sub-diagnosis ICD sub-orders, wherein m' ≥ 1 and n ≥ 1.
3. The method of claim 1, wherein the input of the second sub-order inference model further comprises one-hot vectors of feature fields of the medical record home page, the feature fields including gender, age, discharge department, and/or blood type.
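As a rough illustration of the one-hot feature fields in claim 3, the sketch below encodes gender, age and discharge department. The vocabularies, the age bucketing (the claim does not say how a numeric age becomes a one-hot vector) and the function names are all assumptions introduced here, not details from the patent.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical vocabularies; a real system would derive them from its data.
GENDERS: List[str] = ["female", "male"]
AGE_BUCKETS: List[Tuple[int, int]] = [(0, 17), (18, 44), (45, 64), (65, 200)]
DEPARTMENTS: List[str] = ["cardiology", "respiratory", "nephrology"]


def one_hot(value: object, vocabulary: Sequence[object]) -> List[int]:
    """One-hot encode value; unknown values yield an all-zero vector."""
    return [1 if value == item else 0 for item in vocabulary]


def age_bucket(age: int) -> Optional[Tuple[int, int]]:
    """Map a numeric age to its (assumed) inclusive bucket, if any."""
    for low, high in AGE_BUCKETS:
        if low <= age <= high:
            return (low, high)
    return None


def encode_features(gender: str, age: int, department: str) -> List[int]:
    """Concatenate the one-hot vectors of the feature fields."""
    return (
        one_hot(gender, GENDERS)
        + one_hot(age_bucket(age), AGE_BUCKETS)
        + one_hot(department, DEPARTMENTS)
    )


features = encode_features("male", 52, "respiratory")
```

The resulting vector would be appended to the other inputs of the second sub-order inference model. Blood type could be handled the same way with a fourth vocabulary.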
4. The method according to claim 1 or 3, wherein the second sub-order inference model comprises DenseFeatures, Dropout, BatchNormalization, Dense and Softmax layers connected in sequence, wherein the DenseFeatures layer is used to concatenate the vectors input to the second sub-order inference model into one multi-feature vector, the Dropout layer is used to prevent overfitting, the BatchNormalization layer is used to normalize each data batch, and the Dense and Softmax layers are used to project onto the sub-order classes.
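The layer sequence in claim 4 can be sketched as a framework-free, inference-time forward pass. This is not the patented model: the parameter layout in `params`, the epsilon value and the function names are assumptions, and the numbers are arbitrary. Dropout appears only as a comment because it is active during training and acts as the identity at inference.

```python
import math
from typing import Dict, List, Sequence


def concat(*vectors: Sequence[float]) -> List[float]:
    """Stand-in for DenseFeatures: stitch all inputs into one vector."""
    return [x for vec in vectors for x in vec]


def batch_norm(x, mean, var, gamma, beta, eps: float = 1e-3) -> List[float]:
    """Inference-time batch normalization using stored moving statistics."""
    return [
        g * (xi - m) / math.sqrt(v + eps) + b
        for xi, m, v, g, b in zip(x, mean, var, gamma, beta)
    ]


def dense(x: Sequence[float], weights, bias) -> List[float]:
    """Fully connected layer; weights holds one row per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the sub-order classes."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def second_model_forward(main_vec, sub_multi_hot, params: Dict) -> List[float]:
    x = concat(main_vec, sub_multi_hot)      # DenseFeatures
    # Dropout: no-op at inference time.
    x = batch_norm(x, **params["bn"])        # BatchNormalization
    return softmax(dense(x, params["W"], params["b"]))  # Dense + Softmax


# Arbitrary toy parameters: 3 input features, 2 sub-order classes.
params = {
    "bn": {"mean": [0.0] * 3, "var": [1.0] * 3,
           "gamma": [1.0] * 3, "beta": [0.0] * 3},
    "W": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "b": [0.0, 0.0],
}
probs = second_model_forward([2.0, 0.0], [1], params)
```

The k sub-orders with the largest entries in `probs` would then be returned as the recommended main diagnosis ICD sub-orders.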
5. The method according to claim 1, wherein the method further comprises:
based on an index library containing the data of the ICD coding standard library to be used, determining, by means of a text similarity algorithm, one ICD detail code corresponding to the main diagnosis text for which an ICD code is to be recommended, from among the ICD detail codes in the index library that correspond to the main diagnosis ICD sub-order.
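The detail-code selection in claim 5 might look like the following sketch. The claim only says "a text similarity algorithm", so Python's `difflib.SequenceMatcher` is swapped in here purely as an example measure. The `INDEX` contents are toy entries, and the detail-code identifiers are placeholders rather than real codes from any ICD standard library.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Sequence, Tuple

# Toy index library: sub-order -> [(detail code, standard name)].
# The detail identifiers below are placeholders, not real ICD detail codes.
INDEX: Dict[str, List[Tuple[str, str]]] = {
    "J18.9": [
        ("J18.9-detail-1", "pneumonia, unspecified"),
        ("J18.9-detail-2", "community-acquired pneumonia"),
        ("J18.9-detail-3", "hospital-acquired pneumonia"),
    ],
}


def similarity(a: str, b: str) -> float:
    """Example similarity measure (an assumption, not the patent's choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recommend_detail_code(
    main_text: str,
    sub_orders: Sequence[str],
    index: Dict[str, List[Tuple[str, str]]] = INDEX,
) -> Optional[str]:
    """Pick the detail code whose name is most similar to the main text.

    Only detail codes filed under the recommended sub-orders are compared.
    """
    candidates = [entry for s in sub_orders for entry in index.get(s, [])]
    if not candidates:
        return None
    best_code, _ = max(candidates, key=lambda e: similarity(main_text, e[1]))
    return best_code


code = recommend_detail_code("community acquired pneumonia", ["J18.9"])
```

Restricting the comparison to detail codes under the already-recommended sub-orders keeps the candidate set small, which appears to be the point of running the sub-order models first.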
6. A code recommendation system, the system comprising:
an input module for obtaining a diagnosis text from a medical record home page as input to a trained first sub-order inference model, wherein the diagnosis text comprises a main diagnosis text, and the first sub-order inference model comprises a deep learning language model and a sub-order projection layer connected to the output of the deep learning language model, the deep learning language model being used for outputting a main diagnosis semantic vector based on the input diagnosis text;
a first projection module, configured such that, in response to the diagnosis text not including a sub-diagnosis text, the sub-order projection layer determines a classification probability of the main diagnosis text for each International Classification of Diseases (ICD) sub-order based on the main diagnosis semantic vector; and
a first recommendation module for taking the m sub-orders with the highest classification probability for the main diagnosis text as recommended main diagnosis ICD sub-orders of the main diagnosis text, wherein m ≥ 1;
wherein the training of the deep learning language model comprises: training the deep learning language model using the main diagnosis texts, the sub-diagnosis texts and the corresponding ICD coding data in a medical record home page library;
wherein the system further comprises:
a second projection module, configured such that, in response to the diagnosis text including a sub-diagnosis text, the deep learning language model is further used for outputting a sub-diagnosis semantic vector, and the sub-order projection layer is further used for determining a classification probability of the corresponding sub-diagnosis text for each ICD sub-order based on the sub-diagnosis semantic vector; and
a second recommendation module for taking the n sub-orders with the highest classification probability for the sub-diagnosis text as recommended sub-diagnosis ICD sub-orders of the sub-diagnosis text, wherein n ≥ 1;
a lookup module for looking up a corresponding sub-diagnosis ICD sub-order in a mapping table according to the sub-diagnosis text;
a merging module for combining the recommended sub-diagnosis ICD sub-orders with the sub-diagnosis ICD sub-orders looked up from the mapping table to form a sub-diagnosis ICD sub-order set, and merging the one-hot vector representations of the sub-orders in the sub-diagnosis ICD sub-order set into a sub-diagnosis multi-hot vector;
a third recommendation module for using the main diagnosis semantic vector together with the sub-diagnosis multi-hot vector as the input of a trained second sub-order inference model, the second sub-order inference model outputting the k main diagnosis ICD sub-orders with the highest classification probability for the corresponding main diagnosis text, wherein k ≥ 1.
7. A computer device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to claim 1 when executing the computer program.
8. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to claim 1.
CN202010870834.3A 2020-08-26 2020-08-26 Code recommendation method, system, corresponding equipment and storage medium Active CN112183104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010870834.3A CN112183104B (en) 2020-08-26 2020-08-26 Code recommendation method, system, corresponding equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112183104A CN112183104A (en) 2021-01-05
CN112183104B CN112183104B (en) 2024-06-14

Family

ID=73925713

Country Status (1)

Country Link
CN (1) CN112183104B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818085B (en) * 2021-01-28 2024-06-18 东软集团股份有限公司 Value range data matching method and device, storage medium and electronic equipment
CN112599213B (en) * 2021-03-04 2021-05-25 联仁健康医疗大数据科技股份有限公司 Classification code determining method, device, equipment and storage medium
CN115270718B (en) * 2022-07-26 2023-10-10 中国医学科学院阜外医院 Automatic cataloging method and system for disease codes

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065157A (en) * 2018-08-01 2018-12-21 中国人民解放军第二军医大学 A kind of Disease Diagnosis Standard coded Recommendation list determines method and system
CN109697285A (en) * 2018-12-13 2019-04-30 中南大学 Enhance the hierarchical BiLSTM Chinese electronic health record disease code mask method of semantic expressiveness

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080288292A1 (en) * 2007-05-15 2008-11-20 Siemens Medical Solutions Usa, Inc. System and Method for Large Scale Code Classification for Medical Patient Records
CN106951684B (en) * 2017-02-28 2020-10-09 北京大学 Method for entity disambiguation in medical disease diagnosis record
CN107705839B (en) * 2017-10-25 2020-06-26 山东众阳软件有限公司 Disease automatic coding method and system
US11024424B2 (en) * 2017-10-27 2021-06-01 Nuance Communications, Inc. Computer assisted coding systems and methods
CN109273062A (en) * 2018-08-09 2019-01-25 北京爱医声科技有限公司 ICD intelligence Auxiliary Encoder System
CN109993227B (en) * 2019-03-29 2021-09-24 京东方科技集团股份有限公司 Method, system, apparatus and medium for automatically adding international disease classification code
CN110491465B (en) * 2019-08-20 2020-09-15 山东众阳健康科技集团有限公司 Disease classification coding method, system, device and medium based on deep learning
CN111581987A (en) * 2020-04-13 2020-08-25 广州天鹏计算机科技有限公司 Disease classification code recognition method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant