CN114974602A - Diagnostic coding method and system based on contrast learning - Google Patents

Diagnostic coding method and system based on contrast learning

Publication number
CN114974602A
Authority
CN
China
Prior art keywords
diagnosis name
clinical diagnosis
standard
name
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210581884.9A
Other languages
Chinese (zh)
Inventor
薛付忠
张琪
胡锡锋
季晓康
陈耀祖
张健
李平福
王永超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN202210581884.9A
Publication of CN114974602A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention belongs to the technical field of medical data processing and provides a diagnostic coding method and system based on contrast learning. The method comprises: acquiring a plurality of clinical diagnosis names together with their positive and negative examples; training a diagnostic coding model based on contrast learning; obtaining, with the model, vector representations of the pre-acquired clinical diagnosis names and standard diagnosis names to form a diagnosis name vector representation library; acquiring the clinical diagnosis name to be coded and obtaining its vector representation with the model; and finding the most similar clinical diagnosis name or standard diagnosis name by the similarity between vector representations, the standard code of that name being the required standard code. Matching a diagnosis name to be coded against the standard diagnosis names is thereby converted into a distance measurement in a common representation space, which improves matching efficiency while preserving accuracy.

Description

Diagnostic coding method and system based on contrast learning
Technical Field
The invention belongs to the technical field of medical data processing, and particularly relates to a diagnostic coding method and system based on contrast learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The diagnosis coding task maps the diagnosis section of a medical record to standard diagnoses. Diagnosis coding plays an extremely important role in medical big-data mining and analysis, case filing, DRGs medical insurance payment, and the like. In the past, coding was usually done manually by hospital coders. This approach has limited efficiency, and coding quality suffers because coders differ in skill level and in their understanding of the coding standard.
With the development of artificial intelligence, many researchers have approached ICD coding with deep learning, but the task remains difficult. For example, the standard version of ICD-10 has more than 20,000 standard codes, and the clinical version has even more. Most codes correspond to rare diseases, so the data are extremely imbalanced. If each standard code is treated as one class and the task is modeled as multi-class classification, predictions are biased toward the majority classes, classification accuracy is low, and training data on the order of millions of samples are required. A short-text matching scheme, which concatenates a clinical diagnosis with a standard diagnosis to form positive and negative training samples, can alleviate the imbalance. At prediction time, however, the diagnosis to be matched must be concatenated with every candidate standard diagnosis and each resulting sentence fed to the model, so prediction is slow and the approach is hard to apply in practice.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a diagnostic coding scheme based on contrast learning, in which diagnosis names are mapped into a representation space where identical diagnoses lie close together and different diagnoses lie far apart. Projecting into the representation space addresses the bias toward majority classes caused by imbalanced samples. Diagnosis similarity is measured by the vector distance in that space, so similar diagnoses can be found quickly. Cosine similarity is computed between the clinical diagnosis to be matched and all similar diagnoses, clinical diagnoses included, which reduces errors caused by large differences between aliases of the same disease. To achieve this purpose, one or more embodiments of the invention provide the following technical scheme:
A diagnostic coding method based on contrast learning comprises the following steps:
acquiring clinical diagnosis names and standard diagnosis names;
acquiring a positive example and a negative example for each clinical diagnosis name;
training a model based on a contrast learning framework according to the clinical diagnosis names and their positive and negative examples;
obtaining, based on the model, a vector representation of every clinical diagnosis name and standard diagnosis name to form a diagnosis name vector representation library;
acquiring a clinical diagnosis name to be coded, and obtaining its vector representation with the model;
and acquiring, from the diagnosis name vector representation library, the clinical diagnosis name or standard diagnosis name most similar to the vector representation of the clinical diagnosis name to be coded, the corresponding standard code being the standard code of the clinical diagnosis name to be coded.
Further, obtaining the positive and negative examples of each clinical diagnosis name comprises:
after the clinical diagnosis names are obtained, constructing, for each clinical diagnosis name, a clinical diagnosis name-standard diagnosis code-standard diagnosis name matching pair based on the standard codes;
among the plurality of matching pairs, calculating the similarity of each clinical diagnosis name to every standard diagnosis name and every other clinical diagnosis name; for each clinical diagnosis name or standard diagnosis name whose similarity is higher than a set threshold, obtaining the standard codes of the matching pairs in which the two names are located; and marking the pair of names as a positive example if the standard codes are the same, and as a negative example if they are different.
Further, obtaining a vector representation from the model comprises:
inputting the clinical diagnosis name into the model and performing mean pooling on the output to obtain the vector representation.
Further, the model adopts a contrast learning architecture based on a twin network, and the training method comprises:
for each clinical diagnosis name in the batch, inputting the clinical diagnosis name, its positive example, and its negative example into the model, and performing mean pooling on the outputs to obtain the vector representations of the three;
constructing a loss function from the similarity between the vector representations of the clinical diagnosis name and its positive example and the similarity between the vector representations of the clinical diagnosis name and its negative example:
[Formula image BDA0003664223150000031: loss function]
where simp_{i,i} denotes the similarity between the vector representations of the i-th clinical diagnosis name and its positive example, simn_{i,j} denotes the similarity between the vector representations of the i-th clinical diagnosis name and a negative example (indexed by j), τ denotes the temperature parameter, and N denotes the number of diagnoses in the batch;
and obtaining the trained model by taking the lowest loss value as the termination condition of training.
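The loss formula itself is an image that did not survive text extraction. As a hedged reconstruction only, the standard InfoNCE form written with the symbols defined above (simp_{i,i}, simn_{i,j}, τ, N) would read as follows; the exact set of terms in the denominator is an assumption and may differ from the patent's figure.

```latex
\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}
\log\frac{\exp\left(\mathrm{simp}_{i,i}/\tau\right)}
{\exp\left(\mathrm{simp}_{i,i}/\tau\right)+\sum_{j=1}^{N}\exp\left(\mathrm{simn}_{i,j}/\tau\right)}
```

Under this form, the loss falls as the positive similarity grows relative to the negative similarities, and a smaller τ sharpens that contrast.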
Further, obtaining the most similar clinical diagnosis name or standard diagnosis name comprises:
acquiring, from the vector representation library, a plurality of clinical diagnosis names or standard diagnosis names most similar to the clinical diagnosis name to be coded, according to the similarity between the diagnosis names;
and acquiring, from that plurality of names, the most similar clinical diagnosis name or standard diagnosis name according to the similarity between the vector representations.
Further, cosine similarity is used as the similarity measure between vector representations.
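For reference, cosine similarity between two vector representations u and v of dimension E has the usual definition below; the symbols u, v, and E are chosen here for illustration and are not named in this passage.

```latex
\operatorname{cos\_sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
= \frac{\sum_{e=1}^{E} u_e v_e}{\sqrt{\sum_{e=1}^{E} u_e^{2}}\;\sqrt{\sum_{e=1}^{E} v_e^{2}}}
```

The value lies in [-1, 1] and depends only on the angle between the vectors, not on their lengths.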
One or more embodiments provide a contrast learning based diagnostic coding system, comprising:
the data acquisition module is used for acquiring a clinical diagnosis name and a standard diagnosis name;
the training data construction module is used for acquiring a positive example and a negative example of each clinical diagnosis name;
the model training module is used for training a code matching model based on a contrast learning framework;
the vector representation library generating module is used for obtaining, based on the model, a vector representation of every clinical diagnosis name and standard diagnosis name in the plurality of matching pairs to form a diagnosis name vector representation library;
the coding module for names to be coded is used for acquiring a clinical diagnosis name to be coded and obtaining its vector representation with the model, and for acquiring, from the diagnosis name vector representation library, the clinical diagnosis name or standard diagnosis name most similar to that vector representation, the corresponding standard code being the standard code of the clinical diagnosis name to be coded.
One or more embodiments provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the contrast learning based diagnostic encoding method when executing the program.
One or more embodiments provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the contrast learning-based diagnostic encoding method.
The above one or more technical solutions have the following beneficial effects:
By constructing the model, any clinical diagnosis name or standard diagnosis name can be given a vector representation, that is, mapped into a common representation space. Matching a diagnosis name to be coded against the standard diagnosis names is converted into a distance measurement in that space, where identical diagnoses lie close together and different diagnoses lie far apart. Measuring diagnosis similarity by vector distance in the common space allows similar diagnoses to be found quickly. The vector representations distinguish different diagnosis names well, which safeguards matching accuracy. Projecting the diagnosis names into the common representation space also addresses the bias toward majority classes caused by imbalanced samples.
By constructing the diagnosis name vector representation library, both clinical diagnosis names and standard diagnosis names are included as comparison objects for the clinical diagnosis name to be coded, which effectively reduces errors caused by large differences among aliases of the same disease. Comparing names first and vector representations second improves matching efficiency, so the standard code of the clinical diagnosis name to be coded can be found quickly.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, provide a further understanding of the invention; the exemplary embodiments of the invention and their description serve to explain the invention and do not limit it.
FIG. 1 is a flow diagram of a contrast learning based diagnostic coding method in one or more embodiments of the invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example one
The embodiment discloses a diagnostic coding method based on contrast learning, which comprises the following steps:
Step 1: Acquire a plurality of clinical diagnosis names and their positive and negative examples, wherein the standard diagnosis name in a positive example has the same standard code as the clinical diagnosis name, and the standard diagnosis name in a negative example has a different standard code.
Step 1 specifically comprises:
Step 1-1: For each clinical diagnosis name, construct a clinical diagnosis name-standard diagnosis code-standard diagnosis name matching pair based on the standard codes.
The clinical diagnosis names are obtained from electronic medical records. Specifically, a plurality of electronic medical records are acquired, and the diagnosis data in them are extracted and preprocessed; preprocessing comprises splitting multiple diagnoses and removing erroneous diagnoses. Note that the more electronic medical records are acquired, the more clinical diagnosis names can be extracted and the more fully the different expressions of the same diagnosis are covered, which helps to obtain enough training data and thus improves the accuracy of the subsequent contrast model.
After the clinical diagnosis names are obtained, a clinical diagnosis name-standard diagnosis code-standard diagnosis name matching pair is constructed for each of them based on the standard codes. In this embodiment, the standard code is the International Classification of Diseases (ICD-10), clinical version 2.0.
After the matching pairs are obtained, they are stored in a data table. Specifically, the data table comprises a clinical diagnosis name field, a standard diagnosis code field, and a standard diagnosis name field, and each matching pair is stored in the data table as one record. In this embodiment, an Elasticsearch database is used for data storage.
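As a minimal sketch of the three-field record described above, the following Python models one matching pair and converts it to a plain dictionary of the kind that could be indexed as a document. The class name, field names, and sample values are hypothetical illustrations, not taken from the patent, and no database client is involved.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MatchingPair:
    """One record of the data table: clinical name, standard code, standard name."""
    clinical_name: str   # diagnosis text as written in the medical record
    standard_code: str   # standard diagnosis code
    standard_name: str   # standard diagnosis name for that code


def to_record(pair: MatchingPair) -> dict:
    """Convert a matching pair into a plain dict, one record per pair."""
    return asdict(pair)


# Illustrative values only; they are not taken from the patent.
pair = MatchingPair(
    "type 2 diabetes", "E11.9", "Type 2 diabetes mellitus without complications"
)
record = to_record(pair)
```

Keeping the standard code in every record is what later allows two names to be compared by code when positive and negative examples are built.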
Step 1-2: Among the plurality of matching pairs, calculate, for each clinical diagnosis name, its similarity to every standard diagnosis name and every other clinical diagnosis name.
For each clinical diagnosis name or standard diagnosis name whose similarity is higher than a set threshold, the standard codes are obtained from the clinical diagnosis name-standard diagnosis code-standard diagnosis name matching pairs in which the two names are located; the pair is recorded as a positive example if the standard codes are the same and as a negative example if they are different.
Specifically, in this embodiment, the clinical diagnosis name column and the standard diagnosis name column of the data table that stores the matching pairs in the Elasticsearch database are searched in match mode. Each clinical diagnosis name (or standard diagnosis name) d_i is traversed, and the clinical diagnosis names (or standard diagnosis names) d_j with high similarity to d_i are obtained. If the standard diagnosis code dc_j corresponding to d_j is the same as the standard diagnosis code dc_i corresponding to d_i, then d_i-d_j is taken as a positive example; otherwise d_i-d_j is taken as a negative example. At most 50 positive examples and 100 negative examples are taken for each clinical diagnosis name. The resulting data set is shuffled and then split into a training set, a validation set, and a test set in the proportions 0.7, 0.15, and 0.15.
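The example construction and split described above can be sketched in plain Python. This is an illustration under stated assumptions: `difflib.SequenceMatcher` stands in for the Elasticsearch match search, the threshold of 0.5 and the seed are arbitrary, and the function names and the `names_to_code` mapping (every diagnosis name mapped to its standard code) are hypothetical. Only the caps of 50 and 100 and the 0.7/0.15/0.15 proportions come from the text.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in text similarity in [0, 1]; the embodiment uses an Elasticsearch search."""
    return SequenceMatcher(None, a, b).ratio()


def build_examples(names_to_code, threshold=0.5, max_pos=50, max_neg=100):
    """For each name d_i, label similar names d_j as positive (same code) or negative."""
    examples = []
    for d_i, code_i in names_to_code.items():
        pos, neg = [], []
        for d_j, code_j in names_to_code.items():
            if d_j == d_i or similarity(d_i, d_j) < threshold:
                continue
            (pos if code_j == code_i else neg).append(d_j)
        examples += [(d_i, d_j, 1) for d_j in pos[:max_pos]]   # label 1: positive
        examples += [(d_i, d_j, 0) for d_j in neg[:max_neg]]   # label 0: negative
    return examples


def split(examples, seed=0):
    """Shuffle, then split 0.7 / 0.15 / 0.15 into train, validation and test sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.7 * len(shuffled))
    n_val = int(0.15 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Because negatives are drawn only from names that are textually similar but carry a different code, they are hard negatives, which is what makes them useful for contrast learning.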
Step 2: Train the model on the data set based on contrast learning.
A training model is constructed using a contrast learning framework based on a twin network. The roberta-2-128 model is used as the base model of the twin network, with the pre-training result as the initial parameters of the model. The original diagnosis d_i in a training sample is input into the model, and the hidden state output by the last layer of the model is X ∈ R^{B×S×E}, where B is the batch size, S is the maximum length of the input in characters, and E is the dimensionality of the hidden layer. Mean pooling yields the compressed matrix X_m ∈ R^{B×E}. The positive and negative examples d_j corresponding to the original diagnosis are likewise input into the model and pooled to obtain Xp_m and Xn_m.
The specific formula of the mean pooling is as follows:
[Formula image BDA0003664223150000061: mean pooling]
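The pooling formula is an image that is not reproduced in the text. As a minimal sketch, assuming plain averaging over all S positions with no padding mask (the patent may treat padding differently), mean pooling reduces a B×S×E hidden state to a B×E matrix. The function name is hypothetical and nested lists stand in for tensors.

```python
def mean_pool(hidden):
    """Average token vectors over the sequence axis: (B, S, E) -> (B, E)."""
    pooled = []
    for sequence in hidden:          # one (S, E) matrix per batch item
        length = len(sequence)       # S, the number of positions
        dim = len(sequence[0])       # E, the hidden dimension
        pooled.append(
            [sum(token[e] for token in sequence) / length for e in range(dim)]
        )
    return pooled


# B = 2 items, S = 2 positions, E = 2 hidden dimensions
X = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 0.0], [2.0, 6.0]]]
X_m = mean_pool(X)
```

Each row of `X_m` is one fixed-length vector per diagnosis name, regardless of how many characters the name has.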
The cosine similarity between the mean pooling results of the diagnosis and its positive example is calculated as simp = cos_sim(X_m, Xp_m), simp ∈ R^{B×B}, and the cosine similarity for the negative example is calculated likewise as simn = cos_sim(X_m, Xn_m). For any diagnosis, its corresponding positive example serves as the positive sample and its corresponding negative example serves as the negative sample, and InfoNCE is used as the loss function for optimization. The specific formula of the loss function is as follows:
[Formula image BDA0003664223150000071: loss function]
where simp_{i,i} denotes the similarity between the vector representations of the i-th clinical diagnosis name and its positive example, simn_{i,j} denotes the similarity between the vector representations of the i-th clinical diagnosis name and a negative example (indexed by j), τ denotes the temperature parameter, and N denotes the number of diagnoses in the batch.
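Since the loss figure is not reproduced in the text, the following sketch implements the standard InfoNCE form using only the quantities defined above. It is an assumption about the formula, not the patent's confirmed expression: the denominator here contains the positive term plus all negatives in row i, and the default temperature of 0.05 is an arbitrary illustrative value.

```python
import math


def info_nce(simp, simn, tau=0.05):
    """Mean over i of -log(exp(simp[i][i]/tau) / (exp(simp[i][i]/tau) + sum_j exp(simn[i][j]/tau)))."""
    n = len(simp)                                        # N, diagnoses in the batch
    total = 0.0
    for i in range(n):
        pos = math.exp(simp[i][i] / tau)                 # similarity to own positive example
        neg = sum(math.exp(s / tau) for s in simn[i])    # similarities to negative examples
        total += -math.log(pos / (pos + neg))
    return total / n
```

With equal positive and negative similarity and a single negative, the loss equals log 2; it approaches zero as the positive similarity rises above the negatives.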
The trained model is obtained by taking the lowest loss value on the validation set as the termination condition of training.
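One common way to realize "lowest validation loss" as a stopping rule is early stopping with a patience window. The sketch below is an assumption about how that could be done; the text does not specify a patience value or any other stopping detail, and the function name is hypothetical.

```python
def best_epoch(val_losses, patience=3):
    """Return (epoch, loss) of the lowest validation loss seen before stopping.

    Training is treated as stopped once `patience` consecutive epochs
    fail to improve on the best validation loss so far.
    """
    best_idx, best_loss, waited = -1, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx, best_loss
```

In practice the model weights saved at the returned epoch would be kept as the trained model.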
To improve the accuracy of the model, the data set used for model training may include other samples in addition to the clinical diagnosis names obtained in step 1 and their positive and negative examples; these may be, for example, previously screened samples, and their source is not limited here.
Step 3: Input all clinical diagnosis names and standard diagnosis names into the model, and perform mean pooling on the output matrices to obtain the diagnosis name vector representation library.
Each vector representation is stored in the Elasticsearch (ES) database together with its corresponding diagnosis name and standard diagnosis code.
Step 4: Acquire the clinical diagnosis name to be coded, input it into the model, and perform mean pooling on the output matrix to obtain its vector representation.
Step 5: Based on the diagnosis name vector representation library, acquire the clinical diagnosis name or standard diagnosis name most similar to the clinical diagnosis name to be coded, using the name and its vector representation.
Specifically, the k diagnosis names most similar to the clinical diagnosis name to be coded are retrieved through the diagnosis name field of the ES database, together with their vectors, and the cosine similarity between the vector of the name to be coded and each of these k vectors is calculated. The standard code corresponding to the vector with the highest similarity is taken as the standard code of the diagnosis.
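The two-stage lookup in step 5 (shortlist by name, then rerank by vector) can be sketched as follows. This is an illustration only: `difflib.SequenceMatcher` stands in for the ES name search, the in-memory `library` list stands in for the vector representation library, and the function names, the value of k, and the sample vectors are hypothetical.

```python
import math
from difflib import SequenceMatcher


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_code(query_name, query_vec, library, k=3):
    """library: list of (diagnosis name, vector, standard code) entries."""
    # Stage 1: shortlist the k entries whose names are textually closest to the query.
    shortlist = sorted(
        library,
        key=lambda entry: SequenceMatcher(None, query_name, entry[0]).ratio(),
        reverse=True,
    )[:k]
    # Stage 2: rerank the shortlist by cosine similarity of the vectors.
    best = max(shortlist, key=lambda entry: cosine(query_vec, entry[1]))
    return best[2]


# Illustrative entries; names, vectors and codes are made up for the example.
library = [
    ("type 2 diabetes mellitus", [1.0, 0.0], "E11"),
    ("type 1 diabetes mellitus", [0.0, 1.0], "E10"),
    ("hypertension", [-1.0, 0.0], "I10"),
]
code = assign_code("type 2 diabetes", [0.9, 0.1], library, k=2)
```

Restricting the cosine comparison to k candidates is what avoids scoring the query against every entry in the library.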
Example two
Based on the method provided by the first embodiment, the present embodiment provides a diagnostic coding system based on contrast learning, including:
the data acquisition module is used for acquiring a clinical diagnosis name and a standard diagnosis name;
the training data construction module is used for acquiring a positive example and a negative example of each clinical diagnosis name;
the model training module is used for training a model based on a contrast learning framework;
the vector representation library generating module is used for obtaining, based on the model, a vector representation of every clinical diagnosis name and standard diagnosis name in the matching pairs to form a diagnosis name vector representation library;
the coding module for names to be coded is used for acquiring a clinical diagnosis name to be coded and obtaining its vector representation with the model, and for acquiring, from the diagnosis name vector representation library, the clinical diagnosis name or standard diagnosis name most similar to that vector representation, the corresponding standard code being the standard code of the clinical diagnosis name to be coded.
Example three
The embodiment aims at providing an electronic device.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the contrast learning-based diagnostic encoding method according to the first embodiment.
Example four
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a contrast learning-based diagnostic coding method according to a first embodiment.
The second to fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the related description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.
One or more of the above embodiments construct a model that can give a vector representation of any clinical diagnosis name or standard diagnosis name and thereby provide a common representation space; direct comparison between diagnosis names is converted into a distance measurement in that space, which improves matching accuracy.
Both clinical diagnosis names and standard diagnosis names are included in the diagnosis name vector representation library as comparison objects for the clinical diagnosis name to be coded, which effectively reduces errors caused by large differences among aliases of the same disease.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and those skilled in the art should understand that various modifications and variations can be made without inventive effort on the basis of the technical solution of the present invention.

Claims (10)

1. A diagnostic coding method based on contrast learning is characterized by comprising the following steps:
acquiring a clinical diagnosis name and a standard diagnosis name;
acquiring a positive example and a negative example of each clinical diagnosis name;
training a model based on a contrast learning framework according to the clinical diagnosis name and a positive example and a negative example thereof;
obtaining corresponding vector representation for all clinical diagnosis names and standard diagnosis names based on the model respectively to form a diagnosis name vector representation library;
acquiring a clinical diagnosis name to be coded, and obtaining its vector representation with the model;
and acquiring, from the diagnosis name vector representation library, the clinical diagnosis name or standard diagnosis name most similar to the vector representation of the clinical diagnosis name to be coded, the corresponding standard code being the standard code of the clinical diagnosis name to be coded.
2. The contrast learning-based diagnostic coding method of claim 1, wherein obtaining the positive and negative examples of each clinical diagnosis name comprises:
after the clinical diagnosis names are obtained, constructing, for each clinical diagnosis name, a clinical diagnosis name-standard diagnosis code-standard diagnosis name matching pair based on the standard codes;
among the plurality of matching pairs, calculating the similarity of each clinical diagnosis name to every standard diagnosis name and every other clinical diagnosis name; for each clinical diagnosis name or standard diagnosis name whose similarity is higher than a set threshold, obtaining the standard codes of the matching pairs in which the two names are located; and marking the pair of names as a positive example if the standard codes are the same, and as a negative example if they are different.
3. The contrast learning based diagnostic encoding method of claim 1, wherein deriving a vector representation from the model comprises:
inputting the clinical diagnosis name, the standard diagnosis name, or the clinical diagnosis name to be coded into the model, and performing mean pooling on the output to obtain the vector representation.
4. The contrast learning-based diagnostic coding method of claim 3, wherein the model employs a twin-network-based contrast learning architecture, the training method comprising:
for each clinical diagnosis name in a batch, inputting the clinical diagnosis name into the model, and performing mean pooling on the output to obtain the vector representation of the clinical diagnosis name;
constructing a loss function from the similarity between the vector representations of the clinical diagnosis name and its positive example and the similarity between the vector representations of the clinical diagnosis name and its negative examples:
[Formula image FDA0003664223140000021: loss function, not reproduced in the text]
wherein simp_{i,i} denotes the similarity between the vector representations of the i-th clinical diagnosis name and its positive example; simn_{i,j} denotes the similarity between the vector representations of the i-th clinical diagnosis name and a negative example; τ denotes a temperature parameter; and N denotes the number of diagnoses in the batch;
and training until the loss reaches its lowest value, which serves as the termination condition, to obtain the trained model.
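The formula in claim 4 is given only as an image. The sketch below shows one common loss built from the same ingredients (positive similarity, negative similarities, temperature τ, batch size N), namely an InfoNCE-style loss. That form is an assumption and may differ from the claimed formula; the function name and default temperature are also hypothetical.

```python
import math
from typing import List, Sequence


def contrastive_loss(simp: Sequence[float],
                     simn: Sequence[Sequence[float]],
                     tau: float = 0.05) -> float:
    """Assumed InfoNCE-style loss averaged over a batch.

    simp[i]  : similarity between anchor i and its positive example
    simn[i]  : similarities between anchor i and its negative examples
    tau      : temperature parameter
    Per-anchor term: -log( exp(simp_i/tau) / (exp(simp_i/tau) + sum_j exp(simn_ij/tau)) )
    """
    if len(simp) == 0 or len(simp) != len(simn):
        raise ValueError("simp and simn must be non-empty and of equal length")
    total = 0.0
    for pos, negs in zip(simp, simn):
        logits: List[float] = [pos / tau] + [s / tau for s in negs]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - pos / tau
    return total / len(simp)


# A well-separated anchor gives a smaller loss than a confused one.
good = contrastive_loss([0.9], [[0.1, 0.2]])
bad = contrastive_loss([0.1], [[0.9, 0.8]])
```

Under this assumed form, a smaller τ sharpens the penalty on negatives whose similarity approaches that of the positive example.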
5. The contrast learning-based diagnostic coding method of claim 1, wherein obtaining the most similar clinical diagnosis name or standard diagnosis name comprises:
retrieving from the vector representation library, according to the similarity between diagnosis names, a plurality of clinical diagnosis names or standard diagnosis names most similar to the clinical diagnosis name to be coded;
and selecting, from the plurality of clinical diagnosis names or standard diagnosis names, the most similar clinical diagnosis name or standard diagnosis name according to the similarity between the vector representations.
6. The contrast learning-based diagnostic coding method of claim 5, wherein the similarity between vector representations is measured by cosine similarity.
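Claims 5 and 6 describe a two-stage lookup: a coarse shortlist by name similarity, then a final choice by cosine similarity between vectors. A minimal sketch follows, assuming that "similarity between the diagnosis names" means a character-level text similarity (here `difflib.SequenceMatcher`). The vectors are hand-set toy values, and `retrieve` and `top_k` are hypothetical names.

```python
import math
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple

# (diagnosis name, vector representation, standard code)
Entry = Tuple[str, Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_name: str, query_vec: Sequence[float],
             library: List[Entry], top_k: int = 3) -> Tuple[str, str]:
    """Shortlist by name similarity, then pick the best by cosine similarity."""
    # Stage 1: coarse shortlist of the top_k textually closest names.
    shortlist = sorted(
        library,
        key=lambda entry: SequenceMatcher(None, query_name, entry[0]).ratio(),
        reverse=True,
    )[:top_k]
    # Stage 2: choose the shortlist entry whose vector is closest to the query.
    best = max(shortlist, key=lambda entry: cosine(query_vec, entry[1]))
    return best[0], best[2]


library: List[Entry] = [
    ("type 2 diabetes mellitus", [1.0, 0.0], "E11"),
    ("type 1 diabetes mellitus", [0.0, 1.0], "E10"),
    ("essential hypertension", [0.7, 0.7], "I10"),
]
name, code = retrieve("type 2 diabetes", [0.9, 0.1], library, top_k=2)
```

The coarse stage keeps the expensive vector comparison to a small candidate set, while the vector stage separates names that look alike but mean different things.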
7. A contrast learning-based diagnostic coding system, comprising:
a data acquisition module, configured to acquire clinical diagnosis names and standard diagnosis names;
a training data construction module, configured to obtain a positive example and a negative example of each clinical diagnosis name;
a model training module, configured to train a model based on a contrast learning framework according to the clinical diagnosis names and their positive and negative examples;
a vector representation library generating module, configured to obtain, based on the model, a vector representation for each clinical diagnosis name and each standard diagnosis name in the matching pairs, thereby forming a diagnosis name vector representation library;
a to-be-coded name coding module, configured to acquire a clinical diagnosis name to be coded and obtain its vector representation from the model; and to retrieve, from the diagnosis name vector representation library, the clinical diagnosis name or standard diagnosis name most similar to the vector representation of the clinical diagnosis name to be coded, wherein the standard code of the retrieved name is taken as the standard code of the clinical diagnosis name to be coded.
8. The contrast learning-based diagnostic coding system of claim 7, wherein the model employs a twin-network-based contrast learning architecture, the training method comprising:
for each clinical diagnosis name in a batch, inputting the clinical diagnosis name into the model, and performing mean pooling on the output to obtain the vector representation of the clinical diagnosis name;
constructing a loss function from the similarity between the vector representations of the clinical diagnosis name and its positive example and the similarity between the vector representations of the clinical diagnosis name and its negative examples:
[Formula image FDA0003664223140000031: loss function, not reproduced in the text]
wherein simp_{i,i} denotes the similarity between the vector representations of the i-th clinical diagnosis name and its positive example; simn_{i,j} denotes the similarity between the vector representations of the i-th clinical diagnosis name and a negative example; τ denotes a temperature parameter; and N denotes the number of diagnoses in the batch;
and training until the loss reaches its lowest value, which serves as the termination condition, to obtain the trained model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the contrast learning-based diagnostic coding method of any one of claims 1-6.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the contrast learning-based diagnostic coding method of any one of claims 1-6.
CN202210581884.9A 2022-05-26 2022-05-26 Diagnostic coding method and system based on contrast learning Pending CN114974602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210581884.9A CN114974602A (en) 2022-05-26 2022-05-26 Diagnostic coding method and system based on contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210581884.9A CN114974602A (en) 2022-05-26 2022-05-26 Diagnostic coding method and system based on contrast learning

Publications (1)

Publication Number Publication Date
CN114974602A true CN114974602A (en) 2022-08-30

Family

ID=82955287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210581884.9A Pending CN114974602A (en) 2022-05-26 2022-05-26 Diagnostic coding method and system based on contrast learning

Country Status (1)

Country Link
CN (1) CN114974602A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827929A (en) * 2019-11-05 2020-02-21 中山大学 Disease classification code recognition method and device, computer equipment and storage medium
CN113593709A (en) * 2021-07-30 2021-11-02 江先汉 Disease coding method, system, readable storage medium and device
CN113593661A (en) * 2021-07-07 2021-11-02 青岛国新健康产业科技有限公司 Clinical term standardization method, device, electronic equipment and storage medium
CN114328948A (en) * 2021-11-24 2022-04-12 腾讯科技(深圳)有限公司 Training method of text standardization model, text standardization method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination