CN114238555A

CN114238555A - Medical record missed diagnosis detection method and device, electronic equipment and storage medium

Info

Publication number: CN114238555A
Application number: CN202111284889.7A
Authority: CN
Inventors: 刘少辉; 周开银; 刘喜恩; 周梦强; 尤心心
Original assignee: Beijing Huiji Zhiyi Technology Co ltd
Current assignee: Beijing Huiji Zhiyi Technology Co ltd
Priority date: 2021-11-01
Filing date: 2021-11-01
Publication date: 2022-03-25

Abstract

The invention provides a medical record missed diagnosis detection method and device, electronic equipment and a storage medium. The method comprises: determining a medical record text to be detected; performing medical data mining on the medical record text to obtain candidate diseases contained in the medical record text; and performing missed diagnosis detection on the medical record text based on the context semantics of the candidate diseases in the medical record text. Because the candidate diseases are mined from the medical record text, missed diagnosis detection based on the candidate diseases requires no manual operation, which saves time and labor and improves the efficiency of medical record missed diagnosis detection. Because the detection is further based on the context semantics of the candidate diseases in the medical record text, accurate and reliable missed diagnosis detection can be achieved even when the medical record text is written in a complex manner.

Description

Medical record missed diagnosis detection method and device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of natural language processing, in particular to a medical record missed diagnosis detection method and device, electronic equipment and a storage medium.

Background

Missed diagnosis is a common medical error in which a disease is not diagnosed in time during treatment. Missed diagnosis can directly affect medical insurance reimbursement and, in severe cases, can create hidden medical risks and threaten the life of the patient.

At present, missed diagnosis is mostly detected by having doctors review each medical record. Screening for missed diagnoses requires a high level of professional expertise and must be performed by professional doctors, so hospitals find it difficult to bear the large investment in human resources. Manual review is also prone to oversights by the reviewing doctor, so the reliability and accuracy of missed diagnosis detection are not high.

Disclosure of Invention

The invention provides a medical record missed diagnosis detection method and device, electronic equipment and a storage medium, which are used to solve the problems of poor reliability, poor accuracy and high cost of manual medical record missed diagnosis detection in the prior art.

The invention provides a medical record missed diagnosis detection method, which comprises the following steps:

determining a medical record text to be detected;

performing medical data mining on the medical record text to obtain candidate diseases contained in the medical record text;

and performing missed diagnosis detection on the medical record text based on the context semantics of the candidate diseases in the medical record text.

According to the medical record missed diagnosis detection method provided by the invention, the medical data mining is performed on the medical record text to obtain the candidate diseases contained in the medical record text, and the method comprises the following steps:

performing disease named entity recognition on the medical record text to obtain an explicit candidate disease;

and/or, based on an implicit mining rule, performing medical data mining on the medical record text to obtain an implicit candidate disease, wherein the implicit mining rule comprises a corresponding relation between diagnosis and treatment information and the disease, and the diagnosis and treatment information comprises examination items and/or medicines;

determining candidate diseases contained in the medical record text based on the explicit candidate diseases and/or the implicit candidate diseases.

According to the medical record missed diagnosis detection method provided by the invention, the implicit mining rule is determined based on the association degree between the diagnosis and treatment information and the disease in the disease knowledge text and/or the sample medical record.

According to the medical record missed diagnosis detection method provided by the invention, the implicit mining rule is determined based on the following steps:

screening candidate diagnosis and treatment disease pairs from the disease knowledge text based on the association degree between the diagnosis and treatment information and various diseases in the disease knowledge text, wherein the candidate diagnosis and treatment disease pairs are the combination of candidate diagnosis and treatment information and diseases;

and screening diagnosis and treatment disease pairs from the candidate diagnosis and treatment disease pairs as the implicit mining rule based on the correlation degree between the diagnosis and treatment information and the corresponding diseases in the candidate diagnosis and treatment disease pairs in the sample medical record.
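As a non-limiting illustration, the two-stage screening described above can be sketched as follows. The sketch assumes that the degree of association is measured as the share of an item's occurrences that involve a given disease; the function names, the thresholds `kb_min` and `record_min`, and the toy data are hypothetical and are not specified by the invention.

```python
from collections import Counter

def association(pairs):
    """Map each (item, disease) pair to the share of the item's mentions tied to that disease."""
    pair_counts = Counter(pairs)
    item_counts = Counter(item for item, _ in pairs)
    return {pair: count / item_counts[pair[0]] for pair, count in pair_counts.items()}

def screen_rules(knowledge_pairs, sample_records, kb_min=0.6, record_min=0.8):
    """Two-stage screening: disease knowledge text first, sample medical records second."""
    # Stage 1: keep pairs that are strongly associated in the disease knowledge text.
    candidates = [pair for pair, share in association(knowledge_pairs).items() if share >= kb_min]
    # Stage 2: keep candidates that are also strongly associated in the sample records.
    rules = []
    for item, disease in candidates:
        diagnoses = [diseases for items, diseases in sample_records if item in items]
        if diagnoses and sum(disease in d for d in diagnoses) / len(diagnoses) >= record_min:
            rules.append((item, disease))
    return sorted(rules)

# Toy data, invented for illustration only (not clinical knowledge).
knowledge_pairs = [
    ("insulin", "diabetes"), ("insulin", "diabetes"),
    ("aspirin", "fever"), ("aspirin", "headache"), ("aspirin", "coronary heart disease"),
    ("drug_x", "disease_y"),
]
sample_records = [
    ({"insulin"}, {"diabetes"}),
    ({"insulin", "aspirin"}, {"diabetes", "fever"}),
    ({"drug_x"}, {"disease_y"}),
    ({"drug_x"}, {"disease_z"}),
]
rules = screen_rules(knowledge_pairs, sample_records)
```

In this toy data, "aspirin" is dropped in the first stage because it is spread over several diseases in the knowledge text, and "drug_x" is dropped in the second stage because the sample medical records do not confirm the association.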

According to the medical record missed diagnosis detection method provided by the invention, the missed diagnosis detection of the medical record text based on the context semantics of the candidate diseases in the medical record text comprises the following steps:

determining a segment text containing the candidate diseases in the medical record text;

inputting the candidate diseases and the segment texts of the candidate diseases into a confirmed diagnosis detection model, wherein the confirmed diagnosis detection model determines the context semantics of the candidate diseases based on the segment texts and performs confirmed diagnosis detection based on the context semantics, to obtain a confirmed diagnosis detection result output by the confirmed diagnosis detection model;

performing missed diagnosis detection on the medical record text based on the confirmed diagnosis detection result;

wherein the confirmed diagnosis detection model is obtained by training based on sample diseases in sample medical records, and the sample segment texts and confirmed diagnosis labels of the sample diseases.

According to the medical record missed diagnosis detection method provided by the invention, inputting the candidate diseases and the segment texts of the candidate diseases into the confirmed diagnosis detection model, the confirmed diagnosis detection model determining the context semantics of the candidate diseases based on the segment texts and performing confirmed diagnosis detection based on the context semantics to obtain the confirmed diagnosis detection result output by the confirmed diagnosis detection model, comprises the following steps:

inputting the candidate diseases and the segment texts of the candidate diseases, together with the chapters in which the segment texts are located in the medical record text and/or the disease types of the candidate diseases, into the confirmed diagnosis detection model, wherein the confirmed diagnosis detection model determines the context semantics of the candidate diseases based on the segment texts and performs confirmed diagnosis detection by combining the context semantics with the chapters and/or the disease types, to obtain the confirmed diagnosis detection result output by the confirmed diagnosis detection model.

According to the medical record missed diagnosis detection method provided by the invention, the missed diagnosis detection of the medical record text based on the confirmed diagnosis detection result comprises the following steps:

taking each candidate disease whose confirmed diagnosis detection result is a confirmed diagnosis as a candidate missed diagnosis disease;

inputting the candidate missed diagnosis diseases and each diagnosed disease corresponding to the medical record text into a disease representation model respectively, to obtain the disease representations of the candidate missed diagnosis diseases and of each diagnosed disease output by the disease representation model; the disease representation model is obtained by training based on the clinical name of each disease, its positive-example disease coding standard name and its negative-example clinical name;

and screening the candidate missed diagnosis diseases based on the similarity between the disease representation of the candidate missed diagnosis diseases and the disease representation of each diagnosis disease.
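As a non-limiting illustration, the similarity-based screening can be sketched as follows. The sketch assumes that disease representations are plain numeric vectors compared by cosine similarity, and that a candidate is discarded when it is close to any diagnosed disease; the vectors, the `threshold` value and the function names are hypothetical stand-ins for the output of a trained disease representation model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def screen_missed(candidates, diagnosed, threshold=0.9):
    """Keep candidates whose representation is not close to any diagnosed disease."""
    return [
        name for name, vec in candidates.items()
        if all(cosine(vec, dvec) < threshold for dvec in diagnosed.values())
    ]

# Hypothetical 3-dimensional representations, invented for illustration.
candidates = {
    "type 2 diabetes": [0.9, 0.1, 0.0],
    "hypertension": [0.0, 1.0, 0.1],
}
diagnosed = {"type 2 diabetes mellitus": [0.88, 0.12, 0.01]}
missed = screen_missed(candidates, diagnosed)
```

Here "type 2 diabetes" is treated as already covered by the diagnosed disease "type 2 diabetes mellitus", so only "hypertension" remains as a missed diagnosis.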

According to the medical record missed diagnosis detection method provided by the invention, the loss function of the disease representation model is determined based on the similarity between the predicted representation of the clinical name of each disease and the predicted representation of the positive-example disease coding standard name of that disease, and the similarity between the predicted representation of the clinical name of each disease and the predicted representation of the negative-example clinical name of that disease;

the predictive representation of the clinical name of the disease is determined based on the clinical name of the disease by a disease representation model during training, and the predictive representation of the disease code standard name is determined based on the disease code standard name by a disease representation model during training.
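As a non-limiting illustration, one loss of this kind can be sketched as follows. The text above only states that the loss depends on the two similarities; the margin-based (triplet-style) form, the cosine similarity, the `margin` value and all names below are assumptions made for this sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def representation_loss(clinical, positive_standard, negative_clinical, margin=0.3):
    """Assumed margin loss: pull the clinical name toward its standard name,
    push it away from the negative-example clinical name."""
    sim_pos = cosine(clinical, positive_standard)
    sim_neg = cosine(clinical, negative_clinical)
    return max(0.0, margin - sim_pos + sim_neg)

# Hypothetical predicted representations.
well_separated = representation_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
badly_separated = representation_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Under this assumed form, the loss is zero once the clinical name is closer to its coding standard name than to the negative example by at least the margin, and grows as the two similarities move the wrong way.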

The invention also provides a medical record missed diagnosis detection device, which comprises:

the text determining unit is used for determining a medical record text to be detected;

the medical data mining unit is used for mining the medical data of the medical record text to obtain candidate diseases contained in the medical record text;

and the missed diagnosis detection unit is used for carrying out missed diagnosis detection on the medical record text based on the context semantics of the candidate diseases in the medical record text.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of any one of the medical record missed diagnosis detection methods.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the medical record missed diagnosis detection methods described above.

According to the medical record missed diagnosis detection method and device, electronic equipment and storage medium provided by the invention, the candidate diseases in the medical record text are mined, so that missed diagnosis detection based on the candidate diseases requires no manual operation, which saves time and labor and improves the efficiency of medical record missed diagnosis detection. Because the detection is further based on the context semantics of the candidate diseases in the medical record text, accurate and reliable missed diagnosis detection can be achieved even when the medical record text is written in a complex manner.

Drawings

In order to illustrate the technical solutions of the present invention or the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of the medical record missed diagnosis detection method provided by the present invention;

FIG. 2 is a schematic flow chart of step 120 in the medical record missed diagnosis detection method provided by the present invention;

FIG. 3 is a schematic flow chart of the implicit mining rule determination method provided by the present invention;

FIG. 4 is a schematic flow chart of step 130 in the medical record missed diagnosis detection method provided by the present invention;

FIG. 5 is a schematic structural diagram of a missed diagnosis detection model provided by the present invention;

FIG. 6 is a schematic flow chart of step 133 in the medical record missed diagnosis detection method provided by the present invention;

FIG. 7 is a schematic diagram of the training of the disease representation model provided by the present invention;

FIG. 8 is a second schematic flow chart of step 133 in the medical record missed diagnosis detection method provided by the present invention;

FIG. 9 is a second schematic flow chart of the medical record missed diagnosis detection method provided by the present invention;

FIG. 10 is a schematic structural diagram of the medical record missed diagnosis detection device provided by the present invention;

FIG. 11 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

At present, missed diagnosis is mostly detected by having doctors review each medical record. Missed diagnosis detection requires a high level of professional expertise and must be performed by professional doctors, so hospitals find it difficult to bear the large investment in human resources. Manual review is also prone to oversights by the reviewing doctor, so the reliability and accuracy of missed diagnosis detection are not high.

In addition, automated missed diagnosis detection based on a disease knowledge base or on disease prediction has also been considered.

Missed diagnosis detection based on a disease knowledge base uses the knowledge base to compare the diseases appearing in the medical record with the discharge diagnosis, and outputs the diseases that differ as missed diagnoses. However, medical records are written in very complex ways: a disease may be mentioned in the medical record without having been diagnosed, for example when it is only listed as a disease to be differentiated or as a secondary disease to be prevented. Detection based only on comparison against the disease knowledge base is therefore prone to errors.

Missed diagnosis detection based on disease prediction is essentially disease prediction. Because a medical record contains a large amount of information and the range of diseases to be predicted is large, the learning task is very difficult. Furthermore, the top N predicted results are often diseases very similar to the discharge diagnosis, so missed diagnoses are not necessarily detected. In addition, because deep learning methods lack interpretability, the prediction results come without a basis for judgment, which does not fit the practical application scenario of missed diagnosis detection and makes it difficult to meet actual usage requirements.

In view of the above problems, an embodiment of the present invention provides a medical record missed diagnosis detection method. FIG. 1 is a schematic flow chart of the medical record missed diagnosis detection method provided by the present invention. As shown in FIG. 1, the method includes:

Step 110, determining a medical record text to be detected.

Specifically, the medical record text is the text corresponding to the medical record of a patient. Here, the medical record text to be detected is a medical record text on which medical record missed diagnosis detection needs to be performed. The medical record text may be an electronic medical record entered according to the patient's self-description and the doctor's inquiry, or may be a text obtained by performing optical character recognition (OCR) on a medical record image obtained by scanning or photographing a handwritten paper medical record, which is not specifically limited in the embodiment of the present invention.

Step 120, performing medical data mining on the medical record text to obtain candidate diseases contained in the medical record text.

Specifically, the medical record text contains a large amount of information describing the patient's condition and the examinations and treatment the patient received, as well as the conclusions reached by the doctor after diagnosis. The condition description and the examination and treatment information indirectly reflect the diseases the patient may have, while the conclusions reached by the doctor directly reflect the diseases the patient has. Given the rich and complex information in the medical record text, the candidate diseases contained in the medical record text can be obtained by performing medical data mining on it. Here, the candidate diseases are diseases that the patient corresponding to the medical record text may have. A candidate disease may be directly recorded in the medical record text, or may not be directly recorded but be implied by the information contained in the medical record text, which is not specifically limited in the embodiment of the present invention.

Here, medical data mining on the medical record text may be performed in several ways. Disease names may be treated as named entities, and named entity recognition may be performed on the medical record text to obtain the candidate diseases. Alternatively, information on disease gold standards or specific drugs may be collected in advance; examination items or drug names are then treated as named entities, named entity recognition is performed on the medical record text to obtain the examination items or drug names it contains, and these are matched against the disease gold standard or specific drug information to determine the candidate diseases implied by the medical record text. In addition, the diseases contained in sample medical record texts may be labeled in advance and used to train a medical data mining model, so that the model learns the mapping between medical record text and diseases and outputs the diseases contained in an input medical record text as candidate diseases.

Step 130, performing missed diagnosis detection on the medical record text based on the context semantics of the candidate diseases in the medical record text.

Specifically, the way a medical record text is written is complex. For example, some candidate diseases appear in the "past history" section of the medical record text and were in fact cured long ago, and some candidate diseases are recorded in the medical record text as requiring prevention or further observation and have not been diagnosed. A candidate disease obtained by medical data mining may therefore be clearly diagnosed in the medical record text, may be indicated as requiring prevention or observation, or may be mentioned as a disease the patient once had but which has been cured; none of these situations is a missed diagnosis.

Therefore, if the candidate diseases are simply compared with the list of diagnosed diseases when performing missed diagnosis detection, the complex information in the medical record text is ignored and detection errors occur easily. For this reason, the context semantics of the candidate diseases in the medical record text are taken into account during missed diagnosis detection, so that the detection can refer to the semantic information related to the candidate diseases. Situations in which a candidate disease has been clearly diagnosed in the medical record text, has not been diagnosed and only needs to be prevented or observed, or has already been cured are thereby excluded, and the real missed diagnoses are obtained.

Further, the context semantics of a candidate disease in the medical record text can be obtained by locating the segment text that contains the candidate disease, or information related to the candidate disease, in the medical record text and extracting the semantics of that segment text. Once the context semantics are obtained, whether the candidate disease is missed in the medical record text can be judged based on them. Here, the semantic extraction may be implemented by a pre-trained language model; for example, it may be performed by a BERT (Bidirectional Encoder Representations from Transformers) model, or semantic encoding may be performed by the encoder part of a Transformer model.

In addition, on the basis of semantic extraction from the segment text, whether the candidate disease is missed in the medical record text may be determined by further combining the chapter position of the segment text in the medical record text with the characteristics of the candidate disease, for example whether the candidate disease is curable.
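As a non-limiting illustration, locating the segment text of a candidate disease together with its chapter can be sketched as follows. The sketch assumes that the medical record text has already been split into named chapters and that a segment is a single sentence; the chapter names, the sentence-splitting rule and the function name `find_segments` are hypothetical. The semantic extraction itself (for example by a BERT model) is not shown.

```python
import re

def find_segments(record_sections, disease):
    """Return (chapter, sentence) pairs for every sentence that mentions the disease."""
    hits = []
    for chapter, text in record_sections.items():
        # Split after sentence-ending punctuation (Western and Chinese full stops/semicolons).
        for sentence in re.split(r"(?<=[.;。；])\s*", text):
            if disease.lower() in sentence.lower():
                hits.append((chapter, sentence.strip()))
    return hits

# Toy medical record, invented for illustration.
record_sections = {
    "past history": "Pneumonia in 2015, cured. No drug allergy.",
    "course of disease": "Blood glucose was elevated. Type 2 diabetes is considered; insulin was started.",
}
pneumonia_hits = find_segments(record_sections, "pneumonia")
diabetes_hits = find_segments(record_sections, "diabetes")
```

Each returned pair carries both the segment text and the chapter in which it appears, which are the inputs that the subsequent judgment would combine with the characteristics of the candidate disease.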

According to the method provided by the embodiment of the invention, the candidate diseases in the medical record text are mined, so that missed diagnosis detection based on the candidate diseases requires no manual operation, which saves time and labor and improves the efficiency of medical record missed diagnosis detection. Because the detection is further based on the context semantics of the candidate diseases in the medical record text, accurate and reliable missed diagnosis detection can be achieved even when the medical record text is written in a complex manner.

In actual medical records, missed diagnoses can generally be divided into two cases: explicit missed diagnosis and implicit missed diagnosis. Explicit missed diagnosis means that the doctor actually made the corresponding diagnosis during treatment but did not record it in the discharge diagnosis. Implicit missed diagnosis means that the doctor found the condition during treatment but did not give the corresponding diagnosis, and mainly covers two situations: 1) actual treatment was performed, for example certain specific drugs were used, but no corresponding diagnosis was given; 2) a basis for judging a disease was found in an auxiliary examination, but no corresponding diagnosis was given. For these two cases, based on the above embodiment, FIG. 2 is a schematic flow chart of step 120 in the medical record missed diagnosis detection method provided by the present invention. As shown in FIG. 2, step 120 mines the diseases that may have been missed in each of the two cases, and step 120 includes:

Step 121, performing disease named entity recognition on the medical record text to obtain an explicit candidate disease.

Specifically, in the case of explicit missed diagnosis, the medical record text usually contains a clear disease diagnosis record, so disease named entity recognition can be performed directly on the medical record text, and the recognized disease named entities are taken as explicit candidate diseases that may have been missed.

The disease named entity recognition here can be implemented by a Named Entity Recognition (NER) model for the medical field. Alternatively, a general-domain NER model can be fine-tuned with sample medical records obtained in advance and the disease names labeled in them, and the fine-tuned NER model can then be used for disease named entity recognition, which is not specifically limited in the embodiment of the present invention.
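As a non-limiting illustration of the input and output of this step, the following sketch replaces the trained NER model with a simple longest-match dictionary lookup. This substitution is an assumption made only to keep the example self-contained; the lexicon, the function name `recognize_diseases` and the sample sentence are hypothetical.

```python
def recognize_diseases(text, lexicon):
    """Dictionary-based stand-in for a disease NER model (longest match wins)."""
    lowered = text.lower()
    taken = [False] * len(lowered)  # characters already covered by a longer match
    found = []
    for name in sorted(lexicon, key=len, reverse=True):
        key = name.lower()
        start = lowered.find(key)
        while start != -1:
            end = start + len(key)
            if not any(taken[start:end]):
                taken[start:end] = [True] * (end - start)
                found.append((start, name))
            start = lowered.find(key, start + 1)
    # Report the recognized names in order of appearance in the text.
    return [name for _, name in sorted(found)]

lexicon = {"diabetes", "type 2 diabetes", "hypertension"}
text = "Type 2 diabetes for 5 years; hypertension noted on admission."
explicit_candidates = recognize_diseases(text, lexicon)
```

Matching longer names first prevents "diabetes" from being reported separately inside "type 2 diabetes". A trained NER model would additionally recognize names that are absent from any lexicon.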

And/or step 122, performing medical data mining on the medical record text based on an implicit mining rule to obtain an implicit candidate disease, wherein the implicit mining rule comprises a corresponding relation between diagnosis and treatment information and the disease, and the diagnosis and treatment information comprises examination items and/or medicines.

Specifically, in the case of implicit missed diagnosis, the medical record text does not necessarily describe the missed disease explicitly. More often, it records diagnosis and treatment information from which the disease suffered by the patient can be directly inferred, where the diagnosis and treatment information may be the name of a drug used by the patient or an examination item the patient underwent.

For such cases, an implicit mining rule can be applied to perform medical data mining on the medical record text, and the mined diseases are taken as implicit candidate diseases that may have been missed. The implicit mining rule may be preset, or may be obtained by mining correspondences from knowledge texts in the medical field. For example, entity recognition may be performed on the knowledge text to determine the examination item entities, drug entities and disease entities it contains, and the correspondence between examination items and diseases, or between drugs and diseases, may be determined by counting the co-occurrence probability of examination item entities and disease entities, or of drug entities and disease entities. Alternatively, a relation extraction model may be trained on sample knowledge texts and the relation labels between diagnosis and treatment information and diseases annotated in them, and the correspondences between diagnosis and treatment information and diseases contained in massive knowledge texts may then be extracted with the relation extraction model.

Here, the implicit mining rule may include a correspondence between examination items and diseases, where the disease corresponding to an examination item is the disease that the examination item can directly determine the patient to have. This correspondence is consistent with the diagnostic gold standard, that is, the most reliable method for diagnosing a disease recognized by the clinical medical community, which is used to accurately distinguish whether a subject has the disease. When the diagnostic gold standard is applied to medical data mining, the examination items undergone by the patient and recorded in the medical record text can be acquired; if an examination item is one used in the diagnostic gold standard to determine whether a certain disease is present, that disease can be taken as an implicit candidate disease.

In addition, the implicit mining rule may also include a correspondence between drugs and diseases, where the disease corresponding to a drug is the disease that the drug is specifically used to treat. This correspondence is consistent with specific drug information, which records the specific drugs used to treat particular diseases; a specific drug here can be understood as a drug used only for a certain disease. When the specific drug information is applied to medical data mining, the drugs used by the patient and recorded in the medical record text can be acquired; if a drug is a specific drug for treating a certain disease according to the specific drug information, that disease can be taken as an implicit candidate disease.

The implicit mining rule may also include both the correspondence between examination items and diseases and the correspondence between drugs and diseases. In this case, during medical data mining, both the examination items undergone by the patient and the drugs used by the patient, as recorded in the medical record text, may be acquired and matched against the respective correspondences.
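As a non-limiting illustration, applying an implicit mining rule that contains both correspondences can be sketched as follows. The rule tables below are hypothetical placeholders written for this example and are not clinical guidance; the function name `mine_implicit` and the evidence strings are likewise assumptions.

```python
# Hypothetical rule tables: diagnosis and treatment information -> disease.
EXAM_RULES = {"coronary angiography": "coronary artery disease"}
DRUG_RULES = {"insulin": "diabetes", "levothyroxine": "hypothyroidism"}

def mine_implicit(exam_items, drugs, exam_rules=EXAM_RULES, drug_rules=DRUG_RULES):
    """Return (disease, evidence) pairs implied by the examinations and drugs in a record."""
    candidates = []
    for item in exam_items:
        if item in exam_rules:
            candidates.append((exam_rules[item], f"examination: {item}"))
    for drug in drugs:
        if drug in drug_rules:
            candidates.append((drug_rules[drug], f"drug: {drug}"))
    return candidates

implicit_candidates = mine_implicit(
    exam_items=["chest X-ray", "coronary angiography"],
    drugs=["insulin", "paracetamol"],
)
```

Keeping the triggering examination item or drug alongside each implicit candidate disease records the basis on which the candidate was mined.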

It should be noted that, when both steps 121 and 122 need to be executed, step 121 may be executed before or after step 122, or may be executed in synchronization with step 122.

Step 123, determining candidate diseases contained in the medical record text based on the explicit candidate diseases and/or the implicit candidate diseases.

Specifically, when only step 121 is executed, all explicit candidate diseases obtained in step 121 may be taken as the candidate diseases contained in the medical record text; when only step 122 is executed, all implicit candidate diseases obtained in step 122 may be taken as the candidate diseases contained in the medical record text. When both step 121 and step 122 are executed, the explicit candidate diseases and implicit candidate diseases obtained by the two steps may together be taken as the candidate diseases contained in the medical record text. For example, the explicit and implicit candidate diseases may all be placed in a candidate disease set, and the diseases in the set may then be deduplicated. Alternatively, all explicit candidate diseases may first be placed in the candidate disease set, and each implicit candidate disease may then be matched for similarity against the candidate diseases already in the set: if the set already contains a candidate disease highly similar to the implicit candidate disease, the implicit candidate disease is discarded; otherwise, it is added to the set.
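As a non-limiting illustration, the second merging strategy can be sketched as follows. The text does not specify how similarity is measured; the sketch assumes character-level string similarity from Python's standard `difflib` module, and the `threshold` value, the function name `merge_candidates` and the disease names are hypothetical.

```python
from difflib import SequenceMatcher

def merge_candidates(explicit, implicit, threshold=0.8):
    """Put explicit candidates in the set first, then add only dissimilar implicit ones."""
    merged = []
    for name in explicit:
        if name not in merged:  # exact-duplicate removal among explicit candidates
            merged.append(name)
    for name in implicit:
        # Discard an implicit candidate if it is highly similar to one already kept.
        if all(SequenceMatcher(None, name, kept).ratio() < threshold for kept in merged):
            merged.append(name)
    return merged

explicit = ["type 2 diabetes", "hypertension", "hypertension"]
implicit = ["type II diabetes", "hypothyroidism"]
candidate_set = merge_candidates(explicit, implicit)
```

Character-level similarity only catches near-identical spellings; a learned disease representation could be substituted for `SequenceMatcher` to match synonyms that are written very differently.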

According to the method provided by the embodiment of the invention, explicit candidate diseases and implicit candidate diseases are mined separately, which ensures the comprehensiveness of medical data mining on the medical record text and avoids omitting candidate diseases in the subsequent medical record missed diagnosis detection.

In practice, constructing the implicit mining rule used to mine implicit candidate diseases is difficult; in particular, collecting disease gold standards and specific drugs is difficult. Common medical knowledge bases often do not contain this type of knowledge, and because the knowledge of any individual doctor is one-sided, it is also difficult to obtain it directly through doctor annotation. In this regard, based on any of the above embodiments, the implicit mining rule is determined based on the degree of association between diagnosis and treatment information and diseases in the disease knowledge text and/or the sample medical records.

Specifically, the disease knowledge text may be a text in a disease knowledge base, which is a disease-centered information database containing information on symptoms, supplementary examinations, laboratory examinations, medicines, surgeries, and the like. The disease knowledge text can cover examination items required for disease diagnosis and medicines required for disease treatment, so that the examination items specially used for diagnosing a certain disease can be obtained by mining the association degree between the examination items contained in the disease knowledge text and the disease, so as to obtain a diagnosis gold standard, and the medicines specially used for treating a certain disease can be mined by mining the association degree between the medicines and the disease in the disease knowledge text, so as to obtain specific medicine information.

The sample medical record is a medical record text collected in advance. It covers the examination items the patient underwent in the real diagnosis and treatment process, the diseases diagnosed, and the medicines used to treat those diseases. Therefore, the examination items specially used for diagnosing a certain disease can be obtained by mining the degree of association between the examination items and the diseases contained in the sample medical records, yielding the diagnostic gold standard; likewise, the medicines specially used for treating a certain disease can be obtained by mining the degree of association between the medicines and the diseases in the sample medical records, yielding the specific drug information.

Further, the degree of association between an examination item and a disease may be understood as the probability that the name of the examination item and the name of the disease co-occur, or as the probability that a patient is confirmed to have a certain disease after undergoing a certain examination item. The degree of association between a drug and a disease may be understood as the probability that the name of the drug and the name of the disease co-occur, or as the probability of having a certain disease when a certain drug is used.

In the embodiment of the invention, the implicit mining rule is constructed by applying the association degree between the diagnosis and treatment information and the disease in the disease knowledge text and/or the sample medical record, so that the comprehensiveness and the reliability of the implicit mining rule are ensured. The implicit mining rule constructed by the method can avoid the cognitive deviation of a doctor, and is more robust.

Based on any of the above embodiments, candidate diagnosis and treatment disease pairs can be screened from the disease knowledge text based on the degree of association between each piece of diagnosis and treatment information and each disease in the disease knowledge text, and the implicit mining rules are determined based on these candidate diagnosis and treatment disease pairs. Here, a candidate diagnosis and treatment disease pair is a combination of a piece of candidate diagnosis and treatment information and a disease.

Here, the diagnosis and treatment information may be an examination item or a drug. The following operations can be applied directly to either examination items or drugs to construct the implicit mining rules.

Specifically, the degree of association between diagnosis and treatment information and diseases in the disease knowledge text, that is, the probability of having each disease given each piece of diagnosis and treatment information, may be recorded directly in the disease knowledge text or may be calculated from the relevant statistics in the disease knowledge text; the embodiment of the present invention is not limited in this respect.

When the diagnosis and treatment information is an examination item, the probability of having a given disease under that examination item in the disease knowledge text can be denoted $P(\mathrm{disease}_j \mid \mathrm{auxiliary}_i)$, i.e., the probability that a patient who undergoes the i-th examination item $\mathrm{auxiliary}_i$ has the j-th disease $\mathrm{disease}_j$. Here $\mathrm{auxiliary}_i$ and $\mathrm{disease}_j$ may form a pair: if $P(\mathrm{disease}_j \mid \mathrm{auxiliary}_i)$ is greater than a preset gold standard judgment threshold, $\mathrm{auxiliary}_i$ and $\mathrm{disease}_j$ can be taken as a candidate diagnosis and treatment disease pair. The gold standard judgment threshold may be set to 0.95, 0.9, or the like. Such a candidate pair can be understood as a diagnostic gold standard obtained by mining the disease knowledge text, that is, a correspondence between an examination item and a disease.

When the diagnosis and treatment information is a drug, the probability of having a given disease under that drug in the disease knowledge text can be denoted $P(\mathrm{disease}_j \mid \mathrm{drug}_i)$, i.e., the probability that a patient using the i-th drug $\mathrm{drug}_i$ has the j-th disease $\mathrm{disease}_j$. Here $\mathrm{drug}_i$ and $\mathrm{disease}_j$ may form a pair: if $P(\mathrm{disease}_j \mid \mathrm{drug}_i)$ is greater than a preset specific drug judgment threshold, $\mathrm{drug}_i$ and $\mathrm{disease}_j$ can be taken as a candidate diagnosis and treatment disease pair. The specific drug judgment threshold may be set to 0.95, 0.9, or the like. Such a candidate pair can be understood as specific drug information obtained by mining the disease knowledge text, that is, a correspondence between a drug and a disease.

The candidate diagnosis and treatment disease pair obtained by the method can be directly used for constructing an implicit mining rule.
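The threshold screening described above can be sketched as follows. This is only an illustration under assumptions: the dictionary `cond_prob`, its keys, and the probability values are hypothetical placeholders for probabilities recorded in, or computed from, a disease knowledge base.

```python
def screen_candidate_pairs(cond_prob, threshold=0.95):
    """Keep (info, disease) pairs whose P(disease | info) exceeds the threshold."""
    return sorted(
        (info, disease)
        for (info, disease), p in cond_prob.items()
        if p > threshold
    )

# Hypothetical probabilities P(disease | diagnosis-and-treatment information).
cond_prob = {
    ("examination A", "disease X"): 0.97,
    ("examination A", "disease Y"): 0.03,
    ("drug B", "disease Z"): 0.60,
}
pairs = screen_candidate_pairs(cond_prob)
print(pairs)
```

Only the pair whose conditional probability exceeds 0.95 survives; the same function could be called with a different threshold for drugs than for examination items.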

Based on any of the above embodiments, fig. 3 is a schematic flow chart of the implicit mining rule determination method provided by the present invention, and as shown in fig. 3, the implicit mining rule is determined based on the following steps:

Step 310, screening candidate diagnosis and treatment disease pairs from the disease knowledge text based on the degree of association between each piece of diagnosis and treatment information and each disease in the disease knowledge text, wherein a candidate diagnosis and treatment disease pair is a combination of a piece of candidate diagnosis and treatment information and a disease.

When the diagnosis and treatment information is an examination item, the probability of having a given disease under that examination item in the disease knowledge text can be denoted $P(\mathrm{disease}_j \mid \mathrm{auxiliary}_i)$, i.e., the probability that a patient who undergoes the i-th examination item $\mathrm{auxiliary}_i$ has the j-th disease $\mathrm{disease}_j$. Here $\mathrm{auxiliary}_i$ and $\mathrm{disease}_j$ may form a pair: if $P(\mathrm{disease}_j \mid \mathrm{auxiliary}_i)$ is greater than the preset gold standard judgment threshold, $\mathrm{auxiliary}_i$ and $\mathrm{disease}_j$ can be taken as a candidate diagnosis and treatment disease pair. Such a candidate pair can be understood as a diagnostic gold standard obtained by mining the disease knowledge text, that is, a correspondence between an examination item and a disease.

When the diagnosis and treatment information is a drug, the probability of having a given disease under that drug in the disease knowledge text can be denoted $P(\mathrm{disease}_j \mid \mathrm{drug}_i)$, i.e., the probability that a patient using the i-th drug $\mathrm{drug}_i$ has the j-th disease $\mathrm{disease}_j$. Here $\mathrm{drug}_i$ and $\mathrm{disease}_j$ may form a pair: if $P(\mathrm{disease}_j \mid \mathrm{drug}_i)$ is greater than the preset specific drug judgment threshold, $\mathrm{drug}_i$ and $\mathrm{disease}_j$ can be taken as a candidate diagnosis and treatment disease pair. Such a candidate pair can be understood as specific drug information obtained by mining the disease knowledge text, that is, a correspondence between a drug and a disease.

Step 320, screening diagnosis and treatment disease pairs from the candidate diagnosis and treatment disease pairs as implicit mining rules, based on the degree of association, in the sample medical records, between the diagnosis and treatment information and the corresponding disease in each candidate diagnosis and treatment disease pair.

In particular, considering the knowledge limitation in the knowledge base, for example, some drugs may be associated with only some diseases only because these drugs are rare, but not specific drugs for these diseases, so the candidate diagnosis and treatment disease pairs obtained by mining the disease knowledge text need to be further verified by the sample medical records.

In the sample medical records, the degree of association between the diagnosis and treatment information and the corresponding disease in a candidate diagnosis and treatment disease pair can be expressed as the co-occurrence probability of the pair in the sample medical records, or as the probability of having the corresponding disease given the diagnosis and treatment information of the pair in the sample medical records.

The co-occurrence probability of a candidate diagnosis and treatment disease pair in the sample medical records, that is, the proportion of sample medical records in which the diagnosis and treatment information and the disease of the pair occur together, can be understood as the support of the pair. The higher the co-occurrence proportion, the more widespread the pair, the higher its support, and the more likely it is to be used as an implicit mining rule; conversely, the lower the co-occurrence proportion, the less common the pair, the lower its support, and the less likely it is to be used as an implicit mining rule.

The probability of having the corresponding disease given the diagnosis and treatment information of a candidate pair in the sample medical records, that is, the conditional probability that the disease occurs when the diagnosis and treatment information occurs, can be understood as the confidence of the pair. The higher the conditional probability, the higher the confidence, the more likely the diagnosis and treatment information is specific to the disease, and the more likely the pair is to be used as an implicit mining rule; conversely, the lower the conditional probability, the more different diseases the diagnosis and treatment information corresponds to, the lower the confidence, the less likely the information is specific, and the less likely the pair is to be used as an implicit mining rule.

The diagnosis and treatment disease pairs used as implicit mining rules can be selected from the candidate pairs based only on the co-occurrence probability of the candidate pairs in the sample medical records, based only on the probability of having the corresponding disease given the diagnosis and treatment information of the candidate pairs in the sample medical records, or based on both; the embodiment of the present invention is not specifically limited in this respect.

According to the method provided by the embodiment of the invention, the candidate diagnosis and treatment disease pairs are mined through the disease knowledge text, and the candidate diagnosis and treatment disease pairs are further screened by combining the sample medical records on the basis, so that the reliability of the implicit mining rule is ensured.

Based on any of the above embodiments, in step 320, when any candidate diagnosis and treatment disease pair is (disease, examination item), the co-occurrence condition of any candidate diagnosis and treatment disease pair in the sample medical record may specifically be the co-occurrence condition of the candidate diagnosis and treatment disease pair in the auxiliary examination field of the sample medical record. The probability of the corresponding disease under the examination item in any candidate diagnosis and treatment disease pair in the sample medical record can be specifically the probability of the corresponding disease under the examination item in the candidate diagnosis and treatment disease pair in the auxiliary examination field of the sample medical record.

Here, entries in the auxiliary examination field are often presented in the form (examination item: disease), meaning that the presence of the corresponding disease can be determined from the examination item (including types such as laboratory assays and imaging), and adjacent examination items are usually separated by obvious delimiters. For example, "head and neck artery imaging: 1. right anterior cerebral artery multiple mild stenosis 2. left common carotid artery proximal bifurcation mild stenosis" indicates that "1. right anterior cerebral artery multiple mild stenosis" and "2. left common carotid artery proximal bifurcation mild stenosis" can be diagnosed from the result of the head and neck artery imaging. Using fixed patterns, vocabularies, and disease normalization tools, the pairs (head and neck blood vessel imaging, cerebral artery stenosis) and (head and neck blood vessel imaging, carotid artery stenosis) can be extracted from this text. Such content in the sample medical records can serve as knowledge about the examination items of diseases and is used to calculate the support and confidence of candidate diagnosis and treatment disease pairs, so as to verify whether a candidate pair has sufficient generality and specificity.
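A fixed-pattern extraction of this kind might look like the sketch below. It assumes, purely for illustration, that each entry has the shape "item: 1. finding 2. finding"; the function name `extract_exam_findings` is hypothetical, and the vocabulary lookup and disease normalization mentioned above are deliberately left out, so the findings are returned as raw text.

```python
import re

def extract_exam_findings(field_text):
    """Split one 'examination item: 1. finding 2. finding' entry into pairs.

    Assumption: the item name precedes the first colon and findings are
    introduced by '1.', '2.', ... markers. No normalization is applied.
    """
    item, _, findings = field_text.partition(":")
    parts = [p.strip() for p in re.split(r"\d+\.\s*", findings) if p.strip()]
    return [(item.strip(), p) for p in parts]

entry = ("head and neck artery imaging: 1. right anterior cerebral artery "
         "multiple mild stenosis 2. left common carotid artery proximal "
         "bifurcation mild stenosis")
for pair in extract_exam_findings(entry):
    print(pair)
```

A normalization step would then map each raw finding to a standard disease name before counting.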

The support of a candidate diagnosis and treatment disease pair, i.e., its co-occurrence probability, can be expressed as:

$$\mathrm{Support}(\mathrm{disease}_i, \mathrm{auxiliary}_j) = \frac{\#(\mathrm{disease}_i, \mathrm{auxiliary}_j)}{\#}$$

where $\#(\mathrm{disease}_i, \mathrm{auxiliary}_j)$ denotes the number of occurrences of (disease i, examination item j) in the auxiliary examination fields of all sample medical records; $\#(\mathrm{disease}_i, \mathrm{auxiliary}_j)$ and $\#(\mathrm{auxiliary}_j, \mathrm{disease}_i)$ are equivalent; and $\#$ denotes the number of all (disease, examination item) combinations appearing in the auxiliary examination fields of all sample medical records.

The confidence of a candidate diagnosis and treatment disease pair, that is, the probability of having the corresponding disease given the diagnosis and treatment information of the pair, can be expressed as:

$$\mathrm{Confidence}(\mathrm{disease}_i, \mathrm{auxiliary}_j) = \frac{\#(\mathrm{disease}_i, \mathrm{auxiliary}_j)}{\#(\mathrm{auxiliary}_j)}$$

where $\#(\mathrm{auxiliary}_j)$ denotes the number of times examination item j appears in the auxiliary examination fields of all sample medical records.

After the support and confidence of each (disease, examination item) pair are calculated, the pairs whose support is greater than a preset support threshold and whose confidence is greater than a preset confidence threshold for examination items can be selected as diagnostic gold standards, so that the diagnostic gold standards are obtained accurately and reliably. To ensure accuracy and avoid normalization errors caused by complex forms, the diagnostic gold standards found in this way are simple in form, for example: CT examination, urate crystals -> gout.
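The support and confidence screening can be illustrated with the following sketch. It assumes the (disease, examination item) occurrences have already been extracted into a flat list; for simplicity, the count of an examination item is taken over those extracted pairs. The function names, the toy data, and the thresholds are all hypothetical.

```python
from collections import Counter

def support_and_confidence(pairs):
    """Map each (disease, item) to (support, confidence).

    support    = #(disease, item) / # of all extracted pairs
    confidence = #(disease, item) / #(item)
    """
    pair_counts = Counter(pairs)
    item_counts = Counter(item for _, item in pairs)
    total = len(pairs)
    return {
        (disease, item): (n / total, n / item_counts[item])
        for (disease, item), n in pair_counts.items()
    }

def select_rules(stats, min_support, min_confidence):
    """Keep pairs that exceed both thresholds."""
    return sorted(
        pair for pair, (s, c) in stats.items()
        if s > min_support and c > min_confidence
    )

# Toy occurrences; the counts are made up for illustration only.
occurrences = (
    [("gout", "urate crystal CT")] * 3
    + [("fracture", "X-ray")] * 2
    + [("pneumonia", "X-ray")] * 5
)
stats = support_and_confidence(occurrences)
rules = select_rules(stats, min_support=0.1, min_confidence=0.9)
print(rules)
```

In the toy data, "X-ray" is frequent but points to several diseases, so its confidence is too low; only the urate crystal finding is specific enough to be kept.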

Based on any of the above embodiments, in step 320, when a candidate diagnosis and treatment disease pair is (disease, drug), the co-occurrence of the pair in the sample medical records may specifically be the co-occurrence of the drug with the disease as the main diagnosis of the sample medical record. Likewise, the probability of having the corresponding disease given the drug of the pair in the sample medical records may specifically be the probability that the disease is the main diagnosis of the sample medical record given that the drug is used.

The main diagnosis is regarded as the disease that is the most serious, the most costly, and associated with the longest stay in the current hospitalization. In the embodiment of the present invention, it is assumed that all therapeutic drugs used in the treatment process bear some relation to the main diagnosis. The support and confidence of a (disease, drug) pair can then be expressed as:

$$\mathrm{Support}(\mathrm{disease}_i, \mathrm{drug}_j) = \frac{\#(\mathrm{disease}_i, \mathrm{drug}_j)}{\#}$$

$$\mathrm{Confidence}(\mathrm{disease}_i, \mathrm{drug}_j) = \frac{\#(\mathrm{disease}_i, \mathrm{drug}_j)}{\#(\mathrm{drug}_j)}$$

where $\#$ denotes the number of all (main diagnosis on the first page of the medical record, drug used in the treatment process) combinations in all sample medical records; $\#(\mathrm{disease}_i, \mathrm{drug}_j)$ denotes the number of times candidate disease i appears as the main diagnosis together with drug j in the treatment process in all sample medical records; and $\#(\mathrm{drug}_j)$ denotes the number of times drug j appears in the treatment process.

After the support and confidence of each (disease, drug) pair are calculated, the pairs whose support is greater than a preset support threshold and whose confidence is greater than a preset confidence threshold can be selected as specific drug information, so that specific drugs are obtained accurately and reliably. Here, the support threshold and confidence threshold used for screening (disease, drug) pairs may be the same as or different from those used for screening (disease, examination item) pairs. In addition, because actual clinical medication is often not restricted to the indications of a drug, the specific drug information actually found often covers fairly special cases, for example: enema -> constipation.
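Under the assumption stated above that every drug used during treatment relates to the main diagnosis, the (disease, drug) combinations can be enumerated directly from the records. The sketch below shows one way to do this; the record layout `(main_diagnosis, [drugs])`, the function name, and the data are hypothetical.

```python
from collections import Counter

def drug_rule_stats(records):
    """records: iterable of (main_diagnosis, list of drugs used in treatment).

    Returns {(disease, drug): (support, confidence)}, pairing every drug
    with the main diagnosis of the same record.
    """
    pairs = [(dx, drug) for dx, drugs in records for drug in drugs]
    pair_n = Counter(pairs)
    drug_n = Counter(drug for _, drug in pairs)
    return {
        pair: (n / len(pairs), n / drug_n[pair[1]])
        for pair, n in pair_n.items()
    }

# Made-up records for illustration.
records = [
    ("constipation", ["enema", "lactulose"]),
    ("constipation", ["enema"]),
    ("hepatic encephalopathy", ["lactulose"]),
]
drug_stats = drug_rule_stats(records)
print(drug_stats[("constipation", "enema")])
```

Here "enema" only ever accompanies the main diagnosis "constipation", so its confidence is 1.0, whereas "lactulose" is split between two main diagnoses.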

Based on any of the above embodiments, fig. 4 is a schematic flow chart of step 130 in the medical record missed diagnosis detection method provided by the present invention, and as shown in fig. 4, step 130 includes:

Step 131, determining a segment text containing the candidate disease in the medical record text;

Step 132, inputting the candidate disease and the segment text of the candidate disease into a confirmed diagnosis detection model, wherein the confirmed diagnosis detection model determines the context semantics of the candidate disease based on the segment text and performs confirmed diagnosis detection based on the context semantics, to obtain a confirmed diagnosis detection result output by the confirmed diagnosis detection model;

The confirmed diagnosis detection model is trained based on sample diseases in sample medical records, the sample segment texts of the sample diseases, and confirmed diagnosis labels.

Specifically, after medical data mining of the medical record text is completed, the candidate diseases contained in the medical record text are obtained. On this basis, a segment text containing a candidate disease can be located in the medical record text, where the segment text may be a single clause, at least two clauses centered on the candidate disease, a passage, or the like. For example, if the candidate disease is "asthma", the segment text corresponding to the candidate disease may be "possibility of asthma to be prevented".
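Locating the segment text can be sketched as a simple clause split around the candidate disease. The punctuation set, the `window` parameter (number of neighbouring clauses kept on each side), and the sample sentence are assumptions made for illustration; splitting on "." is a simplification that would need refinement for decimals and abbreviations.

```python
import re

def find_segments(record_text, disease, window=0):
    """Return clauses containing `disease`, plus `window` clauses on each side."""
    clauses = [c.strip() for c in re.split(r"[,.;，。；]", record_text) if c.strip()]
    segments = []
    for i, clause in enumerate(clauses):
        if disease in clause:
            lo = max(0, i - window)
            hi = min(len(clauses), i + window + 1)
            segments.append(", ".join(clauses[lo:hi]))
    return segments

text = "Cough for 3 days, wheezing at night; asthma to be ruled out. No fever."
print(find_segments(text, "asthma"))
print(find_segments(text, "asthma", window=1))
```

With `window=0` the segment is the single clause containing the disease; with `window=1` it is three clauses centered on it.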

After the segment text of the candidate disease is obtained, the candidate disease and its segment text can be input into the confirmed diagnosis detection model. The candidate disease and the segment text may be input separately, or the segment text may be input together with the position code of the candidate disease in the segment text, which is not specifically limited in the embodiment of the present invention.

The confirmed diagnosis detection model can extract semantic information from the input segment text, thereby determining the context semantics of the input candidate disease, and can judge, based on the context semantics, whether the candidate disease is confirmed in the medical record text. It then outputs the confirmed diagnosis detection result of the candidate disease, namely that the candidate disease is a confirmed disease or that it is not confirmed.

Before step 132 is executed, the confirmed diagnosis detection model may be obtained through pre-training. The training process may be implemented through the following steps: first, collect a large number of sample medical records, label the sample diseases contained in them, locate the sample segment texts of the sample diseases in the sample medical records, and label whether each sample disease is a confirmed disease of its sample medical record. Model training is then carried out on these samples to obtain the confirmed diagnosis detection model.

Step 133, performing missed diagnosis detection on the medical record text based on the confirmed diagnosis detection result.

Specifically, each candidate disease whose confirmed diagnosis detection result is "confirmed" can be compared with the diseases listed in the discharge diagnosis, so as to determine whether there is a missed diagnosis.

Here, when a confirmed candidate disease is compared with the diseases listed in the discharge diagnosis, the similarity between the name of the confirmed candidate disease and the name of each disease listed in the discharge diagnosis can be calculated directly. If the discharge diagnosis contains a disease whose name similarity is greater than a similarity threshold, the candidate disease is determined not to be missed; otherwise, the candidate disease is determined to be missed.

Alternatively, a pre-trained disease representation model can be applied to obtain the disease representation of the confirmed candidate disease and the disease representation of each disease listed in the discharge diagnosis, and the similarity between these representations is then calculated. If the discharge diagnosis contains a disease whose representation similarity is greater than the similarity threshold, the candidate disease is determined not to be missed; otherwise, the candidate disease is determined to be missed.

Alternatively, a disease comparison model can be trained on positive and negative disease pairs of sample diseases, where a positive disease pair contains two names of the same disease and a negative disease pair contains the names of two different diseases, so that the trained disease comparison model can automatically judge whether two input diseases are the same disease. The confirmed candidate disease and each disease listed in the discharge diagnosis are input into the disease comparison model to obtain the comparison result output by the model. When the comparison result indicates the same disease, the candidate disease is determined not to be missed; when the comparison results all indicate different diseases, the candidate disease is determined to be missed. For candidate diseases whose confirmed diagnosis detection result is "not confirmed", it can be directly determined that there is no missed diagnosis.
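The first comparison strategy, direct name similarity, can be sketched as below. The character-level `difflib` ratio and the threshold 0.8 are stand-ins chosen for illustration; the representation-based and model-based comparisons described above would replace the `SequenceMatcher` call with a learned similarity.

```python
from difflib import SequenceMatcher

def detect_missed(confirmed, discharge, threshold=0.8):
    """Return confirmed candidate diseases with no similar discharge diagnosis."""
    missed = []
    for candidate in confirmed:
        matched = any(
            SequenceMatcher(None, candidate, listed).ratio() > threshold
            for listed in discharge
        )
        if not matched:
            missed.append(candidate)  # confirmed in the record, absent at discharge
    return missed

missed = detect_missed(
    confirmed=["hypertension", "gout"],
    discharge=["hypertension", "type 2 diabetes"],
)
print(missed)
```

Candidates that were not confirmed never reach this function, which mirrors the rule that they are judged as not missed.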

According to the method provided by the embodiment of the invention, the context semantics of the candidate diseases are obtained from the segment texts of the candidate diseases and used for confirmed diagnosis detection, which improves the reliability of missed diagnosis detection even when the medical record text is written in a complex manner.

Based on any of the above embodiments, step 132 includes:

Inputting the candidate disease, the segment text of the candidate disease, and the chapter of the segment text in the medical record text and/or the disease type of the candidate disease into the confirmed diagnosis detection model, wherein the confirmed diagnosis detection model determines the context semantics of the candidate disease based on the segment text and performs confirmed diagnosis detection by combining the context semantics with the chapter and/or the disease type, to obtain the confirmed diagnosis detection result output by the confirmed diagnosis detection model.

Specifically, during the confirmed diagnosis detection, the position of the candidate disease in the medical record text has a great influence on whether the candidate disease is confirmed in the medical record text, for example, most of the diseases contained in the "past history" of the medical record text are not the diseases confirmed at this time. Therefore, when the diagnosis of the candidate diseases is performed, not only the context semantics of the candidate diseases can be considered, but also the positions of the candidate diseases in the medical record text, that is, the chapters of the segment text in the medical record text, can be considered.

Correspondingly, in the input stage of the confirmed diagnosis detection model, the candidate diseases and the segment texts of the candidate diseases can be input, and the sections of the segment texts in the medical record texts can be simultaneously input, so that the context semantics of the candidate diseases and the sections of the segment texts can be referred to for analysis when the confirmed diagnosis detection model carries out confirmed diagnosis detection. Here, the chapter name may be directly input for the input of the chapter, or a code corresponding to the chapter name may be input, which is not specifically limited in the embodiment of the present invention.

Further, the chapter of the segment text in the medical record text can be determined from the attribution relationship between segment texts and chapters in a structured medical record text. Alternatively, the attribution relationship between each segment text and each chapter in sample medical record texts can be labeled and used to train a chapter recognition model; the segment text is then input into the pre-trained chapter recognition model, which identifies the chapter to which the segment text belongs.

In addition, owing to the nature of diseases themselves, some diseases can be cured in a short period of time, some require a longer treatment period, and some may never be cured. For example, traumatic bleeding can be cured in a short time: if traumatic bleeding was diagnosed at a previous visit, that diagnosis does not persist in the current medical record text. By contrast, cerebral infarction cannot be cured: if cerebral infarction was diagnosed at the previous visit, that diagnosis still holds in the current medical record text. Thus, when performing confirmed diagnosis detection for a candidate disease, not only the context semantics of the candidate disease but also its disease type may be taken into account, where the disease type is intended to reflect the persistence of the disease; for example, the disease type may be short-term cure or long-term treatment.

Further, the disease type of the candidate disease may be directly determined according to a preset correspondence between various diseases and the disease type, or a type recognition model may be trained according to a disease type corresponding to a pre-labeled sample disease, after the type recognition model is obtained, the disease name of the candidate disease may be input into the trained type recognition model, and the type recognition model performs type recognition on the input candidate disease, thereby obtaining the disease type of the candidate disease.

Correspondingly, in the input stage of the confirmed diagnosis detection model, the candidate diseases and the fragment texts of the candidate diseases can be input, and the disease types of the candidate diseases can be simultaneously input, so that the confirmed diagnosis detection model can be analyzed by referring to the context semantics of the candidate diseases and the disease types when the confirmed diagnosis detection is carried out. Here, the disease type may be directly input or a code corresponding to the disease type may be input for the input of the disease type, which is not particularly limited in the embodiment of the present invention.

In addition, in the input stage of the confirmed diagnosis detection model, the chapters of the segment texts in the medical record text and the disease types of the candidate diseases can be simultaneously input while the segment texts of the candidate diseases and the candidate diseases are input, so that the confirmed diagnosis detection model can refer to the context semantics of the candidate diseases, the chapters of the segment texts in the medical record text and the disease types for analysis when the confirmed diagnosis detection is carried out, and a more reliable confirmed diagnosis detection result is obtained.

The method provided by the embodiment of the invention combines the chapters of the segment texts of the candidate diseases in the medical record text and/or the disease types of the candidate diseases in the definite diagnosis process, so that whether the situation described by the segment texts is related to the diagnosis can be judged from the chapters in the definite diagnosis process, and whether the candidate diseases have persistence or not can be judged from the disease types, thereby providing more reference information for whether the candidate diseases are the definite disease of the diagnosis, and being beneficial to improving the reliability and accuracy of definite diagnosis detection.

Based on any of the above embodiments, fig. 5 is a schematic structural diagram of the confirmed diagnosis detection model provided by the present invention. As shown in fig. 5, the inputs of the confirmed diagnosis detection model include the candidate disease, such as "asthma", the segment text of the candidate disease, such as "possibility of preventing asthma", and the disease type code and chapter code of the candidate disease. The output is a classification result indicating whether the candidate disease is confirmed: if yes, the candidate disease may have been missed; if not, the candidate disease has not been missed.

The segment text can be mapped into two parts, word codes and position codes. The word codes are the word vectors of the words in the segment text and can be expressed as $H = [h_0, \ldots, h_n]$, $h_i \in \mathbb{R}^s$, where $h_i$ is the word vector of the i-th word in the segment text, n is the length of the segment text, and s is the coding dimension. The position codes represent the relative position of each word in the segment text with respect to the candidate disease. For example, in fig. 5 the position code is the distance and direction from each word in the segment text to the candidate disease characters: -1 means the word is 1 word away from the candidate disease characters and lies to their left, 2 means the word is 2 words away and lies to their right, and 0 means the word belongs to the candidate disease characters. When the distance exceeds a preset threshold, for example 30, the position code is still set to -30 or 30. Better position codes can subsequently be learned during training. The position codes can be expressed as $P = [p_0, \ldots, p_n]$, $p_i \in \mathbb{R}^s$, where $p_i$ is the position code of the i-th word.
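The signed, clipped relative positions described above can be computed as in the following sketch. It returns the integer position indices only; mapping each index to an s-dimensional learned vector is left out. The function name and the half-open span convention `[start, end)` are assumptions made for illustration.

```python
def position_codes(tokens, start, end, max_dist=30):
    """Signed distance of each token to the disease span tokens[start:end].

    Negative values lie to the left of the span, positive values to the
    right, 0 marks tokens inside the span; distances are clipped to
    [-max_dist, max_dist].
    """
    codes = []
    for i in range(len(tokens)):
        if i < start:
            d = i - start        # left of the candidate disease
        elif i >= end:
            d = i - end + 1      # right of the candidate disease
        else:
            d = 0                # part of the candidate disease
        codes.append(max(-max_dist, min(max_dist, d)))
    return codes

tokens = ["possible", "asthma", "to", "be", "excluded"]
print(position_codes(tokens, start=1, end=2))
```

For a character-based language the tokens would simply be single characters, and the disease span would cover several consecutive positions with code 0.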

In fig. 5, the chapter code is shown as the box filled with oblique lines at the rightmost end of the row of word codes. Different chapters may correspond to different chapter codes. The chapter code here has dimension s and may be spliced directly onto the end of the word code sequence, in which case the length of the word code sequence is n + 1.

Furthermore, the input disease type can be expressed in a form of disease type code with dimension s, and the disease type code can be added to each character of the candidate disease in the segment text.

Thereafter, the word encoding (including the chapter code) and the position encoding of the segment text may be concatenated to obtain the context semantics, where the representation of each word is $e_i = [h_i; p_i]$, $i = 1, \dots, n+1$, $e_i \in \mathbb{R}^{2s}$. The context semantics are then encoded by a multi-layer attention encoder, so that the candidate disease and the context semantics interact fully and information describing whether the candidate disease is confirmed is obtained from the segment text. The encoding result is $T = [t_0, \dots, t_n]$, $t_i \in \mathbb{R}^{2s}$. A Transformer may be selected as the multi-layer attention encoder to facilitate full interaction among the codes corresponding to the candidate disease, the context semantics, and the chapter code, so that the codes corresponding to the candidate disease in the output of the multi-layer encoder fuse the context semantics and the chapter code information.

The codes at the positions corresponding to the disease name are then summed to obtain a representation of whether the disease is confirmed, $\sum_j h_j$, $h_j \in \mathbb{R}^{2s}$, where j ranges over the position indices at which the disease name is located, i.e., the positions whose position code is 0. A nonlinear transformation is then applied through multiple fully connected layers; the parameters of the last layer may be $W \in \mathbb{R}^{s \times 1}$, $b \in \mathbb{R}^{1}$, with sigmoid as the activation function. The output obtained in this way is a probability value: if the output value is greater than a threshold, the candidate disease is a confirmed disease, and if the output value is smaller than the threshold, the candidate disease is an unconfirmed disease.
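The pooling and classification step can be sketched as follows. This is a simplified illustration, not the model of the embodiment: the multiple fully connected layers are collapsed into a single sigmoid layer, and the function name `confirmed_probability`, the toy 2-dimensional encoder outputs, the weights, and the 0.5 threshold are all assumptions.

```python
import math


def confirmed_probability(encoded, position_ids, weights, bias):
    """Sum encoder outputs at disease-name positions, then apply one sigmoid layer."""
    dim = len(encoded[0])
    pooled = [0.0] * dim
    for vector, pos in zip(encoded, position_ids):
        if pos == 0:  # position code 0 marks a token of the disease name
            pooled = [p + v for p, v in zip(pooled, vector)]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy encoder outputs for a 4-token segment; tokens 1 and 2 are the disease name.
encoded = [[0.1, 0.2], [0.5, -0.3], [0.4, 0.1], [0.0, 0.9]]
position_ids = [-1, 0, 0, 1]
prob = confirmed_probability(encoded, position_ids, weights=[1.0, -1.0], bias=0.0)
is_confirmed = prob > 0.5  # hypothetical decision threshold
print(prob, is_confirmed)
```

Only the vectors at positions coded 0 contribute to the pooled representation, which matches the summation over the disease-name positions described above.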

The confirmed diagnosis detection model may use cross entropy as the loss function during training, may be optimized with the Adam method, and learns all parameters of the model through back propagation.
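For reference, the cross entropy for a binary confirmed/unconfirmed output is commonly written as below. The symbols N (number of training samples), $y_k$ (the 0/1 label of sample k) and $\hat{y}_k$ (the sigmoid output for sample k) are notation assumed here, not taken from the source.

```latex
\mathcal{L}_{\mathrm{CE}} = -\frac{1}{N} \sum_{k=1}^{N}
  \Big[\, y_k \log \hat{y}_k + (1 - y_k) \log\big(1 - \hat{y}_k\big) \Big]
```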

Based on any of the above embodiments, in step 133, when missed diagnosis detection is performed on the medical record text based on the confirmed diagnosis detection result, the candidate diseases whose confirmed diagnosis detection result is confirmed may be compared with the diseases listed in the discharge diagnosis, so as to determine whether there is a missed diagnosis.

However, due to the complexity of natural language, the clinical name of the same disease may be written in many different ways; for example, "upper respiratory infection" and "upper respiratory tract infection" are different expressions of the same disease. Simple text matching therefore cannot reliably determine whether a disease confirmed in the medical record appears in the discharge diagnosis list.

Based on this, fig. 6 is one of the flow diagrams of step 133 in the medical record missed diagnosis detection method provided by the present invention, and as shown in fig. 6, step 133 includes:

Step 1331, taking each candidate disease whose confirmed diagnosis detection result is confirmed as a candidate missed diagnosis disease.

Step 1332, inputting the candidate missed diagnosis disease and each diagnosis disease corresponding to the medical record text into the disease representation model respectively, and obtaining the disease representation of the candidate missed diagnosis disease and the disease representation of each diagnosis disease output by the disease representation model; the disease representation model is trained based on the clinical name of each disease, together with its positive case disease coding standard name and negative case clinical name.

Step 1333, screening the candidate missed-diagnosis diseases based on the similarity between the disease representation of the candidate missed-diagnosis diseases and the disease representation of each diagnosis disease.

Specifically, a candidate disease whose confirmed diagnosis detection result is confirmed is a disease that may be at risk of missed diagnosis, and is therefore taken as a candidate missed diagnosis disease.

Mechanically comparing text cannot cover the various clinical names of the same disease and therefore causes false alarms in missed diagnosis detection. To avoid this problem, the embodiment of the invention uses a pre-trained disease representation model to convert the candidate missed diagnosis diseases and the diagnosis diseases to be compared into disease representations, so that the interference of differing clinical names can be ignored and accurate, reliable comparison is achieved. Here, the diagnosis diseases may be the diseases in the discharge diagnosis list.

To ensure that the various clinical names of the same disease obtain consistent disease representations from the disease representation model, the disease code can be taken as a reference. Using the correspondence between the various clinical names of a disease and its single disease code, the disease representation of each clinical name is brought as close as possible to the disease representation of the standard disease name of that disease code, i.e., the disease coding standard name, which shortens the distance between the disease representations of the various clinical names of the same disease. The disease code may be an International Classification of Diseases (ICD) code, or may be another coding scheme for classifying diseases.

Accordingly, before step 1332 is executed, the disease representation model needs to be obtained through pre-training. The disease representation model can be trained based on the clinical name of each disease together with its positive case disease coding standard name and negative case clinical name. For the clinical name of any disease, the positive case disease code is the disease code to which the disease actually corresponds, and the positive case disease coding standard name is the standard disease name of that disease code. The negative case clinical name is the name of a disease different from the one the clinical name refers to; it may be any of the varied clinical names of another disease, or the standard disease name of the disease code corresponding to another disease.

Specifically, during training the disease representation model outputs a predicted representation for each input name: for the clinical name or disease coding standard name of each disease, for the positive case disease coding standard name of each disease, and for the negative case clinical name of each disease. A predicted representation here is the representation vector output during training. Because the clinical name of a disease and its positive case disease code reflect the same disease, the distance between the predicted representation of the clinical name and the predicted representation of the positive case disease coding standard name can be reduced as much as possible during model training. Conversely, because the clinical name of a disease and its negative case clinical name reflect different diseases, the distance between their predicted representations can be enlarged as much as possible during model training. As a result, the trained disease representation model outputs, for different clinical names under the same disease code, disease representations that are all close to the disease representation of the disease coding standard name, so that the disease representations of different clinical names of the same disease are free of the interference of the clinical names and reflect the nature of the disease itself.

Based on the thus obtained disease representation of the candidate missed diagnosis disease and the disease representations of the various diagnosis diseases, a similarity between the disease representation of the candidate missed diagnosis disease and the disease representations of the various diagnosis diseases can be calculated, wherein the higher the similarity between the disease representation of the candidate missed diagnosis disease and the disease representation of any diagnosis disease is, the more likely the candidate missed diagnosis disease is to be the diagnosis disease, i.e., the higher the probability that the candidate missed diagnosis disease is recorded in the discharge diagnosis is, the more unlikely the candidate missed diagnosis disease is to be the missed diagnosis disease; the lower the similarity between the disease representation of the candidate missed diagnosis disease and the disease representation of each diagnosed disease, the less likely the candidate missed diagnosis disease is to be a listed item of diagnosed disease, i.e. the lower the probability that the candidate missed diagnosis disease is recorded in the discharge diagnosis, the more likely the candidate missed diagnosis disease is to be a missed diagnosis disease.

Specifically, when candidate missed diagnosis diseases are screened out based on the similarity between the disease representation of the candidate missed diagnosis diseases and the disease representation of each diagnosis disease, a similarity threshold value can be preset, if the similarity between the disease representation of the candidate missed diagnosis diseases and the disease representation of any diagnosis disease is greater than the similarity threshold value, the candidate missed diagnosis diseases can be considered as the diagnosis diseases, no risk of missed diagnosis exists, and the candidate missed diagnosis diseases can be deleted; if the similarity between the disease representation of the candidate missed diagnosis disease and the disease representation of each diagnosis disease is less than the similarity threshold, the candidate missed diagnosis disease can be considered not to be listed in the discharge diagnosis, and the candidate missed diagnosis disease can be determined as the missed diagnosis disease.
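The threshold-based screening can be sketched as follows, assuming cosine similarity as the similarity measure. The function names, the 2-dimensional toy vectors, the disease names other than "asthma", and the threshold of 0.8 are hypothetical; the source does not state how a similarity exactly equal to the threshold is handled, so treating it as "not listed" is also an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def screen_missed(candidates, diagnoses, threshold=0.8):
    """Return candidates that are not similar enough to any listed diagnosis."""
    missed = []
    for name, vector in candidates.items():
        best = max(
            (cosine_similarity(vector, d) for d in diagnoses.values()),
            default=-1.0,  # empty discharge list: nothing can match
        )
        if best > threshold:
            continue  # treated as already listed in the discharge diagnosis
        missed.append(name)
    return missed


# Toy disease representations; real ones would come from the trained model.
candidates = {"asthma": [1.0, 0.0], "hypertension": [0.0, 1.0]}
diagnoses = {"bronchial asthma": [0.9, 0.1]}
missed = screen_missed(candidates, diagnoses)
print(missed)
```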

According to the method provided by the embodiment of the invention, the disease representation model is trained by means of the corresponding relation between various clinical names of the same disease and the same ICD code, so that different clinical names of the same disease can correspond to the same disease representation, the interference of the clinical names is ignored, accurate and reliable disease comparison is realized, and the accuracy of missed diagnosis detection is ensured.

Based on any of the above embodiments, the loss function of the disease representation model is determined based on the similarity between the predicted representation of the clinical name of each disease and the predicted representation of the positive case disease coding standard name of each disease, and the similarity between the predicted representation of the clinical name of each disease and the predicted representation of the negative case clinical name of each disease;

the predicted representation of the clinical name of the disease is determined by the disease representation model during training based on the clinical name of the disease, and the predicted representation of the disease coding standard name is determined by the disease representation model during training based on the disease coding standard name.

Specifically, during the training process of the disease representation model, the disease representation model may encode the input clinical name of the disease, thereby outputting a predicted representation of the clinical name of the disease, where the predicted representation is the disease representation output by the model for the clinical name of the disease during the training process. Similarly, the disease representation model may encode the input disease coding standard name to output a predicted representation of the disease coding standard name, where the predicted representation is the disease representation output by the model for the disease coding standard name during the training process.

Considering that the same disease may have a plurality of different clinical names, the loss function of the disease representation model may be constructed during training with the goal of maximizing the similarity between the predicted representation of the clinical name of each disease and the predicted representation of its positive case disease coding standard name, and minimizing the similarity between the predicted representation of the clinical name of each disease and the predicted representation of its negative case clinical name. The greater the similarity between the predicted representation of the clinical name of a disease and the predicted representation of its positive case disease coding standard name, the closer the predicted representation of the disease is to that of the corresponding disease coding standard name, the smaller the interference of the different clinical names of the disease on the predicted representation, and the better the problem of diseases failing to match because of different clinical names is avoided. The smaller the similarity between the predicted representation of a disease and the predicted representation of its negative case clinical name, the larger the difference between the predicted representation of the disease and those of unrelated diseases, and the clearer the distinction between the clinical names of the disease and the predicted representations of other diseases, thereby avoiding confusion between similar clinical names of different diseases.

Based on any of the above embodiments, fig. 7 is a schematic diagram of the training of the disease representation model provided by the present invention. As shown in fig. 7, the dashed block on the left shows how the samples required for training the disease representation model are constructed, namely positive sample pairs consisting of the clinical name of each disease and its positive case disease coding standard name, and negative sample pairs consisting of the clinical name of each disease and its negative case clinical name. Here the disease code is the ICD code. In fig. 7, a line marked √ indicates that the connected disease name is the positive case ICD coding standard name of the current clinical name, and a line marked × indicates that the connected disease name is a negative case of the current clinical name. Taking clinical name 1 as an example, only the ICD coding standard name corresponding to that clinical diagnosis is a positive case, and every other name in the discharge diagnosis list is a negative case. Likewise, the only positive case of ICD coding standard name 1 is clinical name 1, and the clinical names and ICD standard names of the other discharge diagnoses are negative cases. The positive and negative cases of the other diagnosis names (including clinical names and ICD standard names) are constructed in the same way, and all the positive and negative cases generated from the discharge diagnosis list of one medical record form a batch for training.
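The sample construction of fig. 7 can be sketched as follows. The function `build_pairs` and the placeholder names are illustrative assumptions, and the sketch further assumes that every name in one discharge diagnosis list is a distinct string.

```python
def build_pairs(discharge_list):
    """Build positive and negative name pairs from one discharge diagnosis list.

    discharge_list: list of (clinical_name, icd_standard_name) tuples.
    Each name is paired with every other name in the list; the only positive
    partner of a name is the other half of its own tuple.
    """
    names = []
    partner = {}
    for clinical, standard in discharge_list:
        names.extend([clinical, standard])
        partner[clinical] = standard
        partner[standard] = clinical

    positives, negatives = [], []
    for anchor in names:
        for other in names:
            if other == anchor:
                continue
            if partner[anchor] == other:
                positives.append((anchor, other))
            else:
                negatives.append((anchor, other))
    return positives, negatives


# Placeholder names mirroring the labels used in fig. 7.
discharge_list = [
    ("clinical name 1", "ICD standard name 1"),
    ("clinical name 2", "ICD standard name 2"),
]
positives, negatives = build_pairs(discharge_list)
print(len(positives), len(negatives))
```

All pairs produced from one medical record would then form a single training batch, as described above.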

In fig. 7, the dashed block on the right shows the training framework of the disease representation model. Specifically, in the contrastive learning of the disease representation model, the model architecture has two symmetric branches, an upper one and a lower one. Taking the comparison between clinical name 1 and ICD standard name 1 as an example, the inputs of the upper and lower branches are the clinical name and the ICD coding standard name, respectively. The two branches have the same structure, share the same parameters, and map their inputs to the same representation space after multi-layer coding. Finally, a similarity metric function measures the similarity between the outputs of the upper and lower branches. In contrastive learning, a positive pair is expected to be as close as possible in the representation space, and a negative pair is expected to be as far apart as possible.

Further, the encoding process of any branch (the upper and lower branches are the same) is taken as an example for explanation:

A disease name is input and first mapped, character by character, to word vectors $E \in \mathbb{R}^{l \times e}$, where l is the number of characters and e is the word vector dimension. The encoder then encodes the word vectors of the disease name to obtain the coded representation of the disease. The encoder may take the form of TextCNN (Text Convolutional Neural Networks) or LSTM (Long Short-Term Memory Networks), and the resulting coded representation may be $H = \mathrm{TextCNN}(E)$, $H \in \mathbb{R}^{e}$; for example, the coded representation of the i-th disease name may be $h_i$. The 3 filter_sizes (convolution kernel sizes) of TextCNN may be 3, 4, and 5. The Projector is a transformation that maps the coded representation of a disease name to a more abstract, task-related dimension to yield the disease representation, which may be $Z = \mathrm{Projector}(H) = \mathrm{FC}(\mathrm{ReLU}(\mathrm{FC}(H)))$, $Z \in \mathbb{R}^{e}$, where FC denotes a fully connected network and ReLU denotes the activation function; for example, the disease representation of the i-th disease name may be $Z_i$.
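The Projector of the form FC(ReLU(FC(H))) can be sketched in plain Python as below. The encoder (TextCNN or LSTM) is not implemented; the vector `h` is a stand-in for its output, and the helper names, random initialization range, and dimension e = 4 are assumptions for illustration.

```python
import random


def linear(x, weight, bias):
    """Fully connected layer; weight holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]


def make_layer(in_dim, out_dim, rng):
    """Randomly initialized (weight, bias) pair for a fully connected layer."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def projector(h, layer1, layer2):
    """Z = FC(ReLU(FC(h)))."""
    return linear(relu(linear(h, *layer1)), *layer2)


rng = random.Random(0)
e = 4                            # hypothetical representation dimension
layer1 = make_layer(e, e, rng)
layer2 = make_layer(e, e, rng)
h = [0.2, -0.1, 0.4, 0.0]        # stand-in for the encoder output of one name
z = projector(h, layer1, layer2)
print(len(z))
```

Because both branches share parameters, the same `layer1` and `layer2` would be applied to the clinical name and to the ICD coding standard name.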

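The similarity metric function may be cosine similarity, Euclidean distance, or the like. Taking cosine similarity as an example, it may be represented as:

$$\mathrm{sim}(Z_i, Z_j) = \frac{Z_i^{\top} Z_j}{\lVert Z_i \rVert_2 \, \lVert Z_j \rVert_2}$$

where $\lVert Z \rVert_2$ denotes the L2 norm, and $Z_i$ and $Z_j$ are the disease representations of the i-th and j-th disease names, respectively.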

Based on any of the above embodiments, the training loss function of the disease representation model may use the InfoNCE loss. In its standard form, the InfoNCE loss corresponding to a certain disease name is:

$$\mathcal{L}_i = -\log \frac{\exp\left(\mathrm{sim}(Z_i, Z^{+})/t\right)}{\sum_{k=1,\, k \neq i}^{K} \exp\left(\mathrm{sim}(Z_i, Z_k)/t\right)}$$

where $\mathrm{sim}(\cdot,\cdot)$ is the similarity metric function, $Z^{+}$ is the positive example of $Z_i$, K is the number of disease names in one batch, and t is a temperature coefficient that can be set according to training requirements, for example to 0.1. In one training step, the similarity between a given disease name and all other names in the batch is calculated; only the clinical name and its ICD standard name form a positive example, and all others are negative examples. Through training, a clinical name and its ICD standard name move closer together in the representation space while other disease names move farther away, so that a better disease representation is learned in the process.
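The InfoNCE loss for one disease name can be sketched as follows, assuming cosine similarity and the standard form in which the denominator sums over the positive and all negatives. The function names and the 2-dimensional toy vectors are hypothetical; the temperature of 0.1 follows the example value given in the text.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss of one anchor against its positive and its negatives."""
    pos = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg = sum(
        math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives
    )
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0]
# Positive aligned with the anchor, negative orthogonal: loss is close to 0.
loss_good = info_nce(anchor, positive=[1.0, 0.0], negatives=[[0.0, 1.0]])
# Positive orthogonal, negative aligned: loss is large.
loss_bad = info_nce(anchor, positive=[0.0, 1.0], negatives=[[1.0, 0.0]])
print(loss_good, loss_bad)
```

The loss falls as the positive pair becomes more similar than the negatives, which is the behaviour the training objective relies on.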

Based on any of the above embodiments, fig. 8 is the second flow diagram of step 133 in the medical record missed diagnosis detection method provided by the present invention. As shown in fig. 8, the disease representation model is a model with an Encoder + Projector structure obtained by training in the manner shown in fig. 7. The candidate missed diagnosis disease and each diagnosis disease in the discharge diagnosis list are respectively input into the disease representation model to obtain the disease representation of the candidate missed diagnosis disease and the disease representation of each diagnosis disease output by the model. On this basis, the similarity Sim between the disease representation of the candidate missed diagnosis disease and the disease representation of each diagnosis disease is calculated, so as to reach a conclusion on whether the candidate missed diagnosis disease has been missed.

In addition, the flowchart shown in fig. 8 may also be used as a whole model flow of a disease comparison model, where the input of the disease comparison model is the candidate missed diagnosis disease and each diagnosis disease in the discharge diagnosis list, and the output is whether the candidate missed diagnosis disease is missed, and the disease comparison model may take on the tasks of disease representation coding, disease representation similarity calculation, and output of a missed diagnosis conclusion of the candidate missed diagnosis disease and each diagnosis disease in the discharge diagnosis list. The disease comparison model can be obtained by training based on the parameters of the disease representation model and by using whether each candidate missed diagnosis disease in the sample medical record is missed diagnosis or not as a label on the basis of the disease representation model obtained by training.

In particular, during the training of the disease comparison model, cross entropy may be used as the loss function. The parameters of the Encoder module are initialized with the pre-training results of the disease representation model and updated during training. The parameters of the Projector module are initialized randomly and updated during training.

Based on any of the above embodiments, fig. 9 is a second schematic flow chart of the medical record missed diagnosis detection method provided by the present invention, and as shown in fig. 9, the medical record missed diagnosis detection method includes the following steps:

Firstly, a medical record text to be detected is determined.

Secondly, medical data mining is performed on the medical record text to obtain diseases that may have been missed.

Here, the medical data mining is divided into explicit candidate disease mining and implicit candidate disease mining. For explicit candidate diseases, disease named entity recognition can be performed directly on the medical record text; for implicit candidate diseases, pre-collected diagnostic gold standards and/or specific drug information can be used for mining.

After the candidate diseases are obtained by mining, the diagnosis omission judgment can be carried out in two steps:

Firstly, a preliminary missed diagnosis judgment is made based on context semantics. Confirmed diagnosis detection can be performed based on the context semantics of the segment text containing the candidate disease in the medical record text. If the confirmed diagnosis detection result of the candidate disease is confirmed, the candidate disease may be at risk of missed diagnosis; it is marked as a candidate missed diagnosis disease and passed to the comparison-based missed diagnosis judgment for a second check. If the confirmed diagnosis detection result of the candidate disease is not confirmed, the possibility that the candidate disease is a missed diagnosis disease is excluded, and the candidate disease is directly classified as a non-missed diagnosis disease.

Then, for the candidate missed diagnosis disease, performing comparison-based missed diagnosis re-judgment: the candidate missed diagnosis diseases and the diagnosis diseases in the discharge diagnosis list can be respectively input into the disease representation model to obtain the candidate missed diagnosis diseases and the disease representations of the diagnosis diseases which are respectively output by the disease representation model, and on the basis, the similarity between the disease representation of the candidate missed diagnosis diseases and the disease representation of each diagnosis disease is respectively calculated, so that the conclusion whether the candidate missed diagnosis diseases are missed diagnosis or not can be obtained on the basis.
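The two-step judgment can be sketched as a small pipeline. The function `detect_missed_diagnoses`, its result strings, and the threshold are illustrative assumptions, and the two callables passed in below are toy stand-ins for the trained confirmed diagnosis detection model and the disease representation similarity, not the models of the embodiment.

```python
def detect_missed_diagnoses(candidates, discharge_diagnoses,
                            is_confirmed, similarity, threshold=0.8):
    """Two-step missed diagnosis judgment with a reason for every candidate.

    candidates: list of (disease_name, segment_text) tuples.
    is_confirmed(disease, segment) -> bool: context-based confirmed diagnosis check.
    similarity(a, b) -> float: similarity between two disease names.
    """
    results = {}
    for disease, segment in candidates:
        # Step 1: preliminary judgment from the context semantics.
        if not is_confirmed(disease, segment):
            results[disease] = "not missed: not confirmed in context"
            continue
        # Step 2: comparison against the discharge diagnosis list.
        best = max((similarity(disease, d) for d in discharge_diagnoses), default=0.0)
        if best > threshold:
            results[disease] = "not missed: listed in discharge diagnosis"
        else:
            results[disease] = "missed"
    return results


def toy_is_confirmed(disease, segment):
    # Toy rule standing in for the trained confirmed diagnosis detection model.
    return "rule out" not in segment


def toy_similarity(a, b):
    # Toy exact-match score standing in for disease representation similarity.
    return 1.0 if a == b else 0.0


candidates = [
    ("asthma", "history of asthma for 10 years"),
    ("pneumonia", "rule out pneumonia"),
    ("diabetes", "diagnosed with diabetes"),
]
results = detect_missed_diagnoses(candidates, ["diabetes"], toy_is_confirmed, toy_similarity)
print(results)
```

Recording a reason string for each candidate reflects the point that the basis of every judgment can be traced in the final output.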

The method provided by the embodiment of the invention needs to find out all candidate diseases and carries out secondary judgment on each candidate disease to ensure that the candidate disease is a real missed diagnosis disease. The judgment basis of each missed disease in the medical record can be found in the final output result, and the interpretability is strong.

Based on any of the above embodiments, fig. 10 is a schematic structural diagram of a medical record missed diagnosis detection apparatus provided by the present invention, and as shown in fig. 10, the apparatus includes:

a text determining unit 1010, configured to determine a medical record text to be detected;

a medical data mining unit 1020, configured to perform medical data mining on the medical record text to obtain candidate diseases included in the medical record text;

a missed diagnosis detection unit 1030, configured to perform missed diagnosis detection on the medical record text based on context semantics of the candidate disease in the medical record text.

According to the device provided by the embodiment of the invention, the candidate diseases in the medical record text are mined, and no manual operation is needed in the process of performing missed diagnosis detection on the medical record text based on the candidate diseases, which saves time and labor. While the efficiency of medical record missed diagnosis detection is improved, missed diagnosis detection is performed on the medical record text based on the context semantics of the candidate diseases in the medical record text, so that accurate and reliable missed diagnosis detection can be achieved even when the medical record text is written in a complicated way.

According to any of the above embodiments, the medical data mining unit 1020 is configured to:

Based on any embodiment, the implicit mining rule is determined based on the association degree between the diagnosis and treatment information and the disease in the disease knowledge text and/or the sample medical record.

Based on any of the above embodiments, the method further includes a mining rule determining unit, configured to:

Based on any of the above embodiments, the missed diagnosis detection unit 1030 is configured to:

Based on any of the above embodiments, the missed diagnosis detection unit 1030 is specifically configured to:

Fig. 11 illustrates a physical structure diagram of an electronic device. As shown in fig. 11, the electronic device may include: a processor 1110, a communications interface 1120, a memory 1130, and a communication bus 1140, wherein the processor 1110, the communications interface 1120, and the memory 1130 communicate with each other via the communication bus 1140. The processor 1110 may invoke logic instructions in the memory 1130 to perform a medical record missed diagnosis detection method comprising: determining a medical record text to be detected; performing medical data mining on the medical record text to obtain candidate diseases contained in the medical record text; and performing missed diagnosis detection on the medical record text based on the context semantics of the candidate diseases in the medical record text.

In addition, the logic instructions in the memory 1130 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions, which when executed by a computer, enable the computer to perform the medical record missing diagnosis detection method provided by the above methods, the method comprising: determining a medical record text to be detected; performing medical data mining on the medical record text to obtain candidate diseases contained in the medical record text; and performing missed diagnosis detection on the medical record text based on the context semantics of the candidate diseases in the medical record text.

In yet another aspect, the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, is implemented to perform the medical record missed diagnosis detection method provided above, the method comprising: determining a medical record text to be detected; performing medical data mining on the medical record text to obtain candidate diseases contained in the medical record text; and performing missed diagnosis detection on the medical record text based on the context semantics of the candidate diseases in the medical record text.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A medical record missed diagnosis detection method is characterized by comprising the following steps:

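determining a medical record text to be detected;

performing medical data mining on the medical record text to obtain candidate diseases contained in the medical record text;

and performing missed diagnosis detection on the medical record text based on the context semantics of the candidate diseases in the medical record text.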

2. The medical record missed diagnosis detection method according to claim 1, wherein the performing medical data mining on the medical record text to obtain candidate diseases contained in the medical record text comprises:

3. The medical record missed diagnosis detection method according to claim 2, wherein the implicit mining rules are determined based on associations between medical information and diseases in a disease knowledge text and/or sample medical records.

4. The medical record missed diagnosis detection method according to claim 3, wherein the implicit mining rules are determined based on the following steps:

5. The medical record missed diagnosis detection method according to any one of claims 1 to 4, wherein the performing missed diagnosis detection on the medical record text based on the context semantics of the candidate diseases in the medical record text comprises:

6. The medical record missed diagnosis detection method according to claim 5, wherein the inputting the candidate diseases and the segment texts of the candidate diseases into a confirmed diagnosis detection model, determining the context semantics of the candidate diseases based on the segment texts by the confirmed diagnosis detection model, and performing confirmed diagnosis detection based on the context semantics to obtain the confirmed diagnosis detection result output by the confirmed diagnosis detection model comprises:

7. The medical record missed diagnosis detection method according to claim 5, wherein the missed diagnosis detection of the medical record text based on the confirmed diagnosis detection result comprises:

8. The medical record missed diagnosis detection method according to claim 7, wherein the loss function of the disease representation model is determined based on the similarity between the predicted representation of the clinical name of each disease and the predicted representation of the positive-example disease code standard name of each disease, and the similarity between the predicted representation of the clinical name of each disease and the predicted representation of the negative-example clinical name of each disease;

9. A medical record missed diagnosis detection device is characterized by comprising:

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the medical record missed diagnosis detection method according to any one of claims 1 to 8.

11. A non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the medical record missed diagnosis detection method according to any one of claims 1 to 8.