CN113658652B

CN113658652B - Binary relation extraction method based on electronic medical record data text

Info

Publication number: CN113658652B
Application number: CN202110946939.7A
Authority: CN
Inventors: 朱婷; 张伟; 刘瑞航
Original assignee: West China Hospital of Sichuan University
Current assignee: West China Hospital of Sichuan University
Priority date: 2021-08-18
Filing date: 2021-08-18
Publication date: 2023-07-28
Anticipated expiration: 2041-08-18
Also published as: CN113658652A

Abstract

The invention relates to the field of electronic medical records and discloses a binary relation extraction method based on electronic medical record data text. The method comprises the following extraction steps: a. inputting source data and preprocessing it; b. extracting the corresponding binary relations from the preprocessed source data text; c. matching the extracted results by ID, then sorting, storing and outputting them. The invention does not restrict the extracted content to a fixed set, nor does it fix the relation between the named entities of a binary relation. For example, with a simple template setting, the method can traverse the current patient's discharge summary text and rapidly extract its key content, which supports subsequent modeling research on clinical decision problems. The method is faster, and the content it extracts is more detailed and accurate.

Description

Binary relation extraction method based on electronic medical record data text

Technical Field

The invention relates to the field of electronic medical records, in particular to a binary relation extraction method based on electronic medical record data texts.

Background

An electronic medical record is a medical record that medical staff store, manage, transmit and copy during medical activities. It uses digital information generated by a medical institution's information system, such as text, symbols, charts, graphs, data and images. Analyzing electronic medical records can uncover a great deal of medical knowledge closely related to patients.

Electronic medical records contain a large amount of text describing the patient's condition, and a single patient may be admitted with multiple complications. Personalized medical and health information research on such cases requires extracting the useful information from the electronic medical record. Existing extraction methods are usually based on dictionaries or deep learning. In practice, however, these methods extract relatively slowly, and the detail and accuracy of the extracted content need improvement.

Disclosure of Invention

The invention aims to provide a binary relation extraction method based on electronic medical record data text.

To achieve this aim, the invention adopts the following technical solution: a binary relation extraction method based on electronic medical record data text, comprising the following extraction steps:

a. inputting source data, and preprocessing the source data;

b. extracting the corresponding binary relations from the preprocessed source data text;

c. matching the extracted results by ID, then sorting, storing and outputting them.

Further, the preprocessing comprises the following steps:

A101, text normalization: one or more operations, including removing special text, replacing ambiguous text and handling exception substitution;

A102, sentence splitting: punctuation marks are chosen as sentence delimiters according to the characteristics of the text;

A103, redundancy removal: remove blanks, clean duplicate and abnormal data, and place the processed content into the library of sentences to be extracted.
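Steps A101 to A103 can be sketched in Python as shown below. This is a minimal illustration, not the patented implementation. The function name `preprocess`, its parameters and the default delimiter set (Chinese and ASCII commas and semicolons, plus the Chinese full stop) are assumptions, because the patent does not publish its rules. The ASCII period is deliberately left out of the delimiters so that decimal values such as "1.5" survive splitting.

```python
import re


def preprocess(records, remove_terms=(), replacements=None, delimiters="，。；;,"):
    """Minimal sketch of steps A101-A103 (rules and delimiters are assumptions).

    records      -- raw source texts, one per record
    remove_terms -- special text to delete (A101)
    replacements -- mapping of ambiguous/exceptional text to its replacement (A101)
    delimiters   -- punctuation marks used for clause splitting (A102)
    """
    replacements = replacements or {}
    splitter = re.compile("[" + re.escape(delimiters) + "]")
    library, seen = [], set()
    for text in records:
        # A101: normalize the text
        for term in remove_terms:
            text = text.replace(term, "")
        for old, new in replacements.items():
            text = text.replace(old, new)
        # A102: split into clauses at the chosen punctuation marks
        for clause in splitter.split(text):
            # A103: strip blanks and drop empty or repeated clauses
            clause = clause.strip()
            if clause and clause not in seen:
                seen.add(clause)
                library.append(clause)
    return library
```

For example, `preprocess(["palpitation 3+ months, aggravated 15+ days", "palpitation 3+ months"])` yields `["palpitation 3+ months", "aggravated 15+ days"]`. A103 drops the repeated clause from the second record.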

Further, step b comprises the following extraction steps:

B101, create a new intermediate cache variable and traverse the texts in the library of sentences to be extracted;

B102, parse and split each sentence according to the pattern (template), and determine whether it contains a number; if no number is extracted, add the sentence to the cache library and return to step B101 to traverse the next sentence;

B103, if a number is extracted, determine whether the number together with its unit can be extracted; if the unit cannot be extracted, add the sentence to the exception library for manual inspection;

B104, count the extracted numbers: if exactly one number is extracted, add the current sentence to the cache library so that it corresponds to the current formatted content; if more than one number is extracted, determine their positions and update the binary correspondences accordingly.
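The extraction loop of steps B101 to B104 can be sketched as follows, assuming the preprocessed sentence library is a list of strings. Several choices here are hypothetical and made only for illustration, because the patent does not specify them: the regular expressions, the unit list, and the positional pairing rule used when a sentence contains several values (each value is paired with the text between it and the previous value). In this sketch, the cache library stores each sentence together with its formatted (entity, value) pairs. Sentences that contain a number but no recognizable unit go to a separate exception list.

```python
import re

# Hypothetical patterns: any integer/decimal, and a value followed by a unit
# from an assumed unit list (the patent does not publish its templates).
NUMBER = re.compile(r"\d+(?:\.\d+)?")
VALUE_WITH_UNIT = re.compile(
    r"\d+(?:\.\d+)?\+?\s*(?:years?|months?|weeks?|days?|hours?|mg|g|ml|times)\b"
)


def extract_binary_relations(sentences):
    """Sketch of steps B101-B104 over a library of preprocessed sentences."""
    cache = []       # B101: intermediate buffer of (sentence, [(entity, value), ...])
    exceptions = []  # sentences set aside for manual inspection
    for sentence in sentences:                  # B101: traverse the library
        if not NUMBER.search(sentence):         # B102: no number found
            cache.append((sentence, []))
            continue
        values = list(VALUE_WITH_UNIT.finditer(sentence))
        if not values:                          # B103: number but no unit
            exceptions.append(sentence)
            continue
        if len(values) == 1:                    # B104: exactly one value
            m = values[0]
            # Entity is the text before the value, or after it if nothing precedes.
            entity = sentence[:m.start()].strip(" ,") or sentence[m.end():].strip(" ,")
            cache.append((sentence, [(entity, m.group(0))]))
            continue
        # B104: several values -> position judgment (assumed rule): pair each
        # value with the text between the previous value and this one.
        pairs, start = [], 0
        for m in values:
            pairs.append((sentence[start:m.start()].strip(" ,"), m.group(0)))
            start = m.end()
        cache.append((sentence, pairs))
    return cache, exceptions
```

Given `["palpitation 3+ months", "no fever", "dizziness 2 episodes", "chest pain 2 years, cough 3 days"]`, the cache receives `("palpitation", "3+ months")`, an empty entry for "no fever", and the pairs `("chest pain", "2 years")` and `("cough", "3 days")`. The sentence "dizziness 2 episodes" goes to the exception list because "episodes" is not in the assumed unit list.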

Further, the preprocessing also comprises manual auxiliary processing.

Further, the source data are txt, xlsx or csv text.
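The source data may be txt, xlsx or csv files, and step a needs them as a list of independent records. A loader that converts any of these formats into such a list might look like the sketch below. The layouts are assumptions: one record per line for txt, and one record per row for csv and xlsx. Reading xlsx relies on the third-party openpyxl package, which is imported only when such a file is opened.

```python
import csv
from pathlib import Path


def load_source_records(path):
    """Read a txt, csv or xlsx source file into a list of record strings.

    Assumed layouts: txt holds one record per line; csv and xlsx hold one
    record per row, whose non-empty cells are joined with single spaces.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        with path.open(encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
    elif suffix == ".xlsx":
        # Third-party dependency, imported only when an .xlsx file is read.
        from openpyxl import load_workbook
        workbook = load_workbook(str(path), read_only=True)
        try:
            rows = list(workbook.active.iter_rows(values_only=True))
        finally:
            workbook.close()
    else:
        raise ValueError(f"unsupported source format: {suffix}")
    records = []
    for row in rows:
        cells = [str(cell).strip() for cell in row if cell is not None]
        record = " ".join(cell for cell in cells if cell)
        if record:
            records.append(record)
    return records
```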

The beneficial effects of the invention are as follows:

The invention does not restrict the extracted content to a fixed set, nor does it fix the relation between the named entities of a binary relation. For example, with a simple template setting, the method can traverse the current patient's discharge summary text and rapidly extract its key content, which supports subsequent modeling research on clinical decision problems. Compared with traditional dictionary-based and deep learning methods, the method is faster, and the content it extracts is more detailed and accurate.

Drawings

Fig. 1 is an extraction flow chart of the present invention.

Detailed Description

In order to make the technical solution of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.

As shown in FIG. 1, the binary relation extraction method based on electronic medical record data text can rapidly and automatically extract user-defined binary relation pairs from text, such as symptoms with their durations and medications with their dosages. The method specifically comprises the following extraction steps:

a. Source data input: a list of independent records is input and subjected to source data preprocessing; in this embodiment, the source data are txt, xlsx or csv texts;

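The preprocessing of the source data specifically comprises the following steps:

A101, text normalization: one or more operations, including removing special text, replacing ambiguous text and handling exception substitution;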

For example: removing special Chinese characters such as "cause" and "due";

removing punctuation marks, etc.;

replacing ambiguous Chinese expressions such as "one and a half years" and "2 years 7 months";

handling exception substitution, e.g. "reverse the same 1 problem" and "2 episodes of loss of consciousness within 20 years".

A102, sentence splitting: punctuation marks are chosen as sentence delimiters according to the characteristics of the text; for example, each symptom plus its duration forms one clause, as in "palpitation is afraid of 3+ months, aggravating 15+ days accompanying the injury";

A103, redundancy removal: remove blanks, clean duplicate and abnormal data, and place the processed content into the library (a list) of sentences to be extracted.

In step a, the preprocessing further comprises manual auxiliary processing, in which special symptoms and special complaints are removed by hand.
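The duration examples under A101 ("one and a half years", "2 years 7 months") can be normalized with a small rule table and a regular expression, as in the hypothetical sketch below. Converting each duration to a total number of months is an assumed target format; the patent does not state one. One reason such a rewrite may help: "2 years 7 months" contains two numbers for a single duration, so without normalization the count in step B104 would treat it as two values.

```python
import re

# Hypothetical rule table: literal rewrites for ambiguous duration phrases.
AMBIGUOUS_DURATIONS = {
    "one and a half years": "18 months",
    "half a year": "6 months",
}
# "N years M months" written with two numbers; collapsed into one value in months.
YEARS_AND_MONTHS = re.compile(r"(\d+)\s*years?\s*(\d+)\s*months?")


def normalize_durations(text):
    """Rewrite ambiguous durations so that each carries one number and one unit."""
    for phrase, replacement in AMBIGUOUS_DURATIONS.items():
        text = text.replace(phrase, replacement)
    return YEARS_AND_MONTHS.sub(
        lambda m: f"{int(m.group(1)) * 12 + int(m.group(2))} months", text
    )
```

With these rules, "palpitation 2 years 7 months" becomes "palpitation 31 months", and durations that are already unambiguous, such as "fever 3 days", pass through unchanged.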

In this embodiment, the specific extraction steps of step b are:

B104, in this step, first determine whether a number together with its unit has been extracted. If so, count the extracted items according to the characteristics of electronic medical record text. If exactly one item is extracted, add the current sentence to the cache library so that it corresponds to the current formatted content. If more than one item is extracted (typically two or three), determine their positions and update the binary correspondences.

c. Matching the extracted results by ID, then sorting, storing and outputting them; in this step, the results are stored in three lists, which are converted through a dataframe into a CSV output file.
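Step c can be sketched as shown below, assuming the three stored lists are parallel lists of record IDs, entity texts and values. The embodiment converts them through a dataframe. To stay dependency-free, this sketch substitutes Python's standard csv module, and the column names are hypothetical.

```python
import csv
import io


def write_results_csv(ids, entities, values, out):
    """Sketch of step c: align three parallel result lists, sort by ID, write CSV.

    Assumption: the three lists hold record IDs, entities and values. The
    embodiment goes through a dataframe; this sketch uses the csv module.
    """
    if not len(ids) == len(entities) == len(values):
        raise ValueError("the three result lists must have the same length")
    # sorted() is stable, so pairs sharing an ID keep their extraction order
    rows = sorted(zip(ids, entities, values), key=lambda row: row[0])
    writer = csv.writer(out)
    writer.writerow(["id", "entity", "value"])
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_results_csv([2, 1, 2], ["cough", "fever", "chest pain"],
                      ["3 days", "1 week", "2 years"], buffer)
    print(buffer.getvalue())
```

Writing to a real file instead of a buffer only requires opening it with `newline=""`, as the csv module documentation recommends.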

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations. However, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and elements involved are not necessarily required by the present application.

Claims

1. A binary relation extraction method based on electronic medical record data text, characterized by comprising the following extraction steps:

a. inputting source data, and preprocessing the source data;

b. extracting the corresponding binary relations from the preprocessed source data text;

c. matching the extracted results by ID, then sorting, storing and outputting them;

step b comprises the following extraction steps:

step B104, counting the extracted numbers: if exactly one number is extracted, adding the current sentence to the cache library so that it corresponds to the current formatted content; if more than one number is extracted, determining their positions and updating the binary correspondences.

2. The binary relation extraction method based on electronic medical record data text according to claim 1, wherein the preprocessing comprises the following steps:

3. The binary relation extraction method based on electronic medical record data text according to claim 1, wherein the preprocessing further comprises manual auxiliary processing.

4. The binary relation extraction method based on electronic medical record data text according to claim 1, wherein the source data are txt, xlsx or csv text.