CN113658652A

CN113658652A - Binary relation extraction method based on electronic medical record data text

Info

Publication number: CN113658652A
Application number: CN202110946939.7A
Authority: CN
Inventors: 朱婷; 张伟; 刘瑞航
Original assignee: West China Hospital of Sichuan University
Current assignee: West China Hospital of Sichuan University
Priority date: 2021-08-18
Filing date: 2021-08-18
Publication date: 2021-11-16
Anticipated expiration: 2041-08-18
Also published as: CN113658652B

Abstract

The invention relates to the field of electronic medical records and discloses a binary relation extraction method based on electronic medical record data text, comprising the following extraction steps: a. inputting source data and preprocessing the source data; b. extracting the corresponding binary relations from the preprocessed source data text; c. matching the extracted results by ID, then sorting, storing and outputting them. The invention does not limit or fix the content to be extracted; it only restricts the relations between named entities to binary relations. For example, with a simple template setting, the discharge summary text of a current patient can be traversed and the key content rapidly extracted, providing assistance for subsequent modeling research on clinical decision-making problems. The method has the advantages of speed and of more detailed and accurate extracted content.

Description

Binary relation extraction method based on electronic medical record data text

Technical Field

The invention relates to the field of electronic medical records, in particular to a binary relation extraction method based on electronic medical record data texts.

Background

An electronic medical record is a medical record in which digital information such as text, symbols, charts, graphs, data, images and the like generated by medical staff by using a medical institution information system is stored, managed, transmitted and copied during medical activities. Through the analysis of the electronic medical record, a lot of medical knowledge closely related to the patient can be mined.

Electronic medical records contain a large amount of text describing a patient's condition, and a patient who is admitted often presents with multiple diseases and complications. When personalized medical and health information research is carried out on such cases, the effective information in the electronic medical record needs to be extracted. Existing extraction methods are usually based on dictionaries and deep learning, but in practical application their extraction speed is relatively slow, and the level of detail and the accuracy of the extracted content need to be improved.

Disclosure of Invention

The invention aims to provide a binary relation extraction method based on an electronic medical record data text.

In order to achieve the purpose of the invention, the technical scheme adopted by the invention is as follows: a binary relation extraction method based on electronic medical record data text comprises the following extraction steps:

a. inputting source data, and preprocessing the source data;

b. extracting the corresponding binary relation of the preprocessed source data text;

c. matching the extracted results by ID, then sorting, storing and outputting them.

Further, the preprocessing comprises the following steps:

A101, text normalization: comprising one or more of removing special text, replacing ambiguous text, and handling abnormal replacements;

A102, sentence splitting: setting the punctuation marks used for sentence division according to the characteristics of the text;

A103, redundancy removal: removing blanks, cleaning repeated and abnormal data, and putting the processed content into the sentence library to be extracted.
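Steps A101 to A103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the rule tables `SPECIAL_TEXT` and `AMBIGUOUS`, the delimiter set in `SPLIT_PATTERN`, and all function names are hypothetical, and English sample text stands in for the original Chinese records.

```python
import re

# Hypothetical rule tables; the source does not list the actual rules.
SPECIAL_TEXT = ["therefore"]                          # A101: text to strip
AMBIGUOUS = {"one and a half years": "1.5 years"}     # A101: rewrites
# A102: split on ';' and on '.' unless the '.' is followed by a digit (e.g. 1.5).
SPLIT_PATTERN = re.compile(r"\.(?!\d)|[;。；]")


def normalize(text: str) -> str:
    """A101: remove special text and replace ambiguous expressions."""
    for word in SPECIAL_TEXT:
        text = text.replace(word, "")
    for old, new in AMBIGUOUS.items():
        text = text.replace(old, new)
    return text


def split_sentences(text: str) -> list[str]:
    """A102: split the text on the configured punctuation marks."""
    return SPLIT_PATTERN.split(text)


def preprocess(records: list[str]) -> list[str]:
    """A101-A103: build the sentence library, dropping blanks and repeats."""
    library: list[str] = []
    seen: set[str] = set()
    for record in records:
        for sentence in split_sentences(normalize(record)):
            sentence = " ".join(sentence.split())     # collapse whitespace
            if sentence and sentence not in seen:     # A103: no blanks, no repeats
                seen.add(sentence)
                library.append(sentence)
    return library
```

The split pattern deliberately avoids breaking decimal values such as "1.5 years"; a real template would choose delimiters to suit its own text.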

Further, the step b comprises the following extraction steps:

B101, establishing an intermediate cache variable, and traversing the texts in the sentence library to be extracted;

B102, parsing and splitting the normal form, and judging whether the text being extracted contains a number; if not, adding the sentence to the cache library and returning to step B101 to traverse the next sentence;

B103, if a number is successfully extracted, judging whether the number plus its unit can be extracted; if the unit cannot be extracted, adding the sentence to the abnormal library for manual processing and checking;

B104, judging how many numbers were extracted: if exactly one, adding the current sentence to the cache library so that it corresponds to the current formatted content; if more than one, judging their positions and updating the binary correspondence.
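One possible reading of the routing in steps B101 to B104 is sketched below. The unit list, the regular expressions, the function name `route`, and the shape of the three returned containers are all assumptions made for illustration; the source does not specify them.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")
UNITS = r"years?|months?|weeks?|days?|hours?|mg|ml"   # hypothetical unit list
NUMBER_UNIT = re.compile(rf"\d+(?:\.\d+)?\+?\s*(?:{UNITS})\b")


def route(sentences):
    """Sort sentences into cache, abnormal and multi-value containers."""
    cache = []      # (sentence, formatted value or None)
    abnormal = []   # sentences with a number but no recognizable unit
    multi = []      # (sentence, values) awaiting position judgment
    for sentence in sentences:                        # B101: traverse library
        if NUMBER.search(sentence) is None:           # B102: no number found
            cache.append((sentence, None))
            continue
        values = NUMBER_UNIT.findall(sentence)
        if not values:                                # B103: unit missing
            abnormal.append(sentence)
            continue
        if len(values) == 1:                          # B104: exactly one value
            cache.append((sentence, values[0]))
        else:                                         # B104: more than one
            multi.append((sentence, values))
    return cache, abnormal, multi
```

Sentences placed in `abnormal` correspond to the cases the method hands over to manual processing and checking.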

Further, the preprocessing also comprises manual auxiliary processing.

Further, the source data includes txt, xlsx, or csv text.
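As a rough illustration of source data input, the sketch below reads a single column of text from a txt or csv file and assigns each non-blank record a sequential ID. The function name `load_records` and the ID scheme are assumptions; xlsx input is omitted because it needs a third-party reader (for example openpyxl) rather than the standard library.

```python
import csv
from pathlib import Path


def load_records(path):
    """Read one text record per line/row and return (id, text) pairs."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        lines = path.read_text(encoding="utf-8").splitlines()
    elif suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            # Only the first column is used (single-column source data).
            lines = [row[0] for row in csv.reader(handle) if row]
    else:
        raise ValueError(f"unsupported source format: {suffix}")
    records = [line.strip() for line in lines if line.strip()]
    # Hypothetical ID scheme: sequential numbers starting at 1.
    return list(enumerate(records, start=1))
```

Keeping an ID with every record from the start is what makes the later matching of extracted results by ID possible.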

The beneficial effects of the invention are concentrated and expressed as follows:

The invention does not limit or fix the content to be extracted; it only restricts the relations between named entities to binary relations. For example, with a simple template setting, the discharge summary text of a current patient can be traversed and the key content rapidly extracted, providing assistance for subsequent modeling research on clinical decision-making problems. Compared with traditional methods based on dictionaries and deep learning, the method has the advantages of speed and of more detailed and accurate extracted content.

Drawings

FIG. 1 is a flow chart of the extraction process of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.

As shown in FIG. 1, the invention provides a binary relation extraction method based on electronic medical record data text, which can quickly and automatically extract user-defined binary relation pairs from the text, such as symptom + duration or medication + dosage, and which specifically comprises the following extraction steps:

a. inputting source data, namely a single column of chief complaint data, and preprocessing the source data; in this embodiment the source data includes txt, xlsx, or csv text;

the preprocessing of the source data specifically comprises the following steps:

A101, text normalization, for example:

removing special Chinese characters: "therefore", and the like;

removing punctuation marks: "and the like;

replacing ambiguous Chinese characters: "one and a half years", "2 years and 7 months", etc.;

handling abnormal replacements: "reverse the same 1 question", "2 episodes of loss of consciousness within 20 years", etc.

A102, sentence splitting: setting the punctuation marks used for sentence division according to the characteristics of the text; for example, each symptom + duration pair forms one sentence ending in a period: "the patient has felt panic for 3+ months", "aggravated, with self-injury, for 15+ days";

A103, redundancy removal: removing blanks, cleaning repeated and abnormal data, and putting the processed content into the sentence library (a list) to be extracted.

In step a, the preprocessing also comprises manual auxiliary processing, in which special symptoms and special complaints are removed manually.

The specific extraction steps of step b in this embodiment are:

B104, in this step, first judging whether a number + unit has been extracted; if the content is successfully extracted, judging the number of extracted items according to the characteristics of the medical text in the electronic medical record. If one item was extracted, the current sentence is added to the cache library so that it corresponds to the current formatted content. If more than one item was extracted, namely two or three, position judgment is carried out and the binary correspondence is updated.
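The position judgment for sentences with more than one number + unit could look like the sketch below, which pairs each value with the text that precedes it. The pairing rule, the unit list, the stripping of a trailing "for", and the function name `pair_by_position` are assumptions chosen for English sample text; the source does not describe how positions are judged.

```python
import re

# Hypothetical pattern for duration values such as "3+ months" or "15 days".
NUMBER_UNIT = re.compile(r"\d+(?:\.\d+)?\+?\s*(?:years?|months?|weeks?|days?)\b")


def pair_by_position(sentence):
    """Pair every number + unit with the text between it and the previous value."""
    pairs = []
    start = 0
    for match in NUMBER_UNIT.finditer(sentence):
        # Entity text is whatever lies between the previous value and this one.
        entity = sentence[start:match.start()].strip(" ,")
        entity = entity.removesuffix(" for")          # drop the linking word
        pairs.append((entity, match.group()))
        start = match.end()
    return pairs
```

For the sample complaint quoted in this embodiment, the sketch yields one symptom + duration pair per value, so a sentence with two or three values produces two or three binary relations.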

c. Matching the extracted results by ID, then sorting, storing and outputting them; in this step, the three lists are stored and converted into CSV output files via a DataFrame.
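Step c can be illustrated with the sketch below, which assumes the three lists are parallel lists of IDs, entities and values (the source does not say what the three lists hold). To stay within the standard library it swaps in Python's `csv` module for the DataFrame conversion mentioned above; the function name `to_csv_text` and the column names are hypothetical.

```python
import csv
import io


def to_csv_text(ids, entities, values):
    """Match the three lists row by row, sort by ID, and render CSV text."""
    # zip() matches items that share a position; sorted() is stable, so
    # several relations with the same ID keep their extraction order.
    rows = sorted(zip(ids, entities, values), key=lambda row: row[0])
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "entity", "value"])        # hypothetical header
    writer.writerows(rows)
    return buffer.getvalue()
```

With pandas available, building a `DataFrame` from the same three lists and calling its `to_csv` method would be the closer analogue to the embodiment.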

It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and elements referred to are not necessarily required in this application.

Claims

1. A binary relation extraction method based on electronic medical record data text, characterized by comprising the following extraction steps:

a. inputting source data, and preprocessing the source data;

2. The method for extracting the binary relation based on the electronic medical record data text as claimed in claim 1, wherein: the preprocessing comprises the following steps:

3. The method for extracting the binary relation based on the electronic medical record data text as claimed in claim 1, wherein: the step b comprises the following extraction steps:

B104, judging how many numbers were extracted: if exactly one, adding the current sentence to the cache library so that it corresponds to the current formatted content; if more than one, judging their positions and updating the binary correspondence.

4. The method for extracting the binary relation based on the electronic medical record data text as claimed in claim 1, wherein: the preprocessing also comprises manual auxiliary processing.

5. The method for extracting the binary relation based on the electronic medical record data text as claimed in claim 1, wherein: the source data includes txt, xlsx or csv text.