CN111177309A

CN111177309A - Medical record data processing method and device

Info

Publication number: CN111177309A
Application number: CN201911237998.6A
Authority: CN
Inventors: 齐振宇; 陈炜; 刘焱; 徐爽
Original assignee: Ningbo Zidong Cognitive Information Technology Co Ltd
Current assignee: Ningbo Zidong Cognitive Information Technology Co Ltd
Priority date: 2019-12-05
Filing date: 2019-12-05
Publication date: 2020-05-19
Anticipated expiration: 2039-12-05
Also published as: CN111177309B

Abstract

The embodiment of the invention relates to a method and a device for processing medical record data, wherein the method comprises the following steps: performing clause splitting on medical record data to be processed to obtain a clause set; marking each clause in the clause set according to the text content of the clause; and adding the clauses into corresponding table entries according to the labeling result of each clause to obtain semi-structured data. Therefore, the unstructured medical record data can be converted into semi-structured medical record data, so that the medical record data can be displayed in a more intuitive mode, and the requirements of medical staff can be better met.

Description

Medical record data processing method and device

Technical Field

The embodiment of the invention relates to the technical field of data processing, in particular to a method and a device for processing medical record data.

Background

The medical records are records of medical personnel on the processes of medical activities such as occurrence, development, outcome, examination, diagnosis, treatment and the like of diseases of patients, and have important functions in the aspects of medical treatment, prevention, teaching, scientific research, hospital management and the like.

Currently, existing medical records belong to unstructured data, in which recorded contents, such as current medical history, past medical history, and the like, are simply listed together in plain text. Therefore, the information expression in the existing medical record is not intuitive, and the efficiency of information retrieval is low.

Disclosure of Invention

In view of this, to solve the above technical problem or some technical problems, embodiments of the present invention provide a method and an apparatus for processing medical record data, so as to convert unstructured medical record data into semi-structured data, so that the medical record data is displayed in a more intuitive manner, so as to better meet the requirements of medical staff.

In a first aspect, an embodiment of the present invention provides a method for processing medical record data, including:

performing clause splitting on medical record data to be processed to obtain a clause set;

marking each clause in the clause set according to the text content of the clause;

and adding the clauses into corresponding table entries according to the labeling result of each clause to obtain semi-structured medical record data.

In a possible embodiment, the performing clause splitting on the medical record data to be processed to obtain a clause set includes:

determining a target punctuation mark according to a known label of medical record data to be processed;

and splitting clauses of the medical record data by using the target punctuations to obtain a clause set.

In a possible embodiment, when the known label of the medical record data belongs to the predefined first class of labels, the labeling the clause according to the text content of the clause includes:

performing named entity recognition on the clause to mark the named entities in the clause;

performing semantic annotation on the clause after named entity recognition by using a preset semantic annotation system, so as to mark the semantic category to which the clause belongs;

and performing time marking on the clause after named entity recognition according to its time-class entities, so as to mark the course of disease corresponding to the clause.

In a possible embodiment, when the known label of the medical record data belongs to the predefined second class of labels, the labeling the clause according to the text content of the clause includes:

and marking the clauses according to the known labels of the clauses.

In a possible embodiment, when the known label of the medical record data belongs to the predefined third class of labels, the labeling the clause according to the text content of the clause includes:

determining a high-frequency clause set according to a preset medical record data sample set, wherein the known labels of the medical record data in the medical record data sample set are third-class labels;

checking whether the clause is included in the high-frequency clause set;

if not, marking the clause as valid information.

In a possible implementation manner, the adding the clause to the corresponding entry according to the labeling result of each clause to obtain semi-structured data includes:

and for each clause, if the labeling result of the clause indicates that the clause is valid information, adding the clause to the corresponding table entry to obtain semi-structured medical record data.

In a second aspect, an embodiment of the present invention provides an apparatus for processing medical record data, including:

the clause splitting module is used for splitting clauses of medical record data to be processed to obtain a clause set;

a clause marking module, configured to mark, for each clause in the clause set, the clause according to the text content of the clause;

and the data forming module is used for adding the clauses into the corresponding table entries according to the labeling result of each clause to obtain the semi-structured data.

In a possible implementation manner, the clause splitting module is specifically configured to determine a target punctuation mark according to a known label of medical record data to be processed; and splitting clauses of the medical record data by using the target punctuations to obtain a clause set.

In a possible embodiment, when the known label of the medical record data belongs to the predefined first class of labels, the clause labeling module is specifically configured to perform named entity recognition on the clause to mark the named entities in the clause; perform semantic annotation on the clause after named entity recognition by using a preset semantic annotation system, so as to mark the semantic category to which the clause belongs; and perform time marking on the clause after named entity recognition according to its time-class entities, so as to mark the course of disease corresponding to the clause.

In a possible embodiment, when the known label of the medical record data belongs to the predefined second class of labels, the clause labeling module is specifically configured to label the clause according to the known label of the clause.

In a possible implementation manner, when the known label of the medical record data belongs to the predefined third class of labels, the clause labeling module is specifically configured to determine a high-frequency clause set according to a preset medical record data sample set, where the known labels of the medical record data in the medical record data sample set are third-class labels; check whether the clause is included in the high-frequency clause set; and if not, mark the clause as valid information.

In a possible implementation manner, the data forming module is specifically configured to, for each clause, add the clause to a corresponding entry if a labeling result of the clause indicates that the clause is valid information, so as to obtain semi-structured medical record data.

According to the medical record data processing scheme provided by the embodiment of the invention, the clause set is obtained by splitting the clauses of the medical record data to be processed, the clauses are labeled according to the text content of the clauses aiming at each clause in the clause set, and the clauses are added into the corresponding table entry according to the labeling result of each clause to obtain the semi-structured medical record data.

Drawings

FIG. 1 is an example of an existing medical record;

fig. 2 is a flowchart of an embodiment of a method for processing medical record data according to an exemplary embodiment of the present invention;

fig. 3 is an example of an implementation flow of labeling a clause according to the text content of the clause for each clause in the clause set according to the embodiment of the present invention;

fig. 4 is another example of an implementation flow related to labeling a clause according to the text content of the clause for each clause in the clause set according to the embodiment of the present invention;

fig. 5 is a block diagram of an embodiment of a medical record data processing apparatus according to an exemplary embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.

For ease of understanding, some concepts involved in the present application will be described first using existing medical records as an example:

fig. 1 is a diagram illustrating an example of a medical record.

As shown in fig. 1, an existing medical record contains a large amount of content, including the patient's personal information, cause of disease, symptoms, examination results, medical history, physical condition, and family medical history. Different information corresponds to different known labels. For example, fig. 1 includes 9 known labels: present medical history, chief complaint, special examination, auxiliary examination, past history, personal history, family history, marriage and childbirth history, and physical examination.

Because different information contributes to the diagnosis of a patient to different degrees, the present invention divides the 9 known labels into three classes, referred to for convenience of description as first-class, second-class, and third-class labels. The information corresponding to first-class labels plays a decisive role in the diagnosis of the patient and differs markedly from patient to patient; on this basis, "present medical history" is defined as a first-class label. The information corresponding to second-class labels serves as the main reference in the diagnosis of the patient; on this basis, the three labels chief complaint, special examination, and auxiliary examination are defined as second-class labels. The information corresponding to third-class labels serves as a secondary reference in the diagnosis of the patient and has no obvious personalized features; on this basis, the 5 labels past history, personal history, family history, marriage and childbirth history, and physical examination are defined as third-class labels.
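The three-way division of the 9 known labels can be written down as a simple lookup table. The following is a minimal sketch, not the patented implementation; the English label strings and the names `LABEL_CLASS` and `label_class` are assumptions introduced here for illustration.

```python
# Hypothetical mapping from known label to label class (1, 2 or 3),
# following the division described in the text above.
LABEL_CLASS = {
    "present medical history": 1,
    "chief complaint": 2,
    "special examination": 2,
    "auxiliary examination": 2,
    "past history": 3,
    "personal history": 3,
    "family history": 3,
    "marriage and childbirth history": 3,
    "physical examination": 3,
}


def label_class(label: str) -> int:
    """Return the class of a known label; raise KeyError for an unknown label."""
    return LABEL_CLASS[label]
```

Keeping the mapping in one table lets later steps choose a splitting granularity and a labeling method by class rather than by individual label.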

The processing of unstructured data provided by the present invention is described in detail below by way of specific examples:

Referring to fig. 2, a flowchart of an embodiment of a method for processing medical record data according to an exemplary embodiment of the present invention is provided, where the method includes the following steps:

Step 201: splitting the medical record data to be processed into clauses to obtain a clause set.

As can be seen from the medical record shown in fig. 1, the content corresponding to each label is composed of one or more sentences, and based on this, in this step, a clause set can be obtained by performing clause splitting on medical record data to be processed.

As an embodiment, because the content corresponding to different labels contributes to diagnosis to different degrees, the splitting granularity applied to the content of different labels may differ. On this basis, when splitting the medical record data to be processed into clauses, punctuation marks (hereinafter, target punctuation marks) are first determined according to the known label of the medical record data, and the medical record data is then split into clauses by using the target punctuation marks.

For example, for the content corresponding to the "present medical history" tag, clauses are split using commas, colons and semicolons, and for the content corresponding to other tags, the content is split using commas.

As an example, the clause resulting from splitting may contain three parts: the medical record id number, the label and the text content of the clause.

For example, assume that the original medical record data is as follows:

000012321-present history-patient underwent radical cervical cancer surgery in the hospital for cervical cancer 6 years ago. Radiotherapy for two months after the operation, regular re-examination found no sign of recurrence.

After the clauses are split, the obtained clauses are as follows:

000012321-present history-patient underwent radical cervical cancer surgery in the hospital for cervical cancer 6 years ago

000012321-present history-radiotherapy for two months after the operation

000012321-present history-regular re-examination found no sign of recurrence
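The label-dependent splitting of step 201 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the names `Clause`, `TARGET_PUNCTUATION`, `DEFAULT_PUNCTUATION` and `split_clauses` are hypothetical, ASCII punctuation stands in for the punctuation of the original records, and the period is included for "present history" only because the worked example above also breaks at the sentence end.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    record_id: str  # medical record id number
    label: str      # known label of the section the clause came from
    text: str       # text content of the clause


# Assumed punctuation sets: a finer split for "present history",
# a coarser comma-only split for every other label.
TARGET_PUNCTUATION = {"present history": ",.:;"}
DEFAULT_PUNCTUATION = ","


def split_clauses(record_id: str, label: str, content: str) -> list[Clause]:
    """Split one labeled section into clauses at the label's target punctuation."""
    marks = TARGET_PUNCTUATION.get(label, DEFAULT_PUNCTUATION)
    pattern = "[" + re.escape(marks) + "]"
    parts = (part.strip() for part in re.split(pattern, content))
    return [Clause(record_id, label, part) for part in parts if part]
```

Each resulting clause carries the three parts named above: the medical record id number, the label, and the clause text.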

Step 202: for each clause in the clause set, labeling the clause according to the text content of the clause.

In this step, different labeling methods are adopted for clauses with different labels. The labeling methods adopted for the three classes of labels are described below:

(1) The known label of the medical record data belongs to the predefined first class of labels:

Referring to fig. 3, an example of an implementation flow of step 202 in fig. 2 is shown, which includes the following steps:

Step 301: performing named entity recognition on the clause by using a preset medical knowledge graph, so as to mark the named entities in the clause.

The purpose of step 301 is to mark both the medical entities (e.g., diseases, symptoms) and the general entities (e.g., times, numbers) in the clause. On this basis, an entity vocabulary covering 7 kinds of concepts, namely diseases, causes, symptoms, body parts, medicines, examinations, and treatment means, can first be obtained from the medical knowledge graph; the medical entities in the clause can then be marked based on the entity vocabulary; finally, the number and time entities in the clause are marked.

For example, the following is the labeling result obtained by executing the step 301:

000012321-present history-patient underwent radical cervical cancer surgery in the hospital for cervical cancer 6 years ago-patient underwent [operation] in the hospital for [disease] [time] ago_6 years_cervical cancer_radical cervical cancer surgery

000012321-present history-radiotherapy for two months after the operation-radiotherapy for [time] after the operation_two months

000012321-present history-regular re-examination found no sign of recurrence

Furthermore, it should be noted that if there is a type of clause where there is no named entity, the clause itself may be retained as the annotation result.
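Vocabulary-based entity marking of the kind described in step 301 can be sketched as below. This is a minimal illustration, not the patented implementation: `ENTITY_VOCAB`, `TIME_PATTERN` and `tag_entities` are hypothetical names, the tiny vocabulary stands in for one derived from a medical knowledge graph, and the regular expression is only a rough assumption about what a time entity looks like in English text.

```python
import re

# Hypothetical entity vocabulary; a real one would be derived from the knowledge graph.
ENTITY_VOCAB = {
    "disease": ["cervical cancer"],
    "operation": ["radical cervical cancer surgery"],
}

# Assumed pattern for time entities such as "6 years" or "two months".
TIME_PATTERN = re.compile(
    r"\b(?:\d+|one|two|three|four|five|six)\s+(?:years?|months?|days?)\b"
)


def tag_entities(text: str) -> tuple[str, list[str]]:
    """Return (template, values): the clause with entities replaced by
    [type] placeholders, and the entity strings in order of appearance."""
    spans = [(m.start(), m.end(), "time") for m in TIME_PATTERN.finditer(text)]
    # Try longer vocabulary terms first so that a long term wins over a
    # shorter term nested inside it.
    terms = sorted(
        ((term, concept) for concept, ts in ENTITY_VOCAB.items() for term in ts),
        key=lambda pair: -len(pair[0]),
    )
    for term, concept in terms:
        for m in re.finditer(re.escape(term), text):
            # Keep the match only if it does not overlap an accepted span.
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), concept))
    spans.sort()
    pieces, values, pos = [], [], 0
    for start, end, concept in spans:
        pieces.append(text[pos:start])
        pieces.append(f"[{concept}]")
        values.append(text[start:end])
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces), values
```

A clause that contains no entity comes back unchanged with an empty value list, which matches the rule that such a clause is retained as its own annotation result.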

Step 302: performing semantic annotation on the clause after named entity recognition by using a preset semantic annotation system, so as to mark the semantic category to which the clause belongs.

The purpose of step 302 is to classify the clause after named entity recognition into one or more categories of the semantic annotation system.

The semantic annotation system can be constructed with reference to the relevant content of the existing professional textbook "Diagnostics" of the Ministry of Health, supplemented by the practical experience of clinicians and practical experience in semantic annotation. For example, Table 1 below shows an example of a semantic annotation system:

TABLE 1

In step 302, the clauses after named entity recognition can be semantically annotated by using the semantic annotation system illustrated in Table 1, wherein one clause is allowed to be labeled with multiple categories.

For example, the following is the labeling result obtained by executing the step 302:

000012321-present history-patient underwent radical cervical cancer surgery in the hospital for cervical cancer 6 years ago-patient underwent [operation] in the hospital for [disease] [time] ago_6 years_cervical cancer_radical cervical cancer surgery_5.1_5.2_5.5

000012321-present history-radiotherapy for two months after the operation-radiotherapy for [time] after the operation_two months_5.5

000012321-present history-regular re-examination found no sign of recurrence_5.4

Step 303: performing time marking on the clause after named entity recognition according to its time-class entities, so as to mark the course of disease corresponding to the clause.

In practice, some patients suffer from chronic diseases whose whole course is very long (for example, decades), so the content of the corresponding present medical history is extensive. On this basis, the present invention proposes dividing the content of the present medical history into several "courses" according to time; informally, a "course" refers to a stage in the progress of the disease (development, treatment, etc.). For example, the following contains 4 courses:

"… before 20 years [ course 1 ] after … 6 [ course 2 ] … 3 before … [ course 3 ] in a hospital … [ course 3 ] … to date in my hospital … [ course 4 ]. "

Therefore, in this step, the clauses after named entity recognition can be time-marked according to their time-class entities, so as to mark the course of disease corresponding to each clause.
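One simple way to turn time entities into course numbers is to open a new course whenever a clause contains a time anchor and to let every other clause inherit the current course. The sketch below rests on that assumption, which the source does not spell out; `ANCHOR` and `assign_courses` are hypothetical names, and the pattern is only a rough English stand-in for time-class entities that mark a new point in time.

```python
import re

# Assumed time anchors that start a new course, e.g. "20 years ago",
# "6 years later", "to date". Plain durations such as "two months" do not match.
ANCHOR = re.compile(
    r"\b\d+\s+(?:years?|months?|days?)\s+(?:ago|later)\b|\bto date\b"
)


def assign_courses(clauses: list[str]) -> list[tuple[int, str]]:
    """Pair each clause with a course number, scanning clauses in document order."""
    course = 0
    result = []
    for clause in clauses:
        # The first clause always opens course 1, even without an anchor.
        if course == 0 or ANCHOR.search(clause):
            course += 1
        result.append((course, clause))
    return result
```

Under this rule a clause that only states a duration stays in the course opened by the preceding anchored clause.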

(2) The known label of the medical record data belongs to the predefined second class of labels:

In the present invention, the clause can be labeled directly according to its known label, that is, the known label of the clause is its labeling result.

(3) The known label of the medical record data belongs to the predefined third class of labels:

Referring to fig. 4, another example of an implementation flow of step 202 in fig. 2 is shown, which includes the following steps:

Step 401: determining a high-frequency clause set according to a preset medical record data sample set.

The known labels of the medical record data in the medical record data sample set are third-class labels.

In step 401, a high-frequency clause set is determined according to the medical record data sample set, wherein the frequency with which each clause in the high-frequency clause set occurs in the medical record data sample set exceeds a set threshold.

For example, assuming that there are 1000 medical records, that the clause "usually in good health" appears in the past history portion of 924 of them (a frequency of 0.924), and that the threshold is set to 0.9, the clause is classified into the high-frequency clause set.

Step 402: it is checked whether the clause is included in the high frequency clause set.

Step 403: if not, marking the clause as valid information.

Step 402 and step 403 are described together below:

In practice, taking the past history as an example, most patients have the description "usually in good health", so this description has relatively low value, whereas only a few patients have the description "usually in poor health", so that description has relatively high value. On this basis, it can be checked whether a clause is included in the high-frequency clause set; if not, the clause occurs with low frequency, its value can be considered high, and the clause can therefore be marked as valid information.

For example, assume that the high-frequency clause set corresponding to "past history" includes 3 clauses: "usually in good health", "vaccination history unknown", and "denies history of infection". Assume further that the past history of medical record 1 includes "usually in good health" and "history of surgery", and that the past history of medical record 2 includes "usually in fair health" and "vaccination history unknown". As described above, the valid information in medical record 1 is "history of surgery", and the valid information in medical record 2 is "usually in fair health".
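Steps 401 to 403 amount to a frequency count followed by a set-membership filter. The following is a minimal sketch under the assumption that frequency means the fraction of sample records in which a clause appears; `high_frequency_clauses` and `valid_clauses` are hypothetical names, not taken from the source.

```python
from collections import Counter


def high_frequency_clauses(sample_records: list[list[str]],
                           threshold: float = 0.9) -> set[str]:
    """Return the clauses whose share of sample records exceeds the threshold.

    Each sample record is the list of clauses of one third-class section.
    """
    if not sample_records:
        return set()
    counts = Counter()
    for clauses in sample_records:
        counts.update(set(clauses))  # count a clause once per record
    total = len(sample_records)
    return {clause for clause, n in counts.items() if n / total > threshold}


def valid_clauses(record_clauses: list[str], high_freq: set[str]) -> list[str]:
    """Keep only the clauses that are not in the high-frequency set."""
    return [clause for clause in record_clauses if clause not in high_freq]
```

With 924 of 1000 sample records containing the same clause and a threshold of 0.9, that clause lands in the high-frequency set and is filtered out of every record, while rarer clauses pass through as valid information.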

Step 203: adding the clauses to the corresponding table entries according to the labeling result of each clause to obtain semi-structured medical record data.

Table 2 below shows an example of the semi-structured medical record data obtained by executing step 203:

TABLE 2

As can be seen from Table 2 above, each course is subdivided into several entries. For clauses with first-class labels, the corresponding entries are the levels in the semantic annotation system: if a category has only one level, the first level is used as the entry name; if it has two levels, the first and second levels are combined as the entry name. For clauses with second-class and third-class labels, the corresponding entry names are the known labels.

The content of each entry is the clause itself; since the annotation result of a clause can be found through the clause, information such as the entities it contains can be looked up when necessary. On the basis of the embodiment shown in fig. 4, a clause is added to an entry as follows: for each clause, if the labeling result of the clause indicates that the clause is valid information, the clause is added to the corresponding entry to obtain the semi-structured data.

In addition, it should be noted that, for the clauses corresponding to the second class of tags, the clauses are automatically classified into the last course of disease.
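The assembly of step 203 can be sketched as a nested mapping from course to entry name to clauses. This is an illustrative sketch only: the dictionary keys, the `-` separator used to combine two levels, the category names in the usage, and the placement of valid third-class clauses in the last course are all assumptions, and `entry_name` and `build_table` are hypothetical names.

```python
from collections import defaultdict


def entry_name(levels: tuple[str, ...]) -> str:
    """Use the first level alone, or combine first and second levels."""
    return "-".join(levels)


def build_table(annotated_clauses: list[dict]) -> dict[int, dict[str, list[str]]]:
    """Group annotated clauses into {course: {entry name: [clause text, ...]}}."""
    courses = [c["course"] for c in annotated_clauses if c["label_class"] == 1]
    last_course = max(courses, default=1)
    table = defaultdict(lambda: defaultdict(list))
    for clause in annotated_clauses:
        if clause["label_class"] == 1:
            # A clause may carry several categories; add it under each one.
            for levels in clause["categories"]:
                table[clause["course"]][entry_name(levels)].append(clause["text"])
        elif clause["label_class"] == 2 or clause.get("valid", False):
            # Second-class clauses go to the last course; valid third-class
            # clauses are placed there too (an assumption of this sketch).
            table[last_course][clause["label"]].append(clause["text"])
    return {course: dict(entries) for course, entries in table.items()}
```

Third-class clauses that were not marked as valid information are simply left out of the table.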

As can be seen from the above embodiments, a clause set is obtained by splitting clauses of medical record data to be processed, and for each clause in the clause set, the clause is labeled according to text content of the clause, and the clause is added to the corresponding table entry according to a labeling result of each clause, so as to obtain semi-structured data.

Corresponding to the foregoing embodiments of the medical record data processing method, the present invention also provides a medical record data processing apparatus.

Fig. 5 is a block diagram of an embodiment of a medical record data processing apparatus according to an exemplary embodiment of the present invention. The apparatus includes: a clause splitting module 51, a clause labeling module 52 and a data forming module 53.

A clause splitting module 51, configured to split a clause of medical record data to be processed to obtain a clause set;

a clause labeling module 52, configured to label, for each clause in the clause set, the clause according to the text content of the clause;

and the data forming module 53 is configured to add the clause to the corresponding entry according to the labeling result of each clause, so as to obtain semi-structured medical record data.

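In an embodiment, the clause splitting module 51 performing clause splitting on the medical record data to be processed to obtain a clause set includes:

determining target punctuation marks according to the known label of the medical record data to be processed;

and splitting the medical record data into clauses by using the target punctuation marks to obtain a clause set.

In an embodiment, when the known label of the medical record data belongs to the predefined first class of labels, the clause labeling module 52 labeling the clause according to the text content of the clause includes:

performing named entity recognition on the clause to mark the named entities in the clause;

performing semantic annotation on the clause after named entity recognition by using a preset semantic annotation system, so as to mark the semantic category to which the clause belongs;

and performing time marking on the clause after named entity recognition according to its time-class entities, so as to mark the course of disease corresponding to the clause.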

In an embodiment, when the known label of the medical record data belongs to the predefined second class of labels, the clause labeling module 52 labeling the clause according to the text content of the clause includes:

and marking the clauses according to the known labels of the clauses.

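In an embodiment, when the known label of the medical record data belongs to the predefined third class of labels, the clause labeling module 52 labeling the clause according to the text content of the clause includes:

determining a high-frequency clause set according to a preset medical record data sample set, wherein the known labels of the medical record data in the medical record data sample set are third-class labels;

checking whether the clause is included in the high-frequency clause set;

if not, marking the clause as valid information.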

In an embodiment, the data forming module 53 adding the clause to the corresponding entry according to the labeling result of each clause to obtain the semi-structured data includes:

and for each clause, if the labeling result of the clause indicates that the clause is valid information, adding the clause to the corresponding table entry to obtain semi-structured data.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

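1. A method for processing medical record data is characterized by comprising the following steps:

performing clause splitting on medical record data to be processed to obtain a clause set;

marking each clause in the clause set according to the text content of the clause;

and adding the clauses into corresponding table entries according to the labeling result of each clause to obtain semi-structured medical record data.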

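2. The method according to claim 1, wherein the performing clause splitting on the medical record data to be processed to obtain a clause set comprises:

determining target punctuation marks according to the known label of the medical record data to be processed;

and splitting the medical record data into clauses by using the target punctuation marks to obtain a clause set.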

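3. The method of claim 1, wherein when the known label of the medical record data belongs to the predefined first class of labels, the labeling the clause according to the text content of the clause comprises:

performing named entity recognition on the clause to mark the named entities in the clause;

performing semantic annotation on the clause after named entity recognition by using a preset semantic annotation system, so as to mark the semantic category to which the clause belongs;

and performing time marking on the clause after named entity recognition according to its time-class entities, so as to mark the course of disease corresponding to the clause.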

4. The method of claim 1, wherein when the known label of the medical record data belongs to the predefined second class of labels, the labeling the clause according to the text content of the clause comprises:

and marking the clauses according to the known labels of the clauses.

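5. The method of claim 1, wherein when the known label of the medical record data belongs to the predefined third class of labels, the labeling the clause according to the text content of the clause comprises:

determining a high-frequency clause set according to a preset medical record data sample set, wherein the known labels of the medical record data in the medical record data sample set are third-class labels;

checking whether the clause is included in the high-frequency clause set;

if not, marking the clause as valid information.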

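6. The method of claim 5, wherein the adding the clause to the corresponding entry according to the labeling result of each clause to obtain semi-structured medical record data comprises:

and for each clause, if the labeling result of the clause indicates that the clause is valid information, adding the clause to the corresponding table entry to obtain semi-structured medical record data.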

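7. An apparatus for processing medical record data, comprising:

the clause splitting module is used for splitting clauses of medical record data to be processed to obtain a clause set;

the clause labeling module is used for labeling, for each clause in the clause set, the clause according to the text content of the clause;

and the data forming module is used for adding the clauses into corresponding table entries according to the labeling result of each clause to obtain semi-structured medical record data.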

8. The apparatus according to claim 7, wherein the clause splitting module is specifically configured to determine a target punctuation mark according to a known label of medical record data to be processed; and splitting clauses of the medical record data by using the target punctuations to obtain a clause set.

9. The apparatus according to claim 7, wherein the clause labeling module is specifically configured to, when the known label of the medical record data belongs to the predefined first class of labels, perform named entity recognition on the clause to mark the named entities in the clause; perform semantic annotation on the clause after named entity recognition by using a preset semantic annotation system, so as to mark the semantic category to which the clause belongs; and perform time marking on the clause after named entity recognition according to its time-class entities, so as to mark the course of disease corresponding to the clause.

10. The apparatus of claim 7, wherein the clause labeling module is specifically configured to label the clause according to the known label of the clause when the known label of the medical record data belongs to the predefined second class of labels.