CN113658652B - Binary relation extraction method based on electronic medical record data text - Google Patents
- Publication number
- CN113658652B
- Authority
- CN
- China
- Prior art keywords
- extracted
- text
- binary relation
- extraction method
- electronic medical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Primary Health Care (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Epidemiology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention relates to the field of electronic medical records and discloses a binary relation extraction method based on electronic medical record data text. The method comprises the following extraction steps: a. inputting source data and preprocessing it; b. extracting the corresponding binary relations from the preprocessed source data text; c. matching the extracted results by ID, then sorting, storing and outputting them. The invention neither restricts the content to be extracted nor fixes the relation between the named entities in a binary relation. For example, with a simple template setting the method can traverse the discharge summary text of a current patient and rapidly extract the key content, which assists subsequent modeling research on clinical decision problems. The method has the advantages of faster extraction and more detailed and accurate extracted content.
Description
Technical Field
The invention relates to the field of electronic medical records, in particular to a binary relation extraction method based on electronic medical record data texts.
Background
An electronic medical record is a medical record that medical staff store, manage, transmit and reproduce during medical activities, using digital information such as text, symbols, charts, graphs, data and images generated by a medical institution's information system. Analyzing electronic medical records makes it possible to mine a large amount of medical knowledge closely related to patients.
Electronic medical records contain a large amount of text describing the patient's condition, and a single patient is often admitted with multiple complications. Research on personalized medical and health information for such cases requires extracting the effective information from the electronic medical record. Existing extraction methods are usually based on dictionaries or deep learning, but in practical applications they are relatively slow, and the level of detail and the accuracy of the extracted content need to be improved.
Disclosure of Invention
The invention aims to provide a binary relation extraction method based on electronic medical record data text.
To achieve this aim, the invention adopts the following technical scheme: a binary relation extraction method based on electronic medical record data text, comprising the following extraction steps:
a. inputting source data, and preprocessing the source data;
b. extracting the corresponding binary relations from the preprocessed source data text;
c. matching the extracted results by ID, then sorting, storing and outputting them.
Further, the preprocessing comprises the following steps:
A101, text normalization: one or more operations including removing special text, replacing ambiguous text, and handling exception replacements;
A102, sentence splitting: according to the characteristics of the text, punctuation marks are set as sentence delimiters;
A103, removing redundancy: blanks are removed, repeated and abnormal data are cleaned, and the processed content is placed into the sentence library to be extracted.
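The three preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the character set `SPECIAL_CHARS`, the delimiter set `CLAUSE_DELIMITERS` and all function names are assumptions, since the text does not list the actual characters or punctuation marks used.

```python
import re

# Hypothetical settings: the patent does not specify these values.
SPECIAL_CHARS = "\"'“”()[]"
CLAUSE_DELIMITERS = "，,。；;"


def normalize(text):
    """A101 (partial): strip an illustrative set of special characters."""
    return text.translate({ord(ch): None for ch in SPECIAL_CHARS})


def split_clauses(text):
    """A102: split the text on the configured punctuation marks."""
    pattern = "[" + re.escape(CLAUSE_DELIMITERS) + "]"
    return re.split(pattern, text)


def clean(clauses):
    """A103: drop blank and repeated clauses while preserving order."""
    seen = set()
    result = []
    for clause in clauses:
        clause = clause.strip()
        if clause and clause not in seen:
            seen.add(clause)
            result.append(clause)
    return result


def preprocess(text):
    """Run A101 to A103 and return the sentence library as a list."""
    return clean(split_clauses(normalize(text)))
```

Calling `preprocess` on one record returns the list that plays the role of the sentence library to be extracted; replacing ambiguous text and handling exceptions would need additional rules that the text does not describe.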
Further, the step b includes the following extraction steps:
B101, creating a new intermediate cache variable, and traversing the texts to be extracted in the sentence library to be extracted;
B102, parsing and splitting the sentence according to the pattern (paradigm), and judging whether the text contains a number; if no number is extracted, adding the sentence to the cache library and returning to step B101 to traverse the next sentence;
B103, if a number is successfully extracted, judging whether a number plus unit can be extracted; if the unit cannot be extracted, adding the sentence to the abnormal library for manual inspection and processing;
B104, judging how many numbers were extracted; if there is exactly one, adding the current sentence to the cache library so that it corresponds to the current formatted content; if there is more than one, performing position judgment and updating the binary correspondence.
Further, the preprocessing also comprises manual auxiliary processing.
Further, the source data includes txt, xlsx or csv text.
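Reading the source data could look roughly like the sketch below. It covers only txt and csv input with the Python standard library; xlsx is left out because it needs a third-party reader. The function name `load_records`, the column names `id` and `text`, and the use of the line number as the ID for txt input are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def load_records(path, id_column="id", text_column="text"):
    """Return a list of (record_id, text) pairs from a .txt or .csv file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Assumption: one record per non-empty line, numbered from 1.
        with path.open(encoding="utf-8") as handle:
            lines = [line.strip() for line in handle]
        non_empty = [line for line in lines if line]
        return [(str(i), line) for i, line in enumerate(non_empty, start=1)]
    if suffix == ".csv":
        # Assumption: the file has a header row naming the two columns.
        with path.open(encoding="utf-8", newline="") as handle:
            return [(row[id_column], row[text_column])
                    for row in csv.DictReader(handle)]
    raise ValueError(f"unsupported source format: {suffix}")
```

Keeping an ID with every record from the start is what later allows the extracted results to be matched back by ID in step c.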
The beneficial effects of the invention are mainly as follows:
The invention neither restricts the content to be extracted nor fixes the relation between the named entities in a binary relation. For example, with a simple template setting the method can traverse the discharge summary text of a current patient and rapidly extract the key content, which assists subsequent modeling research on clinical decision problems. Compared with traditional dictionary-based and deep learning methods, the method is faster and the extracted content is more detailed and accurate.
Drawings
Fig. 1 is an extraction flow chart of the present invention.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the binary relation extraction method based on electronic medical record data text can rapidly and automatically extract user-defined binary relation pairs from text, such as symptom and duration, or medication and dosage. It specifically comprises the following extraction steps:
a. Source data input: a list of independent data items is input and subjected to source data preprocessing; in this embodiment the source data comprises txt, xlsx or csv text;
The preprocessing of the source data specifically comprises the following steps:
A101, text normalization: one or more operations including removing special text, replacing ambiguous text, and handling exception replacements;
for example: removing special Chinese characters: "cause", "due", etc.;
removing punctuation marks: "etc.;
replacing ambiguous Chinese characters: "one and a half years", "2 years 7 months", etc.;
handling exception replacements: "reverse the same 1 problem", "2 episodes of loss of consciousness within 20 years", etc.
A102, sentence splitting: according to the characteristics of the text, punctuation marks are set as sentence delimiters; for example, each symptom + duration forms one clause: "palpitation is afraid of 3+ months", "aggravating 15+ days accompanying the injury";
A103, removing redundancy: blanks are removed, repeated and abnormal data are cleaned, and the processed content is placed into the sentence library (a list) to be extracted.
In step a, the preprocessing further comprises manual auxiliary processing, in which special symptoms and special complaints are removed manually.
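The replacement of ambiguous durations mentioned in A101 might be handled along these lines. This is only a sketch under stated assumptions: the text gives the example phrases but not what they are replaced with, so the target form (a single value in months), the table `WORD_REPLACEMENTS` and the function name are hypothetical, and English phrases stand in for the original Chinese wording.

```python
import re

MONTHS_PER_YEAR = 12

# Hypothetical lookup table for durations written without digits.
WORD_REPLACEMENTS = {"one and a half years": "18 months"}

# Matches compound durations such as "2 years 7 months".
COMPOUND = re.compile(r"(\d+)\s*years?\s*(\d+)\s*months?")


def normalize_durations(text):
    """Rewrite ambiguous or compound durations as one number plus one unit."""
    for phrase, replacement in WORD_REPLACEMENTS.items():
        text = text.replace(phrase, replacement)

    def to_months(match):
        years, months = int(match.group(1)), int(match.group(2))
        return f"{years * MONTHS_PER_YEAR + months} months"

    return COMPOUND.sub(to_months, text)
```

Reducing each duration to one number and one unit means a clause yields a single number plus unit match in step b, rather than two numbers that would need position judgment.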
b. Extracting corresponding binary relation from the preprocessed source data text;
The specific extraction steps of this step in this embodiment are:
B101, creating a new intermediate cache variable, and traversing the texts to be extracted in the sentence library to be extracted;
B102, parsing and splitting the sentence according to the pattern (paradigm), and judging whether the text contains a number; if no number is extracted, adding the sentence to the cache library and returning to step B101 to traverse the next sentence;
B103, if a number is successfully extracted, judging whether a number plus unit can be extracted; if the unit cannot be extracted, adding the sentence to the abnormal library for manual inspection and processing;
B104, first judging whether a number plus unit has been extracted; if so, judging how many items were extracted, according to the medical text characteristics of the electronic medical record; if there is one, adding the current sentence to the cache library so that it corresponds to the current formatted content; if there is more than one, namely two or three, performing position judgment and updating the binary correspondence.
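One possible reading of steps B101 to B104 is sketched below. The unit list `UNITS`, the regular expressions and the function name are assumptions, and the position judgment shown here (each value is paired with the text between the previous match and itself) is only one plausible interpretation; the text does not define it. For simplicity the sketch records a pair directly when exactly one match is found, whereas the text routes that sentence through the cache library.

```python
import re

# Hypothetical unit list; the patent does not enumerate the recognised units.
UNITS = ("years", "year", "months", "month", "weeks", "week",
         "days", "day", "mg", "ml")
NUMBER = re.compile(r"\d+(?:\.\d+)?")
NUMBER_WITH_UNIT = re.compile(
    r"(\d+(?:\.\d+)?\+?)\s*(" + "|".join(UNITS) + r")\b"
)


def extract_pairs(sentences):
    """Route each sentence as in B101-B104; return (cache, abnormal, pairs)."""
    cache, abnormal, pairs = [], [], []
    for sentence in sentences:                # B101: traverse the sentence library
        if not NUMBER.search(sentence):       # B102: no number in the sentence
            cache.append(sentence)
            continue
        matches = list(NUMBER_WITH_UNIT.finditer(sentence))
        if not matches:                       # B103: a number but no unit
            abnormal.append(sentence)
            continue
        # B104: one match yields one pair; several matches are paired by
        # position, each value taking the text since the previous match.
        start = 0
        for match in matches:
            entity = sentence[start:match.start()].strip()
            value = f"{match.group(1)} {match.group(2)}"
            pairs.append((entity, value))
            start = match.end()
    return cache, abnormal, pairs
```

The abnormal list corresponds to the sentences set aside for manual inspection, so nothing is silently dropped when a unit is missing.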
c. Matching the extracted results by ID, then sorting, storing and outputting them; in this step three lists are stored and converted into a CSV output file through a dataframe.
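Step c can be sketched as follows. The embodiment converts the lists through a dataframe; this sketch swaps in the standard-library `csv` module so that it stays self-contained. What the three lists hold is not stated in the text, so treating them as IDs, entities and values, along with the column names and the function name, is an assumption.

```python
import csv
import io


def write_results(ids, entities, values, handle):
    """Align three parallel lists by position, sort by ID and write CSV rows."""
    if not (len(ids) == len(entities) == len(values)):
        raise ValueError("the three lists must have the same length")
    # Sort on the ID only; string IDs therefore sort lexicographically.
    rows = sorted(zip(ids, entities, values), key=lambda row: row[0])
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "entity", "value"])  # hypothetical column names
    writer.writerows(rows)


# Usage: write to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_results(["2", "1"], ["fever", "cough"], ["2 days", "3 days"], buffer)
```

Passing an open file handle instead of the buffer writes the same rows to disk.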
The invention neither restricts the content to be extracted nor fixes the relation between the named entities in a binary relation. For example, with a simple template setting the method can traverse the discharge summary text of a current patient and rapidly extract the key content, which assists subsequent modeling research on clinical decision problems. Compared with traditional dictionary-based and deep learning methods, the method is faster and the extracted content is more detailed and accurate.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, since some steps may be performed in another order or simultaneously. Those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and elements involved are not necessarily required by the present application.
Claims (4)
1. A binary relation extraction method based on electronic medical record data text, characterized in that the method comprises the following extraction steps:
a. inputting source data, and preprocessing the source data;
b. extracting corresponding binary relation from the preprocessed source data text;
c. matching the extracted results according to the ID, sorting, storing and outputting;
wherein step b comprises the following extraction steps:
B101, creating a new intermediate cache variable, and traversing the texts to be extracted in the sentence library to be extracted;
B102, parsing and splitting the sentence according to the pattern (paradigm), and judging whether the text contains a number; if no number is extracted, adding the sentence to the cache library and returning to step B101 to traverse the next sentence;
B103, if a number is successfully extracted, judging whether a number plus unit can be extracted; if the unit cannot be extracted, adding the sentence to the abnormal library for manual inspection and processing;
B104, judging how many numbers were extracted; if there is exactly one, adding the current sentence to the sentences in the cache library, corresponding to the current formatted content; if there is more than one, performing position judgment and updating the binary correspondence.
2. The binary relation extraction method based on electronic medical record data text according to claim 1, characterized in that the preprocessing comprises the following steps:
A101, text normalization: one or more operations including removing special text, replacing ambiguous text, and handling exception replacements;
A102, sentence splitting: according to the characteristics of the text, punctuation marks are set as sentence delimiters;
A103, removing redundancy: blanks are removed, repeated and abnormal data are cleaned, and the processed content is placed into the sentence library to be extracted.
3. The binary relation extraction method based on electronic medical record data text according to claim 1, characterized in that the preprocessing further comprises manual auxiliary processing.
4. The binary relation extraction method based on electronic medical record data text according to claim 1, characterized in that the source data includes txt, xlsx or csv text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110946939.7A CN113658652B (en) | 2021-08-18 | 2021-08-18 | Binary relation extraction method based on electronic medical record data text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113658652A | 2021-11-16 |
CN113658652B | 2023-07-28 |
Family
ID=78480800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110946939.7A (Active, CN113658652B) | Binary relation extraction method based on electronic medical record data text | 2021-08-18 | 2021-08-18 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113658652B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116665832A (en) * | 2023-06-01 | 2023-08-29 | 湖南首辰健康科技有限公司 | Intelligent quality control method, device, equipment and storage medium based on patient medical record |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107147639A (en) * | 2017-05-08 | 2017-09-08 | 国家电网公司 | A kind of actual time safety method for early warning based on Complex event processing |
CN110069623A (en) * | 2017-12-06 | 2019-07-30 | 腾讯科技(深圳)有限公司 | Summary texts generation method, device, storage medium and computer equipment |
CN113130025A (en) * | 2020-01-16 | 2021-07-16 | 中南大学 | Entity relationship extraction method, terminal equipment and computer readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427491B (en) * | 2019-07-04 | 2020-05-12 | 北京爱医生智慧医疗科技有限公司 | Medical knowledge graph construction method and device based on electronic medical record |
CN111223539A (en) * | 2019-12-30 | 2020-06-02 | 同济大学 | Method for extracting relation of Chinese electronic medical record |
CN111352987A (en) * | 2020-02-28 | 2020-06-30 | 汤学民 | Electronic medical record structuring method, system and related equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113658652A (en) | 2021-11-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||