CN113658652A - Binary relation extraction method based on electronic medical record data text - Google Patents
- Publication number: CN113658652A
- Authority: CN (China)
- Legal status: Granted (status as listed by Google Patents; not a legal conclusion)
Classifications
- G: PHYSICS
  - G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    - G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
      - G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
        - G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
- G: PHYSICS
  - G06: COMPUTING; CALCULATING OR COUNTING
    - G06F: ELECTRIC DIGITAL DATA PROCESSING
      - G06F40/00: Handling natural language data
        - G06F40/20: Natural language analysis
          - G06F40/279: Recognition of textual entities
            - G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
              - G06F40/295: Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Primary Health Care (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Epidemiology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention relates to the field of electronic medical records and discloses a binary relation extraction method based on electronic medical record data text, comprising the following extraction steps: a. inputting source data and preprocessing it; b. extracting the corresponding binary relations from the preprocessed source text; c. matching the extraction results by ID, then sorting, storing, and outputting them. The invention does not restrict or fix the content to be extracted; the only constraint is that relations between named entities are binary. For example, with a simple template setting, the method can traverse current patient discharge summary texts and rapidly extract their key content, supporting subsequent modeling research on clinical decision-making problems. Compared with dictionary-based and deep-learning-based methods, it is faster and extracts more detailed and accurate content.
Description
Technical Field
The invention relates to the field of electronic medical records, in particular to a binary relation extraction method based on electronic medical record data texts.
Background
An electronic medical record stores, manages, transmits, and reproduces the digital information (text, symbols, charts, graphs, data, images, and the like) that medical staff generate through a medical institution's information system during medical activities. Analyzing electronic medical records can reveal a large amount of medical knowledge closely related to the patient.
Electronic medical records contain a large amount of text describing a patient's condition, and a patient admitted to hospital is often affected by several complications at once. Personalized medical and health research on such cases requires extracting the useful information from the electronic medical record. Existing extraction methods are usually based on dictionaries or deep learning, but in practical applications they extract relatively slowly, and the detail and accuracy of the extracted content leave room for improvement.
Disclosure of Invention
The invention aims to provide a binary relation extraction method based on an electronic medical record data text.
To achieve this purpose, the invention adopts the following technical scheme: a binary relation extraction method based on electronic medical record data text, comprising the following extraction steps:
a. inputting source data and preprocessing it;
b. extracting the corresponding binary relations from the preprocessed source text;
c. matching the extraction results by ID, then sorting, storing, and outputting them.
Further, the preprocessing comprises the following steps:
A101, text normalization: one or more of removing special text, replacing ambiguous text, and replacing abnormal text;
A102, clause splitting: configuring the punctuation marks used to split sentences according to the characteristics of the text;
A103, redundancy removal: removing blanks, cleaning out duplicate and abnormal data, and putting the processed content into a library of sentences to be extracted.
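A minimal Python sketch of steps A101 to A103 follows. It assumes each input record is a `(record_id, text)` pair; the special-character set, the clause delimiters, and the names `normalize`, `split_clauses`, and `build_sentence_library` are illustrative assumptions, not taken from the patent. The period is deliberately left out of the delimiters so that values such as "1.5" survive, and "abnormal data" cleaning is reduced here to dropping empty clauses.

```python
import re

# Hypothetical sets; the patent does not list the exact characters it uses.
SPECIAL_CHARS = re.compile(r"[*#※●◆]")            # A101: special text to remove
CLAUSE_DELIMS = re.compile(r"[，,；;。！!？?\n]")   # A102: assumed clause delimiters


def normalize(text: str, replacements: dict) -> str:
    """A101: remove special characters, then apply literal replacements in order."""
    text = SPECIAL_CHARS.sub("", text)
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def split_clauses(text: str) -> list:
    """A102: split one record into clauses at the configured punctuation."""
    return CLAUSE_DELIMS.split(text)


def build_sentence_library(records, replacements=None):
    """A101-A103: return a de-duplicated list of (record_id, clause) pairs."""
    replacements = replacements or {}
    library, seen = [], set()
    for record_id, text in records:
        for clause in split_clauses(normalize(text, replacements)):
            clause = " ".join(clause.split())      # A103: strip and collapse blanks
            key = (record_id, clause)
            if not clause or key in seen:          # A103: drop empty and repeated clauses
                continue
            seen.add(key)
            library.append(key)
    return library
```

For example, `build_sentence_library([("p1", "panic 3+ months; self-injury 15+ days")])` yields one entry per clause, each still tagged with its record ID so that step c can match results back to the source.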
Further, step b comprises the following extraction steps:
B101, creating intermediate cache variables and traversing the texts in the library of sentences to be extracted;
B102, parsing and splitting the text according to the normal form and judging whether it contains numbers; if not, adding the sentence to a cache library and returning to step B101 to traverse the next sentence;
B103, if a number is extracted successfully, judging whether number + unit can be extracted; if the unit cannot be extracted, adding the sentence to an abnormal library for manual processing and checking;
B104, counting the extracted numbers; if there is one, adding the current sentence to the cache library so that it corresponds to the current formatted content; if there is more than one, judging their positions and updating the binary correspondence.
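Steps B101 to B104 can be sketched as a single loop over the sentence library, as below. This is an assumption-laden illustration: the unit vocabulary is hypothetical (the patent does not enumerate units), and B104 is interpreted as pairing each number + unit with the text that precedes it, so that a clause with several matches is split by position into several pairs. The one-match case is treated the same way rather than copied to the cache library.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?\+?")
# Hypothetical unit vocabulary; the patent does not list the units it accepts.
NUMBER_UNIT = re.compile(
    r"(\d+(?:\.\d+)?\+?)\s*(years?|months?|weeks?|days?|hours?|mg|ml|g|tablets?)\b",
    re.IGNORECASE,
)


def extract_relations(sentence_library):
    """Run B101-B104 over (record_id, sentence) pairs.

    Returns (relations, cache, abnormal); each relation is
    (record_id, entity, value, unit).
    """
    relations, cache, abnormal = [], [], []        # B101: intermediate cache variables
    for record_id, sentence in sentence_library:
        if not NUMBER.search(sentence):            # B102: no number, keep and move on
            cache.append((record_id, sentence))
            continue
        matches = list(NUMBER_UNIT.finditer(sentence))
        if not matches:                            # B103: number without a unit, manual review
            abnormal.append((record_id, sentence))
            continue
        # B104: pair each number+unit with the text between the previous match
        # and this one, so several matches in one clause are split by position.
        start = 0
        for m in matches:
            entity = sentence[start:m.start()].strip(" ,;:")
            relations.append((record_id, entity, m.group(1), m.group(2)))
            start = m.end()
    return relations, cache, abnormal
```

Keeping the value as text (for example "3+") preserves the approximate-duration qualifier that is common in chief complaints.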
Further, the preprocessing also comprises manual assisted processing.
Further, the source data includes txt, xlsx, or csv text files.
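Reading the three source formats could look like the following sketch. It assumes the chief-complaint text sits in a single column (the first by default), that a csv file has a header row, and that record IDs are assigned sequentially after empty rows are dropped; a real export would more likely carry its own patient or visit ID column. The xlsx branch relies on pandas `read_excel` (which needs openpyxl) and is only exercised when such a file is passed.

```python
import csv
from pathlib import Path


def load_source(path, column=0):
    """Read one column of free text and return (record_id, text) pairs.

    Record IDs are sequential strings assigned after empty rows are dropped
    (an assumption made for illustration).
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":                       # one record per non-empty line
        texts = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    elif suffix == ".csv":                     # first row treated as a header
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
        texts = [row[column] for row in rows[1:] if len(row) > column and row[column].strip()]
    elif suffix == ".xlsx":                    # needs pandas and openpyxl installed
        import pandas as pd
        frame = pd.read_excel(path, usecols=[column])
        texts = frame.iloc[:, 0].dropna().astype(str).tolist()
    else:
        raise ValueError(f"unsupported source format: {suffix}")
    return [(str(i), text) for i, text in enumerate(texts)]
```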
The beneficial effects of the invention are as follows:
the invention does not restrict or fix the content to be extracted; the only constraint is that relations between named entities are binary. For example, with a simple template setting, the method can traverse current patient discharge summary texts and rapidly extract their key content, supporting subsequent modeling research on clinical decision-making problems. Compared with traditional dictionary-based and deep-learning-based methods, it is faster and extracts more detailed and accurate content.
Drawings
FIG. 1 is a flow chart of the extraction process of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the invention provides a binary relation extraction method based on electronic medical record data text, which can quickly and automatically extract user-defined binary relation pairs from the text, such as symptom + duration and medication + dosage. It specifically comprises the following extraction steps:
a. Source data input, i.e., a single column of chief-complaint data, followed by source data preprocessing; in this embodiment, the source data includes txt, xlsx, or csv text files;
the preprocessing of the source data specifically comprises the following steps:
A101, text normalization: one or more of removing special text, replacing ambiguous text, and replacing abnormal text;
for example: removing special Chinese characters such as "therefore";
removing punctuation marks, and the like;
replacing ambiguous expressions such as "one and half year" and "2 years and 7 months";
replacing abnormal text such as "reverse the same 1 question" and "2 episodes of loss of consciousness within 20 years".
A102, clause splitting: configuring the punctuation marks used to split sentences according to the characteristics of the text; for example, each symptom + duration forms one clause, as in "the patient has felt panic for 3+ months" and "aggravated with self-injury for 15+ days";
A103, redundancy removal: removing blanks, cleaning out duplicate and abnormal data, and putting the processed content into the library (list) of sentences to be extracted.
In step a, the preprocessing also comprises manual assisted processing, in which special symptoms and special complaints are removed by hand.
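The "replacing ambiguous text" operation of A101 could be implemented as an ordered list of regular-expression rules, sketched below. The target forms are assumptions: here "one and half year" becomes "1.5 years" and "2 years and 7 months" becomes a single month count, so that step b sees exactly one number + unit per duration. The rules and the `WORD_NUMBERS` table are hypothetical and operate on English phrasings; the real records are Chinese and would need their own patterns.

```python
import re

WORD_NUMBERS = {"one": 1, "two": 2, "three": 3}   # hypothetical; extend as needed

# Ordered (pattern, replacement) rules, applied top to bottom.
RULES = [
    # "one and half year" -> "1.5 years"
    (re.compile(r"\b(one|two|three) and (?:a )?half years?\b", re.IGNORECASE),
     lambda m: f"{WORD_NUMBERS[m.group(1).lower()] + 0.5} years"),
    # "2 years and 7 months" -> "31 months"
    (re.compile(r"\b(\d+) years? and (\d+) months?\b", re.IGNORECASE),
     lambda m: f"{int(m.group(1)) * 12 + int(m.group(2))} months"),
]


def normalize_durations(text):
    """Rewrite compound or verbal durations into a single number + unit."""
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text
```

Collapsing a compound duration into one value avoids step B104 treating "2 years" and "7 months" as two separate relations.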
b. Extracting the corresponding binary relations from the preprocessed source text;
in this embodiment, this step specifically comprises:
B101, creating intermediate cache variables and traversing the texts in the library of sentences to be extracted;
B102, parsing and splitting the text according to the normal form and judging whether it contains numbers; if not, adding the sentence to a cache library and returning to step B101 to traverse the next sentence;
B103, if a number is extracted successfully, judging whether number + unit can be extracted; if the unit cannot be extracted, adding the sentence to an abnormal library for manual processing and checking;
B104, first judging whether number + unit has been extracted; if so, counting the extracted items according to the characteristics of electronic medical record text. If there is one, the current sentence is added to the cache library so that it corresponds to the current formatted content. If there is more than one (i.e., two or three), their positions are judged and the binary correspondence is updated.
c. Matching the extraction results by ID, then sorting, storing, and outputting them; in this step, three lists are saved and converted into CSV output files through a DataFrame.
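Step c can be sketched as follows. Two assumptions apply: the three lists are taken to be the extracted relations, the cache library, and the abnormal library, and the standard-library `csv` module stands in for the DataFrame conversion named in the text. Sorting by record ID (a stable sort) groups every result belonging to the same record together; the file names are hypothetical.

```python
import csv
from pathlib import Path


def write_outputs(out_dir, relations, cache, abnormal):
    """Sort each list by record ID and write it to its own CSV file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tables = {                                  # hypothetical file names and headers
        "relations.csv": (["record_id", "entity", "value", "unit"], relations),
        "cache.csv": (["record_id", "sentence"], cache),
        "abnormal.csv": (["record_id", "sentence"], abnormal),
    }
    for name, (header, rows) in tables.items():
        with (out_dir / name).open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            # Stable sort on the ID column: rows of one record stay together
            # and keep their original order within that record.
            writer.writerows(sorted(rows, key=lambda row: row[0]))
```

IDs are compared as strings here, so numeric IDs of different lengths would need a numeric sort key.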
The invention does not restrict or fix the content to be extracted; the only constraint is that relations between named entities are binary. For example, with a simple template setting, the method can traverse current patient discharge summary texts and rapidly extract their key content, supporting subsequent modeling research on clinical decision-making problems. Compared with traditional dictionary-based and deep-learning-based methods, it is faster and extracts more detailed and accurate content.
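One possible reading of the "simple template setting" is sketched below: each user-defined relation type (such as symptom + duration or medication + dosage) is recognized by the unit attached to its value. The `TEMPLATES` table, the `BinaryRelation` record, and the `classify` and `make_relation` helpers are hypothetical; the patent does not specify how templates are represented.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical templates: a relation type is recognized by the unit of its value.
TEMPLATES = {
    "symptom+duration": {"year", "years", "month", "months", "week", "weeks", "day", "days"},
    "medication+dosage": {"mg", "g", "ml", "tablet", "tablets"},
}


@dataclass(frozen=True)
class BinaryRelation:
    record_id: str
    entity: str                      # e.g. a symptom or a medication name
    value: str                       # kept as text so qualifiers such as "3+" survive
    unit: str
    relation_type: Optional[str] = None


def classify(unit, templates=TEMPLATES):
    """Return the first relation type whose unit set contains `unit`, else None."""
    for relation_type, units in templates.items():
        if unit.lower() in units:
            return relation_type
    return None


def make_relation(record_id, entity, value, unit):
    """Build a relation and tag it with the matching template, if any."""
    return BinaryRelation(record_id, entity, value, unit, classify(unit))
```

Adding a new relation type then only means adding one entry to the template table, which matches the stated goal of not fixing the extracted content in advance.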
It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and elements referred to are not necessarily required in this application.
Claims (5)
1. A binary relation extraction method based on electronic medical record data text, characterized by comprising the following extraction steps:
a. inputting source data and preprocessing it;
b. extracting the corresponding binary relations from the preprocessed source text;
c. matching the extraction results by ID, then sorting, storing, and outputting them.
2. The binary relation extraction method based on electronic medical record data text according to claim 1, wherein the preprocessing comprises the following steps:
A101, text normalization: one or more of removing special text, replacing ambiguous text, and replacing abnormal text;
A102, clause splitting: configuring the punctuation marks used to split sentences according to the characteristics of the text;
A103, redundancy removal: removing blanks, cleaning out duplicate and abnormal data, and putting the processed content into a library of sentences to be extracted.
3. The binary relation extraction method based on electronic medical record data text according to claim 1, wherein step b comprises the following extraction steps:
B101, creating intermediate cache variables and traversing the texts in the library of sentences to be extracted;
B102, parsing and splitting the text according to the normal form and judging whether it contains numbers; if not, adding the sentence to a cache library and returning to step B101 to traverse the next sentence;
B103, if a number is extracted successfully, judging whether number + unit can be extracted; if the unit cannot be extracted, adding the sentence to an abnormal library for manual processing and checking;
B104, counting the extracted numbers; if there is one, adding the current sentence to the sentences in the cache library so that it corresponds to the current formatted content; if there is more than one, judging their positions and updating the binary correspondence.
4. The binary relation extraction method based on electronic medical record data text according to claim 1, wherein the preprocessing further comprises manual assisted processing.
5. The binary relation extraction method based on electronic medical record data text according to claim 1, wherein the source data includes txt, xlsx, or csv text files.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110946939.7A CN113658652B (en) | 2021-08-18 | 2021-08-18 | Binary relation extraction method based on electronic medical record data text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113658652A (en) | 2021-11-16 |
CN113658652B (en) | 2023-07-28 |
Family ID: 78480800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110946939.7A Active CN113658652B (en) | 2021-08-18 | 2021-08-18 | Binary relation extraction method based on electronic medical record data text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113658652B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116665832A (en) * | 2023-06-01 | 2023-08-29 | 湖南首辰健康科技有限公司 | Intelligent quality control method, device, equipment and storage medium based on patient medical record |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107147639A (en) * | 2017-05-08 | 2017-09-08 | 国家电网公司 | A kind of actual time safety method for early warning based on Complex event processing |
CN110069623A (en) * | 2017-12-06 | 2019-07-30 | 腾讯科技(深圳)有限公司 | Summary texts generation method, device, storage medium and computer equipment |
CN110427491A (en) * | 2019-07-04 | 2019-11-08 | 北京爱医生智慧医疗科技有限公司 | A kind of medical knowledge map construction method and device based on electronic health record |
CN111223539A (en) * | 2019-12-30 | 2020-06-02 | 同济大学 | Method for extracting relation of Chinese electronic medical record |
CN111352987A (en) * | 2020-02-28 | 2020-06-30 | 汤学民 | Electronic medical record structuring method, system and related equipment |
CN113130025A (en) * | 2020-01-16 | 2021-07-16 | 中南大学 | Entity relationship extraction method, terminal equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113658652B (en) | 2023-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109299472B (en) | Text data processing method and device, electronic equipment and computer readable medium | |
CN111428036B (en) | Entity relationship mining method based on biomedical literature | |
Al‐Sughaiyer et al. | Arabic morphological analysis techniques: A comprehensive survey | |
CN112464667B (en) | Text entity identification method and device, electronic equipment and storage medium | |
CN111291568B (en) | Automatic entity relationship labeling method applied to medical texts | |
AU2019203783B2 (en) | Extraction of tokens and relationship between tokens from documents to form an entity relationship map | |
CN113159969A (en) | Financial long text rechecking system | |
CN113658652A (en) | Binary relation extraction method based on electronic medical record data text | |
US11113469B2 (en) | Natural language processing matrices | |
CN113254651B (en) | Method and device for analyzing referee document, computer equipment and storage medium | |
CN110705319A (en) | Translation method | |
CN113111660A (en) | Data processing method, device, equipment and storage medium | |
Čibej et al. | Normalisation, tokenisation and sentence segmentation of Slovene tweets | |
Andrews | Digital Techniques for Critical Edition | |
CN111326262B (en) | Entity relation extraction method, device and system in electronic medical record data | |
Sodhar et al. | Word by Word Labelling of Romanized Sindhi Text by using Online Python Tool | |
Aliyu et al. | SED: An Algorithm for Automatic Identification of Section and Subsection Headings in Text Documents | |
RU2751993C1 (en) | Method for extracting information from unstructured texts written in natural language | |
AU2021106441A4 (en) | Method, System and Device for Extracting Compound Words of Pathological location in Medical Texts Based on Word-Formation | |
CN111415751B (en) | Topic segmentation method, device and system for electronic medical record data | |
US11783112B1 (en) | Framework agnostic summarization of multi-channel communication | |
Carvalho et al. | Towards Unsupervised Word Error Correction in Textual Big Data. | |
CN115905297B (en) | Method, apparatus and medium for retrieving data | |
Özge et al. | Diacritics correction in Turkish with context-aware sequence to sequence modeling | |
Ruis et al. | Human-in-the-loop Language-agnostic Extraction of Medication Data from Highly Unstructured Electronic Health Records |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |