CN113658652A - Binary relation extraction method based on electronic medical record data text - Google Patents

Info

Publication number
CN113658652A
Authority
CN
China
Prior art keywords
extracted
text
sentence
electronic medical
medical record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110946939.7A
Other languages
Chinese (zh)
Other versions
CN113658652B (en)
Inventor
朱婷
张伟
刘瑞航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
West China Hospital of Sichuan University
Original Assignee
West China Hospital of Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by West China Hospital of Sichuan University
Priority to CN202110946939.7A
Publication of CN113658652A
Application granted
Publication of CN113658652B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the field of electronic medical records and discloses a binary relation extraction method based on electronic medical record data text, comprising the following extraction steps: a. inputting source data and preprocessing it; b. extracting the corresponding binary relations from the preprocessed source data text; c. matching the extracted results by ID, then sorting, storing and outputting them. The invention does not restrict or fix the content to be extracted; the only constraint is that the relation between named entities is binary. For example, with a simple template setting, the method can traverse the discharge summary text of current patients and rapidly extract the key content, which assists subsequent modeling research on clinical decision-making problems. The method is fast, and the extracted content is more detailed and accurate.

Description

Binary relation extraction method based on electronic medical record data text
Technical Field
The invention relates to the field of electronic medical records, in particular to a binary relation extraction method based on electronic medical record data texts.
Background
An electronic medical record is the digital information (text, symbols, charts, graphs, data, images and the like) that medical staff generate with a medical institution's information system during medical activities, and that can be stored, managed, transmitted and reproduced. Analyzing electronic medical records makes it possible to mine a great deal of medical knowledge closely related to the patient.
Electronic medical records contain a large amount of text describing a patient's condition, and a patient who is admitted often presents with several diseases or complications at once. Personalized medical and health information research on such cases requires extracting the useful information from the electronic medical record. Existing extraction methods are usually based on dictionaries and deep learning, but in practical application they are relatively slow, and the detail and accuracy of the extracted content need improvement.
Disclosure of Invention
The invention aims to provide a binary relation extraction method based on an electronic medical record data text.
To achieve this aim, the invention adopts the following technical solution: a binary relation extraction method based on electronic medical record data text, comprising the following extraction steps:
a. inputting source data and preprocessing the source data;
b. extracting the corresponding binary relations from the preprocessed source data text;
c. matching the extracted results by ID, then sorting, storing and outputting them.
Further, the preprocessing comprises the following steps:
A101, text normalization: one or more of removing special text, replacing ambiguous text, and handling anomalous replacements;
A102, sentence splitting: setting the punctuation marks used for sentence division according to the characteristics of the text;
A103, removing redundancy and blanks while cleaning repeated and abnormal data, and putting the processed content into a to-be-extracted sentence library.
Further, step b comprises the following extraction steps:
B101, establishing an intermediate cache variable and traversing the texts in the to-be-extracted sentence library;
B102, parsing and splitting each sentence according to the pattern and judging whether the text contains a number; if not, adding the sentence to a cache library and returning to step B101 to traverse the next sentence;
B103, if a number is successfully extracted, judging whether a number plus unit can be extracted; if the unit cannot be extracted, adding the sentence to an abnormal library for manual processing and checking;
B104, judging how many items were extracted; if there is one, adding the current sentence to the cache library in correspondence with the current formatted content; if there is more than one, judging their positions and updating the binary correspondence.
Further, the preprocessing also comprises manual assistance.
Further, the source data comprises txt, xlsx, or csv text.
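As an illustration of the input step, the following minimal sketch loads single-column source text from a txt or csv file using only the Python standard library. The function name `load_source`, the use of the line number as the record ID, and the `column` parameter are assumptions made for this example and do not come from the patent. Reading xlsx would need a third-party reader and is left out.

```python
import csv
from pathlib import Path


def load_source(path, column=0):
    """Return {record_id: text} from a .txt or .csv file.

    Assumptions (not from the patent): one record per line, the 1-based
    line number serves as the record ID, and CSV text sits in `column`.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        with path.open(encoding="utf-8") as handle:
            lines = [line.rstrip("\n") for line in handle]
    elif suffix == ".csv":
        with path.open(encoding="utf-8", newline="") as handle:
            lines = [row[column] for row in csv.reader(handle) if row]
    else:
        # .xlsx needs a third-party reader; it is deliberately omitted here.
        raise ValueError(f"unsupported source format: {suffix}")
    return {str(i): text for i, text in enumerate(lines, start=1)}
```

Keying every record by an ID at load time is what later allows the extracted results to be matched back to their source row.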
The beneficial effects of the invention are mainly as follows:
the invention does not restrict or fix the content to be extracted; the only constraint is that the relation between named entities is binary. For example, with a simple template setting, the method can traverse the discharge summary text of current patients and rapidly extract the key content, which assists subsequent modeling research on clinical decision-making problems. Compared with traditional methods based on dictionaries and deep learning, the method is faster and the extracted content is more detailed and accurate.
Drawings
FIG. 1 is a flow chart of the extraction process of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the invention provides a binary relation extraction method based on electronic medical record data text, which can quickly and automatically extract user-defined binary relation pairs from the text, such as symptom + duration, medication + dosage, and the like. It specifically comprises the following extraction steps:
a. inputting source data, that is, a single column of chief-complaint data, and preprocessing the source data; in this embodiment the source data comprises txt, xlsx, or csv text;
the preprocessing of the source data specifically comprises the following steps:
A101, text normalization: one or more of removing special text, replacing ambiguous text, and handling anomalous replacements;
for example: removing special Chinese characters: "therefore", and the like;
removing punctuation marks: "and the like;
replacing ambiguous Chinese characters: "one and a half years", "2 years and 7 months", etc.;
handling anomalous replacements: "reverse the same 1 question", "2 episodes of loss of consciousness within 20 years", etc.
A102, sentence splitting: setting the punctuation marks used for sentence division according to the characteristics of the text; for example, each symptom + duration forms one sentence, such as "panic for 3+ months" and "aggravation with self-injury for 15+ days";
A103, removing redundancy and blanks while cleaning repeated and abnormal data, and putting the processed content into a to-be-extracted sentence library (a list).
In step a, the preprocessing also comprises manual assistance: special symptoms and special complaints are removed manually.
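The preprocessing steps A101 to A103 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the token list, the replacement table, the delimiter set, and all function names are hypothetical, since the patent does not list its actual rules.

```python
import re

# Hypothetical normalization tables; the patent does not list its actual rules.
SPECIAL_TOKENS = ["therefore"]                      # filler words to drop (A101)
AMBIGUOUS = {"one and a half years": "1.5 years"}   # ambiguous durations (A101)
# Sentence delimiters (A102): ASCII comma and semicolon plus full-width
# comma, semicolon and ideographic full stop. The ASCII period is excluded
# so that decimals such as "1.5" are not split.
SPLIT_PATTERN = re.compile("[,;\uff0c\uff1b\u3002]")


def normalize(text):
    """A101: remove special tokens and replace ambiguous expressions."""
    for token in SPECIAL_TOKENS:
        text = text.replace(token, "")
    for old, new in AMBIGUOUS.items():
        text = text.replace(old, new)
    return text


def split_sentences(text):
    """A102: split on the configured punctuation marks."""
    return SPLIT_PATTERN.split(text)


def preprocess(records):
    """A103: strip blanks, drop empty and repeated sentences, keep the ID."""
    library = []
    seen = set()
    for record_id, text in records.items():
        for sentence in split_sentences(normalize(text)):
            sentence = " ".join(sentence.split())  # collapse whitespace
            key = (record_id, sentence)
            if sentence and key not in seen:
                seen.add(key)
                library.append(key)
    return library
```

Each entry of the resulting sentence library carries its record ID, so that results can still be matched by ID after extraction.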
b. Extracting the corresponding binary relations from the preprocessed source data text;
in this embodiment the specific extraction steps are:
B101, establishing an intermediate cache variable and traversing the texts in the to-be-extracted sentence library;
B102, parsing and splitting each sentence according to the pattern and judging whether the text contains a number; if not, adding the sentence to a cache library and returning to step B101 to traverse the next sentence;
B103, if a number is successfully extracted, judging whether a number plus unit can be extracted; if the unit cannot be extracted, adding the sentence to an abnormal library for manual processing and checking;
B104, in this step it is first confirmed that a number + unit has been extracted; if so, the number of extracted items is judged according to the characteristics of electronic medical record text. If there is one item, the current sentence is added to the cache library in correspondence with the current formatted content. If there is more than one item, that is, two or three, their positions are judged and the binary correspondence is updated.
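A rough sketch of the routing logic in steps B101 to B104 is given below. The unit vocabulary, the regular expressions, and the position rule (each value is paired with the text that precedes it) are assumptions made for illustration; the patent does not specify them. The three returned lists are likewise an interpretation of the cache library and abnormal library described above.

```python
import re

# Hypothetical unit vocabulary; the patent does not enumerate its units.
UNITS = r"(?:years?|months?|weeks?|days?|hours?|mg|ml)"
NUMBER = re.compile(r"\d+(?:\.\d+)?")
NUMBER_UNIT = re.compile(r"(\d+(?:\.\d+)?\+?)\s*(" + UNITS + r")\b")


def extract(sentences):
    """Route (record_id, sentence) pairs into three lists, following B101-B104."""
    relations, cache, abnormal = [], [], []
    for record_id, sentence in sentences:          # B101: traverse the library
        if not NUMBER.search(sentence):            # B102: no number present
            cache.append((record_id, sentence))
            continue
        matches = list(NUMBER_UNIT.finditer(sentence))
        if not matches:                            # B103: number without a unit
            abnormal.append((record_id, sentence))
            continue
        # B104 (assumed position rule): each number + unit is paired with the
        # text between it and the previous match, or the sentence start.
        start = 0
        for match in matches:
            entity = sentence[start:match.start()].strip(" ,;")
            value = match.group(1) + " " + match.group(2)
            relations.append((record_id, entity, value))
            start = match.end()
    return relations, cache, abnormal
```

Sentences that land in the abnormal list are the ones the method hands over for manual processing and checking.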
c. Matching the extracted results by ID, then sorting, storing and outputting them; in this step, the three lists are stored and converted into CSV output files via a DataFrame.
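The output step can be sketched as below. The embodiment mentions a DataFrame; to stay dependency-free, this example swaps in Python's standard `csv` module instead, so it is an approximation of the described step rather than the embodiment itself. The column names and the `(id, entity, value)` tuple layout are assumptions.

```python
import csv
import io
from collections import defaultdict


def group_by_id(relations):
    """Collect (entity, value) pairs under their record ID, sorted by ID."""
    grouped = defaultdict(list)
    for record_id, entity, value in relations:
        grouped[record_id].append((entity, value))
    return dict(sorted(grouped.items()))


def write_csv(relations, handle):
    """Write one row per binary relation, ordered by record ID."""
    writer = csv.writer(handle)
    writer.writerow(["id", "entity", "value"])  # hypothetical column names
    for record_id, pairs in group_by_id(relations).items():
        for entity, value in pairs:
            writer.writerow([record_id, entity, value])


# Usage: write to an in-memory buffer; a real run would open a .csv file.
buffer = io.StringIO()
write_csv([("p2", "headache", "3 days"), ("p1", "panic", "3+ months")], buffer)
print(buffer.getvalue())
```

The same function could be called once per list (relations, cache, abnormal) to produce the separate output files.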
The invention does not restrict or fix the content to be extracted; the only constraint is that the relation between named entities is binary. For example, with a simple template setting, the method can traverse the discharge summary text of current patients and rapidly extract the key content, which assists subsequent modeling research on clinical decision-making problems. Compared with traditional methods based on dictionaries and deep learning, the method is faster and the extracted content is more detailed and accurate.
It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and elements referred to are not necessarily required in this application.

Claims (5)

1. A binary relation extraction method based on electronic medical record data text, characterized by comprising the following extraction steps:
a. inputting source data and preprocessing the source data;
b. extracting the corresponding binary relations from the preprocessed source data text;
c. matching the extracted results by ID, then sorting, storing and outputting them.
2. The binary relation extraction method based on electronic medical record data text as claimed in claim 1, wherein the preprocessing comprises the following steps:
A101, text normalization: one or more of removing special text, replacing ambiguous text, and handling anomalous replacements;
A102, sentence splitting: setting the punctuation marks used for sentence division according to the characteristics of the text;
A103, removing redundancy and blanks while cleaning repeated and abnormal data, and putting the processed content into a to-be-extracted sentence library.
3. The binary relation extraction method based on electronic medical record data text as claimed in claim 1, wherein step b comprises the following extraction steps:
B101, establishing an intermediate cache variable and traversing the texts in the to-be-extracted sentence library;
B102, parsing and splitting each sentence according to the pattern and judging whether the text contains a number; if not, adding the sentence to a cache library and returning to step B101 to traverse the next sentence;
B103, if a number is successfully extracted, judging whether a number plus unit can be extracted; if the unit cannot be extracted, adding the sentence to an abnormal library for manual processing and checking;
B104, judging how many items were extracted; if there is one, adding the current sentence to the cache library in correspondence with the current formatted content; if there is more than one, judging their positions and updating the binary correspondence.
4. The binary relation extraction method based on electronic medical record data text as claimed in claim 1, wherein the preprocessing also comprises manual assistance.
5. The binary relation extraction method based on electronic medical record data text as claimed in claim 1, wherein the source data comprises txt, xlsx, or csv text.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110946939.7A CN113658652B (en) 2021-08-18 2021-08-18 Binary relation extraction method based on electronic medical record data text

Publications (2)

Publication Number Publication Date
CN113658652A (en) 2021-11-16
CN113658652B (en) 2023-07-28

Family

ID=78480800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110946939.7A Active CN113658652B (en) 2021-08-18 2021-08-18 Binary relation extraction method based on electronic medical record data text

Country Status (1)

Country Link
CN (1) CN113658652B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665832A (en) * 2023-06-01 2023-08-29 湖南首辰健康科技有限公司 Intelligent quality control method, device, equipment and storage medium based on patient medical record

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107147639A (en) * 2017-05-08 2017-09-08 国家电网公司 A kind of actual time safety method for early warning based on Complex event processing
CN110069623A (en) * 2017-12-06 2019-07-30 腾讯科技(深圳)有限公司 Summary texts generation method, device, storage medium and computer equipment
CN110427491A (en) * 2019-07-04 2019-11-08 北京爱医生智慧医疗科技有限公司 A kind of medical knowledge map construction method and device based on electronic health record
CN111223539A (en) * 2019-12-30 2020-06-02 同济大学 Method for extracting relation of Chinese electronic medical record
CN111352987A (en) * 2020-02-28 2020-06-30 汤学民 Electronic medical record structuring method, system and related equipment
CN113130025A (en) * 2020-01-16 2021-07-16 中南大学 Entity relationship extraction method, terminal equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113658652B (en) 2023-07-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant