CN113658652B - Binary relation extraction method based on electronic medical record data text

Info

Publication number: CN113658652B
Authority: CN (China)
Prior art keywords: extracted, text, binary relation, extraction method, electronic medical
Legal status: Active
Application number: CN202110946939.7A
Other languages: Chinese (zh)
Other versions: CN113658652A (en)
Inventors: 朱婷, 张伟, 刘瑞航
Current Assignee: West China Hospital of Sichuan University
Original Assignee: West China Hospital of Sichuan University
Priority date: 2021-08-18
Filing date: 2021-08-18
Publication date: 2023-07-28

Application filed by West China Hospital of Sichuan University
Priority to CN202110946939.7A
Publication of CN113658652A
Application granted
Publication of CN113658652B

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the field of electronic medical records and discloses a binary relation extraction method based on electronic medical record data text, which comprises the following extraction steps: a. inputting source data and preprocessing it; b. extracting the corresponding binary relations from the preprocessed source data text; c. matching the extracted results by ID, then sorting, storing and outputting them. The invention neither restricts or fixes the content to be extracted nor fixes the relationship between the named entities in a binary relation. For example, with a simple template setting, the discharge summary texts of current patients can be traversed and their key content quickly extracted, which supports subsequent research on modeling clinical decision problems. The method is faster, and the extracted content is more detailed and more accurate.

Description

Binary relation extraction method based on electronic medical record data text
Technical Field
The invention relates to the field of electronic medical records, in particular to a binary relation extraction method based on electronic medical record data texts.
Background
An electronic medical record is a digital medical record made up of text, symbols, charts, graphs, data, images and similar information that medical staff generate through a medical institution's information system in the course of medical activities, and that can be stored, managed, transmitted and reproduced. By analyzing electronic medical records, a great deal of medical knowledge closely related to patients can be mined.
Electronic medical records contain a large amount of text describing the patient's condition, and a single patient is often admitted with multiple complications. Research on personalized medical and health information for such cases requires extracting the useful information from the electronic medical record. Existing extraction methods are mostly based on dictionaries and deep learning, but in practical applications their extraction speed is relatively low, and the level of detail and the accuracy of the extracted content still need to be improved.
Disclosure of Invention
The invention aims to provide a binary relation extraction method based on electronic medical record data text.
In order to achieve the aim of the invention, the invention adopts the following technical scheme: a binary relation extraction method based on electronic medical record data text comprises the following extraction steps:
a. inputting source data, and preprocessing the source data;
b. extracting corresponding binary relation from the preprocessed source data text;
c. matching the extracted results by ID, then sorting, storing and outputting them.
Further, the preprocessing comprises the following steps:
A101. normalizing the text: one or more operations among removing special text, replacing ambiguous text, and handling abnormal replacements;
A102. splitting into clauses: punctuation marks are chosen as clause delimiters according to the characteristics of the text;
A103. removing redundancy and blanks, cleaning repeated and abnormal data, and placing the processed content into the sentence library to be extracted.
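The clause splitting and cleaning steps could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `split_clauses` and `build_sentence_library`, the default delimiter set, and the rule that a clause without any letter or digit counts as abnormal are all hypothetical. The source only says that the delimiters are chosen according to the text.

```python
import re


def split_clauses(text, delimiters=",;，；。"):
    """Split one record into clauses on a configurable set of punctuation marks."""
    # "." is left out of the default set so that decimals such as 1.5 survive.
    pattern = "[" + re.escape(delimiters) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def build_sentence_library(records, delimiters=",;，；。"):
    """Collect cleaned, de-duplicated (record_id, clause) pairs."""
    seen = set()
    library = []
    for record_id, text in records:
        for clause in split_clauses(text, delimiters):
            clause = " ".join(clause.split())           # collapse runs of whitespace
            if not any(ch.isalnum() for ch in clause):  # assumed notion of "abnormal"
                continue
            key = (record_id, clause)
            if key in seen:                             # drop repeated clauses
                continue
            seen.add(key)
            library.append(key)
    return library
```

Keeping the record ID next to every clause is a design choice made here so that a later step can match results back to their source record.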
Further, step b comprises the following extraction steps:
B101. creating a new intermediate cache variable and traversing the texts to be extracted in the sentence library to be extracted;
B102. analyzing and splitting the text according to the pattern, and judging whether the text to be extracted contains a number; if no number is extracted, adding the sentence to the cache library and returning to step B101 to traverse the next sentence;
B103. if a number is successfully extracted, judging whether the number together with its unit can be extracted; if the unit cannot be extracted, adding the sentence to the abnormal library for manual inspection and processing;
B104. judging how many numbers were extracted; if exactly one was extracted, adding the current sentence to the cache library so that it corresponds to the current formatted content; if more than one was extracted, performing a position judgment and updating the binary correspondence.
Further, the preprocessing also comprises manual auxiliary processing.
Further, the source data includes txt, xlsx or csv text.
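A loader for the three source formats might look like the sketch below. The function name `load_records`, the column names `id` and `text`, and the convention that a txt file holds one record per line are assumptions made for illustration; the source does not describe the file layout. The xlsx branch relies on the third-party pandas library (with openpyxl) and is imported lazily so that the txt and csv paths need only the standard library.

```python
import csv
from pathlib import Path


def load_records(path, text_column="text", id_column="id"):
    """Return a list of (record_id, text) pairs from a txt, csv or xlsx file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Assumption: one record per line; the line number serves as the ID.
        with path.open(encoding="utf-8") as fh:
            lines = [line.strip() for line in fh]
        return [(str(i), line) for i, line in enumerate(lines, start=1) if line]
    if suffix == ".csv":
        with path.open(encoding="utf-8", newline="") as fh:
            return [(row[id_column], row[text_column]) for row in csv.DictReader(fh)]
    if suffix == ".xlsx":
        import pandas as pd  # third-party; reading .xlsx also needs openpyxl
        frame = pd.read_excel(path, dtype=str)
        return list(zip(frame[id_column], frame[text_column]))
    raise ValueError(f"unsupported source format: {suffix}")
```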
The beneficial effects of the invention are mainly as follows:
The invention neither restricts or fixes the content to be extracted nor fixes the relationship between the named entities in a binary relation. For example, with a simple template setting, the discharge summary texts of current patients can be traversed and their key content quickly extracted, which supports subsequent research on modeling clinical decision problems. Compared with traditional dictionary-based and deep learning methods, the method is faster, and the extracted content is more detailed and more accurate.
Drawings
Fig. 1 is an extraction flow chart of the present invention.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the binary relation extraction method based on electronic medical record data text can rapidly and automatically extract user-defined binary relation pairs from text, such as symptom and duration or medication and dosage. It specifically comprises the following extraction steps:
a. Inputting source data: a list of independent data is input and subjected to source data preprocessing; in this embodiment the source data comprises txt, xlsx or csv text.
The preprocessing of the source data specifically comprises the following steps:
A101. Normalizing the text: one or more operations among removing special text, replacing ambiguous text, and handling abnormal replacements. For example:
removing special Chinese characters, such as those meaning "cause" and "due to";
removing punctuation marks, such as quotation marks;
replacing ambiguous Chinese characters, as in "one and a half years" or "2 years 7 months";
handling abnormal replacements, as in "reverse the same 1 problem" or "2 episodes of loss of consciousness within 20 years".
A102. Splitting into clauses: punctuation marks are chosen as clause delimiters according to the characteristics of the text. For example, each "symptom + duration" pair forms one clause, as in "palpitations and fearfulness for 3+ months, aggravated for 15+ days with accompanying injury".
A103. Removing redundancy and blanks, cleaning repeated and abnormal data, and placing the processed content into the sentence library (a list) to be extracted.
In step a, the preprocessing further comprises manual auxiliary processing, in which special symptoms and special complaints are removed by hand.
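The normalization examples above could be encoded as small rule tables, as in the following sketch. Everything in it is hypothetical: the source names the kinds of rule but not the rules themselves, and its examples are Chinese phrases that appear here only in translation. In particular, converting "2 years 7 months" into a single unit is just one possible reading of "replacing ambiguous text".

```python
import re

# Hypothetical rule tables; the patent names the rule types but not the rules.
REMOVE_PHRASES = ("due to", "because of")
REMOVE_PUNCTUATION = '"“”'
REPLACEMENTS = {"one and a half years": "1.5 years", "half a year": "0.5 years"}
YEARS_AND_MONTHS = re.compile(r"(\d+) years? (\d+) months?")


def normalize(text):
    """Apply the replacement, removal and whitespace rules to one record."""
    for phrase, value in REPLACEMENTS.items():
        text = text.replace(phrase, value)
    # Assumed reading: express a mixed duration in a single unit (months).
    text = YEARS_AND_MONTHS.sub(
        lambda m: f"{int(m.group(1)) * 12 + int(m.group(2))} months", text
    )
    for phrase in REMOVE_PHRASES:
        text = text.replace(phrase, " ")
    text = text.translate(str.maketrans("", "", REMOVE_PUNCTUATION))
    return " ".join(text.split())
```

Holding the rules in plain tables keeps them editable without touching the code, which fits the source's remark that a simple template setting is enough to adapt the method.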
b. Extracting the corresponding binary relations from the preprocessed source data text.
In this embodiment, the specific extraction steps are:
B101. creating a new intermediate cache variable and traversing the texts to be extracted in the sentence library to be extracted;
B102. analyzing and splitting the text according to the pattern, and judging whether the text to be extracted contains a number; if no number is extracted, adding the sentence to the cache library and returning to step B101 to traverse the next sentence;
B103. if a number is successfully extracted, judging whether the number together with its unit can be extracted; if the unit cannot be extracted, adding the sentence to the abnormal library for manual inspection and processing;
B104. first judging whether a number plus unit has been extracted; if so, judging how many such items were extracted, in light of the characteristics of electronic medical record text. If exactly one was extracted, the current sentence is added to the cache library so that it corresponds to the current formatted content. If more than one was extracted, namely two or three, a position judgment is performed and the binary correspondence is updated.
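One way to read steps B101 to B104 is sketched below. It is an interpretation, not the patented implementation: the unit list, the names `extract_pairs`, `cache` and `abnormal`, and the "position judgment" rule (each quantity is paired with the text that precedes it since the previous quantity) are assumptions, and the example sentences are English stand-ins for Chinese record text.

```python
import re

NUMBER = re.compile(r"\d")
# Hypothetical unit list; a real deployment would use the units found in its own records.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?\+?)\s*(years?|months?|weeks?|days?|hours?|mg|ml)\b")


def extract_pairs(sentence_library):
    """Return (pairs, cache, abnormal) for a list of (record_id, sentence)."""
    pairs, cache, abnormal = [], [], []                  # B101: intermediate buffers
    for record_id, sentence in sentence_library:
        if not NUMBER.search(sentence):                  # B102: no number at all
            cache.append((record_id, sentence))
            continue
        matches = list(QUANTITY.finditer(sentence))
        if not matches:                                  # B103: number without a unit
            abnormal.append((record_id, sentence))
            continue
        if len(matches) == 1:                            # B104: a single quantity
            m = matches[0]
            rest = sentence[:m.start()] + " " + sentence[m.end():]
            pairs.append((record_id, " ".join(rest.split()), f"{m.group(1)} {m.group(2)}"))
            continue
        start, entity = 0, ""
        for m in matches:                                # B104: several quantities
            # Assumed position rule: pair each quantity with the text before it.
            entity = sentence[start:m.start()].strip(" ,") or entity
            pairs.append((record_id, entity, f"{m.group(1)} {m.group(2)}"))
            start = m.end()
    return pairs, cache, abnormal
```

For instance, `extract_pairs([("2", "aspirin 100 mg fever 2+ days")])` pairs "aspirin" with "100 mg" and "fever" with "2+ days", while a sentence such as "recurred 2" lands in the abnormal list for manual review.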
c. Matching the extracted results by ID, then sorting, storing and outputting them. In this step three lists are stored and converted, by way of a dataframe, into a CSV output file.
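The output step might be sketched as follows. The source does not say what the three lists contain, so this example assumes they are the extracted pairs, the cached sentences and the abnormal sentences; the function name `write_results` and the column names are likewise hypothetical. The standard-library `csv` module stands in here for the dataframe mentioned in the source.

```python
import csv
from collections import defaultdict


def write_results(pairs, cache, abnormal, out_path):
    """Group the three lists by record ID, sort by ID, and write one CSV row per ID."""
    by_id = defaultdict(lambda: {"pairs": [], "cache": [], "abnormal": []})
    for record_id, entity, value in pairs:
        by_id[record_id]["pairs"].append(f"{entity}={value}")
    for record_id, sentence in cache:
        by_id[record_id]["cache"].append(sentence)
    for record_id, sentence in abnormal:
        by_id[record_id]["abnormal"].append(sentence)
    # IDs are strings here, so the sort is lexicographic ("10" sorts before "2").
    rows = [
        {"id": record_id, **{name: "; ".join(items) for name, items in by_id[record_id].items()}}
        for record_id in sorted(by_id)
    ]
    with open(out_path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["id", "pairs", "cache", "abnormal"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```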
The invention neither restricts or fixes the content to be extracted nor fixes the relationship between the named entities in a binary relation. For example, with a simple template setting, the discharge summary texts of current patients can be traversed and their key content quickly extracted, which supports subsequent research on modeling clinical decision problems. Compared with traditional dictionary-based and deep learning methods, the method is faster, and the extracted content is more detailed and more accurate.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the described order of action, as some steps may take other order or be performed simultaneously according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments and that the acts and elements referred to are not necessarily required in the present application.

Claims (4)

1. A binary relation extraction method based on electronic medical record data text, characterized in that the method comprises the following extraction steps:
a. inputting source data and preprocessing the source data;
b. extracting the corresponding binary relations from the preprocessed source data text;
c. matching the extracted results by ID, then sorting, storing and outputting them;
wherein step b comprises the following extraction steps:
B101. creating a new intermediate cache variable and traversing the texts to be extracted in the sentence library to be extracted;
B102. analyzing and splitting the text according to the pattern, and judging whether the text to be extracted contains a number; if no number is extracted, adding the sentence to the cache library and returning to step B101 to traverse the next sentence;
B103. if a number is successfully extracted, judging whether the number together with its unit can be extracted; if the unit cannot be extracted, adding the sentence to the abnormal library for manual inspection and processing;
B104. judging how many numbers were extracted; if exactly one was extracted, adding the current sentence to the sentences in the cache library so that it corresponds to the current formatted content; if more than one was extracted, performing a position judgment and updating the binary correspondence.
2. The binary relation extraction method based on electronic medical record data text according to claim 1, characterized in that the preprocessing comprises the following steps:
A101. normalizing the text: one or more operations among removing special text, replacing ambiguous text, and handling abnormal replacements;
A102. splitting into clauses: punctuation marks are chosen as clause delimiters according to the characteristics of the text;
A103. removing redundancy and blanks, cleaning repeated and abnormal data, and placing the processed content into the sentence library to be extracted.
3. The binary relation extraction method based on electronic medical record data text according to claim 1, characterized in that the preprocessing further comprises manual auxiliary processing.
4. The binary relation extraction method based on electronic medical record data text according to claim 1, characterized in that the source data includes txt, xlsx or csv text.
CN202110946939.7A 2021-08-18 2021-08-18 Binary relation extraction method based on electronic medical record data text Active CN113658652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110946939.7A CN113658652B (en) 2021-08-18 2021-08-18 Binary relation extraction method based on electronic medical record data text

Publications (2)

Publication Number Publication Date
CN113658652A (en) 2021-11-16
CN113658652B (en) 2023-07-28

Family

ID=78480800

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant