CN113130025A - Entity relationship extraction method, terminal equipment and computer readable storage medium - Google Patents


Info

Publication number
CN113130025A
Authority
CN
China
Legal status
Granted
Application number
CN202010047654.5A
Other languages
Chinese (zh)
Other versions
CN113130025B (en)
Inventor
唐琎
覃若彬
高琰
王艳东
Current Assignee
Central South University
Original Assignee
Central South University
Application filed by Central South University
Priority to CN202010047654.5A
Publication of CN113130025A
Application granted
Publication of CN113130025B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an entity relationship extraction method, a terminal device and a computer-readable storage medium. The method comprises the following steps: manually extracting, from an electronic medical record text database, a plurality of binary entity pairs that conform to a preset entity relationship to serve as seed instances; for each seed instance, searching the electronic medical record text database for sentences containing the seed instance, and extracting the feature vectors of those sentences; clustering the seed instances based on the feature vectors; generating, for each cluster, an extraction template from its seed instances and the feature vectors of their sentences; extracting candidate instances from the electronic medical record text database using the extraction templates; and calculating the confidence of each candidate instance from the entity relationship between the candidate instance and the extraction templates, and using that confidence to decide whether the candidate instance becomes a new seed instance for the next iteration. The method and device can greatly improve the accuracy of entity relationship extraction from electronic medical records.

Description

Entity relationship extraction method, terminal equipment and computer readable storage medium
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a semi-supervised method for extracting entity relationships from medical electronic medical records, a terminal device and a computer-readable storage medium.
Background
As society becomes increasingly information-driven and intelligent, medical and health services continue to develop in the same direction, and electronic medical records are playing an increasingly important role in the medical and health field. Medical records are the records made by medical personnel of the examination, diagnosis, treatment and other medical activities concerning the occurrence, development and outcome of a patient's disease. A patient's health record is written in a prescribed format and according to prescribed requirements by summarizing, organizing and comprehensively analyzing the collected data. Traditional paper records are scattered in storage, hard to retrieve, easily lost and often illegible when handwritten, so they are difficult to manage and use by modern means; electronic medical records are superior to paper records in content, availability and other respects. In recent years electronic medical records have become increasingly widespread and better understood. How to effectively mine the large amount of clinical information they contain (numbers, text, tables, figures, images and other medical knowledge), and how to make use of this professional knowledge, plays an important role in the development of the healthcare industry.
Natural language processing methods are mainly used to mine knowledge from medical texts, and the information extraction task mainly comprises NER (named entity recognition) and RE (relation extraction). In medical informatics, this task supports Clinical Decision Support (CDS) research and services for medical professionals. The present invention addresses the relation extraction task.
Relation extraction is the natural language processing task of extracting named relationships between entities, i.e., the semantic relationships between entities in sentences that have been labeled during entity recognition. According to how much the training data depends on manual annotation, machine-learning-based relation extraction techniques are mainly divided into supervised, semi-supervised (weakly supervised) and unsupervised relation extraction, described below; open entity relation extraction is a further category.
1. Supervised relation extraction: supervised relation extraction is essentially classification. It requires a large labeled training dataset and uses machine learning to identify and classify the entity relationship types in a text corpus. Feature-vector-based methods extract morphological, syntactic and relation-pattern information from the sentences of the corpus and quantize and encode this information; feature vectors and feature combinations can then be constructed, and an entity relationship extraction model (e.g., an SVM or WINDOWs classifier) can be built by machine learning. The amount of manually annotated corpus it requires is the greatest weakness of supervised relation extraction and makes it unsuitable for processing massive corpora.
2. Weakly supervised relation extraction: weakly supervised relation extraction requires only a small annotated corpus and uses representative relation seed samples. The seeds from the labeled training set are applied to a large-scale corpus, and new extraction patterns are continually extracted by iteration. The most widely used methods are bootstrapping, label propagation and active learning. Bootstrapping starts from a limited set of seed samples, expands the seed set over repeated runs, and obtains training instances through multiple iterations; two representative bootstrapping systems are DIPRE and Snowball. These methods place high demands on the initial relation seeds, since each domain needs high-quality seed relations, and studies have shown that they suffer from low recall and poor portability.
3. Unsupervised relation extraction: unsupervised relation extraction requires neither a manually annotated corpus nor predefined entity relationships; the automatic extraction of semantic relationships relies mainly on clustering the corpus. The approach ports well across domains and can be used for large-scale information extraction. However, experimental research has not yet produced satisfactory results, and neither precision nor recall has improved significantly.
Semi-supervised relation extraction can exploit a large amount of unlabeled data while requiring only a small number of manually annotated entity relationships. It can therefore extract entity relationships that are missing from annotated corpora, and it has shown advantages in relation extraction from electronic medical records.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a semi-supervised method, a terminal device and a computer-readable storage medium for extracting entity relationships from medical electronic medical records, which can greatly improve the accuracy of entity relationship extraction from electronic medical records.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
an entity relationship extraction method comprises the following steps:
step 1, manually extracting, from an electronic medical record text database, a plurality of binary entity pairs that conform to a preset entity relationship to serve as seed instances;
step 2, for each seed instance, searching the electronic medical record text database for sentences containing the seed instance, and extracting the feature vectors of those sentences;
step 3, clustering the seed instances based on the feature vectors, and generating an extraction template for each cluster from its seed instances and the feature vectors of their sentences;
step 4, extracting candidate instances from the electronic medical record text database using the extraction templates obtained in step 3;
each extraction template can extract a group of candidate instances, and the same candidate instance can be extracted by several extraction templates;
step 5, adding new seed instances according to the confidence of the candidate instances;
step 5.1, for each extraction template obtained in step 3, calculating its confidence from the entity relationships between the template and the candidate instances it extracted;
step 5.2, for each candidate instance obtained in step 4, calculating its confidence from the confidences of all extraction templates that extracted it;
step 5.3, taking each candidate instance whose confidence is greater than the confidence threshold as a new seed instance, returning to step 2, and executing the next iteration until the preset number of iterations is reached.
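The iteration in steps 1 to 5.3 can be sketched as a generic bootstrapping loop. This is a minimal Python sketch under stated assumptions, not the patented implementation: the callback names (`build_templates`, `find_candidates`, `score`) are hypothetical stand-ins for steps 2 to 5.2, and the sketch assumes accepted candidates are added to, rather than replace, the existing seed set.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (body part, medical description), e.g. ("waist", "pain")


def bootstrap(
    seeds: Iterable[Pair],
    build_templates: Callable[[Set[Pair]], List[str]],
    find_candidates: Callable[[List[str]], Dict[Pair, List[str]]],
    score: Callable[[Pair, List[str], Set[Pair]], float],
    tau_t: float,
    iterations: int = 5,
) -> Set[Pair]:
    """Run steps 2-5 for a fixed number of iterations (5 in the embodiment).

    Assumption: accepted candidates are merged into the seed set.
    """
    seed_set: Set[Pair] = set(seeds)                 # step 1: manual seeds
    for _ in range(iterations):
        templates = build_templates(seed_set)        # steps 2-3
        candidates = find_candidates(templates)      # step 4: pair -> extracting templates
        accepted = {
            pair
            for pair, extracting in candidates.items()
            if score(pair, extracting, seed_set) > tau_t  # steps 5.1-5.3
        }
        seed_set |= accepted
    return seed_set


if __name__ == "__main__":
    # Toy callbacks standing in for the real template and scoring steps.
    corpus_pairs = [("waist", "pain"), ("knee", "swelling"), ("head", "dizziness")]
    result = bootstrap(
        seeds=[("waist", "pain")],
        build_templates=lambda seeds: ["T0"],
        find_candidates=lambda templates: {p: templates for p in corpus_pairs},
        score=lambda pair, templates, seeds: 0.2 if pair[0] == "head" else 0.9,
        tau_t=0.5,
    )
    print(sorted(result))  # [('knee', 'swelling'), ('waist', 'pain')]
```

Passing the steps in as callbacks keeps the loop independent of how templates and scores are computed, so each step can be replaced or tested on its own.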
In a more preferred technical solution, the confidence of each extraction template in step 5.1 is calculated as follows:
the template counts the candidate instances it extracted: a candidate instance that matches both entities of the extraction template is a positive extraction; one that matches exactly 1 of its entities is a negative extraction; and one that matches neither entity is an unknown extraction. The confidence of the extraction template is then calculated from the numbers of positive, negative and unknown extractions according to the following formula:
$$\mathrm{Conf}_\rho(P) = \frac{P}{P + W_{ngt} \cdot N + W_{unk} \cdot U}$$
in the formula, $\mathrm{Conf}_\rho(P)$ is the confidence of template P; P, N and U (in the numerator and denominator) are the numbers of positive, negative and unknown extractions of template P, respectively; and $W_{ngt}$ and $W_{unk}$ are the weights of negative and unknown extractions, respectively;
the confidence of a candidate instance in step 5.2 is calculated as follows:
$$\mathrm{Conf}_\iota(i) = 1 - \prod_{j=1}^{|\xi|} \left(1 - \mathrm{Conf}_\rho(\xi_j) \cdot \mathrm{sim}(C_i, \xi_j)\right)$$
in the formula, $\mathrm{Conf}_\iota(i)$ is the confidence of candidate instance i; $\xi$ is the set of all extraction templates that extracted candidate instance i; $\xi_j$ is the extraction template with index j in the set $\xi$; $C_i$ is the sentence containing candidate instance i; and $\mathrm{sim}(C_i, \xi_j)$ is the similarity between sentence $C_i$ and extraction template $\xi_j$.
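As a purely illustrative worked example, take the template and candidate confidences in the Snowball-style form written out below (an assumption consistent with the variables defined above). All numbers are made up, since the document gives no values for $W_{ngt}$, $W_{unk}$ or the thresholds: template P has 8 positive, 1 negative and 2 unknown extractions with $W_{ngt} = 0.5$ and $W_{unk} = 0.1$, and a candidate i is extracted by two templates with confidences 0.9 and 0.6 at similarities 0.8 and 0.7.

```latex
\mathrm{Conf}_\rho(P) = \frac{P}{P + W_{ngt} N + W_{unk} U}
  = \frac{8}{8 + 0.5 \cdot 1 + 0.1 \cdot 2} = \frac{8}{8.7} \approx 0.920

\mathrm{Conf}_\iota(i) = 1 - \prod_{j=1}^{|\xi|} \bigl(1 - \mathrm{Conf}_\rho(\xi_j)\,\mathrm{sim}(C_i, \xi_j)\bigr)
  = 1 - (1 - 0.9 \cdot 0.8)(1 - 0.6 \cdot 0.7) = 1 - 0.28 \cdot 0.58 = 0.8376
```

When every confidence and similarity lies in [0, 1], each factor in the product also lies in [0, 1], so an additional extracting template can only raise the candidate's confidence, and a template with low confidence or low similarity raises it very little.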
In a more preferred technical solution, the candidate instances are all binary entity pairs that conform to the preset entity relationship and whose similarity to an extraction template is greater than a similarity threshold.
In a more preferred technical solution, the feature vector of each sentence is extracted as follows: the sentence is parsed by dependency syntax analysis, all dependency features of the binary entity pair in the sentence are extracted, a word vector is obtained for each dependency feature using the skip-gram method, and the average of all these word vectors is taken as the feature vector of the sentence.
In a more preferred technical scheme, a single-pass algorithm is used to cluster sentences.
In a more preferred embodiment, the binary entity pair conforming to the preset entity relationship is <body part, medical description>.
In a more preferred technical solution, the electronic medical record text database is a txt document obtained by combining a plurality of medical electronic medical record texts, splitting them into sentences, and performing entity labeling on each sentence.
In a more preferred technical solution, the preset number of iterations is 5.
The present invention also provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements any one of the methods described above when executing the computer program.
The invention also provides a computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of the above.
Advantageous effects
Compared with the prior art, the invention has the beneficial effects that:
according to the method, a small number of seed instances are first used to generate extraction templates; candidate instances are then extracted from the electronic medical record text database using these templates; finally, the confidence of each candidate instance is calculated from the entity relationship between the candidate instance and the extraction templates, and the confidence determines whether the candidate instance is used as a new seed instance in the next iteration. This controls semantic drift: candidate instances that are weakly related to the extraction templates are prevented from entering the next iteration as seeds, where they would tend to generate ever more relation instances unrelated to the original seeds. As a result, the accuracy of entity relationship extraction from electronic medical records can be greatly improved. In addition, only a small number of seed instances need to be provided, so a large amount of unlabeled data can be processed with good results, which better supports the development of the healthcare industry.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below. They are implemented on the basis of the technical solutions of the invention, and detailed implementation modes and specific operating procedures are given to further explain those technical solutions.
This embodiment provides a semi-supervised method for extracting entity relationships from medical electronic medical records which, as shown in FIG. 1, comprises the following steps:
step 1, preprocessing data;
obtaining a plurality of medical electronic medical record texts for training from a hospital and merging all the data into one txt document; splitting the document into sentences; labeling the entities in the sentences with a BiLSTM-CRF model, focusing on two entity types, BODYPART (body part) and DESCRIPTION (medical description), to obtain a sentence document; finally, manually selecting from the sentence document a small number of binary entity pairs with the entity relationship <body part, medical description> as seed instances, such as <waist, pain>.
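The entity-labeling output of step 1 (reused when collecting sentences in step 4.1) has to be turned into typed entities and <BODYPART, DESCRIPTION> pairs. A hedged sketch: it assumes the BiLSTM-CRF tagger emits character-level BIO tags such as `B-BODYPART` or `I-DESCRIPTION`, which the document does not specify, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

Span = Tuple[str, str]  # (entity text, entity type)


def decode_bio(tokens: List[str], tags: List[str], sep: str = "") -> List[Span]:
    """Group BIO-tagged tokens into typed entity spans.

    sep="" suits character-level Chinese tagging; use " " for word tokens.
    """
    spans: List[Span] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        kind = tag[2:] if tag[:2] in ("B-", "I-") else None
        # Continue the open entity only for an I- tag of the same type.
        if tag.startswith("I-") and kind == current_type and current:
            current.append(token)
            continue
        if current:  # any other tag closes the open entity
            spans.append((sep.join(current), current_type))
        # B- (or a stray I-) opens a new entity; O leaves nothing open.
        current, current_type = ([token], kind) if kind else ([], None)
    if current:
        spans.append((sep.join(current), current_type))
    return spans


def entity_pairs(spans: List[Span]) -> List[Tuple[str, str]]:
    """All <BODYPART, DESCRIPTION> pairs that co-occur in one sentence."""
    parts = [text for text, kind in spans if kind == "BODYPART"]
    descriptions = [text for text, kind in spans if kind == "DESCRIPTION"]
    return list(product(parts, descriptions))


if __name__ == "__main__":
    chars = list("腰部疼痛")  # "waist pain"
    tags = ["B-BODYPART", "I-BODYPART", "B-DESCRIPTION", "I-DESCRIPTION"]
    print(entity_pairs(decode_bio(chars, tags)))  # [('腰部', '疼痛')]
```

Seeds such as <waist, pain> would then be chosen by hand from the pairs this produces.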
Step 2, seed matching: for each seed instance, searching the electronic medical record text database for sentences containing the seed instance, and extracting the feature vectors of those sentences.
Specifically, the sentence document is scanned, and if both entities of a seed instance appear in the same sentence, dependency syntax analysis is performed on the sentence $S_i = \{a_{i1}, a_{i2}, a_{i3}, \ldots, a_{in}\}$ and the dependency features $a_{iq}$ shared by the two entities are extracted, i.e., all dependency features of the binary entity pair in the sentence. Word embedding is then performed with the skip-gram method to obtain the word vector $\vec{v}_{iq}$ corresponding to each dependency feature $a_{iq}$. Finally, the average of all these word vectors is taken as the feature vector $V_i$ of sentence $S_i$:
$$V_i = \frac{1}{m} \sum_{q} \vec{v}_{iq}$$
where the sum runs over the $m$ dependency features $a_{iq}$ extracted from $S_i$.
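The feature-vector computation of step 2 can be sketched in pure Python. Two assumptions are made that the document leaves open: the "dependency features" are taken to be the words on the dependency-tree path between the two entity tokens (entities included), and the parse is supplied as a head-index list from any dependency parser. The `embeddings` dictionary stands in for a trained skip-gram model (for example gensim's `Word2Vec` with `sg=1`); all function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def path_to_root(heads: Sequence[int], i: int) -> List[int]:
    """Token indices from i up to the root (the root's head is -1)."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def dependency_path(heads: Sequence[int], e1: int, e2: int) -> List[int]:
    """Tokens on the tree path between e1 and e2, both endpoints included."""
    up1, up2 = path_to_root(heads, e1), path_to_root(heads, e2)
    on_up2 = set(up2)
    lca = next(t for t in up1 if t in on_up2)  # lowest common ancestor
    return up1[: up1.index(lca) + 1] + up2[: up2.index(lca)][::-1]


def sentence_vector(words: Sequence[str], heads: Sequence[int], e1: int, e2: int,
                    embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the skip-gram vectors of the dependency features (V_i in step 2)."""
    features = [words[k] for k in dependency_path(heads, e1, e2)]
    vectors = [embeddings[w] for w in features if w in embeddings]
    if not vectors:  # no known feature words: fall back to a zero vector
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


if __name__ == "__main__":
    words = ["pain", "in", "waist"]  # "pain" is the root; "waist" -> "in" -> "pain"
    heads = [-1, 0, 1]
    emb = {"pain": [0.0, 1.0], "in": [0.0, 0.0], "waist": [1.0, 0.0]}
    print(sentence_vector(words, heads, 2, 0, emb, dim=2))  # about [0.333, 0.333]
```

Restricting the average to the path between the two entities is one way to keep unrelated parts of a long clinical sentence out of $V_i$; other readings of "dependency features" would change only `dependency_path`.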
Step 3, generating extraction templates: clustering the seed instances with a single-pass algorithm based on the feature vectors, and, for each cluster, generating the corresponding extraction template from its seed instances and the feature vectors of their sentences.
Specifically, all seed instances are obtained and the 1st seed instance is assigned to a new, empty cluster. Each remaining seed instance is then traversed: its similarity to each existing cluster is calculated based on the feature vectors, and it is assigned to a cluster whose similarity is greater than or equal to the similarity threshold $\tau_{sim}$; if its similarity to every cluster is below $\tau_{sim}$, a new cluster is created and the seed instance is assigned to it. In the end, each cluster contains a group of seed instances. Erroneous clusters are removed by manual supervision, and each remaining cluster $Cl_j$ generates an extraction template $P_j$ by averaging the feature vectors of its seed instances:
$$V_{P_j} = \frac{1}{|Cl_j|} \sum_{i_n \in Cl_j} V_n$$
where $V_{P_j}$ is the feature vector of template $P_j$ and $V_n$ is the feature vector of the sentence containing seed instance $i_n$. In this embodiment, a cluster is considered erroneous if the entity relationship of its seed instances does not conform to the preset entity relationship, i.e., is not <body part, medical description>.
The similarity function between seed instance $i_n$ and cluster $Cl_j$ is $sim(i_n, Cl_j)$: the similarity score between $i_n$ and each seed instance in $Cl_j$ is computed; if more than half of these scores exceed the similarity threshold, the maximum score is taken as the similarity between $i_n$ and $Cl_j$, otherwise the similarity between $i_n$ and $Cl_j$ is set to 0. The similarity between two seed instances is calculated by the following formula:
$$sim(i_n, i_j) = sim(S_n, S_j) = \cos(V_n, V_j)$$
where $i_n$ and $i_j$ are two different seed instances, $S_n$ and $S_j$ are the sentences containing $i_n$ and $i_j$ respectively, $V_n$ and $V_j$ are the feature vectors of sentences $S_n$ and $S_j$, and $\cos(V_n, V_j)$ is the cosine similarity between $V_n$ and $V_j$.
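The single-pass clustering of step 3, with the majority rule for $sim(i_n, Cl_j)$ and template vectors obtained by averaging, might look like the sketch below. Assumptions: each seed instance is represented by one sentence feature vector, $\tau_{sim}$ lies in (0, 1], a seed that qualifies for several clusters joins the most similar one, and the manual removal of erroneous clusters is left out. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sim_to_cluster(v: Vector, members: List[Vector], tau_sim: float) -> float:
    """Max cosine if more than half the members exceed tau_sim, else 0."""
    scores = [cosine(v, m) for m in members]
    above = sum(1 for s in scores if s > tau_sim)
    return max(scores) if above > len(scores) / 2 else 0.0


def single_pass(vectors: List[Vector], tau_sim: float) -> List[List[int]]:
    """Cluster seed feature vectors; returns clusters as lists of seed indices."""
    clusters: List[List[int]] = []
    for n, v in enumerate(vectors):
        best, best_sim = None, -1.0
        for j, members in enumerate(clusters):
            s = sim_to_cluster(v, [vectors[m] for m in members], tau_sim)
            if s >= tau_sim and s > best_sim:
                best, best_sim = j, s
        if best is None:
            clusters.append([n])  # first seed, or no cluster is similar enough
        else:
            clusters[best].append(n)
    return clusters


def template_vector(vectors: List[Vector], members: List[int]) -> List[float]:
    """Template feature vector V_{P_j}: mean of the member seeds' vectors."""
    dim = len(vectors[members[0]])
    return [sum(vectors[m][d] for m in members) / len(members) for d in range(dim)]


if __name__ == "__main__":
    seed_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    clusters = single_pass(seed_vectors, tau_sim=0.8)
    print(clusters)  # [[0, 1], [2]]
    print([template_vector(seed_vectors, c) for c in clusters])
```

The majority rule makes a cluster harder to join as it grows, because a newcomer must resemble most of its members rather than just the closest one.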
Step 4, candidate instance search: using the extraction templates obtained in step 3, extracting from the electronic medical record text database the candidate instances, i.e., all binary entity pairs that conform to the preset entity relationship and whose similarity to an extraction template is greater than the similarity threshold;
each extraction template can extract a group of candidate instances, and the same candidate instance can be extracted by several extraction templates.
Specifically, the method comprises the following steps:
step 4.1, scanning the sentence document and collecting all sentences containing a binary entity pair that conforms to the preset entity relationship;
step 4.2, traversing each sentence obtained in step 4.1: extracting its feature vector by dependency syntax analysis and the other operations described in step 2; then calculating the similarity between the sentence and each extraction template based on the feature vectors. If the similarity between the sentence and any extraction template is greater than the similarity threshold, the binary entity pair in the sentence is taken as a candidate instance, and all extraction templates whose similarity exceeds the threshold are taken as the extraction templates of that candidate instance;
step 4.3, after step 4.2 is completed, each candidate instance may correspond to a group of extraction templates, and a group of candidate instances may correspond to the same extraction template; that is, each extraction template can extract a group of candidate instances, and the same candidate instance can be extracted by several extraction templates.
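Steps 4.1 to 4.3 can be sketched as a scan that records, for every entity pair, which templates extracted it and with what similarity. This is an assumption-laden illustration: sentences are assumed to be already reduced to (entity pair, feature vector) tuples by the step 2 procedure, the function name is hypothetical, and when a pair occurs in several sentences only its best similarity per template is kept, a detail the document does not address.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (body part, medical description)
Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def find_candidates(
    sentences: List[Tuple[Pair, Vector]],
    templates: Dict[str, Vector],
    tau_sim: float,
) -> Dict[Pair, Dict[str, float]]:
    """Map each candidate pair to {template name: similarity} for every
    template whose similarity to the pair's sentence exceeds tau_sim."""
    candidates: Dict[Pair, Dict[str, float]] = {}
    for pair, vec in sentences:                 # step 4.2: traverse sentences
        for name, t_vec in templates.items():
            s = cosine(vec, t_vec)
            if s > tau_sim:                     # this template extracts the pair
                matches = candidates.setdefault(pair, {})
                matches[name] = max(s, matches.get(name, s))
    return candidates                           # step 4.3: many-to-many mapping


if __name__ == "__main__":
    sentences = [(("waist", "pain"), [1.0, 0.0]), (("head", "dizziness"), [0.0, 1.0])]
    templates = {"P1": [1.0, 0.1], "P2": [0.7, 0.7]}
    for pair, matches in find_candidates(sentences, templates, 0.6).items():
        print(pair, sorted(matches))
```

The returned mapping directly provides both $\xi$ (the dictionary keys) and $\mathrm{sim}(C_i, \xi_j)$ (the values) needed in step 5.2.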
Step 5, controlling semantic drift: adding new seed instances according to the confidence of the candidate instances;
step 5.1, for each extraction template obtained in step 3, calculating its confidence from the entity relationships between the template and the candidate instances it extracted, specifically:
the template counts the candidate instances it extracted: a candidate instance that matches both entities of the extraction template is a positive extraction; one that matches exactly 1 of its entities is a negative extraction; and one that matches neither entity is an unknown extraction. The confidence of the extraction template is then calculated from the numbers of positive, negative and unknown extractions according to the following formula:
$$\mathrm{Conf}_\rho(P) = \frac{P}{P + W_{ngt} \cdot N + W_{unk} \cdot U}$$
in the formula, $\mathrm{Conf}_\rho(P)$ is the confidence of template P; P, N and U (in the numerator and denominator) are the numbers of positive, negative and unknown extractions of template P, respectively; and $W_{ngt}$ and $W_{unk}$ are the weights of negative and unknown extractions, respectively;
step 5.2, for each candidate instance obtained in step 4, calculating its confidence from the confidences of all extraction templates that extracted it, according to the following formula:
$$\mathrm{Conf}_\iota(i) = 1 - \prod_{j=1}^{|\xi|} \left(1 - \mathrm{Conf}_\rho(\xi_j) \cdot \mathrm{sim}(C_i, \xi_j)\right)$$
in the formula, $\mathrm{Conf}_\iota(i)$ is the confidence of candidate instance i; $\xi$ is the set of all extraction templates that extracted candidate instance i; $\xi_j$ is the extraction template with index j in the set $\xi$; $C_i$ is the sentence containing candidate instance i; and $\mathrm{sim}(C_i, \xi_j)$ is the similarity between sentence $C_i$ and extraction template $\xi_j$;
step 5.3, taking each candidate instance whose confidence is greater than the confidence threshold $\tau_t$ as a new seed instance and returning to step 2 to execute the next iteration, until the preset number of iterations is reached and the process ends; in this embodiment, the preset number of iterations is set to 5.
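Steps 5.1 to 5.3 can be sketched together. This is a minimal illustration under stated assumptions: "the entities in the extraction template" is read as the seed pairs of the cluster the template was built from (an interpretation, not something the text states), the candidate confidence uses the Snowball-style product form $1 - \prod_j (1 - \mathrm{Conf}_\rho(\xi_j)\,\mathrm{sim}(C_i, \xi_j))$, candidates arrive as {template name: similarity} dictionaries, and the function names, weights and threshold are hypothetical placeholders.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (body part, medical description)


def classify(candidate: Pair, template_seeds: Set[Pair]) -> str:
    """positive: both entities match a seed of the template;
    negative: the candidate shares one entity with a seed; unknown: no entity matches."""
    if candidate in template_seeds:
        return "positive"
    part, desc = candidate
    if any(part == p or desc == d for p, d in template_seeds):
        return "negative"
    return "unknown"


def template_confidence(extracted: Iterable[Pair], template_seeds: Set[Pair],
                        w_ngt: float, w_unk: float) -> float:
    """Step 5.1: Conf_rho(P) = P / (P + W_ngt * N + W_unk * U)."""
    counts = {"positive": 0, "negative": 0, "unknown": 0}
    for candidate in extracted:
        counts[classify(candidate, template_seeds)] += 1
    p, n, u = counts["positive"], counts["negative"], counts["unknown"]
    denominator = p + w_ngt * n + w_unk * u
    return p / denominator if denominator else 0.0


def candidate_confidence(matches: Dict[str, float], template_conf: Dict[str, float]) -> float:
    """Step 5.2: Conf_iota(i) = 1 - prod_j (1 - Conf_rho(xi_j) * sim(C_i, xi_j)).

    matches maps each extracting template xi_j to sim(C_i, xi_j)."""
    remaining = 1.0
    for name, sim in matches.items():
        remaining *= 1.0 - template_conf[name] * sim
    return 1.0 - remaining


def select_new_seeds(candidates: Dict[Pair, Dict[str, float]],
                     template_conf: Dict[str, float], tau_t: float) -> Set[Pair]:
    """Step 5.3: keep candidates whose confidence exceeds tau_t."""
    return {pair for pair, matches in candidates.items()
            if candidate_confidence(matches, template_conf) > tau_t}


if __name__ == "__main__":
    seeds = {("waist", "pain")}
    extracted = [("waist", "pain"), ("waist", "numbness"), ("knee", "swelling")]
    print(template_confidence(extracted, seeds, w_ngt=0.5, w_unk=0.1))  # 1 / (1 + 0.5 + 0.1)

    conf = {"P1": 0.9, "P2": 0.6}
    candidates = {("waist", "pain"): {"P1": 0.8, "P2": 0.7},
                  ("head", "dizziness"): {"P2": 0.5}}
    print(select_new_seeds(candidates, conf, tau_t=0.8))  # {('waist', 'pain')}
```

Down-weighting negative and unknown extractions is what lets a template that keeps pairing known body parts with new, unrelated descriptions lose confidence, which in turn keeps its candidates below $\tau_t$.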
The present invention also provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described in the above method embodiments when executing the computer program.
The invention also provides a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method described in the above-mentioned method embodiments.
With the entity relationship extraction method, terminal device and computer-readable storage medium of the embodiments of the invention, a small number of seed instances are first used to generate extraction templates; candidate instances are then extracted from the electronic medical record text database using the templates; and the confidence of each candidate instance is calculated from the entity relationship between the candidate instance and the extraction templates, which determines whether the candidate instance enters the next iteration as a new seed instance. This controls semantic drift by preventing candidate instances weakly correlated with the extraction templates from entering the next iteration as seeds and generating more and more instances unrelated to the seeds, so the accuracy of entity relationship extraction from electronic medical records is greatly improved. In addition, only a small number of seed instances are required, so a large amount of unlabeled data can be processed with good results, which better supports the development of the healthcare industry.
The above embodiments are preferred embodiments of the present application, and those skilled in the art can make various changes or modifications without departing from the general concept of the present application, and such changes or modifications should fall within the scope of the claims of the present application.

Claims (10)

1. An entity relationship extraction method is characterized by comprising the following steps:
step 1, manually extracting, from an electronic medical record text database, a plurality of binary entity pairs that conform to a preset entity relationship to serve as seed instances;
step 2, for each seed instance, searching the electronic medical record text database for sentences containing the seed instance, and extracting the feature vectors of those sentences;
step 3, clustering the seed instances based on the feature vectors, and generating an extraction template for each cluster from its seed instances and the feature vectors of their sentences;
step 4, extracting candidate instances from the electronic medical record text database using the extraction templates obtained in step 3;
each extraction template can extract a group of candidate instances, and the same candidate instance can be extracted by several extraction templates;
step 5, adding new seed instances according to the confidence of the candidate instances;
step 5.1, for each extraction template obtained in step 3, calculating its confidence from the entity relationships between the template and the candidate instances it extracted;
step 5.2, for each candidate instance obtained in step 4, calculating its confidence from the confidences of all extraction templates that extracted it;
step 5.3, taking each candidate instance whose confidence is greater than the confidence threshold as a new seed instance, returning to step 2, and executing the next iteration until the preset number of iterations is reached.
2. The method of claim 1, wherein the confidence level of each extracted template in step 5.1 is calculated by:
counting the candidate examples extracted by the self, wherein if the candidate examples are the same as the 2 entities in the extraction template, the candidate examples are extracted; if the candidate instance is the same as 1 entity in the extraction template, the candidate instance is negative extraction; if the candidate instance is different from 2 entities in the extraction template, the candidate instance is unknown extraction; then, according to the number of positive extractions, negative extractions and unknown extractions, the confidence of the extraction template is calculated according to the following formula:
Figure FDA0002370008970000011
in the formula, Confρ(P) represents the confidence coefficient of the template P, wherein P, N and U respectively represent the number of positive extraction, negative extraction and unknown extraction corresponding to the template P, and Wngt、WunkNegative and unknown respectivelyThe weight taken;
the method for calculating the confidence of the candidate instance in step 5.2 is as follows:
$$\mathrm{Conf}_{\iota}(i)=1-\prod_{\xi_j\in\xi}\left(1-\mathrm{Conf}_{\rho}(\xi_j)\cdot\mathrm{Sim}(C_i,\xi_j)\right)$$
in the formula, Conf_ι(i) is the confidence of candidate instance i; ξ is the set of all extraction templates that extract candidate instance i; ξ_j is the extraction template with index j in the set ξ; C_i is the sentence in which candidate instance i occurs; and Sim(C_i, ξ_j) is the similarity between sentence C_i and extraction template ξ_j.
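This sketch computes the candidate-instance confidence, assuming it is the noisy-OR combination Conf_ι(i) = 1 − ∏ over ξ_j in ξ of (1 − Conf_ρ(ξ_j)·Sim(C_i, ξ_j)), which is consistent with the symbols defined above. The input format is a hypothetical choice: one (template confidence, sentence-template similarity) pair for each template that extracts the instance.

```python
from typing import Iterable, Tuple


def instance_confidence(template_scores: Iterable[Tuple[float, float]]) -> float:
    """Assumed form: Conf_iota(i) = 1 - prod_j (1 - Conf_rho(xi_j) * Sim(C_i, xi_j)).

    `template_scores` holds one (Conf_rho(xi_j), Sim(C_i, xi_j)) pair for
    every extraction template xi_j that extracted candidate instance i.
    """
    remaining = 1.0  # product of the "miss" probabilities
    for conf_rho, sim in template_scores:
        remaining *= 1.0 - conf_rho * sim
    return 1.0 - remaining
```

Under this form, a candidate extracted by several confident templates whose patterns are similar to its sentence approaches 1. A candidate with no extracting template gets 0.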
3. The method of claim 1, wherein the candidate instances are all binary entity pairs that conform to the predetermined entity relationship and whose similarity to an extraction template is greater than a similarity threshold.
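Claim 3 selects candidate instances by comparing them with the extraction templates against a similarity threshold. The sketch below makes several assumptions that this section does not state:

- The input contains only pairs of the predetermined relation type.
- The sentence containing each pair and each template are feature vectors (for example, a template as the centroid of its sentence cluster).
- Similarity is cosine similarity.

The sketch also records which templates extract each candidate, because several templates may extract the same candidate instance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]
Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extract_candidates(
    sentences: List[Tuple[Pair, Vector]],
    templates: Dict[str, Vector],
    sim_threshold: float,
) -> Dict[Pair, List[Tuple[str, float]]]:
    """Map each entity pair to (template id, similarity) for every
    hypothetical template vector whose similarity to the pair's sentence
    vector exceeds the threshold."""
    candidates: Dict[Pair, List[Tuple[str, float]]] = {}
    for pair, vec in sentences:
        for template_id, template_vec in templates.items():
            sim = cosine(vec, template_vec)
            if sim > sim_threshold:
                candidates.setdefault(pair, []).append((template_id, sim))
    return candidates
```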
4. The method according to claim 1, wherein the feature vector of each sentence is extracted as follows: the sentence is parsed by dependency syntax analysis, all dependency features of the binary entity pair in the sentence are extracted, a word vector is obtained for each dependency feature with the skip-gram method, and the mean of all these word vectors is taken as the feature vector of the sentence.
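The sentence feature vector of claim 4 is the mean of the word vectors of the sentence's dependency features. This sketch assumes that dependency parsing and skip-gram training have already been done (for instance, vectors from gensim's `Word2Vec` trained with `sg=1`) and that the vectors are available as a plain dictionary. The feature-string format and the choice to skip features without a vector are hypothetical.

```python
from typing import Dict, List, Sequence


def sentence_vector(
    dependency_features: Sequence[str],
    embeddings: Dict[str, List[float]],
) -> List[float]:
    """Average the skip-gram vectors of a sentence's dependency features.

    Features missing from `embeddings` are skipped (an assumption).
    """
    vectors = [embeddings[f] for f in dependency_features if f in embeddings]
    if not vectors:
        raise ValueError("no dependency feature has a word vector")
    dim = len(vectors[0])
    # Component-wise mean over all available feature vectors.
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
```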
5. The method of claim 1, wherein sentences are clustered using a single-pass algorithm.
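Single-pass clustering, as named in claim 5, visits each sentence vector once. Each vector either joins the most similar existing cluster or opens a new one. This sketch assumes cosine similarity against a running cluster centroid and a hypothetical similarity threshold; the patent text here specifies neither.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def single_pass_cluster(vectors: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Visit each sentence vector once, in order.

    A vector joins the most similar existing cluster if that similarity is
    above `threshold` (a hypothetical parameter); otherwise it starts a new
    cluster. Returns the member indices of each cluster.
    """
    clusters: List[List[int]] = []
    centroids: List[List[float]] = []
    for idx, vec in enumerate(vectors):
        best, best_sim = -1, threshold
        for c, centroid in enumerate(centroids):
            sim = _cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best == -1:
            clusters.append([idx])
            centroids.append(list(vec))
        else:
            clusters[best].append(idx)
            size = len(clusters[best])
            # Running mean: new_mean = old_mean + (vec - old_mean) / size
            centroids[best] = [m + (x - m) / size for m, x in zip(centroids[best], vec)]
    return clusters
```

Because each vector is seen only once, the result depends on input order. Single-pass clustering accepts this order dependence in exchange for needing only one pass over the data.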
6. The method of claim 1, wherein the binary entity pairs conforming to the predetermined entity relationship have the form <body part, medical description>.
7. The method of claim 6, wherein the electronic medical record text database is a txt document that contains the text of multiple electronic medical records and is obtained by splitting the text into sentences and annotating the entities in each sentence.
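Claims 6 and 7 describe the input data: electronic medical record text split into sentences, entities annotated in each sentence, and binary entity pairs of the form <body part, medical description>. This hypothetical preparation step splits text on Chinese sentence-ending punctuation (。！？；) and newlines. It assumes that annotations arrive as (text, label) tuples with the made-up labels `BODY` and `DESC`; the patent does not specify an annotation scheme.

```python
import re
from itertools import product
from typing import List, Tuple

# Hypothetical entity labels; not taken from the patent.
BODY_PART = "BODY"
DESCRIPTION = "DESC"


def split_sentences(record_text: str) -> List[str]:
    """Split text after Chinese sentence-ending punctuation or newlines."""
    parts = re.split(r"(?<=[。！？；\n])", record_text)
    return [p.strip() for p in parts if p.strip()]


def entity_pairs(entities: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Form every <body part, medical description> pair in one annotated sentence."""
    bodies = [text for text, label in entities if label == BODY_PART]
    descriptions = [text for text, label in entities if label == DESCRIPTION]
    return list(product(bodies, descriptions))
```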
8. The method of claim 1, wherein the predetermined number of iterations is 5.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010047654.5A CN113130025B (en) 2020-01-16 2020-01-16 Entity relation extraction method, terminal equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113130025A (en) 2021-07-16
CN113130025B (en) 2023-11-24

Family

ID=76771765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010047654.5A Active CN113130025B (en) 2020-01-16 2020-01-16 Entity relation extraction method, terminal equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113130025B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050027664A1 (en) * 2003-07-31 2005-02-03 Johnson David E. Interactive machine learning system for automated annotation of information in text
US20190065576A1 (en) * 2017-08-23 2019-02-28 Rsvp Technologies Inc. Single-entity-single-relation question answering systems, and methods
CN109710932A (en) * 2018-12-22 2019-05-03 Beijing University of Technology Medical entity relation extraction method based on feature fusion
CN110188193A (en) * 2019-04-19 2019-08-30 Sichuan University Electronic health record entity relation extraction method based on the shortest dependency subtree

Non-Patent Citations (2)

Title
SHIGUANG WANG et al.: "Pedestrian Detection via Body Part Semantic and Contextual Information With DNN", IEEE TRANSACTIONS ON MULTIMEDIA, vol. 20, no. 11, pages 3148-3159, XP011691817, DOI: 10.1109/TMM.2018.2829602 *
SHIGUANG WANG et al.: "PCN: Part and Context Information for Pedestrian Detection with CNNs", published online at HTTPS://ARXIV.ORG/ABS/1804.04483V1, pages 1-13 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN113658652A (en) * 2021-08-18 2021-11-16 West China Hospital of Sichuan University Binary relation extraction method based on electronic medical record data text
CN113658652B (en) * 2021-08-18 2023-07-28 West China Hospital of Sichuan University Binary relation extraction method based on electronic medical record data text
CN114625880A (en) * 2022-05-13 2022-06-14 Shanghai Zhixun Information Technology Co., Ltd. Character relation extraction method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN113130025B (en) 2023-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant