CN111798987A - Entity relationship extraction method and device - Google Patents

Entity relationship extraction method and device

Info

Publication number
CN111798987A
Authority
CN
China
Prior art keywords
entity
sentence
sentences
bert model
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010648089.8A
Other languages
Chinese (zh)
Inventor
陆晓静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd, Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority to CN202010648089.8A
Publication of CN111798987A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

The invention provides a method and a device for extracting entity relations, wherein the method comprises the following steps: acquiring an entity pair data set containing a preset relation; wherein the entity pair dataset comprises a plurality of entity pairs; extracting sentences containing the entity pairs from professional data in the medical field; screening sentence templates for representing the relation from the sentences based on an initial BERT model; and adjusting the initial BERT model based on the sentence template and the entity pair data set so as to extract the relation of the entity pair data set to be extracted through the adjusted BERT model. Through the scheme, data are extracted from professional data in the medical field, the sentence templates used for representing relations are selected based on the sentences obtained through extraction, the work efficiency of data labeling and feature matching is improved, various data in the medical field can be adapted, a large amount of manpower is not needed, and the cost is saved.

Description

Entity relationship extraction method and device
Technical Field
The present invention relates to the field of data relationship extraction technologies, and in particular, to a method and an apparatus for extracting an entity relationship.
Background
Currently, relationships need to be extracted between medical entities such as diseases and symptoms, or diseases and operations. There are two types of existing extraction methods: one is based on rules, and the other is based on supervised learning. The rule-based method uses preset rules to extract the corresponding entities from a text or to judge whether entities conform to the corresponding relationship. The supervised learning method trains a classifier, after a large amount of data has been labeled, to judge whether entities have the corresponding relationship. Both of these current solutions present problems:
The rule-based method depends on the quality of the established rules and needs a large amount of manpower in the early stage; manually established rules cannot necessarily cover all relation types, so the recall rate is poor. The supervised learning method requires a large amount of labeled data, which is costly, time-consuming and labor-intensive, and it is inflexible because new data must be labeled whenever a new relationship needs to be extracted.
Thus, there is a need for a better approach to solving the problems encountered in entity relationship extraction.
Disclosure of Invention
According to the scheme, data are extracted from professional data in the medical field, sentence templates for representing the relation are screened out based on the extracted sentences, the work efficiency of data labeling and feature matching is improved, the method can adapt to various data in the medical field, a large amount of manpower is not needed, and the cost is saved.
Specifically, the present invention proposes the following specific embodiments:
An embodiment of the invention provides a method for extracting entity relationships, which comprises the following steps:
acquiring an entity pair data set containing a preset relation; wherein the entity pair dataset comprises a plurality of entity pairs;
extracting sentences containing the entity pairs from professional data in the medical field;
screening sentence templates for representing the relation from the sentences based on an initial BERT model;
and adjusting the initial BERT model based on the sentence template and the entity pair data set, so that relation extraction is performed, through the adjusted BERT model, on the entity pair data set whose relations are to be extracted.
In a specific embodiment, the medical domain professional data comprises medical record data.
In a specific embodiment, a preset relationship exists between entities in each of the entity pairs;
the extracting the sentence containing the entity pair from the professional data of the medical field comprises:
extracting, from the professional data in the medical field, sentences that have a preset length and in which the interval between the entities of an entity pair is a preset interval value.
In a specific embodiment, the "screening a sentence template for characterizing the relationship from the sentences based on the initial BERT model" includes:
generating an initial sentence template based on the sentences, wherein more than a preset proportion of the sentences constructed from the initial sentence template satisfy the relationship;
carrying out usability scoring on each initial sentence template through a BERT model;
and screening the initial sentence template according to the usability scores to select a sentence template for representing the relationship.
In a particular embodiment, the "scoring usability of each of the initial sentence templates by a BERT model" includes:
for each initial sentence template, constructing a sentence with a space based on the initial sentence template and each entity in the entity pair data set;
predicting the blank in the sentence based on a BERT model to obtain a prediction result;
determining a score for the initial sentence template based on the prediction result.
In a particular embodiment, the usability score is determined based on the following formula:
[Formula: Figure BDA0002573900220000031]
wherein $\mathrm{score}(\phi_i)$ is the usability score of the sentence template $\phi_i$;
the first indicator term is 1 when $s_j \in S_{ij}$ and 0 when $s_j \notin S_{ij}$;
the second indicator term is 1 when $t_j \in T_{ij}$ and 0 when $t_j \notin T_{ij}$;
$T_{ij}$ and $S_{ij}$ are the top-k predictions for the sentences $\phi_i(s_j, \_)$ and $\phi_i(\_, t_j)$, respectively;
$s_j$ and $t_j$ are the entities in an entity pair.
In a specific embodiment, the "adjusting the initial BERT model based on the sentence template and the entity pair dataset" includes:
summarizing the sentence templates obtained after screening into a sentence template set;
constructing positive example sentences on the sentence template set based on the entity pair data set;
constructing negative example sentences on the sentence template set based on a part of the entity pairs in the entity pair data set and on the anti-entity pair data set; the anti-entity pair data set contains the same entities as the entity pair data set, with the entities in each entity pair in reverse order;
and adjusting the BERT model based on the positive example sentences and the negative example sentences, so that relation extraction is performed through the adjusted BERT model.
In a specific embodiment, the method further comprises the following steps:
when it is to be judged whether a designated entity pair satisfies the given relationship, predicting the constructed sentences to be predicted through the adjusted BERT model; the sentences to be predicted are constructed based on the designated entity pair and the sentence template set;
and if the average value of the obtained prediction results is larger than a set threshold value, determining that the designated entity pair meets the given relationship.
The embodiment of the present invention further provides an entity relationship extraction device, including:
the acquisition module is used for acquiring an entity pair data set containing a preset relation; wherein the entity pair dataset comprises a plurality of entity pairs;
the extraction module is used for extracting sentences containing the entity pairs from professional data in the medical field;
the screening module is used for screening sentence templates for representing the relation from the sentences based on an initial BERT model;
and the processing module is used for adjusting the initial BERT model based on the sentence template and the entity pair data set so as to extract the relation of the entity pair data set to be extracted through the adjusted BERT model.
In a specific embodiment, the medical domain professional data comprises medical record data. Therefore, the embodiment of the invention provides a method and equipment for extracting entity relationships, wherein the method comprises the following steps: acquiring an entity pair data set containing a preset relation; wherein the entity pair dataset comprises a plurality of entity pairs; extracting sentences containing the entity pairs from professional data in the medical field; screening sentence templates for representing the relation from the sentences based on an initial BERT model; and adjusting the initial BERT model based on the sentence template and the entity pair data set so as to extract the relation of the entity pair data set to be extracted through the adjusted BERT model. Through the scheme, data are extracted from professional data in the medical field, the sentence templates used for representing relations are selected based on the sentences obtained through extraction, the work efficiency of data labeling and feature matching is improved, various data in the medical field can be adapted, a large amount of manpower is not needed, and the cost is saved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic flow chart of an entity relationship extraction method according to an embodiment of the present invention;
Fig. 2 is a schematic view of a flow framework of an entity relationship extraction method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an entity relationship extraction device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an entity relationship extraction device according to an embodiment of the present invention.
Detailed Description
Various embodiments of the present disclosure will be described more fully hereinafter. The present disclosure is capable of various embodiments and of modifications and variations therein. However, it should be understood that: there is no intention to limit the various embodiments of the disclosure to the specific embodiments disclosed herein, but rather, the disclosure is to cover all modifications, equivalents, and/or alternatives falling within the spirit and scope of the various embodiments of the disclosure.
The terminology used in the various embodiments of the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the various embodiments of the present disclosure. As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the present disclosure belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined in various embodiments of the present disclosure.
Example 1
The embodiment 1 of the invention discloses a method for extracting entity relationship, which comprises the following steps as shown in figure 1:
step 101, acquiring an entity pair data set containing a preset relationship; wherein the entity pair data set comprises a plurality of entity pairs.
In particular, a data set of related entity pairs $R = \{(s_1, t_1), \dots, (s_n, t_n)\}$ is given, for example the disease-symptom entity pair data set {(hepatitis, hepatomegaly), (rubella, headache)}. A specific entity pair may be, for example, (rubella, headache), wherein "rubella" and "headache" are both entities in the entity pair.
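As a minimal illustration, the data set R can be held as a list of (s, t) tuples. The names `EntityPair`, `heads` and `tails` are assumptions made for this sketch and do not appear in the disclosure.

```python
from typing import List, Tuple

# Hypothetical in-memory form of the entity pair data set R.
# Each pair (s, t) is assumed to satisfy the preset relation
# "disease s has symptom t".
EntityPair = Tuple[str, str]

R: List[EntityPair] = [
    ("hepatitis", "hepatomegaly"),
    ("rubella", "headache"),
]

# The two sides of the pairs: s_1..s_n (diseases) and t_1..t_n (symptoms).
heads = [s for s, _ in R]
tails = [t for _, t in R]
```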
Step 102, extracting sentences containing the entity pairs from professional data in the medical field;
specifically, the professional data in the medical field includes medical record data.
In a specific embodiment, a preset relationship exists between entities in each of the entity pairs;
the extracting the sentence containing the entity pair from the professional data of the medical field comprises:
extracting, from the professional data in the medical field, sentences that have a preset length and in which the interval between the entities of an entity pair is a preset interval value.
Thus, all sentences containing the entity pairs in R are extracted from the medical records, keeping only sentences of length L (for example, 15 characters or another number; the specific length can be set flexibly) in which the interval between $s_i$ and $t_i$ is W (which may be measured in characters or in other ways, for example an interval of 8 characters; the specific interval can be set flexibly according to actual conditions). All the extracted sentences are denoted $\phi_1(x_1, y_1), \dots, \phi_m(x_m, y_m)$.
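The extraction step can be sketched as follows. This is only an illustration under stated assumptions: the function name `extract_sentences`, the sentence splitting on common terminators, and the reading of L and W as upper bounds (`max_len`, `max_gap`) measured in characters are all choices made here, not details given in the text.

```python
import re
from typing import List, Tuple


def extract_sentences(
    text: str,
    pairs: List[Tuple[str, str]],
    max_len: int,
    max_gap: int,
) -> List[Tuple[str, str, str]]:
    """Return (sentence, s, t) triples for sentences that mention both entities."""
    # Assumption: sentences end at common Chinese/English terminators.
    sentences = [p.strip() for p in re.split(r"[。.!?！？;；\n]", text) if p.strip()]
    results = []
    for sent in sentences:
        if len(sent) > max_len:  # length filter (L read as an upper bound)
            continue
        for s, t in pairs:
            i, j = sent.find(s), sent.find(t)
            if i < 0 or j < 0:
                continue
            # Characters strictly between the two mentions, whichever comes first.
            gap = j - (i + len(s)) if i < j else i - (j + len(t))
            if 0 <= gap <= max_gap:  # interval filter (W read as an upper bound)
                results.append((sent, s, t))
    return results


record = (
    "Patients with hepatitis often show hepatomegaly. "
    "The rubella case was reviewed over many weeks with several "
    "unrelated findings before headache was noted."
)
pairs = [("hepatitis", "hepatomegaly"), ("rubella", "headache")]
hits = extract_sentences(record, pairs, max_len=60, max_gap=30)
```

Here the second sentence is discarded by the length filter, so only the hepatitis sentence is kept.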
Step 103, screening a sentence template for representing the relation from the sentences based on the initial BERT model.
Specifically, an initial sentence template $\phi_j(x_j, y_j)$ is screened from the extracted sentences, so that most of the sentences in the sentence set $\phi_j(s_1, t_1), \dots, \phi_j(s_n, t_n)$ constructed with it (e.g., a proportion of 8 in 10, or higher or lower) satisfy the relationship to be extracted. For example, the initial sentence template "patient _ would show _" can be used to characterize the relationship of most diseases to symptoms.
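One way to derive candidate templates from the extracted sentences is to replace the two entity mentions with slot markers and count how often each resulting pattern occurs. The sketch below assumes hypothetical slot markers `[S]` and `[T]` and helper names `to_template`, `fill` and `candidate_templates`. It does not implement the preset-proportion check, because the text does not say how that proportion is evaluated.

```python
from collections import Counter
from typing import List, Tuple

S_SLOT, T_SLOT = "[S]", "[T]"  # hypothetical slot markers


def to_template(sentence: str, s: str, t: str) -> str:
    """Replace the first mention of each entity with its slot marker."""
    return sentence.replace(s, S_SLOT, 1).replace(t, T_SLOT, 1)


def fill(template: str, s: str, t: str) -> str:
    """Instantiate a template phi(s, t) with a concrete entity pair."""
    return template.replace(S_SLOT, s).replace(T_SLOT, t)


def candidate_templates(hits: List[Tuple[str, str, str]]) -> List[str]:
    """Return distinct templates, most frequent first."""
    counts = Counter(to_template(sent, s, t) for sent, s, t in hits)
    return [tpl for tpl, _ in counts.most_common()]


hits = [
    ("patient with hepatitis would show hepatomegaly", "hepatitis", "hepatomegaly"),
    ("patient with rubella would show headache", "rubella", "headache"),
    ("rubella is often followed by headache", "rubella", "headache"),
]
templates = candidate_templates(hits)
```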
In one particular embodiment, to screen out more desirable sentence templates,
"screening out a sentence template for characterizing the relationship from the sentences based on the initial BERT model" includes:
generating an initial sentence template based on the sentences, wherein more than a preset proportion of the sentences constructed from the initial sentence template satisfy the relationship;
carrying out usability scoring on each initial sentence template through a BERT model;
and screening the initial sentence template according to the usability scores to select a sentence template for representing the relationship.
In addition, the "scoring usability of each of the initial sentence templates by the BERT model" includes:
for each initial sentence template, constructing a sentence with a space based on the initial sentence template and each entity in the entity pair data set;
predicting the blank in the sentence based on a BERT model to obtain a prediction result;
determining a score for the initial sentence template based on the prediction result.
The usability of a template $\phi_i$ is evaluated using the original BERT model. Using the data set R, the sentences $\phi_i(s_1, \_), \dots, \phi_i(s_n, \_)$ and $\phi_i(\_, t_1), \dots, \phi_i(\_, t_n)$ are constructed, where the underscore represents a blank, as in the sentence "if hepatitis is found, _ will show". The blank is predicted using the BERT model, and it is counted whether the corresponding $t_1, \dots, t_n$ and $s_1, \dots, s_n$ are among the first k (top-k) predictions. With $T_{ij}$ and $S_{ij}$ denoting the top-k predictions for the sentences $\phi_i(s_j, \_)$ and $\phi_i(\_, t_j)$ respectively, the usability score is determined based on the following formula:
[Formula: Figure BDA0002573900220000081]
wherein $\mathrm{score}(\phi_i)$ is the usability score of the sentence template $\phi_i$;
the first indicator term is 1 when $s_j \in S_{ij}$ and 0 when $s_j \notin S_{ij}$;
the second indicator term is 1 when $t_j \in T_{ij}$ and 0 when $t_j \notin T_{ij}$;
$T_{ij}$ and $S_{ij}$ are the top-k predictions for the sentences $\phi_i(s_j, \_)$ and $\phi_i(\_, t_j)$, respectively, where the top-k prediction results (for example, predicted words) for the blank are obtained using BERT;
$s_j$ and $t_j$ are the entities in an entity pair.
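The scoring step can be sketched as below. The exact formula is only available as an image, so this sketch assumes one plausible reading: the score is the fraction of blank-filling queries whose expected entity appears among the top-k predictions. The callable `predict_top_k` is a hypothetical stand-in for a BERT masked-language-model query, and the `[S]` and `[T]` slot markers and the toy lookup table are illustrative only.

```python
from typing import Callable, List, Tuple

BLANK = "_"
TopKFn = Callable[[str, int], List[str]]  # (sentence with blank, k) -> predictions


def template_score(
    template: str,
    pairs: List[Tuple[str, str]],
    predict_top_k: TopKFn,
    k: int = 5,
) -> float:
    """Assumed score: share of top-k hits over both blank positions."""
    if not pairs:
        return 0.0
    hits = 0
    for s, t in pairs:
        # phi_i(s_j, _): s filled in, t blanked; T_ij is its top-k prediction list.
        t_ij = predict_top_k(template.replace("[S]", s).replace("[T]", BLANK), k)
        # phi_i(_, t_j): t filled in, s blanked; S_ij is its top-k prediction list.
        s_ij = predict_top_k(template.replace("[S]", BLANK).replace("[T]", t), k)
        hits += int(t in t_ij) + int(s in s_ij)
    return hits / (2 * len(pairs))


def toy_predictor(sentence: str, k: int) -> List[str]:
    """Toy stand-in for BERT: a fixed lookup table of ranked predictions."""
    table = {
        "patient with hepatitis would show _": ["jaundice", "hepatomegaly"],
        "patient with _ would show hepatomegaly": ["hepatitis"],
        "patient with rubella would show _": ["rash", "fever"],
        "patient with _ would show headache": ["migraine", "rubella"],
    }
    return table.get(sentence, [])[:k]


R = [("hepatitis", "hepatomegaly"), ("rubella", "headache")]
score_k2 = template_score("patient with [S] would show [T]", R, toy_predictor, k=2)
score_k1 = template_score("patient with [S] would show [T]", R, toy_predictor, k=1)
```

Templates would then be kept or discarded by comparing such scores, for example against a cut-off or by keeping the best-scoring ones; the text does not fix which rule is used.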
In a specific embodiment, the "adjusting the initial BERT model based on the sentence template and the entity pair dataset" includes:
summarizing the sentence templates obtained after screening into a sentence template set;
constructing positive example sentences on the sentence template set based on the entity pair data set;
constructing negative example sentences on the sentence template set based on a part of the entity pairs in the entity pair data set and on the anti-entity pair data set; the anti-entity pair data set contains the same entities as the entity pair data set, with the entities in each entity pair in reverse order;
and adjusting the BERT model based on the positive example sentences and the negative example sentences, so that relation extraction is performed through the adjusted BERT model.
Specifically, as shown in fig. 2, $\psi = \{\psi_1, \dots, \psi_c\}$ denotes the c sentence templates obtained after screening. To improve the prediction accuracy, the data set $R = \{(s_1, t_1), \dots, (s_n, t_n)\}$ is used to construct positive example sentences on the template set $\psi$.
In addition, the anti-entity pair data set $\{(t_1, s_1), \dots, (t_n, s_n)\}$ and a batch of samples $(s_i, t_j)$, $i \neq j$, $i, j \in [1, n]$, drawn from the data set R are used to construct negative example sentences on the template set $\psi$. The original BERT model is then fine-tuned (that is, adjusted) with the constructed positive and negative example sentences as a binary classification task.
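The construction of the fine-tuning data can be sketched as follows. The function name `build_examples`, the `[S]` and `[T]` slot markers, the label convention (1 for positive, 0 for negative) and the filter that drops mismatched pairs which happen to be in R are assumptions of this sketch. The fine-tuning itself is not shown; the resulting (sentence, label) list would be fed to whatever binary sequence classifier is built on BERT.

```python
import random
from typing import List, Tuple


def fill(template: str, s: str, t: str) -> str:
    """Instantiate a template with a concrete (s, t) pair."""
    return template.replace("[S]", s).replace("[T]", t)


def build_examples(
    templates: List[str],
    pairs: List[Tuple[str, str]],
    n_mismatched: int,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Build (sentence, label) training examples on the template set."""
    rng = random.Random(seed)
    known = set(pairs)
    n = len(pairs)
    # Mismatched pairs (s_i, t_j) with i != j, skipping any that are in R.
    mismatched = [
        (pairs[i][0], pairs[j][1])
        for i in range(n)
        for j in range(n)
        if i != j and (pairs[i][0], pairs[j][1]) not in known
    ]
    sampled = rng.sample(mismatched, min(n_mismatched, len(mismatched)))

    examples = []
    for tpl in templates:
        for s, t in pairs:
            examples.append((fill(tpl, s, t), 1))  # positive example
            examples.append((fill(tpl, t, s), 0))  # reversed pair (t, s)
        for s, t in sampled:
            examples.append((fill(tpl, s, t), 0))  # mismatched pair
    return examples


R = [("hepatitis", "hepatomegaly"), ("rubella", "headache")]
examples = build_examples(["patient with [S] would show [T]"], R, n_mismatched=2)
```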
Further, as shown in fig. 2, the method further includes:
when it is to be judged whether a designated entity pair satisfies the given relationship, predicting the constructed sentences to be predicted through the adjusted BERT model; the sentences to be predicted are constructed based on the designated entity pair and the sentence template set;
and if the average value of the obtained prediction results is larger than a set threshold value, determining that the designated entity pair meets the given relationship.
Specifically, the fine-tuned BERT model is used to predict whether (s, t) satisfies the given relationship. Because there are c sentence templates, c sentences can be constructed, giving the prediction results $p_1(s, t), \dots, p_c(s, t)$. A threshold $\lambda$ is set, and when the average of the prediction results satisfies $\frac{1}{c}\sum_{i=1}^{c} p_i(s, t) > \lambda$, the input (s, t) can be considered to satisfy the given relationship.
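The decision rule can be sketched as below. The callable `predict_prob` is a hypothetical stand-in for the fine-tuned BERT classifier, returning a probability for one sentence; the toy probability table, the `[S]` and `[T]` slot markers and the threshold value are illustrative only.

```python
from typing import Callable, List


def fill(template: str, s: str, t: str) -> str:
    """Instantiate a template with a concrete (s, t) pair."""
    return template.replace("[S]", s).replace("[T]", t)


def satisfies_relation(
    s: str,
    t: str,
    templates: List[str],
    predict_prob: Callable[[str], float],
    threshold: float,
) -> bool:
    """Average p_1(s,t)..p_c(s,t) over the c templates and compare with lambda."""
    if not templates:
        raise ValueError("at least one sentence template is required")
    probs = [predict_prob(fill(tpl, s, t)) for tpl in templates]
    return sum(probs) / len(probs) > threshold


templates = ["patient with [S] would show [T]", "[S] is often followed by [T]"]

# Toy stand-in for the fine-tuned classifier.
toy_probs = {
    "patient with rubella would show headache": 0.9,
    "rubella is often followed by headache": 0.6,
    "patient with headache would show rubella": 0.2,
    "headache is often followed by rubella": 0.4,
}

forward = satisfies_relation("rubella", "headache", templates, toy_probs.get, 0.7)
backward = satisfies_relation("headache", "rubella", templates, toy_probs.get, 0.7)
```

Averaging over all c templates means that no single template decides the outcome on its own.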
Example 2
Embodiment 2 of the present invention further discloses an entity relationship extraction device, as shown in fig. 3, including:
an obtaining module 201, configured to obtain an entity pair data set including a preset relationship; wherein the entity pair data set comprises a plurality of entity pairs.
An extraction module 202, configured to extract a sentence including the entity pair from professional data in the medical field;
a screening module 203, configured to screen a sentence template for characterizing the relationship from the sentences based on an initial BERT model;
the processing module 204 is configured to adjust the initial BERT model based on the sentence template and the entity pair data set, so that relation extraction is performed, through the adjusted BERT model, on the entity pair data set whose relations are to be extracted.
In a specific embodiment, the medical domain professional data comprises medical record data.
In a specific embodiment, a preset relationship exists between entities in each of the entity pairs;
the extraction module 202 is configured to:
extracting, from the professional data in the medical field, sentences that have a preset length and in which the interval between the entities of an entity pair is a preset interval value.
In a particular embodiment, the screening module 203 is configured for:
generating an initial sentence template based on the sentences, wherein more than a preset proportion of the sentences constructed from the initial sentence template satisfy the relationship;
carrying out usability scoring on each initial sentence template through a BERT model;
and screening the initial sentence template according to the usability scores to select a sentence template for representing the relationship.
In a specific embodiment, the filtering module 203 scores the usability of each of the initial sentence templates through a BERT model, which includes:
for each initial sentence template, constructing a sentence with a space based on the initial sentence template and each entity in the entity pair data set;
predicting the blank in the sentence based on a BERT model to obtain a prediction result;
determining a score for the initial sentence template based on the prediction result.
In a particular embodiment, the usability score is determined based on the following formula:
[Formula: Figure BDA0002573900220000101]
wherein $\mathrm{score}(\phi_i)$ is the usability score of the sentence template $\phi_i$;
the first indicator term is 1 when $s_j \in S_{ij}$ and 0 when $s_j \notin S_{ij}$;
the second indicator term is 1 when $t_j \in T_{ij}$ and 0 when $t_j \notin T_{ij}$;
$T_{ij}$ and $S_{ij}$ are the top-k predictions for the sentences $\phi_i(s_j, \_)$ and $\phi_i(\_, t_j)$, respectively;
$s_j$ and $t_j$ are the entities in an entity pair.
In a specific embodiment, the processing module 204 is configured for:
summarizing the sentence templates obtained after screening into a sentence template set;
constructing positive example sentences on the sentence template set based on the entity pair data set;
constructing negative example sentences on the sentence template set based on a part of the entity pairs in the entity pair data set and on the anti-entity pair data set; the anti-entity pair data set contains the same entities as the entity pair data set, with the entities in each entity pair in reverse order;
and adjusting the BERT model based on the positive example sentences and the negative example sentences, so that relation extraction is performed through the adjusted BERT model.
In a specific embodiment, as shown in fig. 4, the device further includes a judging module 205, configured for:
when it is to be judged whether a designated entity pair satisfies the given relationship, predicting the constructed sentences to be predicted through the adjusted BERT model; the sentences to be predicted are constructed based on the designated entity pair and the sentence template set;
and if the average value of the obtained prediction results is larger than a set threshold value, determining that the designated entity pair meets the given relationship.
Therefore, the embodiment of the invention provides a method and equipment for extracting entity relationships, wherein the method comprises the following steps: acquiring an entity pair data set containing a preset relation; wherein the entity pair dataset comprises a plurality of entity pairs; extracting sentences containing the entity pairs from professional data in the medical field; screening sentence templates for representing the relation from the sentences based on an initial BERT model; and adjusting the initial BERT model based on the sentence template and the entity pair data set so as to extract the relation of the entity pair data set to be extracted through the adjusted BERT model. Through the scheme, data are extracted from professional data in the medical field, the sentence templates used for representing relations are selected based on the sentences obtained through extraction, the work efficiency of data labeling and feature matching is improved, various data in the medical field can be adapted, a large amount of manpower is not needed, and the cost is saved.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above-mentioned sequence numbers of the invention are merely for description and do not represent the merits of the implementation scenarios.
The above disclosure is only a few specific implementation scenarios of the present invention, however, the present invention is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present invention.

Claims (10)

1. A method of entity relationship extraction, comprising:
acquiring an entity pair data set containing a preset relation; wherein the entity pair dataset comprises a plurality of entity pairs;
extracting sentences containing the entity pairs from professional data in the medical field;
screening sentence templates for representing the relation from the sentences based on an initial BERT model;
and adjusting the initial BERT model based on the sentence template and the entity pair data set so as to extract the relation of the entity pair data set to be extracted through the adjusted BERT model.
2. The method of claim 1, wherein the medical domain professional data comprises medical record data.
3. The method of claim 1, wherein there is a predetermined relationship between entities in each of the entity pairs;
the extracting the sentence containing the entity pair from the professional data of the medical field comprises:
extracting, from the professional data in the medical field, sentences that have a preset length and in which the interval between the entities of an entity pair is a preset interval value.
4. The method of claim 1, wherein the step of selecting sentence templates for characterizing the relationship from the sentences based on the initial BERT model comprises:
generating an initial sentence template based on the sentences, wherein more than a preset proportion of the sentences constructed from the initial sentence template satisfy the relationship;
carrying out usability scoring on each initial sentence template through a BERT model;
and screening the initial sentence template according to the usability scores to select a sentence template for representing the relationship.
5. The method of claim 4, wherein said scoring usability of each of said initial sentence templates by a BERT model comprises:
for each initial sentence template, constructing a sentence with a space based on the initial sentence template and each entity in the entity pair data set;
predicting the blank in the sentence based on a BERT model to obtain a prediction result;
determining a score for the initial sentence template based on the prediction result.
6. A method of entity relationship extraction as claimed in claim 4 or 5, wherein said usability score is determined based on the following formula:
[Formula: Figure FDA0002573900210000021]
wherein $\mathrm{score}(\phi_i)$ is the usability score of the sentence template $\phi_i$;
the first indicator term is 1 when $s_j \in S_{ij}$ and 0 when $s_j \notin S_{ij}$;
the second indicator term is 1 when $t_j \in T_{ij}$ and 0 when $t_j \notin T_{ij}$;
$T_{ij}$ and $S_{ij}$ are the top-k predictions for the sentences $\phi_i(s_j, \_)$ and $\phi_i(\_, t_j)$, respectively;
$s_j$ and $t_j$ are the entities in an entity pair.
7. The method of claim 1, wherein the "adjusting the initial BERT model based on the sentence template and the entity pair dataset" comprises:
summarizing the sentence templates obtained after screening into a sentence template set;
constructing a regular sentence on the sentence template set based on the entity pair data set;
constructing a counterexample sentence on the sentence template set based on a part of entity pairs in the entity pair data set and the anti-entity pair data set; the anti-entity pair dataset is identical to the entities of the entity pair dataset, and the entities in the entity pairs are in reverse order;
and adjusting the BERT model based on the positive example sentences and the negative example sentences so as to pass through the adjusted BERT model.
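The construction of positive and negative example sentences can be sketched as below. The reading that negatives come from reversing a chosen subset of the entity pairs is an assumption, as are the function name, the `negative_subset` parameter, the 1/0 labels and the `{head}`/`{tail}` slots; the fine-tuning step itself is not shown.

```python
def build_training_examples(templates, entity_pairs, negative_subset=None):
    """Return (sentence, label) tuples: 1 for positive, 0 for negative.

    Negatives are built by swapping head and tail for each pair in
    negative_subset (defaults to all pairs); this is an assumed reading.
    """
    source = entity_pairs if negative_subset is None else negative_subset
    reversed_pairs = [(tail, head) for head, tail in source]
    examples = []
    for template in templates:
        for head, tail in entity_pairs:
            examples.append((template.format(head=head, tail=tail), 1))
        for head, tail in reversed_pairs:
            examples.append((template.format(head=head, tail=tail), 0))
    return examples


templates = ["{head} is used to treat {tail}"]
pairs = [("aspirin", "headache"), ("ibuprofen", "fever")]
examples = build_training_examples(templates, pairs, negative_subset=pairs[:1])
```

The resulting labelled sentences could then be fed to whatever fine-tuning routine is used for the BERT model, for instance as a binary sentence classification task.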
8. The method of entity relationship extraction as claimed in claim 7, further comprising:
when judging whether a designated entity pair satisfies the given relationship, predicting on the sentences to be predicted through the adjusted BERT model, the sentences to be predicted being constructed from the designated entity pair and the sentence template set;
and if the average value of the obtained prediction results is larger than a set threshold, determining that the designated entity pair satisfies the given relationship.
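The averaging-and-threshold decision can be illustrated with a short sketch. The scorer passed in as `score_sentence` stands in for the adjusted BERT model and is assumed to return a value in [0, 1]; the function name, the default threshold of 0.5 and the toy scores are hypothetical.

```python
def satisfies_relation(head, tail, templates, score_sentence, threshold=0.5):
    """Average the model's prediction over all templates and compare to threshold.

    Returns (decision, mean_score).
    """
    if not templates:
        raise ValueError("at least one sentence template is required")
    scores = [score_sentence(t.format(head=head, tail=tail)) for t in templates]
    mean_score = sum(scores) / len(scores)
    return mean_score > threshold, mean_score


# Hard-coded stand-in for the adjusted model's per-sentence prediction.
toy_scores = {
    "aspirin is used to treat headache": 0.9,
    "headache can be relieved by aspirin": 0.8,
}


def toy_scorer(sentence):
    return toy_scores.get(sentence, 0.1)


templates = ["{head} is used to treat {tail}", "{tail} can be relieved by {head}"]
forward, forward_mean = satisfies_relation("aspirin", "headache", templates, toy_scorer)
backward, backward_mean = satisfies_relation("headache", "aspirin", templates, toy_scorer)
```

Averaging over the whole template set means a single poorly matching template does not decide the outcome on its own.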
9. An apparatus for entity relationship extraction, comprising:
the acquisition module is used for acquiring an entity pair data set containing a preset relation; wherein the entity pair dataset comprises a plurality of entity pairs;
the extraction module is used for extracting sentences containing the entity pairs from professional data in the medical field;
the screening module is used for screening sentence templates for representing the relation from the sentences based on an initial BERT model;
and the processing module is used for adjusting the initial BERT model based on the sentence template and the entity pair data set so as to extract the relation of the entity pair data set to be extracted through the adjusted BERT model.
10. The apparatus for entity relationship extraction as claimed in claim 9, wherein the professional data of the medical field comprises medical record data.
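The four modules of the apparatus can be pictured as a simple pipeline. The sketch below only wires hypothetical callables together in the claimed order; the class name, field names and type signatures are assumptions, and the toy lambdas replace the real acquisition, extraction, screening and model-adjustment logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


@dataclass
class RelationExtractionApparatus:
    """Hypothetical wiring of the four claimed modules."""
    acquire: Callable[[], List[Pair]]                    # acquisition module
    extract: Callable[[List[Pair]], List[str]]           # extraction module
    screen: Callable[[List[str], List[Pair]], List[str]]  # screening module
    process: Callable[[List[str], List[Pair]], Callable[[Pair], bool]]  # processing module

    def build(self) -> Callable[[Pair], bool]:
        pairs = self.acquire()
        sentences = self.extract(pairs)
        templates = self.screen(sentences, pairs)
        return self.process(templates, pairs)


apparatus = RelationExtractionApparatus(
    acquire=lambda: [("aspirin", "headache")],
    extract=lambda pairs: ["aspirin is used to treat headache"],
    screen=lambda sentences, pairs: ["{head} is used to treat {tail}"],
    # Toy "adjusted model": accepts only pairs seen during adjustment.
    process=lambda templates, pairs: (lambda pair: pair in pairs),
)
classifier = apparatus.build()
```

Keeping each module behind a plain callable makes it straightforward to swap the source of professional data, for example medical record data, without touching the other stages.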
CN202010648089.8A 2020-07-07 2020-07-07 Entity relationship extraction method and device Pending CN111798987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010648089.8A CN111798987A (en) 2020-07-07 2020-07-07 Entity relationship extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010648089.8A CN111798987A (en) 2020-07-07 2020-07-07 Entity relationship extraction method and device

Publications (1)

Publication Number Publication Date
CN111798987A true CN111798987A (en) 2020-10-20

Family

ID=72810468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010648089.8A Pending CN111798987A (en) 2020-07-07 2020-07-07 Entity relationship extraction method and device

Country Status (1)

Country Link
CN (1) CN111798987A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018076774A1 (en) * 2016-10-28 2018-05-03 Boe Technology Group Co., Ltd. Information extraction method and apparatus
CN109271530A (en) * 2018-10-17 2019-01-25 长沙瀚云信息科技有限公司 A kind of disease knowledge map construction method and plateform system, equipment, storage medium
CN110134772A (en) * 2019-04-18 2019-08-16 五邑大学 Medical text Relation extraction method based on pre-training model and fine tuning technology
CN110287334A (en) * 2019-06-13 2019-09-27 淮阴工学院 A kind of school's domain knowledge map construction method based on Entity recognition and attribute extraction model
CN110442777A (en) * 2019-06-24 2019-11-12 华中师范大学 Pseudo-linear filter model information search method and system based on BERT

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
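Ding Long et al.: "Domain entity recognition based on a pre-trained BERT character embedding model", Technology Intelligence Engineering (《情报工程》), pages 1 - 10 *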

Similar Documents

Publication Publication Date Title
CN109189767B (en) Data processing method and device, electronic equipment and storage medium
CN109635296B (en) New word mining method, device computer equipment and storage medium
CN107577739B (en) Semi-supervised domain word mining and classifying method and equipment
CN111144079B (en) Method and device for intelligently acquiring learning resources, printer and storage medium
US20190164109A1 (en) Similarity Learning System and Similarity Learning Method
CN106960248B (en) Method and device for predicting user problems based on data driving
CN109165529B (en) Dark chain tampering detection method and device and computer readable storage medium
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN103824090A (en) Adaptive face low-level feature selection method and face attribute recognition method
WO2018171295A1 (en) Method and apparatus for tagging article, terminal, and computer readable storage medium
CN112528022A (en) Method for extracting characteristic words corresponding to theme categories and identifying text theme categories
CN110941702A (en) Retrieval method and device for laws and regulations and laws and readable storage medium
CN108509561B (en) Post recruitment data screening method and system based on machine learning and storage medium
CN112036295A (en) Bill image processing method, bill image processing device, storage medium and electronic device
CN110910175A (en) Tourist ticket product portrait generation method
CN113590764B (en) Training sample construction method and device, electronic equipment and storage medium
Hardaya et al. Application of text mining for classification of community complaints and proposals
CN108021595A (en) Examine the method and device of knowledge base triple
CN109325096B (en) Knowledge resource search system based on knowledge resource classification
CN111798987A (en) Entity relationship extraction method and device
CN111708810A (en) Model optimization recommendation method and device and computer storage medium
Hansen et al. Temporal context for authorship attribution: a study of Danish secondary schools
CN115879463A (en) Course element recognition model training and recognition method based on text mining
CN111341404B (en) Electronic medical record data set analysis method and system based on ernie model
CN113468176B (en) Information input method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination