CN112926332A - Entity relationship joint extraction method and device - Google Patents
- Publication number
- CN112926332A (application number CN202110340031.1A)
- Authority
- CN
- China
- Prior art keywords
- entity
- word
- type
- vector
- relationship
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Medical Informatics (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Public Health (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Pathology (AREA)
- Artificial Intelligence (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This document provides an entity relationship joint extraction method and device. The method comprises: acquiring text data to be predicted; and performing extraction on the text data to be predicted by using a pre-established entity relationship joint extraction model to predict the type of each word case interval and the relationship type of each entity phrase, wherein the type of a word case interval is either an entity type or a non-entity type, an entity word is a word case interval of an entity type, and the relationship type of an entity phrase is either a relationship or a non-relationship. The entity relationship joint extraction model preprocesses the text data, predicts the type of each word case interval according to the information obtained by preprocessing, and predicts the relationship type of each entity phrase according to the entity phrase and the character vector between the entity words in the entity phrase. Considering both the entity phrase and the character vector between its entity words enriches the text semantic information, so that all entity phrase relationship types in complex text data can be accurately extracted.
Description
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a method and an apparatus for extracting entity relationships jointly.
Background
With the continuous development of medical informatization technology, extraction and structuring of useful information are urgently needed for a large amount of unstructured text data in health medical data such as health examination reports and electronic medical records, so that the data have greater value in practical application and production.
Entity relationship extraction from medical data is a core task in extracting information from unstructured text in the medical field and in constructing knowledge graphs for the health and medical field. The prior art offers two main methods for extracting entity relationships. The first is a pipeline (serial) method: named entity recognition is performed first to recognize the relevant medical entities in a text, and a classification method is then used to obtain the relationship between every two entities. The second is an entity relationship joint extraction method, which uses a single model to identify the medical entities in a text and determine the relationship type between every two entities at the same time.
The existing pipeline method propagates and accumulates errors and generates redundant information, so its results are not ideal. The existing joint extraction method performs markedly better than the pipeline method, but it does not consider the contextual semantic information of the characters between two entities. For text data with a complex structure (such as parallel entity words, overlapping entities, or overlapping relationships) and a large number of entities and relationships (up to hundreds), its entity relationship extraction is still not ideal, even with the assistance of expert experience.
Disclosure of Invention
This document addresses the following problems in the prior art: the influence of the characters between entities on the entity relationship is not considered, text semantic information is not fully recognized, recognition accuracy is poor, and the methods are unsuitable for scenarios with complex entity relationships (such as parallel entity words, overlapping entity words, or overlapping relationships) and many entity relationships.
In order to solve the above technical problem, a first aspect of the present disclosure provides an entity relationship joint extraction method, including:
acquiring text data to be predicted;
extracting the text data to be predicted by using a pre-established entity relationship joint extraction model, predicting to obtain the type of a word case interval and the relationship type of an entity phrase, wherein the type of the word case interval comprises an entity type and a non-entity type, an entity word is the word case interval of the entity type, and the relationship type of the entity phrase comprises a relationship and a non-relationship;
the entity relation joint extraction model is used for preprocessing text data to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors; predicting the type of the word case interval according to the information obtained by preprocessing; and predicting to obtain the relation type of the entity phrase according to the entity phrase and the character vector between the entity words in the entity phrase.
In a further embodiment herein, the entity relationship joint extraction method further includes:
and filtering the relationship type of the entity phrase obtained by prediction according to the allowable relationship constraint dictionary in the field to which the text data to be predicted belongs.
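The filtering step can be illustrated with a minimal sketch. It assumes the constraint dictionary maps each relationship type to the entity-type pairs allowed in the domain; the source does not specify the dictionary's layout, and the names `ALLOWED`, `filter_relations` and the sample types are hypothetical.

```python
# Hypothetical constraint dictionary (an assumption, not from the source):
# relationship type -> set of allowed (first entity type, second entity type) pairs.
ALLOWED = {
    "has_finding": {("organ", "finding")},
    "has_attribute": {("organ", "attribute")},
}


def filter_relations(predictions, allowed=ALLOWED):
    """Keep only predicted relationships whose entity-type pair is permitted.

    Each prediction is a tuple (first_type, second_type, relation_type).
    """
    kept = []
    for first_type, second_type, relation in predictions:
        if (first_type, second_type) in allowed.get(relation, set()):
            kept.append((first_type, second_type, relation))
    return kept


predictions = [
    ("organ", "finding", "has_finding"),    # allowed by the dictionary
    ("finding", "organ", "has_finding"),    # reversed type pair, dropped
    ("organ", "attribute", "has_finding"),  # pair not allowed for this relation, dropped
]
filtered = filter_relations(predictions)
```

Only the first prediction survives, because the other two pair entity types that the dictionary does not permit for that relationship type.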
In a further embodiment herein, the entity relationship joint extraction model comprises a preprocessing module and a classification module, wherein the classification module comprises an embedding layer, a first classifier, a transition layer and a second classifier;
the preprocessing module is used for preprocessing the text data to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors;
the embedding layer is connected with the preprocessing module and is used for constructing a first vector according to the information obtained by preprocessing;
the first classifier is connected with the embedding layer and predicts the type of the word case interval according to the first vector;
the transition layer is connected with the first classifier and the second classifier and used for screening out word case intervals of entity types to obtain entity words; splicing an entity phrase formed by every two entity words and character vectors between the entity words in the entity phrase into a second vector;
and the second classifier is used for predicting the relationship type of the entity phrase according to the second vector.
In a further embodiment of this document, the preprocessing module processes the text data to obtain a word case interval, a word case interval vector, a word case interval length vector, and a text vector, and includes:
performing word segmentation or character segmentation on the text data to obtain a word case list;
processing the word case list by using a BERT pre-training model to obtain a text vector and the word case vectors corresponding to all word cases;
acquiring word case intervals according to the word case list and a preset sliding window;
applying a fusion function to the word case vectors contained in a word case interval to obtain a word case interval vector;
and acquiring a word case interval length vector according to the length of the word case interval.
In a further embodiment herein, constructing a first vector based on the preprocessed information comprises:
and splicing the word case interval vector, or the word case interval vector and the text vector, or the word case interval vector and the word case interval length vector, or the word case interval vector, the word case interval length vector and the text vector into a first vector.
In further embodiments herein, the first classifier comprises: a first classification function unit and a first judgment unit; the first classification function unit is used for outputting a probability vector of the word case interval type, and the first judgment unit is used for determining the type of the word case interval according to the probability vector of the word case interval type;
the second classifier includes: a second classification function unit and a second judgment unit; the second classification function unit is used for outputting the probability vector of the relationship type of the entity phrase, and the second judgment unit is used for determining the relationship type of the entity phrase according to the probability vector of the relationship type of the entity phrase.
In a further embodiment herein, the entity-relationship joint extraction model is trained by:
preprocessing the training text data by utilizing the preprocessing module to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors;
acquiring entity types of the word case intervals obtained by labeling and the association relation of entity phrases;
constructing a first vector according to the information obtained by preprocessing;
inputting the first vector into the first classifier, and predicting the type of a word case interval;
screening out word case intervals of entity types to obtain entity words, and splicing the entity phrase formed by every two entity words and the character vector formed by the characters between those entity words into a second vector;
inputting the second vector into the second classifier, and predicting to obtain a relation type of the entity phrase;
and training parameters in the entity relation joint extraction model according to the entity type of the word case interval and the relation type of the entity phrase obtained by prediction, and the entity type of the word case interval and the relation type of the entity phrase obtained by labeling.
In a further embodiment herein, before constructing the first vector according to the preprocessed information, the method further includes:
comparing each word case interval with the entity words labeled in advance; if a word case interval is the same as one of the entity words labeled in advance, the word case interval is an entity positive sample, otherwise, the word case interval is an entity negative sample;
and sampling the entity negative samples according to a first preset value.
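As a rough sketch of the labeling and sampling described above: word case intervals that match a pre-labeled entity word become entity positive samples, and the rest are entity negative samples that are subsampled. The sketch assumes the first preset value is an upper bound on the number of negative samples kept, which the source does not state; `label_and_sample` and the (start, end) interval encoding are hypothetical.

```python
import random


def label_and_sample(intervals, labeled_entities, max_negatives, seed=0):
    """Split intervals into positives and negatives, then subsample the negatives.

    max_negatives stands in for the "first preset value" (an assumption).
    """
    positives = [iv for iv in intervals if iv in labeled_entities]
    negatives = [iv for iv in intervals if iv not in labeled_entities]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sampled = rng.sample(negatives, min(max_negatives, len(negatives)))
    return positives, sampled


# Intervals are (start, end) token offsets with an exclusive end.
intervals = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
labeled_entities = {(0, 2)}
positives, sampled_negatives = label_and_sample(intervals, labeled_entities, max_negatives=2)
```

Capping the negatives keeps the non-entity class from overwhelming the entity classes during training, since most intervals in a text are not entities.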
In a further embodiment herein, before determining the second vector, the method further comprises:
judging whether each entity phrase accords with the entity relationship labeled in advance; if an entity phrase accords with the entity relationship labeled in advance, the entity phrase is a relationship positive sample, otherwise, the entity phrase is a relationship negative sample;
and sampling the relation negative sample according to a second preset value.
The second aspect of the present disclosure also provides an entity relationship joint extraction apparatus, including:
the receiving module is used for acquiring text data to be predicted;
the extraction module is used for extracting the text data to be predicted by utilizing a pre-established entity relationship joint extraction model, predicting to obtain the type of a word case interval and the relationship type of an entity phrase, wherein the type of the word case interval comprises an entity type and a non-entity type, an entity word is the word case interval of the entity type, and the relationship type of the entity phrase comprises a relationship and a non-relationship;
the entity relation joint extraction model is used for preprocessing text data to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors; predicting the type of the word case interval according to the information obtained by preprocessing; and predicting to obtain the relation type of the entity phrase according to the entity phrase and the character vector between the entity words in the entity phrase.
According to the entity relationship joint extraction method and device, the recognition of the non-entity data and the non-relationship data is added, so that the entity relationship joint extraction model extracts the data of the non-entity type and the non-relationship type, and the recognition accuracy of different data types can be improved. When the entity phrase relationship type is identified, text semantic information is enriched by considering the character vectors between entity words, all entity phrase relationship types of complex text data can be accurately extracted, and further structuralization is carried out according to the obtained entity words and the obtained entity phrase relationship types, so that useful information is extracted.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments or technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating a structure of an entity-relationship joint extraction model according to an embodiment of the present disclosure;
FIG. 2 is a first flowchart illustrating a process of training a joint extraction model of entity relationships according to an embodiment herein;
FIG. 3 is a second flowchart illustrating the training process of the entity-relationship joint extraction model according to the embodiment herein;
FIG. 4 is a third flowchart of the entity-relationship joint extraction model training process according to the embodiment of the present disclosure;
FIG. 5 is a first flowchart of an entity relationship joint extraction method according to an embodiment of the present disclosure;
FIG. 6 is a second flowchart of an entity relationship joint extraction method according to an embodiment herein;
FIG. 7 is a first block diagram of an entity relationship joint extraction apparatus according to an embodiment herein;
FIG. 8 is a second block diagram of an entity relationship joint extraction apparatus according to an embodiment of the present disclosure;
FIG. 9 is a block diagram illustrating a computer device according to an embodiment herein;
FIG. 10 is a diagram illustrating a labeling result of a word example interval according to an embodiment of the present disclosure.
Description of the symbols of the drawings:
110. a preprocessing module;
120. a classification module;
121. an embedding layer;
122. a first classifier;
123. a transition layer;
124. a second classifier;
710. a receiving module;
720. an extraction module;
730. a filtering module;
902. a computer device;
904. a processor;
906. a memory;
908. a drive mechanism;
910. an input/output module;
912. an input device;
914. an output device;
916. a presentation device;
918. a graphical user interface;
920. a network interface;
922. a communication link;
924. a communication bus.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments herein without making any creative effort, shall fall within the scope of protection.
The present specification provides method steps as described in the examples or flowcharts, but may include more or fewer steps based on routine or non-inventive labor. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual system or apparatus product executes, it can execute sequentially or in parallel according to the method shown in the embodiment or the figures.
The entity relationship joint extraction method provided herein is suitable for recognizing health and medical text data (including but not limited to health examination reports, electronic medical records and the like), and can also be applied to text data in other professional fields (such as legal documents); the field of application is not specifically limited herein.
The term "entity" as used herein refers to a word or phrase having a descriptive meaning, typically the name of a person, place, organization, or product, or a term with a specific meaning in some domain, such as the name of a disease, drug, or organism, or a specialized legal term.
The term "relationship of entities" as used herein refers to the relationship of different entities to each other. Entities are not independent of one another and are often associated in some way.
Although the entity relationship joint extraction methods in the prior art can identify entity words and entity phrase relationship types to a certain extent, they cannot identify non-entity words or entity phrases of the non-relationship type, do not consider the influence of the characters between entities on entity relationships, do not fully recognize text semantic information, have poor recognition accuracy, and are unsuitable for scenarios with complex entity relationships (such as parallel entity words, overlapping entity words, or overlapping relationships).
In order to solve the technical problems, a novel entity relationship joint extraction model is established in advance, and is used for preprocessing text data to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors; predicting the type of the word case interval according to the information obtained by preprocessing; and predicting to obtain the relation type of the entity phrase according to the entity phrase and the character vector between the entity words in the entity phrase. The type of the word case interval comprises an entity type and a non-entity type, the entity word is the word case interval of the entity type, the word case interval of the non-entity type is a non-entity word, and the relationship type of the entity phrase comprises a relationship type and a non-relationship type.
Specifically, as shown in fig. 1, the entity-relationship joint extraction model includes: a preprocessing module 110 and a classification module 120, wherein the classification module 120 includes: an embedding layer 121, a first classifier 122, a transition layer 123, and a second classifier 124.
The preprocessing module 110 is configured to preprocess the text data to obtain a word case interval, a word case interval vector, a word case interval length vector, and a text vector;
the embedding layer 121 is connected to the preprocessing module 110, and is configured to construct a first vector according to the information obtained by preprocessing;
the first classifier 122 is connected with the embedding layer 121, and predicts the type of the word case interval according to the first vector;
the transition layer 123 is connected to the first classifier 122 and the second classifier 124, and is configured to screen out word case intervals of the entity type to obtain entity words; splicing an entity phrase formed by every two entity words and character vectors among the entity words in the entity phrase into a second vector;
the second classifier 124 is configured to predict a relationship type of the entity phrase according to the second vector, where the relationship type of the entity phrase includes a relationship and a non-relationship.
The entity relationship extraction model provided herein learns the non-entity and non-relationship types as well, which can improve recognition accuracy. In addition, when the model identifies entity phrase relationship types, it takes the character vectors between entity words into account, which enriches the text semantic information so that all entity phrase relationship types in complex text data can be accurately extracted.
In detail, the word case intervals described herein are determined as follows: word segmentation or character segmentation is performed on the text data to obtain a word case (token) list, and word case intervals are then acquired according to the word case list and a preset sliding window. The word case list is the list of words (or characters) corresponding to the text data; for example, for the text data "liver size is normal", the word case list consists of the individual words (or characters) of that sentence in order. The word case intervals cover the word case list exhaustively and comprise entity positive samples (intervals that match a pre-labeled entity word, namely word case intervals of an entity type) and entity negative samples (intervals other than the pre-labeled entity words, namely word case intervals of the non-entity type). The preset sliding window can be determined according to the maximum number of words (or characters) in an entity word in the text, and its specific value is not limited herein. Assuming the preset sliding window size is 3, the word case intervals corresponding to the word case list of "liver size is normal" include, for example, "liver", "liver big", "viscera big", "small and normal", and "normal".
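A minimal sketch of the sliding-window step, assuming the window size is the maximum number of word cases in one interval and that every shorter contiguous run is also kept. The function name `enumerate_intervals` and the placeholder tokens are hypothetical.

```python
def enumerate_intervals(tokens, max_window):
    """Return every contiguous (start, end) interval of 1..max_window tokens.

    The end offset is exclusive, so tokens[start:end] is the interval text.
    """
    intervals = []
    for start in range(len(tokens)):
        for width in range(1, max_window + 1):
            end = start + width
            if end > len(tokens):
                break  # the window would run past the end of the list
            intervals.append((start, end))
    return intervals


tokens = ["t1", "t2", "t3", "t4", "t5", "t6"]  # placeholder word cases
intervals = enumerate_intervals(tokens, max_window=3)
```

For six word cases and a window of 3 this yields 6 + 5 + 4 = 15 intervals, which is why the later negative sampling step matters: most of them are not entities.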
The word case interval vector described herein is a vector built from all the words or characters contained in the word case interval. For example, for the word case interval "liver large", word segmentation yields the word cases { "liver", "large" } and their corresponding word case vectors, whereas character segmentation yields the word cases { "liver", "dirty", "large" } (a character-by-character rendering of the original text) and their corresponding word case vectors. In specific implementation, a fusion function (such as max pooling) is applied to the word case vectors contained in the word case interval to obtain the word case interval vector. The word case interval length vector described herein mainly represents the text length information of a word case interval. In specific implementation, the word case interval length vector is obtained according to the length of the word case interval; it can be a fixed-length vector whose parameters are randomly initialized, and each parameter in the vector can be learned through model training.
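The fusion and length-vector ideas can be sketched in plain Python. Max pooling is the fusion function named in the text; the lookup table of randomly initialized length vectors is one assumed way to realize the length vector, and in a real model its entries would be trainable parameters. The names `max_pool` and `make_length_table` are hypothetical.

```python
import random


def max_pool(token_vectors):
    """Element-wise maximum over the word case vectors of one interval."""
    return [max(column) for column in zip(*token_vectors)]


def make_length_table(max_length, dim, seed=0):
    """One randomly initialized, fixed-size vector per interval length (assumed layout)."""
    rng = random.Random(seed)
    return {
        length: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for length in range(1, max_length + 1)
    }


token_vectors = [[0.2, -1.0, 0.5], [0.7, 0.1, -0.3]]  # toy vectors for a two-token interval
interval_vector = max_pool(token_vectors)
length_table = make_length_table(max_length=3, dim=4)
length_vector = length_table[len(token_vectors)]
```

Max pooling gives a vector of the same size regardless of how many word cases the interval contains, which is what lets intervals of different lengths share one classifier.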
The text vector described herein is the vector representation of the text data, and may be obtained by processing the word case list with a pre-training model such as, but not limited to, BERT (Bidirectional Encoder Representations from Transformers). The implementation of pre-training models such as BERT can be found in the prior art and is not described in detail herein.
The entity phrase described herein refers to a phrase consisting of any two of the entity words. In specific implementation, the second vector is formed in the order of the first entity word, the character vector between the first entity word and the second entity word, and the second entity word.
The number of first vectors constructed by the embedding layer 121 is the same as the number of word case intervals; for example, 100 word case intervals correspond to 100 first vectors.
If the processing capacity of the first classifier 122 is large, the first vectors constructed by the embedding layer 121 may be input into the first classifier 122 all at once; if its processing capacity is small, the first vectors may be input into the first classifier 122 in batches.
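Batched input can be sketched as a simple chunking generator. The batch size of 32 and the helper name `batches` are illustrative assumptions; the source gives no specific value.

```python
def batches(vectors, batch_size):
    """Yield consecutive slices of at most batch_size vectors."""
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]


first_vectors = [[float(i)] for i in range(100)]  # 100 toy first vectors
chunks = list(batches(first_vectors, batch_size=32))
batch_sizes = [len(chunk) for chunk in chunks]
```

Setting the batch size to the total number of vectors reproduces the single-pass case.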
The number of second vectors, and of entity phrases, constructed by the transition layer 123 is n(n-1), where n is the number of entity words.
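As an illustration of how entity phrases grow with the number of entity words, the sketch below assumes that entity phrases are ordered pairs, so that n entity words give n(n-1) phrases. The ordering is an assumption made for this example, and the entity words are placeholders.

```python
from itertools import permutations

entity_words = ["A", "B", "C"]  # placeholder entity words, n = 3

# Ordered pairs: ("A", "B") and ("B", "A") are treated as different entity phrases.
entity_phrases = list(permutations(entity_words, 2))
n = len(entity_words)
```

With hundreds of entity words the number of candidate phrases reaches the tens of thousands, which motivates both batched input to the second classifier and negative sampling during training.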
If the processing capacity of the second classifier 124 is large, the second vectors constructed by the transition layer 123 may be input into the second classifier 124 all at once; if its processing capacity is small, the second vectors may be input into the second classifier 124 in batches.
In specific implementation, for different application scenes, an entity relationship joint extraction model can be established in advance according to the training text data in the corresponding application scene, and then the established entity relationship joint extraction model is utilized to predict the entity relationship.
In an embodiment herein, constructing the first vector according to the preprocessed information may be performed in one of the following manners:
(1) splicing the word case interval vectors into a first vector;
(2) splicing the word case interval vector and the text vector into a first vector;
(3) splicing the word case interval vector and the word case interval length vector into a first vector, wherein the word case interval length vector is output by the preprocessing module and is used for limiting the length of an entity word (for example, to 8 characters, 10 characters and the like); the word case interval length vector can be a vector with a fixed length, and the parameters in the vector can be learned through model training;
(4) splicing the word case interval vector, the word case interval length vector and the text vector into a first vector.
Preferably, in order to enrich text semantic information and improve the recognition accuracy of the word case interval type, the first vector is determined by adopting the (4) th mode.
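The four splicing modes above can be sketched as a single concatenation helper. This is a minimal illustration, not the implementation of this document: the function name `build_first_vector`, the plain Python lists used as vectors, and the toy dimensions are all assumptions.

```python
from typing import List, Optional

Vector = List[float]


def build_first_vector(span_vec: Vector,
                       length_vec: Optional[Vector] = None,
                       text_vec: Optional[Vector] = None) -> Vector:
    """Concatenate ("splice") the available vectors into one first vector.

    Mode (1): span only; mode (2): span + text;
    mode (3): span + length; mode (4): span + length + text.
    """
    first = list(span_vec)          # copy so the input is not modified
    if length_vec is not None:
        first += length_vec
    if text_vec is not None:
        first += text_vec
    return first


# Toy vectors (hypothetical values and sizes).
span_vec = [0.1, 0.2]
length_vec = [0.5]
text_vec = [0.9, 0.8, 0.7]

mode_1 = build_first_vector(span_vec)
mode_4 = build_first_vector(span_vec, length_vec, text_vec)
print(len(mode_1), len(mode_4))
```

Mode (4) yields the longest vector, which matches the stated preference for carrying as much semantic information as possible into the first classifier.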
In an embodiment of this document, the step of splicing an entity phrase composed of every two entity words and the text vectors between the entity words in the entity phrase into a second vector further includes:
the second vector is formed by splicing, in the order in which the two entity words appear, the first entity word vector (namely its word case interval vector) and the first entity word length vector (namely its word case interval length vector), the character vector consisting of the characters between the two entity words, and the second entity word vector (namely its word case interval vector) and the second entity word length vector (namely its word case interval length vector).
In practical implementation, if the context between the first entity word and the second entity word is empty, the text vector between the first entity word and the second entity word is represented by a zero vector.
In specific implementation, in order to further enrich text semantic information, a text vector can be added to the second vector.
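The construction of the second vector, including the zero-vector fallback for an empty context, can be sketched as follows. This is a hedged illustration: the name `build_second_vector`, the assumption that the characters between the two entity words have already been reduced to one fixed-size vector, and the parameter `between_dim` are not taken from this document.

```python
from typing import List, Optional

Vector = List[float]


def build_second_vector(head_vec: Vector, head_len_vec: Vector,
                        tail_vec: Vector, tail_len_vec: Vector,
                        between_vec: Optional[Vector] = None,
                        between_dim: int = 2,
                        text_vec: Optional[Vector] = None) -> Vector:
    """Splice the vectors of an entity phrase into one second vector."""
    if between_vec is None:
        # No characters between the two entity words: use a zero vector.
        between_vec = [0.0] * between_dim
    second = head_vec + head_len_vec + between_vec + tail_vec + tail_len_vec
    if text_vec is not None:
        # Optionally append the text vector for extra semantic information.
        second = second + text_vec
    return second


# Two adjacent entity words, so the context between them is empty.
adjacent = build_second_vector([1.0], [2.0], [3.0], [4.0], between_dim=2)
print(adjacent)
```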
In one embodiment, to distinguish the entity types from the non-entity type of word case intervals, the first classifier 122 includes a first classification function unit and a first judgment unit. The first classification function unit is configured to output a probability vector of the word case interval type and may adopt a softmax function. The word case interval type includes entity types and a non-entity type. For example, if there are 5 entity types plus 1 non-entity type, the softmax function unit outputs a probability vector of 6 elements, for example {entity type 1 probability, entity type 2 probability, entity type 3 probability, entity type 4 probability, entity type 5 probability, non-entity type probability}, where the sum of all probabilities in the probability vector is 1.
The first judgment unit is configured to determine the type of the word case interval according to the probability vector of the word case interval type. In specific implementation, the type corresponding to the maximum probability value in the probability vector output by the softmax function unit may be taken as the type of the word case interval. For example, if the probability vector output by the softmax function unit is {entity type 1 probability 0.5, entity type 2 probability 0.25, entity type 3 probability 0.1, entity type 4 probability 0.1, entity type 5 probability 0.05, non-entity type probability 0}, the type of the word case interval determined by the first judgment unit is entity type 1.
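The two units of the first classifier can be sketched as a softmax followed by an arg-max decision. This is a minimal sketch; the names `softmax`, `decide_type`, and `LABELS` are illustrative assumptions, and the probability vector is the one from the example above.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                          # subtract max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide_type(probs: List[float], labels: List[str]) -> str:
    """Return the label whose probability is the largest."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


LABELS = ["entity type 1", "entity type 2", "entity type 3",
          "entity type 4", "entity type 5", "non-entity"]

example_probs = [0.5, 0.25, 0.1, 0.1, 0.05, 0.0]
decided = decide_type(example_probs, LABELS)
print(decided)  # entity type 1
```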
In one embodiment, to identify the relationship type between entity words, the second classifier 124 includes a second classification function unit and a second judgment unit. The second classification function unit is used for outputting the probability vector of the relationship types of an entity phrase. A plurality of relationships may exist between the same two entity words; for example, in "Zhou Jielun composed and sang 'Qilixiang'", 2 relationships exist: (Qilixiang, singer, Zhou Jielun) and (Qilixiang, composer, Zhou Jielun). To handle this case, a sigmoid function may be employed, for example. The relationship type of the entity phrase comprises relationship types and a non-relationship type; the relationship type probabilities and the non-relationship type probability are mutually independent, and their sum is not necessarily 1. The second judgment unit is used for determining the relationship type of the entity phrase according to the probability vector of the relationship type of the entity phrase. In specific implementation, the relationship type of the entity phrase can be determined according to a preset probability upper limit.
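Because each relationship type gets its own independent probability, one entity phrase can receive several relationship types at once. The sketch below illustrates this with a per-label sigmoid and a cut-off value. The names `sigmoid` and `decide_relations`, the label set, the logits, and the reading of the preset probability limit as a threshold of 0.5 are all assumptions made for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decide_relations(logits: List[float], labels: List[str],
                     threshold: float = 0.5) -> List[str]:
    """Keep every label whose independent probability reaches the threshold."""
    probs = [sigmoid(x) for x in logits]
    return [label for label, p in zip(labels, probs) if p >= threshold]


# Hypothetical scores for one entity phrase (song, person).
labels = ["singer", "composer", "lyricist"]
logits = [2.0, 1.5, -3.0]
kept = decide_relations(logits, labels)
print(kept)  # ['singer', 'composer']
```

In this simplified sketch, an entity phrase for which no label reaches the threshold would be treated as having no relationship.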
In an embodiment of the present disclosure, as shown in fig. 2, the entity-relationship joint extraction model shown in fig. 1 is trained as follows:
Specifically, the training text data described herein is historically generated data. Taking the analysis of health medical text data as an example, the training text data is historically generated health medical text data, which may be acquired from a hospital, a physical examination institution or a patient; the manner of acquiring the training text data is not limited herein. In specific implementation, the training text data can be divided into a training set, a verification set and a test set according to a certain proportion. The training set is used for training the entity-relationship joint extraction model, the verification set is used for evaluating and adjusting parameters in the entity-relationship joint extraction model, and the test set is used for testing the generalization capability of the entity-relationship joint extraction model.
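The division into training, verification and test sets can be sketched as a shuffled split. The proportion is not specified in this document; the 8:1:1 ratio, the function name `split_dataset`, and the fixed seed below are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(samples: Sequence[str],
                  ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                  seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle the samples and cut them into train / validation / test."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # reproducible shuffle
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # remainder goes to the test set
    return train, val, test


texts = [f"report {i}" for i in range(10)]   # placeholder documents
train, val, test = split_dataset(texts)
print(len(train), len(val), len(test))  # 8 1 1
```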
The step 210 includes: performing character segmentation/word segmentation processing on the training text data to obtain a word case list; processing the word case list by using a BERT pre-training model to obtain a text vector and the word case vectors corresponding to all word cases; acquiring word case intervals according to the word case list and a preset sliding window; and passing the word case vectors contained in each word case interval through a fusion function to obtain the word case interval vector.
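The fusion step can be illustrated with element-wise max pooling over the word case vectors of one interval. This document does not say which fusion function is used, so max pooling, the name `fuse_max`, and the toy vectors (standing in for BERT outputs) are assumptions.

```python
from typing import List

Vector = List[float]


def fuse_max(token_vectors: List[Vector]) -> Vector:
    """Fuse the vectors of one interval by element-wise maximum."""
    if not token_vectors:
        raise ValueError("an interval must contain at least one word case")
    # zip(*...) groups the i-th component of every vector together.
    return [max(column) for column in zip(*token_vectors)]


# Two word cases in one interval, each with a 3-dimensional toy vector.
interval_vectors = [[0.1, 0.9, -0.3],
                    [0.4, 0.2, 0.0]]
fused = fuse_max(interval_vectors)
print(fused)  # [0.4, 0.9, 0.0]
```

Whatever fusion function is chosen, the point is that intervals of different lengths are mapped to vectors of the same size.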
The entity type of the word-case interval marked in the step 220 and the association relationship of the entity phrase may be manually implemented, and in specific implementation, a marking person may mark the word-case interval generated in the step 210 or directly analyze and mark the training text data.
In step 230, the first vector may be constructed in one of the following ways: the word case interval vectors are spliced into a first vector, the word case interval vectors and the text vectors are spliced into a first vector, the word case interval vectors and the word case interval length vectors are spliced into a first vector, and the word case interval vectors, the word case interval length vectors and the text vectors are spliced into a first vector. The word case interval vector and the word case interval length vector contain parameters to be adjusted in the training stage and are not fixed values.
The parameters in the first classifier used in step 240 are parameters to be adjusted in the training stage, and are not fixed values.
When the above step 260 is implemented, the text vector and the word case interval length vector corresponding to each entity word in the entity word group are also added to the second vector.
The parameters in the second classifier used in step 270 are the parameters to be adjusted in the training phase.
The step 280 may be implemented as follows: establishing a loss function L according to the predicted entity types of the word case intervals and relationship types of the entity phrases, and the labeled entity types of the word case intervals and relationship types of the entity phrases. The loss function L comprises two parts: a first classifier loss L1 (cross entropy is used here) and a second classifier loss L2 (binary cross entropy is used here). The smaller the loss function, the higher the accuracy of the model and the better the model extracts entity relationship combinations from the text.
In the definition of L, y1 is the labeled entity type of the word case interval, ŷ1 is the predicted entity type of the word case interval, y2 is the labeled relationship type of the entity phrase, ŷ2 is the predicted relationship type of the entity phrase, and λ is a parameter.
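The formula for L itself is not reproduced in this text. One common way to combine two such terms, given here only as an assumption and not as the formula of this document, is a weighted sum in which λ scales the second classifier loss. The index sets C (word case interval types) and R (relationship types) are likewise illustrative.

```latex
\begin{aligned}
L   &= L_1(y_1, \hat{y}_1) + \lambda \, L_2(y_2, \hat{y}_2), \\
L_1 &= -\sum_{c \in C} y_{1,c} \log \hat{y}_{1,c}, \\
L_2 &= -\sum_{r \in R} \Bigl[ y_{2,r} \log \hat{y}_{2,r}
        + (1 - y_{2,r}) \log\bigl(1 - \hat{y}_{2,r}\bigr) \Bigr].
\end{aligned}
```

Under this reading, L_1 is the cross entropy over the softmax output of the first classifier and L_2 is the binary cross entropy summed over the independent sigmoid outputs of the second classifier.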
In specific implementation, the loss function is used to judge whether the predicted entity types of the word case intervals and relationship types of the entity phrases are sufficiently close to the labeled entity types and relationship types. If not, the parameters of the preprocessing module, the first classifier and the second classifier are adjusted, and a different adjustment step size can be set for each of the three through its own step size parameter.
In an embodiment of this document, as shown in fig. 3, before the step 230 constructs the first vector according to the preprocessed information, the method further includes:
For example, the training text data is "肝脏大小正常" ("liver size is normal"), and the pre-labeled entity words are "肝脏" (liver), "大小" (size), and "正常" (normal). When the sliding window size is 3 characters, the word case intervals are "肝", "肝脏", "肝脏大", "脏", "脏大", "脏大小", "大", "大小", "大小正", "小", "小正", "小正常", "正", "正常", and "常". By comparing the word case intervals with the pre-labeled entity words, it can be determined that the entity positive examples are "肝脏", "大小", and "正常", and that the entity negative examples are "肝", "肝脏大", "脏", "脏大", "脏大小", "大", "大小正", "小", "小正", "小正常", "正", and "常".
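The sliding-window enumeration and the comparison with the labeled entity words can be sketched as below, assuming the example text is the six-character Chinese string "肝脏大小正常" ("liver size is normal") segmented into single characters. The function names are illustrative, and intervals are compared by text only for brevity; a real implementation would more likely compare start and end positions.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], window: int) -> List[str]:
    """List every contiguous interval of at most `window` word cases."""
    spans = []
    for start in range(len(tokens)):
        for length in range(1, window + 1):
            if start + length > len(tokens):
                break                      # interval would run past the end
            spans.append("".join(tokens[start:start + length]))
    return spans


def split_examples(spans: List[str],
                   gold_entities: List[str]) -> Tuple[List[str], List[str]]:
    """Intervals equal to a labeled entity word are positive, others negative."""
    gold = set(gold_entities)
    positives = [s for s in spans if s in gold]
    negatives = [s for s in spans if s not in gold]
    return positives, negatives


tokens = list("肝脏大小正常")
spans = enumerate_spans(tokens, window=3)
positives, negatives = split_examples(spans, ["肝脏", "大小", "正常"])
print(len(spans), positives, len(negatives))
```

With a window of 3 this gives 15 intervals, of which 3 are positive and 12 are negative, which shows how quickly negatives outnumber positives.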
As can be seen from the above step, there are many entity negative examples. To reduce the calculation amount of the model, step 222 is therefore added: by randomly sampling the entity negative examples, no more than a first preset value of entity negative examples are obtained, and the entity type corresponding to these entity negative examples is the non-entity type.
It should be noted here that the sampling process for the entity negative examples only exists in the model training process, and all the word example intervals need to be reserved in the model prediction process to determine all possible entity words.
In this embodiment, considering that the number of entity negative examples is large, randomly sampling the entity negative examples improves the calculation speed of the entity relationship extraction model.
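Capping the number of entity negative examples during training can be sketched with a simple random draw. The name `sample_negatives`, the `training` flag, and the cap of 5 are assumptions; the flag reflects the note above that sampling applies only in training and that all intervals are kept at prediction time.

```python
import random
from typing import List, Optional


def sample_negatives(negatives: List[str], max_count: int,
                     training: bool = True,
                     seed: Optional[int] = None) -> List[str]:
    """Keep at most `max_count` negatives in training; keep all otherwise."""
    if not training or len(negatives) <= max_count:
        return list(negatives)
    return random.Random(seed).sample(negatives, max_count)


# Twelve hypothetical negative intervals, capped at a first preset value of 5.
negatives = [f"neg_{i}" for i in range(12)]
kept_train = sample_negatives(negatives, max_count=5, seed=42)
kept_predict = sample_negatives(negatives, max_count=5, training=False)
print(len(kept_train), len(kept_predict))  # 5 12
```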
In a further embodiment herein, as shown in fig. 4, before the step 260 of determining the second vector, the method further includes:
For example, the training text data is "liver size is normal", and the pre-labeled entity words are "liver" (type: part), "size" (type: attribute), and "normal" (type: non-numeric result). The pre-labeled relationships of the entity phrases are (size, modified, liver) and (normal, modified, size), and the entity phrases include (liver, size), (size, liver), (normal, size), (liver, normal), (size, normal), and (normal, liver). The relationship positive examples are therefore (size, modified, liver) and (normal, modified, size), and the relationship negative examples are (liver, irrelevant, size), (liver, irrelevant, normal), (size, irrelevant, normal), and (normal, irrelevant, liver).
The second preset value may be the same as or different from the first preset value, and may be determined according to the computing power of the computing device in the specific implementation, which is not specifically limited herein.
In the embodiment, the relationship negative examples not exceeding the second preset value are obtained by randomly sampling the relationship negative examples, and the relationship types corresponding to the relationship negative examples are defined as "non-relationship" types, so that the calculation amount of the entity-relationship joint extraction model can be reduced.
It should be noted here that the sampling process for the relationship negative examples only exists in the model training process, and all entity phrases need to be retained in the model prediction process to determine all entity phrases between which a relationship may exist.
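The pairing of entity words and the split into relationship positive and negative examples can be sketched with the "liver size is normal" example. The helper name `label_pairs`, the dictionary holding one labeled relationship per ordered pair, and the cap of 2 used as the second preset value are simplifying assumptions.

```python
import random
from itertools import permutations
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]   # (first entity word, relationship, second)


def label_pairs(entity_words: List[str],
                gold: Dict[Tuple[str, str], str]
                ) -> Tuple[List[Triple], List[Triple]]:
    """Build all ordered pairs and split them into positives and negatives."""
    positives, negatives = [], []
    for head, tail in permutations(entity_words, 2):   # n * (n - 1) pairs
        if (head, tail) in gold:
            positives.append((head, gold[(head, tail)], tail))
        else:
            negatives.append((head, "irrelevant", tail))
    return positives, negatives


entity_words = ["liver", "size", "normal"]
gold = {("size", "liver"): "modified", ("normal", "size"): "modified"}
positives, negatives = label_pairs(entity_words, gold)

# Training only: keep at most a second preset value of negatives.
second_preset_value = 2
sampled = random.Random(0).sample(negatives, second_preset_value)
print(len(positives), len(negatives), len(sampled))  # 2 4 2
```

Three entity words give 3 × 2 = 6 ordered pairs, of which 2 are positive and 4 are negative in this example.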
After the entity relationship joint extraction model is established and obtained through the foregoing embodiment, the entity relationship joint extraction model can be used for identifying an entity relationship, and specifically, as shown in fig. 5, the entity relationship joint extraction method includes:
step 510, acquiring text data to be predicted;
step 520, extracting the text data to be predicted by using a pre-established entity relationship joint extraction model, and predicting to obtain the type of each word case interval and the relationship type of each entity phrase. The types of the word case intervals comprise entity types and non-entity types, the entity words are the word case intervals of the entity types, and the relationship types of the entity phrases comprise relationships and non-relationships.
The entity relation joint extraction model is used for preprocessing text data to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors; predicting the type of the word case interval according to the information obtained by preprocessing; and predicting to obtain the relation type of the entity phrase according to the entity phrase and the character vector between the entity words in the entity phrase.
In detail, the text data to be predicted is the text data generated in the entity relationship joint extraction model adaptation field. The structure, training process, etc. of the entity-relationship joint extraction model are referred to the foregoing embodiments, and will not be described in detail here. The type of the word example interval and the relationship type of the entity phrase can be output in a list form.
For example, the text data to be predicted is: the liver has normal shape and size, even distribution of substantial echoes, clear vascular texture, no echoes in the right lobe of the liver, the size of about 24 x 18mm, and clear boundary. The manual labeling result is shown in fig. 10, and the labeled entity words include: liver, shape, size, normal, essence, echo, distribution, uniformity, blood vessel, texture, clarity, liver, right lobe, sight, no echo, size, 24 x 18mm, boundary, clear, different line block diagrams in the figure correspond to entity words of different entity types, and the relationship exists between the interconnected entity words.
The prediction result of the entity relationship joint extraction model provided by the text is shown as the following output result, wherein entites represents entity words, end represents the end positions of the entity words in the text, id represents the number of the entity words, start represents the starting positions of the entity words in the text, type is the entity word type and the relationship type of the entity word groups, and word is the entity word characters. Relationships represent the type of relationships between entity words, and head and tail represent id numbers corresponding to a first entity word and a second entity word in the entity phrase.
The following output results show that the entity relation joint extraction model prediction result is the same as the manual labeling result, so that the prediction of the type of the word case interval and the relation type of the entity phrase can be accurately realized.
In the embodiment, the identification of the non-entity type data and the non-relationship type data is added, so that the entity-relationship joint extraction model extracts the data of the non-entity type and the non-relationship type, and the accuracy of identification of different data types can be improved. When the entity phrase relationship type is identified, text semantic information is enriched by considering entity words and character vectors between the entity words, all entity phrase relationship types of complex text data can be accurately extracted, structuring is carried out according to the obtained entity words and the obtained entity phrase relationship types, and useful information extraction is completed.
In a further embodiment of this document, as shown in fig. 6, the entity relationship joint extraction method further includes, in addition to the above steps 510 and 520:
step 530, filtering the relationship types of the entity phrases obtained by prediction according to the allowable relationship constraint dictionary of the field to which the text data to be predicted belongs.
In detail, the allowable relation constraint dictionary in the field to which the text data to be predicted belongs is determined by a person skilled in the art, for example, an entity of an attribute type cannot modify an entity of a numerical type, and the like, which is not specifically limited herein.
In the embodiment, the recognition result of the entity relationship joint extraction model is combined with the allowable relationship constraint dictionary of the field to which the text data to be predicted belongs, and the entity phrases which do not conform to the allowable relationship constraint can be filtered out through the allowable relationship constraint dictionary of the field to which the text data to be predicted belongs, so that the entity relationship extraction effect can be further improved.
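The filtering step can be sketched as a lookup of each predicted triple against a set of allowed type combinations. The representation of the allowable relationship constraint dictionary as a set of (first entity type, relationship, second entity type) tuples, the name `filter_relations`, and the sample entries are assumptions; the entity types come from the "liver size is normal" example.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]   # (first entity word, relationship, second)

# Hypothetical constraint dictionary for the medical domain.
ALLOWED: Set[Triple] = {
    ("attribute", "modified", "part"),
    ("non-numeric result", "modified", "attribute"),
}


def filter_relations(predicted: List[Triple],
                     entity_types: Dict[str, str],
                     allowed: Set[Triple]) -> List[Triple]:
    """Drop predicted relationships whose type combination is not allowed."""
    kept = []
    for head, relation, tail in predicted:
        key = (entity_types[head], relation, entity_types[tail])
        if key in allowed:
            kept.append((head, relation, tail))
    return kept


entity_types = {"liver": "part", "size": "attribute",
                "normal": "non-numeric result"}
predicted = [("size", "modified", "liver"),
             ("normal", "modified", "size"),
             ("liver", "modified", "normal")]   # implausible prediction
filtered = filter_relations(predicted, entity_types, ALLOWED)
print(filtered)
```

Only the first two predictions survive, because a part-type entity modifying a non-numeric result is not in the allowed set.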
Based on the same inventive concept, an entity-relationship joint extraction device is also provided, as described in the following embodiments. Because the principle of solving the problem of the entity relationship joint extraction device is similar to that of the entity relationship joint extraction method, the entity relationship joint extraction device can be implemented by referring to the entity relationship joint extraction method, and repeated parts are not described again.
The entity relationship joint extraction device provided in this embodiment includes a plurality of functional modules, which may be implemented by dedicated or general chips, and may also be implemented by software programs, which are not limited herein.
Specifically, as shown in fig. 7, the entity relationship joint extraction device includes:
a receiving module 710, configured to obtain text data to be predicted;
an extraction module 720, configured to extract the text data to be predicted by using a pre-established entity-relationship joint extraction model, and predict to obtain a type of a word-case interval and a relationship type of an entity phrase, where the type of the word-case interval includes an entity type and a non-entity type, an entity word is a word-case interval of the entity type, and the relationship type of the entity phrase includes a relationship and a non-relationship;
the entity relation joint extraction model is used for preprocessing text data to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors; predicting the type of the word case interval according to the information obtained by preprocessing; and predicting to obtain the relation type of the entity phrase according to the entity phrase and the character vector between the entity words in the entity phrase.
Further, as shown in fig. 8, the entity-relationship joint extraction apparatus further includes:
and the filtering module 730 is configured to filter the relationship type of the entity phrase obtained by prediction according to the allowable relationship constraint dictionary in the field to which the text data to be predicted belongs.
Compared with the prior art, the entity relationship joint extraction method and the entity relationship joint extraction device have the advantages that when a plurality of training text data are learned, the relationships between the entities and the entities are subjected to joint classification learning, learning of 'non-entities' and 'non-relationship' categories is added in entity classification and relationship classification, random sampling of non-entity samples and relationship negative samples is carried out in the text, calculation efficiency and model effect are considered, and the problem of text entity relationship extraction with more entities and relationships can be solved well.
And the text vector and the context vector between the entity words are respectively added in the calculation of the first vector and the second vector, so that the text semantic information is enriched, and the entity relation extraction under the complex text structure (such as entity word parallel, entity overlapping, relation overlapping and the like) can be well processed.
And finally, filtering entity relationship combinations which do not conform to the allowable relationship constraint based on the allowable relationship constraint dictionary in the field added after model prediction, thereby further improving the extraction effect of the entity relationship.
In an embodiment herein, the entity relationship joint extraction model training process and the entity relationship prediction process described above may be implemented by a computer device. Specifically, as shown in fig. 9, the computer device 902 may include one or more processors 904, such as one or more Central Processing Units (CPUs), each of which may implement one or more hardware threads. The computer device 902 may further comprise any memory 906 for storing any kind of information, such as code, settings, data, etc. In some embodiments, a computer program is stored in the memory 906, and the computer program, when executed by the processor 904 of the computer device, performs the entity-relationship joint extraction method or the training method of the entity-relationship joint extraction model as described in the previous embodiments. For example, and without limitation, memory 906 may include any one or more of the following in combination: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may use any technology to store information. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent fixed or removable components of computer device 902. In one case, when the processor 904 executes the associated instructions, which are stored in any memory or combination of memories, the computer device 902 can perform any of the operations of the associated instructions. The computer device 902 also includes one or more drive mechanisms 908, such as a hard disk drive mechanism, an optical disk drive mechanism, etc., for interacting with any memory.
Corresponding to the methods in fig. 2-6, the embodiments herein also provide a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, performs the steps of the above-described method.
Embodiments herein also provide computer readable instructions, wherein when executed by a processor, a program thereof causes the processor to perform the method as shown in fig. 2-6.
It should be understood that, in various embodiments herein, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments herein.
It should also be understood that, in the embodiments herein, the term "and/or" is only one kind of association relation describing an associated object, meaning that three kinds of relations may exist. For example, a and/or B, may represent: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided herein, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purposes of the embodiments herein.
In addition, functional units in the embodiments herein may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present invention may be implemented in a form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The principles and embodiments of this document are explained herein using specific examples, which are presented only to aid in understanding the methods and their core concepts; meanwhile, for the general technical personnel in the field, according to the idea of this document, there may be changes in the concrete implementation and the application scope, in summary, this description should not be understood as the limitation of this document.
Claims (10)
1. An entity relationship joint extraction method is characterized by comprising the following steps:
acquiring text data to be predicted;
extracting the text data to be predicted by using a pre-established entity relationship joint extraction model, predicting to obtain the type of a word case interval and the relationship type of an entity phrase, wherein the type of the word case interval comprises an entity type and a non-entity type, an entity word is the word case interval of the entity type, and the relationship type of the entity phrase comprises a relationship and a non-relationship;
the entity relation joint extraction model is used for preprocessing text data to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors; predicting the type of the word case interval according to the information obtained by preprocessing; and predicting to obtain the relation type of the entity phrase according to the entity phrase and the character vector between the entity words in the entity phrase.
2. The entity relationship joint extraction method of claim 1, further comprising:
and filtering the relationship type of the entity phrase obtained by prediction according to the allowable relationship constraint dictionary in the field to which the text data to be predicted belongs.
3. The entity-relationship joint extraction method according to claim 1, wherein the entity-relationship joint extraction model comprises: the device comprises a preprocessing module and a classification module, wherein the classification module comprises an embedding layer, a first classifier, a transition layer and a second classifier;
the preprocessing module is used for preprocessing the text data to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors;
the embedding layer is connected with the preprocessing module and used for constructing a first vector according to information obtained by preprocessing;
the first classifier is connected with the embedding layer and used for predicting the type of the word case interval according to the first vector;
the transition layer is connected with the first classifier and the second classifier and used for screening out word case intervals of entity types to obtain entity words; splicing an entity phrase formed by every two entity words and character vectors between the entity words in the entity phrase into a second vector;
and the second classifier is used for predicting the relationship type of the entity phrase according to the second vector.
4. The entity relationship joint extraction method as claimed in claim 3, wherein the preprocessing module processes the text data to obtain a word case interval, a word case interval vector, a word case interval length vector and a text vector, and comprises:
performing character segmentation/word segmentation processing on the text data to obtain a word case list;
processing the word case list by using a BERT pre-training model to obtain a text vector and word case vectors corresponding to all word cases;
acquiring a word example interval according to the word example list and a preset sliding window;
the word case vectors contained in the word case interval are subjected to a fusion function to obtain a word case interval vector;
and acquiring a word example interval length vector according to the length of the word example interval.
5. The entity-relationship joint extraction method as claimed in claim 3, wherein constructing the first vector according to the preprocessed information comprises:
and splicing the word case interval vector, or the word case interval vector and the text vector, or the word case interval vector and the word case interval length vector, or the word case interval vector, the word case interval length vector and the text vector into a first vector.
6. The entity relationship joint extraction method of claim 3, wherein the first classifier comprises: the first classification function unit is used for outputting a probability vector of a word case interval type, and the first judgment unit is used for determining the type of the word case interval according to the probability vector of the word case interval type;
the second classifier includes: a second classification function unit and a second judgment unit; the second classification function unit is used for outputting the probability vector of the relationship type of the entity phrase, and the second judgment unit is used for determining the relationship type of the entity phrase according to the probability vector of the relationship type of the entity phrase.
7. The entity-relationship joint extraction method according to claim 3, wherein the entity-relationship joint extraction model is trained by:
preprocessing the training text data by utilizing the preprocessing module to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors;
acquiring the labeled entity types of the word case intervals and the labeled relationship types of the entity phrases;
constructing a first vector according to the information obtained by preprocessing;
inputting the first vector into the first classifier, and predicting the type of a word case interval;
screening out the word case intervals of an entity type to obtain entity words, and splicing each entity phrase formed by two entity words, together with the character vectors between the entity words in that entity phrase, into a second vector;
inputting the second vector into the second classifier, and predicting to obtain a relation type of the entity phrase;
and training the parameters of the entity relationship joint extraction model according to the predicted entity types of the word case intervals and the predicted relationship types of the entity phrases, together with the labeled entity types of the word case intervals and the labeled relationship types of the entity phrases.
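The construction of the second vector in claim 7 can be illustrated as below. The sketch makes several assumptions that the claim does not state: entity phrases are treated as ordered pairs, the vectors between two entity words are fused by max pooling, and a zero vector is used when nothing lies between them. The helper names and the toy vectors are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive

def pool_context(vectors: List[List[float]], dim: int) -> List[float]:
    """Max-pool the vectors between two entity words; zeros if there are none."""
    if not vectors:
        return [0.0] * dim
    return [max(column) for column in zip(*vectors)]

def build_second_vectors(
    entity_spans: List[Span],
    span_vectors: Dict[Span, List[float]],
    token_vectors: List[List[float]],
) -> Dict[Tuple[Span, Span], List[float]]:
    """Build one concatenated vector per ordered pair of entity words."""
    dim = len(token_vectors[0])
    result = {}
    for head, tail in permutations(entity_spans, 2):
        left, right = sorted([head, tail])
        context = pool_context(token_vectors[left[1]:right[0]], dim)
        result[(head, tail)] = span_vectors[head] + context + span_vectors[tail]
    return result

token_vectors = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.4], [0.7, 0.1]]  # toy values
entity_spans = [(0, 1), (3, 4)]
span_vectors = {(0, 1): [0.1, 0.9], (3, 4): [0.7, 0.1]}
second_vectors = build_second_vectors(entity_spans, span_vectors, token_vectors)
```

Each resulting vector would then be passed to the second classifier to predict the relationship type of that entity phrase.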
8. The entity-relationship joint extraction method as claimed in claim 7, wherein before constructing the first vector according to the preprocessed information, the method further comprises:
comparing each word case interval with the entity words labeled in advance; if a word case interval is the same as one of the entity words labeled in advance, the word case interval is an entity positive sample, otherwise the word case interval is an entity negative sample;
and sampling the entity negative samples according to a first preset value.
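The labeling and sampling step of claim 8 might look like the sketch below. It assumes the first preset value is an upper bound on the number of negative samples kept and that sampling is uniform at random; the claim specifies neither. The function name and the fixed seed are illustrative.

```python
import random
from typing import List, Tuple

Span = Tuple[int, int]

def sample_entity_spans(
    spans: List[Span],
    gold_spans: List[Span],
    max_negatives: int,
    seed: int = 0,
) -> Tuple[List[Span], List[Span]]:
    """Split spans into positives (matching a labeled entity) and sampled negatives."""
    gold = set(gold_spans)
    positives = [s for s in spans if s in gold]
    negatives = [s for s in spans if s not in gold]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    kept = rng.sample(negatives, min(max_negatives, len(negatives)))
    return positives, kept

candidate_spans = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
labeled_entities = [(0, 1)]
positives, sampled_negatives = sample_entity_spans(candidate_spans, labeled_entities, 2)
```

Capping the negatives keeps the far more numerous non-entity intervals from dominating training.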
9. The entity-relationship joint extraction method of claim 7, wherein before constructing the second vector, the method further comprises:
judging whether each entity phrase accords with the entity relationship labeled in advance; if an entity phrase accords with the entity relationship labeled in advance, the entity phrase is a relationship positive sample, otherwise, the entity phrase is a relationship negative sample;
and sampling the relation negative sample according to a second preset value.
10. An entity-relationship joint extraction device, comprising:
a receiving module, used for acquiring text data to be predicted;
an extraction module, used for extracting the text data to be predicted by utilizing a pre-established entity relationship joint extraction model, and predicting the type of each word case interval and the relationship type of each entity phrase, wherein the type of a word case interval comprises an entity type and a non-entity type, an entity word is a word case interval of an entity type, and the relationship type of an entity phrase comprises a relationship and a non-relationship;
wherein the entity relationship joint extraction model is used for preprocessing text data to obtain word case intervals, word case interval vectors, word case interval length vectors and text vectors; predicting the type of each word case interval according to the information obtained by preprocessing; and predicting the relationship type of each entity phrase according to the entity phrase and the character vectors between the entity words in the entity phrase.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110340031.1A CN112926332A (en) | 2021-03-30 | 2021-03-30 | Entity relationship joint extraction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112926332A (en) | 2021-06-08 |
Family
ID=76176538
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110340031.1A Pending CN112926332A (en) | 2021-03-30 | 2021-03-30 | Entity relationship joint extraction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112926332A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113779999A (en) * | 2021-11-12 | 2021-12-10 | 航天宏康智能科技(北京)有限公司 | Named entity recognition method and named entity recognition device |
CN114817562A (en) * | 2022-04-26 | 2022-07-29 | 马上消费金融股份有限公司 | Knowledge graph construction method, knowledge graph training method, information recommendation method and information recommendation device |
CN116631642A (en) * | 2023-07-24 | 2023-08-22 | 北京惠每云科技有限公司 | Extraction method and device for clinical discovery event |
CN116975299A (en) * | 2023-09-22 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Text data discrimination method, device, equipment and medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107957991A (en) * | 2017-12-05 | 2018-04-24 | 湖南星汉数智科技有限公司 | A kind of entity attribute information extraction method and device relied on based on syntax |
CN110196978A (en) * | 2019-06-04 | 2019-09-03 | 重庆大学 | A kind of entity relation extraction method for paying close attention to conjunctive word |
CN110413796A (en) * | 2019-07-03 | 2019-11-05 | 北京信息科技大学 | A kind of coal mine typical power disaster Methodologies for Building Domain Ontology |
CN110704576A (en) * | 2019-09-30 | 2020-01-17 | 北京邮电大学 | Text-based entity relationship extraction method and device |
CN111428493A (en) * | 2020-03-06 | 2020-07-17 | 中国平安人寿保险股份有限公司 | Entity relationship acquisition method, device, equipment and storage medium |
CN111639185A (en) * | 2020-06-04 | 2020-09-08 | 虎博网络技术(上海)有限公司 | Relationship information extraction method and device, electronic equipment and readable storage medium |
CN111832307A (en) * | 2020-07-09 | 2020-10-27 | 北京工业大学 | Entity relationship extraction method and system based on knowledge enhancement |
WO2021051871A1 (en) * | 2019-09-18 | 2021-03-25 | 平安科技(深圳)有限公司 | Text extraction method, apparatus, and device, and storage medium |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113779999A (en) * | 2021-11-12 | 2021-12-10 | 航天宏康智能科技(北京)有限公司 | Named entity recognition method and named entity recognition device |
CN114817562A (en) * | 2022-04-26 | 2022-07-29 | 马上消费金融股份有限公司 | Knowledge graph construction method, knowledge graph training method, information recommendation method and information recommendation device |
CN116631642A (en) * | 2023-07-24 | 2023-08-22 | 北京惠每云科技有限公司 | Extraction method and device for clinical discovery event |
CN116631642B (en) * | 2023-07-24 | 2023-11-03 | 北京惠每云科技有限公司 | Extraction method and device for clinical discovery event |
CN116975299A (en) * | 2023-09-22 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Text data discrimination method, device, equipment and medium |
CN116975299B (en) * | 2023-09-22 | 2024-05-28 | 腾讯科技(深圳)有限公司 | Text data discrimination method, device, equipment and medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112926332A (en) | Entity relationship joint extraction method and device | |
CN112015859A (en) | Text knowledge hierarchy extraction method and device, computer equipment and readable medium | |
Carchiolo et al. | Medical prescription classification: a NLP-based approach | |
CN110335653A (en) | Non-standard case history analytic method based on openEHR case history format | |
CN110532398B (en) | Automatic family map construction method based on multi-task joint neural network model | |
CN110750635B (en) | French recommendation method based on joint deep learning model | |
WO2017041651A1 (en) | User data classification method and device | |
CN113779260B (en) | Pre-training model-based domain map entity and relationship joint extraction method and system | |
CN112017744A (en) | Electronic case automatic generation method, device, equipment and storage medium | |
CN113297379A (en) | Text data multi-label classification method and device | |
CN111428513A (en) | False comment analysis method based on convolutional neural network | |
CN110969015B (en) | Automatic label identification method and equipment based on operation and maintenance script | |
CN116029306A (en) | Automatic scoring method for simple answers of limited domain literature | |
CN116150367A (en) | Emotion analysis method and system based on aspects | |
CN113450905A (en) | Medical auxiliary diagnosis system, method and computer readable storage medium | |
CN115146062A (en) | Intelligent event analysis method and system fusing expert recommendation and text clustering | |
CN114428860A (en) | Pre-hospital emergency case text recognition method and device, terminal and storage medium | |
CN112732910B (en) | Cross-task text emotion state evaluation method, system, device and medium | |
CN106815209B (en) | Uygur agricultural technical term identification method | |
CN114298314A (en) | Multi-granularity causal relationship reasoning method based on electronic medical record | |
CN112784601B (en) | Key information extraction method, device, electronic equipment and storage medium | |
CN114372532A (en) | Method, device, equipment, medium and product for determining label marking quality | |
CN109036506A (en) | Monitoring and managing method, electronic device and the readable storage medium storing program for executing of internet medical treatment interrogation | |
CN110765908A (en) | Cascade type cancer cell detection system based on deep learning | |
CN116415593A (en) | Research front identification method, system, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |