CN113468335A - Method and equipment for extracting entity implicit relationship - Google Patents
Method and equipment for extracting entity implicit relationship
- Publication number
- CN113468335A CN113468335A CN202010236475.6A CN202010236475A CN113468335A CN 113468335 A CN113468335 A CN 113468335A CN 202010236475 A CN202010236475 A CN 202010236475A CN 113468335 A CN113468335 A CN 113468335A
- Authority
- CN
- China
- Prior art keywords
- sample set
- entity
- model
- text
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
An embodiment of the invention provides a method and a device for extracting an entity implicit relationship, wherein the method comprises the following steps: acquiring a text to be processed, and inputting the text to be processed into a sequence labeling model; and determining the entity implicit relationship of the text to be processed according to the output result of the sequence labeling model; wherein the sequence labeling model is obtained by preprocessing a sample set with entity implicit relationship labels to obtain a training sample set, and training a neural network model on the training sample set. The embodiment of the invention can accurately extract the implicit relationship between entities.
Description
Technical Field
The embodiment of the invention relates to the technical field of knowledge graphs, and in particular to a method and a device for extracting an entity implicit relationship.
Background
With the development of big data and artificial intelligence, the knowledge graph has become an important component of artificial intelligence technology and, owing to its strong semantic processing, interconnected organization, information retrieval and knowledge reasoning capabilities, has been widely applied in finance, agriculture, e-commerce, medical electronics, transportation and other fields. In essence, a knowledge graph is a huge semantic network graph that describes the various entities or concepts existing in the real world and their relationships, representing entities or concepts as nodes and relationships as edges.
Often, there is also an implicit relationship between entities. In the prior art, the entities and the relationships between them are usually extracted first, and the implicit relationship between two entities is then obtained through relationship reasoning. For example, for the text "Zhang San works at Company A", the relationship between the entity "Zhang San" and the entity "Company A" is a working relationship; for the text "Li Si joined Company A in June 2018", the relationship between the entity "Li Si" and the entity "Company A" is also a working relationship, and the implicit relationship between "Zhang San" and "Li Si" is inferred to be a co-worker relationship.
However, the inventor found that, because extracting the entities and the relationships between them introduces a certain error, the implicit relationship between two entities derived from those extractions accumulates a large error, resulting in inaccurate results.
Disclosure of Invention
The embodiment of the invention provides a method and equipment for extracting an entity implicit relationship, which can accurately extract the implicit relationship between entities.
In a first aspect, an embodiment of the present invention provides an entity implicit relationship extraction method, including:
acquiring a text to be processed, and inputting the text to be processed into a sequence labeling model;
determining an entity implicit relationship of the text to be processed according to an output result of the sequence labeling model;
the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
As an embodiment of the present invention, the method further includes the step of obtaining a sample set with entity implicit relationship labels, as follows:
acquiring text data, and preprocessing the text data to obtain a sample set to be labeled;
sending the sample set to be labeled to a labeling terminal, wherein the sample set to be labeled is used for indicating a target person to perform implicit relationship labeling on entities with implicit relationships in the sample set to be labeled;
and receiving the implicit relationship labeling result fed back by the labeling terminal to obtain a sample set with entity implicit relationship labeling.
As an embodiment of the present invention, the method further includes the step of training a neural network model according to the training sample set to obtain a sequence labeling model, as follows:
coding the training sample set according to a pre-training language model to obtain a coding vector;
and training a BiLSTM-CRF neural network model according to the coding vector to obtain the sequence labeling model.
As an embodiment of the present invention, the pre-training language model is a BERT language model;
further comprising the step of pre-processing the sample set as follows:
adding [CLS] labels to the beginning of each sentence in the sample set, adding [SEP] labels to the end of each sentence, and connecting sentence pairs with the [SEP] labels;
and performing word embedding processing, sentence embedding processing and position embedding processing on the sentence to which the label is added.
As an embodiment of the present invention, the determining an entity implicit relationship of the text to be processed according to the output result of the sequence tagging model includes:
determining target entities with the same labeling labels according to the output result of the sequence labeling model;
and determining the implicit relation between the target entities according to the implicit relation corresponding to the label tag.
As an embodiment of the present invention, the training the BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model includes:
training a BiLSTM-CRF neural network model according to the coding vector to obtain a target sequence labeling model;
obtaining a test sample set;
testing the target sequence labeling model according to the test sample set;
and if the accuracy of the output result of the target sequence labeling model is greater than a preset threshold value, determining the target sequence labeling model as the sequence labeling model.
As an embodiment of the present invention, the method further includes:
and constructing a knowledge graph according to the entity implicit relation extraction result and the graph database.
In a second aspect, an embodiment of the present invention provides an entity implicit relationship extraction apparatus, including:
the input module is used for acquiring a text to be processed and inputting the text to be processed into the sequence labeling model;
the determining module is used for determining the entity implicit relationship of the text to be processed according to the output result of the sequence marking model;
the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
In a third aspect, an embodiment of the present invention provides an entity implicit relationship extraction device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the method as described above in the first aspect and various possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method according to the first aspect and various possible implementations of the first aspect are implemented.
The method and the device for extracting the entity implicit relationship provided by the embodiment input a text to be processed into a sequence labeling model, the sequence labeling model outputs an entity implicit relationship labeling result of the text to be processed, and the entity implicit relationship of the text to be processed can be determined according to the labeling result, wherein the sequence labeling model is obtained by preprocessing a sample set with entity implicit relationship labels to obtain a training sample set and training a neural network model according to the training sample set. The embodiment of the invention can directly extract the implicit relationship between the entities, thereby improving the accuracy of the obtained implicit relationship, and the method is simple and convenient.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of an entity implicit relationship extraction method according to an embodiment of the present invention;
FIG. 2 is a first flowchart illustrating an implementation of a method for extracting an entity implicit relationship according to an embodiment of the present invention;
FIG. 3 is a flowchart II illustrating an implementation of the method for extracting an entity implicit relationship according to an embodiment of the present invention;
fig. 4 is a third flow chart for implementing the method for extracting an entity implicit relationship according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of sample pre-processing provided by an embodiment of the present invention;
FIG. 6 is a fourth flowchart illustrating an implementation of the method for extracting an entity implicit relationship according to the embodiment of the present invention;
FIG. 7 is a frame diagram of extracting implicit relationships with a BiLSTM-CRF neural network model according to an embodiment of the invention;
FIG. 8 is a fifth flowchart illustrating an implementation of the method for extracting an entity implicit relationship according to an embodiment of the present invention;
fig. 9 is a first schematic structural diagram of an entity implicit relationship extraction apparatus according to an embodiment of the present invention;
FIG. 10 is a second schematic structural diagram of an entity implicit relationship extraction apparatus according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of a hardware structure of an entity implicit relationship extraction device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a knowledge graph, there are often implicit relationships between entities; an implicit relationship refers to a relationship between an entity and another entity with which it is not directly associated. In the prior art, the entities and the relationships between them are usually extracted first, and the implicit relationship between two entities is then obtained through relationship reasoning. For example, for the text "Zhang San works at Company A", the relationship between the entity "Zhang San" and the entity "Company A" is a working relationship; for the text "Li Si joined Company A in June 2018", the relationship between the entity "Li Si" and the entity "Company A" is also a working relationship, and the implicit relationship between "Zhang San" and "Li Si" is inferred to be a co-worker relationship.
However, since extracting the entities and the relationships between them introduces a certain error, the implicit relationship between two entities derived from those extractions accumulates a large error, resulting in inaccurate results.
The application provides an entity implicit relationship extraction method, which is characterized in that a text to be processed is input into a trained neural network model, an entity implicit relationship labeling result is directly output, and the entity implicit relationship can be obtained without relationship reasoning, so that the accuracy of the obtained entity implicit relationship is improved.
Fig. 1 is a schematic view of an application scenario of an entity implicit relationship extraction method according to an embodiment of the present invention. The entity relationship extraction method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 101 communicates with the server 103 via the network 102. The terminal 101 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 103 may be implemented by an independent server or a server cluster formed by a plurality of servers.
For example, the embodiment of the invention can be used for detecting cases by public security institutions. The server 103 stores a large amount of multi-source heterogeneous data, most of which are case information. The terminal 101 sends an implicit relationship extraction request and case information to the server 103, and after receiving the implicit relationship extraction request, the server 103 extracts the implicit relationship between entities from the case information and returns the extraction result to the terminal 101.
Fig. 2 is a first flowchart of an implementation of a method for extracting an entity implicit relationship according to an embodiment of the present invention, where the method is applied to the terminal in fig. 1, and may also be applied to a server, and the method is applied to the server for description, as shown in fig. 2, the method in this embodiment includes:
step S201, obtaining a text to be processed, inputting the text to be processed into a sequence labeling model, wherein the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set, and training a neural network model according to the training sample set.
In the embodiment of the invention, the text to be processed refers to the text that needs entity implicit relationship labeling. The terminal sends an entity implicit relationship extraction request to the server, and the server generates a corresponding implicit relationship extraction task according to the request; the server can also generate an implicit relationship extraction task according to preconfigured information. The implicit relationship extraction task contains attribute information of the text to be processed, which includes a text name, a text path or a text link. The server obtains the text to be processed according to the text name or text path, or crawls the corresponding text over the network according to the text link; the terminal can also send the text to be processed to the server directly. The format of the acquired text includes, but is not limited to, one or more of TXT, Word and PDF.
In the embodiments of the present invention, an implicit relationship refers to a relationship between an entity and another entity with which it is not directly associated. For example, entity A has an implicit relationship with entity C if entity A has a relationship with entity B and entity B has a relationship with entity C. For another example, for the text "Zhang San works at Company A", the entity "Zhang San" has a working relationship with the entity "Company A"; for the text "Li Si works at Company A", the entity "Li Si" also has a working relationship with the entity "Company A"; the implicit relationship between "Zhang San" and "Li Si" is therefore a co-worker relationship.
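The reasoning chain described above — two explicit relations through a shared entity implying one implicit relation — can be sketched as follows. This illustrates the definition (and the prior-art reasoning step), not the invention's own method; the `works_at` relation name is hypothetical:

```python
from itertools import combinations

def implicit_pairs(explicit_relations):
    """explicit_relations: iterable of (entity, related_entity, relation_type).
    Two entities that share the same related entity under the same relation
    type (e.g. both work at Company A) are returned as an implicit pair."""
    by_target = {}
    for entity, target, rel in explicit_relations:
        by_target.setdefault((target, rel), []).append(entity)
    pairs = []
    for members in by_target.values():
        # every pair of entities linked through the same intermediate entity
        pairs.extend(combinations(sorted(members), 2))
    return pairs

relations = [
    ("Zhang San", "Company A", "works_at"),
    ("Li Si", "Company A", "works_at"),
]
print(implicit_pairs(relations))  # [('Li Si', 'Zhang San')]
```

Because this chain multiplies the errors of the two explicit extractions, the invention instead labels the implicit relationship directly.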
The sequence labeling model can label the entities with implicit relationships in the text to be processed. The sequence labeling model is obtained through the following steps: obtain a sample set with entity implicit relationship labels, preprocess the sample set to obtain a training sample set, and train the neural network model on the training sample set to obtain the sequence labeling model. The terminal sends a sample set with entity implicit relationship labels to the server; or the terminal sends the sample name or sample path of the sample set to the server, and the server obtains the corresponding sample set from the database according to the sample name or sample path. The server converts the sample set into the data format required by the neural network model to obtain the training sample set.
The server preprocesses the text to be processed, converts the format of the text to be processed into a data format required by the sequence labeling model, inputs the preprocessed text to be processed into the sequence labeling model, and the sequence labeling model outputs an entity implicit relation labeling result.
And S202, determining the entity implicit relation of the text to be processed according to the output result of the sequence annotation model.
In the embodiment of the invention, the output result of the sequence labeling model is the entity implicit relationship labeling result of the text to be processed, and the entity implicit relationship of the text to be processed can be determined according to the labeling result.
According to the embodiment of the invention, the text to be processed is input into the sequence marking model, the sequence marking model outputs the entity implicit relation marking result of the text to be processed, and the entity implicit relation of the text to be processed can be determined according to the marking result.
Fig. 3 is a second implementation flowchart of the method for extracting an entity implicit relationship according to the embodiment of the present invention, where on the basis of the embodiment shown in fig. 2, the embodiment of the present invention further includes a step of obtaining a sample set with an entity implicit relationship label, and as shown in fig. 3, the method according to the embodiment of the present invention includes:
step S301, acquiring text data, and preprocessing the text data to obtain a sample set to be labeled.
In the embodiment of the present invention, the terminal transmits the text name, the text path, or the text link of the text data. The server can obtain corresponding text data in the database according to the text name or the text path. The server can also crawl corresponding text data in the network according to the text links. The format of the acquired text data includes, but is not limited to, data in one or more formats of TXT text, word text, PDF text.
The server preprocesses the text data as follows: first, irrelevant words, sentences and punctuation marks are removed and traditional characters are converted into simplified characters; then the text data is segmented by sentence to obtain the sample set to be labeled.
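The preprocessing step can be sketched as follows. The noise filter is an assumption (the patent does not enumerate the irrelevant symbols), and the traditional-to-simplified conversion is omitted since it requires an external conversion table:

```python
import re

def preprocess(text):
    """Minimal sketch: strip noise characters, then segment the text into
    one sample per sentence (Chinese and Western sentence enders)."""
    # keep word characters, whitespace and sentence-ending punctuation
    cleaned = re.sub(r"[^\w\s.!?。！？]", "", text)
    sentences = [s.strip() for s in re.split(r"[.!?。！？]", cleaned) if s.strip()]
    return sentences

print(preprocess("Zhang San, male, 25. He worked at Company B!"))
# ['Zhang San male 25', 'He worked at Company B']
```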
Step S302, a sample set to be labeled is sent to a labeling terminal, and the sample set to be labeled is used for indicating a target person to label an implicit relationship of an entity with the implicit relationship in the sample set to be labeled.
And step S303, receiving the implicit relationship labeling result fed back by the terminal to obtain a sample set with entity implicit relationship labeling.
In the embodiment of the invention, the server sends the sample set to be labeled to the labeling terminal, and the target person manually labels the implicit relationship of the entity with the implicit relationship in the sample set to be labeled through the labeling terminal. And after the labeling is finished, the labeling terminal feeds back the sample set with the entity implicit relation label to the server.
The implicit relationship may be manually labeled with the BIO labeling scheme, where B marks the beginning of an entity or relationship, I marks a non-beginning part of an entity or relationship, and O marks a part that belongs to no entity or relationship. The labeling terminal can be a visual data labeling terminal built as a web page, or an existing data labeling platform on the market; it can label text in TXT, Word, PDF and other formats.
In the following, BIO implicit relationship labeling is described taking case information data of a public security organ as the original text data; part of the case text reads as follows:
Zhang San, male, 25 years old, checked into Hotel A on April 15, 2019; he once worked at Company B, in the research and development department, for two years.
Li Si, female, 32 years old, joined Company B in March 2016 and has worked in the research and development department to date.
Zhang Li, female, 34 years old, joined Company C in April 2017 and has worked in the test department to date.
BIO labeling is performed on the entity implicit relationship. From the case text it can readily be seen that Zhang San and Company B have a working relationship and that Li Si and Company B also have a working relationship, from which it follows that Zhang San and Li Si are co-workers. Taking person-entity labeling in named entity recognition as an example, the prior art labels every person name in the text, e.g. Zhang B-person San I-person, O male O … …. Li B-person Si I-person, O female O … …. Zhang B-person Li I-person, O female … ….
The embodiment of the invention labels the implicit relationship between the entities directly, and the new data labeling format is: Zhang B-colleague San I-colleague, O male O … …. Li B-colleague Si I-colleague, O female O … …. Zhang O Li O, O female O … …. The new format indicates that Zhang San and Li Si are co-workers, while Zhang Li has a co-worker relationship with neither Zhang San nor Li Si.
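A minimal sketch of producing tags in this new labeling format, where the tag class is the implicit relationship (e.g. "colleague") rather than the entity type; the word-level tokens and span positions below are hypothetical:

```python
def bio_tags(tokens, spans):
    """spans maps (start, end) token ranges to an implicit-relationship
    label; all other tokens receive the O tag."""
    tags = ["O"] * len(tokens)
    for (start, end), label in spans.items():
        tags[start] = "B-" + label          # beginning of the span
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # non-beginning part
    return tags

tokens = ["Zhang", "San", "male", "Li", "Si", "female"]
spans = {(0, 2): "colleague", (3, 5): "colleague"}
print(bio_tags(tokens, spans))
# ['B-colleague', 'I-colleague', 'O', 'B-colleague', 'I-colleague', 'O']
```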
According to the embodiment of the invention, the text data is sent to the labeling terminal and the target person manually labels the implicit relationships between the entities; since the relationships between entities, whether explicit or implicit, are clear to the target person, the labeling accuracy is improved.
Fig. 4 is a third flow chart for implementing the method for extracting an entity implicit relationship according to the embodiment of the present invention, on the basis of the embodiment shown in fig. 2, the embodiment of the present invention further includes a step of training a neural network model according to a training sample set to obtain a sequence labeling model, and as shown in fig. 4, the method according to the embodiment of the present invention includes:
step S401, the training sample set is coded according to the pre-training language model to obtain a coding vector.
And S402, training the BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model.
In the embodiment of the invention, the pre-training language model encodes each character in the training sample set to generate a coding vector containing context information. Specifically, the pre-training language model may be a BERT (Bidirectional Encoder Representations from Transformers) language model. The server encodes each character of the labeled text with the first Transformer layer of the BERT language model, passes the first-layer character coding vector to the second Transformer layer, which continues encoding, and so on until the last Transformer layer finishes encoding and the final coding vector of the character, also called the character coding vector, is obtained. During encoding, the BERT language model encodes characters using the model parameters of every Transformer layer; making full use of the parameters of every layer can effectively improve the performance of entity implicit relationship extraction and allows sentence-level features to be learned.
In the embodiment of the invention, before the BERT language model encodes the sample set, the sample set needs to be preprocessed to obtain a training sample set. Specifically, as shown in fig. 5, first, a [ CLS ] tag is added to the beginning of each sentence in the sample set, a [ SEP ] tag is added to the end of each sentence, and the [ SEP ] tags are used to connect the sentence pairs, so as to convert the sample set into the format required by the BERT model. Then, word embedding (Token embedding), sentence embedding (Segment embedding) and Position embedding (Position embedding) are respectively carried out on the sentence to which the label is added, and a training sample set is obtained.
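The [CLS]/[SEP] formatting described above, together with the segment and position indices that feed the sentence-embedding and position-embedding lookups, can be sketched as follows (character-level tokens, as BERT uses for Chinese text; in practice a BERT tokenizer performs this step, and the embedding lookups themselves are not reproduced here):

```python
def format_pair(sent_a, sent_b=None):
    """Build BERT-style input for a sentence or a sentence pair:
    [CLS] A [SEP] (B [SEP]), with segment ids 0 for the first
    sentence and 1 for the second, and sequential position ids."""
    tokens = ["[CLS]"] + list(sent_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if sent_b is not None:
        tokens += list(sent_b) + ["[SEP]"]
        segment_ids += [1] * (len(sent_b) + 1)
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids

print(format_pair("ab", "cd")[0])  # ['[CLS]', 'a', 'b', '[SEP]', 'c', 'd', '[SEP]']
```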
The training sample set is encoded with the BERT language model to obtain coding vectors, which are input into the BiLSTM-CRF neural network model to learn their features and obtain the sequence labeling model.
The pre-training language model enhances the semantic information of the characters; obtaining the coding vectors from the pre-training language model and inputting them into the BiLSTM-CRF neural network model for training requires little computer memory and keeps the training period short.
Fig. 6 is a fourth implementation flowchart of the method for extracting an entity implicit relationship according to the embodiment of the present invention, where on the basis of the embodiment shown in fig. 2, the embodiment of the present invention describes in detail one possible implementation manner of step S202, and as shown in fig. 6, the method according to the embodiment of the present invention includes:
step S601, determining the target entities with the same label according to the output result of the sequence label model.
Step S602, determining the implicit relationship between the target entities according to the implicit relationship corresponding to the label.
In the embodiment of the present invention, entities having the same label have an implicit relationship corresponding to the label.
For example, fig. 7 is a frame diagram of extracting implicit relationships with the BiLSTM-CRF neural network model according to an embodiment of the present invention. As shown in fig. 7, the text to be processed contains five tokens w0 w1 w2 w3 w4; entity implicit relationship extraction is performed on it with the BiLSTM-CRF model, and from the extraction result it can be determined that the labels of [w0, w1] and [w3, w4] are both "colleague". Therefore, the implicit relationship of [w0, w1] and [w3, w4] is the one corresponding to the label "colleague"; for example, if the label "colleague" corresponds to a co-worker relationship, then the implicit relationship between [w0, w1] and [w3, w4] is a co-worker relationship.
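Decoding the model's output tags into same-label entity spans, as in steps S601 and S602, can be sketched as follows; the tokens and tags mirror the fig. 7 example:

```python
def decode_entities(tokens, tags):
    """Group tokens into spans by their BIO label; spans that share a
    label hold the implicit relationship that the label encodes."""
    spans = {}
    current, start = None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel flushes the last span
        if tag.startswith("B-"):
            if current:
                spans.setdefault(current, []).append(tokens[start:i])
            current, start = tag[2:], i
        elif not tag.startswith("I-"):       # O tag ends any open span
            if current:
                spans.setdefault(current, []).append(tokens[start:i])
            current = None
    return spans

tokens = ["w0", "w1", "w2", "w3", "w4"]
tags = ["B-colleague", "I-colleague", "O", "B-colleague", "I-colleague"]
print(decode_entities(tokens, tags))
# {'colleague': [['w0', 'w1'], ['w3', 'w4']]}
```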
Fig. 8 is a fifth implementation flowchart of the method for extracting an entity implicit relationship according to the embodiment of the present invention, where on the basis of the embodiment shown in fig. 4, the embodiment of the present invention describes in detail an implementation manner of step S402, and as shown in fig. 8, the method according to the embodiment of the present invention includes:
step S801, training the BiLSTM-CRF neural network model according to the coding vector to obtain a target sequence labeling model.
Step S802, a test sample set is obtained.
And step S803, testing the target sequence labeling model according to the test sample set.
Step S804, if the accuracy of the output result of the target sequence labeling model is greater than a preset threshold, determining the target sequence labeling model as the sequence labeling model.
In the embodiment of the invention, the terminal sends the attribute information of the test sample set to the server, where the attribute information includes the test sample set name, the test sample set path, the test sample set link, and the like. The server may obtain the corresponding test sample set from the database according to the test sample set name or path, and may also crawl the corresponding test sample set over the network based on the test sample set link.
After the server obtains the test sample set, the server inputs the test sample set into the target sequence labeling model for testing, and sends the output result of the target sequence labeling model together with the test sample set to the terminal. The target personnel then determine, from the output result and the test sample set, the accuracy with which the target sequence labeling model extracts the entity implicit relation. Only when the accuracy is greater than a preset threshold value is the target sequence labeling model used as the sequence labeling model.
According to the embodiment of the invention, the accuracy of the extraction of the entity implicit relationship is improved by testing the target sequence marking model.
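The acceptance check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the tag-level accuracy metric, and the 0.9 threshold are assumptions; the patent leaves the accuracy measure and threshold value unspecified.

```python
def accuracy(predicted_tags, gold_tags):
    """Fraction of tags the target model predicted correctly on the test set."""
    total = sum(len(seq) for seq in gold_tags)
    correct = sum(
        p == g
        for pred_seq, gold_seq in zip(predicted_tags, gold_tags)
        for p, g in zip(pred_seq, gold_seq)
    )
    return correct / total if total else 0.0

def accept_model(predicted_tags, gold_tags, threshold=0.9):
    """Keep the target sequence labeling model only above the preset threshold."""
    return accuracy(predicted_tags, gold_tags) > threshold

pred = [["B-college", "I-college", "O", "B-college"]]
gold = [["B-college", "I-college", "O", "O"]]
print(accuracy(pred, gold))      # → 0.75
print(accept_model(pred, gold))  # → False: model is rejected at this accuracy
```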
As an embodiment of the present invention, on the basis of the above embodiment of fig. 2, the method of the embodiment of the present invention may further include: and constructing a knowledge graph according to the entity implicit relation extraction result and the graph database.
In the embodiment of the invention, the existing knowledge graph includes only entities and the explicit relations between the entities. The embodiment of the invention completes the knowledge graph by supplementing the implicit relations between entities into the knowledge graph, so that the knowledge graph can more comprehensively reflect the entities and the relations between them.
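Supplementing extracted implicit relations into a knowledge graph can be sketched with an in-memory triple store. This is an illustrative stand-in for a real graph database, and all entity, relation, and class names are assumptions for demonstration.

```python
class KnowledgeGraph:
    """Toy knowledge graph storing (head, relation, tail) triples."""

    def __init__(self):
        self.edges = set()

    def add_relation(self, head, relation, tail):
        self.edges.add((head, relation, tail))

    def relations_of(self, entity):
        """All outgoing (relation, tail) pairs for an entity."""
        return {(r, t) for h, r, t in self.edges if h == entity}

kg = KnowledgeGraph()
# Explicit relation already present in the existing knowledge graph.
kg.add_relation("Alice", "works_at", "Acme")
# Implicit relation supplied by the entity implicit relation extraction result.
kg.add_relation("Alice", "colleague", "Bob")
print(kg.relations_of("Alice"))
```

After the supplement, a query for "Alice" returns both the explicit `works_at` edge and the newly added implicit `colleague` edge, which is the sense in which the graph becomes more comprehensive.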
Fig. 9 is a first schematic structural diagram of an entity implicit relationship extraction apparatus according to an embodiment of the present invention. As shown in fig. 9, the entity implicit relationship extraction apparatus 900 includes: an input module 901 and a determination module 902.
The input module 901 is configured to acquire a text to be processed and input the text to be processed into the sequence labeling model. The sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
A determining module 902, configured to determine an entity implicit relationship of the text to be processed according to an output result of the sequence labeling model.
Fig. 10 is a second schematic structural diagram of an entity implicit relationship extraction apparatus according to an embodiment of the present invention. As shown in fig. 10, the entity implicit relationship extraction apparatus 900 further includes: a model training module 903 and a building module 904. The model training module 903 includes an acquisition sub-module 9031, a preprocessing sub-module 9032, a training sub-module 9033, and a testing sub-module 9034.
As an embodiment of the present invention, the obtaining sub-module 9031 is configured to obtain text data, and preprocess the text data to obtain a sample set to be labeled; sending a sample set to be labeled to a labeling terminal, wherein the sample set to be labeled is used for indicating a target person to label an implicit relationship of an entity with the implicit relationship in the sample set to be labeled; and receiving the implicit relationship labeling result fed back by the labeling terminal to obtain a sample set with entity implicit relationship labeling.
As an embodiment of the present invention, the training submodule 9033 is configured to encode the training sample set according to the pre-training language model to obtain a coding vector; and train the BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model.
As an embodiment of the present invention, the pre-trained language model is the BERT language model. The preprocessing submodule 9032 is configured to add a [ CLS ] tag to the beginning of each sentence in the sample set, add a [ SEP ] tag to the end of each sentence, and connect sentence pairs with [ SEP ] tags; and perform word embedding processing, sentence embedding processing, and position embedding processing on the sentences to which the tags have been added.
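The [ CLS ]/[ SEP ] preprocessing described above can be sketched in plain Python. This is an illustrative sketch of the structure only: real BERT tokenization and the learned word, sentence, and position embeddings are replaced by simple token lists and integer ids, and the function name is an assumption.

```python
def add_special_tokens(sentence_a, sentence_b=None):
    """Prepend [CLS], append [SEP], and join a sentence pair with [SEP]."""
    tokens = ["[CLS]"] + sentence_a + ["[SEP]"]
    segments = [0] * len(tokens)  # sentence (segment) embedding ids
    if sentence_b is not None:
        tokens += sentence_b + ["[SEP]"]
        segments += [1] * (len(sentence_b) + 1)
    positions = list(range(len(tokens)))  # position embedding ids
    return tokens, segments, positions

tokens, segments, positions = add_special_tokens(["w0", "w1"], ["w2", "w3"])
print(tokens)     # → ['[CLS]', 'w0', 'w1', '[SEP]', 'w2', 'w3', '[SEP]']
print(segments)   # → [0, 0, 0, 0, 1, 1, 1]
print(positions)  # → [0, 1, 2, 3, 4, 5, 6]
```

In an actual BERT pipeline the token, segment, and position ids index three embedding tables whose outputs are summed to produce the coding vector fed to the BiLSTM-CRF model.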
As an embodiment of the present invention, the determining module 902 is specifically configured to determine target entities with the same label tag according to an output result of the sequence label model; and determining the implicit relation between the target entities according to the implicit relation corresponding to the labeled labels.
As an embodiment of the present invention, the training submodule 9033 is configured to train the neural network model according to a training sample set, so as to obtain a target sequence labeling model. The test sub-module 9034 is configured to obtain a test sample set; test the target sequence labeling model according to the test sample set; and if the accuracy of the output result of the target sequence labeling model is greater than a preset threshold value, determine the target sequence labeling model as the sequence labeling model.
As an embodiment of the present invention, the construction module 904 is configured to construct a knowledge graph according to the entity implicit relationship extraction result and the graph database.
The entity implicit relationship extraction device provided in the embodiment of the present invention may be used to implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 11 is a schematic diagram of a hardware structure of an entity implicit relationship extraction device according to an embodiment of the present invention. As shown in fig. 11, the entity implicit relationship extraction device 1100 provided in this embodiment includes: at least one processor 1101 and a memory 1102. The entity implicit relationship extraction device 1100 further includes a communication component 1103. The processor 1101, the memory 1102, and the communication component 1103 are connected by a bus 1104.
In particular implementations, at least one processor 1101 executes computer-executable instructions stored by memory 1102 to cause the at least one processor 1101 to perform an entity implicit relationship extraction method as performed by the entity implicit relationship extraction device 1100.
For a specific implementation process of the processor 1101, reference may be made to the above method embodiments, which implement similar principles and technical effects, and details of this embodiment are not described herein again.
In the embodiment shown in fig. 11, it should be understood that the Processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of hardware and software modules within the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The application also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the entity implicit relationship extraction method performed by the entity implicit relationship extraction device is implemented.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An entity implicit relationship extraction method, comprising:
acquiring a text to be processed, and inputting the text to be processed into a sequence labeling model;
determining an entity implicit relationship of the text to be processed according to an output result of the sequence labeling model;
the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
2. The method of claim 1, further comprising the step of obtaining a sample set with entity implicit relationship labels, as follows:
acquiring text data, and preprocessing the text data to obtain a sample set to be labeled;
sending the sample set to be labeled to a labeling terminal, wherein the sample set to be labeled is used for indicating a target person to perform implicit relationship labeling on entities with implicit relationships in the sample set to be labeled;
and receiving the implicit relationship labeling result fed back by the labeling terminal to obtain a sample set with entity implicit relationship labeling.
3. The method of claim 1, further comprising the step of training a neural network model according to the training sample set to obtain a sequence labeling model, as follows:
coding the training sample set according to a pre-training language model to obtain a coding vector;
and training the BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model.
4. The method of claim 3,
the pre-training language model is a BERT language model;
further comprising the step of pre-processing the sample set as follows:
adding [ CLS ] labels to the beginning of each sentence in the sample set, adding [ SEP ] labels to the end of each sentence, and connecting sentence pairs by using the [ SEP ] labels;
and performing word embedding processing, sentence embedding processing and position embedding processing on the sentence to which the label is added.
5. The method of claim 1, wherein the determining the entity implicit relationship of the text to be processed according to the output result of the sequence labeling model comprises:
determining target entities with the same labeling labels according to the output result of the sequence labeling model;
and determining the implicit relation between the target entities according to the implicit relation corresponding to the label tag.
6. The method of claim 3, wherein training the BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model comprises:
training a BiLSTM-CRF neural network model according to the coding vector to obtain a target sequence labeling model;
obtaining a test sample set;
testing the target sequence labeling model according to the test sample set;
and if the accuracy of the output result of the target sequence labeling model is greater than a preset threshold value, determining the target sequence labeling model as the sequence labeling model.
7. The method of any of claims 1 to 6, further comprising:
and constructing a knowledge graph according to the entity implicit relation extraction result and the graph database.
8. An entity implicit relationship extraction apparatus, comprising:
the input module is used for acquiring a text to be processed and inputting the text to be processed into the sequence labeling model;
the determining module is used for determining the entity implicit relationship of the text to be processed according to the output result of the sequence marking model;
the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
9. An entity implicit relationship extraction device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the entity implicit relationship extraction method of any of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, implement the entity implicit relationship extraction method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010236475.6A CN113468335A (en) | 2020-03-30 | 2020-03-30 | Method and equipment for extracting entity implicit relationship |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113468335A true CN113468335A (en) | 2021-10-01 |
Family
ID=77864867
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010236475.6A Pending CN113468335A (en) | 2020-03-30 | 2020-03-30 | Method and equipment for extracting entity implicit relationship |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113468335A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107291687A (en) * | 2017-04-27 | 2017-10-24 | 同济大学 | It is a kind of based on interdependent semantic Chinese unsupervised open entity relation extraction method |
CN108874878A (en) * | 2018-05-03 | 2018-11-23 | 众安信息技术服务有限公司 | A kind of building system and method for knowledge mapping |
CN108920461A (en) * | 2018-06-26 | 2018-11-30 | 武大吉奥信息技术有限公司 | A kind of polymorphic type and entity abstracting method and device containing complex relationship |
KR20190019661A (en) * | 2017-08-18 | 2019-02-27 | 동아대학교 산학협력단 | Method for Natural Langage Understanding Based on Distribution of Task-specific Labels |
CN109446523A (en) * | 2018-10-23 | 2019-03-08 | 重庆誉存大数据科技有限公司 | Entity attribute extraction model based on BiLSTM and condition random field |
CN109885691A (en) * | 2019-01-08 | 2019-06-14 | 平安科技(深圳)有限公司 | Knowledge mapping complementing method, device, computer equipment and storage medium |
CN110046252A (en) * | 2019-03-29 | 2019-07-23 | 北京工业大学 | A kind of medical textual hierarchy method based on attention mechanism neural network and knowledge mapping |
CN110209836A (en) * | 2019-05-17 | 2019-09-06 | 北京邮电大学 | Remote supervisory Relation extraction method and device |
CN110570920A (en) * | 2019-08-20 | 2019-12-13 | 华东理工大学 | Entity and relationship joint learning method based on attention focusing model |
CN110598000A (en) * | 2019-08-01 | 2019-12-20 | 达而观信息科技(上海)有限公司 | Relationship extraction and knowledge graph construction method based on deep learning model |
CN110910243A (en) * | 2019-09-26 | 2020-03-24 | 山东佳联电子商务有限公司 | Property right transaction method based on reconfigurable big data knowledge map technology |
2020-03-30: CN application CN202010236475.6A filed; published as CN113468335A; legal status: active, Pending
Non-Patent Citations (1)
Title |
---|
ZHAI Sheping et al.: "Knowledge graph entity extraction method based on BILSTM_CRF", Computer Applications and Software, 31 May 2019 (2019-05-31) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110287480B (en) | Named entity identification method, device, storage medium and terminal equipment | |
JP2023529939A (en) | Multimodal POI feature extraction method and apparatus | |
CN111709240A (en) | Entity relationship extraction method, device, equipment and storage medium thereof | |
CN114297394B (en) | Method and electronic equipment for extracting event arguments in text | |
CN110134780B (en) | Method, device, equipment and computer readable storage medium for generating document abstract | |
CN113724819B (en) | Training method, device, equipment and medium for medical named entity recognition model | |
CN115952791A (en) | Chapter-level event extraction method, device and equipment based on machine reading understanding and storage medium | |
CN112085091B (en) | Short text matching method, device, equipment and storage medium based on artificial intelligence | |
CN114724166A (en) | Title extraction model generation method and device and electronic equipment | |
CN116912847A (en) | Medical text recognition method and device, computer equipment and storage medium | |
CN115983271A (en) | Named entity recognition method and named entity recognition model training method | |
CN114357167A (en) | Bi-LSTM-GCN-based multi-label text classification method and system | |
CN117038099A (en) | Medical term standardization method and device | |
CN116774973A (en) | Data rendering method, device, computer equipment and storage medium | |
CN116453125A (en) | Data input method, device, equipment and storage medium based on artificial intelligence | |
CN116595023A (en) | Address information updating method and device, electronic equipment and storage medium | |
CN113468335A (en) | Method and equipment for extracting entity implicit relationship | |
CN110826330B (en) | Name recognition method and device, computer equipment and readable storage medium | |
CN114417891A (en) | Reply sentence determination method and device based on rough semantics and electronic equipment | |
CN113901815A (en) | Emergency working condition event detection method based on dam operation log | |
CN114398492B (en) | Knowledge graph construction method, terminal and medium in digital field | |
CN113792539B (en) | Entity relationship classification method and device based on artificial intelligence, electronic equipment and medium | |
CN116151219A (en) | Named entity identification-based bid winning data analysis treatment method, device and medium | |
CN113158656B (en) | Ironic content recognition method, ironic content recognition device, electronic device, and storage medium | |
CN113157866B (en) | Data analysis method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |