CN113468335A - Method and equipment for extracting entity implicit relationship - Google Patents


Info

Publication number
CN113468335A
Authority
CN
China
Prior art keywords
sample set
entity
model
text
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010236475.6A
Other languages
Chinese (zh)
Inventor
蒋鹏民
唐至威
王月岭
贾鹏飞
高雪松
陈维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Co Ltd
Hisense Co Ltd
Original Assignee
Hisense Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Co Ltd filed Critical Hisense Co Ltd
Priority to CN202010236475.6A
Publication of CN113468335A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Abstract

The embodiment of the invention provides a method and equipment for extracting implicit relationships between entities. The method includes: acquiring a text to be processed and inputting it into a sequence labeling model; and determining the entity implicit relationships of the text to be processed according to the output of the sequence labeling model. The sequence labeling model is obtained by preprocessing a sample set with entity implicit relationship labels to obtain a training sample set, and training a neural network model on that training sample set. The embodiment of the invention can accurately extract the implicit relationships between entities.

Description

Method and equipment for extracting entity implicit relationship
Technical Field
The embodiment of the invention relates to the technical field of knowledge graphs, and in particular to a method and equipment for extracting implicit relationships between entities.
Background
With the development of big data and artificial intelligence, knowledge graphs have become an important component of artificial intelligence technology and, owing to their strong capabilities in semantic processing, interconnected organization, information retrieval and knowledge reasoning, have been widely applied in finance, agriculture, e-commerce, medical electronics, transportation and other fields. In essence, a knowledge graph is a huge semantic network graph that describes the entities or concepts existing in the real world and the relationships between them, representing entities or concepts as nodes and relationships as edges.
Often, there are also implicit relationships between entities. In the prior art, entities and the relationships between them are usually extracted first, and the implicit relationship between two entities is then obtained through relationship reasoning. For example, for the text "Zhang San works at Company A", the relationship between the entity "Zhang San" and the entity "Company A" is a working relationship; for the text "Li Si joined Company A in June 2018", the relationship between the entity "Li Si" and the entity "Company A" is also a working relationship, from which the implicit relationship between "Zhang San" and "Li Si" is inferred to be a co-worker relationship.
However, the inventors found that, since extracting entities and the relationships between them introduces a certain error, the implicit relationship between two entities derived from those results accumulates this error, leading to inaccurate results.
Disclosure of Invention
The embodiment of the invention provides a method and equipment for extracting an entity implicit relationship, which can accurately extract the implicit relationship between entities.
In a first aspect, an embodiment of the present invention provides an entity implicit relationship extraction method, including:
acquiring a text to be processed, and inputting the text to be processed into a sequence labeling model;
determining an entity implicit relationship of the text to be processed according to an output result of the sequence labeling model;
the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
As an embodiment of the present invention, the method further includes the step of obtaining a sample set with entity implicit relationship labels, as follows:
acquiring text data, and preprocessing the text data to obtain a sample set to be labeled;
sending the sample set to be labeled to a labeling terminal, wherein the sample set to be labeled is used for indicating a target person to perform implicit relationship labeling on entities with implicit relationships in the sample set to be labeled;
and receiving the implicit relationship labeling result fed back by the labeling terminal to obtain a sample set with entity implicit relationship labeling.
As an embodiment of the present invention, the method further includes the step of training a neural network model according to the training sample set to obtain a sequence labeling model, as follows:
coding the training sample set according to a pre-training language model to obtain a coding vector;
and training a BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model.
As an embodiment of the present invention, the pre-training language model is the BERT language model;
further comprising the step of pre-processing the sample set as follows:
adding [CLS] labels to the beginning of each sentence in the sample set, adding [SEP] labels to the end of each sentence, and connecting sentence pairs by using the [SEP] labels;
and performing word embedding processing, sentence embedding processing and position embedding processing on the sentence to which the label is added.
As an embodiment of the present invention, the determining an entity implicit relationship of the text to be processed according to the output result of the sequence tagging model includes:
determining target entities with the same labeling labels according to the output result of the sequence labeling model;
and determining the implicit relation between the target entities according to the implicit relation corresponding to the label tag.
As an embodiment of the present invention, the training the BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model includes:
training a BiLSTM-CRF neural network model according to the coding vector to obtain a target sequence labeling model;
obtaining a test sample set;
testing the target sequence labeling model according to the test sample set;
and if the accuracy of the output result of the target sequence labeling model is greater than a preset threshold value, determining the target sequence labeling model as the sequence labeling model.
As an embodiment of the present invention, the method further includes:
and constructing a knowledge graph according to the entity implicit relation extraction result and the graph database.
In a second aspect, an embodiment of the present invention provides an entity implicit relationship extraction apparatus, including:
the input module is used for acquiring a text to be processed and inputting the text to be processed into the sequence labeling model;
the determining module is used for determining the entity implicit relationship of the text to be processed according to the output result of the sequence marking model;
the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
In a third aspect, an embodiment of the present invention provides an entity implicit relationship extraction device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the method as described above in the first aspect and various possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method according to the first aspect and various possible implementations of the first aspect are implemented.
In the method and equipment for extracting entity implicit relationships provided by this embodiment, a text to be processed is input into a sequence labeling model, the model outputs an entity implicit relationship labeling result for the text, and the entity implicit relationships of the text can be determined from that result. The sequence labeling model is obtained by preprocessing a sample set with entity implicit relationship labels to obtain a training sample set and training a neural network model on that set. Because the embodiment of the invention extracts the implicit relationships between entities directly, the accuracy of the obtained implicit relationships is improved, and the method is simple and convenient.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of an entity implicit relationship extraction method according to an embodiment of the present invention;
FIG. 2 is a first flowchart illustrating an implementation of a method for extracting an entity implicit relationship according to an embodiment of the present invention;
FIG. 3 is a second flowchart illustrating an implementation of the method for extracting an entity implicit relationship according to an embodiment of the present invention;
FIG. 4 is a third flowchart illustrating an implementation of the method for extracting an entity implicit relationship according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of sample pre-processing provided by an embodiment of the present invention;
FIG. 6 is a fourth flowchart illustrating an implementation of the method for extracting an entity implicit relationship according to the embodiment of the present invention;
FIG. 7 is a frame diagram of a BiLSTM-CRF neural network model extraction implication relationship provided by the embodiment of the invention;
FIG. 8 is a fifth flowchart illustrating an implementation of the method for extracting an entity implicit relationship according to an embodiment of the present invention;
fig. 9 is a first schematic structural diagram of an entity implicit relationship extraction apparatus according to an embodiment of the present invention;
FIG. 10 is a second schematic structural diagram of an entity implicit relationship extraction apparatus according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of a hardware structure of an entity implicit relationship extraction device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a knowledge graph, there are often implicit relationships between entities, that is, relationships between an entity and entities with which it is not directly associated. In the prior art, entities and the relationships between them are usually extracted first, and the implicit relationship between two entities is then obtained through relationship reasoning. For example, for the text "Zhang San works at Company A", the relationship between the entity "Zhang San" and the entity "Company A" is a working relationship; for the text "Li Si joined Company A in June 2018", the relationship between the entity "Li Si" and the entity "Company A" is also a working relationship, from which the implicit relationship between "Zhang San" and "Li Si" is inferred to be a co-worker relationship.
However, since extracting entities and the relationships between them introduces a certain error, the implicit relationship between two entities derived from those results accumulates this error, leading to inaccurate results.
The application provides an entity implicit relationship extraction method, which is characterized in that a text to be processed is input into a trained neural network model, an entity implicit relationship labeling result is directly output, and the entity implicit relationship can be obtained without relationship reasoning, so that the accuracy of the obtained entity implicit relationship is improved.
Fig. 1 is a schematic view of an application scenario of the entity implicit relationship extraction method according to an embodiment of the present invention. The method provided by the application can be applied in the environment shown in fig. 1, where the terminal 101 communicates with the server 103 via the network 102. The terminal 101 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device, and the server 103 may be implemented as an independent server or as a cluster of servers.
For example, the embodiment of the invention can be used by public security organs for case investigation. The server 103 stores a large amount of multi-source heterogeneous data, most of which is case information. The terminal 101 sends an implicit relationship extraction request and case information to the server 103; after receiving the request, the server 103 extracts the implicit relationships between entities from the case information and returns the extraction result to the terminal 101.
Fig. 2 is a first flowchart of an implementation of the method for extracting an entity implicit relationship according to an embodiment of the present invention. The method may be applied to the terminal in fig. 1 or to a server; the description below takes the server as the executing party. As shown in fig. 2, the method in this embodiment includes:
step S201, obtaining a text to be processed, inputting the text to be processed into a sequence labeling model, wherein the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set, and training a neural network model according to the training sample set.
In the embodiment of the invention, the text to be processed is the text that requires entity implicit relationship labeling. The terminal sends an entity implicit relationship extraction request to the server, and the server generates a corresponding extraction task from the request; the server may also generate the task from preconfigured information. The extraction task contains attribute information of the text to be processed, such as a text name, a text path or a text link. The server obtains the text to be processed according to the text name or text path, or crawls it from the network according to the text link; the terminal may also send the text to be processed to the server directly. The acquired text may be in one or more formats including, but not limited to, TXT, Word and PDF.
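As an illustrative sketch of how the server might resolve the text to be processed from a task's attribute information (the key names "raw" and "path" are assumptions, not part of the patent; crawling a text link would additionally require an HTTP client):

```python
from pathlib import Path

def load_text_to_process(task: dict) -> str:
    """Resolve the to-be-processed text from a task's attribute info.
    The key names ("raw", "path") are illustrative assumptions."""
    if "raw" in task:            # the terminal sent the text directly
        return task["raw"]
    if "path" in task:           # a text name/path resolvable on the server
        return Path(task["path"]).read_text(encoding="utf-8")
    raise ValueError("task carries no usable text attribute")
```

A task such as `{"raw": "Zhang San works at Company A"}` would return the raw text unchanged.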
In the embodiments of the present invention, an implicit relationship is a relationship between an entity and an entity with which it is not directly associated. For example, if entity A has a relationship with entity B, and entity B has a relationship with entity C, then entity A has an implicit relationship with entity C. As another example, for the text "Zhang San works at Company A", the entity "Zhang San" has a working relationship with the entity "Company A"; for the text "Li Si works at Company A", the entity "Li Si" has a working relationship with the entity "Company A"; the implicit relationship between "Zhang San" and "Li Si" is then a co-worker relationship.
The sequence labeling model labels the entities that have implicit relationships in the text to be processed. It is obtained through the following steps: obtain a sample set with entity implicit relationship labels, preprocess the sample set to obtain a training sample set, and train a neural network model on the training sample set to obtain the sequence labeling model. The terminal sends the labeled sample set to the server, or sends its sample name or sample path, from which the server retrieves the corresponding sample set in the database. The server then converts the sample set into the data format required by the neural network model, yielding the training sample set.
The server preprocesses the text to be processed, converts the format of the text to be processed into a data format required by the sequence labeling model, inputs the preprocessed text to be processed into the sequence labeling model, and the sequence labeling model outputs an entity implicit relation labeling result.
And S202, determining the entity implicit relation of the text to be processed according to the output result of the sequence annotation model.
In the embodiment of the invention, the output result of the sequence labeling model is the entity implicit relationship labeling result of the text to be processed, and the entity implicit relationship of the text to be processed can be determined according to the labeling result.
According to the embodiment of the invention, the text to be processed is input into the sequence marking model, the sequence marking model outputs the entity implicit relation marking result of the text to be processed, and the entity implicit relation of the text to be processed can be determined according to the marking result.
Fig. 3 is a second implementation flowchart of the method for extracting an entity implicit relationship according to the embodiment of the present invention, where on the basis of the embodiment shown in fig. 2, the embodiment of the present invention further includes a step of obtaining a sample set with an entity implicit relationship label, and as shown in fig. 3, the method according to the embodiment of the present invention includes:
step S301, acquiring text data, and preprocessing the text data to obtain a sample set to be labeled.
In the embodiment of the present invention, the terminal transmits the text name, the text path, or the text link of the text data. The server can obtain corresponding text data in the database according to the text name or the text path. The server can also crawl corresponding text data in the network according to the text links. The format of the acquired text data includes, but is not limited to, data in one or more formats of TXT text, word text, PDF text.
The server preprocesses the text data in two steps: first, irrelevant words, sentences and punctuation marks are removed and traditional characters are converted to simplified characters; then the text data is segmented by sentence to obtain the sample set to be labeled.
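A minimal sketch of this preprocessing step, under the assumption that cleanup means stripping whitespace noise and splitting on sentence-ending punctuation (the traditional-to-simplified conversion needs an external mapping table and is omitted here):

```python
import re

SENTENCE_ENDINGS = "。！？!?"  # assumed terminator set, for illustration

def to_sample_set(text: str) -> list:
    """Strip whitespace noise and split raw text data by sentence,
    yielding a to-be-labeled sample set (one sentence per sample)."""
    text = re.sub(r"\s+", "", text)            # remove irrelevant whitespace
    parts = re.split("[%s]" % SENTENCE_ENDINGS, text)
    return [p for p in parts if p]             # drop empty fragments
```

For instance, `to_sample_set("张三在B公司工作过。李四于2016年3月入职B公司！")` yields two samples, one per sentence.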
Step S302, a sample set to be labeled is sent to a labeling terminal, and the sample set to be labeled is used for indicating a target person to label an implicit relationship of an entity with the implicit relationship in the sample set to be labeled.
And step S303, receiving the implicit relationship labeling result fed back by the labeling terminal to obtain a sample set with entity implicit relationship labels.
In the embodiment of the invention, the server sends the sample set to be labeled to the labeling terminal, and the target person manually labels the implicit relationship of the entity with the implicit relationship in the sample set to be labeled through the labeling terminal. And after the labeling is finished, the labeling terminal feeds back the sample set with the entity implicit relation label to the server.
The implicit relationship may be labeled manually in the BIO manner, where B marks the beginning of an entity or relationship, I marks a non-beginning part of an entity or relationship, and O marks a part that is neither. The labeling terminal may be a visual data labeling terminal built as a web page, or an existing data labeling platform on the market, and can label text in TXT, Word, PDF and other formats.
In the following, BIO implicit relationship labeling is illustrated with original text data taken from case information of a public security organ; part of the case text reads as follows:
zhang three, a man, 25 years old, went to Hotel A in 2019, 4, month 15, he worked at company B once, and worked at the research and development department for two years.
Lee, female, 32 years old, introduced into company B in 2016, 3 months, and worked to date at the research and development department.
Zhangli, female, 34 years old, enrollment C in 2017, 4 months, worked to date in the test division.
BIO labeling is then performed on the entity implicit relationship. From the case text it is clear that Zhang San and Company B are in a working relationship and that Li Si and Company B are in a working relationship as well; it follows that Zhang San and Li Si are in a co-worker relationship. Taking the person-entity labels produced by named entity recognition as an example, the prior art labels every person name in the text, for example: Zhang B-person San I-person, male O … Li B-person Si I-person, female O … Zhang B-person Li I-person, female O ….
The embodiment of the invention instead labels the implicit relationship between entities; the new data labeling format is: Zhang B-colleague San I-colleague, male O … Li B-colleague Si I-colleague, female O … Zhang O Li O, female O …. The new format indicates that Zhang San and Li Si are in a co-worker relationship, while Zhang Li is in a co-worker relationship with neither Zhang San nor Li Si.
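Rendered as (token, tag) pairs, the new labeling format can be sketched as follows; the romanized tokens stand in for the original Chinese characters and are purely illustrative:

```python
# One labeled sample in the new format: only Zhang San and Li Si carry
# the "colleague" tag; Zhang Li (at a different company) stays O.
sample = [
    ("Zhang", "B-colleague"), ("San", "I-colleague"), ("male", "O"),
    ("Li", "B-colleague"), ("Si", "I-colleague"), ("female", "O"),
    ("Zhang", "O"), ("Li", "O"), ("female", "O"),
]

# Tokens that open a "colleague" span, i.e. the parties the label
# marks as co-workers.
colleague_starts = [tok for tok, tag in sample if tag == "B-colleague"]
print(colleague_starts)  # ['Zhang', 'Li']
```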
In the embodiment of the invention, the text data is sent to the labeling terminal and the target person manually labels the implicit relationships; because both explicit and implicit relationships between entities are clear to the target person, labeling accuracy is improved.
Fig. 4 is a third flow chart for implementing the method for extracting an entity implicit relationship according to the embodiment of the present invention, on the basis of the embodiment shown in fig. 2, the embodiment of the present invention further includes a step of training a neural network model according to a training sample set to obtain a sequence labeling model, and as shown in fig. 4, the method according to the embodiment of the present invention includes:
step S401, the training sample set is coded according to the pre-training language model to obtain a coding vector.
And S402, training a BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model.
In the embodiment of the invention, the pre-training language model encodes each character in the training sample set to generate a coding vector that incorporates context information. Specifically, the pre-training language model may be the BERT (Bidirectional Encoder Representations from Transformers) language model. The server encodes each character of the labeled text with the first Transformer layer of the BERT model, passes the resulting character vectors to the second Transformer layer, which continues encoding, and so on until the last Transformer layer finishes and yields the final coding vector of each character, also called the character coding vector. During encoding, the BERT model uses the model parameters of every Transformer layer; making full use of all layer parameters can effectively improve the performance of extracting entity implicit relationships and allows sentence-level features to be learned.
In the embodiment of the invention, before the BERT language model encodes the sample set, the sample set needs to be preprocessed to obtain a training sample set. Specifically, as shown in fig. 5, first a [CLS] tag is added to the beginning of each sentence in the sample set and a [SEP] tag to the end of each sentence, with [SEP] tags connecting sentence pairs, converting the sample set into the format required by the BERT model. Then word embedding (Token embedding), sentence embedding (Segment embedding) and position embedding (Position embedding) are applied to the tagged sentences, yielding the training sample set.
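The tag insertion and the three summed embeddings can be sketched with toy stand-ins as follows; the 4-dimensional hand-rolled embeddings are assumptions purely for illustration (a real system looks them up in BERT's learned embedding tables):

```python
DIM = 4  # toy embedding width (assumption, for illustration)

def toy_embed(key):
    """Deterministic stand-in for a learned embedding lookup."""
    seed = sum(ord(c) for c in key)
    return [((seed * (i + 1)) % 7) / 7.0 for i in range(DIM)]

def build_bert_input(sentence_a, sentence_b):
    """Add [CLS]/[SEP] tags and sum word, sentence (segment) and
    position embeddings per token, as in the preprocessing above."""
    tokens = ["[CLS]"] + sentence_a + ["[SEP]"] + sentence_b + ["[SEP]"]
    segments = [0] * (len(sentence_a) + 2) + [1] * (len(sentence_b) + 1)
    vectors = []
    for pos, (tok, seg) in enumerate(zip(tokens, segments)):
        word = toy_embed(tok)               # word embedding (Token embedding)
        sent = toy_embed("seg%d" % seg)     # sentence embedding (Segment embedding)
        posn = toy_embed("pos%d" % pos)     # position embedding (Position embedding)
        vectors.append([w + s + p for w, s, p in zip(word, sent, posn)])
    return tokens, vectors

tokens, vectors = build_bert_input(["张", "三"], ["李", "四"])
```

Each position thus carries one vector that mixes what the token is, which sentence of the pair it belongs to, and where it sits in the sequence.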
The training sample set is encoded with the BERT language model to obtain coding vectors, and the coding vectors are input into the BiLSTM-CRF neural network model, which learns their features to produce the sequence labeling model.
The pre-training language model enhances the semantic information of the characters. Obtaining the coding vectors from the pre-training language model and feeding them into the BiLSTM-CRF neural network model for training requires little computer memory and keeps the training period short.
Fig. 6 is a fourth implementation flowchart of the method for extracting an entity implicit relationship according to the embodiment of the present invention, where on the basis of the embodiment shown in fig. 2, the embodiment of the present invention describes in detail one possible implementation manner of step S202, and as shown in fig. 6, the method according to the embodiment of the present invention includes:
step S601, determining the target entities with the same label according to the output result of the sequence label model.
Step S602, determining the implicit relationship between the target entities according to the implicit relationship corresponding to the label.
In the embodiment of the present invention, entities having the same label have an implicit relationship corresponding to the label.
For example, fig. 7 is a frame diagram of extracting implicit relationships with the BiLSTM-CRF neural network model according to an embodiment of the present invention. As shown in fig. 7, the text to be processed consists of the tokens w0 w1 w2 w3 w4 w5. Entity implicit relationship extraction is performed on it with the BiLSTM-CRF model, and from the extraction result it can be determined that [w0, w1] and [w3, w4] both carry the tag "colleague". Therefore the implicit relationship between [w0, w1] and [w3, w4] is the one corresponding to the tag "colleague"; for example, if the tag "colleague" corresponds to a co-worker relationship, then [w0, w1] and [w3, w4] are in a co-worker relationship.
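The decode-and-pair step of fig. 7 can be sketched as follows: BIO tags are decoded into entity spans, spans are grouped by tag, and each same-tagged pair is reported with the tag's implicit relation. The tag-to-relation mapping is an assumption for illustration:

```python
from itertools import combinations

def implicit_relations(tokens, tags, tag_to_relation):
    """Group BIO-decoded entity spans by their tag, then report each
    pair of same-tagged entities as holding the tag's implicit relation."""
    spans, current, cur_tag = {}, [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):                 # a new span opens
            if current:
                spans.setdefault(cur_tag, []).append("".join(current))
            current, cur_tag = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == cur_tag:
            current.append(tok)                  # span continues
        else:                                    # O, or a broken I-tag
            if current:
                spans.setdefault(cur_tag, []).append("".join(current))
            current, cur_tag = [], None
    if current:
        spans.setdefault(cur_tag, []).append("".join(current))
    return [(a, b, tag_to_relation[t])
            for t, ents in spans.items()
            for a, b in combinations(ents, 2)]

# w0..w5 with the tags from the framework example above
tokens = ["w0", "w1", "w2", "w3", "w4", "w5"]
tags = ["B-colleague", "I-colleague", "O", "B-colleague", "I-colleague", "O"]
rels = implicit_relations(tokens, tags, {"colleague": "co-worker"})
print(rels)  # [('w0w1', 'w3w4', 'co-worker')]
```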
Fig. 8 is a fifth implementation flowchart of the method for extracting an entity implicit relationship according to the embodiment of the present invention, where on the basis of the embodiment shown in fig. 4, the embodiment of the present invention describes in detail an implementation manner of step S402, and as shown in fig. 8, the method according to the embodiment of the present invention includes:
step S801, training the BiLSTM-CRF neural network model according to the coding vector to obtain a target sequence labeling model.
Step S802, a test sample set is obtained.
And step S803, testing the target sequence labeling model according to the test sample set.
Step S804, if the accuracy of the output result of the target sequence labeling model is greater than the preset threshold, determining the target sequence labeling model as the sequence labeling model.
In the embodiment of the invention, the terminal sends the attribute information of the test sample set to the server, including a test sample set name, path or link. The server may obtain the corresponding test sample set from the database according to the test sample set name or path, or crawl it from the network according to the test sample set link.
After obtaining the test sample set, the server inputs it into the target sequence labeling model for testing and sends the model's output together with the test sample set to the terminal. The target person then determines the accuracy of the entity implicit relationships extracted by the target sequence labeling model from this output and the test sample set; only when the accuracy is greater than the preset threshold is the target sequence labeling model adopted as the sequence labeling model.
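The acceptance check can be sketched as a tag-level accuracy comparison; the 0.9 default threshold is an illustrative assumption, since the patent leaves the preset threshold unspecified:

```python
def accept_model(predicted_tags, gold_tags, threshold=0.9):
    """Accept the target sequence labeling model only when its tag
    accuracy on the test sample set exceeds the preset threshold.
    The 0.9 default is an assumption for illustration."""
    assert len(predicted_tags) == len(gold_tags)
    correct = sum(p == g for p, g in zip(predicted_tags, gold_tags))
    accuracy = correct / len(gold_tags)
    return accuracy > threshold, accuracy

ok, acc = accept_model(["B-colleague", "I-colleague", "O", "O"],
                       ["B-colleague", "I-colleague", "O", "O"])
print(ok, acc)  # True 1.0
```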
According to the embodiment of the invention, the accuracy of entity implicit relationship extraction is improved by testing the target sequence labeling model.
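The acceptance test described above can be sketched as follows. The token-level accuracy metric, the threshold value, and the toy stand-in model are assumptions for illustration; the patent only states that a model is accepted when its accuracy on the test sample set exceeds a preset threshold.

```python
# Minimal sketch: run a candidate ("target") model over a held-out test set
# and promote it only when its accuracy exceeds a preset threshold.
# `model` is any callable mapping a sentence to a predicted tag sequence --
# an illustrative stand-in for the trained BiLSTM-CRF.

def evaluate_accuracy(model, test_samples):
    """Token-level accuracy of predicted tags against gold tags."""
    correct = total = 0
    for sentence, gold_tags in test_samples:
        pred_tags = model(sentence)
        correct += sum(p == g for p, g in zip(pred_tags, gold_tags))
        total += len(gold_tags)
    return correct / total if total else 0.0

def accept_model(model, test_samples, threshold=0.9):
    """Accept the candidate model only when accuracy exceeds the threshold."""
    return evaluate_accuracy(model, test_samples) > threshold

# Toy model that tags known entity words and leaves everything else as "O".
entity_words = {"w0", "w1", "w3", "w4"}
toy_model = lambda sent: ["ENT" if w in entity_words else "O" for w in sent]
test_set = [(["w0", "w2", "w3"], ["ENT", "O", "ENT"])]
print(evaluate_accuracy(toy_model, test_set))  # 1.0
print(accept_model(toy_model, test_set, threshold=0.9))  # True
```

In a real system the comparison would be done against labeled test samples returned from the annotation terminal, and a stricter span-level metric could be substituted without changing the acceptance logic.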
As an embodiment of the present invention, on the basis of the above embodiment of fig. 2, the method of the embodiment of the present invention may further include: and constructing a knowledge graph according to the entity implicit relation extraction result and the graph database.
In the embodiment of the invention, an existing knowledge graph includes only entities and the explicit relations between them. The embodiment of the invention completes the knowledge graph by supplementing the implicit relations between entities into it, so that the knowledge graph reflects entities and the relations between them more comprehensively.
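As an illustrative sketch of this completion step, the snippet below merges extracted implicit relations into a graph alongside existing explicit ones. A real deployment would write triples to a graph database, which the text does not specify; a plain dict-of-edges stands in for the graph store, and the entity and relation names are invented for the example.

```python
# Illustrative sketch: complete a knowledge graph by adding extracted
# implicit-relation triples next to the explicit relations already stored.

from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # head entity -> set of (relation, tail entity) edges
        self.edges = defaultdict(set)

    def add_triple(self, head, relation, tail):
        self.edges[head].add((relation, tail))

    def relations_of(self, head):
        return sorted(self.edges[head])

kg = KnowledgeGraph()
# Explicit relation already present in the graph:
kg.add_triple("Alice", "works_at", "AcmeCorp")
# Implicit relation supplied by the extraction step:
kg.add_triple("Alice", "colleague", "Bob")
print(kg.relations_of("Alice"))
# [('colleague', 'Bob'), ('works_at', 'AcmeCorp')]
```

Because edges are stored in a set, re-running the extraction over the same text does not duplicate triples, which mirrors how a graph database's merge semantics would behave.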
Fig. 9 is a first schematic structural diagram of an entity implicit relationship extraction apparatus according to an embodiment of the present invention. As shown in fig. 9, the entity implicit relationship extraction apparatus 900 includes: an input module 901 and a determination module 902.
The input module 901 is configured to acquire a text to be processed and input the text to be processed into the sequence labeling model. The sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
A determining module 902, configured to determine an entity implicit relationship of the text to be processed according to an output result of the sequence tagging model.
Fig. 10 is a second schematic structural diagram of an entity implicit relationship extraction apparatus according to an embodiment of the present invention. As shown in fig. 10, the entity implicit relationship extracting apparatus 900 further includes: a model training module 903 and a building module 904. The model training module 903 includes an acquisition sub-module 9031, a preprocessing sub-module 9032, a training sub-module 9033, and a testing sub-module 9034.
As an embodiment of the present invention, the obtaining sub-module 9031 is configured to obtain text data and preprocess the text data to obtain a sample set to be labeled; to send the sample set to be labeled to a labeling terminal, where the sample set to be labeled is used for instructing target personnel to perform implicit relationship labeling on entities with implicit relationships in the sample set; and to receive the implicit relationship labeling result fed back by the labeling terminal to obtain a sample set with entity implicit relationship labels.
As an embodiment of the present invention, the training submodule 9033 is configured to encode the training sample set according to the pre-training language model to obtain a coding vector, and to train the BiLSTM-CRF neural network model according to the coding vector to obtain the sequence labeling model.
As an embodiment of the present invention, the pre-trained language model is a BERT language model. The preprocessing submodule 9032 is configured to add a [CLS] tag to the beginning of each sentence in the sample set, add an [SEP] tag to the end of each sentence, and connect sentence pairs with [SEP] tags; and to perform word embedding processing, sentence embedding processing and position embedding processing on the sentences to which the tags have been added.
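The input assembly described above can be sketched as follows. The toy token lists are invented for the example, and embedding lookups are reduced to the three id sequences (token, segment, position) that BERT's word, sentence and position embeddings consume; a real system would use BERT's own tokenizer and vocabulary.

```python
# Illustrative sketch of BERT-style input assembly: prepend [CLS], append
# [SEP] (joining sentence pairs with [SEP]), then produce the sequences
# that feed the word, sentence (segment) and position embeddings.

def build_bert_inputs(sentence_a, sentence_b=None):
    tokens = ["[CLS]"] + sentence_a + ["[SEP]"]
    segments = [0] * len(tokens)          # sentence-embedding ids for sentence A
    if sentence_b:
        tokens += sentence_b + ["[SEP]"]
        segments += [1] * (len(sentence_b) + 1)  # ids for sentence B + its [SEP]
    positions = list(range(len(tokens)))  # position-embedding ids
    return tokens, segments, positions

tokens, segments, positions = build_bert_inputs(["he", "works"], ["she", "studies"])
print(tokens)     # ['[CLS]', 'he', 'works', '[SEP]', 'she', 'studies', '[SEP]']
print(segments)   # [0, 0, 0, 0, 1, 1, 1]
print(positions)  # [0, 1, 2, 3, 4, 5, 6]
```

The three parallel sequences are what the word, sentence and position embedding layers each index into before their outputs are summed to form the encoder input.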
As an embodiment of the present invention, the determining module 902 is specifically configured to determine target entities with the same label tag according to an output result of the sequence label model; and determining the implicit relation between the target entities according to the implicit relation corresponding to the labeled labels.
As an embodiment of the present invention, the training submodule 9033 is configured to train the neural network model according to a training sample set, so as to obtain a target sequence labeling model. The test sub-module 9034 is configured to obtain a test sample set; to test the target sequence labeling model according to the test sample set; and, if the accuracy of the output result of the target sequence labeling model is greater than a preset threshold, to determine the target sequence labeling model as the sequence labeling model.
As an embodiment of the present invention, the construction module 904 is configured to construct a knowledge graph according to the entity implicit relationship extraction result and the graph database.
The entity implicit relationship extraction device provided in the embodiment of the present invention may be used to implement the method embodiments described above, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 11 is a schematic diagram of a hardware structure of an entity implicit relationship extraction device according to an embodiment of the present invention. As shown in fig. 11, the entity implicit relationship extracting device 1100 provided in this embodiment includes: at least one processor 1101 and a memory 1102. The entity implicit relationship extraction device 1100 also includes a communication component 1103. The processor 1101, the memory 1102, and the communication component 1103 are connected by a bus 1104.
In a particular implementation, the at least one processor 1101 executes computer-executable instructions stored in the memory 1102, causing the at least one processor 1101 to perform the entity implicit relationship extraction method as performed by the entity implicit relationship extraction device 1100.
For a specific implementation process of the processor 1101, reference may be made to the above method embodiments, which implement similar principles and technical effects, and details of this embodiment are not described herein again.
In the embodiment shown in fig. 11, it should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of hardware and software modules within the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The application also provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the entity implicit relationship extraction method performed by the entity implicit relationship extraction device is implemented.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An entity implicit relationship extraction method, comprising:
acquiring a text to be processed, and inputting the text to be processed into a sequence labeling model;
determining an entity implicit relationship of the text to be processed according to an output result of the sequence labeling model;
the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
2. The method of claim 1, further comprising the step of obtaining a sample set with entity implicit relationship labels, as follows:
acquiring text data, and preprocessing the text data to obtain a sample set to be labeled;
sending the sample set to be labeled to a labeling terminal, wherein the sample set to be labeled is used for instructing a target person to perform implicit relationship labeling on entities with implicit relationships in the sample set to be labeled;
and receiving the implicit relationship labeling result fed back by the labeling terminal to obtain a sample set with entity implicit relationship labeling.
3. The method of claim 1, further comprising the step of training a neural network model according to the training sample set to obtain a sequence labeling model, as follows:
coding the training sample set according to a pre-training language model to obtain a coding vector;
and training the BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model.
4. The method of claim 3,
the pre-training language model is a BERT language model;
further comprising the step of pre-processing the sample set as follows:
adding [ CLS ] labels to the beginning of each sentence in the sample set, adding [ SEP ] labels to the end of each sentence, and connecting sentence pairs by using the [ SEP ] labels;
and performing word embedding processing, sentence embedding processing and position embedding processing on the sentence to which the label is added.
5. The method of claim 1, wherein the determining the entity implicit relationship of the text to be processed according to the output result of the sequence labeling model comprises:
determining target entities with the same labeling labels according to the output result of the sequence labeling model;
and determining the implicit relation between the target entities according to the implicit relation corresponding to the label tag.
6. The method of claim 3, wherein training the BiLSTM-CRF neural network model according to the coding vector to obtain a sequence labeling model comprises:
training a BiLSTM-CRF neural network model according to the coding vector to obtain a target sequence labeling model;
obtaining a test sample set;
testing the target sequence labeling model according to the test sample set;
and if the accuracy of the output result of the target sequence labeling model is greater than a preset threshold value, determining the target sequence labeling model as the sequence labeling model.
7. The method of any of claims 1 to 6, further comprising:
and constructing a knowledge graph according to the entity implicit relation extraction result and the graph database.
8. An entity implicit relationship extraction apparatus, comprising:
the input module is used for acquiring a text to be processed and inputting the text to be processed into the sequence labeling model;
the determining module is used for determining the entity implicit relationship of the text to be processed according to the output result of the sequence marking model;
the sequence labeling model is obtained by preprocessing a sample set with entity implicit relation labels to obtain a training sample set and training a neural network model according to the training sample set.
9. An entity implicit relationship extraction device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the entity implicit relationship extraction method of any of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored thereon, which, when executed by a processor, implement the entity implicit relationship extraction method of any of claims 1 to 7.
CN202010236475.6A 2020-03-30 2020-03-30 Method and equipment for extracting entity implicit relationship Pending CN113468335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010236475.6A CN113468335A (en) 2020-03-30 2020-03-30 Method and equipment for extracting entity implicit relationship


Publications (1)

Publication Number Publication Date
CN113468335A true CN113468335A (en) 2021-10-01

Family

ID=77864867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010236475.6A Pending CN113468335A (en) 2020-03-30 2020-03-30 Method and equipment for extracting entity implicit relationship

Country Status (1)

Country Link
CN (1) CN113468335A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291687A (en) * 2017-04-27 2017-10-24 同济大学 It is a kind of based on interdependent semantic Chinese unsupervised open entity relation extraction method
CN108874878A (en) * 2018-05-03 2018-11-23 众安信息技术服务有限公司 A kind of building system and method for knowledge mapping
CN108920461A (en) * 2018-06-26 2018-11-30 武大吉奥信息技术有限公司 A kind of polymorphic type and entity abstracting method and device containing complex relationship
KR20190019661A (en) * 2017-08-18 2019-02-27 동아대학교 산학협력단 Method for Natural Langage Understanding Based on Distribution of Task-specific Labels
CN109446523A (en) * 2018-10-23 2019-03-08 重庆誉存大数据科技有限公司 Entity attribute extraction model based on BiLSTM and condition random field
CN109885691A (en) * 2019-01-08 2019-06-14 平安科技(深圳)有限公司 Knowledge mapping complementing method, device, computer equipment and storage medium
CN110046252A (en) * 2019-03-29 2019-07-23 北京工业大学 A kind of medical textual hierarchy method based on attention mechanism neural network and knowledge mapping
CN110209836A (en) * 2019-05-17 2019-09-06 北京邮电大学 Remote supervisory Relation extraction method and device
CN110570920A (en) * 2019-08-20 2019-12-13 华东理工大学 Entity and relationship joint learning method based on attention focusing model
CN110598000A (en) * 2019-08-01 2019-12-20 达而观信息科技(上海)有限公司 Relationship extraction and knowledge graph construction method based on deep learning model
CN110910243A (en) * 2019-09-26 2020-03-24 山东佳联电子商务有限公司 Property right transaction method based on reconfigurable big data knowledge map technology


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
翟社平 et al.: "Knowledge graph entity extraction method based on BiLSTM-CRF", Computer Applications and Software, 31 May 2019 (2019-05-31) *

Similar Documents

Publication Publication Date Title
CN110287480B (en) Named entity identification method, device, storage medium and terminal equipment
JP2023529939A (en) Multimodal POI feature extraction method and apparatus
CN111709240A (en) Entity relationship extraction method, device, equipment and storage medium thereof
CN114297394B (en) Method and electronic equipment for extracting event arguments in text
CN110134780B (en) Method, device, equipment and computer readable storage medium for generating document abstract
CN113724819B (en) Training method, device, equipment and medium for medical named entity recognition model
CN115952791A (en) Chapter-level event extraction method, device and equipment based on machine reading understanding and storage medium
CN112085091B (en) Short text matching method, device, equipment and storage medium based on artificial intelligence
CN114724166A (en) Title extraction model generation method and device and electronic equipment
CN116912847A (en) Medical text recognition method and device, computer equipment and storage medium
CN115983271A (en) Named entity recognition method and named entity recognition model training method
CN114357167A (en) Bi-LSTM-GCN-based multi-label text classification method and system
CN117038099A (en) Medical term standardization method and device
CN116774973A (en) Data rendering method, device, computer equipment and storage medium
CN116453125A (en) Data input method, device, equipment and storage medium based on artificial intelligence
CN116595023A (en) Address information updating method and device, electronic equipment and storage medium
CN113468335A (en) Method and equipment for extracting entity implicit relationship
CN110826330B (en) Name recognition method and device, computer equipment and readable storage medium
CN114417891A (en) Reply sentence determination method and device based on rough semantics and electronic equipment
CN113901815A (en) Emergency working condition event detection method based on dam operation log
CN114398492B (en) Knowledge graph construction method, terminal and medium in digital field
CN113792539B (en) Entity relationship classification method and device based on artificial intelligence, electronic equipment and medium
CN116151219A (en) Named entity identification-based bid winning data analysis treatment method, device and medium
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
CN113157866B (en) Data analysis method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination