CN114328978B - Relationship extraction method, device, equipment and readable storage medium - Google Patents

Relationship extraction method, device, equipment and readable storage medium

Info

Publication number
CN114328978B
CN114328978B · CN202210228412.5A
Authority
CN
China
Prior art keywords
relation
training
relationship
language model
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210228412.5A
Other languages
Chinese (zh)
Other versions
CN114328978A (en)
Inventor
毛震东
张勇东
付艺硕
高杰
徐本峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Artificial Intelligence of Hefei Comprehensive National Science Center filed Critical Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202210228412.5A priority Critical patent/CN114328978B/en
Publication of CN114328978A publication Critical patent/CN114328978A/en
Application granted granted Critical
Publication of CN114328978B publication Critical patent/CN114328978B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses a relation extraction method, a device, equipment and a readable storage medium, wherein the method comprises the following steps: receiving natural language data input by a user; and inputting the natural language data into a first language model to obtain entity relationship data, wherein the entity relationship data is used for representing entity relationships in the natural language data, and the first language model is obtained by training a predetermined second language model with entity relationship matching (ERM) and the masked language model (MLM) as pre-training tasks to obtain a third language model, and then training the third language model with a prediction relationship formula. By not taking NSP as the pre-training task of the language model, the method and the device avoid the problem that the language model cannot gain a performance improvement on the relation extraction task from NSP, and improve the data processing capability of the language model.

Description

Relationship extraction method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a method, an apparatus, a device, and a readable storage medium for extracting relationships.
Background
With the rapid development of technologies such as the Internet and cloud computing, information data show the characteristics of high-speed growth and huge scale, and the information generated by and available to human beings grows at an exponential rate.
Relation extraction technology has developed on this basis, and conventional relation extraction technology mostly uses NSP (Next Sentence Prediction) as the pre-training task.
At the implementation level, NSP is not adapted to the downstream relation extraction task, so the model cannot obtain a performance improvement on the relation extraction task from NSP.
Disclosure of Invention
The main purpose of the present application is to train the language model with Entity Relationship Matching (ERM) as the pre-training task, so as to better adapt to the downstream relationship extraction task and thereby solve the technical problem in the prior art that the relationship extraction capability of the language model cannot be further improved.
To achieve the above object, the present application provides a relationship extraction method, including:
receiving natural language data input by a user;
inputting the natural language data into a first language model to obtain entity relationship data, wherein the entity relationship data is used for representing entity relationships in the natural language data, and the first language model is obtained by training a predetermined second language model with entity relationship matching (ERM) and the masked language model (MLM) as pre-training tasks to obtain a third language model, and then training the third language model by using a prediction relationship formula.
Illustratively, before receiving the user-input natural language data, the method includes:
acquiring an ERM pre-training sample, an ERM pre-training loss function and a prediction relation formula;
training the second language model based on the ERM pre-training sample and the ERM pre-training loss function to obtain a third language model;
and training the third language model based on the prediction relation formula to obtain the first language model.
Illustratively, the obtaining the ERM pre-training sample includes:
acquiring pre-training data;
segmenting the pre-training data to obtain a first sample;
extracting entity pairs contained in the first sample, and determining a relationship label according to the relationship between the entity pairs, wherein the entity pairs comprise a first entity and a second entity;
inserting a first special label and a second special label into two sides of the first entity respectively, and inserting a third special label and a fourth special label into two sides of the second entity respectively to obtain a second sample, wherein the first special label, the second special label, the third special label and the fourth special label are used for marking the entity pair;
and combining the second sample and the relationship label in the form [CLS] second sample [SEP] relationship label [SEP] to obtain the ERM pre-training sample, wherein [CLS] is placed at the head for classification, and [SEP] is used for separation.
Illustratively, the obtaining the ERM pre-training loss function includes:
obtaining a sample vector, wherein the sample vector is obtained by encoding a part of the ERM pre-training sample;
acquiring a first hidden layer, wherein the first hidden layer is obtained by coding a first special label through the second language model, and the first special label is contained in the ERM pre-training sample;
acquiring a second hidden layer, wherein the second hidden layer is obtained by encoding a third special label through the second language model, and the third special label is contained in the ERM pre-training sample;
combining the first hidden layer and the second hidden layer to obtain relationship information between entity pairs in the ERM pre-training sample;
inputting the relation information into a linear layer to obtain a prediction score;
inputting the prediction score into a regression function to obtain a prediction probability;
and obtaining the ERM pre-training loss function based on the sample vector and the prediction probability.
Illustratively, the obtaining a predictive relationship formula includes:
splitting a second relation matrix based on a relation type, a meta relation number and a vector dimension of the meta relation to obtain a third relation matrix and a fourth relation matrix, wherein the relation type, the meta relation number, the vector dimension of the meta relation and the second relation matrix are preset;
processing the third relation matrix and the fourth relation matrix to obtain a relation vector collection;
combining the relation vector collection to obtain a first relation matrix;
and inputting the first relation matrix, the third relation matrix, the fourth relation matrix and relation information into a function to obtain the prediction relation formula, wherein the relation information is contained in the ERM pre-training loss function.
Exemplarily, the processing the third relation matrix and the fourth relation matrix to obtain a relation vector collection includes:
and multiplying each row of the third relation matrix by the fourth relation matrix to obtain the relation vector collection.
Exemplarily, the processing the third relation matrix and the fourth relation matrix to obtain a relation vector collection further includes:
acquiring a fifth special label, wherein the fifth special label is the concatenation of the vector representations of the first special label corresponding to each layer in the second language model, and the first special label is included in the ERM pre-training sample;
acquiring a sixth special label, wherein the sixth special label is the concatenation of the vector representations of the third special label corresponding to each layer in the second language model, and the third special label is included in the ERM pre-training sample;
after the fifth special label and the sixth special label are combined, inputting the combined label into a full connection layer to obtain a query vector collection;
multiplying the query vector collection by each row of the fourth relation matrix, and then inputting the result into a regression function to obtain an attention score collection;
and multiplying the attention score set by the fourth relation matrix to obtain the relation vector set.
Illustratively, to achieve the above object, the present application further provides a relationship extraction apparatus, comprising:
the receiving module is used for receiving natural language data input by a user;
the input module is used for inputting the natural language data into a first language model to obtain entity relationship data, wherein the entity relationship data is used for representing entity relationships in the natural language data, and the first language model is obtained by training a predetermined second language model by taking an entity relationship matching ERM and a mask language model MLM as pre-training tasks to obtain a third language model and then training the third language model by using a prediction relationship formula.
Illustratively, to achieve the above object, the present application further provides a relationship extraction device, including: a memory, a processor and a relationship extraction program stored on the memory and executable on the processor, the relationship extraction program when executed by the processor implementing the steps of the relationship extraction method as described above.
Illustratively, to achieve the above object, the present application further provides a computer readable storage medium having a relationship extraction program stored thereon, which when executed by a processor implements the steps of the relationship extraction method as described above.
Compared with the prior art, in which the NSP pre-training task commonly used by language models is not adapted to the downstream relation extraction task so that the language model cannot further improve its relation extraction capability from the NSP pre-training task, the present application receives natural language data input by a user and inputs the natural language data into a first language model to obtain entity relation data, wherein the entity relation data is used for representing the entity relations in the natural language data, and the first language model is obtained by training a predetermined second language model with entity relation matching ERM and the masked language model MLM as pre-training tasks to obtain a third language model, and then training the third language model with a prediction relation formula, so as to better adapt to the downstream relation extraction task. Instead of using NSP as the pre-training task of the language model, the present application uses ERM as the pre-training task: the language model is first given common training for the general relation extraction task using relation matching samples, and then given targeted training for a specific relation extraction task, which improves the training efficiency of the model and the accuracy of relation extraction, and thus improves the relation extraction capability of the language model. Therefore, the present application avoids the situation in which the language model is trained with NSP as the pre-training task while NSP is not adapted to the downstream relation extraction task, and further improves the relation extraction capability of the language model.
Drawings
FIG. 1 is a schematic flow chart of a first embodiment of the relationship extraction method of the present application;
FIG. 2 is a schematic flow chart of language model training in the relationship extraction method of the present application;
FIG. 3 is a schematic flow chart of a preferred embodiment of the relationship extraction method of the present application;
FIG. 4 is a distribution diagram of a relationship matrix in a relationship space in the relationship extraction method of the present application;
FIG. 5 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1 and 2, fig. 1 is a schematic flow chart of a first embodiment of the relationship extraction method of the present application, and fig. 2 is a schematic flow chart of language model training in the relationship extraction method of the present application.
The embodiments of the present application provide a relationship extraction method. It should be noted that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that in the flowchart. The relationship extraction method includes:
in step S110, natural language data input by a user is received.
Natural language generally refers to a language that evolves naturally with culture; Chinese, English, Japanese and the like all belong to natural languages. With the development of computer technology, enabling natural language and computer language to interoperate has become a core subject of data processing and artificial intelligence. Natural Language Processing (NLP) technology was born on this basis, and its main task is to convert unstructured natural language into a structured computer language.
Step S120, inputting the natural language data into a first language model to obtain entity relationship data, wherein the entity relationship data is used for representing entity relationships in the natural language data, the first language model is obtained by taking an entity relationship matching ERM and a mask language model MLM as pre-training tasks to train a predetermined second language model to obtain a third language model, and then training the third language model by using a prediction relationship formula.
The second language model may be BERT (Bidirectional Encoder Representations from Transformers, a pre-trained language representation model), or may be another language model capable of converting an unstructured language into a structured language, as long as the unstructured language can be converted into the structured language; this embodiment is not limited in this respect.
Taking the example that the second language model is BERT, the BERT is divided into a pre-training stage and a fine-tuning stage; the pre-training stage is to train partial models of the middle and bottom layers and the commonalities of the downstream tasks in advance; the fine tuning stage is used for obtaining an adaptive model for a specific downstream task more quickly by combining the model trained in the pre-training stage with an additional output layer. By using the BERT, the parameter quantity of the model to be learned can be greatly reduced, and the model training efficiency is improved.
The first language model is obtained by firstly pre-training the second language model by taking ERM and MLM as pre-training tasks to obtain a third language model and then carrying out fine-tuning training on the third language model by utilizing a predictive relation formula so as to convert natural language data input by a user into computer language data and further extract entity relation data from the natural language data.
The entity relationship data is used for representing entity relationships in natural language data, namely entity relationships which a user wants to extract from input natural language data.
Illustratively, before receiving the natural language data input by the user, the method includes:
step S210, an ERM pre-training sample, an ERM pre-training loss function and a prediction relation formula are obtained.
Step S220, training the second language model based on the ERM pre-training sample and the ERM pre-training loss function to obtain the third language model.
Step S230, training the third language model based on the predictive relationship formula to obtain the first language model.
The ERM pre-training sample is used for commonality training of the second language model, where commonality training refers to learning the underlying, common content of all relation extraction tasks, for example: extracting book names and author names from text, or extracting company names and the names of the regions where the companies are located from text; the common content is segmenting the text into words and searching for keywords in the text. The ERM pre-training loss function is used for improving the pre-training accuracy of the second language model and reducing the pre-training error of the second language model. The prediction relation formula is used for targeted training of the third language model, where targeted training mainly refers to training for a specific relation extraction task, for example: extracting the name of a parent company and the name of a subsidiary from text, and training specifically on the shareholder relationships and transaction relationships between companies, so as to improve the relation extraction capability of the language model.
Illustratively, the obtaining the ERM pre-training sample includes: pre-training data is acquired.
The pre-training data is selected from a corpus; the corpus may be the English Wikipedia corpus or the BooksCorpus, and the specific corpus is not limited in this embodiment.
And segmenting the pre-training data to obtain a first sample.
The first sample is obtained by segmenting the pre-training data into sentences to obtain a plurality of example sentences; dividing each example sentence into words to obtain a plurality of words; and combining the obtained words into a word collection to obtain the first sample. For example: the text is divided into a plurality of sentences according to periods, exclamation marks and semicolons, each sentence is divided into a plurality of words, the words obtained from each sentence are combined into a word collection, and the collections from the plurality of sentences are combined to form the data sample.
In addition, the pre-training sample data can also be segmented according to paragraphs to obtain a plurality of paragraph examples; segmenting each paragraph example according to words to obtain a plurality of words; combining the obtained multiple words to form a word collection to obtain a first sample.
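For illustration only, the following minimal Python sketch shows one way the segmentation described above could be performed; the punctuation set, the whitespace tokenization and the example text are simplifying assumptions and are not part of the method itself.

```python
# Illustrative sketch of building the first sample (assumed punctuation and tokenization).
import re

def build_first_sample(pretraining_text):
    # Split into example sentences on periods, exclamation marks and semicolons,
    # then split each example sentence into words; each sentence yields one word collection.
    sentences = [s.strip() for s in re.split(r"[.!?;]", pretraining_text) if s.strip()]
    return [sentence.split() for sentence in sentences]

first_sample = build_first_sample(
    "Beijing is the capital of China. The headquarters of the alpha company is located in the beta city!")
# -> [['Beijing', 'is', 'the', 'capital', 'of', 'China'],
#     ['The', 'headquarters', 'of', 'the', 'alpha', 'company', 'is', 'located', 'in', 'the', 'beta', 'city']]
```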
And using an NER (Named Entity Recognition) tool to mark out Entity pairs contained in the first sample, and determining a relationship label according to the relationship between the Entity pairs, wherein the Entity pairs comprise a first Entity and a second Entity.
The first entity and the second entity are words with entity meaning in the first sample, an entity relationship exists between the first entity and the second entity, and together they form an entity pair with that entity relationship. For example: in "the headquarters of the alpha company is located in the beta city", the first entity is "the alpha company", the second entity is "the beta city", and the entity relationship between "the alpha company" and "the beta city" is marked as "located in"; in "Beijing is the capital of China", the first entity is "Beijing", the second entity is "China", and the entity relationship between "Beijing" and "China" is marked as "capital of".
Inserting a first special label and a second special label into two sides of the first entity respectively, and inserting a third special label and a fourth special label into two sides of the second entity respectively to obtain a second sample, wherein the first special label, the second special label, the third special label and the fourth special label are used for marking the entity pair;
in order to mark a first entity and a second entity with entity relationship in a first sample, a first special label and a second special label are respectively inserted at two ends of the first entity, and a third special label and a fourth special label are respectively inserted at two ends of the second entity. After inserting the special label into the first sample, a second sample is obtained.
And combining the second sample and the relation label in the form [CLS] second sample [SEP] relation label [SEP] to obtain the ERM pre-training sample, wherein [CLS] is placed at the head for classification, and [SEP] is used for separation.
The relationship label is obtained by marking the entity relationship between the entity pair; the entity relationships in the first sample are divided into two parts in a 1:1 ratio, the relationship label corresponding to one part of the entity relationships is real, and the relationship label corresponding to the other part is false. For example: the entity relationship between "the alpha company" and "the beta city" is marked as "located in", and the relationship label corresponding to this entity relationship is real; the entity relationship between "Beijing" and "China" is marked as "province", and the relationship label corresponding to this entity relationship is false. The comparison samples are set in this way to improve the accuracy of relation extraction by the language model.
And combining the second sample and the relationship label to obtain an ERM pre-training sample, in the following form: [CLS] second sample [SEP] relationship label [SEP], wherein the [CLS] mark is placed at the head of the first sentence for classification, and [SEP] is used to separate the two input segments.
It can be understood that the processing procedure for processing the obtained pre-training samples is as follows: firstly, segmenting a pre-training sample according to sentences to obtain a plurality of example sentences; each example sentence is segmented according to words to obtain a plurality of word collections; the resulting word collections are combined to obtain a first sample. Extracting words with entity relations in the first sample, marking the words as a first entity and a second entity, combining the first entity and the second entity to obtain an entity pair, and determining a relation label according to the actual relation between the entity pair; in order to enable the model to obtain a better training effect, when the relation label is determined, a reference group is set, namely, half of the entity relation in the first sample is marked as a real relation label and is marked as a matching sample; and marking the other half entity relationship as a false relationship label and marking as a mismatch sample.
In order to locate the first entity and the second entity more accurately among the many words, the first special label and the second special label are inserted on the two sides of the first entity, and the third special label and the fourth special label are inserted on the two sides of the second entity, so as to obtain the second sample; the obtained second sample and the relationship label are then combined in the form [CLS] second sample [SEP] relationship label [SEP] to obtain the ERM pre-training sample.
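As an illustration of this construction, the following Python sketch inserts the special labels around a toy entity pair and combines the result with a relation name in the [CLS] … [SEP] … [SEP] form; the function name, the entity span indices and the example relation names are assumptions made purely for illustration.

```python
# Illustrative sketch of ERM pre-training sample construction (toy data and helper names).
import random

def build_erm_sample(words, e1_span, e2_span, relation_name, is_match):
    """Insert [E1]/[\\E1] around the first entity and [E2]/[\\E2] around the second
    entity, then combine the tagged sentence with the relation name as
    [CLS] sentence [SEP] relation-name [SEP]."""
    s1, t1 = e1_span   # start/end word indices of the first entity (inclusive)
    s2, t2 = e2_span   # start/end word indices of the second entity (inclusive)
    tagged = []
    for i, w in enumerate(words):
        if i == s1:
            tagged.append("[E1]")
        if i == s2:
            tagged.append("[E2]")
        tagged.append(w)
        if i == t1:
            tagged.append("[\\E1]")
        if i == t2:
            tagged.append("[\\E2]")
    text = " ".join(["[CLS]"] + tagged + ["[SEP]", relation_name, "[SEP]"])
    return text, ("isMatch" if is_match else "notMatch")

# Half of the samples keep the true relation name, half receive a false one.
words = ["Beijing", "is", "the", "capital", "of", "China"]
use_true = random.random() < 0.5
sample, label = build_erm_sample(words, e1_span=(0, 0), e2_span=(5, 5),
                                 relation_name="capital" if use_true else "province",
                                 is_match=use_true)
```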
Compared with the NSP pre-training sample in the prior art, the ERM pre-training sample is constructed on the basis of entity relationship matching, and is divided into the equal amount of matching sample and unmatched sample, so that the second language model can learn the common content extracted from the relationship in the ERM pre-training sample more quickly, the problem that the previous language model needs a large amount of pre-training samples to learn is avoided, and the learning efficiency of the language model is improved.
Illustratively, the obtaining the ERM pre-training loss function includes: and acquiring a sample vector, wherein the sample vector is obtained by encoding a part of the ERM pre-training samples.
The sample vector may be obtained by encoding the matching samples in the first sample with one-hot encoding, or by encoding the matching samples with another encoding method; this embodiment is not limited in this respect.
And acquiring a first hidden layer, wherein the first hidden layer is obtained by encoding a first special label through the second language model, and the first special label is contained in the ERM pre-training sample.
And acquiring a second hidden layer, wherein the second hidden layer is obtained by encoding a third special label through the second language model, and the third special label is contained in the ERM pre-training sample.
The hidden layer is a multi-level abstraction of input features to better linearly partition different types of data. And inputting the first special label and the third special label into a second language model encoder for encoding to respectively obtain a first hidden layer and a second hidden layer.
And combining the first hidden layer and the second hidden layer to obtain the relationship information between the entity pairs in the ERM pre-training sample.
And inputting the relation information into a linear layer to obtain a prediction score.
The linear layer performs a linear transformation on the input data, where a linear transformation maps a vector in one vector space to a vector in another vector space, that is, the input vector is linearly transformed and another vector is output. For example: the linear transformation γ = Fχ + G, where χ and γ are variables and F and G are constants, means that χ is linearly transformed and then mapped to γ.
And inputting the prediction score into a regression function to obtain the prediction probability.
The regression function may be the Softmax regression function, or may be another regression function that implements binary classification; this embodiment is not particularly limited. Binary classification means that the classification task has two categories; in this embodiment, the two categories are that the input natural language data has the relation and that it does not.
The Softmax regression function is a generalization of logistic regression to multi-class classification, and its role is to realize multi-category classification without building multiple binary classifiers. The idea of the Softmax classifier is that, for a new sample, the Softmax regression model calculates a score for each class, the Softmax function then calculates a probability value, and the class the sample belongs to is determined according to the final probability value.
And obtaining the ERM pre-training loss function based on the sample vector and the prediction probability.
The effect of the loss function is to minimize the error per training case during machine learning. The constructed loss function may be a cross entropy loss function or a regression loss function, as long as the minimization of the training error can be achieved, and the embodiment is not particularly limited.
The ERM pre-training loss function is obtained by transposing the sample vector, multiplying it by the logarithm of the prediction probability, summing, and taking the negative of the result.
It can be understood that the pre-training samples are the material for model training: the model is trained on the pre-training samples to quickly form its language processing capability. To reduce the learning error of the model during learning, however, a pre-training loss function needs to be constructed in the model training process so as to minimize the training error.
The ERM pre-training loss function is constructed as follows: the matching samples are encoded with one-hot encoding to obtain the sample vector; the first special label and the third special label are input into the encoder of the second language model, and the first hidden layer and the second hidden layer are obtained after encoding by the encoder; the first hidden layer and the second hidden layer are combined to obtain the relationship information between the entity pair; the obtained relationship information is input into a linear layer of the second language model for linear transformation to obtain the prediction score; the prediction score is input into a regression function for regression analysis to obtain the prediction probability; finally, the obtained sample vector is transposed, multiplied by the logarithm of the prediction probability and summed, and the negative of the result gives the ERM pre-training loss function.
Illustratively, the obtaining a predictive relationship formula includes: splitting the second relation matrix based on the relation type, the element relation quantity and the vector dimension of the element relation to obtain a third relation matrix and a fourth relation matrix, wherein the relation type, the element relation quantity, the vector dimension of the element relation and the second relation matrix are preset.
The role of the prediction relation formula is to make the third language model adapt more quickly to a specific relation extraction task: after the training on the underlying, common tasks is completed, the model is combined with the prediction relation formula for targeted relation extraction training, so that the language model can better complete the relation extraction task.
The core of constructing the prediction relation formula is the construction of the relation matrix. A conventional deep learning library usually initializes the relation matrix with only one statistical distribution, which leads to a fairly regular distribution of the relation matrix in the relation space and is obviously unreasonable: the relation representation is strongly associated with the entity types of the entity pair, the vector representations of different relations should present a cluster-like distribution, and relations whose entity pairs have the same types should be grouped into a cluster. For example: the relation x between a class A entity and a class B entity is closer in the relation space to the relation y between a class A entity and a class B entity, while the relation x between a class A entity and a class B entity is farther from the relation z between a class A entity and a class C entity.
Based on this, in order to optimize the distribution of the relationship identifiers, the relationship type, the meta relationship number, and the vector dimension of the meta relationship need to be preset, and the preset second relationship matrix is split according to the preset relationship type, the meta relationship number, and the vector dimension of the meta relationship, so as to obtain the third relationship matrix and the fourth relationship matrix.
And processing the third relation matrix and the fourth relation matrix to obtain a relation vector collection.
Illustratively, each row of the third relationship matrix is multiplied by the fourth relationship matrix to obtain the relationship vector collection.
And combining the relation vector collection to obtain a first relation matrix.
And inputting the first relation matrix, the third relation matrix, the fourth relation matrix and relation information into a function to obtain the prediction relation formula, wherein the relation information is contained in the ERM pre-training loss function.
It can be understood that, in order to optimize the distribution of the relationship matrix, the relationship type S, the number of meta relationships A, and the vector dimension D of the meta relationships need to be preset. Based on this, the preset second relationship matrix W is split into W1 and W2, where W ∈ R^(S×D), W1 ∈ R^(S×A), W2 ∈ R^(A×D), and W = W1 × W2; here R denotes a matrix, S×D denotes S rows and D columns, S×A denotes S rows and A columns, and A×D denotes A rows and D columns. Each row vector of W1 is multiplied by W2, i.e. W′(i) = W1(i) × W2, where i is a variable whose meaning is the index of the meta relationship vector. The W′(i) are combined to obtain W′ = [W′(1), W′(2), ⋯, W′(S)]^T, where T denotes transposing the matrix; W′ is the first relationship matrix.
Compared with the prior art that a statistical distribution is used for initializing the relation matrix, the method and the device have the advantages that the second relation matrix is reconstructed by using the meta-relation vectors, and the weights of the corresponding relation matrixes are different according to different complexity degrees of entity relations, so that the training task of the third language model has different emphasis points; the entity relationship is complex, the weight of the corresponding relationship matrix is high, and the task amount of the third language model training is large; the entity relationship is simple, the weight of the corresponding relationship matrix is low, and the task amount of the third language model training is small. The method avoids the defect that the prior language model training has no emphasis point, and the relation extraction capability can not be obviously improved after a large amount of training. Therefore, the method and the device can greatly improve the training efficiency of the language model, so that the trained first language model is more adaptive to the relation extraction task, especially the complex relation extraction task.
The relationship information is represented as (h1:h2), and the first relationship matrix, the third relationship matrix, the fourth relationship matrix and the relationship information are input into a function to obtain the prediction relationship formula: argmax{Softmax[W1W2(h1:h2) + W′(h1:h2)]}, where argmax is the function of the argument(s) of the maximum: for a function ω = f(φ), the result φ0 = argmax[f(φ)] means that f(φ) attains the maximum value of its range when φ = φ0; if there are multiple points at which f(φ) achieves the same maximum, the result of argmax[f(φ)] is the set of those points. In other words, argmax[f(φ)] is the variable point φ (or the set of φ) at which f(φ) takes its maximum value; arg is short for argument.
And training the second language model through the ERM pre-training sample and the ERM pre-training loss function to obtain a third language model, and training the third language model by using a predictive relation formula, so that the first language model with relation extraction capability can be obtained more quickly.
Compared with the prior art, in which the NSP pre-training task commonly used by language models is not adapted to the downstream relation extraction task so that the language model cannot further improve its relation extraction capability from the NSP pre-training task, the present application receives natural language data input by a user and inputs the natural language data into a first language model to obtain entity relation data, wherein the entity relation data is used for representing the entity relations in the natural language data, and the first language model is obtained by training a predetermined second language model with entity relation matching ERM and the masked language model MLM as pre-training tasks to obtain a third language model, and then training the third language model with a prediction relation formula, so as to better adapt to the downstream relation extraction task. The present application avoids using NSP as the pre-training task of the language model and instead uses ERM as the pre-training task, thereby improving the relation extraction capability of the language model. Therefore, the present application avoids the situation in which the language model is trained with NSP as the pre-training task while NSP is not adapted to the downstream relation extraction task, and further improves the relation extraction capability of the language model.
Illustratively, based on the first embodiment of the relationship extraction method of the present application, a second embodiment is proposed, where the method further includes:
the processing the third relation matrix and the fourth relation matrix to obtain a relation vector collection, including: and acquiring a fifth special label, wherein the fifth special label is a concatenation of the vector representations of the first special labels corresponding to each layer in the second language model, and the first special label is included in the ERM pre-training sample.
And acquiring a sixth special label, wherein the sixth special label is a concatenation of the vector representations of the third special labels corresponding to each layer in the second language model, and the third special labels are included in the ERM pre-training sample.
A plurality of layers are nested in the second language model, and the vector representations of the first special label and the third special label corresponding to each layer in the second language model are spliced to obtain the fifth special label and the sixth special label.
And after combining the fifth special label and the sixth special label, inputting the combined labels into a full connection layer to obtain a query vector collection.
The full connection layer is that each node is connected with all nodes of the previous layer and is used for integrating the extracted characteristics. And inputting the fifth special label and the sixth special label into the full-connection layer to obtain a query vector collection.
And after multiplying the query vector collection by each row of the fourth relation matrix, inputting a regression function to obtain an attention score collection.
The meaning of the attention score is that for the relationship information between the entity pairs, the higher the calculated attention score is, the higher the weight coefficient is, and the higher the importance of model learning is; the lower the calculated attention score, the lower its weight coefficient, and the lower the importance of model learning.
And multiplying the attention score set by the fourth relation matrix to obtain the relation vector set.
And combining the relation vector collection to obtain a first relation matrix.
And inputting the first relation matrix, the third relation matrix, the fourth relation matrix and relation information into a function to obtain the prediction relation formula, wherein the relation information is contained in the ERM pre-training loss function.
It can be understood that, on the basis of the first embodiment, after the second relationship matrix W is split into W1 and W2, the first special label and the third special label corresponding to each layer of the second language model are combined to obtain the fifth special label and the sixth special label, where the fifth special label is denoted H1 = (H11:H12:⋯:H1n) and the sixth special label is denoted H2 = (H21:H22:⋯:H2n), n denotes the number of layers of the second language model, H1k denotes the first special label at the k-th layer of the second language model, and H2k denotes the third special label at the k-th layer of the second language model. H1 and H2 are combined and input into a group of fully connected layers of the second language model to obtain the query vector collection; the query vector collection is multiplied by each row of the fourth relationship matrix and input into a regression function to calculate the attention score collection; the calculated attention score collection is multiplied by the fourth relationship matrix to obtain the relationship vector collection W′(i), where i is a variable whose meaning is the index of the meta relationship vector; the W′(i) are combined to obtain W′ = [W′(1), W′(2), ⋯, W′(S)]^T, where T denotes transposing the matrix; W′ is the first relationship matrix.
The relationship information is represented as (h1:h2), and the first relationship matrix, the third relationship matrix, the fourth relationship matrix and the relationship information are input into a function to obtain the prediction relationship formula: argmax{Softmax[W1W2(h1:h2) + W′(h1:h2)]}.
Through the above steps, the first relation matrix is constructed using the attention score mechanism, so that the second language model has different amounts of training for different weight coefficients in relation extraction training. A high attention score represents a large weight coefficient and a large amount of training for the second language model; a low attention score represents a small weight coefficient and a small amount of training. Because the attention score mechanism is adopted, the training tasks of the second language model are differentiated between simple and complex cases, and the training efficiency and training accuracy are significantly improved.
Compared with the prior art that a statistical distribution is used for initializing the relation matrix, the method and the device have the advantages that the attention score mechanism is used for reconstructing the second relation matrix, and the training tasks of the third language model have different emphasis points according to different entity relation complexity and different corresponding attention scores; the entity relationship is complex, the corresponding attention score is high, and the task amount of the third language model training is large; the entity relationship is simple, the corresponding attention score is low, and the task amount of the third language model training is small. The method avoids the defect that the prior language model training has no emphasis point, and the relation extraction capability can not be obviously improved after a large amount of training. Therefore, the method and the device can greatly improve the training efficiency of the language model, so that the trained first language model is more adaptive to the relation extraction task, especially the complex relation extraction task.
The contents of the above embodiments are described below with reference to a preferred embodiment. Referring to fig. 3 and fig. 4, fig. 3 is a schematic flow chart of the preferred embodiment of the relation extraction method of the present application, and fig. 4 is a distribution diagram of a relation matrix in a relation space in the relation extraction method of the present application.
Step S310, an ERM pre-training sample is obtained.
Firstly, a pre-training sample is selected from a corpus and segmented into sentences to obtain an example sentence X; an NER tool is used to mark the entity pair contained in the example sentence X, and the corresponding knowledge graph is queried to confirm whether the entity pair in the example sentence has a certain relation; if the relation r exists, the entity pair is marked with the relation r, and a triple (e1, r, e2) is constructed for the example sentence X.
The example sentence X is segmented into words to obtain X = [x1, ⋯, e1, ⋯, e2, ⋯, xn], where the example sentence X consists of n words xi (1 ≤ i ≤ n), e1 and e2 are the entity pair specified in the example sentence X, and r is the relation corresponding to the entity pair (e1, e2); then the special labels [E1], [\E1], [E2] and [\E2] are inserted on the two sides of the specified entities e1 and e2, forming the sentence Ẋ = [x1, ⋯, [E1], e1, [\E1], ⋯, [E2], e2, [\E2], ⋯, xn]. The obtained Ẋ is combined with the relation-name to obtain an ERM pre-training sample, i.e. [CLS] Ẋ [SEP] relation-name [SEP], where the [CLS] mark is placed at the head of the first sentence for classification, [SEP] is used to separate the two input segments, and relation-name denotes the name of the relation r. In 50% of the pre-training samples, relation-name is the real name of the relation r and the sample label is labeled isMatch (match); in the other 50% of the samples, relation-name is a false name of the relation r and the sample label is labeled notMatch.
Step S320, obtaining an ERM pre-training loss function.
The ERM pre-training task requires the language model BERT to determine, from the input sample, whether the relation-name holds between the specified entity pair (e1, e2). This is a binary classification task, i.e. there are two classes, with relation and without relation, so cross entropy is used as the loss function. Cross entropy describes the distance between two probability distributions: the smaller the cross entropy, the closer the two distributions are; the larger the cross entropy, the farther apart they are.
After BERT encoding, the hidden representations of the special labels [E1] and [E2] added on the two sides of the entity pair (e1, e2) are extracted and denoted h1 and h2:
h1 = BERT-encoder([E1]), h1 ∈ R^(1×hidden-size)
h2 = BERT-encoder([E2]), h2 ∈ R^(1×hidden-size)
where BERT-encoder denotes BERT encoding, R denotes a matrix, and 1×hidden-size denotes a matrix with 1 row and hidden-size columns; the hidden layer is a multi-level abstraction of the input features that allows different types of data to be partitioned more linearly.
h1 and h2 are spliced together to obtain the representation of the information about the relation r of the entity pair (e1, e2): h = (h1:h2), h ∈ R^(1×2hidden-size), where 1×2hidden-size denotes a matrix with 1 row and 2·hidden-size columns.
h is input into a Linear layer with parameter size (2·hidden-size, 2) to obtain the prediction scores logits over the two classes (isMatch, notMatch): logits = Linear(h) = W∙h + b, where W ∈ R^(2hidden-size×2), b ∈ R^(1×2), logits ∈ R^(1×2), and 2hidden-size×2 denotes a matrix with 2·hidden-size rows and 2 columns.
Then the logits are normalized by a Softmax function to obtain the prediction probability prob over the two classes: prob = Softmax(logits), where prob ∈ R^(1×2).
Then the sample label label (label ∈ R^(1×1)), i.e. isMatch or notMatch, is encoded by one-hot encoding into a vector in R^(1×2): one-hot-label = one-hot(label).
Finally, the ERM pre-training loss function is defined as:
Loss = −Σ one-hot-label^T × log(prob), where Σ denotes the summation function and log denotes the logarithm function.
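A minimal PyTorch sketch of the matching head and loss described in this step is given below; the hidden size, the random placeholder tensors standing in for the BERT-encoded [E1]/[E2] representations, and the toy label are assumptions and do not reproduce the patent's implementation.

```python
# Illustrative sketch of the ERM matching head and cross-entropy loss (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size = 768                                # assumed BERT hidden size
linear = nn.Linear(2 * hidden_size, 2)           # maps (h1:h2) to (isMatch, notMatch) scores

h1 = torch.randn(1, hidden_size)                 # placeholder for BERT-encoder([E1])
h2 = torch.randn(1, hidden_size)                 # placeholder for BERT-encoder([E2])
h = torch.cat([h1, h2], dim=-1)                  # relation information h = (h1:h2)

logits = linear(h)                               # prediction scores, shape [1, 2]
prob = F.softmax(logits, dim=-1)                 # prediction probability over the two classes

label = torch.tensor([0])                        # toy label: 0 = isMatch, 1 = notMatch
one_hot_label = F.one_hot(label, num_classes=2).float()
loss = -(one_hot_label * torch.log(prob)).sum()  # Loss = -Σ one-hot-label^T × log(prob)
```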
Step S330, a prediction relation formula is obtained.
Step S340, an ERM pre-training sample and an ERM pre-training loss function are used for pre-training the language model, and then a prediction relation formula is used for carrying out fine tuning training on the language model to obtain a relation extraction language model.
Step S350, inputting the natural language data input by the user into the relationship extraction language model to obtain entity relationship data.
In the fine-tuning stage, the pre-trained language model is generally fine-tuned by supervised learning, and the prior art often uses the following formula for prediction: rid = argmax{Softmax[W(h1:h2)]}, where argmax is the function of the argument(s) of the maximum: for a function ω = f(φ), the result φ0 = argmax[f(φ)] means that f(φ) attains the maximum value of its range when φ = φ0; if there are multiple points at which f(φ) achieves the same maximum, the result of argmax[f(φ)] is the set of those points. In other words, argmax[f(φ)] is the variable point φ (or the set of φ) at which f(φ) takes its maximum value; arg is short for argument.
The Softmax regression function is a generalization of logistic regression to multi-class classification, and its role is to realize multi-category classification without building multiple binary classifiers. The idea of the Softmax classifier is that, for a new sample, the Softmax regression model calculates a score for each class, a probability value is then calculated by the Softmax function, and the class the sample belongs to is determined according to the final probability value.
h1 and h2 are the above-mentioned hidden-layer representations of the special tokens corresponding to the entity pair, and W ∈ R^(relation-num×2hidden-size), where relation-num denotes the number of relation labels, 2hidden-size denotes twice the hidden size, and relation-num×2hidden-size denotes a matrix with relation-num rows and 2·hidden-size columns.
The weight matrix W is generally understood as a relation matrix, and a row of W may be understood as the vector representation of one relation. The representation (h1:h2) of the entity pair is multiplied by the relation matrix to obtain a logits vector of size [relation-num, 1]; a vector of the same size is obtained after Softmax normalization, and the subscript corresponding to the maximum element in this vector is taken as the model's prediction of the relation between the entity pair. The relation matrix is therefore an important component of the model.
However, at the implementation level of the model, the deep learning library usually initializes the weight matrix with only one statistical distribution, which results in a relatively regular distribution of the relation matrix in the relation space. This is not reasonable, because the relation representation is strongly associated with the entity types of the entity pair: the vector representations of different relations should present a cluster-like distribution, and relations whose entity pairs have the same types should be clustered. For example, the relation x between a type A entity and a type B entity is closer in the relation space to the relation y between a type A entity and a type B entity, while the relation x between a type A entity and a type B entity is farther from the relation z between a type A entity and a type C entity.
Therefore, in order to optimize the distribution of the relation representations, two methods are provided for reconstructing the relation matrix, enhancing the expressive capability of the model and guiding the model to learn the cluster distribution of the relation representations. To reconstruct the relation matrix, meta-relations are first introduced, and then the two reconstruction methods are described. A relation can be abstractly decomposed into multiple meta-relations, so the vector representation of a relation can be obtained as a weighted sum of the vector representations of multiple meta-relations, and relations whose weight vectors are similar converge into a cluster.
(1) Decompose the relation matrix, and train the decomposed matrices using the meta-relations.
The relation matrix W ∈ R^(S×D) is split into two matrices W1 ∈ R^(S×A) (the weight matrix) and W2 ∈ R^(A×D) (the relation matrix), with W = W1 × W2, where S denotes the number of relation types, A denotes the number of meta-relations, and D denotes the vector dimension of the meta-relations. A row W2(i) of W2 can be seen as the vector representation of meta-relation i, and a row W1(i) of W1 can be regarded as the meta-relation weight vector of relation i; that is, the decomposed-matrix reconstruction of relation i is given by W′(i) = W1(i) × W2. Thus, on the basis of decomposing the relation matrix, the decomposed matrices are trained using the meta-relations, and the relation matrix W is reconstructed as W′, i.e. W′ = [W′(1), W′(2), ⋯, W′(S)]^T, W′ ∈ R^(S×D).
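The decomposition in method (1) can be sketched as follows; S, A and D take toy values, and the matrices are randomly initialized here only to make the shapes concrete, whereas in practice W1 and W2 would be learned during training.

```python
# Illustrative sketch of reconstruction method (1): W' rebuilt from W1 and W2.
import torch

S, A, D = 10, 4, 1536        # number of relation types, meta-relations, meta-relation dimension (assumed)
W1 = torch.randn(S, A)        # W1 ∈ R^(S×A): one meta-relation weight vector per relation
W2 = torch.randn(A, D)        # W2 ∈ R^(A×D): vector representations of the meta-relations

# W'(i) = W1(i) × W2 for every row i; stacking the rows gives W' = [W'(1), ..., W'(S)]^T ∈ R^(S×D)
W_prime = W1 @ W2
assert W_prime.shape == (S, D)
```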
(2) Decompose the relation matrix, and train the decomposed matrices using an attention mechanism.
In the method mentioned in (1), the weight matrix W1 can be learned through training to obtain the meta-relation weight vector of relation i. It should be noted, however, that the pre-trained language model also contains rich entity type information, so based on this knowledge an attention mechanism is used to construct another set of meta-relation weight vectors.
The entity special labels corresponding to each layer in the pre-trained language model are spliced to obtain: H1 = (H11:H12:⋯:H1n), H2 = (H21:H22:⋯:H2n), where n denotes the number of layers of the pre-trained language model, H1k denotes the representation of [E1] corresponding to the k-th layer of the language model, and H2k denotes the representation of [E2] corresponding to the k-th layer of the language model. Subsequently, H1 and H2 are spliced and sent to a group of fully connected layers to obtain the query matrix, namely: queryi = fully-connected-layeri(H1:H2), queryi ∈ R^D, query = (query1, query2, ⋯, queryS)^T,
where fully-connected-layer denotes a fully connected layer, S denotes the number of relation types, and i is the index of the query vector, one for each relation type.
queryi is multiplied by each row W2(j) of W2 and input into a Softmax function to obtain the corresponding attention scores, namely scoreij = Softmax(queryi × W2(j)^T), where i ∈ (1, S) and j ∈ (1, A).
scorei = (scorei1, scorei2, ⋯, scoreiA); scorei is multiplied by W2 to obtain W′(i), i.e. W′(i) = scorei × W2, W′(i) ∈ R^D.
The decomposed matrices are trained using the attention mechanism, and the relation matrix W is reconstructed as W′, i.e. W′ = [W′(1), W′(2), ⋯, W′(S)]^T, W′ ∈ R^(S×D).
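The attention-based reconstruction in method (2) can be sketched as follows; the layer count, hidden size and random placeholder tensors are assumptions used only to make the tensor shapes concrete.

```python
# Illustrative sketch of reconstruction method (2): attention-weighted meta-relations.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_layers, hidden_size = 12, 768                  # assumed encoder depth and hidden size
S, A, D = 10, 4, 1536                            # relation types, meta-relations, dimension (assumed)

H1 = torch.randn(n_layers * hidden_size)         # concatenated [E1] representations over all layers
H2 = torch.randn(n_layers * hidden_size)         # concatenated [E2] representations over all layers
fc = nn.ModuleList([nn.Linear(2 * n_layers * hidden_size, D) for _ in range(S)])

# query_i = fully-connected-layer_i(H1:H2); one query vector per relation type
queries = torch.stack([fc[i](torch.cat([H1, H2])) for i in range(S)])   # shape [S, D]

W2 = torch.randn(A, D)                           # meta-relation vectors
scores = F.softmax(queries @ W2.t(), dim=-1)     # attention scores score_ij, shape [S, A]
W_prime = scores @ W2                            # reconstructed W' ∈ R^(S×D)
```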
Finally, the prediction relation formula is expressed with the reconstructed relation matrix as:
rid = argmax{Softmax[W1W2(h1:h2) + W′(h1:h2)]}.
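For completeness, a compact sketch of evaluating this prediction formula is shown below; it assumes D = 2·hidden-size so that the matrix-vector products are well defined, and uses random placeholders for the trained parameters.

```python
# Illustrative sketch of rid = argmax{Softmax[W1 W2 (h1:h2) + W'(h1:h2)]}.
import torch
import torch.nn.functional as F

hidden_size = 768
S, A, D = 10, 4, 2 * hidden_size                 # D assumed equal to 2*hidden-size
W1, W2 = torch.randn(S, A), torch.randn(A, D)    # decomposed relation matrices (placeholders)
W_prime = torch.randn(S, D)                      # reconstructed relation matrix (placeholder)

h1, h2 = torch.randn(hidden_size), torch.randn(hidden_size)
h = torch.cat([h1, h2])                          # (h1:h2) ∈ R^(2*hidden-size)

logits = W1 @ (W2 @ h) + W_prime @ h             # scores over the S relation types
rid = int(torch.argmax(F.softmax(logits, dim=-1)))   # index of the predicted relation
```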
Illustratively, the present application further provides a relationship extraction apparatus, including:
the receiving module is used for receiving natural language data input by a user;
the input module is used for inputting the natural language data into a first language model to obtain entity relationship data, wherein the entity relationship data is used for representing entity relationships in the natural language data, and the first language model is obtained by training a predetermined second language model by taking an entity relationship matching ERM and a mask language model MLM as pre-training tasks to obtain a third language model and then training the third language model by using a prediction relationship formula.
Illustratively, the relationship extraction device further includes:
the first acquisition module is used for acquiring an ERM pre-training sample, an ERM pre-training loss function and a prediction relation formula;
and the training module is used for training the second language model based on the ERM pre-training sample, the ERM pre-training loss function and the prediction relation formula to obtain the first language model.
Illustratively, the relationship extraction device further includes:
the second acquisition module is used for acquiring pre-training data;
the segmentation module is used for segmenting the pre-training data to obtain a first sample;
an extraction module, configured to extract an entity pair included in the first sample, and determine a relationship tag according to a relationship between the entity pair, where the entity pair includes a first entity and a second entity;
the inserting module is used for respectively inserting a first special label and a second special label at two sides of the first entity, and respectively inserting a third special label and a fourth special label at two sides of the second entity to obtain a second sample, wherein the first special label, the second special label, the third special label and the fourth special label are used for marking the entity pair;
and the first combination module is used for combining the second sample and the relationship label to obtain the ERM pre-training sample.
Illustratively, the relationship extraction device further includes:
a third obtaining module, configured to obtain a sample vector, where the sample vector is obtained by encoding a part of the ERM pre-training sample;
a fourth obtaining module, configured to obtain a first hidden layer, where the first hidden layer is obtained by encoding a first special label through the second language model, and the first special label is included in the ERM pre-training sample;
a fifth obtaining module, configured to obtain a second hidden layer, where the second hidden layer is obtained by encoding a third special label through the second language model, and the third special label is included in the ERM pre-training sample;
a second combining module, configured to combine the first hidden layer and the second hidden layer to obtain relationship information between entity pairs in the ERM pre-training sample;
the second input module is used for inputting the relation information into a linear layer to obtain a prediction score;
the third input module is used for inputting the prediction score into a regression function to obtain the prediction probability;
a first processing module, configured to obtain the ERM pre-training loss function based on the sample vector and the prediction probability.
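For illustration only, a shape-level Python sketch of this computation follows; the dimensions are hypothetical, and the final cross-entropy form is an assumption, since the text only states that the loss is obtained from the sample vector and the prediction probability.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions; h_e1 / h_e2 stand in for the second language model's
# encodings of the first and third special labels.
hidden, n_rel = 32, 4
rng = np.random.default_rng(0)

h_e1 = rng.normal(size=(hidden,))          # first hidden layer ([E1] encoding)
h_e2 = rng.normal(size=(hidden,))          # second hidden layer ([E2] encoding)
rel_info = np.concatenate([h_e1, h_e2])    # relationship information (h1 : h2)

W_lin = 0.01 * rng.normal(size=(n_rel, 2 * hidden))
scores = W_lin @ rel_info                  # linear layer -> prediction score
probs = softmax(scores)                    # regression (Softmax) -> prediction probability

target = 2                                 # relation index encoded from the sample's relation label
erm_loss = -np.log(probs[target])          # assumed cross-entropy form of the ERM loss
print(float(erm_loss))
```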
Illustratively, the relationship extraction device further includes:
the splitting module is used for splitting a second relation matrix based on the relation type, the meta relation quantity and the vector dimension of the meta relation to obtain a third relation matrix and a fourth relation matrix, wherein the relation type, the meta relation quantity, the vector dimension of the meta relation and the second relation matrix are preset;
the second processing module is used for processing the third relation matrix and the fourth relation matrix to obtain a relation vector collection;
the third combination module is used for combining the relation vector collection to obtain a first relation matrix;
a fourth input module, configured to input the first relationship matrix, the third relationship matrix, the fourth relationship matrix, and relationship information into a function to obtain the prediction relationship formula, where the relationship information is included in the ERM pre-training loss function.
Illustratively, the relationship extraction device further includes:
and the third processing module is used for multiplying each row of the third relation matrix by the fourth relation matrix to obtain the relation vector collection.
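A minimal Python sketch of this row-wise reconstruction follows; the sizes are hypothetical, and the two factor matrices are simply instantiated at random rather than obtained by splitting an actual preset relation matrix.

```python
import numpy as np

# Hypothetical sizes: S relation types, A meta-relations, D-dimensional vectors.
S, A, D = 4, 3, 16
rng = np.random.default_rng(0)

W1 = rng.normal(size=(S, A))   # third relation matrix
W2 = rng.normal(size=(A, D))   # fourth relation matrix

# Multiply each row of the third matrix by the fourth matrix to get the
# relation vector collection (one D-dimensional vector per relation type) ...
relation_vectors = [W1[i] @ W2 for i in range(S)]
# ... then combine them into the first relation matrix.
W_first = np.stack(relation_vectors)        # shape (S, D)
assert np.allclose(W_first, W1 @ W2)        # equivalent to the full matrix product
```

As the final assertion shows, combining the relation vectors row by row is equivalent to the full product of the third and fourth relation matrices.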
Illustratively, the relationship extraction device further includes:
a sixth obtaining module, configured to obtain a fifth special label, where the fifth special label is a concatenation of the vector representations of the first special label corresponding to each layer in the second language model, and the first special label is included in the ERM pre-training sample;
a seventh obtaining module, configured to obtain a sixth special label, where the sixth special label is a concatenation of the vector representations of the third special label corresponding to each layer in the second language model, and the third special label is included in the ERM pre-training sample;
the fourth combination module is used for inputting the combined fifth special label and the combined sixth special label into a full connection layer to obtain a query vector collection;
the fourth processing module is used for multiplying the query vector collection by each row of the fourth relation matrix and then inputting a regression function to obtain an attention score collection;
and the fifth processing module is used for multiplying the attention score collection by the fourth relation matrix to obtain the relation vector collection.
The specific implementation of the relationship extraction device of the present application is substantially the same as that of the above-mentioned relationship extraction method, and is not described herein again.
In addition, the present application also provides a relationship extraction device. As shown in fig. 5, fig. 5 is a schematic structural diagram of the hardware operating environment of the relationship extraction device according to an embodiment of the present application.
As shown in fig. 5, the relationship extraction device may include a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502 and the memory 503 are communicated with each other through the communication bus 504, and the memory 503 is used for storing computer programs; the processor 501 is configured to implement the steps of the relationship extraction method when executing the program stored in the memory 503.
The communication bus 504 mentioned in the above relation extracting apparatus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 504 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 502 is used for communication between the above-described relationship extraction device and other devices.
The Memory 503 may include a Random Access Memory (RAM) and a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory 503 may also be at least one storage device located remotely from the processor 501.
The Processor 501 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The specific implementation of the relationship extraction device of the present application is substantially the same as that of each embodiment of the relationship extraction method, and is not described herein again.
Furthermore, an embodiment of the present application also provides a computer-readable storage medium, where a relationship extraction program is stored, and when being executed by a processor, the relationship extraction program implements the steps of the relationship extraction method as described above.
The specific implementation of the computer-readable storage medium of the present application is substantially the same as the embodiments of the relationship extraction method, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a device, or a network device) to execute the method according to the embodiments of the present application.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (9)

1. A method of relational extraction, the method comprising:
acquiring an ERM pre-training loss function;
splitting a second relation matrix based on a relation type, a meta relation number and a vector dimension of the meta relation to obtain a third relation matrix and a fourth relation matrix, wherein the relation type, the meta relation number, the vector dimension of the meta relation and the second relation matrix are preset;
processing the third relation matrix and the fourth relation matrix to obtain a relation vector collection;
combining the relation vector collection to obtain a first relation matrix;
inputting the first relation matrix, the third relation matrix, the fourth relation matrix and relation information into a function to obtain a prediction relation formula, wherein the relation information is contained in the ERM pre-training loss function;
receiving natural language data input by a user;
inputting the natural language data into a first language model to obtain entity relationship data, wherein the entity relationship data is used for representing entity relationships in the natural language data, the first language model is obtained by training a predetermined second language model by taking an entity relationship matching ERM and a mask language model MLM as pre-training tasks to obtain a third language model, and then training the third language model by using a prediction relationship formula.
2. The method of claim 1, wherein prior to receiving the user-input natural language data, comprising:
obtaining an ERM pre-training sample;
training the second language model based on the ERM pre-training sample and the ERM pre-training loss function to obtain a third language model;
and training the third language model based on the prediction relation formula to obtain the first language model.
3. The method of claim 2, wherein the obtaining the ERM pre-training sample comprises:
acquiring pre-training data;
segmenting the pre-training data to obtain a first sample;
extracting entity pairs contained in the first sample, and determining a relationship label according to the relationship between the entity pairs, wherein the entity pairs comprise a first entity and a second entity;
inserting a first special label and a second special label into two sides of the first entity respectively, and inserting a third special label and a fourth special label into two sides of the second entity respectively to obtain a second sample, wherein the first special label, the second special label, the third special label and the fourth special label are used for marking the entity pair;
and combining the second sample and the relationship label in the form [CLS] second sample [SEP] relationship label [SEP] to obtain the ERM pre-training sample, wherein [CLS] is placed at the head for classification, and [SEP] is placed in the middle for separation.
4. The method of claim 2, wherein the obtaining the ERM pre-training loss function comprises:
obtaining a sample vector, wherein the sample vector is obtained by encoding a part of the ERM pre-training sample;
acquiring a first hidden layer, wherein the first hidden layer is obtained by encoding a first special label through the second language model, and the first special label is contained in the ERM pre-training sample;
acquiring a second hidden layer, wherein the second hidden layer is obtained by encoding a third special label through the second language model, and the third special label is contained in the ERM pre-training sample;
combining the first hidden layer and the second hidden layer to obtain relationship information between entity pairs in the ERM pre-training sample;
inputting the relation information into a linear layer to obtain a prediction score;
inputting the prediction score into a regression function to obtain a prediction probability;
and obtaining the ERM pre-training loss function based on the sample vector and the prediction probability.
5. The method of claim 1, wherein the processing the third relationship matrix and the fourth relationship matrix to obtain a set of relationship vectors comprises:
and multiplying each row of the third relation matrix by the fourth relation matrix to obtain the relation vector collection.
6. The method of claim 1, wherein the processing the third relationship matrix and the fourth relationship matrix to obtain a set of relationship vectors, further comprises:
acquiring a fifth special label, wherein the fifth special label is a concatenation of the vector representations of the first special label corresponding to each layer in the second language model, and the first special label is included in the ERM pre-training sample;
acquiring a sixth special label, wherein the sixth special label is a concatenation of the vector representations of the third special label corresponding to each layer in the second language model, and the third special label is included in the ERM pre-training sample;
after the fifth special label and the sixth special label are combined, inputting the combined label into a full connection layer to obtain a query vector collection;
multiplying the query vector collection by each row of the fourth relation matrix, and inputting a regression function to obtain an attention score collection;
and multiplying the attention score set by the fourth relation matrix to obtain the relation vector set.
7. A relationship extraction apparatus, the apparatus comprising:
the first acquisition module is used for acquiring an ERM pre-training loss function;
the splitting module is used for splitting a second relation matrix based on the relation type, the meta relation quantity and the vector dimension of the meta relation to obtain a third relation matrix and a fourth relation matrix, wherein the relation type, the meta relation quantity, the vector dimension of the meta relation and the second relation matrix are preset;
the second processing module is used for processing the third relation matrix and the fourth relation matrix to obtain a relation vector collection;
the third combination module is used for combining the relation vector collection to obtain a first relation matrix;
a fourth input module, configured to input the first relation matrix, the third relation matrix, the fourth relation matrix, and relation information into a function to obtain a predicted relation formula, where the relation information is included in the ERM pre-training loss function;
the receiving module is used for receiving natural language data input by a user;
the first input module is used for inputting the natural language data into a first language model to obtain entity relationship data, wherein the entity relationship data is used for representing entity relationships in the natural language data, and the first language model is obtained by training a predetermined second language model by taking an entity relationship matching ERM and a mask language model MLM as pre-training tasks to obtain a third language model, and then training the third language model by using a prediction relationship formula.
8. A relationship extraction apparatus, characterized in that the apparatus comprises: memory, a processor and a relationship extraction program stored on the memory and executable on the processor, the relationship extraction program when executed by the processor implementing the steps of the relationship extraction method as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a relationship extraction program which, when executed by a processor, implements the steps of the relationship extraction method according to any one of claims 1 to 6.
CN202210228412.5A 2022-03-10 2022-03-10 Relationship extraction method, device, equipment and readable storage medium Active CN114328978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210228412.5A CN114328978B (en) 2022-03-10 2022-03-10 Relationship extraction method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210228412.5A CN114328978B (en) 2022-03-10 2022-03-10 Relationship extraction method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114328978A CN114328978A (en) 2022-04-12
CN114328978B true CN114328978B (en) 2022-05-24

Family

ID=81033083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210228412.5A Active CN114328978B (en) 2022-03-10 2022-03-10 Relationship extraction method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114328978B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413999A (en) * 2019-07-17 2019-11-05 新华三大数据技术有限公司 Entity relation extraction method, model training method and relevant apparatus
CN111566654A (en) * 2018-01-10 2020-08-21 国际商业机器公司 Machine learning integrating knowledge and natural language processing
CN112084790A (en) * 2020-09-24 2020-12-15 中国民航大学 Relation extraction method and system based on pre-training convolutional neural network
US10938817B2 (en) * 2018-04-05 2021-03-02 Accenture Global Solutions Limited Data security and protection system using distributed ledgers to store validated data in a knowledge graph
CN111914843B (en) * 2020-08-20 2021-04-16 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Character detection method, system, equipment and storage medium
CN113515938A (en) * 2021-05-12 2021-10-19 平安国际智慧城市科技股份有限公司 Language model training method, device, equipment and computer readable storage medium
CN113761893A (en) * 2021-11-11 2021-12-07 深圳航天科创实业有限公司 Relation extraction method based on mode pre-training

Also Published As

Publication number Publication date
CN114328978A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN110597961B (en) Text category labeling method and device, electronic equipment and storage medium
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN112100401B (en) Knowledge graph construction method, device, equipment and storage medium for science and technology services
CN114896388A (en) Hierarchical multi-label text classification method based on mixed attention
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN113515632B (en) Text classification method based on graph path knowledge extraction
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN114372475A (en) Network public opinion emotion analysis method and system based on RoBERTA model
CN114564563A (en) End-to-end entity relationship joint extraction method and system based on relationship decomposition
CN113821635A (en) Text abstract generation method and system for financial field
CN116304748A (en) Text similarity calculation method, system, equipment and medium
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
CN115374845A (en) Commodity information reasoning method and device
CN110941958A (en) Text category labeling method and device, electronic equipment and storage medium
JP2022003544A (en) Method for increasing field text, related device, and computer program product
CN113392191A (en) Text matching method and device based on multi-dimensional semantic joint learning
CN115204143B (en) Method and system for calculating text similarity based on prompt
CN114328978B (en) Relationship extraction method, device, equipment and readable storage medium
CN115827871A (en) Internet enterprise classification method, device and system
CN113449517B (en) Entity relationship extraction method based on BERT gated multi-window attention network model
CN114117069A (en) Semantic understanding method and system for intelligent knowledge graph question answering
CN113869049A (en) Fact extraction method and device with legal attribute based on legal consultation problem
CN114595324A (en) Method, device, terminal and non-transitory storage medium for power grid service data domain division
CN116562284B (en) Government affair text automatic allocation model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant