CN114896415A - Entity relation joint extraction method and device based on lightweight self-attention mechanism

Entity relation joint extraction method and device based on lightweight self-attention mechanism

Info

Publication number
CN114896415A
Authority
CN
China
Prior art keywords
entity
classification model
relation
encoder
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210499603.5A
Other languages
Chinese (zh)
Inventor
王艺轩 (Wang Yixuan)
吴正洋 (Wu Zhengyang)
汤庸 (Tang Yong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University
Priority to CN202210499603.5A
Publication of CN114896415A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for joint entity-relation extraction based on a lightweight self-attention mechanism. The method comprises: acquiring target sentence data and inputting the target sentence data into an entity classification model; classifying all subsequences of the target sentence data through the entity classification model to obtain entity sequences and non-entity sequences; acquiring any entity pair in the entity sequence, combining the entity pair with each word between the two entities, and inputting the combination into a relation classification model; and generating a joint entity-relation classification result through an entity relation joint extraction model, where the entity relation joint extraction model comprises a Bert encoder, the entity classification model and the relation classification model. The method has low complexity, improves the performance of the entity-relation model, and can be widely applied in the technical field of artificial intelligence.

Description

Entity relation joint extraction method and device based on lightweight self-attention mechanism
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an entity relationship joint extraction method and device based on a lightweight self-attention mechanism.
Background
Named entity recognition: Named Entity Recognition, NER for short. NER identifies entities with specific meanings in text, including names of people, places, and organizations, other proper nouns, and expressions such as times, amounts, currencies, and ratio values. A named entity generally denotes a single specific individual that can be referred to by a proper noun (a name). NER is the subtask of locating named-entity mentions in unstructured text and classifying them, producing from an unstructured text a representation in which proper nouns carry entity-type label information. For example, in the sentence "Qilixiang is a song of Zhou Jielun", the spans "Qilixiang" and "Zhou Jielun" are named entities.
Relation extraction: relation extraction extracts triples from text, each represented as (subject, relation, object); for this reason relation extraction is sometimes also called triple extraction. As the definition suggests, relation extraction does two things: it identifies the subject and object in the text (the entity recognition task) and determines which relationship the two entities belong to (relation classification).
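For illustration only (our own sketch, not part of the patent), a triple can be held in a simple typed tuple; the relation label "song-of" is hypothetical:

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str  # the "object" slot of (subject, relation, object)

# Hypothetical triple for the example sentence used later in this document:
t = Triple(subject="Qilixiang", relation="song-of", obj="Zhou Jielun")
print(t)  # Triple(subject='Qilixiang', relation='song-of', obj='Zhou Jielun')
```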
Self-attention mechanism: a machine learning technique, also known as internal attention, that relates different positions of a single sequence in order to compute a representation of that same sequence. It has proven very useful in machine reading, abstractive summarization, and image description generation.
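For reference, a minimal PyTorch sketch of standard single-head scaled dot-product self-attention (textbook form, not the patent's lightweight variant, which is detailed later):

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    """Standard single-head self-attention over a sequence x of shape (n, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # three learned projections
    scores = q @ k.T / (k.shape[-1] ** 0.5)   # scaled dot-product similarities, (n, n)
    attn = torch.softmax(scores, dim=-1)      # row-normalized attention weights
    return attn @ v                           # weighted sum of values, (n, d)

n, d = 5, 8
x = torch.randn(n, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([5, 8])
```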
Entity recognition and relation extraction are key techniques for constructing knowledge graphs from natural language text as a data source. With the development of social networks, both the amount of natural language text data and the speed at which it is produced are increasing daily, and new knowledge is continuously generated; at the same time, keeping a knowledge graph up to date requires a supply of new knowledge. For knowledge graphs applied to false-news detection, the speed of knowledge updating is particularly important. To increase the speed of acquiring knowledge from rapidly growing natural language text data, a method is needed that can rapidly identify entities and relations in natural language text from different network platforms; this helps increase the update speed of the knowledge graph. Current entity and relation extraction tasks typically use a multi-head self-attention model. However, the classical multi-head self-attention model includes three transformation matrices and a large number of parameters, which results in high complexity, low training efficiency, and a need for large training datasets. In particular, natural language processing models based on pre-trained models perform well on entity and relation extraction tasks, but pre-trained models are expensive to train, complex, difficult to deploy on edge devices, and ill-suited to the task of rapidly updating a knowledge graph from network natural language text.
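To make the parameter argument concrete, a back-of-the-envelope count (our illustration; exact savings depend on the configuration). Classical self-attention learns three d×d transformation matrices W_q, W_k, W_v; the lightweight variant described below drops W_v and uses the input itself as Value:

```python
# Illustrative per-layer attention parameter count for hidden size d.
d = 768                  # e.g., a BERT-base-sized hidden dimension (assumed)

classic = 3 * d * d      # W_q, W_k, W_v
light = 2 * d * d        # W_q and W_k only; V = input directly

print(f"classic: {classic:,}  light: {light:,}  saved: {classic - light:,}")
# classic: 1,769,472  light: 1,179,648  saved: 589,824 (a third of the QKV weights)
```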
Disclosure of Invention
In view of this, embodiments of the present invention provide a low-complexity entity-relation joint extraction method and apparatus based on a lightweight self-attention mechanism, which improve the performance of the entity-relation model.
The invention provides an entity relation joint extraction method based on a lightweight self-attention mechanism, which comprises the following steps:
acquiring target sentence data and inputting the target sentence data into an entity classification model;
classifying all subsequences of the target sentence data through the entity classification model to obtain an entity sequence and a non-entity sequence;
acquiring any entity pair in the entity sequence, combining the entity pair with each word between the two entities, and inputting the combination into a relation classification model;
generating a joint entity-relation classification result through an entity relation joint extraction model;
the entity relation joint extraction model comprises a Bert encoder, an entity classification model and a relation classification model.
Optionally, the obtaining target sentence data and inputting the target sentence data into the entity classification model includes:
inputting a target sentence into the Bert encoder for encoding to obtain an encoding result;
splitting the coding result into a plurality of subsequences to construct the target sentence data (the span enumeration is sketched after this list);
and inputting the target sentence data into an entity classification model.
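A minimal sketch of the splitting step, under our assumption that "all subsequences" means all contiguous token spans up to some maximum width:

```python
def enumerate_spans(tokens, max_width=10):
    """Return all contiguous subsequences (spans) of a tokenized sentence."""
    spans = []
    for start in range(len(tokens)):
        for width in range(1, max_width + 1):
            if start + width <= len(tokens):
                spans.append((start, start + width))  # half-open [start, end)
    return spans

tokens = ["Qilixiang", "is", "a", "song", "of", "Zhou", "Jielun"]
print(enumerate_spans(tokens, max_width=3)[:5])
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
```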
Optionally, the entity classification model and the relationship classification model are implemented by a single-layer perceptron.
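Since both classifiers are stated to be single-layer perceptrons, each reduces to one fully connected layer mapping features to class logits; a minimal sketch (all sizes are our assumptions):

```python
import torch.nn as nn

d = 768                   # encoder hidden size (assumed)
num_entity_types = 5      # entity classes plus a "none" class (assumed)
num_relation_types = 8    # relation classes plus a "none" class (assumed)

# Single-layer perceptrons: one linear map each, no hidden layer.
entity_classifier = nn.Linear(d, num_entity_types)
relation_classifier = nn.Linear(3 * d, num_relation_types)  # assumes the pair
# vector concatenates entity 1, the span between, and entity 2 (sketched below)
```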
Optionally, the Bert encoder is a light attention encoder, the light attention encoder being a text encoder module composed of a plurality of encoder blocks and a fully connected network layer;
when the input of the light attention encoder is a text vector, synchronously acquiring one-dimensional position encoding information of the text vector, and then carrying out sentence encoding;
after the sentence is coded, an additional one-dimensional vector is added to serve as global information to supplement the overall information of the sentence;
inputting the vector sequence after sentence coding into a span classifier for entity detection, and dividing all detected subsequences into entity types and non-entity types;
combining pairs of entities and the span between them into one long vector (the pairing step is sketched after this list); and
inputting the long vector into a relation classifier, which extracts the relation from the long vector.
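A sketch of the pairing step referenced above: the two entity representations and a pooled representation of the span between them are concatenated into one long vector (the max-pooling choice is our assumption):

```python
import torch

def build_pair_vector(h, e1, e2):
    """Concatenate two entity span vectors with the pooled span between them.

    h  -- encoded token sequence, shape (n, d)
    e1 -- (start, end) of the first entity, half-open
    e2 -- (start, end) of the second entity, half-open; assumes e1 precedes e2
    """
    v1 = h[e1[0]:e1[1]].max(dim=0).values       # pooled first entity, (d,)
    v2 = h[e2[0]:e2[1]].max(dim=0).values       # pooled second entity, (d,)
    between = h[e1[1]:e2[0]]                    # tokens between the two entities
    ctx = between.max(dim=0).values if len(between) else torch.zeros_like(v1)
    return torch.cat([v1, ctx, v2])             # long vector of size 3d

h = torch.randn(7, 16)
print(build_pair_vector(h, (0, 1), (5, 7)).shape)  # torch.Size([48])
```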
Optionally, the span classifier comprises a fully connected mapping layer, and the relation classifier comprises a fully connected mapping layer;
the encoder block in the light attention encoder comprises two sub-modules: one sub-module adopts the light attention mechanism, and its output undergoes a residual connection and is then normalized; the other sub-module is a projection-layer module comprising three fully connected layers, wherein the dimension of the first fully connected layer is consistent with the input dimension, the dimension of the second fully connected layer is four times the input dimension, and the dimension of the third fully connected layer is consistent with the input dimension.
Optionally, the number of encoder blocks in the light attention encoder is dynamically updated according to the available memory.
Optionally, the processing procedure of the light attention encoder includes:
performing point multiplication on an input feature matrix and two transformation matrices to obtain Query and Key, while taking the input feature matrix itself as Value;
then performing a cosine similarity calculation between the Query values and the Key values to obtain similarity scores;
expanding the similarity scores into a matrix via an outer product with themselves, and normalizing the resulting matrix with a softmax so that it acts on the Value;
the finally output Value is the output of the self-attention layer, and this output undergoes a residual connection and layer normalization.
Another aspect of the embodiments of the present invention further provides an entity relationship joint extraction apparatus based on a lightweight self-attention mechanism, including:
the system comprises a first module, a second module and a third module, wherein the first module is used for acquiring target sentence data and inputting the target sentence data into an entity classification model;
a second module, configured to classify all subsequences of the target sentence data through the entity classification model to obtain an entity sequence and a non-entity sequence;
a third module, configured to acquire any entity pair in the entity sequence, combine the entity pair with each word between the two entities, and input the combination into a relation classification model;
a fourth module, configured to generate a joint entity-relation classification result through the entity relation joint extraction model;
the entity relation joint extraction model comprises a Bert encoder, an entity classification model and a relation classification model.
Another aspect of the embodiments of the present invention further provides an electronic device, which includes a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Yet another aspect of the embodiments of the present invention provides a computer-readable storage medium, which stores a program, which is executed by a processor to implement the method as described above.
Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
The embodiment of the invention acquires target sentence data and inputs it into an entity classification model; classifies all subsequences of the target sentence data through the entity classification model to obtain entity sequences and non-entity sequences; acquires any entity pair in the entity sequence, combines the entity pair with each word between the two entities, and inputs the combination into a relation classification model; and generates a joint entity-relation classification result through an entity relation joint extraction model, where the entity relation joint extraction model comprises a Bert encoder, the entity classification model and the relation classification model. The method has low complexity and improves the performance of the entity-relation model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating the overall steps provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an entity relationship extraction model according to an embodiment of the present invention;
fig. 3 is a flowchart of an operation process of the light attention module according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The following detailed description of the embodiments of the present invention is made with reference to the accompanying drawings:
the existing entity extraction model of the current mainstream has large quantity of model parameters and is convenient to be deployed at edge equipment to process a knowledge graph task needing to quickly collect data and update. To solve the problems of the prior art, the present invention proposes a novel span-based federated entity and relationship extraction method with a light attention encoder (model name SpELA). In order to reduce the complexity of the model and better apply the correlation model to a small data set and deploy the correlation model to an edge device, a light-attention mechanism based on a multi-head attention mechanism is proposed. The V transformation matrix is reduced, the K and Q transformation matrices obtain the information learned in the reduction transformation matrix by learning respective parameters, and the cosine similarity projection matrix is used for mapping the input to the characteristic space of K and Q. The invention provides a novel light attention encoder based on a span combined entity relationship method, which improves the performance of an entity relationship model and reduces the complexity of the model at the same time, so that the encoder can be applied to edge equipment with a small memory.
As shown in fig. 1, the overall implementation steps of the present invention include: acquiring target sentence data and inputting the target sentence data into an entity classification model; classifying all subsequences of the target sentence data through the entity classification model to obtain an entity sequence and a non-entity sequence; acquiring any pair of entity pairs in the entity sequence, and combining the entity pairs and each word between the entity pairs and inputting the combined word into a relation classification model; generating a classification result of entity relation combination through an entity relation combination extraction model; the entity relation joint extraction model comprises a Bert encoder, an entity classification model and a relation classification model.
It should be noted that the entity classification model and the relation classification model in this embodiment are component modules of the entity relation joint extraction model, which is composed of the Bert encoder, the entity classification model and the relation classification model.
With reference to fig. 1, the whole process can be described as follows. First, a sentence is input into the Bert encoder for encoding, and the model splits the encoding result of the sentence into a number of subsequences. Each subsequence is then classified by the entity neural-network classification model, with low-probability candidates assigned to the "none" class. All subsequences not in the "none" class are then taken out; for each pair of them, the words between the two subsequences are spliced together with the two subsequences, turning them into a combined entity. This combination is classified by another relation neural-network classification model: if the two entities are unrelated, the combination is assigned to the "none" class; otherwise the relation between the entities is output, thereby determining the relation between the entities.
The following takes the sentence "Qilixiang is a song of Zhou Jielun" as example input and describes the specific implementation process of the present invention in detail. In this sentence, "Qilixiang" and "Zhou Jielun" are the two pre-labeled entities. The sentence is first encoded by the Bert encoder; for ease of understanding, we still write the encoded sentence as: "Qilixiang is a song of Zhou Jielun". The sentence is then split into n subsequences, such as: "Qilixiang", "Qilixiang is", "Zhou Jielun", "song of Zhou Jielun". "Qilixiang" is an entity and is kept; "song of Zhou Jielun" is not an entity and is discarded; "Zhou Jielun" is an entity and is kept. The entity classification model thus yields the two entities "Qilixiang" and "Zhou Jielun". The words between the two entities are then spliced together with the two entities to form the joint entity, such as: "Qilixiang is a song of Zhou Jielun". Finally, the relation classification model outputs the relation of this sequence, such as an author relation; if there is no relation, "none" is output.
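Putting the steps together, a high-level sketch of the inference pipeline (the callables `encode`, `entity_classifier`, and `relation_classifier` are placeholders for the components described above, and `enumerate_spans` is the helper sketched earlier):

```python
def extract(tokens, encode, entity_classifier, relation_classifier):
    """Joint entity/relation extraction following the pipeline described above."""
    h = encode(tokens)                                   # Bert-style encoding
    spans = enumerate_spans(tokens)                      # all candidate subsequences
    entities = [s for s in spans
                if entity_classifier(h, s) != "none"]    # keep entity spans only
    triples = []
    for e1 in entities:                                  # every ordered entity pair
        for e2 in entities:
            if e1[1] <= e2[0]:                           # e1 strictly before e2
                rel = relation_classifier(h, e1, e2)     # pair plus words between
                if rel != "none":
                    triples.append((e1, rel, e2))
    return triples
```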
The entity neural network classification model and the relation neural network classification model are both realized by a single-layer perceptron.
Fig. 2 shows our proposed span-based entity-relation extraction model, in which the light attention encoder is a text encoder module consisting of n encoder blocks and one fully connected network layer. When a text vector (e.g., the bottom rectangle) is input, one-dimensional position-encoding information is added to it to ensure that the model can learn the contextual information of the sentence. After sentence encoding, an additional one-dimensional vector CLS is added as global information to supplement the overall information of the learned sentence. The span classifier is a span-based entity classification module and is a fully connected mapping layer. When the encoded vector sequence enters it, it performs entity detection on the vector subsequences, classifies all detected subsequences into entity types, and filters out non-entities. After that, the remaining entity pairs and the span between them are combined into one long vector. The long vector is input to the relation classifier, which uses a fully connected mapping layer to extract relations from the long vector.
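A sketch of the input stage just described: token embeddings plus one-dimensional position encodings, with a CLS vector prepended as the global-information vector (the learned rather than sinusoidal position embedding is our assumption):

```python
import torch
import torch.nn as nn

class TextEmbedding(nn.Module):
    """Token embedding + 1-D position encoding + a prepended CLS vector."""
    def __init__(self, vocab_size=30522, d=768, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d)
        self.pos = nn.Embedding(max_len, d)            # one-dimensional positions
        self.cls = nn.Parameter(torch.zeros(1, d))     # global-information vector

    def forward(self, token_ids):                      # token_ids: (n,)
        n = token_ids.shape[0]
        x = self.tok(token_ids) + self.pos(torch.arange(n))
        return torch.cat([self.cls, x], dim=0)         # (n + 1, d), CLS first

emb = TextEmbedding()
print(emb(torch.tensor([101, 2023, 102])).shape)       # torch.Size([4, 768])
```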
Note that the encoder block in the encoder consists of two modules. The first module is a variant of the self-attention mechanism, called the light self-attention mechanism; its output is added to the input through a residual connection and then normalized to form the module output. The second module is a projection-layer module, concretely three fully connected layers: the first layer matches the input dimension, the second layer is four times the dimension of the input layer, and the last layer projects back to the input dimension; its output is likewise added to the input through a residual connection and then normalized. These two modules form an encoder block, and the number of blocks can be selected dynamically based on the size of the device's video memory. A sketch of such a block follows.
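A PyTorch sketch of one encoder block as described (the ReLU activations are our assumption; the attention argument stands for the light self-attention module, sketched after the formulas below, which applies its own residual connection and normalization):

```python
import torch
import torch.nn as nn

class ProjectionModule(nn.Module):
    """Three fully connected layers: d -> d -> 4d -> d, as described above."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d, d),        # first layer: same dimension as the input
            nn.ReLU(),
            nn.Linear(d, 4 * d),    # second layer: four times the input dimension
            nn.ReLU(),
            nn.Linear(4 * d, d),    # last layer: projected back to the input dimension
        )

    def forward(self, x):
        return self.net(x)

class EncoderBlock(nn.Module):
    """Light-attention sub-module followed by the projection module."""
    def __init__(self, d, attention):
        super().__init__()
        self.attention = attention    # applies its own residual + LayerNorm
        self.proj = ProjectionModule(d)
        self.norm = nn.LayerNorm(d)

    def forward(self, x):
        x = self.attention(x)                 # X = Norm(attn_output + input)
        return self.norm(x + self.proj(x))    # residual + LayerNorm for projection

block = EncoderBlock(d=16, attention=nn.Identity())  # stand-in attention for a shape check
print(block(torch.randn(7, 16)).shape)               # torch.Size([7, 16])
```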
As shown in fig. 3, when the feature matrix is input, it is first dot-multiplied with two transformation matrices to obtain Query and Key, while the feature matrix itself is taken as Value; the formulas are as follows:
Q = W_q(I), K = W_k(I), V = I
where Q stands for Query, K stands for Key, V stands for Value, and I stands for the input feature matrix.
Then, cosine similarity calculation is carried out on the Q value and the K value, so that a similarity score is obtained, and the formula is as follows:
α_i = cos(Q_i, K_i) = (Q_i · K_i) / (‖Q_i‖ ‖K_i‖)
where α_i represents the similarity score at position i.
Because the obtained similarity scores form a column vector while the V value is a matrix, the score vector must be expanded into a matrix before it can act on V: the scores are multiplied with their own transpose (an outer product), and the resulting matrix is normalized by a softmax to form the attention matrix that acts on the V value. The formulas are as follows:
P = α · α^T, attn = softmax(P), output = attn · V
where output represents the output matrix.
The finally output V value is the output of the self-attention layer. To alleviate problems such as vanishing gradients, this output undergoes a residual connection and layer normalization; the formula is as follows:
X = Norm(output + I)
where X represents the input matrix of the next layer's module.
The application scenario of the invention is as follows:
the updating speed of knowledge is particularly important when the method is applied to the knowledge graph of false news detection and public opinion detection. However, in order to increase the speed of acquiring knowledge from rapidly growing natural language text data, a method is needed that is capable of rapidly identifying entities and relationships from natural language text data that exists on different network platforms. This helps to increase the update speed of the knowledge-graph. And the public opinion fermentation speed is too fast on the internet at present, a model capable of rapidly modeling public opinion conditions is needed, the original model parameter is too large, the parameter quantity can be reduced by the model provided by people, and the public opinion can be rapidly modeled, so that the correct trend of the public opinion is guided.
In conclusion, the method greatly reduces the number of model parameters, so that the model can be deployed on edge devices; and the changed similarity computation is better suited to tasks over language data, making the model's performance more prominent.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. The entity relationship joint extraction method based on the lightweight self-attention mechanism is characterized by comprising the following steps:
acquiring target sentence data and inputting the target sentence data into an entity classification model;
classifying all subsequences of the target sentence data through the entity classification model to obtain an entity sequence and a non-entity sequence;
acquiring any entity pair in the entity sequence, combining the entity pair with each word between the two entities, and inputting the combination into a relation classification model;
generating a joint entity-relation classification result through an entity relation joint extraction model;
wherein the entity relation joint extraction model comprises a Bert encoder, the entity classification model and the relation classification model.
2. The method according to claim 1, wherein the obtaining target sentence data and inputting the target sentence data into an entity classification model comprises:
inputting a target sentence into the Bert encoder for encoding to obtain an encoding result;
splitting the coding result into a plurality of subsequences to construct the target sentence data;
and inputting the target sentence data into an entity classification model.
3. The entity relationship joint extraction method based on the lightweight self-attention mechanism as claimed in claim 1, wherein the entity classification model and the relationship classification model are implemented by a single-layer perceptron.
4. The entity relationship joint extraction method based on the lightweight self-attention mechanism as claimed in claim 1, wherein the Bert encoder is a light attention encoder, the light attention encoder comprises a plurality of encoder blocks and a text encoder module composed of fully connected network layers;
when the input of the light attention encoder is a text vector, synchronously acquiring one-dimensional position encoding information of the text vector, and then carrying out sentence encoding;
after the sentence is coded, an additional one-dimensional vector is added to serve as global information to supplement the overall information of the sentence;
inputting the vector sequence after sentence coding into a span classifier for entity detection, and dividing all detected subsequences into entity types and non-entity types;
combining pairs of entities and spans between them into one long vector;
and inputting the long vector into a relation classifier, and extracting the relation in the long vector.
5. The method of claim 4, wherein the span classifier comprises a fully connected mapping layer, and the relation classifier comprises a fully connected mapping layer;
the encoder block in the light attention encoder comprises two sub-modules: one sub-module adopts the light attention mechanism, and its output undergoes a residual connection and is then normalized; the other sub-module is a projection-layer module comprising three fully connected layers, wherein the dimension of the first fully connected layer is consistent with the input dimension, the dimension of the second fully connected layer is four times the input dimension, and the dimension of the third fully connected layer is consistent with the input dimension.
6. The method of claim 5, wherein the number of encoder blocks in the light attention encoder is dynamically updated according to a memory size.
7. The method of claim 6, wherein the processing procedure of the light attention encoder comprises:
performing point multiplication on an input feature matrix and two transformation matrices to obtain Query and Key, while taking the input feature matrix itself as Value;
then performing a cosine similarity calculation between the Query values and the Key values to obtain similarity scores;
expanding the similarity scores into a matrix via an outer product with themselves, and normalizing the resulting matrix with a softmax so that it acts on the Value;
the finally output Value is the output of the self-attention layer, and this output undergoes a residual connection and layer normalization.
8. Entity relation joint extraction device based on lightweight self-attention mechanism, characterized by comprising:
the system comprises a first module, a second module and a third module, wherein the first module is used for acquiring target sentence data and inputting the target sentence data into an entity classification model;
a second module, configured to classify all subsequences of the target sentence data through the entity classification model to obtain an entity sequence and a non-entity sequence;
a third module, configured to acquire any entity pair in the entity sequence, combine the entity pair with each word between the two entities, and input the combination into a relation classification model;
a fourth module, configured to generate a joint entity-relation classification result through the entity relation joint extraction model;
the entity relation joint extraction model comprises a Bert encoder, an entity classification model and a relation classification model.
9. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program realizes the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1 to 7.
CN202210499603.5A 2022-05-09 2022-05-09 Entity relation joint extraction method and device based on lightweight self-attention mechanism Pending CN114896415A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210499603.5A CN114896415A (en) 2022-05-09 2022-05-09 Entity relation joint extraction method and device based on lightweight self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210499603.5A CN114896415A (en) 2022-05-09 2022-05-09 Entity relation joint extraction method and device based on lightweight self-attention mechanism

Publications (1)

Publication Number Publication Date
CN114896415A 2022-08-12

Family

ID=82721907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210499603.5A Pending CN114896415A (en) 2022-05-09 2022-05-09 Entity relation joint extraction method and device based on lightweight self-attention mechanism

Country Status (1)

Country Link
CN (1) CN114896415A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115796127A (en) * 2023-01-31 2023-03-14 北京面壁智能科技有限责任公司 Position coding method, position coding device, electronic equipment and storage medium


Similar Documents

Publication Title
CN113591902B (en) Cross-modal understanding and generating method and device based on multi-modal pre-training model
JP7291183B2 (en) Methods, apparatus, devices, media, and program products for training models
CN111428504B (en) Event extraction method and device
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN111914085A (en) Text fine-grained emotion classification method, system, device and storage medium
Ding et al. Open-vocabulary universal image segmentation with maskclip
CN113254675B (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
Sarang Artificial neural networks with TensorFlow 2
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN114419642A (en) Method, device and system for extracting key value pair information in document image
Farazi et al. Reciprocal attention fusion for visual question answering
CN116304748A (en) Text similarity calculation method, system, equipment and medium
CN116975350A (en) Image-text retrieval method, device, equipment and storage medium
CN116150367A (en) Emotion analysis method and system based on aspects
CN114896415A (en) Entity relation joint extraction method and device based on lightweight self-attention mechanism
WO2021174922A1 (en) Statement sentiment classification method and related device
CN114529917A (en) Zero-sample Chinese single character recognition method, system, device and storage medium
CN117197569A (en) Image auditing method, image auditing model training method, device and equipment
CN116680407A (en) Knowledge graph construction method and device
US20230186600A1 (en) Method of clustering using encoder-decoder model based on attention mechanism and storage medium for image recognition
CN116611071A (en) Function-level vulnerability detection method based on multiple modes
WO2023168818A1 (en) Method and apparatus for determining similarity between video and text, electronic device, and storage medium
Yu Analysis of task degree of English learning based on deep learning framework and image target recognition
US20230281400A1 (en) Systems and Methods for Pretraining Image Processing Models
CN111768214A (en) Product attribute prediction method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination