CN116205217A - Small sample relation extraction method, system, electronic equipment and storage medium - Google Patents

Small sample relation extraction method, system, electronic equipment and storage medium

Info

Publication number
CN116205217A
Authority
CN
China
Prior art keywords
concept
sentence
text
small sample
coding module
Prior art date
Legal status
Granted
Application number
CN202310495624.4A
Other languages
Chinese (zh)
Other versions
CN116205217B (en)
Inventor
于艳华
李劼
杨胜利
管印
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202310495624.4A priority Critical patent/CN116205217B/en
Publication of CN116205217A publication Critical patent/CN116205217A/en
Application granted granted Critical
Publication of CN116205217B publication Critical patent/CN116205217B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F40/279: Handling natural language data; natural language analysis; recognition of textual entities
    • G06F16/35: Information retrieval of unstructured textual data; clustering; classification
    • G06F40/216: Natural language analysis; parsing using statistical methods
    • G06N20/00: Machine learning
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a small sample relation extraction method, system, electronic device and storage medium, and relates to the technical field of data processing. The method comprises the following steps: acquiring a target text; determining an entity relationship representation according to a small sample relation extraction model and the target text; the entity relationship representation comprises entity text and the corresponding concepts and relationships; the small sample relation extraction model is trained with a contrastive learning loss and a cross-entropy loss; the small sample relation extraction model comprises a concept coding module, a sentence coding module and a text concept fusion module; the concept coding module and the sentence coding module are connected with the text concept fusion module; the concept coding module is constructed based on a skip-gram model; the sentence coding module is constructed based on a Bert embedding model; the text concept fusion module is built based on a self-attention mechanism network and a similarity gate. The invention can improve the accuracy of relation extraction when samples are insufficient.

Description

Small sample relation extraction method, system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and system for extracting a small sample relationship, an electronic device, and a storage medium.
Background
The purpose of the relation extraction task is to extract relationships between entities from unstructured raw text, thereby converting the raw unstructured data into structured data that is easy to store and analyze. Relation extraction technology is widely used in artificial intelligence fields such as knowledge graphs, automatic question answering and search engines. According to the number of training samples, relation extraction can be divided into general relation extraction, open-domain relation extraction, small sample relation extraction and so on. Deep learning techniques use neural network models to extract text features automatically, and the difficulty lies in making the neural network models better mine the semantic information of the text for classification. Typically, the relation extraction task needs a large amount of training data to support it, which in turn usually requires a large amount of manual annotation. Small sample relation extraction aims to extract the relation expressed in a given test sentence when only a few sample instances are given.
The performance of current traditional models relies heavily on time-consuming, labor-intensive labeled data. Although some models achieve good results on common relations, their performance may drop dramatically as the number of training instances of a relation decreases, and the classifier tends to be biased toward classes with more samples. Consequently, when samples are insufficient, the accuracy of existing models in relation extraction is not high.
Disclosure of Invention
The invention aims to provide a small sample relation extraction method, a system, electronic equipment and a storage medium, which can improve the accuracy of sample relation extraction when samples are insufficient.
In order to achieve the above object, the present invention provides the following solutions:
a small sample relationship extraction method, comprising:
acquiring a target text;
determining an entity relationship representation according to a small sample relation extraction model and the target text; the entity relationship representation comprises entity text and the corresponding concepts and relationships; the small sample relation extraction model is trained with a contrastive learning loss and a cross-entropy loss;
the small sample relation extraction model comprises a concept coding module, a sentence coding module and a text concept fusion module; the concept coding module and the sentence coding module are connected with the text concept fusion module; the concept coding module is constructed based on a skip-gram model; the sentence coding module is constructed based on a Bert embedding model; the text concept fusion module is constructed based on a self-attention mechanism network and a similarity gate;
the concept coding module is used for determining a plurality of concept embedding vectors of entity texts in the target texts; the sentence coding module is used for determining sentence embedded vectors of the target text; the text concept fusion module is used for determining entity relation expression according to each concept embedding vector and each sentence embedding vector.
Optionally, the determining entity relationship representation according to the small sample relationship extraction model and the target text specifically includes:
extracting concepts of the entity text of the target text according to a set concept database to obtain a plurality of candidate concepts of the entity text;
inputting each candidate concept into the concept coding module to obtain a plurality of concept embedding vectors;
inputting the target text into the sentence coding module to obtain a sentence embedded vector;
and inputting each concept embedding vector and the sentence embedding vector into the text concept fusion module to obtain the entity relationship representation.
Optionally, inputting each concept embedding vector and the sentence embedding vector into the text concept fusion module to obtain the entity relationship representation specifically includes:
calculating the similarity between each concept embedding vector and the sentence embedding vector to obtain a plurality of similarity matrices;
normalizing each similarity matrix with a Softmax function to obtain the similarity scores;
determining the optimal concept embedding vector corresponding to the sentence embedding vector according to a preset similarity threshold and the similarity scores;
and concatenating the optimal concept embedding vector and the sentence embedding vector to obtain the entity relationship representation.
Optionally, the training process of the small sample relation extraction model specifically includes:
acquiring training data; the training data comprises training texts and corresponding relation labels; the relationship label comprises entity texts in training texts and corresponding concepts and relationships;
constructing a training model based on the Bert embedded model, the skip-gram model, the self-attention mechanism network and the similarity gate;
and inputting the training data into the training model, training the training model with the contrastive learning loss and the cross-entropy loss, and determining the trained training model as the small sample relation extraction model.
Optionally, the set concept database includes a YAGO3 database, a ConceptNet database, and a ConceptGraph database.
The invention also provides a small sample relation extraction system, comprising:
the text acquisition module is used for acquiring a target text;
the relation extraction module is used for determining an entity relationship representation according to the small sample relation extraction model and the target text; the entity relationship representation comprises entity text and the corresponding concepts and relationships; the small sample relation extraction model is trained with a contrastive learning loss and a cross-entropy loss;
the small sample relation extraction model comprises a concept coding module, a sentence coding module and a text concept fusion module; the concept coding module and the sentence coding module are connected with the text concept fusion module; the concept coding module is constructed based on a skip-gram model; the sentence coding module is constructed based on a Bert embedding model; the text concept fusion module is constructed based on a self-attention mechanism network and a similarity gate;
the concept coding module is used for determining a plurality of concept embedding vectors of the entity text in the target text; the sentence coding module is used for determining the sentence embedding vector of the target text; the text concept fusion module is used for determining the entity relationship representation according to each concept embedding vector and the sentence embedding vector.
The invention also provides an electronic device comprising a memory for storing a computer program and a processor for running the computer program to cause the electronic device to perform the small sample relation extraction method described above.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the small sample relation extraction method described above.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a small sample relation extraction method, a system, electronic equipment and a storage medium. The small sample relation extraction model is trained by comparing the learning loss and the cross entropy loss, and the similarity of the entity text and the corresponding concepts is controlled by utilizing a self-attention mechanism network and a similarity gate in the text concept fusion module, so that the model result is prevented from being prone to more types of samples, and the accuracy of sample relation extraction is improved when the samples are insufficient.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a small sample relationship extraction method according to the present invention;
FIG. 2 is a schematic diagram of the operation logic of the small sample relation extraction model in the present embodiment;
FIG. 3 is a schematic diagram of the operation logic of the concept selection gate algorithm in the present embodiment;
FIG. 4 is a schematic diagram of text fusion operation logic based on self-attention in the present embodiment;
FIG. 5 is a diagram of the architecture of the contrastive-learning-based relation classification module in the present embodiment;
FIG. 6 is a block diagram of a small sample relationship extraction system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a small sample relation extraction method, a system, electronic equipment and a storage medium, which can improve the accuracy of sample relation extraction when samples are insufficient.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
As shown in fig. 1, the present invention provides a small sample relation extraction method, which includes:
step 100: and acquiring the target text.
Step 200: determining an entity relationship representation according to the small sample relation extraction model and the target text; the entity relationship representation comprises entity text and the corresponding concepts and relationships; the small sample relation extraction model is trained with a contrastive learning loss and a cross-entropy loss.
Specifically, the small sample relation extraction model comprises a concept coding module, a sentence coding module and a text concept fusion module; the concept coding module and the sentence coding module are connected with the text concept fusion module; the concept coding module is constructed based on a skip-gram model; the sentence coding module is constructed based on a Bert embedding model; the text concept fusion module is constructed based on a self-attention mechanism network and a similarity gate.
The concept coding module is used for determining a plurality of concept embedding vectors of the entity text in the target text; the sentence coding module is used for determining the sentence embedding vector of the target text; the text concept fusion module is used for determining the entity relationship representation according to each concept embedding vector and the sentence embedding vector.
The training process of the small sample relation extraction model specifically comprises the following steps:
acquiring training data; the training data comprises training texts and corresponding relation labels; each relation label comprises the entity text in the training text and the corresponding concepts and relationships; constructing a training model based on the Bert embedding model, the skip-gram model, the self-attention mechanism network and the similarity gate; and inputting the training data into the training model, training the training model with the contrastive learning loss and the cross-entropy loss, and determining the trained training model as the small sample relation extraction model.
In this embodiment, the set concept database includes a YAGO3 database, a ConceptNet database, and a ConceptGraph database.
As a specific embodiment of step 200, it includes:
firstly, extracting concepts of the entity text of the target text according to a set concept database, and obtaining a plurality of candidate concepts of the entity text.
And then, inputting each candidate concept into the concept coding module to obtain a plurality of concept embedding vectors. And inputting the target text into the sentence coding module to obtain a sentence embedded vector.
And finally, inputting each concept embedded vector and each sentence embedded vector into the text concept fusion module to obtain entity relationship representation. The method specifically comprises the following steps:
calculating the similarity between each concept embedding vector and the sentence embedding vector to obtain a plurality of similarity matrices; normalizing each similarity matrix with a Softmax function to obtain the similarity scores; determining the optimal concept embedding vector corresponding to the sentence embedding vector according to a preset similarity threshold and the similarity scores; and concatenating the optimal concept embedding vector and the sentence embedding vector to obtain the entity relationship representation.
Based on the above scheme, embodiments as shown in fig. 2-5 are provided:
as shown in fig. 2, the main operation logic of the small sample relation extraction model is as follows. For the input sentence text, the candidate concepts corresponding to the entities are obtained from the database (for example, the candidate concepts corresponding to Bill Gates in FIG. 2: person, billionaire and entrepreneur). Bert is used to embed the sentence, and a skip-gram training model is used to embed the selected candidate concepts, yielding vector representations of the sentence and the concepts. The similarity between each concept and the sentence is then computed from the concept vectors and the sentence vector; after Softmax normalization, the similarities are compared against a threshold and the reasonable concept vectors are selected. The sentence and the concepts are fused with a self-attention mechanism, and finally a supervised contrastive learning loss and a cross-entropy loss are introduced for training.
A key part of the above process is the introduction of external knowledge to enhance the embedded representation (i.e., the set concept database): plain-text information is clearly limited in small sample relation extraction scenarios, so introducing external auxiliary information to compensate for the limited information in the support set can improve the performance of the model. How to extract the most useful information from the external information while avoiding the introduction of interfering information is a problem worth considering. This embodiment therefore introduces entity concepts as external information to enhance the prototype representation.
(1) First, concepts are intuitive and concise descriptions of entities and can easily be obtained from a set concept database (YAGO3, ConceptNet, ConceptGraph). In this embodiment they are obtained from the ConceptGraph database. The ConceptGraph database is a large-scale common-sense concept knowledge graph developed by Microsoft; it stores entity concepts as (entity, isA, concept) triples and can provide concept knowledge for the entities in a relation extraction scheme (ConceptFERE). Pre-trained concept embeddings are adopted for concept embedding.
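For illustration, the sketch below shows one way candidate concepts could be looked up from knowledge stored as (entity, isA, concept) triples; the example triples and the helper name are hypothetical and do not come from the ConceptGraph API.

```python
# Hypothetical sketch: index (entity, isA, concept) triples by entity and
# look up the candidate concepts of an entity mentioned in a sentence.
from collections import defaultdict
from typing import Dict, List, Tuple

def build_concept_index(triples: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group the concepts of each entity, matching entities case-insensitively."""
    index: Dict[str, List[str]] = defaultdict(list)
    for head, rel, concept in triples:
        if rel.lower() == "isa":
            index[head.lower()].append(concept)
    return index

triples = [
    ("Bill Gates", "isA", "person"),
    ("Bill Gates", "isA", "billionaire"),
    ("Bill Gates", "isA", "entrepreneur"),
    ("Microsoft", "isA", "company"),
]
index = build_concept_index(triples)
print(index["bill gates"])  # ['person', 'billionaire', 'entrepreneur']
```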
(2) In addition, concepts are more abstract than the specific textual description of each entity; they not only supplement the limited information in the support set but are also better suited to serve as prototypes of a relation class.
As shown in Table 1, intuitively, once the concepts of the head entity and the tail entity are known, the possible relationships of the entity pair in the sentence are limited to a narrow range, and some relationships, such as "read", can be ruled out. The semantic information of the concepts can thus help the model predict the correct relationship, e.g. "the creator".
Table 1 concept introduction example
However, an entity may have different concepts from different aspects or angles (for example, Bill Gates may be a person, a billionaire or an entrepreneur), and ultimately only a few of these concepts are related to the relation in question. Therefore, when external information is introduced, a similarity judgment gate is introduced to judge the similarity between the external concepts and the relation text. Secondly, since the sentence embeddings and the pre-trained concept embeddings are not learned in the same semantic space, this embodiment adopts a self-attention mechanism to perform word-level semantic fusion between the sentence and the selected concepts, producing the final sentence representation.
Sentence embedding:
in this embodiment, a Bert pre-training model is used as text embedding for sentences of the support set. The input of Bert is a representation corresponding to each embedded word (token), and is input into the Bert pre-training model by converting the token into feature vectors. To accomplish a specific classification task, a specific classification token is inserted at the beginning of each sentence sequence entered, except for the token of the word, and the last attention layer (transducer) output corresponding to the classification token is used to aggregate the entire sequence characterization information.
Since Bert is a pre-trained model that must accommodate a wide variety of natural language tasks, its input sequence must be able to contain a single sentence (text sentiment classification, sequence labeling tasks) or two or more sentences (text summarization, natural language inference, question-answering tasks). To give the model the ability to distinguish which part belongs to sentence A and which part belongs to sentence B, Bert adopts two mechanisms: a segmentation token ([SEP]) is inserted between the sentences in the token sequence to separate them, and a learnable segment embedding is added to each token to indicate whether it belongs to sentence A or sentence B.
For pre-training, Bert constructs two tasks: a masked language model (Masked Language Model, MLM) and next sentence prediction (Next Sentence Prediction, NSP). The MLM task tends to extract token-level representations and therefore cannot directly provide sentence-level representations. To enable the model to understand the relationship between sentences, Bert is also pre-trained with the NSP task, which simply predicts whether two sentences form a consecutive context. In this way, tasks that require understanding the relationship between two sentences, such as question answering and natural language inference, can be trained well with Bert.
Based on the above two points, the Bert pre-training model serves well as the embedding layer for relation extraction in NLP.
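As a hedged illustration of this step (not the patent's own code), a sentence embedding of this kind can be obtained with the Hugging Face transformers library as sketched below; the model checkpoint and the use of the [CLS] hidden state as the sentence vector are assumptions.

```python
# Minimal sketch: embed a sentence with a pre-trained BERT model and take the
# hidden state of the [CLS] token as the sentence representation.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "Bill Gates founded Microsoft in 1975."
inputs = tokenizer(sentence, return_tensors="pt")   # adds [CLS] and [SEP] tokens
with torch.no_grad():
    outputs = model(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0, :]   # [CLS] position, shape (1, 768)
print(cls_embedding.shape)
```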
Concept embedding:
for introduced concepts, conventional pre-training concept embedding is used, i.e., the embedded layer representation of the concept is learned on learning wiki encyclopedias and conceptual diagrams using a skip-gram model.
The concept selection gate algorithm as shown in fig. 3:
An entity may have different concepts from different aspects or angles, and ultimately only a few of these concepts are related to the relation the entity participates in, while the other concepts may have a negative effect on relation classification; the model therefore needs to select the semantically more similar concepts autonomously.
First, the concept embedding $V_c$ produced by the concept embedding layer (skip-gram) and the sentence embedding $V_s$ produced by the sentence embedding layer (Bert) are not learned in the same semantic space, so their semantic similarity cannot be compared directly. This embodiment therefore uses a learnable fully connected mapping layer $P$ to map $V_c$ and $V_s$ into the same space before comparing their similarity.
Secondly, the sentence and the n candidate concepts are embedded by the Bert and skip-gram pre-trained models into 768-dimensional vectors $V_s$ and $V_c$ respectively; after projection through the fully connected mapping layer $P$, the similarity between $V_s$ and $V_c$ is computed, yielding a similarity matrix $sim_{cs}$ of length n.
Finally, the similarity values are normalized with the Softmax function, giving a similarity score for each of the n candidate concepts. When a score is greater than the threshold $\tau$, the corresponding concept is considered highly similar and relevant to the sentence, and its weight in the subsequent self-attention based fusion module is set to 1; when a score is smaller than the threshold $\tau$, the concept is considered irrelevant to the sentence and a possible source of interference for relation classification, and its weight is set to 0. Here $\tau$ is a model hyper-parameter.
The self-attention based text concept fusion module as shown in fig. 4:
since the concept embedding and the word embedding in the sentence are not learned in the same semantic space, the present embodiment designs a self-attention based fusion module to perform word-level semantic fusion for each word in the concept and sentence. First, the embedding of all words in the sentence and the embedding of the selected concept are connected and then sent to the self-attention module. As shown in fig. 4, the self-attention module calculates a similarity value between the concept and each word in the sentence. It multiplies the concept embedding and the similarity value and then combines with the corresponding word embedding as follows:
Figure SMS_5
wherein ,
Figure SMS_6
is shown in the firstiThe individual words are embedded after performing word-level semantic fusion. N is the number of words in the sentence,jis as followsiGenerating an embedded firstjThe individual affects the word.q i k j Andv j each of the vectors generated after passing through the self-attention module matrix.
Thus, the support set sentence embedding which introduces external reasonable clues is obtained.
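The following sketch shows one possible realization of this word-level fusion with a standard single-head self-attention layer; the head count, the residual combination and the tensor shapes are assumptions rather than the exact configuration of the embodiment.

```python
import torch
import torch.nn as nn

class ConceptTextFusion(nn.Module):
    """Sketch: concatenate the sentence's word embeddings with the selected concept
    embeddings, run self-attention over the joint sequence, and add the attention
    output back onto each word embedding."""
    def __init__(self, dim: int = 768, heads: int = 1):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)

    def forward(self, words: torch.Tensor, concepts: torch.Tensor) -> torch.Tensor:
        # words: (1, N, dim) token embeddings; concepts: (1, M, dim) selected concepts
        seq = torch.cat([words, concepts], dim=1)       # joint sequence
        fused, _ = self.attn(seq, seq, seq)             # q_i, k_j, v_j are built internally
        return words + fused[:, : words.size(1), :]     # fused word-level representations

fusion = ConceptTextFusion()
out = fusion(torch.randn(1, 12, 768), torch.randn(1, 2, 768))
print(out.shape)   # torch.Size([1, 12, 768])
```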
Improving prototype discriminability with contrastive learning: this embodiment adopts a contrastive learning method to address the problem that prototypes of different classes are likely to be close together in the embedding space. The specific process is as follows.
In the above, sentence embeddings that incorporate external, reasonable clues have been obtained. This embodiment uses the prototypical network as the backbone network and introduces a contrastive learning loss when constructing the relation prototypes, so as to alleviate the above proximity problem. The overall architecture of the contrastive-learning-based relation classification module is shown in fig. 5, and the algorithm mainly comprises the following steps:
1. Text embedding
Sentences belonging to the Support Instances and sentences belonging to the Query Instances are first sent into the external-concept fusion module to obtain semantically enriched sentence embeddings $S_{i}^{k}$ and $Q_{j}$, where $S_{i}^{k}$ denotes the vector representation of the k-th instance of the i-th relation class in the support set and $Q_{j}$ denotes the vector representation of the j-th query-set instance. In addition, this embodiment also sends the relation descriptors in the support set to the Bert encoder to obtain the corresponding relation vector representations $R_{i}$.
2. Generating relationship prototypes
For each relation in a batch, the algorithm generates the corresponding relation prototype. Whereas the earlier prototypical network takes the rough approach of averaging the embeddings of all instances of the same class as the relation prototype, this embodiment deliberately selects the more important vectors to generate the prototype. Specifically, the prototype of the i-th relation class is computed as:

$$p_{i}=\frac{1}{K}\sum_{k=1}^{K}s_{i}^{k}+r_{i}$$

where $s_{i}^{k}$ is the single-sentence feature of the k-th instance of the i-th class in the support set, obtained by adding the hidden-layer vectors of the token corresponding to the external concept and of the sentence token in the embedded representation; $r_{i}$ is the relation feature of the i-th class, namely the hidden-layer vector of the relation descriptor's token after Bert encoding. The average of the K single-sentence features is then added to the relation feature to obtain the prototype $p_{i}$ of the i-th relation.
Then, using the embedded representation of a Query Instance and the prototypes of the N relations, the model computes the probability of each possible relation for the query instance:

$$P\left(y=n\mid q_{j}\right)=\frac{\exp\!\left(q_{j}\cdot p_{n}\right)}{\sum_{n'=1}^{N}\exp\!\left(q_{j}\cdot p_{n'}\right)}$$

where $q_{j}$ denotes the vector representation of the j-th query instance and $p_{n}$ denotes the relation prototype vector of the n-th relation class.
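A compact sketch of the prototype construction and query scoring described above is given below, using the same symbols as the formulas; the tensor shapes and the toy inputs are assumptions.

```python
import torch

def build_prototypes(support: torch.Tensor, rel_desc: torch.Tensor) -> torch.Tensor:
    # support: (N, K, dim) single-sentence features s_i^k of K support instances per relation
    # rel_desc: (N, dim) Bert features r_i of the N relation descriptors
    return support.mean(dim=1) + rel_desc          # p_i = mean_k(s_i^k) + r_i

def query_probabilities(query: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    # query: (Q, dim) query embeddings q_j; prototypes: (N, dim) relation prototypes p_n
    logits = query @ prototypes.t()                # dot-product similarity q_j . p_n
    return torch.softmax(logits, dim=-1)           # probability over the N relations

protos = build_prototypes(torch.randn(5, 3, 768), torch.randn(5, 768))
probs = query_probabilities(torch.randn(2, 768), protos)
print(probs.shape)   # torch.Size([2, 5]), one distribution per query instance
```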
3. Adjusting relationship prototypes based on contrastive learning
As described above, when the relation classes within the same training batch are similar, their prototypes are distributed close to one another in the embedding space, which may reduce the discriminative ability of the model. Therefore, after the relation prototypes are generated from the text embeddings, all relation prototypes contained in the support set are fed into the contrastive learning module so that their distribution in the embedding space becomes more dispersed, improving the discriminability of the prototypes. It should be noted that, unlike the conventional self-supervised contrastive learning approach, the relation labels of the support-set instances are used here for supervised contrastive learning.
Specifically, the supervised contrastive learning module uses the relation as the anchor, takes the relation prototype of the same class as the positive sample and the prototypes of the other relation classes in the same batch as negative samples, and aims to pull the positive sample toward the anchor while pushing the negative samples away.
For the i-th relation class $r_{i}$ in the support set, the model selects the corresponding positive-sample prototype $p_{i}$ and the negative-sample prototypes $p_{n}$ ($n\neq i$). Similarity is measured with dot products:

$$sim_{i}^{+}=r_{i}\cdot p_{i},\qquad sim_{i,n}^{-}=r_{i}\cdot p_{n}\;\;(n\neq i)$$

where $sim_{i}^{+}$ denotes the similarity of the positive sample pair under the i-th relation class and $sim_{i,n}^{-}$ denotes the similarity of a negative sample pair under the i-th relation class.
The supervised contrastive learning loss is:

$$\mathcal{L}_{SCL}=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp\!\left(sim_{i}^{+}\right)}{\exp\!\left(sim_{i}^{+}\right)+\sum_{n\neq i}\exp\!\left(sim_{i,n}^{-}\right)}$$

where $sim_{i,n}^{-}$ denotes the similarities between the anchor of the i-th relation class and the negative-sample prototypes of the other n classes.
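Under the same assumptions, the supervised contrastive term could be computed as sketched below, with each relation description as the anchor, its own prototype as the positive sample and the other prototypes in the batch as negatives.

```python
import torch

def supervised_contrastive_loss(rel_desc: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    # rel_desc: (N, dim) anchors r_i; prototypes: (N, dim) prototypes p_n of the same batch
    sim = rel_desc @ prototypes.t()                # sim[i, n] = r_i . p_n
    log_prob = torch.log_softmax(sim, dim=-1)      # positive pair sits on the diagonal
    return -log_prob.diagonal().mean()             # average over the N relations in the batch

loss_scl = supervised_contrastive_loss(torch.randn(5, 768), torch.randn(5, 768))
print(loss_scl)
```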
Finally, the query instance embeddings are compared by dot product with the support-set relation prototypes adjusted by the contrastive learning module, the closest prototype is taken as the classification result, and a cross-entropy loss is applied to optimize the overall parameters:

$$\mathcal{L}_{CE}=-\log\frac{\exp\!\left(z_{y}\right)}{\sum_{n=1}^{N}\exp\!\left(z_{n}\right)}$$

where $\mathcal{L}_{CE}$ denotes the cross-entropy loss, $z_{n}$ is the dot-product score of the query instance with the n-th relation prototype, and $z_{y}$ is the score corresponding to the target relation predicted by the model.
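As a closing sketch, the classification loss and the overall training objective could be assembled as below; the equal weighting of the two losses is an assumption, since the description states only that both losses are used for training.

```python
import torch
import torch.nn.functional as F

def classification_loss(query: torch.Tensor, prototypes: torch.Tensor,
                        labels: torch.Tensor) -> torch.Tensor:
    # query: (Q, dim) query embeddings; prototypes: (N, dim) prototypes adjusted by the
    # contrastive module; labels: (Q,) gold relation indices
    logits = query @ prototypes.t()            # z_n = q_j . p_n
    return F.cross_entropy(logits, labels)     # L_CE over the N relations

loss_ce = classification_loss(torch.randn(2, 768), torch.randn(5, 768), torch.tensor([0, 3]))
print(loss_ce)

# Overall training objective (assumed equal weighting of the two terms):
# loss = classification_loss(query, prototypes, labels) \
#        + supervised_contrastive_loss(rel_desc, prototypes)
```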
In this embodiment, the required operating environment is: PyTorch 1.7.1, CUDA 11.0, GPU: NVIDIA GeForce RTX 3090 (24 GB). Training was performed in this environment.
This embodiment provides a similarity-based external information introduction module that lets the model select external information autonomously, filtering out external noise and improving the performance of the model. In addition, for similar relations, this embodiment introduces a contrastive learning method to improve the discriminative ability of the model.
As shown in fig. 6, the present invention further provides a small sample relationship extraction system, comprising:
the text acquisition module is used for acquiring a target text;
the relation extraction module is used for determining an entity relationship representation according to the small sample relation extraction model and the target text; the entity relationship representation comprises entity text and the corresponding concepts and relationships; the small sample relation extraction model is trained with a contrastive learning loss and a cross-entropy loss; the small sample relation extraction model comprises a concept coding module, a sentence coding module and a text concept fusion module; the concept coding module and the sentence coding module are connected with the text concept fusion module; the concept coding module is constructed based on a skip-gram model; the sentence coding module is constructed based on a Bert embedding model; the text concept fusion module is constructed based on a self-attention mechanism network and a similarity gate; the concept coding module is used for determining a plurality of concept embedding vectors of the entity text in the target text; the sentence coding module is used for determining the sentence embedding vector of the target text; the text concept fusion module is used for determining the entity relationship representation according to each concept embedding vector and the sentence embedding vector.
The invention also provides an electronic device comprising a memory for storing a computer program and a processor for running the computer program to cause the electronic device to perform the small sample relation extraction method described above.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the small sample relation extraction method described above.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the core concept of the invention; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (8)

1. A method for small sample relationship extraction, comprising:
acquiring a target text;
determining an entity relationship representation according to a small sample relation extraction model and the target text; the entity relationship representation comprises entity text and the corresponding concepts and relationships; the small sample relation extraction model is trained with a contrastive learning loss and a cross-entropy loss;
the small sample relation extraction model comprises a concept coding module, a sentence coding module and a text concept fusion module; the concept coding module and the sentence coding module are connected with the text concept fusion module; the concept coding module is constructed based on a skip-gram model; the sentence coding module is constructed based on a Bert embedding model; the text concept fusion module is constructed based on a self-attention mechanism network and a similarity gate;
the concept coding module is used for determining a plurality of concept embedding vectors of the entity text in the target text; the sentence coding module is used for determining the sentence embedding vector of the target text; the text concept fusion module is used for determining the entity relationship representation according to each concept embedding vector and the sentence embedding vector.
2. The small sample relationship extraction method according to claim 1, wherein the determining an entity relationship representation according to the small sample relationship extraction model and the target text specifically comprises:
extracting concepts of the entity text of the target text according to a set concept database to obtain a plurality of candidate concepts of the entity text;
inputting each candidate concept into the concept coding module to obtain a plurality of concept embedding vectors;
inputting the target text into the sentence coding module to obtain a sentence embedded vector;
and inputting each concept embedded vector and each sentence embedded vector into the text concept fusion module to obtain entity relationship representation.
3. The method for extracting small sample relationships according to claim 2, wherein inputting each of the concept embedding vectors and the sentence embedding vectors into the text concept fusion module to obtain an entity relationship representation, comprises:
calculating the similarity between each concept embedding vector and the sentence embedding vector to obtain a plurality of similarity matrices;
normalizing each similarity matrix with a Softmax function to obtain the similarity scores;
determining the optimal concept embedding vector corresponding to the sentence embedding vector according to a preset similarity threshold and the similarity scores;
and concatenating the optimal concept embedding vector and the sentence embedding vector to obtain the entity relationship representation.
4. The small sample relationship extraction method according to claim 2, wherein the training process of the small sample relationship extraction model specifically comprises:
acquiring training data; the training data comprises training texts and corresponding relation labels; the relationship label comprises entity texts in training texts and corresponding concepts and relationships;
constructing a training model based on the Bert embedded model, the skip-gram model, the self-attention mechanism network and the similarity gate;
and inputting the training data into the training model, training the training model with the contrastive learning loss and the cross-entropy loss, and determining the trained training model as the small sample relation extraction model.
5. The small sample relationship extraction method of claim 2, wherein the set concept database comprises a YAGO3 database, a ConceptNet database, and a ConceptGraph database.
6. A small sample relationship extraction system, comprising:
the text acquisition module is used for acquiring a target text;
the relation extraction module is used for determining an entity relationship representation according to the small sample relation extraction model and the target text; the entity relationship representation comprises entity text and the corresponding concepts and relationships; the small sample relation extraction model is trained with a contrastive learning loss and a cross-entropy loss;
the small sample relation extraction model comprises a concept coding module, a sentence coding module and a text concept fusion module; the concept coding module and the sentence coding module are connected with the text concept fusion module; the concept coding module is constructed based on a skip-gram model; the sentence coding module is constructed based on a Bert embedding model; the text concept fusion module is constructed based on a self-attention mechanism network and a similarity gate;
the concept coding module is used for determining a plurality of concept embedding vectors of the entity text in the target text; the sentence coding module is used for determining the sentence embedding vector of the target text; the text concept fusion module is used for determining the entity relationship representation according to each concept embedding vector and the sentence embedding vector.
7. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the small sample relationship extraction method of any one of claims 1 to 5.
8. A computer readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the small sample relationship extraction method of any one of claims 1 to 5.
CN202310495624.4A 2023-05-05 2023-05-05 Small sample relation extraction method, system, electronic equipment and storage medium Active CN116205217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310495624.4A CN116205217B (en) 2023-05-05 2023-05-05 Small sample relation extraction method, system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116205217A 2023-06-02
CN116205217B (en) 2023-09-01

Family

ID=86508057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310495624.4A Active CN116205217B (en) 2023-05-05 2023-05-05 Small sample relation extraction method, system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116205217B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10909442B1 (en) * 2017-03-30 2021-02-02 Amazon Technologies, Inc. Neural network-based artificial intelligence system for content-based recommendations using multi-perspective learned descriptors
CN111209412A (en) * 2020-02-10 2020-05-29 同方知网(北京)技术有限公司 Method for building knowledge graph of periodical literature by cyclic updating iteration
CN112820411A (en) * 2021-01-27 2021-05-18 清华大学 Medical relation extraction method and device
CN112784589A (en) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Training sample generation method and device and electronic equipment
CN113378573A (en) * 2021-06-24 2021-09-10 北京华成智云软件股份有限公司 Content big data oriented small sample relation extraction method and device
CN114492412A (en) * 2022-02-10 2022-05-13 湖南大学 Entity relation extraction method for Chinese short text
CN114880307A (en) * 2022-06-07 2022-08-09 上海开放大学 Structured modeling method for knowledge in open education field
CN115688753A (en) * 2022-09-30 2023-02-03 阿里巴巴(中国)有限公司 Knowledge injection method and interaction system of Chinese pre-training language model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhou Kairui et al., "Concept-driven discriminative feature learning method for few-shot classification", CAAI Transactions on Intelligent Systems, 2023, No. 01, pp. 162-172 *
Ma Ang, Yu Yanhua et al., "A survey of knowledge graphs based on reinforcement learning", Journal of Computer Research and Development, 2022, No. 08, pp. 1694-1722 *

Also Published As

Publication number Publication date
CN116205217B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
WO2021164199A1 (en) Multi-granularity fusion model-based intelligent semantic chinese sentence matching method, and device
CN108984526B (en) Document theme vector extraction method based on deep learning
CN111046179B (en) Text classification method for open network question in specific field
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
Daelemans Memory-based lexical acquisition and processing
CN109800434B (en) Method for generating abstract text title based on eye movement attention
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN111414481A (en) Chinese semantic matching method based on pinyin and BERT embedding
CN110750635B (en) French recommendation method based on joint deep learning model
CN110298044B (en) Entity relationship identification method
CN111738007A (en) Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
US20230069935A1 (en) Dialog system answering method based on sentence paraphrase recognition
CN112541356A (en) Method and system for recognizing biomedical named entities
CN110717045A (en) Letter element automatic extraction method based on letter overview
CN114722805B (en) Little sample emotion classification method based on size instructor knowledge distillation
CN113360667B (en) Biomedical trigger word detection and named entity identification method based on multi-task learning
CN111581964A (en) Theme analysis method for Chinese ancient books
Seilsepour et al. Self-supervised sentiment classification based on semantic similarity measures and contextual embedding using metaheuristic optimizer
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
Zheng et al. Pretrained domain-specific language model for general information retrieval tasks in the aec domain
CN116757195A (en) Implicit emotion recognition method based on prompt learning
CN116205217B (en) Small sample relation extraction method, system, electronic equipment and storage medium
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN115391534A (en) Text emotion reason identification method, system, equipment and storage medium
CN115169429A (en) Lightweight aspect-level text emotion analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant