CN114896402A - Text relation extraction method, device, equipment and computer storage medium - Google Patents

Text relation extraction method, device, equipment and computer storage medium

Info

Publication number
CN114896402A
CN114896402A (application CN202210565045.8A)
Authority
CN
China
Prior art keywords
sentence
packet
text
sentence packet
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210565045.8A
Other languages
Chinese (zh)
Inventor
曾碧卿
李砚龙
邓会敏
丁明浩
蔡剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGDONG AIB POLYTECHNIC COLLEGE
South China Normal University
Original Assignee
GUANGDONG AIB POLYTECHNIC COLLEGE
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GUANGDONG AIB POLYTECHNIC COLLEGE, South China Normal University filed Critical GUANGDONG AIB POLYTECHNIC COLLEGE
Priority to CN202210565045.8A priority Critical patent/CN114896402A/en
Publication of CN114896402A publication Critical patent/CN114896402A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a text relation extraction method, apparatus, device and computer storage medium. The text relation extraction method comprises: obtaining a text sentence packet; performing negative training on the text sentence packet with a BERT pre-training model, dividing the text sentence packet into clean sentence packets and noise sentence packets, and re-labeling the noise sentence packets to obtain an optimized text sentence packet; and performing positive (forward) training on the optimized text sentence packet with the BERT pre-training model to obtain the classification result of the text sentence packet. In the text relation extraction method provided by the invention, negative training of the text sentence packet with the BERT pre-training model identifies the noise sentence packets, which are then re-labeled; this significantly reduces the noise in the text sentence packet and helps improve the text relation extraction effect.

Description

Text relation extraction method, device, equipment and computer storage medium
Technical Field
The present invention relates to the field of text relationship extraction technologies, and in particular, to a text relationship extraction method, apparatus, device, and computer storage medium.
Background
Information extraction is a major task of natural language processing. It aims to extract structured information from unstructured text and plays an important role in the construction and expansion of knowledge graphs and the structuring of knowledge bases.
Relation extraction is an important step of information extraction. Its goal is to classify the relation between entity pairs in sentences, and it is important for knowledge-base question answering, knowledge-base construction, text summarization, and the like. Remote supervision (distant supervision) obtains training data by aligning a corpus with a knowledge base: if a certain entity pair exists in the knowledge base, all sentences in the corpus that contain the entity pair are labeled with the relation of that entity pair in the knowledge base. Remote supervision is an effective method for automatically labeling large-scale training data, but because this assumption is too loose, a large amount of noise is produced when labeling the data, and the noisy data seriously affects relation extraction performance, so noise reduction is required for remote supervision relation extraction.
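The distant-supervision labeling assumption can be illustrated with a short, hedged sketch; the kb and corpus structures and all names below are illustrative assumptions rather than part of the invention.

```python
# Minimal sketch of distant-supervision labeling: every corpus sentence that contains
# a knowledge-base entity pair inherits that pair's relation label.
# The kb/corpus structures and field names here are illustrative assumptions.
from collections import defaultdict

kb = {("Barack Obama", "Hawaii"): "born_in"}           # entity pair -> relation
corpus = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama visited Hawaii last year.",           # noisy: not really 'born_in'
]

def distant_supervision(kb, corpus):
    bags = defaultdict(list)                            # (head, tail, relation) -> sentences
    for sentence in corpus:
        for (head, tail), relation in kb.items():
            if head in sentence and tail in sentence:
                bags[(head, tail, relation)].append(sentence)
    return bags

print(distant_supervision(kb, corpus))
```

The second sentence above shows how the loose assumption produces a wrongly labeled (noisy) instance, which is exactly the problem the noise reduction methods below target.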
At present, noise reduction for remote supervision relation extraction is mainly performed with rule statistics methods, multi-instance learning methods, and adversarial and reinforcement learning methods, wherein:
(1) The rule statistics method belongs to the category of machine learning and aims to find templates or rules for the relations between entity pairs; it mainly includes probabilistic graphical models, matrix completion, kernel methods, dependency relations, and the like. However, this method relies on a large amount of feature engineering, which not only consumes manpower and time but also suffers from poor classification effect and weak generalization capability.
(2) The multi-instance learning method is one of the main noise reduction approaches for remote supervision relation extraction: aligned texts with the same entity pair are combined into a multi-instance bag (sentence packet for short), and prediction is performed with the sentence packet as the unit. According to the prediction strategy, multi-instance learning can be divided into wrong-label prediction, the at-least-one assumption, and attention mechanisms, wherein: a) wrong-label prediction judges whether a label is wrong by computing the correlation between the label of the sentence packet and the label of each sentence in the packet; b) the at-least-one assumption holds that at least one sentence in the remotely supervised aligned text expresses the real entity pair relation, so only one sentence in the sentence packet needs to be selected for prediction each time; c) the attention mechanism assigns a weight to each sentence in the sentence packet: when a sentence is noise, a lower weight reduces its influence on the classification of the sentence packet, and when it is not noise, a higher weight emphasizes its importance; that is, weighting strengthens correctly labeled data and weakens wrongly labeled data. Compared with the rule statistics method, multi-instance learning avoids the error propagation risk brought by feature engineering, giving the model better generalization capability.
However, compared with the rule statistics method, multi-instance learning has higher time and space complexity, and its model training faces the following problems: 1) multi-instance learning cannot handle the case in which all sentences in a sentence packet are noise (a fully noisy sentence packet), because if a certain entity pair is noise for all of its aligned texts, the multi-instance learning framework still assigns a larger weight to at least one sentence, i.e. it assumes that one sentence must not be noise, and therefore cannot handle the all-noise case; 2) prediction at the sentence-packet level does not map the relation between sentences and labels well, and while the relation is being predicted, the correspondence between the packet label and the sentence labels may remain unspecified, which increases the prediction difficulty; 3) the soft strategy based mainly on the attention mechanism tolerates part of the noise during model training, so the model learns some wrong mapping relations, which can harm the overall classification effect.
(3) The main objective of adversarial and reinforcement learning methods is to improve the quality of the overall sample so that the model learns more accurate entity-pair relations, which further improves the generalization ability and robustness of the trained model; adversarial and reinforcement learning can therefore improve corpus quality well. However, these methods require joint training of multiple models, and suffer from high training difficulty, poor stability, high time and space complexity, and difficulty in industrial deployment.
Therefore, existing relation extraction methods suffer from a poor noise reduction effect.
Disclosure of Invention
Based on this, the present invention aims to provide a text relation extraction method, apparatus, device and computer storage medium, which reduces the noise of the text and improves the extraction effect.
A text relation extraction method comprises the following steps:
s1: acquiring a text sentence packet;
s2: carrying out negative training on the text sentence packet by using a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet;
s3: performing forward (positive) training on the optimized text sentence packet with the BERT pre-training model to obtain the classification result of the text sentence packet.
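Read as a procedure, steps S1-S3 amount to the driver loop sketched below. The helpers are stubs standing in for the negative-training split, the clustering-based re-labeling, and the positive training described later; their names, signatures, and the iteration bound are assumptions.

```python
# Schematic of steps S1-S3 as a training driver. The helpers are stubs standing in
# for the negative-training split, clustering-based relabeling, and positive training
# described later in the disclosure; names and signatures are assumptions.
from typing import List, Tuple

def negative_train_step(model, bags) -> Tuple[list, list]:
    """One pass of negative training; returns (clean_bags, noisy_bags)."""
    raise NotImplementedError

def relabel_noise(noisy_bags) -> list:
    """Unsupervised-clustering re-labeling of noisy bags (see step S25)."""
    raise NotImplementedError

def positive_train(model, bags) -> None:
    """Ordinary (positive) fine-tuning on the optimized bags (see step S3)."""
    raise NotImplementedError

def run_pipeline(model, bags, max_rounds: int = 5):
    # S2: alternate negative training and re-labeling until convergence (bounded here).
    for _ in range(max_rounds):
        clean, noisy = negative_train_step(model, bags)
        bags = clean + relabel_noise(noisy)
    # S3: positive training on the optimized sentence packets.
    positive_train(model, bags)
    return model
```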
Further, the Bert pre-trained model includes an input layer, an embedding layer, a feature extraction layer, and an output layer, and step S2 includes the following steps:
s21: inputting the text sentence packet into an input layer to obtain a sentence sequence;
s22: inputting the sentence sequence into the embedding layer to obtain a sentence vector;
s23: inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation;
s24: inputting the sentence packet representation into the output layer to obtain the probability distribution of the text sentence packet;
s25: calculating a negative training cross entropy loss function by using the probability distribution of the text sentence packet, and judging whether the Bert pre-training model meets the convergence condition or not according to the negative training cross entropy loss function; when the Bert pre-training model does not meet the convergence condition, dividing the text sentence packet into a clean sentence packet and a noise sentence packet by using the probability distribution of the text sentence packet, and re-labeling the noise sentence packet to obtain an improved text sentence packet;
s26: and repeating the steps S21-S25 until the Bert pre-training model meets the convergence condition, stopping iteration, and determining the improved text sentence packet when the Bert pre-training model meets the convergence condition as the optimized text sentence packet.
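One way to picture one pass of steps S21-S25 is to score each sentence packet with the model and split the set by a probability threshold, as in the hedged PyTorch sketch below; the model interface, tensor shapes, and the threshold value of 0.5 are assumptions.

```python
# Hedged sketch of one S21-S25 pass: score each bag, then split clean vs. noisy by a
# probability threshold (see S25/S26). Model, data shapes and threshold are assumed;
# model(bag) is assumed to return a 1-D logits vector over the relation classes.
import torch

def split_bags_by_probability(model, bags, labels, threshold=0.5):
    """bags: list of pre-encoded bag tensors; labels: list of int relation ids."""
    clean, noisy = [], []
    model.eval()
    with torch.no_grad():
        for bag, label in zip(bags, labels):
            probs = torch.softmax(model(bag), dim=-1)   # S24: output-layer distribution
            if probs[label] >= threshold:               # S25: high probability -> clean
                clean.append((bag, label))
            else:                                       # low probability -> noisy, to be re-labeled
                noisy.append((bag, label))
    return clean, noisy
```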
Further, the embedding layer includes a word embedding vector and a position embedding vector, and step S22 is to obtain a sentence vector of the sentence sequence by using the word embedding vector and the position embedding vector.
Further, the feature extraction layer includes a hidden layer and a relation attention layer, and step S23 is: inputting sentence vectors into a hidden layer to obtain a relation matrix and hidden vectors of a text sentence packet; inputting the relation matrix and the hidden vector into a relation attention layer to obtain an attention weight coefficient of the relation matrix; carrying out weighted summation on the hidden vector and the attention weight coefficient of the last hidden layer to obtain a hidden vector weighted representation; cascading the relation vector and the hidden vector weighting representation to obtain a sentence representation of the text sentence packet; and carrying out weighted summation on the sentence representations of the text sentence packet to obtain the sentence packet representation of the text sentence packet.
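The feature-extraction computation just described can be pictured with the following PyTorch sketch (relation vector from head/tail entity vectors, relation attention over the last hidden layer, concatenation into sentence representations, and an attention-weighted sentence packet representation). The attention expressions appear in the patent only as equation images, so the softmax forms, the rel_query parameter, and all dimensions below are assumptions, not the patent's exact formulas.

```python
# Hedged sketch of the feature extraction layer: relation vectors, relation attention,
# sentence representations, and an attention-weighted sentence-packet representation.
# Dimensions and the softmax form of the attention weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAttentionBag(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.w1 = nn.Linear(hidden_size, hidden_size)                 # relation weight matrix w_1 (+ bias c_1)
        self.rel_query = nn.Parameter(torch.randn(2 * hidden_size))   # relation representation r (assumed form)

    def forward(self, head_vecs, tail_vecs, last_hidden):
        # head_vecs, tail_vecs, last_hidden: (n_sentences, hidden_size)
        rel = torch.tanh(self.w1(tail_vecs - head_vecs))              # l_i = Tanh(w_1 (t_i - h_i) + c_1)
        attn = F.softmax((rel * last_hidden).sum(-1), dim=0)          # assumed form of the attention weights
        h_weighted = (attn.unsqueeze(-1) * last_hidden).sum(0)        # h'_L: weighted sum of last-layer hidden vectors
        sent_repr = torch.cat([rel, h_weighted.expand_as(rel)], -1)   # s_i = [l_i ; h'_L]
        bag_attn = F.softmax(sent_repr @ self.rel_query, dim=0)       # similarity of s_i to relation representation r
        return (bag_attn.unsqueeze(-1) * sent_repr).sum(0)            # B = sum_i alpha_i s_i
```

For example, RelationAttentionBag(768) applied to three tensors of shape (n_sentences, 768) returns a single bag vector of size 1536.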
Further, in step S25, the re-labeling of the noise sentence packets is performed by using an unsupervised clustering method, and includes the following sub-steps:
s251: acquiring a text label set, and processing the text label set by using an unsupervised clustering model to obtain an initial clustering center;
s252: computing the initial feature vector of the noise sentence packet from the hidden vector representation output for the noise sentence packet by the last layer of the feature extraction layer;
s253: calculating the similarity between the initial feature vector and the initial cluster centers by using the Student's t-distribution;
s254: calculating the KL divergence from the similarity between the initial feature vector and the initial cluster centers, judging whether the unsupervised clustering model satisfies the convergence condition according to the KL divergence, updating the initial feature vector and the initial cluster centers when the unsupervised clustering model does not satisfy the convergence condition, and repeating steps S253-S254 until the unsupervised clustering model satisfies the convergence condition, at which point the iteration stops and the optimized feature vector and optimized cluster centers are obtained;
s255: calculating the similarity between the optimized feature vector and the optimized cluster centers by using the Student's t-distribution, and determining the label of the optimized cluster center whose similarity is greater than the similarity threshold as the label of the noise sentence packet.
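Steps S251-S255 resemble deep-embedding-style clustering with a Student's t kernel and a KL objective. The sketch below illustrates the similarity computation, the KL loss, and threshold-based re-labeling; the kernel form, the target distribution, and the threshold value are assumptions consistent with the description rather than the patent's image-only formulas.

```python
# Hedged sketch of S251-S255: Student's-t similarity between noisy-bag feature vectors
# and cluster centers, a KL-divergence objective, and threshold-based re-labeling.
# The exact kernel/target-distribution forms in the patent appear only as images;
# the DEC-style forms used here are assumptions.
import torch
import torch.nn.functional as F

def student_t_similarity(features, centers, alpha: float = 1.0):
    # features: (n_bags, d); centers: (k, d)  ->  q: (n_bags, k)
    dist_sq = torch.cdist(features, centers).pow(2)
    q = (1.0 + dist_sq / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def kl_clustering_loss(q):
    # Assumed sharpened target distribution p, then KL(p || q) as the clustering objective.
    weight = q.pow(2) / q.sum(dim=0, keepdim=True)
    p = weight / weight.sum(dim=1, keepdim=True)
    return F.kl_div(q.log(), p, reduction="batchmean")

def relabel(features, centers, center_labels, sim_threshold: float = 0.5):
    q = student_t_similarity(features, centers)
    best_sim, best_idx = q.max(dim=1)
    # Only re-label bags whose best similarity clears the threshold (S255).
    return [center_labels[i] if s >= sim_threshold else None
            for s, i in zip(best_sim.tolist(), best_idx.tolist())]
```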
Further, in step S25, the text sentence packets are divided into clean sentence packets and noise sentence packets by using the probability distribution of the text sentence packets: a text sentence packet whose probability value is greater than or equal to the probability threshold is determined to be a clean sentence packet, and a text sentence packet whose probability value is smaller than the probability threshold is determined to be a noise sentence packet.
Further, step S3 is:
s31: inputting the optimized text sentence packet into an input layer to obtain a sentence sequence;
s32: inputting the sentence sequence into the embedding layer to obtain a sentence vector;
s33: inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation;
s34: inputting the sentence packet representation into the output layer to obtain the probability distribution of the text sentence packet;
s35: and calculating a forward training cross entropy loss function by using the probability distribution of the text sentence packet, and classifying the text sentence packet according to the forward training cross entropy loss function and the probability distribution of the text sentence packet to obtain a classification result of the text sentence packet.
The invention also provides a text relation extracting device, which comprises:
the acquisition module is used for acquiring the text sentence packet;
the noise reduction module is used for carrying out negative training on the text sentence packet by utilizing a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet;
and the classification module is used for positively training the optimized text sentence packet by utilizing the BERT pre-training model to obtain a classification result of the text sentence packet.
The invention also provides a text relation extraction device comprising a memory, a processor and a computer program, wherein the computer program is stored in the memory and is configured to be executed by the processor to realize the text relation extraction method.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which is executed by a processor to implement the text relation extracting method according to the present invention.
For a better understanding and practice, the present invention is described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a remote supervised relationship extraction method of the present invention;
FIG. 2 is a diagram of a Bert pre-training model in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that the embodiments described are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the embodiments in the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims. In the description of the present application, it is to be understood that the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not necessarily used to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
It is to be understood that the embodiments of the present application are not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the embodiments of the present application is limited only by the following claims.
As shown in fig. 1, the present invention provides a remote supervised relationship extraction method, which includes the following steps:
s1: and acquiring the text sentence packet.
In this embodiment, the text sentence packet is a set of sentences that share the same entity pair (head entity, tail entity). Let $b_i$ denote a sentence; the text sentence packet $B_n$ is then expressed as $B_n = \{b_1, b_2, \ldots, b_n\}$, where n is the number of sentences in the text sentence packet.
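As a concrete illustration of the sentence packet $B_n$ defined above, a minimal data structure could look like the following; the dataclass and its field names are illustrative assumptions.

```python
# Illustrative representation of a text sentence packet B_n = {b_1, ..., b_n}:
# all sentences share the same (head entity, tail entity) pair and one bag-level label.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SentencePacket:
    head: str                                            # head entity
    tail: str                                            # tail entity
    relation: str                                        # distantly supervised bag label
    sentences: List[str] = field(default_factory=list)   # b_1 ... b_n

bag = SentencePacket("Barack Obama", "Hawaii", "born_in",
                     ["Barack Obama was born in Hawaii.",
                      "Barack Obama visited Hawaii last year."])
print(len(bag.sentences))  # n, the number of sentences in the packet
```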
S2: and carrying out negative training on the text sentence packet by using a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet.
Negative training (NT) trains the model on the premise that the input sentence does not belong to its complementary label. Negative training not only provides less noisy supervision, but also effectively separates clean sentence packets from noise sentence packets during training, which helps reduce the influence of noise sentence packets on text relation extraction.
The BERT pre-training model is a pre-trained language model, Bidirectional Encoder Representations from Transformers, obtained by training on a masked language model task and a next sentence prediction task. In this embodiment, as shown in fig. 2, the BERT pre-training model includes an input layer, an embedding layer, a feature extraction layer, and an output layer, and step S2 includes the following steps:
(1) The text sentence packet is input into the input layer to obtain a sentence sequence.
Converting the sentences in the text sentence packet into sentence sequences enables the BERT pre-training model to capture more efficiently the label relation and the unstructured relation implied by the entity pairs of the sentences. In the sentence sequence, [CLS] carries the sentence sequence information, [H-SEP] is the separator of the head entity, [T-SEP] is the separator of the tail entity, and [SEP] marks the end of the sequence. Moreover, so that the BERT pre-training model pays more attention to the entity pair and to the structural information of the possible relation types during feature extraction, this implementation uses Sub-tree Parse (STP) to process the sentences of the text sentence packet into an STP sequence, which is then spliced with the tokenized sentence sequence to obtain the final sentence sequence.
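To make the input-layer formatting concrete, the sketch below builds such a marked sequence for one sentence. It is a hedged illustration: the patent does not fully specify how [H-SEP]/[T-SEP] are inserted or how the STP fragment is spliced, and the helper build_input_sequence is an assumed name.

```python
# Hedged sketch of the input layer: build a marked token sequence for one sentence.
# How the patent actually inserts [H-SEP]/[T-SEP] and splices the sub-tree parse (STP)
# sequence is not fully specified, so the ordering here is an assumption.
def build_input_sequence(sentence: str, head: str, tail: str, stp_tokens=None):
    marked = sentence.replace(head, f"[H-SEP] {head} [H-SEP]") \
                     .replace(tail, f"[T-SEP] {tail} [T-SEP]")
    tokens = ["[CLS]"] + marked.split()
    if stp_tokens:                      # splice the STP sequence after the sentence tokens
        tokens += stp_tokens
    tokens.append("[SEP]")
    return tokens

print(build_input_sequence("Barack Obama was born in Hawaii .",
                           "Barack Obama", "Hawaii",
                           stp_tokens=["born", "in", "Hawaii"]))
```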
(2) And inputting the sentence sequence into the embedding layer to obtain a sentence vector.
In this embodiment, the embedding layer includes a word embedding vector and a position embedding vector. The word embedding vector uses BPE (Byte-Pair Encoding) tokenization, which decomposes words into finer-grained fragments so that the BERT pre-training model can extract useful fine-grained information from the sentence sequence. The position embedding vector is an important component of the attention mechanism in the BERT pre-training model: through position encoding, the model learns a unique position embedding to represent the position of each token in the input sequence. Therefore, after the sentence sequence is input into the embedding layer, the embedding layer obtains the sentence vector of the sentence sequence by using the word embedding vector and the position embedding vector.
(3) And inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation.
In this embodiment, the feature extraction layer includes hidden layers and a relation attention layer. The sentence vector is input into the feature extraction layer, and the sentence packet representation is obtained as follows:
a) The sentence vectors are input into the hidden layers to obtain the relation matrix and the hidden vectors of the text sentence packet. Let $h$ be the head entity vector and $t$ be the tail entity vector; the relation vector corresponding to the i-th entity pair is $l_i = t_i - h_i$, and the relation vectors form the relation matrix $L_i = \{l_1, l_2, \ldots, l_i\}$. Moreover, more nonlinear information can be captured by applying a linear transformation to the head and tail entity vectors together with the activation function Tanh, in which case the relation matrix becomes $L_i = \mathrm{Tanh}(w_1(t_i - h_i) + c_1)$, where $w_1$ is a relation weight matrix and $c_1$ is a bias variable. The relation matrix $L_i$ obtained in this way is a matrix of possible relations rather than a true relation matrix.
b) The relation matrix and the hidden vectors are input into the relation attention layer to obtain the attention weight coefficient of the relation matrix. Specifically, the relation vector $l_i$ in the relation matrix $L_i$ and the hidden vector $h_L$ output by the last hidden layer are used to obtain the attention weight coefficient $\alpha_r$ (its expression is given only as an image in the original), where $l_i$ is the relation vector of the relation matrix $L_i$, $h_{L_i}$ is the hidden vector output by the last hidden layer of the BERT pre-training model, and n is the number of sentences.
c) The hidden vector of the last layer and the attention weight coefficient are weighted and summed to obtain the hidden vector weighted representation. In this embodiment, let $h'_L$ denote the hidden vector weighted representation; then $h'_L = \sum_{i=1}^{n} \alpha_r\, h_{L_i}$, where $h_L$ is the hidden vector output by the last hidden layer.
d) The relation vector and the hidden vector weighted representation are concatenated to obtain the sentence representation of the text sentence packet. In this embodiment, let $s_i$ denote the sentence representation; then $s_i = [l_i; h'_L]$, where $l_i$ is the relation vector and $h'_L$ is the hidden vector weighted representation.
e) The sentence representations of the text sentence packet are weighted and summed to obtain the sentence packet representation of the text sentence packet. In this embodiment, let $B$ denote the sentence packet representation of the text sentence packet; then $B = \sum_{i=1}^{n} \alpha_i s_i$, where $\alpha_i$ is the similarity between the sentence representation $s_i$ and the relation representation $r$ of the text sentence packet (the expression for $\alpha_i$ is likewise given only as an image in the original).
(4) The sentence packet representation is input into the output layer to obtain the probability distribution of the text sentence packet.
In this embodiment, the output layer of the BERT pre-training model applies a Softmax function, so when the sentence packet representation $B$ is input into the output layer, the output layer outputs the probability distribution $P(r)$ of the relation representation $r$, with $P(r) = \mathrm{Softmax}(W_r B + d_r)$, where $W_r$ is a relation weight matrix and $d_r$ is a bias factor.
(5) Calculating a negative training cross entropy loss function by using the probability distribution of the text sentence packet, and judging whether the Bert pre-training model meets the convergence condition or not according to the negative training cross entropy loss function; and when the Bert pre-training model does not meet the convergence condition, dividing the text sentence packet into a clean sentence packet and a noise sentence packet by utilizing the probability distribution of the text sentence packet, and re-labeling the noise sentence packet to obtain an improved text sentence packet.
In this embodiment, the cross entropy loss function used when the BERT pre-training model is negatively trained is $L_{NT}(g, \bar{y}^*)$, where $\bar{y}^*$ is a complementary label relation generated by randomly sampling from the label relation space excluding the label relation $y^*$, i.e. $\bar{y}^* \in R \setminus \{y^*\}$ with $y^* \in R = \{1, 2, 3, \ldots, C\}$ (the exact expression of $L_{NT}$ appears only as an image in the original; a hedged sketch follows step (6) below). Because the negative training cross entropy loss function is designed to reduce the probability assigned to the complementary label $\bar{y}^*$, $p_k \to 0$ as the loss decreases; therefore, the negative training cross entropy loss function can be used to judge whether the BERT pre-training model satisfies the convergence condition. When the BERT pre-training model does not satisfy the convergence condition, the text sentence packets are divided into clean sentence packets and noise sentence packets by using the probability distribution $P(r)$ of the text sentence packets: specifically, a text sentence packet whose probability value is greater than or equal to the probability threshold is determined to be a clean sentence packet, and a text sentence packet whose probability value is smaller than the probability threshold is determined to be a noise sentence packet. The noise sentence packets are then re-labeled to obtain the improved text sentence packet, which consists of the re-labeled noise sentence packets and the clean sentence packets.
(6) Steps (1) to (5) are repeated until the BERT pre-training model satisfies the convergence condition, at which point the iteration stops and the improved text sentence packet obtained when the BERT pre-training model satisfies the convergence condition is determined to be the optimized text sentence packet.
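The behaviour described in step (5), namely that the loss drives the probability of a randomly sampled complementary label toward zero, matches the standard negative-training cross entropy. The sketch below uses that standard form as an assumption, since the patent gives the loss only as an equation image.

```python
# Hedged sketch of negative training: sample a complementary label (not the given one)
# and minimize -log(1 - p_complementary), so the probability of the complementary label
# is pushed toward 0. The exact loss in the patent is shown only as an image; this is
# the standard negative-training form, used here as an assumption.
import random
import torch

def negative_training_loss(logits, labels, num_classes):
    # logits: (batch, num_classes); labels: (batch,) distantly supervised labels y*
    probs = torch.softmax(logits, dim=-1)
    comp = torch.tensor([random.choice([c for c in range(num_classes) if c != y])
                         for y in labels.tolist()])
    p_comp = probs.gather(1, comp.unsqueeze(1)).squeeze(1)
    return -(torch.log(1.0 - p_comp + 1e-12)).mean()
```

During negative training this loss replaces the usual cross entropy; in the later positive training stage the ordinary cross entropy (which pushes the probability of the true label toward 1) is used instead.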
In this embodiment, the noise sentence packets are re-labeled in an unsupervised clustering manner, which includes the following sub-steps:
(1) A text label set is acquired and processed with the unsupervised clustering model to obtain the initial cluster centers $\{\mu_i\}_{K_c}$, where $K_c$ is the number of initial cluster centers.
(2) The initial feature vector of each noise sentence packet is computed from the hidden vector representation output for the noise sentence packet by the last layer of the feature extraction layer. Let $h''_j$ be the hidden vector output by the last hidden layer for a noise sentence, let $l''_i$ be the corresponding relation vector of the text label set, and let $C_{cluster}$ be the sentence representation of the noise sentence packet used as the initial feature vector in the unsupervised clustering model; then $C_{cluster} = H'' L''^{\top} + d$, where $L''$ is the relation matrix, $d$ is a bias coefficient, $H'' = \{h''_1, h''_2, \ldots, h''_j\}$ and $L'' = \{l''_1, l''_2, \ldots, l''_i\}$.
(3) The similarity between the initial feature vector and the initial cluster centers is calculated with the Student's t-distribution. Let $q_{ji}$ be the similarity between the sentence vector $c_j$ and the cluster center $\mu_i$, computed with the Student's t-distribution kernel whose degree-of-freedom parameter is $\alpha = 1$ (the full expression is given as an image in the original).
(4) The KL divergence is calculated from the similarity between the initial feature vector and the initial cluster centers. In this embodiment, let $p_{ji}$ denote the probability that sentence $S_j$ is assigned the relation label $l_i$ (its expression is likewise given as an image); the KL divergence is then $KL(P\,\|\,Q) = \sum_j \sum_i p_{ji} \log \frac{p_{ji}}{q_{ji}}$. Whether the unsupervised clustering model satisfies the convergence condition is judged from the KL divergence; when it does not, the initial feature vector and the initial cluster centers are updated and steps (2)-(4) are repeated until the unsupervised clustering model satisfies the convergence condition, at which point the iteration stops and the optimized feature vector and optimized cluster centers are obtained.
(5) The similarity between the optimized feature vector and the optimized cluster centers is calculated with the Student's t-distribution, and the label of the optimized cluster center whose similarity is greater than the similarity threshold is determined to be the label of the noise sentence packet.
In this embodiment, letting $q_{ji}$ be the similarity between the sentence vector $c_j$ and the cluster center $\mu_i$, $q_{ji}$ is compared with the similarity threshold, and the label of the optimized cluster center whose similarity is greater than the similarity threshold is determined as the label of the noise sentence packet, which completes the re-labeling of the noise sentence packet.
By negatively training the text sentence packet with the BERT pre-training model, the text relation extraction method disclosed by the invention can identify the noise sentence packets and the clean sentence packets in the text sentence packet; the noise sentence packets are optimized through clustering and re-labeling to obtain the improved text sentence packet, which is then negatively trained again with the BERT pre-training model until the convergence condition is satisfied and the iteration stops. Through this multi-iteration process, the number of noise sentence packets in the text sentence packet is greatly reduced, the cleanness of the text sentence packet data is improved, and the relation extraction effect for the text sentence packet is significantly improved.
S3: and (5) carrying out forward training on the optimized text sentence packet by using a BERT pre-training model to obtain a classification result of the text sentence packet.
Based on the concept that the input sentence belongs to its label, positive training of the optimized text sentence packet with the BERT pre-training model predicts the label relation of each sentence, from which the classification result of the text sentence packet is obtained.
In the present embodiment, step S3 is implemented as follows:
(1) The optimized text sentence packet is input into the input layer to obtain a sentence sequence.
As in the negative training stage, the sentences in the text sentence packet are converted into sentence sequences so that the BERT pre-training model can capture the label relation and the unstructured relation implied by the entity pairs of the sentences: [CLS] carries the sentence sequence information, [H-SEP] and [T-SEP] are the separators of the head and tail entities, [SEP] marks the end of the sequence, and the STP sequence obtained with Sub-tree Parse is spliced with the tokenized sentence sequence to obtain the final sentence sequence.
(2) Inputting the sentence sequence into the embedding layer to obtain a sentence vector
In this embodiment, after the sentence sequence is input to the embedding layer, the embedding layer obtains a sentence vector of the sentence sequence using the word embedding vector and the position embedding vector.
(3) Inputting the sentence vector into the feature extraction layer to obtain a sentence packet representation, wherein the step is the same as the step when the Bert pre-training model carries out negative training on the text sentence packet, and specifically comprises the following steps:
a) The sentence vectors are input into the hidden layers to obtain the relation matrix and the hidden vectors of the improved text sentence packet. As in the negative training stage, the relation vector corresponding to the i-th entity pair is the difference between the tail entity vector and the head entity vector, and the relation vectors form the relation vector matrix; more nonlinear information is captured by the linear transformation of the head and tail entity vectors with the activation function Tanh, where $w_1$ is the relation weight matrix and $c_1$ is the bias variable (the primed symbols for these quantities are given only as images in the original).
b) The relation matrix and the hidden vectors are input into the relation attention layer to obtain the attention weight coefficient of the relation matrix. Specifically, the relation vector in the relation matrix and the hidden vector output by the last hidden layer of the BERT pre-training model are used to obtain the attention weight coefficient, where n is the number of sentences.
c) The hidden vector of the last layer and the attention weight coefficient are weighted and summed to obtain the hidden vector weighted representation.
d) The relation vector and the hidden vector weighted representation are concatenated to obtain the sentence representation.
e) The sentence representations of the text sentence packet are weighted and summed to obtain the sentence packet representation of the text sentence packet, where the weight of each sentence representation is its similarity to the relation representation $r$ of the text sentence packet.
(4) The sentence packet representation is input into the output layer to obtain the probability distribution of the text sentence packet.
In this embodiment, the output layer of the BERT pre-training model applies Softmax, so when the sentence packet representation of the optimized text sentence packet is input into the output layer, the probability distribution $P'(r)$ of the relation representation $r$ is obtained, with $P'(r) = \mathrm{Softmax}(W_r B + d_r)$, where $W_r$ is the relation weight matrix, $d_r$ is the bias factor, and $B$ here denotes the sentence packet representation of the optimized text sentence packet.
(5) The forward training cross entropy loss function is calculated from the probability distribution of the text sentence packet, and the text sentence packet is classified according to the forward training cross entropy loss function and the probability distribution of the text sentence packet to obtain the classification result of the text sentence packet.
In this embodiment, the cross entropy loss function of positive training is $L_{PT}(g, y^*)$: for a given labeled input $s$ with $y^* \in R = \{1, 2, 3, \ldots, C\}$, $p = g(s)$ is the probability vector produced by the relation classifier $g(s)$ for the label relation $y^*$ of the sentence, and $p_k$ is the predicted probability of the k-th label. As $L_{PT}(g, y^*)$ decreases, $p_k \to 1$; therefore, when the positive training model satisfies the convergence condition, the label $y^*$ is the label of the sentence $s$ in the optimized text sentence packet, and the text sentence packets are then classified according to their probability distributions to obtain the classification results of the text sentence packets.
Based on the text relation extraction method disclosed in this embodiment, this embodiment further provides a text relation extraction apparatus, which comprises an acquisition module, a noise reduction module, and a classification module.
And the acquisition module is used for acquiring the text sentence packet.
And the noise reduction module is used for carrying out negative training on the text sentence packet by utilizing the Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet.
In this embodiment, the noise reduction module includes a negative training unit, a dividing unit, a re-labeling unit, an updating unit, and a first determining unit, wherein: (1) the negative training unit performs negative training on the text sentence packet with the BERT pre-training model; (2) the dividing unit divides the text sentence packet into a clean sentence packet and a noise sentence packet according to the result of the negative training unit; (3) the re-labeling unit re-labels the noise sentence packet; (4) the updating unit obtains the improved text sentence packet from the clean sentence packet and the re-labeled noise sentence packet; (5) the first determining unit determines the improved text sentence packet obtained when the BERT pre-training model satisfies the convergence condition as the optimized text sentence packet.
Moreover, the noise reduction module performs negative training on the text sentence packet by using the Bert pre-training model, divides the text sentence packet into a clean sentence packet and a noise sentence packet, and performs re-labeling on the noise sentence packet to obtain an optimized text sentence packet, which comprises the following specific processes:
The negative training unit first inputs the text sentence packet into the input layer to obtain a sentence sequence; then inputs the sentence sequence output by the input layer into the embedding layer to obtain sentence vectors; then inputs the sentence vectors output by the embedding layer into the feature extraction layer to obtain a sentence packet representation; then inputs the sentence packet representation output by the feature extraction layer into the output layer to obtain the probability distribution of the text sentence packet; and then calculates the negative training cross entropy loss function from the probability distribution of the text sentence packet and judges whether the BERT pre-training model satisfies the convergence condition according to the negative training cross entropy loss function. When the BERT pre-training model does not satisfy the convergence condition, the dividing unit divides the text sentence packet into a clean sentence packet and a noise sentence packet; the re-labeling unit then re-labels the noise sentence packet in an unsupervised clustering manner; and finally the updating unit obtains an improved text sentence packet from the clean sentence packet and the re-labeled noise sentence packet and inputs the improved text sentence packet back into the negative training unit, until the BERT pre-training model satisfies the convergence condition, at which point the first determining unit determines the improved text sentence packet as the optimized text sentence packet.
And the classification module is used for positively training the optimized text sentence packet by utilizing the BERT pre-training model to obtain a classification result of the text sentence packet. In this embodiment, the classification module includes a forward training unit and a second determining unit, the forward training unit performs forward training on the optimized text sentence packet by using a BERT pre-training model, and the second determining unit is configured to obtain a classification result of the text sentence packet according to an output result of the forward training unit.
And the classification module is used for positively training the optimized text sentence packet by using the BERT pre-training model, and the specific process of obtaining the classification result of the text sentence packet is as follows:
The positive training unit inputs the optimized text sentence packet into the input layer to obtain a sentence sequence; then inputs the sentence sequence output by the input layer into the embedding layer to obtain sentence vectors; then inputs the sentence vectors output by the embedding layer into the feature extraction layer to obtain a sentence packet representation; then inputs the sentence packet representation output by the feature extraction layer into the output layer to obtain the probability distribution of the text sentence packet; and then calculates the forward training cross entropy loss function from the probability distribution of the text sentence packet. The second determining unit then classifies the text sentence packets according to the forward training cross entropy loss function and the probability distribution of the text sentence packets to obtain the classification result of the text sentence packets.
The present embodiment also provides a text relation extraction device, including a memory, a processor, and a computer program, where the computer program is stored in the memory and configured to be executed by the processor to implement the text relation extraction method of the present embodiment.
The present embodiment also provides a computer-readable storage medium, which is characterized in that a computer program is stored thereon, and the computer program is executed by a processor to implement the text relation extraction method of the present embodiment.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, to those skilled in the art, changes and modifications may be made without departing from the spirit of the present invention, and it is intended that the present invention encompass such changes and modifications.

Claims (10)

1. A text relation extraction method is characterized by comprising the following steps:
s1: acquiring a text sentence packet;
s2: carrying out negative training on the text sentence packet by using a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet;
s3: performing forward training on the optimized text sentence packet with the BERT pre-training model to obtain the classification result of the text sentence packet.
2. The remote supervised relationship extraction method of claim 1, wherein the Bert pre-trained model comprises an input layer, an embedding layer, a feature extraction layer and an output layer, and step S2 comprises the sub-steps of:
s21: inputting the text sentence packet into an input layer to obtain a sentence sequence;
s22: inputting the sentence sequence into the embedding layer to obtain a sentence vector;
s23: inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation;
s24: inputting the sentence packet representation into the output layer to obtain the probability distribution of the text sentence packet;
s25: calculating a negative training cross entropy loss function by using the probability distribution of the text sentence packet, and judging whether the Bert pre-training model meets the convergence condition or not according to the negative training cross entropy loss function; when the Bert pre-training model does not meet the convergence condition, dividing the text sentence packet into a clean sentence packet and a noise sentence packet by using the probability distribution of the text sentence packet, and re-labeling the noise sentence packet to obtain an improved text sentence packet;
s26: and repeating the steps S21-S25 until the Bert pre-training model meets the convergence condition, stopping iteration, and determining the improved text sentence packet when the Bert pre-training model meets the convergence condition as the optimized text sentence packet.
3. The remote supervised relationship extraction method as claimed in claim 2, wherein the embedding layer includes a word embedding vector and a position embedding vector, and the step S22 is to obtain a sentence vector of the sentence sequence by using the word embedding vector and the position embedding vector.
4. The remote supervised relationship extraction method as claimed in claim 2, wherein the feature extraction layer includes a hidden layer and a relationship attention layer, and the step S23 is as follows: inputting sentence vectors into a hidden layer to obtain a relation matrix and hidden vectors of a text sentence packet; inputting the relation matrix and the hidden vector into a relation attention layer to obtain an attention weight coefficient of the relation matrix; carrying out weighted summation on the hidden vector and the attention weight coefficient of the last hidden layer to obtain a hidden vector weighted representation; cascading the relation vector and the hidden vector weighting representation to obtain a sentence representation of the text sentence packet; and carrying out weighted summation on the sentence representations of the text sentence packet to obtain the sentence packet representation of the text sentence packet.
5. The method for extracting remote supervised relationship as recited in claim 2, wherein the step S25, re-labeling the noise packets is performed by using unsupervised clustering, and includes the following sub-steps:
s251: acquiring a text label set, and processing the text label set by using an unsupervised clustering model to obtain an initial clustering center;
s252: computing the initial feature vector of the noise sentence packet from the hidden vector representation output for the noise sentence packet by the last layer of the feature extraction layer;
s253: calculating the similarity between the initial feature vector and the initial cluster centers by using the Student's t-distribution;
s254: calculating the KL divergence from the similarity between the initial feature vector and the initial cluster centers, judging whether the unsupervised clustering model satisfies the convergence condition according to the KL divergence, updating the initial feature vector and the initial cluster centers when the unsupervised clustering model does not satisfy the convergence condition, and repeating steps S253-S254 until the unsupervised clustering model satisfies the convergence condition, at which point the iteration stops and the optimized feature vector and optimized cluster centers are obtained;
s255: calculating the similarity between the optimized feature vector and the optimized cluster centers by using the Student's t-distribution, and determining the label of the optimized cluster center whose similarity is greater than the similarity threshold as the label of the noise sentence packet.
6. The remote supervised relationship extraction method as recited in claim 2, wherein in step S25, the text sentence packets are divided into clean sentence packets and noise sentence packets by using the probability distribution of the text sentence packets, and the text sentence packets with probability values greater than or equal to the probability threshold are determined as the clean sentence packets; and determining the text sentence packet with the probability value smaller than the probability threshold value as a noise sentence packet.
7. The text relation extraction method according to any one of claims 2 to 6, wherein step S3 is:
s31: inputting the optimized text sentence packet into an input layer to obtain a sentence sequence;
s32: inputting the sentence sequence into the embedding layer to obtain a sentence vector;
s33: inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation;
s34: inputting the sentence packet representation into the output layer to obtain the probability distribution of the text sentence packet;
s35: and calculating a forward training cross entropy loss function by using the probability distribution of the text sentence packet, and classifying the text sentence packet according to the forward training cross entropy loss function and the probability distribution of the text sentence packet to obtain a classification result of the text sentence packet.
8. A text relation extracting apparatus, comprising:
the acquisition module is used for acquiring the text sentence packet;
the noise reduction module is used for carrying out negative training on the text sentence packet by utilizing a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet;
and the classification module is used for positively training the optimized text sentence packet by utilizing the BERT pre-training model to obtain a classification result of the text sentence packet.
9. A textual relationship extraction device, comprising a memory, a processor, and a computer program stored in the memory and configured for execution by the processor to implement the textual relationship extraction method of any of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program for execution by a processor to implement the textual relationship extraction method of any of claims 1-7.
CN202210565045.8A 2022-05-23 2022-05-23 Text relation extraction method, device, equipment and computer storage medium Pending CN114896402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210565045.8A CN114896402A (en) 2022-05-23 2022-05-23 Text relation extraction method, device, equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210565045.8A CN114896402A (en) 2022-05-23 2022-05-23 Text relation extraction method, device, equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN114896402A true CN114896402A (en) 2022-08-12

Family

ID=82723388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210565045.8A Pending CN114896402A (en) 2022-05-23 2022-05-23 Text relation extraction method, device, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN114896402A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071077A (en) * 2023-03-06 2023-05-05 深圳市迪博企业风险管理技术有限公司 Risk assessment and identification method and device for illegal account

Similar Documents

Publication Publication Date Title
WO2020073714A1 (en) Training sample obtaining method, account prediction method, and corresponding devices
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN112883153B (en) Relationship classification method and device based on information enhancement BERT
CN116432655B (en) Method and device for identifying named entities with few samples based on language knowledge learning
CN113723083A (en) Weighted negative supervision text emotion analysis method based on BERT model
CN115641529A (en) Weak supervision time sequence behavior detection method based on context modeling and background suppression
CN115659947A (en) Multi-item selection answering method and system based on machine reading understanding and text summarization
CN117557886A (en) Noise-containing tag image recognition method and system integrating bias tags and passive learning
CN113505120B (en) Double-stage noise cleaning method for large-scale face data set
CN114896402A (en) Text relation extraction method, device, equipment and computer storage medium
CN118012776A (en) Software defect prediction method and system based on generation of countermeasure and pretraining model
CN113254429B (en) BERT and MLM-based noise reduction method for remote supervision relation extraction
CN114626461A (en) Cross-domain target detection method based on domain self-adaptation
CN114416991A (en) Method and system for analyzing text emotion reason based on prompt
CN115827871A (en) Internet enterprise classification method, device and system
CN115713082A (en) Named entity identification method, device, equipment and storage medium
CN114357166A (en) Text classification method based on deep learning
CN114547264A (en) News diagram data identification method based on Mahalanobis distance and comparison learning
CN114021658A (en) Training method, application method and system of named entity recognition model
CN113988194A (en) Multi-label text classification method and system
CN113592045A (en) Model adaptive text recognition method and system from printed form to handwritten form
WO2010076386A2 (en) Method for a pattern discovery and recognition
CN112434516B (en) Self-adaptive comment emotion analysis system and method for merging text information
CN116976351B (en) Language model construction method based on subject entity and subject entity recognition device
CN118536049B (en) Content main body discovery method based on multi-mode abnormal content understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination