CN114896402A - Text relation extraction method, device, equipment and computer storage medium - Google Patents

Text relation extraction method, device, equipment and computer storage medium

Info

Publication number
CN114896402A
CN114896402A (application CN202210565045.8A)
Authority
CN
China
Prior art keywords
sentence
packet
text
sentence packet
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210565045.8A
Other languages
Chinese (zh)
Inventor
曾碧卿
李砚龙
邓会敏
丁明浩
蔡剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGDONG AIB POLYTECHNIC COLLEGE
South China Normal University
Original Assignee
GUANGDONG AIB POLYTECHNIC COLLEGE
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GUANGDONG AIB POLYTECHNIC COLLEGE, South China Normal University filed Critical GUANGDONG AIB POLYTECHNIC COLLEGE
Priority to CN202210565045.8A priority Critical patent/CN114896402A/en
Publication of CN114896402A publication Critical patent/CN114896402A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a text relation extraction method, apparatus, device and computer storage medium. The text relation extraction method comprises: obtaining a text sentence packet; performing negative training on the text sentence packet with a BERT pre-training model, dividing the text sentence packet into clean sentence packets and noise sentence packets, and re-labeling the noise sentence packets to obtain an optimized text sentence packet; and performing positive (forward) training on the optimized text sentence packet with the BERT pre-training model to obtain the classification result of the text sentence packet. In the text relation extraction method provided by the invention, negative training of the text sentence packet with the BERT pre-training model identifies the noise sentence packets, which are then re-labeled; this significantly reduces the noise in the text sentence packet and helps improve the text relation extraction effect.

Description

Text relation extraction method, device, equipment and computer storage medium
Technical Field
The present invention relates to the field of text relationship extraction technologies, and in particular, to a text relationship extraction method, apparatus, device, and computer storage medium.
Background
Information extraction is a major task of natural language processing. It aims to extract structured information from unstructured text and plays an important role in the construction and expansion of knowledge graphs and the structuring of knowledge bases.
Relation extraction is an important step of information extraction. Its goal is to classify the relation between entity pairs in sentences, and it is important for knowledge-base question answering, knowledge-base construction, text summarization, and the like. Remote supervision (distant supervision) obtains training data by aligning a corpus with a knowledge base: if a certain entity pair exists in the knowledge base, all sentences in the corpus that contain the entity pair are labeled with the relation of that entity pair in the knowledge base. Remote supervision is an effective method for automatically labeling large-scale training data, but because this assumption is too loose, a large amount of noise is produced when labeling the data, and the noisy data seriously affects relation extraction performance, so noise reduction is required for remote supervision relation extraction.
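The distant-supervision labeling assumption can be illustrated with a short, hedged sketch; the kb and corpus structures and all names below are illustrative assumptions rather than part of the invention.

```python
# Minimal sketch of distant-supervision labeling: every corpus sentence that contains
# a knowledge-base entity pair inherits that pair's relation label.
# The kb/corpus structures and field names here are illustrative assumptions.
from collections import defaultdict

kb = {("Barack Obama", "Hawaii"): "born_in"}           # entity pair -> relation
corpus = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama visited Hawaii last year.",           # noisy: not really 'born_in'
]

def distant_supervision(kb, corpus):
    bags = defaultdict(list)                            # (head, tail, relation) -> sentences
    for sentence in corpus:
        for (head, tail), relation in kb.items():
            if head in sentence and tail in sentence:
                bags[(head, tail, relation)].append(sentence)
    return bags

print(distant_supervision(kb, corpus))
```

The second sentence above shows how the loose assumption produces a wrongly labeled (noisy) instance, which is exactly the problem the noise reduction methods below target.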
At present, noise reduction for remote supervision relation extraction is mainly performed with rule statistics methods, multi-instance learning methods, and adversarial and reinforcement learning methods, wherein:
(1) The rule statistics method belongs to the category of machine learning and aims to find templates or rules for the relations between entity pairs; it mainly includes probabilistic graphical models, matrix completion, kernel methods, dependency relations, and the like. However, this method relies on a large amount of feature engineering, which not only consumes manpower and time but also suffers from poor classification effect and weak generalization capability.
(2) The multi-instance learning method is one of the main noise reduction approaches for remote supervision relation extraction: aligned texts with the same entity pair are combined into a multi-instance bag (sentence packet for short), and prediction is performed with the sentence packet as the unit. According to the prediction strategy, multi-instance learning can be divided into wrong-label prediction, the at-least-one assumption, and attention mechanisms, wherein: a) wrong-label prediction judges whether a label is wrong by computing the correlation between the label of the sentence packet and the label of each sentence in the packet; b) the at-least-one assumption holds that at least one sentence in the remotely supervised aligned text expresses the real entity pair relation, so only one sentence in the sentence packet needs to be selected for prediction each time; c) the attention mechanism assigns a weight to each sentence in the sentence packet: when a sentence is noise, a lower weight reduces its influence on the classification of the sentence packet, and when it is not noise, a higher weight emphasizes its importance; that is, weighting strengthens correctly labeled data and weakens wrongly labeled data. Compared with the rule statistics method, multi-instance learning avoids the error propagation risk brought by feature engineering, giving the model better generalization capability.
However, compared with the rule statistics method, multi-instance learning has higher time and space complexity, and its model training faces the following problems: 1) multi-instance learning cannot handle the case in which all sentences in a sentence packet are noise (a fully noisy sentence packet), because if a certain entity pair is noise for all of its aligned texts, the multi-instance learning framework still assigns a larger weight to at least one sentence, i.e. it assumes that one sentence must not be noise, and therefore cannot handle the all-noise case; 2) prediction at the sentence-packet level does not map the relation between sentences and labels well, and while the relation is being predicted, the correspondence between the packet label and the sentence labels may remain unspecified, which increases the prediction difficulty; 3) the soft strategy based mainly on the attention mechanism tolerates part of the noise during model training, so the model learns some wrong mapping relations, which can harm the overall classification effect.
(3) The main objective of adversarial and reinforcement learning methods is to improve the quality of the overall sample so that the model learns more accurate entity-pair relations, which further improves the generalization ability and robustness of the trained model; adversarial and reinforcement learning can therefore improve corpus quality well. However, these methods require joint training of multiple models, and suffer from high training difficulty, poor stability, high time and space complexity, and difficulty in industrial deployment.
Therefore, existing relation extraction methods suffer from a poor noise reduction effect.
Disclosure of Invention
Based on this, the present invention aims to provide a text relation extraction method, apparatus, device and computer storage medium, which reduces the noise of the text and improves the extraction effect.
A text relation extraction method comprises the following steps:
s1: acquiring a text sentence packet;
s2: carrying out negative training on the text sentence packet by using a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet;
s3: performing forward (positive) training on the optimized text sentence packet with the BERT pre-training model to obtain the classification result of the text sentence packet.
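Read as a procedure, steps S1-S3 amount to the driver loop sketched below. The helpers are stubs standing in for the negative-training split, the clustering-based re-labeling, and the positive training described later; their names, signatures, and the iteration bound are assumptions.

```python
# Schematic of steps S1-S3 as a training driver. The helpers are stubs standing in
# for the negative-training split, clustering-based relabeling, and positive training
# described later in the disclosure; names and signatures are assumptions.
from typing import List, Tuple

def negative_train_step(model, bags) -> Tuple[list, list]:
    """One pass of negative training; returns (clean_bags, noisy_bags)."""
    raise NotImplementedError

def relabel_noise(noisy_bags) -> list:
    """Unsupervised-clustering re-labeling of noisy bags (see step S25)."""
    raise NotImplementedError

def positive_train(model, bags) -> None:
    """Ordinary (positive) fine-tuning on the optimized bags (see step S3)."""
    raise NotImplementedError

def run_pipeline(model, bags, max_rounds: int = 5):
    # S2: alternate negative training and re-labeling until convergence (bounded here).
    for _ in range(max_rounds):
        clean, noisy = negative_train_step(model, bags)
        bags = clean + relabel_noise(noisy)
    # S3: positive training on the optimized sentence packets.
    positive_train(model, bags)
    return model
```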
Further, the Bert pre-trained model includes an input layer, an embedding layer, a feature extraction layer, and an output layer, and step S2 includes the following steps:
s21: inputting the text sentence packet into an input layer to obtain a sentence sequence;
s22: inputting the sentence sequence into the embedding layer to obtain a sentence vector;
s23: inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation;
s24: inputting the sentence packet representation into the output layer to obtain the probability distribution of the text sentence packet;
s25: calculating a negative training cross entropy loss function by using the probability distribution of the text sentence packet, and judging whether the Bert pre-training model meets the convergence condition or not according to the negative training cross entropy loss function; when the Bert pre-training model does not meet the convergence condition, dividing the text sentence packet into a clean sentence packet and a noise sentence packet by using the probability distribution of the text sentence packet, and re-labeling the noise sentence packet to obtain an improved text sentence packet;
s26: and repeating the steps S21-S25 until the Bert pre-training model meets the convergence condition, stopping iteration, and determining the improved text sentence packet when the Bert pre-training model meets the convergence condition as the optimized text sentence packet.
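One way to picture one pass of steps S21-S25 is to score each sentence packet with the model and split the set by a probability threshold, as in the hedged PyTorch sketch below; the model interface, tensor shapes, and the threshold value of 0.5 are assumptions.

```python
# Hedged sketch of one S21-S25 pass: score each bag, then split clean vs. noisy by a
# probability threshold (see S25/S26). Model, data shapes and threshold are assumed;
# model(bag) is assumed to return a 1-D logits vector over the relation classes.
import torch

def split_bags_by_probability(model, bags, labels, threshold=0.5):
    """bags: list of pre-encoded bag tensors; labels: list of int relation ids."""
    clean, noisy = [], []
    model.eval()
    with torch.no_grad():
        for bag, label in zip(bags, labels):
            probs = torch.softmax(model(bag), dim=-1)   # S24: output-layer distribution
            if probs[label] >= threshold:               # S25: high probability -> clean
                clean.append((bag, label))
            else:                                       # low probability -> noisy, to be re-labeled
                noisy.append((bag, label))
    return clean, noisy
```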
Further, the embedding layer includes a word embedding vector and a position embedding vector, and step S22 is to obtain a sentence vector of the sentence sequence by using the word embedding vector and the position embedding vector.
Further, the feature extraction layer includes a hidden layer and a relation attention layer, and step S23 is: inputting sentence vectors into a hidden layer to obtain a relation matrix and hidden vectors of a text sentence packet; inputting the relation matrix and the hidden vector into a relation attention layer to obtain an attention weight coefficient of the relation matrix; carrying out weighted summation on the hidden vector and the attention weight coefficient of the last hidden layer to obtain a hidden vector weighted representation; cascading the relation vector and the hidden vector weighting representation to obtain a sentence representation of the text sentence packet; and carrying out weighted summation on the sentence representations of the text sentence packet to obtain the sentence packet representation of the text sentence packet.
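The feature-extraction computation just described can be pictured with the following PyTorch sketch (relation vector from head/tail entity vectors, relation attention over the last hidden layer, concatenation into sentence representations, and an attention-weighted sentence packet representation). The attention expressions appear in the patent only as equation images, so the softmax forms, the rel_query parameter, and all dimensions below are assumptions, not the patent's exact formulas.

```python
# Hedged sketch of the feature extraction layer: relation vectors, relation attention,
# sentence representations, and an attention-weighted sentence-packet representation.
# Dimensions and the softmax form of the attention weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAttentionBag(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.w1 = nn.Linear(hidden_size, hidden_size)                 # relation weight matrix w_1 (+ bias c_1)
        self.rel_query = nn.Parameter(torch.randn(2 * hidden_size))   # relation representation r (assumed form)

    def forward(self, head_vecs, tail_vecs, last_hidden):
        # head_vecs, tail_vecs, last_hidden: (n_sentences, hidden_size)
        rel = torch.tanh(self.w1(tail_vecs - head_vecs))              # l_i = Tanh(w_1 (t_i - h_i) + c_1)
        attn = F.softmax((rel * last_hidden).sum(-1), dim=0)          # assumed form of the attention weights
        h_weighted = (attn.unsqueeze(-1) * last_hidden).sum(0)        # h'_L: weighted sum of last-layer hidden vectors
        sent_repr = torch.cat([rel, h_weighted.expand_as(rel)], -1)   # s_i = [l_i ; h'_L]
        bag_attn = F.softmax(sent_repr @ self.rel_query, dim=0)       # similarity of s_i to relation representation r
        return (bag_attn.unsqueeze(-1) * sent_repr).sum(0)            # B = sum_i alpha_i s_i
```

For example, RelationAttentionBag(768) applied to three tensors of shape (n_sentences, 768) returns a single bag vector of size 1536.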
Further, in step S25, the re-labeling of the noise sentence packets is performed by using an unsupervised clustering method, and includes the following sub-steps:
s251: acquiring a text label set, and processing the text label set by using an unsupervised clustering model to obtain an initial clustering center;
s252: computing the initial feature vector of the noise sentence packet from the hidden vector representation output for the noise sentence packet by the last layer of the feature extraction layer;
s253: calculating the similarity between the initial feature vector and the initial cluster centers by using the Student's t-distribution;
s254: calculating the KL divergence from the similarity between the initial feature vector and the initial cluster centers, judging whether the unsupervised clustering model satisfies the convergence condition according to the KL divergence, updating the initial feature vector and the initial cluster centers when the unsupervised clustering model does not satisfy the convergence condition, and repeating steps S253-S254 until the unsupervised clustering model satisfies the convergence condition, at which point the iteration stops and the optimized feature vector and optimized cluster centers are obtained;
s255: calculating the similarity between the optimized feature vector and the optimized cluster centers by using the Student's t-distribution, and determining the label of the optimized cluster center whose similarity is greater than the similarity threshold as the label of the noise sentence packet.
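Steps S251-S255 resemble deep-embedding-style clustering with a Student's t kernel and a KL objective. The sketch below illustrates the similarity computation, the KL loss, and threshold-based re-labeling; the kernel form, the target distribution, and the threshold value are assumptions consistent with the description rather than the patent's image-only formulas.

```python
# Hedged sketch of S251-S255: Student's-t similarity between noisy-bag feature vectors
# and cluster centers, a KL-divergence objective, and threshold-based re-labeling.
# The exact kernel/target-distribution forms in the patent appear only as images;
# the DEC-style forms used here are assumptions.
import torch
import torch.nn.functional as F

def student_t_similarity(features, centers, alpha: float = 1.0):
    # features: (n_bags, d); centers: (k, d)  ->  q: (n_bags, k)
    dist_sq = torch.cdist(features, centers).pow(2)
    q = (1.0 + dist_sq / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def kl_clustering_loss(q):
    # Assumed sharpened target distribution p, then KL(p || q) as the clustering objective.
    weight = q.pow(2) / q.sum(dim=0, keepdim=True)
    p = weight / weight.sum(dim=1, keepdim=True)
    return F.kl_div(q.log(), p, reduction="batchmean")

def relabel(features, centers, center_labels, sim_threshold: float = 0.5):
    q = student_t_similarity(features, centers)
    best_sim, best_idx = q.max(dim=1)
    # Only re-label bags whose best similarity clears the threshold (S255).
    return [center_labels[i] if s >= sim_threshold else None
            for s, i in zip(best_sim.tolist(), best_idx.tolist())]
```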
Further, in step S25, the text sentence packets are divided into clean sentence packets and noise sentence packets by using the probability distribution of the text sentence packets: a text sentence packet whose probability value is greater than or equal to the probability threshold is determined to be a clean sentence packet, and a text sentence packet whose probability value is smaller than the probability threshold is determined to be a noise sentence packet.
Further, step S3 is:
s31: inputting the optimized text sentence packet into an input layer to obtain a sentence sequence;
s32: inputting the sentence sequence into the embedding layer to obtain a sentence vector;
s33: inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation;
s34: inputting the sentence packet representation into the output layer to obtain the probability distribution of the text sentence packet;
s35: and calculating a forward training cross entropy loss function by using the probability distribution of the text sentence packet, and classifying the text sentence packet according to the forward training cross entropy loss function and the probability distribution of the text sentence packet to obtain a classification result of the text sentence packet.
The invention also provides a text relation extracting device, which comprises:
the acquisition module is used for acquiring the text sentence packet;
the noise reduction module is used for carrying out negative training on the text sentence packet by utilizing a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet;
and the classification module is used for positively training the optimized text sentence packet by utilizing the BERT pre-training model to obtain a classification result of the text sentence packet.
The invention also provides a text relation extraction device comprising a memory, a processor and a computer program, wherein the computer program is stored in the memory and is configured to be executed by the processor to realize the text relation extraction method.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which is executed by a processor to implement the text relation extracting method according to the present invention.
For a better understanding and practice, the present invention is described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a remote supervised relationship extraction method of the present invention;
FIG. 2 is a diagram of a Bert pre-training model in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that the embodiments described are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the embodiments in the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims. In the description of the present application, it is to be understood that the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not necessarily used to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
It is to be understood that the embodiments of the present application are not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the embodiments of the present application is limited only by the following claims.
As shown in fig. 1, the present invention provides a remote supervised relationship extraction method, which includes the following steps:
s1: and acquiring the text sentence packet.
In this embodiment, the text sentence packet is a set of sentences that share the same entity pair (head entity, tail entity). Let $b_i$ denote a sentence; the text sentence packet $B_n$ is then expressed as $B_n = \{b_1, b_2, \ldots, b_n\}$, where n is the number of sentences in the text sentence packet.
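As a concrete illustration of the sentence packet $B_n$ defined above, a minimal data structure could look like the following; the dataclass and its field names are illustrative assumptions.

```python
# Illustrative representation of a text sentence packet B_n = {b_1, ..., b_n}:
# all sentences share the same (head entity, tail entity) pair and one bag-level label.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SentencePacket:
    head: str                                            # head entity
    tail: str                                            # tail entity
    relation: str                                        # distantly supervised bag label
    sentences: List[str] = field(default_factory=list)   # b_1 ... b_n

bag = SentencePacket("Barack Obama", "Hawaii", "born_in",
                     ["Barack Obama was born in Hawaii.",
                      "Barack Obama visited Hawaii last year."])
print(len(bag.sentences))  # n, the number of sentences in the packet
```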
S2: and carrying out negative training on the text sentence packet by using a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet.
Negative training (NT) trains the model on the premise that the input sentence does not belong to its complementary label. Negative training not only provides less noisy supervision, but also effectively separates clean sentence packets from noise sentence packets during training, which helps reduce the influence of noise sentence packets on text relation extraction.
The BERT pre-training model is a pre-trained language model, Bidirectional Encoder Representations from Transformers, obtained by training on a masked language model task and a next sentence prediction task. In this embodiment, as shown in fig. 2, the BERT pre-training model includes an input layer, an embedding layer, a feature extraction layer, and an output layer, and step S2 includes the following steps:
(1) The text sentence packet is input into the input layer to obtain a sentence sequence.
Converting the sentences in the text sentence packet into sentence sequences enables the BERT pre-training model to capture more efficiently the label relation and the unstructured relation implied by the entity pairs of the sentences. In the sentence sequence, [CLS] carries the sentence sequence information, [H-SEP] is the separator of the head entity, [T-SEP] is the separator of the tail entity, and [SEP] marks the end of the sequence. Moreover, so that the BERT pre-training model pays more attention to the entity pair and to the structural information of the possible relation types during feature extraction, this implementation uses Sub-tree Parse (STP) to process the sentences of the text sentence packet into an STP sequence, which is then spliced with the tokenized sentence sequence to obtain the final sentence sequence.
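To make the input-layer formatting concrete, the sketch below builds such a marked sequence for one sentence. It is a hedged illustration: the patent does not fully specify how [H-SEP]/[T-SEP] are inserted or how the STP fragment is spliced, and the helper build_input_sequence is an assumed name.

```python
# Hedged sketch of the input layer: build a marked token sequence for one sentence.
# How the patent actually inserts [H-SEP]/[T-SEP] and splices the sub-tree parse (STP)
# sequence is not fully specified, so the ordering here is an assumption.
def build_input_sequence(sentence: str, head: str, tail: str, stp_tokens=None):
    marked = sentence.replace(head, f"[H-SEP] {head} [H-SEP]") \
                     .replace(tail, f"[T-SEP] {tail} [T-SEP]")
    tokens = ["[CLS]"] + marked.split()
    if stp_tokens:                      # splice the STP sequence after the sentence tokens
        tokens += stp_tokens
    tokens.append("[SEP]")
    return tokens

print(build_input_sequence("Barack Obama was born in Hawaii .",
                           "Barack Obama", "Hawaii",
                           stp_tokens=["born", "in", "Hawaii"]))
```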
(2) And inputting the sentence sequence into the embedding layer to obtain a sentence vector.
In this embodiment, the embedding layer includes a word embedding vector and a position embedding vector. The word embedding vector uses BPE (Byte-Pair Encoding) tokenization, which decomposes words into finer-grained fragments so that the BERT pre-training model can extract useful fine-grained information from the sentence sequence. The position embedding vector is an important component of the attention mechanism in the BERT pre-training model: through position encoding, the model learns a unique position embedding to represent the position of each token in the input sequence. Therefore, after the sentence sequence is input into the embedding layer, the embedding layer obtains the sentence vector of the sentence sequence by using the word embedding vector and the position embedding vector.
(3) And inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation.
In this embodiment, the feature extraction layer includes hidden layers and a relation attention layer. The sentence vector is input into the feature extraction layer, and the sentence packet representation is obtained as follows:
a) The sentence vectors are input into the hidden layers to obtain the relation matrix and the hidden vectors of the text sentence packet. Let $h$ be the head entity vector and $t$ be the tail entity vector; the relation vector corresponding to the i-th entity pair is $l_i = t_i - h_i$, and the relation vectors form the relation matrix $L_i = \{l_1, l_2, \ldots, l_i\}$. Moreover, more nonlinear information can be captured by applying a linear transformation to the head and tail entity vectors together with the activation function Tanh, in which case the relation matrix becomes $L_i = \mathrm{Tanh}(w_1(t_i - h_i) + c_1)$, where $w_1$ is a relation weight matrix and $c_1$ is a bias variable. The relation matrix $L_i$ obtained in this way is a matrix of possible relations rather than a true relation matrix.
b) The relation matrix and the hidden vectors are input into the relation attention layer to obtain the attention weight coefficient of the relation matrix. Specifically, the relation vector $l_i$ in the relation matrix $L_i$ and the hidden vector $h_L$ output by the last hidden layer are used to obtain the attention weight coefficient $\alpha_r$ (its expression is given only as an image in the original), where $l_i$ is the relation vector of the relation matrix $L_i$, $h_{L_i}$ is the hidden vector output by the last hidden layer of the BERT pre-training model, and n is the number of sentences.
c) The hidden vector of the last layer and the attention weight coefficient are weighted and summed to obtain the hidden vector weighted representation. In this embodiment, let $h'_L$ denote the hidden vector weighted representation; then $h'_L = \sum_{i=1}^{n} \alpha_r\, h_{L_i}$, where $h_L$ is the hidden vector output by the last hidden layer.
d) The relation vector and the hidden vector weighted representation are concatenated to obtain the sentence representation of the text sentence packet. In this embodiment, let $s_i$ denote the sentence representation; then $s_i = [l_i; h'_L]$, where $l_i$ is the relation vector and $h'_L$ is the hidden vector weighted representation.
e) The sentence representations of the text sentence packet are weighted and summed to obtain the sentence packet representation of the text sentence packet. In this embodiment, let $B$ denote the sentence packet representation of the text sentence packet; then $B = \sum_{i=1}^{n} \alpha_i s_i$, where $\alpha_i$ is the similarity between the sentence representation $s_i$ and the relation representation $r$ of the text sentence packet (the expression for $\alpha_i$ is likewise given only as an image in the original).
(4) The sentence packet representation is input into the output layer to obtain the probability distribution of the text sentence packet.
In this embodiment, the output layer of the BERT pre-training model applies a Softmax function, so when the sentence packet representation $B$ is input into the output layer, the output layer outputs the probability distribution $P(r)$ of the relation representation $r$, with $P(r) = \mathrm{Softmax}(W_r B + d_r)$, where $W_r$ is a relation weight matrix and $d_r$ is a bias factor.
(5) Calculating a negative training cross entropy loss function by using the probability distribution of the text sentence packet, and judging whether the Bert pre-training model meets the convergence condition or not according to the negative training cross entropy loss function; and when the Bert pre-training model does not meet the convergence condition, dividing the text sentence packet into a clean sentence packet and a noise sentence packet by utilizing the probability distribution of the text sentence packet, and re-labeling the noise sentence packet to obtain an improved text sentence packet.
In this embodiment, the cross entropy loss function used when the BERT pre-training model is negatively trained is $L_{NT}(g, \bar{y}^*)$, where $\bar{y}^*$ is a complementary label relation generated by randomly sampling from the label relation space excluding the label relation $y^*$, i.e. $\bar{y}^* \in R \setminus \{y^*\}$ with $y^* \in R = \{1, 2, 3, \ldots, C\}$ (the exact expression of $L_{NT}$ appears only as an image in the original; a hedged sketch follows step (6) below). Because the negative training cross entropy loss function is designed to reduce the probability assigned to the complementary label $\bar{y}^*$, $p_k \to 0$ as the loss decreases; therefore, the negative training cross entropy loss function can be used to judge whether the BERT pre-training model satisfies the convergence condition. When the BERT pre-training model does not satisfy the convergence condition, the text sentence packets are divided into clean sentence packets and noise sentence packets by using the probability distribution $P(r)$ of the text sentence packets: specifically, a text sentence packet whose probability value is greater than or equal to the probability threshold is determined to be a clean sentence packet, and a text sentence packet whose probability value is smaller than the probability threshold is determined to be a noise sentence packet. The noise sentence packets are then re-labeled to obtain the improved text sentence packet, which consists of the re-labeled noise sentence packets and the clean sentence packets.
(6) Steps (1) to (5) are repeated until the BERT pre-training model satisfies the convergence condition, at which point the iteration stops and the improved text sentence packet obtained when the BERT pre-training model satisfies the convergence condition is determined to be the optimized text sentence packet.
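The behaviour described in step (5), namely that the loss drives the probability of a randomly sampled complementary label toward zero, matches the standard negative-training cross entropy. The sketch below uses that standard form as an assumption, since the patent gives the loss only as an equation image.

```python
# Hedged sketch of negative training: sample a complementary label (not the given one)
# and minimize -log(1 - p_complementary), so the probability of the complementary label
# is pushed toward 0. The exact loss in the patent is shown only as an image; this is
# the standard negative-training form, used here as an assumption.
import random
import torch

def negative_training_loss(logits, labels, num_classes):
    # logits: (batch, num_classes); labels: (batch,) distantly supervised labels y*
    probs = torch.softmax(logits, dim=-1)
    comp = torch.tensor([random.choice([c for c in range(num_classes) if c != y])
                         for y in labels.tolist()])
    p_comp = probs.gather(1, comp.unsqueeze(1)).squeeze(1)
    return -(torch.log(1.0 - p_comp + 1e-12)).mean()
```

During negative training this loss replaces the usual cross entropy; in the later positive training stage the ordinary cross entropy (which pushes the probability of the true label toward 1) is used instead.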
In this embodiment, the noise sentence packets are re-labeled in an unsupervised clustering manner, which includes the following sub-steps:
(1) A text label set is acquired and processed with the unsupervised clustering model to obtain the initial cluster centers $\{\mu_i\}_{K_c}$, where $K_c$ is the number of initial cluster centers.
(2) The initial feature vector of each noise sentence packet is computed from the hidden vector representation output for the noise sentence packet by the last layer of the feature extraction layer. Let $h''_j$ be the hidden vector output by the last hidden layer for a noise sentence, let $l''_i$ be the corresponding relation vector of the text label set, and let $C_{cluster}$ be the sentence representation of the noise sentence packet used as the initial feature vector in the unsupervised clustering model; then $C_{cluster} = H'' L''^{\top} + d$, where $L''$ is the relation matrix, $d$ is a bias coefficient, $H'' = \{h''_1, h''_2, \ldots, h''_j\}$ and $L'' = \{l''_1, l''_2, \ldots, l''_i\}$.
(3) The similarity between the initial feature vector and the initial cluster centers is calculated with the Student's t-distribution. Let $q_{ji}$ be the similarity between the sentence vector $c_j$ and the cluster center $\mu_i$, computed with the Student's t-distribution kernel whose degree-of-freedom parameter is $\alpha = 1$ (the full expression is given as an image in the original).
(4) The KL divergence is calculated from the similarity between the initial feature vector and the initial cluster centers. In this embodiment, let $p_{ji}$ denote the probability that sentence $S_j$ is assigned the relation label $l_i$ (its expression is likewise given as an image); the KL divergence is then $KL(P\,\|\,Q) = \sum_j \sum_i p_{ji} \log \frac{p_{ji}}{q_{ji}}$. Whether the unsupervised clustering model satisfies the convergence condition is judged from the KL divergence; when it does not, the initial feature vector and the initial cluster centers are updated and steps (2)-(4) are repeated until the unsupervised clustering model satisfies the convergence condition, at which point the iteration stops and the optimized feature vector and optimized cluster centers are obtained.
(5) The similarity between the optimized feature vector and the optimized cluster centers is calculated with the Student's t-distribution, and the label of the optimized cluster center whose similarity is greater than the similarity threshold is determined to be the label of the noise sentence packet.
In this embodiment, letting $q_{ji}$ be the similarity between the sentence vector $c_j$ and the cluster center $\mu_i$, $q_{ji}$ is compared with the similarity threshold, and the label of the optimized cluster center whose similarity is greater than the similarity threshold is determined as the label of the noise sentence packet, which completes the re-labeling of the noise sentence packet.
By negatively training the text sentence packet with the BERT pre-training model, the text relation extraction method disclosed by the invention can identify the noise sentence packets and the clean sentence packets in the text sentence packet; the noise sentence packets are optimized through clustering and re-labeling to obtain the improved text sentence packet, which is then negatively trained again with the BERT pre-training model until the convergence condition is satisfied and the iteration stops. Through this multi-iteration process, the number of noise sentence packets in the text sentence packet is greatly reduced, the cleanness of the text sentence packet data is improved, and the relation extraction effect for the text sentence packet is significantly improved.
S3: and (5) carrying out forward training on the optimized text sentence packet by using a BERT pre-training model to obtain a classification result of the text sentence packet.
Based on the concept that the input sentence belongs to its label, positive training of the optimized text sentence packet with the BERT pre-training model predicts the label relation of each sentence, from which the classification result of the text sentence packet is obtained.
In the present embodiment, step S3 is implemented as follows:
(1) The optimized text sentence packet is input into the input layer to obtain a sentence sequence.
As in the negative training stage, the sentences in the text sentence packet are converted into sentence sequences so that the BERT pre-training model can capture the label relation and the unstructured relation implied by the entity pairs of the sentences: [CLS] carries the sentence sequence information, [H-SEP] and [T-SEP] are the separators of the head and tail entities, [SEP] marks the end of the sequence, and the STP sequence obtained with Sub-tree Parse is spliced with the tokenized sentence sequence to obtain the final sentence sequence.
(2) Inputting the sentence sequence into the embedding layer to obtain a sentence vector
In this embodiment, after the sentence sequence is input to the embedding layer, the embedding layer obtains a sentence vector of the sentence sequence using the word embedding vector and the position embedding vector.
(3) Inputting the sentence vector into the feature extraction layer to obtain a sentence packet representation, wherein the step is the same as the step when the Bert pre-training model carries out negative training on the text sentence packet, and specifically comprises the following steps:
a) The sentence vectors are input into the hidden layers to obtain the relation matrix and the hidden vectors of the improved text sentence packet. As in the negative training stage, the relation vector corresponding to the i-th entity pair is the difference between the tail entity vector and the head entity vector, and the relation vectors form the relation vector matrix; more nonlinear information is captured by the linear transformation of the head and tail entity vectors with the activation function Tanh, where $w_1$ is the relation weight matrix and $c_1$ is the bias variable (the primed symbols for these quantities are given only as images in the original).
b) The relation matrix and the hidden vectors are input into the relation attention layer to obtain the attention weight coefficient of the relation matrix. Specifically, the relation vector in the relation matrix and the hidden vector output by the last hidden layer of the BERT pre-training model are used to obtain the attention weight coefficient, where n is the number of sentences.
c) The hidden vector of the last layer and the attention weight coefficient are weighted and summed to obtain the hidden vector weighted representation.
d) The relation vector and the hidden vector weighted representation are concatenated to obtain the sentence representation.
e) The sentence representations of the text sentence packet are weighted and summed to obtain the sentence packet representation of the text sentence packet, where the weight of each sentence representation is its similarity to the relation representation $r$ of the text sentence packet.
(4) The sentence packet representation is input into the output layer to obtain the probability distribution of the text sentence packet.
In this embodiment, the output layer of the BERT pre-training model applies Softmax, so when the sentence packet representation of the optimized text sentence packet is input into the output layer, the probability distribution $P'(r)$ of the relation representation $r$ is obtained, with $P'(r) = \mathrm{Softmax}(W_r B + d_r)$, where $W_r$ is the relation weight matrix, $d_r$ is the bias factor, and $B$ here denotes the sentence packet representation of the optimized text sentence packet.
(5) The forward training cross entropy loss function is calculated from the probability distribution of the text sentence packet, and the text sentence packet is classified according to the forward training cross entropy loss function and the probability distribution of the text sentence packet to obtain the classification result of the text sentence packet.
In this embodiment, the cross entropy loss function of positive training is $L_{PT}(g, y^*)$: for a given labeled input $s$ with $y^* \in R = \{1, 2, 3, \ldots, C\}$, $p = g(s)$ is the probability vector produced by the relation classifier $g(s)$ for the label relation $y^*$ of the sentence, and $p_k$ is the predicted probability of the k-th label. As $L_{PT}(g, y^*)$ decreases, $p_k \to 1$; therefore, when the positive training model satisfies the convergence condition, the label $y^*$ is the label of the sentence $s$ in the optimized text sentence packet, and the text sentence packets are then classified according to their probability distributions to obtain the classification results of the text sentence packets.
Based on the text relation extraction method disclosed in this embodiment, this embodiment further provides a text relation extraction apparatus, which comprises an acquisition module, a noise reduction module, and a classification module.
And the acquisition module is used for acquiring the text sentence packet.
And the noise reduction module is used for carrying out negative training on the text sentence packet by utilizing the Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet.
In this embodiment, the noise reduction module includes a negative training unit, a dividing unit, a re-labeling unit, an updating unit, and a first determining unit, wherein: (1) the negative training unit performs negative training on the text sentence packet with the BERT pre-training model; (2) the dividing unit divides the text sentence packet into a clean sentence packet and a noise sentence packet according to the result of the negative training unit; (3) the re-labeling unit re-labels the noise sentence packet; (4) the updating unit obtains the improved text sentence packet from the clean sentence packet and the re-labeled noise sentence packet; (5) the first determining unit determines the improved text sentence packet obtained when the BERT pre-training model satisfies the convergence condition as the optimized text sentence packet.
Moreover, the noise reduction module performs negative training on the text sentence packet by using the Bert pre-training model, divides the text sentence packet into a clean sentence packet and a noise sentence packet, and performs re-labeling on the noise sentence packet to obtain an optimized text sentence packet, which comprises the following specific processes:
The negative training unit first inputs the text sentence packet into the input layer to obtain a sentence sequence; then inputs the sentence sequence output by the input layer into the embedding layer to obtain sentence vectors; then inputs the sentence vectors output by the embedding layer into the feature extraction layer to obtain a sentence packet representation; then inputs the sentence packet representation output by the feature extraction layer into the output layer to obtain the probability distribution of the text sentence packet; and then calculates the negative training cross entropy loss function from the probability distribution of the text sentence packet and judges whether the BERT pre-training model satisfies the convergence condition according to the negative training cross entropy loss function. When the BERT pre-training model does not satisfy the convergence condition, the dividing unit divides the text sentence packet into a clean sentence packet and a noise sentence packet; the re-labeling unit then re-labels the noise sentence packet in an unsupervised clustering manner; and finally the updating unit obtains an improved text sentence packet from the clean sentence packet and the re-labeled noise sentence packet and inputs the improved text sentence packet back into the negative training unit, until the BERT pre-training model satisfies the convergence condition, at which point the first determining unit determines the improved text sentence packet as the optimized text sentence packet.
And the classification module is used for positively training the optimized text sentence packet by utilizing the BERT pre-training model to obtain a classification result of the text sentence packet. In this embodiment, the classification module includes a forward training unit and a second determining unit, the forward training unit performs forward training on the optimized text sentence packet by using a BERT pre-training model, and the second determining unit is configured to obtain a classification result of the text sentence packet according to an output result of the forward training unit.
And the classification module is used for positively training the optimized text sentence packet by using the BERT pre-training model, and the specific process of obtaining the classification result of the text sentence packet is as follows:
The positive training unit inputs the optimized text sentence packet into the input layer to obtain a sentence sequence; then inputs the sentence sequence output by the input layer into the embedding layer to obtain sentence vectors; then inputs the sentence vectors output by the embedding layer into the feature extraction layer to obtain a sentence packet representation; then inputs the sentence packet representation output by the feature extraction layer into the output layer to obtain the probability distribution of the text sentence packet; and then calculates the forward training cross entropy loss function from the probability distribution of the text sentence packet. The second determining unit then classifies the text sentence packets according to the forward training cross entropy loss function and the probability distribution of the text sentence packets to obtain the classification result of the text sentence packets.
The present embodiment also provides a text relation extraction device, including a memory, a processor, and a computer program, where the computer program is stored in the memory and configured to be executed by the processor to implement the text relation extraction method of the present embodiment.
The present embodiment also provides a computer-readable storage medium, which is characterized in that a computer program is stored thereon, and the computer program is executed by a processor to implement the text relation extraction method of the present embodiment.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, to those skilled in the art, changes and modifications may be made without departing from the spirit of the present invention, and it is intended that the present invention encompass such changes and modifications.

Claims (10)

1. A text relation extraction method is characterized by comprising the following steps:
s1: acquiring a text sentence packet;
s2: carrying out negative training on the text sentence packet by using a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet;
s3: performing forward training on the optimized text sentence packet with the BERT pre-training model to obtain the classification result of the text sentence packet.
2. The remote supervised relationship extraction method of claim 1, wherein the Bert pre-trained model comprises an input layer, an embedding layer, a feature extraction layer and an output layer, and step S2 comprises the sub-steps of:
s21: inputting the text sentence packet into an input layer to obtain a sentence sequence;
s22: inputting the sentence sequence into the embedding layer to obtain a sentence vector;
s23: inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation;
s24: inputting the sentence packet representation into the output layer to obtain the probability distribution of the text sentence packet;
s25: calculating a negative training cross entropy loss function by using the probability distribution of the text sentence packet, and judging whether the Bert pre-training model meets the convergence condition or not according to the negative training cross entropy loss function; when the Bert pre-training model does not meet the convergence condition, dividing the text sentence packet into a clean sentence packet and a noise sentence packet by using the probability distribution of the text sentence packet, and re-labeling the noise sentence packet to obtain an improved text sentence packet;
s26: and repeating the steps S21-S25 until the Bert pre-training model meets the convergence condition, stopping iteration, and determining the improved text sentence packet when the Bert pre-training model meets the convergence condition as the optimized text sentence packet.
3. The remote supervised relationship extraction method as claimed in claim 2, wherein the embedding layer includes a word embedding vector and a position embedding vector, and the step S22 is to obtain a sentence vector of the sentence sequence by using the word embedding vector and the position embedding vector.
4. The remote supervised relationship extraction method as claimed in claim 2, wherein the feature extraction layer includes a hidden layer and a relationship attention layer, and the step S23 is as follows: inputting sentence vectors into a hidden layer to obtain a relation matrix and hidden vectors of a text sentence packet; inputting the relation matrix and the hidden vector into a relation attention layer to obtain an attention weight coefficient of the relation matrix; carrying out weighted summation on the hidden vector and the attention weight coefficient of the last hidden layer to obtain a hidden vector weighted representation; cascading the relation vector and the hidden vector weighting representation to obtain a sentence representation of the text sentence packet; and carrying out weighted summation on the sentence representations of the text sentence packet to obtain the sentence packet representation of the text sentence packet.
5. The method for extracting remote supervised relationship as recited in claim 2, wherein the step S25, re-labeling the noise packets is performed by using unsupervised clustering, and includes the following sub-steps:
s251: acquiring a text label set, and processing the text label set by using an unsupervised clustering model to obtain an initial clustering center;
s252: computing the initial feature vector of the noise sentence packet from the hidden vector representation output for the noise sentence packet by the last layer of the feature extraction layer;
s253: calculating the similarity between the initial feature vector and the initial cluster centers by using the Student's t-distribution;
s254: calculating the KL divergence from the similarity between the initial feature vector and the initial cluster centers, judging whether the unsupervised clustering model satisfies the convergence condition according to the KL divergence, updating the initial feature vector and the initial cluster centers when the unsupervised clustering model does not satisfy the convergence condition, and repeating steps S253-S254 until the unsupervised clustering model satisfies the convergence condition, at which point the iteration stops and the optimized feature vector and optimized cluster centers are obtained;
s255: calculating the similarity between the optimized feature vector and the optimized cluster centers by using the Student's t-distribution, and determining the label of the optimized cluster center whose similarity is greater than the similarity threshold as the label of the noise sentence packet.
6. The remote supervised relationship extraction method as recited in claim 2, wherein in step S25, the text sentence packets are divided into clean sentence packets and noise sentence packets by using the probability distribution of the text sentence packets, and the text sentence packets with probability values greater than or equal to the probability threshold are determined as the clean sentence packets; and determining the text sentence packet with the probability value smaller than the probability threshold value as a noise sentence packet.
7. The text relation extraction method according to any one of claims 2 to 6, wherein step S3 is:
s31: inputting the optimized text sentence packet into an input layer to obtain a sentence sequence;
s32: inputting the sentence sequence into the embedding layer to obtain a sentence vector;
s33: inputting the sentence vectors into a feature extraction layer to obtain a sentence packet representation;
s34: inputting the sentence packet representation into the output layer to obtain the probability distribution of the text sentence packet;
s35: and calculating a forward training cross entropy loss function by using the probability distribution of the text sentence packet, and classifying the text sentence packet according to the forward training cross entropy loss function and the probability distribution of the text sentence packet to obtain a classification result of the text sentence packet.
8. A text relation extracting apparatus, comprising:
the acquisition module is used for acquiring the text sentence packet;
the noise reduction module is used for carrying out negative training on the text sentence packet by utilizing a Bert pre-training model, dividing the text sentence packet into a clean sentence packet and a noise sentence packet, and re-labeling the noise sentence packet to obtain an optimized text sentence packet;
and the classification module is used for positively training the optimized text sentence packet by utilizing the BERT pre-training model to obtain a classification result of the text sentence packet.
9. A textual relationship extraction device, comprising a memory, a processor, and a computer program stored in the memory and configured for execution by the processor to implement the textual relationship extraction method of any of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program for execution by a processor to implement the textual relationship extraction method of any of claims 1-7.
CN202210565045.8A 2022-05-23 2022-05-23 Text relation extraction method, device, equipment and computer storage medium Pending CN114896402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210565045.8A CN114896402A (en) 2022-05-23 2022-05-23 Text relation extraction method, device, equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210565045.8A CN114896402A (en) 2022-05-23 2022-05-23 Text relation extraction method, device, equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN114896402A true CN114896402A (en) 2022-08-12

Family

ID=82723388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210565045.8A Pending CN114896402A (en) 2022-05-23 2022-05-23 Text relation extraction method, device, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN114896402A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071077A (en) * 2023-03-06 2023-05-05 深圳市迪博企业风险管理技术有限公司 Risk assessment and identification method and device for illegal account

Similar Documents

Publication Publication Date Title
WO2020073714A1 (en) Training sample obtaining method, account prediction method, and corresponding devices
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN112883153B (en) Relationship classification method and device based on information enhancement BERT
CN116432655B (en) Method and device for identifying named entities with few samples based on language knowledge learning
CN113723083A (en) Weighted negative supervision text emotion analysis method based on BERT model
CN115641529A (en) Weak supervision time sequence behavior detection method based on context modeling and background suppression
CN115659947A (en) Multi-item selection answering method and system based on machine reading understanding and text summarization
CN117557886A (en) Noise-containing tag image recognition method and system integrating bias tags and passive learning
CN113505120B (en) Double-stage noise cleaning method for large-scale face data set
CN114896402A (en) Text relation extraction method, device, equipment and computer storage medium
CN118012776A (en) Software defect prediction method and system based on generation of countermeasure and pretraining model
CN113254429B (en) BERT and MLM-based noise reduction method for remote supervision relation extraction
CN114626461A (en) Cross-domain target detection method based on domain self-adaptation
CN114416991A (en) Method and system for analyzing text emotion reason based on prompt
CN115827871A (en) Internet enterprise classification method, device and system
CN115713082A (en) Named entity identification method, device, equipment and storage medium
CN114357166A (en) Text classification method based on deep learning
CN114547264A (en) News diagram data identification method based on Mahalanobis distance and comparison learning
CN114021658A (en) Training method, application method and system of named entity recognition model
CN113988194A (en) Multi-label text classification method and system
CN113592045A (en) Model adaptive text recognition method and system from printed form to handwritten form
WO2010076386A2 (en) Method for a pattern discovery and recognition
CN112434516B (en) Self-adaptive comment emotion analysis system and method for merging text information
CN116976351B (en) Language model construction method based on subject entity and subject entity recognition device
CN118536049B (en) Content main body discovery method based on multi-mode abnormal content understanding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination