CN114925693B - Multi-model fusion-based multivariate relation extraction method and extraction system - Google Patents


Info

Publication number
CN114925693B
CN114925693B (application CN202210009601.3A)
Authority
CN
China
Prior art keywords
relation
entity
vector
model
vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210009601.3A
Other languages
Chinese (zh)
Other versions
CN114925693A (en)
Inventor
蔡传宏
胡沛弦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaneng Guixin Trust Co ltd
Original Assignee
Huaneng Guixin Trust Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaneng Guixin Trust Co ltd filed Critical Huaneng Guixin Trust Co ltd
Priority to CN202210009601.3A
Publication of CN114925693A
Application granted
Publication of CN114925693B
Legal status: Active

Classifications

    • G06F40/295 Named entity recognition (G06F: electric digital data processing; G06F40/20 natural language analysis; G06F40/279 recognition of textual entities; G06F40/289 phrasal analysis)
    • G06F16/35 Clustering; Classification (G06F16/30 information retrieval of unstructured textual data)
    • G06F40/30 Semantic analysis
    • G06N3/045 Combinations of networks (G06N: computing arrangements based on specific computational models; G06N3/02 neural networks; G06N3/04 architecture, e.g. interconnection topology)
    • G06N3/047 Probabilistic or stochastic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a multivariate relation extraction method, an extraction system, computer equipment and a storage medium based on multi-model fusion. The multivariate relation extraction method of one embodiment comprises the following steps. S1: preprocessing an input text, extracting semantic features using a shared coding layer and outputting semantic feature vectors. S2: performing relation extraction on the semantic feature vectors with each depth relation model and outputting relation types, wherein each depth relation model comprises at least one relation submodel for relation extraction and a decision unit for deciding on and outputting the relation types. S3: aggregating the relation types output by the depth relation models using a relation aggregation unit and generating a relation type result. The embodiments provided by the invention can quickly identify entity relationships in unstructured documents, particularly unstructured documents in the financial field, thereby discovering the risks and opportunities in such documents, and have practical application value.

Description

Multi-model fusion-based multivariate relation extraction method and extraction system
Technical Field
The invention relates to the technical field of computer natural language processing, in particular to a multivariate relation extraction method based on multi-model fusion, a multivariate relation extraction system, a computer readable storage medium and computer equipment.
Background
With the further development of deep learning technology, computer natural language processing has become a popular technology applied to many aspects of daily life. Specifically, entity relation extraction (RE) extracts a pair of entities from a text and gives the relation between them. For example, Chinese patent application CN202110722239.X, "Extraction method of relationships between entities based on deep learning subway design field specifications", introduces a method that uses a certain output structure to capture the multiple relationships among the multiple entities present in a sentence, and masks off the portions of the sentence other than the entities using entity-based mask information. Meanwhile, the relative position information of the entities in the sentence is integrated into the attention computation, strengthening the attention information of every word in the sentence. For parameter selection, a preference process is developed over several hyperparameters such as the number of iterations, the learning rate, the number of BERT layers used for fine-tuning, the maximum length, the maximum entity distance and the maximum number of relations.
The relation extraction task of that method is built on named entity recognition and defines specific entity content for the subway design field, i.e., it performs designated content recognition only for subway-design-related content. Although this can improve relation extraction performance to a certain extent, the entity recognition generalizes poorly and cannot handle texts outside the subway design field. Meanwhile, the method only enhances relation recognition accuracy through edge distance information, so its relation recognition approach is limited to a single mode.
Chinese patent application CN202110564234.9, "A structured airport alarm processing method based on entity relationship extraction", introduces a method that includes: acquiring and preprocessing airport alarm data and outputting unstructured text data; constructing a dictionary, search rules and a meteorological element entity extraction model, all of which are used to extract meteorological element entities from the unstructured text data and generate a meteorological element entity set; establishing a meteorological element entity relation classification model with time and place as main objects, performing relation analysis on the meteorological element entities, determining the relation between each main object and the latest weather words, associating special meteorological element entities with the corresponding weather, and outputting association results according to preset rules; and performing normalization conversion of time and units on the output associated entities, finally outputting data in a structured format. This method uses BERT-LSTM-Softmax to extract entities, which makes entity extraction slow and inefficient.
In actual production and life, especially in fast-changing financial markets, a great deal of valuable information exists in unstructured texts such as policy documents, market opinion and company announcements. Extracting entity relationships from such documents and constructing a relationship graph between the entities is of great significance for understanding markets and discovering investment opportunities. Therefore, how to identify entity relationships in unstructured text has become an urgent problem for those skilled in the art.
Disclosure of Invention
In order to solve at least one of the above problems, a first embodiment of the present invention provides a multivariate relation extraction method based on multi-model fusion, including:
s1: preprocessing an input text, extracting semantic features by using a shared coding layer and outputting a semantic feature vector;
s2: respectively using each depth relation model to perform relation extraction on the semantic feature vectors and outputting relation types, wherein each depth relation model comprises at least one relation sub-model for relation extraction and a decision unit for deciding and outputting the relation types;
s3: aggregating the relation types output by the depth relation models using a relation aggregation unit and generating a relation type result.
For example, in some embodiments of the present application, the depth relationship model includes a first relationship submodel, the first relationship submodel includes a first enumerator, a first classifier, and a second classifier,
the S1 further includes:
s111: splitting an input text into a plurality of word input vectors according to characters, wherein the word input vectors comprise word vectors, paragraph vectors and position vectors;
s112: extracting semantic features by using a shared coding layer and outputting semantic feature vectors, wherein the semantic feature vectors comprise a first overall feature vector and a plurality of second semantic feature vectors;
the S2 further comprises:
s211: segment enumerating the input text using the first enumerator and outputting a plurality of first entity candidate segments;
s212: using the first classifier to perform operation according to a combination vector and the first overall feature vector of each first entity candidate segment and output a plurality of first entity vectors and a plurality of first context text feature vectors, wherein the combination vector comprises a text feature vector, a position vector and a width vector of the first entity candidate segment, and the first entity vectors comprise a text feature vector, a position vector and a width vector;
s213: and operating according to the plurality of first entity vectors and the plurality of first context text characteristic vectors respectively by using the second classifier, and outputting a first relation type.
For example, in a multivariate relationship extraction method provided in some embodiments of the present application, the S212 further includes:
s2121: using the first classifier to operate on the combination vector of each first entity candidate segment and the first overall feature vector to generate a first candidate entity;
s2122: judging whether the first candidate entity is an entity by using Softmax, and outputting a first entity vector if the first candidate entity is the entity;
the S213 further includes: and using the second classifier to classify pairwise relations according to two first entity vectors in the plurality of first entity vectors and the first context text feature vector between the two first entity vectors respectively and generate corresponding first relation types.
For example, in the multivariate relation extraction method provided in some embodiments of the present application, the loss function of the first relation submodel is a weighted average of the loss functions of the first classifier and the second classifier.
For example, in the multivariate relational extraction method provided in some embodiments herein, the depth relational model comprises a second relational submodel, the second relational submodel comprises a conditional random field model, a third classifier, and a fourth classifier,
the S2 further includes:
s221: performing path judgment according to the semantic feature vector by using the conditional random field model and outputting a plurality of identified second entity vectors;
s222: operating on the plurality of second entity vectors using the third classifier and generating a plurality of third entity vectors and a plurality of second context text feature vectors;
s223: and using the fourth classifier to perform pairwise relationship classification according to two third entity vectors in the plurality of third entity vectors, the second context text feature vector between the two third entity vectors, the position feature vector of the first third entity vector in the two third entity vectors and the distance feature vector between the two third entity vectors respectively, and generate a corresponding second relationship type.
For example, in some embodiments of the present application, the depth relation model includes a third relation submodel, the third relation submodel includes a fifth classifier and a relation extractor,
the S2 further includes:
s231: classifying and outputting a labeled fourth entity vector by using the fifth classifier according to the semantic feature vector, wherein the fourth entity vector comprises a relation subject identification vector, a relation object identification vector and an internal identification vector;
s232: and extracting by using the relation extractor according to the relation subject identification vector, the relation object identification vector and the internal identification vector to obtain a subject and an object of the relation, and outputting a third relation type according to the subject and the object of the relation.
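A sketch of how the relation extractor might turn the per-character subject/object identification tags into a subject-predicate-object triple. The tag names SUB and OBJ are hypothetical stand-ins for the patent's relation subject and relation object identification vectors:

```python
def decode_subject_object(chars, tags):
    """Collect contiguous runs of SUB / OBJ tags into subject and object
    strings. Tag names are assumptions; the patent only says the fifth
    classifier marks relation-subject, relation-object and internal tokens."""
    spans = {"SUB": [], "OBJ": []}
    cur_tag, cur = None, []
    for ch, tag in list(zip(chars, tags)) + [(None, "O")]:  # sentinel flushes last run
        if tag == cur_tag and tag in spans:
            cur.append(ch)
        else:
            if cur_tag in spans and cur:
                spans[cur_tag].append("".join(cur))
            cur_tag, cur = tag, ([ch] if tag in spans else [])
    return spans["SUB"], spans["OBJ"]

def to_spo(chars, tags, predicate):
    """Pair each decoded subject with each decoded object under the given
    relation type, yielding subject-predicate-object triples."""
    subs, objs = decode_subject_object(chars, tags)
    return [(s, predicate, o) for s in subs for o in objs]
```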
For example, in a multivariate relationship extraction method provided in some embodiments of the present application, the S2 further includes: and the decision unit is used for making a decision according to the first relation type, the second relation type and the third relation type and outputting the relation types.
For example, in the multivariate relationship extraction method provided in some embodiments of the present application, the shared coding layer is one of a BERT-wwm model, a RoBERTa model, an ERNIE model, a NEZHA model, and an XLNet model.
For example, in a multivariate relation extraction method provided in some embodiments of the present application, the shared coding layer includes a 12-layer coder, and the S1 further includes:
the average of the output results of the last 3-layer encoder is used as the semantic feature vector.
A second embodiment of the present invention provides a multivariate relationship extraction system using the multivariate relationship extraction method according to the first embodiment, including:
the document preprocessing model is configured to preprocess an input text, extract semantic features by using a shared coding layer and output a semantic feature vector;
the relation extraction model is configured to perform relation extraction on the semantic feature vectors and output relation types, and comprises a plurality of depth relation models and a decision unit for deciding and outputting the relation types, and each depth relation model comprises at least one relation sub-model for relation extraction; and
and the relation aggregation model is configured to aggregate the relation types output by the depth relation models and generate a relation type result.
A third embodiment of the invention provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the method according to the first embodiment.
A fourth embodiment of the invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first embodiment when executing the program.
The invention has the following beneficial effects:
aiming at the existing problems, the invention sets a multivariate relation extraction method, an extraction system, computer equipment and a storage medium based on multi-model fusion, performs relation extraction on input texts respectively by using a plurality of depth relation models to obtain a plurality of relation types, and then performs convergence by a relation convergence unit to generate a relation type result, so that the recall rate of the relation extraction can be effectively improved, the accuracy of entity judgment is improved, and the relation identification efficiency is improved, thereby overcoming the problems in the prior art and having practical application value.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram illustrating a multivariate relationship extraction method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a multivariate relationship extraction system according to an embodiment of the invention;
FIG. 3 is a schematic diagram illustrating the structure of a document pre-processing model according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the structure of the first relational submodel according to one embodiment of the invention;
FIG. 5 illustrates a portion of a structural diagram of the second relationship submodel according to an embodiment of the invention;
FIG. 6 is a diagram of another part of the second relation submodel according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the third relationship submodel according to one embodiment of the invention;
FIG. 8 is a block diagram illustrating the structure of a multivariate relationship extraction system according to an embodiment of the invention;
FIG. 9 is a schematic structural diagram of the decision unit according to an embodiment of the present invention;
fig. 10 shows a schematic structural diagram of a computer device according to another embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the invention, the invention is further described below with reference to preferred embodiments and the accompanying drawings. Similar parts in the figures are denoted by the same reference numerals. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and is not to be taken as limiting the scope of the invention.
According to the above problem and the reasons causing the problem, as shown in fig. 1, an embodiment of the present invention provides a multivariate relation extraction method based on multi-model fusion, including:
s1: preprocessing an input text, extracting semantic features by using a shared coding layer and outputting a semantic feature vector;
s2: respectively using each depth relation model to perform relation extraction on the semantic feature vectors and outputting relation types, wherein each depth relation model comprises at least one relation sub-model for relation extraction and a decision unit for deciding and outputting the relation types;
s3: aggregating the relation types output by the depth relation models using a relation aggregation unit and generating a relation type result.
In this embodiment, a plurality of depth relation models each perform relation extraction on the input text to obtain a plurality of relation types, and a relation aggregation unit then aggregates them to generate the relation type result of the input text. This effectively improves the recall rate of relation extraction and the accuracy of entity judgment, and further improves relation identification efficiency, thereby solving the problems in the prior art and having practical application value.
Specifically, entity relation extraction (RE) extracts a pair of entities from a text and gives the relation between them. As shown in fig. 2, the multivariate relation extraction system of the present application includes a document preprocessing model, a relation extraction model and a relation aggregation model. The document preprocessing model receives the input text to be processed, preprocesses it and outputs semantic feature vectors; the relation extraction model then performs relation extraction and outputs relation types; and the relation aggregation model aggregates the output relation types to generate a relation type result, where the relation type result is a subject-predicate-object (SPO) triple.
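The three-stage flow described above can be sketched as follows. Majority voting in the aggregation step is an assumption for illustration; the patent only says that a relation aggregation model merges the relation types output by the parallel models into a subject-predicate-object result:

```python
from collections import Counter

def aggregate_relations(model_outputs):
    """Aggregate the relation types produced by the parallel depth relation
    models into one relation per (subject, object) pair.

    model_outputs: one dict per depth relation model, mapping an entity pair
    to the relation type that model predicted. Majority voting here is an
    assumed aggregation strategy, not specified by the patent.
    """
    votes = {}
    for output in model_outputs:
        for pair, rel_type in output.items():
            votes.setdefault(pair, Counter())[rel_type] += 1
    # keep the most-voted relation type for each entity pair
    return {pair: c.most_common(1)[0][0] for pair, c in votes.items()}
```

For example, if two of three depth relation models predict "acquire" for the pair (CompanyA, CompanyB) and one predicts "invest", the aggregated result keeps "acquire" for that pair.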
In a specific embodiment, the document preprocessing model includes a word processing model and a shared coding layer, and the preprocessing the input text and performing semantic feature extraction using the shared coding layer and outputting a semantic feature vector further includes:
s111: splitting an input text into a plurality of word input vectors according to characters, wherein the word input vectors comprise word vectors, paragraph vectors and position vectors;
s112: and performing semantic feature extraction by using a shared coding layer and outputting a semantic feature vector, wherein the semantic feature vector comprises a first overall feature vector and a plurality of second semantic feature vectors.
In this embodiment, as shown in fig. 3, the word processing model splits the input text by characters and forms the corresponding word input vectors, where each word input vector includes a word vector, a paragraph vector and a position vector. For example, the input text "one, two, three, and four" is split into "one", "two", "three" and "four", and a word vector, a paragraph vector and a position vector are formed for each character. The word input vectors are then fed into the shared coding layer, which in an alternative embodiment is one of a BERT-wwm model, a RoBERTa model, an ERNIE model, a NEZHA model and an XLNet model; those skilled in the art should select an appropriate model according to the actual application requirements. This embodiment takes the BERT-wwm model as an example. As shown in fig. 3, the BERT-wwm model contains 12 Transformer encoder layers, and to better utilize the semantic representation learned by the model, the average of the outputs of the last 3 encoder layers is used as the semantic feature vector, which effectively improves the accuracy of relation extraction. Specifically, in this embodiment the outputs of the tenth, eleventh and twelfth encoder layers are averaged and used as the semantic feature vector output by the BERT-wwm model; evaluated with the relation extraction metric (F1 value), this effectively improves output quality by 1 to 2%.
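A minimal sketch of the last-3-layer averaging, assuming the per-layer hidden states are given as plain nested lists of token vectors (with the Hugging Face transformers library one would instead enable `output_hidden_states=True` and average `outputs.hidden_states[-3:]`):

```python
def average_last_layers(hidden_states, n=3):
    """hidden_states: list of per-layer outputs, each a list of token
    vectors (e.g. layers 1..12 of a 12-layer BERT-wwm encoder).
    Returns the element-wise mean of the last n layers, which this
    embodiment uses as the semantic feature vector."""
    last = hidden_states[-n:]
    seq_len, dim = len(last[0]), len(last[0][0])
    return [
        [sum(layer[t][d] for layer in last) / n for d in range(dim)]
        for t in range(seq_len)
    ]
```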
In an alternative embodiment, as shown in fig. 4, the depth relationship model includes a first relationship submodel, i.e. a joint extraction model, the first relationship submodel includes a first enumerator, a first classifier, and a second classifier, and the performing relationship extraction on the semantic feature vectors and outputting relationship types using the depth relationship models respectively further includes:
s211: segment enumerating the input text using the first enumerator and outputting a plurality of first entity candidate segments.
In this embodiment, as shown in fig. 4, a first enumerator enumerates text segments that may form an entity in an input text, so as to obtain a plurality of first entity candidate segments, for example, C1, C2, C3, and C4 shown in the figure.
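The first enumerator's segment enumeration can be sketched as below; the `max_width` cap is a hypothetical practical limit (enumerating all O(n²) spans is rarely feasible), not something the patent specifies:

```python
def enumerate_spans(text, max_width=8):
    """Enumerate every contiguous character segment of the input text up to
    max_width characters as a first entity candidate segment.
    Returns (start, end, segment) tuples with end exclusive."""
    spans = []
    for start in range(len(text)):
        for end in range(start + 1, min(start + max_width, len(text)) + 1):
            spans.append((start, end, text[start:end]))
    return spans
```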
S212: and operating according to a combination vector and the first overall feature vector of each first entity candidate segment and outputting a plurality of first entity vectors and a plurality of first context text feature vectors by using the first classifier, wherein the combination vector comprises the text feature vector, the position vector and the width vector of the first entity candidate segment, and the first entity vector comprises the text feature vector, the position vector and the width vector.
In this embodiment, the first classifier performs a vector splicing operation based on the first entity candidate segment, and in a specific example, S212 further includes:
s2121: and using the first classifier to perform operation according to the combination vector of each first entity candidate segment and the first overall characteristic vector to generate first candidate entities.
In this embodiment, as shown in fig. 4, a position vector output by the word processing model corresponding to the first entity candidate segment C1, a first global feature vector output by the shared encoder, a maximum pooled (max-pool) text feature vector corresponding to the first entity candidate segment C1, and a width vector corresponding to the first entity candidate segment C1 are subjected to a splicing operation to obtain a first candidate entity.
S2122: and judging whether the first candidate entity is an entity by using Softmax, and outputting a first entity vector if the first candidate entity is the entity.
In this embodiment, as shown in fig. 4, the obtained first candidate entity is input into Softmax, which judges whether the first entity candidate segment C1 is an entity. If so, the corresponding first entity vector is output, where the first entity vector includes a text feature vector, a position vector and a width vector; if not, a non-entity is output. A plurality of max-pooled first context text feature vectors T1 are output at the same time.
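The splicing-plus-Softmax step can be sketched with toy dimensions; the concatenation order and the vector sizes here are illustrative assumptions, not the patent's actual dimensions:

```python
import math

def max_pool(token_vectors):
    """Element-wise max over the token vectors of a candidate segment."""
    return [max(col) for col in zip(*token_vectors)]

def span_representation(token_vectors, width_embedding, cls_vector):
    """Splice the max-pooled segment features, a width embedding and the
    first overall (CLS-style) feature vector into one candidate-entity
    vector, mirroring the first classifier's splicing operation."""
    return max_pool(token_vectors) + width_embedding + cls_vector

def softmax(logits):
    """Numerically stable Softmax used to judge entity vs. non-entity."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]
```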
S213: and operating according to the plurality of first entity vectors and the plurality of first context text characteristic vectors respectively by using the second classifier, and outputting a first relation type.
In this embodiment, the second classifier performs a splicing operation according to the plurality of first entity vectors and the plurality of first context text feature vectors and outputs a first relationship type. In a specific example, pairwise relationship classification is performed by using the second classifier according to two first entity vectors of the plurality of first entity vectors and the first context text feature vector between the two first entity vectors, and a corresponding first relationship type is generated.
In this embodiment, a second classifier is used to classify pairwise relationships of first entity vectors output by the first classifier, specifically, two first entity vectors and a first context text feature vector between the two first entity vectors are subjected to a splicing operation to generate corresponding first relationship types, and therefore, the second classifier generates a plurality of first relationship types through pairwise relationship classification.
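Building the second classifier's pairwise inputs can be sketched as follows. Ordered pairs and the `context_vector_between` helper are assumptions: relations have directional subject/object roles, and the patent says only that the context text feature vector between the two entities is spliced in:

```python
from itertools import permutations

def relation_pair_inputs(entity_vectors, context_vector_between):
    """Build one relation-classifier input per ordered entity pair by
    splicing [head entity, pooled context between the pair, tail entity].
    context_vector_between(i, j) is a hypothetical helper returning the
    max-pooled first context text feature vector between entities i and j."""
    inputs = []
    for i, j in permutations(range(len(entity_vectors)), 2):
        inputs.append((
            (i, j),
            entity_vectors[i] + context_vector_between(i, j) + entity_vectors[j],
        ))
    return inputs
```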
In this embodiment, the first relation submodel finds all possible segment combinations through the first enumerator by segment classification, then judges through the first classifier the probability that each segment combination is an entity, and after all entities are determined, pairs the entities two by two through the second classifier to judge the probability that a relation exists between each entity pair. In other words, the first relation submodel realizes entity recognition through the first enumerator and the first classifier, and relation classification through the second classifier; both processes use the semantic feature vectors output by the shared coding layer, i.e., the two processes share parameters, so that their information can interact and the performance of entity recognition and relation classification improves simultaneously.
In an alternative embodiment, the loss function of the first relational sub-model is a weighted average of the loss functions of the first classifier and the second classifier.
In this embodiment, the first classifier and the second classifier both solve single-label classification problems and use Softmax as the discriminant function, so the loss function of each classifier takes the standard cross-entropy form L = -Σᵢ yᵢ log(ŷᵢ), where yᵢ is the true label and ŷᵢ is the Softmax output for class i. Denoting the cross-entropy loss of the first classifier by L1 and that of the second classifier by L2, the loss function of the first relation submodel is L = L1 + λL2, where λ is a weight coefficient obtained by parameter tuning during training. The loss function of the first relation submodel therefore constrains both classifiers simultaneously, effectively improving the accuracy of the multivariate relation extraction system.
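The weighted combination of the two classifiers' losses can be written directly as code; the two-class toy inputs are illustrative:

```python
import math

def cross_entropy(probs, true_index):
    """Single-label cross-entropy: the negative log of the Softmax
    probability assigned to the true class."""
    return -math.log(probs[true_index])

def joint_loss(entity_probs, entity_label, relation_probs, relation_label, lam=1.0):
    """L = L1 + lam * L2: the entity classifier's loss plus the relation
    classifier's loss weighted by lam, which is tuned during training."""
    l1 = cross_entropy(entity_probs, entity_label)
    l2 = cross_entropy(relation_probs, relation_label)
    return l1 + lam * l2
```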
In an alternative embodiment, as shown in fig. 5 and 6, the depth relation model includes a second relation submodel, namely a step-wise extraction model, the second relation submodel includes a conditional random field model, a third classifier and a fourth classifier, the S2 further includes:
s221: and performing path judgment according to the semantic feature vector by using the conditional random field model and outputting a plurality of identified second entity vectors.
In this embodiment, as shown in fig. 5, a conditional random field (CRF) model is used as the decoder, and each character in the text is tagged under a BIO or BIESO labeling scheme; that is, the label of each character is determined by the conditional random field decoding layer. Specifically, the conditional random field performs path judgment according to the semantic feature vector output by the shared coding layer and outputs the identified entities according to the character tags, such as the company-name entities and number entities in the text. Because the conditional random field constrains the output label sequence, the decoding effect is better than directly using Softmax classification.
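A minimal sketch of turning BIO character tags into entities, which is how the CRF layer's tag sequence is consumed here (the helper name is an assumption; the tag sequence itself would come from the CRF's Viterbi decoding):

```python
def decode_bio(chars, tags):
    """Collect (entity_text, type) spans from BIO tags such as B-Co / I-Co / O."""
    entities, current, etype = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:                          # close the previous entity
                entities.append(("".join(current), etype))
            current, etype = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(ch)                   # continue the current entity
        else:
            if current:                          # O tag or inconsistent I- tag
                entities.append(("".join(current), etype))
            current, etype = [], None
    if current:
        entities.append(("".join(current), etype))
    return entities
```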
S222: operating on the plurality of second entity vectors using the third classifier and generating a plurality of third entity vectors and a plurality of second context text feature vectors.
In this embodiment, as shown in fig. 6, the third classifier operates on the plurality of second entity vectors output by the conditional random field model and, after max pooling, outputs a plurality of third entity vectors C5, C6 and a plurality of second context text feature vectors T2.
S223: and using the fourth classifier to perform pairwise relationship classification according to two third entity vectors in the plurality of third entity vectors, the second context text feature vector between the two third entity vectors, the position feature vector of the first third entity vector in the two third entity vectors and the distance feature vector between the two third entity vectors respectively, and generate a corresponding second relationship type.
In this embodiment, a fourth classifier is used to classify pairwise relationships between the third entity vectors output by the third classifier. Specifically, the fourth classifier concatenates two third entity vectors with the second context text feature vector between them and generates the corresponding second relationship type; by classifying every pair, the fourth classifier generates a plurality of second relationship types.
In an alternative embodiment, as shown in fig. 7, the depth relation model includes a third relation submodel, i.e. a one-step extraction model, the third relation submodel includes a fifth classifier and a relation extractor, the S2 further includes:
s231: and classifying and outputting a labeled fourth entity vector by using the fifth classifier according to the semantic feature vector, wherein the fourth entity vector comprises a relation subject identification vector, a relation object identification vector and an internal identification vector.
In this embodiment, as shown in fig. 7, a fifth classifier Sigmoid is used to classify the semantic feature vectors output by the shared coding layer to obtain character labels, and mark out the identified relationship results, such as a relationship subject identification vector S-R1, relationship object identification vectors O-R1 and O-R2, internal identification vectors I-R1 and I-R2, and other irrelevant vectors O.
S232: and extracting by using the relation extractor according to the relation subject identification vector, the relation object identification vector and the internal identification vector to obtain a subject and an object of a relation, and outputting a third relation type according to the subject and the object of the relation.
In this embodiment, as shown in fig. 7, the relation extractor extracts according to the obtained relation subject identification vector, relation object identification vector and internal identification vector; for example, it identifies the relation subject identification vector S-R1 as the subject of relation 1 and the relation object identification vector O-R1 as the object of relation 1, and outputs a third relation type according to the identified subject and object of relation 1.
It should be noted that each depth relationship model includes at least one relationship submodel, for example, at least one of the first relationship submodel, the second relationship submodel, and the third relationship submodel, which is not specifically limited in this application, and those skilled in the art should select the appropriate number of relationship submodels according to the actual application requirements.
In an alternative embodiment, as shown in fig. 8, each depth relationship model includes a first relationship submodel, a second relationship submodel, and a third relationship submodel, and a decision unit is used to make a decision according to the first relationship type, the second relationship type, and the third relationship type and output the relationship types.
In this embodiment, as shown in fig. 9, the decision unit checks, for each input relation type, whether it was produced by the first, second, or third relation sub-model; if sub-model i produced it, score_i = 1, otherwise score_i = 0. Weights w_i are set for the three different models, the total Score of the relation type is calculated as the weighted sum of the score_i, and the total Score is compared with a Threshold: if the Score reaches the Threshold, the relation type is output, otherwise it is discarded. The decision unit arranged in each depth relation model thus decides on the input relation types and outputs the decided relation type as the relation type of that depth relation model.
It should be noted that, in the present application, the number of depth relationship models is not specifically limited, and those skilled in the art should select an appropriate number of depth relationship models according to the actual application situation, which is not described herein again.
In this embodiment, as shown in fig. 2, a plurality of depth relation models is included; each depth relation model passes the relation type output by its decision unit to the relation aggregation model, which integrates the input relation types and takes the integrated relation types as the result of the multivariate relation extraction system.
Therefore, the extraction of the multivariate relation of the input text is realized, and the relation type result is output.
In a specific embodiment, a multivariate relation extraction system in the financial field is taken as an example. The system comprises a document preprocessing model whose shared coding layer is a BERT-wm model with 12 encoder layers; the average of the output results of the last 3 encoder layers is used as the semantic feature vector. The system comprises three depth relation models, namely a high-management relation model, an upstream-downstream relation model and a stock-control relation model, and each depth relation model comprises three relation sub-models, namely a joint extraction model, a step-wise extraction model and a one-step extraction model.
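The last-3-layer averaging described above can be sketched with plain lists (the helper name and toy shapes are assumptions; a real BERT-wm encoder would return tensors of shape [seq_len, 768] per layer):

```python
def average_last_layers(layer_outputs, k=3):
    """layer_outputs: one entry per encoder layer, each shaped [seq_len][hidden].
    Returns the element-wise mean of the last k layers, used as the semantic feature vector."""
    last = layer_outputs[-k:]
    seq_len = len(last[0])
    hidden = len(last[0][0])
    return [[sum(layer[t][h] for layer in last) / k for h in range(hidden)]
            for t in range(seq_len)]
```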
As shown in figs. 4-7, the input text "Alibaba will invest in Mass Travel, totaling forty million in two phases" is taken as an example:
firstly, preprocessing an input text, extracting semantic features by using a shared coding layer and outputting a semantic feature vector.
1) Split the input text into a plurality of word input vectors by character, wherein the word input vectors comprise word vectors, paragraph vectors and position vectors.
In this embodiment, the text is first split into 20 characters (including punctuation marks), and "[ CLS ]" and "[ SEP ]" are added to the beginning and the end of the text, respectively.
2) And extracting semantic features by using a shared coding layer and outputting a semantic feature vector, wherein the semantic feature vector comprises a first overall feature vector and a plurality of second semantic feature vectors.
In this embodiment, the 20 split characters are input into a shared coding layer optimized and refined from the BERT-wm model, obtaining semantic feature vectors for 22 characters in total, each of dimension 1x768.
And secondly, performing relation extraction on the semantic feature vectors by using each depth relation model and outputting a relation type, wherein each depth relation model comprises at least one relation sub-model for relation extraction and a decision unit for deciding and outputting the relation type.
First, as shown in fig. 4, relational extraction is performed using a joint extraction model. The federated decimation model includes a first enumerator, a first classifier, and a second classifier.
1) Segment enumerating the input text using the first enumerator and outputting a plurality of first entity candidate segments.
Specifically, the first enumerator enumerates the text segments of all possible entities in the text to obtain potential first entity candidate segments; here the four candidate segments "Alibaba", "Mass Travel", "two phases" and "forty million" are obtained.
2) And using the first classifier to perform operation according to a combination vector and the first overall feature vector of each first entity candidate segment and output a plurality of first entity vectors and a plurality of first context text feature vectors, wherein the combination vector comprises the text feature vector, the position vector and the width vector of the first entity candidate segment, and the first entity vector comprises the text feature vector, the position vector and the width vector.
Specifically, for each first entity candidate segment, the first classifier concatenates the segment's text feature vector (after max pooling), the position vector of the segment's starting character, the overall feature vector of the input text (the [CLS] vector output by the BERT-wm model) and the segment's width vector, determines via Softmax whether the segment is an entity, and, if so, outputs the corresponding first entity vector, consisting of the position vector of the entity's starting character, the text feature vector (after max pooling) and the entity width vector, together with the first context text feature vectors (after max pooling) between entities. In this example, for the candidate segment "Alibaba", max pooling yields a 1x768 semantic feature vector for the segment; the position of the segment's starting character is 2 (counting from 0), mapped to a 1x30 position vector; the first overall feature vector of the text (the [CLS] position) is a 1x768 feature vector; and the segment width is 4, mapped to a 1x30 width vector. The four vectors are concatenated into a 1x1596-dimensional vector and fed to a fully connected layer for 0-1 classification; a model output greater than 0.5 means the segment is judged to be a real entity. After the first classifier, this embodiment finally obtains three entities: "Alibaba", "Mass Travel" and "forty million".
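The concatenation that produces the 1x1596-dimensional span feature can be sketched as follows (the dimensions follow the embodiment; the helper names are assumptions):

```python
def max_pool(vectors):
    """Element-wise max over the token vectors of a span."""
    return [max(col) for col in zip(*vectors)]

def span_feature(span_token_vecs, pos_vec, cls_vec, width_vec):
    """Concatenate [max-pooled span | position | [CLS] | width], as in the embodiment:
    768 + 30 + 768 + 30 = 1596 dimensions."""
    return max_pool(span_token_vecs) + pos_vec + cls_vec + width_vec
```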
3) And operating according to the plurality of first entity vectors and the plurality of first context text characteristic vectors respectively by using the second classifier, and outputting a first relation type.
Specifically, the second classifier concatenates and operates on each pair of first entity vectors and the first context vector between them to generate a first relationship type. In the above embodiment, the first classifier produced three entities, "Alibaba", "Mass Travel" and "forty million", which are classified pairwise. Taking "Alibaba" and "Mass Travel" as an example, the second classifier obtains two 1x828-dimensional entity vectors (excluding the [CLS] position vector), max-pools the text feature vectors of the three characters corresponding to "invest" between the two entities into a 1x768-dimensional first context vector, concatenates the three vectors into a 1x2424-dimensional vector, and performs relation classification through a fully connected layer, obtaining the relation type "subject company-investment-object company". It should be noted that because relations distinguish subject and object, pairwise classification is order-sensitive: for example, "Alibaba-invests in-Mass Travel" and "Mass Travel-is invested in by-Alibaba" are two different relations, so pairwise classification of the 3 entities requires 6 classification operations in total.
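The 1x2424-dimensional relation feature (828 + 828 + 768) and the order-sensitive pair count can be illustrated as follows (helper names are assumptions):

```python
def relation_feature(entity_a, entity_b, context_vec):
    """Concatenate [entity_a | entity_b | max-pooled context between them]:
    828 + 828 + 768 = 2424 dimensions, as in the embodiment."""
    return entity_a + entity_b + context_vec

def ordered_pairs(entities):
    """All subject/object orderings of distinct entities: n * (n - 1) pairs,
    since 'A invests in B' and 'B is invested in by A' are different relations."""
    return [(a, b) for a in entities for b in entities if a != b]
```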
Next, as shown in FIGS. 5 and 6, the relationship extraction is performed using a step-wise extraction model, which includes a conditional random field model (CRF), a third classifier, and a fourth classifier.
1) And performing path judgment according to the semantic feature vector by using the conditional random field model and outputting a plurality of identified second entity vectors.
Specifically, the conditional random field model performs path judgment according to the semantic feature vectors output by the shared coding layer and outputs the identified second entity vectors. In this example, the shared coding layer outputs a 1x768-dimensional feature vector for each character; the semantic feature vectors of the 22 characters are input into the CRF layer, and the CRF constraints yield the entity label of each character (using the BIO labeling scheme). As shown in fig. 5, the labels of the characters of "Alibaba" are B-Co, I-Co, I-Co and I-Co, where Co denotes the company-name entity type, so by the labeling rules the extracted entity is a company entity; similarly, the company-name entity "Mass Travel" and the number entity "forty million" are obtained.
2) Operating on the plurality of second entity vectors using the third classifier and generating a plurality of third entity vectors and a plurality of second context text feature vectors.
In this example, the third classifier operates on the second entity vectors and generates third entity vectors and second context text feature vectors. According to the recognition result, taking "Alibaba" and "Mass Travel" as examples, the text feature vectors of the characters of "Alibaba" and of "Mass Travel" are each max-pooled to obtain two 1x768-dimensional third entity vectors, and the text feature vectors of the three characters of the intermediate text "invest" are max-pooled to obtain a 1x768-dimensional second context text feature vector.
3) And using the fourth classifier to perform pairwise relationship classification according to two third entity vectors in the plurality of third entity vectors, the second context text feature vector between the two third entity vectors, the position feature vector of the first third entity vector in the two third entity vectors and the distance feature vector between the two third entity vectors respectively, and generate a corresponding second relationship type.
In this example, the fourth classifier operates on two third entity vectors, the second context text feature vector between them, the position feature vector of the starting position of the earlier entity in the text, and the distance feature vector between the two entities, and outputs the corresponding second relationship type; the distance is measured from the end position of the earlier third entity vector to the starting position of the later one. Taking "Alibaba" and "Mass Travel" as examples, the position of "Alibaba" in the text is 0 (counting from 0), mapped to a 1x30-dimensional position vector; the distance between the end position of the earlier entity and the starting position of the later entity is 3 (a negative number would indicate that the "earlier" entity actually lies after the "later" one), mapped to a 1x30-dimensional distance vector. The position vector, the distance vector, the two third entity vectors and the second context vector are concatenated into a 1x2364-dimensional vector, and the classification result obtained after a fully connected layer is the relation type "subject company-investment-object company". It should be noted that the step-wise extraction model must also respect the effect of entity order on the relation type: without optimization, pairwise classification of the three entities would require 6 operations, but after optimization only 4 classification operations are needed in this example, because relations involving the number entity vector consider only a company entity as the subject.
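The signed distance feature and the optimized candidate-pair filtering described above can be sketched as follows (the entity tuple layout `(start, end, type)` and the helper names are assumptions):

```python
def signed_distance(prev_entity, next_entity):
    """Start of the later entity minus end of the earlier one;
    negative when the 'earlier' entity actually lies after the 'later' one."""
    return next_entity[0] - prev_entity[1]

def candidate_pairs(entities):
    """Ordered pairs, skipping those whose subject is a number entity,
    since only a company entity may be the relation subject."""
    return [(a, b) for a in entities for b in entities
            if a is not b and a[2] != "Num"]
```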
Again, as shown in fig. 7, the relationship extraction is performed using a one-step extraction model, which includes a fifth classifier and a relationship extractor.
1) And classifying and outputting a labeled fourth entity vector by using the fifth classifier according to the semantic feature vector, wherein the fourth entity vector comprises a relation subject identification vector, a relation object identification vector and an internal identification vector.
The fifth classifier classifies according to the semantic feature vectors output by the shared coding layer and outputs labeled fourth entity vectors, including relation subject identification vectors, relation object identification vectors, internal identification vectors and other identification vectors. In this example, 20 characters of semantic feature vectors, each of dimension 1x768, are obtained after the shared coding layer; each character's vector passes through a fully connected layer and a Sigmoid layer to obtain the probability that the character plays a given role in a given relation type, and if the probability is greater than 0.5 the character is considered to play that role.
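A sketch of the Sigmoid role tagging with the 0.5 threshold (multi-label, so one character may take several roles at once; the helper names and toy logits are assumptions):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def role_labels(char_logits, roles, threshold=0.5):
    """For each character, keep every role whose sigmoid probability
    exceeds the threshold; a character may belong to several relations."""
    return [[role for logit, role in zip(logits, roles) if sigmoid(logit) > threshold]
            for logits in char_logits]
```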
2) And extracting by using the relation extractor according to the relation subject identification vector, the relation object identification vector and the internal identification vector to obtain a subject and an object of the relation, and outputting a third relation type according to the subject and the object of the relation.
The relation extractor extracts the subject and object of each relation according to the relation subject identification vector, the relation object identification vector and the internal identification vector, and outputs a third relation type according to that subject and object. In the above embodiment, the characters of "Alibaba" receive the labels "S-R1, I-R1, I-R1, I-R1", where "S-R1" marks the subject (Subject) of relation "R1"; "Mass Travel" is marked as the object of relation "R1" and "forty million" as the object of relation "R2", yielding the final third relation type "subject company-investment-object company".
Finally, as shown in fig. 9, a decision unit is used to make a decision according to the first relationship type, the second relationship type, and the third relationship type and output the relationship types.
The decision unit determines the weights (w₁, w₂, w₃) of the three sub-models according to the levels of the evaluation indexes (F1 values) of the first, second and third relation types obtained during training; the simplest scheme sets w₁:w₂:w₃ = 1:1:1. For a specific relation type, score_i = 1 if the result of sub-model i contains it, otherwise score_i = 0. The final score of the relation type is Score = w₁·score₁ + w₂·score₂ + w₃·score₃, and if Score ≥ 2 the relation type is output.
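The weighted vote with w₁:w₂:w₃ = 1:1:1 and threshold 2 can be written directly (the helper name is an assumption):

```python
def decide(votes, weights=(1, 1, 1), threshold=2):
    """votes[i] = 1 if sub-model i produced this relation type, else 0.
    Keep the relation type iff the weighted score reaches the threshold."""
    score = sum(w * v for w, v in zip(weights, votes))
    return score >= threshold
```

With equal weights and threshold 2, a relation type survives only when at least two of the three sub-models agree on it.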
This completes the relation extraction of the input text by the joint extraction model, the step-wise extraction model and the one-step extraction model within the upstream-downstream relation model, with the decision unit outputting the relation type.
It should be noted that the high-management relation model and the stock-control relation model extract relations in the same manner as the upstream-downstream relation model; the three depth relation models run in parallel and each outputs its relation types, which will not be described again here.
And thirdly, converging the relation types output by the depth relation models by using a relation converging unit and generating a relation type result.
In this example, for the same input document, corresponding relation type results are obtained through the high-management relation model, the upstream-downstream relation model and the stock-control relation model respectively; the aggregation unit merges and collects these relation results and outputs them as all relation type results of the document.
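The aggregation step, merging and deduplicating the triples from the three parallel models, might look like this (the helper name and the (subject, relation, object) triple layout are assumptions):

```python
def aggregate(model_outputs):
    """Merge and deduplicate the relation triples output by the parallel
    depth relation models (high-management, upstream-downstream, stock-control)."""
    merged = set()
    for triples in model_outputs:
        merged.update(triples)
    return sorted(merged)
```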
In the multivariate relation extraction method, the plurality of depth relation models each extract relations from the input text to obtain a plurality of relation types, which the relation aggregation unit then aggregates into the relation type result. This effectively improves the recall of relation extraction, the accuracy of entity judgment and the efficiency of relation identification, solves the problems in the prior art, and has practical application value.
Corresponding to the multivariate relationship extraction method provided in the foregoing embodiments, an embodiment of the present application further provides a multivariate relationship extraction system using the multivariate relationship extraction method, and since the multivariate relationship extraction system provided in the embodiment of the present application corresponds to the multivariate relationship extraction methods provided in the foregoing several embodiments, the foregoing embodiment is also applicable to the multivariate relationship extraction system provided in the present embodiment, and will not be described in detail in the present embodiment.
As shown in fig. 2, an embodiment of the present application further provides a multivariate relation extraction system applying the multivariate relation extraction method, including:
the document preprocessing model is configured to preprocess an input text, extract semantic features by using a shared coding layer and output a semantic feature vector;
the relation extraction model is configured to perform relation extraction on the semantic feature vectors and output relation types, and comprises a plurality of depth relation models and a decision unit for deciding and outputting the relation types, and each depth relation model comprises at least one relation sub-model for relation extraction; and
and the relation convergence model is configured to converge the relation types output by the depth relation models and generate a relation type result.
In this embodiment, the plurality of depth relation models each perform relation extraction on the input text to obtain a plurality of relation types, and the relation aggregation unit then aggregates them into the relation type result. This effectively improves the recall of relation extraction, the accuracy of entity judgment and the efficiency of relation identification, solves the problems in the prior art, and has practical application value.
Another embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements: s1: preprocessing an input text, extracting semantic features by using a shared coding layer and outputting a semantic feature vector; s2: respectively using each depth relation model to perform relation extraction on the semantic feature vectors and outputting relation types, wherein each depth relation model comprises at least one relation sub-model for relation extraction and a decision unit for deciding and outputting the relation types; s3: and converging the relation types output by the depth relation models by using a relation converging unit and generating a relation type result.
In practice, the computer readable storage medium may take any combination of one or more computer readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present embodiment, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
As shown in fig. 10, another embodiment of the present invention provides a schematic structural diagram of a computer device. The computer device 12 shown in FIG. 10 is only an example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 10, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 10, and commonly referred to as a "hard drive"). Although not shown in FIG. 10, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown in FIG. 10, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be understood that although not shown in FIG. 10, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 performs various functional applications and data processing by running programs stored in the system memory 28, for example implementing the multivariate relation extraction method based on multi-model fusion provided by embodiments of the present invention.
It should be understood that the above-mentioned embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations or modifications will be apparent to those skilled in the art on the basis of the above description; not all embodiments can be listed exhaustively, and all obvious variations or modifications derived therefrom fall within the scope of the present invention.

Claims (12)

1. A multivariate relation extraction method based on multi-model fusion is characterized by comprising the following steps:
S1: preprocessing the input text, extracting semantic features using a shared coding layer, and outputting semantic feature vectors, which further comprises:
S111: splitting the input text character by character into a plurality of word input vectors, wherein each word input vector comprises a word vector, a paragraph vector and a position vector;
S112: extracting semantic features using the shared coding layer and outputting semantic feature vectors, wherein the semantic feature vectors comprise a first overall feature vector and a plurality of second semantic feature vectors;
S2: performing relation extraction on the semantic feature vectors using each of a plurality of depth relation models and outputting a plurality of relation types, wherein each depth relation model comprises at least two relation submodels of different structures for relation extraction and a decision unit for deciding among the output results of the relation submodels and outputting a relation type;
S3: converging the relation types output by the depth relation models using a relation convergence unit and generating a relation type result.
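As an illustration, the encode-then-decide-then-converge flow of steps S1-S3 could be sketched as follows. This is a minimal sketch only: the majority-vote decision strategy, the union-style convergence, and all function names are assumptions, not details stated in the claims.

```python
from collections import Counter

def decide(submodel_outputs):
    """Decision unit: majority vote over submodel relation types (one plausible strategy)."""
    return Counter(submodel_outputs).most_common(1)[0][0]

def converge(model_outputs):
    """Convergence unit: relation types ordered by how many depth relation models emitted them."""
    return [rel for rel, _ in Counter(model_outputs).most_common()]

def extract_relations(text, encoder, deep_models):
    features = encoder(text)                    # S1: shared coding layer
    per_model = []
    for submodels in deep_models:               # S2: each depth relation model
        outputs = [sub(features) for sub in submodels]
        per_model.append(decide(outputs))       # decision unit per model
    return converge(per_model)                  # S3: relation convergence unit
```

A tie in the vote falls back to the first submodel's answer, since `Counter.most_common` preserves insertion order among equal counts.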
2. The multivariate relationship extraction method based on multi-model fusion as claimed in claim 1, wherein the depth relationship model comprises a first relationship submodel comprising a first enumerator, a first classifier and a second classifier,
the S2 further includes:
s211: segment enumerating the input text using the first enumerator and outputting a plurality of first entity candidate segments;
s212: using the first classifier to perform operation according to a combination vector and the first overall feature vector of each first entity candidate segment and output a plurality of first entity vectors and a plurality of first context text feature vectors, wherein the combination vector comprises a text feature vector, a position vector and a width vector of the first entity candidate segment, and the first entity vectors comprise a text feature vector, a position vector and a width vector;
S213: performing operations, using the second classifier, on the plurality of first entity vectors and the plurality of first context text feature vectors respectively, and outputting a first relation type.
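The segment enumeration of step S211 can be sketched as exhaustive span generation. The maximum span width is an assumed hyperparameter; the claims do not specify one.

```python
def enumerate_spans(tokens, max_width=4):
    """First enumerator: every contiguous segment of up to max_width tokens,
    returned as (start, end) index pairs (end exclusive)."""
    spans = []
    for start in range(len(tokens)):
        for width in range(1, max_width + 1):
            if start + width <= len(tokens):
                spans.append((start, start + width))
    return spans
```

Each span would then be embedded as the combination vector of claim 2 (text feature, position and width vectors) before classification.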
3. The multivariate relationship extraction method based on multi-model fusion as claimed in claim 2,
the S212 further includes:
S2121: performing, using the first classifier, an operation on the combination vector of each first entity candidate segment and the first overall feature vector to generate a first candidate entity;
S2122: judging, using Softmax, whether the first candidate entity is an entity, and outputting a first entity vector if it is;
the S213 further includes: performing, using the second classifier, pairwise relation classification on two first entity vectors among the plurality of first entity vectors together with the first context text feature vector between them, and generating corresponding first relation types.
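The pairwise classification over recognized entity spans might look like the following sketch, where `classify` is an assumed callable standing in for the second classifier and spans are (start, end) index pairs.

```python
from itertools import combinations

def pairwise_relations(entity_spans, classify):
    """Classify every unordered pair of entity spans together with the
    context text between them; pairs the classifier rejects (None) are dropped."""
    results = []
    for (s1, e1), (s2, e2) in combinations(sorted(entity_spans), 2):
        between = (e1, s2)  # span of the context separating the two entities
        relation = classify((s1, e1), (s2, e2), between)
        if relation is not None:
            results.append(((s1, e1), (s2, e2), relation))
    return results
```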
4. The multivariate relation extraction method based on multi-model fusion as claimed in claim 2, characterized in that the loss function of the first relation submodel is a weighted average of the loss functions of the first classifier and the second classifier.
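Claim 4's joint objective can be illustrated minimally. The claim specifies only a weighted average of the two classifier losses; the default weight values below are assumptions.

```python
def submodel_loss(entity_loss, relation_loss, w_entity=0.5, w_relation=0.5):
    """Loss of the first relation submodel as a weighted average of the
    first (entity) classifier loss and the second (relation) classifier loss."""
    return (w_entity * entity_loss + w_relation * relation_loss) / (w_entity + w_relation)
```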
5. The multivariate relationship extraction method based on multi-model fusion as claimed in claim 2, wherein the depth relationship model comprises a second relationship submodel comprising a conditional random field model, a third classifier and a fourth classifier,
the S2 further includes:
s221: performing path judgment according to the semantic feature vector by using the conditional random field model and outputting a plurality of identified second entity vectors;
s222: computing, using the third classifier, the plurality of second entity vectors and generating a plurality of third entity vectors and a plurality of second context text feature vectors;
S223: performing, using the fourth classifier, pairwise relation classification on two third entity vectors among the plurality of third entity vectors, the second context text feature vector between them, the position feature vector of the former of the two third entity vectors, and the distance feature vector between them, and generating a corresponding second relation type.
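The path judgment of step S221 is what a conditional random field performs at decoding time: picking the highest-scoring tag path. A minimal pure-Python Viterbi search is sketched below; it stands in for, but is not, the patent's exact CRF layer. `emissions[t][j]` is the score of tag `j` at position `t`, and `transitions[i][j]` the score of moving from tag `i` to tag `j`.

```python
def viterbi_decode(emissions, transitions):
    """Return the best-scoring tag path through the emission/transition scores."""
    n_tags = len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag so far
    back = []                           # backpointers per step
    for emit in emissions[1:]:
        step_back, new_score = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            step_back.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        back.append(step_back)
    path = [max(range(n_tags), key=lambda j: score[j])]
    for step_back in reversed(back):    # follow backpointers to recover the path
        path.append(step_back[path[-1]])
    return path[::-1]
```

The decoded tag path would mark the second entity vectors that steps S222-S223 then classify pairwise.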
6. The multivariate relationship extraction method based on multi-model fusion as claimed in claim 5, wherein the depth relationship model comprises a third relationship submodel comprising a fifth classifier and a relationship extractor,
the S2 further includes:
S231: classifying, using the fifth classifier, according to the semantic feature vectors and outputting labeled fourth entity vectors, wherein the fourth entity vectors comprise a relation subject identification vector, a relation object identification vector and an internal identification vector;
S232: extracting, using the relation extractor, the subject and object of a relation according to the relation subject identification vector, the relation object identification vector and the internal identification vector, and outputting a third relation type according to the subject and object.
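The subject/object recovery of claim 6 can be sketched as reading off a per-token tag sequence. The tag names ("S" starting a relation subject, "O" starting a relation object, "I" continuing the current span, "-" elsewhere) are assumptions standing in for the claim's identification vectors.

```python
def extract_subject_object(tokens, tags):
    """Relation extractor sketch: collect the subject span and object span
    marked by the tagging classifier."""
    subject, obj, current = [], [], None
    for token, tag in zip(tokens, tags):
        if tag == "S":            # start of relation subject
            current = subject
            current.append(token)
        elif tag == "O":          # start of relation object
            current = obj
            current.append(token)
        elif tag == "I" and current is not None:
            current.append(token) # interior token of the open span
        else:
            current = None        # close any open span
    return " ".join(subject), " ".join(obj)
```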
7. The multivariate relationship extraction method based on multi-model fusion as claimed in claim 6, wherein the S2 further comprises: and the decision unit is used for making a decision according to the first relation type, the second relation type and the third relation type and outputting the relation types.
8. The multivariate relation extraction method based on multi-model fusion as claimed in any one of claims 1-7, wherein the shared coding layer is one of a BERT-wwm model, a RoBERTa model, an ERNIE model, a NEZHA model and an XLNet model.
9. The multivariate relation extraction method based on multi-model fusion as claimed in claim 8, wherein the shared coding layer comprises 12 encoder layers, and the S1 further comprises:
using the average of the output results of the last 3 encoder layers as the semantic feature vector.
10. A multivariate relation extraction system applying the multivariate relation extraction method based on multi-model fusion according to any one of claims 1-9, comprising:
the document preprocessing model is configured to preprocess an input text, extract semantic features by using a shared coding layer and output a semantic feature vector;
the relation extraction model is configured to perform relation extraction on the semantic feature vectors and output a plurality of relation types, and comprises a plurality of depth relation models, wherein each depth relation model comprises at least two relation submodels of different structures for relation extraction and a decision unit for deciding among the output results of the relation submodels and outputting a relation type;
and the relation convergence model is configured to converge the relation types output by the depth relation models and generate a relation type result.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-9.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-9 when executing the program.
CN202210009601.3A 2022-01-05 2022-01-05 Multi-model fusion-based multivariate relation extraction method and extraction system Active CN114925693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210009601.3A CN114925693B (en) 2022-01-05 2022-01-05 Multi-model fusion-based multivariate relation extraction method and extraction system

Publications (2)

Publication Number Publication Date
CN114925693A CN114925693A (en) 2022-08-19
CN114925693B true CN114925693B (en) 2023-04-07

Family

ID=82805409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210009601.3A Active CN114925693B (en) 2022-01-05 2022-01-05 Multi-model fusion-based multivariate relation extraction method and extraction system

Country Status (1)

Country Link
CN (1) CN114925693B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115983270B (en) * 2022-12-02 2024-05-03 芽米科技(广州)有限公司 Intelligent extraction method for e-commerce commodity attribute
CN117057343B (en) * 2023-10-10 2023-12-12 腾讯科技(深圳)有限公司 Road event identification method, device, equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN114841122A (en) * 2022-01-25 2022-08-02 电子科技大学 Text extraction method combining entity identification and relationship extraction, storage medium and terminal

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN105654102A (en) * 2014-11-10 2016-06-08 富士通株式会社 Data processing device and data processing method
CN111274391B (en) * 2020-01-15 2023-09-01 北京百度网讯科技有限公司 SPO extraction method and device, electronic equipment and storage medium
CA3168488A1 (en) * 2020-01-21 2021-07-29 Ancestry.Com Operations Inc. Joint extraction of named entities and relations from text using machine learning models
CN112860855B (en) * 2021-02-04 2024-02-06 京东科技控股股份有限公司 Information extraction method and device and electronic equipment
CN113486181A (en) * 2021-07-20 2021-10-08 杭州电子科技大学 Synchronous extraction method of multiple relations



Similar Documents

Publication Publication Date Title
US20230016365A1 (en) Method and apparatus for training text classification model
CN107085581B (en) Short text classification method and device
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
WO2020258502A1 (en) Text analysis method and apparatus, computer apparatus and computer storage medium
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN114925693B (en) Multi-model fusion-based multivariate relation extraction method and extraction system
CN110083832B (en) Article reprint relation identification method, device, equipment and readable storage medium
CN111831826B (en) Training method, classification method and device of cross-domain text classification model
CN111339260A (en) BERT and QA thought-based fine-grained emotion analysis method
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
WO2021072863A1 (en) Method and apparatus for calculating text similarity, electronic device, and computer-readable storage medium
CN113486178B (en) Text recognition model training method, text recognition method, device and medium
CN112052424B (en) Content auditing method and device
CN115080750B (en) Weak supervision text classification method, system and device based on fusion prompt sequence
CN114065702A (en) Event detection method fusing entity relationship and event element
CN117807482B (en) Method, device, equipment and storage medium for classifying customs clearance notes
CN115187066A (en) Risk identification method and device, electronic equipment and storage medium
CN113378090A (en) Internet website similarity analysis method and device and readable storage medium
US20220392205A1 (en) Method for training image recognition model based on semantic enhancement
CN115033689B (en) Original network Euclidean distance calculation method based on small sample text classification
CN115270818A (en) Intention identification method and device, storage medium and computer equipment
CN113515935A (en) Title generation method, device, terminal and medium
CN114936559B (en) Multi-model fusion-based multi-level event extraction method and extraction system
Wang et al. A Text Classification Method Based Automobile Data Management
CN114328928A (en) Discourse topic segmentation model training method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant