CN112364654A - Education-field-oriented entity and relation combined extraction method

Education-field-oriented entity and relation combined extraction method

Info

Publication number
CN112364654A
CN112364654A (application CN202011252896.4A)
Authority
CN
China
Prior art keywords
label
entity
relation
attention
knowledge
Prior art date
Legal status
Pending
Application number
CN202011252896.4A
Other languages
Chinese (zh)
Inventor
秦锋
张志文
郑啸
Current Assignee
Anhui University of Technology AHUT
Original Assignee
Anhui University of Technology AHUT
Priority date
2020-11-11
Filing date
2020-11-11
Publication date
2021-02-12
Application filed by Anhui University of Technology AHUT
Priority to CN202011252896.4A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Abstract

The invention discloses an entity and relation joint extraction method oriented to the education field, addressing the lack of application of existing methods in this domain. The method uses a pre-trained XLNet language model to obtain high-level feature embeddings, captures the contextual semantic information of the text through a Mogrifier BiGRU neural network, and introduces a multi-head attention mechanism behind the Mogrifier BiGRU to capture the more important parts of the text features, effectively overcoming the interference caused by the large number of modifiers within entities. Entities and relations are extracted simultaneously in a joint manner: a parameter-sharing encoding layer realizes the dependence between the entity and relation subtasks, thereby alleviating the error propagation problem.

Description

Education-field-oriented entity and relation combined extraction method
Technical Field
The invention relates to an entity and relation joint extraction method for the education field, and belongs to the field of natural language processing technology.
Background
With the rapid development of online learning in the education field, the volume of online course data grows exponentially, and how to efficiently and accurately extract useful entity and relation information from such data has become a research hotspot. Over the past decades, text mining and natural language processing (NLP) have made great progress, but information extraction technology in the education field still leaves considerable room for improvement. A representative information extraction task in online education is extracting specific types of course knowledge-point entities, and the relations between those entities, from the text of online courses. The extracted information serves many kinds of research: it is applicable to various NLP tasks (such as document classification and question-answering systems) and also plays an important role in personalized recommendation for online learning. As entity recognition and relation extraction are widely applied in knowledge discovery and data mining, the need for this technology will continue to grow.
Entity recognition and relation extraction methods mainly comprise dictionary-based, rule-based, machine-learning-based and deep-learning-based approaches. In dictionary-based approaches, terms in a dictionary are simply matched against words in the target sequence to extract entities; while simple, the continuing growth in the number of entities and the diversity of symbols in online course text makes extraction difficult. Rule-based approaches tend to perform well only when applied to one particular domain. Machine-learning-based approaches perform entity extraction with various algorithms and statistical models. However, both rule-based and machine-learning methods depend heavily on feature engineering, which is labor- and time-consuming and requires substantial domain knowledge. Unlike these earlier methods, deep learning does not need heavy manual feature design: it automatically extracts the most representative features with a neural network and achieves very good results.
In existing research on named entity recognition and relation extraction, most scholars split the process into two independent tasks and solve entity and relation extraction in a pipeline: extraction is treated as two independent subtasks executed in succession, Named Entity Recognition (NER) and Relation Extraction (RE). Specifically, named entities in a sentence are first extracted, the extracted entities are then combined pairwise, and finally the semantic relation between each entity pair is identified. This type of approach has two major disadvantages: first, error propagation, since errors of the named entity recognition module are passed to the downstream relation extraction module and degrade relation extraction performance; second, it ignores the dependency between the two subtasks.
Disclosure of Invention
Purpose of the invention: to overcome the defects of the prior art, the invention provides an entity and relation joint extraction method oriented to the education field. It addresses the lack of application of existing methods in this field: high-level feature embeddings are obtained with a pre-trained XLNet language model and an attention mechanism, and entity recognition and relation classification are handled simultaneously by a joint model to alleviate error propagation.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme.
an entity and relation combined extraction method oriented to the education field comprises the following steps:
(1) establishing a course knowledge point named entity corpus, wherein the course knowledge point named entity corpus consists of text data containing course knowledge points;
(2) performing distributed representation of the preprocessed text data containing course knowledge points, taking sentences as input, and obtaining text pre-training vectors through the XLNet language model (a permutation language model);
(3) inputting the obtained text pre-training vectors into a Mogrifier BiGRU neural network (a modified bidirectional gated recurrent unit network) for text feature extraction;
(4) introducing a multi-head attention (MultiHead Attention) mechanism behind the Mogrifier BiGRU neural network to capture the more important parts of the text features, the important parts being the parts of the text features that can form knowledge entities;
(5) obtaining the course knowledge-point named entities and the relations between knowledge entities by combining a CRF (conditional random field) model.
Specifically, in step (1), the BIO labeling method (a standard Begin/Inside/Outside sequence-labeling scheme) is first adopted to annotate the knowledge entities in the text data of the course knowledge-point named entity corpus: the text data are divided into P categories, each category being a label, with the p-th category denoted label p, p = 1, 2, …, P; the relations between knowledge entities are divided into Q kinds, with the q-th denoted relation q, q = 1, 2, …, Q; the text data are divided into a training set and a test set. In the BIO labeling method, B denotes the beginning of a knowledge entity, I denotes the remaining part of a knowledge entity, and O denotes a non-knowledge-entity token.
Specifically, in step (2), a sentence input to the XLNet language model is denoted S = [s_1, s_2, …, s_N], and the text pre-training vector output by the XLNet language model is denoted X = [x_1, x_2, …, x_N]; wherein s_i denotes the i-th word constituting the sentence S and x_i is the pre-training vector of the word s_i, i = 1, 2, …, N.
Specifically, the Mogrifier BiGRU neural network differs from a conventional GRU network in that it strengthens the context-modeling capability of the whole model through pre-interaction of the input and the hidden state. The Mogrifier BiGRU neural network comprises a forward GRU network and a backward GRU network; the input and hidden-layer output of the Mogrifier BiGRU neural network are X = [x_1, x_2, …, x_N] and H = [h_1, h_2, …, h_N] respectively. At time t, the forward GRU network takes the input x_t and the previous hidden state h_{t-1} (the superscripts t and t-1 denote time t and time t-1); before entering each GRU cell, x_t and h_{t-1} undergo bidirectional multi-round interaction.

For the forward GRU network, the interaction process is as follows:
(a41) interact x_t with h_{t-1} to obtain x_t^(1) = 2σ(R_1 h_{t-1}) ⊙ x_t;
(a42) interact h_{t-1} with x_t^(1) to obtain h_{t-1}^(1) = 2σ(R_2 x_t^(1)) ⊙ h_{t-1};
(a43) interact x_t^(1) with h_{t-1}^(1) to obtain x_t^(2) = 2σ(R_3 h_{t-1}^(1)) ⊙ x_t^(1);
(a44) interact h_{t-1}^(1) with x_t^(2) to obtain h_{t-1}^(2) = 2σ(R_4 x_t^(2)) ⊙ h_{t-1}^(1);
(a45) the forward hidden state is obtained as h_t^fwd = GRU(x_t^(2), h_{t-1}^(2)).

For the backward GRU network, the interaction process (b41)-(b45) is identical but runs over the sequence in reverse order, yielding the backward hidden state h_t^bwd; the hidden output for each word is the concatenation h_t = [h_t^fwd; h_t^bwd].

Wherein: σ is the logistic sigmoid function, ⊙ denotes element-wise multiplication, and R_1, R_2, R_3, R_4 are model parameters; to reduce the number of parameters, R_1, R_2, R_3 and R_4 may all be designed as products of low-rank matrices.
Specifically, in step (4), a multi-head attention (MultiHead Attention) mechanism is introduced after the Mogrifier BiGRU neural network and used to further capture the context semantics of each word s_i, highlight the significance of keywords in the sentence S and assign attention weights; the multi-head attention mechanism serves as the attention layer. It differs from the traditionally used attention mechanism in that it generates several different attention scores in parallel and finally splices them into the final attention score, so that the important parts of the text features are captured better.

Specifically, the calculation process of the multi-head attention mechanism comprises the following steps:

(41) map X = [x_1, x_2, …, x_N] and the output H = [h_1, h_2, …, h_N] of the Mogrifier BiGRU neural network into the three matrices K, Q, V;

(42) project K, Q, V onto the j-th attention head of the multi-head attention mechanism:

Q_j = Q W_j^Q, K_j = K W_j^K, V_j = V W_j^V

wherein: W_j^Q ∈ R^{d_N×d_q}, W_j^K ∈ R^{d_N×d_k} and W_j^V ∈ R^{d_N×d_v} are three global parameter matrices, d_N denotes the input dimension of the multi-head attention mechanism, D denotes the total number of attention heads, and d_k = d_q = d_v = d_N / D;

(43) calculate the j-th attention value

head_j = softmax( Q_j K_j^T / √d_k ) V_j

(44) splice the D attention values to obtain the multi-head attention

B = Concat(head_1, …, head_D) W^o

wherein: W^o is a weight matrix, and the element b_ij in row i and column j of B denotes the weight of word s_i on the j-th attention;

(45) combine the hidden state h_i of word s_i with the attention weights b_ij to generate the content vector of word s_i:

c_i = Σ_j b_ij h_j

(46) the important part of the text features captured by the multi-head attention mechanism introduced after the Mogrifier BiGRU neural network is C = [c_1, c_2, …, c_N].
Specifically, in step (5), the CRF model serves as the label score layer: the CRF model is first used to calculate the label score of each word under each label, the Viterbi algorithm is then applied to obtain the label sequence with the highest label score, and the course knowledge-point named entities and the relations between knowledge entities are then obtained through the relation extraction layer.
More specifically, in step (5), the CRF model is used to calculate the label score S^(ner)(s_i, p) of word s_i under label p:

S^(ner)(s_i, p) = [ V^(ner) f(U^(ner) h_i + b^(ner)) ]_p

wherein: the superscript (ner) denotes knowledge entity annotation recognition; V^(ner) and U^(ner) denote weight matrices and b^(ner) denotes a bias vector, V^(ner) ∈ R^{P×l}, U^(ner) ∈ R^{l×2d}, b^(ner) ∈ R^l; l is the layer width of the CRF model and d is the number of hidden units of the Mogrifier BiGRU neural network; f(·) denotes a nonlinear activation function.

Labels are assigned to all words in the sentence S to obtain a label sequence of S; each sentence S has R = P^N possible label sequences. The label score of S under the r-th label sequence is

score(S, Y_r) = Σ_{i=1}^{N} ( S^(ner)(s_i, y_{i,r}) + A_{(i,r),(i+1,r)} )

wherein: Y_r denotes the r-th label sequence, Y = [Y_1, Y_2, …, Y_R], r = 1, 2, …, R; y_{i,r} is the label assigned to word s_i in the r-th label sequence and S^(ner)(s_i, y_{i,r}) is the label score of s_i under that label; A_{(i,r),(i+1,r)} denotes the transition score from the label assigned to s_i to the label assigned to s_{i+1} in the r-th label sequence; A denotes the transition matrix, A ∈ R^{(P+2)×(P+2)}; because a start label and an end label are considered when the layer is built, the dimension of the transition matrix is 2 larger than P.

The label scores score(S, Y_r) of the label sequences are normalized to obtain the probability distribution over label sequences:

P(Y_r | S) = exp(score(S, Y_r)) / Σ_{r'=1}^{R} exp(score(S, Y_{r'}))

The Viterbi algorithm is used to obtain the label sequence with the highest label score:

Y* = argmax_{1 ≤ r ≤ R} score(S, Y_r)

wherein: in the highest-scoring label sequence Y*, the label assigned to word s_i is denoted g_i.

The label score layer is trained by minimizing a cross-entropy loss.
More specifically, in step (5), when the relation extraction layer is adopted to extract the relations between course knowledge-point named entities, the relation score between words s_i and s_j under a given relation q is first calculated:

S^(re)(m_j, m_i, q) = V^(re) f(U^(re) m_j + W^(re) m_i + b^(re))

wherein: m_i = [c_i; g_i] and m_j = [c_j; g_j]; the superscript (re) denotes relation recognition; V^(re), U^(re) and W^(re) denote weight matrices and b^(re) denotes a bias vector, V^(re) ∈ R^l, U^(re) ∈ R^{l×(2a+d)}, W^(re) ∈ R^{l×(2a+d)}, b^(re) ∈ R^l; l is the layer width of the relation extraction layer, d is the number of hidden units of the Mogrifier BiGRU neural network, and a is the dimension of the label; f(·) denotes a nonlinear activation function.

The probability that words s_i and s_j hold relation q is then

P(q | s_j, s_i) = σ( S^(re)(m_j, m_i, q) )

The relation extraction layer is trained by minimizing a cross-entropy loss. Specifically, the cross-entropy loss L_RE in the training of the relation extraction layer is calculated as

L_RE = - Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{q=1}^{Q} [ y_{i,j,q} log P(q | s_j, s_i) + (1 - y_{i,j,q}) log(1 - P(q | s_j, s_i)) ]

wherein y_{i,j,q} ∈ {0,1} indicates whether relation q holds between words s_i and s_j. The overall objective function is min(L_NER + L_RE), wherein L_NER is the cross-entropy loss in the training of the label score layer.
Beneficial effects: compared with the prior art, the entity and relation joint extraction method for the education field has the following advantages. 1. The invention uses a pre-trained XLNet language model to obtain high-level feature embeddings: instead of using fixed word vectors directly, the same word is dynamically embedded according to its context. This greatly improves the accuracy with which the word embedding layer converts text into dense embedding vectors, reduces the negative influence of ambiguous words on model performance, and effectively captures both local and global information of a word. 2. A multi-head attention mechanism is introduced behind the Mogrifier BiGRU neural network to capture the more important parts of the text features, effectively overcoming the interference of the large number of modifiers within entities. 3. Entities and relations are extracted simultaneously in a joint manner; a parameter-sharing encoding layer realizes the dependence between the entity and relation subtasks, thereby alleviating the error propagation problem.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
Fig. 1 shows a combined extraction method of entities and relations in the education domain, which comprises the following steps:
the method comprises the following steps: establishing a course knowledge point named entity corpus, wherein the course knowledge point named entity corpus is composed of text data containing course knowledge points.
First, the BIO labeling method is adopted to annotate the knowledge entities in the text data of the course knowledge-point named entity corpus: the text data are divided into P categories, each category being a label, with the p-th category denoted label p, p = 1, 2, …, P; the relations between knowledge entities are divided into Q kinds, with the q-th denoted relation q, q = 1, 2, …, Q; the text data are divided into a training set and a test set. In the BIO labeling method, B denotes the beginning of a knowledge entity, I denotes the remaining part of a knowledge entity, and O denotes a non-knowledge-entity token.
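For illustration, the snippet below shows what one BIO-annotated training example might look like; the tokens and the entity label name "KNOW" are hypothetical, since the patent does not fix a concrete tag vocabulary.

```python
# A minimal BIO annotation sketch for a course sentence; "KNOW" is a
# hypothetical label for course knowledge-point entities.
sentence = ["Binary", "search", "tree", "is", "a", "data", "structure"]
bio_tags = ["B-KNOW", "I-KNOW", "I-KNOW", "O", "O", "B-KNOW", "I-KNOW"]

# Each (token, tag) pair forms one line of a CoNLL-style training file.
for token, tag in zip(sentence, bio_tags):
    print(f"{token}\t{tag}")
```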
Step two: perform distributed representation of the preprocessed text data containing course knowledge points, take sentences as input, and obtain text pre-training vectors through the XLNet language model.

A sentence input to the XLNet language model is denoted S = [s_1, s_2, …, s_N], and the text pre-training vector output by the XLNet language model is denoted X = [x_1, x_2, …, x_N]; wherein s_i denotes the i-th word constituting the sentence S and x_i is the pre-training vector of the word s_i, i = 1, 2, …, N.
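The following is a minimal sketch of step two using the open-source `transformers` package; the checkpoint name `hfl/chinese-xlnet-base` is an assumption for illustration, as the patent does not name a specific pre-trained XLNet checkpoint.

```python
# A sketch of obtaining X = [x_1, ..., x_N] from a sentence with a
# pre-trained XLNet model (checkpoint name is an assumption).
import torch
from transformers import AutoTokenizer, XLNetModel

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-xlnet-base")
xlnet = XLNetModel.from_pretrained("hfl/chinese-xlnet-base")

sentence = "二叉搜索树是一种数据结构"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = xlnet(**inputs)

# One contextual pre-training vector per token: shape (1, N, hidden_size).
X = outputs.last_hidden_state
```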
Step three: input the obtained text pre-training vectors into the Mogrifier BiGRU neural network for text feature extraction.

The Mogrifier BiGRU neural network comprises a forward GRU network and a backward GRU network; the input and hidden-layer output of the Mogrifier BiGRU neural network are X = [x_1, x_2, …, x_N] and H = [h_1, h_2, …, h_N] respectively. At time t, the forward GRU network takes the input x_t and the previous hidden state h_{t-1} (the superscripts t and t-1 denote time t and time t-1); before entering each GRU cell, x_t and h_{t-1} undergo bidirectional multi-round interaction.

For the forward GRU network, the interaction process is as follows:
(a41) interact x_t with h_{t-1} to obtain x_t^(1) = 2σ(R_1 h_{t-1}) ⊙ x_t;
(a42) interact h_{t-1} with x_t^(1) to obtain h_{t-1}^(1) = 2σ(R_2 x_t^(1)) ⊙ h_{t-1};
(a43) interact x_t^(1) with h_{t-1}^(1) to obtain x_t^(2) = 2σ(R_3 h_{t-1}^(1)) ⊙ x_t^(1);
(a44) interact h_{t-1}^(1) with x_t^(2) to obtain h_{t-1}^(2) = 2σ(R_4 x_t^(2)) ⊙ h_{t-1}^(1);
(a45) the forward hidden state is obtained as h_t^fwd = GRU(x_t^(2), h_{t-1}^(2)).

For the backward GRU network, the interaction process (b41)-(b45) is identical but runs over the sequence in reverse order, yielding the backward hidden state h_t^bwd; the hidden output for each word is the concatenation h_t = [h_t^fwd; h_t^bwd].

Wherein: σ is the logistic sigmoid function, ⊙ denotes element-wise multiplication, and R_1, R_2, R_3, R_4 are model parameters.
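A PyTorch sketch of one Mogrifier GRU cell follows, assuming steps (a41)-(a44) implement the standard Mogrifier update x ← 2σ(R·h) ⊙ x (and symmetrically for h); the low-rank factorization of R_1 to R_4 is omitted for brevity. A BiGRU is obtained by running one such cell left-to-right and a second cell right-to-left, concatenating the two hidden states per token.

```python
# A sketch of a Mogrifier GRU cell: four rounds of pre-interaction
# between the input and the previous hidden state, then a plain GRU cell.
import torch
import torch.nn as nn

class MogrifierGRUCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, rounds: int = 4):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        # R_1, R_3 act on h to modulate x; R_2, R_4 act on x to modulate h.
        self.R_x = nn.ModuleList(
            nn.Linear(hidden_size, input_size, bias=False)
            for _ in range((rounds + 1) // 2))
        self.R_h = nn.ModuleList(
            nn.Linear(input_size, hidden_size, bias=False)
            for _ in range(rounds // 2))
        self.rounds = rounds

    def mogrify(self, x, h):
        for i in range(self.rounds):
            if i % 2 == 0:   # rounds (a41), (a43): update the input
                x = 2 * torch.sigmoid(self.R_x[i // 2](h)) * x
            else:            # rounds (a42), (a44): update the hidden state
                h = 2 * torch.sigmoid(self.R_h[i // 2](x)) * h
        return x, h

    def forward(self, x_t, h_prev):
        x_t, h_prev = self.mogrify(x_t, h_prev)  # steps (a41)-(a44)
        return self.cell(x_t, h_prev)            # step (a45)
```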
Step four: introduce a multi-head attention (MultiHead Attention) mechanism behind the Mogrifier BiGRU neural network to capture the more important parts of the text features; the important parts are the parts of the text features that can form knowledge entities.

The multi-head attention mechanism introduced after the Mogrifier BiGRU neural network is used to further capture the context semantics of each word s_i, highlight the significance of keywords in the sentence S and assign attention weights; the multi-head attention mechanism serves as the attention layer. Its calculation process comprises the following steps:

(41) map X = [x_1, x_2, …, x_N] and the output H = [h_1, h_2, …, h_N] of the Mogrifier BiGRU neural network into the three matrices K, Q, V;

(42) project K, Q, V onto the j-th attention head of the multi-head attention mechanism:

Q_j = Q W_j^Q, K_j = K W_j^K, V_j = V W_j^V

wherein: W_j^Q ∈ R^{d_N×d_q}, W_j^K ∈ R^{d_N×d_k} and W_j^V ∈ R^{d_N×d_v} are three global parameter matrices, d_N denotes the input dimension of the multi-head attention mechanism, D denotes the total number of attention heads, and d_k = d_q = d_v = d_N / D;

(43) calculate the j-th attention value

head_j = softmax( Q_j K_j^T / √d_k ) V_j

(44) splice the D attention values to obtain the multi-head attention

B = Concat(head_1, …, head_D) W^o

wherein: W^o is a weight matrix, and the element b_ij in row i and column j of B denotes the weight of word s_i on the j-th attention;

(45) combine the hidden state h_i of word s_i with the attention weights b_ij to generate the content vector of word s_i:

c_i = Σ_j b_ij h_j

(46) the important part of the text features captured by the multi-head attention mechanism introduced after the Mogrifier BiGRU neural network is C = [c_1, c_2, …, c_N].
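Steps (41) to (44) correspond to standard scaled dot-product multi-head self-attention, so PyTorch's built-in nn.MultiheadAttention can serve as a reference; the sketch below uses illustrative dimensions and is an assumed equivalent, not the patent's exact layer.

```python
# A sketch of the attention layer over the BiGRU outputs H.
import torch
import torch.nn as nn

d_N, D = 256, 8   # input dimension d_N and number of heads D (illustrative)
attention = nn.MultiheadAttention(embed_dim=d_N, num_heads=D, batch_first=True)

H = torch.randn(1, 20, d_N)   # BiGRU hidden states [h_1, ..., h_N], N = 20
# H is projected internally to Q, K, V; C collects the content vectors c_i,
# and B holds the (head-averaged) attention weights b_ij.
C, B = attention(H, H, H)     # C: (1, 20, d_N), B: (1, 20, 20)
```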
Step five: obtain the course knowledge-point named entities and the relations between knowledge entities by combining the CRF model.

The CRF model serves as the label score layer: the CRF model is first used to calculate the label score of each word under each label, the Viterbi algorithm is then applied to obtain the label sequence with the highest label score, and the course knowledge-point named entities and the relations between knowledge entities are then obtained through the relation extraction layer.

The CRF model calculates the label score S^(ner)(s_i, p) of word s_i under label p:

S^(ner)(s_i, p) = [ V^(ner) f(U^(ner) h_i + b^(ner)) ]_p

wherein: the superscript (ner) denotes knowledge entity annotation recognition; V^(ner) and U^(ner) denote weight matrices and b^(ner) denotes a bias vector, V^(ner) ∈ R^{P×l}, U^(ner) ∈ R^{l×2d}, b^(ner) ∈ R^l; l is the layer width of the CRF model and d is the number of hidden units of the Mogrifier BiGRU neural network; f(·) denotes a nonlinear activation function.

Labels are assigned to all words in the sentence S to obtain a label sequence of S; each sentence S has R = P^N possible label sequences. The label score of S under the r-th label sequence is

score(S, Y_r) = Σ_{i=1}^{N} ( S^(ner)(s_i, y_{i,r}) + A_{(i,r),(i+1,r)} )

wherein: Y_r denotes the r-th label sequence, Y = [Y_1, Y_2, …, Y_R], r = 1, 2, …, R; y_{i,r} is the label assigned to word s_i in the r-th label sequence and S^(ner)(s_i, y_{i,r}) is the label score of s_i under that label; A_{(i,r),(i+1,r)} denotes the transition score from the label assigned to s_i to the label assigned to s_{i+1} in the r-th label sequence; A denotes the transition matrix, A ∈ R^{(P+2)×(P+2)}.

The label scores score(S, Y_r) of the label sequences are normalized to obtain the probability distribution over label sequences:

P(Y_r | S) = exp(score(S, Y_r)) / Σ_{r'=1}^{R} exp(score(S, Y_{r'}))

The Viterbi algorithm is used to obtain the label sequence with the highest label score:

Y* = argmax_{1 ≤ r ≤ R} score(S, Y_r)

wherein: in the highest-scoring label sequence Y*, the label assigned to word s_i is denoted g_i.

The label score layer is trained by minimizing a cross-entropy loss.
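A sketch of the Viterbi decoding step over the CRF scores is given below; the emission and transition tensors are assumed to come from the label score layer and the transition matrix A described above, with the start and end labels already folded into A.

```python
# A sketch of Viterbi decoding: find the label sequence with the highest
# label score given per-word emission scores and label-transition scores.
import torch

def viterbi_decode(emissions: torch.Tensor, transitions: torch.Tensor) -> list:
    """emissions: (N, P) scores S(s_i, p); transitions: (P, P) scores A[p, p']."""
    N, P = emissions.shape
    score = emissions[0]            # best score of any path ending in each label
    backpointers = []
    for i in range(1, N):
        # score[p] + A[p, p'] + emission[i, p'] for every label pair (p, p')
        total = score.unsqueeze(1) + transitions + emissions[i].unsqueeze(0)
        score, best_prev = total.max(dim=0)
        backpointers.append(best_prev)
    # Trace back the highest-scoring sequence g_1 ... g_N
    path = [int(score.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return list(reversed(path))
```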
When the relation extraction layer is adopted to extract the relations between course knowledge-point named entities, the relation score between words s_i and s_j under a given relation q is first calculated:

S^(re)(m_j, m_i, q) = V^(re) f(U^(re) m_j + W^(re) m_i + b^(re))

wherein: m_i = [c_i; g_i] and m_j = [c_j; g_j]; the superscript (re) denotes relation recognition; V^(re), U^(re) and W^(re) denote weight matrices and b^(re) denotes a bias vector, V^(re) ∈ R^l, U^(re) ∈ R^{l×(2a+d)}, W^(re) ∈ R^{l×(2a+d)}, b^(re) ∈ R^l; l is the layer width of the relation extraction layer, d is the number of hidden units of the Mogrifier BiGRU neural network, and a is the dimension of the label; f(·) denotes a nonlinear activation function.

The probability that words s_i and s_j hold relation q is then

P(q | s_j, s_i) = σ( S^(re)(m_j, m_i, q) )

The relation extraction layer is trained by minimizing a cross-entropy loss.
The cross-entropy loss L_RE in the training of the relation extraction layer is calculated as

L_RE = - Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{q=1}^{Q} [ y_{i,j,q} log P(q | s_j, s_i) + (1 - y_{i,j,q}) log(1 - P(q | s_j, s_i)) ]

wherein y_{i,j,q} ∈ {0,1} indicates whether relation q holds between words s_i and s_j. The overall objective function is min(L_NER + L_RE), wherein L_NER is the cross-entropy loss in the training of the label score layer.
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are intended to fall within the scope of the invention.

Claims (10)

1. An entity and relation combined extraction method oriented to the education field, characterized by comprising the following steps:
(1) establishing a course knowledge-point named entity corpus composed of text data containing course knowledge points;
(2) performing distributed representation of the preprocessed text data containing course knowledge points, taking sentences as input, and obtaining text pre-training vectors through an XLNet language model;
(3) inputting the obtained text pre-training vectors into a Mogrifier BiGRU neural network for text feature extraction;
(4) introducing a MultiHead Attention mechanism behind the Mogrifier BiGRU neural network to capture the more important parts of the text features, the important parts being the parts of the text features that can form knowledge entities;
(5) obtaining the course knowledge-point named entities and the relations between knowledge entities by combining a CRF model.
2. The method of claim 1, wherein: in step (1), the BIO labeling method is used to annotate the knowledge entities in the text data of the course knowledge-point named entity corpus: the text data are divided into P categories, each category being a label, with the p-th category denoted label p, p = 1, 2, …, P; the relations between knowledge entities are divided into Q kinds, with the q-th denoted relation q, q = 1, 2, …, Q; the text data are divided into a training set and a test set; in the BIO labeling method, B denotes the beginning of a knowledge entity, I denotes the remaining part of a knowledge entity, and O denotes a non-knowledge-entity token.
3. The method of claim 1, wherein: in step (2), a sentence input to the XLNet language model is denoted S = [s_1, s_2, …, s_N], and the text pre-training vector output by the XLNet language model is denoted X = [x_1, x_2, …, x_N]; wherein s_i denotes the i-th word constituting the sentence S and x_i is the pre-training vector of the word s_i, i = 1, 2, …, N.
4. The method of claim 3, wherein: in step (3), the Mogrifier BiGRU neural network comprises a forward GRU network and a backward GRU network, and the input and hidden-layer output of the Mogrifier BiGRU neural network are X = [x_1, x_2, …, x_N] and H = [h_1, h_2, …, h_N] respectively; at time t the forward GRU network takes the input x_t and the previous hidden state h_{t-1} (the superscripts t and t-1 denoting time t and time t-1), which undergo bidirectional multi-round interaction before entering the GRU cell;
for the forward GRU network, the interaction process is as follows:
(a41) interacting x_t with h_{t-1} to obtain x_t^(1) = 2σ(R_1 h_{t-1}) ⊙ x_t;
(a42) interacting h_{t-1} with x_t^(1) to obtain h_{t-1}^(1) = 2σ(R_2 x_t^(1)) ⊙ h_{t-1};
(a43) interacting x_t^(1) with h_{t-1}^(1) to obtain x_t^(2) = 2σ(R_3 h_{t-1}^(1)) ⊙ x_t^(1);
(a44) interacting h_{t-1}^(1) with x_t^(2) to obtain h_{t-1}^(2) = 2σ(R_4 x_t^(2)) ⊙ h_{t-1}^(1);
(a45) obtaining the forward hidden state h_t^fwd = GRU(x_t^(2), h_{t-1}^(2));
for the backward GRU network, the interaction process (b41)-(b45) is identical but runs over the sequence in reverse order, yielding the backward hidden state h_t^bwd; the hidden output for each word is the concatenation h_t = [h_t^fwd; h_t^bwd];
wherein: σ is the logistic sigmoid function, ⊙ denotes element-wise multiplication, and R_1, R_2, R_3, R_4 are model parameters.
5. The method of claim 4, wherein: in step (4), a MultiHead Attention mechanism is introduced after the Mogrifier BiGRU neural network and used to further capture the context semantics of each word s_i, highlight the significance of keywords in the sentence S and assign attention weights, the MultiHead Attention mechanism serving as the attention layer.
6. The method of claim 5, wherein: the calculation process of the MultiHead Attention mechanism comprises the following steps:
(41) mapping X = [x_1, x_2, …, x_N] and the output H = [h_1, h_2, …, h_N] of the Mogrifier BiGRU neural network into the three matrices K, Q, V;
(42) projecting K, Q, V onto the j-th attention head of the MultiHead Attention mechanism: Q_j = Q W_j^Q, K_j = K W_j^K, V_j = V W_j^V, wherein W_j^Q ∈ R^{d_N×d_q}, W_j^K ∈ R^{d_N×d_k} and W_j^V ∈ R^{d_N×d_v} are three global parameter matrices, d_N denotes the input dimension of the MultiHead Attention mechanism, D denotes the total number of attention heads, and d_k = d_q = d_v = d_N/D;
(43) calculating the j-th attention value head_j = softmax(Q_j K_j^T / √d_k) V_j;
(44) splicing the D attention values to obtain the multi-head attention B = Concat(head_1, …, head_D) W^o, wherein W^o is a weight matrix and the element b_ij in row i and column j of B denotes the weight of word s_i on the j-th attention;
(45) combining the hidden state h_i of word s_i with the attention weights b_ij to generate the content vector c_i = Σ_j b_ij h_j of word s_i;
(46) the important part of the text features captured by the MultiHead Attention mechanism introduced after the Mogrifier BiGRU neural network being C = [c_1, c_2, …, c_N].
7. The method of claim 5, wherein: in step (5), the CRF model serves as the label score layer; the CRF model is used to calculate the label score of each word under each label, the Viterbi algorithm is applied to obtain the label sequence with the highest label score, and the course knowledge-point named entities and the relations between knowledge entities are then obtained through the relation extraction layer.
8. The method of claim 7, wherein: in step (5), the CRF model is used to calculate the label score S^(ner)(s_i, p) of word s_i under label p:
S^(ner)(s_i, p) = [ V^(ner) f(U^(ner) h_i + b^(ner)) ]_p
wherein: the superscript (ner) denotes knowledge entity annotation recognition; V^(ner) and U^(ner) denote weight matrices and b^(ner) denotes a bias vector, V^(ner) ∈ R^{P×l}, U^(ner) ∈ R^{l×2d}, b^(ner) ∈ R^l; l is the layer width of the CRF model and d is the number of hidden units of the Mogrifier BiGRU neural network; f(·) denotes a nonlinear activation function;
labels are assigned to all words in the sentence S to obtain a label sequence of S, each sentence S having R = P^N possible label sequences; the label score of S under the r-th label sequence is
score(S, Y_r) = Σ_{i=1}^{N} ( S^(ner)(s_i, y_{i,r}) + A_{(i,r),(i+1,r)} )
wherein: Y_r denotes the r-th label sequence, Y = [Y_1, Y_2, …, Y_R], r = 1, 2, …, R; y_{i,r} is the label assigned to word s_i in the r-th label sequence and S^(ner)(s_i, y_{i,r}) is the label score of s_i under that label; A_{(i,r),(i+1,r)} denotes the transition score from the label assigned to s_i to the label assigned to s_{i+1} in the r-th label sequence; A denotes the transition matrix, A ∈ R^{(P+2)×(P+2)};
the label scores of the label sequences are normalized to obtain the probability distribution over label sequences:
P(Y_r | S) = exp(score(S, Y_r)) / Σ_{r'=1}^{R} exp(score(S, Y_{r'}))
the Viterbi algorithm is used to obtain the label sequence with the highest label score:
Y* = argmax_{1 ≤ r ≤ R} score(S, Y_r)
wherein: in the highest-scoring label sequence Y*, the label assigned to word s_i is g_i;
the label score layer is trained by minimizing a cross-entropy loss.
9. The method of claim 8, wherein: in step (5), when the relation extraction layer is adopted to extract the relations between course knowledge-point named entities, the relation score between words s_i and s_j under a given relation q is first calculated:
S^(re)(m_j, m_i, q) = V^(re) f(U^(re) m_j + W^(re) m_i + b^(re))
wherein: m_i = [c_i; g_i] and m_j = [c_j; g_j]; the superscript (re) denotes relation recognition; V^(re), U^(re) and W^(re) denote weight matrices and b^(re) denotes a bias vector, V^(re) ∈ R^l, U^(re) ∈ R^{l×(2a+d)}, W^(re) ∈ R^{l×(2a+d)}, b^(re) ∈ R^l; l is the layer width of the relation extraction layer, d is the number of hidden units of the Mogrifier BiGRU neural network, and a is the dimension of the label; f(·) denotes a nonlinear activation function;
the probability that words s_i and s_j hold relation q is then
P(q | s_j, s_i) = σ( S^(re)(m_j, m_i, q) )
the relation extraction layer is trained by minimizing a cross-entropy loss.
10. The method of claim 8, wherein: the cross-entropy loss L_RE in the training of the relation extraction layer is calculated as
L_RE = - Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{q=1}^{Q} [ y_{i,j,q} log P(q | s_j, s_i) + (1 - y_{i,j,q}) log(1 - P(q | s_j, s_i)) ]
wherein y_{i,j,q} ∈ {0,1} indicates whether relation q holds between words s_i and s_j; the overall objective function is min(L_NER + L_RE), wherein L_NER is the cross-entropy loss in the training of the label score layer.
CN202011252896.4A · Priority/filing date: 2020-11-11 · Education-field-oriented entity and relation combined extraction method · Pending

Priority Applications (1)

CN202011252896.4A · Priority date: 2020-11-11 · Filing date: 2020-11-11 · Education-field-oriented entity and relation combined extraction method

Applications Claiming Priority (1)

CN202011252896.4A · Priority date: 2020-11-11 · Filing date: 2020-11-11 · Education-field-oriented entity and relation combined extraction method

Publications (1)

CN112364654A · Publication date: 2021-02-12

Family

ID=74515944

Family Applications (1)

CN202011252896.4A (Pending) · Priority/filing date: 2020-11-11 · Education-field-oriented entity and relation combined extraction method

Country Status (1)

CN: CN112364654A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553385A (en) * 2021-07-08 2021-10-26 北京计算机技术及应用研究所 Relation extraction method of legal elements in judicial documents
CN113553385B (en) * 2021-07-08 2023-08-25 北京计算机技术及应用研究所 Relation extraction method for legal elements in judicial document


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination