CN112364654A - Education-field-oriented entity and relation combined extraction method

Education-field-oriented entity and relation combined extraction method

Info

Publication number
CN112364654A
CN112364654A (application CN202011252896.4A)
Authority
CN
China
Prior art keywords
label
entity
relation
attention
knowledge
Prior art date
Legal status
Pending
Application number
CN202011252896.4A
Other languages
Chinese (zh)
Inventor
秦锋
张志文
郑啸
Current Assignee
Anhui University of Technology AHUT
Original Assignee
Anhui University of Technology AHUT
Priority date
2020-11-11
Filing date
2020-11-11
Publication date
2021-02-12
Application filed by Anhui University of Technology AHUT
Priority to CN202011252896.4A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Abstract

The invention discloses an entity and relation joint extraction method oriented to the education field, addressing the lack of application of existing methods in this domain. The method uses a pre-trained XLNet language model to obtain high-level feature embeddings, captures the contextual semantic information of the text through a Mogrifier BiGRU neural network, and introduces a multi-head attention mechanism behind the Mogrifier BiGRU to capture the more important parts of the text features, effectively overcoming the interference caused by the large number of modifiers within entities. Entities and relations are extracted simultaneously in a joint manner: a parameter-sharing encoding layer realizes the dependence between the entity and relation subtasks, thereby alleviating the error propagation problem.

Description

Education-field-oriented entity and relation combined extraction method
Technical Field
The invention relates to an entity and relation joint extraction method for the education field, and belongs to the field of natural language processing technology.
Background
With the rapid development of online learning in the education field, the volume of online course data grows exponentially, and how to efficiently and accurately extract useful entity and relation information from such data has become a research hotspot. Over the past decades, text mining and natural language processing (NLP) have made great progress, but information extraction technology in the education field still leaves considerable room for improvement. A representative information extraction task in online education is extracting specific types of course knowledge-point entities, and the relations between those entities, from the text of online courses. The extracted information serves many kinds of research: it is applicable to various NLP tasks (such as document classification and question-answering systems) and also plays an important role in personalized recommendation for online learning. As entity recognition and relation extraction are widely applied in knowledge discovery and data mining, the need for this technology will continue to grow.
Entity recognition and relation extraction methods mainly comprise dictionary-based, rule-based, machine-learning-based and deep-learning-based approaches. In dictionary-based approaches, terms in a dictionary are simply matched against words in the target sequence to extract entities; while simple, the continuing growth in the number of entities and the diversity of symbols in online course text makes extraction difficult. Rule-based approaches tend to perform well only when applied to one particular domain. Machine-learning-based approaches perform entity extraction with various algorithms and statistical models. However, both rule-based and machine-learning methods depend heavily on feature engineering, which is labor- and time-consuming and requires substantial domain knowledge. Unlike these earlier methods, deep learning does not need heavy manual feature design: it automatically extracts the most representative features with a neural network and achieves very good results.
In existing research on named entity recognition and relation extraction, most scholars split the process into two independent tasks and solve entity and relation extraction in a pipeline: extraction is treated as two independent subtasks executed in succession, Named Entity Recognition (NER) and Relation Extraction (RE). Specifically, named entities in a sentence are first extracted, the extracted entities are then combined pairwise, and finally the semantic relation between each entity pair is identified. This type of approach has two major disadvantages: first, error propagation, since errors of the named entity recognition module are passed to the downstream relation extraction module and degrade relation extraction performance; second, it ignores the dependency between the two subtasks.
Disclosure of Invention
Purpose of the invention: to overcome the defects of the prior art, the invention provides an entity and relation joint extraction method oriented to the education field. It addresses the lack of application of existing methods in this field: high-level feature embeddings are obtained with a pre-trained XLNet language model and an attention mechanism, and entity recognition and relation classification are handled simultaneously by a joint model to alleviate error propagation.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme.
an entity and relation combined extraction method oriented to the education field comprises the following steps:
(1) establishing a course knowledge point named entity corpus, wherein the course knowledge point named entity corpus consists of text data containing course knowledge points;
(2) performing distributed representation of the preprocessed text data containing course knowledge points, taking sentences as input, and obtaining text pre-training vectors through the XLNet language model (a permutation language model);
(3) inputting the obtained text pre-training vectors into a Mogrifier BiGRU neural network (a modified bidirectional gated recurrent unit network) for text feature extraction;
(4) introducing a multi-head attention (MultiHead Attention) mechanism behind the Mogrifier BiGRU neural network to capture the more important parts of the text features, the important parts being the parts of the text features that can form knowledge entities;
(5) obtaining the course knowledge-point named entities and the relations between knowledge entities by combining a CRF (conditional random field) model.
Specifically, in step (1), the BIO labeling method (a standard Begin/Inside/Outside sequence-labeling scheme) is first adopted to annotate the knowledge entities in the text data of the course knowledge-point named entity corpus: the text data are divided into P categories, each category being a label, with the p-th category denoted label p, p = 1, 2, …, P; the relations between knowledge entities are divided into Q kinds, with the q-th denoted relation q, q = 1, 2, …, Q; the text data are divided into a training set and a test set. In the BIO labeling method, B denotes the beginning of a knowledge entity, I denotes the remaining part of a knowledge entity, and O denotes a non-knowledge-entity token.
Specifically, in step (2), a sentence input to the XLNet language model is denoted S = [s_1, s_2, …, s_N], and the text pre-training vector output by the XLNet language model is denoted X = [x_1, x_2, …, x_N]; wherein s_i denotes the i-th word constituting the sentence S and x_i is the pre-training vector of the word s_i, i = 1, 2, …, N.
Specifically, the Mogrifier BiGRU neural network differs from a conventional GRU network in that it strengthens the context-modeling capability of the whole model through pre-interaction of the input and the hidden state. The Mogrifier BiGRU neural network comprises a forward GRU network and a backward GRU network; the input and hidden-layer output of the Mogrifier BiGRU neural network are X = [x_1, x_2, …, x_N] and H = [h_1, h_2, …, h_N] respectively. At time t, the forward GRU network takes the input x_t and the previous hidden state h_{t-1} (the superscripts t and t-1 denote time t and time t-1); before entering each GRU cell, x_t and h_{t-1} undergo bidirectional multi-round interaction.

For the forward GRU network, the interaction process is as follows:
(a41) interact x_t with h_{t-1} to obtain x_t^(1) = 2σ(R_1 h_{t-1}) ⊙ x_t;
(a42) interact h_{t-1} with x_t^(1) to obtain h_{t-1}^(1) = 2σ(R_2 x_t^(1)) ⊙ h_{t-1};
(a43) interact x_t^(1) with h_{t-1}^(1) to obtain x_t^(2) = 2σ(R_3 h_{t-1}^(1)) ⊙ x_t^(1);
(a44) interact h_{t-1}^(1) with x_t^(2) to obtain h_{t-1}^(2) = 2σ(R_4 x_t^(2)) ⊙ h_{t-1}^(1);
(a45) the forward hidden state is obtained as h_t^fwd = GRU(x_t^(2), h_{t-1}^(2)).

For the backward GRU network, the interaction process (b41)-(b45) is identical but runs over the sequence in reverse order, yielding the backward hidden state h_t^bwd; the hidden output for each word is the concatenation h_t = [h_t^fwd; h_t^bwd].

Wherein: σ is the logistic sigmoid function, ⊙ denotes element-wise multiplication, and R_1, R_2, R_3, R_4 are model parameters; to reduce the number of parameters, R_1, R_2, R_3 and R_4 may all be designed as products of low-rank matrices.
Specifically, in step (4), a multi-head attention (MultiHead Attention) mechanism is introduced after the Mogrifier BiGRU neural network and used to further capture the context semantics of each word s_i, highlight the significance of keywords in the sentence S and assign attention weights; the multi-head attention mechanism serves as the attention layer. It differs from the traditionally used attention mechanism in that it generates several different attention scores in parallel and finally splices them into the final attention score, so that the important parts of the text features are captured better.

Specifically, the calculation process of the multi-head attention mechanism comprises the following steps:

(41) map X = [x_1, x_2, …, x_N] and the output H = [h_1, h_2, …, h_N] of the Mogrifier BiGRU neural network into the three matrices K, Q, V;

(42) project K, Q, V onto the j-th attention head of the multi-head attention mechanism:

Q_j = Q W_j^Q, K_j = K W_j^K, V_j = V W_j^V

wherein: W_j^Q ∈ R^{d_N×d_q}, W_j^K ∈ R^{d_N×d_k} and W_j^V ∈ R^{d_N×d_v} are three global parameter matrices, d_N denotes the input dimension of the multi-head attention mechanism, D denotes the total number of attention heads, and d_k = d_q = d_v = d_N / D;

(43) calculate the j-th attention value

head_j = softmax( Q_j K_j^T / √d_k ) V_j

(44) splice the D attention values to obtain the multi-head attention

B = Concat(head_1, …, head_D) W^o

wherein: W^o is a weight matrix, and the element b_ij in row i and column j of B denotes the weight of word s_i on the j-th attention;

(45) combine the hidden state h_i of word s_i with the attention weights b_ij to generate the content vector of word s_i:

c_i = Σ_j b_ij h_j

(46) the important part of the text features captured by the multi-head attention mechanism introduced after the Mogrifier BiGRU neural network is C = [c_1, c_2, …, c_N].
Specifically, in step (5), the CRF model serves as the label score layer: the CRF model is first used to calculate the label score of each word under each label, the Viterbi algorithm is then applied to obtain the label sequence with the highest label score, and the course knowledge-point named entities and the relations between knowledge entities are then obtained through the relation extraction layer.
More specifically, in step (5), the CRF model is used to calculate the label score S^(ner)(s_i, p) of word s_i under label p:

S^(ner)(s_i, p) = [ V^(ner) f(U^(ner) h_i + b^(ner)) ]_p

wherein: the superscript (ner) denotes knowledge entity annotation recognition; V^(ner) and U^(ner) denote weight matrices and b^(ner) denotes a bias vector, V^(ner) ∈ R^{P×l}, U^(ner) ∈ R^{l×2d}, b^(ner) ∈ R^l; l is the layer width of the CRF model and d is the number of hidden units of the Mogrifier BiGRU neural network; f(·) denotes a nonlinear activation function.

Labels are assigned to all words in the sentence S to obtain a label sequence of S; each sentence S has R = P^N possible label sequences. The label score of S under the r-th label sequence is

score(S, Y_r) = Σ_{i=1}^{N} ( S^(ner)(s_i, y_{i,r}) + A_{(i,r),(i+1,r)} )

wherein: Y_r denotes the r-th label sequence, Y = [Y_1, Y_2, …, Y_R], r = 1, 2, …, R; y_{i,r} is the label assigned to word s_i in the r-th label sequence and S^(ner)(s_i, y_{i,r}) is the label score of s_i under that label; A_{(i,r),(i+1,r)} denotes the transition score from the label assigned to s_i to the label assigned to s_{i+1} in the r-th label sequence; A denotes the transition matrix, A ∈ R^{(P+2)×(P+2)}; because a start label and an end label are considered when the layer is built, the dimension of the transition matrix is 2 larger than P.

The label scores score(S, Y_r) of the label sequences are normalized to obtain the probability distribution over label sequences:

P(Y_r | S) = exp(score(S, Y_r)) / Σ_{r'=1}^{R} exp(score(S, Y_{r'}))

The Viterbi algorithm is used to obtain the label sequence with the highest label score:

Y* = argmax_{1 ≤ r ≤ R} score(S, Y_r)

wherein: in the highest-scoring label sequence Y*, the label assigned to word s_i is denoted g_i.

The label score layer is trained by minimizing a cross-entropy loss.
More specifically, in step (5), when the relation extraction layer is adopted to extract the relations between course knowledge-point named entities, the relation score between words s_i and s_j under a given relation q is first calculated:

S^(re)(m_j, m_i, q) = V^(re) f(U^(re) m_j + W^(re) m_i + b^(re))

wherein: m_i = [c_i; g_i] and m_j = [c_j; g_j]; the superscript (re) denotes relation recognition; V^(re), U^(re) and W^(re) denote weight matrices and b^(re) denotes a bias vector, V^(re) ∈ R^l, U^(re) ∈ R^{l×(2a+d)}, W^(re) ∈ R^{l×(2a+d)}, b^(re) ∈ R^l; l is the layer width of the relation extraction layer, d is the number of hidden units of the Mogrifier BiGRU neural network, and a is the dimension of the label; f(·) denotes a nonlinear activation function.

The probability that words s_i and s_j hold relation q is then

P(q | s_j, s_i) = σ( S^(re)(m_j, m_i, q) )

The relation extraction layer is trained by minimizing a cross-entropy loss. Specifically, the cross-entropy loss L_RE in the training of the relation extraction layer is calculated as

L_RE = - Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{q=1}^{Q} [ y_{i,j,q} log P(q | s_j, s_i) + (1 - y_{i,j,q}) log(1 - P(q | s_j, s_i)) ]

wherein y_{i,j,q} ∈ {0,1} indicates whether relation q holds between words s_i and s_j. The overall objective function is min(L_NER + L_RE), wherein L_NER is the cross-entropy loss in the training of the label score layer.
Beneficial effects: compared with the prior art, the entity and relation joint extraction method for the education field has the following advantages. 1. The invention uses a pre-trained XLNet language model to obtain high-level feature embeddings: instead of using fixed word vectors directly, the same word is dynamically embedded according to its context. This greatly improves the accuracy with which the word embedding layer converts text into dense embedding vectors, reduces the negative influence of ambiguous words on model performance, and effectively captures both local and global information of a word. 2. A multi-head attention mechanism is introduced behind the Mogrifier BiGRU neural network to capture the more important parts of the text features, effectively overcoming the interference of the large number of modifiers within entities. 3. Entities and relations are extracted simultaneously in a joint manner; a parameter-sharing encoding layer realizes the dependence between the entity and relation subtasks, thereby alleviating the error propagation problem.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
Fig. 1 shows a combined extraction method of entities and relations in the education domain, which comprises the following steps:
the method comprises the following steps: establishing a course knowledge point named entity corpus, wherein the course knowledge point named entity corpus is composed of text data containing course knowledge points.
First, the BIO labeling method is adopted to annotate the knowledge entities in the text data of the course knowledge-point named entity corpus: the text data are divided into P categories, each category being a label, with the p-th category denoted label p, p = 1, 2, …, P; the relations between knowledge entities are divided into Q kinds, with the q-th denoted relation q, q = 1, 2, …, Q; the text data are divided into a training set and a test set. In the BIO labeling method, B denotes the beginning of a knowledge entity, I denotes the remaining part of a knowledge entity, and O denotes a non-knowledge-entity token.
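For illustration, the snippet below shows what one BIO-annotated training example might look like; the tokens and the entity label name "KNOW" are hypothetical, since the patent does not fix a concrete tag vocabulary.

```python
# A minimal BIO annotation sketch for a course sentence; "KNOW" is a
# hypothetical label for course knowledge-point entities.
sentence = ["Binary", "search", "tree", "is", "a", "data", "structure"]
bio_tags = ["B-KNOW", "I-KNOW", "I-KNOW", "O", "O", "B-KNOW", "I-KNOW"]

# Each (token, tag) pair forms one line of a CoNLL-style training file.
for token, tag in zip(sentence, bio_tags):
    print(f"{token}\t{tag}")
```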
Step two: perform distributed representation of the preprocessed text data containing course knowledge points, take sentences as input, and obtain text pre-training vectors through the XLNet language model.

A sentence input to the XLNet language model is denoted S = [s_1, s_2, …, s_N], and the text pre-training vector output by the XLNet language model is denoted X = [x_1, x_2, …, x_N]; wherein s_i denotes the i-th word constituting the sentence S and x_i is the pre-training vector of the word s_i, i = 1, 2, …, N.
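The following is a minimal sketch of step two using the open-source `transformers` package; the checkpoint name `hfl/chinese-xlnet-base` is an assumption for illustration, as the patent does not name a specific pre-trained XLNet checkpoint.

```python
# A sketch of obtaining X = [x_1, ..., x_N] from a sentence with a
# pre-trained XLNet model (checkpoint name is an assumption).
import torch
from transformers import AutoTokenizer, XLNetModel

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-xlnet-base")
xlnet = XLNetModel.from_pretrained("hfl/chinese-xlnet-base")

sentence = "二叉搜索树是一种数据结构"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = xlnet(**inputs)

# One contextual pre-training vector per token: shape (1, N, hidden_size).
X = outputs.last_hidden_state
```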
Step three: input the obtained text pre-training vectors into the Mogrifier BiGRU neural network for text feature extraction.

The Mogrifier BiGRU neural network comprises a forward GRU network and a backward GRU network; the input and hidden-layer output of the Mogrifier BiGRU neural network are X = [x_1, x_2, …, x_N] and H = [h_1, h_2, …, h_N] respectively. At time t, the forward GRU network takes the input x_t and the previous hidden state h_{t-1} (the superscripts t and t-1 denote time t and time t-1); before entering each GRU cell, x_t and h_{t-1} undergo bidirectional multi-round interaction.

For the forward GRU network, the interaction process is as follows:
(a41) interact x_t with h_{t-1} to obtain x_t^(1) = 2σ(R_1 h_{t-1}) ⊙ x_t;
(a42) interact h_{t-1} with x_t^(1) to obtain h_{t-1}^(1) = 2σ(R_2 x_t^(1)) ⊙ h_{t-1};
(a43) interact x_t^(1) with h_{t-1}^(1) to obtain x_t^(2) = 2σ(R_3 h_{t-1}^(1)) ⊙ x_t^(1);
(a44) interact h_{t-1}^(1) with x_t^(2) to obtain h_{t-1}^(2) = 2σ(R_4 x_t^(2)) ⊙ h_{t-1}^(1);
(a45) the forward hidden state is obtained as h_t^fwd = GRU(x_t^(2), h_{t-1}^(2)).

For the backward GRU network, the interaction process (b41)-(b45) is identical but runs over the sequence in reverse order, yielding the backward hidden state h_t^bwd; the hidden output for each word is the concatenation h_t = [h_t^fwd; h_t^bwd].

Wherein: σ is the logistic sigmoid function, ⊙ denotes element-wise multiplication, and R_1, R_2, R_3, R_4 are model parameters.
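A PyTorch sketch of one Mogrifier GRU cell follows, assuming steps (a41)-(a44) implement the standard Mogrifier update x ← 2σ(R·h) ⊙ x (and symmetrically for h); the low-rank factorization of R_1 to R_4 is omitted for brevity. A BiGRU is obtained by running one such cell left-to-right and a second cell right-to-left, concatenating the two hidden states per token.

```python
# A sketch of a Mogrifier GRU cell: four rounds of pre-interaction
# between the input and the previous hidden state, then a plain GRU cell.
import torch
import torch.nn as nn

class MogrifierGRUCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, rounds: int = 4):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        # R_1, R_3 act on h to modulate x; R_2, R_4 act on x to modulate h.
        self.R_x = nn.ModuleList(
            nn.Linear(hidden_size, input_size, bias=False)
            for _ in range((rounds + 1) // 2))
        self.R_h = nn.ModuleList(
            nn.Linear(input_size, hidden_size, bias=False)
            for _ in range(rounds // 2))
        self.rounds = rounds

    def mogrify(self, x, h):
        for i in range(self.rounds):
            if i % 2 == 0:   # rounds (a41), (a43): update the input
                x = 2 * torch.sigmoid(self.R_x[i // 2](h)) * x
            else:            # rounds (a42), (a44): update the hidden state
                h = 2 * torch.sigmoid(self.R_h[i // 2](x)) * h
        return x, h

    def forward(self, x_t, h_prev):
        x_t, h_prev = self.mogrify(x_t, h_prev)  # steps (a41)-(a44)
        return self.cell(x_t, h_prev)            # step (a45)
```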
Step four: introduce a multi-head attention (MultiHead Attention) mechanism behind the Mogrifier BiGRU neural network to capture the more important parts of the text features; the important parts are the parts of the text features that can form knowledge entities.

The multi-head attention mechanism introduced after the Mogrifier BiGRU neural network is used to further capture the context semantics of each word s_i, highlight the significance of keywords in the sentence S and assign attention weights; the multi-head attention mechanism serves as the attention layer. Its calculation process comprises the following steps:

(41) map X = [x_1, x_2, …, x_N] and the output H = [h_1, h_2, …, h_N] of the Mogrifier BiGRU neural network into the three matrices K, Q, V;

(42) project K, Q, V onto the j-th attention head of the multi-head attention mechanism:

Q_j = Q W_j^Q, K_j = K W_j^K, V_j = V W_j^V

wherein: W_j^Q ∈ R^{d_N×d_q}, W_j^K ∈ R^{d_N×d_k} and W_j^V ∈ R^{d_N×d_v} are three global parameter matrices, d_N denotes the input dimension of the multi-head attention mechanism, D denotes the total number of attention heads, and d_k = d_q = d_v = d_N / D;

(43) calculate the j-th attention value

head_j = softmax( Q_j K_j^T / √d_k ) V_j

(44) splice the D attention values to obtain the multi-head attention

B = Concat(head_1, …, head_D) W^o

wherein: W^o is a weight matrix, and the element b_ij in row i and column j of B denotes the weight of word s_i on the j-th attention;

(45) combine the hidden state h_i of word s_i with the attention weights b_ij to generate the content vector of word s_i:

c_i = Σ_j b_ij h_j

(46) the important part of the text features captured by the multi-head attention mechanism introduced after the Mogrifier BiGRU neural network is C = [c_1, c_2, …, c_N].
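Steps (41) to (44) correspond to standard scaled dot-product multi-head self-attention, so PyTorch's built-in nn.MultiheadAttention can serve as a reference; the sketch below uses illustrative dimensions and is an assumed equivalent, not the patent's exact layer.

```python
# A sketch of the attention layer over the BiGRU outputs H.
import torch
import torch.nn as nn

d_N, D = 256, 8   # input dimension d_N and number of heads D (illustrative)
attention = nn.MultiheadAttention(embed_dim=d_N, num_heads=D, batch_first=True)

H = torch.randn(1, 20, d_N)   # BiGRU hidden states [h_1, ..., h_N], N = 20
# H is projected internally to Q, K, V; C collects the content vectors c_i,
# and B holds the (head-averaged) attention weights b_ij.
C, B = attention(H, H, H)     # C: (1, 20, d_N), B: (1, 20, 20)
```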
Step five: obtain the course knowledge-point named entities and the relations between knowledge entities by combining the CRF model.

The CRF model serves as the label score layer: the CRF model is first used to calculate the label score of each word under each label, the Viterbi algorithm is then applied to obtain the label sequence with the highest label score, and the course knowledge-point named entities and the relations between knowledge entities are then obtained through the relation extraction layer.

The CRF model calculates the label score S^(ner)(s_i, p) of word s_i under label p:

S^(ner)(s_i, p) = [ V^(ner) f(U^(ner) h_i + b^(ner)) ]_p

wherein: the superscript (ner) denotes knowledge entity annotation recognition; V^(ner) and U^(ner) denote weight matrices and b^(ner) denotes a bias vector, V^(ner) ∈ R^{P×l}, U^(ner) ∈ R^{l×2d}, b^(ner) ∈ R^l; l is the layer width of the CRF model and d is the number of hidden units of the Mogrifier BiGRU neural network; f(·) denotes a nonlinear activation function.

Labels are assigned to all words in the sentence S to obtain a label sequence of S; each sentence S has R = P^N possible label sequences. The label score of S under the r-th label sequence is

score(S, Y_r) = Σ_{i=1}^{N} ( S^(ner)(s_i, y_{i,r}) + A_{(i,r),(i+1,r)} )

wherein: Y_r denotes the r-th label sequence, Y = [Y_1, Y_2, …, Y_R], r = 1, 2, …, R; y_{i,r} is the label assigned to word s_i in the r-th label sequence and S^(ner)(s_i, y_{i,r}) is the label score of s_i under that label; A_{(i,r),(i+1,r)} denotes the transition score from the label assigned to s_i to the label assigned to s_{i+1} in the r-th label sequence; A denotes the transition matrix, A ∈ R^{(P+2)×(P+2)}.

The label scores score(S, Y_r) of the label sequences are normalized to obtain the probability distribution over label sequences:

P(Y_r | S) = exp(score(S, Y_r)) / Σ_{r'=1}^{R} exp(score(S, Y_{r'}))

The Viterbi algorithm is used to obtain the label sequence with the highest label score:

Y* = argmax_{1 ≤ r ≤ R} score(S, Y_r)

wherein: in the highest-scoring label sequence Y*, the label assigned to word s_i is denoted g_i.

The label score layer is trained by minimizing a cross-entropy loss.
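A sketch of the Viterbi decoding step over the CRF scores is given below; the emission and transition tensors are assumed to come from the label score layer and the transition matrix A described above, with the start and end labels already folded into A.

```python
# A sketch of Viterbi decoding: find the label sequence with the highest
# label score given per-word emission scores and label-transition scores.
import torch

def viterbi_decode(emissions: torch.Tensor, transitions: torch.Tensor) -> list:
    """emissions: (N, P) scores S(s_i, p); transitions: (P, P) scores A[p, p']."""
    N, P = emissions.shape
    score = emissions[0]            # best score of any path ending in each label
    backpointers = []
    for i in range(1, N):
        # score[p] + A[p, p'] + emission[i, p'] for every label pair (p, p')
        total = score.unsqueeze(1) + transitions + emissions[i].unsqueeze(0)
        score, best_prev = total.max(dim=0)
        backpointers.append(best_prev)
    # Trace back the highest-scoring sequence g_1 ... g_N
    path = [int(score.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return list(reversed(path))
```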
When the relation extraction layer is adopted to extract the relations between course knowledge-point named entities, the relation score between words s_i and s_j under a given relation q is first calculated:

S^(re)(m_j, m_i, q) = V^(re) f(U^(re) m_j + W^(re) m_i + b^(re))

wherein: m_i = [c_i; g_i] and m_j = [c_j; g_j]; the superscript (re) denotes relation recognition; V^(re), U^(re) and W^(re) denote weight matrices and b^(re) denotes a bias vector, V^(re) ∈ R^l, U^(re) ∈ R^{l×(2a+d)}, W^(re) ∈ R^{l×(2a+d)}, b^(re) ∈ R^l; l is the layer width of the relation extraction layer, d is the number of hidden units of the Mogrifier BiGRU neural network, and a is the dimension of the label; f(·) denotes a nonlinear activation function.

The probability that words s_i and s_j hold relation q is then

P(q | s_j, s_i) = σ( S^(re)(m_j, m_i, q) )

The relation extraction layer is trained by minimizing a cross-entropy loss.
The cross-entropy loss L_RE in the training of the relation extraction layer is calculated as

L_RE = - Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{q=1}^{Q} [ y_{i,j,q} log P(q | s_j, s_i) + (1 - y_{i,j,q}) log(1 - P(q | s_j, s_i)) ]

wherein y_{i,j,q} ∈ {0,1} indicates whether relation q holds between words s_i and s_j. The overall objective function is min(L_NER + L_RE), wherein L_NER is the cross-entropy loss in the training of the label score layer.
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are intended to fall within the scope of the invention.

Claims (10)

1. An entity and relation combined extraction method oriented to the education field, characterized by comprising the following steps:
(1) establishing a course knowledge-point named entity corpus composed of text data containing course knowledge points;
(2) performing distributed representation of the preprocessed text data containing course knowledge points, taking sentences as input, and obtaining text pre-training vectors through an XLNet language model;
(3) inputting the obtained text pre-training vectors into a Mogrifier BiGRU neural network for text feature extraction;
(4) introducing a MultiHead Attention mechanism behind the Mogrifier BiGRU neural network to capture the more important parts of the text features, the important parts being the parts of the text features that can form knowledge entities;
(5) obtaining the course knowledge-point named entities and the relations between knowledge entities by combining a CRF model.
2. The method of claim 1, wherein: in step (1), the BIO labeling method is used to annotate the knowledge entities in the text data of the course knowledge-point named entity corpus: the text data are divided into P categories, each category being a label, with the p-th category denoted label p, p = 1, 2, …, P; the relations between knowledge entities are divided into Q kinds, with the q-th denoted relation q, q = 1, 2, …, Q; the text data are divided into a training set and a test set; in the BIO labeling method, B denotes the beginning of a knowledge entity, I denotes the remaining part of a knowledge entity, and O denotes a non-knowledge-entity token.
3. The method of claim 1, wherein: in step (2), a sentence input to the XLNet language model is denoted S = [s_1, s_2, …, s_N], and the text pre-training vector output by the XLNet language model is denoted X = [x_1, x_2, …, x_N]; wherein s_i denotes the i-th word constituting the sentence S and x_i is the pre-training vector of the word s_i, i = 1, 2, …, N.
4. The method of claim 3, wherein: in step (3), the Mogrifier BiGRU neural network comprises a forward GRU network and a backward GRU network, and the input and hidden-layer output of the Mogrifier BiGRU neural network are X = [x_1, x_2, …, x_N] and H = [h_1, h_2, …, h_N] respectively; at time t the forward GRU network takes the input x_t and the previous hidden state h_{t-1} (the superscripts t and t-1 denoting time t and time t-1), which undergo bidirectional multi-round interaction before entering the GRU cell;
for the forward GRU network, the interaction process is as follows:
(a41) interacting x_t with h_{t-1} to obtain x_t^(1) = 2σ(R_1 h_{t-1}) ⊙ x_t;
(a42) interacting h_{t-1} with x_t^(1) to obtain h_{t-1}^(1) = 2σ(R_2 x_t^(1)) ⊙ h_{t-1};
(a43) interacting x_t^(1) with h_{t-1}^(1) to obtain x_t^(2) = 2σ(R_3 h_{t-1}^(1)) ⊙ x_t^(1);
(a44) interacting h_{t-1}^(1) with x_t^(2) to obtain h_{t-1}^(2) = 2σ(R_4 x_t^(2)) ⊙ h_{t-1}^(1);
(a45) obtaining the forward hidden state h_t^fwd = GRU(x_t^(2), h_{t-1}^(2));
for the backward GRU network, the interaction process (b41)-(b45) is identical but runs over the sequence in reverse order, yielding the backward hidden state h_t^bwd; the hidden output for each word is the concatenation h_t = [h_t^fwd; h_t^bwd];
wherein: σ is the logistic sigmoid function, ⊙ denotes element-wise multiplication, and R_1, R_2, R_3, R_4 are model parameters.
5. The method of claim 4, wherein: in step (4), a MultiHead Attention mechanism is introduced after the Mogrifier BiGRU neural network and used to further capture the context semantics of each word s_i, highlight the significance of keywords in the sentence S and assign attention weights, the MultiHead Attention mechanism serving as the attention layer.
6. The method of claim 5, wherein: the calculation process of the MultiHead Attention mechanism comprises the following steps:
(41) mapping X = [x_1, x_2, …, x_N] and the output H = [h_1, h_2, …, h_N] of the Mogrifier BiGRU neural network into the three matrices K, Q, V;
(42) projecting K, Q, V onto the j-th attention head of the MultiHead Attention mechanism: Q_j = Q W_j^Q, K_j = K W_j^K, V_j = V W_j^V, wherein W_j^Q ∈ R^{d_N×d_q}, W_j^K ∈ R^{d_N×d_k} and W_j^V ∈ R^{d_N×d_v} are three global parameter matrices, d_N denotes the input dimension of the MultiHead Attention mechanism, D denotes the total number of attention heads, and d_k = d_q = d_v = d_N/D;
(43) calculating the j-th attention value head_j = softmax(Q_j K_j^T / √d_k) V_j;
(44) splicing the D attention values to obtain the multi-head attention B = Concat(head_1, …, head_D) W^o, wherein W^o is a weight matrix and the element b_ij in row i and column j of B denotes the weight of word s_i on the j-th attention;
(45) combining the hidden state h_i of word s_i with the attention weights b_ij to generate the content vector c_i = Σ_j b_ij h_j of word s_i;
(46) the important part of the text features captured by the MultiHead Attention mechanism introduced after the Mogrifier BiGRU neural network being C = [c_1, c_2, …, c_N].
7. The method of claim 5, wherein: in step (5), the CRF model serves as the label score layer; the CRF model is used to calculate the label score of each word under each label, the Viterbi algorithm is applied to obtain the label sequence with the highest label score, and the course knowledge-point named entities and the relations between knowledge entities are then obtained through the relation extraction layer.
8. The method of claim 7, wherein: in step (5), the CRF model is used to calculate the label score S^(ner)(s_i, p) of word s_i under label p:
S^(ner)(s_i, p) = [ V^(ner) f(U^(ner) h_i + b^(ner)) ]_p
wherein: the superscript (ner) denotes knowledge entity annotation recognition; V^(ner) and U^(ner) denote weight matrices and b^(ner) denotes a bias vector, V^(ner) ∈ R^{P×l}, U^(ner) ∈ R^{l×2d}, b^(ner) ∈ R^l; l is the layer width of the CRF model and d is the number of hidden units of the Mogrifier BiGRU neural network; f(·) denotes a nonlinear activation function;
labels are assigned to all words in the sentence S to obtain a label sequence of S, each sentence S having R = P^N possible label sequences; the label score of S under the r-th label sequence is
score(S, Y_r) = Σ_{i=1}^{N} ( S^(ner)(s_i, y_{i,r}) + A_{(i,r),(i+1,r)} )
wherein: Y_r denotes the r-th label sequence, Y = [Y_1, Y_2, …, Y_R], r = 1, 2, …, R; y_{i,r} is the label assigned to word s_i in the r-th label sequence and S^(ner)(s_i, y_{i,r}) is the label score of s_i under that label; A_{(i,r),(i+1,r)} denotes the transition score from the label assigned to s_i to the label assigned to s_{i+1} in the r-th label sequence; A denotes the transition matrix, A ∈ R^{(P+2)×(P+2)};
the label scores of the label sequences are normalized to obtain the probability distribution over label sequences:
P(Y_r | S) = exp(score(S, Y_r)) / Σ_{r'=1}^{R} exp(score(S, Y_{r'}))
the Viterbi algorithm is used to obtain the label sequence with the highest label score:
Y* = argmax_{1 ≤ r ≤ R} score(S, Y_r)
wherein: in the highest-scoring label sequence Y*, the label assigned to word s_i is g_i;
the label score layer is trained by minimizing a cross-entropy loss.
9. The method of claim 8, wherein: in step (5), when the relation extraction layer is adopted to extract the relations between course knowledge-point named entities, the relation score between words s_i and s_j under a given relation q is first calculated:
S^(re)(m_j, m_i, q) = V^(re) f(U^(re) m_j + W^(re) m_i + b^(re))
wherein: m_i = [c_i; g_i] and m_j = [c_j; g_j]; the superscript (re) denotes relation recognition; V^(re), U^(re) and W^(re) denote weight matrices and b^(re) denotes a bias vector, V^(re) ∈ R^l, U^(re) ∈ R^{l×(2a+d)}, W^(re) ∈ R^{l×(2a+d)}, b^(re) ∈ R^l; l is the layer width of the relation extraction layer, d is the number of hidden units of the Mogrifier BiGRU neural network, and a is the dimension of the label; f(·) denotes a nonlinear activation function;
the probability that words s_i and s_j hold relation q is then
P(q | s_j, s_i) = σ( S^(re)(m_j, m_i, q) )
the relation extraction layer is trained by minimizing a cross-entropy loss.
10. The method of claim 8, wherein: the cross-entropy loss L_RE in the training of the relation extraction layer is calculated as
L_RE = - Σ_{i=1}^{N} Σ_{j=1}^{N} Σ_{q=1}^{Q} [ y_{i,j,q} log P(q | s_j, s_i) + (1 - y_{i,j,q}) log(1 - P(q | s_j, s_i)) ]
wherein y_{i,j,q} ∈ {0,1} indicates whether relation q holds between words s_i and s_j; the overall objective function is min(L_NER + L_RE), wherein L_NER is the cross-entropy loss in the training of the label score layer.
CN202011252896.4A · Priority/filing date: 2020-11-11 · Education-field-oriented entity and relation combined extraction method · Pending

Priority Applications (1)

CN202011252896.4A · Priority date: 2020-11-11 · Filing date: 2020-11-11 · Education-field-oriented entity and relation combined extraction method

Applications Claiming Priority (1)

CN202011252896.4A · Priority date: 2020-11-11 · Filing date: 2020-11-11 · Education-field-oriented entity and relation combined extraction method

Publications (1)

CN112364654A · Publication date: 2021-02-12

Family

ID=74515944

Family Applications (1)

CN202011252896.4A (Pending) · Priority/filing date: 2020-11-11 · Education-field-oriented entity and relation combined extraction method

Country Status (1)

CN: CN112364654A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553385A (en) * 2021-07-08 2021-10-26 北京计算机技术及应用研究所 Relation extraction method of legal elements in judicial documents
CN113553385B (en) * 2021-07-08 2023-08-25 北京计算机技术及应用研究所 Relation extraction method for legal elements in judicial document


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination