CN114239574A

CN114239574A - Miner violation knowledge extraction method based on entity and relationship joint learning

Info

Publication number: CN114239574A
Application number: CN202111564215.2A
Authority: CN
Inventors: 史新国; 刘柯; 冯仕民; 刘业献; 翟勃; 谢亚波; 王卫龙
Original assignee: Zibo Mining Group Co ltd; Xuzhou University of Technology
Current assignee: Zibo Mining Group Co ltd; Xuzhou University of Technology
Priority date: 2021-12-20
Filing date: 2021-12-20
Publication date: 2022-03-25

Abstract

The invention discloses a method for extracting knowledge of miners' violation behaviors based on joint learning of entities and relations, comprising the following steps: data annotation, i.e., labeling the entities in input sentences and the relations among them to obtain triple results; preprocessing, i.e., performing word segmentation on the training data before model training; projection, i.e., encoding input sentences with three distributed representation models to enrich their semantic information; designing a network model that learns the nested structure of the input corpus and the latent dependencies between the corpus and the labels; and performing feature extraction with the text and labels as network input, then classifying the entities with a CRF layer and with a Softmax layer, respectively. The invention learns the two tasks of entity recognition and relation extraction jointly, sharing learned parameters and feature information between the tasks, thereby improving knowledge extraction.

Description

Miner violation knowledge extraction method based on entity and relationship joint learning

Technical Field

The invention relates to the technical field of coal mine exploration and development, in particular to a method for extracting knowledge of violation behaviors of miners based on entity and relationship joint learning.

Background

At present, coal mines mainly store knowledge of miners' violation behaviors in unstructured form, such as documents, which computers cannot understand; this knowledge therefore cannot be used by computers to identify violations. Manually integrating this professional data and literature is a huge engineering burden.

Named entity recognition is the first step of knowledge extraction. Although there has been much research on named entity recognition, information extraction from data in the miner-violation domain differs from the general domain: it involves coal mine geographic information and a large number of proper nouns. The difficulty is that named entities exhibit polysemy (one word with several meanings) and synonymy (several words with one meaning), and that semantic relations exist between different named entities, which strongly affect entity recognition. Named entity recognition in the miner-violation domain therefore remains a significant challenge. Conventional rule-based and statistical methods require manual feature engineering; although an improvement over manual entity extraction, they still consume a great deal of time and labor, and the choice of features caps the model's performance. With the tremendous success of deep neural network models in natural language processing, much named entity recognition research has turned to deep learning techniques that recognize entities in unstructured text automatically, without relying on expert-constructed features. However, for strongly domain-specific text such as that describing miners' violations, different model designs affect recognition performance differently.

In knowledge extraction, named entity recognition provides information about the terms in a text, but that information is limited; the relations among entities contain a large amount of knowledge and rich semantic information, so relation extraction usually follows entity recognition. Deep learning is currently applied to relation extraction. Convolutional neural networks are somewhat weak at extracting sequential (temporal) features. Recurrent neural networks overcome this weakness by changing how neurons are connected and by training with backpropagation through time, but they cannot be parallelized and propagate more slowly than convolutional networks. Moreover, most current relation extraction research builds on the output of named entity recognition, so recognition errors propagate into the relation extraction task; at the same time, relation information between entities could help identify the entities, and this interaction between the two subtasks is ignored when entity recognition is performed on its own.

At present, most machine learning research targets a single specific task and builds a dedicated model to solve it. Yet many tasks are not completely independent and share rich associated information. Taking named entity recognition and relation extraction as an example, most researchers treat them as two independent subtasks with separate feature extraction and recognition, ignoring the information they share and readily causing problems such as error propagation.

Disclosure of Invention

In order to solve the problem of error propagation in the knowledge extraction task, the invention provides a method for extracting the knowledge of the violation behaviors of miners based on entity and relationship joint learning. In order to achieve the technical purpose, the invention adopts the following technical scheme:

A method for extracting knowledge of miners' violation behaviors based on joint learning of entities and relations comprises the following steps:

S1: Data annotation: label the entities in the input sentence and the relations between them to obtain triple results;

S2: Preprocessing: perform jieba word segmentation on the training data before model training;

S3: Projection: to enrich the semantic information of sentences, encode the segmented training data with three distributed representation models;

S4: Design a network model that learns the nested structure of the training data and the latent dependencies between that structure and the labels: an enhanced model is proposed that, on the basis of the original model, embeds a bidirectional LSTM in the self-attention mechanism to better extract the sequential features of text and labels; instead of learning the encoding and decoding of sample and label features separately, it learns text features with a deep network and obtains the sequence labels by maximum likelihood;

S5: Perform feature extraction with the text and labels as network input and, to explore how well the deep model learns dependent features, classify the entities with a CRF layer and with a Softmax layer, respectively.

Preferably, step S3 specifically includes:

S31: Use Word2Vec to map the word segmentation results to character vectors and word vectors and train them jointly. To improve the representation of low-frequency words, finer-grained character vectors are introduced into the word representation, and the character and word vectors are trained jointly with an improved Continuous Bag-of-Words (CBOW) model to obtain a new word representation model;

S32: To learn word-level context information and sentence structure information, word vectors are trained with FastText;

S33: To learn co-occurrence information among words, distributed word representations are learned with GloVe (global word vectors);

S34: Extract relative position information: features are extracted with an attention mechanism, which by itself cannot distinguish features at different positions, so position encoding information is added for each word;

S35: Concatenate the vectors obtained in steps S31, S32, and S33 and add the position encoding from step S34 to generate a new projection vector. To avoid data shift caused by repeatedly extracted information, a fully connected layer with a weight matrix is added after the concatenated vector to reduce the input dimension. A dropout layer follows the fully connected layer and temporarily drops neurons with a certain probability, so that a network with a different structure is trained each time.
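A minimal NumPy sketch of the S35 projection step is given below. It assumes the three embeddings (CBOW/Word2Vec, FastText, GloVe) have already been looked up as per-token matrices, and it uses Transformer-style sinusoidal position codes, which is an assumption: the text only says that position encoding information is added. The function names `sinusoidal_position_encoding` and `project`, the dropout rate, and all sizes are hypothetical.

```python
import numpy as np

def sinusoidal_position_encoding(seq_len, dim):
    """Sinusoidal position codes (assumed scheme): sin on even dims, cos on odd dims."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def project(w2v, fasttext, glove, w_fc, b_fc, drop_p=0.1, training=True, rng=None):
    """Concatenate three per-token embeddings, add position codes, reduce the
    dimension with a fully connected layer, then apply (inverted) dropout."""
    x = np.concatenate([w2v, fasttext, glove], axis=-1)     # (n, d1 + d2 + d3)
    x = x + sinusoidal_position_encoding(*x.shape)           # add position information
    h = x @ w_fc + b_fc                                       # (n, d_out) dimension reduction
    if training and drop_p > 0:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(h.shape) >= drop_p                  # keep each unit with prob 1 - p
        h = h * keep / (1.0 - drop_p)                         # rescale so expectations match
    return h
```

At inference time (`training=False`) dropout is skipped, so every neuron is used; during training a different random sub-network is sampled on each call, which is the effect the text describes.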

Preferably, the improved Continuous Bag-of-Words formula in step S31 is as follows:

$$x_j = \frac{1}{2}\left(w_j + \frac{1}{N_j}\sum_{k=1}^{N_j} c_k\right)$$

where x_j is the output, w_j is the word vector, N_j is the number of Chinese characters in the word, c_k is the k-th character vector, and the coefficient 1/2 ensures that word distances computed with the character-enhanced vectors are consistent with those computed with plain word vectors. To simplify the model, character information is introduced only on the context side; that is, the target word is predicted from the combined character-and-word information of its context.
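The composition rule above can be sketched in plain Python. Only the 1/2 averaging formula and the "characters on the context side only" rule come from the text; the names `cwe_word_vector`, `cbow_context_vector`, and the lookup tables are hypothetical, and the training objective (e.g. negative sampling) is omitted.

```python
def cwe_word_vector(word_vec, char_vecs):
    """x_j = 1/2 * (w_j + (1/N_j) * sum_k c_k) for one word."""
    n = len(char_vecs)
    if n == 0:
        return list(word_vec)  # no characters to add: fall back to the plain word vector
    dim = len(word_vec)
    char_mean = [sum(c[d] for c in char_vecs) / n for d in range(dim)]
    return [0.5 * (w + c) for w, c in zip(word_vec, char_mean)]

def cbow_context_vector(context_words, word_table, char_table):
    """Average the character-enhanced vectors of the context words; this is the
    hidden vector a CBOW model would use to predict the target word."""
    vecs = [cwe_word_vector(word_table[w], [char_table[ch] for ch in w])
            for w in context_words]
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
```

For example, a two-character word with word vector [1.0, 0.0] and character vectors [0.0, 2.0] and [2.0, 0.0] gets the combined vector [1.0, 0.5].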

Preferably, the step S4 of enhancing the model specifically includes:

S41: Attention layer based on a bidirectional LSTM: a bidirectional LSTM concatenates the outputs of a forward LSTM and a backward LSTM and can effectively use the context of a text sequence. Combining the attention mechanism with a bidirectional LSTM effectively remedies the attention mechanism's weakness in extracting sequential features. The LSTM-based attention layer is computed as follows:

$$e_{ki} = v\tanh(W h_k + U h_i + b)$$

$$\alpha_{ki} = \frac{\exp(e_{ki})}{\sum_{j=1}^{T}\exp(e_{kj})}$$

$$C_k = \sum_{i=1}^{T}\alpha_{ki} h_i$$

$$h_k' = H(C_k, h_k, X')$$

where T is the length of the input sequence; e_ki is the attention score of the i-th node with respect to the k-th node; W and U are weight matrices; v and b are coefficients; α_ki is the attention weight of the i-th node with respect to the k-th node; h_i is the i-th vector of the forward hidden-layer sequence; H is the transformation matrix and X' is the input; h_k is the k-th vector of the backward hidden-layer sequence; C_k is the semantic code; and h_k' is the final feature vector;

S42: Nonlinear mapping layer: the feed-forward sublayer consists of two linear layers connected by a ReLU and is computed as follows:

$$\mathrm{FFN}(X) = \mathrm{ReLU}(XW_1)W_2$$

where $W_1 \in \mathbb{R}^{d \times h_f}$ and $W_2 \in \mathbb{R}^{h_f \times d}$ are trainable weight matrices, d and h_f denoting their row and column dimensions; X is the input matrix of the feed-forward layer; and FFN(X) is the feed-forward mapping result;

S43: Improve the LSTM with a residual mechanism: in the bidirectional LSTM, a residual mechanism is used to update the hidden layer selectively, which speeds up training.
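Sub-steps S41 to S43 can be sketched together in NumPy. Following the definitions above, the queries h_k come from the backward LSTM and the keys h_i from the forward LSTM. The text does not specify the transformation H in h_k' = H(C_k, h_k, X'), so `fuse` assumes concatenation followed by a tanh-activated linear map; the sigmoid gate in `residual_update` is likewise only one assumed reading of "selectively update". All function names are hypothetical and the weights are random placeholders, not trained values.

```python
import numpy as np

def additive_attention(h_query, h_keys, W, U, v, b):
    """e_ki = v . tanh(W h_k + U h_i + b); alpha_ki = softmax over i; C_k = sum_i alpha_ki h_i.
    h_query: backward states (T, d_q); h_keys: forward states (T, d)."""
    # Broadcast over all (k, i) pairs: (T, 1, a) + (1, T, a) -> (T, T, a)
    proj = (h_query @ W.T)[:, None, :] + (h_keys @ U.T)[None, :, :] + b
    e = np.tanh(proj) @ v                       # (T, T) attention scores e_ki
    e = e - e.max(axis=1, keepdims=True)        # numerically stable softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)    # each row k sums to 1
    return alpha @ h_keys, alpha                # semantic codes C_k and weights

def fuse(context, h_backward, x_in, H):
    """Assumed form of h_k' = H(C_k, h_k, X'): concatenate, then tanh(linear map)."""
    return np.tanh(np.concatenate([context, h_backward, x_in], axis=-1) @ H)

def ffn(X, W1, W2):
    """S42: FFN(X) = ReLU(X W1) W2, with W1 of shape (d, h_f) and W2 of shape (h_f, d)."""
    return np.maximum(X @ W1, 0.0) @ W2

def residual_update(h, sublayer, gate_w=None):
    """S43: add a sub-layer's output back onto its input; an optional sigmoid
    gate (assumption) scales the update per dimension."""
    update = sublayer(h)
    if gate_w is None:
        return h + update
    gate = 1.0 / (1.0 + np.exp(-(h @ gate_w)))
    return h + gate * update

# Usage with random placeholder weights
rng = np.random.default_rng(0)
T, d, a, h_f = 4, 8, 6, 16
h_fwd, h_bwd = rng.normal(size=(T, d)), rng.normal(size=(T, d))
C, alpha = additive_attention(h_bwd, h_fwd, rng.normal(size=(a, d)),
                              rng.normal(size=(a, d)), rng.normal(size=a), np.zeros(a))
x_in = rng.normal(size=(T, 5))
h_new = fuse(C, h_bwd, x_in, rng.normal(size=(2 * d + 5, d)))
W1, W2 = rng.normal(size=(d, h_f)), rng.normal(size=(h_f, d))
out = residual_update(h_new, lambda x: ffn(x, W1, W2))
```

The residual connection keeps the input path intact, so gradients can flow around the sub-layer; this is the usual explanation for the faster training that S43 attributes to it.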

Preferably, the step S5 of classifying the entity using the CRF layer and the Softmax layer specifically includes:

S51: Using a CRF as the classification layer: the CRF layer operates on whole paths and takes path probabilities into account. First, the conditional probability p(y₁,...,y_n|x₁,...,x_n) of the original objective function is assumed to follow an exponential distribution, where x_i and y_i are the input and output, i = 1, …, n. Second, correlations between outputs are assumed to occur only at adjacent positions. Given the input sequence x = {x₁, x₂, ..., x_n} and the tag sequence y = {y₁, y₂, ..., y_n}, the parameter θ of the objective function is solved by maximum likelihood. In the prediction stage, the model predicts the corresponding labels from the hidden states generated by the last layer of the deep attention network;

S52: Using Softmax as the classification layer:

In the model training phase, for a given input x = {x₁, x₂, ..., x_n}, the log-likelihood of the corresponding tag sequence y = {y₁, y₂, ..., y_n} is:

$$\log p(y \mid x;\theta) = \sum_{t=1}^{n}\log p(y_t \mid x;\theta)$$

where log p(y|x;θ) is the log-likelihood given input x and parameters θ, and y_t is the t-th label; the training goal is to maximize the log probability of the correct labels given the input sequences of the training set;

In the prediction stage, the model predicts the label y_t from the hidden representation h_t generated by the topmost attention sublayer of the deep attention network:

$$p(y_t \mid x;\theta) = p(y_t \mid h_t;\theta) = \mathrm{softmax}(W_o h_t)^{\top}\delta_{y_t}$$

where h_t is the hidden representation, W_o is the weight matrix, and δ_{y_t} is a Dirac function.
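Both classification layers can be sketched in NumPy. The CRF part assumes the standard linear-chain parameterisation, with per-position emission scores (taken from the last attention layer) plus a tag-to-tag transition matrix, because the text does not spell out the potentials; it shows the maximum-likelihood objective (via the forward algorithm) and Viterbi decoding. The Softmax part implements the two formulas of S52 directly. All function names are hypothetical.

```python
import numpy as np

# ---- S51: linear-chain CRF (assumed standard parameterisation) ----
def crf_log_likelihood(emissions, transitions, tags):
    """log p(y|x) = score(x, y) - log Z(x). emissions: (n, K) tag scores;
    transitions[i, j]: score of moving from tag i to tag j (adjacent positions only)."""
    n = emissions.shape[0]
    gold = emissions[0, tags[0]] + sum(
        transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]] for t in range(1, n))
    alpha = emissions[0].copy()
    for t in range(1, n):
        # forward algorithm: log-sum-exp over the previous tag
        alpha = np.logaddexp.reduce(alpha[:, None] + transitions + emissions[t][None, :], axis=0)
    return float(gold - np.logaddexp.reduce(alpha))

def viterbi_decode(emissions, transitions):
    """Highest-scoring tag path and its score."""
    n, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((n, K), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + transitions + emissions[t][None, :]  # (prev tag, current tag)
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(score.max())

# ---- S52: Softmax classification layer ----
def label_probs(H, W_o):
    """p(y_t | h_t) = softmax(W_o h_t) for every row h_t of H (n, d); W_o is (K, d)."""
    z = H @ W_o.T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_log_likelihood(H, W_o, y):
    """sum_t log p(y_t | x; theta); the one-hot delta_{y_t} selects the gold label's probability."""
    delta = np.eye(W_o.shape[0])[y]                     # (n, K) one-hot rows
    return float(np.log((label_probs(H, W_o) * delta).sum(axis=1)).sum())

def softmax_predict(H, W_o):
    """Pick the most probable label independently at each position."""
    return label_probs(H, W_o).argmax(axis=1).tolist()
```

The design difference is visible in the code: the Softmax layer labels each position independently, whereas the CRF scores whole paths, so a transition penalty (for example, forbidding an E tag from following O) changes the decoded sequence.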

Compared with the prior art, the invention has the beneficial effects that:

In the method provided by the invention, the original words are projected into real-valued vectors and fed to the next layer. A deep multi-head self-attention neural network then takes the embedding matrix as input to capture the nested structure of sentences and the dependencies between labels. Finally, a classification layer classifies the entities and their relations. The invention provides a knowledge extraction model that jointly learns entities and entity relations and is suited to the joint extraction of entities and relations concerning miners' violation behaviors. A single model learns the two tasks of entity recognition and relation extraction jointly, sharing learned parameters and feature information between the tasks and thereby improving knowledge extraction. The method can automatically extract the entities and relations related to miners' violation behaviors and represent them in a computer-understandable form, solving the problem that knowledge in this domain could not be represented and used.

Drawings

To explain the embodiments or technical solutions of the present invention more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a method for extracting knowledge of violation behaviors of miners based on entity and relationship joint learning;

FIG. 2 is a schematic diagram of a data annotation method.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

S1: The miner violation knowledge extraction task is treated as a sequence labeling task and performed as direct end-to-end extraction: given a sentence, the goal of joint learning is to identify all entities and relations in the sentence and to classify the entities and the relations between them semantically, thereby obtaining triple results. As shown in FIG. 2, for the input sentence "Hebei Hai Min Industrial and mining group Limited company sits in Standard parts scattered throughout the country-Handan", the labeled result {Hebei Hai Min Industrial and Mining Group Co., Ltd., belongs to (SY), Hebei-Handan} is generated. In the labels, B (Begin) marks the beginning of an entity; M (Intermediate) marks the middle; E (End) marks the end; S (Single) marks a single-character entity; and O (Other) marks irrelevant characters. The label "1" marks the subject of a relation and the label "2" marks its object. "Hebei Hai Min Industrial and mining group" and "Hebei-Handan" participate in the same relation, "belongs to (SY)".
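The tagging scheme can be sketched as a small Python function that turns a sentence and its triples into character-level labels. Writing each label as position-relation-role (e.g. `B-SY-1`) is an assumption about how the B/M/E/S, relation, and role parts are combined; `tag_sentence` and the toy sentence in the usage note are hypothetical.

```python
def tag_sentence(sentence, triples):
    """Character-level B/M/E/S + relation + role labels; role '1' = subject,
    role '2' = object; every other character is 'O'."""
    tags = ["O"] * len(sentence)
    for subj, rel, obj in triples:
        for span, role in ((subj, "1"), (obj, "2")):
            start = sentence.find(span)
            if start < 0:
                continue  # entity text not found verbatim: leave its characters as O
            n = len(span)
            for i in range(n):
                if n == 1:
                    pos = "S"            # single-character entity
                elif i == 0:
                    pos = "B"            # beginning
                elif i == n - 1:
                    pos = "E"            # end
                else:
                    pos = "M"            # middle
                tags[start + i] = f"{pos}-{rel}-{role}"
    return tags
```

For a toy sentence "海敏集团位于邯郸" with the triple ("海敏集团", "SY", "邯郸"), the function yields B-SY-1, M-SY-1, M-SY-1, E-SY-1, O, O, B-SY-2, E-SY-2. Because the relation type is folded into each entity label, a single sequence labeler recovers entities and relations at once.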

S2: Preprocessing: perform jieba word segmentation on the training data before model training.

S3: projection: in order to enrich semantic information of sentences, the input sentences are encoded through three distributed models, and the method specifically comprises the following steps:

S31: Use Word2Vec to map the word segmentation results to character vectors and word vectors and train them jointly. To improve the representation of low-frequency words, finer-grained character vectors are introduced into the word representation, and the character and word vectors are trained jointly with an improved Continuous Bag-of-Words (CBOW) model to obtain a new word representation model. The improved CBOW formula is as follows:

$$x_j = \frac{1}{2}\left(w_j + \frac{1}{N_j}\sum_{k=1}^{N_j} c_k\right)$$

The coefficient 1/2 ensures that word distances computed with the character-enhanced vectors are consistent with those computed with plain word vectors. To simplify the model, character information is introduced only on the context side; that is, the target word is predicted from the combined character-and-word information of its context.

S4: Design a network model that learns the nested structure of the input corpus and the latent dependencies between the corpus and the labels. Self-attention is a special case of the attention mechanism whose input is a single distributed sequence: without any additional information, the information that needs attention can still be obtained from the sentence itself. Self-attention has been used successfully in many natural language processing tasks, such as machine translation and text representation. First, the matching score between the current hidden state and each previous hidden state is computed as the attention score of the current hidden unit; second, the scores are converted into probabilities by normalization; finally, all hidden states before the current state are summed, weighted by these probabilities. The enhanced model embeds a bidirectional LSTM in the self-attention mechanism on the basis of the original model to better extract the sequential features of text and labels. Instead of learning the encoding and decoding of sample and label features separately, it learns text features with a deep network and obtains the label sequence by maximum likelihood.
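The three self-attention steps just described (score, normalize, weighted sum over earlier states) can be sketched in NumPy. The dot product as the matching function is an assumption, and the current position is included in the window so that the first step has something to attend to; `causal_self_attention` is a hypothetical name.

```python
import numpy as np

def causal_self_attention(H):
    """H: (n, d) hidden states. For each step t, score h_t against h_1..h_t,
    softmax-normalise the scores, and return the weighted sum of those states."""
    n = H.shape[0]
    scores = H @ H.T                                    # matching degree for every pair (dot product)
    mask = np.tril(np.ones((n, n), dtype=bool))         # keep the current and earlier steps only
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)                            # masked entries become exactly 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ H, weights
```

Each row of `weights` is a probability distribution over the current and preceding steps, and the first output is simply the first hidden state.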

The enhancement model specifically includes:

$$e_{ki} = v\tanh(W h_k + U h_i + b)$$

$$h_k' = H(C_k, h_k, X')$$

$$\mathrm{FFN}(X) = \mathrm{ReLU}(XW_1)W_2$$

where $W_1 \in \mathbb{R}^{d \times h_f}$ and $W_2 \in \mathbb{R}^{h_f \times d}$ are trainable weight matrices, d and h_f denoting their row and column dimensions; X is the input matrix of the feed-forward layer; and FFN(X) is the feed-forward mapping result;

The classification of the entities by using the CRF layer and the Softmax layer specifically includes:

S51: Using a CRF as the classification layer: the CRF layer operates on whole paths and takes path probabilities into account. First, the conditional probability p(y₁,...,y_n|x₁,...,x_n) of the original objective function is assumed to follow an exponential distribution, where x_i and y_i are the input and output, i = 1, …, n. Second, correlations between outputs are assumed to occur only at adjacent positions. Given the input sequence x = {x₁, x₂, ..., x_n} and the tag sequence y = {y₁, y₂, ..., y_n}, the parameter θ of the objective function is solved by maximum likelihood. In the prediction stage, the model predicts the corresponding labels from the hidden states generated by the last layer of the deep attention network;

S52: Using Softmax as the classification layer:

$$\log p(y \mid x;\theta) = \sum_{t=1}^{n}\log p(y_t \mid x;\theta)$$

where log p(y|x;θ) is the log-likelihood given input x and parameters θ, and y_t is the t-th label; the training goal is to maximize the log probability of the correct labels given the input sequences of the training set;

In the prediction stage, the model predicts the label y_t from the hidden representation h_t generated by the topmost attention sublayer of the deep attention network:

$$p(y_t \mid x;\theta) = p(y_t \mid h_t;\theta) = \mathrm{softmax}(W_o h_t)^{\top}\delta_{y_t}$$

where δ_{y_t} is a Dirac function.

According to the method for extracting the knowledge of the violation behaviors of miners based on the entity and relationship joint learning, four main entity types are extracted from a data set in the field of coal mine safety to test the model performance, and the experimental results are shown in table 1:

TABLE 1 entity identification results

In Table 1, PER denotes person names, ORG organization names, LOC area names, and EQU coal mine equipment names; P is precision, R is recall, and F1-score is the F1 score. P, R, and F1-score are computed as follows:

$$P = \frac{TP}{TP + FP},\qquad R = \frac{TP}{TP + FN},\qquad F1 = \frac{2PR}{P + R}$$

where TP is the number of positive examples in the test set correctly predicted as positive; FP is the number of negative examples in the test set misclassified as positive; and FN is the number of positive examples in the test set misclassified as negative.
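The metric definitions above can be checked with a short Python sketch that counts exact entity matches, where an entity is a (type, start, end) tuple. The tuple format, the exact-match criterion, and the example counts are assumptions for illustration, not the evaluation protocol or the results behind Table 1.

```python
def entity_counts(gold, pred):
    """TP, FP, FN for exact-match entity sets of (type, start, end) tuples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)          # predicted and correct
    fp = len(pred - gold)          # predicted but wrong
    fn = len(gold - pred)          # missed gold entities
    return tp, fp, fn

def prf1(tp, fp, fn):
    """Precision, recall, and F1, returning 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With hypothetical counts TP = 8, FP = 2, FN = 4, this gives P = 0.8, R ≈ 0.667, and F1 = 16/22 ≈ 0.727.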

As Table 1 shows, person names are recognized best. Area, organization, and equipment names are more varied in type and richer in semantics, so their F1-scores are relatively lower; the gap is acceptable, however, given the advantages person names have for recognition, being short and simple in form.

Four relation types were extracted to test model performance; the experimental results are as follows:

TABLE 2 results of relational extraction

In Table 2, SY is the geographic affiliation relation; JZ is the affiliation relation between a person and an organization; SS is the implementation relation between a worker and the object of the worker's action, such as the relation between the worker and the locomotive in "over-speed locomotive for a worker"; and ZW is the job relation, i.e., the interpersonal relation within an organization. Because geographic corpora were added to the coal mine safety corpus, implementation relations and geographic affiliation relations occur frequently, and their entity features are more distinctive, so they are recognized better.

Claims

1. A method for extracting knowledge of violation behaviors of miners based on entity and relationship joint learning is characterized by comprising the following steps:

2. The method for extracting knowledge of violation behaviors of miners based on entity and relationship joint learning according to claim 1, wherein step S3 specifically comprises:

3. The method for extracting knowledge of violation behaviors of miners based on entity and relationship joint learning according to claim 2, wherein the improved Continuous Bag-of-Words formula in step S31 is as follows:

4. The method for extracting knowledge of violation behaviors of miners based on entity and relationship joint learning according to claim 1, wherein the step S4 of enhancing the model specifically comprises:

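$$e_{ki} = v\tanh(W h_k + U h_i + b)$$

$$\alpha_{ki} = \frac{\exp(e_{ki})}{\sum_{j=1}^{T}\exp(e_{kj})}$$

$$C_k = \sum_{i=1}^{T}\alpha_{ki} h_i$$

$$h_k' = H(C_k, h_k, X')$$

where T is the length of the input sequence; e_ki is the attention score of the i-th node with respect to the k-th node; W and U are weight matrices; v and b are coefficients; α_ki is the attention weight of the i-th node with respect to the k-th node; h_i is the i-th vector of the forward hidden-layer sequence; H is the transformation matrix and X' is the input; h_k is the k-th vector of the backward hidden-layer sequence; C_k is the semantic code; and h_k' is the final feature vector;

$$\mathrm{FFN}(X) = \mathrm{ReLU}(XW_1)W_2$$

where $W_1 \in \mathbb{R}^{d \times h_f}$ and $W_2 \in \mathbb{R}^{h_f \times d}$ are trainable weight matrices.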

5. The method for extracting knowledge of the violation of miners based on entity and relationship joint learning according to claim 1, wherein the step S5 of classifying the entities using the CRF layer and the Softmax layer specifically comprises:

S52: Using Softmax as the classification layer:

$$p(y_t \mid x;\theta) = p(y_t \mid h_t;\theta) = \mathrm{softmax}(W_o h_t)^{\top}\delta_{y_t}$$

where δ_{y_t} is a Dirac function.