CN114239574A - Miner violation knowledge extraction method based on entity and relationship joint learning - Google Patents

Miner violation knowledge extraction method based on entity and relationship joint learning

Publication number
CN114239574A
Authority
CN
China
Prior art keywords
word
layer
input
model
information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111564215.2A
Other languages
Chinese (zh)
Inventor
史新国
刘柯
冯仕民
刘业献
翟勃
谢亚波
王卫龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zibo Mining Group Co ltd
Xuzhou University of Technology
Original Assignee
Zibo Mining Group Co ltd
Xuzhou University of Technology
Application filed by Zibo Mining Group Co ltd and Xuzhou University of Technology
Priority to CN202111564215.2A
Publication of CN114239574A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for extracting knowledge of miner violation behaviors based on entity and relationship joint learning, comprising the following steps: data annotation, in which the entities in an input sentence and the relations among them are identified to obtain triples; preprocessing, in which the training data are segmented into words before model training; projection, in which the input sentences are encoded by three distributed representation models to enrich their semantic information; network model design, in which the nested structure of the input corpus and the latent dependencies between the corpus and the labels are learned; and feature extraction with the text and labels as network input, with the entities classified by a CRF layer and a Softmax layer respectively. The invention learns the two tasks of entity recognition and relation extraction jointly, sharing learned parameters and feature information across the tasks, thereby improving knowledge extraction.

Description

Miner violation knowledge extraction method based on entity and relationship joint learning
Technical Field
The invention relates to the technical field of coal mine exploration and development, in particular to a method for extracting knowledge of violation behaviors of miners based on entity and relationship joint learning.
Background
At present, coal mines store knowledge about miner violation behaviors mainly as unstructured data such as documents. A computer cannot understand this knowledge and therefore cannot use it to identify violations. Integrating such professional data and literature manually is a huge engineering burden.
Named entity recognition is the first step of the knowledge extraction task. It has been studied extensively, but compared with the general domain, data in the miner violation domain contain coal mine geographic information and a large number of proper nouns. The difficulty is that named entities exhibit polysemy (one word with several meanings) and synonymy (several words with one meaning), and that semantic relations exist among different named entities and strongly affect entity recognition. Named entity recognition in the miner violation domain therefore remains a significant challenge. Conventional rule-based and statistical methods require manual feature extraction; although this improves on manual entity extraction, it still consumes much time and labor, and the choice of features sets the upper limit of the model. With the great success of deep network models in natural language processing, much named entity recognition work has begun to recognize entities automatically from unstructured text using deep learning techniques that do not rely on expert-constructed features. However, for strongly domain-specific text such as that of the miner violation domain, different model designs affect recognition performance differently.
In the knowledge extraction task, named entity recognition provides information about the terms in the text, but that information is limited, whereas the relations among entities carry a large amount of knowledge and rich semantic information. Relation extraction therefore usually follows entity recognition in knowledge extraction. Deep learning is now applied to relation extraction. Convolutional neural networks are somewhat weak at extracting sequential features. Recurrent neural networks overcome this weakness by changing how neurons are connected and by using backpropagation through time, but they cannot be parallelized and propagate more slowly than convolutional networks. At present, most relation extraction research builds on the output of named entity recognition, so errors in named entity recognition propagate to the relation extraction task. Conversely, relation information between entities can influence entity recognition, and entity recognition performed in isolation ignores this interaction between the two subtasks.
Most current machine learning research targets one specific task and builds a dedicated model to solve it. Many tasks, however, are not fully independent and share rich associated information. Taking named entity recognition and relation extraction as an example, most researchers treat them as two independent subtasks with separate feature extraction and recognition, ignoring the information shared between them, which easily leads to problems such as error propagation.
Disclosure of Invention
In order to solve the problem of error propagation in the knowledge extraction task, the invention provides a method for extracting the knowledge of the violation behaviors of miners based on entity and relationship joint learning. In order to achieve the technical purpose, the invention adopts the following technical scheme:
A method for extracting knowledge of miner violation behaviors based on entity and relationship joint learning comprises the following steps:
S1: Data annotation: the entities in the input sentence and the relations between them are annotated to obtain triples;
S2: Preprocessing: the training data are segmented into words with jieba before model training;
S3: Projection: to enrich the semantic information of the sentences, the segmented training data are encoded by three distributed representation models;
S4: Network model design, learning the nested structure of the training data and the latent dependencies between that structure and the labels: an enhanced model is provided which, on the basis of the original model, embeds a bidirectional LSTM into the self-attention mechanism so as to better extract the sequential features of the text and the labels. Instead of learning the encoding and decoding of sample and label features separately, the text features are learned with a deep network, and the label sequence is obtained by maximum likelihood;
S5: Feature extraction is performed with the text and labels as network input, and, to explore how well the deep model learns dependency features, the entities are classified with a CRF layer and a Softmax layer respectively.
Preferably, step S3 specifically includes:
S31: Word2Vec is used to map the word segmentation results into word vectors and character vectors, which are trained jointly. To improve the representation of low-frequency words, finer-grained character vectors are introduced into the word representation, and the word vectors and character vectors are jointly trained with an improved Continuous Bag-of-Words (CBOW) model to obtain a new word representation model;
S32: To learn word-level context information and sentence structure information, word vectors are trained with fastText;
S33: To learn the co-occurrence information among words, distributed representations of the words are learned with global word vectors (GloVe);
S34: Extracting relative position information: features are extracted with an attention mechanism; because the attention mechanism cannot distinguish features at different positions, the position encoding of each word is added;
S35: The vectors obtained in steps S31, S32 and S33 are concatenated, and the position encoding from step S34 is added to produce a new projection vector. To avoid the data shift caused by extracting the same information repeatedly, a fully connected layer with a weight matrix is added after the concatenated vector to reduce the input dimension. A dropout layer is added after the fully connected layer; it temporarily drops some neuron nodes with a certain probability, so that a network with a different structure is trained each time.
Preferably, the improved Continuous Bag-of-Words model formula in step S31 is as follows:
[Formula image: BDA0003421670110000021]
where $x_j$ is the output, $w_j$ is the weight, $N_j$ is the number of Chinese characters in the text, $c_k$ is the character encoding, and the coefficient
[Formula image: BDA0003421670110000031]
ensures that word distances computed from the word vectors and from the character vectors are consistent. To simplify the model, character vector information is introduced only into the context part; that is, the final target is predicted from the combined information of the word vectors and the character vectors.
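The way a word vector and its character vectors are combined can be sketched in a few lines of Python. This is only an illustration: the formula images are not reproduced in this text, so the form used here, $x_j = \text{coeff} \cdot (w_j + \frac{1}{N_j}\sum_k c_k)$ with a default coefficient of 1/2, is an assumption based on the symbol definitions, and the function name `compose` is hypothetical.

```python
from typing import List


def compose(word_vec: List[float], char_vecs: List[List[float]],
            coeff: float = 0.5) -> List[float]:
    """Combine a word vector w_j with the mean of its N_j character vectors c_k.

    Assumed form (not confirmed by the source): x_j = coeff * (w_j + mean_k c_k).
    """
    n = len(char_vecs)
    if n == 0:
        raise ValueError("a word needs at least one character vector")
    dim = len(word_vec)
    # Average the character vectors dimension by dimension.
    mean_char = [sum(c[d] for c in char_vecs) / n for d in range(dim)]
    # Scale the sum so the result stays on the same scale as a plain word vector.
    return [coeff * (word_vec[d] + mean_char[d]) for d in range(dim)]
```

For a two-character word, `compose([1.0, 2.0], [[1.0, 0.0], [3.0, 2.0]])` averages the two character vectors and blends the result with the word vector.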
Preferably, the enhanced model in step S4 specifically includes:
S41: Attention layer based on bidirectional LSTM: the bidirectional LSTM concatenates the results of a forward LSTM and a backward LSTM and can make effective use of the context of a text sequence. Combining the attention mechanism with the bidirectional LSTM effectively compensates for the weakness of the attention mechanism in extracting sequential features. The LSTM-based attention layer is computed as follows:
$$\alpha_{ki} = \frac{\exp(e_{ki})}{\sum_{j=1}^{T}\exp(e_{kj})}$$
$$e_{ki} = v\tanh(Wh_k + Uh_i + b)$$
$$C = \sum_{i=1}^{T}\alpha_{ki}h_i$$
$$h_{k'} = H(C, h_k, X')$$
where $T$ is the length of the input sequence; $e_{ki}$ is the attention score of the $i$-th node for the $k$-th node; $W$ and $U$ are weight matrices; $v$ and $b$ are coefficients; $\alpha_{ki}$ is the attention weight of the $i$-th node for the $k$-th node; $h_i$ is the $i$-th vector of the forward hidden layer sequence; $H$ is the transformation matrix; $X'$ is the input; $h_k$ is the $k$-th vector of the backward hidden layer sequence; $C$ is the semantic encoding; and $h_{k'}$ is the final feature vector;
S42: Non-linear mapping layer: the feedforward sublayer of this part consists of two linear layers connected by a ReLU, computed as follows:
$$FFN(X) = ReLU(XW_1)W_2$$
where $W_1 \in \mathbb{R}^{d \times h_f}$ and $W_2 \in \mathbb{R}^{h_f \times d}$ are trainable weight matrices, $d$ and $h_f$ are the numbers of rows and columns of the matrices, $X$ is the input matrix of the feedforward neural network layer, and $FFN(X)$ is the mapping result of the feedforward neural network;
S43: Improving the LSTM with a residual mechanism: in the bidirectional LSTM, a residual mechanism is used to update the hidden layer selectively, which speeds up training.
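The attention computation of S41 can be illustrated with a small pure-Python sketch. It assumes the usual reading of the symbols: scores $e_{ki} = v \cdot \tanh(Wh_k + Uh_i + b)$, softmax normalization into weights $\alpha_{ki}$, and a weighted sum as the semantic encoding $C$. The function names and the plain-list matrix layout are illustrative choices, not part of the source.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def attention_context(h_k: Vector, hs: List[Vector], W: Matrix, U: Matrix,
                      v: Vector, b: Vector) -> Vector:
    """Return C = sum_i alpha_ki * h_i for one query state h_k."""
    wh = matvec(W, h_k)
    scores = []
    for h_i in hs:
        uh = matvec(U, h_i)
        hidden = [math.tanh(p + q + r) for p, q, r in zip(wh, uh, b)]
        scores.append(sum(vi * t for vi, t in zip(v, hidden)))  # e_ki
    # Softmax over the scores (shifted by the maximum for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the hidden states.
    return [sum(a * h[d] for a, h in zip(alphas, hs)) for d in range(len(hs[0]))]
```

With all-zero `W`, `U` and `b`, every score is zero, the weights are uniform, and the context reduces to the mean of the hidden states, which is a convenient sanity check.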
Preferably, classifying the entities with the CRF layer and the Softmax layer in step S5 specifically includes:
S51: Using CRF as the classification layer: the CRF layer works on whole label paths and considers the path probability. First, the conditional probability $p(y_1,\ldots,y_n \mid x_1,\ldots,x_n)$ of the original objective function is assumed to follow an exponential distribution, where $x_i$ and $y_i$ are the inputs and outputs, $i = 1,\ldots,n$. Second, the correlation between outputs is assumed to occur only between two adjacent positions. Given the input sequence $x=\{x_1,x_2,\ldots,x_n\}$ and the tag sequence $y=\{y_1,y_2,\ldots,y_n\}$, the parameter value $\theta$ of the objective function is solved by the maximum likelihood method. In the prediction stage, the model predicts the corresponding label from the hidden state produced by the last layer of the deep attention network;
S52: Using Softmax as the classification layer:
In the model training stage, for a given input $x=\{x_1,x_2,\ldots,x_n\}$, the log-likelihood function of the corresponding tag sequence $y=\{y_1,y_2,\ldots,y_n\}$ is:
$$\log p(y \mid x;\theta) = \sum_{t=1}^{n}\log p(y_t \mid x;\theta)$$
where $\log p(y \mid x;\theta)$ is the log-likelihood function given the input $x$ and the parameter $\theta$, and $y_t$ is the $t$-th label. The training goal is to maximize the log probability of the correct labels given the input sequences of the training set;
In the prediction stage, the model predicts the corresponding label $y_t$ from the hidden layer representation $h_t$ produced by the topmost attention sublayer of the deep attention network, computed as follows:
$$p(y_t \mid x;\theta) = \mathrm{softmax}(W_o h_t)^{T}\delta_{y_t}$$
where $h_t$ is the hidden layer representation, $W_o$ is the weight, and $\delta_{y_t}$ is the Dirac delta function (a one-hot indicator of label $y_t$).
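To make the difference between the two classification layers concrete, the following sketch decodes the best label path for a linear-chain CRF with the Viterbi algorithm. It assumes the model supplies per-position emission scores and a tag-to-tag transition matrix, which is the standard setup when dependencies are restricted to adjacent positions; the names `viterbi`, `emissions` and `transitions` are illustrative and do not come from the source.

```python
from typing import List


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag path for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j (adjacent positions only).
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])
    back: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for ending at tag j in position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        back.append(pointers)
    # Trace back from the best final tag.
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

In the test case used below, a per-position argmax (the Softmax-style decision) would pick tags `[0, 1, 1]`, while the path-level decision penalizes the tag switch and returns `[1, 1, 1]`.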
Compared with the prior art, the invention has the following beneficial effects:
In the method for extracting knowledge of miner violation behaviors based on entity and relationship joint learning provided by the invention, the original words are first projected into real-valued vectors and fed to the next layer. A deep multi-head self-attention neural network is then designed, which takes the embedding matrix as input to capture the nested structure of sentences and the dependencies between labels. Finally, a classification layer classifies the entities and their relations. The invention provides a knowledge extraction model for joint learning of entities and entity relations, suited to the joint extraction of entities and relations of miner violation behaviors. Joint learning trains the two tasks of entity recognition and relation extraction in one model, sharing learned parameters and feature information across the tasks, which improves knowledge extraction. The method can automatically extract the entities and relations related to miner violation behaviors and express them in a computer-understandable form, thereby solving the problem that knowledge in the miner violation domain cannot be represented and used.
Drawings
For a clearer explanation of the embodiments or technical solutions of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for a person skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a method for extracting knowledge of violation behaviors of miners based on entity and relationship joint learning;
FIG. 2 is a schematic diagram of a data annotation method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A method for extracting knowledge of violation behaviors of miners based on entity and relationship joint learning comprises the following steps:
S1: The miner violation knowledge extraction task is treated as a sequence labeling task, and extraction is performed end to end. That is, given a sentence, the goal of joint learning is to identify all entities and relations in the sentence and to classify the entities and the relations between them semantically, thereby obtaining triples. As shown in FIG. 2, for the input sentence "Hebei Hai Min Industrial and mining group Limited company sits in Standard parts scattered throughout the country-Handan", the labeled result is {Hebei Hai Min Industrial and mining group Co., Ltd., belongs to (SY), Hebei-Handan}. In the labels, B (Begin) marks the beginning of an entity; M (Intermediate) marks the middle; E (End) marks the end; S (Single) marks a single-character entity; and O (Other) marks irrelevant characters. The label "1" denotes the subject of a relation, and the label "2" denotes the object. "Hebei Hai Min Industrial and mining group Co., Ltd." and "Hebei-Handan" are in the same relation, "belongs to (SY)".
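The labeling scheme described above can be sketched as a small Python helper that turns entity spans into per-character tags. The tag string format `position-relation-role` (for example `B-SY-1`) is an assumption made for illustration, since FIG. 2 is not reproduced here; the function names are likewise hypothetical.

```python
from typing import List, Tuple


def tag_span(length: int) -> List[str]:
    """Position tags for an entity of `length` characters: B/M/E, or S for one character."""
    if length == 1:
        return ["S"]
    return ["B"] + ["M"] * (length - 2) + ["E"]


def annotate(chars: List[str], spans: List[Tuple[int, int, str, int]]) -> List[str]:
    """Label every character; each span is (start, end_exclusive, relation, role).

    Role 1 marks the subject of the relation, role 2 the object.
    Characters outside every span keep the tag "O".
    """
    tags = ["O"] * len(chars)
    for start, end, relation, role in spans:
        for offset, pos in enumerate(tag_span(end - start)):
            tags[start + offset] = f"{pos}-{relation}-{role}"
    return tags
```

For a six-character sentence whose first three characters are the subject and last two the object of an SY relation, `annotate(list("ABCxDE"), [(0, 3, "SY", 1), (4, 6, "SY", 2)])` yields one tag per character, with `O` for the unrelated fourth character.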
S2: Preprocessing: the training data are segmented into words with jieba before model training.
S3: Projection: to enrich the semantic information of the sentences, the input sentences are encoded by three distributed representation models, specifically as follows:
S31: Word2Vec is used to map the word segmentation results into word vectors and character vectors, which are trained jointly. To improve the representation of low-frequency words, finer-grained character vectors are introduced into the word representation, and the word vectors and character vectors are jointly trained with an improved Continuous Bag-of-Words (CBOW) model to obtain a new word representation model. The improved Continuous Bag-of-Words model formula is as follows:
[Formula image: BDA0003421670110000051]
where $x_j$ is the output, $w_j$ is the weight, $N_j$ is the number of Chinese characters in the text, $c_k$ is the character encoding, and the coefficient
[Formula image: BDA0003421670110000052]
ensures that word distances computed from the word vectors and from the character vectors are consistent. To simplify the model, character vector information is introduced only into the context part; that is, the final target is predicted from the combined information of the word vectors and the character vectors.
S32: To learn word-level context information and sentence structure information, word vectors are trained with fastText;
S33: To learn the co-occurrence information among words, distributed representations of the words are learned with global word vectors (GloVe);
S34: Extracting relative position information: features are extracted with an attention mechanism; because the attention mechanism cannot distinguish features at different positions, the position encoding of each word is added;
S35: The vectors obtained in steps S31, S32 and S33 are concatenated, and the position encoding from step S34 is added to produce a new projection vector. To avoid the data shift caused by extracting the same information repeatedly, a fully connected layer with a weight matrix is added after the concatenated vector to reduce the input dimension. A dropout layer is added after the fully connected layer; it temporarily drops some neuron nodes with a certain probability, so that a network with a different structure is trained each time.
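The projection step S35 can be sketched as follows. Several details are assumptions made for illustration only: the position code is taken to be sinusoidal (the text does not say which encoding is used), dropout is written in the common "inverted" form that rescales the kept units, and the function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def positional_encoding(pos: int, dim: int) -> List[float]:
    """Sinusoidal position code (an assumed choice; the source does not name one)."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def project(e1: List[float], e2: List[float], e3: List[float], pos: int,
            weight: List[List[float]], drop_prob: float = 0.0,
            rng: Optional[random.Random] = None) -> List[float]:
    """Concatenate three embeddings, add a position code, then apply a linear layer and dropout."""
    concat = list(e1) + list(e2) + list(e3)
    pe = positional_encoding(pos, len(concat))
    x = [c + p for c, p in zip(concat, pe)]
    # Fully connected layer: weight has shape (out_dim, in_dim) and reduces the dimension.
    out = [sum(w * v for w, v in zip(row, x)) for row in weight]
    if drop_prob > 0.0:
        rng = rng or random.Random()
        keep = 1.0 - drop_prob
        # Inverted dropout: zero some units at random and rescale the survivors.
        out = [o / keep if rng.random() < keep else 0.0 for o in out]
    return out
```

Dropout would be enabled only during training (`drop_prob > 0`); at inference time the default `drop_prob=0.0` leaves the projected vector unchanged.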
S4: Designing a network model that learns the nested structure of the input corpus and the latent dependencies between the corpus and the labels. Self-attention is a special case of the attention mechanism whose input is a single distributed sequence; that is, the information that needs attention can be obtained from the sentence itself, without any additional information. Self-attention has been used successfully in many natural language processing tasks, such as machine translation and text representation. First, the matching score between the current hidden state and each previous hidden state is computed as the attention score of the current hidden unit. Second, the scores are converted into probabilities by a normalizing mapping. Finally, all hidden states before the current state are summed with these weights. On the basis of the original model, the enhanced model embeds a bidirectional LSTM into the self-attention mechanism so as to better extract the sequential features of the text and the labels; it does not learn the encoding and decoding of sample and label features separately, but learns the text features with a deep network and obtains the label sequence by maximum likelihood.
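The three self-attention steps described above (score, normalize, weighted sum) can be sketched in pure Python. The dot product is used here as the matching score purely for illustration; the text does not specify the scoring function at this point, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(current: List[float], previous: List[List[float]]) -> List[float]:
    """Weighted sum of the previous hidden states, weighted by their match with the current one."""
    # Step 1: matching score between the current state and each previous state
    # (dot product, an assumed choice).
    scores = [sum(a * b for a, b in zip(current, h)) for h in previous]
    # Step 2: normalize the scores into probabilities.
    weights = softmax(scores)
    # Step 3: weighted sum of the previous hidden states.
    return [sum(w * h[d] for w, h in zip(weights, previous)) for d in range(len(current))]
```

When all previous states match the current state equally, the weights are uniform and the result is simply their average.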
The enhanced model specifically includes:
S41: Attention layer based on bidirectional LSTM: the bidirectional LSTM concatenates the results of a forward LSTM and a backward LSTM and can make effective use of the context of a text sequence. Combining the attention mechanism with the bidirectional LSTM effectively compensates for the weakness of the attention mechanism in extracting sequential features. The LSTM-based attention layer is computed as follows:
$$\alpha_{ki} = \frac{\exp(e_{ki})}{\sum_{j=1}^{T}\exp(e_{kj})}$$
$$e_{ki} = v\tanh(Wh_k + Uh_i + b)$$
$$C = \sum_{i=1}^{T}\alpha_{ki}h_i$$
$$h_{k'} = H(C, h_k, X')$$
where $T$ is the length of the input sequence; $e_{ki}$ is the attention score of the $i$-th node for the $k$-th node; $W$ and $U$ are weight matrices; $v$ and $b$ are coefficients; $\alpha_{ki}$ is the attention weight of the $i$-th node for the $k$-th node; $h_i$ is the $i$-th vector of the forward hidden layer sequence; $H$ is the transformation matrix; $X'$ is the input; $h_k$ is the $k$-th vector of the backward hidden layer sequence; $C$ is the semantic encoding; and $h_{k'}$ is the final feature vector;
S42: Non-linear mapping layer: the feedforward sublayer of this part consists of two linear layers connected by a ReLU, computed as follows:
$$FFN(X) = ReLU(XW_1)W_2$$
where $W_1 \in \mathbb{R}^{d \times h_f}$ and $W_2 \in \mathbb{R}^{h_f \times d}$ are trainable weight matrices, $d$ and $h_f$ are the numbers of rows and columns of the matrices, $X$ is the input matrix of the feedforward neural network layer, and $FFN(X)$ is the mapping result of the feedforward neural network;
S43: Improving the LSTM with a residual mechanism: in the bidirectional LSTM, a residual mechanism is used to update the hidden layer selectively, which speeds up training.
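The feedforward sublayer of S42 is simple enough to write out directly. The sketch below implements FFN(X) = ReLU(X W1) W2 with plain Python lists; the residual wrapper `ffn_residual` is only an assumed illustration of how a residual connection adds a sublayer's input to its output, and is not the specific residual mechanism of S43.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ffn(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """FFN(X) = ReLU(X W1) W2 with X: (T, d), W1: (d, h_f), W2: (h_f, d)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def ffn_residual(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Assumed residual form: output = X + FFN(X)."""
    y = ffn(x, w1, w2)
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]
```

Because W2 maps back to dimension d, the output has the same shape as the input, which is what makes the residual addition possible.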
S5: Feature extraction is performed with the text and labels as network input, and, to explore how well the deep model learns dependency features, the entities are classified with a CRF layer and a Softmax layer respectively.
Classifying the entities with the CRF layer and the Softmax layer specifically includes:
S51: Using CRF as the classification layer: the CRF layer works on whole label paths and considers the path probability. First, the conditional probability $p(y_1,\ldots,y_n \mid x_1,\ldots,x_n)$ of the original objective function is assumed to follow an exponential distribution, where $x_i$ and $y_i$ are the inputs and outputs, $i = 1,\ldots,n$. Second, the correlation between outputs is assumed to occur only between two adjacent positions. Given the input sequence $x=\{x_1,x_2,\ldots,x_n\}$ and the tag sequence $y=\{y_1,y_2,\ldots,y_n\}$, the parameter value $\theta$ of the objective function is solved by the maximum likelihood method. In the prediction stage, the model predicts the corresponding label from the hidden state produced by the last layer of the deep attention network;
S52: Using Softmax as the classification layer:
In the model training stage, for a given input $x=\{x_1,x_2,\ldots,x_n\}$, the log-likelihood function of the corresponding tag sequence $y=\{y_1,y_2,\ldots,y_n\}$ is:
$$\log p(y \mid x;\theta) = \sum_{t=1}^{n}\log p(y_t \mid x;\theta)$$
where $\log p(y \mid x;\theta)$ is the log-likelihood function given the input $x$ and the parameter $\theta$, and $y_t$ is the $t$-th label. The training goal is to maximize the log probability of the correct labels given the input sequences of the training set;
In the prediction stage, the model predicts the corresponding label $y_t$ from the hidden layer representation $h_t$ produced by the topmost attention sublayer of the deep attention network, computed as follows:
$$p(y_t \mid x;\theta) = \mathrm{softmax}(W_o h_t)^{T}\delta_{y_t}$$
where $h_t$ is the hidden layer representation, $W_o$ is the weight, and $\delta_{y_t}$ is the Dirac delta function (a one-hot indicator of label $y_t$).
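The Softmax classification layer of S52 can be illustrated with a short sketch that computes the sequence log-likelihood and the per-position prediction. It assumes the per-position probabilities are softmax(W_o h_t) and that the sequence log-likelihood is the sum of the per-position log probabilities of the correct labels; function and parameter names are illustrative.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log of the softmax probabilities."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_z for v in logits]


def label_logits(h_t: List[float], w_o: List[List[float]]) -> List[float]:
    """One score per label: W_o h_t, with W_o stored as one row per label."""
    return [sum(w * h for w, h in zip(row, h_t)) for row in w_o]


def sequence_log_likelihood(hidden: List[List[float]], w_o: List[List[float]],
                            labels: List[int]) -> float:
    """Sum over positions t of log softmax(W_o h_t)[y_t]."""
    return sum(log_softmax(label_logits(h_t, w_o))[y_t]
               for h_t, y_t in zip(hidden, labels))


def predict(hidden: List[List[float]], w_o: List[List[float]]) -> List[int]:
    """Pick the most probable label independently at each position."""
    result = []
    for h_t in hidden:
        logits = label_logits(h_t, w_o)
        result.append(max(range(len(logits)), key=lambda j: logits[j]))
    return result
```

Training would maximize `sequence_log_likelihood` over the training set; unlike the CRF layer, `predict` decides each position on its own and ignores transitions between adjacent labels.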
Using the method for extracting knowledge of miner violation behaviors based on entity and relationship joint learning, four main entity types were extracted from a data set in the coal mine safety domain to test model performance. The experimental results are shown in Table 1:
TABLE 1 Entity recognition results
[Table image: BDA0003421670110000081]
In Table 1, PER is person name, ORG is organization name, LOC is area name, and EQU is coal mine equipment name; P is precision, R is recall, and F1-score is the F1 score. P, R and F1-score are calculated as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times P \times R}{P + R}$$
where TP is the number of positive examples in the test set correctly predicted as positive; FP is the number of negative examples in the test set misclassified as positive; and FN is the number of positive examples in the test set misclassified as negative.
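The three metrics can be computed from the raw counts as in the sketch below. It uses the standard definitions, in which false positives are negative examples predicted as positive and false negatives are positive examples predicted as negative; returning 0.0 when a denominator is zero is a convention chosen here, not something the source specifies.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Per-type scores such as those for PER, ORG, LOC and EQU would be obtained by calling the function once per entity type with that type's counts.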
As Table 1 shows, person names are recognized best. Area names, organization names and equipment names are more varied and semantically richer, so their F1-scores are relatively lower. The gap is acceptable, given that person names have the advantage of being short and occupying a single position.
Four relation types were then extracted to test model performance. The experimental results are as follows:
TABLE 2 Relation extraction results
[Table image: BDA0003421670110000085]
In Table 2, SY is the geographic dependency relation; JZ is the dependency relation between a person and an organization; SS is the implementation relation between a worker and an object, such as the relation between the worker and the locomotive in "a worker drives a locomotive over speed"; and ZW is the job relation, an interpersonal relation within an organization. Because a geographic corpus was added to the coal mine safety corpus, the implementation relation and the geographic dependency relation occur frequently, and because the features of their entities are more distinctive, they are recognized better.

Claims (5)

1. A method for extracting knowledge of miner violation behaviors based on entity and relationship joint learning, characterized by comprising the following steps:
S1: Data annotation: the entities in the input sentence and the relations between them are annotated to obtain triples;
S2: Preprocessing: the training data are segmented into words with jieba before model training;
S3: Projection: to enrich the semantic information of the sentences, the segmented training data are encoded by three distributed representation models;
S4: Network model design, learning the nested structure of the training data and the latent dependencies between that structure and the labels: an enhanced model is provided which, on the basis of the original model, embeds a bidirectional LSTM into the self-attention mechanism so as to better extract the sequential features of the text and the labels. Instead of learning the encoding and decoding of sample and label features separately, the text features are learned with a deep network, and the label sequence is obtained by maximum likelihood;
S5: Feature extraction is performed with the text and labels as network input, and, to explore how well the deep model learns dependency features, the entities are classified with a CRF layer and a Softmax layer respectively.
2. The method for extracting knowledge of miner violation behaviors based on joint learning of entities and relations according to claim 1, wherein step S3 specifically comprises:
S31: mapping the word segmentation results with Word2Vec (Word to Vector) to obtain character vectors and word vectors, and training them jointly: to improve the accuracy of low-frequency word representations, finer-grained character vectors are introduced into the word representation, and the character vectors and word vectors are trained jointly with an improved Continuous Bag-of-Words (CBOW) model to obtain a new word representation model;
S32: to learn word-level context information and sentence structure information, training word vectors with fastText;
S33: to learn co-occurrence information among words, performing distributed learning of the words with global word vectors;
S34: extracting relative position information: features are extracted with an attention mechanism, and because the attention mechanism cannot distinguish features at different positions, position encoding information is added for each word;
S35: concatenating the vectors obtained in steps S31, S32 and S33 and adding the position encoding information from step S34 to generate a new projection vector; to avoid data shift caused by repeated extraction of information, a fully connected layer with a weight matrix is added after the concatenated vector to reduce the input dimension; a dropout layer is added after the fully connected layer, temporarily dropping some neuron nodes with a given probability so that a network with a different structure is trained at each step.
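The projection of steps S34 and S35 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the sinusoidal form of the position encoding, the inverted-dropout scaling, the function names and the reading of the "global word vector" as a GloVe-style embedding are not given in the claim.

```python
import math
import random


def positional_encoding(position, dim):
    """Sinusoidal position encoding (assumed form; the claim only says
    that position encoding information is added for each word)."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def project(w2v_vec, fasttext_vec, glove_vec, position, weight,
            drop_prob=0.0, rng=None):
    """Concatenate three embeddings, add position information, then apply
    a fully connected layer and optional dropout (S35)."""
    concat = list(w2v_vec) + list(fasttext_vec) + list(glove_vec)
    pe = positional_encoding(position, len(concat))
    summed = [c + p for c, p in zip(concat, pe)]
    # Fully connected layer: each row of `weight` produces one output unit,
    # so fewer rows than inputs reduces the dimension.
    dense = [sum(w * x for w, x in zip(row, summed)) for row in weight]
    if drop_prob > 0.0:
        rng = rng or random.Random()
        keep = 1.0 - drop_prob
        # Inverted dropout: zero a unit with probability drop_prob and
        # rescale the survivors so the expected activation is unchanged.
        dense = [x / keep if rng.random() < keep else 0.0 for x in dense]
    return dense


weight = [[1, 0, 0, 0, 0, 0],
          [0, 1, 0, 1, 0, 1]]  # hypothetical 2 x 6 weight matrix
base = project([1, 2], [3, 4], [5, 6], position=0, weight=weight)
dropped = project([1, 2], [3, 4], [5, 6], position=0, weight=weight,
                  drop_prob=0.5, rng=random.Random(0))
print(base, dropped)
```

Dropout is applied only during training; at inference time `drop_prob` would be left at 0.0.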
3. The method for extracting knowledge of miner violation behaviors based on joint learning of entities and relations according to claim 2, wherein the improved Continuous Bag-of-Words model in step S31 is given by the following formula:
Figure FDA0003421670100000011
wherein $x_j$ is the output, $w_j$ is the weight, $N_j$ is the number of Chinese characters in the text, and $c_k$ is the encoding of a character; the coefficient
Figure FDA0003421670100000021
ensures that distances computed from word vectors and from character vectors are consistent. To simplify the model, character-vector information is introduced only into the context part; that is, the final target is predicted from the combined word-vector and character-vector information.
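Because the formula of this claim survives only as an image reference, the following sketch shows one common way to combine a word vector with the mean of its character vectors, in the style of character-enhanced word embeddings. The exact form, the coefficient value of 0.5 and the function names are assumptions for illustration and are not taken from the claim.

```python
def compose(word_vec, char_vecs, coeff=0.5):
    """Combine a word vector with the mean of its character vectors.

    `coeff` stands in for the scaling coefficient mentioned in the claim;
    0.5 is an assumed value.
    """
    n = len(char_vecs)
    char_mean = [sum(c[d] for c in char_vecs) / n
                 for d in range(len(word_vec))]
    return [coeff * (w + m) for w, m in zip(word_vec, char_mean)]


def context_vector(context):
    """Average the composed vectors of the context words (CBOW input).

    `context` is a list of (word_vec, char_vecs) pairs; only the context
    side uses character information, as the claim describes.
    """
    composed = [compose(w, cs) for w, cs in context]
    return [sum(v[d] for v in composed) / len(composed)
            for d in range(len(composed[0]))]


x_j = compose([1.0, 2.0], [[1.0, 0.0], [3.0, 2.0]])
ctx = context_vector([([1.0, 2.0], [[1.0, 0.0], [3.0, 2.0]]),
                      ([0.5, 0.5], [[0.5, 0.5]])])
print(x_j, ctx)
```

In a CBOW setup the averaged context vector would then be used to predict the target word, whose own representation stays a plain word vector.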
4. The method for extracting knowledge of miner violation behaviors based on joint learning of entities and relations according to claim 1, wherein the enhanced model of step S4 specifically comprises:
S41: attention layer based on bidirectional LSTM: the bidirectional LSTM concatenates the results of a forward LSTM and a backward LSTM and can make effective use of the context of a text sequence; combining the attention mechanism with the bidirectional LSTM compensates for the attention mechanism's weakness in extracting temporal features; the LSTM-based attention layer is computed as follows:
Figure FDA0003421670100000022
$e_{ki} = v \tanh(W h_k + U h_i + b)$
Figure FDA0003421670100000023
hk'=H(Chk'X')
wherein T is the length of the input sequence; $e_{ki}$ is the attention score of the i-th node for the k-th node; W and U are weight matrices, and v and b are coefficients; $\alpha_{ki}$ is the attention weight of the i-th node for the k-th node; $h_i$ is the i-th vector of the forward hidden-layer sequence; H is the transformation matrix and X' is the input; $h_k$ is the k-th vector of the backward hidden-layer sequence; C is the semantic encoding; and $h_{k'}$ is the final feature vector;
S42: non-linear mapping layer: the feed-forward sublayer consists of two linear layers connected by a ReLU, computed as follows:
$\mathrm{FFN}(X) = \mathrm{ReLU}(X W_1) W_2$
wherein $W_1 \in \mathbb{R}^{d \times h_f}$ and $W_2 \in \mathbb{R}^{h_f \times d}$ are trainable weight matrices, d and $h_f$ denote the numbers of rows and columns of the matrices, X is the input matrix of the feed-forward neural network layer, and FFN(X) is the mapping result of the feed-forward neural network;
S43: improving the LSTM with a residual mechanism: in the bidirectional LSTM, a residual mechanism is used to update the hidden layer selectively, thereby speeding up training.
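The score function of S41 and the feed-forward mapping of S42 can be sketched as follows. The sketch treats v and b as vectors and normalizes the scores with a softmax over all positions; both choices, like the helper names, are assumptions, since the normalization and context formulas are available only as image references.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def attention_weights(h, k, W, U, v, b):
    """Weights alpha_ki of every position i for position k, using
    e_ki = v . tanh(W h_k + U h_i + b) followed by a softmax (assumed)."""
    wk = matvec(W, h[k])
    scores = []
    for h_i in h:
        ui = matvec(U, h_i)
        hidden = [math.tanh(a + c + d) for a, c, d in zip(wk, ui, b)]
        scores.append(sum(x * y for x, y in zip(v, hidden)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context(h, weights):
    """Weighted sum of hidden vectors (assumed form of the encoding C)."""
    return [sum(w * h_i[d] for w, h_i in zip(weights, h))
            for d in range(len(h[0]))]


def ffn(x, W1, W2):
    """FFN(x) = ReLU(x W1) W2 for one row vector x of length d,
    with W1 of shape d x h_f and W2 of shape h_f x d."""
    hidden = [max(0.0, sum(x[i] * W1[i][j] for i in range(len(x))))
              for j in range(len(W1[0]))]
    return [sum(hidden[j] * W2[j][c] for j in range(len(hidden)))
            for c in range(len(W2[0]))]


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy BiLSTM outputs, T = 3
zeros = [[0.0, 0.0], [0.0, 0.0]]
alpha = attention_weights(h, 0, zeros, zeros, [1.0, 1.0], [0.0, 0.0])
c = context(h, alpha)
out = ffn([1.0, -2.0], [[1, 0, 1], [0, 1, 1]], [[1, 0], [0, 1], [1, 1]])
print(alpha, c, out)
```

With all-zero weight matrices every score is equal, so the weights are uniform; trained matrices would concentrate the weights on the positions most relevant to position k.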
5. The method for extracting knowledge of miner violation behaviors based on joint learning of entities and relations according to claim 1, wherein classifying the entities with the CRF layer and the Softmax layer in step S5 specifically comprises:
S51: using a CRF as the classification layer: the CRF layer takes a whole path as its unit and considers the probability of the path; first, the conditional probability $p(y_1, \ldots, y_n \mid x_1, \ldots, x_n)$ of the original objective function is assumed to have an exponential form, wherein $x_i$ and $y_i$ are the input and the output, $i = 1, \ldots, n$; second, the correlation between outputs is assumed to occur only between two adjacent positions; given the input sequence $x = \{x_1, x_2, \ldots, x_n\}$ and the tag sequence $y = \{y_1, y_2, \ldots, y_n\}$, the parameter value θ of the objective function is solved by the maximum likelihood method; in the prediction stage, the model predicts the corresponding label from the hidden state generated by the last layer of the deep attention network;
S52: using Softmax as the classification layer:
in the model training stage, for a given input $x = \{x_1, x_2, \ldots, x_n\}$, the log-likelihood function of the corresponding tag sequence $y = \{y_1, y_2, \ldots, y_n\}$ is:
$\log p(y \mid x; \theta) = \sum_{t=1}^{n} \log p(y_t \mid x; \theta)$
wherein $\log p(y \mid x; \theta)$ is the log-likelihood function given the input x and the parameter θ, and $y_t$ is the t-th label; the training goal is to maximize the log probability of the correct labels given the input sequences of the training set;
in the prediction stage, the model predicts the corresponding label $y_t$ from the hidden-layer representation $h_t$ generated by the topmost attention sublayer of the deep attention network, computed as follows:
$p(y_t \mid x; \theta) = p(y_t \mid h_t; \theta) = \mathrm{softmax}(W_o h_t)^{T} \delta_{y_t}$
wherein $h_t$ is the hidden-layer representation, $W_o$ is the weight matrix, and $\delta_{y_t}$ is a Dirac function.
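The two classification options of this claim can be sketched as follows: a per-position Softmax log-likelihood, and Viterbi decoding for a linear-chain CRF in which only adjacent labels interact. The function names and toy numbers are hypothetical, and the Viterbi routine is a standard decoding procedure shown for illustration; the claim does not state how the best CRF path is found.

```python
import math


def log_softmax(logits):
    """Numerically stable log of the softmax distribution."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_likelihood(hidden, W_o, labels):
    """Sum over t of log p(y_t | h_t), with p = softmax(W_o h_t)."""
    total = 0.0
    for h_t, y_t in zip(hidden, labels):
        logits = [sum(w * x for w, x in zip(row, h_t)) for row in W_o]
        total += log_softmax(logits)[y_t]
    return total


def viterbi(emissions, transitions):
    """Best label path for a linear-chain CRF.

    emissions[t][j] scores label j at position t;
    transitions[i][j] scores moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step, ptr = [], []
        for j in range(n_labels):
            best = max(range(n_labels),
                       key=lambda i: score[i] + transitions[i][j])
            ptr.append(best)
            step.append(score[best] + transitions[best][j] + emit[j])
        score = step
        backpointers.append(ptr)
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ll = sequence_log_likelihood(hidden, [[0.0, 0.0], [0.0, 0.0]], [0, 1, 0])
emissions = [[3, 0], [0, 1], [0, 1]]
free_path = viterbi(emissions, [[0, 0], [0, 0]])
sticky_path = viterbi(emissions, [[0, -5], [-5, 0]])
print(ll, free_path, sticky_path)
```

The two decoded paths differ only because of the transition scores, which is the practical difference between the CRF layer and independent Softmax decisions per position.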
CN202111564215.2A 2021-12-20 2021-12-20 Miner violation knowledge extraction method based on entity and relationship joint learning Pending CN114239574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111564215.2A CN114239574A (en) 2021-12-20 2021-12-20 Miner violation knowledge extraction method based on entity and relationship joint learning

Publications (1)

Publication Number Publication Date
CN114239574A true CN114239574A (en) 2022-03-25

Family

ID=80759457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111564215.2A Pending CN114239574A (en) 2021-12-20 2021-12-20 Miner violation knowledge extraction method based on entity and relationship joint learning

Country Status (1)

Country Link
CN (1) CN114239574A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114781381A (en) * 2022-04-11 2022-07-22 中国航空综合技术研究所 Standard index extraction method based on rule and neural network model fusion
CN115510869A (en) * 2022-05-30 2022-12-23 青海师范大学 End-to-end Tibetan La lattice shallow semantic analysis method
CN117195891A (en) * 2023-11-07 2023-12-08 成都航空职业技术学院 Engineering construction material supply chain management system based on data analysis
CN117610562A (en) * 2024-01-23 2024-02-27 中国科学技术大学 Relation extraction method combining combined category grammar and multi-task learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114781381A (en) * 2022-04-11 2022-07-22 中国航空综合技术研究所 Standard index extraction method based on rule and neural network model fusion
CN114781381B (en) * 2022-04-11 2024-01-09 中国航空综合技术研究所 Standard index extraction method based on rule and neural network model fusion
CN115510869A (en) * 2022-05-30 2022-12-23 青海师范大学 End-to-end Tibetan La lattice shallow semantic analysis method
CN117195891A (en) * 2023-11-07 2023-12-08 成都航空职业技术学院 Engineering construction material supply chain management system based on data analysis
CN117195891B (en) * 2023-11-07 2024-01-23 成都航空职业技术学院 Engineering construction material supply chain management system based on data analysis
CN117610562A (en) * 2024-01-23 2024-02-27 中国科学技术大学 Relation extraction method combining combined category grammar and multi-task learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination