CN112800756B - Entity identification method based on PRADO - Google Patents

Entity identification method based on PRADO

Info

Publication number
CN112800756B
CN112800756B (application CN202011334119.4A)
Authority
CN
China
Prior art keywords
gate
projection
output
word
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011334119.4A
Other languages
Chinese (zh)
Other versions
CN112800756A (en)
Inventor
尚凤军
冉淳夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202011334119.4A priority Critical patent/CN112800756B/en
Publication of CN112800756A publication Critical patent/CN112800756A/en
Application granted granted Critical
Publication of CN112800756B publication Critical patent/CN112800756B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention relates to the technical field of computer networks, in particular to a PRADO-based entity identification method, which comprises: acquiring original data and performing word segmentation and labeling processing on it; on the PRADO layer, based on a projection Embedding model, constructing a projection network with locality-sensitive hashing and converting each word in a sentence into a low-dimensional Embedding word vector; extracting the Embedding vector features by using the context-association property of the BiLSTM neural network; assigning different attention weights to the feature vectors acquired by the BiLSTM layer through an attention mechanism; and completing the sequence-labeling task with the CRF. The invention constructs the projection network with the LSH algorithm in order to reduce the word-embedding vector parameters, and at the same time uses an attention mechanism to preserve the relation between the feature vectors and the whole text, eliminating the hidden danger that the LSH algorithm cannot relate well to the context.

Description

Entity identification method based on PRADO
Technical Field
The invention relates to the technical field of computer networks, in particular to an entity identification method based on PRADO.
Background
In recent years, with the continuous development of Internet technology, large amounts of data from all walks of life have appeared on the network. This data has high value, and how to efficiently acquire, store, analyze and apply it is a problem to be studied in the big-data era. The data contains not only structured data that has already been organized, but also large amounts of unstructured and semi-structured data that has not, and natural language processing technology can be used to process and classify it. With the rapid growth of the total amount of Internet information, the traditional semantic network is no longer suitable, and the appearance of the knowledge graph provides a new idea for solving this problem.
Extracting entity relations is an indispensable link in constructing a knowledge graph: the quality of the extracted entities and relations determines the quality of the graph. The technology is used not only in search engines but also in other industries, including medical care, education, securities investment and finance. In general, every field involves relations, and the existence of relations provides a foundation for constructing the knowledge graph, from which the value of the knowledge graph can then be extracted.
An existing entity-relation extraction model such as the Skip-Gram model predicts context word vectors from a selected target word vector: a word in the sequence is first selected as a reference point, and another word near the reference point is then found with a sliding window and used as a label, so that many reference-point/label pairs are obtained and used as the input of the model. However, the vector dimensions trained by these conventional word-vector techniques are large, so the input parameters of the network become extremely numerous and training the model becomes extremely difficult.
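For illustration, a minimal Python sketch of that sliding-window sampling follows; the window size, example tokens and function name are assumptions chosen for illustration and are not taken from the patent:

```python
# Minimal sketch of Skip-Gram training-pair generation with a sliding window.
# Window size, tokens and function name are illustrative assumptions.
def skip_gram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        # every word inside the window around the reference point becomes a label
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skip_gram_pairs(["the", "knowledge", "graph", "needs", "entities"]))
```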
Disclosure of Invention
In order to reduce the size of the parameters in the Embedding stage, and to reduce the number of parameters while ensuring that the word vectors describe the information comprehensively so that model training becomes simpler and lighter, the invention provides a PRADO-based entity identification method which, as shown in FIG. 1, specifically comprises the following steps:
acquiring original data, and performing word segmentation and labeling processing on the original data;
on the PRADO layer, based on a projection Embedding model, constructing a projection network by using locality-sensitive hashing and converting each word in a sentence into a low-dimensional Embedding word vector;
extracting the Embedding vector features by using the context-association property of the BiLSTM neural network;
assigning different attention weights to the feature vectors acquired by the BiLSTM layer through an attention mechanism;
and completing the sequence-labeling task by using the CRF.
Further, the process of converting each word in a sentence into a low-dimensional Embedding word vector includes:
repeatedly carrying out binary hashing on the i-th word to obtain a 2B-bit vector ŵ_i;
using a projection matrix P generated from initial random numbers to project ŵ_i, obtaining a d-dimensional vector ê_i;
applying an activation function to ê_i to obtain the low-dimensional Embedding word vector e_i of the word.
Further, the process of optimizing the projection matrix P generated from initial random numbers includes:
comparing the final output result of the model with the actual value, performing the back-propagation algorithm, and adaptively updating the projection matrix P through gradient checking.
Further, projecting ŵ_i with the projection matrix includes:
ê_i,k = P_k(ŵ_i) = ||ŵ_i|| · cos θ_k,  k = 1, 2, ..., d;
wherein P_k is the projection function, θ_k represents the angle between the vector ŵ_i and the vector P_k, and ê_i is the projection of ŵ_i:
ê_i = (ê_i,1, ê_i,2, ..., ê_i,d).
further, the ith word is a low-dimensional Embedding word list eiExpressed as:
Figure GDA0003544475550000031
wherein, WpA weight parameter for the projection network; b ispIs the bias parameter of the projection network.
Further, assigning different attention weights, through an attention mechanism, to the feature vectors obtained by the projection layer includes:
α_i,t' ≥ 0;
Σ_{t'=1..T_x} α_i,t' = 1;
α_i,t' = exp(e_i,t') / Σ_{τ=1..T_x} exp(e_i,τ);
wherein α_i,t' indicates how much attention the generated result y_i should pay to e_t', i.e. the attention weight factor; e_i,t' is an auxiliary parameter ensuring that the weights sum to 1; y_i is the output result; and T_x is the length of the input sequence.
Further, the Embedding vector features are extracted by using the context-association property of the BiLSTM neural network, i.e. at each time step the data to be deleted is removed, new content is added, the memory cell is updated, and the data of the current time step is output. The BiLSTM neural network comprises a forget gate, an input gate and an output gate, wherein the forget gate selects the information to be discarded or kept in the memory cell, the input gate updates the control factor and the content, and the output gate determines the final output content. The forget gate is expressed as:
Γ_f = σ(W_f[a<t-1>, x<t>, c<t-1>] + b_f);
the input gate is expressed as:
Γ_u = σ(W_u[a<t-1>, x<t>, c<t-1>] + b_u);
c̃<t> = tanh(W_c[a<t-1>, x<t>] + b_c);
c<t> = Γ_u * c̃<t> + Γ_f * c<t-1>;
the output gate is expressed as:
Γ_o = σ(W_o[a<t-1>, x<t>, c<t-1>] + b_o);
a<t> = Γ_o * c<t>;
wherein Γ_f is the factor of the forget gate, W_f is the weight of the forget gate, and b_f is the bias value of the forget gate; a<t-1> is the activation value at the previous time step; c<t-1> is the memory-cell value at the previous time step; Γ_u is the factor of the input gate, W_u is the weight of the input gate, and b_u is the bias value of the input gate; c̃<t> is the content to be newly added; c<t> is the updated memory-cell value; x<t> is the t-th input parameter; Γ_o is the factor of the output gate, W_o is the weight of the output gate, and b_o is the bias value of the output gate; b_c is the bias value corresponding to c̃<t>.
Further, completing the sequence-labeling task by using the CRF includes:
s(X, y) = Σ_i A_{y_{i-1}, y_i} + Σ_i P_{i, y_i};
P(y|x) = (1/Z(x)) · exp( Σ_{i,k} λ_k·t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l·s_l(y_i, x, i) );
Z(x) = Σ_y exp( Σ_{i,k} λ_k·t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l·s_l(y_i, x, i) );
wherein A_{y_{i-1}, y_i} is the transfer matrix, representing the transition probability from label y_{i-1} to label y_i; P_{i, y_i} is the score that the prediction is the y_i-th label; Z(x) is a normalization factor; t_k and s_l are feature functions; and μ_l and λ_k are weight parameters.
In the entity recognition model provided by the invention, the idea of the PRADO algorithm is borrowed at the word-embedding layer, and a projection network is constructed with the LSH algorithm, so that the word-embedding vector parameters are reduced; at the same time, an attention mechanism is used to preserve the relation between the feature vectors and the whole text, eliminating the hidden danger that the LSH algorithm cannot relate well to the context. Then, the strong local association of the network is used in the BiLSTM layer, so that the trained result is better associated both with the whole text and with its local parts. Finally, the sequence-labeling task is completed at the CRF layer, and throughout the model the weight parameters of each layer are continuously adjusted through the back-propagation mechanism.
Drawings
FIG. 1 is a flow chart of a PRADO-based entity identification method of the present invention;
FIG. 2 is a schematic diagram of the PRADO-BiLSTM-CRF model employed in the present invention;
FIG. 3 is a schematic structural diagram of an attention model employed in the present invention;
FIG. 4 is a schematic structural diagram of a BiLSTM model employed in the present invention;
FIG. 5 is a schematic diagram of an LSTM cell unit according to the present invention;
FIG. 6 is a schematic view of a CRF structure according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides a PRADO-based entity identification method which, as shown in FIG. 1, specifically comprises the following steps:
acquiring original data, and performing word segmentation and labeling processing on the original data;
on the PRADO layer, based on a projection Embedding model, constructing a projection network by using locality-sensitive hashing and converting each word in a sentence into a low-dimensional Embedding word vector;
extracting the Embedding vector features by using the context-association property of the BiLSTM neural network;
assigning different attention weights to the feature vectors acquired by the BiLSTM layer through an attention mechanism;
and completing the sequence-labeling task by using the CRF.
As shown in FIG. 2, operations such as word segmentation and labeling are first performed on the original data, which is then fed into the PRADO layer. This layer uses the idea of the projection Embedding model: locality-sensitive hashing (LSH) is used to construct a projection network, each word in a sentence is converted into a low-dimensional Embedding word vector, and the acquired feature vectors are then assigned different attention weights through an attention mechanism, eliminating the defect that the LSH algorithm cannot relate to the whole text. The second layer is the BiLSTM layer, which extracts the Embedding vector features by using the context-association property of the BiLSTM neural network, remedying the defect that the LSH in the first layer cannot fully consider the preceding and following relations. The third layer is the CRF layer, which completes the sequence-labeling task. Next, this embodiment describes in detail the model used in each layer.
(I) PRADO
In the traditional embedding concept, assume the input text has T tokens or words, and W_i represents the i-th word, where i ∈ {0, 1, ..., T-1}. If V is the number of words in the vocabulary, including the out-of-vocabulary token representing all missing words, then each word W_i is mapped to a one-hot vector δ_i over V. In most linguistic neural networks, words are typically mapped to fixed-length d-dimensional vectors using an embedding layer with trainable parameters W ∈ R^{d×V}: e_i = W·δ_i, where e_i ∈ R^d is the word vector. Since most parameters in the network come from the word vectors trained through W, and a word-vector matrix that describes W in detail is desired, the vocabulary V must be very complete, i.e. its dimension must be particularly large; only then do the trained word vectors perform relatively well. However, when the dimension of V is large, the dimension of W is also large, so the number of parameters of the whole neural network becomes extremely large and training the network becomes particularly difficult. Therefore, in the Embedding stage, a projection Embedding mode is proposed for training the word vectors, so as to reduce the network parameters and make network training faster.
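As a rough illustration of why the parameter count explodes, the following comparison contrasts a trainable embedding table W ∈ R^{d×V} with a fixed-width projection matrix; the vocabulary size, embedding dimension and fingerprint width are assumed example values, not figures from the patent:

```python
# Illustrative parameter-count comparison; V, d and B are assumed example values.
V = 200_000   # vocabulary size, including the out-of-vocabulary token
d = 300       # word-vector dimension
B = 64        # half the fingerprint width: each word is hashed to a 2B-bit vector

embedding_table = V * d        # trainable W in R^{d x V}, one column per word
projection_net  = (2 * B) * d  # projection matrix P mapping a 2B-bit hash to d dims

print(f"embedding table: {embedding_table:,} parameters")   # 60,000,000
print(f"projection net : {projection_net:,} parameters")    # 38,400
```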
In the Embedding stage, if the dimension of the trained W is too large, the representation of the word vectors is complete but the parameters trained by the network explode; if the dimension is too small, the description of the word vectors is inaccurate and the network cannot be trained correctly. The mode adopted by PRADO is therefore a compromise: with a projection network, a word does not need to be represented with particular accuracy, as long as the trained word vector can describe the attributes of the word to a certain extent. For example, in entity classification, the specific differences between Chongqing University and Chongqing University of Posts and Telecommunications do not need to be known; it is only necessary to understand that both refer to universities. That is, in some specific fields the exact referent of an entity does not need to be fully known, only the class to which the entity belongs.
In this embodiment, a basic projection model is constructed using locality-sensitive hashing (LSH). The size and precision of word vectors trained by the traditional word2vec method depend mainly on the dimension of the vocabulary, while LSH, as a dimension-reduction technique from clustering algorithms, can control the dimension and sparsity of the word vectors more independently, so that vocabularies that would otherwise require high-dimensional representations can have their dimension controlled within a certain range, reducing the parameters and producing compact embeddings, thereby optimizing the training effect of the whole model. The main steps are as follows:
1. For each word W_i in the input text, iteratively perform binary hashing to obtain the 2B-bit vector ŵ_i; here, assume max(i) = N;
2. Using the projection matrix P generated from initial random numbers (P can be optimally adjusted by the back-propagation mechanism), convert ŵ_i into ê_i as shown in equation (1), obtaining the d-dimensional vector ê_i:
ê_i,k = P_k(ŵ_i) = ||ŵ_i|| · cos θ_k,  k = 1, 2, ..., d   (1)
ê_i = (ê_i,1, ê_i,2, ..., ê_i,d)   (2)
This yields a d-dimensional vector representation, in which each dimension corresponds to the projection of the vector onto P_k, k = 1, 2, ..., d.
3. Use an activation function to obtain e_i, as in equation (3):
e_i = f(W_p ê_i + B_p)   (3)
where W_p and B_p represent the weights and biases of the projection network, respectively. From the above formulas it can be seen that there are N × d parameters in total, which can be mapped into N d-dimensional word-embedding vectors e_i, giving the feature-vector matrix E = (e_1, e_2, ..., e_{n-1}, e_n).
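A minimal Python/NumPy sketch of this projection-Embedding step follows. The overall shape (token → 2B-bit fingerprint → projection by P → activation) follows the steps above, but the concrete hash function (MD5), the ReLU activation and all sizes are assumptions chosen for illustration:

```python
import hashlib
import numpy as np

# Sketch of the projection Embedding: hash each token to a 2B-bit fingerprint,
# project it with a randomly initialised matrix P (which back-propagation would
# later tune), then pass it through an activation. Hash and activation choices
# are assumptions; only the overall structure follows the steps above.
def token_fingerprint(token, B=64):
    digest = hashlib.md5(token.encode("utf-8")).digest()           # 128 bits
    bits = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))[: 2 * B]
    return bits.astype(np.float32) * 2.0 - 1.0                     # {0,1} -> {-1,+1}

def projection_embedding(tokens, P, W_p, B_p):
    # P: (d, 2B) projection matrix; W_p, B_p: weight and bias of the projection net
    B = P.shape[1] // 2
    fingerprints = np.stack([token_fingerprint(t, B) for t in tokens])  # (N, 2B)
    projected = fingerprints @ P.T                                      # (N, d)
    return np.maximum(0.0, projected @ W_p.T + B_p)                     # ReLU assumed

rng = np.random.default_rng(0)
d, B = 32, 64
P   = rng.standard_normal((d, 2 * B)) / np.sqrt(2 * B)
W_p = rng.standard_normal((d, d)) / np.sqrt(d)
B_p = np.zeros(d)
E = projection_embedding(["entity", "recognition", "method"], P, W_p, B_p)
print(E.shape)   # (3, 32): one d-dimensional Embedding vector per token
```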
In this way a feature-vector representation is obtained that is compressed compared with the traditional word-embedding method, and a token no longer needs to be described in great detail with a one-hot vector; at the same time, the coarse granularity of word segmentation keeps the dimensions N and d within a relatively small range, so fewer parameters are input into the neural network and the network runs faster. However, because the LSH algorithm is used, the feature vector obtained by this training can only describe the word within a certain range and cannot relate well to the preceding and following context. Therefore, before the feature vectors obtained at this stage are fed into the BiLSTM model, they need to be processed by an attention mechanism to reduce the disadvantage of the LSH algorithm, as shown in FIG. 3. The specific steps are as follows:
1. Use α_i,t' to indicate how much attention the generated result y_i should pay to e_t', satisfying the following equations:
α_i,t' ≥ 0   (4)
Σ_{t'=1..T_x} α_i,t' = 1   (5)
Here α_i,t' is also expressed as the attention weight of the output y_i; to ensure that the weights sum to 1, softmax is used and an auxiliary parameter e_i,t' is introduced, such that:
α_i,t' = exp(e_i,t') / Σ_{τ=1..T_x} exp(e_i,τ)   (6)
2. The above equation requires computing e_i,t', so a simple neural network model is built and e_i,t' is then computed with a gradient-descent algorithm.
3. The result Y = {y_1, y_2, ..., y_{n-1}, y_n} output in the above steps replaces the feature-vector matrix E = (e_1, e_2, ..., e_{n-1}, e_n), i.e. E := Y, and is taken as the input of the BiLSTM, through which the network better captures the features relating the preceding and following parts of the text and improves the accuracy of the final output.
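A small sketch of that re-weighting step is given below; here the auxiliary scores e_i,t' are produced by a scaled dot product between the projected vectors, which is an assumption made for illustration (the patent describes a small trainable network and gradient descent for this part), while the softmax normalisation guarantees the weights are non-negative and sum to 1:

```python
import numpy as np

# Sketch of the attention re-weighting applied to the projected vectors before
# the BiLSTM. The scoring function (scaled dot product) is an assumption; the
# softmax normalisation makes each row of alpha non-negative and sum to 1.
def attention_reweight(E):
    d = E.shape[1]
    scores = E @ E.T / np.sqrt(d)                  # auxiliary scores e_{i,t'}
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)      # rows of alpha sum to 1
    return alpha @ E                               # Y: context-mixed vectors

E = np.random.default_rng(1).standard_normal((5, 8))   # 5 tokens, d = 8
Y = attention_reweight(E)
print(Y.shape)   # (5, 8); each row of Y now mixes information from the sentence
```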
(II) BiLSTM layer
In the field of natural language processing, the named entity recognition problem is usually expressed as a sequence model, and two problems arise if a standard neural network model is used. First, because sequences differ, each text sequence obtains a matching output after being input into the model, but when another sequence is used, the model's input and output differ from those of the previous sequence and are not equal. Second, due to the particularity of text sequences, the context information of a sequence is related, but an ordinary model cannot relate the context of the sequence. The general neural network therefore has a natural disadvantage in solving sequence problems, and the recurrent neural network (RNN) was proposed to address the sequence-model problem.
In the task of named entity recognition, compared with early rule-matching and machine-learning methods, using a recurrent neural network model can obviously improve the accuracy of entity recognition, but because the model is large, two defects generally exist: (1) in the back-propagation process of the model, because of the sequential nature of the RNN, the many hidden layers and the many weights in each layer, gradient explosion or vanishing occurs particularly easily; (2) for longer sequences, the model is not good at capturing long-term dependencies across the sequence.
To solve the above problems, gate-control units are added to the hidden units of the RNN, i.e. long short-term memory (LSTM), which changes the hidden layer of the RNN so that it can better capture deep connections and alleviate the gradient-vanishing problem. The main role of these gate structures is to control how much information flows through during transmission: in the training phase of the model, the intermediate data grow with the recurrent nature of the RNN, and the gate structures can learn which of these intermediate data are important and need to be retained, and which are relatively unimportant and can be discarded. The LSTM structure has three gates to control and regulate the transmission of information: a forget gate, an input gate and an output gate. However, because the LSTM only remembers the text sequence in a single direction, the BiLSTM model is finally chosen to solve this problem.
The previous layer of the network obtains the word vectors after data preprocessing, using the locality-sensitive hashing (LSH) algorithm in the projection network; because that algorithm cannot sufficiently relate two words that are relatively far apart, the BiLSTM network is needed to better relate the preceding and following words. Next, the word vectors are input into the constructed sequence-processing model.
The forward propagation formula of the LSTM model is as follows:
(1) Forget gate: the forget gate determines which information in the memory cell is discarded and which is kept. Its output value Γ_f lies between 0 and 1: the closer Γ_f is to 0, the more should be discarded; the closer Γ_f is to 1, the more should be kept. The forward-propagation formula of Γ_f is:
Γ_f = σ(W_f[a<t-1>, x<t>, c<t-1>] + b_f)   (7)
(2) Input gate: the input gate determines the newly updated content and consists of two parts, the update control factor and the content to be updated. First is the update factor, i.e. the update gate Γ_u, whose value lies in the range [0, 1]; different update values mean different information is retained, i.e. from 0 to 1 the importance of the information goes from low to high. The formula is:
Γ_u = σ(W_u[a<t-1>, x<t>, c<t-1>] + b_u)   (8)
Second is the content to be newly added, expressed as c̃<t>, with the formula:
c̃<t> = tanh(W_c[a<t-1>, x<t>] + b_c)   (9)
Finally, the memory cell c<t> at time t is obtained by combining the update gate Γ_u and the newly added value c̃<t>; the formula is:
c<t> = Γ_u * c̃<t> + Γ_f * c<t-1>   (10)
(3) Output gate: the output gate determines the final output content; its value lies in the range [0, 1]. The final output is given by the following formulas:
Γ_o = σ(W_o[a<t-1>, x<t>, c<t-1>] + b_o)   (11)
a<t> = Γ_o * c<t>   (12)
The main principle of the network is that at each time step the data to be deleted is removed, new content is then added and the memory cells are updated, and finally the data of the current time step is output. In this layer, the main steps are as follows:
1. Build the BiLSTM model, and input the word-vector matrix E = (e_1, e_2, ..., e_{n-1}, e_n) obtained in the first step into the BiLSTM model;
2. Train the network weights through the back-propagation algorithm;
3. Mitigate overfitting as required, using techniques such as Dropout and L2 regularization;
4. Output the sentence-level feature-vector matrix (y_1, y_2, ..., y_{n-1}, y_n).
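To make the gate equations (7)-(12) concrete, the sketch below implements a single cell step in Python/NumPy (a BiLSTM runs two such recurrences, one over the sequence and one over its reverse, and concatenates the outputs); all parameter shapes and the random initialisation are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One forward step of the gated cell described by equations (7)-(12).
# The concatenation order [a_prev, x_t, c_prev] follows the notation above;
# parameter shapes are illustrative assumptions.
def lstm_step(a_prev, c_prev, x_t, params):
    z = np.concatenate([a_prev, x_t, c_prev])
    gamma_f = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate, eq. (7)
    gamma_u = sigmoid(params["W_u"] @ z + params["b_u"])        # input gate, eq. (8)
    c_tilde = np.tanh(params["W_c"] @ np.concatenate([a_prev, x_t]) + params["b_c"])
    c_t = gamma_u * c_tilde + gamma_f * c_prev                  # memory-cell update
    gamma_o = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate, eq. (11)
    a_t = gamma_o * c_t                                         # output, as in eq. (12)
    return a_t, c_t

h, d_in = 4, 8
rng = np.random.default_rng(2)
params = {
    "W_f": rng.standard_normal((h, 2 * h + d_in)), "b_f": np.zeros(h),
    "W_u": rng.standard_normal((h, 2 * h + d_in)), "b_u": np.zeros(h),
    "W_c": rng.standard_normal((h, h + d_in)),     "b_c": np.zeros(h),
    "W_o": rng.standard_normal((h, 2 * h + d_in)), "b_o": np.zeros(h),
}
a, c = np.zeros(h), np.zeros(h)
a, c = lstm_step(a, c, rng.standard_normal(d_in), params)
print(a.shape, c.shape)   # (4,) (4,)
```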
(III) CRF layer
Generally speaking, the softmax model could be selected directly to obtain the desired result; however, because the sentence-level feature vectors obtained from the BiLSTM model may suffer from label bias, and the traditional softmax model has shortcomings in handling this problem, the CRF model is selected to solve it, so as to obtain the optimal output result over the global sequence; the effect is better than using the BiLSTM model alone or the softmax model directly.
The output vectors Y = {y_1, y_2, ..., y_{n-1}, y_n} of the LSTM in the previous layer are input into this model, and the constraints of the conditional probability distribution are combined with the input-output sequence to obtain the final result and reduce the error of the data. The specific principle is as follows:
First, the output sequence of the BiLSTM layer, Y = {y_1, y_2, ..., y_{n-1}, y_n}, is set as the input sequence of the CRF, X = {x_1, x_2, ..., x_{n-1}, x_n}; then, letting the correct labeling sequence be y = {y_1, y_2, ..., y_{n-1}, y_n}, the conditional probability P(y|x) is constructed. The main formulas are as follows:
s(X, y) = Σ_i A_{y_{i-1}, y_i} + Σ_i P_{i, y_i}   (13)
P(y|x) = (1/Z(x)) · exp( Σ_{i,k} λ_k·t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l·s_l(y_i, x, i) )   (14)
Z(x) = Σ_y exp( Σ_{i,k} λ_k·t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l·s_l(y_i, x, i) )   (15)
where A_{y_{i-1}, y_i} is the transfer matrix, representing the transition probability from label y_{i-1} to label y_i; P_{i, y_i} is the score that the prediction is label y_i; Z(x) is the normalization factor; t_k and s_l are feature functions; and μ_l and λ_k are weight parameters.
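As an illustration of how such a layer is used at prediction time, the sketch below scores label sequences with an emission matrix playing the role of P_{i,y_i} and a transition matrix playing the role of A_{y_{i-1},y_i}, and decodes the best global sequence with the standard Viterbi algorithm; the label set and the random scores are assumptions for illustration, and the training of the feature-function weights λ and μ is not shown:

```python
import numpy as np

# Sketch of CRF decoding on top of the BiLSTM outputs: emissions[i, y] plays the
# role of P_{i, y_i} and trans[y_prev, y] the role of A_{y_{i-1}, y_i}. Both are
# assumed to be already trained; Viterbi finds the globally best label sequence.
def viterbi_decode(emissions, trans):
    n, k = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        total = score[:, None] + trans + emissions[i][None, :]   # (k, k) candidates
        backptr[i] = total.argmax(axis=0)                        # best previous label
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for i in range(n - 1, 0, -1):                                # trace the path back
        best.append(int(backptr[i][best[-1]]))
    return best[::-1]

rng = np.random.default_rng(3)
labels = ["B-ORG", "I-ORG", "O"]                 # assumed label set for illustration
emissions = rng.standard_normal((4, len(labels)))   # one row per token from BiLSTM
trans = rng.standard_normal((len(labels), len(labels)))
print([labels[i] for i in viterbi_decode(emissions, trans)])
```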
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A PRADO-based entity identification method is characterized by comprising the following steps:
acquiring original data, and performing word segmentation and labeling processing on the original data;
on the PRADO layer, based on a projection Embedding model, constructing a projection network by using locality-sensitive hashing and converting each word in a sentence into a low-dimensional Embedding word vector, which comprises the following steps:
repeatedly carrying out binary hashing on the i-th word to obtain a 2B-bit vector ŵ_i;
using a projection matrix P generated from initial random numbers, wherein optimizing the projection matrix P comprises comparing the final output result of the model with the actual value, performing the back-propagation algorithm, and adaptively updating the projection matrix P through gradient checking;
projecting ŵ_i with the projection matrix to obtain a d-dimensional vector ê_i, which comprises:
ê_i,k = P_k(ŵ_i) = ||ŵ_i|| · cos θ_k,  k = 1, 2, ..., d;
ê_i = (ê_i,1, ê_i,2, ..., ê_i,d);
wherein P_k is the projection function, θ_k represents the angle between the vector ŵ_i and the vector P_k, and ê_i is the projection of ŵ_i;
applying an activation function to ê_i to obtain the low-dimensional Embedding word vector e_i of the word, expressed as:
e_i = f(W_p ê_i + B_p);
wherein f is the activation function, W_p is the weight parameter of the projection network, and B_p is the bias parameter of the projection network;
extracting the Embedding vector features by using the context-association property of the BiLSTM neural network;
assigning different attention weights to the feature vectors acquired by the BiLSTM layer through an attention mechanism;
and completing the sequence-labeling task by using the CRF.
2. The PRADO-based entity recognition method of claim 1, wherein the assigning the feature vectors obtained from the projection layer with different attention weights by an attention mechanism method comprises:
α_i,t' ≥ 0;
Σ_{t'=1..T_x} α_i,t' = 1;
α_i,t' = exp(e_i,t') / Σ_{τ=1..T_x} exp(e_i,τ);
wherein α_i,t' indicates how much attention the generated result y_i should pay to e_t', i.e. the attention weight factor; e_i,t' is an auxiliary parameter ensuring that the sum of the weights is 1; y_i is the output result; and T_x is the length of the input sequence.
3. The PRADO-based entity recognition method according to claim 1, wherein the Embedding vector features are extracted by using the context-association property of the BiLSTM neural network, i.e. at each time step the data to be deleted is removed, new content is added, the memory cell is updated, and the data of the current time step is output; the BiLSTM neural network comprises a forget gate, an input gate and an output gate, wherein the forget gate selects the information to be discarded or kept in the memory cell, the input gate updates the control factor and the content, and the output gate determines the final output content; the forget gate is expressed as:
Γ_f = σ(W_f[a<t-1>, x<t>, c<t-1>] + b_f);
the input gate is expressed as:
Γ_u = σ(W_u[a<t-1>, x<t>, c<t-1>] + b_u);
c̃<t> = tanh(W_c[a<t-1>, x<t>] + b_c);
c<t> = Γ_u * c̃<t> + Γ_f * c<t-1>;
the output gate is expressed as:
Γ_o = σ(W_o[a<t-1>, x<t>, c<t-1>] + b_o);
a<t> = Γ_o * c<t>;
wherein Γ_f is the factor of the forget gate, W_f is the weight of the forget gate, and b_f is the bias value of the forget gate; a<t-1> is the activation value at the previous time step; c<t-1> is the memory-cell value at the previous time step; Γ_u is the factor of the input gate, W_u is the weight of the input gate, and b_u is the bias value of the input gate; c̃<t> is the content to be newly added; c<t> is the updated memory-cell value; x<t> is the t-th input parameter; Γ_o is the factor of the output gate, W_o is the weight of the output gate, and b_o is the bias value of the output gate; b_c is the bias value corresponding to c̃<t>.
4. The PRADO-based entity identification method of claim 1, wherein the task of performing sequence labeling by using CRF comprises:
the output sequence of the BiLSTM layer, Y = {y_1, y_2, ..., y_{n-1}, y_n}, is taken as the input sequence of the CRF, X = {x_1, x_2, ..., x_{n-1}, x_n};
letting the correct labeling sequence of the training network be y = {y_1, y_2, ..., y_{n-1}, y_n}, the conditional probability P(y|x) is constructed, which specifically comprises:
s(X, y) = Σ_i A_{y_{i-1}, y_i} + Σ_i P_{i, y_i};
P(y|x) = (1/Z(x)) · exp( Σ_{i,k} λ_k·t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l·s_l(y_i, x, i) );
Z(x) = Σ_y exp( Σ_{i,k} λ_k·t_k(y_{i-1}, y_i, x, i) + Σ_{i,l} μ_l·s_l(y_i, x, i) );
wherein A_{y_{i-1}, y_i} is the transfer matrix, representing the transition probability from label y_{i-1} to label y_i; P_{i, y_i} is the score that the prediction is the y_i-th label; Z(x) is a normalization factor; t_k and s_l are feature functions; and μ_l and λ_k are weight parameters.
CN202011334119.4A 2020-11-25 2020-11-25 Entity identification method based on PRADO Active CN112800756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011334119.4A CN112800756B (en) 2020-11-25 2020-11-25 Entity identification method based on PRADO

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011334119.4A CN112800756B (en) 2020-11-25 2020-11-25 Entity identification method based on PRADO

Publications (2)

Publication Number Publication Date
CN112800756A CN112800756A (en) 2021-05-14
CN112800756B true CN112800756B (en) 2022-05-10

Family

ID=75806276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011334119.4A Active CN112800756B (en) 2020-11-25 2020-11-25 Entity identification method based on PRADO

Country Status (1)

Country Link
CN (1) CN112800756B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3398115A1 (en) * 2016-03-01 2018-11-07 Google LLC Compressed recurrent neural network models
CN107194414A (en) * 2017-04-25 2017-09-22 浙江工业大学 A kind of SVM fast Incremental Learning Algorithms based on local sensitivity Hash
CN110832596A (en) * 2017-10-16 2020-02-21 因美纳有限公司 Deep convolutional neural network training method based on deep learning
CN108628823A (en) * 2018-03-14 2018-10-09 中山大学 In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training
WO2020093761A1 (en) * 2018-11-05 2020-05-14 扬州大学 Entity and relationship joint extraction method oriented to software bug knowledge
CN109902145A (en) * 2019-01-18 2019-06-18 中国科学院信息工程研究所 A kind of entity relationship joint abstracting method and system based on attention mechanism
CN110263332A (en) * 2019-05-28 2019-09-20 华东师范大学 A kind of natural language Relation extraction method neural network based
CN110825845A (en) * 2019-10-23 2020-02-21 中南大学 Hierarchical text classification method based on character and self-attention mechanism and Chinese text classification method
CN111291556A (en) * 2019-12-17 2020-06-16 东华大学 Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN111522965A (en) * 2020-04-22 2020-08-11 重庆邮电大学 Question-answering method and system for entity relationship extraction based on transfer learning
CN111611775A (en) * 2020-05-14 2020-09-01 沈阳东软熙康医疗系统有限公司 Entity identification model generation method, entity identification method, device and equipment
CN111914097A (en) * 2020-07-13 2020-11-10 吉林大学 Entity extraction method and device based on attention mechanism and multi-level feature fusion

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
PRADO: Projection Attention Networks for Document Classification On-Device;Kaliamoorthi Prabhu 等;《Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)》;20191130;5012-5021 *
Quantization and training of neural networks for efficient integer-arithmetic-only inference;Jacob Benoit 等;《Proceedings of the IEEE conference on computer vision and pattern recognition》;20181231;2704-2713 *
The Prediction Model of Saccade Target Based on LSTM-CRF for Chinese Reading;Wan Xiaoming 等;《International Conference on Brain Inspired Cognitive Systems》;20180731;44-53 *
Distributed data mining based on an improved random decision tree algorithm;石红姣;《计算机与数字工程》;20170920;Vol. 45, No. 9;1802-1808 *
Text recognition algorithm based on deep-learning feature fusion with preprocessing techniques;冯玮;《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》;20190115 (No. 01);I138-5329 *

Also Published As

Publication number Publication date
CN112800756A (en) 2021-05-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant