CN111209738A - Multi-task named entity recognition method combining text classification - Google Patents
- Publication number
- CN111209738A (application number CN201911417834.1A)
- Authority
- CN
- China
- Prior art keywords
- task
- layer
- word
- vector
- sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/24 — Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Classification techniques
- G06N3/044 — Computing arrangements based on specific computational models; Neural networks; Architecture; Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Computing arrangements based on specific computational models; Neural networks; Architecture; Combinations of networks
- G06N3/08 — Computing arrangements based on specific computational models; Neural networks; Learning methods
Abstract
The invention discloses a multi-task named entity recognition method combining text classification. The method comprises the following steps: (1) construct a text classifier with a convolutional neural network and measure the similarity of texts; (2) select a suitable threshold and, by comparing the text classification result with the threshold, determine whether the auxiliary-task data set participates in updating the shared-layer parameters; (3) concatenate the character vectors of the text with pre-trained word vectors as input feature vectors; (4) in a sharing layer, model the input feature vector of each word in a sentence with a bidirectional LSTM and learn the features common to all tasks; (5) train each task in turn at the task layer, pass the output of the sharing layer to the bidirectional LSTM neural network in the main-task or auxiliary-task private layer, then label-decode the whole sentence with a linear-chain conditional random field and tag the entities in the sentence. Experiments on data sets in multiple biomedical fields show that the invention can effectively improve named entity recognition in specific fields where corpora are hard to obtain and labeling costs are high.
Description
Technical Field
The invention relates to natural language processing, in particular to a multitask named entity recognition method combining text classification.
Background
Natural Language Processing (NLP) is an interdisciplinary field combining linguistics and computer science. Named Entity Recognition (NER) is a basic task in natural language processing that aims to recognize proper nouns and meaningful quantitative phrases in natural language text and classify them. With the rise of information extraction and big data, named entity recognition has received increasing attention and has become an important component of natural language processing applications such as public opinion analysis, information retrieval, automatic question answering and machine translation. How to identify named entities automatically, accurately and quickly from massive internet text has gradually become a hot problem in academia and industry.
Named entity recognition techniques, which aim to identify entity text and categories in documents of a particular domain (e.g., biomedicine), have become an important component of document classification, retrieval and content analysis in that domain. Taking the biomedical field as an example, while the number of biomedical documents, clinical records, etc. grows rapidly, new biomedical entities and their acronyms and synonyms grow rapidly as well. However, existing learning-based named entity recognition systems rely heavily on labeled data, which is costly to produce; in the biomedical field, labeling additionally requires professional domain knowledge. How to exploit published data sets without manually labeling new ones has therefore become a current research focus.
Neural network models are currently the mainstream technique for recognizing named entities in text, but such learning models typically need a large amount of labeled data for training, and they often perform poorly in the biomedical field due to the lack of training data.
To address this difficulty in the prior art, a multi-task named entity recognition method combining text classification is proposed for specific domains. Although data for a particular domain is often limited, data for related domains usually exists; in the biomedical field, for example, there are disease data sets, drug data sets, species data sets, and the like. The goal of the method is to use such data to help improve the target task. The method is based on the assumption that if two data sets can facilitate each other, or one can facilitate the target task, they should overlap in semantic space. When training the target task, sentences of the auxiliary task that are semantically close to the target task are used for training, while sentences that are not semantically close are excluded. The framework used is multi-task learning: if a sentence of the auxiliary task is semantically close to the target task, both the sharing layer and the task layer are updated; otherwise, only the task layer is updated. Experiments on several data sets in biomedicine and related fields show that the effect of the target task can be effectively improved in most cases.
Disclosure of Invention
The aim of the invention is to use data sets from related fields to improve performance in the target field without additionally labeling new data sets, and to provide a multi-task named entity recognition method combining text classification for a specific field.
The technical scheme adopted by the invention is as follows:
a multitask named entity recognition method combining text classification comprises the following steps:
s1: constructing a text classifier by using a convolutional neural network, and measuring the similarity of texts;
s2: selecting a threshold, and determining whether the auxiliary task data set participates in updating of the shared layer parameters according to the comparison between the text classification result and the threshold;
s3: cascading character vectors of the text and pre-trained word vectors to serve as input feature vectors;
s4: in a sharing layer, modeling an input feature vector of each word in a sentence by using a bidirectional LSTM, and learning common features of each task;
s5: training each task in turn at the task layer, passing the output of the sharing layer to the bidirectional LSTM neural network in the main-task private layer or the auxiliary-task private layer, then label-decoding the whole sentence with a linear-chain conditional random field, and tagging the entities in the sentence.
The steps can be realized in the following way:
In step S1, a text classifier is constructed with a convolutional neural network, and the similarity of texts is measured in the following specific steps:
S11: input each word in a sentence and convert it into a word vector of dimension k through a word embedding module; let x_i ∈ R^k be the word vector of the i-th word in the sentence; if the sentence length is n, the sentence is represented as:
x_{1:n} = [x_1; x_2; …; x_n]  (1)
S12: let the convolution kernel be w ∈ R^{h×k}; a convolution over the window x_{i:i+h-1} yields the feature c_i:
c_i = f(w · x_{i:i+h-1} + b)  (2)
where h × k is the dimension of the convolution kernel and b is the bias;
S13: sliding the kernel over the whole sentence of length n constructs the feature vector:
c = [c_1; c_2; …; c_{n-h+1}]  (3)
S14: use multiple convolution kernels w_1, w_2, …, w_s to perform the above operations respectively, splice the resulting feature representations, input them into a fully connected network and classify with the Softmax function, defined as:
S_i = e^{V_i} / Σ_{j=1}^{M} e^{V_j}  (4)
where V is the input of the Softmax function and V_i is the i-th element of the input vector; S is the output of the Softmax function, its i-th element S_i is the probability that the input sentence belongs to the i-th category, and M is the number of categories.
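The pipeline of equations (1)–(4) can be sketched in a few lines of NumPy. The dimensions, kernel sizes and random weights below are toy values chosen for illustration, not parameters from the invention:

```python
import numpy as np

def softmax(v):
    # Equation (4): S_i = exp(V_i) / sum_j exp(V_j)
    e = np.exp(v - v.max())            # shift for numerical stability
    return e / e.sum()

def conv_features(x, w, b):
    # Equations (2)-(3): slide an h x k kernel over the n x k sentence matrix,
    # using tanh as the nonlinearity f
    n, _ = x.shape
    h = w.shape[0]
    return np.array([np.tanh(np.sum(w * x[i:i + h]) + b)
                     for i in range(n - h + 1)])

rng = np.random.default_rng(0)
sentence = rng.normal(size=(7, 4))             # n=7 words, k=4 dims (toy)
kernels = [rng.normal(size=(h, 4)) for h in (2, 3)]
feats = np.concatenate([conv_features(sentence, w, 0.1) for w in kernels])
W_fc = rng.normal(size=(3, feats.size))        # fully connected layer, M=3 classes
probs = softmax(W_fc @ feats)                  # class probabilities, sum to 1
```

The classifier's output vector `probs` is what later supplies the k_0 score used for threshold gating.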
In step S2, a threshold is selected and, for the data set of each auxiliary task, whether it participates in updating the shared-layer parameters is determined by comparing the text classification result with the threshold, in the following specific steps:
S21: given m data sets, the first data set is set as the main task and the remaining m-1 data sets are auxiliary tasks;
S22: after the text classifier is trained, each sentence passed through the classifier produces one vector; the first component of this vector is denoted k_0, and each data set takes the mean of k_0 over all its sentences as the threshold of that data set;
S23: when the multi-task named entity recognition model is trained, the data of the main task updates the sharing layer by default;
S24: the data of an auxiliary task first passes through the text classifier; if the k_0 output by the classifier is larger than the threshold, both the task layer and the sharing layer are updated, otherwise only the task layer is updated.
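The gating rule of steps S21–S24 amounts to a simple per-sentence decision. A minimal sketch, assuming the k_0 scores have already been produced by the classifier (helper names such as `dataset_threshold` are hypothetical, not from the patent):

```python
def dataset_threshold(k0_scores):
    # S22: the threshold of a dataset is the mean k0 over all its sentences
    return sum(k0_scores) / len(k0_scores)

def update_targets(task, k0, threshold):
    # S23/S24: main-task data always updates the shared layer; auxiliary-task
    # data does so only when its k0 exceeds the dataset threshold.
    if task == "main" or k0 > threshold:
        return ["task_layer", "shared_layer"]
    return ["task_layer"]

aux_scores = [0.9, 0.4, 0.7, 0.2]          # toy k0 values for one auxiliary set
thr = dataset_threshold(aux_scores)        # 0.55
high = update_targets("aux", 0.9, thr)     # semantically close: both layers
low = update_targets("aux", 0.2, thr)      # not close: task layer only
```

In training, "updating a layer" means letting that sentence's gradients flow into the layer's parameters; sentences below the threshold simply detach from the shared encoder.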
In step S3, the character vectors of the text and the pre-trained word vectors are concatenated as input feature vectors in the following steps:
S31: a natural language processing tool is used to split the document into sentences and words, and the sentences, words and labels are counted to form a sentence table, a vocabulary and a label table; the characters in the vocabulary are counted to form a character table;
S32: let C be the character table and d the dimension of each character vector; the character vector matrix is then a d × |C| matrix whose columns are the character vectors;
S33: let t_i ∈ R^d be the vector of the i-th character of the word t; the word is denoted t_{1:l} = [t_1; t_2; …; t_l], where l is the length of the word t;
S34: a convolution kernel w of height h (a d × h matrix) implements the convolution; a bias b is added and a nonlinearity is applied to the convolution result to realize the feature mapping; the i-th element f_t(i) of the mapping f_t is given by formula (6):
f_t(i) = tanh(w · t_{i:i+h-1} + b)  (6)
S35: y_t = max_i f_t(i) is taken as the feature expression of the word t corresponding to the convolution kernel w;
S36: multiple convolution kernels w_1, w_2, …, w_q are applied in the same way, the resulting feature expressions are spliced together and then cascaded with the pre-trained word vector of the word t to form the input feature vector of t.
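Steps S32–S36 describe a character-level CNN with max-over-time pooling, followed by concatenation with a word embedding. A hedged NumPy sketch with toy dimensions (d = 5, kernel height 3, q = 4 kernels, and a random stand-in for the pre-trained word vector):

```python
import numpy as np

def char_feature(chars, kernel, b=0.0):
    # Equation (6) plus max-over-time pooling (S34-S35):
    # y_t = max_i tanh(w . t_{i:i+h-1} + b)
    h = kernel.shape[1]
    l = chars.shape[1]
    scores = [np.tanh(np.sum(kernel * chars[:, i:i + h]) + b)
              for i in range(l - h + 1)]
    return max(scores)

rng = np.random.default_rng(1)
d, l = 5, 6                                  # char-vector dim, word length (toy)
word_chars = rng.normal(size=(d, l))         # columns are character vectors t_1..t_l
kernels = [rng.normal(size=(d, 3)) for _ in range(4)]   # q=4 kernels of height 3
char_vec = np.array([char_feature(word_chars, w) for w in kernels])
word_vec = rng.normal(size=100)              # stand-in for a pre-trained word vector
input_feature = np.concatenate([char_vec, word_vec])    # S36: cascade, length 104
```

Each kernel contributes one pooled scalar, so the character part of the feature has dimension q regardless of word length.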
In step S4, in the sharing layer, the input feature vector of each word in the sentence is modeled with a bidirectional LSTM, and the common features of the tasks are learned in the following specific steps:
S41: define x_t as the input feature vector at time t and h_t as the hidden-layer state vector storing all useful information up to time t; σ is the sigmoid function and * denotes element-wise multiplication; U_i, U_f, U_c, U_o are the weight matrices of the input x_t in the different gates, W_i, W_f, W_c, W_o are the weight matrices of the hidden state h_t, and b_i, b_f, b_c, b_o are bias vectors;
S42: the forget gate at time t is computed as shown in equation (7):
f_t = σ(W_f h_{t-1} + U_f x_t + b_f)  (7)
f_t determines the proportion of the cell state at time t-1 that is forgotten;
S43: the information to be stored in the cell state up to time t is computed as shown in equations (8) and (9):
i_t = σ(W_i h_{t-1} + U_i x_t + b_i)  (8)
C̃_t = tanh(W_c h_{t-1} + U_c x_t + b_c)  (9)
where C̃_t is the candidate vector to be added to the cell state at time t, and i_t determines the proportion of C̃_t that can be stored;
S44: the results of the previous two steps are combined to generate the new cell state, as shown in equation (10):
C_t = f_t * C_{t-1} + i_t * C̃_t  (10)
where C_t is the cell state at time t;
S45: the output at time t is computed as shown in equations (11) and (12):
o_t = σ(W_o h_{t-1} + U_o x_t + b_o)  (11)
h_t = o_t * tanh(C_t)  (12)
where o_t determines the proportion of the cell state that can be output at time t, and h_t is the hidden-layer vector at time t, used as the output information of time t;
S46: the hidden-layer information h_t above stores all past information; a second hidden-layer state g_t is set up in the same way over the reversed sequence to store future information, and the two hidden states are concatenated to form the final output vector.
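Equations (7)–(12) and the bidirectional combination of S46 can be sketched as follows. For brevity the sketch reuses one parameter set for both directions, whereas a real bidirectional LSTM keeps separate forward and backward parameters; all dimensions are toy values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    # Equations (7)-(12); p holds W_*, U_*, b_* for gates i, f, c, o
    f = sigmoid(p["Wf"] @ h_prev + p["Uf"] @ x_t + p["bf"])        # forget gate (7)
    i = sigmoid(p["Wi"] @ h_prev + p["Ui"] @ x_t + p["bi"])        # input gate  (8)
    c_tilde = np.tanh(p["Wc"] @ h_prev + p["Uc"] @ x_t + p["bc"])  # candidate   (9)
    c = f * c_prev + i * c_tilde                                   # cell state (10)
    o = sigmoid(p["Wo"] @ h_prev + p["Uo"] @ x_t + p["bo"])        # output gate (11)
    h = o * np.tanh(c)                                             # hidden state (12)
    return h, c

rng = np.random.default_rng(2)
dim_x, dim_h = 4, 3
p = {f"W{g}": rng.normal(size=(dim_h, dim_h)) for g in "ifco"}
p.update({f"U{g}": rng.normal(size=(dim_h, dim_x)) for g in "ifco"})
p.update({f"b{g}": np.zeros(dim_h) for g in "ifco"})
xs = [rng.normal(size=dim_x) for _ in range(5)]

def run(seq):
    # run the LSTM over a sequence and collect the hidden states
    h, c = np.zeros(dim_h), np.zeros(dim_h)
    out = []
    for x in seq:
        h, c = lstm_step(x, h, c, p)
        out.append(h)
    return out

# S46: forward pass plus a pass over the reversed sequence, then concatenate
fwd = run(xs)
bwd = run(xs[::-1])[::-1]
bi = [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]   # one 2*dim_h vector per word
```

The concatenated vectors `bi` play the role of the sharing layer's output that is later fed to the task-private layers.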
In step S5, each task is trained in turn at the task layer: the output of the sharing layer is passed to the bidirectional LSTM network in the main-task or auxiliary-task private layer, and the whole sentence is then label-decoded with a linear-chain conditional random field to tag the entities in the sentence, as follows:
S51: the output of the sharing layer is passed as input into the bidirectional LSTM private layer of the main task or an auxiliary task, and the output of this private layer is used as the input of the conditional random field;
S52: let z = {z_1, z_2, …, z_n} denote the input sequence of the conditional random field, where n is the length of the input sequence and z_i is the input vector of the i-th word; y = {y_1, y_2, …, y_n} is a label sequence, and Y(z) denotes the set of all possible output label sequences of z;
S53: for a label sequence y, its score is defined as:
s(z, y) = Σ_{i=1}^{n-1} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i}  (13)
where A is the transition score matrix, A_{j,k} is the score of the transition from label j to label k; P is the score matrix output by the previous layer of the network, and P_{j,k} is the score of the k-th label for the j-th word;
S54: for an input sequence z, the probability that its label sequence is y is defined as:
p(y|z) = e^{s(z,y)} / Σ_{y'∈Y(z)} e^{s(z,y')}  (14)
during training, the log-probability of the correct label sequence is maximized;
S55: at final decoding time, the sequence y* with the highest score is searched for as the final output sequence, as shown in equation (15):
y* = argmax_{y'∈Y(z)} s(z, y')  (15)
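For tiny sequences, the score of equation (13), the probability of equation (14) and the argmax of equation (15) can be checked by brute force over all label sequences; a real implementation would use the forward algorithm and Viterbi decoding instead. A sketch with toy sizes:

```python
import itertools
import numpy as np

def score(P, A, y):
    # Equation (13): emission scores P[i, y_i] plus transition scores A[y_i, y_{i+1}]
    s = sum(P[i, t] for i, t in enumerate(y))
    s += sum(A[y[i], y[i + 1]] for i in range(len(y) - 1))
    return s

def decode_brute_force(P, A):
    # Equation (15): y* = argmax over all label sequences (fine for tiny n, M)
    n, M = P.shape
    return max(itertools.product(range(M), repeat=n),
               key=lambda y: score(P, A, y))

rng = np.random.default_rng(3)
n, M = 4, 3                          # sentence length, number of labels (toy)
P = rng.normal(size=(n, M))          # emission scores from the BiLSTM private layer
A = rng.normal(size=(M, M))          # transition score matrix
best = decode_brute_force(P, A)

# Equation (14): normalized probability of the decoded sequence
all_scores = [score(P, A, y) for y in itertools.product(range(M), repeat=n)]
log_Z = np.log(np.sum(np.exp(all_scores)))
prob = np.exp(score(P, A, best) - log_Z)
```

Training maximizes `log(prob)` for the gold sequence, which is why the partition sum over Y(z) appears in equation (14).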
compared with the prior art, the invention has the beneficial effects that:
1. the invention provides a multi-task named entity recognition method for joint text classification in a specific field. Aiming at the problem that a specific field (such as a biomedical field) is lack of labeled data, the method fully utilizes the theoretical knowledge of multi-task learning and explores and utilizes a related field data set to improve the named entity identification accuracy of the target field.
2. The method combines a text classification model to measure the relevance between the related field data and the target task, the related field data with high relevance to the target task participates in the update of the shared layer parameters, and the data with low relevance only participates in the update of the self task layer parameters. Therefore, irrelevant data are prevented from interfering the training of the target task, and the relevant data are effectively utilized to improve the effect of the target task.
Drawings
FIG. 1 is a schematic diagram of a text classification model based on a convolutional neural network;
FIG. 2 is a schematic diagram of a bi-directional LSTM neural network;
FIG. 3 is a block diagram of a method for multi-tasking named entity recognition for federated text classification;
FIG. 4 is a training flow of a method for multi-task named entity recognition with joint text classification.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
The invention mainly realizes a multi-task named entity recognition method for joint text classification in a specific field. Aiming at the problem that a specific field (such as a biomedical field) is lack of labeled data, the method fully utilizes the theoretical knowledge of multi-task learning and explores and utilizes a related field data set to improve the named entity identification accuracy of the target field. The invention adopts the text classification model based on the convolutional neural network shown in figure 1 to measure the relevance of the related field data and the target task. The result of the character feature vector and the word vector after being cascaded is input into the bidirectional LSTM neural network shown in FIG. 2, and then input into the task layer of the main task or the auxiliary task, and the overall framework of the multitask model is shown in FIG. 3.
The invention discloses a multi-task named entity recognition method based on combined text classification, which comprises the following specific steps:
s1: and constructing a text classifier by using a convolutional neural network, and measuring the similarity of the text.
In this embodiment, the sub-steps of specifically implementing S1 are as follows:
S11: input each word in a sentence and convert it into a word vector of dimension k through a word embedding module; let x_i ∈ R^k be the word vector of the i-th word in the sentence; if the sentence length is n, the sentence is represented as:
x_{1:n} = [x_1; x_2; …; x_n]  (1)
S12: let the convolution kernel be w ∈ R^{h×k}; a convolution over the window x_{i:i+h-1} yields the feature c_i:
c_i = f(w · x_{i:i+h-1} + b)  (2)
where h × k is the dimension of the convolution kernel and b is the bias;
S13: sliding the kernel over the whole sentence of length n constructs the feature vector:
c = [c_1; c_2; …; c_{n-h+1}]  (3)
S14: use multiple convolution kernels w_1, w_2, …, w_s to perform the above operations respectively, splice the resulting feature representations, input them into a fully connected network and classify with the Softmax function, defined as:
S_i = e^{V_i} / Σ_{j=1}^{M} e^{V_j}  (4)
where V is the input of the Softmax function and V_i is the i-th element of the input vector; S is the output of the Softmax function, its i-th element S_i is the probability that the input sentence belongs to the i-th category, and M is the number of categories.
S2: and selecting a proper threshold, and determining whether the auxiliary task data set participates in the update of the shared layer parameters according to the comparison between the text classification result and the threshold.
In this embodiment, the sub-steps of specifically implementing S2 are as follows:
S21: given m data sets, the first data set is set as the main task and the remaining m-1 data sets are auxiliary tasks;
S22: after the text classifier is trained, each sentence passed through the classifier produces one vector; the first component of this vector is denoted k_0, and each data set takes the mean of k_0 over all its sentences as the threshold of that data set;
S23: when the multi-task named entity recognition model is trained, the data of the main task updates the sharing layer by default;
S24: the data of an auxiliary task first passes through the text classifier; if the k_0 output by the classifier is larger than the threshold, both the task layer and the sharing layer are updated, otherwise only the task layer is updated.
S3: and cascading character vectors of the text and pre-trained word vectors to serve as input feature vectors.
In this embodiment, the sub-steps of specifically implementing S3 are as follows:
S31: a natural language processing tool is used to split the document into sentences and words, and the sentences, words and labels are counted to form a sentence table, a vocabulary and a label table; the characters in the vocabulary are counted to form a character table;
S32: let C be the character table and d the dimension of each character vector; the character vector matrix is then a d × |C| matrix whose columns are the character vectors;
S33: let t_i ∈ R^d be the vector of the i-th character of the word t; the word is denoted t_{1:l} = [t_1; t_2; …; t_l], where l is the length of the word t;
S34: a convolution kernel w of height h (a d × h matrix) implements the convolution; a bias b is added and a nonlinearity is applied to the convolution result to realize the feature mapping; the i-th element f_t(i) of the mapping f_t is given by formula (6):
f_t(i) = tanh(w · t_{i:i+h-1} + b)  (6)
S35: y_t = max_i f_t(i) is taken as the feature expression of the word t corresponding to the convolution kernel w;
S36: multiple convolution kernels w_1, w_2, …, w_q are applied in the same way, the resulting feature expressions are spliced together and then cascaded with the pre-trained word vector of the word t to form the input feature vector of t.
S4: in the sharing layer, the input feature vector of each word in the sentence is modeled by using bidirectional LSTM, and the common features of all tasks are learned.
In this embodiment, the sub-steps of specifically implementing S4 are as follows:
S41: define x_t as the input feature vector at time t and h_t as the hidden-layer state vector storing all useful information up to time t; σ is the sigmoid function and * denotes element-wise multiplication; U_i, U_f, U_c, U_o are the weight matrices of the input x_t in the different gates, W_i, W_f, W_c, W_o are the weight matrices of the hidden state h_t, and b_i, b_f, b_c, b_o are bias vectors;
S42: the forget gate at time t is computed as shown in equation (7):
f_t = σ(W_f h_{t-1} + U_f x_t + b_f)  (7)
f_t determines the proportion of the cell state at time t-1 that is forgotten;
S43: the information to be stored in the cell state up to time t is computed as shown in equations (8) and (9):
i_t = σ(W_i h_{t-1} + U_i x_t + b_i)  (8)
C̃_t = tanh(W_c h_{t-1} + U_c x_t + b_c)  (9)
where C̃_t is the candidate vector to be added to the cell state at time t, and i_t determines the proportion of C̃_t that can be stored;
S44: the results of the previous two steps are combined to generate the new cell state, as shown in equation (10):
C_t = f_t * C_{t-1} + i_t * C̃_t  (10)
where C_t is the cell state at time t;
S45: the output at time t is computed as shown in equations (11) and (12):
o_t = σ(W_o h_{t-1} + U_o x_t + b_o)  (11)
h_t = o_t * tanh(C_t)  (12)
where o_t determines the proportion of the cell state that can be output at time t, and h_t is the hidden-layer vector at time t, used as the output information of time t;
S46: the hidden-layer information h_t above stores all past information; a second hidden-layer state g_t is set up in the same way over the reversed sequence to store future information, and the two hidden states are concatenated to form the final output vector.
S5: each task is trained in turn at the task layer: the output of the sharing layer is passed to the bidirectional LSTM neural network in the main-task or auxiliary-task private layer, the whole sentence is then label-decoded with a linear-chain conditional random field, and the entities in the sentence are tagged.
In this embodiment, the sub-steps of specifically implementing S5 are as follows:
S51: the output of the sharing layer is passed as input into the bidirectional LSTM private layer of the main task or an auxiliary task, and the output of this private layer is used as the input of the conditional random field;
S52: let z = {z_1, z_2, …, z_n} denote the input sequence of the conditional random field, where n is the length of the input sequence and z_i is the input vector of the i-th word; y = {y_1, y_2, …, y_n} is a label sequence, and Y(z) denotes the set of all possible output label sequences of z;
S53: for a label sequence y, its score is defined as:
s(z, y) = Σ_{i=1}^{n-1} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i}  (13)
where A is the transition score matrix, A_{j,k} is the score of the transition from label j to label k; P is the score matrix output by the previous layer of the network, and P_{j,k} is the score of the k-th label for the j-th word;
S54: for an input sequence z, the probability that its label sequence is y is defined as:
p(y|z) = e^{s(z,y)} / Σ_{y'∈Y(z)} e^{s(z,y')}  (14)
during training, the log-probability of the correct label sequence is maximized;
S55: at final decoding time, the sequence y* with the highest score is searched for as the final output sequence, as shown in equation (15):
y* = argmax_{y'∈Y(z)} s(z, y')  (15)
The method is applied in the following embodiment; the specific steps and parameter definitions are as described above and are not all repeated. The embodiment mainly shows the specific implementation and its technical effects.
Examples
Taking 3 public data sets of the cellular-component group in the biomedical field (BioNLP13CG, BioNLP13PC and CRAFT) as an example, the method is applied to these 3 data sets for named entity recognition; the specific parameters and practice in each step are as follows.
Training the text classifier:
1. Each word in the input sentence is converted into a word vector of dimension 128 by the word embedding module; a sentence of length n is then represented as x_{1:n} = [x_1; x_2; …; x_n];
2. The convolution kernels use three sizes (3, 4 and 5), with 100 kernels of each size, and features are constructed over the sentence of length n; the result is denoted c;
4. All the features are spliced and input into a fully connected network, and classification is performed with the Softmax function, which completes the text classifier. When training the text classifier, the batch size is 64, dropout is 0.5, and the initial learning rate is set to 0.001;
Selecting a proper threshold:
5. For example, the named entity recognition task of BioNLP13CG is taken as the main task and the other two tasks as auxiliary tasks. After the text classifier is trained, each sentence of the BioNLP13PC and CRAFT data sets produces one vector through the classifier; the first component of this vector is denoted k_0. Each of the two data sets takes the mean of k_0 over all its sentences as its threshold;
6. During multi-task model training, the BioNLP13CG data updates the shared layer by default. The BioNLP13PC and CRAFT data first pass through the text classifier; when the k_0 output by the classifier is larger than the corresponding threshold, both the task layer and the sharing layer are updated; otherwise only the task layer is updated;
Extracting character feature vectors of the text and cascading them with the pre-trained word vectors as input feature vectors:
7. A natural language processing tool is used to split the document into sentences and words, and sentences, words and labels are counted to form a sentence table, a vocabulary and a label table. The characters in the vocabulary are counted to form a character table;
8. Let C be the character table and d the dimension of each character vector; the character vector matrix is a d × |C| matrix whose columns are the character vectors;
9. Let t_i ∈ R^d be the vector of the i-th character of the word t; the word is denoted t_{1:l} = [t_1; t_2; …; t_l], where l is the length of the word t;
10. A convolution kernel w of height h (a d × h matrix) implements the convolution; a bias b is added and a nonlinearity is applied to the convolution result to realize the feature mapping; the i-th element f_t(i) of the mapping f_t is given by formula (6). y_t = max_i f_t(i) is taken as the feature expression of the word t corresponding to the convolution kernel w;
11. Multiple convolution kernels w_1, w_2, …, w_q are applied in the same way, the resulting feature expressions are spliced together and then cascaded with the publicly released Stanford GloVe 100-dimensional word vectors (trained on 6 billion tokens) for the word t, giving the input feature vector of t.
At the sharing layer, the input feature vector for each word in the sentence is modeled using bi-directional LSTM:
12. In the sharing layer, the input feature vector obtained in step 11 is passed into a bidirectional LSTM. The bidirectional LSTM network is updated with a batch size of 10 using the Adam optimization algorithm, dropout of 0.5 and an initial learning rate of 0.015; after each iteration the learning rate is decayed as lr = lr_0/(1 + d·e), where the decline rate d is 0.05 and e is the iteration number;
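A quick check of the learning-rate schedule in step 12, assuming the common decay form lr_e = lr_0 / (1 + d·e) with lr_0 = 0.015 and decline rate d = 0.05 (the exact formula is garbled in the source, so this form is a reconstruction):

```python
def decayed_lr(lr0=0.015, d=0.05, epoch=0):
    # lr_e = lr_0 / (1 + d * e): halves the rate after 20 iterations at d = 0.05
    return lr0 / (1.0 + d * epoch)

lr_start = decayed_lr(epoch=0)    # 0.015, the initial rate
lr_ten = decayed_lr(epoch=10)     # 0.015 / 1.5 = 0.01
```

Under this schedule the rate decays smoothly rather than in steps, which suits the relatively small batch size of 10.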
13. Define x_t as the input feature vector at time t and h_t as the hidden-layer state vector storing all useful information up to time t; σ is the sigmoid function and * denotes element-wise multiplication; U_i, U_f, U_c, U_o are the weight matrices of the input x_t in the different gates, W_i, W_f, W_c, W_o are the weight matrices of the hidden state h_t, and b_i, b_f, b_c, b_o are bias vectors;
14. The forget gate at time t is computed as:
f_t = σ(W_f h_{t-1} + U_f x_t + b_f)
15. The information to be stored in the cell state up to time t is computed as:
i_t = σ(W_i h_{t-1} + U_i x_t + b_i)
C̃_t = tanh(W_c h_{t-1} + U_c x_t + b_c)
16. The results of the previous two steps are combined to generate the new cell state:
C_t = f_t * C_{t-1} + i_t * C̃_t
17. The output at time t is computed, and h_t is updated as:
o_t = σ(W_o h_{t-1} + U_o x_t + b_o)
h_t = o_t * tanh(C_t)
where o_t is the output gate at time t and h_t is the hidden-layer vector at time t;
18. The h_t above stores all past information; a g_t is set up in the same way over the reversed sequence to store future information, and the two hidden states are concatenated to form the final output vector.
Training each task in turn at the task level:
19. the output of BioNLP13CG at the shared layer is used as input to be transmitted into the bidirectional LSTM network of the private layer of the main task, and the output of BioNLP13PC and CRAFT at the shared layer is used as input to be transmitted into the bidirectional LSTM networks of the private layers of the auxiliary task 1 and the auxiliary task 2 respectively. Taking the output of the bidirectional LSTM as the input of the conditional random field;
Entity labeling of each word with the conditional random field:
20. Let z = {z_1, z_2, …, z_n} denote the input sequence of the conditional random field, where n is the length of the input sequence and z_i is the input vector of the i-th word; y = {y_1, y_2, …, y_n} is a label sequence, and Y(z) denotes the set of all possible output label sequences of z;
21. For a label sequence y, its score is defined as:
s(z, y) = Σ_{i=1}^{n-1} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i}
where A is the transition score matrix, A_{j,k} is the score of the transition from label j to label k; P is the score matrix output by the previous layer of the network, and P_{j,k} is the score of the k-th label for the j-th word.
22. For an input sequence z, the probability that its label sequence is y is defined as:
p(y|z) = e^{s(z,y)} / Σ_{y'∈Y(z)} e^{s(z,y')}
during training, we maximize the log-probability of the correct label sequence;
23. At final decoding time, the sequence y* with the highest score is searched for as the final output sequence:
y* = argmax_{y'∈Y(z)} s(z, y')
24. The labeled words are located at their positions in the original document and the labeling results are fed back to the user, so that the labeling accuracy can be computed. The following results were obtained:
Data set | Single task | BioNLP13CG | BioNLP13PC | CRAFT |
---|---|---|---|---|
BioNLP13CG | 74.72 | 77.11 | 77.65 | 69.16 |
BioNLP13PC | 88.17 | 78.16 | 89.12 | 77.23 |
CRAFT | 64.24 | 61.53 | 62.31 | 64.72 |
The "Single task" column gives the accuracy of each of the 3 data sets when trained as an independent named entity recognition task. The "BioNLP13CG" column gives the accuracy when BioNLP13CG is the main task and the other two data sets serve as auxiliary tasks; the "BioNLP13PC" and "CRAFT" columns are defined analogously.
The experimental results show that the accuracy of the main task in the multi-task model is higher than in the single-task setting for all three data sets, so the method effectively improves the accuracy of the target task.
Claims (6)
1. A multitask named entity recognition method combining text classification is characterized by comprising the following steps:
s1: constructing a text classifier by using a convolutional neural network, and measuring the similarity of texts;
s2: selecting a threshold, and determining whether the auxiliary task data set participates in updating of the shared layer parameters according to the comparison between the text classification result and the threshold;
s3: cascading character vectors of the text and pre-trained word vectors to serve as input feature vectors;
s4: in a sharing layer, modeling an input feature vector of each word in a sentence by using a bidirectional LSTM, and learning common features of each task;
s5: training each task in turn at the task layer: the output of the sharing layer is passed to the bidirectional LSTM neural network in the main-task private layer or an auxiliary-task private layer, then a linear-chain conditional random field is used to decode the labels of the whole sentence and label the entities in the sentence.
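The five steps of claim 1 amount to a turn-taking training schedule. A schematic sketch, not the patent's implementation: the task names are the data sets used in the description, and all other names are placeholders.

```python
# Tasks are trained in turn (S5); each task's batch flows through the shared
# BiLSTM, then that task's private BiLSTM, then its CRF decoder.

def training_schedule(main_task, aux_tasks, rounds=2):
    order = [main_task] + aux_tasks
    steps = []
    for r in range(rounds):
        for task in order:                # train each task in turn
            steps.append((r, task, ["shared_BiLSTM", "private_BiLSTM", "CRF"]))
    return steps

steps = training_schedule("BioNLP13CG", ["BioNLP13PC", "CRAFT"])
```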
2. The method for multi-task named entity recognition through combined text classification according to claim 1, wherein in step S1, a text classifier is constructed by using a convolutional neural network, and the specific steps for measuring the similarity of texts are as follows:
s11: each word in a sentence is input and converted by a word embedding module into a word vector of dimension k; let the word vector of the i-th word in the sentence be x_i ∈ R^k; if the sentence length is n, the sentence is represented as:
x_{1:n} = [x_1; x_2; …; x_n]    (1)
s12: let the convolution kernel be w ∈ R^{h×k}; convolving over the window x_{i:i+h-1} yields the feature c_i:
c_i = f(w · x_{i:i+h-1} + b)    (2)
where h × k is the dimension of the convolution kernel, f is a nonlinear activation function, and b is the bias;
s13: the feature vector constructed over a sentence of length n is:
c = [c_1; c_2; …; c_{n-h+1}]    (3)
s14: the above operations are performed with multiple convolution kernels w_1, w_2, …, w_s, the resulting feature representations are concatenated and input into a fully connected network, and classification is performed with a Softmax function, defined as:
S_i = exp(V_i) / Σ_{j=1..M} exp(V_j)    (4)
where V is the input to the Softmax function and V_i is the i-th element of the input vector; S is the output of the Softmax function, and S_i, the i-th element of the output vector, is the probability that the input sentence belongs to the i-th of M categories.
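The Softmax step of S14 can be illustrated directly; the input scores below are illustrative stand-ins for a fully connected layer's output over M = 3 categories.

```python
import math

def softmax(V):
    # subtract max(V) for numerical stability; the result is unchanged
    mx = max(V)
    exps = [math.exp(v - mx) for v in V]
    total = sum(exps)
    return [e / total for e in exps]   # S_i: probability of category i

S = softmax([2.0, 1.0, 0.1])
```

The outputs sum to 1 and the largest input score maps to the largest probability.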
3. The method as claimed in claim 1, wherein the step S2 of selecting the threshold, and the specific steps of determining whether the auxiliary task data set participates in the update of the shared layer parameter according to the comparison between the text classification result and the threshold are as follows:
s21: setting m data sets, wherein the first data set is set as a main task, and the rest m-1 data sets are auxiliary tasks;
s22: after the training of the text classifier is completed, each sentence produces one output vector through the classifier; the first component of this vector is denoted k_0, and each data set derives its threshold from the k_0 values of all of its sentences;
s23: when the multi-task named entity recognition model is trained, the data of the main task always updates the sharing layer;
s24: the data of an auxiliary task first passes through the text classifier; if the classifier output k_0 is larger than the threshold, both the task layer and the sharing layer are updated, otherwise only the task layer is updated.
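The gating rule of S23-S24 reduces to a small decision function. A schematic sketch: the function name, parameter names, and threshold value are placeholders, not the patent's actual implementation.

```python
# Main-task batches always update the shared layer; auxiliary-task batches
# update it only when the classifier output k_0 exceeds the data set's
# threshold (i.e. the auxiliary sentence looks similar to the main task).

def layers_to_update(is_main_task, k0, threshold):
    if is_main_task or k0 > threshold:
        return ["task_layer", "shared_layer"]
    return ["task_layer"]

upd_main = layers_to_update(True, k0=0.2, threshold=0.6)         # main task
upd_aux_similar = layers_to_update(False, k0=0.8, threshold=0.6)  # similar aux sentence
upd_aux_far = layers_to_update(False, k0=0.3, threshold=0.6)      # dissimilar aux sentence
```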
4. The method for multi-task named entity recognition through combined text classification as claimed in claim 1, wherein in step S3, the step of concatenating the character vector of the text and the pre-trained word vector as the input feature vector comprises:
s31: a natural language processing tool is used to split the document into sentences and words, and the sentences, words, and labels are counted to form a sentence table, a word table, and a label table; the characters in the word table are counted to form a character table;
s32: let C be the character table and d the dimension of each character vector; the character vector matrix is then in R^{d×|C|};
s33: let the vector of the i-th character of the word t be t_i ∈ R^d; the word is denoted t_{1:l} = [t_1; t_2; …; t_l], where l is the length of the word t;
s34: convolution is realized with a kernel w of height h and a bias b is added; a nonlinear function is then applied to the convolution result to implement the feature mapping, and the i-th element f_t(i) of the mapping function f_t is given by formula (6):
f_t(i) = tanh(w · t_{i:i+h-1} + b)    (6)
s35: y_t = max_i f_t(i) is taken as the feature of the word t corresponding to the convolution kernel w;
s36: the above operations are performed with multiple convolution kernels w_1, w_2, …, w_q, the resulting features are concatenated, and the result is then concatenated with the pre-trained word vector of t to form the input feature vector of t.
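Steps S33-S36 can be sketched in pure Python. All vectors and kernels below are toy values for illustration; a real model learns them.

```python
import math

def char_feature(chars, kernel, bias, h):
    # chars: list of character vectors t_i; kernel: flattened h x d weights.
    # Apply formula (6) at each window position, then max-over-time pool (S35).
    feats = []
    for i in range(len(chars) - h + 1):
        window = [v for c in chars[i:i + h] for v in c]   # flatten t_{i:i+h-1}
        s = sum(w * v for w, v in zip(kernel, window)) + bias
        feats.append(math.tanh(s))
    return max(feats)                                     # y_t = max_i f_t(i)

chars = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]              # word of length l=3, d=2
kernels = [([1.0, 0.0, 0.0, 1.0], 0.0), ([0.5, 0.5, 0.5, 0.5], 0.1)]
char_feats = [char_feature(chars, k, b, h=2) for k, b in kernels]
word_vec = [0.3, 0.7]                                     # pre-trained word vector of t
input_vec = char_feats + word_vec                         # concatenation (S36)
```

The final input feature vector joins one pooled feature per kernel with the word's pre-trained embedding.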
5. The method as claimed in claim 1, wherein in step S4, the step of learning the common features of each task by modeling the input feature vector of each word in the sentence with bidirectional LSTM in the sharing layer comprises the following steps:
s41: define x_t as the input feature vector at time t and h_t as the hidden-layer state vector storing all useful information up to time t; σ is the sigmoid function and * denotes the element-wise product; U_i, U_f, U_c, U_o are the weight matrices applied to the input x_t in the different gates, W_i, W_f, W_c, W_o are the weight matrices applied to the hidden state h_t, and b_i, b_f, b_c, b_o are bias vectors;
s42: the calculation of the forget gate at time t is shown in equation (7):
f_t = σ(W_f h_{t-1} + U_f x_t + b_f)    (7)
f_t determines the proportion of the cell state at time t-1 that is to be forgotten;
s43: the information to be stored in the cell state at time t is computed; the calculation formulas are shown as (8) and (9):
i_t = σ(W_i h_{t-1} + U_i x_t + b_i)    (8)
C̃_t = tanh(W_c h_{t-1} + U_c x_t + b_c)    (9)
where C̃_t is the candidate vector to be added to the cell state at time t, and i_t determines the proportion of C̃_t that can be stored;
s44: the results of the two preceding steps are combined to generate the new cell state, as shown in formula (10):
C_t = f_t * C_{t-1} + i_t * C̃_t    (10)
where C_t is the cell state at time t;
s45: the output at time t is calculated, and the calculation formulas are shown as (11) and (12):
o_t = σ(W_o h_{t-1} + U_o x_t + b_o)    (11)
h_t = o_t * tanh(C_t)    (12)
where o_t determines the proportion of the cell state that can be used as output at time t; h_t is the hidden-layer vector at time t and serves as the output information at time t;
s46: the hidden-layer state h_t above stores all past information; a hidden-layer state g_t computed in the same way over the reversed sequence stores future information, and the two hidden states are concatenated to form the final output vector.
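A one-dimensional worked instance of the gate equations (7)-(12), with scalar weights so each gate is easy to trace. Real models use the matrix forms W, U and vector states; the weight values below are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(h_prev, c_prev, x, p):
    f = sigmoid(p["Wf"] * h_prev + p["Uf"] * x + p["bf"])           # forget gate (7)
    i = sigmoid(p["Wi"] * h_prev + p["Ui"] * x + p["bi"])           # input gate (8)
    c_tilde = math.tanh(p["Wc"] * h_prev + p["Uc"] * x + p["bc"])   # candidate (9)
    c = f * c_prev + i * c_tilde                                    # new cell state (10)
    o = sigmoid(p["Wo"] * h_prev + p["Uo"] * x + p["bo"])           # output gate (11)
    h = o * math.tanh(c)                                            # hidden state (12)
    return h, c

params = {k: 0.5 for k in
          ["Wf", "Uf", "bf", "Wi", "Ui", "bi", "Wc", "Uc", "bc", "Wo", "Uo", "bo"]}
h, c = lstm_step(0.0, 0.0, 1.0, params)
```

With all weights positive and an initial zero state, both the cell state and the (squashed) hidden output stay in (0, 1) for this input.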
6. The method as claimed in claim 1, wherein in step S5, the steps of training each task in turn at the task layer, passing the output of the sharing layer to the bidirectional LSTM neural network in the main-task private layer or an auxiliary-task private layer, then using a linear-chain conditional random field to decode the labels of the whole sentence and label the entities in the sentence, are as follows:
s51: the output of the sharing layer is used as input and is transmitted into a bidirectional LSTM private layer of a main task or an auxiliary task, and then the output of the bidirectional LSTM private layer is used as the input of a conditional random field;
s52: let z = {z_1, z_2, …, z_n} denote the input sequence of the conditional random field, where n is the length of the input sequence and z_i is the input vector of the i-th word; y = {y_1, y_2, …, y_n} is an output tag sequence, and Y(z) denotes the set of all possible output tag sequences of z;
s53: for a tag sequence y, its score is defined as shown in formula (13):
s(z, y) = Σ_{i=1..n} P_{i, y_i} + Σ_{i=1..n-1} A_{y_i, y_{i+1}}    (13)
where A is the transition score matrix and A_{j,k} is the score of transitioning from tag j to tag k; P is the score matrix output by the previous network layer, and P_{j,k} is the score of the k-th tag for the j-th word;
s54: for an input sequence z, the probability that its tag sequence is y is defined as shown in formula (14):
p(y|z) = exp(s(z, y)) / Σ_{y'∈Y(z)} exp(s(z, y'))    (14)
during training, the log-probability of the correct sequence label is maximized;
s55: at the time of final decoding, the sequence y* with the highest score is searched for as the final output sequence, as shown in equation (15):
y* = argmax_{y'∈Y(z)} s(z, y')    (15)
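The sequence score of S53 sums per-word tag scores with tag-transition scores. A minimal sketch; the matrices P and A below are illustrative values, not from the patent:

```python
# Score of one candidate tag sequence y: sum of emission scores P[i][y_i]
# plus transition scores A[y_i][y_{i+1}] between adjacent tags.

def sequence_score(P, A, y):
    emit = sum(P[i][y[i]] for i in range(len(y)))
    trans = sum(A[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return emit + trans

P = [[2.0, 0.1], [0.2, 1.5], [1.0, 0.9]]   # word x tag scores from the BiLSTM
A = [[0.5, -0.5], [-0.5, 0.5]]             # tag-to-tag transition scores
s = sequence_score(P, A, [0, 1, 1])
```

Decoding (S55) picks the y maximizing this score over all candidate sequences.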
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911417834.1A CN111209738B (en) | 2019-12-31 | 2019-12-31 | Multi-task named entity recognition method combining text classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111209738A true CN111209738A (en) | 2020-05-29 |
CN111209738B CN111209738B (en) | 2021-03-26 |
Family
ID=70786490
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911417834.1A Active CN111209738B (en) | 2019-12-31 | 2019-12-31 | Multi-task named entity recognition method combining text classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111209738B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111859936A (en) * | 2020-07-09 | 2020-10-30 | 大连理工大学 | Cross-domain establishment oriented legal document professional jurisdiction identification method based on deep hybrid network |
CN112039997A (en) * | 2020-09-03 | 2020-12-04 | 重庆邮电大学 | Triple-feature-based Internet of things terminal identification method |
CN112052684A (en) * | 2020-09-07 | 2020-12-08 | 南方电网数字电网研究院有限公司 | Named entity identification method, device, equipment and storage medium for power metering |
CN112085251A (en) * | 2020-08-03 | 2020-12-15 | 广州数说故事信息科技有限公司 | Consumer product research and development combined concept recommendation method and system |
CN112541355A (en) * | 2020-12-11 | 2021-03-23 | 华南理工大学 | Few-sample named entity identification method and system with entity boundary class decoupling |
CN113064993A (en) * | 2021-03-23 | 2021-07-02 | 南京视察者智能科技有限公司 | Design method, optimization method and labeling method of automatic text classification labeling system based on big data |
CN113204970A (en) * | 2021-06-07 | 2021-08-03 | 吉林大学 | BERT-BilSTM-CRF named entity detection model and device |
CN113254617A (en) * | 2021-06-11 | 2021-08-13 | 成都晓多科技有限公司 | Message intention identification method and system based on pre-training language model and encoder |
CN113255342A (en) * | 2021-06-11 | 2021-08-13 | 云南大学 | Method and system for identifying product name of 5G mobile service |
CN113743111A (en) * | 2020-08-25 | 2021-12-03 | 国家计算机网络与信息安全管理中心 | Financial risk prediction method and device based on text pre-training and multi-task learning |
CN114036933A (en) * | 2022-01-10 | 2022-02-11 | 湖南工商大学 | Information extraction method based on legal documents |
CN114048749A (en) * | 2021-11-19 | 2022-02-15 | 重庆邮电大学 | Chinese named entity recognition method suitable for multiple fields |
CN115688777A (en) * | 2022-09-28 | 2023-02-03 | 北京邮电大学 | Named entity recognition system for nested and discontinuous entities of Chinese financial text |
CN116074317A (en) * | 2023-02-20 | 2023-05-05 | 王春辉 | Service resource sharing method and server based on big data |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110119050A1 (en) * | 2009-11-18 | 2011-05-19 | Koen Deschacht | Method for the automatic determination of context-dependent hidden word distributions |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN108153895A (en) * | 2018-01-06 | 2018-06-12 | 国网福建省电力有限公司 | A kind of building of corpus method and system based on open data |
CN108228568A (en) * | 2018-01-24 | 2018-06-29 | 上海互教教育科技有限公司 | A kind of mathematical problem semantic understanding method |
CN108229582A (en) * | 2018-02-01 | 2018-06-29 | 浙江大学 | Entity recognition dual training method is named in a kind of multitask towards medical domain |
CN108415977A (en) * | 2018-02-09 | 2018-08-17 | 华南理工大学 | One is read understanding method based on the production machine of deep neural network and intensified learning |
CN108536679A (en) * | 2018-04-13 | 2018-09-14 | 腾讯科技(成都)有限公司 | Name entity recognition method, device, equipment and computer readable storage medium |
CN108595708A (en) * | 2018-05-10 | 2018-09-28 | 北京航空航天大学 | A kind of exception information file classification method of knowledge based collection of illustrative plates |
CN108664589A (en) * | 2018-05-08 | 2018-10-16 | 苏州大学 | Text message extracting method, device, system and medium based on domain-adaptive |
CN109255119A (en) * | 2018-07-18 | 2019-01-22 | 五邑大学 | A kind of sentence trunk analysis method and system based on the multitask deep neural network for segmenting and naming Entity recognition |
CN109766417A (en) * | 2018-11-30 | 2019-05-17 | 浙江大学 | A kind of construction method of the literature annals question answering system of knowledge based map |
CN110046709A (en) * | 2019-04-22 | 2019-07-23 | 成都新希望金融信息有限公司 | A kind of multi-task learning model based on two-way LSTM |
CN110134954A (en) * | 2019-05-06 | 2019-08-16 | 北京工业大学 | A kind of name entity recognition method based on Attention mechanism |
CN110162795A (en) * | 2019-05-30 | 2019-08-23 | 重庆大学 | A kind of adaptive cross-cutting name entity recognition method and system |
Non-Patent Citations (5)
Title |
---|
GAMAL CRICHTON ET AL.: "A neural network multi-task learning approach to biomedical named entity recognition", 《BMC BIOINFORMATICS》 *
TUNG TRAN ET AL.: "A Multi-Task Learning Framework for Extracting Drugs and Their Interactions from Drug Labels", 《HTTPS://ARXIV.ORG/PDF/1905.07464.PDF》 *
XI WANG ET AL.: "Multitask learning for biomedical named entity recognition with cross-sharing structure", 《BMC BIOINFORMATICS》 *
XI XUEFENG ET AL.: "Research on Deep Learning for Natural Language Processing", 《ACTA AUTOMATICA SINICA》 *
CHEN WEI ET AL.: "Automatic Keyword Extraction Based on BiLSTM-CRF", 《COMPUTER SCIENCE》 *
Also Published As
Publication number | Publication date |
---|---|
CN111209738B (en) | 2021-03-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||