CN110765775B - Self-adaptive method for named entity recognition field fusing semantics and label differences
- Publication number: CN110765775B (application CN201911059048.9A)
- Authority: CN (China)
- Prior art keywords: sentence, character, sentences, vector, representation
- Prior art date: 2019-11-01
- Legal status: Active
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00—Computing arrangements based on biological models › G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology › G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/04—Architecture, e.g. interconnection topology › G06N3/045—Combinations of networks
- G06N3/08—Learning methods › G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
On the basis of the traditional Bi-LSTM + CRF model, the semantic and label differences between sentences of the source domain and the target domain are fused by introducing them into the state representation and the reward setting of reinforcement learning. The trained decision network can then select, from the source-domain data, sentences that positively influence named entity recognition in the target domain, expanding the target-domain training data, alleviating the shortage of such data, and improving named entity recognition performance in the target domain.
Description
Technical Field
The invention relates to the field of internet technology, and in particular to a method that fuses the semantic and label differences between domains to perform domain migration for the named entity recognition task.
Background
In recent years, deep learning and machine learning have made great progress in computer vision and natural language processing. In computer vision, deep neural networks are used to classify images; convolutional neural networks, for example, recognize handwritten digits with accuracy exceeding that of humans. In natural language processing, deep learning is applied to many everyday scenarios, such as analyzing a user's browsing and purchasing behavior with a neural network to recommend products the user may like, or training a translation system on large parallel corpora so that machines reach a high level of translation quality. As the number of internet users grows, ever more information is generated, and automatically extracting useful information from this mass of user data is of great importance. Chinese named entity recognition, an upstream task of information extraction, is therefore critical to information extraction technology.
Chinese named entity recognition refers to recognizing entities with specific meanings in text, generally including person names, place names, times, and the like. It is performed because many downstream tasks depend on the entity information in the text: information extraction is directly concerned with the entities a text contains, and relation extraction must first identify the entities before determining the relations between them. Named entity recognition is likewise of great significance for machine translation and knowledge graph construction.
Chinese named entity recognition typically involves two steps: (1) determining the boundaries of an entity; (2) identifying the type of the entity. Named entity recognition is generally treated as a sequence labeling problem, using a tagging scheme that encodes both the type and the boundaries of each entity, as illustrated below. Traditional methods include maximum entropy models, support vector machines, and conditional random fields. In recent years, deep learning methods such as recurrent and convolutional neural networks have also been widely applied to Chinese named entity recognition, achieving high accuracy on several large corpora.
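A minimal sketch of such a tagging scheme, assuming the common character-level BIO convention (the patent itself does not fix a particular scheme):

```python
# Each character receives a tag encoding both the entity boundary (B = begin,
# I = inside, O = outside) and the entity type (PER, LOC, ...).
sentence = ["张", "三", "住", "在", "北", "京"]            # "Zhang San lives in Beijing"
tags     = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]  # person + location entities

for char, tag in zip(sentence, tags):
    print(f"{char}\t{tag}")
```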
Deep learning lets a neural network capture features of the data automatically, but reaching high accuracy usually requires a large amount of data. For Chinese named entity recognition, however, the existing large corpora cover only the news domain, and annotated corpora in domains such as microblogs are scarce, so a network trained there cannot reach good accuracy. In recent years, transfer learning has therefore been adopted for named entity recognition in domains such as microblogs, improving model performance mainly by exploiting large-scale corpora from outside domains such as news.
In domain migration, the corpus with large-scale annotations is called the source-domain data, and the corpus with no annotations or only a small amount of annotations is called the target-domain data. Domain migration using unannotated target-domain data is called unsupervised domain migration, while domain migration using a small amount of annotated target-domain data is called semi-supervised domain migration.
Domain migration for Chinese named entity recognition faces two problems: first, the sentence semantics of the corpora differ greatly; second, the tag sets of the corpora differ, owing to different annotation guidelines. To address these problems, existing domain migration techniques rely on the semantic vectors of sentences in the different corpora on the one hand, and on the conversion relationships between the different tag sets on the other.
In the article "A Unified Model for Cross-Domain and Semi-Supervised Named EntityRecognization in Chinese Social Media", authors perform Domain migration for Named entity recognition based on the similarity between sentences in the source corpus and the target Domain corpus.
First, a word vector model is trained on a large number of sentences from an unannotated corpus to obtain a pre-trained word vector dictionary. The vector of each word in the source and target domains is then looked up in this dictionary, and all word vectors of a sentence are averaged to obtain that sentence's vector representation. Finally, the learning rate used when training on each source-domain sentence is computed as:
α(x) = α0 * func(vx, C)
where vx is the sentence vector of the source-domain sentence x, α0 is the learning rate used for target-domain sentences, func is the similarity-based weighting function of that article, and C is an adjustable parameter.
in the article "Named Entity registration for Novel Types by Transfer L earning", authors propose to use a two-layer linear network to learn the correlation between the labels of the source domain and the target domain for domain migration.
Firstly, a named entity recognition model is trained by using a large amount of data of a source domain, then the relevance of labels between the source domain and a target is learned by using a double-layer linear network, and finally a conditional random field is trained by using data of a target domain to obtain an output label of the target domain.
In the course of this research, the inventors found the following with respect to "A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media" and "Named Entity Recognition for Novel Types by Transfer Learning":
1. Judging whether a source-domain sentence benefits the training of the target-domain named entity recognition model solely by the semantic similarity between the source and target domains ignores the influence of entity label differences.
2. Migrating via the label transfer relationship between the source and target domains does not adequately account for cases where the semantic vectors of source-domain and target-domain sentences differ greatly.
Disclosure of Invention
To solve the above technical problems, the invention provides a method for domain migration of the named entity recognition task that fuses semantic and label differences. The semantic and label differences between the source and target domains are introduced through the state representation and reward setting of deep reinforcement learning, and a decision network is trained to selectively add source-domain data to the training process, so that positive samples from the source domain enhance target-domain named entity recognition while the influence of negative source-domain samples on the target domain is avoided.
The invention performs domain migration by fusing the semantic and label differences of texts in the source and target domains: a decision network trained by reinforcement learning selectively adds source-domain data to enhance named entity recognition performance in the target domain.
Step one: preprocess the sentences in the source-domain and target-domain corpora, removing URLs and special symbols and converting traditional Chinese characters to simplified Chinese.
Step two: process the labels of the sentences in the source-domain corpus to unify the entity tag sets of the target and source domains.
Step three: map the sentences of the source and target domains into vector representations using the same dictionary, digitizing the input text into a numeric matrix formed by concatenating the character vectors column-wise.
Step four: to enhance the character representation, concatenate the word segmentation tag and the bigram vector of each character behind its character vector, introducing word-level and segmentation information.
Step five: extract the context-dependent feature vector of each character with a bidirectional long short-term memory network (Bi-LSTM), and obtain the probability of each entity tag for each character with a linear layer.
Step six: decode with a conditional random field (CRF) to obtain the final tag of each character, forming the output tag sequence.
Step seven: perform steps one to six on the target-domain corpus to obtain the target-domain named entity recognition model.
Step eight: use the named entity recognition model from step seven to obtain the state representation and current reward of each source-domain sentence.
Step nine: the decision network takes an action according to the state representation of the current source-domain sentence, deciding whether to add it to the training data; the loss function of the decision network is then computed from each sentence's reward, and the gradient is back-propagated.
Step ten: combine the source-domain sentences selected by the decision network with the target-domain sentences to obtain the expanded training data, and continue training the target-domain named entity recognition model.
Step eleven: repeat steps eight to ten, select the model achieving the highest F-score on the development set, test it, and save it.
Further, in the inference (non-training) case, steps one to ten are replaced by:
Step one: a sentence from the target-domain corpus serves as input to the trained named entity recognition model;
Step two: each character of a sentence in the target corpus is mapped to its vector representation using the character vector dictionary from training;
Step three: the vector representation of each sentence is fed into the bidirectional long short-term memory network to obtain the context-dependent feature representation of the sentence;
Step four: the feature representation of the sentence is fed into a linear layer to obtain the predicted probabilities of the tags of each character;
Step five: the tag probabilities of each character are fed into the conditional random field, and decoding yields the optimal sequence, giving the named entity recognition result, as sketched below.
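A minimal sketch of the CRF decoding step, assuming the third-party pytorch-crf package (the patent does not name a particular CRF implementation, and the tag-set size here is an assumption):

```python
import torch
from torchcrf import CRF  # pip install pytorch-crf

num_tags = 7  # e.g. B-/I- tags for PER, LOC, ORG plus O (assumed tag set)
crf = CRF(num_tags, batch_first=True)

# Emission scores produced by the linear layer: (batch, seq_len, num_tags).
emissions = torch.randn(1, 6, num_tags)

best_paths = crf.decode(emissions)  # Viterbi decoding -> list of tag-index sequences
print(best_paths)                   # e.g. [[1, 2, 0, 0, 3, 4]]
```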
Further, in step three, mapping the Chinese characters of the target and source domains into vector representations using the same dictionary includes:
randomly initializing a mapping dictionary with a word embedding method, so that identical characters receive the same randomly initialized dense vector, and mapping each Chinese character of the corpus into its dense vector through the dictionary; or
training word vectors with the Skip-Gram or Continuous Bag-of-Words (CBOW) model to obtain vector representations containing a certain amount of word information, and mapping each Chinese character of the corpus into its dense vector through the dictionary, as sketched below.
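A hedged sketch of the second option, pre-training character vectors with Skip-Gram; the patent names Skip-Gram/CBOW but no library, so gensim is used here as one common implementation, and the corpus and dimensions are illustrative:

```python
from gensim.models import Word2Vec

# Each "sentence" is a list of characters, so the learned "words" are Chinese characters.
corpus = [list("张三住在北京"), list("李四在上海工作")]

model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                 min_count=1, sg=1)  # sg=1 -> Skip-Gram, sg=0 -> CBOW
char_vec = model.wv["北"]             # dense vector for the character 北
print(char_vec.shape)                 # (100,)
```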
Further, in step four, to enhance the character vectors, segmentation information and bigram information are appended to each character vector, specifically:
x_i = [c_i : b_i : seg_i]
where c_i is the character vector of the i-th character in the sentence, b_i is the corresponding bigram vector, and seg_i is the word segmentation tag.
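A minimal sketch of this concatenation, assuming seg_i is represented by a small embedding of the segmentation tag and assuming the vector dimensions (neither is fixed by the patent):

```python
import torch

c_i   = torch.randn(100)  # character vector c_i
b_i   = torch.randn(100)  # bigram vector b_i of (current character, next character)
seg_i = torch.randn(20)   # embedding of the word-segmentation tag seg_i (assumed)

x_i = torch.cat([c_i, b_i, seg_i], dim=0)  # enhanced representation x_i = [c_i:b_i:seg_i]
print(x_i.shape)                            # torch.Size([220])
```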
Further, in step five, the numeric matrix is fed into the bidirectional long short-term memory network to obtain the feature representation; the computation is as follows:
f_t = σ(W_f · [h_{t-1} : x_t] + b_f)
i_t = σ(W_i · [h_{t-1} : x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1} : x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(W_o · [h_{t-1} : x_t] + b_o)
h_t = o_t * tanh(C_t)
where f_t, i_t, and C̃_t are the outputs of the forget gate, the memory (input) gate, and the temporary cell state respectively, C_t is the cell state at the current time, o_t is the output of the output gate, and h_t, the hidden-state output, is taken as the feature representation of each character.
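A minimal sketch of this step in PyTorch: a Bi-LSTM over the numeric matrix followed by a linear layer scoring each entity tag per character (layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

input_dim, hidden_dim, num_tags = 220, 128, 7  # assumed dimensions

bilstm = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)
emit   = nn.Linear(2 * hidden_dim, num_tags)   # forward and backward states concatenated

x = torch.randn(1, 6, input_dim)  # one sentence of 6 enhanced character vectors x_t
h, _ = bilstm(x)                  # h: (1, 6, 2*hidden_dim); h_t is each character's feature
scores = emit(h)                  # (1, 6, num_tags) per-character tag scores for the CRF
```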
Further, the state representation and reward of a source-domain sentence in step eight are computed as follows:
s_t = (h_1 + h_2 + … + h_n) / n
reward = log P(Y|X)
where h_1, h_2, …, h_n are the outputs of the out-of-domain sentence after the bidirectional long short-term memory network, P(Y|X) is the probability of the label sequence obtained by conditional random field decoding, s_t is the state representation of the sentence, and reward is the reward the current sentence receives in the named entity recognition model.
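A minimal sketch of these two quantities; the hidden size is an assumption, and the CRF log-probability is shown via pytorch-crf, one possible implementation:

```python
import torch

# h_1..h_n: Bi-LSTM outputs for a source-domain sentence of n = 6 characters.
h = torch.randn(6, 256)
s_t = h.mean(dim=0)  # state representation s_t = (h_1 + ... + h_n) / n

# With pytorch-crf, the forward pass returns the log-likelihood of a tag sequence,
# i.e. reward = log P(Y|X):
#   reward = crf(emissions, tags)  # emissions from the linear layer, tags = Y
print(s_t.shape)  # torch.Size([256])
```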
Further, the decision network in step nine makes its judgment as follows:
a = softmax(W · s_t + b)
where W and b are the weight parameters of the selector, softmax is the normalization operation, and a ∈ R^{2×1} is the action output by the selector. A multilayer perceptron is adopted as the decision network; it takes an action a according to the current state of each sentence. If a_0 > 0.5, the sentence is selected and added to the training data; otherwise it is discarded. The corresponding reward is obtained at the same time, the loss function of the decision network is computed, and the gradient is back-propagated.
The loss function is computed as follows:
Loss = -reward * (a_0 log a_0 + (1 - a_0) log(1 - a_0)) + L1 + L2
where L1 and L2 are the L1 and L2 regularization terms of the selector, and reward is the reward the current sentence receives in the named entity recognition model. A sketch of the selector and its loss follows.
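A hedged sketch of the selector and its loss; the MLP sizes and the regularization weight are assumptions, and only the L2 term is written out (the L1 term is analogous):

```python
import torch
import torch.nn as nn

class DecisionNetwork(nn.Module):
    """Multilayer perceptron mapping a sentence state s_t to an action a in R^2."""
    def __init__(self, state_dim=256, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))

    def forward(self, s_t):
        return torch.softmax(self.mlp(s_t), dim=-1)  # a_0 = probability of selecting

policy = DecisionNetwork()
s_t = torch.randn(256)                 # state representation of one source sentence
a = policy(s_t)
print("select sentence:", bool(a[0] > 0.5))

reward = torch.tensor(-1.2)            # log P(Y|X) from the NER model (placeholder)
a0 = a[0]
l2 = sum((p ** 2).sum() for p in policy.parameters())  # L2 regularization term
loss = -reward * (a0 * torch.log(a0) + (1 - a0) * torch.log(1 - a0)) + 1e-4 * l2
loss.backward()                        # gradient back-propagation through the selector
```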
The invention provides a method for domain migration of the named entity recognition task that fuses semantic and label differences. A neural network is adopted as the decision network in reinforcement learning, avoiding the problem of an infinite state space in natural language processing. At the same time, the semantic and label differences between the source and target domains are introduced through the state representation and reward setting of reinforcement learning to train the decision network, so that it can select sentences that positively influence the target-domain named entity recognition model, realizing instance-based domain migration for Chinese named entity recognition.
Drawings
FIG. 1 is a flow chart of Embodiment One;
FIG. 2 is a network structure diagram of the method for domain migration on the named entity recognition task fusing semantic and label differences according to the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. Wherein, the abbreviations and key terms appearing in this embodiment are defined as follows:
BP: Back Propagation;
CRF: Conditional Random Field;
Bi-LSTM: Bidirectional Long Short-Term Memory neural network.
Embodiment One
Referring to FIG. 1 and FIG. 2, the invention provides a method for domain migration of the named entity recognition task fusing semantic and label differences. Specifically, during training the method includes:
Step one: preprocess the sentences in the source-domain and target-domain corpora, removing URLs and special symbols and converting traditional Chinese characters to simplified Chinese.
Step two: process the labels of the sentences in the source-domain corpus to unify the entity tag sets of the source and target domains.
Step three: map the sentences of the source and target domains into vector representations using the same dictionary, digitizing the input text into a numeric matrix formed by concatenating the character vectors column-wise.
Further, either a mapping dictionary is randomly initialized with a word embedding method, assigning the same randomly initialized dense vector to identical characters, and each Chinese character of the corpus is then mapped to its dense vector through the dictionary; or word vectors are trained, here with the GloVe model, to obtain vector representations containing a certain amount of word information, and each Chinese character of the corpus is mapped to its dense vector through the mapping dictionary.
In this embodiment, a large amount of unannotated target-domain corpus and source-domain data obtained by a web crawler is used to pre-train the word vector model and build the word vector mapping dictionary, so that identical characters map to identical vectors; characters that do not appear in the dictionary are randomly initialized.
Step four: to enhance the character representation, concatenate the word segmentation tag and the bigram vector of each character behind its character vector, introducing word-level and segmentation information.
Specifically, segmentation information and bigram information are appended to each character vector as follows:
x_i = [c_i : b_i : seg_i]
where c_i is the character vector of the i-th character in the sentence, b_i is the corresponding bigram vector, and seg_i is the word segmentation tag; the segmenter of "Neural Word Segmentation with Rich Pretraining" (Yang et al., 2017) is used when segmenting the target-domain corpus.
Step five: extract the context-dependent feature vector of each character with a bidirectional long short-term memory network (Bi-LSTM), and obtain the probability of each entity tag for each character with a linear layer.
The numeric matrix is fed into the bidirectional long short-term memory network to obtain the feature representation; the computation is as follows:
f_t = σ(W_f · [h_{t-1} : x_t] + b_f)
i_t = σ(W_i · [h_{t-1} : x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1} : x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(W_o · [h_{t-1} : x_t] + b_o)
h_t = o_t * tanh(C_t)
where f_t, i_t, and C̃_t are the outputs of the forget gate, the memory (input) gate, and the temporary cell state respectively, C_t is the cell state at the current time, o_t is the output of the output gate, and h_t, the hidden-state output, is taken as the feature representation of each character.
Step six: decode with a conditional random field (CRF) to obtain the final tag of each character, forming the output tag sequence.
Step seven: perform steps one to six on the target-domain corpus to obtain the target-domain named entity recognition model.
Step eight: use the named entity recognition model from step seven to obtain the state representation and current reward of each source-domain sentence.
The state representation and the reward of the source domain sentence are calculated as follows:
s_t = (h_1 + h_2 + … + h_n) / n
reward = log P(Y|X)
where h_1, h_2, …, h_n are the outputs of the out-of-domain sentence after the bidirectional long short-term memory network, P(Y|X) is the probability of the label sequence obtained by conditional random field decoding, s_t is the state representation of the sentence, and reward is the reward the current sentence receives in the named entity recognition model.
Step nine: the decision network takes an action according to the state representation of the current sentence, deciding whether to add it to the training data; the loss function of the decision network is then computed from each sentence's reward, and the gradient is back-propagated.
The decision network makes its judgment as follows:
a = softmax(W · s_t + b)
where W and b are the weight parameters of the decision network, softmax is the normalization operation, and a ∈ R^{2×1} is the action output by the decision network. A multilayer perceptron is adopted as the decision network; it takes an action a according to the current state of each sentence. If a_0 > 0.5, the sentence is selected and added to the training data; the corresponding reward is obtained, the loss function is computed, and the gradient is back-propagated.
The loss function is computed as follows:
Loss = -reward * (a_0 log a_0 + (1 - a_0) log(1 - a_0)) + L1 + L2
where L1 and L2 are the L1 and L2 regularization terms of the decision network, and reward is the reward the current sentence receives in the named entity recognition model.
Step ten: combine the source-domain sentences selected by the decision network with the sentences of the target-domain corpus to obtain the expanded training data, and continue training the target-domain named entity recognition model.
Step eleven: repeat steps eight to ten, select the model achieving the highest F-score on the development set, test it, and save it.
In the inference (non-training) case, steps one to ten are replaced by:
Step one: sentences from the target-domain corpus serve as input to the neural network;
Step two: each character of a sentence in the target-domain corpus is mapped to its vector representation using the character vector dictionary from training;
Step three: the vector representation of each sentence is fed into the bidirectional long short-term memory network (Bi-LSTM) to obtain the context-dependent feature representation of each sentence;
Step four: the feature representation of the sentence is fed into a linear layer to obtain the predicted probabilities of the tags of each character;
Step five: the tag probabilities of each character are fed into a conditional random field (CRF), and decoding yields the optimal sequence, completing entity recognition.
A preferred embodiment includes the following steps. First, each character of a sentence is mapped to a dense vector of dimension n, and the feature of each character is extracted with the bidirectional long short-term memory network. For source-domain data, the state of each sentence is fed into the decision network trained by reinforcement learning to obtain the corresponding action and reward; the action determines whether the current sentence is added to the training data, while the returned reward is used to compute the loss of the decision network, back-propagate, and update it. Sentences of the target domain are added to the training data directly, without selection. The entity recognition model is then retrained on the resulting training data, the corresponding loss is computed and back-propagated, and the parameters of the entity recognition model are updated. A sketch of this loop follows.
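A high-level sketch of one epoch of this loop. All names here (ner_model.state_of, ner_model.log_prob, policy, the optimizers) are illustrative stand-ins for the components described above, not a published API:

```python
def train_epoch(ner_model, policy, source_data, target_data, opt_policy, opt_ner):
    selected = []
    for sentence in source_data:
        s_t = ner_model.state_of(sentence)       # mean Bi-LSTM hidden state s_t
        a = policy(s_t)                          # action distribution a in R^2
        if a[0] > 0.5:                           # keep this source-domain sentence
            selected.append(sentence)
        reward = ner_model.log_prob(sentence)    # reward = log P(Y|X) from the CRF
        loss_p = -reward * (a[0] * a[0].log() + (1 - a[0]) * (1 - a[0]).log())
        opt_policy.zero_grad()
        loss_p.backward()                        # update the decision network
        opt_policy.step()

    # Target-domain sentences are always kept; retrain the NER model on the union.
    for sentence in selected + list(target_data):
        loss_n = -ner_model.log_prob(sentence)   # negative log-likelihood
        opt_ner.zero_grad()
        loss_n.backward()
        opt_ner.step()
```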
The invention provides a method for domain migration of the named entity recognition task fusing semantic and label differences. A neural network is adopted as the decision network in reinforcement learning, avoiding the problem of an infinite state space in natural language processing. At the same time, the semantic and label differences between the source and target domains are introduced through the state representation and reward setting of reinforcement learning to train the decision network, so that it selects sentences that positively influence the target-domain named entity recognition model. By exploiting existing large-scale annotated data, the accuracy of named entity recognition in the target domain is improved and the burden of manually annotating corpora is relieved.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (7)
1. A domain self-adaptive method for named entity recognition fusing semantic and label differences, characterized in that the semantic and label differences are introduced through deep reinforcement learning, a decision network is trained, source-domain data is selectively added, and the target-domain training data is expanded, the method comprising the following steps:
(1) preprocessing the sentences in the target corpus, removing URLs and special symbols, and converting traditional Chinese characters to simplified Chinese so that all sentences in the target corpus are in simplified Chinese;
(2) processing the labels of the sentences in the source-domain and target-domain corpora, unifying the entity tag sets of the different corpora;
(3) mapping the sentences of the source and target domains into vector representations using the same dictionary, digitizing the input text into a numeric matrix formed by concatenating the character vectors column-wise;
(4) to enhance the character representation, concatenating the word segmentation tag and the bigram vector of each character behind its character vector, introducing word-level and segmentation information;
(5) extracting the context-dependent feature vector of each character with a bidirectional long short-term memory network (Bi-LSTM), and obtaining the probability of each entity tag for each character with a linear layer;
(6) decoding with a conditional random field (CRF) to obtain the final tag of each character, forming the output tag sequence;
(7) performing steps (1) to (6) on the target corpus to obtain the named entity recognition model trained on the target domain;
(8) using the named entity recognition model obtained in step (7) to obtain, by reinforcement learning, the state representation and current reward of each source-domain sentence;
(9) training the decision network by deep reinforcement learning, wherein the decision network takes an action according to the state representation of the current sentence, deciding whether to add it to the training data, and the reward obtained after the action is executed is used to compute the loss function of the decision network and back-propagate the gradient;
(10) combining the positive source-domain samples selected by the decision network with the sentences of the target-domain corpus, expanding the training data, and continuing to train the target-domain named entity recognition model;
(11) repeating steps (8) to (10), selecting the model that achieves the highest F-score on the target-domain development set, testing it, and saving it.
2. The method of claim 1, wherein in the inference (non-training) case the entity recognition process comprises the following steps:
(2.1) taking sentences from the target-domain corpus as input to the trained target-domain named entity recognition model;
(2.2) mapping the sentences of the target-domain corpus to their corresponding vector representations through the character vector dictionary used in training;
(2.3) feeding the vector representation of each sentence into the bidirectional long short-term memory network to obtain the context-dependent feature representation of each sentence;
(2.4) feeding the obtained feature representation of the sentence into a linear layer to obtain the predicted probabilities of the tags of each character;
(2.5) feeding the tag probabilities of each character into the conditional random field and decoding to obtain the optimal sequence, giving the named entity recognition result for each sentence.
3. The method of claim 1, wherein in step (3) mapping the Chinese characters of the target and source domains into vector representations using the same dictionary comprises:
(3.1) randomly initializing a mapping dictionary with a word embedding method, assigning the same randomly initialized dense vector to identical characters, and mapping each Chinese character of the corpus to its dense vector through the dictionary;
(3.2) training word vectors with the Skip-Gram or Continuous Bag-of-Words (CBOW) model to obtain vector representations containing a certain amount of word information, and mapping each Chinese character of the corpus to its dense vector through the dictionary.
4. The method of claim 1, wherein in step (4), to enhance the character-level vectors, word segmentation tag information and bigram information are appended to each character vector, specifically:
x_i = [c_i : b_i : seg_i]
where c_i is the character vector of the i-th character in the sentence, b_i is the corresponding bigram vector, and seg_i is the word segmentation tag.
5. The method of claim 1, wherein in step (5) the numeric matrix is fed into the bidirectional long short-term memory network to obtain the feature representation, computed as follows:
f_t = σ(W_f · [h_{t-1} : x_t] + b_f)
i_t = σ(W_i · [h_{t-1} : x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1} : x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(W_o · [h_{t-1} : x_t] + b_o)
h_t = o_t * tanh(C_t)
where f_t, i_t, and C̃_t are the outputs of the forget gate, the memory (input) gate, and the temporary cell state respectively, C_t is the cell state at the current time, o_t is the output of the output gate, and h_t, the hidden-state output, is taken as the feature representation of each character.
6. The method of claim 1, wherein the state representation and reward of a source-domain sentence in step (8) are computed as follows:
s_t = (h_1 + h_2 + … + h_n) / n
reward = log P(Y|X)
where h_1, h_2, …, h_n are the hidden states output by the bidirectional long short-term memory network for the source-domain sentence, P(Y|X) is the probability of the label sequence obtained by conditional random field decoding, and the averaged Bi-LSTM hidden states are adopted as the state representation s_t of the current sentence in the target-domain named entity recognition model.
7. The method of claim 1, wherein the selector in step (9) makes its judgment as follows:
a = softmax(W · s_t + b)
where W and b are the parameters of the decision network, softmax is the normalization operation, and a ∈ R^{2×1} represents the action of the decision network in the current state. A multilayer perceptron serves as the decision network, which takes an action a according to the current state of each sentence: if a_0 > 0.5, the source-domain sentence is selected and added to the training data, otherwise it is discarded. The reward for the current sentence is then obtained from the target-domain named entity recognition model, the loss function is computed so as to maximize the reward, and the gradient is back-propagated;
The loss function of the decision network is computed as follows:
Loss = -reward * (a_0 log a_0 + (1 - a_0) log(1 - a_0)) + L1 + L2
where L1 and L2 are the L1 and L2 regularization terms of the decision network, and reward is the reward the current sentence receives in the named entity recognition model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201911059048.9A | 2019-11-01 | 2019-11-01 | Self-adaptive method for named entity recognition field fusing semantics and label differences
Publications (2)
Publication Number | Publication Date
---|---
CN110765775A | 2020-02-07
CN110765775B | 2020-08-04
Family
ID=69335232
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201911059048.9A (CN110765775B, active) | Self-adaptive method for named entity recognition field fusing semantics and label differences | 2019-11-01 | 2019-11-01
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110765775B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Title
---|---|---|---
CN108664589A | 2018-05-08 | 2018-10-16 | Text message extracting method, device, system and medium based on domain-adaptive
CN108874997A | 2018-06-13 | 2018-11-23 | A kind of name entity recognition method towards film comment
CN109871538A | 2019-02-18 | 2019-06-11 | A kind of Chinese electronic health record name entity recognition method
CN109871541A | 2019-03-06 | 2019-06-11 | It is a kind of suitable for multilingual multi-field name entity recognition method
CN110175227A | 2019-05-10 | 2019-08-27 | A kind of dialogue auxiliary system based on form a team study and level reasoning
CN110209770A | 2019-06-03 | 2019-09-06 | A kind of name entity recognition method based on policy value network and tree search enhancing
CN110196980A | 2019-06-05 | 2019-09-03 | A kind of field migration based on convolutional network in Chinese word segmentation task
Non-Patent Citations (2)
Title
---
"Named Entity Recognition for Novel Types by Transfer Learning"; Lizhen Qu et al.; Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016-12-05; pp. 899-905.
"基于强化学习的实体关系联合抽取模型" (Joint entity and relation extraction model based on reinforcement learning); 陈佳沣 et al.; 计算机应用 (Journal of Computer Applications); 2019-07-10; Vol. 39, No. 7, pp. 1918-1924.
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |