CN110826334B - Chinese named entity recognition model based on reinforcement learning and training method thereof - Google Patents


Info

Publication number: CN110826334B (application number CN201911089295.3A; other version: CN110826334A)
Other languages: Chinese (zh)
Inventors: 叶梅, 卓汉逵
Assignee (original and current): Sun Yat Sen University
Application filed by Sun Yat Sen University; granted and active. (The legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis.)
Prior art keywords: word, sentence, network, named entity

Classifications

    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a Chinese named entity recognition model based on reinforcement learning and a training method thereof. The model comprises a strategy network module, a word segmentation and recombination network, and a named entity recognition network module. First, the strategy network produces an action sequence; the word segmentation and recombination network then executes the actions in the sequence one by one, obtaining a phrase at each "terminate" action. Each phrase is used as auxiliary input information for lattice-LSTM modeling, which yields a sequence of hidden states; the hidden states are input into the named entity recognition network to obtain the label sequence of the sentence, and the recognition result serves as a delayed reward that guides updates of the strategy network module. By dividing sentences effectively with reinforcement learning, the invention avoids modeling the redundant interfering words matched in sentences, avoids dependence on an external dictionary and the adverse effect of long texts, and makes better use of correct word information, thereby helping the Chinese named entity recognition model improve its recognition effect.

Description

Chinese named entity recognition model based on reinforcement learning and training method thereof
Technical Field
The invention relates to the field of machine learning, in particular to a Chinese named entity recognition model based on reinforcement learning and a training method thereof.
Background
Named entity recognition (NER) is a basic task in the field of natural language processing: it identifies named expressions in text, underpins tasks such as relation extraction, question answering, syntactic analysis and machine translation, and plays an important role in putting natural language processing technology to practical use. In general, the NER task is to identify named entities of three major classes (entity, time and number) and seven minor classes (person name, organization name, place name, time, date, currency and percentage) in the text to be processed.
An existing Chinese named entity recognition model is the lattice-LSTM. Besides each word (character) in a sentence, this model also takes as input the cell vectors of all potential words that end at that character, where the choice of potential words depends on an external dictionary. An additional gate is added to control the mixture of character-granularity and word-granularity information, so the input at each step changes from (character information, previous hidden state, previous cell state) to (character information, previous hidden state, and the information of all words ending at that character). The advantage of this model is that explicit word information can be exploited in a character-sequence labeling model without suffering from word segmentation errors.
However, precisely because the lattice-LSTM model uses the information of all matched words in a sentence, any word formed by adjacent characters that happens to appear in the external dictionary is fed into the model as registered-word granularity information (a registered word being a noun recorded in the external dictionary), even though it is not necessarily a correct division of the sentence. For example, for the sentence 南京市长江大桥 ("Nanjing City Yangtze River Bridge"), the model takes as input every dictionary word matched in sequence: 南京 (Nanjing), 南京市 (Nanjing City), 市长 (mayor), 长江 (Yangtze River) and 长江大桥 (Yangtze River Bridge). Clearly, 市长 (mayor) is an interfering word in this sentence, and using its word information has a negative influence on entity recognition. In addition, the model typically requires the external dictionary to be constructed autonomously from the experimental dataset, on which it depends heavily. Meanwhile, as the text length increases, the number of potential words in a sentence grows and the complexity of the model rises sharply.
Disclosure of Invention
The invention provides a Chinese named entity recognition model based on reinforcement learning and a training method thereof, aiming to solve the prior-art problems of modeling redundant interfering words matched in sentences, depending on an external dictionary, and being affected by long texts. By constructing a reinforcement learning model, the internal structure of sentences is learned, so that a sentence division strategy relevant to the named entity recognition task is acquired and sentences are cut into effective segments. The method thereby avoids feeding in interfering words and using an external dictionary, reduces the number of candidate words as text length grows, and makes better use of correct word information to help the Chinese named entity recognition model improve recognition accuracy.
In order to solve the technical problems, the invention adopts the following technical scheme: the Chinese named entity recognition model based on reinforcement learning comprises a strategy network module, a word segmentation recombination network and a named entity recognition network module;
the strategy network module is used for sampling an action for each word in the sentence under each state by adopting a random policy, so as to obtain an action sequence for the whole sentence, and for obtaining a delayed reward from the recognition result of the Chinese named entity recognition network so as to guide its own updates;
the word segmentation and recombination network is used for dividing sentences according to the action sequence output by the strategy network module, cutting the sentences into phrases, and combining the codes of the phrases with the code vector of the last word of the phrases so as to obtain the lattice-LSTM representation of the sentences;
and the named entity recognition network module is used for inputting the hidden states of the lattice-LSTM representation of the sentence into a CRF (conditional random field) layer to obtain the named entity recognition result, computing a loss value from the recognition result to train the named entity recognition model, and simultaneously using the loss value as a delayed reward to guide updates of the strategy network module.
Preferably, the actions comprise "internal" and "terminate".
Preferably, the random strategy is:
π(a_t | s_t; θ) = σ(W·s_t + b)

where π(a_t | s_t; θ) represents the probability of selecting action a_t; θ = {W, b} represents the parameters of the policy network; s_t is the state of the policy network at time t; σ is the sigmoid function.
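As a minimal illustrative sketch of the stochastic policy above (not the patented implementation; the state vector, the weights W and b, and the action encoding are all hypothetical), the sigmoid of a linear function of the state gives the probability of one of the two actions, from which an action is sampled:

```python
import math
import random

def policy_prob(state, W, b):
    """sigma(W . s_t + b): probability of choosing 'terminate' in state s_t."""
    z = sum(w * s for w, s in zip(W, state)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sample_action(state, W, b, rng):
    """Sample one action from the stochastic policy pi(a_t | s_t; theta)."""
    p = policy_prob(state, W, b)
    return "terminate" if rng.random() < p else "internal"

# Hypothetical 3-dimensional state and parameters.
W, b = [0.5, -0.2, 0.1], 0.0
rng = random.Random(0)
actions = [sample_action([0.1, 0.3, -0.2], W, b, rng) for _ in range(5)]
```

Sampling once per word then yields the action sequence for the whole sentence.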
Preferably, the word segmentation and recombination network cuts the sentence into phrases according to the action sequence output by the strategy network module, and encodes each phrase as an input to the cell state at the last word of the corresponding phrase, so as to obtain the lattice-LSTM representation of the sentence.
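The cutting step can be sketched as follows (a didactic sketch: the romanized words and the action labels are illustrative, and the real network operates on learned state vectors rather than raw strings). Each "terminate" action closes the current phrase at that word:

```python
def segment(words, actions):
    """Cut a word sequence into phrases: 'terminate' ends the current phrase."""
    phrases, current = [], []
    for word, action in zip(words, actions):
        current.append(word)
        if action == "terminate":
            phrases.append(current)
            current = []
    if current:                      # close a trailing, unterminated phrase
        phrases.append(current)
    return phrases

# "Nanjing City / Yangtze River Bridge": the last word of each phrase
# is where the phrase encoding enters the lattice-LSTM cell state.
words = ["nan", "jing", "shi", "chang", "jiang", "da", "qiao"]
actions = ["internal", "internal", "terminate",
           "internal", "internal", "internal", "terminate"]
phrases = segment(words, actions)
# -> [["nan", "jing", "shi"], ["chang", "jiang", "da", "qiao"]]
```

Note that only phrases actually produced by the policy enter the model, which is how the design excludes interfering dictionary matches such as 市长 (mayor).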
Preferably, the named entity recognition network module inputs the lattice-LSTM output obtained from the word segmentation and recombination network into a CRF layer, scores each candidate label sequence of the sentence using the feature function set of the CRF layer, exponentiates and normalizes the scores, searches over all possible label sequences with a first-order Viterbi algorithm, and takes the highest-scoring sequence as the final output. The value of the loss function is back-propagated for parameter training, and the loss value also serves as the delayed reward that updates the strategy network module. The loss function is defined as the sentence-level log-likelihood with an L2 regularization term, as follows:
L(θ) = − Σ_{(s,y)} log P(y | s; θ) + (λ/2)·‖θ‖²

where λ is the L2 regularization term coefficient; θ represents the parameter set; s and y represent a sentence and the label sequence corresponding to the sentence, respectively.
The training method is used for training the Chinese named entity recognition model and comprises the following steps:
step one: inputting sentence data for training into a strategy network module, wherein the strategy network module samples each word in a sentence with one action under each state space, and outputs an action sequence of the whole sentence;
step two: dividing sentences by the word segmentation and recombination network according to the action sequence output by the strategy network module, cutting the sentences into phrases, and combining the codes of the phrases with the code vector of the last word of the phrases so as to obtain the law-LSTM representation of the word;
step three: the hidden state obtained by the named entity recognition network from the word segmentation and recombination network is input into a CRF layer, a named entity recognition result is finally obtained, a loss value is obtained through calculation according to the recognition result and used for training a named entity recognition model, and meanwhile the loss value is used as a delay reward to guide the updating of the strategy network module;
the sentence is characterized by a lattice-LSTM model, so that the hidden state vector h of each word in the sentence can be obtained i The state vector sequence h= { H is then 1 ,h 2 ,…,h n Input CRF layer; let y=l 1 ,l 2 ,…,l n Representing the output tag of the CRF layer, the output tag sequence probability is calculated by:
Figure BDA0002266384690000041
wherein s represents a sentence;
Figure BDA0002266384690000042
is directed to l i Model parameters of (2); />
Figure BDA0002266384690000043
Is directed to l i-1 and li Is set to be a bias parameter of (a); y' represents all possible output tag sets.
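For intuition, the label-sequence probability can be checked by brute-force enumeration on a toy example (the emission and transition scores below are made up; a real model learns W_CRF and b_CRF and decodes with the Viterbi algorithm rather than enumerating):

```python
import itertools
import math

def seq_score(labels, emissions, transitions):
    """Sum of per-position emission scores plus label-pair transition scores."""
    s = sum(emissions[i][l] for i, l in enumerate(labels))
    s += sum(transitions[(a, b)] for a, b in zip(labels, labels[1:]))
    return s

def seq_prob(labels, emissions, transitions, tagset):
    """P(y|s): exponentiated score normalized over all label sequences y'."""
    n = len(emissions)
    z = sum(math.exp(seq_score(y, emissions, transitions))
            for y in itertools.product(tagset, repeat=n))
    return math.exp(seq_score(tuple(labels), emissions, transitions)) / z

# Toy 2-word sentence with 2 tags: per-position emission scores and
# per-pair transition scores (all values hypothetical).
tagset = ("O", "B")
emissions = [{"O": 0.5, "B": 1.0}, {"O": 0.2, "B": 0.1}]
transitions = {(a, b): 0.0 for a in tagset for b in tagset}
transitions[("B", "O")] = 0.3
p = seq_prob(("B", "O"), emissions, transitions, tagset)
```

Because the denominator sums over every candidate sequence, the probabilities of all sequences sum to one, matching the normalization in the formula above.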
The loss function is computed as:

L(θ) = − Σ_{(s,y)} log P(y | s; θ) + (λ/2)·‖θ‖²

where λ is the L2 regularization term coefficient; θ represents the parameter set; s and y represent a sentence and the correct label sequence corresponding to the sentence, respectively; P(y | s) denotes the probability that sentence s is labeled with sequence y, i.e. the probability of the correct labeling.
Preferably, in step one the actions comprise "internal" and "terminate", and the formula of the random policy is as follows:

π(a_t | s_t; θ) = σ(W·s_t + b)

where π(a_t | s_t; θ) represents the probability of selecting action a_t; θ = {W, b} represents the parameters of the policy network; s_t is the state of the policy network at time t.
Preferably, in step two the character-level representation of each word is obtained through an LSTM, with the update formula:

c_t, h_t = f_LSTM(x_t, c_{t-1}, h_{t-1})

where f_LSTM represents the transfer function of the LSTM; x_t represents the encoding vector of the word input at time t of the sentence; c_t and h_t represent the cell state and the hidden state at time t, respectively.
After the division of the sentence is completed, the phrase information is integrated into the word-granularity LSTM model. The basic recurrent LSTM functions are as follows:
[ i_j^c ; f_j^c ; o_j^c ; c̃_j^c ] = [ σ ; σ ; σ ; tanh ]( W^cT [ x_j^c ; h_{j-1}^c ] + b^c )

c_j^c = f_j^c ⊙ c_{j-1}^c + i_j^c ⊙ c̃_j^c

h_j^c = o_j^c ⊙ tanh( c_j^c )

where x_j^c represents the encoding vector of the j-th word in the sentence; h_{j-1}^c represents the hidden state at the (j-1)-th word of the sentence; W^cT and b^c are model parameters; i_j^c, f_j^c and o_j^c represent the input, forget and output gates, respectively; c̃_j^c represents the new candidate state; c_{j-1}^c represents the cell state at the (j-1)-th word of the sentence; c_j^c represents the updated cell state; h_j^c represents the hidden state at the j-th word of the sentence, determined by the output gate o_j^c and the cell state c_j^c at the present moment; σ() represents the sigmoid function; tanh() represents the hyperbolic tangent activation function.
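The three equations above can be exercised with a plain scalar (one-dimensional) LSTM step — a didactic sketch with tiny hand-set weights, not the patent's trained parameters:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One word-level LSTM step on 1-d states.

    W maps each gate name to its weights on (x, h_prev); b maps it to a bias.
    Implements: gates from sigma/tanh of W[x; h] + b; c = f*c_prev + i*c~;
    h = o * tanh(c), matching the formulas in the text.
    """
    def pre(g):
        wx, wh = W[g]
        return wx * x + wh * h_prev + b[g]
    i = sigmoid(pre("i"))            # input gate
    f = sigmoid(pre("f"))            # forget gate
    o = sigmoid(pre("o"))            # output gate
    c_tilde = math.tanh(pre("c"))    # new candidate state
    c = f * c_prev + i * c_tilde     # updated cell state
    h = o * math.tanh(c)             # hidden state
    return h, c

# Hypothetical weights shared across gates, zero biases.
W = {g: (0.5, 0.25) for g in ("i", "f", "o", "c")}
b = {g: 0.0 for g in ("i", "f", "o", "c")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, W=W, b=b)
```

In the real model each quantity is a vector and ⊙ is an element-wise product; the scalar case shows the update structure only.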
Phrase information is characterized by an LSTM model without an output gate; the specific formulas are as follows:

[ i_{b,e}^w ; f_{b,e}^w ; c̃_{b,e}^w ] = [ σ ; σ ; tanh ]( W^wT [ x_{b,e}^w ; h_b^c ] + b^w )

c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ c̃_{b,e}^w

where x_{b,e}^w represents the encoding vector of the phrase in the sentence starting at the b-th word and ending at the e-th word; h_b^c represents the hidden state at the b-th word of the sentence, i.e. at the first word of the phrase; W^wT and b^w are model parameters; i_{b,e}^w and f_{b,e}^w represent the input and forget gates, respectively; c̃_{b,e}^w represents the new candidate state; c_b^c represents the cell state at the first word of the phrase; c_{b,e}^w represents the updated cell state; σ() represents the sigmoid function; tanh() represents the hyperbolic tangent activation function.
In addition, an additional gate is added to select between word-granularity and phrase-granularity information; it takes as input the encoding vector of the word and the cell state of the phrase ending at that word, and is defined as follows:

i_{b,e}^l = σ( W^lT [ x_e^c ; c_{b,e}^w ] + b^l )

where x_e^c represents the encoding vector of the e-th word in the sentence; c_{b,e}^w represents the cell state of the phrase starting at the b-th word and ending at the e-th word, i.e. the cell state of a phrase ending at the e-th word of the sentence; W^lT and b^l are model parameters; i_{b,e}^l represents the additional gate; σ() represents the sigmoid function.

The update of the cell state is thereby changed, while the update of the hidden state remains unchanged. The final cell representation of the lattice-LSTM model is as follows:
c_j^c = Σ_{b ∈ {b' | w_{b',j} ∈ D}} α_{b,j}^c ⊙ c_{b,j}^w + α_j^c ⊙ c̃_j^c

where i_j^c is the input gate vector of the j-th word; i_{b,j}^l is the additional gate vector of the phrase starting at b and ending at j; c_{b,j}^w is the phrase cell state; c̃_j^c is the new candidate cell state of the word; α_{b,j}^c is the phrase information weight vector and α_j^c is the word information weight vector, obtained by normalizing the gate values i_{b,j}^l and i_j^c with a softmax.
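A minimal numeric sketch of this final cell update (one-dimensional states; all gate and cell values below are made up): the word's input gate and the additional gates of every phrase ending at the word are softmax-normalized, and the resulting weights mix the phrase cell states with the word's candidate state:

```python
import math

def lattice_cell(i_word, c_tilde, phrase_gates, phrase_cells):
    """c_j = sum_b alpha_{b,j} * c^w_{b,j} + alpha_j * c~_j with softmax weights.

    i_word: the word's input gate value i^c_j (pre-normalization).
    phrase_gates: additional-gate values i^l_{b,j} of phrases ending at j.
    phrase_cells: the corresponding phrase cell states c^w_{b,j}.
    """
    exps = [math.exp(g) for g in phrase_gates] + [math.exp(i_word)]
    z = sum(exps)
    alphas = [e / z for e in exps]        # phrase weights, then the word weight
    c = sum(a * cw for a, cw in zip(alphas, phrase_cells))
    c += alphas[-1] * c_tilde             # word-granularity contribution
    return c, alphas

# One phrase ends at this word; hypothetical gate and cell values.
c, alphas = lattice_cell(i_word=0.2, c_tilde=0.5,
                         phrase_gates=[1.0], phrase_cells=[0.8])
```

When no phrase ends at the word, the weight on the candidate state is 1 and the update reduces to the plain word-level LSTM cell.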
Preferably, before step one, the named entity recognition network and its network parameters are pre-trained; the words used by the named entity recognition network at this stage are obtained by dividing the original sentences with a simple heuristic algorithm.

The pre-trained parameters of the entity recognition network are then temporarily fixed as the network parameters of the named entity recognition network, the policy network is pre-trained next, and finally all network parameters are trained jointly.
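The three-stage schedule (pre-train the recognizer on heuristic segmentations, pre-train the policy with the recognizer frozen, then train jointly) might be organized as in the following skeleton; the train functions are placeholders standing in for the actual optimization loops, not the patented code:

```python
def train_schedule():
    """Run the three training phases in order and record what was trained."""
    log = []

    def pretrain_ner():
        # Segment sentences heuristically, then fit the NER network alone.
        log.append("pretrain_ner")

    def pretrain_policy():
        # NER parameters temporarily fixed; only the policy network updates,
        # guided by the delayed reward from the frozen recognizer.
        log.append("pretrain_policy")

    def joint_train():
        # All network parameters are updated together.
        log.append("joint_train")

    for phase in (pretrain_ner, pretrain_policy, joint_train):
        phase()
    return log

order = train_schedule()
```

The ordering matters: the policy's delayed reward is meaningless until the recognizer produces sensible losses, which is why the recognizer is pre-trained first.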
Compared with the prior art, the invention has the following beneficial effects: the Chinese named entity recognition model based on reinforcement learning and its training method use reinforcement learning to divide sentences effectively, avoid modeling the redundant interfering words matched in sentences, avoid dependence on an external dictionary and the influence of long texts, and make better use of correct word information, thereby helping the Chinese named entity recognition model improve its recognition effect.
Drawings
FIG. 1 is a schematic diagram of a Chinese named entity recognition model based on reinforcement learning;
FIG. 2 is a schematic diagram of a strategy network module of a Chinese named entity recognition model based on reinforcement learning according to the present invention;
FIG. 3 is a schematic diagram of a named entity recognition network module based on a reinforcement learning Chinese named entity recognition model according to the present invention;
FIG. 4 is a flow chart of a training method of a Chinese named entity recognition model based on reinforcement learning according to the present invention;
FIG. 5 is a diagram of an example sentence segmentation for a training method of a Chinese named entity recognition model based on reinforcement learning according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent; for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent.
The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components. In the description of the present invention, it should be understood that orientation terms such as "upper", "lower", "left", "right", "long" and "short", where used, are based on the orientations or positional relationships shown in the drawings; they are used merely for convenience in describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. The terms describing positional relationships in the drawings are therefore for exemplary illustration only and are not to be construed as limiting the present patent; those of ordinary skill in the art can understand the specific meaning of the above terms according to the specific circumstances.
The technical scheme of the invention is further specifically described by the following specific embodiments with reference to the accompanying drawings:
example 1
FIGS. 1-3 illustrate an embodiment of a Chinese named entity recognition model based on reinforcement learning, which comprises a strategy network module, a word segmentation and recombination network, and a named entity recognition network module;
the strategy network module is used for sampling an action (the action comprises the internal part or the termination) for each word in the sentence under each state space by adopting a random strategy, so that an action sequence is obtained for the whole sentence, and delay rewards are obtained according to the recognition result of the Chinese named entity recognition network so as to guide the strategy network module to update; the random strategy is:
π(a_t | s_t; θ) = σ(W·s_t + b)

where π(a_t | s_t; θ) represents the probability of selecting action a_t; θ = {W, b} represents the parameters of the policy network; s_t is the state of the policy network at time t.
The word segmentation and recombination network is used for dividing the sentence according to the action sequence output by the strategy network module, cutting the sentence into phrases, and combining the encoding of each phrase with the encoding vector of the last word of that phrase, so as to obtain the lattice-LSTM representation of the sentence;
specifically, the word segmentation and recombination network cuts sentences according to the action sequences output by the strategy network module to obtain phrases, and encodes each phrase to be respectively used as the input of the cell state at the last word of the corresponding phrase to obtain the lattice-LSTM representation of the sentences.
The named entity recognition network module is used for inputting the hidden states of the lattice-LSTM representation of the sentence into the conditional random field layer to obtain the named entity recognition result, computing a loss value from the recognition result to train the named entity recognition model, and simultaneously using the loss value as a delayed reward to guide updates of the strategy network module. The loss value is computed as follows:
L(θ) = − Σ_{(s,y)} log P(y | s; θ) + (λ/2)·‖θ‖²

where λ is the L2 regularization term coefficient; θ represents the parameter set; s and y represent a sentence and the correct label sequence corresponding to the sentence, respectively; P(y | s) denotes the probability that sentence s is labeled with sequence y, i.e. the probability of the correct labeling.
The working principle of this embodiment is as follows: first, the strategy network produces an action sequence; the word segmentation and recombination network then executes the actions in the sequence one by one, obtaining a phrase at each "terminate" action. Each phrase is used as auxiliary input information at the last word of the phrase, and lattice-LSTM modeling yields a sequence of hidden states; the hidden states are input into the named entity recognition network to obtain the label sequence of the sentence, and the recognition result serves as a delayed reward that guides updates of the strategy network module.
The beneficial effects of this embodiment are as follows: it enhances the neural-network-based LSTM-CRF model with a reinforcement learning framework that learns the internal structure of sentences and divides them efficiently; the resulting phrase information is integrated into a word-granularity lattice-LSTM model, which fully exploits both word-granularity and phrase-granularity information so as to achieve a better recognition effect.
Example 2
Fig. 4 shows an embodiment of a training method, based on reinforcement learning, for the Chinese named entity recognition model of embodiment 1; the method comprises the following steps:
pretreatment: pre-training a named entity recognition network and network parameters thereof, wherein words used by the named entity recognition network are words obtained by dividing an original sentence through a simple heuristic algorithm;
and (3) temporarily fixing the pre-trained partial network parameters of the entity identification network as the network parameters of the named entity identification network, then pre-training the strategy network, and finally jointly training the whole network parameters.
Step one: inputting sentence data for training into the strategy network module, wherein the strategy network module samples one action for each word in the sentence under each state, and outputs the action sequence of the whole sentence;
in step one, the states, actions, policies are defined as follows:
1. status: the encoding vector of the currently input word and the context vector preceding the word;
2. Actions: two distinct operations are defined, "internal" and "terminate";
3. strategy: the random strategy is defined as follows:
π(a_t | s_t; θ) = σ(W·s_t + b)

where π(a_t | s_t; θ) represents the probability of selecting action a_t; θ = {W, b} represents the parameters of the policy network; s_t is the state of the policy network at time t.
Step two: dividing the sentence by the word segmentation and recombination network according to the action sequence output by the strategy network module, cutting the sentence into phrases, and combining the encoding of each phrase with the encoding vector of the last word of that phrase, so as to obtain the lattice-LSTM representation of the sentence;
as shown in fig. 5, "washington in the united states" is divided into "washington" in the united states ". The character level characterization of the word by LSTM is performed with the updated formula as follows:
c_t, h_t = f_LSTM(x_t, c_{t-1}, h_{t-1})

where f_LSTM represents the transfer function of the LSTM; x_t represents the encoding vector of the word input at time t of the sentence; c_t and h_t represent the cell state and the hidden state at time t, respectively.
After the division of the sentence is completed, the phrase information is integrated into the word-granularity LSTM model. The basic recurrent LSTM functions are as follows:
[ i_j^c ; f_j^c ; o_j^c ; c̃_j^c ] = [ σ ; σ ; σ ; tanh ]( W^cT [ x_j^c ; h_{j-1}^c ] + b^c )

c_j^c = f_j^c ⊙ c_{j-1}^c + i_j^c ⊙ c̃_j^c

h_j^c = o_j^c ⊙ tanh( c_j^c )

where x_j^c represents the encoding vector of the j-th word in the sentence; h_{j-1}^c represents the hidden state at the (j-1)-th word of the sentence; W^cT and b^c are model parameters; i_j^c, f_j^c and o_j^c represent the input, forget and output gates, respectively; c̃_j^c represents the new candidate state; c_{j-1}^c represents the cell state at the (j-1)-th word of the sentence; c_j^c represents the updated cell state; h_j^c represents the hidden state at the j-th word of the sentence, determined by the output gate o_j^c and the cell state c_j^c at the present moment; σ() represents the sigmoid function, and tanh() represents the hyperbolic tangent activation function.
Phrase information is characterized by an LSTM model without an output gate; the specific formulas are as follows:

[ i_{b,e}^w ; f_{b,e}^w ; c̃_{b,e}^w ] = [ σ ; σ ; tanh ]( W^wT [ x_{b,e}^w ; h_b^c ] + b^w )

c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ c̃_{b,e}^w

where x_{b,e}^w represents the encoding vector of the phrase in the sentence starting at the b-th word and ending at the e-th word; h_b^c represents the hidden state at the b-th word of the sentence, i.e. at the first word of the phrase; W^wT and b^w are model parameters; i_{b,e}^w and f_{b,e}^w represent the input and forget gates, respectively; c̃_{b,e}^w represents the new candidate state; c_b^c represents the cell state at the first word of the phrase; c_{b,e}^w represents the updated cell state; σ() represents the sigmoid function, and tanh() represents the hyperbolic tangent activation function.
In addition, an additional gate is added to select between word-granularity and phrase-granularity information; it takes as input the encoding vector of the word and the cell state of the phrase ending at that word, and is defined as follows:

i_{b,e}^l = σ( W^lT [ x_e^c ; c_{b,e}^w ] + b^l )

where x_e^c represents the encoding vector of the e-th word in the sentence; c_{b,e}^w represents the cell state of the phrase starting at the b-th word and ending at the e-th word, i.e. the cell state of a phrase ending at the e-th word of the sentence; W^lT and b^l are model parameters; i_{b,e}^l represents the additional gate; σ() represents the sigmoid function.

The update of the cell state is thereby changed, while the update of the hidden state remains unchanged. The final cell representation of the lattice-LSTM model is as follows:
c_j^c = Σ_{b ∈ {b' | w_{b',j} ∈ D}} α_{b,j}^c ⊙ c_{b,j}^w + α_j^c ⊙ c̃_j^c

where i_j^c is the input gate vector of the j-th word; i_{b,j}^l is the additional gate vector of the phrase starting at b and ending at j; c_{b,j}^w is the phrase cell state; c̃_j^c is the new candidate cell state of the word; α_{b,j}^c is the phrase information weight vector and α_j^c is the word information weight vector, obtained by normalizing the gate values i_{b,j}^l and i_j^c with a softmax.
Step three: the named entity recognition network inputs the hidden states obtained from the word segmentation and recombination network into a CRF layer to obtain the named entity recognition result; a loss value is computed from the recognition result and used to train the named entity recognition model, and the loss value simultaneously serves as a delayed reward that guides updates of the strategy network module;
the sentence is characterized by the lattice-LSTM model, yielding the hidden state vector $h_{i}$ of each word in the sentence; the state vector sequence $H=\{h_{1},h_{2},\ldots,h_{n}\}$ is then input into the CRF layer. Let $y=l_{1},l_{2},\ldots,l_{n}$ denote the output tag sequence of the CRF layer; the probability of an output tag sequence is calculated by:

$$P(y\mid s)=\frac{\exp\left(\sum_{i}\left(W_{CRF}^{l_{i}}h_{i}+b_{CRF}^{(l_{i-1},l_{i})}\right)\right)}{\sum_{y'\in Y'}\exp\left(\sum_{i}\left(W_{CRF}^{l'_{i}}h_{i}+b_{CRF}^{(l'_{i-1},l'_{i})}\right)\right)}$$

wherein s represents a sentence; $W_{CRF}^{l_{i}}$ is the model parameter for tag $l_{i}$; $b_{CRF}^{(l_{i-1},l_{i})}$ is the bias parameter for the tag pair $l_{i-1}$ and $l_{i}$; $Y'$ represents the set of all possible output tag sequences.
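The normalization over all possible tag sequences Y' can be made concrete with a brute-force sketch (hypothetical toy dimensions; per-tag emission weights W and transition biases b stand in for the CRF parameters, and a real CRF layer would compute the denominator with the forward algorithm rather than by enumeration):

```python
import itertools
import numpy as np

def crf_sequence_prob(H, W, b, y):
    """P(y | s) for hidden states H (n, d), emission weights W (T, d),
    transition biases b (T, T); y is a tag-index sequence of length n."""
    n, T = H.shape[0], W.shape[0]

    def score(tags):
        s = sum(W[tags[i]] @ H[i] for i in range(n))            # emission terms
        s += sum(b[tags[i - 1], tags[i]] for i in range(1, n))  # transition terms
        return s

    # Partition function: sum over every possible tag sequence y' in Y'.
    Z = sum(np.exp(score(t)) for t in itertools.product(range(T), repeat=n))
    return np.exp(score(tuple(y))) / Z
```

Enumeration costs O(T^n); the forward algorithm gives the same Z in O(n·T²), which is what makes CRF training practical.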
The loss value function is calculated as:

$$L=\sum_{i=1}^{N}\log\left(P\left(y_{i}\mid s_{i}\right)\right)+\frac{\lambda}{2}\|\theta\|^{2}$$

wherein $\lambda$ is the $L_{2}$ regularization coefficient, $\theta$ represents the parameter set, and $s_{i}$ and $y_{i}$ respectively represent a sentence and the correct labeling sequence corresponding to that sentence.
The reward is defined as follows: after an action sequence is sampled by the strategy network, the division of the sentence is obtained; the phrases produced by this division are added to the word-granularity LSTM model as phrase-granularity information, yielding the lattice-LSTM representation. This representation is input into the named entity recognition network module, the entity label of each word is obtained through the CRF layer, the labels are decoded, and the reward value is calculated from the recognition result. Because the final recognition result must be obtained before the reward value can be computed, this is a delayed reward, and it is used to guide the update of the strategy network module.
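The delayed-reward update of the strategy network corresponds to the standard REINFORCE rule; the following is a minimal sketch assuming the Bernoulli policy π(a_t|s_t;θ)=σ(W·s_t+b) over the two actions (inside / termination) and treating the scalar reward as, e.g., the negative recognition loss (all names and shapes hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_update(W, b, states, actions, reward, lr=0.01):
    """One REINFORCE step: theta <- theta + lr * R * grad log pi(a_t | s_t).

    states  : (n, d) policy states, one per word of the sentence
    actions : (n,) sampled actions, 1 = termination, 0 = inside
    reward  : scalar delayed reward, known only after the whole sentence
              has been segmented and recognized
    """
    for s_t, a_t in zip(states, actions):
        p = sigmoid(W @ s_t + b)      # probability of the termination action
        grad_logit = a_t - p          # d log pi / d(W s_t + b) for a Bernoulli policy
        W += lr * reward * grad_logit * s_t
        b += lr * reward * grad_logit
    return W, b
```

A positive reward pushes the policy toward the sampled segmentation actions; a negative reward pushes it away, which is exactly how the recognition loss steers the word segmentation.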
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaustively enumerate all embodiments here. Any modification, equivalent replacement, improvement, etc. which comes within the spirit and principles of the invention is desired to be protected by the following claims.

Claims (8)

1. A training method of a Chinese named entity recognition model based on reinforcement learning is characterized by comprising the following steps:
step one: sentence data for training is input into a strategy network module; the strategy network module samples one action for each word in the sentence under each state space, and outputs the action sequence of the whole sentence;
step two: the word segmentation and recombination network divides the sentence according to the action sequence output by the strategy network module, breaking the sentence into phrases, and combines the encoding of each phrase with the encoding vector of the last word of the phrase, so as to obtain the lattice-LSTM representation of the sentence; each word is characterized at character level through an LSTM, each phrase being delimited by the termination actions, with the update formula as follows:

$$\left(c_{t},h_{t}\right)=\phi\left(c_{t-1},h_{t-1},x_{t}\right)$$

wherein $\phi()$ represents the transfer function of the LSTM; $x_{t}$ represents the encoding vector of the word input at time t of the sentence; $c_{t}$ and $h_{t}$ respectively represent the cell state and the hidden state at time t;
after the division of the sentence is completed, the phrase information is integrated into a word-granularity LSTM model, which follows the basic recurrent LSTM function:

$$\begin{bmatrix}i_{j}^{c}\\f_{j}^{c}\\o_{j}^{c}\\\tilde{c}_{j}^{c}\end{bmatrix}=\begin{bmatrix}\sigma\\\sigma\\\sigma\\\tanh\end{bmatrix}\left(W^{cT}\begin{bmatrix}x_{j}^{c}\\h_{j-1}^{c}\end{bmatrix}+b^{c}\right)$$

$$c_{j}^{c}=f_{j}^{c}\odot c_{j-1}^{c}+i_{j}^{c}\odot\tilde{c}_{j}^{c}$$

$$h_{j}^{c}=o_{j}^{c}\odot\tanh\left(c_{j}^{c}\right)$$

wherein $x_{j}^{c}$ represents the encoding vector of the j-th word in the sentence; $h_{j-1}^{c}$ represents the hidden state at the (j-1)-th word of the sentence; $W^{cT}$ and $b^{c}$ are model parameters; $i_{j}^{c}$, $f_{j}^{c}$ and $o_{j}^{c}$ respectively represent the input, forget and output gates; $\tilde{c}_{j}^{c}$ represents the new candidate state; $c_{j-1}^{c}$ represents the cell state of the (j-1)-th word of the sentence; $c_{j}^{c}$ represents the updated cell state; $h_{j}^{c}$ represents the hidden state at the j-th word of the sentence, determined by the output gate $o_{j}^{c}$ and the cell state $c_{j}^{c}$ at the present moment; $\sigma()$ represents the sigmoid function, and $\tanh()$ represents the hyperbolic tangent activation function;
the phrase information is characterized by an LSTM model without an output gate; the specific formula is as follows:

$$\begin{bmatrix}i_{b,e}^{w}\\f_{b,e}^{w}\\\tilde{c}_{b,e}^{w}\end{bmatrix}=\begin{bmatrix}\sigma\\\sigma\\\tanh\end{bmatrix}\left(W^{wT}\begin{bmatrix}x_{b,e}^{w}\\h_{b}^{c}\end{bmatrix}+b^{w}\right)$$

$$c_{b,e}^{w}=f_{b,e}^{w}\odot c_{b}^{c}+i_{b,e}^{w}\odot\tilde{c}_{b,e}^{w}$$

wherein $x_{b,e}^{w}$ represents the encoding vector of a phrase in the sentence starting from the b-th word and ending with the e-th word; $h_{b}^{c}$ represents the hidden state at the b-th word of the sentence, i.e. the hidden state at the first word of the phrase; $W^{wT}$ and $b^{w}$ are model parameters; $i_{b,e}^{w}$ and $f_{b,e}^{w}$ respectively represent the input and forget gates; $\tilde{c}_{b,e}^{w}$ represents the new candidate state; $c_{b}^{c}$ represents the cell state at the first word of the phrase; $c_{b,e}^{w}$ represents the updated cell state; $\sigma()$ represents the sigmoid function; $\tanh()$ represents the hyperbolic tangent activation function;
additionally, an extra gate is introduced to select between word-granularity and phrase-granularity information; its inputs are the encoding vector of the word and the cell state of the phrase ending with that word, and the formula is defined as follows:

$$i_{b,e}^{l}=\sigma\left(W^{lT}\left[x_{e}^{c};c_{b,e}^{w}\right]+b^{l}\right)$$

wherein $x_{e}^{c}$ represents the encoding vector of the e-th word in the sentence; $c_{b,e}^{w}$ represents the cell state of the phrase starting from the b-th word and ending with the e-th word, i.e. the cell state of a phrase ending with the e-th word in the sentence; $W^{lT}$ and $b^{l}$ are model parameters; $i_{b,e}^{l}$ represents the additional gate; $\sigma()$ represents the sigmoid function;
the update of the cell state is thereby changed, while the update of the hidden state remains unchanged; the final cell-state representation of the lattice-LSTM model is:

$$c_{j}^{c}=\sum_{b}\alpha_{b,j}^{c}\odot c_{b,j}^{w}+\alpha_{j}^{c}\odot\tilde{c}_{j}^{c}$$

where the gate values are normalized so that they sum to one:

$$\alpha_{b,j}^{c}=\frac{\exp\left(i_{b,j}^{l}\right)}{\exp\left(i_{j}^{c}\right)+\sum_{b'}\exp\left(i_{b',j}^{l}\right)},\qquad\alpha_{j}^{c}=\frac{\exp\left(i_{j}^{c}\right)}{\exp\left(i_{j}^{c}\right)+\sum_{b'}\exp\left(i_{b',j}^{l}\right)}$$

wherein $i_{j}^{c}$ is the input gate vector of the j-th word; $i_{b,j}^{l}$ is the input gate vector of the phrase starting with b and ending with j; $c_{b,j}^{w}$ is the phrase cell state; $\tilde{c}_{j}^{c}$ is the new candidate cell state of the word; $\alpha_{b,j}^{c}$ is the phrase information vector; $\alpha_{j}^{c}$ is the word information vector;
step three: the named entity recognition network inputs the hidden states obtained from the word segmentation and recombination network into a conditional random field layer to finally obtain the named entity recognition result; a loss value is calculated according to the recognition result and used for training the named entity recognition model, and at the same time the loss value serves as a delayed reward to guide the update of the strategy network module;
the sentence is characterized by the lattice-LSTM model, yielding the hidden state vector $h_{i}$ of each word in the sentence; the state vector sequence $H=\{h_{1},h_{2},\ldots,h_{n}\}$ is then input into the conditional random field layer; let $y=l_{1},l_{2},\ldots,l_{n}$ denote the output tag sequence of the conditional random field layer; the probability of an output tag sequence is calculated by:

$$P(y\mid s)=\frac{\exp\left(\sum_{i}\left(W_{CRF}^{l_{i}}h_{i}+b_{CRF}^{(l_{i-1},l_{i})}\right)\right)}{\sum_{y'\in Y'}\exp\left(\sum_{i}\left(W_{CRF}^{l'_{i}}h_{i}+b_{CRF}^{(l'_{i-1},l'_{i})}\right)\right)}$$

wherein s represents a sentence; $W_{CRF}^{l_{i}}$ is the model parameter for tag $l_{i}$; $b_{CRF}^{(l_{i-1},l_{i})}$ is the bias parameter for the tag pair $l_{i-1}$ and $l_{i}$; $Y'$ represents the set of all possible output tag sequences;
the loss value function is calculated as:

$$L=\sum_{i=1}^{N}\log\left(P\left(y_{i}\mid s_{i}\right)\right)+\frac{\lambda}{2}\|\theta\|^{2}$$

wherein $\lambda$ is the $L_{2}$ regularization coefficient; $\theta$ represents the parameter set; $s_{i}$ and $y_{i}$ respectively represent a sentence and the correct labeling sequence corresponding to that sentence; $P$ denotes the probability that the sentence $s$ is labeled as sequence $y$, i.e. the probability that the labeling is correct.
2. The training method of a reinforcement learning-based Chinese named entity recognition model of claim 1, wherein in said step one, said actions comprise inside or termination, and the formula of the random strategy is as follows:

$$\pi\left(a_{t}\mid s_{t};\theta\right)=\sigma\left(W\cdot s_{t}+b\right)$$

wherein $\pi(a_{t}\mid s_{t};\theta)$ represents the probability of selecting action $a_{t}$; $\theta=\{W,b\}$ represents the parameters of the policy network; $s_{t}$ is the state of the strategy network at time t; $\sigma()$ represents the sigmoid function; W and b denote network parameters.
3. The training method of a Chinese named entity recognition model based on reinforcement learning according to claim 1, wherein before said step one, the named entity recognition network and its network parameters are pre-trained, the words used by the named entity recognition network being obtained by dividing the original sentence through a simple heuristic algorithm;
the pre-trained parameters of the entity recognition network are temporarily fixed as the network parameters of the named entity recognition network, the strategy network is then pre-trained, and finally the parameters of the whole network are trained jointly.
4. A Chinese named entity recognition model based on reinforcement learning, characterized by comprising a strategy network module, a word segmentation and recombination network, and a named entity recognition network module, trained by the training method of any one of claims 1 to 3;
the strategy network module is used for sampling an action for each word in the sentence under each state space by adopting a random strategy, so as to obtain an action sequence for the whole sentence;
the word segmentation and recombination network is used for dividing sentences according to the action sequence output by the strategy network module, breaking the sentences into phrases, and combining the codes of the phrases with the code vector of the last word of the phrases so as to obtain the lattice-LSTM expression of the sentences;
and the named entity recognition network module is used for inputting the hidden states of the lattice-LSTM representation of the sentence into the conditional random field, finally obtaining the named entity recognition result, calculating a loss value according to the recognition result to train the named entity recognition model, and simultaneously guiding the update of the strategy network module by taking the loss value as a delayed reward.
5. The reinforcement-learning-based Chinese named entity recognition model of claim 4, wherein said actions comprise inside or termination.
6. The reinforcement-learning-based Chinese named entity recognition model of claim 4, wherein said random strategy is:

$$\pi\left(a_{t}\mid s_{t};\theta\right)=\sigma\left(W\cdot s_{t}+b\right)$$

wherein $\pi(a_{t}\mid s_{t};\theta)$ represents the probability of selecting action $a_{t}$; $\theta=\{W,b\}$ represents the parameters of the policy network; $s_{t}$ is the state of the strategy network at time t; $\sigma()$ represents the sigmoid function; W and b denote network parameters.
7. The reinforcement learning-based Chinese named entity recognition model of claim 6, wherein the word segmentation and recombination network cuts the sentence according to the action sequence output by the strategy network module to obtain phrases, and inputs the encoding of each phrase as the cell state at the last word of the corresponding phrase, so as to obtain the lattice-LSTM representation of the sentence.
8. The reinforcement learning-based Chinese named entity recognition model of claim 7, wherein the named entity recognition network module inputs the output of the lattice-LSTM obtained by the word segmentation and recombination network into a conditional random field layer, scores each labeling sequence of the sentence using the feature function set of the conditional random field layer, exponentiates and normalizes the scores, evaluates all possible labeling sequences using a first-order Viterbi algorithm, and takes the labeling sequence with the highest score as the final output; meanwhile, a loss function is defined, the loss value is back-propagated for parameter training, and the loss value is also used as a delayed reward to update the strategy network module; the loss function is defined as the sentence-level log-likelihood with an $L_{2}$ regularization term, as follows:

$$L=\sum_{i=1}^{N}\log\left(P\left(y_{i}\mid s_{i}\right)\right)+\frac{\lambda}{2}\|\theta\|^{2}$$

wherein $\lambda$ is the $L_{2}$ regularization coefficient; $\theta$ represents the parameter set; $s_{i}$ and $y_{i}$ respectively represent a sentence and the correct labeling sequence corresponding to that sentence; $P$ denotes the probability that the sentence $s$ is labeled as sequence $y$, i.e. the probability that the labeling is correct.
CN201911089295.3A 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof Active CN110826334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911089295.3A CN110826334B (en) 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof


Publications (2)

Publication Number Publication Date
CN110826334A CN110826334A (en) 2020-02-21
CN110826334B true CN110826334B (en) 2023-04-21

Family

ID=69553722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911089295.3A Active CN110826334B (en) 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof

Country Status (1)

Country Link
CN (1) CN110826334B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476031A (en) * 2020-03-11 2020-07-31 重庆邮电大学 Improved Chinese named entity recognition method based on Lattice-LSTM
CN111666734B (en) * 2020-04-24 2021-08-10 北京大学 Sequence labeling method and device
CN111951959A (en) * 2020-08-23 2020-11-17 云知声智能科技股份有限公司 Dialogue type diagnosis guiding method and device based on reinforcement learning and storage medium
CN112163089B (en) * 2020-09-24 2023-06-23 中国电子科技集团公司第十五研究所 High-technology text classification method and system integrating named entity recognition
CN112699682B (en) * 2020-12-11 2022-05-17 山东大学 Named entity identification method and device based on combinable weak authenticator
CN113051921B (en) * 2021-03-17 2024-02-20 北京智慧星光信息技术有限公司 Internet text entity identification method, system, electronic equipment and storage medium
CN112966517B (en) * 2021-04-30 2022-02-18 平安科技(深圳)有限公司 Training method, device, equipment and medium for named entity recognition model
CN114004233B (en) * 2021-12-30 2022-05-06 之江实验室 Remote supervision named entity recognition method based on semi-training and sentence selection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117472A (en) * 2018-11-12 2019-01-01 新疆大学 A kind of Uighur name entity recognition method based on deep learning
CN109255119A (en) * 2018-07-18 2019-01-22 五邑大学 A kind of sentence trunk analysis method and system based on the multitask deep neural network for segmenting and naming Entity recognition
CN109597876A (en) * 2018-11-07 2019-04-09 中山大学 A kind of more wheels dialogue answer preference pattern and its method based on intensified learning
CN109657239A (en) * 2018-12-12 2019-04-19 电子科技大学 The Chinese name entity recognition method learnt based on attention mechanism and language model




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant