CN110826334A - Chinese named entity recognition model based on reinforcement learning and training method thereof - Google Patents

Chinese named entity recognition model based on reinforcement learning and training method thereof

Info

Publication number
CN110826334A
Authority
CN
China
Prior art keywords
word
sentence
network
named entity
entity recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911089295.3A
Other languages
Chinese (zh)
Other versions
CN110826334B (en)
Inventor
叶梅
卓汉逵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201911089295.3A priority Critical patent/CN110826334B/en
Publication of CN110826334A publication Critical patent/CN110826334A/en
Application granted granted Critical
Publication of CN110826334B publication Critical patent/CN110826334B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a Chinese named entity recognition model based on reinforcement learning and a training method thereof. The model comprises a strategy (policy) network module, a word segmentation and recombination network, and a named entity recognition network module. First, the strategy network specifies an action sequence; the word segmentation and recombination network then executes the actions one by one, and each "terminate" action yields a phrase. The phrases are used as auxiliary input information for lattice-LSTM modeling to obtain a hidden state sequence, the hidden states are input into the named entity recognition network to obtain the label sequence of the sentence, and the recognition result is used as a delayed reward to guide the update of the strategy network module. The method uses reinforcement learning to segment sentences effectively, avoids modeling redundant interfering words matched in the sentence, avoids dependence on an external dictionary and the influence of long texts, makes better use of correct word information, and thus helps the Chinese named entity recognition model improve its recognition effect.

Description

Chinese named entity recognition model based on reinforcement learning and training method thereof
Technical Field
The invention relates to the field of machine learning, in particular to a Chinese named entity recognition model based on reinforcement learning and a training method thereof.
Background
Named Entity Recognition (NER) is a basic task in natural language processing: identifying named referents in text. It lays the foundation for tasks such as relation extraction, question answering, syntactic analysis and machine translation, and plays an important role in putting natural language processing technology into practical use. In general, the named entity recognition task identifies three major classes (entities, times and numbers) and seven minor classes (person names, organization names, place names, times, dates, currencies and percentages) in the text to be processed.
An existing Chinese named entity recognition model is lattice-LSTM. In addition to each character of the sentence, it also takes as input the cell vectors of all potential words that end at that character; the selection of these potential words depends on an external dictionary. A supplementary gate is added to choose between character-granularity and word-granularity information, so that the input at each step changes from (character information, previous hidden state, previous cell state) to (character information, previous hidden state, and the information of all dictionary words ending at that character). The advantage of this model is that explicit word information can be exploited in a character-sequence labeling model without suffering from word segmentation errors.
However, precisely because the lattice-LSTM model uses the information of all matched words in a sentence, any word formed by adjacent characters that happens to appear in the external dictionary is fed into the model as word-granularity information, even though it is not necessarily a correct division of the sentence. For example, for the sentence "Nanjing Yangtze River Bridge" (南京市长江大桥), the model takes every dictionary entry formed by its characters as input, so "Nanjing" (南京), "Nanjing City" (南京市), "mayor" (市长), "Changjiang Bridge" and "Changjiang River Bridge" are all treated as lexicon words; but "mayor" is clearly an interfering word in this sentence, and its word information has a negative influence on entity recognition. In addition, the external dictionary usually has to be constructed from the data set used in the experiments, so the model depends heavily on it. Meanwhile, as the text length grows, the number of potential words in a sentence increases and the complexity of the model rises sharply.
Disclosure of Invention
The invention aims to solve the problems of the prior art, namely that redundant interfering words matched in a sentence are modeled, that an external dictionary is required, and that long texts degrade performance, and provides a Chinese named entity recognition model based on reinforcement learning and a training method thereof. In this way the input of interfering words and the use of an external dictionary are avoided, the number of candidate words in a sentence does not explode as the text length grows, and correct word information can be exploited to help the Chinese named entity recognition model improve its recognition accuracy.
In order to solve the above technical problems, the invention adopts the following technical scheme: a Chinese named entity recognition model based on reinforcement learning is provided, comprising a strategy (policy) network module, a word segmentation and recombination network, and a named entity recognition network module;
the strategy network module is used for sampling an action for each character of the sentence in each state with a stochastic policy, so as to obtain an action sequence for the whole sentence, and for receiving a delayed reward computed from the recognition result of the named entity recognition network to guide its own update;
the word segmentation and recombination network is used for dividing the sentence according to the action sequence output by the strategy network module, cutting the sentence into phrases, and combining the encoding of each phrase with the encoding vector of its last character, so as to obtain the lattice-LSTM representation of the sentence;
and the named entity recognition network module is used for inputting the hidden states of the lattice-LSTM representation of the sentence into a CRF (conditional random field) layer to obtain the named entity recognition result, computing a loss value from the recognition result to train the named entity recognition model, and simultaneously using the loss value as a delayed reward to guide the update of the strategy network module.
Preferably, the actions comprise 'inside' and 'terminate'.
Preferably, the stochastic policy is:
π(a_t | s_t; θ) = σ(W·s_t + b)
where π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the strategy network; s_t denotes the state of the network at time t; σ(·) denotes the sigmoid function.
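As a minimal sketch of this sampling step (assuming a two-action space with 0 = 'inside' and 1 = 'terminate', and a state vector already built from the character encoding and its left context; the names and shapes are illustrative, not part of the invention):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_action_sequence(states, W, b):
    """Sample one action per character with pi(a_t | s_t; theta) = sigmoid(W . s_t + b).
    states: array (n_chars, state_dim); returns the sampled actions (0 = inside,
    1 = terminate) and the log-probability of the sequence, kept for the later
    delayed-reward update of the strategy network."""
    actions, log_prob = [], 0.0
    for s_t in states:
        p_term = 1.0 / (1.0 + np.exp(-(W @ s_t + b)))   # probability of "terminate"
        a_t = int(rng.random() < p_term)
        actions.append(a_t)
        log_prob += np.log(p_term if a_t == 1 else 1.0 - p_term)
    return actions, log_prob
```

For a sentence of n characters this yields an action sequence of length n, which the word segmentation and recombination network then executes.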
Preferably, the word segmentation and recombination network cuts the sentence into phrases according to the action sequence output by the strategy network module, encodes each phrase, and feeds each encoding in as an additional cell state at the last character of the corresponding phrase, so as to obtain the lattice-LSTM representation of the sentence.
Preferably, the named entity recognition network module inputs the output of the lattice-LSTM obtained by the word segmentation and recombination network into a CRF layer, scores every possible labeled sequence of the sentence with the feature function set of the CRF layer, exponentiates and normalizes the scores, computes the best labeled sequence with the first-order Viterbi algorithm and takes the highest-scoring sequence as the final output, performs parameter training by back-propagating the value of the loss function, and simultaneously uses the loss value as the delayed reward that updates the strategy network module; the loss function is defined as the sentence-level log-likelihood with an L2 regularization term, as follows:
L(θ) = -Σ_i log P(y_i | s_i; θ) + (λ/2)·||θ||²
where λ is the L2 regularization coefficient; θ denotes the parameter set; s and y denote a sentence and its corresponding labeled sequence, respectively.
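The first-order Viterbi decoding mentioned above can be sketched as follows; the emission matrix (the per-character label scores) and the transition matrix are assumed to be given, and the tag indices are placeholders for whatever label scheme the CRF layer uses:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """First-order Viterbi search.
    emissions: (n, num_tags) per-character label scores.
    transitions: (num_tags, num_tags) score of moving from tag i to tag j.
    Returns the highest-scoring label sequence as a list of tag indices."""
    n, num_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((n, num_tags), dtype=int)
    for t in range(1, n):
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)      # best previous tag for each current tag
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for t in range(n - 1, 0, -1):              # follow the back-pointers
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```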
The training method is used for training the Chinese named entity recognition model and comprises the following steps:
the method comprises the following steps: inputting sentence data for training into a strategy network module, wherein the strategy network module samples one action for each word in a sentence under each state space and outputs an action sequence of the whole sentence;
step two: the word segmentation and recombination network divides sentences according to the action sequence output by the strategy network module, cuts the sentences into phrases, and combines the encoding of the phrases with the encoding vector of the last word of the phrase to obtain the lattice-LSTM representation of the word;
step three: inputting the hidden state obtained by the named entity recognition network from the word segmentation and recombination network into a CRF layer, finally obtaining a named entity recognition result, calculating a loss value according to the recognition result to train a named entity recognition model, and simultaneously using the loss value as a delay reward to guide the updating of the strategy network module;
the sentence is characterized by a lattice-LSTM model, and a hidden state vector h of each word in the sentence is obtainediThen, the state vector sequence H is set to { H ═ H1,h2,…,hnInputting into CRF layer; let y equal to l1,l2,…,lnRepresenting the output label of the CRF layer, and calculating the probability of the output label sequence by the following formula:
Figure BDA0002266384690000041
wherein s represents a sentence;is directed toiThe model parameters of (1);
Figure BDA0002266384690000043
is directed toi-1 and liThe bias parameter of (2); y' represents all possible output tag sets.
The formula for the function of the loss value is:
Figure BDA0002266384690000044
wherein λ is L2A regularization term coefficient; theta meterShowing parameter sets; s and y respectively represent a sentence and a correct labeling sequence corresponding to the sentence; p represents the probability that the sentence s is labeled as the sequence y, i.e., the probability of labeling the correct.
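To make the two formulas above concrete, the following brute-force sketch computes log P(y|s) by enumerating all label sequences and then adds the L2 term; it is meant only as an illustration for short sentences (a real implementation would use the forward algorithm), and the params argument, the list of parameter arrays to regularize, is an assumption about how θ is passed:

```python
import itertools
import numpy as np

def sentence_log_prob(emissions, transitions, labels):
    """log P(y|s), where score(y) = sum_i (emission[i, y_i] + transition[y_{i-1}, y_i])
    and the normalizer sums exp(score) over every possible label sequence."""
    n, num_tags = emissions.shape

    def score(seq):
        s = emissions[0, seq[0]]
        for i in range(1, n):
            s += transitions[seq[i - 1], seq[i]] + emissions[i, seq[i]]
        return s

    log_z = np.log(sum(np.exp(score(seq))
                       for seq in itertools.product(range(num_tags), repeat=n)))
    return score(labels) - log_z

def crf_loss(emissions, transitions, labels, params, lam=1e-4):
    """Negative sentence-level log-likelihood plus (lambda / 2) * ||theta||^2."""
    l2 = 0.5 * lam * sum(float((p ** 2).sum()) for p in params)
    return -sentence_log_prob(emissions, transitions, labels) + l2
```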
Preferably, in step one, the actions comprise 'inside' and 'terminate', and the stochastic policy is:
π(a_t | s_t; θ) = σ(W·s_t + b)
where π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the strategy network; s_t denotes the state of the network at time t; σ(·) denotes the sigmoid function.
Preferably, in step two, each character is first represented at the character level by an LSTM, with the update:
(c_t, h_t) = f_LSTM(x_t, c_{t-1}, h_{t-1})
where f_LSTM denotes the LSTM transfer function; x_t denotes the encoding vector of the character input at time t of the sentence; c_t and h_t denote the cell state and the hidden state at time t, respectively.
After the division of the sentence is completed, the phrase information is integrated into a character-granularity LSTM model; the basic recurrent LSTM functions are as follows:
[i_j^c; f_j^c; o_j^c; c̃_j^c] = [σ; σ; σ; tanh](W^c·[x_j^c; h_{j-1}^c] + b^c)
c_j^c = f_j^c ⊙ c_{j-1}^c + i_j^c ⊙ c̃_j^c
h_j^c = o_j^c ⊙ tanh(c_j^c)
where x_j^c denotes the encoding vector of the j-th character of the sentence; h_{j-1}^c denotes the hidden state at character j-1; W^c and b^c are model parameters; i_j^c, f_j^c and o_j^c denote the input, forget and output gates, respectively; c̃_j^c denotes the new candidate state; c_{j-1}^c denotes the cell state at character j-1; c_j^c denotes the updated cell state; h_j^c denotes the hidden state at character j, determined by the output gate o_j^c and the current cell state c_j^c; σ(·) denotes the sigmoid function; tanh(·) denotes the hyperbolic tangent activation function.
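A minimal numpy sketch of one character-level LSTM step defined by the equations above; the weight matrix W_c stacks the three gates and the candidate state so that a single matrix product produces all of them (the shapes are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def char_lstm_step(x_j, h_prev, c_prev, W_c, b_c):
    """One character-level step:
    [i; f; o; c_tilde] = [sigmoid; sigmoid; sigmoid; tanh](W_c . [x_j; h_prev] + b_c)
    c_j = f * c_prev + i * c_tilde,   h_j = o * tanh(c_j).
    W_c has shape (4d, x_dim + d) and b_c has shape (4d,)."""
    d = h_prev.shape[0]
    z = W_c @ np.concatenate([x_j, h_prev]) + b_c
    i, f, o = sigmoid(z[:d]), sigmoid(z[d:2 * d]), sigmoid(z[2 * d:3 * d])
    c_tilde = np.tanh(z[3 * d:])
    c_j = f * c_prev + i * c_tilde
    h_j = o * np.tanh(c_j)
    return h_j, c_j
```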
The phrase information is characterized by an LSTM model without an output gate, as follows:
[i_{b,e}^w; f_{b,e}^w; c̃_{b,e}^w] = [σ; σ; tanh](W^w·[x_{b,e}^w; h_b^c] + b^w)
c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ c̃_{b,e}^w
where x_{b,e}^w denotes the encoding vector of the phrase starting at the b-th character and ending at the e-th character of the sentence; h_b^c denotes the hidden state at the b-th character, i.e., at the first character of the phrase; W^w and b^w are model parameters; i_{b,e}^w and f_{b,e}^w denote the input and forget gates, respectively; c̃_{b,e}^w denotes the new candidate state; c_b^c denotes the cell state at the first character of the phrase; c_{b,e}^w denotes the updated cell state; σ(·) denotes the sigmoid function; tanh(·) denotes the hyperbolic tangent activation function.
In addition, an additional gate is added to select between character-granularity and word-granularity information; its input is the encoding vector of a character and the cell state of a phrase ending at that character:
i_{b,e}^l = σ(W^l·[x_e^c; c_{b,e}^w] + b^l)
where x_e^c denotes the encoding vector of the e-th character of the sentence; c_{b,e}^w denotes the cell state of the phrase starting at the b-th character and ending at the e-th character, i.e., of a phrase whose last character is the e-th character; W^l and b^l are model parameters; i_{b,e}^l denotes the additional gate; σ(·) denotes the sigmoid function.
The cell state update is thus changed, while the hidden state update remains unchanged; the final lattice-LSTM characterization is:
c_j^c = Σ_b α_{b,j}^c ⊙ c_{b,j}^w + α_j^c ⊙ c̃_j^c
where i_j^c is the input gate vector of the j-th character; i_{b,j}^l is the additional gate vector of a phrase starting at b and ending at j; c_{b,j}^w is the phrase cell state; c̃_j^c is the new candidate cell state of the character; α_{b,j}^c and α_j^c are the weights obtained by normalizing i_{b,j}^l and i_j^c, so that α_{b,j}^c ⊙ c_{b,j}^w is the phrase information vector and α_j^c ⊙ c̃_j^c is the character information vector.
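The phrase cell and the extra gate can be sketched as below, following the formulation above; treating the weighted sum as a softmax normalization of the character input gate and the phrase gates is an interpretation of α_{b,j}^c and α_j^c, and the helper names are illustrative. When no phrase ends at a character, the plain character-level step above is used instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def phrase_cell(x_w, h_b, c_b, W_w, b_w):
    """Phrase (word) cell without an output gate:
    [i; f; c_tilde] = [sigmoid; sigmoid; tanh](W_w . [x_w; h_b] + b_w),
    c^w_{b,e} = f * c_b + i * c_tilde."""
    d = h_b.shape[0]
    z = W_w @ np.concatenate([x_w, h_b]) + b_w
    i, f, c_tilde = sigmoid(z[:d]), sigmoid(z[d:2 * d]), np.tanh(z[2 * d:])
    return f * c_b + i * c_tilde

def lattice_cell(i_char, c_tilde, x_e, phrase_cells, W_l, b_l):
    """Combine the character candidate state with every phrase cell ending at this
    character: an extra gate i^l_{b,e} = sigmoid(W_l . [x_e; c^w_{b,e}] + b_l) is computed
    per phrase, the gates are normalized into weights alpha, and the cell state becomes
    the weighted sum  c_j = sum_b alpha_b * c^w_{b,j} + alpha * c_tilde."""
    gates = [sigmoid(W_l @ np.concatenate([x_e, c_w]) + b_l) for c_w in phrase_cells]
    stacked = np.exp(np.stack([i_char] + gates))          # (1 + n_phrases, d)
    alpha = stacked / stacked.sum(axis=0, keepdims=True)  # element-wise normalization
    c_j = alpha[0] * c_tilde
    for a, c_w in zip(alpha[1:], phrase_cells):
        c_j += a * c_w
    return c_j
```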
Preferably, before step one is carried out, the named entity recognition network and its parameters are pre-trained; at this stage, the words used by the named entity recognition network are obtained by dividing the original sentences with a simple heuristic algorithm;
the pre-trained parameters are then kept, temporarily, as the parameters of the named entity recognition network while the strategy network is pre-trained, and finally all network parameters are trained jointly.
Compared with the prior art, the invention has the beneficial effects that: the Chinese named entity recognition model based on reinforcement learning and the method thereof effectively divide sentences by utilizing reinforcement learning, avoid modeling redundant interference words matched in the sentences and effectively avoid dependence on an external dictionary and influence of long texts.
Drawings
FIG. 1 is a schematic diagram of a Chinese named entity recognition model based on reinforcement learning according to the present invention;
FIG. 2 is a schematic diagram of a strategy network module of a Chinese named entity recognition model based on reinforcement learning according to the present invention;
FIG. 3 is a schematic diagram of the named entity recognition network module of the Chinese named entity recognition model based on reinforcement learning according to the present invention;
FIG. 4 is a flow chart of a training method of a Chinese named entity recognition model based on reinforcement learning according to the present invention;
FIG. 5 is an exemplary diagram of sentence segmentation in the training method of the Chinese named entity recognition model based on reinforcement learning according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there are terms such as "upper", "lower", "left", "right", "long", "short", etc., indicating orientations or positional relationships based on the orientations or positional relationships shown in the drawings, it is only for convenience of description and simplicity of description, but does not indicate or imply that the device or element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationships in the drawings are only used for illustrative purposes and are not to be construed as limitations of the present patent, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
The technical scheme of the invention is further described in detail by the following specific embodiments in combination with the attached drawings:
example 1
As shown in FIGS. 1-3, an embodiment of a Chinese named entity recognition model based on reinforcement learning comprises a strategy network module, a word segmentation and recombination network and a named entity recognition network module;
the strategy network module is used for sampling an action (action comprises internal or termination) for each word in the sentence under each state space by adopting a random strategy so as to obtain an action sequence for the whole sentence, and obtaining a delay reward according to the recognition result of the Chinese named entity recognition network so as to guide the strategy network module to update; the random strategy is:
π(at|st;θ)=σ(W*st+b)
wherein ,π(at|st(ii) a θ) represents the selection action atThe probability of (d); θ ═ W, b, representing parameters of the policy network; stAnd the state of the network is optimized at the moment t.
The word segmentation and recombination network is used for dividing the sentence according to the action sequence output by the strategy network module, cutting it into phrases, and combining the encoding of each phrase with the encoding vector of its last character, so as to obtain the lattice-LSTM representation of the sentence;
specifically, the word segmentation and recombination network cuts the sentence into phrases according to the action sequence, encodes each phrase, and feeds each encoding in as an additional cell state at the last character of the corresponding phrase to obtain the lattice-LSTM representation of the sentence.
The named entity recognition network module is used for inputting the hidden states of the lattice-LSTM representation of the sentence into the conditional random field layer to obtain the named entity recognition result, computing a loss value from the recognition result to train the named entity recognition model, and simultaneously using the loss value as a delayed reward to guide the update of the strategy network module. The loss value is computed as:
L(θ) = -Σ_i log P(y_i | s_i; θ) + (λ/2)·||θ||²
where λ is the L2 regularization coefficient; θ denotes the parameter set; s and y denote a sentence and its correct labeled sequence, respectively; P denotes the probability that sentence s is labeled as sequence y, i.e., the probability of the correct labeling.
The working principle of this embodiment is as follows: first, the strategy network specifies an action sequence; the word segmentation and recombination network then executes the actions one by one, and each 'terminate' action yields a phrase; each phrase is used as auxiliary input information at its last character for lattice-LSTM modeling to obtain a hidden state sequence; the hidden states are input into the named entity recognition network to obtain the label sequence of the sentence; and the recognition result is used as a delayed reward to guide the update of the strategy network module.
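The data flow just described can be condensed into the following sketch; the four callables (policy, segmenter, lattice_lstm, crf_decode) are hypothetical interfaces used only to show how the modules are chained, not names defined by the invention:

```python
def recognize(sentence, policy, segmenter, lattice_lstm, crf_decode):
    """Forward pass of the model: sample actions, segment, encode, decode.
    Returns the predicted labels and the log-probability of the sampled actions,
    which is needed later for the delayed-reward update of the strategy network."""
    actions, log_prob = policy(sentence)             # one action per character
    phrases = segmenter(sentence, actions)           # "terminate" closes a phrase
    hidden_states = lattice_lstm(sentence, phrases)  # one hidden vector per character
    labels = crf_decode(hidden_states)               # entity label per character
    return labels, log_prob
```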
The beneficial effects of this embodiment: the embodiment builds on the neural LSTM-CRF model and combines it with a reinforcement learning framework to learn the internal structure of sentences and segment them efficiently; the resulting phrase information is integrated into a character-granularity lattice-LSTM model, so that character-granularity information and the related word-granularity information are fully learned, achieving a better recognition effect.
Example 2
FIG. 4 shows an embodiment of a training method of a Chinese named entity recognition model based on reinforcement learning, which is used for training the model described in embodiment 1 and includes the following steps:
pretreatment: pre-training a named entity recognition network and network parameters thereof, wherein the words used by the named entity recognition network are words obtained by dividing original sentences through a simple heuristic algorithm;
and temporarily setting part of the network parameters of the entity recognition network which are pre-trained as the network parameters of the named entity recognition network, then pre-training the strategy network, and finally jointly training the whole network parameters.
Step one: sentence data for training are input into the strategy network module, which samples one action for each character of the sentence in each state and outputs the action sequence of the whole sentence;
in step one, the states, actions, and policies are defined as follows:
1. State: the encoding vector of the currently input character together with the context vector of the preceding characters;
2. Actions: two different operations are defined, 'inside' and 'terminate';
3. Policy: the stochastic policy is defined as:
π(a_t | s_t; θ) = σ(W·s_t + b)
where π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the strategy network; s_t denotes the state of the network at time t; σ(·) denotes the sigmoid function.
Step two: the word segmentation and recombination network divides the sentence according to the action sequence output by the strategy network module, cuts it into phrases, and combines the encoding of each phrase with the encoding vector of its last character, so as to obtain the lattice-LSTM representation of the sentence;
as shown in FIG. 5, "Washington in the United states" is classified as "United states", "Washington". The character level of the character is characterized by LSTM, and the updating formula is as follows:
Each character is represented at the character level by an LSTM, with the update:
(c_t, h_t) = f_LSTM(x_t, c_{t-1}, h_{t-1})
where f_LSTM denotes the LSTM transfer function; x_t denotes the encoding vector of the character input at time t of the sentence; c_t and h_t denote the cell state and the hidden state at time t, respectively.
After the division of the sentence is completed, the phrase information is integrated into a character-granularity LSTM model; the basic recurrent LSTM functions are as follows:
[i_j^c; f_j^c; o_j^c; c̃_j^c] = [σ; σ; σ; tanh](W^c·[x_j^c; h_{j-1}^c] + b^c)
c_j^c = f_j^c ⊙ c_{j-1}^c + i_j^c ⊙ c̃_j^c
h_j^c = o_j^c ⊙ tanh(c_j^c)
where x_j^c denotes the encoding vector of the j-th character of the sentence; h_{j-1}^c denotes the hidden state at character j-1; W^c and b^c are model parameters; i_j^c, f_j^c and o_j^c denote the input, forget and output gates, respectively; c̃_j^c denotes the new candidate state; c_{j-1}^c denotes the cell state at character j-1; c_j^c denotes the updated cell state; h_j^c denotes the hidden state at character j, determined by the output gate o_j^c and the current cell state c_j^c; σ(·) denotes the sigmoid function; tanh(·) denotes the hyperbolic tangent activation function.
The phrase information is characterized by an LSTM model without an output gate, as follows:
[i_{b,e}^w; f_{b,e}^w; c̃_{b,e}^w] = [σ; σ; tanh](W^w·[x_{b,e}^w; h_b^c] + b^w)
c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ c̃_{b,e}^w
where x_{b,e}^w denotes the encoding vector of the phrase starting at the b-th character and ending at the e-th character of the sentence; h_b^c denotes the hidden state at the b-th character, i.e., at the first character of the phrase; W^w and b^w are model parameters; i_{b,e}^w and f_{b,e}^w denote the input and forget gates, respectively; c̃_{b,e}^w denotes the new candidate state; c_b^c denotes the cell state at the first character of the phrase; c_{b,e}^w denotes the updated cell state; σ(·) denotes the sigmoid function; tanh(·) denotes the hyperbolic tangent activation function.
In addition, an additional gate is added to select between character-granularity and word-granularity information; its input is the encoding vector of a character and the cell state of a phrase ending at that character:
i_{b,e}^l = σ(W^l·[x_e^c; c_{b,e}^w] + b^l)
where x_e^c denotes the encoding vector of the e-th character of the sentence; c_{b,e}^w denotes the cell state of the phrase starting at the b-th character and ending at the e-th character, i.e., of a phrase whose last character is the e-th character; W^l and b^l are model parameters; i_{b,e}^l denotes the additional gate; σ(·) denotes the sigmoid function.
The cell state update is thus changed, while the hidden state update remains unchanged; the final lattice-LSTM characterization is:
c_j^c = Σ_b α_{b,j}^c ⊙ c_{b,j}^w + α_j^c ⊙ c̃_j^c
where i_j^c is the input gate vector of the j-th character; i_{b,j}^l is the additional gate vector of a phrase starting at b and ending at j; c_{b,j}^w is the phrase cell state; c̃_j^c is the new candidate cell state of the character; α_{b,j}^c and α_j^c are the weights obtained by normalizing i_{b,j}^l and i_j^c, so that α_{b,j}^c ⊙ c_{b,j}^w is the phrase information vector and α_j^c ⊙ c̃_j^c is the character information vector.
Step three: inputting the hidden state obtained by the named entity recognition network from the word segmentation and recombination network into a CRF layer, finally obtaining a named entity recognition result, calculating a loss value according to the recognition result to train a named entity recognition model, and simultaneously using the loss value as a delay reward to guide the updating of the strategy network module;
the sentence is characterized by a lattice-LSTM model, and a hidden state vector h of each word in the sentence is obtainediThen, the state vector sequence H is set to { H ═ H1,h2,…,hnInputting into CRF layer; let y equal to l1,l2,…,lnRepresenting the output label of the CRF layer, and calculating the probability of the output label sequence by the following formula:
Figure BDA00022663846900001013
wherein s represents a sentence;is directed toiThe model parameters of (1);
Figure BDA00022663846900001015
is directed toi-1 and liThe bias parameter of (2); y' represents all possible output tag sets.
The formula for the function of the loss value is:
Figure BDA00022663846900001016
wherein λ is L2Regular term coefficients, θ represents a parameter set, and s and y represent a sentence and the correct sequence of labels corresponding to the sentence, respectively.
The reward is defined as follows: after the strategy network samples an action sequence, a segmentation of the sentence is obtained; the resulting phrases are added as word-granularity information into the character-granularity LSTM model to obtain the lattice-LSTM representation, which is input into the named entity recognition network module; the entity label of each character is obtained through the CRF layer and decoded, and the reward value is computed from the recognition result. Because the reward can only be computed once the final recognition result is obtained, it is a delayed reward, and it is used to guide the update of the strategy network module.
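The description does not spell out the exact update rule beyond using the loss value as a delayed reward, so the following REINFORCE-style sketch is an assumption: a single delayed reward (e.g. the negative loss) scales the log-probability gradient of the linear-sigmoid policy for every sampled action:

```python
import numpy as np

def reinforce_update(W, b, states, actions, reward, lr=0.01, baseline=0.0):
    """One policy-gradient step for pi(terminate | s; theta) = sigmoid(W . s + b).
    For a Bernoulli policy, d log pi / dz = (a - p), so the gradient w.r.t. W is
    (a - p) * s and w.r.t. b is (a - p); the whole trajectory shares one delayed reward."""
    advantage = reward - baseline
    for s_t, a_t in zip(states, actions):
        p = 1.0 / (1.0 + np.exp(-(W @ s_t + b)))
        W = W + lr * advantage * (a_t - p) * s_t
        b = b + lr * advantage * (a_t - p)
    return W, b
```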
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; the embodiments described here are neither required nor exhaustive. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (9)

1. A Chinese named entity recognition model based on reinforcement learning is characterized by comprising a strategy network module, a word segmentation and recombination network and a named entity recognition network module;
the strategy network module is used for sampling an action for each word in the sentence under each state space by adopting a random strategy, so that an action sequence is obtained for the whole sentence;
the word segmentation and recombination network is used for dividing sentences according to the action sequence output by the strategy network module, breaking the sentences into phrases, and combining the encoding of the phrases with the encoding vector of the last character of the phrases so as to obtain lattice-LSTM expression of the sentences;
and the named entity recognition network module is used for inputting the hidden state expressed by the lattice-LSTM of the sentence into the conditional random field, finally obtaining a named entity recognition result, calculating a loss value according to the recognition result to train a named entity recognition model, and simultaneously using the loss value as a delay reward to guide the updating of the strategy network module.
2. The reinforcement learning-based Chinese named entity recognition model of claim 1, wherein the actions comprise 'inside' and 'terminate'.
3. The reinforcement learning-based Chinese named entity recognition model of claim 1, wherein the stochastic strategy is:
π(a_t | s_t; θ) = σ(W·s_t + b)
where π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the strategy network; s_t denotes the state of the network at time t; σ(·) denotes the sigmoid function; W and b denote the network parameters.
4. The model of claim 3, wherein the word segmentation and recombination network cuts the sentence into phrases according to the action sequence output by the strategy network module, encodes each phrase, and feeds each encoding in as an additional cell state at the last character of the corresponding phrase to obtain the lattice-LSTM representation of the sentence.
5. The reinforcement learning-based Chinese named entity recognition model of claim 4, wherein the named entity recognition network module inputs the output of the lattice-LSTM obtained from the word segmentation and recombination network into the conditional random field layer, scores every labeled sequence of the sentence with the feature function set of the conditional random field layer, exponentiates and normalizes the scores, computes the best labeled sequence with a first-order Viterbi algorithm, and takes the highest-scoring labeled sequence as the final output; a loss function is defined, parameter training is performed by back-propagating the loss value, and the loss value is simultaneously used as the delayed reward that updates the strategy network module; the loss function is defined as the sentence-level log-likelihood with an L2 regularization term, as follows:
L(θ) = -Σ_i log P(y_i | s_i; θ) + (λ/2)·||θ||²
where λ is the L2 regularization coefficient; θ denotes the parameter set; s and y denote a sentence and its correct labeled sequence, respectively; P denotes the probability that sentence s is labeled as sequence y, i.e., the probability of the correct labeling.
6. A training method of Chinese named entity recognition model based on reinforcement learning, which is used for training the Chinese named entity recognition model based on reinforcement learning of any one of claims 1 to 5, and comprises the following steps:
step one: inputting sentence data for training into the strategy network module, wherein the strategy network module samples one action for each character of the sentence in each state and outputs the action sequence of the whole sentence;
step two: the word segmentation and recombination network divides the sentence according to the action sequence output by the strategy network module, cuts it into phrases, and combines the encoding of each phrase with the encoding vector of its last character, thereby obtaining the lattice-LSTM representation of the sentence;
step three: the named entity recognition network inputs the hidden states obtained from the word segmentation and recombination network into the conditional random field layer to obtain the named entity recognition result, computes a loss value from the recognition result to train the named entity recognition model, and simultaneously uses the loss value as a delayed reward to guide the update of the strategy network module;
the sentence is characterized by a lattice-LSTM model, and a hidden state vector h of each word in the sentence is obtainediThen, the state vector sequence H is set to { H ═ H1,h2,…,hnInputting a conditional random field layer; let y equal to l1,l2,…,lnAn output tag representing the conditional random field layer, the output tag sequence probability being calculated by:
Figure FDA0002266384680000022
wherein s represents a sentence;is directed toiThe model parameters of (1);
Figure FDA0002266384680000024
is directed toi-1 and liThe bias parameter of (2); y' represents all possible output tag sets.
The formula for the function of the loss value is:
Figure FDA0002266384680000031
wherein λ is L2A regularization term coefficient; θ represents a parameter set; s and y respectively represent a sentence and a correct labeling sequence corresponding to the sentence; p represents the probability that the sentence s is labeled as the sequence y, i.e., the probability of labeling the correct.
7. The method for training the Chinese named entity recognition model based on reinforcement learning of claim 6, wherein in step one, the actions comprise 'inside' and 'terminate', and the stochastic policy is:
π(a_t | s_t; θ) = σ(W·s_t + b)
where π(a_t | s_t; θ) denotes the probability of selecting action a_t; θ = {W, b} denotes the parameters of the strategy network; s_t denotes the state of the network at time t; σ(·) denotes the sigmoid function; W and b denote the network parameters.
8. The method for training the Chinese named entity recognition model based on reinforcement learning of claim 6, wherein in step two, each character is represented at the character level by an LSTM and the phrases are obtained from the 'terminate' actions, with the update:
(c_t, h_t) = f_LSTM(x_t, c_{t-1}, h_{t-1})
where f_LSTM denotes the LSTM transfer function; x_t denotes the encoding vector of the character input at time t of the sentence; c_t and h_t denote the cell state and the hidden state at time t, respectively.
After the division of the sentence is completed, the phrase information is integrated into a character-granularity LSTM model; the basic recurrent LSTM functions are as follows:
[i_j^c; f_j^c; o_j^c; c̃_j^c] = [σ; σ; σ; tanh](W^c·[x_j^c; h_{j-1}^c] + b^c)
c_j^c = f_j^c ⊙ c_{j-1}^c + i_j^c ⊙ c̃_j^c
h_j^c = o_j^c ⊙ tanh(c_j^c)
where x_j^c denotes the encoding vector of the j-th character of the sentence; h_{j-1}^c denotes the hidden state at character j-1; W^c and b^c are model parameters; i_j^c, f_j^c and o_j^c denote the input, forget and output gates, respectively; c̃_j^c denotes the new candidate state; c_{j-1}^c denotes the cell state at character j-1; c_j^c denotes the updated cell state; h_j^c denotes the hidden state at character j, determined by the output gate o_j^c and the current cell state c_j^c; σ(·) denotes the sigmoid function; tanh(·) denotes the hyperbolic tangent activation function.
The phrase information is characterized by an LSTM model without an output gate, as follows:
[i_{b,e}^w; f_{b,e}^w; c̃_{b,e}^w] = [σ; σ; tanh](W^w·[x_{b,e}^w; h_b^c] + b^w)
c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ c̃_{b,e}^w
where x_{b,e}^w denotes the encoding vector of the phrase starting at the b-th character and ending at the e-th character of the sentence; h_b^c denotes the hidden state at the b-th character, i.e., at the first character of the phrase; W^w and b^w are model parameters; i_{b,e}^w and f_{b,e}^w denote the input and forget gates, respectively; c̃_{b,e}^w denotes the new candidate state; c_b^c denotes the cell state at the first character of the phrase; c_{b,e}^w denotes the updated cell state; σ(·) denotes the sigmoid function; tanh(·) denotes the hyperbolic tangent activation function.
In addition, an additional gate is added to select between character-granularity and word-granularity information; its input is the encoding vector of a character and the cell state of a phrase ending at that character:
i_{b,e}^l = σ(W^l·[x_e^c; c_{b,e}^w] + b^l)
where x_e^c denotes the encoding vector of the e-th character of the sentence; c_{b,e}^w denotes the cell state of the phrase starting at the b-th character and ending at the e-th character, i.e., of a phrase whose last character is the e-th character; W^l and b^l are model parameters; i_{b,e}^l denotes the additional gate; σ(·) denotes the sigmoid function.
The cell state update is thus changed, while the hidden state update remains unchanged; the final lattice-LSTM characterization is:
c_j^c = Σ_b α_{b,j}^c ⊙ c_{b,j}^w + α_j^c ⊙ c̃_j^c
where i_j^c is the input gate vector of the j-th character; i_{b,j}^l is the additional gate vector of a phrase starting at b and ending at j; c_{b,j}^w is the phrase cell state; c̃_j^c is the new candidate cell state of the character; α_{b,j}^c and α_j^c are the weights obtained by normalizing i_{b,j}^l and i_j^c, so that α_{b,j}^c ⊙ c_{b,j}^w is the phrase information vector and α_j^c ⊙ c̃_j^c is the character information vector.
9. The training method of the Chinese named entity recognition model based on reinforcement learning of claim 6, wherein before step one, the named entity recognition network and its parameters are pre-trained, and at this stage the words used by the named entity recognition network are obtained by dividing the original sentences with a simple heuristic algorithm;
the pre-trained parameters are then kept, temporarily, as the parameters of the named entity recognition network while the strategy network is pre-trained, and finally all network parameters are trained jointly.
CN201911089295.3A 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof Active CN110826334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911089295.3A CN110826334B (en) 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911089295.3A CN110826334B (en) 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof

Publications (2)

Publication Number Publication Date
CN110826334A true CN110826334A (en) 2020-02-21
CN110826334B CN110826334B (en) 2023-04-21

Family

ID=69553722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911089295.3A Active CN110826334B (en) 2019-11-08 2019-11-08 Chinese named entity recognition model based on reinforcement learning and training method thereof

Country Status (1)

Country Link
CN (1) CN110826334B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476031A (en) * 2020-03-11 2020-07-31 重庆邮电大学 Improved Chinese named entity recognition method based on Lattice-LSTM
CN111666734A (en) * 2020-04-24 2020-09-15 北京大学 Sequence labeling method and device
CN111951959A (en) * 2020-08-23 2020-11-17 云知声智能科技股份有限公司 Dialogue type diagnosis guiding method and device based on reinforcement learning and storage medium
CN112163089A (en) * 2020-09-24 2021-01-01 中国电子科技集团公司第十五研究所 Military high-technology text classification method and system fusing named entity recognition
CN112699682A (en) * 2020-12-11 2021-04-23 山东大学 Named entity identification method and device based on combinable weak authenticator
CN112966517A (en) * 2021-04-30 2021-06-15 平安科技(深圳)有限公司 Training method, device, equipment and medium for named entity recognition model
CN113051921A (en) * 2021-03-17 2021-06-29 北京智慧星光信息技术有限公司 Internet text entity identification method, system, electronic equipment and storage medium
CN114004233A (en) * 2021-12-30 2022-02-01 之江实验室 Remote supervision named entity recognition method based on semi-training and sentence selection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117472A (en) * 2018-11-12 2019-01-01 新疆大学 Uighur named entity recognition method based on deep learning
CN109255119A (en) * 2018-07-18 2019-01-22 五邑大学 Sentence trunk analysis method and system based on a multi-task deep neural network for word segmentation and named entity recognition
CN109597876A (en) * 2018-11-07 2019-04-09 中山大学 Multi-turn dialogue answer selection model based on reinforcement learning and method thereof
CN109657239A (en) * 2018-12-12 2019-04-19 电子科技大学 Chinese named entity recognition method based on attention mechanism and language model learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255119A (en) * 2018-07-18 2019-01-22 五邑大学 Sentence trunk analysis method and system based on a multi-task deep neural network for word segmentation and named entity recognition
CN109597876A (en) * 2018-11-07 2019-04-09 中山大学 Multi-turn dialogue answer selection model based on reinforcement learning and method thereof
CN109117472A (en) * 2018-11-12 2019-01-01 新疆大学 Uighur named entity recognition method based on deep learning
CN109657239A (en) * 2018-12-12 2019-04-19 电子科技大学 Chinese named entity recognition method based on attention mechanism and language model learning

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476031A (en) * 2020-03-11 2020-07-31 重庆邮电大学 Improved Chinese named entity recognition method based on Lattice-LSTM
CN111666734A (en) * 2020-04-24 2020-09-15 北京大学 Sequence labeling method and device
CN111951959A (en) * 2020-08-23 2020-11-17 云知声智能科技股份有限公司 Dialogue type diagnosis guiding method and device based on reinforcement learning and storage medium
CN112163089A (en) * 2020-09-24 2021-01-01 中国电子科技集团公司第十五研究所 Military high-technology text classification method and system fusing named entity recognition
CN112163089B (en) * 2020-09-24 2023-06-23 中国电子科技集团公司第十五研究所 High-technology text classification method and system integrating named entity recognition
CN112699682A (en) * 2020-12-11 2021-04-23 山东大学 Named entity identification method and device based on combinable weak authenticator
CN112699682B (en) * 2020-12-11 2022-05-17 山东大学 Named entity identification method and device based on combinable weak authenticator
CN113051921A (en) * 2021-03-17 2021-06-29 北京智慧星光信息技术有限公司 Internet text entity identification method, system, electronic equipment and storage medium
CN113051921B (en) * 2021-03-17 2024-02-20 北京智慧星光信息技术有限公司 Internet text entity identification method, system, electronic equipment and storage medium
CN112966517A (en) * 2021-04-30 2021-06-15 平安科技(深圳)有限公司 Training method, device, equipment and medium for named entity recognition model
CN112966517B (en) * 2021-04-30 2022-02-18 平安科技(深圳)有限公司 Training method, device, equipment and medium for named entity recognition model
CN114004233A (en) * 2021-12-30 2022-02-01 之江实验室 Remote supervision named entity recognition method based on semi-training and sentence selection

Also Published As

Publication number Publication date
CN110826334B (en) 2023-04-21

Similar Documents

Publication Publication Date Title
CN110826334A (en) Chinese named entity recognition model based on reinforcement learning and training method thereof
CN108628823B (en) Named entity recognition method combining attention mechanism and multi-task collaborative training
CN110135457B (en) Event trigger word extraction method and system based on self-encoder fusion document information
Yao et al. An improved LSTM structure for natural language processing
CN108416058B (en) Bi-LSTM input information enhancement-based relation extraction method
CN110083831A (en) A kind of Chinese name entity recognition method based on BERT-BiGRU-CRF
CN110866401A (en) Chinese electronic medical record named entity identification method and system based on attention mechanism
CN111767718B (en) Chinese grammar error correction method based on weakened grammar error feature representation
CN112541356B (en) Method and system for recognizing biomedical named entities
CN113177412A (en) Named entity identification method and system based on bert, electronic equipment and storage medium
Wu et al. An effective approach of named entity recognition for cyber threat intelligence
CN116432645A (en) Traffic accident named entity recognition method based on pre-training model
CN109766523A (en) Part-of-speech tagging method and labeling system
Han et al. MAF‐CNER: A Chinese Named Entity Recognition Model Based on Multifeature Adaptive Fusion
CN113360667A (en) Biomedical trigger word detection and named entity identification method based on multitask learning
CN112349294A (en) Voice processing method and device, computer readable medium and electronic equipment
CN115017890A (en) Text error correction method and device based on character pronunciation and character font similarity
CN111291550B (en) Chinese entity extraction method and device
Alkhatlan et al. Attention-based sequence learning model for Arabic diacritic restoration
CN116187304A (en) Automatic text error correction algorithm and system based on improved BERT
CN109960782A (en) A kind of Tibetan language segmenting method and device based on deep neural network
Shahid et al. Next word prediction for Urdu language using deep learning models
CN115240712A (en) Multi-mode-based emotion classification method, device, equipment and storage medium
Brill Pattern-based disambiguation for natural language processing
CN112634878B (en) Speech recognition post-processing method and system and related equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant