CN112257447A

CN112257447A - Named entity recognition system and recognition method based on deep network AS-LSTM

Info

Publication number: CN112257447A
Application number: CN202011140319.6A
Authority: CN
Inventors: 王国鸿
Original assignee: Beijing Zhongbiao Intelligent Technology Co ltd
Current assignee: Beijing Zhongbiao Intelligent Technology Co ltd
Priority date: 2020-10-22
Filing date: 2020-10-22
Publication date: 2021-01-22
Anticipated expiration: 2040-10-22
Also published as: CN112257447B

Abstract

The invention provides a named entity recognition system based on the deep network AS-LSTM. The system comprises a network model BI-AS-LSTM-CRF, which comprises a text feature layer, a context feature layer BI-AS-LSTM and a CRF layer. The context feature layer BI-AS-LSTM comprises 2 AS-LSTM deep networks, which are spliced to form a bidirectional AS-LSTM network. The novel AS-LSTM deep network designed for the named entity recognition system can obtain more stable and accurate cell states for the text before and after a named entity in the input text, and the network learns a weighting that depends only on its own current input. As a result, it can learn context-related semantic representations while being more robust to irrelevant words in the context, which reduces the error of the recognition system.

Description

Named entity recognition system and recognition method based on deep network AS-LSTM

Technical Field

The invention belongs to the field of artificial intelligence natural language processing, relates to a named entity recognition technology in the field of natural language processing, and particularly relates to a named entity recognition system and a named entity recognition method based on a deep network AS-LSTM.

Background

With the development of artificial intelligence technology, machine learning has become one of the most common approaches to natural language processing. Deep learning is a branch of machine learning. Owing to improvements in CPU/GPU computing power and the optimization of deep networks in recent years, it has achieved the best results in almost all subtasks of natural language processing, including dialog systems, named entity recognition and language translation. Among these, Named Entity Recognition (NER) has become one of the most common problems in the field of natural language processing.

Deep learning networks are currently the approach to Named Entity Recognition (NER) recognized by both industry and academia: a training set is obtained through manual labeling, and a deep network is then constructed and trained to obtain an NER recognition model. To date, most NER deep networks use the long short-term memory network (LSTM), but the following problems are found in actual prediction: 1. the deep learning network has weak robustness to the preceding and following context; 2. during model prediction, changing one or two unimportant words in the context often seriously affects the precision of the prediction result; 3. in named entity recognition, manually labeling a training set has high time and labor costs, so the early-stage training set is small, cold start is inefficient, and a fast model iteration cycle cannot be entered.

For example, Chinese patent CN109871541A discloses a named entity recognition method applicable to multiple languages and multiple fields, in which the NER method uses the LSTM long short-term memory network. However, it still needs a large amount of labeled data to improve recognition accuracy, and it cannot solve the insufficient context robustness of LSTM.

As another example, Chinese patent CN111091002A discloses a Chinese named entity recognition method, characterized in that CWS (Chinese word segmentation) and POS (part-of-speech) tag information are used for word relationship inference, and common information related to entity boundaries is extracted from the NER, CWS and POS tagging tasks through adversarial learning. However, it still needs a large amount of labeled information, which greatly increases the time cost and labor cost of recognition.

Therefore, there is a need for improvements to existing named entity recognition techniques.

Disclosure of Invention

The invention aims to improve the accuracy and recall rate of named entity recognition, to overcome the poor context robustness, high time cost and high labor cost of the conventional long short-term memory network (LSTM network) in actual use, and to solve the insufficient sample size and low iteration efficiency of the cold-start training set of conventional deep models. To this end, it provides a named entity recognition system and a named entity recognition method based on the deep network AS-LSTM.

The technical scheme for realizing the purpose of the invention is as follows:

The invention provides a named entity recognition system based on the deep network AS-LSTM, which comprises a network model BI-AS-LSTM-CRF, wherein the network model BI-AS-LSTM-CRF comprises a text feature layer, a context feature layer BI-AS-LSTM and a CRF layer. The text feature layer is used for extracting feature information of the input text, the context feature layer BI-AS-LSTM is used for processing the extracted feature information to obtain an output sequence and the context features, and the CRF layer is used for obtaining the position information and entity labels of the context features in the input text.

The context feature layer BI-AS-LSTM comprises 2 AS-LSTM deep networks, and the 2 AS-LSTM deep networks are spliced to form a bidirectional AS-LSTM network.

In the invention, a novel AS-LSTM deep network is designed for the named entity recognition system. Its weight gate discards the influence of the cell states of the entire context, and the network learns a weighting that depends only on its own current input, so that context-related semantic representations can be learned, robustness to irrelevant words in the context is increased, and the error of the recognition system is reduced.

Further, the AS-LSTM deep network includes a forgetting gate, an output gate, an input gate, and a weight gate, and the weight gate is associated with a current input of text.

In general, a conventional LSTM network is composed of three gates, namely a forgetting gate, an output gate and an input gate. The parameter calculation of these gates depends not only on the current input of the text but also on the previous step of the text, so the semantic relationship of the context can be learned by properly constructing the LSTM. In actual use, however, a model trained with the LSTM may show poor robustness to context: changing one or two irrelevant words before or after the entity can lead to completely different prediction results. For example, the word "Beijing" has the meaning of "place" in many contexts, but an actual training set can hardly cover all scenarios, so prediction errors caused by using the LSTM network are not uncommon.

Therefore, the invention optimizes the LSTM by adding a weight gate to the conventional LSTM, forming a new AS-LSTM deep network. The weight gate depends only on the current input of the AS-LSTM deep network. In use, the AS-LSTM deep network can not only learn the semantic information of the context but also remain strongly robust to interfering information in the preceding and following text, thereby achieving more accurate results in the actual NER (named entity recognition) task.

Furthermore, the named entity recognition system also comprises a Random Replace training method, wherein the Random Replace training method is combined with the AS-LSTM deep network to enable the named entity recognition system to be started in a semi-hot starting mode.

Furthermore, the named entity recognition system also comprises a stock database, which is used for constructing an incremental training set from the training set by the Random Replace training method. Specifically, starting from a small training set, the Random Replace training method randomly replaces the labeled named entities with names from the stock database to construct an incremental training set, and the incremental training set is merged with the initial small training set and then input to the model for training.

In the named entity recognition system, the AS-LSTM deep network is combined with the Random Replace training method and with a stock database. While the related semantic representations of the preceding and following text are kept, the entity names in the training set are randomly replaced to form an incremental training set. Together with the ability of the AS-LSTM deep network to learn its own weighting, this provides a shortcut to rapid model iteration and greatly shortens the cold-start iteration cycle.

The invention also provides a named entity recognition method based on the deep network AS-LSTM, which is applied to the named entity recognition system for recognizing the text, and the starting form of the named entity recognition system formed by the deep network AS-LSTM is cold starting. The named entity identification method comprises the following steps:

S1, constructing a network model BI-AS-LSTM-CRF;

S2, determining a recognition target, and labeling the named entity recognition corpus with distinguishing labels;

S3, dividing the recognition corpus in S2 into a training set and a test set;

S4, inputting the training set into the network model BI-AS-LSTM-CRF for training to obtain a named entity training model;

S5, using the named entity training model in S4 to perform named entity recognition on the test set in S3, and obtaining a recognition result of the named entity recognition model;

S6, calculating and comparing the accuracy and the recall rate, on the test set, of the recognition result of the named entity recognition model obtained in S5.
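The evaluation in step S6 can be sketched as follows. This is only an illustration under stated assumptions: "accuracy" is read here as entity-level precision, an entity counts as correct only when its start, end and label all match, and the function names `extract_spans` and `precision_recall` are hypothetical helpers that do not appear in this description.

```python
def extract_spans(tags):
    """Collect (start, end, label) entity spans from one BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for k, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, k, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = k, tag[2:]
    return spans


def precision_recall(gold_sequences, pred_sequences):
    """Entity-level precision and recall over a test set (assumed metric)."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)      # entities with matching boundaries and label
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


gold = [["B-PRO", "I-PRO", "O", "B-PRO"]]
pred = [["B-PRO", "I-PRO", "O", "O"]]
precision, recall = precision_recall(gold, pred)
```

In this toy case the model finds one of the two labeled entities and predicts nothing spurious, so precision is 1.0 and recall is 0.5.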

To improve the above named entity recognition method and greatly shorten the cold-start iteration cycle, a stock database is added to enlarge the training set: a large incremental training set is generated, which provides a shortcut to rapid model iteration. In this named entity recognition method, the named entity recognition system is formed by combining the deep network AS-LSTM with a stock database and the Random Replace training method, and its starting form is semi-hot start. Specifically, the named entity recognition method comprises the following steps:

S1, constructing a network model BI-AS-LSTM-CRF;

S301, preparing a stock database of the recognition target;

S4, based on the training set in S3, performing random replacement with the stock database in S301 to obtain an incremental training set;

S401, merging the training set in S3 with the incremental training set in S4, and inputting the merged training set into the network model BI-AS-LSTM-CRF of S1 for training to obtain a named entity recognition model;

S5, using the named entity training model obtained in S401 to perform named entity recognition on the test set in S3, and obtaining a recognition result of the named entity recognition model;

S6, calculating and comparing the accuracy and the recall rate, on the test set, of the recognition result of the named entity recognition model in S5.

The construction of the network model BI-AS-LSTM-CRF comprises the steps of extracting feature information of the input text, processing the feature information to obtain an output sequence and the context features of the input text, and labeling the position information of each word in the input text from the context features with BIO tags to obtain the entity labels.

The step of labeling the named entity recognition corpus comprises labeling the beginning of a recognition target with B-PRO and labeling the middle part of the recognition target with I-PRO.
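The B-PRO / I-PRO labeling described above can be illustrated with a short sketch. The function name `bio_tags`, the half-open `(start, end, label)` span format and the English example tokens are assumptions made for illustration; the description does not prescribe a data format.

```python
def bio_tags(num_tokens, spans):
    """Turn labeled entity spans into a BIO tag sequence.

    spans holds (start, end, label) triples with end exclusive;
    tokens outside every span are tagged "O".
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label          # beginning of the target
        for k in range(start + 1, end):
            tags[k] = "I-" + label          # middle part of the target
    return tags


tokens = ["purchase", "of", "office", "chairs", "announced"]
tags = bio_tags(len(tokens), [(2, 4, "PRO")])
# tags == ["O", "O", "B-PRO", "I-PRO", "O"]
```

The same helper covers several entity types at once, for example spans labeled ORG1 and ORG3, because the label is carried in each span.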

In step S3, the ratio of the training set to the test set is from 10:1 to 2:1.

Preferably, in step S3, the ratio of the training set to the test set is 4:1.
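A minimal sketch of the split in step S3 follows, assuming the labeled corpus is a list of samples and that the split is made after a seeded shuffle. The function name `split_corpus` and the shuffling are illustrative assumptions; the description only fixes the ratio.

```python
import random


def split_corpus(samples, train_parts=4, test_parts=1, seed=0):
    """Split a labeled corpus into training and test sets at a given ratio.

    The default 4:1 is the preferred ratio; any ratio from 10:1 to 2:1
    can be passed through train_parts and test_parts.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:n_train], shuffled[n_train:]


train_set, test_set = split_corpus(list(range(100)))
# 100 samples at 4:1 give 80 training samples and 20 test samples.
```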

Compared with the prior art, the invention has the beneficial effects that:

1. A novel AS-LSTM deep network is designed for the named entity recognition system. It can obtain more stable and accurate cell states for the text before and after a named entity in the input text, and the network learns a weighting that depends only on its own current input, so that context-related semantic representations can be learned, robustness to irrelevant words in the preceding and following text is increased, and the error of the recognition system is reduced.

2. The AS-LSTM deep network, the Random Replace training method and the stock database are applied in combination, so that the named entity recognition system is converted from a cold-start mode to a semi-hot-start mode. This greatly shortens the cold-start iteration cycle, improves the iteration efficiency of the model, and makes the prediction result more accurate and stable.

Drawings

In order to more clearly illustrate the technical solution of the embodiment of the present invention, the drawings used in the description of the embodiment will be briefly introduced below. It should be apparent that the drawings in the following description are only for illustrating the embodiments of the present invention or technical solutions in the prior art more clearly, and that other drawings can be obtained by those skilled in the art without any inventive work.

FIG. 1 is a diagram of a conventional LSTM long and short term memory network;

FIG. 2a is a schematic diagram of a forgetting gate in a conventional LSTM network and an AS-LSTM deep network according to the present invention;

FIG. 2b is a schematic diagram of the input gates in a conventional LSTM network and the AS-LSTM deep network of the present invention;

FIG. 2c is a schematic diagram of the output gates in a conventional LSTM network and the AS-LSTM deep network of the present invention;

FIG. 3 is a schematic diagram of a bidirectional AS-LSTM network in the named entity recognition system of the present invention;

FIG. 4 is a schematic diagram of an AS-LSTM deep network in a bidirectional AS-LSTM network of the present invention;

FIG. 5 is a schematic diagram of the weight gates of the AS-LSTM deep network in the bidirectional AS-LSTM network of the present invention;

FIG. 6 is a flowchart of the cold start of AS-LSTM deep network formation in the named entity recognition method of the present invention;

FIG. 7 is a flowchart of a semi-hot start formed by combining an AS-LSTM deep network and a Random Replace training method in the named entity recognition method of the present invention.

Detailed Description

The invention will be further described with reference to specific embodiments, and the advantages and features of the invention will become apparent as the description proceeds. These examples are illustrative only and do not limit the scope of the present invention in any way. It will be understood by those skilled in the art that various changes in form and details may be made without departing from the spirit and scope of the invention, and that such changes and modifications fall within the scope of protection of the invention.

In the description of the present embodiments, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the device or element referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.

Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the invention, "a plurality" means two or more unless otherwise specified.

Example 1:

This embodiment discloses a named entity recognition system based on the deep network AS-LSTM, which comprises a network model BI-AS-LSTM-CRF, wherein the network model BI-AS-LSTM-CRF comprises a text feature layer, a context feature layer BI-AS-LSTM and a CRF layer.

Generally speaking, in a conventional named entity recognition system, the BI-LSTM network in the network model BI-LSTM-CRF used for named entity recognition is formed by splicing 2 LSTM networks, as shown in FIG. 1. Specifically, the LSTM network consists of three gates, namely a forgetting gate (FIG. 2a), an input gate (FIG. 2b) and an output gate (FIG. 2c). The core of the network is the cell state, represented by a horizontal line running through the cell. The cell state is like a conveyor belt: it runs through the whole cell with only a few branches, which ensures that information can flow through the whole network unchanged. This network structure can learn context semantic representations and is therefore very commonly used in context-related natural language processing tasks. The gates and states are computed as follows:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ is the forgetting gate;

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ is the input gate;

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ is the output gate;

$C_t = f_t \cdot C_{t-1} + i_t \cdot \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$ is the cell state;

$h_t = o_t \cdot \tanh(C_t)$ is the output of the LSTM network.

The parameter calculation of these gates depends not only on the current input of the text but also on the previous step of the text, so the semantic relationship of the context can be learned by properly constructing the LSTM. In actual use, however, a model trained with the LSTM may show poor robustness to context: changing one or two irrelevant words before or after the entity can lead to completely different prediction results. For example, the word "Beijing" has the meaning of "place" in many contexts, but an actual training set can hardly cover all scenarios, so prediction errors caused by using the LSTM network are not uncommon.

The invention optimizes the LSTM by adding a weight gate to the conventional LSTM, forming a new AS-LSTM deep network (as shown in FIG. 4). That is, the AS-LSTM deep network includes a forgetting gate (FIG. 2a), an input gate (FIG. 2b), an output gate (FIG. 2c) and a weight gate (FIG. 5), and the weight gate is associated with the current input of the text. The AS-LSTM deep network can not only learn the semantic information of the context but also remain strongly robust to interfering information in the preceding and following text, thereby achieving more accurate results in the actual NER (named entity recognition) task. The gates and states are computed as follows:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ is the forgetting gate;

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ is the input gate;

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ is the output gate;

$C_t = f_t \cdot C_{t-1} + i_t \cdot \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$ is the cell state;

$A_t = \tanh(x_t \cdot W_A)$ is the weight gate;

$h_t = o_t \cdot \tanh(C_t) \cdot A_t$ is the output of the network.

The weight gate depends only on the current input and discards the influence of the cell states of the entire context. The network thus learns a weighting that depends only on its own current input, so that context-related semantic representations can be learned and robustness to context-irrelevant words is increased.
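As a minimal sketch of the AS-LSTM equations above, the following pure-Python function computes one time step. The parameter dictionary layout, the random initialization in `init_params`, and the choice to store W_A with one row per hidden unit (so that A_t has the same length as C_t and the products are element-wise) are assumptions made for illustration; they are not specified in this description.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def affine(W, v, b):
    """Return W @ v + b for a matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical random initialization, for demonstration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    p = {}
    for name in ("f", "i", "o", "c"):
        p["W_" + name] = mat(hidden_size, hidden_size + input_size)
        p["b_" + name] = [0.0] * hidden_size
    p["W_A"] = mat(hidden_size, input_size)  # weight gate sees x_t only
    return p


def as_lstm_step(x_t, h_prev, c_prev, p):
    """One AS-LSTM time step; returns (h_t, C_t, A_t)."""
    z = h_prev + x_t                                  # [h_{t-1}, x_t]
    f_t = sigmoid(affine(p["W_f"], z, p["b_f"]))      # forgetting gate
    i_t = sigmoid(affine(p["W_i"], z, p["b_i"]))      # input gate
    o_t = sigmoid(affine(p["W_o"], z, p["b_o"]))      # output gate
    g_t = tanh(affine(p["W_c"], z, p["b_c"]))         # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Weight gate: a function of the current input only, with no bias.
    a_t = tanh(affine(p["W_A"], x_t, [0.0] * len(c_t)))
    h_t = [o * math.tanh(c) * a for o, c, a in zip(o_t, c_t, a_t)]
    return h_t, c_t, a_t


params = init_params(input_size=3, hidden_size=2, seed=1)
h_t, c_t, a_t = as_lstm_step([0.2, -0.1, 0.4], [0.0, 0.0], [0.0, 0.0], params)
```

Because A_t is computed from x_t alone, calling `as_lstm_step` with the same input but different previous states returns the same weight-gate values, while h_t and C_t still change with the context.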

In this embodiment, the context feature layer BI-AS-LSTM includes 2 AS-LSTM deep networks, and the 2 AS-LSTM deep networks are spliced to form a bidirectional AS-LSTM network, AS shown in FIG. 3.
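The splicing of two networks into a bidirectional network can be sketched as below: one network reads the sequence left to right, the other right to left, and the two outputs for each position are concatenated. The helper `run_bidirectional` and the stand-in `running_sum_step` are hypothetical; a real system would pass two AS-LSTM step functions with separate parameters instead of the toy step used here.

```python
def run_bidirectional(inputs, forward_step, backward_step, hidden_size):
    """Run two recurrent networks in opposite directions and splice outputs.

    Each step function takes (x_t, h_prev, c_prev) and returns (h_t, c_t).
    """
    def run(sequence, step):
        h, c = [0.0] * hidden_size, [0.0] * hidden_size
        outputs = []
        for x_t in sequence:
            h, c = step(x_t, h, c)
            outputs.append(h)
        return outputs

    forward = run(inputs, forward_step)
    # Feed the reversed sequence, then restore the original order.
    backward = run(inputs[::-1], backward_step)[::-1]
    # Splice: concatenate the two hidden vectors at every position.
    return [f + b for f, b in zip(forward, backward)]


def running_sum_step(x_t, h_prev, c_prev):
    """Toy stand-in for a recurrent cell: the state is a running sum."""
    c_t = [c + x for c, x in zip(c_prev, x_t)]
    return c_t, c_t


spliced = run_bidirectional([[1.0], [2.0], [3.0]],
                            running_sum_step, running_sum_step,
                            hidden_size=1)
# spliced == [[1.0, 6.0], [3.0, 5.0], [6.0, 3.0]]
```

With the toy step, the first component at each position summarizes the text up to that position and the second summarizes the text from that position onward, which is the property the bidirectional layer relies on.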

In the present embodiment, the text feature layer is used to extract feature information of the input text in the text, for example, it can extract feature information of words and sentences of the input text.

In this embodiment, the context feature layer BI-AS-LSTM is configured to process the extracted feature information to obtain an output sequence and the context features. Specifically, the context feature layer BI-AS-LSTM passes the feature information (e.g., a sequence) extracted by the text feature layer through a bidirectional gated recurrent network (i.e., the bidirectional AS-LSTM network) to obtain the output sequence, and then obtains the context features of the sentence through a linear layer.

In this embodiment, the CRF layer is a conditional random field model that can learn a transition matrix over named entity labels. For example, words ending with "company" are more likely to be judged as business entities. Neural networks are typically used together with a CRF. The CRF layer is used for acquiring the position information and the entity labels of the context features in the input text. Specifically, based on the position information of each word in the input text, labeled from the context features with BIO tags, the CRF layer obtains the entity labels of the input text through the conditional random field.
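To illustrate how a transition matrix shapes the final labels, the sketch below decodes the best tag sequence with the standard Viterbi algorithm. The scores are invented for the example, the function name `viterbi_decode` is hypothetical, and this description does not state which decoding procedure its CRF layer uses, so this is an assumption rather than the patented implementation.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j]   : score of tag j at position t (from the context layer)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n = len(tags)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path back from the best final tag.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[k] for k in path]


tags = ["O", "B-PRO", "I-PRO"]
# Illustrative transition scores: "O" followed by "I-PRO" is heavily penalized.
transitions = [[0.0, 0.0, -1e4],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
# Taken position by position, these scores would give O, I-PRO, O.
emissions = [[1.0, 0.8, 0.0],
             [0.0, 0.1, 2.0],
             [1.5, 0.0, 0.2]]
decoded = viterbi_decode(emissions, transitions, tags)
# decoded == ["B-PRO", "I-PRO", "O"]
```

Choosing the best tag independently at each position would produce an I-PRO with no preceding B-PRO; the transition penalty makes the decoder prefer the well-formed sequence instead.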

Example 2:

Most named entity recognition systems start in cold-start mode and suffer from low efficiency. Academia and industry have tried several remedies. Most commonly, pre-trained models are used for word embedding: pre-trained models with extremely large parameter counts, such as ELMO, BERT and GPT-3, serve as upstream word-vector generators, followed by fine-tuning on downstream tasks. However, for many enterprises and scientific research units, the computing resources and cost required by such pre-trained models are too high and the service interface responds too slowly. For example, prediction with a BERT NER model on an ordinary GPU takes about 500 ms, which is too slow for daily use and service.

Therefore, this embodiment is an improvement on the basis of embodiment 1, and a Random Replace training method is added to the named entity recognition system, and the Random Replace training method is combined with the AS-LSTM deep network to enable the named entity recognition system to be started in a semi-hot start manner.

Further, the named entity recognition system also comprises a stock database, which is used for constructing an incremental training set from the training set by the Random Replace training method. Specifically, starting from a small training set, the Random Replace training method randomly replaces the labeled named entities with names from the stock database to construct an incremental training set, and the incremental training set is merged with the initial small training set and then input to the model for training.
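The Random Replace idea can be sketched as follows, assuming each training sample is a pair of token and BIO-tag lists and that the stock database is a list of pre-tokenized entity names. The function names, the `copies` parameter and the English example tokens are hypothetical and serve only to make the replacement step concrete.

```python
import random


def random_replace(tokens, tags, stock_names, rng):
    """Replace every labeled entity with a random name from the stock list.

    The surrounding context tokens are kept unchanged; the replaced span
    is re-tagged with B-/I- tags of the same label.
    """
    new_tokens, new_tags = [], []
    k = 0
    while k < len(tokens):
        if tags[k].startswith("B-"):
            label = tags[k][2:]
            k += 1
            while k < len(tokens) and tags[k] == "I-" + label:
                k += 1                      # skip the rest of the old entity
            name = rng.choice(stock_names)
            new_tokens.extend(name)
            new_tags.extend(["B-" + label] + ["I-" + label] * (len(name) - 1))
        else:
            new_tokens.append(tokens[k])
            new_tags.append(tags[k])
            k += 1
    return new_tokens, new_tags


def build_incremental_set(training_set, stock_names, copies=3, seed=0):
    """Generate `copies` randomly replaced variants of each sample."""
    rng = random.Random(seed)
    return [random_replace(tokens, tags, stock_names, rng)
            for tokens, tags in training_set
            for _ in range(copies)]


sample = (["buy", "steel", "pipe", "today"], ["O", "B-PRO", "I-PRO", "O"])
stock = [["office", "chair", "set"], ["printer"]]
incremental = build_incremental_set([sample], stock, copies=5)
merged_training_set = [sample] + incremental   # merged before training
```

Each generated sample keeps the original context ("buy ... today") while the entity varies, which is what lets a small labeled set be expanded without further manual labeling.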

Example 3:

As shown in FIG. 6, this embodiment discloses a named entity recognition method based on the deep network AS-LSTM, which is applied to the named entity recognition system to recognize texts, and the starting form of the named entity recognition system formed by the deep network AS-LSTM is cold start. The named entity recognition method comprises the following steps:

S1, constructing a network model BI-AS-LSTM-CRF;

Specifically, the construction of the network model BI-AS-LSTM-CRF comprises the steps of extracting feature information of the input text, processing the feature information to obtain an output sequence and the context features of the input text, labeling the position information of each word in the input text from the context features with BIO tags, obtaining the entity labels, and so on.

For example: the named entity recognition method is applied to the recognition of bidding announcements. The target object of the bidding announcement is used as the NER (named entity recognition) target, and the named entity recognition corpus is labeled: the beginning of the target object is labeled B-PRO and the middle part of the target object is labeled I-PRO.

Specifically, the ratio of the training set to the test set is from 10:1 to 2:1, and in this step the ratio of the training set to the test set is preferably 4:1.

S5, using the named entity training model in S4 to perform named entity recognition on the test set in S3, and obtaining a recognition result of the named entity recognition model (namely, the NER recognition model);

specifically, the recognition accuracy of the named entity recognition model can reach F1 score.

Example 4:

As shown in FIG. 7, to improve the named entity recognition method of embodiment 3 and greatly shorten the cold-start iteration cycle, a stock database is added to enlarge the training set: a large incremental training set is generated, which provides a shortcut to rapid model iteration. In this named entity recognition method, the named entity recognition system is formed by combining the deep network AS-LSTM with a stock database and the Random Replace training method, and its starting form is semi-hot start.

The named entity identification method comprises the following steps:

S1, constructing a network model BI-AS-LSTM-CRF;

For example: the named entity recognition method is applied to the recognition of bidding announcements. The buyer and the agency are used as NER recognition targets, and the named entity recognition corpus is labeled: the beginning of the buyer is labeled B-ORG1, the middle part of the buyer is labeled I-ORG1, the beginning of the agency is labeled B-ORG3, and the middle part of the agency is labeled I-ORG3.

Specifically, the step of labeling the named entity recognition corpus comprises labeling the beginning of a recognition target with B-PRO and labeling the middle part of the recognition target with I-PRO.

S301, preparing a stock database of the recognition target;

S5, using the named entity training model obtained in S401 to perform named entity recognition on the test set in S3, and obtaining a recognition result of the named entity recognition model (namely, the NER recognition model);

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims

1. A named entity recognition system based on the deep network AS-LSTM, characterized in that: the system comprises a network model BI-AS-LSTM-CRF, wherein the network model BI-AS-LSTM-CRF comprises a text feature layer, a context feature layer BI-AS-LSTM and a CRF layer; the text feature layer is used for extracting feature information of the input text, the context feature layer BI-AS-LSTM is used for processing the extracted feature information to obtain an output sequence and the context features, and the CRF layer is used for obtaining the position information and entity labels of the context features in the input text;

2. The named entity recognition system of claim 1, wherein: the AS-LSTM deep network comprises a forgetting gate, an output gate, an input gate and a weight gate, and the weight gate is associated with the current input of the text.

3. The named entity recognition system of claim 2, wherein: the named entity recognition system also comprises a Random Replace training method which is combined with the AS-LSTM deep network to enable the named entity recognition system to be started in a semi-hot starting mode.

4. The named entity recognition system of claim 2, wherein: the named entity recognition system further comprises a stock database, and the stock database is used for constructing an incremental training set on the basis of the training set by using a Random Replace training method.

5. A named entity recognition method based on the deep network AS-LSTM, applied to a named entity recognition system for recognizing texts, characterized by comprising the following steps:

S1, constructing a network model BI-AS-LSTM-CRF;

6. The named entity recognition method of claim 5, comprising the steps of:

S1, constructing a network model BI-AS-LSTM-CRF;

S301, preparing a stock database of the recognition target;

7. The named entity recognition method of claim 5 or 6, wherein the construction of the network model BI-AS-LSTM-CRF comprises the steps of extracting feature information of the input text, processing the feature information to obtain an output sequence and the context features of the input text, and labeling the position information of each word in the input text from the context features with BIO tags to obtain the entity labels.

8. The named entity recognition method of claim 5 or 6, wherein the step of labeling the named entity recognition corpus comprises labeling the beginning of a recognition target with B-PRO and labeling the middle part of the recognition target with I-PRO.

9. The named entity recognition method of claim 5 or 6, wherein in step S3, the ratio of the training set to the test set is from 10:1 to 2:1.

10. The named entity recognition method of claim 9, wherein in step S3, the ratio of the training set to the test set is 4:1.