CN112257447A - Named entity recognition system and recognition method based on deep network AS-LSTM - Google Patents
- Publication number
- CN112257447A (application CN202011140319.6A)
- Authority
- CN
- China
- Prior art keywords
- named entity
- lstm
- entity recognition
- network
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a named entity recognition system based on a deep network AS-LSTM, which comprises a network model BI-AS-LSTM-CRF; the network model BI-AS-LSTM-CRF comprises a text feature layer, a context feature layer BI-AS-LSTM and a CRF layer. The context feature layer BI-AS-LSTM comprises 2 AS-LSTM deep networks, which are spliced to form a bidirectional AS-LSTM network. In the invention, a novel AS-LSTM deep network is designed for the named entity recognition system. It obtains a more stable and accurate cell state in the context preceding and following a named entity in the input text, and the network learns its own attention weights, so it can not only learn context-related semantic representations but also gain robustness to irrelevant words in the context, reducing the errors of the recognition system.
Description
Technical Field
The invention belongs to the field of artificial intelligence natural language processing, relates to a named entity recognition technology in the field of natural language processing, and particularly relates to a named entity recognition system and a named entity recognition method based on a deep network AS-LSTM.
Background
With the development of artificial intelligence technology, machine learning has become one of the most common approaches to natural language processing. As a branch of machine learning, deep learning, thanks to improvements in CPU/GPU computing power and recent advances in deep network design, has achieved state-of-the-art results in almost all subtasks of natural language processing, including dialog systems, named entity recognition and machine translation; among these, Named Entity Recognition (NER) has become one of the most common problems in the field.
Deep learning networks for Named Entity Recognition (NER) are a method recognized by both industry and academia: a training set is obtained through manual labeling, and a deep network is then constructed and trained to obtain an NER recognition model. Most NER deep networks to date use the long short-term memory network (LSTM), but in actual prediction the following problems arise: 1. the trained network is not robust to changes in the surrounding context; 2. during model prediction, changing one or two unimportant words in the context often severely degrades the accuracy of the prediction result; 3. during named entity recognition, the time and labor cost of manually labeling a training set is high, the early-stage training set is small, cold-start efficiency is low, and a fast model iteration cycle cannot begin.
For example, the invention disclosed in Chinese patent CN109871541A provides a named entity recognition method applicable to multiple languages and multiple fields; the NER method provided by that patent uses the LSTM long short-term memory network. However, it still needs a large amount of labeled data to achieve good recognition accuracy, and it cannot solve the problem of the insufficient context robustness of the LSTM.
For example, the invention disclosed in Chinese patent CN111091002A provides a method for recognizing Chinese named entities, which uses CWS and POS tag information for word-relationship inference and extracts information shared across entity boundaries through the NER, CWS and POS tagging tasks included in adversarial learning. But it still needs a large amount of label information, which greatly increases the time and labor cost of recognition.
Therefore, there is a need for improvements to existing named entity recognition techniques.
Disclosure of Invention
The invention aims to improve the accuracy and recall rate of named entity recognition, to overcome the drawbacks of the conventional long short-term memory network (LSTM) in actual use, such as poor context robustness and high time and labor cost, to solve the problems of the insufficient sample size and low iteration efficiency of a cold-started training set of a conventional deep model, and to provide a named entity recognition system and recognition method based on the deep network AS-LSTM.
The technical scheme for realizing the purpose of the invention is as follows:
the invention provides a named entity recognition system based on a deep network AS-LSTM, which comprises a network model BI-AS-LSTM-CRF, wherein the network model BI-AS-LSTM-CRF comprises a text characteristic layer, a context characteristic layer BI-AS-LSTM and a CRF layer; the text feature layer is used for extracting feature information of an input text in the text, the context feature layer BI-AS-LSTM is used for outputting the extracted feature information to obtain an output sequence and obtain context features, and the CRF layer is used for obtaining position information and entity labels of the context features in the input text.
The context feature layer BI-AS-LSTM comprises 2 AS-LSTM deep networks, and the 2 AS-LSTM deep networks are spliced to form a bidirectional AS-LSTM network.
In the invention, a novel AS-LSTM deep network is designed for the named entity recognition system. Its weight gate discards the influence of the contextual cell states in the input text, and the network learns its own attention weights, so it can learn context-related semantic representations while also gaining robustness to irrelevant words in the context, reducing the errors of the recognition system.
Further, the AS-LSTM deep network includes a forget gate, an input gate, an output gate, and a weight gate, and the weight gate depends only on the current input of the text.
In general, a conventional LSTM network is composed of three gates, namely a forget gate, an input gate and an output gate. The parameters of these gates depend not only on the current input of the text but also on the previous input, and a properly constructed LSTM can learn the semantic relationships of the context. However, in actual use, a model trained with the LSTM often shows poor robustness to the context: changing one or two irrelevant words before or after an entity can produce a completely different prediction. For example, the word "Beijing" carries the semantics of "place" in many contexts, but a real training set can hardly cover all scenes, so wrong predictions from an LSTM network are not uncommon.
Therefore, the invention optimizes the LSTM by adding a weight gate to the traditional LSTM, forming a new AS-LSTM deep network. In the AS-LSTM deep network the weight gate depends only on the current input, so the network can learn the semantic information of the context while also showing strong robustness to interfering words in the surrounding text, thereby achieving more accurate results in actual NER (named entity recognition) tasks.
Furthermore, the named entity recognition system also comprises a Random Replace training method, wherein the Random Replace training method is combined with the AS-LSTM deep network to enable the named entity recognition system to be started in a semi-hot starting mode.
Furthermore, the named entity recognition system also comprises a stock database, which is used to construct an incremental training set on the basis of the training set with the Random Replace training method. Specifically, starting from a small training set, the Random Replace method randomly replaces the labeled named entities with names from the stock database, thereby constructing an incremental training set; the incremental training set is merged with the initial small training set and then input to the model for training.
In the named entity recognition system, the AS-LSTM deep network is combined with the Random Replace training method and the stock database: an incremental training set is formed by randomly replacing entity names in the training set while keeping the surrounding context-related semantic representations, and combined with the AS-LSTM network's ability to learn its own attention weights, this provides a shortcut to fast model iteration and greatly shortens the cold-start iteration period.
The invention also provides a named entity recognition method based on the deep network AS-LSTM, which is applied to the above named entity recognition system to recognize text; the start form of the named entity recognition system formed by the deep network AS-LSTM alone is cold start. The named entity recognition method comprises the following steps:
S1, constructing a network model BI-AS-LSTM-CRF;
S2, determining a recognition target, and labeling the named entity recognition corpus in a distinguishing labeling mode;
S3, dividing the recognition corpus of S2 into a training set and a test set;
S4, inputting the training set into the network model BI-AS-LSTM-CRF for training to obtain a named entity training model;
S5, adopting the named entity training model of S4 to carry out named entity recognition on the test set of S3, and obtaining the recognition result of the named entity recognition model;
S6, calculating and comparing the accuracy and the recall rate of the recognition result of the named entity recognition model obtained in S5 on the test set.
To improve on the above named entity recognition method and greatly shorten the cold-start iteration period, the size of the training set is increased by adding a stock database, generating a large number of incremental training samples and providing a shortcut to fast model iteration. In this variant, the start form of the named entity recognition system, formed by combining the deep network AS-LSTM with the stock database and the Random Replace training method, is semi-hot start. Specifically, the named entity recognition method comprises the following steps:
S1, constructing a network model BI-AS-LSTM-CRF;
S2, determining a recognition target, and labeling the named entity recognition corpus in a distinguishing labeling mode;
S3, dividing the recognition corpus of S2 into a training set and a test set;
S301, preparing a stock database for the recognition target;
S4, based on the training set of S3, performing random replacement with the stock database of S301 to obtain an incremental training set;
S401, merging the training set of S3 with the incremental training set of S4, and inputting the merged training set into the network model BI-AS-LSTM-CRF of S1 for training to obtain a named entity recognition model;
S5, adopting the named entity recognition model obtained in S401 to carry out named entity recognition on the test set of S3, and obtaining the recognition result of the named entity recognition model;
S6, calculating and comparing the accuracy and the recall rate of the recognition result of the named entity recognition model of S5 on the test set.
The construction of the network model comprises the steps of extracting feature information of the input text, outputting the feature information to obtain an output sequence, obtaining the context features of the input text, marking the position information of each word in the input text with BIO tags based on the context features, and obtaining the entity labels.
The labeling step for the named entity recognition corpus comprises marking the beginning of a recognition target with B-PRO and the middle part of the recognition target with I-PRO.
In step S3, the ratio of the training set to the test set is between 10:1 and 2:1.
Preferably, in step S3, the ratio of the training set to the test set is 4:1.
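The preferred 4:1 train/test split above can be sketched as follows; the function name and the shuffling seed are illustrative assumptions, not part of the patent:

```python
import random

def split_corpus(samples, train_ratio=4, test_ratio=1, seed=42):
    """Shuffle the labeled corpus and split it into training and test
    sets at the given ratio (4:1 by default, as preferred in step S3)."""
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    cut = len(samples) * train_ratio // (train_ratio + test_ratio)
    return samples[:cut], samples[cut:]

train_set, test_set = split_corpus(range(100))  # 80 training, 20 test samples
```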
Compared with the prior art, the invention has the beneficial effects that:
1. A novel AS-LSTM deep network is designed for the named entity recognition system. It obtains a more stable and accurate cell state in the context surrounding a named entity in the input text, and the network learns its own attention weights, so it can learn context-related semantic representations while also gaining robustness to irrelevant words in the context, reducing the errors of the recognition system.
2. The combined application of the AS-LSTM deep network, the Random Replace training method and the stock database converts the named entity recognition system from a cold-start mode to a semi-hot-start mode, which can greatly shorten the cold-start iteration period, improve the iteration efficiency of the model, and make the prediction results more accurate and stable.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. It is apparent that the described drawings only illustrate some embodiments of the invention, and that those skilled in the art can obtain other drawings from them without inventive work.
FIG. 1 is a diagram of a conventional LSTM long and short term memory network;
FIG. 2a is a schematic diagram of a forgetting gate in a conventional LSTM network and an AS-LSTM deep network according to the present invention;
FIG. 2b is a schematic diagram of the input gates in a conventional LSTM network and the AS-LSTM deep network of the present invention;
FIG. 2c is a schematic diagram of the output gates in a conventional LSTM network and the AS-LSTM deep network of the present invention;
FIG. 3 is a schematic diagram of a bidirectional AS-LSTM network in the named entity recognition system of the present invention;
FIG. 4 is a schematic diagram of an AS-LSTM deep network in a bidirectional AS-LSTM network of the present invention;
FIG. 5 is a schematic diagram of the weight gates of the AS-LSTM deep network in the bidirectional AS-LSTM network of the present invention;
FIG. 6 is a flowchart of the cold start of AS-LSTM deep network formation in the named entity recognition method of the present invention;
FIG. 7 is a flowchart of a semi-hot start formed by combining an AS-LSTM deep network and a Random Replace training method in the named entity recognition method of the present invention.
Detailed Description
The invention will be further described with reference to specific embodiments, and its advantages and features will become apparent as the description proceeds. These examples are illustrative only and do not limit the scope of the invention in any way. It will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention.
In the description of the present embodiments, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the device or element referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicit to a number of indicated technical features. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the invention, the meaning of "a plurality" is two or more unless otherwise specified.
Example 1:
the embodiment discloses a named entity recognition system based on a deep network AS-LSTM, which comprises a network model BI-AS-LSTM-CRF, wherein the network model BI-AS-LSTM-CRF comprises a text feature layer, a context feature layer BI-AS-LSTM and a CRF layer.
Generally speaking, in a conventional named entity recognition system, the BI-LSTM network in the network model BI-LSTM-CRF is formed by splicing 2 LSTM networks, as shown in fig. 1. Specifically, the LSTM network consists of three gates, namely a forget gate (fig. 2a), an input gate (fig. 2b) and an output gate (fig. 2c). The core of the network is the cell state, represented by the horizontal line running through the cell. The cell state is like a conveyor belt: it runs through the whole cell with only a few branches, which guarantees that information flows through the whole network essentially unchanged. This structure can learn contextual semantic representations and is therefore very commonly used in context-related natural language processing tasks. Here f_t = σ(W_f·[h_{t-1}, x_t] + b_f) is the forget gate; i_t = σ(W_i·[h_{t-1}, x_t] + b_i) is the input gate; o_t = σ(W_o·[h_{t-1}, x_t] + b_o) is the output gate; C_t = f_t·C_{t-1} + i_t·tanh(W_c·[h_{t-1}, x_t] + b_c) is the cell state; and h_t = o_t·tanh(C_t) is the output of the LSTM network.
The parameters of these gates depend not only on the current input of the text but also on the previous input, and a properly constructed LSTM can learn the semantic relationships of the context. However, in actual use, a model trained with the LSTM often shows poor robustness to the context: changing one or two irrelevant words before or after an entity can produce a completely different prediction. For example, the word "Beijing" carries the semantics of "place" in many contexts, but a real training set can hardly cover all scenes, so wrong predictions from an LSTM network are not uncommon.
The invention optimizes the LSTM by adding a weight gate to the traditional LSTM, forming a new AS-LSTM deep network (as shown in fig. 4). That is, the AS-LSTM deep network comprises a forget gate (fig. 2a), an input gate (fig. 2b), an output gate (fig. 2c) and a weight gate (fig. 5), and the weight gate depends only on the current input of the text. The AS-LSTM deep network can not only learn the semantic information of the context but also shows strong robustness to interfering words in the surrounding text, thereby achieving more accurate results in actual NER (named entity recognition) tasks. Here f_t = σ(W_f·[h_{t-1}, x_t] + b_f) is the forget gate; i_t = σ(W_i·[h_{t-1}, x_t] + b_i) is the input gate; o_t = σ(W_o·[h_{t-1}, x_t] + b_o) is the output gate; C_t = f_t·C_{t-1} + i_t·tanh(W_c·[h_{t-1}, x_t] + b_c) is the cell state; A_t = tanh(x_t·W_A) is the weight gate; and h_t = o_t·tanh(C_t)·A_t is the output of the AS-LSTM network. Because the weight gate is related only to the current input, it discards the influence of the contextual cell states; the network learns its own attention weights, so it can learn context-related semantic representations while also gaining robustness to context-irrelevant words.
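A single AS-LSTM time step following the gate equations above can be sketched in NumPy as follows; the dimensions, random parameter initialization, and function names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def as_lstm_step(x_t, h_prev, c_prev, params):
    """One AS-LSTM time step: the standard LSTM gates plus a weight gate
    A_t = tanh(x_t · W_A) that depends only on the current input x_t."""
    z = np.concatenate([h_prev, x_t])                # [h_{t-1}, x_t]
    f_t = sigmoid(params["Wf"] @ z + params["bf"])   # forget gate
    i_t = sigmoid(params["Wi"] @ z + params["bi"])   # input gate
    o_t = sigmoid(params["Wo"] @ z + params["bo"])   # output gate
    c_t = f_t * c_prev + i_t * np.tanh(params["Wc"] @ z + params["bc"])  # cell state
    a_t = np.tanh(x_t @ params["WA"])                # weight gate: current input only
    h_t = o_t * np.tanh(c_t) * a_t                   # AS-LSTM output
    return h_t, c_t

# tiny smoke test with random parameters (dx = input dim, dh = hidden dim)
rng = np.random.default_rng(0)
dx, dh = 4, 3
params = {k: rng.standard_normal((dh, dh + dx)) * 0.1 for k in ("Wf", "Wi", "Wo", "Wc")}
params.update({b: np.zeros(dh) for b in ("bf", "bi", "bo", "bc")})
params["WA"] = rng.standard_normal((dx, dh)) * 0.1
h, c = as_lstm_step(rng.standard_normal(dx), np.zeros(dh), np.zeros(dh), params)
```

Since o_t lies in (0, 1) and both tanh factors lie in (-1, 1), every component of the output h_t stays strictly inside (-1, 1), which reflects how the weight gate rescales the ordinary LSTM output.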
In this embodiment, the context feature layer BI-AS-LSTM includes 2 AS-LSTM deep networks, and the 2 AS-LSTM deep networks are spliced to form a bidirectional AS-LSTM network, AS shown in FIG. 3.
In the present embodiment, the text feature layer is used to extract feature information of the input text in the text, for example, it can extract feature information of words and sentences of the input text.
In this embodiment, the context feature layer BI-AS-LSTM is configured to output the extracted feature information to obtain an output sequence and the context features. Specifically, the context feature layer BI-AS-LSTM passes the feature information (e.g., a sequence) extracted by the text feature layer through a bidirectional recurrent network (i.e., the bidirectional AS-LSTM network) to obtain the output sequence, and obtains the context features of the sentence through a linear layer.
in this embodiment, the CRF layer is a conditional random field model, and can learn a transition matrix of the named entity expression, for example: words ending with a company are more likely to be judged as business entities, and neural networks are typically used with CRFs. The CRF layer is used for acquiring the position information and the entity labels of the context features in the input text, and specifically, the CRF layer obtains the entity labels of the input text through the conditional random field CRF according to the position information of each word marked by the context features through BIO in the input text.
In the invention, a novel AS-LSTM deep network is designed for the named entity recognition system. Its weight gate discards the influence of the contextual cell states in the input text, and the network learns its own attention weights, so it can learn context-related semantic representations while also gaining robustness to irrelevant words in the context, reducing the errors of the recognition system.
Example 2:
Since most named entity recognition systems use a cold-start mode and suffer from low efficiency, academia and industry have tried several remedies, most commonly pre-trained models for word embedding: pre-trained models with extremely large parameter counts such as ELMo, BERT and GPT-3 are used as generators of upstream word vectors, followed by fine-tuning on the downstream task. However, for many companies and scientific research units, the computing resources and cost required by such pre-trained models are too large and the service-interface response is too slow; for example, the prediction speed of a BERT-based NER model on a common GPU is about 500 ms, which is far too slow for daily use and service.
Therefore, this embodiment is an improvement on the basis of embodiment 1, and a Random Replace training method is added to the named entity recognition system, and the Random Replace training method is combined with the AS-LSTM deep network to enable the named entity recognition system to be started in a semi-hot start manner.
Further, the named entity recognition system also comprises a stock database, which is used to construct an incremental training set on the basis of the training set with the Random Replace training method. Specifically, starting from a small training set, the Random Replace method randomly replaces the labeled named entities with names from the stock database, thereby constructing an incremental training set; the incremental training set is merged with the initial small training set and then input to the model for training.
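The Random Replace construction described above can be sketched as follows, assuming character-level BIO-tagged sentences stored as (token, tag) pairs, which is usual for Chinese NER; the function name, the toy sentence, and the two-entry stock database are illustrative assumptions:

```python
import random

def random_replace(sentence, stock_names, entity_type="PRO", seed=None):
    """Replace each labeled entity span (B-/I- tags) with a name drawn at
    random from the stock database, producing one augmented training example
    while keeping the surrounding context and its tags unchanged."""
    rng = random.Random(seed)
    out, i = [], 0
    while i < len(sentence):
        token, tag = sentence[i]
        if tag == f"B-{entity_type}":
            # consume the whole entity span (B- followed by I- tags)
            j = i + 1
            while j < len(sentence) and sentence[j][1] == f"I-{entity_type}":
                j += 1
            name = rng.choice(stock_names)
            out.append((name[0], f"B-{entity_type}"))
            out.extend((ch, f"I-{entity_type}") for ch in name[1:])
            i = j
        else:
            out.append((token, tag))
            i += 1
    return out

# context "办公桌" kept, entity span replaced from the stock database
sent = [("办", "O"), ("公", "O"), ("桌", "O"), ("采", "B-PRO"), ("购", "I-PRO")]
aug = random_replace(sent, ["电脑设备", "空调"], seed=1)
```

Running this over the initial training set with many stock names yields the incremental training set that is merged back with the original set before training.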
In the named entity recognition system, the AS-LSTM deep network is combined with the Random Replace training method and the stock database: an incremental training set is formed by randomly replacing entity names in the training set while keeping the surrounding context-related semantic representations, and combined with the AS-LSTM network's ability to learn its own attention weights, this provides a shortcut to fast model iteration and greatly shortens the cold-start iteration period.
Example 3:
As shown in fig. 6, this embodiment discloses a named entity recognition method based on the deep network AS-LSTM, which is applied to the above named entity recognition system to recognize texts; the start form of the named entity recognition system formed by the deep network AS-LSTM alone is cold start. The named entity recognition method comprises the following steps:
S1, constructing a network model BI-AS-LSTM-CRF;
Specifically, the construction of the network model BI-AS-LSTM-CRF comprises the steps of extracting feature information of the input text, outputting the feature information to obtain an output sequence, obtaining the context features of the input text, marking the position information of each word in the input text with BIO tags based on the context features, and obtaining the entity labels.
S2, determining an identification target, and marking the identification corpus of the named entity in a distinguishing marking mode;
For example, when the named entity recognition method is applied to recognition of bidding announcements, the target object of the bidding announcement is taken as the NER (named entity recognition) target; the named entity recognition corpus is labeled, with the beginning of the target object labeled B-PRO and the middle part labeled I-PRO.
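Assuming character-level labeling, the B-PRO/I-PRO scheme above can be illustrated with a minimal tagging helper; the sentence and entity span are invented for illustration:

```python
def bio_label(text, entity_span):
    """Character-level BIO tags: B-PRO at the entity start, I-PRO inside
    the entity, O everywhere else. `entity_span` is a (start, end) index
    pair with `end` exclusive."""
    start, end = entity_span
    tags = []
    for i, _ in enumerate(text):
        if i == start:
            tags.append("B-PRO")
        elif start < i < end:
            tags.append("I-PRO")
        else:
            tags.append("O")
    return tags

tags = bio_label("采购办公电脑一批", (2, 6))  # entity: "办公电脑"
```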
S3, dividing the recognition corpus in the S2 into a training set and a test set;
Specifically, the ratio of the training set to the test set is between 10:1 and 2:1; in this step, the ratio of the training set to the test set is preferably 4:1.
S4, inputting the training set into the network model BI-AS-LSTM-CRF for training to obtain a named entity training model;
S5, performing named entity recognition on the test set of S3 with the named entity training model of S4 to obtain the recognition result of the named entity recognition model (i.e., the NER recognition model);
Specifically, the recognition accuracy of the named entity recognition model is evaluated with the F1 score.
S6, calculating and comparing the accuracy and recall of the recognition result of the named entity recognition model of S5 on the test set.
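The accuracy (precision), recall, and F1 score of S5-S6 can be computed over entity spans as follows; the gold and predicted entity sets below are invented for illustration:

```python
def precision_recall_f1(gold, pred):
    """Entity-level precision, recall, and F1 over sets of
    (start, end, type) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                       # exactly matched entities
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {(0, 3, "PRO"), (5, 7, "ORG1")}
pred = {(0, 3, "PRO"), (5, 8, "ORG1")}          # one boundary error
print(precision_recall_f1(gold, pred))          # -> (0.5, 0.5, 0.5)
```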
Example 4:
As shown in fig. 7, as an improvement of the named entity recognition method of embodiment 3 that greatly shortens the cold-start iteration period, a stock database is added to enlarge the training set, generating a large incremental training set and providing a shortcut for rapid model iteration. In this named entity recognition method, the start form of the named entity recognition system, formed by combining the deep network AS-LSTM with the stock database and the Random Replace training method, is semi-hot start.
The named entity recognition method comprises the following steps:
S1, constructing a network model BI-AS-LSTM-CRF;
Specifically, constructing the network model BI-AS-LSTM-CRF comprises the steps of extracting feature information from the input text, outputting the feature information to obtain an output sequence, obtaining the context features of the input text, marking the position information of each word in the input text with BIO tags based on the context features, and obtaining entity labels.
S2, determining the recognition targets, and labeling the named entity recognition corpus with a distinctive labeling scheme;
For example, when the named entity recognition method is applied to recognizing tender announcements, the buyer and the agency are taken as the NER recognition targets, and the named entity recognition corpus is labeled accordingly: the beginning of the buyer is labeled B-ORG1, the middle part of the buyer I-ORG1, the beginning of the agency B-ORG3, and the middle part of the agency I-ORG3.
Specifically, labeling the named entity recognition corpus comprises tagging the beginning of the recognition target with B-PRO and the middle part of the recognition target with I-PRO.
S3, dividing the recognition corpus of S2 into a training set and a test set;
Specifically, the ratio of the training set to the test set ranges from 10:1 to 2:1; in this step, a ratio of 4:1 is preferred.
S301, preparing a stock database of the recognition targets;
S4, on the basis of the training set of S3, performing random replacement against the stock database of S301 to obtain an incremental training set;
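The Random Replace step of S4 can be sketched as follows, assuming the stock database is simply a list of tokenized entity names; all tokens, names, and the single-entry database below are invented for illustration:

```python
import random

def random_replace(tokens, labels, stock_db, tag="ORG1", seed=0):
    """Build one incremental training example by replacing each
    entity span of the given tag with a name drawn at random from
    the stock database, keeping the BIO labels aligned."""
    rng = random.Random(seed)
    out_tokens, out_labels = [], []
    i = 0
    while i < len(tokens):
        if labels[i] == "B-" + tag:
            j = i + 1                     # scan to the end of the span
            while j < len(tokens) and labels[j] == "I-" + tag:
                j += 1
            repl = rng.choice(stock_db)   # replacement name (token list)
            out_tokens.extend(repl)
            out_labels.extend(["B-" + tag] + ["I-" + tag] * (len(repl) - 1))
            i = j
        else:
            out_tokens.append(tokens[i])
            out_labels.append(labels[i])
            i += 1
    return out_tokens, out_labels

tokens = ["buyer", ":", "City", "Hospital"]
labels = ["O", "O", "B-ORG1", "I-ORG1"]
stock_db = [["Provincial", "Power", "Company"]]   # one entry: deterministic
print(random_replace(tokens, labels, stock_db))
# -> (['buyer', ':', 'Provincial', 'Power', 'Company'],
#     ['O', 'O', 'B-ORG1', 'I-ORG1', 'I-ORG1'])
```

Running this once per sentence and per stock entry yields the incremental training set that is merged with the original training set in S401.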
S401, merging the training set of S3 with the incremental training set of S4, and inputting the merged training set into the network model BI-AS-LSTM-CRF of S1 for training to obtain a named entity recognition model;
S5, performing named entity recognition on the test set of S3 with the named entity training model obtained in S401 to obtain the recognition result of the named entity recognition model (i.e., the NER recognition model);
Specifically, the recognition accuracy of the named entity recognition model is evaluated with the F1 score.
S6, calculating and comparing the accuracy and recall of the recognition results of the named entity recognition model of S5 on the test set.
The above description is only of preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention are intended to fall within its scope.
Furthermore, it should be understood that although the present description is organized by embodiments, each embodiment does not necessarily contain only a single independent technical solution; this manner of description is for clarity only. Those skilled in the art should take the description as a whole, and the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.
Claims (10)
1. A named entity recognition system based on the deep network AS-LSTM, characterized in that: the system comprises a network model BI-AS-LSTM-CRF, wherein the network model BI-AS-LSTM-CRF comprises a text feature layer, a context feature layer BI-AS-LSTM and a CRF layer; the text feature layer is used for extracting feature information from the input text, the context feature layer BI-AS-LSTM is used for processing the extracted feature information to obtain an output sequence and the context features, and the CRF layer is used for obtaining the position information and entity labels of the context features in the input text;
the context feature layer BI-AS-LSTM comprises 2 AS-LSTM deep networks, and the 2 AS-LSTM deep networks are concatenated to form a bidirectional AS-LSTM network.
2. The named entity recognition system of claim 1, wherein: the AS-LSTM deep network comprises a forgetting gate, an output gate, an input gate and a weight gate, and the weight gate is associated with the current input of the text.
3. The named entity recognition system of claim 2, wherein: the named entity recognition system also comprises a Random Replace training method which is combined with the AS-LSTM deep network to enable the named entity recognition system to be started in a semi-hot starting mode.
4. The named entity recognition system of claim 2, wherein: the named entity recognition system further comprises a stock database, and the stock database is used for constructing an incremental training set on the basis of the training set by using a Random Replace training method.
5. A named entity recognition method based on a deep network AS-LSTM is applied to a named entity recognition system for recognizing texts, and is characterized by comprising the following steps:
S1, constructing a network model BI-AS-LSTM-CRF;
S2, determining a recognition target, and labeling the named entity recognition corpus with a distinctive labeling scheme;
S3, dividing the recognition corpus of S2 into a training set and a test set;
S4, inputting the training set into the network model BI-AS-LSTM-CRF for training to obtain a named entity training model;
S5, performing named entity recognition on the test set of S3 with the named entity training model of S4 to obtain the recognition result of the named entity recognition model;
and S6, calculating and comparing the accuracy and recall of the recognition result of the named entity recognition model of S5 on the test set.
6. The named entity recognition method of claim 5, comprising the steps of:
S1, constructing a network model BI-AS-LSTM-CRF;
S2, determining recognition targets, and labeling the named entity recognition corpus with a distinctive labeling scheme;
S3, dividing the recognition corpus of S2 into a training set and a test set;
S301, preparing a stock database of the recognition targets;
S4, on the basis of the training set of S3, performing random replacement against the stock database of S301 to obtain an incremental training set;
S401, merging the training set of S3 with the incremental training set of S4, and inputting the merged training set into the network model BI-AS-LSTM-CRF of S1 for training to obtain a named entity recognition model;
S5, performing named entity recognition on the test set of S3 with the named entity training model obtained in S401 to obtain the recognition result of the named entity recognition model;
and S6, calculating and comparing the accuracy and recall of the recognition results of the named entity recognition model of S5 on the test set.
7. The named entity recognition method of claim 5 or 6, wherein the network model BI-AS-LSTM-CRF is constructed by: extracting feature information from the input text, outputting the feature information to obtain an output sequence, obtaining the context features of the input text, marking the position information of each word in the input text with BIO tags based on the context features, and obtaining entity labels.
8. The named entity recognition method of claim 5 or 6, wherein labeling the named entity recognition corpus comprises tagging the beginning of the recognition target with B-PRO and the middle part of the recognition target with I-PRO.
9. The named entity recognition method of claim 5 or 6, wherein in step S3, the ratio of the training set to the test set ranges from 10:1 to 2:1.
10. The named entity recognition method of claim 9, wherein in step S3, the ratio of the training set to the test set is 4:1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011140319.6A CN112257447B (en) | 2020-10-22 | 2020-10-22 | Named entity recognition system and recognition method based on depth network AS-LSTM |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112257447A true CN112257447A (en) | 2021-01-22 |
CN112257447B CN112257447B (en) | 2024-06-18 |
Family
ID=74263155
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011140319.6A Active CN112257447B (en) | 2020-10-22 | 2020-10-22 | Named entity recognition system and recognition method based on depth network AS-LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112257447B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109117472A (en) * | 2018-11-12 | 2019-01-01 | 新疆大学 | A kind of Uighur name entity recognition method based on deep learning |
CN109241520A (en) * | 2018-07-18 | 2019-01-18 | 五邑大学 | A kind of sentence trunk analysis method and system based on the multilayer error Feedback Neural Network for segmenting and naming Entity recognition |
CN111680786A (en) * | 2020-06-10 | 2020-09-18 | 中国地质大学(武汉) | Time sequence prediction method based on improved weight gating unit |
Non-Patent Citations (3)
Title |
---|
SANDEEP SUBRAMANIAN: "Neural Architectures for Named Entity Recognition", IEEE, pages 1-11 *
DING Shengchun: "Named Entity Recognition in the Business Domain Based on Bi-LSTM-CRF", Modern Information (现代情报), vol. 40, pages 103-109 *
ZHANG Huali; KANG Xiaodong; LI Bo; WANG Yage; LIU Hanqing; BAI Fang: "Named Entity Recognition of Chinese Electronic Medical Records Using Bi-LSTM-CRF with an Attention Mechanism", Journal of Computer Applications (计算机应用), no. 1, 10 July 2020 (2020-07-10) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114510943A (en) * | 2022-02-18 | 2022-05-17 | 北京大学 | Incremental named entity identification method based on pseudo sample playback |
CN114510943B (en) * | 2022-02-18 | 2024-05-28 | 北京大学 | Incremental named entity recognition method based on pseudo sample replay |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||