CN111832303A - Named entity identification method and device - Google Patents

Named entity identification method and device

Info

Publication number
CN111832303A
CN111832303A (application CN201910292850.6A)
Authority
CN
China
Prior art keywords
path
entity
word
entities
knowledge graph
Prior art date
Legal status
Withdrawn
Application number
CN201910292850.6A
Other languages
Chinese (zh)
Inventor
张鹏
Current Assignee
Potevio Information Technology Co Ltd
Original Assignee
Potevio Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Potevio Information Technology Co Ltd
Priority to CN201910292850.6A
Publication of CN111832303A
Legal status: Withdrawn


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/3331 Query processing
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis
              • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
                • G06F16/367 Ontology
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/044 Recurrent networks, e.g. Hopfield networks
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods

Abstract

The application discloses a named entity recognition method and apparatus. The method comprises: training a named entity recognition model with sample sentences from a preset training sample set, wherein each word contained in a sample sentence is converted into a word vector using a knowledge graph and a one-hot representation method before being input into the model for training; the model consists, in order of data processing, of a first fully connected layer, a second fully connected layer, a CNN layer, a BLSTM layer, and a CRF layer, where the first fully connected layer performs a linear operation on the entity parts of the word vectors and the second fully connected layer performs a linear operation on the relation parts; and, for a sentence whose named entities are to be recognized, converting the words it contains into word vectors using the knowledge graph and the one-hot representation method and inputting them into the trained model for processing to obtain the corresponding entity recognition result. Applying the technical scheme disclosed in the application can improve the accuracy of named entity recognition.

Description

Named entity identification method and device
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for identifying a named entity.
Background
Named Entity Recognition (NER) is a classical problem in natural language processing, and its applications are extremely broad, such as identifying person and place names in a sentence, product names in e-commerce search queries, or drug names. The currently popular network framework is a neural network model combining a convolutional neural network (CNN), a bidirectional long short-term memory network (BLSTM), and a conditional random field (CRF), i.e., CNN+BLSTM+CRF.
This model uses the CNN as a feature extraction layer, the BLSTM as the main nonlinear hidden layer, and the CRF as an output layer that performs named entity recognition over the resulting sequence. The network structure is shown in FIG. 1. The original input of the existing named entity recognition method is a sentence that has already been segmented into words and converted into word vectors by the word2vec tool.
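For illustration, such word vectors could be produced with the gensim implementation of word2vec; the toy corpus, parameter values, and variable names below are assumptions of this sketch, not part of the patent:

```python
from gensim.models import Word2Vec

# Each training sentence is a list of already-segmented words.
corpus = [["知识", "图谱", "丰富", "词", "向量"],
          ["命名", "实体", "识别", "是", "经典", "问题"]]
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

vector = w2v.wv["实体"]  # a 100-dimensional embedding, later fed to the CNN
```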
The word2vec output is fed into the CNN, which contains a convolutional layer, a max pooling layer, and a nonlinear activation layer and effectively extracts local features of the word vectors. A BLSTM layer follows the CNN layer. A BLSTM is composed of two LSTMs running in opposite directions: the forward LSTM learns the features of a sentence from front to back, while the backward LSTM learns them from back to front. FIG. 2 shows a schematic diagram of the internal structure of an LSTM. By combining two LSTMs in opposite directions, the network can fully exploit contextual information and capture the associations between words.
A CRF layer follows the BLSTM; it rearranges the BLSTM output according to the sequence information of the sentence to obtain the final entity recognition result.
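A minimal PyTorch sketch may make this prior-art pipeline concrete; layer widths, kernel sizes, and the tag count are illustrative assumptions, and the CRF output layer (available, e.g., in the third-party pytorch-crf package) is represented only by the emission scores it would consume:

```python
import torch
import torch.nn as nn

class BaselineNER(nn.Module):
    """CNN + BLSTM feature extractor producing emission scores for a CRF."""
    def __init__(self, emb_dim=100, conv_dim=128, hidden=64, num_tags=9):
        super().__init__()
        # CNN block: convolutional layer + max pooling layer + nonlinear activation
        self.conv = nn.Conv1d(emb_dim, conv_dim, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=3, stride=1, padding=1)
        self.act = nn.ReLU()
        # BLSTM: two LSTMs running over the sentence in opposite directions
        self.blstm = nn.LSTM(conv_dim, hidden, bidirectional=True, batch_first=True)
        # Per-word tag scores; a CRF layer would decode these into entity labels
        self.emit = nn.Linear(2 * hidden, num_tags)

    def forward(self, word_vecs):          # (batch, seq_len, emb_dim) word2vec output
        x = word_vecs.transpose(1, 2)      # Conv1d expects (batch, channels, seq_len)
        x = self.act(self.pool(self.conv(x))).transpose(1, 2)
        feats, _ = self.blstm(x)           # forward and backward context, concatenated
        return self.emit(feats)            # (batch, seq_len, num_tags)
```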
In the process of implementing the invention, the inventor discovered that the existing named entity recognition scheme cannot accurately recognize named entities in practical applications. The specific analysis is as follows:
The traditional named entity recognition method performs recognition on word vectors obtained directly by the word2vec method. The underlying assumption of word2vec is that similar words have similar contexts; in other words, a particular context matches only a certain semantics. By maximizing the conditional probability of a word given its context, the correspondence between words and contexts is maximized, satisfying this basic assumption, and a word vector that maximizes the conditional probability becomes a reasonable representation of the word's semantics. In actual use, however, statistics alone sometimes cannot accurately capture the latent meaning of words, especially of entities, because effective semantic information is lacking. The method therefore suffers from a serious semantic drift problem for ambiguous words and the like, the embedded vectors express such words inaccurately, and entities cannot be accurately identified, which degrades the accuracy of entity recognition.
Disclosure of Invention
The application provides a named entity recognition method and apparatus, which can improve the accuracy of named entity recognition.
An embodiment of the invention discloses a named entity recognition method, which comprises the following steps:
training a named entity recognition model by using sample sentences in a preset training sample set, wherein each word contained in each sample sentence is converted into a corresponding word vector by using a preset knowledge graph and a one-hot representation method and is input into the named entity recognition model for training; the named entity recognition model consists, in order of data processing, of a first fully connected layer, a second fully connected layer, a convolutional neural network (CNN) layer, a bidirectional long short-term memory (BLSTM) layer, and a conditional random field (CRF) layer; the first fully connected layer performs a linear operation on the entity parts of the word vectors, and the second fully connected layer performs a linear operation on the relation parts of the word vectors;
and for a sentence whose named entities are to be recognized, converting each word contained in the sentence into a corresponding word vector by using the knowledge graph and the one-hot representation method, and inputting the word vectors into the trained named entity recognition model for entity recognition processing to obtain an entity recognition result for each word in the sentence.
Preferably, converting a word into its corresponding word vector comprises:
for each word, acquiring a specified number M of out-degree paths and M in-degree paths of the word in a preset knowledge graph according to a preset path acquisition principle;
for each path, generating a vector for each entity and each relation on the path by the one-hot representation method;
for each word, concatenating all vectors corresponding to the word according to a preset concatenation principle to obtain the word vector corresponding to the word, wherein the concatenation principle comprises: for each path, concatenating the vectors of the entities and relations on the path in the order in which those entities and relations appear on the path.
Preferably, the path acquisition principle comprises:
when the word has a corresponding entity in the knowledge graph, if the length of an out-degree path of the corresponding entity is smaller than a preset length N, the path is expanded to length N by a path expansion method, which comprises: appending N-L expansion entities at the tail of the path and setting each newly added relation between adjacent entities to "equal", where L is the length of the path before expansion and each expansion entity is a copy of the last entity at the tail of the path before expansion; if the length of an in-degree path of the corresponding entity is less than N, the path is likewise expanded to length N by the path expansion method;
when the word has a corresponding entity in the knowledge graph, if the number K_o of out-degree paths of the corresponding entity is less than the preset number M, the entity is taken as a base node and M-K_o new out-degree paths are generated for it by a path generation method, which comprises: appending N-1 entities identical to the base node after the base node to obtain a path, and setting the relations between adjacent entities on the path to "equal"; if the number K_i of in-degree paths of the corresponding entity is less than M, the entity is taken as the base node and M-K_i new in-degree paths are generated for it by the path generation method;
when the word has a corresponding entity in the knowledge graph, if the number of out-degree paths of the corresponding entity is greater than M, the out-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M out-degree paths are selected from the sorting result; if the number of in-degree paths of the corresponding entity is greater than M, the in-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M in-degree paths are selected from the sorting result;
and when the word has no corresponding entity in the knowledge graph, an entity is constructed for the word in the knowledge graph and, taking the constructed entity as the base node, M in-degree paths and M out-degree paths are established for the word by the path generation method.
Preferably, N is 3 and M is 3.
Preferably, the CNN layer includes a plurality of CNNs.
A named entity recognition apparatus comprises a processor configured to:
training a named entity recognition model by using sample sentences in a preset training sample set, wherein each word contained in each sample sentence is converted into a corresponding word vector by using a preset knowledge graph and a one-hot representation method and is input into the named entity recognition model for training; the named entity recognition model consists, in order of data processing, of a first fully connected layer, a second fully connected layer, a convolutional neural network (CNN) layer, a bidirectional long short-term memory (BLSTM) layer, and a conditional random field (CRF) layer; the first fully connected layer performs a linear operation on the entity parts of the word vectors, and the second fully connected layer performs a linear operation on the relation parts of the word vectors;
and for a sentence whose named entities are to be recognized, converting each word contained in the sentence into a corresponding word vector by using the knowledge graph and the one-hot representation method, and inputting the word vectors into the trained named entity recognition model for entity recognition processing to obtain an entity recognition result for each word in the sentence.
Preferably, the processor is specifically configured to convert each word into its corresponding word vector by:
for each word, acquiring a specified number M of out-degree paths and M in-degree paths of the word in a preset knowledge graph according to a preset path acquisition principle;
for each path, generating a vector for each entity and each relation on the path by the one-hot representation method;
for each word, concatenating all vectors corresponding to the word according to a preset concatenation principle to obtain the word vector corresponding to the word, wherein the concatenation principle comprises: for each path, concatenating the vectors of the entities and relations on the path in the order in which those entities and relations appear on the path.
Preferably, the path acquisition principle comprises:
when the word has a corresponding entity in the knowledge graph, if the length of an out-degree path of the corresponding entity is smaller than a preset length N, the path is expanded to length N by a path expansion method, which comprises: appending N-L expansion entities at the tail of the path and setting each newly added relation between adjacent entities to "equal", where L is the length of the path before expansion and each expansion entity is a copy of the last entity at the tail of the path before expansion; if the length of an in-degree path of the corresponding entity is less than N, the path is likewise expanded to length N by the path expansion method;
when the word has a corresponding entity in the knowledge graph, if the number K_o of out-degree paths of the corresponding entity is less than the preset number M, the entity is taken as a base node and M-K_o new out-degree paths are generated for it by a path generation method, which comprises: appending N-1 entities identical to the base node after the base node to obtain a path, and setting the relations between adjacent entities on the path to "equal"; if the number K_i of in-degree paths of the corresponding entity is less than M, the entity is taken as the base node and M-K_i new in-degree paths are generated for it by the path generation method;
when the word has a corresponding entity in the knowledge graph, if the number of out-degree paths of the corresponding entity is greater than M, the out-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M out-degree paths are selected from the sorting result; if the number of in-degree paths of the corresponding entity is greater than M, the in-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M in-degree paths are selected from the sorting result;
and when the word has no corresponding entity in the knowledge graph, an entity is constructed for the word in the knowledge graph and, taking the constructed entity as the base node, M in-degree paths and M out-degree paths are established for the word by the path generation method.
Preferably, N is 3 and M is 3.
Preferably, the CNN layer includes a plurality of CNNs.
The present application also discloses a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the named entity recognition method as previously described.
The application also discloses an electronic device comprising the non-transitory computer-readable storage medium described above and a processor with access to that storage medium.
According to the technical scheme, word vectors are generated using a knowledge graph and the one-hot representation method, so the vector representation of a word is enriched by the knowledge graph's accurate description of entity attributes and relations, and the recognition accuracy of named entities can be improved.
Drawings
FIG. 1 is a diagram illustrating a neural network model structure for identifying named entities in the prior art;
FIG. 2 is a schematic diagram of the internal structure of the LSTM;
FIG. 3 is a flowchart illustrating a named entity recognition method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a named entity recognition model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below by referring to the accompanying drawings and examples.
The core idea of the invention is to accurately describe the attributes and relations of entities by means of the knowledge graph and to embed these attributes and relations as part of the features, thereby enriching the vector representation of words and improving named entity recognition accuracy, even for words not present in the knowledge graph.
Fig. 3 is a schematic flow chart of a named entity identification method according to an embodiment of the present invention, and as shown in fig. 3, the method includes:
Step 301: train the named entity recognition model using sample sentences in a preset training sample set.
Each word contained in each sample sentence is converted into a corresponding word vector by using a preset knowledge graph and a one-hot representation method and is input into the named entity recognition model for training. The named entity recognition model consists, in order of data processing, of a first fully connected layer, a second fully connected layer, a convolutional neural network (CNN) layer, a bidirectional long short-term memory (BLSTM) layer, and a conditional random field (CRF) layer; the first fully connected layer performs a linear operation on the entity parts of the word vectors, and the second fully connected layer performs a linear operation on the relation parts of the word vectors.
Step 301 trains the named entity recognition model. One essential difference from existing model training methods lies in how the word vectors are generated: each word contained in a sentence is converted into a corresponding word vector using a preset knowledge graph and a one-hot representation method. As described above, by accurately describing the attributes and relations of entities through the knowledge graph and embedding them as part of the features, the vector representation of words is enriched and the recognition accuracy of named entities, even for words not present in the knowledge graph, can be improved.
Unlike the existing named entity recognition model, the present application takes into account that word vectors generated by the above method are high-dimensional, sparse vectors, and simply stacking such features cannot represent each word accurately. The named entity recognition model adopted in the present application therefore adds two fully connected layers before the CNN layer, a first fully connected layer W1 and a second fully connected layer W2, which perform linear operations (i.e., y = Wx + b) to compress the dimensionality of the word vectors. FIG. 4 is a schematic diagram of the named entity recognition model structure adopted in the embodiment of the present invention; as shown in FIG. 4, the two fully connected layers W1 and W2 are placed before the CNN.
Specifically, during training, the first fully connected layer multiplies the entity parts of the word vectors: its weight matrix is applied to each entity part by matrix multiplication to compress the dimensionality of the entity vectors. The second fully connected layer then multiplies the relation parts: its weight matrix is applied to each relation part by matrix multiplication to compress the dimensionality of the relation vectors. In practical application, so that the first fully connected layer performs no linear operation on the relation-part data, the weights corresponding to the relation parts in its weight matrix may be set to 1.
How the linear operations of the first and second fully connected layers compress the dimensionality of the entity and relation parts of a word vector is known to those skilled in the art and is not described in detail here.
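A minimal sketch of these two compression layers, assuming the word vector arrives split into its one-hot entity slices and relation slices (the slice widths, output sizes, and PyTorch realization are assumptions of the sketch, not fixed by the patent):

```python
import torch
import torch.nn as nn

class W1W2Compressor(nn.Module):
    """First/second fully connected layers: y = Wx + b on each part of the word vector."""
    def __init__(self, ent_dim, rel_dim, ent_out=64, rel_out=16):
        super().__init__()
        self.w1 = nn.Linear(ent_dim, ent_out)  # W1: compresses each entity part
        self.w2 = nn.Linear(rel_dim, rel_out)  # W2: compresses each relation part

    def forward(self, ent_parts, rel_parts):
        # ent_parts: (batch, seq_len, n_entities, ent_dim) one-hot entity slices
        # rel_parts: (batch, seq_len, n_relations, rel_dim) one-hot relation slices
        e = self.w1(ent_parts).flatten(2)   # dense, low-dimensional entity features
        r = self.w2(rel_parts).flatten(2)   # dense, low-dimensional relation features
        return torch.cat([e, r], dim=-1)    # compressed word vector for the CNN layer
```

The compressed output would then pass through the CNN, BLSTM, and CRF layers exactly as in the baseline sketch above.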
The convolutional neural network layer receives the output of the second fully connected layer. Preferably, since the dimensionality of that output is still high, several convolutional layers may be used, with pooling operations reducing the dimensionality of the internal vectors; this lowers the computation of subsequent processing while extracting high-level features. The BLSTM layer and the CRF layer follow the CNN layer, and the CRF layer finally assigns an entity label to each input word. A loss function value is then computed from the entity recognition result of the CRF layer, and the parameters of the named entity recognition model are adjusted according to that value.
In practical application, the loss function of the named entity recognition model may be a cross-entropy loss function, but is not limited thereto; other existing loss functions may also be used. Parameters may be optimized with the Adam algorithm during training, and training may be stopped once the loss reaches the expected value.
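A sketch of such a training step under those choices; the model and data loader are assumed from the sketches above, and per-token cross entropy stands in for the CRF likelihood, as the text allows loss functions other than the CRF's own:

```python
import torch.nn as nn
import torch.optim as optim

def train(model, loader, num_tags, max_epochs=50, target_loss=0.05):
    """Optimize with Adam; stop once the average loss reaches the expected value."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for word_vecs, tags in loader:      # tags: (batch, seq_len) gold label ids
            scores = model(word_vecs)       # (batch, seq_len, num_tags) tag scores
            loss = criterion(scores.reshape(-1, num_tags), tags.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < target_loss:  # loss reached expectation
            return
```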
Step 302: for a sentence whose named entities are to be recognized, convert each word contained in the sentence into a corresponding word vector using the knowledge graph and the one-hot representation method, and input the word vectors into the trained named entity recognition model for entity recognition processing to obtain an entity recognition result for each word in the sentence.
Preferably, converting a word into its corresponding word vector comprises:
for each word, acquiring a specified number M of out-degree paths and M in-degree paths of the word in a preset knowledge graph according to a preset path acquisition principle;
for each path, generating a vector for each entity and each relation on the path by the one-hot representation method;
for each word, concatenating all vectors corresponding to the word according to a preset concatenation principle to obtain the word vector corresponding to the word, wherein the concatenation principle comprises: for each path, concatenating the vectors of the entities and relations on the path in the order in which those entities and relations appear on the path (a sketch of this conversion follows).
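A minimal sketch of this conversion, representing each path as a flat alternating list [entity, relation, entity, ...]; the get_paths helper and the two vocabularies are hypothetical stand-ins for the path acquisition principle described next:

```python
import numpy as np

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def word_to_vector(word, get_paths, entity_vocab, relation_vocab, M=3, N=3):
    """Concatenate the one-hot vectors of every entity and relation, in path order."""
    parts = []
    for path in get_paths(word, M, N):   # 2*M paths: M out-degree + M in-degree
        for i, name in enumerate(path):  # entities at even positions, relations at odd
            if i % 2 == 0:
                parts.append(one_hot(entity_vocab[name], len(entity_vocab)))
            else:
                parts.append(one_hot(relation_vocab[name], len(relation_vocab)))
    return np.concatenate(parts)         # fixed length: high-dimensional and sparse
```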
Preferably, the path acquisition principle comprises:
when the word has a corresponding entity in the knowledge graph, if the length of an out-degree path of the corresponding entity is smaller than a preset length N, the path is expanded to length N by a path expansion method, which comprises: appending N-L expansion entities at the tail of the path and setting each newly added relation between adjacent entities to "equal", where L is the length of the path before expansion and each expansion entity is a copy of the last entity at the tail of the path before expansion; if the length of an in-degree path of the corresponding entity is less than N, the path is likewise expanded to length N by the path expansion method;
when the word has a corresponding entity in the knowledge graph, if the number K_o of out-degree paths of the corresponding entity is less than the preset number M, the entity is taken as a base node and M-K_o new out-degree paths are generated for it by a path generation method, which comprises: appending N-1 entities identical to the base node after the base node to obtain a path, and setting the relations between adjacent entities on the path to "equal"; if the number K_i of in-degree paths of the corresponding entity is less than M, the entity is taken as the base node and M-K_i new in-degree paths are generated for it by the path generation method;
when the word has a corresponding entity in the knowledge graph, if the number of out-degree paths of the corresponding entity is greater than M, the out-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M out-degree paths are selected from the sorting result; if the number of in-degree paths of the corresponding entity is greater than M, the in-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M in-degree paths are selected from the sorting result;
and when the word has no corresponding entity in the knowledge graph, an entity is constructed for the word in the knowledge graph and, taking the constructed entity as the base node, M in-degree paths and M out-degree paths are established for the word by the path generation method. A sketch of this principle is given after the preferred values below.
Preferably, N is 3 and M is 3.
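Using the same flat-list path representation, a sketch of the acquisition principle for one direction (out-degree or in-degree) with the preferred N = 3 and M = 3; the "equal" relation label and the attr_count attribute-count table are assumptions of the sketch:

```python
def expand_path(path, N=3):
    """Path expansion: append copies of the tail entity, linked by an 'equal'
    relation, until the path holds N entities (2*N - 1 list items)."""
    while len(path) < 2 * N - 1:
        path = path + ["equal", path[-1]]
    return path

def self_path(node, N=3):
    """Path generation: N copies of the base node joined by 'equal' relations."""
    path = [node]
    for _ in range(N - 1):
        path += ["equal", node]
    return path

def acquire_paths(paths, node, attr_count, N=3, M=3):
    """Normalize one direction's paths to exactly M paths of N entities each."""
    paths = [expand_path(p, N) for p in paths]
    if len(paths) < M:        # too few paths: generate self-paths from the base node
        paths += [self_path(node, N) for _ in range(M - len(paths))]
    elif len(paths) > M:      # too many: keep the M paths whose entities carry
        # the largest total attribute count (descending sort, first M selected)
        paths.sort(key=lambda p: sum(attr_count[e] for e in p[0::2]), reverse=True)
        paths = paths[:M]
    return paths

# A word with no entity in the graph gets a newly constructed node and
# M pure self-paths in each direction: acquire_paths([], new_node, attr_count).
```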
Preferably, the CNN layer includes a plurality of CNNs.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a named entity recognition apparatus, comprising a processor configured to:
training a named entity recognition model by using sample sentences in a preset training sample set, wherein each word contained in each sample sentence is converted into a corresponding word vector by using a preset knowledge graph and a one-hot representation method and is input into the named entity recognition model for training; the named entity recognition model consists, in order of data processing, of a first fully connected layer, a second fully connected layer, a convolutional neural network (CNN) layer, a bidirectional long short-term memory (BLSTM) layer, and a conditional random field (CRF) layer; the first fully connected layer performs a linear operation on the entity parts of the word vectors, and the second fully connected layer performs a linear operation on the relation parts of the word vectors;
and for a sentence whose named entities are to be recognized, converting each word contained in the sentence into a corresponding word vector by using the knowledge graph and the one-hot representation method, and inputting the word vectors into the trained named entity recognition model for entity recognition processing to obtain an entity recognition result for each word in the sentence.
Preferably, the processor is specifically configured to convert each word into its corresponding word vector by:
for each word, acquiring a specified number M of out-degree paths and M in-degree paths of the word in a preset knowledge graph according to a preset path acquisition principle;
for each path, generating a vector for each entity and each relation on the path by the one-hot representation method;
for each word, concatenating all vectors corresponding to the word according to a preset concatenation principle to obtain the word vector corresponding to the word, wherein the concatenation principle comprises: for each path, concatenating the vectors of the entities and relations on the path in the order in which those entities and relations appear on the path.
Preferably, the path acquisition principle comprises:
when the word has a corresponding entity in the knowledge graph, if the length of an out-degree path of the corresponding entity is smaller than a preset length N, the path is expanded to length N by a path expansion method, which comprises: appending N-L expansion entities at the tail of the path and setting each newly added relation between adjacent entities to "equal", where L is the length of the path before expansion and each expansion entity is a copy of the last entity at the tail of the path before expansion; if the length of an in-degree path of the corresponding entity is less than N, the path is likewise expanded to length N by the path expansion method;
when the word has a corresponding entity in the knowledge graph, if the number K_o of out-degree paths of the corresponding entity is less than the preset number M, the entity is taken as a base node and M-K_o new out-degree paths are generated for it by a path generation method, which comprises: appending N-1 entities identical to the base node after the base node to obtain a path, and setting the relations between adjacent entities on the path to "equal"; if the number K_i of in-degree paths of the corresponding entity is less than M, the entity is taken as the base node and M-K_i new in-degree paths are generated for it by the path generation method;
when the word has a corresponding entity in the knowledge graph, if the number of out-degree paths of the corresponding entity is greater than M, the out-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M out-degree paths are selected from the sorting result; if the number of in-degree paths of the corresponding entity is greater than M, the in-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M in-degree paths are selected from the sorting result;
and when the word has no corresponding entity in the knowledge graph, an entity is constructed for the word in the knowledge graph and, taking the constructed entity as the base node, M in-degree paths and M out-degree paths are established for the word by the path generation method.
Preferably, N is 3 and M is 3.
Preferably, the CNN layer includes a plurality of CNNs.
The present application also provides a non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the named entity recognition method as previously described.
The present application also provides an electronic device comprising the non-transitory computer-readable storage medium described above and a processor with access to that storage medium.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (12)

1. A named entity recognition method, comprising:
training a named entity recognition model by using sample sentences in a preset training sample set, wherein each word contained in each sample sentence is converted into a corresponding word vector by using a preset knowledge graph and a one-hot representation method and is input into the named entity recognition model for training; the named entity recognition model consists, in order of data processing, of a first fully connected layer, a second fully connected layer, a convolutional neural network (CNN) layer, a bidirectional long short-term memory (BLSTM) layer, and a conditional random field (CRF) layer; the first fully connected layer performs a linear operation on the entity parts of the word vectors, and the second fully connected layer performs a linear operation on the relation parts of the word vectors;
and for a sentence whose named entities are to be recognized, converting each word contained in the sentence into a corresponding word vector by using the knowledge graph and the one-hot representation method, and inputting the word vectors into the trained named entity recognition model for entity recognition processing to obtain an entity recognition result for each word in the sentence.
2. The method of claim 1, wherein converting the word into its corresponding word vector comprises:
for each word, acquiring a specified number M of out-degree paths and M in-degree paths of the word in a preset knowledge graph according to a preset path acquisition principle;
for each path, generating a vector for each entity and each relation on the path by the one-hot representation method;
for each word, concatenating all vectors corresponding to the word according to a preset concatenation principle to obtain the word vector corresponding to the word, wherein the concatenation principle comprises: for each path, concatenating the vectors of the entities and relations on the path in the order in which those entities and relations appear on the path.
3. The method of claim 2, wherein the path acquisition principle comprises:
when the word has a corresponding entity in the knowledge graph, if the length of an out-degree path of the corresponding entity is smaller than a preset length N, the path is expanded to length N by a path expansion method, which comprises: appending N-L expansion entities at the tail of the path and setting each newly added relation between adjacent entities to "equal", where L is the length of the path before expansion and each expansion entity is a copy of the last entity at the tail of the path before expansion; if the length of an in-degree path of the corresponding entity is less than N, the path is likewise expanded to length N by the path expansion method;
when the word has a corresponding entity in the knowledge graph, if the number K_o of out-degree paths of the corresponding entity is less than the preset number M, the entity is taken as a base node and M-K_o new out-degree paths are generated for it by a path generation method, which comprises: appending N-1 entities identical to the base node after the base node to obtain a path, and setting the relations between adjacent entities on the path to "equal"; if the number K_i of in-degree paths of the corresponding entity is less than M, the entity is taken as the base node and M-K_i new in-degree paths are generated for it by the path generation method;
when the word has a corresponding entity in the knowledge graph, if the number of out-degree paths of the corresponding entity is greater than M, the out-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M out-degree paths are selected from the sorting result; if the number of in-degree paths of the corresponding entity is greater than M, the in-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M in-degree paths are selected from the sorting result;
and when the word has no corresponding entity in the knowledge graph, an entity is constructed for the word in the knowledge graph and, taking the constructed entity as the base node, M in-degree paths and M out-degree paths are established for the word by the path generation method.
4. The method of claim 3, wherein N is 3 and M is 3.
5. The method of claim 1, wherein the CNN layer includes a plurality of CNNs.
6. A named entity recognition apparatus comprising a processor, wherein the processor is configured to:
training a named entity recognition model by using sample sentences in a preset training sample set, wherein each word contained in each sample sentence is converted into a corresponding word vector by using a preset knowledge graph and a one-hot representation method and is input into the named entity recognition model for training; the named entity recognition model consists, in order of data processing, of a first fully connected layer, a second fully connected layer, a convolutional neural network (CNN) layer, a bidirectional long short-term memory (BLSTM) layer, and a conditional random field (CRF) layer; the first fully connected layer performs a linear operation on the entity parts of the word vectors, and the second fully connected layer performs a linear operation on the relation parts of the word vectors;
and for a sentence whose named entities are to be recognized, converting each word contained in the sentence into a corresponding word vector by using the knowledge graph and the one-hot representation method, and inputting the word vectors into the trained named entity recognition model for entity recognition processing to obtain an entity recognition result for each word in the sentence.
7. The apparatus of claim 6, wherein the processor is specifically configured to convert each word into its corresponding word vector by:
for each word, acquiring a specified number M of out-degree paths and M in-degree paths of the word in a preset knowledge graph according to a preset path acquisition principle;
for each path, generating a vector for each entity and each relation on the path by the one-hot representation method;
for each word, concatenating all vectors corresponding to the word according to a preset concatenation principle to obtain the word vector corresponding to the word, wherein the concatenation principle comprises: for each path, concatenating the vectors of the entities and relations on the path in the order in which those entities and relations appear on the path.
8. The apparatus of claim 7, wherein the path acquisition principle comprises:
when the word has a corresponding entity in the knowledge graph, if the length of an out-degree path of the corresponding entity is smaller than a preset length N, the path is expanded to length N by a path expansion method, which comprises: appending N-L expansion entities at the tail of the path and setting each newly added relation between adjacent entities to "equal", where L is the length of the path before expansion and each expansion entity is a copy of the last entity at the tail of the path before expansion; if the length of an in-degree path of the corresponding entity is less than N, the path is likewise expanded to length N by the path expansion method;
when the word has a corresponding entity in the knowledge graph, if the number K_o of out-degree paths of the corresponding entity is less than the preset number M, the entity is taken as a base node and M-K_o new out-degree paths are generated for it by a path generation method, which comprises: appending N-1 entities identical to the base node after the base node to obtain a path, and setting the relations between adjacent entities on the path to "equal"; if the number K_i of in-degree paths of the corresponding entity is less than M, the entity is taken as the base node and M-K_i new in-degree paths are generated for it by the path generation method;
when the word has a corresponding entity in the knowledge graph, if the number of out-degree paths of the corresponding entity is greater than M, the out-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M out-degree paths are selected from the sorting result; if the number of in-degree paths of the corresponding entity is greater than M, the in-degree paths of the entity are sorted in descending order of the sum of the attribute counts of all entities on each path, and the first M in-degree paths are selected from the sorting result;
and when the word has no corresponding entity in the knowledge graph, an entity is constructed for the word in the knowledge graph and, taking the constructed entity as the base node, M in-degree paths and M out-degree paths are established for the word by the path generation method.
9. The apparatus of claim 8, wherein N is 3 and M is 3.
10. The apparatus of claim 6, wherein the CNN layer includes a plurality of CNNs.
11. A non-transitory computer readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of the named entity recognition method according to any one of claims 1 to 5.
12. An electronic device comprising the non-transitory computer-readable storage medium of claim 11, and a processor having access to the non-transitory computer-readable storage medium.
CN201910292850.6A (priority date 2019-04-12, filing date 2019-04-12): Named entity identification method and device. Published as CN111832303A (en). Status: Withdrawn.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910292850.6A (published as CN111832303A) | 2019-04-12 | 2019-04-12 | Named entity identification method and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910292850.6A (published as CN111832303A) | 2019-04-12 | 2019-04-12 | Named entity identification method and device

Publications (1)

Publication Number | Publication Date
CN111832303A | 2020-10-27

Family

ID=72914277

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910292850.6A (CN111832303A, withdrawn) | Named entity identification method and device | 2019-04-12 | 2019-04-12

Country Status (1)

Country | Link
CN (1) | CN111832303A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112364655A * | 2020-10-30 | 2021-02-12 | 北京中科凡语科技有限公司 | Named entity recognition model establishing method and named entity recognition method
CN112364655B * | 2020-10-30 | 2021-08-24 | 北京中科凡语科技有限公司 | Named entity recognition model establishing method and named entity recognition method
CN113065349A * | 2021-03-15 | 2021-07-02 | 国网河北省电力有限公司 | Named entity recognition method based on conditional random field
CN113673245A * | 2021-07-15 | 2021-11-19 | 北京三快在线科技有限公司 | Entity identification method and device, electronic equipment and readable storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
WW01 | Invention patent application withdrawn after publication (application publication date: 20201027)