CN114638227A - Named entity identification method, device and storage medium - Google Patents

Named entity identification method, device and storage medium

Info

Publication number
CN114638227A
Authority
CN
China
Prior art keywords
token
words
word
training
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011477961.3A
Other languages
Chinese (zh)
Inventor
王惠欣
胡珉
高扬
李飞
黄河燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Beijing Institute of Technology BIT
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
Beijing Institute of Technology BIT
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, Beijing Institute of Technology BIT, China Mobile Communications Ltd Research Institute filed Critical China Mobile Communications Group Co Ltd
Priority to CN202011477961.3A priority Critical patent/CN114638227A/en
Publication of CN114638227A publication Critical patent/CN114638227A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Abstract

The invention discloses a named entity recognition method, device and storage medium. The named entity recognition method comprises the following steps: pre-training with a BERT model in two ways, with tokens taken as characters and as segmented words respectively; after the pre-training is completed, using the last Transformer layer of the output part, concatenating token supplementary information to each hidden-layer token at the last hidden layer; connecting a softmax classification layer in series above the last Transformer layer; after obtaining the character-based and word-based classification probabilities of each token, converting each token's word-based classification probability into character-based label probabilities; and, according to the character-based and word-based classification probabilities, taking, for each token, the highest value as the token's label value. The invention improves model performance, captures truly bidirectional context information, and supplements and encodes the entity slot so that this information is reasonably utilized.

Description

Named entity identification method, device and storage medium
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a named entity identification method, apparatus, and storage medium.
Background
NER (Named Entity Recognition) refers to recognizing entities with specific meaning in text or character strings, mainly names of people, places and organizations, proper nouns, and the like. Judging whether a named entity is correctly recognized involves two aspects: whether the entity's boundary is correct, and whether the entity's type is correctly labeled. English named entities carry a fairly obvious formal marker (the first letter of each word in the entity is capitalized), so entity boundary recognition is relatively easy and the focus of the task is determining the entity's category. Compared with English, the Chinese named entity recognition task is more complex, and recognizing entity boundaries is harder than the entity-class labeling subtask.
The existing named entity recognition methods mainly include rule-based methods and statistics-based methods.
Early named entity recognition was mostly rule-based, largely relying on rule templates constructed by linguists, choosing features such as statistical information, punctuation marks, keywords, indicator words, direction words, position words (e.g., word endings) and head words, and taking pattern-to-string matching as the main means. Such methods are mainly used in specific settings where the features generalize easily.
Statistics-based methods are, at root, classification methods: several classes of named entities are defined, and a model classifies the entities in the text. Two lines of thinking can be distinguished. One recognizes the boundaries of all named entities in the text and then classifies them. The other is sequence labeling: each word in the text may carry several candidate category labels, which correspond to the word's possible positions within various named entities; the NER task then becomes automatically labeling the sequence of words in the text, after which the labels are assembled to obtain the named entities, composed of several words, together with their categories. Among these, sequence labeling is the most effective and most widely used NER method. Typical methods include SVM (Support Vector Machine), ME (Maximum Entropy), HMM (Hidden Markov Model), CRF (Conditional Random Field), neural networks, and the like.
The prior art has the following defect: owing to inherent limitations in the principles of the various named entity recognition methods, existing recognition models suffer from an entity boundary problem.
Disclosure of Invention
The invention provides a named entity recognition method, device and storage medium, which are used to solve the entity boundary problem present in word-based named entity recognition models.
The invention provides the following technical scheme:
a named entity recognition method, comprising:
pre-training with a BERT model in two ways, with tokens taken as characters and as segmented words respectively;
after the pre-training is completed, using the last Transformer layer of the output part, concatenating token supplementary information to each hidden-layer token at the last hidden layer;
connecting a softmax classification layer in series above the last Transformer layer;
after obtaining the character-based and word-based classification probabilities of each token, converting each token's word-based classification probability into character-based label probabilities;
according to the character-based and word-based classification probabilities, taking, for each token, the highest value as the token's label value.
In an implementation, the Token supplementary information is an average of corresponding word vectors of one or a combination of the following information that can be collected: known entity definitions, known entity description information, structured knowledge-graph information corresponding to known entities.
In implementation, the embedding of the BERT model input is the sum of the following three representations: word representation, positional representation, segment representation.
In implementation, the vectorized representation of a word or character in the word representation is obtained as follows: the words or characters of the segmented corpus are used as a dictionary, and the corresponding word or character vectors are trained on the Chinese corpus to obtain the token's vectorized representation;
the position vector of the positional representation is obtained by embedding the position information to obtain a vector representation of the position;
the sentence vector of the segment representation is obtained as follows: for sentence-pair data, the Embedding of sentence A is added to every word of the first sentence and the Embedding of sentence B to every word of the second sentence.
In implementation, the pre-training is the pre-training of the Masked Language Model.
In the implementation, before the pre-training using the BERT model, the method further includes:
a binary classification model that judges whether two sentences generated from a Chinese corpus are consecutive (i.e., whether the second is the next sentence) is trained in advance.
In an implementation, the method further comprises the following steps:
jointly retraining the parameters obtained by the pre-training.
A named entity recognition apparatus comprising:
a processor for reading the program in the memory, performing the following processes:
pre-training with a BERT model in two ways, with tokens taken as characters and as segmented words respectively;
after the pre-training is completed, using the last Transformer layer of the output part, concatenating token supplementary information to each hidden-layer token at the last hidden layer;
connecting a softmax classification layer in series above the last Transformer layer;
after obtaining the character-based and word-based classification probabilities of each token, converting each token's word-based classification probability into character-based label probabilities;
according to the character-based and word-based classification probabilities, taking, for each token, the highest value as the token's label value;
a transceiver for receiving and transmitting data under the control of the processor.
In an implementation, the Token supplementary information is an average of corresponding word vectors of one or a combination of the following information that can be collected: known entity definitions, known entity description information, structured knowledge graph information corresponding to known entities.
In implementation, the embedding of the BERT model input is the sum of the following three representations: word representation, positional representation, segment representation.
In implementation, the vectorized representation of a word or character in the word representation is obtained as follows: the words or characters of the segmented corpus are used as a dictionary, and the corresponding word or character vectors are trained on the Chinese corpus to obtain the token's vectorized representation;
the position vector of the positional representation is obtained by embedding the position information to obtain a vector representation of the position;
the sentence vector of the segment representation is obtained as follows: for sentence-pair data, the Embedding of sentence A is added to every word of the first sentence and the Embedding of sentence B to every word of the second sentence.
In implementation, the pre-training is the pre-training of the Masked Language Model.
In the implementation, before the pre-training using the BERT model, the method further includes:
a binary classification model that judges whether two sentences generated from a Chinese corpus are consecutive (i.e., whether the second is the next sentence) is trained in advance.
In an implementation, the method further comprises the following steps:
jointly retraining the parameters obtained by the pre-training.
A named entity recognition apparatus comprising:
the pre-training module, configured to pre-train with a BERT model in two ways, with tokens taken as characters and as segmented words respectively;
the Transformer module, configured to, after the pre-training is completed, use the last Transformer layer of the output part to concatenate token supplementary information to each hidden-layer token at the last hidden layer;
the softmax module, configured to connect a softmax classification layer in series above the last Transformer layer;
the probability module, configured to convert each token's word-based classification probability into character-based label probabilities after obtaining the character-based and word-based classification probabilities of each token;
and the label module, configured to take, for each token, the highest of the character-based and word-based classification probabilities as the token's label value.
In an implementation, the Transformer module is further configured to take, as the token supplementary information, the average of the word vectors corresponding to one or a combination of the following collectable information: known entity definitions, known entity description information, and structured knowledge-graph information corresponding to known entities.
In an implementation, the pre-training module is further configured such that the embedding of the BERT model input is the sum of the following representations: word representation, positional representation, segment representation.
In an implementation, the pre-training module is further configured such that, in the embedding of the BERT model input:
the vectorized representation of a word or character in the word representation is obtained as follows: the words or characters of the segmented corpus are used as a dictionary, and the corresponding word or character vectors are trained on the Chinese corpus to obtain the token's vectorized representation;
the position vector of the positional representation is obtained by embedding the position information to obtain a vector representation of the position;
the sentence vector of the segment representation is obtained as follows: for sentence-pair data, the Embedding of sentence A is added to every word of the first sentence and the Embedding of sentence B to every word of the second sentence.
In implementation, the pre-training module is further used for pre-training with the Masked Language Model.
In an implementation, the pre-training module is further configured to train in advance, before pre-training with the BERT model, a binary classification model that judges whether two sentences generated from a Chinese corpus are consecutive (next-sentence prediction).
In implementation, the pre-training module is further configured to perform joint retraining on the parameters obtained by the pre-training.
A computer-readable storage medium having stored thereon a computer program for executing the above named entity recognition method.
The invention has the following beneficial effects:
in the technical scheme provided by the embodiments of the invention, because the token is pre-trained with the BERT model in two ways, as characters and as segmented words, the character- and word-based BERT model reduces the forward propagation of word-segmentation errors on unregistered (out-of-vocabulary) entities and addresses the entity boundary problem of character-based named entity recognition models, thereby improving model performance;
because a Transformer is used, the method is more efficient and captures longer-range dependencies than an RNN, and, compared with previous pre-training models, it captures truly bidirectional context information;
for the entity boundary problem of the character-based BI-LSTM-CRF model, determining the entity boundary through character-word fusion and adjustment further improves model performance;
because token supplementary information can be concatenated to each hidden-layer token at the last hidden layer, external supplementary information such as entity definitions, entity description information, and structured knowledge-graph information corresponding to the entity can be used to supplement and encode the entity slot, so this information is reasonably utilized;
furthermore, because the parameters obtained by pre-training are jointly retrained, the model can be adapted from the pre-trained model, which mitigates the difficulty existing entity recognition methods have in obtaining a good model when training data are scarce.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow chart illustrating an implementation of a named entity recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a BERT model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a pre-training model according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a named entity recognition apparatus 1 according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a named entity recognition apparatus 2 according to an embodiment of the present invention.
Detailed Description
The inventor notices in the process of invention that:
the existing named entity identification method mainly comprises the following steps: rule-based methods, statistical-based methods. The early named entity recognition is mostly based on a rule method, most of which adopts a rule template constructed by linguists, selects methods with characteristics including statistical information, punctuation marks, keywords, indicator words, direction words, position words (such as tails), central words and the like, and takes matching of a mode and a character string as a main means. While the extracted rules reflect linguistic phenomena more accurately, rule-based methods perform better than statistical-based methods. However, these rules often depend on specific languages, domains and text styles, are time-consuming in programming, are difficult to cover all linguistic phenomena, are particularly prone to errors, are not well portable, and require linguistic experts to rewrite the rules for different systems. Therefore, the method is mainly used in specific occasions where some features are easy to generalize.
Statistics-based methods are, at root, classification methods: several classes of named entities are defined, and a model classifies the entities in the text. Two lines of thinking can be distinguished. One recognizes the boundaries of all named entities in the text and then classifies them. The other is sequence labeling: each word in the text may carry several candidate category labels, which correspond to the word's possible positions within various named entities; the NER task then becomes automatically labeling the sequence of words in the text, after which the labels are assembled to obtain the named entities, composed of several words, together with their categories. Among these, sequence labeling is the most effective and most widely used NER method. Typical methods include SVM (Support Vector Machine), ME (Maximum Entropy), HMM (Hidden Markov Model), CRF (Conditional Random Field), neural networks, and the like.
The HMM is a probabilistic directed graph model built on two assumptions, the first-order Markov assumption and the observation-independence assumption. The first-order Markov assumption states that y_t at the current time is generated from y_{t-1} at the previous time, so the model can only exploit the preceding context, not the following context, which limits its expressive power. The CRF is a probabilistic undirected graph model; it requires defining feature templates, which are matched by scanning the whole sentence, and the features of the whole sequence are obtained as a linear weighted combination of local features. It cannot flexibly exploit long-range context, though it can flexibly use local context features; moreover, its model form can cover the parameters of the HMM, so it is more expressive than the HMM. Among neural network methods, the bidirectional Long Short-Term Memory network (BI-LSTM) is very strong at sequence modeling: it captures long-range context and has the nonlinear fitting capacity of a neural network, which conventional methods cannot match. However, it lacks entity boundary features, errs easily when determining entity boundaries, and does not reasonably exploit external supplementary information such as entity definitions. In addition, such methods need more labeled training data, and with little training data the network is hard to train well.
The drawback of the prior art is that, owing to inherent limitations in the principles of these methods, the existing recognition methods are prone to recognition errors.
As mentioned above, the existing Chinese named entity recognition methods mainly have the following problems:
1. Rule-based methods depend on the specific language, domain and text style; writing the rules is time-consuming and can hardly cover all linguistic phenomena; they are particularly error-prone, the system is poorly portable, and linguistic experts must rewrite the rules for different systems.
2. A word-based BI-LSTM-CRF model must first segment the text or character string with a word segmentation tool. Segmentation tends to err on out-of-vocabulary words, and names of people, places and organizations and proper nouns are typically out-of-vocabulary, so the segmentation errors propagate forward and harm the model's performance.
3. Although the character-based BI-LSTM-CRF model performs better on Chinese named entity recognition than the word-based BI-LSTM-CRF model, it lacks entity boundary information and is very prone to entity-boundary errors.
4. External supplementary information such as entity definitions, entity description information and knowledge-graph information corresponding to the entity is not reasonably utilized; the structured entity information in the knowledge graph is not used.
5. Existing entity recognition methods need a relatively large amount of labeled training data; with little training data the network is hard to train well and an ideal effect is hard to obtain.
Based on this, the technical solutions provided in the embodiments of the present invention will solve at least one of the above problems, and the following describes a specific embodiment of the present invention with reference to the drawings.
Fig. 1 is a schematic flow chart of an implementation of the named entity identification method, as shown in the figure, the implementation may include:
Step 101: pre-training with a BERT model in two ways, with tokens taken as characters and as segmented words respectively;
Step 102: after the pre-training is completed, using the last Transformer layer of the output part, concatenating token supplementary information to each hidden-layer token at the last hidden layer;
Step 103: connecting a softmax classification layer in series above the last Transformer layer;
Step 104: after obtaining the character-based and word-based classification probabilities of each token, converting each token's word-based classification probability into character-based label probabilities;
Step 105: according to the character-based and word-based classification probabilities, taking, for each token, the highest value as the token's label value.
First, the implementation of the BERT model structure will be explained.
Fig. 2 is a schematic structural diagram of the BERT model. As shown in the figure, the model is pre-trained with BERT in two ways, with tokens taken as characters and as segmented words; after the pre-training is completed, using the last Transformer layer of the output part, token supplementary information is concatenated to each hidden-layer token at the last hidden layer. A softmax (soft maximum) classification layer is connected in series above the positions of the last layer. After the character-based and word-based classification probabilities of each token are obtained, each token's word-based classification probability is converted into character-based label probabilities; finally, the character-based and word-based classification probabilities are compared, and the highest value is taken as the token's label value.
BERT is a pre-trained model. Suppose a training set A exists: the network is pre-trained on A, and the network parameters learned on task A are saved for later use. When a new task B arrives, the same network structure is adopted; when initializing the network, the parameters learned on A can be loaded while the other, higher layers are initialized randomly, and the network is then trained with task B's training data. If the loaded parameters are kept unchanged during this training, the approach is called "freeze"; if the loaded parameters keep changing as task B trains, it is called "fine-tuning", i.e., the parameters are further adjusted to better suit the current task B.
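As a minimal illustrative sketch (not part of the patent text; the Hugging Face transformers library and the public bert-base-chinese checkpoint are assumptions made here), the freeze/fine-tuning distinction can be expressed as follows:

    from transformers import BertModel

    # Load the parameters learned during pre-training (task A).
    bert = BertModel.from_pretrained("bert-base-chinese")  # assumed checkpoint

    # "freeze": the loaded parameters stay fixed while only task B's new layers train.
    for p in bert.parameters():
        p.requires_grad = False

    # "fine-tuning": the loaded parameters keep updating on task B's training data,
    # adjusting them to better suit the current task B.
    for p in bert.parameters():
        p.requires_grad = True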
The core of this model is the attention mechanism: for a sentence, multiple points of focus can be active at the same time, without being limited to front-to-back or back-to-front sequential processing. Not only must the structure of the model be chosen correctly, its parameters must also be trained correctly so that the model can accurately understand the semantics of a sentence. BERT uses two steps to train the model's parameters correctly.
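A minimal sketch of the scaled dot-product attention that underlies this mechanism (the standard Transformer formulation; the implementation below is illustrative and not taken from the patent):

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v):
        # q, k, v: (batch, seq_len, d_k). Every position attends to every other
        # position in parallel, so context flows in both directions at once.
        d_k = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
        weights = F.softmax(scores, dim=-1)   # one attention distribution per position
        return torch.matmul(weights, v)       # weighted sum of the value vectors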
The first step covers 15% of the tokens in an article and asks the model to predict the covered words from the full surrounding context. Suppose there are 10,000 articles with an average of 100 words each, and 15% of the words are randomly covered; the model's task is then to correctly predict these 150,000 covered words. By predicting the occluded vocabulary from both directions, the parameters of the Transformer model are preliminarily trained.
The second step then continues to train the model's parameters. For example, from those 10,000 articles, 200,000 pairs of sentences, 400,000 sentences in total, are selected. When choosing the sentence pairs, half of them are two consecutive context sentences and the other half are not consecutive. The Transformer model is then asked to judge, for these 200,000 pairs, which are consecutive and which are not.
These two training steps together are called pre-training. After training, the Transformer model, including its parameters, constitutes the general-purpose language representation model.
The token is pre-trained in two modes, characters and segmented words; after the pre-training is completed, using the last Transformer layer of the output part, token supplementary information is concatenated to each hidden-layer token at the last hidden layer. A softmax classification layer is connected in series above the positions of the last layer. After the character-based and word-based classification probabilities of each token are obtained, each token's word-based classification probability is converted into character-based label probabilities; finally, the character-based and word-based classification probabilities are compared, and the highest value is taken as the token's label value.
In implementation, the Token supplementary information is an average of corresponding word vectors of one or a combination of the following information that can be collected: known entity definitions, known entity description information, structured knowledge-graph information corresponding to known entities.
Specifically, the Token supplementary information is the average of word vectors corresponding to the collected known entity definitions, entity description information, and structured knowledge graph information corresponding to the entities.
The following describes an input implementation of the model.
In implementation, the embedding of the BERT model input is the sum of the following three representations: word representation, positional representation, segment representation.
In a specific implementation, the vectorized representation of a word or character in the word representation is obtained as follows: the words or characters of the segmented corpus are used as a dictionary, and the corresponding word or character vectors are trained on the Chinese corpus to obtain the token's vectorized representation;
the position vector of the positional representation is obtained by embedding the position information to obtain a vector representation of the position;
the sentence vector of the segment representation is obtained as follows: for sentence-pair data, the Embedding of sentence A is added to every word of the first sentence and the Embedding of sentence B to every word of the second sentence.
The token is pre-trained in both the character mode and the segmented-word mode. All available Chinese corpora are used for pre-training.
FIG. 3 is a schematic structural diagram of the pre-training model. As shown in the figure, the embedding of the model input is the sum of three representations, detailed below (a code sketch follows the list): word representation, position representation, segment representation.
1) Word representation (word/character vectorization): the words or characters of the segmented corpus are used as a dictionary, and the corresponding word or character vectors, i.e., word embeddings (converting the numerically indexed token into a fixed-size vector), are trained on the Chinese corpus to obtain the token's vectorized representation.
2) Positional representation (position vector): the position information is likewise embedded to obtain a vector representation of the position. The sequence length is at most 512.
3) Segment representation (sentence vector): for sentence-pair data, the Embedding of sentence A is added to every word of the first sentence and the Embedding of sentence B to every word of the second sentence.
4) The beginning of a sentence is marked with [CLS], the end with [SEP], and the boundary between the two sentences in a sentence pair is also marked with [SEP].
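A minimal sketch of this three-way summation (the vocabulary size, hidden size and segment count below are placeholders, not values given in the patent):

    import torch
    import torch.nn as nn

    class BertInputEmbedding(nn.Module):
        def __init__(self, vocab_size=21128, hidden=768, max_len=512, n_segments=2):
            super().__init__()
            self.token = nn.Embedding(vocab_size, hidden)     # word/character representation
            self.position = nn.Embedding(max_len, hidden)     # position representation (sequence length <= 512)
            self.segment = nn.Embedding(n_segments, hidden)   # sentence A / sentence B

        def forward(self, token_ids, segment_ids):
            # token_ids, segment_ids: (batch, seq_len); the three embeddings are summed element-wise.
            positions = torch.arange(token_ids.size(1), device=token_ids.device)
            return (self.token(token_ids)
                    + self.position(positions).unsqueeze(0)
                    + self.segment(segment_ids))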
The implementation of model pre-training is described below.
In implementation, the pre-training is the pre-training of the Masked Language Model.
Specifically, the following may be mentioned:
Task #1: Masked LM
To train bidirectional features, the Masked Language Model pre-training method randomly masks some tokens in a sentence (e.g., 15%) and then trains the model to predict the removed tokens. The vector produced for the hidden-layer token at the last hidden layer is fed into softmax to compute a probability over every word in the dictionary.
The specific operation is as follows:
15% of the tokens in the corpus are randomly masked, and the final hidden vectors at the masked positions are fed into softmax to predict the original tokens.
Always replacing the selected token with the marker [MASK] would bias the model, so the following strategy is adopted when masking at random (a sketch follows the examples):
1) 80% of the words are replaced by the [MASK] token:
yesterday saw the movie Avatar → yesterday saw the movie [MASK].
2) 10% of the words are replaced by an arbitrary word:
yesterday saw the movie Avatar → yesterday saw the movie Green Book.
3) 10% of the words are left unchanged:
yesterday saw the movie Avatar → yesterday saw the movie Avatar.
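A minimal sketch of this masking strategy (the token list, vocabulary and [MASK] symbol handling are assumptions; the patent gives no implementation):

    import random

    def mask_tokens(tokens, vocab, mask_rate=0.15, mask_symbol="[MASK]"):
        # Randomly select about 15% of positions; of those, replace 80% with [MASK],
        # 10% with an arbitrary word, and leave 10% unchanged. The model must predict
        # the original token at every selected position.
        targets = {}
        for i, tok in enumerate(tokens):
            if random.random() >= mask_rate:
                continue
            targets[i] = tok
            r = random.random()
            if r < 0.8:
                tokens[i] = mask_symbol            # 80%: replace with [MASK]
            elif r < 0.9:
                tokens[i] = random.choice(vocab)   # 10%: replace with an arbitrary word
            # remaining 10%: keep the original token
        return tokens, targets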
Task 2 #: prediction of next sentence
In implementation, before the pre-training with the BERT model, the method may further include:
training in advance a binary classification model that judges whether two sentences generated from a Chinese corpus are consecutive (i.e., whether the second is the next sentence).
Specifically, in order to train a model that understands long-range context and the relationship between sentences, such a binary next-sentence classification model over sentences generated from a Chinese corpus may be trained in advance, for example as follows:
First prepare a training set of pairs (sentence A, sentence B), where in 50% of cases B is the next sentence of A and in 50% of cases B is another sentence drawn at random from the rest of the corpus. The label is the relationship between the two sentences (next / not next); the beginning of a sentence is marked with [CLS], the end with [SEP], and the boundary between the two sentences in a pair is also marked with [SEP] (a construction sketch follows the examples). For example:
Input = [CLS] yesterday watched a movie [MASK] [SEP]
Microsoft released the robot XiaoIce [MASK] [SEP]
Label = NotNext
Input = [CLS] yesterday watched a movie [MASK] [SEP]
[MASK] really liked it [SEP]
Label = IsNext
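A minimal sketch of constructing such training pairs (the corpus layout and character-level tokenization are assumptions):

    import random

    def make_nsp_example(sentences, i):
        # sentences: ordered list of sentences from one document.
        # 50% of the time B is the true next sentence (IsNext); otherwise B is a
        # sentence drawn at random from the corpus (a stricter version would exclude
        # the true next sentence), labeled NotNext.
        a = sentences[i]
        if random.random() < 0.5 and i + 1 < len(sentences):
            b, label = sentences[i + 1], "IsNext"
        else:
            b, label = random.choice(sentences), "NotNext"
        tokens = ["[CLS]"] + list(a) + ["[SEP]"] + list(b) + ["[SEP]"]
        return tokens, label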
The following describes the implementation of the fine-tuning model.
The slot information, i.e., the token supplementary information, may be the average of the word vectors corresponding to the known entity definitions, entity description information, and structured knowledge-graph information corresponding to the entity that can be collected. For example, for "time to watch a movie", the token supplementary information feature is the coding vector for the token obtained by averaging the word vectors of all words in the describing sentence.
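A minimal sketch of computing the supplementary vector by averaging word vectors (the word-vector lookup table and its dimension are placeholders):

    import numpy as np

    def supplementary_vector(description_words, word_vectors, dim=300):
        # Average the vectors of all words in the collected description text
        # (entity definition, entity description, or knowledge-graph facts about the entity).
        vecs = [word_vectors[w] for w in description_words if w in word_vectors]
        if not vecs:
            return np.zeros(dim)          # nothing collected: fall back to a zero vector
        return np.mean(vecs, axis=0)      # this average is the token supplementary information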
Each word's corresponding position in the last Transformer layer of the output part is classified. For the sequence-level classification problem, a layer W (K × H) is added on top of the original BERT model, where K is the number of classes to be predicted and H is the output dimension of the Transformer's last layer; a softmax layer then predicts the class probability P (K-dimensional) = softmax(C W^T).
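A minimal sketch of the added layer (the concatenation with the supplementary vector and the dimensions below are assumptions made for illustration):

    import torch
    import torch.nn as nn

    class TokenClassifier(nn.Module):
        # W has shape (K, H'): K label classes, H' = Transformer output dimension
        # plus the size of the spliced token supplementary vector.
        def __init__(self, hidden=768, supp_dim=300, num_labels=7):
            super().__init__()
            self.w = nn.Linear(hidden + supp_dim, num_labels, bias=False)

        def forward(self, last_hidden, supp):
            c = torch.cat([last_hidden, supp], dim=-1)    # splice supplementary info onto the last hidden layer
            return torch.softmax(self.w(c), dim=-1)       # P = softmax(C W^T), per token

The joint retraining described next would then optimize the pre-trained BERT parameters and this new W together, e.g. with a single optimizer over all parameters.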
In the implementation, the method can further comprise the following steps:
jointly retraining the parameters obtained by the pre-training.
All parameters, including BERT's original pre-trained parameters and the new W parameters, are jointly retrained, with the goal of reducing the distance between the probability predicted by the model and the true probability.
The following describes the implementation of label adjustment.
After the character-based and word-based classification probabilities of each token are obtained, each token's word-based classification probability is converted into character-based label probabilities. For example, the word 小明 labeled B-PER with probability p1 is converted to 小: B-PER with probability p1 and 明: I-PER with probability p1.
Finally, the character-based and word-based classification probabilities are compared, and the highest value is taken as the token's label value.
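A minimal sketch of this conversion and fusion (the BIO label scheme and the (character, label, probability) layout are assumptions):

    def word_to_char_probs(word, label, prob):
        # Spread a word-level label probability onto the word's characters: the first
        # character keeps the B-* label, the rest become I-*, all with the same probability.
        out = []
        for i, ch in enumerate(word):
            tag = "I-" + label[2:] if label.startswith("B-") and i > 0 else label
            out.append((ch, tag, prob))
        return out

    def fuse(char_pred, word_pred):
        # Per character, keep whichever of the character-based and (converted) word-based
        # predictions has the higher probability.
        return [c if c[2] >= w[2] else w for c, w in zip(char_pred, word_pred)]

    # Example: word_to_char_probs("小明", "B-PER", 0.9)
    # -> [("小", "B-PER", 0.9), ("明", "I-PER", 0.9)]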
TABLE 1 Part-of-speech category comparison
Part of speech    Meaning
n                 Noun
nr                Person name
ns                Place name
nt                Organization name
nz                Other proper noun
Based on the same inventive concept, the embodiment of the present invention further provides a named entity recognition apparatus and a computer-readable storage medium, and because the principles of these apparatuses for solving the problems are similar to the named entity recognition method, the implementation of these apparatuses can refer to the implementation of the method, and the repeated details are not repeated.
When the technical scheme provided by the embodiment of the invention is implemented, the implementation can be carried out as follows.
Fig. 4 is a schematic diagram of a named entity recognition apparatus 1, as shown in the figure, the apparatus includes:
the processor 400, which is used to read the program in the memory 420, executes the following processes:
pre-training with a BERT model in two ways, with tokens taken as characters and as segmented words respectively;
after the pre-training is completed, using the last Transformer layer of the output part, concatenating token supplementary information to each hidden-layer token at the last hidden layer;
connecting a softmax classification layer in series above the last Transformer layer;
after obtaining the character-based and word-based classification probabilities of each token, converting each token's word-based classification probability into character-based label probabilities;
according to the character-based and word-based classification probabilities, taking, for each token, the highest value as the token's label value;
a transceiver 410 for receiving and transmitting data under the control of the processor 400.
In an implementation, the Token supplementary information is an average of corresponding word vectors of one or a combination of the following information that can be collected: known entity definitions, known entity description information, structured knowledge-graph information corresponding to known entities.
In implementation, the embedding of the BERT model input is the sum of the following three representations: word representation, positional representation, segment representation.
In implementation, the vectorized representation of a word or character in the word representation is obtained as follows: the words or characters of the segmented corpus are used as a dictionary, and the corresponding word or character vectors are trained on the Chinese corpus to obtain the token's vectorized representation;
the position vector of the positional representation is obtained by embedding the position information to obtain a vector representation of the position;
the sentence vector of the segment representation is obtained as follows: for sentence-pair data, the Embedding of sentence A is added to every word of the first sentence and the Embedding of sentence B to every word of the second sentence.
In implementation, the pre-training is the pre-training of the Masked Language Model.
In the implementation, before the pre-training using the BERT model, the method further includes:
a binary classification model that judges whether two sentences generated from a Chinese corpus are consecutive (i.e., whether the second is the next sentence) is trained in advance.
In an implementation, the method further comprises the following steps:
jointly retraining the parameters obtained by the pre-training.
Where in fig. 4, the bus architecture may include any number of interconnected buses and bridges, with various circuits of one or more processors, represented by processor 400, and memory, represented by memory 420, being linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface. The transceiver 410 may be a number of elements including a transmitter and a receiver that provide a means for communicating with various other apparatus over a transmission medium. The processor 400 is responsible for managing the bus architecture and general processing, and the memory 420 may store data used by the processor 400 in performing operations.
Fig. 5 is a schematic diagram of a named entity recognition apparatus 2, as shown in the figure, the apparatus includes:
the pre-training module 501, configured to pre-train with a BERT model in two ways, with tokens taken as characters and as segmented words respectively;
the Transformer module 502, configured to, after the pre-training is completed, use the last Transformer layer of the output part to concatenate token supplementary information to each hidden-layer token at the last hidden layer;
the softmax module 503, configured to connect a softmax classification layer in series above the last Transformer layer;
the probability module 504, configured to convert each token's word-based classification probability into character-based label probabilities after obtaining the character-based and word-based classification probabilities of each token;
and the label module 505, configured to take, for each token, the highest of the character-based and word-based classification probabilities as the token's label value.
In an implementation, the Transformer module is further configured to take, as the token supplementary information, the average of the word vectors corresponding to one or a combination of the following collectable information: known entity definitions, known entity description information, and structured knowledge-graph information corresponding to known entities.
In an implementation, the pre-training module is further configured such that the embedding of the BERT model input is the sum of the following representations: word representation, positional representation, segment representation.
In an implementation, the pre-training module is further configured such that, in the embedding of the BERT model input:
the vectorized representation of a word or character in the word representation is obtained as follows: the words or characters of the segmented corpus are used as a dictionary, and the corresponding word or character vectors are trained on the Chinese corpus to obtain the token's vectorized representation;
the position vector of the positional representation is obtained by embedding the position information to obtain a vector representation of the position;
the sentence vector of the segment representation is obtained as follows: for sentence-pair data, the Embedding of sentence A is added to every word of the first sentence and the Embedding of sentence B to every word of the second sentence.
In implementation, the pre-training module is further used for pre-training with the Masked Language Model.
In an implementation, the pre-training module is further configured to train in advance, before pre-training with the BERT model, a binary classification model that judges whether two sentences generated from a Chinese corpus are consecutive (next-sentence prediction).
In implementation, the pre-training module is further configured to perform joint retraining on the parameters obtained by the pre-training.
For convenience of description, each part of the above-described apparatus is separately described as being functionally divided into various modules or units. Of course, the functionality of the various modules or units may be implemented in the same one or more pieces of software or hardware in the practice of the invention.
The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program for executing the named entity identification method.
The specific implementation can be seen in the implementation of the named entity recognition method.
In summary, in the technical solution provided by the embodiments of the present invention, for the entity boundary problem of the character-based BI-LSTM-CRF model, a character-word fusion and adjustment approach is adopted to help optimize the entity boundary, thereby further improving model performance.
External supplementary information such as entity definitions, entity description information, and structured knowledge-graph information corresponding to the entity is used to supplement and encode the entity slot so that it is reasonably utilized.
The method is more efficient, captures longer-range dependencies, and captures truly bidirectional context information.
The scheme of pre-training the model and then adjusting (fine-tuning) it addresses the difficulty existing entity recognition methods have in obtaining a good model when training data are scarce.
Through the character- and word-based BERT (Bidirectional Encoder Representation based on Transformer) model, the forward propagation of word segmentation errors on unregistered entities is reduced, improving model performance; and because the Transformer is used, the method is more efficient and captures longer-range dependencies than an RNN (Recurrent Neural Network).
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A named entity recognition method, comprising:
pre-training with a Bidirectional Encoder Representation based on Transformer (BERT) model in two ways, with tokens taken as characters and as segmented words respectively;
after the pre-training is completed, using the last Transformer layer of the output part, concatenating token supplementary information to each hidden-layer token at the last hidden layer;
connecting a soft maximum (softmax) classification layer in series above the last Transformer layer;
after obtaining the character-based and word-based classification probabilities of each token, converting each token's word-based classification probability into character-based label probabilities;
according to the character-based and word-based classification probabilities, taking, for each token, the highest value as the token's label value.
2. The method of claim 1, wherein the Token supplemental information is an average of corresponding word vectors that can be collected of one or a combination of the following: known entity definitions, known entity description information, structured knowledge graph information corresponding to known entities.
3. The method of claim 1, wherein the embedding of the BERT model input is the sum of the following representations: word representation, positional representation, and segment representation.
4. The method of claim 3, wherein the vectorized representation of a word or character in the word representation is obtained as follows: the words or characters of the segmented corpus are used as a dictionary, and the corresponding word or character vectors are trained on the Chinese corpus to obtain the token's vectorized representation;
the position vector of the positional representation is obtained by embedding the position information to obtain a vector representation of the position;
the sentence vector of the segment representation is obtained as follows: for sentence-pair data, the Embedding of sentence A is added to every word of the first sentence and the Embedding of sentence B to every word of the second sentence.
5. The method of claim 1, wherein the pre-training is pre-training of a Masked Language Model.
6. The method of claim 1, wherein prior to pre-training using the BERT model, further comprising:
a binary classification model that judges whether two sentences generated from a Chinese corpus are consecutive (i.e., whether the second is the next sentence) is trained in advance.
7. The method of claim 1, further comprising:
jointly retraining the parameters obtained by the pre-training.
8. A named entity recognition apparatus, comprising:
a processor for reading the program in the memory, performing the following processes:
pre-training with a BERT model in two ways, with tokens taken as characters and as segmented words respectively;
after the pre-training is completed, using the last Transformer layer of the output part, concatenating token supplementary information to each hidden-layer token at the last hidden layer;
connecting a softmax classification layer in series above the last Transformer layer;
after obtaining the character-based and word-based classification probabilities of each token, converting each token's word-based classification probability into character-based label probabilities;
according to the character-based and word-based classification probabilities, taking, for each token, the highest value as the token's label value;
a transceiver for receiving and transmitting data under the control of the processor.
9. A named entity recognition apparatus, comprising:
the pre-training module, configured to pre-train with a BERT model in two ways, with tokens taken as characters and as segmented words respectively;
the Transformer module, configured to, after the pre-training is completed, use the last Transformer layer of the output part to concatenate token supplementary information to each hidden-layer token at the last hidden layer;
the softmax module, configured to connect a softmax classification layer in series above the last Transformer layer;
the probability module, configured to convert each token's word-based classification probability into character-based label probabilities after obtaining the character-based and word-based classification probabilities of each token;
and the label module, configured to take, for each token, the highest of the character-based and word-based classification probabilities as the token's label value.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 7.
CN202011477961.3A 2020-12-15 2020-12-15 Named entity identification method, device and storage medium Pending CN114638227A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011477961.3A CN114638227A (en) 2020-12-15 2020-12-15 Named entity identification method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011477961.3A CN114638227A (en) 2020-12-15 2020-12-15 Named entity identification method, device and storage medium

Publications (1)

Publication Number Publication Date
CN114638227A true CN114638227A (en) 2022-06-17

Family

ID=81945365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011477961.3A Pending CN114638227A (en) 2020-12-15 2020-12-15 Named entity identification method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114638227A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115859989A (en) * 2023-02-13 2023-03-28 神州医疗科技股份有限公司 Entity identification method and system based on remote supervision


Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN110334354B (en) Chinese relation extraction method
CN111708882B (en) Transformer-based Chinese text information missing completion method
CN112100349A (en) Multi-turn dialogue method and device, electronic equipment and storage medium
CN111309915A (en) Method, system, device and storage medium for training natural language of joint learning
CN109918681B (en) Chinese character-pinyin-based fusion problem semantic matching method
CN111339750B (en) Spoken language text processing method for removing stop words and predicting sentence boundaries
CN110795945A (en) Semantic understanding model training method, semantic understanding device and storage medium
CN110826335A (en) Named entity identification method and device
CN112101010B (en) Telecom industry OA office automation manuscript auditing method based on BERT
CN110807333A (en) Semantic processing method and device of semantic understanding model and storage medium
CN111695341A (en) Implicit discourse relation analysis method and system based on discourse structure diagram convolution
CN109933792A (en) Viewpoint type problem based on multi-layer biaxially oriented LSTM and verifying model reads understanding method
CN116416480B (en) Visual classification method and device based on multi-template prompt learning
CN112200664A (en) Repayment prediction method based on ERNIE model and DCNN model
CN112597306A (en) Travel comment suggestion mining method based on BERT
CN115064154A (en) Method and device for generating mixed language voice recognition model
CN113239694B (en) Argument role identification method based on argument phrase
CN113326367B (en) Task type dialogue method and system based on end-to-end text generation
CN114638227A (en) Named entity identification method, device and storage medium
CN116595023A (en) Address information updating method and device, electronic equipment and storage medium
CN113901210B (en) Method for marking verbosity of Thai and Burma characters by using local multi-head attention to mechanism fused word-syllable pair
CN115240712A (en) Multi-mode-based emotion classification method, device, equipment and storage medium
CN114417891A (en) Reply sentence determination method and device based on rough semantics and electronic equipment
CN114611521A (en) Entity identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination