CN117034942A - Named entity recognition method, device, equipment and readable storage medium - Google Patents


Publication number
CN117034942A
CN117034942A
Authority
CN
China
Prior art keywords
character
text
recognized
named entity
feature vectors
Prior art date
Legal status
Granted
Application number
CN202311286040.2A
Other languages
Chinese (zh)
Other versions
CN117034942B (en)
Inventor
赵鑫安
宋伟
朱世强
谢冰
王雨菡
沈亦翀
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202311286040.2A priority Critical patent/CN117034942B/en
Publication of CN117034942A publication Critical patent/CN117034942A/en
Application granted granted Critical
Publication of CN117034942B publication Critical patent/CN117034942B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/216 - Parsing using statistical methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The specification discloses a named entity recognition method, device, equipment, and readable storage medium. A text to be recognized is input into a pre-trained named entity recognition model; character feature vectors of the characters in the text are determined through a character feature extraction module, and character segment feature vectors of the character segments in the text are obtained through a character segment feature extraction module. The conditional probability that the text to be recognized corresponds to each preset entity category combination is then determined according to the character segment feature vectors respectively corresponding to the character segments and a two-dimensional conditional random field entity tag prediction module in the named entity recognition model, and the named entity information contained in the text is determined accordingly. In this scheme, nested entities in the text to be recognized can be effectively recognized through the two-dimensional conditional random field entity tag prediction module in the named entity recognition model, improving the accuracy of entity recognition.

Description

Named entity recognition method, device, equipment and readable storage medium
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular, to a named entity recognition method, device, apparatus, and readable storage medium.
Background
Named entity recognition (Named Entity Recognition, NER) is a natural language processing technique aimed at identifying entities from text that have a specific meaning, mainly including person names, place names, organization names, time, date, proper nouns, etc. The named entity recognition technology is an important basic tool in a natural language processing system, and by recognizing entities in texts and classifying and labeling the entities, the accuracy and efficiency of subsequent information extraction, question-answering systems, syntactic analysis, machine translation, metadata labeling and other applications are improved.
Currently, common named entity recognition methods include rule-based methods, machine learning-based methods, deep learning-based methods, and the like. Among these, deep learning-based methods have been widely adopted in recent years: by using a neural network model for entity recognition, they can greatly improve the accuracy of named entity recognition compared with rule-based and machine learning-based methods.
However, in practical applications, named entities in the text to be recognized may not appear in isolation; entity nesting may occur. A nested entity in named entity recognition refers to the case where one entity contains another entity, which typically occurs when there is a hierarchical or containment relationship between the entities. At present, the above methods still cannot effectively recognize nested entities in the text to be recognized; once nested entities appear, recognition errors and missed recognitions occur, so these named entity recognition methods perform poorly in some specific scenarios.
Based on this, the specification provides a named entity recognition method.
Disclosure of Invention
The present specification provides a named entity recognition method, apparatus, device, and readable storage medium, to partially solve the above-mentioned problems in the prior art.
The technical scheme adopted in the specification is as follows:
the specification provides a named entity recognition method, which comprises the following steps:
acquiring a text to be recognized;
inputting the text to be recognized into a pre-trained named entity recognition model, and determining, through a character feature extraction module of the named entity recognition model, the character feature vector corresponding to each character contained in the text to be recognized;
inputting the character feature vectors corresponding to the characters contained in the text to be recognized into a character segment feature extraction module of the named entity recognition model to obtain the character segment feature vector corresponding to each character segment in the text to be recognized;
determining the conditional probability that the text to be recognized corresponds to each preset entity category combination according to the character segment feature vectors respectively corresponding to the character segments in the text to be recognized and a two-dimensional conditional random field entity tag prediction module in the named entity recognition model;
and determining the named entity information contained in the text to be recognized according to the conditional probabilities that the text to be recognized corresponds to the preset entity category combinations.
Optionally, the character feature extraction module of the named entity recognition model comprises an embedding layer and an encoder;
inputting the text to be recognized into a pre-trained named entity recognition model, and determining the character feature vector corresponding to each character contained in the text to be recognized through the character feature extraction module of the named entity recognition model, specifically comprises the following steps:
inputting the text to be recognized into the pre-trained named entity recognition model, and obtaining the embedding vector of each character in the text to be recognized through the embedding layer;
and inputting the embedding vector of each character into the encoder to obtain the character feature vector of each character.
Optionally, the character segment feature extraction module of the named entity recognition model comprises a first fully-connected layer, a second fully-connected layer, and a third fully-connected layer;
inputting the character feature vectors corresponding to the characters contained in the text to be recognized into the character segment feature extraction module of the named entity recognition model to obtain the character segment feature vectors corresponding to the character segments in the text to be recognized specifically includes:
inputting the character feature vectors corresponding to the characters contained in the text to be recognized into the character segment feature extraction module of the named entity recognition model, and obtaining, through the first fully-connected layer, the first feature vector corresponding to each character, the first feature vector being the feature vector of the character serving as the first character of a character segment in the text to be recognized;
obtaining, through the second fully-connected layer and according to the character feature vectors respectively corresponding to the characters contained in the text to be recognized, the second feature vector corresponding to each character, the second feature vector being the feature vector of the character serving as the tail character of a character segment in the text to be recognized;
dividing the text to be recognized into a plurality of character segments;
and for each character segment, inputting the first feature vector of its first character and the second feature vector of its tail character into the third fully-connected layer to obtain the character segment feature vector of the character segment.
Optionally, determining the conditional probability that the text to be recognized corresponds to each preset entity category combination according to the character segment feature vector corresponding to each character segment in the text to be recognized and the two-dimensional conditional random field entity tag prediction module in the named entity recognition model specifically includes:
taking each character segment in the text to be recognized as a node and the relations among the character segments as edges, constructing a target two-dimensional grid, and taking the entity category of each character segment as the state of the corresponding node in the target two-dimensional grid, wherein the head characters of the character segments in each row of the target two-dimensional grid are the same and the tail characters of the character segments in each column are the same;
arranging the character segment feature vectors of the character segments in the text to be recognized according to the character segments respectively corresponding to the nodes in the target two-dimensional grid to obtain a character segment feature vector matrix;
and inputting the target two-dimensional grid and the character segment feature vector matrix into the two-dimensional conditional random field entity tag prediction module in the named entity recognition model to obtain the conditional probability that the text to be recognized corresponds to each preset entity category combination.
Optionally, inputting the target two-dimensional grid and the character segment feature vectors of the nodes in the target two-dimensional grid into the two-dimensional conditional random field entity tag prediction module in the named entity recognition model, and determining the conditional probability that the text to be recognized corresponds to each preset entity category combination, specifically includes the following steps:
determining, through the two-dimensional conditional random field entity tag prediction module, for each row of nodes in the target two-dimensional grid, a first feature function representing the association between that row of nodes and the row of nodes above it, according to the edges between the nodes in the row, the edges between the row and the row above it, the states of the nodes in the row, the states of the nodes in the row above, and the character segment feature vector matrix;
determining a second feature function representing the states of the nodes in the row according to those states, the nodes in the row, and the character segment feature vector matrix;
determining a matrix random variable corresponding to the row of nodes according to the first feature function and the second feature function;
and obtaining the conditional probability that the text to be recognized corresponds to each entity category combination according to the matrix random variables of the rows of nodes in the target two-dimensional grid.
Optionally, determining the named entity information contained in the text to be recognized according to the conditional probability that the text to be recognized corresponds to each preset entity category combination specifically includes:
determining the named entity information contained in the text to be recognized according to the preset entity category combination with the largest conditional probability among the conditional probabilities that the text to be recognized corresponds to the preset entity category combinations.
Optionally, pre-training a named entity recognition model specifically includes:
pre-acquiring a reference text as a training sample, and acquiring entity class labels of character fragments contained in the reference text as labels of the training sample;
inputting the training sample into a named entity recognition model to be trained, and determining character feature vectors corresponding to each character contained in the training sample through a character feature extraction module of the named entity recognition model;
taking character feature vectors corresponding to the characters contained in the training sample as input, and inputting the character feature vectors to a character segment feature extraction module of the named entity recognition model to obtain character segment feature vectors corresponding to the character segments in the training sample;
determining the conditional probability of the training sample corresponding to each entity class combination according to character segment feature vectors respectively corresponding to each character segment in the training sample and a two-dimensional conditional random field entity label prediction module in the named entity recognition model;
Determining a loss according to the difference between the conditional probability corresponding to each entity class combination in the training sample and the label of the training sample;
and training the named entity recognition model by taking the loss minimization as a training target.
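The training objective above (a loss determined by the gap between the predicted conditional probabilities and the label, minimized during training) can be sketched under the common assumption, not stated in the patent, that the loss is the negative log-likelihood of the labelled entity category combination:

```python
import math

def nll_loss(cond_probs, gold_combo):
    # Training loss: the negative log of the conditional probability the model
    # assigns to the labelled (gold) entity category combination; training
    # updates the model parameters to minimize this quantity.
    return -math.log(cond_probs[gold_combo])

# Toy conditional probabilities over three entity category combinations (illustrative).
cond_probs = {"combo_A": 0.7, "combo_B": 0.2, "combo_C": 0.1}
loss = nll_loss(cond_probs, "combo_A")
```

The better the model fits the labelled combination, the smaller this loss becomes.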
The specification provides a named entity recognition device, comprising:
an acquisition module, used for acquiring the text to be recognized;
a character feature vector determining module, used for inputting the text to be recognized into a pre-trained named entity recognition model and determining, through the character feature extraction module of the named entity recognition model, the character feature vector corresponding to each character contained in the text to be recognized;
a character segment feature vector determining module, used for inputting the character feature vectors corresponding to the characters contained in the text to be recognized into the character segment feature extraction module of the named entity recognition model to obtain the character segment feature vector corresponding to each character segment in the text to be recognized;
a conditional probability determining module, used for determining the conditional probability that the text to be recognized corresponds to each preset entity category combination according to the character segment feature vectors respectively corresponding to the character segments and the two-dimensional conditional random field entity tag prediction module in the named entity recognition model;
and a named entity determining module, used for determining the named entity information contained in the text to be recognized according to the conditional probability that the text to be recognized corresponds to each preset entity category combination.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the named entity recognition method described above.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the named entity recognition method described above when executing the program.
At least one of the technical solutions adopted in this specification can achieve the following beneficial effects:
In the named entity recognition method provided by this specification, a text to be recognized is input into a pre-trained named entity recognition model. Character feature vectors of the characters in the text are determined through the character feature extraction module, and character segment feature vectors of the character segments in the text are obtained through the character segment feature extraction module. The conditional probability that the text to be recognized corresponds to each preset entity category combination is then determined according to the character segment feature vectors respectively corresponding to the character segments and the two-dimensional conditional random field entity tag prediction module in the named entity recognition model, and the named entity information contained in the text to be recognized is determined accordingly. In this scheme, nested entities in the text to be recognized can therefore be effectively recognized through the two-dimensional conditional random field entity tag prediction module, improving the accuracy of entity recognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate the exemplary embodiments of the present specification and, together with their description, explain the specification without unduly limiting it. In the drawings:
FIG. 1 is a flow chart of a named entity recognition method in the present specification;
FIG. 2 is a flow chart of a named entity recognition method according to the present disclosure;
FIG. 3 is a flow chart of a named entity recognition method according to the present disclosure;
FIG. 4 is a flow chart of a named entity recognition method according to the present disclosure;
FIG. 5 is a schematic diagram of a two-dimensional grid of character segments according to the present disclosure;
FIG. 6 is a schematic diagram of a named entity recognition device provided in the present specification;
fig. 7 is a schematic view of the electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
In addition, all actions of acquiring signals, information, or data in this specification are performed in compliance with the applicable local data protection laws and policies and with authorization from the owner of the corresponding device.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a named entity recognition method provided in the present specification.
S100: acquiring a text to be recognized.
The named entity recognition method provided in the embodiments of this specification may be executed by an electronic device, such as a server, that performs named entity recognition on the text to be recognized. In addition, for the named entity recognition model involved in the method, the electronic device that executes the training process of the model and the electronic device that executes named entity recognition may be the same or different; this specification does not limit this.
In practical application, named entity recognition can be performed on a text to be recognized that contains a large number of characters. Named entities generally refer to entities of specific types, such as person names, place names, organization names, and so on. By recognizing and extracting the named entities in text, key information can be extracted from massive texts. Such information can be used in various applications, such as the results presentation of search engines and knowledge graph construction, and can also help machines understand the semantics of the text. By recognizing named entities, the relationships between different entities in the text can be inferred, providing a better understanding of the context. This is important for tasks such as machine translation, question answering systems, and intelligent dialogue.
It follows that named entity recognition is widely used in different tasks in the field of natural language processing.
In this step, the text to be recognized may originate from tasks in different natural language processing domains, such as machine translation and intelligent dialogue. The text to be recognized contains a number of characters, which may be Chinese characters, letters, digits, special characters, English words, subwords of English words, and so on. The number and types of characters contained in the text to be recognized, and the source of the text, are not limited in this specification.
Further, after the text to be recognized is obtained, it needs to be uniformly preprocessed to obtain text that has a unified format and conforms to the input format of the subsequent named entity recognition model. The specific preprocessing steps include traditional-to-simplified Chinese conversion, case normalization, special character removal, Unicode text normalization, and so on; other existing preprocessing steps may of course also be included, which this specification does not limit.
Further, after preprocessing the text to be recognized, word segmentation is carried out on the text.
Specifically, the step of segmenting the text to be recognized into characters is as follows: given a text X to be recognized, a tokenizer is used to split X into characters, yielding a character sequence denoted X = (x_1, x_2, ..., x_n), where x_i is the i-th character in the text X to be recognized and n is the character sequence length of the text to be recognized.
Characters in the Chinese context mainly refer to Chinese characters, but also include letters, digits, special characters, some English words, subwords of English words, and some character segments. The tokenizer may be an open-source one, such as the tokenizer bundled with a pre-trained language model, HanLP, LTP, and the like, which this specification does not limit.
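As an illustrative sketch only (the patent prescribes no implementation), the preprocessing and character-level segmentation described above might look as follows. The sample text and the particular characters stripped are assumptions, and traditional-to-simplified conversion is omitted because it requires an external mapping table:

```python
import unicodedata

def preprocess(text: str) -> str:
    # Unicode NFKC normalization and lower-casing; zero-width characters are
    # stripped here as an illustrative stand-in for special-character removal.
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    return "".join(ch for ch in text if ch not in "\u200b\ufeff")

def tokenize_to_chars(text: str):
    # Split the text to be recognized into its character sequence X = (x_1, ..., x_n).
    return list(text)

chars = tokenize_to_chars(preprocess("ＮＥＲ识别"))
```

NFKC normalization folds the full-width "ＮＥＲ" into ASCII "NER" before lower-casing, so mixed-width inputs reach the model in a unified format.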
Generally, the server responds to the named entity recognition request to perform named entity recognition on the text to be recognized carried in the named entity recognition request.
In the actual named entity recognition process, named entities formed by one or more characters in the text to be recognized may not appear in isolation; entity nesting may occur. Entity nesting (nested entities) refers to the existence of nested entity structures in text, where one entity contains another entity. In this case, one entity serves as context or modifier information for the other. Therefore, in the named entity recognition model adopted in this specification, an entity tag prediction module built on a two-dimensional conditional random field is introduced to model the association relationships between different character segments in two dimensions, not limited to a chain or linear structure, thereby improving recognition accuracy for the entity nesting phenomena that may exist in the text to be recognized.
S102: inputting the text to be recognized into a pre-trained named entity recognition model, and determining character feature vectors corresponding to all characters contained in the text to be recognized through a character feature extraction module of the named entity recognition model.
Specifically, the character feature extraction module is used to extract the character feature vector of each character in the text to be recognized. In a specific implementation it can be built as a neural network using deep learning methods, for example a convolutional neural network (Convolutional Neural Network, CNN), a recurrent neural network (Recurrent Neural Network, RNN), a gated recurrent unit (Gated Recurrent Unit, GRU), a long short-term memory network (Long Short-Term Memory, LSTM), a Transformer, or other neural networks. The network structure of the character feature extraction module can thus be flexibly chosen according to the application scenario, and this specification does not specifically limit it.
Alternatively, the character feature extraction module may be divided into an embedding layer and an encoder. Specifically, the text to be recognized is input into the pre-trained named entity recognition model, the embedding vector of each character in the text to be recognized is obtained through the embedding layer, and the embedding vectors are input into the encoder to obtain the character feature vector of each character. The encoder may be a Transformer encoder, and its model parameters may come from the encoder of a natural language model pre-trained on a general corpus, or may be obtained during the training of the named entity recognition model; this specification does not limit this.
The embedding layer of the character feature extraction module may be used to obtain a character embedding vector and a position embedding vector for each character in the text to be recognized, and add them to form the final embedding vector of each character. For a text to be recognized X = (x_1, x_2, ..., x_i, ..., x_n) of given length n, the text X is first converted into a sequence of integer indices of the corresponding characters in the vocabulary of a pre-trained language model (for example, the BERT pre-trained language model). The vocabulary of the pre-trained language model consists of common characters and character segments, including Chinese characters, English letters, special characters, English words, subwords of English words, and so on. The integer index sequence corresponding to the text X to be recognized is then input into the embedding layer of the character feature extraction module to obtain the character embedding vector and position embedding vector of each character, which are added to give the embedding vector of each character. This yields the character embedding vector sequence of the text X to be recognized, denoted E = (e_1, e_2, ..., e_i, ..., e_n), where e_i is the embedding vector of the i-th character x_i in the text X to be recognized.
Further, the embedding vector sequence of the text to be recognized is passed sequentially through the Transformer blocks of the encoder of the character feature extraction module, and the feature vector sequence output by the last Transformer block is taken as the character feature vector sequence of the text to be recognized, denoted H = (h_1, h_2, ..., h_i, ..., h_n), where h_i is the character feature vector of the i-th character x_i in the text X to be recognized; it contains the semantic and grammatical information and context information of the character.
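The embedding-layer computation described above, e_i = character embedding of x_i plus position embedding of position i, can be sketched as follows. The toy vocabulary, dimensions, and random tables are illustrative stand-ins for a pre-trained model's parameters, and the Transformer encoder mapping E to H is omitted:

```python
import random

random.seed(0)
vocab = {"[UNK]": 0, "我": 1, "在": 2, "浙": 3, "江": 4}  # toy vocabulary (illustrative)
d_model, max_len = 8, 16                                   # toy dimensions

def rand_vec():
    return [random.uniform(-1, 1) for _ in range(d_model)]

char_table = [rand_vec() for _ in vocab]          # character embedding table
pos_table = [rand_vec() for _ in range(max_len)]  # position embedding table

def embed(chars):
    # Integer-index lookup in the vocabulary, then character embedding plus
    # position embedding summed per character: e_i = char_emb(x_i) + pos_emb(i).
    ids = [vocab.get(c, vocab["[UNK]"]) for c in chars]
    return [[c + p for c, p in zip(char_table[t], pos_table[i])]
            for i, t in enumerate(ids)]

E = embed(list("我在浙江"))   # embedding vector sequence E = (e_1, ..., e_4)
```

Characters outside the vocabulary fall back to the `[UNK]` index, mirroring how pre-trained vocabularies handle rare characters.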
S104: taking the character feature vectors corresponding to the characters contained in the text to be recognized as input, and inputting them into the character segment feature extraction module of the named entity recognition model to obtain the character segment feature vectors corresponding to the character segments in the text to be recognized.
Specifically, a neural network is used to build the character segment feature extraction module of the named entity recognition model. The character segment feature extraction module determines the features of the character segments in the text to be recognized based on the character feature vectors of the characters in the text. In practical application, taking a character x_i in the text to be recognized as the first character and another character x_j ordered after x_i as the tail character, a character segment X_{i:j} of the text to be recognized can be constructed. The character segment has a length of (j - i + 1) characters, where 1 ≤ i ≤ j ≤ n. Since the characters keep their order in the text to be recognized, a character segment with i > j has no practical meaning. Since the text to be recognized may be a long sentence containing n characters, it contains many character segments, n(n+1)/2 in total.
It should be noted that, in general, the named entity is formed by a plurality of characters, but, according to different practical application scenarios, there may also be a named entity formed by one character, that is, in this specification, the length of the character segment may be one character.
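The enumeration of character segments described above can be sketched as follows; the example text is illustrative, and the sketch confirms the n(n+1)/2 count, including single-character segments:

```python
def enumerate_segments(chars):
    # All character segments X_{i:j} with 1 <= i <= j <= n (1-based indices),
    # each recorded as (first position, tail position, segment text).
    n = len(chars)
    return [(i, j, "".join(chars[i - 1:j]))
            for i in range(1, n + 1) for j in range(i, n + 1)]

segs = enumerate_segments(list("南京市"))   # n = 3, so n(n+1)/2 = 6 segments
```

Note how the nested entity "南京" (a city name) and the enclosing "南京市" are both present as segments, which is exactly why segment-level classification can handle entity nesting.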
The character feature vector of each character in the text to be recognized, obtained in step S102, is input into the character segment feature extraction module to obtain the first feature vector and the second feature vector of each character. The first feature vector can be used to represent the feature of a character serving as the first character of a character segment, and the second feature vector the feature of a character serving as the tail character of a character segment. The feature vector of each character segment in the text to be recognized is then obtained from the first feature vector of its first character and the second feature vector of its tail character, yielding the character segment feature vector matrix of the text to be recognized.
S106: and determining the conditional probability of the text to be recognized corresponding to each preset entity class combination according to character segment feature vectors respectively corresponding to the character segments in the text to be recognized and a two-dimensional conditional random field entity label prediction module in the named entity recognition model.
Specifically, the two-dimensional conditional random field entity tag prediction module of the named entity recognition model is constructed based on a two-dimensional conditional random field. A character segment two-dimensional grid structure is built from the first and tail characters of each character segment, and a two-dimensional conditional random field is applied on top of that structure. The character segment feature vectors of the text to be recognized are arranged into a character segment feature vector matrix according to the grid structure, the matrix is input into the module, and the conditional probabilities of all entity class combinations of the text are output. Specifically, assigning each character segment contained in the text to each preset entity category forms the many entity class combinations corresponding to the text; the module determines, given that the input is the text to be recognized, the conditional probability corresponding to each such combination.
Note that a text to be recognized of length n contains n(n+1)/2 character segments in total. Assigning an entity class to each character segment yields one entity class combination corresponding to the text; thus one piece of text to be recognized corresponds to many entity class combinations. For this reason, in this step the two-dimensional conditional random field entity tag prediction module in the named entity recognition model can output, based on the character segment feature vector of each character segment in the text, the conditional probability that the text corresponds to each preset entity class combination.
Specifically, previous conditional-random-field-based named entity recognition models (such as BiLSTM+CRF, BERT+CRF, etc.) encode the text to be recognized into a character vector sequence, regard it as a chain or linear structure (each character a node, with edges between adjacent characters), and use linear-chain conditional random fields (linear-chain CRFs) to model and label the text, obtaining the entity class label of each character node and thereby extracting the entity information in the text. However, these models based on linear-chain conditional random fields exploit only the association between adjacent characters and neglect the relationships between different character segments (such as adjacent or nested segments), which limits the accuracy of entity recognition; moreover, they cannot effectively recognize nested entities in text, so they perform poorly in practical application scenarios.
In this specification, a two-dimensional conditional random field (two-dimensional conditional random fields, 2D CRFs) model is adopted to construct the entity tag prediction module of the named entity recognition model. The text to be recognized is no longer treated merely as a chain or linear structure; instead, a character segment two-dimensional grid structure is constructed for it and modeled with the two-dimensional conditional random field to predict the entity information in the text. This effectively exploits the multi-dimensional association information among character segments, handles the case of entity nesting effectively, and improves the accuracy of entity recognition.
Still taking the aforementioned text to be recognized X = (x_1, x_2, …, x_i, …, x_n) as an example: after the character feature vectors H = (h_1, h_2, …, h_i, …, h_n) of the text are determined, the character segment feature vectors h_{ij}^span are obtained based on step S104, yielding the character segment feature vector matrix H^span of the text. Based on this matrix, the two-dimensional conditional random field of the entity tag prediction module determines the conditional probability of each preset entity class combination for the text. Specifically, for a given text X to be recognized and some entity class combination Y = {T_1, T_2, …, T_i, …, T_n}, the conditional probability is denoted P(Y|X), where T_i = {y_{i,i}, y_{i,i+1}, …, y_{i,j}, …, y_{i,n}}, and y_{i,j} is the entity class of the character segment X_{i:j} formed from the i-th to the j-th character of X, as predicted by the named entity recognition model.
S108: and determining the named entity information contained in the text to be identified according to the conditional probability that the text to be identified corresponds to each preset entity class combination.
Further, based on the step S106, the conditional probability that the text to be identified corresponds to each preset entity class combination is obtained, and then the entity class combination with the highest conditional probability is selected as the entity class label predicted by the named entity identification model.
Specifically, still taking the aforementioned text X to be recognized as an example, an optional formula for determining the predicted entity class label is:

Ŷ = argmax_Y P(Y|X)

where Ŷ = {T̂_1, T̂_2, …, T̂_i, …, T̂_n}, T̂_i = {ŷ_{i,i}, ŷ_{i,i+1}, …, ŷ_{i,j}, …, ŷ_{i,n}}, and ŷ_{i,j} is the entity class of the character segment X_{i:j} formed from the i-th to the j-th character of X, as predicted by the named entity recognition model.
Further, the entity class label Ŷ with the highest conditional probability given the text X to be recognized is obtained from the named entity recognition model. According to this label, it is judged whether each character segment in the text is an entity, and for each segment judged to be an entity, the entity class to which it belongs is determined.
Specifically, for the character segment X_{i:j} formed from the i-th to the j-th character of the text X to be recognized, the entity class ŷ_{i,j} corresponding to the segment is read from the maximum-conditional-probability entity class label Ŷ predicted by the named entity recognition model. If ŷ_{i,j} = 0, the segment is not an entity; if ŷ_{i,j} = k (where k ∈ {1, 2, …, K}), the segment belongs to the k-th of the K preset entity classes; if ŷ_{i,j} = K+1, the segment is the special entity marking the start of the text, corresponding to the start entity class; if ŷ_{i,j} = K+2, the segment is the special entity marking the end of the text, corresponding to the end entity class. In this way each entity and its entity class in the text to be recognized are obtained and the entity information is output.
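Under the labeling scheme above, the decoding step can be sketched as follows; the grid layout and function name are illustrative assumptions, with label 0 for non-entity, 1..K for entity classes, and K+1/K+2 reserved for the special start/end entities:

```python
# Turn a predicted label grid y_hat[i][j] (0-based) into entity spans.
def decode_entities(y_hat, K):
    entities = []
    n = len(y_hat)
    for i in range(n):
        for j in range(i, n):          # only i <= j is meaningful
            label = y_hat[i][j]
            if 1 <= label <= K:        # skip non-entity (0) and start/end labels
                entities.append((i, j, label))
    return entities

# 3-character text, K = 2 classes; segment (0,1) is class 1, (2,2) is class 2
y_hat = [[0, 1, 0],
         [0, 0, 0],
         [0, 0, 2]]
print(decode_entities(y_hat, K=2))
```

Note that nested spans, e.g. both (0, 1) and (0, 2) carrying entity labels, would simply yield two overlapping entities, which is exactly the nested-entity case the scheme targets.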
In the named entity recognition method provided in this specification, the text to be recognized is input into a pre-trained named entity recognition model. The character feature extraction module determines the character feature vectors of the characters in the text; the character segment feature extraction module obtains the character segment feature vectors of the character segments; and, from the segment feature vectors of the character segments together with the two-dimensional conditional random field entity tag prediction module of the model, the conditional probability that the text corresponds to each preset entity class combination is determined, from which the named entity information contained in the text is determined.
Therefore, in this scheme a character segment two-dimensional grid structure is constructed, and the two-dimensional conditional random field entity tag prediction module of the named entity recognition model predicts the entity information in the text to be recognized based on that structure. Compared with traditional named entity recognition models, this effectively exploits the association information among character segments and effectively recognizes nested entities in the text, thereby improving the accuracy of entity recognition.
Further, the character segment feature extraction module is configured to determine the character segment feature vectors of the different character segments composed of the characters, according to the character feature vectors of the characters in the text to be recognized. Since the differences between character segments depend on their first character, their tail character and their length, the module needs to extract, for each character, its feature as the first character of a segment and its feature as the tail character of a segment. Specifically, the character segment feature extraction module includes a first fully-connected layer, a second fully-connected layer and a third fully-connected layer, as shown in fig. 2; the specific scheme is as follows:
s200: and taking character feature vectors corresponding to the characters contained in the text to be recognized as input, inputting the character feature vectors into a character segment feature extraction module of the named entity recognition model, and obtaining first feature vectors corresponding to the characters through the first full-connection layer, wherein the first feature vectors are feature vectors of the characters serving as first characters of character segments in the text to be recognized.
For the i-th character x_i (1 ≤ i ≤ n) in a given text X to be recognized, with corresponding character feature vector h_i obtained in step S102, the first feature vector h_i^s is obtained according to the following formula:

h_i^s = ReLU(W_s · h_i + b_s)

That is, the feature vector h_i of character x_i is input into the first fully-connected layer to obtain h_i^s, the feature vector of x_i as the first character of a character segment, denoted the first feature vector. W_s and b_s are the weight matrix and bias term of the first fully-connected layer, and ReLU(·) is the ReLU activation function; other activation functions such as LeakyReLU may also be used, and this specification does not limit the specific type of activation function. In addition, the first fully-connected layer may also be implemented using a multi-layer neural network.
Further, the first feature vector sequence of the text X to be recognized is obtained as H^s = (h_1^s, h_2^s, …, h_i^s, …, h_n^s), where h_i^s is the first feature vector of the i-th character x_i in the text to be recognized.
S202: and obtaining second feature vectors corresponding to the characters respectively through the second full-connection layer according to character feature vectors corresponding to the characters contained in the text to be recognized, wherein the second feature vectors are feature vectors of the characters serving as tail characters of character fragments in the text to be recognized.
For the i-th character x_i (1 ≤ i ≤ n) in a given text X to be recognized, with corresponding character feature vector h_i obtained in step S102, the second feature vector h_i^e is obtained according to the following formula:

h_i^e = ReLU(W_e · h_i + b_e)

That is, the feature vector h_i of character x_i is input into the second fully-connected layer to obtain h_i^e, the feature vector of x_i as the tail character of a character segment, denoted the second feature vector. W_e and b_e are the weight matrix and bias term of the second fully-connected layer, and ReLU(·) is the ReLU activation function; other activation functions such as LeakyReLU may also be used, and this specification does not limit the specific type of activation function. In addition, the second fully-connected layer may also be implemented using a multi-layer neural network.
Further, the second feature vector sequence of the text X to be recognized is obtained as H^e = (h_1^e, h_2^e, …, h_i^e, …, h_n^e), where h_i^e is the second feature vector of the i-th character x_i in the text to be recognized.
S204: and dividing the text to be recognized into a plurality of character fragments.
Similar to step S106 above, the text to be recognized contains multiple characters, and a character segment may be composed of one or more characters. In this step, for each character in the text to be recognized, the candidate characters ordered after it are determined according to the character order of the text; the character serves as the first character and each of its candidate characters as the tail character, forming the character segments beginning with that character. Of course, since a single character may constitute a character segment in this specification, the segments beginning with a character also include the character itself.
S206: and for each character segment, taking a first feature vector of a first character in the character segment and a second feature vector of a tail character in the character segment as inputs, and inputting the first feature vector and the second feature vector into the third full-connection layer to obtain the character segment feature vector of the character segment.
Specifically, for the character segment X_{i:j} formed from the i-th character x_i to the j-th character x_j of the text X to be recognized (where 1 ≤ i ≤ j ≤ n), the first feature vector h_i^s of the first character x_i and the second feature vector h_j^e of the tail character x_j, obtained in steps S200 and S202, are input into the third fully-connected layer to obtain the feature vector h_{ij}^span of the segment, with the specific formulas:

e_dist = f_dist(j - i)

h_{ij}^span = ReLU(W_span · [h_i^s; h_j^e; e_dist] + b_span)

where f_dist is a relative position encoding function: the relative distance j - i from character x_i to character x_j is input into it to obtain the relative position encoding vector e_dist, which preserves the length information of the segment. h_i^s, h_j^e and e_dist are concatenated and input into the third fully-connected layer to obtain the feature vector h_{ij}^span of the segment formed from x_i to x_j, where W_span and b_span are the weight matrix and bias term of the third fully-connected layer. H^span is the character segment feature vector matrix of the text to be recognized: for 1 ≤ i ≤ j ≤ n, the (i, j)-th element of the matrix is the feature vector of the segment from x_i to x_j; for i > j, the (i, j)-th element is a zero vector. In addition, the third fully-connected layer may also be implemented using a multi-layer neural network.
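A minimal numerical sketch of the three fully-connected heads described above, assuming a hidden size of 4, random placeholder weights, and a simple lookup table standing in for the relative position encoding function f_dist; none of these names or dimensions come from the patent itself:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # assumed hidden size
W_s, b_s = rng.normal(size=(d, d)), np.zeros(d)      # first-character head
W_e, b_e = rng.normal(size=(d, d)), np.zeros(d)      # tail-character head
dist_table = rng.normal(size=(16, d))                # e_dist lookup, distances 0..15
W_span, b_span = rng.normal(size=(d, 3 * d)), np.zeros(d)

def relu(z):
    return np.maximum(z, 0.0)

def segment_feature(h, i, j):
    h_s = relu(W_s @ h[i] + b_s)        # feature of x_i as first character
    h_e = relu(W_e @ h[j] + b_e)        # feature of x_j as tail character
    e_dist = dist_table[j - i]          # relative position encoding of j - i
    # concatenate [h_s; h_e; e_dist] and apply the third fully-connected layer
    return relu(W_span @ np.concatenate([h_s, h_e, e_dist]) + b_span)

h = rng.normal(size=(5, d))             # character feature vectors for n = 5
v = segment_feature(h, 1, 3)
print(v.shape)
```

In a real model the weights would be trained end-to-end and f_dist could be a sinusoidal or learned embedding; the lookup table here is just the simplest stand-in.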
In one or more embodiments of the present disclosure, step S106 may form a two-dimensional grid structure based on each character segment, and the two-dimensional conditional random field entity tag prediction module determines, based on the two-dimensional grid structure, a conditional probability of each entity class combination corresponding to the text to be recognized, as shown in fig. 3, specifically implemented by the following scheme:
s300: taking each character segment in the text to be identified as a node, taking the relation among the character segments in the text to be identified as an edge, constructing a target two-dimensional grid, and taking the entity category of each character segment in the text to be identified as the state of each node in the target two-dimensional grid; the first characters of the character fragments of each row in the target two-dimensional grid are the same, and the tail characters of the character fragments of each column are the same.
Specifically, for the text to be recognized X = (x_1, x_2, …, x_i, …, x_n) (n being the length of the text), an n × n character segment two-dimensional grid is constructed as the target two-dimensional grid, with the character segments of the text as nodes and edges constructed between the nodes; the specific structure is shown in fig. 5. The nodes and edges of the target two-dimensional grid are constructed as follows:
The node with coordinates (i, j) (row i, column j, where 1 ≤ i ≤ j ≤ n) in the target two-dimensional grid represents the character segment X_{i:j} of the text to be recognized, with the i-th character x_i as its first character and the j-th character x_j as its tail character. Notably, only the upper-triangular and diagonal nodes of the target grid are meaningful; the lower-triangular nodes (i.e., nodes with i > j) are meaningless, because for i > j the segment from the i-th to the j-th character runs backwards through the text, and the subsequent modeling does not use these nodes. They can therefore be treated as empty or virtual nodes, denoted X_null.
The construction mode of the edges in the target two-dimensional grid is as follows: and constructing edges between each node (i, j) and four adjacent nodes (i-1, j), (i+1, j), (i, j-1) and (i, j+1), and constructing edges between each node (i, i) and adjacent nodes (i-1 ) and (i+1, i+1) on the diagonal line, wherein the edges are used for establishing association relations among different character fragments. In specific practice, other sides may be constructed in a two-dimensional grid structure in other manners according to practical situations, which is not limited in this specification.
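The node and edge construction above can be sketched as follows; this is an illustrative 0-based implementation of the stated rules (four lattice neighbours per valid node, plus diagonal links between (i, i) and (i+1, i+1)), not the patent's exact procedure:

```python
# Build the valid (upper-triangular) nodes and undirected edges of the grid.
def build_edges(n):
    valid = {(i, j) for i in range(n) for j in range(i, n)}
    edges = set()
    for (i, j) in valid:
        # four lattice neighbours, kept only when they are valid nodes
        for (a, b) in [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]:
            if (a, b) in valid:
                edges.add(frozenset([(i, j), (a, b)]))
        # diagonal link between single-character segments (i,i) and (i+1,i+1)
        if i == j and (i + 1, j + 1) in valid:
            edges.add(frozenset([(i, j), (i + 1, j + 1)]))
    return edges

edges = build_edges(3)
print(len(edges))
```

For n = 3 this yields 8 edges over the 6 valid nodes, connecting each segment to its overlapping and nested neighbours.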
By constructing a target two-dimensional grid with the character segments of the text to be recognized as nodes and building edges between the nodes, connections between the character segments can be established. Compared with methods that predict entity classes directly from the information of individual character segments, or that recognize named entities based on linear-chain conditional random fields, the named entity recognition model adopted in this specification, which establishes connections between character segments and models on that basis, can fully exploit associations of different dimensions among the segments rather than being limited to linear relationships. This improves the accuracy of entity class prediction and addresses the inability of traditional methods to effectively recognize nested entities.
S302: and arranging character segment feature vectors of all character segments in the text to be identified according to the character segments corresponding to all nodes in the target two-dimensional grid respectively to obtain a character segment feature vector matrix.
Specifically, according to the coordinates of each character segment in the text to be recognized in the target two-dimensional grid, the character segment feature vectors of each character segment are arranged according to the positions of the coordinates in the target two-dimensional grid, and a character segment feature vector matrix is obtained, so that the character segment feature vector matrix not only contains the position relation of each character segment in the text to be recognized, but also contains the features of the character segments.
S304: and taking the target two-dimensional grid and the character segment feature vector matrix as input, and inputting the input into a two-dimensional conditional random field entity label prediction module in the named entity recognition model to obtain the conditional probability of the text to be recognized corresponding to each preset entity class combination.
The two-dimensional conditional random field entity label prediction module in the named entity recognition model can model the relation between the nodes of each row in the target two-dimensional grid and the relation between the nodes of each row and the nodes of the adjacent row, and the association relation between the nodes is described by modeling the relation, so that the conditional probability of the text to be recognized corresponding to each entity class combination is predicted based on the association relation.
When the two-dimensional conditional random field is used for modeling the two-dimensional grid structure of the character segment of the text to be recognized, the two-dimensional conditional random field can be modeled according to the change of entity category labels among rows, and can also be modeled according to columns or diagonal lines, and the method is not limited in the specification.
Alternatively, taking a method of modeling the line-by-line as an example, based on the line-by-line modeling, the above step S304 may be specifically implemented by the following scheme:
The first step: determining, by the two-dimensional conditional random field entity tag prediction module, for each row of nodes in the target two-dimensional grid, a first feature function for representing an association relationship between the row nodes and a row node above the row nodes according to edges between the row nodes, edges between the row nodes and a row node above the row nodes, states of a row node above the row nodes, and the character segment feature vector matrix.
Specifically, for the i-th row (1 ≤ i ≤ n) of the character segment two-dimensional grid structure of the text X to be recognized, define a matrix random variable M_i(T_{i-1}, T_i | X): given the input text X to be recognized, when the node state sequence of row i-1 is T_{i-1} and the node state sequence of row i is T_i, the value of the random variable is the score of the entity class sequence of row i being T_i, given that the entity class sequence of row i-1 is T_{i-1}.
In order to obtain the value of this random variable, this specification defines, for the text to be recognized X = (x_1, x_2, …, x_i, …, x_n), the coordinate set of the nodes corresponding to all character segments whose first character is the i-th character as R(i) = {(i, i), (i, i+1), …, (i, j), …, (i, n)}, where 1 ≤ i ≤ n; R(i) corresponds to all nodes on the i-th row of the character segment two-dimensional grid structure, starting from (i, i).
In this specification, when modeling the association between each row of nodes and its adjacent rows, the row above each row is taken as the example of the adjacent row. Therefore, all edges between the node coordinate set R(i-1) of row i-1 and the node coordinate set R(i) of row i in the target two-dimensional grid of the text X, together with all edges among the nodes of R(i), form the edge set E(i) = {e((m, n), (i, j)) : (m, n) ∈ R(i-1) ∪ R(i), (i, j) ∈ R(i), e((m, n), (i, j)) ∈ E}, where e((m, n), (i, j)) denotes the edge from node (m, n) to node (i, j), and E is the set of all meaningful edges in the target two-dimensional grid. E(i) includes the edges existing among the nodes of row i and the edges between the nodes of rows i-1 and i.
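A small sketch of the per-row edge set E(i), assuming (as a 0-based illustration) only the horizontal within-row edges, the vertical edges from row i-1, and the diagonal edge between consecutive diagonal nodes described earlier:

```python
# Directed edges feeding the nodes of row i, for an n x n segment grid.
def edge_set_for_row(i, n):
    # edges within row i: (i, j) -> (i, j+1)
    within = [((i, j), (i, j + 1)) for j in range(i, n - 1)]
    from_prev = []
    if i > 0:
        # vertical edges (i-1, j) -> (i, j) for each valid column j >= i
        from_prev = [((i - 1, j), (i, j)) for j in range(i, n)]
        # diagonal edge (i-1, i-1) -> (i, i)
        from_prev.append(((i - 1, i - 1), (i, i)))
    return within + from_prev

print(len(edge_set_for_row(1, 3)))
```

For row 0 only within-row edges exist; for later rows the set mixes within-row and cross-row edges, matching the two kinds of edges E(i) is said to contain.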
Let the state sequence on the nodes of the i-th row (1 ≤ i ≤ n) of the character segment two-dimensional grid structure of the text X to be recognized be T_i = {y_{i,i}, y_{i,i+1}, …, y_{i,j}, …, y_{i,n}}, where y_{i,j} (i ≤ j ≤ n) is the state of node (i, j); in this specification this state is the entity class label of the character segment X_{i:j} corresponding to that node, with value range {0, 1, 2, …, K, K+1, K+2}.
In this step, f(e, y_{m,n}, y_{i,j}, H^span) characterizes the first feature function, where e is an edge in E(i), y_{m,n} is the state of the start node of the edge, y_{i,j} is the state of the end node of the edge, and H^span is the character segment feature vector matrix of the text to be recognized. In addition, depending on the application scenario, the first feature function may also be implemented using a neural network, which this specification does not limit.
The first feature function describes the association relationship between nodes, so that this association is applied to the subsequent prediction of entity classes; this is equivalent to introducing, via the first feature function, the association between each row of nodes and the row above it into the entity class prediction, making full use of the associations among the character segments in the text to be recognized and thereby improving the accuracy of entity class prediction for nested entities.
And a second step of: and determining a second characteristic function for representing the state of the line node according to the state of the line node, the line node and the character segment characteristic vector matrix.
The second feature function is characterized as g(v, y_{i,j}, H^span), where v is the j-th node in the i-th row of nodes, y_{i,j} is the state of that node, and H^span is the character segment feature vector matrix of the text to be recognized. The second feature function characterizes the node state. In addition, depending on the application scenario, the second feature function may also be implemented using a neural network, which this specification does not limit.
And a third step of: and determining a matrix random variable corresponding to the row of nodes according to the first characteristic function and the second characteristic function.
This step can be realized specifically according to the following formula:

M_i(T_{i-1}, T_i | X) = exp( Σ_{k_1=1}^{K_1} λ_{k_1} Σ_{e ∈ E(i)} f_{k_1}(e, y_{m,n}, y_{i,j}, H^span) + Σ_{k_2=1}^{K_2} μ_{k_2} Σ_{v ∈ R(i)} g_{k_2}(v, y_{i,j}, H^span) )

where there are K_1 first feature functions in total, each first feature function f_{k_1} corresponding to a weight λ_{k_1} (1 ≤ k_1 ≤ K_1); and K_2 second feature functions in total, each second feature function g_{k_2} corresponding to a weight μ_{k_2} (1 ≤ k_2 ≤ K_2); with T_{i-1} = {y_{i-1,i-1}, y_{i-1,i}, …, y_{i-1,n}} and T_i = {y_{i,i}, y_{i,i+1}, …, y_{i,n}}.
Fourth step: and obtaining the conditional probability of the text to be identified corresponding to the combination of the entity categories according to the matrix random variables of the nodes in each row in the target two-dimensional grid.
Given a text to be recognized X = (x_1, x_2, …, x_i, …, x_n) and some corresponding entity class combination Y = {y_{i,j} : 1 ≤ i ≤ j ≤ n}, rewrite Y as Y = {T_1, T_2, …, T_n}, where T_i = {y_{i,i}, y_{i,i+1}, …, y_{i,n}}, and let T_0 = start, T_{n+1} = end. The conditional probability P(Y|X) of the entity class combination Y given the text X to be recognized is then computed as:

P(Y|X) = (1 / Z(X)) · Π_{i=1}^{n+1} M_i(T_{i-1}, T_i | X)

where Z(X) is the normalization factor summing the same product over all possible entity class combinations. P(Y|X) represents the magnitude of the conditional probability of Y among all possible entity class combinations given the text X to be recognized.
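For intuition, the normalized conditional probability over all entity class combinations can be illustrated by brute force on a toy 2-character grid; the score function below is an arbitrary stand-in for the weighted feature-function sums, not the model's actual potentials:

```python
import itertools
import math

# Enumerate every label assignment Y over the valid grid cells (i <= j).
def all_label_grids(n, num_labels):
    cells = [(i, j) for i in range(n) for j in range(i, n)]
    for labels in itertools.product(range(num_labels), repeat=len(cells)):
        yield dict(zip(cells, labels))

def score(Y):
    # placeholder potential: rewards diagonal (single-character) cells labelled 1
    return sum(1.0 for (i, j), y in Y.items() if i == j and y == 1)

grids = list(all_label_grids(2, num_labels=2))   # 3 valid cells -> 8 assignments
Z = sum(math.exp(score(Y)) for Y in grids)       # normalization factor Z(X)
probs = [math.exp(score(Y)) / Z for Y in grids]  # P(Y|X) for each combination
print(round(sum(probs), 6))
```

Brute-force enumeration is exponential in the number of cells; the matrix random variables M_i exist precisely so the real model can normalize without enumerating every combination.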
In one or more embodiments of the present disclosure, the named entity recognition model employed by the scheme shown in FIG. 1 may be iteratively trained by the following steps, as shown in FIG. 4.
S400: and acquiring a reference text in advance as a training sample, and acquiring entity class labels of character fragments contained in the reference text as labels of the training sample.
Specifically, reference texts are collected as training samples from a natural language processing system or the Internet; each reference text is then preprocessed and labeled with its entity class labels.
Specifically, in training a named entity recognition model, a large number of reference texts are required to be used as training samples, so that the reference texts need to be collected. The reference text may be obtained from a log record of a natural language processing system (e.g., intelligent question and answer system, intelligent chat system, etc.), or may be collected from the internet. The reference text may refer to a text requiring entity information to be identified, and the reference text may be a sentence, an article, or the like, and may include one or more entities, may include nested entities, or may not include any entity, which is not limited in this specification.
Further, after the reference texts are collected, unified preprocessing needs to be performed on each reference text to obtain a text which has a unified format and accords with the input format of a subsequent named entity recognition model, and the specific preprocessing steps comprise: the complex conversion, case processing, special character removal, unicode text standardization, etc., may of course also include other existing preprocessing steps, which are not limited in this specification.
Further, after the original text data is preprocessed, each reference text is segmented into words and labeled to obtain the corresponding entity class labels.
Specifically, the word segmentation step proceeds as follows: given a reference text X, a word segmenter is used to split it into individual characters, obtaining a character sequence expressible as X = (x_1, x_2, …, x_i, …, x_n), where x_i is the i-th character in the reference text X and n is the character sequence length of the reference text.
In a Chinese context, characters mainly refer to Chinese characters, but also include letters, digits, special characters, and sometimes English words, English subwords, and certain character segments. The word segmenter may be an open-source one, such as the tokenizer of a pre-trained language model, HanLP, LTP, etc., which this specification does not limit.
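A minimal character-level segmentation sketch consistent with the description above; real systems would use a tokenizer such as HanLP or LTP, while this toy version simply splits the text into characters and drops whitespace:

```python
# Character-level "word segmentation": one token per non-whitespace character.
def char_tokenize(text):
    return [ch for ch in text if not ch.isspace()]

tokens = char_tokenize("浙江 实验室")
print(tokens)
```

A subword tokenizer would additionally merge runs of letters or digits into single tokens, which is why the text notes that a "character" may also be an English word or subword.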
After the reference text is segmented, it is labeled to obtain the corresponding entity class labels. First, several preset entity categories are determined for the given application scenario; suppose K entity categories are given in the current scenario, denoted (entity_1, entity_2, …, entity_K). In addition to these K categories, three special categories are introduced: non-entity, start (a special entity marking the beginning of a text sentence), and end (a special entity marking the end of a text sentence), yielding the total entity category set Q = {non-entity, entity_1, entity_2, …, entity_K, start, end}.
For the segmented reference text X = (x_1, x_2, …, x_n), the corresponding entity class labels may take the form:

Y = {y_{i,j} : 1 ≤ i ≤ j ≤ n}

where y_{i,j} denotes the entity class of the character segment formed from the i-th character to the j-th character of the reference text X (denoted X_{i:j}). When the character segment is not an entity, y_{i,j} = 0, corresponding to the non-entity category; when the character segment belongs to the k-th of the K preset entity classes, y_{i,j} = k (1 ≤ k ≤ K), corresponding to the entity_k category; when the character segment is the special entity marking the start of the reference text, y_{i,j} = K + 1, corresponding to the start category; and when it is the special entity marking the end of the reference text, y_{i,j} = K + 2, corresponding to the end category.
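For illustration, the span-label encoding described above can be sketched as follows. The category names and K = 2 are hypothetical; only the integer coding (0 for non-entity, 1..K for the preset entity classes) follows the text.

```python
# Hypothetical sketch of the entity-class label encoding y_{i,j}.
# "PER" / "ORG" are illustrative category names, K = 2 is an assumption.
K = 2
CATEGORIES = {"non-entity": 0, "PER": 1, "ORG": 2, "start": K + 1, "end": K + 2}

def make_labels(n, spans):
    # y[(i, j)] labels the character segment from the i-th to the j-th
    # character (1-indexed, i <= j); unannotated segments default to 0.
    return {(i, j): CATEGORIES.get(spans.get((i, j), "non-entity"), 0)
            for i in range(1, n + 1) for j in range(i, n + 1)}

labels = make_labels(3, {(1, 2): "PER"})
```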
In this specification, the entity class labels of the character segments included in the reference text may be obtained based on manual labeling, or may be determined based on a named entity recognition model trained in advance in other scenarios, which is not limited in this specification.
In practical application, the named entity recognition model needs not only to be trained but also to be validated after training and tested after validation. The reference texts may therefore be randomly divided into a training set, a validation set, and a test set in a certain proportion: the training set provides the training samples for the named entity recognition model, the validation set is used to validate the model after training, and the test set is used to test the model after validation.
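A minimal sketch of this random split; the 8:1:1 proportion and the fixed seed are illustrative assumptions, since the specification leaves the ratio open.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    # Randomly partition the reference texts into training,
    # validation, and test sets in the given (assumed) proportion.
    samples = samples[:]
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```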
S402: and inputting the training sample into a named entity recognition model to be trained, and determining character feature vectors corresponding to each character contained in the training sample through a character feature extraction module of the named entity recognition model.
Specifically, the character feature extraction module is used to extract the character feature vector of each character in the text to be recognized. In a specific implementation it may be built with a neural network using a deep-learning-based method, for example a convolutional neural network (Convolutional Neural Network, CNN), a recurrent neural network (Recurrent Neural Network, RNN), a gated recurrent unit (Gate Recurrent Unit, GRU), a long short-term memory network (Long Short-Term Memory, LSTM), a Transformer, or other neural networks; the network structure of the character feature extraction module may be flexibly selected and constructed for different application scenarios.
Transformer-based pre-trained language models have achieved remarkable results on many natural language processing tasks and possess strong language representation and understanding capabilities; examples include BERT, RoBERTa, ALBERT, T5, BART, GPT, etc. For this reason, in an alternative embodiment of the present specification, the character feature extraction module in the named entity recognition model may be built from a pre-trained language model: the sub-model of the pre-trained language model that extracts character features from text is used as the character feature extraction module of the named entity recognition model to be trained. During training of the named entity recognition model, the parameters of this sub-model are adjusted and optimized, so that after training a trained character feature extraction module is obtained.
Optionally, the character feature extraction module comprises an embedding layer and a Transformer encoder. The training sample is input into the embedding layer of the character feature extraction module to obtain the embedding vector of each character in the training sample. The character embedding vectors of the training sample are then input into the Transformer encoder part of the character feature extraction module, which outputs the character feature vector of each character in the training sample.
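As a toy numerical sketch of the embedding-layer-plus-encoder pipeline, the following uses an embedding lookup followed by a single self-attention layer standing in for the full Transformer encoder. All dimensions, the vocabulary size, and the random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d = 100, 8
E = rng.standard_normal((vocab_size, d))    # embedding-layer lookup table

def char_features(token_ids):
    X = E[token_ids]                        # (n, d) character embedding vectors
    scores = X @ X.T / np.sqrt(d)           # scaled dot-product self-attention
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)   # softmax over each row
    return A @ X                            # (n, d) character feature vectors

H = char_features([3, 7, 42])
```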
S404: and taking character feature vectors corresponding to the characters contained in the training sample as input, and inputting the character feature vectors to a character segment feature extraction module of the named entity recognition model to obtain character segment feature vectors corresponding to the character segments in the training sample.
Similar to S104 above, taking one character x_i of the training sample as the first character and another character x_j ordered after x_i as the tail character, a character segment X_{i:j} of the training sample can be constructed, where 1 ≤ i ≤ j ≤ n.
The character feature vector of each character in the training sample obtained in step S402 is input into the character segment feature extraction module to obtain a first feature vector and a second feature vector for each character. The first feature vector represents the character when it serves as the first character of a character segment, and the second feature vector represents the character when it serves as the last character of a character segment. The feature vector of each character segment is then obtained from the first feature vector of its first character and the second feature vector of its last character, yielding the character segment feature vector matrix of the training sample.
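A sketch of this segment-feature construction: a head projection produces each character's "first-character" vector, a tail projection produces its "last-character" vector, and segment X_{i:j} combines the two. The concatenate-then-project combination and all dimensions are assumptions (the specification only fixes that three fully-connected layers are involved); indices here are 0-based, unlike the 1-based notation in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
W_h = rng.standard_normal((d, d))       # first (head) projection
W_t = rng.standard_normal((d, d))       # second (tail) projection
W_s = rng.standard_normal((2 * d, d))   # assumed combining projection

def span_features(H):
    n = H.shape[0]
    head, tail = H @ W_h, H @ W_t       # first / second feature vectors
    S = np.zeros((n, n, d))
    for i in range(n):
        for j in range(i, n):
            # segment (i, j): head vector of x_i with tail vector of x_j
            S[i, j] = np.concatenate([head[i], tail[j]]) @ W_s
    return S                            # (n, n, d) segment feature matrix

S = span_features(rng.standard_normal((4, d)))
```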
S406: and determining the conditional probability of the training sample corresponding to each entity class combination according to character segment feature vectors respectively corresponding to each character segment in the training sample and a two-dimensional conditional random field entity label prediction module in the named entity recognition model.
Specifically, the two-dimensional conditional random field entity label prediction module of the named entity recognition model is built on a two-dimensional conditional random field. A two-dimensional grid of character segments is constructed according to the first and last characters of each character segment, and a two-dimensional conditional random field is then applied on this grid. The character segment feature vectors of the training sample are arranged into a character segment feature vector matrix according to the two-dimensional grid, the matrix is input into the two-dimensional conditional random field entity label prediction module, and the conditional probability of the training sample for each entity class combination is output. The specific scheme is similar to the aforementioned S106 and is not repeated here. In addition, it should be noted that during training of the named entity recognition model, a scheme similar to that of fig. 3 may optionally be executed, so that the conditional probability of the training sample for each entity class combination is determined through the two-dimensional conditional random field based on the two-dimensional grid structure, which is likewise not repeated here.
S408: and determining loss according to the difference between the conditional probability corresponding to each entity class combination in the training sample and the label of the training sample.
Specifically, for a training sample, the conditional probability of the entity class combination given by its label is obtained through the preceding steps, and the loss of the training sample is then calculated with a loss function, which may be any existing loss function, such as the negative log loss function. The loss reflects the degree of difference between the prediction output by the named entity recognition model during training and the true entity class labels; the smaller the loss, the smaller the difference and the better the model performs. During training, the parameters of the named entity recognition model are optimized by minimizing the loss value.
Optionally, for a training sample X with corresponding entity class label combination Y, the conditional probability P(Y|X) that the label combination is Y given the input X is calculated as described in step S406, and the loss of the sample is calculated using the negative log loss function as follows: loss(X, Y) = −log P(Y|X).
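A minimal numeric sketch of this negative log loss; the probability value in the example is illustrative only.

```python
import math

def nll_loss(p_y_given_x):
    # Negative log loss of the conditional probability P(Y|X)
    # computed by the 2D-CRF label prediction module in step S406.
    return -math.log(p_y_given_x)

loss = nll_loss(0.25)  # illustrative probability value
```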
S410: and training the named entity recognition model by taking the loss minimization as a training target.
Specifically, training proceeds over multiple iterations. In each iteration the training data set is traversed: a mini-batch of training samples is randomly drawn from the shuffled training set each time and input into the named entity recognition model, the loss of each training sample is calculated, and the loss of the mini-batch is the average of the losses of all training samples in it. A gradient back-propagation optimization algorithm is then executed to update the parameters of the modules of the named entity recognition model so as to minimize the loss. After each iteration, whether to stop training is judged according to the performance of the named entity recognition model on the validation set, and the model that performs best on the validation set is saved. The saved model is subsequently used to identify entity information in texts to be recognized.
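The iteration scheme above can be sketched schematically as follows. The model, the gradient update, and the validation metric are stand-ins passed as callables; only the shuffle/mini-batch/keep-best-on-validation structure follows the text.

```python
import random

def train(samples, evaluate, update, epochs=3, batch_size=4, seed=0):
    # Schematic training loop: shuffle the training set each epoch,
    # update on mini-batches, and keep the best-on-validation state.
    rng = random.Random(seed)
    best_score, best_state = float("-inf"), None
    state = {"step": 0}  # placeholder for model parameters
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)                   # traverse shuffled training set
        for k in range(0, len(order), batch_size):
            batch = order[k:k + batch_size]
            state = update(state, batch)     # step minimizing mean batch loss
        score = evaluate(state)              # validation-set performance
        if score > best_score:
            best_score, best_state = score, dict(state)  # save best model
    return best_state
```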
Fig. 6 is a schematic diagram of a named entity recognition device provided in the present specification, which specifically includes:
an obtaining module 500, configured to obtain a text to be identified;
the character feature vector determining module 502 is configured to input the text to be identified into a pre-trained named entity recognition model, and determine character feature vectors corresponding to each character included in the text to be identified through a character feature extraction module of the named entity recognition model;
The character segment feature vector determining module 504 is configured to take the character feature vectors corresponding to the characters contained in the text to be recognized as input, and input them into the character segment feature extraction module of the named entity recognition model to obtain the character segment feature vectors corresponding to the character segments in the text to be recognized;
the conditional probability determining module 506 is configured to determine a conditional probability that the text to be recognized corresponds to each preset entity class combination according to character segment feature vectors corresponding to each character segment in the text to be recognized and a two-dimensional conditional random field entity label prediction module in the named entity recognition model;
and the named entity determining module 508 is configured to determine named entity information included in the text to be identified according to conditional probabilities that the text to be identified corresponds to each preset entity category combination.
Optionally, the character feature extraction module of the named entity recognition model comprises an embedded layer and an encoder;
optionally, the character feature vector determining module 502 is specifically configured to input the text to be identified into a pre-trained named entity recognition model, and obtain, through the embedding layer, an embedding vector of each character in the text to be identified; and inputting the embedded vector of each character into the encoder to obtain the character characteristic vector of each character.
Optionally, the character segment feature extraction module of the named entity recognition model comprises a first fully-connected layer, a second fully-connected layer, and a third fully-connected layer;
optionally, the character segment feature vector determining module 504 is specifically configured to: take the character feature vectors corresponding to the characters contained in the text to be recognized as input, and input them into the character segment feature extraction module of the named entity recognition model; obtain, through the first fully-connected layer, the first feature vector of each character, the first feature vector being the feature vector of the character when it serves as the first character of a character segment in the text to be recognized; obtain, through the second fully-connected layer and according to the character feature vectors of the characters contained in the text to be recognized, the second feature vector of each character, the second feature vector being the feature vector of the character when it serves as the tail character of a character segment in the text to be recognized; divide the text to be recognized into a plurality of character segments; and, for each character segment, take the first feature vector of its first character and the second feature vector of its tail character as input to the third fully-connected layer to obtain the character segment feature vector of the character segment.
Optionally, the conditional probability determining module 506 is specifically configured to construct a target two-dimensional grid by using each character segment in the text to be identified as a node, using a relationship between each character segment in the text to be identified as an edge, and using entity types of each character segment in the text to be identified as states of each node in the target two-dimensional grid; wherein, the head characters of the character fragments of each row in the target two-dimensional grid are the same, and the tail characters of the character fragments of each column are the same; arranging character segment feature vectors of all character segments in the text to be identified according to the character segments corresponding to all nodes in the target two-dimensional grid respectively to obtain a character segment feature vector matrix; and taking the target two-dimensional grid and the character segment feature vector matrix as input, and inputting the input into a two-dimensional conditional random field entity label prediction module in the named entity recognition model to obtain the conditional probability of the text to be recognized corresponding to each entity class combination.
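The target two-dimensional grid described above can be sketched as follows: node (i, j) holds the segment whose first character is x_i and last character is x_j, so every row shares a head character and every column shares a tail character; only i ≤ j is a valid span (0-based indices here, unlike the 1-based text).

```python
def build_span_grid(chars):
    # Node (i, j) of the grid is the character segment chars[i..j];
    # positions with i > j are not valid segments and are left as None.
    n = len(chars)
    return [["".join(chars[i:j + 1]) if i <= j else None
             for j in range(n)]
            for i in range(n)]

grid = build_span_grid(["a", "b", "c"])
```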
Optionally, the conditional probability determining module 506 is specifically configured to determine, by using the two-dimensional conditional random field entity tag prediction module, for each row of nodes in the target two-dimensional grid, a first feature function for characterizing an association relationship between the row nodes and a row node above the row node according to an edge between the row nodes, an edge between the row nodes and a row node above the row node, a state of a row node above the row node, and the character segment feature vector matrix; determining a second characteristic function for representing the state of the line node according to the state of the line node, the line node and the character segment characteristic vector matrix; determining a matrix random variable corresponding to the row of nodes according to the first characteristic function and the second characteristic function; and obtaining the conditional probability of the text to be identified corresponding to the combination of the entity categories according to the matrix random variables of the nodes in each row in the target two-dimensional grid.
Optionally, the named entity determining module 508 is specifically configured to determine named entity information included in the text to be identified according to a preset entity category combination with a maximum conditional probability in conditional probabilities that the text to be identified corresponds to each preset entity category combination.
Optionally, the apparatus further comprises:
the training module 510 is specifically configured to obtain a reference text in advance as a training sample, and obtain entity class labels of character segments included in the reference text as labels of the training sample; inputting the training sample into a named entity recognition model to be trained, and determining character feature vectors corresponding to each character contained in the training sample through a character feature extraction module of the named entity recognition model; taking character feature vectors corresponding to the characters contained in the training sample as input, and inputting the character feature vectors to a character segment feature extraction module of the named entity recognition model to obtain character segment feature vectors corresponding to the character segments in the training sample; determining the conditional probability of the training sample corresponding to each entity class combination according to character segment feature vectors respectively corresponding to each character segment in the training sample and a two-dimensional conditional random field entity label prediction module in the named entity recognition model; determining a loss according to the difference between the conditional probability of the training sample corresponding to each entity class combination and the annotation of the training sample; and training the named entity recognition model by taking the loss minimization as a training target.
The present specification also provides a computer readable storage medium storing a computer program operable to perform the named entity recognition method shown in fig. 1 described above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 7. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, as shown in fig. 7, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it to implement the named entity recognition method shown in fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description; that is, the execution subject of the following processing flows is not limited to logic units, but may also be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code before compiling must likewise be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logical method flow can easily be obtained simply by programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (10)

1. A named entity recognition method, comprising:
acquiring a text to be recognized;
inputting the text to be recognized into a pre-trained named entity recognition model, and determining, through a character feature extraction module of the named entity recognition model, character feature vectors corresponding to the characters contained in the text to be recognized;
inputting the character feature vectors corresponding to the characters contained in the text to be recognized into a character segment feature extraction module of the named entity recognition model, to obtain character segment feature vectors corresponding to the character segments in the text to be recognized;
determining, according to the character segment feature vectors corresponding to the character segments in the text to be recognized and a two-dimensional conditional random field entity label prediction module in the named entity recognition model, a conditional probability that the text to be recognized corresponds to each preset entity class combination;
and determining, according to the conditional probabilities that the text to be recognized corresponds to the preset entity class combinations, named entity information contained in the text to be recognized.
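As a minimal illustration of the claimed pipeline (not the patented implementation), the four steps can be sketched in Python with toy stand-ins for the three modules; every function body below is an assumption made for illustration only:

```python
# Toy sketch of the claimed pipeline: character features -> character
# segment (span) features -> per-span entity class -> entity labels.
# All scoring functions are illustrative stand-ins, not a trained model.

def char_features(text):
    # Stand-in for the character feature extraction module:
    # one small feature vector per character.
    return [[float(ord(c) % 7), float(ord(c) % 5)] for c in text]

def span_features(feats):
    # Stand-in for the character segment feature extraction module:
    # a feature vector for every span (i, j) with i <= j.
    spans = {}
    for i in range(len(feats)):
        for j in range(i, len(feats)):
            spans[(i, j)] = [feats[i][0], feats[j][1]]  # head + tail info
    return spans

def label_spans(spans, classes=("O", "PER", "LOC")):
    # Stand-in for the entity label prediction module: pick, for each
    # span, the class index with the highest toy score.
    def score(vec, k):
        return sum(vec) * (k + 1) % 11
    return {s: max(range(len(classes)), key=lambda k: score(v, k))
            for s, v in spans.items()}

feats = char_features("北京欢迎你")
spans = span_features(feats)
labels = label_spans(spans)
```

In the actual method the last step is joint (a 2-D CRF over all spans at once, as in claims 4-5) rather than the independent per-span argmax used in this sketch.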
2. The method of claim 1, wherein the character feature extraction module of the named entity recognition model comprises an embedding layer and an encoder;
inputting the text to be recognized into a pre-trained named entity recognition model, and determining character feature vectors corresponding to the characters contained in the text to be recognized through the character feature extraction module of the named entity recognition model specifically comprises:
inputting the text to be recognized into the pre-trained named entity recognition model, and obtaining an embedding vector for each character in the text to be recognized through the embedding layer;
and inputting the embedding vector of each character into the encoder to obtain the character feature vector of each character.
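A minimal sketch of claim 2's two-stage character feature extraction, assuming a toy deterministic embedding table and an averaging "encoder" as stand-ins (a real model would use learned embeddings and, e.g., a Transformer or BiLSTM encoder):

```python
# Claim 2 sketch: an embedding layer maps each character to a vector,
# then an encoder contextualizes those vectors. The neighbour-averaging
# "encoder" below is only a crude illustration of context mixing.

EMB_DIM = 4

def embed(text):
    # Toy embedding layer: deterministic per-character lookup.
    table = {}
    out = []
    for c in text:
        if c not in table:
            table[c] = [((ord(c) * (k + 1)) % 10) / 10.0
                        for k in range(EMB_DIM)]
        out.append(table[c])
    return out

def encode(embs):
    # Toy encoder: each character feature vector is the average of its
    # own embedding and its immediate neighbours' embeddings.
    n = len(embs)
    out = []
    for i in range(n):
        window = embs[max(0, i - 1):min(n, i + 2)]
        out.append([sum(v[k] for v in window) / len(window)
                    for k in range(EMB_DIM)])
    return out
```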
3. The method of claim 1, wherein the character segment feature extraction module of the named entity recognition model comprises a first fully-connected layer, a second fully-connected layer, and a third fully-connected layer;
inputting the character feature vectors corresponding to the characters contained in the text to be recognized into the character segment feature extraction module of the named entity recognition model to obtain the character segment feature vectors corresponding to the character segments in the text to be recognized specifically comprises:
inputting the character feature vectors corresponding to the characters contained in the text to be recognized into the character segment feature extraction module of the named entity recognition model, and obtaining, through the first fully-connected layer, a first feature vector for each character, the first feature vector being the feature vector of the character when it serves as the first character of a character segment in the text to be recognized;
obtaining, through the second fully-connected layer and according to the character feature vectors corresponding to the characters contained in the text to be recognized, a second feature vector for each character, the second feature vector being the feature vector of the character when it serves as the tail character of a character segment in the text to be recognized;
dividing the text to be recognized into a plurality of character segments;
and for each character segment, inputting the first feature vector of its first character and the second feature vector of its tail character into the third fully-connected layer to obtain the character segment feature vector of the character segment.
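The three-layer span construction in claim 3 can be sketched as follows; the weight matrices are fixed toy values standing in for learned parameters, and the concatenate-then-project combination is an assumption about how the third layer consumes its two inputs:

```python
# Claim 3 sketch: layer 1 gives each character a "head-role" vector,
# layer 2 a "tail-role" vector; layer 3 combines the head vector of a
# span's first character with the tail vector of its last character.

def linear(vec, w, b):
    # y = W x + b for a small dense ("fully-connected") layer.
    return [sum(wi * xi for wi, xi in zip(row, vec)) + bi
            for row, bi in zip(w, b)]

W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.1, 0.1]   # first-character layer
W2, b2 = [[0.5, 0.5], [1.0, -1.0]], [0.0, 0.0]  # tail-character layer
W3, b3 = [[1.0, 0.0, 1.0, 0.0]], [0.0]          # combining layer

def span_vectors(char_vecs):
    heads = [linear(v, W1, b1) for v in char_vecs]
    tails = [linear(v, W2, b2) for v in char_vecs]
    out = {}
    for i in range(len(char_vecs)):
        for j in range(i, len(char_vecs)):
            # Concatenate head(i) and tail(j), then project.
            out[(i, j)] = linear(heads[i] + tails[j], W3, b3)
    return out
```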
4. The method of claim 1, wherein determining the conditional probability that the text to be recognized corresponds to each preset entity class combination according to the character segment feature vectors corresponding to the character segments in the text to be recognized and the two-dimensional conditional random field entity label prediction module in the named entity recognition model specifically comprises:
constructing a target two-dimensional grid by taking each character segment in the text to be recognized as a node and the relations among the character segments as edges, and taking the entity class of each character segment as the state of the corresponding node in the target two-dimensional grid, wherein the character segments in each row of the target two-dimensional grid share the same first character, and the character segments in each column share the same tail character;
arranging the character segment feature vectors of the character segments in the text to be recognized according to the character segments corresponding to the nodes in the target two-dimensional grid, to obtain a character segment feature vector matrix;
and inputting the target two-dimensional grid and the character segment feature vector matrix into the two-dimensional conditional random field entity label prediction module in the named entity recognition model, to obtain the conditional probability that the text to be recognized corresponds to each preset entity class combination.
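The grid layout of claim 4 — rows indexed by a span's first character, columns by its tail character — can be sketched directly; for a text of n characters only the upper triangle (first index ≤ tail index) holds valid spans:

```python
# Claim 4 sketch: arrange the spans of an n-character text in a 2-D
# grid where grid[i][j] is the span starting at character i and ending
# at character j. Every node in row i shares the first character i;
# every node in column j shares the tail character j.

def build_grid(n):
    return [[(i, j) if i <= j else None for j in range(n)]
            for i in range(n)]

grid = build_grid(3)
```

The character segment feature vector matrix of the claim would then simply hold, at position (i, j), the feature vector produced for span (i, j).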
5. The method of claim 4, wherein inputting the target two-dimensional grid and the character segment feature vector matrix into the two-dimensional conditional random field entity label prediction module in the named entity recognition model and determining the conditional probability that the text to be recognized corresponds to each preset entity class combination specifically comprises:
for each row of nodes in the target two-dimensional grid, determining, by the two-dimensional conditional random field entity label prediction module, a first feature function representing the association between that row of nodes and the row of nodes above it, according to the edges within the row, the edges between the row and the row above it, the states of the row's nodes, the states of the nodes in the row above, and the character segment feature vector matrix;
determining a second feature function representing the states of the row's nodes according to those states, the row's nodes, and the character segment feature vector matrix;
determining a matrix random variable for the row of nodes according to the first feature function and the second feature function;
and obtaining the conditional probability that the text to be recognized corresponds to each entity class combination according to the matrix random variables of the rows of nodes in the target two-dimensional grid.
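Claim 5 computes the probability efficiently via per-row matrix random variables; as a toy illustration of what that probability *is*, the sketch below brute-forces it for a tiny grid: every complete label combination gets a score (a unary term per span plus a transition term between spans in vertically adjacent rows, mimicking the two feature functions), and normalizing the exponentiated scores yields p(label combination | text). The scoring functions are invented toy numbers, not the patent's learned feature functions:

```python
import itertools
import math

SPANS = [(0, 0), (0, 1), (1, 1)]   # (first char, tail char) per node
LABELS = ["O", "ENT"]

def unary(span, label):
    # Toy second-feature-function analogue: per-node state score.
    return 1.0 if (label == "ENT") == (span == (0, 1)) else 0.0

def pairwise(s1, l1, s2, l2):
    # Toy first-feature-function analogue: score between a node and
    # the node below it (same column, next row), here favouring "O".
    return 0.5 if (s1[1] == s2[1] and l1 == l2 == "O") else 0.0

def conditional_probs():
    scores = {}
    for combo in itertools.product(LABELS, repeat=len(SPANS)):
        s = sum(unary(sp, lb) for sp, lb in zip(SPANS, combo))
        for a in range(len(SPANS)):
            for b in range(len(SPANS)):
                if SPANS[a][0] + 1 == SPANS[b][0]:  # adjacent rows
                    s += pairwise(SPANS[a], combo[a],
                                  SPANS[b], combo[b])
        scores[combo] = s
    z = sum(math.exp(v) for v in scores.values())   # partition function
    return {c: math.exp(v) / z for c, v in scores.items()}

probs = conditional_probs()
best = max(probs, key=probs.get)
```

The product of matrix random variables in the claim plays the role of the brute-force sum here, collapsing the exponential enumeration into row-by-row matrix products.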
6. The method of claim 1, wherein determining the named entity information contained in the text to be recognized according to the conditional probabilities that the text to be recognized corresponds to the preset entity class combinations specifically comprises:
determining the named entity information contained in the text to be recognized according to the preset entity class combination having the largest conditional probability among the conditional probabilities that the text to be recognized corresponds to the preset entity class combinations.
7. The method of claim 1, wherein pre-training the named entity recognition model specifically comprises:
acquiring a reference text in advance as a training sample, and acquiring entity class labels of the character segments contained in the reference text as the labels of the training sample;
inputting the training sample into a named entity recognition model to be trained, and determining character feature vectors corresponding to the characters contained in the training sample through the character feature extraction module of the named entity recognition model;
inputting the character feature vectors corresponding to the characters contained in the training sample into the character segment feature extraction module of the named entity recognition model, to obtain character segment feature vectors corresponding to the character segments in the training sample;
determining the conditional probability that the training sample corresponds to each entity class combination according to the character segment feature vectors corresponding to the character segments in the training sample and the two-dimensional conditional random field entity label prediction module in the named entity recognition model;
determining a loss according to the difference between the conditional probabilities that the training sample corresponds to the entity class combinations and the labels of the training sample;
and training the named entity recognition model with minimization of the loss as the training objective.
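The training objective in claim 7 amounts to maximum likelihood: the loss can be taken as the negative log of the conditional probability assigned to the gold label combination. A minimal sketch, assuming a toy probability table in place of the model's output:

```python
import math

# Claim 7 sketch: the loss is the negative log-likelihood of the gold
# entity class combination, and training minimizes it. The probability
# table is a toy stand-in for the 2-D CRF module's output.

def nll_loss(probs, gold):
    # probs: {label_combination: conditional probability}, summing to 1.
    return -math.log(probs[gold])

toy_probs = {("O", "ENT"): 0.7, ("O", "O"): 0.2, ("ENT", "O"): 0.1}
loss = nll_loss(toy_probs, ("O", "ENT"))
```

Minimizing this loss pushes probability mass toward the labelled combination; a wrong gold combination under the same table yields a strictly larger loss.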
8. A named entity recognition device, comprising:
an acquisition module, configured to acquire a text to be recognized;
a character feature vector determination module, configured to input the text to be recognized into a pre-trained named entity recognition model, and determine character feature vectors corresponding to the characters contained in the text to be recognized through a character feature extraction module of the named entity recognition model;
a character segment feature vector determination module, configured to input the character feature vectors corresponding to the characters contained in the text to be recognized into a character segment feature extraction module of the named entity recognition model, to obtain character segment feature vectors corresponding to the character segments in the text to be recognized;
a conditional probability determination module, configured to determine the conditional probability that the text to be recognized corresponds to each preset entity class combination according to the character segment feature vectors corresponding to the character segments in the text to be recognized and a two-dimensional conditional random field entity label prediction module in the named entity recognition model;
and a named entity determination module, configured to determine named entity information contained in the text to be recognized according to the conditional probabilities that the text to be recognized corresponds to the preset entity class combinations.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-7.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method of any one of claims 1-7.
CN202311286040.2A 2023-10-07 2023-10-07 Named entity recognition method, device, equipment and readable storage medium Active CN117034942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311286040.2A CN117034942B (en) 2023-10-07 2023-10-07 Named entity recognition method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311286040.2A CN117034942B (en) 2023-10-07 2023-10-07 Named entity recognition method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN117034942A true CN117034942A (en) 2023-11-10
CN117034942B CN117034942B (en) 2024-01-09

Family

ID=88641380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311286040.2A Active CN117034942B (en) 2023-10-07 2023-10-07 Named entity recognition method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN117034942B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101559576B1 * 2014-05-16 2015-10-15 Industry-Academic Cooperation Foundation, Dong-A University A Simultaneous Recognition Apparatus for the Language Understanding Module of a Mobile Dialogue System and Method of the same
CN107797992A * 2017-11-10 2018-03-13 Beijing Baifendian Information Technology Co., Ltd. Named entity recognition method and device
CN110162749A * 2018-10-22 2019-08-23 Harbin Institute of Technology (Shenzhen) Information extraction method, device, computer equipment and computer-readable storage medium
CN110705294A * 2019-09-11 2020-01-17 Suning Cloud Computing Co., Ltd. Named entity recognition model training method, named entity recognition method and device
CN111104800A * 2019-12-24 2020-05-05 Neusoft Corporation Entity identification method, device, equipment, storage medium and program product
US20200302118A1 * 2017-07-18 2020-09-24 Global Tone Communication Technology Co., Ltd. Korean Named-Entity Recognition Method Based on Maximum Entropy Model and Neural Network Model
CN111709241A * 2020-05-27 2020-09-25 Xi'an Jiaotong University Named entity identification method oriented to the network security field
CN112163431A * 2020-10-19 2021-01-01 Beijing University of Posts and Telecommunications Chinese missing pronoun completion method based on generic conditional random field
CN113033204A * 2021-03-24 2021-06-25 Guangzhou Wondfo Biotech Co., Ltd. Information entity extraction method and device, electronic equipment and storage medium
US20220318509A1 * 2020-01-20 2022-10-06 Boe Technology Group Co., Ltd. Entity recognition method and device, dictionary creating method, device and medium
CN115952800A * 2022-12-29 2023-04-11 Hangzhou Hundsun Juyuan Information Technology Co., Ltd. Named entity recognition method and device, computer equipment and readable storage medium
CN116384401A * 2023-04-14 2023-07-04 Suzhou Aerospace Information Research Institute Named entity recognition method based on prompt learning
CN116522942A * 2023-05-06 2023-08-01 Zhejiang Normal University Chinese nested named entity recognition method based on character pairs


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GUL KHAN SAFI QAMAS; 尹继泽; 潘丽敏; 罗森林: "Research on Named Entity Recognition Methods Based on Deep Neural Networks", Netinfo Security (信息网络安全), no. 10, pages 35-41 *
YAO FU et al.: "Nested Named Entity Recognition with Partially-Observed TreeCRFs", arXiv, pages 1-11 *
王蕾; 谢云; 周俊生; 顾彦慧; 曲维光: "Segment-Level Chinese Named Entity Recognition Based on Neural Networks", Journal of Chinese Information Processing (中文信息学报), no. 03, pages 89-95 *
祁鹏年 et al.: "A Survey of Chinese Named Entity Recognition Based on Deep Learning", Journal of Chinese Computer Systems (小型微型计算机系统), pages 1857-1868 *
阮彤; 孙程琳; 王昊奋; 方之家; 殷亦超: "Construction and Application of a Traditional Chinese Medicine Knowledge Graph", Journal of Medical Informatics (医学信息学杂志), no. 04, pages 12-17 *

Also Published As

Publication number Publication date
CN117034942B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN112487182B (en) Training method of text processing model, text processing method and device
WO2022007823A1 (en) Text data processing method and device
CN111191002B (en) Neural code searching method and device based on hierarchical embedding
CN110263325B (en) Chinese word segmentation system
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN110807335B (en) Translation method, device, equipment and storage medium based on machine learning
CN113221555B (en) Keyword recognition method, device and equipment based on multitasking model
CN111666758A (en) Chinese word segmentation method, training device and computer readable storage medium
CN112417093B (en) Model training method and device
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN113761868A (en) Text processing method and device, electronic equipment and readable storage medium
CN117056494B (en) Open domain question and answer method, device, electronic equipment and computer storage medium
CN116205232B (en) Method, device, storage medium and equipment for determining target model
CN116150380B (en) Text matching method, device, storage medium and equipment
CN117034942B (en) Named entity recognition method, device, equipment and readable storage medium
CN116151355A (en) Method, device, medium and equipment for model training and service execution
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN116230146A (en) Data processing method, training method of ICD (ICD coding) model and related equipment
CN110852066A (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN114707509A (en) Traffic named entity recognition method and device, computer equipment and storage medium
CN111291576B (en) Method, device, equipment and medium for determining internal representation information quantity of neural network
CN114626378A (en) Named entity recognition method and device, electronic equipment and computer readable storage medium
CN112650861A (en) Personality prediction method, system and device based on task layering
CN114611517B (en) Named entity recognition method, device, equipment and medium based on deep learning
CN116451808B (en) Model training method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant