CN113392649B - Identification method, device, equipment and storage medium - Google Patents

Publication number
CN113392649B
CN113392649B (application CN202110771579.1A)
Authority
CN
China
Prior art keywords
recognized
vector
character
sequence
named entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110771579.1A
Other languages
Chinese (zh)
Other versions
CN113392649A (en)
Inventor
万建伟
李松涛
贺凯
孙科
余非
裴卫民
冯文亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd filed Critical Shanghai Pudong Development Bank Co Ltd
Priority to CN202110771579.1A priority Critical patent/CN113392649B/en
Publication of CN113392649A publication Critical patent/CN113392649A/en
Application granted granted Critical
Publication of CN113392649B publication Critical patent/CN113392649B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses an identification method, device, equipment and storage medium. The method comprises the following steps: acquiring a sequence to be recognized, wherein the sequence to be recognized comprises a sentence to be recognized and part-of-speech information corresponding to the characters in the sentence to be recognized; obtaining a vector to be recognized corresponding to the sequence to be recognized by looking up a word vector table; and inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized.

Description

Identification method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to an identification method, an identification device, identification equipment and a storage medium.
Background
According to the first scheme, for a named entity recognition task, traditional approaches mostly use a Chinese sentence sequence as the input of the model; the sequence is then represented through character embeddings, features are extracted through a network model, and finally the sequence label prediction result is obtained through softmax and a CRF network layer. In one fast named entity recognition method, the authors select a CNNs-self-attention model for the feature extraction process and introduce contextual representations of characters as well as a global context representation.
The second scheme adopts adversarial learning to enrich features beyond the character representation. The model is divided into two parts: an adversarial learning part and a multi-task learning part. The adversarial learning part utilizes three types of data: NER (named entity recognition) data, CWS (Chinese word segmentation) data and POS (part-of-speech tagging) data, each used, after word embedding, as input to a shared bi-LSTM, with adversarial learning performed through gradient reversal. In the multi-task learning part, the NER data passes through word embeddings, a private bi-LSTM structure and a multi-head attention layer, and is trained through a softmax layer and a CRF layer; the CWS and POS data are each trained with a private feature extraction layer using one-hot codes. Meanwhile, the features of the shared bi-LSTM are also output to the private part of the multi-task learning, forming an adversarial model.
The first scheme has the following disadvantages:
a CNNs-self-attention model alone cannot fully utilize the part-of-speech information of words, and the boundary of the prediction result cannot be well determined; feature extraction is insufficient; the model effect is not good enough.
The second scheme has the following disadvantages:
Although word segmentation information is introduced in this scheme, the introduction is crude, an indirect combination, so more information is lost and the prediction result is inaccurate. In addition, the model is too large and cumbersome, and its structure too elaborate, increasing the resource consumption of training and prediction. A bi-LSTM is used in the feature extraction process, which cannot fully learn bidirectional semantic information.
Disclosure of Invention
The embodiments of the invention provide an identification method, device, equipment and storage medium, aiming to solve the following problems: Chinese NER usually adopts character-based embeddings as model input and fails to fully utilize word part-of-speech information, whereas introducing part-of-speech information allows the prediction boundary to be better defined and improves the model's prediction effect; and current Chinese NER models are slow and occupy much memory during prediction. The feature extraction part of mainstream Chinese named entity recognition models currently uses a bi-LSTM structure, but a bi-LSTM cannot fully extract the information to the left and right of a character (it merely superimposes the two directions) and cannot fully utilize local character features; the proposed method therefore reduces model complexity and increases prediction speed.
In a first aspect, an embodiment of the present invention provides an identification method, including:
acquiring a sequence to be recognized, wherein the sequence to be recognized comprises: a sentence to be recognized and part-of-speech information corresponding to the characters in the sentence to be recognized;
obtaining a vector to be recognized corresponding to the sequence to be recognized by searching a word vector table;
and inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized.
In a second aspect, an embodiment of the present invention further provides an identification apparatus, where the apparatus includes:
the sequence acquisition module is used for acquiring a sequence to be recognized, wherein the sequence to be recognized comprises: a sentence to be recognized and part-of-speech information corresponding to the characters in the sentence to be recognized;
the searching module is used for obtaining a vector to be recognized corresponding to the sequence to be recognized by searching a word vector table;
and the determining module is used for inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the identification method according to any one of the embodiments of the present invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the identification method according to any one of the embodiments of the present invention.
The embodiment of the invention acquires a sequence to be recognized, wherein the sequence to be recognized comprises a sentence to be recognized and part-of-speech information corresponding to the characters in the sentence to be recognized; obtains a vector to be recognized corresponding to the sequence to be recognized by looking up a word vector table; and inputs the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized. This solves the problems that Chinese NER usually adopts character-based embeddings as model input and cannot fully utilize word part-of-speech information, whereas the introduced part-of-speech information allows the prediction boundary to be better defined and improves the prediction effect, and that current Chinese NER models are slow and occupy much memory during prediction. The feature extraction part of mainstream Chinese named entity recognition models currently uses a bi-LSTM structure, but a bi-LSTM cannot fully extract the information to the left and right of a character (it merely superimposes the two directions) and cannot fully utilize local character features; the proposed method therefore reduces model complexity and increases prediction speed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of an identification method in an embodiment of the invention;
FIG. 1a is a schematic diagram of a network architecture in an embodiment of the invention;
FIG. 1b is a diagram illustrating fused vocabulary information in an embodiment of the present invention;
FIG. 1c is a schematic diagram of a modified self-attack mechanism in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an identification device in an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer-readable storage medium containing a computer program in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
The term "include" and variations thereof as used herein are intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment".
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Fig. 1 is a flowchart of an identification method provided in an embodiment of the present invention, where this embodiment is applicable to an identification situation, the method may be executed by an identification apparatus in an embodiment of the present invention, and the apparatus may be implemented in a software and/or hardware manner, as shown in fig. 1, the method specifically includes the following steps:
s110, obtaining a sequence to be identified, wherein the sequence to be identified comprises: the method comprises the steps of recognizing sentences to be recognized and part-of-speech information corresponding to characters in the sentences to be recognized.
The sequence to be recognized may be acquired as follows: acquire a sentence to be recognized, and query a word list according to the sentence to obtain the part-of-speech information corresponding to its characters. Alternatively: acquire characters input by a user, determine the sentence to be recognized from those characters, and obtain the part-of-speech information corresponding to each character.
And S120, obtaining the vector to be recognized corresponding to the sequence to be recognized by searching a word vector table.
The word vector table is a table which is established in advance and relates to the corresponding relation between characters and vectors.
The vector to be recognized may include a character vector to be recognized and a position vector to be recognized, and may further include a segmentation vector to be recognized; this is not limited in the embodiments of the present invention.
For example, the vector to be recognized corresponding to the sequence to be recognized is obtained by looking up a word vector table, for example, the word vector table may be looked up to obtain a vector corresponding to a sentence to be recognized and a vector corresponding to part-of-speech information corresponding to characters in the sentence to be recognized.
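To make the table lookup concrete, here is a minimal sketch (not the patent's implementation; the vocabulary, dimension, and tokens are all illustrative) of mapping a sequence to be recognized to its vectors:

```python
import numpy as np

# Toy word-vector table: maps each character / part-of-speech flag to a fixed
# vector (4-dimensional here; the patent uses 300). All entries are illustrative.
rng = np.random.default_rng(0)
word_vector_table = {tok: rng.standard_normal(4)
                     for tok in ["浦", "发", "银", "行", "[n]", "[/n]"]}
UNK = np.zeros(4)  # fallback for tokens missing from the table

def sequence_to_vectors(sequence):
    """Look up each token of the sequence to be recognized in the table."""
    return np.stack([word_vector_table.get(tok, UNK) for tok in sequence])

vecs = sequence_to_vectors(["[n]", "浦", "发", "银", "行", "[/n]"])
print(vecs.shape)  # (6, 4): one vector per token
```

In practice the table would be trained (e.g. with word2vec, as described later in this document) rather than randomly initialized.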
S130, inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized.
The target named entity recognition model may include a BERT model layer and a softmax layer, and may further include a fully connected layer. After BERT, a vector representation of each character is obtained; the dimension of this vector is 768. A fully connected layer unifies the dimension with the number of NER labels, and softmax then classifies to obtain the NER label corresponding to each character.
For example, the vector to be recognized is input into the target named entity recognition model, and the first class probability corresponding to the vector to be recognized is obtained, for example, the vector to be recognized is input into the target named entity recognition model, and the probability of obtaining the class x corresponding to the character a to be recognized is 30%, the probability of obtaining the class y corresponding to the character a to be recognized is 70%, the probability of obtaining the class x corresponding to the character B to be recognized is 30%, the probability of obtaining the class y corresponding to the character B to be recognized is 70%, the probability of obtaining the class x corresponding to the character C to be recognized is 70%, and the probability of obtaining the class y corresponding to the character C to be recognized is 30%.
Optionally, the vector to be identified includes: at least one character vector to be recognized and at least one position vector to be recognized;
correspondingly, inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized, including:
inputting the at least one character vector to be recognized and the at least one position vector to be recognized into a BERT model to obtain a score corresponding to each character vector to be recognized;
inputting the score corresponding to each character vector to be recognized into a softmax layer to obtain a first named entity recognition tag corresponding to each character vector to be recognized, wherein the first named entity recognition tag comprises: the class probability corresponding to the character vector to be recognized.
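The fully connected + softmax step described above can be sketched as follows (a toy illustration with made-up shapes; BERT's hidden size is 768 and the tag set would come from the NER task):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative shapes: 3 characters, hidden size 8 (768 in BERT), 4 NER tags.
rng = np.random.default_rng(1)
hidden = rng.standard_normal((3, 8))               # per-character vectors from BERT
W = rng.standard_normal((8, 4)); b = np.zeros(4)   # fully connected layer

scores = hidden @ W + b        # unify dimensionality with the number of NER tags
probs = softmax(scores)        # first class probability per character
tags = probs.argmax(axis=-1)   # NER tag index per character
assert np.allclose(probs.sum(axis=-1), 1.0)
```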
Optionally, the obtaining the sequence to be recognized includes:
acquiring a sentence to be recognized;
and querying a word list according to the sentence to be recognized to obtain part-of-speech information corresponding to the characters in the sentence to be recognized.
The method for acquiring the sentence to be recognized may be as follows: and acquiring characters input by a user, and determining a sentence to be recognized according to the characters input by the user.
The word list is a dictionary of word attributes, and is established in advance so as to be convenient for inquiring the word list according to the sentence to be recognized input by a user and obtain the part-of-speech information corresponding to the characters in the sentence to be recognized.
Optionally, after the vector to be recognized is input into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized, the method further includes:
inputting the first class probability corresponding to the vector to be recognized into a CRF layer to obtain a target class probability corresponding to the vector to be recognized, wherein the score of the CRF layer is defined as:

$s(X, y) = \sum_{i} P_{x_i, y_i} + \sum_{i} A_{y_i, y_{i+1}}$

where $P_{x_i, y_i}$ is the first class probability corresponding to the vector to be recognized, $x_i$ is the position index of a single character, $y_i$ is the position index of a category label, $A_{y_i, y_{i+1}}$ represents the score of transferring from category label $y_i$ to $y_{i+1}$, $X$ is the sequence to be recognized, and $y = (y_1, y_2, \ldots, y_n)$ is the category label sequence corresponding to the sequence $X$;

and the target probability is calculated from the category labels corresponding to the sequence $X$:

$P(y \mid X) = \dfrac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}}$

where $Y_X$ denotes all candidate label sequences corresponding to the sequence $X$, and the sum traverses $Y_X$.

The loss function of the CRF layer is:

$\log P(y \mid X) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}$

where n is the maximum length of a sentence.
Here, the CRF (Conditional Random Field) is used to constrain the labels. Although the BERT model utilizes the information of the adjacent characters on the left and right of a character during learning, it imposes little constraint on the order in which labels appear or on the continuity of the same label when predicting NER labels. By adding a CRF layer and learning during training, the NER labels predicted by the model can be constrained to always follow certain rules; that is, dependencies between labels are added. For example, a B-person tag always precedes I-person, and I-person always immediately follows B-person.
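The kind of label dependency the CRF learns can be illustrated with a hand-written BIO constraint check (a sketch of the rule itself, not the trained transition matrix):

```python
def valid_transition(prev_tag, tag):
    """BIO constraint the CRF learns: I-X may only follow B-X or I-X."""
    if tag.startswith("I-"):
        ent = tag[2:]
        return prev_tag in (f"B-{ent}", f"I-{ent}")
    return True  # O and B-* tags may follow anything

assert valid_transition("B-person", "I-person")
assert not valid_transition("O", "I-person")       # I- without a preceding B-
assert not valid_transition("B-org", "I-person")   # entity types must match
```

In the trained model these constraints are not hard-coded; they emerge as very low transition scores for invalid tag pairs.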
Optionally, the self-attention method in the target named entity recognition model is implemented by the following formulas:

$\mathrm{Att}_s = \mathrm{softmax}\!\left(\frac{Q_s K_s^{T}}{\sqrt{d_k}}\right) V_s$

$\mathrm{Att}_v = \mathrm{softmax}\!\left(\frac{Q_v K^{T}}{\sqrt{d_k}}\right) V$

$\mathrm{Attention}(Q, K, V) = \mathrm{Att}_s \oplus \mathrm{Att}_v$

where $Q$ denotes the Query vector, $K$ the Key vector, $V$ the Value vector, and $Q = K = V$; $Q$ consists of two parts, with $Q_s$ representing the sentence part and $Q_v$ representing the vocabulary part, and $K$ is partitioned like $Q$; $\oplus$ denotes the concatenation of vectors; $d_k$ denotes the dimension of the Key vector; $\mathrm{Att}_s$ represents the self-attention between the characters in the sentence, and $\mathrm{Att}_v$ represents the self-attention of the word parts of speech with the whole Query vector $Q$.
Illustratively, the self-attention operation is carried out between the single characters in a sentence, but not between characters and the part-of-speech information; the part-of-speech information, however, undergoes self-attention with the single characters, and self-attention is also carried out among the part-of-speech items. This is because the self-attention operation between characters focuses on the degree of correlation between characters and bears no relation to the added flag bits, while the self-attention operation of a flag bit depends on its corresponding character.
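A minimal sketch of this masked self-attention, assuming the first rows of the input are sentence characters and the remaining rows are part-of-speech flag bits (shapes and the mask layout are illustrative, not the patent's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modified_self_attention(Q, K, V, n_sent):
    """First n_sent rows are sentence characters; the rest are flag bits.
    Characters attend only to characters; flag bits attend to the full input."""
    d_k = K.shape[-1]
    logits = Q @ K.T / np.sqrt(d_k)
    mask = np.zeros_like(logits, dtype=bool)
    mask[:n_sent, n_sent:] = True            # block character -> flag-bit attention
    logits = np.where(mask, -1e9, logits)    # masked weights vanish after softmax
    return softmax(logits) @ V

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 4))              # 3 characters + 2 flag bits; Q = K = V
out = modified_self_attention(X, X, X, n_sent=3)
```

Because the character rows are masked from the flag-bit positions, changing the flag-bit vectors leaves the character outputs unchanged, which is exactly the independence the text describes.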
Optionally, the vector to be identified includes: at least one character vector to be identified, at least one position vector to be identified and at least one segmentation vector to be identified;
correspondingly, inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized, including:
inputting the at least one character vector to be recognized, the at least one position vector to be recognized and the at least one segmented vector to be recognized into a BERT model to obtain a score corresponding to each character vector to be recognized;
inputting the score corresponding to each character vector to be recognized into the softmax layer to obtain a second named entity recognition tag corresponding to each character vector to be recognized, wherein the second named entity recognition tag comprises: and the class probability corresponding to the character vector to be recognized.
The network structure provided by the embodiment of the invention mainly comprises four parts, as shown in Fig. 1a, from bottom to top: 1. the model input section; 2. the BERT pre-training model; 3. the fully connected layer and softmax layer; 4. the CRF layer.
A model input part:
the input of the model is divided into two parts, wherein the left half part starts with [ CLS ] and ends with [ SEP ] as a sentence, namely a sample according to the format of the BERT pre-training model; the right half introduces a representation of the part of speech of the word in the sentence, ending with [ SEP ]. [ x ] is the starting position of the word, [/x ] is the ending position of the word, and x is some part of speech of the word. Considering the embedding information of the word position, we mark the start position and the end position of the word with flag bits:
$P(w_{\text{start}}) = P([x])$;
$P(w_{\text{end}}) = P([/x])$;
where $P(\cdot)$ denotes the position index of the flag bit, and $w_{\text{start}}$ and $w_{\text{end}}$ respectively represent the start and end positions of the word. As shown in Fig. 1b, the starting position [t] is at index 7 and the ending position [/t] is at index 8.
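The flag-bit position indices can be illustrated on a toy input sequence (the sentence and the part-of-speech tag t are hypothetical; the indices are those of this toy sequence, not of Fig. 1b):

```python
# Toy fused input: left half is the sentence, right half wraps a word of
# part of speech t in flag bits [t] ... [/t]. All tokens are illustrative.
tokens = ["[CLS]", "浦", "发", "银", "行", "[SEP]", "[t]", "词", "[/t]", "[SEP]"]

def P(token):
    """Position index of a flag bit in the input sequence."""
    return tokens.index(token)

w_start, w_end = P("[t]"), P("[/t]")
print(w_start, w_end)  # 6 8
```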
Construction of the word part-of-speech vocabulary: according to the named entity recognition task, part-of-speech tagging is performed on the specific related entities; meanwhile, the vocabulary is expanded by combining word segmentation techniques; finally, the vocabulary is continuously enriched and corrected according to the data set of the specific field.
Meanwhile, the method modifies the self-attention method used in BERT (Bidirectional Encoder Representations from Transformers), as shown in Fig. 1c. The self-attention operation is carried out between the single characters in the sentence, but not between characters and the flag bits; the flag bits, however, undergo self-attention with the single characters. This is because the self-attention operation between characters focuses on the degree of correlation between characters and bears no relation to the added flag bits, while the self-attention operation of a flag bit depends on its corresponding character.
The modified self-attention is expressed as follows:

$\mathrm{Att}_s = \mathrm{softmax}\!\left(\frac{Q_s K_s^{T}}{\sqrt{d_k}}\right) V_s$

$\mathrm{Att}_v = \mathrm{softmax}\!\left(\frac{Q_v K^{T}}{\sqrt{d_k}}\right) V$

$\mathrm{Attention}(Q, K, V) = \mathrm{Att}_s \oplus \mathrm{Att}_v$

where $Q$ represents the Query vector, $K$ the Key vector, $V$ the Value vector, and $Q = K = V$; $Q$ consists of two parts, with $Q_s$ representing the sentence part and $Q_v$ representing the vocabulary part, and $K$ is partitioned like $Q$; $\oplus$ represents the concatenation of vectors; $d_k$ represents the dimension of the Key vector; $\mathrm{Att}_s$ represents the self-attention between the characters in the sentence, and $\mathrm{Att}_v$ represents the self-attention of the word parts of speech with the entire Query vector $Q$.
When the model predicts, boundary definition errors often occur: for the label "external rating", for example, only part of the label is easily predicted, yielding results such as "external" or "rating", all of which make the named entity recognition result inaccurate and lose important information. By explicitly adding word part-of-speech information to the model input and adopting the improved self-attention mechanism, information loss and redundant information are reduced, so that the entity boundaries determined by the named entity recognition model are more accurate and the model accuracy improves; meanwhile, the complexity of the model is reduced and its prediction speed increases.
BERT pre-training model:
by using the BERT pre-training model, a more ideal result can be obtained more quickly through a fine-tuning mode. The BERT model inputs characters, positions and sections in an embedding stage, and after the embedding vectors are added together, the embedding vectors are input to a transformer encoder part. Wherein the transformer encoder has 12 layers in total, the number of hidden layer neurons is 768, and the multi-head self-attention has 12 heads in total. For the named entity recognition task, the input of the BERT model comprises characters embedding and positions embedding, wherein the values of the segmentations are all 1.
The BERT pre-training is divided into two tasks: the masked language model and next sentence prediction. The first pre-training step aims to build a language model: 15% of the tokens (words) are randomly masked for training; of these, 80% are replaced by [MASK], 10% are replaced by other random words, and 10% are left unchanged. The final loss function is computed only over the masked tokens. The objective function is:
$P(w_i \mid w_1, w_2, \ldots, w_{i-1}, w_{i+1}, \ldots, w_n)$;
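The 15% / 80-10-10 masking scheme described above can be sketched as follows (an illustrative implementation, not BERT's actual code; the sample text and replacement vocabulary are made up):

```python
import random

def mask_tokens(tokens, vocab, seed=0):
    """BERT-style masking sketch: pick ~15% of positions; of those,
    80% -> [MASK], 10% -> a random word, 10% -> left unchanged."""
    rng = random.Random(seed)
    out = list(tokens)
    targets = {}  # position -> original token; loss is computed only on these
    for i, tok in enumerate(tokens):
        if rng.random() < 0.15:
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                out[i] = "[MASK]"
            elif r < 0.9:
                out[i] = rng.choice(vocab)  # replaced by another word
            # else: token is left unchanged (but still predicted)
    return out, targets

masked, targets = mask_tokens(list("预训练语言模型"), vocab=list("天地人"))
```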
The second step predicts the next sentence. Because question-answering tasks are involved, a pre-training task of predicting the next sentence is added, aiming to make the model understand the relationship between two sentences. The training input is sentences A and B, where B has a 50% probability of being the actual next sentence of A, and the model predicts whether B is the next sentence of A.
Fully connected and softmax layers:
data can be obtained through BERT to represent the vector of each character, the dimensionality of the vector is 768, the dimensionality and the number of NER labels are unified through a full connection layer, and then the NER labels corresponding to the characters are obtained through classification by softmax.
CRF layer:
although the BERT model utilizes information of adjacent characters on the left and right sides of the character during learning, the constraint on the appearance sequence of the label and the continuity of the same label is less when the NER label is predicted. The CRF layer is added, and the NER label predicted by the model can be limited to always follow a certain rule through learning in the training process, namely, the dependence between the labels is added. For example, a B-person tag always precedes I-person, which always immediately follows B-person, and so on.
Two types of scores are included in the loss function in the CRF layer: emission score (Emission score) and transfer score (Transition score). Wherein the emission score, the output from the softmax layer
Figure BDA0003153727500000121
x i Indexed by the position of a single character, y i Indexed for category label position. Transfer score pick>
Figure BDA0003153727500000122
Representation from category label y i Transfer to y j The transition scores of all classes construct a transition score matrix. The matrix can be initialized randomly, the constraint relation between the class labels is obtained through model training iterative updating and learning, and the corresponding NER label is obtained through Viterbi algorithm (Viterbi) decoding.
The goal of the CRF is to train so that the score of the true path is the largest among the scores of all paths:

P_total = e^{s_1} + e^{s_2} + … + e^{s_N}, with the objective of maximizing e^{s_real} / P_total,

where each path score s_i = EmissionScore + TransitionScore; EmissionScore and TransitionScore are calculated from the emission matrix and the transition matrix, respectively.
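The per-path score can be sketched directly from the two matrices. In this minimal example (the small matrices and function name are illustrative assumptions, not from the patent), a path's score is the sum of its emission entries plus the transition entries between consecutive tags:

```python
import numpy as np

def path_score(emissions, transitions, tags):
    """Score of one tag path, s_i = EmissionScore + TransitionScore:
    the sum of per-character emission scores P[i, tag_i] plus the
    transition scores A[tag_i, tag_{i+1}] along the path."""
    emit = sum(emissions[i, t] for i, t in enumerate(tags))
    trans = sum(transitions[tags[i], tags[i + 1]] for i in range(len(tags) - 1))
    return emit + trans

# tiny example: 3 characters, 2 labels
P = np.array([[1.0, 0.2], [0.1, 0.9], [0.8, 0.3]])  # emission matrix (softmax output)
A = np.array([[0.5, -0.5], [-0.5, 0.5]])            # transition matrix (learned)
s = path_score(P, A, [0, 1, 1])
```

Exponentiating and normalizing such scores over all possible paths yields the path probabilities used in the CRF loss.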
In one specific example, the algorithm framework of the embodiment of the invention is as follows:
The corpus is prepared into BIO format X_BIO, and at the same time a vocabulary V of word parts of speech is prepared from the training corpus. Each input sentence is combined with the corresponding word part-of-speech information in the sentence to form an input sequence fused with word parts of speech:

X = X_BIO + V = (x_1, x_2, …, x_n);

which serves as the input of the model, where n is the maximum length of the sentence; sentences shorter than n are filled up by a padding method.
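The padding step above can be sketched as follows (function name and the padding id 0 are illustrative assumptions):

```python
def pad_sequence(ids, n, pad_id=0):
    """Truncate or pad a token-id sequence to the fixed maximum length n."""
    return ids[:n] + [pad_id] * max(0, n - len(ids))

assert pad_sequence([5, 7, 9], 6) == [5, 7, 9, 0, 0, 0]
assert pad_sequence([1, 2, 3, 4], 3) == [1, 2, 3]
```

Every input sequence X thus has the same length n before being looked up in the embedding tables.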
For the model input X, the vector representation X_char-embedding ∈ R^{n×e} is obtained by looking up the character word-vector table. The word vectors of the characters are obtained by training the word2vec algorithm on a financial-domain corpus, the vector representation of word parts of speech adopts a preset special value, and the word-vector dimension is e = 300. X_char-embedding is combined with a randomly initialized X_position-embedding ∈ R^{n×e} to yield the vector representation input into the BERT model:

X_embedding = X_char-embedding + X_position-embedding
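As an illustrative sketch of this embedding sum (the table sizes and random initialization scale are assumptions, not from the patent; in the real model the character table comes from word2vec training):

```python
import numpy as np

rng = np.random.default_rng(0)
n, e, vocab = 10, 300, 5000        # max length, embedding dim e=300, character vocab size
char_table = rng.standard_normal((vocab, e)) * 0.02   # stand-in for the word2vec table
char_ids = rng.integers(0, vocab, n)                  # characters of one padded sentence

X_char = char_table[char_ids]                         # X_char-embedding, shape (n, e)
X_position = rng.standard_normal((n, e)) * 0.02       # randomly initialized positions
X_embedding = X_char + X_position                     # element-wise sum fed to BERT
```

The two components have identical shape (n, e), so the combination is a simple element-wise addition.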
The embodiment of the invention adopts a BERT-Base model and performs fine-tuning using the pre-trained parameters chinese_L-12_H-768_A-12, where the hyperparameters are set as follows: {
"attention_probs_dropout_prob":0.1,
"hidden_act":"gelu",
"hidden_dropout_prob":0.1,
"hidden_size":768,
"initializer_range":0.02,
"intermediate_size":3072,
"max_position_embeddings":512,
"num_attention_heads":12,
"num_hidden_layers":12,
"pooler_fc_size":768,
"pooler_num_attention_heads":12,
"pooler_num_fc_layers":3,
"pooler_size_per_head":128}
After passing through the BERT-Base model, a fully connected layer is attached whose output dimension is the number k of distinct labels in the training set, and a normalized representation is obtained through the softmax layer; that is, the output at this point is the emission matrix P ∈ R^{n×k}. Finally, for the CRF layer, the output after softmax is passed to the CRF for training, where the transition matrix A is randomly initialized and then continuously optimized by learning; here the label corresponding to the sequence X is y = (y_1, y_2, …, y_n).
In the prediction stage, after processing by the model, the corresponding NER (Named Entity Recognition) labels are obtained by decoding with the Viterbi algorithm (Viterbi).
Training objective:
In the training process, the final convergence result is obtained by calculating the loss of the CRF layer and iterating continuously with the back-propagation algorithm. The score of the CRF is defined as follows:

s(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}

For a given sequence label y, the corresponding probability is calculated by softmax:

p(y | X) = e^{s(X, y)} / Σ_{ỹ ∈ Y_X} e^{s(X, ỹ)}

wherein Y_X represents all possible sequence labels of the sequence X. The goal of the training process is to maximize the log probability of the correct label sequence, which is transformed into minimizing the loss function:

loss = log( Σ_{ỹ ∈ Y_X} e^{s(X, ỹ)} ) − s(X, y)

where n is the maximum length of the sentence.

In the prediction stage, the sequence label with the largest score is obtained by decoding and used as the prediction result:

y* = argmax_{ỹ ∈ Y_X} s(X, ỹ)
The key point of the technology provided by the embodiment of the invention is that the problems of slow training and prediction and low accuracy in Chinese named entity recognition models are solved by constructing a word part-of-speech table and fusing word part-of-speech features into the input sequence. Constructing the word part-of-speech table and fusing word part-of-speech information into the sequence as model input reduces information loss and redundant information, makes the boundaries determined by the named entity recognition model more accurate, and improves model accuracy; at the same time, model complexity is reduced and the prediction speed of the model is improved.
The embodiment of the invention does not need to use word-embedding word vectors: although introducing word vectors can improve model accuracy, it adds more parameters and makes the model more complex. Instead, a word part-of-speech representation is introduced, which improves the accuracy of Chinese named entity recognition with less resource consumption, a small training memory footprint, and fast prediction. The BERT pre-training model, which performs better and is more widely applied, is selected for the feature extraction process.
According to the technical scheme of this embodiment, a sequence to be recognized is acquired, the sequence comprising a sentence to be recognized and part-of-speech information corresponding to the characters in the sentence; a vector to be recognized corresponding to the sequence is obtained by searching a word vector table; and the vector is input into a target named entity recognition model to obtain a first class probability corresponding to the vector. This solves the problems that Chinese NER usually adopts character-based embedding as model input and cannot fully utilize word part-of-speech information, and that current Chinese NER model prediction is slow and memory-intensive: introducing word part-of-speech information better defines prediction boundaries and improves the model prediction effect. Current mainstream Chinese named entity recognition models select a bi-LSTM structure for feature extraction, but bi-LSTM cannot fully extract the information to the left and right of a character; it only superimposes bidirectional information and cannot fully utilize local character features, so the present scheme also reduces model complexity and increases prediction speed.
Fig. 2 is a schematic structural diagram of an identification apparatus according to an embodiment of the present invention. The present embodiment may be applicable to the case of identification, the identification apparatus may be implemented in a software and/or hardware manner, and the identification apparatus may be integrated in any device providing an identification function, as shown in fig. 2, where the identification apparatus specifically includes: a sequence acquisition module 210, a lookup module 220, and a determination module 230.
The sequence acquisition module is configured to acquire a sequence to be identified, where the sequence to be identified includes: a sentence to be recognized and part-of-speech information corresponding to characters in the sentence to be recognized;
the searching module is used for obtaining a vector to be recognized corresponding to the sequence to be recognized by searching a word vector table;
and the determining module is used for inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized.
Optionally, the vector to be identified includes: at least one character vector to be recognized and at least one position vector to be recognized;
correspondingly, the determining module is specifically configured to:
inputting the at least one character vector to be recognized and the at least one position vector to be recognized into a BERT model to obtain a score corresponding to each character vector to be recognized;
inputting the score corresponding to each character vector to be recognized into a softmax layer to obtain a first named entity recognition tag corresponding to each character vector to be recognized, wherein the first named entity recognition tag comprises: and the class probability corresponding to the character vector to be recognized.
The product can execute the method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
According to the technical scheme of this embodiment, a sequence to be recognized is acquired, the sequence comprising a sentence to be recognized and part-of-speech information corresponding to the characters in the sentence; a vector to be recognized corresponding to the sequence is obtained by searching a word vector table; and the vector is input into a target named entity recognition model to obtain a first class probability corresponding to the vector. This solves the problems that Chinese NER usually adopts character-based embedding as model input and cannot fully utilize word part-of-speech information, and that current Chinese NER model prediction is slow and memory-intensive: introducing word part-of-speech information better defines prediction boundaries and improves the model prediction effect. Current mainstream Chinese named entity recognition models select a bi-LSTM structure for feature extraction, but bi-LSTM cannot fully extract the information to the left and right of a character; it only superimposes bidirectional information and cannot fully utilize local character features, so the present scheme also reduces model complexity and increases prediction speed.
Fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention. FIG. 3 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 3 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in FIG. 3, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 3, and commonly referred to as a "hard drive"). Although not shown in FIG. 3, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a compact disc read-only memory (CD-ROM), digital video disc (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including but not limited to an operating system, one or more application programs, other program modules, and program data, each of which or some combination of which may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. In the electronic device 12 of the present embodiment, the display 24 is not provided as a separate body but is embedded in the mirror surface, and when the display surface of the display 24 is not displaying, the display surface of the display 24 and the mirror surface are visually integrated. Also, the electronic device 12 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the electronic device 12 over the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Array of Independent Disks (RAID) systems, tape drives, and data backup storage systems, to name a few.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the identification method provided by the embodiment of the present invention:
acquiring a sequence to be recognized, wherein the sequence to be recognized comprises: a sentence to be recognized and part-of-speech information corresponding to characters in the sentence to be recognized;
obtaining a vector to be recognized corresponding to the sequence to be recognized by searching a word vector table;
and inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized.
Fig. 4 is a schematic structural diagram of a computer-readable storage medium containing a computer program according to an embodiment of the present invention. Embodiments of the present invention provide a computer-readable storage medium 61, on which a computer program 610 is stored, which when executed by one or more processors implements the identification method as provided by all inventive embodiments of the present application:
acquiring a sequence to be recognized, wherein the sequence to be recognized comprises: a sentence to be recognized and part-of-speech information corresponding to characters in the sentence to be recognized;
obtaining a vector to be recognized corresponding to the sequence to be recognized by searching a word vector table;
and inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not in some cases constitute a limitation on the element itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. An identification method, comprising:
acquiring a sequence to be recognized, wherein the sequence to be recognized comprises: a sentence to be identified and part-of-speech information corresponding to characters in the sentence to be identified;
obtaining a vector to be recognized corresponding to the sequence to be recognized by searching a word vector table;
inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized;
the self-attention method in the target named entity recognition model is realized through the following formula:

Attention(Q, K, V) = softmax( Q K^T / √d_k ) V

wherein Q represents a Query vector, K represents a Key vector, V represents a Value vector, and Q = K = V; Q is composed of two parts [Q_s ; Q_v], wherein Q_s represents the sentence part, Q_v represents the vocabulary part, K is similar to Q, [· ; ·] represents the concatenation of vectors, d_k represents the dimension of the Key vector, Q_s K_s^T represents the self-attention between characters in the sentence, and Q_v K^T represents the self-attention between the word parts of speech and Q;

performing self-attention operation among single characters in the sentence;

performing self-attention operation between the part-of-speech information and single characters;

and performing self-attention operation among the part-of-speech information.
2. The method of claim 1, wherein the vector to be identified comprises: at least one character vector to be recognized and at least one position vector to be recognized;
correspondingly, inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized, including:
inputting the at least one character vector to be recognized and the at least one position vector to be recognized into a BERT model to obtain a score corresponding to each character vector to be recognized;
inputting the score corresponding to each character vector to be recognized into a softmax layer to obtain a first named entity identification tag corresponding to each character vector to be recognized, wherein the first named entity identification tag comprises: and the class probability corresponding to the character vector to be recognized.
3. The method of claim 1, wherein obtaining the sequence to be identified comprises:
acquiring a sentence to be recognized;
and querying a word list according to the sentence to be recognized to obtain part-of-speech information corresponding to the characters in the sentence to be recognized.
4. The method according to claim 1, further comprising, after inputting the vector to be recognized into a target named entity recognition model and obtaining a first class probability corresponding to the vector to be recognized:
inputting the first class probability corresponding to the vector to be recognized into a CRF layer to obtain a target class probability corresponding to the vector to be recognized, wherein the score of the CRF layer is defined as follows:

s(X, y) = Σ_{i=0..n} A_{y_i, y_{i+1}} + Σ_{i=1..n} P_{i, y_i}

wherein P_{i, y_i} is the first class probability corresponding to the vector to be recognized, x_i is indexed by the position of a single character, y_i is indexed by the category-label position, A_{y_i, y_{i+1}} represents the transfer from category label y_i to y_{i+1}, X is the sequence to be recognized, and y = (y_1, y_2, …, y_n) is the category label corresponding to the sequence X;

and calculating according to the category label corresponding to the sequence X to obtain the target probability:

p(y | X) = e^{s(X, y)} / Σ_{ỹ ∈ Y_X} e^{s(X, ỹ)}

wherein Y_X represents at least two category labels corresponding to the sequence X, Σ_{ỹ ∈ Y_X} represents the traversal of Y_X, and s(X, y) represents the score of the CRF layer;

the loss function of the CRF layer is:

loss = log( Σ_{ỹ ∈ Y_X} e^{s(X, ỹ)} ) − s(X, y)

wherein n is the maximum length of the sentence.
5. The method of claim 1, wherein the vector to be identified comprises: at least one character vector to be recognized, at least one position vector to be recognized and at least one segmentation vector to be recognized;
correspondingly, inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized, including:
inputting the at least one character vector to be recognized, the at least one position vector to be recognized and the at least one segmented vector to be recognized into a BERT model to obtain a score corresponding to each character vector to be recognized;
inputting the score corresponding to each character vector to be recognized into the softmax layer to obtain a second named entity recognition tag corresponding to each character vector to be recognized, wherein the second named entity recognition tag comprises: and the class probability corresponding to the character vector to be recognized.
6. An identification device, comprising:
the sequence acquisition module is used for acquiring a sequence to be recognized, wherein the sequence to be recognized comprises: a sentence to be identified and part-of-speech information corresponding to characters in the sentence to be identified;
the searching module is used for obtaining a vector to be recognized corresponding to the sequence to be recognized by searching a word vector table;
the determining module is used for inputting the vector to be recognized into a target named entity recognition model to obtain a first class probability corresponding to the vector to be recognized;
the self-attention method in the target named entity recognition model is realized through the following formula:

Attention(Q, K, V) = softmax( Q K^T / √d_k ) V

wherein Q represents a Query vector, K represents a Key vector, V represents a Value vector, and Q = K = V; Q is composed of two parts [Q_s ; Q_v], wherein Q_s represents the sentence part, Q_v represents the vocabulary part, K is similar to Q, [· ; ·] represents the concatenation of vectors, d_k represents the dimension of the Key vector, Q_s K_s^T represents the self-attention between characters in the sentence, and Q_v K^T represents the self-attention between the word parts of speech and Q;

performing self-attention operation among single characters in the sentence;

performing self-attention operation between the part-of-speech information and single characters;

and performing self-attention operation among the part-of-speech information.
7. The apparatus of claim 6, wherein the vector to be identified comprises: at least one character vector to be recognized and at least one position vector to be recognized;
correspondingly, the determining module is specifically configured to:
inputting the at least one character vector to be recognized and the at least one position vector to be recognized into a BERT model to obtain a score corresponding to each character vector to be recognized;
inputting the score corresponding to each character vector to be recognized into a softmax layer to obtain a first named entity identification tag corresponding to each character vector to be recognized, wherein the first named entity identification tag comprises: and the class probability corresponding to the character vector to be recognized.
8. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the processors to implement the method of any of claims 1-5.
9. A computer-readable storage medium containing a computer program, on which the computer program is stored, characterized in that the program, when executed by one or more processors, implements the method of any one of claims 1-5.
CN202110771579.1A 2021-07-08 2021-07-08 Identification method, device, equipment and storage medium Active CN113392649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110771579.1A CN113392649B (en) 2021-07-08 2021-07-08 Identification method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113392649A CN113392649A (en) 2021-09-14
CN113392649B (en) 2023-04-07

Family

ID=77625464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110771579.1A Active CN113392649B (en) 2021-07-08 2021-07-08 Identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113392649B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113851190B (en) * 2021-11-01 2023-07-21 四川大学华西医院 Heterogeneous mRNA sequence optimization method

Citations (6)

Publication number Priority date Publication date Assignee Title
CN110297913A (en) * 2019-06-12 2019-10-01 中电科大数据研究院有限公司 Electronic government document entity extraction method
CN111241837A (en) * 2020-01-04 2020-06-05 大连理工大学 Theft case legal document named entity identification method based on anti-migration learning
CN111738004A (en) * 2020-06-16 2020-10-02 中国科学院计算技术研究所 Training method of named entity recognition model and named entity recognition method
CN111949762A (en) * 2020-07-09 2020-11-17 合肥工业大学 Method and system for context-based emotion dialogue, and storage medium
CN112183094A (en) * 2020-11-03 2021-01-05 北京信息科技大学 Chinese grammar debugging method and system based on multivariate text features
CN112733541A (en) * 2021-01-06 2021-04-30 重庆邮电大学 Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN110298042A (en) * 2019-06-26 2019-10-01 四川长虹电器股份有限公司 Film and television entity recognition method based on BiLSTM-CRF and knowledge graph
CN111753545A (en) * 2020-06-19 2020-10-09 科大讯飞(苏州)科技有限公司 Nested entity recognition method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN107783960B (en) Method, device and equipment for extracting information
CN109388793B (en) Entity marking method, intention identification method, corresponding device and computer storage medium
CN109657054B (en) Abstract generation method, device, server and storage medium
CN110377714A (en) Text matching technique, device, medium and equipment based on transfer learning
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
CN110597961B (en) Text category labeling method and device, electronic equipment and storage medium
US20180267956A1 (en) Identification of reading order text segments with a probabilistic language model
CN111160031A (en) Social media named entity identification method based on affix perception
CN111079432B (en) Text detection method and device, electronic equipment and storage medium
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
CN116955699B (en) Video cross-mode search model training method, searching method and device
CN114239566B (en) Method, device, processor and computer readable storage medium for realizing accurate detection of two-step Chinese event based on information enhancement
CN116127953B (en) Chinese spelling error correction method, device and medium based on contrast learning
CN115238690A (en) Military field composite named entity identification method based on BERT
CN112906380A (en) Method and device for identifying role in text, readable medium and electronic equipment
CN113947086A (en) Sample data generation method, training method, corpus generation method and apparatus
CN115983271A (en) Named entity recognition method and named entity recognition model training method
CN111739520A (en) Speech recognition model training method, speech recognition method and device
CN112214595A (en) Category determination method, device, equipment and medium
CN113392649B (en) Identification method, device, equipment and storage medium
CN114611520A (en) Text abstract generating method
CN114417809A (en) Entity alignment method based on combination of graph structure information and text semantic model
CN108268443B (en) Method and device for determining topic point transfer and acquiring reply text
CN111738791B (en) Text processing method, device, equipment and storage medium
CN116562291A (en) Chinese nested named entity recognition method based on boundary detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant