CN110069781B - Entity label identification method and related equipment

Info

Publication number: CN110069781B
Authority: CN (China)
Prior art keywords: target, semantic block, text, model, participle
Legal status: Active
Application number: CN201910335748.XA
Other languages: Chinese (zh)
Other versions: CN110069781A
Inventor: 赵知纬
Assignee (current and original): Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd; priority to CN201910335748.XA
Publication of application CN110069781A, followed by grant and publication of CN110069781B

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F40/00 Handling natural language data; G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F40/30 Semantic analysis

Abstract

The embodiment of the invention provides an entity label identification method and related equipment, which are used for identifying the entity labels of a text so as to better understand a user's search intention and improve the user experience. The method comprises the following steps: performing word segmentation on a target text to obtain a target participle (word-segment) set; vectorizing each participle in the target participle set; inputting each vectorized participle into a first preset model to obtain a probability value for each semantic block combination corresponding to the target text; determining a vector for each semantic block in the target semantic block combination; inputting the vector of each semantic block in the target semantic block combination into a second preset model to obtain a probability value for the entity label of each semantic block in the target semantic block combination; and determining, for each semantic block in the target semantic block combination, the entity label whose probability value reaches a second preset threshold as that semantic block's entity label.

Description

Entity label identification method and related equipment
Technical Field
The present invention relates to the field of natural language processing, and in particular, to a method and related device for identifying an entity tag.
Background
In the conventional entity recognition task, sequence labeling based on a Conditional Random Field (CRF) model is the most commonly used method. It generally proceeds by first creating one or more feature sequences from the character/word sequence of the input text, then defining a series of feature templates that specify which features and feature combinations to extract from those sequences, and finally feeding the extracted features into a CRF model to obtain a sequence of labels that encode entity boundary information and category information.
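As an illustration of this conventional pipeline, the sketch below shows hand-written feature templates applied per token; the template set and helper names are assumptions for illustration only, and the resulting feature dicts would then be fed to a CRF toolkit such as CRFsuite.

```python
# Hypothetical sketch of conventional CRF feature templates; the template
# choices below are illustrative, not taken from this disclosure.
def extract_features(tokens, i):
    prev_w = tokens[i - 1] if i > 0 else "<BOS>"
    next_w = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
    return {
        "w[0]": tokens[i],                        # unigram template: current word
        "w[-1]": prev_w,                          # previous word
        "w[+1]": next_w,                          # next word
        "w[-1]|w[0]": prev_w + "|" + tokens[i],   # bigram combination template
    }

sentence = ["I", "want", "listen", "flower", "all", "open", "good"]
features = [extract_features(sentence, i) for i in range(len(sentence))]
# A CRF decoder would map 'features' to one label per token, the labels
# jointly encoding entity boundaries and categories (e.g. B-MUSIC, I-MUSIC, O).
```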
In recent years, with the resurgence of neural networks, many Natural Language Processing (NLP) tasks have achieved better results by means of neural networks, entity recognition among them. In general-domain entity recognition tasks, most existing neural-network-based entity recognition methods are based on a Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), or their combination with a CRF.
However, since entities in the video domain have characteristics that general-domain entities lack, directly applying general-domain methods cannot effectively improve the recognition of video-domain entities.
Disclosure of Invention
The embodiment of the invention provides an entity label identification method and related equipment, which are used for identifying the entity labels of a text so as to better understand a user's search intention and improve the user experience.
A first aspect of an embodiment of the present invention provides a method for identifying an entity tag, including:
performing word segmentation on a target text to obtain a target participle set, wherein the target text is a text whose entity labels are to be identified;
vectorizing each participle in the target participle set;
inputting each vectorized participle into a first preset model to obtain a probability value for each semantic block combination corresponding to the target text, wherein the first preset model is obtained by training a first model with training data, the training data comprises the participle vectors corresponding to each text in a training text set, each text in the training text set is a text manually labeled with an entity label sequence, and each semantic block in each semantic block combination comprises at least one participle in the target participle set;
determining a vector of each semantic block in a target semantic block combination, wherein the target semantic block combination is the semantic block combination whose probability value reaches a first preset threshold among the semantic block combinations corresponding to the target text;
inputting the vector of each semantic block in the target semantic block combination into a second preset model to obtain a probability value for the entity label of each semantic block in the target semantic block combination, wherein the second preset model is obtained by training a second model with the vector of each semantic block in a first target semantic block combination output by the first model, the first target semantic block combination is the semantic block combination whose probability value reaches the first preset threshold among the semantic block combinations corresponding to each text, and each semantic block in the first target semantic block combination comprises at least one participle of the corresponding text;
and determining, for each semantic block in the target semantic block combination, the entity label whose probability value reaches a second preset threshold as that semantic block's entity label.
Optionally, the first model is a semi-Markov-based conditional random field model, the second model is a self-attention model, and the method further comprises:
performing word segmentation on each text to obtain the word segmentation of each text;
vectorizing each participle in the participle of each text to obtain a participle vector corresponding to the participle of each text;
iteratively updating the model parameters of the semi-Markov-based conditional random field model and the self-attention model based on the word segmentation vector corresponding to the word segmentation of each text;
when a preset iteration termination condition is reached, determining the semi-Markov-based conditional random field model at the time of iteration termination as the first preset model, and determining the self-attention model at the time of iteration termination as the second preset model.
Optionally, iteratively updating the model parameters of the semi-Markov-based conditional random field model and the self-attention model based on the participle vectors corresponding to the participles of each text includes:
step 1, selecting a word segmentation vector corresponding to a word segmentation of a target training text, and inputting the word segmentation vector into the semi-Markov-based conditional random field model to obtain probability values of different semantic block combinations corresponding to the target training text, wherein the target training text is any one text in each text;
step 2, inputting a vector of each semantic block in a target training semantic block combination into the self-attention model to obtain a probability value of an entity label of each semantic block in the target training semantic block combination, wherein the target training semantic block combination is a semantic block combination of which the probability value reaches a first preset threshold value in different semantic block combinations corresponding to the target training text;
step 3, adjusting model parameters of the semi-Markov-based conditional random field model and model parameters of the self-attention model based on the probability value of the target training semantic block combination and the target entity label probability value, wherein the target entity label probability value is the entity label probability value reaching the second preset threshold value in all the entity label probability values corresponding to each semantic block in the target training semantic block combination;
and step 4, iteratively executing step 1 to step 3 based on the semi-Markov-based conditional random field model after model-parameter adjustment and the self-attention model after model-parameter adjustment.
Optionally, selecting the participle vectors corresponding to the participles of the target training text, inputting them into the semi-Markov-based conditional random field model, and obtaining the probability values of the different semantic block combinations corresponding to the target training text includes:
calculating the word segmentation vectors corresponding to the word segmentation of the target training text through the following formula to obtain the probability values of different semantic block combinations corresponding to the target training text:
$$P(w'_0 w'_1 \ldots w'_n) = \frac{\exp\left(\sum_{i=0}^{n} G(w'_i) \cdot M_{\text{semi-crf}}\right)}{Z(w)}$$

wherein $w_0 w_1 \ldots w_m$ are the participle vectors of the target training text, m is the number of participles in the participle set of the target training text, $w'_0 w'_1 \ldots w'_n$ is one way of combining the participle vectors corresponding to the participle set of the target training text into semantic blocks, n is the number of semantic blocks obtained after the participle vectors of the target training text are combined, $P(w'_0 w'_1 \ldots w'_n)$ is the probability of segmenting the participle vectors of the target training text as $w'_0 w'_1 \ldots w'_n$, Z(w) sums over all combination ways of the participles in the participle set of the target training text, and $M_{\text{semi-crf}}$ is the model-parameter matrix of the semi-Markov-based conditional random field model,

$$M_{\text{semi-crf}} \in \mathbb{R}^{fd \times |L|},$$

where |L| is the number of entity-label types, fd is the dimension of each participle vector of the target training text, and G(·) is the feature function of the semi-Markov-based conditional random field model.
Optionally, the inputting the target training semantic block combination into the self-attention model to obtain an entity label probability value of each semantic block in the target training semantic block combination includes:
determining a target matrix based on the target training semantic block combination;
respectively calculating the matrix dot products of the target matrix and at least one preset parameter matrix to obtain at least one parameter matrix, wherein the at least one preset parameter matrix and the at least one parameter matrix correspond to one another;
decomposing the at least one parameter matrix to obtain the equal-width matrices corresponding to the at least one parameter matrix;
determining an attention matrix based on the equal-width matrices corresponding to the at least one parameter matrix;
calculating the attention matrix and an output parameter matrix to obtain the entity label probability value of each semantic block in the target training semantic block combination;
wherein the at least one preset parameter matrix and the output parameter matrix are both model parameters of the self-attention model, and the output parameter matrix encodes the number of entity-label types.
Optionally, the method further comprises:
judging whether the number of iterations reaches a preset value, and if so, determining that the preset iteration termination condition is met;
or,
judging whether the model parameters of the semi-Markov-based conditional random field model and/or the model parameters of the self-attention model have converged, and if so, determining that the preset iteration termination condition is met.
A second aspect of the embodiments of the present invention provides an apparatus for identifying an entity tag, including:
the word segmentation unit is used for segmenting a target text to obtain a target participle set, wherein the target text is a text whose entity labels are to be identified;
the vectorization processing unit is used for vectorizing each participle in the target participle set;
a processing unit, configured to input each participle in the vectorized target participle set into a first preset model to obtain the probability value of each semantic block combination corresponding to the target text, where the first preset model is obtained by training a first model with training data, the training data includes the participle vectors corresponding to each text in a training text set, each text in the training text set is a text manually labeled with an entity label sequence, and each semantic block in each semantic block combination includes at least one participle in the target participle set;
a determining unit, configured to determine a vector of each semantic block in a target semantic block combination, where the target semantic block combination is the semantic block combination whose probability value reaches a first preset threshold among the semantic block combinations corresponding to the target text;
the processing unit is further configured to input the vector of each semantic block in the target semantic block combination into a second preset model to obtain the entity label probability value of each semantic block in the target semantic block combination, where the second preset model is obtained by training a second model with the vector of each semantic block in a first target semantic block combination output by the first model, the first target semantic block combination is the semantic block combination whose probability value reaches the first preset threshold among the semantic block combinations corresponding to each text, and each semantic block in the first target semantic block combination includes at least one participle of the corresponding text;
the determining unit is further configured to determine, for each semantic block in the target semantic block combination, the entity label whose probability value reaches a second preset threshold as that semantic block's entity label.
Optionally, the first model is a semi-Markov-based conditional random field model, the second model is a self-attention model, and the apparatus further comprises:
a training unit to:
performing word segmentation on each text to obtain the word segmentation of each text;
vectorizing each participle in the participle of each text to obtain a participle vector corresponding to the participle of each text;
iteratively updating the model parameters of the semi-Markov-based conditional random field model and the self-attention model based on the word segmentation vector corresponding to the word segmentation of each text;
when a preset iteration termination condition is reached, determining the semi-Markov-based conditional random field model at the time of iteration termination as the first preset model, and determining the self-attention model at the time of iteration termination as the second preset model.
Optionally, iteratively updating, by the training unit, the model parameters of the semi-Markov-based conditional random field model and the self-attention model based on the participle vectors corresponding to the participles of each text includes:
step 1, selecting a word segmentation vector corresponding to a word segmentation of a target training text, and inputting the word segmentation vector into the semi-Markov-based conditional random field model to obtain probability values of different semantic block combinations corresponding to the target training text, wherein the target training text is any one text in each text;
step 2, inputting a vector of each semantic block in a target training semantic block combination into the self-attention model to obtain a probability value of an entity label of each semantic block in the target training semantic block combination, wherein the target training semantic block combination is a semantic block combination of which the probability value reaches a first preset threshold value in different semantic block combinations corresponding to the target training text;
step 3, adjusting model parameters of the semi-Markov-based conditional random field model and model parameters of the self-attention model based on the probability value of the target training semantic block combination and the target entity label probability value, wherein the target entity label probability value is the entity label probability value reaching the second preset threshold value in all the entity label probability values corresponding to each semantic block in the target training semantic block combination;
and step 4, iteratively executing step 1 to step 3 based on the semi-Markov-based conditional random field model after model-parameter adjustment and the self-attention model after model-parameter adjustment.
Optionally, selecting, by the training unit, the participle vectors corresponding to the participles of the target training text, inputting them into the semi-Markov-based conditional random field model, and obtaining the probability values of the different semantic block combinations corresponding to the target training text includes:
calculating the participle vectors corresponding to the participles of the target training text through the following formula to obtain the probability values of the different semantic block combinations corresponding to the target training text:
$$P(w'_0 w'_1 \ldots w'_n) = \frac{\exp\left(\sum_{i=0}^{n} G(w'_i) \cdot M_{\text{semi-crf}}\right)}{Z(w)}$$

wherein $w_0 w_1 \ldots w_m$ are the participle vectors of the target training text, m is the number of participles in the participle set of the target training text, $w'_0 w'_1 \ldots w'_n$ is one way of combining the participle vectors corresponding to the participle set of the target training text into semantic blocks, n is the number of semantic blocks obtained after the participle vectors of the target training text are combined, $P(w'_0 w'_1 \ldots w'_n)$ is the probability of segmenting the participle vectors of the target training text as $w'_0 w'_1 \ldots w'_n$, Z(w) sums over all combination ways of the participles in the participle set of the target training text, and $M_{\text{semi-crf}}$ is the model-parameter matrix of the semi-Markov-based conditional random field model,

$$M_{\text{semi-crf}} \in \mathbb{R}^{fd \times |L|},$$

where |L| is the number of entity-label types, fd is the dimension of each participle vector of the target training text, and G(·) is the feature function of the semi-Markov-based conditional random field model.
Optionally, the inputting, by the training unit, a target training semantic block combination into the self-attention model to obtain an entity label probability value of each semantic block in the target training semantic block combination includes:
determining a target matrix based on the target training semantic block combination;
respectively calculating the matrix dot products of the target matrix and at least one preset parameter matrix to obtain at least one parameter matrix, wherein the at least one preset parameter matrix and the at least one parameter matrix correspond to one another;
decomposing the at least one parameter matrix to obtain the equal-width matrices corresponding to the at least one parameter matrix;
determining an attention matrix based on the equal-width matrices corresponding to the at least one parameter matrix;
calculating the attention matrix and an output parameter matrix to obtain the entity label probability value of each semantic block in the target training semantic block combination;
wherein the at least one preset parameter matrix and the output parameter matrix are both model parameters of the self-attention model, and the output parameter matrix encodes the number of entity-label types.
Optionally, the training unit is further configured to:
judging whether the number of iterations reaches a preset value, and if so, determining that the preset iteration termination condition is met;
or,
judging whether the model parameters of the semi-Markov-based conditional random field model and/or the model parameters of the self-attention model have converged, and if so, determining that the preset iteration termination condition is met.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to execute the steps of the entity label identification method of the above aspects.
A fourth aspect of the embodiments of the present invention provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the entity label identification method of the above aspects.
In summary, it can be seen that in the embodiments provided by the present invention, the text whose entity labels are to be identified is recognized jointly by the pre-trained first preset model and second preset model. Because the first preset model is obtained by training the first model with the training data, the second preset model is obtained by training the second model with the output of the first model, and both models are obtained through extensive prior training, the user's search intention can be better understood and the user experience improved.
Drawings
Fig. 1 is a schematic flowchart of an identification method for an entity tag according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a training process of a first preset model and a second preset model according to an embodiment of the present invention;
fig. 3 is a schematic view of a virtual structure of an identification apparatus for entity tags according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a hardware structure of a server according to an embodiment of the present invention.
Detailed Description
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The method for identifying an entity tag in the embodiment of the present invention is described below from the perspective of an entity tag identification apparatus, which may be a server or a service unit in the server, and is not particularly limited.
Referring to fig. 1, fig. 1 is a schematic flow chart of an identification method of an entity tag according to an embodiment of the present invention, including:
101. Performing word segmentation on a target text to obtain a target participle set.
In this embodiment, the entity label identification device may first obtain a target text, where the target text is a text whose entity labels are to be identified, and then perform word segmentation on the target text to obtain a target participle set. How the target text is segmented is not specifically limited; for example, a word segmentation tool may be used, and other segmentation approaches are likewise possible.
102. Vectorizing each participle in the target participle set.
In this embodiment, the entity label identification device may vectorize each participle in the target participle set with a vectorization tool, for example a general-purpose tool such as word2vec or GloVe.
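As a concrete illustration of steps 101 and 102, the sketch below uses the jieba segmenter and a pre-trained gensim word2vec model; both tool choices and the model file path are assumptions, since the patent leaves the segmentation and vectorization tools open.

```python
# Sketch of steps 101-102 under assumed tooling: jieba for segmentation,
# a pre-trained gensim word2vec model for vectorization.
import jieba
from gensim.models import Word2Vec

target_text = "我想听花儿都开好了"  # "I want to listen to 'flower all open good'"
target_participles = list(jieba.cut(target_text))  # target participle set

w2v = Word2Vec.load("video_domain_w2v.model")      # hypothetical model file
participle_vectors = [w2v.wv[w] for w in target_participles if w in w2v.wv]
```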
103. Inputting each participle in the vectorized target participle set into a first preset model to obtain the probability value of each semantic block combination corresponding to the target text.
In this embodiment, the entity label identification device may pre-train a first preset model, where the first preset model is obtained by training a first model with training data; the training data comprises the participle vectors corresponding to each text in a training text set, and each text in the training text set is manually labeled with an entity label sequence. That is, the input of the first preset model is each vectorized participle of the target participle set, and its output is a probability value for each semantic block combination corresponding to the target text, where the semantic block combinations are the different ways of combining the participles in the target participle set, and each semantic block in each combination includes at least one participle of the target participle set. For example, if the target text is "I want to listen to 'flower all open good'" (the name of a song), the target participle set may be: "I", "want", "listen", "flower", "all", "open", "good" (this participle set is merely an example and does not represent a limitation). After vectorizing each participle and inputting the vectorized participles into the first preset model, a probability value is obtained for each semantic block combination, such as: "I" / "want" / "listen" / "flower all open good" (again merely an example, not a limitation).
104. Determining a vector for each semantic block in the target semantic block combination.
In this embodiment, after obtaining the probability value of each semantic block combination, the entity label identification device may take the semantic block combination whose probability value reaches a first preset threshold as the target semantic block combination and determine a vector for each semantic block in it. Because a semantic block in a semantic block combination may contain more than one participle, its vector must be constructed: it may be obtained by adding the vectors of the participles in the block, or in other ways, for example through a trained deep model; this is not specifically limited, as long as a vector is obtained for each semantic block.
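A minimal sketch of the summation option just described for building a semantic-block vector; the alternatives (averaging, a trained deep model) are equally permitted by the text.

```python
import numpy as np

def semantic_block_vector(participle_vectors):
    """Vector of a semantic block as the element-wise sum of the vectors of
    the participles it contains (one of the options mentioned above)."""
    return np.sum(participle_vectors, axis=0)
```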
105. Inputting the vector of each semantic block in the target semantic block combination into a second preset model to obtain the entity label probability values of each semantic block in the target semantic block combination.
In this embodiment, the entity label identification device may pre-train a second preset model, where the second preset model is obtained by training a second model with the vector of each semantic block in the first target semantic block combination output by the first model; the first target semantic block combination is the semantic block combination whose probability value reaches the first preset threshold among the semantic block combinations corresponding to each text, and each of its semantic blocks includes at least one participle of the corresponding text. That is, the input of the second preset model is built from the output of the first preset model, and the output of the second preset model is the probability value of the entity label of each semantic block. The entity label identification device may thus input the vector of each semantic block in the target semantic block combination into the second preset model to obtain the entity label probability values of each semantic block in the target semantic block combination.
106. Determining, for each semantic block in the target semantic block combination, the entity label whose probability value reaches a second preset threshold as that semantic block's entity label.
In this embodiment, after the entity label probability values of each semantic block in the target semantic block combination are obtained, the entity label whose probability value reaches the second preset threshold may be taken as the label of each semantic block, thereby yielding the entity label sequence of the target text.
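Read this way, step 106 is a per-block thresholded selection over the step-105 probabilities; a sketch under that reading, where falling back to a non-entity label when no probability reaches the threshold is an added assumption:

```python
import numpy as np

def select_entity_labels(label_probs, labels, second_threshold):
    """label_probs: (num_blocks, num_labels) matrix from the second preset
    model. Picks, per semantic block, the label whose probability reaches
    the second preset threshold (argmax as tie-breaker)."""
    tags = []
    for row in label_probs:
        best = int(np.argmax(row))
        # Fallback to "O" (non-entity) is an assumption, not from the patent.
        tags.append(labels[best] if row[best] >= second_threshold else "O")
    return tags  # the entity label sequence of the target text
```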
In summary, it can be seen that in the embodiments provided by the present invention, the text whose entity labels are to be identified is recognized jointly by the pre-trained first preset model and second preset model. Because the first preset model is obtained by training the first model with the training data, the second preset model is obtained by training the second model with the output of the first model, and both models are obtained through extensive prior training, the user's search intention can be better understood and the user experience improved.
It should be noted that here the first model is a semi-Markov-based conditional random field model and the second model is a self-attention model, but other models may also be used; this is not specifically limited.
The following describes the training procedure of the first preset model and the second preset model, taking as an example that the first model is a semi-Markov-based conditional random field model and the second model is a self-attention model.
Referring to fig. 2, fig. 2 is a schematic diagram of a training process of a first preset model and a second preset model according to an embodiment of the present invention, including:
201. Performing word segmentation on each text to obtain the participles of each text.
In this embodiment, the entity label identification device may first obtain each text in the training text set, where each text has been manually labeled with entity labels, and then perform word segmentation on each text; how each text is segmented is not specifically limited, for example, a word segmentation tool may be used to obtain the participles of each text.
For example, if "i want to listen to all flowers" is a certain text in the training text set, the manual label may exist as shown in table 1:
i am O
To be administered O
Listening device O
Flower all has good bloom MUSIC
TABLE 1
Column 1 of Table 1 lists the semantic blocks of the text "I want to listen to 'flower all open good'", and column 2 gives the entity identifier corresponding to each semantic block: the value MUSIC indicates that the semantic block is a song title, and the value O indicates that it is a non-entity. It should be noted that the table and the entity identifiers above are only examples and do not represent limitations; they may exist in other forms, and other entity label types (such as movie, animation, etc.) may be set according to the actual situation.
It should further be noted that the division into semantic blocks above uses only two kinds of identifiers, non-entity and entity; of course, richer labeling forms are possible, for example describing the part of speech of non-entity blocks, which is not specifically limited.
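One plausible in-memory representation of the Table 1 training sample (the structure is illustrative; the patent does not prescribe a storage format):

```python
# Hypothetical representation: each training text as a list of
# (semantic block, entity label) pairs, with "O" marking non-entity blocks.
training_sample = [
    ("我", "O"),               # "I"
    ("想", "O"),               # "want"
    ("听", "O"),               # "listen"
    ("花儿都开好了", "MUSIC"),  # the song title, labeled MUSIC
]
```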
202. Vectorizing each participle of each text to obtain the participle vectors corresponding to the participles of each text.
In this embodiment, each participle of each text may be vectorized with a vectorization tool to obtain the participle vectors corresponding to the participles of each text; for example, a general-purpose tool such as word2vec or GloVe may be used to obtain the vector of each participle in the participle set of each text.
203. Iteratively updating the model parameters of the semi-Markov-based conditional random field model and of the self-attention model based on the participle vectors corresponding to the participles of each text.
In this embodiment, after obtaining the participle vectors corresponding to the participles of each text, the entity label identification device may iteratively update the model parameters of the semi-Markov-based conditional random field model and of the self-attention model based on those participle vectors.
The following is a detailed description:
Step 1, selecting the participle vectors corresponding to the participles of a target training text and inputting them into the semi-Markov-based conditional random field model to obtain the probability values of the different semantic block combinations corresponding to the target training text, wherein the target training text is any one of the texts.
In this embodiment, one text may be selected arbitrarily from the texts and marked as the target training text, and the participle vectors corresponding to its participles are input into the semi-Markov-based conditional random field model to obtain the probability values of the different semantic block combinations corresponding to the target training text. That is, the participles in the participle set of the target training text can be combined in multiple different ways, and a probability value is computed for each combination. Specifically, the participle vectors corresponding to the participles of the target training text may be calculated by the following formula to obtain the probability values of the different semantic block combinations corresponding to the target training text:
$$P(w'_0 w'_1 \ldots w'_n) = \frac{\exp\left(\sum_{i=0}^{n} G(w'_i) \cdot M_{\text{semi-crf}}\right)}{Z(w)}$$

wherein $w_0 w_1 \ldots w_m$ are the participle vectors of the target training text, m is the number of participles in the participle set of the target training text, $w'_0 w'_1 \ldots w'_n$ is one way of combining the participle vectors corresponding to the participle set of the target training text into semantic blocks, n is the number of semantic blocks obtained after the participle vectors of the target training text are combined, and $P(w'_0 w'_1 \ldots w'_n)$ is the probability of segmenting the participle vectors of the target training text as $w'_0 w'_1 \ldots w'_n$. $M_{\text{semi-crf}}$ is the model-parameter matrix of the semi-Markov-based conditional random field model,

$$M_{\text{semi-crf}} \in \mathbb{R}^{fd \times |L|},$$

where |L| is the number of entity-label types, fd is the dimension of each participle vector of the target training text, and G(·) is the feature function of the semi-Markov-based conditional random field model. Z(w) ranges over all combination ways of the participles in the participle set of the target training text and is determined by the following formula:

$$Z(w) = \sum_{w'_0 w'_1 \ldots w'_k \,\in\, \mathcal{S}(w)} \exp\left(\sum_{i=0}^{k} G(w'_i) \cdot M_{\text{semi-crf}}\right),$$

where $\mathcal{S}(w)$ denotes the set of all possible segmentations of $w_0 w_1 \ldots w_m$.
for example, the target training text "i want to listen to all flowers and open up", the word segmentation set of the target training text is as follows: the method comprises the steps of determining word segmentation vectors of a target training text, then randomly selecting a combination mode of the words, wherein each combination mode corresponds to a group of semantic block combinations, such as the combination modes of 'I', 'want', 'listen', 'flower-all', calculating probability values of the combination modes of the words through formulas, traversing all the modes of word segmentation combinations in the word segmentation set of the target training text, and calculating the probability values of each combination mode through formulas.
Step 2, inputting the vector of each semantic block in the target training semantic block combination into the self-attention model to obtain the entity label probability values of each semantic block in the target training semantic block combination, wherein the target training semantic block combination is the semantic block combination whose probability value reaches the first preset threshold among the different semantic block combinations corresponding to the target training text.
In this embodiment, after the probability values of all combination ways of the participles of the target training text are obtained through the semi-Markov-based conditional random field model, the combination whose probability value reaches the first preset threshold may be selected and marked as the target training semantic block combination. A vector is then determined for each semantic block in that combination (how it is determined is not specifically limited; for example, the vectors of the participles in each block may be added, or the vector may be produced by a deep learning model). Finally, the vector of each semantic block in the target training semantic block combination is input into the self-attention model to obtain the entity label probability values of each of its semantic blocks.
In one embodiment, inputting the target training semantic block combination into the self-attention model to obtain the entity label probability value of each semantic block in the target training semantic block combination comprises:
determining a target matrix based on the target training semantic block combination;
respectively calculating the matrix dot products of the target matrix and at least one preset parameter matrix to obtain at least one parameter matrix, wherein the at least one preset parameter matrix and the at least one parameter matrix correspond to one another;
decomposing the at least one parameter matrix to obtain the equal-width matrices corresponding to the at least one parameter matrix;
determining an attention matrix based on the equal-width matrices corresponding to the at least one parameter matrix;
calculating the attention matrix and an output parameter matrix to obtain the entity label probability value of each semantic block in the target training semantic block combination;
wherein the at least one preset parameter matrix and the output parameter matrix are both model parameters of the self-attention model, and the output parameter matrix encodes the number of entity-label types.
The following illustrates this with an example.
Suppose the target training semantic block combination contains four semantic block vectors $s_1, s_2, s_3, s_4$. The four vectors are first concatenated into the matrix

$$S = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \end{bmatrix} \in \mathbb{R}^{4 \times d}, \quad s_i \in \mathbb{R}^{1 \times d},$$

where d is the dimension of each vector; the resulting matrix is marked as the target matrix. At least one preset parameter matrix is then defined (three are taken here as an example: a preset parameter matrix $W_Q$, a preset parameter matrix $W_K$, and a preset parameter matrix $W_V$, each in $\mathbb{R}^{d \times d}$; of course, other numbers of preset parameter matrices, for example one or two, may also be used, which is not specifically limited). The dot products of the target matrix with the three preset parameter matrices are computed respectively to obtain three parameter matrices: the parameter matrix $Q = S \cdot W_Q$, the parameter matrix $K = S \cdot W_K$, and the parameter matrix $V = S \cdot W_V$.
The parameter matrix Q is then decomposed into equal-width matrices $Q_i \in \mathbb{R}^{4 \times (d/h)}$, with h the number of equal-width blocks; in the same way the equal-width matrices $K_i$ of K and $V_i$ of V are obtained.
Next, the attention matrix $A_i$ is determined from the equal-width matrices $Q_i$ and $K_i$, i.e.

$$A_i = \operatorname{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d/h}}\right),$$

and the dot product between the attention matrix $A_i$ and $V_i$ is computed to obtain the parameter matrix $O_i = A_i \cdot V_i$. Finally an output parameter matrix $W_O$ is defined whose second dimension is $|L|$, the number of entity-label types; computing the dot product $O_i \cdot W_O$ yields the probability value of each semantic block in the target training semantic block combination on each entity label, i.e. the entity label probability values of each semantic block in the target training semantic block combination.
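The worked computation above maps directly onto a few lines of numpy; in this sketch the head count h, the √(d/h) scaling, and the final softmax over O·W_O are assumptions filled in from the standard multi-head self-attention formulation where the patent's figures are unreadable.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_label_probs(S, W_Q, W_K, W_V, W_O, h):
    """S: (n_blocks, d) target matrix of semantic-block vectors.
    W_Q/W_K/W_V: (d, d) preset parameter matrices; W_O: (d, num_labels).
    Assumes h divides d. Returns (n_blocks, num_labels) label probabilities."""
    n, d = S.shape
    Q, K, V = S @ W_Q, S @ W_K, S @ W_V
    dk = d // h                                  # width of each equal-width matrix
    heads = []
    for i in range(h):                           # decompose into equal-width matrices
        Qi = Q[:, i * dk:(i + 1) * dk]
        Ki = K[:, i * dk:(i + 1) * dk]
        Vi = V[:, i * dk:(i + 1) * dk]
        Ai = softmax(Qi @ Ki.T / np.sqrt(dk))    # attention matrix A_i
        heads.append(Ai @ Vi)                    # O_i = A_i . V_i
    O = np.concatenate(heads, axis=1)            # (n_blocks, d)
    return softmax(O @ W_O)                      # per-block entity label probabilities
```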
Step 3, adjusting the model parameters of the semi-Markov-based conditional random field model and of the self-attention model based on the probability value of the target training semantic block combination and the target entity label probability value.
In this embodiment, the model parameter of the semi-Markov-based conditional random field model is $M_{\text{semi-crf}}$, and the model parameters of the self-attention model are $W_Q$, $W_K$, $W_V$ and $W_O$. Because the correct division of the target training text is known, and the entity labels of the semantic blocks produced by that correct division are known, the model parameters of both models are adjusted according to the probability value of the target training semantic block combination and the target entity label probability value, so that the probability value output for the correct entity labels of the target training semantic block combination reaches or exceeds the second preset threshold (or is maximized). Here the target entity label probability value is the entity label probability value that reaches the second preset threshold among all the entity label probability values corresponding to each semantic block in the target training semantic block combination.
Step 4, iteratively executing step 1 to step 3 based on the semi-Markov-based conditional random field model after model-parameter adjustment and the self-attention model after model-parameter adjustment.
204. When a preset iteration termination condition is reached, determining the semi-Markov-based conditional random field model at iteration termination as the first preset model, and the self-attention model at iteration termination as the second preset model.
In this embodiment, each text may be trained through steps 1 to 4 until a preset iteration termination condition is reached; the semi-Markov-based conditional random field model at iteration termination is then determined as the first preset model, and the self-attention model at iteration termination as the second preset model.
It should be noted that whether the iteration termination condition is reached may be determined as follows:
judging whether the number of iterations reaches a preset value, and if so, determining that the preset iteration termination condition is met;
or,
judging whether the model parameters of the semi-Markov-based conditional random field model and/or the model parameters of the self-attention model have converged, and if so, determining that the preset iteration termination condition is met. That is, after each iteration it may be judged whether the number of iterations has reached a preset value (for example, 1000), or whether the model parameters of the semi-Markov-based conditional random field model and/or of the self-attention model have converged; if so, the preset iteration termination condition is satisfied, and if not, it is not satisfied.
It should also be noted that convergence of the model parameters of the semi-Markov-based conditional random field model and/or of the self-attention model may be judged by computing the loss through a back-propagation step after each iteration completes. For example, with $P(w'_0 w'_1 w'_2 w'_3)$ the probability value of the target training semantic block combination and $P(L_0, L_1, L_2, L_3)$ the target entity label probability value, the value

$$-\log P(w'_0 w'_1 w'_2 w'_3) - \log P(L_0, L_1, L_2, L_3)$$

is calculated; if this calculated value converges, it is determined that the preset iteration termination condition is satisfied.
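A sketch of the joint objective and termination check just described; the iteration cap and convergence tolerance are assumed values, and loss_history is a hypothetical record of per-iteration losses.

```python
import numpy as np

def joint_loss(seg_prob, gold_label_probs):
    """-log P(w'_0 w'_1 ...) - log P(L_0, L_1, ...), as in the formula above."""
    return -np.log(seg_prob) - np.sum(np.log(gold_label_probs))

def iteration_should_stop(loss_history, max_iters=1000, tol=1e-6):
    """Preset termination: iteration cap reached, or the loss has converged."""
    if len(loss_history) >= max_iters:
        return True
    return (len(loss_history) >= 2
            and abs(loss_history[-1] - loss_history[-2]) < tol)
```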
In summary, in the embodiments provided by the present invention, the semantic blocks of the training text set are obtained through the semi-Markov-based conditional random field model, which helps the self-attention model better determine the entity labels of the semantic blocks, thereby improving the accuracy of entity label sequence identification.
It should be noted that the above describes training the first preset model and the second preset model jointly, but they may also be trained separately. When trained separately, the output values of the first preset model may be saved at each step; when the iteration count of the first preset model reaches a preset value or its model parameters converge, all saved output values of the first preset model are input to the second preset model for training. The details are not limited here, as long as the training of both models can be completed.
It should further be noted that, while the first preset model and the second preset model are in use, their model parameters may also be adjusted based on each output; the details are not limited.
In the above, the entity label identification method according to the embodiment of the present invention has been explained; the entity label identification apparatus according to the embodiment of the present invention is explained below with reference to fig. 3.
Referring to fig. 3, fig. 3 is a schematic view of a virtual structure of an identification apparatus for an entity tag according to an embodiment of the present invention, where the identification apparatus for an entity tag includes:
the word segmentation unit 301 is configured to perform word segmentation on a target text to obtain a target participle set, where the target text is a text whose entity labels are to be identified;
a vectorization processing unit 302, configured to perform vectorization processing on each participle in the target participle set;
a processing unit 303, configured to input each participle in the vectorized target participle set into a first preset model to obtain the probability value of each semantic block combination corresponding to the target text, where the first preset model is obtained by training a first model with training data, the training data includes the participle vectors corresponding to each text in a training text set, each text in the training text set is a text manually labeled with an entity label sequence, and each semantic block in each semantic block combination includes at least one participle in the target participle set;
a determining unit 304, configured to determine a vector of each semantic block in a target semantic block combination, where the target semantic block combination is the semantic block combination whose probability value reaches a first preset threshold among the semantic block combinations corresponding to the target text;
the processing unit 303 is further configured to input the vector of each semantic block in the target semantic block combination into a second preset model to obtain the entity label probability value of each semantic block in the target semantic block combination, where the second preset model is obtained by training a second model with the vector of each semantic block in a first target semantic block combination output by the first model, the first target semantic block combination is the semantic block combination whose probability value reaches the first preset threshold among the semantic block combinations corresponding to each text, and each semantic block in the first target semantic block combination includes at least one participle of the corresponding text;
the determining unit 304 is further configured to determine, for each semantic block in the target semantic block combination, the entity label whose probability value reaches a second preset threshold as that semantic block's entity label.
Optionally, the first model is a semi-Markov-based conditional random field model, the second model is a self-attention model, and the apparatus further comprises:
a training unit 305, the training unit 305 to:
performing word segmentation on each text to obtain the word segmentation of each text;
vectorizing each participle in the participle of each text to obtain a participle vector corresponding to the participle of each text;
iteratively updating the model parameters of the semi-Markov-based conditional random field model and the self-attention model based on the word segmentation vector corresponding to the word segmentation of each text;
when a preset iteration termination condition is reached, determining the semi-Markov-based conditional random field model at the time of iteration termination as the first preset model, and determining the self-attention model at the time of iteration termination as the second preset model.
Optionally, iteratively updating, by the training unit 305, the model parameters of the semi-Markov-based conditional random field model and the self-attention model based on the participle vectors corresponding to the participles of each text includes:
step 1, selecting a word segmentation vector corresponding to a word segmentation of a target training text, and inputting the word segmentation vector into the semi-Markov-based conditional random field model to obtain probability values of different semantic block combinations corresponding to the target training text, wherein the target training text is any one text in each text;
step 2, inputting a vector of each semantic block in a target training semantic block combination into the self-attention model to obtain a probability value of an entity label of each semantic block in the target training semantic block combination, wherein the target training semantic block combination is a semantic block combination of which the probability value reaches a first preset threshold value in different semantic block combinations corresponding to the target training text;
step 3, adjusting model parameters of the semi-Markov-based conditional random field model and model parameters of the self-attention model based on the probability value of the target training semantic block combination and the target entity label probability value, wherein the target entity label probability value is the entity label probability value reaching the second preset threshold value in all the entity label probability values corresponding to each semantic block in the target training semantic block combination;
and step 4, iteratively executing step 1 to step 3 based on the semi-Markov-based conditional random field model after model-parameter adjustment and the self-attention model after model-parameter adjustment.
Optionally, selecting, by the training unit 305, the participle vectors corresponding to the participles of the target training text, inputting them into the semi-Markov-based conditional random field model, and obtaining the probability values of the different semantic block combinations corresponding to the target training text includes:
calculating the participle vectors corresponding to the participles of the target training text through the following formula to obtain the probability values of the different semantic block combinations corresponding to the target training text:
P(w'_0 w'_1 … w'_n) = exp( G(w'_0 w'_1 … w'_n; M_semi-crf) ) / Z(w)

wherein w_0 w_1 … w_m is the word segmentation vector sequence of the target training text, m is the number of participles in the participle set of the target training text, w'_0 w'_1 … w'_n is one combination mode of the participle vectors corresponding to the participle set of the target training text, n is the number of semantic blocks obtained after the participle vectors of the target training text are combined, P(w'_0 w'_1 … w'_n) is the probability that the participle vectors of the target training text are combined as w'_0 w'_1 … w'_n, Z(w) is the normalization term summed over all combination modes of the participles in the participle set of the target training text, and M_semi-crf is the model parameter matrix of the semi-Markov-based conditional random field model, with

M_semi-crf ∈ R^(|L| × fd)

where |L| is the number of types of entity labels, fd is the dimension of each participle vector of the target training text, and G(·) is the feature function of the semi-Markov-based conditional random field model.
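A minimal sketch of this normalized scoring follows, enumerating every segmentation by brute force rather than with the dynamic program a real semi-CRF would use; score_fn is a hypothetical stand-in for G(·; M_semi-crf):

import math
from itertools import combinations

def segmentation_probs(word_vecs, score_fn):
    """Score every way of cutting the participle sequence into contiguous
    semantic blocks and normalize exp(G(...)) by Z(w). Assumes a
    non-empty word_vecs list; score_fn returns a real-valued score."""
    m = len(word_vecs)
    segmentations = []
    for k in range(m):                       # k cut points between participles
        for cuts in combinations(range(1, m), k):
            bounds = [0, *cuts, m]
            blocks = [word_vecs[a:b] for a, b in zip(bounds, bounds[1:])]
            segmentations.append(blocks)
    scores = [math.exp(score_fn(blocks)) for blocks in segmentations]
    z = sum(scores)                          # Z(w): all combination modes
    return [(blocks, s / z) for blocks, s in zip(segmentations, scores)]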
Optionally, the inputting, by the training unit 305, a target training semantic block combination into the self-attention model to obtain an entity label probability value of each semantic block in the target training semantic block combination includes:
determining a target matrix based on the target training semantic block combination;
respectively calculating matrix dot products of the target matrix and at least one preset parameter matrix to obtain at least one parameter matrix, wherein the at least one preset parameter matrix and the at least one parameter matrix are associated with each other;
decomposing the at least one parameter matrix to obtain an equal-width matrix corresponding to the at least one parameter matrix;
determining an attention matrix based on the equal-width matrix corresponding to the at least one parameter matrix;
calculating the attention matrix and an output parameter matrix to obtain an entity label probability value of each semantic block in the target training semantic block combination;
wherein the at least one parameter matrix and the output parameter matrix are both model parameters of the self-attention model, and the output parameter matrix includes the number of types of entity labels.
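Read as standard multi-head self-attention, the operations above might look like the following sketch; the Q/K/V interpretation of the preset parameter matrices, the per-head split as the "equal-width matrices," and all names are assumptions rather than the patent's definitive construction:

import numpy as np

def attention_label_probs(block_vecs, Wq, Wk, Wv, Wo, n_heads):
    """block_vecs: (n_blocks, d) target matrix built from the semantic
    blocks; Wq/Wk/Wv are the preset parameter matrices and Wo the output
    parameter matrix with |L| columns. Assumes the width divides n_heads."""
    Q, K, V = block_vecs @ Wq, block_vecs @ Wk, block_vecs @ Wv
    heads = []
    # decompose each parameter matrix into equal-width per-head slices
    for q, k, v in zip(np.split(Q, n_heads, axis=1),
                       np.split(K, n_heads, axis=1),
                       np.split(V, n_heads, axis=1)):
        att = np.exp(q @ k.T / np.sqrt(q.shape[1]))   # scaled dot products
        att /= att.sum(axis=1, keepdims=True)         # row-wise softmax
        heads.append(att @ v)
    attn = np.concatenate(heads, axis=1)              # the attention matrix
    logits = attn @ Wo                                # (n_blocks, |L|)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)   # label probabilities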
Optionally, the training unit 305 is further configured to:
judging whether the number of iterations reaches a preset value, and if so, determining that the preset iteration termination condition is met;
or,
and judging whether the model parameters of the semi-Markov-based conditional random field model and/or the model parameters of the self-attention model have converged, and if so, determining that the preset iteration termination condition is met; both branches are sketched below.
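Both termination branches reduce to a check of roughly the following shape; the tolerance value and the flat-parameter representation are assumptions for illustration:

def should_stop(iteration, max_iters, old_params, new_params, tol=1e-6):
    """True when either termination branch holds: the iteration count
    reaches the preset value, or the largest parameter change falls
    below a convergence tolerance."""
    if iteration >= max_iters:
        return True
    drift = max(abs(o - n) for o, n in zip(old_params, new_params))
    return drift < tol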
The interaction manner between the units of the identification apparatus of the entity tag in this embodiment is as described in the embodiments shown in fig. 1 and fig. 2, and details thereof are not repeated here.
In summary, it can be seen that, in the embodiments provided by the present invention, the text whose entity labels are to be recognized is recognized jointly by the pre-trained first preset model and second preset model. Because the first preset model is obtained by training the first model on the training data, the second preset model is obtained by training the second model on the output of the first preset model, and both models are obtained through extensive training in advance, the search intention of the user can be better understood and the user experience is improved.
Fig. 3 above describes the identification apparatus for entity tags in the embodiment of the present invention from the perspective of modular functional entities. The following describes the identification apparatus for entity tags in the embodiment of the present invention in detail from the perspective of hardware processing. Referring to fig. 4, an embodiment of the identification apparatus 400 for entity tags in the embodiment of the present invention includes:
an input device 401, an output device 402, a processor 403 and a memory 404 (wherein the number of the processor 403 may be one or more, and one processor 403 is taken as an example in fig. 4). In some embodiments of the present invention, the input device 401, the output device 402, the processor 403 and the memory 404 may be connected by a bus or other means, wherein the connection by the bus is exemplified in fig. 4.
By calling the operation instructions stored in the memory 404, the processor 403 is configured to execute the following steps:
performing word segmentation on a target text to obtain a target word segmentation set, wherein the target text is a text of an entity label to be identified;
vectorizing each participle in the target participle set;
inputting each vectorized participle into a first preset model to obtain a probability value of each semantic block combination corresponding to the target text, wherein the first preset model is obtained by training the first model on training data, the training data comprises vectors of the participles corresponding to each text in a training text set, each text in the training text set is a text manually labeled with an entity label sequence, and each semantic block in each semantic block combination comprises at least one participle in the target participle set;
determining a vector of each semantic block in a target semantic block combination, wherein the target semantic block combination is a semantic block combination of which the probability value in each semantic block combination corresponding to the target text reaches a first preset threshold value;
inputting a vector of each semantic block in the target semantic block combination into a second preset model to obtain a probability value of an entity label of each semantic block in the target semantic block combination, wherein the second preset model is obtained by training the second model on the vector of each semantic block in a first target semantic block combination output by the first preset model, the first target semantic block combination is a semantic block combination whose probability value reaches the first preset threshold value among the semantic block combinations corresponding to each text, and each semantic block in the first target semantic block combination comprises at least one participle of the corresponding text;
and determining the entity label of which the probability value of each semantic block in the target semantic block combination reaches a second preset threshold value as the entity label of each semantic block in the target semantic block combination.
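Taken together, the processor's steps amount to the following end-to-end sketch; tokenize, vectorize, both model interfaces, and the assumption that some combination reaches the first threshold are all illustrative, not the patent's fixed API:

def recognize_entity_labels(text, tokenize, vectorize,
                            first_model, second_model, t1, t2):
    """Segment the text, vectorize the participles, pick the semantic-block
    combination whose probability reaches t1, then keep, per block, the
    entity label whose probability reaches t2 (None if none does)."""
    participles = tokenize(text)                    # target participle set
    vecs = [vectorize(p) for p in participles]
    combos, probs = first_model(vecs)               # combination probabilities
    target = next(c for c, p in zip(combos, probs) if p >= t1)
    label_probs = second_model(target)              # rows of |L| probabilities
    labels = []
    for row in label_probs:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(best if row[best] >= t2 else None)
    return list(zip(target, labels))                # (block, label) pairs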
The processor 403 is also configured to execute any of the embodiments corresponding to fig. 1 and fig. 2 by calling the operation instructions stored in the memory 404.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
An embodiment of the present invention further provides a storage medium, on which a program is stored, and the program, when executed by a processor, implements the method for identifying the entity tag.
The embodiment of the invention also provides a processor, which is used for running the program, wherein the identification method of the entity label is executed when the program runs.
An embodiment of the present invention further provides an apparatus, where the apparatus includes a processor, a memory, and a program that is stored in the memory and is executable on the processor, and the processor implements the following steps when executing the program:
performing word segmentation on a target text to obtain a target word segmentation set, wherein the target text is a text of an entity label to be identified;
vectorizing each participle in the target participle set;
inputting each vectorized participle into a first preset model to obtain a probability value of each semantic block combination corresponding to the target text, wherein the first preset model is obtained by training the first model on training data, the training data comprises vectors of the participles corresponding to each text in a training text set, each text in the training text set is a text manually labeled with an entity label sequence, and each semantic block in each semantic block combination comprises at least one participle in the target participle set;
determining a vector of each semantic block in a target semantic block combination, wherein the target semantic block combination is a semantic block combination of which the probability value reaches a first preset threshold value in each semantic block combination corresponding to the target text;
inputting a vector of each semantic block in the target semantic block combination into a second preset model to obtain a probability value of an entity label of each semantic block in the target semantic block combination, wherein the second preset model is obtained by training the second model on the vector of each semantic block in a first target semantic block combination output by the first preset model, the first target semantic block combination is a semantic block combination whose probability value reaches the first preset threshold value among the semantic block combinations corresponding to each text, and each semantic block in the first target semantic block combination comprises at least one participle of the corresponding text;
and determining the entity label of which the probability value of each semantic block in the target semantic block combination reaches a second preset threshold value as the entity label of each semantic block in the target semantic block combination.
In a specific implementation process, when the processor executes the program, any one of the embodiments corresponding to fig. 1 and fig. 2 may be implemented.
The device herein may be a server, a PC, a tablet computer (PAD), a mobile phone, or the like.
The invention also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
performing word segmentation on a target text to obtain a target word segmentation set, wherein the target text is a text of an entity label to be identified;
vectorizing each participle in the target participle set;
inputting each vectorized participle into a first preset model to obtain a probability value of each semantic block combination corresponding to the target text, wherein the first preset model is obtained by training the first model on training data, the training data comprises vectors of the participles corresponding to each text in a training text set, each text in the training text set is a text manually labeled with an entity label sequence, and each semantic block in each semantic block combination comprises at least one participle in the target participle set;
determining a vector of each semantic block in a target semantic block combination, wherein the target semantic block combination is a semantic block combination of which the probability value in each semantic block combination corresponding to the target text reaches a first preset threshold value;
inputting a vector of each semantic block in the target semantic block combination into a second preset model to obtain a probability value of an entity label of each semantic block in the target semantic block combination, wherein the second preset model is obtained by training the second model on the vector of each semantic block in a first target semantic block combination output by the first preset model, the first target semantic block combination is a semantic block combination whose probability value reaches the first preset threshold value among the semantic block combinations corresponding to each text, and each semantic block in the first target semantic block combination comprises at least one participle of the corresponding text;
and determining the entity label of which the probability value of each semantic block in the target semantic block combination reaches a second preset threshold value as the entity label of each semantic block in the target semantic block combination.
In the detailed implementation process, any of the embodiments corresponding to fig. 1 and fig. 2 may be implemented when the computer program product is executed.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus comprising the element.
The above description is only an example of the present invention and is not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. An identification method of an entity tag, comprising:
performing word segmentation on a target text to obtain a target word segmentation set, wherein the target text is a text of an entity label to be identified;
vectorizing each participle in the target participle set;
inputting each vectorized participle into a first preset model to obtain a probability value of each semantic block combination corresponding to the target text, wherein the first preset model is obtained by training the first model on training data, the training data comprises vectors of the participles corresponding to each text in a training text set, each text in the training text set is a text manually labeled with an entity label sequence, and each semantic block in each semantic block combination comprises at least one participle in the target participle set;
determining a vector of each semantic block in a target semantic block combination, wherein the target semantic block combination is a semantic block combination of which the probability value in each semantic block combination corresponding to the target text reaches a first preset threshold value;
inputting a vector of each semantic block in the target semantic block combination into a second preset model to obtain a probability value of an entity label of each semantic block in the target semantic block combination, wherein the second preset model is obtained by training the second model on the vector of each semantic block in a first target semantic block combination output by the first preset model, the first target semantic block combination is a semantic block combination whose probability value reaches the first preset threshold value among the semantic block combinations corresponding to each text, and each semantic block in the first target semantic block combination comprises at least one participle of the corresponding text;
and determining the entity label of each semantic block in the target semantic block combination, wherein the probability value of each semantic block in the target semantic block combination reaches a second preset threshold value, as the entity label of each semantic block in the target semantic block combination.
2. The method of claim 1, wherein the first model is a semi-Markov-based conditional random field model and the second model is a self-attention model, the method further comprising:
performing word segmentation on each text to obtain the word segmentation of each text;
vectorizing each participle in the participle of each text to obtain a participle vector corresponding to the participle of each text;
iteratively updating model parameters of the semi-Markov-based conditional random field model and the self-attention model based on a participle vector corresponding to a participle of each text;
when a preset iteration termination condition is reached, determining the semi-Markov-based conditional random field model at the time of iteration termination as the first preset model, and determining the self-attention model at the time of iteration termination as the second preset model.
3. The method of claim 2, wherein the iteratively updating the model parameters of the semi-Markov-based conditional random field model and the self-attention model based on the participle vector corresponding to the participle of each text comprises:
step 1, selecting a word segmentation vector corresponding to a word segmentation of a target training text, and inputting the word segmentation vector into the semi-Markov-based conditional random field model to obtain probability values of different semantic block combinations corresponding to the target training text, wherein the target training text is any one text in each text;
step 2, inputting a vector of each semantic block in a target training semantic block combination into the self-attention model to obtain a probability value of an entity label of each semantic block in the target training semantic block combination, wherein the target training semantic block combination is a semantic block combination of which the probability value reaches a first preset threshold value in different semantic block combinations corresponding to the target training text;
step 3, adjusting model parameters of the semi-Markov-based conditional random field model and model parameters of the self-attention model based on the probability value of the target training semantic block combination and the target entity label probability value, wherein the target entity label probability value is the entity label probability value reaching the second preset threshold value in all the entity label probability values corresponding to each semantic block in the target training semantic block combination;
and step 4, iteratively executing steps 1 to 3 based on the semi-Markov-based conditional random field model after model parameter adjustment and the self-attention model after model parameter adjustment.
4. The method of claim 3, wherein the selecting the word segmentation vector corresponding to the word segmentation of the target training text and inputting it into the semi-Markov-based conditional random field model to obtain the probability values of different semantic block combinations corresponding to the target training text comprises:
calculating the word segmentation vectors corresponding to the word segmentation of the target training text through the following formula to obtain the probability values of different semantic block combinations corresponding to the target training text:
P(w'_0 w'_1 … w'_n) = exp( G(w'_0 w'_1 … w'_n; M_semi-crf) ) / Z(w)

wherein w_0 w_1 … w_m is the word segmentation vector sequence of the target training text, m is the number of participles in the participle set of the target training text, w'_0 w'_1 … w'_n is one combination mode of the participle vectors corresponding to the participle set of the target training text, n is the number of semantic blocks obtained after the participle vectors of the target training text are combined, P(w'_0 w'_1 … w'_n) is the probability that the participle vectors of the target training text are combined as w'_0 w'_1 … w'_n, Z(w) is the normalization term summed over all combination modes of the participles in the participle set of the target training text, and M_semi-crf is the model parameter matrix of the semi-Markov-based conditional random field model, with

M_semi-crf ∈ R^(|L| × fd)

where |L| is the number of types of entity labels, fd is the dimension of each participle vector of the target training text, and G(·) is the feature function of the semi-Markov-based conditional random field model.
5. The method of claim 3, wherein inputting a target training semantic block combination into the self-attention model to obtain an entity tag probability value for each semantic block in the target training semantic block combination comprises:
determining a target matrix based on the target training semantic block combination;
respectively calculating matrix dot products of the target matrix and at least one preset parameter matrix to obtain at least one parameter matrix, wherein the at least one preset parameter matrix and the at least one parameter matrix are associated with each other;
decomposing the at least one parameter matrix to obtain an equal-width matrix corresponding to the at least one parameter matrix;
determining an attention matrix based on an equal-width matrix corresponding to the at least one parameter matrix;
calculating the attention matrix and the output parameter matrix to obtain an entity label probability value of each semantic block in the target training semantic block combination;
wherein the at least one parameter matrix and the output parameter matrix are both model parameters of the self-attention model, and the output parameter matrix includes the number of types of entity labels.
6. The method according to any one of claims 2 to 5, further comprising:
judging whether the number of iterations reaches a preset value, and if so, determining that the preset iteration termination condition is met;
or,
and judging whether the model parameters of the semi-Markov-based conditional random field model and/or the model parameters of the self-attention model have converged, and if so, determining that the preset iteration termination condition is met.
7. An apparatus for identifying physical tags, comprising:
the word segmentation unit is used for segmenting a target text to obtain a target word segmentation set, wherein the target text is a text of an entity label to be identified;
the vectorization processing unit is used for vectorizing each participle in the target participle set;
a processing unit, configured to input each participle in the vectorized target participle set into a first preset model to obtain a probability value of each semantic block combination corresponding to the target text, wherein the first preset model is obtained by training the first model on training data, the training data comprises vectors of the participles corresponding to each text in a training text set, each text in the training text set is a text manually labeled with an entity label sequence, and each semantic block in each semantic block combination comprises at least one participle in the target participle set;
the determining unit is used for determining a vector of each semantic block in a target semantic block combination, wherein the target semantic block combination is a semantic block combination of which the probability value reaches a first preset threshold value in each semantic block combination corresponding to the target text;
the processing unit is further configured to input a vector of each semantic block in the target semantic block combination into a second preset model to obtain a probability value of an entity label of each semantic block in the target semantic block combination, wherein the second preset model is obtained by training the second model on the vector of each semantic block in a first target semantic block combination output by the first preset model, the first target semantic block combination is a semantic block combination whose probability value reaches the first preset threshold value among the semantic block combinations corresponding to each text, and each semantic block in the first target semantic block combination comprises at least one participle of the corresponding text;
the determining unit is further configured to determine, as the entity tag of each semantic block in the target semantic block combination, the entity tag of which the probability value of each semantic block in the target semantic block combination reaches a second preset threshold.
8. The apparatus of claim 7, wherein the first model is a semi-Markov-based conditional random field model and the second model is a self-attention model, the apparatus further comprising:
a training unit to:
performing word segmentation on each text to obtain the word segmentation of each text;
vectorizing each participle in the participle of each text to obtain a participle vector corresponding to the participle of each text;
iteratively updating the model parameters of the semi-Markov-based conditional random field model and the self-attention model based on the word segmentation vector corresponding to the word segmentation of each text;
and when a preset iteration termination condition is reached, determining the semi-Markov-based conditional random field model at the iteration termination as the first preset model, and determining the self-attention model at the iteration termination as the second preset model.
9. The apparatus of claim 8, wherein the iteratively updating, by the training unit, the model parameters of the semi-Markov-based conditional random field model and the self-attention model based on the participle vector corresponding to the participle of each text comprises:
step 1, selecting a word segmentation vector corresponding to a word segmentation of a target training text, and inputting the word segmentation vector into the semi-Markov-based conditional random field model to obtain probability values of different semantic block combinations corresponding to the target training text, wherein the target training text is any one text in each text;
step 2, inputting a vector of each semantic block in a target training semantic block combination into the self-attention model to obtain a probability value of an entity label of each semantic block in the target training semantic block combination, wherein the target training semantic block combination is a semantic block combination of which the probability value reaches a first preset threshold value in different semantic block combinations corresponding to the target training text;
step 3, adjusting model parameters of the semi-Markov-based conditional random field model and model parameters of the self-attention model based on the probability value of the target training semantic block combination and the target entity label probability value, wherein the target entity label probability value is the entity label probability value reaching the second preset threshold value in all the entity label probability values corresponding to each semantic block in the target training semantic block combination;
and step 4, iteratively executing steps 1 to 3 based on the semi-Markov-based conditional random field model after model parameter adjustment and the self-attention model after model parameter adjustment.
10. The apparatus of claim 9, wherein the selecting, by the training unit, the word segmentation vector corresponding to the word segmentation of the target training text and inputting it into the semi-Markov-based conditional random field model to obtain the probability values of different semantic block combinations corresponding to the target training text comprises:
calculating the word segmentation vectors corresponding to the word segmentation of the target training text through the following formula to obtain the probability values of different semantic block combinations corresponding to the target training text:
P(w'_0 w'_1 … w'_n) = exp( G(w'_0 w'_1 … w'_n; M_semi-crf) ) / Z(w)

wherein w_0 w_1 … w_m is the word segmentation vector sequence of the target training text, m is the number of participles in the participle set of the target training text, w'_0 w'_1 … w'_n is one combination mode of the participle vectors corresponding to the participle set of the target training text, n is the number of semantic blocks obtained after the participle vectors of the target training text are combined, P(w'_0 w'_1 … w'_n) is the probability that the participle vectors of the target training text are combined as w'_0 w'_1 … w'_n, Z(w) is the normalization term summed over all combination modes of the participles in the participle set of the target training text, and M_semi-crf is the model parameter matrix of the semi-Markov-based conditional random field model, with

M_semi-crf ∈ R^(|L| × fd)

where |L| is the number of types of entity labels, fd is the dimension of each participle vector of the target training text, and G(·) is the feature function of the semi-Markov-based conditional random field model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910335748.XA CN110069781B (en) 2019-04-24 2019-04-24 Entity label identification method and related equipment

Publications (2)

Publication Number Publication Date
CN110069781A CN110069781A (en) 2019-07-30
CN110069781B true CN110069781B (en) 2022-11-18

Family

ID=67368792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910335748.XA Active CN110069781B (en) 2019-04-24 2019-04-24 Entity label identification method and related equipment

Country Status (1)

Country Link
CN (1) CN110069781B (en)

Also Published As

Publication number Publication date
CN110069781A (en) 2019-07-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant