WO2021238337A1 - Method and device for entity tagging - Google Patents

Method and device for entity tagging

Info

Publication number
WO2021238337A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
entity
sample
word
mask
Application number
PCT/CN2021/080402
Other languages
French (fr)
Chinese (zh)
Inventor
孟函可
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021238337A1 publication Critical patent/WO2021238337A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Definitions

  • This application relates to the field of artificial intelligence (AI), and more specifically to methods and devices for entity annotation in the AI field.
  • AI: artificial intelligence
  • NER: named entity recognition
  • NLP: natural language processing
  • NER can identify entities such as person names, place names, organization names, date and time, etc., so that the identified entities can be used for information extraction, information retrieval, syntactic analysis, semantic role labeling, etc.
  • the input sentence can be input to the sequence labeling model to output the label of each word.
  • a sequence labeling model trained on a specific corpus can only be applied to specific input sentences. For example, if the training sentences in the sample set used to train the sequence labeling model contain movie corpus, an input sentence must contain movies for its labels to be predicted; if the input sentence contains both movies and TV shows, only the movies can be predicted and the TV shows cannot. If input sentences of multiple different corpora need to be predicted, multiple sequence labeling models for different corpora or different corpus combinations need to be trained, which leads to high complexity. And in order to predict the labels of an input sentence, multiple sequence labeling models need to be run concurrently and the sequence labeling model suitable for the input sentence needs to be matched among them, resulting in a large amount of calculation and high complexity.
  • the embodiments of the present application provide a method and device for entity labeling, which can reduce complexity and help improve the performance of entity labeling.
  • a method for entity labeling is provided.
  • the method can be executed by a processor or a processing module.
  • the method includes: determining N mask vectors of N sample sets, where the N sample sets correspond to the N mask vectors one-to-one, the entity corpora corresponding to different sample sets among the N sample sets are different, each of the N sample sets includes multiple samples of at least one entity corpus, the M dimensions of each of the N mask vectors correspond to M named entities, and M and N are positive integers;
  • the first sequence labeling model is updated according to the partial samples in each sample set of the N sample sets and the N mask vectors to obtain the second sequence labeling model, and the second sequence labeling model is used for entity labeling.
  • one sample set corresponds to one mask vector
  • the entity corpus corresponding to different sample sets is different.
  • the mask vectors of sample sets of different corpora are different, and the processor can update the first sequence labeling model with the N mask vectors corresponding to the N sample sets.
  • since the M dimensions of each mask vector correspond to M named entities, each mask vector can reflect which named entities are attended to and which are not. In this way, when updating the sequence labeling model, the processor adjusts only the parameters corresponding to the attended named entities and does not adjust the parameters corresponding to the remaining named entities. After one or more updates, the second sequence labeling model can predict prediction sentences of different corpora, avoiding the need to train a separate entity annotation model for each of the N sample sets, which can reduce complexity and help improve the performance of entity annotation.
  • the N mask vectors are used to mask multiple loss vectors obtained from the N sample sets, and the multiple masked loss vectors are used to update the first sequence labeling model.
  • the processor inputs the words of the training sentences of each of the N sample sets into the first sequence labeling model before the update to obtain the weight vector of each word, and the processor inputs the weight vector of each word and the actual label of each word into the loss function to obtain multiple loss vectors.
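  • As an illustration only, the masking of loss vectors described above can be sketched as follows (hypothetical NumPy code; the variable names and the use of the natural logarithm are assumptions, not taken from the application):

```python
import numpy as np

def loss_vector(weight_vec, label_vec, eps=1e-12):
    # Element-wise cross-entropy between the weight vector output by the first
    # sequence labeling model for one word and that word's actual label vector.
    p = np.clip(weight_vec, eps, 1 - eps)
    return -(label_vec * np.log(p) + (1 - label_vec) * np.log(1 - p))

def masked_loss_vectors(weight_vecs, label_vecs, mask_vec):
    # weight_vecs / label_vecs: one M-dimensional vector per word of a training
    # sentence; mask_vec: the M-dimensional mask vector of that sample set.
    # Dimensions of unconcerned named entities are zeroed out, so only the
    # concerned parameters are adjusted during the update.
    return [loss_vector(w, y) * mask_vec for w, y in zip(weight_vecs, label_vecs)]
```
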
  • that the entity corpora corresponding to different sample sets in the N sample sets are different can be understood as: the entity corpora corresponding to different sample sets in the N sample sets are not completely the same. Specifically, the first sample set in the above N sample sets corresponds to a first entity corpus and the second sample set corresponds to a second entity corpus; the first entity corpus is either completely different from the second entity corpus, or the first entity corpus and the second entity corpus share a part of the corpus. In other words, the entity corpora corresponding to different sample sets in the N sample sets are either completely different or partly the same and partly different.
  • the numbers of entity corpora corresponding to different sample sets in the above N sample sets may be the same while the corpus types differ (at least one corpus type is different), or the numbers of entity corpora corresponding to different sample sets may be different, in which case the corpus types differ as well.
  • One of the above-mentioned N sample sets includes training sentences of at least one entity corpus, and different training sentences included in the same sample set correspond to the same entity corpus.
  • the dimensions of the above N mask vectors are the same: they are all M-dimensional vectors.
  • each dimension of each of the above N mask vectors corresponds to one named entity
  • the M-dimensional mask vector corresponds to M named entities one-to-one
  • the N mask vectors correspond to a total of M named entities.
  • Different entity corpus includes different named entities.
  • the first entity corpus includes the first named entity
  • the second entity corpus includes the second named entity
  • the first named entity and the second named entity are not exactly the same.
  • each mask vector consists of 0 and 1.
  • in the above solution, the first sequence labeling model can be updated one or more times. After each update of the first sequence labeling model, the updated model can continue to be called the first sequence labeling model; after the last update, the second sequence labeling model is obtained.
  • Each sample set of the aforementioned N sample sets consists of a test set and a training set.
  • the samples in the training set are used to update the first sequence labeling model, and the samples in the test set are used to test the stability of the second sequence labeling model.
  • the samples in each sample set are sentences that include entity words.
  • the sample in the test set can be called a test sentence, and the sample in the training set can be called a training sentence.
  • the first sequence labeling model is updated according to the partial samples in each sample set of the N sample sets and the N mask vectors, including:
  • the dimensions of the weight vector, the actual label vector, and the loss vector are M.
  • when updating the first sequence labeling model, taking the first word as an example, the first word can be input into the first sequence labeling model to obtain the weight vector of the first word.
  • the weight vector can reflect, to a certain extent, how likely the first word is to correspond to each named entity.
  • the weight vector and the actual label vector of the first word are used to calculate the loss vector, and the first mask vector is used to mask the loss vector.
  • the masked loss vector is used to update the first sequence labeling model: only the parameters of the named entities corresponding to the non-zero positions of the mask vector are adjusted, and the parameters of the named entities corresponding to the zero positions of the mask vector are not adjusted, so that the updated first sequence labeling model moves closer to a sequence labeling model for the named entities corresponding to the non-zero positions of the mask vector, which can improve the accuracy of the second sequence labeling model.
  • the dimension of the weight vector of the first word, the dimension of the actual label vector of the first word, the dimension of the loss vector, the dimension of each mask vector and the dimension of the masked loss vector are the same.
  • the aforementioned loss function is a cross-entropy function.
  • the multiplication of two vectors may be a dot multiplication operation
  • the dot multiplication operation is the multiplication of corresponding elements of two vectors.
  • the first word is an entity word in the first sample, rather than a non-entity word. In this way, the efficiency of updating the first sequence labeling model can be improved.
  • the method further includes: testing the stability of the second sequence annotation model according to the remaining samples in each sample set of the N sample sets.
  • part of the samples in each sample set can be used to train the first sequence labeling model to obtain the second sequence labeling model.
  • some samples in each sample set can be used to update the first sequence labeling model, and the remaining samples of each sample set can be used to test the stability of the second sequence labeling model.
  • the method further includes: inputting the second entity word in the prediction sentence into the second sequence labeling model, and outputting the prediction vector;
  • the prediction sentence is a sentence including the entity corpus corresponding to any sample set in the N sample sets;
  • the dimension of the prediction vector is M.
  • determining at least one label of the second entity word according to the prediction vector includes: determining whether the value of each dimension of the prediction vector is greater than a preset value, and determining the named entity tag corresponding to each dimension whose value in the prediction vector is greater than the preset value as the at least one tag of the second entity word.
  • the second sequence labeling model can be used to predict the label of the second entity word: according to whether the value of each element in the prediction vector output by the second sequence labeling model is greater than a preset value, the second entity word can be labeled with one or more tags. In this way, in this embodiment of the present application, one entity word can be marked with more than one tag.
  • determining the N mask vectors of the N sample sets includes: determining that the dimension of each of the N mask vectors is the total number of entity corpus types corresponding to the N sample sets; and determining the value of each of the N mask vectors according to the entity corpus corresponding to each sample set in the N sample sets.
  • a method for entity labeling including: inputting a second entity word in a prediction sentence into a second sequence labeling model, and outputting a prediction vector; and determining at least one label of the second entity word according to the prediction vector.
  • the above-mentioned second sequence labeling model is obtained after updating the first sequence labeling model according to the partial samples in each sample set of the N sample sets and the N mask vectors.
  • determining at least one label of the second entity word according to the prediction vector includes: determining whether the value of each dimension of the prediction vector is greater than a preset value, and determining the named entity tag corresponding to each dimension whose value in the prediction vector is greater than the preset value as the at least one tag of the second entity word.
  • a device for entity labeling is provided, and the device is configured to execute the foregoing first aspect or the method in any possible implementation manner of the first aspect.
  • the device may include a module for executing the first aspect or the method in any possible implementation manner of the first aspect.
  • a device for entity labeling is provided, and the device is configured to execute the foregoing second aspect or any possible implementation method of the second aspect.
  • the apparatus may include a module for executing the second aspect or the method in any possible implementation manner of the second aspect.
  • a device for entity labeling is provided, including a processor, the processor is coupled with a memory, the memory is used to store computer programs or instructions, and the processor is used to execute the computer programs or instructions stored in the memory, so that the method in the first aspect is executed.
  • the processor is used to execute a computer program or instruction stored in the memory, so that the device executes the method in the first aspect.
  • the device includes one or more processors.
  • the device may also include a memory coupled with the processor.
  • the device may include one or more memories.
  • the memory can be integrated with the processor or provided separately.
  • the device may also include a transceiver.
  • a device for entity labeling is provided, including a processor, the processor is coupled with a memory, the memory is used to store computer programs or instructions, and the processor is used to execute the computer programs or instructions stored in the memory, so that the method in the second aspect is executed.
  • the processor is used to execute a computer program or instruction stored in the memory, so that the device executes the method in the second aspect.
  • the device includes one or more processors.
  • the device may also include a memory coupled with the processor.
  • the device may include one or more memories.
  • the memory can be integrated with the processor or provided separately.
  • the device may also include a transceiver.
  • a computer-readable storage medium on which a computer program (also referred to as an instruction or code) for implementing the method in the first aspect is stored.
  • when the computer program is executed by a computer, the computer can execute the method in the first aspect.
  • a computer-readable storage medium on which a computer program (also referred to as an instruction or code) for implementing the method in the first aspect or the second aspect is stored.
  • when the computer program is executed by a computer, the computer can execute the method in the second aspect.
  • this application provides a chip including a processor.
  • the processor is used to read and execute the computer program stored in the memory to execute the method in the first aspect and any possible implementation manners thereof.
  • the chip further includes a memory, and the processor is connected to the memory through a circuit or a wire.
  • the chip further includes a communication interface.
  • this application provides a chip system including a processor.
  • the processor is used to read and execute the computer program stored in the memory to execute the method in the second aspect and any possible implementation manners thereof.
  • the chip further includes a memory, and the processor is connected to the memory through a circuit or a wire.
  • the chip further includes a communication interface.
  • the present application provides a computer program product.
  • the computer program product includes a computer program (also referred to as an instruction or code).
  • when the computer program is executed by a computer, the computer implements the above method.
  • the present application provides a computer program product.
  • the computer program product includes a computer program (also referred to as an instruction or code).
  • when the computer program is executed by a computer, the computer implements the above method.
  • Fig. 1 is a schematic diagram of a method for entity labeling provided by an embodiment of the present application.
  • Fig. 2 is a schematic diagram of a method for obtaining a second sequence labeling model provided by an embodiment of the present application.
  • Fig. 3 is a schematic diagram of an example of updating a first sequence labeling model provided by an embodiment of the present application.
  • Fig. 4 is a schematic diagram of another example of updating the first sequence labeling model provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of using a second sequence labeling model for prediction provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an example of using a second sequence labeling model for prediction provided by an embodiment of the present application.
  • Fig. 7 is a schematic diagram of a possible application scenario provided by an embodiment of the present application.
  • Fig. 8 is a schematic block diagram of a device for entity labeling provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of another apparatus for entity labeling provided by an embodiment of the present application.
  • AI services involve, for example, voice assistants, subtitle generation, voice input, chat robots, customer service robots, or spoken language evaluation.
  • other AI services can also be included, which is not limited in this embodiment of the application.
  • sample set consists of a test set and a training set.
  • the samples in the test set are test samples, which can also be called test sentences; the samples in the training set are training samples, and the training samples can also be called training sentences.
  • the samples in each sample set include the same corpus.
  • the samples in a sample set are composed of test sentences and training sentences that include the same corpus.
  • each sample in sample set 1 includes movie entities, and sample set 1 includes 3 samples as an example.
  • the 3 samples are: "I want to watch 'Romance of the Three Kingdoms'", "Show me 'Youth in Teen'", and "Please open 'Tangshan Earthquake'". For another example, at least part of the samples in sample set 2 may include movie and TV series entities, and the remaining samples may include movies or TV series.
  • taking sample set 2 including 3 training sentences as an example, the 3 training sentences are: "Show me 'Nezha' and 'Sansheng III'" (Nezha is a movie, Sansheng III is a TV series), "I want to watch 'The Romance of the Three Kingdoms'" (the Romance of the Three Kingdoms may be a movie or a TV series here), and "Play 'Nezha' for me" (Nezha is a movie).
  • the sequence labeling model can be a long short-term memory (LSTM)-conditional random field (CRF) model. LSTM is suitable for sequence modeling problems, and superimposing a CRF on the LSTM is conducive to planning the label path.
  • the sequence labeling model may also be a sequence-to-sequence (Seq2Seq) model or a transformer model.
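  • As a rough sketch of such a sequence labeling model (assuming PyTorch; a BiLSTM with a linear output layer is used here instead of a full LSTM-CRF, and a softmax output is chosen because the weight vector's dimensions are later described as summing to 1; both choices are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SequenceLabelingModel(nn.Module):
    """BiLSTM tagger that outputs an M-dimensional weight vector for each word."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_entities):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_entities)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> weight vectors: (batch, seq_len, M)
        hidden, _ = self.lstm(self.embed(token_ids))
        return torch.softmax(self.out(hidden), dim=-1)
```
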
  • a mask vector is a vector composed of 0s and 1s.
  • One dimension of the mask vector corresponds to a named entity.
  • a value of 1 in a dimension means the corresponding named entity is attended to, and a value of 0 means the corresponding named entity is not attended to.
  • a sequence labeling model trained on a specific corpus can only be applied to a specific input sentence.
  • for example, if the sample set used to train the sequence labeling model is the above sample set 2, the input sentence to be predicted by the sequence labeling model also needs to include movies and/or TV shows.
  • the entities of the input sentence predicted by the sequence labeling model need to be a subset of the entities of the training sentences and test sentences included in sample set 2 for the prediction to be accurate.
  • if the sample set used to train the sequence labeling model is the above sample set 1, then the input sentence to be predicted needs to include movies before its labels can be predicted. If the input sentence includes movies and TV series, only the movies can be predicted and the TV series cannot.
  • for example, if the input sentence is "I want to watch Romance of the Three Kingdoms", only the movie tag can be output for "Romance of the Three Kingdoms"; if "Romance of the Three Kingdoms" may be a TV series, the TV series tag cannot be output, so the entity labeling is not accurate. If multiple input sentences of multiple corpora need to be predicted, multiple sequence labeling models of different corpora or different corpus combinations need to be trained, which leads to high complexity. In addition, in order to predict the labels of an input sentence, multiple sequence labeling models need to be run concurrently, and the sequence labeling model suitable for the input sentence needs to be matched among the multiple sequence labeling models, resulting in a large amount of calculation and high complexity.
  • the method 100 may be executed by a processor, and the method 100 includes:
  • the processor determines N mask vectors of the N sample sets, and the N sample sets correspond to the N mask vectors one-to-one.
  • the entity corpora corresponding to different sample sets in the N sample sets are different, and each sample set in the N sample sets includes multiple samples of at least one entity corpus.
  • the M dimensions of each of the N mask vectors correspond to M named entities, and M and N are positive integers.
  • the different entity corpora corresponding to different sample sets in the N sample sets can be understood as: the entity corpora corresponding to different sample sets are not completely the same. Specifically, the partial entity corpus corresponding to different sample sets in the N sample sets are the same and the partial entity corpus is different, or the entity corpora corresponding to different sample sets in the N sample sets are completely different.
  • the dimension of each of the above N mask vectors is M, one dimension of the mask vector corresponds to a named entity, and the M-dimensional mask vector corresponds to M named entities one-to-one, and N The mask vector corresponds to a total of M named entities.
  • different sample sets may include at least one identical training sentence and/or test sentence, or every training sentence and/or test sentence included in different sample sets may be different; this is not limited in the embodiments of the present application.
  • the N mask vectors are all vectors of M dimensions.
  • for example, N = 6, that is, 6 sample sets (sample set 1, sample set 2, sample set 3, sample set 4, sample set 5, and sample set 6) correspond to 6 mask vectors
  • sample set 1 corresponds to Movie corpus
  • sample set 2 corresponds to TV series corpus
  • sample set 3 corresponds to variety show corpus
  • sample set 4 corresponds to animation corpus
  • sample set 5 corresponds to movie and TV series corpus
  • sample set 6 corresponds to TV series and variety show corpus.
  • the corresponding relationship between the dimensions of the mask vector and the named entities can be specified.
  • the first dimension of each mask vector corresponds to movies
  • the second dimension corresponds to TV series
  • the third dimension corresponds to variety shows
  • the fourth dimension corresponds to animation.
  • the 6 mask vectors corresponding to the 6 sample sets are [1 0 0 0], [0 1 0 0], [0 0 1 0], [0 0 0 1], [1 1 0 0], [0 1 1 0].
  • for a 4-dimensional mask vector, the possible values include: [0 0 0 0], in which the named entity of no dimension is attended to; [1 0 0 0], [0 1 0 0], [0 0 1 0] and [0 0 0 1], in which one named entity is attended to; [1 1 0 0], [0 1 1 0], [1 0 1 0], [1 0 0 1], [0 1 0 1] and [0 0 1 1], in which two named entities are attended to; [1 1 1 0], [0 1 1 1], [1 0 1 1] and [1 1 0 1], in which three named entities are attended to; and [1 1 1 1], in which all four named entities are attended to.
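  • The six mask vectors of this example could be built as in the following sketch (hypothetical names; the entity order movie / TV series / variety show / animation follows the correspondence specified above):

```python
ENTITIES = ["movie", "tv_series", "variety_show", "animation"]  # M = 4

def mask_vector(concerned_entities):
    # 1 for a named entity the sample set is concerned with, 0 otherwise.
    return [1 if e in concerned_entities else 0 for e in ENTITIES]

sample_set_entities = {
    "sample set 1": {"movie"},
    "sample set 2": {"tv_series"},
    "sample set 3": {"variety_show"},
    "sample set 4": {"animation"},
    "sample set 5": {"movie", "tv_series"},
    "sample set 6": {"tv_series", "variety_show"},
}
masks = {name: mask_vector(ents) for name, ents in sample_set_entities.items()}
# masks["sample set 1"] == [1, 0, 0, 0], masks["sample set 5"] == [1, 1, 0, 0], ...
```
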
  • S120 The processor updates the first sequence labeling model according to the partial samples in each sample set of the N sample sets and the N mask vectors to obtain a second sequence labeling model, and the second sequence labeling model is used for entity labeling.
  • one sample set corresponds to one mask vector, and different sample sets correspond to different entity corpora.
  • the mask vectors of sample sets of different corpora are different, and the processor can update the first sequence labeling model with the N mask vectors corresponding to the N sample sets.
  • since the M dimensions of each mask vector correspond to M named entities, each mask vector can reflect which named entities are attended to and which are not. In this way, when updating the sequence labeling model, the processor adjusts only the parameters corresponding to the attended named entities and does not adjust the parameters corresponding to the remaining named entities.
  • after one or more updates, the second sequence labeling model can predict input sentences of different corpora, avoiding the need to train a separate entity annotation model for each of the N sample sets, which can reduce complexity and help improve the performance of entity annotation.
  • S210 The processor obtains N sample sets.
  • the entity words in the training sentences and the test sentences included in each sample set of the N sample sets have corresponding actual labels.
  • a training sentence is "I want to watch Nezha and Sansheng III”
  • Nezha can be marked as a movie label, and Sansheng III as a TV series label.
  • this is a way of mixing and labeling multiple named entities, that is, a training sentence can be labeled with at least two tags. Alternatively, if the training sentence is still "I want to watch Nezha and Sansheng III", "Nezha" can be marked with a movie label while Sansheng III is not marked, or "Sansheng III" can be marked as a TV series while Nezha is not marked.
  • this is a way of labeling a single named entity; in this way, sample sets of different corpora (movie corpus, TV series corpus, movie + TV series corpus) can all include the training sentence "I want to watch Nezha and Sansheng III".
  • S220 The processor determines N mask vectors corresponding to the N sample sets, and S220 is equivalent to S110. Among them, N sample sets have a one-to-one correspondence with N mask vectors.
  • the processor determines the dimension and value of each mask vector according to the named entities included in each sample set of the N sample sets. For the specific determination method, refer to the description of S110.
  • one sample set corresponding to one mask vector can be understood as multiple samples in one sample set corresponding to one mask vector.
  • each word in each training sentence has an actual label vector.
  • the actual label vector and the mask vector have the same dimensions. Combined with the example in S110, the dimension of the mask vector is 4, and the dimension of the actual label vector is also 4.
  • the actual label vector The first dimension corresponds to movies, the second dimension corresponds to TV shows, the third dimension corresponds to variety shows, and the fourth dimension corresponds to animation.
  • for example, the actual label vector of an entity word that corresponds only to a movie is [1 0 0 0]
  • the actual label vector of "The Romance of the Three Kingdoms" in the training sentence "I want to watch the Romance of the Three Kingdoms” is [1 1 0 0], that is, the Romance of the Three Kingdoms may be a movie or a TV series.
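  • The actual label vectors use the same M-dimensional layout; a short illustrative sketch (hypothetical helper, with the same entity order as above):

```python
ENTITIES = ["movie", "tv_series", "variety_show", "animation"]

def actual_label_vector(tags_of_word):
    # Multi-label encoding: an entity word can carry more than one named entity tag.
    return [1 if e in tags_of_word else 0 for e in ENTITIES]

actual_label_vector({"movie"})                # [1, 0, 0, 0]
actual_label_vector({"movie", "tv_series"})   # [1, 1, 0, 0], e.g. "Romance of the Three Kingdoms"
```
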
  • the initial first sequence labeling model may be an LSTM-CRF model.
  • the sequence of S220 and S230 is not limited, and S220 can be performed before, after, or at the same time as S230.
  • the following takes the first word in the first sample (a training sentence) in the first sample set of the N sample sets as an example.
  • the words in the samples of the other sample sets are handled similarly to the first word, and detailed examples are not repeated here.
  • the updated first-sequence annotation model may also be referred to as the first-sequence annotation model.
  • the processor inputs the first word in the first sample in the first sample set of the N sample sets into the first sequence labeling model, and outputs a weight vector of the first word.
  • the physical meaning of the weight vector of the first word is: each dimension is the weight of the first word for the corresponding named entity tag, and the larger the value of a certain dimension of the weight vector, the more likely the first word is to carry the named entity tag corresponding to that dimension.
  • the dimensions of the first mask vector, the actual label vector of the first word, and the weight vector of the first word are the same, and the named entities corresponding to the same dimension of each vector are the same. For example, combining the example of S110, the first dimension of the first mask vector, the actual label vector of the first word, and the weight vector of the first word corresponds to movies, the second dimension corresponds to TV series, the third dimension corresponds to variety shows, and the fourth dimension corresponds to animation.
  • the first word is an entity word in the first training sentence; of course, it may also be a non-entity word, which is not limited in the embodiment of the present application.
  • S250 The processor inputs the actual label vector and weight vector of the first word into the loss function, and calculates the loss vector of the first word.
  • the loss function is a cross entropy function.
  • the processor needs to compare the actual label vector of the first word with the weight vector to determine the degree of deviation between the weight vector output by the first sequence model and the actual label vector of the first word.
  • S260 The processor multiplies the loss vector of the first word by the first mask vector corresponding to the first sample set to obtain a masked loss vector.
  • the masked loss vector is used to update the first sequence labeling model, therefore, S230 is executed.
  • the first mask vector corresponding to the first sample set only focuses on the named entity corresponding to the non-zero part.
  • after the processor multiplies the first mask vector by the loss vector of the first word, when the processor updates or adjusts the first sequence labeling model with the masked loss vector it obtains, only the named entities corresponding to the non-zero part of the first mask vector are attended to, and the named entities corresponding to the zero part are not attended to.
  • the processor adjusts the parameters of the first sequence labeling model corresponding to some named entities without affecting the parameters of the other named entities, which helps ensure that the adjusted first sequence labeling model can handle input sentences of different corpora.
  • S230-S260 are the execution process of the first word in the first training sentence in the first sample set, and any one word in any training sentence in any sample set can also perform a process similar to S230-S260.
  • details are not described again in the embodiments of the present application. Only the following two cases are discussed regarding the order in which the processor uses multiple training sentences of each sample set to update the first sequence labeling model:
  • Case 1 The processor inputs part of the samples in each sample set into the first sequence labeling model in batches, and the first sequence labeling model can be updated multiple times at the same time.
  • for example, each sample set includes 70 training samples and 30 test samples (the 30 test samples are used to test the stability of the second sequence labeling model), and the 70 training samples of each sample set are input into the first sequence labeling model in 7 batches; for example, the first batch of training samples includes 10 training samples from each training sample set in 3 training sample sets.
  • the processor can update the first sequence labeling model once according to one training sample in one training sample set, where a training sample includes one entity word
  • at the first time point, the processor can update the first sequence labeling model simultaneously according to S240-S260 for the 3 training samples, one from each of the 3 training sample sets; at the second time point, the processor can update the first sequence labeling model simultaneously according to S240-S260 for another 3 training samples from the 3 training sample sets; and so on, at the 10th time point, the processor can update the first sequence labeling model simultaneously according to S240-S260 for the last 3 training samples of the batch, completing the process of updating the first sequence labeling model with the first batch of training samples; and so on, the remaining 60 training samples from each training sample set in the 3 training sample sets are input into the first sequence labeling model to obtain the updated first sequence labeling model.
  • This example is just to better illustrate the process of updating the first sequence labeling model.
  • the processor can update the first sequence labeling model once according to one word in one training sample of one training sample set, or update the first sequence labeling model once according to multiple words in multiple training samples in a sample set.
  • Case 2 The processor mixes all the training samples included in each sample set of the N sample sets, and then inputs the mixed training samples into the first sequence labeling model in batches, and executes S240-S260 once for each training sample.
  • the first sequence labeling model can be updated in batches, where each training sample has a corresponding mask vector.
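  • A schematic training loop for this mixed-batch case might look as follows (a sketch only, assuming PyTorch and a model like the one sketched earlier; updating per sentence rather than per word, and the natural-logarithm loss, are simplifications):

```python
import torch

def train_mixed(model, mixed_samples, optimizer, epochs=1, eps=1e-12):
    # mixed_samples: (token_ids, label_vecs, mask_vec) triples drawn from all N
    # sample sets and shuffled together; mask_vec is the mask vector of the
    # sample set that the training sentence came from.
    for _ in range(epochs):
        for token_ids, label_vecs, mask_vec in mixed_samples:
            weights = model(token_ids.unsqueeze(0)).squeeze(0)        # (seq_len, M)
            p = weights.clamp(eps, 1 - eps)
            loss_vecs = -(label_vecs * torch.log(p)
                          + (1 - label_vecs) * torch.log(1 - p))      # (seq_len, M)
            loss = (loss_vecs * mask_vec).sum()   # mask out unconcerned entities, then reduce
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```
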
  • the sample set of the movie corpus includes training samples of "I want to watch the Romance of the Three Kingdoms", and the mask vector corresponding to the sample set of the movie corpus is [1 0 0 0].
  • the processor inputs the words of "I want to watch the Romance of the Three Kingdoms" into the first sequence labeling model. Among them, "I", "want" and "watch" are non-entity words, marked with "O" in the figure, and the processor inputs "Romance of the Three Kingdoms" into the first sequence labeling model; this step is the above-mentioned S240.
  • the output weight vector P is [0.5 0.4 0 0.1].
  • the first dimension of the weight vector corresponds to movies, and the second dimension corresponds to TV shows.
  • the third dimension corresponds to variety shows, and the fourth dimension corresponds to animation, that is, the probability that "Romance of the Three Kingdoms” may be a movie is 0.5, the probability that it may be a TV series is 0.4, the probability that it may be a variety show is 0, and the probability that it may be an animation is 0.1.
  • the values of the dimensions of the weight vector add up to 1.
  • the actual label vector Y of "Romance of the Three Kingdoms" is [1 0 0 0]. For example, the loss function is -(y_i log(p_i) + (1 - y_i) log(1 - p_i)), where y_i is the value of the i-th dimension of Y, p_i is the value of the i-th dimension of P, and i takes the values 1, 2, 3, 4.
  • the loss vector calculated by the processor according to P and Y is [0.3 0.2 0 0.04]. The processor multiplies the loss vector [0.3 0.2 0 0.04] by the mask vector [1 0 0 0] to obtain the masked loss vector [0.3 0 0 0], and then feeds the masked loss vector [0.3 0 0 0] back to the first sequence labeling model; the masked loss vector [0.3 0 0 0] is used to adjust only the parameters of the first sequence labeling model for the movie part, while the other parameters remain unchanged.
  • the multiplication of two vectors can also be a dot multiplication operation, that is, the corresponding positions of the two vectors are multiplied.
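  • The figures in this example can be reproduced roughly as follows; note that [0.3 0.2 0 0.04] matches the stated loss function only if the logarithm is taken to base 10, which is an assumption here:

```python
import numpy as np

P = np.array([0.5, 0.4, 0.0, 0.1])   # weight vector of "Romance of the Three Kingdoms"
Y = np.array([1.0, 0.0, 0.0, 0.0])   # actual label vector (movie corpus)
mask = np.array([1, 0, 0, 0])        # mask vector of the movie-corpus sample set

eps = 1e-12
p = np.clip(P, eps, 1 - eps)
loss = -(Y * np.log10(p) + (1 - Y) * np.log10(1 - p))
print(np.round(loss, 2))         # values about 0.30, 0.22, 0.00, 0.05, close to [0.3 0.2 0 0.04]
print(np.round(loss * mask, 2))  # values 0.3, 0, 0, 0: the masked loss vector [0.3 0 0 0]
```
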
  • the sample set of the movie + TV series corpus includes the training sample "I want to watch the Romance of the Three Kingdoms", and the mask vector corresponding to the sample set of the movie + TV series corpus is [1 1 0 0].
  • the processor inputs the words of "I want to watch the Romance of the Three Kingdoms" into the first sequence labeling model. Among them, "I", "want" and "watch" are non-entity words, marked with "O" in the figure, and the processor inputs "Romance of the Three Kingdoms" into the first sequence labeling model; this step is the above-mentioned S240.
  • the output weight vector P is [0.5 0.4 0 0.1].
  • the first dimension of the weight vector corresponds to movies, and the second dimension corresponds to TV shows.
  • the third dimension corresponds to variety shows, and the fourth dimension corresponds to animation, that is, the probability that "Romance of the Three Kingdoms” may be a movie is 0.5, the probability that it may be a TV series is 0.4, the probability that it may be a variety show is 0, and the probability that it may be an animation is 0.1.
  • the values of the dimensions of the weight vector add up to 1.
  • the actual label vector Y of "The Romance of the Three Kingdoms" is [1 1 0 0], that is, the Romance of the Three Kingdoms may be a TV series or a movie.
  • the loss function is -(y_i log(p_i) + (1 - y_i) log(1 - p_i)), where y_i is the value of the i-th dimension of Y, p_i is the value of the i-th dimension of P, and i takes the values 1, 2, 3, 4.
  • the loss vector calculated by the processor according to P and Y is [0.3 0.2 0 0.04]. The processor multiplies the loss vector [0.3 0.2 0 0.04] by the mask vector [1 1 0 0] to obtain the masked loss vector [0.3 0.2 0 0], and then feeds the masked loss vector [0.3 0.2 0 0] back to the first sequence labeling model; the masked loss vector [0.3 0.2 0 0] is used to adjust only the parameters of the first sequence labeling model for the movie and TV series parts, while the other parameters remain unchanged.
  • the multiplication of two vectors can also be a dot multiplication operation, that is, the corresponding positions of the two vectors are multiplied.
  • the mask vector corresponding to a sample set is equal to the actual label vector of the words in the training sample of the sample set.
  • for example, the mask vector corresponding to the movie sample set and the actual label vector corresponding to its training sample are both [1 0 0 0]; for another example, in Figure 4, the mask vector corresponding to the movie + TV series sample set and the actual label vector corresponding to its training sample are both [1 1 0 0].
  • the description in Figure 2 to Figure 4 above is to use part of the samples in each sample set of N sample sets (partial samples are also called training samples) to update the first sequence labeling model to obtain the second sequence labeling model.
  • the processor can use the remaining samples in each of the N sample sets (the remaining samples may also be referred to as test samples) to test the stability of the labeling model in the second sequence.
  • the remaining samples of each sample set are input into the second sequence labeling model, and the second sequence labeling model outputs the weight vector of each word of each sample, and the named entity label of each word is determined according to the weight vector.
  • the named entity label of the word is compared with the actual label.
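  • A minimal sketch of this test step (hypothetical names; label determination by comparing each dimension against a preset value, as in the prediction procedure described later):

```python
def test_stability(model, test_samples, preset=0.5):
    # test_samples: (token_ids, label_vecs) pairs taken from the test set of
    # each sample set; label_vecs holds the actual label vector of each word.
    correct, total = 0, 0
    for token_ids, label_vecs in test_samples:
        weights = model(token_ids.unsqueeze(0)).squeeze(0)   # (seq_len, M)
        predicted = (weights > preset).int()                 # predicted tags per word
        correct += (predicted == label_vecs.int()).all(dim=-1).sum().item()
        total += label_vecs.shape[0]
    return correct / total   # fraction of words whose tags match the actual labels
```
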
  • the way of continuing to execute the method in Figure 1 or Figure 2 above can be: updating the first sequence labeling model to the tested second sequence labeling model, re-collecting the sample sets, and continuing to execute the method shown in Figure 1 or Figure 2; or it can be: re-determining a first sequence labeling model that is unrelated to the second sequence labeling model, re-collecting the sample sets, and continuing to execute the method shown in Figure 1 or Figure 2, until the obtained second sequence labeling model is stable.
  • the second-sequence annotation model can be used for prediction.
  • the specific prediction process is shown in FIG. 5, which is executed by the processor, and the method 500 includes:
  • S510 The processor inputs the second entity word in the prediction sentence into the second sequence labeling model, and outputs the prediction vector.
  • the processor determines at least one label of the second entity word according to the prediction vector, and the prediction sentence is a sentence including the entity corpus corresponding to any sample set in the N sample sets. Among them, the dimension of the prediction vector is M.
  • S520 includes: determining a named entity tag corresponding to a dimension whose value is greater than a preset value in the prediction vector as the at least one tag of the second entity word.
  • for example, the preset value is 0.5.
  • the output prediction vector is [0.7 0.6 0 0.1].
  • the first dimension and the second dimension of the prediction vector are both greater than 0.5.
  • the first dimension corresponds to movies, and the second dimension corresponds to TV series.
  • the named entity labels of "Romance of the Three Kingdoms" are movie and TV series.
  • the sum of the values of the various dimensions of the prediction vector may be equal to 1 or may not be equal to 1, for example, may be greater than 1, which is not limited in the embodiment of the present application.
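  • The label decision in this example can be written out as a short sketch (hypothetical function; the preset value 0.5 and the entity order follow the example above):

```python
ENTITY_TAGS = ["movie", "tv_series", "variety_show", "animation"]

def tags_of(prediction_vector, preset=0.5):
    # Every dimension whose value is greater than the preset value contributes
    # a tag, so one entity word can be marked with more than one tag.
    return [tag for tag, value in zip(ENTITY_TAGS, prediction_vector) if value > preset]

tags_of([0.7, 0.6, 0.0, 0.1])   # ['movie', 'tv_series']
```
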
  • the processor may include a natural language understanding (NLU) module. As shown in Figure 7, the system consists of an automatic speech recognition (ASR) module, a dialog manager (DM) module, a natural language understanding (NLU) module, and a text-to-speech (TTS) module.
  • the NLU module executes the method 700. Specifically, the method 700 includes the following steps:
  • the ASR module receives the user's statement.
  • the ASR module converts the user's speech into text information.
  • the ASR module sends the text information to the DM module.
  • the DM module combines the context of the text information to determine the context information corresponding to the text information.
  • the user's statement in S701 is a verbal expression, and the user may have said other statements related to this conversation before the verbal expression. In this way, the user's other statements are contextual information.
  • the DM module sends the context information and text information to the NLU module.
  • the text information at this time can also be called a predicted input sentence.
  • the NLU module inputs the text information into the second sequence labeling model, and determines the intent information and slot information corresponding to the text information in combination with the context information.
  • S706 is related to the foregoing embodiment of the present application, using the second sequence labeling model to predict the named entity tags of the entity words in the input sentence, and determining the intent information and slot information corresponding to the text information according to the named entity tags.
  • the NLU module sends the intent information and the slot information to the DM module.
  • the DM module calls the voice result in the TTS module according to the intent information and the slot information.
  • the TTS module performs voice playback to the user according to the voice result.
  • for example, in S701 the user says "I want to check tomorrow's weather", and in S704 the context information of the text information is determined; for example, before the user says "I want to check tomorrow's weather", the user also said "I want to query the weather in Beijing".
  • the NLU module knows that the user's intention is to query the weather based on these two sentences, and the slot is to query the weather in Beijing tomorrow.
  • the voice result is the result of querying tomorrow's weather in Beijing, and in S709 the TTS module plays the weather in Beijing tomorrow to the user.
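  • As a toy illustration of this flow (stub modules with hypothetical interfaces; a real NLU module would run the second sequence labeling model over the text to tag the entity words before filling intent and slots):

```python
class NLUModule:
    def parse(self, text, context):
        # Stub: a real implementation would tag entity words with the second
        # sequence labeling model and derive intent and slots from the tags.
        slots = {"time": "tomorrow" if "tomorrow" in text else None,
                 "city": context.get("city")}
        return "query_weather", slots

class DialogManager:
    def __init__(self):
        self.history = []
    def context_for(self, text):                 # S704: earlier statements give context
        context = {"city": "Beijing"} if any("Beijing" in t for t in self.history) else {}
        self.history.append(text)
        return context
    def resolve(self, intent, slots):            # S708: produce the voice result
        return f"{intent}: {slots}"

dm, nlu = DialogManager(), NLUModule()
for utterance in ["I want to query the weather in Beijing",
                  "I want to check tomorrow's weather"]:
    context = dm.context_for(utterance)            # S701-S704 (ASR output assumed to be text)
    intent, slots = nlu.parse(utterance, context)  # S705-S706
    print(dm.resolve(intent, slots))               # S707-S709 (TTS playback omitted)
```
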
  • FIG. 7 only shows one possible application scenario, and the embodiment of the present application can also be applied to other application scenarios, such as playing video of a TV voice assistant.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered as going beyond the scope of protection of this application.
  • the embodiment of the present application may divide the electronic device into functional modules based on the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software function modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other feasible division methods in actual implementation. The following is an example of dividing each function module corresponding to each function as an example.
  • FIG. 8 is a schematic block diagram of an apparatus 800 for labeling named entities according to an embodiment of the application.
  • the device 800 includes a determining unit 810 and an updating unit 820.
  • the determining unit 810 is configured to perform operations related to the determination of the processor in the above embodiment.
  • the update unit 820 is configured to perform operations related to the update of the processor in the above embodiment.
  • the determining unit 810 is used to determine N mask vectors of N sample sets.
  • the N sample sets correspond to the N mask vectors in a one-to-one manner.
  • the entity corpus corresponding to different sample sets in the N sample sets is different.
  • each sample set in the N sample sets includes multiple samples of at least one entity corpus, the M dimensions of each mask vector in the N mask vectors correspond to M named entities, and M and N are positive integers;
  • the updating unit 820 is configured to update the first sequence labeling model according to the partial samples in each sample set of the N sample sets and the N mask vectors to obtain the second sequence labeling model, and the second sequence labeling model is used for entity labeling.
  • the determining unit 810 is specifically configured to:
  • the dimensions of the weight vector, the actual label vector, and the loss vector are M.
  • the first word is an entity word in the first sample.
  • the loss function is a cross-entropy function.
  • the device 800 further includes: a testing unit configured to test the stability of the second sequence annotation model according to the remaining samples in each sample set of the N sample sets.
  • the device 800 further includes: an input and output unit, configured to input the second entity word in the prediction sentence into the second sequence labeling model, and output the prediction vector; the determining unit is also configured to determine the second entity word according to the prediction vector At least one label of the entity word, the prediction sentence is a sentence including the entity corpus corresponding to any sample set in the N sample sets, and the dimension of the prediction vector is M.
  • the input and output unit can communicate with the outside.
  • the input and output unit may also be referred to as a communication interface or a communication unit.
  • the determining unit 810 is specifically configured to: determine whether the value of each dimension of the prediction vector is greater than a preset value; determine the named entity label corresponding to the dimension whose value is greater than the preset value in the prediction vector as the second At least one label of the entity word.
  • the determining unit 810 is specifically configured to: determine that the dimension of each mask vector in the N mask vectors is the total number of entity corpus types corresponding to the N sample sets; and determine, according to the entity corpus corresponding to each sample set in the N sample sets, the value corresponding to each of the N mask vectors.
  • FIG. 9 is a schematic structural diagram of an apparatus 900 for labeling named entities provided by an embodiment of the present application.
  • the communication device 900 includes a processor 910, a memory 920, a communication interface 930, and a bus 940.
  • the processor 910 in the device 900 shown in FIG. 9 may correspond to the determining unit 810 and the updating unit 820 in the device 800 in FIG. 8.
  • the communication interface 930 may correspond to an input and output unit in the device 800.
  • the processor 910 may be connected to the memory 920.
  • the memory 920 can be used to store program code and data. The memory 920 may be a storage unit inside the processor 910, or an external storage unit independent of the processor 910, or may include both a storage unit inside the processor 910 and an external storage unit independent of the processor 910.
  • the apparatus 900 may further include a bus 940.
  • the memory 920 and the communication interface 930 may be connected to the processor 910 through the bus 940.
  • the bus 940 may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus 940 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is used in FIG. 9, but this does not mean that there is only one bus or one type of bus.
  • the processor 910 may adopt a central processing unit (central processing unit, CPU).
  • the processor can also be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the processor 910 adopts one or more integrated circuits to execute related programs to implement the technical solutions provided in the embodiments of the present application.
  • the memory 920 may include a read-only memory and a random access memory, and provides instructions and data to the processor 910.
  • a part of the processor 910 may also include a non-volatile random access memory.
  • the processor 910 may also store device type information.
  • the processor 910 executes the computer-executable instructions in the memory 920 to execute the operation steps of the foregoing method through the device 900.
  • the device 900 according to the embodiment of the present application may correspond to the device 800 in the embodiment of the present application, and the above and other operations and/or functions of each unit in the device 800 are used to implement the corresponding process of the method.
  • the embodiments of the present application also provide a computer-readable medium, the computer-readable medium stores program code, and when the computer program code runs on a computer, the computer executes the methods in the above aspects.
  • the embodiments of the present application also provide a computer program product, the computer program product includes computer program code, and when the computer program code runs on a computer, the computer executes the methods in the above aspects.
  • the terminal device or the network device includes a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer.
  • the hardware layer may include hardware such as a central processing unit (CPU), a memory management unit (MMU), and memory (also referred to as main memory).
  • the operating system of the operating system layer can be any one or more computer operating systems that implement business processing through processes, for example, Linux operating systems, Unix operating systems, Android operating systems, iOS operating systems, or windows operating systems.
  • the application layer can include applications such as browsers, address books, word processing software, and instant messaging software.
  • the embodiment of this application does not specifically limit the specific structure of the execution subject of the method provided in the embodiment of this application, as long as it can run a program that records the code of the method provided in the embodiment of this application to follow the method provided in the embodiment of this application.
  • the execution subject of the method provided in the embodiments of the present application may be a terminal device or a network device, or a functional module in the terminal device or the network device that can call and execute the program.
  • the various aspects or features of this application can be implemented as methods, devices, or products using standard programming and/or engineering techniques.
  • article of manufacture used herein can encompass a computer program accessible from any computer-readable device, carrier, or medium.
  • the computer-readable medium may include, but is not limited to: magnetic storage devices (for example, hard disks, floppy disks, or tapes), optical disks (for example, compact discs (CD), digital versatile discs (DVD)), smart cards and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks or key drives).
  • the various storage media described herein may represent one or more devices and/or other machine-readable media for storing information.
  • the term "machine-readable medium” may include, but is not limited to: wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
  • the processor mentioned in the embodiments of the present application may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM).
  • RAM can be used as an external cache.
  • RAM can include the following various forms: static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).
  • when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) may be integrated in the processor.
  • the memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the essence of the technical solution of this application, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a computer software product, and the computer software product is stored in a storage medium.
  • the computer software product includes a number of instructions, which are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media may include, but are not limited to: U disks (USB flash drives), mobile hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A method and device for entity tagging, applied in the field of artificial intelligence. In the method, a processor can update a first sequence tagging model by using N mask vectors corresponding to N sample sets. The M dimensions of each mask vector correspond to M named entities; therefore, each mask vector can indicate that some of the named entities are attended to while the remaining named entities are not. Thus, in one update of the sequence tagging model, the processor can adjust the parameters corresponding to some of the named entities without adjusting the parameters corresponding to the remaining named entities; after being updated once or multiple times, a second sequence tagging model can predict prediction statements of different corpora, avoiding the need to train a different entity tagging model for each of the sample sets, so that complexity can be reduced and the performance of entity tagging is improved.

Description

Method and device for entity labeling
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 29, 2020, with application number 202010474348.X and the application name "Method and device for entity labeling", the entire content of which is incorporated by reference in this application.
Technical field
This application relates to the field of artificial intelligence (AI), and more specifically to a method and device for entity labeling in the AI field.
Background
Named entity recognition (NER) is a basic task in natural language processing (NLP). NER can identify entities of categories such as person names, place names, organization names, and dates and times, so that the identified entities can be used for information extraction, information retrieval, syntactic analysis, semantic role labeling, and so on.
In NER, an input sentence can be fed into a sequence labeling model, which outputs a label for each word. In the prior art, a sequence labeling model trained on a specific corpus is only applicable to specific input sentences. For example, if the training sentences in the sample set used to train the sequence labeling model come from a movie corpus, the input sentences to be predicted by the model must contain movies for their labels to be predicted; if an input sentence contains both movies and TV series, only the movies can be predicted and the TV series cannot. If there are multiple input sentences from multiple different corpora, multiple sequence labeling models for different corpora or different corpus combinations need to be trained, which leads to high complexity. Moreover, in order to predict the labels of an input sentence, multiple sequence labeling models need to run concurrently and the sequence labeling model suitable for the input sentence has to be matched among them, resulting in a large amount of calculation and high complexity.
Summary of the invention
The embodiments of the present application provide a method and device for entity labeling, which can reduce complexity and help improve the performance of entity labeling.
In a first aspect, a method for entity labeling is provided. The method may be executed by a processor or a processing module. The method includes: determining N mask vectors of N sample sets, where the N sample sets are in one-to-one correspondence with the N mask vectors, different sample sets in the N sample sets correspond to different entity corpora, each of the N sample sets includes multiple samples of at least one entity corpus, the M dimensions of each of the N mask vectors correspond to M named entities, and M and N are positive integers;
and updating a first sequence labeling model according to partial samples in each of the N sample sets and the N mask vectors to obtain a second sequence labeling model, where the second sequence labeling model is used for entity labeling.
In the above technical solution, one sample set corresponds to one mask vector, and different sample sets correspond to different entity corpora; in other words, sample sets of different corpora have different mask vectors, and the processor can update the first sequence labeling model by combining the N mask vectors corresponding to the N sample sets. Since the M dimensions of each mask vector correspond to M named entities, each mask vector can indicate that some named entities are attended to while the remaining named entities are not. In this way, in one update of the sequence labeling model, the processor can adjust the parameters corresponding to some named entities without adjusting the parameters corresponding to the remaining named entities. After one or more updates, the second sequence labeling model can predict prediction sentences of different corpora, which avoids the need to train a separate entity labeling model for each sample set, reduces complexity, and helps improve the performance of entity labeling.
Optionally, the N mask vectors are used to mask multiple loss vectors obtained from the N sample sets, and the masked loss vectors are used to update the first sequence labeling model. Optionally, the processor inputs the words of the training sentences of each of the N sample sets into the first sequence labeling model before the update to obtain the weight vector of each word, and the processor inputs the weight vector of each word and the actual label of each word into the loss function to obtain the multiple loss vectors.
That different sample sets in the N sample sets correspond to different entity corpora can be understood as: the entity corpora corresponding to different sample sets in the N sample sets are not completely the same. Specifically, a first sample set in the above N sample sets corresponds to a first entity corpus, and a second sample set corresponds to a second entity corpus; the first entity corpus is completely different from the second entity corpus, or the first entity corpus and the second entity corpus partially overlap. In other words, the entity corpora corresponding to different sample sets in the N sample sets are either completely different or partly the same and partly different.
In the above N sample sets, different sample sets may correspond to the same number of entity corpora but different corpus types (at least one corpus type is different); or different sample sets may correspond to different numbers of entity corpora with at least one corpus type in common; or different sample sets may correspond to different numbers of entity corpora and also different corpus types.
Each of the above N sample sets includes training sentences of at least one entity corpus, and different training sentences included in the same sample set correspond to the same entity corpus.
The dimensions of the above N mask vectors are the same: they are all M-dimensional vectors.
One dimension of each of the above N mask vectors corresponds to one named entity, the M-dimensional mask vector corresponds to the M named entities one-to-one, and the N mask vectors correspond to a total of M named entities.
Different entity corpora include different named entities. For example, the first entity corpus includes a first named entity, the second entity corpus includes a second named entity, and the first named entity and the second named entity are not exactly the same.
Optionally, each mask vector consists of 0s and 1s.
It should be noted that the first sequence labeling model can be updated one or more times in the above solution. After each update, the updated model can still be called the first sequence labeling model; after one or more such updates, the second sequence labeling model is obtained.
Each of the aforementioned N sample sets consists of a test set and a training set. The samples in the training set are used to update the first sequence labeling model, and the samples in the test set are used to test the stability of the second sequence labeling model. Each sample in a sample set is a sentence that includes entity words; a sample in the test set can be called a test sentence, and a sample in the training set can be called a training sentence.
In some possible implementations, updating the first sequence labeling model according to the partial samples in each of the N sample sets and the N mask vectors includes:
inputting a first word in a first sample of a first sample set of the N sample sets into the first sequence labeling model, and outputting a weight vector of the first word;
inputting the actual label vector and the weight vector of the first word into a loss function, and calculating a loss vector of the first word;
multiplying the loss vector by a first mask vector corresponding to the first sample set to obtain a masked loss vector; and updating the first sequence labeling model according to the masked loss vector;
where the dimensions of the weight vector, the actual label vector, and the loss vector are all M.
In the above solution, when updating the first sequence labeling model, taking the first word as an example, the first word can be input into the first sequence labeling model to obtain the weight vector of the first word; the weight vector reflects, to a certain extent, how likely the first word is to be marked with each label. The loss vector is calculated from the weight vector and the actual label vector of the first word, and the loss vector is masked with the first mask vector. In this way, when the masked loss vector is used to update the first sequence labeling model, only the parameters of the named entities corresponding to the non-zero positions of the mask vector are adjusted, and the parameters of the named entities corresponding to the zero positions of the mask vector are not adjusted, so that the updated first sequence labeling model can be closer to a sequence labeling model for the named entities corresponding to the non-zero positions of the mask vector, which can improve the accuracy of the second sequence labeling model.
The dimension of the weight vector of the first word, the dimension of the actual label vector of the first word, the dimension of the loss vector, the dimension of each mask vector, and the dimension of the masked loss vector are the same.
Optionally, the aforementioned loss function is a cross-entropy function.
It should be understood that, in this application, the multiplication of two vectors may be an element-wise (dot) multiplication, in which the corresponding elements of the two vectors are multiplied.
In some possible implementations, the first word is an entity word in the first sample rather than a non-entity word; in this way, the efficiency of updating the first sequence labeling model can be improved.
In some possible implementations, the method further includes: testing the stability of the second sequence labeling model according to the remaining samples in each of the N sample sets.
Specifically, part of the samples in each sample set can be used to train the first sequence labeling model to obtain the second sequence labeling model. In this way, part of the samples in each sample set can be used to update the first sequence labeling model, and the remaining samples in each sample set can be used to test the stability of the second sequence labeling model.
In some possible implementations, the method further includes: inputting a second entity word in a prediction sentence into the second sequence labeling model, and outputting a prediction vector;
and determining at least one label of the second entity word according to the prediction vector, where the prediction sentence is a sentence including the entity corpus corresponding to any one of the N sample sets;
where the dimension of the prediction vector is M.
In some possible implementations, determining at least one label of the second entity word according to the prediction vector includes: determining whether the value of each dimension of the prediction vector is greater than a preset value; and determining the named entity labels corresponding to the dimensions whose values are greater than the preset value as the at least one label of the second entity word.
In the above solution, the second sequence labeling model can be used to predict the label of the second entity word: according to whether the value of each element of the prediction vector output by the second sequence labeling model is greater than a preset value, one or more labels are assigned to the second entity word. In this way, in the embodiments of the present application, one entity word can be marked with more than one label.
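For illustration only, the following is a minimal Python sketch of the thresholding step just described, which maps a prediction vector to one or more named entity labels. The entity names, the threshold value 0.5, and the function name are assumptions made for this sketch and are not part of the claimed method.

```python
# Hypothetical illustration: map a prediction vector to entity labels by thresholding.
# The entity order (movie, TV series, variety show, animation) and the preset
# value 0.5 are assumptions for this sketch.
ENTITIES = ["movie", "TV series", "variety show", "animation"]

def labels_from_prediction(prediction, preset_value=0.5):
    """Return every named entity label whose dimension exceeds the preset value."""
    return [name for name, score in zip(ENTITIES, prediction) if score > preset_value]

# Example: a word predicted as both a movie and a TV series gets two labels.
print(labels_from_prediction([0.8, 0.7, 0.1, 0.0]))  # ['movie', 'TV series']
```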
In some possible implementations, determining the N mask vectors of the N sample sets includes: determining that the dimension of each of the N mask vectors is the total number of entity corpus types corresponding to the N sample sets; and determining the value of each of the N mask vectors according to the entity corpus corresponding to each of the N sample sets.
In a second aspect, a method for entity labeling is provided, including: inputting a second entity word in a prediction sentence into a second sequence labeling model, and outputting a prediction vector; and determining at least one label of the second entity word according to the prediction vector.
The above second sequence labeling model is obtained after updating a first sequence labeling model according to partial samples in each of N sample sets and N mask vectors.
In some possible implementations, determining at least one label of the second entity word according to the prediction vector includes: determining whether the value of each dimension of the prediction vector is greater than a preset value; and determining the named entity labels corresponding to the dimensions whose values are greater than the preset value as the at least one label of the second entity word.
In a third aspect, a device for entity labeling is provided, and the device is configured to execute the method in the foregoing first aspect or any possible implementation manner of the first aspect. Specifically, the device may include a module for executing the method in the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, a device for entity labeling is provided, and the device is configured to execute the method in the foregoing second aspect or any possible implementation manner of the second aspect. Specifically, the device may include a module for executing the method in the second aspect or any possible implementation manner of the second aspect.
In a fifth aspect, a device for entity labeling is provided. The device includes a processor, the processor is coupled with a memory, the memory is used to store computer programs or instructions, and the processor is used to execute the computer programs or instructions stored in the memory, so that the method in the first aspect is executed.
For example, the processor is used to execute a computer program or instruction stored in the memory, so that the device executes the method in the first aspect.
Optionally, the device includes one or more processors.
Optionally, the device may also include a memory coupled with the processor.
Optionally, the device may include one or more memories.
Optionally, the memory may be integrated with the processor or provided separately.
Optionally, the device may also include a transceiver.
In a sixth aspect, a device for entity labeling is provided. The device includes a processor, the processor is coupled with a memory, the memory is used to store computer programs or instructions, and the processor is used to execute the computer programs or instructions stored in the memory, so that the method in the second aspect is executed.
For example, the processor is used to execute a computer program or instruction stored in the memory, so that the device executes the method in the second aspect.
Optionally, the device includes one or more processors.
Optionally, the device may also include a memory coupled with the processor.
Optionally, the device may include one or more memories.
Optionally, the memory may be integrated with the processor or provided separately.
Optionally, the device may also include a transceiver.
In a seventh aspect, a computer-readable storage medium is provided, on which a computer program (also referred to as instructions or code) for implementing the method in the first aspect is stored.
For example, when the computer program is executed by a computer, the computer can execute the method in the first aspect.
In an eighth aspect, a computer-readable storage medium is provided, on which a computer program (also referred to as instructions or code) for implementing the method in the first aspect or the second aspect is stored.
For example, when the computer program is executed by a computer, the computer can execute the method in the second aspect.
In a ninth aspect, this application provides a chip including a processor. The processor is used to read and execute the computer program stored in the memory to execute the method in the first aspect and any possible implementation manner thereof.
Optionally, the chip further includes a memory, and the memory and the processor are connected through a circuit or a wire.
Further optionally, the chip further includes a communication interface.
In a tenth aspect, this application provides a chip system including a processor. The processor is used to read and execute the computer program stored in the memory to execute the method in the second aspect and any possible implementation manner thereof.
Optionally, the chip further includes a memory, and the memory and the processor are connected through a circuit or a wire.
Further optionally, the chip further includes a communication interface.
In an eleventh aspect, the present application provides a computer program product. The computer program product includes a computer program (also referred to as instructions or code), and when the computer program is executed by a computer, the computer implements the method in the first aspect.
In a twelfth aspect, the present application provides a computer program product. The computer program product includes a computer program (also referred to as instructions or code), and when the computer program is executed by a computer, the computer implements the method in the second aspect.
Description of the drawings
Fig. 1 is a schematic diagram of a method for entity labeling provided by an embodiment of the present application.
Fig. 2 is a schematic diagram of a method for obtaining a second sequence labeling model provided by an embodiment of the present application.
Fig. 3 is a schematic diagram of an example of updating a first sequence labeling model provided by an embodiment of the present application.
Fig. 4 is a schematic diagram of another example of updating the first sequence labeling model provided by an embodiment of the present application.
Fig. 5 is a schematic diagram of performing prediction with the second sequence labeling model provided by an embodiment of the present application.
Fig. 6 is a schematic diagram of an example of performing prediction with the second sequence labeling model provided by an embodiment of the present application.
Fig. 7 is a schematic diagram of a possible application scenario provided by an embodiment of the present application.
Fig. 8 is a schematic block diagram of a device for entity labeling provided by an embodiment of the present application.
Fig. 9 is a schematic block diagram of another device for entity labeling provided by an embodiment of the present application.
Detailed description of embodiments
The embodiments provided in this application can be applied to AI services in the AI field. AI services involve voice assistants, subtitle generation, voice input, chat robots, customer robots, or spoken language evaluation. Of course, in actual applications, other AI services may also be included, which is not limited in the embodiments of this application.
The following explains the terms used in the embodiments of the present application.
1. Sample set: a sample set consists of a test set and a training set. The samples in the test set are test samples, which may also be called test sentences; the samples in the training set are training samples, which may also be called training sentences. The samples in each sample set include the same corpus; in other words, the samples in one sample set are composed of test sentences and training sentences that include the same corpus. For example, each sample in sample set 1 includes movie entities. Taking sample set 1 including 3 samples as an example, the 3 samples are: I want to watch "Romance of the Three Kingdoms"; play "Youth in Youth" for me; please open "Tangshan Earthquake". For another example, at least some samples in sample set 2 may include both movie and TV series entities, and the remaining samples may include movies or TV series. Taking sample set 2 including 3 training sentences as an example, the 3 training sentences are: play "Nezha" (Nezha is a movie) and "Sansheng III" (Sansheng III is a TV series) for me; I want to watch "Romance of the Three Kingdoms" (here the Romance of the Three Kingdoms may be a movie or a TV series); "play Nezha for me" (Nezha is a movie).
2. Sequence labeling model: the sequence labeling model can be a long short-term memory (LSTM)-conditional random field (CRF) model. LSTM is suitable for sequence modeling problems, and stacking a CRF on top of the LSTM facilitates path planning. The sequence labeling model may also be a sequence-to-sequence (Seq2Seq) model or a transformer model.
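As a rough, assumption-laden sketch only, the following PyTorch snippet shows a BiLSTM token tagger that outputs an M-dimensional weight vector for each input word, which is the shape of output used in the later steps. It omits the CRF layer mentioned above, and all hyperparameter values and class names are placeholders rather than the model actually used in the embodiments.

```python
import torch
import torch.nn as nn

class BiLstmTagger(nn.Module):
    """Minimal BiLSTM tagger: one M-dimensional weight vector per input word.

    This sketch omits the CRF layer; vocab_size, embed_dim and hidden_dim are
    placeholder hyperparameters chosen only to make the example runnable.
    """
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_entities=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_entities)

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        hidden, _ = self.lstm(self.embed(token_ids))   # (batch, seq_len, 2*hidden_dim)
        return torch.sigmoid(self.out(hidden))         # per-word M-dimensional weights
```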
3. Mask vector: a vector composed of 0s and 1s. One dimension of the mask vector corresponds to one named entity; a value of 1 in a dimension means that the corresponding named entity is attended to, and a value of 0 means that it is not attended to.
In the prior art, a sequence labeling model trained on a specific corpus is only applicable to specific input sentences. For example, if the sample set used to train the sequence labeling model is the above sample set 2, the input sentences to be predicted by the model also need to include movies and/or TV series; in other words, the entities of the input sentences fed into the model for prediction need to be a subset of the entities of the training sentences and test sentences included in sample set 2 in order to meet the prediction accuracy. For another example, if the sample set used to train the sequence labeling model is the above sample set 1, the input sentences to be predicted need to include movies before labels can be predicted; if an input sentence includes both movies and TV series, only the movies can be predicted and the TV series cannot. For example, if the input sentence to be predicted is "I want to watch the Romance of the Three Kingdoms", only a movie label can be output for "Romance of the Three Kingdoms"; since "Romance of the Three Kingdoms" may be a TV series, the TV series label cannot be output, which leads to inaccurate entity labeling. If multiple input sentences of multiple corpora need to be predicted, multiple sequence labeling models for different corpora or different corpus combinations need to be trained, which leads to high complexity. Moreover, in order to predict the labels of an input sentence, multiple sequence labeling models need to run concurrently, and the sequence labeling model suitable for the input sentence has to be matched among the multiple sequence labeling models, resulting in a large amount of calculation and high complexity.
The following describes the method 100 for entity labeling provided by the embodiments of the present application with reference to the accompanying drawings. The method 100 may be executed by a processor, and the method 100 includes:
S110. The processor determines N mask vectors of N sample sets, where the N sample sets correspond to the N mask vectors one-to-one, different sample sets in the N sample sets correspond to different entity corpora, each sample set in the N sample sets includes multiple samples of at least one entity corpus, the M dimensions of each of the N mask vectors correspond to M named entities, and M and N are positive integers.
Here, that different sample sets in the N sample sets correspond to different entity corpora can be understood as: the entity corpora corresponding to different sample sets are not completely the same. Specifically, the entity corpora corresponding to different sample sets in the N sample sets are partly the same and partly different, or the entity corpora corresponding to different sample sets in the N sample sets are completely different.
It is understandable that the dimension of each of the above N mask vectors is M, one dimension of a mask vector corresponds to one named entity, the M-dimensional mask vector corresponds to the M named entities one-to-one, and the N mask vectors correspond to a total of M named entities.
Optionally, in the N sample sets, different sample sets include at least one identical training sentence and/or test sentence, or every training sentence and/or test sentence included in different sample sets is different; this is not limited in the embodiments of this application.
To better explain the N sample sets and the N M-dimensional mask vectors, the following example is given. Assume N=6, that is, 6 sample sets (sample set 1, sample set 2, sample set 3, sample set 4, sample set 5, and sample set 6) correspond to 6 mask vectors: sample set 1 corresponds to the movie corpus, sample set 2 to the TV series corpus, sample set 3 to the variety show corpus, sample set 4 to the animation corpus, sample set 5 to the movie and TV series corpora, and sample set 6 to the TV series and variety show corpora. The 6 sample sets include a total of 4 types of corpus, so the dimension of the mask vectors is 4, that is, M=4, and the 4 dimensions correspond to the 4 named entities of movie, TV series, animation, and variety show. Specifically, the correspondence between the dimensions of the mask vector and the named entities can be specified; for example, the first dimension of each mask vector corresponds to movie, the second dimension to TV series, the third dimension to variety show, and the fourth dimension to animation. In this way, the 6 mask vectors corresponding to the 6 sample sets are [1 0 0 0], [0 1 0 0], [0 0 1 0], [0 0 0 1], [1 1 0 0], and [0 1 1 0], respectively.
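To make the correspondence concrete, the following Python sketch builds the 6 mask vectors of this example from the corpus types of each sample set. The dictionary contents simply restate the example above; the entity order and the function name are assumptions made for illustration.

```python
# Illustrative sketch: build mask vectors from the corpus types of each sample set.
# The entity order and sample set contents restate the example in the text.
ENTITY_ORDER = ["movie", "TV series", "variety show", "animation"]

SAMPLE_SET_CORPORA = {
    "sample set 1": ["movie"],
    "sample set 2": ["TV series"],
    "sample set 3": ["variety show"],
    "sample set 4": ["animation"],
    "sample set 5": ["movie", "TV series"],
    "sample set 6": ["TV series", "variety show"],
}

def mask_vector(corpora):
    """1 in the dimensions of the named entities the sample set attends to, 0 elsewhere."""
    return [1 if entity in corpora else 0 for entity in ENTITY_ORDER]

for name, corpora in SAMPLE_SET_CORPORA.items():
    print(name, mask_vector(corpora))
# sample set 1 [1, 0, 0, 0] ... sample set 6 [0, 1, 1, 0]
```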
It should be noted that M and N have the following relationship: N is less than or equal to C(M,0)+C(M,1)+C(M,2)+...+C(M,M) = 2^M, where C(M,k) denotes the number of ways of choosing k of the M named entities. Combined with the above example, if M=4, then N is a positive integer less than or equal to 16. Specifically, the term C(4,0)=1 in this sum means that the mask vector can be [0 0 0 0], that is, no masking is applied to the named entity of any dimension; the term C(4,1)=4 means that the mask vector can be [1 0 0 0], [0 1 0 0], [0 0 1 0], or [0 0 0 1]; the term C(4,2)=6 means that the mask vector can be [1 1 0 0], [0 1 1 0], [1 0 1 0], [1 0 0 1], [0 1 0 1], or [0 0 1 1]; the term C(4,3)=4 means that the mask vector can be [1 1 1 0], [0 1 1 1], [1 0 1 1], or [1 1 0 1]; and the term C(4,4)=1 means that the mask vector can be [1 1 1 1].
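The combinatorial bound above can be checked with a short sketch that enumerates all subsets of the M named entities; this is only a verification of the relationship between M and the number of possible mask vectors, not part of the method itself.

```python
from itertools import combinations
from math import comb

M = 4
# Sum of C(M, k) over k = 0..M equals 2**M, the number of possible mask vectors.
total = sum(comb(M, k) for k in range(M + 1))
assert total == 2 ** M == 16

# Enumerate the masks themselves: one 0/1 vector per subset of attended entities.
masks = [[1 if i in subset else 0 for i in range(M)]
         for k in range(M + 1) for subset in combinations(range(M), k)]
print(len(masks))  # 16
```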
S120. The processor updates the first sequence labeling model according to the partial samples in each of the N sample sets and the N mask vectors to obtain a second sequence labeling model, and the second sequence labeling model is used for entity labeling.
In the method 100, one sample set corresponds to one mask vector, and different sample sets correspond to different entity corpora; in other words, sample sets of different corpora have different mask vectors, and the processor can update the first sequence labeling model by combining the N mask vectors corresponding to the N sample sets. Since the M dimensions of each mask vector correspond to M named entities, each mask vector can indicate that some named entities are attended to while the remaining named entities are not. In this way, in one update of the sequence labeling model, the processor can adjust the parameters corresponding to some named entities without adjusting the parameters corresponding to the remaining named entities. After one or more updates, the second sequence labeling model can predict input sentences of different corpora, which avoids the need to train a separate entity labeling model for each sample set, reduces complexity, and helps improve the performance of entity labeling.
To better understand the above method 100, how the second sequence labeling model is obtained is described in detail below with reference to the method 200 of Fig. 2. The method 200 is executed by the processor in the method 100.
S210. The processor obtains the N sample sets.
Specifically, before S210, multiple training sentences and multiple test sentences including entity corpora need to be collected and manually labeled. Manual labeling is divided into single named entity labeling and mixed labeling of multiple named entities. The labeled sentences are classified according to named entity labels or corpora to obtain the N sample sets, which are input into the processor.
The entity words in the training sentences and test sentences included in each of the N sample sets have corresponding actual labels.
Exemplarily, if a training sentence is "I want to watch Nezha and Sansheng III", Nezha can be marked with a movie label and Sansheng III with a TV series label. This is the way of mixed labeling of multiple named entities, that is, one training sentence can be marked with at least two labels. If the training sentence is still "I want to watch Nezha and Sansheng III", "Nezha" can be marked with a movie label and Sansheng III left unmarked, or "Sansheng III" can be marked as a TV series and Nezha left unmarked; this is the way of labeling a single named entity. In this way, the sample sets of different corpora (the movie corpus, the TV series corpus, and the movie + TV series corpus) can all include the training sentence "I want to watch Nezha and Sansheng III".
S220. The processor determines the N mask vectors corresponding to the N sample sets; S220 is equivalent to S110. The N sample sets are in one-to-one correspondence with the N mask vectors.
Specifically, the processor determines the dimension and value of each mask vector according to the named entities included in each of the N sample sets; for the specific determination method, refer to the description of S110.
It should be noted that one sample set corresponding to one mask vector can be understood as multiple samples in one sample set corresponding to one mask vector.
The words in each training sentence also have actual label vectors, and the actual label vector has the same dimension as the mask vector. Combined with the example in S110, the dimension of the mask vector is 4, and the dimension of the actual label vector is also 4; the first dimension of the actual label vector corresponds to movie, the second dimension to TV series, the third dimension to variety show, and the fourth dimension to animation. Then, for "Nezha" in the training sentence "I want to watch Nezha", the actual label vector is [1 0 0 0]; for "Romance of the Three Kingdoms" in the training sentence "I want to watch the Romance of the Three Kingdoms", the actual label vector is [1 1 0 0], that is, the Romance of the Three Kingdoms may be a movie or a TV series.
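A small sketch of how such multi-hot actual label vectors could be formed, using the dimension order of this example; the word-to-entity mapping below is only illustrative.

```python
# Illustrative sketch: multi-hot actual label vectors in the order
# (movie, TV series, variety show, animation) used in this example.
ENTITY_ORDER = ["movie", "TV series", "variety show", "animation"]

def label_vector(entity_types):
    return [1 if entity in entity_types else 0 for entity in ENTITY_ORDER]

print(label_vector({"movie"}))               # "Nezha" -> [1, 0, 0, 0]
print(label_vector({"movie", "TV series"}))  # "Romance of the Three Kingdoms" -> [1, 1, 0, 0]
```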
S230. Determine the first sequence labeling model. For example, the initial first sequence labeling model may be an LSTM-CRF model.
It should be noted that the order of S220 and S230 is not limited; S220 can be performed before, after, or at the same time as S230.
The following takes as an example the first word in the first sample (the first sample is a training sentence) of the first sample set among the N sample sets. Words in the samples of other sample sets are handled similarly to the first word; to avoid redundancy, detailed examples are not given.
It should be noted that once the first sequence labeling model has been updated by the processor, the updated model may also be referred to as the first sequence labeling model.
S240. The processor inputs the first word in the first sample of the first sample set of the N sample sets into the first sequence labeling model, and outputs a weight vector of the first word.
It can be understood that the physical meaning of the weight vector of the first word is the weight of the first word being each named entity label: the larger the value of a certain dimension of the weight vector, the more likely the first word is to be the named entity label corresponding to that dimension.
It should be noted that, in the embodiments of this application, the dimensions of the first mask vector, the actual label vector of the first word, and the weight vector of the first word are the same, and the named entity corresponding to the same dimension of each vector is the same. For example, combined with the example of S110, the first dimension of the first mask vector, of the actual label vector of the first word, and of the weight vector of the first word corresponds to movie, the second dimension corresponds to TV series, the third dimension corresponds to variety show, and the fourth dimension corresponds to animation.
Optionally, the first word is an entity word in the first training sentence; of course, it may also be a non-entity word, which is not limited in the embodiments of the present application.
S250. The processor inputs the actual label vector and the weight vector of the first word into the loss function, and calculates the loss vector of the first word.
For example, the loss function is a cross-entropy function.
It should be noted that, in S250, the processor needs to compare the actual label vector of the first word with the weight vector to determine the degree of deviation between the weight vector output by the first sequence labeling model and the actual label vector of the first word.
S260. The processor multiplies the loss vector of the first word by the first mask vector corresponding to the first sample set to obtain a masked loss vector. The masked loss vector is used to update the first sequence labeling model; therefore, S230 is executed.
In S260, the first mask vector corresponding to the first sample set only attends to the named entities corresponding to its non-zero positions. After the processor multiplies the first mask vector by the loss vector of the first word, when the processor updates or adjusts the first sequence labeling model with the resulting masked loss vector, it only attends to the named entities corresponding to the non-zero positions of the first mask vector and does not attend to the named entities corresponding to the zero positions. In other words, when the processor adjusts the parameters corresponding to some named entities of the first sequence labeling model, the parameters of other named entities are not affected, which ensures that the adjusted first sequence labeling model can satisfy input sentences of different corpora.
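The masking step of S250-S260 can be sketched as follows in NumPy, using the per-dimension cross-entropy loss written out in the examples that follow. The function name, the epsilon clipping, and the choice of logarithm base are assumptions added only to keep the sketch numerically safe and runnable.

```python
import numpy as np

def masked_loss_vector(weight_vector, actual_label_vector, mask_vector, eps=1e-12):
    """Per-dimension cross-entropy loss, then element-wise multiplication by the mask.

    Only the dimensions where the mask is non-zero keep a loss contribution, so an
    update driven by this vector adjusts only the parameters of the attended entities.
    """
    p = np.clip(np.asarray(weight_vector, dtype=float), eps, 1.0 - eps)
    y = np.asarray(actual_label_vector, dtype=float)
    loss = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))   # per-dimension loss
    return loss * np.asarray(mask_vector, dtype=float)       # masked loss vector
```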
The above S230-S260 describe the execution process for the first word in the first training sentence of the first sample set; any word in any training sentence of any sample set can also go through a process similar to S230-S260, which is not described in detail here to avoid repetition. Only the following two cases are discussed regarding the order in which the processor uses the multiple training sentences of each sample set to update the first sequence labeling model:
Case 1: the processor inputs part of the samples of each sample set into the first sequence labeling model in batches, and the first sequence labeling model can be updated multiple times at the same time. For example, N=3 and each sample set includes 100 samples, of which 70 are training samples used to update the first sequence labeling model (the remaining 30 samples of each sample set are test samples used to test the stability of the second sequence labeling model). The 70 training samples of each sample set are input into the first sequence labeling model in 7 batches; for example, the first batch of training samples includes 10 training samples from each of the 3 training sample sets. Taking the case where the processor updates the first sequence labeling model once according to one training sample of one training sample set, and one training sample includes one entity word, as an example: at the first time point, the processor can update the first sequence labeling model with 3 training samples, one from each of the 3 training sample sets, according to S240-S260; at the second time point, the processor can update the first sequence labeling model with another 3 training samples from the 3 training sample sets according to S240-S260; and so on, until at the 10th time point the processor updates the first sequence labeling model with the last 3 training samples of the 3 training sample sets according to S240-S260, completing the process of updating the first sequence labeling model with the first batch of training samples. By analogy, the remaining 60 training samples of each of the 3 training sample sets are input into the first sequence labeling model to obtain the updated first sequence labeling model. This example is only intended to better illustrate the process of updating the first sequence labeling model; the above describes the processor updating the first sequence labeling model once according to one word in one training sample of one training sample set, and the first sequence labeling model may also be updated once according to multiple words in multiple training samples of one sample set.
Case 2: the processor mixes together all the training samples included in each of the N sample sets, and then inputs the mixed training samples into the first sequence labeling model in batches, executing S240-S260 once for each training sample, so that the first sequence labeling model is updated in batches, where each training sample has a corresponding mask vector.
To better illustrate the method 200, an example is given below with reference to Fig. 3. As shown in Fig. 3, the sample set of the movie corpus includes the training sample "I want to watch the Romance of the Three Kingdoms", and the mask vector corresponding to the sample set of the movie corpus is [1 0 0 0]. The processor inputs the words of "I want to watch the Romance of the Three Kingdoms" into the first sequence labeling model, where "I", "want" and "watch" are non-entity words marked with "O" in the figure. The processor inputs "Romance of the Three Kingdoms" into the first sequence labeling model, that is, this step is the above S240. For example, the weight vector P output for "Romance of the Three Kingdoms" is [0.5 0.4 0 0.1], where the first dimension of the weight vector corresponds to movie, the second dimension to TV series, the third dimension to variety show, and the fourth dimension to animation; that is, the probability that "Romance of the Three Kingdoms" is a movie is 0.5, the probability that it is a TV series is 0.4, the probability that it is a variety show is 0, and the probability that it is an animation is 0.1. At this time, the values of the dimensions of the weight vector add up to 1. In the movie corpus, the actual label vector Y of "Romance of the Three Kingdoms" is [1 0 0 0]. For example, the loss function is -(y_i log(p_i)+(1-y_i)log(1-p_i)), where y_i is the value of the corresponding dimension of Y, p_i is the value of the corresponding dimension of P, and i takes the values 1, 2, 3, 4. The loss vector calculated by the processor according to P and Y is then [0.3 0.2 0 0.04]. The processor multiplies the mask vector [1 0 0 0] by the loss vector [0.3 0.2 0 0.04] to obtain the masked loss vector [0.3 0 0 0], and then feeds the masked loss vector [0.3 0 0 0] back into the first sequence labeling model; with the masked loss vector [0.3 0 0 0], only the parameters of the first sequence labeling model related to the movie part are adjusted, and the other parameters remain unchanged. The multiplication of the two vectors may be a dot multiplication, that is, the corresponding positions of the two vectors are multiplied.
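A quick numeric check of this example: using a base-10 logarithm in the loss formula above appears to reproduce the illustrative numbers in the text (roughly [0.3 0.22 0 0.046] before rounding). This choice of logarithm base is an inference made only so the example can be reproduced, and the variable names are assumptions.

```python
import numpy as np

P = np.array([0.5, 0.4, 0.0, 0.1])     # weight vector output for "Romance of the Three Kingdoms"
Y = np.array([1.0, 0.0, 0.0, 0.0])     # actual label vector in the movie corpus
mask = np.array([1.0, 0.0, 0.0, 0.0])  # mask vector of the movie sample set

eps = 1e-12
p = np.clip(P, eps, 1.0 - eps)
loss = -(Y * np.log10(p) + (1.0 - Y) * np.log10(1.0 - p))  # base-10 log (inferred)
print(np.round(loss, 3))         # ~[0.301 0.222 0.    0.046], close to [0.3 0.2 0 0.04]
print(np.round(loss * mask, 3))  # masked loss vector ~[0.301 0.    0.    0.   ]
```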
To better illustrate method 200, another example is given below with reference to FIG. 4. As shown in FIG. 4, the sample set of the movie + TV drama corpus includes the training sample "我要看三国演义" ("I want to watch Romance of the Three Kingdoms"), and the mask vector corresponding to the sample set of the movie + TV drama corpus is [1 1 0 0]. The processor inputs the words of "我要看三国演义" into the first sequence labeling model separately; "我", "要" and "看" are non-entity words, marked "O" in the figure. The processor inputs "三国演义" into the first sequence labeling model, that is, this step is S240 above. For example, the weight vector P output for "三国演义" is [0.5 0.4 0 0.1], where the first dimension corresponds to movie, the second to TV series, the third to variety show and the fourth to animation; that is, the probability that "三国演义" is a movie is 0.5, a TV series 0.4, a variety show 0 and an animation 0.1, and the values of the dimensions of the weight vector sum to 1. In the movie + TV drama corpus the actual label vector Y of "三国演义" is [1 1 0 0], that is, "三国演义" may be either a movie or a TV series. For example, the loss function is -(y_i log(p_i) + (1-y_i) log(1-p_i)), where y_i is the value of the corresponding dimension of Y, p_i is the value of the corresponding dimension of P, and i takes the values 1, 2, 3, 4. The loss vector the processor computes from P and Y is then [0.3 0.4 0 0.04]. The processor multiplies the mask vector [1 1 0 0] by the loss vector [0.3 0.4 0 0.04] to obtain the masked loss vector [0.3 0.4 0 0], and feeds the masked loss vector [0.3 0.4 0 0] back into the first sequence labeling model; with the masked loss vector [0.3 0.4 0 0], only the parameters of the first sequence labeling model related to movies and TV series are adjusted, and the other parameters remain unchanged. The multiplication of the two vectors may also be a dot (element-wise) multiplication, that is, the values at corresponding positions of the two vectors are multiplied.
It can be understood that, to a certain extent, the mask vector corresponding to a sample set is equal to the actual label vector of the words in the training samples of that sample set. For example, in FIG. 3 the mask vector corresponding to the movie sample set and the actual label vector corresponding to the training sample are both [1 0 0 0]; likewise, in FIG. 4 the mask vector corresponding to the movie + TV drama sample set and the actual label vector corresponding to the training sample are both [1 1 0 0].
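The masked-loss computation of FIG. 3 and FIG. 4 can be sketched as follows. The sketch assumes a base-10 logarithm, which is what reproduces the example's numbers (approximately 0.30, 0.22, 0, 0.04 before masking in FIG. 3, and approximately 0.40 in the second dimension in FIG. 4); the weight vector is simply taken as given rather than produced by a real model, and the function names are made up for the illustration.

```python
import math

def loss_vector(p, y, eps=1e-12):
    """Per-dimension cross-entropy -(y_i*log(p_i) + (1-y_i)*log(1-p_i)).
    Values are clipped by a tiny epsilon to avoid log(0)."""
    clipped = [min(max(pi, eps), 1 - eps) for pi in p]
    return [-(yi * math.log10(pi) + (1 - yi) * math.log10(1 - pi))
            for pi, yi in zip(clipped, y)]

def apply_mask(loss, mask):
    """Element-wise (dot/Hadamard) product of the loss vector and the mask vector."""
    return [l * m for l, m in zip(loss, mask)]

p = [0.5, 0.4, 0.0, 0.1]            # weight vector output for "三国演义"

y_movie = [1, 0, 0, 0]              # actual label vector in the movie corpus (FIG. 3)
mask_movie = [1, 0, 0, 0]
print(apply_mask(loss_vector(p, y_movie), mask_movie))        # ≈ [0.30, 0, 0, 0]

y_movie_tv = [1, 1, 0, 0]           # movie + TV drama corpus (FIG. 4)
mask_movie_tv = [1, 1, 0, 0]
print(apply_mask(loss_vector(p, y_movie_tv), mask_movie_tv))  # ≈ [0.30, 0.40, 0, 0]
```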
FIG. 2 to FIG. 4 above describe updating the first sequence labeling model with part of the samples (also called training samples) of each of the N sample sets to obtain the second sequence labeling model. Once the second sequence labeling model is obtained, the processor may use the remaining samples (also called test samples) of each of the N sample sets to test the stability of the second sequence labeling model. For example, the remaining samples of each sample set are input into the second sequence labeling model, the second sequence labeling model outputs a weight vector for each word of each sample, the named entity label of each word is determined from the weight vector, and the named entity label of each word is compared with the actual label; if they are consistent, the number of qualified samples is increased by one, otherwise the number of unqualified samples is increased by one, and so on, until the processor has input all the remaining samples into the second sequence labeling model and determines the pass rate of the samples, the pass rate being the number of qualified samples divided by the total number of samples. If the pass rate satisfies a threshold, the second sequence labeling model is stable; otherwise the second sequence labeling model is unstable, and the methods of FIG. 1 to FIG. 2 above continue to be executed. Continuing to execute the methods of FIG. 1 to FIG. 2 may mean: updating the first sequence labeling model to the tested second sequence labeling model, re-collecting sample sets, and continuing to execute the method shown in FIG. 1 or FIG. 2; or re-determining a first sequence labeling model unrelated to the second sequence labeling model, re-collecting sample sets, and continuing to execute the method shown in FIG. 1 or FIG. 2, until the obtained second sequence labeling model is stable.
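A minimal sketch of this pass-rate test is given below. The function name, the shape of the test data and the concrete threshold of 0.95 are assumptions made for illustration; the embodiment only requires that the pass rate satisfy some threshold.

```python
def test_stability(model_predict, test_samples, threshold=0.95):
    """Pass-rate test for the second sequence labeling model.
    `model_predict(words)` is assumed to return one named entity label per word;
    `test_samples` is a list of (words, actual_labels) pairs built from the
    samples reserved for testing in each of the N sample sets."""
    qualified = 0
    unqualified = 0
    for words, actual_labels in test_samples:
        predicted = model_predict(words)
        if predicted == actual_labels:
            qualified += 1      # every word's predicted label matches the actual label
        else:
            unqualified += 1
    pass_rate = qualified / (qualified + unqualified)
    return pass_rate >= threshold   # True: stable; False: retrain or re-collect samples
```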
After the second sequence labeling model obtained through the above process is stable, the second sequence labeling model can be used for prediction. The specific prediction process is shown in FIG. 5 and is executed by the processor; method 500 includes:

S510: The processor inputs a second entity word in a prediction sentence into the second sequence labeling model and outputs a prediction vector.

S520: The processor determines at least one label of the second entity word according to the prediction vector, the prediction sentence being a sentence that includes the entity corpus corresponding to any one of the N sample sets. The dimension of the prediction vector is M.
Optionally, S520 includes: determining the named entity label corresponding to a dimension of the prediction vector whose value is greater than a preset value as the at least one label of the second entity word. For example, the preset value is 0.5. As shown in FIG. 6, "我要看三国演义" is input into the second sequence labeling model, where "我", "要" and "看" are non-entity words. The prediction vector output for "三国演义" is [0.7 0.6 0 0.1]; the first and second dimensions of the prediction vector are both greater than 0.5, the first dimension corresponds to movie and the second to TV series, so the named entity labels of "三国演义" are movie and TV series.
It should be noted that the values of the dimensions of the prediction vector may or may not sum to 1 (for example, the sum may be greater than 1); this is not limited in the embodiments of this application.
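A sketch of this threshold rule follows. The English label names and their dimension order are assumptions used only for illustration; the preset value of 0.5 matches the example above.

```python
M_LABELS = ["movie", "TV series", "variety show", "animation"]  # assumed dimension order

def labels_from_prediction(prediction, preset=0.5, labels=M_LABELS):
    """Return every named entity label whose dimension value exceeds the preset value.
    The dimensions need not sum to 1, so more than one label may be returned."""
    return [name for value, name in zip(prediction, labels) if value > preset]

# FIG. 6 example: "三国演义" with prediction vector [0.7, 0.6, 0, 0.1]
print(labels_from_prediction([0.7, 0.6, 0.0, 0.1]))   # ['movie', 'TV series']
```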
A possible voice assistant scenario of the embodiments of this application is described below with reference to FIG. 7; it is completed by the modules of FIG. 7 in coordination. In the above method embodiments, the processor may include a natural language understanding (NLU) module. As shown in FIG. 7, method 700 is executed by an automatic speech recognition (ASR) module, a dialog manager (DM) module, a natural language understanding (NLU) module and a text to speech (TTS) module. Specifically, method 700 includes the following steps:
S701: The ASR module receives the user's utterance.

S702: The ASR module converts the user's speech into text information.

S703: The ASR module sends the text information to the DM module.

S704: The DM module determines, in combination with the context of the text information, the context information corresponding to the text information.

It should be noted that the user's utterance in S701 is a single verbal expression; before this expression the user may also have said other things related to this dialog, and those other utterances of the user constitute the context information.

S705: The DM module sends the context information and the text information to the NLU module.

The text information at this point may also be called the input sentence for prediction.

S706: The NLU module inputs the text information into the second sequence labeling model and determines, in combination with the context information, the intent information and slot information corresponding to the text information.

It should be noted that S706 is where, in relation to the above embodiments of this application, the second sequence labeling model is used to predict the named entity labels of the entity words in the input sentence, and the intent information and slot information corresponding to the text information are determined according to the named entity labels.

S707: The NLU module sends the intent information and the slot information to the DM module.

S708: The DM module calls the speech result in the TTS module according to the intent information and the slot information.

S709: The TTS module plays the speech result to the user.
To better understand method 700, for example, in S701 the user says "I want to check tomorrow's weather", and in S704 the context of the text information is determined; for example, before saying "I want to check tomorrow's weather" the user also said "I want to check the weather in Beijing". In this way, in S706 the NLU module knows from these two sentences that the user's intent is to query the weather and the slot is tomorrow's weather in Beijing; in S708 the speech result is the retrieved weather in Beijing for tomorrow; and in S709 the TTS module plays tomorrow's weather in Beijing to the user.
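The module flow of FIG. 7 can be sketched as the following pipeline. The class and method names are hypothetical and only illustrate the order of S701-S709; they do not correspond to a real product API, and the ASR, DM, NLU and TTS objects are assumed to be supplied by the caller.

```python
class VoiceAssistant:
    """Minimal sketch of the coordination between the modules in FIG. 7."""

    def __init__(self, asr, dm, nlu, tts):
        self.asr, self.dm, self.nlu, self.tts = asr, dm, nlu, tts

    def handle_utterance(self, audio):
        text = self.asr.transcribe(audio)               # S701-S702: speech to text
        context = self.dm.get_context(text)             # S703-S704: context information
        intent, slots = self.nlu.parse(text, context)    # S705-S706: the second sequence
                                                         # labeling model tags entity words;
                                                         # intent and slots follow from the tags
        speech = self.dm.fetch_result(intent, slots)     # S707-S708: fetch the speech result
        return self.tts.speak(speech)                    # S709: play the result to the user
```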
It should be noted that the logical division of the modules shown in FIG. 7 is only intended to make the scenario easier to understand; in actual applications the modules are not limited to the above, and other division methods are possible, which is not limited in the embodiments of this application. In addition, FIG. 7 only shows one possible application scenario; the embodiments of this application can also be applied to other application scenarios, such as video playback by a TV voice assistant.

It should also be noted that the examples in the embodiments of this application use samples described in Chinese; the embodiments of this application can also be applied to any possible language, for example English, French or German, which is not restricted by this application.
The embodiments described herein may be independent solutions or may be combined according to their internal logic, and all of these solutions fall within the protection scope of this application.

It can be understood that the methods and operations implemented by the electronic device in the foregoing method embodiments may also be implemented by a component (for example, a chip or a circuit) usable in the electronic device.

The method embodiments provided by this application are described above, and the apparatus embodiments provided by this application are described below. It should be understood that the description of the apparatus embodiments corresponds to the description of the method embodiments; therefore, for content not described in detail, reference may be made to the method embodiments above, and details are not repeated here for brevity.

A person skilled in the art should be aware that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the protection scope of this application.

The embodiments of this application may divide the electronic device into functional modules based on the foregoing method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of this application is illustrative and is only a logical function division; other feasible division methods may exist in actual implementation. The following description takes the division of functional modules corresponding to functions as an example.
FIG. 8 is a schematic block diagram of an apparatus 800 for named entity labeling provided by an embodiment of this application. The apparatus 800 includes a determining unit 810 and an updating unit 820. The determining unit 810 is configured to perform the determination-related operations of the processor in the embodiments above; the updating unit 820 is configured to perform the update-related operations of the processor in the embodiments above.

The determining unit 810 is configured to determine N mask vectors of N sample sets, where the N sample sets are in one-to-one correspondence with the N mask vectors, the entity corpora corresponding to different sample sets in the N sample sets are different, each of the N sample sets includes multiple samples of at least one entity corpus, the M dimensions of each of the N mask vectors correspond to M named entities, and M and N are positive integers.

The updating unit 820 is configured to update the first sequence labeling model according to the partial samples in each of the N sample sets and the N mask vectors to obtain the second sequence labeling model, where the second sequence labeling model is used for entity labeling.

As an optional embodiment, the determining unit 810 is specifically configured to:

input a first word in a first sample of a first sample set of the N sample sets into the first sequence labeling model and output a weight vector of the first word;

input the actual label vector of the first word and the weight vector into a loss function and calculate a loss vector of the first word;

multiply the loss vector by the first mask vector corresponding to the first sample set to obtain a masked loss vector;

update the first sequence labeling model according to the masked loss vector;

where the dimensions of the weight vector, the actual label vector and the loss vector are M.

As an optional embodiment, the first word is an entity word in the first sample.

As an optional embodiment, the loss function is a cross-entropy function.

As an optional embodiment, the apparatus 800 further includes a testing unit configured to test the stability of the second sequence labeling model according to the remaining samples in each of the N sample sets.

As an optional embodiment, the apparatus 800 further includes an input/output unit configured to input a second entity word in a prediction sentence into the second sequence labeling model and output a prediction vector; the determining unit is further configured to determine at least one label of the second entity word according to the prediction vector, the prediction sentence being a sentence that includes the entity corpus corresponding to any one of the N sample sets, and the dimension of the prediction vector being M.

The input/output unit can communicate with the outside; the input/output unit may also be called a communication interface or a communication unit.

As an optional embodiment, the determining unit 810 is specifically configured to: determine whether the value of each dimension of the prediction vector is greater than a preset value; and determine the named entity label corresponding to a dimension whose value is greater than the preset value as the at least one label of the second entity word.

As an optional embodiment, the determining unit 810 is specifically configured to: determine that the dimension of each of the N mask vectors is the total number of entity corpus types corresponding to the N sample sets; and determine the value corresponding to each of the N mask vectors according to the entity corpus corresponding to each of the N sample sets.
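A minimal sketch of this determination is shown below. The corpus names and the M=4 entity types are assumptions made for the example; the only point illustrated is that the mask dimension equals the total number of entity corpus types across the N sample sets and that a dimension is set to 1 only for the corpora a sample set covers.

```python
def build_mask_vectors(sample_set_corpora, all_entity_types):
    """Derive one mask vector per sample set: the vector length equals the total
    number of entity corpus types, and a dimension is 1 only when the sample set
    includes that entity corpus."""
    return {
        name: [1 if t in corpora else 0 for t in all_entity_types]
        for name, corpora in sample_set_corpora.items()
    }

entity_types = ["movie", "TV series", "variety show", "animation"]   # M = 4
corpora_per_set = {
    "movie_set": {"movie"},
    "movie_tv_set": {"movie", "TV series"},
}
print(build_mask_vectors(corpora_per_set, entity_types))
# {'movie_set': [1, 0, 0, 0], 'movie_tv_set': [1, 1, 0, 0]}
```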
FIG. 9 is a schematic structural diagram of an apparatus 900 for named entity labeling provided by an embodiment of this application. The apparatus 900 includes a processor 910, a memory 920, a communication interface 930 and a bus 940.

The processor 910 in the apparatus 900 shown in FIG. 9 may correspond to the determining unit 810 and the updating unit 820 in the apparatus 800 in FIG. 8. The communication interface 930 may correspond to the input/output unit in the apparatus 800.

The processor 910 may be connected to the memory 920. The memory 920 may be used to store program code and data. Therefore, the memory 920 may be a storage unit inside the processor 910, an external storage unit independent of the processor 910, or a component including both a storage unit inside the processor 910 and an external storage unit independent of the processor 910.

Optionally, the apparatus 900 may further include the bus 940. The memory 920 and the communication interface 930 may be connected to the processor 910 through the bus 940. The bus 940 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 940 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is used in FIG. 9, but this does not mean that there is only one bus or only one type of bus.

It should be understood that, in the embodiments of this application, the processor 910 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. Alternatively, the processor 910 uses one or more integrated circuits to execute related programs so as to implement the technical solutions provided by the embodiments of this application.

The memory 920 may include a read-only memory and a random access memory, and provides instructions and data to the processor 910. A part of the processor 910 may also include a non-volatile random access memory. For example, the processor 910 may also store device type information.

When the apparatus 900 runs, the processor 910 executes the computer-executable instructions in the memory 920 so that the apparatus 900 performs the operation steps of the above methods.

It should be understood that the apparatus 900 according to the embodiments of this application may correspond to the apparatus 800 in the embodiments of this application, and the above and other operations and/or functions of the units in the apparatus 800 are respectively intended to implement the corresponding procedures of the methods; for brevity, details are not repeated here.
Optionally, in some embodiments, the embodiments of this application further provide a computer-readable medium that stores program code; when the computer program code runs on a computer, the computer is caused to perform the methods in the above aspects.

Optionally, in some embodiments, the embodiments of this application further provide a computer program product including computer program code; when the computer program code runs on a computer, the computer is caused to perform the methods in the above aspects.

In the embodiments of this application, a terminal device or a network device includes a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer. The hardware layer may include hardware such as a central processing unit (CPU), a memory management unit (MMU) and a memory (also referred to as a main memory). The operating system of the operating system layer may be any one or more computer operating systems that implement service processing through processes, for example, a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system or a Windows operating system. The application layer may include applications such as a browser, an address book, word processing software and instant messaging software.

The embodiments of this application do not specifically limit the specific structure of the execution body of the methods provided by the embodiments of this application, as long as it can communicate according to the methods provided by the embodiments of this application by running a program that records the code of those methods. For example, the execution body of the methods provided by the embodiments of this application may be a terminal device or a network device, or a functional module in the terminal device or the network device that can call and execute the program.

Various aspects or features of this application may be implemented as methods, apparatuses or articles of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" used herein may cover a computer program accessible from any computer-readable device, carrier or medium. For example, the computer-readable medium may include, but is not limited to, magnetic storage devices (for example, hard disks, floppy disks or magnetic tapes), optical discs (for example, compact discs (CD) or digital versatile discs (DVD)), smart cards and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks or key drives).

The various storage media described herein may represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable media" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.

It should be understood that the processor mentioned in the embodiments of this application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

It should also be understood that the memory mentioned in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM). For example, the RAM can be used as an external cache. By way of example and not limitation, the RAM may include the following forms: static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM) and direct rambus random access memory (DR RAM).

It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) may be integrated in the processor.

It should also be noted that the memories described herein are intended to include, but not be limited to, these and any other suitable types of memories.
A person of ordinary skill in the art may be aware that the units and steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the protection scope of this application.

A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a computer software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit this application.

The above descriptions are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can readily think of changes or substitutions within the technical scope disclosed in this application, and such changes or substitutions shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (18)

1. A method for entity labeling, characterized by comprising:

determining N mask vectors of N sample sets, wherein the N sample sets are in one-to-one correspondence with the N mask vectors, entity corpora corresponding to different sample sets in the N sample sets are different, each of the N sample sets comprises multiple samples of at least one entity corpus, M dimensions of each of the N mask vectors correspond to M named entities, and M and N are positive integers; and

updating a first sequence labeling model according to partial samples in each of the N sample sets and the N mask vectors to obtain a second sequence labeling model, wherein the second sequence labeling model is used for entity labeling.

2. The method according to claim 1, characterized in that the updating a first sequence labeling model according to partial samples in each of the N sample sets and the N mask vectors comprises:

inputting a first word in a first sample of a first sample set of the N sample sets into the first sequence labeling model, and outputting a weight vector of the first word;

inputting an actual label vector of the first word and the weight vector into a loss function, and calculating a loss vector of the first word;

multiplying the loss vector by a first mask vector corresponding to the first sample set to obtain a masked loss vector; and

updating the first sequence labeling model according to the masked loss vector;

wherein dimensions of the weight vector, the actual label vector and the loss vector are M.

3. The method according to claim 2, characterized in that the first word is an entity word in the first sample.

4. The method according to claim 2 or 3, characterized in that the loss function is a cross-entropy function.

5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:

testing stability of the second sequence labeling model according to remaining samples in each of the N sample sets.

6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:

inputting a second entity word in a prediction sentence into the second sequence labeling model, and outputting a prediction vector; and

determining at least one label of the second entity word according to the prediction vector, wherein the prediction sentence is a sentence comprising an entity corpus corresponding to any one of the N sample sets;

wherein a dimension of the prediction vector is M.

7. The method according to claim 6, characterized in that the determining at least one label of the second entity word according to the prediction vector comprises:

determining whether a value of each dimension of the prediction vector is greater than a preset value; and

determining a named entity label corresponding to a dimension of the prediction vector whose value is greater than the preset value as the at least one label of the second entity word.

8. The method according to any one of claims 1 to 7, characterized in that the determining N mask vectors of N sample sets comprises:

determining that a dimension of each of the N mask vectors is a total number of entity corpus types corresponding to the N sample sets; and

determining a value corresponding to each of the N mask vectors according to the entity corpus corresponding to each of the N sample sets.
9. An apparatus for entity labeling, characterized by comprising:

a determining unit, configured to determine N mask vectors of N sample sets, wherein the N sample sets are in one-to-one correspondence with the N mask vectors, entity corpora corresponding to different sample sets in the N sample sets are different, each of the N sample sets comprises multiple samples of at least one entity corpus, M dimensions of each of the N mask vectors correspond to M named entities, and M and N are positive integers; and

an updating unit, configured to update a first sequence labeling model according to partial samples in each of the N sample sets and the N mask vectors to obtain a second sequence labeling model, wherein the second sequence labeling model is used for entity labeling.

10. The apparatus according to claim 9, characterized in that the determining unit is specifically configured to:

input a first word in a first sample of a first sample set of the N sample sets into the first sequence labeling model, and output a weight vector of the first word;

input an actual label vector of the first word and the weight vector into a loss function, and calculate a loss vector of the first word;

multiply the loss vector by a first mask vector corresponding to the first sample set to obtain a masked loss vector; and

update the first sequence labeling model according to the masked loss vector;

wherein dimensions of the weight vector, the actual label vector and the loss vector are M.

11. The apparatus according to claim 10, characterized in that the first word is an entity word in the first sample.

12. The apparatus according to claim 10 or 11, characterized in that the loss function is a cross-entropy function.

13. The apparatus according to any one of claims 9 to 12, characterized in that the apparatus further comprises:

a testing unit, configured to test stability of the second sequence labeling model according to remaining samples in each of the N sample sets.

14. The apparatus according to any one of claims 9 to 13, characterized in that the apparatus further comprises:

an input/output unit, configured to input a second entity word in a prediction sentence into the second sequence labeling model and output a prediction vector;

wherein the determining unit is further configured to determine at least one label of the second entity word according to the prediction vector, and the prediction sentence is a sentence comprising an entity corpus corresponding to any one of the N sample sets;

wherein a dimension of the prediction vector is M.

15. The apparatus according to claim 14, characterized in that the determining unit is specifically configured to:

determine whether a value of each dimension of the prediction vector is greater than a preset value; and

determine a named entity label corresponding to a dimension of the prediction vector whose value is greater than the preset value as the at least one label of the second entity word.

16. The apparatus according to any one of claims 9 to 15, characterized in that the determining unit is specifically configured to:

determine that a dimension of each of the N mask vectors is a total number of entity corpus types corresponding to the N sample sets; and

determine a value corresponding to each of the N mask vectors according to the entity corpus corresponding to each of the N sample sets.

17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when run, implements the method according to any one of claims 1 to 8.

18. A chip, comprising a processor, wherein the processor is connected to a memory, the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so that the chip performs the method according to any one of claims 1 to 8.
PCT/CN2021/080402 2020-05-29 2021-03-12 Method and device for entity tagging WO2021238337A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010474348.XA CN113743117B (en) 2020-05-29 2020-05-29 Method and device for entity labeling
CN202010474348.X 2020-05-29

Publications (1)

Publication Number Publication Date
WO2021238337A1 true WO2021238337A1 (en) 2021-12-02

Family

ID=78724593

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080402 WO2021238337A1 (en) 2020-05-29 2021-03-12 Method and device for entity tagging

Country Status (2)

Country Link
CN (1) CN113743117B (en)
WO (1) WO2021238337A1 (en)
