CN115392386A - Model training method, device and equipment

Info

Publication number
CN115392386A
Authority
CN
China
Prior art keywords
text data
data sample
labels
target model
model
Legal status
Pending
Application number
CN202211058124.6A
Other languages
Chinese (zh)
Inventor
赵闻飙
兰钧
王昊天
孟昌华
王维强
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211058124.6A
Publication of CN115392386A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes


Abstract

The embodiments of this specification disclose a model training method, device, and equipment. The method includes: obtaining a text data sample for training a target model, where the text data sample contains a first number of class labels and the first number does not exceed a second number corresponding to the class labels of the samples used to train the target model; inputting the text data sample into the target model to obtain the probability that the text data sample belongs to each of the second number of class labels; determining, based on the obtained probabilities and the first number of class labels, the loss information corresponding to the text data sample through a preset loss function corresponding to the target model; clipping the first number of class labels based on the loss information corresponding to the text data sample; and performing model training on the target model through a back propagation algorithm based on the text data sample containing the remaining class labels.

Description

Model training method, device and equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for training a model.
Background
Samples are the data necessary for training a model. In many cases a sample needs not only the raw data required for training but also corresponding class labels, so that a loss can be computed between the model's predictions and the class labels in order to adjust the model's parameters. In practical applications, a sample may carry one class label or several different class labels; however, one or more of a sample's class labels may be wrong, which makes them noisy labels.
Samples with noisy labels are very common in daily business, especially when samples have many class labels and the distinction between different class labels is not obvious (for example, in human-computer interaction scenarios). In such conditions, more wrong labels (that is, noisy labels) are produced while annotating samples, and a model trained on samples carrying noisy labels easily ends up fitting the wrong labels.
Disclosure of Invention
The purpose of the embodiments of the present specification is to provide a better processing mechanism for samples with noisy labels, so that a model with better output performance can be trained even on noisy samples.
In order to implement the above technical solution, the embodiments of the present specification are implemented as follows:
An embodiment of the present specification provides a model training method. The method includes: obtaining a text data sample used for training a target model, where the text data sample is obtained by converting voice data input by a user during human-computer interaction, the text data sample contains a first number of class labels, the first number does not exceed a second number corresponding to the class labels of the samples used for training the target model, and the target model is a model for identifying the user's intention; inputting the text data sample into the target model and obtaining, through a forward propagation algorithm, the probability that the text data sample belongs to each of the second number of class labels; determining the loss information corresponding to the text data sample through a preset loss function corresponding to the target model, based on that probability and the first number of class labels contained in the text data sample; clipping the first number of class labels contained in the text data sample based on the loss information corresponding to the text data sample, to obtain the text data sample containing the remaining class labels; and performing model training on the target model through a back propagation algorithm based on the text data sample containing the remaining class labels, to obtain the trained target model.
An embodiment of the present specification provides a model training device. The device includes a sample acquisition module, a loss determining module, a label cutting module, and a model training module. The sample acquisition module is configured to obtain a text data sample for training a target model, where the text data sample is obtained by converting voice data input by a user during human-computer interaction, the text data sample contains a first number of class labels, the first number does not exceed a second number corresponding to the class labels of the samples used for training the target model, and the target model is a model for recognizing the user's intention. The loss determining module is configured to input the text data sample into the target model, obtain through a forward propagation algorithm the probability that the text data sample belongs to each of the second number of class labels, and determine the loss information corresponding to the text data sample through a preset loss function corresponding to the target model, based on that probability and the first number of class labels contained in the text data sample. The label cutting module is configured to clip the first number of class labels contained in the text data sample based on the loss information corresponding to the text data sample, to obtain the text data sample containing the remaining class labels. The model training module is configured to perform model training on the target model through a back propagation algorithm based on the text data sample containing the remaining class labels, to obtain the trained target model.
An embodiment of the present specification provides model training equipment. The equipment includes a processor and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: obtain a text data sample for training a target model, where the text data sample is obtained by converting voice data input by a user during human-computer interaction, the text data sample contains a first number of class labels, the first number does not exceed a second number corresponding to the class labels of the samples used for training the target model, and the target model is a model for identifying the user's intention; input the text data sample into the target model and obtain, through a forward propagation algorithm, the probability that the text data sample belongs to each of the second number of class labels; determine the loss information corresponding to the text data sample through a preset loss function corresponding to the target model, based on that probability and the first number of class labels contained in the text data sample; clip the first number of class labels contained in the text data sample based on the loss information, to obtain the text data sample containing the remaining class labels; and perform model training on the target model through a back propagation algorithm based on the text data sample containing the remaining class labels, to obtain the trained target model.
The present specification also provides a storage medium for storing computer-executable instructions that, when executed by a processor, implement the following procedure: obtain a text data sample for training a target model, where the text data sample is obtained by converting voice data input by a user during human-computer interaction, the text data sample contains a first number of class labels, the first number does not exceed a second number corresponding to the class labels of the samples used for training the target model, and the target model is a model for identifying the user's intention; input the text data sample into the target model and obtain, through a forward propagation algorithm, the probability that the text data sample belongs to each of the second number of class labels; determine the loss information corresponding to the text data sample through a preset loss function corresponding to the target model, based on that probability and the first number of class labels contained in the text data sample; clip the first number of class labels contained in the text data sample based on the loss information, to obtain the text data sample containing the remaining class labels; and perform model training on the target model through a back propagation algorithm based on the text data sample containing the remaining class labels, to obtain the trained target model.
Drawings
To describe the technical solutions in the embodiments of the present specification or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present specification, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 illustrates an embodiment of a model training method of the present disclosure;
FIG. 2 is a schematic diagram of a sample input page according to the present disclosure;
FIG. 3 is a diagram of another embodiment of a model training method according to the present disclosure;
FIG. 4 is a schematic diagram of a training process for a model of the present disclosure;
FIG. 5 is a diagram of a further embodiment of a model training method according to the present disclosure;
FIG. 6 is a diagram of a further embodiment of a model training method according to the present disclosure;
FIG. 7 is an embodiment of a model training device according to the present disclosure;
FIG. 8 is an embodiment of model training equipment according to the present disclosure.
Detailed Description
The embodiment of the specification provides a model training method, a model training device and model training equipment.
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
Example one
As shown in FIG. 1, the execution subject of the method may be a terminal device or a server. The terminal device may be a mobile terminal device such as a mobile phone or a tablet computer, or a device such as a personal computer; the server may be an independent server or a server cluster composed of multiple servers, and may be a background server of a financial service, of an online shopping service, or of an application. The method may be applied to scenarios in which model training is performed. In this embodiment, a server is used as the execution subject for the detailed description; for the case of a terminal device, refer to the related contents below, which are not repeated here. The method specifically includes the following steps:
In step S102, a first sample for training the target model is obtained, where the first sample includes a first number of class labels, and the first number does not exceed a second number corresponding to the class labels of the samples for training the target model.
In practical applications, the target model may be a model related to one or more services, and different target models may be constructed for different services. For example, if the service is an intention recognition service, that is, recognizing or understanding a user's dialogue intention during human-computer interaction, the target model may be a model for recognizing the user's intention; if the service is an information recommendation service, the target model may be a model for information recommendation; if the service is a risk prevention and control service in a financial system, the target model may be a model for risk prevention and control of the financial system; and if the service is a commodity transaction service, the target model may be a model for predicting the sales of a certain commodity. The target model may be of many types, and different target models may be built in different ways; for example, a model for information recommendation may be built through a classification algorithm, and a model for risk prevention and control of a financial system may be built through a convolutional neural network algorithm. The first sample may be one or more samples, which may be set according to the actual situation and is not limited in this embodiment. A sample may carry several different class labels. For example, if the first sample is an image, one class label may indicate whether the image contains a cat and another may indicate whether it contains a dog; if the image contains both a cat and a dog, the image carries 2 class labels. This is merely an example with 2 class labels; in practical applications a sample may carry 2 or more class labels, which may be set according to the actual situation and is not limited in this specification. The first number may be set according to the actual situation, for example 2 or 10; the second number may likewise be set according to the actual situation, for example 20 or 50.
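As an illustration of the data layout implied above, the following is a minimal sketch of a sample whose first number of class labels is stored as a multi-hot vector over the second number of class labels; the label count and indices are invented for the example and are not taken from the patent:

```python
import torch

# Suppose the task defines 5 class labels in total (the "second number")
# and this sample was annotated with 2 of them (the "first number").
SECOND_NUMBER = 5
annotated_indices = [0, 3]  # hypothetical labels given to this sample

# Multi-hot encoding: 1 where the sample carries the label, 0 elsewhere.
target = torch.zeros(SECOND_NUMBER)
target[annotated_indices] = 1.0
print(target)  # tensor([1., 0., 0., 1., 0.])
```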
In implementation, samples are the data necessary for training a model; in many cases a sample needs not only the raw data required for training but also corresponding class labels, so that a loss can be computed between the model's predictions and the class labels in order to adjust the model's parameters. In practical applications, a sample may carry one class label or several different class labels; however, one or more of those labels may be wrong, which makes them noisy labels. Samples with noisy labels are very common in daily business, especially when samples have many class labels and the distinction between different class labels is not obvious; in that situation more wrong labels (that is, noisy labels) are produced while annotating samples, and a model trained on samples carrying noisy labels easily ends up fitting the wrong labels. Therefore, a better processing mechanism for samples with noisy labels is needed, so that a model with better output performance can be trained on such samples. The embodiments of the present specification provide an achievable processing method, which may specifically include the following:
The first sample for training the target model may be obtained in a variety of ways. For example, an input page for training samples may be preset; the input page may include a data input box for the training sample, a confirm key, a cancel key, and so on, and when a training sample (that is, the first sample) needs to be uploaded to the server, the data of the input page may be obtained and the input page displayed. As shown in FIG. 2, the user may enter the data of the first sample in the data input box of the input page. In addition, multiple different class labels (that is, the second number of class labels) may be set based on the attributes of the target model and the related information of the service corresponding to the target model; the second number of class labels may cover all class labels to which a sample may belong, and from them an appropriate first number of class labels may be selected for the input first sample according to the actual situation and used to label the first sample. The user may then enter the data of the first sample into the data input box together with its class labels, and click the confirm key in the input page when done; at that point the server obtains the first sample. Alternatively, the server may record the related data of a certain service and label that data, thereby obtaining the second number of class labels; when a first sample is needed, data meeting specified requirements and the corresponding class labels may be taken from the service's related data and used as the first sample.
In step S104, the first sample is input into the target model, the probability that the first sample belongs to each of the second number of class labels is obtained through a forward propagation algorithm, and based on the probability that the first sample belongs to each of the second number of class labels and the first number of class labels included in the first sample, the loss information corresponding to the first sample is determined through a preset loss function corresponding to the target model.
The loss function may include multiple types, such as a cross entropy loss function, which may be specifically set according to an actual situation, and this is not limited in the embodiment of the present specification.
In implementation, a certain number of samples may first be obtained to train the target model, yielding a preliminarily trained target model; alternatively, a corresponding algorithm may be used to construct the target model directly, and the constructed model may be initialized. Then the first sample may be input into the target model: through the forward propagation algorithm, the first sample enters from the input layer of the target model, the processing result is passed through the hidden layers to the output layer, and the output layer produces the final output, namely the probability that the first sample belongs to each of the second number of class labels. In other words, the target model computes, for every class label in the second number of class labels, the probability that it is a class label of the first sample. For example, if the second number of class labels comprises class label 1, class label 2, class label 3, class label 4, and class label 5, the target model computes the probability that the first sample's class label is class label 1, the probability that it is class label 2, and so on up to class label 5. Then the probability that the first sample belongs to each of the second number of class labels and the first number of class labels contained in the first sample may be input into the preset loss function corresponding to the target model, and the corresponding loss information computed, thereby obtaining the loss information corresponding to the first sample.
In step S106, a first number of category labels included in the first sample are trimmed based on the loss information corresponding to the first sample, and a first sample including the remaining category labels is obtained.
In implementation, after the loss information corresponding to the first sample is obtained in the above manner, consider that if the loss information corresponding to a certain class label is large, that class label is more likely to be a noisy label. A corresponding threshold may therefore be preset: if the loss information corresponding to a class label is higher than the threshold, the class label is likely noisy and may be cut off, and only the class labels whose loss information is below the threshold are kept, yielding the first sample containing the remaining class labels.
In step S108, model training is performed on the target model based on the first sample including the remaining class labels through a back propagation algorithm, so as to obtain a trained target model.
In implementation, after the first sample containing the remaining class labels is obtained in the above manner, the gradient of the preset loss function with respect to each model parameter may be computed quickly through a back-propagation algorithm. A local minimum of the preset loss function may then be found from the obtained gradient values using, for example, a Stochastic Gradient Descent (SGD) algorithm, yielding the corresponding weights; the model parameters are updated based on these weights, thereby training the target model and finally obtaining the trained target model.
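To make the flow of steps S102 to S108 concrete, here is a minimal PyTorch sketch under several assumptions that are not in the patent: a toy feed-forward network stands in for the target model, BCEWithLogitsLoss with reduction="none" supplies one loss value per class label (used instead of BCELoss plus an explicit sigmoid purely for numerical convenience), and a single illustrative threshold implements the clipping:

```python
import torch
import torch.nn as nn

# Illustrative model: any network ending in one logit per class label works here.
NUM_LABELS = 5                       # the "second number" of class labels
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, NUM_LABELS))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss(reduction="none")  # keep one loss per label

x = torch.randn(1, 32)                            # the first sample's features
target = torch.tensor([[1., 0., 0., 1., 0.]])     # its multi-hot class labels

# Forward propagation: one loss value for each of the second number of labels.
logits = model(x)
per_label_loss = loss_fn(logits, target)          # shape (1, NUM_LABELS)

# Clip labels whose loss exceeds a preset threshold: a high loss suggests
# the label is likely noisy, so it is excluded from the training signal.
LOSS_THRESHOLD = 1.0                              # illustrative value
keep = (per_label_loss <= LOSS_THRESHOLD).float()

# Back propagation on the remaining labels only.
loss = (per_label_loss * keep).sum() / keep.sum().clamp(min=1.0)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The mask produced by the comparison carries no gradient, so back propagation only flows through the labels that survive the clipping.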
The embodiment of the present disclosure provides a model training method: a first sample for training a target model is obtained (the first sample contains a first number of class labels, and the first number does not exceed a second number corresponding to the class labels of the samples used for training the target model); the first sample is input into the target model, and the probability that the first sample belongs to each of the second number of class labels is obtained through a forward propagation algorithm; based on that probability and the first number of class labels contained in the first sample, the loss information corresponding to the first sample is determined through a preset loss function corresponding to the target model; based on that loss information, the first number of class labels contained in the first sample are clipped, yielding a first sample containing the remaining class labels; finally, the target model is trained through a back propagation algorithm on the first sample containing the remaining class labels, yielding the trained target model. In this way, class labels that are likely to be noisy can be identified from the loss information and clipped from the sample, so that a model that is robust to noisy labels can be trained even when samples carry many class labels. Moreover, because the processing reaches down to the label level instead of discarding whole samples, hard samples are not thrown away, and the multi-class-label problem is handled well.
Example two
As shown in fig. 3, an execution subject of the method may be a terminal device or a server, where the terminal device may be a mobile terminal device such as a mobile phone and a tablet computer, or a device such as a personal computer, the server may be an independent server, or a server cluster formed by a plurality of servers, and the server may be a background server of a financial service or an online shopping service, or a background server of an application. The method may be applied to relevant scenes where model training and the like are set, in this embodiment, a server is used as an execution subject to be described in detail, and for the case of the terminal device, the following relevant contents may be referred to, and are not described herein again. The method may specifically comprise the steps of:
In step S302, a first sample for training the target model is obtained, where the first sample includes a first number of class labels, and the first number does not exceed a second number corresponding to the class labels of the samples for training the target model.
In step S304, model training is performed on the target model based on a preset model training rule, and in the process of performing the model training, the first sample is input into the target model, and a probability that the first sample belongs to each of the second number of class labels is obtained through a forward propagation algorithm, where the model training rule includes one or more of a co-teaching-based model training rule, a co-teaching + based model training rule, and a clearlab-based model training rule.
In implementation, considering that in practical applications a model may be trained under a co-teaching-based, co-teaching+-based, or cleanlab-based model training rule, and that such training may involve noisy labels, the training process itself can be improved. Specifically, the target model may be trained on the first sample under a co-teaching-based, co-teaching+-based, or cleanlab-based model training rule, and during training the first sample may be input into the target model and the probability of each of the second number of class labels obtained through the forward propagation algorithm; see the related contents above for details.
It should be noted that training the target model based on one or more of the co-teaching-based, co-teaching+-based, and cleanlab-based model training rules is only one achievable way; the target model may also be trained with several different model training rules, which may be set according to the actual situation and is not limited in this embodiment.
In step S306, based on the probability that the first sample belongs to each of the second number of class labels and the first number of class labels included in the first sample, the loss information corresponding to the first sample is determined through a preset loss function corresponding to the target model.
The preset loss function corresponding to the target model may be a binary cross entropy loss function BCELoss, a mean square error loss function MSELoss, a Focal Loss function, or the like.
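All three loss families mentioned above can be instantiated elementwise (reduction="none") so that a separate loss value is available per class label, which is what the clipping in the following steps relies on. A brief sketch, using torchvision's sigmoid_focal_loss as one possible Focal Loss implementation (an assumption; the patent does not name a specific implementation):

```python
import torch
import torch.nn as nn
from torchvision.ops import sigmoid_focal_loss

probs = torch.tensor([[0.9, 0.2, 0.7]])   # predicted per-label probabilities
logits = torch.log(probs / (1 - probs))   # the same predictions as logits
target = torch.tensor([[1., 0., 0.]])     # multi-hot class labels

bce = nn.BCELoss(reduction="none")(probs, target)    # binary cross entropy, per label
mse = nn.MSELoss(reduction="none")(probs, target)    # mean square error, per label
focal = sigmoid_focal_loss(logits, target, reduction="none")  # focal loss, per label
```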
The specific processing procedure of the step S306 may refer to relevant contents in the foregoing embodiment one, and is not described herein again.
In step S308, based on the loss information corresponding to the first sample, the first number of category labels contained in the first sample are sorted, yielding the first number of category labels arranged in descending order of loss information.
In step S310, based on a preset category label clipping rule, the first number of category labels arranged in descending order of loss information are clipped, yielding a first sample containing the remaining category labels.
In implementation, corresponding thresholds may be preset; the first number of category labels arranged in descending order of loss information are screened, and based on the preset category label clipping rule, the front-ranked category labels whose loss information exceeds the thresholds are clipped from the first number of category labels, yielding a first sample containing the remaining category labels.
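A small sketch of the sorting in step S308, assuming the per-label loss vector comes from a reduction="none" loss as above (the numbers are invented):

```python
import torch

per_label_loss = torch.tensor([0.12, 2.31, 0.05, 1.40])
sorted_loss, order = torch.sort(per_label_loss, descending=True)
# order[i] is the index of the class label with the i-th largest loss;
# this descending ordering is what the clipping rule in step S310 consumes.
print(sorted_loss)  # tensor([2.3100, 1.4000, 0.1200, 0.0500])
print(order)        # tensor([1, 3, 0, 2])
```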
In practical applications, step S310 may be processed in various ways; an optional way is as follows: from the first number of category labels arranged in descending order of loss information, remove the positive labels whose loss information is greater than a preset first loss threshold and the negative labels whose loss information is greater than a preset second loss threshold, yielding a first sample containing the remaining category labels.
The positive label may be a category label for labeling the first sample as a positive sample, and the negative label may be a category label for labeling the first sample as a negative sample, for example, the first sample is an image, the positive label may be that the image is labeled to include a certain object, and the negative label may be that the image is labeled to include no certain object, and the like, which may be set according to actual situations, and this is not limited in this description embodiment.
In practical applications, as shown in FIG. 4, if the preset loss function corresponding to the target model is the binary cross entropy loss function BCELoss and each category label is represented by 0 or 1, then the removal of positive labels whose loss information exceeds the preset first loss threshold and of negative labels whose loss information exceeds the preset second loss threshold can be performed in various ways. An optional way comprises the processing of step A2 and step A4 below.
In step A2, from the first number of category labels arranged in descending order of loss information, obtain the category labels presented as 1, arranged in descending order of loss information, and the category labels presented as 0, arranged in descending order of loss information.
In step A4, remove from the obtained category labels presented as 1 those whose loss information is greater than the preset first loss threshold, and remove from the obtained category labels presented as 0 those whose loss information is greater than the preset second loss threshold, yielding a first sample containing the remaining category labels.
In implementation, as shown in FIG. 4, the category labels corresponding to the largest k1 loss values may be removed from the category labels presented as 1 (that is, the category labels presented as 1 whose loss information is greater than the preset first loss threshold are removed), and the category labels corresponding to the largest k2 loss values may be removed from the category labels presented as 0 (that is, the category labels presented as 0 whose loss information is greater than the preset second loss threshold are removed). Here k1 and k2 are hyperparameters representing the numbers of noisy positive labels and noisy negative labels, respectively, that are tolerated for the first sample, and they may be set according to the specific situation. For example, when a sample has many category labels, some labels are often missed during annotation, but the labels that are annotated are usually accurate; in that case k1 may be set to 0, so that only noisy negative labels are clipped.
In step S312, model training is performed on the target model based on the first sample including the remaining class labels through a back propagation algorithm, so as to obtain a trained target model.
The specific program code of the above processing procedure (for example, PyTorch code) can be referred to as follows. (The original publication presents this code as two figures, BDA0003825852110000081 and BDA0003825852110000091, which are not reproduced in this text version.)
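Since the listing itself is unavailable, the following is a reconstruction sketched from steps S308 and S310 above rather than the patent's actual code; the per-label BCE loss, the removal of the k1 highest-loss positive labels, and the removal of the k2 highest-loss negative labels follow the description, while the function name, tensor shapes, and the trailing usage values are assumptions:

```python
import torch
import torch.nn.functional as F

def clipped_bce_loss(logits: torch.Tensor, target: torch.Tensor,
                     k1: int, k2: int) -> torch.Tensor:
    """Per-label BCE loss with the k1 highest-loss positive labels and the
    k2 highest-loss negative labels dropped as presumed noisy labels."""
    per_label = F.binary_cross_entropy_with_logits(
        logits, target, reduction="none")        # one loss value per label

    keep = torch.ones_like(per_label)
    # Positive labels (presented as 1): drop the k1 largest losses,
    # since a very large loss suggests the label is noisy.
    if k1 > 0:
        _, idx = (per_label * target).topk(k1, dim=-1)
        keep.scatter_(-1, idx, 0.0)
    # Negative labels (presented as 0): drop the k2 largest losses.
    if k2 > 0:
        _, idx = (per_label * (1 - target)).topk(k2, dim=-1)
        keep.scatter_(-1, idx, 0.0)

    # Average the loss over the remaining (kept) labels only.
    return (per_label * keep).sum() / keep.sum().clamp(min=1.0)

# Usage with k1 = 0 (annotated positives trusted) and k2 = 2
# (up to two noisy negative labels tolerated), on invented data.
logits = torch.randn(1, 10, requires_grad=True)
target = (torch.rand(1, 10) < 0.3).float()
loss = clipped_bce_loss(logits, target, k1=0, k2=2)
loss.backward()
```

Setting k1 = 0 matches the situation described above in which the annotated positive labels are trusted and only noisy negative labels are clipped.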
The embodiment of the present specification provides a model training method: a first sample for training a target model is obtained (the first sample contains a first number of class labels, and the first number does not exceed a second number corresponding to the class labels of the samples used for training the target model); the first sample is input into the target model, and the probability that the first sample belongs to each of the second number of class labels is obtained through a forward propagation algorithm; based on that probability and the first number of class labels contained in the first sample, the loss information corresponding to the first sample is determined through a preset loss function corresponding to the target model; based on that loss information, the first number of class labels contained in the first sample are clipped, yielding a first sample containing the remaining class labels; finally, the target model is trained through a back propagation algorithm on the first sample containing the remaining class labels, yielding the trained target model. In this way, class labels that are likely to be noisy can be identified from the loss information and clipped, so that a model that is robust to noisy labels can be trained even when samples carry many class labels. In addition, because the processing reaches down to the label level, hard samples are not discarded, and the multi-class-label problem is handled well.
EXAMPLE III
Based on the foregoing embodiments, the following (embodiment three and embodiment four) describes the processes of the foregoing embodiments through a specific application scenario. The application scenario is a human-computer interaction scenario (for example, a conversation robot interacting with a user: the robot needs to understand and recognize the user's intention in the dialogue; user intentions fall into many categories, and noisy labels easily appear when the labels are annotated). Accordingly, the foregoing first sample is a text data sample. As shown in FIG. 5, the execution subject of the method may be a terminal device or a server; the terminal device may be a mobile terminal device such as a mobile phone or a tablet computer, or a device such as a personal computer, and the server may be an independent server, a server cluster composed of multiple servers, or the server of the above-mentioned conversation robot. The method may specifically include the following steps:
In step S502, a text data sample for training a target model is obtained, where the text data sample is obtained by converting voice data input by a user in a human-computer interaction process, the text data sample includes a first number of class labels, the first number does not exceed a second number corresponding to the class labels of the samples for training the target model, and the target model is a model for identifying a user intention.
The category label may be a label identifying the user intention of the voice data input by the user during human-computer interaction; for example, a category label may indicate whether to transfer money, whether a payment is required, or whether to pay by face recognition. This may be set according to the actual situation and is not limited in the embodiments of the present specification.
In implementation, in a human-computer interaction scenario the conversation robot interacts with the user, and the voice data input during the dialogue between the user and the conversation robot can be collected. In addition, multiple different class labels (that is, the second number of class labels) may be set based on the attributes of the target model and the related information of the human-computer interaction scenario corresponding to the target model; the second number of class labels may cover all class labels to which a sample may belong, and from them an appropriate first number of class labels may be selected for the input voice data according to the actual situation. The voice data may then be converted into the text data sample, and the class labels annotated for the text data sample may be obtained along with it. This may be set according to the actual situation and is not limited in the embodiments of the present specification.
In step S504, the text data sample is input into the target model, the probability that the text data sample belongs to each of the second number of class labels is obtained through a forward propagation algorithm, and based on the probability that the text data sample belongs to each of the second number of class labels and the first number of class labels included in the text data sample, the loss information corresponding to the text data sample is determined through a preset loss function corresponding to the target model.
In implementation, a certain number of text data samples may first be obtained to train the target model, yielding a preliminarily trained target model; alternatively, the target model may be constructed directly with a corresponding algorithm and initialized. Then the obtained text data sample may be input into the target model: through the forward propagation algorithm, the text data sample enters from the input layer of the target model, passes through the hidden layers (including, for example, convolutional layers, pooling layers, and fully connected layers), and the processing result is finally passed to the output layer, which produces the final output, namely the probability that the text data sample belongs to each of the second number of category labels. In this way, the target model computes the probability that each of the second number of category labels is a category label of the text data sample. Then the probability that the text data sample belongs to each of the second number of category labels and the first number of category labels contained in the text data sample may be input into the preset loss function corresponding to the target model, and the corresponding loss information computed, thereby obtaining the loss information corresponding to the text data sample.
In step S506, based on the loss information corresponding to the text data sample, a first number of category labels included in the text data sample are clipped, so as to obtain a text data sample including remaining category labels.
In implementation, after the loss information corresponding to the text data sample is obtained in the above manner, consider that if the loss information corresponding to a certain category label is large, that category label is more likely to be noisy. A corresponding threshold may therefore be preset: if the loss information corresponding to a category label is higher than the threshold, the category label is likely noisy and may be cut off, and only the category labels whose loss information is below the threshold are kept, yielding the text data sample containing the remaining category labels.
In step S508, model training is performed on the target model based on the text data sample including the remaining category labels through a back propagation algorithm, so as to obtain a trained target model.
In implementation, after the text data sample containing the remaining category labels is obtained in the above manner, the gradient of the preset loss function with respect to each model parameter may be computed quickly through a back-propagation algorithm. A local minimum of the preset loss function may then be found from the obtained gradient values using, for example, a Stochastic Gradient Descent (SGD) algorithm, yielding the corresponding weights; the model parameters are updated based on these weights, thereby training the target model and finally obtaining the trained target model.
The embodiment of the present specification provides a model training method: a text data sample for training a target model is obtained (the text data sample is obtained by converting voice data input by a user during human-computer interaction; it contains a first number of category labels, the first number does not exceed a second number corresponding to the category labels of the samples used for training the target model, and the target model is a model for identifying the user's intention); the text data sample is input into the target model, and the probability that it belongs to each of the second number of category labels is obtained through a forward propagation algorithm; based on that probability and the first number of category labels contained in the text data sample, the loss information corresponding to the text data sample is determined through a preset loss function corresponding to the target model; based on that loss information, the first number of category labels are clipped, yielding a text data sample containing the remaining category labels; finally, the target model is trained through a back propagation algorithm on the text data sample containing the remaining category labels, yielding the trained target model. In this way, category labels that are likely to be noisy can be identified from the loss information and clipped, so that a model that is robust to noisy labels can be trained even when text data samples carry many category labels. In addition, because the processing reaches down to the label level, hard samples are not discarded, and the multi-class-label problem is handled well.
Example four
As shown in FIG. 6, the execution subject of the method may be a terminal device or a server; the terminal device may be a mobile terminal device such as a mobile phone or a tablet computer, or a device such as a personal computer, and the server may be an independent server, a server cluster composed of multiple servers, or the server of the above-mentioned conversation robot. The method may specifically include the following steps:
In step S602, a text data sample for training a target model is obtained, where the text data sample is obtained by converting voice data input by a user in a human-computer interaction process, the text data sample includes a first number of category labels, the first number does not exceed a second number corresponding to the category labels of the samples for training the target model, and the target model is a model for identifying a user intention.
In step S604, model training is performed on the target model based on a preset model training rule, and during the model training, the text data sample is input into the target model and the probability that the text data sample belongs to each of the second number of class labels is obtained through a forward propagation algorithm, where the model training rule includes one or more of a co-teaching-based model training rule, a co-teaching+-based model training rule, and a cleanlab-based model training rule.
The specific processing procedure of step S604 may refer to relevant contents in the foregoing embodiments, and is not described herein again.
In step S606, based on the probability that the text data sample belongs to each of the second number of category labels and the first number of category labels included in the text data sample, the loss information corresponding to the text data sample is determined through a preset loss function corresponding to the target model.
The preset loss function corresponding to the target model may be a binary cross entropy loss function BCELoss, a mean square error loss function MSELoss, a Focal Loss function, or the like.
The specific processing procedure of step S606 may refer to relevant contents in the foregoing embodiments, and is not described herein again.
In step S608, based on the loss information corresponding to the text data sample, the first number of category labels contained in the text data sample are sorted, yielding the first number of category labels arranged in descending order of loss information.
In step S610, based on a preset category label clipping rule, the first number of category labels arranged in descending order of loss information are clipped, yielding a text data sample containing the remaining category labels.
In practical applications, step S610 may be processed in various ways; an optional way is as follows: from the first number of category labels arranged in descending order of loss information, remove the positive labels whose loss information is greater than a preset first loss threshold and the negative labels whose loss information is greater than a preset second loss threshold, yielding a text data sample containing the remaining category labels.
The positive label may be a category label for labeling the text data sample as a positive sample, and the negative label may be a category label for labeling the text data sample as a negative sample, for example, the positive label may be a label that the text data sample includes a user's intention to pay for face brushing, the negative label may be a label that the text data sample does not include the user's intention to pay for face brushing, and the like, which may also be specifically set according to an actual situation, and this is not limited in this description embodiment.
In practical applications, if the preset loss function corresponding to the target model is the binary cross entropy loss function BCELoss and each category label is represented by 0 or 1, then the removal of positive labels whose loss information exceeds the preset first loss threshold and of negative labels whose loss information exceeds the preset second loss threshold can be performed in various ways. An optional way comprises the processing of step B2 and step B4 below.
In step B2, from the first number of category labels arranged in descending order of loss information, obtain the category labels presented as 1, arranged in descending order of loss information, and the category labels presented as 0, arranged in descending order of loss information.
In step B4, remove from the obtained category labels presented as 1 those whose loss information is greater than the preset first loss threshold, and remove from the obtained category labels presented as 0 those whose loss information is greater than the preset second loss threshold, yielding a text data sample containing the remaining category labels.
In implementation, the category labels corresponding to the largest k1 loss values may be removed from the category labels presented as 1 (that is, the category labels presented as 1 whose loss information is greater than the preset first loss threshold are removed), and the category labels corresponding to the largest k2 loss values may be removed from the category labels presented as 0 (that is, the category labels presented as 0 whose loss information is greater than the preset second loss threshold are removed). Here k1 and k2 are hyperparameters representing the numbers of noisy positive labels and noisy negative labels, respectively, that are tolerated in the text data sample, and they may be set according to the specific situation. For example, when a text data sample has many category labels, some labels are often missed during annotation, but the labels that are annotated are usually accurate; in that case k1 may be set to 0, so that only noisy negative labels are clipped.
In step S612, model training is performed on the target model based on the text data sample including the remaining category labels through a back propagation algorithm, so as to obtain a trained target model.
The embodiment of the present specification provides a model training method: a text data sample for training a target model is obtained (the text data sample is obtained by converting voice data input by a user during human-computer interaction; it contains a first number of category labels, the first number does not exceed a second number corresponding to the category labels of the samples used for training the target model, and the target model is a model for identifying the user's intention); the text data sample is input into the target model, and the probability that it belongs to each of the second number of category labels is obtained through a forward propagation algorithm; based on that probability and the first number of category labels contained in the text data sample, the loss information corresponding to the text data sample is determined through a preset loss function corresponding to the target model; based on that loss information, the first number of category labels are clipped, yielding a text data sample containing the remaining category labels; finally, the target model is trained through a back propagation algorithm on the text data sample containing the remaining category labels, yielding the trained target model. In this way, category labels that are likely to be noisy can be identified from the loss information and clipped, so that a model that is robust to noisy labels can be trained even when text data samples carry many category labels. In addition, because the processing reaches down to the label level, hard samples are not discarded, and the multi-class-label problem is handled well.
EXAMPLE five
Based on the same idea, corresponding to the model training method provided above, an embodiment of the present specification further provides a model training device, as shown in FIG. 7.
The model training device includes: a sample acquisition module 701, a loss determination module 702, a label clipping module 703, and a model training module 704, wherein:
a sample obtaining module 701, configured to obtain a first sample used for training a target model, where the first sample includes a first number of class labels, and the first number is not more than a second number corresponding to the class labels of the samples used for training the target model;
a loss determining module 702, configured to input the first sample into the target model, obtain, through a forward propagation algorithm, a probability that the first sample belongs to each of the second number of class labels, and determine, based on the probability that the first sample belongs to each of the second number of class labels and the first number of class labels included in the first sample, loss information corresponding to the first sample through a preset loss function corresponding to the target model;
a label cutting module 703, configured to perform cutting processing on a first number of category labels included in the first sample based on the loss information corresponding to the first sample, to obtain a first sample including remaining category labels;
and the model training module 704 performs model training on the target model based on the first sample containing the residual class labels through a back propagation algorithm to obtain a trained target model.
In an embodiment of the present specification, the preset loss function corresponding to the target model is the binary cross entropy loss function BCELoss, the mean square error loss function MSELoss, or the FocalLoss loss function.
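For illustration, per-label (un-reduced) versions of the three named loss functions might be computed as below; FocalLoss is not a built-in of core PyTorch, so the focal branch is a hand-rolled sketch with an assumed gamma hyperparameter:

```python
import torch

def per_label_losses(probs, labels, kind="bce", gamma=2.0):
    """Un-reduced per-label losses, so each category label can be
    ranked and clipped individually. `kind` selects BCELoss, MSELoss,
    or a hand-rolled focal loss (gamma is a focal hyperparameter)."""
    labels = labels.float()
    if kind == "bce":
        return torch.nn.functional.binary_cross_entropy(
            probs, labels, reduction="none")
    if kind == "mse":
        return (probs - labels) ** 2
    if kind == "focal":
        p_t = probs * labels + (1 - probs) * (1 - labels)
        return -((1 - p_t) ** gamma) * torch.log(p_t.clamp(min=1e-8))
    raise ValueError(f"unknown loss kind: {kind}")
```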
In this embodiment, the label cutting module 703 includes:
the sorting unit is used for sorting a first number of category labels contained in the first sample based on the loss information corresponding to the first sample to obtain a first number of category labels with loss information arranged in a descending order;
and the label cutting unit is used for cutting a first number of class labels of which the loss information is arranged in a descending order based on a preset class label cutting rule to obtain a first sample containing the rest class labels.
In this embodiment of the present specification, the label clipping unit removes, from a first number of category labels in which loss information is sequentially arranged from large to small, positive labels whose loss information is greater than a preset first loss threshold, and removes, from a first number of category labels in which loss information is sequentially arranged from large to small, negative labels whose loss information is greater than a preset second loss threshold, to obtain a first sample including remaining category labels.
In the embodiment of the present specification, the preset loss function corresponding to the target model is a two-class cross entropy loss function BCELoss, and the class label is represented by 0 or 1;
the label cutting unit is used for acquiring the category labels which are arranged in the loss information from large to small and are presented by 1 and the category labels which are arranged in the loss information from large to small and are presented by 0 from the category labels of the first number which are arranged in the loss information from large to small; removing the category labels with the loss information larger than a preset first loss threshold value from the category labels presented by 1 and arranged in the descending order of the obtained loss information, and removing the category labels with the loss information larger than a preset second loss threshold value from the category labels presented by 0 and arranged in the descending order of the obtained loss information to obtain a first sample containing the remaining category labels.
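A hedged sketch of this threshold reading of the clipping rule, with pos_thresh and neg_thresh standing in for the preset first and second loss thresholds (both assumed hyperparameters):

```python
import torch

def clip_by_threshold(per_label_loss, labels, pos_thresh, neg_thresh):
    """Drop labels presented as 1 whose loss exceeds pos_thresh (the
    first loss threshold) and labels presented as 0 whose loss exceeds
    neg_thresh (the second), returning a keep-mask over the labels."""
    keep_pos = (labels == 1) & (per_label_loss <= pos_thresh)
    keep_neg = (labels == 0) & (per_label_loss <= neg_thresh)
    return keep_pos | keep_neg
```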
In this embodiment of the present specification, the loss determining module 702 performs model training on the target model based on a preset model training rule, and inputs the first sample into the target model during the process of performing model training, and obtains, through a forward propagation algorithm, a probability that the first sample belongs to each class label in the second number of class labels;
the model training rules include one or more of co-teaching based model training rules, co-teaching + based model training rules, and clearlab based model training rules.
In an embodiment of the present specification, a similarity between the second number of category labels is greater than a preset similarity threshold.
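The specification does not define how this inter-label similarity is measured; one plausible reading, sketched below under that assumption, compares embeddings of the label names by mean pairwise cosine similarity:

```python
import torch
import torch.nn.functional as F

def labels_are_confusable(label_embs, sim_threshold):
    """Assumed reading of the similarity condition: the second number
    of category labels counts as mutually similar (and thus noise-prone)
    when the mean pairwise cosine similarity of their embeddings exceeds
    the preset threshold. label_embs: (num_labels, dim) tensor."""
    normed = F.normalize(label_embs, dim=1)
    sims = normed @ normed.t()                     # pairwise cosines
    off_diag = ~torch.eye(label_embs.size(0), dtype=torch.bool)
    return sims[off_diag].mean().item() > sim_threshold
```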
In an embodiment of this specification, the training device of the model includes:
the system comprises a sample acquisition module, a target model generation module and a target model generation module, wherein the sample acquisition module is used for acquiring a text data sample for training a target model, the text data sample is obtained by converting voice data input by a user in a human-computer interaction process, the text data sample comprises a first number of class labels, the first number is not more than a second number corresponding to the class labels of the sample for training the target model, and the target model is used for identifying the intention of the user;
the loss determining module is used for inputting the text data sample into the target model, obtaining the probability that the text data sample belongs to each class label in the second number of class labels through a forward propagation algorithm, determining loss information corresponding to the text data sample through a preset loss function corresponding to the target model based on the probability that the text data sample belongs to each class label in the second number of class labels and the first number of class labels contained in the text data sample;
the label cutting module is used for cutting a first number of category labels contained in the text data sample based on the loss information corresponding to the text data sample to obtain a text data sample containing the remaining category labels;
and the model training module performs model training on the target model based on the text data sample containing the residual class labels through a back propagation algorithm to obtain the trained target model.
The embodiment of the present specification provides a model training device. The device obtains a text data sample for training a target model (the text data sample is obtained by converting voice data input by a user in a human-computer interaction process; it contains a first number of category labels, the first number not exceeding the second number corresponding to the category labels of the samples used for training the target model; and the target model is a model for identifying user intention), then inputs the text data sample into the target model and obtains, through a forward propagation algorithm, the probability that the text data sample belongs to each category label in the second number of category labels. Based on these probabilities and the first number of category labels contained in the text data sample, the device determines loss information corresponding to the text data sample through a preset loss function corresponding to the target model, cuts the first number of category labels contained in the text data sample based on this loss information to obtain a text data sample containing the remaining category labels, and finally performs model training on the target model through a back propagation algorithm based on the text data sample containing the remaining category labels, to obtain the trained target model. In this way, a label-level mechanism for handling noisy labels is provided for text data samples, which is particularly suitable for identifying user intentions in human-computer interaction with many labels and many label types, and the robustness of the target model against text data samples with noisy labels can be greatly improved. In addition, because the processing goes deep to the label level, discarding difficult samples is avoided, and multi-class label problems are handled well.
EXAMPLE six
Based on the same idea, corresponding to the above model training device, an embodiment of the present specification further provides a model training apparatus, as shown in fig. 8.
The model training apparatus may be the terminal device or the server provided in the above embodiments.
The model training apparatus may vary significantly depending on configuration or performance, and may include one or more processors 801 and a memory 802, where the memory 802 may store one or more applications or data. The memory 802 may be transient storage or persistent storage. The application program stored in the memory 802 may include one or more modules (not shown), and each module may include a series of computer-executable instructions for the model training apparatus. Still further, the processor 801 may be configured to communicate with the memory 802 to execute the series of computer-executable instructions in the memory 802 on the model training apparatus. The model training apparatus may also include one or more power supplies 803, one or more wired or wireless network interfaces 804, one or more input-output interfaces 805, and one or more keyboards 806.
In particular, in this embodiment, the training apparatus for the model includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the training apparatus for the model, and the one or more programs configured to be executed by the one or more processors include computer-executable instructions for:
obtaining a first sample for training a target model, wherein the first sample comprises a first number of class labels, and the first number does not exceed a second number corresponding to the class labels of the sample for training the target model;
inputting the first sample into the target model, obtaining the probability that the first sample belongs to each class label in the second number of class labels through a forward propagation algorithm, determining loss information corresponding to the first sample through a preset loss function corresponding to the target model based on the probability that the first sample belongs to each class label in the second number of class labels and the first number of class labels contained in the first sample;
based on the loss information corresponding to the first sample, performing cutting processing on a first number of class labels contained in the first sample to obtain a first sample containing residual class labels;
and performing model training on the target model based on the first sample containing the residual class labels through a back propagation algorithm to obtain the trained target model.
In an embodiment of the present specification, the preset loss function corresponding to the target model is the binary cross entropy loss function BCELoss, the mean square error loss function MSELoss, or the FocalLoss loss function.
In an embodiment of this specification, the cutting a first number of class labels included in the first sample based on the loss information corresponding to the first sample to obtain a first sample including remaining class labels includes:
sorting a first number of category labels contained in the first sample based on the loss information corresponding to the first sample to obtain a first number of category labels with loss information arranged in a descending order;
based on a preset class label cutting rule, a first number of class labels with loss information arranged in sequence from large to small are subjected to cutting processing, and a first sample containing the remaining class labels is obtained.
In this embodiment of the present specification, the cutting a first number of category labels, in which loss information is sequentially arranged from large to small, based on a preset category label cutting rule to obtain a first sample including remaining category labels includes:
removing positive tags of which the loss information is greater than a preset first loss threshold value from the first quantity of category tags of which the loss information is arranged in the descending order of the loss information, and removing negative tags of which the loss information is greater than a preset second loss threshold value from the first quantity of category tags of which the loss information is arranged in the descending order of the loss information to obtain a first sample containing the residual category tags.
In the embodiment of the present specification, the preset loss function corresponding to the target model is a two-class cross entropy loss function BCELoss, and the class label is represented by 0 or 1;
removing positive tags of which the loss information is greater than a preset first loss threshold value from the first quantity of category tags of which the loss information is sequentially arranged from large to small, and removing negative tags of which the loss information is greater than a preset second loss threshold value from the first quantity of category tags of which the loss information is sequentially arranged from large to small to obtain a first sample containing the remaining category tags, wherein the method comprises the following steps:
obtaining category labels which are arranged in the loss information from big to small and are presented by 1 and category labels which are arranged in the loss information from big to small and are presented by 0 from a first number of category labels which are arranged in the loss information from big to small;
removing the category labels with the loss information larger than a preset first loss threshold value from the category labels presented by 1 and arranged in the descending order of the obtained loss information, and removing the category labels with the loss information larger than a preset second loss threshold value from the category labels presented by 0 and arranged in the descending order of the obtained loss information to obtain a first sample containing the remaining category labels.
In this embodiment of the present specification, the inputting the first sample into the target model, and obtaining the probability that the first sample belongs to each of the second number of class labels through a forward propagation algorithm includes:
model training is carried out on the target model based on a preset model training rule, the first sample is input into the target model in the process of model training, and the probability that the first sample belongs to each class label in the second number of class labels is obtained through a forward propagation algorithm;
the model training rules include one or more of co-teaching based model training rules, co-teaching + based model training rules, and clearlab based model training rules.
In an embodiment of the present specification, a similarity between the second number of category labels is greater than a preset similarity threshold.
Further, in particular embodiments, the training apparatus for the model includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the training apparatus for the model, and the one or more programs configured to be executed by the one or more processors include computer-executable instructions for:
the method comprises the steps of obtaining a text data sample for training a target model, wherein the text data sample is obtained by converting voice data input by a user in a human-computer interaction process, the text data sample comprises a first number of class labels, the first number is not more than a second number corresponding to the class labels of the sample for training the target model, and the target model is a model for identifying the intention of the user;
inputting the text data sample into the target model, obtaining the probability that the text data sample belongs to each of the second number of class labels through a forward propagation algorithm, determining loss information corresponding to the text data sample through a preset loss function corresponding to the target model based on the probability that the text data sample belongs to each of the second number of class labels and the first number of class labels contained in the text data sample;
based on the loss information corresponding to the text data sample, cutting a first number of category labels contained in the text data sample to obtain a text data sample containing residual category labels;
and performing model training on the target model based on the text data sample containing the residual class labels through a back propagation algorithm to obtain the trained target model.
The embodiment of the present specification provides a model training apparatus. The apparatus obtains a text data sample for training a target model (the text data sample is obtained by converting voice data input by a user in a human-computer interaction process; it contains a first number of category labels, the first number not exceeding the second number corresponding to the category labels of the samples used for training the target model; and the target model is a model for identifying user intention), then inputs the text data sample into the target model and obtains, through a forward propagation algorithm, the probability that the text data sample belongs to each category label in the second number of category labels. Based on these probabilities and the first number of category labels contained in the text data sample, the apparatus determines loss information corresponding to the text data sample through a preset loss function corresponding to the target model, cuts the first number of category labels contained in the text data sample based on this loss information to obtain a text data sample containing the remaining category labels, and finally performs model training on the target model through a back propagation algorithm based on the text data sample containing the remaining category labels, to obtain the trained target model. In this way, a label-level mechanism for handling noisy labels is provided for text data samples, which is particularly suitable for identifying user intentions in human-computer interaction with many labels and many label types, and the robustness of the target model against text data samples with noisy labels can be greatly improved. In addition, because the processing goes deep to the label level, discarding difficult samples is avoided, and multi-class label problems are handled well.
EXAMPLE seven
Further, based on the methods shown in fig. 1 to fig. 6, one or more embodiments of the present specification further provide a storage medium for storing computer-executable instruction information. In a specific embodiment, the storage medium may be a USB disk, an optical disk, a hard disk, or the like, and the computer-executable instruction information it stores, when executed by a processor, can implement the following process:
obtaining a first sample for training a target model, wherein the first sample comprises a first number of class labels, and the first number does not exceed a second number corresponding to the class labels of the sample for training the target model;
inputting the first sample into the target model, obtaining the probability that the first sample belongs to each class label in the second number of class labels through a forward propagation algorithm, determining loss information corresponding to the first sample through a preset loss function corresponding to the target model based on the probability that the first sample belongs to each class label in the second number of class labels and the first number of class labels contained in the first sample;
based on the loss information corresponding to the first sample, performing cutting processing on a first number of class labels contained in the first sample to obtain a first sample containing residual class labels;
and performing model training on the target model based on the first sample containing the residual class labels through a back propagation algorithm to obtain the trained target model.
In an embodiment of the present specification, the preset loss function corresponding to the target model is the binary cross entropy loss function BCELoss, the mean square error loss function MSELoss, or the FocalLoss loss function.
In an embodiment of this specification, the obtaining a first sample including remaining category labels by performing a clipping process on a first number of category labels included in the first sample based on loss information corresponding to the first sample includes:
sorting a first number of category labels contained in the first sample based on the loss information corresponding to the first sample to obtain a first number of category labels with loss information arranged in a descending order;
based on a preset class label cutting rule, a first number of class labels with loss information arranged in sequence from large to small are subjected to cutting processing, and a first sample containing the remaining class labels is obtained.
In this embodiment of the present specification, the cutting a first number of category labels, in which loss information is arranged in descending order, based on a preset category label cutting rule to obtain a first sample including remaining category labels includes:
removing positive tags of which the loss information is greater than a preset first loss threshold value from the first quantity of category tags of which the loss information is arranged in the descending order of the loss information, and removing negative tags of which the loss information is greater than a preset second loss threshold value from the first quantity of category tags of which the loss information is arranged in the descending order of the loss information to obtain a first sample containing the residual category tags.
In the embodiment of the present specification, the preset loss function corresponding to the target model is a two-class cross entropy loss function BCELoss, and the class label is represented by 0 or 1;
the removing of the positive tags of which the loss information is greater than the preset first loss threshold from the first number of category tags of which the loss information is sequentially arranged from large to small and the removing of the negative tags of which the loss information is greater than the preset second loss threshold from the first number of category tags of which the loss information is sequentially arranged from large to small obtains the first sample containing the remaining category tags, including:
obtaining category labels which are arranged in the loss information from big to small and are presented by 1 and category labels which are arranged in the loss information from big to small and are presented by 0 from a first number of category labels which are arranged in the loss information from big to small;
removing the category labels with the loss information larger than a preset first loss threshold value from the category labels presented by 1 and arranged in the descending order of the obtained loss information, and removing the category labels with the loss information larger than a preset second loss threshold value from the category labels presented by 0 and arranged in the descending order of the obtained loss information to obtain a first sample containing the remaining category labels.
In this embodiment of the present specification, the inputting the first sample into the target model, and obtaining the probability that the first sample belongs to each of the second number of class labels through a forward propagation algorithm includes:
model training is carried out on the target model based on a preset model training rule, the first sample is input into the target model in the process of model training, and the probability that the first sample belongs to each class label in the second number of class labels is obtained through a forward propagation algorithm;
the model training rules include one or more of co-teaching based model training rules, co-teaching + based model training rules, and clearlab based model training rules.
In an embodiment of the present specification, a similarity between the second number of category labels is greater than a preset similarity threshold.
In addition, in another specific embodiment, the storage medium may be a USB disk, an optical disk, a hard disk, or the like, and the storage medium stores computer-executable instruction information that, when executed by the processor, can implement the following process:
the method comprises the steps of obtaining a text data sample for training a target model, wherein the text data sample is obtained by converting voice data input by a user in a human-computer interaction process, the text data sample comprises a first number of class labels, the first number is not more than a second number corresponding to the class labels of the sample for training the target model, and the target model is a model for identifying the intention of the user;
inputting the text data sample into the target model, obtaining the probability that the text data sample belongs to each of the second number of class labels through a forward propagation algorithm, determining loss information corresponding to the text data sample through a preset loss function corresponding to the target model based on the probability that the text data sample belongs to each of the second number of class labels and the first number of class labels contained in the text data sample;
based on the loss information corresponding to the text data sample, cutting a first number of category labels contained in the text data sample to obtain a text data sample containing residual category labels;
and performing model training on the target model based on the text data sample containing the residual class labels through a back propagation algorithm to obtain the trained target model.
The embodiment of the present specification provides a storage medium. When the stored instruction information is executed, a text data sample for training a target model is obtained (the text data sample is obtained by converting voice data input by a user in a human-computer interaction process; it contains a first number of category labels, the first number not exceeding the second number corresponding to the category labels of the samples used for training the target model; and the target model is a model for identifying user intention); the text data sample is then input into the target model, and the probability that the text data sample belongs to each category label in the second number of category labels is obtained through a forward propagation algorithm. Based on these probabilities and the first number of category labels contained in the text data sample, loss information corresponding to the text data sample is determined through a preset loss function corresponding to the target model; cutting processing is performed on the first number of category labels contained in the text data sample based on this loss information to obtain a text data sample containing the remaining category labels; and finally, model training is performed on the target model through a back propagation algorithm based on the text data sample containing the remaining category labels, to obtain the trained target model. In this way, a label-level mechanism for handling noisy labels is provided for text data samples, which is particularly suitable for identifying user intentions in human-computer interaction with many labels and many label types, and the robustness of the target model against text data samples with noisy labels can be greatly improved. In addition, because the processing goes deep to the label level, discarding difficult samples is avoided, and multi-class label problems are handled well.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the 1990s, an improvement in a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement in circuit structures such as diodes, transistors, and switches) or an improvement in software (an improvement in a method flow). However, as technology develops, many of today's improvements in method flows can be regarded as direct improvements in hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement in a method flow cannot be realized with hardware entity modules. For example, a programmable logic device (PLD) (e.g., a field programmable gate array (FPGA)) is an integrated circuit whose logic functions are determined by a user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is now mostly implemented with "logic compiler" software, which is similar to a software compiler used in program development, while the original code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), among which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained merely by slightly logically programming the method flow into an integrated circuit using the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller as pure computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present specification are described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to the embodiments of the specification. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, so that a series of operation steps are performed on the computer or other programmable apparatus to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present specification, and is not intended to limit the present application. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (10)

1. A method of training a model, the method comprising:
the method comprises the steps of obtaining a text data sample for training a target model, wherein the text data sample is obtained by converting voice data input by a user in a human-computer interaction process, the text data sample comprises a first number of class labels, the first number is not more than a second number corresponding to the class labels of the sample for training the target model, and the target model is a model for identifying the intention of the user;
inputting the text data sample into the target model, obtaining the probability that the text data sample belongs to each of the second number of class labels through a forward propagation algorithm, determining loss information corresponding to the text data sample through a preset loss function corresponding to the target model based on the probability that the text data sample belongs to each of the second number of class labels and the first number of class labels contained in the text data sample;
based on the loss information corresponding to the text data sample, cutting a first number of category labels contained in the text data sample to obtain a text data sample containing residual category labels;
and performing model training on the target model based on the text data sample containing the residual class labels through a back propagation algorithm to obtain the trained target model.
2. The method of claim 1, wherein the predetermined loss function corresponding to the target model is a binary cross entropy loss function (BCELoss), a mean square error loss function (MSELoss) or a FocalLoss loss function.
3. The method according to claim 1 or 2, wherein the clipping a first number of category labels included in the text data sample based on the loss information corresponding to the text data sample to obtain a text data sample including remaining category labels includes:
sorting a first number of category labels contained in the text data sample based on the loss information corresponding to the text data sample to obtain a first number of category labels with loss information arranged in a descending order;
based on a preset class label cutting rule, carrying out cutting processing on a first number of class labels with loss information arranged in sequence from large to small to obtain a text data sample containing the remaining class labels.
4. The method according to claim 3, wherein the performing, based on a preset category label clipping rule, clipping processing on a first number of category labels arranged in descending order of loss information to obtain a text data sample including remaining category labels comprises:
removing positive tags of which the loss information is greater than a preset first loss threshold value from the first quantity of category tags of which the loss information is arranged in the descending order of the loss information, and removing negative tags of which the loss information is greater than a preset second loss threshold value from the first quantity of category tags of which the loss information is arranged in the descending order of the loss information to obtain a text data sample containing the residual category tags.
5. The method according to claim 4, wherein the preset loss function corresponding to the target model is a binary cross entropy loss function BCELoss, the class label is represented by 0 or 1, and positive labels with loss information greater than a preset first loss threshold are removed from a first number of class labels with loss information sequentially arranged from large to small, and negative labels with loss information greater than a preset second loss threshold are removed from a first number of class labels with loss information sequentially arranged from large to small, so as to obtain a text data sample including remaining class labels, including:
obtaining category labels which are arranged in the loss information from big to small and are presented by 1 and category labels which are arranged in the loss information from big to small and are presented by 0 from a first number of category labels which are arranged in the loss information from big to small;
removing the category labels of which the loss information is greater than a preset first loss threshold value from the category labels presented by 1 and arranged in the descending order of the obtained loss information, and removing the category labels of which the loss information is greater than a preset second loss threshold value from the category labels presented by 0 and arranged in the descending order of the obtained loss information to obtain the text data sample containing the rest category labels.
6. The method of claim 1, the inputting the text data sample into the target model, the obtaining a probability that the text data sample belongs to each of the second number of category labels through a forward propagation algorithm, comprising:
and performing model training on the target model based on a preset model training rule, inputting the text data sample into the target model in the process of performing model training, and obtaining the probability that the text data sample belongs to each class label in the second number of class labels through a forward propagation algorithm, wherein the model training rule comprises one or more of a co-teaching based model training rule, a co-teaching+ based model training rule, and a cleanlab based model training rule.
7. The method of claim 1, wherein a similarity between the second number of category labels is greater than a preset similarity threshold.
8. An apparatus for training a model, the apparatus comprising:
the system comprises a sample acquisition module, a target model generation module and a target model generation module, wherein the sample acquisition module is used for acquiring a text data sample for training a target model, the text data sample is obtained by converting voice data input by a user in a human-computer interaction process, the text data sample comprises a first number of class labels, the first number is not more than a second number corresponding to the class labels of the sample for training the target model, and the target model is used for identifying the intention of the user;
the loss determining module is used for inputting the text data sample into the target model, obtaining the probability that the text data sample belongs to each of the second number of class labels through a forward propagation algorithm, determining loss information corresponding to the text data sample through a preset loss function corresponding to the target model based on the probability that the text data sample belongs to each of the second number of class labels and the first number of class labels contained in the text data sample;
the label cutting module is used for cutting a first number of category labels contained in the text data sample based on the loss information corresponding to the text data sample to obtain a text data sample containing the remaining category labels;
and the model training module performs model training on the target model based on the text data sample containing the remaining category labels through a back propagation algorithm to obtain the trained target model.
9. A training apparatus for a model, the training apparatus for a model comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
the method comprises the steps of obtaining a text data sample for training a target model, wherein the text data sample is obtained by converting voice data input by a user in a human-computer interaction process, the text data sample comprises a first number of class labels, the first number is not more than a second number corresponding to the class labels of the sample for training the target model, and the target model is a model for identifying the intention of the user;
inputting the text data sample into the target model, obtaining the probability that the text data sample belongs to each of the second number of class labels through a forward propagation algorithm, determining loss information corresponding to the text data sample through a preset loss function corresponding to the target model based on the probability that the text data sample belongs to each of the second number of class labels and the first number of class labels contained in the text data sample;
based on the loss information corresponding to the text data sample, cutting a first number of category labels contained in the text data sample to obtain a text data sample containing residual category labels;
and performing model training on the target model based on the text data sample containing the residual class labels through a back propagation algorithm to obtain the trained target model.
10. A storage medium for storing computer-executable instructions, which when executed by a processor implement the following:
the method comprises the steps of obtaining a text data sample for training a target model, wherein the text data sample is obtained by converting voice data input by a user in a human-computer interaction process, the text data sample comprises a first number of class labels, the first number is not more than a second number corresponding to the class labels of the sample for training the target model, and the target model is a model for identifying the intention of the user;
inputting the text data sample into the target model, obtaining the probability that the text data sample belongs to each of the second number of class labels through a forward propagation algorithm, determining loss information corresponding to the text data sample through a preset loss function corresponding to the target model based on the probability that the text data sample belongs to each of the second number of class labels and the first number of class labels contained in the text data sample;
based on the loss information corresponding to the text data sample, cutting a first number of category labels contained in the text data sample to obtain a text data sample containing residual category labels;
and performing model training on the target model based on the text data sample containing the residual class labels through a back propagation algorithm to obtain the trained target model.