CN112529209A - Model training method, device and computer readable storage medium - Google Patents

Model training method, device and computer readable storage medium

Info

Publication number
CN112529209A
CN112529209A
Authority
CN
China
Prior art keywords: model, data processing, sample, training, processing model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011427624.3A
Other languages
Chinese (zh)
Inventor
孟嘉琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yuncong Enterprise Development Co ltd
Original Assignee
Shanghai Yuncong Enterprise Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yuncong Enterprise Development Co ltd filed Critical Shanghai Yuncong Enterprise Development Co ltd
Priority to CN202011427624.3A
Publication of CN112529209A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Abstract

The invention relates to the technical field of machine learning, and in particular provides a model training method, a model training device and a computer readable storage medium, aiming to solve the technical problem of how to improve the model training effect. According to the method provided by embodiments of the invention, a preset data processing model can be trained with an initial training set to obtain a first data processing model; a first model loss difference value between the first data processing model and each second data processing model on a test set is then acquired, wherein each second data processing model is trained on a different sub-training set of the initial training set, and the sub-training sets differ from one another by one or more deleted samples; finally, abnormal samples are obtained from the difference values, the initial training set is optimized accordingly, and the first data processing model is trained with the optimized training set. Based on these steps, the method can quickly and accurately screen abnormal samples out of the training set and greatly improves the model training effect.

Description

Model training method, device and computer readable storage medium
Technical Field
The invention relates to the technical field of machine learning, in particular to a model training method and device and a computer readable storage medium.
Background
Supervised learning in the technical field of machine learning mainly trains a model with training samples and their sample labels. To guarantee high model performance, training usually requires samples at a very large magnitude, such as millions of training samples, with an accurate label marked for each training sample in advance. For example, a data classification model trained with millions of training samples and their class labels can achieve high classification performance. Because the magnitude of the training samples is so large, accurate labeling of every training sample cannot be guaranteed, and if noise samples with wrong labels are used for model training, the training effect of the model is reduced.
Disclosure of Invention
In order to overcome the above drawbacks, the present invention provides a model training method, a model training device and a computer readable storage medium that solve, or at least partially solve, the technical problem of how to improve the model training effect.
In a first aspect, a model training method is provided, the method comprising:
training a preset data processing model by using an initial training set to obtain a first data processing model;
respectively testing the first data processing model and the plurality of second data processing models by using a test set to obtain a first model loss difference value of the first data processing model and each second data processing model on the test set;
obtaining abnormal samples in the initial training set according to the first model loss difference value, and carrying out sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set;
training the first data processing model by using the optimized training set to obtain a final data processing model;
wherein different second data processing models are configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples.
In one embodiment of the above model training method, the step of "obtaining a first model loss difference value of the first data processing model and each of the second data processing models on the test set" specifically includes:
acquiring a plurality of candidate data processing models of the first data processing model obtained after the preset data processing model is trained with the initial training set;
respectively testing the plurality of candidate data processing models with the test set, taking the optimal candidate data processing model as the final first data processing model and obtaining first model parameters of the final first data processing model;
fitting, according to the first model parameter, a second model parameter of the final second data processing model corresponding to the currently deleted sample, the final second data processing model being the optimal one among a plurality of candidate data processing models respectively tested with the test set, wherein the plurality of candidate data processing models are obtained by training the preset data processing model with the sub-training set corresponding to the currently deleted sample;
and performing, with a robust statistical method, influence analysis on the second model parameter and on the model loss of the final second data processing model on the test set, so as to obtain a first model loss difference value corresponding to the currently deleted sample.
In an embodiment of the above model training method, the second model parameter is obtained by fitting according to the first model parameter and a method shown in the following formula:

$$\hat{\theta}_{\varepsilon_1, z_{del}} = \mathop{\arg\min}_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_1 L(z_{del}, \theta)$$

wherein $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the fitted second model parameter of the final second data processing model; $z_{del}$ represents the currently deleted sample; $\hat{\theta}$ represents the first model parameter, from which the fitting starts; $L$ represents the loss function used when the preset data processing model is trained and tested; $z_i$ represents the $i$-th sample in the training set, with $z_i = (x_i, y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample, $i = 1, \ldots, n$; and $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1 = -\tfrac{1}{n}$.
In one technical solution of the above model training method, "obtaining the first model loss difference corresponding to the currently deleted sample" specifically includes:
constructing, based on the influence function theory in robust statistics, an influence function, shown in the following formula, of the second model parameter and of the model loss of the final second data processing model on the test set, and calculating the first model loss difference according to the influence function:

$$\gamma_{up,loss}(z_{del}, z_{test}) = \frac{1}{n}\,\nabla_{\theta} L\big(z_{test}, \hat{\theta}_{\varepsilon_1, z_{del}}\big)^{T}\, H_{\hat{\theta}_{\varepsilon_1, z_{del}}}^{-1}\, \nabla_{\theta} L\big(z_{del}, \hat{\theta}_{\varepsilon_1, z_{del}}\big)$$

wherein $\gamma_{up,loss}(z_{del}, z_{test})$ represents the first model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when the preset data processing model is trained and tested; $\nabla_{\theta}$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the second model parameter; $z_{del}$ represents the currently deleted sample; and $H_{\hat{\theta}_{\varepsilon_1, z_{del}}}$ represents the Hessian matrix of the empirical risk of the final second data processing model, with

$$H_{\hat{\theta}_{\varepsilon_1, z_{del}}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L\big(z_i, \hat{\theta}_{\varepsilon_1, z_{del}}\big).$$
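As an illustrative sketch (not the patent's prescribed implementation), the influence function above can be evaluated in closed form for a toy logistic regression, where the per-sample gradient and the Hessian of the empirical risk are analytic. Gradients are evaluated at the trained first-model parameters as an approximation to the fitted second-model parameters, the sign convention follows the text (a negative difference marks a harmful sample), and all helper names are hypothetical:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def grad_loss(w, x, y):
    """Per-sample gradient of the logistic log-loss: (sigmoid(x.w) - y) * x."""
    return (sigmoid(x @ w) - y) * x

def empirical_hessian(w, X):
    """Hessian of the empirical risk (1/n) sum_i L(z_i, w), lightly damped
    so that it remains invertible."""
    p = sigmoid(X @ w)
    return (X.T * (p * (1 - p))) @ X / len(X) + 1e-6 * np.eye(X.shape[1])

def gamma_up_loss(w, X, y, i_del, X_test, y_test):
    """First model loss difference for deleting training sample i_del:
    (1/n) * grad L(z_test)^T @ H^{-1} @ grad L(z_del), without retraining.
    A negative value means deletion would reduce the test loss, flagging
    the sample as abnormal."""
    n = len(X)
    H_inv = np.linalg.inv(empirical_hessian(w, X))
    g_test = np.mean([grad_loss(w, xt, yt)
                      for xt, yt in zip(X_test, y_test)], axis=0)
    return (1.0 / n) * g_test @ H_inv @ grad_loss(w, X[i_del], y[i_del])
```

On separable toy data with one mislabeled training point, ranking all samples by `gamma_up_loss` recovers the mislabeled point as the most negative entry, matching the leave-one-out retraining baseline at a fraction of its cost.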
in one embodiment of the above model training method, "obtaining abnormal samples in the initial training set according to the first model loss difference" specifically includes:
sorting the first model loss difference values in ascending order, from negative to positive;
selecting, according to the sorting result, the first model loss difference values whose rank is less than or equal to a preset rank value;
and acquiring, from the second data processing model corresponding to each selected first model loss difference, the sample deleted when that second data processing model was trained, and taking the deleted sample as an abnormal sample.
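The selection step above amounts to an ascending sort followed by a rank cutoff; a minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical first model loss differences, one per second data processing
# model (i.e., one per deleted training sample); negative values mean the
# deletion reduced the model loss on the test set.
loss_diffs = np.array([0.012, -0.034, 0.002, -0.110, 0.047])
preset_rank = 2                      # the preset order value

order = np.argsort(loss_diffs)       # ascending: from negative to positive
selected = [int(i) for i in order[:preset_rank]]
# The deleted samples behind the selected differences are the abnormal ones.
abnormal_samples = [i for i in selected if loss_diffs[i] < 0]
```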
In one embodiment of the above model training method, "adjusting the training set according to the abnormal sample" specifically includes:
obtaining a sample label of the abnormal sample;
judging whether the sample label is correct or not;
if the sample label is correct, deleting the abnormal sample;
and if not, correcting the sample label of the abnormal sample.
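The label check above can be sketched as a small helper; whether a label is "correct" would in practice come from manual or automated re-annotation, abstracted here as a mapping of corrected labels (all names hypothetical):

```python
def adjust_training_set(samples, abnormal_ids, corrected_labels):
    """Sample adjustment: drop abnormal samples whose labels were verified
    correct (they are genuinely harmful), and relabel those whose labels
    were wrong. corrected_labels maps sample id -> verified right label."""
    optimized = []
    for sid, (image, label) in samples.items():
        if sid in abnormal_ids:
            if sid in corrected_labels:
                optimized.append((image, corrected_labels[sid]))  # fix label
            # else: label was correct but sample is harmful -> delete (skip)
        else:
            optimized.append((image, label))
    return optimized
```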
In an embodiment of the above model training method, the method further includes obtaining an adversarial training set in the following manner, so as to train a preset generative adversarial network model with the optimized training set and the adversarial training set:
training the preset data processing model with the optimized training set to obtain a third data processing model;
respectively testing the third data processing model and a plurality of fourth data processing models with the test set to obtain a second model loss difference value between the third data processing model and each fourth data processing model on the test set; wherein different fourth data processing models are trained on different sub-training sets of the optimized training set, the sub-training sets differing by one or more disturbed samples;
adjusting the disturbance amount of the disturbed sample corresponding to each fourth data processing model according to the variation trend of the corresponding second model loss difference value, so as to obtain the maximum second model loss difference value corresponding to each fourth data processing model;
and acquiring the disturbance amount and disturbed sample corresponding to the maximum second model loss difference value, and disturbing the disturbed sample by the disturbance amount to form a new sample, so as to construct the adversarial training set from the new samples.
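As an illustrative stand-in for adjusting the disturbance amount until the second model loss difference is maximal, the sketch below greedily ascends the input-gradient of a toy logistic-regression loss under a fixed perturbation budget; this is a simplification of the procedure described above, and the names and hyperparameters are hypothetical:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sample_loss(w, x, y):
    """Log-loss of a single sample (x, y) under logistic-regression weights w."""
    p = np.clip(sigmoid(x @ w), 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def build_adversarial_sample(w, x, y, step=0.5, rounds=10, budget=1.0):
    """Search a disturbance delta that maximally increases the loss of the
    disturbed sample (x + delta, y), keeping delta within a fixed budget;
    the disturbed image with its unchanged label forms the new sample."""
    delta = np.zeros_like(x)
    for _ in range(rounds):
        # input-gradient of the log-loss: dL/dx = (sigmoid(x.w) - y) * w
        g = (sigmoid((x + delta) @ w) - y) * w
        delta = np.clip(delta + step * np.sign(g), -budget, budget)
    return x + delta, y
```

Collecting such `(x + delta, y)` pairs over the training samples yields the adversarial training set.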
In one technical solution of the above model training method, the step of "obtaining a second model loss difference value of the third data processing model and each fourth data processing model on the test set" specifically includes:
obtaining a plurality of candidate data processing models of the third data processing model after the preset data processing model is trained with the optimized training set;
respectively testing the plurality of candidate data processing models with the test set, taking the optimal candidate data processing model as the final third data processing model and obtaining third model parameters of the final third data processing model;
fitting, according to the third model parameter, a fourth model parameter of the final fourth data processing model corresponding to the current disturbed sample, the final fourth data processing model being the optimal one among a plurality of candidate data processing models respectively tested with the test set, wherein the plurality of candidate data processing models are obtained by training the preset data processing model with the sub-training set corresponding to the current disturbed sample;
and performing, with a robust statistical method, influence analysis on the fourth model parameter and on the model loss of the final fourth data processing model on the test set, so as to obtain a second model loss difference value corresponding to the current disturbed sample.
In an embodiment of the above model training method, the fourth model parameter is obtained by fitting according to the third model parameter and a method shown in the following formula:

$$\hat{\theta}_{\varepsilon_2, z_{\delta}, -z} = \mathop{\arg\min}_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_2 L(z_{\delta}, \theta) - \varepsilon_2 L(z, \theta)$$

wherein $\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}$ represents the fitted fourth model parameter of the final fourth data processing model; $z_{\delta}$ represents the new sample formed after adding a disturbance quantity $\delta$ to the current disturbed sample $z$, with $z_{\delta} = (x + \delta, y)$, where $x$ represents the image sample of $z$ and $y$ represents the label of the image sample $x$; $\hat{\theta}$ represents the third model parameter, from which the fitting starts; $z_i$ represents the $i$-th sample in the training set, with $z_i = (x_i, y_i)$, $x_i$ representing the image sample of $z_i$ and $y_i$ its label, $i = 1, \ldots, n$; and $\varepsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\varepsilon_2 = \tfrac{1}{n}$.
In one technical solution of the above model training method, "obtaining a second model loss difference corresponding to the current disturbed sample" specifically includes:
constructing, based on the influence function theory in robust statistics, an influence function, shown in the following formula, of the fourth model parameter and of the model loss of the final fourth data processing model on the test set, and calculating the second model loss difference according to the influence function:

$$\gamma_{pert,loss}(z, z_{test}) = -\frac{1}{n}\,\nabla_{\theta} L\big(z_{test}, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big)^{T}\, H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}}^{-1}\, \nabla_{x}\nabla_{\theta} L\big(z, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big)\,\delta$$

wherein $\gamma_{pert,loss}(z, z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when the preset data processing model is trained and tested; $\nabla_{\theta}$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}$ represents the fourth model parameter; $z$ represents the current disturbed sample; $H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with

$$H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L\big(z_i, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big);$$

and $\nabla_{x}\nabla_{\theta} L(z, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z})\,\delta$ is the first-order Taylor expansion of the difference of the loss gradients calculated from the loss function $L$ before and after the disturbance $\delta$ is added to the image sample $x$ of the sample $z$.
In a second aspect, there is provided a model training apparatus, the apparatus comprising:
a first data processing model acquisition module configured to train a preset data processing model with an initial training set to obtain a first data processing model;
a first loss difference acquisition module configured to test the first data processing model and the plurality of second data processing models respectively by using a test set, and acquire a first model loss difference between the first data processing model and each of the second data processing models on the test set respectively;
a training set optimization module configured to obtain abnormal samples in the initial training set according to the first model loss difference value, and perform sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set;
a model training module configured to train the first data processing model with the optimized training set to obtain a final data processing model;
wherein different second data processing models are configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples.
In one technical solution of the above model training device, the first loss difference obtaining module includes a first candidate model obtaining unit, a first parameter obtaining unit, a second parameter obtaining unit, and a first loss difference obtaining unit;
the first candidate model obtaining unit is configured to acquire a plurality of candidate data processing models of the first data processing model obtained after the preset data processing model is trained with the initial training set;
the first parameter obtaining unit is configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final first data processing model and obtain first model parameters of the final first data processing model;
the second parameter obtaining unit is configured to fit, according to the first model parameter, a plurality of candidate data processing models of a second data processing model corresponding to the currently deleted sample to be respectively tested by using the test set, obtain an optimal candidate data processing model as a final second data processing model, and obtain a second model parameter of the final second data processing model, wherein the plurality of candidate data processing models are obtained by training the preset data processing model by using a sub-training set corresponding to the currently deleted sample;
the first loss difference obtaining unit is configured to perform influence analysis on the model loss of the second model parameter and the final second data processing model on the test set by using a robust statistical method to obtain a first model loss difference corresponding to the currently deleted sample.
In an embodiment of the above model training device, the second parameter obtaining unit is further configured to fit the second model parameter according to the first model parameter and a method shown in the following formula:

$$\hat{\theta}_{\varepsilon_1, z_{del}} = \mathop{\arg\min}_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_1 L(z_{del}, \theta)$$

wherein $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the fitted second model parameter of the final second data processing model; $z_{del}$ represents the currently deleted sample; $\hat{\theta}$ represents the first model parameter, from which the fitting starts; $L$ represents the loss function used when the preset data processing model is trained and tested; $z_i$ represents the $i$-th sample in the training set, with $z_i = (x_i, y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample, $i = 1, \ldots, n$; and $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1 = -\tfrac{1}{n}$.
In an embodiment of the above model training apparatus, the first loss difference obtaining unit is further configured to construct, based on the influence function theory in robust statistics, an influence function, shown in the following formula, of the second model parameter and of the model loss of the final second data processing model on the test set, and to calculate the first model loss difference according to the influence function:

$$\gamma_{up,loss}(z_{del}, z_{test}) = \frac{1}{n}\,\nabla_{\theta} L\big(z_{test}, \hat{\theta}_{\varepsilon_1, z_{del}}\big)^{T}\, H_{\hat{\theta}_{\varepsilon_1, z_{del}}}^{-1}\, \nabla_{\theta} L\big(z_{del}, \hat{\theta}_{\varepsilon_1, z_{del}}\big)$$

wherein $\gamma_{up,loss}(z_{del}, z_{test})$ represents the first model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when the preset data processing model is trained and tested; $\nabla_{\theta}$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the second model parameter; $z_{del}$ represents the currently deleted sample; and $H_{\hat{\theta}_{\varepsilon_1, z_{del}}}$ represents the Hessian matrix of the empirical risk of the final second data processing model, with

$$H_{\hat{\theta}_{\varepsilon_1, z_{del}}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L\big(z_i, \hat{\theta}_{\varepsilon_1, z_{del}}\big).$$
in an aspect of the above model training apparatus, the training set optimization module is further configured to perform the following operations:
sorting the first model loss difference values in ascending order, from negative to positive;
selecting, according to the sorting result, the first model loss difference values whose rank is less than or equal to a preset rank value;
and acquiring, from the second data processing model corresponding to each selected first model loss difference, the sample deleted when that second data processing model was trained, and taking the deleted sample as an abnormal sample.
In an aspect of the above model training apparatus, the training set optimization module is further configured to perform the following operations:
obtaining a sample label of the abnormal sample;
judging whether the sample label is correct or not;
if the sample label is correct, deleting the abnormal sample;
and if not, correcting the sample label of the abnormal sample.
In an embodiment of the above model training apparatus, the apparatus further includes:
a third data processing model obtaining module configured to train the preset data processing model by using the optimized training set to obtain a third data processing model;
a second loss difference obtaining module configured to respectively test the third data processing model and a plurality of fourth data processing models by using the test set, and obtain a second model loss difference between the third data processing model and each of the fourth data processing models on the test set; wherein different fourth data processing models are configured to be trained according to different sub-training sets under the optimized training set, and the different sub-training sets differ by one or more different disturbed samples;
a third loss difference obtaining module configured to adjust a disturbance amount of a disturbed sample corresponding to each fourth data processing model according to a variation trend of a second model loss difference corresponding to each fourth data processing model, so as to obtain a maximum second model loss difference corresponding to each fourth data processing model;
and an adversarial training set acquisition module configured to acquire the disturbance amount and disturbed sample corresponding to the maximum second model loss difference value, and to disturb the disturbed sample by the disturbance amount to form a new sample, so as to construct the adversarial training set from the new samples.
In one technical solution of the above model training device, the second loss difference obtaining module includes a second candidate model obtaining unit, a third parameter obtaining unit, a fourth parameter obtaining unit, and a second loss difference obtaining unit;
the second candidate model obtaining unit is configured to acquire a plurality of candidate data processing models of the third data processing model obtained after the preset data processing model is trained with the optimized training set;
the third parameter obtaining unit is configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final third data processing model and obtain third model parameters of the final third data processing model;
the fourth parameter obtaining unit is configured to fit, according to the third model parameter, a plurality of candidate data processing models of a fourth data processing model corresponding to the currently disturbed sample to be respectively tested by using the test set, obtain an optimal candidate data processing model as a final fourth data processing model, and obtain a fourth model parameter of the final fourth data processing model, wherein the plurality of candidate data processing models are obtained by training the preset data processing model by using a sub-training set corresponding to the currently disturbed sample;
the second loss difference obtaining unit is configured to perform influence analysis on the model loss of the fourth model parameter and the final fourth data processing model on the test set by using a robust statistical method to obtain a second model loss difference corresponding to the current disturbed sample.
In an embodiment of the above model training device, the fourth parameter obtaining unit is further configured to fit the fourth model parameter according to the third model parameter and a method shown in the following formula:

$$\hat{\theta}_{\varepsilon_2, z_{\delta}, -z} = \mathop{\arg\min}_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_2 L(z_{\delta}, \theta) - \varepsilon_2 L(z, \theta)$$

wherein $\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}$ represents the fitted fourth model parameter of the final fourth data processing model; $z_{\delta}$ represents the new sample formed after adding a disturbance quantity $\delta$ to the current disturbed sample $z$, with $z_{\delta} = (x + \delta, y)$, where $x$ represents the image sample of $z$ and $y$ represents the label of the image sample $x$; $\hat{\theta}$ represents the third model parameter, from which the fitting starts; $z_i$ represents the $i$-th sample in the training set, with $z_i = (x_i, y_i)$, $x_i$ representing the image sample of $z_i$ and $y_i$ its label, $i = 1, \ldots, n$; and $\varepsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\varepsilon_2 = \tfrac{1}{n}$.
In an embodiment of the above model training apparatus, the second loss difference obtaining unit is further configured to construct, based on the influence function theory in robust statistics, an influence function, shown in the following formula, of the fourth model parameter and of the model loss of the final fourth data processing model on the test set, and to calculate the second model loss difference according to the influence function:

$$\gamma_{pert,loss}(z, z_{test}) = -\frac{1}{n}\,\nabla_{\theta} L\big(z_{test}, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big)^{T}\, H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}}^{-1}\, \nabla_{x}\nabla_{\theta} L\big(z, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big)\,\delta$$

wherein $\gamma_{pert,loss}(z, z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when the preset data processing model is trained and tested; $\nabla_{\theta}$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}$ represents the fourth model parameter; $z$ represents the current disturbed sample; $H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with

$$H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L\big(z_i, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big);$$

and $\nabla_{x}\nabla_{\theta} L(z, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z})\,\delta$ is the first-order Taylor expansion of the difference of the loss gradients calculated from the loss function $L$ before and after the disturbance $\delta$ is added to the image sample $x$ of the sample $z$.
In a third aspect, a control device is provided, comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and run by the processor to perform the model training method according to any of the above-mentioned aspects of the model training method.
In a fourth aspect, a computer readable storage medium is provided, having stored therein a plurality of program codes adapted to be loaded and run by a processor to perform the model training method according to any one of the above-mentioned aspects of the model training method.
One or more technical schemes of the invention at least have one or more of the following beneficial effects:
In the technical scheme of the invention, an initial training set can be used to train a preset data processing model to obtain a first data processing model; then the first data processing model and a plurality of second data processing models are respectively tested with a test set, and a first model loss difference value between the first data processing model and each second data processing model on the test set is obtained, wherein the different second data processing models are trained on different sub-training sets of the initial training set, the sub-training sets differing by one or more deleted samples. Abnormal samples in the initial training set are then obtained from the first model loss differences: if the model loss increases after a certain sample is deleted, the deleted sample is beneficial to model training; if the model loss decreases after a sample is deleted, the deleted sample is harmful to model training and can be judged to be an abnormal sample. Next, sample adjustment is carried out on the initial training set according to the abnormal samples to obtain an optimized training set. For example, the first model loss difference values are sorted in ascending order from negative to positive, the difference values ranked at or below a preset rank value are selected, the second data processing models corresponding to those difference values are obtained, the samples deleted when those models were trained (the abnormal samples) are acquired, and these samples are permanently deleted from the initial training set to form the optimized training set. Finally, the first data processing model is trained with the optimized training set to obtain the final data processing model.
Through the above steps, abnormal samples can be screened out of the training set quickly and accurately according to the variation of the first model loss differences, overcoming the defects of the prior art, in which training sets are checked manually, with low efficiency and a tendency toward missed and erroneous checks. At the same time, the model training effect is greatly improved.
Further, in the technical solution of the present invention, a robust statistical method may be adopted to perform influence analysis on the model parameters of the second data processing model and on its model loss on the test set, and the first model loss difference between the first and second data processing models on the test set is obtained directly from the result of this influence analysis. There is then no need to first train the preset data processing model with the initial training set to obtain the first data processing model, train it again with the sub-training set obtained after deleting a training sample to obtain the second data processing model, measure the model losses of both models on the test set, and finally subtract the two losses to obtain the first model loss difference. In other words, the training process of the second data processing model is omitted: the first model loss difference can be obtained directly through influence analysis, which greatly improves the efficiency of obtaining the difference values and thereby speeds up the screening of abnormal samples.
Drawings
Embodiments of the invention are described below with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram illustrating the main steps of a model training method according to one embodiment of the present invention;
FIG. 2 is a flow chart illustrating the main steps of a first model loss difference acquisition method according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating the main steps of a model training method according to another embodiment of the present invention;
FIG. 4 is a flow chart illustrating the main steps of a second model loss difference acquisition method according to an embodiment of the present invention;
FIG. 5 is a block diagram of the main structure of a model training apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram showing the main structure of a model training apparatus according to another embodiment of the present invention;
list of reference numerals:
31: a first data processing model acquisition module; 32: a first loss difference acquisition module; 33: a training set optimization module; 34: a model training module; 41: a third data processing model acquisition module; 42: a second loss difference acquisition module; 43: a third loss difference acquisition module; 44: an adversarial training set acquisition module.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, and memory; it may comprise software components such as program code; or it may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and the like. The term "A and/or B" denotes all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone, or both A and B. The singular forms "a", "an" and "the" may include the plural forms as well.
Some terms to which the present invention relates are explained first.
Robust statistical methods refer to conventional statistical methods in the field of mathematical statistics that describe the effect of observations on estimators. In the embodiments of the present invention, the observation may be the weight of a training sample or the disturbance applied to a training sample, and the estimator is the model loss of the data processing model; that is, the purpose of using a robust statistical method in the embodiments is to analyze what influence changing the weight of a training sample, or adding a disturbance to it, has on the model loss of the data processing model. Influence function theory in robust statistics quantitatively analyzes the influence of an observation on an estimator by constructing an influence function (IF) between the two. It should be noted that robust statistical methods and influence function theory are conventional techniques in the field of mathematical statistics and, for the sake of brevity, are not described in detail here.
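As a toy illustration of this observed-value/estimator relationship (the example and code below are illustrative and not part of the patent's method), consider the sample mean: its influence function at an observation x is IF(x) = x − mean, and deleting x shifts the mean by exactly −(x − mean)/(n − 1), so the effect of every observation can be read off without re-estimating:

```python
# Toy illustration (not the patent's method): for the sample mean, the
# influence function of an observation x is IF(x) = x - mean, and deleting
# x changes the mean by exactly -(x - mean) / (n - 1).

def mean(xs):
    return sum(xs) / len(xs)

def loo_change(xs, i):
    """Exact change in the mean after deleting observation i."""
    rest = xs[:i] + xs[i + 1:]
    return mean(rest) - mean(xs)

def if_predicted_change(xs, i):
    """Change predicted from the influence function, without re-estimating."""
    return -(xs[i] - mean(xs)) / (len(xs) - 1)

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # the last observation is an outlier
for i in range(len(data)):
    assert abs(loo_change(data, i) - if_predicted_change(data, i)) < 1e-9
```

The outlying observation has by far the largest influence value, which is exactly the intuition the embodiments apply to training samples and model loss.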
A generative adversarial network model refers to a model constructed based on the generative adversarial network (GAN) architecture. GAN is a conventional network structure in the field of artificial intelligence technology and, for brevity of description, its specific structure, function and training method are not described here again.
At present, the traditional sample labeling method mainly relies on manual labeling for training samples of large magnitude, such as millions of training samples. However, manually labeling training samples at such a magnitude easily introduces labeling errors, and training a model with the resulting mislabeled noise samples reduces the training effect of the model. Moreover, because the training samples are so numerous, continuously checking them manually to screen out the noise samples is time-consuming and labor-intensive, and missed detections and false detections are likely to occur.
In the embodiment of the invention, an initial training set can be used to train a preset data processing model (for example, a data classification model) to obtain a first data processing model. Then, the first data processing model and a plurality of second data processing models are respectively tested with the test set to obtain a first model loss difference between the first data processing model and each second data processing model on the test set, wherein different second data processing models are trained from different sub-training sets under the initial training set, and different sub-training sets differ by one or more deleted samples. Further, abnormal samples in the initial training set (for example, samples with wrong labels) are obtained according to the first model loss differences: if the model loss increases after a certain sample is deleted (the first model loss difference is positive), the deleted sample is beneficial to model training; if the model loss decreases after a certain sample is deleted (the first model loss difference is negative), the deleted sample is harmful to model training, so the sample can be judged to be an abnormal sample. The initial training set is then sample-adjusted according to the abnormal samples to obtain an optimized training set. For example: the first model loss differences are sorted in reverse order from negative to positive, those whose rank is less than or equal to a preset sequence value are selected, the second data processing models corresponding to the selected differences are obtained, the samples deleted when training those second data processing models (the abnormal samples) are identified, and those samples are permanently deleted from the initial training set to form the optimized training set.
Finally, the first data processing model is trained with the optimized training set to obtain the final data processing model. Through the above steps, abnormal samples can be screened out of the training set quickly and accurately according to the changes in the first model loss differences, overcoming the drawbacks of the prior art, in which training sets are checked manually, checking is inefficient, and missed and false detections are common. At the same time, the model training effect is greatly improved.
Referring to FIG. 1, FIG. 1 is a flow chart illustrating the main steps of a model training method according to an embodiment of the present invention. As shown in fig. 1, the model training method in the embodiment of the present invention mainly includes the following steps:
step S101: and training a preset data processing model by using the initial training set to obtain a first data processing model.
It should be noted that, in this embodiment, a preset data processing model may be trained by using a conventional model training method in the field of machine learning technology, and for brevity of description, the model training method is not described herein again.
Step S102: and respectively testing the first data processing model and the plurality of second data processing models by using the test set to obtain a first model loss difference value of the first data processing model and each second data processing model on the test set.
The different second data processing models may be configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples. An example is as follows: if the initial training set includes sample 1, sample 2 and sample 3, then training the preset data processing model with samples 1-3 together yields the first data processing model described in step S101 above. If sample 1, sample 2 and sample 3 are respectively deleted from the initial training set to form sub-training sets 1 to 3, and the preset data processing model is then trained with sub-training sets 1 to 3 respectively, the second data processing models shown in table 1 below can be obtained.
TABLE 1

Sub-training set (deleted sample) | Trained model
Sub-training set 1 (sample 1 deleted) | Second data processing model 1
Sub-training set 2 (sample 2 deleted) | Second data processing model 2
Sub-training set 3 (sample 3 deleted) | Second data processing model 3
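The leave-one-out construction behind table 1 can be sketched as follows (a minimal illustration; the sample objects and names are assumed, not taken from the patent):

```python
# Minimal sketch of the table 1 construction: each sub-training set is the
# initial training set with exactly one sample deleted, and each sub-training
# set is used to train one second data processing model.

def build_sub_training_sets(initial_training_set):
    """Return one sub-training set per sample, with that sample deleted."""
    return [
        initial_training_set[:i] + initial_training_set[i + 1:]
        for i in range(len(initial_training_set))
    ]

initial = ["sample 1", "sample 2", "sample 3"]
subs = build_sub_training_sets(initial)
# subs[0] == ["sample 2", "sample 3"] trains second data processing model 1
```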
The first model loss difference is a difference obtained by subtracting the model loss of the first data processing model from the model loss of the second data processing model in the test set.
The model loss of the first data processing model in the test set refers to an average loss obtained by averaging model losses corresponding to each test sample obtained by testing the first data processing model with each test sample in the test set. It should be noted that, in this embodiment, after the model loss corresponding to each test sample is obtained, other conventional methods for obtaining model losses of the model on the test set in the machine learning technical field may also be adopted to calculate the model losses, so as to obtain the model loss of the first data processing model on the test set.
The model loss of the second data processing model in the test set refers to an average loss obtained by averaging model losses corresponding to each test sample obtained by testing the second data processing model with each test sample in the test set. It should be noted that, in this embodiment, after the model loss corresponding to each test sample is obtained, other conventional methods for obtaining model losses of the model on the test set in the machine learning technical field may also be adopted to calculate the model losses, so as to obtain the model loss of the second data processing model on the test set.
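The averaging and differencing described above can be sketched as follows (the `loss_fn(model, sample)` interface is an assumption for illustration, not an API defined by the patent):

```python
# Hedged sketch: the model loss on the test set is the average of the
# per-test-sample losses, and the first model loss difference subtracts the
# first model's test loss from the second model's test loss.

def model_loss_on_test_set(model, test_set, loss_fn):
    """Average the per-sample losses of `model` over the whole test set."""
    return sum(loss_fn(model, sample) for sample in test_set) / len(test_set)

def first_model_loss_difference(first_model, second_model, test_set, loss_fn):
    """loss(second model) - loss(first model), both measured on the test set."""
    return (model_loss_on_test_set(second_model, test_set, loss_fn)
            - model_loss_on_test_set(first_model, test_set, loss_fn))
```

A positive difference means the deletion behind the second model hurt the model (the deleted sample was beneficial); a negative difference marks the deleted sample as a candidate abnormal sample.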
In addition, it should be noted that, in this embodiment, the preset data processing model may be trained by using the same model training method as that used for acquiring the first data processing model, so as to obtain each second data processing model.
Referring to fig. 2, in the present embodiment, the first model loss difference between the first data processing model and each second data processing model on the test set can be obtained according to the following methods shown in steps S1021-S1024.
Step S1021: and acquiring a plurality of alternative data processing models of the first data processing model obtained after the preset data processing model is trained by using the initial training set.
The multiple candidate data processing models of the first data processing model refer to multiple models which can meet preset model training requirements (for example, the accuracy of data classification is greater than or equal to a preset accuracy threshold) by adjusting model parameters and/or model structures during training of a preset data processing model.
Step S1022: and respectively testing the plurality of alternative data processing models obtained in the step S1021 by using a test set to obtain an optimal alternative data processing model as a final first data processing model and obtain first model parameters of the final first data processing model.
It should be noted that, in this embodiment, each candidate data processing model may be tested by using a conventional model testing method in the field of machine learning technology, so as to obtain an optimal candidate data processing model from the trained multiple candidate data processing models.
Step S1023: according to the first model parameters, obtain by fitting the result of using the test set to respectively test the plurality of alternative data processing models of the second data processing model corresponding to the currently deleted sample, take the optimal alternative data processing model as the final second data processing model, and obtain the second model parameters of the final second data processing model.
A plurality of alternative data processing models of the second data processing model are obtained after a preset data processing model is trained by utilizing a sub-training set corresponding to a currently deleted sample, and the alternative data processing models refer to a plurality of models which can meet preset model training requirements (for example, the accuracy of data classification is more than or equal to a preset accuracy threshold) by adjusting model parameters and/or model structures when the preset data processing model is trained.
Specifically, in this embodiment, the second model parameters may be obtained by fitting from the first model parameters according to formula (1):

$$\hat{\theta}_{\varepsilon_1, z_{del}} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_1 L(z_{del}, \theta) \tag{1}$$

The meaning of each parameter in formula (1) is as follows: $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the second model parameters of the final second data processing model obtained by fitting; $z_{del}$ represents the currently deleted sample; $\hat{\theta}$ represents the first model parameters; $L$ represents the loss function used when training and testing the preset data processing model; $z_i$ represents the $i$th sample in the training set, with $z_i = (x_i, y_i)$, where $x_i$ represents the image sample of $z_i$, $y_i$ represents the label of the image sample, and $i = 1, \dots, n$; $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1 = -\frac{1}{n}$.
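A minimal numerical check of this fitting idea (a one-dimensional model and squared loss are assumed here; they are not the patent's model): with sample weight ε₁ = −1/n, the deleted sample's 1/n contribution to the empirical risk cancels, so the minimizer of the weighted objective coincides with the fit on the sub-training set:

```python
# Sketch of formula (1) under an assumed model y = theta * x with squared
# loss L(z, theta) = 0.5 * (theta * x - y)^2. With eps = -1/n, the deleted
# sample's contribution cancels and the minimizer equals the leave-one-out fit.

def fit_weighted(samples, z_del, eps):
    """Minimize (1/n) * sum_i L(z_i, theta) + eps * L(z_del, theta)."""
    n = len(samples)
    x_d, y_d = z_del
    sxx = sum(x * x for x, y in samples) / n + eps * x_d * x_d
    sxy = sum(x * y for x, y in samples) / n + eps * x_d * y_d
    return sxy / sxx  # closed-form least-squares minimizer

samples = [(1.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
theta_second = fit_weighted(samples, samples[2], -1.0 / len(samples))
# equals fitting on the first two samples alone: (1*1 + 2*2) / (1 + 4) = 1.0
```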
step S1024: and (4) performing influence analysis on the second model parameter obtained in the step (S1023) and the model loss of the final second data processing model on the test set by adopting a robust statistical method to obtain a first model loss difference value corresponding to the current deleted sample.
By using the robust statistical method to perform influence analysis on the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ and the model loss of the final second data processing model on the test set, the first model loss difference between the model losses of the first data processing model and the final second data processing model on the test set can be obtained directly. There is no need to first train the preset data processing model with the initial training set to obtain the first data processing model, then train the preset data processing model with the sub-training set to obtain the second data processing model, then obtain the model losses of both models on the test set, and finally compute the difference of the two model losses. In other words, the training process of the second data processing model is omitted: the first model loss difference can be obtained directly through influence analysis, which greatly improves the efficiency of obtaining the first model loss difference and facilitates rapid screening of abnormal samples.
In an implementation manner of the embodiment of the present invention, based on the robust statistical method, the first model loss difference may be obtained as follows.

Based on influence function theory in the robust statistical method, an influence function between the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ and the model loss of the final second data processing model on the test set is constructed as shown in formula (2), and the first model loss difference is calculated according to this influence function:

$$\Gamma_{up,loss}(z_{del}, z_{test}) = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{T} \, H_{\hat{\theta}}^{-1} \, \nabla_{\theta} L(z_{del}, \hat{\theta}) \tag{2}$$

The meaning of each parameter in formula (2) is as follows: $\Gamma_{up,loss}(z_{del}, z_{test})$ represents the first model loss difference; $z_{test}$ represents a test sample in the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ denotes the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and the superscript $T$ denotes the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the second model parameters; $z_{del}$ represents the currently deleted sample; $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the data processing model, with $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})$.
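A one-dimensional numerical sketch of formula (2) (the model, squared loss, and data below are assumed for illustration and are not the patent's): the influence value is a single gradient-Hessian-gradient product formed at the trained parameters, with no retraining:

```python
# Sketch of formula (2) with model y = theta * x and squared loss
# L(z, theta) = 0.5 * (theta * x - y)^2 (illustrative, not the patent's data):
# Gamma = -g_test * H^{-1} * g_del, all evaluated at the trained parameters.

def fit_theta(samples):
    """Least-squares fit of y = theta * x (the first model parameters)."""
    return sum(x * y for x, y in samples) / sum(x * x for x, y in samples)

def grad(theta, x, y):
    """Gradient of L((x, y), theta) with respect to theta."""
    return (theta * x - y) * x

def hessian(samples):
    """Hessian of the empirical risk: H = (1/n) * sum_i x_i^2."""
    return sum(x * x for x, y in samples) / len(samples)

def loss_influence(samples, z_del, z_test):
    theta = fit_theta(samples)
    return -grad(theta, *z_test) * grad(theta, *z_del) / hessian(samples)

samples = [(1.0, 1.0), (2.0, 2.0), (1.0, 3.0)]  # (1.0, 3.0) fits the worst
gamma = loss_influence(samples, samples[2], (1.0, 1.0))  # → 5/18
```

Sign and scaling conventions for the resulting loss difference vary in the literature; here the value is computed exactly as formula (2) is written.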
The empirical risk refers to the average of the model losses of the data processing model over the samples in the training set, and it can measure the training effect of the data processing model: the smaller the empirical risk, the better the training effect; conversely, the larger the empirical risk, the worse the training effect. It should be noted that empirical risk is a conventional concept in the field of machine learning technology and is not described here again for brevity of description.
In addition, it should be noted that, in this embodiment, a conventional gradient calculation method in the field of mathematics may be adopted to calculate the gradient, with respect to the model parameters θ, of the loss value calculated by the loss function L; for brevity of description, the specific workings of the gradient calculation method are not repeated here.
The construction process of the influence function, shown in formula (2) above, between the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ and the model loss is briefly explained below.

First, the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ corresponding to the deleted sample $z_{del}$ are obtained by the fitting in step S1023. Then, based on influence function theory in the robust statistical method, an influence function between the preset sample weight $\varepsilon_1$ of the deleted sample $z_{del}$ and the model parameters is constructed as shown in formula (3):

$$\mathcal{I}(z_{del}) = \left. \frac{d \hat{\theta}_{\varepsilon_1, z_{del}}}{d \varepsilon_1} \right|_{\varepsilon_1 = 0} \tag{3}$$

Solving formula (3) according to formula (1) yields formula (4):

$$\left. \frac{d \hat{\theta}_{\varepsilon_1, z_{del}}}{d \varepsilon_1} \right|_{\varepsilon_1 = 0} = -H_{\hat{\theta}}^{-1} \, \nabla_{\theta} L(z_{del}, \hat{\theta}) \tag{4}$$

In formula (4), $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the data processing model corresponding to the deleted sample $z_{del}$, with $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})$, and $H_{\hat{\theta}}$ is a positive definite matrix.
The change of the model parameters caused by deleting a certain training sample can be estimated through the formulas (3) to (4), and the data processing model does not need to be trained by the sub-training set after deleting the training sample again to obtain new model parameters.
Then, using the chain rule, the influence that changing the weight of a certain training sample (increasing the sample weight $\varepsilon_1$) has on the test results on the test set is analyzed, i.e. the variation in model loss that $\hat{\theta}_{\varepsilon_1, z_{del}}$ incurs on the test set is evaluated. Specifically, an influence function as shown in formula (5) is constructed using the chain rule:

$$\Gamma_{up,loss}(z_{del}, z_{test}) = \left. \frac{d L(z_{test}, \hat{\theta}_{\varepsilon_1, z_{del}})}{d \varepsilon_1} \right|_{\varepsilon_1 = 0} \tag{5}$$

Expanding formula (5) yields formula (6):

$$\Gamma_{up,loss}(z_{del}, z_{test}) = \nabla_{\theta} L(z_{test}, \hat{\theta})^{T} \left. \frac{d \hat{\theta}_{\varepsilon_1, z_{del}}}{d \varepsilon_1} \right|_{\varepsilon_1 = 0} \tag{6}$$

Substituting formulas (3)-(4) into formula (6) gives the analytical expression of the influence function shown in formula (2).
In the present embodiment, by means of the influence function between the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ and the model loss of the final second data processing model on the test set, the influence that a change in the model parameters has on the model loss of the data processing model can be analyzed quantitatively, and the influence value of the second model parameters on the model loss of the second data processing model on the test set (the first model loss difference) can be calculated directly from the influence function. This greatly improves the efficiency of obtaining the first model loss difference and facilitates rapid screening of abnormal samples.
Step S103: and obtaining abnormal samples in the initial training set according to the first model loss difference, and carrying out sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set.
In the present embodiment, the abnormal samples may be acquired according to the following steps 11 to 13.
Step 11: sort the first model loss differences in reverse order from negative to positive. A larger first model loss difference indicates that deleting the corresponding sample does the model more harm, i.e. the deleted sample is a beneficial sample for model training; a smaller first model loss difference indicates that deleting the corresponding sample does the model less harm, i.e. the deleted sample is a harmful sample for model training. Therefore, sorting the first model loss differences in reverse order from negative to positive amounts to sorting the deleted samples from most to least harmful, so the abnormal samples with the greatest degree of harm can be selected quickly from the front of the reverse ranking. Similarly, sorting the first model loss differences forward from positive to negative amounts to sorting the deleted samples from most to least beneficial, so the beneficial samples with the greatest degree of benefit can be selected quickly from the front of the forward ranking.
An example is as follows: if second data processing models 1-10 are trained with sub-training sets 1-10, formed by deleting samples 1-10 respectively, and the first model loss differences corresponding to second data processing models 1-10 are, in order, -1, -2, -3, -4, -5, 1, 2, 3, 4 and 5, then sorting the first model loss differences in reverse order from negative to positive yields -5, -4, -3, -2, -1, 1, 2, 3, 4 and 5.
Step 12: according to the reverse ranking obtained in step 11, select the first model loss differences whose rank is less than or equal to a preset sequence value.
Continuing with the above example, if the preset sequence value is 2, then the selected first model loss differences are -5 and -4.
Step 13: and acquiring a deleted sample during training of the second data processing model according to the second data processing model corresponding to the selected first model loss difference, and taking the deleted sample as an abnormal sample.
Continuing with the example above, if the first model loss differences selected in step 12 are -5 and -4, then the abnormal samples are samples 5 and 4 in the training set.
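Steps 11-13 can be sketched as follows (identifiers assumed; the numbers reproduce the example above, where a preset sequence value of 2 selects samples 5 and 4):

```python
# Sketch of steps 11-13: sort the first model loss differences from negative
# to positive and take the deleted samples behind the smallest differences
# (the most harmful ones) as abnormal samples.

def select_abnormal_samples(loss_diff_by_deleted_sample, preset_sequence_value):
    """loss_diff_by_deleted_sample maps deleted-sample id -> loss difference."""
    ranked = sorted(loss_diff_by_deleted_sample.items(), key=lambda kv: kv[1])
    return [sample_id for sample_id, _ in ranked[:preset_sequence_value]]

diffs = {1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: 1, 7: 2, 8: 3, 9: 4, 10: 5}
abnormal = select_abnormal_samples(diffs, 2)  # → [5, 4]
```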
In this embodiment, in addition to screening abnormal samples according to the above steps 11 to 13, several samples may be selected according to the reverse ranking of the first model loss differences for sample feature analysis. By analyzing what features these samples have in common, it can be determined which sample features the data processing model mainly attends to during training, and hence whether the attended features meet the training purpose; if not, the model parameters and/or model structure and/or training method of the data processing model can be adjusted in a targeted manner. Similarly, besides the reverse ranking, the first model loss differences may also be sorted in forward order, and several samples selected in order of increasing difference for feature analysis; analyzing the common features of these training samples likewise reveals which sample features the data processing model mainly attends to during training, so that it can be judged whether the attended features meet the training purpose and, if not, the model parameters and/or model structure and/or training method can be adjusted in a targeted manner.
In this embodiment, the training set may be sample adjusted according to the following steps 21-22.
Step 21: and acquiring a sample label of the abnormal sample.
Step 22: and judging whether the sample label is correct or not.
If the sample label is correct, the abnormal sample is not suitable for training the data processing model, because the data processing model cannot learn the corresponding capability from it. For example: the purpose of training the data processing model is to enable it to classify a vehicle in an image as a motor vehicle or a non-motor vehicle, so the sample labels of the training samples may include motor vehicle and non-motor vehicle. If the acquired abnormal sample is an image of a motor vehicle and its sample label is motor vehicle (the sample label is correct), but most of the vehicle in the image is blocked by buildings, the data processing model cannot learn from this sample whether the vehicle is a motor vehicle or a non-motor vehicle, and the sample therefore needs to be deleted.
If the sample label is wrong, the sample label of the abnormal sample is corrected directly. Continuing the above example: if the acquired abnormal sample is an image of a motor vehicle but its sample label is non-motor vehicle, the sample label is obviously wrong, and it is therefore corrected to motor vehicle.
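Steps 21-22 can be sketched as follows (the data layout and the `reviewed_label_of` callback, standing in for a human reviewer's judgment, are assumptions for illustration):

```python
# Hedged sketch of steps 21-22: delete an abnormal sample whose label is
# correct (the model cannot learn from it anyway), otherwise fix its label.

def adjust_training_set(training_set, abnormal_ids, reviewed_label_of):
    optimized = []
    for sample_id, image, label in training_set:
        if sample_id in abnormal_ids:
            true_label = reviewed_label_of(sample_id)
            if label == true_label:
                continue  # label correct but sample unusable: delete it
            label = true_label  # label wrong: correct it and keep the sample
        optimized.append((sample_id, image, label))
    return optimized

training_set = [
    (1, "img1", "motor vehicle"),      # correct label, but vehicle occluded
    (2, "img2", "non-motor vehicle"),  # wrong label: it is a motor vehicle
    (3, "img3", "motor vehicle"),
]
reviewed = {1: "motor vehicle", 2: "motor vehicle"}
optimized = adjust_training_set(training_set, {1, 2}, reviewed.get)
# → [(2, "img2", "motor vehicle"), (3, "img3", "motor vehicle")]
```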
Step S104: and training the first data processing model by using the optimized training set to obtain a final data processing model.
In this embodiment, the model training method used when obtaining the first data processing model may be adopted to continue training the first data processing model; alternatively, a conventional model training method in the field of machine learning technology different from the one described above may be used. For brevity of description, the specific process of model training is not repeated here.
Through the steps S101 to S104, the embodiment of the invention can quickly and accurately screen the abnormal samples from the training set, and overcomes the defects of low checking efficiency and easy omission and error detection caused by adopting an artificial checking mode to check the samples of the training set in the prior art.
Further, in an implementation manner of the embodiment of the present invention, after the optimized training set is obtained through the above steps S101 to S104, an adversarial training set may be generated from the optimized training set, and a preset generative adversarial network model may then be trained with the optimized training set and the generated adversarial training set simultaneously, so as to improve the model capability of the generative adversarial network model. Referring to fig. 3, in the present embodiment, the adversarial training set may be acquired according to the following steps S201 to S204.
Step S201: and training a preset data processing model by using the optimized training set to obtain a third data processing model.
It should be noted that, in this embodiment, a preset data processing model may be trained by using a conventional model training method in the field of machine learning technology, and for brevity of description, the model training method is not described herein again.
Step S202: and respectively testing the third data processing model and the plurality of fourth data processing models by using the test set to obtain a second model loss difference value of the third data processing model and each fourth data processing model on the test set.
The different fourth data processing models may be configured to be trained from different sub-training sets under the optimized training set, the different sub-training sets differing by one or more different disturbed samples. An example is as follows: if the optimized training set includes sample 1, sample 2 and sample 3, then training the preset data processing model with samples 1-3 together yields the third data processing model described in step S201 above. If sub-training set 1, formed by adding a disturbance to sample 1, sub-training set 2, formed by adding a disturbance to sample 2, and sub-training set 3, formed by adding a disturbance to sample 3, are used to respectively train the preset data processing model, the fourth data processing models shown in table 2 below can be obtained.
TABLE 2

Sub-training set (disturbed sample) | Trained model
Sub-training set 1 (sample 1 disturbed) | Fourth data processing model 1
Sub-training set 2 (sample 2 disturbed) | Fourth data processing model 2
Sub-training set 3 (sample 3 disturbed) | Fourth data processing model 3
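The construction behind table 2 can be sketched as follows (the perturbation itself is a stand-in; the patent does not fix a particular disturbance scheme here):

```python
# Sketch of the table 2 construction: each sub-training set disturbs exactly
# one sample of the optimized training set, and each sub-training set is used
# to train one fourth data processing model.

def build_disturbed_sets(training_set, disturb):
    """Return one sub-training set per sample, with only that sample disturbed."""
    sets = []
    for i in range(len(training_set)):
        disturbed = list(training_set)
        disturbed[i] = disturb(disturbed[i])
        sets.append(disturbed)
    return sets

shift = lambda x: x + 1  # stand-in additive perturbation
subs = build_disturbed_sets([10, 20, 30], shift)
# subs[0] == [11, 20, 30] trains fourth data processing model 1
```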
The second model loss difference is a difference obtained by subtracting the model loss of the third data processing model from the model loss of the fourth data processing model in the test set.
The model loss of the third data processing model in the test set refers to an average loss obtained by averaging model losses corresponding to each test sample obtained by testing the third data processing model with each test sample in the test set. It should be noted that, in this embodiment, after the model loss corresponding to each test sample is obtained, other conventional methods for obtaining model losses of the model on the test set in the machine learning technical field may also be adopted to calculate the model losses, so as to obtain the model loss of the third data processing model on the test set.
The model loss of the fourth data processing model in the test set refers to an average loss obtained by averaging model losses corresponding to each test sample obtained by testing the fourth data processing model with each test sample in the test set. It should be noted that, in this embodiment, after the model loss corresponding to each test sample is obtained, other conventional methods for obtaining model losses of the model on the test set in the machine learning technical field may also be adopted to calculate the model losses, so as to obtain the model loss of the fourth data processing model on the test set.
In addition, it should be noted that, in this embodiment, the preset data processing model may be trained by using the same model training method as that used for obtaining the third data processing model, so as to obtain each fourth data processing model.
Referring to fig. 4, in the present embodiment, the second model loss difference value may be obtained according to the following method shown in steps S2021 to S2024.
Step S2021: and acquiring a plurality of alternative data processing models of a third data processing model obtained after the preset data processing model is trained by using the optimized training set.
The plurality of candidate data processing models of the third data processing model refers to a plurality of models, each of which can meet a preset model training requirement (for example, a data classification accuracy greater than or equal to a preset accuracy threshold) by adjusting model parameters and/or the model structure during training of the preset data processing model.
Step S2022: test the plurality of candidate data processing models obtained in step S2021 with the test set, take the optimal candidate data processing model as the final third data processing model, and obtain the third model parameters of the final third data processing model.
It should be noted that, in this embodiment, each candidate data processing model may be tested with a model testing method conventional in the machine learning field, so as to select the optimal candidate from the plurality of candidate data processing models.
Step S2023: according to the third model parameters, fit the model parameters of a plurality of candidate data processing models for the fourth data processing model corresponding to the current disturbed sample, test these candidates with the test set, take the optimal candidate data processing model as the final fourth data processing model, and obtain the fourth model parameters of the final fourth data processing model.
The plurality of candidate data processing models corresponding to the current disturbed sample are obtained by training the preset data processing model with the sub-training set corresponding to the current disturbed sample; as above, a candidate data processing model is a model that can meet the preset model training requirement (for example, a data classification accuracy greater than or equal to a preset accuracy threshold) by adjusting model parameters and/or the model structure during training.
Specifically, in this embodiment, the fourth model parameter may be obtained by fitting according to the third model parameter and the method shown in the following equation (7):

$$\hat{\theta}_{\epsilon_2, z_\delta, -z} = \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon_2 L(z_\delta, \theta) - \epsilon_2 L(z, \theta) \tag{7}$$

The meaning of each parameter in equation (7) is as follows: $\hat{\theta}_{\epsilon_2, z_\delta, -z}$ represents the fourth model parameter of the final fourth data processing model obtained by fitting; $z_\delta$ represents the new sample formed after adding a disturbance quantity $\delta$ to the current disturbed sample $z$, i.e. $z_\delta = (x + \delta, y)$, where $x$ denotes the image sample of sample $z$ and $y$ denotes the label of the image sample $x$; $\hat{\theta}$ represents the third model parameter; $z_i$ represents the $i$-th sample in the training set, $z_i = (x_i, y_i)$, where $x_i$ represents the image sample of sample $z_i$ and $y_i$ represents the label of the image sample $x_i$, $i = 1, \ldots, n$; $L$ represents the loss function used when training and testing the preset data processing model; and $\epsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\epsilon_2 = \frac{1}{n}$.
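As a sketch only (the 1-D least-squares loss and toy samples below are assumptions, not part of the embodiment), the objective that equation (7) minimizes — mean training loss with the perturbed sample $z_\delta$ up-weighted and the original sample $z$ down-weighted by $\epsilon_2$ — can be evaluated as follows; minimizing it over the model parameter would yield the fourth model parameter:

```python
import numpy as np

def perturbed_objective(theta, samples, z, z_delta, eps2, loss):
    """Value of the objective in equation (7): the mean training loss, with
    the perturbed sample z_delta up-weighted by eps2 and the original
    disturbed sample z down-weighted by eps2. `loss` is a placeholder for
    whatever per-sample loss the embodiment actually uses."""
    base = np.mean([loss(theta, x, y) for x, y in samples])
    return base + eps2 * loss(theta, *z_delta) - eps2 * loss(theta, *z)

# Toy 1-D least-squares loss (illustrative only).
sq = lambda theta, x, y: (theta * x - y) ** 2
samples = [(1.0, 1.0), (2.0, 2.0)]
obj = perturbed_objective(1.0, samples, (1.0, 1.0), (1.5, 1.0), 0.5, sq)
print(obj)  # 0 + 0.5 * 0.25 - 0 = 0.125
```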
step S2024: and carrying out influence analysis on the model loss of the fourth model parameter and the final fourth data processing model on the test set by adopting a steady statistical method so as to obtain a second model loss difference value corresponding to the current disturbed sample.
By performing influence analysis on the fourth model parameters and the model loss of the final fourth data processing model on the test set with a robust statistical method, the second model loss difference between the model losses of the third data processing model and the final fourth data processing model on the test set can be obtained directly. It is therefore unnecessary to first train the preset data processing model with the optimized training set (in which no training sample is disturbed), then train it with the sub-training set in which a training sample has been disturbed to obtain the fourth data processing model, then separately obtain the model losses of the third and fourth data processing models on the test set, and finally subtract the two model losses. In other words, the training process of the fourth data processing model is omitted: the second model loss difference can be obtained directly through influence analysis, which greatly improves the acquisition efficiency of the second model loss difference and facilitates the rapid generation of the adversarial training set.
In one implementation of the embodiment of the present invention, the second model loss difference may be obtained with the robust statistical method as follows:
Based on the influence function theory in the robust statistical method, an influence function of the fourth model parameters on the model loss of the final fourth data processing model over the test set is constructed as shown in the following equation (8), and the second model loss difference is calculated according to this influence function:

$$\Gamma_{pert,loss}(z, z_{test})^{T} = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{T}\, H_{\hat{\theta}}^{-1}\, \nabla_{x}\nabla_{\theta} L(z, \hat{\theta}) \tag{8}$$

The meaning of each parameter in equation (8) is as follows: $\Gamma_{pert,loss}(z, z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta} L$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; the superscript $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}$ represents the model parameter at which the gradients are evaluated (at $\delta = 0$ the fourth model parameter $\hat{\theta}_{z_\delta, -z}$ coincides with the third model parameter $\hat{\theta}$); $z$ represents the current disturbed sample; $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with $H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})$; and $\nabla_{x}\nabla_{\theta} L(z, \hat{\theta})\,\delta$ corresponds to the first-order Taylor expansion of the difference between the loss values calculated from the loss function $L$ before and after the disturbance quantity $\delta$ is added to the image sample $x$ of sample $z$.
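A minimal numerical sketch of equation (8) follows. The gradient, Hessian, and mixed-derivative arrays are supplied directly as toy values — in practice they would come from automatic differentiation, which is outside this illustration:

```python
import numpy as np

def pert_loss_influence(grad_theta_test, hessian, mixed_grad_z):
    """Equation (8): Gamma_pert,loss(z, z_test)^T
       = -grad_theta L(z_test, theta)^T  H^{-1}  grad_x grad_theta L(z, theta).

    grad_theta_test : (p,)  gradient of the test loss w.r.t. the model parameters
    hessian         : (p,p) Hessian of the empirical risk
    mixed_grad_z    : (p,d) mixed derivative grad_x grad_theta L at sample z
    Returns a (d,) vector whose dot product with a disturbance delta
    approximates the resulting change in test loss."""
    return -grad_theta_test @ np.linalg.solve(hessian, mixed_grad_z)

# Illustrative numbers only.
g_test = np.array([1.0, 0.0])
H = 2.0 * np.eye(2)
mixed = np.array([[1.0], [0.0]])
gamma = pert_loss_influence(g_test, H, mixed)
print(gamma)  # [-0.5]
```

Using `np.linalg.solve` rather than explicitly inverting the Hessian is the usual numerically stable choice for the $H^{-1}$ product.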
Empirical risk refers to the average of the model losses of the data processing model over the training samples in the training set, and can measure the training effect of the data processing model: the smaller the empirical risk, the better the training effect; conversely, the larger the empirical risk, the worse the training effect. It should be noted that empirical risk is a conventional concept in the machine learning field and, for brevity of description, is not described in detail here.
The first-order Taylor expansion refers to the first-order expansion of the Taylor series formula. It should be noted that the Taylor series is a conventional mathematical technique and, for brevity of description, is not described in detail here.
It should be noted that, in this embodiment, a conventional gradient calculation method may be adopted to calculate the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; for brevity of description, the specific working of the gradient calculation method is not described in detail here.
The process of constructing the influence function of the fourth model parameters on the model loss over the test set shown in equation (8) is briefly described below.

First, the fourth model parameter $\hat{\theta}_{\epsilon_2, z_\delta, -z}$, obtained in step S2023 by fitting with the disturbed sample $z$ and the disturbance quantity $\delta$, is acquired.

Then, based on the influence function theory in the robust statistical method, the influence function of the disturbance quantity $\delta$ on the model parameter $\hat{\theta}_{\epsilon_2, z_\delta, -z}$ shown in the following equation (9) is constructed:

$$\left.\frac{d\hat{\theta}_{\epsilon_2, z_\delta, -z}}{d\epsilon_2}\right|_{\epsilon_2 = 0} = -H_{\hat{\theta}}^{-1}\left(\nabla_{\theta} L(z_\delta, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta})\right) \tag{9}$$

Expanding equation (9) yields the following equation (10):

$$\hat{\theta}_{\epsilon_2, z_\delta, -z} - \hat{\theta} \approx -\epsilon_2 H_{\hat{\theta}}^{-1}\left(\nabla_{\theta} L(z_\delta, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta})\right) \tag{10}$$

If the image samples $x$ in the training set are continuous and $\epsilon_2$ is very small, then equation (10) holds for any disturbance $\delta$. When the disturbance quantity $\delta$ is small, the gradient difference can be approximated by a first-order expansion:

$$\nabla_{\theta} L(z_\delta, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta}) \approx \nabla_{x}\nabla_{\theta} L(z, \hat{\theta})\,\delta$$

Thus the influence function of the disturbance quantity $\delta$ on the model parameter $\hat{\theta}_{\epsilon_2, z_\delta, -z}$ can be expressed approximately in the analytical form shown in the following equation (11):

$$\left.\frac{d\hat{\theta}_{\epsilon_2, z_\delta, -z}}{d\epsilon_2}\right|_{\epsilon_2 = 0} \approx -H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta} L(z, \hat{\theta})\,\delta \tag{11}$$

If the disturbed sample $z$ is replaced by the sample $z_\delta$ formed after the disturbance quantity $\delta$ is added, the variation of the model parameter can be approximated as shown in the following equation (12):

$$\hat{\theta}_{z_\delta, -z} - \hat{\theta} \approx -\frac{1}{n}\, H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta} L(z, \hat{\theta})\,\delta \tag{12}$$

The variation of the model loss on the test set caused by the disturbance quantity $\delta$ can then be calculated with the following equation (13):

$$\left.\nabla_{\delta} L\!\left(z_{test}, \hat{\theta}_{z_\delta, -z}\right)\right|_{\delta = 0} = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{T}\, H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta} L(z, \hat{\theta}) \tag{13}$$

Solving equation (13) yields the relation, shown in equation (8), between the fourth model parameter $\hat{\theta}_{z_\delta, -z}$ and the model loss.
After the disturbance quantity $\delta$ is added to the disturbed sample and the data processing model is tested with the test set, the model loss of the data processing model increases by $\Gamma_{pert,loss}(z, z_{test})^{T}\delta$. Therefore, according to this model loss increase, the value of the disturbance quantity $\delta$ can be adjusted to obtain the maximum model loss increase $\Gamma_{pert,loss}(z, z_{test})^{T}\delta$. Further, in this embodiment, $\Gamma_{pert,loss}(z, z_{test})$ can be measured to analyze the ability of the third data processing model to resist disturbances of the training set. The larger $\Gamma_{pert,loss}(z, z_{test})$ is, the weaker the resistance to training set disturbances, and the larger the influence on the model loss of the third data processing model after a disturbance is added to the training set; conversely, the smaller $\Gamma_{pert,loss}(z, z_{test})$ is, the stronger the resistance to training set disturbances, and the smaller the influence on the model loss of the third data processing model after a disturbance is added.
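To illustrate the relation "model loss increases by $\Gamma^{T}\delta$", the following sketch assumes an L2 budget on the disturbance (an assumption not stated in the embodiment) and shows that the predicted loss increase is maximized by aligning $\delta$ with $\Gamma$:

```python
import numpy as np

def loss_change(gamma, delta):
    """Predicted change in test-set model loss when the disturbed sample
    gains the disturbance delta (the quantity Gamma_pert,loss^T delta)."""
    return float(gamma @ delta)

def worst_case_delta(gamma, budget):
    """Under an L2 budget ||delta|| <= budget, Gamma^T delta is maximized
    by pointing delta along Gamma."""
    return budget * gamma / np.linalg.norm(gamma)

gamma = np.array([3.0, 4.0])
delta = worst_case_delta(gamma, budget=1.0)
change = loss_change(gamma, delta)
print(change)  # equals ||gamma|| = 5.0
```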
In the present embodiment, by means of the influence function of the fourth model parameter $\hat{\theta}_{z_\delta, -z}$ on the model loss, the influence of a change of the model parameters on the model loss of the fourth data processing model can be analyzed quantitatively, and the influence value of the fourth model parameter on the model loss of the fourth data processing model (the second model loss difference) can be calculated directly according to the influence function, so that the efficiency of obtaining the second model loss difference is greatly improved, which facilitates the rapid generation of the adversarial training set.
Step S203: and adjusting the disturbance amount of the disturbed sample corresponding to each fourth data processing model according to the variation trend of the second model loss difference value corresponding to each fourth data processing model, so as to obtain the maximum second model loss difference value corresponding to each fourth data processing model.
An example is as follows: if, as the disturbance quantity of a disturbed sample is increased, the corresponding second model loss difference first increases and then decreases, the disturbance quantity is increased continuously until the second model loss difference changes from increasing to decreasing, at which point the disturbance stops being increased.
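The stopping rule above can be sketched as a simple hill climb. The fixed step size, iteration cap, and the toy unimodal curve are illustrative assumptions; `loss_diff` stands in for the influence-function computation of step S2024:

```python
def maximize_loss_difference(loss_diff, step=0.1, max_iters=100):
    """Grow the disturbance magnitude while the second model loss difference
    keeps increasing, and stop at the first decrease (the rule illustrated
    above). `loss_diff` maps a disturbance magnitude to the difference
    value; it is a placeholder for the influence-function computation."""
    best_mag, best_val = 0.0, loss_diff(0.0)
    mag = step
    for _ in range(max_iters):
        val = loss_diff(mag)
        if val <= best_val:  # difference changed from increasing to decreasing
            break
        best_mag, best_val = mag, val
        mag += step
    return best_mag, best_val

# Toy unimodal curve peaking at magnitude 1.0.
mag, val = maximize_loss_difference(lambda d: -(d - 1.0) ** 2)
print(round(mag, 1))  # 1.0
```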
Step S204: acquire the disturbance quantity and the disturbed sample corresponding to the largest second model loss difference, disturb the disturbed sample according to that disturbance quantity to form a new sample, and construct the adversarial training set from the new samples; that is, each sample with the disturbance added is taken as an adversarial sample, and all the adversarial samples together form the adversarial training set.
Through the above steps S201 to S204, the embodiment of the present invention can quickly and accurately generate a large batch of adversarial samples, thereby improving the training efficiency of the generative adversarial network model and enabling the trained model to achieve higher performance.
It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art will understand that, in order to achieve the effect of the present invention, different steps do not necessarily need to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the protection scope of the present invention.
Furthermore, the invention also provides a model training device.
Referring to fig. 5, fig. 5 is a main block diagram of a model training apparatus according to an embodiment of the present invention. As shown in fig. 5, the model training apparatus in the embodiment of the present invention mainly includes a first data processing model obtaining module 31, a first loss difference obtaining module 32, a training set optimizing module 33, and a model training module 34. In some embodiments, one or more of the first data processing model acquisition module 31, the first loss difference acquisition module 32, the training set optimization module 33, and the model training module 34 may be combined together into one module. In some embodiments, the first data processing model obtaining module 31 may be configured to train a preset data processing model with an initial training set, and obtain the first data processing model. The first loss difference acquisition module 32 may be configured to test the first data processing model and the plurality of second data processing models respectively with the test set, and acquire a first model loss difference between the first data processing model and each of the second data processing models on the test set. The training set optimization module 33 may be configured to obtain abnormal samples in the initial training set according to the first model loss difference, and perform sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set. Model training module 34 may be configured to train the first data processing model with the optimized training set to obtain a final data processing model. Wherein the different second data processing models may be configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples. In one embodiment, the description of the specific implementation function may refer to steps S101 to S104.
In one embodiment, the first loss difference acquisition module 32 may include a first candidate model acquisition unit, a first parameter acquisition unit, a second parameter acquisition unit, and a first loss difference acquisition unit. In this embodiment, the first candidate model obtaining unit may be configured to obtain a plurality of candidate data processing models of the first data processing model obtained after a preset data processing model is trained by using the initial training set. The first parameter obtaining unit may be configured to test the plurality of candidate data processing models respectively using the test set to obtain an optimal candidate data processing model as a final first data processing model and obtain first model parameters of the final first data processing model. The second parameter obtaining unit may be configured to fit, according to the first model parameter, to respectively test a plurality of candidate data processing models of the second data processing model corresponding to the currently deleted sample by using the test set, obtain an optimal candidate data processing model as a final second data processing model, and obtain a second model parameter of the final second data processing model, where the plurality of candidate data processing models are obtained by training a preset data processing model by using a sub-training set corresponding to the currently deleted sample. The first loss difference obtaining unit may be configured to perform impact analysis on the model loss of the second model parameter and the final second data processing model on the test set by using a robust statistical method to obtain a first model loss difference corresponding to the currently deleted sample. In one embodiment, the description of the specific implementation function may be referred to in step S102.
In one embodiment, the second parameter obtaining unit may be further configured to fit the second model parameters based on the first model parameters and according to a method shown in formula (1). In one embodiment, the description of the specific implementation function may be referred to in step S102.
In one embodiment, the first loss difference obtaining unit may be further configured to construct an influence function of the model loss of the second model parameter and the final second data processing model on the test set as shown in equation (2) based on an influence function theory in the robust statistical method, and calculate the first model loss difference according to the influence function. In one embodiment, the description of the specific implementation function may be referred to in step S102.
In one embodiment, the training set optimization module 33 may be further configured to perform the following operations: reversely ordering the first model loss difference from negative to positive; selecting a first model loss difference value with the sorting order less than or equal to a preset order value according to a reverse sorting result; and acquiring a deleted sample during training of the second data processing model according to the second data processing model corresponding to the selected first model loss difference, and taking the deleted sample as an abnormal sample. In one embodiment, the description of the specific implementation function may refer to that in step S103.
In one embodiment, the training set optimization module 33 may be further configured to perform the following operations: obtaining a sample label of an abnormal sample; judging whether the sample label is correct or not; if the abnormal sample is correct, deleting the abnormal sample; and if not, correcting the sample label of the abnormal sample. In one embodiment, the description of the specific implementation function may refer to that in step S103.
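The label check and correction described above can be sketched as follows; the two verification callables stand in for the (typically manual) label inspection, and all sample values are illustrative assumptions:

```python
def adjust_training_set(training_set, abnormal_samples, label_is_correct, corrected_label):
    """Sample adjustment described above: an abnormal sample with a correct
    label is deleted; one with an incorrect label is re-labelled and kept."""
    adjusted = []
    for sample in training_set:
        if sample not in abnormal_samples:
            adjusted.append(sample)  # normal sample: keep as-is
        elif not label_is_correct(sample):
            x, _ = sample
            adjusted.append((x, corrected_label(sample)))  # fix the label
        # abnormal samples whose labels are correct are deleted
    return adjusted

training_set = [("x1", 0), ("x2", 1), ("x3", 0)]
abnormal = {("x2", 1), ("x3", 0)}
adjusted = adjust_training_set(
    training_set, abnormal,
    label_is_correct=lambda s: s[0] == "x2",  # pretend x2's label checks out
    corrected_label=lambda s: 1,              # pretend x3 should be labelled 1
)
print(adjusted)  # [('x1', 0), ('x3', 1)]
```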
Referring to fig. 6, in another embodiment of the model training apparatus according to the present invention, the model training apparatus may further include a third data processing model obtaining module 41, a second loss difference obtaining module 42, a third loss difference obtaining module 43, and a confrontation training set obtaining module 44. In some embodiments, one or more of the third data processing model acquisition module 41, the second loss difference acquisition module 42, the third loss difference acquisition module 43, and the antagonistic training set acquisition module 44 may be combined together into one module. In some embodiments, the third data processing model obtaining module 41 may be configured to train a preset data processing model by using the optimized training set, and obtain the third data processing model. The second loss difference obtaining module 42 may be configured to test the third data processing model and the plurality of fourth data processing models respectively by using the test set, and obtain a second model loss difference between the third data processing model and each fourth data processing model on the test set; wherein different fourth data processing models may be configured to be trained according to different sub-training sets under the optimized training set, and the different sub-training sets differ by one or more different disturbed samples. The third loss difference obtaining module 43 may be configured to adjust the disturbance amount of the disturbed sample corresponding to each fourth data processing model according to the variation trend of the second model loss difference corresponding to each fourth data processing model, so as to obtain the maximum second model loss difference corresponding to each fourth data processing model. 
The opposing training set obtaining module 44 may be configured to obtain a perturbation amount corresponding to the largest second model loss difference value and the perturbed sample, and perturb the perturbed sample according to the perturbation amount to form a new sample, so as to construct the opposing training set according to the new sample. In one embodiment, the description of the specific implementation function may refer to steps S201 to S204.
In one embodiment, the second loss difference acquisition module 42 may include a second candidate model acquisition unit, a third parameter acquisition unit, a fourth parameter acquisition unit, and a second loss difference acquisition unit. In this embodiment, the second candidate model obtaining unit may be configured to obtain a plurality of candidate data processing models of a third data processing model obtained after a preset data processing model is trained by using the optimized training set. The third parameter obtaining unit may be configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final third data processing model and obtain third model parameters of the final third data processing model; the fourth parameter obtaining unit may be configured to fit, according to the third model parameter, to respectively test a plurality of candidate data processing models of a fourth data processing model corresponding to the currently disturbed sample by using the test set, obtain an optimal candidate data processing model as a final fourth data processing model, and obtain a fourth model parameter of the final fourth data processing model, where the plurality of candidate data processing models are obtained by training a preset data processing model by using a sub-training set corresponding to the currently disturbed sample; the second loss difference obtaining unit may be configured to perform influence analysis on the model loss of the fourth model parameter and the final fourth data processing model on the test set by using a robust statistical method to obtain a second model loss difference corresponding to the current disturbed sample. In one embodiment, the description of the specific implementation function may be referred to in step S202.
In one embodiment, the fourth parameter obtaining unit may be further configured to fit the fourth model parameter based on the third model parameter and according to a method shown in formula (7). In one embodiment, the description of the specific implementation function may be referred to in step S202.
In one embodiment, the second loss difference obtaining unit may be further configured to construct an influence function of the model loss of the fourth model parameter and the final fourth model parameter on the test set, which is shown in formula (8), based on an influence function theory in the robust statistical method, and calculate the second model loss difference according to the influence function. In one embodiment, the description of the specific implementation function may be referred to in step S202.
The above-mentioned model training device is used for executing the embodiment of the model training method shown in fig. 1-4, and the technical principles, the solved technical problems and the generated technical effects of the two are similar, and it can be clearly understood by those skilled in the art that for convenience and simplicity of description, the specific working process and related descriptions of the model training device may refer to the contents described in the embodiment of the model training method, and are not repeated herein.
It will be understood by those skilled in the art that all or part of the flow of the method according to the above-described embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and used to implement the steps of the above-described method embodiments when executed by a processor. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, such as a USB flash disk, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory, random access memory, electrical carrier signal, telecommunication signal, software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals.
Further, the invention also provides a computer readable storage medium. In one computer-readable storage medium embodiment according to the present invention, a computer-readable storage medium may be configured to store a program that executes the model training method of the above-described method embodiment, which may be loaded and executed by a processor to implement the above-described model training method. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The computer readable storage medium may be a storage device formed by including various electronic devices, and optionally, the computer readable storage medium is a non-transitory computer readable storage medium in the embodiment of the present invention.
Furthermore, the invention also provides a control device. In an embodiment of the control device according to the invention, the control device comprises a processor and a memory device, the memory device may be configured to store a program for performing the model training method of the above-described method embodiment, and the processor may be configured to execute the program in the memory device, the program including but not limited to the program for performing the model training method of the above-described method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The control device may be a control device apparatus formed including various electronic apparatuses.
Further, it should be understood that, since the modules are only configured to illustrate the functional units of the system of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the system may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.
So far, the technical solution of the present invention has been described with reference to one embodiment shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (22)

1. A method of model training, the method comprising:
training a preset data processing model by using an initial training set to obtain a first data processing model;
respectively testing the first data processing model and the plurality of second data processing models by using a test set to obtain a first model loss difference value of the first data processing model and each second data processing model on the test set;
obtaining abnormal samples in the initial training set according to the first model loss difference value, and carrying out sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set;
training the first data processing model by using the optimized training set to obtain a final data processing model;
wherein different second data processing models are configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples.
2. The model training method of claim 1, wherein the step of obtaining the first model loss difference between the first data processing model and each of the second data processing models on the test set specifically comprises:
acquiring a plurality of alternative data processing models of the first data processing model obtained after the preset data processing model is trained by using the initial training set;
respectively testing the plurality of alternative data processing models by using the test set to obtain an optimal alternative data processing model as a final first data processing model and obtain first model parameters of the final first data processing model;
according to the first model parameter, fitting and utilizing the test set to respectively test a plurality of alternative data processing models of a second data processing model corresponding to the currently deleted sample, obtaining an optimal alternative data processing model as a final second data processing model and obtaining a second model parameter of the final second data processing model, wherein the plurality of alternative data processing models are obtained by utilizing a sub-training set corresponding to the currently deleted sample to train the preset data processing model;
and carrying out influence analysis on the model loss of the second model parameter and the final second data processing model on the test set by adopting a steady statistical method so as to obtain a first model loss difference value corresponding to the current deleted sample.
3. The model training method according to claim 2, wherein the second model parameters are fitted on the basis of the first model parameters according to the following formula:

$$\hat{\theta}_{\varepsilon_1,z_{del}}=\arg\min_{\theta}\left\{\frac{1}{n}\sum_{i=1}^{n}L(z_i,\theta)+\varepsilon_1 L(z_{del},\theta)\right\}$$

wherein $\hat{\theta}_{\varepsilon_1,z_{del}}$ represents the fitted second model parameters of the final second data processing model; $z_{del}$ represents the currently deleted sample; the fitting starts from $\hat{\theta}$, the first model parameters; $L$ represents the loss function used when training and testing the preset data processing model; $z_i$ represents the i-th sample in the training set, with $z_i=(x_i,y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample, $i=1,\dots,n$; and $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1=-\frac{1}{n}$.
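When $L$ is the squared loss, the $\varepsilon_1$-weighted objective of claim 3 has a closed form, which makes the effect of $\varepsilon_1=-\frac{1}{n}$ (cancelling the deleted sample's term from the empirical risk) easy to verify. The sketch below assumes a linear model and squared loss; it is an illustration of the weighting scheme, not the patent's implementation.

```python
import numpy as np

def refit_after_deletion(X, y, del_idx):
    """Solve argmin_theta (1/n) sum_i (x_i.theta - y_i)^2 + eps1 * (x_d.theta - y_d)^2
    with eps1 = -1/n: the deleted sample's contribution is cancelled, so the
    result equals an ordinary least-squares fit on the remaining samples."""
    n = len(y)
    eps1 = -1.0 / n
    x_d, y_d = X[del_idx], y[del_idx]
    # Normal equations of the weighted objective.
    A = X.T @ X / n + eps1 * np.outer(x_d, x_d)
    b = X.T @ y / n + eps1 * x_d * y_d
    return np.linalg.solve(A, b)
```

With $\varepsilon_1=-\frac{1}{n}$ the weighted objective coincides with the leave-one-out empirical risk, which is why the second data processing model can be fitted from the first one rather than retrained from scratch.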
4. The model training method according to claim 2, wherein the step of obtaining the first model loss difference corresponding to the currently deleted sample specifically comprises:
constructing, based on the influence function theory in robust statistics, an influence function of the second model parameters and of the model loss of the final second data processing model on the test set, as shown in the following formula, and calculating the first model loss difference according to the influence function:

$$\Gamma_{up,loss}(z_{del},z_{test})=-\nabla_{\theta}L\big(z_{test},\hat{\theta}_{\varepsilon_1,z_{del}}\big)^{T}H_{\hat{\theta}}^{-1}\nabla_{\theta}L\big(z_{del},\hat{\theta}_{\varepsilon_1,z_{del}}\big)$$

wherein $\Gamma_{up,loss}(z_{del},z_{test})$ represents the first model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ represents the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and $T$ represents the transpose of the gradient vector $\nabla_{\theta}L(z_{test},\hat{\theta}_{\varepsilon_1,z_{del}})$; $\hat{\theta}_{\varepsilon_1,z_{del}}$ represents the second model parameters; $z_{del}$ represents the currently deleted sample; and $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final second data processing model, with $H_{\hat{\theta}}=\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2}L(z_i,\hat{\theta})$.
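A minimal numerical sketch of the influence product in claim 4. The names and the explicit dense-Hessian solve are assumptions for illustration; for a real deep model one would approximate $H^{-1}v$ with Hessian-vector products rather than form $H$ explicitly.

```python
import numpy as np

def influence_up_loss(grad_test, grad_del, hessian):
    """Gamma_up,loss = -grad_test^T H^{-1} grad_del.

    grad_test: gradient of the test-set loss w.r.t. the model parameters.
    grad_del:  gradient of the loss at the deleted sample.
    hessian:   Hessian matrix H of the empirical risk.
    """
    # Solve H v = grad_del instead of inverting H explicitly.
    v = np.linalg.solve(hessian, grad_del)
    return float(-grad_test @ v)
```

The resulting scalar is the loss difference that claim 5 sorts from negative to positive to flag abnormal samples.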
5. The model training method according to claim 1, wherein the step of obtaining the abnormal samples in the initial training set according to the first model loss difference specifically comprises:
sorting the first model loss differences in ascending order, from negative to positive;
selecting, according to the sorting result, the first model loss differences whose rank is less than or equal to a preset rank value;
and acquiring, for each second data processing model corresponding to a selected first model loss difference, the sample that was deleted when training that second data processing model, and taking the deleted sample as an abnormal sample.
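The ordering-and-thresholding step of claim 5 can be sketched as follows (the function name and plain-list interface are illustrative):

```python
import numpy as np

def select_abnormal_indices(loss_diffs, rank_threshold):
    """Sort the first model loss differences from negative to positive and
    return the indices (i.e. deleted samples) whose rank is within the
    preset rank threshold."""
    order = np.argsort(loss_diffs)  # ascending: most negative differences first
    return order[:rank_threshold].tolist()
```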
6. The model training method according to claim 1, wherein the step of performing sample adjustment on the initial training set according to the abnormal samples specifically comprises:
acquiring the sample label of the abnormal sample;
judging whether the sample label is correct;
if the sample label is correct, deleting the abnormal sample;
and if not, correcting the sample label of the abnormal sample.
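Claim 6's adjustment rule — delete the abnormal sample when its label is correct, relabel it otherwise — can be sketched as below. All names and the two callable interfaces are assumptions; in practice the label check would typically be a human annotator.

```python
def adjust_training_set(dataset, abnormal_indices, label_is_correct, corrected_label):
    """Return an optimized copy of the training set.

    dataset: list of (image_sample, label) pairs.
    abnormal_indices: indices flagged by the loss-difference analysis.
    label_is_correct(x, y): oracle for label validity (e.g. a human check).
    corrected_label(x, y): supplies the fixed label when y is wrong.
    """
    abnormal = set(abnormal_indices)
    optimized = []
    for i, (x, y) in enumerate(dataset):
        if i in abnormal:
            if label_is_correct(x, y):
                continue                   # correctly labelled yet harmful: drop it
            y = corrected_label(x, y)      # mislabelled: keep it with a fixed label
        optimized.append((x, y))
    return optimized
```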
7. The model training method according to any one of claims 1 to 6, further comprising obtaining an adversarial training set and training a preset generative adversarial network model using the optimized training set and the adversarial training set, wherein the adversarial training set is obtained by:
training the preset data processing model with the optimized training set to obtain a third data processing model;
testing the third data processing model and a plurality of fourth data processing models respectively with the test set to obtain a second model loss difference between the third data processing model and each fourth data processing model on the test set, wherein different fourth data processing models are trained on different sub-training sets under the optimized training set, and the different sub-training sets differ by one or more different disturbed samples;
adjusting the disturbance amount of the disturbed sample corresponding to each fourth data processing model according to the variation trend of the second model loss difference corresponding to that model, so as to obtain the maximum second model loss difference corresponding to each fourth data processing model;
and acquiring the disturbance amount and the disturbed sample corresponding to the maximum second model loss difference, and disturbing the disturbed sample by that amount to form a new sample, so as to construct the adversarial training set from the new samples.
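The disturbance-amount search of claim 7 can be sketched as a sweep over candidate disturbance amounts that keeps, for each disturbed sample, the amount maximising the second model loss difference. The callable `loss_diff_fn`, standing in for the fit-and-compare step of claims 8–10, is an assumption of this sketch.

```python
import numpy as np

def build_adversarial_set(samples, candidate_deltas, loss_diff_fn):
    """For each (image, label) sample, pick the disturbance amount yielding the
    largest second model loss difference and emit the disturbed copy; the
    disturbed copies together form the adversarial training set."""
    adversarial = []
    for x, y in samples:
        diffs = [loss_diff_fn(x + d, y) for d in candidate_deltas]
        best = candidate_deltas[int(np.argmax(diffs))]
        adversarial.append((x + best, y))
    return adversarial
```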
8. The model training method according to claim 7, wherein the step of obtaining the second model loss difference between the third data processing model and each fourth data processing model on the test set specifically comprises:
acquiring a plurality of alternative data processing models obtained by training the preset data processing model with the optimized training set;
testing the plurality of alternative data processing models respectively with the test set, taking the optimal alternative data processing model as the final third data processing model, and acquiring the third model parameters of the final third data processing model;
fitting, according to the third model parameters, the fourth model parameters of the final fourth data processing model corresponding to the currently disturbed sample, wherein a plurality of alternative data processing models, obtained by training the preset data processing model with the sub-training set corresponding to the currently disturbed sample, are respectively tested with the test set and the optimal alternative data processing model is taken as the final fourth data processing model;
and performing influence analysis, by a robust statistical method, on the fourth model parameters and on the model loss of the final fourth data processing model on the test set, so as to obtain the second model loss difference corresponding to the currently disturbed sample.
9. The model training method according to claim 8, wherein the fourth model parameters are fitted on the basis of the third model parameters according to the following formula:

$$\hat{\theta}_{\varepsilon_2,z_{\delta},-z}=\arg\min_{\theta}\left\{\frac{1}{n}\sum_{i=1}^{n}L(z_i,\theta)+\varepsilon_2 L(z_{\delta},\theta)-\varepsilon_2 L(z,\theta)\right\}$$

wherein $\hat{\theta}_{\varepsilon_2,z_{\delta},-z}$ represents the fitted fourth model parameters of the final fourth data processing model; $z_{\delta}$ represents the new sample formed after adding a disturbance amount $\delta$ to the current disturbed sample $z$, with $z_{\delta}=(x+\delta,y)$, where $x$ represents the image sample of $z_{\delta}$ and $y$ represents the label of the image sample $x$; the fitting starts from $\hat{\theta}$, the third model parameters; $z_i$ represents the i-th sample in the training set, with $z_i=(x_i,y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample $x_i$, $i=1,\dots,n$; and $\varepsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\varepsilon_2=\frac{1}{n}$.
10. The model training method according to claim 8, wherein the step of obtaining the second model loss difference corresponding to the current disturbed sample specifically comprises:
constructing, based on the influence function theory in robust statistics, an influence function of the fourth model parameters and of the model loss of the final fourth data processing model on the test set, as shown in the following formula, and calculating the second model loss difference according to the influence function:

$$\Gamma_{pert,loss}(z,z_{test})=-\nabla_{\theta}L\big(z_{test},\hat{\theta}_{\varepsilon_2,z_{\delta},-z}\big)^{T}H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta}L\big(z,\hat{\theta}_{\varepsilon_2,z_{\delta},-z}\big)\,\delta$$

wherein $\Gamma_{pert,loss}(z,z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ represents the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and $T$ represents the transpose of the gradient vector $\nabla_{\theta}L(z_{test},\hat{\theta}_{\varepsilon_2,z_{\delta},-z})$; $\hat{\theta}_{\varepsilon_2,z_{\delta},-z}$ represents the fourth model parameters; $z$ represents the current disturbed sample; $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with $H_{\hat{\theta}}=\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2}L(z_i,\hat{\theta})$; and $\nabla_{x}\nabla_{\theta}L(z,\hat{\theta}_{\varepsilon_2,z_{\delta},-z})\,\delta$ corresponds to the first-order Taylor expansion of the difference between the loss gradients calculated from the loss function $L$ before and after the image sample $x$ of the sample $z$ is disturbed by the disturbance $\delta$.
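A numerical sketch of the perturbation-influence product in claim 10. As before, the dense Hessian solve and the names are illustrative assumptions; `cross_grad` stands in for the mixed derivative $\nabla_{x}\nabla_{\theta}L$.

```python
import numpy as np

def influence_pert_loss(grad_test, hessian, cross_grad, delta):
    """Gamma_pert,loss ~ -grad_test^T H^{-1} (cross_grad @ delta).

    cross_grad @ delta is the first-order change of the parameter gradient at
    sample z when its image is nudged by delta, so the product estimates how
    the test loss responds to the disturbance without retraining.
    """
    v = np.linalg.solve(hessian, cross_grad @ delta)
    return float(-grad_test @ v)
```

Sweeping `delta` and keeping the value that maximises this quantity is the adjustment loop described in claim 7.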
11. A model training apparatus, the apparatus comprising:
a first data processing model obtaining module configured to train a preset data processing model with an initial training set to obtain a first data processing model;
a first loss difference acquisition module configured to test the first data processing model and the plurality of second data processing models respectively by using a test set, and acquire a first model loss difference between the first data processing model and each of the second data processing models on the test set respectively;
a training set optimization module configured to obtain abnormal samples in the initial training set according to the first model loss difference value, and perform sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set;
a model training module configured to train the first data processing model with the optimized training set to obtain a final data processing model;
wherein different second data processing models are configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples.
12. The model training apparatus of claim 11, wherein the first loss difference acquisition module comprises a first candidate model acquisition unit, a first parameter acquisition unit, a second parameter acquisition unit, and a first loss difference acquisition unit;
the first alternative model acquisition unit is configured to acquire a plurality of alternative data processing models of the first data processing model obtained after the preset data processing model is trained by using the initial training set;
the first parameter obtaining unit is configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final first data processing model and obtain first model parameters of the final first data processing model;
the second parameter obtaining unit is configured to fit, according to the first model parameters, the second model parameters of the final second data processing model corresponding to the currently deleted sample, wherein a plurality of candidate data processing models, obtained by training the preset data processing model with the sub-training set corresponding to the currently deleted sample, are respectively tested with the test set and the optimal candidate data processing model is taken as the final second data processing model;
the first loss difference obtaining unit is configured to perform influence analysis on the model loss of the second model parameter and the final second data processing model on the test set by using a robust statistical method to obtain a first model loss difference corresponding to the currently deleted sample.
13. The model training apparatus of claim 12, wherein the second parameter obtaining unit is further configured to fit the second model parameters on the basis of the first model parameters according to the following formula:

$$\hat{\theta}_{\varepsilon_1,z_{del}}=\arg\min_{\theta}\left\{\frac{1}{n}\sum_{i=1}^{n}L(z_i,\theta)+\varepsilon_1 L(z_{del},\theta)\right\}$$

wherein $\hat{\theta}_{\varepsilon_1,z_{del}}$ represents the fitted second model parameters of the final second data processing model; $z_{del}$ represents the currently deleted sample; the fitting starts from $\hat{\theta}$, the first model parameters; $L$ represents the loss function used when training and testing the preset data processing model; $z_i$ represents the i-th sample in the training set, with $z_i=(x_i,y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample, $i=1,\dots,n$; and $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1=-\frac{1}{n}$.
14. The model training apparatus of claim 12, wherein the first loss difference obtaining unit is further configured to construct, based on the influence function theory in robust statistics, an influence function of the second model parameters and of the model loss of the final second data processing model on the test set, as shown in the following formula, and to calculate the first model loss difference according to the influence function:

$$\Gamma_{up,loss}(z_{del},z_{test})=-\nabla_{\theta}L\big(z_{test},\hat{\theta}_{\varepsilon_1,z_{del}}\big)^{T}H_{\hat{\theta}}^{-1}\nabla_{\theta}L\big(z_{del},\hat{\theta}_{\varepsilon_1,z_{del}}\big)$$

wherein $\Gamma_{up,loss}(z_{del},z_{test})$ represents the first model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ represents the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and $T$ represents the transpose of the gradient vector $\nabla_{\theta}L(z_{test},\hat{\theta}_{\varepsilon_1,z_{del}})$; $\hat{\theta}_{\varepsilon_1,z_{del}}$ represents the second model parameters; $z_{del}$ represents the currently deleted sample; and $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final second data processing model, with $H_{\hat{\theta}}=\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2}L(z_i,\hat{\theta})$.
15. The model training apparatus of claim 11, wherein the training set optimization module is further configured to:
sort the first model loss differences in ascending order, from negative to positive;
select, according to the sorting result, the first model loss differences whose rank is less than or equal to a preset rank value;
and acquire, for each second data processing model corresponding to a selected first model loss difference, the sample that was deleted when training that second data processing model, taking the deleted sample as an abnormal sample.
16. The model training apparatus of claim 11, wherein the training set optimization module is further configured to:
obtain the sample label of the abnormal sample;
judge whether the sample label is correct;
if the sample label is correct, delete the abnormal sample;
and if not, correct the sample label of the abnormal sample.
17. Model training apparatus as claimed in any of claims 11 to 16, characterized in that the apparatus further comprises:
a third data processing model obtaining module configured to train the preset data processing model by using the optimized training set to obtain a third data processing model;
a second loss difference obtaining module configured to respectively test the third data processing model and a plurality of fourth data processing models by using the test set, and obtain a second model loss difference between the third data processing model and each of the fourth data processing models on the test set; wherein different fourth data processing models are configured to be trained according to different sub-training sets under the optimized training set, and the different sub-training sets differ by one or more different disturbed samples;
a third loss difference obtaining module configured to adjust a disturbance amount of a disturbed sample corresponding to each fourth data processing model according to a variation trend of a second model loss difference corresponding to each fourth data processing model, so as to obtain a maximum second model loss difference corresponding to each fourth data processing model;
and an adversarial training set acquisition module configured to acquire the disturbance amount and the disturbed sample corresponding to the maximum second model loss difference, and to disturb the disturbed sample according to the disturbance amount to form a new sample, so as to construct the adversarial training set from the new sample.
18. The model training apparatus of claim 17, wherein the second loss difference acquisition module comprises a second candidate model acquisition unit, a third parameter acquisition unit, a fourth parameter acquisition unit, and a second loss difference acquisition unit;
the second alternative model acquisition unit is configured to acquire a plurality of alternative data processing models of the third data processing model obtained after the preset data processing model is trained by using the optimized training set;
the third parameter obtaining unit is configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final third data processing model and obtain third model parameters of the final third data processing model;
the fourth parameter obtaining unit is configured to fit, according to the third model parameters, the fourth model parameters of the final fourth data processing model corresponding to the currently disturbed sample, wherein a plurality of candidate data processing models, obtained by training the preset data processing model with the sub-training set corresponding to the currently disturbed sample, are respectively tested with the test set and the optimal candidate data processing model is taken as the final fourth data processing model;
the second loss difference obtaining unit is configured to perform influence analysis on the model loss of the fourth model parameter and the final fourth data processing model on the test set by using a robust statistical method to obtain a second model loss difference corresponding to the current disturbed sample.
19. The model training apparatus of claim 18, wherein the fourth parameter obtaining unit is further configured to fit the fourth model parameters on the basis of the third model parameters according to the following formula:

$$\hat{\theta}_{\varepsilon_2,z_{\delta},-z}=\arg\min_{\theta}\left\{\frac{1}{n}\sum_{i=1}^{n}L(z_i,\theta)+\varepsilon_2 L(z_{\delta},\theta)-\varepsilon_2 L(z,\theta)\right\}$$

wherein $\hat{\theta}_{\varepsilon_2,z_{\delta},-z}$ represents the fitted fourth model parameters of the final fourth data processing model; $z_{\delta}$ represents the new sample formed after adding a disturbance amount $\delta$ to the current disturbed sample $z$, with $z_{\delta}=(x+\delta,y)$, where $x$ represents the image sample of $z_{\delta}$ and $y$ represents the label of the image sample $x$; the fitting starts from $\hat{\theta}$, the third model parameters; $z_i$ represents the i-th sample in the training set, with $z_i=(x_i,y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample $x_i$, $i=1,\dots,n$; and $\varepsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\varepsilon_2=\frac{1}{n}$.
20. The model training apparatus of claim 18, wherein the second loss difference obtaining unit is further configured to construct, based on the influence function theory in robust statistics, an influence function of the fourth model parameters and of the model loss of the final fourth data processing model on the test set, as shown in the following formula, and to calculate the second model loss difference according to the influence function:

$$\Gamma_{pert,loss}(z,z_{test})=-\nabla_{\theta}L\big(z_{test},\hat{\theta}_{\varepsilon_2,z_{\delta},-z}\big)^{T}H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta}L\big(z,\hat{\theta}_{\varepsilon_2,z_{\delta},-z}\big)\,\delta$$

wherein $\Gamma_{pert,loss}(z,z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ represents the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and $T$ represents the transpose of the gradient vector $\nabla_{\theta}L(z_{test},\hat{\theta}_{\varepsilon_2,z_{\delta},-z})$; $\hat{\theta}_{\varepsilon_2,z_{\delta},-z}$ represents the fourth model parameters; $z$ represents the current disturbed sample; $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with $H_{\hat{\theta}}=\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2}L(z_i,\hat{\theta})$; and $\nabla_{x}\nabla_{\theta}L(z,\hat{\theta}_{\varepsilon_2,z_{\delta},-z})\,\delta$ corresponds to the first-order Taylor expansion of the difference between the loss gradients calculated from the loss function $L$ before and after the image sample $x$ of the sample $z$ is disturbed by the disturbance $\delta$.
21. A control apparatus comprising a processor and a storage device adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the model training method of any one of claims 1 to 10.
22. A computer-readable storage medium, in which a plurality of program codes are stored, characterized in that the program codes are adapted to be loaded and executed by a processor to perform the model training method of any one of claims 1 to 10.
CN202011427624.3A 2020-12-07 2020-12-07 Model training method, device and computer readable storage medium Pending CN112529209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011427624.3A CN112529209A (en) 2020-12-07 2020-12-07 Model training method, device and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN112529209A true CN112529209A (en) 2021-03-19

Family

ID=74996877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011427624.3A Pending CN112529209A (en) 2020-12-07 2020-12-07 Model training method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112529209A (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10007866B2 (en) * 2016-04-28 2018-06-26 Microsoft Technology Licensing, Llc Neural network image classifier
US20190325738A1 (en) * 2018-04-18 2019-10-24 Here Global B.V. Lane-level geometry and traffic information
CN108932527A (en) * 2018-06-06 2018-12-04 上海交通大学 Using cross-training model inspection to the method for resisting sample
CN110796153A (en) * 2018-08-01 2020-02-14 阿里巴巴集团控股有限公司 Training sample processing method and device
CN109606378A (en) * 2018-11-19 2019-04-12 江苏大学 Vehicle running state estimation method towards non-Gaussian noise environment
CN110532880A (en) * 2019-07-29 2019-12-03 深圳大学 Screening sample and expression recognition method, neural network, equipment and storage medium
CN110378961A (en) * 2019-09-11 2019-10-25 图谱未来(南京)人工智能研究院有限公司 Optimization method, critical point detection method, apparatus and the storage medium of model
CN110866528A (en) * 2019-10-28 2020-03-06 腾讯科技(深圳)有限公司 Model training method, energy consumption use efficiency prediction method, device and medium
CN110991657A (en) * 2019-11-22 2020-04-10 深圳市魔数智擎人工智能有限公司 Abnormal sample detection method based on machine learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
PANGWEI KOH ET AL.: ""Understanding Black-box Predictions via Influence Functions"", 《ARXIV:1703.04730V2》 *
SAMYADEEP BASU ET AL.: ""Influence Functions in Deep Learning Are Fragile"", 《ARXIV:2006.14651V1》 *
朱参世 等: ""一种参数容错辨识法判别和剔除野值方法研究"", 《微计算机信息》 *
王强 等: ""基于生成式-判别式混合模型的可解释性文档分类"", 《模式识别与人工智能》 *
袁兴梅 等: ""基于RSC模型和噪声去除的半监督训练方法"", 《计算机工程与科学》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239022A (en) * 2021-04-19 2021-08-10 浙江大学 Method and device for complementing missing data in medical diagnosis, electronic device and medium
WO2022222026A1 (en) * 2021-04-19 2022-10-27 浙江大学 Medical diagnosis missing data completion method and completion apparatus, and electronic device and medium
CN113505800A (en) * 2021-06-30 2021-10-15 深圳市慧鲤科技有限公司 Image processing method and training method, device, equipment and medium of model thereof


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210319