CN112529209A - Model training method, device and computer readable storage medium - Google Patents

Model training method, device and computer readable storage medium

Info

Publication number
CN112529209A
CN112529209A
Authority
CN
China
Prior art keywords: model, data processing, sample, training, processing model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011427624.3A
Other languages
Chinese (zh)
Inventor
孟嘉琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yuncong Enterprise Development Co ltd
Original Assignee
Shanghai Yuncong Enterprise Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yuncong Enterprise Development Co ltd filed Critical Shanghai Yuncong Enterprise Development Co ltd
Priority to CN202011427624.3A
Publication of CN112529209A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Abstract

The invention relates to the technical field of machine learning, and in particular provides a model training method, a model training device and a computer readable storage medium, aiming to solve the technical problem of how to improve the model training effect. According to the method provided by embodiments of the invention, a preset data processing model can be trained with an initial training set to obtain a first data processing model; a first model loss difference value between the first data processing model and each second data processing model on a test set is then acquired, wherein each second data processing model is trained on a different sub-training set of the initial training set, and the sub-training sets differ from one another by one or more deleted samples; finally, abnormal samples are obtained from the difference values, the initial training set is optimized accordingly, and the first data processing model is trained with the optimized training set. Based on these steps, the method can quickly and accurately screen abnormal samples out of the training set and greatly improves the model training effect.

Description

Model training method, device and computer readable storage medium
Technical Field
The invention relates to the technical field of machine learning, in particular to a model training method and device and a computer readable storage medium.
Background
Supervised learning in the technical field of machine learning mainly trains a model with training samples and their sample labels. To guarantee high model performance, training usually requires samples at a very large magnitude, such as millions of training samples, with an accurate label marked for each training sample in advance. For example, a data classification model trained with millions of training samples and their class labels can achieve high classification performance. Because the magnitude of the training samples is so large, accurate labeling of every training sample cannot be guaranteed, and if noise samples with wrong labels are used for model training, the training effect of the model is reduced.
Disclosure of Invention
In order to overcome the above drawbacks, the present invention provides a model training method, a model training device and a computer readable storage medium that solve, or at least partially solve, the technical problem of how to improve the model training effect.
In a first aspect, a model training method is provided, the method comprising:
training a preset data processing model by using an initial training set to obtain a first data processing model;
respectively testing the first data processing model and the plurality of second data processing models by using a test set to obtain a first model loss difference value of the first data processing model and each second data processing model on the test set;
obtaining abnormal samples in the initial training set according to the first model loss difference value, and carrying out sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set;
training the first data processing model by using the optimized training set to obtain a final data processing model;
wherein different second data processing models are configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples.
In one embodiment of the above model training method, the step of "obtaining a first model loss difference value of the first data processing model and each of the second data processing models on the test set" specifically includes:
acquiring a plurality of candidate data processing models of the first data processing model obtained after the preset data processing model is trained with the initial training set;
respectively testing the plurality of candidate data processing models with the test set, taking the optimal candidate data processing model as the final first data processing model and obtaining first model parameters of the final first data processing model;
fitting, according to the first model parameter, a second model parameter of the final second data processing model corresponding to the currently deleted sample, the final second data processing model being the optimal one among a plurality of candidate data processing models respectively tested with the test set, wherein the plurality of candidate data processing models are obtained by training the preset data processing model with the sub-training set corresponding to the currently deleted sample;
and performing, with a robust statistical method, influence analysis on the second model parameter and on the model loss of the final second data processing model on the test set, so as to obtain a first model loss difference value corresponding to the currently deleted sample.
In an embodiment of the above model training method, the second model parameter is obtained by fitting according to the first model parameter and a method shown in the following formula:

$$\hat{\theta}_{\varepsilon_1, z_{del}} = \mathop{\arg\min}_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_1 L(z_{del}, \theta)$$

wherein $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the fitted second model parameter of the final second data processing model; $z_{del}$ represents the currently deleted sample; $\hat{\theta}$ represents the first model parameter, from which the fitting starts; $L$ represents the loss function used when the preset data processing model is trained and tested; $z_i$ represents the $i$-th sample in the training set, with $z_i = (x_i, y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample, $i = 1, \ldots, n$; and $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1 = -\tfrac{1}{n}$.
In one technical solution of the above model training method, "obtaining the first model loss difference corresponding to the currently deleted sample" specifically includes:
constructing, based on the influence function theory in robust statistics, an influence function, shown in the following formula, of the second model parameter and of the model loss of the final second data processing model on the test set, and calculating the first model loss difference according to the influence function:

$$\gamma_{up,loss}(z_{del}, z_{test}) = \frac{1}{n}\,\nabla_{\theta} L\big(z_{test}, \hat{\theta}_{\varepsilon_1, z_{del}}\big)^{T}\, H_{\hat{\theta}_{\varepsilon_1, z_{del}}}^{-1}\, \nabla_{\theta} L\big(z_{del}, \hat{\theta}_{\varepsilon_1, z_{del}}\big)$$

wherein $\gamma_{up,loss}(z_{del}, z_{test})$ represents the first model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when the preset data processing model is trained and tested; $\nabla_{\theta}$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the second model parameter; $z_{del}$ represents the currently deleted sample; and $H_{\hat{\theta}_{\varepsilon_1, z_{del}}}$ represents the Hessian matrix of the empirical risk of the final second data processing model, with

$$H_{\hat{\theta}_{\varepsilon_1, z_{del}}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L\big(z_i, \hat{\theta}_{\varepsilon_1, z_{del}}\big).$$
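As an illustrative sketch (not the patent's prescribed implementation), the influence function above can be evaluated in closed form for a toy logistic regression, where the per-sample gradient and the Hessian of the empirical risk are analytic. Gradients are evaluated at the trained first-model parameters as an approximation to the fitted second-model parameters, the sign convention follows the text (a negative difference marks a harmful sample), and all helper names are hypothetical:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def grad_loss(w, x, y):
    """Per-sample gradient of the logistic log-loss: (sigmoid(x.w) - y) * x."""
    return (sigmoid(x @ w) - y) * x

def empirical_hessian(w, X):
    """Hessian of the empirical risk (1/n) sum_i L(z_i, w), lightly damped
    so that it remains invertible."""
    p = sigmoid(X @ w)
    return (X.T * (p * (1 - p))) @ X / len(X) + 1e-6 * np.eye(X.shape[1])

def gamma_up_loss(w, X, y, i_del, X_test, y_test):
    """First model loss difference for deleting training sample i_del:
    (1/n) * grad L(z_test)^T @ H^{-1} @ grad L(z_del), without retraining.
    A negative value means deletion would reduce the test loss, flagging
    the sample as abnormal."""
    n = len(X)
    H_inv = np.linalg.inv(empirical_hessian(w, X))
    g_test = np.mean([grad_loss(w, xt, yt)
                      for xt, yt in zip(X_test, y_test)], axis=0)
    return (1.0 / n) * g_test @ H_inv @ grad_loss(w, X[i_del], y[i_del])
```

On separable toy data with one mislabeled training point, ranking all samples by `gamma_up_loss` recovers the mislabeled point as the most negative entry, matching the leave-one-out retraining baseline at a fraction of its cost.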
in one embodiment of the above model training method, "obtaining abnormal samples in the initial training set according to the first model loss difference" specifically includes:
sorting the first model loss difference values in ascending order, from negative to positive;
selecting, according to the sorting result, the first model loss difference values whose rank is less than or equal to a preset rank value;
and acquiring, from the second data processing model corresponding to each selected first model loss difference, the sample deleted when that second data processing model was trained, and taking the deleted sample as an abnormal sample.
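The selection step above amounts to an ascending sort followed by a rank cutoff; a minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical first model loss differences, one per second data processing
# model (i.e., one per deleted training sample); negative values mean the
# deletion reduced the model loss on the test set.
loss_diffs = np.array([0.012, -0.034, 0.002, -0.110, 0.047])
preset_rank = 2                      # the preset order value

order = np.argsort(loss_diffs)       # ascending: from negative to positive
selected = [int(i) for i in order[:preset_rank]]
# The deleted samples behind the selected differences are the abnormal ones.
abnormal_samples = [i for i in selected if loss_diffs[i] < 0]
```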
In one embodiment of the above model training method, "adjusting the training set according to the abnormal sample" specifically includes:
obtaining a sample label of the abnormal sample;
judging whether the sample label is correct or not;
if the sample label is correct, deleting the abnormal sample;
and if not, correcting the sample label of the abnormal sample.
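The label check above can be sketched as a small helper; whether a label is "correct" would in practice come from manual or automated re-annotation, abstracted here as a mapping of corrected labels (all names hypothetical):

```python
def adjust_training_set(samples, abnormal_ids, corrected_labels):
    """Sample adjustment: drop abnormal samples whose labels were verified
    correct (they are genuinely harmful), and relabel those whose labels
    were wrong. corrected_labels maps sample id -> verified right label."""
    optimized = []
    for sid, (image, label) in samples.items():
        if sid in abnormal_ids:
            if sid in corrected_labels:
                optimized.append((image, corrected_labels[sid]))  # fix label
            # else: label was correct but sample is harmful -> delete (skip)
        else:
            optimized.append((image, label))
    return optimized
```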
In an embodiment of the above model training method, the method further includes obtaining an adversarial training set in the following manner, so as to train a preset generative adversarial network model with the optimized training set and the adversarial training set:
training the preset data processing model with the optimized training set to obtain a third data processing model;
respectively testing the third data processing model and a plurality of fourth data processing models with the test set to obtain a second model loss difference value between the third data processing model and each fourth data processing model on the test set; wherein different fourth data processing models are trained on different sub-training sets of the optimized training set, the sub-training sets differing by one or more disturbed samples;
adjusting the disturbance amount of the disturbed sample corresponding to each fourth data processing model according to the variation trend of the corresponding second model loss difference value, so as to obtain the maximum second model loss difference value corresponding to each fourth data processing model;
and acquiring the disturbance amount and disturbed sample corresponding to the maximum second model loss difference value, and disturbing the disturbed sample by the disturbance amount to form a new sample, so as to construct the adversarial training set from the new samples.
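As an illustrative stand-in for adjusting the disturbance amount until the second model loss difference is maximal, the sketch below greedily ascends the input-gradient of a toy logistic-regression loss under a fixed perturbation budget; this is a simplification of the procedure described above, and the names and hyperparameters are hypothetical:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sample_loss(w, x, y):
    """Log-loss of a single sample (x, y) under logistic-regression weights w."""
    p = np.clip(sigmoid(x @ w), 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def build_adversarial_sample(w, x, y, step=0.5, rounds=10, budget=1.0):
    """Search a disturbance delta that maximally increases the loss of the
    disturbed sample (x + delta, y), keeping delta within a fixed budget;
    the disturbed image with its unchanged label forms the new sample."""
    delta = np.zeros_like(x)
    for _ in range(rounds):
        # input-gradient of the log-loss: dL/dx = (sigmoid(x.w) - y) * w
        g = (sigmoid((x + delta) @ w) - y) * w
        delta = np.clip(delta + step * np.sign(g), -budget, budget)
    return x + delta, y
```

Collecting such `(x + delta, y)` pairs over the training samples yields the adversarial training set.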
In one technical solution of the above model training method, the step of "obtaining a second model loss difference value of the third data processing model and each fourth data processing model on the test set" specifically includes:
obtaining a plurality of candidate data processing models of the third data processing model after the preset data processing model is trained with the optimized training set;
respectively testing the plurality of candidate data processing models with the test set, taking the optimal candidate data processing model as the final third data processing model and obtaining third model parameters of the final third data processing model;
fitting, according to the third model parameter, a fourth model parameter of the final fourth data processing model corresponding to the current disturbed sample, the final fourth data processing model being the optimal one among a plurality of candidate data processing models respectively tested with the test set, wherein the plurality of candidate data processing models are obtained by training the preset data processing model with the sub-training set corresponding to the current disturbed sample;
and performing, with a robust statistical method, influence analysis on the fourth model parameter and on the model loss of the final fourth data processing model on the test set, so as to obtain a second model loss difference value corresponding to the current disturbed sample.
In an embodiment of the above model training method, the fourth model parameter is obtained by fitting according to the third model parameter and a method shown in the following formula:

$$\hat{\theta}_{\varepsilon_2, z_{\delta}, -z} = \mathop{\arg\min}_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_2 L(z_{\delta}, \theta) - \varepsilon_2 L(z, \theta)$$

wherein $\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}$ represents the fitted fourth model parameter of the final fourth data processing model; $z_{\delta}$ represents the new sample formed after adding a disturbance quantity $\delta$ to the current disturbed sample $z$, with $z_{\delta} = (x + \delta, y)$, where $x$ represents the image sample of $z$ and $y$ represents the label of the image sample $x$; $\hat{\theta}$ represents the third model parameter, from which the fitting starts; $z_i$ represents the $i$-th sample in the training set, with $z_i = (x_i, y_i)$, $x_i$ representing the image sample of $z_i$ and $y_i$ its label, $i = 1, \ldots, n$; and $\varepsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\varepsilon_2 = \tfrac{1}{n}$.
In one technical solution of the above model training method, "obtaining a second model loss difference corresponding to the current disturbed sample" specifically includes:
constructing, based on the influence function theory in robust statistics, an influence function, shown in the following formula, of the fourth model parameter and of the model loss of the final fourth data processing model on the test set, and calculating the second model loss difference according to the influence function:

$$\gamma_{pert,loss}(z, z_{test}) = -\frac{1}{n}\,\nabla_{\theta} L\big(z_{test}, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big)^{T}\, H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}}^{-1}\, \nabla_{x}\nabla_{\theta} L\big(z, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big)\,\delta$$

wherein $\gamma_{pert,loss}(z, z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when the preset data processing model is trained and tested; $\nabla_{\theta}$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}$ represents the fourth model parameter; $z$ represents the current disturbed sample; $H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with

$$H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L\big(z_i, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big);$$

and $\nabla_{x}\nabla_{\theta} L(z, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z})\,\delta$ is the first-order Taylor expansion of the difference of the loss gradients calculated from the loss function $L$ before and after the disturbance $\delta$ is added to the image sample $x$ of the sample $z$.
In a second aspect, there is provided a model training apparatus, the apparatus comprising:
a first data processing model acquisition module configured to train a preset data processing model with an initial training set to obtain a first data processing model;
a first loss difference acquisition module configured to test the first data processing model and the plurality of second data processing models respectively by using a test set, and acquire a first model loss difference between the first data processing model and each of the second data processing models on the test set respectively;
a training set optimization module configured to obtain abnormal samples in the initial training set according to the first model loss difference value, and perform sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set;
a model training module configured to train the first data processing model with the optimized training set to obtain a final data processing model;
wherein different second data processing models are configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples.
In one technical solution of the above model training device, the first loss difference obtaining module includes a first candidate model obtaining unit, a first parameter obtaining unit, a second parameter obtaining unit, and a first loss difference obtaining unit;
the first candidate model obtaining unit is configured to acquire a plurality of candidate data processing models of the first data processing model obtained after the preset data processing model is trained with the initial training set;
the first parameter obtaining unit is configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final first data processing model and obtain first model parameters of the final first data processing model;
the second parameter obtaining unit is configured to fit, according to the first model parameter, a plurality of candidate data processing models of a second data processing model corresponding to the currently deleted sample to be respectively tested by using the test set, obtain an optimal candidate data processing model as a final second data processing model, and obtain a second model parameter of the final second data processing model, wherein the plurality of candidate data processing models are obtained by training the preset data processing model by using a sub-training set corresponding to the currently deleted sample;
the first loss difference obtaining unit is configured to perform influence analysis on the model loss of the second model parameter and the final second data processing model on the test set by using a robust statistical method to obtain a first model loss difference corresponding to the currently deleted sample.
In an embodiment of the above model training device, the second parameter obtaining unit is further configured to fit the second model parameter according to the first model parameter and a method shown in the following formula:

$$\hat{\theta}_{\varepsilon_1, z_{del}} = \mathop{\arg\min}_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_1 L(z_{del}, \theta)$$

wherein $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the fitted second model parameter of the final second data processing model; $z_{del}$ represents the currently deleted sample; $\hat{\theta}$ represents the first model parameter, from which the fitting starts; $L$ represents the loss function used when the preset data processing model is trained and tested; $z_i$ represents the $i$-th sample in the training set, with $z_i = (x_i, y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample, $i = 1, \ldots, n$; and $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1 = -\tfrac{1}{n}$.
In an embodiment of the above model training apparatus, the first loss difference obtaining unit is further configured to construct, based on the influence function theory in robust statistics, an influence function, shown in the following formula, of the second model parameter and of the model loss of the final second data processing model on the test set, and to calculate the first model loss difference according to the influence function:

$$\gamma_{up,loss}(z_{del}, z_{test}) = \frac{1}{n}\,\nabla_{\theta} L\big(z_{test}, \hat{\theta}_{\varepsilon_1, z_{del}}\big)^{T}\, H_{\hat{\theta}_{\varepsilon_1, z_{del}}}^{-1}\, \nabla_{\theta} L\big(z_{del}, \hat{\theta}_{\varepsilon_1, z_{del}}\big)$$

wherein $\gamma_{up,loss}(z_{del}, z_{test})$ represents the first model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when the preset data processing model is trained and tested; $\nabla_{\theta}$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the second model parameter; $z_{del}$ represents the currently deleted sample; and $H_{\hat{\theta}_{\varepsilon_1, z_{del}}}$ represents the Hessian matrix of the empirical risk of the final second data processing model, with

$$H_{\hat{\theta}_{\varepsilon_1, z_{del}}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L\big(z_i, \hat{\theta}_{\varepsilon_1, z_{del}}\big).$$
in an aspect of the above model training apparatus, the training set optimization module is further configured to perform the following operations:
sorting the first model loss difference values in ascending order, from negative to positive;
selecting, according to the sorting result, the first model loss difference values whose rank is less than or equal to a preset rank value;
and acquiring, from the second data processing model corresponding to each selected first model loss difference, the sample deleted when that second data processing model was trained, and taking the deleted sample as an abnormal sample.
In an aspect of the above model training apparatus, the training set optimization module is further configured to perform the following operations:
obtaining a sample label of the abnormal sample;
judging whether the sample label is correct or not;
if the sample label is correct, deleting the abnormal sample;
and if not, correcting the sample label of the abnormal sample.
In an embodiment of the above model training apparatus, the apparatus further includes:
a third data processing model obtaining module configured to train the preset data processing model by using the optimized training set to obtain a third data processing model;
a second loss difference obtaining module configured to respectively test the third data processing model and a plurality of fourth data processing models by using the test set, and obtain a second model loss difference between the third data processing model and each of the fourth data processing models on the test set; wherein different fourth data processing models are configured to be trained according to different sub-training sets under the optimized training set, and the different sub-training sets differ by one or more different disturbed samples;
a third loss difference obtaining module configured to adjust a disturbance amount of a disturbed sample corresponding to each fourth data processing model according to a variation trend of a second model loss difference corresponding to each fourth data processing model, so as to obtain a maximum second model loss difference corresponding to each fourth data processing model;
and an adversarial training set acquisition module configured to acquire the disturbance amount and disturbed sample corresponding to the maximum second model loss difference value, and to disturb the disturbed sample by the disturbance amount to form a new sample, so as to construct the adversarial training set from the new samples.
In one technical solution of the above model training device, the second loss difference obtaining module includes a second candidate model obtaining unit, a third parameter obtaining unit, a fourth parameter obtaining unit, and a second loss difference obtaining unit;
the second candidate model obtaining unit is configured to acquire a plurality of candidate data processing models of the third data processing model obtained after the preset data processing model is trained with the optimized training set;
the third parameter obtaining unit is configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final third data processing model and obtain third model parameters of the final third data processing model;
the fourth parameter obtaining unit is configured to fit, according to the third model parameter, a plurality of candidate data processing models of a fourth data processing model corresponding to the currently disturbed sample to be respectively tested by using the test set, obtain an optimal candidate data processing model as a final fourth data processing model, and obtain a fourth model parameter of the final fourth data processing model, wherein the plurality of candidate data processing models are obtained by training the preset data processing model by using a sub-training set corresponding to the currently disturbed sample;
the second loss difference obtaining unit is configured to perform influence analysis on the model loss of the fourth model parameter and the final fourth data processing model on the test set by using a robust statistical method to obtain a second model loss difference corresponding to the current disturbed sample.
In an embodiment of the above model training device, the fourth parameter obtaining unit is further configured to fit the fourth model parameter according to the third model parameter and a method shown in the following formula:

$$\hat{\theta}_{\varepsilon_2, z_{\delta}, -z} = \mathop{\arg\min}_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_2 L(z_{\delta}, \theta) - \varepsilon_2 L(z, \theta)$$

wherein $\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}$ represents the fitted fourth model parameter of the final fourth data processing model; $z_{\delta}$ represents the new sample formed after adding a disturbance quantity $\delta$ to the current disturbed sample $z$, with $z_{\delta} = (x + \delta, y)$, where $x$ represents the image sample of $z$ and $y$ represents the label of the image sample $x$; $\hat{\theta}$ represents the third model parameter, from which the fitting starts; $z_i$ represents the $i$-th sample in the training set, with $z_i = (x_i, y_i)$, $x_i$ representing the image sample of $z_i$ and $y_i$ its label, $i = 1, \ldots, n$; and $\varepsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\varepsilon_2 = \tfrac{1}{n}$.
In an embodiment of the above model training apparatus, the second loss difference obtaining unit is further configured to construct, based on the influence function theory in robust statistics, an influence function, shown in the following formula, of the fourth model parameter and of the model loss of the final fourth data processing model on the test set, and to calculate the second model loss difference according to the influence function:

$$\gamma_{pert,loss}(z, z_{test}) = -\frac{1}{n}\,\nabla_{\theta} L\big(z_{test}, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big)^{T}\, H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}}^{-1}\, \nabla_{x}\nabla_{\theta} L\big(z, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big)\,\delta$$

wherein $\gamma_{pert,loss}(z, z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when the preset data processing model is trained and tested; $\nabla_{\theta}$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}$ represents the fourth model parameter; $z$ represents the current disturbed sample; $H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with

$$H_{\hat{\theta}_{\varepsilon_2, z_{\delta}, -z}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L\big(z_i, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z}\big);$$

and $\nabla_{x}\nabla_{\theta} L(z, \hat{\theta}_{\varepsilon_2, z_{\delta}, -z})\,\delta$ is the first-order Taylor expansion of the difference of the loss gradients calculated from the loss function $L$ before and after the disturbance $\delta$ is added to the image sample $x$ of the sample $z$.
In a third aspect, a control device is provided, comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and run by the processor to perform the model training method according to any of the above-mentioned aspects of the model training method.
In a fourth aspect, a computer readable storage medium is provided, having stored therein a plurality of program codes adapted to be loaded and run by a processor to perform the model training method according to any one of the above-mentioned aspects of the model training method.
One or more technical schemes of the invention at least have one or more of the following beneficial effects:
In the technical scheme of the invention, an initial training set can be used to train a preset data processing model to obtain a first data processing model; then the first data processing model and a plurality of second data processing models are respectively tested with a test set, and a first model loss difference value between the first data processing model and each second data processing model on the test set is obtained, wherein the different second data processing models are trained on different sub-training sets of the initial training set, the sub-training sets differing by one or more deleted samples. Abnormal samples in the initial training set are then obtained from the first model loss differences: if the model loss increases after a certain sample is deleted, the deleted sample is beneficial to model training; if the model loss decreases after a sample is deleted, the deleted sample is harmful to model training and can be judged to be an abnormal sample. Next, sample adjustment is carried out on the initial training set according to the abnormal samples to obtain an optimized training set. For example, the first model loss difference values are sorted in ascending order from negative to positive, the difference values ranked at or below a preset rank value are selected, the second data processing models corresponding to those difference values are obtained, the samples deleted when those models were trained (the abnormal samples) are acquired, and these samples are permanently deleted from the initial training set to form the optimized training set. Finally, the first data processing model is trained with the optimized training set to obtain the final data processing model.
Through the above steps, abnormal samples can be screened out of the training set quickly and accurately according to the variation of the first model loss differences, overcoming the defects of the prior art, in which training sets are checked manually, with low efficiency and a tendency toward missed and erroneous checks. At the same time, the model training effect is greatly improved.
Further, in the technical solution of the present invention, a robust statistical method may be adopted to perform influence analysis on the model parameters of the second data processing model and on its model loss on the test set, and the first model loss difference between the first and second data processing models on the test set is obtained directly from the result of this influence analysis. There is then no need to first train the preset data processing model with the initial training set to obtain the first data processing model, train it again with the sub-training set obtained after deleting a training sample to obtain the second data processing model, measure the model losses of both models on the test set, and finally subtract the two losses to obtain the first model loss difference. In other words, the training process of the second data processing model is omitted: the first model loss difference can be obtained directly through influence analysis, which greatly improves the efficiency of obtaining the difference values and thereby speeds up the screening of abnormal samples.
Drawings
Embodiments of the invention are described below with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram illustrating the main steps of a model training method according to one embodiment of the present invention;
FIG. 2 is a flow chart illustrating the main steps of a first model loss difference acquisition method according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating the main steps of a model training method according to another embodiment of the present invention;
FIG. 4 is a flow chart illustrating the main steps of a second model loss difference acquisition method according to an embodiment of the present invention;
FIG. 5 is a block diagram of the main structure of a model training apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram showing the main structure of a model training apparatus according to another embodiment of the present invention;
list of reference numerals:
31: a first data processing model acquisition module; 32: a first loss difference acquisition module; 33: a training set optimization module; 34: a model training module; 41: a third data processing model acquisition module; 42: a second loss difference acquisition module; 43: a third loss difference acquisition module; 44: an adversarial training set acquisition module.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, and memory; it may comprise software components such as program code; or it may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and the like. The term "A and/or B" denotes all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include A alone, B alone, or both A and B. The singular forms "a", "an" and "the" may include the plural forms as well.
Some terms to which the present invention relates are explained first.
Robust statistical methods refer to conventional statistical methods in the field of mathematical statistics that describe the effect of observations on estimators. In the embodiments of the present invention, the observation may be the weight of a training sample or the disturbance applied to a training sample, and the estimator is the model loss of the data processing model; that is, the purpose of using a robust statistical method in the embodiments is to analyze what influence changing the weight of a training sample, or adding a disturbance to it, has on the model loss of the data processing model. Influence function theory in robust statistics quantitatively analyzes the influence of an observation on an estimator by constructing an influence function (IF) between the two. It should be noted that robust statistical methods and influence function theory are conventional techniques in the field of mathematical statistics and, for the sake of brevity, are not described in detail here.
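As a toy illustration of this observed-value/estimator relationship (the example and code below are illustrative and not part of the patent's method), consider the sample mean: its influence function at an observation x is IF(x) = x − mean, and deleting x shifts the mean by exactly −(x − mean)/(n − 1), so the effect of every observation can be read off without re-estimating:

```python
# Toy illustration (not the patent's method): for the sample mean, the
# influence function of an observation x is IF(x) = x - mean, and deleting
# x changes the mean by exactly -(x - mean) / (n - 1).

def mean(xs):
    return sum(xs) / len(xs)

def loo_change(xs, i):
    """Exact change in the mean after deleting observation i."""
    rest = xs[:i] + xs[i + 1:]
    return mean(rest) - mean(xs)

def if_predicted_change(xs, i):
    """Change predicted from the influence function, without re-estimating."""
    return -(xs[i] - mean(xs)) / (len(xs) - 1)

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # the last observation is an outlier
for i in range(len(data)):
    assert abs(loo_change(data, i) - if_predicted_change(data, i)) < 1e-9
```

The outlying observation has by far the largest influence value, which is exactly the intuition the embodiments apply to training samples and model loss.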
A generative adversarial network model refers to a model constructed based on the generative adversarial network (GAN) architecture. GAN is a conventional network structure in the field of artificial intelligence technology and, for brevity of description, its specific structure, function and training method are not described here again.
At present, the traditional sample labeling method mainly relies on manual labeling for training samples of large magnitude, such as millions of training samples. However, manually labeling training samples at such a magnitude easily introduces labeling errors, and training a model with the resulting mislabeled noise samples reduces the training effect of the model. Moreover, because the training samples are so numerous, continuously checking them manually to screen out the noise samples is time-consuming and labor-intensive, and missed detections and false detections are likely to occur.
In the embodiment of the invention, an initial training set can be used to train a preset data processing model (for example, a data classification model) to obtain a first data processing model. Then, the first data processing model and a plurality of second data processing models are respectively tested with the test set to obtain a first model loss difference between the first data processing model and each second data processing model on the test set, wherein different second data processing models are trained from different sub-training sets under the initial training set, and different sub-training sets differ by one or more deleted samples. Further, abnormal samples in the initial training set (for example, samples with wrong labels) are obtained according to the first model loss differences: if the model loss increases after a certain sample is deleted (the first model loss difference is positive), the deleted sample is beneficial to model training; if the model loss decreases after a certain sample is deleted (the first model loss difference is negative), the deleted sample is harmful to model training, so the sample can be judged to be an abnormal sample. The initial training set is then sample-adjusted according to the abnormal samples to obtain an optimized training set. For example: the first model loss differences are sorted in reverse order from negative to positive, those whose rank is less than or equal to a preset sequence value are selected, the second data processing models corresponding to the selected differences are obtained, the samples deleted when training those second data processing models (the abnormal samples) are identified, and those samples are permanently deleted from the initial training set to form the optimized training set.
Finally, the first data processing model is trained with the optimized training set to obtain the final data processing model. Through the above steps, abnormal samples can be screened out of the training set quickly and accurately according to the changes in the first model loss differences, overcoming the drawbacks of the prior art, in which training sets are checked manually, checking is inefficient, and missed and false detections are common. At the same time, the model training effect is greatly improved.
Referring to FIG. 1, FIG. 1 is a flow chart illustrating the main steps of a model training method according to an embodiment of the present invention. As shown in fig. 1, the model training method in the embodiment of the present invention mainly includes the following steps:
step S101: and training a preset data processing model by using the initial training set to obtain a first data processing model.
It should be noted that, in this embodiment, a preset data processing model may be trained by using a conventional model training method in the field of machine learning technology, and for brevity of description, the model training method is not described herein again.
Step S102: and respectively testing the first data processing model and the plurality of second data processing models by using the test set to obtain a first model loss difference value of the first data processing model and each second data processing model on the test set.
The different second data processing models may be configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples. An example is as follows: if the initial training set includes sample 1, sample 2 and sample 3, then training the preset data processing model with samples 1-3 together yields the first data processing model described in step S101 above. If sample 1, sample 2 and sample 3 are respectively deleted from the initial training set to form sub-training sets 1 to 3, and the preset data processing model is then trained with sub-training sets 1 to 3 respectively, the second data processing models shown in table 1 below can be obtained.
TABLE 1

Sub-training set (deleted sample) | Trained model
Sub-training set 1 (sample 1 deleted) | Second data processing model 1
Sub-training set 2 (sample 2 deleted) | Second data processing model 2
Sub-training set 3 (sample 3 deleted) | Second data processing model 3
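The leave-one-out construction behind table 1 can be sketched as follows (a minimal illustration; the sample objects and names are assumed, not taken from the patent):

```python
# Minimal sketch of the table 1 construction: each sub-training set is the
# initial training set with exactly one sample deleted, and each sub-training
# set is used to train one second data processing model.

def build_sub_training_sets(initial_training_set):
    """Return one sub-training set per sample, with that sample deleted."""
    return [
        initial_training_set[:i] + initial_training_set[i + 1:]
        for i in range(len(initial_training_set))
    ]

initial = ["sample 1", "sample 2", "sample 3"]
subs = build_sub_training_sets(initial)
# subs[0] == ["sample 2", "sample 3"] trains second data processing model 1
```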
The first model loss difference is a difference obtained by subtracting the model loss of the first data processing model from the model loss of the second data processing model in the test set.
The model loss of the first data processing model in the test set refers to an average loss obtained by averaging model losses corresponding to each test sample obtained by testing the first data processing model with each test sample in the test set. It should be noted that, in this embodiment, after the model loss corresponding to each test sample is obtained, other conventional methods for obtaining model losses of the model on the test set in the machine learning technical field may also be adopted to calculate the model losses, so as to obtain the model loss of the first data processing model on the test set.
The model loss of the second data processing model in the test set refers to an average loss obtained by averaging model losses corresponding to each test sample obtained by testing the second data processing model with each test sample in the test set. It should be noted that, in this embodiment, after the model loss corresponding to each test sample is obtained, other conventional methods for obtaining model losses of the model on the test set in the machine learning technical field may also be adopted to calculate the model losses, so as to obtain the model loss of the second data processing model on the test set.
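The averaging and differencing described above can be sketched as follows (the `loss_fn(model, sample)` interface is an assumption for illustration, not an API defined by the patent):

```python
# Hedged sketch: the model loss on the test set is the average of the
# per-test-sample losses, and the first model loss difference subtracts the
# first model's test loss from the second model's test loss.

def model_loss_on_test_set(model, test_set, loss_fn):
    """Average the per-sample losses of `model` over the whole test set."""
    return sum(loss_fn(model, sample) for sample in test_set) / len(test_set)

def first_model_loss_difference(first_model, second_model, test_set, loss_fn):
    """loss(second model) - loss(first model), both measured on the test set."""
    return (model_loss_on_test_set(second_model, test_set, loss_fn)
            - model_loss_on_test_set(first_model, test_set, loss_fn))
```

A positive difference means the deletion behind the second model hurt the model (the deleted sample was beneficial); a negative difference marks the deleted sample as a candidate abnormal sample.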
In addition, it should be noted that, in this embodiment, the preset data processing model may be trained by using the same model training method as that used for acquiring the first data processing model, so as to obtain each second data processing model.
Referring to fig. 2, in the present embodiment, the first model loss difference between the first data processing model and each second data processing model on the test set can be obtained according to the following methods shown in steps S1021-S1024.
Step S1021: and acquiring a plurality of alternative data processing models of the first data processing model obtained after the preset data processing model is trained by using the initial training set.
The multiple candidate data processing models of the first data processing model refer to multiple models which can meet preset model training requirements (for example, the accuracy of data classification is greater than or equal to a preset accuracy threshold) by adjusting model parameters and/or model structures during training of a preset data processing model.
Step S1022: and respectively testing the plurality of alternative data processing models obtained in the step S1021 by using a test set to obtain an optimal alternative data processing model as a final first data processing model and obtain first model parameters of the final first data processing model.
It should be noted that, in this embodiment, each candidate data processing model may be tested by using a conventional model testing method in the field of machine learning technology, so as to obtain an optimal candidate data processing model from the trained multiple candidate data processing models.
Step S1023: according to the first model parameters, obtain by fitting the result of using the test set to respectively test the plurality of alternative data processing models of the second data processing model corresponding to the currently deleted sample, take the optimal alternative data processing model as the final second data processing model, and obtain the second model parameters of the final second data processing model.
A plurality of alternative data processing models of the second data processing model are obtained after a preset data processing model is trained by utilizing a sub-training set corresponding to a currently deleted sample, and the alternative data processing models refer to a plurality of models which can meet preset model training requirements (for example, the accuracy of data classification is more than or equal to a preset accuracy threshold) by adjusting model parameters and/or model structures when the preset data processing model is trained.
Specifically, in this embodiment, the second model parameters may be obtained by fitting from the first model parameters according to formula (1):

$$\hat{\theta}_{\varepsilon_1, z_{del}} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \varepsilon_1 L(z_{del}, \theta) \tag{1}$$

The meaning of each parameter in formula (1) is as follows: $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the second model parameters of the final second data processing model obtained by fitting; $z_{del}$ represents the currently deleted sample; $\hat{\theta}$ represents the first model parameters; $L$ represents the loss function used when training and testing the preset data processing model; $z_i$ represents the $i$th sample in the training set, with $z_i = (x_i, y_i)$, where $x_i$ represents the image sample of $z_i$, $y_i$ represents the label of the image sample, and $i = 1, \dots, n$; $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1 = -\frac{1}{n}$.
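A minimal numerical check of this fitting idea (a one-dimensional model and squared loss are assumed here; they are not the patent's model): with sample weight ε₁ = −1/n, the deleted sample's 1/n contribution to the empirical risk cancels, so the minimizer of the weighted objective coincides with the fit on the sub-training set:

```python
# Sketch of formula (1) under an assumed model y = theta * x with squared
# loss L(z, theta) = 0.5 * (theta * x - y)^2. With eps = -1/n, the deleted
# sample's contribution cancels and the minimizer equals the leave-one-out fit.

def fit_weighted(samples, z_del, eps):
    """Minimize (1/n) * sum_i L(z_i, theta) + eps * L(z_del, theta)."""
    n = len(samples)
    x_d, y_d = z_del
    sxx = sum(x * x for x, y in samples) / n + eps * x_d * x_d
    sxy = sum(x * y for x, y in samples) / n + eps * x_d * y_d
    return sxy / sxx  # closed-form least-squares minimizer

samples = [(1.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
theta_second = fit_weighted(samples, samples[2], -1.0 / len(samples))
# equals fitting on the first two samples alone: (1*1 + 2*2) / (1 + 4) = 1.0
```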
step S1024: and (4) performing influence analysis on the second model parameter obtained in the step (S1023) and the model loss of the final second data processing model on the test set by adopting a robust statistical method to obtain a first model loss difference value corresponding to the current deleted sample.
By using the robust statistical method to perform influence analysis on the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ and the model loss of the final second data processing model on the test set, the first model loss difference between the model losses of the first data processing model and the final second data processing model on the test set can be obtained directly. There is no need to first train the preset data processing model with the initial training set to obtain the first data processing model, then train the preset data processing model with the sub-training set to obtain the second data processing model, then obtain the model losses of both models on the test set, and finally compute the difference of the two model losses. In other words, the training process of the second data processing model is omitted: the first model loss difference can be obtained directly through influence analysis, which greatly improves the efficiency of obtaining the first model loss difference and facilitates rapid screening of abnormal samples.
In an implementation manner of the embodiment of the present invention, based on the robust statistical method, the first model loss difference may be obtained as follows.

Based on influence function theory in the robust statistical method, an influence function between the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ and the model loss of the final second data processing model on the test set is constructed as shown in formula (2), and the first model loss difference is calculated according to this influence function:

$$\Gamma_{up,loss}(z_{del}, z_{test}) = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{T} \, H_{\hat{\theta}}^{-1} \, \nabla_{\theta} L(z_{del}, \hat{\theta}) \tag{2}$$

The meaning of each parameter in formula (2) is as follows: $\Gamma_{up,loss}(z_{del}, z_{test})$ represents the first model loss difference; $z_{test}$ represents a test sample in the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ denotes the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and the superscript $T$ denotes the transpose of the calculated gradient vector; $\hat{\theta}_{\varepsilon_1, z_{del}}$ represents the second model parameters; $z_{del}$ represents the currently deleted sample; $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the data processing model, with $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})$.
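A one-dimensional numerical sketch of formula (2) (the model, squared loss, and data below are assumed for illustration and are not the patent's): the influence value is a single gradient-Hessian-gradient product formed at the trained parameters, with no retraining:

```python
# Sketch of formula (2) with model y = theta * x and squared loss
# L(z, theta) = 0.5 * (theta * x - y)^2 (illustrative, not the patent's data):
# Gamma = -g_test * H^{-1} * g_del, all evaluated at the trained parameters.

def fit_theta(samples):
    """Least-squares fit of y = theta * x (the first model parameters)."""
    return sum(x * y for x, y in samples) / sum(x * x for x, y in samples)

def grad(theta, x, y):
    """Gradient of L((x, y), theta) with respect to theta."""
    return (theta * x - y) * x

def hessian(samples):
    """Hessian of the empirical risk: H = (1/n) * sum_i x_i^2."""
    return sum(x * x for x, y in samples) / len(samples)

def loss_influence(samples, z_del, z_test):
    theta = fit_theta(samples)
    return -grad(theta, *z_test) * grad(theta, *z_del) / hessian(samples)

samples = [(1.0, 1.0), (2.0, 2.0), (1.0, 3.0)]  # (1.0, 3.0) fits the worst
gamma = loss_influence(samples, samples[2], (1.0, 1.0))  # → 5/18
```

Sign and scaling conventions for the resulting loss difference vary in the literature; here the value is computed exactly as formula (2) is written.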
The empirical risk refers to the average of the model losses of the data processing model over the samples in the training set, and it can measure the training effect of the data processing model: the smaller the empirical risk, the better the training effect; conversely, the larger the empirical risk, the worse the training effect. It should be noted that empirical risk is a conventional concept in the field of machine learning technology and is not described here again for brevity of description.
In addition, it should be noted that, in this embodiment, a conventional gradient calculation method in the field of mathematics may be adopted to calculate the gradient, with respect to the model parameters θ, of the loss value calculated by the loss function L; for brevity of description, the specific workings of the gradient calculation method are not repeated here.
The construction process of the influence function, shown in formula (2) above, between the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ and the model loss is briefly explained below.

First, the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ corresponding to the deleted sample $z_{del}$ are obtained by the fitting in step S1023. Then, based on influence function theory in the robust statistical method, an influence function between the preset sample weight $\varepsilon_1$ of the deleted sample $z_{del}$ and the model parameters is constructed as shown in formula (3):

$$\mathcal{I}(z_{del}) = \left. \frac{d \hat{\theta}_{\varepsilon_1, z_{del}}}{d \varepsilon_1} \right|_{\varepsilon_1 = 0} \tag{3}$$

Solving formula (3) according to formula (1) yields formula (4):

$$\left. \frac{d \hat{\theta}_{\varepsilon_1, z_{del}}}{d \varepsilon_1} \right|_{\varepsilon_1 = 0} = -H_{\hat{\theta}}^{-1} \, \nabla_{\theta} L(z_{del}, \hat{\theta}) \tag{4}$$

In formula (4), $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the data processing model corresponding to the deleted sample $z_{del}$, with $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})$, and $H_{\hat{\theta}}$ is a positive definite matrix.
The change of the model parameters caused by deleting a certain training sample can be estimated through the formulas (3) to (4), and the data processing model does not need to be trained by the sub-training set after deleting the training sample again to obtain new model parameters.
Then, using the chain rule, the influence that changing the weight of a certain training sample (increasing the sample weight $\varepsilon_1$) has on the test results on the test set is analyzed, i.e. the variation in model loss that $\hat{\theta}_{\varepsilon_1, z_{del}}$ incurs on the test set is evaluated. Specifically, an influence function as shown in formula (5) is constructed using the chain rule:

$$\Gamma_{up,loss}(z_{del}, z_{test}) = \left. \frac{d L(z_{test}, \hat{\theta}_{\varepsilon_1, z_{del}})}{d \varepsilon_1} \right|_{\varepsilon_1 = 0} \tag{5}$$

Expanding formula (5) yields formula (6):

$$\Gamma_{up,loss}(z_{del}, z_{test}) = \nabla_{\theta} L(z_{test}, \hat{\theta})^{T} \left. \frac{d \hat{\theta}_{\varepsilon_1, z_{del}}}{d \varepsilon_1} \right|_{\varepsilon_1 = 0} \tag{6}$$

Substituting formulas (3)-(4) into formula (6) gives the analytical expression of the influence function shown in formula (2).
In the present embodiment, by means of the influence function between the second model parameters $\hat{\theta}_{\varepsilon_1, z_{del}}$ and the model loss of the final second data processing model on the test set, the influence that a change in the model parameters has on the model loss of the data processing model can be analyzed quantitatively, and the influence value of the second model parameters on the model loss of the second data processing model on the test set (the first model loss difference) can be calculated directly from the influence function. This greatly improves the efficiency of obtaining the first model loss difference and facilitates rapid screening of abnormal samples.
Step S103: and obtaining abnormal samples in the initial training set according to the first model loss difference, and carrying out sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set.
In the present embodiment, the abnormal samples may be acquired according to the following steps 11 to 13.
Step 11: sort the first model loss differences in reverse order from negative to positive. A larger first model loss difference indicates that deleting the corresponding sample does the model more harm, i.e. the deleted sample is a beneficial sample for model training; a smaller first model loss difference indicates that deleting the corresponding sample does the model less harm, i.e. the deleted sample is a harmful sample for model training. Therefore, sorting the first model loss differences in reverse order from negative to positive amounts to sorting the deleted samples from most to least harmful, so the abnormal samples with the greatest degree of harm can be selected quickly from the front of the reverse ranking. Similarly, sorting the first model loss differences forward from positive to negative amounts to sorting the deleted samples from most to least beneficial, so the beneficial samples with the greatest degree of benefit can be selected quickly from the front of the forward ranking.
An example is as follows: if second data processing models 1-10 are trained with sub-training sets 1-10, formed by deleting samples 1-10 respectively, and the first model loss differences corresponding to second data processing models 1-10 are, in order, -1, -2, -3, -4, -5, 1, 2, 3, 4 and 5, then sorting the first model loss differences in reverse order from negative to positive yields -5, -4, -3, -2, -1, 1, 2, 3, 4 and 5.
Step 12: according to the reverse ranking obtained in step 11, select the first model loss differences whose rank is less than or equal to a preset sequence value.
Continuing with the above example, if the preset sequence value is 2, then the selected first model loss differences are -5 and -4.
Step 13: and acquiring a deleted sample during training of the second data processing model according to the second data processing model corresponding to the selected first model loss difference, and taking the deleted sample as an abnormal sample.
Continuing with the example above, if the first model loss differences selected in step 12 are -5 and -4, then the abnormal samples are samples 5 and 4 in the training set.
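Steps 11-13 can be sketched as follows (identifiers assumed; the numbers reproduce the example above, where a preset sequence value of 2 selects samples 5 and 4):

```python
# Sketch of steps 11-13: sort the first model loss differences from negative
# to positive and take the deleted samples behind the smallest differences
# (the most harmful ones) as abnormal samples.

def select_abnormal_samples(loss_diff_by_deleted_sample, preset_sequence_value):
    """loss_diff_by_deleted_sample maps deleted-sample id -> loss difference."""
    ranked = sorted(loss_diff_by_deleted_sample.items(), key=lambda kv: kv[1])
    return [sample_id for sample_id, _ in ranked[:preset_sequence_value]]

diffs = {1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: 1, 7: 2, 8: 3, 9: 4, 10: 5}
abnormal = select_abnormal_samples(diffs, 2)  # → [5, 4]
```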
In this embodiment, in addition to screening abnormal samples according to the above steps 11 to 13, several samples may be selected according to the reverse ranking of the first model loss differences for sample feature analysis. By analyzing what features these samples have in common, it can be determined which sample features the data processing model mainly attends to during training, and hence whether the attended features meet the training purpose; if not, the model parameters and/or model structure and/or training method of the data processing model can be adjusted in a targeted manner. Similarly, besides the reverse ranking, the first model loss differences may also be sorted in forward order, and several samples selected in order of increasing difference for feature analysis; analyzing the common features of these training samples likewise reveals which sample features the data processing model mainly attends to during training, so that it can be judged whether the attended features meet the training purpose and, if not, the model parameters and/or model structure and/or training method can be adjusted in a targeted manner.
In this embodiment, the training set may be sample adjusted according to the following steps 21-22.
Step 21: and acquiring a sample label of the abnormal sample.
Step 22: and judging whether the sample label is correct or not.
If the sample label is correct, the abnormal sample is not suitable for training the data processing model, because the data processing model cannot learn the corresponding capability from it. For example: the purpose of training the data processing model is to enable it to classify a vehicle in an image as a motor vehicle or a non-motor vehicle, so the sample labels of the training samples may include motor vehicle and non-motor vehicle. If the acquired abnormal sample is an image of a motor vehicle and its sample label is motor vehicle (the sample label is correct), but most of the vehicle in the image is blocked by buildings, the data processing model cannot learn from this sample whether the vehicle is a motor vehicle or a non-motor vehicle, and the sample therefore needs to be deleted.
If the sample label is wrong, the sample label of the abnormal sample is corrected directly. Continuing the above example: if the acquired abnormal sample is an image of a motor vehicle but its sample label is non-motor vehicle, the sample label is obviously wrong, and it is therefore corrected to motor vehicle.
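Steps 21-22 can be sketched as follows (the data layout and the `reviewed_label_of` callback, standing in for a human reviewer's judgment, are assumptions for illustration):

```python
# Hedged sketch of steps 21-22: delete an abnormal sample whose label is
# correct (the model cannot learn from it anyway), otherwise fix its label.

def adjust_training_set(training_set, abnormal_ids, reviewed_label_of):
    optimized = []
    for sample_id, image, label in training_set:
        if sample_id in abnormal_ids:
            true_label = reviewed_label_of(sample_id)
            if label == true_label:
                continue  # label correct but sample unusable: delete it
            label = true_label  # label wrong: correct it and keep the sample
        optimized.append((sample_id, image, label))
    return optimized

training_set = [
    (1, "img1", "motor vehicle"),      # correct label, but vehicle occluded
    (2, "img2", "non-motor vehicle"),  # wrong label: it is a motor vehicle
    (3, "img3", "motor vehicle"),
]
reviewed = {1: "motor vehicle", 2: "motor vehicle"}
optimized = adjust_training_set(training_set, {1, 2}, reviewed.get)
# → [(2, "img2", "motor vehicle"), (3, "img3", "motor vehicle")]
```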
Step S104: and training the first data processing model by using the optimized training set to obtain a final data processing model.
In this embodiment, the model training method used when obtaining the first data processing model may be adopted to continue training the first data processing model; alternatively, a conventional model training method in the field of machine learning technology different from the one described above may be used. For brevity of description, the specific process of model training is not repeated here.
Through the steps S101 to S104, the embodiment of the invention can quickly and accurately screen the abnormal samples from the training set, and overcomes the defects of low checking efficiency and easy omission and error detection caused by adopting an artificial checking mode to check the samples of the training set in the prior art.
Further, in an implementation manner of the embodiment of the present invention, after the optimized training set is obtained through the above steps S101 to S104, an adversarial training set may be generated from the optimized training set, and a preset generative adversarial network model may then be trained with the optimized training set and the generated adversarial training set simultaneously, so as to improve the model capability of the generative adversarial network model. Referring to fig. 3, in the present embodiment, the adversarial training set may be acquired according to the following steps S201 to S204.
Step S201: and training a preset data processing model by using the optimized training set to obtain a third data processing model.
It should be noted that, in this embodiment, a preset data processing model may be trained by using a conventional model training method in the field of machine learning technology, and for brevity of description, the model training method is not described herein again.
Step S202: and respectively testing the third data processing model and the plurality of fourth data processing models by using the test set to obtain a second model loss difference value of the third data processing model and each fourth data processing model on the test set.
The different fourth data processing models may be configured to be trained from different sub-training sets under the optimized training set, the different sub-training sets differing by one or more different disturbed samples. An example is as follows: if the optimized training set includes sample 1, sample 2 and sample 3, then training the preset data processing model with samples 1-3 together yields the third data processing model described in step S201 above. If sub-training set 1, formed by adding a disturbance to sample 1, sub-training set 2, formed by adding a disturbance to sample 2, and sub-training set 3, formed by adding a disturbance to sample 3, are used to respectively train the preset data processing model, the fourth data processing models shown in table 2 below can be obtained.
TABLE 2

Sub-training set (disturbed sample) | Trained model
Sub-training set 1 (sample 1 disturbed) | Fourth data processing model 1
Sub-training set 2 (sample 2 disturbed) | Fourth data processing model 2
Sub-training set 3 (sample 3 disturbed) | Fourth data processing model 3
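The construction behind table 2 can be sketched as follows (the perturbation itself is a stand-in; the patent does not fix a particular disturbance scheme here):

```python
# Sketch of the table 2 construction: each sub-training set disturbs exactly
# one sample of the optimized training set, and each sub-training set is used
# to train one fourth data processing model.

def build_disturbed_sets(training_set, disturb):
    """Return one sub-training set per sample, with only that sample disturbed."""
    sets = []
    for i in range(len(training_set)):
        disturbed = list(training_set)
        disturbed[i] = disturb(disturbed[i])
        sets.append(disturbed)
    return sets

shift = lambda x: x + 1  # stand-in additive perturbation
subs = build_disturbed_sets([10, 20, 30], shift)
# subs[0] == [11, 20, 30] trains fourth data processing model 1
```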
The second model loss difference is a difference obtained by subtracting the model loss of the third data processing model from the model loss of the fourth data processing model in the test set.
The model loss of the third data processing model in the test set refers to an average loss obtained by averaging model losses corresponding to each test sample obtained by testing the third data processing model with each test sample in the test set. It should be noted that, in this embodiment, after the model loss corresponding to each test sample is obtained, other conventional methods for obtaining model losses of the model on the test set in the machine learning technical field may also be adopted to calculate the model losses, so as to obtain the model loss of the third data processing model on the test set.
The model loss of the fourth data processing model in the test set refers to an average loss obtained by averaging model losses corresponding to each test sample obtained by testing the fourth data processing model with each test sample in the test set. It should be noted that, in this embodiment, after the model loss corresponding to each test sample is obtained, other conventional methods for obtaining model losses of the model on the test set in the machine learning technical field may also be adopted to calculate the model losses, so as to obtain the model loss of the fourth data processing model on the test set.
In addition, it should be noted that, in this embodiment, the preset data processing model may be trained by using the same model training method as that used for obtaining the third data processing model, so as to obtain each fourth data processing model.
Referring to fig. 4, in the present embodiment, the second model loss difference value may be obtained according to the following method shown in steps S2021 to S2024.
Step S2021: and acquiring a plurality of alternative data processing models of a third data processing model obtained after the preset data processing model is trained by using the optimized training set.
The plurality of candidate data processing models of the third data processing model refers to a plurality of models, each of which can meet a preset model training requirement (for example, a data classification accuracy greater than or equal to a preset accuracy threshold) by adjusting model parameters and/or the model structure during training of the preset data processing model.
Step S2022: test the plurality of candidate data processing models obtained in step S2021 with the test set, take the optimal candidate data processing model as the final third data processing model, and obtain the third model parameters of the final third data processing model.
It should be noted that, in this embodiment, each candidate data processing model may be tested with a model testing method conventional in the machine learning field, so as to select the optimal candidate from the plurality of candidate data processing models.
Step S2023: according to the third model parameters, fit the model parameters of a plurality of candidate data processing models for the fourth data processing model corresponding to the current disturbed sample, test these candidates with the test set, take the optimal candidate data processing model as the final fourth data processing model, and obtain the fourth model parameters of the final fourth data processing model.
The plurality of candidate data processing models corresponding to the current disturbed sample are obtained by training the preset data processing model with the sub-training set corresponding to the current disturbed sample; as above, a candidate data processing model is a model that can meet the preset model training requirement (for example, a data classification accuracy greater than or equal to a preset accuracy threshold) by adjusting model parameters and/or the model structure during training.
Specifically, in this embodiment, the fourth model parameter may be obtained by fitting according to the third model parameter and the method shown in the following equation (7):

$$\hat{\theta}_{\epsilon_2, z_\delta, -z} = \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon_2 L(z_\delta, \theta) - \epsilon_2 L(z, \theta) \tag{7}$$

The meaning of each parameter in equation (7) is as follows: $\hat{\theta}_{\epsilon_2, z_\delta, -z}$ represents the fourth model parameter of the final fourth data processing model obtained by fitting; $z_\delta$ represents the new sample formed after adding a disturbance quantity $\delta$ to the current disturbed sample $z$, i.e. $z_\delta = (x + \delta, y)$, where $x$ denotes the image sample of sample $z$ and $y$ denotes the label of the image sample $x$; $\hat{\theta}$ represents the third model parameter; $z_i$ represents the $i$-th sample in the training set, $z_i = (x_i, y_i)$, where $x_i$ represents the image sample of sample $z_i$ and $y_i$ represents the label of the image sample $x_i$, $i = 1, \ldots, n$; $L$ represents the loss function used when training and testing the preset data processing model; and $\epsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\epsilon_2 = \frac{1}{n}$.
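As a sketch only (the 1-D least-squares loss and toy samples below are assumptions, not part of the embodiment), the objective that equation (7) minimizes — mean training loss with the perturbed sample $z_\delta$ up-weighted and the original sample $z$ down-weighted by $\epsilon_2$ — can be evaluated as follows; minimizing it over the model parameter would yield the fourth model parameter:

```python
import numpy as np

def perturbed_objective(theta, samples, z, z_delta, eps2, loss):
    """Value of the objective in equation (7): the mean training loss, with
    the perturbed sample z_delta up-weighted by eps2 and the original
    disturbed sample z down-weighted by eps2. `loss` is a placeholder for
    whatever per-sample loss the embodiment actually uses."""
    base = np.mean([loss(theta, x, y) for x, y in samples])
    return base + eps2 * loss(theta, *z_delta) - eps2 * loss(theta, *z)

# Toy 1-D least-squares loss (illustrative only).
sq = lambda theta, x, y: (theta * x - y) ** 2
samples = [(1.0, 1.0), (2.0, 2.0)]
obj = perturbed_objective(1.0, samples, (1.0, 1.0), (1.5, 1.0), 0.5, sq)
print(obj)  # 0 + 0.5 * 0.25 - 0 = 0.125
```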
step S2024: and carrying out influence analysis on the model loss of the fourth model parameter and the final fourth data processing model on the test set by adopting a steady statistical method so as to obtain a second model loss difference value corresponding to the current disturbed sample.
By performing influence analysis on the fourth model parameters and the model loss of the final fourth data processing model on the test set with a robust statistical method, the second model loss difference between the model losses of the third data processing model and the final fourth data processing model on the test set can be obtained directly. It is therefore unnecessary to first train the preset data processing model with the optimized training set (in which no training sample is disturbed), then train it with the sub-training set in which a training sample has been disturbed to obtain the fourth data processing model, then separately obtain the model losses of the third and fourth data processing models on the test set, and finally subtract the two model losses. In other words, the training process of the fourth data processing model is omitted: the second model loss difference can be obtained directly through influence analysis, which greatly improves the acquisition efficiency of the second model loss difference and facilitates the rapid generation of the adversarial training set.
In one implementation of the embodiment of the present invention, the second model loss difference may be obtained with the robust statistical method as follows:
Based on the influence function theory in the robust statistical method, an influence function of the fourth model parameters on the model loss of the final fourth data processing model over the test set is constructed as shown in the following equation (8), and the second model loss difference is calculated according to this influence function:

$$\Gamma_{pert,loss}(z, z_{test})^{T} = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{T}\, H_{\hat{\theta}}^{-1}\, \nabla_{x}\nabla_{\theta} L(z, \hat{\theta}) \tag{8}$$

The meaning of each parameter in equation (8) is as follows: $\Gamma_{pert,loss}(z, z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta} L$ represents the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; the superscript $T$ represents the transpose of the calculated gradient vector; $\hat{\theta}$ represents the model parameter at which the gradients are evaluated (at $\delta = 0$ the fourth model parameter $\hat{\theta}_{z_\delta, -z}$ coincides with the third model parameter $\hat{\theta}$); $z$ represents the current disturbed sample; $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with $H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})$; and $\nabla_{x}\nabla_{\theta} L(z, \hat{\theta})\,\delta$ corresponds to the first-order Taylor expansion of the difference between the loss values calculated from the loss function $L$ before and after the disturbance quantity $\delta$ is added to the image sample $x$ of sample $z$.
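A minimal numerical sketch of equation (8) follows. The gradient, Hessian, and mixed-derivative arrays are supplied directly as toy values — in practice they would come from automatic differentiation, which is outside this illustration:

```python
import numpy as np

def pert_loss_influence(grad_theta_test, hessian, mixed_grad_z):
    """Equation (8): Gamma_pert,loss(z, z_test)^T
       = -grad_theta L(z_test, theta)^T  H^{-1}  grad_x grad_theta L(z, theta).

    grad_theta_test : (p,)  gradient of the test loss w.r.t. the model parameters
    hessian         : (p,p) Hessian of the empirical risk
    mixed_grad_z    : (p,d) mixed derivative grad_x grad_theta L at sample z
    Returns a (d,) vector whose dot product with a disturbance delta
    approximates the resulting change in test loss."""
    return -grad_theta_test @ np.linalg.solve(hessian, mixed_grad_z)

# Illustrative numbers only.
g_test = np.array([1.0, 0.0])
H = 2.0 * np.eye(2)
mixed = np.array([[1.0], [0.0]])
gamma = pert_loss_influence(g_test, H, mixed)
print(gamma)  # [-0.5]
```

Using `np.linalg.solve` rather than explicitly inverting the Hessian is the usual numerically stable choice for the $H^{-1}$ product.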
Empirical risk refers to the average of the model losses of the data processing model over the training samples in the training set, and can measure the training effect of the data processing model: the smaller the empirical risk, the better the training effect; conversely, the larger the empirical risk, the worse the training effect. It should be noted that empirical risk is a conventional concept in the machine learning field and, for brevity of description, is not described in detail here.
The first-order Taylor expansion refers to the first-order expansion of the Taylor series formula. It should be noted that the Taylor series is a conventional mathematical technique and, for brevity of description, is not described in detail here.
It should be noted that, in this embodiment, a conventional gradient calculation method may be adopted to calculate the gradient, with respect to the model parameter $\theta$, of the loss value calculated from the loss function $L$; for brevity of description, the specific working of the gradient calculation method is not described in detail here.
The process of constructing the influence function of the fourth model parameters on the model loss over the test set shown in equation (8) is briefly described below.

First, the fourth model parameter $\hat{\theta}_{\epsilon_2, z_\delta, -z}$, obtained in step S2023 by fitting with the disturbed sample $z$ and the disturbance quantity $\delta$, is acquired.

Then, based on the influence function theory in the robust statistical method, the influence function of the disturbance quantity $\delta$ on the model parameter $\hat{\theta}_{\epsilon_2, z_\delta, -z}$ shown in the following equation (9) is constructed:

$$\left.\frac{d\hat{\theta}_{\epsilon_2, z_\delta, -z}}{d\epsilon_2}\right|_{\epsilon_2 = 0} = -H_{\hat{\theta}}^{-1}\left(\nabla_{\theta} L(z_\delta, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta})\right) \tag{9}$$

Expanding equation (9) yields the following equation (10):

$$\hat{\theta}_{\epsilon_2, z_\delta, -z} - \hat{\theta} \approx -\epsilon_2 H_{\hat{\theta}}^{-1}\left(\nabla_{\theta} L(z_\delta, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta})\right) \tag{10}$$

If the image samples $x$ in the training set are continuous and $\epsilon_2$ is very small, then equation (10) holds for any disturbance $\delta$. When the disturbance quantity $\delta$ is small, the gradient difference can be approximated by a first-order expansion:

$$\nabla_{\theta} L(z_\delta, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta}) \approx \nabla_{x}\nabla_{\theta} L(z, \hat{\theta})\,\delta$$

Thus the influence function of the disturbance quantity $\delta$ on the model parameter $\hat{\theta}_{\epsilon_2, z_\delta, -z}$ can be expressed approximately in the analytical form shown in the following equation (11):

$$\left.\frac{d\hat{\theta}_{\epsilon_2, z_\delta, -z}}{d\epsilon_2}\right|_{\epsilon_2 = 0} \approx -H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta} L(z, \hat{\theta})\,\delta \tag{11}$$

If the disturbed sample $z$ is replaced by the sample $z_\delta$ formed after the disturbance quantity $\delta$ is added, the variation of the model parameter can be approximated as shown in the following equation (12):

$$\hat{\theta}_{z_\delta, -z} - \hat{\theta} \approx -\frac{1}{n}\, H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta} L(z, \hat{\theta})\,\delta \tag{12}$$

The variation of the model loss on the test set caused by the disturbance quantity $\delta$ can then be calculated with the following equation (13):

$$\left.\nabla_{\delta} L\!\left(z_{test}, \hat{\theta}_{z_\delta, -z}\right)\right|_{\delta = 0} = -\nabla_{\theta} L(z_{test}, \hat{\theta})^{T}\, H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta} L(z, \hat{\theta}) \tag{13}$$

Solving equation (13) yields the relation, shown in equation (8), between the fourth model parameter $\hat{\theta}_{z_\delta, -z}$ and the model loss.
After the disturbance quantity $\delta$ is added to the disturbed sample and the data processing model is tested with the test set, the model loss of the data processing model increases by $\Gamma_{pert,loss}(z, z_{test})^{T}\delta$. Therefore, according to this model loss increase, the value of the disturbance quantity $\delta$ can be adjusted to obtain the maximum model loss increase $\Gamma_{pert,loss}(z, z_{test})^{T}\delta$. Further, in this embodiment, $\Gamma_{pert,loss}(z, z_{test})$ can be measured to analyze the ability of the third data processing model to resist disturbances of the training set. The larger $\Gamma_{pert,loss}(z, z_{test})$ is, the weaker the resistance to training set disturbances, and the larger the influence on the model loss of the third data processing model after a disturbance is added to the training set; conversely, the smaller $\Gamma_{pert,loss}(z, z_{test})$ is, the stronger the resistance to training set disturbances, and the smaller the influence on the model loss of the third data processing model after a disturbance is added.
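To illustrate the relation "model loss increases by $\Gamma^{T}\delta$", the following sketch assumes an L2 budget on the disturbance (an assumption not stated in the embodiment) and shows that the predicted loss increase is maximized by aligning $\delta$ with $\Gamma$:

```python
import numpy as np

def loss_change(gamma, delta):
    """Predicted change in test-set model loss when the disturbed sample
    gains the disturbance delta (the quantity Gamma_pert,loss^T delta)."""
    return float(gamma @ delta)

def worst_case_delta(gamma, budget):
    """Under an L2 budget ||delta|| <= budget, Gamma^T delta is maximized
    by pointing delta along Gamma."""
    return budget * gamma / np.linalg.norm(gamma)

gamma = np.array([3.0, 4.0])
delta = worst_case_delta(gamma, budget=1.0)
change = loss_change(gamma, delta)
print(change)  # equals ||gamma|| = 5.0
```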
In the present embodiment, by means of the influence function of the fourth model parameter $\hat{\theta}_{z_\delta, -z}$ on the model loss, the influence of a change of the model parameters on the model loss of the fourth data processing model can be analyzed quantitatively, and the influence value of the fourth model parameter on the model loss of the fourth data processing model (the second model loss difference) can be calculated directly according to the influence function, so that the efficiency of obtaining the second model loss difference is greatly improved, which facilitates the rapid generation of the adversarial training set.
Step S203: and adjusting the disturbance amount of the disturbed sample corresponding to each fourth data processing model according to the variation trend of the second model loss difference value corresponding to each fourth data processing model, so as to obtain the maximum second model loss difference value corresponding to each fourth data processing model.
An example is as follows: if, as the disturbance quantity of a disturbed sample is increased, the corresponding second model loss difference first increases and then decreases, the disturbance quantity is increased continuously until the second model loss difference changes from increasing to decreasing, at which point the disturbance stops being increased.
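The stopping rule above can be sketched as a simple hill climb. The fixed step size, iteration cap, and the toy unimodal curve are illustrative assumptions; `loss_diff` stands in for the influence-function computation of step S2024:

```python
def maximize_loss_difference(loss_diff, step=0.1, max_iters=100):
    """Grow the disturbance magnitude while the second model loss difference
    keeps increasing, and stop at the first decrease (the rule illustrated
    above). `loss_diff` maps a disturbance magnitude to the difference
    value; it is a placeholder for the influence-function computation."""
    best_mag, best_val = 0.0, loss_diff(0.0)
    mag = step
    for _ in range(max_iters):
        val = loss_diff(mag)
        if val <= best_val:  # difference changed from increasing to decreasing
            break
        best_mag, best_val = mag, val
        mag += step
    return best_mag, best_val

# Toy unimodal curve peaking at magnitude 1.0.
mag, val = maximize_loss_difference(lambda d: -(d - 1.0) ** 2)
print(round(mag, 1))  # 1.0
```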
Step S204: acquire the disturbance quantity and the disturbed sample corresponding to the largest second model loss difference, disturb the disturbed sample according to that disturbance quantity to form a new sample, and construct the adversarial training set from the new samples; that is, each sample with the disturbance added is taken as an adversarial sample, and all the adversarial samples together form the adversarial training set.
Through the above steps S201 to S204, the embodiment of the present invention can quickly and accurately generate a large batch of adversarial samples, thereby improving the training efficiency of the generative adversarial network model and enabling the trained model to achieve higher performance.
It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art will understand that, in order to achieve the effect of the present invention, different steps do not necessarily need to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the protection scope of the present invention.
Furthermore, the invention also provides a model training device.
Referring to fig. 5, fig. 5 is a main block diagram of a model training apparatus according to an embodiment of the present invention. As shown in fig. 5, the model training apparatus in the embodiment of the present invention mainly includes a first data processing model obtaining module 31, a first loss difference obtaining module 32, a training set optimizing module 33, and a model training module 34. In some embodiments, one or more of the first data processing model acquisition module 31, the first loss difference acquisition module 32, the training set optimization module 33, and the model training module 34 may be combined together into one module. In some embodiments, the first data processing model obtaining module 31 may be configured to train a preset data processing model with an initial training set, and obtain the first data processing model. The first loss difference acquisition module 32 may be configured to test the first data processing model and the plurality of second data processing models respectively with the test set, and acquire a first model loss difference between the first data processing model and each of the second data processing models on the test set. The training set optimization module 33 may be configured to obtain abnormal samples in the initial training set according to the first model loss difference, and perform sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set. Model training module 34 may be configured to train the first data processing model with the optimized training set to obtain a final data processing model. Wherein the different second data processing models may be configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples. In one embodiment, the description of the specific implementation function may refer to steps S101 to S104.
In one embodiment, the first loss difference acquisition module 32 may include a first candidate model acquisition unit, a first parameter acquisition unit, a second parameter acquisition unit, and a first loss difference acquisition unit. In this embodiment, the first candidate model obtaining unit may be configured to obtain a plurality of candidate data processing models of the first data processing model obtained after a preset data processing model is trained by using the initial training set. The first parameter obtaining unit may be configured to test the plurality of candidate data processing models respectively using the test set to obtain an optimal candidate data processing model as a final first data processing model and obtain first model parameters of the final first data processing model. The second parameter obtaining unit may be configured to fit, according to the first model parameter, to respectively test a plurality of candidate data processing models of the second data processing model corresponding to the currently deleted sample by using the test set, obtain an optimal candidate data processing model as a final second data processing model, and obtain a second model parameter of the final second data processing model, where the plurality of candidate data processing models are obtained by training a preset data processing model by using a sub-training set corresponding to the currently deleted sample. The first loss difference obtaining unit may be configured to perform impact analysis on the model loss of the second model parameter and the final second data processing model on the test set by using a robust statistical method to obtain a first model loss difference corresponding to the currently deleted sample. In one embodiment, the description of the specific implementation function may be referred to in step S102.
In one embodiment, the second parameter obtaining unit may be further configured to fit the second model parameters based on the first model parameters and according to a method shown in formula (1). In one embodiment, the description of the specific implementation function may be referred to in step S102.
In one embodiment, the first loss difference obtaining unit may be further configured to construct an influence function of the model loss of the second model parameter and the final second data processing model on the test set as shown in equation (2) based on an influence function theory in the robust statistical method, and calculate the first model loss difference according to the influence function. In one embodiment, the description of the specific implementation function may be referred to in step S102.
In one embodiment, the training set optimization module 33 may be further configured to perform the following operations: reversely ordering the first model loss difference from negative to positive; selecting a first model loss difference value with the sorting order less than or equal to a preset order value according to a reverse sorting result; and acquiring a deleted sample during training of the second data processing model according to the second data processing model corresponding to the selected first model loss difference, and taking the deleted sample as an abnormal sample. In one embodiment, the description of the specific implementation function may refer to that in step S103.
In one embodiment, the training set optimization module 33 may be further configured to perform the following operations: obtaining a sample label of an abnormal sample; judging whether the sample label is correct or not; if the abnormal sample is correct, deleting the abnormal sample; and if not, correcting the sample label of the abnormal sample. In one embodiment, the description of the specific implementation function may refer to that in step S103.
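The label check and correction described above can be sketched as follows; the two verification callables stand in for the (typically manual) label inspection, and all sample values are illustrative assumptions:

```python
def adjust_training_set(training_set, abnormal_samples, label_is_correct, corrected_label):
    """Sample adjustment described above: an abnormal sample with a correct
    label is deleted; one with an incorrect label is re-labelled and kept."""
    adjusted = []
    for sample in training_set:
        if sample not in abnormal_samples:
            adjusted.append(sample)  # normal sample: keep as-is
        elif not label_is_correct(sample):
            x, _ = sample
            adjusted.append((x, corrected_label(sample)))  # fix the label
        # abnormal samples whose labels are correct are deleted
    return adjusted

training_set = [("x1", 0), ("x2", 1), ("x3", 0)]
abnormal = {("x2", 1), ("x3", 0)}
adjusted = adjust_training_set(
    training_set, abnormal,
    label_is_correct=lambda s: s[0] == "x2",  # pretend x2's label checks out
    corrected_label=lambda s: 1,              # pretend x3 should be labelled 1
)
print(adjusted)  # [('x1', 0), ('x3', 1)]
```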
Referring to fig. 6, in another embodiment of the model training apparatus according to the present invention, the model training apparatus may further include a third data processing model obtaining module 41, a second loss difference obtaining module 42, a third loss difference obtaining module 43, and a confrontation training set obtaining module 44. In some embodiments, one or more of the third data processing model acquisition module 41, the second loss difference acquisition module 42, the third loss difference acquisition module 43, and the antagonistic training set acquisition module 44 may be combined together into one module. In some embodiments, the third data processing model obtaining module 41 may be configured to train a preset data processing model by using the optimized training set, and obtain the third data processing model. The second loss difference obtaining module 42 may be configured to test the third data processing model and the plurality of fourth data processing models respectively by using the test set, and obtain a second model loss difference between the third data processing model and each fourth data processing model on the test set; wherein different fourth data processing models may be configured to be trained according to different sub-training sets under the optimized training set, and the different sub-training sets differ by one or more different disturbed samples. The third loss difference obtaining module 43 may be configured to adjust the disturbance amount of the disturbed sample corresponding to each fourth data processing model according to the variation trend of the second model loss difference corresponding to each fourth data processing model, so as to obtain the maximum second model loss difference corresponding to each fourth data processing model. 
The opposing training set obtaining module 44 may be configured to obtain a perturbation amount corresponding to the largest second model loss difference value and the perturbed sample, and perturb the perturbed sample according to the perturbation amount to form a new sample, so as to construct the opposing training set according to the new sample. In one embodiment, the description of the specific implementation function may refer to steps S201 to S204.
In one embodiment, the second loss difference acquisition module 42 may include a second candidate model acquisition unit, a third parameter acquisition unit, a fourth parameter acquisition unit, and a second loss difference acquisition unit. In this embodiment, the second candidate model obtaining unit may be configured to obtain a plurality of candidate data processing models of a third data processing model obtained after a preset data processing model is trained by using the optimized training set. The third parameter obtaining unit may be configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final third data processing model and obtain third model parameters of the final third data processing model; the fourth parameter obtaining unit may be configured to fit, according to the third model parameter, to respectively test a plurality of candidate data processing models of a fourth data processing model corresponding to the currently disturbed sample by using the test set, obtain an optimal candidate data processing model as a final fourth data processing model, and obtain a fourth model parameter of the final fourth data processing model, where the plurality of candidate data processing models are obtained by training a preset data processing model by using a sub-training set corresponding to the currently disturbed sample; the second loss difference obtaining unit may be configured to perform influence analysis on the model loss of the fourth model parameter and the final fourth data processing model on the test set by using a robust statistical method to obtain a second model loss difference corresponding to the current disturbed sample. In one embodiment, the description of the specific implementation function may be referred to in step S202.
In one embodiment, the fourth parameter obtaining unit may be further configured to fit the fourth model parameter based on the third model parameter and according to a method shown in formula (7). In one embodiment, the description of the specific implementation function may be referred to in step S202.
In one embodiment, the second loss difference obtaining unit may be further configured to construct an influence function of the model loss of the fourth model parameter and the final fourth model parameter on the test set, which is shown in formula (8), based on an influence function theory in the robust statistical method, and calculate the second model loss difference according to the influence function. In one embodiment, the description of the specific implementation function may be referred to in step S202.
The above-mentioned model training device is used for executing the embodiment of the model training method shown in fig. 1-4, and the technical principles, the solved technical problems and the generated technical effects of the two are similar, and it can be clearly understood by those skilled in the art that for convenience and simplicity of description, the specific working process and related descriptions of the model training device may refer to the contents described in the embodiment of the model training method, and are not repeated herein.
It will be understood by those skilled in the art that all or part of the flow of the method according to the above-described embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and used to implement the steps of the above-described method embodiments when executed by a processor. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, such as a USB flash disk, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory, random access memory, electrical carrier signal, telecommunication signal, software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals.
Further, the invention also provides a computer readable storage medium. In one computer-readable storage medium embodiment according to the present invention, a computer-readable storage medium may be configured to store a program that executes the model training method of the above-described method embodiment, which may be loaded and executed by a processor to implement the above-described model training method. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The computer readable storage medium may be a storage device formed by including various electronic devices, and optionally, the computer readable storage medium is a non-transitory computer readable storage medium in the embodiment of the present invention.
Furthermore, the invention also provides a control device. In an embodiment of the control device according to the invention, the control device comprises a processor and a memory device, the memory device may be configured to store a program for performing the model training method of the above-described method embodiment, and the processor may be configured to execute the program in the memory device, the program including but not limited to the program for performing the model training method of the above-described method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The control device may be a control device apparatus formed including various electronic apparatuses.
Further, it should be understood that, since the modules are only configured to illustrate the functional units of the system of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the system may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.
So far, the technical solution of the present invention has been described with reference to one embodiment shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (22)

1. A method of model training, the method comprising:
training a preset data processing model by using an initial training set to obtain a first data processing model;
respectively testing the first data processing model and the plurality of second data processing models by using a test set to obtain a first model loss difference value of the first data processing model and each second data processing model on the test set;
obtaining abnormal samples in the initial training set according to the first model loss difference value, and carrying out sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set;
training the first data processing model by using the optimized training set to obtain a final data processing model;
wherein different second data processing models are configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples.
2. The model training method of claim 1, wherein the step of obtaining the first model loss difference between the first data processing model and each of the second data processing models on the test set specifically comprises:
acquiring a plurality of alternative data processing models of the first data processing model obtained after the preset data processing model is trained by using the initial training set;
respectively testing the plurality of alternative data processing models by using the test set to obtain an optimal alternative data processing model as a final first data processing model and obtain first model parameters of the final first data processing model;
according to the first model parameter, fitting and utilizing the test set to respectively test a plurality of alternative data processing models of a second data processing model corresponding to the currently deleted sample, obtaining an optimal alternative data processing model as a final second data processing model and obtaining a second model parameter of the final second data processing model, wherein the plurality of alternative data processing models are obtained by utilizing a sub-training set corresponding to the currently deleted sample to train the preset data processing model;
and carrying out influence analysis on the model loss of the second model parameter and the final second data processing model on the test set by adopting a steady statistical method so as to obtain a first model loss difference value corresponding to the current deleted sample.
3. The model training method according to claim 2, wherein the second model parameters are fitted on the basis of the first model parameters according to the following formula:

$$\hat{\theta}_{\varepsilon_1,z_{del}}=\arg\min_{\theta}\left\{\frac{1}{n}\sum_{i=1}^{n}L(z_i,\theta)+\varepsilon_1 L(z_{del},\theta)\right\}$$

wherein $\hat{\theta}_{\varepsilon_1,z_{del}}$ represents the fitted second model parameters of the final second data processing model; $z_{del}$ represents the currently deleted sample; the fitting starts from $\hat{\theta}$, the first model parameters; $L$ represents the loss function used when training and testing the preset data processing model; $z_i$ represents the i-th sample in the training set, with $z_i=(x_i,y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample, $i=1,\dots,n$; and $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1=-\frac{1}{n}$.
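When $L$ is the squared loss, the $\varepsilon_1$-weighted objective of claim 3 has a closed form, which makes the effect of $\varepsilon_1=-\frac{1}{n}$ (cancelling the deleted sample's term from the empirical risk) easy to verify. The sketch below assumes a linear model and squared loss; it is an illustration of the weighting scheme, not the patent's implementation.

```python
import numpy as np

def refit_after_deletion(X, y, del_idx):
    """Solve argmin_theta (1/n) sum_i (x_i.theta - y_i)^2 + eps1 * (x_d.theta - y_d)^2
    with eps1 = -1/n: the deleted sample's contribution is cancelled, so the
    result equals an ordinary least-squares fit on the remaining samples."""
    n = len(y)
    eps1 = -1.0 / n
    x_d, y_d = X[del_idx], y[del_idx]
    # Normal equations of the weighted objective.
    A = X.T @ X / n + eps1 * np.outer(x_d, x_d)
    b = X.T @ y / n + eps1 * x_d * y_d
    return np.linalg.solve(A, b)
```

With $\varepsilon_1=-\frac{1}{n}$ the weighted objective coincides with the leave-one-out empirical risk, which is why the second data processing model can be fitted from the first one rather than retrained from scratch.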
4. The model training method according to claim 2, wherein the step of obtaining the first model loss difference corresponding to the currently deleted sample specifically comprises:
constructing, based on the influence function theory in robust statistics, an influence function of the second model parameters and of the model loss of the final second data processing model on the test set, as shown in the following formula, and calculating the first model loss difference according to the influence function:

$$\Gamma_{up,loss}(z_{del},z_{test})=-\nabla_{\theta}L\big(z_{test},\hat{\theta}_{\varepsilon_1,z_{del}}\big)^{T}H_{\hat{\theta}}^{-1}\nabla_{\theta}L\big(z_{del},\hat{\theta}_{\varepsilon_1,z_{del}}\big)$$

wherein $\Gamma_{up,loss}(z_{del},z_{test})$ represents the first model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ represents the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and $T$ represents the transpose of the gradient vector $\nabla_{\theta}L(z_{test},\hat{\theta}_{\varepsilon_1,z_{del}})$; $\hat{\theta}_{\varepsilon_1,z_{del}}$ represents the second model parameters; $z_{del}$ represents the currently deleted sample; and $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final second data processing model, with $H_{\hat{\theta}}=\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2}L(z_i,\hat{\theta})$.
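A minimal numerical sketch of the influence product in claim 4. The names and the explicit dense-Hessian solve are assumptions for illustration; for a real deep model one would approximate $H^{-1}v$ with Hessian-vector products rather than form $H$ explicitly.

```python
import numpy as np

def influence_up_loss(grad_test, grad_del, hessian):
    """Gamma_up,loss = -grad_test^T H^{-1} grad_del.

    grad_test: gradient of the test-set loss w.r.t. the model parameters.
    grad_del:  gradient of the loss at the deleted sample.
    hessian:   Hessian matrix H of the empirical risk.
    """
    # Solve H v = grad_del instead of inverting H explicitly.
    v = np.linalg.solve(hessian, grad_del)
    return float(-grad_test @ v)
```

The resulting scalar is the loss difference that claim 5 sorts from negative to positive to flag abnormal samples.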
5. The model training method according to claim 1, wherein the step of obtaining the abnormal samples in the initial training set according to the first model loss difference specifically comprises:
sorting the first model loss differences in ascending order, from negative to positive;
selecting, according to the sorting result, the first model loss differences whose rank is less than or equal to a preset rank value;
and acquiring, for each second data processing model corresponding to a selected first model loss difference, the sample that was deleted when training that second data processing model, and taking the deleted sample as an abnormal sample.
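The ordering-and-thresholding step of claim 5 can be sketched as follows (the function name and plain-list interface are illustrative):

```python
import numpy as np

def select_abnormal_indices(loss_diffs, rank_threshold):
    """Sort the first model loss differences from negative to positive and
    return the indices (i.e. deleted samples) whose rank is within the
    preset rank threshold."""
    order = np.argsort(loss_diffs)  # ascending: most negative differences first
    return order[:rank_threshold].tolist()
```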
6. The model training method according to claim 1, wherein the step of performing sample adjustment on the initial training set according to the abnormal samples specifically comprises:
acquiring the sample label of the abnormal sample;
judging whether the sample label is correct;
if the sample label is correct, deleting the abnormal sample;
and if not, correcting the sample label of the abnormal sample.
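Claim 6's adjustment rule — delete the abnormal sample when its label is correct, relabel it otherwise — can be sketched as below. All names and the two callable interfaces are assumptions; in practice the label check would typically be a human annotator.

```python
def adjust_training_set(dataset, abnormal_indices, label_is_correct, corrected_label):
    """Return an optimized copy of the training set.

    dataset: list of (image_sample, label) pairs.
    abnormal_indices: indices flagged by the loss-difference analysis.
    label_is_correct(x, y): oracle for label validity (e.g. a human check).
    corrected_label(x, y): supplies the fixed label when y is wrong.
    """
    abnormal = set(abnormal_indices)
    optimized = []
    for i, (x, y) in enumerate(dataset):
        if i in abnormal:
            if label_is_correct(x, y):
                continue                   # correctly labelled yet harmful: drop it
            y = corrected_label(x, y)      # mislabelled: keep it with a fixed label
        optimized.append((x, y))
    return optimized
```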
7. The model training method according to any one of claims 1 to 6, further comprising obtaining an adversarial training set and training a preset generative adversarial network model using the optimized training set and the adversarial training set, wherein the adversarial training set is obtained by:
training the preset data processing model with the optimized training set to obtain a third data processing model;
testing the third data processing model and a plurality of fourth data processing models respectively with the test set to obtain a second model loss difference between the third data processing model and each fourth data processing model on the test set, wherein different fourth data processing models are trained on different sub-training sets under the optimized training set, and the different sub-training sets differ by one or more different disturbed samples;
adjusting the disturbance amount of the disturbed sample corresponding to each fourth data processing model according to the variation trend of the second model loss difference corresponding to that model, so as to obtain the maximum second model loss difference corresponding to each fourth data processing model;
and acquiring the disturbance amount and the disturbed sample corresponding to the maximum second model loss difference, and disturbing the disturbed sample by that amount to form a new sample, so as to construct the adversarial training set from the new samples.
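The disturbance-amount search of claim 7 can be sketched as a sweep over candidate disturbance amounts that keeps, for each disturbed sample, the amount maximising the second model loss difference. The callable `loss_diff_fn`, standing in for the fit-and-compare step of claims 8–10, is an assumption of this sketch.

```python
import numpy as np

def build_adversarial_set(samples, candidate_deltas, loss_diff_fn):
    """For each (image, label) sample, pick the disturbance amount yielding the
    largest second model loss difference and emit the disturbed copy; the
    disturbed copies together form the adversarial training set."""
    adversarial = []
    for x, y in samples:
        diffs = [loss_diff_fn(x + d, y) for d in candidate_deltas]
        best = candidate_deltas[int(np.argmax(diffs))]
        adversarial.append((x + best, y))
    return adversarial
```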
8. The model training method according to claim 7, wherein the step of obtaining the second model loss difference between the third data processing model and each fourth data processing model on the test set specifically comprises:
acquiring a plurality of alternative data processing models obtained by training the preset data processing model with the optimized training set;
testing the plurality of alternative data processing models respectively with the test set, taking the optimal alternative data processing model as the final third data processing model, and acquiring the third model parameters of the final third data processing model;
fitting, according to the third model parameters, the fourth model parameters of the final fourth data processing model corresponding to the currently disturbed sample, wherein a plurality of alternative data processing models, obtained by training the preset data processing model with the sub-training set corresponding to the currently disturbed sample, are respectively tested with the test set and the optimal alternative data processing model is taken as the final fourth data processing model;
and performing influence analysis, by a robust statistical method, on the fourth model parameters and on the model loss of the final fourth data processing model on the test set, so as to obtain the second model loss difference corresponding to the currently disturbed sample.
9. The model training method according to claim 8, wherein the fourth model parameters are fitted on the basis of the third model parameters according to the following formula:

$$\hat{\theta}_{\varepsilon_2,z_{\delta},-z}=\arg\min_{\theta}\left\{\frac{1}{n}\sum_{i=1}^{n}L(z_i,\theta)+\varepsilon_2 L(z_{\delta},\theta)-\varepsilon_2 L(z,\theta)\right\}$$

wherein $\hat{\theta}_{\varepsilon_2,z_{\delta},-z}$ represents the fitted fourth model parameters of the final fourth data processing model; $z_{\delta}$ represents the new sample formed after adding a disturbance amount $\delta$ to the current disturbed sample $z$, with $z_{\delta}=(x+\delta,y)$, where $x$ represents the image sample of $z_{\delta}$ and $y$ represents the label of the image sample $x$; the fitting starts from $\hat{\theta}$, the third model parameters; $z_i$ represents the i-th sample in the training set, with $z_i=(x_i,y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample $x_i$, $i=1,\dots,n$; and $\varepsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\varepsilon_2=\frac{1}{n}$.
10. The model training method according to claim 8, wherein the step of obtaining the second model loss difference corresponding to the current disturbed sample specifically comprises:
constructing, based on the influence function theory in robust statistics, an influence function of the fourth model parameters and of the model loss of the final fourth data processing model on the test set, as shown in the following formula, and calculating the second model loss difference according to the influence function:

$$\Gamma_{pert,loss}(z,z_{test})=-\nabla_{\theta}L\big(z_{test},\hat{\theta}_{\varepsilon_2,z_{\delta},-z}\big)^{T}H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta}L\big(z,\hat{\theta}_{\varepsilon_2,z_{\delta},-z}\big)\,\delta$$

wherein $\Gamma_{pert,loss}(z,z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ represents the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and $T$ represents the transpose of the gradient vector $\nabla_{\theta}L(z_{test},\hat{\theta}_{\varepsilon_2,z_{\delta},-z})$; $\hat{\theta}_{\varepsilon_2,z_{\delta},-z}$ represents the fourth model parameters; $z$ represents the current disturbed sample; $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with $H_{\hat{\theta}}=\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2}L(z_i,\hat{\theta})$; and $\nabla_{x}\nabla_{\theta}L(z,\hat{\theta}_{\varepsilon_2,z_{\delta},-z})\,\delta$ corresponds to the first-order Taylor expansion of the difference between the loss gradients calculated from the loss function $L$ before and after the image sample $x$ of the sample $z$ is disturbed by the disturbance $\delta$.
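A numerical sketch of the perturbation-influence product in claim 10. As before, the dense Hessian solve and the names are illustrative assumptions; `cross_grad` stands in for the mixed derivative $\nabla_{x}\nabla_{\theta}L$.

```python
import numpy as np

def influence_pert_loss(grad_test, hessian, cross_grad, delta):
    """Gamma_pert,loss ~ -grad_test^T H^{-1} (cross_grad @ delta).

    cross_grad @ delta is the first-order change of the parameter gradient at
    sample z when its image is nudged by delta, so the product estimates how
    the test loss responds to the disturbance without retraining.
    """
    v = np.linalg.solve(hessian, cross_grad @ delta)
    return float(-grad_test @ v)
```

Sweeping `delta` and keeping the value that maximises this quantity is the adjustment loop described in claim 7.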
11. A model training apparatus, the apparatus comprising:
a first data processing model obtaining module configured to train a preset data processing model with an initial training set to obtain a first data processing model;
a first loss difference acquisition module configured to test the first data processing model and the plurality of second data processing models respectively by using a test set, and acquire a first model loss difference between the first data processing model and each of the second data processing models on the test set respectively;
a training set optimization module configured to obtain abnormal samples in the initial training set according to the first model loss difference value, and perform sample adjustment on the initial training set according to the abnormal samples to obtain an optimized training set;
a model training module configured to train the first data processing model with the optimized training set to obtain a final data processing model;
wherein different second data processing models are configured to be trained from different sub-training sets under the initial training set, the different sub-training sets differing by one or more different deleted samples.
12. The model training apparatus of claim 11, wherein the first loss difference acquisition module comprises a first candidate model acquisition unit, a first parameter acquisition unit, a second parameter acquisition unit, and a first loss difference acquisition unit;
the first alternative model acquisition unit is configured to acquire a plurality of alternative data processing models of the first data processing model obtained after the preset data processing model is trained by using the initial training set;
the first parameter obtaining unit is configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final first data processing model and obtain first model parameters of the final first data processing model;
the second parameter obtaining unit is configured to fit, according to the first model parameters, the second model parameters of the final second data processing model corresponding to the currently deleted sample, wherein a plurality of candidate data processing models, obtained by training the preset data processing model with the sub-training set corresponding to the currently deleted sample, are respectively tested with the test set and the optimal candidate data processing model is taken as the final second data processing model;
the first loss difference obtaining unit is configured to perform influence analysis on the model loss of the second model parameter and the final second data processing model on the test set by using a robust statistical method to obtain a first model loss difference corresponding to the currently deleted sample.
13. The model training apparatus of claim 12, wherein the second parameter obtaining unit is further configured to fit the second model parameters on the basis of the first model parameters according to the following formula:

$$\hat{\theta}_{\varepsilon_1,z_{del}}=\arg\min_{\theta}\left\{\frac{1}{n}\sum_{i=1}^{n}L(z_i,\theta)+\varepsilon_1 L(z_{del},\theta)\right\}$$

wherein $\hat{\theta}_{\varepsilon_1,z_{del}}$ represents the fitted second model parameters of the final second data processing model; $z_{del}$ represents the currently deleted sample; the fitting starts from $\hat{\theta}$, the first model parameters; $L$ represents the loss function used when training and testing the preset data processing model; $z_i$ represents the i-th sample in the training set, with $z_i=(x_i,y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample, $i=1,\dots,n$; and $\varepsilon_1$ represents the preset sample weight of the currently deleted sample $z_{del}$, with $\varepsilon_1=-\frac{1}{n}$.
14. The model training apparatus of claim 12, wherein the first loss difference obtaining unit is further configured to construct, based on the influence function theory in robust statistics, an influence function of the second model parameters and of the model loss of the final second data processing model on the test set, as shown in the following formula, and to calculate the first model loss difference according to the influence function:

$$\Gamma_{up,loss}(z_{del},z_{test})=-\nabla_{\theta}L\big(z_{test},\hat{\theta}_{\varepsilon_1,z_{del}}\big)^{T}H_{\hat{\theta}}^{-1}\nabla_{\theta}L\big(z_{del},\hat{\theta}_{\varepsilon_1,z_{del}}\big)$$

wherein $\Gamma_{up,loss}(z_{del},z_{test})$ represents the first model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ represents the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and $T$ represents the transpose of the gradient vector $\nabla_{\theta}L(z_{test},\hat{\theta}_{\varepsilon_1,z_{del}})$; $\hat{\theta}_{\varepsilon_1,z_{del}}$ represents the second model parameters; $z_{del}$ represents the currently deleted sample; and $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final second data processing model, with $H_{\hat{\theta}}=\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2}L(z_i,\hat{\theta})$.
15. The model training apparatus of claim 11, wherein the training set optimization module is further configured to:
sort the first model loss differences in ascending order, from negative to positive;
select, according to the sorting result, the first model loss differences whose rank is less than or equal to a preset rank value;
and acquire, for each second data processing model corresponding to a selected first model loss difference, the sample that was deleted when training that second data processing model, taking the deleted sample as an abnormal sample.
16. The model training apparatus of claim 11, wherein the training set optimization module is further configured to:
obtain the sample label of the abnormal sample;
judge whether the sample label is correct;
if the sample label is correct, delete the abnormal sample;
and if not, correct the sample label of the abnormal sample.
17. Model training apparatus as claimed in any of claims 11 to 16, characterized in that the apparatus further comprises:
a third data processing model obtaining module configured to train the preset data processing model by using the optimized training set to obtain a third data processing model;
a second loss difference obtaining module configured to respectively test the third data processing model and a plurality of fourth data processing models by using the test set, and obtain a second model loss difference between the third data processing model and each of the fourth data processing models on the test set; wherein different fourth data processing models are configured to be trained according to different sub-training sets under the optimized training set, and the different sub-training sets differ by one or more different disturbed samples;
a third loss difference obtaining module configured to adjust a disturbance amount of a disturbed sample corresponding to each fourth data processing model according to a variation trend of a second model loss difference corresponding to each fourth data processing model, so as to obtain a maximum second model loss difference corresponding to each fourth data processing model;
and an adversarial training set acquisition module configured to acquire the disturbance amount and the disturbed sample corresponding to the maximum second model loss difference, and to disturb the disturbed sample according to the disturbance amount to form a new sample, so as to construct the adversarial training set from the new sample.
18. The model training apparatus of claim 17, wherein the second loss difference acquisition module comprises a second candidate model acquisition unit, a third parameter acquisition unit, a fourth parameter acquisition unit, and a second loss difference acquisition unit;
the second alternative model acquisition unit is configured to acquire a plurality of alternative data processing models of the third data processing model obtained after the preset data processing model is trained by using the optimized training set;
the third parameter obtaining unit is configured to respectively test the plurality of candidate data processing models by using the test set to obtain an optimal candidate data processing model as a final third data processing model and obtain third model parameters of the final third data processing model;
the fourth parameter obtaining unit is configured to fit, according to the third model parameters, the fourth model parameters of the final fourth data processing model corresponding to the currently disturbed sample, wherein a plurality of candidate data processing models, obtained by training the preset data processing model with the sub-training set corresponding to the currently disturbed sample, are respectively tested with the test set and the optimal candidate data processing model is taken as the final fourth data processing model;
the second loss difference obtaining unit is configured to perform influence analysis on the model loss of the fourth model parameter and the final fourth data processing model on the test set by using a robust statistical method to obtain a second model loss difference corresponding to the current disturbed sample.
19. The model training apparatus of claim 18, wherein the fourth parameter obtaining unit is further configured to fit the fourth model parameters on the basis of the third model parameters according to the following formula:

$$\hat{\theta}_{\varepsilon_2,z_{\delta},-z}=\arg\min_{\theta}\left\{\frac{1}{n}\sum_{i=1}^{n}L(z_i,\theta)+\varepsilon_2 L(z_{\delta},\theta)-\varepsilon_2 L(z,\theta)\right\}$$

wherein $\hat{\theta}_{\varepsilon_2,z_{\delta},-z}$ represents the fitted fourth model parameters of the final fourth data processing model; $z_{\delta}$ represents the new sample formed after adding a disturbance amount $\delta$ to the current disturbed sample $z$, with $z_{\delta}=(x+\delta,y)$, where $x$ represents the image sample of $z_{\delta}$ and $y$ represents the label of the image sample $x$; the fitting starts from $\hat{\theta}$, the third model parameters; $z_i$ represents the i-th sample in the training set, with $z_i=(x_i,y_i)$, where $x_i$ represents the image sample of $z_i$ and $y_i$ represents the label of the image sample $x_i$, $i=1,\dots,n$; and $\varepsilon_2$ represents the preset sample weight of the current disturbed sample $z$, with $\varepsilon_2=\frac{1}{n}$.
20. The model training apparatus of claim 18, wherein the second loss difference obtaining unit is further configured to construct, based on the influence function theory in robust statistics, an influence function of the fourth model parameters and of the model loss of the final fourth data processing model on the test set, as shown in the following formula, and to calculate the second model loss difference according to the influence function:

$$\Gamma_{pert,loss}(z,z_{test})=-\nabla_{\theta}L\big(z_{test},\hat{\theta}_{\varepsilon_2,z_{\delta},-z}\big)^{T}H_{\hat{\theta}}^{-1}\,\nabla_{x}\nabla_{\theta}L\big(z,\hat{\theta}_{\varepsilon_2,z_{\delta},-z}\big)\,\delta$$

wherein $\Gamma_{pert,loss}(z,z_{test})$ represents the second model loss difference; $z_{test}$ represents the test set; $L$ represents the loss function used when training and testing the preset data processing model; $\nabla_{\theta}$ represents the gradient, with respect to the model parameters $\theta$, of the loss value calculated from the loss function $L$, and $T$ represents the transpose of the gradient vector $\nabla_{\theta}L(z_{test},\hat{\theta}_{\varepsilon_2,z_{\delta},-z})$; $\hat{\theta}_{\varepsilon_2,z_{\delta},-z}$ represents the fourth model parameters; $z$ represents the current disturbed sample; $H_{\hat{\theta}}$ represents the Hessian matrix of the empirical risk of the final fourth data processing model, with $H_{\hat{\theta}}=\frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}^{2}L(z_i,\hat{\theta})$; and $\nabla_{x}\nabla_{\theta}L(z,\hat{\theta}_{\varepsilon_2,z_{\delta},-z})\,\delta$ corresponds to the first-order Taylor expansion of the difference between the loss gradients calculated from the loss function $L$ before and after the image sample $x$ of the sample $z$ is disturbed by the disturbance $\delta$.
21. A control apparatus comprising a processor and a storage device adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the model training method of any one of claims 1 to 10.
22. A computer-readable storage medium, in which a plurality of program codes are stored, characterized in that the program codes are adapted to be loaded and executed by a processor to perform the model training method of any one of claims 1 to 10.
CN202011427624.3A 2020-12-07 2020-12-07 Model training method, device and computer readable storage medium Pending CN112529209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011427624.3A CN112529209A (en) 2020-12-07 2020-12-07 Model training method, device and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN112529209A true CN112529209A (en) 2021-03-19

Family

ID=74996877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011427624.3A Pending CN112529209A (en) 2020-12-07 2020-12-07 Model training method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112529209A (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10007866B2 (en) * 2016-04-28 2018-06-26 Microsoft Technology Licensing, Llc Neural network image classifier
US20190325738A1 (en) * 2018-04-18 2019-10-24 Here Global B.V. Lane-level geometry and traffic information
CN108932527A (en) * 2018-06-06 2018-12-04 上海交通大学 Using cross-training model inspection to the method for resisting sample
CN110796153A (en) * 2018-08-01 2020-02-14 阿里巴巴集团控股有限公司 Training sample processing method and device
CN109606378A (en) * 2018-11-19 2019-04-12 江苏大学 Vehicle running state estimation method towards non-Gaussian noise environment
CN110532880A (en) * 2019-07-29 2019-12-03 深圳大学 Screening sample and expression recognition method, neural network, equipment and storage medium
CN110378961A (en) * 2019-09-11 2019-10-25 图谱未来(南京)人工智能研究院有限公司 Optimization method, critical point detection method, apparatus and the storage medium of model
CN110866528A (en) * 2019-10-28 2020-03-06 腾讯科技(深圳)有限公司 Model training method, energy consumption use efficiency prediction method, device and medium
CN110991657A (en) * 2019-11-22 2020-04-10 深圳市魔数智擎人工智能有限公司 Abnormal sample detection method based on machine learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
PANGWEI KOH ET AL.: ""Understanding Black-box Predictions via Influence Functions"", 《ARXIV:1703.04730V2》 *
SAMYADEEP BASU ET AL.: ""Influence Functions in Deep Learning Are Fragile"", 《ARXIV:2006.14651V1》 *
朱参世 等: ""一种参数容错辨识法判别和剔除野值方法研究"", 《微计算机信息》 *
王强 等: ""基于生成式-判别式混合模型的可解释性文档分类"", 《模式识别与人工智能》 *
袁兴梅 等: ""基于RSC模型和噪声去除的半监督训练方法"", 《计算机工程与科学》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239022A (en) * 2021-04-19 2021-08-10 浙江大学 Method and device for complementing missing data in medical diagnosis, electronic device and medium
WO2022222026A1 (en) * 2021-04-19 2022-10-27 浙江大学 Medical diagnosis missing data completion method and completion apparatus, and electronic device and medium
CN113505800A (en) * 2021-06-30 2021-10-15 深圳市慧鲤科技有限公司 Image processing method and training method, device, equipment and medium of model thereof


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210319