CN117576522B - Model training method and device based on mimicry structure dynamic defense - Google Patents
- Publication number
- CN117576522B (application CN202410076456.XA)
- Authority
- CN
- China
- Prior art keywords
- recognition
- sub
- training
- network
- training model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The specification discloses a model training method and device based on mimicry structure dynamic defense. The method comprises the following steps: acquiring a pre-training model, and inputting a first image used for training the pre-training model into the pre-training model to obtain a recognition result corresponding to the first image; determining a second image according to the recognition result corresponding to the first image and the actual label corresponding to the first image; inputting the second image into the pre-training model, determining the weight corresponding to each sub-recognition network arranged in the pre-training model through a weight network layer in the pre-training model, recognizing the second image through each sub-recognition network respectively to obtain the respective recognition results, and weighting the recognition results according to the determined weights corresponding to the sub-recognition networks to obtain a final recognition result; and training the pre-training model by taking minimization of the deviation between the final recognition result and the actual label as an optimization target.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a model training method and apparatus based on dynamic defense of a mimicry structure.
Background
With the rapid development of artificial intelligence, deep learning has achieved remarkable results in fields such as image recognition, natural language processing, and speech recognition. However, neural network models often perform unstably in complex environments involving noise, disturbance, or adversarial attacks, and lack robustness.
Currently, when a neural network model is used to recognize images, the image samples input into the model may include adversarial samples, and the accuracy of the model's output decreases when it recognizes such samples. Moreover, the variety of adversarial samples keeps growing over time, and current training methods are insufficient to obtain a neural network model with high accuracy.
Based on this, how to improve the accuracy of neural network model training is an urgent problem to be solved.
Disclosure of Invention
The present disclosure provides a model training method and apparatus based on dynamic defense of a mimicry structure, so as to partially solve the above-mentioned problems in the prior art.
The technical scheme adopted in the specification is as follows:
The specification provides a model training method based on mimicry structure dynamic defense, which comprises the following steps:
Obtaining a pre-training model;
Inputting a first image used for training the pre-training model into the pre-training model to obtain a recognition result corresponding to the first image;
Determining gradient information corresponding to the first image according to the identification result corresponding to the first image and the actual label corresponding to the first image;
generating interference data according to the reverse direction of the gradient direction corresponding to the gradient information;
Adding the interference data into the first image to obtain a second image;
Inputting the second image into the pre-training model, determining weights corresponding to all sub-recognition networks arranged in the pre-training model through a weight network layer in the pre-training model, respectively recognizing the second image through each sub-recognition network to obtain all recognition results, and weighting all the recognition results according to the determined weights corresponding to all the sub-recognition networks to obtain a final recognition result;
And training the pre-training model by taking minimization of the deviation between the final recognition result and the actual label as an optimization target.
Optionally, the sub-recognition networks include a first sub-recognition network and a second sub-recognition network, where the first sub-recognition network is used to recognize the image input into it through the learned recognition rule for recognizing the first image, and the second sub-recognition network is used to recognize the image input into it through the learned recognition rule for recognizing the second image;
Training the pre-training model by taking the deviation between the minimum final recognition result and the actual label as an optimization target, wherein the training method specifically comprises the following steps:
fixing the network parameters of the first sub-recognition network in the pre-training model, and adjusting the network parameters in the second sub-recognition network and the network parameters in the weight network layer by taking minimization of the deviation between the final recognition result and the actual label as an optimization target.
Optionally, the method further comprises:
When it is monitored that image data of a type different from the image data used for training the pre-training model is acquired, generating a number of new sub-recognition networks, deploying the new sub-recognition networks into the pre-training model, and performing dimension expansion on the weight network layer according to the new sub-recognition networks to obtain an updated pre-training model;
inputting the acquired image data of the different type together with the original image data, as expanded image data, into the updated pre-training model, determining the weight corresponding to each original sub-recognition network and the weight corresponding to each new sub-recognition network through the weight network layer in the updated pre-training model, recognizing the expanded image data through each original sub-recognition network and each new sub-recognition network respectively to obtain the respective recognition results, and weighting the recognition results according to the determined weights corresponding to the original and new sub-recognition networks to obtain the recognition result corresponding to the expanded image data;
And training the updated pre-training model by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target.
Optionally, training the updated pre-training model by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target specifically includes:
fixing the network parameters of each original sub-recognition network in the pre-training model and the network parameters in the weight network layer corresponding to the dimensions of the original sub-recognition networks;
and adjusting the network parameters in each new sub-recognition network and the network parameters in the weight network layer corresponding to the expansion dimensions of the new sub-recognition networks, by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target.
Optionally, training the updated pre-training model by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target specifically includes:
for the Nth round of training, obtaining a first loss value according to the deviation between the recognition result corresponding to the expanded image data obtained in the Nth round of training and the actual label corresponding to the expanded image data;
for each original sub-recognition network, determining a second loss value corresponding to the original sub-recognition network according to the deviation between the recognition result of the original sub-recognition network in the pre-training model before training for the expanded image data and the recognition result of the original sub-recognition network in the pre-training model after the (N-1)-th round of training for the expanded image data;
and obtaining a total loss value according to the first loss value and the second loss value, and performing the Nth round of training on the pre-training model by taking minimization of the total loss value as an optimization target.
The specification provides a model training device based on mimicry structure dynamic defense, comprising:
The acquisition module is used for acquiring the pre-training model;
The generation module is used for inputting a first image used for training the pre-training model into the pre-training model to obtain a recognition result corresponding to the first image; determining gradient information corresponding to the first image according to the recognition result corresponding to the first image and the actual label corresponding to the first image; generating interference data according to the reverse direction of the gradient direction corresponding to the gradient information; and adding the interference data into the first image to obtain a second image;
The weighting module is used for inputting the second image into the pre-training model, determining the weight corresponding to each sub-recognition network arranged in the pre-training model through a weight network layer in the pre-training model, respectively recognizing the second image through each sub-recognition network to obtain each recognition result, and weighting each recognition result according to the determined weight corresponding to each sub-recognition network to obtain a final recognition result;
And the training module is used for training the pre-training model by taking minimization of the deviation between the final recognition result and the actual label as an optimization target.
Optionally, the sub-recognition networks include a first sub-recognition network and a second sub-recognition network, where the first sub-recognition network is used to recognize the image input into it through the learned recognition rule for recognizing the first image, and the second sub-recognition network is used to recognize the image input into it through the learned recognition rule for recognizing the second image;
the training module is specifically configured to fix the network parameters of the first sub-recognition network in the pre-training model, and adjust the network parameters in the second sub-recognition network and the network parameters in the weight network layer by taking minimization of the deviation between the final recognition result and the actual label as an optimization target.
Optionally, the training module is further configured to: when it is monitored that image data of a type different from the image data used for training the pre-training model is acquired, generate a number of new sub-recognition networks, deploy the new sub-recognition networks into the pre-training model, and perform dimension expansion on the weight network layer according to the new sub-recognition networks to obtain an updated pre-training model; input the acquired image data of the different type together with the original image data, as expanded image data, into the updated pre-training model, determine the weight corresponding to each original sub-recognition network and the weight corresponding to each new sub-recognition network through the weight network layer in the updated pre-training model, recognize the expanded image data through each original sub-recognition network and each new sub-recognition network respectively to obtain the respective recognition results, and weight the recognition results according to the determined weights to obtain the recognition result corresponding to the expanded image data; and train the updated pre-training model by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the above-described model training method based on mimicry structure dynamic defenses.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-described model training method based on mimicry structure dynamic defense when executing the program.
At least one of the technical solutions adopted in the present specification can achieve the following beneficial effects:
In the model training method based on mimicry structure dynamic defense, a pre-training model is acquired, and a first image used for training the pre-training model is input into the pre-training model to obtain a recognition result corresponding to the first image. A second image is determined according to the recognition result corresponding to the first image and the actual label corresponding to the first image. The second image is input into the pre-training model, the weight corresponding to each sub-recognition network arranged in the pre-training model is determined through a weight network layer in the pre-training model, the second image is recognized through each sub-recognition network respectively to obtain the respective recognition results, the recognition results are weighted according to the determined weights corresponding to the sub-recognition networks to obtain a final recognition result, and the pre-training model is trained by taking minimization of the deviation between the final recognition result and the actual label as an optimization target.
That is, in the model training method based on mimicry structure dynamic defense provided in this specification, a plurality of sub-recognition networks are arranged in the pre-training model. Each sub-recognition network gives a different recognition result for the input image data, and the weight network layer determines, according to the sample characteristics of the image data, which sub-recognition network's output is most trustworthy, so that a more accurate recognition result is obtained for the image data input into the pre-training model. In practical applications, the variety of adversarial samples input into the pre-training model keeps increasing over time; in this specification, as large numbers of samples are input into the model, the model can generate a number of new sub-recognition networks according to the input samples, and the pre-training model is trained by continuously adding sub-recognition networks, so that the trained pre-training model has stronger defensive performance in the field of risk recognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate and explain the exemplary embodiments of the present specification and their description, are not intended to limit the specification unduly. In the drawings:
FIG. 1 is a schematic flow chart of a model training method based on mimicry structure dynamic defense provided in the present specification;
FIG. 2 is a schematic diagram of a training process of a pre-training model provided in the present specification;
FIG. 3 is a schematic diagram of a training process of a pre-training model provided in the present specification;
FIG. 4 is a schematic diagram of a model training device based on mimicry structure dynamic defense provided in the present specification;
fig. 5 is a schematic diagram of an electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a model training method based on mimicry structure dynamic defense provided in the present specification, which includes the following steps:
s101: a pre-training model is obtained.
Deep learning, an important branch of the artificial intelligence field, has achieved significant results in areas such as image recognition, natural language processing, and speech recognition. At present, when a neural network model is used to recognize images, its performance in complex environments involving noise, disturbance, or adversarial attacks is often unstable, and it lacks robustness. In particular, when an adversarial sample is present, the neural network model cannot accurately recognize the image.
An adversarial sample is a special sample crafted by adding carefully designed perturbations to the input data of a machine learning model, so that the model makes an erroneous judgment on it. These samples are typically slight modifications of the training data, such that the modified samples produce the intended erroneous decisions in the model. The presence of adversarial samples can seriously affect the performance of a machine learning model, causing it to misclassify or misidentify data in some cases; under an adversarial attack, the accuracy and stability of the model may be severely impacted. Crafting adversarial samples often requires a certain amount of skill and knowledge, in order to design perturbations that can fool the model. Methods of crafting adversarial samples include, but are not limited to, adding noise, changing the color or brightness of an image, and modifying the syntax or semantics of text.
In addition, more and more adversarial samples appear over time, and their variety changes without limit; when a neural network model is trained with traditional training means, it can suffer from the catastrophic forgetting problem, so that the obtained image recognition results are inaccurate.
Based on the above, this specification provides a model training method based on mimicry structure dynamic defense: a pre-training model is acquired, and a first image used for training the pre-training model is input into the pre-training model to obtain a recognition result corresponding to the first image. Gradient information corresponding to the first image is determined according to the recognition result corresponding to the first image and the actual label corresponding to the first image; interference data is generated according to the reverse direction of the gradient direction corresponding to the gradient information, and the interference data is added to the first image to obtain a second image. The second image is input into the pre-training model, the weight corresponding to each sub-recognition network arranged in the pre-training model is determined through a weight network layer in the pre-training model, the second image is recognized through each sub-recognition network respectively to obtain the respective recognition results, the recognition results are weighted according to the determined weights to obtain a final recognition result, and the pre-training model is trained by taking minimization of the deviation between the final recognition result and the actual label as an optimization target.
This training mode can improve the robustness of the neural network model and make it safer and more reliable when facing adversarial samples; compared with traditional training modes, it improves the accuracy of the neural network model, so that recognition results closer to the real labels are obtained.
In the present specification, an execution body for implementing a model training method based on a mimicry structure dynamic defense may be a terminal device such as a notebook computer, a tablet computer, or the like, and of course, may also be a server.
In the present specification, a server first obtains a pre-training model, where the pre-training model includes a weight network layer and one or more sub-recognition networks, where the weight network layer gives a corresponding weight to each sub-recognition network, and the sub-recognition networks give recognition results to input image data respectively.
S102: inputting a first image used for training the pre-training model into the pre-training model to obtain a recognition result corresponding to the first image.
The server may input the first image into the sub-recognition network corresponding to the first image in the pre-training model, so as to train that sub-recognition network. There may be multiple kinds of first images, for example normal image data and difficult image data.
A normal image is an ordinary image with high definition; a difficult image covers various conditions, such as image blur, an incomplete image subject, an ambiguous image subject, and image tilt or flipping.
Difficult image data can be generated by algorithmically corrupting ordinary image data, for example: applying noise to the image, such as Gaussian noise, shot noise, or impulse noise; blurring the image, such as defocus blur, frosted-glass blur, motion blur, or zoom blur; adding weather factors such as snow, frost, or fog to the image; or adjusting digital attributes of the image, such as brightness, contrast, elastic transformation, pixelation, and JPEG compression.
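For illustration only, difficult image data of the kind described can be generated with simple corruption operations such as the following sketch using NumPy and Pillow; the function name and all parameter values (noise level, blur radius, contrast factor, pixelation factor) are assumptions, not values from the patent:

```python
import numpy as np
from PIL import Image, ImageFilter, ImageEnhance

def make_difficult(img: Image.Image) -> dict:
    """Produce a few corrupted (difficult) variants of an ordinary image."""
    arr = np.asarray(img, dtype=np.float32) / 255.0
    # Gaussian noise: add zero-mean noise, then clip back to valid range.
    noisy = np.clip(arr + np.random.normal(0.0, 0.1, arr.shape), 0.0, 1.0)
    return {
        "gaussian_noise": Image.fromarray((noisy * 255).astype(np.uint8)),
        "defocus_blur": img.filter(ImageFilter.GaussianBlur(radius=3)),
        "low_contrast": ImageEnhance.Contrast(img).enhance(0.4),
        # Pixelation: shrink and re-enlarge the image.
        "pixelate": img.resize((img.width // 4, img.height // 4)).resize(img.size),
    }
```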
Further, the first image is input into the pre-training model, and a recognition result corresponding to the first image is obtained.
S103: and determining gradient information corresponding to the first image according to the identification result corresponding to the first image and the actual label corresponding to the first image.
The server inputs the first image into the pre-training model to obtain the recognition result corresponding to the first image, and then computes a loss function. The loss value is calculated using the cross-entropy loss, which can be expressed as:

$$L\big(y, f(x)\big) = -\sum_{i} y_i \log\big(f(x)_i\big)$$

wherein the input of the pre-training model is $x$ and the output is the recognition result for $x$, denoted $f(x)$; $y_i$ represents the $i$-th category of the real label $y$, and $f(x)_i$ is the probability of the $i$-th category in the image recognition result output by the pre-training model.
Further, the gradient of the loss function with respect to the input $x$ is calculated by the formula:

$$g = \nabla_x L\big(f(x), y\big)$$

wherein $g$ is the gradient information corresponding to $x$, $f(x)$ is the recognition result corresponding to $x$, and $y$ is the real label corresponding to $x$.
S104: and generating interference data according to the gradient information corresponding to the opposite gradient direction of the gradient direction.
In this specification, the server generates interference data according to the reverse direction of the gradient direction corresponding to the gradient information, producing a perturbation for the input $x$, which can be formulated as:

$$\delta = \epsilon \cdot \mathrm{sign}\big(\nabla_x L(f(x), y)\big)$$

wherein $\delta$ is the generated interference data and $\epsilon$ is the disturbance coefficient; the larger $\epsilon$ is, the higher the disturbance degree. $\mathrm{sign}(\cdot)$ is the sign function: by computing the gradient of the loss function with respect to the input $x$ and converting it with the sign function, interference data pointing in the loss-increasing direction can be generated, thereby achieving the goal of adversarial attack.
S105: and adding the interference data into the first image to obtain a second image.
The server adds the generated interference data to the first image to obtain the second image, which can be expressed as:

$$x' = x + \delta$$

wherein $x'$ is the perturbed image data that is input into the pre-training model; that is, adding the interference data $\delta$ to the first image $x$ yields the second image.
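As an illustration, steps S103 to S105 can be sketched in PyTorch as follows. This is a minimal sketch, not the patented implementation; the function name, the default value of the disturbance coefficient `epsilon`, and the clamping of the result to [0, 1] are assumptions:

```python
import torch
import torch.nn.functional as F

def generate_second_image(model, x, y, epsilon=0.03):
    """Sketch of S103-S105: derive interference data from the input
    gradient of the cross-entropy loss and add it to the first image.
    `epsilon` is the disturbance coefficient; larger values perturb more."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # cross-entropy loss L(f(x), y), as in S103
    loss.backward()                       # gradient of the loss w.r.t. the input x
    delta = epsilon * x.grad.sign()       # interference data via the sign function (S104)
    x_adv = (x + delta).clamp(0.0, 1.0)   # second image = first image + delta (S105)
    return x_adv.detach()
```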
S106: and inputting the second image into the pre-training model, determining weights corresponding to all sub-recognition networks arranged in the pre-training model through a weight network layer in the pre-training model, respectively recognizing the second image through each sub-recognition network to obtain all recognition results, and weighting all the recognition results according to the determined weights corresponding to all the sub-recognition networks to obtain a final recognition result.
The pre-training model comprises a weight network layer and sub-recognition networks; the sub-recognition networks include a first sub-recognition network and a second sub-recognition network, where the first sub-recognition network recognizes the image input into it through the learned recognition rule for recognizing the first image, and the second sub-recognition network recognizes the image input into it through the learned recognition rule for recognizing the second image.
And the server inputs the second image into the pre-training model, and at the moment, the second image obtains the weight corresponding to each sub-recognition network arranged in the pre-training model through a weight network layer in the pre-training model. And respectively inputting the second image into each sub-recognition network in the pre-training model, respectively recognizing the second image through each sub-recognition network to obtain each recognition result, and weighting each recognition result according to the determined weight corresponding to each sub-recognition network to obtain a final recognition result.
It should be noted that the second sub-recognition network in the pre-training model may be a freshly initialized sub-recognition network that is subsequently trained in the manner described in S107. The server may input the second image into the second sub-recognition network of the pre-training model and train it, so that the second sub-recognition network can recognize images input into it through the learned recognition rule for recognizing the second image; the trained second sub-recognition network is then deployed into the pre-training model and further trained in the manner described in S107.
In this specification, when the server trains the first sub-recognition network with first images and the second sub-recognition network with second images, a small number of second images may be mixed into the first images, and likewise a small number of first images into the second images; this enhances the generalization ability of each sub-recognition network and reduces overfitting.
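To make the structure described in S106 concrete, the following is a minimal PyTorch sketch of a weight network layer combined with several sub-recognition networks. The class name, the flatten-plus-linear form of the weight network, and the use of `LazyLinear` are illustrative assumptions, not the architecture stated in the patent:

```python
import torch
import torch.nn as nn

class MimicryModel(nn.Module):
    """Pre-training model: several sub-recognition networks plus a
    weight network layer that scores each of them per input image."""

    def __init__(self, sub_nets):
        super().__init__()
        self.sub_nets = nn.ModuleList(sub_nets)
        # Weight network layer: one raw score per sub-recognition network.
        self.weight_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(len(sub_nets)))

    def forward(self, x):
        # Softmax-normalized weights for the sub-recognition networks.
        w = torch.softmax(self.weight_net(x), dim=-1)                     # (B, n)
        results = torch.stack([net(x) for net in self.sub_nets], dim=1)  # (B, n, C)
        # Final recognition result: weighted sum of the sub-network results.
        return (w.unsqueeze(-1) * results).sum(dim=1)
```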
S107: and training the pre-training model by taking the deviation between the minimum final recognition result and the actual label as an optimization target.
In the present specification, when the server inputs the second image into the pre-training model and trains it, two training modes are provided. One mode is to train the pre-training model by taking minimization of the deviation between the final recognition result and the actual label as an optimization target, updating the parameters of the weight network layer as well as the first and second sub-recognition networks in the pre-training model; that is, in this mode the parameters of all network layers are adjusted during training.
The other mode is to fix the network parameters of the first sub-recognition network in the pre-training model, and to adjust the network parameters of the second sub-recognition network and the network parameters of the weight network layer by taking minimization of the deviation between the final recognition result and the actual label as an optimization target. In other words, the network parameters of the already-trained first sub-recognition network need not be adjusted; only the parameters of the second sub-recognition network and the weight network layer are adjusted. Here the first sub-recognition network has already learned the recognition rule for the first image in advance: the server trains the first sub-recognition network separately with first images beforehand and adjusts its network parameters according to its recognition results, so that during the training of the pre-training model the network parameters of the first sub-recognition network can be fixed, and only the network parameters of the second sub-recognition network and the weight network layer are adjusted.
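A sketch of this second training mode, under the assumptions of the `MimicryModel` sketch above; treating index 0 as the first sub-recognition network, and the choice of optimizer and learning rate, are assumptions:

```python
import torch

def freeze_first_subnet(model, lr=1e-4):
    """Second training mode: fix the first sub-recognition network's
    parameters; only the second sub-network and the weight network
    layer remain trainable."""
    for p in model.sub_nets[0].parameters():   # first sub-recognition network
        p.requires_grad = False
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)  # optimizer over the remaining parameters
```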
In addition, when the server monitors that the pre-training model has acquired image data of a type different from the image data used for training it, the pre-training model needs to be updated. Specifically, if the server observes that over a certain period the accuracy of the recognition results for the image data input into the pre-training model drops significantly, the type of image data input during that period can be considered different from the type input before, so the pre-training model needs to be trained on the image data of that period. The server then generates a number of new sub-recognition networks, deploys the new sub-recognition networks into the pre-training model, and performs dimension expansion on the weight network layer according to the new sub-recognition networks, obtaining an updated pre-training model.
The server inputs the acquired image data of the different type together with the original image data, as expanded image data, into the updated pre-training model. That is, after acquiring image data of a type different from the image data used for training the pre-training model, the server fuses it with the original image data to form a sample set; the image data in this sample set is the expanded image data, which is input into the updated pre-training model. The original image data may include the first image and the second image. This prevents the new sub-recognition networks in the pre-training model from overfitting, and to a certain extent also prevents the original sub-recognition networks from drifting too far when their network parameters are updated.
Further, the weight corresponding to each original sub-recognition network and the weight corresponding to each new sub-recognition network are determined through the weight network layer in the updated pre-training model; the expanded image data is recognized through each original sub-recognition network and each new sub-recognition network respectively to obtain the respective recognition results, and the recognition results are weighted according to the determined weights to obtain the recognition result corresponding to the expanded image data. The server trains the updated pre-training model by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target.
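The dimension expansion of the weight network layer might look as follows, again under the `MimicryModel` assumptions above and assuming the lazy weight layer has already been materialized by a forward pass. Copying the old output rows so the original dimensions keep their learned parameters is a sketch of one plausible realization, not the patent's stated procedure:

```python
import torch
import torch.nn as nn

def expand_model(model, new_sub_nets):
    """Deploy new sub-recognition networks and dimension-expand the
    weight network layer of an already-trained MimicryModel."""
    model.sub_nets.extend(new_sub_nets)
    old = model.weight_net[-1]                      # trained final Linear layer
    expanded = nn.Linear(old.in_features, len(model.sub_nets))
    with torch.no_grad():
        # Original dimensions keep their learned parameters; the
        # expansion dimensions for the new sub-networks start fresh.
        expanded.weight[: old.out_features] = old.weight
        expanded.bias[: old.out_features] = old.bias
    model.weight_net[-1] = expanded
    return model
```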
The training of the updated pre-training model likewise involves two modes. One is to train the updated pre-training model by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target, updating all network parameters of the weight network layer and the network parameters of each sub-recognition network in the pre-training model, as shown in fig. 2.
Fig. 2 is a schematic diagram of a training process of a pre-training model provided in the present specification.
The hatched portion in fig. 2 represents the part involved in training. That is, in the first training mode provided in this specification, all network parameters of the weight network layer and the network parameters of each sub-recognition network in the pre-training model may be updated, so that the weight network layer and the sub-recognition networks fuse more deeply and the pre-training model becomes more accurate.
The other mode is to fix the network parameters of the original sub-recognition networks in the pre-training model and the network parameters in the weight network layer corresponding to the dimensions of the original sub-recognition networks, and to adjust the network parameters of the new sub-recognition networks and the network parameters in the weight network layer corresponding to the expansion dimensions of the new sub-recognition networks, by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target, as shown in fig. 3.
Fig. 3 is a schematic diagram of a training process of a pre-training model provided in the present specification.
The hatched portion in fig. 3 represents the part involved in training. That is, in the second training mode provided in this specification, the network parameters involved in training include only the network parameters in each new sub-recognition network and the network parameters in the weight network layer corresponding to the expansion dimensions of the new sub-recognition networks. This reduces the training burden of the model, since only part of the parameters in the pre-training model are adjusted, and it alleviates the problem of catastrophic forgetting in the pre-training model.
In the first training mode, because the network parameters in each sub-recognition network and all the network parameters in the weight network layer are adjusted, the server can, to obtain a better training effect, compute two partial loss functions to obtain two loss values and combine them into a total loss value.
Specifically, for the Nth round of training, the server obtains a first loss value according to the deviation between the recognition result corresponding to the expanded image data obtained in the Nth round of training and the actual label corresponding to the expanded image data, which can be expressed as:

$$L_1 = -\sum_{k} y_k \log\Big(\sum_{i=1}^{n} w_i\, f_i(x)_k\Big)$$

wherein $L_1$ is the first loss value, $n$ represents the number of sub-recognition networks, $f_i(x)$ is the recognition result of the $i$-th sub-recognition network, and $w_i$ represents the weight value output by the weight network layer for the $i$-th sub-recognition network; $w_i$ is a value processed by the $\mathrm{softmax}$ function, thereby ensuring $\sum_{i=1}^{n} w_i = 1$. The $\mathrm{softmax}$ processing can be specifically expressed as:

$$w_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}$$

wherein $z_i$ is the raw score output by the weight network layer for the $i$-th sub-recognition network.
Further, since the original sub-recognition networks in the pre-training model may already be trained, a second loss value is calculated in order to prevent the catastrophic forgetting that would result from the network parameters of the pre-training model after the Nth round of training drifting too far from those of the pre-training model before training.
Specifically, for each original sub-recognition network, the server determines the second loss value corresponding to that original sub-recognition network according to the deviation between the recognition result obtained for the expanded image data by the original sub-recognition network in the pre-training model before training and the recognition result obtained for the expanded image data by the original sub-recognition network after the (N-1)-th round of training, which can be specifically expressed as:

$$L_2 = \mathrm{JS}(P \,\|\, Q)$$

wherein $P$ represents the recognition result obtained for the expanded image data by the original sub-recognition network in the pre-training model before training, and $Q$ represents the recognition result obtained for the expanded image data by the original sub-recognition network after the (N-1)-th round of training. Specifically, the JS divergence is calculated as:

$$\mathrm{JS}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M)$$

wherein $M$ represents the average distribution of $P$ and $Q$, with the specific calculation formula:

$$M = \tfrac{1}{2}(P + Q)$$

while $\mathrm{KL}(\cdot \,\|\, \cdot)$ denotes the KL divergence, with the calculation formula:

$$\mathrm{KL}(P \,\|\, M) = \sum_{i} P(i) \log \frac{P(i)}{M(i)}$$
Further, after the server obtains the second loss value through the above formulas, the total loss value is obtained according to the first loss value and the second loss value, which can be specifically expressed as:

$$L = L_1 + \lambda L_2$$

wherein $\lambda$ denotes the quality adjustment factor, used to control the forgetting proportion of the sub-recognition networks. The server performs the Nth round of training on the pre-training model by taking minimization of the total loss value as an optimization target.
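A sketch of this combined objective in PyTorch; the small epsilon added for numerical stability and the default value of the quality adjustment factor `lam` are assumptions:

```python
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    """JS(P || Q) for per-sample probability vectors."""
    m = 0.5 * (p + q)
    kl_pm = (p * ((p + eps) / (m + eps)).log()).sum(dim=-1)  # KL(P || M)
    kl_qm = (q * ((q + eps) / (m + eps)).log()).sum(dim=-1)  # KL(Q || M)
    return 0.5 * kl_pm + 0.5 * kl_qm

def total_loss(final_logits, labels, old_probs, new_probs, lam=0.5):
    """L = L1 + lambda * L2: cross-entropy of the weighted final result
    plus a JS-divergence term that restrains the original sub-networks
    from drifting away from their behaviour before training."""
    l1 = F.cross_entropy(final_logits, labels)       # first loss value
    l2 = js_divergence(old_probs, new_probs).mean()  # second loss value
    return l1 + lam * l2                             # lam: quality adjustment factor
```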
When the server monitors that image data of a type different from the image data used for training the pre-training model has been acquired, a number of new sub-recognition networks are generated. However, as more new sub-recognition networks are generated, inputting the expanded image data (the image data of the different type together with the original image data) into all sub-recognition networks increases the amount of computation of the pre-training model. Therefore, when the number of sub-recognition networks is large, the server samples the weights of the sub-recognition networks before the $\mathrm{softmax}$ processing: if the weight of a sub-recognition network is within the top $k$, it is retained; otherwise the weight of that sub-recognition network is directly set to negative infinity.
The weights above still need to be processed by the $\mathrm{softmax}$ function; for a sub-recognition network not in the top $k$, because its corresponding weight is negative infinity, the weight becomes 0 after the $\mathrm{softmax}$ processing, so the amount of computation can be effectively reduced. For example, when sampling with $k = 3$, only the sub-recognition networks whose weight values are within the top 3 are retained; the server inputs the expanded image data into the 3 retained sub-recognition networks respectively, and weights the 3 recognition results according to the weights corresponding to the 3 sub-recognition networks to obtain the recognition result corresponding to the expanded image data.
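A sketch of this top-k weight sampling; `k = 3` mirrors the example above, and the function name is an assumption:

```python
import torch

def topk_weights(scores, k=3):
    """Keep only the k largest weight scores per sample; the rest are
    set to -inf so that softmax maps them to exactly 0."""
    masked = torch.full_like(scores, float("-inf"))
    top = scores.topk(k, dim=-1)
    masked.scatter_(-1, top.indices, top.values)
    return torch.softmax(masked, dim=-1)   # non-top-k weights become 0
```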
In this specification, a second image is obtained from the recognition result corresponding to the first image and the actual label corresponding to the first image; the second image is then input into the pre-training model, and the network parameters in the pre-training model are adjusted and updated to obtain more accurate recognition results.
When the server monitors that the type of image data input into the pre-training model differs from the type of image data used for training it, a number of new sub-recognition networks can be generated to increase the adaptability of the pre-training model to its environment, so that good recognition results are obtained for various kinds of image data.
In the training process of the pre-training model, two training modes are provided, wherein one training mode is to adjust the weight network layer and the network parameters of each sub-recognition network in the pre-training model, and the mode can realize the deep fusion of the weight network layer and each sub-recognition network in the pre-training model so as to obtain the pre-training model with higher accuracy.
The other is to fix the network parameters of the original sub-recognition networks in the pre-training model and the network parameters in the weight network layer corresponding to the dimensions of the original sub-recognition networks, and to adjust only the network parameters of the new sub-recognition networks and the network parameters in the weight network layer corresponding to the expansion dimensions of the new sub-recognition networks, so that the previous training effect can be retained and catastrophic forgetting is prevented.
The two modes suit different scenarios. In practical applications, the pre-training model needs to be trained according to the actual situation, and the appropriate training mode can be selected according to the input image data, so as to obtain a more accurate pre-training model.
The method provided in this specification can be applied to various image recognition scenarios, particularly in the field of risk recognition. For example, in the field of face recognition, whether a user poses a business risk can be judged from collected face images, and the method provided in this specification can be applied there. In this method, sub-recognition networks are automatically added as the variety of adversarial samples grows, and the network parameters of each sub-recognition network and the weight network layer are adjusted through the above training modes, so that accurate recognition results can be obtained. Therefore, although risk types gradually increase over time, the method provided in this specification can train the pre-training model according to the sample characteristics of the image data, enabling it to learn sub-recognition networks for new risks and realizing a process of dynamic model defense.
To further describe the method provided in this specification, consider a network architecture of a pre-training model comprising three sub-recognition networks: A, B and C, where A or B may be a first sub-recognition network obtained with first images and C may be a second sub-recognition network obtained with second images. A is mainly used to recognize ordinary image data, B is mainly used to recognize difficult image data, and C is mainly used to recognize adversarial image data. A, B and C can each recognize their own kind of image data and produce corresponding results. In the pre-training process, A, B and C are trained separately with ordinary image data, difficult image data and adversarial image data; the trained A, B and C are deployed into the pre-training model, a weight network layer is arranged in the pre-training model, and the pre-training model is trained so that the weight network layer and the three sub-recognition networks learn each other's characteristics, allowing the pre-training model to produce more reasonable recognition results for different input image data. Meanwhile, under the action of the weight network layer, the recognition results of the three sub-recognition networks are fused, so that the final recognition result obtained is more accurate, realizing the process of dynamic defense.
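Assembled with the `MimicryModel` sketch given earlier, the A/B/C example might look like this; the stand-in sub-network definition and class count are purely illustrative assumptions:

```python
import torch.nn as nn

def make_subnet(num_classes=10):
    # Stand-in sub-recognition network, for illustration only.
    return nn.Sequential(nn.Flatten(), nn.LazyLinear(num_classes))

# A: ordinary images, B: difficult images, C: adversarial images;
# each is pre-trained separately on its own data, then deployed together.
net_a, net_b, net_c = make_subnet(), make_subnet(), make_subnet()
model = MimicryModel([net_a, net_b, net_c])  # weight layer fuses the A/B/C results
```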
The above describes one or more model training methods based on mimicry structure dynamic defense in the present specification; based on the same idea, the present specification further provides a corresponding model training device based on mimicry structure dynamic defense, as shown in fig. 4.
Fig. 4 is a schematic diagram of a model training device based on a dynamic defense of a mimicry structure provided in the present specification, including:
an acquisition module 401, configured to acquire a pre-training model;
A generation module 402, configured to input a first image used for training the pre-training model into the pre-training model to obtain a recognition result corresponding to the first image; determine gradient information corresponding to the first image according to the recognition result corresponding to the first image and the actual label corresponding to the first image; generate interference data according to the reverse direction of the gradient direction corresponding to the gradient information; and add the interference data into the first image to obtain a second image;
The weighting module 403 is configured to input the second image into the pre-training model, determine weights corresponding to the sub-recognition networks set in the pre-training model through a weight network layer in the pre-training model, respectively identify the second image through each sub-recognition network, obtain each recognition result, and weight each recognition result according to the determined weights corresponding to each sub-recognition network, so as to obtain a final recognition result;
And the training module 404 is configured to train the pre-training model by taking minimization of the deviation between the final recognition result and the actual label as an optimization target.
Optionally, the sub-recognition networks include a first sub-recognition network and a second sub-recognition network, where the first sub-recognition network is used to recognize the image input into it through the learned recognition rule for recognizing the first image, and the second sub-recognition network is used to recognize the image input into it through the learned recognition rule for recognizing the second image;
The training module 404 is specifically configured to fix the network parameters of the first sub-recognition network in the pre-training model, and adjust the network parameters in the second sub-recognition network and the network parameters in the weight network layer by taking minimization of the deviation between the final recognition result and the actual label as an optimization target.
Optionally, the training module 404 is further configured to: when it is detected that image data of a type different from the image data used for training the pre-training model is acquired, generate a number of new sub-recognition networks, deploy the new sub-recognition networks into the pre-training model, and perform dimension expansion on the weight network layer according to the new sub-recognition networks to obtain an updated pre-training model; input the acquired image data of the different type together with the original image data, as expanded image data, into the updated pre-training model, determine the weight corresponding to each original sub-recognition network and the weight corresponding to each new sub-recognition network through the weight network layer in the updated pre-training model, recognize the expanded image data through each original sub-recognition network and each new sub-recognition network respectively to obtain the respective recognition results, and weight the recognition results according to the determined weights to obtain the recognition result corresponding to the expanded image data; and train the updated pre-training model by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target.
Optionally, the training module 404 is specifically configured to fix the network parameters of each original sub-recognition network in the pre-training model and the network parameters in the weight network layer corresponding to the dimensions of the original sub-recognition networks; and to adjust the network parameters in each new sub-recognition network and the network parameters in the weight network layer corresponding to the expansion dimensions of the new sub-recognition networks, by taking minimization of the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target.
Optionally, the training module 404 is specifically configured to: for the Nth round of training, obtain a first loss value according to the deviation between the recognition result corresponding to the expanded image data obtained in the Nth round of training and the actual label corresponding to the expanded image data; for each original sub-recognition network, determine a second loss value corresponding to the original sub-recognition network according to the deviation between the recognition result of the original sub-recognition network in the pre-training model before training for the expanded image data and the recognition result of the original sub-recognition network in the pre-training model after the (N-1)-th round of training for the expanded image data; and obtain a total loss value according to the first loss value and the second loss value, and perform the Nth round of training on the pre-training model by taking minimization of the total loss value as an optimization target.
The present specification also provides a computer-readable storage medium storing a computer program operable to perform the model training method based on mimicry structure dynamic defense provided in fig. 1 above.
The present specification also provides a schematic structural diagram of the electronic device corresponding to fig. 1, shown in fig. 5. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and non-volatile storage, as illustrated in fig. 5, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it to implement the model training method based on mimicry structure dynamic defense shown in fig. 1. Of course, apart from a software implementation, this specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flows described below is not limited to logic units, but may also be hardware or logic devices.
Improvements to a technology used to be clearly distinguishable as hardware improvements (for example, improvements to circuit structures such as diodes, transistors, and switches) or software improvements (improvements to method flows). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures, since designers almost always obtain a corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays such programming is mostly implemented with "logic compiler" software rather than by manually manufacturing integrated circuit chips; this software is similar to the compilers used in program development, and the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); among them, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can readily be obtained merely by slightly logic-programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by that (micro)processor, or of logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, or embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logic-program the method steps so that the controller implements the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included within it for performing various functions may also be regarded as structures within the hardware component; indeed, the means for performing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described in terms of separate functional units. Of course, when the present specification is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n)..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively simple; for relevant points, reference may be made to the corresponding parts of the description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.
Claims (10)
1. A model training method based on mimicry structure dynamic defense, characterized by comprising the following steps:
obtaining a pre-training model;
inputting a first image used for training the pre-training model into the pre-training model to obtain a recognition result corresponding to the first image;
determining gradient information corresponding to the first image according to the recognition result corresponding to the first image and an actual label corresponding to the first image;
generating disturbance data in a reverse gradient direction opposite to the direction of the gradient information, wherein the direction of the gradient information is converted into the reverse gradient direction through a preset function, and the disturbance data is generated according to the converted gradient information and a preset disturbance coefficient, the larger the disturbance coefficient is, the higher the degree of disturbance applied to the converted gradient information is;
adding the disturbance data to the first image to obtain a second image;
inputting the second image into the pre-training model, determining, through a weight network layer in the pre-training model, a weight corresponding to each sub-recognition network arranged in the pre-training model, recognizing the second image through each sub-recognition network respectively to obtain respective recognition results, and weighting the recognition results according to the determined weights corresponding to the sub-recognition networks to obtain a final recognition result;
and training the pre-training model with minimizing the deviation between the final recognition result and the actual label as an optimization target.
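As a non-normative illustration only, the following PyTorch-style sketch walks through the steps of claim 1, from gradient information to disturbance data to the weighted recognition of the second image. The cross-entropy deviation, the sign-based preset function, and the single disturbance coefficient eps are assumptions (the claim leaves all three open), and model is assumed to be a weighted ensemble such as the MimicryEnsemble sketch given earlier.

```python
import torch
import torch.nn.functional as F

def training_step(model, first_image, label, eps=0.03, optimizer=None):
    # Recognition result and gradient information for the first image.
    x = first_image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    grad = torch.autograd.grad(loss, x)[0]

    # Preset function (assumed): convert the gradient to the reverse
    # direction; eps is the preset disturbance coefficient, and a larger
    # eps yields a stronger disturbance.
    disturbance = eps * torch.sign(-grad)

    # Second image = first image + disturbance data.
    second_image = (x + disturbance).detach()

    # Weighted recognition of the second image by the sub-recognition
    # networks, then one optimization step toward minimizing the deviation
    # between the final recognition result and the actual label.
    train_loss = F.cross_entropy(model(second_image), label)
    if optimizer is not None:
        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()
    return train_loss
```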
2. The method according to claim 1, wherein the sub-recognition networks include a first sub-recognition network and a second sub-recognition network, the first sub-recognition network recognizing an image input into it through a recognition rule learned for recognizing the first image, and the second sub-recognition network recognizing an image input into it through a recognition rule learned for recognizing the second image;
wherein training the pre-training model with minimizing the deviation between the final recognition result and the actual label as an optimization target specifically comprises:
fixing the network parameters of the first sub-recognition network in the pre-training model, and adjusting the network parameters in the second sub-recognition network and the network parameters in the weight network layer, with minimizing the deviation between the final recognition result and the actual label as the optimization target.
3. The method according to claim 1, further comprising:
when it is detected that image data of a type different from the image data used for training the pre-training model has been acquired, generating a plurality of new sub-recognition networks, deploying the new sub-recognition networks into the pre-training model, and dimension-expanding the weight network layer according to the new sub-recognition networks to obtain an updated pre-training model;
inputting the acquired image data of the different type together with the original image data, as expanded image data, into the updated pre-training model; determining, through the weight network layer in the updated pre-training model, the weight corresponding to each original sub-recognition network and the weight corresponding to each new sub-recognition network; recognizing the expanded image data through each original sub-recognition network and each new sub-recognition network respectively to obtain respective recognition results; and weighting the recognition results according to the determined weights to obtain the recognition result corresponding to the expanded image data;
and training the updated pre-training model with minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target.
4. The method according to claim 3, wherein training the updated pre-training model with minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target specifically comprises:
fixing the network parameters of each original sub-recognition network in the pre-training model and the network parameters in the weight network layer that correspond to the dimensions of the original sub-recognition networks;
and adjusting the network parameters in the new sub-recognition networks and the network parameters in the weight network layer that correspond to the expansion dimensions of the new sub-recognition networks, with minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as the optimization target.
5. The method according to claim 3, wherein training the updated pre-training model with minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target specifically comprises:
for the N-th round of training, obtaining a first loss value according to the deviation between the recognition result corresponding to the expanded image data obtained in the N-th round of training and the actual label corresponding to the expanded image data;
for each original sub-recognition network, determining a second loss value corresponding to that original sub-recognition network according to the deviation between its recognition result for the expanded image data in the pre-training model before training and its recognition result for the expanded image data after round N-1 of training;
and obtaining a total loss value according to the first loss value and the second loss values, and performing the N-th round of training on the pre-training model with minimizing the total loss value as an optimization target.
6. A model training device based on mimicry structure dynamic defense, characterized by comprising:
an acquisition module, configured to acquire a pre-training model;
a generation module, configured to input a first image used for training the pre-training model into the pre-training model to obtain a recognition result corresponding to the first image; determine gradient information corresponding to the first image according to the recognition result corresponding to the first image and an actual label corresponding to the first image; generate disturbance data in a reverse gradient direction opposite to the direction of the gradient information, wherein the direction of the gradient information is converted into the reverse gradient direction through a preset function, and the disturbance data is generated according to the converted gradient information and a preset disturbance coefficient, the larger the disturbance coefficient is, the higher the degree of disturbance applied to the converted gradient information is; and add the disturbance data to the first image to obtain a second image;
a weighting module, configured to input the second image into the pre-training model, determine, through a weight network layer in the pre-training model, a weight corresponding to each sub-recognition network arranged in the pre-training model, recognize the second image through each sub-recognition network respectively to obtain respective recognition results, and weight the recognition results according to the determined weights corresponding to the sub-recognition networks to obtain a final recognition result;
and a training module, configured to train the pre-training model with minimizing the deviation between the final recognition result and the actual label as an optimization target.
7. The apparatus according to claim 6, wherein the sub-recognition networks include a first sub-recognition network and a second sub-recognition network, the first sub-recognition network recognizing an image input into it through a recognition rule learned for recognizing the first image, and the second sub-recognition network recognizing an image input into it through a recognition rule learned for recognizing the second image;
the training module being specifically configured to fix the network parameters of the first sub-recognition network in the pre-training model, and to adjust the network parameters in the second sub-recognition network and the network parameters in the weight network layer, with minimizing the deviation between the final recognition result and the actual label as an optimization target.
8. The apparatus according to claim 6, wherein the training module is further configured to: when it is detected that image data of a type different from the image data used for training the pre-training model has been acquired, generate a plurality of new sub-recognition networks, deploy the new sub-recognition networks into the pre-training model, and dimension-expand the weight network layer according to the new sub-recognition networks to obtain an updated pre-training model; input the acquired image data of the different type together with the original image data, as expanded image data, into the updated pre-training model; determine, through the weight network layer in the updated pre-training model, the weight corresponding to each original sub-recognition network and the weight corresponding to each new sub-recognition network; recognize the expanded image data through each original sub-recognition network and each new sub-recognition network respectively to obtain respective recognition results; weight the recognition results according to the determined weights to obtain the recognition result corresponding to the expanded image data; and train the updated pre-training model with minimizing the deviation between the recognition result corresponding to the expanded image data and the actual label corresponding to the expanded image data as an optimization target.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410076456.XA CN117576522B (en) | 2024-01-18 | 2024-01-18 | Model training method and device based on mimicry structure dynamic defense |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410076456.XA CN117576522B (en) | 2024-01-18 | 2024-01-18 | Model training method and device based on mimicry structure dynamic defense |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117576522A CN117576522A (en) | 2024-02-20 |
CN117576522B true CN117576522B (en) | 2024-04-26 |
Family
ID=89886792
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410076456.XA Active CN117576522B (en) | 2024-01-18 | 2024-01-18 | Model training method and device based on mimicry structure dynamic defense |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117576522B (en) |
- 2024-01-18: application CN202410076456.XA granted as patent CN117576522B (en), status Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021196401A1 (en) * | 2020-03-31 | 2021-10-07 | 北京市商汤科技开发有限公司 | Image reconstruction method and apparatus, electronic device and storage medium |
WO2022002059A1 (en) * | 2020-06-30 | 2022-01-06 | 北京灵汐科技有限公司 | Initial neural network training method and apparatus, image recognition method and apparatus, device, and medium |
CN112308113A (en) * | 2020-09-23 | 2021-02-02 | 济南浪潮高新科技投资发展有限公司 | Target identification method, device and medium based on semi-supervision |
CN112784857A (en) * | 2021-01-29 | 2021-05-11 | 北京三快在线科技有限公司 | Model training and image processing method and device |
CN114898091A (en) * | 2022-04-14 | 2022-08-12 | 南京航空航天大学 | Image countermeasure sample generation method and device based on regional information |
CN117197781A (en) * | 2023-11-03 | 2023-12-08 | 之江实验室 | Traffic sign recognition method and device, storage medium and electronic equipment |
Non-Patent Citations (2)
Title |
---|
Liu Shangzheng; Liu Bin. Design of a cross-modal recognition system for image category labels based on generative adversarial networks. Modern Electronics Technique. 2020, (08), full text. *
Qi Tianhui; Zhang Hui; Li Jiafeng; Zhuo Li. Siamese network visual object tracking based on multi-attention maps. Journal of Signal Processing. 2020, (09), full text. *
Also Published As
Publication number | Publication date |
---|---|
CN117576522A (en) | 2024-02-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108320296B (en) | Method, device and equipment for detecting and tracking target object in video | |
CN116402108A (en) | Model training and graph data processing method, device, medium and equipment | |
CN116312480A (en) | Voice recognition method, device, equipment and readable storage medium | |
CN117194992A (en) | Model training and task execution method and device, storage medium and equipment | |
CN116434787B (en) | Voice emotion recognition method and device, storage medium and electronic equipment | |
CN117197781B (en) | Traffic sign recognition method and device, storage medium and electronic equipment | |
CN117576522B (en) | Model training method and device based on mimicry structure dynamic defense | |
CN116403097A (en) | Target detection method and device, storage medium and electronic equipment | |
CN115841016A (en) | Model training method, device and equipment based on feature selection | |
CN113642616B (en) | Training sample generation method and device based on environment data | |
CN115496162A (en) | Model training method, device and equipment | |
CN114120273A (en) | Model training method and device | |
CN113673436A (en) | Behavior recognition and model training method and device | |
CN117237744B (en) | Training method and device of image classification model, medium and electronic equipment | |
CN118015316B (en) | Image matching model training method, device, storage medium and equipment | |
CN113673601B (en) | Behavior recognition method and device, storage medium and electronic equipment | |
CN116340852B (en) | Model training and business wind control method and device | |
CN117036869B (en) | Model training method and device based on diversity and random strategy | |
CN117746193B (en) | Label optimization method and device, storage medium and electronic equipment | |
CN116453615A (en) | Prediction method and device, readable storage medium and electronic equipment | |
CN117078985A (en) | Scene matching method and device, storage medium and electronic equipment | |
CN116721316A (en) | Model training and geomagnetic chart optimizing method, device, medium and equipment | |
CN117933707A (en) | Wind control model interpretation method and device, storage medium and electronic equipment | |
CN117592998A (en) | Wind control method and device, storage medium and electronic equipment | |
CN117152570A (en) | Autonomous continuous learning method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||