CN111539520A - Method and device for enhancing robustness of deep learning model

Info

Publication number: CN111539520A
Application number: CN202010442837.7A
Authority: CN (China)
Prior art keywords: prediction, sample, sub-prediction result, random inactivation
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 邱伟峰
Assignee (current and original): Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Publication of CN111539520A
Related application: PCT/CN2021/094950 (WO2021233389A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Learning methods


Abstract

The embodiments of this specification provide a method and a device for enhancing the robustness of a deep learning model. The method is applied to a deep learning model and includes the following steps: extracting a sample feature vector of a target sample; repeatedly executing a setting operation multiple times, where the setting operation includes obtaining the sample feature vector and applying random inactivation (dropout) to it; for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction using that vector to obtain a corresponding sub-prediction result; and determining, from the multiple sub-prediction results, a prediction result for the target sample and a prediction error for the prediction result.

Description

Method and device for enhancing robustness of deep learning model
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method and a device for enhancing robustness of a deep learning model.
Background
With the rapid development of artificial intelligence technology, deep learning models for image recognition, speech recognition, natural language processing, and the like have been widely adopted across many fields. This broad adoption places higher demands on the robustness and reliability of deep learning models, especially in fields with strict reliability requirements such as autonomous driving and medicine.
Although some solutions for improving the robustness of deep learning models exist, such as retraining the model with perturbed samples, they require a large amount of computation and are therefore of limited practicality. A technical solution is thus needed that improves the robustness of deep learning models with less computation.
Disclosure of Invention
The embodiments of this specification provide a method for enhancing the robustness of a deep learning model. The method is applied to a deep learning model and includes the following steps:
Extract a sample feature vector of a target sample. Repeatedly execute a setting operation multiple times, where the setting operation includes obtaining the sample feature vector and applying random inactivation to it to obtain a randomly inactivated sample feature vector. For each randomly inactivated sample feature vector obtained by executing the setting operation, perform prediction using that vector to obtain the corresponding sub-prediction result. Determine, from the multiple sub-prediction results, a prediction result for the target sample and a prediction error for the prediction result, where the prediction error indicates the degree of reliability of the prediction result.
The embodiments of this specification also provide an apparatus for enhancing the robustness of a deep learning model. The apparatus is applied to a deep learning model and includes:
an extraction module, configured to extract a sample feature vector of a target sample; an execution module, configured to repeatedly execute a setting operation multiple times, where the setting operation includes obtaining the sample feature vector and applying random inactivation to it to obtain a randomly inactivated sample feature vector; a prediction module, configured to, for each randomly inactivated sample feature vector obtained by executing the setting operation, perform prediction using that vector to obtain the corresponding sub-prediction result; and a determining module, configured to determine, from the multiple sub-prediction results, a prediction result for the target sample and a prediction error for the prediction result, where the prediction error indicates the degree of reliability of the prediction result.
The embodiments of this specification also provide a device for enhancing the robustness of a deep learning model. The device includes a processor and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: extract a sample feature vector of a target sample; repeatedly execute a setting operation multiple times, where the setting operation includes obtaining the sample feature vector and applying random inactivation to it to obtain a randomly inactivated sample feature vector; for each randomly inactivated sample feature vector obtained by executing the setting operation, perform prediction using that vector to obtain the corresponding sub-prediction result; and determine, from the multiple sub-prediction results, a prediction result for the target sample and a prediction error for the prediction result, where the prediction error indicates the degree of reliability of the prediction result.
The embodiments of this specification also provide a storage medium for storing computer-executable instructions that, when executed, implement the following process: extract a sample feature vector of a target sample; repeatedly execute a setting operation multiple times, where the setting operation includes obtaining the sample feature vector and applying random inactivation to it to obtain a randomly inactivated sample feature vector; for each randomly inactivated sample feature vector obtained by executing the setting operation, perform prediction using that vector to obtain the corresponding sub-prediction result; and determine, from the multiple sub-prediction results, a prediction result for the target sample and a prediction error for the prediction result, where the prediction error indicates the degree of reliability of the prediction result.
Drawings
To illustrate the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a first method of enhancing robustness of a deep learning model provided in an embodiment of the present disclosure;
FIG. 2 is a flow chart of a second method of enhancing robustness of a deep learning model provided in an embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating an exemplary configuration of an apparatus for enhancing robustness of a deep learning model according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an apparatus for enhancing robustness of a deep learning model provided in an embodiment of the present specification.
Detailed Description
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments. The described embodiments are only a part, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
The idea of the embodiments of this specification is as follows: the sample features extracted by the feature extractor of a deep learning model are reused, and each of the copies repeatedly obtained from the feature extractor is subjected to random inactivation, yielding multiple different versions of the sample features. Multiple predictions are then made with these different feature versions, and a prediction result and a prediction error for the sample are determined from the multiple predictions. Since the prediction error measures the reliability of the prediction result, predictions with poor reliability can be rejected, which improves the robustness of the deep learning model. Because the sample features extracted by the feature extractor are reused, feature extraction does not need to be performed multiple times, which reduces the amount of computation. Based on this idea, the embodiments of this specification provide a method, an apparatus, a device, and a storage medium for enhancing the robustness of a deep learning model, described in detail below.
Fig. 1 is a flowchart of a first method for enhancing the robustness of a deep learning model according to an embodiment of this specification. The method is applied to the deep learning model, and the method shown in Fig. 1 includes at least the following steps:
step 102, extracting a sample feature vector of the target sample.
The target sample is a sample input to the deep learning model, for example an image sample, a text sample, or a speech sample; the target sample may be any sample that the deep learning model can process, and these examples do not limit the embodiments of this specification. In addition, the target sample may be either a perturbed sample or an unperturbed sample.
Step 104: repeatedly execute the setting operation multiple times, where the setting operation includes obtaining the sample feature vector and applying random inactivation to it to obtain a randomly inactivated sample feature vector.
The specific value of "multiple times" may be set according to an actual application scenario, and the specific value of "multiple times" is not limited in the embodiments of the present specification.
Optionally, in a specific implementation, each application of random inactivation to the sample feature vector yields a different randomly inactivated sample feature vector.
For ease of understanding, the following description will be given by way of example.
For example, in one embodiment, assuming that the sample feature vector extracted in step 102 is sample feature vector a, five setting operations are repeatedly performed, and the five setting operations are as follows:
firstly, obtaining a sample characteristic vector A, and carrying out random inactivation treatment on the sample characteristic vector A to obtain a sample characteristic vector B after the random inactivation treatment;
secondly, obtaining a sample characteristic vector A, and carrying out random inactivation treatment on the sample characteristic vector A to obtain a sample characteristic vector C after random inactivation treatment;
thirdly, obtaining a sample characteristic vector A, and carrying out random inactivation treatment on the sample characteristic vector A to obtain a sample characteristic vector D after the random inactivation treatment;
fourthly, obtaining a sample characteristic vector A, and carrying out random inactivation treatment on the sample characteristic vector A to obtain a sample characteristic vector E after the random inactivation treatment;
and fifthly, obtaining a sample characteristic vector A, and performing random inactivation treatment on the sample characteristic vector A to obtain a sample characteristic vector F after the random inactivation treatment.
The sample feature vectors B, C, D, E, and F are all different from one another. Applying random inactivation to sample feature vector A thus serves two purposes: it reduces overfitting, and it converts multiple identical copies of sample feature vector A, obtained repeatedly, into the same number of different sample feature vectors.
In addition, in the embodiments of this specification, repeating the setting operation in step 104 multiple times yields multiple randomly inactivated sample feature vectors, one for each repetition. For example, if the setting operation is repeated five times, five randomly inactivated sample feature vectors are obtained.
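For illustration, the repeated setting operation can be sketched in a few lines of PyTorch. This is a minimal sketch rather than the implementation of this specification; the feature dimension (512), the dropout probability (0.5), and the number of repetitions (5) are assumptions chosen for the example.

import torch
import torch.nn.functional as F

feature = torch.randn(1, 512)  # sample feature vector A, extracted once

# Repeat the setting operation five times: each repetition re-obtains the
# same feature vector A and applies random inactivation (dropout) to it.
# training=True keeps dropout active even outside training.
dropped = [F.dropout(feature, p=0.5, training=True) for _ in range(5)]

# dropped now holds five different randomly inactivated versions of A
# (the vectors B, C, D, E, and F in the example above).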
Step 106: for each randomly inactivated sample feature vector obtained by executing the setting operation, perform prediction using that vector to obtain the sub-prediction result corresponding to it.
In the embodiments of this specification, each randomly inactivated sample feature vector is used for one prediction; that is, one randomly inactivated sample feature vector corresponds to one sub-prediction result. Therefore, the number of executions of the setting operation, the number of randomly inactivated sample feature vectors, and the number of sub-prediction results are all equal.
Step 108: determine, from the multiple sub-prediction results, a prediction result for the target sample and a prediction error for the prediction result, where the prediction error indicates the reliability of the prediction result.
In the embodiments of this specification, since there are multiple sub-prediction results, a prediction error for the prediction result can be determined. Whether the prediction result obtained by the deep learning model for the target sample is reliable can therefore be judged from the prediction error, so that the prediction result can be rejected when it is unreliable, which improves the robustness of the deep learning model. Moreover, because the extracted sample feature vector is reused, feature extraction does not need to be performed multiple times, which reduces the amount of computation.
Optionally, in a specific embodiment, each sub-prediction result may include a sub-prediction probability value that the target sample belongs to a given class, and the prediction result of the target sample may include a prediction probability value that the target sample belongs to that class. For example, if the target sample is an image of an automobile, each sub-prediction result obtained by the deep learning model may be a sub-prediction probability value that the target sample is an automobile, and the prediction result of the target sample may be a prediction probability value that the target sample is an automobile.
Optionally, in a specific embodiment, the prediction result includes a prediction probability value; correspondingly, in step 108, determining the prediction result of the target sample and the prediction error for the prediction result according to the plurality of sub-prediction results specifically includes:
determining the average value of the sub prediction probability values corresponding to the sub prediction results, and determining the average value as the prediction probability value of the target sample; and determining a standard deviation of the sub-prediction probability values corresponding to the plurality of sub-prediction results, and determining the standard deviation as the prediction error.
For ease of understanding, the following description will be given by way of example.
For example, continuing the example above, predictions are performed using sample feature vectors B, C, D, E, and F respectively, yielding sub-prediction result 1 corresponding to sample feature vector B, sub-prediction result 2 corresponding to sample feature vector C, sub-prediction result 3 corresponding to sample feature vector D, sub-prediction result 4 corresponding to sample feature vector E, and sub-prediction result 5 corresponding to sample feature vector F.
The average of the sub-prediction probability values corresponding to sub-prediction results 1 to 5 is calculated and taken as the prediction probability value of the target sample; the standard deviation of the same five sub-prediction probability values is calculated and taken as the prediction error of the prediction result of the target sample.
Therefore, in one embodiment, the prediction result of the target sample output by the deep learning model may take the form P ± M, where P is the prediction probability value that the target sample belongs to the class in question and M is the prediction error. The larger the value of the prediction error, the lower the reliability of the prediction result. Of course, in some embodiments, an error threshold may be set: the prediction result is considered unreliable when the prediction error is greater than the threshold, and reliable when it is less than or equal to the threshold.
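As a concrete illustration of this aggregation, a minimal sketch follows. The error threshold (0.1) and the example probability values are arbitrary assumptions for the example, not values prescribed by this specification.

import torch

def aggregate(sub_probs: torch.Tensor, error_threshold: float = 0.1):
    # sub_probs holds one sub-prediction probability value per setting
    # operation (e.g. five values for sub-prediction results 1 to 5).
    p = sub_probs.mean()               # prediction probability value P
    m = sub_probs.std(unbiased=False)  # prediction error M (standard deviation)
    return p.item(), m.item(), m.item() <= error_threshold

p, m, reliable = aggregate(torch.tensor([0.91, 0.88, 0.93, 0.90, 0.89]))
print(f"prediction = {p:.3f} ± {m:.3f}, reliable = {reliable}")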
Optionally, in a specific embodiment, the deep learning model may be composed of a feature extractor and a classifier;
accordingly, in step 102, extracting a sample feature vector of the target sample includes:
extracting a sample feature vector of a target sample through the feature extractor;
In step 104, repeatedly performing the setting operation multiple times includes: repeatedly executing the setting operation multiple times through the classifier.
In step 106, for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction using that vector to obtain the corresponding sub-prediction result includes: for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction through the classifier using that vector to obtain the sub-prediction result corresponding to it.
In step 108, determining the prediction result of the target sample and the prediction error for the prediction result according to the plurality of sub-prediction results includes:
determining, through the classifier, a prediction result of the target sample and a prediction error for the prediction result according to the plurality of sub-prediction results.
Generally, a target sample input into the deep learning model first passes through the feature extractor, which extracts the sample feature vector of the target sample; the sample feature vector is then passed to the classifier, which performs the subsequent operations on it.
Optionally, in a specific embodiment, the classifier includes a random inactivation (dropout) layer and a fully connected layer, with the dropout layer positioned before the fully connected layer;
Accordingly, in step 104, repeatedly performing the setting operation multiple times includes:
repeatedly executing the setting operation multiple times through the dropout layer.
In step 106, for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction using that vector to obtain the corresponding sub-prediction result includes: for each randomly inactivated sample feature vector output by the dropout layer executing the setting operation, performing prediction through the fully connected layer using that vector to obtain the sub-prediction result corresponding to it.
In step 108, determining the prediction result of the target sample and the prediction error for the prediction result according to the plurality of sub-prediction results includes:
determining, through the fully connected layer, a prediction result of the target sample and a prediction error for the prediction result according to the plurality of sub-prediction results.
In addition, in the embodiments of the present specification, the feature extractor may be a convolutional neural network, and the convolutional neural network includes a plurality of convolutional layers. In this way, the sample feature vector of the target sample can be extracted by performing convolution processing on the target sample a plurality of times.
Therefore, in the embodiments of this specification, after a target sample is input to the deep learning model, it first undergoes multiple rounds of convolution in the model's convolutional neural network, yielding the sample feature vector of the target sample. The setting operation is then executed multiple times through the dropout layer, and for each randomly inactivated sample feature vector output by a setting operation, the fully connected layer performs prediction with that vector to obtain the corresponding sub-prediction result. Finally, the average of the multiple sub-prediction results is computed through the fully connected layer as the prediction result of the target sample, and their standard deviation is determined as the prediction error of that prediction result.
It should also be noted that, in one embodiment, the deep learning model used in the embodiments of this specification may be obtained by fine-tuning. Specifically, a pre-trained model (such as a ResNet, VGG, or MobileNets model) may be used as the backbone network, where the backbone network does not include a dropout layer. The fully connected layer at the top of the backbone network is then removed, leaving the convolutional neural network as the feature extractor, which extracts the sample feature vector of a target sample input to the model; in general, the sample feature vector extracted by the feature extractor may be 512- or 1024-dimensional. Finally, a dropout layer and a fully connected layer are added to form a new classifier, which is usually a very simple fully connected neural network of one or two layers.
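A minimal sketch of this construction in PyTorch follows, assuming torchvision's ResNet18 as the pre-trained backbone; the dropout probability (0.5) and class count (10) are illustrative assumptions, and the weights argument assumes a recent torchvision.

import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_dim = backbone.fc.in_features  # 512 for ResNet18
backbone.fc = nn.Identity()            # remove the top fully connected layer;
                                       # the remaining CNN is the feature extractor

# New classifier: a dropout layer placed before a simple fully connected layer.
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(feature_dim, 10),
)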
Of course, in the process of training the deep learning model, either only the newly added fully connected layer may be trained (that is, fine-tuning), or all weights may be further fine-tuned after the fully connected layer has been trained; the embodiments of this specification do not limit this.
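Training only the newly added classifier can be sketched by freezing the backbone's parameters, as below; this is an assumed standard fine-tuning recipe, not a procedure fixed by this specification.

import torch

# Freeze the feature extractor so only the new classifier is trained.
for param in backbone.parameters():
    param.requires_grad = False
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# To later fine-tune all weights, unfreeze the backbone and keep training:
# for param in backbone.parameters():
#     param.requires_grad = True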
Fig. 2 is a flowchart of a second method for enhancing the robustness of a deep learning model according to an embodiment of this specification. The method shown in Fig. 2 includes at least the following steps:
step 202, obtaining a target sample input into the deep learning model through a convolutional neural network layer of the deep learning model, and extracting a sample feature vector of the target sample.
Step 204, executing multiple setting operations through a dropout layer; the setting operation comprises the steps of obtaining a sample feature vector from a convolutional neural network layer, and carrying out dropout processing on the sample feature vector to obtain a sample feature vector after the dropout processing.
And step 206, acquiring a plurality of sample feature vectors processed by dropout from the dropout layer through the full connection layer, and predicting the sample feature vectors processed by dropout to obtain a plurality of sub-prediction results.
And each sub-result comprises a sub-prediction probability value of the XX belonging to the target sample.
Step 208, calculating the average value of the sub-prediction probability values corresponding to the plurality of sub-prediction results through the full-connection layer, and taking the average value as the prediction result of the target sample; and calculating standard deviation of sub-prediction probability values corresponding to the plurality of sub-prediction results, and taking the standard deviation as a prediction error of the prediction result of the target sample.
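Putting steps 202 to 208 together, a minimal end-to-end sketch follows, reusing the backbone and classifier built above. The number of passes (5), the softmax output, and the input shape are assumptions for the example; setting the classifier to train mode keeps its dropout layer active at inference.

import torch

@torch.no_grad()
def predict_with_error(backbone, classifier, x, passes=5):
    backbone.eval()     # feature extraction runs deterministically, and only once
    classifier.train()  # keep the dropout layer active

    # Step 202: extract the sample feature vector once.
    feature = backbone(x)

    # Steps 204-206: repeatedly dropout the same feature vector and predict;
    # feature extraction is never repeated.
    sub_probs = torch.stack(
        [torch.softmax(classifier(feature), dim=-1) for _ in range(passes)]
    )

    # Step 208: mean over passes = prediction result, std = prediction error.
    return sub_probs.mean(dim=0), sub_probs.std(dim=0)

probs, errors = predict_with_error(backbone, classifier,
                                    torch.randn(1, 3, 224, 224))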
In the method for enhancing the robustness of a deep learning model provided in the embodiments of this specification, the sample features extracted by the feature extractor of the deep learning model are reused, and each of the multiple copies repeatedly obtained from the feature extractor is subjected to random inactivation, yielding multiple different sample features. Multiple predictions are made using these different sample features, a prediction result and a prediction error for the sample are determined from the multiple predictions, and the reliability of the prediction result can be judged from the prediction error, so that unreliable prediction results are rejected and the robustness of the deep learning model is improved. Because the sample features extracted by the feature extractor are reused, feature extraction does not need to be performed multiple times; the amount of computation is reduced, and the robustness of the deep learning model is enhanced with little computation.
Corresponding to the methods provided by the embodiments shown in Fig. 1 and Fig. 2, and based on the same idea, the embodiments of this specification further provide an apparatus for enhancing the robustness of a deep learning model, which implements the methods of the embodiments shown in Fig. 1 and Fig. 2. Fig. 3 is a schematic block diagram of the module composition of this apparatus; the apparatus shown in Fig. 3 includes at least:
an extraction module 302, configured to extract a sample feature vector of a target sample;
an execution module 304, configured to repeatedly execute the setting operation multiple times, where the setting operation includes obtaining the sample feature vector and applying random inactivation to it to obtain a randomly inactivated sample feature vector;
a prediction module 306, configured to, for each randomly inactivated sample feature vector obtained by executing the setting operation, perform prediction using that vector to obtain the corresponding sub-prediction result;
a determining module 308, configured to determine, from the multiple sub-prediction results, a prediction result for the target sample and a prediction error for the prediction result, where the prediction error indicates the reliability of the prediction result.
Optionally, the prediction result includes a prediction probability value;
correspondingly, the determining module 308 is specifically configured to:
determine the average value of the sub-prediction probability values corresponding to the multiple sub-prediction results, and determine the average value as the prediction probability value; and determine the standard deviation of the sub-prediction probability values corresponding to the multiple sub-prediction results, and determine the standard deviation as the prediction error.
Optionally, the deep learning model is composed of a feature extractor and a classifier;
correspondingly, the extraction module 302 is specifically configured to: extract a sample feature vector of the target sample through the feature extractor;
the execution module 304 is specifically configured to: repeatedly execute the setting operation multiple times through the classifier;
the prediction module 306 is specifically configured to:
for each randomly inactivated sample feature vector obtained by executing the setting operation, perform prediction through the classifier using that vector to obtain the sub-prediction result corresponding to it;
the determining module 308 is specifically configured to:
determine, through the classifier, a prediction result of the target sample and a prediction error for the prediction result according to the plurality of sub-prediction results.
Optionally, the classifier includes a random inactivation (dropout) layer and a fully connected layer;
correspondingly, the execution module 304 is further specifically configured to: repeatedly execute the setting operation multiple times through the dropout layer;
the prediction module 306 is further specifically configured to:
for each randomly inactivated sample feature vector output by the dropout layer executing the setting operation, perform prediction through the fully connected layer using that vector to obtain the sub-prediction result corresponding to it;
the determining module 308 is further specifically configured to:
determine, through the fully connected layer, a prediction result of the target sample and a prediction error for the prediction result according to the plurality of sub-prediction results.
Optionally, the feature extractor is a convolutional neural network, and the convolutional neural network includes a plurality of convolutional layers.
It should be noted that the apparatus for enhancing the robustness of a deep learning model provided in the embodiments of this specification is based on the same inventive concept as the method for enhancing the robustness of a deep learning model provided therein; therefore, for the specific implementation of the apparatus embodiment, reference may be made to the implementation of the foregoing method, and repeated details are omitted.
The apparatus for enhancing the robustness of a deep learning model provided in the embodiments of this specification reuses the sample features extracted by the feature extractor of the deep learning model and applies random inactivation to each of the multiple copies repeatedly obtained from the feature extractor, yielding multiple different sample features. Multiple predictions are made using these different sample features, a prediction result and a prediction error for the sample are determined from the multiple predictions, and the reliability of the prediction result can be judged from the prediction error, so that unreliable prediction results are rejected and the robustness of the deep learning model is improved. Because the sample features extracted by the feature extractor are reused, feature extraction does not need to be performed multiple times; the amount of computation is reduced, and the robustness of the deep learning model is enhanced with little computation.
Further, based on the methods shown in fig. 1 to fig. 2, an embodiment of the present specification further provides an apparatus for enhancing robustness of a deep learning model, as shown in fig. 4.
The apparatus for enhancing the robustness of the deep learning model may vary considerably depending on its configuration or performance, and may include one or more processors 401 and a memory 402, where the memory 402 may store one or more applications or data. The memory 402 may be transient or persistent storage. An application stored in the memory 402 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the apparatus. Further, the processor 401 may be configured to communicate with the memory 402 and execute, on the apparatus, the series of computer-executable instructions in the memory 402. The apparatus may also include one or more power supplies 403, one or more wired or wireless network interfaces 404, one or more input/output interfaces 405, one or more keyboards 406, and the like.
In a particular embodiment, the apparatus for enhancing the robustness of a deep learning model includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, each of which may include a series of computer-executable instructions for the apparatus; the one or more programs are configured to be executed by one or more processors and include computer-executable instructions for performing the following:
extracting a sample feature vector of a target sample;
repeatedly executing the setting operation multiple times, where the setting operation includes obtaining the sample feature vector and applying random inactivation to it to obtain a randomly inactivated sample feature vector;
for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction using that vector to obtain the corresponding sub-prediction result;
determining, from the multiple sub-prediction results, a prediction result for the target sample and a prediction error for the prediction result, where the prediction error indicates the reliability of the prediction result.
Optionally, when the computer-executable instructions are executed, the prediction result includes a prediction probability value;
accordingly, the determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results includes:
determining an average value of sub-prediction probability values corresponding to a plurality of sub-prediction results, and determining the average value as the prediction probability value; and determining a standard deviation of the sub prediction probability values corresponding to the plurality of sub prediction results, and determining the standard deviation as the prediction error.
Optionally, when the computer-executable instructions are executed, the deep learning model is composed of a feature extractor and a classifier;
correspondingly, the extracting of the sample feature vector of the target sample includes:
extracting a sample feature vector of the target sample through the feature extractor;
repeatedly executing the setting operation multiple times includes: repeatedly executing the setting operation multiple times through the classifier;
for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction using that vector to obtain the corresponding sub-prediction result includes:
for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction through the classifier using that vector to obtain the sub-prediction result corresponding to it;
the determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results includes:
and determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results by the classifier.
Optionally, when the computer-executable instructions are executed, the classifier includes a random inactivation (dropout) layer and a fully connected layer;
repeatedly executing the setting operation multiple times includes:
repeatedly executing the setting operation multiple times through the dropout layer;
for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction using that vector to obtain the corresponding sub-prediction result includes:
for each randomly inactivated sample feature vector output by the dropout layer executing the setting operation, performing prediction through the fully connected layer using that vector to obtain the sub-prediction result corresponding to it;
the determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results includes:
determining, through the fully connected layer, a prediction result of the target sample and a prediction error for the prediction result according to the plurality of sub-prediction results.
Optionally, when the computer-executable instructions are executed, the feature extractor is a convolutional neural network that includes a plurality of convolutional layers.
The device for enhancing the robustness of a deep learning model provided in the embodiments of this specification reuses the sample features extracted by the feature extractor of the deep learning model and applies random inactivation to each of the multiple copies repeatedly obtained from the feature extractor, yielding multiple different sample features. Multiple predictions are made using these different sample features, a prediction result and a prediction error for the sample are determined from the multiple predictions, and the reliability of the prediction result can be judged from the prediction error, so that unreliable prediction results are rejected and the robustness of the deep learning model is improved. Because the sample features extracted by the feature extractor are reused, feature extraction does not need to be performed multiple times; the amount of computation is reduced, and the robustness of the deep learning model is enhanced with little computation.
Further, based on the methods shown in Fig. 1 and Fig. 2, the embodiments of this specification also provide a storage medium, which in a specific embodiment may be a USB disk, an optical disc, a hard disk, or the like. The storage medium stores computer-executable instructions that, when executed by a processor, implement the following process:
extracting a sample feature vector of a target sample;
repeatedly executing the setting operation multiple times, where the setting operation includes obtaining the sample feature vector and applying random inactivation to it to obtain a randomly inactivated sample feature vector;
for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction using that vector to obtain the corresponding sub-prediction result;
determining, from the multiple sub-prediction results, a prediction result for the target sample and a prediction error for the prediction result, where the prediction error indicates the reliability of the prediction result.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, the prediction result includes a prediction probability value;
accordingly, the determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results includes:
determining an average value of sub-prediction probability values corresponding to a plurality of sub-prediction results, and determining the average value as the prediction probability value; and determining a standard deviation of the sub prediction probability values corresponding to the plurality of sub prediction results, and determining the standard deviation as the prediction error.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, the deep learning model is composed of a feature extractor and a classifier;
correspondingly, the extracting of the sample feature vector of the target sample includes:
extracting a sample feature vector of the target sample through the feature extractor;
repeatedly executing the setting operation multiple times includes: repeatedly executing the setting operation multiple times through the classifier;
for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction using that vector to obtain the corresponding sub-prediction result includes:
for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction through the classifier using that vector to obtain the sub-prediction result corresponding to it;
the determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results includes:
and determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results by the classifier.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, the classifier includes a random inactivation (dropout) layer and a fully connected layer;
repeatedly executing the setting operation multiple times includes:
repeatedly executing the setting operation multiple times through the dropout layer;
for each randomly inactivated sample feature vector obtained by executing the setting operation, performing prediction using that vector to obtain the corresponding sub-prediction result includes:
for each randomly inactivated sample feature vector output by the dropout layer executing the setting operation, performing prediction through the fully connected layer using that vector to obtain the sub-prediction result corresponding to it;
the determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results includes:
determining, through the fully connected layer, a prediction result of the target sample and a prediction error for the prediction result according to the plurality of sub-prediction results.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, the feature extractor is a convolutional neural network that includes a plurality of convolutional layers.
When the computer-executable instructions stored in the storage medium provided in the embodiments of this specification are executed by a processor, the sample features extracted by the feature extractor of the deep learning model are reused, and each of the multiple copies repeatedly obtained from the feature extractor is subjected to random inactivation, yielding multiple different sample features. Multiple predictions are made using these different sample features, a prediction result and a prediction error for the sample are determined from the multiple predictions, and the reliability of the prediction result can be judged from the prediction error, so that unreliable prediction results are rejected and the robustness of the deep learning model is improved. Because the sample features extracted by the feature extractor are reused, feature extraction does not need to be performed multiple times; the amount of computation is reduced, and the robustness of the deep learning model is enhanced with little computation.
In the 1990s, an improvement to a technology could clearly be distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is now mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained simply by lightly programming the method flow into an integrated circuit using the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the memory's control logic. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for realizing various functions may also be regarded as structures within the hardware component. Indeed, the means for realizing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when the present application is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowcharts and/or block diagrams of the method, apparatus (system), and computer program product according to the embodiments of the specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus, such that a series of operational steps are performed on the computer or other programmable apparatus to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," and any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant points, reference may be made to the corresponding parts of the description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (12)

1. A method for enhancing robustness of a deep learning model, applied to the deep learning model, the method comprising:
extracting a sample feature vector of a target sample;
repeatedly executing a setting operation a plurality of times; wherein the setting operation comprises obtaining the sample feature vector and performing random inactivation on the sample feature vector to obtain a randomly inactivated sample feature vector;
for the randomly inactivated sample feature vector obtained from each execution of the setting operation, performing prediction using the randomly inactivated sample feature vector to obtain a sub-prediction result corresponding to the randomly inactivated sample feature vector;
determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results; wherein the prediction error is used to indicate a degree of reliability of the prediction result.
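Read as an algorithm, claim 1 describes Monte Carlo dropout at prediction time: the dropout (random inactivation) that is normally switched off after training is kept active, so repeated forward passes over the same feature vector yield a spread of sub-prediction results. The sketch below is a minimal, non-authoritative PyTorch rendering of the repeated setting operation; the attribute names feature_extractor and classifier_head, the pass count n_passes, and the inactivation rate p are illustrative assumptions, not taken from the claims.

```python
import torch
import torch.nn.functional as F

def sub_predictions(model, x, n_passes=30, p=0.5):
    """Execute the setting operation n_passes times and collect one
    sub-prediction result per randomly inactivated feature vector.
    `model.feature_extractor` and `model.classifier_head` are assumed
    attribute names, not part of the claimed method."""
    model.eval()
    with torch.no_grad():
        feats = model.feature_extractor(x)    # sample feature vector of the target sample
        passes = []
        for _ in range(n_passes):
            # random inactivation: dropout kept stochastic at prediction time
            dropped = F.dropout(feats, p=p, training=True)
            passes.append(torch.softmax(model.classifier_head(dropped), dim=-1))
    return torch.stack(passes)                # shape: (n_passes, batch, n_classes)
```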
2. The method of claim 1, wherein the prediction result comprises a prediction probability value;
correspondingly, the determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results comprises:
determining an average value of the sub-prediction probability values corresponding to the sub-prediction results, and taking the average value as the prediction probability value; and determining a standard deviation of the sub-prediction probability values corresponding to the sub-prediction results, and taking the standard deviation as the prediction error.
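A minimal sketch of the aggregation in claim 2, assuming sub_preds is the tensor returned by the hypothetical sub_predictions helper above: the mean over the passes serves as the prediction probability value, and the standard deviation serves as the prediction error.

```python
sub_preds = sub_predictions(model, x)      # (n_passes, batch, n_classes)
prediction = sub_preds.mean(dim=0)         # average of the sub-prediction probability values
prediction_error = sub_preds.std(dim=0)    # standard deviation taken as the prediction error
```

A larger standard deviation means the sub-prediction results disagree across random inactivations, signaling that the prediction result is less reliable.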
3. The method of claim 1 or 2, wherein the deep learning model consists of a feature extractor and a classifier;
correspondingly, the extracting the sample feature vector of the target sample comprises:
extracting, by the feature extractor, a sample feature vector of the target sample;
the repeatedly executing multiple setting operations includes: repeatedly performing a plurality of setting operations by the classifier;
the obtaining sub-prediction results corresponding to the sample feature vector after the random inactivation by using the sample feature vector after the random inactivation to perform prediction with respect to the sample feature vector after the random inactivation obtained by performing the setting operation each time includes:
for the sample feature vector after the random inactivation obtained by executing the setting operation each time, predicting by using the sample feature vector after the random inactivation through the classifier to obtain a sub-prediction result corresponding to the sample feature vector after the random inactivation;
the determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results comprises:
determining, by the classifier, a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results.
4. The method of claim 3, wherein the classifier comprises a random inactivation (dropout) layer and a fully connected layer;
the repeatedly executing a setting operation a plurality of times comprises:
repeatedly executing the setting operation a plurality of times through the dropout layer;
the performing prediction, for the randomly inactivated sample feature vector obtained from each execution of the setting operation, using the randomly inactivated sample feature vector to obtain the corresponding sub-prediction result comprises:
for the randomly inactivated sample feature vector output by the dropout layer in each execution of the setting operation, performing prediction through the fully connected layer using the randomly inactivated sample feature vector to obtain the corresponding sub-prediction result;
the determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results comprises:
determining, by the fully connected layer, a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results.
5. The method of claim 3, wherein the feature extractor is a convolutional neural network comprising a plurality of convolutional layers.
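Claims 3 to 5 pin the structure down: a convolutional feature extractor followed by a classifier that consists of a dropout (random inactivation) layer and a fully connected layer. The sketch below assumes invented sizes (input channels, channel counts, class count, and rate p) purely for illustration; it is not the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RobustClassifier(nn.Module):
    def __init__(self, n_classes=2, p=0.5):
        super().__init__()
        # claim 5: the feature extractor is a CNN with a plurality of convolutional layers
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # claim 4: the classifier is a dropout layer plus a fully connected layer
        self.p = p
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x, n_passes=1):
        feats = self.feature_extractor(x)
        # dropout stays stochastic so each pass yields a distinct sub-prediction result
        passes = [torch.softmax(self.fc(F.dropout(feats, self.p, training=True)), dim=-1)
                  for _ in range(n_passes)]
        return torch.stack(passes)        # (n_passes, batch, n_classes)

# usage: preds = RobustClassifier()(images, n_passes=30); preds.mean(0), preds.std(0)
```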
6. An apparatus for enhancing robustness of a deep learning model, applied to the deep learning model, the apparatus comprising:
an extraction module, configured to extract a sample feature vector of a target sample;
an execution module, configured to repeatedly execute a setting operation a plurality of times; wherein the setting operation comprises obtaining the sample feature vector and performing random inactivation on the sample feature vector to obtain a randomly inactivated sample feature vector;
a prediction module, configured to, for the randomly inactivated sample feature vector obtained from each execution of the setting operation, perform prediction using the randomly inactivated sample feature vector to obtain a corresponding sub-prediction result;
a determination module, configured to determine a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results; wherein the prediction error is used to indicate a degree of reliability of the prediction result.
7. The apparatus of claim 6, wherein the prediction result comprises a prediction probability value;
correspondingly, the determination module is specifically configured to:
determine an average value of the sub-prediction probability values corresponding to the sub-prediction results, and take the average value as the prediction probability value; and determine a standard deviation of the sub-prediction probability values corresponding to the sub-prediction results, and take the standard deviation as the prediction error.
8. The apparatus of claim 6 or 7, wherein the deep learning model consists of a feature extractor and a classifier;
correspondingly, the extraction module is specifically configured to extract, by the feature extractor, the sample feature vector of the target sample;
the execution module is specifically configured to repeatedly execute the setting operation a plurality of times through the classifier;
the prediction module is specifically configured to:
for the randomly inactivated sample feature vector obtained from each execution of the setting operation, perform prediction through the classifier using the randomly inactivated sample feature vector to obtain the corresponding sub-prediction result;
the determination module is specifically configured to:
determine, by the classifier, a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results.
9. The apparatus of claim 8, wherein the classifier comprises a random inactivation (dropout) layer and a fully connected layer;
correspondingly, the execution module is further specifically configured to repeatedly execute the setting operation a plurality of times through the dropout layer;
the prediction module is further specifically configured to:
for the randomly inactivated sample feature vector output by the dropout layer in each execution of the setting operation, perform prediction through the fully connected layer using the randomly inactivated sample feature vector to obtain the corresponding sub-prediction result;
the determination module is further specifically configured to:
determine, by the fully connected layer, a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results.
10. The apparatus of claim 8, wherein the feature extractor is a convolutional neural network comprising a plurality of convolutional layers.
11. An apparatus for enhancing robustness of a deep learning model, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
extracting a sample feature vector of a target sample;
repeatedly executing a setting operation a plurality of times; wherein the setting operation comprises obtaining the sample feature vector and performing random inactivation on the sample feature vector to obtain a randomly inactivated sample feature vector;
for the randomly inactivated sample feature vector obtained from each execution of the setting operation, performing prediction using the randomly inactivated sample feature vector to obtain a sub-prediction result corresponding to the randomly inactivated sample feature vector;
determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results; wherein the prediction error is used to indicate a degree of reliability of the prediction result.
12. A storage medium storing computer-executable instructions that, when executed, implement the following:
extracting a sample feature vector of a target sample;
repeatedly executing a setting operation a plurality of times; wherein the setting operation comprises obtaining the sample feature vector and performing random inactivation on the sample feature vector to obtain a randomly inactivated sample feature vector;
for the randomly inactivated sample feature vector obtained from each execution of the setting operation, performing prediction using the randomly inactivated sample feature vector to obtain a sub-prediction result corresponding to the randomly inactivated sample feature vector;
determining a prediction result of the target sample and a prediction error for the prediction result from the plurality of sub-prediction results; wherein the prediction error is used to indicate a degree of reliability of the prediction result.
CN202010442837.7A 2020-05-22 2020-05-22 Method and device for enhancing robustness of deep learning model Pending CN111539520A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010442837.7A CN111539520A (en) 2020-05-22 2020-05-22 Method and device for enhancing robustness of deep learning model
PCT/CN2021/094950 WO2021233389A1 (en) 2020-05-22 2021-05-20 Method and apparatus for enhancing robustness of deep learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010442837.7A CN111539520A (en) 2020-05-22 2020-05-22 Method and device for enhancing robustness of deep learning model

Publications (1)

Publication Number Publication Date
CN111539520A true CN111539520A (en) 2020-08-14

Family

ID=71979730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010442837.7A Pending CN111539520A (en) 2020-05-22 2020-05-22 Method and device for enhancing robustness of deep learning model

Country Status (2)

Country Link
CN (1) CN111539520A (en)
WO (1) WO2021233389A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021233389A1 (en) * 2020-05-22 2021-11-25 支付宝(杭州)信息技术有限公司 Method and apparatus for enhancing robustness of deep learning model

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590741A (en) * 2017-09-19 2018-01-16 广东工业大学 A kind of method and system of predicted pictures popularity
US20190213470A1 (en) * 2018-01-09 2019-07-11 NEC Laboratories Europe GmbH Zero injection for distributed deep learning
CN108932713A (en) * 2018-07-20 2018-12-04 成都指码科技有限公司 A kind of weld porosity defect automatic testing method based on deep learning
CN110866592B (en) * 2019-10-28 2023-12-29 腾讯科技(深圳)有限公司 Model training method, device, energy efficiency prediction method, device and storage medium
CN111539520A (en) * 2020-05-22 2020-08-14 支付宝(杭州)信息技术有限公司 Method and device for enhancing robustness of deep learning model


Also Published As

Publication number Publication date
WO2021233389A1 (en) 2021-11-25

Similar Documents

Publication Publication Date Title
CN108345580B (en) Word vector processing method and device
CN112200132B (en) Data processing method, device and equipment based on privacy protection
CN108874765B (en) Word vector processing method and device
CN111401062B (en) Text risk identification method, device and equipment
CN112308113A (en) Target identification method, device and medium based on semi-supervision
CN111507726B (en) Message generation method, device and equipment
CN114332873A (en) Training method and device for recognition model
CN115545002A (en) Method, device, storage medium and equipment for model training and business processing
CN114819614A (en) Data processing method, device, system and equipment
CN117392694B (en) Data processing method, device and equipment
CN113988162A (en) Model training and image recognition method and device, storage medium and electronic equipment
CN107247704B (en) Word vector processing method and device and electronic equipment
CN111539520A (en) Method and device for enhancing robustness of deep learning model
CN115620706B (en) Model training method, device, equipment and storage medium
CN114861665B (en) Method and device for training reinforcement learning model and determining data relation
CN107577658B (en) Word vector processing method and device and electronic equipment
CN116186231A (en) Method and device for generating reply text, storage medium and electronic equipment
CN115204395A (en) Data processing method, device and equipment
CN112949642B (en) Character generation method and device, storage medium and electronic equipment
CN112115952B (en) Image classification method, device and medium based on full convolution neural network
CN107844472B (en) Word vector processing method and device and electronic equipment
CN115423485B (en) Data processing method, device and equipment
CN111242195A (en) Model, insurance wind control model training method and device and electronic equipment
CN113343716B (en) Multilingual translation method, device, storage medium and equipment
CN117079646B (en) Training method, device, equipment and storage medium of voice recognition model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40035852; country of ref document: HK)
RJ01 Rejection of invention patent application after publication (application publication date: 20200814)