WO2023195120A1 - Training device, training method, and training program - Google Patents

Training device, training method, and training program

Info

Publication number
WO2023195120A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning
model
loss
loss function
training
Prior art date
Application number
PCT/JP2022/017240
Other languages
French (fr)
Japanese (ja)
Inventor
Masanori Yamada (山田 真徳)
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation filed Critical Nippon Telegraph and Telephone Corporation
Priority to PCT/JP2022/017240 priority Critical patent/WO2023195120A1/en
Publication of WO2023195120A1 publication Critical patent/WO2023195120A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present invention relates to a learning device, a learning method, and a learning program.
  • conventionally, there is an attack called Adversarial Example, which causes a classifier to make false judgments by adding noise to the data to be classified.
  • as a countermeasure against this Adversarial Example, there is, for example, Adversarial Training, in which a model (classifier) is trained using Adversarial Examples.
  • models learned using Adversarial Training have a problem of low generalization performance. This is because the generalization performance of Deep Learning is higher as the loss landscape (the shape of the loss function) with respect to the model's weight is flatter, but learning using Adversarial Training sharpens the model's loss landscape.
  • the present invention aims to solve the above-mentioned problems, improve the generalization performance of the model, and learn a model that is more robust against adversarial examples.
  • in order to solve the above problems, the present invention, in learning a model for predicting the label of input data including Adversarial Examples, is characterized by comprising a learning processing unit that trains the model using a loss function regularized so that the loss value does not fall below a predetermined value no matter what value the weight takes.
  • FIG. 1 is a diagram for explaining the loss landscape of the loss function used by the learning device.
  • FIG. 2 is a diagram showing an example of the configuration of the learning device.
  • FIG. 3 is a flowchart showing an example of the processing procedure of the learning device.
  • FIG. 4 is a flowchart showing an example of the processing procedure of the learning device.
  • FIG. 5 is a diagram showing an example of application of the learning device.
  • FIG. 6 is a diagram showing experimental results for the model learned by the learning device.
  • FIG. 7 is a diagram showing experimental results for the model learned by the learning device.
  • FIG. 8 is a diagram showing an example of the configuration of a computer that executes a learning program.
  • the learning device of this embodiment uses a loss function regularized so that the loss does not become less than a predetermined value, as a loss function for calculating the loss with respect to the weight of the model.
  • the learning device uses a loss function in which the loss bottoms out at a predetermined value (for example, b). This flattens the outline of the loss landscape. Therefore, by using the above loss function in Adversarial Training, the learning device can learn a model with high generalization performance. As a result, the learning device can learn a model that is robust to the Adversarial Example.
  • the learning device 10 includes, for example, an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.
  • the input unit 11 is an interface that accepts input of various data.
  • the input unit 11 receives input of data used for learning processing and prediction processing, which will be described later.
  • the output unit 12 is an interface that outputs various data.
  • the output unit 12 outputs the label of the data predicted by the control unit 15.
  • the communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and an external device such as a server via a network.
  • the communication control unit 13 controls communication between the control unit 15 and a data acquisition device (see FIG. 5) that acquires the data to be learned.
  • the storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk, and stores the parameters of the model learned by the learning process described later.
  • the control unit 15 is realized using, for example, a CPU (Central Processing Unit) or the like, and executes a processing program stored in the storage unit 14. Thereby, the control unit 15 functions as an acquisition unit 15a, a learning unit 15b, and a prediction unit 15c, as illustrated in FIG. 2.
  • the acquisition unit 15a acquires data used for learning processing and prediction processing, which will be described later, via the input unit 11 or the communication control unit 13.
  • the learning unit 15b performs Adversarial Training of a model for predicting labels of input data including Adversarial Examples.
  • the learning unit 15b performs learning of the model using learning data including the Adversarial Example and a predetermined loss function (details will be described later). For example, the learning unit 15b determines the parameters (weight) of the model.
  • l in formula (1) is a loss function.
  • B(x, ⁇ ) is a set within a distance ⁇ from x, and is a constraint used to make noise invisible to the human eye.
  • the L ⁇ norm is used.
  • Adversarial Training performed by the learning unit 15b is defined as in the following equation (2).
  • v in equation (3) is Gaussian noise randomly sampled from within the region shown in equation (4).
  • the l of w_l denotes the layer: the computation is performed per layer, and the norm of a matrix is measured by the Frobenius norm.
  • the learning unit 15b uses a regularized loss function so that the loss value does not become less than a predetermined value no matter what value the weight is.
  • the learning unit 15b performs learning of the model using a loss function shown in equation (5) below.
  • the learning unit 15b uses the learning data acquired by the acquisition unit 15a to determine the weight of a model that minimizes the loss value of the loss function described above based on equation (6).
  • according to the above loss function, the loss value does not fall below the predetermined value no matter what value the weight takes, so the loss landscape bottoms out at the predetermined value (for example, b) and becomes flat, as shown in FIG. 1.
  • the learning unit 15b learns the model using the above loss function, thereby improving the generalization performance of the model.
  • the prediction unit 15c predicts the label of the input data using the model learned by the learning unit 15b. For example, the prediction unit 15c uses the learned model to calculate the probability of each label of newly acquired data, and outputs the label with the highest probability. Thereby, the learning device 10 can output a correct label even when the input data is Adversarial Example, for example.
  • an example of the learning processing procedure by the learning device 10 will be described with reference to FIG. 3.
  • the process shown in FIG. 3 is started, for example, at the timing when an operation input instructing the start of the learning process is received.
  • the acquisition unit 15a acquires learning data including Adversarial Example (S1).
  • the learning unit 15b uses the learning data and the above loss function to learn a model representing the probability distribution of the labels of the input data (S2).
  • the learning unit 15b stores the parameters of the model learned in S2 in the storage unit 14.
  • the acquisition unit 15a acquires data for which a label is to be predicted (S11).
  • the prediction unit 15c predicts the label of the data acquired in S11 using the model learned by the learning unit 15b (S12). For example, the prediction unit 15c uses the learned model to calculate p(x') of the data x' acquired in S11, and outputs the label with the highest probability.
  • the learning device 10 can output a correct label.
  • the learning device 10 described above may be applied to data anomaly detection.
  • an example of application in this case will be described with reference to FIG. 5.
  • the case where the function of the prediction unit 15c described above is installed in the detection device 20 will be explained as an example.
  • the learning device 10 performs model learning using teacher data (learning data) acquired from a data acquisition device and the loss function described above. After that, when the detection device 20 acquires new data x' from the data acquisition device, it calculates p(x') of the data x' using the learned model. Then, the detection device 20 outputs a report indicating whether the data x' is abnormal data, based on the label with the highest probability.
  • the evaluation metric is Robust Acc: the classification accuracy (0 to 1) on data containing Adversarial Examples.
  • first, the results of Experiment 1 will be explained using FIG. 6.
  • as shown in FIG. 6, it was confirmed that the model learned by the learning device 10 of this embodiment achieves a higher Robust Acc at epochs of 400 and above than the model learned with existing AT (Adversarial Training) or the model learned with AWP (Adversarial Weight Perturbation).
  • next, Experiment 2 will be explained using FIG. 7.
  • the purpose of Experiment 2 is to confirm that the level of Robust Acc depends on the constant b of the loss function used for model training.
  • b was set to a value between 0 and 2.
  • the vertical axis of the graph shown in FIG. 7 is Test Robust Acc, and the horizontal axis is the constant b of the loss function.
  • each component of each part shown in the drawings is functionally conceptual, and does not necessarily need to be physically configured as shown in the drawings.
  • the specific form of distribution and integration of the devices is not limited to that shown in the figures; all or part of them can be functionally or physically distributed or integrated in arbitrary units depending on various loads, usage conditions, and the like.
  • all or any part of each processing function performed by each device may be realized by a CPU and a program executed by the CPU, or may be realized as hardware using wired logic.
  • the learning device 10 described above can be implemented by installing a program on a desired computer as packaged software or online software. For example, by causing the information processing device to execute the above program, the information processing device can be made to function as the learning device 10.
  • the information processing device referred to here includes a desktop or notebook personal computer.
  • information processing devices include mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System), as well as terminals such as PDAs (Personal Digital Assistants).
  • the learning device 10 can also be implemented as a server device that uses a terminal device used by a user as a client and provides services related to the above processing to the client.
  • the server device may be implemented as a web server, or may be implemented as a cloud that provides services related to the above processing through outsourcing.
  • FIG. 8 is a diagram showing an example of a computer that executes a learning program.
  • Computer 1000 includes, for example, a memory 1010 and a CPU 1020.
  • the computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These parts are connected by a bus 1080.
  • the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012.
  • the ROM 1011 stores, for example, a boot program such as BIOS (Basic Input Output System).
  • Hard disk drive interface 1030 is connected to hard disk drive 1090.
  • Disk drive interface 1040 is connected to disk drive 1100.
  • Serial port interface 1050 is connected to, for example, mouse 1110 and keyboard 1120.
  • Video adapter 1060 is connected to display 1130, for example.
  • the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, a program that defines each process executed by the learning device 10 described above is implemented as a program module 1093 in which computer-executable code is written.
  • Program module 1093 is stored in hard disk drive 1090, for example.
  • a program module 1093 for executing processing similar to the functional configuration of the learning device 10 is stored in the hard disk drive 1090.
  • the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • the data used in the processing of the embodiment described above is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads out the program module 1093 and program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 and executes them as necessary.
  • program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like.
  • the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). The program module 1093 and program data 1094 may then be read by the CPU 1020 from another computer via the network interface 1070.

Abstract

A training device for training a model, in training a model for predicting a label of input data including an adversarial example, uses, as a loss function for calculating loss relative to weight in the model, a loss function regularized such that the value of the loss does not fall to or below a prescribed value b for any weight value. This makes it possible for the training device to smooth a loss landscape of the model. As a result, the training device is capable of training a model that is robust with respect to adversarial examples.

Description

Learning device, learning method, and learning program
The present invention relates to a learning device, a learning method, and a learning program.
Conventionally, there is an attack called Adversarial Example, which causes a classifier to make false judgments by adding noise to the data to be classified. As a countermeasure against this Adversarial Example, there is, for example, Adversarial Training, in which a model (classifier) is trained using Adversarial Examples.
However, models learned with Adversarial Training have a problem of low generalization performance. This is because the generalization performance of Deep Learning is higher the flatter the loss landscape (the shape of the loss function) with respect to the model's weights is, whereas learning with Adversarial Training sharpens the model's loss landscape.
Therefore, an object of the present invention is to solve the above problem, improve the generalization performance of the model, and learn a model that is more robust against Adversarial Examples.
To solve the above problem, the present invention is characterized by comprising a learning processing unit that, in learning a model for predicting the label of input data including Adversarial Examples, trains the model using, as the loss function for calculating the loss with respect to the model's weights, a loss function regularized so that the loss value does not fall below a predetermined value no matter what value the weights take.
According to the present invention, a model that is more robust against Adversarial Examples can be learned.
FIG. 1 is a diagram for explaining the loss landscape of the loss function used by the learning device.
FIG. 2 is a diagram showing a configuration example of the learning device.
FIG. 3 is a flowchart showing an example of a processing procedure of the learning device.
FIG. 4 is a flowchart showing an example of a processing procedure of the learning device.
FIG. 5 is a diagram showing an application example of the learning device.
FIG. 6 is a diagram showing experimental results for a model learned by the learning device.
FIG. 7 is a diagram showing experimental results for a model learned by the learning device.
FIG. 8 is a diagram showing a configuration example of a computer that executes the learning program.
Hereinafter, an embodiment of the present invention will be described with reference to the drawings. Note that the present invention is not limited to the embodiment described below.
[Overview of the learning device]
As mentioned above, the generalization performance of a model learned by Deep Learning is higher the flatter the loss landscape (the shape of the loss function) with respect to the model's weights is. However, learning with conventional Adversarial Training sharpens the model's loss landscape, so there is a problem that the generalization performance of the model cannot be improved.
Therefore, in Adversarial Training, the learning device of this embodiment uses, as the loss function for calculating the loss with respect to the model's weights, a loss function regularized so that the loss does not fall below a predetermined value.
For example, as shown in FIG. 1, the learning device uses a loss function in which the loss bottoms out at a predetermined value (for example, b). This flattens the overall shape of the loss landscape. Therefore, by using such a loss function in Adversarial Training, the learning device can learn a model with high generalization performance. As a result, the learning device can learn a model that is robust against Adversarial Examples.
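For illustration, such a bottomed-out loss can be written in a few lines. The following is a minimal sketch assuming PyTorch and cross-entropy as the base loss; the function name is chosen for illustration only.

```python
import torch
import torch.nn.functional as F

def regularized_loss(logits: torch.Tensor, labels: torch.Tensor, b: float) -> torch.Tensor:
    """Loss regularized to bottom out at b: |l - b| + b (cf. equation (5) below)."""
    l = F.cross_entropy(logits, labels)  # ordinary loss l
    return (l - b).abs() + b             # never falls below b, flattening the floor of the landscape
```

While l is above b, the gradient is identical to that of the ordinary loss; once l would drop below b, the sign of the gradient flips and pushes the loss back up toward b, so the minimum of the landscape is a flat floor at b rather than a sharp pit.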
[Configuration example of the learning device]
A configuration example of the learning device 10 will be described using FIG. 2. The learning device 10 includes, for example, an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.
The input unit 11 is an interface that accepts input of various data. For example, the input unit 11 accepts input of the data used for the learning processing and prediction processing described later. The output unit 12 is an interface that outputs various data. For example, the output unit 12 outputs the label of the data predicted by the control unit 15.
The communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and external devices such as servers via a network. For example, the communication control unit 13 controls communication between the control unit 15 and a data acquisition device (see FIG. 5) that acquires the data to be learned.
The storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk, and stores, among other things, the parameters of the model learned by the learning processing described later.
The control unit 15 is realized using, for example, a CPU (Central Processing Unit) or the like, and executes a processing program stored in the storage unit 14. Thereby, the control unit 15 functions as an acquisition unit 15a, a learning unit 15b, and a prediction unit 15c, as illustrated in FIG. 2.
The acquisition unit 15a acquires the data used for the learning processing and prediction processing described later via the input unit 11 or the communication control unit 13.
The learning unit 15b performs learning (Adversarial Training) of a model for predicting the labels of input data including Adversarial Examples. The learning unit 15b trains the model using learning data including Adversarial Examples and a predetermined loss function (details are described later). For example, the learning unit 15b determines the parameters (weights) of the model.
Here, an Adversarial Example with respect to the weights of the above model is defined as in the following equation (1).
    x_adv = x + η*,  where η* = argmax_{η∈B(x,ε)} l(x + η, y, w)   …(1)
In equation (1), l is the loss function. B(x, ε) is the set of points within distance ε of x, and is a constraint used to keep the noise imperceptible to the human eye. Typically, the L∞ norm is used.
The Adversarial Training performed by the learning unit 15b is defined as in the following equation (2).
    min_w E_{(x,y)} [ max_{η∈B(x,ε)} l(x + η, y, w) ]   …(2)
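In practice, the inner maximization over B(x, ε) in equation (2) is approximated by an iterative attack such as PGD (the attack also used in the experiments below). A minimal L∞ PGD sketch, assuming a PyTorch model; the function and parameter names are illustrative:

```python
import torch

def pgd_attack(model, x, y, loss_fn, eps, eps_iter, n_iter,
               rand_init=True, clip_min=0.0, clip_max=1.0):
    """Approximate argmax over η in B(x, ε) of loss(x + η, y, w), under the L∞ norm."""
    x_adv = x.clone().detach()
    if rand_init:  # random start inside the ε-ball
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(clip_min, clip_max)
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(loss_fn(model(x_adv), y), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + eps_iter * grad.sign()    # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into B(x, ε)
            x_adv = x_adv.clamp(clip_min, clip_max)   # keep a valid input
    return x_adv.detach()
```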
Note that the sharpness of the loss landscape is calculated by the following equation (3).
    Sharpness = E_v [ L(w + v) − L(w) ]   …(3)
In equation (3), v is Gaussian noise randomly sampled from within the region shown in equation (4). The l of w_l denotes the layer: the computation is performed per layer, and the norm of a matrix is measured by the Frobenius norm.
    { v : ||v_l||_F ≤ γ ||w_l||_F for each layer l }   …(4)
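A hedged sketch of this sharpness measurement follows (one noise sample only; the scale γ, the helper names, and the assumption that `data_loss_fn` evaluates the model's loss L on fixed data and returns a scalar tensor are all illustrative):

```python
import torch

@torch.no_grad()
def estimate_sharpness(model, data_loss_fn, gamma):
    """One-sample estimate of L(w + v) - L(w), with ||v_l||_F <= γ·||w_l||_F per layer (cf. (3), (4))."""
    base = data_loss_fn(model)  # L(w) on fixed data
    backup = {}
    for name, p in model.named_parameters():
        v = torch.randn_like(p)                          # Gaussian noise for this layer
        v = v * (gamma * p.norm() / (v.norm() + 1e-12))  # rescale into the region of (4)
        backup[name] = p.detach().clone()
        p.add_(v)                                        # w -> w + v
    perturbed = data_loss_fn(model)                      # L(w + v)
    for name, p in model.named_parameters():
        p.copy_(backup[name])                            # restore the original weights
    return (perturbed - base).item()
```

In practice one would average this estimate over many sampled v.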
As the loss function for calculating the loss with respect to the model's weights, the learning unit 15b uses a loss function regularized so that the loss value does not fall below a predetermined value no matter what value the weights take.
For example, the learning unit 15b trains the model using the loss function shown in the following equation (5).
    |l(x + η, y, w) − b| + b   …(5)
For example, using the learning data acquired by the acquisition unit 15a, the learning unit 15b determines the model weights that minimize the loss value of the above loss function, based on equation (6).
    min_w E_{(x,y)} [ max_{η∈B(x,ε)} ( |l(x + η, y, w) − b| + b ) ]   …(6)
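Putting the pieces together, one training step of the method is ordinary Adversarial Training with the base loss replaced by the bottomed-out loss of equation (5). A minimal sketch reusing the `pgd_attack` and `regularized_loss` helpers sketched above (the default hyperparameters mirror the experimental conditions below and are otherwise illustrative):

```python
import torch.nn.functional as F

def train_step(model, optimizer, x, y, b=1.2, eps=8/255, eps_iter=0.01, n_iter=7):
    # Inner maximization of equation (6): craft Adversarial Examples with PGD.
    x_adv = pgd_attack(model, x, y, F.cross_entropy, eps, eps_iter, n_iter)
    # Outer minimization of equation (6): descend on the regularized loss (5).
    optimizer.zero_grad()
    loss = regularized_loss(model(x_adv), y, b)
    loss.backward()
    optimizer.step()
    return loss.item()
```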
According to the above loss function, the loss value never falls below the predetermined value no matter what value the weights take, so the loss landscape bottoms out at the predetermined value (for example, b) and becomes flat, as shown in FIG. 1. As a result, by training the model with the above loss function, the learning unit 15b can improve the generalization performance of the model.
The prediction unit 15c predicts the label of input data using the model learned by the learning unit 15b. For example, the prediction unit 15c uses the learned model to calculate the probability of each label for newly acquired data, and outputs the label with the highest probability. Thereby, the learning device 10 can output the correct label even when the input data is, for example, an Adversarial Example.
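The prediction itself is the usual argmax over the model's label probabilities; a minimal sketch, assuming a trained PyTorch classifier:

```python
import torch

@torch.no_grad()
def predict_label(model, x_new):
    """Return the most probable label for new data x' (p(x') via softmax)."""
    probs = torch.softmax(model(x_new), dim=-1)  # probability of each label
    return probs.argmax(dim=-1)                  # label with the highest probability
```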
[Learning processing]
Next, an example of the learning processing procedure by the learning device 10 will be described with reference to FIG. 3. The processing shown in FIG. 3 is started, for example, when an operation input instructing the start of the learning processing is received.
First, the acquisition unit 15a acquires learning data including Adversarial Examples (S1). Next, the learning unit 15b uses the learning data and the above loss function to learn a model representing the probability distribution of the labels of input data (S2). The learning unit 15b stores the parameters of the model learned in S2 in the storage unit 14.
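S1 and S2 amount to an ordinary training loop built from the `train_step` sketch above; a hedged outline (the loader and epoch count are assumptions):

```python
def learn(model, optimizer, loader, epochs, b=1.2):
    """S1: `loader` yields the learning data; S2: fit the model with the regularized loss.
    The returned parameters correspond to those stored in the storage unit 14."""
    for _ in range(epochs):
        for x, y in loader:
            train_step(model, optimizer, x, y, b=b)
    return {name: p.detach().clone() for name, p in model.named_parameters()}
```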
[Prediction processing]
Next, an example of the label prediction processing for input data by the learning device 10 will be described with reference to FIG. 4. The processing shown in FIG. 4 is started, for example, when an operation input instructing the start of the prediction processing is received.
First, the acquisition unit 15a acquires the data whose label is to be predicted (S11). Next, the prediction unit 15c predicts the label of the data acquired in S11 using the model learned by the learning unit 15b (S12). For example, the prediction unit 15c uses the learned model to calculate p(x') for the data x' acquired in S11, and outputs the label with the highest probability.
Thereby, for example, even when the data x' is an Adversarial Example, the learning device 10 can output the correct label.
[Application example of the learning device]
The learning device 10 described above may be applied to data anomaly detection. An application example in this case will be described with reference to FIG. 5. Here, a case where the function of the prediction unit 15c described above is installed in a detection device 20 will be described as an example.
For example, the learning device 10 trains the model using teacher data (learning data) acquired from a data acquisition device and the loss function described above. After that, when the detection device 20 acquires new data x' from the data acquisition device, it calculates p(x') for the data x' using the learned model. The detection device 20 then outputs a report indicating whether the data x' is anomalous, based on the label with the highest probability.
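A minimal sketch of this detection flow, assuming the `predict_label` helper above and an illustrative label index `anomaly_class` that the learned model assigns to anomalous data:

```python
def detect(model, x_new, anomaly_class=1):
    """Report whether the new data x' is anomalous, based on the most probable label."""
    label = int(predict_label(model, x_new))  # a single sample is assumed
    return {"label": label, "anomalous": label == anomaly_class}
```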
[Experimental results]
Next, the results of evaluation experiments on models learned by the learning device 10 will be described. The evaluation metric is Robust Acc: the classification accuracy (0 to 1) on data containing Adversarial Examples.
[Experimental conditions]
Image dataset: cifar10
Deep learning model: Resnet18
Adversarial Example: PGD
PGD parameters: eps=8/255, train_iter=7, eval_iter=20, eps_iter=0.01, rand_init=True, clip_min=0.0, clip_max=1.0
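For reference, these settings map onto the `pgd_attack` sketch given earlier roughly as follows (reusing `model`, `x`, `y`, and `F.cross_entropy` from the sketches above); interpreting train_iter/eval_iter as the iteration counts of the training-time and evaluation-time attacks is an assumption based on the parameter names:

```python
# Training-time attack (train_iter=7)
x_adv_train = pgd_attack(model, x, y, F.cross_entropy, eps=8/255,
                         eps_iter=0.01, n_iter=7, rand_init=True,
                         clip_min=0.0, clip_max=1.0)

# Evaluation-time attack (eval_iter=20)
x_adv_eval = pgd_attack(model, x, y, F.cross_entropy, eps=8/255,
                        eps_iter=0.01, n_iter=20, rand_init=True,
                        clip_min=0.0, clip_max=1.0)
```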
[Experiment 1]
First, the results of Experiment 1 will be described using FIG. 6. In Experiment 1, the Robust Acc of a model learned by the learning device 10 of this embodiment (b=1.2 in equation (6)), a model learned with existing AT (Adversarial Training), and a model learned with AWP (Adversarial Weight Perturbation) were compared. The vertical axis of the graph shown in FIG. 6 is Test Robust Acc, and the horizontal axis is the training epoch.
As shown in FIG. 6, it was confirmed that the model learned by the learning device 10 of this embodiment achieves a higher Robust Acc at epochs of 400 and above than the models learned with existing AT and with AWP.
[Experiment 2]
Next, Experiment 2 will be described using FIG. 7. The purpose of Experiment 2 is to confirm that the level of Robust Acc depends on the constant b of the loss function used for model training. In Experiment 2, the Robust Acc of a model learned with existing AT whose loss function was given a constant term (b) was compared with that of a model learned using the loss function of the learning device 10 of this embodiment (see equation (5)). In each case, b was set to values between 0 and 2. The vertical axis of the graph shown in FIG. 7 is Test Robust Acc, and the horizontal axis is the constant b of the loss function.
As shown in FIG. 7, it was confirmed that the level of Robust Acc depends on the constant b of the loss function, both when the model is learned with existing AT and when it is learned by the learning device 10 of this embodiment. It was also confirmed that by setting the constant b of the loss function used by the learning device 10 of this embodiment to an appropriate value (for example, b=1, 1.2, or 1.4), the model's Robust Acc becomes higher than with existing AT.
[System configuration, etc.]
The components of the devices shown in the figures are functionally conceptual, and do not necessarily need to be physically configured as shown. That is, the specific form of distribution and integration of the devices is not limited to that shown in the figures; all or part of them can be functionally or physically distributed or integrated in arbitrary units depending on various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed by each device may be realized by a CPU and a program executed by the CPU, or may be realized as hardware using wired logic.
Further, among the processes described in the above embodiment, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified.
[Program]
The learning device 10 described above can be implemented by installing a program as packaged software or online software on a desired computer. For example, by causing an information processing device to execute the above program, the information processing device can be made to function as the learning device 10. The information processing device referred to here includes desktop and notebook personal computers. The category of information processing devices also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as terminals such as PDAs (Personal Digital Assistants).
The learning device 10 can also be implemented as a server device that treats a terminal device used by a user as a client and provides the client with services related to the above processing. In this case, the server device may be implemented as a web server, or may be implemented as a cloud that provides services related to the above processing through outsourcing.
FIG. 8 is a diagram showing an example of a computer that executes the learning program. The computer 1000 includes, for example, a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.
The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process executed by the learning device 10 described above is implemented as a program module 1093 in which computer-executable code is written. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, a program module 1093 for executing processing similar to the functional configuration of the learning device 10 is stored in the hard disk drive 1090. Note that the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
The data used in the processing of the embodiment described above is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 then reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 and executes them as necessary.
Note that the program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; they may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). The program module 1093 and the program data 1094 may then be read from the other computer by the CPU 1020 via the network interface 1070.
10 Learning device
11 Input unit
12 Output unit
13 Communication control unit
14 Storage unit
15 Control unit
15a Acquisition unit
15b Learning unit
15c Prediction unit
20 Detection device

Claims (5)

  1.  A learning device comprising: a learning unit that, in learning a model for predicting the label of input data including Adversarial Examples, trains the model using, as the loss function for calculating the loss with respect to the weight of the model, a loss function regularized so that the loss value does not fall below a predetermined value no matter what value the weight takes.
  2.  The learning device according to claim 1, wherein the loss function is a loss function expressed by the following equation (1):
      |L(x+η, y, w) − b| + b …Equation (1)
      where x: the input data to the model, y: the predicted value of the label output from the model, w: the weight of the model, η: the noise added to the input data, η∈B(x,ε), and b: a constant.
  3.  The learning device according to claim 1, further comprising a prediction unit that predicts the label of input data using the model learned by the learning unit.
  4.  A learning method executed by a learning device, comprising a step of, in learning a model for predicting the label of input data including Adversarial Examples, training the model using, as the loss function for calculating the loss with respect to the weight of the model, a loss function regularized so that the loss value does not fall below a predetermined value no matter what value the weight takes.
  5.  A learning program for causing a computer to execute a step of, in learning a model for predicting the label of input data including Adversarial Examples, training the model using, as the loss function for calculating the loss with respect to the weight of the model, a loss function regularized so that the loss value does not fall below a predetermined value no matter what value the weight takes.
PCT/JP2022/017240 2022-04-07 2022-04-07 Training device, training method, and training program WO2023195120A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/017240 WO2023195120A1 (en) 2022-04-07 2022-04-07 Training device, training method, and training program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/017240 WO2023195120A1 (en) 2022-04-07 2022-04-07 Training device, training method, and training program

Publications (1)

Publication Number Publication Date
WO2023195120A1 true WO2023195120A1 (en) 2023-10-12

Family

ID=88242770

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/017240 WO2023195120A1 (en) 2022-04-07 2022-04-07 Training device, training method, and training program

Country Status (1)

Country Link
WO (1) WO2023195120A1 (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHONGLI QIN; JAMES MARTENS; SVEN GOWAL; DILIP KRISHNAN; KRISHNAMURTHY DVIJOTHAM; ALHUSSEIN FAWZI; SOHAM DE; ROBERT STANFORTH; PUS: "Adversarial Robustness through Local Linearization", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 4 July 2019 (2019-07-04), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081438298 *
SEKITOSHI KANAI; MASANORI YAMADA; HIROSHI TAKAHASHI; YUKI YAMANAKA; YASUTOSHI IDA: "Smoothness Analysis of Adversarial Training", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 15 June 2021 (2021-06-15), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081979943 *

Similar Documents

Publication Publication Date Title
US11829880B2 (en) Generating trained neural networks with increased robustness against adversarial attacks
US11381580B2 (en) Machine learning classification using Markov modeling
US20190057320A1 (en) Data processing apparatus for accessing shared memory in processing structured data for modifying a parameter vector data structure
US11741398B2 (en) Multi-layered machine learning system to support ensemble learning
WO2020090413A1 (en) Classification device, classification method, and classification program
US11847210B2 (en) Detecting device and detecting method
US20230038463A1 (en) Detection device, detection method, and detection program
JP6767312B2 (en) Detection system, detection method and detection program
US11941867B2 (en) Neural network training using the soft nearest neighbor loss
Valizadegan et al. Learning to trade off between exploration and exploitation in multiclass bandit prediction
JP2018200524A (en) Classification device, classification method, and classification program
CN104573127B (en) Assess the method and system of data variance
JP7276483B2 (en) LEARNING DEVICE, CLASSIFIER, LEARNING METHOD AND LEARNING PROGRAM
WO2023195120A1 (en) Training device, training method, and training program
US11227231B2 (en) Computational efficiency in symbolic sequence analytics using random sequence embeddings
US20230259631A1 (en) Detecting synthetic user accounts using synthetic patterns learned via machine learning
CN114255381B (en) Training method of image recognition model, image recognition method, device and medium
WO2023062742A1 (en) Training device, training method, and training program
JP7331938B2 (en) LEARNING DEVICE, ESTIMATION DEVICE, LEARNING METHOD, AND LEARNING PROGRAM
US20230027309A1 (en) System and method for image de-identification to humans while remaining recognizable by machines
WO2022264387A1 (en) Training device, training method, and training program
WO2023067669A1 (en) Learning device, learning method, and learning program
CN112784990A (en) Training method of member inference model
JP7416255B2 (en) Learning devices, learning methods and learning programs
US20220391765A1 (en) Systems and Methods for Semi-Supervised Active Learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22936519

Country of ref document: EP

Kind code of ref document: A1