WO2022070342A1 - Learning device, learning method, and learning program - Google Patents

Learning device, learning method, and learning program Download PDF

Info

Publication number
WO2022070342A1
WO2022070342A1 PCT/JP2020/037256 JP2020037256W WO2022070342A1 WO 2022070342 A1 WO2022070342 A1 WO 2022070342A1 JP 2020037256 W JP2020037256 W JP 2020037256W WO 2022070342 A1 WO2022070342 A1 WO 2022070342A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning
frequency component
error
generator
Prior art date
Application number
PCT/JP2020/037256
Other languages
French (fr)
Japanese (ja)
Inventor
真弥 山口
関利 金井
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 filed Critical 日本電信電話株式会社
Priority to PCT/JP2020/037256 priority Critical patent/WO2022070342A1/en
Priority to JP2022553336A priority patent/JPWO2022070342A1/ja
Publication of WO2022070342A1 publication Critical patent/WO2022070342A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present invention relates to a learning device, a learning method and a learning program.
  • GAN Generative Adversarial Networks
  • GAN (Generative Adversarial Networks) is known as a deep learning model (see, for example, Non-Patent Document 1).
  • the conventional technology has a problem that overfitting may occur and the accuracy of the model may not be improved.
  • samples generated by a trained GAN generator contain high-frequency components that are not included in the actual training data.
  • as a result, the discriminator comes to rely on these high-frequency components when judging authenticity, and overfitting may occur.
  • the learning device includes a conversion unit that converts first data into a first frequency component and converts second data, generated by a generator constituting an adversarial learning model, into a second frequency component; a calculation unit that calculates an error between the first frequency component and the second frequency component; and an update unit that updates parameters of the generator so that the error calculated by the calculation unit becomes smaller.
  • FIG. 1 is a diagram illustrating a deep learning model according to the first embodiment.
  • FIG. 2 is a diagram illustrating the influence of high frequency components.
  • FIG. 3 is a diagram showing a configuration example of the learning device according to the first embodiment.
  • FIG. 4 is a flowchart showing a processing flow of the learning device according to the first embodiment.
  • FIG. 5 is a diagram showing the results of the experiment.
  • FIG. 6 is a diagram showing the results of the experiment.
  • FIG. 7 is a diagram showing the results of the experiment.
  • FIG. 8 is a diagram showing an example of a computer that executes a learning program.
  • GAN is a technique for learning a data distribution p_data(x) with two deep learning models, a generator G and a discriminator D. G learns to deceive D, and D learns to distinguish data generated by G from the training data.
  • a model in which a plurality of models are in such an adversarial relationship is sometimes called an adversarial learning model.
  • adversarial learning models such as GAN are used to generate images, text, audio, and the like.
  • Reference 1: Karras, Tero, et al. "Analyzing and improving the image quality of stylegan." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
  • Reference 2: Donahue, Chris, Julian McAuley, and Miller Puckette. "Adversarial audio synthesis." arXiv preprint arXiv:1802.04208 (2018). (ICLR 2019)
  • Reference 3: Yu, Lantao, et al. "Seqgan: Sequence generative adversarial nets with policy gradient." Thirty-first AAAI Conference on Artificial Intelligence. 2017. (AAAI 2017)
  • GAN has a problem that D overfits the learning sample as the learning progresses.
  • each model cannot be meaningfully updated for data generation, and the quality of generation by the generator deteriorates. This is shown, for example, in Figure 1 of Reference 4.
  • Reference 4: Karras, Tero, et al. "Training Generative Adversarial Networks with Limited Data." arXiv preprint arXiv:2006.06676 (2020).
  • Reference 5 describes that a trained CNN makes predictions that depend on high-frequency components of its input.
  • Reference 5: Wang, Haohan, et al. "High-frequency Component Helps Explain the Generalization of Convolutional Neural Networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
  • Reference 6 describes that the neural networks constituting the GAN generator G and discriminator D tend to learn low-frequency components first and high-frequency components later.
  • Reference 6: Rahaman, Nasim, et al. "On the spectral bias of neural networks." International Conference on Machine Learning. 2019. (ICML 2019)
  • FIG. 1 is a diagram illustrating a deep learning model according to the first embodiment.
  • FIG. 2 is a diagram illustrating the influence of the high frequency component.
  • the two-dimensional power spectrum on CIFAR-10 differs between real data (Real) and data generated by the generator (GAN).
  • Reference 7 shows that data generated by various GANs has increased power at high frequencies compared with real data.
  • Reference 7: Durall, Ricard, Margret Keuper, and Janis Keuper. "Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
  • given data (Real) included in the real data set X and data (Fake) generated by the generator G from a random number z, the discriminator D identifies whether each piece of data is Real (or Fake).
  • the discriminator D is optimized so that its discrimination accuracy improves, that is, so that the probability that the discriminator D identifies Real as Real increases.
  • the generator G is optimized so that its ability to deceive the discriminator D, that is, the probability that the discriminator D identifies Fake as Real, increases.
  • the generator G is optimized so that the frequency components of Real and Fake match.
  • the details of the learning process of the deep learning model will be described together with the configuration of the learning device of the present embodiment.
  • FIG. 3 is a diagram showing a configuration example of the learning device according to the first embodiment.
  • the learning device 10 accepts input of data for learning and updates the parameters of the deep learning model. Further, the learning device 10 may output the updated parameters. As shown in FIG. 3, the learning device 10 has an input / output unit 11, a storage unit 12, and a control unit 13.
  • the input / output unit 11 is an interface for inputting / outputting data.
  • the input / output unit 11 may be a communication interface such as a NIC (Network Interface Card) for performing data communication with another device via a network.
  • the input / output unit 11 may be an interface for connecting an input device such as a mouse and a keyboard, and an output device such as a display.
  • the storage unit 12 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disk.
  • the storage unit 12 may be a rewritable semiconductor memory such as a RAM (Random Access Memory), flash memory, or NVSRAM (Non-Volatile Static Random Access Memory).
  • the storage unit 12 stores an OS (Operating System) and various programs executed by the learning device 10. Further, the storage unit 12 stores the model information 121.
  • the model information 121 is information such as parameters for constructing a deep learning model, and is appropriately updated in the learning process. Further, the updated model information 121 may be output to another device or the like via the input / output unit 11.
  • the control unit 13 controls the entire learning device 10.
  • the control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • the control unit 13 has an internal memory for storing programs and control data that specify various processing procedures, and executes each process using the internal memory. Further, the control unit 13 functions as various processing units by operating various programs.
  • the control unit 13 has a generation unit 131, a conversion unit 132, a calculation unit 133, and an update unit 134.
  • the generation unit 131 inputs the random number z to the generator G and generates the second data.
  • the conversion unit 132 converts the first data into the first frequency component, and converts the second data generated by the generator G constituting the hostile learning model into the second frequency component.
  • the conversion unit 132 converts the first data and the second data into frequency components using a differentiable function. This is to enable parameter updates by backpropagation.
  • the conversion unit 132 converts the first data and the second data into frequency components by a discrete Fourier transform (DFT: discrete Fourier transform) or a discrete cosine transform (DCT: discrete cosine transform).
  • DFT discrete Fourier transform
  • DCT discrete cosine transform
  • the calculation unit 133 calculates the error between the first frequency component and the second frequency component.
  • the calculation unit 133 can calculate the error by any method, such as MSE (mean square error), RMSE (root mean square error), or the L1 distance.
  • MSE Mean Square Error
  • RMSE Root Mean Square Error
  • X_real and X_fake are batches of Real and Fake, respectively, and |X_real| and |X_fake| are the respective batch sizes.
  • Real is real data. Fake is data generated by the generator G.
  • F(·) is a function that converts data in the spatial domain into frequency components.
  • x_real^i and x_fake^j are the i-th data in X_real and the j-th data in X_fake, respectively, and are examples of the first data and the second data.
  • F(x_real^i) corresponds to the first frequency component.
  • F(x_fake^j) corresponds to the second frequency component.
  • the calculation unit 133 calculates the error between the batch average of the plurality of first frequency components obtained by converting each of the plurality of first data, and the batch average of the plurality of second frequency components obtained by converting each of the plurality of second data. That is, the error here is an error between batch averages, not an error between individual data samples.
  • the calculation unit 133 calculates, as in equation (2), a loss function L_G that increases as the error between the first frequency component and the second frequency component increases, and that increases as the discrimination accuracy of the discriminator constituting the adversarial learning model in distinguishing the first data from the second data decreases.
  • λ is a hyperparameter that functions as a weight.
  • G(·) is a function that outputs the data (Fake) generated by the generator G from its argument.
  • D(·) is a function that outputs the probability that the discriminator D identifies the data given as its argument as Real.
  • the update unit 134 updates the parameters of the generator G so that the error calculated by the calculation unit 133 becomes smaller. Specifically, the update unit 134 updates the parameters of the generator G so that the loss function L_G is optimized.
  • the update unit 134 updates the parameters of the discriminator D so that the loss function of equation (3) is optimized.
  • x is real data.
  • FIG. 4 is a flowchart showing a processing flow of the learning device according to the first embodiment.
  • the learning device 10 reads the learning data (step S101).
  • the learning device 10 reads existing data (Real) as learning data.
  • the learning device 10 samples a random number z from the normal distribution and generates a sample (Fake) by G (z) (step S102). Further, the learning device 10 converts Real and Fake into frequency components by DCT or DFT, and calculates the batch average of the frequency components (step S103).
  • the learning device 10 calculates the GAN loss function of the generator G (step S104).
  • the GAN loss of the generator G corresponds to the first term on the right-hand side of equation (2).
  • the learning device 10 calculates the frequency-component matching loss from the batch averages of the Real and Fake frequency components (step S105).
  • the frequency-component matching loss corresponds to L_freq in equation (1).
  • the learning device 10 calculates, as the total loss, the sum of the GAN loss function for G and the frequency-component matching loss (step S106).
  • the total loss corresponds to L_G in equation (2).
  • the learning device 10 may multiply the frequency-component matching loss by the weight λ.
  • the learning device 10 updates the parameters of the generator G by backpropagation of the total loss (step S107).
  • the learning device 10 trains the discriminator D (step S108). Specifically, the learning device 10 updates the parameters of the discriminator D by backpropagation of the loss function of equation (3).
  • if the maximum number of training steps is greater than the current number of training steps (step S109, True), the learning device 10 returns to step S101 and repeats the process; otherwise (step S109, False), the learning device 10 ends the process.
  • the conversion unit 132 converts the first data into a first frequency component, and converts the second data generated by the generator constituting the adversarial learning model into a second frequency component.
  • the calculation unit 133 calculates the error between the first frequency component and the second frequency component.
  • the update unit 134 updates the parameters of the generator so that the error calculated by the calculation unit 133 becomes smaller. In this way, the learning device 10 can reflect the influence of the frequency components in training. As a result, according to the present embodiment, it is possible to suppress the occurrence of overfitting and improve the accuracy of the model.
  • the conversion unit 132 converts the first data and the second data into frequency components using a differentiable function. For example, the conversion unit 132 converts the first data and the second data into frequency components by a discrete Fourier transform or a discrete cosine transform. This makes it possible in the present embodiment to update the parameters by backpropagation.
  • the calculation unit 133 calculates the error between the batch average of the plurality of first frequency components obtained by converting each of the plurality of first data, and the batch average of the plurality of second frequency components obtained by converting each of the plurality of second data.
  • the calculation unit 133 calculates a loss function that increases as the error between the first frequency component and the second frequency component increases, and that increases as the discrimination accuracy of the discriminator constituting the adversarial learning model in distinguishing the first data from the second data decreases.
  • the update unit 134 updates the parameters of the generator so that the loss function is optimized. As a result, in the present embodiment, the entire model can be trained efficiently.
  • FreqMSE corresponds to the first embodiment. SSD2GAN is another method that improves the accuracy of the model while taking the influence of frequency components into account, using an approach different from that of the first embodiment.
  • FIGS. 5, 6, and 7 are diagrams showing the results of the experiment. As shown in FIG. 5, with FreqMSE and SSD2GAN + Tradeoff + SSCR, the FID of the generator G is smaller, so it can be said that generation quality improved.
  • overfitting is suppressed by every method except SNGAN.
  • with SNGAN, overfitting occurs after 40,000 iterations, and the FID continues to deteriorate.
  • FreqMSE and SSD2GAN show the effect of suppressing high-frequency components in the generated samples that are not present in the real data.
  • each component of each illustrated device is functional and conceptual, and does not necessarily have to be physically configured as shown in the figures. That is, the specific form of distribution and integration of each device is not limited to the illustrated one; all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or an arbitrary part of each processing function performed by each device can be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU, or as hardware using wired logic. The program may be executed not only by a CPU but also by another processor such as a GPU.
  • CPU Central Processing Unit
  • the learning device 10 can be implemented by installing a learning program that executes the above learning process as package software or online software on a desired computer. For example, by causing the information processing device to execute the above learning program, the information processing device can be made to function as the learning device 10.
  • the information processing device referred to here includes a desktop type or notebook type personal computer.
  • the information processing device includes smartphones, mobile phones, mobile communication terminals such as PHS (Personal Handyphone System), and slate terminals such as PDAs (Personal Digital Assistants).
  • the learning device 10 can be implemented as a learning server device that treats a terminal device used by a user as a client and provides the client with services related to the above training process.
  • the learning server device is implemented as a server device that provides a learning service that takes training data as input and outputs information on the trained model.
  • the learning server device may be implemented as a Web server, or may be implemented as a cloud that provides services related to the training process on an outsourcing basis.
  • FIG. 8 is a diagram showing an example of a computer that executes a learning program.
  • the computer 1000 has, for example, a memory 1010 and a CPU 1020.
  • the computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. Each of these parts is connected by a bus 1080.
  • the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012.
  • the ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System).
  • BIOS Basic Input Output System
  • the hard disk drive interface 1030 is connected to the hard disk drive 1090.
  • the disk drive interface 1040 is connected to the disk drive 1100.
  • a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100.
  • the serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120.
  • the video adapter 1060 is connected to, for example, the display 1130.
  • the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the learning device 10 is implemented as the program module 1093, in which computer-executable code is written.
  • the program module 1093 is stored in, for example, the hard disk drive 1090.
  • the program module 1093 for executing the same processing as the functional configuration in the learning device 10 is stored in the hard disk drive 1090.
  • the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • the setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, a memory 1010 or a hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 into the RAM 1012 as needed, and executes the process of the above-described embodiment.
  • the program module 1093 and the program data 1094 are not limited to those stored in the hard disk drive 1090, but may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). Then, the program module 1093 and the program data 1094 may be read from another computer by the CPU 1020 via the network interface 1070.
  • LAN Local Area Network
  • WAN Wide Area Network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

This learning device includes a conversion unit (132) that converts first data into a first frequency component and converts second data generated by a generator constituting an adversarial learning model into a second frequency component. A calculation unit (133) calculates the error between the first frequency component and the second frequency component. An update unit (134) updates a parameter of the generator so as to reduce the error calculated by the calculation unit (133).

Description

Learning device, learning method, and learning program
The present invention relates to a learning device, a learning method, and a learning program.
Conventionally, deep generative models are known that are based on deep learning and that generate samples close to real data by learning the distribution of the training data. For example, GAN (Generative Adversarial Networks) is known as such a deep learning model (see, for example, Non-Patent Document 1).
However, the conventional technology has a problem in that overfitting may occur and the accuracy of the model may not improve. For example, samples generated by a trained GAN generator contain high-frequency components that are not included in the actual training data. As a result, the discriminator comes to rely on these high-frequency components when judging authenticity, and overfitting may occur.
In order to solve the above problems and achieve the object, the learning device includes: a conversion unit that converts first data into a first frequency component and converts second data, generated by a generator constituting an adversarial learning model, into a second frequency component; a calculation unit that calculates an error between the first frequency component and the second frequency component; and an update unit that updates parameters of the generator so that the error calculated by the calculation unit becomes smaller.
According to the present invention, it is possible to suppress the occurrence of overfitting and improve the accuracy of the model.
FIG. 1 is a diagram illustrating a deep learning model according to the first embodiment. FIG. 2 is a diagram illustrating the influence of high-frequency components. FIG. 3 is a diagram showing a configuration example of the learning device according to the first embodiment. FIG. 4 is a flowchart showing the processing flow of the learning device according to the first embodiment. FIGS. 5, 6, and 7 are diagrams showing the results of the experiment. FIG. 8 is a diagram showing an example of a computer that executes a learning program.
Hereinafter, embodiments of the learning device, the learning method, and the learning program according to the present application will be described in detail with reference to the drawings. The present invention is not limited to the embodiments described below.
GAN is a technique for learning a data distribution p_data(x) with two deep learning models, a generator G and a discriminator D. G learns to deceive D, and D learns to distinguish data generated by G from the training data. A model in which a plurality of models are in such an adversarial relationship is sometimes called an adversarial learning model.
Adversarial learning models such as GAN are used to generate images, text, audio, and the like.
Reference 1: Karras, Tero, et al. "Analyzing and improving the image quality of stylegan." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
Reference 2: Donahue, Chris, Julian McAuley, and Miller Puckette. "Adversarial audio synthesis." arXiv preprint arXiv:1802.04208 (2018). (ICLR 2019)
Reference 3: Yu, Lantao, et al. "Seqgan: Sequence generative adversarial nets with policy gradient." Thirty-first AAAI Conference on Artificial Intelligence. 2017. (AAAI 2017)
Here, GAN has a problem in that D overfits the training samples as training progresses. As a result, each model can no longer be updated in a way that is meaningful for data generation, and the quality of the data produced by the generator deteriorates. This is shown, for example, in Figure 1 of Reference 4.
Reference 4: Karras, Tero, et al. "Training Generative Adversarial Networks with Limited Data." arXiv preprint arXiv:2006.06676 (2020).
Further, Reference 5 describes that a trained CNN makes predictions that depend on high-frequency components of its input.
Reference 5: Wang, Haohan, et al. "High-frequency Component Helps Explain the Generalization of Convolutional Neural Networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
Further, Reference 6 describes that the neural networks constituting the GAN generator G and discriminator D tend to learn low-frequency components first and high-frequency components later.
Reference 6: Rahaman, Nasim, et al. "On the spectral bias of neural networks." International Conference on Machine Learning. 2019. (ICML 2019)
Therefore, one object of the first embodiment is to suppress the occurrence of overfitting and improve the accuracy of the model by reducing the influence of the high-frequency components of the data on the generator G and the discriminator D. FIG. 1 is a diagram illustrating the deep learning model according to the first embodiment. FIG. 2 is a diagram illustrating the influence of high-frequency components.
As shown in FIG. 2, the two-dimensional power spectrum on CIFAR-10 differs between real data (Real) and data generated by a generator (GAN). Further, Reference 7 shows that data generated by various GANs has increased power at high frequencies compared with real data.
Reference 7: Durall, Ricard, Margret Keuper, and Janis Keuper. "Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
Returning to FIG. 1, in the deep learning model of the present embodiment, given data (Real) included in the real data set X and data (Fake) generated by the generator G from a random number z, the discriminator D identifies whether each piece of data is Real (or Fake).
In a conventional GAN, the discriminator D is optimized so that its discrimination accuracy improves, that is, so that the probability that the discriminator D identifies Real as Real increases. In addition, the generator G is optimized so that its ability to deceive the discriminator D, that is, the probability that the discriminator D identifies Fake as Real, increases.
In the present embodiment, in addition to the above optimization, the generator G is optimized so that the frequency components of Real and Fake match. Hereinafter, the details of the training process of the deep learning model will be described together with the configuration of the learning device of the present embodiment.
[Configuration of the first embodiment]
FIG. 3 is a diagram showing a configuration example of the learning device according to the first embodiment. The learning device 10 accepts input of training data and updates the parameters of the deep learning model. The learning device 10 may also output the updated parameters. As shown in FIG. 3, the learning device 10 has an input/output unit 11, a storage unit 12, and a control unit 13.
The input/output unit 11 is an interface for inputting and outputting data. For example, the input/output unit 11 may be a communication interface such as a NIC (Network Interface Card) for performing data communication with other devices via a network. The input/output unit 11 may also be an interface for connecting input devices such as a mouse and a keyboard and output devices such as a display.
The storage unit 12 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disk. The storage unit 12 may also be a rewritable semiconductor memory such as a RAM (Random Access Memory), a flash memory, or an NVSRAM (Non-Volatile Static Random Access Memory). The storage unit 12 stores an OS (Operating System) and various programs executed by the learning device 10. The storage unit 12 also stores model information 121.
The model information 121 is information such as parameters for constructing the deep learning model, and is updated as appropriate during the training process. The updated model information 121 may be output to another device or the like via the input/output unit 11.
The control unit 13 controls the entire learning device 10. The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 13 also has an internal memory for storing programs and control data that define various processing procedures, and executes each process using the internal memory. The control unit 13 functions as various processing units by running various programs. For example, the control unit 13 has a generation unit 131, a conversion unit 132, a calculation unit 133, and an update unit 134.
The generation unit 131 inputs a random number z to the generator G and generates the second data.
The conversion unit 132 converts the first data into a first frequency component, and converts the second data generated by the generator G constituting the adversarial learning model into a second frequency component.
The conversion unit 132 converts the first data and the second data into frequency components using a differentiable function. This is to enable parameter updates by backpropagation. For example, the conversion unit 132 converts the first data and the second data into frequency components by a discrete Fourier transform (DFT) or a discrete cosine transform (DCT).
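As an illustrative sketch only (not part of the patent text), a differentiable two-dimensional DCT can be written as a pair of matrix multiplications; the PyTorch framework and the helper names dct_matrix and dct_2d are assumptions introduced here for illustration:

```python
import math
import torch

def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)   # frequency index
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)   # spatial index
    mat = torch.cos(math.pi * (i + 0.5) * k / n) * math.sqrt(2.0 / n)
    mat[0, :] = mat[0, :] / math.sqrt(2.0)                   # rescale the DC row
    return mat

def dct_2d(x: torch.Tensor) -> torch.Tensor:
    """Differentiable 2D DCT over the last two dimensions of x."""
    cn = dct_matrix(x.shape[-2]).to(x)
    cm = dct_matrix(x.shape[-1]).to(x)
    # Plain matrix multiplications keep the transform differentiable,
    # so gradients can flow back to the generator through F(.).
    return cn @ x @ cm.transpose(0, 1)
```

Because the transform is just matrix multiplication, backpropagation through it works without any special handling, which is the point of requiring a differentiable function.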
The calculation unit 133 calculates the error between the first frequency component and the second frequency component. The calculation unit 133 can calculate the error by any method, such as MSE (mean square error), RMSE (root mean square error), or the L1 distance. Here, it is assumed that the calculation unit 133 calculates the error L_freq as in equation (1).
$$L_{freq} = \mathrm{MSE}\!\left(\frac{1}{|X_{real}|}\sum_{i=1}^{|X_{real}|} F(x_{real}^{i}),\; \frac{1}{|X_{fake}|}\sum_{j=1}^{|X_{fake}|} F(x_{fake}^{j})\right) \qquad (1)$$
Here, X_real and X_fake are batches of Real and Fake data, respectively, and |X_real| and |X_fake| are the respective batch sizes. Real is real data, and Fake is data generated by the generator G.
F(·) is a function that converts data in the spatial domain into frequency components. x_real^i and x_fake^j are the i-th data in X_real and the j-th data in X_fake, respectively, and are examples of the first data and the second data. F(x_real^i) corresponds to the first frequency component, and F(x_fake^j) corresponds to the second frequency component.
In this way, the calculation unit 133 calculates the error between the batch average of the plurality of first frequency components obtained by converting each of the plurality of first data, and the batch average of the plurality of second frequency components obtained by converting each of the plurality of second data. That is, the error here is an error between batch averages, not an error between individual data samples.
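A minimal sketch of the frequency-component matching loss of equation (1), reusing the dct_2d helper above and assuming MSE as the error measure (both are assumptions for illustration):

```python
def frequency_matching_loss(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """L_freq: error between the batch-averaged frequency components of Real and Fake."""
    real_freq = dct_2d(real).mean(dim=0)   # batch average of F(x_real^i)
    fake_freq = dct_2d(fake).mean(dim=0)   # batch average of F(x_fake^j)
    return torch.mean((real_freq - fake_freq) ** 2)
```

Note that the batch average is taken before the error is computed, matching the description that the error is between batch averages rather than between individual samples.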
Further, the calculation unit 133 calculates, as in equation (2), a loss function L_G that increases as the error between the first frequency component and the second frequency component increases and that increases as the discrimination accuracy of the discriminator constituting the adversarial learning model in distinguishing the first data from the second data decreases. λ is a hyperparameter that functions as a weight.
$$L_{G} = L_{GAN}^{G} + \lambda\, L_{freq} \qquad (2)$$

Here, L_GAN^G denotes the ordinary GAN loss for the generator G, written in terms of D(G(z)).
G(·) is a function that outputs the data (Fake) generated by the generator G from its argument. D(·) is a function that outputs the probability that the discriminator D identifies the data given as its argument as Real.
The update unit 134 updates the parameters of the generator G so that the error calculated by the calculation unit 133 becomes smaller. Specifically, the update unit 134 updates the parameters of the generator G so that the loss function L_G is optimized.
The update unit 134 also updates the parameters of the discriminator D so that the loss function of equation (3) is optimized. Here, x is real data (Real).
(Equation (3): the loss function used to train the discriminator D, expressed in terms of D(x) for real data x and D(G(z)) for generated data.)
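For illustration only, the generator's total loss of equation (2) and a discriminator loss in the spirit of equation (3) could be computed as below; the specific GAN terms (standard binary cross-entropy / non-saturating forms) are assumptions, since the exact formulas appear only as equation images in the original:

```python
import torch.nn.functional as nnf

def generator_loss(d_fake_logits: torch.Tensor,
                   real: torch.Tensor,
                   fake: torch.Tensor,
                   lam: float = 1.0) -> torch.Tensor:
    """L_G = GAN loss for G + lambda * L_freq (equation (2))."""
    # Assumed non-saturating GAN term: G is rewarded when D scores Fake as Real.
    gan_loss = nnf.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    return gan_loss + lam * frequency_matching_loss(real, fake)

def discriminator_loss(d_real_logits: torch.Tensor,
                       d_fake_logits: torch.Tensor) -> torch.Tensor:
    """Assumed standard discriminator loss, in the spirit of equation (3)."""
    real_loss = nnf.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake_loss = nnf.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return real_loss + fake_loss
```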
[Processing of the first embodiment]
FIG. 4 is a flowchart showing the processing flow of the learning device according to the first embodiment. As shown in FIG. 4, the learning device 10 first reads the training data (step S101). Here, the learning device 10 reads real data (Real) as the training data.
Next, the learning device 10 samples a random number z from a normal distribution and generates samples (Fake) by G(z) (step S102). The learning device 10 then converts Real and Fake into frequency components by DCT or DFT, and calculates the batch average of the frequency components (step S103).
Here, the learning device 10 calculates the GAN loss function of the generator G (step S104). The GAN loss of the generator G corresponds to the first term on the right-hand side of equation (2). The learning device 10 then calculates the frequency-component matching loss from the batch averages of the Real and Fake frequency components (step S105). The frequency-component matching loss corresponds to L_freq in equation (1).
Further, the learning device 10 calculates, as the total loss, the sum of the GAN loss function for G and the frequency-component matching loss (step S106). The total loss corresponds to L_G in equation (2). The learning device 10 may multiply the frequency-component matching loss by the weight λ. The learning device 10 updates the parameters of the generator G by backpropagation of the total loss (step S107).
The learning device 10 also trains the discriminator D (step S108). Specifically, the learning device 10 updates the parameters of the discriminator D by backpropagation of the loss function of equation (3).
At this time, if the maximum number of training steps is greater than the current number of training steps (step S109, True), the learning device 10 returns to step S101 and repeats the process. Otherwise (step S109, False), the learning device 10 ends the process.
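Putting steps S101 to S109 together, a simplified training loop might look as follows; the objects generator, discriminator, opt_g, opt_d, and loader are placeholders introduced here, not names from the patent:

```python
def train(generator, discriminator, opt_g, opt_d, loader,
          z_dim: int = 128, max_steps: int = 100_000, lam: float = 1.0):
    step = 0
    while step < max_steps:                               # loop condition of step S109
        for real in loader:                               # step S101: read training data
            z = torch.randn(real.size(0), z_dim)          # step S102: sample z ~ N(0, I)
            fake = generator(z)

            # Steps S103-S107: generator update with the total loss of equation (2).
            loss_g = generator_loss(discriminator(fake), real, fake, lam)
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()

            # Step S108: discriminator update with the loss of equation (3).
            loss_d = discriminator_loss(discriminator(real),
                                        discriminator(fake.detach()))
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()

            step += 1
            if step >= max_steps:                         # step S109: stop at the maximum step count
                break
```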
[Effects of the first embodiment]
As described above, the conversion unit 132 converts the first data into a first frequency component, and converts the second data generated by the generator constituting the adversarial learning model into a second frequency component. The calculation unit 133 calculates the error between the first frequency component and the second frequency component. The update unit 134 updates the parameters of the generator so that the error calculated by the calculation unit 133 becomes smaller. In this way, the learning device 10 can reflect the influence of the frequency components in training. As a result, according to the present embodiment, it is possible to suppress the occurrence of overfitting and improve the accuracy of the model.
The conversion unit 132 converts the first data and the second data into frequency components using a differentiable function. For example, the conversion unit 132 converts the first data and the second data into frequency components by a discrete Fourier transform or a discrete cosine transform. This makes it possible in the present embodiment to update the parameters by backpropagation.
The calculation unit 133 calculates the error between the batch average of the plurality of first frequency components obtained by converting each of the plurality of first data, and the batch average of the plurality of second frequency components obtained by converting each of the plurality of second data. In this way, in the present embodiment, not only the frequency components of individual data but also the overall tendency of the generated frequency components can be reflected in training.
The calculation unit 133 calculates a loss function that increases as the error between the first frequency component and the second frequency component increases, and that increases as the discrimination accuracy of the discriminator constituting the adversarial learning model in distinguishing the first data from the second data decreases. The update unit 134 updates the parameters of the generator so that the loss function is optimized. As a result, in the present embodiment, the entire model can be trained efficiently.
[Experiment]
An experiment conducted by actually carrying out the above embodiment will be described. The experimental settings are as follows.
・Experimental settings
  Dataset: CIFAR-100 (image dataset, 100 classes)
  Training dataset: 50,000 images
  Neural network architecture: Resnet-SNGAN (Reference 8: Miyato, Takeru, et al. "Spectral normalization for generative adversarial networks." arXiv preprint arXiv:1802.05957 (ICLR 2018).)
・Experimental procedure
  (1) Train for 100,000 iterations using the training data
  (2) Measure generation quality (FID) every 1,000 iterations (Reference 9: Heusel, Martin, et al. "Gans trained by a two time-scale update rule converge to a local nash equilibrium." Advances in Neural Information Processing Systems. 2017. (NIPS 2017))
  (3) Use the model with the best FID score as the final trained model
  (4) Run 10 trials in total and compute the mean and standard deviation of the FID
・Experimental patterns
  SNGAN: baseline (ordinary GAN) (Reference 8)
  CVPR20: an existing method that minimizes the frequency components of generated images (uses 1D DFT and binary cross-entropy) (Reference 7)
  FreqMSE: frequency-component matching loss (uses 2D DCT and mean squared error)
  SSD2GAN: simultaneous learning in the spatial and frequency domains (2D DCT)
  SSD2GAN + Tradeoff: introduces a trade-off coefficient α (α = 0.8)
  SSD2GAN + SSCR: introduces a consistency loss between D_s and D_f (λ = 0.001)
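For reference, the FID used as the quality metric is defined in Reference 9 as the Fréchet distance between Gaussian fits of the Inception features of real and generated samples (lower is better):

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$

where (μ_r, Σ_r) and (μ_g, Σ_g) are the feature mean and covariance for real and generated data, respectively.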
FreqMSE corresponds to the first embodiment. SSD2GAN is another method that improves the accuracy of the model while taking the influence of frequency components into account, using an approach different from that of the first embodiment.
FIGS. 5, 6, and 7 are diagrams showing the results of the experiment. As shown in FIG. 5, with FreqMSE and SSD2GAN + Tradeoff + SSCR, the FID of the generator G is smaller, so it can be said that generation quality improved.
Further, as shown in FIG. 6, overfitting is suppressed by every method except SNGAN. With SNGAN, overfitting occurs after 40,000 iterations, and the FID continues to deteriorate.
As shown in FIG. 7, regarding the frequency-component transform functions, FreqMSE and SSD2GAN show the effect of suppressing high-frequency components in the generated samples that are not present in the real data.
[System configuration, etc.]
Each component of each illustrated device is functional and conceptual, and does not necessarily have to be physically configured as shown in the figures. That is, the specific form of distribution and integration of each device is not limited to the illustrated one; all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or an arbitrary part of each processing function performed by each device can be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU, or as hardware using wired logic. The program may be executed not only by a CPU but also by another processor such as a GPU.
Of the processes described in the present embodiment, all or part of a process described as being performed automatically can also be performed manually, and all or part of a process described as being performed manually can also be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above documents and drawings can be changed arbitrarily unless otherwise specified.
[Program]
As one embodiment, the learning device 10 can be implemented by installing, on a desired computer, a learning program that executes the above training process as packaged software or online software. For example, by causing an information processing device to execute the above learning program, the information processing device can be made to function as the learning device 10. The information processing device referred to here includes desktop and notebook personal computers. In addition, the information processing device includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) devices, as well as slate terminals such as PDAs (Personal Digital Assistants).
The learning device 10 can also be implemented as a learning server device that treats a terminal device used by a user as a client and provides the client with services related to the above training process. For example, the learning server device is implemented as a server device that provides a learning service that takes training data as input and outputs information on the trained model. In this case, the learning server device may be implemented as a Web server, or may be implemented as a cloud that provides services related to the training process on an outsourcing basis.
 図8は、学習プログラムを実行するコンピュータの一例を示す図である。コンピュータ1000は、例えば、メモリ1010、CPU1020を有する。また、コンピュータ1000は、ハードディスクドライブインタフェース1030、ディスクドライブインタフェース1040、シリアルポートインタフェース1050、ビデオアダプタ1060、ネットワークインタフェース1070を有する。これらの各部は、バス1080によって接続される。 FIG. 8 is a diagram showing an example of a computer that executes a learning program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. Each of these parts is connected by a bus 1080.
 メモリ1010は、ROM(Read Only Memory)1011及びRAM(Random Access Memory)1012を含む。ROM1011は、例えば、BIOS(BASIC Input Output System)等のブートプログラムを記憶する。ハードディスクドライブインタフェース1030は、ハードディスクドライブ1090に接続される。ディスクドライブインタフェース1040は、ディスクドライブ1100に接続される。例えば磁気ディスクや光ディスク等の着脱可能な記憶媒体が、ディスクドライブ1100に挿入される。シリアルポートインタフェース1050は、例えばマウス1110、キーボード1120に接続される。ビデオアダプタ1060は、例えばディスプレイ1130に接続される。 The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (BASIC Input Output System). The hard disk drive interface 1030 is connected to the hard disk drive 1090. The disk drive interface 1040 is connected to the disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, the display 1130.
 ハードディスクドライブ1090は、例えば、OS1091、アプリケーションプログラム1092、プログラムモジュール1093、プログラムデータ1094を記憶する。すなわち、学習装置10の各処理を規定するプログラムは、コンピュータにより実行可能なコードが記述されたプログラムモジュール1093として実装される。プログラムモジュール1093は、例えばハードディスクドライブ1090に記憶される。例えば、学習装置10における機能構成と同様の処理を実行するためのプログラムモジュール1093が、ハードディスクドライブ1090に記憶される。なお、ハードディスクドライブ1090は、SSD(Solid State Drive)により代替されてもよい。 The hard disk drive 1090 stores, for example, the OS 1091, the application program 1092, the program module 1093, and the program data 1094. That is, the program that defines each process of the learning device 10 is implemented as a program module 1093 in which a code that can be executed by a computer is described. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing the same processing as the functional configuration in the learning device 10 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
 また、上述した実施形態の処理で用いられる設定データは、プログラムデータ1094として、例えばメモリ1010やハードディスクドライブ1090に記憶される。そして、CPU1020は、メモリ1010やハードディスクドライブ1090に記憶されたプログラムモジュール1093やプログラムデータ1094を必要に応じてRAM1012に読み出して、上述した実施形態の処理を実行する。 Further, the setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, a memory 1010 or a hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 into the RAM 1012 as needed, and executes the process of the above-described embodiment.
 Note that the program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; they may be stored, for example, in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)). The program module 1093 and the program data 1094 may then be read from the other computer by the CPU 1020 via the network interface 1070.
 10 Learning device
 11 Input/output unit
 12 Storage unit
 121 Model information
 13 Control unit
 131 Generation unit
 132 Conversion unit
 133 Calculation unit
 134 Update unit

Claims (7)

  1.  A learning device comprising:
     a conversion unit that converts first data into a first frequency component and converts second data, generated by a generator constituting an adversarial learning model, into a second frequency component;
     a calculation unit that calculates an error between the first frequency component and the second frequency component; and
     an update unit that updates parameters of the generator so that the error calculated by the calculation unit becomes smaller.
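 The following is a minimal, non-authoritative sketch of the arrangement in claim 1, assuming a PyTorch-style generator and image-shaped data; the helper names to_frequency and frequency_error, and the choice of a 2-D discrete Fourier transform magnitude, are illustrative assumptions rather than elements recited in the claim.

    # Minimal sketch of claim 1 (illustrative; assumes PyTorch and image-shaped tensors).
    import torch
    import torch.nn.functional as F

    def to_frequency(x: torch.Tensor) -> torch.Tensor:
        # Conversion unit: map data to frequency components (here, a 2-D DFT magnitude).
        return torch.abs(torch.fft.fft2(x))

    def frequency_error(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
        # Calculation unit: error between the first and second frequency components.
        return F.mse_loss(to_frequency(fake), to_frequency(real))

    def update_generator(generator, optimizer_g, real, noise):
        # Update unit: adjust the generator parameters so the calculated error becomes smaller.
        fake = generator(noise)               # second data, produced by the generator
        error = frequency_error(real, fake)
        optimizer_g.zero_grad()
        error.backward()
        optimizer_g.step()
        return error.item()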
  2.  The learning device according to claim 1, wherein the conversion unit converts the first data and the second data into frequency components using a differentiable function.
  3.  The learning device according to claim 2, wherein the conversion unit converts the first data and the second data into frequency components by a discrete Fourier transform or a discrete cosine transform.
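 As a brief check of claims 2 and 3, the snippet below confirms that a discrete Fourier transform taken with torch.fft is differentiable, so the gradient of a frequency-domain error can reach the generator; a discrete cosine transform would need its own differentiable implementation, which is only noted as an assumption here.

    # Sketch for claims 2 and 3: the frequency transform must admit gradients.
    import torch

    x = torch.randn(4, 1, 32, 32, requires_grad=True)   # stand-in for generated data
    spectrum = torch.abs(torch.fft.fft2(x))             # differentiable DFT magnitude
    spectrum.mean().backward()                          # gradients flow back through the transform
    print(x.grad is not None)                           # True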
  4.  The learning device according to claim 1, wherein the calculation unit calculates an error between a batch average of a plurality of first frequency components obtained by converting each of a plurality of pieces of the first data, and a batch average of a plurality of second frequency components obtained by converting each of a plurality of pieces of the second data.
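 One possible reading of claim 4 is sketched below: every sample in the real batch and in the generated batch is transformed, the resulting frequency components are averaged over the batch dimension, and the error is taken between the two batch averages. The tensor shapes and the mean-squared-error criterion are assumptions made for illustration.

    # Sketch for claim 4: error between batch averages of frequency components.
    import torch
    import torch.nn.functional as F

    def batch_averaged_frequency_error(real_batch: torch.Tensor,
                                       fake_batch: torch.Tensor) -> torch.Tensor:
        real_freq = torch.abs(torch.fft.fft2(real_batch))   # (B, C, H, W) spectra
        fake_freq = torch.abs(torch.fft.fft2(fake_batch))
        real_mean = real_freq.mean(dim=0)                   # batch average of the first components
        fake_mean = fake_freq.mean(dim=0)                   # batch average of the second components
        return F.mse_loss(fake_mean, real_mean)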
  5.  The learning device according to claim 1, wherein the calculation unit calculates a loss function that becomes larger as the error between the first frequency component and the second frequency component becomes larger, and becomes larger as the accuracy with which a discriminator constituting the adversarial learning model discriminates between the first data and the second data becomes lower, and
     the update unit updates the parameters of the generator so that the loss function is optimized.
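 Claim 5 does not spell out a concrete formula, so the sketch below only shows one conventional way of combining an adversarial term with the frequency-domain error when training the generator: a non-saturating GAN generator loss plus a weighted frequency error. The weight lambda_freq and the use of binary cross-entropy are assumptions, not the claimed loss itself.

    # Illustrative combined generator objective (one possible instantiation).
    import torch
    import torch.nn.functional as F

    lambda_freq = 1.0  # assumed weighting between the adversarial and frequency terms

    def generator_loss(discriminator, real, fake):
        # Adversarial term: small when the discriminator is fooled by the generated data.
        logits_fake = discriminator(fake)
        adv = F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))
        # Frequency term: error between the frequency components of real and generated data.
        freq = F.mse_loss(torch.abs(torch.fft.fft2(fake)), torch.abs(torch.fft.fft2(real)))
        return adv + lambda_freq * freq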
  6.  A learning method executed by a learning device, the method comprising:
     a conversion step of converting first data into a first frequency component and converting second data, generated by a generator constituting an adversarial learning model, into a second frequency component;
     a calculation step of calculating an error between the first frequency component and the second frequency component; and
     an update step of updating parameters of the generator so that the error calculated in the calculation step becomes smaller.
  7.  A learning program for causing a computer to function as the learning device according to any one of claims 1 to 5.
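 To show how the conversion, calculation, and update steps of the method in claim 6 could sit inside an ordinary adversarial training loop, a self-contained sketch follows; the tiny fully connected networks, the random stand-in data, the optimizers, and the weighting are all illustrative assumptions.

    # Self-contained sketch of the training method (illustrative sizes and data).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    latent_dim, image_size, batch_size, lambda_freq = 16, 16, 8, 1.0

    generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                              nn.Linear(256, image_size * image_size), nn.Tanh())
    discriminator = nn.Sequential(nn.Linear(image_size * image_size, 256), nn.LeakyReLU(0.2),
                                  nn.Linear(256, 1))
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

    def spectrum(x):
        # Conversion step: differentiable 2-D DFT magnitude of image-shaped data.
        return torch.abs(torch.fft.fft2(x.view(-1, image_size, image_size)))

    for step in range(100):
        real = torch.rand(batch_size, image_size * image_size) * 2 - 1   # stand-in training data
        noise = torch.randn(batch_size, latent_dim)

        # Discriminator update (standard GAN step, shown only for completeness).
        fake = generator(noise).detach()
        d_loss = (F.binary_cross_entropy_with_logits(discriminator(real), torch.ones(batch_size, 1))
                  + F.binary_cross_entropy_with_logits(discriminator(fake), torch.zeros(batch_size, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator update: conversion, calculation, and update steps of the method.
        fake = generator(noise)
        freq_error = F.mse_loss(spectrum(fake), spectrum(real))          # calculation step
        adv = F.binary_cross_entropy_with_logits(discriminator(fake), torch.ones(batch_size, 1))
        g_loss = adv + lambda_freq * freq_error
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()                                                     # update step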
PCT/JP2020/037256 2020-09-30 2020-09-30 Learning device, learning method, and learning program WO2022070342A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/037256 WO2022070342A1 (en) 2020-09-30 2020-09-30 Learning device, learning method, and learning program
JP2022553336A JPWO2022070342A1 (en) 2020-09-30 2020-09-30

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/037256 WO2022070342A1 (en) 2020-09-30 2020-09-30 Learning device, learning method, and learning program

Publications (1)

Publication Number Publication Date
WO2022070342A1 true WO2022070342A1 (en) 2022-04-07

Family

ID=80950008

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/037256 WO2022070342A1 (en) 2020-09-30 2020-09-30 Learning device, learning method, and learning program

Country Status (2)

Country Link
JP (1) JPWO2022070342A1 (en)
WO (1) WO2022070342A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020087103A (en) * 2018-11-28 2020-06-04 株式会社ツバサファクトリー Learning method, computer program, classifier, and generator

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020087103A (en) * 2018-11-28 2020-06-04 株式会社ツバサファクトリー Learning method, computer program, classifier, and generator

Also Published As

Publication number Publication date
JPWO2022070342A1 (en) 2022-04-07

Similar Documents

Publication Publication Date Title
WO2020167490A1 (en) Incremental training of machine learning tools
US20230196202A1 (en) System and method for automatic building of learning machines using learning machines
JP6870508B2 (en) Learning programs, learning methods and learning devices
JP6992709B2 (en) Mask estimation device, mask estimation method and mask estimation program
US20220414490A1 (en) Storage medium, machine learning method, and machine learning device
US11048852B1 (en) System, method and computer program product for automatic generation of sizing constraints by reusing existing electronic designs
Sun et al. Sparse deep learning: A new framework immune to local traps and miscalibration
US20240119266A1 (en) Method for Constructing AI Integrated Model, and AI Integrated Model Inference Method and Apparatus
JP2024051136A (en) Learning device, learning method, learning program, estimation device, estimation method, and estimation program
Liu et al. Gradient‐Sensitive Optimization for Convolutional Neural Networks
WO2022070342A1 (en) Learning device, learning method, and learning program
WO2020170803A1 (en) Augmentation device, augmentation method, and augmentation program
CN110489435B (en) Data processing method and device based on artificial intelligence and electronic equipment
WO2022070343A1 (en) Learning device, learning method, and learning program
JP2020134567A (en) Signal processing device, signal processing method and signal processing program
WO2022249418A1 (en) Learning device, learning method, and learning program
US20240220814A1 (en) Training device, training method and training program
Sun et al. Generalizing expectation propagation with mixtures of exponential family distributions and an application to Bayesian logistic regression
WO2019208248A1 (en) Learning device, learning method, and learning program
JP7047664B2 (en) Learning device, learning method and prediction system
JP7099254B2 (en) Learning methods, learning programs and learning devices
JP7077746B2 (en) Learning equipment, learning methods and learning programs
WO2023067666A1 (en) Calculation device, calculation method, and calculation program
CN114970431B (en) Training method and device for MOS tube parameter estimation model
Jiang et al. Renewable Huber estimation method for streaming datasets

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20956270

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022553336

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20956270

Country of ref document: EP

Kind code of ref document: A1