WO2022070342A1 - Learning device, learning method, and learning program - Google Patents

Learning device, learning method, and learning program Download PDF

Info

Publication number
WO2022070342A1
WO2022070342A1 PCT/JP2020/037256 JP2020037256W WO2022070342A1 WO 2022070342 A1 WO2022070342 A1 WO 2022070342A1 JP 2020037256 W JP2020037256 W JP 2020037256W WO 2022070342 A1 WO2022070342 A1 WO 2022070342A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning
frequency component
error
generator
Prior art date
Application number
PCT/JP2020/037256
Other languages
French (fr)
Japanese (ja)
Inventor
真弥 山口
関利 金井
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 filed Critical 日本電信電話株式会社
Priority to PCT/JP2020/037256 priority Critical patent/WO2022070342A1/en
Priority to JP2022553336A priority patent/JPWO2022070342A1/ja
Publication of WO2022070342A1 publication Critical patent/WO2022070342A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present invention relates to a learning device, a learning method and a learning program.
  • GAN Generative Adversarial Networks
  • GAN (Generative Adversarial Networks) is known as a deep learning model (see, for example, Non-Patent Document 1).
  • the conventional technology has a problem that overfitting may occur and the accuracy of the model may not be improved.
  • samples generated by a trained GAN generator contain high-frequency components that are not included in the actual training data.
  • as a result, the discriminator comes to rely on these high-frequency components when judging authenticity, and overfitting may occur.
  • the learning device includes a conversion unit that converts first data into a first frequency component and converts second data, generated by a generator constituting an adversarial learning model, into a second frequency component; a calculation unit that calculates an error between the first frequency component and the second frequency component; and an update unit that updates parameters of the generator so that the error calculated by the calculation unit becomes smaller.
  • FIG. 1 is a diagram illustrating a deep learning model according to the first embodiment.
  • FIG. 2 is a diagram illustrating the influence of high frequency components.
  • FIG. 3 is a diagram showing a configuration example of the learning device according to the first embodiment.
  • FIG. 4 is a flowchart showing a processing flow of the learning device according to the first embodiment.
  • FIG. 5 is a diagram showing the results of the experiment.
  • FIG. 6 is a diagram showing the results of the experiment.
  • FIG. 7 is a diagram showing the results of the experiment.
  • FIG. 8 is a diagram showing an example of a computer that executes a learning program.
  • GAN is a technique for learning a data distribution p_data(x) with two deep learning models, a generator G and a discriminator D. G learns to deceive D, and D learns to distinguish data generated by G from the training data.
  • a model in which a plurality of models are in such an adversarial relationship is sometimes called an adversarial learning model.
  • adversarial learning models such as GAN are used to generate images, text, audio, and the like.
  • Reference 1: Karras, Tero, et al. "Analyzing and improving the image quality of stylegan." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
  • Reference 2: Donahue, Chris, Julian McAuley, and Miller Puckette. "Adversarial audio synthesis." arXiv preprint arXiv:1802.04208 (2018). (ICLR 2019)
  • Reference 3: Yu, Lantao, et al. "Seqgan: Sequence generative adversarial nets with policy gradient." Thirty-first AAAI Conference on Artificial Intelligence. 2017. (AAAI 2017)
  • GAN has a problem that D overfits the learning sample as the learning progresses.
  • each model cannot be meaningfully updated for data generation, and the quality of generation by the generator deteriorates. This is shown, for example, in Figure 1 of Reference 4.
  • Reference 4: Karras, Tero, et al. "Training Generative Adversarial Networks with Limited Data." arXiv preprint arXiv:2006.06676 (2020).
  • Reference 5 describes that a trained CNN makes predictions that depend on high-frequency components of its input.
  • Reference 5: Wang, Haohan, et al. "High-frequency Component Helps Explain the Generalization of Convolutional Neural Networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
  • Reference 6 describes that the neural networks constituting the GAN generator G and discriminator D tend to learn low-frequency components first and high-frequency components later.
  • Reference 6: Rahaman, Nasim, et al. "On the spectral bias of neural networks." International Conference on Machine Learning. 2019. (ICML 2019)
  • FIG. 1 is a diagram illustrating a deep learning model according to the first embodiment.
  • FIG. 2 is a diagram illustrating the influence of the high frequency component.
  • the two-dimensional power spectrum on CIFAR-10 differs between real data (Real) and data generated by the generator (GAN).
  • Reference 7 shows that data generated by various GANs has increased power at high frequencies compared with real data.
  • Reference 7: Durall, Ricard, Margret Keuper, and Janis Keuper. "Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
  • given data (Real) included in the real data set X and data (Fake) generated by the generator G from a random number z, the discriminator D identifies whether each piece of data is Real (or Fake).
  • the discriminator D is optimized so that its discrimination accuracy improves, that is, so that the probability that the discriminator D identifies Real as Real increases.
  • the generator G is optimized so that its ability to deceive the discriminator D, that is, the probability that the discriminator D identifies Fake as Real, increases.
  • the generator G is optimized so that the frequency components of Real and Fake match.
  • the details of the learning process of the deep learning model will be described together with the configuration of the learning device of the present embodiment.
  • FIG. 3 is a diagram showing a configuration example of the learning device according to the first embodiment.
  • the learning device 10 accepts input of data for learning and updates the parameters of the deep learning model. Further, the learning device 10 may output the updated parameters. As shown in FIG. 3, the learning device 10 has an input / output unit 11, a storage unit 12, and a control unit 13.
  • the input / output unit 11 is an interface for inputting / outputting data.
  • the input / output unit 11 may be a communication interface such as a NIC (Network Interface Card) for performing data communication with another device via a network.
  • the input / output unit 11 may be an interface for connecting an input device such as a mouse and a keyboard, and an output device such as a display.
  • the storage unit 12 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disk.
  • the storage unit 12 may be a rewritable semiconductor memory such as a RAM (Random Access Memory), flash memory, or NVSRAM (Non-Volatile Static Random Access Memory).
  • the storage unit 12 stores an OS (Operating System) and various programs executed by the learning device 10. Further, the storage unit 12 stores the model information 121.
  • the model information 121 is information such as parameters for constructing a deep learning model, and is appropriately updated in the learning process. Further, the updated model information 121 may be output to another device or the like via the input / output unit 11.
  • the control unit 13 controls the entire learning device 10.
  • the control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • the control unit 13 has an internal memory for storing programs and control data that specify various processing procedures, and executes each process using the internal memory. Further, the control unit 13 functions as various processing units by operating various programs.
  • the control unit 13 has a generation unit 131, a conversion unit 132, a calculation unit 133, and an update unit 134.
  • the generation unit 131 inputs the random number z to the generator G and generates the second data.
  • the conversion unit 132 converts the first data into the first frequency component, and converts the second data generated by the generator G constituting the hostile learning model into the second frequency component.
  • the conversion unit 132 converts the first data and the second data into frequency components using a differentiable function. This is to enable parameter updates by backpropagation.
  • the conversion unit 132 converts the first data and the second data into frequency components by a discrete Fourier transform (DFT: discrete Fourier transform) or a discrete cosine transform (DCT: discrete cosine transform).
  • DFT discrete Fourier transform
  • DCT discrete cosine transform
  • the calculation unit 133 calculates the error between the first frequency component and the second frequency component.
  • the calculation unit 133 can calculate the error by any method, such as MSE (mean square error), RMSE (root mean square error), or the L1 distance.
  • MSE Mean Square Error
  • RMSE Root Mean Square Error
  • X_real and X_fake are batches of Real and Fake, respectively, and |X_real| and |X_fake| are the respective batch sizes.
  • Real is real data. Fake is data generated by the generator G.
  • F(·) is a function that converts data in the spatial domain into frequency components.
  • x_real^i and x_fake^j are the i-th data in X_real and the j-th data in X_fake, respectively, and are examples of the first data and the second data.
  • F(x_real^i) corresponds to the first frequency component.
  • F(x_fake^j) corresponds to the second frequency component.
  • the calculation unit 133 calculates the error between the batch average of the plurality of first frequency components obtained by converting each of the plurality of first data, and the batch average of the plurality of second frequency components obtained by converting each of the plurality of second data. That is, the error here is an error between batch averages, not an error between individual data samples.
  • the calculation unit 133 calculates, as in equation (2), a loss function L_G that increases as the error between the first frequency component and the second frequency component increases, and that increases as the discrimination accuracy of the discriminator constituting the adversarial learning model in distinguishing the first data from the second data decreases.
  • λ is a hyperparameter that functions as a weight.
  • G(·) is a function that outputs the data (Fake) generated by the generator G from its argument.
  • D(·) is a function that outputs the probability that the discriminator D identifies the data given as its argument as Real.
  • the update unit 134 updates the parameters of the generator G so that the error calculated by the calculation unit 133 becomes smaller. Specifically, the update unit 134 updates the parameters of the generator G so that the loss function L_G is optimized.
  • the update unit 134 updates the parameters of the discriminator D so that the loss function of equation (3) is optimized.
  • x is real data.
  • FIG. 4 is a flowchart showing a processing flow of the learning device according to the first embodiment.
  • the learning device 10 reads the learning data (step S101).
  • the learning device 10 reads existing data (Real) as learning data.
  • the learning device 10 samples a random number z from the normal distribution and generates a sample (Fake) by G (z) (step S102). Further, the learning device 10 converts Real and Fake into frequency components by DCT or DFT, and calculates the batch average of the frequency components (step S103).
  • the learning device 10 calculates the GAN loss function of the generator G (step S104).
  • the GAN loss of the generator G corresponds to the first term on the right-hand side of equation (2).
  • the learning device 10 calculates the frequency-component matching loss from the batch averages of the Real and Fake frequency components (step S105).
  • the frequency-component matching loss corresponds to L_freq in equation (1).
  • the learning device 10 calculates, as the total loss, the sum of the GAN loss function for G and the frequency-component matching loss (step S106).
  • the total loss corresponds to L_G in equation (2).
  • the learning device 10 may multiply the frequency-component matching loss by the weight λ.
  • the learning device 10 updates the parameters of the generator G by backpropagation of the total loss (step S107).
  • the learning device 10 trains the discriminator D (step S108). Specifically, the learning device 10 updates the parameters of the discriminator D by backpropagation of the loss function of equation (3).
  • if the maximum number of training steps is greater than the current number of training steps (step S109, True), the learning device 10 returns to step S101 and repeats the process; otherwise (step S109, False), the learning device 10 ends the process.
  • the conversion unit 132 converts the first data into a first frequency component, and converts the second data generated by the generator constituting the adversarial learning model into a second frequency component.
  • the calculation unit 133 calculates the error between the first frequency component and the second frequency component.
  • the update unit 134 updates the parameters of the generator so that the error calculated by the calculation unit 133 becomes smaller. In this way, the learning device 10 can reflect the influence of the frequency components in training. As a result, according to the present embodiment, it is possible to suppress the occurrence of overfitting and improve the accuracy of the model.
  • the conversion unit 132 converts the first data and the second data into frequency components using a differentiable function. For example, the conversion unit 132 converts the first data and the second data into frequency components by a discrete Fourier transform or a discrete cosine transform. This makes it possible in the present embodiment to update the parameters by backpropagation.
  • the calculation unit 133 calculates the error between the batch average of the plurality of first frequency components obtained by converting each of the plurality of first data, and the batch average of the plurality of second frequency components obtained by converting each of the plurality of second data.
  • the calculation unit 133 calculates a loss function that increases as the error between the first frequency component and the second frequency component increases, and that increases as the discrimination accuracy of the discriminator constituting the adversarial learning model in distinguishing the first data from the second data decreases.
  • the update unit 134 updates the parameters of the generator so that the loss function is optimized. As a result, in the present embodiment, the entire model can be trained efficiently.
  • FreqMSE corresponds to the first embodiment. SSD2GAN is another method that improves the accuracy of the model while taking the influence of frequency components into account, using an approach different from that of the first embodiment.
  • FIGS. 5, 6, and 7 are diagrams showing the results of the experiment. As shown in FIG. 5, with FreqMSE and SSD2GAN + Tradeoff + SSCR, the FID of the generator G is smaller, so it can be said that generation quality improved.
  • overfitting is suppressed by every method except SNGAN.
  • with SNGAN, overfitting occurs after 40,000 iterations, and the FID continues to deteriorate.
  • FreqMSE and SSD2GAN show the effect of suppressing high-frequency components in the generated samples that are not present in the real data.
  • each component of each illustrated device is functional and conceptual, and does not necessarily have to be physically configured as shown in the figures. That is, the specific form of distribution and integration of each device is not limited to the illustrated one; all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or an arbitrary part of each processing function performed by each device can be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU, or as hardware using wired logic. The program may be executed not only by a CPU but also by another processor such as a GPU.
  • CPU Central Processing Unit
  • the learning device 10 can be implemented by installing a learning program that executes the above learning process as package software or online software on a desired computer. For example, by causing the information processing device to execute the above learning program, the information processing device can be made to function as the learning device 10.
  • the information processing device referred to here includes a desktop type or notebook type personal computer.
  • the information processing device includes smartphones, mobile phones, mobile communication terminals such as PHS (Personal Handyphone System), and slate terminals such as PDAs (Personal Digital Assistants).
  • the learning device 10 can be implemented as a learning server device that treats a terminal device used by a user as a client and provides the client with services related to the above training process.
  • the learning server device is implemented as a server device that provides a learning service that takes training data as input and outputs information on the trained model.
  • the learning server device may be implemented as a Web server, or may be implemented as a cloud that provides services related to the training process on an outsourcing basis.
  • FIG. 8 is a diagram showing an example of a computer that executes a learning program.
  • the computer 1000 has, for example, a memory 1010 and a CPU 1020.
  • the computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. Each of these parts is connected by a bus 1080.
  • the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012.
  • the ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System).
  • BIOS Basic Input Output System
  • the hard disk drive interface 1030 is connected to the hard disk drive 1090.
  • the disk drive interface 1040 is connected to the disk drive 1100.
  • a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100.
  • the serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120.
  • the video adapter 1060 is connected to, for example, the display 1130.
  • the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the learning device 10 is implemented as the program module 1093, in which computer-executable code is written.
  • the program module 1093 is stored in, for example, the hard disk drive 1090.
  • the program module 1093 for executing the same processing as the functional configuration in the learning device 10 is stored in the hard disk drive 1090.
  • the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • the setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, a memory 1010 or a hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 into the RAM 1012 as needed, and executes the process of the above-described embodiment.
  • the program module 1093 and the program data 1094 are not limited to those stored in the hard disk drive 1090, but may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). Then, the program module 1093 and the program data 1094 may be read from another computer by the CPU 1020 via the network interface 1070.
  • LAN Local Area Network
  • WAN Wide Area Network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

This learning device includes a conversion unit (132) that converts first data into a first frequency component and converts second data generated by a generator constituting an adversarial learning model into a second frequency component. A calculation unit (133) calculates the error between the first frequency component and the second frequency component. An update unit (134) updates a parameter of the generator so as to reduce the error calculated by the calculation unit (133).

Description

Learning device, learning method, and learning program
The present invention relates to a learning device, a learning method, and a learning program.
Conventionally, deep generative models are known that are based on deep learning and that generate samples close to real data by learning the distribution of the training data. For example, GAN (Generative Adversarial Networks) is known as such a deep learning model (see, for example, Non-Patent Document 1).
However, the conventional technology has a problem in that overfitting may occur and the accuracy of the model may not improve. For example, samples generated by a trained GAN generator contain high-frequency components that are not included in the actual training data. As a result, the discriminator comes to rely on these high-frequency components when judging authenticity, and overfitting may occur.
In order to solve the above problems and achieve the object, the learning device includes: a conversion unit that converts first data into a first frequency component and converts second data, generated by a generator constituting an adversarial learning model, into a second frequency component; a calculation unit that calculates an error between the first frequency component and the second frequency component; and an update unit that updates parameters of the generator so that the error calculated by the calculation unit becomes smaller.
According to the present invention, it is possible to suppress the occurrence of overfitting and improve the accuracy of the model.
FIG. 1 is a diagram illustrating a deep learning model according to the first embodiment. FIG. 2 is a diagram illustrating the influence of high-frequency components. FIG. 3 is a diagram showing a configuration example of the learning device according to the first embodiment. FIG. 4 is a flowchart showing the processing flow of the learning device according to the first embodiment. FIGS. 5, 6, and 7 are diagrams showing the results of the experiment. FIG. 8 is a diagram showing an example of a computer that executes a learning program.
Hereinafter, embodiments of the learning device, the learning method, and the learning program according to the present application will be described in detail with reference to the drawings. The present invention is not limited to the embodiments described below.
GAN is a technique for learning a data distribution p_data(x) with two deep learning models, a generator G and a discriminator D. G learns to deceive D, and D learns to distinguish data generated by G from the training data. A model in which a plurality of models are in such an adversarial relationship is sometimes called an adversarial learning model.
Adversarial learning models such as GAN are used to generate images, text, audio, and the like.
Reference 1: Karras, Tero, et al. "Analyzing and improving the image quality of stylegan." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
Reference 2: Donahue, Chris, Julian McAuley, and Miller Puckette. "Adversarial audio synthesis." arXiv preprint arXiv:1802.04208 (2018). (ICLR 2019)
Reference 3: Yu, Lantao, et al. "Seqgan: Sequence generative adversarial nets with policy gradient." Thirty-first AAAI Conference on Artificial Intelligence. 2017. (AAAI 2017)
Here, GAN has a problem in that D overfits the training samples as training progresses. As a result, each model can no longer be updated in a way that is meaningful for data generation, and the quality of the data produced by the generator deteriorates. This is shown, for example, in Figure 1 of Reference 4.
Reference 4: Karras, Tero, et al. "Training Generative Adversarial Networks with Limited Data." arXiv preprint arXiv:2006.06676 (2020).
Further, Reference 5 describes that a trained CNN makes predictions that depend on high-frequency components of its input.
Reference 5: Wang, Haohan, et al. "High-frequency Component Helps Explain the Generalization of Convolutional Neural Networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
Further, Reference 6 describes that the neural networks constituting the GAN generator G and discriminator D tend to learn low-frequency components first and high-frequency components later.
Reference 6: Rahaman, Nasim, et al. "On the spectral bias of neural networks." International Conference on Machine Learning. 2019. (ICML 2019)
Therefore, one object of the first embodiment is to suppress the occurrence of overfitting and improve the accuracy of the model by reducing the influence of the high-frequency components of the data on the generator G and the discriminator D. FIG. 1 is a diagram illustrating the deep learning model according to the first embodiment. FIG. 2 is a diagram illustrating the influence of high-frequency components.
As shown in FIG. 2, the two-dimensional power spectrum on CIFAR-10 differs between real data (Real) and data generated by a generator (GAN). Further, Reference 7 shows that data generated by various GANs has increased power at high frequencies compared with real data.
Reference 7: Durall, Ricard, Margret Keuper, and Janis Keuper. "Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. (CVPR 2020)
Returning to FIG. 1, in the deep learning model of the present embodiment, given data (Real) included in the real data set X and data (Fake) generated by the generator G from a random number z, the discriminator D identifies whether each piece of data is Real (or Fake).
In a conventional GAN, the discriminator D is optimized so that its discrimination accuracy improves, that is, so that the probability that the discriminator D identifies Real as Real increases. In addition, the generator G is optimized so that its ability to deceive the discriminator D, that is, the probability that the discriminator D identifies Fake as Real, increases.
In the present embodiment, in addition to the above optimization, the generator G is optimized so that the frequency components of Real and Fake match. Hereinafter, the details of the training process of the deep learning model will be described together with the configuration of the learning device of the present embodiment.
[Configuration of the first embodiment]
FIG. 3 is a diagram showing a configuration example of the learning device according to the first embodiment. The learning device 10 accepts input of training data and updates the parameters of the deep learning model. The learning device 10 may also output the updated parameters. As shown in FIG. 3, the learning device 10 has an input/output unit 11, a storage unit 12, and a control unit 13.
The input/output unit 11 is an interface for inputting and outputting data. For example, the input/output unit 11 may be a communication interface such as a NIC (Network Interface Card) for performing data communication with other devices via a network. The input/output unit 11 may also be an interface for connecting input devices such as a mouse and a keyboard and output devices such as a display.
The storage unit 12 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disk. The storage unit 12 may also be a rewritable semiconductor memory such as a RAM (Random Access Memory), a flash memory, or an NVSRAM (Non-Volatile Static Random Access Memory). The storage unit 12 stores an OS (Operating System) and various programs executed by the learning device 10. The storage unit 12 also stores model information 121.
The model information 121 is information such as parameters for constructing the deep learning model, and is updated as appropriate during the training process. The updated model information 121 may be output to another device or the like via the input/output unit 11.
The control unit 13 controls the entire learning device 10. The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 13 also has an internal memory for storing programs and control data that define various processing procedures, and executes each process using the internal memory. The control unit 13 functions as various processing units by running various programs. For example, the control unit 13 has a generation unit 131, a conversion unit 132, a calculation unit 133, and an update unit 134.
The generation unit 131 inputs a random number z to the generator G and generates the second data.
The conversion unit 132 converts the first data into a first frequency component, and converts the second data generated by the generator G constituting the adversarial learning model into a second frequency component.
The conversion unit 132 converts the first data and the second data into frequency components using a differentiable function. This is to enable parameter updates by backpropagation. For example, the conversion unit 132 converts the first data and the second data into frequency components by a discrete Fourier transform (DFT) or a discrete cosine transform (DCT).
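As an illustrative sketch only (not part of the patent text), a differentiable two-dimensional DCT can be written as a pair of matrix multiplications; the PyTorch framework and the helper names dct_matrix and dct_2d are assumptions introduced here for illustration:

```python
import math
import torch

def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)   # frequency index
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)   # spatial index
    mat = torch.cos(math.pi * (i + 0.5) * k / n) * math.sqrt(2.0 / n)
    mat[0, :] = mat[0, :] / math.sqrt(2.0)                   # rescale the DC row
    return mat

def dct_2d(x: torch.Tensor) -> torch.Tensor:
    """Differentiable 2D DCT over the last two dimensions of x."""
    cn = dct_matrix(x.shape[-2]).to(x)
    cm = dct_matrix(x.shape[-1]).to(x)
    # Plain matrix multiplications keep the transform differentiable,
    # so gradients can flow back to the generator through F(.).
    return cn @ x @ cm.transpose(0, 1)
```

Because the transform is just matrix multiplication, backpropagation through it works without any special handling, which is the point of requiring a differentiable function.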
The calculation unit 133 calculates the error between the first frequency component and the second frequency component. The calculation unit 133 can calculate the error by any method, such as MSE (mean square error), RMSE (root mean square error), or the L1 distance. Here, it is assumed that the calculation unit 133 calculates the error L_freq as in equation (1).
$$L_{freq} = \mathrm{MSE}\!\left(\frac{1}{|X_{real}|}\sum_{i=1}^{|X_{real}|} F(x_{real}^{i}),\; \frac{1}{|X_{fake}|}\sum_{j=1}^{|X_{fake}|} F(x_{fake}^{j})\right) \qquad (1)$$
Here, X_real and X_fake are batches of Real and Fake data, respectively, and |X_real| and |X_fake| are the respective batch sizes. Real is real data, and Fake is data generated by the generator G.
F(·) is a function that converts data in the spatial domain into frequency components. x_real^i and x_fake^j are the i-th data in X_real and the j-th data in X_fake, respectively, and are examples of the first data and the second data. F(x_real^i) corresponds to the first frequency component, and F(x_fake^j) corresponds to the second frequency component.
In this way, the calculation unit 133 calculates the error between the batch average of the plurality of first frequency components obtained by converting each of the plurality of first data, and the batch average of the plurality of second frequency components obtained by converting each of the plurality of second data. That is, the error here is an error between batch averages, not an error between individual data samples.
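A minimal sketch of the frequency-component matching loss of equation (1), reusing the dct_2d helper above and assuming MSE as the error measure (both are assumptions for illustration):

```python
def frequency_matching_loss(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """L_freq: error between the batch-averaged frequency components of Real and Fake."""
    real_freq = dct_2d(real).mean(dim=0)   # batch average of F(x_real^i)
    fake_freq = dct_2d(fake).mean(dim=0)   # batch average of F(x_fake^j)
    return torch.mean((real_freq - fake_freq) ** 2)
```

Note that the batch average is taken before the error is computed, matching the description that the error is between batch averages rather than between individual samples.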
Further, the calculation unit 133 calculates, as in equation (2), a loss function L_G that increases as the error between the first frequency component and the second frequency component increases and that increases as the discrimination accuracy of the discriminator constituting the adversarial learning model in distinguishing the first data from the second data decreases. λ is a hyperparameter that functions as a weight.
$$L_{G} = L_{GAN}^{G} + \lambda\, L_{freq} \qquad (2)$$

Here, L_GAN^G denotes the ordinary GAN loss for the generator G, written in terms of D(G(z)).
G(·) is a function that outputs the data (Fake) generated by the generator G from its argument. D(·) is a function that outputs the probability that the discriminator D identifies the data given as its argument as Real.
The update unit 134 updates the parameters of the generator G so that the error calculated by the calculation unit 133 becomes smaller. Specifically, the update unit 134 updates the parameters of the generator G so that the loss function L_G is optimized.
The update unit 134 also updates the parameters of the discriminator D so that the loss function of equation (3) is optimized. Here, x is real data (Real).
(Equation (3): the loss function used to train the discriminator D, expressed in terms of D(x) for real data x and D(G(z)) for generated data.)
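For illustration only, the generator's total loss of equation (2) and a discriminator loss in the spirit of equation (3) could be computed as below; the specific GAN terms (standard binary cross-entropy / non-saturating forms) are assumptions, since the exact formulas appear only as equation images in the original:

```python
import torch.nn.functional as nnf

def generator_loss(d_fake_logits: torch.Tensor,
                   real: torch.Tensor,
                   fake: torch.Tensor,
                   lam: float = 1.0) -> torch.Tensor:
    """L_G = GAN loss for G + lambda * L_freq (equation (2))."""
    # Assumed non-saturating GAN term: G is rewarded when D scores Fake as Real.
    gan_loss = nnf.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    return gan_loss + lam * frequency_matching_loss(real, fake)

def discriminator_loss(d_real_logits: torch.Tensor,
                       d_fake_logits: torch.Tensor) -> torch.Tensor:
    """Assumed standard discriminator loss, in the spirit of equation (3)."""
    real_loss = nnf.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake_loss = nnf.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return real_loss + fake_loss
```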
[Processing of the first embodiment]
FIG. 4 is a flowchart showing the processing flow of the learning device according to the first embodiment. As shown in FIG. 4, the learning device 10 first reads the training data (step S101). Here, the learning device 10 reads real data (Real) as the training data.
Next, the learning device 10 samples a random number z from a normal distribution and generates samples (Fake) by G(z) (step S102). The learning device 10 then converts Real and Fake into frequency components by DCT or DFT, and calculates the batch average of the frequency components (step S103).
Here, the learning device 10 calculates the GAN loss function of the generator G (step S104). The GAN loss of the generator G corresponds to the first term on the right-hand side of equation (2). The learning device 10 then calculates the frequency-component matching loss from the batch averages of the Real and Fake frequency components (step S105). The frequency-component matching loss corresponds to L_freq in equation (1).
Further, the learning device 10 calculates, as the total loss, the sum of the GAN loss function for G and the frequency-component matching loss (step S106). The total loss corresponds to L_G in equation (2). The learning device 10 may multiply the frequency-component matching loss by the weight λ. The learning device 10 updates the parameters of the generator G by backpropagation of the total loss (step S107).
The learning device 10 also trains the discriminator D (step S108). Specifically, the learning device 10 updates the parameters of the discriminator D by backpropagation of the loss function of equation (3).
At this time, if the maximum number of training steps is greater than the current number of training steps (step S109, True), the learning device 10 returns to step S101 and repeats the process. Otherwise (step S109, False), the learning device 10 ends the process.
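Putting steps S101 to S109 together, a simplified training loop might look as follows; the objects generator, discriminator, opt_g, opt_d, and loader are placeholders introduced here, not names from the patent:

```python
def train(generator, discriminator, opt_g, opt_d, loader,
          z_dim: int = 128, max_steps: int = 100_000, lam: float = 1.0):
    step = 0
    while step < max_steps:                               # loop condition of step S109
        for real in loader:                               # step S101: read training data
            z = torch.randn(real.size(0), z_dim)          # step S102: sample z ~ N(0, I)
            fake = generator(z)

            # Steps S103-S107: generator update with the total loss of equation (2).
            loss_g = generator_loss(discriminator(fake), real, fake, lam)
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()

            # Step S108: discriminator update with the loss of equation (3).
            loss_d = discriminator_loss(discriminator(real),
                                        discriminator(fake.detach()))
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()

            step += 1
            if step >= max_steps:                         # step S109: stop at the maximum step count
                break
```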
[Effects of the first embodiment]
As described above, the conversion unit 132 converts the first data into a first frequency component, and converts the second data generated by the generator constituting the adversarial learning model into a second frequency component. The calculation unit 133 calculates the error between the first frequency component and the second frequency component. The update unit 134 updates the parameters of the generator so that the error calculated by the calculation unit 133 becomes smaller. In this way, the learning device 10 can reflect the influence of the frequency components in training. As a result, according to the present embodiment, it is possible to suppress the occurrence of overfitting and improve the accuracy of the model.
The conversion unit 132 converts the first data and the second data into frequency components using a differentiable function. For example, the conversion unit 132 converts the first data and the second data into frequency components by a discrete Fourier transform or a discrete cosine transform. This makes it possible in the present embodiment to update the parameters by backpropagation.
The calculation unit 133 calculates the error between the batch average of the plurality of first frequency components obtained by converting each of the plurality of first data, and the batch average of the plurality of second frequency components obtained by converting each of the plurality of second data. In this way, in the present embodiment, not only the frequency components of individual data but also the overall tendency of the generated frequency components can be reflected in training.
The calculation unit 133 calculates a loss function that increases as the error between the first frequency component and the second frequency component increases, and that increases as the discrimination accuracy of the discriminator constituting the adversarial learning model in distinguishing the first data from the second data decreases. The update unit 134 updates the parameters of the generator so that the loss function is optimized. As a result, in the present embodiment, the entire model can be trained efficiently.
[Experiment]
An experiment conducted by actually carrying out the above embodiment will be described. The experimental settings are as follows.
・Experimental settings
  Dataset: CIFAR-100 (image dataset, 100 classes)
  Training dataset: 50,000 images
  Neural network architecture: Resnet-SNGAN (Reference 8: Miyato, Takeru, et al. "Spectral normalization for generative adversarial networks." arXiv preprint arXiv:1802.05957 (ICLR 2018).)
・Experimental procedure
  (1) Train for 100,000 iterations using the training data
  (2) Measure generation quality (FID) every 1,000 iterations (Reference 9: Heusel, Martin, et al. "Gans trained by a two time-scale update rule converge to a local nash equilibrium." Advances in Neural Information Processing Systems. 2017. (NIPS 2017))
  (3) Use the model with the best FID score as the final trained model
  (4) Run 10 trials in total and compute the mean and standard deviation of the FID
・Experimental patterns
  SNGAN: baseline (ordinary GAN) (Reference 8)
  CVPR20: an existing method that minimizes the frequency components of generated images (uses 1D DFT and binary cross-entropy) (Reference 7)
  FreqMSE: frequency-component matching loss (uses 2D DCT and mean squared error)
  SSD2GAN: simultaneous learning in the spatial and frequency domains (2D DCT)
  SSD2GAN + Tradeoff: introduces a trade-off coefficient α (α = 0.8)
  SSD2GAN + SSCR: introduces a consistency loss between D_s and D_f (λ = 0.001)
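For reference, the FID used as the quality metric is defined in Reference 9 as the Fréchet distance between Gaussian fits of the Inception features of real and generated samples (lower is better):

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$

where (μ_r, Σ_r) and (μ_g, Σ_g) are the feature mean and covariance for real and generated data, respectively.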
FreqMSE corresponds to the first embodiment. SSD2GAN is another method that improves the accuracy of the model while taking the influence of frequency components into account, using an approach different from that of the first embodiment.
FIGS. 5, 6, and 7 are diagrams showing the results of the experiment. As shown in FIG. 5, with FreqMSE and SSD2GAN + Tradeoff + SSCR, the FID of the generator G is smaller, so it can be said that generation quality improved.
Further, as shown in FIG. 6, overfitting is suppressed by every method except SNGAN. With SNGAN, overfitting occurs after 40,000 iterations, and the FID continues to deteriorate.
As shown in FIG. 7, regarding the frequency-component transform functions, FreqMSE and SSD2GAN show the effect of suppressing high-frequency components in the generated samples that are not present in the real data.
[System configuration, etc.]
Each component of each illustrated device is functional and conceptual, and does not necessarily have to be physically configured as shown in the figures. That is, the specific form of distribution and integration of each device is not limited to the illustrated one; all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or an arbitrary part of each processing function performed by each device can be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU, or as hardware using wired logic. The program may be executed not only by a CPU but also by another processor such as a GPU.
Of the processes described in the present embodiment, all or part of a process described as being performed automatically can also be performed manually, and all or part of a process described as being performed manually can also be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above documents and drawings can be changed arbitrarily unless otherwise specified.
[Program]
As one embodiment, the learning device 10 can be implemented by installing, on a desired computer, a learning program that executes the above training process as packaged software or online software. For example, by causing an information processing device to execute the above learning program, the information processing device can be made to function as the learning device 10. The information processing device referred to here includes desktop and notebook personal computers. In addition, the information processing device includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) devices, as well as slate terminals such as PDAs (Personal Digital Assistants).
The learning device 10 can also be implemented as a learning server device that treats a terminal device used by a user as a client and provides the client with services related to the above training process. For example, the learning server device is implemented as a server device that provides a learning service that takes training data as input and outputs information on the trained model. In this case, the learning server device may be implemented as a Web server, or may be implemented as a cloud that provides services related to the training process on an outsourcing basis.
 図8は、学習プログラムを実行するコンピュータの一例を示す図である。コンピュータ1000は、例えば、メモリ1010、CPU1020を有する。また、コンピュータ1000は、ハードディスクドライブインタフェース1030、ディスクドライブインタフェース1040、シリアルポートインタフェース1050、ビデオアダプタ1060、ネットワークインタフェース1070を有する。これらの各部は、バス1080によって接続される。 FIG. 8 is a diagram showing an example of a computer that executes a learning program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. Each of these parts is connected by a bus 1080.
 メモリ1010は、ROM(Read Only Memory)1011及びRAM(Random Access Memory)1012を含む。ROM1011は、例えば、BIOS(BASIC Input Output System)等のブートプログラムを記憶する。ハードディスクドライブインタフェース1030は、ハードディスクドライブ1090に接続される。ディスクドライブインタフェース1040は、ディスクドライブ1100に接続される。例えば磁気ディスクや光ディスク等の着脱可能な記憶媒体が、ディスクドライブ1100に挿入される。シリアルポートインタフェース1050は、例えばマウス1110、キーボード1120に接続される。ビデオアダプタ1060は、例えばディスプレイ1130に接続される。 The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (BASIC Input Output System). The hard disk drive interface 1030 is connected to the hard disk drive 1090. The disk drive interface 1040 is connected to the disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, the display 1130.
 ハードディスクドライブ1090は、例えば、OS1091、アプリケーションプログラム1092、プログラムモジュール1093、プログラムデータ1094を記憶する。すなわち、学習装置10の各処理を規定するプログラムは、コンピュータにより実行可能なコードが記述されたプログラムモジュール1093として実装される。プログラムモジュール1093は、例えばハードディスクドライブ1090に記憶される。例えば、学習装置10における機能構成と同様の処理を実行するためのプログラムモジュール1093が、ハードディスクドライブ1090に記憶される。なお、ハードディスクドライブ1090は、SSD(Solid State Drive)により代替されてもよい。 The hard disk drive 1090 stores, for example, the OS 1091, the application program 1092, the program module 1093, and the program data 1094. That is, the program that defines each process of the learning device 10 is implemented as a program module 1093 in which a code that can be executed by a computer is described. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing the same processing as the functional configuration in the learning device 10 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
 また、上述した実施形態の処理で用いられる設定データは、プログラムデータ1094として、例えばメモリ1010やハードディスクドライブ1090に記憶される。そして、CPU1020は、メモリ1010やハードディスクドライブ1090に記憶されたプログラムモジュール1093やプログラムデータ1094を必要に応じてRAM1012に読み出して、上述した実施形態の処理を実行する。 Further, the setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, a memory 1010 or a hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 into the RAM 1012 as needed, and executes the process of the above-described embodiment.
 Note that the program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; they may be stored, for example, in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)). The program module 1093 and the program data 1094 may then be read from the other computer by the CPU 1020 via the network interface 1070.
 10 Learning device
 11 Input/output unit
 12 Storage unit
 121 Model information
 13 Control unit
 131 Generation unit
 132 Conversion unit
 133 Calculation unit
 134 Update unit

Claims (7)

  1.  A learning device comprising:
     a conversion unit that converts first data into a first frequency component and converts second data, generated by a generator constituting an adversarial learning model, into a second frequency component;
     a calculation unit that calculates an error between the first frequency component and the second frequency component; and
     an update unit that updates parameters of the generator so that the error calculated by the calculation unit becomes smaller.
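 The following is a minimal, non-authoritative sketch of the arrangement in claim 1, assuming a PyTorch-style generator and image-shaped data; the helper names to_frequency and frequency_error, and the choice of a 2-D discrete Fourier transform magnitude, are illustrative assumptions rather than elements recited in the claim.

    # Minimal sketch of claim 1 (illustrative; assumes PyTorch and image-shaped tensors).
    import torch
    import torch.nn.functional as F

    def to_frequency(x: torch.Tensor) -> torch.Tensor:
        # Conversion unit: map data to frequency components (here, a 2-D DFT magnitude).
        return torch.abs(torch.fft.fft2(x))

    def frequency_error(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
        # Calculation unit: error between the first and second frequency components.
        return F.mse_loss(to_frequency(fake), to_frequency(real))

    def update_generator(generator, optimizer_g, real, noise):
        # Update unit: adjust the generator parameters so the calculated error becomes smaller.
        fake = generator(noise)               # second data, produced by the generator
        error = frequency_error(real, fake)
        optimizer_g.zero_grad()
        error.backward()
        optimizer_g.step()
        return error.item()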
  2.  The learning device according to claim 1, wherein the conversion unit converts the first data and the second data into frequency components using a differentiable function.
  3.  The learning device according to claim 2, wherein the conversion unit converts the first data and the second data into frequency components by a discrete Fourier transform or a discrete cosine transform.
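 As a brief check of claims 2 and 3, the snippet below confirms that a discrete Fourier transform taken with torch.fft is differentiable, so the gradient of a frequency-domain error can reach the generator; a discrete cosine transform would need its own differentiable implementation, which is only noted as an assumption here.

    # Sketch for claims 2 and 3: the frequency transform must admit gradients.
    import torch

    x = torch.randn(4, 1, 32, 32, requires_grad=True)   # stand-in for generated data
    spectrum = torch.abs(torch.fft.fft2(x))             # differentiable DFT magnitude
    spectrum.mean().backward()                          # gradients flow back through the transform
    print(x.grad is not None)                           # True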
  4.  The learning device according to claim 1, wherein the calculation unit calculates an error between a batch average of a plurality of first frequency components obtained by converting each of a plurality of pieces of the first data, and a batch average of a plurality of second frequency components obtained by converting each of a plurality of pieces of the second data.
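 One possible reading of claim 4 is sketched below: every sample in the real batch and in the generated batch is transformed, the resulting frequency components are averaged over the batch dimension, and the error is taken between the two batch averages. The tensor shapes and the mean-squared-error criterion are assumptions made for illustration.

    # Sketch for claim 4: error between batch averages of frequency components.
    import torch
    import torch.nn.functional as F

    def batch_averaged_frequency_error(real_batch: torch.Tensor,
                                       fake_batch: torch.Tensor) -> torch.Tensor:
        real_freq = torch.abs(torch.fft.fft2(real_batch))   # (B, C, H, W) spectra
        fake_freq = torch.abs(torch.fft.fft2(fake_batch))
        real_mean = real_freq.mean(dim=0)                   # batch average of the first components
        fake_mean = fake_freq.mean(dim=0)                   # batch average of the second components
        return F.mse_loss(fake_mean, real_mean)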
  5.  The learning device according to claim 1, wherein the calculation unit calculates a loss function that becomes larger as the error between the first frequency component and the second frequency component becomes larger, and becomes larger as the accuracy with which a discriminator constituting the adversarial learning model discriminates between the first data and the second data becomes lower, and
     the update unit updates the parameters of the generator so that the loss function is optimized.
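 Claim 5 does not spell out a concrete formula, so the sketch below only shows one conventional way of combining an adversarial term with the frequency-domain error when training the generator: a non-saturating GAN generator loss plus a weighted frequency error. The weight lambda_freq and the use of binary cross-entropy are assumptions, not the claimed loss itself.

    # Illustrative combined generator objective (one possible instantiation).
    import torch
    import torch.nn.functional as F

    lambda_freq = 1.0  # assumed weighting between the adversarial and frequency terms

    def generator_loss(discriminator, real, fake):
        # Adversarial term: small when the discriminator is fooled by the generated data.
        logits_fake = discriminator(fake)
        adv = F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))
        # Frequency term: error between the frequency components of real and generated data.
        freq = F.mse_loss(torch.abs(torch.fft.fft2(fake)), torch.abs(torch.fft.fft2(real)))
        return adv + lambda_freq * freq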
  6.  A learning method executed by a learning device, the method comprising:
     a conversion step of converting first data into a first frequency component and converting second data, generated by a generator constituting an adversarial learning model, into a second frequency component;
     a calculation step of calculating an error between the first frequency component and the second frequency component; and
     an update step of updating parameters of the generator so that the error calculated in the calculation step becomes smaller.
  7.  A learning program for causing a computer to function as the learning device according to any one of claims 1 to 5.
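 To show how the conversion, calculation, and update steps of the method in claim 6 could sit inside an ordinary adversarial training loop, a self-contained sketch follows; the tiny fully connected networks, the random stand-in data, the optimizers, and the weighting are all illustrative assumptions.

    # Self-contained sketch of the training method (illustrative sizes and data).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    latent_dim, image_size, batch_size, lambda_freq = 16, 16, 8, 1.0

    generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                              nn.Linear(256, image_size * image_size), nn.Tanh())
    discriminator = nn.Sequential(nn.Linear(image_size * image_size, 256), nn.LeakyReLU(0.2),
                                  nn.Linear(256, 1))
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

    def spectrum(x):
        # Conversion step: differentiable 2-D DFT magnitude of image-shaped data.
        return torch.abs(torch.fft.fft2(x.view(-1, image_size, image_size)))

    for step in range(100):
        real = torch.rand(batch_size, image_size * image_size) * 2 - 1   # stand-in training data
        noise = torch.randn(batch_size, latent_dim)

        # Discriminator update (standard GAN step, shown only for completeness).
        fake = generator(noise).detach()
        d_loss = (F.binary_cross_entropy_with_logits(discriminator(real), torch.ones(batch_size, 1))
                  + F.binary_cross_entropy_with_logits(discriminator(fake), torch.zeros(batch_size, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator update: conversion, calculation, and update steps of the method.
        fake = generator(noise)
        freq_error = F.mse_loss(spectrum(fake), spectrum(real))          # calculation step
        adv = F.binary_cross_entropy_with_logits(discriminator(fake), torch.ones(batch_size, 1))
        g_loss = adv + lambda_freq * freq_error
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()                                                     # update step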
PCT/JP2020/037256 2020-09-30 2020-09-30 Learning device, learning method, and learning program WO2022070342A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/037256 WO2022070342A1 (en) 2020-09-30 2020-09-30 Learning device, learning method, and learning program
JP2022553336A JPWO2022070342A1 (en) 2020-09-30 2020-09-30

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/037256 WO2022070342A1 (en) 2020-09-30 2020-09-30 Learning device, learning method, and learning program

Publications (1)

Publication Number Publication Date
WO2022070342A1 true WO2022070342A1 (en) 2022-04-07

Family

ID=80950008

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/037256 WO2022070342A1 (en) 2020-09-30 2020-09-30 Learning device, learning method, and learning program

Country Status (2)

Country Link
JP (1) JPWO2022070342A1 (en)
WO (1) WO2022070342A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020087103A (en) * 2018-11-28 2020-06-04 株式会社ツバサファクトリー Learning method, computer program, classifier, and generator

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020087103A (en) * 2018-11-28 2020-06-04 株式会社ツバサファクトリー Learning method, computer program, classifier, and generator

Also Published As

Publication number Publication date
JPWO2022070342A1 (en) 2022-04-07

Similar Documents

Publication Publication Date Title
WO2020167490A1 (en) Incremental training of machine learning tools
US20230196202A1 (en) System and method for automatic building of learning machines using learning machines
JP6870508B2 (en) Learning programs, learning methods and learning devices
JP6992709B2 (en) Mask estimation device, mask estimation method and mask estimation program
US20220414490A1 (en) Storage medium, machine learning method, and machine learning device
US11048852B1 (en) System, method and computer program product for automatic generation of sizing constraints by reusing existing electronic designs
Sun et al. Sparse deep learning: A new framework immune to local traps and miscalibration
US20240119266A1 (en) Method for Constructing AI Integrated Model, and AI Integrated Model Inference Method and Apparatus
JP2024051136A (en) Learning device, learning method, learning program, estimation device, estimation method, and estimation program
Liu et al. Gradient‐Sensitive Optimization for Convolutional Neural Networks
WO2022070342A1 (en) Learning device, learning method, and learning program
WO2020170803A1 (en) Augmentation device, augmentation method, and augmentation program
CN110489435B (en) Data processing method and device based on artificial intelligence and electronic equipment
WO2022070343A1 (en) Learning device, learning method, and learning program
JP2020134567A (en) Signal processing device, signal processing method and signal processing program
WO2022249418A1 (en) Learning device, learning method, and learning program
US20240220814A1 (en) Training device, training method and training program
Sun et al. Generalizing expectation propagation with mixtures of exponential family distributions and an application to Bayesian logistic regression
WO2019208248A1 (en) Learning device, learning method, and learning program
JP7047664B2 (en) Learning device, learning method and prediction system
JP7099254B2 (en) Learning methods, learning programs and learning devices
JP7077746B2 (en) Learning equipment, learning methods and learning programs
WO2023067666A1 (en) Calculation device, calculation method, and calculation program
CN114970431B (en) Training method and device for MOS tube parameter estimation model
Jiang et al. Renewable Huber estimation method for streaming datasets

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20956270

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022553336

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20956270

Country of ref document: EP

Kind code of ref document: A1