JP2023176750A

JP2023176750A - Learning apparatus, learning method, and learning program

Info

Publication number: JP2023176750A
Application number: JP2022089193A
Authority: JP
Inventors: 知克高橋; Tomokatsu Takahashi; 侑雅阿蘇品; Yuga Asoshina
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2022-05-31
Filing date: 2022-05-31
Publication date: 2023-12-13

Abstract

PROBLEM TO BE SOLVED: To provide a learning apparatus, a learning method, and a learning program that make a VAE (Variational AutoEncoder), composed of two models, an encoder and a decoder, robust against adversarial input samples. SOLUTION: A learning apparatus 10 includes: an estimation unit 15 which performs estimation using a VAE 151 having an encoder which converts an input image into a low-dimensional latent variable and a decoder which reconstructs the input image from the latent variable and outputs it as an output image; a selection unit 12 which selects an original image from an image dataset 11, which is a group of images each with a class label attached, and randomly selects an arbitrary target image from the image dataset 11; a noise generation unit 13 which generates noise from the target image; a processing unit 14 which generates a processed image by adding the noise to the original image; and a learning unit 16 which executes learning of the VAE 151 using the class label of the original image as a training label and the processed image as learning data. SELECTED DRAWING: Figure 3

Description

The present invention relates to a learning device, a learning method, and a learning program.

An adversarial example is an adversarial input sample created by adding to data a small, deliberately crafted perturbation that humans cannot perceive, with the aim of disrupting the output of a deep learning model. Adversarial examples were initially discussed mainly in the field of image classification with deep learning, but their existence has recently been pointed out in a variety of fields.

A VAE (Variational AutoEncoder) is a deep learning architecture consisting of two models, an encoder and a decoder. A VAE learns the generative distribution of the data by training so that the encoder maps input data to a low-dimensional latent variable and the decoder then restores the input data from that latent variable. For this reason, VAEs are used for anomaly detection, in which the generative distribution of normal data is learned and data that deviate from it are identified. Furthermore, since the latent variables produced by a VAE contain features that are important for restoring the data, these features are used in various machine learning tasks.

[Non-Patent Document 1] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy, “EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES”, [online], [retrieved February 16, 2022], Internet <URL: https://arxiv.org/abs/1412.6572>
[Non-Patent Document 2] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu, “Towards Deep Learning Models Resistant to Adversarial Attacks”, [online], [retrieved February 16, 2022], Internet <URL: https://arxiv.org/abs/1706.06083>
[Non-Patent Document 3] Diederik P. Kingma and Max Welling, “Auto-Encoding Variational Bayes”, [online], [retrieved February 16, 2022], Internet <URL: https://arxiv.org/abs/1312.6114>
[Non-Patent Document 4] Jernej Kos, Ian Fischer, and Dawn Song, “ADVERSARIAL EXAMPLES FOR GENERATIVE MODELS”, [online], [retrieved February 16, 2022], Internet <URL: https://arxiv.org/abs/1702.06832>

However, adversarial examples have been shown to pose a threat to VAEs. An adversarial example against a VAE is an attack that adds a perturbation (noise) to the input data so that the latent variable, and the data reconstructed by the decoder, become those of arbitrary other data. VAEs are vulnerable to adversarial examples that add small, deliberately crafted noise to the input data, and their output is disrupted as a result. Specifically, this attack can deceive anomaly detectors that use a VAE, and it threatens the reliability of the various downstream tasks that use features extracted by a VAE.

The present invention has been made in view of the above, and an object thereof is to provide a learning device, a learning method, and a learning program that realize a VAE that is robust against adversarial examples.

In order to solve the above problem and achieve the object, a learning device is characterized by including: an estimation unit that performs estimation using a VAE having an encoder that converts an input image into a low-dimensional latent variable and a decoder that reconstructs the input image from the latent variable and outputs it as an output image; a selection unit that selects an original image from a group of images each with a class label attached and randomly selects an arbitrary target image from the group of images; a generation unit that generates noise from the target image; a processing unit that generates a processed image by adding the noise to the original image; and a learning unit that executes learning of the VAE using the class label of the original image as a training label and the processed image as learning data.

According to the present invention, a VAE that is robust against adversarial examples is realized.

FIG. 1 is a diagram illustrating an overview of a VAE.
FIG. 2 is a diagram illustrating an adversarial example against a VAE.
FIG. 3 is a diagram schematically showing an example of the configuration of the learning device according to Embodiment 1.
FIG. 4 is a flowchart showing the processing procedure of the learning method according to Embodiment 1.
FIG. 5 is a visualization of the latent variable spaces of a normal VAE model and of a VAE model that has undergone adversarial training using the learning method according to Embodiment 1.
FIG. 6 is a diagram schematically showing an example of the configuration of the learning device according to Embodiment 2.
FIG. 7 is a flowchart showing the processing procedure of the learning method according to Embodiment 2.
FIG. 8 is a visualization of the latent variable spaces of a normal VAE model and of VAE models that have undergone adversarial training using the learning methods according to Embodiments 1 and 2.
FIG. 9 is a diagram showing the application of a VAE trained using the learning methods according to Embodiments 1 and 2 to a sign classification system.
FIG. 10 is a diagram showing an example of a computer that executes a program.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. The present invention is not limited by this embodiment. In the drawings, identical parts are denoted by identical reference numerals. In the following, for a vector or matrix A, the notation "^A" denotes the symbol in which "^" is written directly above "A".

[Embodiment 1]
This embodiment proposes a learning method that optimizes the application of adversarial training (Non-Patent Document 2) to VAEs, as a countermeasure against adversarial examples that cause a VAE to make erroneous judgments by adding noise to data. Adversarial training is a method of training a model using adversarial examples as training data.

[Symbols]
Table 1 shows the details of each symbol used in the embodiments.

[VAE]
The VAE will now be described. FIG. 1 is a diagram illustrating an overview of a VAE. As shown in FIG. 1, a VAE is composed of two models, an encoder and a decoder. The encoder compresses the input image x into a low-dimensional latent variable z, and the decoder reconstructs the input image x from the latent variable z. The decoder outputs the reconstructed image as the output image ^x.

In this way, the VAE learns the generative distribution p(x) of the data. Equation (1) shows the conventional objective function used for VAE training. Conventionally, the model parameters of the encoder and the decoder have been optimized using equation (1).
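Equation (1) is referenced here without being reproduced. Assuming it is the standard VAE objective of Non-Patent Document 3, with encoder q_φ(z|x), decoder p_θ(x|z), and prior p_θ(z), it would take the following form, which is maximized with respect to θ and φ. This is a sketch under that assumption, not a transcription of the patent's equation.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p_\theta(z)\bigr)
```

The first term rewards accurate reconstruction of x, and the second keeps the encoder output close to the prior, which is a standard Gaussian in the usual setting.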

[Adversarial example against a VAE]
An adversarial example against a VAE is an attack that adds deliberately crafted noise to the input image x shown in FIG. 1, thereby changing the latent variable and the output image into those of an arbitrary target image. FIG. 2 is a diagram illustrating an adversarial example against a VAE.

As shown in FIG. 2, this attack is carried out by generating noise (an adversarial perturbation) d that brings the latent variable of the original image x, which is the input image, closer to the latent variable of the target image x^t in terms of the KL (Kullback-Leibler) divergence.

An outline of this attack is shown in equation (2).
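Equation (2) is referenced here without being reproduced. Based on the description (noise d that brings the latent variable of x+d close to that of x^t under the KL divergence), one plausible form is the following. The direction of the KL divergence, the norm constraint, and the symbols ε and p are assumptions added for illustration and are not taken from the patent.

```latex
d^{*} = \operatorname*{arg\,min}_{d}\;
  D_{\mathrm{KL}}\bigl(q_\phi(z \mid x + d) \,\|\, q_\phi(z \mid x^{t})\bigr)
  \quad \text{subject to} \quad \lVert d \rVert_{p} \le \epsilon
```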

In this attack, the image x^adv obtained by adding the noise d to the original image x is then used as the input to the VAE. In other words, instead of the original image x, whose latent variable differs greatly from that of the target image x^t, the attack inputs to the VAE the image x^adv, whose latent variable differs only slightly from that of the target image x^t, so that the VAE outputs an image x^{adv_rec} close to the target image x^t. This attack causes false detections in anomaly detectors that use a VAE, and it threatens the reliability of the various downstream tasks that use features extracted by a VAE.

[Learning device]
Next, the learning device according to Embodiment 1 will be described. The learning device according to Embodiment 1 realizes a VAE that is robust against adversarial examples, which cause a VAE to make erroneous judgments by adding noise to data, by executing a learning method that optimizes the application of adversarial training to the VAE.

FIG. 3 is a diagram schematically showing an example of the configuration of the learning device according to Embodiment 1. The learning device 10 according to Embodiment 1 is realized, for example, by loading a predetermined program into a computer or the like that includes a ROM (Read Only Memory), a RAM (Random Access Memory), a CPU (Central Processing Unit), and so on, and having the CPU execute the program. The learning device 10 also has a communication interface for transmitting and receiving various kinds of information to and from other devices connected via a network or the like.

The learning device 10 according to Embodiment 1 includes an image dataset 11, a selection unit 12, a noise generation unit 13, a processing unit 14, an estimation unit 15, and a learning unit 16.

The estimation unit 15 uses the VAE 151 to estimate predetermined information based on the input image x. As shown in FIG. 1, the VAE 151 has an encoder that converts the input image x into a low-dimensional latent variable z, and a decoder that reconstructs the input image from the latent variable z and outputs the reconstructed image as the output image ^x. The encoder model parameters φ and the decoder model parameters θ of the VAE 151 are optimized to be robust against adversarial examples through the training executed by the learning unit 16 (described later).

The image dataset 11 is a group of images each with a class label attached. The image dataset 11 is, for example, the MNIST (Modified National Institute of Standards and Technology) dataset, in which images of the handwritten digits "0" to "9" are paired with labels ([online], [retrieved February 16, 2022], Internet <URL: https://arxiv.org/abs/1706.06083>).

The selection unit 12 selects an original image x and an arbitrary target image x^t from the image dataset 11. The selection unit 12 selects the target image x^t at random. For example, the selection unit 12 selects an image of the handwritten digit "3" as the original image x, and randomly selects an image of any handwritten digit other than "3" as the target image x^t. The original image x is the image that the VAE 151 is intended to reconstruct. The target image x^t is the target of the adversarial example, that is, the image toward which the adversarial example tries to move the latent variable and the output image of the VAE 151, away from those of the original image.
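The selection step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_pair` and the dataset layout (a list of `(image, class_label)` pairs) are hypothetical, and restricting the target to a different class follows the "3" example above.

```python
import random


def select_pair(dataset, rng):
    """Pick an original sample, then a random target from a different class.

    dataset: list of (image, class_label) pairs (hypothetical layout).
    rng:     a random.Random instance, passed in for reproducibility.
    """
    original = rng.choice(dataset)
    # Candidate targets are all samples whose label differs from the original's.
    candidates = [sample for sample in dataset if sample[1] != original[1]]
    target = rng.choice(candidates)
    return original, target
```

Calling `select_pair(dataset, random.Random(0))` returns two `(image, label)` pairs whose labels differ, provided the dataset contains at least two classes.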

The noise generation unit 13 generates noise d from the original image x and the target image x^t. The noise generation unit 13 generates the noise by manipulating the pixels of the input image x so that the latent variable z of the input image x approaches the latent variable z obtained when the target image x^t is input. Specifically, the noise generation unit 13 generates the noise d by optimizing equation (2).
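The idea of moving the latent variable of x+d toward that of x^t can be illustrated with a toy example. The sketch below is not the patent's procedure: it assumes a hypothetical linear encoder whose latent mean is `W x` with fixed unit variance (so the KL divergence reduces to half the squared distance between latent means), and it uses projected gradient descent with a hypothetical per-pixel bound `eps` and step size `lr`.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def kl_same_var(mu_a, mu_b, var=1.0):
    """KL divergence between two Gaussians sharing the isotropic variance `var`."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(mu_a, mu_b)) / var


def generate_noise(W, x, x_t, eps=0.1, lr=0.05, steps=100):
    """Projected gradient descent on d for a toy linear encoder mu(x) = W x."""
    d = [0.0] * len(x)
    mu_t = matvec(W, x_t)  # latent mean of the target image
    for _ in range(steps):
        mu_adv = matvec(W, [a + b for a, b in zip(x, d)])
        diff = [a - b for a, b in zip(mu_adv, mu_t)]
        # With unit variance, the gradient of the KL term w.r.t. d is W^T (mu_adv - mu_t).
        grad = [sum(W[i][j] * diff[i] for i in range(len(W))) for j in range(len(x))]
        # Gradient step followed by clipping each component to [-eps, eps].
        d = [max(-eps, min(eps, dj - lr * gj)) for dj, gj in zip(d, grad)]
    return d
```

A real VAE encoder is nonlinear and also outputs a variance, so in practice the gradient would come from automatic differentiation rather than the closed form used here.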

The processing unit 14 generates a processed image x+d by adding the noise d to the original image x. The processed image x+d serves as an adversarial example for training the VAE 151.

The learning unit 16 trains the VAE 151 using the class label of the original image x as the training label and the processed image x+d generated by the processing unit 14 as training data. That is, the learning device 10 generates adversarial examples and trains the VAE 151 using them as training data. The learning unit 16 trains the VAE so that, even when the processed image x+d is input to the VAE 151, the VAE 151 outputs an image close to the original image x. The learning unit 16 includes a parameter update unit 161 and a termination determination unit 162.

The parameter update unit 161 trains the VAE 151 using the class label of the original image x as the training label and the processed image x+d as training data. The parameter update unit 161 updates the encoder model parameters φ and the decoder model parameters θ so that the VAE receives the processed image x+d as input and outputs the original image x. The parameter update unit 161 optimizes the encoder model parameters φ and the decoder model parameters θ using equation (3). Here, d is generated for a randomly selected target image x^t.
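Equation (3) is referenced here without being reproduced. Given the description (the encoder receives x+d, and the decoder is trained to output x), one plausible reading is the standard VAE objective of Non-Patent Document 3 with the encoder conditioned on the processed image. The following is a sketch under that assumption, not the patent's own equation.

```latex
\max_{\theta, \phi}\;
  \mathbb{E}_{q_\phi(z \mid x + d)}\bigl[\log p_\theta(x \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x + d) \,\|\, p_\theta(z)\bigr)
```

Compared with the usual objective, only the encoder input changes from x to x+d, while the reconstruction target remains the clean original x.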

The termination determination unit 162 ends the training of the VAE 151 when a predetermined termination condition is satisfied. Examples of the termination condition are that the loss has fallen to or below a predetermined threshold, that the number of parameter updates has reached a predetermined count, or that the amount of parameter update has fallen to or below a predetermined threshold.
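The three example conditions can be combined into a single check, as in the sketch below. The function name `should_stop` and all default threshold values are hypothetical and chosen only for illustration.

```python
def should_stop(loss, n_updates, update_norm,
                loss_tol=1e-3, max_updates=10_000, update_tol=1e-6):
    """Return True if any of the three example termination conditions holds.

    loss:        current training loss
    n_updates:   number of parameter updates performed so far
    update_norm: magnitude of the most recent parameter update
    """
    return (
        loss <= loss_tol              # loss at or below its threshold
        or n_updates >= max_updates   # update count reached the limit
        or update_norm <= update_tol  # update amount at or below its threshold
    )
```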

[Learning process]
Next, the processing procedure of the learning method executed by the learning device 10 will be described. FIG. 4 is a flowchart showing the processing procedure of the learning method according to Embodiment 1.

As shown in FIG. 4, in the learning device 10, the selection unit 12 selects an original image x and an arbitrary target image x^t from the image dataset 11 (step S1). The noise generation unit 13 generates noise d from the original image x and the target image x^t (step S2). The processing unit 14 generates a processed image x+d by adding the noise d to the original image x (step S3).

The learning unit 16 inputs the processed image x+d to the VAE 151 and causes the estimation unit 15 to estimate the reconstructed image (step S4). The learning unit 16 updates the encoder model parameters φ and the decoder model parameters θ using equation (3) (step S5).

The learning unit 16 then determines whether a predetermined termination condition is satisfied (step S6). If the termination condition is not satisfied (step S6: No), the learning device 10 returns to step S1. The learning device 10 trains the VAE 151 by repeating steps S1 to S5 until the termination condition is satisfied. If the termination condition is satisfied (step S6: Yes), the learning device 10 ends the training of the VAE 151.
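The control flow of steps S1 to S6 can be sketched as a loop over injected callables. Everything named here (`train`, `select`, `make_noise`, `vae_step`, `should_stop`, `max_iters`) is hypothetical; the actual selection, noise generation, and parameter update are left abstract so that only the ordering of the steps is shown.

```python
def train(select, make_noise, vae_step, should_stop, max_iters=1000):
    """Run the S1-S6 loop and return the list of per-iteration losses.

    select():           returns ((x, label), (x_t, target_label))        (S1)
    make_noise(x, x_t): returns the noise d as a list                    (S2)
    vae_step(x_adv, x): reconstructs from x_adv, updates the model to
                        output x, and returns the loss                   (S4, S5)
    should_stop(loss, it): returns True when training should end         (S6)
    """
    history = []
    for it in range(1, max_iters + 1):
        (x, _label), (x_t, _target_label) = select()   # S1
        d = make_noise(x, x_t)                         # S2
        x_adv = [a + b for a, b in zip(x, d)]          # S3: processed image x + d
        loss = vae_step(x_adv, x)                      # S4 and S5
        history.append(loss)
        if should_stop(loss, it):                      # S6
            break
    return history
```

Because a new pair is selected at the top of every iteration, each update sees noise generated for a freshly chosen random target, which matches the return to step S1 in FIG. 4.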

[Effects of Embodiment 1]
In Embodiment 1, adversarial examples suited to adversarial training of a VAE are generated and used to train the VAE. Adversarial training is a method of training a model using adversarial examples as training data.

In general, the adversarial examples used in conventional adversarial training have been created by non-targeted attacks. A non-targeted attack creates an adversarial example so that the value of the model's error function becomes worse. That is, conventional adversarial training generally adds to the input data noise that worsens the model's error function.

In contrast, an adversarial example against a VAE has a target image, and the attack makes the VAE reconstruct that target image. In other words, an adversarial example against a VAE is an attack that adds noise to the input image so as to bring the original input image closer to a certain target image, thereby changing the latent variable and the output image of the VAE into those of the target image. For this reason, adversarial examples against a VAE differ greatly from the adversarial examples created by conventional non-targeted attacks, and applying conventional adversarial training as is would result in a mismatch in the problem setting.

Therefore, in Embodiment 1, a target image is randomly selected each time, and the input image is processed by adding noise generated from the target image so that the input image approaches the selected target image. In this way, Embodiment 1 creates adversarial examples suited to a VAE and realizes adversarial training of the VAE 151 against adversarial examples.

Furthermore, in Embodiment 1, the adversarial examples used for adversarial training are generated by randomly selecting a target image each time. As a result, in Embodiment 1, the VAE is trained with a wide range of noise generated to match the variations of the attack, so that it can cope with a wide range of attack variations.

Until now, the threat of adversarial examples to VAEs has been known, but there has been no method of defending against it. In Embodiment 1, adversarial training can be applied to a VAE, making it possible to realize a VAE that is robust against adversarial examples.

[Embodiment 2]
FIG. 5 is a visualization of the latent variable spaces of a normal VAE model and of a VAE model that has undergone adversarial training using the learning method according to Embodiment 1. FIG. 5 shows the latent variable space of each VAE reduced to two dimensions, with the data distinguished by a different marker for each class label. As FIG. 5 shows, in the latent variable space the VAE model that has undergone adversarial training has, compared with the conventional normal model, learned to group data with the same tendency together and to separate them from data with different tendencies. Focusing on this, Embodiment 2 trains the VAE so that, based on the latent variables produced by the VAE, the distances in the latent variable space between different classes are increased further.

[Learning device]
FIG. 6 is a diagram schematically showing an example of the configuration of the learning device according to Embodiment 2. Compared with the learning device 10 shown in FIG. 3, the learning device 210 according to Embodiment 2 further includes a modification unit 212. The learning device 210 also includes a learning unit 216 in place of the learning unit 16 shown in FIG. 3.

In order to train the VAE 151 so that the distances in the latent variable space between different classes are increased further, the modification unit 212 modifies the objective function of equation (3) so that the latent variable space approaches a probability distribution p'(z), as in equation (4).

The modification unit 212 first acquires the latent variable vectors output by a normal VAE that has been trained using equation (1), and obtains the vectors averaged for each class. The VAE from which the latent variable vectors are acquired may be the VAE 151 of the estimation unit 15 or a VAE trained on another device, as long as it is a trained VAE that has learned with the prior distribution p_θ(z) of the latent variable set to a standard Gaussian distribution.

Based on the acquired latent variable vectors, the modification unit 212 calculates the mean latent variable vector for each class. As in equation (4), the modification unit 212 sets p'(z) for each class to a Gaussian distribution with variance 1 whose mean is that class's mean vector multiplied by α. In other words, p'(z) is a different probability distribution for each class. Unless otherwise specified, α = 2.
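The class-wise prior can be sketched in a few lines. This is an illustrative sketch, not the patent's implementation: the function names and data layout are hypothetical, and the KL term assumes the encoder outputs a diagonal Gaussian parameterized by a mean and a log-variance.

```python
import math


def class_means(latents, labels):
    """Average the latent vectors of a trained normal VAE separately for each class."""
    sums, counts = {}, {}
    for z, c in zip(latents, labels):
        if c not in sums:
            sums[c] = [0.0] * len(z)
            counts[c] = 0
        sums[c] = [s + v for s, v in zip(sums[c], z)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def shifted_prior_means(means, alpha=2.0):
    """Scale each class mean by alpha to obtain the mean of that class's p'(z)."""
    return {c: [alpha * v for v in m] for c, m in means.items()}


def kl_to_unit_var_gaussian(mu, log_var, prior_mean):
    """KL( N(mu, diag(exp(log_var))) || N(prior_mean, I) ), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + (m - pm) ** 2 - 1.0 - lv
        for m, lv, pm in zip(mu, log_var, prior_mean)
    )
```

Scaling the class means by α > 1 moves the per-class prior means further from the origin and therefore further from one another, which is what pushes the classes apart in the latent space.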

In this way, the modification unit 212 sets p'(z) for each class. That is, the modification unit 212 modifies equation (3) into an objective function (equation (4)) in which the mean of the prior distribution p_θ(z) of the latent variable is changed for each image class.
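Equation (4) is referenced here without being reproduced. Following the description, one plausible form replaces the standard Gaussian prior in the adversarial-training objective with a class-dependent Gaussian. The symbols c (the class of the original image x) and \bar{z}_c (the mean latent vector of class c obtained from the normally trained VAE) are hypothetical notation introduced for this sketch.

```latex
p'(z \mid c) = \mathcal{N}\bigl(z;\ \alpha\,\bar{z}_{c},\ I\bigr), \qquad
\max_{\theta, \phi}\;
  \mathbb{E}_{q_\phi(z \mid x + d)}\bigl[\log p_\theta(x \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x + d) \,\|\, p'(z \mid c)\bigr)
```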

In the learning unit 216, the parameter update unit 2161 updates the encoder model parameters φ and the decoder model parameters θ of the VAE 151 using equation (4) as modified by the modification unit 212.

[Learning process]
Next, the processing procedure of the learning method executed by the learning device 210 will be described. FIG. 7 is a flowchart showing the processing procedure of the learning method according to Embodiment 2.

As shown in FIG. 7, in order to train the VAE 151 so that the distances in the latent variable space between different classes are increased further, the modification unit 212 modifies the objective function as in equation (4) so that the latent variable space approaches the probability distribution p'(z) (step S11).

Steps S12 to S15 are the same as steps S1 to S4 shown in FIG. 4. The parameter update unit 2161 updates the encoder model parameters φ and the decoder model parameters θ of the VAE 151 using equation (4) as modified by the modification unit 212 (step S16). Step S17 is the same as step S6 shown in FIG. 4.

[Verification experiment]
The VAEs trained according to Embodiments 1 and 2 were verified experimentally. Table 2 shows the dataset and the attack methods used in the verification experiment.

The verification experiment examined the resistance to adversarial examples of the VAE 151 trained using the learning methods according to Embodiments 1 and 2. As described above, an adversarial example against a VAE causes the reconstructed image to be changed into an arbitrary image different from the original image.

For a normal VAE trained using equation (1), a VAE trained using the learning method according to Embodiment 1, and a VAE trained using the learning method according to Embodiment 2, the accuracy of image classification on the reconstructed images was measured. If the model is vulnerable, the reconstructed image differs from the image of the original label, so the classification accuracy decreases. If the model is robust, the reconstructed image matches the input image, so it can be classified according to its true label. The experimental results are shown in Table 3.
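The measurement described above can be sketched as follows. The function name and the three callables (`attack`, `reconstruct`, `classify`) are hypothetical stand-ins for the attack method, the VAE under test, and an image classifier. The text does not specify how the classifier was obtained, so it is left abstract here.

```python
def reconstruction_accuracy(samples, attack, reconstruct, classify):
    """Fraction of attacked inputs whose reconstruction keeps the true label.

    samples:        iterable of (image, true_label) pairs
    attack(x):      returns the adversarial input built from x
    reconstruct(x): returns the VAE reconstruction of x
    classify(x):    returns the predicted class label of x
    """
    correct = 0
    total = 0
    for x, y in samples:
        x_rec = reconstruct(attack(x))
        correct += int(classify(x_rec) == y)
        total += 1
    return correct / total if total else 0.0
```

A robust VAE reconstructs something close to the clean input even under attack, so this accuracy stays high, whereas a vulnerable VAE reconstructs the attacker's target and the accuracy drops.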

As shown in Table 3, performing the Adversarial Training described in Embodiment 1 improved the classification accuracy over that of the normal VAE. Furthermore, the VAE trained so as to increase the distance between latent variables of different classes, as described in Embodiment 2, showed the best accuracy.

[Visualization of latent variable space]
FIG. 8 is a visualization of the latent variable spaces of a normal VAE model and of VAE models that underwent Adversarial Training using the learning methods according to Embodiments 1 and 2.

As shown in FIG. 8, in the VAE model that underwent Adversarial Training using the learning method according to Embodiment 2, the latent variables of the data are separated by class to a greater degree than in the latent variable space of the VAE model that underwent Adversarial Training using the learning method according to Embodiment 1. Therefore, performing Adversarial Training using the learning method according to Embodiment 2 separates the latent variables of the data for each class and can improve classification accuracy.
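The degree of class separation seen in such a visualization can also be expressed as a number. The following is a minimal sketch, assuming a simple ratio of the mean distance between class centroids to the mean spread within a class. The function name `class_separation` and the synthetic latent variables are illustrative assumptions and do not come from the experiment.

```python
import numpy as np

def class_separation(latents, labels):
    """Ratio of mean between-class centroid distance to mean within-class spread."""
    classes = np.unique(labels)
    means = np.stack([latents[labels == c].mean(axis=0) for c in classes])
    # Average distance of each sample to its own class centroid.
    within = np.mean([
        np.linalg.norm(latents[labels == c] - m, axis=1).mean()
        for c, m in zip(classes, means)
    ])
    # Average distance between every pair of distinct class centroids.
    pairwise = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)
    between = pairwise[np.triu_indices(len(classes), k=1)].mean()
    return between / within

# Synthetic 2-D latent variables for two classes (assumption, for illustration only).
rng = np.random.default_rng(0)
noise = rng.normal(size=(200, 2))
labels = np.repeat([0, 1], 100)
near = noise + labels[:, None] * 0.5   # class centroids close together
far = noise + labels[:, None] * 5.0    # class centroids far apart

ratio_near = class_separation(near, labels)
ratio_far = class_separation(far, labels)
print(ratio_near, ratio_far)
```

A larger ratio indicates latent variables that are more clearly separated by class.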

[Effects of Embodiment 2]
In this way, in Embodiment 2, training so as to increase the distance between latent variables of different classes makes it possible to realize a VAE that is even more robust against adversarial examples.
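One way to picture training that separates classes in the latent space is a class-dependent prior, as recited in the claims: the per-class average of the latent vectors from a VAE trained with a standard Gaussian prior is multiplied by a predetermined number and used as the mean of a unit-variance Gaussian prior for that class. The sketch below is an assumption-laden illustration and does not reproduce equation (4). The function names and the value of `scale` are hypothetical, and the KL term is the standard closed form for a diagonal Gaussian against a Gaussian with identity covariance.

```python
import numpy as np

def class_prior_means(latents, labels, scale):
    """Per-class mean latent vector multiplied by a predetermined number (scale)."""
    return {int(c): scale * latents[labels == c].mean(axis=0)
            for c in np.unique(labels)}

def kl_to_class_prior(mu, log_var, prior_mean):
    """KL( N(mu, diag(exp(log_var))) || N(prior_mean, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + (mu - prior_mean) ** 2 - 1.0 - log_var, axis=-1)

# Toy latent vectors from a hypothetical pre-trained VAE (assumption).
latents = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, -1.0], [0.0, -3.0]])
labels = np.array([0, 0, 1, 1])
means = class_prior_means(latents, labels, scale=2.0)

# An encoder output that sits exactly on the class-0 prior mean with unit variance.
mu = np.array([4.0, 0.0])
log_var = np.zeros(2)
kl_own = kl_to_class_prior(mu, log_var, means[0])     # penalty against its own class prior
kl_other = kl_to_class_prior(mu, log_var, means[1])   # penalty against the other class prior
print(means, kl_own, kl_other)
```

Scaling the class means apart makes the penalty for drifting toward another class's region grow, which is consistent with the stated aim of increasing the distance between latent variables of different classes.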

[Application example]
FIG. 9 is a diagram showing the application of a VAE trained using the learning methods according to Embodiments 1 and 2 to a sign classification system.

Self-driving cars photograph road signs with an on-board camera and recognize them, and use the result to control the vehicle. The image data of a sign captured by the on-board camera is classified, by an image classification system using a deep learning model that includes a VAE, according to which sign it shows. If an adversarial example attack is carried out against the deep learning model in the sign classification system, the system is made to recognize incorrect sign information, and the vehicle is controlled based on that information, which carries a risk of leading to an accident or the like.

Such a risk can therefore be prevented by incorporating, as the feature extraction function, a VAE trained using the learning methods according to Embodiments 1 and 2. Incorporating such a VAE makes it possible to properly extract the features of the sign image that was originally intended to be recognized, without being affected by the adversarial example. This prevents adversarial examples from affecting the deep learning model that classifies signs.

[System configuration of the embodiments]
Each component of the learning devices 10 and 210 is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the functions of the learning devices 10 and 210 is not limited to what is illustrated, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

Further, all or any part of each process performed in the learning devices 10 and 210 may be realized by a CPU, a GPU (Graphics Processing Unit), and a program that is analyzed and executed by the CPU or GPU. Each process performed in the learning devices 10 and 210 may also be realized as hardware using wired logic.

Furthermore, among the processes described in the embodiments, all or part of the processes described as being performed automatically can also be performed manually. Alternatively, all or part of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and illustrated in the drawings can be changed as appropriate unless otherwise specified.

[Program]
FIG. 7 is a diagram showing an example of a computer on which the learning devices 10 and 210 are realized by executing a program. The computer 1000 includes, for example, a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the learning devices 10 and 210 is implemented as the program module 1093, in which code executable by the computer 1000 is written. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing the same processing as the functional configuration of the learning devices 10 and 210 is stored in the hard disk drive 1090. Note that the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

The setting data used in the processing of the embodiments described above is stored as the program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 then reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.

Note that the program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090. They may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (a LAN (Local Area Network), a WAN (Wide Area Network), or the like), and may be read from the other computer by the CPU 1020 via the network interface 1070.

Although embodiments applying the invention made by the present inventors have been described above, the present invention is not limited by the description and drawings that form part of the disclosure of the present invention according to these embodiments. That is, all other embodiments, examples, operational techniques, and the like made by those skilled in the art based on these embodiments are included in the scope of the present invention.

10, 210 Learning device
11 Image dataset
12 Selection unit
13 Noise generation unit
14 Processing unit
15 Estimation unit
16, 216 Learning unit
151 VAE
161, 2161 Parameter update unit
162 End determination unit
212 Modification unit

Claims

1. A learning device comprising:
an estimation unit that performs estimation using a VAE (Variational AutoEncoder) having an encoder that converts an input image into a low-dimensional latent variable, and a decoder that reconstructs the input image from the latent variable and outputs it as an output image;
a selection unit that selects an original image from a group of images each assigned a class label, and randomly selects an arbitrary target image from the group of images;
a generation unit that generates noise from the target image;
a processing unit that generates a processed image by adding the noise to the original image; and
a learning unit that executes learning of the VAE using the class label of the original image as a teacher label and the processed image as learning data.

2. The learning device according to claim 1, wherein the generation unit generates the noise by manipulating pixels of the input image so that the latent variable of the input image approaches the latent variable obtained when the target image is input.

3. The learning device according to claim 1, wherein the learning unit trains the VAE so as to increase the distance in the latent variable space between different classes, based on the latent variables produced by the VAE.

4. The learning device as described, wherein the learning unit updates the model parameters of the VAE using an optimization function that changes the mean of the prior distribution of the latent space for each class of images.

5. The learning device according to claim 4, further comprising a modification unit that calculates, for each class, a vector by averaging the latent variable vectors output by a trained VAE that has been trained with a standard Gaussian distribution as the prior distribution of the latent variable, sets, as the prior distribution of the latent variable of each class, a Gaussian distribution whose mean is the vector obtained by multiplying the averaged vector of that class by a predetermined number and whose variance is 1, and modifies the optimization function accordingly.

6. A learning method executed by a learning device, the method comprising:
a step of selecting an original image from a group of images each assigned a class label, and randomly selecting an arbitrary target image from the group of images;
a step of generating noise from the target image;
a step of generating a processed image by adding the noise to the original image; and
a step of executing learning of a VAE (Variational AutoEncoder) having an encoder that converts an input image into a low-dimensional latent variable and a decoder that reconstructs the input image from the latent variable and outputs it as an output image, using the class label of the original image as a teacher label and the processed image as learning data.

7. A learning program for causing a computer to execute:
a step of selecting an original image from a group of images each assigned a class label, and randomly selecting an arbitrary target image from the group of images;
a step of generating noise from the target image;
a step of generating a processed image by adding the noise to the original image; and
a step of executing learning of a VAE (Variational AutoEncoder) having an encoder that converts an input image into a low-dimensional latent variable and a decoder that reconstructs the input image from the latent variable and outputs it as an output image, using the class label of the original image as a teacher label and the processed image as learning data.