JP2021120840A

JP2021120840A - Learning method, device, and program

Info

Publication number: JP2021120840A
Application number: JP2020014490A
Authority: JP
Inventors: 絵美明堂; Emi Meido; 和之田坂; Kazuyuki Tasaka; 茂之酒澤; Shigeyuki Sakasawa
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-01-31
Filing date: 2020-01-31
Publication date: 2021-08-19
Anticipated expiration: 2040-01-31
Also published as: JP7280210B2

Abstract

To provide a learning method for learning an image conversion process in which an image can be obtained while a task recognition accuracy, privacy protection, and a compression efficiency are ensured. SOLUTION: The learning method is for learning a weight parameter of an image conversion unit 11 by using a neural network structure. In the method, the weight parameter of the image conversion process is learned with use of a first cost for a recognition result obtained by recognizing, at a task unit 13 which is configured to recognize a prescribed task, a privacy protection image which is obtained by conversion of a training image at the image conversion unit 11, and a second cost for an evaluation result of the similarity between a compressed image obtained by compression of the training image at a compression unit 21 and the privacy protection image, such that a total cost calculated from the first cost and the second cost is minimized or the first cost and the second cost are alternately minimized. SELECTED DRAWING: Figure 7

Description

The present invention relates to a learning method, device, and program for learning an image conversion process that yields images in which task recognition accuracy, privacy protection, and compression efficiency are all ensured.

When image or audio data that may contain a user's private information is sent to the cloud and analyzed with machine learning such as a neural network, invasion of the user's privacy must be prevented. For example, with audio data, if the service provider were to listen to smart-speaker content sent to the cloud, an invasion of privacy could result even if the listening served a technical purpose such as improving machine learning accuracy.

Such smart speakers generally protect user privacy against eavesdropping on the communication channel by encrypting the data. However, because the encrypted data is decrypted on the cloud side, the situation described above can still occur.

As disclosed in the news release "World's first secure computation technology that enables standard deep learning training while data remains encrypted" at the following URL, there are also methods that perform processing such as retraining and fine-tuning on the cloud side while the data remains encrypted.
https://www.ntt.co.jp/news2019/1909/190902a.html

The first problem with this approach is that the service provider cannot inspect the image or audio data by human perception. In practice, some work is best done by human perception, such as investigating the cause of a problem or of machine learning errors. For example, the service provider may want to visually screen out poisoning data or respond to user complaints, but such checks become difficult. The second problem is that, although encrypted, the raw data is still included, so users tend to worry that the raw data might leak through an attack or an operational mistake.

For image data, on the other hand, privacy has conventionally been protected by applying image processing such as blurring or replacement to sensitive information regarded as private. Although users gain the reassurance of not providing raw data, the accuracy of the service provider's image analysis task tends to drop sharply.

In recent years, there have also been attempts to generate privacy-protected images while degrading the analysis accuracy of machine learning tasks, such as those using neural networks, as little as possible. With such methods, the cloud administrator or service provider can judge the images by perception while task accuracy is maintained to some extent, and user privacy is also protected. Because users need not send the original image, these methods are also considered to lower the psychological barrier to using the service.

For example, the method of Patent Document 1 estimates facial parts and face orientation and replaces the face with an avatar, which protects privacy while maintaining the accuracy of driving-related behavior recognition. Similarly, the method of Non-Patent Document 1 uses a GAN (generative adversarial network) to regenerate the face region as a face different from the person's own, which protects privacy while maintaining action recognition accuracy.

The methods of Patent Document 1 and Non-Patent Document 1 replace only a partial privacy region of the image, such as the face, and do not consider the privacy of the image as a whole. For example, things the service does not need, such as the clothes being worn, skin texture, and the appearance of the room, are not removed, so these methods cannot meet a request to remove the realism of the whole image.

As a method that can remove or reduce overall realism, the method of Non-Patent Document 2 performs action recognition on video using low-resolution images. The low resolution has the advantage of reducing image file size. However, it is difficult to perform tasks other than simple action recognition from merely low-resolution video, so the applicable tasks are limited.

Non-Patent Document 3, on the other hand, uses an adversarial learning framework to train a model that converts the entire original image so that it approaches a target image into which a large amount of random noise has been inserted. Using the adversarial learning framework makes it possible to train an image conversion model whose output is close to the noisy target image while maintaining task accuracy. Examples of tasks include image recognition and facial part recognition.

With this method, elements unnecessary for the task analysis tend to be hidden across the whole converted image, so it readily meets privacy requests such as removing overall realism. At the same time, the method keeps the degradation of task accuracy small.

Patent Document 1: Japanese Translation of PCT International Application Publication No. 2018-528536

Non-Patent Document 1: Ren, Zhongzheng, Yong Jae Lee, and Michael S. Ryoo. "Learning to anonymize faces for privacy preserving action detection." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
Non-Patent Document 2: Ryoo, Michael S., et al. "Privacy-preserving human activity recognition from extreme low resolution." Thirty-First AAAI Conference on Artificial Intelligence. 2017.
Non-Patent Document 3: Kim, Tae-hoon, et al. "Training with the Invisibles: Obfuscating Images to Share Safely for Learning Visual Recognition Models." arXiv preprint arXiv:1901.00098 (2019).

However, even the method of Non-Patent Document 3, which can address the various requirements described above, has the following problem.

Specifically, in the method of Non-Patent Document 3, random noise tends to appear in the image converted for privacy. Image compression is not considered, and the noise-like converted image has large spatial frequency components across the whole range from low to high frequencies, so compression efficiency deteriorates severely. In image analysis, reducing the capacity needed to transmit and store large numbers of still and moving images is an issue, and file size must be reduced while maintaining privacy and analysis accuracy; with poor compression efficiency, this file size requirement cannot be met.
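The compression penalty of noise-like images can be illustrated with a small experiment. The following is a minimal sketch and not the evaluation used in the patent: it assumes Python's general-purpose `zlib` codec as a stand-in for an image codec, and the 64x64 synthetic grayscale "images" and function names are hypothetical, chosen only for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 64

def smooth_image():
    """Synthetic grayscale gradient: dominated by low spatial frequencies."""
    return bytes((x + y) % 256 for y in range(HEIGHT) for x in range(WIDTH))

def noisy_image(seed=0):
    """Synthetic image of independent random pixels: energy at all frequencies."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))

def compressed_size(pixels):
    """Size in bytes after lossless compression (assumed stand-in for an image codec)."""
    return len(zlib.compress(pixels, 9))

smooth_size = compressed_size(smooth_image())
noisy_size = compressed_size(noisy_image())
print(smooth_size, noisy_size)
```

Under these assumptions the random-pixel image barely shrinks at all, while the smooth image compresses to a small fraction of its raw size, which is the effect the paragraph above attributes to noise-like converted images.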

In view of the above problems of the prior art, an object of the present invention is to provide a learning method, device, and program for learning an image conversion process that yields images in which task recognition accuracy, privacy protection, and compression efficiency are ensured.

To achieve the above object, the present invention is a learning method for learning weight parameters of an image conversion process with a neural network structure, characterized in that the weight parameters of the image conversion process are learned using a first cost and a second cost. The first cost is for a recognition result obtained when a task process that recognizes a predetermined task recognizes a privacy-protected image, which is a training image converted by the image conversion process. The second cost is for an evaluation result of the similarity between a compressed image, obtained by compressing the training image with a compression process, and the privacy-protected image. The invention is also characterized as a learning device corresponding to the learning method and as a program that causes a computer to execute the learning method.

According to the present invention, learning with the first cost and the second cost makes it possible to learn an image conversion process that yields images in which task recognition accuracy, privacy protection, and compression efficiency are ensured.

FIG. 1 is a functional block diagram of the conventional implementation configuration, which is the configuration used when the conventional method executes a task (at inference).
FIG. 2 is a functional block diagram of the conventional learning configuration, which is the configuration used to learn the weight parameters of the conventional image conversion unit.
FIG. 3 is a flowchart of learning with the conventional learning configuration, showing a procedure for learning using GAN, an existing method.
FIG. 4 is a functional block diagram of a recognition device according to one embodiment, which is the configuration used when a task is executed (at inference).
FIG. 5 is a functional block diagram of a learning device according to one embodiment, which is the configuration used to learn the weight parameters of the image conversion unit.
FIG. 6 is a flowchart of learning by the learning device according to one embodiment, showing a procedure for learning using GAN, an existing method.
FIG. 7 is a functional block diagram of a learning device according to an embodiment different from that of FIG. 5.
FIG. 8 is a flowchart of learning by the learning device 20 according to one embodiment in the configuration of FIG. 7.
FIG. 9 is a diagram showing the hardware configuration of a general computer device.

Before the present embodiment is described, the method of Non-Patent Document 3 (hereinafter, the "conventional method") is briefly described as a comparative example. FIG. 1 is a functional block diagram of the conventional implementation configuration 100, which is the configuration used when the conventional method executes a task (at inference). The conventional implementation configuration 100 has a conventional image conversion unit 101, a conventional compression unit 102, and a conventional task unit 103.

The conventional image conversion unit 101 acts as an image obfuscator: it converts the image to be converted (a user-provided image subject to privacy protection) and outputs a privacy-protected image. The conventional compression unit 102 compresses the privacy-protected image with an existing method and outputs a compressed privacy-protected image. The conventional task unit 103 decodes the compressed privacy-protected image, performs a predetermined recognition task (for example, pose estimation or image recognition), and outputs a recognition result (for example, a pose estimation result or an image recognition result).
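The data flow through units 101, 102, and 103 can be sketched as follows. This is only an illustration of the order of operations: all three functions are hypothetical placeholders (coarse quantization instead of a learned network, lossless `zlib` instead of an image codec, and a brightness threshold instead of a recognition network), none of which come from the source.

```python
import zlib

def convert_image(pixels):
    """Placeholder for image conversion unit 101 (assumed: coarse quantization)."""
    return bytes((p // 64) * 64 for p in pixels)

def compress(pixels):
    """Placeholder for compression unit 102 (assumed: lossless zlib)."""
    return zlib.compress(pixels)

def run_task(compressed):
    """Placeholder for task unit 103: decode first, then 'recognize' bright vs. dark."""
    decoded = zlib.decompress(compressed)
    mean = sum(decoded) / len(decoded)
    return "bright" if mean >= 128 else "dark"

def infer(pixels):
    protected = convert_image(pixels)   # privacy-protected image
    compressed = compress(protected)    # compressed privacy-protected image
    return run_task(compressed)         # recognition result

result = infer(bytes([200] * 16))
```

Only the compressed privacy-protected image crosses the boundary between `compress` and `run_task`, which mirrors the point that the original image never needs to be stored or transmitted.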

As already described, the privacy-protected image (or the compressed privacy-protected image used to store or transmit it) has been converted into a privacy-protected (obfuscated) state, and it is also an image for which the conventional task unit 103 can maintain a certain level of recognition accuracy. However, compression efficiency is poor, so the file size of the compressed privacy-protected image becomes large.

To execute a task with the conventional implementation configuration 100 of FIG. 1, the conventional image conversion unit 101, which is composed of a predetermined convolutional neural network or multilayer perceptron (hereinafter, a "convolutional neural network or the like"), must be trained in advance to obtain its weight parameters. FIG. 2 is a functional block diagram of the conventional learning configuration 200, which is the configuration used for this learning of the weight parameters of the conventional image conversion unit 101. As illustrated, the conventional learning configuration 200 has the conventional image conversion unit 101, the conventional task unit 103, a conventional first evaluation unit 104, a target image generation unit 201, a conventional identification unit 203, and a conventional second evaluation unit 204.

As the shared reference numerals indicate, the conventional image conversion unit 101 and the conventional task unit 103, which appear in both FIG. 1 and FIG. 2, have the same configuration in both figures. However, the weight parameters of the conventional image conversion unit 101 are updated successively through learning with the conventional learning configuration 200, and the conventional image conversion unit 101 configured with the weight parameters obtained when learning is complete is the one used in the conventional implementation configuration 100 of FIG. 1.

The conventional task unit 103 common to FIG. 1 and FIG. 2, on the other hand, is composed of any existing convolutional neural network or the like that performs a predetermined task (such as pose estimation) on an image. It is assumed to have been trained already by the time learning with the conventional learning configuration 200 of FIG. 2 is performed, so its weight parameters are fixed. (That is, the weight parameters of the conventional task unit 103 are not learned or updated during learning with the conventional learning configuration 200.)

FIG. 3 is a flowchart of learning with the conventional learning configuration 200, showing a procedure for learning with the adversarial learning framework, an existing method. Before the flow starts, the weight parameters of the units to be trained, namely the conventional image conversion unit 101 (and the conventional identification unit 203), are initialized, for example with random values. When the flow starts, in step S101 the conventional identification unit 203 is trained with the conventional learning configuration 200 in the GAN connection configuration, its weight parameters are updated, and the flow proceeds to step S102.

The GAN connection configuration is defined as the conventional learning configuration 200 with the conventional task unit 103 and the conventional first evaluation unit 104 omitted, that is, a configuration having only the conventional image conversion unit 101, the target image generation unit 201, the conventional identification unit 203, and the conventional second evaluation unit 204. Specifically, in step S101 the weight parameters of the conventional identification unit 203 are updated by the series of learning procedures (101) to (104) below.

(101) For training images given as learning data, the conversion process of the conventional image conversion unit 101 is applied to generate fake images, and the noise superimposition process of the target image generation unit 201 (which uses Gaussian noise to change the color randomly for each pixel neighborhood) is applied to generate real images. For example, half of the images are generated as real images and the other half as fake images, and together they form a mini-batch.
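The mini-batch construction in (101) can be sketched as below. This is a rough illustration under several assumptions that are not in the source: images are small grayscale nested lists, the "pixel neighborhood" is a fixed 2x2 block, `sigma` is an arbitrary noise strength, and `convert` is a hypothetical placeholder for the learned conversion network.

```python
import random

def add_block_noise(image, block=2, sigma=40.0, rng=None):
    """Stand-in for target image generation unit 201: add one Gaussian offset
    to all pixels of each block x block neighborhood."""
    rng = rng or random.Random(0)
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            offset = rng.gauss(0.0, sigma)
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    out[y][x] = min(255, max(0, round(image[y][x] + offset)))
    return out

def convert(image):
    """Hypothetical placeholder for conversion unit 101 (a real one is a neural network)."""
    return [[255 - p for p in row] for row in image]

def make_minibatch(training_images, rng=None):
    """Half real (noise-superimposed) and half fake (converted).
    Labels (1 = real, 0 = fake) are kept for the evaluation unit only."""
    rng = rng or random.Random(0)
    half = len(training_images) // 2
    batch = [(add_block_noise(img, rng=rng), 1) for img in training_images[:half]]
    batch += [(convert(img), 0) for img in training_images[half:]]
    rng.shuffle(batch)
    return batch

images = [[[128] * 4 for _ in range(4)] for _ in range(4)]
batch = make_minibatch(images)
```

The labels travel with the batch here only so that a later evaluation step can score the result; they would not be passed to the unit that discriminates real from fake.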

(102) The conventional identification unit 203 discriminates between the fake images and real images in the mini-batch, producing an identification result that indicates which are fake images (outputs of the conventional image conversion unit 101) and which are real images (outputs of the target image generation unit 201). The conventional identification unit 203 performs the task of distinguishing real images from fake images (a discriminator that tells genuine from counterfeit); it is composed of a predetermined convolutional neural network or the like, and its weight parameters are learned in the conventional learning configuration 200. (Accordingly, the fake images and real images that make up the mini-batch are input to the conventional identification unit 203, but no correct-answer information on whether each input image is real or fake is given to it; the conventional identification unit 203 obtains the identification result on its own.)

(103) The conventional second evaluation unit 204 receives the identification result from the conventional identification unit 203 and compares it with the correct answer given in advance as learning data (which images in the mini-batch are real and which are fake). It calculates the cost for the identification result by evaluating it with a predetermined identification cost function that gives a low cost value if the identification result is correct and a high cost value if it is not.

The identification cost function of the conventional second evaluation unit 204 serves to improve the accuracy with which the conventional identification unit 203 tells real images from fake ones.
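The source specifies only that the identification cost is low for a correct result and high for an incorrect one. One function with that property is binary cross-entropy; the sketch below assumes that choice, and assumes the identification unit outputs a probability `p_real` that the image is real. Neither assumption is stated in the source.

```python
import math

def discrimination_cost(p_real, is_real, eps=1e-12):
    """Binary cross-entropy (assumed cost function).
    p_real: estimated probability that the image is real.
    is_real: ground truth held by the second evaluation unit."""
    p = min(max(p_real, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if is_real else -math.log(1.0 - p)

cost_confident_correct = discrimination_cost(0.9, True)
cost_confident_wrong = discrimination_cost(0.1, True)
```

A confident correct answer yields a cost near zero, and a confident wrong answer yields a large cost, which is the behavior required in (103).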

(104) The processes (101) to (103) above, that is, the forward propagation computation of the cost (error), are performed for a plurality of training images. The cost is then used for backpropagation in the reverse direction, from the conventional second evaluation unit 204 to the conventional identification unit 203, and the weight parameters of the conventional identification unit 203 are updated with an optimizer such as stochastic gradient descent (hereinafter, "stochastic gradient descent or the like"). This update is expected to improve the accuracy with which the conventional identification unit 203 tells real images from fake ones.
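The update in (104) can be illustrated on a deliberately tiny model. The sketch below assumes a logistic-regression discriminator over two hand-made features in place of a convolutional network, a binary cross-entropy cost, and a fixed learning rate; all of these are illustrative assumptions rather than details from the source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, batch, lr=0.1):
    """One gradient-descent update of a logistic discriminator.
    batch: list of (features, label) with label 1 = real, 0 = fake."""
    grads = [0.0] * len(weights)
    for features, label in batch:
        p = sigmoid(sum(w * f for w, f in zip(weights, features)))
        for i, f in enumerate(features):
            grads[i] += (p - label) * f / len(batch)  # gradient of mean cross-entropy
    return [w - lr * g for w, g in zip(weights, grads)]

def mean_cost(weights, batch):
    """Forward computation of the mean cross-entropy cost over the batch."""
    total = 0.0
    for features, label in batch:
        p = sigmoid(sum(w * f for w, f in zip(weights, features)))
        total += -math.log(p) if label else -math.log(1.0 - p)
    return total / len(batch)

# Toy features; the first entry of each vector acts as a bias term.
batch = [([1.0, 2.0], 1), ([1.0, -2.0], 0)]
weights = [0.0, 0.0]
before = mean_cost(weights, batch)
for _ in range(50):
    weights = sgd_step(weights, batch)
after = mean_cost(weights, batch)
```

Each step computes the cost in the forward direction and then moves the weights against the gradient, so the cost on this toy batch falls as the discriminator improves.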

In step S102, learning is performed in the GAN connection configuration and the task connection configuration to update the weight parameters of the conventional image conversion unit 101, and the flow then proceeds to step S103. The GAN connection configuration is as described for step S101, whereas the task connection configuration is defined as a configuration of the conventional learning configuration 200 that has only the conventional image conversion unit 101, the conventional task unit 103, and the conventional first evaluation unit 104. Specifically, in step S102 the weight parameters of the conventional image conversion unit 101 are updated by the series of learning procedures (201), (202), and then either (203A) or (203B) below. As explained below, procedure (203B) is a possible variation of procedure (203A), and either may be used. Procedure (203A) trains the conventional image conversion unit 101 alternately, by alternately computing the output cost of the conventional first evaluation unit 104 and the output cost of the conventional second evaluation unit 204; procedure (203B) trains the conventional image conversion unit 101 with a total cost calculated from these two output costs.

(201) In the GAN connection configuration, procedures (101) to (103) described for step S101 are performed on training images given as learning data. Here, however, (103) is performed as a modified procedure (103'). In procedure (103'), the cost function used by the conventional second evaluation unit 204 is changed to an identification-failure cost function, which evaluates in exactly the opposite way to the identification cost function used in procedure (103). That is, the conventional second evaluation unit 204 performs its evaluation and calculates the cost for the identification result with a predetermined identification-failure cost function that gives a high cost value if the identification result obtained by the conventional identification unit 203 is correct and a low cost value if it is not (that is, if identification failed).

The identification-failure cost function of the conventional second evaluation unit 204 serves to improve the quality of the fake images generated by the conventional image conversion unit 101, so that the conventional identification unit 203 fails to tell real from fake.
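One simple way to obtain a cost that "evaluates in exactly the opposite way" is to score the identification result against the flipped label. The sketch below assumes this label-flipping construction on top of a binary cross-entropy cost; the source does not say how the identification-failure cost function is actually built.

```python
import math

def discrimination_cost(p_real, is_real, eps=1e-12):
    """Binary cross-entropy against the true label (assumed identification cost)."""
    p = min(max(p_real, eps), 1.0 - eps)
    return -math.log(p) if is_real else -math.log(1.0 - p)

def discrimination_failure_cost(p_real, is_real):
    """Assumed identification-failure cost: score against the flipped label,
    so the cost is low exactly when the identification is wrong."""
    return discrimination_cost(p_real, not is_real)

# A fake image (is_real=False) that the discriminator believes is real (p_real=0.9)
fooled = discrimination_failure_cost(0.9, False)
# The same fake image correctly rejected (p_real=0.1)
caught = discrimination_failure_cost(0.1, False)
```

Minimizing this cost with respect to the conversion unit's weights rewards fake images that are mistaken for real ones.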

(202) In the task connection configuration, a training image given as learning data is converted by the conventional image conversion unit 101 to obtain a fake image, the conventional task unit 103 recognizes this fake image to obtain a recognition result, and the conventional first evaluation unit 104 evaluates this recognition result by comparing it with the correct answer given as learning data, thereby calculating the cost for the recognition result. The conventional first evaluation unit 104 calculates this cost with a predetermined cost function so that the cost is low if the recognition result is correct and high if it is not.
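For an image recognition task, a cost with the stated low-if-correct, high-if-incorrect behavior could be the cross-entropy between the task unit's class probabilities and the correct class. The sketch below assumes that choice and a probability-vector output; a pose estimation task would more plausibly use a distance between predicted and correct keypoints, which is not shown.

```python
import math

def task_cost(class_probs, correct_class, eps=1e-12):
    """Cross-entropy of the task unit's output against the correct label (assumed)."""
    return -math.log(max(class_probs[correct_class], eps))

cost_right = task_cost([0.1, 0.9], correct_class=1)
cost_wrong = task_cost([0.9, 0.1], correct_class=1)
```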

(203A) A plurality of training images are divided into mini-batches, and for each batch the process of (201) or (202) above, that is, the forward propagation computation of the cost (error), is performed to compute, in the corresponding connection configuration, the cost output by the conventional first evaluation unit 104 or the cost output by the conventional second evaluation unit 204. In the GAN connection configuration, the cost output by the conventional second evaluation unit 204 is backpropagated in the reverse direction (through the GAN connection configuration), from the conventional second evaluation unit 204 through the conventional identification unit 203 to the conventional image conversion unit 101, and the weight parameters of the conventional image conversion unit 101 are updated with stochastic gradient descent or the like. In the task connection configuration, the cost output by the conventional first evaluation unit 104 is backpropagated in the reverse direction, from the conventional first evaluation unit 104 through the conventional task unit 103 to the conventional image conversion unit 101, and the weight parameters of the conventional image conversion unit 101 are updated with stochastic gradient descent or the like. The weight parameters of the conventional image conversion unit 101 are learned by alternating the backpropagation in the GAN connection configuration with the backpropagation in the task connection configuration.
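The alternating scheme of (203A) can be shown on a toy problem. The sketch below replaces the conversion unit's weights with a single scalar `theta` and replaces the two evaluation units' costs with quadratics that have different minimizers (1 and 3); these stand-ins, the learning rate, and the strict even/odd alternation are all assumptions made for illustration.

```python
def gan_cost_grad(theta):
    """Gradient of a toy stand-in for the second evaluation unit's cost, (theta - 1)^2."""
    return 2.0 * (theta - 1.0)

def task_cost_grad(theta):
    """Gradient of a toy stand-in for the first evaluation unit's cost, (theta - 3)^2."""
    return 2.0 * (theta - 3.0)

def train_alternating(theta, num_batches, lr=0.1):
    for step in range(num_batches):
        # Even batches use the GAN connection, odd batches the task connection.
        grad = gan_cost_grad(theta) if step % 2 == 0 else task_cost_grad(theta)
        theta -= lr * grad
    return theta

theta = train_alternating(0.0, 200)
```

Because each batch pulls `theta` toward a different optimum, the parameter settles between the two minimizers rather than at either one, which is the compromise the alternating updates aim for.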

(203B) When the total cost is used, the processes of (201) and (202) above, that is, the forward propagation computation of the cost (error), are performed for a plurality of training images (the same training images in the GAN connection configuration and the task connection configuration). For each training image common to both connection configurations, a total cost is computed as a predetermined weighted sum of the cost output by the conventional first evaluation unit 104 and the cost output by the conventional second evaluation unit 204. The total cost is then backpropagated in the reverse direction through the GAN connection configuration (conventional second evaluation unit 204 → conventional identification unit 203 → conventional image conversion unit 101) and through the task connection configuration (conventional first evaluation unit 104 → conventional task unit 103 → conventional image conversion unit 101), and the weight parameters of the conventional image conversion unit 101 may be updated with stochastic gradient descent or the like. (Note that the conventional method uses a total cost that also takes a small amount of the task unit's error into account in the GAN connection configuration; it does not use the total cost in the task connection configuration.)
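The weighted-sum scheme of (203B) can be illustrated the same way. In this sketch the conversion unit's weights are reduced to a scalar `theta`, the two costs are toy quadratics with minimizers 3 (task) and 1 (GAN), and the weights `w_task` and `w_gan` are hypothetical names for the "predetermined" weighting, whose actual values the source does not give.

```python
def total_cost(theta, w_task=0.5, w_gan=0.5):
    """Weighted sum of two toy costs standing in for evaluation units 104 and 204."""
    task = (theta - 3.0) ** 2
    gan = (theta - 1.0) ** 2
    return w_task * task + w_gan * gan

def total_cost_grad(theta, w_task=0.5, w_gan=0.5):
    """Gradient of total_cost with respect to theta."""
    return w_task * 2.0 * (theta - 3.0) + w_gan * 2.0 * (theta - 1.0)

def train_total(theta, steps, lr=0.1, **weights):
    for _ in range(steps):
        theta -= lr * total_cost_grad(theta, **weights)
    return theta

theta_equal = train_total(0.0, 200)
theta_task_heavy = train_total(0.0, 200, w_task=0.9, w_gan=0.1)
```

With equal weights the result lands midway between the two minimizers, and raising `w_task` moves it toward the task optimum, which shows how the weighting trades one objective against the other.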

Through the update of (203A) or (203B), the fake image obtained through conversion by the conventional image conversion unit 101 is expected to become better at causing the conventional identification unit 203 to fail to tell real from fake (that is, to become more similar to the real image obtained by the target image generation unit 201), and also to yield improved recognition accuracy in the conventional task unit 103 (that is, to become an image suited to recognition processing).

In step S103, it is determined whether the learning has converged. If it has converged, the weight parameters of the conventional image conversion unit 101 (and the conventional identification unit 203) at that point are obtained as the final learning result and the flow of FIG. 3 ends; if not, the flow returns to step S101 so that the learning described above (steps S101 and S102) continues. For the convergence determination in step S103, for example, test images separate from the training images may be used to calculate the total cost of procedure (203B), or the costs respectively output by the conventional first and second evaluation units 104 and 204 in procedure (203A), so as to evaluate the accuracy of the learned model, and convergence may be determined according to whether the improvement in that accuracy (its history) has converged. A predetermined number of epochs or the like may simply be used as the convergence condition.

As described above, the method of Non-Patent Document 3 uses the adversarial learning framework, as shown in FIGS. 2 and 3, to train the conventional image conversion unit 101 and the conventional identification unit 203 in competition with each other, thereby obtaining the learning result of the conventional image conversion unit 101 (and the conventional identification unit 203).

Hereinafter, an embodiment of the present invention will be described as an improvement of the method of Non-Patent Document 3 that takes the image compression ratio into account.

FIG. 4 is a functional block diagram of the recognition device 10, which is the configuration used when the task is performed (at inference time) according to one embodiment. The recognition device 10 includes an image conversion unit 11, a compression unit 21, and a task unit 13.

The image conversion unit 11 serves as an image obfuscator: it converts an image to be converted (an image provided by the user and subject to privacy protection) and outputs a privacy protection image. The compression unit 21 compresses the privacy protection image by an existing method and outputs a compressed privacy protection image. In doing so, the compression unit 21 can perform compression according to a predetermined compression setting specified by the user. The task unit 13 decodes the compressed privacy protection image, performs a predetermined recognition task (for example, posture estimation or image recognition), and outputs a recognition result (for example, a posture estimation result or an image recognition result).

In the present embodiment as well, as in the conventional method of Non-Patent Document 3, the privacy protection image (or the compressed privacy protection image used to store or transmit it) has been converted into a privacy-protected (obfuscated) state, and is also an image for which the task unit 13 can secure a certain level of recognition accuracy.

On the other hand, unlike the conventional method, the image obtained in the present embodiment has excellent compression efficiency: when the privacy protection image (in an uncompressed state) obtained by the image conversion unit 11 is (lossily) compressed by the compression unit 21 into the compressed privacy protection image, the file size can be kept small without greatly changing the quality from before compression.

To perform the task with the recognition device 10 of FIG. 4, the image conversion unit 11, which is configured as a convolutional neural network or the like, must be trained in advance to obtain its weight parameters. FIG. 5 is a functional block diagram of the learning device 20 according to one embodiment, which is the configuration used to learn the weight parameters of the image conversion unit 11. As illustrated, the learning device 20 includes the image conversion unit 11, the task unit 13, a first evaluation unit 14, the compression unit 21, an identification unit 23, and a second evaluation unit 24.

As indicated by the common reference numerals, the image conversion unit 11, the compression unit 21, and the task unit 13 that appear in both FIG. 4 and FIG. 5 have the same configuration in both figures. However, the weight parameters of the image conversion unit 11 are updated successively through learning by the learning device 20, and the image conversion unit 11 configured with the weight parameters at the completion of learning is the one used in the recognition device 10 of FIG. 4.

Meanwhile, the task unit 13 common to FIGS. 4 and 5 is configured as any existing convolutional neural network or the like that performs a predetermined task (posture estimation or the like) on an image, and its weight parameters are already determined, the unit having been trained before learning is performed by the learning device 20 of FIG. 5. (That is, the weight parameters of the task unit 13 are basically not learned or updated during learning by the learning device 20; however, if sufficient task accuracy cannot be maintained, fine-tuning may be performed to update the weight parameters of the task unit 13. In that case, task accuracy on ordinary images decreases, but task accuracy on the output images of the image converter improves.)

FIG. 6 is a flowchart of learning by the learning device 20 according to one embodiment, and shows a procedure for learning using GAN, an existing technique. The procedure of FIG. 6 performed by the learning device 20 consists of steps S11, S12, and S13, which correspond respectively to steps S101, S102, and S103 of FIG. 3 performed by the conventional learning configuration 200. Steps S11, S12, and S13 of FIG. 6 are obtained by performing steps S101, S102, and S103 with each unit of the conventional learning configuration 200 read as the corresponding unit of the learning device 20, as follows. (Each step of FIG. 6 thus corresponds to a step of FIG. 3 with the functional units acting as the processing subject read as those of FIG. 5 instead of FIG. 2, so redundant description is omitted.)

Specifically, expressing the correspondence in the form "unit of the conventional learning configuration 200 before replacement → unit of the learning device 20 after replacement", the replacements are: "target image generation unit 201 → compression unit 21", "conventional identification unit 203 → identification unit 23", "conventional second evaluation unit 204 → second evaluation unit 24", "conventional image conversion unit 101 → image conversion unit 11", "conventional task unit 103 → task unit 13", and "conventional first evaluation unit 104 → first evaluation unit 14". The GAN connection and the task connection used during learning are likewise defined through these replacements.

In the above correspondence, only the pair "target image generation unit 201 and compression unit 21" performs mutually different processing; all other pairs perform the same processing at each step of learning. In other words, in the present embodiment, the learning device 20 is prepared by replacing the target image generation unit 201 of the conventional learning configuration 200 with the compression unit 21, and the steps of FIG. 6, which are the same as those of FIG. 3, are executed in the learning device 20. As a result, the image conversion unit 11 whose weight parameters are learned becomes capable of outputting images that, as described for the recognition device 10 of FIG. 4, have excellent compression efficiency, are privacy-protected, and are also suited to recognition processing by the task unit 13.

Providing the compression unit 21 in the learning device 20 as described above (in place of the target image generation unit 201 used in the conventional method) is based on the following original finding. The compression unit 21 obtains a real image from a training image by performing lossy compression such as JPEG according to a compression setting specified by the user. Because lossy compression involves degradation, a real image that has been lossily compressed to a smaller data size can be used as a privacy protection image as it is.

Accordingly, through the flow of FIG. 6, which follows the adversarial learning framework, the weight parameters of the image conversion unit 11 can be learned together with those of the identification unit 23, with which it is in an adversarial relationship, so that the image conversion unit 11 can output images that, being similar to images compressed by the compression unit 21, have high compression efficiency and achieve privacy protection, and that are also suited to recognition by the task unit 13.

FIG. 7 is a functional block diagram of the learning device 20 according to an embodiment different from that of FIG. 5, and FIG. 8 is a flowchart of learning by the learning device 20 according to one embodiment in the configuration of FIG. 7.

Whereas the embodiment of FIGS. 5 and 6 learns the weight parameters of the image conversion unit 11 using the adversarial learning framework, the embodiment of FIGS. 7 and 8 can learn them without it. Because the adversarial learning framework is not used, the learning device 20 of FIG. 7 has the configuration of FIG. 5 with the identification unit 23 removed.

The second evaluation unit 24 in the learning device 20 of FIG. 7 performs the following processing, which differs from its processing in FIG. 5 (evaluation of the identification result of the identification unit 23). That is, the second evaluation unit 24 of FIG. 7 reads the compressed image obtained by the compression unit 21 compressing a training image and the privacy protection image obtained by the image conversion unit 11 converting the training image, and calculates, with a predetermined cost function, a cost whose value increases as the difference between these two images increases. In one embodiment, the second evaluation unit 24 of FIG. 7 can calculate the cost as the mean squared error (MSE: the sum of squared pixel values of the difference image of the two images, divided by the number of pixels) between the compressed image and the privacy protection image. Alternatively, the cost may be calculated as the sum of absolute values of the difference image averaged over the number of pixels.
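The two cost functions mentioned for the second evaluation unit 24 can be illustrated with a minimal Python sketch. The function names `mse_cost` and `mae_cost`, and the representation of an image as a nested list of pixel values, are assumptions made only for illustration and do not appear in the embodiment.

```python
def _pixel_differences(compressed, protected):
    """Flatten the difference image of two equally sized images."""
    return [c - p
            for row_c, row_p in zip(compressed, protected)
            for c, p in zip(row_c, row_p)]


def mse_cost(compressed, protected):
    """Sum of squared differences divided by the number of pixels."""
    diffs = _pixel_differences(compressed, protected)
    return sum(d * d for d in diffs) / len(diffs)


def mae_cost(compressed, protected):
    """Sum of absolute differences divided by the number of pixels."""
    diffs = _pixel_differences(compressed, protected)
    return sum(abs(d) for d in diffs) / len(diffs)


# Hypothetical 2x2 images with pixel values in [0, 1].
compressed_image = [[0.0, 0.5], [1.0, 0.5]]
protection_image = [[0.0, 0.0], [1.0, 1.0]]
print(mse_cost(compressed_image, protection_image))  # 0.125
print(mae_cost(compressed_image, protection_image))  # 0.25
```

Both costs are zero for identical images and grow as the two images diverge, which is the only property the text requires of the cost function.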

Meanwhile, the processing of the units of the learning device 20 of FIG. 7 other than the second evaluation unit 24, namely the compression unit 21, the image conversion unit 11, the task unit 13, and the first evaluation unit 14, is the same as in the learning device 20 of FIG. 5. Each step of the flow of FIG. 8 is described below.

Before the flow of FIG. 8 starts, initial values of the weight parameters of the image conversion unit 11 are set in advance. (As in the embodiment of FIGS. 5 and 6, the weight parameters of the task unit 13 have already been learned.) When the flow of FIG. 8 starts, the image conversion unit 11 is trained and its weight parameters are updated in step S21, and the flow then proceeds to step S22. In step S21, the weight parameters of the image conversion unit 11 can be updated by the series of learning procedures consisting of (21) followed by (22A) or (22B) below. Basically, either one of procedures (22A) and (22B) may be used. (Both may be used.)

(21) A training image given as learning data is converted by the image conversion unit 11 to obtain a privacy protection image, and the training image is also compressed by the compression unit 21 to obtain a compressed image. The second evaluation unit 24 evaluates the privacy protection image and the compressed image to calculate a cost. In addition, the task unit 13 recognizes the privacy protection image to obtain a recognition result, and the first evaluation unit 14 evaluates this recognition result by checking it against the correct answer given as learning data, thereby calculating a cost for the recognition result. The first evaluation unit 14 calculates this cost with a predetermined cost function so that the cost is low when the recognition result is close to the correct answer and high when it is not (as with the first evaluation unit 14 of FIGS. 5 and 6, which is the same as the conventional first evaluation unit 104 of FIGS. 2 and 3).

(22A) The plurality of training images is divided into mini-batches, the forward-propagation calculation of the cost (error) is performed for each batch, and the cost output by the first evaluation unit 14 or the cost output by the second evaluation unit 24 is calculated for each batch. In the GAN connection configuration (defined for FIG. 7, in correspondence with FIG. 5, as the configuration consisting of the image conversion unit 11, the compression unit 21, and the second evaluation unit 24, that is, the configuration of FIG. 5 without the identification unit 23), error backpropagation is calculated in the reverse direction through the second evaluation unit 24 → image conversion unit 11 using the cost output by the second evaluation unit 24, and the weight parameters of the image conversion unit 11 are updated by stochastic gradient descent or the like. In the task connection configuration (defined for FIG. 7 in the same way as for FIG. 5), error backpropagation is calculated in the reverse direction through the first evaluation unit 14 → task unit 13 → image conversion unit 11 using the cost output by the first evaluation unit 14, and the weight parameters of the image conversion unit 11 are updated by stochastic gradient descent or the like. The backpropagation of the GAN connection configuration and the backpropagation of the task connection configuration are performed alternately, and the weight parameters of the image conversion unit 11 are thereby learned.
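The alternating update of procedure (22A) can be sketched with a deliberately tiny model. Everything in the block is an assumption made for illustration: each image is a single pixel, the image conversion unit is one scalar weight, the already trained task unit is a fixed multiplication, both costs are squared errors, and the gradients are written out by hand instead of being obtained by backpropagation through a network.

```python
# Toy stand-ins (assumptions): every "image" is one pixel, every unit a scalar map.
TRAIN_PIXEL = 1.0        # training image
COMPRESSED_PIXEL = 0.6   # compressed image produced from the training image
TASK_GAIN = 2.0          # frozen task unit: result = TASK_GAIN * pixel
CORRECT_ANSWER = 1.0     # correct answer supplied with the training image


def second_cost_grad(weight):
    """Gradient of (weight * x - compressed)^2, the similarity cost."""
    return 2.0 * (weight * TRAIN_PIXEL - COMPRESSED_PIXEL) * TRAIN_PIXEL


def first_cost_grad(weight):
    """Gradient of (TASK_GAIN * weight * x - answer)^2, the task cost.

    The task unit's own parameter TASK_GAIN is never updated.
    """
    error = TASK_GAIN * weight * TRAIN_PIXEL - CORRECT_ANSWER
    return 2.0 * error * TASK_GAIN * TRAIN_PIXEL


def train_alternating(steps=40, learning_rate=0.1, weight=0.0):
    """Alternate one similarity-cost step and one task-cost step."""
    for step in range(steps):
        if step % 2 == 0:
            grad = second_cost_grad(weight)   # second evaluation unit -> converter
        else:
            grad = first_cost_grad(weight)    # first evaluation unit -> task -> converter
        weight -= learning_rate * grad
    return weight


final_weight = train_alternating()
print(final_weight)
```

In this toy setting the task cost alone is minimized at a weight of 0.5 and the similarity cost alone at 0.6; the alternating updates settle between the two, which mirrors the compromise between recognition accuracy and similarity to the compressed image that the procedure aims for.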

(22B) When a total cost is used, the forward-propagation calculation of the cost (error) is performed on a plurality of training images (training images common to the GAN connection configuration and the task connection configuration). For each training image common to both connection configurations, a total cost is calculated as a predetermined weighted sum of the cost output by the first evaluation unit 14 and the cost output by the second evaluation unit 24. Using this total cost, error backpropagation is calculated in the reverse direction through the second evaluation unit 24 → image conversion unit 11 (over the GAN connection configuration) and through the first evaluation unit 14 → task unit 13 → image conversion unit 11, whereby the weight parameters of the image conversion unit 11 may be updated by stochastic gradient descent or the like.
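As a minimal sketch of the total cost in procedure (22B), the weighted sum can be written as below. The function name `total_cost`, the two weight parameters, and the averaging over the batch are assumptions for illustration; the text only states that a predetermined weighted sum is formed for each training image.

```python
def total_cost(first_costs, second_costs, task_weight=1.0, similarity_weight=1.0):
    """Weighted sum of the two costs per training image, averaged over the batch.

    first_costs:  per-image costs from the first evaluation unit (task accuracy)
    second_costs: per-image costs from the second evaluation unit (similarity)
    """
    if len(first_costs) != len(second_costs):
        raise ValueError("both cost lists must cover the same training images")
    per_image = [task_weight * first + similarity_weight * second
                 for first, second in zip(first_costs, second_costs)]
    return sum(per_image) / len(per_image)


# Hypothetical costs for two training images.
batch_total = total_cost([0.2, 0.4], [0.1, 0.3], task_weight=1.0, similarity_weight=0.5)
print(batch_total)
```

Raising `similarity_weight` relative to `task_weight` would push the learned conversion toward the compressed image at the expense of recognition accuracy, and lowering it would do the opposite.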

Through the update using the cost of procedure (22A) or (22B), in the embodiment of FIGS. 7 and 8 as well, as with the adversarial learning framework of FIGS. 5 and 6, the privacy protection image obtained through conversion by the image conversion unit 11 is expected to resemble the image compressed by the compression unit 21, thereby satisfying the requirements for privacy protection and file size reduction, and also to be suited to recognition by the task unit 13.

In step S22, it is determined whether the learning has converged. If it has converged, the weight parameters of the image conversion unit 11 at that point are obtained as the final learning result and the flow of FIG. 8 ends; if not, the flow returns to step S21 so that the learning described above (step S21) continues. As in step S103 of FIG. 3 and step S13 of FIG. 6, the convergence determination in step S22 may be made, for example, by using test images separate from the training images to calculate the total cost of procedure (22B), or the cost of the first evaluation unit 14 and the cost of the second evaluation unit 24 in procedure (22A), so as to evaluate the accuracy of the learned model, and by determining whether the improvement in that accuracy (its history) has converged. Alternatively, learning may simply be stopped after a predetermined number of epochs.
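One possible form of the convergence determination is sketched below. The function name `has_converged` and the parameters `patience`, `min_delta`, and `max_epochs` are assumptions for illustration; the text only requires checking whether the improvement history on test images has levelled off, or stopping after a fixed number of epochs.

```python
def has_converged(cost_history, patience=3, min_delta=1e-3, max_epochs=100):
    """Decide whether learning should stop.

    cost_history: test-image cost recorded after each epoch, oldest first.
    Stops when the epoch budget is used up, or when the best cost over the
    last `patience` epochs improves on the earlier best by less than `min_delta`.
    """
    if len(cost_history) >= max_epochs:
        return True
    if len(cost_history) <= patience:
        return False
    best_before = min(cost_history[:-patience])
    best_recent = min(cost_history[-patience:])
    return best_before - best_recent < min_delta


print(has_converged([1.0, 0.5, 0.3, 0.2, 0.1]))       # False: still improving
print(has_converged([1.0, 0.5, 0.3, 0.3, 0.3, 0.3]))  # True: no recent improvement
```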

As described above, according to the embodiments of the present invention described with reference to FIGS. 4 to 8 and elsewhere, the weight parameters of the image conversion unit 11, which has a neural network structure, are learned using a first cost, obtained by the first evaluation unit 14 evaluating the recognition result of the task unit 13 for the privacy protection image into which the image conversion unit 11 has converted a training image, and a second cost, obtained by the second evaluation unit 24 evaluating the similarity between the privacy protection image and the compressed image into which the compression unit 21 has converted the training image. The image conversion unit 11 obtained as the learning result can thereby output images that are excellent in all three respects: privacy protection, compression efficiency, and recognition performance in the task unit 13. As already described, learning with the first and second costs can be carried out so as to minimize the two costs alternately or to minimize the total cost obtained as their weighted sum.

Various supplementary and additional examples are described below.

(1) The compression unit 21, used in common in the embodiments of FIGS. 4 to 8, may be configured as follows. Compression by the compression unit 21 can be performed, for example, by image compression using a frequency transform (such as JPEG, whose basis is the DCT (discrete cosine transform), or JPEG2000, whose basis is the wavelet transform), under a user-specified compression setting chosen from the viewpoints (i) to (iii) below.

(i) For JPEG or JPEG2000 compression, the setting may be chosen to obtain a low-quality compressed image, compressed with a quality value no more than half of the full range. With JPEG, quantization sets many high-frequency components to 0. This makes it easier to protect privacy related to edges, such as clothing texture, and also raises the compression ratio.

(ii)-a The lowest-frequency components after the JPEG DCT or the JPEG2000 wavelet transform are all set to the same value (preferably the mid value, for example 0.5 on a 0 to 1 gradation scale). This removes most gradation and makes it easier to protect privacy relating to skin and the like. The reduction of information in the DCT coefficients also raises the compression ratio.

(ii)-b The lowest-frequency components after the JPEG DCT or the wavelet transform are restricted to a few values (for example, 2 to 8 levels). This removes fine gradation and makes it easier to protect privacy relating to skin and the like. The reduction of information in the DCT coefficients also raises the compression ratio.

That is, (ii)-a overwrites the lowest-frequency component after the transform (its original value) with a single common value, and (ii)-b coarsely quantizes it. Normally, low-frequency components are quantized finely, not coarsely, to preserve image quality; coarse quantization of them is the distinguishing feature here.
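The effect of (ii)-a and (ii)-b on a single block can be sketched without implementing a full codec, under one stated assumption: the lowest-frequency component is taken to be the DC coefficient of a block DCT. The DC basis function is constant over the block, so the DC coefficient is proportional to the block mean, and changing only that coefficient shifts every pixel of the block by the same amount. The function names and the 0 to 1 pixel scale are illustrative, and clipping back into the valid pixel range is omitted.

```python
def block_mean(block):
    """Mean pixel value of a block given as a nested list."""
    count = sum(len(row) for row in block)
    return sum(sum(row) for row in block) / count


def rewrite_dc(block, levels=None, fixed_value=0.5):
    """Replace the block's DC level while leaving all AC detail untouched.

    levels=None : (ii)-a, every block gets the same DC level `fixed_value`.
    levels=k>=2 : (ii)-b, the DC level is coarsely quantized to k values in [0, 1].
    """
    mean = block_mean(block)
    if levels is None:
        target = fixed_value
    else:
        step = 1.0 / (levels - 1)
        target = round(mean / step) * step
    return [[pixel - mean + target for pixel in row] for row in block]


block = [[0.1, 0.3], [0.1, 0.3]]          # hypothetical block, mean 0.2
same_value = rewrite_dc(block)            # (ii)-a: mean moved to 0.5
two_level = rewrite_dc(block, levels=2)   # (ii)-b: mean snapped to 0.0 or 1.0
print(block_mean(same_value), block_mean(two_level))
```

The pixel-to-pixel differences inside the block are preserved in both cases; only the overall brightness level, which carries most of the gradation across blocks, is flattened or coarsened.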

(iii) The frequency bases to be used are selected and/or their intensity is changed. When bases are selected, the transform coefficients of the predetermined bases that are not selected are removed. When the intensity (transform coefficient) is changed, the transform coefficient of a predetermined basis is forcibly overwritten with a constant value, or the absolute value of the coefficient is changed. For example, for components other than the DC component of the DCT, reducing the absolute value of a coefficient weakens its intensity.

Basically, those of (i) to (iii) that are unlikely to lower task accuracy should be selected and combined as appropriate. For general action recognition or image recognition tasks, (i) and (iii), which retain motion and gradation, are likely to be suitable, while (ii) is considered suitable for tasks that extract key points of the face or skeleton. When the compatibility between the task and (i) to (iii) is unknown, a plurality of frequency bases may be selected in (iii). For example, the frequency components may be divided into low, high, and intermediate bands; the obfuscator (image conversion unit) may be trained with each band alone, or with the components of any two bands alone; and the frequency components to use may be decided after obtaining the degree of task accuracy degradation. The division of frequency components is not limited to three.
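Basis selection and intensity change as in (iii) can be illustrated on a one-dimensional signal of eight samples, assuming an orthonormal DCT-II as the transform. The helper names, the `ac_gain` parameter, and the particular split into low, middle, and high bands are assumptions for illustration; an actual codec works on two-dimensional blocks and adds quantization and entropy coding.

```python
import math


def _scale(k, n):
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(signal):
    """Orthonormal DCT-II of a list of samples."""
    n = len(signal)
    return [_scale(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(signal))
            for k in range(n)]


def idct(coeffs):
    """Inverse of `dct` (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]


def keep_bases(signal, kept_indices, ac_gain=1.0):
    """Drop coefficients of unselected bases; scale kept non-DC ones by `ac_gain`."""
    filtered = []
    for k, c in enumerate(dct(signal)):
        if k not in kept_indices:
            filtered.append(0.0)          # basis not selected: coefficient removed
        else:
            filtered.append(c if k == 0 else c * ac_gain)
    return idct(filtered)


# Hypothetical three-way split of the eight bases.
BANDS = {"low": {0, 1, 2}, "middle": {3, 4, 5}, "high": {6, 7}}

samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
low_only = keep_bases(samples, BANDS["low"])
dc_only = keep_bases(samples, {0})        # every sample becomes the mean, 4.5
everything = keep_bases(samples, set(range(8)))
```

Training one converter per band, or per pair of bands, and comparing the resulting task accuracy would then correspond to the band-selection experiment described above.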

For example, when faces are photographed at the same size, spatial frequency analysis using the FFT (fast Fourier transform) is known to show a high correlation between spatial frequency intensity and age or gender, as disclosed in the following patent document and URL ("Relationship between facial expressions and impressions in facial images and spatial frequency characteristics").
Japanese Patent No. 05827225 (Japanese Patent Application No. 2012-521378)
https://www.jstage.jst.go.jp/article/itej/69/11/69_836/_pdf/-char/ja

Based on this idea, the intensity of the frequencies used for compression may be changed so as to hide age or gender. For example, it is known that a face tends to be judged female when low-frequency intensity is high and male when high-frequency intensity is high. Privacy can therefore presumably be protected by, for example, strengthening the high-frequency components of images of women in stages so that they appear masculine. Such compressed images may be created for the entire set of training images.

(2) From the viewpoint of privacy protection and a higher compression ratio, images whose dynamic range (number of gradation levels) has been reduced in advance may be used as the training images. For example, the dynamic range may be reduced while keeping the mean pixel value around 128, or the 256 gradation levels of each RGB channel may be reduced to 8 levels. Colors may also be reduced, or colors conspicuous on people, such as skin color, may be converted into other colors (for example, blue, purple, or green).
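The two dynamic-range reductions mentioned in (2) can be sketched per channel value as follows. The function names, the choice of bin centres as representative values, and the scale factor of 0.5 are assumptions for illustration only.

```python
def quantize_levels(value, levels=8, max_value=255):
    """Map a 0..max_value channel value to the centre of one of `levels` equal bins."""
    bin_width = (max_value + 1) // levels      # 32 when 256 levels become 8
    index = value // bin_width
    return index * bin_width + bin_width // 2


def shrink_range(value, scale=0.5, centre=128):
    """Compress the dynamic range around `centre`, which itself stays fixed."""
    return round(centre + (value - centre) * scale)


pixel = (12, 100, 255)                               # hypothetical RGB pixel
print(tuple(quantize_levels(c) for c in pixel))      # (16, 112, 240)
print(shrink_range(0), shrink_range(128))            # 64 128
```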

When the image conversion unit 11 is trained while the learned parameters of the task unit 13 are further updated (that is, when the fine-tuning described above is performed), the generated image may be input to the task as it is; otherwise, the original dynamic range and number of colors should be restored before the task is performed. Similarly, to protect image privacy and reduce data, the image may, before compression, have its colors reduced, have colors conspicuous on people such as skin color converted into other colors (for example, blue, purple, or green), or have its dynamic range reduced. In that case, the original number of colors, colors, and dynamic range are restored before the task is performed.

(3) When the task is posture estimation, there are various applications that protect privacy while keeping posture estimation possible, for example the following.
- When an image of a person exercising at home is sent to a server, analyzed by posture estimation, and used to give advice, the privacy of the person, skin, clothing, room, and so on in the home image can be protected.
- When video of the outside of a vehicle captured by a drive recorder is sent to a server, the privacy of pedestrians can be protected while AI (artificial intelligence) can still recognize postures such as a raised hand or a fall.
- When vehicle-exterior video collected on a server from drive recorders is transferred or published at a higher disclosure level, the privacy of pedestrians can be protected while AI can still recognize postures such as a raised hand or a fall.
- Video of the vehicle interior captured by a drive recorder can be sent to a server or published while protecting the privacy of the driver and passengers and still allowing their actions (talking on a mobile phone, facing backward, etc.) to be detected.

(4) The image conversion unit 11, which is used in common in the embodiments of FIGS. 4 to 8, may be configured as follows.

A convolution layer whose kernels are the frequency bases intended for compression (the same frequency bases used for compression in the compression unit 21) is inserted into an intermediate layer or the output layer of the network constituting the image conversion unit 11. For example, the 8×8 DCT bases are used as kernels. (The frequency-basis kernels may be placed in only part of a given layer rather than in the whole layer.) Setting the stride to the compression block size of the compression unit 21 is expected to make it easier for training to produce a network that outputs easily compressible images. The bases may be selected in advance, or the combination of bases selected in a network that experimentally achieved high task accuracy and compression ratio may be identified afterward. The strength of each basis may also be changed, for example by multiplying the kernel values by a scalar in advance.
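
As a rough illustration of such a layer, the sketch below builds 8×8 DCT-II basis images and applies them as fixed kernels with a stride of 8. It is pure Python for readability; the helper names, the chosen subset of bases, the strength values, and the toy image are assumptions and do not come from the embodiments.

```python
import math

N = 8  # kernel size; assumed equal to the 8x8 compression block size


def dct_kernel(u, v, n=N):
    """Return the n x n orthonormal DCT-II basis image for frequency (u, v)."""
    cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
    cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
    return [[cu * cv
             * math.cos((2 * y + 1) * u * math.pi / (2 * n))
             * math.cos((2 * x + 1) * v * math.pi / (2 * n))
             for x in range(n)] for y in range(n)]


def strided_conv(image, kernel, stride=N):
    """Valid 2-D correlation of `image` with `kernel` at the given stride."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(0, w - k + 1, stride)]
            for r in range(0, h - k + 1, stride)]


# Hypothetical selection: the four lowest frequencies, each with a scalar
# "strength" multiplied into the kernel values in advance.
selected = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.5}
kernels = {uv: [[s * val for val in row] for row in dct_kernel(*uv)]
           for uv, s in selected.items()}

image = [[float((x + y) % 256) for x in range(16)] for y in range(16)]
feature_maps = {uv: strided_conv(image, k) for uv, k in kernels.items()}
print(len(feature_maps[(0, 0)]), len(feature_maps[(0, 0)][0]))  # 2 2
```

With a stride equal to the kernel size, each output value is the DCT coefficient of one non-overlapping block, which mirrors block-based compression.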

The kernel values may be scaled by 1/(table value), using the values of a quantization table, such as a JPEG table, for each quality setting. Because the table values span a wide range, it is advisable to reduce them to 2 to 8 levels beforehand. Quantizing afterward with a step-shaped activation function that maps its input to 2 to 8 levels makes it possible to simulate actual JPEG compression, to some extent, within the image-generating neural network.
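
One possible reading of this step is sketched below. The binning rule in `coarsen_table`, the range and level count of `step_activation`, and all numeric values are assumptions made for illustration; the source does not specify them.

```python
def coarsen_table(table, levels=4):
    """Bin quantization-table values into at most `levels` representative values."""
    lo, hi = min(table), max(table)
    if hi == lo:
        return [float(q) for q in table]
    width = (hi - lo) / levels
    bins = (min(int((q - lo) / width), levels - 1) for q in table)
    return [lo + (b + 0.5) * width for b in bins]


def step_activation(x, levels=8, lo=-3.5, hi=3.5):
    """Staircase activation: clamp x to [lo, hi], then snap to `levels` values."""
    step = (hi - lo) / (levels - 1)
    clamped = min(hi, max(lo, x))
    return lo + round((clamped - lo) / step) * step


table = [10, 11, 16, 24, 40, 61]              # illustrative table values only
coarse = coarsen_table(table, levels=4)
coeffs = [52.0, -30.0, 8.0, 3.0, -1.0, 0.5]   # made-up DCT-layer outputs
scaled = [c / q for c, q in zip(coeffs, coarse)]  # same effect as kernel * (1 / q)
quantized = [step_activation(s) for s in scaled]
print(quantized)
```

Scaling the kernel by 1/q and scaling the layer output by 1/q are equivalent because convolution is linear, which is why the sketch divides the outputs directly.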

During training, the weights of the inserted frequency-basis kernels alone are not updated. That is, they are held at fixed values, and only the weights of the other layers are updated.
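
The effect of holding the inserted kernels fixed can be sketched with a plain gradient-descent step. The layer names, the learning rate, and the use of simple SGD are assumptions for illustration only.

```python
def sgd_step(params, grads, frozen, lr=0.5):
    """Apply one SGD update, leaving every layer named in `frozen` unchanged."""
    updated = {}
    for name, weights in params.items():
        if name in frozen:
            updated[name] = list(weights)  # fixed frequency-basis kernel
        else:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated


params = {"conv1": [1.0, 2.0], "dct_layer": [0.5, 0.5]}  # hypothetical layers
grads = {"conv1": [1.0, 1.0], "dct_layer": [1.0, 1.0]}
new_params = sgd_step(params, grads, frozen={"dct_layer"})
print(new_params)  # {'conv1': [0.5, 1.5], 'dct_layer': [0.5, 0.5]}
```

Gradients still flow through the fixed layer to earlier layers; only its own weights are excluded from the update.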

An up-convolution layer corresponding to inverse quantization and inverse DCT (a layer whose weights are fixed and not updated by training) may be inserted after the inserted DCT layer (the layer containing the frequency-basis kernels), though not necessarily immediately after it. With the up-convolution layer, the signal returns from the frequency domain to the spatial domain, which makes privacy easier to check visually. In addition, the image converter that simulates JPEG compression can be connected directly to the task, and the task error can be backpropagated to the image converter during training, so the effect of JPEG compression can be expected to be estimated correctly. Normally the JPEG compression applied after the image converter is not considered during training, but this arrangement can be expected to account for its effect with reasonable accuracy.
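
The forward pass of such a DCT layer followed by inverse quantization and inverse DCT can be sketched for a single 8×8 block as below. The uniform quantization step `q` is an assumption (JPEG uses a per-frequency table), the function names are hypothetical, and the sketch covers only the forward computation, not how gradients pass through the rounding.

```python
import math

N = 8  # block size


def basis(u, x, n=N):
    """1-D orthonormal DCT-II basis value for frequency u at position x."""
    c = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
    return c * math.cos((2 * x + 1) * u * math.pi / (2 * n))


def dct2(block):
    """Forward 2-D DCT of one N x N block (the fixed DCT layer)."""
    return [[sum(block[y][x] * basis(u, y) * basis(v, x)
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT (the fixed up-convolution back to the spatial domain)."""
    return [[sum(coeffs[u][v] * basis(u, y) * basis(v, x)
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]


def simulate_jpeg_block(block, q=16.0):
    """DCT, quantize with uniform step q, dequantize, inverse DCT."""
    quantized = [[round(c / q) for c in row] for row in dct2(block)]
    dequantized = [[k * q for k in row] for row in quantized]
    return idct2(dequantized)


flat = [[128.0] * N for _ in range(N)]
restored = simulate_jpeg_block(flat)
print(round(restored[0][0], 6))  # 128.0
```

Because the output is back in the spatial domain, it can be inspected visually or passed directly to the task network.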

(5) FIG. 9 is a diagram showing the hardware configuration of a general computer device 70. The recognition device 10 and the learning device 20 of each embodiment of FIGS. 4 to 8 can each be realized as one or more computer devices 70 having this configuration. The computer device 70 includes: a CPU (central processing unit) 71 that executes predetermined instructions; a dedicated processor 72 (a GPU (graphics processing unit), a processor dedicated to deep learning, or the like) that executes some or all of the instructions of the CPU 71 in place of, or in cooperation with, the CPU 71; a RAM 73 as a main storage device that provides a work area for the CPU 71 and the dedicated processor 72; a ROM 74 as an auxiliary storage device; a communication interface 75; a display 76; an input interface 77 that accepts user input via a mouse, keyboard, touch panel, or the like; and a bus BS for exchanging data among these components.

Each unit of the recognition device 10 and the learning device 20 can be realized by the CPU 71 and/or the dedicated processor 72 reading a predetermined program corresponding to the function of that unit from the ROM 74 and executing it. Likewise, the learning method of the learning device 20 can be carried out by the CPU 71 and/or the dedicated processor 72 reading a predetermined program corresponding to each step of FIG. 6 or FIG. 8 from the ROM 74 and executing it.

10 ... recognition device, 20 ... learning device
21 ... compression unit, 23 ... identification unit, 24 ... second evaluation unit, 11 ... image conversion unit, 13 ... task unit, 14 ... first evaluation unit

Claims

A learning method for learning a weight parameter of an image conversion process implemented by a neural network structure, characterized in that the weight parameter of the image conversion process is learned by using:
a first cost for a recognition result obtained by recognizing, with a task process that recognizes a predetermined task, a privacy protection image obtained by converting a training image with the image conversion process; and
a second cost for an evaluation result of the similarity between the privacy protection image and a compressed image obtained by compressing the training image with a compression process.

The learning method according to claim 1, wherein the second cost is evaluated based on a difference between the compressed image and the privacy protection image.

The learning method according to claim 1 or 2, further using an identification process implemented by a neural network structure that is trained to discriminate authenticity by identifying the compressed image as a real image and the privacy protection image as a fake image,
and further comprising learning with a generative adversarial network so that the identification process improves its accuracy in discriminating authenticity and the image conversion process improves its accuracy in deceiving the identification process about authenticity.

The learning method according to any one of claims 1 to 3, wherein the compression process uses a discrete cosine transform or a wavelet transform.

The learning method according to any one of the preceding claims, wherein the compression process includes frequency conversion using a conversion basis, and replacing with a constant value, or quantizing, the lowest-frequency component obtained by the frequency conversion.

The learning method according to any one of claims 1 to 5, wherein the compression process includes frequency conversion using a conversion basis and deleting a conversion coefficient of a predetermined conversion basis.

The learning method according to any one of the preceding claims, wherein the compression process includes frequency conversion using a conversion basis.

The learning method according to claim 7, wherein the stride width of the fixed convolution layer is set to match the compression block size in the compression process.

The learning method according to any one of claims 1 to 8, wherein learning the weight parameter of the image conversion process comprises alternately minimizing the first cost and the second cost, or minimizing a total cost calculated from the first cost and the second cost.

A learning device that learns a weight parameter of an image conversion process implemented by a neural network structure, characterized in that the weight parameter of the image conversion process is learned by using:
a first cost for a recognition result obtained by recognizing, with a task process that recognizes a predetermined task, a privacy protection image obtained by converting a training image with the image conversion process; and
a second cost for an evaluation result of the similarity between the privacy protection image and a compressed image obtained by compressing the training image with a compression process.

A program that causes a computer to execute the learning method according to any one of claims 1 to 9.