JP7280210B2

JP7280210B2 - Learning method, device and program

Info

Publication number: JP7280210B2
Application number: JP2020014490A
Authority: JP
Inventors: 絵美明堂; 和之田坂; 茂之酒澤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-01-31
Filing date: 2020-01-31
Publication date: 2023-05-23
Anticipated expiration: 2040-01-31
Also published as: JP2021120840A

Description

The present invention relates to a learning method, device, and program for learning an image conversion process that yields images which preserve task recognition accuracy, privacy protection, and compression efficiency.

When image or audio data that may contain a user's private information is sent to the cloud and analyzed using machine learning such as a neural network, infringement of the user's privacy must be prevented. For example, with audio data, if the service provider were to listen to the content that a smart speaker sent to the cloud, privacy could be infringed as a result, even if the listening served a technical purpose such as improving machine learning accuracy.

Such smart speakers generally protect the user's privacy against eavesdropping on the communication channel by encrypting the data. However, because the cloud side decrypts the encrypted data, the situation described above can still occur.

As disclosed in the news release article at the following URL, "The world's first realization of secure computation technology that enables standard deep learning training while the data remains encrypted", there are also techniques that perform processing such as retraining and fine-tuning on the cloud side while the data remains encrypted.
https://www.ntt.co.jp/news2019/1909/190902a.html

The first problem with this approach is that the service provider cannot check the image or audio data by human perception. In practice, some work is best done by human perception, such as tracking down the cause of a problem or of a machine learning error. For example, the service provider may want to visually screen out poisoning data or respond to user complaints, and such checks would become difficult. The second problem is that, because the raw data is still included even though it is encrypted, users are likely to worry that the raw data could leak through an attack or an operational mistake.

For image data, on the other hand, privacy has conventionally been protected by applying image processing such as blurring or replacement to sensitive information regarded as private. Although users gain the reassurance of not providing raw data, the accuracy of the service provider's image analysis task tends to drop severely.

In recent years there have also been attempts to generate privacy images while degrading the analysis accuracy of a task performed by machine learning, such as a neural network, as little as possible. Such techniques let cloud administrators and service providers judge the images by perception while keeping task accuracy to a certain degree, and they also protect user privacy. Because users do not have to send the original image, these techniques are expected to lower the psychological barrier to using the service.

For example, the technique of Patent Document 1 estimates facial parts and face orientation and replaces the face with an avatar, which protects privacy while maintaining the accuracy of driving-related action recognition. Similarly, the technique of Non-Patent Document 1 uses a GAN (generative adversarial network) to regenerate the face region as a face different from that of the actual person, which protects privacy while maintaining action recognition accuracy.

The techniques of Patent Document 1 and Non-Patent Document 1 replace a partial privacy region of the image, such as the face, and do not consider the privacy of the image as a whole. For example, things unnecessary for the service, such as the clothes being worn, skin texture, and the state of the room, are not removed, so these techniques cannot meet a request to remove the realism of the whole image.

As a technique that can remove or reduce the overall realism, the technique of Non-Patent Document 2 performs action recognition on video using low-resolution images. The low resolution has the advantage of reducing image file size. However, it is difficult to perform tasks other than simple action recognition on merely low-resolution video, so the applicable tasks are limited.

Non-Patent Document 3, on the other hand, uses an adversarial learning framework to train a model that converts the entire original image so that it approaches a target image into which a large amount of random noise has been inserted. Using the adversarial learning framework makes it possible to learn an image conversion model whose output is close to the random-noise target image while maintaining task accuracy. Examples of tasks include image recognition and facial part recognition.

With this technique, elements unnecessary for the task analysis tend to be hidden across the entire converted image, so it readily meets privacy requests such as removing the realism of the whole image. At the same time, the technique keeps the degradation of task accuracy small.

Patent Document 1: Japanese Patent Publication No. 2018-528536 (特表2018-528536号公報)

Non-Patent Document 1: Ren, Zhongzheng, Yong Jae Lee, and Michael S. Ryoo. "Learning to anonymize faces for privacy preserving action detection." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
Non-Patent Document 2: Ryoo, Michael S., et al. "Privacy-preserving human activity recognition from extreme low resolution." Thirty-First AAAI Conference on Artificial Intelligence. 2017.
Non-Patent Document 3: Kim, Tae-hoon, et al. "Training with the Invisibles: Obfuscating Images to Share Safely for Learning Visual Recognition Models." arXiv preprint arXiv:1901.00098 (2019).

However, even the technique of Non-Patent Document 3, which can address these various requirements as described above, has the following problem.

Specifically, in the technique of Non-Patent Document 3, random noise tends to appear in images converted for privacy. Image compression is not considered, and a converted image turned into random noise has large spatial frequency components at every frequency from low to high, so compression efficiency deteriorates severely. In image analysis, reducing the volume of data when transmitting and storing many still and moving images is a challenge, and file sizes need to be reduced while preserving privacy and analysis accuracy. With poor compression efficiency, the demand for smaller file sizes cannot be met.
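The effect of noise on compressed size can be illustrated with a minimal sketch. It is not taken from the patent: it assumes zlib (a lossless, general-purpose compressor from the Python standard library) as a stand-in for an image codec, and the helper names `smooth_image`, `noisy_image`, and `compressed_size` are hypothetical.

```python
import random
import zlib


def compressed_size(pixels: bytes) -> int:
    """Size in bytes of the zlib-compressed pixel buffer (stand-in for an image codec)."""
    return len(zlib.compress(pixels, 9))


def smooth_image(width: int, height: int) -> bytes:
    """Horizontal gray gradient: dominated by low spatial frequencies."""
    return bytes((x * 255) // (width - 1) for _ in range(height) for x in range(width))


def noisy_image(width: int, height: int, seed: int = 0) -> bytes:
    """Uniform random pixel values: energy spread over all spatial frequencies."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(width * height))


smooth_size = compressed_size(smooth_image(64, 64))
noisy_size = compressed_size(noisy_image(64, 64))
print("smooth:", smooth_size, "bytes; noisy:", noisy_size, "bytes")
```

Both buffers hold the same number of pixels, yet the noisy one compresses far less, which is the file size problem described above.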

In view of the above problems of the prior art, an object of the present invention is to provide a learning method, device, and program for learning an image conversion process that yields images which preserve task recognition accuracy, privacy protection, and compression efficiency.

To achieve the above object, the present invention is a learning method for learning the weight parameters of an image conversion process having a neural network structure, characterized in that the weight parameters of the image conversion process are learned using a first cost and a second cost. The first cost is a cost on the recognition result obtained when a privacy-protected image, produced by converting a training image with the image conversion process, is recognized by a task process that performs recognition for a predetermined task. The second cost is a cost on the result of evaluating the similarity between a compressed image, obtained by compressing the training image with a compression process, and the privacy-protected image. The invention is also characterized as a learning device corresponding to the learning method and as a program that causes a computer to execute the learning method.
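The two costs can be sketched as follows. This is an illustration under stated assumptions, not the patented formulation: the text above does not fix the form of either cost or how they are combined, so cross-entropy for the first cost, mean squared error for the second cost, and a weighted sum with a hypothetical `weight` parameter are all assumptions chosen for simplicity.

```python
import math
from typing import Sequence


def task_cost(class_probs: Sequence[float], correct_class: int) -> float:
    """First cost (assumed form): cross-entropy of the recognition result
    obtained on the privacy-protected image."""
    return -math.log(max(class_probs[correct_class], 1e-12))


def similarity_cost(protected: Sequence[float], compressed: Sequence[float]) -> float:
    """Second cost (assumed form): mean squared error between the
    privacy-protected image and the compressed training image."""
    return sum((p - c) ** 2 for p, c in zip(protected, compressed)) / len(protected)


def combined_cost(first: float, second: float, weight: float = 1.0) -> float:
    """Assumed combination: weighted sum of the two costs."""
    return first + weight * second


# Toy values: a 3-class recognition result and 4-pixel images.
first = task_cost([0.1, 0.8, 0.1], correct_class=1)
second = similarity_cost([0.5, 0.5, 0.2, 0.9], [0.4, 0.5, 0.2, 1.0])
print(combined_cost(first, second, weight=0.5))
```

Lowering the second cost pulls the converted image toward an image that has already been through compression, while the first cost keeps it usable for the task.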

According to the present invention, learning with the first cost and the second cost makes it possible to learn an image conversion process that yields images which preserve task recognition accuracy, privacy protection, and compression efficiency.

FIG. 1 is a functional block diagram of the conventional implementation configuration, which is the configuration used when the task is performed (at inference time) under the conventional technique.
FIG. 2 is a functional block diagram of the conventional learning configuration, which is the configuration used when the weight parameters of the conventional image conversion unit are learned.
FIG. 3 is a flowchart of learning with the conventional learning configuration, showing the procedure for learning using GAN, an existing technique.
FIG. 4 is a functional block diagram of a recognition device, which is the configuration used when the task is performed (at inference time) according to one embodiment.
FIG. 5 is a functional block diagram of a learning device according to one embodiment, which is the configuration used when the weight parameters of the image conversion unit are learned.
FIG. 6 is a flowchart of learning by the learning device according to one embodiment, showing the procedure for learning using GAN, an existing technique.
FIG. 7 is a functional block diagram of a learning device according to an embodiment different from that of FIG. 5.
FIG. 8 is a flowchart of learning by the learning device 20 according to one embodiment in the configuration of FIG. 7.
FIG. 9 is a diagram showing the hardware configuration of a general computer device.

Before the present embodiment is described, the technique of Non-Patent Document 3 (hereinafter, the "conventional technique") is briefly described as a comparative example. FIG. 1 is a functional block diagram of a conventional implementation configuration 100, which is the configuration used when the task is performed (at inference time) under the conventional technique. The conventional implementation configuration 100 includes a conventional image conversion unit 101, a conventional compression unit 102, and a conventional task unit 103.

The conventional image conversion unit 101 acts as an image obfuscator: it converts the image to be converted (an image provided by the user that is subject to privacy protection) and outputs a privacy-protected image. The conventional compression unit 102 compresses the privacy-protected image using an existing technique and outputs a compressed privacy-protected image. The conventional task unit 103 decodes the compressed privacy-protected image, performs a predetermined recognition task (for example, pose estimation or image recognition), and outputs a recognition result (for example, a pose estimation result or an image recognition result).

As already explained, the privacy-protected image (or the compressed privacy-protected image used to store or transmit it) has been converted into a privacy-protected (obfuscated) state and is also an image on which the conventional task unit 103 can achieve a certain level of recognition accuracy. However, its compression efficiency is poor, so the file size of the compressed privacy-protected image becomes large.

To perform the task with the conventional implementation configuration 100 of FIG. 1, the conventional image conversion unit 101, which is built from a predetermined convolutional neural network or multilayer perceptron (hereinafter, "convolutional neural network or the like"), must be trained in advance to obtain its weight parameters. FIG. 2 is a functional block diagram of a conventional learning configuration 200, which is the configuration used when the weight parameters of the conventional image conversion unit 101 are learned. As illustrated, the conventional learning configuration 200 includes the conventional image conversion unit 101, the conventional task unit 103, a conventional first evaluation unit 104, a target image generation unit 201, a conventional identification unit 203, and a conventional second evaluation unit 204.

As indicated by the shared reference numerals, the conventional image conversion unit 101 and the conventional task unit 103, which appear in both FIG. 1 and FIG. 2, have the same configuration in both figures. However, the weight parameters of the conventional image conversion unit 101 are updated iteratively through learning with the conventional learning configuration 200, and the conventional image conversion unit 101 configured with the weight parameters at the completion of learning is the one used in the conventional implementation configuration 100 of FIG. 1.

The conventional task unit 103 common to FIG. 1 and FIG. 2, on the other hand, is built from any existing convolutional neural network or the like that performs a predetermined task (such as pose estimation) on an image. It is assumed to be already trained at the time learning with the conventional learning configuration 200 of FIG. 2 is performed, so its weight parameters are fixed. (That is, the weight parameters of the conventional task unit 103 are not learned or updated during learning with the conventional learning configuration 200.)

FIG. 3 is a flowchart of learning with the conventional learning configuration 200 and shows the procedure for learning using the existing adversarial learning framework. At the start of the flow, the weight parameters of the conventional image conversion unit 101 (and the conventional identification unit 203), which are the learning targets, are initialized, for example to random values. When the flow starts, in step S101 the conventional identification unit 203 is trained with the conventional learning configuration 200 in the GAN connection configuration and its weight parameters are updated, and the flow then proceeds to step S102.

The GAN connection configuration is defined as the conventional learning configuration 200 with the conventional task unit 103 and the conventional first evaluation unit 104 omitted, that is, a configuration having only the conventional image conversion unit 101, the target image generation unit 201, the conventional identification unit 203, and the conventional second evaluation unit 204. Specifically, in step S101 the weight parameters of the conventional identification unit 203 are updated by the series of learning procedures (101) to (104) below.

(101) For training images given as learning data, fake images are generated by applying the conversion process of the conventional image conversion unit 101, and real images are generated by applying the noise superimposition process of the target image generation unit 201 (a process that uses Gaussian noise to change the color randomly for each pixel neighborhood) to the training images. Here, for example, half of the images are generated as real images and the other half as fake images, and together they form a mini-batch.

(102) The conventional identification unit 203 classifies the fake and real images obtained as the mini-batch, producing an identification result that states which are fake images (outputs of the conventional image conversion unit 101) and which are real images (outputs of the target image generation unit 201). The conventional identification unit 203 performs the task of distinguishing real images from fake images (a discriminator that tells genuine from counterfeit), is built from a predetermined convolutional neural network or the like, and has its weight parameters learned in this conventional learning configuration 200. (Accordingly, the fake and real images making up the mini-batch are input to the conventional identification unit 203, but the ground truth as to whether each input image is real or fake is not given to it; the conventional identification unit 203 produces the identification result on its own.)

(103) The conventional second evaluation unit 204 receives the identification result from the conventional identification unit 203 and checks it against the ground truth given in advance as learning data (which images in the mini-batch are real and which are fake). It calculates the cost of the identification result by evaluating it with a predetermined identification cost function that gives a low cost value if the identification result is correct and a high cost value if it is not.

The identification cost function in the conventional second evaluation unit 204 serves to improve the accuracy with which the conventional identification unit 203 tells genuine from counterfeit.

(104) The processing of (101) to (103) above, that is, the forward-propagation computation of the cost (error), is performed for a plurality of training images. The cost is then backpropagated in the reverse direction, from the conventional second evaluation unit 204 to the conventional identification unit 203, and the weight parameters of the conventional identification unit 203 are updated using an optimizer such as stochastic gradient descent (hereinafter, "stochastic gradient descent or the like"). This update is expected to improve the accuracy with which the conventional identification unit 203 tells genuine from counterfeit.
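Steps (101) to (104) can be sketched in miniature as follows. This is a toy illustration under loud assumptions, not the networks of Non-Patent Document 3: images are small grayscale grids rather than color images, the discriminator is a one-feature logistic model instead of a convolutional neural network, the identification cost is assumed to be binary cross-entropy, and all function names are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale rows (simplification of the color images in the text)


def superimpose_block_noise(image: Image, rng: random.Random,
                            block: int = 2, sigma: float = 40.0) -> Image:
    """Stand-in for target image generation unit 201: one Gaussian offset per pixel neighborhood."""
    height, width = len(image), len(image[0])
    noisy = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            offset = rng.gauss(0.0, sigma)
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    noisy[y][x] = min(255, max(0, round(image[y][x] + offset)))
    return noisy


def build_minibatch(images: List[Image], convert: Callable[[Image], Image],
                    rng: random.Random) -> List[Tuple[Image, int]]:
    """Step (101): half real images (label 1), half fake images (label 0)."""
    half = len(images) // 2
    real = [(superimpose_block_noise(img, rng), 1) for img in images[:half]]
    fake = [(convert(img), 0) for img in images[half:]]
    return real + fake


def feature(image: Image) -> float:
    """Hypothetical one-number image summary used by the toy discriminator."""
    return statistics.pstdev([p for row in image for p in row]) / 64.0


def discriminate(w: float, b: float, image: Image) -> float:
    """Step (102): estimated probability that the image is real."""
    return 1.0 / (1.0 + math.exp(-(w * feature(image) + b)))


def identification_cost(p_real: float, label: int) -> float:
    """Step (103), assumed form: binary cross-entropy, low when the result is correct."""
    p = min(max(p_real, 1e-12), 1.0 - 1e-12)
    return -math.log(p if label == 1 else 1.0 - p)


def update_discriminator(w: float, b: float, batch: List[Tuple[Image, int]],
                         lr: float = 0.5) -> Tuple[float, float, float]:
    """Step (104): forward pass, gradient of the mean cost, one gradient-descent step."""
    grad_w = grad_b = cost = 0.0
    for image, label in batch:
        p_real = discriminate(w, b, image)
        cost += identification_cost(p_real, label)
        grad_w += (p_real - label) * feature(image)
        grad_b += p_real - label
    n = len(batch)
    return w - lr * grad_w / n, b - lr * grad_b / n, cost / n


rng = random.Random(0)
training_images = [[[128] * 4 for _ in range(4)] for _ in range(8)]
batch = build_minibatch(training_images, convert=lambda img: img, rng=rng)
w, b, costs = 0.0, 0.0, []
for _ in range(30):
    w, b, cost = update_discriminator(w, b, batch)
    costs.append(cost)
print("cost before first update:", costs[0], "cost before last update:", costs[-1])
```

The identity function stands in for an untrained image conversion unit. In the sketch only the discriminator parameters `w` and `b` change, matching step S101, where the image conversion unit is not updated.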

In step S102, the weight parameters of the conventional image conversion unit 101 are updated by learning in the GAN connection configuration and the task connection configuration, and the flow then proceeds to step S103. The GAN connection configuration is as described for step S101. The task connection configuration is defined as the conventional learning configuration 200 with only the conventional image conversion unit 101, the conventional task unit 103, and the conventional first evaluation unit 104. Specifically, in step S102 the weight parameters of the conventional image conversion unit 101 are updated by the series of learning procedures (201), (202), and either (203A) or (203B) below. As explained below, procedure (203B) is a possible variation of procedure (203A), and either may be used. Procedure (203A) trains the conventional image conversion unit 101 alternately, by computing the output cost of the conventional first evaluation unit 104 and the output cost of the conventional second evaluation unit 204 in alternation. Procedure (203B) trains the conventional image conversion unit 101 with a total cost calculated from these two output costs.

(201) In the GAN connection configuration, procedures (101) to (103) described for step S101 are performed on the training images given as learning data. Here, however, (103) is performed as a modified procedure (103'). Specifically, in procedure (103') the cost function used by the conventional second evaluation unit 204 is changed from the identification cost function of procedure (103) to an identification-failure cost function that evaluates in exactly the opposite way. That is, the conventional second evaluation unit 204 performs its evaluation and calculates the cost of the identification result using a predetermined identification-failure cost function that gives a high cost value if the identification result obtained by the conventional identification unit 203 is correct and a low cost value if it is not (that is, if identification has failed).

The identification-failure cost function in the conventional second evaluation unit 204 serves to improve the accuracy with which the conventional image conversion unit 101 generates fake images, so that the conventional identification unit 203 fails to tell genuine from counterfeit.

(202) In the task connection configuration, the conventional image conversion unit 101 converts the training images given as learning data to obtain fake images, the conventional task unit 103 recognizes the fake images to obtain recognition results, and the conventional first evaluation unit 104 evaluates the recognition results by checking them against the ground truth given as learning data and calculates the cost of each recognition result. The conventional first evaluation unit 104 calculates this cost using a predetermined cost function such that the cost value is low if the recognition result is correct and high if it is not.

(203A) The training images are divided into mini-batches, and for each batch the processing of (201) or (202) above, that is, the forward-propagation computation of the cost (error), is performed, yielding for the connection configuration in use either the cost output by the conventional first evaluation unit 104 or the cost output by the conventional second evaluation unit 204. In the GAN connection configuration, the cost output by the conventional second evaluation unit 204 is backpropagated in the reverse direction (over the GAN connection configuration), from the conventional second evaluation unit 204 through the conventional identification unit 203 to the conventional image conversion unit 101, and the weight parameters of the conventional image conversion unit 101 are updated using stochastic gradient descent or the like. In the task connection configuration, the cost output by the conventional first evaluation unit 104 is backpropagated in the reverse direction, from the conventional first evaluation unit 104 through the conventional task unit 103 to the conventional image conversion unit 101, and the weight parameters of the conventional image conversion unit 101 are updated using stochastic gradient descent or the like. The weight parameters of the conventional image conversion unit 101 are learned by alternating the backpropagation over the GAN connection configuration and the backpropagation over the task connection configuration.

(203B) When the total cost is used, the processing of (201) and (202) above, that is, the forward-propagation computation of the cost (error), is performed for a plurality of training images (training images common to the GAN connection configuration and the task connection configuration). For each training image common to both connection configurations, a total cost is calculated as a predetermined weighted sum of the cost output by the conventional first evaluation unit 104 and the cost output by the conventional second evaluation unit 204. The total cost is then backpropagated in the reverse direction over the GAN connection configuration (conventional second evaluation unit 204 → conventional identification unit 203 → conventional image conversion unit 101) and over the task connection configuration (conventional first evaluation unit 104 → conventional task unit 103 → conventional image conversion unit 101), and the weight parameters of the conventional image conversion unit 101 may be updated using stochastic gradient descent or the like. (Note that in the conventional technique, a total cost that also takes the error of the task unit slightly into account is used in the GAN connection configuration. No total cost is used in the task connection configuration.)
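The costs used in procedures (201) to (203B) can be sketched as follows. The concrete forms are assumptions made for illustration: binary cross-entropy for the identification cost, label inversion for the identification-failure cost of (103'), task-first ordering for the alternation in (203A), and hypothetical weight parameters `task_weight` and `gan_weight` for the weighted sum in (203B).

```python
import math


def identification_cost(p_real: float, is_real: bool) -> float:
    """Cost of (103), assumed form: low when the discriminator is correct."""
    p = min(max(p_real, 1e-12), 1.0 - 1e-12)
    return -math.log(p if is_real else 1.0 - p)


def identification_failure_cost(p_real: float, is_real: bool) -> float:
    """Cost of (103'), assumed form: the opposite evaluation, low when the
    discriminator is wrong, obtained here by inverting the label."""
    return identification_cost(p_real, not is_real)


def alternating_cost(step: int, task_cost: float, gan_cost: float) -> float:
    """(203A)-style schedule: task cost on even steps, GAN cost on odd steps.
    Which configuration goes first is an arbitrary choice in this sketch."""
    return task_cost if step % 2 == 0 else gan_cost


def total_cost(task_cost: float, gan_cost: float,
               task_weight: float = 1.0, gan_weight: float = 1.0) -> float:
    """(203B)-style total cost: weighted sum of the two evaluation-unit costs."""
    return task_weight * task_cost + gan_weight * gan_cost


# A fake image that the discriminator believes is real with probability 0.9.
fooled_id_cost = identification_cost(0.9, is_real=False)
fooled_fail_cost = identification_failure_cost(0.9, is_real=False)
print(fooled_id_cost, fooled_fail_cost)
print(total_cost(task_cost=0.2, gan_cost=fooled_fail_cost, gan_weight=0.5))
```

When the discriminator is fooled, the identification cost is high and the identification-failure cost is low, so minimizing the latter pushes the image conversion unit toward outputs that resemble the noisy target images.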

Through the update of (203A) or (203B), the fake images obtained by conversion with the conventional image conversion unit 101 are expected to become better at making the conventional identification unit 203 fail to tell genuine from counterfeit (that is, more similar to the real images obtained by the target image generation unit 201) and also to yield better recognition accuracy in the conventional task unit 103 (that is, to be images in a state suited to the recognition process).

In step S103, it is determined whether learning has converged. If it has, the weight parameters of the conventional image conversion unit 101 (and the conventional identification unit 203) at that point are taken as the final learning result and the flow of FIG. 3 ends. If it has not, the flow returns to step S101, so that the learning described above (steps S101 and S102) continues. The convergence determination in step S103 may, for example, use test images separate from the training images to compute the total cost of procedure (203B), or the costs output by the conventional first and second evaluation units 104 and 204 in procedure (203A), evaluate the accuracy of the learned model from them, and decide according to whether the improvement in accuracy (its history) has leveled off. A predetermined number of epochs or the like may also simply be used as the convergence condition.
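One possible form of the convergence determination in step S103 is sketched below. The text does not specify a criterion beyond the improvement history leveling off or an epoch limit, so the `patience` and `tolerance` parameters and the function name `has_converged` are hypothetical.

```python
from typing import Optional, Sequence


def has_converged(cost_history: Sequence[float], patience: int = 3,
                  tolerance: float = 1e-3,
                  max_epochs: Optional[int] = None) -> bool:
    """Return True when the test cost has stopped improving.

    cost_history holds one test-image cost per completed epoch, oldest first.
    Learning counts as converged when the best cost of the last `patience`
    epochs improves on the best earlier cost by less than `tolerance`, or
    when `max_epochs` epochs have been run.
    """
    if max_epochs is not None and len(cost_history) >= max_epochs:
        return True
    if len(cost_history) <= patience:
        return False
    best_before = min(cost_history[:-patience])
    best_recent = min(cost_history[-patience:])
    return best_before - best_recent < tolerance


print(has_converged([1.0, 0.9, 0.8, 0.7, 0.6]))  # still improving
print(has_converged([1.0, 0.5, 0.5, 0.5, 0.5]))  # leveled off
```

Passing only `max_epochs` reproduces the simpler alternative of stopping after a predetermined number of epochs.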

As described above, the method of Non-Patent Document 3 uses the adversarial learning framework shown in FIGS. 2 and 3 to train the conventional image conversion unit 101 and the conventional identification unit 203 while making them compete with each other, thereby obtaining the learning result of the conventional image conversion unit 101 (and the conventional identification unit 203).

In the following, an embodiment of the present invention is described as an improvement of the method of Non-Patent Document 3 that takes the image compression ratio into account.

FIG. 4 is a functional block diagram of the recognition device 10, which is the configuration used when the task is executed (at inference time) according to one embodiment. The recognition device 10 has an image conversion unit 11, a compression unit 21, and a task unit 13.

The image conversion unit 11 acts as an image obfuscator: it converts the image to be converted (an image provided by the user and subject to privacy protection) and outputs a privacy-protected image. The compression unit 21 compresses the privacy-protected image using an existing method and outputs a compressed privacy-protected image. The compression unit 21 can perform this compression according to predetermined compression settings specified by the user. The task unit 13 decodes the compressed privacy-protected image, performs a predetermined recognition task (for example, pose estimation or image recognition), and outputs a recognition result (for example, a pose estimation result or an image recognition result).

In the present embodiment, as in the conventional method of Non-Patent Document 3, the privacy-protected image (or the compressed privacy-protected image used to store or transmit it) has been converted into a privacy-protected (obfuscated) state, and is also an image for which the task unit 13 can maintain a certain level of recognition accuracy.

In the present embodiment, unlike the conventional method, the resulting image also compresses efficiently. When the compression unit 21 applies (lossy) compression to the uncompressed privacy-protected image obtained by the image conversion unit 11 to produce the compressed privacy-protected image, the file size can be kept small without greatly changing the quality relative to the image before compression.

For the recognition device 10 of FIG. 4 to execute the task, the image conversion unit 11, which is configured by a convolutional neural network or the like, must be trained in advance to obtain its weight parameters. FIG. 5 is a functional block diagram of the learning device 20 according to one embodiment, which is the configuration used to learn the weight parameters of the image conversion unit 11. As illustrated, the learning device 20 has an image conversion unit 11, a task unit 13, a first evaluation unit 14, a compression unit 21, an identification unit 23, and a second evaluation unit 24.

As indicated by the common reference numerals, the image conversion unit 11, the compression unit 21, and the task unit 13, which appear in both FIG. 4 and FIG. 5, have the same configuration in both figures. However, the weight parameters of the image conversion unit 11 are updated successively through learning by the learning device 20, and the image conversion unit 11 configured with the weight parameters obtained when learning is complete is the one used in the recognition device 10 of FIG. 4.

The task unit 13 common to FIGS. 4 and 5, on the other hand, is configured by any existing convolutional neural network or the like that executes a predetermined task (such as pose estimation) on an image. It has already been trained by the time the learning device 20 of FIG. 5 performs learning, so its weight parameters are fixed. (That is, the weight parameters of the task unit 13 are basically not learned or updated during learning by the learning device 20. If sufficient task accuracy cannot be maintained, however, fine-tuning may be performed to update the weight parameters of the task unit 13. In that case, task accuracy on ordinary images decreases, but task accuracy on the output images of the image converter improves.)

FIG. 6 is a flowchart of learning by the learning device 20 according to one embodiment, and shows a procedure for learning using GAN, an existing technique. The procedure of FIG. 6 performed by the learning device 20 consists of steps S11, S12, and S13, which correspond respectively to steps S101, S102, and S103 of FIG. 3 performed by the conventional learning configuration 200. Steps S11, S12, and S13 of FIG. 6 are equivalent to steps S101, S102, and S103 carried out with each unit of the conventional learning configuration 200 read as the corresponding unit of the learning device 20, as set out below. (Each step of FIG. 6 therefore corresponds to a step of FIG. 3 with the functional unit performing the processing read as that of FIG. 5 instead of that of FIG. 2, so redundant description is omitted.)

Specifically, writing the correspondence in the form "unit of the conventional learning configuration 200 before substitution → unit of the learning device 20 after substitution", the substitutions are: "target image generation unit 201 → compression unit 21", "conventional identification unit 203 → identification unit 23", "conventional second evaluation unit 204 → second evaluation unit 24", "conventional image conversion unit 101 → image conversion unit 11", "conventional task unit 103 → task unit 13", and "conventional first evaluation unit 104 → first evaluation unit 14". The GAN connection and the task connection used during learning are defined in the same way through these substitutions.

In the above correspondence, only the pair "target image generation unit 201 and compression unit 21" performs mutually different processing; all other pairs perform the same processing at each step of learning. In other words, in the present embodiment the learning device 20 is prepared by replacing the target image generation unit 201 of the conventional learning configuration 200 with the compression unit 21, and the steps of FIG. 6, which are the same as those of FIG. 3, are executed in the learning device 20. As a result, the image conversion unit 11 whose weight parameters are learned can output images that, as described for the recognition device 10 of FIG. 4, compress efficiently, are privacy-protected, and are also suited to recognition processing by the task unit 13.

Providing the compression unit 21 in the learning device 20 as described above (in place of the target image generation unit 201 used in the conventional method) is based on the following original insight. The compression unit 21 obtains a real image from a training image by applying lossy compression such as JPEG according to compression settings specified by the user. Because lossy compression entails degradation, a real image that has been lossily compressed to a smaller data size can itself be used, as it is, as a privacy-protected image.

Accordingly, through the flow of FIG. 6, which follows the adversarial learning framework, the weight parameters of the image conversion unit 11 can be learned together with those of the identification unit 23, its adversary, so that the image conversion unit 11 outputs images that resemble images compressed by the compression unit 21, and hence compress efficiently and protect privacy, while also being suited to recognition by the task unit 13.

FIG. 7 is a functional block diagram of the learning device 20 according to an embodiment different from that of FIG. 5, and FIG. 8 is a flowchart of learning by the learning device 20 according to one embodiment in the configuration of FIG. 7.

Whereas the embodiment of FIGS. 5 and 6 learns the weight parameters of the image conversion unit 11 using the adversarial learning framework, the embodiment of FIGS. 7 and 8 can learn them without it. Because the adversarial learning framework is not used, the learning device 20 of FIG. 7 has the configuration of FIG. 5 with the identification unit 23 removed.

The second evaluation unit 24 in the learning device 20 of FIG. 7 performs the following processing, which differs from its processing in FIG. 5 (evaluating the identification result of the identification unit 23). The second evaluation unit 24 of FIG. 7 reads the compressed image that the compression unit 21 obtains by compressing a training image and the privacy-protected image that the image conversion unit 11 obtains by converting the same training image, and uses a predetermined cost function to calculate a cost whose value increases as the difference between the two images increases. In one embodiment, the second evaluation unit 24 of FIG. 7 can calculate the cost as the mean squared error between the compressed image and the privacy-protected image (MSE: the sum of the squared pixel values of the difference image of the two images, divided by the number of pixels). Alternatively, the cost may be calculated as the sum of the absolute values of the difference image averaged over the number of pixels.
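The two cost functions named above can be sketched as follows. This is a minimal illustration, assuming single-channel images stored as equally sized lists of rows; the function names `mse_cost` and `mae_cost` are hypothetical and do not appear in the source.

```python
def _pixel_differences(compressed, protected):
    """Flatten the difference image of two equally sized images (lists of rows)."""
    return [c - p
            for row_c, row_p in zip(compressed, protected)
            for c, p in zip(row_c, row_p)]


def mse_cost(compressed, protected):
    """Mean squared error: sum of squared differences divided by the pixel count."""
    diffs = _pixel_differences(compressed, protected)
    return sum(d * d for d in diffs) / len(diffs)


def mae_cost(compressed, protected):
    """Mean absolute error: sum of absolute differences divided by the pixel count."""
    diffs = _pixel_differences(compressed, protected)
    return sum(abs(d) for d in diffs) / len(diffs)
```

Both costs are zero for identical images and grow as the privacy-protected image departs from the compressed image, which is the property the text requires of the cost function.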

The processing performed by the units of the learning device 20 of FIG. 7 other than the second evaluation unit 24, namely the compression unit 21, the image conversion unit 11, the task unit 13, and the first evaluation unit 14, is the same as in the learning device 20 of FIG. 5. Each step of the flow of FIG. 8 is described below.

Before the flow of FIG. 8 starts, initial values are set for the weight parameters of the image conversion unit 11. (As in the embodiment of FIGS. 5 and 6, the weight parameters of the task unit 13 have already been learned.) When the flow of FIG. 8 starts, the image conversion unit 11 is trained and its weight parameters are updated in step S21, after which the process proceeds to step S22. In step S21, the weight parameters of the image conversion unit 11 can be updated by the series of learning procedures (21) and then (22A) or (22B) below. Basically, either procedure (22A) or procedure (22B) is used. (Both may also be used.)

(21) A training image given as learning data is converted by the image conversion unit 11 to obtain a privacy-protected image, and the same training image is compressed by the compression unit 21 to obtain a compressed image. The second evaluation unit 24 evaluates the privacy-protected image and the compressed image to calculate a cost. In addition, the task unit 13 performs recognition on the privacy-protected image to obtain a recognition result, and the first evaluation unit 14 evaluates this recognition result by comparing it with the correct answer given as learning data, calculating a cost for the recognition result. The first evaluation unit 14 calculates this cost using a predetermined cost function such that the value is low when the recognition result is close to the correct answer and high when it is not (in the same way as the conventional first evaluation unit 104 of FIGS. 2 and 3, which is the same as the first evaluation unit 14 of FIGS. 5 and 6).

(22A) A plurality of training images are divided into mini-batches, the forward propagation calculation of the cost (error) is performed for each batch, and the cost output by the first evaluation unit 14 or the cost output by the second evaluation unit 24 is calculated for each batch. In the case of the GAN connection configuration (defined for FIG. 7, by analogy with FIG. 5, as the configuration of the image conversion unit 11, the compression unit 21, and the second evaluation unit 24, that is, the configuration of FIG. 5 without the identification unit 23), the cost output by the second evaluation unit 24 is used to calculate error backpropagation in the reverse direction, second evaluation unit 24 → image conversion unit 11, and the weight parameters of the image conversion unit 11 are updated using stochastic gradient descent or the like. In the case of the task connection configuration (defined for FIG. 7 in the same way as for FIG. 5), the cost output by the first evaluation unit 14 is used to calculate error backpropagation in the reverse direction, first evaluation unit 14 → task unit 13 → image conversion unit 11, and the weight parameters of the image conversion unit 11 are updated using stochastic gradient descent or the like. The weight parameters of the image conversion unit 11 are learned by alternating between backpropagation over the GAN connection configuration and backpropagation over the task connection configuration.

(22B) When a total cost is used, the forward propagation calculation of the cost (error) is performed on a plurality of training images (training images common to the GAN connection configuration and the task connection configuration). For each training image common to both connection configurations, a total cost is calculated as a predetermined weighted sum of the cost output by the first evaluation unit 14 and the cost output by the second evaluation unit 24. Using this total cost, error backpropagation is calculated in the reverse direction over both paths, second evaluation unit 24 → image conversion unit 11 (over the GAN connection configuration) and first evaluation unit 14 → task unit 13 → image conversion unit 11, and the weight parameters of the image conversion unit 11 may then be updated using stochastic gradient descent or the like.
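The difference between the two update schedules, alternating per-cost updates in (22A) and a single weighted-sum update in (22B), can be illustrated with a deliberately tiny toy. This is only a sketch under strong assumptions: a single scalar parameter `theta` stands in for all weights of the image conversion unit 11, and two quadratic functions stand in for the costs of the first and second evaluation units. The weights `w_task` and `w_sim`, the learning rate, and every function name are hypothetical.

```python
def grad_task(theta):
    """Gradient of a toy first cost (theta - 1)^2, standing in for the task cost."""
    return 2.0 * (theta - 1.0)


def grad_similarity(theta):
    """Gradient of a toy second cost (theta - 3)^2, standing in for the similarity cost."""
    return 2.0 * (theta - 3.0)


def train_alternating(theta, lr=0.1, steps=200):
    """(22A)-style schedule: alternate one update per cost."""
    for step in range(steps):
        grad = grad_similarity(theta) if step % 2 == 0 else grad_task(theta)
        theta -= lr * grad
    return theta


def train_total(theta, w_task=0.5, w_sim=0.5, lr=0.1, steps=200):
    """(22B)-style schedule: one update per step on the weighted sum of both costs."""
    for _ in range(steps):
        theta -= lr * (w_task * grad_task(theta) + w_sim * grad_similarity(theta))
    return theta
```

In this toy, `train_total` settles at the minimizer of the weighted sum, while `train_alternating` ends at a compromise value between the two individual minimizers that depends on the learning rate and on which cost was updated last.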

Through the update using the costs of procedure (22A) or (22B), in the embodiment of FIGS. 7 and 8 as well, just as when the adversarial learning framework of FIGS. 5 and 6 is used, the privacy-protected image obtained through conversion by the image conversion unit 11 is expected to resemble the image compressed by the compression unit 21, thereby satisfying the requirements of privacy protection and file size reduction, while also being suited to recognition by the task unit 13.

In step S22, it is determined whether the learning has converged. If it has converged, the weight parameters of the image conversion unit 11 at that point are obtained as the final learning result and the flow of FIG. 8 ends. If it has not converged, the process returns to step S21, so that the learning described above (step S21) continues. As in step S103 of FIG. 3 and step S13 of FIG. 6, the convergence determination in step S22 may, for example, use test images separate from the training images to calculate the total cost of procedure (22B), or the cost of the first evaluation unit 14 and the cost of the second evaluation unit 24 in procedure (22A), in order to evaluate the accuracy of the learning model; convergence is then judged by whether the improvement in that accuracy (the history of improvement) has leveled off. Alternatively, learning may simply be stopped after a predetermined number of epochs.
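One possible reading of "the improvement history has converged" is a patience-style check on the costs measured on the test images. The sketch below assumes that reading; the source does not specify a criterion, and the names `has_converged`, `patience`, and `min_delta` are hypothetical.

```python
def has_converged(cost_history, patience=3, min_delta=1e-3):
    """Return True when the best test cost of the last `patience` evaluations
    improves on the best earlier cost by less than `min_delta`."""
    if len(cost_history) <= patience:
        return False  # not enough history to judge yet
    best_before = min(cost_history[:-patience])
    best_recent = min(cost_history[-patience:])
    return best_before - best_recent < min_delta
```

The fixed-epoch alternative mentioned in the text needs no history at all: the loop simply stops once the epoch counter reaches the predetermined number.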

According to the embodiments of the present invention described above with reference to FIGS. 4 to 8 and so on, the weight parameters of the image conversion unit 11, which has a neural network structure, are learned using a first cost and a second cost. The first cost is obtained by the first evaluation unit 14 evaluating the recognition result of the task unit 13 for the privacy-protected image obtained by converting a training image with the image conversion unit 11. The second cost is obtained by the second evaluation unit 24 evaluating the similarity between the privacy-protected image and the compressed image obtained by converting the training image with the compression unit 21. The image conversion unit 11 obtained as the learning result can thereby output images that perform well on all three points: privacy protection, compression efficiency, and recognition performance in the task unit 13. As already described, learning with the first and second costs can be carried out so as to minimize each cost alternately, or so as to minimize the total cost obtained as their weighted sum.

Various supplementary and additional examples are described below.

(1) The compression unit 21 used in common in the embodiments of FIGS. 4 to 8 may be configured as follows. Compression by the compression unit 21 can be performed, for example, by image compression using a frequency transform (such as JPEG, which uses the DCT (discrete cosine transform) as its basis, or JPEG2000, which uses the wavelet transform as its basis), under compression settings specified by the user from the viewpoints (i) to (iii) below.

(i) For JPEG or JPEG2000 compression, the settings may be chosen so as to obtain a low-quality compressed image, compressed with a quality value of no more than half of the full range. With JPEG, quantization then sets many of the high-frequency components to 0. This makes it easier to protect privacy related to edges, such as clothing textures, and also raises the compression ratio.

(ii)-a The lowest-frequency component after the JPEG DCT or the JPEG2000 wavelet transform is set to the same value everywhere (e.g., preferably the mid value, which is 0.5 on a 0-1 scale). This removes most gradations and makes it easier to protect privacy relating to skin and the like. Reducing the information in the DCT coefficients also raises the compression ratio.

(ii)-b The lowest-frequency component after the JPEG DCT or the wavelet transform is restricted to a small number of values (e.g., 2 to 8 levels). This removes fine gradations and makes it easier to protect privacy relating to skin and the like. Reducing the information in the DCT coefficients also raises the compression ratio.

That is, in (ii)-a the lowest-frequency component after the transform is overwritten (its original value is replaced) with a single common value, and in (ii)-b the lowest-frequency component after the transform is coarsely quantized. Normally, low-frequency components are quantized finely rather than coarsely in order to preserve image quality; the distinctive feature here is that they are quantized coarsely.
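The effect of (ii)-a and (ii)-b can be sketched in the pixel domain, relying on the standard fact that the lowest-frequency (DC) coefficient of a block DCT is proportional to the mean of the block. Overwriting or coarsely quantizing that coefficient is therefore equivalent to overwriting or quantizing each block's mean. The sketch assumes a single-channel image on a 0-255 scale stored as a list of rows; `rewrite_block_means`, `quantize_level`, and their parameters are hypothetical names, and clipping to the valid pixel range is omitted.

```python
def quantize_level(value, levels, max_value=255.0):
    """Snap a value in [0, max_value] to the nearest of `levels` evenly spaced values."""
    step = max_value / (levels - 1)
    return round(value / step) * step


def rewrite_block_means(image, block=8, levels=None, target=127.5):
    """Rewrite the mean (DC level) of every block x block tile.

    levels=None -> (ii)-a: every tile gets the same mean `target`.
    levels=k    -> (ii)-b: tile means are coarsely quantized to k values.
    """
    height, width = len(image), len(image[0])
    out = [[float(p) for p in row] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            coords = [(y, x)
                      for y in range(top, min(top + block, height))
                      for x in range(left, min(left + block, width))]
            mean = sum(image[y][x] for y, x in coords) / len(coords)
            new_mean = target if levels is None else quantize_level(mean, levels)
            for y, x in coords:
                # Keep the AC detail (deviation from the mean), replace the DC level.
                out[y][x] = image[y][x] - mean + new_mean
    return out
```

With `levels=None`, a dark tile and a bright tile end up with the same average brightness, which is the loss of large-scale gradation that the text associates with protecting skin-related information.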

(iii) The frequency bases to be used are selected, and/or their intensity is changed. When bases are selected, the transform coefficients of the predetermined bases that were not selected are deleted. When the intensity (transform coefficient) is changed, the transform coefficients of predetermined bases are either forcibly overwritten with a constant value or have their absolute values changed. For example, for DCT components other than the DC component, reducing the absolute value of a coefficient weakens its intensity.
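Basis selection and intensity change as described in (iii) can be sketched as an edit on one block of transform coefficients. The sketch assumes an 8×8 block of DCT coefficients indexed by (u, v) with the DC component at (0, 0); the function name `edit_coefficients` and the parameters `keep` and `gain` are hypothetical.

```python
def edit_coefficients(coeffs, keep, gain=1.0):
    """Select bases and scale their intensity.

    coeffs: square list of rows of DCT coefficients, DC at coeffs[0][0].
    keep:   set of (u, v) basis indices to retain; all others are zeroed.
    gain:   factor applied to retained AC coefficients (gain < 1 weakens them).
    """
    edited = []
    for u, row in enumerate(coeffs):
        new_row = []
        for v, c in enumerate(row):
            if (u, v) not in keep:
                new_row.append(0.0)        # unselected basis: coefficient deleted
            elif (u, v) == (0, 0):
                new_row.append(float(c))   # DC component left unscaled
            else:
                new_row.append(gain * c)   # AC component: intensity changed
        edited.append(new_row)
    return edited
```

Overwriting a coefficient with a constant value, the other option named in the text, would replace the `gain * c` branch with that constant.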

Basically, whichever of (i) to (iii) are least likely to reduce task accuracy should be selected and combined as appropriate. For general action recognition or image recognition tasks, (i) and (iii), which preserve motion and gradation, are likely to be suitable, while (ii) is considered suitable for tasks that extract keypoints of the face or skeleton. If the compatibility between the task and (i) to (iii) is unknown, several frequency bases may be selected in (iii). For example, the components may be divided into low-frequency, high-frequency, and intermediate-frequency components, the obfuscator (image conversion unit) may be trained with each band alone or with the frequency components of any two bands only, and the frequency components to be used may be decided after measuring how much task accuracy degrades in each case. The number of bands is not limited to three.
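The band-wise experiment described above can be sketched as enumerating the candidate band combinations and turning each into a basis mask. How the bands are delimited is not stated in the source; the sketch assumes, purely for illustration, that an 8×8 DCT basis (u, v) is assigned to a band by the sum u + v. All names are hypothetical.

```python
from itertools import combinations


def band_of(u, v, size=8):
    """Assign basis (u, v) to 'low', 'mid', or 'high' by u + v (an assumed split)."""
    limit = 2 * (size - 1)          # largest possible u + v
    s = u + v
    if s <= limit / 3:
        return "low"
    if s <= 2 * limit / 3:
        return "mid"
    return "high"


def candidate_band_sets(bands=("low", "mid", "high")):
    """Each band alone, plus every pair of bands, as described in the text."""
    return [set(c) for r in (1, 2) for c in combinations(bands, r)]


def basis_mask(selected, size=8):
    """1 where the basis belongs to a selected band, 0 where it is dropped."""
    return [[1 if band_of(u, v, size) in selected else 0 for v in range(size)]
            for u in range(size)]
```

For three bands this yields six candidate configurations; one obfuscator would be trained per configuration, and the configuration with the smallest loss of task accuracy would then be chosen.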

For example, it is known that when faces are photographed at the same size, spatial frequency analysis using the FFT (fast Fourier transform) shows a high correlation between spatial-frequency intensity and age or gender, as disclosed in the following patent document and URL ("Relationship between facial expressions and impressions in facial images and spatial frequency characteristics").
Japanese Patent No. 05827225 (Japanese Patent Application No. 2012-521378)
https://www.jstage.jst.go.jp/article/itej/69/11/69_836/_pdf/-char/ja

Based on this idea, the intensity of the frequencies used for compression may be changed so as to conceal age or gender. For example, it is known that a face tends to be judged female when low-frequency intensity is high and male when high-frequency intensity is high. Privacy could therefore be protected by, for example, strengthening the high-frequency components of images of women in stages so that they appear more masculine. Such compressed images may be prepared for the entire set of training images.

(2) From the viewpoint of privacy protection and a higher compression ratio, images whose dynamic range (number of gradation levels) has been reduced in advance may be used as training images. For example, the dynamic range may be reduced while keeping the average pixel value near 128, or the 256 gradation levels of each of R, G, and B may be reduced to 8 levels. The number of colors may also be reduced, or colors that are conspicuous on people, such as skin tones, may be converted to other colors (e.g., blue, purple, or green).
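The two reductions mentioned in (2) can be sketched per pixel value as follows, assuming 8-bit channel values in 0 to 255. The function names, the `scale` factor of 0.5, and the choice of mapping each of the 8 levels to the center of its bucket are illustrative assumptions and are not taken from the source.

```python
def compress_range(value, scale=0.5, center=128):
    """Shrink the dynamic range around `center`, keeping mid-gray pixels near 128."""
    return int(round(center + scale * (value - center)))


def reduce_levels(value, levels=8, max_value=255):
    """Map an 8-bit value to one of `levels` values (the center of its bucket)."""
    step = (max_value + 1) // levels          # 32 for 8 levels
    bucket = min(value // step, levels - 1)
    return bucket * step + step // 2
```

Applying `reduce_levels` to each of the R, G, and B channels leaves at most 8 distinct values per channel.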

When the image conversion unit 11 is trained while the learned parameters of the task unit 13 are also updated further (that is, when the fine-tuning described above is performed), the generated image may be input to the task as it is. Otherwise, the original dynamic range and number of colors should be restored before the task is executed. Similarly, to protect image privacy and reduce data, the number of colors may be reduced, colors that are conspicuous on people, such as skin tones, may be converted to other colors (e.g., blue, purple, or green), or the dynamic range may be reduced before compression. In that case, the original number of colors, the original colors, and the original dynamic range are restored before the task is executed.

(3) When the task is pose estimation, there are various applications that protect privacy while keeping pose estimation possible, including the following.
・When images of a person exercising at home are sent to a server, analyzed by pose estimation, and used to give advice, the privacy of the person, skin, clothing, room, and so on in the images taken at home can be protected.
・When video of the outside of a vehicle captured by a drive recorder is sent to a server, the privacy of pedestrians can be protected while AI (artificial intelligence) remains able to recognize pedestrian postures such as a raised hand or a fall.
・When video of the outside of vehicles captured by drive recorders and collected on a server is transferred or published at a wider disclosure level, the privacy of pedestrians can be protected while AI remains able to recognize pedestrian postures such as a raised hand or a fall.
・Video of the inside of a vehicle captured by a drive recorder can be sent to a server or published with the privacy of the driver and passengers protected, while the actions of the driver and passengers (talking on a mobile phone, facing backward, and so on) remain detectable.

(4) The image conversion unit 11 used in common in the embodiments of FIGS. 4 to 8 may be configured as follows.

A convolution layer whose kernels are the frequency bases intended for compression (the same frequency bases that the compression unit 21 uses for compression) is inserted into an intermediate layer or the output layer of the network constituting the image conversion unit 11. For example, the 8×8 DCT bases are used as kernels. (The convolution layer with the frequency-basis kernels may be placed in only part of a given layer rather than all of it.) Setting the stride to the compression block size of the compression unit 21 is expected to make it easier for learning to produce a network that converts images into easily compressible ones. The bases may be selected in advance, or the combination of bases selected in a network that experimentally achieved high task accuracy and a high compression ratio may be identified afterwards. The intensity may also be changed. Intensity adjustment is achieved, for example, by multiplying the kernel values by a scalar in advance.
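A fixed DCT-basis kernel applied with a stride equal to the block size can be sketched as follows. This is a plain-Python illustration, not a neural network layer: it assumes the orthonormal DCT-II basis commonly associated with JPEG, a single-channel image stored as a list of rows, and hypothetical function names.

```python
import math


def dct_kernel(u, v, size=8):
    """size x size orthonormal DCT-II basis for vertical index u and horizontal index v."""
    def alpha(k):
        return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)
    return [[alpha(u) * alpha(v)
             * math.cos((2 * y + 1) * u * math.pi / (2 * size))
             * math.cos((2 * x + 1) * v * math.pi / (2 * size))
             for x in range(size)]
            for y in range(size)]


def strided_conv(image, kernel, stride):
    """Slide the fixed kernel over the image with the given stride (no padding)."""
    size = len(kernel)
    out = []
    for top in range(0, len(image) - size + 1, stride):
        row = []
        for left in range(0, len(image[0]) - size + 1, stride):
            row.append(sum(kernel[y][x] * image[top + y][left + x]
                           for y in range(size) for x in range(size)))
        out.append(row)
    return out
```

With `stride=8`, each output value is the (u, v) DCT coefficient of one non-overlapping 8×8 block, so the layer sees the image in the same block grid as a block-based compressor; using all 64 kernels gives 64 output channels. Multiplying a kernel by a scalar beforehand corresponds to the intensity adjustment mentioned in the text.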

The kernel values may be scaled by 1/(table value), using the values of a quantization table, such as that of JPEG, for each quality setting. Because the table values span a wide range, they are preferably reduced to 2 to 8 levels beforehand. Quantizing afterward with a step-shaped activation function that produces 2 to 8 levels makes it possible to simulate actual JPEG compression to some extent within the image-generating neural network.
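The scaling and step-shaped activation described above might be sketched as follows. The 2×2 table is a made-up illustration rather than an actual JPEG table, and the bucketing rule in `reduce_levels`, the clamping range of `step_activation`, and all function names are assumptions introduced only for this example.

```python
def reduce_levels(table, levels=4):
    """Coarsen a quantization table to `levels` representative values.

    Values are bucketed into equal-width bins between the table minimum and
    maximum, and each value is replaced by the midpoint of its bin.
    """
    flat = [v for row in table for v in row]
    lo, hi = min(flat), max(flat)
    step = (hi - lo) / levels

    def bucket(v):
        if step == 0:
            return float(lo)
        i = min(int((v - lo) / step), levels - 1)
        return lo + (i + 0.5) * step

    return [[bucket(v) for v in row] for row in table]


def kernel_scales(table):
    """Per-frequency scalar 1 / (table value) to multiply into each kernel."""
    return [[1.0 / q for q in row] for row in table]


def step_activation(x, levels=4, lo=-1.0, hi=1.0):
    """Staircase activation: clamp x to [lo, hi] and snap it to one of
    `levels` evenly spaced output values."""
    x = max(lo, min(hi, x))
    step = (hi - lo) / (levels - 1)
    return lo + round((x - lo) / step) * step


toy_table = [[16, 11], [12, 61]]             # hypothetical values for illustration
coarse = reduce_levels(toy_table, levels=2)  # two representative values
scales = kernel_scales(coarse)
```

A staircase function has zero gradient almost everywhere, so an actual training setup would need some way of passing gradients through it; how that is handled is not stated in the description and is left open here.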

During learning, the weights of the inserted frequency-basis kernels alone are not updated. That is, they are held at fixed values, and only the weights of the other layers are updated.
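The idea of excluding the fixed kernels from the weight update can be sketched with a toy gradient-descent step. The layer dictionary, the `trainable` flag, and the plain SGD rule are assumptions made for illustration; the description does not specify the optimizer.

```python
def sgd_step(layers, grads, lr=0.1):
    """Apply one gradient-descent update, skipping layers marked as fixed."""
    for name, layer in layers.items():
        if not layer["trainable"]:
            continue  # frequency-basis kernels keep their preset values
        layer["weights"] = [w - lr * g
                            for w, g in zip(layer["weights"], grads[name])]


layers = {
    "conv1":     {"weights": [0.5, -0.2], "trainable": True},
    "dct_fixed": {"weights": [0.125, 0.125], "trainable": False},
}
grads = {"conv1": [1.0, -1.0], "dct_fixed": [3.0, 3.0]}
sgd_step(layers, grads)
```

Gradients still flow through the fixed layer to the layers before it; only its own weights are left untouched.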

An up-convolution layer corresponding to inverse quantization and inverse DCT (a layer whose weights are fixed and not updated by learning) may be inserted downstream of the inserted DCT layer (the layer containing the frequency-basis kernels), though not necessarily immediately after it. With the up-convolution layer, the output returns from the frequency domain to the spatial domain, which makes privacy easier to check visually. In addition, the image converter that simulates JPEG compression can be connected directly to the task, so the task error can be backpropagated to the image converter during learning, and the effect of JPEG compression can be expected to be estimated correctly. JPEG compression applied after the image converter is normally not considered during learning, but this arrangement is expected to allow its effect to be taken into account with reasonable accuracy.
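The round trip through a DCT layer and a fixed up-convolution layer can be sketched for a single 8×8 block as follows. The sketch assumes an orthonormal DCT-II basis and a uniform quantization step of 16 for every frequency. The function names and the uniform table are hypothetical; a real JPEG table varies per frequency.

```python
import math

N = 8  # assumed block size


def dct_basis(u, v, n=N):
    """Return the n x n orthonormal DCT-II basis kernel for frequency index (u, v)."""
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[c(u) * c(v)
             * math.cos((2 * y + 1) * u * math.pi / (2 * n))
             * math.cos((2 * x + 1) * v * math.pi / (2 * n))
             for x in range(n)] for y in range(n)]


BASES = [[dct_basis(u, v) for v in range(N)] for u in range(N)]


def forward_block(block, q):
    """DCT of one block followed by quantization with step q[u][v]."""
    return [[round(sum(BASES[u][v][y][x] * block[y][x]
                       for y in range(N) for x in range(N)) / q[u][v])
             for v in range(N)] for u in range(N)]


def up_convolution(levels, q):
    """Fixed up-convolution: dequantize each coefficient, then add the
    corresponding basis kernel weighted by it (inverse DCT)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            coeff = levels[u][v] * q[u][v]
            if coeff == 0:
                continue
            k = BASES[u][v]
            for y in range(N):
                for x in range(N):
                    out[y][x] += coeff * k[y][x]
    return out


q = [[16.0] * N for _ in range(N)]  # hypothetical uniform quantization step
ramp = [[float(4 * (x + y)) for x in range(N)] for y in range(N)]
restored = up_convolution(forward_block(ramp, q), q)  # spatial-domain block
```

Since the basis is orthonormal, the squared reconstruction error in the spatial domain equals the squared quantization error of the coefficients, so the output of this layer shows directly, as an image, what the simulated compression discards.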

(5) FIG. 9 is a diagram showing the hardware configuration of a general computer device 70. The recognition device 10 and the learning device 20 of each of the embodiments of FIGS. 4 to 8 can each be realized as one or more computer devices 70 having this configuration. The computer device 70 includes: a CPU (central processing unit) 71 that executes predetermined instructions; a dedicated processor 72 (a GPU (graphics processing unit), a processor dedicated to deep learning, or the like) that executes some or all of the instructions of the CPU 71 in place of, or in cooperation with, the CPU 71; a RAM 73 serving as a main storage device that provides a work area to the CPU 71 and the dedicated processor 72; a ROM 74 serving as an auxiliary storage device; a communication interface 75; a display 76; an input interface 77 that accepts user input through a mouse, keyboard, touch panel, or the like; and a bus BS for exchanging data among these components.

Each unit of the recognition device 10 and the learning device 20 can be realized by the CPU 71 and/or the dedicated processor 72 reading from the ROM 74 and executing a predetermined program corresponding to the function of that unit. Likewise, the learning method of the learning device 20 can be carried out by the CPU 71 and/or the dedicated processor 72 reading from the ROM 74 and executing a predetermined program corresponding to each step of FIG. 6 or FIG. 8.

10 ... recognition device, 20 ... learning device
21 ... compression unit, 23 ... identification unit, 24 ... second evaluation unit, 11 ... image conversion unit, 13 ... task unit, 14 ... first evaluation unit

Claims

1. A learning method for learning weight parameters of image conversion processing that uses a neural network structure, the method comprising:
learning the weight parameters of the image conversion processing using a first cost for a recognition result obtained by recognizing, in task processing that recognizes a predetermined task, a privacy-protected image obtained by converting a training image in the image conversion processing, and
a second cost for a similarity evaluation result between a compressed image obtained by compressing the training image and the privacy-protected image,
wherein the second cost is evaluated based on a difference between the compressed image and the privacy-protected image.

2. A learning method for learning weight parameters of image conversion processing that uses a neural network structure, the method comprising:
learning the weight parameters of the image conversion processing using a first cost for a recognition result obtained by recognizing, in task processing that recognizes a predetermined task, a privacy-protected image obtained by converting a training image in the image conversion processing, and
a second cost for a similarity evaluation result between a compressed image obtained by compressing the training image and the privacy-protected image,
wherein the method uses identification processing based on a neural network structure that learns to distinguish real from fake by identifying the compressed image as a real image and the privacy-protected image as a fake image, and
the method further comprises learning by a generative adversarial network such that the identification processing improves its accuracy in distinguishing real from fake and the image conversion processing improves its accuracy in deceiving the identification processing.

3. The learning method according to claim 1, wherein the compression processing uses a discrete cosine transform or a wavelet transform.

4. The learning method according to any one of claims 1 to 3, wherein the compression processing includes performing a frequency transform using transform bases, and replacing the lowest frequency component of the frequency-transformed result with a constant value or quantizing it.

5. The learning method according to any one of claims 1 to 4, wherein the compression processing includes performing a frequency transform using transform bases and deleting the transform coefficients of predetermined transform bases.

6. A learning method for learning weight parameters of image conversion processing that uses a neural network structure, the method comprising:
learning the weight parameters of the image conversion processing using a first cost for a recognition result obtained by recognizing, in task processing that recognizes a predetermined task, a privacy-protected image obtained by converting a training image in the image conversion processing, and
a second cost for a similarity evaluation result between a compressed image obtained by compressing the training image and the privacy-protected image,
wherein the compression processing includes performing a frequency transform using transform bases, and
the layers of the neural network structure in the image conversion processing include a fixed convolution layer that has, as its kernels, the transform bases used in the compression processing and that is not updated by learning.

7. The learning method according to claim 6, wherein the stride of the fixed convolution layer is set to match the compression block size in the compression processing.

8. The learning method according to any one of claims 1 to 7, wherein learning the weight parameters of the image conversion processing includes alternately minimizing the first cost and the second cost, or minimizing a total cost calculated from the first cost and the second cost.

9. A learning device for learning weight parameters of image conversion processing that uses a neural network structure,
the learning device learning the weight parameters of the image conversion processing using a first cost for a recognition result obtained by recognizing, in task processing that recognizes a predetermined task, a privacy-protected image obtained by converting a training image in the image conversion processing, and
a second cost for a similarity evaluation result between a compressed image obtained by compressing the training image and the privacy-protected image,
wherein the second cost is evaluated based on a difference between the compressed image and the privacy-protected image.

10. A learning device for learning weight parameters of image conversion processing that uses a neural network structure,
the learning device learning the weight parameters of the image conversion processing using a first cost for a recognition result obtained by recognizing, in task processing that recognizes a predetermined task, a privacy-protected image obtained by converting a training image in the image conversion processing, and
a second cost for a similarity evaluation result between a compressed image obtained by compressing the training image and the privacy-protected image,
wherein the learning device uses identification processing based on a neural network structure that learns to distinguish real from fake by identifying the compressed image as a real image and the privacy-protected image as a fake image, and
the learning device further performs learning by a generative adversarial network such that the identification processing improves its accuracy in distinguishing real from fake and the image conversion processing improves its accuracy in deceiving the identification processing.

11. A learning device for learning weight parameters of image conversion processing that uses a neural network structure,
the learning device learning the weight parameters of the image conversion processing using a first cost for a recognition result obtained by recognizing, in task processing that recognizes a predetermined task, a privacy-protected image obtained by converting a training image in the image conversion processing, and
a second cost for a similarity evaluation result between a compressed image obtained by compressing the training image and the privacy-protected image,
wherein the compression processing includes performing a frequency transform using transform bases, and
the layers of the neural network structure in the image conversion processing include a fixed convolution layer that has, as its kernels, the transform bases used in the compression processing and that is not updated by learning.

12. A program for causing a computer to execute the learning method according to any one of claims 1 to 8.