JP7453900B2

JP7453900B2 - Learning method, image conversion device and program

Info

Publication number: JP7453900B2
Application number: JP2020209665A
Authority: JP
Inventors: 絵美明堂; 和之田坂; 茂之酒澤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2024-03-21
Anticipated expiration: 2040-12-17
Also published as: JP2022096519A

Description

The present invention relates to a learning method, an image conversion device, and a program for learning an image conversion process that can protect the privacy of images while maintaining task accuracy.

When image or audio data that may contain a user's private information is sent to the cloud and analyzed with machine learning such as neural networks, invasion of the user's privacy must be prevented. For example, with audio data, if the service provider were to listen to smart speaker recordings sent to the cloud, privacy could be violated as a result, even if the listening served a technical purpose such as improving machine learning accuracy.

Such smart speakers generally protect user privacy against eavesdropping on the communication channel by encrypting the data. However, because the encrypted data is decrypted on the cloud side, the situation described above can still occur.

There are also methods that perform processing such as retraining and fine-tuning on the cloud side while the data remains encrypted, as disclosed in the news release "Realizing the world's first secure computation technology that can perform standard training processing for deep learning while encrypted" at the following URL.
https://www.ntt.co.jp/news2019/1909/190902a.html

The first problem with this approach is that the service provider cannot inspect the image or audio data by human perception. In practice, some work is best done by human perception, such as investigating the cause of a problem or a machine learning error. For example, the service provider may want to handle user complaints or visually weed out poisoned data, but such checks become difficult. As an example of a user complaint, in a service that returns pose estimation results from images, a user may report that a pose could not be detected. With encrypted information such as scrambled images, the shape of the person cannot be seen, so it is impossible to visually confirm which poses failed to be detected. Furthermore, consider collecting only the obfuscated images for which pose detection failed and tuning the pose estimator again: scrambled images cannot be annotated by eye, and if users are asked to annotate them out of consideration for privacy, incorrect annotations that hinder learning are likely to be mixed in.

Another example of work that calls for human perception is surveillance in a monitoring room, where there is a demand to see what is happening. When unexpected trouble caused by a person occurs, a security guard has difficulty deciding on the next action (such as going to the site) unless the person's shape and movement can be followed well enough to judge visually what posture the person is in, to what degree they are not moving, and so on.

The second problem is that, because the raw data is still contained even though it is encrypted, users tend to worry that the raw data may leak through an attack or an operational error.

For image data, on the other hand, privacy has conventionally been protected by applying image processing such as blurring or replacement to sensitive information regarded as private. Users gain the reassurance of not providing raw data, but the accuracy of the service provider's image analysis task tends to degrade severely.

In recent years, there have also been attempts to generate privacy-protected images while degrading the analysis accuracy of machine learning tasks, such as those using neural networks, as little as possible. With such methods, cloud administrators and service providers can still judge images by perception while task accuracy is maintained to some degree, and user privacy is also protected. Because users need not send the original image, such methods are expected to lower the psychological barrier to using the service.

For example, the method of Patent Document 1 estimates facial parts and face orientation and replaces the face with an avatar, which protects privacy while maintaining the accuracy of driving-related behavior recognition. Similarly, the method of Non-Patent Document 1 regenerates the face region with a GAN (generative adversarial network) as a face different from the person's own, which protects privacy while maintaining the accuracy of action recognition.

The methods of Patent Document 1 and Non-Patent Document 1 replace a partial privacy region of the image, such as the face, and do not consider the privacy of the image as a whole. For example, things unnecessary to the service, such as the clothes worn, skin texture, and the state of the room, are not removed, so these methods cannot meet a demand to remove the overall realism of the image.

As a method that can remove or reduce the overall realism, the method of Non-Patent Document 2 performs action recognition on video using low-resolution images. Low resolution has the advantage of reducing image file size. However, it is difficult to perform tasks other than simple action recognition on merely low-resolution video, so the applicable tasks are limited.

Non-Patent Document 3, on the other hand, uses an adversarial learning framework to train a model that transforms the entire original image so that it approaches a target image into which a large amount of random noise has been inserted. Using the adversarial learning framework makes it possible to train an image conversion model whose output is close to the noisy target image while maintaining task accuracy. Example tasks include image recognition with up to about 20 classes and recognition of facial parts.

With this method, elements unnecessary for the task analysis tend to be hidden across the entire converted image, which makes it easier to meet privacy demands such as removing the overall realism. At the same time, the method keeps the degradation of task accuracy small.

[Patent Document 1] Japanese Translation of PCT International Application Publication No. 2018-528536 (特表2018-528536号公報)

[Non-Patent Document 1] Ren, Zhongzheng, Yong Jae Lee, and Michael S. Ryoo. "Learning to anonymize faces for privacy preserving action detection." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[Non-Patent Document 2] Ryoo, Michael S., et al. "Privacy-preserving human activity recognition from extreme low resolution." Thirty-First AAAI Conference on Artificial Intelligence. 2017.
[Non-Patent Document 3] Kim, Tae-hoon, et al. "Training with the Invisibles: Obfuscating Images to Share Safely for Learning Visual Recognition Models." arXiv preprint arXiv:1901.00098 (2019).

However, conventional methods such as that of Non-Patent Document 3 leave room for improvement in securing task accuracy and ease of handling, for example during visual inspection, while protecting privacy. Specifically, because the converted output image was assumed to be RGB, it was unclear whether color and gradation information could be reduced enough to protect privacy adequately. In addition, in Non-Patent Document 3 the random noise is visually distracting, and generating video by converting images frame by frame causes visible flicker, so there were problems with ease of handling. A privacy-protected image can be obtained as an information-reduced image in which the information of the original image has been reduced, and such an image can also be used for purposes other than privacy protection; however, conventional methods could not produce an information-reduced image that both maintains task accuracy and is easy to handle.

In view of the above problems of the prior art, an object of the present invention is to provide a learning method for an image conversion process that yields an easy-to-handle information-reduced image while maintaining task accuracy and achieving privacy protection and the like, as well as an image conversion device and a program corresponding to the method.

To achieve the above object, the present invention is a learning method for learning the parameters of an image conversion process having a neural network structure. The parameters of the image conversion process are learned using (a) a parameter update based on an error calculated from the degree of information reduction evaluated for an information-reduced image obtained by converting a training image with the image conversion process, and (b) a parameter update based on an error calculated from the result of recognizing the information-reduced image with a task process that performs recognition for a predetermined task. The image conversion process converts the training image into an information-reduced image having a gradation information channel and a contour information channel, in which the gradation information and the contour information of the training image are respectively extracted, and the parameter update based on the degree of information reduction evaluates the degree of information reduction in the image of the gradation information channel and in the image of the contour information channel.

The present invention is also characterized as any of the following: an image conversion device that obtains an information-reduced image by converting an input image using the parameters of the image conversion process learned by the above learning method; an image conversion device that obtains an information-reduced image by converting an input image through an image conversion process having a neural network structure, in which the information-reduced image is produced so as to have a gradation information channel and a contour information channel in which the gradation information and the contour information of the input image are respectively extracted; a program that causes a computer to execute the above learning method; or a program that causes a computer to function as the above image conversion device.

According to the present invention, learning to obtain an information-reduced image in which the original image is converted into one having a gradation information channel and a contour information channel makes it possible to obtain an easy-to-handle information-reduced image while maintaining task accuracy and achieving privacy protection and the like.

FIG. 1 is a functional block diagram of a task execution device and a learning device according to one embodiment.
FIG. 2 is a diagram for explaining processing by the conversion NW unit of the image conversion unit.
FIG. 3 is a diagram showing an example of processing in the conversion NW unit.
FIG. 4 is a diagram showing an example of processing by the task execution device as three example images.
FIG. 5 is a flowchart of learning by the learning device according to one embodiment.
FIG. 6 is a diagram showing an example of a staircase function with vertical steps used in forward propagation and a staircase function with sloped steps used in backpropagation, obtained by smoothing the former so that it has a constant gradient.
FIG. 7 is a diagram showing examples of various quantization functions and the like.
FIG. 8 is a diagram showing an experimental example (questions and answers) of the privacy protection effect according to an embodiment of the present invention.
FIG. 9 is a diagram showing the hardware configuration of a general computer.

FIG. 1 is a functional block diagram of a task execution device and a learning device according to one embodiment. As illustrated, the task execution device 10 includes an image conversion unit 1, which comprises a conversion network (NW) unit 11 and a quantization unit 12, and a task unit 13. In the conversion NW unit 11 and the task unit 13, the task execution device 10 contains convolutional neural networks or multilayer perceptrons of predetermined structure (hereinafter "convolutional neural networks or the like"). Using parameters learned in advance, it can perform image conversion and, as inference processing, task execution on images.

The configuration for learning the parameters of the task execution device 10 is a learning device 30, which comprises the task execution device 10 itself, whose parameters are to be updated by learning, and a learning unit 20. The learning unit 20 includes a privacy update unit 2, which comprises a gradation update unit 21 and a contour update unit 22, and a task update unit 23.

Through the learning executed by the learning device 30, the parameters of the conversion NW unit 11 in the image conversion unit 1 are updated iteratively, and the image conversion unit 1 configured with the parameters at the completion of learning becomes usable as the image conversion unit 1 of the task execution device 10. In this parameter update, as indicated by lines L11 and L12 in FIG. 1, the weight parameters of the conversion NW unit 11 are updated by error backpropagation in the orders "gradation update unit 21 → quantization unit 12 → conversion NW unit 11" and "contour update unit 22 → quantization unit 12 → conversion NW unit 11". When fine-tuning is performed, the parameters of the task unit 13 are likewise updated iteratively, and the task unit 13 configured with the parameters at the completion of learning becomes usable as the task unit 13 of the task execution device 10. In this parameter update, as indicated by line L23 in FIG. 1, the weight parameters of the task unit 13 and the conversion NW unit 11 are updated by error backpropagation in the order "task update unit 23 → task unit 13 → quantization unit 12 → conversion NW unit 11". (When fine-tuning is not performed, the weight parameters of the task unit 13 are fixed and not updated, and only the weight parameters of the conversion NW unit 11 are updated.)

In the following, the parameters of the task execution device 10 are assumed to have been learned by the learning device 30. The details of the learning process are described later; the processing of the task execution device 10 when performing inference is described first.

In FIG. 1, among the arrows representing data exchange between functional blocks, solid arrows relate to both the inference processing by the task execution device 10 and the learning processing by the learning device 30 (forward propagation during learning), and dash-dotted arrows relate to the learning processing by the learning device 30 (error backpropagation during learning).

The task execution device 10 may be operated as follows. The image conversion unit 1 may be placed on the side of the user who acquires the image to be privacy-protected, and the task unit 13 may be placed on the cloud side, which executes the task using the privacy-protected image. In that case, the image conversion unit 1 and the task unit 13 communicate over a network, and the privacy-protected image is transmitted from the image conversion unit 1 to the task unit 13. Alternatively, both the image conversion unit 1 and the task unit 13 may be placed on the cloud side: the image to be privacy-protected (the original image) is encrypted and sent to the cloud, converted by the image conversion unit 1, and then deleted so that it is not stored on the cloud side. In this way, the task can be performed while privacy is protected.

The image conversion unit 1 of the task execution device 10 is built from convolutional neural networks or the like. Once its parameters (such as the neural network weights) have been learned in advance, it can convert an original image and output a privacy-protected image. The task unit 13 of the task execution device 10 is likewise built from convolutional neural networks or the like of predetermined structure that execute a predetermined task on an image (object recognition, pose estimation, and so on). Once its parameters have been learned in advance, it can execute the task on an image and output the result. For the task unit 13, any existing trained neural network suited to the task may be used. In that case, weight parameters already trained on an existing data set may be used, or weight parameters that have been retrained (fine-tuned) may be used to improve task accuracy on the privacy-protected images output by the image conversion unit 1. The quantization unit 12 can be used as a simple means of adjusting the degree of privacy protection of the image conversion unit 1 (which, viewed from the opposite criterion, is the degree of privacy leakage).

An overview of each unit is given below, mainly with respect to inference.

The image conversion unit 1 serves as an image obfuscator. It converts the image to be converted (the image provided by the user and subject to privacy protection) and outputs a privacy-protected image in which privacy is protected across the entire image. To do so, the image conversion unit 1 processes the input image in the conversion NW unit 11 and then in the quantization unit 12, in that order.

FIG. 2 is a diagram for explaining processing by the conversion NW unit 11 of the image conversion unit 1. The conversion NW unit 11 can be configured as a deep learning network, and FIG. 2 depicts it schematically as a network structure having an input stage Sin and an output stage Sout. The conversion NW unit 11 reads, at the input stage Sin, an input image Pin=(Pa,Pb,Pc) configured as an ordinary color image (the images Pa, Pb, Pc are, for example, the R, G, and B channel images of an input image Pin configured as an RGB image). At the output stage Sout it obtains an output image Pout=(P1,P2), which is the input image with privacy protection applied, and outputs this image to the quantization unit 12.

The output image Pout=(P1,P2) consists of two channels: a first image P1 in the first channel CH1 and a second image P2 in the second channel CH2. For example, the first image P1 is a gradation image in which gradation information is extracted from the input image Pin by flattening, and the second image P2 is a contour image in which contour information is extracted from the input image Pin by edge extraction. This two-channel output image Pout=(P1,P2) applies privacy protection to the input image Pin. (Quantizing the output image Pout in the subsequent quantization unit 12 yields an image with stronger privacy protection, but the conversion NW unit 11 alone already provides a certain level of privacy protection.)
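To make the two-channel layout concrete, the following is a minimal sketch that builds a gradation channel and a contour channel with fixed, hand-written filters. It is not the learned conversion NW unit 11: the box blur, the central-difference edge filter, and the function names `to_gray`, `box_blur`, `edge_magnitude`, and `two_channel` are all assumptions chosen for illustration, and images are assumed to be nested lists of values in [0, 1].

```python
import math

def to_gray(rgb):
    """Average the R, G, B values of each pixel (assumed to lie in [0, 1])."""
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in rgb]

def box_blur(gray, radius=1):
    """Flatten gradation with a mean filter; a stand-in for learned flattening."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

def edge_magnitude(gray):
    """Gradient magnitude from clamped central differences, clipped to [0, 1]."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            out[y][x] = min(1.0, math.hypot(gx, gy))
    return out

def two_channel(rgb):
    """Return (P1, P2): a gradation channel and a contour channel, no color."""
    gray = to_gray(rgb)
    return box_blur(gray), edge_magnitude(gray)
```

The point of the sketch is only the output shape: three color channels go in, and two color-free channels come out. In the described method the mapping is learned rather than fixed.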

FIG. 3 shows, as an example of processing in the conversion NW unit 11, examples of the input image Pin, the first image P1, the second image P2, and the output image Pout for the case where the input image Pin is a color photograph of a person riding a motorcycle. It can be seen that gradation information is extracted from the input image Pin in the first image P1 and contour information is extracted in the second image P2.

The quantization unit 12 quantizes the image obtained from the conversion NW unit 11 to strengthen privacy protection further, and outputs the result to the task unit 13 as the output of the image conversion unit 1. As also shown in FIG. 1, the number of quantization levels in the quantization unit 12 (4, 8, 16, and so on) can be set by a user, an administrator, or the like.

The quantization unit 12 may be implemented as a quantization function applied to the final layer of the network constituting the conversion NW unit 11. Alternatively, the final layer of that network may output an image without applying such a quantization function, and quantization may then be applied to this output image.
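As an illustration of a configurable number of quantization levels, here is a minimal sketch of uniform quantization applied to an already-produced channel image. The uniform spacing, the [0, 1] value range, and the name `quantize` are assumptions; the quantization functions actually used are covered in the later description of learning.

```python
def quantize(image, levels=4):
    """Map each value in [0, 1] to the nearest of `levels` evenly spaced values."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    step = levels - 1
    out = []
    for row in image:
        new_row = []
        for v in row:
            v = min(max(v, 0.0), 1.0)      # clamp out-of-range values
            index = int(v * step + 0.5)    # nearest level index (round half up)
            new_row.append(index / step)
        out.append(new_row)
    return out
```

Under these assumptions, `quantize(channel, levels=4)` leaves at most four distinct values in the channel, and lowering `levels` discards more gradation detail.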

The details of the conversion NW unit 11 and the quantization unit 12 are given later, in the description of learning.

The task unit 13 performs a predetermined keypoint extraction task (for example, pose estimation or facial part detection) or object detection on the privacy-protected image obtained from the image conversion unit 1 and outputs the result (for example, a pose estimation result, a facial part detection result, or an object detection result).

Regarding the task, for top-down pose estimation, for example, the method disclosed in Non-Patent Document 4 below may be used. Inside the task, a heat map is created from the skeleton coordinates of the ground-truth data, and a deep neural network model is trained to approximate it. As noted above, by using existing methods the task unit 13 can perform any task on the privacy-protected image, not only pose estimation but also facial part detection, object detection, and so on, and output the result.
[Non-Patent Document 4] Chen, Yilun, et al. "Cascaded pyramid network for multi-person pose estimation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
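The heat-map target mentioned above can be sketched as follows. The text says only that a heat map is created from the ground-truth skeleton coordinates and that the model is trained to approximate it; the Gaussian shape, the `sigma` parameter, the mean-squared-error measure, and the function names are assumptions for illustration.

```python
import math

def keypoint_heatmap(height, width, keypoint, sigma=2.0):
    """Build a heat map that peaks at 1.0 on the keypoint (x, y) and decays
    with distance (Gaussian shape is an assumption, not from the source)."""
    kx, ky = keypoint
    return [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def heatmap_mse(predicted, target):
    """Mean squared error between two heat maps of the same size."""
    total = 0.0
    count = 0
    for prow, trow in zip(predicted, target):
        for p, t in zip(prow, trow):
            total += (p - t) ** 2
            count += 1
    return total / count
```

One such heat map would be built per skeleton joint, and the error between the predicted and target maps would drive the training of the pose estimator.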

FIG. 4 shows an example of processing by the task execution device 10 as three example images EX1 to EX3. For the original image EX1 (the input image to the task execution device 10), image EX2 is an example of the privacy-protected image output by the image conversion unit 1 (the privacy-protected image obtained from the original image EX1). The original image EX1 shows a person striking a pose (during soccer practice) and retains color and fine gradation. In the privacy-protected image EX2 converted from it, the three-dimensional RGB color information is removed and the gradation is flattened to a small number of values, such as four, so that the image is obfuscated and privacy is protected across the image as a whole. Because task accuracy is hard to maintain with gradation alone, contour information is also retained.

The two-channel privacy-protected image EX2 can be displayed with an ordinary color image display method by, for example, assigning the gradation information to the R channel and the contour information to the G channel, and filling every pixel of the B channel with the maximum gradation as a dummy. Example EX3 is an example of a skeleton detected from the privacy-protected image EX2, shown with the extracted skeleton superimposed on the original image EX1. (In the inventors' experiments, the pose estimator of Non-Patent Document 4 was tuned while the image conversion unit 1 was being trained, and data set images were converted by the image conversion unit 1 and evaluated by inputting them to the task unit 13; the skeleton could be extracted from the privacy-protected images with an accuracy degradation of 5% or less.)
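The display mapping described above is simple enough to sketch directly. The function name `to_display_rgb`, the 8-bit output range, and the [0, 1] input range are assumptions for illustration.

```python
def to_display_rgb(gradation, contour, max_value=255):
    """Pack two channels into RGB pixels for display.

    R carries the gradation channel, G carries the contour channel,
    and B is a dummy filled with the maximum value everywhere.
    Inputs are assumed to be nested lists of values in [0, 1].
    """
    return [[(int(round(g * max_value)), int(round(c * max_value)), max_value)
             for g, c in zip(grad_row, cont_row)]
            for grad_row, cont_row in zip(gradation, contour)]
```

The B channel carries no information here; it exists only so that a standard three-channel viewer can show the two-channel image.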

As described above, the gradation information and the contour information are extracted by the conversion NW unit 11. The details of this extraction are given later in the description of training; the technical significance of using this information is as follows.

That is, privacy-protected images in the prior art assume the use of an ordinary color space such as RGB. In the embodiment of the present invention, by contrast, the output of the image conversion unit 1 is not learned as color-space channels such as "R", "G", and "B", but is instead learned as two separate channels: a gradation channel (CH1) and a contour channel (CH2). Preliminary experiments by the inventors showed that, for keypoint extraction tasks such as pose estimation, sharp contours that preserve resolution and gradation sufficient to distinguish regions are important, which is why the two channels "gradation" and "contour" are used. The same preliminary experiments showed that these two elements are also very effective for object detection tasks (person detection and the like). Eliminating the channels that output color in this way not only limits the number of gradation levels explicitly, but also yields a privacy-protected image in which, explicitly, no color information remains. In addition, because CH1 and CH2 need only be controlled individually, similar outputs are readily and stably obtained for similar inputs, so that images with little flicker are readily generated when video input is handled.

As explained for example EX2 of FIG. 4, a privacy-protected image composed of the two channels CH1 and CH2, from which ordinary color information (for example, color information in RGB or YCbCr channels) has been removed, can be output and visually checked as an ordinary image by assigning CH1 and CH2 to the R and G channels, respectively (or to other color channels such as Y and Cb). (The unused B channel may be filled with a single constant value, or with the same information as the G channel or the R channel. Likewise, the Cr channel may be filled with a single constant value, or with the same information as the Y channel or the Cb channel.)
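The channel assignment for display can be illustrated with a minimal sketch. It assumes that the privacy-protected image is held as two equally sized nested lists of 8-bit values; the function name `to_displayable_rgb` and the parameter `max_value` are hypothetical and do not appear in the specification.

```python
def to_displayable_rgb(ch1, ch2, max_value=255):
    """Pack a two-channel privacy-protected image into RGB tuples for viewing.

    ch1: gradation channel, assigned to R.
    ch2: contour channel, assigned to G.
    B is filled with a constant dummy value (here the maximum level).
    """
    if len(ch1) != len(ch2) or any(len(a) != len(b) for a, b in zip(ch1, ch2)):
        raise ValueError("CH1 and CH2 must have the same shape")
    return [
        [(gradation, contour, max_value) for gradation, contour in zip(row1, row2)]
        for row1, row2 in zip(ch1, ch2)
    ]


# Example: a 2x2 image with four gradation levels and a binary contour map.
ch1 = [[0, 85], [170, 255]]
ch2 = [[0, 255], [255, 0]]
rgb = to_displayable_rgb(ch1, ch2)
```

The B channel carries no information here; filling it with a copy of R or G, as the text also permits, would only change the constant in the tuple.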

The technical significance of the quantization unit 12 is as follows.

That is, because the perception of privacy differs from person to person, it is desirable that the degree of protection of a privacy-protected image be easy to adjust. From the standpoint of operation, memory, and processing speed, it is also important to keep the model size of the image converter (the image conversion unit 1 in the embodiment of the present invention) small and to avoid preparing multiple models for different privacy levels. With conventional methods, however, the privacy level could not easily be adjusted while maintaining task accuracy. (To do so, it would presumably be necessary to train separate image converters with different privacy levels and to keep multiple image converters available.)

In the embodiment of the present invention, the adjustment is made by swapping the quantization function applied to the output image or to the final layer of the image conversion unit 1. For example, applying a 4-level quantization function to the image output from the image conversion unit 1 prioritizes privacy, while applying an 8-level or 16-level quantization function prioritizes accuracy. The degree of privacy protection can thus be switched easily according to how each individual user perceives privacy, without holding multiple image converter models. Training the image conversion unit 1 while switching among multiple quantization functions (4-level, 8-level, 16-level, and so on) allows task accuracy to be maintained better than when a simple N-level image or the like is used for CH1.
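Swapping the quantization function can be sketched as follows. The sketch assumes pixel values normalized to the range 0 to 1 and uniform quantization; the preset names in `PRIVACY_PRESETS` and the function names are hypothetical, and only the level counts 4, 8, and 16 come from the text.

```python
# Hypothetical preset names; the text only mentions 4-, 8-, and 16-level quantization.
PRIVACY_PRESETS = {"privacy": 4, "balanced": 8, "accuracy": 16}


def quantize(value, levels):
    """Map a value in [0, 1] to the nearest of `levels` evenly spaced values."""
    if levels < 2:
        raise ValueError("at least 2 levels are required")
    clipped = min(max(value, 0.0), 1.0)
    index = round(clipped * (levels - 1))
    return index / (levels - 1)


def apply_preset(image, preset):
    """Quantize one channel of the converter output with the chosen preset."""
    levels = PRIVACY_PRESETS[preset]
    return [[quantize(v, levels) for v in row] for row in image]


# The same converter output, viewed at two different protection levels.
converter_output = [[0.0, 0.4, 1.0]]
strong_protection = apply_preset(converter_output, "privacy")
weak_protection = apply_preset(converter_output, "accuracy")
```

Only the final quantization step changes between presets, which is the point of the design: a single trained converter serves every privacy level.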

FIG. 5 is a flowchart of training by the learning device 30 according to one embodiment. According to the embodiment of FIG. 5, the parameters of the image conversion unit 1 can be learned so that the conversion by the image conversion unit 1 yields an image, such as example EX2 in FIG. 4, that offers strong privacy protection while also preserving the accuracy of task execution by the task unit 13. The parameters of the task unit 13 can also be learned at this time by fine-tuning. Each step of FIG. 5 is described below.

Before the flow of FIG. 5 starts, the parameters of the image conversion unit 1 (conversion NW unit 11) to be trained are initialized, for example with random values. When the task unit 13 is to be fine-tuned, parameters already trained in an existing model are set as the initial parameter values of the task unit 13. (When fine-tuning is not performed, the parameters of the task unit 13 are simply kept fixed at these initial values.)

When the flow of FIG. 5 starts, in step S11 the conversion NW unit 11 is trained in the learning device 30 in the privacy-protection connection configuration, its parameters are updated, and the process proceeds to step S12. The privacy-protection connection configuration is defined as the configuration in which only the image conversion unit 1 and the privacy update unit 2 of the learning device 30 are used; in step S11, the parameters are updated by performing forward-propagation and backpropagation calculations in this configuration. That is, in step S11, forward propagation is computed on the training images in the order "conversion NW unit 11 → quantization unit 12 → privacy update unit 2", an error is obtained in the privacy update unit 2, and this error is backpropagated in the order "quantization unit 12 (inverse quantization, described later) → conversion NW unit 11", thereby updating the parameters of the conversion NW unit 11 as indicated by lines L11 and L12.

Here, the error is a loss (cost) whose value grows as the deviation from the correct answer or the like grows; the term "error" is used throughout.

During forward propagation in step S11, the image conversion unit 1 converts the training images (each image in the mini-batch) using the parameters set at that point to obtain privacy-protected images, and outputs them to the privacy update unit 2. As indicated by lines L1 and L2, the image of the first channel CH1 of the privacy-protected image, in which the gradation information has been extracted, is output to the gradation update unit 21, and the image of the second channel CH2, in which the contour information has been extracted, is output to the contour update unit 22. In the privacy update unit 2, the gradation update unit 21 and the contour update unit 22 each calculate an error that evaluates the privacy-protection performance of the obtained privacy-protected image.

The details of the training method in the privacy update unit 2 are described later; training is possible in various embodiments. For example, as indicated by line L10, an embodiment is possible in which training uses an image obtained by processing the training image, that is, the original image that the image conversion unit 1 converts into the privacy-protected image.

In a first method, an error function based on a simple criterion is set, in which the error value increases as the degree to which privacy is not protected (the degree of privacy leakage) increases, and the error is calculated in each of the gradation update unit 21 and the contour update unit 22.

In a second method, privacy image control can be performed in each of the gradation update unit 21 and the contour update unit 22 using the framework of a GAN (generative adversarial network). In a GAN, in general, a generator and a discriminator are trained adversarially so that the output of the generator (a fake image) approaches a target image (a real image). The discriminator treats the output of the generator as fake and the target image as real, and learns to tell them apart. At inference (image generation) time the discriminator is not needed, and the generator can output images in a style resembling the target images. This GAN framework is applied to the privacy update unit 2, and the image conversion unit 1 (conversion NW unit 11) is trained so that its output approaches a target image obtained by processing the training image. To this end, the image obtained by processing the training image is used as the target image, the image conversion unit 1 is used as the generator, and the image conversion unit 1 (generator) and the discriminators are trained adversarially. The image conversion unit 1 (generator) is shared as a single generator, while a discriminator is provided in each of the gradation update unit 21 and the contour update unit 22. (The detailed configuration of the discriminator, and of the processor that produces the target image, provided in each of the gradation update unit 21 and the contour update unit 22 when the second method is used is omitted from FIG. 1; training in accordance with the GAN framework is possible on the understanding that the gradation update unit 21 and the contour update unit 22 each have a discriminator and a processor.)

The discriminator is trained treating the output of the image conversion unit 1 as fake and the target image as real. In use, the output of the image conversion unit 1 can thereby be brought close to the target image. By processing the target image in advance so that it has image features with which privacy is readily protected and accuracy is readily maintained, privacy image control is achieved in which privacy is readily preserved even while the task accuracy described later is being maintained. How the target image is obtained by processing the original image is described later.

During backpropagation in step S11, the privacy update unit 2 (as described later, each of the gradation update unit 21 and the contour update unit 22) uses the errors calculated for the training images in the mini-batch to perform error backpropagation through the image conversion unit 1, thereby updating the parameters of the image conversion unit 1 as indicated by lines L11 and L12. Any existing optimizer, such as stochastic gradient descent, may be used for the backpropagation calculation, which traces the network constituting the image conversion unit 1 in the reverse direction, from the output side to the input side.

In step S11, the privacy update unit 2 thus updates the parameters of the image conversion unit 1 by backpropagation using the error detailed later, so that the images obtained by the conversion in the image conversion unit 1 can be expected to improve in privacy-protection performance.

In step S12, the image conversion unit 1 is trained in the learning device 30 in the task connection configuration, its parameters are updated, and the process proceeds to step S13. When the task unit 13 is fine-tuned, the task unit 13 is trained together with the image conversion unit 1 in step S12, and the parameters of the task unit 13 are also updated before the process proceeds to step S13. The task connection configuration is defined as the configuration in which only the image conversion unit 1, the task unit 13, and the task update unit 23 of the learning device 30 are used; in step S12, the weight parameters of the conversion NW unit 11 of the image conversion unit 1 (and, when fine-tuning is performed, of the task unit 13) are updated by performing forward-propagation and backpropagation calculations in this configuration. That is, in step S12, forward propagation is computed on the training images in the order "conversion NW unit 11 → quantization unit 12 → task unit 13 → task update unit 23", an error is obtained in the task update unit 23, and this error is backpropagated in the order "task unit 13 → quantization unit 12 (inverse quantization, described later) → conversion NW unit 11", thereby updating the weight parameters of the image conversion unit 1 (and the task unit 13) as indicated by line L21 (and line L23).

During forward propagation in step S12, the image conversion unit 1 converts the training images (each image in the mini-batch) using the parameters set at that point, including the weights, to obtain privacy-protected images, and outputs them to the task unit 13. The task unit 13 executes a task such as recognition on the obtained privacy-protected images using the parameters set at that point (fixed parameters when fine-tuning is not performed), and outputs the result to the task update unit 23. The task update unit 23 compares the result of the task on the obtained privacy-protected image with the correct answer for the task assigned in advance to the training image, and calculates an error whose value is smaller the closer the result is to the correct answer.

During backpropagation in step S12, the task update unit 23 uses the errors calculated for the training images in the mini-batch to perform error backpropagation through the image conversion unit 1 and the task unit 13, thereby updating the weight parameters of the conversion NW unit 11 of the image conversion unit 1 (and of the task unit 13) as indicated by line L23. As in step S11, any existing optimizer may be used for the backpropagation calculation, which traces the network formed by the series connection of the image conversion unit 1 and the task unit 13 (image conversion unit 1 → task unit 13) in the reverse direction, from the output side to the input side (task unit 13 → image conversion unit 1).

In step S12, the task update unit 23 thus updates the parameters of the image conversion unit 1 (conversion NW unit 11) by backpropagation, so that the images obtained by the conversion in the image conversion unit 1 can be expected to yield higher accuracy in tasks such as recognition by the task unit 13. When fine-tuning is performed, the parameters of the task unit 13 are also updated, so that the accuracy of recognition and the like by the task unit 13 on the converted images can likewise be expected to improve.

In step S13, it is determined whether training has converged. If it has, the process proceeds to step S14, the parameters of the conversion NW unit 11 of the image conversion unit 1 (and of the task unit 13) at that point are taken as the final training result, and the flow of FIG. 5 ends. If it has not, the process returns to step S11 and the training described above (steps S11 and S12) continues. For the convergence determination in step S13, for example, test images separate from the training images may be used to obtain the error calculated by the privacy update unit 2, as in step S11, and the error calculated by the task update unit 23, as in step S12; the accuracy of the parameters as a trained model is evaluated from these errors, and convergence is judged by whether the improvement in that accuracy (its history) has leveled off. Alternatively, a predetermined number of epochs or the like may simply be used as the convergence condition.
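One possible form of the convergence determination in step S13 is sketched below. The specification does not fix a criterion, so the rule used here (no improvement of at least `min_delta` over the last `patience` evaluations) and the names `has_converged`, `patience`, and `min_delta` are assumptions made for illustration. Each history entry is taken to be a single evaluation error on the test images, for example the sum of the privacy error and the task error.

```python
def has_converged(error_history, patience=3, min_delta=1e-3):
    """Return True when the evaluation error has stopped improving.

    error_history: evaluation errors, oldest first, one per evaluation.
    patience:      number of most recent evaluations to inspect.
    min_delta:     smallest decrease that still counts as an improvement.
    """
    if len(error_history) <= patience:
        return False  # not enough history to judge
    best_before = min(error_history[:-patience])
    best_recent = min(error_history[-patience:])
    return best_before - best_recent < min_delta


still_improving = has_converged([1.0, 0.5, 0.3, 0.2])
leveled_off = has_converged([1.0, 0.5, 0.3, 0.3, 0.3, 0.3])
```

A fixed epoch budget, which the text also allows, amounts to replacing this check with a simple counter.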

As described above, according to the flow of FIG. 5, in step S11 the weight parameters are updated in the direction that minimizes the privacy-related error, or in the direction in which GAN training progresses, so that the privacy-protection performance of the privacy-protected image obtained by the image conversion unit 1 (conversion NW unit 11) improves; and in step S12 the weight parameters of the image conversion unit 1 are updated in the direction that minimizes the task-related error, so that the task accuracy on the privacy-protected image improves. The parameters of the conversion NW unit 11 of the image conversion unit 1 (and of the task unit 13) can thus be learned within an adversarial learning framework that alternately minimizes (optimizes) the privacy-protection error or GAN error and the task-accuracy error. The weight parameters of the task unit 13 may also be updated when the task error is minimized. In the flow of FIG. 5, steps S11 and S12 may be performed in the reverse order. The proportion of training in the privacy-protection connection and in the task connection may also be varied; for example, in each iteration of the flow of FIG. 5, the training and update of step S11 may be performed once and the training and update of step S12 twice.
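The alternation of steps S11 and S12, including the 1:2 ratio given as an example, can be sketched as a plain control loop. The callables `privacy_step`, `task_step`, and `converged` stand in for the actual update and evaluation routines and are hypothetical, as are the parameter names; the sketch shows only the scheduling, not the gradient computation.

```python
def train_alternating(privacy_step, task_step, converged,
                      privacy_updates=1, task_updates=2, max_rounds=1000):
    """Alternate step S11 (privacy connection) and step S12 (task connection).

    Returns the number of rounds of the flow that were executed.
    """
    for round_index in range(max_rounds):
        for _ in range(privacy_updates):   # step S11
            privacy_step()
        for _ in range(task_updates):      # step S12
            task_step()
        if converged():                    # step S13
            return round_index + 1
    return max_rounds


# Toy usage: record the order of updates instead of training a network.
log = []
rounds = train_alternating(
    privacy_step=lambda: log.append("S11"),
    task_step=lambda: log.append("S12"),
    converged=lambda: len(log) >= 6,
)
```

Swapping the two inner loops gives the variant in which S12 precedes S11.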

The embodiments of the error calculation for the privacy-protected image by the privacy update unit 2 in step S11, whose details were deferred above, are described below. In summary, they are as in (1) and (2) below.

(1) Training the image converter for gradation reduction in CH1
The original image may be a color image or a grayscale image. Suppose that a color image is converted to a grayscale image with 256 levels, and that the image converter is trained by comparing this grayscale image with the converter's output image (CH1 channel) of about four levels, by MSE minimization, a GAN, or the like. The output image of the image converter then tends to contain a large amount of granular noise, like a halftone image, because it attempts to reproduce the gradation of the grayscale image in a pseudo manner. In other words, a flat privacy image is unlikely to be generated. For this reason, the image compared with the converter output in CH1 training is not a grayscale image with many levels, such as 256, but an image flattened by, for example, simple N-level quantization of the grayscale image. N is desirably about twice the number of levels of the desired output image (8 levels for a 4-level output). This reduces the halftone-like graininess.
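The construction of the CH1 comparison image can be sketched as follows, assuming 8-bit RGB input held as nested lists of tuples. The grayscale weights are the common ITU-R BT.601 luma coefficients, which are an assumption since the specification does not name a conversion; the function names are likewise hypothetical. The factor of two between output levels and target levels follows the text.

```python
def to_grayscale(rgb_pixel):
    """Convert one 8-bit RGB pixel to a gray value (BT.601 weights, an assumption)."""
    r, g, b = rgb_pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def n_level(gray, levels, max_value=255):
    """Simple N-level quantization: uniform bins mapped to evenly spaced values."""
    index = min(int(gray * levels / (max_value + 1)), levels - 1)
    return round(index * max_value / (levels - 1))


def ch1_target(rgb_image, output_levels=4):
    """Build the flattened target for CH1 with twice the desired output levels."""
    target_levels = 2 * output_levels
    return [[n_level(to_grayscale(p), target_levels) for p in row]
            for row in rgb_image]


image = [[(0, 0, 0), (255, 255, 255)],
         [(100, 100, 100), (255, 0, 0)]]
target = ch1_target(image)  # 8-level target for a 4-level converter output
```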

(2) Training the image converter for contour representation in CH2
A contour image is generated from the original image, and the image converter is trained so that its output approaches that contour image. The contour image may be produced by ordinary filtering (DoG, XDoG, Laplacian, Sobel) or the like. The output image of the image converter (CH2 channel) is trained, using a GAN or the like, to approach the contour image. Compared with CH1 alone, the presence of contours improves keypoint extraction accuracy, while an increase in privacy exposure through gradation detail and realism is considered to be avoided.
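As one concrete instance of the filtering mentioned above, a contour image can be produced with the Sobel operator. The sketch below is a plain-Python illustration on a grayscale image held as nested lists; leaving border pixels at zero is a simplification chosen here, and the function name `sobel_contour` is hypothetical.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_contour(gray):
    """Return the Sobel gradient magnitude of a grayscale image.

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical edge between a dark left half and a bright right half.
edge_image = [[0, 0, 255, 255] for _ in range(4)]
contour = sobel_contour(edge_image)
```

In practice the magnitude would typically be thresholded or normalized before being used as the CH2 target; that step is omitted here.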

The details of each embodiment are described below. In every embodiment, as shown by the image of example EX2 in FIG. 4, the privacy-protected image obtained as the conversion result has a flattened pixel distribution relative to the original image before conversion (example EX1) while retaining contours, so that privacy protection with respect to gradation and the like is achieved over the whole image.

In this embodiment, as in the flow indicated by lines L1, L2, and L10 in FIG. 1, the error is calculated using not only the privacy-protected image obtained by the image conversion unit 1 (the images of channels CH1 and CH2) but also the training image before conversion (the original image). The specific processing is as follows.

As a premise of this embodiment, training that takes quantization into account (quantization-aware training) is applied to at least the final layer of the image conversion unit 1, so quantization-aware training is explained first. In quantization-aware training, a constraint is imposed that a step function with as many steps as the number of gradation levels of the converted image (a predetermined number of levels lower than that of the original image) is used, either as the activation function that produces the values of the output layer of the network constituting the image conversion unit 1 (that is, the pixel values of the converted image) or as an additional quantization function applied after the activation function of the final layer. That is, by using a step function with that number of levels as the quantization function at the end of the network constituting the image conversion unit 1, a flattened privacy-protected image, quantized to fewer levels than the original image, is obtained as a matter of course.

As already explained, the application of the quantization function (both at training and at inference) is represented in FIG. 1 by the functional block of the quantization unit 12.

In the forward-propagation calculation, a step function that rises in vertical steps is used to quantize each pixel value of the output image to one of the values within the predetermined number of levels. In the backpropagation calculation, however, to prevent the gradient from becoming 0 on differentiation, a function is used in which the vertical rise of each step is replaced by a gentle slope of constant gradient, for example by the STE (straight-through estimator) technique.

FIG. 6 shows examples of the vertical-step step function used in forward propagation and of the sloped-step function used in backpropagation, in which the steps are smoothed to have a constant nonzero gradient (that is, a function that imitates the increasing behavior of the step function so that the gradient does not vanish). Graphs G2F and G2B show a two-level vertical-step function and the corresponding sloped-step function, respectively. When graph G2F is used, the image conversion unit 1 outputs a binary image. Instead of the sloped-step function of graph G2B, the function used in backpropagation may be a straight line, as in graph G2B2, in accordance with the STE (straight-through estimator). Graphs G3F and G3B show a three-level vertical-step function and the corresponding sloped-step function, respectively. When graph G3F is used, the image conversion unit 1 outputs a ternary image. Instead of the sloped-step function of graph G3B, the function used in backpropagation may be a straight line, as in graph G3B2, in accordance with the STE.
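The pairing of a vertical-step forward function with a straight-line backward function can be sketched without any deep-learning library. The sketch assumes inputs normalized to the range 0 to 1 and corresponds to the straight-line variant (graphs G2B2 and G3B2); zeroing the gradient outside that range is a common STE variant adopted here as an assumption, and the function names are hypothetical.

```python
def staircase_forward(x, levels):
    """Forward pass: vertical-step function mapping x to one of `levels` values."""
    clipped = min(max(x, 0.0), 1.0)
    index = min(int(clipped * levels), levels - 1)
    return index / (levels - 1)


def ste_backward(x, upstream_grad):
    """Backward pass: treat the quantizer as the identity (slope 1) on [0, 1].

    Outside [0, 1] the gradient is set to 0 (an assumed clipping variant).
    """
    return upstream_grad if 0.0 <= x <= 1.0 else 0.0


# Two-level case (graph G2F): values below 0.5 map to 0, the rest to 1.
binary_low = staircase_forward(0.2, levels=2)
binary_high = staircase_forward(0.7, levels=2)

# The true derivative of the staircase is 0 almost everywhere, so no learning
# signal would pass; the STE passes the upstream gradient through unchanged.
numeric_slope = (staircase_forward(0.2 + 1e-6, 2) - staircase_forward(0.2, 2)) / 1e-6
surrogate_grad = ste_backward(0.2, upstream_grad=2.5)
```

The sloped-step alternative of graphs G2B and G3B would replace the constant slope with a piecewise-linear function that is steep near each threshold; only the backward function changes.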

The number of gradation levels output by the image conversion unit 1, which is set by the step function, is preferably set to, for example, between 2 and 64 levels inclusive, on the basis of the following considerations. (With 2 levels, the image is black-and-white binary.)

First, without quantization-aware training, weights and activation functions are usually computed in 32-bit floating point. Transmitting a privacy-protected image whose pixel values are output at 32 bits per RGB channel is undesirable, both because of the amount of information and because common image formats have a bit depth of 8 bits (256 gradations). One could output 32 bits per channel and then quantize to the common 8 bits (256 gradations), but such after-the-fact quantization tends to lower accuracy. Therefore, when generating a flattened privacy-protected image, it is preferable to train with the STE in backpropagation and to set the output layer to 256 gradations or fewer. Furthermore, human vision is said to perceive 64 or more gradations as continuous tone. If a tone-curve correction widens part of the pixel-value range of a flattened privacy-protected image, gradations that were imperceptible before the correction may become visible. To prevent this, 64 gradations or fewer is more desirable. Since the foreground and background can be distinguished to some extent with two gradations, a minimum of two gradations suffices. In addition, as the four-color theorem indicates, four colors are said to be enough to color adjacent regions distinctly, so about four levels should make a flat style easy to express. In summary, the flattened image should have between 2 and 64 gradations, with about 4 gradations considered most desirable. Because flattening uses so few gradations, after-the-fact quantization would tend to reduce accuracy severely, but using the STE in backpropagation prevents this.
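The following is a minimal sketch of the idea described above: the forward pass applies a step function, while the backward pass treats that step function as y = x so that the gradient is not zero. The function names, the choice of equally spaced output values spanning 0 to 1, and the single-pixel squared-error example are assumptions made for illustration and are not taken from the specification.

```python
def quantize(x, levels):
    """Forward pass: map x in [0, 1] onto `levels` equally spaced output values."""
    if levels < 2:
        raise ValueError("need at least 2 levels")
    x = min(max(x, 0.0), 1.0)
    # Equal-width input bins; the top edge x == 1.0 falls in the last bin.
    index = min(int(x * levels), levels - 1)
    return index / (levels - 1)


def ste_grad(upstream_grad):
    """Backward pass: treat the step function as y = x, so the gradient passes through."""
    return upstream_grad


# One gradient-descent step on a single pixel value, minimizing (quantize(x) - target)^2.
x, target, lr = 0.30, 1.0, 0.1
y = quantize(x, levels=4)            # forward uses the step function
grad_y = 2.0 * (y - target)          # d(loss)/dy
x_new = x - lr * ste_grad(grad_y)    # backward uses the identity surrogate
```

With a true step function the derivative would be zero almost everywhere and `x` would never move; with the pass-through surrogate, `x` is pushed toward the bin that produces the target value.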

Quantization-aware training may be applied not only to the output layer that outputs the privacy-protected image but to the entire network constituting the image conversion unit 1 (that is, using a network in which every activation function is replaced by a step function whose number of steps matches a predetermined bit depth). (In this case, FIG. 1 should be read with the quantization unit 12 included in the internal configuration of the conversion NW unit 11.) For example, the image conversion unit 1 may be trained using 8-bit integers (INT8) or the like, with the output reduced to four levels by the activation function of the final layer or by an additional quantization function. This reduces the model size of the image conversion unit 1 and the amount of computation, which has advantages such as making real-time operation easier on mobile devices.

For training on the task unit 13 side (that is, when the task unit 13 is also trained by fine-tuning in step S12), quantization-aware training may be performed (that is, step functions are also set as the activation functions of the task unit 13) or may be omitted. Here the benefit of quantization-aware training is the usual one: accuracy is less likely to drop when, for example, a model trained in 32-bit floating point is converted to an 8-bit model.

ここで、画像変換部１において出力層での量子化を考慮した学習において、階調数を少なくする（例えば前述のように６４階調以下とする）ことの意味について述べる。一般的には、出力層から出力される各ノードの出力値（画素値）は、256値で表現される。平坦化によるプライバシー保護を考慮したとき、単純には、この出力値の階調数を少なくすることでも実現できる。例えば、出力値をグレイスケール2階調とすると、階調数が少ないため、平坦化されやすい。しかし、タスク精度はかなり低くなりやすい。一方、出力値を4階調、8階調と増やすにつれて、プライバシーは漏洩しやすくなるが、タスク精度も保ちやすい特徴がある。すなわち、プライバシー保護とタスク精度とはトレードオフの関係にある。 Here, the meaning of reducing the number of gradations (for example, 64 gradations or less as described above) in learning that takes into account quantization in the output layer in the image conversion unit 1 will be described. Generally, the output value (pixel value) of each node output from the output layer is expressed by 256 values. When considering privacy protection through flattening, this can be achieved simply by reducing the number of gradations of this output value. For example, if the output value is two gray scales, the number of gray scales is small, so it is likely to be flattened. However, task accuracy tends to be quite low. On the other hand, as the output value increases from 4 to 8 gradations, privacy is more likely to be leaked, but task accuracy is also easily maintained. In other words, there is a trade-off relationship between privacy protection and task accuracy.

The processing in the image conversion unit 1 that uses the two channels CH1 and CH2, whose details were deferred above, is described below.

As described above, the image conversion unit 1 can be configured from a conversion NW unit 11, which can be a DNN or similar network that converts an image, followed by a quantization unit 12. The privacy updating unit 2 does not have three output channels capable of expressing color, such as RGB; instead it has a channel for gradation expression (CH1) and a channel for contour expression (CH2), and the gradation updating unit 21 and the contour updating unit 22 handle the respective channels. Having no color channel explicitly removes color information. The two channels are used because keypoint extraction such as facial organ detection and pose estimation tends to depend less on many gradations or colors than on a minimum number of gradations and on contour information at undiminished resolution. The configuration is also effective for detecting objects, including people.

If channel CH1 is assigned to the ordinary R channel and channel CH2 to the ordinary G channel, with the B channel set to the maximum value at every pixel or to the same value as R or G, the converted image output by the image conversion unit 1 can be viewed by a person in an ordinary image format. In a color space such as YCbCr, CH1 may be assigned to the luminance channel Y and CH2 to Cb or Cr. Alternatively, after assigning an ordinary color space in this way, the image may be further converted to grayscale to form a single channel.
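As an illustration of the channel assignment described above, the following sketch packs CH1 into R and CH2 into G and fills B with the maximum value, producing 8-bit RGB tuples that an ordinary image format could store. The function name, the nested-list image representation, and the 0-to-1 value range of the inputs are assumptions for this example.

```python
def to_viewable_rgb(ch1, ch2, max_value=255):
    """Pack CH1 into R, CH2 into G, and fill B with the maximum value."""
    if len(ch1) != len(ch2) or any(len(a) != len(b) for a, b in zip(ch1, ch2)):
        raise ValueError("CH1 and CH2 must have the same shape")
    return [
        [(round(t * max_value), round(e * max_value), max_value)
         for t, e in zip(row1, row2)]
        for row1, row2 in zip(ch1, ch2)
    ]


# Hypothetical 2x2 converted image: CH1 is 4-level gradation, CH2 is a contour map.
ch1 = [[0.0, 1 / 3], [2 / 3, 1.0]]
ch2 = [[1.0, 1.0], [0.0, 1.0]]
rgb = to_viewable_rgb(ch1, ch2)
```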

The image conversion unit 1 outputs a gradation expression channel CH1 (also called the gradation channel) and a contour expression channel CH2 (also called the contour channel). The gradation updating unit 21 receives the CH1 image and trains the image conversion unit 1 while connected only to the CH1 image of the converted image. The contour updating unit 22 receives the CH2 image and trains the image conversion unit 1 while connected only to the CH2 image of the converted image. In step S11 of FIG. 5 described above, the gradation updating unit 21 and the contour updating unit 22 each backpropagate their error to the image conversion unit 1 once, or in a predetermined ratio. The quantization function is preferably applied additionally at the end; in that case CH1 and CH2 may be quantized with a common quantization function or with different ones. For example, both CH1 and CH2 may be quantized to four levels, or they may be quantized differently.

階調更新部21の役割について述べる。ここでは、変換画像の階調チャンネルCH1画像を正解としての参考平坦化画像Tyに近づけるように学習するため、以下の第１又は第２の方法を用いることができる。 The role of the gradation updating unit 21 will be described. Here, in order to learn to bring the gradation channel CH1 image of the converted image closer to the reference flattened image Ty as the correct answer, the following first or second method can be used.

In the first method, the converted image (CH1) output by the image conversion unit 1 and the training image from which it was converted are read, a processing device generates the reference flattened image Ty from the training image, and the error is backpropagated to train the image conversion unit 1 and update its parameters so as to minimize a measure of the difference between the reference flattened image Ty and the converted image, for example the MSE.

第２の方法は、画像変換部1が出力した変換画像（CH1）と、この変換画像の原画像である訓練用画像と、を読み込み、加工器により訓練用画像から参考平坦化画像Tyを生成したうえで、参考平坦化画像Ty(ターゲット画像)をリアル画像、変換画像をフェイク画像としてGANのフレームワークにおける識別器と生成器（画像変換部1）を学習し、そのパラメータを更新する。（前述の通り、加工器及び識別器は階調更新部21に備わるものとして、階調更新部21において識別器を学習するようにすればよい。） The second method is to read the converted image (CH1) output by the image conversion unit 1 and a training image that is the original image of this converted image, and use a processing device to generate a reference flattened image Ty from the training image. Then, the classifier and generator (image conversion unit 1) in the GAN framework are trained using the reference flattened image Ty (target image) as a real image and the converted image as a fake image, and their parameters are updated. (As mentioned above, the processing device and the discriminator are provided in the gradation updating section 21, and the discriminator may be learned in the gradation updating section 21.)

Regarding the processing device, in both the first and second methods, if the reference flattened image Ty has too many gradations, the gradation channel CH1 of the converted image is hard to flatten and tends to contain noise resembling pseudo-halftone rendering (a halftone image). It is therefore advisable to generate a flat image, for example by simple N-value quantization with a fairly small number of gradations. On the other hand, if N is equal to or smaller than the number of gradations output by CH1, problems arise, perhaps because the gradation constraint becomes too strong: the flat image breaks down, regions become hard to make out, and task accuracy is hard to maintain. N is therefore desirably about twice the number of gradations output by CH1 and should be kept to at most three times. If CH1 outputs 4 gradations, the reference flattened image Ty is generated by, for example, simple 8-value quantization. In this case the reference flattened image has 8 values, but its purpose is not to have the 4 output values approximate those 8 gradations; it is used so that CH1 and CH2 output different styles and so that CH1 expresses a flattened style.
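The first method with a simple N-value reference image can be sketched as follows, with N set to twice the number of gradations output by CH1. The helper names, the equal-width binning, and the use of a quantized copy of the training image as a stand-in for the network output are assumptions for illustration; an actual implementation would take the CH1 output of the trained network instead.

```python
def simple_n_value(image, n):
    """Quantize each pixel in [0, 1] to n equally spaced values (simple N-value conversion)."""
    return [[min(int(p * n), n - 1) / (n - 1) for p in row] for row in image]


def mse(a, b):
    """Mean squared error between two equally sized 2-D images."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


ch1_levels = 4
n = 2 * ch1_levels                      # about twice the CH1 gradations, at most three times
training_image = [[0.05, 0.40], [0.60, 0.95]]
ty = simple_n_value(training_image, n)  # reference flattened image Ty (8 values)

# Stand-in for the CH1 output of the image conversion unit (hypothetical).
converted_ch1 = simple_n_value(training_image, ch1_levels)
loss = mse(converted_ch1, ty)           # the quantity to minimize by backpropagation
```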

The reference flattened image Ty may also be generated as a flat image by applying an existing region segmentation method to identify regions and then filling the regions with about as many gradations as CH1 outputs. Ty may also be generated as an N-value image using, for example, a discriminant analysis method that determines thresholds dynamically. (However, compared with a simple N-value image, flicker is more likely when the converted images are played as video.) From the standpoint of computational cost, the first method is adequate for CH1.

With each of the methods above, training proceeds so that the gradation channel CH1 of the image converted by the image conversion unit 1 becomes flat, resembling the reference flattened image Ty, which realizes privacy protection in the converted image.

次いで、輪郭更新部22の役割について述べる。ここでは、変換画像の輪郭チャンネルCH2画像を正解としての参考輪郭画像Teに近づけるように学習するため、以下の第１又は第２の方法を用いることができる。 Next, the role of the contour updating section 22 will be described. Here, in order to learn to bring the contour channel CH2 image of the converted image closer to the reference contour image Te as the correct answer, the following first or second method can be used.

第１の方法は、画像変換部1が出力した変換画像（CH2）と、この変換画像の原画像である訓練用画像と、を読み込み、加工器により訓練用画像から参考輪郭画像Teを生成したうえで、参考輪郭画像Teと変換画像との画像の相違を定量化したものとして例えばMSEを最小化するように、誤差を逆伝搬して画像変換部1を学習させ、そのパラメータを更新する。 The first method is to read the converted image (CH2) output by the image converter 1 and the training image that is the original image of this converted image, and generate a reference contour image Te from the training image using a processing device. Then, the error is back-propagated to make the image conversion unit 1 learn, and its parameters are updated so as to minimize the MSE, which is a quantification of the image difference between the reference contour image Te and the converted image.

第２の方法は、画像変換部1が出力した変換画像（CH2）と、この変換画像の原画像である訓練用画像と、を読み込み、加工器により訓練用画像から参考輪郭画像Teを生成したうえで、参考輪郭画像Te(ターゲット画像)をリアル画像、変換画像をフェイク画像としてGANのフレームワークにおける識別器と生成器（画像変換部1）を学習し、そのパラメータを更新する。（前述の通り、加工器及び識別器は輪郭更新部22に備わるものとして、輪郭更新部22において識別器を学習するようにすればよい。） The second method is to read the converted image (CH2) output by the image converter 1 and the training image that is the original image of this converted image, and generate a reference contour image Te from the training image using a processing device. Then, the classifier and generator (image conversion unit 1) in the GAN framework are trained using the reference contour image Te (target image) as a real image and the converted image as a fake image, and their parameters are updated. (As mentioned above, the processing device and the discriminator are provided in the contour updating section 22, and the discriminator may be learned in the contour updating section 22.)

加工器に関して、第１の方法、第２の方法ともに、参考輪郭画像Teは通常の輪郭抽出フィルタで作成することができる。例えば、ラプラシアン、ソベル、Canny、DoG(Difference of Gaussian)、拡張DoGフィルタなどがある。通常エッジ情報は0を中心に正数も負数も含むため、通常の視覚化と同じ方法で視覚化するとよい。例えば、値の絶対値をとり、線ではなく背景が白になるように(反転)処理しておくとよい。または、0が中間階調にくるように画像がとりえる値の範囲（0~1や0から255など）に正規化してもよい。さらに、減色しておいてもよい。 Regarding the processing tool, in both the first method and the second method, the reference contour image Te can be created using a normal contour extraction filter. For example, there are Laplacian, Sobel, Canny, DoG (Difference of Gaussian), and extended DoG filters. Edge information usually includes both positive and negative numbers around 0, so it is best to visualize it using the same method as normal visualization. For example, it is a good idea to take the absolute value of the value and process it so that the background is white instead of the line (inversion). Alternatively, it may be normalized to the range of values that the image can take (0 to 1, 0 to 255, etc.) so that 0 is in the middle gradation. Furthermore, the colors may be reduced.
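As one possible instance of the contour reference described above, the following sketch applies a 3x3 Laplacian filter, takes the absolute value, normalizes to the range 0 to 1, and inverts so that the background is white and the contour lines are dark. The kernel, the border handling (replicating edge pixels), and the function name are choices made for this example only; Sobel, Canny, or DoG filters would be equally valid per the text.

```python
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def reference_contour(image):
    """Laplacian response -> absolute value -> normalize to [0, 1] -> invert."""
    h, w = len(image), len(image[0])

    def pixel(r, c):
        # Replicate border pixels so the output keeps the input size.
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    response = [
        [abs(sum(LAPLACIAN[i][j] * pixel(r + i - 1, c + j - 1)
                 for i in range(3) for j in range(3)))
         for c in range(w)]
        for r in range(h)
    ]
    peak = max(max(row) for row in response)
    if peak == 0:
        return [[1.0] * w for _ in range(h)]   # no edges: all-white image
    return [[1.0 - v / peak for v in row] for row in response]


# Hypothetical training image with one vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
te = reference_contour(image)
```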

With each of the methods above, training proceeds so that the contour channel CH2 of the image converted by the image conversion unit 1 resembles the reference contour image Te, which realizes privacy protection that leaves only important information in the converted image.

なお、輪郭画像は情報が少なく、第１の方法では画像変換部1のCH2で必ずしも輪郭を表現できない場合がありうるが、この場合は、第２の方法を用いるようにすればよい。 Note that the contour image has little information, and the first method may not necessarily represent the contour in CH2 of the image conversion unit 1. In this case, the second method may be used.

As described above, by having the privacy updating unit 2 compute an error that, for example, pulls the output toward the style (flat, contour) of the flattened and contour-extracted reference images Ty and Te, and by training according to the flow of FIG. 5, the image converted by the image conversion unit 1 achieves privacy protection through flattening while maintaining task accuracy. In addition, the processing of Examples 1 and 2 below may be performed.

(Example 1) The task unit 13 may be retrained from scratch or trained additionally after the training images have been converted by the trained image conversion unit 1. For example, task accuracy can be raised by additionally training on labeled training images that the task tends to get wrong. Because the output of the image conversion unit 1, especially its gradation channel, may contain granular noise depending on how training was carried out, the task unit 13 may be trained after noise removal. Noise removal can be, for example, dilation and erosion processing, applied to the output of the image conversion unit 1 in its quantized state after the quantization unit 12. This further raises the degree of privacy protection.
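The dilation and erosion processing mentioned in Example 1 could look like the following sketch, which applies grayscale morphology with a 3x3 window to an already quantized image. The combination used here (opening followed by closing), the window size, and the function names are assumptions; the text only states that dilation and erosion are applied in the quantized state.

```python
def _filter3x3(image, reducer):
    """Apply `reducer` (min or max) over each pixel's 3x3 neighborhood, clipped at borders."""
    h, w = len(image), len(image[0])
    return [
        [reducer(image[rr][cc]
                 for rr in range(max(r - 1, 0), min(r + 2, h))
                 for cc in range(max(c - 1, 0), min(c + 2, w)))
         for c in range(w)]
        for r in range(h)
    ]


def erode(image):
    return _filter3x3(image, min)


def dilate(image):
    return _filter3x3(image, max)


def remove_granular_noise(image):
    """Opening (erode, dilate) removes bright specks; closing (dilate, erode) removes dark ones."""
    opened = dilate(erode(image))
    return erode(dilate(opened))


# Hypothetical quantized CH1 images with a single-pixel speck in the center.
bright_speck = [[0.0] * 5 for _ in range(5)]
bright_speck[2][2] = 1 / 3
dark_speck = [[1.0] * 5 for _ in range(5)]
dark_speck[2][2] = 0.0
cleaned_bright = remove_granular_noise(bright_speck)
cleaned_dark = remove_granular_noise(dark_speck)
```

Because min and max only select existing pixel values, the cleaned image stays within the set of quantized gradations.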

(Example 2) The converted image may also be applied to a task other than the one used during training. For example, suppose the task unit 13 is first assigned body skeleton estimation and the image conversion unit 1 is trained with it. Once training of the image conversion unit 1 has converged to some extent, its output image, under the condition that unnecessary information (color, fine gradation) is removed, retains the information important for skeleton estimation (contours, and enough gradation to make out the region segmentation) with as little loss of task accuracy as possible. This state tends to hold for other tasks as well, such as object detection. Therefore, if training images for object detection are converted by the image converter that was trained in connection with the skeleton estimation task and an object detector is then trained on them, the object detection task can also keep reasonable accuracy on privacy-protected images. Training for the other task, such as the object detector here, may be tuning from a trained model or training from scratch.

That is, training may be performed as in Procedures 1 and 2 below. In the learning device 30 configured as in FIG. 1, the task unit 13 is taken to be the task unit 13-1 for the first task in Procedure 1 and the task unit 13-2 for the second task in Procedure 2 (that is, the task unit 13 used in Procedures 1 and 2 is replaced by the mutually different task units 13-1 and 13-2).
<Procedure 1> Following the flow of FIG. 5, the parameters of the image conversion unit 1 and of the task unit 13-1 for the first task (for example, a skeleton estimation task) are learned.
<Procedure 2> Using the image conversion unit 1 with the parameters learned in Procedure 1 held fixed, the parameters of the task unit 13-2 for a second task different from the first (for example, an object detection task) are learned. This training follows the flowchart of FIG. 5 with step S11 omitted, and in step S12 only the task unit 13-2 is trained.

For a task model that has been trained to raise accuracy on converted images, such as in Examples 1 and 2 above or a model whose task was also updated during training in step S12 of FIG. 5, transfer learning should likewise be performed after converting the training images with the same image conversion unit 1. Task accuracy is then easy to improve even with fewer input images, presumably because the trained image conversion unit 1 tends to remove unnecessary information during training.

別タスクをスクラッチから学習する場合の訓練用画像は、変換しないフルカラーRGB画像を用いた場合より、入力枚数を少なくしても、タスクの精度を向上させやすい。これは、学習済みの画像変換部１は、学習時（及び推論時）に不要な情報を削減しやすいためだと考えられる。つまり、別タスクのスクラッチからの学習においても、学習済みの画像変換部１を用いることで、様々な背景で訓練用画像を撮影するといった大量の訓練用画像取得の手間も省くことができる。 When learning another task from scratch, it is easier to improve the accuracy of the task even if the number of training images is reduced than when using unconverted full-color RGB images. This is considered to be because the trained image conversion unit 1 can easily reduce unnecessary information during learning (and inference). In other words, even in learning from scratch for another task, by using the trained image conversion unit 1, it is possible to save the effort of acquiring a large amount of training images such as photographing training images with various backgrounds.

That is, the image obtained as the converted image from the image conversion unit 1 can be called a privacy-protected image after one of its suitable uses, but more generally it is an information-reduced image: an image with information removed from the original while accuracy on various tasks is maintained. (On this premise, the output image of the image conversion unit 1, which is in general an information-reduced image, continues to be called a privacy-protected image below, after this example use.)

Similarly, the error evaluated by the privacy updating unit 2 during training was an error based on the degree of privacy protection, but more generally it can be called an error based on the degree of information reduction. Likewise, the quantization function set in the quantization unit 12 in response to user settings or the like adjusts the degree of privacy protection, but more generally it can be described as adjusting the degree of information reduction.

以下では、量子化部12の詳細に関して説明する。 The details of the quantization unit 12 will be explained below.

The quantization function finally applied can be switched according to the privacy level the user requires. For example, step-shaped quantization functions for 2-level through 32-level quantization can be prepared, and a converted image output with at most 64 values can be quantized at the level the user selects. The finer the quantization, the more privacy leaks, but the higher task accuracy tends to be.
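A minimal sketch of this user-selectable final quantization is shown below: a converted image with at most 64 values is re-quantized with the step function matching the selected privacy level. The level names in `PRIVACY_LEVELS`, the mapping to 2, 4, 8, and 32 levels, and the uniform step functions are hypothetical choices for illustration.

```python
# Hypothetical mapping from a user-facing privacy setting to a number of gradations.
PRIVACY_LEVELS = {"high": 2, "medium": 4, "low": 8, "minimal": 32}


def requantize(image, privacy):
    """Apply the uniform step function selected by the user's privacy level."""
    n = PRIVACY_LEVELS[privacy]
    return [[min(int(p * n), n - 1) / (n - 1) for p in row] for row in image]


# Hypothetical converted image containing all 64 possible output values.
image64 = [[k / 63 for k in range(64)]]
counts = {name: len(set(requantize(image64, name)[0])) for name in PRIVACY_LEVELS}
```

Here `counts` records how many distinct gradations survive at each setting, which is the quantity that trades privacy against task accuracy.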

Even when the quantization granularity differs, as long as the function used in backpropagation (as a result of the STE) is the same, reasonable accuracy tends to be kept when switching to a step function that was not used in training, without training on switched step functions. To keep task accuracy still better when the granularity changes, it is preferable to alternate between quantization functions with widely separated numbers of levels when the gradation updating unit 21 and the contour updating unit 22 train the image conversion unit.

FIG. 7 shows examples of the quantization functions and related functions used as described above, and is referred to below as appropriate. In every example of FIG. 7, the horizontal axis (x) is the value before quantization (0≦x≦1) and the vertical axis (y) is the value after quantization (0≦y≦1), and the plot shows a quantization function or its base function (the continuous function that the quantization function approximates). The two examples in parentheses, EXa0 and EXd0, are base functions; the others are quantization functions. The base function of EXa0 is the linearly increasing function y=x. EXa and EXb are 4-level and 8-level quantization functions that approximate it uniformly, and EXc and EXe are 8-level and 6-level quantization functions that approximate it non-uniformly. The base function of EXd0 is a sigmoid-shaped increasing function, and EXd is a 4-level quantization function that approximates it.

For example, CH1 and CH2 are trained with a 4-level quantization function in even epochs and with an 8-level (up to 32-level) step function in odd epochs. The simplest way to train CH1 (using MSE minimization) when alternating between 4 and 8 levels is as follows: in even epochs, generate Ty by simple 8-value quantization and apply EXa (4-level linear) as the step function at the end of image conversion; in odd epochs, generate Ty by simple 16-value quantization and apply EXb (8-level linear). In both cases, the EXa0 base function (linear; y (value after quantization) = x (value before quantization)) should be used in backpropagation to avoid a zero gradient, so that the backpropagation function is the same. (With switching training, it is undesirable to use sloped-step functions such as G2B and G3B of FIG. 6 in backpropagation.)
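The alternating schedule described above can be sketched as follows: even epochs use a 4-level step function with an 8-value reference, odd epochs use an 8-level step function with a 16-value reference, and the same y = x surrogate is used in backpropagation in every epoch. The function names, the single-pixel example, and the equally spaced output values are assumptions for illustration; no parameter update is performed here.

```python
def step_function(x, levels):
    """Uniform step function on [0, 1] (standing in for EXa at 4 levels, EXb at 8)."""
    x = min(max(x, 0.0), 1.0)
    return min(int(x * levels), levels - 1) / (levels - 1)


def backward_surrogate(grad):
    """EXa0 (y = x): the same backpropagation function for every step function."""
    return grad


def epoch_config(epoch):
    """Return (levels of the final step function, levels used to generate Ty)."""
    levels = 4 if epoch % 2 == 0 else 8
    return levels, 2 * levels


x = 0.6  # hypothetical pre-quantization output of the conversion network
history = []
for epoch in range(4):
    levels, ty_levels = epoch_config(epoch)
    ty = step_function(0.6, ty_levels)          # reference value from the training pixel
    y = step_function(x, levels)                # forward: switched step function
    grad = backward_surrogate(2.0 * (y - ty))   # backward: same surrogate each epoch
    history.append((levels, ty_levels, y, ty, grad))
```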

上記の階段関数切り替え学習は例EXa,EXb（4値、8値の直線型）のように量子化前の値も量子化後の階調値も等間隔であることが前提であったが、例EXc（8値の不均一直線型）の様に等間隔でない階段関数も使いうる。 The step function switching learning described above was based on the assumption that the values before quantization and the gradation values after quantization were equally spaced, as in the example EXa and EXb (4-value, 8-value linear type). You can also use step functions that are not equally spaced, such as EXc (8-value non-uniform linear type).

If Ty is generated so that the midtones before quantization (the input) are quantized coarsely, as in EXc (8-level non-uniform linear), midtone privacy is easier to preserve. Like EXa (4-level linear), EXc may use EXa0 (the linearly increasing function) as the backpropagation function, so mismatch is unlikely. For example, training while switching between EXa and EXc (4-level linear and 8-level non-uniform linear) makes midtone privacy hard to leak even when the post-quantization granularity is increased, for instance from 4 to 8 levels. There is, however, a balance with the feedback of the task error: the minimum midtone variation needed for task analysis tends to be expressed in the contour channel or in low or high gradations shifted from the original midtone. (Here the gradation differences on the output side must be at least 1/64 of the dynamic range apart so that a person can clearly see the boundaries.)

Changing the base shape of the step function used for Ty generation, training, and inference from a straight line also changes the degree of privacy leakage and the task accuracy. For example, because the distribution of natural images is concentrated in the midtones, if Ty is generated by first correcting with a contrast-enhancing tone curve such as EXd0 (sigmoid-shaped), or by first flattening the distribution, and then applying quantization based on EXa0 (linearly increasing), information removal by quantization-aware training is less likely and task accuracy is easier to maintain.

At use (inference) time, not only the quantization functions used in switching training but also intermediate ones and finer ones can be used without much trouble, which allows the degree of privacy leakage to be adjusted. For example, after training with 4-level and 8-level quantization functions, a 6-level function tends to give intermediate privacy leakage and task accuracy, and a 16-level function tends to give higher task accuracy at the cost of more privacy leakage. (However, accuracy tends to be harder to keep when the quantization is made coarser, so it is advisable to include a switching function of around 4 levels in training.) Making an intermediate quantization function agree with the functions used in switching training as far as possible is expected to make task accuracy easier to maintain.

For example, when the switching training between 4-level and 8-level quantization functions is performed with Examples EXa and EXb (4-level linear type and 8-level linear type, respectively), the intermediate 6-level quantization function should be shaped like Example EXe (6-level non-uniform linear type). In Example EXe, whose graph has six steps, the function matches Example EXb (8-level linear type) for pre-quantization values in 0 to 0.25 and 0.75 to 1 (the 1st, 2nd, 5th, and 6th of the six steps), and matches Example EXa (4-level linear type) for pre-quantization values in 0.25 to 0.75 (the 3rd and 4th steps).
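A minimal sketch of a six-step function built as described for Example EXe: it reuses the 8-level steps in the outer ranges and the 4-level steps in the middle range. The bin-center output values and the function names are assumptions made for illustration; the patent text specifies only where the steps coincide.

```python
def uniform_quantize(x, levels):
    """Uniform step function on [0, 1]; each bin outputs its center (an assumption)."""
    x = min(max(x, 0.0), 1.0)
    k = min(int(x * levels), levels - 1)
    return (k + 0.5) / levels

def quantize_exe(x):
    """Six-level non-uniform quantizer: 8-level steps on [0, 0.25) and [0.75, 1],
    4-level steps on [0.25, 0.75)."""
    x = min(max(x, 0.0), 1.0)
    if x < 0.25 or x >= 0.75:
        return uniform_quantize(x, 8)
    return uniform_quantize(x, 4)

grid = [i / 1000 for i in range(1001)]
levels = sorted({quantize_exe(x) for x in grid})
print(levels)  # [0.0625, 0.1875, 0.375, 0.625, 0.8125, 0.9375]
```

Because every step edge of this function is also a step edge of one of the two trained functions, the converted image only ever meets step boundaries it has already seen during switching training.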

The function used for backpropagation is preferably a straight line, as in Example EXa0 (linearly increasing function) and graphs G2B2 and G3B2 of FIG. 6. A sloped staircase-shaped function such as those shown in B2B and G3B of FIG. 6 is better avoided because it tends to cause inconsistencies.
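The idea of using the step function in the forward pass and a straight line in the backward pass can be illustrated with a toy example. This is a hedged sketch, not the patent's training procedure: the scalar gain `w`, the squared-error loss, the learning rate, and all names are hypothetical. The true derivative of a step function is zero almost everywhere, so without the straight-line surrogate the parameter would never move.

```python
def quantize(u, levels=4):
    """Forward pass: 4-level step function on [0, 1] with bin-center outputs (assumption)."""
    u = min(max(u, 0.0), 1.0)
    k = min(int(u * levels), levels - 1)
    return (k + 0.5) / levels

def surrogate_grad(u):
    """Backward pass: slope of the straight line y = u, used instead of the
    step function's own derivative (which is 0 almost everywhere)."""
    return 1.0

def train_gain(x=0.5, target=0.875, w=1.0, lr=0.5, steps=10):
    """Fit a scalar gain w so that quantize(w * x) reaches the target level."""
    losses = []
    for _ in range(steps):
        u = w * x
        q = quantize(u)                      # step function in the forward pass
        losses.append((q - target) ** 2)
        grad_q = 2.0 * (q - target)
        grad_w = grad_q * surrogate_grad(u) * x  # straight line in the backward pass
        w -= lr * grad_w
    return w, losses

w_final, losses = train_gain()
print(w_final, losses[0], losses[-1])  # 1.5 0.0625 0.0
```

Here the quantized output stays on the same step for the first few updates, yet the straight-line gradient keeps pushing `w` until the output crosses into the target step.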

At inference time, a different quantization function can also be used for the same 4-level quantization, although this does not necessarily keep task accuracy as high as possible. For example, for converted images from an image conversion unit trained with Example EXa (4-level linear type), Example EXa can be used when task accuracy is the priority, and Example EXd (4-level sigmoid curve type) can be used when a slight adjustment toward privacy is desired. With Example EXd, the image becomes close to a binary image while still expressing some intermediate tones. Task accuracy drops slightly, but privacy leakage is reduced, so Example EXd can also serve as a fine adjustment between 3-level and 4-level quantization.

As described above, according to each embodiment of the present invention, the image conversion unit 1 can be configured with a small convolutional network or the like and can therefore run on a user terminal. By providing multiple quantization functions on the user terminal and making them switchable, the user can choose an appropriate privacy level, and the accuracy of subsequent processing such as pose estimation is basically kept as high as possible within that level.
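Switching quantization functions on the terminal, without touching the image converter, might look like the following sketch. The preset names, the mapping from preset to number of levels, and the use of uniform quantizers for every preset are hypothetical placeholders; in practice the intermediate function would preferably be matched to the trained step functions, as discussed earlier for Example EXe.

```python
def make_uniform_quantizer(levels):
    """Return a uniform step function on [0, 1] with outputs k / (levels - 1)."""
    def quantize(x):
        x = min(max(x, 0.0), 1.0)
        k = min(int(x * levels), levels - 1)
        return k / (levels - 1)
    return quantize

# Hypothetical privacy presets; names and level counts are illustrative only.
QUANTIZERS = {
    "high_privacy": make_uniform_quantizer(4),
    "balanced": make_uniform_quantizer(6),
    "high_accuracy": make_uniform_quantizer(8),
}

def apply_privacy_level(image, level):
    """Quantize the output of a fixed image converter with the selected preset."""
    quantize = QUANTIZERS[level]
    return [[quantize(p) for p in row] for row in image]

# Stand-in for one channel of a converted image (values in [0, 1]).
converted = [[0.05, 0.30, 0.55], [0.70, 0.85, 1.00]]
distinct = {
    name: len({p for row in apply_privacy_level(converted, name) for p in row})
    for name in QUANTIZERS
}
print(distinct)
```

Only the dictionary lookup changes between privacy levels; the converter output `converted` is computed once and reused.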

In addition, while maintaining keypoint extraction accuracy for tasks such as pose estimation and facial landmark detection on privacy-protected images, color and gradation information is explicitly reduced, and privacy-protected images can be generated that are unlikely to show distracting noise when output as still images or video. Furthermore, the degree of privacy leakage in the image can be adjusted easily while maintaining task accuracy, simply by switching the quantization function applied to the image and without switching the image converter.

Various supplementary and additional examples are described below.

(1) When the task is pose estimation, there are many applications in which privacy is protected while pose estimation remains possible. Examples include the following.
- When images of a user exercising at home are sent to a server, analyzed by pose estimation, and used to give advice, the privacy of the person, skin, clothing, room, and so on in the home images can be protected.
- When video captured by dashboard cameras or road-traveling robots is sent to a server, the privacy of passing pedestrians can be protected while their poses and the like remain recognizable by AI (artificial intelligence).
- When captured video collected on a server is transferred or published at a higher disclosure level, the privacy of pedestrians can be protected while their poses and the like remain recognizable by AI.
- The privacy of subjects is easier to protect in real-time video monitoring in a monitor room and when the service provider views the video.

(2) As the example questions in FIG. 8 show, how users feel about privacy protection varies greatly with the intended use of the video, the shooting location, and other factors. For example, for video captured by a camera mounted on a robot moving through a city, user acceptance differs greatly between question Q1, "May this be released to the general public?", and question Q2, "May the service provider view this for the purpose of improving service quality?" (In FIG. 8, the answers to questions Q1 and Q2 are shown as A1 and A2, and acceptance is lower for the former from a privacy protection standpoint.) This patent assumes the latter case, that is, protecting user privacy with respect to the service provider's use of the video.

In FIG. 8, video of an unspecified number of people walking on a street corner (original image, original video) and the same video with privacy protection applied by an embodiment of the present invention (converted image, privacy-protected video) were shown to multiple subjects standing in for users, who were then asked questions Q1 and Q2. Their responses are shown as answers A1 and A2. Even though users' feelings about privacy protection vary, the embodiment of the present invention achieves a corresponding level of privacy protection.

The details of the experiment in FIG. 8 are as follows. Converted images with privacy protection according to an embodiment of the present invention were created from daytime street video (original images), and user acceptance was evaluated for both the original and converted images. (Nine videos were used; with an original and a converted version of each, this gave 18 clips of 10 seconds each. The questions were posed with a conspicuous person in the video in mind. The participants were 12 men and women in their 20s to 60s.) For question Q1, "May this be released to the general public?", only about 50% of the responses for the converted images were on the accepting side, as shown in answer A1. For question Q2, "May the service provider view this for the purpose of improving service quality? (All video is stored, used for the stated purpose, and then discarded immediately; maximum storage period of 2 months.)", 90% of the responses were on the accepting side, as shown in answer A2. For the kind of privacy protection assumed in question Q2, the method is therefore expected to be practical for daytime street video even though region information such as people's outlines and clothing remains. The experiment also confirmed that adding face blurring raises the sense of privacy protection to 100%, and such local privacy protection processing can be used together with the method. In addition, because region information such as a person's shape (outline) and clothing remains, skeletons can be annotated by eye, which makes it easier for the service provider to work on quality improvements such as tuning the pose estimator.

(3) FIG. 9 shows the hardware configuration of a general computer device 70. The task execution device 10 and the learning device 30 of each embodiment described above can each be realized as one or more computer devices 70 having this configuration. The computer device 70 includes: a CPU (central processing unit) 71 that executes predetermined instructions; a dedicated processor 72 (a GPU (graphics processing unit), a processor dedicated to deep learning, or the like) that executes some or all of the instructions of the CPU 71 in place of, or in cooperation with, the CPU 71; a RAM 73 as a main storage device that provides a work area for the CPU 71 and the dedicated processor 72; a ROM 74 as an auxiliary storage device; a communication interface 75; a display 76; an input interface 77 that receives user input via a mouse, keyboard, touch panel, or the like; and a bus BS for exchanging data among these components.

Each unit of the task execution device 10 and the learning device 30 can be realized by the CPU 71 and/or the dedicated processor 72 reading from the ROM 74 and executing a predetermined program corresponding to the function of that unit. Likewise, the learning method of the learning device 30 can be carried out by the CPU 71 and/or the dedicated processor 72 reading from the ROM 74 and executing a predetermined program corresponding to each step in FIG. 5.

10…task execution device, 20…learning unit, 30…learning device
1…image conversion unit, 11…conversion NW unit, 12…quantization unit, 13…task unit
2…privacy update unit, 21…gradation update unit, 22…contour update unit, 23…task update unit

Claims

1. A learning method for learning parameters of an image conversion process using a neural network structure, wherein
the parameters of the image conversion process are learned using a parameter update that uses an error calculated based on a degree of information reduction evaluated for an information-reduced image obtained by converting a training image by the image conversion process, and
a parameter update that uses an error calculated based on a result of recognizing the information-reduced image in a task process that performs a recognition process of a predetermined task,
the image conversion process converts the training image into the information-reduced image having a gradation information channel and a contour information channel, in which gradation information and contour information of the training image are respectively extracted, and
in the parameter update using the error calculated based on the degree of information reduction, the degree of information reduction is evaluated for the image of the gradation information channel and the image of the contour information channel.

2. The learning method according to claim 1, wherein a step function that outputs pixel values as discrete values is set in the output layer of the neural network that performs the image conversion process, so that the information-reduced image is obtained with pixel values taking the discrete values.

3. The learning method according to claim 2, wherein the parameters are updated by using the step function during forward propagation and using, during back propagation of errors, a continuous function that simulates the increasing behavior of the step function.

4. The learning method according to claim 2, wherein the number of gradations, which is the number of discrete values that can be output by the step function, is set to a value of 2 or more and 64 or less.

5. The learning method according to claim 2, wherein a plurality of step functions can be set that differ in the number of gradations, which is the number of discrete values that can be output, and/or in the continuous function that the step function simulates, and
learning is performed while switching among the plurality of step functions.

6. The learning method according to any of the preceding claims, wherein the information-reduced image is obtained in the form of a general color image by assigning the gradation information channel and the contour information channel to channels of a general color image format such as RGB or YCbCr.

7. The learning method according to any one of claims 1 to 6, wherein, in the parameter update using the error calculated based on the degree of information reduction,
a difference between the image of the gradation information channel and a flattened reference image generated by flattening the training image is evaluated as the error.

8. The learning method according to claim 1, wherein, in the parameter update using the error calculated based on the degree of information reduction,
the image of the gradation information channel is treated as a fake image and the flattened reference image generated by flattening the training image is treated as a real image, and, through training of a discriminator of a GAN (Generative Adversarial Network) that distinguishes the fake image from the real image, parameters are updated for a generator that obtains the image of the gradation information channel as the fake image by the image conversion process.

9. The learning method, wherein the flattened reference image is generated such that its number of gradations is larger than the number of gradations of the image of the gradation information channel.

10. The learning method according to claim 1, wherein, in the parameter update using the error calculated based on the degree of information reduction,
a difference between the image of the contour information channel and a contour reference image generated by applying a contour extraction filter to the training image is evaluated as the error.

11. The learning method according to any one of claims 1 to 10, wherein, in the parameter update using the error calculated based on the degree of information reduction,
the image of the contour information channel is treated as a fake image and the contour reference image generated by applying a contour extraction filter to the training image is treated as a real image, and, through training of a discriminator of a GAN (Generative Adversarial Network) that distinguishes the fake image from the real image, parameters are updated for a generator that obtains the image of the contour information channel as the fake image by the image conversion process.

12. The learning method according to any of the preceding claims, wherein the parameter update using the error calculated based on the degree of information reduction and the parameter update using the error calculated based on the result recognized in the task process are performed alternately.

13. The learning method according to any one of claims 1 to 12, wherein, for the predetermined task, after the parameters of the image conversion process are learned using the parameter update based on the error calculated from the recognition result of a first task process that performs a recognition process of a first task,
parameters of a second task process that performs a recognition process of a second task different from the first task are further learned using information-reduced images obtained by converting training images with the learned image conversion process.

14. An image conversion device that obtains an information-reduced image by converting an input image using parameters of an image conversion process learned by the learning method according to any one of claims 1 to 13.

15. A program that causes a computer to execute the learning method according to any one of claims 1 to 13, or causes a computer to function as the image conversion device according to claim 14.