JP7208940B2 - Image processing device, server, image processing method, attitude estimation method, and program

Info

Publication number: JP7208940B2
Application number: JP2020026606A
Authority: JP
Inventors: 茂之酒澤; 絵美明堂; 和之田坂
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-02-19
Filing date: 2020-02-19
Publication date: 2023-01-19
Anticipated expiration: 2040-02-19
Also published as: JP2021131725A

Description

The present invention relates to an image processing device, a server, an image processing method, a posture estimation method, and a program.

In recent years, services in which a user sends an image that includes himself or herself as a subject to a server, and the server executes image recognition processing on it, have been coming into practical use. For example, a service is being considered in which a user who wants guidance on yoga poses at home sends a video of himself or herself, shot with a smartphone, to a server; the server estimates the pose and sends the result back to the user.

To provide such a service, the user's personal information must be prevented from leaking from the images that the user sends to the server. For example, Non-Patent Literature 1 proposes a technique for obtaining, by learning, an image transformation that reconciles the target task of posture estimation with privacy-violation estimation.

Non-Patent Literature 1: Haotao Wang, Zhenyu Wu, Zhangyang Wang, Zhaowen Wang, and Hailin Jin, “Privacy-Preserving Deep Visual Recognition: An Adversarial Learning Framework and A New Dataset,” July 29, 2019, [accessed January 30, 2020], Internet <URL: https://arxiv.org/pdf/1906.05675.pdf>

The image transformation in the above technique is fixed: the same processing is applied to every input image. However, how people feel about privacy concerns depends on many factors, which differ from person to person and can change even for the same person depending on the time and circumstances. Applying a uniform privacy anonymization process therefore does not always meet the wishes of users who want their images anonymized.

The present invention has been made in view of these points, and its object is to provide a technique for changing the anonymization pattern of an image in accordance with a user's selection.

A first aspect of the present invention is an image processing device. The device includes: an image acquisition unit that acquires a processing target image including a subject to be anonymized; a selection reception unit that receives a selection of an anonymization processing pattern to be applied to the subject; an anonymization model acquisition unit that reads, from an anonymization model storage unit, the anonymization model corresponding to the pattern received by the selection reception unit, the storage unit storing a plurality of different anonymization models, each trained to perform a different anonymization process on an image and each defined for one of a plurality of anonymization processing patterns; and a model application unit that generates an anonymized image by applying the anonymization model read by the anonymization model acquisition unit to the processing target image.

The different anonymization models may share a plurality of different anonymization sub-models that perform different types of anonymization processing on an image, and the model application unit may output, as the anonymized image, an image obtained by superimposing the output images of the plurality of different anonymization sub-models based on weighting coefficients determined according to the anonymization processing pattern.

Each of the plurality of anonymization sub-models may be trained with at least one region as its anonymization target region, the region being either one of the partial regions set in advance on the subject or a background region set as the region other than the subject.

The subject may be a person, and each of the plurality of anonymization sub-models may be trained, based on the following two evaluation functions, so that the degree of divergence becomes large and the estimation accuracy becomes high: (1) a divergence evaluation function indicating the magnitude of the degree of divergence, in the anonymization target region, between an input image and the output image generated by applying the anonymization sub-model to the input image; and (2) a posture evaluation function indicating the level of estimation accuracy of a posture estimation model trained to estimate the posture of a person included in the anonymized image obtained by superimposing the outputs of the plurality of anonymization sub-models.

The divergence evaluation function may be configured to indicate both the magnitude of an anonymization target region divergence, which is the degree of divergence in the anonymization target region between an input image and the output image generated by applying the anonymization sub-model to the input image, and the magnitude of a non-target region divergence, which is the degree of divergence from the input image in the region of the output image other than the anonymization target region. Each of the plurality of anonymization sub-models may be trained, based on the divergence evaluation function and the posture evaluation function, so that the anonymization target region divergence becomes large, the non-target region divergence becomes small, and the estimation accuracy becomes high. The model application unit may output, as the anonymized image, an image obtained by superimposing images in which, for each of the plurality of different anonymization sub-models, the pixel values of the region other than that sub-model's anonymization target region have been subtracted from the pixel values of its output image.
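The three training directions stated above (large divergence inside the target region, small divergence outside it, high posture estimation accuracy) can be read as a single objective to minimise. The following is only one possible formalisation, not a formula given in the description: the symbols $D_{\mathrm{in}}$ (target region divergence), $D_{\mathrm{out}}$ (non-target region divergence), $L_{\mathrm{pose}}$ (a posture estimation loss on the anonymized image $H$, lower meaning higher accuracy), the sub-model index $k$, and the non-negative weights $\lambda_1, \lambda_2, \lambda_3$ are all assumptions introduced for illustration.

```latex
\mathcal{L}(F_k) \;=\;
  -\,\lambda_{1}\, D_{\mathrm{in}}\bigl(I,\, F_k(I)\bigr)
  \;+\; \lambda_{2}\, D_{\mathrm{out}}\bigl(I,\, F_k(I)\bigr)
  \;+\; \lambda_{3}\, L_{\mathrm{pose}}(H)
```

Minimising the first term pushes the target region away from the input, the second keeps the rest of the image close to the input, and the third keeps the anonymized image usable for posture estimation.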

A second aspect of the present invention is a server. The server includes: an anonymized image acquisition unit that acquires, via a communication network, an anonymized image generated by the above image processing device and pattern information indicating the anonymization processing pattern applied to the anonymized image; a sharpening model acquisition unit that reads the sharpening model corresponding to the pattern information from a sharpening model storage unit, which stores a plurality of different sharpening models, each trained to perform a different sharpening process on an image and each defined for one of a plurality of sharpening processing patterns; a sharpening unit that generates a sharpened image by applying the sharpening model read by the sharpening model acquisition unit to the anonymized image; and a posture estimation unit that estimates the posture of a person included in the sharpened image by applying the posture estimation model to the sharpened image.
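The server-side flow of this aspect (receive the anonymized image and pattern information, look up the matching sharpening model, sharpen, then estimate the posture) might be sketched as follows. This is a minimal illustration under stated assumptions: the function name `estimate_pose_on_server`, the toy image and pose types, and the toy models are all hypothetical, and real sharpening and posture estimation models would be trained networks.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]            # toy grayscale image (assumption)
Pose = List[Tuple[int, int]]         # toy pose: list of (x, y) points (assumption)


def estimate_pose_on_server(
    anonymized: Image,
    pattern_info: str,
    sharpening_models: Dict[str, Callable[[Image], Image]],
    pose_estimator: Callable[[Image], Pose],
) -> Pose:
    """Pick the sharpening model for the received pattern, sharpen, estimate."""
    if pattern_info not in sharpening_models:
        raise KeyError(f"no sharpening model for pattern {pattern_info!r}")
    sharpen = sharpening_models[pattern_info]   # sharpening model acquisition
    sharpened = sharpen(anonymized)             # sharpening
    return pose_estimator(sharpened)            # posture estimation


# Toy stand-ins used only to exercise the flow.
def double_contrast(image: Image) -> Image:
    return [[2.0 * v for v in row] for row in image]


def brightest_pixel(image: Image) -> Pose:
    best = max(
        ((x, y) for y, row in enumerate(image) for x, _ in enumerate(row)),
        key=lambda p: image[p[1]][p[0]],
    )
    return [best]


pose = estimate_pose_on_server(
    [[0.0, 0.2], [0.1, 0.0]],
    "face-oriented",
    {"face-oriented": double_contrast},
    brightest_pixel,
)
```

Keying the sharpening models by the same pattern identifier that the device sends lets the server undo, as far as the model allows, the specific anonymization that was applied.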

A third aspect of the present invention is an image processing method. In this method, a processor executes the steps of: acquiring a processing target image including a subject to be anonymized; receiving a selection of an anonymization processing pattern to be applied to the subject; reading, from an anonymization model storage unit that stores a plurality of different anonymization models, each trained to perform a different anonymization process on an image and each defined for one of a plurality of anonymization processing patterns, the anonymization model corresponding to the received pattern; and generating an anonymized image by applying the read anonymization model to the processing target image.

A fourth aspect of the present invention is a program. This program causes a computer to realize: a function of acquiring a processing target image including a subject to be anonymized; a function of receiving a selection of an anonymization processing pattern to be applied to the subject; a function of reading, from an anonymization model storage unit that stores a plurality of different anonymization models, each trained to perform a different anonymization process on an image and each defined for one of a plurality of anonymization processing patterns, the anonymization model corresponding to the received pattern; and a function of generating an anonymized image by applying the read anonymization model to the processing target image.

A fifth aspect of the present invention is a posture estimation method. In this method, a processor of a server connected to the above image processing device via a communication network executes the steps of: acquiring, via the communication network, an anonymized image generated by the image processing device and pattern information indicating the anonymization processing pattern applied to the anonymized image; reading the sharpening model corresponding to the pattern information from a sharpening model storage unit that stores a plurality of different sharpening models, each trained to perform a different sharpening process on an image and each defined for one of a plurality of sharpening processing patterns; generating a sharpened image by applying the read sharpening model to the anonymized image; and estimating the posture of a person included in the sharpened image by applying the posture estimation model to the sharpened image.

A sixth aspect of the present invention is a program. This program causes a computer connected to the above image processing device via a communication network to realize: a function of acquiring, via the communication network, an anonymized image generated by the image processing device and pattern information indicating the anonymization processing pattern applied to the anonymized image; a function of reading the sharpening model corresponding to the pattern information from a sharpening model storage unit that stores a plurality of different sharpening models, each trained to perform a different sharpening process on an image and each defined for one of a plurality of sharpening processing patterns; a function of generating a sharpened image by applying the read sharpening model to the anonymized image; and a function of estimating the posture of a person included in the sharpened image by applying the posture estimation model to the sharpened image.

To provide the program of the fourth aspect and the program of the sixth aspect of the present invention, or to update part of these programs, a computer-readable recording medium on which the programs are recorded may be provided, or the programs may be transmitted over a communication line.

Any combination of the above components, and any conversion of the expression of the present invention among a method, a device, a system, a computer program, a data structure, a recording medium, and the like, are also valid as aspects of the present invention.

According to the present invention, a technique for changing the anonymization pattern of an image in accordance with a user's selection can be provided.

FIG. 1 is a schematic diagram for explaining an outline of the image processing executed by the image processing device according to the embodiment.
FIG. 2 is a diagram schematically showing the functional configurations of the image processing device and the server according to the embodiment.
FIG. 3 is a diagram schematically showing the internal structure of a pattern database that stores anonymization processing patterns.
FIG. 4 is a diagram for explaining the mask images used for training the anonymization sub-models.
FIG. 5 is a diagram schematically showing the structure of a learning network for training the anonymization sub-models.
FIG. 6 is a diagram for explaining an example of information about a subject according to the embodiment.
FIG. 7 is a schematic diagram for explaining the posture estimation loss.
FIG. 8 is a flowchart for explaining the flow of the image processing executed by the image processing device according to the embodiment.
FIG. 9 is a diagram schematically showing how an image changes during posture estimation processing by the server according to the embodiment.
FIG. 10 is a schematic diagram for explaining the internal configuration of an image processor according to a first modification of the embodiment.

(Overview of Embodiment)
An image processing device according to the embodiment performs a plurality of different image processes on a processing target image and combines the results into a single output image. Each of these image processes applies a different type of anonymization to the processing target image, so the output image is one to which a plurality of different anonymization processes have been applied. The device can change the strength of the anonymization applied by each image process and can accept a user's selection of a pattern of these strengths. The device can thereby change the anonymization pattern of the image according to the user's selection.

FIG. 1 is a schematic diagram for explaining an outline of the image processing executed by the image processing device according to the embodiment. FIG. 1 shows an example in which the processing target image I shows a woman holding a yoga pose in a room.

In the example shown in FIG. 1, the image processing device processes the input image with an image processor F that includes a first image processor F1, a second image processor F2, and a third image processor F3, which output a first intermediate image B1, a second intermediate image B2, and a third intermediate image B3, respectively. The image processing device then weights and superimposes the first intermediate image B1, the second intermediate image B2, and the third intermediate image B3 to output an anonymized image H.

In the example shown in FIG. 1, the first image processor F1 anonymizes the face region of the person captured in the processing target image I. The second image processor F2 anonymizes the outline of the person captured in the processing target image I. The third image processor F3 anonymizes the background region, that is, the region of the processing target image I other than the person.

Here, the image processing device prepares a plurality of different patterns P as the weights used when superimposing the first intermediate image B1, the second intermediate image B2, and the third intermediate image B3, and accepts in advance the user's selection of which pattern P to adopt. FIG. 1 shows an example in which the weight of the second intermediate image B2 is relatively large, the weight of the third intermediate image B3 is relatively small, and the weight of the first intermediate image B1 lies in between.

By changing the weighting pattern P of the intermediate images B, the user can change the anonymization processing pattern P to suit his or her preference, for example by emphasizing anonymization of the face or anonymization of the background.

(Functional configuration of image processing device 1 according to the embodiment)
FIG. 2 is a diagram schematically showing the functional configurations of the image processing device 1 and the server 2 according to the embodiment. The image processing device 1 includes a storage unit 10, an imaging unit 11, and a control unit 12. The server 2 includes a storage unit 20 and a control unit 21. The image processing device 1 and the server 2 are connected so that they can communicate via a communication network N such as the Internet.

In FIG. 2, the arrows indicate the main data flows, and there may be data flows that are not shown. Each functional block in FIG. 2 represents a unit of function, not a unit of hardware (device). The functional blocks shown in FIG. 2 may therefore be implemented in a single device or distributed across a plurality of devices. Data may be exchanged between functional blocks by any means, such as a data bus, a network, or a portable storage medium.

The storage unit 10 comprises a ROM (Read Only Memory) that stores the BIOS (Basic Input Output System) and the like of the computer implementing the image processing device 1, a RAM (Random Access Memory) that serves as the work area of the image processing device 1, and a mass storage device such as an HDD (Hard Disk Drive) or SSD (Solid State Drive) that stores the OS (Operating System), application programs, and various information referred to when the application programs are executed.

The imaging unit 11 is an imaging device for generating the processing target image I and is implemented using a known solid-state image sensor such as a CCD (Charge Coupled Device) image sensor or a CMOS (Complementary Metal Oxide Semiconductor) image sensor. The control unit 12 is a processor of the image processing device 1, such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit), and functions as an image acquisition unit 120, a selection reception unit 121, an anonymization model acquisition unit 122, and a model application unit 123 by executing a program stored in the storage unit 10.

The storage unit 20 comprises a ROM that stores the BIOS and the like of the computer implementing the server 2, a RAM that serves as the work area of the server 2, and a mass storage device such as an HDD or SSD that stores the OS, application programs, and various information referred to when the application programs are executed. The control unit 21 is a processor of the server 2, such as a CPU or GPU, and functions as an anonymized image acquisition unit 210, a sharpening model acquisition unit 211, a sharpening unit 212, and a posture estimation unit 213 by executing a program stored in the storage unit 20.

FIG. 2 shows an example in which the image processing device 1 and the server 2 are each configured as a single device. However, at least one of the image processing device 1 and the server 2 may be implemented by computational resources such as a plurality of processors and memories, as in a cloud computing system. In that case, each unit constituting the control unit 12 or the control unit 21 is realized by at least one of the plurality of different processors executing a program.

The image acquisition unit 120 acquires a processing target image I that includes a subject to be anonymized. The image acquisition unit 120 may acquire an image captured by the imaging unit 11 as the processing target image I, may read it from a memory card (not shown), or may acquire it from another PC over a wired or wireless connection.

The selection reception unit 121 receives a selection of the anonymization processing pattern P to be applied to the subject. Specifically, the selection reception unit 121 receives the user's selection via an input interface (not shown) of the image processing device 1, such as a mouse, a keyboard, or a touch panel.

The anonymization model acquisition unit 122 reads the anonymization model corresponding to the anonymization processing pattern P received by the selection reception unit 121 from an anonymization model storage unit that stores a plurality of different anonymization models. Each anonymization model is defined for one of the plurality of anonymization processing patterns P and is trained to perform a different anonymization process on an image. In the example shown in FIG. 2, the storage unit 10 also serves as the anonymization model storage unit.

The model application unit 123 generates the anonymized image H by applying the anonymization model read by the anonymization model acquisition unit 122 to the processing target image I. The image processing device 1 can thereby anonymize the processing target image I using the anonymization processing of the pattern P selected by the user. Details of the server 2 will be described later.
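The division of roles among the selection reception unit 121, the anonymization model acquisition unit 122, and the model application unit 123 might be sketched as follows. This is a minimal illustration, assuming that a trained anonymization model can be treated as a callable from image to image; the class name `ImageProcessingDevice`, its method names, the pattern keys, and the toy models are hypothetical and do not appear in the description.

```python
from typing import Callable, Dict, List

Image = List[List[float]]                      # toy grayscale image (assumption)
AnonymizationModel = Callable[[Image], Image]  # trained model treated as a black box


class ImageProcessingDevice:
    """Hypothetical stand-in for control unit 12 and its functional units."""

    def __init__(self, model_store: Dict[str, AnonymizationModel]) -> None:
        # Plays the role of the anonymization model storage unit (storage unit 10).
        self._model_store = model_store

    def accept_selection(self, pattern: str) -> str:
        # Selection reception unit 121: reject patterns that have no stored model.
        if pattern not in self._model_store:
            raise ValueError(f"unknown anonymization pattern: {pattern!r}")
        return pattern

    def read_model(self, pattern: str) -> AnonymizationModel:
        # Anonymization model acquisition unit 122.
        return self._model_store[pattern]

    def anonymize(self, image: Image, pattern: str) -> Image:
        # Model application unit 123: apply the selected model to target image I.
        model = self.read_model(self.accept_selection(pattern))
        return model(image)


# Toy "models" used only to exercise the flow; real ones would be trained networks.
def blank_out(image: Image) -> Image:
    return [[0.0 for _ in row] for row in image]


def halve(image: Image) -> Image:
    return [[0.5 * v for v in row] for row in image]


device = ImageProcessingDevice({"strong": blank_out, "mild": halve})
anonymized = device.anonymize([[2.0, 4.0]], "mild")
```

Keeping the model lookup separate from model application mirrors the description, in which the pattern P only determines which stored model is read.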

(Details of the anonymization model)
As described with reference to FIG. 1, each anonymization model has a plurality of different anonymization sub-models (the image processors F in FIG. 1) that perform different types of anonymization processing on an image. Specifically, the anonymization sub-models held by the respective anonymization models are not distinct from one another; rather, the anonymization models share the same anonymization sub-models.

Hereinafter, in this specification, the model for anonymizing the face region of the person captured in the processing target image I is referred to as the "first anonymization sub-model F1", the model for anonymizing the outline of the person captured in the processing target image I as the "second anonymization sub-model F2", and the model for anonymizing the background region, that is, the region of the processing target image I other than the person, as the "third anonymization sub-model F3"; when the sub-models need not be distinguished, they are referred to simply as the "anonymization sub-model F". In this sense, the anonymization sub-model F and the first to third anonymization sub-models F1 to F3 correspond, respectively, to the image processor F and the first to third image processors F1 to F3 in FIG. 1. The processing performed by each anonymization sub-model is only an example, and other kinds of processing may be used.

The model application unit 123 outputs, as the anonymized image H, the image generated by superimposing the intermediate images B, which are the output images of the plurality of different anonymization sub-models F, based on weighting coefficients determined according to the anonymization processing pattern P. Thus, even though every anonymization model has the same anonymization sub-models F, the image processing device 1 can generate different anonymized images H by changing the weighting pattern P.

FIG. 3 is a diagram schematically showing the internal structure of the pattern database that stores the anonymization processing patterns P. The pattern database is managed by the model application unit 123 and stored in the storage unit 10. FIG. 3 illustrates the weights for each of four anonymization processing patterns P: "face-oriented", "outline-oriented", "background-oriented", and "balanced".

For example, in the "face-oriented" pattern P, the weight of the first intermediate image B1 is 0.7, which is larger than the weight of the second intermediate image B2 (0.15) and the weight of the third intermediate image B3 (0.15). As described above, the first intermediate image B1 is an image that has undergone image processing to anonymize the face region of the person captured in the processing target image I. The first intermediate image B1 having a larger weight than the others therefore means that anonymization of the person's face region is emphasized in the anonymized image H. The same applies to the other patterns P. Because a plurality of anonymization processing patterns P are stored in the pattern database in advance, the user can change the anonymization processing pattern P applied to the processing target image I simply by making a selection.
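The pattern lookup and weighted superposition described above can be sketched as follows. This is a minimal illustration, assuming grayscale images held as nested lists of floats and a plain pixel-wise weighted sum; the names `PATTERNS`, `blend`, and `anonymize_with_pattern` are hypothetical. Only the "face-oriented" weights (0.7, 0.15, 0.15) come from the description; the "balanced" weights shown are placeholders.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values (assumption)

# Hypothetical in-memory stand-in for the pattern database of FIG. 3.
# Weights are ordered (B1: face, B2: outline, B3: background).
PATTERNS: Dict[str, Tuple[float, float, float]] = {
    "face-oriented": (0.7, 0.15, 0.15),   # values stated in the description
    "balanced": (1 / 3, 1 / 3, 1 / 3),    # illustrative placeholder values
}


def blend(intermediates: Sequence[Image], weights: Sequence[float]) -> Image:
    """Superimpose intermediate images pixel by pixel using per-image weights."""
    if len(intermediates) != len(weights):
        raise ValueError("one weight per intermediate image is required")
    height, width = len(intermediates[0]), len(intermediates[0][0])
    return [
        [
            sum(w * img[y][x] for img, w in zip(intermediates, weights))
            for x in range(width)
        ]
        for y in range(height)
    ]


def anonymize_with_pattern(intermediates: Sequence[Image], pattern: str) -> Image:
    """Look up the weights for the selected pattern P and build the image H."""
    return blend(intermediates, PATTERNS[pattern])


# One-pixel intermediate images B1, B2, B3, just to exercise the code.
b1, b2, b3 = [[1.0]], [[0.0]], [[0.0]]
h = anonymize_with_pattern([b1, b2, b3], "face-oriented")
```

Because the sub-models are shared, switching patterns only changes the weight tuple, not the intermediate images themselves.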

(Training of the anonymization sub-model F)
Each anonymization sub-model F used in the anonymization processing according to the embodiment is created by a machine learning technique that uses a known Generative Adversarial Network (GAN), which is a type of neural network. The training method for the anonymization sub-model F is described below. Because GANs are well known, a detailed description of the GAN itself is omitted, and the following description focuses on the data and evaluation functions used for training.

FIGS. 4(a) to 4(d) are diagrams for explaining the mask images M used for training the anonymization sub-models. Specifically, FIG. 4(a) is a diagram showing an example of the teacher data T used to generate the anonymization sub-models. FIGS. 4(b) to 4(d) show the mask images M used for training the first anonymization sub-model F1, the second anonymization sub-model F2, and the third anonymization sub-model F3, respectively.

As shown in FIG. 4(b), the first mask image M1 is an image that masks the region other than the face region of the person appearing in the teacher data T. As shown in FIG. 4(c), the second mask image M2 is an image that masks the background region, which is the region other than the person appearing in the teacher data T. Furthermore, as shown in FIG. 4(d), the third mask image M3 is an image that masks the region of the person appearing in the teacher data T. These mask images are prepared in advance by the person carrying out the training for each piece of teacher data T used for training. Alternatively, the mask processing may be implemented as pre-processing of the anonymization sub-models F using a conventional technique.

FIG. 5 is a diagram schematically showing the structure of a training network Nt for training the anonymization sub-models F. The training network Nt includes an image processor F, a posture estimator E, and a divergence calculator D. The teacher data T used for training the anonymization sub-models F includes image data T1 and information T2 regarding the posture of the person appearing in the image data T1. Here, the image data T1 also includes the mask images M illustrated in FIG. 4. The "information T2 regarding the posture of the person" is information indicating the body parts of the person captured in each piece of image data T1.

FIGS. 6(a) and 6(b) are diagrams for explaining an example of information about a subject according to the embodiment. Specifically, FIG. 6(a) is a diagram schematically showing the part positions set for a female subject captured in the image data T1, and FIG. 6(b) is a diagram showing, in tabular form, the position coordinates of each part position shown in FIG. 6(a).

As shown in FIG. 6(a), 15 part positions are set in the information T2 regarding the posture of the person, including the top of the subject's head, wrists, elbows, shoulders, neck, waist, bases of the legs, knees, and toes. The coordinates indicating each part position of the subject are set in a two-dimensional coordinate system whose origin O is the upper left of the information T2 regarding the posture of the person, with the horizontal direction as the X axis and the vertical direction as the Y axis. As shown in FIG. 6(b), the 15 part positions are assigned part numbers from 1 to 15, and the coordinates of the part corresponding to each part number are set in association with the image data T1.

Returning to the description of FIG. 5, in the training network the image data T1 is first input to the image processor F. The image processor F includes the first anonymization sub-model F1, the second anonymization sub-model F2, and the third anonymization sub-model F3, and the image data T1 is input to each of the first anonymization sub-model F1, the second anonymization sub-model F2, and the third anonymization sub-model F3.

Here, each anonymization sub-model F processes the region excluding the region masked by the corresponding mask image M. For example, the first mask image M1 corresponding to the first anonymization sub-model F1 is an image that masks the region other than the face region of the person appearing in the teacher data T. Therefore, the first anonymization sub-model F1 performs processing on the region excluding the region masked by the first mask image M1, that is, on the person's face region. Similarly, the second anonymization sub-model F2 performs processing on the person region, and the third anonymization sub-model F3 performs processing on the background region.

The first intermediate image B1, the second intermediate image B2, and the third intermediate image B3, which are the outputs of the first anonymization sub-model F1, the second anonymization sub-model F2, and the third anonymization sub-model F3, respectively, are combined and output. The combined output of the image processors F is input to the posture estimator E and the divergence calculator D. The intermediate images B1 to B3, which are the outputs of the image processors F, are combined with a different weighting for each of the anonymization processing patterns P described above to form the anonymized image H. In other words, the anonymization sub-models F are generated by separate training for each anonymization processing pattern P.

The divergence calculator D includes a first divergence calculator D1, a second divergence calculator D2, and a third divergence calculator D3. The divergence calculator D calculates the degree of divergence between the image input to the image processor F and the image output by the image processor F.

Specifically, the first divergence calculator D1 calculates the degree of divergence between the input image and the output image of the image processor F in the region excluding the region masked by the first mask image M1. That is, the first divergence calculator D1 calculates the degree of divergence in the person's face region. Similarly, the second divergence calculator D2 calculates the degree of divergence in the person region, and the third divergence calculator D3 calculates the degree of divergence in the background region.

Here, any measurement method may be used for the degree of divergence calculated by the divergence calculator D as long as it can measure the divergence between the image input to the image processor F and the image output by the image processor F. As one example, it is measured using the mean squared error (MSE). Hereinafter, the MSE calculated for the person's face region by the first divergence calculator D1 is referred to as the first original image divergence loss L_d1, the MSE calculated for the person region by the second divergence calculator D2 is referred to as the second original image divergence loss L_d2, and the MSE calculated for the background region by the third divergence calculator D3 is referred to as the third original image divergence loss L_d3. One purpose of the image processing device 1 according to the embodiment is to anonymize the processing target image I, and for this purpose a larger value of each original image divergence loss is preferable. This is because a larger original image divergence loss indicates a greater divergence between the input and the output of the image processor F.
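A region-restricted MSE of the kind described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `masked_mse` is hypothetical, images are 2-D lists of pixel values, and a mask value of 1 is assumed to mean "masked (excluded)" while 0 means "compared". The description does not specify a mask encoding.

```python
def masked_mse(original, processed, mask):
    """Mean squared error over the pixels that the mask does NOT cover.

    mask[r][c] == 1 means the pixel is masked out (assumed convention).
    """
    squared_diffs = [
        (o - p) ** 2
        for o_row, p_row, m_row in zip(original, processed, mask)
        for o, p, m in zip(o_row, p_row, m_row)
        if m == 0
    ]
    if not squared_diffs:
        raise ValueError("the mask leaves no pixels to compare")
    return sum(squared_diffs) / len(squared_diffs)


# Toy example: only the top row is compared; the bottom row is masked.
original = [[10, 20], [30, 40]]
processed = [[14, 20], [30, 0]]
mask = [[0, 0], [1, 1]]
loss = masked_mse(original, processed, mask)  # (4**2 + 0**2) / 2
```

Calling the same function with the masks M1, M2, and M3 would give values playing the roles of L_d1, L_d2, and L_d3.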

The posture estimator E is implemented by a known posture estimation model trained to estimate the posture of a person included in the anonymized image H obtained by superimposing the outputs of the plurality of anonymization sub-models F. More specifically, the posture estimator E estimates 15 part positions of the person included in the image output by the image processor F, including the top of the head, wrists, elbows, shoulders, neck, waist, bases of the legs, knees, and toes. Here, the correct answer for each part position to be estimated by the posture estimator E is defined in the information T2 regarding the posture of the person. The posture estimator E therefore calculates, as a posture estimation loss L_p, the error between each part position estimated from the image output by the image processor F and the correct position of that part defined in the information T2 regarding the posture of the person.

FIG. 7 is a schematic diagram for explaining the posture estimation loss L_p. In FIG. 7, the white circles indicate the part positions estimated by the posture estimator E. The white square indicates the position of the person's left elbow, which is set in advance in the information T2 regarding the posture of the person as the correct position of that part.

To avoid complication, FIG. 7 shows only the position of the left elbow as a correct position. In FIG. 7, the dashed circle C is an enlarged view of the subject's left elbow. As shown in FIG. 7, the position of the left elbow estimated by the posture estimator E deviates from the correct position of the left elbow by a distance Q. The posture estimator E calculates the mean square of the deviations at the respective part positions as the posture estimation loss L_p.

In this case, a smaller value of the posture estimation loss L_p indicates a higher recognition accuracy of the posture estimator E. One purpose of the image processing device 1 according to the embodiment is to estimate the posture from the anonymized image H, and for this purpose a smaller posture estimation loss L_p is preferable.
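The mean square of the per-part deviations can be illustrated with the short sketch below. The function name `pose_estimation_loss` and the two-point toy data are hypothetical; the embodiment uses 15 part positions, and Euclidean distance in the image coordinate system is assumed as the deviation Q.

```python
def pose_estimation_loss(predicted, ground_truth):
    """Mean of squared distances between predicted and correct part positions."""
    if len(predicted) != len(ground_truth):
        raise ValueError("both lists must contain the same part positions")
    squared_distances = [
        (px - gx) ** 2 + (py - gy) ** 2
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    return sum(squared_distances) / len(squared_distances)


# Toy example with two parts: the first is off by a distance Q = 5,
# the second is estimated exactly.
predicted = [(3, 4), (10, 10)]
ground_truth = [(0, 0), (10, 10)]
l_p = pose_estimation_loss(predicted, ground_truth)  # (25 + 0) / 2
```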

Therefore, each anonymization sub-model F is trained within the GAN framework so that the divergence evaluation function G1 (a first divergence evaluation function G11, a second divergence evaluation function G12, and a third divergence evaluation function G13) and the posture evaluation function G2 shown below become small. Specifically,
G11 = 1 / (λ_1 L_d1)
G12 = 1 / (λ_2 L_d2)
G13 = 1 / (λ_3 L_d3)
G2 = posture estimation loss L_p
Here, λ_1, λ_2, and λ_3 are each positive real numbers.
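The inverse relationship between the divergence evaluation function and the original image divergence loss can be checked numerically with the sketch below. It assumes the reading G1i = 1 / (λ_i · L_di), which is the interpretation consistent with the statement that a smaller G1 corresponds to a larger divergence. The function name `divergence_evaluation` is hypothetical.

```python
def divergence_evaluation(l_d, lam):
    """Evaluate G = 1 / (lam * l_d) for a positive weight and a positive loss."""
    if lam <= 0 or l_d <= 0:
        raise ValueError("lam and l_d must both be positive")
    return 1.0 / (lam * l_d)


g_small_divergence = divergence_evaluation(l_d=4.0, lam=0.5)  # 1 / 2.0
g_large_divergence = divergence_evaluation(l_d=8.0, lam=0.5)  # 1 / 4.0
```

Minimizing this value therefore pushes the output image away from the input image in the corresponding region, while minimizing G2 keeps the posture recoverable.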

Training based on the divergence evaluation function G1 is training related to the anonymization processing of the processing target image I. By using the mask images M, each of the plurality of anonymization sub-models F is trained with at least one of a partial region set in advance for the subject and a background region set as the region other than the subject as its anonymization target region. More specifically, the first image processor F1 is first trained using the first divergence evaluation function G11, then the second image processor F2 is trained using the second divergence evaluation function G12, then the third image processor F3 is trained using the third divergence evaluation function G13, and then the image processors F (the first image processor F1, the second image processor F2, and the third image processor F3) are trained using the posture evaluation function G2. Each image processor F is trained by repeating this sequence. In the training that uses the posture estimation loss L_p, the error is backpropagated to each image processor F after being weighted according to the corresponding pattern P.
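The order of the alternating updates can be sketched as a simple schedule. This is only an outline of the ordering described above, with no actual network or optimizer: the names `training_schedule` and `pattern_weighted_errors` are hypothetical, and scaling a scalar error by the pattern weights is an assumed simplification of the weighted backpropagation.

```python
def training_schedule(rounds):
    """Yield (sub-models updated, evaluation function used) in training order."""
    for _ in range(rounds):
        yield (("F1",), "G11")
        yield (("F2",), "G12")
        yield (("F3",), "G13")
        yield (("F1", "F2", "F3"), "G2")


def pattern_weighted_errors(pose_error, pattern_weights):
    """Scale the pose error sent back to each sub-model by its pattern weight."""
    return [w * pose_error for w in pattern_weights]


steps = list(training_schedule(rounds=1))
# Weights taken from the "face-oriented" pattern in the description.
errors = pattern_weighted_errors(2.0, (0.7, 0.15, 0.15))
```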

A smaller value of the divergence evaluation function G1 indicates a larger degree of divergence between the input image of the image processor F and the output image generated by applying the anonymization sub-model F to that input image. As a result, the image processor F is trained so that the degree of divergence between the processing target image I and the anonymized image H becomes large.

On the other hand, a smaller value of the posture evaluation function G2 indicates a higher estimation accuracy of the posture estimation model. As a result, the image processor F is trained so that the posture estimation loss for the anonymized image H generated from the processing target image I becomes small. Here, when training related to the anonymization processing increases the degree of divergence between the processing target image I and the anonymized image H, the degree of divergence in the region including the person also increases. In that case, the accuracy of estimating the person's posture from the anonymized image H may deteriorate. Because each anonymization sub-model F is trained based not only on the divergence evaluation function G1 but also on the posture evaluation function G2, both anonymization of the processing target image I and accuracy of posture estimation can be achieved.

(Processing flow of the image processing method executed by the image processing device 1)
FIG. 8 is a flowchart for explaining the flow of the image processing executed by the image processing device 1 according to the embodiment. The processing in this flowchart starts, for example, when the image processing device 1 is activated.

The image acquisition unit 120 acquires the processing target image I including the subject to be anonymized (S2). The selection reception unit 121 receives from the user a selection of the anonymization processing pattern P to be applied to the subject (S4).

The anonymization model acquisition unit 122 reads and acquires, from the anonymization model storage unit, the anonymization model corresponding to the anonymization processing pattern P received by the selection reception unit 121 (S6). The model application unit 123 generates the anonymized image H by applying the anonymization model read by the anonymization model acquisition unit 122 to the processing target image I (S8). When the model application unit 123 has generated the anonymized image H, the processing in this flowchart ends.
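Steps S2 to S8 can be outlined as a short pipeline. The sketch below is illustrative only: `run_image_processing`, the callables passed to it, and the dictionary used as the model storage are all hypothetical stand-ins, and the toy "model" simply zeroes every pixel instead of applying a trained network.

```python
def run_image_processing(acquire_image, accept_selection, model_store):
    """Run steps S2 to S8 and return the anonymized image and its pattern."""
    image = acquire_image()        # S2: acquire the processing target image I
    pattern = accept_selection()   # S4: receive the user's pattern selection
    model = model_store[pattern]   # S6: read the model for that pattern
    anonymized = model(image)      # S8: apply the model to obtain image H
    return anonymized, pattern


# Toy stand-ins: a 3-pixel "image" and a model that blanks every pixel.
model_store = {"face-oriented": lambda img: [0 for _ in img]}
result = run_image_processing(
    acquire_image=lambda: [9, 8, 7],
    accept_selection=lambda: "face-oriented",
    model_store=model_store,
)
```

Returning the pattern together with the image reflects that the server later receives pattern information along with the anonymized image H.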

(Posture estimation server)
Returning again to the description of FIG. 2, the anonymized image acquisition unit 210 of the server 2 acquires, via the communication network N, the anonymized image H generated by the image processing device 1 and pattern information indicating the anonymization processing pattern P applied to the anonymized image H.

The sharpening model acquisition unit 211 reads the sharpening model corresponding to the pattern information from a sharpening model storage unit that stores a sharpening model determined for each of the plurality of patterns P. Here, the "sharpening models" are a plurality of different models, each trained to perform a different sharpening process on an image.

For example, the contour of a person is important for estimating the person's posture. When the contour-oriented type is selected as the anonymization processing pattern P, the contour of the person region is more blurred than with the other patterns P, so the sharpening model for the contour-oriented pattern P is trained to perform a stronger sharpening process. In the example shown in FIG. 2, the storage unit 20 also serves as the sharpening model storage unit.

The sharpening unit 212 generates a sharpened image by applying the sharpening model read by the sharpening model acquisition unit 211 to the anonymized image H. The posture estimation unit 213 estimates the posture of the person included in the sharpened image by applying the posture estimation model to the sharpened image.

FIG. 9 is a diagram schematically showing how the image changes during the posture estimation processing performed by the server 2 according to the embodiment. As shown in FIG. 9, the anonymized image H is an image in which the person's face region, the background region, and the person's contour are blurred. When the sharpening unit 212 applies the sharpening model to the anonymized image H, a sharpened image S in which the edges of the person region are emphasized is output.

When the posture estimation unit 213 applies the posture estimation model to the sharpened image S, the posture of the person appearing in the sharpened image S is estimated. Specifically, as shown in FIG. 9, a processing result O in which each part of the person appearing in the sharpened image S has been estimated is output. In this way, by applying the sharpening process to the anonymized image H before executing the posture estimation process, the server 2 can improve the accuracy of estimating the posture of the person appearing in the anonymized image H.
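The server-side sequence (select a sharpening model by pattern information, sharpen, then estimate the posture) can be outlined as below. Everything here is a hypothetical stand-in: `estimate_pose_on_server` is an illustrative name, the "sharpening model" is a plain function that doubles pixel values, and the "posture model" merely reports the index of the largest value. Neither resembles the trained models of the embodiment.

```python
def estimate_pose_on_server(anonymized_image, pattern, sharpening_models, pose_model):
    """Sharpen with the pattern-specific model, then run posture estimation."""
    sharpen = sharpening_models[pattern]    # pick the model by pattern info
    sharpened = sharpen(anonymized_image)   # produce sharpened image S
    return pose_model(sharpened)            # produce processing result O


# Toy stand-ins for the trained models.
sharpening_models = {"contour-oriented": lambda img: [2 * v for v in img]}
pose_model = lambda img: {"peak_index": img.index(max(img))}

result_o = estimate_pose_on_server(
    [1, 5, 2], "contour-oriented", sharpening_models, pose_model
)
```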

(Effects of the image processing device 1 and the server 2 according to the embodiment)
As described above, the image processing device 1 according to the embodiment can provide a technique for changing the image anonymization pattern P according to the user's selection. In addition, the server 2 according to the embodiment can estimate the posture of the person appearing in the anonymized image H generated by the image processing device 1.

Although the present invention has been described above using an embodiment, the technical scope of the present invention is not limited to the scope described in the above embodiment, and various modifications and changes are possible within the scope of its gist. For example, all or part of the device can be functionally or physically distributed or integrated in arbitrary units. New embodiments resulting from any combination of a plurality of embodiments are also included in the embodiments of the present invention. The effects of a new embodiment resulting from such a combination include the effects of the original embodiments. Such modifications are described below.

(First modification)
The above description covers the case where the divergence evaluation function G1 indicates the magnitude of the degree of divergence, in the anonymization target region, between an input image and the output image generated by applying the anonymization sub-model F to that input image. Instead, the divergence evaluation function G1 according to the first modification is configured to indicate both the magnitude of an anonymization target region divergence degree, which is the degree of divergence in the anonymization target region between the input image and the output image generated by applying the anonymization sub-model F to that input image, and the magnitude of a non-anonymization target region divergence degree, which is the degree of divergence from the input image in the region of the output image other than the anonymization target region.
G1 = (MSE outside the processing target region) / (MSE in the processing target region)

For example, the first divergence evaluation function G11, which is the evaluation function used for training the first image processor F1, is G11 = (λ_2 L_d2 + λ_3 L_d3) / (λ_1 L_d1). Similarly, the second divergence evaluation function G12, which is the evaluation function used for training the second image processor F2, is G12 = (λ_1 L_d1 + λ_3 L_d3) / (λ_2 L_d2), and the third divergence evaluation function G13, which is the evaluation function used for training the third image processor F3, is G13 = (λ_1 L_d1 + λ_2 L_d2) / (λ_3 L_d3). Each anonymization sub-model F according to the first modification is trained independently using the corresponding divergence evaluation function G1.
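The three ratio-form evaluation functions share one pattern, which the following sketch makes explicit. The function name `ratio_evaluation` and the zero-based `target` index are hypothetical conventions introduced for illustration, and the numeric losses are made-up toy values.

```python
def ratio_evaluation(losses, lambdas, target):
    """Weighted losses of the non-target regions divided by the target's.

    losses  = (L_d1, L_d2, L_d3), lambdas = (lambda_1, lambda_2, lambda_3),
    target  = 0, 1, or 2 selects G11, G12, or G13 respectively.
    """
    numerator = sum(
        lam * loss
        for index, (lam, loss) in enumerate(zip(lambdas, losses))
        if index != target
    )
    denominator = lambdas[target] * losses[target]
    return numerator / denominator


losses = (8.0, 2.0, 2.0)   # toy values for L_d1, L_d2, L_d3
lambdas = (1.0, 1.0, 1.0)
g11 = ratio_evaluation(losses, lambdas, target=0)  # (2 + 2) / 8
g12 = ratio_evaluation(losses, lambdas, target=1)  # (8 + 2) / 2
```

The value falls when the target region diverges more and when the other regions diverge less, which matches the training goal stated for the first modification.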

Each of the plurality of anonymization sub-models F is trained, based on the divergence evaluation function G1 and the posture evaluation function G2, so that the anonymization target region divergence degree becomes large, the non-anonymization target region divergence degree becomes small, and the estimation accuracy becomes high. As a result, the intermediate image B output by an anonymization sub-model F according to the first modification is trained so that the region masked by the corresponding mask image M (that is, the region other than the anonymization target region) is close to the original image.

Specifically, in the training network according to the first modification, each of the plurality of different anonymization sub-models F outputs, as its intermediate image B, an image obtained by subtracting from the pixel values of its output image the pixel values of the region other than its anonymization target region (that is, the region masked by the mask image M), and the image obtained by superimposing these intermediate images B is output as the anonymized image H.

FIG. 10 is a schematic diagram for explaining the image processor F according to the first modification of the embodiment. Specifically, FIG. 10 is a diagram for explaining the operation of the third image processor F3 according to the first modification of the embodiment. As shown in FIG. 10, the third image processor F3 according to the first modification anonymizes the background region of the teacher image data T1, which the third mask image M3 does not mask, and keeps the pixel values of the image data T1 for the pixels in the person region, which the third mask image M3 masks.

As shown in FIG. 10, the output image of the third image processor F3 is subtracted from the image data T1. As a result, in the third intermediate image B3, the pixel values of the person region masked by the third mask image M3 become 0. In other words, the third intermediate image B3 according to the first modification is an image showing only the background, which is the anonymization target region.

The same applies to the first intermediate image B1 output by the first image processor F1 according to the first modification, which is an image including only the face region, that is, the anonymization target region of the first image processor F1 (the region not masked by the first mask image M1). Likewise, the second intermediate image B2 output by the second image processor F2 is an image including only the person region. As a result, the anonymized image H output by the model application unit 123 according to the first modification is an image in which the anonymization target regions of the anonymization sub-models F are superimposed like a patchwork, so that every pixel of the anonymized image H can be a pixel output by one of the anonymization sub-models F.
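The patchwork composition can be sketched as follows. This is a simplified illustration, not the disclosed procedure: the names `intermediate_image` and `superimpose` are hypothetical, a mask value of 1 is assumed to mean "masked", masked pixels are zeroed directly rather than obtained by an explicit subtraction, and the three toy target regions are assumed to be disjoint (one pixel each) so that a plain sum acts as a patchwork.

```python
def intermediate_image(output, mask):
    """Keep a sub-model's output only in its target region; zero elsewhere."""
    return [
        [0 if m else o for o, m in zip(o_row, m_row)]
        for o_row, m_row in zip(output, mask)
    ]


def superimpose(images):
    """Add the intermediate images pixel by pixel."""
    return [[sum(pixels) for pixels in zip(*rows)] for rows in zip(*images)]


# Toy 1x3 image: pixel 0 = face, pixel 1 = body, pixel 2 = background.
out1, mask1 = [[11, 5, 5]], [[0, 1, 1]]   # F1 targets the face pixel
out2, mask2 = [[5, 22, 5]], [[1, 0, 1]]   # F2 targets the body pixel
out3, mask3 = [[5, 5, 33]], [[1, 1, 0]]   # F3 targets the background pixel

b1 = intermediate_image(out1, mask1)
b2 = intermediate_image(out2, mask2)
b3 = intermediate_image(out3, mask3)
h = superimpose([b1, b2, b3])
```

In the embodiment the person region handled by F2 includes the face, so an actual implementation would also need a rule for overlapping regions, which this sketch deliberately leaves out.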

(Second modification)
The above description covers the case where the anonymization sub-models F are trained for each anonymization processing pattern P. Instead, training may be performed with a fixed weight pattern P (for example, a pattern in which all anonymization sub-models have equal weights), and the weights may be changed for each anonymization processing pattern P at the time of anonymization processing. This reduces the computational cost of generating the image processor F.

(Third modification)
The above description mainly covers the case where the processing target image I and the anonymized image H have the same aspect ratio. However, the aspect ratio of the anonymized image H may differ from that of the processing target image I. This can be achieved by having an aspect ratio changing unit in the control unit 12 change the aspect ratio of the anonymized image H and use the result as the new anonymized image H. Changing the aspect ratio of the anonymized image H also changes the body shape of the person in the anonymized image H, which is advantageous in that it further conceals the person's appearance.
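One way to change the aspect ratio is to resample the image to a new width while keeping the height. The sketch below uses nearest-neighbor sampling on a 2-D list. This is an assumed technique chosen for brevity; the description does not state how the aspect ratio changing unit resamples, and the name `change_aspect_ratio` is hypothetical.

```python
def change_aspect_ratio(image, new_width):
    """Resample each row to new_width pixels using nearest-neighbor sampling."""
    if new_width <= 0:
        raise ValueError("new_width must be positive")
    old_width = len(image[0])
    return [
        [row[c * old_width // new_width] for c in range(new_width)]
        for row in image
    ]


# Toy 2x4 image squeezed horizontally to 2x2, changing its aspect ratio.
image = [[1, 2, 3, 4], [5, 6, 7, 8]]
squeezed = change_aspect_ratio(image, 2)
```

Because the part positions are expressed in image coordinates, a posture estimated from the resized image would be in the resized coordinate system.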

1 ... Image processing device
10 ... Storage unit
11 ... Imaging unit
12 ... Control unit
120 ... Image acquisition unit
121 ... Selection reception unit
122 ... Anonymization model acquisition unit
123 ... Model application unit
2 ... Server
20 ... Storage unit
21 ... Control unit
210 ... Anonymized image acquisition unit
211 ... Sharpening model acquisition unit
212 ... Sharpening unit
213 ... Posture estimation unit
D ... Divergence calculator
E ... Posture estimator
F ... Image processor
N ... Communication network
Nt ... Training network

Claims

an image acquisition unit that acquires an image to be processed that includes a subject to be anonymized;
a selection reception unit that receives selection of a pattern of anonymization processing to be applied to the subject;
an anonymization model acquisition unit that reads, from an anonymization model storage unit that stores an anonymization model determined for each of a plurality of patterns of the anonymization processing, the anonymization model corresponding to the pattern of the anonymization processing accepted by the selection reception unit, the anonymization models being a plurality of different models each trained to perform a different anonymization process on an image;
a model application unit that generates an anonymized image by applying the anonymization model read by the anonymization model acquisition unit to the image to be processed;
An image processing device comprising:

each of the different anonymization models shares a plurality of different anonymization sub-models that perform different types of anonymization processes on images;
The model application unit outputs, as the anonymized image, an image obtained by superimposing output images of the plurality of different anonymization sub-models based on weighting coefficients determined according to the pattern of the anonymization processing,
The image processing apparatus according to claim 1.
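For illustration only, the weighted superposition recited above can be sketched as a per-pixel weighted sum of the sub-model outputs. The function name `superimpose`, the `PATTERN_WEIGHTS` table, and its pattern names and coefficient values are assumptions for this example; the claim states only that the weighting coefficients depend on the selected pattern.

```python
def superimpose(outputs, weights):
    """Blend sub-model output images pixel by pixel using the given weights."""
    if len(outputs) != len(weights):
        raise ValueError("one weight is required per sub-model output")
    height = len(outputs[0])
    width = len(outputs[0][0])
    blended = [[0.0] * width for _ in range(height)]
    for image, weight in zip(outputs, weights):
        for y in range(height):
            for x in range(width):
                blended[y][x] += weight * image[y][x]
    return blended


# Hypothetical table: weighting coefficients per anonymization pattern.
PATTERN_WEIGHTS = {
    "face_only": [1.0, 0.0],
    "face_and_background": [0.5, 0.5],
}

face_output = [[10, 20]]        # output of a first sub-model
background_output = [[30, 40]]  # output of a second sub-model
anonymized = superimpose(
    [face_output, background_output],
    PATTERN_WEIGHTS["face_and_background"],
)
```

Under this reading, every pattern reuses the same sub-models and differs only in its row of the weight table.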

The image processing device according to claim 2, wherein each of the plurality of anonymization sub-models is trained with, as its anonymization target region, at least one of a partial region set in advance for the subject and a background region set as the region other than the subject.

The image processing device according to claim 3, wherein
the subject is a person, and
each of the plurality of anonymization sub-models is trained on the basis of two evaluation functions, namely (1) a divergence evaluation function indicating the degree of divergence, in the anonymization target region, between an input image and an output image generated by applying the anonymization sub-model to the input image, and (2) a posture evaluation function indicating the estimation accuracy of a posture estimation model trained to estimate the posture of a person included in the anonymized image obtained by superimposing the outputs of the plurality of anonymization sub-models, so that the degree of divergence becomes large and the estimation accuracy becomes high.
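For illustration only, the two evaluation functions can be combined into a single training objective along the following lines. The concrete measures used here (mean absolute pixel difference for divergence, mean squared keypoint distance as a proxy for posture estimation accuracy) and the `weight` parameter are assumptions for this sketch; the claim does not fix these formulas.

```python
def region_divergence(input_image, output_image, mask):
    """Mean absolute pixel difference over pixels where mask is 1."""
    total = 0.0
    count = 0
    for in_row, out_row, mask_row in zip(input_image, output_image, mask):
        for a, b, m in zip(in_row, out_row, mask_row):
            if m:
                total += abs(a - b)
                count += 1
    return total / count if count else 0.0


def pose_error(predicted, target):
    """Mean squared distance between predicted and reference keypoints.

    A small error is used here as a stand-in for high estimation accuracy.
    """
    squared = [
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(predicted, target)
    ]
    return sum(squared) / len(target)


def training_loss(divergence, error, weight=1.0):
    """Lower is better: large divergence and small pose error reduce the loss."""
    return error - weight * divergence


divergence = region_divergence([[0, 0]], [[10, 4]], [[1, 0]])
error = pose_error([(0, 0)], [(3, 4)])
loss = training_loss(divergence, error, weight=0.5)
```

Minimizing such a loss pushes the sub-model to alter the target region strongly while keeping the posture recoverable, which matches the two training goals stated in the claim.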

The image processing device according to claim 4, wherein
the divergence evaluation function indicates the magnitudes of (i) a target-region divergence, which is the degree of divergence in the anonymization target region between the input image and the output image generated by applying the anonymization sub-model to the input image, and (ii) a non-target-region divergence, which is the degree of divergence from the input image in the region of the output image other than the anonymization target region,
each of the plurality of anonymization sub-models is trained on the basis of the divergence evaluation function and the posture evaluation function so that the target-region divergence becomes large, the non-target-region divergence becomes small, and the estimation accuracy becomes high, and
the model application unit outputs, as the anonymized image, an image obtained by superimposing images each obtained by subtracting, from the pixel values of the output image of one of the plurality of different anonymization sub-models, the pixel values of the region other than the anonymization target region of that anonymization sub-model.
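For illustration only, one possible reading of the subtraction and superposition step is sketched below: subtracting a sub-model output's own non-target pixel values leaves zero outside its target region, so the sum of the remaining images stitches the target regions together. The function names and the binary-mask representation of the target regions are assumptions for this example, and other readings of the claim language are possible.

```python
def keep_target_region(output_image, target_mask):
    """Zero out every pixel outside the sub-model's anonymization target region."""
    return [
        [p if m else 0 for p, m in zip(row, mask_row)]
        for row, mask_row in zip(output_image, target_mask)
    ]


def superimpose_regions(outputs, masks):
    """Sum the sub-model outputs after removing their non-target regions."""
    height = len(outputs[0])
    width = len(outputs[0][0])
    combined = [[0] * width for _ in range(height)]
    for image, mask in zip(outputs, masks):
        kept = keep_target_region(image, mask)
        for y in range(height):
            for x in range(width):
                combined[y][x] += kept[y][x]
    return combined


# Two sub-models with disjoint target regions (left pixel, right pixel).
outputs = [[[9, 9]], [[5, 5]]]
masks = [[[1, 0]], [[0, 1]]]
anonymized = superimpose_regions(outputs, masks)
```

With disjoint target regions, each pixel of the result comes from exactly one sub-model, so no sub-model's non-target output leaks into the final image.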

A server comprising:
an anonymized image acquisition unit that acquires, via a communication network, an anonymized image generated by the image processing device according to claim 4 or 5 and pattern information indicating the pattern of anonymization processing applied to the anonymized image;
a sharpening model acquisition unit that reads, from a storage unit storing a plurality of different sharpening models, each trained to perform a different sharpening process on an image and each defined for one of a plurality of anonymization processing patterns, the sharpening model corresponding to the pattern information;
a sharpening unit that generates a sharpened image by applying the sharpening model read by the sharpening model acquisition unit to the anonymized image; and
a posture estimation unit that estimates the posture of a person included in the sharpened image by applying the posture estimation model to the sharpened image.
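For illustration only, the server-side flow (look up a sharpening model from the pattern information, sharpen, then estimate the posture) can be sketched as follows. The function `run_server_pipeline`, the pattern name, and the toy `double_contrast` and `brightest_pixel` functions are hypothetical stand-ins for trained sharpening and posture estimation models.

```python
def double_contrast(image):
    """Toy stand-in for a trained sharpening model."""
    return [[2 * p for p in row] for row in image]


def brightest_pixel(image):
    """Toy stand-in for a posture estimator: return (x, y) of the brightest pixel."""
    value, y, x = max(
        (p, y, x) for y, row in enumerate(image) for x, p in enumerate(row)
    )
    return (x, y)


def run_server_pipeline(anonymized_image, pattern, sharpening_models, pose_estimator):
    """Sharpen with the model matching the pattern, then estimate the posture."""
    if pattern not in sharpening_models:
        raise ValueError(f"no sharpening model for pattern: {pattern!r}")
    sharpened = sharpening_models[pattern](anonymized_image)
    return pose_estimator(sharpened)


# Hypothetical storage unit: one sharpening model per anonymization pattern.
SHARPENING_MODELS = {"face": double_contrast}

keypoint = run_server_pipeline(
    [[0, 1], [3, 2]], "face", SHARPENING_MODELS, brightest_pixel
)
```

Sending the pattern information with the image lets the server pick the matching sharpening model without inspecting the image content itself.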

An image processing method in which a processor executes:
a step of acquiring an image to be processed that includes a subject to be anonymized;
a step of receiving selection of a pattern of anonymization processing to be applied to the subject;
a step of reading, from a storage unit storing a plurality of different anonymization models, each trained to perform a different anonymization process on an image and each defined for one of a plurality of anonymization processing patterns, the anonymization model corresponding to the received anonymization processing pattern; and
a step of generating an anonymized image by applying the read anonymization model to the image to be processed.

A program that causes a computer to realize:
a function of acquiring an image to be processed that includes a subject to be anonymized;
a function of receiving selection of a pattern of anonymization processing to be applied to the subject;
a function of reading, from a storage unit storing a plurality of different anonymization models, each trained to perform a different anonymization process on an image and each defined for one of a plurality of anonymization processing patterns, the anonymization model corresponding to the received anonymization processing pattern; and
a function of generating an anonymized image by applying the read anonymization model to the image to be processed.

A posture estimation method in which a processor of a server connected via a communication network to the image processing device according to claim 4 or 5 executes:
a step of acquiring, via the communication network, an anonymized image generated by the image processing device and pattern information indicating the pattern of anonymization processing applied to the anonymized image;
a step of reading, from a storage unit storing a plurality of different sharpening models, each trained to perform a different sharpening process on an image and each defined for one of a plurality of anonymization processing patterns, the sharpening model corresponding to the pattern information;
a step of generating a sharpened image by applying the read sharpening model to the anonymized image; and
a step of estimating the posture of a person included in the sharpened image by applying the posture estimation model to the sharpened image.

A program that causes a computer connected via a communication network to the image processing device according to claim 4 or 5 to realize:
a function of acquiring, via the communication network, an anonymized image generated by the image processing device and pattern information indicating the pattern of anonymization processing applied to the anonymized image;
a function of reading, from a storage unit storing a plurality of different sharpening models, each trained to perform a different sharpening process on an image and each defined for one of a plurality of anonymization processing patterns, the sharpening model corresponding to the pattern information;
a function of generating a sharpened image by applying the read sharpening model to the anonymized image; and
a function of estimating the posture of a person included in the sharpened image by applying the posture estimation model to the sharpened image.