JP2020197926A - Image processing device, image processing method, and program

Info

Publication number: JP2020197926A
Application number: JP2019103836A
Authority: JP
Inventors: 暢小倉; Toru Kokura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-06-03
Filing date: 2019-06-03
Publication date: 2020-12-10

Abstract

To reduce the discontinuity in image conversion results that is caused by switching the dataset used to learn the conversion parameters applied when converting image data.

SOLUTION: An image processing device 106 performs a predetermined image conversion on acquired input image data. Specifically, the image processing device performs the predetermined image conversion on at least a part of first partial data and second partial data, based on conversion parameters that are generated from, and associated with, a first attribute value and a second attribute value. The first attribute value is possessed by the first partial data among a plurality of partial data constituting the input image data. The second attribute value differs from the first attribute value and is possessed by the second partial data, which neighbors the first partial data among the plurality of partial data.

SELECTED DRAWING: Figure 5

Description

The technology according to the present disclosure relates to image processing technology.

Learning-based methods are known for image conversion techniques such as noise removal, deblurring, region segmentation, and resolution enhancement. In such a method, the conversion parameters for converting first image data (before image conversion) into second image data (after image conversion) are obtained in advance by learning with a dataset that includes a group of teacher image data prepared beforehand. Here, image conversion is a general term for techniques that output second image data corresponding to input first image data.

As a way to obtain accurate conversion parameters for this image conversion by learning, a method is known in which learning is performed with a dataset selected according to attribute information of the input image data, such as the imaging conditions and the imaging target. For example, in the field of image recognition, a method is known that reduces calculation cost and improves recognition accuracy by learning with a dataset selected according to attribute information such as the imaging time and imaging location of the input image data (see Patent Document 1).

Japanese Unexamined Patent Publication No. 2011-059810

When the above image conversion technique is applied, a change over time in the weather, the camera exposure, or the like can cause the attribute value of a specific attribute of the image data, for example the luminance value, to change over time as well. In the learning method of Patent Document 1, when the attribute information of a series of frame image data constituting video data changes, the dataset used for learning is switched according to that change. Learning with different datasets yields a plurality of different conversion parameters. In the inference stage, frame image data processed with different conversion parameters differ in image quality. As a result, if the conversion parameters are switched in the middle of a single shot of the video data that contains no cuts, the image quality changes suddenly and significantly partway through, producing discontinuous and unnatural-looking video data.

Therefore, the technique of the present disclosure aims to reduce the discontinuity in image conversion results that is caused by switching the conversion parameters used when converting image data.

The technique of the present disclosure is an image processing device comprising an acquisition means for acquiring input image data and a conversion means for performing a predetermined image conversion on the input image data acquired by the acquisition means. The conversion means performs the predetermined image conversion on at least a part of first partial data and second partial data based on a conversion parameter. The conversion parameter is generated based on a first attribute value possessed by the first partial data among a plurality of partial data constituting the input image data and a second attribute value, different from the first attribute value, possessed by the second partial data adjacent to the first partial data among the plurality of partial data. The conversion parameter is associated with both the first attribute value and the second attribute value.

According to the present disclosure, it is possible to reduce the discontinuity in image conversion results that is caused by switching the conversion parameters used when converting image data.

Schematic diagram of an imaging system to which Embodiment 1 can be applied.
Block diagram showing the hardware configuration of the image processing device.
Diagram explaining the problem addressed by Embodiment 1.
Diagram explaining the outline of the image conversion process in Embodiment 1.
Block diagram showing the functional configuration of the image processing device in Embodiment 1.
Flowchart showing the flow of the image conversion process in Embodiment 1.
Diagram explaining the method of determining the dataset composition ratio in Embodiment 1.
Diagram explaining the outline of the image conversion process in Embodiment 3.
Configuration example of a user interface that supports dataset construction.
Diagram showing an example of applying the technique of the present disclosure to a change of subject.

Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

Embodiment 1

<Overall configuration of imaging system>
Embodiment 1 describes an example of an image processing device that performs resolution enhancement, which is one of the image conversion techniques. Specifically, an athlete is imaged as the subject, and the conversion parameters are learned using a dataset that includes the resulting high-resolution image data. The conversion parameters generated from the learning results are then used to convert low-resolution image data into high-resolution image data, which is output.

The configuration of this embodiment is described below. FIG. 1 is a schematic diagram showing an example of an imaging system to which the technique of the present disclosure can be applied. The imaging system is used to generate virtual viewpoint image data. It is a system that generates virtual viewpoint image data, representing the view from a designated virtual viewpoint, based on a plurality of image data captured by a plurality of imaging devices and on the designated virtual viewpoint. The virtual viewpoint image data in this embodiment is also called free viewpoint video data. The virtual viewpoint image data is not limited to image data corresponding to a viewpoint freely (arbitrarily) designated by the user; for example, image data corresponding to a viewpoint that the user selects from a plurality of candidates is also included. This embodiment mainly describes the case where the virtual viewpoint is designated by user operation, but the virtual viewpoint may also be designated automatically based on the result of image analysis or the like. Likewise, this embodiment mainly describes the case where the virtual viewpoint image data is moving image data, but the virtual viewpoint image data may be still image data.

As shown in FIG. 1, an imaging device 101 is installed in a stadium and images a player 105, who is the subject of interest, to obtain image data 108. An imaging device 102 is also installed in the stadium. The imaging device 102 has, for example, a lens with a longer focal length than that of the imaging device 101 and can obtain image data 109, which has a narrower angle of view than the image data 108 but depicts the subject at a higher resolution.

The imaging system also includes an image processing device 106, which converts at least a part of the image data 108 obtained by the imaging device 101 into image data with a resolution comparable to that of the image data 109, and a display device 107. The image processing device 106 may also generate the virtual viewpoint image.

There may be a plurality of additional imaging devices 103 that image the subject at a low resolution like the imaging device 101, and a plurality of additional imaging devices 104 that image the subject at a high resolution like the imaging device 102. Although FIG. 1 uses a sports scene as an example, the technique is also applicable to general scenes in which the same subject is imaged at different resolutions. The portion whose resolution is enhanced may be the entire player who is the subject, the player's face or limbs, or some other part.

The image data 109 may be used in the learning stage as teacher image data for enhancing the resolution of the subject portion of the image data 108.

FIG. 2 is a diagram showing the configuration of the image processing device 106 of this embodiment.

The image processing device 106 includes a CPU 201, a RAM 202, a ROM 203, a storage unit 204, an input interface 205, an output interface 206, and a system bus 207. An external memory 208 is connected to the input interface 205 and the output interface 206, and the display device 107 is connected to the output interface 206.

The CPU 201 is a processor that performs overall control of the components of the image processing device 106, and the RAM 202 is a memory that functions as the main memory and work area of the CPU 201. The ROM 203 is a memory that stores programs and the like used for processing in the image processing device 106. The CPU 201 performs the various processes described later by executing the programs stored in the ROM 203, using the RAM 202 as a work area. The storage unit 204 is a storage device that stores image data used for processing in the image processing device 106, parameters for the processing, and the like. An HDD, an optical disk drive, a flash memory, or the like can be used as the storage unit 204.

The input interface 205 is a serial bus interface such as USB or IEEE 1394. Through the input interface 205, the image processing device 106 can acquire image data to be processed and the like from the external memory 208 (for example, an HDD, a memory card such as a CF card or an SD card, or a USB memory).

The output interface 206 is a video output terminal such as DVI or HDMI (registered trademark). Through the output interface 206, the image processing device 106 can output the image data it has processed to the display device 107 (an image display device such as a liquid crystal display). The image processing device 106 has components other than those described above, but they are not the focus of the technique of the present disclosure and their description is omitted.

<Outline of the problem and the solution>
FIG. 3 is a diagram illustrating the problem addressed by the technique of the present disclosure.

Video data 301, which includes the face of the player whose resolution is to be enhanced, is captured with the imaging device 101. In FIG. 3, the frame image data of the video data 301 are arranged in chronological order. The frame image data in the first half of the video data 301 were captured under low illuminance, for example under cloud cover, and are low-luminance image data. The frame image data in the second half of the video data 301 were captured under high illuminance, for example after the clouds cleared and the sun came out, and are high-luminance image data. Here, low-luminance (underexposed) image data and high-luminance (overexposed) image data mean image data whose luminance is low or high relative to the other image data being compared. In other words, low luminance and high luminance express the relative luminance level of image data.

To enhance the resolution of captured image data accurately, learning must be performed with a dataset associated with the attribute value of a specific attribute of the image data, such as its imaging conditions or imaging target. Accordingly, image data whose attribute value is cloudy (low luminance) is enhanced using conversion parameters obtained by learning with a dataset 302 constructed from image data associated with the cloudy attribute value. As a result, cloudy low-resolution image data 304 is converted into high-resolution image data 305. Similarly, image data 306 whose attribute value is sunny (high luminance) is enhanced using conversion parameters obtained by learning with a dataset 303 constructed from image data associated with the sunny attribute value, and is converted into high-resolution image data 307.

Here, let the frame image data immediately before and immediately after the learning dataset is switched be image data 308 and image data 310, respectively. In image conversion using machine learning, where learning is based on a group of image data serving as teacher image data, the output image data has characteristics similar to those of the teacher image data used for learning. Consequently, the image data 308, which has intermediate luminance, is affected by the characteristics of the dataset 302, which consists of low-luminance image data, and is converted into low-luminance high-resolution image data 309. Similarly, the image data 310, which also has intermediate luminance, is converted into high-luminance high-resolution image data 311.

As a result, when the resolution-enhanced image data are joined in chronological order to form video data, a large luminance difference arises between the image data 309 and the image data 311, which are two adjacent frames. This produces unnatural-looking video data. In the following, the property that the characteristics of image data change significantly between adjacent frame image data is referred to as discontinuity.

In the present disclosure, frame image data in video data (moving image data) is treated as partial data of the image data, and a group of frame image data is treated as a group of partial data of the image data. Similarly, partial image data representing a partial image for each region of the image represented by still image data is treated as partial data, or a group of partial data, of the image data.

To address the above problem, the technique of the present disclosure changes the learning dataset in stages. FIG. 4 shows an outline of the image conversion process according to Embodiment 1 of the present disclosure. For image data around the time at which the attribute value of the image data switches from cloudy to sunny, conversion parameters are obtained using datasets 401 and 402, each of which mixes cloudy image data and sunny image data, and those parameters are used for resolution enhancement. Because the conversion parameters obtained using the datasets 401 and 402 are associated with both the cloudy and the sunny attribute values, the output high-resolution image data combines the characteristics of both cloudy and sunny image data. As a result, image data 403 and 404 with intermediate luminance are generated, and the change in luminance becomes smoother. Here, the dataset 401 is constructed to contain cloudy image data and sunny image data in a ratio of 2:1, and the dataset 402 to contain them in a ratio of 1:2.

The above example described a difference in image luminance due to the weather, but the weather also affects other attribute values, such as reflection characteristics, color temperature, and shadows, and this embodiment is applicable to the discontinuities they cause as well. It is also applicable to attribute changes other than weather, such as differences in sunlight intensity, differences between day and night, differences in image characteristics depending on the imaging location, changes of subject, and changes in the imaging parameters of the imaging device.

<Configuration of the image processing device and processing flow>
The processing performed by the image processing device 106 of Embodiment 1 is described below with reference to FIGS. 5 and 6. FIG. 5 is a block diagram showing the functional configuration of the image processing device 106. The CPU 201 executes the program stored in the ROM 203 using the RAM 202 as work memory, whereby the image processing device 106 functions as the components shown in FIG. 5 and executes the processing shown in the flowchart of FIG. 6. Not all of the processing described below needs to be executed by the CPU 201; the image processing device 106 may be configured so that part or all of the processing is performed by one or more processing circuits other than the CPU 201. The flow of processing performed by each component is described below.

First, the processing in the learning stage is described.

In step S601, a teacher image acquisition unit 501 acquires teacher image data from the imaging device 102, which images the subject of interest (here, the player's face) at high resolution, or from the storage unit 204. That is, the teacher image data in this embodiment is image data in which the face of the player, who is the subject of interest, appears at high resolution. If the image data to be used as teacher image data contains large areas showing anything other than the subject of interest, it is desirable, in order to reduce the amount of data, to generate image data from which the areas other than the player's face have been removed and to use that as the teacher image data. The acquired teacher image data is output to an attribute determination unit 502.

In step S602, the attribute determination unit 502 determines the attributes and attribute values of the received teacher image data. The attributes used are items such as information on the environment at the time of imaging, information on the subject (imaging target), information on the imaging parameters, and information on the imaging device. Attribute values for the imaging environment include the weather, day or night, the imaging time, and the imaging location. Attribute values for the subject include a class label such as person or object, a person ID, gender, nationality, and body part. Attribute values for the imaging parameters include the exposure value, aperture value, shutter speed, and imaging distance. Attribute values for the imaging device include the model and the individual device ID. From among these, those that contribute to improving learning accuracy are selected and used.

At least some of the above attributes and attribute values are calculated or estimated by receiving the parameters used at the time of imaging from the imaging device, or by analyzing and recognizing the teacher image data. In an example of estimating the attribute value of a weather-related attribute from face image data, the image is determined to be sunny if the average luminance value of the central part of the face is equal to or greater than a predetermined threshold, and cloudy if it is less than the threshold. The obtained attribute information is attached to the teacher image data as a tag and output to a dataset acquisition unit 506. Teacher image data that already carry attribute value tags may also be acquired from an existing dataset.
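The threshold rule for the weather attribute can be sketched as follows. This is a minimal illustration only: the function name `classify_weather`, the threshold of 128, and the choice of the central 50% of the face crop are assumptions made for the example, since the source states only that the mean luminance of the face center is compared with a predetermined threshold.

```python
from statistics import mean

def classify_weather(face, threshold=128.0, center_fraction=0.5):
    """Tag a face crop as 'sunny' or 'cloudy' from its central mean luminance.

    face: 2D list of luminance values (rows of pixels).
    threshold and center_fraction are illustrative values, not from the source.
    """
    h, w = len(face), len(face[0])
    # Take a centered window covering center_fraction of each dimension.
    ch = max(1, int(h * center_fraction))
    cw = max(1, int(w * center_fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    center = [v for row in face[top:top + ch] for v in row[left:left + cw]]
    # Mean at or above the threshold means sunny, below means cloudy.
    return "sunny" if mean(center) >= threshold else "cloudy"
```

The returned label would then be attached to the teacher image data as its attribute value tag.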

In step S603, an input image acquisition unit 503 acquires input image data from the imaging device 101, which images the subject of interest at low resolution, or from the storage unit 204. Here, input image data refers to the image data that is to be converted to a higher resolution. The input image data is image data showing the face of the player who is the subject of interest and, as with the teacher image data, is generated by cropping out the player's face as necessary. The acquired input image data is output to an attribute value change detection unit 504. Input image data may also be obtained by applying filter processing to the teacher image data to reduce its resolution.
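As a sketch of the last option, deriving a low-resolution input from a teacher image, the example below averages non-overlapping blocks of pixels. The box filter and the name `box_downsample` are assumptions chosen for simplicity; the source says only that filter processing is applied to reduce the resolution and does not specify the filter.

```python
def box_downsample(image, factor=2):
    """Reduce resolution by averaging non-overlapping factor x factor blocks.

    image: 2D list of luminance values. Rows and columns that do not fill a
    complete block are dropped. The box filter is an illustrative choice.
    """
    h = len(image) - len(image) % factor
    w = len(image[0]) - len(image[0]) % factor
    out = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

Pairing each teacher image with its downsampled copy gives an aligned low-resolution and high-resolution training pair.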

In step S604, the attribute value change detection unit 504 attaches an attribute value tag to the received input image data by the same means as in step S602. In addition, it detects the change of the attribute value over time and calculates an attribute value change time T, which is the time at which the attribute value switches. In an example of detecting a change of attribute value from cloudy to sunny in face image data, the determination is made by measuring the luminance value of the face. The luminance value of the face at a given time of interest is obtained by taking the mean luminance of the face region, measured in advance for each frame image data, and averaging it over the frame image data that lie within a predetermined time (for example, about 0.5 to 1.0 seconds) around the time of interest. Averaging in the time direction suppresses the influence of luminance fluctuations caused by flicker and the like. When the luminance value of the face at the time of interest rises above or falls below a predetermined luminance level that serves as the criterion for distinguishing cloudy from sunny, the attribute value is determined to have changed. The time at which this change is determined is the attribute value change time T. The attribute value change time T, together with the attribute values of the frame image data within a predetermined time before and after it, is output to a mixing ratio determination unit 505.
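The temporal averaging and level-crossing test can be sketched as follows. The function names, the centered window, and the default window length are assumptions for illustration; the source specifies only that per-frame face luminance is averaged over roughly 0.5 to 1.0 seconds around the time of interest and compared with a predetermined level.

```python
def smooth(values, fps, window_sec=0.5):
    """Average each per-frame value over a centered window of about window_sec."""
    half = max(1, int(fps * window_sec / 2))
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def find_change_times(per_frame_luma, fps, level, window_sec=0.5):
    """Return the times (in seconds) at which the smoothed face luminance
    crosses `level` in either direction; each is a candidate change time T."""
    s = smooth(per_frame_luma, fps, window_sec)
    times = []
    for i in range(1, len(s)):
        # A change is flagged when consecutive frames fall on opposite sides.
        if (s[i - 1] >= level) != (s[i] >= level):
            times.append(i / fps)
    return times
```

For example, with 10 frames at luminance 50 followed by 10 frames at 200, captured at 10 fps with a level of 125, the sketch reports a single change at 1.0 seconds.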

In step S605, the mixing ratio determination unit 505 determines, based on the received information, the mixing ratio of the teacher image data in each dataset. The mixing ratio is determined according to an attribute value change time width W and a number of ratio change steps M. The attribute value change time width W is the time from the start to the end of the change region data, that is, the data in which the attribute value is changing by a certain amount or more. The number of ratio change steps M determines in how many steps the mixing ratio of datasets with different attribute values is changed. It equals the number of divisions of the change region data, in other words the number of divided partial data obtained by dividing the change region data.

In the example of detecting a change of attribute value from cloudy to sunny, the start point of the attribute value change is the time at which the luminance value of the face deviates by a predetermined threshold or more from the cloudy reference value, for example the average luminance value during the cloudy period. The end point is the time at which the luminance value of the face comes within a predetermined threshold of the sunny reference value, for example the average luminance value during the sunny period. The attribute value change time width W is calculated as the difference between the start time and the end time.
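A minimal sketch of locating the start point, the end point, and W for a cloudy-to-sunny transition is given below. The function name `change_interval` and the use of a single tolerance `tol` for both thresholds are assumptions; the source allows the two predetermined thresholds to be chosen independently.

```python
def change_interval(luma, fps, cloudy_ref, sunny_ref, tol):
    """Return (start_time, end_time, W) in seconds, or None if no transition.

    luma: smoothed face luminance per frame.
    start: first frame deviating from cloudy_ref by tol or more.
    end: first frame at or after start lying within tol of sunny_ref.
    """
    start = next((i for i, v in enumerate(luma)
                  if abs(v - cloudy_ref) >= tol), None)
    if start is None:
        return None
    end = next((i for i in range(start, len(luma))
                if abs(luma[i] - sunny_ref) <= tol), None)
    if end is None:
        return None
    return start / fps, end / fps, (end - start) / fps
```

For the sequence 50, 50, 60, 90, 130, 170, 195, 200, 200 at 1 fps with references 50 and 200 and a tolerance of 10, the change region runs from 2.0 s to 6.0 s, so W is 4.0 s.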

The ratio change step number M is set to a value proportional to the difference between the cloudy and sunny luminance values, for example the quotient obtained by dividing that difference by a predetermined per-step change amount. The attribute value change time width W and the ratio change step number M may also be set in other ways, for example to given values determined in advance.
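As a small worked example of the quotient rule, the sketch below computes M from two reference luminance values. The function name ratio_change_steps and the lower bound of one step are assumptions added for illustration. The text only specifies the quotient.

```python
def ratio_change_steps(lum_from: float, lum_to: float, step_size: float) -> int:
    """Number of mixing-ratio steps M as the quotient of the luminance gap."""
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    quotient = int(abs(lum_to - lum_from) // step_size)
    # Assumption: always use at least one step, even for a very small gap.
    return max(1, quotient)
```

For instance, a luminance gap of 70 with a per-step change amount of 20 gives M = 3.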

This determination process is described with reference to FIG. 7. FIG. 7 shows the moment at which the video data 301 of a player's face switches from the first attribute value (cloudy) to the second attribute value (sunny). Time 701 represents the attribute value change time T, and width 702 represents the attribute value change time width W. The change area data spanning the width W is divided into M areas 703, and a data set is constructed for each area. FIG. 7 shows an example with M = 4. The mixing ratio of teacher image data with different attribute values is then determined for each area. In the k-th area R_k (k = 1, 2, ..., M), the teacher image data of the first attribute value and the teacher image data of the second attribute value are mixed at a ratio of (M - k + 1) to k. For example, data set 704 lies in area R_2 (k = 2), so its mixing ratio is 3 to 2. The determined mixing ratios are output to the data set acquisition unit 506.
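The per-area ratio of (M - k + 1) to k can be tabulated with a few lines of code. The sketch below is illustrative only. The function names mixing_ratio and mixing_fractions are hypothetical and do not appear in the disclosure.

```python
from fractions import Fraction
from typing import Tuple


def mixing_ratio(k: int, M: int) -> Tuple[int, int]:
    """Ratio (first attribute : second attribute) for the k-th of M areas."""
    if not 1 <= k <= M:
        raise ValueError("k must satisfy 1 <= k <= M")
    return M - k + 1, k


def mixing_fractions(k: int, M: int) -> Tuple[Fraction, Fraction]:
    """The same ratio expressed as shares of the whole data set."""
    first, second = mixing_ratio(k, M)
    total = first + second  # always M + 1
    return Fraction(first, total), Fraction(second, total)
```

With M = 4 the four areas receive the ratios 4:1, 3:2, 2:3 and 1:4, which matches the 3 to 2 ratio stated for data set 704 at k = 2.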

In step S606, the data set acquisition unit 506 constructs a learning data set for each area based on the received mixing ratio. From the set of teacher image data input from the attribute determination unit 502, it extracts n_k1 pieces of image data tagged with attribute value 1 and n_k2 pieces of image data tagged with attribute value 2 according to the received mixing ratio, and mixes them to form the data set for each area. Here, n_k1 and n_k2 are given by

n_k1 = n(M - k + 1)/(M + 1)   ... (1-1)
n_k2 = nk/(M + 1)   ... (1-2)

where n is the number of pieces of image data required for learning. The constructed data set for each area is output to the learning unit 507. If a data set with the received mixing ratio already exists, the existing data set may be acquired and output to the learning unit 507 as the learning data set.
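A per-area data set can be assembled by drawing the two counts from the tagged pools. The sketch below assumes that the counts follow the (M - k + 1) to k ratio with a total of n, so that n_k2 = nk/(M + 1) and n_k1 = n - n_k2. The rounding rule, the random sampling, and the names split_counts and build_area_dataset are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_counts(n: int, k: int, M: int) -> Tuple[int, int]:
    """Counts (n_k1, n_k2) of first- and second-attribute images for area k."""
    n_k2 = round(n * k / (M + 1))  # assumption: round to the nearest integer
    n_k1 = n - n_k2                # keeps the total at exactly n
    return n_k1, n_k2


def build_area_dataset(
    pool1: Sequence[str],
    pool2: Sequence[str],
    n: int,
    k: int,
    M: int,
    seed: int = 0,
) -> List[str]:
    """Draw n_k1 items from pool1 and n_k2 items from pool2 without replacement."""
    n_k1, n_k2 = split_counts(n, k, M)
    rng = random.Random(seed)
    return rng.sample(list(pool1), n_k1) + rng.sample(list(pool2), n_k2)
```

Each pool must hold at least as many images as are drawn from it. Otherwise random.sample raises ValueError.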

In step S607, the learning unit 507 learns, for each area, conversion parameters for converting low-resolution image data into high-resolution image data, based on the received per-area learning data sets. An existing, well-known image conversion neural network is used as the learner. The resulting per-area conversion parameters are output to the image conversion unit 508.

Step S608 is the inference stage. In step S608, the image conversion unit 508 acquires the per-area conversion parameters from the learning unit 507 and the input image data from the attribute value change detection unit 504. Using the same neural network as the learning unit 507 together with the received per-area conversion parameters, it converts the low-resolution input image data belonging to each area and outputs high-resolution image data. That is, when input image data belongs to the k-th area, it is converted with the conversion parameters learned from the learning data set of that same k-th area. In steps S607 and S608, another learner such as an SVM or a random forest may be used instead of the image conversion neural network.
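Selecting the conversion parameters for a frame amounts to finding which of the M equal-width areas its time stamp falls into. The following sketch illustrates this lookup. Returning 0 before the change window and M + 1 after it is a convention assumed here, as are the names area_index and select_parameters and the dictionary of parameters.

```python
from typing import Dict, TypeVar

P = TypeVar("P")


def area_index(t: float, start: float, width: float, steps: int) -> int:
    """Index of the area containing time t.

    Returns 1..steps inside the change window [start, start + width),
    0 before it and steps + 1 after it (an assumed convention).
    """
    if width <= 0 or steps < 1:
        raise ValueError("width must be positive and steps at least 1")
    if t < start:
        return 0
    if t >= start + width:
        return steps + 1
    # min() guards against floating-point rounding at the upper edge.
    return min(steps, int((t - start) / (width / steps)) + 1)


def select_parameters(
    t: float, start: float, width: float, steps: int, params_by_area: Dict[int, P]
) -> P:
    """Pick the parameters learned for the area that time t belongs to."""
    return params_by_area[area_index(t, start, width, steps)]
```

In this sketch the entries for indices 0 and M + 1 would hold parameters learned from a single attribute value, which is one possible reading of how frames outside the change window are handled.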

The image conversion unit 508 may also serve as the learning unit 507. Alternatively, the image processing device 106 may comprise the input image acquisition unit 503, the attribute value change detection unit 504, and the image conversion unit 508, with the image conversion unit 508 being a learning unit 507 that has been trained in advance. In other words, S601 to S607 in FIG. 6 may be performed by another device, with only S608 performed by the image processing device 106.

In the present embodiment, an imaging device with a longer focal length is used to acquire high-resolution teacher image data, but an imaging device with the same focal length and an image sensor with a higher pixel count may be used instead.

When a plurality of subjects at different distances in the depth direction are imaged within the depth of field, a subject in the foreground appears larger and at higher resolution in the image data than a subject in the background. Therefore, imaging may be performed with a deep depth of field and the subject of interest placed toward the rear end (far side) of the depth of field, and the image data of a subject located toward the front end (near side) of the depth of field may be used as teacher image data.

Although the present embodiment uses resolution enhancement as an example of an image conversion task, the technique applies generally to image conversion that outputs image data of another domain corresponding to the input image data, that is, image data with different characteristics. Examples of such image conversion include noise removal, blur removal, region segmentation, hue assignment, texture conversion, and depth map estimation.

As described above, according to the present embodiment, discontinuity in the image conversion results can be reduced even when the attributes of the input image data change over time.

Embodiment 2

The first embodiment showed a method of changing, stepwise for each area, the composition ratio of teacher image data with different attribute values in the learning data set. Because that method must hold a learning data set for each area, it requires a large storage capacity. In the second embodiment, by contrast, the composition of each learning data set is fixed and the number of learning iterations is changed stepwise. The second embodiment is described below. Its basic configuration is the same as that of the first embodiment, so that description is omitted.

The data set acquisition unit 506 shown in FIG. 5 constructs a first learning data set consisting only of teacher image data of the first attribute value and a second learning data set consisting only of teacher image data of the second attribute value, and outputs them to the learning unit 507.

The learning unit 507 receives the mixing ratio from the mixing ratio determination unit 505 and trains a single learning model by alternately using the first and second learning data sets in the received mixing ratio. That is, it performs m_k1 learning iterations with the first learning data set and m_k2 learning iterations with the second learning data set. Following equations (1-1) and (1-2), m_k1 = m(M - k + 1)/(M + 1) and m_k2 = mk/(M + 1), where m is the number of learning iterations required. As k increases, m_k1 decreases and m_k2 increases, so the characteristics of the converted image data change stepwise.
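The iteration counts and one possible alternation order can be sketched as follows. The formulas for m_k1 and m_k2 come from the text. The rounding, the even interleaving strategy, and the names learning_counts and interleaved_schedule are assumptions for illustration, since the text only says the two data sets are used alternately.

```python
from typing import List, Tuple


def learning_counts(m: int, k: int, M: int) -> Tuple[int, int]:
    """Iteration counts (m_k1, m_k2) for the first and second data sets."""
    m_k2 = round(m * k / (M + 1))  # assumption: round to the nearest integer
    m_k1 = m - m_k2                # equals m(M - k + 1)/(M + 1) up to rounding
    return m_k1, m_k2


def interleaved_schedule(m_k1: int, m_k2: int) -> List[int]:
    """Spread the iterations evenly; each entry is 1 or 2 (which data set to use)."""
    total = m_k1 + m_k2
    schedule: List[int] = []
    used1 = 0
    for i in range(1, total + 1):
        # Use data set 1 whenever its share so far lags behind its target share.
        if used1 * total < i * m_k1:
            schedule.append(1)
            used1 += 1
        else:
            schedule.append(2)
    return schedule
```

A training loop would then iterate over the schedule and take one learning step on the indicated data set at each entry.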

This enables image conversion into image data whose features are intermediate between those of the teacher image data of the first attribute value and those of the teacher image data of the second attribute value.

Instead of the number of learning iterations, the learning rate may be changed stepwise. In that case the number of iterations is made the same for the first and second learning data sets, and the learning rate used with each data set is made proportional to m_k1 and m_k2, respectively.

Alternatively, the composition ratio of the learning batches, rather than of the learning data sets, may be changed stepwise. A neural network is trained by repeatedly preparing a learning batch, which is a small set of image data, and learning from it. With n as the number of pieces of image data in a learning batch, the batch may be composed so that the ratio of image data with different attribute values follows equations (1-1) and (1-2). Alternatively, a learning batch may be composed by selecting teacher image data from the first and second learning data sets with probabilities p_k1 and p_k2, respectively. Here,

[equation defining p_k1 and p_k2 through a function f; not reproduced in the source text]

that is, the probabilities p_k1 and p_k2 are set so as to be proportional to the learning counts m_k1 and m_k2. The function f need not be a proportional expression and may instead be a polynomial or an exponential function. The function f may also be an arbitrary function whose input/output correspondence is held as a lookup table.
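Probabilistic batch composition can be illustrated with the simplest case, in which the probabilities are directly proportional to the learning counts. The normalization so that the two probabilities sum to one, and the names proportional_probabilities and sample_batch, are assumptions of this sketch. The exact definition in the source equation is not available.

```python
import random
from typing import List, Sequence, Tuple


def proportional_probabilities(m_k1: float, m_k2: float) -> Tuple[float, float]:
    """Probabilities proportional to the learning counts (assumed normalized)."""
    total = m_k1 + m_k2
    if total <= 0:
        raise ValueError("at least one count must be positive")
    return m_k1 / total, m_k2 / total


def sample_batch(
    pool1: Sequence[str],
    pool2: Sequence[str],
    batch_size: int,
    p_k1: float,
    rng: random.Random,
) -> List[str]:
    """Fill a batch, choosing pool1 with probability p_k1 and pool2 otherwise."""
    batch = []
    for _ in range(batch_size):
        pool = pool1 if rng.random() < p_k1 else pool2
        batch.append(rng.choice(pool))
    return batch
```

Unlike a fixed per-batch count, this makes the composition of any single batch random, while the expected share of first-attribute images stays at p_k1.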

The conversion parameters obtained by learning may also be mixed after learning, based on the mixing ratio. That is, given the conversion parameter θ_k1 obtained by learning only with the first learning data set and the conversion parameter θ_k2 obtained by learning only with the second learning data set, the conversion parameter θ_k is calculated as

[equation for θ_k; not reproduced in the source text]

and output to the image conversion unit 508.
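One plausible way to mix two learned parameter sets by the mixing ratio is a weighted average. The sketch below assumes linear weights of (M - k + 1)/(M + 1) and k/(M + 1). This weighting is an assumption chosen to match the ratio used elsewhere in the description, not a formula confirmed by the source, and the function name blend_parameters is hypothetical.

```python
from typing import List, Sequence


def blend_parameters(
    theta1: Sequence[float], theta2: Sequence[float], k: int, M: int
) -> List[float]:
    """Weighted average of two parameter vectors for the k-th of M areas.

    Assumption: linear weights that follow the (M - k + 1) : k mixing ratio.
    """
    if len(theta1) != len(theta2):
        raise ValueError("parameter vectors must have the same length")
    w1 = (M - k + 1) / (M + 1)
    w2 = k / (M + 1)
    return [w1 * a + w2 * b for a, b in zip(theta1, theta2)]
```

Under this assumption the two networks must share the same architecture so that their parameters correspond element by element.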

As described above, according to the present embodiment, only two learning data sets need to be held per attribute value change regardless of the ratio change step number M, so storage capacity can be saved.

Embodiment 3

In the first and second embodiments, changes in attribute value over time were detected and the discontinuity of the image conversion results was reduced. The present embodiment shows an example of reducing the discontinuity caused by changes in attribute value between regions, that is, region discontinuity. The basic configuration of the present embodiment is the same as that of the first embodiment, so its description is omitted.

The processing of the present embodiment is described with reference to FIG. 8, taking as an example the imaging of a signboard installed in a stadium. Captured image data 801 is obtained by imaging a signboard that straddles sun and shade. The left side of the signboard is in sunlight, while the right side is shaded because a roof or the like blocks the sunlight. When the distance between the subject and the object blocking the light is long, as with a stadium roof and a signboard placed on the ground, light wraps around and the illuminance changes continuously between the sunlit region and the shaded region. The captured image data 801 is therefore image data whose luminance value changes continuously between the sunlit region and the shaded region.

When increasing the resolution of image data 802 of the sunlit region, learning is performed with a learning data set 803 composed of image data of sunlit regions. Similarly, when increasing the resolution of image data 806 of the shaded region, learning is performed with a data set 807 composed of image data of shaded regions.

The boundary between sun and shade can be calculated from a simulation based on the pre-stored three-dimensional shape of the stadium, the three-dimensional shape of the signboard calculated from stereo image data or the like, and the relative positions of the stadium, the signboard, and light sources such as the sun. Partial image data corresponding to three-dimensional positions that the simulation determines to be sunlit is tagged with the sunlit attribute value, and partial image data corresponding to positions determined to be shaded is tagged with the shaded attribute value. Changes in attribute value are detected based on these tags.

With this alone, however, the luminance value changes sharply in the region where sun and shade switch, as in image data 808, and the region discontinuity in luminance increases. Therefore, when increasing the resolution of image data 804 in the switching region (the region within a predetermined distance of the boundary line at which the attribute value changes), learning is performed with a learning data set 805 constructed by mixing sunlit image data and shaded image data.

As a result, a conversion result that combines the features of image data with the different attribute values, sunlit and shaded, is output. As in the first embodiment, region discontinuity can be reduced by subdividing the partial image data to be converted and changing, stepwise for each subdivided piece, the composition ratio of teacher image data with different attribute values in the learning data set.
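The spatial subdivision can be illustrated for a simple case in which the sun/shade boundary is a vertical line in the image. The sketch below assigns each pixel column an area index. The vertical boundary, the symmetric band around it, the use of 0 and M + 1 for the purely sunlit and purely shaded sides, and the name column_area_indices are all assumptions made for illustration.

```python
from typing import List


def column_area_indices(
    width: int, boundary_x: int, band_half_width: int, steps: int
) -> List[int]:
    """Area index per pixel column for a vertical sun/shade boundary.

    Columns left of the transition band get 0 (sunlit only), columns right
    of it get steps + 1 (shaded only), and columns inside the band get
    1..steps, increasing toward the shaded side.
    """
    if band_half_width <= 0 or steps < 1:
        raise ValueError("band_half_width must be positive and steps at least 1")
    left = boundary_x - band_half_width
    band = 2 * band_half_width
    indices = []
    for x in range(width):
        if x < left:
            indices.append(0)
        elif x >= left + band:
            indices.append(steps + 1)
        else:
            indices.append(min(steps, int((x - left) * steps / band) + 1))
    return indices
```

A real boundary would generally be a curve in the image, in which case the index would depend on each pixel's distance to that curve rather than on its column.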

Although the above description uses a signboard as the subject and sun and shade as the attribute values, the present embodiment is also applicable to other subjects and other attribute value changes.

As described above, according to the present embodiment, region discontinuity within a single piece of image data can be reduced in the image conversion process.

Embodiment 4

The fourth embodiment describes a UI that supports the construction of appropriate learning data sets and the reduction of discontinuity.

FIG. 9 shows a configuration example of a UI for checking image conversion results and specifying how the learning data set is constructed. Window 901 is the display screen shown on the display device 107, and window 902 displays the image conversion result as a moving image. The image processing device 106 first learns conversion parameters according to the method of any one of the first to third embodiments, performs image conversion based on the learned conversion parameters, and displays the converted moving image. The user operates the play button and seek bar 903 to check whether the moving image looks unnatural or contains large discontinuities in image characteristics. An overview of the attribute value changes determined by the image processing device is displayed on the area bar 904. In the example of FIG. 9, three attribute value changes have been detected, and the whole moving image is divided into four areas. Selecting a particular area opens the attribute information window 905, which shows the attributes of that area and how the attribute value changed.

The user can also reset the hyperparameters for learning data set construction, namely the attribute value change time T, the ratio change step number M, and the attribute value change time width W, and have the image conversion result updated. Operating slide bars 906, 907, and 908 increases or decreases the attribute value change time T, the ratio change step number M, and the attribute value change time width W, respectively. Here, changing the attribute value change time T means changing the attribute value that serves as the reference for switching data sets, for example the luminance level, and changing the attribute value change time width W means changing the range of the change area data.

When button 909 is pressed while one area is selected on the area bar 904, that area is divided into a plurality of areas. Dividing an area means newly setting one attribute value that serves as a reference for switching data sets, for example a luminance level. Conversely, when button 910 is pressed while a plurality of areas are selected on the area bar 904, those areas are combined into one area. Combining areas means deleting one attribute value that serves as a reference for switching data sets, for example one luminance level.
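The divide and combine operations can be modelled on a sorted list of area edges, where each interior edge corresponds to one switching reference. The sketch below is a simplification assumed here: it represents edges as positions along the area bar rather than as luminance levels, and the names split_area and merge_areas are hypothetical.

```python
from typing import List


def split_area(boundaries: List[float], index: int, at: float) -> List[float]:
    """Divide area `index` (spanning boundaries[index]..boundaries[index + 1]) at `at`."""
    lo, hi = boundaries[index], boundaries[index + 1]
    if not lo < at < hi:
        raise ValueError("split position must lie strictly inside the area")
    return boundaries[: index + 1] + [at] + boundaries[index + 1 :]


def merge_areas(boundaries: List[float], first: int, last: int) -> List[float]:
    """Combine the adjacent areas first..last (inclusive) into one area."""
    if not 0 <= first < last < len(boundaries) - 1:
        raise ValueError("need a valid range of at least two adjacent areas")
    # Dropping the interior edges removes the switching references between them.
    return boundaries[: first + 1] + boundaries[last + 1 :]
```

Both functions return a new list, so the previous state remains available, for example for an undo operation in the UI.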

When the re-learning button 911 is pressed after these changes have been made, learning is performed again based on the set hyperparameters and conditions, and video data reflecting the result is displayed in window 902. At this time, an image conversion result obtained with a short learning time is previewed first, and the preview is updated as needed while the remaining learning continues in the background.

When the save button 912 is pressed, the converted moving image is saved. At the same time, the attribute information assigned to the moving image, the information on attribute value changes, and the composition of the learning data sets are saved to a file as additional history data. By referring to this history data in subsequent image conversions, a data set similar to one constructed in the past can be constructed.

As described above, according to the present embodiment, the user can construct appropriate data sets and perform image conversion while checking the image conversion results.

Embodiment 5

The technique of the present disclosure is applicable not only when an attribute of the environment, such as weather or sunshine, changes, but also when an attribute of the subject changes. An example is described with reference to FIG. 10.

FIG. 10 shows an example in which the subject changes as the orientation or focal length of the camera changes. At one point, a scene containing persons with a plurality of different attribute values is captured, as in image data 1001. Afterwards, the camera zooms in and only a specific person is captured, as in image data 1002.

When the face image data of the subjects captured in image data 1001 and 1002 is increased in resolution using a learning data set, conversion accuracy can be improved by changing the learning data set according to changes in the attribute values of the face image data included. That is, to process at once all the face image data of a group of persons who have different attribute values for a particular attribute, as in image data 1001, learning may be performed with a data set 1003 that mixes image data of groups of persons having each of the attribute values included.

On the other hand, when increasing the resolution of image data showing only a specific person, as in image data 1002, learning is performed with a data set 1004 constructed only from image data of persons having the attribute value of that person. While the attribute is switching, that is, during zooming, learning and image conversion are performed with a data set in which the two data sets are mixed at a fixed ratio.

A change in the composition of the persons (subjects) in the frame caused by zooming may be detected based on subject features detectable from the image data, or based on the three-dimensional position of each subject acquired using GPS or the like.

Thus, the technique of the present disclosure is also applicable when learning data sets are selected according to the attribute values of the subject.

(Other Examples)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or storage medium, and one or more processors in a computer of that system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

106 Image processing device
505 Mixing ratio determination unit
507 Learning unit
508 Image conversion unit

Claims

An image processing device comprising:
acquisition means for acquiring input image data; and
conversion means for performing predetermined image conversion on the input image data acquired by the acquisition means,
wherein the conversion means performs the predetermined image conversion on at least a part of first partial data and second partial data based on a conversion parameter that is associated with a first attribute value and a second attribute value and is generated based on the first attribute value, which is possessed by the first partial data among a plurality of partial data constituting the input image data, and the second attribute value, which differs from the first attribute value and is possessed by the second partial data adjacent to the first partial data among the plurality of partial data.

The image processing device according to claim 1, wherein, when the conversion means performs the predetermined image conversion on the first partial data and a part of the second partial data based on, in addition to the conversion parameter associated with the first attribute value and the second attribute value, a conversion parameter associated only with the first attribute value and generated based only on the first attribute value and a conversion parameter associated only with the second attribute value and generated based only on the second attribute value, the conversion means applies the conversion parameter associated with the first attribute value and the second attribute value to partial data located between the partial data to which the conversion parameter associated only with the first attribute value is applied and the partial data to which the conversion parameter associated only with the second attribute value is applied.

The image processing device according to claim 1 or 2, further comprising detection means for detecting, from among the first partial data and the second partial data, change area data consisting of continuous partial data whose attribute value changes between a first reference value for the first attribute value and a second reference value for the second attribute value,
wherein the conversion parameter associated with the first attribute value and the second attribute value includes a plurality of different conversion parameters, each generated based on the first attribute value and the second attribute value according to a mixing ratio of the first attribute value and the second attribute value that is determined, for each piece of divided partial data obtained by dividing the detected change area data, according to the number of pieces of divided partial data and the positional relationship among them, and
wherein the conversion means applies the plurality of different conversion parameters to the corresponding divided partial data.

The image processing device according to claim 3, further comprising changing means for allowing a user to change at least one of the attribute value, the range of the change area data, and the number of divisions of the change area data.

The image processing device according to any one of claims 1 to 4, wherein the conversion parameter associated with the first attribute value and the second attribute value is generated based on a result of learning using at least a part of a first data set associated with the first attribute value and a result of learning using at least a part of a second data set associated with the second attribute value.

The image processing device according to claim 5, wherein the conversion parameter associated only with the first attribute value is generated based only on a result of learning using the first data set, and
the conversion parameter associated only with the second attribute value is generated based only on a result of learning using the second data set.

The image processing device according to any one of claims 1 to 6, further comprising:
second acquisition means for acquiring input image data; and
determination means for determining the attribute value of the input image data.

The image processing device according to any one of claims 1 to 7, further comprising display means for displaying the result of the predetermined image conversion performed by the conversion means.

The image processing device according to any one of claims 1 to 8, further comprising generation means for generating virtual viewpoint image data, which represents a view from a specified virtual viewpoint, based on a plurality of image data obtained by imaging with a plurality of imaging devices,
wherein the input image data is the virtual viewpoint image data.

An image processing device comprising:
first acquisition means for acquiring a data set including teacher image data associated with an attribute value possessed by partial data constituting input image data; and
generation means for generating, based on a result of learning using the acquired data set, a conversion parameter for performing predetermined image conversion on the input image data,
wherein the generation means generates a conversion parameter associated with a first attribute value and a second attribute value based on a result of learning using at least a part of a first data set associated with the first attribute value, which is possessed by first partial data among a plurality of partial data constituting the input image data, and a result of learning using at least a part of a second data set associated with the second attribute value, which is possessed by second partial data adjacent to the first partial data among the plurality of partial data.

The image processing apparatus according to claim 10, wherein the generation means generates a conversion parameter associated only with the first attribute value based only on a result of learning using the first data set, and generates a conversion parameter associated only with the second attribute value based only on a result of learning using the second data set.

An image processing apparatus comprising:
a first acquisition means for acquiring a data set including teacher image data associated with an attribute value of partial data constituting input image data;
a generation means for generating, based on a result of learning using the acquired data set, conversion parameters for performing a predetermined image conversion on the input image data; and
a conversion means for performing the predetermined image conversion on the input image data by applying each conversion parameter to the partial data having the attribute value associated with the data set used for learning that conversion parameter,
wherein the generation means generates a conversion parameter associated with a first attribute value and a second attribute value, based on a result of learning using at least a part of a first data set associated with the first attribute value of first partial data among a plurality of partial data constituting the input image data, and at least a part of a second data set associated with the second attribute value, different from the first attribute value, of second partial data that is adjacent to the first partial data among the plurality of partial data, and
wherein the conversion means performs the predetermined image conversion on at least a part of the first partial data and the second partial data based on the conversion parameter associated with the first attribute value and the second attribute value.

The image processing apparatus according to claim 12, wherein the generation means generates a conversion parameter associated only with the first attribute value based on a result of learning using only the first data set, and generates a conversion parameter associated only with the second attribute value based on a result of learning using only the second data set, and
wherein, when the conversion means performs the predetermined image conversion on a part of the first partial data and the second partial data based on the conversion parameter associated only with the first attribute value and the conversion parameter associated only with the second attribute value, in addition to the conversion parameter associated with the first attribute value and the second attribute value, the conversion means applies the conversion parameter associated with the first attribute value and the second attribute value to the partial data that lies, within the first partial data and the second partial data, between the partial data to which the conversion parameter associated only with the first attribute value is applied and the partial data to which the conversion parameter associated only with the second attribute value is applied.
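
As an informal illustration of the arrangement recited above, the following minimal sketch labels each partial data (for example, each frame) with the parameter set that would be applied to it. The function name, the label strings, and the index-based description of the transition region are all hypothetical and are not taken from the specification.

```python
def assign_parameters(num_parts: int, change_start: int, change_end: int) -> list[str]:
    """Label each partial data with the parameter set to apply (illustrative only).

    Indices below change_start get the parameters tied only to the first
    attribute value, indices in [change_start, change_end) get the parameters
    tied to both attribute values, and the remaining indices get the
    parameters tied only to the second attribute value.
    """
    if not 0 <= change_start <= change_end <= num_parts:
        raise ValueError("change region must lie inside the sequence")
    labels = []
    for index in range(num_parts):
        if index < change_start:
            labels.append("first_only")
        elif index < change_end:
            labels.append("first_and_second")
        else:
            labels.append("second_only")
    return labels
```

With five partial data and a transition covering indices 2 and 3, the middle two entries receive the parameters associated with both attribute values.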

The image processing apparatus according to any one of claims 10 to 13, further comprising:
a detection means for detecting, from the first partial data and the second partial data, change area data consisting of continuous partial data whose attribute value changes between a first reference value for the first attribute value and a second reference value for the second attribute value; and
a determining means for dividing the detected change area data into a plurality of divided partial data and determining, for each divided partial data, a mixing ratio of the first data set and the second data set according to the number of divisions of the change area data and the positional relationship among the divided partial data,
wherein the generation means generates, for each of the divided partial data, the conversion parameter associated with the first attribute value and the second attribute value based on a result of learning using at least a part of the first data set and at least a part of the second data set mixed according to the corresponding mixing ratio.
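
The claim above states only that the mixing ratio depends on the number of divisions and on the position of each divided partial data. One possible reading, shown here as a sketch under the assumption of a linear ramp across the change area, is the following; the function name `mixing_ratios` and the linear rule are illustrative choices, not the specification's definition.

```python
def mixing_ratios(num_divisions: int) -> list[tuple[float, float]]:
    """Return (first_share, second_share) for each divided partial data.

    Assumption: the share of the second data set rises linearly with the
    position of the division inside the change area, so divisions nearer the
    first reference value lean on the first data set and divisions nearer the
    second reference value lean on the second data set.
    """
    if num_divisions < 1:
        raise ValueError("num_divisions must be at least 1")
    ratios = []
    for position in range(1, num_divisions + 1):
        second_share = position / (num_divisions + 1)
        ratios.append((1.0 - second_share, second_share))
    return ratios


# Three divisions give 3:1, 1:1, and 1:3 mixes of the two data sets.
print(mixing_ratios(3))  # [(0.75, 0.25), (0.5, 0.5), (0.25, 0.75)]
```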

The image processing apparatus according to claim 14, wherein the generation means determines the number of times of learning with the first data set and with the second data set based on the mixing ratio, and generates the conversion parameter associated with the first attribute value and the second attribute value based on a result of learning in which the first data set and the second data set are used alternately according to the determined numbers of times of learning.
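
One way to picture learning that alternates between the two data sets according to per-data-set iteration counts is the scheduling sketch below. The total step count, the rounding rule, and the interleaving strategy are assumptions made for illustration; the claim does not prescribe them.

```python
def alternating_schedule(first_share: float, total_steps: int) -> list[str]:
    """Build an interleaved order of data sets for training (illustrative only).

    The number of steps for each data set is derived from the mixing ratio,
    and the steps are interleaved so that neither data set runs far ahead of
    its target share.
    """
    if not 0.0 <= first_share <= 1.0:
        raise ValueError("first_share must be in [0, 1]")
    n_first = round(first_share * total_steps)
    n_second = total_steps - n_first
    used_first = used_second = 0
    schedule = []
    for _ in range(total_steps):
        # Prefer the first data set while it is not ahead of its target share.
        first_not_ahead = used_first * n_second <= used_second * n_first
        if used_first < n_first and (first_not_ahead or used_second >= n_second):
            schedule.append("first")
            used_first += 1
        else:
            schedule.append("second")
            used_second += 1
    return schedule
```

For a 3:1 mixing ratio over four steps, this yields three steps on the first data set and one on the second, with the single second-data-set step placed between first-data-set steps rather than at the end.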

The image processing apparatus according to claim 14, wherein the generation means determines a first learning rate for the first data set and a second learning rate for the second data set based on the mixing ratio, and generates the conversion parameter associated with the first attribute value and the second attribute value based on a result of learning at the first learning rate using the first data set and a result of learning at the second learning rate using the second data set.
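
The learning-rate variant can be sketched as follows, assuming that each learning rate is simply a base rate scaled by the corresponding share of the mixing ratio and that learning uses plain gradient descent. Both assumptions, and the names `learning_rates` and `sgd_step`, are hypothetical.

```python
def learning_rates(first_share: float, base_lr: float = 0.01) -> tuple[float, float]:
    """Scale a base learning rate by each data set's share (assumed rule)."""
    if not 0.0 <= first_share <= 1.0:
        raise ValueError("first_share must be in [0, 1]")
    return base_lr * first_share, base_lr * (1.0 - first_share)


def sgd_step(params: list[float], grads: list[float], lr: float) -> list[float]:
    """One plain gradient-descent update; the gradient is supplied by the caller."""
    return [p - lr * g for p, g in zip(params, grads)]


# Updates driven by the first data set use lr_first, the others use lr_second.
lr_first, lr_second = learning_rates(0.75, base_lr=0.5)
params = sgd_step([1.0], [2.0], lr_first)    # gradient from a first-data-set batch
params = sgd_step(params, [2.0], lr_second)  # gradient from a second-data-set batch
```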

The image processing apparatus according to claim 14, wherein the generation means generates a learning batch by selecting teacher image data from the first data set and the second data set with probabilities given by the mixing ratio, and generates the conversion parameter associated with the first attribute value and the second attribute value based on a result of learning using the learning batch.
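
Selecting teacher image data with the probability of the mixing ratio could look like the sketch below. The data sets are represented as plain lists of identifiers, and the function name, the seeding, and sampling with replacement are assumptions for illustration.

```python
import random


def sample_batch(first_set, second_set, first_share, batch_size, seed=0):
    """Draw a learning batch, choosing the source data set per item.

    Each item comes from the first data set with probability first_share and
    from the second data set otherwise (sampling with replacement).
    """
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        source = first_set if rng.random() < first_share else second_set
        batch.append(rng.choice(source))
    return batch


first = [f"a{i}" for i in range(10)]   # stand-ins for first-data-set teacher images
second = [f"b{i}" for i in range(10)]  # stand-ins for second-data-set teacher images
batch = sample_batch(first, second, first_share=0.75, batch_size=8)
```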

The image processing apparatus according to claim 14, further comprising a construction means for constructing a third data set in which the teacher images contained in the first data set and the teacher images contained in the second data set are mixed at the mixing ratio,
wherein the generation means generates the conversion parameter associated with the first attribute value and the second attribute value based on a result of learning using the third data set.
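
In contrast to per-batch sampling, the third data set is built once and then used for learning. A minimal sketch, assuming the mixing ratio fixes the exact item counts and that items are drawn without replacement, is shown below; the name `build_third_dataset` and the target size parameter are hypothetical.

```python
import random


def build_third_dataset(first_set, second_set, first_share, size, seed=0):
    """Construct a mixed data set with a fixed share from each source.

    Assumption: round(first_share * size) items are drawn without replacement
    from the first data set and the remainder from the second data set.
    """
    n_first = round(first_share * size)
    n_second = size - n_first
    if n_first > len(first_set) or n_second > len(second_set):
        raise ValueError("not enough teacher images for the requested mix")
    rng = random.Random(seed)
    mixed = rng.sample(first_set, n_first) + rng.sample(second_set, n_second)
    rng.shuffle(mixed)  # avoid presenting all first-data-set items first
    return mixed


first = [f"a{i}" for i in range(10)]
second = [f"b{i}" for i in range(10)]
third = build_third_dataset(first, second, first_share=0.75, size=8)
```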

The image processing apparatus according to claim 14, wherein the generation means generates the conversion parameter associated with the first attribute value and the second attribute value by mixing, at the mixing ratio, the conversion parameter associated only with the first attribute value and the conversion parameter associated only with the second attribute value.
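
Mixing two already-learned parameter sets can be read as a weighted average of corresponding parameters. The sketch below assumes that reading, with parameters flattened into equal-length lists of floats; the name `blend_parameters` is hypothetical, and the claim itself does not state that the mix is a linear interpolation.

```python
def blend_parameters(params_first, params_second, first_share):
    """Weighted average of two parameter lists (assumed form of the mix)."""
    if len(params_first) != len(params_second):
        raise ValueError("parameter lists must have the same length")
    if not 0.0 <= first_share <= 1.0:
        raise ValueError("first_share must be in [0, 1]")
    second_share = 1.0 - first_share
    return [
        first_share * p1 + second_share * p2
        for p1, p2 in zip(params_first, params_second)
    ]


# Equal mix of the two single-attribute parameter sets.
print(blend_parameters([1.0, 2.0], [3.0, 6.0], 0.5))  # [2.0, 4.0]
```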

The image processing apparatus according to any one of claims 14 to 19, further comprising a changing means by which a user changes at least one of the attribute value, the range of the change area data, and the number of divisions of the change area data.

The image processing apparatus according to claim 20, further comprising a storage means for storing a history of changes made by the changing means,
wherein the generation means generates the conversion parameter associated with the first attribute value and the second attribute value according to the history of changes.

The image processing apparatus according to any one of claims 10 to 12, further comprising a generation means for generating virtual viewpoint image data representing a view from a designated virtual viewpoint, based on a plurality of image data obtained by imaging with a plurality of imaging devices and on the designated virtual viewpoint,
wherein the input image data is the virtual viewpoint image data, and
wherein the data set includes, as the teacher image data, image data among the plurality of image data that is not used for generating the virtual viewpoint image data.

The image processing apparatus according to any one of claims 1 to 22, wherein the input image data is moving image data, and the partial data is frame image data constituting the moving image data, and
wherein the second partial data adjacent to the first partial data is frame image data that is temporally continuous with the frame image data of the first partial data.

The image processing apparatus according to any one of claims 1 to 22, wherein the input image data is still image data, and the partial data is partial image data representing a partial image for each region constituting the image represented by the still image data, and
wherein the second partial data adjacent to the first partial data is partial image data representing a partial image that is adjacent, in the image represented by the still image data, to the partial image represented by the partial image data of the first partial data.

The image processing apparatus according to any one of claims 1 to 24, further comprising:
a second acquisition means for acquiring input image data; and
a determination means for determining the attribute value of the input image data.

The image processing apparatus according to any one of claims 1 to 25, wherein the attribute value of the input image data includes a value relating to at least one of the imaging time of the input image data, the imaging location, the imaging target, the state of the imaging target, the environment at the time of imaging, the model of the imaging device, and the imaging parameters of the imaging device at the time of imaging.

The image processing apparatus according to any one of claims 1 to 26, further comprising a tagging means for acquiring an attribute value based on the three-dimensional position of a subject captured in the input image data and tagging the partial data corresponding to the three-dimensional position with the attribute value.

An image processing method comprising:
an acquisition step of acquiring input image data; and
a conversion step of performing a predetermined image conversion on the input image data acquired in the acquisition step,
wherein the conversion step performs the predetermined image conversion on at least a part of first partial data and second partial data based on a conversion parameter that is generated based on, and associated with, a first attribute value and a second attribute value, the first attribute value being possessed by the first partial data constituting the input image data, and the second attribute value being different from the first attribute value and possessed by the second partial data that constitutes the input image data and is adjacent to the first partial data.

An image processing method comprising:
a first acquisition step of acquiring a data set including teacher image data associated with an attribute value of a predetermined attribute of partial data constituting input image data; and
a generation step of generating, based on a result of learning using the acquired data set, conversion parameters for performing a predetermined image conversion on the input image data,
wherein the generation step generates a conversion parameter associated with a first attribute value and a second attribute value, the conversion parameter being generated based on a result of learning using at least a part of a first data set associated with the first attribute value of first partial data among a plurality of partial data constituting the input image data, and on a result of learning using at least a part of a second data set associated with the second attribute value of second partial data that is adjacent to the first partial data among the plurality of partial data.

An image processing method comprising:
a first acquisition step of acquiring a data set including teacher image data associated with an attribute value of a predetermined attribute of partial data constituting input image data;
a generation step of generating, based on a result of learning using the acquired data set, conversion parameters for performing a predetermined image conversion on the input image data; and
a conversion step of performing the predetermined image conversion on the input image data by applying each conversion parameter to the partial data having the attribute value associated with the data set used for learning that conversion parameter,
wherein the generation step generates a conversion parameter associated with a first attribute value and a second attribute value, based on a result of learning using at least a part of a first data set associated with the first attribute value of first partial data constituting the input image data, and at least a part of a second data set associated with the second attribute value, different from the first attribute value, of second partial data that constitutes the input image data and is adjacent to the first partial data, and
wherein the conversion step performs the predetermined image conversion on at least a part of the first partial data and the second partial data based on the conversion parameter associated with the first attribute value and the second attribute value.

A program for operating a computer as the image processing device according to any one of claims 1 to 27.