JP2020024612A

JP2020024612A - Image processing device, image processing method, processing device, processing method and program

Info

Publication number: JP2020024612A
Application number: JP2018149357A
Authority: JP
Inventors: 信彦田村; Nobuhiko Tamura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-08-08
Filing date: 2018-08-08
Publication date: 2020-02-13

Abstract

PROBLEM TO BE SOLVED: To solve the problem that appropriate super-resolution processing cannot be performed using ambiguous image identification information. SOLUTION: An image processing device includes: first obtaining means for obtaining first image data; second obtaining means for obtaining a classification score; and processing means for generating, from the first image data and on the basis of the classification score, second image data having a higher resolution than the first image data. The parameters constituting the processing means are determined on the basis of learning data made up of multiple sets of data, each containing learning image data and an identifier that identifies the classification of the learning image data. The classification score indicates the probability that the first image data corresponds to each of the identifiers. SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing technique for increasing the resolution of an image using machine learning.

When a low-resolution image with few pixels is enlarged into a high-resolution image with many pixels using bilinear interpolation or the like, the enlarged image is blurred. Super-resolution processing is known as a way to solve this problem, and learning-based super-resolution in particular is known (Patent Literature 1). Hereinafter, to make its meaning clear, super-resolution is referred to as resolution enhancement. Patent Literature 1 describes performing resolution enhancement using a "correct ID" that uniquely identifies the image type.

Patent Literature 1: International Publication No. 2014/162690

However, the "correct ID" that uniquely identifies an image type, as described in Patent Literature 1, cannot always be obtained. For example, when an image of a suspicious person captured by a security camera is to be enlarged and sharpened by resolution enhancement, a "correct ID" identifying that person is generally not available. Instead, what is usually obtained is ambiguous image identification information that does not necessarily identify the image type uniquely, such as eyewitness statements like "looked male", "seemed middle-aged", or "might have been young". With the technique of Patent Literature 1, resolution enhancement cannot be performed appropriately when no "correct ID" uniquely identifying the image type is available.

An image processing apparatus according to one aspect of the present invention includes a first obtaining unit that obtains first image data, a second obtaining unit that obtains a classification score, and a processing unit that generates, from the first image data and on the basis of the classification score, second image data having a higher resolution than the first image data. The parameters constituting the processing unit are determined on the basis of learning data composed of multiple sets of data, each set including learning image data and an identifier that identifies the classification of the learning image data, and the classification score indicates the probability that the first image data corresponds to each of the identifiers.

According to the present invention, resolution can be enhanced appropriately by using ambiguous image identification information that does not necessarily identify the image type uniquely.

FIG. 1 is a conceptual diagram showing the basic framework of machine learning.
FIG. 2 is a block diagram illustrating an example of the configuration of the image processing apparatus.
FIG. 3 is a functional block diagram of the image processing apparatus.
FIG. 4 is a flowchart illustrating an example of the flow of processing performed by the image processing apparatus.
FIG. 5 is a diagram showing an example of the configuration of the resolution-enhancement neural network.
FIG. 6 is a diagram for explaining parameter optimization of the resolution-enhancement neural network.
FIG. 7 is a diagram showing an example of the configuration of the determination-unit neural network.
FIG. 8 shows an example of a UI for setting a classification score.
FIG. 9 is a diagram showing the configuration of a resolution-enhancement neural network.
FIG. 10 shows an example of a UI for setting a classification score.

Embodiments of the present invention are described below with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of features described in the embodiments are necessarily essential to the solution of the present invention. Identical components are denoted by the same reference numerals.

<< First Embodiment >>
<Basic framework of machine learning>
FIG. 1 is a diagram illustrating the basic framework of machine learning. To make the embodiments easier to follow, this framework is described first. With reference to FIG. 1, we describe how to construct a model that predicts an unknown y from a known x, for data x and y whose correspondence is not explicitly known.

The learning data 101 in FIG. 1 consists of multiple pairs of x and y; in FIG. 1, it contains N pairs. The model 102 can be expressed as a function y = f(x; θ) that associates x with y. Machine learning assumes that if the function f and its parameters θ are set appropriately, f can estimate y well from x. Because the causal relationship between x and y is not clear in the first place, it is common to give f a very large number of parameters θ so as to leave room for realizing any correspondence. For example, suppose the model 102 is a convolutional neural network (hereinafter "CNN"), which is a type of neural network (hereinafter "NN"). In this case, increasing the number of convolution layers and the coefficients associated with those layers corresponds to using more parameters θ. In a CNN for image processing, the number of parameters is typically tens of thousands or more. The model parameter optimization 103 is a process of optimizing the parameters θ so that the model 102 can predict the unknown y well from the known x. A typical optimization method uses the squared L2 norm given by equation (1) as the objective function L(θ) and sets the parameters θ so as to minimize L(θ).
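
Equation (1) did not survive extraction. Assuming the usual form, a sum of squared L2 errors over the N training pairs, L(θ) = Σ_{i=1}^{N} ‖y_i − f(x_i; θ)‖₂² (the original may differ, for example by a 1/N factor), the objective can be evaluated as in this minimal Python sketch. The names `l2_objective` and `scale_model` are illustrative, not from the source.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Model = Callable[[Vector, Sequence[float]], Sequence[float]]


def l2_objective(f: Model, theta: Sequence[float],
                 pairs: Sequence[tuple[Vector, Vector]]) -> float:
    """Assumed form of equation (1): sum over pairs of ||y_i - f(x_i; theta)||_2^2."""
    total = 0.0
    for x, y in pairs:
        prediction = f(x, theta)
        total += sum((yj - pj) ** 2 for yj, pj in zip(y, prediction))
    return total


def scale_model(x: Vector, theta: Sequence[float]) -> list[float]:
    """Hypothetical toy model f(x; theta) = theta[0] * x, applied element-wise."""
    return [theta[0] * v for v in x]


data = [([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.5])]
print(l2_objective(scale_model, [2.0], data))  # 0.25
```

The first pair is fitted exactly; the second contributes (6.5 − 6.0)² = 0.25.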

Various objective functions other than equation (1) have been proposed. Because the characteristics of machine-learning estimation change with the objective function, the choice of objective function is important. When the model parameter optimization 103 has finished minimizing the objective function, θ₀ is determined as the optimal model parameters. The prediction model 104 is the model 102 with the optimal parameters θ₀ applied. Using the prediction model 104, the unknown y can be predicted from the known x, that is, by y = f(x; θ₀). This is the basic framework of machine learning.
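
To make the path from the objective to θ₀ and the prediction model 104 concrete, here is a toy sketch under stated assumptions: the model is a scalar line f(x; θ) = a·x + b rather than a CNN, the objective is the squared-error form assumed above for equation (1), and plain gradient descent with a fixed learning rate stands in for whatever optimizer an actual implementation would use.

```python
def fit_linear(pairs, lr=0.01, steps=5000):
    """Gradient descent on L(theta) = sum_i (a*x_i + b - y_i)^2 for scalar x, y.

    Returns theta_0 = (a, b), the parameters after optimization.
    """
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in pairs:
            r = (a * x + b) - y          # residual f(x; theta) - y
            grad_a += 2.0 * r * x        # dL/da
            grad_b += 2.0 * r            # dL/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def predict(theta0, x):
    """Prediction model: f(x; theta_0)."""
    a, b = theta0
    return a * x + b


train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated by y = 2x + 1
theta0 = fit_linear(train)
print(round(predict(theta0, 10.0), 3))  # 21.0
```

With lr = 0.01 the iteration converges on this small data set; a much larger learning rate could diverge, which is one reason learning rates need tuning in practice.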

<Configuration of image processing apparatus>
FIG. 2 is a diagram illustrating an example of the configuration of the image processing apparatus according to the first embodiment. The image processing apparatus 200 includes a CPU (Central Processing Unit) 201, a RAM (Random Access Memory) 202, and an HDD (Hard Disk Drive) 203, as well as a bus 204 and an interface 205. The CPU 201 executes various processes using computer programs and data stored in the RAM 202 or the HDD 203. The CPU 201 thereby controls the operation of the entire computer apparatus and executes or controls each of the machine-learning processes described above. The CPU 201 may also delegate part of its processing to a GPU (Graphics Processing Unit) (not shown) connected to the bus 204.

The RAM 202 has an area for storing computer programs and data loaded from the HDD 203, and a work area used when the CPU 201 executes various processes. In this way, the RAM 202 can provide various areas as needed. The HDD 203 is a large-capacity information storage device, typically a hard disk drive. The HDD 203 stores an OS (operating system) as well as computer programs and data for causing the CPU 201 to execute each of the machine-learning processes described above. The data stored in the HDD 203 includes the image or moving-image data to be processed. The computer programs and data stored in the HDD 203 are loaded into the RAM 202 as appropriate under the control of the CPU 201 and processed by the CPU 201. Instead of a hard disk drive, the HDD 203 may be a memory device such as an SSD (Solid State Drive), flash memory, or USB (Universal Serial Bus) memory. Furthermore, a storage device (not shown) on the network 208 connected via the interface 205 may be used virtually as the HDD 203.

The CPU 201, the RAM 202, the HDD 203, and the interface 205 are all connected to the bus 204. The input device 206, the output device 207, the network 208, and the bus 204 are connected to the interface 205. The input device 206 consists of a keyboard, mouse, touch panel, microphone, camera, gyro sensor, or the like, and can instruct the image processing apparatus 200 in various ways to change settings or start processing. The output device 207 consists of a display, projector, printer, or the like, and can display, project, or print the processing results of the image processing apparatus 200 as images, text, and so on. The input device 206 and the output device 207 may be integrated, for example as a touch-panel display as in a tablet terminal or smartphone. Multiple devices are connected to the network 208, and in accordance with instructions from the CPU 201 they can provide the image processing apparatus 200 with functions such as information storage, computation, and input/output in the form of SaaS (Software as a Service) or the like. The configuration shown in FIG. 2 is merely one example of a computer apparatus applicable to the image processing apparatus 200; other configurations may be adopted.

<Functional block diagram>
FIG. 3 is a diagram illustrating an example of a functional block diagram of the image processing apparatus 200. The image processing apparatus 200 includes a low-resolution image acquisition unit 301, a classification score acquisition unit 302, a high-resolution processing unit 303, a learning data acquisition unit 304, a parameter optimization unit 305, and a parameter holding unit 306. In the image processing apparatus 200, the functions of these units are realized by the CPU 201 reading and executing a control program stored in the HDD 203. Alternatively, the image processing apparatus 200 may include dedicated processing circuits corresponding to the individual units, or dedicated processing circuits may be used together with the control program.

The low-resolution image acquisition unit 301 acquires image data of a low-resolution image (first image data). It may acquire the image data via the network 208 or from the HDD 203. If the acquired image data does not reach a predetermined resolution, the unit may enlarge it so that it becomes image data of the predetermined (low) resolution.
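
The enlargement method is left open here (the description of FIG. 5 mentions bicubic interpolation "or the like"). As a simpler stand-in, the sketch below uses bilinear interpolation on a corner-aligned grid; `resize_bilinear`, `ensure_resolution`, and the square 32 × 32 target are hypothetical choices for illustration, not the embodiment's specification.

```python
Image = list[list[float]]


def resize_bilinear(img: Image, out_h: int, out_w: int) -> Image:
    """Bilinear resize of a 2-D grayscale image (corner-aligned sampling grid)."""
    in_h, in_w = len(img), len(img[0])

    def coord(i: int, n_out: int, n_in: int) -> float:
        # Map output index i to a fractional source coordinate.
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    out = []
    for r in range(out_h):
        y = coord(r, out_h, in_h)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for c in range(out_w):
            x = coord(c, out_w, in_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def ensure_resolution(img: Image, target: int = 32) -> Image:
    """Hypothetical helper: enlarge to target x target if the image is smaller."""
    if len(img) >= target and len(img[0]) >= target:
        return img
    return resize_bilinear(img, target, target)


small = [[0.0, 3.0], [6.0, 9.0]]
print(resize_bilinear(small, 3, 3))  # [[0.0, 1.5, 3.0], [3.0, 4.5, 6.0], [6.0, 7.5, 9.0]]
```

Such interpolation only produces the blurred enlargement described in the background; recovering detail is left to the high-resolution processing unit 303.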

The classification score acquisition unit 302 acquires a classification score corresponding to the image data of the low-resolution image acquired by the low-resolution image acquisition unit 301. This embodiment uses a plurality of predetermined classifications, each identified by an identifier. The classification score is a score value indicating, for each identifier, the probability that the low-resolution image belongs to the classification that identifier denotes. Details of the classification score are described later.

The high-resolution processing unit 303 performs resolution enhancement using the parameters held in the parameter holding unit 306; it is an NN configured with those parameters. When the image data of the low-resolution image and the classification score are input, the high-resolution processing unit 303 generates and outputs image data of a high-resolution image (second image data) whose resolution is higher than that of the low-resolution image. Details are described later.

The learning data acquisition unit 304 acquires learning data. As described above, the learning data consists of multiple sets, each containing the learning image data x and y together with a classification score s. The acquired learning data is sent to the parameter optimization unit 305.

The parameter optimization unit 305 uses the learning data to optimize the parameters used in the high-resolution processing unit 303. Details are described later. The parameter holding unit 306 holds the parameters optimized by the parameter optimization unit 305.

Although FIG. 3 has been described as the configuration of a single image processing apparatus 200, the units shown in FIG. 3 may be distributed among multiple apparatuses that share the processing. For example, the low-resolution image acquisition unit 301, the classification score acquisition unit 302, and the high-resolution processing unit 303 may be implemented in a first image processing apparatus, the learning data acquisition unit 304 and the parameter optimization unit 305 in a second image processing apparatus, and the parameter holding unit 306 in a parameter holding apparatus. An image processing system in which the first image processing apparatus, the parameter holding apparatus, and the second image processing apparatus are connected via a network may then be used. Further, the parameter optimization unit 305 and the high-resolution processing unit 303 may each be composed of multiple apparatuses (servers) so that the processing load is distributed among the servers.

<Flowchart>
FIG. 4 is a flowchart illustrating the processing performed by the image processing apparatus 200 according to the first embodiment. The description below refers to the functional block diagram of FIG. 3 and the flowchart of FIG. 4. The series of processes in the flowchart of FIG. 4 is performed by the CPU 201 loading program code stored in the HDD 203 into the RAM 202 and executing it. Alternatively, some or all of the steps in FIG. 4 may be implemented in hardware such as an ASIC or an electronic circuit. The symbol "S" in the description of each process denotes a step in the flowchart.

In S401, if learned parameters are held in the parameter holding unit 306, the processing proceeds to the resolution-enhancement phase of S402 to S405; otherwise, it proceeds to the learning phase of S406 to S409. Although in this example S401 branches on whether learned parameters are held in the parameter holding unit 306, that is, whether learning has been completed, the present invention is not limited to this. Even when learning has already been completed, the processing may proceed to S406 if further learning is to be performed with additional learning data.

<Learning phase>
First, the flow of processing in the learning phase is described. In S406, the learning data acquisition unit 304 acquires learning data. The learning data includes multiple sets of data, each set consisting of, for example, {low-resolution image x_i, high-resolution image y_i, identifier ID_i}, where the subscript i denotes the i-th learning data set {x_i, y_i, ID_i}. The low-resolution image x_i used in the learning data may be generated by downsampling the high-resolution image y_i. Alternatively, an image captured by a camera with fewer pixels may be used as the low-resolution image x_i, and an image of the same subject captured by a camera with more pixels as the high-resolution image y_i. Even with cameras having the same number of pixels, the image of a first subject region in an image captured at a wide angle may be used as the low-resolution image x_i, and the image corresponding to the same first subject region in an image captured at telephoto as the high-resolution image y_i. The identifier ID is data that identifies the learning image represented by {x, y}. For example, if the ID is a scalar value and images are identified by age, the IDs can be set as child (ID = 1), adult (ID = 2), elderly person (ID = 3), and so on.
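
The first option above, deriving x_i by downsampling y_i, can be sketched as follows. The text does not specify the downsampling filter; average pooling by an integer factor (a box filter) is an assumption here, and `downsample` and `make_training_triple` are hypothetical names.

```python
Image = list[list[float]]


def downsample(img: Image, factor: int) -> Image:
    """Average-pool a 2-D grayscale image by an integer factor (box filter)."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [img[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def make_training_triple(y_hr: Image, class_id: int, factor: int = 2) -> dict:
    """Build one {x_i, y_i, ID_i} set, with x_i derived from y_i."""
    return {"x": downsample(y_hr, factor), "y": y_hr, "id": class_id}


y = [[0, 2, 4, 6],
     [2, 4, 6, 8],
     [1, 1, 3, 3],
     [1, 1, 3, 3]]
triple = make_training_triple(y, class_id=2)  # ID = 2: adult, in the age example
print(triple["x"])  # [[2.0, 6.0], [1.0, 3.0]]
```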

In S407, the learning data acquisition unit 304 converts the identifier ID in the learning data obtained in S406 into a classification score s. In this embodiment, the classification score is a value indicating the probability that the image corresponds to each of the identifiers. Specifically, the classification score s_i for the i-th identifier ID_i is a vector whose j-th element s_ij is defined by equation (2):

$$ s_{ij} = \begin{cases} 1 & (j = \mathrm{ID}_i) \\ 0 & (\text{otherwise}) \end{cases} \qquad (2) $$

For example, when the identifier for an elderly person (ID = 3) is set for an image, the corresponding classification score s_i is the vector [0 0 1], whose third element is 1 and whose other elements are 0.
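
Judging from the [0 0 1] example, equation (2) is a one-hot encoding of the 1-based identifier. A minimal Python sketch of the S407 conversion under that reading (the function name `id_to_score` is hypothetical):

```python
def id_to_score(class_id: int, num_classes: int) -> list[float]:
    """One-hot score: element j (1-based) is 1 if j == ID_i, otherwise 0."""
    if not 1 <= class_id <= num_classes:
        raise ValueError(f"ID {class_id} outside 1..{num_classes}")
    return [1.0 if j == class_id else 0.0 for j in range(1, num_classes + 1)]


# Age example: child = 1, adult = 2, elderly person = 3.
print(id_to_score(3, 3))  # [0.0, 0.0, 1.0]
```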

In the above example, because this is learning data, the correct identifier is given. For example, when the identifier for an elderly person (ID = 3) is set for an image, the corresponding classification score s_i is [0 0 1], in which the third element is 1; that is, the score indicates a 100% probability that the person is elderly. However, the learning data is not limited to this example. The learning data may also include data indicating the probabilities of the identifiers of certain classifications among the plurality of classifications. For example, for learning data in which the probability of being a child is 50% and the probability of being an adult is 50%, a value such as [0.5 0.5 0] may be obtained in S407 as the classification score. That is, a classification score containing two or more non-zero elements may be used.
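
A score with several non-zero elements, such as the 50% child / 50% adult case, could be built from a mapping of identifier to probability. This is a sketch under the assumption that the scores form a probability distribution (non-negative, summing to 1), which the text implies but does not state as a constraint; `soft_score` is a hypothetical name.

```python
import math


def soft_score(probabilities: dict[int, float], num_classes: int) -> list[float]:
    """Build a classification score from {1-based ID: probability}; unlisted IDs get 0."""
    score = [0.0] * num_classes
    for class_id, p in probabilities.items():
        if not 1 <= class_id <= num_classes:
            raise ValueError(f"ID {class_id} outside 1..{num_classes}")
        if p < 0:
            raise ValueError("probabilities must be non-negative")
        score[class_id - 1] = float(p)
    # Assumption: the score is treated as a probability distribution summing to 1.
    if not math.isclose(sum(score), 1.0):
        raise ValueError("probabilities must sum to 1")
    return score


# 50% child (ID = 1), 50% adult (ID = 2): two non-zero elements.
print(soft_score({1: 0.5, 2: 0.5}, 3))  # [0.5, 0.5, 0.0]
```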

This embodiment has described an example in which the identifier in the learning data is set as a scalar value. Because the data that the parameter optimization unit 305 feeds to the NN must be expressed as vectors, S407 converts the learning data. However, if the learning data already contains classification scores expressed as vectors, S407 may be skipped.

In S408, the parameter optimization unit 305 uses the learning data input from the learning data acquisition unit 304 to optimize the parameters of the resolution-enhancement NN, which performs the resolution enhancement. Mathematically, the resolution-enhancement NN can be defined as the function f of equation (3):

$$ y = f(x, s; \theta) \qquad (3) $$

In equation (3), x is the low-resolution image, s is the classification score, θ is the model parameters, and y is the high-resolution image estimated by the function f. The parameter optimization unit 305 optimizes the model parameters θ in equation (3). The function f is implemented with an NN. Before the optimization performed by the parameter optimization unit 305 is described, the resolution-enhancement NN that constitutes the high-resolution processing unit 303 of this embodiment is described first.

FIG. 5 is a diagram illustrating an example of the network architecture of the resolution-enhancement NN of this embodiment. The NN in FIG. 5 is based on a U-shaped NN called U-net. The image data of the input image 501 is the low-resolution image given as input. The input image 501 is a 32 × 32 pixel, single-channel monochrome image obtained by enlarging the low-resolution image to the desired resolution in advance by bicubic interpolation or the like. The 32 × 32 pixel, single-channel size is used here only for explanation; the input is not limited to it. For example, a 512 × 512 pixel or 1024 × 1024 pixel image may be used as the input image data.

The input image 501 is halved in each spatial dimension by a strided convolution, becoming intermediate data D1 whose number of channels is increased to 64. Repeating this process of shrinking the spatial size and increasing the number of channels yields intermediate data D2, D3, and D4, and finally intermediate data 502 with an image size of 1 × 1 and 1014 channels. Various processes are performed up to the intermediate data 502, and many operations such as convolutions are performed between the intermediate data 502 and the final image data of the output image 504. Note that the parameters of all these operations are collectively referred to as the model parameters θ.
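
The encoder's shape progression can be traced with the standard convolution output-size formula, floor((n + 2p − k)/s) + 1. Kernel 4, stride 2, padding 1 is an assumed configuration that halves the size exactly; the source does not give it. Of the channel counts, 64 (D1) and 1014 (intermediate data 502) come from the text, 512 for D4 is inferred from its addition to U4, and 128 and 256 are assumptions.

```python
def conv_out(n: int, kernel: int = 4, stride: int = 2, pad: int = 1) -> int:
    """Spatial output size of a convolution: floor((n + 2*pad - kernel) / stride) + 1."""
    return (n + 2 * pad - kernel) // stride + 1


# (name, output channels); see the lead-in for which values are assumed.
channels = [("D1", 64), ("D2", 128), ("D3", 256), ("D4", 512), ("502", 1014)]

size = 32  # input image 501: 32 x 32 x 1
trace = []
for name, ch in channels:
    size = conv_out(size)
    trace.append((name, size, size, ch))
for row in trace:
    print(row)
# ('D1', 16, 16, 64) ... ('502', 1, 1, 1014)
```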

Because the intermediate data 502 has an image size of 1 × 1, it has lost the local information of the input image 501 and can be regarded as data representing the features or meaning of the image as a whole. Therefore, in this embodiment, the classification score 503, which is the other input to the resolution-enhancement NN, is appended to the intermediate data 502 to obtain integrated intermediate data. In FIG. 5, the classification score is assumed to be a 10-element vector. Appending this 10-element vector to the intermediate data 502 yields integrated intermediate data with an image size of 1 × 1 and 1024 channels.
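
The injection point can be sketched with plain lists: the 1 × 1 × 1014 intermediate data 502 is flattened to 1014 values and the 10-element classification score 503 is concatenated along the channel axis. Channel-wise concatenation is our reading of "appended"; the values are placeholders, and `append_score` is a hypothetical name.

```python
def append_score(bottleneck: list[float], score: list[float]) -> list[float]:
    """Concatenate a 1x1xC feature (flattened to C values) with the score along channels."""
    return list(bottleneck) + list(score)


features = [0.0] * 1014          # stand-in for intermediate data 502 (1 x 1 x 1014)
score = [0.0] * 9 + [1.0]        # hypothetical 10-element classification score 503
merged = append_score(features, score)
print(len(merged))               # 1024 channels of integrated intermediate data
```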

The integrated intermediate data then becomes intermediate data U4, whose image size is larger and whose number of channels is smaller, through a transposed convolution (also called deconvolution). The parameters of the U-net transposed convolution (deconvolution) are assumed to be set so that U4 has an image size of 2 × 2 and 512 channels. U4 is added to the intermediate data D4, and the same processing yields intermediate data U3 with a larger image size and fewer channels. Such a bypass structure is one of the characteristics of U-net. Repeating the same processing finally outputs the image data of the output image 504. Here, like the input image 501, the output image 504 is a single-channel image with an image size of 32 × 32.
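
The decoder side can be checked the same way with the transposed-convolution size formula (n − 1)·s − 2p + k. The sketch assumes kernel 4, stride 2, padding 1 (which doubles the size exactly) and channel counts mirroring the encoder; only U4 (2 × 2 × 512) and the 32 × 32 × 1 output are given in the text. The assertion encodes the requirement that element-wise addition in the bypass needs identical shapes.

```python
def tconv_out(n: int, kernel: int = 4, stride: int = 2, pad: int = 1) -> int:
    """Spatial output size of a transposed convolution: (n - 1)*stride - 2*pad + kernel."""
    return (n - 1) * stride - 2 * pad + kernel


# Encoder shapes (height, width, channels); D2/D3 channel counts are assumptions.
encoder = {"D4": (2, 2, 512), "D3": (4, 4, 256), "D2": (8, 8, 128), "D1": (16, 16, 64)}
decoder = [("U4", "D4", 512), ("U3", "D3", 256), ("U2", "D2", 128), ("U1", "D1", 64)]

size = 1  # integrated intermediate data: 1 x 1 x 1024
for u_name, skip_name, ch in decoder:
    size = tconv_out(size)
    u_shape = (size, size, ch)
    # Addition with the encoder output (the bypass) requires matching shapes.
    assert u_shape == encoder[skip_name], (u_name, u_shape, encoder[skip_name])
    print(u_name, "+", skip_name, "->", u_shape)

size = tconv_out(size)
print("output image 504 ->", (size, size, 1))  # (32, 32, 1)
```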

One characteristic feature of the network architecture of FIG. 5 in the present embodiment is described below. In the present embodiment, the NN receives input through two paths. Two-path input in itself is common. However, it is unusual for an NN to start processing on one input path and then receive data from the other path partway through. Normally, all inputs are given together at the start and all outputs are received together at the end.

Note that the NN architecture of the present embodiment can also accept all inputs together, as in the general approach. In the example of FIG. 5, the classification score has 10 elements. To give all inputs together, the NN would receive input data to which these 10 elements have been added. Specifically, the number of channels of the input image would be changed from 1 to 11, and the classification score would be defined, redundantly, at every pixel.

The present embodiment nevertheless deliberately splits the input supply path in two, for the following reasons.

First, this prevents the input data size from growing. In the example of FIG. 5, defining the classification score at every pixel of the input image gives 32 × 32 × 11 = 11,264 input elements. With the processing of the present embodiment, the number of input elements is 32 × 32 × 1 + 10 = 1,034, which is less than one tenth of the general approach. Memory can therefore be used efficiently, which in turn speeds up learning. For example, a larger batch size can be used in mini-batch learning.

Second, the processing of the present embodiment prevents the number of parameters from growing. In an NN, when the number of input channels increases, it is common to increase the intermediate data throughout the network, and the parameters that compute it, accordingly. This makes room to process the additional input data properly. Put simply, if the input grows tenfold, the intermediate data must also grow tenfold, and so must the parameters that compute it. If the channels are mutually correlated, smaller increases may suffice, but in any case the intermediate data and the model parameters grow. Naturally, the computational load of calculating them grows as well. These problems become more serious as the number of elements in the classification score increases.

In the present embodiment, the classification score, which is already semantic information, is deliberately not reshaped into an image format. Instead, the NN waits until the input image has been reduced to semantic information and then appends the classification score 503 to the intermediate data 502.
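
The element counts above can be reproduced with a few lines of arithmetic. The function names are illustrative only.

```python
def per_pixel_input_elements(height, width, image_channels, score_elements):
    """Single-path input: the score is replicated at every pixel as extra channels."""
    return height * width * (image_channels + score_elements)


def two_path_input_elements(height, width, image_channels, score_elements):
    """Two-path input: the image is supplied as-is and the score once, mid-network."""
    return height * width * image_channels + score_elements


single = per_pixel_input_elements(32, 32, 1, 10)
dual = two_path_input_elements(32, 32, 1, 10)
print(single, dual, round(single / dual, 2))  # 11264 1034 10.89
```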

FIG. 5 illustrates an example in which the classification score 503 is appended to the 1 × 1 intermediate data 502, yielding 1024 channels. At this stage the local information has been lost and the input image has become semantic information; that is, the data represents the features of the entire image. Appending the classification score, which indicates the probability of each identifier, to data that represents the entire image steers which identifier's features the image as a whole should have, so an appropriate output image can be obtained.

However, the configuration is not limited to this example. For example, when the image can be divided into four parts that are mutually uncorrelated, the classification score may be appended to the intermediate data U4, which has an image size of 2 × 2 and 512 channels. The same form may also be used for images in which the image as a whole need not be considered much. These forms can also be expected to provide some degree of effect. When the classification score is appended to the 2 × 2, 512-channel intermediate data U4 in FIG. 5, the classification score must likewise have a 2 × 2, 10-channel (10-element) data format. In this case, the data of the classification score 503 may simply be copied to fill a 2 × 2 grid.
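
Copying the score to a 2 × 2 grid and concatenating it with a 2 × 2 feature map might look like the sketch below. It uses nested lists indexed [row][column][channel], and all values are placeholders. With the 512-channel U4 of FIG. 5, the result has 512 + 10 channels per cell. The source does not say how the following layer is sized in this variant, so the sketch leaves that part open.

```python
def tile_score(score, height, width):
    """Copy one global score vector into every cell of a height x width grid."""
    return [[list(score) for _ in range(width)] for _ in range(height)]


def concat_channels(feature, extra):
    """Concatenate two equally sized [row][col][channel] grids along the channel axis."""
    return [[f + e for f, e in zip(frow, erow)] for frow, erow in zip(feature, extra)]


score = [float(k) for k in range(10)]                      # classification score 503 (placeholder)
u4 = [[[0.0] * 512 for _ in range(2)] for _ in range(2)]   # U4: 2 x 2 x 512 (placeholder)

merged = concat_channels(u4, tile_score(score, 2, 2))
print(len(merged), len(merged[0]), len(merged[0][0]))  # 2 2 522
```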

Although FIG. 5 uses U-Net as an example, the present invention is not limited to it. Any NN may be used, as long as it discards local information and reduces the data to a size that represents the features of the entire image. For example, a convolutional encoder-decoder architecture may be used. In such an NN, the encoder reduces the spatial resolution until the local information is lost.

In FIG. 5, for the sake of explanation, the input image 501 is a single-channel image of 32 × 32 pixels. As noted earlier, however, a 512 × 512 pixel or 1024 × 1024 pixel image may also be used as the input image data. When the actual input image is, for example, a single-channel image of 512 × 512 pixels, the image (the whole image) may be divided into a plurality of partial images, and each partial image may be used as input image data. That is, the input image 501 in FIG. 5 may be a partial image. The partial images produced as output images may then be combined to obtain the final output image. Predetermined image processing may also be applied when the partial images are combined.
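
A minimal sketch of the tiling described above follows. It assumes non-overlapping 32 × 32 tiles and image dimensions divisible by the tile size. The function `process` is a hypothetical stand-in for the high-resolution NN and is an identity function here. The source does not specify the "predetermined image processing" applied when combining (for example, seam blending), so it is omitted.

```python
def split_tiles(image, tile):
    """Split a 2-D image (list of rows) into non-overlapping tile x tile blocks."""
    blocks = []
    for top in range(0, len(image), tile):
        for left in range(0, len(image[0]), tile):
            block = [row[left:left + tile] for row in image[top:top + tile]]
            blocks.append(((top, left), block))
    return blocks


def stitch_tiles(blocks, height, width):
    """Reassemble processed blocks at their original positions."""
    out = [[0] * width for _ in range(height)]
    for (top, left), block in blocks:
        for dy, row in enumerate(block):
            out[top + dy][left:left + len(row)] = row
    return out


def process(block):
    """Hypothetical placeholder for the high-resolution NN applied to one partial image."""
    return block


image = [[(r * 512 + c) % 256 for c in range(512)] for r in range(512)]
tiles = split_tiles(image, 32)
result = stitch_tiles([(pos, process(b)) for pos, b in tiles], 512, 512)
print(len(tiles), result == image)  # 256 True
```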

This concludes the description of the architecture of the high-resolution NN, which serves as the high-resolution processing unit 303 of the present embodiment. Returning to the flowchart, the next part describes how the parameter optimization unit 305 optimizes the parameters of the high-resolution NN in S408.

In the present embodiment, a generative adversarial network (GAN) is used for the parameter optimization. A GAN generally uses two networks: a generator and a discriminator. The generator is trained to produce "fakes" as close as possible to the originals, so that the discriminator cannot detect them. The discriminator is trained to judge whether its input is a "fake" produced by the generator or an original ("real"), that is, to expose the generator's fakes. Because the two networks are trained in competition with each other, the generator's accuracy improves. In the present embodiment, the parameters are optimized using such a GAN.

FIG. 6 illustrates the parameter optimization method. For clarity, the basic optimization method for a high-resolution NN that does not use a classification score is described first. FIG. 6A illustrates the parameter optimization method when no classification score is used.

In FIG. 6A, an additional determination unit (discriminator) is provided to optimize the high-resolution processing unit (generator). The role of the high-resolution processing unit is to increase the resolution of the input data (a low-resolution image). Functionally, the high-resolution processing unit can be expressed as Expression (4).

$$\hat{y} = f(x;\theta) \qquad (4)$$

Here, $x$ is a low-resolution image obtained from the learning data, and $\theta$ denotes the model parameters of the high-resolution processing unit.

$\hat{y}$ is the estimated high-resolution image generated by the high-resolution processing unit.

The role of the determination unit, on the other hand, is to output 0 (false) when the input image is an estimated high-resolution image generated by the high-resolution processing unit, and 1 (true) when it is a high-resolution image obtained from the learning data. Expressed as a function, the determination unit is given by Expression (5).

$$b = g(y;\phi) \qquad (5)$$

Here, $b$ is a value indicating true or false, $y$ is the input image (a high-resolution image), and $\phi$ denotes the model parameters of the determination unit. Note that at the start of learning, neither the high-resolution processing unit nor the determination unit can yet perform its intended role. For example, the output $b$ of the determination unit $g(y;\phi)$ normally takes continuous values between 0 and 1. As learning progresses, the output $b$ converges toward 0 or 1.

With such a high-resolution processing unit and determination unit, the objective function $L_f(\theta)$ of the high-resolution processing unit in FIG. 6A can be expressed by Expression (6).

The first term of Expression (6) is the squared error between the estimated high-resolution image generated by the high-resolution processing unit and the correct high-resolution image $y_i$ obtained from the learning data, and acts to bring the two closer. The second term acts to make the determination unit output true (1) when the estimated high-resolution image generated by the high-resolution processing unit is input to it.

The objective function $L_g(\phi)$ of the determination unit in FIG. 6A, on the other hand, can be expressed by Expression (7).

The first term of Expression (7) acts to make the determination unit output true (1) when the correct high-resolution image $y_i$ obtained from the learning data is input to it. The second term acts to make it output false (0) when the estimated high-resolution image generated by the high-resolution processing unit is input.

In this way, the parameters of the determination unit are optimized so that it judges the output of the high-resolution processing unit to be false. The parameters of the high-resolution processing unit, in turn, are optimized so that it generates images that make the determination unit output true. Put simply, the determination unit judges whether a high-resolution image is real (derived from the learning data) or fake, and the parameters of both units are optimized so that the high-resolution processing unit deceives the determination unit. This is why the network is called a generative adversarial network.
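
The bodies of Expressions (6) and (7) are not reproduced in this text. The sketch below therefore assumes a common choice: binary cross-entropy for the true/false terms, plus a hypothetical weight `lam` on the generator's adversarial term. It only illustrates which target each term pushes toward and is not the exact formula. Images are flattened lists, and `d_real`/`d_fake` are determination-unit outputs in (0, 1).

```python
import math


def squared_error(a, b):
    """Sum of squared differences between two flattened images."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def bce(p, target, eps=1e-12):
    """Binary cross-entropy of a probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def generator_loss(y_hat, y, d_fake, lam=1e-3):
    """Term 1: match the ground truth y. Term 2: make the discriminator say true (1)."""
    return squared_error(y_hat, y) + lam * bce(d_fake, 1.0)


def discriminator_loss(d_real, d_fake):
    """Term 1: real image -> true (1). Term 2: generated image -> false (0)."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


print(round(discriminator_loss(0.9, 0.1), 4))  # 0.2107
```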

This concludes the description of the general parameter optimization when no classification score is used. Building on it, the following describes how the present embodiment optimizes the high-resolution NN using a classification score.

FIG. 6B illustrates the parameter optimization process using a classification score in the present embodiment. In the present embodiment, the determination unit also estimates the classification score of the input image, in addition to judging whether the image is real. The reason is as follows.

The high-resolution processing unit of the present embodiment takes a low-resolution image and a classification score as inputs and outputs a high-resolution image (a fake). For example, suppose the classification score for "flower" and a low-resolution image are input to the high-resolution processing unit. The resulting image is a high-resolution flower image, that is, an image whose resolution was increased with a bias toward "flower". In this example, the high-resolution processing unit must therefore learn a way of increasing resolution that produces images judged to be flowers. For it to learn this within a GAN, it needs a determination unit that can judge (see through) whether the generated image is a flower. Accordingly, in the present embodiment, the determination unit estimates the classification score of the input image in addition to judging whether the image is real.

Expressed as a function, the determination unit in FIG. 6B of the present embodiment is given by Expression (8).

$$(b, \hat{s}) = g(y;\phi) \qquad (8)$$

Here, $\hat{s}$ is the estimated classification score of the input image. In the "flower" example above, the estimated classification score contains a value indicating the probability that the input image is a flower. It also contains the probabilities of the other identifiers.

The role of the high-resolution processing unit of the present embodiment, on the other hand, is to increase the resolution of the input low-resolution image according to the input classification score. Expressed as a function, the high-resolution processing unit in FIG. 6B is given by Expression (9).

$$\hat{y} = f(x, s;\theta) \qquad (9)$$

Here, $s$ is a classification score obtained from the learning data.

With the high-resolution processing unit and determination unit described above, the objective function $L_f(\theta)$ of the high-resolution processing unit in FIG. 6B of the present embodiment can be expressed by Expression (10).

The first term of Expression (10) is the squared error between the estimated high-resolution image generated by the high-resolution processing unit and the correct high-resolution image $y_i$ obtained from the learning data, and acts to bring the two closer. The second term applies when the high-resolution processing unit generates an estimated high-resolution image from the low-resolution image $x_i$ on the basis of the classification score $s_i$ and that image is input to the determination unit. In that case, it acts to make the output true (1) and to bring the estimated classification score $\hat{s}$ closer to the classification score $s_i$ of the learning data. In other words, the second term of Expression (10) acts to make the determination unit judge such an image to be true (1) with classification score $s_i$.

The objective function $L_g(\phi)$ of the determination unit in FIG. 6B of the present embodiment, on the other hand, can be expressed by Expression (11).

The first term of Expression (11) acts to make the determination unit output true (1) together with the classification score $s_i$ when the correct high-resolution image $y_i$ obtained from the learning data is input to it. In the second term, $g_b$ denotes the true/false value $b$, one of the two outputs $(b, \hat{s})$ of the determination unit $g$. The second term of Expression (11) acts to make the determination unit output false (0) when the estimated high-resolution image generated by the high-resolution processing unit is input.

With these objective functions, the determination unit learns two things from the learning data. It learns to judge whether an image is real, and it learns the relationship between high-resolution images and classification scores, which gives it the ability to estimate the classification score of an input image. In the present embodiment, only the high-resolution images obtained from the learning data are used to learn the classification score. The estimated high-resolution images generated by the high-resolution processing unit are not used, because during learning the correspondence between a generated image and its classification score is not reliable. Consequently, the high-resolution processing unit and the determination unit are adversaries with respect to whether an image is real, as in FIG. 6A, but not with respect to the classification score.
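
The sketch below extends the same assumptions to Expressions (10) and (11). It adds a squared-error term between the estimated score $\hat{s}$ and the learning-data score $s_i$. As the text requires, the determination unit's score term uses only the real image $y_i$, so its loss takes no score input for the generated image. The loss forms and the weight `lam` are assumptions, not the exact formulas.

```python
import math


def squared_error(a, b):
    """Sum of squared differences between two flat vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def bce(p, target, eps=1e-12):
    """Binary cross-entropy of a probability p against a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def generator_loss_cond(y_hat, y, b_fake, s_hat_fake, s, lam=1e-3):
    """Match y, and make the determination unit output true (1) with score s."""
    adversarial = bce(b_fake, 1.0) + squared_error(s_hat_fake, s)
    return squared_error(y_hat, y) + lam * adversarial


def discriminator_loss_cond(b_real, s_hat_real, s, b_fake):
    """Real image -> true (1) with score s; generated image -> false (0) only.

    No score term is computed for the generated image, because its
    correspondence with s is not reliable during learning.
    """
    return bce(b_real, 1.0) + squared_error(s_hat_real, s) + bce(b_fake, 0.0)


s = [0.2, 0.1, 0.7]
print(discriminator_loss_cond(0.9, s, s, 0.1) < discriminator_loss_cond(0.9, [1.0, 0.0, 0.0], s, 0.1))  # True
```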

An example of the NN constituting the high-resolution processing unit has already been described with reference to FIG. 5. The NN constituting the determination unit may be any suitable NN that takes an image as input and outputs a vector concatenating the true/false value and the classification score. For example, it may be an NN like that of FIG. 5 whose final output is the intermediate data 502. In that case, the size of the intermediate data 502 is not 1014 but 1 plus the number of elements of the classification score. The parameters of the NN constituting the determination unit are, of course, not shared with those of the high-resolution NN.

FIG. 7 shows an example of an encoder-type NN. The determination unit may be configured as an NN such as that shown in FIG. 7. For example, take an NN with the discriminator architecture of a Deep Convolutional GAN (DCGAN). Its final output may be modified to output a vector that contains the classification score in addition to the original true/false value.

This concludes the parameter optimization method for the high-resolution NN performed by the parameter optimization unit 305 in S408. Returning to the flowchart of FIG. 4: in S409, the parameter optimization unit 305 stores the optimized parameters in the parameter holding unit 306. This completes the learning phase of S406 to S409, and the process moves to the high-resolution phase starting from S402.

The high-resolution processing unit shown in FIG. 6B may be the same as the high-resolution processing unit 303 or separate from it. In either case, the parameter optimization unit 305 generates the parameters used by the high-resolution processing unit 303, and these parameters are held in the parameter holding unit 306. The high-resolution processing unit 303 configures its NN using the parameters held in the parameter holding unit 306. The determination unit shown in FIG. 6B is included in the parameter optimization unit 305.

<High-resolution phase>
In S402, the high-resolution processing unit 303 acquires the learned parameters from the parameter holding unit 306. In the learning phase, two NNs were trained (their parameters optimized): the high-resolution processing unit and the determination unit. Only the high-resolution NN is used in the high-resolution phase, so only its parameters are acquired.

In S403, the low-resolution image acquisition unit 301 acquires low-resolution image data from the HDD 203.

In S404, the classification score acquisition unit 302 acquires a classification score from the HDD 203. Alternatively, the classification score may be acquired interactively from the input device 206, such as a keyboard.

FIG. 8 shows an example of a UI used when acquiring a classification score interactively. In that case, a UI such as that of FIG. 8 can be displayed on the output device 207, such as a display. The UI of FIG. 8 includes a classification score setting unit 801 and an image display area 802. The image display area 802 displays the image whose resolution has been increased using the classification score set with the classification score setting unit 801. The classification score acquired in S404 need not be one-hot, like [0 0 1], where one element is 1 and all others are 0. As in the setting illustrated in FIG. 8, the classification score may be ambiguous information such as [0.2 0.1 0.7]. That is, the present embodiment can increase resolution using ambiguous image identification information.

In S405, the high-resolution processing unit 303 performs the resolution-increasing process. Specifically, two inputs are given to the high-resolution NN shown in FIG. 5: the input image 501, obtained by enlarging the low-resolution image to the desired size by bicubic interpolation or a similar method, and the classification score 503. The NN then outputs the output image 504 with increased resolution.

Through the above steps, the low-resolution image can be enlarged sharply in accordance with the classification score. In the example of FIG. 8, the classification score is [0.2 0.1 0.7], so the sharply enlarged image is rendered as mostly an elderly person with a slight childlike element. Without image identification information, the classification score would assign equal probabilities to the predefined types, which in the example of FIG. 8 is [0.33 0.33 0.33]. If the subject is actually an elderly person, the resolution-increasing process of the present embodiment is more accurate than such processing.

As described above, the present embodiment can appropriately increase resolution using ambiguous image identification information that does not necessarily identify the image type uniquely. That is, in the present embodiment, resolution is increased using a classification score that classifies the image type. The classification score indicates the probability that the image corresponds to each of the identifiers that classify image types. Even ambiguous information that is not necessarily correct can then give the image processing apparatus a direction for increasing the resolution. Therefore, if the image identification information (classification score) is correct on average, the expected accuracy of the resolution increase improves.

<<Embodiment 2>>
This embodiment extends the concept of the classification score. A different classification score is defined for each location in the image, for example "childlike above the nose but adult-like below the nose", and the resolution is increased accordingly. This embodiment differs from Embodiment 1 in S404, S407, and S408. Unless otherwise noted, the other processing is the same as in Embodiment 1. The operation is described below.

In S407, the identifier ID obtained in S406 is converted into a classification score $s$. In the present embodiment, the image is divided into four regions: upper left, upper right, lower left, and lower right. The classification score indicates the probability that each region corresponds to each of the identifiers. A region is represented by an index $(x, y)$, where the upper left corresponds to $(x = 0, y = 0)$ and the lower right to $(x = 1, y = 1)$. Because the value of the classification score $s$ differs from region to region, it is written $s(x, y)$.

For the $i$-th identifier $\mathrm{ID}_i$, the classification score $s(x,y)_i$ of image region $(x, y)$ is a vector whose $j$-th element $s(x,y)_{ij}$ is defined by the following equation.

$$s(x,y)_{ij} = \begin{cases} 1 & (j = i) \\ 0 & (j \neq i) \end{cases}$$

For example, when the identifier of an elderly person (ID = 3) is set for the image, the corresponding classification score $s(x,y)_i$ is [0 0 1] for all $x$ and $y$: the third element is 1 and the other elements are 0. When different identifiers are set for different image regions, for example young-looking eyes but an aged mouth, the image regions $(x, y)$ corresponding to the eyes have the score [0 1 0], and the regions corresponding to the mouth have [0 0 1].
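
The region-wise one-hot scores of S407 can be sketched as follows for the 2 × 2 regions of this embodiment. The `id_map` layout, with the eyes in the top row of regions and the mouth in the bottom row, is a hypothetical example. Identifiers are 1-based as in the text, so ID = 3 maps to [0, 0, 1].

```python
def one_hot(identifier, n_classes):
    """1-based identifier -> one-hot vector, e.g. 3 of 3 -> [0, 0, 1]."""
    return [1 if j == identifier else 0 for j in range(1, n_classes + 1)]


def regional_scores(id_map, n_classes):
    """id_map[y][x] is the identifier set for region (x, y); returns s[y][x]."""
    return [[one_hot(identifier, n_classes) for identifier in row] for row in id_map]


# Whole image labelled as an elderly person (ID = 3).
uniform = regional_scores([[3, 3], [3, 3]], 3)
# Hypothetical layout: young eyes (ID = 2) in the top regions, aged mouth (ID = 3) below.
mixed = regional_scores([[2, 2], [3, 3]], 3)
print(uniform[1][1], mixed[0][0], mixed[1][0])  # [0, 0, 1] [0, 1, 0] [0, 0, 1]
```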

In S408, the parameter optimization unit 305 optimizes the parameters of the NN that performs the resolution-increasing process. The architecture of the high-resolution NN in the present embodiment differs from the example described in Embodiment 1. Because the classification score has the spatial dimensions $(x, y)$, it cannot be appended to the 1 × 1 intermediate data 502 in FIG. 5: the dimensions do not match.

FIG. 9 illustrates the network architecture of the high-resolution NN in this embodiment. FIG. 9 is largely the same as FIG. 5; the difference is where the classification score is combined with the intermediate data. Because the classification score has a spatial resolution of 2 × 2, the classification score s(x, y)_i is concatenated with intermediate data 902, which also has a spatial resolution of 2 × 2. In FIG. 9, intermediate data 902 has a size of 2 × 2 × 502, and the concatenation yields intermediate data U4 of size 2 × 2 × 512. Subsequent processing is the same as in the first embodiment.
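The concatenation step of FIG. 9 can be sketched as a channel-wise join of two maps that share the same 2 × 2 spatial size. This is a toy illustration under stated assumptions: maps are nested lists indexed `t[y][x][c]` (a real network would use a tensor library), the helper names `concat_channels` and `shape` are hypothetical, and the channel counts are made small for readability instead of the 502 and 512 shown in FIG. 9.

```python
# Minimal sketch (assumption): feature maps are nested lists indexed t[y][x][c].

def concat_channels(features, scores):
    """Concatenate two maps with the same spatial size along the channel axis."""
    if len(features) != len(scores) or any(
        len(f_row) != len(s_row) for f_row, s_row in zip(features, scores)
    ):
        raise ValueError("spatial sizes must match before concatenation")
    return [
        [f_vec + s_vec for f_vec, s_vec in zip(f_row, s_row)]
        for f_row, s_row in zip(features, scores)
    ]


def shape(t):
    """Return (height, width, channels) of a nested-list map."""
    return (len(t), len(t[0]), len(t[0][0]))


# Hypothetical sizes: 2 x 2 x 8 intermediate data and a 2 x 2 x 3 score map.
intermediate = [[[0.0] * 8 for _ in range(2)] for _ in range(2)]
score = [[[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]],
         [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]
merged = concat_channels(intermediate, score)
print(shape(merged))  # (2, 2, 11)
```

A score map with a different spatial size (for example, 1 × 1 against 2 × 2 intermediate data) is rejected. This mirrors why the score cannot be joined to intermediate data 502 at the point used in FIG. 5.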

In S404, the classification score acquisition unit 302 acquires a classification score from the HDD 203. Alternatively, the classification score may be acquired interactively from an input device 206 such as a keyboard.

FIG. 10 shows an example of a UI used to acquire the classification score interactively. In this case, a UI such as that in FIG. 10 can be displayed on an output device 207 such as a display. FIG. 10 includes an image region selection frame 1003, a classification score setting section 1001, and an image display area 1002. The classification score setting section 1001 is used to set the classification score of the image region selected with the image region selection frame 1003. The image display area 1002 displays the image obtained by high-resolution processing with that classification score. A different classification score may, of course, be set for each region.

Through the above steps, a different classification score can be set for each region of the image, and each region can be enlarged sharply according to its classification score. For example, in the manner of composing a montage photograph for a criminal investigation, identification information can be set for each region of an image and a correspondingly sharp enlarged image can be obtained. Needless to say, the identification information need not be certain.

In both the first and second embodiments, the learning data consists of a plurality of sets (low-resolution image x_i, high-resolution image y_i, identifier ID_i), and the identifier is converted into a classification score in S407. However, if necessary, the learning data may instead be held from the start in the form (low-resolution image x_i, high-resolution image y_i, classification score s_i). In this case the amount of data increases, because the scalar value ID_i is held as the vector s_i. However, when the identifier itself is more appropriately expressed as a vector, it is preferable to hold the learning data in the form of classification scores from the start. One such case is a boy between a child and an adult, for whom [0.5 0.5 0] is set as the learning classification score.
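The two ways of holding learning data can be sketched side by side. This sketch rests on several assumptions. The class and helper names (`TrainingSample`, `one_hot`, `check_score`) are hypothetical. Images are replaced by placeholder strings. Identifiers are assumed to be 1 = child, 2 = adult, 3 = elderly. A classification score is treated as a probability vector that is non-negative and sums to 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingSample:
    low_res: object      # low-resolution image x_i (placeholder here)
    high_res: object     # high-resolution image y_i (placeholder here)
    score: List[float]   # learning classification score s_i


def one_hot(identifier: int, num_classes: int) -> List[float]:
    """Expand a scalar identifier ID_i (1-based, assumed) into a one-hot vector."""
    if not 1 <= identifier <= num_classes:
        raise ValueError(f"identifier {identifier} out of range")
    vec = [0.0] * num_classes
    vec[identifier - 1] = 1.0
    return vec


def check_score(score: List[float], tol: float = 1e-9) -> None:
    """Assumed constraint: a score is non-negative and sums to 1."""
    if any(p < 0.0 for p in score) or abs(sum(score) - 1.0) > tol:
        raise ValueError(f"not a probability vector: {score}")


# An adult sample whose scalar identifier is converted to a vector ...
adult = TrainingSample("x_adult", "y_adult", one_hot(2, 3))
# ... and a boy between child and adult, stored directly as a soft score.
boy = TrainingSample("x_boy", "y_boy", [0.5, 0.5, 0.0])
for sample in (adult, boy):
    check_score(sample.score)
```

Storing `score` as a list costs more than storing one integer per sample. In exchange, it can express soft labels such as the boy's [0.5 0.5 0], which a single identifier cannot.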

<< Other embodiments >>
The embodiments above were described using an image processing device that processes image data as an example, but the present invention is not limited to this. The invention may also take the form of a processing device including a first acquisition unit that acquires first data, a second acquisition unit that acquires second data having a smaller spatial size than the first data, and a processing unit. The processing unit is configured by a neural network and generates third data based on the first data and the second data. In such a processing device, the processing unit may generate integrated intermediate data using the second data and the intermediate data, obtained by processing the first data, at the stage where its spatial size equals the spatial size of the second data. The processing unit may then generate the third data based on the integrated intermediate data. More preferably, the second data has no spatial information, and the integrated intermediate data is generated using the second data and the intermediate data at the stage where the spatial information has disappeared.
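This generalized processing device can be illustrated with a toy encoder. The encoder shrinks the first data until its spatial size equals that of the second data and then joins the two along the channel axis. In this sketch, 2 × 2 average pooling stands in for the learned neural-network layers, an assumption made only to keep the example self-contained. Maps are nested lists `t[y][x][c]`, and the helpers `avg_pool2` and `fuse` are hypothetical names. With 1 × 1 second data, fusion happens at the stage where spatial information has disappeared.

```python
# Sketch (assumption): 2 x 2 average pooling replaces the learned layers;
# maps are nested lists t[y][x][c].

def avg_pool2(t):
    """Halve the spatial size by averaging each 2 x 2 block, per channel."""
    h, w, c = len(t), len(t[0]), len(t[0][0])
    return [
        [
            [
                (t[2 * y][2 * x][k] + t[2 * y][2 * x + 1][k]
                 + t[2 * y + 1][2 * x][k] + t[2 * y + 1][2 * x + 1][k]) / 4.0
                for k in range(c)
            ]
            for x in range(w // 2)
        ]
        for y in range(h // 2)
    ]


def fuse(first, second):
    """Downsample `first` until its spatial size equals that of `second`,
    then concatenate along channels to form integrated intermediate data."""
    target = (len(second), len(second[0]))
    inter = first
    while (len(inter), len(inter[0])) != target:
        if len(inter) < target[0] or len(inter[0]) < target[1]:
            raise ValueError("no stage matches the spatial size of the second data")
        inter = avg_pool2(inter)
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(inter, second)]


# First data: 4 x 4 x 1 map; second data without spatial information: 1 x 1 x 3.
first = [[[float(4 * y + x)] for x in range(4)] for y in range(4)]
second = [[[0.0, 0.0, 1.0]]]
print(fuse(first, second))  # [[[7.5, 0.0, 0.0, 1.0]]]
```

In the image embodiments, the third data would then be produced by further (learned) layers applied to the fused result. Those layers are omitted here.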

The present invention can also be realized by supplying a program that implements one or more functions of the above embodiments to a system or device via a network or a storage medium, and having one or more processors in a computer of that system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

301 Low-resolution image acquisition unit
302 Classification score acquisition unit
303 High-resolution processing unit
304 Learning data acquisition unit
305 Parameter optimization unit

Claims

1. An image processing device comprising:
first acquisition means for acquiring first image data;
second acquisition means for acquiring a classification score; and
processing means for generating, from the first image data and based on the classification score, second image data having a higher resolution than the first image data,
wherein parameters constituting the processing means are determined based on learning data composed of a plurality of sets of data, each set including learning image data and an identifier that identifies the classification of the learning image data, and
the classification score indicates a probability that the first image data corresponds to each of the identifiers.

2. An image processing device comprising:
first acquisition means for acquiring first image data;
second acquisition means for acquiring a classification score; and
processing means for generating, from the first image data and based on the classification score, second image data having a higher resolution than the first image data,
wherein parameters constituting the processing means are determined based on learning data composed of a plurality of sets of data, each set including learning image data and a learning classification score, and
the classification score indicates a probability that the first image data corresponds to each of the identifiers for classifying an image.

3. The image processing device according to claim 1, wherein the classification score includes two or more non-zero elements.

4. The image processing device according to claim 1, wherein the processing means:
includes a neural network;
generates integrated intermediate data based on the classification score and intermediate data in which the spatial size of the first image data has been reduced; and
generates the second image data using the integrated intermediate data.

5. The image processing device according to claim 4, wherein the processing means generates the integrated intermediate data using the classification score and the intermediate data at the stage where the spatial size of the intermediate data equals the spatial size of the classification score.

6. The image processing device according to claim 1, wherein the classification score is set for each region of an image corresponding to the first image data.

7. The image processing device according to any one of claims 1 to 6, wherein the parameters constituting the processing means are determined using:
an objective function for optimizing the parameters constituting the processing means; and
an objective function for optimizing parameters constituting a determination means that outputs a value indicating whether an input image is an image generated by the processing means, and an estimated classification score obtained by estimating the classification of the input image.

8. The image processing device according to claim 7, wherein the objective function for optimizing the parameters constituting the processing means includes a term set based on the difference between the classification score and the estimated classification score output from the determination means when the second image data, generated by the processing means from the first image data based on the classification score, is input to the determination means.

9. The image processing device according to claim 7, wherein the objective function for optimizing the parameters constituting the determination means includes a term that, of the value indicating authenticity and the estimated classification score output from the determination means when the second image data, generated by the processing means from the first image data based on the classification score, is input to the determination means, does not include the estimated classification score.

10. The image processing device according to claim 1, wherein the second acquisition means acquires the classification score in accordance with an instruction given through a UI.

11. A processing device comprising:
first acquisition means for acquiring first data;
second acquisition means for acquiring second data having a smaller spatial size than the first data; and
processing means, configured by a neural network, for generating third data based on the first data and the second data,
wherein the processing means generates integrated intermediate data using the second data and the intermediate data, obtained by processing the first data, at the stage where its spatial size equals the spatial size of the second data, and generates the third data based on the integrated intermediate data.

12. The processing device according to claim 11, wherein
the second data has no spatial information, and
the integrated intermediate data is generated using the second data and the intermediate data at the stage where spatial information has disappeared.

13. The processing device according to claim 12, wherein the second data has a spatial size of 1 × 1.

14. An image processing method using processing means that generates, from first image data and based on a classification score, second image data having a higher resolution than the first image data, the method comprising:
obtaining the first image data;
obtaining the classification score; and
inputting the first image data and the classification score to the processing means,
wherein parameters constituting the processing means are determined based on learning data composed of a plurality of sets of data, each set including learning image data and an identifier that identifies the classification of the learning image data, and
the classification score indicates a probability that the first image data corresponds to each of the identifiers.

15. An image processing method using processing means that generates, from first image data and based on a classification score, second image data having a higher resolution than the first image data, the method comprising:
obtaining the first image data;
obtaining the classification score; and
inputting the first image data and the classification score to the processing means,
wherein parameters constituting the processing means are determined based on learning data composed of a plurality of sets of data, each set including learning image data and a learning classification score, and
the classification score indicates a probability that the first image data corresponds to each of the identifiers for identifying a classification of an image.

16. A processing method using processing means configured by a neural network to generate third data based on first data and second data, the method comprising:
obtaining the first data;
obtaining the second data, which has a smaller spatial size than the first data;
inputting the first data and the second data to the processing means;
generating, by the processing means, integrated intermediate data using the second data and the intermediate data, obtained by processing the first data, at the stage where its spatial size equals the spatial size of the second data; and
generating, by the processing means, the third data based on the integrated intermediate data.

17. A program for causing a computer to function as each means of the image processing device according to claim 1.

18. A program for causing a computer to function as each means of the processing device according to any one of claims 11 to 13.
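Claims 7 to 9 describe adversarial training in which a determination means outputs both a real/fake value and an estimated classification score. The sketch below shows one way the two objective functions could be written, and several of its choices are assumptions not stated in the claims. It uses binary cross-entropy for the real/fake term and cross-entropy for the score difference. It adds a classification term on real images for the determination means. It introduces a hypothetical weight `lam`. The function names are illustrative only.

```python
import math

EPS = 1e-12  # avoids log(0)


def bce(p, target):
    """Binary cross-entropy of a predicted 'real' probability p against 0 or 1."""
    return -(target * math.log(p + EPS) + (1 - target) * math.log(1 - p + EPS))


def score_xent(score, est):
    """Cross-entropy between a classification score and an estimated score."""
    return -sum(s * math.log(e + EPS) for s, e in zip(score, est))


def generator_loss(p_fake, est_fake, score, lam=1.0):
    """Objective for the processing means (cf. claim 8): fool the determination
    means, and make the score it estimates for the generated image match the
    classification score used to generate that image."""
    return bce(p_fake, 1.0) + lam * score_xent(score, est_fake)


def discriminator_loss(p_real, est_real, score_real, p_fake):
    """Objective for the determination means (cf. claim 9): the term for the
    generated image uses only the real/fake output p_fake; the score estimated
    for the generated image is not used at all."""
    real_term = bce(p_real, 1.0) + score_xent(score_real, est_real)
    fake_term = bce(p_fake, 0.0)
    return real_term + fake_term


g = generator_loss(0.5, [0.2, 0.3, 0.5], [0.0, 0.0, 1.0])
d = discriminator_loss(0.9, [0.1, 0.1, 0.8], [0.0, 0.0, 1.0], 0.2)
print(round(g, 4))  # 1.3863
```

Leaving the estimated score of generated images out of the determination means' objective (claim 9) keeps the determination means from learning to classify the processing means' outputs. The score-matching pressure on generated images then comes only through the processing means' objective (claim 8).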