JPWO2020062846A5


Info

Publication number: JPWO2020062846A5
Application number: JP2020529196A
Authority: JP
Publication date: 2022-04-27
Anticipated expiration: 2039-04-23

Description

Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201811155252.6, Chinese Patent Application No. 201811155326.6, Chinese Patent Application No. 201811155147.2, and Chinese Patent Application No. 201811155930.9, each filed on September 30, 2018, the contents of which are incorporated herein by reference in their entirety.

The present disclosure relates generally to the field of deep learning technology, and more particularly to deep-learning-based image processing technology, including an apparatus, a method, and a computer-readable medium for an image processing discriminative network.

Deep learning technology based on artificial neural networks has made great progress in fields such as image processing. An advantage of deep learning technology is that it solves different technical problems using a general-purpose structure and relatively similar systems.

An embodiment of the present disclosure is an apparatus for generating a plurality of correlation images. The apparatus may include: a feature extraction unit configured to receive a training image and to extract at least one feature from the training image so as to generate a first feature image based on the training image; a normalizer configured to normalize the first feature image and generate a second feature image; and a shift correlation unit configured to perform a plurality of translational shifts on the second feature image to generate a plurality of shifted images, and to correlate each of the plurality of shifted images with the second feature image to generate a plurality of correlation images.

In at least some embodiments, the shift correlation unit may be configured to perform the plurality of translational shifts on the second feature image by shifting the leftmost or rightmost a columns of pixels in the pixel block of the second feature image so that they become, respectively, the rightmost or leftmost columns of the pixel block, and by shifting the bottom or top b rows of pixels in the pixel block of the second feature image so that they become, respectively, the top or bottom rows of the pixel block. In at least some embodiments, 0 ≤ a < Y and 0 ≤ b < X, where a and b are both integers, Y is the total number of columns of pixels in the pixel block of the second feature image, X is the total number of rows of pixels in the pixel block of the second feature image, and a and b may be the same or different.
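The wrap-around shift described above can be illustrated with a minimal Python sketch. The function name `circular_shift`, the list-of-lists image representation, and the choice of direction (rightmost columns wrap to the left, bottom rows wrap to the top) are assumptions made for illustration; the text permits either direction.

```python
def circular_shift(block, a, b):
    """Shift a 2-D pixel block by a columns and b rows with wrap-around.

    Assumed direction: the rightmost a columns become the leftmost
    columns, and the bottom b rows become the top rows.
    """
    rows = len(block)      # X in the text: total number of rows
    cols = len(block[0])   # Y in the text: total number of columns
    if not (0 <= a < cols and 0 <= b < rows):
        raise ValueError("require 0 <= a < Y and 0 <= b < X")
    # Move the bottom b rows to the top of the block.
    moved_rows = block[rows - b:] + block[:rows - b]
    # Within every row, move the rightmost a pixels to the left edge.
    return [row[cols - a:] + row[:cols - a] for row in moved_rows]


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(circular_shift(block, 1, 0))  # [[3, 1, 2], [6, 4, 5], [9, 7, 8]]
print(circular_shift(block, 0, 1))  # [[7, 8, 9], [1, 2, 3], [4, 5, 6]]
```

No pixel value is lost in this variant: every shifted image is a rearrangement of the original block.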

In at least some embodiments, the shift correlation unit may be configured to perform the plurality of translational shifts on the second feature image by deleting the leftmost or rightmost a columns of pixels in the pixel block of the second feature image and adding a columns of pixels at, respectively, the rightmost or leftmost position of the pixel block, and by deleting the bottom or top b rows of pixels in the pixel block of the second feature image and adding b rows of pixels at, respectively, the top or bottom position of the pixel block. In at least some embodiments, 0 ≤ a < Y and 0 ≤ b < X, where a and b are both integers, Y is the total number of columns of pixels in the pixel block of the second feature image, X is the total number of rows of pixels in the pixel block of the second feature image, and each of the added pixels has a pixel value of 0.
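The delete-and-pad variant can be sketched in the same way. The name `zero_fill_shift` and the chosen direction (drop the rightmost columns and bottom rows, pad with zeros on the left and at the top) are illustrative assumptions.

```python
def zero_fill_shift(block, a, b):
    """Shift right by a columns and down by b rows, padding with zeros.

    Assumed direction: the rightmost a columns and the bottom b rows
    are deleted; a zero-valued columns are added on the left and
    b zero-valued rows are added at the top.
    """
    rows = len(block)      # X in the text
    cols = len(block[0])   # Y in the text
    if not (0 <= a < cols and 0 <= b < rows):
        raise ValueError("require 0 <= a < Y and 0 <= b < X")
    shifted = [[0] * cols for _ in range(rows)]
    # Copy only the pixels that remain inside the block after the shift.
    for r in range(rows - b):
        for c in range(cols - a):
            shifted[r + b][c + a] = block[r][c]
    return shifted


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(zero_fill_shift(block, 1, 1))  # [[0, 0, 0], [0, 1, 2], [0, 4, 5]]
```

The shifted block keeps the original size, and every non-zero pixel in it is a copy of a pixel of the input block.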

In at least some embodiments, the shift correlation unit may be configured to correlate each of the plurality of shifted images with the second feature image by multiplying the pixel value of each pixel in the pixel block of each shifted image by the pixel value of the positionally corresponding pixel in the pixel block of the second feature image. In at least some embodiments, the first feature image may be a luminance feature image. In at least some embodiments, the feature extraction unit may include a luminance detector configured to extract luminance information from the training image to generate the luminance feature image.
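The correlation step is a pixel-by-pixel product of two blocks of equal size. A minimal sketch, in which the function name `correlate` and the list-of-lists representation are assumptions:

```python
def correlate(shifted, reference):
    """Multiply each pixel of the shifted block by the pixel at the
    same position in the reference (second feature) block."""
    if len(shifted) != len(reference) or any(
        len(s_row) != len(r_row) for s_row, r_row in zip(shifted, reference)
    ):
        raise ValueError("both blocks must have the same size")
    return [
        [s * r for s, r in zip(s_row, r_row)]
        for s_row, r_row in zip(shifted, reference)
    ]


reference = [[1, 2],
             [3, 4]]
shifted = [[2, 1],
           [4, 3]]   # reference with its columns swapped
print(correlate(shifted, reference))  # [[2, 2], [12, 12]]
```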

In at least some embodiments, to generate the luminance feature image, the luminance detector is configured to determine the luminance value of a pixel at a given position in the luminance feature image according to the following equation (1):

I = 0.299R + 0.587G + 0.114B    (1)

I is the luminance value. R is the red component value of the positionally corresponding pixel in the training image. G is the green component value of the positionally corresponding pixel in the training image. B is the blue component value of the positionally corresponding pixel in the training image.
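Equation (1) can be applied pixel by pixel. In the sketch below, the function name `luminance` and the representation of the training image as rows of (R, G, B) tuples are assumptions made for illustration.

```python
def luminance(rgb_image):
    """Apply I = 0.299R + 0.587G + 0.114B to every pixel.

    rgb_image is assumed to be a list of rows, each row a list of
    (R, G, B) tuples; the result has the same height and width.
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (0, 0, 0)]]
lum = luminance(image)
# Green contributes most to luminance, blue least.
print(lum[0][1] > lum[0][0] > lum[1][0] > lum[1][1])  # True
```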

In at least some embodiments, the normalizer may be configured to normalize the luminance feature image according to the following equation (2):

[Equation (2) is not reproduced in this text.]

N is the first feature image. I represents the luminance value of a pixel at a given position in the luminance feature image. Blur(I) is the image obtained by applying a Gaussian filter to the luminance feature image. Blur(I²) is the image obtained by squaring each pixel value in the luminance feature image and then applying a Gaussian filter to the squared image.
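The sketch below does not implement equation (2). It only shows how the two quantities named in the definitions, Blur(I) and Blur(I²), could be computed, and notes that Blur(I²) − Blur(I)² is the Gaussian-weighted local variance. The 3×3 kernel, the edge-replication border handling, and all function names are assumptions; the text does not specify the filter size or the border treatment.

```python
# 3x3 approximation of a Gaussian kernel; the weights sum to 16 (assumed size).
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def blur(image):
    """Apply the 3x3 Gaussian kernel, replicating edge pixels at the border."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp row index
                    cc = min(max(c + dc, 0), cols - 1)  # clamp column index
                    total += KERNEL[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = total / 16.0
    return out


def local_mean_and_variance(image):
    """Return (Blur(I), Blur(I^2) - Blur(I)^2) for a 2-D image."""
    mean = blur(image)
    mean_sq = blur([[v * v for v in row] for row in image])
    variance = [
        [max(ms - m * m, 0.0) for ms, m in zip(ms_row, m_row)]
        for ms_row, m_row in zip(mean_sq, mean)
    ]
    return mean, variance


mean, variance = local_mean_and_variance([[5, 5, 5], [5, 5, 5], [5, 5, 5]])
print(mean[1][1], variance[1][1])  # 5.0 0.0
```

A flat region has zero local variance, which is why normalization schemes built from these two images typically emphasize local structure rather than absolute brightness.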

In at least some embodiments, the second feature image may include a pixel block having a first size. Each of the plurality of shifted images and each of the plurality of correlation images may include a pixel block having the first size. In each of the plurality of shifted images, a pixel having a non-zero pixel value may have a corresponding pixel with the same non-zero pixel value in the second feature image.

Another embodiment of the present disclosure is a method of generating a plurality of correlation images. The method may include: generating a first feature image based on a training image; normalizing the first feature image and generating a second feature image; performing a plurality of translational shifts on the second feature image to generate a plurality of shifted images; and correlating each of the plurality of shifted images with the second feature image to generate a plurality of correlation images.

In at least some embodiments, correlating each of the plurality of shifted images with the second feature image may include multiplying the pixel value of each pixel in the pixel block of each shifted image by the pixel value of the positionally corresponding pixel in the pixel block of the second feature image.

In at least some embodiments, performing the plurality of translational shifts may include: shifting the leftmost or rightmost a columns of pixels in the pixel block of the second feature image so that they become, respectively, the rightmost or leftmost columns of the pixel block; and shifting the bottom or top b rows of pixels in the pixel block of the second feature image so that they become, respectively, the top or bottom rows of the pixel block. In at least some embodiments, 0 ≤ a < Y and 0 ≤ b < X, where a and b are both integers, Y is the total number of columns of pixels in the pixel block of the second feature image, X is the total number of rows of pixels in the pixel block of the second feature image, and a and b may be the same or different. In at least some embodiments, at least one of a and b may change at least once during the performance of the plurality of translational shifts.

In at least some embodiments, performing the plurality of translational shifts may include: deleting the leftmost or rightmost a columns of pixels in the pixel block of the second feature image and adding a columns of pixels at, respectively, the rightmost or leftmost position of the pixel block; and deleting the bottom or top b rows of pixels in the pixel block of the second feature image and adding b rows of pixels at, respectively, the top or bottom position of the pixel block. In at least some embodiments, 0 ≤ a < Y and 0 ≤ b < X, where a and b are both integers, Y is the total number of columns of pixels in the pixel block of the second feature image, and X is the total number of rows of pixels in the pixel block of the second feature image. In at least some embodiments, each of the added pixels may have a pixel value of 0. In at least some embodiments, at least one of a and b may change at least once during the performance of the plurality of translational shifts.

In at least some embodiments, the method may further include performing X*Y translational shifts, where Y is the total number of columns of pixels in the pixel block of the second feature image and X is the total number of rows of pixels in the pixel block of the second feature image.
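The count X*Y follows from pairing every row offset b (0 ≤ b < X) with every column offset a (0 ≤ a < Y). The sketch below enumerates these pairs using wrap-around shifts; the function name `all_circular_shifts` and the wrap-around choice are assumptions made for illustration.

```python
def all_circular_shifts(block):
    """Return the X*Y wrap-around shifts of a block, one per (a, b) pair."""
    rows = len(block)      # X
    cols = len(block[0])   # Y
    shifts = []
    for b in range(rows):          # row offset, 0 <= b < X
        for a in range(cols):      # column offset, 0 <= a < Y
            moved_rows = block[rows - b:] + block[:rows - b]
            shifts.append([row[cols - a:] + row[:cols - a] for row in moved_rows])
    return shifts


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
shifts = all_circular_shifts(block)
print(len(shifts))  # 9, that is 3 * 3
```

The pair (a, b) = (0, 0) reproduces the unshifted block, so it is counted among the X*Y shifts.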

In at least some embodiments, the method may further include receiving the training image before generating the first feature image. In at least some embodiments, generating the first feature image may include generating a luminance feature image based on luminance information of the training image.

In at least some embodiments, the method may further include determining the luminance value of a pixel at a given position in the luminance feature image according to equation (1) above.

In at least some embodiments, the method may further include normalizing the luminance feature image according to equation (2).

N is the first feature image. I represents the luminance feature image. Blur(I) is the image obtained by applying a Gaussian filter to the luminance feature image. Blur(I²) is the image obtained by squaring each pixel value in the luminance feature image and then applying a Gaussian filter to the squared image.

In at least some embodiments, the first feature image may include a pixel block having a first size. In at least some embodiments, each of the plurality of shifted images and each of the plurality of correlation images may include a pixel block having the first size. In at least some embodiments, in each of the plurality of shifted images, a pixel having a non-zero pixel value may have a corresponding pixel with the same non-zero pixel value in the first feature image.

Another embodiment of the present disclosure is a non-transitory computer-readable medium storing instructions that cause a computer to execute a method of generating a plurality of correlation images. The method may be as described above.

Another embodiment of the present disclosure is a system for training a generative adversarial network. The system may include a generative adversarial network processor that includes a generative network microprocessor configured to be trained by a discriminative network microprocessor, and the discriminative network microprocessor coupled to the generative network.

In at least some embodiments, the discriminative network microprocessor may include: a plurality of input terminals coupled to a plurality of apparatuses for generating a plurality of correlation images, each of which may be as described above; a plurality of analysis modules, each coupled to one of the plurality of input terminals; a plurality of pooling modules connected in a cascade, each stage of the cascade including a pooling module coupled to one of the plurality of analysis modules and to the pooling module of the preceding stage of the cascade; and a discriminator network coupled to the pooling module of the last stage of the cascade.

The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. The drawings are as follows.

FIG. 1 shows a block diagram of an apparatus for image processing according to an embodiment of the present disclosure.
FIG. 2 shows a schematic diagram of a 3*3 pixel block in a first feature image according to an embodiment of the present disclosure.
FIG. 3 shows the 3*3 pixel block in each of nine shifted images obtained by shifting the first feature image illustrated in FIG. 2, according to an embodiment of the present disclosure.
FIG. 4 shows the 3*3 pixel block in each of nine shifted images obtained by shifting the first feature image illustrated in FIG. 2, according to another embodiment of the present disclosure.
FIG. 5 shows a discriminative network according to an embodiment of the present disclosure, which may be coupled to an apparatus for image processing according to the present disclosure.
FIG. 6 shows a flowchart of a method for image processing according to an embodiment of the present disclosure.
FIG. 7 shows a flowchart of a method for image processing according to another embodiment of the present disclosure.
FIG. 8 shows a block diagram of a system for training a neural network according to an embodiment of the present disclosure.

The various features of the drawings are not drawn to scale, because the illustrations are intended for clarity in helping a person skilled in the art understand the invention in conjunction with the detailed description.

Next, embodiments of the present disclosure will be described clearly and specifically in conjunction with the accompanying drawings briefly described above. The subject matter of the present disclosure is described with specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of the present disclosure. Rather, the inventors contemplate that the claimed subject matter might also be embodied in other ways, in conjunction with other present or future technologies, to include different steps or elements similar to those described in this document.

Although the present technology has been described in connection with the embodiments of the various figures, it is to be understood that other similar embodiments may be used, or modifications and additions may be made to the described embodiments, to perform the same function of the present technology without deviating from it. Therefore, the present technology should not be limited to any single embodiment, but should be construed in breadth and scope in accordance with the appended claims. In addition, all other embodiments obtained by a person of ordinary skill in the art based on the embodiments described in this document are considered to be within the scope of the present disclosure.

Deep learning technology based on artificial neural networks has made great progress in fields such as image processing. Deep learning is a class of machine learning methods based on characterization of data. An observation (for example, an image) may be represented in various ways: as a vector of the intensity values of individual pixels or, more abstractly, as a series of edges, regions of a particular shape, and the like. An advantage of deep learning technology is that it solves different technical problems using a general-purpose structure and relatively similar systems. A further advantage of deep learning is that it replaces manual feature acquisition with efficient unsupervised or semi-supervised algorithms for feature learning and hierarchical feature extraction.

Images of the natural world are readily distinguishable from images created synthetically by humans or randomly by a computer. Natural images are distinctive at least because they contain particular structure and are highly non-random. For example, images generated synthetically or randomly by a computer rarely contain a natural scene or object.

Image processing systems such as compression algorithms, analog storage media, and even the human visual system operate on real-world images. A generative adversarial network (GAN) is one solution for generating realistic samples of natural images. A GAN may be an approach to generative modeling in which two models are trained simultaneously or cross-trained.

A learning system can be configured to adjust its parameters based on a specific target, represented by a loss function. In a GAN, the loss function is replaced by another machine learning system that can independently learn a difficult task. A GAN generally includes a generative network pitted against a discriminative network. The generative network receives an input of a low-resolution data image, upscales the low-resolution data image, and feeds the upscaled image to the discriminative network. The discriminative network is tasked with classifying whether its input is the output of the generative network (that is, the "fake" upscaled data image) or the actual image (that is, the original high-resolution data image). The discriminative network outputs a score between "0" and "1" that measures the probability that its input is the upscaled image or the original image. If the discriminative network outputs a score of "0" or approaching "0", it has determined that the image is the output of the generative network. If the discriminative network outputs a score of "1" or approaching "1", it has determined that the image is the original image. This manner of pitting the generative network against the discriminative network, hence "adversarial", uses the competition between the two networks to drive both to improve their methods until the images generated by the generative network are indistinguishable from the originals.

The discriminative network may be trained to score an input as "real" or "fake" using data having predetermined scores. The "fake" data may be a high-resolution image generated by the generative network, and the "real" data may be a predetermined reference image. To train the discriminative network, its parameters are adjusted until it outputs a score approaching "1" whenever it receives "real" data and a score approaching "0" whenever it receives "fake" data. To train the generative network, its parameters are adjusted until its output receives a score as close as possible to "1" from the discriminative network.
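The training targets described above (scores near "1" for real data, near "0" for fake data, and a generator that seeks scores near "1") are often expressed as binary cross-entropy losses. The text does not state which loss function is used, so the following sketch is an assumption that only illustrates the direction in which each network is pushed; the function names and the `eps` guard are hypothetical.

```python
import math


def discriminator_loss(score_real, score_fake, eps=1e-12):
    """Binary cross-entropy: small when score_real -> 1 and score_fake -> 0."""
    return -(math.log(score_real + eps) + math.log(1.0 - score_fake + eps))


def generator_loss(score_fake, eps=1e-12):
    """Small when the discriminator scores generated data close to 1."""
    return -math.log(score_fake + eps)


# A discriminator that separates real from fake well has a lower loss.
print(discriminator_loss(0.9, 0.1) < discriminator_loss(0.6, 0.4))  # True
# A generator whose output is scored as more "real" has a lower loss.
print(generator_loss(0.9) < generator_loss(0.1))  # True
```

Because lowering the generator loss raises the score of fake data, it raises the discriminator loss, which is the opposition between the two networks that the text describes.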

A common analogy for a GAN is that of a counterfeiter and the police. The generative network is analogous to the counterfeiter, who tries to produce fake currency and use it without detection, whereas the discriminative network is analogous to the police, who try to detect the fake currency. The competition between the counterfeiter and the police spurs both sides to improve their methods until the counterfeits are indistinguishable from the genuine article.

Both the generative network and the discriminative network try to optimize different and opposing objective functions, that is, loss functions, in a zero-sum game. Through "cross-training" to maximize the output of the discriminative network, the generative network improves the images it generates, and the discriminative network improves the accuracy with which it distinguishes the original high-resolution image from the image generated by the generative network. The generative network and the discriminative network compete to generate better images and to raise the criteria for evaluating images.

To train the generative network to improve with respect to particular parameters, there remains a need to increase the accuracy of the discriminative network in distinguishing the original high-resolution image from the image generated by the generative network. For example, there is interest in the task of generating images that are perceived as real and uncorrupted. This applies to problems such as deblurring, denoising, demosaicking, compression artifact removal, contrast enhancement, and image super-resolution. In such problems, a corrupted image is visually impaired, and a machine learning system can be designed to repair it. However, the goal of recovering the original image is often impractical and leads to images that do not look real. GANs are designed to generate "real" images. A typical configuration takes a color output image and uses a machine learning system (for example, a convolutional network) to output a single number that measures how real the image is. Such a system can improve perceptual quality, but at present the outputs of adversarial systems still fall short of being perceived as natural images by human viewers.

FIG. 1 shows a block diagram of an apparatus for image processing according to an embodiment of the present disclosure.

The block diagram of FIG. 1 is not intended to indicate that the apparatus 100 includes only the components shown in FIG. 1. Rather, depending on the details of the specific implementation, the apparatus 100 may include any number of additional accessories and/or components that are known to a person of ordinary skill in the art but are not shown in FIG. 1.

As shown in FIG. 1, the apparatus 100 includes a feature extraction unit 110 and a shift correlation unit 120.

The feature extraction unit 110 is configured to extract one or more features from a training image that is input to or received by the apparatus 100, and to generate a feature image based on the extracted features. The feature image represents one or more features of the training image. The training image may be an image generated by the generative network or a predetermined reference image.

In some embodiments, as shown in FIG. 1, the feature extraction unit 110 may include a luminance detector 111.

The luminance detector 111 is configured to generate a first feature image of the training image, for example by extracting from the training image information about the luminance in the training image. The first feature image may therefore also be referred to as a luminance feature image.

In some embodiments, as shown in FIG. 1, the feature extraction unit 110 may include a normalizer 112.

The normalizer 112 is configured to generate a second feature image by normalizing the first feature image. In embodiments in which the first feature image is a luminance feature image, the normalizer 112 is configured to normalize the luminance feature image. Normalization brings the pixel values of an image within a smaller range of values and can eliminate outlier pixel values that are too high or too low. This in turn can facilitate the calculation of correlations, as discussed below.

The apparatus 100 for image processing according to the present disclosure may be implemented on a computing device in the form of a general-purpose computer, a microprocessor, digital electronic circuitry, an integrated circuit, a specially designed application-specific integrated circuit (ASIC), computer hardware, firmware, software, and/or a combination thereof.

The second feature image generated by the feature extraction unit 110 is output to the shift correlation unit 120 for further processing. The shift correlation unit 120 is configured to perform a plurality of translational shifts of the second feature image to generate a plurality of shifted images. The shift correlation unit 120 is further configured to generate a plurality of correlation images based on a set of correlations between the second feature image and each of the plurality of shifted images. The shift correlation unit 120 is further configured to transmit the plurality of correlation images to a deep learning network in order to train that network. For example, in some embodiments, the plurality of correlation images may be transmitted to the discriminative network of a generative adversarial network, so that the discriminative network is trained iteratively with the generative network of the generative adversarial network.

The second feature image has a pixel block of a first size defined by a first number of rows of pixels and a first number of columns of pixels. Before the plurality of translational shifts, the second feature image occupies a first area corresponding to the first size. A translational shift may be accomplished in several ways. In some embodiments, a translational shift moves the pixels in the second feature image in a row (or horizontal) direction or a column (or vertical) direction from the initial area. In some embodiments, a translational shift may include deleting rows and/or columns of pixels that are shifted out of the first area, and assigning a value of "0" to pixels in the space vacated by the shifted pixels. In some embodiments, a translational shift may include reordering or rearranging rows and/or columns of pixels.

Each of the plurality of shifted images has a pixel block of the same size as the first size of the pixel block in the second feature image. Each of the plurality of shifted images has the same number of rows of pixels and the same number of columns of pixels as the second feature image.

Each pixel having a non-zero value in each shifted image has a corresponding pixel with the same non-zero value in the second feature image. In at least some embodiments, pixels that do not have a corresponding pixel in the second feature image are assigned a value of "0". As an illustrative example, the values of the pixels in the first two rows of a shifted image may be identical to the values of the respectively corresponding pixels in the last two rows of the second feature image, with all other pixels in the shifted image assigned a value of "0". Each pixel in a shifted image that has a corresponding pixel in the second feature image has the same pixel value as that corresponding pixel.

In the present disclosure, "corresponding pixels" are not limited to pixels that correspond in position, and may also include pixels that occupy different positions. "Corresponding pixels" refers to pixels that have the same pixel value.

In the present disclosure, images are processed as pixel blocks. The value of a pixel in a block represents the value of the pixel in the image that corresponds in position to that pixel in the block.

The correlation between two images may be calculated by pixel-to-pixel multiplication of the pixel blocks of the two images. For example, the value of the pixel at the i-th row and j-th column (i, j) of a correlation image may be determined by multiplying the value of the pixel at position (i, j) in the second feature image by the value of the pixel at position (i, j) in the corresponding shifted image.
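The pixel-to-pixel multiplication described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: images are assumed to be equal-sized lists of rows of numbers, and the function name `correlation_image` is hypothetical.

```python
def correlation_image(feature, shifted):
    """Multiply two equal-sized pixel blocks position by position."""
    if len(feature) != len(shifted) or any(
        len(fr) != len(sr) for fr, sr in zip(feature, shifted)
    ):
        raise ValueError("both images must have the same number of rows and columns")
    # Pixel (i, j) of the result is feature[i][j] * shifted[i][j].
    return [[f * s for f, s in zip(fr, sr)] for fr, sr in zip(feature, shifted)]


# Toy 2x2 pixel blocks; the values are illustrative only.
second_feature = [[1.0, 2.0], [3.0, 4.0]]
shifted = [[4.0, 3.0], [2.0, 1.0]]
corr = correlation_image(second_feature, shifted)
```

The result has the same number of rows and columns as its two inputs, which matches the statement that each correlation image has the first size.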

As shown in FIG. 1, in some embodiments, the feature extraction unit 110 includes a luminance detector 111 and a normalizer 112.

The luminance detector 111 is configured to generate a first feature image by, for example, extracting information about luminance from the training image received by the feature extraction unit 110 and generating a luminance feature image based on the extracted luminance information. The first feature image is therefore also referred to as a luminance feature image. The human eye tends to be more sensitive to the luminance of an image than to its other features. By extracting the luminance information, the apparatus of the present disclosure removes unnecessary information from the training image and can thereby reduce the processing load.

The luminance feature image has the same number of rows and columns of pixels as the training image. The luminance value I of the pixel at the i-th row and j-th column (i, j) of the luminance feature image may be calculated according to the following equation (1).
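Equation (1) itself is not reproduced in this text. Purely as an assumption, a widely used weighting for luminance computed from red, green, and blue components, the ITU-R BT.601 luma weights, would take the following form:

```latex
\begin{equation}
I = 0.299\,R + 0.587\,G + 0.114\,B \tag{1}
\end{equation}
```

The coefficients shown are that assumption and are not confirmed by the text. Only the symbols I, R, G, and B come from the description.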

In equation (1), R represents the red component value of the pixel (i, j) in the training image, G represents the green component value, and B represents the blue component value. Both i and j are integers, with 1 ≤ i ≤ X and 1 ≤ j ≤ Y, where X is the total number of rows in the training image and Y is the total number of columns in the training image.

In some embodiments, the training image is a color image. In some embodiments, the training image has an R component, a G component, and a B component. The apparatus of the present disclosure may then be configured to process the training image so that the R, G, and B components are input to the luminance detector, converted there into a Y component, a U component, and a V component, respectively, and then input to a Y channel, a U channel, and a V channel, respectively. The Y, U, and V components are the components of the training image in YUV space. The terms Y channel, U channel, and V channel indicate that the outputs from these channels are the Y component output, the U component output, and the V component output, respectively. In embodiments where the RGB components of the training image are converted into YUV components, the luminance value I corresponds to the value of the Y component.

In some embodiments, the training image has a Y component, a U component, and a V component. In that case, the apparatus of the present disclosure may be configured to process the Y component of the training image through the Y channel of the luminance detector, the U component through the U channel of the luminance detector, and the V component through the V channel of the luminance detector.

In some embodiments, using YUV space amounts to performing chroma sampling on the training image. The Y component of the training image enters the Y channel, the U component enters the U channel, and the V component enters the V channel. Separating the input signal of the training image into three groups, so that each channel processes the signal of one of the Y, U, and V components, can reduce the computational burden and improve processing speed. Because the U component and the V component have a relatively small influence on the display effect of an image, processing the different components in different channels does not significantly affect image display.

The normalizer 112 is configured to generate a second feature image by normalizing the first feature image. In embodiments where the feature extraction unit 110 includes the luminance detector 111 and the first feature image is the luminance feature image, the normalizer 112 is configured to normalize the luminance feature image. Normalization brings the pixel values of an image into a smaller range of values, which can eliminate outlier pixel values that are too high or too low. This in turn facilitates the calculation of correlations.

More specifically, the normalizer 112 is configured to perform the normalization according to the following equation (2) to obtain the second feature image.
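Equation (2) itself is not reproduced in this text. One form that is consistent with the symbols defined for it (N, I, Blur, μ, and σ²) is a local mean and variance normalization. The sketch below is an assumption, in particular the stabilizing constant 1 in the denominator:

```latex
\begin{equation}
\begin{aligned}
N &= \frac{I - \mu}{\sigma + 1}, \\
\mu &= \mathrm{Blur}(I), \\
\sigma^{2} &= \mathrm{Blur}\!\left(I^{2}\right) - \mu^{2}
\end{aligned}
\tag{2}
\end{equation}
```

Under this reading, μ is a Gaussian-weighted local mean and σ² a Gaussian-weighted local variance, so N is the luminance feature image with its local mean removed and its local contrast rescaled.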

In equation (2), N represents the second feature image. I represents the luminance feature image obtained from the training image. Blur represents a Gaussian blur. Blur(I) represents the Gaussian blur filter applied to the luminance feature image. Blur(I²) represents the image obtained by squaring each pixel value in the luminance feature image and then applying the Gaussian blur filter to the squared image. μ represents the output image obtained using the Gaussian blur filter. σ² represents the local variance normalized image.

In some embodiments of the present disclosure, a translational shift of the second feature image includes shifting the last a columns of pixels in the second feature image to the front of the remaining columns of pixels to obtain an intermediate image. The last b rows of pixels in the intermediate image are then shifted to the front of the remaining rows of pixels to obtain a shifted image. Here 0 ≤ a < Y and 0 ≤ b < X, and both a and b are integers. X represents the total number of rows of pixels in the second feature image, and Y represents the total number of columns of pixels in the second feature image. The values of a and b may be the same or different. When both a and b are zero, the shifted image is the second feature image itself. In some embodiments, the value of at least one of a and b differs between any two given image shift processes. It is understood that the order in which the shifts are performed is not particularly limited. For example, in some embodiments, rows of pixels may be shifted first to obtain the intermediate image, and columns of pixels may then be shifted to obtain the shifted image.
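The wrap-around shift described above can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a list of rows, and the function name `cyclic_shift` is hypothetical.

```python
def cyclic_shift(image, a, b):
    """Move the last `a` columns and the last `b` rows to the front (wrap-around)."""
    rows, cols = len(image), len(image[0])
    if not (0 <= a < cols and 0 <= b < rows):
        raise ValueError("require 0 <= a < number of columns and 0 <= b < number of rows")
    # Pixel (i, j) of the shifted image comes from pixel
    # ((i - b) mod rows, (j - a) mod cols) of the input image.
    return [
        [image[(i - b) % rows][(j - a) % cols] for j in range(cols)]
        for i in range(rows)
    ]


# A 3x3 block whose entries are labels rather than numeric pixel values.
block = [["p1", "p2", "p3"],
         ["p4", "p5", "p6"],
         ["p7", "p8", "p9"]]
shifted = cyclic_shift(block, a=1, b=1)
```

Because the row index and the column index are remapped independently, the result is the same whether columns or rows are moved first, which is consistent with the statement that the order of the shifts is not limited.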

The value of each pixel in a shifted image corresponds to the value of a pixel in the second feature image. The value of the pixel (i, j) in each of the plurality of shifted images originates from a different pixel at a different position in the second feature image.

In some embodiments, a translational shift of the second feature image includes shifting the last b rows of pixels in the second feature image to the front of the remaining rows of pixels to obtain an intermediate image. The last a columns of pixels in the intermediate image are then shifted to the front of the remaining columns of pixels to obtain a shifted image.

In some embodiments, X*Y translational shifts are performed on the second feature image to obtain X*Y correlation images. The case where both a and b are zero is also counted as one translational shift.
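Enumerating every (a, b) pair gives the X*Y correlation images. The following is a sketch only, assuming wrap-around shifts and images stored as lists of rows. The names `cyclic_shift` and `all_correlation_images` are hypothetical.

```python
def cyclic_shift(image, a, b):
    """Move the last `a` columns and the last `b` rows to the front (wrap-around)."""
    rows, cols = len(image), len(image[0])
    return [
        [image[(i - b) % rows][(j - a) % cols] for j in range(cols)]
        for i in range(rows)
    ]


def all_correlation_images(feature):
    """Return one correlation image per (a, b) pair, X*Y images in total."""
    rows, cols = len(feature), len(feature[0])
    images = []
    for b in range(rows):          # 0 <= b < X
        for a in range(cols):      # 0 <= a < Y
            shifted = cyclic_shift(feature, a, b)
            images.append(
                [[feature[i][j] * shifted[i][j] for j in range(cols)]
                 for i in range(rows)]
            )
    return images


# Toy 3x3 feature image; the values are illustrative only.
feature = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
correlations = all_correlation_images(feature)
```

The first entry (a = 0, b = 0) is the feature image multiplied by itself, which corresponds to the unshifted case counted as one translational shift.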

FIG. 2 shows a schematic diagram of a 3*3 pixel block in the second feature image according to an embodiment of the present disclosure. In FIG. 2, "p1" ... "p9" each represent the value of one of the nine pixels. FIG. 3 shows the 3*3 pixel block in each of the nine shifted images obtained by shifting the second feature image illustrated in FIG. 2 according to an embodiment of the present disclosure.

In embodiments of the present disclosure, the second feature image includes a pixel block having a first size. Each of the plurality of shifted images and each of the plurality of correlation images includes a pixel block having the first size.

For purposes of the present disclosure, the topmost row of pixels in the block illustrated in FIG. 2 is the first row, and the leftmost column of pixels in the block illustrated in FIG. 2 is the first column. When a = 1 and b = 1, the shifted image shown in the center of the second row of FIG. 3 is obtained by moving the last (that is, rightmost) column of pixels in the second feature image to the front of the first (that is, leftmost) column of pixels, and moving the last (that is, bottom) row of pixels to the front of the first (that is, top) row of pixels.

In the embodiment illustrated in FIGS. 2 and 3, a pixel can occupy one of nine positions in the block, and the possibility of each pixel appearing at each of the nine positions is reflected in the nine shifted images. The nine correlation images therefore contain information not only about the correlation of each pixel with itself, but also about the correlation of each pixel with the other pixels in the image. In the illustrative example of a generative adversarial network, if the generative network generates an image in which the value of one pixel differs from the high-resolution original ("real") image, every correlation image obtained from the synthetically generated image will show a discrepancy with the corresponding correlation image of the high-resolution original image. This discrepancy prompts the discriminative network to score the synthetically generated image closer to "0" (that is, a "fake" classification), which drives the generative network to update and improve so that it generates output that is more realistic and perceptually more convincing.
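The claim that a single differing pixel shows up in every correlation image can be checked on a toy example. The sketch below assumes wrap-around shifts and non-zero pixel values. All names and values are hypothetical.

```python
def correlation_images(feature):
    """Correlate an image with every wrap-around shift of itself."""
    rows, cols = len(feature), len(feature[0])
    return [
        [[feature[i][j] * feature[(i - b) % rows][(j - a) % cols]
          for j in range(cols)]
         for i in range(rows)]
        for b in range(rows)
        for a in range(cols)
    ]


# "Real" 3x3 image and a "generated" copy that differs in one pixel only.
reference = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
generated = [row[:] for row in reference]
generated[1][1] = 50

ref_corr = correlation_images(reference)
gen_corr = correlation_images(generated)

# Count how many of the nine correlation images differ between the two inputs.
changed = sum(1 for r, g in zip(ref_corr, gen_corr) if r != g)
```

In this toy case all nine correlation images differ, because the altered pixel is multiplied by a different non-zero neighbour in each shift. With zero-valued pixels, some products could mask the change.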

It is understood that the present disclosure does not limit the translational shifts that may be applied to an image. FIG. 4 shows the 3*3 pixel block in each of the nine shifted images obtained after shifting the second feature image illustrated in FIG. 2 according to another embodiment of the present disclosure.

In FIGS. 2 and 4, the last a columns of pixels in the second feature image are removed, and a columns of pixels are added to the front of the remaining columns of pixels to obtain an intermediate image. Each pixel in the added a columns has a value of "0". Next, in the intermediate image, the last b rows of pixels are removed, and b rows of pixels are added to the front of the remaining rows of pixels to obtain a shifted image. Each pixel in the added b rows has a value of "0". More specifically, 0 ≤ a < Y and 0 ≤ b < X, and both a and b are integers. X represents the total number of rows of pixels in the second feature image, and Y represents the total number of columns of pixels in the second feature image. The values of a and b may be the same or different. In some embodiments, the value of at least one of a and b differs between any two given image shift processes.
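The zero-filling variant of the shift can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the image is a list of rows of numbers, and the function name `zero_fill_shift` is hypothetical.

```python
def zero_fill_shift(image, a, b):
    """Drop the last `a` columns and `b` rows, then pad zeros at the left and top."""
    rows, cols = len(image), len(image[0])
    if not (0 <= a < cols and 0 <= b < rows):
        raise ValueError("require 0 <= a < number of columns and 0 <= b < number of rows")
    # Pixel (i, j) keeps the value of input pixel (i - b, j - a) when that
    # position exists; positions vacated by the shift are set to 0.
    return [
        [image[i - b][j - a] if i >= b and j >= a else 0 for j in range(cols)]
        for i in range(rows)
    ]


# Toy 3x3 feature image; the values are illustrative only.
feature = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shifted = zero_fill_shift(feature, a=1, b=1)
```

Unlike the wrap-around shift, pixels pushed past the right or bottom edge are discarded, so the corresponding positions of the correlation image become zero after multiplication.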

The shift correlation unit 120 is configured to generate a correlation image by multiplying the values of the pixels at corresponding positions in two images. In a correlation image, the value of the pixel at position (i, j) is obtained by multiplying the value of the pixel (i, j) in the second feature image by the value of the pixel (i, j) in the shifted image. Both i and j are integers, with 1 ≤ i ≤ X and 1 ≤ j ≤ Y. X represents the total number of rows of pixels in the second feature image, and Y represents the total number of columns of pixels in the second feature image.

The apparatus 100 for image processing according to the present disclosure may be implemented on a computing device in the form of a general-purpose computer, a microprocessor, digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be special purpose or general purpose, and may be coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memory, or programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the apparatuses, systems, processes, functions, and techniques described herein may be implemented on a computer having a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) by which the user can provide input to the computer. Other kinds of accessories and/or devices may also be used to provide for interaction with a user. For example, feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.

The apparatuses, systems, processes, functions, and techniques described above may be implemented in a computing system that includes a back-end component (for example, a data server), a middleware component (for example, an application server), a front-end component (for example, a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the apparatuses, systems, processes, functions, and techniques described above), or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), and the Internet.

The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs that run on the respective computers and have a client-server relationship to each other.

The apparatus for image processing according to the present disclosure may be coupled to a neural network and may be configured to train the neural network. In some embodiments, the apparatus according to the present disclosure is configured to train a generative adversarial network (GAN). The GAN may include a generative network and a discriminative network.

The discriminative network may be constructed and configured in any appropriate manner known to a person of ordinary skill in the art, so long as the discriminative network is capable of classifying the degree of match between an image it receives as input and a predetermined reference image having the same resolution as the input image. FIG. 5 shows a discriminative network 200 according to an embodiment of the present disclosure. The discriminative network 200 may include a plurality of input terminals In1, In2, In3, a plurality of analysis modules 210, a plurality of pooling modules 220, and a discriminator module 230.

Each of the plurality of analysis modules 210 is coupled to a corresponding one of the plurality of input terminals In1, In2, In3. The analysis modules 210 receive, through the input terminals In1, In2, In3, the plurality of correlation images generated by the apparatus according to the present disclosure. The analysis modules 210 are configured to generate a corresponding plurality of third feature images based on the plurality of correlation images. Each of the plurality of third feature images is a multi-channel image that represents different dimensions of the corresponding correlation image, and has a larger number of channels than the corresponding correlation image. For example, an input correlation image may have 3 channels, and the output third feature image may have 64 channels, 128 channels, or any other number of channels. Each of the plurality of third feature images is generated with the same resolution as the corresponding correlation image.

Each of the plurality of analysis modules 210 is coupled to one of the plurality of pooling modules 220. The plurality of pooling modules 220 are connected in cascade. Each pooling module 220 is configured to receive a plurality of input images, generate a merged image by concatenating the plurality of input images, and reduce the resolution of the merged image to generate a downscaled merged image. More specifically, the plurality of input images include the third feature image received from the corresponding analysis module 210 and a reference image. As shown in FIG. 5, in the first stage of the cascade, the third feature image from the analysis module 210 doubles as the reference image for the corresponding pooling module 220. In each subsequent stage of the cascade, the reference image is the downscaled merged image generated by the pooling module 220 in the preceding stage of the cascade.

The discriminator module 230 is configured to receive the downscaled merged image from the pooling module 220 in the last stage of the cascade, and to classify the received downscaled merged image by generating a score that represents the degree of match between the received image and a predetermined reference image having the same resolution as the received image.

The generative network may be constructed and configured in any appropriate manner known to a person of ordinary skill in the art, so long as the generative network is capable of upscaling and generating an image.

The apparatus 100 may be coupled to the discriminative network through the input terminals of the discriminative network. The discriminative network need not directly receive the output image from the generative network or the high-resolution original sample image. Rather, the discriminative network may be configured to receive, classify, and score the output image from the generative network, or the high-resolution original sample image, after it has been preprocessed by the apparatus 100. In other words, the discriminative network may be configured to receive, classify, and score the output from the apparatus 100.

Conventional methods of training a GAN feed the output image from the generative network, or the original sample image, directly to the discriminative network for classification. As a result, for purposes of classification, the discriminative network is limited to relying on the information contained in the output image or the original sample image.

In the apparatus for image processing according to the present disclosure, the shift correlation unit processes the output image from the generation network and/or the high-resolution original image to generate a plurality of correlation images. For example, the shift correlation unit is configured to generate a plurality of correlation images that contain not only information inherent to the output image and/or the original sample image, but also information on the correlations between those images and shifted or otherwise transformed images. Compared with conventional methods, the discrimination network in the system of the present disclosure has additional information with which to perform classification, for example, by comparing the set of correlations between the output image from the generation network and the transformed images with the set of correlations between the original sample image and the transformed images. Further, based on the Naturalness Image Quality Evaluator (NIQE) no-reference image quality score, it is believed that the correlations between the output image (or the original sample image) and the transformed images affect perceptual quality.

Compared with conventional methods, classification based on the output of the apparatus for image processing of the present disclosure increases the precision of the classification, improves the accuracy of the classification result, and trains the parameters of the generation network toward creating solutions that are so similar to real images that they are difficult for the discrimination network to classify. This encourages perceptually superior solutions.

The present disclosure further provides a method for image processing. FIG. 6 shows a flowchart of a method for image processing according to an embodiment of the present disclosure.

Step S1 includes obtaining a first feature image, for example, by generating a luminance feature image based on luminance information extracted from a training image.

Step S2 includes normalizing the first feature image to obtain a second feature image.

Step S3 includes performing a plurality of translational shifts on the second feature image to obtain a plurality of shifted images. Each shifted image has the same number of rows and columns of pixels as the second feature image. Each pixel having a non-zero value in a shifted image has a corresponding pixel with the same non-zero value in the second feature image. Pixels that have no corresponding pixel in the second feature image may be assigned a value of "0". In other words, every pixel having a non-zero value in a shifted image has a corresponding pixel in the second feature image.
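The zero-filling behavior described for the shifted images can be sketched as follows. This is only an illustration under stated assumptions: the function name `shift_zero_fill`, the list-of-lists image representation, and the choice of shifting right and down (rather than left or up) are hypothetical and are not taken from the disclosure.

```python
def shift_zero_fill(image, a, b):
    """Shift `image` right by `a` columns and down by `b` rows.

    Pixels that have no corresponding source pixel are set to 0, so every
    non-zero pixel of the result is a copy of a pixel of `image`.
    The shift direction is an assumption made for this sketch.
    """
    rows, cols = len(image), len(image[0])
    shifted = [[0] * cols for _ in range(rows)]  # same size as the input
    for i in range(b, rows):
        for j in range(a, cols):
            shifted[i][j] = image[i - b][j - a]
    return shifted


second_feature = [[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]]
shifted = shift_zero_fill(second_feature, a=1, b=1)
```

The shifted image keeps the row and column counts of the input, and the vacated first row and first column hold zeros.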

Step S4 includes generating a plurality of correlation images based on the correlations between the second feature image and the plurality of shifted images. Each correlation image has the same number of rows and columns of pixels as the second feature image.

Step S5 includes transmitting the plurality of correlation images to a neural network, for example, the discrimination network of a generative adversarial network.

The method according to the present disclosure may be configured to train a neural network. In some embodiments, the method according to the present disclosure is configured to train a generative adversarial network (GAN). The GAN may include a generation network and a discrimination network. Conventional methods of training a GAN send the output image from the generation network, or the original sample image, directly to the discrimination network for classification. As a result, for purposes of classification, the discrimination network is limited to relying on the information contained in the output image or the original sample image.

In contrast to conventional techniques, the method of the present disclosure does not send the output image from the generation network or the high-resolution original image directly to the discrimination network. Rather, before being sent to the discrimination network for classification, the image is processed by the apparatus described above, which includes a feature extraction unit and a shift correlation unit. The shift correlation unit generates a plurality of transformed images. For example, the shift correlation unit is configured to generate a plurality of correlation images that contain not only information inherent to the output image and/or the original sample image, but also information on the correlations between those images and the transformed images. This additional information allows the discrimination network to perform classification based on the similarity between two sets of correlations, namely, the set of correlations between the output image from the generation network and the transformed images, and the set of correlations between the original sample image and the transformed images. Further, based on the Naturalness Image Quality Evaluator (NIQE) no-reference image quality score, it is believed that the correlations between the output image (or the original sample image) and the transformed images affect perceptual quality.

Classification based on the output of the apparatus of the present disclosure increases the precision of the classification, improves the accuracy of the classification result, and trains the parameters of the generation network toward creating solutions that are so similar to real images that they are difficult for the discrimination network to classify. This encourages perceptually superior solutions.

FIG. 7 shows a flowchart of a method for image processing according to another embodiment of the present disclosure.

Step S1 includes obtaining a first feature image. The first feature image may be a luminance feature image obtained by extracting the luminance information of a training image.

Accordingly, obtaining the first feature image may include step S11, which includes obtaining a luminance feature image based on the luminance information in the training image.

The luminance feature image has the same number of rows and columns of pixels as the training image. The luminance value I of the pixel in the i-th row and j-th column (i, j) of the luminance feature image may be calculated by the following equation (1):

I = 0.299R + 0.587G + 0.114B (1)

In equation (1), I represents the luminance value, and R, G, and B represent the red, green, and blue component values, respectively, of the pixel at the corresponding position in the training image.
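As a rough illustration of equation (1), the sketch below applies the weights 0.299, 0.587, and 0.114 recited in the claims to each pixel of a small RGB image. The function name `luminance_image` and the representation of the training image as nested lists of (R, G, B) tuples are assumptions made for this example.

```python
def luminance_image(rgb_image):
    """Return the luminance feature image I = 0.299R + 0.587G + 0.114B.

    `rgb_image` is a list of rows, each row a list of (R, G, B) tuples.
    The result has the same number of rows and columns as the input.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


# Hypothetical 2x2 training image: white, black, pure red, pure blue.
training_image = [[(255, 255, 255), (0, 0, 0)],
                  [(255, 0, 0), (0, 0, 255)]]
luma = luminance_image(training_image)
```

Because the three weights sum to one, a white pixel keeps its full intensity while a pure blue pixel contributes comparatively little luminance.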

In step S12, the luminance feature image is normalized to obtain a second feature image. Normalization brings the pixel values of the image within a smaller range of values, which can eliminate outlier pixel values that are too high or too low. This, in turn, can facilitate the calculation of correlations.

More specifically, in step S12, normalization is performed according to the following equation (2).

In equation (2), N represents the second feature image. I represents the luminance value of the pixel at a given position in the luminance feature image obtained from the training image. Blur represents a Gaussian blur. Blur(I) represents the Gaussian blur filter applied to the luminance feature image. Blur(I^2) represents the image obtained by squaring each pixel value in the luminance feature image and then applying the Gaussian blur filter to the result. μ represents the output image obtained using the Gaussian blur filter. σ^2 represents the local variance image.
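Equation (2) itself is not reproduced in this text, so the sketch below is only one plausible reading of the symbol definitions above. It assumes μ = Blur(I), σ^2 = Blur(I^2) − μ^2, and N = (I − μ)/(σ + c) with a stabilizing constant c. The 3x3 binomial kernel standing in for the Gaussian blur, the replicated-edge border handling, the value c = 1, and the function names are all assumptions made for illustration.

```python
import math

# 3x3 binomial kernel (weights sum to 16), a small stand-in for a Gaussian blur.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def blur(image):
    """Blur `image` with KERNEL, replicating edge pixels at the border."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), rows - 1)  # clamp row index
                    jj = min(max(j + dj, 0), cols - 1)  # clamp column index
                    acc += KERNEL[di + 1][dj + 1] * image[ii][jj]
            out[i][j] = acc / 16.0
    return out


def normalize(luma, c=1.0):
    """Assumed form: N = (I - mu) / (sigma + c), mu = Blur(I),
    sigma^2 = Blur(I^2) - mu^2 (the local variance image)."""
    mu = blur(luma)
    mean_sq = blur([[v * v for v in row] for row in luma])
    result = []
    for i, row in enumerate(luma):
        out_row = []
        for j, value in enumerate(row):
            variance = max(mean_sq[i][j] - mu[i][j] ** 2, 0.0)  # clamp rounding noise
            out_row.append((value - mu[i][j]) / (math.sqrt(variance) + c))
        result.append(out_row)
    return result


flat = [[5.0] * 4 for _ in range(3)]
normalized_flat = normalize(flat)  # a constant image has no local contrast
```

Under these assumptions a constant image normalizes to all zeros, which matches the stated purpose of pulling pixel values into a smaller range around the local mean.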

Step S2 includes performing a plurality of translational shifts on the second feature image to obtain a plurality of shifted images. Each shifted image has the same number of rows and columns of pixels as the second feature image.

In some embodiments of the present disclosure, performing the plurality of translational shifts includes shifting the last a columns of pixels in the second feature image to the front of the remaining columns of pixels to obtain an intermediate image, and then shifting the last b rows of pixels in the intermediate image to the front of the remaining rows of pixels to obtain a shifted image.

In other embodiments of the present disclosure, performing the plurality of translational shifts includes shifting the last b rows of pixels in the second feature image to the front of the remaining rows of pixels to obtain an intermediate image, and then shifting the last a columns of pixels in the intermediate image to the front of the remaining columns of pixels to obtain a shifted image.

The value of a satisfies 0 ≤ a < Y. The value of b satisfies 0 ≤ b < X. Both a and b are integers. X represents the total number of rows of pixels in the second feature image. Y represents the total number of columns of pixels in the second feature image. In some embodiments, the value of at least one of a and b differs between any two given image shift processes.
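The wrap-around shift described above, in which the last a columns and then the last b rows are moved to the front, can be sketched as follows. The function name `cyclic_shift` and the list-of-lists image representation are hypothetical choices for this example.

```python
def cyclic_shift(image, a, b):
    """Move the last `a` columns to the front, then the last `b` rows to the front.

    Requires 0 <= a < Y (columns) and 0 <= b < X (rows), as in the text.
    """
    rows, cols = len(image), len(image[0])
    if not (0 <= a < cols and 0 <= b < rows):
        raise ValueError("a and b must satisfy 0 <= a < Y and 0 <= b < X")
    # Column step: the last `a` columns are placed before the remaining columns.
    intermediate = [row[cols - a:] + row[:cols - a] for row in image]
    # Row step: the last `b` rows are placed before the remaining rows.
    return intermediate[rows - b:] + intermediate[:rows - b]


feature = [[1, 2, 3],
           [4, 5, 6]]
shifted = cyclic_shift(feature, a=1, b=1)

# Enumerating every (a, b) pair gives X * Y shifted images, each with a
# different shift, so at least one of a and b changes between any two of them.
all_shifts = [cyclic_shift(feature, a, b)
              for b in range(len(feature))
              for a in range(len(feature[0]))]
```

Because the two steps act on different axes, applying the row step first and the column step second produces the same shifted image.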

Each pixel having a non-zero value in a shifted image has a corresponding pixel with the same non-zero value in the second feature image. Pixels that have no corresponding pixel in the second feature image may be assigned a value of "0". In other words, every pixel having a non-zero value in a shifted image has a corresponding pixel in the second feature image.

Step S3 includes generating a plurality of correlation images based on the correlations between the second feature image and the plurality of shifted images. Each correlation image has the same number of rows and columns of pixels as the second feature image.

Generating the plurality of correlation images includes multiplying the value of each pixel in the second feature image by the value of the positionally corresponding pixel in a shifted image. In other words, the value of pixel (i, j) in the second feature image is multiplied by the value of pixel (i, j) in the shifted image to produce the value of the pixel at position (i, j) in the correlation image. The value of i satisfies 1 ≤ i ≤ X. The value of j satisfies 1 ≤ j ≤ Y. Both i and j are integers. X represents the total number of rows of pixels in the second feature image. Y represents the total number of columns of pixels in the second feature image.
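The pixel-wise multiplication described above can be sketched as follows. The function name `correlate` and the sample values are hypothetical; the shifted image used here happens to be the second feature image with its last column and last row moved to the front.

```python
def correlate(second_feature, shifted):
    """Multiply each pixel (i, j) of `second_feature` by pixel (i, j) of `shifted`.

    Both images must have the same number of rows and columns; the resulting
    correlation image has that same size.
    """
    if len(second_feature) != len(shifted) or any(
            len(r) != len(s) for r, s in zip(second_feature, shifted)):
        raise ValueError("images must have identical dimensions")
    return [[p * q for p, q in zip(row_n, row_s)]
            for row_n, row_s in zip(second_feature, shifted)]


second_feature = [[1, -2],
                  [3, 4]]
shifted = [[4, 3],
           [-2, 1]]  # hypothetical shifted copy of second_feature
correlation = correlate(second_feature, shifted)
```

Repeating this for every shifted image yields one correlation image per shift, and that set is what is passed on to the discrimination network.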

Step S4 includes transmitting the plurality of correlation images to a neural network, for example, the discrimination network of a generative adversarial network.

The method for image processing according to the present disclosure may be implemented on a computing device in the form of a general-purpose computer, a microprocessor, digital electronic circuitry, an integrated circuit, a specially designed ASIC (application-specific integrated circuit), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

FIG. 8 shows a block diagram of a system for training a neural network according to an embodiment of the present disclosure.

As shown in FIG. 8, the apparatus 100 described above may be coupled to the discrimination network 200 via an input end In. The structure and configuration of the discrimination network 200 are not particularly limited. The discrimination network 200 may be constructed and configured as described above, or in any suitable manner known to a person of ordinary skill in the art, as long as the discrimination network is capable of classifying the degree of matching between the image it receives as input and a predetermined reference image having the same resolution as the input image.

Embodiments of the present disclosure do not send the output image from the generation network and/or the high-resolution original image directly to the discrimination network. Rather, before being sent to the discrimination network for classification, the image is processed by the apparatus described above, which includes a feature extraction unit and a shift correlation unit. The shift correlation unit is configured to process the output image from the generation network and/or the high-resolution original image to generate a plurality of transformed images. For example, the shift correlation unit is configured to generate a plurality of correlation images that contain not only information inherent to the output image and/or the original sample image, but also information on the correlations between those images and the transformed images. This additional information allows the discrimination network to perform classification based on the similarity between two sets of correlations, namely, the set of correlations between the output image from the generation network and the transformed images, and the set of correlations between the original sample image and the transformed images. Further, based on the Naturalness Image Quality Evaluator (NIQE) no-reference image quality score, it is believed that the correlations between the output image (or the original sample image) and the transformed images affect perceptual quality.

Classification based on the output of the apparatus according to the present disclosure increases the precision of the classification, improves the accuracy of the classification result, and trains the parameters of the generation network toward creating solutions that are so similar to real images that they are difficult for the discrimination network to classify. This encourages perceptually superior solutions.

In some embodiments, the apparatus according to the present disclosure may be configured to train a generative adversarial network, for example, as shown in FIG. 8. FIG. 8 shows a system for training a generative adversarial network according to an embodiment of the present disclosure, which includes one apparatus 100 coupled to the discrimination network 200 via one input end In. However, the present disclosure is not limited to the embodiment shown in FIG. 8. For example, in embodiments where the generation network generates a plurality of images having different resolutions, the discrimination network may include a plurality of input ends In, each coupled to an apparatus 100. Each image from the generation network is transmitted to one of the plurality of apparatuses 100 for image processing. Each apparatus 100 generates a plurality of correlation images based on the image it receives and transmits the plurality of correlation images to the discrimination network 200. The plurality of correlation images from one apparatus 100 may represent the feature images of a particular channel of the image to be classified. The discrimination network 200 is configured to receive the correlation images from the plurality of apparatuses 100 via the plurality of input ends and to set the image with the highest resolution from the generation network as the image to be classified. The discrimination network 200 is then configured to score the degree of matching between the image to be classified and a predetermined reference image having the same resolution.

The block diagram of FIG. 8 is not intended to indicate that the discrimination network includes only the components shown in FIG. 8. Depending on the details of the specific implementation, the discrimination network according to the present disclosure may include any number of additional accessories and/or components that are known to a person of ordinary skill in the art but not shown in FIG. 8.

The present disclosure also provides a computer-readable medium that stores instructions for performing the method, described above, of preprocessing images for training a generative adversarial network.

As used herein, the term "computer-readable medium" refers to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor. The computer-readable medium according to the present disclosure includes, but is not limited to, random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, and disks or tapes such as compact disc (CD) or DVD (digital versatile disc) optical storage media and other non-transitory media.

In the description of this specification, references to "an embodiment", "some embodiments", "exemplary embodiments", "an example", "a specific example", or "some examples" mean that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least some embodiments or examples of the present disclosure. Such schematic expressions of the terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be included in any one or more embodiments or examples in any suitable manner. In addition, for a person of ordinary skill in the art, the disclosure relates to the scope of the present disclosure, and the technical solutions are not limited to specific combinations of the technical features; they should also cover other technical solutions formed by combining the technical features or their equivalents without departing from the inventive concept. Moreover, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined by the terms "first" and "second" may explicitly or implicitly include one or more such features. In the description of the present disclosure, "plurality" means two or more, unless specifically defined otherwise.

The principles and embodiments of the present disclosure are set forth in this specification. The description of the embodiments of the present disclosure is intended only to aid understanding of the method of the present disclosure and its core idea. Meanwhile, for a person of ordinary skill in the art, the disclosure relates to the scope of the present disclosure, and the technical solutions are not limited to specific combinations of the technical features; they should also cover other technical solutions formed by combining the technical features or their equivalents without departing from the inventive concept. For example, a technical solution may be obtained by replacing the features described above, as disclosed in (but not limited to) the present disclosure, with similar features.

Claims

An apparatus for generating a plurality of correlation images, comprising:
a feature extraction unit configured to receive a training image, extract at least one feature from the training image, and generate a first feature image based on the training image;
a normalizer configured to normalize the first feature image and generate a second feature image; and
a shift correlation unit configured to perform a plurality of translational shifts on the second feature image to generate a plurality of shifted images, and to correlate each of the plurality of shifted images with the second feature image to generate a plurality of correlation images.

The apparatus of claim 1, wherein the shift correlation unit is configured to perform the plurality of translational shifts on the second feature image by shifting the leftmost or rightmost a columns of pixels in the pixel block of the second feature image to be the rightmost or leftmost columns of the pixel block, respectively, and shifting the bottommost or topmost b rows of pixels in the pixel block of the second feature image to be the topmost or bottommost rows of the pixel block, respectively,
wherein 0 ≤ a < Y, 0 ≤ b < X, each of a and b is an integer, Y is the total number of columns of pixels in the pixel block of the second feature image, and X is the total number of rows of pixels in the pixel block of the second feature image, and
wherein a and b are the same or different.

The apparatus of claim 1 or 2, wherein the shift correlation unit is configured to perform the plurality of translational shifts on the second feature image by deleting the leftmost or rightmost a columns of pixels in the pixel block of the second feature image and adding a columns of pixels at the rightmost or leftmost position of the pixel block, respectively, and deleting the bottommost or topmost b rows of pixels in the pixel block of the second feature image and adding b rows of pixels at the topmost or bottommost position of the pixel block, respectively,
wherein 0 ≤ a < Y, 0 ≤ b < X, each of a and b is an integer, Y is the total number of columns of pixels in the pixel block of the second feature image, and X is the total number of rows of pixels in the pixel block of the second feature image, and
wherein each of the added pixels has a pixel value of 0.

The apparatus of any one of claims 1 to 3, wherein the shift correlation unit is configured to correlate each of the plurality of shifted images with the second feature image by multiplying the pixel value of each pixel in the pixel block of each of the plurality of shifted images by the pixel value of the positionally corresponding pixel in the pixel block of the second feature image.

The apparatus of any one of claims 1 to 4, wherein the first feature image is a luminance feature image, and
the feature extraction unit comprises a luminance detector configured to extract luminance information from the training image and generate the luminance feature image.

The apparatus of claim 5, wherein, to generate the luminance feature image, the luminance detector is configured to determine the luminance value of the pixel at a given position in the luminance feature image by the following equation (1):
I = 0.299R + 0.587G + 0.114B (1)
wherein I is the luminance value,
R is the red component value of the pixel at the corresponding position in the training image,
G is the green component value of the pixel at the corresponding position in the training image, and
B is the blue component value of the pixel at the corresponding position in the training image.

The apparatus of claim 5 or 6, wherein the normalizer is configured to normalize the luminance feature image by the following equation (2),

wherein N is the first feature image,
I represents the luminance value of the pixel at a given position in the luminance feature image,
Blur(I) is an image obtained by applying a Gaussian filter to the luminance feature image, and
Blur(I^2) is an image obtained by squaring each pixel value in the luminance feature image and then applying the Gaussian filter to the luminance feature image.

The apparatus of any one of claims 1 to 7, wherein the second feature image comprises a pixel block having a first size,
each of the plurality of shifted images and each of the plurality of correlation images comprises a pixel block having the first size, and
in each of the plurality of shifted images, each pixel having a non-zero pixel value has a corresponding pixel with the same non-zero pixel value in the second feature image.

A method of generating a plurality of correlation images, the method comprising:
generating a first feature image based on a training image;
normalizing the first feature image to generate a second feature image;
performing a plurality of translational shifts on the second feature image to generate a plurality of shifted images; and
correlating each of the plurality of shifted images with the second feature image to generate a plurality of correlation images.

The method of claim 9, wherein correlating each of the plurality of shifted images with the second feature image comprises multiplying the pixel value of each pixel in the pixel block of each of the plurality of shifted images by the pixel value of the positionally corresponding pixel in the pixel block of the second feature image.

The method according to claim 9 or 10, wherein the step of performing the plurality of translational shifts includes:
a step of shifting the leftmost or rightmost a columns of pixels in the pixel block of the second feature image to the rightmost or leftmost columns of the pixel block, respectively; and
a step of shifting the lowest or highest b rows of pixels in the pixel block of the second feature image to the highest or lowest rows of the pixel block, respectively,
where 0 ≦ a < Y, 0 ≦ b < X, a and b are both integers, Y is the total number of columns of pixels in the pixel block of the second feature image, X is the total number of rows of pixels in the pixel block of the second feature image, and
a and b are the same or different.
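A wrap-around shift of this kind can be sketched as follows. The sketch implements one of the directions named in the claim (leftmost a columns moved to the right edge, lowest b rows moved to the top) and assumes a pixel block stored as a list of rows; `circular_shift` is a hypothetical name.

```python
from typing import List

# Assumption: a pixel block is a list of X rows, each holding Y pixel values.
Image = List[List[float]]


def circular_shift(block: Image, a: int, b: int) -> Image:
    """Move the leftmost `a` columns to the right edge and the lowest
    `b` rows to the top, wrapping pixels around instead of dropping them."""
    X, Y = len(block), len(block[0])
    if not (0 <= a < Y and 0 <= b < X):
        raise ValueError("require 0 <= a < Y and 0 <= b < X")
    # Column shift: each row loses its first `a` pixels at the left
    # and regains them at the right.
    cols_shifted = [row[a:] + row[:a] for row in block]
    # Row shift: the last `b` rows move to the top.
    return cols_shifted[X - b:] + cols_shifted[:X - b]


block = [[1, 2, 3],
         [4, 5, 6]]
shifted = circular_shift(block, a=1, b=1)
```

Because pixels wrap around, every pixel value of the original block is still present in the shifted block, only at a different position.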

The method of claim 11, wherein at least one of a and b changes at least once during the execution of the plurality of translational shifts.

The method according to claim 9 or 10, wherein the step of performing the plurality of translational shifts includes:
a step of deleting the leftmost or rightmost a columns of pixels in the pixel block of the second feature image and adding a columns of pixels at the rightmost or leftmost position of the pixel block, respectively; and
a step of deleting the lowest or highest b rows of pixels in the pixel block of the second feature image and adding b rows of pixels at the highest or lowest position of the pixel block, respectively,
where 0 ≦ a < Y, 0 ≦ b < X, a and b are both integers, Y is the total number of columns of pixels in the pixel block of the second feature image, X is the total number of rows of pixels in the pixel block of the second feature image, and
each of the added pixels has a pixel value of 0.
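The delete-and-pad variant differs from a wrap-around shift only in what fills the vacated positions. The following sketch implements one of the claimed directions (leftmost a columns deleted and zeros added at the right, lowest b rows deleted and zeros added at the top); the list-of-rows layout and the name `zero_padded_shift` are assumptions made for illustration.

```python
from typing import List

# Assumption: a pixel block is a list of X rows, each holding Y pixel values.
Image = List[List[float]]


def zero_padded_shift(block: Image, a: int, b: int) -> Image:
    """Delete the leftmost `a` columns and the lowest `b` rows, then pad
    with zero-valued pixels on the right and at the top so the block
    keeps its original size."""
    X, Y = len(block), len(block[0])
    if not (0 <= a < Y and 0 <= b < X):
        raise ValueError("require 0 <= a < Y and 0 <= b < X")
    # Drop `a` columns on the left, append `a` zero columns on the right.
    cols_shifted = [row[a:] + [0] * a for row in block]
    # Drop `b` rows at the bottom, prepend `b` zero rows at the top.
    zero_rows = [[0] * Y for _ in range(b)]
    return zero_rows + cols_shifted[:X - b]


block = [[1, 2, 3],
         [4, 5, 6]]
shifted = zero_padded_shift(block, a=1, b=1)
```

Here every non-zero pixel of the shifted block still has a matching pixel of the same value in the original block, while the padded positions contribute 0 to any later multiplication.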

The method of claim 13, wherein at least one of a and b changes at least once during the execution of the plurality of translational shifts.

The method according to any one of claims 9 to 14, comprising a step of performing X × Y translational shifts,
where Y is the total number of columns of pixels in the pixel block of the second feature image and X is the total number of rows of pixels in the pixel block of the second feature image.
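One way to read this claim is that every combination of a column offset a (0 to Y-1) and a row offset b (0 to X-1) is used once, giving X × Y shifted images and therefore X × Y correlation images. The sketch below follows that reading and assumes wrap-around shifts and pixel-wise multiplication; the function name `all_correlation_images` and the loop order are illustrative choices, not part of the claims.

```python
from typing import List

# Assumption: a pixel block is a list of X rows, each holding Y pixel values.
Image = List[List[float]]


def all_correlation_images(feature: Image) -> List[Image]:
    """Build one correlation image per (a, b) offset pair, assuming
    wrap-around shifts and pixel-wise multiplication."""
    X, Y = len(feature), len(feature[0])
    results = []
    for b in range(X):          # row offsets 0 .. X-1
        for a in range(Y):      # column offsets 0 .. Y-1
            shifted = [row[a:] + row[:a] for row in feature]
            shifted = shifted[X - b:] + shifted[:X - b]
            results.append([
                [s * f for s, f in zip(s_row, f_row)]
                for s_row, f_row in zip(shifted, feature)
            ])
    return results


feature = [[1, 2],
           [3, 4]]
images = all_correlation_images(feature)  # 2 rows x 2 columns -> 4 images
```

The offset pair (0, 0) leaves the block unshifted, so its correlation image is simply the pixel-wise square of the feature image.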

The method according to any one of claims 9 to 15, further comprising a step of receiving the training image before generating the first feature image,
wherein the step of generating the first feature image includes a step of generating a luminance feature image based on luminance information of the training image.

The method according to claim 16, further comprising a step of determining the luminance value of the pixel at a given position in the luminance feature image by the following equation (1):
I = 0.299R + 0.587G + 0.114B (1)
where I is the luminance value,
R is the red component value of the pixel corresponding to the position in the training image,
G is the green component value of the pixel corresponding to the position in the training image, and
B is the blue component value of the pixel corresponding to the position in the training image.
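Equation (1) is a weighted sum of the three colour components and can be sketched directly. The sketch assumes the training image is stored as rows of (R, G, B) tuples; the names `luminance` and `luminance_image` are hypothetical.

```python
from typing import List, Tuple

# Assumption: the training image is a list of rows of (R, G, B) tuples.
RGBImage = List[List[Tuple[float, float, float]]]


def luminance(r: float, g: float, b: float) -> float:
    """Equation (1): I = 0.299R + 0.587G + 0.114B."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def luminance_image(rgb: RGBImage) -> List[List[float]]:
    """Apply equation (1) to every pixel, keeping each pixel's position."""
    return [[luminance(r, g, b) for (r, g, b) in row] for row in rgb]


training = [[(255, 0, 0), (0, 255, 0)],
            [(0, 0, 255), (255, 255, 255)]]
lum = luminance_image(training)
```

The three weights sum to 1, so a grey pixel with R = G = B keeps its value as its luminance.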

The method according to claim 16 or 17, further comprising a step of normalizing the luminance feature image by the following equation (2):

where N is the first feature image,
I represents the luminance feature image,
Blur(I) is an image obtained by applying a Gaussian filter to the luminance feature image, and
Blur(I²) is an image obtained by squaring each pixel value in the luminance feature image and then applying the Gaussian filter to the resulting image.
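Equation (2) itself is not reproduced in this text, so the following is only a sketch of one normalization that is consistent with the listed terms. It assumes a local mean Blur(I) and a local variance Blur(I²) − Blur(I)², giving N = (I − Blur(I)) / (sqrt(Blur(I²) − Blur(I)²) + eps). That form, the 3×3 kernel, the edge handling, and the constant `eps` are all assumptions made for illustration.

```python
import math
from typing import List

Image = List[List[float]]

# Assumption: a 3x3 binomial kernel stands in for the Gaussian filter.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16


def blur(img: Image) -> Image:
    """Apply the 3x3 kernel, repeating edge pixels at the borders."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), rows - 1)
                    jj = min(max(j + dj, 0), cols - 1)
                    acc += KERNEL[di + 1][dj + 1] * img[ii][jj]
            out[i][j] = acc / 16.0
    return out


def normalize(img: Image, eps: float = 1e-6) -> Image:
    """Assumed form: (I - Blur(I)) / (sqrt(Blur(I^2) - Blur(I)^2) + eps)."""
    mean = blur(img)
    mean_sq = blur([[v * v for v in row] for row in img])
    out = []
    for i, row in enumerate(img):
        out_row = []
        for j, v in enumerate(row):
            # Clamp at 0 to guard against small negative rounding errors.
            var = max(mean_sq[i][j] - mean[i][j] ** 2, 0.0)
            out_row.append((v - mean[i][j]) / (math.sqrt(var) + eps))
        out.append(out_row)
    return out


flat = [[5.0] * 3 for _ in range(3)]
normalized_flat = normalize(flat)  # a constant image has no local contrast
```

Under this assumed form, a constant image normalizes to zero everywhere, since each pixel equals its local mean.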

The method according to any one of claims 9 to 18, wherein the first feature image includes a pixel block having a first size,
each of the plurality of shifted images and each of the plurality of correlation images includes a pixel block having the first size, and
in each of the plurality of shifted images, a pixel having a non-zero pixel value has a corresponding pixel having the same non-zero pixel value in the first feature image.

A non-transitory computer-readable medium storing instructions that cause a computer to perform the method according to any one of claims 9 to 19.

A system for training a generative adversarial network, the system comprising:
a generative adversarial network processor including a generative network microprocessor configured to be trained by a discrimination network microprocessor, and the discrimination network microprocessor coupled to the generative adversarial network,
wherein the discrimination network microprocessor includes:
a plurality of input terminals coupled to the device according to any one of claims 1 to 8;
a plurality of analysis modules, each coupled to one of the plurality of input terminals;
a plurality of pooling modules connected in a cascade, wherein each stage of the cascade includes a pooling module coupled to one of the plurality of analysis modules and to a pooling module in the preceding stage of the cascade; and
a discrimination network coupled to the pooling module in the final stage of the cascade.