JP6612473B2

JP6612473B2 - Image generation using neural networks

Info

Publication number: JP6612473B2
Application number: JP2018558109A
Authority: JP
Inventors: Nal Emmerich Kalchbrenner; Aaron Gerard Antonius van den Oord
Original assignee: DeepMind Technologies Limited
Priority date: 2016-01-25
Filing date: 2017-01-25
Publication date: 2019-11-27
Anticipated expiration: 2037-01-25
Also published as: EP3380992B1; WO2017132288A1; JP2019504433A; KR102185865B1; CN116468815A; EP3380992A1; CN108701249A; KR20180105694A; CN108701249B

Description

Cross-Reference to Related Applications
This application claims priority to U.S. Provisional Patent Application No. 62/286,915, filed on January 25, 2016. The disclosure of the prior application is considered part of, and is incorporated by reference in, the disclosure of this application.

This application relates to image generation using neural networks.

A neural network is a machine learning model that employs one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with the current values of a respective set of parameters.

Some neural networks are recurrent neural networks. A recurrent neural network is a neural network that receives an input sequence and generates an output sequence from the input sequence. In particular, a recurrent neural network can use some or all of the internal state of the network from a previous time step in computing an output at the current time step.

An example of a recurrent neural network is a Long Short-Term Memory (LSTM) neural network, which includes one or more LSTM memory blocks. Each LSTM memory block can include one or more cells, each of which includes an input gate, a forget gate, and an output gate. These gates allow the cell to store previous states for the cell, e.g., for use in generating a current activation or to be provided to other components of the LSTM neural network.

Aaron van den Oord and Benjamin Schrauwen, "The Student-t Mixture as a Natural Image Patch Prior with Application to Image Compression," available at http://www.jmlr.org/papers/volume15/vandenoord14a/vandenoord14a.pdf

This application describes how a system implemented as computer programs on one or more computers in one or more locations can generate an output image from a neural network input.

For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by a data processing apparatus, cause the apparatus to perform the operations or actions.

Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. A neural network system as described in this specification can generate images from neural network inputs more accurately. In particular, by modeling the color values for the pixels in the output image as discrete rather than continuous values, training of the neural network can be improved. That is, the neural network can be trained more quickly, and the quality of the output images generated by the trained neural network can be increased. By generating the output image pixel by pixel and color value by color value, i.e., so that the color value for a given color channel of a given pixel is conditioned on both the color values for earlier pixels and the color values for any earlier color channels within the given pixel, the quality of the generated output images can be improved. By generating images in this manner using the neural network system described in this specification, the neural network can capture the full generality of pixel inter-dependencies without introducing the independence assumptions that have been necessary in existing models.

The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

FIG. 1 is a diagram showing an example neural network system.

FIG. 2 is a flow diagram of an example process for generating an output image from a neural network input.

FIG. 3 is a flow diagram of an example process for generating a color value for a given color channel of a given pixel in an output image.

Like reference numbers and designations in the various drawings indicate like elements.

FIG. 1 shows an example neural network system 100. The neural network system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented.

The neural network system 100 receives a neural network input and generates an output image from the neural network input. For example, the neural network system 100 can receive a neural network input 102 and generate an output image 152 from the neural network input 102.

In some implementations, the neural network system 100 can be used for lossless compression of images or for generating new images that have features similar to those of the images on which the system was trained.

In particular, for lossless compression, the neural network input can be an image, and the neural network system 100 can generate an output image that is a reconstruction of the input image.

The neural network system 100 can then store at least a portion of the score distributions generated by the output layers of the neural network system 100, as described below, for use in arithmetic encoding of the image. An example technique for using score distributions generated by a machine learning model for arithmetic encoding and decoding is described in Aaron van den Oord and Benjamin Schrauwen, "The Student-t Mixture as a Natural Image Patch Prior with Application to Image Compression," available at http://www.jmlr.org/papers/volume15/vandenoord14a/vandenoord14a.pdf.
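As a rough illustration of why stored score distributions are useful for arithmetic coding, the sketch below computes the ideal code length of a symbol sequence, assuming the well-known property that an arithmetic coder spends about -log2(p) bits on a symbol to which the model assigns probability p. The function name and the toy distributions are hypothetical and are not taken from the specification.

```python
import math

def ideal_code_length_bits(score_distributions, symbols):
    """Sum of -log2 p(symbol) over all coded symbols (idealised estimate)."""
    total = 0.0
    for dist, symbol in zip(score_distributions, symbols):
        total += -math.log2(dist[symbol])
    return total

# Two symbols, each coded under a uniform distribution over 4 values
# (a toy stand-in for the 256-way distributions described in the text).
uniform = [0.25, 0.25, 0.25, 0.25]
bits = ideal_code_length_bits([uniform, uniform], [2, 0])
```

The better the model's score distributions match the image, the smaller this total becomes.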

For image generation, during training, the neural network input can be an image, and the neural network system 100 can generate an output image that is a reconstruction of the input image.

After training, the neural network system 100 can generate an output image pixel by pixel without being conditioned on an input.

In particular, for a given input, the neural network system 100 generates an output image that includes a predetermined number of pixels arranged in a two-dimensional map, with each pixel having a respective color value for each of multiple color channels. For example, the neural network system 100 can generate images that include a red color channel, a green color channel, and a blue color channel. As a different example, the neural network system 100 can generate images that include a cyan color channel, a magenta color channel, a yellow color channel, and a black color channel. The multiple color channels are arranged according to a predetermined order, e.g., red, green, and then blue, or blue, red, and then green.

Generally, the neural network system 100 generates the color values in the output image pixel by pixel, following a sequence of pixels taken from the output image. That is, the neural network system 100 orders the pixels in the output image into a sequence and then generates the color values for each pixel in the output image one by one, in order according to the sequence.

For example, the sequence may start at the top left corner of the output image and proceed row by row through the output image, with the last pixel in the sequence being the pixel in the bottom right corner of the output image. In this example, the neural network system 100 first generates the color values for the top left corner pixel and then proceeds to the next pixel in the top row of the image.
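The row-by-row ordering in this example can be sketched as follows. This is a minimal illustration; the function name and the (row, column) coordinate convention are assumptions made here.

```python
def raster_scan_order(height, width):
    """Return pixel coordinates (row, col) from the top left corner to the
    bottom right corner, proceeding row by row."""
    return [(r, c) for r in range(height) for c in range(width)]

# A 2 x 3 image: the sequence starts at (0, 0) and ends at (1, 2).
order = raster_scan_order(2, 3)
```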

In particular, for a given color channel of a given pixel in the output image, the neural network system 100 generates the color value for the color channel of the given pixel conditioned on (i) the color values for pixels before the pixel in the sequence and (ii) the color values for the pixel for any color channels before the color channel in the order of color channels. During training or for image compression, because the output image is a reconstruction of the neural network input, i.e., the input image, these color values can be taken from the corresponding pixels in the input image rather than from the output image.
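One way to read this conditioning is as a chain-rule factorization of the distribution over images. The notation below is an assumption introduced for illustration: $x_i$ is the $i$-th pixel in the sequence, $n$ is the number of pixels, $\mathbf{x}_{<i}$ denotes all earlier pixels, and the channel order is taken to be red, green, blue.

```latex
p(\mathbf{x}) = \prod_{i=1}^{n} p\left(x_i \mid \mathbf{x}_{<i}\right),
\qquad
p\left(x_i \mid \mathbf{x}_{<i}\right)
  = p\left(x_{i,R} \mid \mathbf{x}_{<i}\right)\,
    p\left(x_{i,G} \mid \mathbf{x}_{<i},\, x_{i,R}\right)\,
    p\left(x_{i,B} \mid \mathbf{x}_{<i},\, x_{i,R},\, x_{i,G}\right)
```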

In particular, the neural network system 100 includes one or more initial neural network layers 110 and one or more output layers 120.

After a given color value for a given color channel of a given pixel in the output image has been generated, the initial neural network layers 110 are configured to process the current output image, i.e., the output image that includes the color values that have already been generated, to generate an alternative representation of the current output image.

For example, the initial neural network layers 110 can process a current output image 140 to generate an alternative representation 142 of the current output image 140.

As shown in FIG. 1, the shaded portion of the current output image 140 denotes pixels for which color values have already been generated by the neural network system 100, while the unshaded portion of the current output image 140 denotes pixels for which color values have not yet been generated.

The one or more output layers 120 receive the alternative representation and generate a score distribution over a discrete set of possible color values for the next color channel in the image. For example, the discrete set of possible color values can be the set of integers from zero to 255, inclusive, with the score distribution including a respective score for each of the integers in the set. The scores in the score distribution can represent, for each possible pixel value, the likelihood, e.g., the probability, that the pixel value should be the value of the given color channel for the task that the system is configured to perform.
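A score distribution over the 256 possible values can be sketched with a softmax over 256 logits. This is a minimal sketch: the logits here are made up for illustration, and in the described system they would come from the output layers 120 applied to the alternative representation.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one color channel: one entry per value 0..255.
logits = [0.0] * 256
logits[128] = 5.0
scores = softmax(logits)
```

Each entry of `scores` is the score for one integer color value, and the entries sum to one.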

If the given color channel referenced above is the last color channel in the predetermined order of color channels, the output layers 120 generate a score distribution for the first color channel of the next pixel in the sequence after the given pixel. In the example of FIG. 1, the output layers 120 generate a score distribution 146 for the first color channel of the next pixel 142 in the output image 140.

If the given color channel referenced above is not the last color channel in the predetermined order, the output layers 120 generate a score distribution for the next color channel after the given color channel in the order of color channels for the given pixel. For example, if the order of the color channels is red, green, and then blue, and the last color value generated was for the green color channel of the given pixel, the score distribution generated by the output layers 120 is the score distribution for the blue color channel of the given pixel.
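The two cases above (last channel versus any other channel) amount to a simple rule for which value is generated next. The sketch below uses hypothetical integer indices for the pixel's position in the sequence and the channel's position in the predetermined order.

```python
def next_position(pixel_index, channel_index, num_channels):
    """Return (pixel_index, channel_index) of the color value generated next."""
    if channel_index == num_channels - 1:
        # Last channel of this pixel: move to the first channel of the next pixel.
        return pixel_index + 1, 0
    # Otherwise stay on the same pixel and advance to the next channel.
    return pixel_index, channel_index + 1

# With channel order (red, green, blue) indexed as 0, 1, 2:
after_green = next_position(4, 1, 3)  # blue channel of the same pixel
after_blue = next_position(4, 2, 3)   # red channel of the next pixel
```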

In some implementations, the neural network system 100 includes a single output layer, e.g., a single softmax layer, that generates the score distributions for all of the color channels.

In some other implementations, the neural network system 100 includes a respective output layer, e.g., a respective softmax layer, corresponding to each of the color channels, and each output layer generates the score distribution for the corresponding color channel.

In some implementations, and as described in more detail below, the alternative representation is a feature map that includes features for each color channel of each pixel in the output image. In these implementations, when generating the color value for a given channel of a given pixel, the output layer uses the corresponding portion of the alternative representation, i.e., the portion of the alternative representation that includes the features of the given color channel of the given pixel.

The neural network system 100 then selects, from the generated score distribution, a value for the current color channel, i.e., for the first color channel of the next pixel in the sequence after the given pixel, or for the next color channel after the given color channel in the order of color channels for the given pixel. For example, the neural network system 100 can sample a color value in accordance with the score distribution or select the highest-scoring color value according to the score distribution.
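The two selection strategies mentioned here, sampling and picking the highest-scoring value, can be sketched as follows. The function name and the `mode` parameter are hypothetical; the scores are assumed to be non-negative and indexed by color value.

```python
import random

def select_color_value(scores, mode="sample", rng=None):
    """Pick a color value index from a score distribution."""
    if mode == "argmax":
        # Highest-scoring color value.
        return max(range(len(scores)), key=lambda v: scores[v])
    if rng is None:
        rng = random.Random()
    # Sample a color value with probability proportional to its score.
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]

toy_scores = [0.1, 0.7, 0.2]
greedy = select_color_value(toy_scores, mode="argmax")
sampled = select_color_value(toy_scores, mode="sample", rng=random.Random(0))
```

Sampling yields varied images across runs, while the argmax choice is deterministic for a given distribution.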

The initial neural network layers 110 can be configured in any of a variety of ways that allow the layers 110 to generate an alternative representation that is conditioned on the current output image, i.e., that is not conditioned on any color values in the output image that have yet to be generated by the neural network system 100.

In some implementations, the initial neural network layers 110 are a fully convolutional neural network made up of multiple convolutional neural network layers that each preserve the spatial resolution of the input to the initial neural network layers 110. That is, the input to the initial neural network layers 110 and the output of each of the convolutional neural network layers have the same spatial resolution, i.e., the same number of pixels as the output image, while the number of features generated for each pixel by the convolutional neural network layers can vary.

However, throughout the processing, the features for each input position, i.e., at each pixel, at every layer in the network are split into multiple portions, each of which corresponds to one of the color channels.

Thus, the alternative representation generated by the initial neural network layers 110 includes a respective portion for each of the color channel values of a given pixel, and, when generating the score distribution for a given color channel, the output layers 120 are configured to process the portion corresponding to the given color channel.

To ensure that the convolutional neural network is conditioned only on already-generated output values, each convolutional neural network layer is configured to apply a masked convolution, so that the portion of the alternative representation corresponding to a given color channel of a given pixel is generated based only on (i) pixels in the output image that are before the pixel in the sequence and (ii) color channel data for the pixel for color channels before the given color channel in the order of color channels.

For the first convolutional layer, i.e., the layer that receives the current output image as input, the mask restricts the connections to a given pixel in the output feature map of the first convolutional layer to those neighboring pixels in the current output image that are before the given pixel in the sequence and to those colors in the corresponding pixel in the current output image that have already been generated.

For additional convolutional layers, the mask restricts the connections to a given pixel in the output feature map of the additional convolutional layer to those neighboring pixels in the input feature map to the additional convolutional layer that are before the given pixel in the sequence, to features corresponding to those colors in the corresponding pixel in the input feature map that have already been generated, and to features corresponding to the given color in the corresponding pixel in the input feature map.
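The difference between the two mask types at the pixel's own position can be sketched as a channel-to-channel connectivity table. This is an illustrative reading of the text, with hypothetical names: channels are indexed by their place in the predetermined order, and `allowed[k][j]` says whether the output portion for channel `k` may read the input portion for channel `j` at the same pixel.

```python
def centre_connectivity(num_channels, first_layer):
    """Channel connectivity at the pixel's own position.

    First layer: only channels strictly before the given channel are visible.
    Additional layers: the given channel's own features are visible as well.
    """
    return [
        [(j < k) if first_layer else (j <= k) for j in range(num_channels)]
        for k in range(num_channels)
    ]

first = centre_connectivity(3, first_layer=True)
later = centre_connectivity(3, first_layer=False)
```

In the first layer the input is the image itself, so reading the given channel would leak the value being predicted. In later layers the given channel's features already depend only on permitted values, so they can be read safely.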

The neural network system 100 can implement this masking in any of a variety of ways. For example, each convolutional layer can have a kernel in which the corresponding weights are zeroed out.
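Zeroing out kernel weights can be sketched for the spatial part of the mask as below, assuming a square kernel with odd size and a row-by-row pixel sequence. The helper names are hypothetical. In this sketch the centre weight is zeroed too; the per-channel treatment of the centre position is a separate concern and is not shown.

```python
def spatial_mask(kernel_size):
    """1.0 for kernel positions strictly before the centre in row-by-row order,
    0.0 for the centre and every later position."""
    centre = kernel_size // 2
    mask = []
    for r in range(kernel_size):
        row = []
        for c in range(kernel_size):
            before = r < centre or (r == centre and c < centre)
            row.append(1.0 if before else 0.0)
        mask.append(row)
    return mask

def apply_mask(kernel, mask):
    """Zero out kernel weights wherever the mask is 0."""
    return [[w * m for w, m in zip(krow, mrow)] for krow, mrow in zip(kernel, mask)]

mask = spatial_mask(3)
masked = apply_mask([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]], mask)
```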

In some other implementations, the initial neural network layers 110 include multiple LSTM layers arranged in a stack one after the other. Like the convolutional neural network layers, the LSTM layers preserve the spatial dimensions of the input, and the features generated by each LSTM layer for each input position at every layer in the network are split into multiple portions, each of which corresponds to one of the color channels.

Each of these LSTM layers applies convolutions to the input feature map to the LSTM layer, i.e., the hidden state of the preceding LSTM layer or the current output image, to generate an input-to-state component, and applies convolutions to the previous hidden state of the layer to generate a state-to-state recurrent component. The LSTM layer then generates the values of the gates for the LSTM layer from the input-to-state component and the state-to-state component, and generates the updated hidden state and the updated cell state for the layer from the gate values and the preceding cell state.
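The gate and state update can be sketched at a single feature position as below. The specification does not spell out the formulas; this sketch assumes the standard LSTM formulation, in which the two components are summed to give pre-activations for an output gate `o`, a forget gate `f`, an input gate `i`, and a content gate `g`. All names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_update(input_to_state, state_to_state, cell_prev):
    """One LSTM update at a single position.

    Each component is a dict of pre-activations keyed by 'o', 'f', 'i', 'g'
    (assumed to be the outputs of the two convolutions at this position).
    """
    pre = {key: input_to_state[key] + state_to_state[key] for key in ("o", "f", "i", "g")}
    o, f, i = sigmoid(pre["o"]), sigmoid(pre["f"]), sigmoid(pre["i"])
    g = math.tanh(pre["g"])
    cell = f * cell_prev + i * g      # updated cell state
    hidden = o * math.tanh(cell)      # updated hidden state
    return hidden, cell

zeros = {"o": 0.0, "f": 0.0, "i": 0.0, "g": 0.0}
hidden, cell = lstm_update(zeros, zeros, cell_prev=2.0)
```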

In some of these implementations, the LSTM layers are row LSTM layers that process the input feature map row by row from top to bottom, computing features for a whole row at once.

That is, for each row of the input feature map, a row LSTM layer is configured to compute the input-to-state component of the row LSTM layer for the entire input feature map, e.g., using a one-dimensional convolution, and, after computing the input-to-state component for the entire input feature map, to use the input-to-state component in processing the input feature map row by row from top to bottom, computing features for a whole row at once.

To ensure that the row LSTM layers do not condition outputs on color values that have not yet been generated, the convolution used by a row LSTM layer to generate the input-to-state component is masked as described above for the convolutional neural network layers.

In others of these implementations, the LSTM layers are diagonal bidirectional LSTM (BiLSTM) layers.

Generally, a bidirectional LSTM layer is configured to generate an output map for one direction and an output map for the other direction, and to combine the two output maps to generate the final output map for the layer. That is, the bidirectional LSTM layer computes a state-to-state component and an input-to-state component for each of the two directions, and then generates the output map for each direction from the state-to-state and input-to-state components for that direction.

In particular, each diagonal BiLSTM layer is configured to scan the input feature map in a diagonal fashion along a first direction and in a diagonal fashion along a second direction to generate the output feature map of the layer.

More specifically, each diagonal BiLSTM layer is configured to skew the input feature map into a space that makes it easy to apply convolutions along diagonals, e.g., by offsetting each row in the input feature map by one position with respect to the preceding row.
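The skewing step can be sketched on a small nested-list feature map. This is a minimal sketch with hypothetical helper names and zero padding assumed: row `r` is shifted right by `r` positions, so each diagonal of the original map lines up in a single column of the skewed map.

```python
def skew(feature_map, pad=0):
    """Offset each row by one more position than the row above it."""
    height = len(feature_map)
    return [
        [pad] * r + list(row) + [pad] * (height - 1 - r)
        for r, row in enumerate(feature_map)
    ]

def unskew(skewed, width):
    """Undo the skew by dropping the offset positions from each row."""
    return [row[r:r + width] for r, row in enumerate(skewed)]

fm = [[1, 2, 3],
      [4, 5, 6]]
sk = skew(fm)  # width grows from 3 to 3 + 2 - 1 = 4
```

After skewing, a convolution applied down the columns of `sk` effectively moves along diagonals of `fm`.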

For each of the two directions, the diagonal BiLSTM layer is then configured to compute the input-to-state component of the diagonal BiLSTM layer for the direction by applying a 1x1 convolution to the skewed input feature map, and to compute the state-to-state component of the diagonal BiLSTM layer for the direction by applying a column-wise convolution to the skewed input feature map. In some implementations, the column-wise convolution has a kernel of size 2x1.

The diagonal BiLSTM layer is further configured to generate a skewed output feature map for each direction, e.g., a left skewed output feature map and a right skewed output feature map, from the state-to-state and input-to-state components for the direction as described above, and to skew each skewed output feature map back to match the spatial dimensions of the input feature map by removing the offset positions. The diagonal BiLSTM layer then shifts the right output map down by one row and adds the shifted right output map to the left output map to generate the final output map for the layer.
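The final combination step, shifting the right output map down by one row and adding it to the left output map, can be sketched as follows. The helper names are hypothetical, and filling the vacated top row with zeros is an assumption.

```python
def shift_down_one_row(feature_map, pad=0.0):
    """Shift all rows down by one; the top row is filled with padding and
    the last row is dropped."""
    width = len(feature_map[0])
    return [[pad] * width] + [list(row) for row in feature_map[:-1]]

def combine(left_map, right_map):
    """Add the down-shifted right output map to the left output map."""
    shifted = shift_down_one_row(right_map)
    return [[l + r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left_map, shifted)]

left = [[1.0, 1.0], [1.0, 1.0]]
right = [[10.0, 20.0], [30.0, 40.0]]
final = combine(left, right)
```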

As with the row LSTM layers, the convolutions applied by the diagonal BiLSTM layers to generate the state-to-state components can also be masked as described above.

In some implementations, the initial neural network layers 110 include a first convolutional layer followed by one or more row LSTM layers or one or more diagonal BiLSTM layers. The first convolutional layer receives the current output image as input, and its mask restricts the connections to a given pixel in the output feature map of the first convolutional layer to those neighboring pixels in the current output image that are before the given pixel in the sequence and to those colors in the corresponding pixel in the current output image that have already been generated.

In some implementations, the initial neural network layers 110 include skip connections between layers, residual connections between layers, or both.

FIG. 2 is a flow diagram of an example process 200 for generating an output image from a neural network input. For convenience, the process 200 will be described as being performed by a system of one or more computers located in one or more locations. For example, a neural network system, e.g., the neural network system 100 of FIG. 1, appropriately programmed, can perform the process 200.

The process 200 can be performed during training of the neural network to generate output images. For example, the process 200 can be the forward pass of the training process. The process 200 can also be performed as part of compressing the neural network input, i.e., the input image.

The system receives a neural network input (step 202). As described above, the neural network input can be an input image.

The system generates an output image from the neural network input pixel by pixel, following a sequence of pixels taken from the output image (step 204). That is, the system generates the color values for each pixel in the output image one by one in order according to the sequence, so that color values for earlier pixels in the sequence are generated before color values for later pixels in the sequence. Within each pixel, the system generates the color values for the color channels of the pixel one by one according to a predetermined order of the color channels. In particular, the system generates each color value for each pixel conditioned on (i) the color values for pixels before the pixel in the sequence and (ii) the color values for the pixel for any color channels before the color channel in the order of color channels. During training or for image compression, because the output image is a reconstruction of the neural network input, i.e., the input image, these color values can be taken from the corresponding pixels in the input image rather than from the output image.
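The overall generation loop of step 204 can be sketched as nested loops over pixels and channels. This is a minimal sketch: `score_fn` is a hypothetical stand-in for the initial neural network layers and output layers, sampling is one of the selection options described earlier, and `None` marks color values that have not yet been generated.

```python
import random

def generate_image(height, width, num_channels, score_fn, rng):
    """Generate an image pixel by pixel and channel by channel."""
    image = [[[None] * num_channels for _ in range(width)] for _ in range(height)]
    for r in range(height):                # rows, top to bottom
        for c in range(width):             # pixels within a row, left to right
            for ch in range(num_channels):  # channels in the predetermined order
                # Scores may depend only on the values already filled in.
                scores = score_fn(image, r, c, ch)
                value = rng.choices(range(len(scores)), weights=scores, k=1)[0]
                image[r][c][ch] = value
    return image

def uniform_scores(image, r, c, ch):
    """Toy model: every one of the 256 values is equally likely."""
    return [1.0 / 256] * 256

img = generate_image(2, 2, 3, uniform_scores, random.Random(0))
```

During training or compression the loop would instead read the conditioning values from the input image, as the paragraph above notes.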

FIG. 3 is a flow diagram of an example process 300 for generating a color value for a given color channel of a given pixel in an output image. For convenience, process 300 will be described as being performed by a system of one or more computers located in one or more locations. For example, an appropriately programmed neural network system, e.g., the neural network system 100 of FIG. 1, can perform process 300.

The system processes a current output image through the initial neural network layers to generate an alternative representation (step 302). The current output image is an image that includes the color values for each of the color channels for the pixels that precede the given pixel in the sequence, and the color values for the given pixel for any color channels that precede the given color channel in the order. As described above, the initial neural network layers are configured to apply masked convolutions, so that the alternative representation is conditioned on the color values that have already been generated and not on any color values that have yet to be generated.
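One way to picture the masking in step 302 is as a binary mask multiplied into a square convolution kernel. The sketch below builds such a spatial mask for a row-by-row pixel sequence. The function name `causal_mask` and the `include_center` flag are illustrative assumptions; the text does not specify how the mask is constructed, and the additional masking across color channels within a pixel is not shown here.

```python
def causal_mask(kernel_size, include_center=False):
    """Return a kernel_size x kernel_size mask of 0s and 1s.

    A 1 marks a kernel position that covers a pixel earlier in a row-by-row
    sequence than the center pixel; a 0 marks a position that covers the
    center pixel or a later one. `include_center` (an assumption of this
    sketch) optionally lets the center position through.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel has a center")
    center = kernel_size // 2
    mask = []
    for r in range(kernel_size):
        row = []
        for c in range(kernel_size):
            is_before = r < center or (r == center and c < center)
            is_center = r == center and c == center
            row.append(1 if is_before or (is_center and include_center) else 0)
        mask.append(row)
    return mask
```

For a 3x3 kernel the mask keeps the full row above the center and the single position to its left, so a convolution using it never reads a color value that has not been generated yet.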

The alternative representation includes a respective portion corresponding to each of the color channels of the given pixel.

The system processes the portion of the alternative representation corresponding to the given color channel using the output layer, e.g., a softmax layer, that corresponds to the given color channel to generate a score distribution over the possible color values for the given color channel (step 304). As described above, in some implementations a single output layer corresponds to all of the color channels, while in other implementations each color channel has a different corresponding output layer.

The system selects a color value for the given color channel of the given pixel using the score distribution, e.g., by selecting the highest-scoring color value or by sampling from the score distribution (step 306).
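Steps 304 and 306 can be illustrated with a small Python sketch: a softmax turns raw scores into a distribution over the possible color values, and a value is then chosen either greedily or by sampling. The function names and the `sample` and `rng` parameters are assumptions made for this example only.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_color_value(scores, sample=False, rng=None):
    """Pick a color value index from the score distribution.

    With sample=False the highest-scoring value is returned; with sample=True
    a value is drawn at random in proportion to its probability.
    """
    probs = softmax(scores)
    if sample:
        if rng is None:
            rng = random.Random()
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)
```

In practice `scores` would hold one entry per possible color value, e.g., 256 entries for an 8-bit channel.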

The system can repeat process 300 for each color channel of each pixel in the output image to generate the color values for each pixel in the output image.

The system can perform processes 200 and 300 for neural network inputs for which the desired output, i.e., the output image that should be generated by the system for the input, is not known.

The system can also perform processes 200 and 300 on neural network inputs in a set of training data, i.e., a set of inputs for which the output image that should be generated by the system is known, in order to train the initial neural network layers and, if the output layers have parameters, the output layers, i.e., to determine trained values for the parameters of the initial neural network layers and, optionally, the output layers. Processes 200 and 300 can be performed repeatedly on inputs selected from the set of training data as part of a conventional machine learning training technique for training the initial neural network layers, e.g., stochastic gradient descent with backpropagation.
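The text does not name the objective that the training technique minimizes. As an assumption for illustration only, a common choice for models that output a score distribution over discrete values is the average negative log-likelihood of the known color values, sketched below. The function name and argument layout are hypothetical.

```python
import math


def negative_log_likelihood(distributions, targets):
    """Average negative log-probability assigned to the known color values.

    distributions: one probability list per color value to be predicted.
    targets: the index of the known (training) color value for each position.
    This objective is an assumption of the sketch, not stated in the text.
    """
    if len(distributions) != len(targets):
        raise ValueError("need exactly one target per distribution")
    total = 0.0
    for probs, target in zip(distributions, targets):
        total -= math.log(probs[target])
    return total / len(targets)
```

A gradient-based trainer would lower this quantity by adjusting the layer parameters; a model that assigns a uniform distribution over 256 values scores `math.log(256)` per color value.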

During training, because the output image that should be generated is known in advance, the computations performed by the initial neural network layers can be accelerated to reduce the amount of time and computing resources needed to process a given training neural network input and, therefore, to decrease the time required for training, to improve the performance of the trained neural network, or both.

For example, when the initial neural network layers are a fully convolutional neural network, the processing needed for the initial neural network layers to generate the alternative representations can be done in parallel rather than sequentially, because the entire output image is available from the start of the computation. That is, as described above, the system can use the color values of the input image in place of already generated output image pixel color values. Because the convolutions are masked, the system can generate all of the alternative representations in parallel based on the input image.
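Why masking permits this can be checked on a one-dimensional toy case. In the sketch below, which is an illustration and not the described system, a masked convolution at position i reads only strictly earlier values. Computing each position from a partially filled sequence, as happens during generation, therefore gives the same features as one pass over the fully known sequence, as is possible during training. All names are hypothetical, and the single pass stands in for a computation that hardware could run for all positions at once.

```python
def masked_conv1d(values, weights):
    """1-D masked convolution: output i depends only on values before i."""
    k = len(weights)
    out = []
    for i in range(len(values)):
        acc = 0.0
        for j in range(k):
            src = i - k + j          # always strictly less than i
            if src >= 0:
                acc += weights[j] * values[src]
        out.append(acc)
    return out


def sequential_features(target, weights):
    """Generation style: position i sees only the prefix target[:i]."""
    feats = []
    for i in range(len(target)):
        partial = target[:i] + [0.0] * (len(target) - i)  # unknown values zeroed
        feats.append(masked_conv1d(partial, weights)[i])
    return feats


def parallel_features(target, weights):
    """Training style: the whole known sequence is processed in one pass."""
    return masked_conv1d(target, weights)
```

Because no output reads its own position or a later one, `sequential_features` and `parallel_features` return identical lists for any input.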

Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.

Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, subprograms, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Computers suitable for the execution of a computer program can be based on, by way of example, general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback, and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user, for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), e.g., the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

100 Neural network system
102 Neural network input
110 Initial neural network layers
120 Output layer
140 Current output image
142 Alternative representation, pixel
144 Alternative representation
146 Score distribution
152 Output image
200 Process
300 Process

Claims

A neural network system implemented by one or more computers, wherein the neural network system is configured to receive a neural network input and to generate an output image from the neural network input, the output image comprising a plurality of pixels arranged in a two-dimensional map, each pixel having a respective color value for each of a plurality of color channels, the neural network system comprising:
one or more initial neural network layers configured to receive the neural network input and to process the neural network input to generate an alternative representation of the neural network input; and
one or more output layers configured to receive the alternative representation and to generate the output image pixel by pixel from a sequence of pixels taken from the output image, including, for each pixel in the output image, generating a respective score distribution over a discrete set of possible color values for each of the plurality of color channels.

The neural network system of claim 1, wherein the plurality of color channels are ordered, wherein the one or more output layers comprise a respective output layer corresponding to each of the plurality of color channels, and wherein each of the output layers is configured to, for each pixel of the output image,
generate the respective score distribution over the discrete set of possible color values for the color channel corresponding to the output layer, conditioned on (i) color values for pixels before the pixel in the sequence and (ii) color values for the pixel for any color channels before the color channel corresponding to the output layer in the order of color channels.

The neural network system of claim 2, wherein, for each pixel, each of the output layers is configured to receive the portion of the alternative representation corresponding to the color channel together with contextual information based on (i) color values for pixels before the pixel in the sequence and (ii) color values for the pixel for any color channels before the color channel corresponding to the output layer in the order of color channels.

The neural network system of claim 3, wherein the neural network system is configured to apply a mask to outputs of a neural network layer in the one or more initial neural network layers so that the portion of the alternative representation corresponding to the color channel is generated based only on pixels before the pixel in the sequence and on color channel data for the pixel for color channels before the color channel corresponding to the output layer in the order of color channels.

The neural network system according to claim 1, wherein each of the output layers is a softmax layer.

The neural network system according to any one of claims 1 to 5, wherein the neural network input is an image, and
wherein the one or more initial neural network layers comprise a row long short-term memory (LSTM) layer, the row LSTM layer being configured to
process the input image row by row from top to bottom, computing features for an entire row at once.

The neural network system of claim 6, wherein the row LSTM layer calculates the features using a one-dimensional convolution.

The neural network system according to claim 6 or 7, wherein the row LSTM layer is configured to:
compute an input-to-state component of the row LSTM layer for the entire input image; and
after computing the input-to-state component for the entire input image, use the input-to-state component in processing the input image row by row from top to bottom, computing features for an entire row at once.

The neural network system according to any one of the preceding claims, wherein the neural network input is an image, and
wherein the one or more initial neural network layers comprise a diagonal bidirectional LSTM (BiLSTM) layer, the diagonal BiLSTM layer being configured to
scan an input image map in a diagonal fashion along a first direction and in a diagonal fashion along a second direction to generate features of the input image map.

The neural network system of claim 9, wherein the diagonal BiLSTM layer is configured to:
skew the input image map into a space that allows convolutions to be applied easily along diagonals; and
for each of the first direction and the second direction:
compute an input-to-state component of the diagonal BiLSTM layer for the direction by applying a 1×1 convolution to the skewed input image map; and
compute a state-to-state recurrent component of the diagonal BiLSTM layer for the direction by applying a column-wise convolution to the skewed input image map.

The neural network system of claim 10, wherein the column-wise convolution comprises a 2 × 1 size kernel.

The neural network system according to any one of the preceding claims, wherein the initial neural network layers comprise a plurality of LSTM layers, and wherein the plurality of LSTM layers are configured with residual connections from one LSTM layer to another LSTM layer in the plurality of LSTM layers.

The neural network system according to any one of claims 1 to 12, wherein the input is an image, and
wherein the one or more initial neural network layers comprise one or more convolutional neural network layers.

The neural network system according to any one of claims 1 to 13, wherein the neural network input is an input image and the output image is a reconstructed version of the input image.

The neural network system according to any one of claims 1 to 14, wherein the neural network input is an input image, and wherein the neural network system is configured to store at least a portion of the score distributions for use in arithmetic encoding of the input image for lossless compression of the input image.

The neural network system according to any one of the preceding claims, wherein the pixels in the sequence of pixels are taken row by row from the output image.

One or more computer recording media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations that implement the respective neural network system according to any one of claims 1 to 16.

A method comprising:
receiving a neural network input; and
processing the neural network input using the neural network system according to any one of claims 1 to 16 to generate an output image from the neural network input.

A computer-implemented method for generating an output image from a neural network input, the output image comprising a plurality of pixels arranged in a two-dimensional map, each pixel having a respective color value for each of a plurality of color channels, the method comprising:
receiving the neural network input at one or more initial neural network layers of a neural network system and processing the neural network input to generate an alternative representation of the neural network input; and
receiving the alternative representation at one or more output layers of the neural network system and generating the output image pixel by pixel from a sequence of pixels taken from the output image, including, for each pixel in the output image, generating a respective score distribution over a discrete set of possible color values for each of the plurality of color channels.

A system comprising one or more computers configured to perform the method of claim 19.

An apparatus comprising at least one processor and at least one memory comprising computer program code that, when executed by the at least one processor, causes the apparatus to generate an output image comprising a plurality of pixels arranged in a two-dimensional map, each pixel having a respective color value for each of a plurality of color channels, by:
receiving a neural network input at one or more initial neural network layers of a neural network system and processing the neural network input to generate an alternative representation of the neural network input; and
receiving the alternative representation at one or more output layers of the neural network system and generating the output image pixel by pixel from a sequence of pixels taken from the output image, including, for each pixel in the output image, generating a respective score distribution over a discrete set of possible color values for each of the plurality of color channels.

A method of generating an output image, the output image comprising a plurality of pixels arranged in a two-dimensional map, each pixel having a respective color value for each of a plurality of color channels, the method comprising:
generating the output image pixel by pixel from a sequence of pixels taken from the output image,
wherein the generating comprises, for each color channel of each pixel in the output image:
processing a current output image using one or more initial neural network layers to generate an alternative representation, wherein the current output image includes only (i) color values for pixels before the pixel in the sequence and (ii) color values for the pixel for any color channels before the color channel in the order of the color channels; and
processing the alternative representation using an output layer to generate a score distribution over a discrete set of possible color values for the color channel.

A system comprising one or more computers configured to perform the method of claim 22.

One or more computer recording media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform the method of claim 22.