JP7325775B2

JP7325775B2 - Image processing system, method, and program

Info

Publication number: JP7325775B2
Application number: JP2021119564A
Authority: JP
Inventors: フレドリック・オット・マックス・フォルケ・ヘルツベルユ
Original assignee: シリコンスタジオ株式会社
Priority date: 2021-07-20
Filing date: 2021-07-20
Publication date: 2023-08-15
Anticipated expiration: 2041-07-20
Also published as: JP2023015654A

Description

Application of Article 30, Paragraph 2 of the Patent Act
1. Facts of publication
(1) Dates of website posting: September 4, 2020 for (2)(a); September 17, 2020 for (2)(b)
(2) Website addresses: (a) https://cedec.cesa.or.jp/2020/session/detail/s5e8329edf1425.html (b) https://www.siliconstudio.co.jp/rd/
(3) Publisher: Silicon Studio Corporation (株式会社シリコンスタジオ)
(4) Content of the disclosed invention: as shown in the attached material. On website (2)(a), an online session was held using the attached material. On website (2)(b), a link was provided so that the attached material could be downloaded.

The present invention relates generally to systems and the like for image processing, and more specifically to systems and the like for efficiently performing graphics processing such as rendering using a neural network model.

In recent years, AI has made it possible to generate high-quality images and video, including human faces rendered without any sense of unnaturalness. For example, non-real-time applications can now generate images that are indistinguishable from real photographs. There are also attempts to apply AI technology to graphics processing.

For example, a technique has been proposed for performing efficient distributed denoising of graphics frames (Patent Document 1).

That is, Patent Document 1 discloses a system comprising: a plurality of nodes that perform ray tracing operations; a dispatcher node that dispatches graphics work to the plurality of nodes, each node performing ray tracing to render a region of an image frame specified by the graphics work; and at least a first node of the plurality of nodes, the first node having a ray tracing renderer that performs ray tracing to render a first region of the image frame, and a denoiser that denoises the first region using a combination of data associated with the first region and data associated with a region outside the first region, wherein at least some of the data associated with the region outside the first region is retrieved from at least one other node.

A cloud-based real-time rendering technique that performs ray tracing operations more efficiently has also been proposed (Patent Document 2).

That is, Patent Document 2 discloses a system comprising: a first graphics processing node that executes a first set of graphics processing operations to render a graphics scene, the first set of graphics processing operations including ray-tracing-independent operations; and an interconnect or network interface that couples the first graphics processing node to a second graphics processing node. The second graphics processing node receives an indication of the current view of a user of the first graphics processing node, and receives or constructs a view-independent surface generated by view-independent ray traversal and intersection operations. In response, the second graphics processing node performs a view-dependent transformation of the view-independent surface based on the user's current view to generate a view-dependent surface, and provides the view-dependent surface to the first graphics processing node. The first graphics processing node then executes a second set of graphics processing operations to complete rendering of the graphics scene using the view-dependent surface.

A technique that employs machine learning in rendering to reduce the occurrence of noticeable artifacts has also been proposed (Patent Document 3).

That is, Patent Document 3 discloses an image processing apparatus for generating an image from volume data, the apparatus comprising processing circuitry configured to: acquire a volume data set; acquire a non-uniformity map based on the volume data set; determine the positions of a set of aperiodic sampling points using the non-uniformity map; generate, from the volume data set, a set of data values sampled at the determined positions of the aperiodic sampling points; and generate an image data set by performing aggregation processing to produce a set of image data points from the set of sampled data values.

Patent Document 1: JP 2020-102195 A
Patent Document 2: JP 2020-109620 A
Patent Document 3: JP 2020-191061 A

However, real-time rendering requires an enormous amount of computation to generate high-quality images and video, so generating images and video with AI in real time remains difficult. In addition, AI research on images and video at the time of filing this application has focused mainly on improving quality, and there is still room for improvement in applying AI to real-time rendering. In other words, any attempt to apply AI to real-time rendering leaves room for introducing a new architecture.

The current problems are illustrated with a concrete example. When AI is used to generate high-quality images and video in real time, the neural network is sometimes implemented on a GPU. A conventional neural network consists of many layers, and because there are data dependencies between these layers, synchronization is required at every layer. The data output by one layer is written to memory and must then be read back from memory to serve as input to the next layer. Moreover, each layer has many channels with no branching, so all input channels must be processed, including channels that are not needed. The result is redundant computation and heavy memory consumption.
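The cost described above can be made concrete with a rough count of off-chip memory traffic. The following Python sketch is illustrative only: the function name, the uniform channel count per layer, the 16-bit value size, and the example frame dimensions are assumptions, not figures from this specification. It compares layer-by-layer execution, in which every layer reads its input from memory and writes its output back, with execution that keeps intermediate outputs on chip.

```python
def offchip_traffic_bytes(width, height, channels, layers,
                          bytes_per_value=2, fused=False):
    """Estimate off-chip activation traffic for one frame.

    Assumptions (not from the specification): every layer has the same
    channel count, values are 16-bit, and weight traffic is ignored.
    """
    activation = width * height * channels * bytes_per_value
    if fused:
        # Only the network input is read and the final output is written;
        # intermediate layer outputs never leave the chip.
        return 2 * activation
    # Each layer reads its input from memory and writes its output back.
    return layers * 2 * activation


if __name__ == "__main__":
    conventional = offchip_traffic_bytes(1920, 1080, 32, 8)
    on_chip = offchip_traffic_bytes(1920, 1080, 32, 8, fused=True)
    print(conventional, on_chip, conventional // on_chip)
```

Under these simplifying assumptions, the layer-by-layer traffic grows linearly with the number of layers, while the on-chip variant stays constant.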

Existing neural network models also fail to deliver sufficient performance for real-time rendering. This is because, for example, not only is execution slow for the reasons given above, but the shape and orientation of a face, the lighting, and so on cannot be specified directly. Simply replacing an existing rendering method with AI therefore cannot be expected to be very effective.

An object of the present invention is to eliminate the inefficient processing described above and thereby make it possible to render natural-looking human face images and real-time video with AI in real time.

Accordingly, an image processing system according to an embodiment of the present invention is an image processing system that includes a CPU and a GPU and performs rendering using a neural network. The system includes a conversion unit that converts mesh information and lighting information in a screen frame into an input format structured so that it can be applied to the neural network, and a processing unit that performs a hash division of labor for selecting among neural networks trained per tile. The hash division of labor includes a calculation unit that hashes pixel values in the screen frame to calculate a hash value, and a reading unit that uses the hash value as a key to select and read, from a lookup table, the neural network weights for the corresponding tile. The output of the hash division of labor is an estimate produced by the neural network weights corresponding to the tile in the screen frame.

In addition, the computer has, in a processing unit, a plurality of partial processors that can synchronize efficiently with one another (a synchronization group, or SM; hereinafter also referred to simply as a "synchronization group") and on-chip memory that the synchronization group can share. The hash division of labor assigns a synchronization group to each tile in the screen frame and loads the neural network weights into the on-chip memory.

In addition, the computer has the synchronization group and registers (the on-chip memory of an instruction scheduler). The hash division of labor assigns a synchronization group to each tile in the screen frame and stores the outputs of the intermediate layers of the neural network in the registers.

An image processing system or the like according to an embodiment of the present invention has the particular effect of realizing a new application of AI to real-time rendering and enabling human face images and video to be rendered without unnaturalness.

An explanatory diagram illustrating an example of the overall configuration of an image processing system according to an embodiment of the present invention.
An explanatory diagram illustrating variations of the information processing server configuration in the image processing system according to an embodiment of the present invention.
An explanatory diagram illustrating the external configuration of an information processing device in the image processing system according to an embodiment of the present invention.
An explanatory diagram illustrating the functional blocks of an information processing device in the image processing system according to an embodiment of the present invention.
A flowchart outlining the operation of the image processing system and the like according to an embodiment of the present invention.
A flowchart illustrating the details of the operation of the image processing system and the like according to an embodiment of the present invention.
An explanatory diagram illustrating an example of a data structure presupposed by the operation of the image processing system and the like according to an embodiment of the present invention.
An explanatory diagram illustrating an example of a data structure presupposed by the operation of the image processing system and the like according to an embodiment of the present invention.
An explanatory diagram illustrating an example of a hash function employed in the image processing system and the like according to an embodiment of the present invention.
A flowchart illustrating the detailed operation of the image processing system and the like according to an embodiment of the present invention.
A flowchart illustrating the detailed operation (training preprocessing) of the image processing system and the like according to an embodiment of the present invention.
A flowchart illustrating the detailed operation (training preprocessing) of the image processing system and the like according to an embodiment of the present invention.
A flowchart illustrating the detailed operation (training preprocessing) of the image processing system and the like according to an embodiment of the present invention.
An explanatory diagram illustrating a specific example of the operation of the image processing system and the like according to an embodiment of the present invention.
An explanatory diagram illustrating a specific example of the operation of the image processing system and the like according to an embodiment of the present invention.
An explanatory diagram illustrating a specific example of the operation of the image processing system and the like according to an embodiment of the present invention.
An explanatory diagram illustrating a specific example of the operation of the image processing system and the like according to an embodiment of the present invention.
An explanatory diagram illustrating an example of operation in a conventional image processing system or the like.
An explanatory diagram illustrating an example of operation in a conventional image processing system or the like.

(Definition of terms)
First, the terms used in this embodiment are defined.
[Rasterize]
In image processing generally, rasterizing means converting data that is not in raster format (for example, vector data) into raster format to form an image. In 3D computer graphics, it means converting shape data such as polygons into pixel data (also called fragments). In this embodiment, in the context of 3D computer graphics processing, it means storing the intermediate state of each pixel in a buffer such as a G-Buffer. Although the present invention is not limited to these, example rasterization outputs in one embodiment are, for each pixel, the two-dimensional position (or UV) on the object surface, the ID of the surface type (classification ID), and the amount of light (luminous intensity) at that surface. UV here is a value (UV value) in the per-texture coordinate system (UV coordinate system).
[Tile]
A unit for handling a group of pixels together. As an example, one tile can be 16×16 pixels. Each pixel can consist of a data set such as a UV value, a classification ID, and a luminous intensity.
[World scale]
Scope at the scale of the world (global) coordinate system.
[Tile scale]
Scope at the scale of a local coordinate system (the first of two). This local coordinate system leaves room for further reduction to the pixel scale.
[Pixel scale]
Scope at the scale of a local coordinate system (the second of two). This is the smallest local coordinate system in this embodiment.
[SM (Streaming Multiprocessor)]
A processor unit of a GPU. As an example, for the GPU architecture of NVIDIA Corporation, see [https://images.nvidia.com/aem-dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf].

Although the present invention is not limited to this, one SM contains four independent instruction schedulers (referred to here as "partial processors"), and each instruction scheduler has its own registers (64 KB, as an example). A plurality of (four, as an example) partial processors, which can also be called a "synchronization group" in one embodiment, make up an SM. Accordingly, in one embodiment, "can synchronize efficiently with one another in a processing unit" means that the partial processors within one SM can synchronize efficiently with one another.

Also, in one embodiment, the processing unit is provided, within one SM, with on-chip memory that the synchronization group can share (128 KB, as an example).
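To give a feel for what these example capacities allow, the following Python sketch works through the arithmetic for one 16×16 tile. The 64 KB and 128 KB figures and the tile size are the examples given in the text; the 16-bit value size, the 32-channel layer width, and all function names are assumptions made only for illustration.

```python
REGISTER_BYTES = 64 * 1024   # per instruction scheduler (example in the text)
SHARED_BYTES = 128 * 1024    # shared on-chip memory per SM (example in the text)
TILE_PIXELS = 16 * 16        # example tile size from the text


def tile_activation_bytes(channels, bytes_per_value=2):
    """Bytes needed to hold one layer's activations for a whole tile."""
    return TILE_PIXELS * channels * bytes_per_value


def max_channels_in_registers(bytes_per_value=2):
    """Upper bound on channels whose tile activations fit in one register file.

    Assumption: the whole register file is available for activations, which
    overstates what a real kernel could use.
    """
    return REGISTER_BYTES // (TILE_PIXELS * bytes_per_value)


def dense_weight_bytes(in_channels, out_channels, bytes_per_value=2):
    """Bytes for one fully connected (per-pixel) layer: weights plus biases."""
    return (in_channels * out_channels + out_channels) * bytes_per_value


if __name__ == "__main__":
    print(tile_activation_bytes(4))                      # UV (2), class ID, luminance
    print(max_channels_in_registers())
    print(SHARED_BYTES // dense_weight_bytes(32, 32))    # hypothetical 32x32 layers
```

Under these assumptions, a small per-tile network and its intermediate outputs could plausibly stay on chip, which is the property the later description relies on.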

Although the present invention is not limited to this, one embodiment can assume the use of SMs configured as follows. First, the number of SMs in a GPU ranges from several tens to more than 100. Four instruction schedulers are implemented per SM, and each instruction scheduler has 64 KB of registers. In one embodiment of the present invention, these SMs process in parallel an input image that has been rasterized and divided into tiles (input tiles); as an example, each scheduler can be made to process one tile. The outputs of this processing are output tiles, and each input tile and its output tile occupy corresponding positions in the image.
[Hash division of labor]
A novel architecture employed in the present invention. This approach can reduce or eliminate the inefficiencies of CG processing with conventional neural networks. In outline, the hash division of labor includes a step of hashing the content (numerical values) of each tile in a screen frame, that is, calculating a hash value, and a step of using the hash value as a key to select and read, from a lookup table, the weights of the neural network (model) for the corresponding tile. The hash division of labor can be described as processing that separates out and executes subdomains of a neural network.
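The two steps of the hash division of labor can be sketched as follows. This Python sketch is illustrative only. The specification does not state which tile contents are hashed or which hash function is used at this point, so hashing the set of classification IDs with BLAKE2 is an assumption, as are the `Pixel` record, the function names, and the single linear unit that stands in for the per-tile neural network.

```python
import hashlib
import struct
from typing import NamedTuple


class Pixel(NamedTuple):
    u: float
    v: float
    class_id: int
    luminance: float


def tile_hash(tile, table_size):
    """Map a tile's contents to a lookup-table key.

    Assumption: only the set of classification IDs in the tile is hashed, so
    tiles showing the same surface types share one set of weights.
    """
    ids = sorted({pixel.class_id for pixel in tile})
    payload = b"".join(struct.pack("<I", i) for i in ids)
    digest = hashlib.blake2b(payload, digest_size=4).digest()
    return int.from_bytes(digest, "little") % table_size


def estimate_pixel(pixel, weights):
    """Stand-in for the per-tile network: a single linear unit."""
    w_u, w_v, w_lum, bias = weights
    return w_u * pixel.u + w_v * pixel.v + w_lum * pixel.luminance + bias


def render_tile(tile, lookup_table):
    """Hash the tile, read the matching weights, and estimate every pixel."""
    key = tile_hash(tile, len(lookup_table))
    weights = lookup_table[key]  # on a GPU these would be loaded on chip
    return key, [estimate_pixel(p, weights) for p in tile]


if __name__ == "__main__":
    table = [(0.0, 0.0, 1.0, float(k)) for k in range(8)]
    tile = [Pixel(0.5, 0.5, 3, 0.25) for _ in range(16 * 16)]
    key, estimates = render_tile(tile, table)
    print(key, estimates[0])
```

Because the key depends only on the tile's contents, the same kind of tile always selects the same weights, which is what lets each specialized network stay small.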

The output of the hash division of labor is an estimate produced by the neural network weights corresponding to the tile in the screen frame.
(Features of the present invention)
Next, the features of the present invention are described. The novel feature of the present invention is that it is configured so that a neural network directly performs real-time CG rendering.

In general, the results of AI-based image processing are often evaluated by comparing screenshots. The results of one embodiment of the present invention, however, do not rest on measurements alone: it can be derived logically from basic principles that the approach is correct.

The present invention also focuses on the dependencies in the data being handled and on how those dependencies spread. One of the conditions under which the present invention is effective is therefore the presence of global data dependencies. For example, in the real-time rendering assumed by the present invention, the input data is not a photograph of unknown content but is generated from binary data such as camera and mesh data. The information placed in each pixel can therefore be chosen freely, and more can be added if necessary. This means there is room to apply an innovative method such as the one proposed by the present invention.
(Basic concept of the present invention)
The basic concept of the present invention is a set of processes for resolving inefficiencies specific to real-time rendering. As an example, in one embodiment of the present invention, mesh information and/or lighting information is converted into a highly structured input format so that it can be applied to a neural network.

Also, in one embodiment of the present invention, many neural networks, each specialized for a particular kind of rendering, are trained, and a hash division of labor selects among them for each tile by means of a hash.

In other words, the present invention rests on the following basic processes.
(1) Mesh information and/or lighting information is converted into a highly structured input format for the neural network.
(2) A hash division of labor process is employed in which many neural networks, each specialized for a particular kind of rendering, are trained and one is selected for each tile by means of a hash.
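As one possible reading of process (1), the per-pixel rasterization output can be regrouped into fixed-size tiles before any network is selected. The following Python sketch is an assumption-laden illustration: the row-major frame layout, the function name, and the requirement that the frame size be a multiple of the tile size are not taken from the specification.

```python
TILE = 16  # example tile edge length from the definition of terms


def split_into_tiles(frame, tile=TILE):
    """Split a row-major frame (a list of rows of pixel records) into tiles.

    Returns a dict mapping (tile_x, tile_y) to a flat, row-major list of the
    pixel records in that tile. Each record could hold, for example, a UV
    value, a classification ID, and a luminous intensity.
    """
    height, width = len(frame), len(frame[0])
    if height % tile or width % tile:
        raise ValueError("frame size must be a multiple of the tile size")
    tiles = {}
    for ty in range(height // tile):
        for tx in range(width // tile):
            tiles[(tx, ty)] = [
                frame[ty * tile + y][tx * tile + x]
                for y in range(tile)
                for x in range(tile)
            ]
    return tiles


if __name__ == "__main__":
    # Hypothetical 32x48 frame whose records are just their own coordinates.
    frame = [[(x, y) for x in range(32)] for y in range(48)]
    tiles = split_into_tiles(frame)
    print(len(tiles), tiles[(1, 2)][0])
```

Keying the result by tile position keeps each input tile aligned with the position its output tile will occupy in the image.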

The neural networks in (1) and (2) above are designed to fit within the constraints of hardware performance.
(Example application of the present invention: rendering a human face)
Next, an example application is given to make the present invention easier to understand. In general, humans perceive faces with very high sensitivity. For example, a person immediately notices if the expression on a familiar face is even slightly unnatural. This means that if a computer can correctly render the subtle changes in a human face, the appeal of the resulting video to human viewers can be greatly increased. In recent years, hardware real-time ray tracing has become practical, and effects such as specular reflection from objects can now be calculated accurately, but accurately calculating effects such as subsurface scattering in human skin remains difficult.

Live-action footage of human faces, on the other hand, is relatively easy to obtain in large quantities, so using this abundant human face image data as training data and rendering with AI is considered promising. Although the present invention is not limited to this, one embodiment of the present invention is notably effective for real-time rendering of human face images.

An image processing system, method, and program according to an embodiment of the present invention are described in detail below with reference to the drawings.

FIG. 1 shows an example of the overall configuration of an image processing system according to an embodiment of the present invention. Although not particularly limited, the present invention can be run stand-alone on a computer such as a PC, or it can be implemented in a group of information processing servers in a network configuration such as that shown in FIG. 1. Even in a network configuration such as that shown in FIG. 1, some or all of the processing routines may be executed on another information processing server or on an information processing device such as a PC or tablet terminal. For ease of understanding, an embodiment of the present invention is described below with reference to FIG. 1.

As shown in FIG. 1, in one embodiment the image processing system 10 consists of an information processing server group 11 and various information processing devices used by users. The figure shows, by way of example, PCs 12 and 13, a mobile phone 14, and a smartphone, personal digital assistant, or tablet terminal 15; these are hereinafter also referred to collectively as "various terminals", "user terminals", or simply "terminals". The information processing server group 11 and the various terminals are connected so that they can communicate with one another over dedicated lines or public lines such as the Internet (FIG. 1 shows lines 16 to 19 as examples of wired lines). The lines may be wired or wireless. When the line is wireless, the mobile phone 14 and the terminal 15 reach the Internet 19 via a base station, wireless router, or the like (not shown), and are further connected to the information processing server group 11 via the line 18 so that they can communicate with it.

Many mobile phones 14 and smartphones, personal digital assistants, or tablets 15 available at the time of filing this application have processing capabilities (communication speed, image processing capability, and so on) equivalent to those of a personal computer (PC), and can fairly be described as small computers.

The programs or software needed to carry out the present invention are normally installed or stored on an HDD (Hard Disk Drive), SSD (Solid State Drive), or the like in the storage unit of the information processing server group and, where necessary, in the storage unit of a PC or personal digital assistant. When the program or software is executed, all or part of it is read as software modules into the memory of the storage unit as needed and executed by a CPU or the like.

Execution need not be performed solely by a central processing unit such as a CPU; a processor such as a graphics processing unit (GPU) or a digital signal processor (DSP), not shown, can also be used.

The hardware of the information processing server group 11 can also basically be PCs. Although the present invention is not limited to this, the hardware capacity of the information processing server group 11 can be raised as needed by running multiple PCs in parallel (from several tens to tens of thousands, as an example), giving a configuration suited to processing large-scale data. A cloud configuration available at the time of filing this application can also be adopted.

The image processing system 10 according to an embodiment of the present invention has been described above with reference to FIG. 1, but the configuration of the present invention is not necessarily limited to this. For example, when the hardware that implements the characteristic configuration of the present invention is consolidated in the information processing server group 11, the information processing server group 11 itself may be regarded as an image processing system according to another embodiment of the present invention (the same applies hereinafter).

As already explained, an image processing system according to another embodiment of the present invention can also adopt a stand-alone configuration built around a single server or a single terminal, without a network configuration.

FIG. 2 shows a variation of the information processing server configuration in the image processing system according to one embodiment of the present invention. The operation of the information processing server group 11 is realized by the individual operations of the hardware described below and by the cooperation between software and this hardware.

In FIG. 2, the information processing server group 11 accessed from the user terminals 15a to 15c is configured, by way of example, as a cluster system in which a plurality of server systems are linked and operated as a single system. With such a cluster configuration, even if one server fails, the other servers can continue processing; in addition, even when processing concentrates on a specific server (or servers), the processing can be distributed to other servers, which improves the stability of the system as a whole. Such a cluster configuration is one of the advantageous configurations, particularly when building a real-time multiplayer game system.

Although the present invention is not limited to this, for ease of understanding it is assumed hereinafter that the information processing server group 11 is a group of servers that provides a real-time multiplayer game including real-time rendering.

In FIG. 2, the information processing server group 11 broadly comprises real-time cluster(s) 111, load balancer(s) 112, and an API server 23. In another embodiment of the present invention, it may also be configured to have cache cluster(s), not shown.

In one embodiment of the present invention, the real-time cluster(s) 111 include Lobby cluster(s) and Game cluster(s).

Also, in one embodiment of the present invention, the load balancer(s) 112 include Lobby load balancer(s) and Game load balancer(s).

In one embodiment, the Lobby cluster(s) can be configured to handle user matching in the lobby in order to set up a real-time multiplayer game. The Game cluster(s) can be configured to handle real-time communication processing and the like for the action portion as the real-time multiplayer game progresses.

In one embodiment, the load balancer(s) 112 can operate a Lobby load balancer, which is responsible for load balancing and autoscaling of the Lobby cluster(s), alongside a Game load balancer, which is responsible for load balancing and autoscaling of the Game cluster(s), with the two coordinating with each other on matters such as process monitoring.

FIG. 3 shows the external configuration of a tablet terminal serving as an information processing device in the image processing system according to one embodiment of the present invention. In FIG. 3, the information processing device (tablet terminal) 15 comprises a housing 151, a display 152, and a hardware button 153 provided at the lower center of the housing 151. The display 152 is typically a liquid crystal display (LCD) or the like and can display various kinds of information such as text, still images, and moving images. Menu buttons or a software keyboard can also be shown on the display 152, and touching them with a finger or a touch pen (not shown) issues an instruction (command) to the tablet terminal 15. In this respect the hardware button 153 is not an essential component, but for convenience of explanation it is implemented as a button that carries a certain function. The hardware button 153 can of course be replaced by a menu button displayed on part of the display 152.

The display 152 also includes a multi-touch input panel; the coordinates of a touch input position on the panel are sent via an input device interface (not shown) to the processing system (CPU) of the tablet terminal 15 and processed there. The multi-touch input panel is configured to sense a plurality of contact points on the panel simultaneously. This detection (sensing) can be realized in various ways and is not limited to contact sensors; for example, an optical sensor can be used to extract the pointed-to position on the panel. Besides contact sensors and optical sensors, a capacitive sensor that senses contact with human skin can also be used.

Although not shown in FIG. 3, the tablet terminal 15 may also include a microphone and a speaker. In that case, the user's voice picked up by the microphone can be recognized and used as an input command. Further, although not shown in FIG. 3, a camera device such as a CMOS camera is mounted on, for example, the back of the tablet terminal 15.

FIG. 4 illustrates a functional block diagram of the hardware constituting the tablet terminal 15 according to one embodiment of the present invention. The operation of the tablet terminal 15 is realized by the individual operations of the hardware described below and by the cooperation between software and this hardware.

In FIG. 4, the tablet terminal 400, viewed as a set of hardware blocks, broadly comprises: an input unit 401 consisting of the hardware button 153 of FIG. 3, the multi-touch input panel provided on the display 152, a microphone, and the like; a storage unit 402 consisting of a hard disk, RAM and/or ROM, and the like for storing programs and data; a central processing unit 403 consisting of a CPU that performs various numerical calculations and logical operations under program control; a display unit 404 consisting of the display 152 and the like; a control unit 405 for controlling chips, the electrical system, and the like; a communication interface unit 406 consisting of a slot for accessing the Internet, a port for optical communication, and a communication interface; an output unit 407 such as a speaker and a vibrator; a timer unit 408 for keeping time; a sensor unit 409 consisting of an image sensor such as a CMOS sensor; and a power supply unit 410 for supplying power to each module in the device. These modules are connected as needed by wiring such as a communication bus and power supply lines (collectively represented as connection 411 in FIG. 4).

The sensor unit 409 may include a GPS sensor module for determining the position of the tablet terminal 400 (15). Signals detected by an image sensor such as a CMOS sensor in the sensor unit 409 can be processed as input information by the input unit 401.

The programs or software needed to carry out the present invention are usually installed or stored on the hard disk or the like of the storage unit 402. When a program or software is executed, all or part of it is read as software modules into memory in the storage unit 402 as needed, and is executed by the CPU 403.

Note that computation need not be performed solely by a central processing unit such as a CPU; in a gaming tablet, a processor such as a graphics processing unit (GPU) or a digital signal processor (DSP), not shown, may also be used.

Next, an outline of the operation of the image processing system or image processing program according to one embodiment of the present invention is described with reference to the operation flows (flowcharts) of FIGS. 5 and 6.

As already mentioned, the characteristic operations of the present invention can be carried out mainly in the information processing server group 11, but at least part of them can also be carried out by an information processing device or the like.

FIG. 5 shows a flowchart outlining the operation of the image processing system and the like according to one embodiment of the present invention. The flowchart of FIG. 5 shows that the basic operation of the image processing system according to one embodiment consists of three processes applied to image data: (1) rasterization, (2) hash-based division of labor, and (3) execution processing based on a neural network model.

When processing starts in step S501 of FIG. 5, the flow proceeds to step S502, where rasterization is performed. The flow then proceeds to step S503, where the hash-based division of labor is performed, and in step S504 computation based on the neural network model is performed.

Then, in step S505, the output of the previous step is written to the frame buffer, and in step S506 the image data to be shown on the display of a user terminal or the like is output. The flow then proceeds to step S507 and, for the purposes of this description, ends.

In FIG. 5, "rasterization" and "hash-based division of labor" are used for ease of understanding, but the present invention is not limited to these. In particular, rasterization is described as a specific example of using structured data; in place of rasterization, the present invention can employ various kinds of "structured data".

Next, rasterization (as one example of data structuring), the hash-based division of labor, and the execution processing based on a neural network model are each described in detail.
(1) Rasterization
In one embodiment, rasterization is started by a request from an API running on the API server 23. The input to this rasterization is the source 3D graphics data, which in one embodiment includes polygon information such as triangle meshes (world scope), the light sources in the scene (world scope), and camera information (world scope).

In one embodiment of the present invention, rasterization consists of rasterizing the meshes in the polygon information and writing the result to a buffer memory. Any of the various rasterization methods used in 3D computer graphics rendering can be adopted.

In one embodiment of the present invention, rasterization retains the texture UV coordinates, the mesh classification ID, the luminous intensity of the incident light, and the like. Color information, on the other hand, does not need to be retained.

The outputs of rasterization in one embodiment of the present invention are UV (per pixel), classification information (per pixel), and luminous intensity information (per pixel).
(2) Hash-based division of labor
In one embodiment, the inputs to the hash-based division of labor are UV (per pixel), classification information (per pixel), and the array of network weights (world scope).

In one embodiment of the present invention, the hash-based division of labor consists of dividing the area of the screen frame into tiles and handing the processing of each tile to an instruction scheduler of the GPU. In one embodiment, there are no data dependencies within a tile, so processing can proceed in parallel on a plurality of SMs.
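The scheduling idea can be pictured with a CPU-side analogy. The sketch below is not the GPU implementation: it assumes each tile can be processed without reading any other tile's data, uses a thread pool as a stand-in for the GPU's instruction schedulers, and treats `process_tile` as a hypothetical placeholder for the per-tile work (hashing, weight loading, and network evaluation).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_tiles_in_parallel(
    tiles: Sequence[T],
    process_tile: Callable[[T], R],
    workers: int = 4,
) -> List[R]:
    """Process every tile independently and return results in tile order.

    The thread pool is a stand-in for GPU SMs (an assumption of this sketch);
    no tile reads another tile's data, so no cross-tile locking is needed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves the input order of the tiles.
        return list(pool.map(process_tile, tiles))


def count_pixels(tile: List[List[int]]) -> int:
    """Hypothetical per-tile job: count the pixels in the tile."""
    return sum(len(row) for row in tile)


# Four 16 x 16 tiles filled with dummy values.
tiles = [[[0] * 16 for _ in range(16)] for _ in range(4)]
pixel_counts = run_tiles_in_parallel(tiles, count_pixels)
```

Because the per-tile jobs share nothing, the same function could be handed to any scheduler that runs independent work items.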

Next, the contents of the tile's pixels are converted to a hash value. The resulting hash value is used as the key into the array of neural network weights, and those values are written to on-chip memory.

In one embodiment of the present invention, the output of the hash-based division of labor is the result (estimate) given by the neural network weights for each tile in the screen frame (hereinafter also called "weight information").

Although the present invention is not limited to this, when a GPU is used for the hash-based division of labor in one embodiment, a processing unit can be adopted that has a plurality of sub-processors that can synchronize efficiently with one another (a synchronization group, or SM) and on-chip memory that the synchronization group can share. In this case, the hash-based division of labor assigns a synchronization group to each tile in the screen frame and loads the neural network weights into the shared on-chip memory. The neural network for each tile is small enough to be loaded entirely into this shared on-chip memory. Because weights from outside the tile are not needed, no synchronization with anything outside the synchronization group is needed either, and multiple pixels that use the same neural network can be processed simultaneously. This makes the processing even more efficient.

Furthermore, although the present invention is not limited to this, when a GPU is used for the hash-based division of labor in one embodiment, a processing unit can be adopted that has a plurality of sub-processors that can synchronize efficiently with one another (a synchronization group, or SM) and registers. In this case, the hash-based division of labor assigns a synchronization group to each tile in the screen frame and stores the outputs of the intermediate layers of the neural network in the registers. Each layer can then be processed without synchronizing, layer by layer, with writes to so-called "slow memory" such as the RAM or cache on the GPU, which makes the processing even more efficient.
(3) Neural network execution processing
In one embodiment, the inputs to the neural network execution processing are UV (per pixel), luminous intensity information (per pixel), and weight information (per tile in the screen frame).

In one embodiment of the present invention, the neural network execution processing consists of reading the weight information written to on-chip memory and carrying out the computation through to the output neurons of the neural network. Because the computation is performed for each tile in the screen frame, the input domain does not become wide, and the number of neuron channels can be reduced. This has the advantage that intermediate results need not be sent to cache memory or the like each time.

Neuron-to-neuron data dependencies within the same tile are not a significant problem, because the computation for a given tile is executed by the same instruction scheduler.

The output of the neural network execution processing in one embodiment of the present invention is RGBA data (per pixel). This RGBA data is handled as the data for screen output.
(Training step and preprocessing step)
Before the processing shown in FIG. 5 is executed, one embodiment of the present invention also includes a step of training the neural network. One feature of one embodiment of the present invention is that various optimizations are performed as preprocessing for this training step. These preprocessing steps are described later with reference to FIGS. 10 to 12.

FIG. 6 shows a flowchart detailing the operation of the image processing system and the like according to one embodiment of the present invention. Like the flow of FIG. 5, the flow of FIG. 6 runs from rasterization to the output of color information for each pixel, but it is described more concretely. Operations that duplicate those described for FIG. 5 are omitted where appropriate. In other words, the flow of FIG. 6 can adopt what was described with reference to FIG. 5, and also includes variations not shown in FIG. 5.

When processing starts in step S601 of FIG. 6, the flow proceeds to step S602, where rasterization is performed on the mesh information, light source information, and camera information (all world scope) for each screen frame.

In one embodiment, this rasterization can use many conventional rasterization techniques. The intermediate state of each pixel is stored in a buffer such as a G-Buffer. The outputs of rasterization are, for example, the two-dimensional position on the surface (UV), the ID of the surface type (classification ID), and the amount of each type of light (luminous intensity).

In step S603, division processing (into tiles) is performed on the per-pixel UV, classification ID, and luminous intensity information (all world scope).

In step S605, a hash is computed over the per-pixel UV and classification ID (both world scope). Although this step uses the per-pixel UV and classification ID, the present invention is not limited to this, and the hash computation may also cover the luminous intensity information (world scope).

Any hash formula can be used for this hash computation. What matters is that it preserves the property that tile-sized surface regions with sufficiently similar characteristics (for example, UV) yield the same hash value. One embodiment of the present invention adopts an approach based on bit fields. U and V are each divided into sections and handled separately. In one embodiment, one bit represents each section.

If the tile contains any pixel that falls in a section, the corresponding bit is set to 1; otherwise the bit remains 0. The pixel information is preferably read into registers and held there.
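The bit-field approach can be illustrated with a minimal sketch. It is not the hash function of FIG. 8B, which is not reproduced in this text; it assumes UV values normalized to the range 0 to 1, eight sections per axis (`SECTIONS`), and a key layout with the U bits above the V bits. All function names are hypothetical.

```python
from typing import Iterable, Tuple

SECTIONS = 8  # assumed number of sections per axis (one bit each)


def axis_bits(values: Iterable[float], sections: int = SECTIONS) -> int:
    """Set bit i if any value falls in section i of the 0..1 range."""
    bits = 0
    for value in values:
        # Clamp so that a value of exactly 1.0 lands in the last section.
        index = min(int(value * sections), sections - 1)
        bits |= 1 << index
    return bits


def tile_hash(uvs: Iterable[Tuple[float, float]], sections: int = SECTIONS) -> int:
    """Concatenate the U bit field and the V bit field into one integer key."""
    uv_list = list(uvs)
    u_bits = axis_bits((u for u, _ in uv_list), sections)
    v_bits = axis_bits((v for _, v in uv_list), sections)
    return (u_bits << sections) | v_bits


# Two tiles whose pixels cover the same sections produce the same key.
tile_a = [(0.01, 0.01), (0.10, 0.12)]
tile_b = [(0.05, 0.02), (0.11, 0.09)]
same_key = tile_hash(tile_a) == tile_hash(tile_b)
```

The key depends only on which sections are touched, not on the exact UV values, which is what lets similar tiles share one network.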

In step S606, the per-tile hash value computed in the previous step (tile scope) is used to retrieve values from a lookup table, producing the neural network weights for each tile (tile scope).

A lookup table search is a memory read, and every memory read is a factor that lowers performance. The neural network weights must nevertheless be loaded at some point into the local on-chip memory of the GPU's instruction scheduler (several instruction schedulers may be combined), and this load may be performed in this step. Because a hash value that fully represents the contents of the current tile has already been computed, it can be used to select a neural network trained specifically to handle this kind of content.

Various methods of retrieving data based on a hash value are conceivable; in one embodiment of the present invention, the hash generated in this flow can be used as a key into an array in off-chip memory.
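One way to picture a hash-keyed weight array is sketched below. The layout is an assumption made for illustration: every per-tile network has the same number of weights (`WEIGHTS_PER_NETWORK`), the weights are packed into one flat list standing in for off-chip memory, and a small index maps each hash to its slot. The function names and the sample hash values are hypothetical.

```python
from typing import Dict, List, Tuple

WEIGHTS_PER_NETWORK = 4  # hypothetical size of one tiny per-tile network


def build_weight_array(
    trained: Dict[int, List[float]],
) -> Tuple[List[float], Dict[int, int]]:
    """Pack per-hash weights into one flat array plus a hash-to-slot index."""
    flat: List[float] = []
    slot_of: Dict[int, int] = {}
    for slot, (key, weights) in enumerate(sorted(trained.items())):
        if len(weights) != WEIGHTS_PER_NETWORK:
            raise ValueError("every network must have the same weight count")
        slot_of[key] = slot
        flat.extend(weights)
    return flat, slot_of


def fetch_weights(
    flat: List[float], slot_of: Dict[int, int], tile_hash: int
) -> List[float]:
    """Copy the weights for one tile hash out of the flat array."""
    slot = slot_of[tile_hash]  # raises KeyError if no network was trained
    start = slot * WEIGHTS_PER_NETWORK
    return flat[start:start + WEIGHTS_PER_NETWORK]


trained = {0x071E: [0.1, 0.2, 0.3, 0.4], 0x0101: [1.0, 0.0, 0.0, 1.0]}
flat, slot_of = build_weight_array(trained)
weights = fetch_weights(flat, slot_of, 0x071E)
```

The returned slice plays the role of the copy placed in on-chip memory; how a real system handles a hash with no trained network is not stated in the text, so this sketch simply raises an error.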

In step S607, the neural network weights generated in the previous step and the per-tile UV and luminous intensity information processed in step S603 (both world scope) are taken as input, and computation based on the neural network model is performed.

In one embodiment of the present invention, the logic that executes the neural network can be a processing system that takes UV, classification ID, and luminous intensity as inputs and outputs a pixel color. Alternatively, having the neural network output light rather than color makes it possible to combine the outputs of multiple neural networks correctly and at low cost. Various optimizations can also be applied to this logic.
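A per-pixel evaluation of such a small network might look like the following sketch. The architecture is an assumption, since the text does not specify one: a single hidden layer with ReLU, inputs limited to U, V, and one luminous intensity value, and an RGBA output clamped to the range 0 to 1. The `net` dictionary layout and the function names are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def dense(x: Sequence[float], weights: Matrix, biases: Sequence[float]) -> List[float]:
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]


def relu(x: Sequence[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in x]


def shade_pixel(u: float, v: float, luminosity: float, net: Dict[str, list]) -> List[float]:
    """Evaluate the tile's network for one pixel and return clamped RGBA."""
    hidden = relu(dense([u, v, luminosity], net["w1"], net["b1"]))
    rgba = dense(hidden, net["w2"], net["b2"])
    return [min(1.0, max(0.0, channel)) for channel in rgba]


# Toy weights: pass the inputs through as R, G, B and force alpha to 1.
net = {
    "w1": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "b1": [0.0, 0.0, 0.0],
    "w2": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
    "b2": [0.0, 0.0, 0.0, 1.0],
}
rgba = shade_pixel(0.25, 0.5, 0.75, net)
```

Every pixel in a tile would reuse the same `net`, which is why the weights only need to be loaded once per tile.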

In step S608, color information is output (pixel scope).

The flow then proceeds to step S609 and ends.

Next, the detailed operation of the image processing system or image processing program according to one embodiment of the present invention is described with reference to FIGS. 7 to 12. Parts of this description overlap with what was described with reference to FIGS. 5 and 6, but it adds a more detailed explanation of hashing, together with variations.

As already mentioned, the characteristic operations of the present invention can be carried out mainly in the information processing server group 11, but at least part of them can also be carried out by an information processing device or the like.

FIGS. 7 and 8A show examples of the data structures assumed in the operation of the image processing system and the like according to one embodiment of the present invention.

In the screen frame 70 shown in FIG. 7, the tile information is projected into UV space. In UV space, a tile is represented by a location and shape that reflect what it is rather than how it looks. Approximating this shape with two adjacent bit fields therefore allows the tile to be associated with an appropriately specialized neural network.

FIG. 7 depicts a UV space whose horizontal axis carries the U hash string (00000111) and whose vertical axis carries the V hash string (00011110). The tile 71 in the UV coordinate system of FIG. 7 appears as the tile 81 in the screen frame 80 shown in FIG. 8A (an 8 × 8 grid of 64 tiles in total). In one embodiment, each tile consists of 16 × 16 pixels.
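The two hash strings of FIG. 7 can be reproduced with a small worked example. The sketch assumes eight sections per axis, UV values normalized to the range 0 to 1, and that the leftmost character of each string corresponds to section 0; the sample UV values are hypothetical and were chosen only so that the tile covers U sections 5 to 7 and V sections 3 to 6.

```python
from typing import Iterable

SECTIONS = 8  # assumed: one bit per section, eight sections per axis


def bit_string(values: Iterable[float], sections: int = SECTIONS) -> str:
    """Return a string with '1' for every section hit by at least one value."""
    flags = ["0"] * sections
    for value in values:
        index = min(int(value * sections), sections - 1)
        flags[index] = "1"
    return "".join(flags)


# Hypothetical UV samples covering U sections 5-7 and V sections 3-6.
u_hash = bit_string([0.65, 0.80, 0.95])
v_hash = bit_string([0.40, 0.55, 0.70, 0.80])

# Two adjacent bit fields read as one integer key.
key = int(u_hash + v_hash, 2)
```

Under these assumptions the tile maps to a single 16-bit key, which is small enough to index a table of per-tile networks directly.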

Based on the projection of tiles into UV space described with reference to FIGS. 7 and 8A, one embodiment of the present invention uses the hash function defined in FIG. 8B. The present invention is not limited to this, however, and any hash function may be used as long as it satisfies certain conditions ((A) and (B) below, and preferably also (C)).
(A) Tiles with the same hash value constitute an input domain that is narrower than that of all types of tiles taken as a whole.

In one embodiment of the present invention, the narrower the input domain, the better the results. In practice, the input domain is made narrow enough that processing is fast enough to run substantially in real time.
(B) The hash value can be computed quickly (the hardware specifications of computers at the time of filing are more than sufficient for this).
(C) Although the present invention is not limited to this, only the target tile is used as input to the hash-based division of labor.

FIGS. 9 to 12 show flowcharts detailing the operation of the image processing system and the like according to one embodiment of the present invention. More specifically, FIG. 9 shows the overall hashing flow under the assumptions described with reference to FIGS. 7 and 8A. FIG. 9 also elaborates on the flow shown in FIG. 6. The flow of FIG. 9 assumes that the scene data handled by the image processing system and the like according to one embodiment of the present invention has already been rasterized into the frame buffer.

FIGS. 10 to 12 show an example processing flow for training the neural network. Various preprocessing steps are performed, with optimization in mind, ahead of the training shown in step S1210 of FIG. 12.

FIG. 10 shows the stage that generates the binary file of training data: a flow that includes dividing the pixel buffer into tiles, assigning a hash value to each tile, and adding the hash values to a hash map. FIG. 11 shows a flow that includes writing the content to a binary file, and FIG. 12 shows a flow that includes mapping the binary file contents into memory space.

When processing starts in step S901 of FIG. 9, the flow proceeds to step S902, where data including UV, mesh ID, and illuminance information is read from the screen frame buffer.

In step S903, the data in the screen frame buffer is divided into tiles of 16 × 16 pixels.
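The tile division can be sketched as plain rectangle arithmetic. The 16-pixel tile size comes from the text; clipping partial tiles at the right and bottom edges is an assumption of this sketch, since the text does not say how frames that are not multiples of 16 are handled. The names are hypothetical.

```python
from typing import List, Tuple

TILE_SIZE = 16  # tile edge length in pixels, as in step S903

# (x0, y0, x1, y1) with x1 and y1 exclusive.
Rect = Tuple[int, int, int, int]


def split_into_tiles(width: int, height: int, tile: int = TILE_SIZE) -> List[Rect]:
    """Return the tile rectangles that cover a width x height frame buffer."""
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height, and tile must be positive")
    rects: List[Rect] = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            # Clip the last column and row (an assumption of this sketch).
            rects.append((x0, y0, min(x0 + tile, width), min(y0 + tile, height)))
    return rects


# A 128 x 128 frame yields the 8 x 8 grid of tiles described for FIG. 8A.
frame_tiles = split_into_tiles(128, 128)
```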

In step S905, a hash value is computed using all the UVs in the tile. The classification ID may also be used as needed.

In step S906, the hash value computed in the previous step is used as a key to read the neural network weights from the lookup table.

ステップＳ９０８では、ステップＳ９０６において読み出された重みと、ステップＳ９０３で処理されたタイルごとのＵＶ、光度情報とが入力とされて、ニューラルネットワークインタフェース用データが出力される。この出力データは、一実施形態においてＲＧＢデータである。 In step S908, the weight read in step S906 and the UV and luminosity information for each tile processed in step S903 are input, and neural network interface data is output. This output data is RGB data in one embodiment.

The process then proceeds to step S909, and this flow ends.
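The tile loop of steps S903 to S908 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the row-major list used as a frame buffer, and the `hash_fn`, `weight_table`, and `network` parameters are all hypothetical stand-ins for the hash calculation, the lookup table, and the per-tile neural network.

```python
TILE = 16  # tile edge in pixels (step S903)

def split_into_tiles(frame, width, height, tile=TILE):
    """Yield ((x0, y0), pixels) for each tile of a row-major frame buffer."""
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            pixels = [frame[y * width + x]
                      for y in range(y0, min(y0 + tile, height))
                      for x in range(x0, min(x0 + tile, width))]
            yield (x0, y0), pixels

def render(frame, width, height, hash_fn, weight_table, network):
    """Per tile: hash the tile, look up its weights, evaluate every pixel."""
    out = {}
    for origin, pixels in split_into_tiles(frame, width, height):
        weights = weight_table[hash_fn(pixels)]              # steps S905 and S906
        out[origin] = [network(weights, p) for p in pixels]  # step S908
    return out
```

Because each tile only needs its own weights, the loop body has no dependency on any other tile, which is what allows tiles to be handled independently.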

(Training preprocessing steps)
The flows shown in FIGS. 10 to 12 are the preprocessing and training flows for training the neural network that executes the processing described with reference to FIGS. 6 and 9.

In the step of training a neural network for each tile, the pixel values in a screen frame are hashed to calculate a hash value, and only the training data of tiles corresponding to that hash value is used to train the weights of a neural network dedicated to that hash value.

The avatar generator in this flow is an application that artificially generates human face meshes. In one embodiment of the present invention, the intent is to collect face image samples by CG generation rather than from live-action footage.

The meshes are rendered by ray tracing, and hash_map.txt stores a list of the hash values that appear in the training data.

Although the present invention is not limited to this, one embodiment uses fbx files, which are used to provide interoperability between digital content creation applications (the same applies hereinafter). The fbx file is an open framework for 3D data transfer.

When processing starts in step S1001 of FIG. 10, the process proceeds to step S1002, where the output files of the avatar generator are input. The process then proceeds to S1003, where all fbx files to be processed that are stored in memory are read and listed.

In step S1004, it is determined whether any fbx file in the list remains to be processed. If Yes, the process proceeds to S1005; if No, it proceeds to S1010.

In step S1005, the fbx file to be processed in the current iteration is read from the list, and in step S1006 the mesh is rendered to the pixel buffer.

The process then proceeds to S1007, where the pixel buffer is divided into tiles and a hash value is assigned to each tile.

In step S1008, it is determined whether the hash value already exists in the hash map currently in memory. If it does (Yes in step S1008), the process returns to step S1004; if it does not (No in step S1008), the process proceeds to step S1009.

In step S1009, the hash value examined in the previous step is added to the in-memory hash map.

In step S1010, all fbx files to be processed have been handled, so the hash values in the in-memory hash map are written to the hash map file (hash_map.txt).

The process then proceeds to step S1011, and this flow ends.
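The hash collection of FIG. 10 can be sketched roughly as below. The text does not specify the layout of hash_map.txt, so the one-pair-per-line text format, the `(hash_u, hash_v)` tuple representation, and all function names are assumptions made for illustration; rendering and tiling are left out and the per-file tile hashes are passed in directly.

```python
def collect_hashes(tile_hashes_per_file):
    """Gather every distinct tile hash across all rendered files."""
    seen = set()
    for tile_hashes in tile_hashes_per_file:  # one iterable per fbx file
        for h in tile_hashes:
            if h not in seen:                 # step S1008
                seen.add(h)                   # step S1009
    return seen

def write_hash_map(hashes, path):
    """Step S1010: write one 'hash_u hash_v' pair per line (assumed format)."""
    with open(path, "w", encoding="ascii") as f:
        for hash_u, hash_v in sorted(hashes):
            f.write(f"{hash_u} {hash_v}\n")

def read_hash_map(path):
    """Read the file back into a set of (hash_u, hash_v) tuples."""
    with open(path, encoding="ascii") as f:
        return {tuple(int(x) for x in line.split()) for line in f if line.strip()}
```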

FIG. 11 shows the stage of generating the binary files of training data. As one embodiment following the flow shown in FIG. 10, fbx files, which are used to provide interoperability between digital content creation applications, are used.

In this flow, the tile data must be partitioned by hash value, but it is difficult to hold all of it in memory such as RAM at the same time. In one embodiment of the present invention, therefore, the binary files are written out incrementally. Depending on the operating system and SSD configuration, this measure yields a significant speed benefit.

In one embodiment of the present invention, the data set of a training data tile may consist of per-pixel inputs (U, V, luminosity, classification ID) and outputs (R, G, B, α). Although the present invention is not limited to this, in view of RAM constraints the data is written out as an AOS (array of structs). The later training step requires an SOA (struct of arrays). Different hash values are saved in different binary files.
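The difference between the AOS layout that is written out and the SOA layout that training needs can be illustrated with a small sketch. The record format (eight little-endian float32 values per pixel, including the classification ID stored as a float) and the field names are assumptions; the text only lists which quantities are stored.

```python
import struct

# One AOS record per pixel: inputs (u, v, luminosity, class_id), outputs (r, g, b, a).
RECORD = struct.Struct("<8f")  # assumed layout, 32 bytes per pixel
FIELDS = ("u", "v", "luminosity", "class_id", "r", "g", "b", "a")

def pack_aos(pixels):
    """Serialize pixels one complete record after another (array of structs)."""
    return b"".join(RECORD.pack(*p) for p in pixels)

def aos_to_soa(buf):
    """Regroup packed records into one list per field (struct of arrays)."""
    columns = {name: [] for name in FIELDS}
    for record in RECORD.iter_unpack(buf):
        for name, value in zip(FIELDS, record):
            columns[name].append(value)
    return columns
```

Writing AOS lets each tile be appended as soon as it is produced, while the SOA view gives the trainer contiguous access to each input and output channel. This sketch copies the data to regroup it; a strided view over the same bytes avoids that copy.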

Although the present invention is not limited to this, in one embodiment only tile information for the hash values listed in hash_map.txt is stored. Rendering is performed by a real-time graphics API.

When processing starts in step S1101 of FIG. 11, the process proceeds to step S1102, where the output files of the avatar generator are input. The process then proceeds to S1103, where all fbx files to be processed that are stored in memory are read and listed.

In step S1104, the hash map file (hash_map.txt) to which the hash values were written in step S1010 of FIG. 10 is read.

Next, in step S1105, the in-memory hash map is filled with the hash values from the hash map file (hash_map.txt), one array per hash value.

In step S1106, it is determined whether any fbx file in the fbx file list read in step S1104 remains to be processed. If Yes, the process proceeds to S1107; if No, it proceeds to S1114.

In step S1107, the fbx file to be processed in the current iteration is read from the list, and in step S1108 the mesh is rendered to the pixel buffer.

In step S1109, a png file, which in one embodiment is the target file, is read.

The process then proceeds to S1110, where the pixel buffer is divided into tiles and a hash value is assigned to each tile. In step S1111, each tile is added to the array associated with its hash value.

Next, in step S1112, the contents of the hash map arrays are written to the appropriate binary files, and in step S1113 the contents are cleared from the arrays in the hash map.

In step S1114, the contents of the hash map arrays are written to the appropriate binary files.

The process then proceeds to step S1115, and this flow ends.
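The incremental write-out of steps S1111 to S1113 can be sketched as follows, under several assumptions: the file naming scheme, the 8-float record layout, and the function names are hypothetical, and the in-memory hash map is modeled as a dict from `(hash_u, hash_v)` to a list of records.

```python
import os
import struct

RECORD = struct.Struct("<8f")  # assumed per-pixel record: u, v, lum, class, r, g, b, a

def add_tile(arrays, tile_hash, records):
    """Step S1111: append a tile's records to the array for its hash value.
    Hashes that are not keys of `arrays` (not in hash_map.txt) are skipped."""
    if tile_hash in arrays:
        arrays[tile_hash].extend(records)

def flush_tiles(arrays, out_dir):
    """Steps S1112 and S1113: append each array to its file, then empty it."""
    for (hash_u, hash_v), records in arrays.items():
        if not records:
            continue
        path = os.path.join(out_dir, f"{hash_u:02x}_{hash_v:02x}.bin")
        with open(path, "ab") as f:  # append mode keeps earlier flushes
            for record in records:
                f.write(RECORD.pack(*record))
        records.clear()
```

Flushing after every rendered file bounds memory use by the size of one frame's tiles rather than by the whole data set.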

FIG. 12 shows the stage in which the neural network for one hash value is actually trained. As one embodiment following the flow shown in FIG. 11, it shows a process flow in which the contents of the binary files are mapped into memory space.

In this flow, the binary file data can be used as is, provided only that the array strides are matched. Unlike the general case, therefore, no processing is needed to convert loaded training data into a format suited to training input and output (for example, converting an image format from int8 to float32 or modifying the content).

The final step in FIG. 12 is the training of the neural network (conventional techniques can be used for the training itself, so a detailed description is omitted except for matters specific to the present invention).

In one embodiment of the present invention, the neural network training framework (PyTorch) can be used through a Python-language GUI. Incidentally, because Python is slow at computation, it is preferable not to actually read the data in. In one embodiment of the present invention, depending on the operating system and CPU configuration, file contents can be addressed directly through virtual-memory pointers. If a pointer to a variable in the file and the stride of an array in the file are passed to the PyTorch GUI in NumPy array format, the training data can be transferred quickly, as is, to a GPU on the far side of PCIe. NumPy is another Python library.
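The idea of addressing file contents through a mapping plus a stride, without parsing them in Python, can be illustrated with the standard library alone. This sketch deliberately swaps in `mmap` and `memoryview` for the NumPy and PyTorch path described above; the 8-float record layout, native byte order, and the function name are assumptions.

```python
import mmap

FLOATS_PER_RECORD = 8  # assumed AOS record: u, v, luminosity, class_id, r, g, b, a

def map_columns(path):
    """Map a binary file and return (mapping, views).

    Each view is a strided float32 window onto the mapped bytes, one per field,
    so no record is copied or converted when the views are created."""
    with open(path, "rb") as f:
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    flat = memoryview(mapping).cast("f")  # reinterpret bytes as native float32
    views = [flat[i::FLOATS_PER_RECORD] for i in range(FLOATS_PER_RECORD)]
    return mapping, views
```

The operating system pages the file in on demand, and the slice step plays the role of the array stride: the AOS file is read as if it were SOA. The views must be released before the mapping is closed.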

Although the present invention is not limited to this, in one embodiment the neural network architecture can be implemented based on "Densely Connected Convolutional Networks" by Gao Huang et al. (https://arxiv.org/pdf/1608.06993v5.pdf).

In the flow shown in FIG. 12, a 1×1 convolution kernel can be used in place of a 3×3 convolution kernel.

When processing starts in step S1201 of FIG. 12, the process proceeds to step S1202, where the hash value generated by the flows up to FIG. 11 is input. The process then proceeds to step S1203, where file_list = {}, a list of binary files, is allocated.

In step S1204, it is determined whether any binary file in the list of binary files remains to be processed. If Yes, the process proceeds to S1205; if No, it proceeds to S1209.

Next, the process proceeds to step S1206, where it is determined whether
(binary_file.name.u & hash.u) > 0
holds. If Yes, the process proceeds to step S1207; if No, it returns to step S1204.

Next, the process proceeds to step S1207, where it is determined whether
(binary_file.name.v & hash.v) > 0
holds. If Yes, the process proceeds to step S1208; if No, it returns to step S1204.

In step S1208, the binary file is added to file_list.

In step S1209, once for each file in file_list, the contents of the file are mapped directly into PyTorch tensor virtual memory space.

In step S1210, the neural network for this hash is trained.

The process then proceeds to step S1211, and this flow ends.

In one embodiment of the present invention, the training data for the tile corresponding to a hash value can, depending on the hash value, include the training data of multiple other tiles that are close in input domain. This allows tile boundaries to be learned so that they give more natural results.

The binary files are partitioned by the hash value that designates the input domain. A neural network is trained using its own hash value and hash values similar to its own. Because a hash value is by nature a bit field representing a region of UV space, the similarity between two values can be estimated with a bitwise AND.
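The file selection of steps S1204 to S1208 follows directly from the two bitwise tests. The sketch below assumes each binary file is described by a `(name, (hash_u, hash_v))` pair; that representation and the function names are hypothetical.

```python
def overlaps(file_hash, target_hash):
    """Steps S1206 and S1207: the U bit fields and the V bit fields
    must each share at least one set bit."""
    file_u, file_v = file_hash
    target_u, target_v = target_hash
    return (file_u & target_u) > 0 and (file_v & target_v) > 0

def select_files(files, target_hash):
    """Steps S1204 to S1208: keep every file whose hash overlaps the target."""
    return [name for name, file_hash in files if overlaps(file_hash, target_hash)]
```

For example, with a target of `(0b0110, 0b0001)`, a file hashed `(0b0011, 0b0001)` is selected because both fields overlap, while a file hashed `(0b0010, 0b0100)` is rejected because its V field shares no bit with the target.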

FIG. 13 shows, among specific examples of the operation of the image processing system and the like according to one embodiment of the present invention, the operating principle that results from adopting a 1×1 convolution kernel.

The convolution kernel calculation in the image processing system and the like according to one embodiment of the present invention is described below in comparison with examples of convolution kernel calculation in conventional image processing systems and the like (FIGS. 17 and 18).

First, for the 1×1 convolution kernel according to one embodiment of the present invention shown in FIG. 13, at least part of an input image is shown in input image region 1310, and the convolution kernel target region 1311 within input image region 1310 is assumed to hold the information (1). The convolution kernel 1320 shown in FIG. 13 is assumed to hold the information (0).

The output on output layer 1330 from the operation between convolution kernel target region 1311 and convolution kernel 1320 is then obtained from the product (and sum) of the elements of convolution kernel target region 1311 and convolution kernel 1320, as follows.

(1 × 0) = 0
Although the present invention is not limited to this, adopting a 1×1 convolution kernel as described above in the image processing system and the like according to one embodiment of the present invention further resolves the data dependency problem. In other embodiments of the present invention, a convolution kernel of simple configuration other than the 1×1 convolution kernel can achieve the same effect.
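Why a 1×1 kernel removes the dependency between neighboring pixels can be seen in a short sketch: each output pixel is a weighted sum over the channels of the same input pixel only. The nested-list image representation and the function names are assumptions for illustration.

```python
def conv1x1(pixel, weights):
    """pixel: list of input channel values.
    weights: one row of input-channel weights per output channel."""
    return [sum(w * x for w, x in zip(row, pixel)) for row in weights]

def conv1x1_image(image, weights):
    """Apply the same 1x1 kernel at every pixel; no pixel reads its neighbors,
    so the pixels can be processed in any order or in parallel."""
    return [[conv1x1(pixel, weights) for pixel in row] for row in image]
```

With one input channel holding 1 and a single weight of 0, `conv1x1([1], [[0]])` gives `[0]`, matching the (1 × 0) = 0 example of FIG. 13.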

Next, for comparison with the 1×1 convolution kernel, the calculation cost of a conventional convolution kernel is explained, followed by the hash division of labor, which is another feature of one embodiment of the present invention.
(Conventional convolutional neural network)
Conventional methods of designing convolutional neural networks have the problem that hardware utilization efficiency falls when they are implemented on the latest GPUs as of the filing of the present application. As noted above, however, in the special case of real-time rendering these inefficiencies can be reduced by exploiting unusually favorable data dependency characteristics, and the algorithm for doing so is the "hash division of labor" (specific examples of the "hash division of labor" are described in more detail with reference to FIGS. 14 to 16).

FIGS. 17 and 18 show examples of convolution kernel calculation in conventional image processing systems and the like. More specifically, FIG. 17 shows the calculation produced by a convolution kernel, and FIG. 18 shows how the calculation of FIG. 17 changes with the number of input channels.

A large number of channels is provided in order to give the neural network expressive power, for example 512 channels (more is also possible). In general, the number of channels must be increased to improve image quality and expressive power. At each pixel, however, most channels are often irrelevant, which is inefficient. In other words, the inefficiency caused by the number of channels becomes a scalability bottleneck.

In FIG. 17, at least part of an input image is shown in input image region 1710, and the convolution kernel target region 1711 within input image region 1710 is assumed to hold the values (0, 0, 0, 0, 1, 1, 0, 1, 2), in order from the upper left. The convolution kernel 1720 shown in FIG. 17 holds the values (4, 0, 0, 0, 0, 0, 0, 0, -4), in order from the upper left.

The output on output layer 1730 from the operation between convolution kernel target region 1711 and convolution kernel 1720 is then obtained from the products and sum of the elements of convolution kernel target region 1711 and convolution kernel 1720, as follows.

(0 × 4) + (0 × 0) + (0 × 0) +
(0 × 0) + (1 × 0) + (1 × 0) +
(0 × 0) + (1 × 0) + (2 × (-4))
= -8
FIG. 18 shows how the calculation of FIG. 17 changes with the number of input channels, using three channels as an example. As shown in FIG. 18, the output on output layer 1830 from the operation between convolution kernel target regions 1811a to 1811c within input image regions 1810a to 1810c and convolution kernels 1820a to 1820c is obtained as follows.

(0 × 1) + (0 × 0) + (0 × 4) +
(0 × 1) + (0 × 0) + (0 × 0) +
(1 × 0) + (0 × 4) + (0 × 0) +
(1 × 0) + (0 × 0) + (0 × 0) +
(1 × 0) + (1 × 1) + (1 × 0) +
(2 × 1) + (1 × 1) + (1 × 0) +
(0 × 0) + (0 × 0) + (0 × 0) +
(0 × 0) + (1 × 0) + (1 × 0) +
(0 × 1) + (0 × 0) + (2 × (-4))
= -4
Thus, the calculation cost of the convolution kernel shown in FIGS. 17 and 18 is obtained by multiplying the number of input pixels by the number of output channels.
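Both sums can be checked with a short sketch. The single-channel values are those given for FIG. 17. The three-channel values are read off the printed terms on the assumption that each row of the sum lists channels a, b, and c for one kernel position; that reading, the variable names, and the function name are not stated in the text.

```python
def conv_at(patches, kernels):
    """Sum of element-wise products over every channel's 3x3 patch and kernel."""
    return sum(x * k
               for patch, kernel in zip(patches, kernels)
               for x, k in zip(patch, kernel))

# FIG. 17: one channel, values listed from the upper left.
patch_c = (0, 0, 0, 0, 1, 1, 0, 1, 2)
kernel_c = (4, 0, 0, 0, 0, 0, 0, 0, -4)
single = conv_at([patch_c], [kernel_c])

# FIG. 18: three channels (a and b reconstructed from the printed terms).
patch_a, kernel_a = (0, 0, 1, 1, 1, 2, 0, 0, 0), (1, 1, 0, 0, 0, 1, 0, 0, 1)
patch_b, kernel_b = (0, 0, 0, 0, 1, 1, 0, 1, 0), (0, 0, 4, 0, 1, 1, 0, 0, 0)
triple = conv_at([patch_a, patch_b, patch_c], [kernel_a, kernel_b, kernel_c])

# Multiply-adds needed for one output value of a 3x3 kernel with n input channels.
def multiplies_per_output(n_channels, kernel_size=3):
    return kernel_size * kernel_size * n_channels
```

Here `single` evaluates to -8 and `triple` to -4, matching the two results above. One output value costs 9 multiply-adds with one channel and 27 with three, so at the 512 channels mentioned earlier it costs 4608, which illustrates how the cost grows with channel count.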

Considering this calculation cost in terms of GPU utilization, problems such as the following (A) to (C) arise.
(A) As the number of layers grows and each layer comes to consist of multiple shader invocations, the overhead of launching and terminating a large number of small shaders increases.
(B) In a convolutional neural network with a moderate number of channels, a shortage of arithmetic instructions relative to memory instructions can be a problem. A larger number of channels improves the balance for certain operations but ultimately lengthens the runtime. Such a neural network may be efficient in terms of hardware utilization, but the conventional approach is inadequate in terms of real-time performance. Accordingly, even when the number of channels increases, some operations remain unbalanced.
(C) Operations within a layer and between layers are implemented as data-dependent shader invocations, so intermediate output must be written to off-chip memory and then read back as the input of the next operation in a separate shader invocation. This redundancy should be reduced.

Next, consider how problems (A) to (C) might be solved. Regarding (A), the reason the neural network is divided finely, with a separate shader invoked for each part, is considered to be a combination of data dependencies between operations and data dependencies between elements/pixels. If there were no dependencies between operations, the layers could be executed in parallel. If there were no dependencies between elements, the output pixels could be computed in parallel. Either way, the neural network could be evaluated in a single shader invocation.

Regarding (B), the problem may not be that the neural network as a whole lacks arithmetic instructions, but that each shader invocation contains too few of them. Each thread has a semi-fixed startup time while it waits for its first memory read to complete, so the proportion of idle clock cycles rises and GPU utilization falls.

Regarding (C), the problem is considered to be the large number of small shader invocations. Repeatedly writing out and reading back intermediate output generates a large amount of memory traffic and also adds latency. To reduce the number of shader invocations, the data dependency problem must be solved.

From the above, it follows that in computer graphics, if the data dependencies between elements can be resolved in advance, the problems identified so far can be solved.

The solution is the adoption of the 1×1 convolution kernel in one embodiment of the present invention described with reference to FIG. 13, and/or the adoption of the "hash division of labor" described next (depending on the circumstances, a convolution kernel of another simple configuration can achieve the effects of the present invention in place of the 1×1 convolution kernel). Specific examples of the hash division of labor in one embodiment of the present invention are described below with reference to the drawings.

FIGS. 14 to 16 show, among specific examples of the operation of the image processing system and the like according to one embodiment of the present invention, the operating principle that results from adopting the hash division of labor.

When the hash division of labor according to one embodiment of the present invention is carried out, the tiles in a screen frame are assigned to individual GPU schedulers. By having the GPU calculate a hash value from the contents of each tile, a dedicated neural network can be run locally in on-chip memory. This makes it possible to specialize in smaller neural networks and to keep intermediate data and data dependencies within the same scheduler, so processing efficiency improves in theory.

An example of the processing flow of the hash division of labor has already been described with reference to FIG. 6 and other figures.

The hash division of labor illustrated in FIGS. 14 to 16 shows the process of calculating a hash value from the pixel contents of a tile. In one embodiment, the following pseudocode is used.

for (pixel in tile) { hash_u |= (1 << pixel.u) }
for (pixel in tile) { hash_v |= (1 << pixel.v) }
The texture UV of a 3D model is ideal for applying the hash division of labor in one embodiment of the present invention. Basically, an output bit whose initial value is 0 is set to 1. Desirably, additional scaling, such as multiplication by 8, is also applied.

With this configuration, weights can be loaded efficiently through the relationship "array key = hash value".

Under the premises above, the hash division of labor in FIG. 14 shows the value 0.8 in pixel 1411 of input image region 1410 being output according to the following expression.

output |=
1 << ((static_cast<int>(0.8f * 8.0f)) % 8)
The hash division of labor in FIG. 15 shows the value 0.2 in pixel 1511 of input image region 1510 being output according to the following expression.

output |=
1 << ((static_cast<int>(0.2f * 8.0f)) % 8)
The hash division of labor in FIG. 16 shows the value 1.0 in pixel 1611 of input image region 1610 being output according to the following expression.

output |=
1 << ((static_cast<int>(1.0f * 8.0f)) % 8)
Thereafter, the hash division of labor is applied in the same way, one after another, to the values in the pixels of the input image region.
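The three expressions for FIGS. 14 to 16 can be evaluated with a short sketch. It transcribes the expression into Python, where `int()` truncates toward zero as `static_cast<int>` does for these non-negative values; the function names and the `(u, v)` pair representation of a tile are assumptions.

```python
def uv_bit(value, scale=8):
    """Python form of: 1 << ((static_cast<int>(value * 8.0f)) % 8)."""
    return 1 << (int(value * scale) % scale)

def tile_hash(uv_pairs, scale=8):
    """OR one bit per pixel into a U bit field and a V bit field."""
    hash_u = hash_v = 0
    for u, v in uv_pairs:
        hash_u |= uv_bit(u, scale)
        hash_v |= uv_bit(v, scale)
    return hash_u, hash_v
```

The value 0.8 sets bit 6 (contributing 64), 0.2 sets bit 1 (contributing 2), and 1.0 wraps around to bit 0 (contributing 1) because of the modulo. A tile whose U values are 0.8, 0.2, and 1.0 therefore has `hash_u == 0b1000011`.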
[Theoretical effect]
The theoretical effects of one embodiment of the present invention derived from the examples described above are as follows.
(Theoretical effect 1: reduction in the number of neurons)
Fewer neurons provide sufficient performance in constrained input regions. This is reflected notably in terms of performance. Furthermore, theoretically, a small enough neural network that we are dealing with can live in the registers of a single instruction scheduler (or cluster of schedulers) for its lifetime, so this also has the effect of It can be expected (see theoretical effects 2-4 below).
(Theoretical effect 2: reduction in the number of times shaders are used)
All neural networks can be solved simultaneously in a single shader call, thus avoiding a myriad of platform-induced overheads.
(Theoretical effect 3: ALU usage rate improvement)
Since both inputs and outputs are registered, the ALU is not latency bound by memory dependencies. Allowing the ALU to run at its own pace leads to improved performance in all respects.
(Theoretical effect 4: Reduction of memory traffic)
Memory traffic is reduced because there is no need to write intermediate values to and read back from external memory. This reduces the burden on the memory subsystem and improves performance.
(Theoretical effect 5: Improved scalability of large-capacity systems)
As the breadth and quality of rendered output produced by a neural network increases, so does the size of the neural network, and as the size of the neural network increases, the performance decreases. Hash division in one embodiment of the present invention can circumvent this problem through structured subdomain decoupling.
(Theoretical Effect 6: Streaming Friendly Assets)
When deployed as an art asset, the neural network needs to be streamed from SSD in real time. So by making each neural network small and labeling it in a structured way, it can be moved in and out of memory as needed based on the current screen content.
(Theoretical effect 7: Improved controllability and stability)
Limited memory dependencies make it easier to infer the inputs used to arrive at a particular conclusion. This reduces the risk that the neural network will fail under sufficiently different circumstances.
(Theoretical Effect 8: Raster Resistant Zooming)
The logic required for low-resolution output can be very different from the logic required for high-resolution output, and without the application of the present invention raster artifacts occur at wider zoom levels in the input domain. Therefore, by adopting the hash algorithm according to an embodiment of the present invention, it is advantageous that different neural networks are learned according to the zoom level, and this process is performed automatically.
(Theoretical Effect 9: Render Engine Compatibility)
Hash division according to an embodiment of the present invention uses the same 3D models, animations and lighting as other engines and can also be added as shaders for some (or all) scene objects.
(Theoretical effect 10: Neural network image quality)
Offline GAN-based neural networks have already achieved higher-quality output than current real-time rendering systems. Achieving this quickly and consistently makes the approach an attractive option for many kinds of real-time engines and products.
(Theoretical effect 11: Asset formation)
Training a neural network is in many ways more forgiving than working with static 3D scans of assets and the PBR constants used in shaders. It offers a realistic way to maintain a consistent photorealistic quality level throughout a scene.

Embodiments of the image processing system, the image processing program, and the like have been described above based on specific examples. In addition to a method or program for implementing the system or apparatus, embodiments of the present invention can also take the form of a storage medium on which the program is recorded (for example, an optical disc, a magneto-optical disc, a CD-ROM, a CD-R, a CD-RW, a magnetic tape, a hard disk, or a memory card).

Moreover, the implementation form of the program is not limited to an application program such as object code compiled by a compiler or program code executed by an interpreter; it may also take the form of a program module incorporated into an operating system, or the like.

Furthermore, not all processing of the program needs to be executed solely by the CPU on the control board; part or all of it may be executed by another processing unit (such as a DSP) mounted on an expansion board or expansion unit added to the board as needed.

All of the elements described in this specification (including the claims, abstract, and drawings) and/or all of the steps of any method or process disclosed may be combined in any combination, except combinations in which these features are mutually exclusive.

Moreover, each of the features described in this specification (including the claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is only one example of a generic series of identical or equivalent features.

Furthermore, the present invention is not limited to any specific configuration of the embodiments described above. The present invention extends to any novel feature, or any combination of such features, described in this specification (including the claims, abstract, and drawings), and to any novel method or process step, or any combination of such steps, described therein.

10 Image processing system
11 Information processing server group
111 Real-time cluster
112 Load balancer
12, 13 PC (one form of information processing device)
14 Mobile phone (one form of information processing device)
15, 15a to 15c Tablet terminal (one form of information processing device)
19 Public line (dedicated line, Internet, etc.)
23 API server

Claims

1. An image processing method performed by a computer, the method comprising:
a conversion step of converting mesh information and lighting information in a screen frame into a structured input format adaptable to a neural network; and
a step of performing a hash division of labor for selectively using the neural network trained for each tile,
wherein the hash division of labor includes:
hashing pixel values in the screen frame to calculate a hash value; and
using the hash value as a key to select and load, from a lookup table, the weights of the neural network for the corresponding tile, and
wherein the output of the hash division of labor is an estimated value estimated using the weights of the neural network corresponding to the tile in the screen frame.

2. The method of claim 1, wherein the computer has a plurality of sub-processors (hereinafter referred to as "synchronization groups") that can be efficiently synchronized with each other in processing units, and an on-chip memory that can be shared by the synchronization groups, and
wherein the hash division of labor, for each tile in the screen frame, assigns a synchronization group and loads the weights of the neural network into the on-chip memory.

3. The method of claim 1, wherein the computer has a plurality of sub-processors (hereinafter referred to as "synchronization groups") that can be efficiently synchronized with each other in processing units, and registers, and
wherein the hash division of labor assigns a synchronization group to each tile in the screen frame and stores the outputs of intermediate layers of the neural network in the registers.

4. The method of claim 1, wherein, in a step of training the neural network for each tile, the pixel values in the screen frame are hashed to calculate a hash value, and only the training data of the tile corresponding to the hash value is used to learn the weights of a neural network dedicated to that hash value.

5. The method of claim 4, wherein the training data of the tile corresponding to the hash value includes training data of other tiles that are close as input regions according to the hash value.

6. An image processing system comprising a CPU and a GPU and performing rendering processing using a neural network, the system comprising:
a conversion unit that converts mesh information and lighting information in a screen frame into a structured input format adaptable to the neural network; and
a processing unit that performs a hash division of labor for selectively using the neural network trained for each tile,
wherein the hash division of labor includes:
a calculation unit that hashes pixel values in the screen frame to calculate a hash value; and
a reading unit that uses the hash value as a key to select and read, from a lookup table, the weights of the neural network for the corresponding tile, and
wherein the output of the hash division of labor is an estimated value estimated using the weights of the neural network corresponding to the tile in the screen frame.

7. The system of claim 6, wherein the image processing system comprises a plurality of sub-processors (hereinafter referred to as "synchronization groups") that can be efficiently synchronized with each other in processing units, and an on-chip memory that can be shared by the synchronization groups, and
wherein the hash division of labor, for each tile in the screen frame, assigns a synchronization group and loads the weights of the neural network into the on-chip memory.

8. The system of claim 6, wherein the image processing system comprises a plurality of sub-processors (hereinafter referred to as "synchronization groups") that can be efficiently synchronized with each other in processing units, and registers, and
wherein the hash division of labor assigns a synchronization group to each tile in the screen frame and stores the outputs of intermediate layers of the neural network in the registers.

9. The system of claim 6, wherein, in a step of training the neural network for each tile, the pixel values in the screen frame are hashed to calculate a hash value, and only the training data of the tile corresponding to the hash value is used to learn the weights of a neural network dedicated to that hash value.

10. A program executed on an image processing system that includes a CPU and a GPU and performs rendering processing using a neural network, the program, when executed on the system, causing the CPU or the GPU to execute:
a step of converting mesh information and lighting information in a screen frame into a structured input format adaptable to the neural network; and
a step of performing a hash division of labor for selectively using the neural network trained for each tile,
wherein the hash division of labor includes:
hashing pixel values in the screen frame to calculate a hash value; and
using the hash value as a key to select and load, from a lookup table, the weights of the neural network for the corresponding tile, and
wherein the output of the hash division of labor is an estimated value estimated using the weights of the neural network corresponding to the tile in the screen frame.

11. The program of claim 10, wherein the system includes a plurality of sub-processors (hereinafter referred to as "synchronization groups") that can be efficiently synchronized with each other in processing units, and an on-chip memory that can be shared by the synchronization groups, and
wherein the hash division of labor includes, for each tile in the screen frame, assigning a synchronization group and loading the weights of the neural network into the on-chip memory.

12. The program of claim 10, wherein the system comprises a plurality of sub-processors (hereinafter referred to as "synchronization groups") that can be efficiently synchronized with each other in processing units, and registers, and
wherein the hash division of labor includes assigning a synchronization group to each tile in the screen frame and storing the outputs of intermediate layers of the neural network in the registers.

13. The program of claim 10, wherein, in a step of training the neural network for each tile, the pixel values in the screen frame are hashed to calculate a hash value, and only the training data of the tile corresponding to the hash value is used to learn the weights of a neural network dedicated to that hash value.

14. The program of claim 13, wherein the training data of the tile corresponding to the hash value includes training data of a plurality of other tiles that are close as input regions according to the hash value.