JP2020027399A

JP2020027399A - Computer system

Info

Publication number: JP2020027399A
Application number: JP2018151323A
Authority: JP
Inventors: 幸二福田; Koji Fukuda
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-08-10
Filing date: 2018-08-10
Publication date: 2020-02-20
Anticipated expiration: 2038-08-10
Also published as: JP7036689B2

Abstract

PROBLEM TO BE SOLVED: To perform pooling processing in a convolution operation on a graph. SOLUTION: A computer system executes a convolutional neural network on a graph. The convolutional neural network on the graph includes one or more convolution layers and one or more pooling layers. In each convolution layer, one or more processors update the value of each node by a convolution operation based on a kernel whose size is a predetermined number of hops. In each pooling layer, the processors update the value of each node by pooling processing based on the value of that node and the values of the nodes within a pooling range of a predetermined number of hops from that node. The kernel of a convolution layer following a pooling layer is larger than the kernel of the preceding convolution layer. SELECTED DRAWING: Figure 9

Description

The present disclosure relates to a computer system.

Technologies that process data from the real world and cyberspace in order to efficiently design and operate social infrastructure, cities, and the like are attracting attention. Such technologies analyze or predict the state of the social infrastructure, or control or guide it. The data to be processed consists of environmental sensing data such as temperature and humidity, log data from machines such as automobiles, and log data from people or organizations such as e-mail and SNS.

In recent years, machine learning using neural networks has attracted attention as a data analysis technique. A neural network can readily achieve fast and accurate data analysis. Many kinds of data, including the system data described above, are graph data that can be represented by a graph structure.

For example, International Publication No. 2016/174725 (Patent Document 1) discloses a technique for constructing a neural network from graph data. Specifically, Patent Document 1 discloses a computer that executes arithmetic processing using a neural network composed of a plurality of layers, each including one or more neurons. The computer includes a storage unit and a construction unit. The storage unit stores graph data composed of a plurality of nodes and one or more edges connecting the nodes, and sample data holding one or more values to be input to the neural network. The construction unit constructs the neural network using the graph data: it generates the one or more neurons of each layer based on the nodes included in the graph data, and generates the connections between the neurons included in each layer based on the edges included in the graph data (see, for example, the abstract). Non-Patent Document 1 discloses an example of a relational graph convolutional network.

Patent Document 1: International Publication No. 2016/174725

Non-Patent Document 1: M. Schlichtkrull et al., "Modeling Relational Data with Graph Convolutional Networks", arXiv preprint arXiv:1703.06103, 2017.

Among the various known types of neural networks, the convolutional neural network (CNN) is known to be particularly useful for image processing. Graph data carries relational structure, so it would be useful to be able to analyze graph data with a CNN.

Realizing so-called deep learning, in which a CNN is built as a deep network, requires pooling or, more generally, an operation that reduces the degrees of freedom of the parameters toward the upper layers. If many convolution layers are stacked without pooling layers, the number of parameters grows while the expressible degrees of freedom are not reduced (no regularization is applied), which causes overfitting when the parameters are learned.

A conventional CNN for grid data (a CNN on a grid) generally reduces the image size by an operation called pooling after the convolution operation (and the application of a nonlinear activation function). In a CNN for graph data, however, a pooling operation is difficult to realize.

A naive translation of the grid pooling operation to graph data would be to delete some nodes of the graph data and thereby reduce its size. However, there is no clear guideline for choosing which nodes to delete and which to keep.

A technique is therefore desired that realizes pooling, reducing the degrees of freedom of the parameters, in a CNN on a graph.

One aspect of the present disclosure is a computer system that executes a convolutional neural network on a graph. The computer system includes one or more processors and one or more storage devices. The convolutional neural network on the graph includes one or more convolution layers and one or more pooling layers. The one or more storage devices store kernel weight data for the one or more convolution layers. In each convolution layer, the one or more processors update the value of each node by a convolution operation based on a kernel whose size is a predetermined number of hops. In each pooling layer, the one or more processors update the value of each node by pooling processing based on the value of that node and the values of the nodes within a pooling range of a predetermined number of hops from that node. The kernel size of the convolution layer following a pooling layer is larger than the kernel size of the convolution layer preceding it.
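
As a minimal sketch of the pooling layer described above, the following Python code collects the nodes within a given hop count of each node by breadth-first search and replaces each node's value with a reduction over that range. The adjacency-list representation, the names `nodes_within_hops` and `pool_on_graph`, the scalar node values, and the choice of the maximum as the pooling function are assumptions made for illustration; the passage above states only that pooling uses the value of the node and the values of the nodes within the pooling range.

```python
from collections import deque


def nodes_within_hops(adjacency, start, hops):
    """Return the set of nodes reachable from `start` in at most `hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue  # do not expand beyond the pooling range
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return seen


def pool_on_graph(adjacency, values, hops, reduce_fn=max):
    """Update every node from the values inside its pooling range.

    The node set and edges are left untouched; only node values change.
    `reduce_fn=max` is an assumed pooling function.
    """
    return {
        node: reduce_fn(values[n] for n in nodes_within_hops(adjacency, node, hops))
        for node in adjacency
    }


# Hypothetical path graph a - b - c - d with scalar node values.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
values = {"a": 1.0, "b": 5.0, "c": 2.0, "d": 0.5}
pooled = pool_on_graph(adjacency, values, hops=1)
print(pooled)  # {'a': 5.0, 'b': 5.0, 'c': 5.0, 'd': 2.0}
```

Passing a different `reduce_fn` (for example an averaging function) changes the pooling rule without changing the hop-based range.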

According to one aspect of the present disclosure, pooling processing can be performed appropriately in a convolution operation on a graph.

An example of the configuration of a computer system.
An example of a neural network.
An example of a convolutional neural network.
A configuration example of a CNN on a graph.
An example of graph data.
One node and the nodes connected to that node.
A part of the weight data, showing the values related to the convolution of a node.
A schematic illustration of updating the state value of a node.
Details of the first convolution layer, first pooling layer, second convolution layer, and second pooling layer of the CNN on a graph.
An example of graph data input to the CNN on a graph.
An example of the convolution operation of the first convolution layer.
An example of the attributes of graph data.
A part of the weight data of this example.
Updating of the state value of an edge by the first convolution layer.
A schematic illustration of the pooling processing by the first pooling layer.
An example of a Laplacian matrix applicable to graph data having the graph structure shown in FIG. 15.
A schematic illustration of the convolution operation for a node by the second convolution layer.
A flowchart explaining the learning process executed by a computer.
An example of a computer system that executes a distributed CNN.
An example of the layers and columns of a distributed CNN.
A conceptual configuration example of a distributed CNN.
The propagation rule of a node that applies a function f.
The propagation rule of an addition node.
The propagation rule of a multiplication node.
The propagation rule of a variable replication node (branch node).
A configuration example of a distributed CNN and its processing.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. Note that the embodiments are merely examples for realizing the present invention and do not limit its technical scope. Components common to the drawings are denoted by the same reference numerals.

[System configuration]
FIG. 1 shows an example of the configuration of a computer system. As shown in FIG. 1, the computer system 100 includes a plurality of computers 101 and a storage system 102, which are connected via a network 103 so that they can communicate with each other. The computer system 100 of the present embodiment includes three computers 101. In FIG. 1, only one of the computers is labeled with reference numeral 101 as an example. The numbers of computers and storage systems are arbitrary. The computer system 100 may consist of a single computer 101.

The network 103 may be, for example, a WAN (Wide Area Network), a LAN (Local Area Network), or a SAN (Storage Area Network). The type of the network 103 is not limited. The network connecting each computer 101 to the storage system 102 and the network connecting the computers 101 to one another may be different networks.

The computer 101 includes a processor 110, a memory 111, a communication interface 112, and an input/output interface 118, which are connected to each other via a bus 114. An input/output device 119 is connected to other devices via the input/output interface 118. The processor 110 includes one or more CPUs 115 that execute arithmetic processing. In FIG. 1, only one CPU is labeled with reference numeral 115. The CPU 115 realizes the functions of the computer 101 by executing programs stored in the memory 111. Processing on the computer 101 is executed by one or more CPUs 115, and one CPU 115 may execute a plurality of processes. The CPU 115 may be an arithmetic unit such as an FPGA or a GPU.

The memory 111 stores the programs executed by the processor 110 and the information used by those programs. The memory 111 also includes a memory space allocated to each process executed by the processor 110. This memory space may be secured across the memory areas of a plurality of memories 111 or within the memory area of a single memory 111. The memory 111 may include memory spaces for a plurality of processes. The programs and information stored in the memory 111 are described later.

The communication interface 112 communicates with external devices via the network 103. The processor 110 accesses other devices via the communication interface 112. The input/output interface 118 mediates communication between other devices and the input/output device 119 via the network 103.

The storage system 102 includes a disk interface 133 and a plurality of storage devices 117, which are connected to each other via a bus 135. The disk interface 133 is an interface for connecting to a plurality of storage drives 134. A storage drive 134 is a storage device that stores various data, for example an HDD or an SSD. In FIG. 1, only one storage drive is labeled with reference numeral 134.

The memory 111 stores a program that implements the data processing unit 120. The processor 110 executing the data processing unit 120 performs a learning process by error backpropagation and an analysis process by forward propagation. In the learning process, the processor 110 uses the learning data 122 to determine the weights of the edges in the constructed neural network. In the analysis process, the processor 110 inputs the data to be analyzed into the constructed neural network and performs a predetermined analysis such as classification or regression.

The data processing unit 120 may be composed of a plurality of program modules. For example, the data processing unit 120 may include a construction module that constructs a neural network, a learning module that executes the learning process, and an analysis module that executes the analysis process. The program modules may each be executed by a different computer 101.

[Neural network]
FIG. 2 shows an example of a neural network. The neural network 200 shown in FIG. 2 consists of three layers: an input layer 201, an intermediate layer 202, and an output layer 203. Each layer consists of one or more neurons 211. Each neuron 211 in a layer is connected to at least one neuron 211 in another layer. Specifically, each neuron 211 of the input layer 201 is connected to at least one neuron 211 of the intermediate layer 202, and each neuron 211 of the intermediate layer 202 is connected to at least one neuron 211 of the output layer 203. The input layer 201 side is the preceding-stage side, and the output layer 203 side is the subsequent-stage side.

An edge 212 represents the output of data between neurons 211. In the analysis process, the neural network 200 forward-propagates the input data from the input layer 201 through the intermediate layer 202 to the output layer 203. The value input to each neuron 211 is multiplied by the weight assigned to an edge 212 and output to the neuron 211 of another layer connected by that edge. The input to a neuron 211 may include a bias. A weight represents the strength of the connection between neurons 211 and is determined by the learning process described later. In the learning process, the weights of each layer are updated by backpropagating the error of a loss signal.
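
The forward propagation described above can be sketched as follows. This is an illustrative example only: the layer sizes, weights, and biases are hypothetical, the helper name `forward_layer` is an assumption, and no activation function is applied because the passage above does not mention one.

```python
def forward_layer(inputs, weights, biases):
    """Propagate values across one set of edges.

    weights[j][i] is the weight on the edge from input neuron i to
    output neuron j; biases[j] is added to output neuron j.
    """
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical network: 2 input neurons, 2 intermediate neurons, 1 output neuron.
hidden = forward_layer([1.0, 2.0], [[0.5, -1.0], [0.25, 0.75]], [0.0, 0.5])
output = forward_layer(hidden, [[2.0, 1.0]], [0.0])
print(hidden, output)  # [-1.5, 2.25] [-0.75]
```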

Returning to FIG. 1, the memory 111 stores graph data 121, learning data 122, and weight data 123. The graph data 121 is data having a graph structure composed of nodes corresponding to arbitrary elements and edges connecting the nodes. The learning data 122 is data used in the learning process of the neural network. The weight data 123 is information that manages the weights resulting from the learning process of the neural network.

The graph data 121 and the learning data 122 are stored, for example, in the storage system 102. The processor 110 acquires the graph data 121 and the learning data 122 from the storage system 102 and loads them into the memory 111.

[CNN on a grid]
FIG. 3 shows an example of a convolutional neural network (CNN). The computer system 100 of the present disclosure executes a CNN on a graph. A typical CNN 300 on a grid is first described with reference to FIG. 3. A CNN on a grid processes matrix-like (grid-like) input data such as image data.

In the example of FIG. 3, a 28x28 grayscale image 301 of a handwritten digit is input. The CNN 300 applies 5x5 convolution filters and the ReLU function (302) to generate six 24x24 feature maps 303. The CNN 300 applies 2x2 max pooling to these feature maps (304) to generate feature maps 305 whose image size is halved.

The CNN 300 then applies 5x5 convolution filters and the ReLU function (306) to generate ten 8x8 feature maps 307. The CNN 300 applies 2x2 max pooling to these feature maps (308) to generate feature maps 309 whose image size is halved. A perceptron 310 consisting of two fully connected layers then outputs a 10-dimensional vector. The SoftMax function converts this vector into a one-hot vector representation that expresses the likelihood of the input image being each of the digits 0 to 9.
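
The feature-map sizes quoted for FIG. 3 can be checked with a short calculation. The sketch below assumes convolutions with no padding and stride 1, and non-overlapping pooling windows; these settings are not stated in the text but are consistent with the sizes it gives (28 to 24, and 12 to 8). The function names are illustrative.

```python
def conv_output_size(size, kernel):
    """Side length after a convolution with no padding and stride 1 (assumed)."""
    return size - kernel + 1


def pool_output_size(size, window):
    """Side length after non-overlapping pooling with the given window (assumed)."""
    return size // window


size = 28
trace = [size]
# Two stages of (5x5 convolution, 2x2 max pooling), as in FIG. 3.
for kernel, window in [(5, 2), (5, 2)]:
    size = conv_output_size(size, kernel)
    trace.append(size)
    size = pool_output_size(size, window)
    trace.append(size)

print(trace)  # [28, 24, 12, 8, 4]
```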

[CNN on a graph (forward propagation)]
The CNN on a graph of the present disclosure is described below. The computer system 100 executes the CNN on a graph. FIG. 4 shows a configuration example 400 of the CNN on a graph. FIG. 4 shows forward propagation (the analysis process) in the CNN 400, which is executed by the data processing unit 120. The CNN 400 on a graph processes data on a common graph structure in every layer.

A graph structure consists of a plurality of nodes and a plurality of edges, each connecting two nodes. A grid structure is one kind of graph structure, in which each node is connected by an edge to each of its adjacent nodes. The graph structure in the present disclosure consists of an arbitrary number of nodes and an arbitrary number of edges. Each node is connected to one or more edges.

Graph data is data having a graph structure in which a state value is assigned to each node. A state value can be represented by a vector of one or more dimensions. As shown in FIG. 4, the graph data 121 is input to the CNN 400 on a graph. As described above, each node of the graph data 121 has a state value, and the edges between the nodes are defined.

Like the CNN 300 on a grid described with reference to FIG. 3, the CNN 400 of FIG. 4 includes two convolution layers and two pooling layers. Specifically, the CNN 400 includes a first convolution layer 411, a first pooling layer 412, a second convolution layer 413, a second pooling layer 414, and a global average pooling (GAP) layer 415. The first convolution layer 411 performs a convolution operation on the input graph data 121 on the graph structure of the graph data 121. This generates two feature maps 402.

The first pooling layer 412 performs pooling processing on each of the feature maps 402 on the graph structure of the graph data 121, converting them into feature maps 403 with reduced degrees of freedom of representation. The second convolution layer 413 performs a convolution operation on the feature maps 403 on the graph structure of the graph data 121. This generates three feature maps 404.

The second pooling layer 414 performs pooling processing on each of the feature maps 404 on the graph structure of the graph data 121, converting them into feature maps 405 with reduced degrees of freedom of representation. The feature maps 405 are input to the global average pooling layer 415. The global average pooling layer 415 takes, for each channel, the average of the values of all neurons.
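
As a small illustration of the global average pooling step, the sketch below averages each channel over all nodes. The dictionary layout (channel name mapped to per-node values), the function name, and the numbers are hypothetical; only the per-channel averaging follows the text.

```python
def global_average_pooling(feature_maps):
    """Average each channel over all nodes.

    feature_maps maps a channel name to {node_id: value}.
    """
    return {
        channel: sum(node_values.values()) / len(node_values)
        for channel, node_values in feature_maps.items()
    }


# Hypothetical feature maps with three channels on a four-node graph.
feature_maps = {
    "ch0": {0: 1.0, 1: 3.0, 2: 2.0, 3: 2.0},
    "ch1": {0: 0.0, 1: 0.0, 2: 4.0, 3: 0.0},
    "ch2": {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5},
}
gap = global_average_pooling(feature_maps)
print(gap)  # {'ch0': 2.0, 'ch1': 1.0, 'ch2': 0.5}
```

The output has one value per channel regardless of how many nodes the graph contains.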

The processing of the convolution layers and pooling layers is detailed later; each of these layers updates the state value of every node of the graph data. The graph structure of the data input to each layer is therefore preserved in its output data. Specifically, the input graph data 121 and the feature maps 402 to 405 share the same graph structure. In each layer there is a neuron corresponding to each node, and the neuron outputs the new value of its node. In other examples, the graph structure of the input data of one or more layers may differ from that of the output data.

FIG. 5 shows an example of the graph data 121. The graph data 121 shown in FIG. 5 includes edge structure information 500 and node structure information 510. The edge structure information 500 includes an edge ID 501, a node ID 502, and an edge attribute 503. The edge ID 501 indicates the ID of each edge of the graph data. The node ID 502 indicates the ID of each node of the graph data. The edge attribute 503 indicates the attribute information assigned to each edge and includes information on the direction and type of the edge.

The node structure information 510 includes a node ID 511, an edge ID (OUT) 512, an edge ID (IN) 513, and a node attribute 514. The node ID 511 is the same as the node ID 502. The edge ID (OUT) 512 indicates the IDs of the edges flowing out of the node; the node is the start node of the edges indicated by the edge ID (OUT) 512. The edge ID (IN) 513 indicates the IDs of the edges flowing into the node indicated by the node ID 511; the node is the end node of the edges indicated by the edge ID (IN) 513. The node attribute 514 indicates the attribute information of the node and includes information on the type of the node.
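
One possible in-memory representation of the edge and node structure information is sketched below. The class names, field names, and the decision to store explicit start and end node IDs in each edge record are assumptions made for illustration; the text specifies only which kinds of information (edge ID, node ID, edge attribute, outgoing and incoming edge IDs, node attribute) are held.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeRecord:
    edge_id: int
    source: int      # ID of the start node (assumed field)
    target: int      # ID of the end node (assumed field)
    attribute: str   # edge type, e.g. "R1"


@dataclass
class NodeRecord:
    node_id: int
    attribute: str                                   # node type
    out_edges: list = field(default_factory=list)   # edge ID (OUT)
    in_edges: list = field(default_factory=list)    # edge ID (IN)


def build_node_records(node_attributes, edges):
    """Derive per-node outgoing and incoming edge lists from the edge records."""
    nodes = {nid: NodeRecord(nid, attr) for nid, attr in node_attributes.items()}
    for edge in edges:
        nodes[edge.source].out_edges.append(edge.edge_id)
        nodes[edge.target].in_edges.append(edge.edge_id)
    return nodes


# Hypothetical three-node graph.
edges = [EdgeRecord(0, 1, 2, "R1"), EdgeRecord(1, 3, 1, "R2")]
nodes = build_node_records({1: "station", 2: "park", 3: "station"}, edges)
print(nodes[1].out_edges, nodes[1].in_edges)  # [0] [1]
```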

Many kinds of information can be represented as graph data, for example map information, molecular structure information, social network information, and transportation network information. In a map, for example, locations are represented by nodes and roads by edges. The attribute information of a node representing a piece of land includes its real estate value and its form of use, such as a station or a park. The attribute information of an edge representing a road includes the route name, the road width, the attributes of the land (nodes) it connects, and its direction. An attribute type is defined for each piece of land and each road according to its attribute information. For example, by training the graph CNN with attribute information representing existing real estate values as teacher data, the real estate value of a given piece of land can be estimated.

Convolution operations on a graph fall broadly into two approaches: one using the graph Fourier transform, and a more direct one called the Relational Graph Convolutional Network (R-GCN). The approach using the graph Fourier transform has the advantage of a clear theoretical background, but the disadvantage that it can be applied only to simple undirected graphs without duplicate edges.

The R-GCN, by contrast, is an intuitive extension of the convolution operation on a grid. It can be defined naturally for directed graphs and, more generally, for graphs whose nodes and edges carry attributes. The features of the present disclosure can be applied to either approach; an example using the R-GCN is described below.

[Example of R-GCN]
A known example of the R-GCN is described with reference to FIGS. 6, 7, and 8. The convolution operation updates the state value of a node based on the state values of that node and of the other nodes around it. The following describes an example of the process of updating the state value of one node. FIG. 6 shows one node 600 and the nodes 611 to 614 and 621 to 623 connected to the node 600. The edges connecting the node 600 to each of the nodes 611 to 614 have attribute R1. The edges connecting the node 600 to each of the nodes 621 to 623 have attribute R2.

FIG. 7 shows a part of the weight data 123, namely the values related to the convolution of the node 600. The weight data 123 includes an edge attribute type 601 and a weight 602. The edge attribute type 601 indicates the identifier of the type of an edge attribute. In this example, the edge attribute type 601 differs from the type indicated by the edge attribute 503 of the graph data 121, although the two may be the same. The weight 602 indicates the weight of each edge attribute type. The values indicated by the weight 602 are updated during the learning (training) of the CNN 400. In this example, the bias of the convolution operation on the state value of the node 600 is 0.

As shown in FIGS. 6 and 7, the edge attribute type is defined by the attribute of the edge and the direction of the edge relative to the node 600. Specifically, the edge attribute type of the edges connecting the nodes 612 to 614 to the node 600 is R1 (IN). The edge attribute type of the edge connecting the node 611 to the node 600 is R1 (OUT). The edge attribute type of the edges connecting the nodes 622 and 623 to the node 600 is R2 (IN). The edge attribute type of the edge connecting the node 621 to the node 600 is R2 (OUT). The edge 631 is a self-loop whose start point and end point are both the node 600, and its edge attribute type is SELF.

FIG. 8 schematically shows the update of the state value of the node 600. The convolution operation first calculates the sum of the state values of the nodes that share the same edge attribute type, as follows. The edge attribute type of the nodes 612 to 614 is R1 (IN), so the convolution operation calculates the sum 811 of their state values 801. The edge attribute type of the node 611 is R1 (OUT), so the convolution operation calculates the sum 812 of its state value 802.

The edge attribute type of the nodes 622 and 623 is R2 (IN), so the convolution operation calculates the sum 813 of their state values 803. The edge attribute type of the node 621 is R2 (OUT), so the convolution operation calculates the sum 814 of its state value 804. For the self-loop edge 631, the convolution operation calculates the sum 815 of the state value 805 of the node 600.

Next, the convolution operation calculates the sum of products of the per-type sums and the corresponding weights. The weights of the edge attribute types are given in the weight data 123 of FIG. 7. That is, the convolution operation multiplies the sum for each edge attribute type by the weight indicated by the weight data 123 and adds up the products (821). The convolution operation then applies the ReLU function 822 to this sum of products and outputs the resulting state value 831.
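
The three steps described for FIG. 8 (per-type sums, weighted sum of products, ReLU) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `relu` and `update_node`, the state values, and the weights are all hypothetical, and the bias is 0 as in the text.

```python
def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)


def update_node(neighbor_values, weights):
    """Sum state values per edge attribute type, weight the sums, apply ReLU."""
    # Step 1: one sum per edge attribute type (sums 811 to 815 in FIG. 8).
    sums = {etype: sum(vals) for etype, vals in neighbor_values.items()}
    # Step 2: sum of products with the per-type weights (821); the bias is 0.
    total = sum(weights[etype] * s for etype, s in sums.items())
    # Step 3: ReLU (822) yields the updated state value (831).
    return relu(total)


# Hypothetical state values, grouped by edge attribute type as in FIG. 6.
values = {
    "R1(IN)": [1.0, 2.0, 3.0],  # nodes 612 to 614
    "R1(OUT)": [4.0],           # node 611
    "R2(IN)": [1.0, 1.0],       # nodes 622 and 623
    "R2(OUT)": [2.0],           # node 621
    "SELF": [5.0],              # node 600 via the self-loop 631
}
# Hypothetical weights standing in for the weight data 123.
weights = {"R1(IN)": 0.5, "R1(OUT)": -1.0, "R2(IN)": 0.25,
           "R2(OUT)": 1.0, "SELF": 0.1}
new_state = update_node(values, weights)
```

Because the weights are keyed by edge attribute type rather than by node, the same small weight table applies to any node regardless of how many neighbors it has.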

In the example of the CNN on the graph described with reference to FIGS. 6 to 8, the updated state value of a target node is determined from the state value of the target node and the state values of the nodes adjacent to it. An adjacent node is a node one hop from the target node. The hop count represents the distance between nodes and is the number of edges between them. When a plurality of routes exist from one node to another, a hop count is defined for each route.
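
As an illustration of hop counts, the following sketch collects the nodes within a given number of hops by breadth-first search. It is an assumption-laden simplification: it keeps only the smallest hop count per node, whereas the text defines a hop count for every route. The function name `min_hops` and the small graph are hypothetical.

```python
from collections import deque


def min_hops(adjacency, start, max_hops):
    """Return {node: smallest hop count} for nodes within max_hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the requested range
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Hypothetical undirected graph: a - b - c - d, plus a shortcut a - c.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
one_hop = min_hops(adjacency, "a", 1)
two_hop = min_hops(adjacency, "a", 2)
```

Here node c is reachable from a both by a one-edge route and by the two-edge route through b; the sketch records only the shorter one.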

[Processing of each layer of the CNN on the graph]
FIG. 9 shows details of the first convolution layer 411, the first pooling layer 412, the second convolution layer 413, and the second pooling layer 414 of the CNN 400 on the graph. The first convolution layer 411 calculates the updated state value of each node from the state values of that node and the nodes within one hop, using the ReLU function. The first pooling layer 412 executes pooling processing within one hop. The details of the pooling processing are described later.

The second convolution layer 413 calculates the updated state value of each node from the state values of that node and the nodes within two hops, using the ReLU function. The second pooling layer 414 executes pooling processing within two hops. Instead of pooling within two hops, one-hop pooling may be executed twice.

As shown in FIG. 9, the range of the convolution operation of the second convolution layer 413 is wider than that of the first convolution layer 411. In addition, whereas the first pooling layer 412 executes pooling processing within one hop, the second pooling layer 414 executes pooling processing within two hops. That is, the range of the pooling processing of the second pooling layer 414 is wider than that of the first pooling layer 412.

As described with reference to FIG. 3, the conventional CNN 300 on a grid reduces the image size by executing pooling processing (usually maximum value pooling) after the convolution operation (convolution plus application of a non-linear activation function). In contrast, the pooling layers 412 and 414 of the CNN 400 on the graph in this example execute pooling processing without reducing the graph structure. As described later, the pooling processing in the CNN 400 on the graph keeps the size of the graph constant while introducing a regularization that brings the values of neighboring nodes closer together, and thereby effectively performs pooling.

Because the pooling layer brings the values of neighboring nodes closer together, the second convolution layer 413 immediately after the first pooling layer 412 looks at a wider range, that is, it enlarges the kernel size of the convolution filter. This makes the convolution operation after the pooling processing more effective. Because the kernel size of the convolution filter of the second convolution layer 413 has been enlarged, the pooling layer 414 after the second convolution layer 413 widens its pooling range, which makes the pooling processing more effective.

In the example shown in FIG. 9, the range of the first convolution layer 411 is within one hop, but it may be wider. The relationship between the two ranges is not limited as long as the range of the second convolution layer 413 is wider than that of the first convolution layer 411. For example, the range of the second convolution layer 413 may be three hops or more.

The CNN 400 on the graph can include three or more convolution layers and three or more pooling layers. The range (hop count) of the convolution operation of a convolution layer is, for example, a constant multiple of that of the immediately preceding convolution layer. With a multiplier of 2, the range of the convolution operation increases as 1 hop, 2 hops, 4 hops, and 8 hops. With a multiplier of 1.5, the range increases as 1 hop, 2 hops, 3 hops, 4 hops, and 6 hops. The range (hop count) of the pooling processing of a pooling layer is, for example, set equal to the hop count of the immediately preceding convolution layer.
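
The hop schedules quoted above can be reproduced with a short sketch. The text does not say how fractional hop counts are rounded; rounding to the nearest integer with exact halves going to the even neighbor (the behavior of Python's built-in `round`) is an assumption that happens to reproduce the listed 1.5-times sequence. The function name `hop_schedule` is hypothetical.

```python
def hop_schedule(num_layers, factor):
    """Hop range of each convolution layer, starting from 1 hop."""
    hops = [1]
    for _ in range(num_layers - 1):
        # Assumption: round to the nearest integer. Python's round() sends
        # exact halves to the even neighbor (1.5 -> 2, 4.5 -> 4).
        hops.append(round(hops[-1] * factor))
    return hops


doubling = hop_schedule(4, 2)        # [1, 2, 4, 8]
one_and_half = hop_schedule(5, 1.5)  # [1, 2, 3, 4, 6]
# Pooling ranges follow the hop count of the immediately preceding convolution layer.
pooling_ranges = list(doubling)
```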

An example of the processing of the CNN 400 on the graph is described below. FIG. 10 shows an example of the graph data 121 input to the CNN 400 on the graph. The graph data 121 includes nodes N1 to N13 and a plurality of edges, each connecting two nodes. Although not shown explicitly in FIG. 10, a state value is defined for each of the nodes N1 to N13, and an edge attribute type is defined for each edge.

[First convolution layer]
FIG. 11 shows an example of the convolution operation of the first convolution layer 411, namely the update of the state value of the node N1. The first convolution layer 411 performs the convolution operation within one hop. The nodes within one hop of the node N1, that is, the nodes connected to N1 by a single edge, are the nodes N2, N6, N9, N10, and N12. The first convolution layer 411 collects the state values of the nodes N2, N6, N9, N10, and N12 and calculates the updated state value of the node N1 from those state values and the state value of N1.

FIG. 12 shows an example of the attributes of the graph data 121. An edge attribute type is defined for each edge of the graph data and is denoted by a code consisting of R and a number. For example, the edge attribute type of the edge between the node N1 and the node N12 is R1, that of the edge between N1 and N2 is R2, and that of the edge between N1 and N6 is R3. Edges of type R1 are drawn as solid lines, edges of type R2 as broken lines, and edges of type R3 as two-dot chain lines.

FIG. 13 shows a part of the weight data 123 of this example. The weight data 123 defines the weight W0 of the current state value of the target node, the weights of the one-hop edge attribute types, and the weights of the two-hop edge attribute types. The one-hop edge attribute types coincide with the edge attribute types R1, R2, and R3 of the individual edges. A two-hop edge attribute type is written with two Rs and two numbers and denotes a pair of edge attribute types of two edges.

More specifically, a two-hop edge attribute type gives the edge attribute type of the edge nearer the target node followed by that of the edge farther from it. For example, the edge attribute type R1R2 means that the edge between the target node and its adjacent node has type R1, and the edge between that adjacent node and the node two hops away has type R2.

FIG. 14 shows the update of the state value of the node N1 by the first convolution layer 411. The first convolution layer 411 acquires the state value V1 of the target node N1 and the state values V2, V6, V9, V10, and V12 of the nodes within one hop of N1, that is, the adjacent nodes N2, N6, N9, N10, and N12. The edge attribute types of the edges between the target node N1 and the adjacent nodes N2, N6, N9, N10, and N12 are R2, R3, R1, R2, and R1, respectively.

As the weight data 123 in FIG. 13 shows, the weight of the node's own state value is W0, and the weights of the edge attribute types R1, R2, and R3 are W1, W2, and W3, respectively. In this example, the convolution operation multiplies each node's state value by the weight for the edge attribute type of its edge and adds up the products over all edge attribute types. That is, the convolution operation for the node N1 is expressed by the following equation, with the bias set to 0.
V1' = W0 * V1 + W1 * (V9 + V12) + W2 * (V2 + V10) + W3 * V6

The first convolution layer 411 inputs the result V1' of the convolution operation to ReLU, a non-linear function, and holds the output as the new state value V1 of the node N1. A non-linear function other than ReLU (for example, a sigmoid function) may be used. The first convolution layer 411 performs this processing for all nodes.
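
Applying the one-hop convolution to every node at once can be sketched as follows. The edge types of the edges incident to N1 follow the text; the state values, the weights, and the function name `conv_layer_1hop` are hypothetical, and treating edges as undirected is an assumption made for brevity. Every node is updated from the pre-update values, so the order in which nodes are visited does not matter.

```python
def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)


def conv_layer_1hop(values, edges, w_self, w_type):
    """Apply the one-hop convolution to every node.

    values: {node: state value}
    edges:  list of (node_a, node_b, edge_type), treated as undirected
    Returns a new dict; the input values are left unchanged.
    """
    totals = {node: w_self * v for node, v in values.items()}
    for a, b, edge_type in edges:
        totals[a] += w_type[edge_type] * values[b]
        totals[b] += w_type[edge_type] * values[a]
    return {node: relu(t) for node, t in totals.items()}


# Edges incident to N1 follow the text; the numbers are made up.
values = {"N1": 1.0, "N2": 2.0, "N6": -3.0, "N9": 0.5, "N10": 1.0, "N12": 1.5}
edges = [("N1", "N2", "R2"), ("N1", "N6", "R3"), ("N1", "N9", "R1"),
         ("N1", "N10", "R2"), ("N1", "N12", "R1")]
updated = conv_layer_1hop(values, edges, w_self=1.0,
                          w_type={"R1": 2.0, "R2": 0.5, "R3": 1.0})
```

With these numbers the value for N1 equals W0 * V1 + W1 * (V9 + V12) + W2 * (V2 + V10) + W3 * V6 = 1 + 4 + 1.5 - 3 = 3.5, and the negative total for N6 is clipped to 0 by ReLU.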

[First pooling layer]
Next, the processing of the first pooling layer 412 is described. A conventional CNN on a grid generally reduces the image size by pooling processing after the convolution operation (plus application of a non-linear activation function). The pooling layers of the CNN 400 on the graph of the present disclosure execute processing equivalent to the conventional pooling processing, that is, pooling processing on the graph, without changing the graph structure of the input graph data 121.

The pooling processing brings the values of neighboring nodes closer together and thereby reduces the degrees of freedom of the representation. Several examples of pooling processing on a graph are described below. Several pooling methods are known, of which average value pooling and maximum value pooling are commonly used. Average value pooling and maximum value pooling on a graph are therefore described below.

Like the convolution operation, the pooling processing updates the state value of each node while maintaining the graph structure. The pooling processing determines a new state value of the target node from the state values of the target node and its neighboring nodes. FIG. 15 schematically shows the pooling processing by the first pooling layer 412, focusing on the two nodes N1 and N2. The first pooling layer 412 updates the state value of the node N1 so that it approaches the state values of the adjacent nodes N2, N6, N9, N10, and N12. Similarly, the first pooling layer 412 updates the state value of the node N2 so that it approaches the state values of the adjacent nodes N1, N3, N4, and N5.

An example of average value pooling on the graph determines the new state value of the target node from the state values of the target node and its adjacent nodes, such that the new value becomes the average over the target node and its adjacent nodes. For example, the first pooling layer 412 acquires the current (pre-update) state values of the node N1 and its adjacent nodes N2, N6, N9, N10, and N12 and calculates their average. The first pooling layer 412 takes this average as the new state value of the node N1. The first pooling layer 412 may instead calculate the average of node state values weighted according to the edge attribute types.
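
The simple averaging variant described above can be sketched as follows, ignoring edge attribute types. The adjacency of N1 and N2 follows the text; the state values and the function name `average_pool` are hypothetical.

```python
def average_pool(values, adjacency):
    """Replace each listed node's value by the mean over itself and its neighbors.

    All means are computed from the pre-update values.
    """
    pooled = {}
    for node, neighbors in adjacency.items():
        group = [values[node]] + [values[n] for n in neighbors]
        pooled[node] = sum(group) / len(group)
    return pooled


# Adjacency of the two nodes highlighted in FIG. 15 (from the text).
adjacency = {
    "N1": ["N2", "N6", "N9", "N10", "N12"],
    "N2": ["N1", "N3", "N4", "N5"],
}
# Hypothetical pre-update state values.
values = {"N1": 6.0, "N2": 0.0, "N3": 3.0, "N4": 3.0, "N5": 3.0,
          "N6": 0.0, "N9": 0.0, "N10": 0.0, "N12": 0.0}
pooled = average_pool(values, adjacency)
```

The graph keeps the same nodes and edges; only the state values move toward those of their neighbors, which is the sense in which the pooling does not shrink the graph.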

Another example of average value pooling on the graph determines the new state value of the target node according to the following formula ([Equation 1]). N denotes the set of nodes of the graph data 121, and E denotes the set of edges. p_i is the output of the pooling processing for node i, and v_i is its input, that is, the output (state value) of the immediately preceding convolution operation. β is a hyperparameter, I is the identity matrix, and L is the Laplacian matrix. D is the total number of edges in the graph data 121. The second term on the right-hand side is divided by D to equalize the strength of the constraint per edge, but a configuration that does not divide by D (D = 1) is also conceivable.

The output p_i of the first pooling layer 412 for node i is kept close to the input v_i by the first term and close to the output values p_j of the adjacent nodes by the second term. As β increases, the contribution of the second term increases, so values are brought closer together over a wider range of the graph. In other words, β is a parameter corresponding to the kernel size of the pooling processing. The above formula allows appropriate average value pooling to be performed in a more general way.

FIG. 16 shows an example of a Laplacian matrix L applicable to the graph data 121 having the graph structure shown in FIG. 15. The above pooling formula and the Laplacian matrix L shown in FIG. 16 are used when all edges are pooled uniformly, ignoring the edge attribute types. The pooling processing may be executed in this way, ignoring the edge attribute types.
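
The pooling formula itself is not reproduced in this text. One form consistent with the definitions given (a first term tying p_i to v_i, a second term tying p_i to neighboring p_j, scaled by β / D, and the matrices I and L) is to minimize Σ_i (p_i − v_i)² + (β / D) Σ_{(i,j)∈E} (p_i − p_j)², whose minimizer satisfies (I + (β / D) L) p = v. The sketch below solves that linear system; this closed form is an assumption rather than the patent's exact expression, and the function names and the three-node graph are hypothetical.

```python
def laplacian(num_nodes, edges):
    """Graph Laplacian L = degree matrix - adjacency matrix (edge types ignored)."""
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def smooth_pool(values, edges, beta):
    """Assumed form: solve (I + (beta / D) L) p = v, with D the number of edges."""
    n, D = len(values), len(edges)
    L = laplacian(n, edges)
    A = [[(1.0 if i == j else 0.0) + beta / D * L[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, values)


# Hypothetical 3-node path graph 0 - 1 - 2 with a spike on node 0.
pooled = smooth_pool([3.0, 0.0, 0.0], [(0, 1), (1, 2)], beta=2.0)
```

Under this assumed form the spike on node 0 spreads to nodes 1 and 2 while the sum of the values is preserved, and a larger β spreads it further, matching the description of β as a pooling kernel size.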

Alternatively, the pooling processing may be executed based on the edge attribute types, which allows more appropriate pooling processing. In that case, the first pooling layer 412 executes pooling processing for each edge attribute type. Specifically, an example of pooling processing based on the edge attribute types follows the formula below ([Equation 2]).

E_1, E_2, and E_3 denote the sets of edges of edge attribute types R1, R2, and R3, respectively. β_1, β_2, and β_3 are the hyperparameters (pooling kernel sizes) for edge attribute types R1, R2, and R3, respectively. L_1, L_2, and L_3 are the Laplacian matrices for pooling over edge attribute types R1, R2, and R3, respectively.

Next, examples of maximum value pooling on the graph are described. One example selects the maximum of the state values of the nodes within the pooling range. For example, the first pooling layer 412 acquires the current (pre-update) state values of the node N2 and its adjacent nodes N1, N3, N4, and N5 and selects the maximum among them. The first pooling layer 412 takes this maximum as the new state value of the node N2. The first pooling layer 412 may select the node that supplies the updated value based on weights according to the edge attribute types.

Another example of maximum value pooling on the graph applies a smooth maximum (SmoothMaximum) to the input values (the current state values of the nodes) and then performs average value pooling. Several functions are known as smooth maxima; for example, LogMeanExp can be used. This example determines the new state value of the target node according to the following formula ([Equation 3]).

ρ is a parameter representing the strength of the smooth maximum; in the limit ρ → ∞, LogMeanExp coincides with the maximum function. The other variables are as described for average value pooling. The maximum value pooling expressed by [Equation 3] is used when all edges are pooled uniformly, ignoring the edge attribute types. The pooling processing may be executed in this way, ignoring the edge attribute types.
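
The behavior of LogMeanExp as ρ grows can be checked with a short sketch. The definition used here, (1 / ρ) · log(mean(exp(ρ · x))), is the standard one; how the patent's formula combines it with average value pooling is not reproduced in this text and is not modeled. The function name `log_mean_exp` and the sample values are hypothetical.

```python
import math


def log_mean_exp(values, rho):
    """Smooth maximum: (1 / rho) * log(mean(exp(rho * x)))."""
    m = max(values)
    # Subtract the maximum before exponentiating to avoid overflow.
    mean_exp = sum(math.exp(rho * (x - m)) for x in values) / len(values)
    return m + math.log(mean_exp) / rho


group = [1.0, 2.0, 5.0]  # hypothetical state values in one pooling range
soft = log_mean_exp(group, rho=1.0)
sharp = log_mean_exp(group, rho=100.0)
```

For small ρ the result lies between the mean and the maximum of the group; as ρ increases it approaches the maximum from below, which is the limit stated above.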

As in the description of average value pooling, the first pooling layer 412 may execute the pooling processing based on the edge attribute types. In that case, the first pooling layer 412 uses [Equation 2] instead of [Equation 1] and executes pooling processing for each edge attribute type. This allows more appropriate pooling processing.

[Second convolution layer]
Next, the processing of the second convolution layer 413 is described. As described above, the first pooling layer 412 keeps the size of the graph structure constant while introducing a regularization that brings the values of neighboring nodes closer together, and thereby reduces the degrees of freedom of the data representation. Because the first pooling layer 412 brings the values of neighboring nodes closer together, the second convolution layer 413 refers to a correspondingly wider range, that is, it enlarges the kernel size of the convolution filter (the range of nodes referred to in the convolution operation).

FIG. 17 schematically shows the convolution operation for the node N1 by the second convolution layer 413. The same convolution operation is executed for all nodes. The second convolution layer 413 acquires the current (pre-update) state values of the nodes within the two-hop range of the node N1. The nodes within the two-hop range consist of the nodes one hop and the nodes two hops from the target node. Instead of using the nodes within the two-hop range, that is, both the one-hop and two-hop nodes, a configuration that uses only the two-hop nodes is also conceivable.

In the example shown in FIG. 17, the nodes N2, N6, N9, N10, and N12 are one hop from the node N1. The nodes N3 to N5 and N7 to N13 are two hops from the node N1. The nodes N9, N10, and N12 are both one hop and two hops from the node N1. The nodes N9, N10, and N12 may instead be defined only as one-hop nodes and excluded from the two-hop nodes of N1.

The second convolution layer 413 calculates the sum of products of the state value of the target node N1, the state values of the nodes one hop from N1, and the state values of the nodes two hops from N1 with the weights defined for each edge attribute type in the weight data 123. As described with reference to FIG. 13, the weight data 123 defines the weight of the node itself, the weights of the edge attribute types of single edges, and the weights of the edge attribute types of pairs of edges. Although not shown, the weight data 123 further defines weights for the edge attribute types of combinations of three or more edges. As in the first convolution layer 411, the state value of the target node obtained by the convolution operation is input to ReLU, and the output is held as the new state value of the target node. The convolution operation of the second convolution layer 413 may include a bias.
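
The two-hop convolution can be sketched by enumerating one-edge and two-edge routes from the target node and looking up a weight by edge attribute type or by pair of types. The graph below is a hypothetical fragment loosely modeled on the routes from N1 to N3 mentioned in the text; the state values, weights, and the function name `conv_2hop` are made up, and skipping routes that return to the target node is an assumption.

```python
def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)


def conv_2hop(target, values, typed_adj, w_self, w1, w2):
    """Two-hop convolution for one target node.

    typed_adj: {node: [(neighbor, edge_type), ...]}
    w1: weight per single edge attribute type
    w2: weight per pair (type of the edge near the target, type of the far edge)
    """
    total = w_self * values[target]
    for mid, near_type in typed_adj[target]:
        total += w1[near_type] * values[mid]  # one-hop route
        for far, far_type in typed_adj[mid]:
            if far == target:
                continue  # assumption: skip routes that return to the target
            total += w2[(near_type, far_type)] * values[far]  # two-hop route
    return relu(total)


# Hypothetical fragment: N3 is reached from N1 through N2, N10, and N12.
typed_adj = {
    "N1": [("N2", "R2"), ("N10", "R2"), ("N12", "R1")],
    "N2": [("N1", "R2"), ("N3", "R3")],
    "N10": [("N1", "R2"), ("N3", "R3")],
    "N12": [("N1", "R1"), ("N3", "R2")],
    "N3": [("N2", "R3"), ("N10", "R3"), ("N12", "R2")],
}
values = {"N1": 1.0, "N2": 1.0, "N10": 1.0, "N12": 1.0, "N3": 2.0}
w1 = {"R1": 1.0, "R2": 0.5}
w2 = {("R2", "R3"): 0.25, ("R1", "R2"): 0.5}
v1_new = conv_2hop("N1", values, typed_adj, 1.0, w1, w2)
```

Because routes rather than nodes are enumerated, N3 contributes three times here, once per route, each time with the weight of that route's pair of edge attribute types.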

[Second pooling layer]
As shown in FIG. 9, the second pooling layer 414 executes pooling processing within a two-hop range. For example, the second pooling layer 414 executes the pooling processing of the first pooling layer 412 twice. This allows pooling processing that matches the processing range of the second convolution layer 413.

[Global average pooling layer]
Next, the processing of the global average pooling layer 415 is described. The CNN 400 on the graph handles graph data with various graph structures that differ in the number of nodes and edges. The fully connected layer of a conventional CNN assumes that the number of neurons in its input layer is fixed, so it cannot be applied as is to the CNN 400 on the graph. The CNN 400 on the graph of the present disclosure therefore uses global average pooling. When the graph structure of the input graph data is fixed, a fully connected layer, which gathers the values of all neurons in one place and computes the final output by an affine operation, may be used instead of the global average pooling layer.

The global average pooling layer 415 takes, for each channel of its input, the average of the values of all neurons. The output of the global average pooling layer 415 is therefore a vector whose dimension equals the number of input channels. For example, an affine transformation layer (fully connected layer) is placed after the global average pooling layer 415 to obtain an output of the finally required dimension. Although the global average pooling layer 415 has far fewer parameters than a fully connected layer, it can achieve performance equal to or better than that of a fully connected layer.
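
The per-channel averaging can be sketched as follows; the point is that the output length depends only on the number of channels, not on the number of nodes. The function name `global_average_pool` and the feature values are hypothetical.

```python
def global_average_pool(node_features):
    """Average each channel over all nodes.

    node_features: one list of channel values per node.
    Returns one value per channel.
    """
    num_nodes = len(node_features)
    num_channels = len(node_features[0])
    return [sum(node[c] for node in node_features) / num_nodes
            for c in range(num_channels)]


# Two hypothetical graphs with different node counts but two channels each.
small_graph = [[1.0, 10.0], [3.0, 20.0]]
large_graph = [[2.0, 0.0], [4.0, 30.0], [6.0, 30.0], [0.0, 0.0]]
out_small = global_average_pool(small_graph)  # [2.0, 15.0]
out_large = global_average_pool(large_graph)  # [3.0, 15.0]
```

Both outputs have length 2, so the same affine layer can follow regardless of graph size.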

[Learning (training) by error backpropagation]
Learning of the parameters of the CNN 400 on the graph by error backpropagation is described below. The learning processing is executed by the data processing unit 120. Error backpropagation traverses the nodes of the graph in the reverse order of forward propagation. As described above, the second convolution layer 413 enlarges the kernel size of the convolution filter. However, because the first pooling layer 412 has brought the values of neighboring nodes closer together, simply enlarging the kernel size may make the degrees of freedom of parameter setting too large and cause overfitting, or may make the problem ill-posed so that the parameters cannot be learned at all.

Therefore, for the second convolution layer 413 after pooling, the kernel size of the convolution filter is enlarged and, at the same time, a regularization is applied that brings neighboring weights on the convolution filter closer together, in the same manner as in the immediately preceding first pooling layer 412. This aligns the degrees of freedom of the first pooling layer 412 and the immediately following second convolution layer 413, so that a CNN on a graph for so-called deep learning, in which many sets of a convolution operation and pooling processing are stacked, can be configured more appropriately.

Here, in learning (error backpropagation) of the parameters of the CNN 400 on the graph, the plurality of weights in a convolution layer are optimized by an expression that includes the same regularization term as the pooling layer one stage before. More specifically, the regularization terms of the convolution layers are included in the error function E(w) of the entire CNN 400.

The above equation shows the error function E(w) for a configuration including three or more convolution layers. E_0(w) is the ordinary error function, which does not include the regularization terms of the convolution layers. The second term on the right-hand side is the regularization term of the second convolution layer, and the third term is the regularization term of the third convolution layer. w_{2,i} and w_{2,j} refer to nodes that are adjacent to each other (joined by one edge) among the nodes two hops away from a given node. w_{3,i} and w_{3,j} refer to nodes that are adjacent to each other among the nodes three hops away from a given node. β is the same as the β used in the pooling layer. The error function E(w) includes a regularization term for each convolution layer from the second convolution layer onward.

As described above, the kernel size of the convolution filter increases from the second convolution layer to the third convolution layer and on toward later stages. By increasing the order of β in the second and third terms on the right-hand side of the above equation so that the effect of the regularization term grows, the effective degrees of freedom of parameter setting in each convolution layer are kept constant.

As described with reference to FIG. 13, the weight data 123 defines the weight of the own node, the weights of the edge attribute types for a single edge, the weights of the edge attribute types for a pair of two edges, and the weights of the edge attribute types for a set of three or more edges. In this case, for example, the second term on the right side, that is, the regularization term of the second convolutional layer, is the sum of the squared differences between the weights of adjacent nodes among the nodes two hops away, taken over every node.

The second term on the right side is explained using node N1 as an example. The nodes two hops from node N1 are nodes N3 to N5 and N7 to N13. Each of the weights of these nodes is substituted for w_{2,i}. All the weights of all nodes adjacent to the node of w_{2,i} are substituted for w_{2,j}. The calculation of the second term for node N1 (with β/2 omitted) is expressed as follows.

Take node N3 as an example. Viewed from node N1, the weights of the edge attribute types for the pair of two edges leading to node N3 are W12, W23 (via node N2), and W23 (via node N10). Therefore, w_{2,i} for node N3 takes the values W12, W23, and W23. Among the nodes two hops from node N1, the nodes adjacent to node N3 are nodes N4, N10, and N12. Therefore, the weights of these nodes as viewed from node N1 are each substituted for w_{2,j}.

For example, viewed from node N1, the weight of the edge attribute types for the pair of two edges leading to node N4 is W22. Viewed from node N1, the weights of the edge attribute types for the pair of two edges leading to node N10 are W11 and W12. Viewed from node N1, the weight of the edge attribute types for the pair of two edges leading to node N12 is W22.

For node N1, the second term applies the same processing as described above for node N3 to nodes N4, N5, and N7 to N13 as well. The same calculation is then performed for every node other than node N1. The third term is calculated in the same way as the second term, over the nodes three hops away from each node.

In another example, the regularization term may be calculated on the basis of the relationship between edge attribute types instead of the graph structure. For example, the second term may be calculated so that weights that share the same edge attribute type for the edge closer to the target node are drawn toward one another. For example, the second term is defined so that the weights within each of the weight groups (W11, W12, W13), (W21, W22, W23), and (W31, W32, W33) approach one another. The third and subsequent terms are defined in the same way according to the edge attribute types for three or more hops.

The gradient value of the second convolutional layer is obtained by differentiating the error function E(w) with respect to w, as in the following equation.

The first term on the right side of [Equation 6] is the error signal from the upper layer (ordinary error backpropagation). The second term on the right side is the derivative of the regularization term. The gradient values of the other convolutional layers can be expressed by similar equations. The learning process updates the weights w of the convolutional layer according to this gradient value, using a known method such as SGD (stochastic gradient descent) or Adam. In this way, learning by error backpropagation updates the weights w using, as the gradient, the ordinary error signal backpropagated from the upper layer plus the derivative of the regularization term (βLw_{,n}). This imposes on the convolutional layer (filter) a constraint that corresponds to the immediately preceding pooling.
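The gradient described above, an ordinary backpropagated error signal plus a β-weighted Laplacian term, can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure. It assumes that L is the graph Laplacian of the adjacency relation between weights, that each adjacent pair is counted once in the regularization term, and that plain SGD is used. The function names `laplacian`, `regularizer`, `regularized_gradient`, and `sgd_step` are hypothetical.

```python
def laplacian(num_nodes, edges):
    """Build the graph Laplacian L = D - A for an undirected edge list."""
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def regularizer(w, edges, beta):
    """(beta / 2) * sum over adjacent pairs of (w_i - w_j)^2."""
    return 0.5 * beta * sum((w[i] - w[j]) ** 2 for i, j in edges)


def regularized_gradient(grad_e0, w, edges, beta):
    """Ordinary error signal plus the derivative of the regularizer, beta * L * w."""
    n = len(w)
    L = laplacian(n, edges)
    lw = [sum(L[i][k] * w[k] for k in range(n)) for i in range(n)]
    return [g + beta * x for g, x in zip(grad_e0, lw)]


def sgd_step(w, grad, learning_rate):
    """One plain SGD update of the weights."""
    return [wi - learning_rate * gi for wi, gi in zip(w, grad)]
```

With the pairwise sum counted once per edge, differentiating (β/2)·Σ(w_i − w_j)² with respect to w_i gives β·Σ_j(w_i − w_j), which is the i-th entry of βLw. Under a different counting convention the factor would change accordingly.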

次に、学習処理の全体について説明する。図１８は、計算機１０１が実行する学習処理の一つの例を説明するフローチャートである。プロセッサ１１０は、ニューラルネットワークの構築処理を実行する（Ｓ２０１）。ニューラルネットワークの構築処理は後述する。 Next, the entire learning process will be described. FIG. 18 is a flowchart illustrating an example of a learning process performed by the computer 101. The processor 110 executes a neural network construction process (S201). The construction process of the neural network will be described later.

Next, the processor 110 acquires the learning data 122 from the storage system 102 and stores it in the memory 111 (S202). The learning data 122 includes a plurality of samples. Next, the processor 110 starts the loop of the error backpropagation processing (S203). In this loop, the processor 110 repeatedly executes the processing from step S203 to step S207. The loop ends, for example, when the error calculated by the error backpropagation processing executed within the loop falls to or below a preset threshold, or when the error backpropagation processing has been executed a predetermined number of times.

The processor 110 starts the sample loop (step S204). In the sample loop, the processing inside the loop is executed for each of the plurality of samples acquired in step S202. The processor 110 then executes the error backpropagation processing on one sample (S205).

In the error backpropagation processing, the processor 110 takes a sample as input, compares the output of the CNN 400 on the graph for that input with the teacher data corresponding to the sample, and updates the weights in order from the upper layer of the CNN 400 so as to reduce the error between the two.

In the sample loop, the processor 110 executes the processing described above on each of the plurality of samples. In the error backpropagation processing of step S205, the weights are updated only once, for a given sample. In the loop of the error backpropagation processing, by contrast, the error backpropagation processing is executed multiple times in order to minimize the error. Specifically, the processor 110 repeatedly executes the error backpropagation processing on each of the plurality of samples until a predetermined condition is satisfied. The sample loop, in turn, updates the weights for the plurality of samples.

次に、プロセッサ１１０は、全てのサンプルデータに対して誤差逆伝播処理が実行されたか否かを判定する（Ｓ２０６）。全てのサンプルデータに対して誤差逆伝播処理が実行されていないと判定された場合、プロセッサ１１０は、ステップＳ２０４に戻り、同様の処理を実行する。全てのサンプルデータに対して誤差逆伝播処理が実行されたと判定された場合、プロセッサ１１０は、所定の条件を満たしたか否かを判定する（Ｓ２０７）。 Next, the processor 110 determines whether or not the backpropagation processing has been performed on all the sample data (S206). When it is determined that the backpropagation processing has not been performed on all the sample data, the processor 110 returns to step S204 and performs the same processing. If it is determined that the backpropagation processing has been performed on all the sample data, the processor 110 determines whether a predetermined condition has been satisfied (S207).

所定の条件を満たしていないと判定された場合、プロセッサ１１０は、ステップＳ２０３に戻り、同様の処理を実行する。所定の条件を満たしたと判定された場合、プロセッサ１１０は、学習結果をメモリ１１１又はストレージシステム１０２に格納する（ステップＳ２０８）。学習結果は、構築されたニューラルネットワークの構造を示す情報及び学習により得られた重みの情報等が含まれる。重みは、エッジ属性種類それぞれに対して得られ、分析処理において、重みデータ１２３として使用される。 When it is determined that the predetermined condition is not satisfied, the processor 110 returns to step S203 and performs the same processing. When it is determined that the predetermined condition is satisfied, the processor 110 stores the learning result in the memory 111 or the storage system 102 (Step S208). The learning result includes information indicating the structure of the constructed neural network, weight information obtained by learning, and the like. The weight is obtained for each edge attribute type, and is used as the weight data 123 in the analysis processing.
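The nested loops of steps S203 to S208 can be sketched as follows. This is an illustrative skeleton under stated assumptions, not the processing of the processor 110 itself: `backprop_step` and `error_fn` are hypothetical callables supplied by the caller, the termination test uses the summed error over all samples, and `error_threshold` and `max_epochs` are example values for the "preset threshold" and the "predetermined number of times".

```python
def train(samples, weights, backprop_step, error_fn,
          error_threshold=1e-3, max_epochs=100):
    """Outer loop (S203 to S207) around the per-sample loop (S204 to S206)."""
    for _ in range(max_epochs):                      # S203: at most max_epochs passes
        for sample in samples:                       # S204: start of the sample loop
            weights = backprop_step(weights, sample) # S205: one update per sample
        # S206: the inner for-loop ends once every sample has been processed.
        total_error = sum(error_fn(weights, s) for s in samples)
        if total_error <= error_threshold:           # S207: predetermined condition
            break
    return weights                                   # S208: the caller stores the result
```

As a usage example, a one-parameter model y = a·x with squared error can be trained by passing a `backprop_step` that applies `a -= lr * 2 * (a * x - t) * x` for each sample `(x, t)`.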

[Construction of CNN on graph]
The CNN 400 on the graph handles graph data with various graph structures that differ in the number of nodes and edges. The data processing unit 120 constructs the CNN 400 according to the graph data 121 to be analyzed. The data processing unit 120 constructs each of the convolutional layers and pooling layers according to the graph structure of the graph data 121. The data processing unit 120 acquires the weight for each edge attribute type from the learning result and forms the weight data 123.

As described above, according to this embodiment, performing pooling in the CNN on the graph makes the representable degree of freedom (the degree of freedom of the parameters) smaller toward the upper layers, which reduces the likelihood of overfitting and of ill-posed problems when the parameters are learned. It also becomes possible to configure a CNN on a graph that performs so-called deep learning by increasing the number of CNN layers.

The distributed CNN is described below. In the distributed CNN, a plurality of subsystems (for example, computers) connected via a network execute the CNN. Each subsystem can communicate (exchange information) only with the subsystems adjacent to it on the network. The network structure connecting the subsystems need not be fixed and may change over time. In the distributed CNN, each subsystem, for example, independently estimates the value of a system state function. Through the distributed CNN, the estimates of all subsystems come to agree with the true value of the system state function.

他の例において、各サブシステムがそれぞれの環境入力と内部状態に応じてアクションを取ってもよい。各サブシステムは、全体システムの状態関数の値を最適化するようなアクションを見出すことができる。例えば、ネットワークを介して接続された工場（の計算機）が、それぞれ隣接する工場とのみ通信を行い、全工場の総利益を最小化するような工場の電力使用量を決定する。工場又はその計算機は、サブシステムである。 In another example, each subsystem may take action in response to a respective environmental input and internal state. Each subsystem can find an action that optimizes the value of the state function of the overall system. For example, a factory (computer) connected via a network communicates only with each adjacent factory, and determines the power consumption of the factory so as to minimize the total profit of all factories. The factory or its computer is a subsystem.

In addition, the distributed CNN can be applied to route optimization for cars using inter-vehicle communication. A car or its computer (for example, a car navigation system) is a subsystem. Each car independently determines a route from its current position and its destination. As each car communicates with nearby cars, a large number of cars together constitute one distributed CNN. Each car uses its own route as input and regresses (estimates) the average travel time with the distributed CNN.

Each car independently updates its route (= input) by error backpropagation in the direction that reduces the output value (= average travel time) of the distributed CNN. At the same time, each car independently updates the parameters (weights) by error backpropagation so as to reduce the error between the true average travel time obtained from road traffic information and the output value of the distributed CNN. When a car arrives at its destination, it leaves the distributed CNN.

[System configuration]
FIG. 19 shows an example of a computer system that executes the distributed CNN. The computer system includes a plurality of computers 101A to 101M, which are connected via a network. Adjacent computers that can communicate with each other are connected by an edge. The configuration of each computer is as described in the first embodiment. The network configuration in FIG. 19 is an example; in another example, computers arranged in a grid may be able to communicate only with their adjacent computers.

Rather than handling the neurons that make up the CNN layer by layer (convolutional layers, pooling layers, and so on), the distributed CNN groups the neurons at the same position across multiple layers (hereinafter called a column) and computes them in a single subsystem. As a result, the whole system, made up of many subsystems, operates as a single CNN.

FIG. 20 shows an example of the layers and a column of the distributed CNN. The distributed CNN includes layers 701 to 704, such as convolutional layers and pooling layers. FIG. 20 shows one column 711 as an example. One column is executed by one subsystem. In the computer system of FIG. 19, one of the computers performs the calculations for the neurons of a column.

以上の説明から理解されるように、実施例１において説明したグラフ上のＣＮＮは、分散ＣＮＮに適用することができる。図１９に示す計算機システムのネットワーク構造は、実施例１で説明したグラフデータ１２１のグラフ構造と一致している。計算機１０１Ａ〜１０１Ｍは、それぞれ、ＣＮＮを構成する各層（畳み込み層やプーリング層）において、グラフデータ１２１における対応するノードの演算処理を行う。 As understood from the above description, the CNN on the graph described in the first embodiment can be applied to the distributed CNN. The network structure of the computer system shown in FIG. 19 matches the graph structure of the graph data 121 described in the first embodiment. Each of the computers 101A to 101M performs an arithmetic process on a corresponding node in the graph data 121 in each layer (convolution layer or pooling layer) configuring the CNN.

FIG. 21 shows a conceptual configuration example of the distributed CNN. FIG. 21 shows three computers (subsystems) 101B, 101C, and 101D as an example. The computer 101D is adjacent to the computer 101B and the computer 101C and can communicate with them. The computers 101B, 101C, and 101D execute columns 711B, 711C, and 711D, respectively. The squares that make up each column represent neurons.

Each subsystem holds, as its internal state, a vector that collects the output values of the neurons on the column it is in charge of. Each subsystem takes as input its own internal state vector and the internal state vectors of the adjacent subsystems obtained by communicating with them (the neuron values on the columns at adjacent positions), performs the operation of each layer, and updates its own internal state vector.

In a computer system that executes the distributed CNN, there is no central control system that supervises the entire system, and no global synchronization mechanism for the entire system. Each subsystem communicates with its adjacent subsystems and updates its internal state vector asynchronously with respect to the other subsystems. A subsystem updates its own internal state vector on the basis of the internal state vectors of other subsystems and its own internal state vector.

Communication with adjacent subsystems and updates of the internal state vector do not necessarily occur at the same frequency: one subsystem may communicate with its adjacent subsystems frequently, while another may update its internal state vector frequently. When a subsystem has multiple adjacent subsystems, the frequency of communication need not be the same for each of them.

The computer system that executes the distributed CNN performs communication between adjacent subsystems and updates of the internal state vectors asynchronously. Therefore, at the moment a subsystem updates its internal state vector, it cannot obtain the values of the neurons on the adjacent columns (the columns executed by the adjacent subsystems) that the convolution operation requires.

For this reason, a subsystem stores in memory the neuron values on the adjacent columns (internal state vectors) obtained during its previous communication with the adjacent subsystems, and performs the convolution operation with the stored values when it updates its internal state vector. As described later, this configuration makes it possible to perform error backpropagation asynchronously.
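The caching behavior described above can be sketched as follows. This is a simplified illustration, not the system of the disclosure: the class `Subsystem`, its methods, and the two scalar weights are hypothetical, and the layer operation is reduced to a one-hop weighted sum followed by ReLU applied element by element to the state vector.

```python
class Subsystem:
    """One subsystem holding its own state vector and cached neighbor vectors."""

    def __init__(self, name, state):
        self.name = name
        self.state = list(state)   # own internal state vector
        self.cache = {}            # neighbor name -> last received state vector

    def communicate(self, neighbor):
        """Exchange current state vectors with one adjacent subsystem."""
        self.cache[neighbor.name] = list(neighbor.state)
        neighbor.cache[self.name] = list(self.state)

    def update(self, self_weight, neighbor_weight):
        """Update own state from own values and cached (possibly stale) neighbor values."""
        new_state = []
        for k, own in enumerate(self.state):
            neighbor_sum = sum(vec[k] for vec in self.cache.values())
            pre_activation = self_weight * own + neighbor_weight * neighbor_sum
            new_state.append(max(0.0, pre_activation))  # ReLU
        self.state = new_state
```

Because `update` reads only `self.cache`, it can run at any time without waiting for a neighbor, and the values it used remain available for a later backward pass.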

[Convolution]
The processing of each layer of the distributed CNN is described below. The computer system can execute the processing of each layer of the CNN 400 on the graph described in the first embodiment. First, the processing of the convolutional layer is described. Each subsystem executing the distributed CNN performs the convolution operation for the column it is in charge of. Each subsystem executes the convolutional-layer processing described in the first embodiment on that column.

実施例１において説明したように、畳み込み演算は、対象ノード（ニューロン）の値及び対象ノードから所定ホップ数のノードの値を使用する。分散ＣＮＮでは、各サブシステムは、対象ノード（ニューロン）以外のノードの値を、他のサブシステムから収集する。 As described in the first embodiment, the convolution operation uses the value of the target node (neuron) and the value of the node having a predetermined number of hops from the target node. In the distributed CNN, each subsystem collects values of nodes other than the target node (neuron) from other subsystems.

As described above, each subsystem stores the neuron values on the adjacent columns obtained when it communicates with the adjacent subsystems, and performs the convolution operation with the stored values when it updates its internal state vector. The weights of the convolution operation (kernel coefficients) are common to all subsystems. For example, the subsystems can share the same weight data by communicating with one another.

A subsystem needs the values of all columns within the kernel range (number of hops) of the convolution operation. As described in the first embodiment, the kernel size of a convolutional layer that follows a pooling layer is enlarged. For a convolution operation with a kernel size of two or more hops, such as the processing of the second convolutional layer 413, a subsystem obtains the column values (internal state vectors) of non-adjacent subsystems by multi-hop relay through the adjacent subsystems. Transmitting information over n hops requires n communications.
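The statement that n-hop information requires n communications can be illustrated with a small simulation. This sketch assumes synchronous rounds purely for clarity (the system described here is asynchronous), and the function name `relay_rounds` and the dictionary-based adjacency are hypothetical.

```python
def relay_rounds(adjacency, values, rounds):
    """Simulate hop-by-hop relaying and return what each node knows afterwards.

    adjacency: dict mapping a node to the list of its adjacent nodes.
    values:    dict mapping a node to its own value.
    """
    known = {node: {node: values[node]} for node in adjacency}
    for _ in range(rounds):
        # Each round forwards only what was known at the start of the round,
        # so information advances exactly one hop per round.
        snapshot = {node: dict(table) for node, table in known.items()}
        for node, neighbors in adjacency.items():
            for nb in neighbors:
                known[node].update(snapshot[nb])
    return known
```

On a path A-B-C-D, node A learns the value of C only after the second round and the value of D only after the third, which matches the n-communications statement.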

[Pooling]
Next, the pooling process is described. In the distributed CNN, each subsystem treats the neurons at the same position from the lower layers to the upper layers of the CNN as a column, so the number of neurons is constant in all layers. This is the same as in the CNN on the graph described in the first embodiment. In the distributed CNN, as in the pooling process described in the first embodiment, each subsystem therefore executes a pooling process that reduces the degree of freedom of expression without reducing the number of neurons (the number of nodes).

実施例１において説明した平均値プーリング及び最大値プーリングの双方が、分散ＣＮＮにおけるプーリング処理に適用できる。実施例１において説明したように、プーリング処理は、隣接ノードの状態値を必要とする。分散ＣＮＮにおいては、サブシステムは、自システムの内部状態ベクトルと隣接するサブシステムから取得した内部状態ベクトルから、自システムの新たな内部状態ベクトルを計算する。 Both the average value pooling and the maximum value pooling described in the first embodiment can be applied to the pooling process in the distributed CNN. As described in the first embodiment, the pooling process requires the state value of the adjacent node. In the distributed CNN, a subsystem calculates a new internal state vector of its own system from an internal state vector of its own system and an internal state vector acquired from an adjacent subsystem.

[Fully connected layer]
Next, the fully connected layer is described. A centralized CNN repeats the convolution operation and pooling several times, then uses a fully connected layer to gather the values of all neurons in one place and apply an affine operation, thereby calculating the final output value. In the distributed CNN, by contrast, each subsystem operates autonomously, and the values of all subsystems cannot be gathered in one place.

Therefore, the computer system of the present disclosure uses a known algorithm that calculates an average value in a distributed manner. Examples of such algorithms are distributed ADMM (Alternating Direction Method of Multipliers) and the average consensus (average agreement) algorithm.

Specifically, each subsystem first multiplies the output values of the neurons on its own column by the corresponding weight coefficients of the fully connected layer. The average of the products over all subsystems is then calculated in a distributed manner using one of the algorithms described below, which yields the output value of the fully connected layer. Note that this method outputs the average, not the sum, of the neuron values multiplied by the weight coefficients. Because the weight coefficients are learned from the learning data (teacher data), this difference does not cause a problem. When weight coefficients learned with a conventional centralized CNN are used in the distributed CNN, the weight coefficient held by each subsystem is multiplied in advance by the total number of subsystems.
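The relationship between the summed output of a centralized fully connected layer and the averaged output of the distributed version can be checked with a short sketch. The function names are hypothetical, one neuron per subsystem is assumed, and the distributed averaging is replaced here by a direct mean as a stand-in for distributed ADMM or average consensus.

```python
def centralized_fc(values, weights):
    """Centralized fully connected output: sum of weight * value."""
    return sum(w * x for w, x in zip(weights, values))


def distributed_fc(values, weights):
    """Each subsystem multiplies locally; the mean stands in for distributed averaging."""
    products = [w * x for w, x in zip(weights, values)]
    return sum(products) / len(products)


def rescale_for_distributed(weights):
    """Multiply centrally learned weights by the number of subsystems."""
    n = len(weights)
    return [n * w for w in weights]
```

For values `[1.0, 2.0, 3.0]` and centrally learned weights `[0.5, -1.0, 2.0]`, `centralized_fc` and `distributed_fc` with rescaled weights give the same output, because the mean of N products scaled by N equals their unscaled sum.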

Two examples of algorithms that calculate an average value in a distributed manner are described below. The first example is an average calculation algorithm using distributed ADMM. By solving the following optimization problem with distributed ADMM, the average ν of N values x_i can be calculated in a distributed manner.

Because the method using distributed ADMM always optimizes so as to satisfy the above equation, it follows changes in the values x_i or in the network topology that occur dynamically during operation and outputs the average ν at that point in time. It can therefore calculate the average of the values at any specified point in time.
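For orientation, the reason an optimization problem can yield an average is the standard least-squares identity below. It is given as an assumption about the general form of such a problem and is not claimed to be the exact formulation solved by distributed ADMM in this disclosure; ν* is notation introduced here for the minimizer.

```latex
\nu^{\ast} = \arg\min_{\nu} \sum_{i=1}^{N} \frac{1}{2}\,(x_i - \nu)^2,
\qquad
\frac{\partial}{\partial \nu} \sum_{i=1}^{N} \frac{1}{2}\,(x_i - \nu)^2
  = -\sum_{i=1}^{N} (x_i - \nu) = 0
\;\Longrightarrow\;
\nu^{\ast} = \frac{1}{N} \sum_{i=1}^{N} x_i .
```

Since the objective depends only on the current values x_i, its minimizer moves with them, which is consistent with the tracking behavior described above.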

The second example is the average consensus algorithm. The average consensus algorithm redistributes the values x_i among the subsystems while keeping the sum of the values x_i held by the subsystems constant, and finally makes the values x_i of all subsystems converge to the same value. Because the sum of the values x_i is always constant, the convergence value is the average of the initial values of x_i. Among the several kinds of average consensus algorithms, the simplest one updates the values x_i and x_j according to the following equation when, for example, subsystem i and subsystem j communicate.

As a result, each time adjacent subsystems communicate, the variance of the values x_i decreases while the sum of the values x_i over the entire system stays constant, so all the values x_i eventually converge to the same value, namely the average of the initial values of x_i.
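The sum-preserving, variance-reducing behavior described above can be sketched with pairwise gossip averaging. The update rule used here, in which both communicating subsystems move to the midpoint of their two values, is an assumption chosen as one simple rule with these properties; it is not claimed to be the exact update equation of this disclosure. The function name `gossip_average` and the random choice of the communicating pair are likewise hypothetical.

```python
import random


def gossip_average(values, edges, steps, seed=0):
    """Repeatedly pick one adjacent pair and replace both values by their midpoint."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(steps):
        i, j = rng.choice(edges)        # one asynchronous pairwise communication
        midpoint = (x[i] + x[j]) / 2.0  # x[i] + x[j] is unchanged by this update
        x[i] = midpoint
        x[j] = midpoint
    return x
```

On a connected graph, for example a four-node ring with initial values `[0.0, 4.0, 8.0, 12.0]`, all entries approach the initial mean 6.0 while their total stays at 24.0.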

When the values x_i or the network topology change dynamically during operation, the sum of the values x_i over the entire system changes, so the average consensus algorithm cannot correctly calculate the average of the initial values of x_i. On the other hand, because the convergence value of the average consensus algorithm is always the average of the values x_i held across the entire system at that time, the algorithm is useful for learning the weight coefficients of the convolution filters by error backpropagation.

[Learning (training) by error backpropagation]
As described above, in the distributed CNN a subsystem stores the neuron values on the adjacent columns that it obtained during its previous communication with the adjacent subsystems, and updates its internal state vector by performing forward-propagation calculations such as the convolution operation with the stored values. By reusing, in the error backpropagation calculation, exactly the values it used in forward propagation, a subsystem can update its parameters by error backpropagation without synchronizing with other subsystems.

Error backpropagation is calculated by tracing the nodes (neurons) of the forward-propagation graph in reverse. FIGS. 22 to 25 show examples of the propagation rule for each type of node on the graph. Specifically, FIGS. 22 to 25 show, respectively, a node 771 that applies a function f, an addition node 772, a multiplication node 773, and a variable replication node (branch node) 774. The forward propagation rules are shown by solid lines, and the backpropagation rules by broken lines. The inputs of the forward propagation are x and y. In the distributed CNN, the variable replication node 774 among these requires special care to implement.

The variable replication node 774 corresponds to using the same variable more than once in the forward-propagation calculation. Because the calculation in the distributed CNN is spread across multiple subsystems, under the backpropagation rule of the variable replication node 774 in FIG. 25 a subsystem cannot obtain those of the error signals dx1 and dx2 from the upper layer that come from columns handled by other subsystems.

Therefore, in the error backpropagation through the variable replication node 774, each subsystem uses only the error signals obtained within the column it is in charge of, independently calculates the derivatives of the parameters, and updates the parameter values. Each subsystem then calculates the average of the parameter values held by all subsystems using the average consensus algorithm described above. Because the convergence value of the average consensus algorithm is always the average of the values currently held by all subsystems, learning, that is, updating of the parameter values, can continue without interruption.

Note that the mathematically correct derivative is the sum of all error signals from the upper layer, whereas this method updates the parameter by treating the average of all error signals from the upper layer as the derivative. To compensate, the learning rate, which is a hyperparameter of the learning algorithm used for the parameter update, is multiplied in advance by the total number of subsystems.
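
The compensation can be checked with a short derivation. The symbols are assumptions introduced for illustration: N is the number of subsystems, θ is the shared parameter before the update, g_i is the derivative computed by subsystem i from its own error signals, η is the intended learning rate, and η' is the learning rate actually used locally. A plain gradient-descent update is also assumed.

```latex
\begin{aligned}
\theta_i' &= \theta - \eta'\, g_i \\
\bar{\theta} &= \frac{1}{N}\sum_{i=1}^{N}\theta_i'
             = \theta - \frac{\eta'}{N}\sum_{i=1}^{N} g_i \\
\eta' = N\eta \;&\Rightarrow\; \bar{\theta} = \theta - \eta \sum_{i=1}^{N} g_i
\end{aligned}
```

With η' = Nη, averaging the locally updated parameters therefore gives the same result as a single update that uses the sum of all error signals.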

The same applies when further nodes follow the branch node in backpropagation (that is, precede it in forward propagation). At the branch node, each subsystem backpropagates the error signal using only the information of the column it handles and independently updates the parameters located downstream. The average of the parameter values is then calculated with the average consensus algorithm.

[Configuration Example of Distributed CNN]
A configuration example of the distributed CNN of the present disclosure is described below. FIG. 26 shows a configuration example 900 of the distributed CNN and its processing. The following describes the processing executed by one subsystem, starting with forward propagation. In forward propagation, the first convolutional layer 901 executes a convolution operation and ReLU in the same way as the first convolutional layer 411 of Embodiment 1. The first convolutional layer 901 acquires the value of the target neuron from its own internal state vector, and acquires the values of the adjacent neurons from the internal state vectors of the adjacent subsystems, which it holds in advance. The first convolutional layer 901 executes the convolution operation and ReLU on these values and updates its own internal state vector.

The first pooling layer 902 executes pooling processing in the same way as the first pooling layer 412 of Embodiment 1. The first pooling layer 902 acquires the value of the target neuron, as updated by the first convolutional layer 901, from its own internal state vector, and acquires the output values of the first convolutional layer 901 for the adjacent neurons from the internal state vectors of the adjacent subsystems, which it holds in advance. The first pooling layer 902 pools these values, using max pooling in this example, and updates its own internal state vector.

The second convolutional layer 904 executes a convolution operation and ReLU in the same way as the second convolutional layer 413 of Embodiment 1. The second convolutional layer 904 acquires the value of the target neuron, as updated by the first pooling layer 902, from its own internal state vector, and acquires the output values of the first pooling layer for the neighboring neurons from the internal state vectors of the subsystems within two hops, which it holds in advance. Data from a subsystem two hops away is forwarded via an adjacent subsystem. The second convolutional layer 904 executes the convolution operation and ReLU on these values and updates its own internal state vector.

The second pooling layer 906 executes pooling processing in the same way as the second pooling layer 414 of Embodiment 1. The second pooling layer 906 acquires the value of the target neuron, as updated by the second convolutional layer 904, from its own internal state vector, and acquires the values of the adjacent neurons from the internal state vectors of the adjacent subsystems, which it holds in advance. The second pooling layer 906 pools these values, using max pooling in this example, and updates its own internal state vector.

The second pooling layer 906 then acquires, from its own internal state vector, the neuron value updated by the immediately preceding pooling step, and acquires the values of the adjacent neurons from the internal state vectors of the adjacent subsystems, which it holds in advance. The second pooling layer 906 pools these values, again using max pooling in this example, and updates its own internal state vector.
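
The effect of running the one-hop pooling step twice can be sketched as follows: two successive one-hop max pooling passes give the same result as a single max pooling over a two-hop range, while each pass needs only the values of adjacent nodes. The function names, the example graph, and the node values are hypothetical.

```python
from collections import deque

def one_hop_max_pool(values, neighbors):
    """Each node takes the max over itself and its adjacent nodes."""
    return [
        max([values[i]] + [values[j] for j in neighbors[i]])
        for i in range(len(values))
    ]

def k_hop_max_pool(values, neighbors, k):
    """Reference: max over all nodes within k hops, found by breadth-first search."""
    pooled = []
    for start in range(len(values)):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == k:
                continue  # do not expand beyond k hops
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        pooled.append(max(values[n] for n in dist))
    return pooled

# Five nodes on a line: 0 - 1 - 2 - 3 - 4
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
values = [3.0, 0.0, 1.0, 0.0, 7.0]

twice = one_hop_max_pool(one_hop_max_pool(values, neighbors), neighbors)
direct = k_hop_max_pool(values, neighbors, k=2)
```

In this example both `twice` and `direct` evaluate to `[3.0, 3.0, 7.0, 7.0, 7.0]`.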

The fully connected layer 907 calculates an average value by distributed ADMM. Specifically, it calculates the average by distributed ADMM from the value of the target neuron in its own internal state vector and the values of the target neurons in the internal state vectors of the other subsystems, which it holds in advance.

Next, error backpropagation is described. In error backpropagation, each subsystem is given training data. Each subsystem updates the weight data (including the biases) independently of the other subsystems, using the data it holds internally. Each subsystem then calculates the average of the updated weight data with the average consensus algorithm. As a result, all subsystems hold the same weight data.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above are described in detail to explain the present invention clearly, and the invention is not necessarily limited to a configuration that includes every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

Some or all of the configurations, functions, processing units, and the like described above may be implemented in hardware, for example by designing them as an integrated circuit. They may also be implemented in software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be stored in a memory, in a storage device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card or SD card. The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines of a product are necessarily shown. In practice, almost all components may be considered to be interconnected.

101 computer, 110 processor, 111 memory, 123 weight data, 400 CNN, 411 first convolutional layer, 412 first pooling layer, 413 second convolutional layer, 414 second pooling layer, 900 distributed CNN

Claims

A computer system for executing a convolutional neural network on a graph, comprising:
one or more processors; and
one or more storage devices,
wherein the convolutional neural network on the graph includes
one or more convolutional layers and
one or more pooling layers,
the one or more storage devices store kernel weight data of the one or more convolutional layers,
the one or more processors
update, in each convolutional layer, the value of each node by a convolution operation based on a kernel having a size of a predetermined number of hops, and
update, in each pooling layer, the value of each node by pooling processing based on the value of that node and the values of nodes within a pooling range of a predetermined number of hops from that node, and
the size of the kernel of a convolutional layer at a stage subsequent to a pooling layer is larger than the size of the kernel of a convolutional layer at a stage preceding that pooling layer.

The computer system according to claim 1, wherein
the pooling range of a pooling layer at a stage subsequent to a convolutional layer is wider than the pooling range of a pooling layer at a stage preceding that convolutional layer.

The computer system according to claim 1, wherein
the one or more processors perform the pooling processing over the pooling range of the predetermined number of hops by repeating pooling processing over a one-hop range a number of times equal to the predetermined number of hops.

The computer system according to claim 1, wherein
the one or more processors include a regularization term for the one or more convolutional layers in the error function used in error backpropagation learning of the convolutional neural network on the graph.

The computer system according to claim 1, wherein
the one or more processors perform, in the pooling processing, average pooling after applying a smooth maximum to the input values.

The computer system according to claim 1, wherein
the one or more processors execute global average pooling at a stage subsequent to the one or more convolutional layers and the one or more pooling layers.

The computer system according to claim 1, wherein
the computer system includes a plurality of subsystems connected by a network,
the plurality of subsystems communicate only with adjacent subsystems, and
each of the plurality of subsystems calculates a column composed of neurons at the same position in the convolutional neural network on the graph.

The computer system according to claim 7, wherein
each of the plurality of subsystems
holds an internal state vector indicating the values of the column,
acquires the internal state vectors of other subsystems, including adjacent subsystems, and
calculates the column using its own internal state vector and the internal state vectors of the other subsystems.

The computer system according to claim 7, wherein
the plurality of subsystems calculate, in a distributed manner, an average of the values of the columns at a stage subsequent to the one or more convolutional layers and the one or more pooling layers.

The computer system according to claim 7, wherein
each of the plurality of subsystems
updates the weights of the one or more convolutional layers by backpropagation learning, independently of the other subsystems, and
calculates an average of the updated weights through communication with other subsystems.

A method by which a computer system executes a convolutional neural network on a graph, wherein
the computer system includes
one or more processors and
one or more storage devices,
the convolutional neural network on the graph includes
one or more convolutional layers and
one or more pooling layers,
the one or more storage devices store kernel weight data of the one or more convolutional layers,
the method comprises:
updating, in each convolutional layer, the value of each node by a convolution operation based on a kernel having a size of a predetermined number of hops; and
updating, in each pooling layer, the value of each node by pooling processing based on the value of that node and the values of nodes within a pooling range of a predetermined number of hops from that node, and
the size of the kernel of a convolutional layer at a subsequent stage is larger than the size of the kernel of a convolutional layer at a preceding stage.