JP2018109870A

JP2018109870A - Information processor, information processing method and program

Info

Publication number: JP2018109870A
Application number: JP2017000223A
Authority: JP
Inventors: Akihiko Kasagi (笠置 明彦)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-01-04
Filing date: 2017-01-04
Publication date: 2018-07-12

Abstract

PROBLEM TO BE SOLVED: To maintain full connection between specific layers in a neural network while reducing the number of parameters.

SOLUTION: An information processor includes: a first calculation part which executes forward propagation calculation of a multilayer neural network including a first layer (units 101-107), a second layer (units 111-117) that receives output from the first layer, and a third layer (units 121-127) that receives output from the second layer; and a second calculation part which executes backward propagation calculation of the multilayer neural network based on the result of the forward propagation calculation. The output from each unit 101-107 of the first layer is input to all units 121-127 of the third layer via the units 111-117 of the second layer. The number of units in the second layer is greater than both the number of edges between each unit 101-107 of the first layer and the second layer and the number of edges between each unit 121-127 of the third layer and the second layer.

SELECTED DRAWING: Figure 3

Description

The present invention relates to a machine learning technique.

A multilayer neural network (for example, the convolutional neural network described in the document cited below) is used to classify image data and audio data. In a multilayer neural network used as a classifier, the units in one layer are fully connected to the units in an adjacent layer. A layer in which such full connection is built is called a fully connected layer.

In a multilayer neural network, a parameter (specifically, an edge weight) is prepared for each edge. A fully connected layer therefore has more parameters than the other layers, and the parameters of the fully connected layers may account for most of the parameters of the multilayer neural network. An increase in the number of parameters, however, leads to an increase in memory consumption. In addition, when a plurality of computers perform the calculation of a multilayer neural network in parallel, an increase in the number of parameters increases the amount of communication data transferred between the computers, so the time until the calculation finally completes becomes longer.

Japanese Laid-open Patent Publication No. 2016-33806

An object of the present invention is, in one aspect, to provide a technique for maintaining full connection between specific layers in a neural network while reducing the number of parameters.

An information processing apparatus according to one aspect includes a first calculation unit that executes forward propagation calculation of a multilayer neural network including a first layer, a second layer that receives output from the first layer, and a third layer that receives output from the second layer; and a second calculation unit that executes back propagation calculation of the multilayer neural network based on the result of the forward propagation calculation. The output from each unit of the first layer is input to all units of the third layer via the units of the second layer. In addition, the number of units in the second layer is larger than the number of edges between each unit of the first layer and the second layer, and larger than the number of edges between each unit of the third layer and the second layer.

In one aspect, it becomes possible to maintain full connection between specific layers in a neural network while reducing the number of parameters.

FIG. 1 is a diagram illustrating an example of full connection.
FIG. 2 is a diagram illustrating the correspondence between the edges of a fully connected layer and a weight matrix.
FIG. 3 is a diagram illustrating an example of a Latin square fully connected layer.
FIG. 4 is a diagram illustrating an overview of the parallel computing system according to the first embodiment.
FIG. 5 is a functional block diagram of the information processing apparatus.
FIG. 6 is a diagram for explaining forward propagation and back propagation.
FIG. 7 is a diagram illustrating the processing flow for forward propagation.
FIG. 8 is a diagram for explaining the processing for forward propagation.
FIG. 9 is a diagram illustrating the processing flow of the Latin square fully connected layer for forward propagation.
FIG. 10 is a diagram for explaining the matrix product executed for an ordinary fully connected layer.
FIG. 11 is a diagram for explaining the matrix product executed for a Latin square fully connected layer.
FIG. 12 is a diagram illustrating the processing flow for back propagation.
FIG. 13 is a diagram for explaining the processing for back propagation.
FIG. 14 is a diagram illustrating the processing flow of the Latin square fully connected layer for back propagation.
FIGS. 15 to 22 are diagrams for explaining the mask matrix.
FIG. 23 is a diagram illustrating the processing flow for back propagation.
FIG. 24 is a diagram illustrating the relationship between the number of information processing apparatuses and the time required for parallel calculation.
FIGS. 25 and 26 are diagrams for explaining the reduction in the number of edges.
FIG. 27 is a diagram illustrating the correspondence between the Latin square fully connected layer and the mask matrix.
FIG. 28 is a diagram for explaining control of the number of units on the input side and the number of units on the output side.
FIG. 29 is a diagram illustrating an overview of the system according to the third embodiment.
FIGS. 30 to 33 are diagrams for explaining a Latin square fat tree and a finite projective plane.
FIG. 34 is a functional block diagram of a computer.

[Embodiment 1]
FIG. 1 shows an example of full connection. In FIG. 1, a circle represents a unit, and a line segment between units represents an edge. In FIG. 1, each unit in the (l+1)-th layer is connected to all units in the l-th layer. Since each layer has 7 units, the number of edges is 49 (= 7 * 7). In machine learning, the parameter (that is, the weight) of each edge is updated, and the weights between layers are expressed as a matrix. In the example of FIG. 1, the weight matrix is a 7 × 7 matrix. The input to the (l+1)-th layer in FIG. 1 is calculated using the weight matrix, which is a dense matrix, and the output of the l-th layer.

For a fully connected layer, however, the weight matrix after machine learning becomes a sparse matrix. For example, suppose that, of the 16 edges between the l-th layer with 4 units and the (l+1)-th layer with 4 units shown in FIG. 2(a), the edges whose weight after machine learning is not zero (or nearly zero) are the edges drawn as thick lines. In this case, the weight matrix corresponding to the weights between the layers is a sparse matrix as shown in FIG. 2(b). In FIG. 2(b), the position in the row direction corresponds to the position of the unit in the (l+1)-th layer, and the position in the column direction corresponds to the position of the unit in the l-th layer. Elements corresponding to edges whose weight is not zero (or nearly zero) are hatched. Thus, the edges of a fully connected layer in a multilayer neural network (hereinafter referred to as a DNN (Deep Neural Network)) are considered to include potentially useless edges.

Therefore, in the present embodiment, a Latin square fully connected layer (LSFCL) is introduced into the DNN. FIG. 3 shows an example of a Latin square fully connected layer. In the Latin square fully connected layer, edges are set so that the output from each unit in the l-th layer is input to all units in the (l+2)-th layer. For example, the output from unit l01 in the l-th layer is input to units l11, l12, and l13 in the (l+1)-th layer. Furthermore, the output from unit l11 is input to units l21, l22, and l23 in the (l+2)-th layer. The output from unit l12 is input to units l21, l24, and l25. The output from unit l13 is input to units l21, l26, and l27. As a result, the output from unit l01 is input to all units in the (l+2)-th layer. The same applies to the other units in the l-th layer.
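The connectivity described above can be checked with a short script. The following is a minimal sketch, assuming that the seven units of each layer are wired according to the lines of the finite projective plane of order 2 (the Fano plane). Only the edges of unit l01 and of units l11 to l13 are stated in the text, so rows 3 to 6 of `LINES` are an assumption, and the names `LINES`, `mask1`, `mask2`, and `reaches_all` are hypothetical.

```python
# Assumed wiring: unit j of layer (l+1) corresponds to line j of the Fano plane.
# Rows 0-2 reproduce the edges stated in the text for l11, l12 and l13
# (0-based: l21 -> 0, ..., l27 -> 6); rows 3-6 are an assumption.
LINES = [
    (0, 1, 2),
    (0, 3, 4),
    (0, 5, 6),
    (1, 3, 5),
    (1, 4, 6),
    (2, 3, 6),
    (2, 4, 5),
]
N = 7

# mask1[i][j] == 1 when unit i of layer l feeds unit j of layer (l+1).
mask1 = [[1 if i in LINES[j] else 0 for j in range(N)] for i in range(N)]
# mask2[j][k] == 1 when unit j of layer (l+1) feeds unit k of layer (l+2).
mask2 = [[1 if k in LINES[j] else 0 for k in range(N)] for j in range(N)]


def reaches_all(m1, m2):
    """Return True if every layer-l unit has a path to every layer-(l+2) unit."""
    n = len(m1)
    return all(
        any(m1[i][j] and m2[j][k] for j in range(n))
        for i in range(n)
        for k in range(n)
    )


# Total number of edges in both masks.
edges = sum(map(sum, mask1)) + sum(map(sum, mask2))
print(reaches_all(mask1, mask2), edges)
```

Under this assumed wiring, every unit has three edges to the middle layer, and each output still reaches all seven units two layers later.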

The number of edges of the Latin square fully connected layer shown in FIG. 3 is 42 (= (3 * 7) * 2). Since this is smaller than the 49 edges of the fully connected layer shown in FIG. 1, the number of weight values held as parameters in a storage device such as a memory can be reduced.
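The edge counts can be compared directly. The sketch below reproduces the figures from the text (7 units per layer, 3 edges per unit) and then, purely as an assumption not stated in this section, extends them to a projective plane of order q with q * q + q + 1 units per layer and q + 1 edges per unit. The function names are hypothetical.

```python
def dense_edges(n_in, n_out):
    """Edges of an ordinary fully connected layer: one per (input, output) pair."""
    return n_in * n_out


def lsfcl_edges(n_units, edges_per_unit):
    """Edges of the three-layer structure: every unit of the first and third
    layers has edges_per_unit edges to the middle layer."""
    return 2 * n_units * edges_per_unit


# Figures from the text: 7 units per layer, 3 edges per unit.
print(dense_edges(7, 7), lsfcl_edges(7, 3))

# Assumed generalization to a projective plane of order q (not stated in the text):
# q * q + q + 1 units per layer and q + 1 edges per unit.
for q in (2, 3, 4):
    n = q * q + q + 1
    print(q, n, dense_edges(n, n), lsfcl_edges(n, q + 1))
```

If the assumed generalization holds, the saving grows with layer size, because the dense count grows quadratically in the number of units while the other count grows more slowly.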

For matters related to Latin squares, refer to the appendix.

The specific content of the present embodiment is described below. FIG. 4 shows an overview of the parallel computing system 1000 of the present embodiment. The parallel computing system 1000 is a system that executes DNN calculations in parallel, and includes information processing apparatuses 1a to 1c, each of which is, for example, a physical server or a personal computer. The information processing apparatuses 1a to 1c are connected to a network 5, which is, for example, a LAN (Local Area Network). Each information processing apparatus processes different data, transmits its processing result (for example, data used for updating parameters) to the other information processing apparatuses, and receives processing results from the other information processing apparatuses. Each information processing apparatus updates the parameters of the DNN based on the collected processing results. Although FIG. 4 shows three information processing apparatuses, the number is not limited.
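The exchange of processing results can be illustrated with a toy data-parallel update. This is a minimal sketch under assumptions the text does not state: the model is a single scalar weight with y = w * x, the processing result of each apparatus is a gradient, the collected results are averaged, and the update is plain gradient descent. `local_gradient` and `parallel_step` are hypothetical names.

```python
def local_gradient(w, samples):
    """Mean gradient of 0.5 * (w * x - t) ** 2 over one apparatus's own samples."""
    return sum((w * x - t) * x for x, t in samples) / len(samples)


def parallel_step(w, shards, lr=0.1):
    """One update: each apparatus computes its result on different data, the
    results are collected, and the same update is applied everywhere."""
    results = [local_gradient(w, shard) for shard in shards]  # computed independently
    shared = sum(results) / len(results)  # stands in for the exchange over network 5
    return w - lr * shared


# One shard of (input, target) pairs per apparatus 1a, 1b and 1c; targets follow t = 2 * x.
shards = [[(1.0, 2.0)], [(2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(50):
    w = parallel_step(w, shards)
print(round(w, 3))
```

Because every apparatus applies the same collected result, all copies of the parameter stay identical, and the volume of data exchanged per step scales with the number of parameters.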

FIG. 5 shows a functional block diagram of the information processing apparatus 1a. The information processing apparatus 1a includes a first calculation unit 101, a second calculation unit 103, a communication unit 105, a data storage unit 111, a target output storage unit 113, and a work data storage unit 115. The functional block diagrams of the information processing apparatuses 1b and 1c are the same as that of the information processing apparatus 1a. The first calculation unit 101, the second calculation unit 103, and the communication unit 105 are realized by the CPU (Central Processing Unit) 2503 in FIG. 34 executing a program. The data storage unit 111, the target output storage unit 113, and the work data storage unit 115 are provided in the memory 2501 or the HDD (Hard Disk Drive) 2505 in FIG. 34.

The data storage unit 111 stores the data (for example, image data or audio data) that are the targets of machine learning and classification. The target output storage unit 113 stores target outputs (also called labels). The first calculation unit 101 executes forward propagation calculation on the data stored in the data storage unit 111 and stores the calculation result in the work data storage unit 115. The second calculation unit 103 executes back propagation calculation based on the data stored in the target output storage unit 113 and the data stored in the work data storage unit 115, and stores the processing result (for example, parameter errors) in the work data storage unit 115. The communication unit 105 transmits the data stored in the work data storage unit 115 to the information processing apparatuses 1b and 1c, and stores the data received from the information processing apparatuses 1b and 1c in the work data storage unit 115. The second calculation unit 103 updates the parameters based on the data stored in the work data storage unit 115.

As shown in FIG. 6, forward propagation is processing that calculates the output of the DNN from its input. Back propagation is processing that calculates the parameter errors from the error between the calculated output and the target output. The parameters of the DNN are updated based on the calculated parameter errors. As shown in FIG. 6, the back propagation calculation proceeds in the direction opposite to the forward propagation calculation.

Next, the processing executed by each information processing apparatus will be described in detail with reference to FIGS. 7 to 26. The processing executed by the information processing apparatus 1a is described below as an example; the processing executed by the other information processing apparatuses is the same.

First, the processing for forward propagation will be described with reference to FIGS. 7 to 11.

The first calculation unit 101 of the information processing apparatus 1a reads the data to be processed from the data storage unit 111 (FIG. 7: step S1).

The first calculation unit 101 sets i, a variable representing the layer number, to 0 (step S3).

The first calculation unit 101 determines whether the i-th layer is a Latin square fully connected layer (step S5). Here, the i-th layer is determined to be a Latin square fully connected layer when it corresponds to the (l+1)-th layer (that is, the intermediate layer) in FIG. 3.

When the i-th layer is not a Latin square fully connected layer (step S5: No route), the first calculation unit 101 executes the operation of the i-th layer for forward propagation (step S7) and stores the processing result in the work data storage unit 115. The processing then proceeds to step S11.

The operations for forward propagation will be described with reference to FIG. 8. The memory 2501 stores the data used for the processing of each layer and the results of the processing of each layer. Arrows represent pointers. The data read in step S1 is stored in the area "Data-0".

In layer 0, processing is executed using the data stored in the area "Data-0" as input data and the data stored in the area "Weight-0" as weights, and the processing result is stored as output data in the area "Data-1".

When the batch size is 1, the above calculation is expressed by the following formula.
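A hedged reconstruction, assuming a row-vector convention consistent with the X * W → Z notation used for FIG. 10 and assuming no bias term or activation function (neither is mentioned in the text), is:

$$
z_j = \sum_{i} u_i \, w_{ij}
$$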

Here, u represents the input, w represents the weight, and z represents the output. When the batch size is m (m is a natural number), the above calculation is expressed by the following formula.
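Under the same assumptions (row-vector convention, no bias or activation), and with the hypothetical index k running over the m samples of the batch, a plausible form is:

$$
z_{kj} = \sum_{i} u_{ki} \, w_{ij} \quad (k = 1, \dots, m), \qquad \text{that is,} \quad Z = U W
$$

where U is the m-row matrix of inputs and Z is the m-row matrix of outputs.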

In layer 1, processing is executed using the data stored in the area "Data-1" as input data and the data stored in the area "Weight-1" as weights, and the processing result is stored as output data in the area "Data-2".

In layer 2, processing is executed using the data stored in the area "Data-2" as input data and the data stored in the area "Weight-2" as weights, and the processing result is stored as output data in the area "Data-3".

In layer 3, processing is executed using the data stored in the area "Data-3" as input data and the data stored in the area "Weight-3" as weights, and the processing result is stored as output data in the area "Data-4".

In this way, the input to the DNN is stored at the start of the chain of pointers, and the output from the DNN is stored at the end of the chain.
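The chain of areas "Data-0" to "Data-4" can be modeled as a list in which layer i reads entry i and writes entry i + 1. The following is a minimal sketch assuming that each layer is a plain matrix product without bias or activation; `matmul` and `forward_chain` are hypothetical helpers, and the weight values are arbitrary.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def forward_chain(data0, weights):
    """Fill Data-1 ... Data-N from Data-0; layer i reads Data-i and Weight-i."""
    data = [data0]
    for w in weights:
        data.append(matmul(data[-1], w))
    return data


# Weight-0 ... Weight-3 (arbitrary illustrative values).
weights = [
    [[1, 0], [0, 1]],
    [[2, 0], [0, 2]],
    [[1, 1], [0, 1]],
    [[1], [1]],
]
data = forward_chain([[1, 2]], weights)  # Data-0 holds one sample with two features
print(len(data), data[-1])
```

The first list entry corresponds to the start of the pointer chain and the last entry to its end.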

Returning to the description of FIG. 7, when the i-th layer is a Latin square fully connected layer (step S5: Yes route), the first calculation unit 101 executes the operation of the Latin square fully connected layer for forward propagation (step S9). The processing of step S9 will be described with reference to FIGS. 9 to 11.

The first calculation unit 101 reads data as input data from the data storage area (that is, the area "Data-i") in the work data storage unit 115 (FIG. 9: step S21).

The first calculation unit 101 reads the first weight matrix and the second weight matrix from the weight storage areas (that is, the areas "Weight-i" and "Weight-(i+1)") in the work data storage unit 115 (step S23).

The first calculation unit 101 calculates the matrix product of the matrix of input data read in step S21 and the first weight matrix read in step S23 (step S25). The result of the calculation is stored in a storage area (for example, the area "Data-(i+1)") in the work data storage unit 115.

The first calculation unit 101 calculates the matrix product of the result of the matrix product in step S25 and the second weight matrix read in step S23 (step S27).

The matrix products executed in the DNN calculation will be described with reference to FIGS. 10 and 11. First, the matrix product executed for an ordinary fully connected layer is described with reference to FIG. 10. In FIG. 10, the output from the l-th layer is input to the (l+1)-th layer, and each output is multiplied by a weight. Specifically, letting the output from the l-th layer be the input X, the weight matrix be W, and the input to the (l+1)-th layer be the output Z, the output is calculated as X * W → Z.

The matrix product executed for the Latin square fully connected layer of the present embodiment will be described with reference to FIG. 11. In FIG. 11, each output from the l-th layer is multiplied by a weight and input to the (l+1)-th layer. Next, each output from the (l+1)-th layer is multiplied by a weight and input to the (l+2)-th layer. That is, the matrix product is executed twice. Let the weight matrix used for the first matrix product be the first weight matrix (M₁), and the weight matrix used for the second matrix product be the second weight matrix (M₂). Then, in the first matrix product, Y is calculated as X * M₁ → Y, and in the second matrix product, Z is calculated as Y * M₂ → Z. As a whole, Z is calculated as X * M₁ * M₂ → Z.
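The two successive matrix products can be sketched as follows. This is an illustration only: `matmul` and `lsfcl_forward` are hypothetical helpers, the 2 × 2 matrices are arbitrary small values rather than the 7 × 7 matrices of FIG. 3, and no bias or activation is assumed between the two products.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lsfcl_forward(x, m1, m2):
    """Compute Y = X * M1 (step S25) and Z = Y * M2 (step S27)."""
    y = matmul(x, m1)
    z = matmul(y, m2)
    return y, z


x = [[1, 2]]
m1 = [[1, 0], [1, 1]]  # zero entries stand for edges that do not exist
m2 = [[0, 1], [1, 1]]
y, z = lsfcl_forward(x, m1, m2)
print(y, z)
```

In this sketch the composite X * (M₁ * M₂) gives the same Z. The product M₁ * M₂ can be dense even when M₁ and M₂ are sparse, which is how full connection between the outer layers is kept while fewer weights are stored.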

In the present embodiment, full connection is realized using three layers, so Z is calculated using two weight matrices.

Returning to the description of FIG. 9, the first calculation unit 101 writes the result of the matrix product calculated in step S27 as output data to the data storage area (that is, the area "Data-(i+2)") (step S29).

Because processing for two layers has been executed, the first calculation unit 101 increments i by 1 (step S30). The processing then returns to the caller.

Returning to the description of FIG. 7, the first calculation unit 101 increments i by 1 (step S11). The first calculation unit 101 determines whether i < N holds (step S13). N is a natural number of 3 or more and is the number of layers of the DNN.

When i < N holds (step S13: Yes route), the processing returns to step S5. When i < N does not hold (step S13: No route), the first calculation unit 101 outputs the calculation result of the DNN (step S15), and the processing ends. For example, during machine learning, the first calculation unit 101 stores the calculation result of the DNN in the work data storage unit 115. During data classification, for example, the first calculation unit 101 displays the calculation result of the DNN on a display device and also stores it in the work data storage unit 115.
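The control flow of steps S3 to S15 can be traced with a small helper. This is a sketch only: `forward_schedule` is a hypothetical function that records which step handles which layer index, and `lsfcl_start` is an assumed set of indices at which step S5 takes the Yes route.

```python
def forward_schedule(n_layers, lsfcl_start):
    """Return the (step, layer index) pairs visited by the loop of FIG. 7.

    lsfcl_start is the set of indices i at which step S5 takes the Yes route.
    """
    visited = []
    i = 0                              # step S3
    while True:
        if i in lsfcl_start:           # step S5
            visited.append(("S9", i))  # two products using Weight-i and Weight-(i+1)
            i += 1                     # step S30: account for the extra layer
        else:
            visited.append(("S7", i))  # ordinary layer
        i += 1                         # step S11
        if not i < n_layers:           # step S13
            return visited             # step S15 follows


print(forward_schedule(4, {1}))
```

With four layers and step S9 taken at index 1, the trace visits indices 0, 1, and 3: index 2 is consumed by step S9 because of the extra increment in step S30.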

By executing the processing described above, forward propagation calculation can be executed appropriately for a DNN that includes a Latin square fully connected layer.

Next, the processing for back propagation will be described with reference to FIGS. 12 to 23.

The second calculation unit 103 of the information processing apparatus 1a calculates error data based on the difference between the calculation result of the DNN stored in the work data storage unit 115 and the target output stored in the target output storage unit 113 (FIG. 12: step S31), and stores the calculated error data in the error storage area in the work data storage unit 115.

The second calculation unit 103 sets i, a variable representing the layer number, to (N−1) (step S33).

The second calculation unit 103 determines whether the i-th layer is a Latin square fully connected layer (step S35). Here, the i-th layer is determined to be a Latin square fully connected layer when it corresponds to the (l+1)-th layer (that is, the intermediate layer) in FIG. 3.

When the i-th layer is not a Latin square fully connected layer (step S35: No route), the second calculation unit 103 executes the operation of the i-th layer for back propagation (step S37) and stores the processing result in the work data storage unit 115. The processing then proceeds to step S41.

The operations for back propagation will be described with reference to FIG. 13. The memory 2501 stores the data used for the processing of each layer and the results of the processing of each layer. Arrows represent pointers.

In layer 2, processing is executed using the data stored in the area "Diff-2" as the input error and the data stored in the area "Weight-2" as weights, and the processing result is stored as the output error in the area "Diff-1". In addition, processing is executed using the data stored in the area "Diff-2" as the input error and the data stored in the area "Data-2" as input data, and the processing result is stored as the weight error in the area "DeltaW-2".

Here, dz represents the input error, w represents the weight, du represents the output error, u represents the input, and dw represents the weight error. When the batch size is m (m is a natural number), the above calculation is expressed by the following formula.
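A hedged reconstruction of the back propagation formulas, assuming the forward calculation is a plain matrix product Z = U W in a row-vector convention with no bias or activation, is as follows. For a batch size of 1:

$$
du_i = \sum_{j} dz_j \, w_{ij}, \qquad dw_{ij} = u_i \, dz_j
$$

For a batch size of m, with one row per sample:

$$
dU = dZ \, W^{\mathsf{T}}, \qquad dW = U^{\mathsf{T}} \, dZ
$$

Whether the weight error is additionally scaled by 1/m is not stated in the text, so no such factor is shown here.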

In layer 1, processing is executed using the data stored in the area "Diff-1" as the input error and the data stored in the area "Weight-1" as weights, and the processing result is stored as the output error in the area "Diff-0". In addition, processing is executed using the data stored in the area "Diff-1" as the input error and the data stored in the area "Data-1" as input data, and the processing result is stored as the weight error in the area "DeltaW-1".

In layer 0, processing is executed using the data stored in the area "Diff-0" as the input error and the data stored in the area "Weight-0" as weights, and the final error is calculated. In addition, processing is executed using the data stored in the area "Diff-0" as the input error and the data stored in the area "Data-0" as input data, and the processing result is stored as the weight error in the area "DeltaW-0".

The matrix of weight errors calculated in step S37 (hereinafter referred to as the weight matrix error) is used in each information processing apparatus to update the weight matrix.

Returning to the description of FIG. 12, when the i-th layer is a Latin square fully connected layer (step S35: Yes route), the second calculation unit 103 executes the back propagation operation for the Latin square fully connected layer (step S39). The processing of step S39 will be described with reference to FIGS. 14 to 22.

まず、第２計算部１０３は、ワークデータ格納部１１５における、誤差の格納領域（すなわち、領域「Ｄｉｆｆ−ｉ」）からデータを入力誤差データとして読み出す（図１４：ステップＳ６１）。 First, the second calculation unit 103 reads data as input error data from the error storage area (that is, the area “Diff-i”) in the work data storage unit 115 (FIG. 14: step S61).

The second calculation unit 103 reads the second weight matrix and the first weight matrix from the weight storage areas (that is, the area “Weight-i” and the area “Weight-(i-1)”) in the work data storage unit 115 (step S63). In the following, the second weight matrix is denoted M₂ and the first weight matrix is denoted M₁.

第２計算部１０３は、ステップＳ６１において読み出された入力誤差データの行列と、ステップＳ６３において読み出された第２の重み行列との行列積を計算する（ステップＳ６５）。計算の結果は、ワークデータ格納部１１５における格納領域（例えば、領域「Ｄｉｆｆ−（ｉ−１）」）に格納される。 The second calculator 103 calculates a matrix product of the matrix of the input error data read in step S61 and the second weight matrix read in step S63 (step S65). The calculation result is stored in a storage area (for example, the area “Diff− (i−1)”) in the work data storage unit 115.

第２計算部１０３は、ステップＳ６５の行列積の結果と、ステップＳ６３において読み出した第１の重み行列との行列積を計算する（ステップＳ６７）。 The second calculation unit 103 calculates the matrix product of the matrix product result in step S65 and the first weight matrix read in step S63 (step S67).

Thus, in steps S65 and S67, calculation is performed in the direction opposite to that of steps S25 and S27. For example, if the matrix of the input error data is ΔZ, the matrix ΔY of the errors of the intermediate values is calculated as ΔZ*M₂→ΔY, and the matrix ΔX of the output error data is then calculated as ΔY*M₁→ΔX.

第２計算部１０３は、ステップＳ６７において算出された行列積の結果を、ワークデータ格納部１１５における、データの格納領域（すなわち、領域「Ｄｉｆｆ−（ｉ−２）」）に出力データとして書き込む（ステップＳ６９）。 The second calculation unit 103 writes the result of the matrix product calculated in step S67 as output data in the data storage area (that is, the area “Diff− (i−2)”) in the work data storage unit 115 ( Step S69).

第２計算部１０３は、データの格納領域（すなわち、領域「Ｄａｔａ−ｉ」及び領域「Ｄａｔａ−（ｉ−１）」）からデータを入力データとして読み出す（ステップＳ７１）。 The second calculation unit 103 reads data as input data from the data storage area (that is, the area “Data-i” and the area “Data- (i−1)”) (step S71).

Based on matrix products of the input data matrices read in step S71 and the error data matrices, the second calculation unit 103 calculates the error of the second weight matrix and the error of the first weight matrix (step S73). Specifically, letting Y be the input data read from the area “Data-i” in step S71 and ΔZ be the input error data read in step S61, the error ΔM₂ of the second weight matrix is calculated as Y*ΔZ→ΔM₂. Likewise, letting X be the input data read from the area “Data-(i-1)” in step S71 and ΔY be the result of the matrix product calculated in step S65, the error ΔM₁ of the first weight matrix is calculated as X*ΔY→ΔM₁.

本実施の形態においては、全結合が３つの層を用いて実現されるので、２つの重み行列の誤差がそれぞれ計算される。 In the present embodiment, since total coupling is realized using three layers, errors of two weight matrices are calculated respectively.
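
The back propagation through the three layers (steps S65, S67 and S73) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes that each row of a data matrix is one sample, that activation functions are omitted, and that the transposes of the usual row-vector convention are applied (the text writes the products without transposes). The helper names `matmul`, `transpose` and `backward_three_layer` are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Transpose of a list of rows."""
    return [list(r) for r in zip(*m)]


def backward_three_layer(x, y, delta_z, m1, m2):
    """Return (delta_x, delta_m1, delta_m2) for Z = (X * M1) * M2.

    x       : input to the first layer (contents of "Data-(i-1)")
    y       : intermediate values X * M1 (contents of "Data-i")
    delta_z : input error data (contents of "Diff-i")
    """
    # Step S65: error of the intermediate values.
    delta_y = matmul(delta_z, transpose(m2))
    # Step S67: output error data passed on to the preceding layer.
    delta_x = matmul(delta_y, transpose(m1))
    # Step S73: errors of the two weight matrices.
    delta_m2 = matmul(transpose(y), delta_z)
    delta_m1 = matmul(transpose(x), delta_y)
    return delta_x, delta_m1, delta_m2
```

Because two weight matrices sit between the first and third layers, the function returns two weight errors, one for each matrix.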

The second calculation unit 103 executes mask processing using a mask matrix on the error of the second weight matrix and the error of the first weight matrix calculated in step S73 (step S75), thereby updating both errors. The mask matrix will be described later.
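
As a rough illustration of the mask processing in step S75, the sketch below multiplies a weight error matrix by a mask matrix element by element. It assumes that the mask holds 1 where an edge exists and 0 elsewhere, and that the mask is applied position by position; the function name `apply_mask` is hypothetical.

```python
def apply_mask(delta_w, mask):
    """Zero every weight error whose edge does not exist.

    delta_w and mask are lists of rows with identical shapes.
    """
    return [[d * m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(delta_w, mask)]
```

Entries at positions where the mask is 0 become 0, so the masked weight error matrix is sparse whenever the mask is sparse.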

ｄｗは重みの誤差を表す。バッチサイズがｍ（ｍは自然数）である場合、上記計算は以下のような式で表される。 dw represents a weight error. When the batch size is m (m is a natural number), the above calculation is expressed by the following equation.

The second calculation unit 103 writes the error of the second weight matrix and the error of the first weight matrix, on which the mask processing was executed in step S75, to the error storage areas (for example, the area “Delta_W-i” and the area “Delta_W-(i-1)”) (step S77).

第２計算部１０３は、２層分の処理を実行したので、ｉを１デクリメントする（ステップＳ７９）。処理は呼び出し元に戻る。 Since the second calculation unit 103 has executed processing for two layers, i is decremented by 1 (step S79). Processing returns to the caller.

The mask matrix will be described with reference to FIGS. 15 to 22.

First, consider generating a mask matrix for a Latin square fat tree as shown in FIG. 15. The Latin square fat tree shown in FIG. 15 corresponds to half of the Latin square full coupling layer shown in FIG. 3. A Latin square fat tree contains (n²+n+1) nodes, and each node has (n+1) edges. The Latin square fat tree shown in FIG. 15 corresponds to the case of n=2.

The Latin square fat tree shown in FIG. 15 includes two layers of seven nodes each, so up to 49 (=7*7) edges could exist. A mask matrix having 49 elements, as shown in FIG. 16(a), can therefore be considered; however, since only 21 edges actually exist, 21 of the 49 elements are specified. The elements other than the specified ones are then set to 0 and the result is used as the mask matrix, so that a matrix subjected to the mask processing is converted into a sparse matrix.

As shown in FIG. 16(b), the mask matrix is divided into the n²*n² region at the lower right and the remaining part. Elements are specified in the remaining part first. To begin with, (n+1) elements are specified as the mask in the first row.

Next, as shown in FIG. 17(a), n elements are specified as the mask in the second row, and n elements are specified as the mask in the third row.

The same operation is also performed for the columns. That is, as shown in FIG. 17(b), (n+1) elements are specified as the mask in the first column. However, since one of these elements has already been specified, two new elements are specified here. Then, as shown in FIG. 18, n elements are specified as the mask in the second column, and n elements are specified as the mask in the third column. With the operations so far, the specification of the mask is complete for the part outside the n²*n² region.

Next, the mask is specified within the n²*n² region. First, as shown in FIG. 19(a), the n²*n² region is divided into n*n blocks. Then, as shown in FIG. 19(b), a two-dimensional ID (x, y) is assigned to each block: (0,0) to the upper left block, (0,1) to the lower left block, (1,0) to the upper right block, and (1,1) to the lower right block.

Then, for the i-th row (0≤i≤(n-1)) in each n*n block, the element in column ((i+(x*y)) mod 2) is specified as the mask. For example, for the upper left block whose ID is (0,0), ((0+(0*0)) mod 2)=0 holds for row 0 (here, the upper of the two rows), so the element in column 0 (here, the left of the two columns) is specified as the mask. For row 1, ((1+(0*0)) mod 2)=1 holds, so the element in column 1 (here, the right of the two columns) is specified as the mask. The mask matrix shown in FIG. 20 is completed by the above processing. The generated mask matrix is applied to both the first weight matrix and the second weight matrix.
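
The mask generation procedure above can be sketched as follows. Several points are assumptions rather than details taken from the figures: the exact positions chosen outside the n²*n² region (the figures are not reproduced here), the generalization of the modulus from 2 to n, the restriction to prime n, and the names `latin_square_mask` and `reaches_all`. The check in `reaches_all` further assumes that the second edge group mirrors the first.

```python
def latin_square_mask(n):
    """Build an (n^2+n+1) x (n^2+n+1) 0/1 mask with (n+1) ones per row."""
    size = n * n + n + 1
    mask = [[0] * size for _ in range(size)]

    # First row and first column: (n+1) elements each.
    for k in range(n + 1):
        mask[0][k] = 1
        mask[k][0] = 1

    # Rows 1..n and columns 1..n: n further elements each.
    # Assumed placement: row r covers block-column (r-1) of the
    # n^2 x n^2 region, and column r covers block-row (r-1).
    for r in range(1, n + 1):
        for j in range(n):
            k = (n + 1) + (r - 1) * n + j
            mask[r][k] = 1
            mask[k][r] = 1

    # Lower-right n^2 x n^2 region: in the block with ID (x, y),
    # row i selects column (i + x*y) mod n.
    for y in range(n):
        for x in range(n):
            for i in range(n):
                row = (n + 1) + y * n + i
                col = (n + 1) + x * n + (i + x * y) % n
                mask[row][col] = 1
    return mask


def reaches_all(mask):
    """True if every unit u shares at least one middle unit with every unit w."""
    size = len(mask)
    return all(
        any(mask[u][v] and mask[w][v] for v in range(size))
        for u in range(size)
        for w in range(size)
    )
```

For n=2 this yields a 7*7 mask with 21 nonzero elements, three in every row and column, and the upper left block of the n²*n² region selects columns 0 and 1 for rows 0 and 1, as in the example in the text.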

Next, a method for generating a mask matrix having a width of (n²+n+1) when n=3 will be described with reference to FIG. 21.

In FIG. 21, the mask matrix 2101 is divided into the n²*n² region and the remaining part, and in the n²*n² region the diagonal elements are initially specified as the mask. The n²*n² region is divided into n*n blocks, and the diagonal elements in each block are shifted to the right according to the Galois field (here, GF(3)) 2103. The mask matrix 2102 is thereby completed.

A Galois field is a set with a finite number of elements and is also called a finite field. GF(n) is generated by calculating mod n of a calculation result. The Galois field corresponds to the finite projective plane of the Latin square fat tree. Galois fields are used in communication error correction and in AES (Advanced Encryption Standard). FIG. 22 shows an example of GF(5). FIG. 22(a) shows the Galois field generated by calculating mod 5 of the sum of element numbers, and FIG. 22(b) shows the Galois field generated by calculating mod 5 of the product of element numbers.
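
The two tables described for GF(5) can be reproduced with a few lines of code. This is only a sketch; the function name `gf_tables` is hypothetical, and arithmetic mod p forms a field only when p is prime.

```python
def gf_tables(p):
    """Return the addition and multiplication tables of GF(p), p prime.

    Entry [a][b] of the first table is (a + b) mod p,
    entry [a][b] of the second table is (a * b) mod p.
    """
    add = [[(a + b) % p for b in range(p)] for a in range(p)]
    mul = [[(a * b) % p for b in range(p)] for a in range(p)]
    return add, mul
```

Calling `gf_tables(5)` gives two 5*5 tables; every row of the addition table, and every nonzero row of the multiplication table, is a permutation of 0 to 4.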

図１２の説明に戻り、第２計算部１０３は、ｉを１デクリメントする（ステップＳ４１）。第２計算部１０３は、ｉ≧０が成立するか判定する（ステップＳ４３）。 Returning to the description of FIG. 12, the second calculation unit 103 decrements i by 1 (step S41). The second calculation unit 103 determines whether i ≧ 0 is satisfied (step S43).

If i≥0 holds (step S43: Yes route), the process returns to step S35. On the other hand, if i≥0 does not hold (step S43: No route), the process proceeds to step S45 in FIG. 23 via terminal A.

図２３の説明に移行し、通信部１０５は、他の情報処理装置（すなわち、情報処理装置１ｂ及び１ｃ）において計算された、重み行列の誤差を他の情報処理装置から受信し、情報処理装置１ａの第２計算部１０３が計算した、重み行列の誤差を他の情報処理装置に送信する（ステップＳ４５）。 Shifting to the description of FIG. 23, the communication unit 105 receives the error of the weight matrix calculated in the other information processing apparatuses (that is, the information processing apparatuses 1 b and 1 c) from the other information processing apparatuses, and the information processing apparatus The error of the weight matrix calculated by the second calculation unit 103 of 1a is transmitted to another information processing apparatus (step S45).

第２計算部１０３は、層の番号を表す変数であるｉに（Ｎ−１）を設定する（ステップＳ４７）。 The second calculation unit 103 sets (N−1) to i, which is a variable representing the layer number (step S47).

第２計算部１０３は、第ｉ層について、重み行列の誤差の行列を用いて重み行列を更新する（ステップＳ４９）。例えば、重み行列の誤差をΔＷとし、重み行列をＷとすると、Ｗ−ΔＷ→Ｗとして重み行列が更新される。但し、ラーニングレートやバイアス等をさらに用いて重み行列を更新してもよい。 The second calculation unit 103 updates the weight matrix for the i-th layer using the error matrix of the weight matrix (step S49). For example, if the error of the weight matrix is ΔW and the weight matrix is W, the weight matrix is updated as W−ΔW → W. However, the weighting matrix may be updated by further using a learning rate, a bias, or the like.

第２計算部１０３は、ｉを１デクリメントする（ステップＳ５１）。第２計算部１０３は、ｉ≧０が成立するか判定する（ステップＳ５３）。ｉ≧０が成立する場合（ステップＳ５３：Ｙｅｓルート）、処理はステップＳ４９に戻る。一方、ｉ≧０が成立しない場合（ステップＳ５３：Ｎｏルート）、処理は終了する。 The second calculation unit 103 decrements i by 1 (step S51). The second calculation unit 103 determines whether i ≧ 0 is satisfied (step S53). If i ≧ 0 holds (step S53: Yes route), the process returns to step S49. On the other hand, when i ≧ 0 is not satisfied (step S53: No route), the process ends.
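
The update loop of steps S47 to S53 can be sketched as below. The layer-indexed lists and the `learning_rate` parameter are assumptions made for illustration (the text only notes that a learning rate may additionally be used), and the name `update_weights` is hypothetical. With `learning_rate=1.0` the update reduces to W-ΔW→W.

```python
def update_weights(weights, weight_errors, learning_rate=1.0):
    """Update every layer's weight matrix in place, from layer N-1 down to 0.

    weights[i] and weight_errors[i] are lists of rows for layer i.
    """
    i = len(weights) - 1                      # step S47: i = N - 1
    while i >= 0:                             # step S53: continue while i >= 0
        w, dw = weights[i], weight_errors[i]
        for r in range(len(w)):               # step S49: W - dW -> W
            for c in range(len(w[r])):
                w[r][c] -= learning_rate * dw[r][c]
        i -= 1                                # step S51: decrement i
    return weights
```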

以上のように、本実施の形態の方法によれば、ＤＮＮにおいて全結合される部分のエッジ数が減るので、パラメータである重みの数が減る。これにより、メモリ２５０１等の記憶装置の容量の消費を抑制できるようになり、また、計算量を削減することができるようになる。従って、プロセッサやメモリ等の物理リソースが乏しい計算機を使用した運用にも有効である。さらに、通信において情報処理装置間で交換されるデータの量が減るので、ＤＮＮの並列計算に要する時間を削減できるようになる。 As described above, according to the method of the present embodiment, the number of edges of the part that is fully combined in the DNN is reduced, so that the number of weights that are parameters is reduced. As a result, the consumption of the capacity of the storage device such as the memory 2501 can be suppressed, and the calculation amount can be reduced. Therefore, it is also effective for operation using a computer with scarce physical resources such as a processor and a memory. Furthermore, since the amount of data exchanged between information processing devices in communication is reduced, the time required for DNN parallel computation can be reduced.

図２４に、情報処理装置の台数と並列計算に要する時間との関係を示す。図２４において、縦軸は時間を表し、横軸は情報処理装置の台数を表す。図２４に示すように、情報処理装置の台数を増やせば、計算に要する時間は短縮される。しかし、たとえ情報処理装置の台数を増やしたとしても、通信に要する時間は短縮されない。そのため、情報処理装置の台数を増やすほど通信に要する時間が問題になる。しかし、本実施の形態の方法によれば、通信に要する時間を短縮することができるので、特に情報処理装置の数が多い場合には並列計算の性能向上の効果が大きい。 FIG. 24 shows the relationship between the number of information processing apparatuses and the time required for parallel calculation. In FIG. 24, the vertical axis represents time, and the horizontal axis represents the number of information processing apparatuses. As shown in FIG. 24, if the number of information processing apparatuses is increased, the time required for calculation is shortened. However, even if the number of information processing apparatuses is increased, the time required for communication is not shortened. Therefore, the time required for communication becomes a problem as the number of information processing apparatuses increases. However, according to the method of the present embodiment, the time required for communication can be shortened, so that the effect of improving the performance of parallel computing is great particularly when the number of information processing apparatuses is large.

Note that a Latin square full coupling layer cannot be constructed with an arbitrary number of units. In principle, each unit has (n+1) edges, and the number of nodes is then (n²+n+1). When the number of units is (n²+n+1), the total number of edges is 2(n²+n+1)(n+1). In contrast, when the number of units of a normal full coupling layer is (n²+n+1), the total number of edges is (n²+n+1)².

On this premise, the reduction in the number of edges will be described with reference to FIGS. 25 and 26. FIG. 25 is a diagram showing the relationship between the number of edges and the number of units for the Latin square full coupling layer and for a normal full coupling layer. In FIG. 25, the vertical axis represents the number of edges and the horizontal axis represents the number of units. For example, when the number of units is about 1000, the number of edges of the Latin square full coupling layer is about 100,000, and the number of edges of a normal full coupling layer is about 1,000,000.

FIG. 26 is a diagram showing the ratio of the number of edges of the Latin square full coupling layer to the number of edges of a normal full coupling layer. In FIG. 26, the vertical axis represents the ratio and the horizontal axis represents the number of units. When the ratio is less than 1, the number of edges of the Latin square full coupling layer is smaller than that of a normal full coupling layer. Thus, except when the number of units is extremely small, the Latin square full coupling layer has fewer edges than a normal full coupling layer.
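
The edge counts behind FIGS. 25 and 26 follow directly from the two formulas 2(n²+n+1)(n+1) and (n²+n+1)². The sketch below simply evaluates them; the function name `edge_counts` is hypothetical.

```python
def edge_counts(n):
    """Return (units, latin_edges, dense_edges) for a given n.

    units       = n^2 + n + 1
    latin_edges = 2 * units * (n + 1)   (Latin square full coupling layer)
    dense_edges = units ** 2            (normal full coupling layer)
    """
    units = n * n + n + 1
    latin_edges = 2 * units * (n + 1)
    dense_edges = units * units
    return units, latin_edges, dense_edges
```

The ratio `latin_edges / dense_edges` simplifies to 2(n+1)/(n²+n+1), which exceeds 1 only for n=1 (three units) and falls below 1 from n=2 onward.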

[Embodiment 2]
As shown in FIG. 27, the Latin square full coupling layer in the first embodiment has a shape in which two Latin square fat trees are connected. The mask matrix 2701 corresponds to the left edge group, and the mask matrix 2702 corresponds to the right edge group. Filled elements correspond to edges. The output from the unit u27 in the left layer is passed to all the units in the right layer via the units in the middle layer. In FIG. 27, the number of units in the left layer, the number of units in the middle layer, and the number of units in the right layer are equal.

However, the number of units in the left layer and the number of units in the right layer need not be the same as the number of units in the middle layer. For example, as shown in FIG. 28, a Latin square full coupling layer may be constructed from the units of the left layer enclosed by the broken line, the units of the middle layer, and the units of the right layer enclosed by the broken line. In this case, the number of rows of each mask matrix is reduced according to the number of units. The number of units in the middle layer is (n²+n+1), and the smallest n is searched for such that this number is equal to or greater than both the number of units enclosed by the broken line in the right layer and the number of units enclosed by the broken line in the left layer.

As a result, the Latin square full coupling layer of the present embodiment can be applied to any full coupling layer.
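
The search for the smallest n can be sketched as a simple loop. The `require_prime` flag is an assumption that reflects the constraint, stated in the appendix, that n is a prime number; the function names are hypothetical.

```python
def is_prime(k):
    """Trial-division primality test for small k."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def smallest_order(left_units, right_units, require_prime=True):
    """Smallest n with n^2 + n + 1 >= max(left_units, right_units)."""
    needed = max(left_units, right_units)
    n = 1
    while True:
        enough = n * n + n + 1 >= needed
        if enough and (is_prime(n) or not require_prime):
            return n
        n += 1
```

For example, layers of 8 and 5 units need n=3, which gives a middle layer of 13 units.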

[Embodiment 3]
The parallel computing system 1000 according to the first embodiment includes a plurality of information processing apparatuses, and the plurality of information processing apparatuses are connected via a network. However, the form of the system is not limited to this. For example, as shown in FIG. 29, the system may be one in which processors 10a to 10c are connected via a bus or the like within an information processing apparatus 1 and the processors 10a to 10c execute parallel computation.

以上本発明の一実施の形態を説明したが、本発明はこれに限定されるものではない。例えば、上で説明した情報処理装置１ａの機能ブロック構成は実際のプログラムモジュール構成に一致しない場合もある。 Although one embodiment of the present invention has been described above, the present invention is not limited to this. For example, the functional block configuration of the information processing apparatus 1a described above may not match the actual program module configuration.

また、上で説明した情報処理装置１ａ乃至１ｃおよび情報処理装置１の構成は一例であって、上記のような構成でなければならないわけではない。さらに、処理フローにおいても、処理結果が変わらなければ処理の順番を入れ替えることも可能である。さらに、並列に実行させるようにしても良い。 Further, the configurations of the information processing apparatuses 1a to 1c and the information processing apparatus 1 described above are merely examples, and the configuration as described above is not necessarily required. Further, in the processing flow, the processing order can be changed if the processing result does not change. Further, it may be executed in parallel.

[Appendix]
This appendix describes the Latin square fat tree and the finite projective plane.

The structure of a finite projective plane can be replaced with a topology structure. The topology structure shown in FIG. 30(a) is replaced with the structure of the finite projective plane shown in FIG. 30(b). In FIG. 30(a), a hatched rectangle represents a spine switch, an unhatched rectangle represents a leaf switch, and a circle represents a server. In FIG. 30(b), a straight line represents a spine switch and a point represents a leaf switch.

The topology structure shown in FIG. 31(a) is the topology structure of a Latin square fat tree in which the number of spine switches is 7 and the number of leaf switches is 7, and it corresponds to the structure of the finite projective plane shown in FIG. 30(b). The topology structure of the portion surrounded by the thick line in FIG. 31(a) is the same as the topology structure of FIG. 30(a). The structure of the portion surrounded by the thick line in FIG. 31(b) corresponds to the topology structure of the portion surrounded by the thick line in FIG. 31(a).

A finite projective plane corresponds to a plane obtained by adding several points at infinity to an ordinary plane and eliminating “two parallel straight lines”. FIG. 32 shows the structure of a finite projective plane when the order (hereinafter denoted n) is 2 and the number of ports is 6 (=2(n+1)). In FIG. 32, the 3 (=n+1) leaf switches other than the 4 (=n*n) leaf switches on the hatched rectangle correspond to the points at infinity.

In a finite projective plane, there are (n²+n+1) points. The number of straight lines is also (n²+n+1). Any two straight lines intersect at exactly one point, and there is exactly one straight line connecting any two points. However, there is a restriction that n is a prime number.

図３２に示した構造は、図３３に示す構造に変換することができる。図３３において、ハッチングされた格子部分は、図３２におけるハッチング部分に対応する。格子部分において平行な直線群は、追加の点において交わるように変換される。すなわち、傾きが等しい直線同士が交わるように変換される。 The structure shown in FIG. 32 can be converted into the structure shown in FIG. In FIG. 33, the hatched lattice portion corresponds to the hatched portion in FIG. Parallel lines in the grid portion are transformed to meet at additional points. That is, conversion is performed so that straight lines having the same inclination intersect.

以上で付録を終了する。 This completes the appendix.

The information processing apparatuses 1a to 1c described above are computer apparatuses in which, as shown in FIG. 34, a memory 2501, a CPU 2503, an HDD 2505, a display control unit 2507 connected to a display device 2509, a drive device 2513 for a removable disk 2511, an input device 2515, and a communication control unit 2517 for connecting to a network are connected by a bus 2519. An operating system (OS) and an application program for executing the processing in this embodiment are stored in the HDD 2505 and are read from the HDD 2505 into the memory 2501 when executed by the CPU 2503. The CPU 2503 controls the display control unit 2507, the communication control unit 2517, and the drive device 2513 according to the processing content of the application program so that they perform predetermined operations. Data in the middle of processing is stored mainly in the memory 2501 but may be stored in the HDD 2505. In the embodiment of the present invention, the application program for executing the processing described above is stored on the computer-readable removable disk 2511 and distributed, and is installed from the drive device 2513 into the HDD 2505. The program may also be installed in the HDD 2505 via a network such as the Internet and the communication control unit 2517. Such a computer apparatus realizes the various functions described above through organic cooperation between hardware such as the CPU 2503 and the memory 2501 and programs such as the OS and the application program.

以上述べた本発明の実施の形態をまとめると、以下のようになる。 The embodiment of the present invention described above is summarized as follows.

An information processing apparatus (for example, the information processing apparatus 1a) according to a first aspect of the present embodiment includes (A) a first calculation unit (for example, the first calculation unit 101) that executes forward propagation calculation of a multilayer neural network including a first layer, a second layer that receives output from the first layer, and a third layer that receives output from the second layer, and (B) a second calculation unit (for example, the second calculation unit 103) that executes back propagation calculation of the multilayer neural network based on the result of the forward propagation calculation. The output from each unit of the first layer is input to all units of the third layer via the units of the second layer. In addition, the number of units in the second layer is greater than the number of edges between each unit of the first layer and the second layer and the number of edges between each unit of the third layer and the second layer.

第１の層と第３の層とが全結合されるようにしつつ、エッジの数が減るのでパラメータ（例えば、エッジに対応する重み）の数を削減できるようになる。また、エッジの数が減ることで計算量を減らすことができるようになる。 Since the number of edges is reduced while the first layer and the third layer are fully coupled, the number of parameters (for example, weights corresponding to the edges) can be reduced. Also, the amount of calculation can be reduced by reducing the number of edges.

The second calculation unit may (b1) calculate, by back propagation calculation using the result of the forward propagation calculation and a target output, a first weight error matrix corresponding to the edges between the first layer and the second layer and a second weight error matrix corresponding to the edges between the second layer and the third layer, (b2) convert the first weight error matrix into a first sparse matrix and the second weight error matrix into a second sparse matrix by mask processing on the two weight error matrices, (b3) update the first weight matrix based on the first sparse matrix, and (b4) update the second weight matrix based on the second sparse matrix.

上記のような多層ニューラルネットワークであっても、重みの更新を適切に行うことができるようになる。また、疎行列を利用することで計算を単純化することができるようになる。 Even in the multilayer neural network as described above, the weight can be appropriately updated. In addition, the calculation can be simplified by using a sparse matrix.

また、マスク処理において、第１の重みの誤差の行列とマスク行列とのクロネッカー積と、第２の重みの誤差の行列とマスク行列とのクロネッカー積とを実行してもよい。 In the mask process, a Kronecker product of the first weight error matrix and the mask matrix and a Kronecker product of the second weight error matrix and the mask matrix may be executed.

また、マスク行列は、存在するエッジに対応する要素が所定値に設定され、且つ、存在しないエッジに対応する要素が零に設定された行列であってもよい。 Further, the mask matrix may be a matrix in which elements corresponding to existing edges are set to a predetermined value and elements corresponding to non-existing edges are set to zero.

The first calculation unit may (a1) calculate a first matrix product of the output from the first layer and the first weight matrix, and (a2) calculate the output from the third layer based on a second matrix product of the result of the first matrix product and the second weight matrix. Even with a multilayer neural network as described above, input data can be classified appropriately.

The information processing apparatus may further include (C) a communication unit (for example, the communication unit 105) that transmits the result of the back propagation calculation to another information processing apparatus in a parallel computing system including the information processing apparatus, and receives, from the other information processing apparatus, the result of the back propagation calculation executed in the other information processing apparatus. Because the number of parameters is reduced, the amount of communication data decreases and the time required for communication can be shortened.

The number of edges between each unit of the first layer and the second layer and the number of edges between each unit of the third layer and the second layer may each be (n+1), where n is a natural number, and the number of units in the second layer may be (n²+n+1).

An information processing method according to a second aspect of the present embodiment includes processing of (D) executing forward propagation calculation of a multilayer neural network including a first layer, a second layer that receives output from the first layer, and a third layer that receives output from the second layer, and (E) executing back propagation calculation of the multilayer neural network based on the result of the forward propagation calculation. The output from each unit of the first layer is input to all units of the third layer via the units of the second layer. In addition, the number of units in the second layer is greater than the number of edges between each unit of the first layer and the second layer and the number of edges between each unit of the third layer and the second layer.

A program for causing a processor to perform the processing of the above method can be created, and the program is stored in a computer-readable storage medium or storage device such as a flexible disk, a CD-ROM, a magneto-optical disk, a semiconductor memory, or a hard disk. Intermediate processing results are temporarily stored in a storage device such as a main memory.

The following appendices are further disclosed with respect to the embodiments, including the examples above.

(Appendix 1)
An information processing apparatus comprising:
a first calculation unit that executes a forward propagation calculation of a multilayer neural network including a first layer, a second layer that receives output from the first layer, and a third layer that receives output from the second layer; and
a second calculation unit that executes a back propagation calculation of the multilayer neural network based on the result of the forward propagation calculation,
wherein the output from each unit of the first layer is input to all units of the third layer via the units of the second layer, and
the number of units in the second layer is greater than the number of edges between each unit of the first layer and the second layer and greater than the number of edges between each unit of the third layer and the second layer.

(Appendix 2)
The information processing apparatus according to Appendix 1, wherein the second calculation unit:
calculates, by the back propagation calculation using the result of the forward propagation calculation and a target output, a matrix of errors of first weights corresponding to the edges between the first layer and the second layer, and a matrix of errors of second weights corresponding to the edges between the second layer and the third layer;
converts, by mask processing on the matrix of errors of the first weights and the matrix of errors of the second weights, the matrix of errors of the first weights into a first sparse matrix and the matrix of errors of the second weights into a second sparse matrix;
updates the matrix of the first weights based on the first sparse matrix; and
updates the matrix of the second weights based on the second sparse matrix.
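
As a non-limiting illustration of the steps of Appendix 2, the following Python sketch performs one update for a three-layer network: it computes the two weight-error matrices, masks them into sparse matrices, and updates the weights. Linear units, a squared-error loss, a plain gradient step with learning rate `lr`, an element-wise mask with entries 0 and 1, and the helper names `matmul`, `transpose`, `hadamard`, and `train_step` are all assumptions of this example and are not taken from the source text.

```python
def matmul(a, b):
    """Plain matrix product of nested lists: (r x m) times (m x c)."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def hadamard(a, b):
    """Element-wise product, used here as the assumed mask processing."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def train_step(x, target, w1, w2, mask1, mask2, lr=0.1):
    # Forward propagation (linear units assumed).
    hidden = matmul(x, w1)
    output = matmul(hidden, w2)
    # Back propagation for the loss 0.5 * sum((output - target)^2).
    err = [[o - t for o, t in zip(ro, rt)] for ro, rt in zip(output, target)]
    grad_w2 = matmul(transpose(hidden), err)
    grad_w1 = matmul(transpose(x), matmul(err, transpose(w2)))
    # Mask processing: dense error matrices become sparse matrices.
    sparse_g1 = hadamard(grad_w1, mask1)
    sparse_g2 = hadamard(grad_w2, mask2)
    # Weight update from the sparse matrices only.
    new_w1 = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(w1, sparse_g1)]
    new_w2 = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(w2, sparse_g2)]
    return new_w1, new_w2


mask1 = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
mask2 = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
w1 = [[0.5 * m for m in row] for row in mask1]
w2 = [[0.5 * m for m in row] for row in mask2]
x = [[1.0, 0.0, 0.0]]
target = [[1.0, 1.0, 1.0]]
new_w1, new_w2 = train_step(x, target, w1, w2, mask1, mask2)
```

In this example the unmasked error matrix for the first weights has a nonzero entry at a position where no edge exists; without the mask, the update would create a weight on that nonexistent edge.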

(Appendix 3)
The information processing apparatus according to Appendix 2, wherein, in the mask processing, a Kronecker product of the matrix of errors of the first weights and a mask matrix and a Kronecker product of the matrix of errors of the second weights and the mask matrix are computed.

(Appendix 4)
The information processing apparatus according to Appendix 3, wherein the mask matrix is a matrix in which elements corresponding to existing edges are set to a predetermined value and elements corresponding to nonexistent edges are set to zero.
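
As a non-limiting illustration of the mask matrix of Appendix 4, the following Python sketch zeroes the entries of a weight-error matrix at positions where no edge exists. The text above calls the operation a Kronecker product; this sketch assumes that the intended effect is an element-wise product with a mask of the same shape as the error matrix, and it assumes the predetermined value is 1. The name `apply_mask` and the sample values are hypothetical.

```python
def apply_mask(grad, mask):
    """Element-wise product: keep entries where an edge exists, zero the rest."""
    return [[g * m for g, m in zip(grad_row, mask_row)]
            for grad_row, mask_row in zip(grad, mask)]


# 1 marks an existing edge (assumed predetermined value), 0 a nonexistent edge.
mask = [[1, 1, 0],
        [0, 1, 1],
        [1, 0, 1]]
# Dense weight-error matrix as it might come out of back propagation.
grad = [[0.5, -0.2, 0.7],
        [0.1, 0.3, -0.4],
        [-0.6, 0.9, 0.2]]
sparse_grad = apply_mask(grad, mask)
```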

(Appendix 5)
The information processing apparatus according to any one of Appendices 2 to 4, wherein the first calculation unit:
calculates a first matrix product of the output from the first layer and the matrix of the first weights; and
calculates the output from the third layer based on a second matrix product of the result of the first matrix product and the matrix of the second weights.
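
As a non-limiting illustration of the two matrix products of Appendix 5, the following Python sketch propagates one sample through sparse weight matrices in which a zero stands for a nonexistent edge. Activation functions and biases are omitted, which is an assumption of this example; the names `matmul` and `forward` and the weight values are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of nested lists: (r x m) times (m x c)."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def forward(x, w1, w2):
    """Two chained matrix products; activation functions are omitted."""
    hidden = matmul(x, w1)       # first matrix product: first-layer output times W1
    output = matmul(hidden, w2)  # second matrix product: that result times W2
    return hidden, output


# Sparse 3 x 3 weights: each unit has two edges, zeros mark missing edges.
w1 = [[1.0, 2.0, 0.0],
      [0.0, 1.0, 3.0],
      [2.0, 0.0, 1.0]]
w2 = [[1.0, 0.0, 1.0],
      [1.0, 1.0, 0.0],
      [0.0, 1.0, 1.0]]
x = [[1.0, 0.0, 0.0]]  # one sample in which only the first unit is active
hidden, output = forward(x, w1, w2)
```

Although the first unit has only two outgoing edges, every entry of `output` is nonzero, which shows its output reaching all third-layer units via the second layer.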

(Appendix 6)
The information processing apparatus according to any one of Appendices 1 to 5, further comprising a communication unit that transmits the result of the back propagation calculation to another information processing apparatus in a parallel computing system that includes the information processing apparatus, and that receives from the other information processing apparatus the result of the back propagation calculation executed in the other information processing apparatus.
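
As a non-limiting illustration of why fewer parameters mean less communication, the following Python sketch packs only the error values at existing edges before they would be exchanged, and combines the values of two apparatuses afterward. The exchange is simulated inside one process, and combining by averaging is an assumption of this example, since this section does not state how received results are used. The names `pack`, `unpack`, and `average` are hypothetical.

```python
def pack(grad, mask):
    """Flatten only the entries at existing edges (mask == 1) for sending."""
    return [g for grad_row, mask_row in zip(grad, mask)
            for g, m in zip(grad_row, mask_row) if m]


def unpack(values, mask):
    """Rebuild a full matrix from packed values, with zeros at missing edges."""
    it = iter(values)
    return [[next(it) if m else 0.0 for m in mask_row] for mask_row in mask]


def average(payloads):
    """Combine the packed payloads of several apparatuses (assumed averaging)."""
    return [sum(vals) / len(vals) for vals in zip(*payloads)]


mask = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
grad_a = [[2.0, 4.0, 0.0], [0.0, 6.0, 8.0], [1.0, 0.0, 3.0]]  # apparatus A
grad_b = [[4.0, 0.0, 0.0], [0.0, 2.0, 4.0], [3.0, 0.0, 1.0]]  # apparatus B
payload_a = pack(grad_a, mask)  # 6 values to send instead of 9
payload_b = pack(grad_b, mask)
combined = unpack(average([payload_a, payload_b]), mask)
```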

(Appendix 7)
The information processing apparatus according to any one of Appendices 1 to 6, wherein the number of edges between each unit of the first layer and the second layer, and the number of edges between each unit of the third layer and the second layer, are n + 1 (where n is a natural number), and
the number of units in the second layer is n² + n + 1.

(Appendix 8)
An information processing method in which a computer executes processing comprising:
executing a forward propagation calculation of a multilayer neural network including a first layer, a second layer that receives output from the first layer, and a third layer that receives output from the second layer; and
executing a back propagation calculation of the multilayer neural network based on the result of the forward propagation calculation,
wherein the output from each unit of the first layer is input to all units of the third layer via the units of the second layer, and
the number of units in the second layer is greater than the number of edges between each unit of the first layer and the second layer and greater than the number of edges between each unit of the third layer and the second layer.

(Appendix 9)
A program that causes a computer to execute processing comprising:
executing a forward propagation calculation of a multilayer neural network including a first layer, a second layer that receives output from the first layer, and a third layer that receives output from the second layer; and
executing a back propagation calculation of the multilayer neural network based on the result of the forward propagation calculation,
wherein the output from each unit of the first layer is input to all units of the third layer via the units of the second layer, and
the number of units in the second layer is greater than the number of edges between each unit of the first layer and the second layer and greater than the number of edges between each unit of the third layer and the second layer.

1, 1a, 1b, 1c Information processing apparatus
5 Network
10a, 10b, 10c Processor
11 Bus
101 First calculation unit
103 Second calculation unit
105 Communication unit
111 Data storage unit
113 Target output storage unit
115 Work data storage unit

Claims

An information processing apparatus comprising:
a first calculation unit that executes a forward propagation calculation of a multilayer neural network including a first layer, a second layer that receives output from the first layer, and a third layer that receives output from the second layer; and
a second calculation unit that executes a back propagation calculation of the multilayer neural network based on the result of the forward propagation calculation,
wherein the output from each unit of the first layer is input to all units of the third layer via the units of the second layer, and
the number of units in the second layer is greater than the number of edges between each unit of the first layer and the second layer and greater than the number of edges between each unit of the third layer and the second layer.

The information processing apparatus according to claim 1, wherein the second calculation unit:
calculates, by the back propagation calculation using the result of the forward propagation calculation and a target output, a matrix of errors of first weights corresponding to the edges between the first layer and the second layer, and a matrix of errors of second weights corresponding to the edges between the second layer and the third layer;
converts, by mask processing on the matrix of errors of the first weights and the matrix of errors of the second weights, the matrix of errors of the first weights into a first sparse matrix and the matrix of errors of the second weights into a second sparse matrix;
updates the matrix of the first weights based on the first sparse matrix; and
updates the matrix of the second weights based on the second sparse matrix.

The information processing apparatus according to claim 2, wherein, in the mask processing, a Kronecker product of the matrix of errors of the first weights and a mask matrix and a Kronecker product of the matrix of errors of the second weights and the mask matrix are computed.

The information processing apparatus according to claim 2 or 3, wherein the first calculation unit:
calculates a first matrix product of the output from the first layer and the matrix of the first weights; and
calculates the output from the third layer based on a second matrix product of the result of the first matrix product and the matrix of the second weights.

The information processing apparatus according to claim 1, further comprising a communication unit that transmits the result of the back propagation calculation to another information processing apparatus in a parallel computing system that includes the information processing apparatus, and that receives from the other information processing apparatus the result of the back propagation calculation executed in the other information processing apparatus.

The information processing apparatus according to any one of claims 1 to 5, wherein the number of edges between each unit of the first layer and the second layer, and the number of edges between each unit of the third layer and the second layer, are n + 1 (where n is a natural number), and
the number of units in the second layer is n² + n + 1.

An information processing method in which a computer executes processing comprising:
executing a forward propagation calculation of a multilayer neural network including a first layer, a second layer that receives output from the first layer, and a third layer that receives output from the second layer; and
executing a back propagation calculation of the multilayer neural network based on the result of the forward propagation calculation,
wherein the output from each unit of the first layer is input to all units of the third layer via the units of the second layer, and
the number of units in the second layer is greater than the number of edges between each unit of the first layer and the second layer and greater than the number of edges between each unit of the third layer and the second layer.

A program that causes a computer to execute processing comprising:
executing a forward propagation calculation of a multilayer neural network including a first layer, a second layer that receives output from the first layer, and a third layer that receives output from the second layer; and
executing a back propagation calculation of the multilayer neural network based on the result of the forward propagation calculation,
wherein the output from each unit of the first layer is input to all units of the third layer via the units of the second layer, and
the number of units in the second layer is greater than the number of edges between each unit of the first layer and the second layer and greater than the number of edges between each unit of the third layer and the second layer.