JPWO2020049637A1

JPWO2020049637A1 - Learning device

Info

Publication number: JPWO2020049637A1
Application number: JP2020540902A
Authority: JP
Inventors: 誠也柴田; 芙美代鷹野; 竹中　崇; 崇竹中; 浩明井上
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2018-09-04
Filing date: 2018-09-04
Publication date: 2021-05-13
Also published as: WO2020049637A1

Abstract

A learning device 100 includes a calculation unit 110 to which a matrix of M rows and N columns (M and N each being an integer of 1 or more) is input. The calculation unit 110 includes a plurality of arithmetic units. When the matrix is input to the calculation unit 110, the plurality of arithmetic units each read a component of the matrix and each input the read component to the corresponding arithmetic unit.

Description

The present invention relates to a learning device.

As machine learning becomes more widespread, further ingenuity is required to cope with ever-changing situations. To cope with such situations, the diverse raw data acquired in the environment of use must be incorporated into learning as learning data. Learning data is data used to train a discrimination model.

In learning that uses learning data (machine learning), the parameters of the arithmetic expressions and discriminants used in a given learner are adjusted based on, for example, the input-output relationship indicated by the learning data. A learner is, for example, a discrimination model that, when data is input, makes a determination regarding one or more labels.

Regarding the relationship between computational resources and computational accuracy in machine learning, Non-Patent Document 1, for example, describes an example of a learning arithmetic circuit and a learning method for executing deep learning of a neural network efficiently, in particular with low power consumption.

Non-Patent Document 2 describes an example of a learning method for deep learning in a CNN (Convolutional Neural Network) in which the multiple convolutional layers are divided into layers whose weights are fixed and layers whose weights are updated (extended feature layers), so that the learning time is shortened by limiting the range of learning.

As an example of a circuit configuration for learning computation in machine learning, Non-Patent Document 3 describes an optimization example of an accelerator design based on an FPGA (Field-Programmable Gate Array).

An outline of the learning method is given below. FIG. 10 is an explanatory diagram showing an example of a general learning method, and a circuit configuration for learning, in a neural network including one or more intermediate layers between an input layer and an output layer.

In the example shown in FIG. 10, a large-scale learning circuit 70 learns the entire neural network, which is a predetermined discrimination model, in order to support general-purpose learning algorithms.

The balloon attached to the large-scale learning circuit 70 in FIG. 10 schematically shows the direction and range of processing in the learning process of the neural network. Inside the balloon, units 71, which correspond to neurons in the neural network, are drawn as ellipses.

Line segments 72 (the lines connecting units 71 in FIG. 10) represent connections between units 71. Arrow 73 (the thick right-pointing arrow in FIG. 10) represents inference processing and its range. Arrow 74 (the thick left-pointing arrow in FIG. 10) represents parameter update processing and its range. Parameter update processing is an example of learning processing.

FIG. 10 shows an example of a feedforward neural network in which the input to each unit 71 is the output of units 71 in the preceding layer. When time-series information is retained, for example, the input to each unit 71 may also include the output of units 71 in the preceding layer at a previous time, as in a recurrent neural network.

Even when the input to each unit 71 includes the output of units 71 in the preceding layer at a previous time, the direction of inference processing is regarded as the direction from the input layer to the output layer (the forward direction). The input to each unit 71 is not limited to the above examples.

Inference processing performed in a predetermined order starting from the input layer is also called "forward propagation". The direction of parameter update processing, on the other hand, is not particularly limited. For example, as in the parameter update processing shown in FIG. 10, it may be the direction from the output layer to the input layer (the backward direction).

The parameter update processing shown in FIG. 10 is an example of processing executed by the error backpropagation method. However, parameter update processing is not limited to processing executed by the error backpropagation method. For example, it may be executed by STDP (Spike Timing Dependent Plasticity).

The following is an example of a method for learning a model in deep learning, not limited to neural networks. First, after learning data is input to the input layer, inference processing is performed that computes the output of each unit 71 in the forward direction, layer by layer up to the output layer (forward propagation: see arrow 73 in FIG. 10).

Next, parameter update processing is performed that updates the parameters used to compute the output of each unit 71 in each layer, based on an error calculated from the output of the output layer (the final output) and the input-output relationship indicated by the learning data (backpropagation: see arrow 74 in FIG. 10). As shown in FIG. 10, the parameter update processing proceeds backward through the layers, from the output layer to the first layer, and is performed so as to minimize the calculated error.

As shown in FIG. 10, when the entire model is the learning target, the parameter update processing updates the parameters used to compute the output of each unit 71 in every layer after the input layer (the first through n-th layers). The updated parameters are, for example, the weights of the inter-unit connections that connect each unit 71 in a layer to units 71 in other layers.

By repeating the parameter update processing described above multiple times, for example while changing the learning data, a trained model with a high recognition rate is generated. FIG. 10 shows, as an implementation example of an arithmetic circuit that performs learning, a large-scale learning circuit 70 that performs the above inference processing and parameter update processing with high computational accuracy.
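The alternation of inference processing and parameter update processing described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: it uses a single unit with an identity activation, a squared-error measure, and a plain gradient-descent rule standing in for the error backpropagation method. The names `forward`, `update`, `total_loss`, the sample data, and the learning rate are hypothetical and do not come from the source.

```python
def forward(w, a, x):
    """Inference step: compute the unit output from the current parameters."""
    return a + sum(wk * xk for wk, xk in zip(w, x))


def update(w, a, x, target, lr):
    """Parameter update step: one gradient-descent step on 0.5 * (z - target)**2."""
    err = forward(w, a, x) - target
    new_w = [wk - lr * err * xk for wk, xk in zip(w, x)]
    new_a = a - lr * err
    return new_w, new_a


# Hypothetical learning data: (input vector, desired output) pairs.
samples = [([1.0, 2.0], 3.0), ([2.0, 0.0], 1.0)]
w, a = [0.0, 0.0], 0.0


def total_loss(w, a):
    """Sum of squared errors over all learning data for the given parameters."""
    return sum(0.5 * (forward(w, a, x) - t) ** 2 for x, t in samples)


loss_before = total_loss(w, a)
# Repeat the update many times while changing the learning data each step.
for _ in range(200):
    for x, t in samples:
        w, a = update(w, a, x, t, lr=0.05)
loss_after = total_loss(w, a)
```

In this sketch the error shrinks with repeated updates, which mirrors the statement that the update is performed so as to reduce the calculated error.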

FIG. 11 is an explanatory diagram showing an example of the inputs and outputs of a unit 71 and its connections to other units 71, focusing on a single unit 71. FIG. 11(a) shows an example of the inputs and output of one unit 71. FIG. 11(b) shows an example of connections between units 71 arranged in two layers.

As shown in FIG. 11(a), when one unit 71 is given four inputs (x_1 to x_4) and one output (z), the operation of the unit 71 is expressed, for example, by Equation (1A).

z = f(u) ... Equation (1A)
where u = a + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 ... Equation (1B)

Here, f() in Equation (1A) represents an activation function, and a in Equation (1B) represents an intercept. w_1 to w_4 in Equation (1B) represent parameters, such as weights, corresponding to the respective inputs (x_1 to x_4).
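As an illustration of Equations (1A) and (1B), the following minimal Python sketch computes the output of a single unit. The function names `relu` and `unit_output`, the choice of ReLU as the activation function f(), and the numeric values are assumptions made for this example; the source does not specify them.

```python
def relu(u):
    """Example activation function f(); the choice of ReLU is an assumption."""
    return max(0.0, u)


def unit_output(x, w, a, f=relu):
    """Return z = f(u) with u = a + sum_k w_k * x_k (Equations (1A), (1B))."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    u = a + sum(wk * xk for wk, xk in zip(w, x))
    return f(u)


# Four inputs x_1..x_4 with weights w_1..w_4 and intercept a (illustrative values).
z = unit_output(x=[1.0, 2.0, 3.0, 4.0], w=[0.5, -0.25, 0.125, 0.0], a=0.25)
```

With these values, u = 0.25 + 0.5 - 0.5 + 0.375 + 0 = 0.625, and ReLU leaves the positive value unchanged.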

On the other hand, as shown in FIG. 11(b), when units 71 are connected between two layers, then focusing on the later layer, the outputs (z_1 to z_4) of the units 71 for the inputs (x_1 to x_4) to the units 71 in that layer are expressed, for example, as follows.

z_i = f(u_i) ... Equation (2A)
where u_i = a + w_{i,1} x_1 + w_{i,2} x_2 + w_{i,3} x_3 + w_{i,4} x_4 ... Equation (2B)

Here, i in Equation (2A) is the identifier of a unit 71 within the same layer (i = 1 to 3 in this example). The intercept a in Equation (2B) can also be regarded as the coefficient of a constant term of value 1 (that is, as one of the parameters).
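The remark that the intercept a can be treated as the coefficient of a constant term of value 1 can be checked with a short Python sketch. The helper names `pre_activation` and `pre_activation_augmented` and the numeric values are hypothetical, chosen only for illustration.

```python
def pre_activation(x, w, a):
    """u = a + sum_k w_k * x_k, with the intercept kept as a separate term."""
    return a + sum(wk * xk for wk, xk in zip(w, x))


def pre_activation_augmented(x, w_aug):
    """Same u, with the intercept folded in as the weight of a constant input 1."""
    x_aug = [1.0] + list(x)
    return sum(wk * xk for wk, xk in zip(w_aug, x_aug))


x = [3.0, 0.25]
w = [1.0, -2.0]
a = 0.5

u_explicit = pre_activation(x, w, a)
u_folded = pre_activation_augmented(x, [a] + w)
```

Both forms give the same value here, so the intercept can be stored alongside the other weights as one more parameter.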

In the following, Equation (2B) may be simplified and written as
u_i = \sum_k w_{i,k} x_k ... Equation (2C)
The intercept a is omitted in Equation (2C). k in Equation (2C) identifies an input to each unit 71 in the layer, more specifically the other unit 71 that provides that input.

When the inputs to each unit 71 in a layer are only the outputs of the units 71 in the preceding layer, the simplified expression above can also be written as
u_i^{(L)} = \sum_k w_{i,k}^{(L)} x_k^{(L-1)} ... Equation (2D)

Here, L in Equation (2D) represents the layer identifier, and w_{i,k} represents a parameter of unit i in the L-th layer. More specifically, w_{i,k} corresponds to the weight of the connection between unit i and another unit k (an inter-unit connection).

Hereinafter, without distinguishing individual units 71, the function (activation function) that determines the output value of a unit 71 may be simplified and written as z = \sum (w x).

The set of weights above is written in vector form as follows.

w_i = [w_{i,1}, w_{i,2}, \ldots, w_{i,k}]^T ... Equation (3)

Equation (3) is called a weight vector. Let x = [x_1, x_2, \ldots, x_k]^T be the input vector, that is, the set of inputs to a layer, and let W be the weight matrix obtained by concatenating the weight vectors horizontally. The output vector z is then expressed as f(W^T x). The following relationship holds between the output vector z and the activation function.

z = f(u) = [f(u_1), f(u_2), \ldots, f(u_n)] ... Equation (4)
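The layer-level relation z = f(W^T x) with the element-wise activation of Equation (4) can be sketched in Python as below. The function name `layer_forward`, the use of ReLU, and the example matrix are assumptions for illustration. W is stored as a list of rows with k rows (one per input) and n columns (one weight vector w_i per unit), matching the description of W as weight vectors concatenated horizontally.

```python
def layer_forward(W, x, f):
    """Return z = f(W^T x), applying f to each u_i as in Equation (4).

    W has k rows and n columns; column i is the weight vector w_i of unit i.
    """
    k, n = len(W), len(W[0])
    if len(x) != k:
        raise ValueError("x must have one entry per row of W")
    # u_i = sum_k w_{i,k} * x_k, i.e. the i-th entry of W^T x (Equation (2C)).
    u = [sum(W[r][i] * x[r] for r in range(k)) for i in range(n)]
    return [f(ui) for ui in u]


# Two inputs feeding three units (illustrative values).
W = [[1.0, 0.0, -1.0],
     [0.5, 2.0, 0.0]]
x = [2.0, 4.0]
z = layer_forward(W, x, f=lambda u: max(0.0, u))
```

Here u = [4.0, 8.0, -2.0], and the assumed ReLU activation clips the negative entry to zero.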

In the above example, the computation by which a unit 71 obtains the output z from the input x corresponds to inference processing in the unit 71. During inference processing, the parameters (for example, the weights w) are fixed. Inference processing is, for example, processing executed in an operational monitoring system or the like to determine whether an object in an image is a specific object. The computation that determines the parameters of a unit 71, on the other hand, corresponds to parameter update processing in the unit 71.

FIG. 12 shows an example of an inference device that performs inference processing. FIG. 12 is a block diagram showing a configuration example of a general inference device. The inference device 80 shown in FIG. 12 includes a weight memory 81, a weight loading unit 82, and a calculation unit 83.

The weight memory 81 has a function of storing the weight matrix W. The weight loading unit 82 has a function of loading the weight matrix W stored in the weight memory 81.

The weight loading unit 82 inputs the loaded weight matrix W to the calculation unit 83. The calculation unit 83 has a function of performing the above inference processing using the input weight matrix W.

Next, FIG. 13 shows an example of a learning device that performs parameter update processing. FIG. 13 is a block diagram showing a configuration example of a general learning device. The learning device 90 shown in FIG. 13 includes a weight memory 91, a weight loading unit 92, a calculation unit 93, and a weight store unit 94.

The weight memory 91 has a function of storing the weight matrix W. The weight loading unit 92 has a function of loading the weight matrix W stored in the weight memory 91.

The weight loading unit 92 inputs the loaded weight matrix W to the calculation unit 93. The calculation unit 93 has a function of performing the above parameter update processing using the input weight matrix W.

The calculation unit 93 inputs the weight matrix W updated by the parameter update processing to the weight store unit 94. The weight store unit 94 has a function of writing the weight matrix W updated by the calculation unit 93 to the weight memory 91.

Specifically, the weight store unit 94 replaces the weight matrix W stored in the weight memory 91 with the input weight matrix W. When writing the weight matrix W, the weight store unit 94 may have a function of temporarily holding the weight matrix W.
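The data flow of the general learning device 90 (load, update, store) can be sketched in Python as follows. This is a software analogy only, not the circuit itself. The class and function names are hypothetical, and the update rule W <- W - lr * grad is a placeholder assumption, since the source does not fix the update computation.

```python
import copy


class WeightMemory:
    """Stands in for the weight memory 91: holds the weight matrix W."""

    def __init__(self, W):
        self.W = copy.deepcopy(W)


def load_weights(memory):
    """Role of the weight loading unit 92: read W from the weight memory."""
    return copy.deepcopy(memory.W)


def update_weights(W, grad, lr):
    """Role of the calculation unit 93: placeholder update W - lr * grad (assumed)."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, grad)]


def store_weights(memory, W_new):
    """Role of the weight store unit 94: overwrite W in the weight memory."""
    memory.W = copy.deepcopy(W_new)


memory = WeightMemory([[1.0, 2.0], [3.0, 4.0]])
W = load_weights(memory)
W_new = update_weights(W, grad=[[2.0, 0.0], [0.0, -2.0]], lr=0.5)
store_weights(memory, W_new)
```

After the store step, the next load returns the updated matrix, which corresponds to the weight memory being overwritten with the input weight matrix.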

As another example of a device that executes inference processing and learning processing, Patent Document 1 describes a neuroprocessor that performs neural network inference and learning computations at high speed without increasing hardware.

Japanese Unexamined Patent Publication No. 5-346914

Y. H. Chen, et al., "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks", in IEEE Journal of Solid-State Circuits, vol. 52, no. 1, Jan. 2017, pp. 127-138.
Wei Liu, et al., "SSD: Single Shot MultiBox Detector", arXiv:1512.02325v5, Dec. 2016.
Chen Zhang, et al., "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks", in ACM FPGA 2015, pp. 160-170.

In the learning processing described above, the components of the weight matrix W are rearranged before use. For example, a matrix in which the arrangement of the components has been changed, such as the transposed matrix W^T of the weight matrix W, is used.

To generate the transposed matrix W^T, the weight loading unit 92 shown in FIG. 13, for example, repeatedly reads the components of the weight matrix W stored in the weight memory 91 row by row and rearranges the components that were read.
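The conventional row-by-row approach can be sketched in Python as follows. The function name `transpose_by_row_reads` and the read counter are hypothetical. The counter only makes the repeated memory accesses visible and is not a power model.

```python
def transpose_by_row_reads(weight_memory):
    """Build W^T by reading W one row at a time and scattering each row's components.

    Returns the transposed matrix and the number of row reads performed.
    """
    rows, cols = len(weight_memory), len(weight_memory[0])
    transposed = [[None] * rows for _ in range(cols)]
    row_reads = 0
    for m in range(rows):
        row = list(weight_memory[m])  # one read of row m from the weight memory
        row_reads += 1
        for n, value in enumerate(row):
            transposed[n][m] = value  # component (m, n) becomes component (n, m)
    return transposed, row_reads


W = [[1, 2, 3],
     [4, 5, 6]]
W_T, reads = transpose_by_row_reads(W)
```

Every component passes through the loading path once per transposition in this scheme, which is the repeated work the text refers to.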

However, the method of repeatedly loading the weight matrix W from the weight memory 91 and rearranging its components consumes a large amount of power. The time required to rearrange the components also becomes long.

Even if the components of the weight matrix W are rearranged on the weight memory 91 side, power is consumed when the weight loading unit 92 transfers the matrix to the calculation unit 93. Neither Patent Document 1 nor Non-Patent Documents 1 to 3 describes a low-power method for rearranging the components of a matrix.

[Object of the Invention]
Accordingly, an object of the present invention is to provide a learning device that solves the problem described above and can rearrange the components of a matrix with low power consumption.

A learning device according to the present invention includes a calculation unit to which a matrix of M rows and N columns (M and N each being an integer of 1 or more) is input. The calculation unit includes a plurality of arithmetic units, and when the matrix is input to the calculation unit, the plurality of arithmetic units each read a component of the matrix and each input the read component to the corresponding arithmetic unit.

According to the present invention, the components of a matrix can be rearranged with low power consumption.

A block diagram showing a configuration example of the first embodiment of the learning device according to the present invention.
A block diagram showing a configuration example of the calculation unit 1300 of the first embodiment.
A block diagram showing another configuration example of the calculation unit 1300 of the first embodiment.
A block diagram showing another configuration example of the calculation unit 1300 of the first embodiment.
A flowchart showing the operation of transposed matrix generation processing by the calculation unit 1300 of the first embodiment.
A block diagram showing a configuration example of the calculation unit 1300 of the second embodiment.
A flowchart showing the operation of 180-degree rotated matrix generation processing by the calculation unit 1300 of the second embodiment.
An explanatory diagram showing a hardware configuration example of the learning device 1000 according to the present invention.
A block diagram showing an outline of the learning device according to the present invention.
An explanatory diagram showing an example of a general learning method, and a circuit configuration for learning, in a neural network including one or more intermediate layers between an input layer and an output layer.
An explanatory diagram showing an example of the inputs and outputs of a unit 71 and its connections to other units 71, focusing on a single unit 71.
A block diagram showing a configuration example of a general inference device.
A block diagram showing a configuration example of a general learning device.

Embodiment 1.
[Description of configuration]
Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the first embodiment of the learning device according to the present invention.

As shown in FIG. 1, the learning device 1000 includes a weight memory 1100, a weight loading unit 1200, a calculation unit 1300, and a weight store unit 1400.

The unidirectional arrows in each block diagram indicate the direction in which data flows. However, this does not exclude the possibility that data flows in both directions where an arrow is drawn.

As a means of rearranging the components of the weight matrix W with low power consumption, the calculation unit 1300 may include a group of arithmetic units with wiring that allows components to be transferred internally.

Compared with having the weight loading unit 1200 read the components of the weight matrix W row by row and rearrange them, having the arithmetic units that have read the components exchange them with one another consumes less power. The functions of the components of the learning device 1000, which rearranges the components of the weight matrix W with low power consumption, are described below.

The weight memory 1100 has a function of storing the weight matrix W (a group of parameters) used for inference processing and learning processing. In the present embodiment, the weights of each unit 71 are the parameters of that unit 71. The discrimination model is, for example, a neural network.

The weight loading unit 1200 has a function of loading the weight matrix W from the weight memory 1100. Regardless of whether inference processing or learning processing is being performed, the weight loading unit 1200 loads the weight matrix W from the weight memory 1100 as it is. The weight loading unit 1200 inputs the loaded weight matrix W to the calculation unit 1300.

The calculation unit 1300 has a function of performing the above inference processing or the above learning processing using the weight matrix W loaded from the weight memory 1100.

Specifically, the calculation unit 1300 executes inference processing that computes, in a predetermined order, the output for discrimination data of each unit 71 of a discrimination model in which a plurality of layers, each composed of one or more units 71, are connected in layers. The calculation unit 1300 also executes learning processing that updates at least some of the weights of the units 71 based on the output of each unit 71 for learning data.

The calculation unit 1300 inputs the weight matrix W updated by the learning processing to the weight store unit 1400. The weight store unit 1400 has a function of writing the weight matrix W updated by the calculation unit 1300 to the weight memory 1100.

Specifically, the weight store unit 1400 replaces the weight matrix W stored in the weight memory 1100 with the input weight matrix W. When writing the weight matrix W, the weight store unit 1400 may have a function of temporarily holding the weight matrix W.

That is, the weight store unit 1400 stores the weights of each unit 71 that are subject to updating in the learning processing (the weight matrix W) in the weight memory 1100. Because the weight store unit 1400 stores the weight matrix W in the weight memory 1100, the updated weight matrix W is used in the next inference processing and learning processing.

FIG. 2 is a block diagram showing a configuration example of the calculation unit 1300 of the first embodiment. As shown in FIG. 2, the calculation unit 1300 includes arithmetic units 1301 to 1309, first weight registers 1311 to 1319, and second weight registers 1321 to 1329.

The calculation unit 1300 of the present embodiment includes nine arithmetic units. The number of arithmetic units is not limited to nine; the calculation unit 1300 need only include a plurality of arithmetic units arranged in matrix form. The calculation unit 1300 of the present embodiment includes arithmetic units arranged in a matrix of 3 rows and 3 columns.

As shown in FIG. 2, the arithmetic unit 1301 is arranged in the calculation unit 1300 together with the first weight register 1311 and the second weight register 1321. Each of the other arithmetic units is likewise arranged in the calculation unit 1300 together with a first weight register and a second weight register.

The first weight register stores the weight that each arithmetic unit reads when the weight matrix W is input to the calculation unit 1300. Each arithmetic unit reads the component of the weight matrix W corresponding to its own position.

Specifically, the arithmetic unit located m-th from the top (m is an integer from 1 to 3) and n-th from the left (n is an integer from 1 to 3) in the calculation unit 1300 reads the (m, n) component of the weight matrix W when the weight matrix W is input to the calculation unit 1300.

In the example shown in FIG. 2, the arithmetic unit 1301, located first from the top and first from the left, reads the weight w_1, which is the (1,1) component of the weight matrix W. The weight w_1 read by the arithmetic unit 1301 is stored in the first weight register 1311. Each arithmetic unit executes inference processing and learning processing using the weight stored in its first weight register.
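The position-to-component mapping can be sketched in Python as below. The helper names are hypothetical. The row-major numbering of the arithmetic units (1301 at position (1,1), 1302 at (1,2), 1304 at (2,1), and so on) is an assumption, chosen so that unit 1301 reads w_1 as stated above; the weights are represented by the strings "w1" to "w9" for readability.

```python
def load_into_first_registers(W):
    """Map each 1-based unit position (m, n) to the (m, n) component of W."""
    first_registers = {}
    for m, row in enumerate(W, start=1):
        for n, weight in enumerate(row, start=1):
            first_registers[(m, n)] = weight
    return first_registers


def unit_id(m, n, cols=3, base=1301):
    """Reference numeral of the unit at (m, n), assuming row-major numbering."""
    return base + (m - 1) * cols + (n - 1)


W = [["w1", "w2", "w3"],
     ["w4", "w5", "w6"],
     ["w7", "w8", "w9"]]
first_registers = load_into_first_registers(W)
```

For instance, `first_registers[(1, 1)]` holds "w1" for unit 1301, matching the example in the text.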

また、図２に示すように、演算部1300は、演算器1302と演算器1304との間で、演算器1303と演算器1307との間で、および演算器1306と演算器1308との間でそれぞれデータが転送可能に構成されている。具体的には、図２に示すように、演算器同士が配線で接続されている。 Further, as shown in FIG. 2, the arithmetic unit 1300 is used between the arithmetic unit 1302 and the arithmetic unit 1304, between the arithmetic unit 1303 and the arithmetic unit 1307, and between the arithmetic unit 1306 and the arithmetic unit 1308. Each data is configured to be transferable. Specifically, as shown in FIG. 2, the arithmetic units are connected to each other by wiring.

学習処理では、重み行列Ｗの転置行列Ｗ^Ｔが使用される。図２に示す例であれば、重み行列Ｗの(1,2) 成分である重みw₂は、転置行列Ｗ^Ｔでは(2,1) 成分になる。また、重み行列Ｗの(2,1) 成分である重みw₄は、転置行列Ｗ^Ｔでは(1,2) 成分になる。In the learning process, the transposed matrix W ^T of the weight matrix W is used. In the example shown in FIG. 2, the weight w ₂ is (1,2) component of the weight matrix W becomes the transposed matrix W ^T (2,1) component. The weight w ₄ is a (2,1) component of the weight matrix W becomes a transposed matrix W, ^T (1, 2) component.

よって、学習処理が実行される場合、演算器1302と演算器1304との間で重みが交換される。具体的には、演算器1302は、第１重みレジスタ1312に格納された重みw₂を演算器1304の第２重みレジスタ1324に書き込む。Therefore, when the learning process is executed, the weights are exchanged between the arithmetic unit 1302 and the arithmetic unit 1304. _{Specifically, the arithmetic unit 1302 writes the weight w 2} stored in the first weight register 1312 to the second weight register 1324 of the arithmetic unit 1304.

The arithmetic unit 1304, in turn, writes the weight w₄ stored in the first weight register 1314 to the second weight register 1322 of the arithmetic unit 1302. Weights are exchanged in the same way between the arithmetic units 1303 and 1307 and between the arithmetic units 1306 and 1308.

In the configuration example shown in FIG. 2, only one wire exists between the two arithmetic units of each pair, so the weights are exchanged according to, for example, a time-division multiplexing scheme.

When the weights are exchanged by time-division multiplexing, the arithmetic unit 1302 first writes the weight w₂ to the second weight register 1324, and the arithmetic unit 1304 then writes the weight w₄ to the second weight register 1322. The arithmetic units 1303 and 1307, and the arithmetic units 1306 and 1308, also exchange weights by time-division multiplexing.

After the weights are exchanged, each arithmetic unit other than the arithmetic units 1301, 1305, and 1309 swaps the weight written in its second weight register with the weight stored in its first weight register.

For example, the arithmetic unit 1302 writes the weight w₄, which was written in the second weight register 1322, to the first weight register 1312. The arithmetic unit 1302 also writes the weight w₂, which was stored in the first weight register 1312, to the second weight register 1322.

By executing the above processing, the calculation unit 1300 virtually generates the transposed matrix W^T of the weight matrix W. That is, the first weight registers hold the weights in the arrangement of the transposed matrix W^T shown in FIG. 2. The calculation unit 1300 can therefore execute the learning process using the transposed matrix W^T.

FIG. 3 is a block diagram showing another configuration example of the calculation unit 1300 of the first embodiment. As in the configuration example shown in FIG. 3, two wires may exist between the arithmetic units 1302 and 1304, between the arithmetic units 1303 and 1307, and between the arithmetic units 1306 and 1308.

Two arithmetic units connected by the two wires shown in FIG. 3 can perform both directions of a weight exchange simultaneously. For example, the arithmetic unit 1302 can write the weight w₂ to the second weight register 1324 at the same time as the arithmetic unit 1304 writes the weight w₄ to the second weight register 1322.

In the configuration examples shown in FIGS. 2 and 3, the arithmetic units that exchange weights are directly connected to each other, so each weight is transferred only once. That is, the power consumed in rearranging the weights is minimized.

FIG. 4 is a block diagram showing yet another configuration example of the calculation unit 1300 of the first embodiment. Unlike the configuration examples shown in FIGS. 2 and 3, in the configuration example shown in FIG. 4 the arithmetic units that exchange weights are not directly connected to each other.

The arithmetic units shown in FIG. 4 exchange weights as follows. For example, the arithmetic unit 1302 first writes the weight w₂ to the second weight register 1321 of the arithmetic unit 1301. The arithmetic unit 1301 then writes the received weight w₂ to the second weight register 1324 of the arithmetic unit 1304.

After the weight w₂ is written to the second weight register 1324, the arithmetic unit 1304 writes the weight w₄ to the second weight register 1321 of the arithmetic unit 1301. The arithmetic unit 1301 then writes the received weight w₄ to the second weight register 1322 of the arithmetic unit 1302.

As described above, each arithmetic unit shown in FIG. 4 inputs its weight to the destination arithmetic unit via another arithmetic unit. As shown in FIG. 4, the weights w₂ and w₄ are exchanged via the arithmetic unit 1301, which is connected to the arithmetic units 1302 and 1304 by solid arrows.

Similarly, the weights w₃ and w₇ are exchanged via the arithmetic units 1302, 1304, and 1305, which are connected to the arithmetic units 1303 and 1307 by broken-line arrows. The weights w₆ and w₈ are exchanged via the arithmetic unit 1305, which is connected to the arithmetic units 1306 and 1308 by thick arrows.

The weights may be exchanged via routes other than those shown in FIG. 4. Further, as in the configuration example shown in FIG. 3, two wires (corresponding to the arrows shown in FIG. 4) may exist between each pair of arithmetic units.

As shown in FIG. 4, if each arithmetic unit can freely set the transfer destination of a weight, that is, if each arithmetic unit has a routing capability, weights can be exchanged between the arithmetic units more flexibly.
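The trade-off between direct wiring (FIGS. 2 and 3) and relayed transfer (FIG. 4) can be illustrated by counting register writes. The following is a minimal sketch, not part of the disclosed device: the names `FIG4_RELAYS`, `transfers_per_weight`, and `total_transfers` are hypothetical, the relay lists are taken from the text above, and it is assumed that both weights of a pair pass through the same number of relays and that the number of writes is a rough proxy for transfer energy.

```python
# Relay units for each exchanged pair, as listed in the text for FIG. 4.
# Only the number of relays matters here, not the order they are visited.
FIG4_RELAYS = {
    ("w2", "w4"): [1301],
    ("w3", "w7"): [1302, 1304, 1305],
    ("w6", "w8"): [1305],
}

def transfers_per_weight(relays):
    """Register writes needed to move one weight to its destination."""
    return len(relays) + 1  # one write per relay, plus the final write

def total_transfers(relay_table):
    """Register writes needed to exchange every pair in both directions."""
    return sum(2 * transfers_per_weight(r) for r in relay_table.values())

# Direct wiring as in FIGS. 2 and 3: no relays for any pair.
direct = {pair: [] for pair in FIG4_RELAYS}

print(total_transfers(direct))       # 6
print(total_transfers(FIG4_RELAYS))  # 16
```

Under these assumptions, the directly wired configurations need one write per weight, while the relayed configuration trades additional writes for the routing flexibility described above.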

The operation of each arithmetic unit included in the calculation unit 1300 of the present embodiment, to which a matrix of M rows and N columns (M and N are each integers of 1 or more) is input, can be generalized as follows. Let m be an integer from 1 to M and n be an integer from 1 to N. Among the plurality of arithmetic units included in the calculation unit 1300, the arithmetic unit located m-th from the top and n-th from the left reads the (m, n) component of the matrix when the matrix of M rows and N columns is input to the calculation unit 1300.

In the present embodiment, the arithmetic unit located m-th from the top and n-th from the left (m ≠ n) corresponds to the arithmetic unit located n-th from the top and m-th from the left. That is, the arithmetic unit that has read the (m, n) component (m ≠ n) of the matrix corresponds to the arithmetic unit that has read the (n, m) component of the matrix. Each arithmetic unit inputs the component it has read to its corresponding arithmetic unit.

The condition m ≠ n is imposed in order to exclude the arithmetic units corresponding to the diagonal components of a square matrix (for example, the arithmetic units 1301, 1305, and 1309). Through the above operation, the calculation unit 1300 can generate the transposed matrix W^T from the weight matrix W with low power consumption.
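The correspondence between the (m, n) unit and the (n, m) unit can be enumerated for the 3-by-3 example. The following is a minimal sketch with hypothetical helper names (`unit_id`, `transpose_pairs`); it assumes that the reference numerals 1301 to 1309 are assigned row by row, which is consistent with the examples in the text but is not stated as a general rule.

```python
def unit_id(m, n, n_cols=3, base=1300):
    """Reference numeral of the unit at row m, column n (1-based).

    Assumes row-major numbering: (1,1) -> 1301, (1,2) -> 1302, ...
    """
    return base + (m - 1) * n_cols + n

def transpose_pairs(size=3):
    """Pairs of units that exchange weights to form the transpose."""
    pairs = []
    for m in range(1, size + 1):
        for n in range(m + 1, size + 1):  # m < n visits each pair once
            pairs.append((unit_id(m, n, size), unit_id(n, m, size)))
    return pairs

print(transpose_pairs())  # [(1302, 1304), (1303, 1307), (1306, 1308)]
```

The diagonal units never appear in the result because the loop only visits positions with m < n, which mirrors the exclusion of m = n in the text.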

[Explanation of operation]
The operation by which the calculation unit 1300 of the present embodiment generates the transposed matrix W^T is described below with reference to FIG. 5. FIG. 5 is a flowchart showing the transposed matrix generation process performed by the calculation unit 1300 of the first embodiment.

First, when the weight matrix W is input to the calculation unit 1300, each arithmetic unit reads its corresponding weight of the weight matrix W (step S101). Each arithmetic unit stores the weight it has read in its first weight register.

Next, each arithmetic unit other than the arithmetic units 1301, 1305, and 1309 exchanges weights with its corresponding arithmetic unit (step S102). That is, each arithmetic unit writes its stored weight to the second weight register of the corresponding arithmetic unit, and the corresponding arithmetic unit writes its weight to the second weight register of that arithmetic unit.

Next, each arithmetic unit other than the arithmetic units 1301, 1305, and 1309 swaps the weight written in its second weight register with the weight stored in its first weight register (step S103).

That is, each arithmetic unit writes the weight written in its second weight register to its first weight register, and writes the weight that was stored in its first weight register to its second weight register. After the weights are swapped, the calculation unit 1300 ends the transposed matrix generation process.
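Steps S101 to S103 can be sketched in software as follows. This is an illustrative model only, assuming a square weight matrix; the class `Unit` and the functions `load`, `exchange`, `swap`, and `transpose_in_place` are hypothetical names, and the wiring between units is abstracted away so that each write reaches its destination directly.

```python
class Unit:
    """One arithmetic unit with its two weight registers."""
    def __init__(self):
        self.first = None   # first weight register (used for computation)
        self.second = None  # second weight register (receives the partner's weight)

def load(grid, W):
    """Step S101: each unit reads its own component into the first register."""
    for m, row in enumerate(W):
        for n, w in enumerate(row):
            grid[m][n].first = w

def exchange(grid):
    """Step S102: each off-diagonal unit writes to its partner's second register."""
    size = len(grid)
    for m in range(size):
        for n in range(size):
            if m != n:
                grid[n][m].second = grid[m][n].first

def swap(grid):
    """Step S103: each off-diagonal unit swaps its two registers."""
    size = len(grid)
    for m in range(size):
        for n in range(size):
            if m != n:
                u = grid[m][n]
                u.first, u.second = u.second, u.first

def transpose_in_place(W):
    """Run S101 to S103 and return the contents of the first registers."""
    grid = [[Unit() for _ in row] for row in W]
    load(grid, W)
    exchange(grid)
    swap(grid)
    return [[u.first for u in row] for row in grid]

W = [["w1", "w2", "w3"],
     ["w4", "w5", "w6"],
     ["w7", "w8", "w9"]]
print(transpose_in_place(W))
# [['w1', 'w4', 'w7'], ['w2', 'w5', 'w8'], ['w3', 'w6', 'w9']]
```

After the swap, each second register still holds the unit's original weight, which matches the description of step S103.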

[Explanation of effect]
The learning device 1000 of the present embodiment includes the calculation unit 1300, which includes a plurality of arithmetic units directly connected by wiring. Arithmetic units connected by wiring can exchange the weights they have read. The calculation unit 1300 can therefore easily generate the transposed matrix W^T from the input weight matrix W.

Because the calculation unit 1300 itself generates the transposed matrix W^T, the learning device 1000 of the present embodiment can reduce the power consumed in generating the transposed matrix W^T compared with a learning device that repeatedly loads the weight matrix W and rearranges its components.

Embodiment 2.
[Description of configuration]
Next, a second embodiment of the calculation unit 1300 according to the present invention will be described with reference to the drawings. FIG. 6 is a block diagram showing a configuration example of the calculation unit 1300 of the second embodiment. The configuration of the learning device 1000 of the present embodiment is the same as that of the learning device 1000 shown in FIG. 1.

As shown in FIG. 6, the calculation unit 1300 of the present embodiment, like that of the first embodiment, includes arithmetic units 1301 to 1309, first weight registers 1311 to 1319, and second weight registers 1321 to 1329. That is, the calculation unit 1300 of the present embodiment also includes arithmetic units arranged in a matrix of 3 rows and 3 columns.

Each arithmetic unit is arranged in the calculation unit 1300 together with a first weight register and a second weight register. The functions of the arithmetic units, the first weight registers, and the second weight registers are the same as the corresponding functions in the first embodiment.

As shown in FIG. 6, the calculation unit 1300 is configured so that data can be transferred between the arithmetic units other than the arithmetic unit 1305.

In addition to the transposed matrix W^T of the weight matrix W, the learning process also uses a matrix in which each weight of the weight matrix W is placed at the position rotated by 180 degrees (hereinafter referred to as the 180-degree rotation matrix). In the example shown in FIG. 6, the weight w₂, which is the (1,2) component of the weight matrix W, becomes the (3,2) component of the 180-degree rotation matrix.

That is, the (1,2) component of the weight matrix W is treated as the (3,2) = ((3+1-1), (3+1-2)) component of the 180-degree rotation matrix. The "3" in (3+1-1) is the number of rows of the weight matrix W, and the second "1" in (3+1-1) corresponds to the row index "1" of the (1,2) component of the weight matrix W.

Likewise, the "3" in (3+1-2) is the number of columns of the weight matrix W, and the "2" in (3+1-2) corresponds to the column index "2" of the (1,2) component of the weight matrix W. The other components of the weight matrix W are rearranged according to the same formula.
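The index arithmetic above amounts to mapping the (m, n) component to the (M+1-m, N+1-n) position. A minimal sketch follows; the function name `rot180_index` is hypothetical and indices are 1-based, as in the text.

```python
def rot180_index(m, n, M, N):
    """Position of the (m, n) component after a 180-degree rotation (1-based)."""
    return (M + 1 - m, N + 1 - n)

# Worked example from the text: a 3-by-3 weight matrix.
print(rot180_index(1, 2, 3, 3))  # (3, 2)
print(rot180_index(1, 1, 3, 3))  # (3, 3)
print(rot180_index(2, 2, 3, 3))  # (2, 2)
```

The last line shows that the centre component of a 3-by-3 matrix maps to itself, so it does not need to be moved.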

Therefore, when the learning process is executed, the weights w₂ and w₈ are exchanged between the arithmetic units 1302 and 1308. Specifically, the arithmetic unit 1302 first writes the weight w₂ to the second weight register 1323 of the arithmetic unit 1303. The arithmetic unit 1303 then writes the received weight w₂ to the second weight register 1326 of the arithmetic unit 1306.

Next, the arithmetic unit 1306 writes the received weight w₂ to the second weight register 1329 of the arithmetic unit 1309, and the arithmetic unit 1309 then writes the received weight w₂ to the second weight register 1328 of the arithmetic unit 1308. Similarly, the arithmetic unit 1308 writes the weight w₈ to the second weight register 1322 of the arithmetic unit 1302 via other arithmetic units.

As described above, each arithmetic unit shown in FIG. 6 inputs its weight to the destination arithmetic unit via other arithmetic units. In the same way, the weights w₁ and w₉ are exchanged between the arithmetic units 1301 and 1309, the weights w₃ and w₇ are exchanged between the arithmetic units 1303 and 1307, and the weights w₄ and w₆ are exchanged between the arithmetic units 1304 and 1306.
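The relayed route described for the weight w₂ (1302, 1303, 1306, 1309, 1308) is consistent with the eight outer units being linked in a ring. The sketch below explores that reading. It is an assumption, not something the text confirms: `RING`, `clockwise_path`, and `partner` are hypothetical names, and the ring order is inferred from the single route given above.

```python
# Assumed ring of links around the perimeter of the 3x3 array, clockwise.
# Unit 1305 (the centre) is not part of the ring.
RING = [1301, 1302, 1303, 1306, 1309, 1308, 1307, 1304]

def clockwise_path(src, hops):
    """Units visited when a weight leaves src and travels `hops` links clockwise."""
    start = RING.index(src)
    return [RING[(start + k) % len(RING)] for k in range(1, hops + 1)]

def partner(src):
    """Unit diametrically opposite src on the ring (half the ring away)."""
    return clockwise_path(src, len(RING) // 2)[-1]

print(clockwise_path(1302, 4))  # [1303, 1306, 1309, 1308]
print([(u, partner(u)) for u in RING[:4]])
# [(1301, 1309), (1302, 1308), (1303, 1307), (1306, 1304)]
```

Under this assumption, every unit reaches its 180-degree partner in four hops, and the resulting pairs agree with the pairs listed in the text.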

The weights may be exchanged via routes other than those shown in FIG. 6. Further, as in the configuration example shown in FIG. 3, two wires (corresponding to the arrows shown in FIG. 6) may exist between each pair of arithmetic units.

After the weights are exchanged, each arithmetic unit other than the arithmetic unit 1305 swaps the weight written in its second weight register with the weight stored in its first weight register.

As shown in FIG. 6, if each arithmetic unit can freely set the transfer destination of a weight, that is, if each arithmetic unit has a routing capability, weights can be exchanged between the arithmetic units more flexibly.

Further, as in the configuration example shown in FIG. 2, the arithmetic units that exchange weights may be connected to each other by wiring. As in the configuration example shown in FIG. 3, the arithmetic units that exchange weights may also be connected to each other by two wires.

In the present embodiment, the arithmetic unit located m-th from the top and n-th from the left (2×m−1 ≠ M and 2×n−1 ≠ N) corresponds to the arithmetic unit located (M+1−m)-th from the top and (N+1−n)-th from the left. That is, the arithmetic unit that has read the (m, n) component (2×m−1 ≠ M and 2×n−1 ≠ N) of the matrix corresponds to the arithmetic unit that has read the (M+1−m, N+1−n) component of the matrix. Each arithmetic unit inputs the component it has read to its corresponding arithmetic unit.

The condition "2×m−1 ≠ M and 2×n−1 ≠ N" is imposed in order to exclude the arithmetic unit corresponding to the component located at the center of a square matrix whose numbers of rows and columns are odd (for example, the arithmetic unit 1305). Through the above operation, the calculation unit 1300 can generate the 180-degree rotation matrix from the weight matrix W with low power consumption.

[Explanation of operation]
The operation by which the calculation unit 1300 of the present embodiment generates the 180-degree rotation matrix is described below with reference to FIG. 7. FIG. 7 is a flowchart showing the 180-degree rotation matrix generation process performed by the calculation unit 1300 of the second embodiment.

First, when the weight matrix W is input to the calculation unit 1300, each arithmetic unit reads its corresponding weight of the weight matrix W (step S201). Each arithmetic unit stores the weight it has read in its first weight register.

Next, each arithmetic unit other than the arithmetic unit 1305 exchanges weights with its corresponding arithmetic unit (step S202). That is, each arithmetic unit writes its stored weight to the second weight register of the corresponding arithmetic unit, and the corresponding arithmetic unit writes its weight to the second weight register of that arithmetic unit.

Next, each arithmetic unit other than the arithmetic unit 1305 swaps the weight written in its second weight register with the weight stored in its first weight register (step S203).

That is, each arithmetic unit writes the weight written in its second weight register to its first weight register, and writes the weight that was stored in its first weight register to its second weight register. After the weights are swapped, the calculation unit 1300 ends the 180-degree rotation matrix generation process.
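Steps S201 to S203 can be modelled compactly with two dictionaries standing in for the first and second weight registers. This is a sketch under stated assumptions rather than the disclosed implementation: the function name `rotate_180` is hypothetical, the relaying between units is abstracted into a direct write, and the sketch skips only a unit that is its own partner (the arithmetic unit 1305 in the 3-by-3 case).

```python
def rotate_180(W):
    """Model steps S201 to S203 and return the first-register contents."""
    M, N = len(W), len(W[0])

    # Step S201: each unit loads its own component into its first register.
    first = {(m, n): W[m][n] for m in range(M) for n in range(N)}
    second = {}

    # Step S202: each unit writes its weight to its partner's second register.
    for (m, n), w in first.items():
        p = (M - 1 - m, N - 1 - n)  # 0-based form of (M+1-m, N+1-n)
        if p != (m, n):             # a unit that is its own partner is skipped
            second[p] = w

    # Step S203: each participating unit swaps its two registers.
    for pos in list(second):
        first[pos], second[pos] = second[pos], first[pos]

    return [[first[(m, n)] for n in range(N)] for m in range(M)]

W = [["w1", "w2", "w3"],
     ["w4", "w5", "w6"],
     ["w7", "w8", "w9"]]
print(rotate_180(W))
# [['w9', 'w8', 'w7'], ['w6', 'w5', 'w4'], ['w3', 'w2', 'w1']]
```

The result equals the matrix obtained by reversing the order of the rows and then reversing each row, which is a convenient way to check the register-level model.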

[Explanation of effect]
The learning device 1000 of the present embodiment includes the calculation unit 1300, which includes a plurality of arithmetic units directly connected by wiring. Arithmetic units connected by wiring can exchange the weights they have read. The calculation unit 1300 can therefore easily generate the 180-degree rotation matrix from the input weight matrix W.

Because the calculation unit 1300 itself generates the 180-degree rotation matrix, the learning device 1000 of the present embodiment can reduce the power consumed in generating the 180-degree rotation matrix compared with a learning device that repeatedly loads the weight matrix W and rearranges its components.

A specific example of the hardware configuration of the learning device 1000 of each embodiment is described below. FIG. 8 is an explanatory diagram showing a hardware configuration example of the learning device 1000 according to the present invention.

The learning device 1000 shown in FIG. 8 includes a processor 1001, a main storage device 1002, an auxiliary storage device 1003, an interface 1004, an output device 1005, and an input device 1006. The processor 1001 may include various arithmetic and processing devices such as a CPU 1008 and a GPU 1007.

When implemented as shown in FIG. 8, the operation of the learning device 1000 may be stored in the auxiliary storage device 1003 in the form of a program. When the program is stored in the auxiliary storage device 1003, the CPU 1008 reads the program from the auxiliary storage device 1003, loads it into the main storage device 1002, and executes predetermined processing in the learning device 1000 according to the loaded program.

The CPU 1008 is an example of an information processing device that operates according to a program. In addition to a CPU (Central Processing Unit), the learning device 1000 may include, for example, an MPU (Micro Processing Unit), an MCU (Memory Control Unit), or a GPU (Graphics Processing Unit). FIG. 8 shows an example in which the learning device 1000 includes the GPU 1007 in addition to the CPU 1008.

The auxiliary storage device 1003 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible media include a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), and a semiconductor memory connected via the interface 1004.

Further, when the program that would otherwise be stored in the auxiliary storage device 1003 is instead distributed to the learning device 1000 over a communication line, the learning device 1000 that receives it may load the distributed program into the main storage device 1002 and execute the predetermined processing.

The program may implement only part of the predetermined processing in the learning device 1000. Furthermore, the program may be a differential program that implements the predetermined processing in the learning device 1000 when used in combination with another program already stored in the auxiliary storage device 1003.

The interface 1004 transmits and receives information to and from other devices. The output device 1005 presents information to the user. The input device 1006 accepts input of information from the user.

Depending on the processing performed by the learning device 1000, some of the elements shown in FIG. 8 can be omitted. For example, if the learning device 1000 does not present information to the user, the output device 1005 can be omitted. Likewise, if the learning device 1000 does not accept information input from the user, the input device 1006 can be omitted.

Some or all of the above components are implemented by general-purpose or dedicated circuitry, a processor, or the like, or by a combination thereof. These may be configured as a single chip or as a plurality of chips connected via a bus. Some or all of the above components may also be implemented by a combination of the above-described circuitry or the like and a program.

When some or all of the above components are implemented by a plurality of information processing devices, circuits, or the like, the plurality of information processing devices, circuits, or the like may be arranged centrally or in a distributed manner. For example, the information processing devices, circuits, or the like may be implemented in a form in which they are connected to one another via a communication network, such as a client-and-server system or a cloud computing system.

Next, an outline of the present invention will be described. FIG. 9 is a block diagram showing an outline of the learning device according to the present invention. The learning device 100 according to the present invention includes a calculation unit 110 (for example, the calculation unit 1300) to which a matrix of M rows and N columns (M and N are each integers of 1 or more) is input. The calculation unit 110 includes a plurality of arithmetic units (for example, the arithmetic units 1301 to 1309). When the matrix is input to the calculation unit 110, the plurality of arithmetic units each read a component of the matrix and each input the component read to the corresponding arithmetic unit.

With such a configuration, the learning device can rearrange the components of a matrix with low power consumption.

Further, where m is an integer from 1 to M and n is an integer from 1 to N, the arithmetic unit that has read the (m, n) component (m ≠ n) of the matrix may correspond to the arithmetic unit that has read the (n, m) component of the matrix.

With such a configuration, the learning device can generate a transposed matrix.

Further, among the plurality of arithmetic units arranged in a matrix, the arithmetic unit located m-th from the top and n-th from the left (m ≠ n) may correspond to the arithmetic unit located n-th from the top and m-th from the left, and may read the (m, n) component of the matrix when the matrix is input to the calculation unit 110.

With such a configuration, the learning device can generate a transposed matrix by making use of the arrangement of the plurality of arithmetic units.

Further, where m is an integer from 1 to M and n is an integer from 1 to N, the arithmetic unit that has read the (m, n) component (2×m−1 ≠ M and 2×n−1 ≠ N) of the matrix may correspond to the arithmetic unit that has read the (M+1−m, N+1−n) component of the matrix.

With such a configuration, the learning device can generate a 180-degree rotation matrix.

Further, among the plurality of arithmetic units arranged in a matrix, the arithmetic unit located m-th from the top and n-th from the left (2×m−1 ≠ M and 2×n−1 ≠ N) may correspond to the arithmetic unit located (M+1−m)-th from the top and (N+1−n)-th from the left, and may read the (m, n) component of the matrix when the matrix is input to the calculation unit 110.

With such a configuration, the learning device can generate a 180-degree rotation matrix by making use of the arrangement of the plurality of arithmetic units.

Further, each arithmetic unit may be connected to its corresponding arithmetic unit by wiring.

With such a configuration, the learning device can further reduce the power consumed in rearranging the components of the matrix.

Further, each arithmetic unit may be connected to its corresponding arithmetic unit by two wires.

With such a configuration, the learning device can exchange the components of the matrix more quickly.

また、各演算器は、他の演算器を介して対応する演算器に読み込まれた成分をそれぞれ入力してもよい。 Further, each arithmetic unit may input a component read into the corresponding arithmetic unit via another arithmetic unit.

そのような構成により、学習装置は、転置行列または180 度回転行列の生成の用途以外にも適用される。 With such a configuration, the learning device is applied for purposes other than the generation of transposed matrices or 180 degree rotation matrices.

Further, each arithmetic unit may be connected to another arithmetic unit by two wires.

Further, the components of the matrix may be parameters of the units of a discrimination model in which a plurality of layers, each composed of one or more units, are connected in layers.

With such a configuration, the learning device can handle a weight matrix.

Further, the discrimination model may be a neural network.

With such a configuration, the learning device can perform deep learning.
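As general background on why a rearranged weight matrix is useful in neural network training, the sketch below shows the standard relation for a fully connected layer y = W x, where the gradient with respect to the input is the transposed weight matrix applied to the gradient with respect to the output. The helper names and the numeric values are illustrative assumptions and do not come from the embodiment.

```python
def transpose(W):
    """Return the transposed matrix of W (list of rows)."""
    return [list(col) for col in zip(*W)]


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # 2 x 3 weight matrix (illustrative values)
x = [1.0, 0.0, -1.0]

y = matvec(W, x)               # forward pass uses W as stored
grad_y = [1.0, 1.0]            # gradient arriving from the next layer
grad_x = matvec(transpose(W), grad_y)   # backward pass uses the transpose of W
```

The same stored weights are therefore needed in two different arrangements during learning, which is the kind of situation in which rearranging matrix components inside the calculation unit could avoid a separate reordering pass.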

Although the invention of the present application has been described above with reference to the embodiments and examples, the invention is not limited to those embodiments and examples. Various changes that those skilled in the art can understand may be made to the configuration and details of the invention within its scope.

70 Large-scale learning circuit
71 Unit
80 Inference device
81, 91, 1100 Weight memory
82, 92, 1200 Weight load unit
83, 93, 110, 1300 Calculation unit
90, 100, 1000 Learning device
94, 1400 Weight store unit
1001 Processor
1002 Main storage device
1003 Auxiliary storage device
1004 Interface
1005 Output device
1006 Input device
1007 GPU
1008 CPU
1301 to 1309 Arithmetic unit
1311 to 1319 First weight register
1321 to 1329 Second weight register

Claims

A learning device comprising a calculation unit to which a matrix of M rows and N columns (M and N each being an integer of 1 or more) is input, wherein
the calculation unit includes a plurality of arithmetic units, and
the plurality of arithmetic units each read a component of the matrix when the matrix is input to the calculation unit, and each input the read component to the corresponding arithmetic unit.

The learning device according to claim 1, wherein, where m is an integer from 1 to M and n is an integer from 1 to N, the arithmetic unit that has read the (m, n) component (m ≠ n) of the matrix corresponds to the arithmetic unit that has read the (n, m) component of the matrix.

The learning device according to claim 2, wherein, among the plurality of arithmetic units arranged in matrix form, the arithmetic unit that is m-th from the top and n-th from the left (m ≠ n)
corresponds to the arithmetic unit that is n-th from the top and m-th from the left, and
reads the (m, n) component of the matrix when the matrix is input to the calculation unit.

The learning device according to claim 1, wherein, where m is an integer from 1 to M and n is an integer from 1 to N, the arithmetic unit that has read the (m, n) component of the matrix (2 × m - 1 ≠ M and 2 × n - 1 ≠ N) corresponds to the arithmetic unit that has read the (M + 1 - m, N + 1 - n) component of the matrix.

The learning device according to claim 4, wherein, among the plurality of arithmetic units arranged in matrix form, the arithmetic unit that is m-th from the top and n-th from the left (2 × m - 1 ≠ M and 2 × n - 1 ≠ N)
corresponds to the arithmetic unit that is (M + 1 - m)-th from the top and (N + 1 - n)-th from the left, and
reads the (m, n) component of the matrix when the matrix is input to the calculation unit.

The learning device according to any one of claims 1 to 5, wherein each arithmetic unit is connected to its corresponding arithmetic unit by wiring.

The learning device according to claim 6, wherein each arithmetic unit is connected to its corresponding arithmetic unit by two wires.

The learning device according to any one of claims 1 to 5, wherein each arithmetic unit inputs the component it has read to its corresponding arithmetic unit via another arithmetic unit.

The learning device according to claim 8, wherein each arithmetic unit is connected to another arithmetic unit by two wires.

The learning device according to any one of claims 1 to 9, wherein the components of the matrix are parameters of the units of a discrimination model in which a plurality of layers, each composed of one or more units, are connected in layers.