JP2020091855A

JP2020091855A - Training model, method of generating model, and program

Info

Publication number: JP2020091855A
Application number: JP2019209063A
Authority: JP
Inventors: 誠也得居; Seiya Tokui; 大輔西野; Daisuke Nishino; 裕幸ヴインセントヤマザキ; Vincent Yamazaki Hiroyuki; 直利瀬尾; Naotoshi Seo; 諒文今西; Akifumi Imanishi
Original assignee: Preferred Networks Inc
Current assignee: Preferred Networks Inc
Priority date: 2018-11-26
Filing date: 2019-11-19
Publication date: 2020-06-11

Abstract

To enable efficient graph-based computation. SOLUTION: A training device is provided, comprising a graph generation unit configured to generate a graph based on error back propagation paths; an ID allocation unit configured to allocate an identifier to each node in the graph based on the error back propagation paths; and a back propagation unit configured to perform error back propagation based on the graph and the identifiers. SELECTED DRAWING: Figure 1

Description

The present disclosure relates to a training device, a model generation method, and a program.

In machine learning, a neural network model represents the transition of data from the input layer to the output layer as a graph, and is trained by performing forward propagation and back propagation based on the connections of that graph. One way to construct a network is the Define-by-Run style, in which the network is defined while training is being executed. In the Define-by-Run style, as shown in Non-Patent Document 1, the shape of the graph changes during training, so a graph representing the processing of the data is built during forward propagation, and back propagation is then performed using that graph. As a result, a considerable area is needed to store the graph, which puts pressure on memory and can make it difficult to train the network efficiently.

Non-Patent Document 1: S. Tokui, et al., "Chainer: a Next-Generation Open Source Framework for Deep Learning," Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015

The present embodiment provides a graph computation device, a graph computation method, and a program that perform graph-based computation efficiently.

According to one embodiment, a training device includes: a graph generation unit that generates a graph based on error back-propagation paths; an ID assignment unit that assigns an identifier to each node in the graph based on the error back-propagation paths; and a back propagation unit that performs error back propagation based on the graph and the identifiers.

A block diagram showing the functions of a training device according to one embodiment.
A diagram showing an example of graph generation according to one embodiment.
A flowchart showing the processing of the training device according to one embodiment.
A diagram showing an example of graph generation according to one embodiment.
A diagram showing an example of graph generation according to one embodiment.
A diagram showing an example of graph generation according to one embodiment.
A diagram showing an example of graph generation according to one embodiment.
A diagram showing an example of graph generation according to one embodiment.
A diagram showing an example hardware implementation of the training device according to one embodiment.

Embodiments are described below with reference to the drawings.

FIG. 1 is a block diagram showing the functions of a training device according to one embodiment. The training device 1 includes an input unit 10, a storage unit 12, a graph generation unit 14, a forward propagation unit 16, an ID assignment unit 18, a back propagation unit 20, and an output unit 22, and trains a model that outputs output data obtained by applying predetermined processing to input data.

The input unit 10 accepts input of data. The input data is, for example, training data, that is, whatever data is required, such as the data to be fed to the network being trained and the teacher data (labels) used to compute the loss.

The storage unit 12 stores data required for training in the training device 1, or the results of training. For example, the storage unit 12 stores the configuration of the network to be trained, the parameters updated during training, and the like. It may also temporarily store data input from the input unit 10, and it may store the final parameters after training has finished. Although the training device 1 is described as including the storage unit 12, part or all of the storage unit 12 may be provided outside the training device 1 so that the training device 1 can send and receive data via a communication line or the like.

The graph generation unit 14 generates a computation graph at the timing at which data is input to the network. The forward propagation unit 16 performs computation on the input data based on the description of the network definition stored in the storage unit 12. As another example, a network definition description unit that describes the definition forming the network may be provided, and the graph may be generated while forward propagation is executed based on the network definition described there. In other words, rather than generating a graph in advance from a predetermined network definition and then performing forward propagation, back propagation, and so on, the graph is generated at the time of forward propagation based on the description of the network definition, and the subsequent processing is performed based on the generated graph. Different graphs may be formed depending on, for example, the form of the input and output variables.

More specifically, the graph generation unit 14 and the forward propagation unit 16 do not operate separately: based on the structure of the network to be trained, the forward propagation unit 16 executes forward propagation while the graph generation unit 14 generates the graph. That is, for a given piece of data or group of data, the training device 1 executes forward propagation of the network based on each piece of data and, at the same time, generates a graph for back propagation. Thus, instead of generating a computation graph for a predetermined network and then performing forward propagation and the later processing, the device generates a graph each time according to which operations are applied to the input data during forward propagation, and then executes the subsequent processing (for example, back propagation).
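The following is a minimal sketch of the idea that forward propagation and graph generation happen together, so that the recorded graph depends on the data. The class names `OpNode` and `Recorder` and the two toy operations are assumptions made for illustration; they are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class OpNode:
    """One recorded operation: its name and a closure computing input gradients."""
    name: str
    backward: Callable[[float], Tuple[float, ...]]


@dataclass
class Recorder:
    """Hypothetical stand-in for a graph generation unit working with forward propagation."""
    nodes: List[OpNode] = field(default_factory=list)

    def mul(self, a: float, b: float) -> float:
        # The forward result and the graph node are produced in the same call.
        self.nodes.append(OpNode("mul", lambda g: (g * b, g * a)))
        return a * b

    def square(self, a: float) -> float:
        self.nodes.append(OpNode("square", lambda g: (2.0 * a * g,)))
        return a * a


def forward(rec: Recorder, x: float) -> float:
    # Data-dependent control flow: the recorded graph can differ per input.
    y = rec.square(x)
    if y > 1.0:
        y = rec.mul(y, 3.0)
    return y


rec_small, rec_large = Recorder(), Recorder()
out_small = forward(rec_small, 0.5)  # records only "square"
out_large = forward(rec_large, 2.0)  # records "square" and then "mul"
```

Because the graph is recorded per call, no graph has to exist before the data arrives, which is the property the paragraph above relies on.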

The ID assignment unit 18 assigns, to the graph generated by the graph generation unit 14, an identifier indicating the back-propagation path (hereinafter, a back-propagation ID). When back propagation follows a different computation path for each variable contained in the data, a back-propagation ID unique to each computation path is assigned to the nodes on that path, for example its variable nodes and operation nodes. For example, when a batch contains variables of the same kind, a graph may be generated for each batch and a back-propagation ID may be assigned for each such variable. A back-propagation ID is given uniquely to each graph.

The back propagation unit 20 computes a loss by comparing the result that the forward propagation unit 16 outputs after forward propagation through the network with the teacher data (labels), and executes error back propagation based on that loss. This back propagation is executed for each back-propagation ID assigned by the ID assignment unit 18. The back propagation unit 20 may delete a graph for which back propagation has finished. Deleting may mean discarding the graph itself, or effectively discarding the graph data by making the memory area that holds the graph overwritable (for example, by freeing the memory). Alternatively, instead of the back propagation unit 20 performing the deletion, a separate graph deletion unit (not shown) may be provided that deletes graphs based on the operation of the back propagation unit 20.

When batch operations or the like are performed, variables of the same kind may exist. In such a case, if conditions hold for these variables, such as their computation paths not differing or differentiation not being performed two or more times, variable nodes and operation nodes may be generated for the variables as a group, and the computation may be performed using a single back-propagation ID.

After the back propagation unit 20 finishes, the above processing is repeated as needed by the graph generation unit 14, the forward propagation unit 16, the ID assignment unit 18, and the back propagation unit 20 to train the network further, until the training termination condition is satisfied.

The output unit 22 outputs the trained model after training has finished. It may output the entire model, or it may output data about the model, such as its shape and parameters, from which the same trained model can be constructed externally. Furthermore, instead of outputting the model externally via the output unit 22, the trained model may be stored in the storage unit 12; in that case, the training device 1 that has finished training may function as an estimation device or the like that uses the trained model.

FIG. 2 is a diagram showing an example of graph generation according to the present embodiment. It shows, for example, the case where variables A and B are input and a variable C is output (C = F(A, B)). In each graph, the edges (broken lines) connecting nodes point from output to input; this is to illustrate back propagation clearly, and during forward propagation the nodes are traversed in the opposite direction. Although a directed graph is shown, the graph need not be directed as long as the order of back propagation can be determined. For example, instructions for traversing the nodes may be given so that computation proceeds from output to input. Specifically, a sequence of operation nodes from output to input may be stored and executed in order.

When the variables A and B are input, the function F is applied and the variable C is output. This processing is executed by the forward propagation unit 16. In parallel with this forward propagation, the graph generation unit 14 generates the graph. For example, at the timing when the forward propagation unit 16 defines the processing of the function F, the graph generation unit 14 determines the back-propagation paths and, when there are multiple paths, generates a graph corresponding to each path. As a result, when a back-propagation path exists for each of the variables A and B, for example, two graphs are generated: one containing the variable node A, the function operation node F1, and the variable node C1, and one containing the variable node B, the function operation node F2, and the variable node C2.

The ID assignment unit 18 assigns a back-propagation ID to the graph for each back-propagation path. In FIG. 2, back-propagation ID 1 is assigned to each node of the graph to which the variable node A, the operation node F1, and the variable node C1 belong, whereas back-propagation ID 2 is assigned to each node of the graph to which the variable node B, the operation node F2, and the variable node C2 belong. Note that, rather than assigning the back-propagation ID after graph generation or in parallel with it as described here, a back-propagation ID may exist first and the graph may be generated based on it. In that case as well, the ID assignment unit 18 may assign the back-propagation ID to each node forming the graph as the graph is generated.
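The FIG. 2 example can be sketched as plain data. This is only an illustration under assumptions: the `Node` record, the `build_graphs` helper, and the choice to store each graph as an output-to-input list are hypothetical and not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    name: str         # e.g. "A", "F1", "C1"
    kind: str         # "variable" or "operation"
    backprop_id: int  # identifier of the back-propagation path


def build_graphs(inputs: List[str], func: str, output: str) -> Dict[int, List[Node]]:
    """Create one graph per input that needs a gradient, ordered output to input."""
    graphs: Dict[int, List[Node]] = {}
    for bp_id, var in enumerate(inputs, start=1):
        graphs[bp_id] = [
            Node(f"{output}{bp_id}", "variable", bp_id),
            Node(f"{func}{bp_id}", "operation", bp_id),
            Node(var, "variable", bp_id),
        ]
    return graphs


# C = F(A, B) with a back-propagation path for each of A and B.
graphs = build_graphs(["A", "B"], func="F", output="C")
names = {bp_id: [n.name for n in g] for bp_id, g in graphs.items()}
```

Here every node carries the ID of the graph it belongs to, so a back propagation step can select exactly the nodes of one path.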

Only one operation node is shown for each graph, but this is for simplicity of explanation; in practice, a graph may consist of a chain of multiple operation nodes. Likewise, in the description below only one operation node is shown per graph, but two or more may exist: an operation node is generated for each operation in forward propagation that requires back propagation, and back propagation is computed according to these operation nodes. In addition, the input and output variable nodes connected to operation nodes are not necessarily inputs to the first operation node (for example, the input layer) or outputs of the last operation node (for example, the output layer); they may be inputs to or outputs of an intermediate node among the multiple operation nodes. This relationship mirrors the input and output of variables in the network to be trained, and in particular its back-propagation paths.

When multiple operation nodes exist, the same applies to the connections between operation nodes: when the back-propagation path branches from one function to multiple other functions, multiple graphs with different back-propagation IDs may be generated. In this way, a graph and the nodes belonging to it are generated for each back-propagation path based on the variables and functions, and within each graph a back-propagation ID is uniquely assigned to the nodes.

The back propagation unit 20 performs error back propagation for each assigned back-propagation ID and updates the network. For example, by traversing the graph with back-propagation ID 1, it executes back propagation from the variable node C1 for the parameters of the function F1 and updates those parameters. When all computation for the assigned ID has finished, for example when back propagation has reached the variable node A, the information on the nodes and edges assigned back-propagation ID 1 is discarded, and the graph is thereby discarded. Alternatively, the information of the graph may be discarded as a whole. As yet another example, the back propagation unit 20 may discard information node by node at the timing when it is no longer needed. Discarding per node makes it possible to release the resources in use at an earlier timing.

At this stage, the graph assigned back-propagation ID 2 still exists. The back propagation unit 20 therefore updates the network by executing error back propagation on the nodes assigned back-propagation ID 2 in the same manner as above.
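As a rough sketch of back propagation per ID followed by discarding, consider the two graphs of FIG. 2 with the illustrative choice F(A, B) = A * B. The dictionary layout, the helper `backprop_and_discard`, and the numeric values are assumptions for this example only.

```python
from typing import Callable, Dict, List

# Each graph is stored as an output-to-input list of gradient functions,
# keyed by a back-propagation ID. F(A, B) = A * B is an illustrative choice.
A, B = 3.0, 4.0
graphs: Dict[int, List[Callable[[float], float]]] = {
    1: [lambda g, b=B: g * b],  # path C1 -> F1 -> A, yields dC/dA
    2: [lambda g, a=A: g * a],  # path C2 -> F2 -> B, yields dC/dB
}


def backprop_and_discard(
    graphs: Dict[int, List[Callable[[float], float]]],
    bp_id: int,
    seed: float = 1.0,
) -> float:
    """Run back propagation along one path, then drop that path's graph."""
    grad = seed
    for op in graphs[bp_id]:
        grad = op(grad)
    del graphs[bp_id]  # drops the references held by this graph
    return grad


grad_a = backprop_and_discard(graphs, 1)
remaining_after_first = sorted(graphs)  # the graph with ID 2 still exists
grad_b = backprop_and_discard(graphs, 2)
```

After the first call only the graph with ID 2 remains, matching the stage described above; after the second call no graph data is held.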

As in ordinary machine learning, further variables may then be input to update the network further. In that case, a graph is likewise generated for the input variables during forward propagation, back-propagation IDs are assigned, and back propagation is performed while graphs that are no longer needed are discarded automatically. Ordinary machine learning techniques can be used for training. Parallel processing using mini-batches or the like is of course also possible.

FIG. 3 is a flowchart showing the processing according to the present embodiment.

First, input of data is accepted via the input unit 10 (S100). The data may be accepted one item at a time, or a predetermined number or predetermined size of data may be accepted at once. Accepting input is not limited to data explicitly supplied from outside: the input unit 10 may acquire data stored in external storage or the like, or the data may be acquired from what is stored in the storage unit 12.

Next, the forward propagation unit 16 forward-propagates the input data (S102). Forward propagation yields the output obtained when the input data is fed to the model being trained.

Next, the graph generation unit 14 generates a graph (S104). Graph generation is executed for each variable (that is, for the accepted input data) based on the network configuration, or based on the path back-propagated for each function. That is, when there are multiple back-propagation paths, multiple graphs are generated. These paths can be obtained from the description of the network definition.

Next, the ID assignment unit 18 assigns a back-propagation ID to each node of the generated graph based on the variable nodes held by the input variables (S106). When, for example, the path followed during back propagation in the model being trained differs depending on the input variable, a different back-propagation ID is assigned to each path required for back propagation, according to the input variables, parameters, output variables, and so on. Although S104 and S106 are shown separately in the flowchart, in practice the ID assignment unit 18 assigns the back-propagation IDs together with the graph generation of S104, based on the back-propagation IDs held by the variable nodes. That is, the processes of S102 to S106 may be executed in cooperation rather than individually.

As another example, the processes of S102 to S106 may be executed sequentially. For example, forward propagation, graph generation, and ID assignment may be performed in the order S102, S104, S106; or S104 and S106 may be executed together after S102, that is, graph creation and ID assignment may be executed together after forward propagation has finished. Alternatively, S102 may be performed after S104, that is, forward propagation may be performed after the graph for back propagation has been generated. Thus, the processes of S102 to S106 are applicable to any implementation in which the forward-propagated result can be obtained and a graph can be generated and a back-propagation ID assigned for each back-propagation path.

Next, the back propagation unit 20 executes error back propagation based on the assigned back-propagation IDs (S108). Graph data, for example nodes, for which computation has finished and to which no back-propagation ID needed by later computation is assigned is discarded after the error back propagation based on the relevant back-propagation ID has finished (S110). As another example, nodes that will not be reused may be discarded while error back propagation is in progress. When there are reference relationships between graphs, the back propagation unit 20 may determine the order of back propagation and perform back propagation in that order.
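One way such an order could be determined, when graphs reference one another, is a topological sort over the back-propagation IDs. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the dependency table is a made-up example, and the disclosure does not state which ordering method is used.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical reference relations between graphs: each key is a
# back-propagation ID that must be processed after the IDs in its value set.
depends_on: Dict[int, Set[int]] = {
    1: set(),
    2: {1},
    3: {2},
}


def backprop_order(depends_on: Dict[int, Set[int]]) -> List[int]:
    """Return back-propagation IDs in an order that respects the references."""
    return list(TopologicalSorter(depends_on).static_order())


order = backprop_order(depends_on)
```

With the chain above, the resulting order processes ID 1, then ID 2, then ID 3; a cyclic table would raise `graphlib.CycleError` instead.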

Next, the back propagation unit 20 branches the processing depending on whether there is a graph that has not yet been back-propagated, that is, a graph that has not been discarded (S112). If a graph that has not been discarded exists (S112: NO), back propagation has not yet finished, so error back propagation is executed on that graph (S108 to S110).

For example, after back propagation and discarding have finished for the graph with back-propagation ID 1, processing is then executed for the graph with back-propagation ID 2. The branch in this flowchart is shown only for convenience and need not be formed dynamically; it means that back propagation and discarding are performed for all of the generated graphs.

Furthermore, as indicated by the broken line, another forward propagation may be executed after back propagation using a given graph has finished. In this case, after one graph has been processed, forward propagation for another back-propagation ID may be executed, a graph generated, and back propagation performed (S102 to S110). In this way, further forward propagation and graph generation may take place after back propagation.

The processing may also be more complex. For example, forward propagation and graph generation may be executed for back-propagation IDs 1 and 2, then back propagation and graph discarding for back-propagation ID 1, then forward propagation and graph generation for back-propagation ID 3, and after that back propagation and graph discarding for back-propagation ID 2. As described above, this order may be uniquely determined and executed based on the description of the network definition, for example once forward propagation based on the input variables has been defined and has started executing. As another example, a further forward propagation branch may exist at the timing when a later forward propagation starts. In this way, the processes of S102 to S112 can be executed appropriately for any processing carried out in the general Define-by-Run style.

In other words, the processes of S102 to S110 are listed in sequence in the flowchart only for convenience. It suffices that, for each back-propagation path, forward propagation and graph generation, back propagation, and graph discarding are executed in that order; processes with the same back-propagation ID are not required to run consecutively. In the example above, the processing for back-propagation ID 1 is performed consecutively, whereas the processing for back-propagation ID 2 and for back-propagation ID 3 need not be consecutive for each ID. In more complex cases, the order of processing can likewise be rearranged as appropriate.
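The interleaved order in the example with back-propagation IDs 1 to 3 can be simulated to see how many graphs are held at once. The schedule below follows that example; the final back propagation for ID 3 and the bookkeeping dictionary `live` are assumptions added to complete the illustration.

```python
from typing import Dict, List, Tuple

# Each step is (kind, back-propagation ID). "forward" also generates the graph,
# and "backward" also discards it once back propagation has finished.
schedule: List[Tuple[str, int]] = [
    ("forward", 1), ("forward", 2),
    ("backward", 1),
    ("forward", 3),
    ("backward", 2),
    ("backward", 3),  # assumed final step, not spelled out in the text
]

live: Dict[int, str] = {}  # graphs currently held in memory
peak = 0
for kind, bp_id in schedule:
    if kind == "forward":
        live[bp_id] = f"graph-{bp_id}"
    else:
        if bp_id not in live:
            raise RuntimeError("back propagation requires its graph")
        del live[bp_id]
    peak = max(peak, len(live))
```

In this schedule at most two graphs are alive at any time, even though three graphs are generated in total.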

On the other hand, when all graphs have been discarded (S112: YES), it is determined whether training has finished (S114). As in ordinary learning, the end of training is judged by a termination condition, such as training having been performed for a predetermined number of epochs, the loss having fallen below a predetermined value, or the accuracy having exceeded a predetermined value.
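A termination check of the kind described for S114 could look like the following sketch. The function name and all threshold values are hypothetical defaults chosen for illustration.

```python
def training_finished(
    epoch: int,
    loss: float,
    accuracy: float,
    *,
    max_epochs: int = 10,            # hypothetical epoch budget
    loss_target: float = 0.01,       # hypothetical loss threshold
    accuracy_target: float = 0.99,   # hypothetical accuracy threshold
) -> bool:
    """Return True when any one of the termination conditions is met."""
    return (
        epoch >= max_epochs
        or loss < loss_target
        or accuracy > accuracy_target
    )
```

For instance, `training_finished(3, 0.5, 0.80)` is false with these defaults, while reaching the epoch budget or either threshold makes it true.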

If training has not finished (S114: NO), a new graph is created for new input variables (S102), and the training process is repeated (S104 to S112).

On the other hand, if training has finished (S114: YES), the result is output, stored, or the like (S116), and the training process ends.

As described above, according to the present embodiment, the information of a generated graph no longer needs to be retained until back propagation has been performed for all variables, functions, and so on, so memory can be used more efficiently even while Define-by-Run processing is performed. More specifically, because multiple graphs can be used, the parts of the graph to be retained for computing error back propagation and the parts not to be retained can be controlled more explicitly and at a finer granularity, which improves memory efficiency.

For example, when the graph has multiple paths in error back propagation, such as when an operation node is differentiated two or more times, training can be performed even more efficiently. This can also be used in cases where the error back propagation executed within the loss function computes gradients with respect to the input, whereas the error back propagation for the loss function does not compute gradients with respect to the input. By generating and discarding graphs in this way, a graph that is no longer needed can be discarded, and the memory area used for it released, at the timing when error back propagation has completed for some of the variables, for example.

An example of graph generation was described with reference to FIG. 2; further examples are described below. In every case, the overall processing flow is the same as in the flowchart described above.

FIG. 4 shows graph generation when, at the input, the variable A has two nodes. For example, in the computation B = F(A), when the variable A has two nodes A1 and A2, operation nodes corresponding to the nodes A1 and A2 are generated for back propagation, as shown in FIG. 4. In the following figures, solid lines indicate graph edges, and dash-dot lines indicate the generation of nodes from variables and the like.

For the variable A, two graphs are generated: one having nodes A1, F1, and B1, each assigned back propagation ID1, and one having nodes A2, F2, and B2, each assigned back propagation ID2. More specifically, in forward propagation, the operation F outputs the variable B. When two back propagation paths exist, the graph of back propagation ID1 containing nodes A1, F1, and B1 and the graph of back propagation ID2 containing nodes A2, F2, and B2 are generated based on those paths, as shown in FIG. 4.
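As a rough sketch of the FIG. 4 case, the following builds one graph per back propagation ID for B = F(A). The function name `generate_graphs` and the dictionary layout are assumptions made for illustration; only the node names (A1, F1, B1, A2, F2, B2) come from the description.

```python
def generate_graphs(input_name, op_name, output_name, backprop_ids):
    """Build one small graph per back propagation ID for output = op(input).

    Hypothetical helper: returns {backprop_id: {"nodes": [...], "edges": [...]}}.
    """
    graphs = {}
    for bid in backprop_ids:
        # Each node name is suffixed with the back propagation ID it carries.
        a, f, b = (f"{name}{bid}" for name in (input_name, op_name, output_name))
        graphs[bid] = {
            "nodes": [a, f, b],
            "edges": [(a, f), (f, b)],  # solid-line edges: input -> op -> output
        }
    return graphs


# Variable A has two nodes, so two back propagation paths (IDs 1 and 2) exist.
graphs = generate_graphs("A", "F", "B", [1, 2])
```

Here `graphs[1]` holds A1, F1, B1 and `graphs[2]` holds A2, F2, B2, so each path can later be back-propagated and discarded independently.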

FIG. 5 shows graph generation when two variables are input and one variable is output, for example an operation that takes two arguments, such as C = F(A, B).

In this case, when A and B have variable nodes with different back propagation IDs, an operation node is generated for each back propagation ID, and variable nodes C1 and C2 corresponding to all of them are generated for the output variable C. For example, at the operation node F1, error back propagation is executed from the variable node C1, which is the output for the variable A, while at the operation node F2, error back propagation is executed from the variable node C2, which is the output for the variable B.

In this way, separate graphs, each assigned its own back propagation ID, are generated for the outputs from different variables and for the same or different operations.
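A small worked example may make the two-input case more concrete. It assumes, purely for illustration, that F is multiplication (C = A * B), that the input nodes are named A1 and B2, and that each graph stores the local derivative it needs; none of these choices is specified in the description.

```python
def forward_mul(a, b):
    """Hypothetical operation F, chosen here as C = A * B."""
    return a * b


def build_graphs(a, b):
    """Run the forward pass and create one graph per back propagation ID."""
    c = forward_mul(a, b)
    graphs = {
        # ID1: path C1 -> F1 -> A1, needs dC/dA = B
        1: {"nodes": ["A1", "F1", "C1"], "local_grad": b},
        # ID2: path C2 -> F2 -> B2, needs dC/dB = A
        2: {"nodes": ["B2", "F2", "C2"], "local_grad": a},
    }
    return c, graphs


def backward(graphs, backprop_id, grad_output=1.0):
    """Back propagate one graph, discarding it once its gradient is computed."""
    graph = graphs.pop(backprop_id)  # the graph is no longer held after this
    return grad_output * graph["local_grad"]


c, graphs = build_graphs(3.0, 4.0)
grad_a = backward(graphs, 1)  # gradient with respect to A
grad_b = backward(graphs, 2)  # gradient with respect to B
```

Each call to `backward` touches only the graph with the requested ID, so the other graph can stay in memory or be released on its own schedule.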

In all of the examples above, the variables have different nodes while each operation node is connected to each variable through only one path. The embodiment is not limited to this; for example, a single operation node may have connections to different input variable nodes.

FIG. 6 shows graph generation in such a case, where multiple input nodes and output nodes are connected to one operation node. As an example, a case is described in which the input node and the output node are referenced in the operation B = F(A). In the following figures, broken lines indicate reference relationships. For example, the graph generation unit 14 sets up these reference relationships between nodes based on the back propagation paths.

Back propagation ID1 is assigned to the variable nodes A1 and B1 and the operation node F1, and back propagation ID2 is assigned to the variable nodes A2 and B2 and the operation node F2. However, because the computation at the operation node F2 uses the variables of the variable nodes A1 and B1, the operation node F2 is additionally given references to the variable nodes A1 and B1 (the broken-line arrows in the figure).

In such a case, the back propagation unit 20 begins computation from the graph whose back propagation ID has nodes that hold references. This is described using the example of FIG. 6. Two graphs exist, with back propagation IDs 1 and 2, and the operation node F2, to which back propagation ID2 is assigned, holds references to the variable nodes A1 and B1, to which back propagation ID1 is assigned.

In this case, the back propagation unit 20 starts the computation from the graph of back propagation ID2. First, the loss is calculated at the variable node B2, and the gradient is calculated at the operation node F2 based on the back-propagated loss. The variables A1 and B1 are used, for example, in calculating this gradient. As before, only one operation node is shown, but there may be more than one. The variables A1 and B1 may both be used to calculate the gradient for a single operation node, may each be used to calculate the gradient for a different operation node, or may be used in any combination of these ways. The same applies when three or more variables are input or output.

Once the back propagation unit 20 has calculated the gradient at the operation node F2, executed error back propagation as far as the variable node A2, and thereby finished error back propagation for the graph of back propagation ID2, it discards the graph made up of the nodes to which back propagation ID2 is assigned. It then executes error back propagation for back propagation ID1.

By holding reference relationships in this way, graphs can be generated and discarded, and memory usage efficiency improved, even when variable nodes or operation nodes in multiple graphs are related to one another. The order in which back propagation is processed is determined, for example, by the back propagation unit 20 extracting the reference relationships in each graph and deciding based on the extraction result.

When a differentiation operation depends on a variable node of another back propagation ID, the differentiation computation itself is recorded using operation nodes and variable nodes, in the same way as an ordinary back propagation computation, and the operation is performed based on the reference relationships.

As shown in FIG. 7, reference relationships may instead be held by variable nodes. As another example, each graph may include a variable node that represents reference relationships and stores the references from nodes in other graphs or to nodes in other graphs; by checking the references in that variable node when back propagation is executed, back propagation may be performed in an order that allows graphs to be discarded efficiently. For example, the references to nodes having other back propagation IDs are checked, and back propagation is executed starting from a graph that holds references to another graph.

As yet another example, a reference graph representing the reference relationships may additionally be provided, and computation may start from the graph whose back propagation ID corresponds to a terminal node of the reference graph. This makes the approach easy to apply even when there are multiple graphs with complicated reference relationships.
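One way to derive a processing order from such reference relationships is a topological sort. The sketch below is an assumption-laden illustration, not the method of the embodiment: the function `backprop_order` and its input format (a mapping from a back propagation ID to the set of IDs whose nodes it references) are hypothetical. It assumes, following the FIG. 6 example, that a graph holding references to another graph is processed before the graph it references. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def backprop_order(references):
    """Order back propagation IDs so that referencing graphs come first.

    `references` maps a back propagation ID to the set of IDs whose nodes
    it references (hypothetical format chosen for this sketch).
    """
    predecessors = {}
    for referrer, targets in references.items():
        predecessors.setdefault(referrer, set())
        for target in targets:
            # A referenced graph must wait until every referrer has finished.
            predecessors.setdefault(target, set()).add(referrer)
    return list(TopologicalSorter(predecessors).static_order())


# FIG. 6 case: operation node F2 (ID2) references variable nodes A1, B1 (ID1).
order_fig6 = backprop_order({2: {1}, 1: set()})

# A longer chain: ID3 references ID2, which in turn references ID1.
order_chain = backprop_order({3: {2}, 2: {1}})
```

With this ordering, each graph can be discarded as soon as its own back propagation finishes, because no graph processed later depends on its nodes. A cyclic set of references would make `TopologicalSorter` raise `CycleError`, which this sketch does not handle.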

In the above, the reference relationship is held by the graph that references nodes of another graph. Conversely, the relationship may be stored with the graph whose nodes are referenced by another graph. In this case as well, by executing the back propagation computation starting from the appropriate referencing nodes, graph information can be discarded in order, beginning with the graphs whose computation has finished.

FIG. 8 shows an example of graph generation in a more complicated case. As above, the relationship between input, output, and operation is B = F(A). In addition, the derivative F' of the operation F is assumed to be a computation that uses the input and output variables A and B. In FIG. 8, dotted lines indicate discarded nodes and edges.

First, when the graph is generated, the operation node F2 holds references to the variable nodes A1 and B1. ΔB denotes the gradient with respect to the variable B obtained as an intermediate result of error back propagation, and ΔA denotes the gradient with respect to the variable A obtained as the result of error back propagation through the operation F. FIG. 8 shows the case where ΔB has a node to which back propagation ID1 is assigned.

Back propagation is executed starting from the graph to which back propagation ID2 is assigned. When the derivative F2' of F2 is computed, back propagation ID1 is assigned to the operation node F2'. The operation node F2' then holds references to the variable nodes A1 and B1. The variable node B2, the operation node F2, and the variable node A2, for which back propagation has finished, are then discarded.

In this way, when a graph has nodes that depend on another graph, generating a node that conveys the input of the variable to which a back propagation ID is assigned makes it possible to discard graphs for each back propagation path in the same way as before. After error back propagation has completed for the referencing nodes, the variable nodes and operation nodes to which that back propagation ID is assigned are discarded; in particular, among the resources held by an operation node, those not shared with operation nodes of other back propagation IDs are released. Meanwhile, error back propagation for the computation graphs of the other back propagation IDs can still be computed correctly because, for example, the operation node F2' is retained.

The present embodiment has been described using several simple examples. By generating graphs, assigning a unique back propagation ID to the nodes of each generated graph, and, where necessary, holding reference relationships between nodes with different back propagation IDs, resources can be released in order, starting with the computation graphs for which back propagation has completed. The same applies to graphs more complicated than those shown above.

In this way, when a network is back-propagated, resource utilization can be improved by generating graphs based on the back propagation paths rather than generating a single graph for the entire network. This makes it possible to improve memory efficiency even in the Define-by-Run scheme, which involves complicated computation.

In the training device 1 of the embodiment described above, each function may be implemented as a circuit composed of analog circuitry, digital circuitry, or mixed analog-digital circuitry. A control circuit that controls each function may also be provided. Each circuit may be implemented with an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or the like.

In all of the description above, at least part of the training device 1 may be implemented in hardware, or may be implemented in software and carried out by a CPU (Central Processing Unit) or the like through software information processing. When implemented in software, a program that realizes the training device 1 or at least some of its functions may be stored on a storage medium such as a flexible disk or CD-ROM and read and executed by a computer. The storage medium is not limited to removable media such as magnetic disks and optical disks, and may be a fixed storage medium such as a hard disk device or a memory. In other words, the information processing by software may be concretely implemented using hardware resources. Furthermore, the processing by software may be implemented in a circuit such as an FPGA and executed by hardware. Jobs may be executed using an accelerator such as a GPU (Graphics Processing Unit).

For example, a computer can serve as the device of the above embodiment by reading dedicated software stored on a computer-readable storage medium. The type of storage medium is not particularly limited. A computer can also serve as the device of the above embodiment by installing dedicated software downloaded via a communication network. In this way, information processing by software is concretely implemented using hardware resources.

FIG. 9 is a block diagram showing an example of a hardware configuration in one embodiment of the present invention. The training device 1 can be realized as a computer device 7 that includes a processor 71, a main storage device 72, an auxiliary storage device 73, a network interface 74, and a device interface 75, connected to one another via a bus 76.

Although the computer device 7 of FIG. 9 includes one of each component, it may include more than one of the same component. Also, although a single computer device 7 is shown, the software may be installed on multiple computer devices, each of which executes a different part of the software's processing.

The processor 71 is an electronic circuit (processing circuit, processing circuitry) that includes the control unit and arithmetic unit of a computer. The processor 71 performs arithmetic processing based on data and programs input from the devices that make up the internal configuration of the computer device 7, and outputs computation results and control signals to those devices. Specifically, the processor 71 controls the components of the computer device 7 by executing the OS (Operating System) of the computer device 7, applications, and the like. The processor 71 is not limited to any particular type as long as it can perform the above processing. The training device 1 and each of its components are realized by the processor 71. Here, a processing circuit may refer to one or more electric circuits arranged on one chip, or to one or more electric circuits arranged on two or more chips or devices.

The main storage device 72 is a storage device that stores the instructions executed by the processor 71 and various data, and the information stored in the main storage device 72 is read directly by the processor 71. The auxiliary storage device 73 is a storage device other than the main storage device 72. These storage devices mean any electronic component capable of storing electronic information, and may be memory or storage. Memory may be either volatile or non-volatile. A memory for saving various data in the training device 1, for example the storage unit 12, may be realized by the main storage device 72 or the auxiliary storage device 73. For example, at least part of each storage unit described above may be implemented in the main storage device 72 or the auxiliary storage device 73. As another example, when an accelerator is provided, at least part of each storage unit described above may be implemented in memory provided in that accelerator.

The network interface 74 is an interface for connecting to the communication network 8 wirelessly or by wire. Any network interface 74 that conforms to an existing communication standard may be used. Through the network interface 74, information may be exchanged with an external device 9A connected via the communication network 8.

Examples of the external device 9A include a camera, a motion capture device, an output destination device, an external sensor, and an input source device. The external device 9A may also be a device that has some of the functions of the components of the training device 1. The computer device 7 may send and receive part of the processing results of the training device 1 via the communication network 8, in the manner of a cloud service.

The device interface 75 is an interface, such as USB (Universal Serial Bus), that connects directly to an external device 9B. The external device 9B may be an external storage medium or a storage device. Each storage unit may be realized by the external device 9B.

The external device 9B may be an output device. The output device may be, for example, a display device for displaying images or a device that outputs audio or the like. Examples include, but are not limited to, an LCD (Liquid Crystal Display), a CRT (Cathode Ray Tube), a PDP (Plasma Display Panel), and a speaker.

The external device 9B may also be an input device. The input device includes devices such as a keyboard, a mouse, and a touch panel, and supplies the information entered through these devices to the computer device 7. Signals from the input device are output to the processor 71.

Aspects of the present invention are not limited to the individual embodiments described above. Various additions, changes, and partial deletions are possible without departing from the conceptual idea and gist of the present invention derived from the content defined in the claims and their equivalents. For example, in all of the embodiments described above, the numerical values used in the description are given only as examples, and the invention is not limited to them.

1: training device, 10: input unit, 12: storage unit, 14: graph generation unit, 16: forward propagation unit, 18: ID assignment unit, 20: back propagation unit, 22: output unit

Claims

1. A training device comprising:
one or more memories; and
one or more processors,
wherein the one or more processors:
generate a graph based on a path of error back propagation;
assign an identifier to each node in the graph based on the path of the error back propagation; and
perform the error back propagation based on the graph and the identifier.

2. The training device according to claim 1, wherein the one or more processors
generate nodes that form the path of the error back propagation and that correspond to an input variable, an operation in forward propagation, and an output variable.

3. The training device according to claim 1 or 2, wherein the one or more processors,
when a plurality of different paths of the error back propagation exist, generate a plurality of the graphs, each representing a respective path.

4. The training device according to any one of claims 1 to 3, wherein the one or more processors
assign the identifier uniquely to each node for each of the graphs having the same path of the error back propagation.

5. The training device according to claim 4, wherein the one or more processors
assign different identifiers to nodes belonging to graphs having different paths of the error back propagation.

6. The training device according to any one of claims 1 to 5, wherein the one or more processors
execute the error back propagation on the nodes to which the same identifier is assigned and, when the error back propagation for that identifier has finished, discard the data of the graph to which that identifier is assigned.

7. The training device according to any one of claims 1 to 6, wherein the one or more processors,
when a reference relationship exists between nodes having different identifiers, generate a node having the reference relationship, and
determine, based on the reference relationship, the order of the graphs on which the error back propagation is performed.

8. A method for generating a model, wherein:
one or more processors generate a graph based on a path of error back propagation;
the one or more processors assign an identifier to each node in the graph based on the path of the error back propagation; and
the one or more processors perform the error back propagation based on the graph and the identifier.

9. The method for generating a model according to claim 8, wherein
the one or more processors store the generated model in one or more memories.

10. A program that, when executed by one or more processors, causes the one or more processors to:
generate a graph based on a path of error back propagation;
assign an identifier to each node in the graph based on the path of the error back propagation; and
perform the error back propagation based on the graph and the identifier.