JP2019109875A

JP2019109875A - System, program and method

Info

Publication number: JP2019109875A
Application number: JP2018159500A
Authority: JP
Inventors: Takeshi Toda (武 戸田); Kosuke Haruki (耕祐 春木)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2017-12-18
Filing date: 2018-08-28
Publication date: 2019-07-04
Anticipated expiration: 2038-08-28
Also published as: JP6877393B2

Abstract

To provide a system, a program and a method capable of achieving high scalability in parallel distributed learning processing.
SOLUTION: When the first and second nodes calculate their gradients faster than the third and fourth nodes, the second weight coefficient is further updated in the (n+1)-th parallel distributed processing based on the first and second gradients calculated by the first and second nodes. In the same case, the fourth weight coefficient is further updated in the (m+1)-th parallel distributed processing based on the first to fourth gradients calculated by the first to fourth nodes. When the third and fourth nodes calculate their gradients faster than the first and second nodes, the second weight coefficient is further updated in the (n+1)-th parallel distributed processing based on the first to fourth gradients. In that case, the fourth weight coefficient is further updated in the (m+1)-th parallel distributed processing based on the third and fourth gradients.
SELECTED DRAWING: Figure 1

Description

Embodiments of the present invention relate to a system, a program and a method.

In recent years, effective use of data through deep learning, one form of machine learning, has been anticipated. To obtain learning results from large-scale data more quickly, deep learning calls for parallel distributed learning processing. In this processing, a plurality of nodes (computers) perform learning in parallel and share the learning progress of each node. The data indicating this learning progress is shared through communication between the nodes.

One conceivable way to obtain such learning results more quickly is to increase the number of nodes that execute the parallel processing. In general parallel distributed learning processing, however, increasing the number of nodes does not always yield learning results efficiently (that is, the learning speed does not scale). High scalability is therefore difficult to achieve.

US Patent Application Publication No. 2016/0267380

Accordingly, the problem to be solved by the present invention is to provide a system, a program, and a method capable of achieving high scalability in parallel distributed learning processing.

The system according to the present embodiment includes a first node and a second node belonging to a first group, and a third node and a fourth node belonging to a second group.

When the first node and the second node execute the n-th parallel distributed processing (n is a natural number), each node calculates a gradient. The first node calculates a first gradient for updating a first weight coefficient of an objective function to a second weight coefficient. The second node calculates a second gradient for updating the first weight coefficient of the objective function to the second weight coefficient.

When the third node and the fourth node execute the m-th parallel distributed processing (m is a natural number), each of them also calculates a gradient. The third node calculates a third gradient for updating a third weight coefficient of the objective function to a fourth weight coefficient. The fourth node calculates a fourth gradient for updating the third weight coefficient of the objective function to the fourth weight coefficient.

Suppose the first and second nodes finish calculating their gradients earlier than the third and fourth nodes. In the (n+1)-th parallel distributed processing executed by the first and second nodes, the system further updates the second weight coefficient, which was updated from the first weight coefficient based on the first and second gradients. In the (m+1)-th parallel distributed processing executed by the third and fourth nodes, it further updates the fourth weight coefficient, which was updated from the third weight coefficient based on the first to fourth gradients.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram for explaining an outline of the system according to the first embodiment.
FIG. 2 is a diagram showing an example of the configuration of the present system.
FIG. 3 is a diagram showing an example of the system configuration of a server node.
FIG. 4 is a diagram showing an example of the system configuration of a worker node.
FIG. 5 is a block diagram showing an example of the functional configuration of the server node.
FIG. 6 is a block diagram showing an example of the functional configuration of a worker node.
FIG. 7 is a sequence chart showing an example of the processing procedure of the present system.
FIG. 8 is a flowchart showing an example of the processing procedure of a representative node.
FIG. 9 is a flowchart showing an example of the processing procedure of a non-representative node.
FIG. 10 is a diagram showing, for each learning method, the learning time required to reach a predetermined generalization performance.
FIG. 11 is a diagram for explaining an outline of a modification of the present system.
FIG. 12 is a diagram showing an example of the configuration of a system according to the second embodiment.
FIG. 13 is a sequence chart showing an example of the processing procedure of the present system.
FIG. 14 is a flowchart showing an example of the processing procedure of a representative node.
FIG. 15 is a flowchart showing an example of the processing procedure of a non-representative node.

Hereinafter, each embodiment will be described with reference to the drawings.
First Embodiment
The system according to the present embodiment executes parallel distributed learning processing based on an objective function, for example in deep learning that handles large-scale data. Such processing may be any processing in which a plurality of processing entities learn while using the objective function as feedback (an evaluation value) on the learning result. One example is parallel distributed learning processing for optimizing an objective function.

In deep learning, stochastic gradient descent (SGD), for example, is used as a method of optimizing the objective function. In SGD, the parameters of the objective function are updated repeatedly using a vector called the gradient (vector), which is used to move the parameters toward the optimal solution. The parameters of the objective function include, for example, weight coefficients.

Let $W^{(t)}$, $\nabla W^{(t)}$ and $\varepsilon^{(t)}$ denote, respectively, the weight coefficient (weight vector) representing the current state in SGD, the gradient vector, and the learning coefficient. The updated weight coefficient $W^{(t+1)}$ is then given by the following equation.

$$W^{(t+1)} = W^{(t)} - \varepsilon^{(t)} \nabla W^{(t)} \qquad (1)$$

The learning coefficient $\varepsilon^{(t)}$ determines the update width. It is set adaptively according to the progress of learning; for example, it decays as learning progresses.
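Equation (1) and a decaying learning coefficient can be illustrated with the minimal sketch below. Two things in it are hypothetical assumptions chosen only to show Equation (1) at work: the schedule `learning_rate` ($\varepsilon^{(t)} = \varepsilon_0 / (1 + \text{decay} \cdot t)$) and the toy objective $\sum_i (w_i - 3)^2$. The source specifies neither.

```python
from typing import List


def sgd_step(w: List[float], grad: List[float], lr: float) -> List[float]:
    """Equation (1), element-wise: W(t+1) = W(t) - eps(t) * gradW(t)."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def learning_rate(t: int, eps0: float = 0.1, decay: float = 0.01) -> float:
    """Hypothetical schedule: eps(t) shrinks as the step count t grows."""
    return eps0 / (1.0 + decay * t)


# Toy objective f(W) = sum_i (w_i - 3)^2, whose gradient is 2 * (w_i - 3).
w = [0.0, 10.0]
for t in range(200):
    grad = [2.0 * (wi - 3.0) for wi in w]
    w = sgd_step(w, grad, learning_rate(t))
print(w)  # both components approach the minimizer 3.0
```

Each step moves every component a fraction of the way toward 3.0. The fraction shrinks as $\varepsilon^{(t)}$ decays.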

The gradient described above is obtained by inputting learning data (training examples) into the objective function. To reduce computational cost, the "mini-batch method" is generally used: a plurality of learning data are input together and their average gradient is obtained. The number of learning data used to obtain this average gradient is called the batch size. When SGD optimization is parallelized and distributed, the gradient, for example, is used as the learning progress shared among the nodes.
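The mini-batch method can be sketched as below. Two assumptions apply, both for illustration only: a one-parameter linear model $y \approx w x$ and the squared loss $\tfrac12 (w x - y)^2$. The source does not specify a model or loss. The batch size is simply the number of samples averaged.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def per_sample_grad(w: float, sample: Sample) -> float:
    """Gradient of the loss 0.5 * (w*x - y)^2 with respect to w."""
    x, y = sample
    return (w * x - y) * x


def minibatch_grad(w: float, batch: List[Sample]) -> float:
    """Mini-batch method: average the per-sample gradients (batch size = len(batch))."""
    return sum(per_sample_grad(w, s) for s in batch) / len(batch)


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # data generated by y = 2x
print(minibatch_grad(0.0, batch))  # -15.0
```

At $w = 2$ the model fits every sample exactly, so the average gradient is zero.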

The main algorithms for parallel distributed learning include, for example, synchronous parallel distributed learning methods and asynchronous parallel distributed learning methods.

In the parallel distributed learning processing described above, gradients are calculated by a plurality of nodes. A synchronous parallel distributed learning method is one in which these nodes execute their gradient calculations in synchronization. One example is Synchronous-SGD. It distributes the gradient calculation of the mini-batch method described above across a plurality of nodes and updates the weight coefficient with the average of the gradients calculated by all nodes. Synchronous-SGD has several implementations, for example:
- a collective-communication type, in which all nodes share the gradients with one another;
- a parameter-server type, in which the gradients are aggregated on a node called a parameter server, which updates the weight coefficient and distributes the updated weight coefficient to each node.
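A parameter-server style Synchronous-SGD step can be sketched as follows. This is a minimal single-process simulation, not a distributed implementation, and it reuses the illustrative linear model and squared loss assumed above. Every node computes its gradient from the same weight, the gradients are averaged, and Equation (1) is applied once.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for the toy model y ~ w * x


def node_gradient(w: float, shard: List[Sample]) -> float:
    """Mini-batch average gradient of 0.5 * (w*x - y)^2 on one node's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def synchronous_sgd_step(w: float, shards: List[List[Sample]], lr: float) -> float:
    """One parameter-server style Synchronous-SGD step."""
    # Every node computes its gradient from the same weight w (barrier: wait for all).
    grads = [node_gradient(w, shard) for shard in shards]
    # Average the gradients of all nodes and apply Equation (1) once.
    avg = sum(grads) / len(grads)
    return w - lr * avg  # this updated weight is distributed back to every node


shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]  # two nodes, data from y = 2x
w = 0.0
for _ in range(100):
    w = synchronous_sgd_step(w, shards, lr=0.05)
print(w)  # approaches 2.0
```

When the shards are the same size, the averaged gradient equals the mini-batch gradient over all shards combined. In effect, the batch size grows with the number of nodes.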

The synchronous parallel distributed method is known to have three drawbacks:
- As the number of nodes that calculate gradients (the degree of parallelism) increases, the synchronization cost increases and processing throughput decreases.
- As the number of nodes increases, the batch size increases and generalization performance decreases.
- The overall processing speed is held back by the slowest of the nodes.
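The arithmetic behind these drawbacks can be sketched as below. The cost model is a hypothetical simplification, not taken from the source: the step time is the slowest node's compute time plus a synchronization cost that grows linearly with the node count.

```python
from typing import List


def effective_batch_size(per_node_batch: int, num_nodes: int) -> int:
    # Each Synchronous-SGD update averages per_node_batch samples from every node.
    return per_node_batch * num_nodes


def sync_step_time(node_times: List[float], sync_cost_per_node: float) -> float:
    # The barrier ends only when the slowest node finishes; the synchronization
    # cost is modeled (hypothetically) as growing linearly with the node count.
    return max(node_times) + sync_cost_per_node * len(node_times)


print(effective_batch_size(32, 4))                 # 128
print(sync_step_time([1.0, 1.0, 1.0, 2.5], 0.05))  # dominated by the 2.5 s node
```

In this model, three of the four nodes sit idle for 1.5 s per step while waiting for the slowest node.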

An asynchronous parallel distributed learning method, on the other hand, is one in which the nodes execute their gradient calculations asynchronously. One example is Asynchronous-SGD, which, like Synchronous-SGD, is an algorithm that shares gradients. Unlike Synchronous-SGD, however, Asynchronous-SGD does not average gradients through synchronization. Instead, the weight coefficient is updated directly with the gradient calculated by each node. Asynchronous-SGD is mainly implemented as the parameter-server type.
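Parameter-server Asynchronous-SGD can be sketched as the event-driven simulation below. It uses the same illustrative linear model as before. The per-node compute times are hypothetical values. Each node's gradient is applied the moment it arrives, without averaging. The gradient may be stale, because the node computed it from the weight it read when it started.

```python
import heapq
from typing import List, Tuple

Sample = Tuple[float, float]


def node_gradient(w: float, shard: List[Sample]) -> float:
    """Average gradient of 0.5 * (w*x - y)^2 over one node's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def asynchronous_sgd(w0: float, shards: List[List[Sample]],
                     compute_times: List[float], lr: float, num_updates: int):
    """Event-driven simulation of parameter-server Asynchronous-SGD."""
    w = w0
    order = []  # which node's gradient was applied at each update
    # Event = (finish time, node id, weight snapshot the node read when it started).
    events = [(compute_times[i], i, w0) for i in range(len(shards))]
    heapq.heapify(events)
    for _ in range(num_updates):
        t, i, w_read = heapq.heappop(events)
        # The gradient may be stale: it was computed from w_read, not the current w.
        g = node_gradient(w_read, shards[i])
        w = w - lr * g  # applied immediately, no averaging across nodes
        order.append(i)
        # The node pulls the latest weight and starts its next gradient computation.
        heapq.heappush(events, (t + compute_times[i], i, w))
    return w, order


shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w, order = asynchronous_sgd(0.0, shards, compute_times=[1.0, 1.7], lr=0.02, num_updates=400)
print(round(w, 6), order[:5])
```

The faster node contributes more updates, and no node ever waits at a barrier. This is the source of the higher throughput described in the next paragraph. It is also the source of staleness.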

Asynchronous parallel distributed learning methods achieve higher processing throughput than synchronous ones. Their scalability is nonetheless known to be limited, for example because differences in processing speed among the nodes lower the convergence speed.

Asynchronous-SGD takes a different approach from Synchronous-SGD: it is a parallel distributed learning algorithm that does not depend on the batch size. It is therefore referred to as a batch-size-independent parallel method (processing) or the like. Most batch-size-independent parallel methods are asynchronous parallel distributed learning methods.

An outline of the system according to the present embodiment (hereinafter referred to as the present system) will now be described with reference to FIG. 1. The present system is configured to execute different learning processing at different hierarchical levels of the parallel distributed learning processing. For example, it uses Synchronous-SGD at one level and another, batch-size-independent, parallel method at the other.

Specifically, as shown in FIG. 1, the processing is organized in two tiers. In the first tier, parallel distributed processing of the mini-batch method using Synchronous-SGD is performed within each group to which a plurality of nodes belong. In the second tier, batch-size-independent parallel distributed processing is performed among the representative nodes of the first-tier groups. The present system is described in detail below.
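The two-tier scheme can be sketched as the single-process simulation below. It is a simplified sketch under stated assumptions, not the patented implementation. The linear model, the group step times and the learning rate are hypothetical. The server is assumed to apply each group's averaged gradient to the master parameter immediately, as in Asynchronous-SGD. Within a group, gradients are averaged synchronously; across groups, updates arrive asynchronously.

```python
import heapq
from typing import List, Tuple

Sample = Tuple[float, float]
Group = List[List[Sample]]  # one shard per worker node in the group


def node_gradient(w: float, shard: List[Sample]) -> float:
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def group_gradient(w: float, group: Group) -> float:
    # First tier: Synchronous-SGD inside the group. All nodes use the same
    # weight, and the group's gradients are averaged at a barrier.
    return sum(node_gradient(w, shard) for shard in group) / len(group)


def hierarchical_training(w0: float, groups: List[Group], step_times: List[float],
                          lr: float, num_updates: int):
    master = w0  # master parameter held by the server node
    order = []   # group index whose gradient was applied at each update
    events = [(step_times[g], g, w0) for g in range(len(groups))]
    heapq.heapify(events)
    for _ in range(num_updates):
        t, g, w_read = heapq.heappop(events)
        # Second tier: the group's representative pushes its averaged gradient and
        # the server applies it to the master parameter right away, without
        # waiting for the other groups.
        master = master - lr * group_gradient(w_read, groups[g])
        order.append(g)
        # The server returns the new master parameter to this representative only.
        heapq.heappush(events, (t + step_times[g], g, master))
    return master, order


# Index 0 stands for group 1 (nodes 1 and 2), index 1 for group 2 (nodes 3 and 4).
groups = [[[(1.0, 2.0)], [(2.0, 4.0)]], [[(3.0, 6.0)], [(4.0, 8.0)]]]
master, order = hierarchical_training(0.0, groups, step_times=[1.0, 1.5],
                                      lr=0.02, num_updates=300)
print(round(master, 6), order[:3])
```

In this simplified model, group 1 is faster, so its first and second gradients update the master parameter first. Group 2's third and fourth gradients are then applied on top of that result. Group 2's next round therefore starts from a weight that reflects the first to fourth gradients, while group 1's next round started from a weight reflecting only the first and second. This loosely mirrors the ordering described in the summary of the invention.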

FIG. 2 shows an example of the configuration of the present system. As shown in FIG. 2, the present system 10 includes a server node 20, a plurality of worker nodes 30, and a plurality of worker nodes 40.

In the present embodiment, the worker nodes 30 belong to group 1, and the worker nodes 40 belong to group 2.

The server node 20 is communicably connected to one of the worker nodes 30 belonging to group 1 (hereinafter referred to as the representative node 30 of group 1). The server node 20 is also communicably connected to one of the worker nodes 40 belonging to group 2 (hereinafter referred to as the representative node 40 of group 2).

The worker nodes 30 that are not communicably connected to the server node 20 (that is, all worker nodes 30 other than the representative node 30 of group 1) are referred to as the non-representative nodes 30 of group 1. Likewise, the worker nodes 40 that are not communicably connected to the server node 20 (that is, all worker nodes 40 other than the representative node 40 of group 2) are referred to as the non-representative nodes 40 of group 2.

Within group 1, the worker nodes 30 (the representative node 30 and the non-representative nodes 30) are communicably connected to one another. Similarly, within group 2, the worker nodes 40 (the representative node 40 and the non-representative nodes 40) are communicably connected to one another.

In the present embodiment, in the first tier, parallel distributed learning processing of the mini-batch method using Synchronous-SGD is executed within group 1 (the worker nodes 30) and within group 2 (the worker nodes 40). In the second tier, parallel distributed learning processing using a batch-size-independent parallel distributed method is executed between the representative node 30 of group 1 and the representative node 40 of group 2 via the server node 20.

Although FIG. 2 shows three worker nodes in each of group 1 and group 2, it suffices for each group to contain two or more worker nodes. Likewise, although FIG. 2 shows only two groups (group 1 and group 2), the present system may include three or more groups.
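The topology described above can be captured in a small data structure, sketched below with several hypothetical choices:
- The node identifiers are invented.
- The first listed worker is assumed to act as the representative; the source does not say how the representative is chosen.

The sketch enforces the stated minimum of two worker nodes per group and accepts any number of groups.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Group:
    name: str
    workers: List[str]  # hypothetical worker-node identifiers

    def __post_init__(self) -> None:
        if len(self.workers) < 2:
            raise ValueError(f"{self.name}: a group needs two or more worker nodes")

    @property
    def representative(self) -> str:
        # Assumption: the first listed worker serves as the representative node.
        return self.workers[0]

    @property
    def non_representatives(self) -> List[str]:
        return self.workers[1:]

    def intra_group_links(self) -> List[Tuple[str, str]]:
        # Every pair of workers inside the group is communicably connected.
        w = self.workers
        return [(a, b) for i, a in enumerate(w) for b in w[i + 1:]]


@dataclass
class Topology:
    server: str
    groups: List[Group]

    def server_links(self) -> List[Tuple[str, str]]:
        # Only representative nodes are connected to the server node.
        return [(self.server, g.representative) for g in self.groups]


topo = Topology("server-20", [
    Group("group1", ["w30-a", "w30-b", "w30-c"]),
    Group("group2", ["w40-a", "w40-b", "w40-c"]),
])
print(topo.server_links())
```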

FIG. 3 shows an example of the system configuration of the server node 20 shown in FIG. 2. The server node 20 includes a CPU 201, a system controller 202, a main memory 203, a BIOS-ROM 204, a nonvolatile memory 205, a communication device 206, an embedded controller (EC) 207, and the like.

The CPU 201 is a processor that controls the operation of the various components in the server node 20. It executes various programs loaded into the main memory 203 from the nonvolatile memory 205, which is a storage device. These programs include an operating system (OS) 203a and various application programs, among them a parallel distributed learning program 203b for the server node.

The CPU 201 also executes a basic input/output system (BIOS) stored in the BIOS-ROM 204. The BIOS is a program for hardware control.

The system controller 202 is a device that connects the local bus of the CPU 201 to the various components. The system controller 202 also includes a memory controller that controls access to the main memory 203.

The communication device 206 is a device configured to perform wired or wireless communication. It includes a transmitter that transmits signals and a receiver that receives signals. The EC 207 is a one-chip microcomputer including an embedded controller for power management.

FIG. 4 shows an example of the system configuration of the worker node 30. Only the worker node 30 is described here; the worker node 40 is assumed to have the same configuration.

The worker node 30 includes a CPU 301, a system controller 302, a main memory 303, a BIOS-ROM 304, a nonvolatile memory 305, a communication device 306, an embedded controller (EC) 307, and the like.

The CPU 301 is a processor that controls the operation of the various components in the worker node 30. It executes various programs loaded into the main memory 303 from the nonvolatile memory 305, which is a storage device. These programs include an operating system (OS) 303a and various application programs, among them a parallel distributed learning program 303b for worker nodes.

The CPU 301 also executes a basic input/output system (BIOS) stored in the BIOS-ROM 304. The BIOS is a program for hardware control.

The system controller 302 is a device that connects the local bus of the CPU 301 to the various components. The system controller 302 also includes a memory controller that controls access to the main memory 303.

The communication device 306 is a device configured to perform wired or wireless communication. It includes a transmitter that transmits signals and a receiver that receives signals. The EC 307 is a one-chip microcomputer including an embedded controller for power management.

FIG. 5 is a block diagram showing an example of the functional configuration of the server node 20. As shown in FIG. 5, the server node 20 includes a learning data storage unit 21, a data allocation unit 22, a transmission control unit 23, a weight coefficient storage unit 24, a reception control unit 25, and a calculation unit 26.

In the present embodiment, the learning data storage unit 21 and the weight coefficient storage unit 24 are stored in, for example, the nonvolatile memory 205 shown in FIG. 3. The data allocation unit 22, the transmission control unit 23, the reception control unit 25, and the calculation unit 26 are implemented in software. For example, the CPU 201 shown in FIG. 3 (that is, the computer of the server node 20) executes the parallel distributed learning program 203b stored in the nonvolatile memory 205. The parallel distributed learning program 203b can be distributed stored in advance on a computer-readable storage medium. It may also be downloaded to the server node 20 via, for example, a network.

Although the units 22, 23, 25 and 26 are described here as being implemented by software, they may instead be implemented by hardware or by a combination of software and hardware.

The learning data storage unit 21 stores the learning data that each node (worker node) uses to calculate gradients in the parallel distributed learning processing described above.

The data allocation unit 22 determines which of the learning data stored in the learning data storage unit 21 is allocated to each of the worker nodes 30 and 40. For example, it divides that learning data into two parts. One part is allocated to group 1 (the worker nodes 30 belonging to it) and the other to group 2 (the worker nodes 40 belonging to it).
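The allocation step can be sketched as below. The source says only that the data is divided into two parts, one per group. The contiguous, near-equal splitting rule is a hypothetical choice, as is the second split of each group's part among that group's workers. `split_evenly` and `allocate` are illustrative names.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(data: Sequence[T], parts: int) -> List[List[T]]:
    # Contiguous, near-equal chunks; the first len(data) % parts chunks get one extra item.
    q, r = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + q + (1 if i < r else 0)
        chunks.append(list(data[start:end]))
        start = end
    return chunks


def allocate(data: Sequence[T], workers_per_group: List[int]) -> List[List[List[T]]]:
    # Divide the learning data among the groups, then each group's share among its workers.
    per_group = split_evenly(data, len(workers_per_group))
    return [split_evenly(share, n) for share, n in zip(per_group, workers_per_group)]


print(allocate(list(range(12)), [3, 3]))
```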

The transmission control unit 23 has a function of transmitting various data via the communication device 206. It transmits the learning data that the data allocation unit 22 allocated to group 1 (the worker nodes 30 belonging to it) to the representative node 30 of group 1. It likewise transmits the learning data allocated to group 2 (the worker nodes 40 belonging to it) to the representative node 40 of group 2.

The weight coefficient storage unit 24 stores the weight coefficient of the objective function. This weight coefficient (that is, the one managed by the server node 20) is referred to as the master parameter.

The reception control unit 25 has a function of receiving various data via the communication device 206. It receives gradients indicating the learning progress on each of the worker nodes 30 and 40; the worker nodes calculate these gradients in order to update the weight coefficient. The gradients calculated by the worker nodes 30 of group 1 are received from the representative node 30 of group 1. The gradients calculated by the worker nodes 40 of group 2 are received from the representative node 40 of group 2.

The calculation unit 26 updates the master parameter using the weight coefficient (master parameter) stored in the weight coefficient storage unit 24 and the gradients received by the reception control unit 25. Specifically, it calculates the updated weight coefficient based on Equation (1) above. The updated weight coefficient is stored in the weight coefficient storage unit 24 as the new master parameter. It is also transmitted by the transmission control unit 23 to the representative node 30 of group 1 or the representative node 40 of group 2.
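The server-side update can be sketched as follows. This is a minimal sketch, and it assumes that the gradients received from one representative node are averaged before Equation (1) is applied, consistent with Synchronous-SGD inside the group. The source does not spell out how the received gradients are combined. `ServerNode` and `on_gradients` are hypothetical names.

```python
from typing import List


class ServerNode:
    def __init__(self, initial_weights: List[float], lr: float) -> None:
        self.master = list(initial_weights)  # master parameter (weight coefficient storage unit 24)
        self.lr = lr

    def on_gradients(self, gradients: List[List[float]]) -> List[float]:
        # Assumption: gradients reported by one group are averaged element-wise.
        n = len(gradients)
        avg = [sum(col) / n for col in zip(*gradients)]
        # Equation (1): W(t+1) = W(t) - eps * gradW(t)
        self.master = [w - self.lr * g for w, g in zip(self.master, avg)]
        # The updated weight is returned to the representative node that reported.
        return list(self.master)


server = ServerNode([1.0, 1.0], lr=0.5)
print(server.on_gradients([[1.0, 0.0], [3.0, 2.0]]))  # [0.0, 0.5]
```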

The functional configuration of the worker nodes 30 and 40 is described below. First, an example of the functional configuration of the representative node 30 of group 1 is described with reference to FIG. 6.

As shown in FIG. 6, the representative node 30 of group 1 includes a reception control unit 31, a learning data storage unit 32, a weight coefficient storage unit 33, a calculation unit 34, and a transmission control unit 35.

In the present embodiment, the reception control unit 31, the calculation unit 34, and the transmission control unit 35 are implemented in software. For example, the CPU 301 shown in FIG. 4 (that is, the computer of the representative node 30 of group 1) executes the parallel distributed learning program 303b stored in the nonvolatile memory 305. The parallel distributed learning program 303b can be distributed stored in advance on a computer-readable storage medium. It may also be downloaded to the representative node 30 via, for example, a network.

Although the units 31, 34 and 35 are described here as being implemented by software, they may instead be implemented by hardware or by a combination of software and hardware.

In the present embodiment, the learning data storage unit 32 and the weight coefficient storage unit 33 are stored in, for example, the nonvolatile memory 305 shown in FIG. 4.

The reception control unit 31 has a function of receiving various data via the communication device 306. It receives the learning data transmitted by the transmission control unit 23 of the server node 20. The portion allocated to the representative node 30 of group 1 is stored in the learning data storage unit 32. The portion allocated to the non-representative nodes 30 of group 1 is forwarded from the representative node 30 to those non-representative nodes 30.

The reception control unit 31 also receives, from each non-representative node 30 of group 1, the gradient calculated on that node.

The weight coefficient storage unit 33 stores the weight coefficient of the objective function. For convenience, this weight coefficient (that is, the one managed by the representative node 30) is referred to as the weight coefficient of the representative node 30 of group 1.

The calculation unit 34 calculates a gradient for updating the weighting factors of the objective function, using the learning data stored in the learning data storage unit 32 and the weighting factors stored in the weighting factor storage unit 33.

The transmission control unit 35 transmits various kinds of data via the communication device 306. It transmits to the server node 20 both the gradients received by the reception control unit 31 (those calculated at the non-representative nodes 30) and the gradient calculated by the calculation unit 34 (in the procedure of FIG. 8, in the form of their average).

As described above, when the server node 20 (transmission control unit 23) transmits the weighting factors calculated by its calculation unit 26 (the updated weighting factors), the reception control unit 31 receives them, and they replace the weighting factors (before the update) stored in the weighting factor storage unit 33. The weighting factors of the representative node 30 of group 1 are thereby updated. These weighting factors are also sent to the non-representative nodes 30 via the transmission control unit 35.

Next, an example of the functional configuration of a non-representative node 30 of group 1 will be described. For convenience, FIG. 6 is used again, and mainly the differences from the representative node 30 of group 1 described above are explained.

Like the representative node 30 of group 1, the non-representative node 30 of group 1 includes the reception control unit 31, learning data storage unit 32, weighting factor storage unit 33, calculation unit 34 and transmission control unit 35 shown in FIG. 6.

The reception control unit 31 receives the learning data transmitted from the representative node 30 of group 1 and stores it in the learning data storage unit 32.

The weighting factor storage unit 33 stores the weighting factors of the objective function. For convenience, the weighting factors stored in this unit (that is, those managed by the non-representative node 30) are referred to as the weighting factors of the non-representative node 30 of group 1.

When the representative node 30 of group 1 transmits the updated weighting factors as described above, the reception control unit 31 receives them, and they replace the weighting factors (before the update) stored in the weighting factor storage unit 33. The weighting factors of the non-representative node 30 of group 1 are thereby updated.

The calculation unit 34 calculates a gradient for updating the weighting factors of the objective function, using the learning data stored in the learning data storage unit 32 and the weighting factors stored in the weighting factor storage unit 33. The transmission control unit 35 sends this gradient to the representative node 30.
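As a minimal sketch of what the calculation unit 34 does on a single worker, the example below assumes the objective function is the mean squared error of a linear model. The patent does not specify the objective, so the model, the function name `local_gradient` and the data layout are all hypothetical; the point is only that each worker uses its own shard of learning data together with its locally stored weights.

```python
from typing import List, Sequence

Vector = List[float]


def local_gradient(w: Sequence[float], xs: Sequence[Sequence[float]], ys: Sequence[float]) -> Vector:
    """Gradient of the mean squared error (1/N) * sum((w.x - y)^2) with respect to w.

    Hypothetical stand-in for the objective function: the worker sees only
    its own shard of learning data (xs, ys) and its current weights w.
    """
    n = len(xs)
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj / n
    return grad


if __name__ == "__main__":
    print(local_gradient([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0]))  # [-1.0, -2.0]
```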

Although only the representative node 30 and non-representative node 30 of group 1 have been described here, in the present embodiment the representative node 40 and non-representative node 40 of group 2 have the same functional configuration as their group 1 counterparts. FIG. 6 is therefore also used below when describing the functional configuration of the representative node 40 and non-representative node 40 of group 2.

An example of the processing procedure of the present system will now be described with reference to the sequence chart of FIG. 7. This description focuses on the processing between the server node 20, group 1 (the worker nodes 30) and group 2 (the worker nodes 40); the processing among the worker nodes within each group is described later.

It is assumed that the learning data assigned to each worker node 30 belonging to group 1 is already stored in the learning data storage unit 32 of that worker node 30. The same applies to the worker nodes 40 belonging to group 2.

It is also assumed that the weighting factor storage unit 24 of the server node 20 and the weighting factor storage unit 33 of each of the worker nodes 30 and 40 hold, for example, the same weighting factor (hereinafter, weighting factor W0).

In this state, group 1 (the worker nodes 30 belonging to group 1) performs gradient calculation processing (step S1). In this processing, each worker node 30 of group 1 calculates a gradient for updating the weighting factor of the objective function, using the learning data stored in its own learning data storage unit 32 and the weighting factor W0 stored in its weighting factor storage unit 33. The worker nodes 30 of group 1 execute the gradient calculation processing in synchronization with one another.

The gradients calculated by the worker nodes 30 in step S1 are transmitted from the representative node 30 of group 1 to the server node 20 (step S2).

The server node 20 (its reception control unit 25) receives the gradients transmitted in step S2. The server node 20 (its calculation unit 26) then calculates a new weighting factor (hereinafter, weighting factor W1) from the received gradients and the weighting factor W0 stored in its weighting factor storage unit 24, and the weighting factor W0 in the weighting factor storage unit 24 is updated to W1 (step S3).
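The passage states only that the calculation unit 26 derives W1 from the received gradients and W0. The sketch below assumes a plain SGD step, W ← W − η·g, with a hypothetical learning rate η, and guards the master parameter with a lock because the groups push their gradients asynchronously. The class and method names are illustrative, not taken from the patent.

```python
import threading
from typing import List, Sequence


class ParameterServer:
    """Hypothetical model of server node 20 holding the master parameter."""

    def __init__(self, initial_weights: Sequence[float], learning_rate: float = 0.1) -> None:
        self._weights: List[float] = list(initial_weights)  # e.g. W0
        self._lr = learning_rate                             # assumed SGD step size
        self._lock = threading.Lock()                        # serialize concurrent pushes
        self.version = 0                                     # 0 -> W0, 1 -> W1, ...

    def push_gradient(self, grad: Sequence[float]) -> List[float]:
        """Apply one gradient (as in steps S3/S7) and return the new master weights (S4/S8)."""
        with self._lock:
            self._weights = [w - self._lr * g for w, g in zip(self._weights, grad)]
            self.version += 1
            return list(self._weights)


if __name__ == "__main__":
    server = ParameterServer([1.0, 1.0], learning_rate=0.5)
    print(server.push_gradient([2.0, -2.0]), server.version)  # [0.0, 2.0] 1
```

Returning a copy of the weights keeps each group's received parameters independent of later updates to the master copy.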

The server node 20 (its transmission control unit 23) distributes the weighting factor W1 obtained in step S3 (the updated master parameter) to group 1 (step S4).

The weighting factor W1 distributed from the server node 20 is stored in the weighting factor storage unit 33 of each worker node 30 belonging to group 1. Group 1 can therefore perform subsequent gradient calculation processing with a weighting factor that reflects the gradients calculated in group 1.

Meanwhile, group 2 (the worker nodes 40 belonging to group 2) performs gradient calculation processing in the same way as group 1 (step S5). In this processing, each worker node 40 of group 2 calculates a gradient for updating the weighting factor of the objective function, using the learning data stored in its own learning data storage unit 32 and the weighting factor W0 stored in its weighting factor storage unit 33. The worker nodes 40 of group 2 execute the gradient calculation processing in synchronization with one another.

The gradients calculated in step S5 are transmitted from the representative node 40 of group 2 to the server node 20 (step S6).

The server node 20 receives the gradients transmitted in step S6. At this point, the weighting factor (master parameter) stored in the weighting factor storage unit 24 of the server node 20 is no longer W0 but the weighting factor W1 produced in step S3.

The server node 20 therefore calculates a new weighting factor (hereinafter, weighting factor W2) from the received gradients and the weighting factor W1, and the weighting factor W1 in the weighting factor storage unit 24 is updated to W2 (step S7).

The server node 20 distributes the weighting factor W2 (master parameter) obtained in step S7 to group 2 (step S8).

The weighting factor W2 distributed from the server node 20 is stored in the weighting factor storage unit 33 of each worker node 40 belonging to group 2.

Here, W2 is the weighting factor W1, which had been updated with the gradients calculated in group 1, further updated with the gradients calculated in group 2. In other words, W2 is based on both the gradients calculated in group 1 (step S1) and those calculated in group 2 (step S5). Thus, when group 1 finishes its gradient calculation before group 2, group 2 continues the parallel distributed learning processing with a weighting factor that has already been updated with the gradients calculated in group 1.
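Assuming, for illustration only, that the calculation unit 26 applies a plain SGD step with learning rate $\eta$ (the update rule is not specified in this passage), and writing $\bar g_{S1}$ and $\bar g_{S5}$ for the group gradients sent in steps S2 and S6, the chain of updates reads:

$$
\begin{aligned}
W_1 &= W_0 - \eta\,\bar g_{S1}(W_0),\\
W_2 &= W_1 - \eta\,\bar g_{S5}(W_0) = W_0 - \eta\left(\bar g_{S1}(W_0) + \bar g_{S5}(W_0)\right).
\end{aligned}
$$

Both gradients were evaluated at $W_0$: $\bar g_{S5}$ is applied to $W_1$ even though group 2 computed it from $W_0$. This staleness is what the asynchronous exchange between groups accepts in return for never making one group wait for the other.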

Group 2 can therefore perform subsequent gradient calculation processing with a weighting factor that reflects not only the gradients calculated in group 2 but also those calculated in group 1.

After steps S1 to S4 have been executed, group 1 executes steps S9 to S12, which correspond to steps S1 to S4. In these steps, the weighting factor W2 stored in the weighting factor storage unit 24 of the server node 20 is updated to a new weighting factor (hereinafter, weighting factor W3) using the gradients calculated by the gradient calculation processing in group 1, and W3 is distributed to the worker nodes 30 belonging to group 1. In step S9, the gradients are calculated from learning data different from that used in step S1.

Here, W3 is the weighting factor W2, which had been updated with the gradients calculated in group 2, further updated with the gradients calculated in group 1. Thus, when group 2 finishes its gradient calculation before group 1, group 1 continues the parallel distributed learning processing with a weighting factor that has already been updated with the gradients calculated in group 2.

Group 1 can therefore perform subsequent gradient calculation processing with a weighting factor that reflects not only the gradients calculated in group 1 (steps S1 and S9) but also those calculated in group 2 (step S5).

Likewise, after steps S5 to S8 have been executed, group 2 executes steps S13 to S16, which correspond to steps S5 to S8. In these steps, the weighting factor W3 stored in the weighting factor storage unit 24 of the server node 20 is updated to a new weighting factor (hereinafter, weighting factor W4) using the gradients calculated by the gradient calculation processing in group 2, and W4 is distributed to the worker nodes 40 belonging to group 2. In step S13, the gradients are calculated from learning data different from that used in step S5.

Here, W4 is the weighting factor W3, which had been updated with the gradients calculated in group 1, further updated with the gradients calculated in group 2.

Group 2 can therefore perform subsequent gradient calculation processing with a weighting factor that reflects not only the gradients calculated in group 2 (steps S5 and S13) but also those calculated in group 1 (steps S1 and S9).

Although FIG. 7 shows only steps S1 to S16, the processing continues until the gradient calculation processing (that is, the parallel distributed learning processing) has been performed on all of the learning data stored in the learning data storage units 32 of the worker nodes 30 and 40.

As described above, in the present embodiment the nodes within group 1 and within group 2 execute their processing synchronously, whereas the processing between the server node 20 and group 1 (its representative node 30) and the processing between the server node 20 and group 2 (its representative node 40) are executed asynchronously.
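To make the interleaving of FIG. 7 concrete, the sketch below replays it in a single thread so the result is deterministic: the server applies group pushes in arrival order (group 1, group 2, group 1, group 2), and each group computes its next gradient from whatever weights it last received. The one-dimensional toy objective, the learning rate and the function name are hypothetical; what matters is which weight version each gradient is computed on and which version it produces.

```python
from typing import Dict, List, Tuple


def simulate_fig7(arrivals: List[str], lr: float = 0.5) -> List[Tuple[str, int, int]]:
    """Replay asynchronous group pushes against one master parameter.

    Returns (group, version used to compute the gradient, version produced).
    Toy objective 0.5 * (w - 3)^2, whose gradient is (w - 3).
    """
    master_w, master_ver = 0.0, 0  # W0 on the server
    # Every group starts from W0, as assumed for FIG. 7.
    local: Dict[str, Tuple[float, int]] = {g: (0.0, 0) for g in set(arrivals)}
    log = []
    for group in arrivals:
        w_local, ver_local = local[group]
        grad = w_local - 3.0                   # group's (averaged) gradient
        master_w -= lr * grad                  # server update (S3 / S7 / S11 / S15)
        master_ver += 1
        local[group] = (master_w, master_ver)  # distributed back to that group only
        log.append((group, ver_local, master_ver))
    return log


if __name__ == "__main__":
    for entry in simulate_fig7(["group1", "group2", "group1", "group2"]):
        print(entry)
```

The log reproduces the pattern of FIG. 7: group 2's first gradient is computed on W0 but produces W2, and group 1's second gradient (step S9) is computed on W1 and produces W3.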

The processing performed by the representative and non-representative nodes of each group while the processing of FIG. 7 is executed is described below.

First, an example of the processing procedure of a representative node will be described with reference to the flowchart of FIG. 8, taking the representative node 30 of group 1 as an example.

The calculation unit 34 of the representative node 30 calculates a gradient using the learning data stored in the learning data storage unit 32 and the weighting factor (for example, weighting factor W0) stored in the weighting factor storage unit 33 (step S21). Hereinafter, this gradient is referred to as the gradient of the representative node 30.

While the representative node 30 of group 1 executes step S21, each non-representative node 30 of group 1 calculates a gradient in synchronization with it, as described later. Hereinafter, a gradient calculated at a non-representative node 30 is referred to as the gradient of the non-representative node 30.

The reception control unit 31 then receives the gradient of the non-representative node 30 from that node (step S22). When several non-representative nodes 30 belong to group 1, the reception control unit 31 receives a gradient from each of them.

Next, the calculation unit 34 calculates the average of the gradient calculated in step S21 (the gradient of the representative node 30) and the gradients received in step S22 (the gradients of the non-representative nodes 30) (step S23). Hereinafter, this average is referred to as the average gradient of group 1.

The transmission control unit 35 transmits the average gradient of group 1 to the server node 20 (step S24).

Steps S21 to S24 are executed by the representative node 30 of group 1 as part of steps S1 and S2 (or steps S9 and S10) in FIG. 7.

The server node 20 then executes steps S3 and S4 of FIG. 7. That is, the server node 20 updates the master parameter with the average gradient of group 1 transmitted in step S24, and transmits the updated master parameter (for example, weighting factor W1) to the representative node 30 of group 1.

When the server node 20 transmits the master parameter, the reception control unit 31 receives it (step S25).

The transmission control unit 35 forwards the master parameter received in step S25 to the non-representative nodes 30 (step S26).

The weighting factor stored in the weighting factor storage unit 33 (for example, W0) is replaced with the master parameter received in step S25 (for example, W1) (step S27). The weighting factor of the representative node 30 of group 1 thus becomes identical to the master parameter.

Steps S25 to S27 are executed by the representative node 30 after step S4 (or step S12) of FIG. 7.

By executing the processing of FIG. 8, the weighting factor of the representative node 30 of group 1 is updated to a weighting factor calculated from the average gradient of group 1, and the updated weighting factor can be used in the next gradient calculation.
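One pass of FIG. 8 on the representative node can be sketched as follows. The sketch assumes the non-representative gradients have already arrived as a list (step S22) and that the server and peers are reachable through two callables, `push_to_server` and `send_to_peers`. Both are hypothetical stand-ins for transfers over the communication device 306, not interfaces defined in the patent.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def average_gradients(grads: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of the group's gradients (step S23)."""
    n = len(grads)
    return [sum(component) / n for component in zip(*grads)]


def representative_step(
    own_grad: Sequence[float],                   # S21: gradient of the representative node
    peer_grads: Sequence[Sequence[float]],       # S22: gradients of the non-representative nodes
    push_to_server: Callable[[Vector], Vector],  # S24/S25: send average, receive master parameter
    send_to_peers: Callable[[Vector], None],     # S26: forward master parameter
) -> Vector:
    group_avg = average_gradients([list(own_grad), *map(list, peer_grads)])
    master = push_to_server(group_avg)
    send_to_peers(master)
    return master                                # S27: replaces the locally stored weights


if __name__ == "__main__":
    sent = []
    new_w = representative_step(
        own_grad=[1.0, 0.0],
        peer_grads=[[3.0, 2.0]],
        push_to_server=lambda g: [0.0 - 0.5 * gi for gi in g],  # toy SGD server with W = 0
        send_to_peers=sent.append,
    )
    print(new_w, sent)  # [-1.0, -0.5] [[-1.0, -0.5]]
```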

Although not illustrated, the processing of FIG. 8 is repeated for as long as the processing of FIG. 7 continues.

Next, an example of the processing procedure of a non-representative node will be described with reference to the flowchart of FIG. 9, taking a non-representative node 30 of group 1 as an example.

The calculation unit 34 of the non-representative node 30 calculates a gradient, in synchronization with the gradient calculation at the representative node 30 described above, using the learning data stored in the learning data storage unit 32 and the weighting factor (for example, weighting factor W0) stored in the weighting factor storage unit 33 (step S31).

After step S31, the transmission control unit 35 transmits the gradient calculated in step S31 (the gradient of the non-representative node 30) to the representative node 30 (step S32).

Steps S31 and S32 are executed by the non-representative node 30 as part of steps S1 and S2 (or steps S9 and S10) in FIG. 7.

After step S32, the representative node 30 executes steps S22 to S26 of FIG. 8, so that the master parameter transmitted from the server node 20 (for example, weighting factor W1) is sent from the representative node 30 of group 1 to the non-representative node 30.

When the representative node 30 transmits the master parameter, the reception control unit 31 receives it (step S33).

The weighting factor stored in the weighting factor storage unit 33 (for example, W0) is replaced with the master parameter received in step S33 (step S34). The weighting factor of the non-representative node 30 of group 1 thus becomes identical to the master parameter.

Steps S33 and S34 are executed by the non-representative node 30 after step S4 (or step S12) of FIG. 7.

By executing the processing of FIG. 9, the weighting factor of the non-representative node 30 of group 1 is updated to the weighting factor calculated from the average gradient of group 1, and the updated weighting factor can be used in the next gradient calculation.
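The non-representative side of FIG. 9 reduces to four actions: compute, send, wait and replace. In the sketch below, `queue.Queue` objects stand in for the links to the representative node, and the gradient function is passed in as a parameter because the objective function is not specified here. The function name and the timeout are hypothetical.

```python
import queue
from typing import Callable, List

Vector = List[float]


def non_representative_step(
    weights: Vector,
    grad_fn: Callable[[Vector], Vector],  # gradient on this node's own learning data
    to_rep: "queue.Queue[Vector]",
    from_rep: "queue.Queue[Vector]",
    timeout: float = 5.0,
) -> Vector:
    grad = grad_fn(weights)                 # S31: compute in step with the representative node
    to_rep.put(grad)                        # S32: send gradient to the representative node
    master = from_rep.get(timeout=timeout)  # S33: block until the master parameter arrives
    return list(master)                     # S34: replaces the stored weights


if __name__ == "__main__":
    up, down = queue.Queue(), queue.Queue()
    down.put([0.5, 0.5])  # pretend the representative node already forwarded W1
    new_w = non_representative_step([0.0, 0.0], lambda w: [wi - 1.0 for wi in w], up, down)
    print(up.get(), new_w)  # [-1.0, -1.0] [0.5, 0.5]
```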

Although not illustrated, the processing of FIG. 9 is repeated for as long as the processing of FIG. 7 continues.

As described above, in group 1 the gradients of all worker nodes 30 belonging to the group are gathered at the representative node 30, which then calculates the average gradient. This can be done efficiently with, for example, the collective communication operation called Reduce defined in MPI (Message Passing Interface), MPI_Reduce, which combines the transfer of gradients from the non-representative nodes 30 to the representative node 30 with the calculation of their sum over all worker nodes 30; dividing that sum by the number of worker nodes gives the average gradient. Although the use of MPI_Reduce has been described here, any equivalent processing may be used instead.
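MPI_Reduce implementations commonly use tree-shaped communication. With such a tree, the root obtains the combined result in about log2(P) rounds rather than through P − 1 sequential receives. The pure-Python simulation below assumes a binomial tree and a sum operation, then divides by P to obtain the group average. It only mimics the data flow and does not reflect how any particular MPI library implements MPI_Reduce. In an actual deployment with mpi4py, the corresponding call would be `comm.Reduce(sendbuf, recvbuf, op=MPI.SUM, root=0)`.

```python
from typing import List, Tuple

Vector = List[float]


def tree_reduce_average(grads: List[Vector]) -> Tuple[Vector, int]:
    """Simulate a binomial-tree sum to rank 0, then average.

    Returns (average gradient at rank 0, number of communication rounds).
    """
    buf = [list(g) for g in grads]  # buf[r] is what rank r currently holds
    p = len(buf)
    rounds, step = 0, 1
    while step < p:
        for r in range(0, p, 2 * step):  # receiving ranks in this round
            src = r + step
            if src < p:                   # rank src sends its partial sum to rank r
                buf[r] = [a + b for a, b in zip(buf[r], buf[src])]
        step *= 2
        rounds += 1
    return [x / p for x in buf[0]], rounds


if __name__ == "__main__":
    avg, rounds = tree_reduce_average([[1.0], [2.0], [3.0], [4.0], [5.0]])
    print(avg, rounds)  # [3.0] 3
```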

Although the processing of group 1 (the representative node 30 and non-representative nodes 30) has been described here, group 2 (the representative node 40 and non-representative nodes 40) executes the same processing.

As described above, the system of the present embodiment includes a plurality of worker nodes 30 (a representative node and non-representative nodes) belonging to group 1 and a plurality of worker nodes 40 (a representative node and non-representative nodes) belonging to group 2 (a second group). When the worker nodes 30 execute, for example, the n-th round of parallel distributed processing based on the objective function, the representative node 30 of group 1 (a first node) calculates a first gradient for updating a first weighting coefficient of the objective function to a second weighting coefficient, and a non-representative node 30 of group 1 (a second node) calculates a second gradient for updating the first weighting coefficient to the second weighting coefficient.

Meanwhile, when the worker nodes 40 execute, for example, the m-th round of parallel distributed processing asynchronously with the parallel distributed processing of the worker nodes 30, the representative node 40 of group 2 (a third node) calculates a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient, and a non-representative node 40 of group 2 (a fourth node) calculates a fourth gradient for updating the third weighting coefficient to the fourth weighting coefficient.

In the present embodiment, when gradient calculation in group 1 (the representative node 30 and non-representative node 30) is faster than gradient calculation in group 2 (the representative node 40 and non-representative node 40), the (n+1)-th round of parallel distributed processing in group 1 further updates the second weighting coefficient, which was obtained from the first weighting coefficient on the basis of the first and second gradients, and the (m+1)-th round in group 2 further updates the fourth weighting coefficient, which was obtained from the third weighting coefficient on the basis of the first to fourth gradients.

Conversely, when gradient calculation in group 2 (the representative node 40 and non-representative node 40) is faster than gradient calculation in group 1 (the representative node 30 and non-representative node 30), the (n+1)-th round in group 1 further updates the second weighting coefficient, which was obtained from the first weighting coefficient on the basis of the first to fourth gradients, and the (m+1)-th round in group 2 further updates the fourth weighting coefficient, which was obtained from the third weighting coefficient on the basis of the third and fourth gradients.
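The two cases can be written out under illustrative assumptions: the server applies a plain SGD step with learning rate $\eta$, each representative node sends the mean of its two nodes' gradients (as in step S23), and both groups start from the same master parameter $W$. With $g_1,\dots,g_4$ the first to fourth gradients and $W_{\mathrm{G1}}, W_{\mathrm{G2}}$ the weights received by group 1 and group 2:

$$
\begin{aligned}
&\text{Group 1 first:} && W_{\mathrm{G1}} = W - \tfrac{\eta}{2}(g_1+g_2), && W_{\mathrm{G2}} = W_{\mathrm{G1}} - \tfrac{\eta}{2}(g_3+g_4) = W - \tfrac{\eta}{2}(g_1+g_2+g_3+g_4),\\
&\text{Group 2 first:} && W_{\mathrm{G2}} = W - \tfrac{\eta}{2}(g_3+g_4), && W_{\mathrm{G1}} = W_{\mathrm{G2}} - \tfrac{\eta}{2}(g_1+g_2) = W - \tfrac{\eta}{2}(g_1+g_2+g_3+g_4).
\end{aligned}
$$

The group whose gradients reach the server second always receives weights reflecting all four gradients, while the faster group receives weights reflecting only its own two.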

In short, the present embodiment divides the worker nodes 30 and 40 into a plurality of groups (group 1 and group 2) and, as a first tier, performs parallel distributed learning processing by Synchronous-SGD within each group. Because synchronization in this tier happens only within each group, both the synchronization cost and the batch size can be kept smaller than when, for example, all of the worker nodes 30 and 40 are synchronized together.
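The batch-size point can be made concrete with a small calculation. Assume P workers are split evenly into G groups and each worker processes a mini-batch of b samples per step; these numbers are illustrative and do not come from the patent. Synchronous-SGD over all workers then combines P·b samples into each weight update, whereas a single group combines only (P/G)·b.

```python
def samples_per_update(workers: int, groups: int, per_worker_batch: int) -> int:
    """Samples combined into one weight update when each group runs Synchronous-SGD.

    groups=1 corresponds to synchronizing all workers together.
    Assumes the workers divide evenly into the groups.
    """
    if workers % groups != 0:
        raise ValueError("workers must divide evenly into groups")
    return (workers // groups) * per_worker_batch


if __name__ == "__main__":
    print(samples_per_update(64, 1, 32))  # 2048: all 64 workers synchronized
    print(samples_per_update(64, 8, 32))  # 256: eight groups of 8 workers each
```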

As a second tier, the representative nodes of the groups in the first tier perform parallel distributed learning processing with one another through the server node 20, using the batch-size-independent parallel method. In this tier the representative nodes do not need to synchronize, so high throughput can be obtained.

That is, the present embodiment combines, for example, Synchronous-SGD and a batch-size-independent parallel method hierarchically. This achieves high scalability in parallel distributed learning and makes parallel distributed learning with a larger degree of parallelism possible.

FIG. 10 shows, for each learning method, the time required to reach a predetermined generalization performance (the learning time).

The learning methods shown in FIG. 10 include, for example, a non-distributed method (learning on a single node), Synchronous-SGD, a batch-size-independent parallel method, and the method of the present embodiment (Synchronous-SGD combined with a batch-size-independent parallel method).

As shown in FIG. 10, the learning time needed to reach the predetermined generalization performance with the non-distributed method is normalized to 1.0 (hereinafter, the learning time of the non-distributed method). On that scale, the learning time needed to reach the same generalization performance with parallel distributed learning by Synchronous-SGD (hereinafter, the learning time of Synchronous-SGD) is 0.6.

Similarly, with the learning time of the non-distributed method normalized to 1.0, the learning time needed to reach the predetermined generalization performance with parallel distributed learning by the batch-size-independent parallel method (hereinafter, the learning time of the batch-size-independent parallel method) is 0.5.

In contrast, the method of the present embodiment (a hierarchical parallel distributed method) can in theory achieve a scalability equal to the product of the scalability limits of Synchronous-SGD and of the batch-size-independent parallel method.

Specifically, consider the learning time needed to reach the predetermined generalization performance with the method of the present embodiment, expressed as a ratio to the learning time of the non-distributed method. It equals the ratio for Synchronous-SGD (here, 0.6) multiplied by the ratio for the batch-size-independent parallel method (here, 0.5), that is, 0.6 × 0.5 = 0.3.
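The figures above can be checked with a short calculation. This Python sketch simply multiplies the two ratios read from FIG. 10. It reproduces the theoretical product stated in the text, not a measured result, and the function name is hypothetical.

```python
def hierarchical_time_ratio(sync_ratio: float, async_ratio: float) -> float:
    """Idealized learning-time ratio of the hierarchical method: the product of
    the two per-layer ratios, each relative to the single-node learning time."""
    return sync_ratio * async_ratio


ratio = hierarchical_time_ratio(0.6, 0.5)  # Synchronous-SGD x batch-size-independent
speedup = 1.0 / ratio                      # relative to the single-node method
print(f"{ratio:.2f} {speedup:.2f}")        # 0.30 3.33
```

A ratio of 0.3 therefore corresponds to roughly a 3.3x speedup over single-node learning, in the ideal case.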

Thus, Synchronous-SGD reaches the predetermined generalization performance in about 60% of the learning time of the non-distributed method, and the batch-size-independent parallel method does so in about 50%. The method of the present embodiment does so in about 30%.

In other words, the present embodiment achieves higher scalability than running parallel distributed learning with Synchronous-SGD alone or with the batch-size-independent parallel method alone.

In the present embodiment, the second weighting factor described above is calculated from a fifth gradient. The fifth gradient (for example, the average of the first and second gradients) is derived from the first gradient, calculated by the representative node 30 of group 1, and the second gradient, calculated by the non-representative node 30 of group 1. Similarly, the fourth weighting factor is calculated from a sixth gradient (for example, the average of the third and fourth gradients). The sixth gradient is derived from the third gradient, calculated by the representative node 40 of group 2, and the fourth gradient, calculated by the non-representative node 40 of group 2.
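In its simplest form, the fifth gradient is an element-wise mean of the gradients computed inside a group, which is what a Synchronous-SGD all-reduce produces. The Python sketch below is minimal. The function name and the gradient values are hypothetical, and the source says only that the average is one example.

```python
def average_gradients(grads: list[list[float]]) -> list[float]:
    """Element-wise mean of per-worker gradients, e.g. the fifth gradient as
    the mean of the first and second gradients."""
    if not grads:
        raise ValueError("need at least one gradient")
    n = len(grads)
    # zip(*grads) groups the k-th component of every worker's gradient.
    return [sum(components) / n for components in zip(*grads)]


first = [0.25, -0.5]   # from representative node 30 (illustrative values)
second = [0.75, 0.0]   # from non-representative node 30 (illustrative values)
fifth = average_gradients([first, second])
print(fifth)  # [0.5, -0.25]
```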

Also, in the present embodiment, the second and fourth weighting factors are calculated by the server node 20, which is communicably connected to the representative node 30 of group 1 and the representative node 40 of group 2.

When the server node 20 calculates the second weighting factor, it transmits the second weighting factor to the representative node 30 of group 1, which in turn transmits it to the non-representative node 30. With this configuration, the weighting factors of the representative node 30 and the non-representative node 30 of group 1 can be updated to the weighting factor calculated by the server node 20.

Likewise, when the server node 20 calculates the fourth weighting factor, it transmits the fourth weighting factor to the representative node 40 of group 2, which in turn transmits it to the non-representative node 40. With this configuration, the weighting factors of the representative node 40 and the non-representative node 40 of group 2 can be updated to the weighting factor calculated by the server node 20.
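The two-hop distribution of weights (server to representative, then representative to the rest of the group) can be sketched as follows. The class names `WorkerNode` and `Group` are hypothetical, and real nodes would exchange these values over a network rather than through shared objects.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    name: str
    weights: list[float] = field(default_factory=list)


@dataclass
class Group:
    representative: WorkerNode
    non_representatives: list[WorkerNode]

    def receive_from_server(self, new_weights: list[float]) -> None:
        # The representative receives the weights calculated by server node 20 ...
        self.representative.weights = list(new_weights)
        # ... and relays a copy to every non-representative node in its group.
        for node in self.non_representatives:
            node.weights = list(self.representative.weights)


group2 = Group(WorkerNode("rep40"), [WorkerNode("nonrep40a"), WorkerNode("nonrep40b")])
group2.receive_from_server([0.5, 0.25])  # e.g. the fourth weighting factor
```

After the call, every node in group 2 holds the same weighting factor. Each node gets its own copy, as it would if the values had been sent over the network.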

The case in which the gradient calculation in group 1 is faster than that in group 2 was described above for the present embodiment. In that case, the weighting factor is updated based on the first and second gradients in the (n+1)-th parallel distributed process of group 1. It is then further updated based on the first to fourth gradients in the (m+1)-th parallel distributed process of group 2.

However, the case where "the gradient calculation in group 1 is faster than that in group 2" also includes a further situation. It covers the server node 20 receiving the gradient result of group 1 (the gradient transmitted from the representative node 30 of group 1) before it receives the gradient result of group 2 (the gradient transmitted from the representative node 40 of group 2).

Accordingly, suppose the server node 20 receives the gradient result of group 1 before that of group 2. In the (n+1)-th parallel distributed process of group 1 (the next process in time), the weighting factor is updated based on the gradient result of group 1 (that is, the first and second gradients). In the (m+1)-th parallel distributed process of group 2, the weighting factor is further updated based on two inputs: the gradient result of group 1 (more precisely, the weighting factor already updated from it) and the gradient result of group 2. Together these are the first to fourth gradients.

Similarly, the case in which the gradient calculation in group 2 is faster than that in group 1 was described above for the present embodiment. In that case, the weighting factor is updated based on the third and fourth gradients in the (m+1)-th parallel distributed process of group 2. It is then further updated based on the first to fourth gradients in the (n+1)-th parallel distributed process of group 1.

However, the case where "the gradient calculation in group 2 is faster than that in group 1" also includes a further situation. It covers the server node 20 receiving the gradient result of group 2 (the gradient transmitted from the representative node 40 of group 2) before it receives the gradient result of group 1 (the gradient transmitted from the representative node 30 of group 1).

Accordingly, suppose the server node 20 receives the gradient result of group 2 before that of group 1. In the (m+1)-th parallel distributed process of group 2 (the next process in time), the weighting factor is updated based on the gradient result of group 2 (that is, the third and fourth gradients). In the (n+1)-th parallel distributed process of group 1, the weighting factor is further updated based on two inputs: the gradient result of group 2 (more precisely, the weighting factor already updated from it) and the gradient result of group 1. Together these are the first to fourth gradients.

That is, in the present embodiment the weighting factors may be updated according to which group's gradient result the server node 20 receives first (that is, which group transmits to the server node 20 first). The order in which the groups finish their gradient calculations (which group computes faster) need not be the deciding factor.
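The distinction between "computed first" and "received first" can be made concrete by ordering results by their arrival time at the server. In this Python sketch, the per-group network delay, the `GroupResult` record, and the SGD-style update are hypothetical assumptions chosen for illustration. Group 1 finishes computing earlier but arrives later, so group 2's gradient is applied first.

```python
from typing import NamedTuple


class GroupResult(NamedTuple):
    group: str
    compute_done: float   # time the group finished its gradient calculation
    network_delay: float  # hypothetical transfer time to server node 20
    grad: list[float]


def apply_in_arrival_order(weights: list[float], results: list[GroupResult], lr: float = 0.5):
    # Order by arrival time at the server, not by compute finish time.
    ordered = sorted(results, key=lambda r: r.compute_done + r.network_delay)
    returned: dict[str, list[float]] = {}
    for r in ordered:
        weights = [w - lr * g for w, g in zip(weights, r.grad)]
        returned[r.group] = list(weights)  # weights sent back to that group
    return [r.group for r in ordered], returned


order, returned = apply_in_arrival_order(
    [1.0],
    [
        GroupResult("group1", compute_done=1.0, network_delay=2.0, grad=[1.0]),
        GroupResult("group2", compute_done=2.0, network_delay=0.5, grad=[0.5]),
    ],
)
print(order, returned)  # ['group2', 'group1'] {'group2': [0.75], 'group1': [0.25]}
```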

As described above, in Synchronous-SGD the worker nodes 30 belonging to, for example, group 1 execute their processing in synchronization with one another. If the processing performance (and hence the processing speed) of these worker nodes 30 differs greatly, the processing speed of group 1 is dragged down by the slowest worker node 30. That is, the worker node 30 with the lowest processing performance dominates. The same applies to group 2.

For this reason, the worker nodes belonging to the same group are configured to have approximately the same processing speed. Specifically, the difference in processing speed among the worker nodes 30 of group 1 (the representative node 30 and the non-representative nodes 30) is kept at or below a first threshold. Likewise, the difference in processing speed among the worker nodes 40 of group 2 (the representative node 40 and the non-representative nodes 40) is kept at or below a second threshold. The first and second thresholds may be the same value or different values.
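The sketch below illustrates why a synchronous group runs at the pace of its slowest member, and one possible form of the threshold check. Two points are assumptions: measuring the "difference in processing speed" as max minus min, and modeling step time as work divided by speed. The source does not specify either.

```python
def within_threshold(speeds: list[float], threshold: float) -> bool:
    """True if the spread of processing speeds in a group is at most `threshold`
    (spread measured as fastest minus slowest, an assumed metric)."""
    return max(speeds) - min(speeds) <= threshold


def group_step_time(work_per_node: float, speeds: list[float]) -> float:
    """In Synchronous-SGD every node waits for the others, so one step takes as
    long as the slowest node needs for its share of the work."""
    return max(work_per_node / s for s in speeds)


print(within_threshold([100.0, 95.0, 98.0], 10.0))  # True: a well-matched group
print(within_threshold([100.0, 40.0], 10.0))        # False: one node is far slower
print(group_step_time(200.0, [100.0, 40.0]))        # 5.0, set by the slow node
```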

Also, as shown in FIG. 11, when the worker nodes 40 belonging to group 2 are slower than the worker nodes 30 belonging to group 1, for example, the number of worker nodes 30 in group 1 may be made smaller than the number of worker nodes 40 in group 2. Here, "the worker nodes 40 of group 2 are slower than the worker nodes 30 of group 1" can mean either of two things. It may mean that the average processing speed of the worker nodes 40 is lower than that of the worker nodes 30. Or it may mean that the slowest of the worker nodes 40 is slower than the slowest of the worker nodes 30. The processing speed of each worker node may be calculated from, for example, its hardware performance.
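The two "slower" criteria in this paragraph can disagree, as the sketch below shows. Its `split_nodes` helper rests on a hypothetical cost model that is not from the source. The model assumes each group splits a fixed per-step batch evenly over its nodes, so step time scales as 1 / (nodes × speed). Equalizing step times then gives the slower group more nodes, which matches the direction described above.

```python
from statistics import mean


def is_slower(group_a: list[float], group_b: list[float], criterion: str = "mean") -> bool:
    """True if group_a is slower than group_b under the chosen criterion."""
    if criterion == "mean":
        return mean(group_a) < mean(group_b)
    if criterion == "min":
        # Compare the slowest node of each group.
        return min(group_a) < min(group_b)
    raise ValueError(f"unknown criterion: {criterion}")


def split_nodes(total: int, speed_1: float, speed_2: float) -> tuple[int, int]:
    # Hypothetical model: nodes proportional to 1 / speed equalizes step times,
    # i.e. n1 / total = speed_2 / (speed_1 + speed_2).
    n1 = round(total * speed_2 / (speed_1 + speed_2))
    return n1, total - n1


a, b = [100.0, 30.0], [70.0, 60.0]
print(is_slower(a, b, "mean"), is_slower(a, b, "min"))  # False True
print(split_nodes(9, 100.0, 50.0))  # (3, 6): the faster group 1 gets fewer nodes
```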

Alternatively, when the worker nodes 40 of group 2 are slower than the worker nodes 30 of group 1, the processing amount of group 1 in the parallel distributed processing (that is, the amount of learning data allocated to group 1) is made smaller than that of group 2 (the amount of learning data allocated to group 2). In this case, groups 1 and 2 may contain the same number of worker nodes.

That is, with the configurations described above, the number of worker nodes in each group or the processing amount (load) of each group can be adjusted. This makes the processing time required by each group approximately equal and so cancels out the effect of differences in processing speed.

In the present embodiment, the server node 20, each worker node 30, and each worker node 40 have been described as each running on its own device (machine), so that nodes and devices correspond one to one. However, each node may instead be implemented as a process or thread running within one device. In other words, the system of the present embodiment (the server node 20, the worker nodes 30, and the worker nodes 40) can also be implemented on a single device. The system can also be implemented on a number of devices that differs from the number of nodes.

That is, in the present embodiment one node may be one computer (server), a plurality of nodes may be implemented on one computer, or one node may be implemented across a plurality of computers. As described above, the present embodiment only requires that a system have two or more groups and that each group have two or more nodes.
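Because a node is a logical role rather than a physical machine, several nodes can share one process. The Python sketch below is a hypothetical illustration, not the source's implementation. It runs each node of one group as a thread and passes gradients to the representative through a `queue.Queue`. Scalar values stand in for real gradient vectors.

```python
import queue
import threading


def run_group_in_one_process(local_grads: dict[str, float]) -> float:
    """Run each node of one group as a thread; the representative averages the results."""
    inbox = queue.Queue()  # channel from every node to the representative

    def node(name: str, grad: float) -> None:
        # Stand-in for a real gradient computation on this node's data shard.
        inbox.put((name, grad))

    threads = [threading.Thread(target=node, args=(n, g)) for n, g in local_grads.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # synchronous barrier: wait for every node in the group

    received = [inbox.get()[1] for _ in threads]
    return sum(received) / len(received)


print(run_group_in_one_process({"rep30": 0.5, "nonrep30a": 1.0, "nonrep30b": 1.5}))  # 1.0
```

The same function body would work if each `node` ran on a separate machine and `inbox` were a network channel. Only the transport changes.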

Also, although the present embodiment has described Asynchronous-SGD, an asynchronous parallel distributed learning method, as the batch-size-independent parallel method, another method such as Elastic Averaging SGD may be adopted.

Also, the present embodiment has described parallel distributed learning that uses different methods (algorithms) at the first and second layers. Depending on the algorithms combined, however, parallel distributed learning may be executed at, for example, three or more layers.

(Second Embodiment)
Next, a second embodiment will be described. Like the first embodiment described above, the system according to the present embodiment is configured to execute different learning processes at different layers of a hierarchy.

That is, at the first layer, parallel distributed processing using the mini-batch method with Synchronous-SGD is performed within each group of worker nodes. At the second layer, batch-size-independent parallel distributed processing (an asynchronous parallel distributed method) is performed between the representative nodes of the groups.

FIG. 12 shows an example of the configuration of the system according to the present embodiment (hereinafter, the present system). As shown in FIG. 12, the present system 10 includes a plurality of worker nodes 30 and a plurality of worker nodes 40.

The first embodiment described above includes the server node 20. The present embodiment differs from it in that no server node 20 is provided. As in the first embodiment, the worker nodes 30 belong to group 1 and the worker nodes 40 belong to group 2.

One of the worker nodes 30 belonging to group 1 (hereinafter, the representative node 30 of group 1) is communicably connected to one of the worker nodes 40 belonging to the other group, group 2 (hereinafter, the representative node 40 of group 2).

The worker nodes 30 other than the representative node 30 of group 1 are referred to as non-representative nodes 30 of group 1. Similarly, the worker nodes 40 other than the representative node 40 of group 2 are referred to as non-representative nodes 40 of group 2.

In the present embodiment, at the first layer, parallel distributed learning by Synchronous-SGD is executed within group 1 (the worker nodes 30) and within group 2 (the worker nodes 40). At the second layer, parallel distributed learning by the batch-size-independent parallel distributed method is executed between the representative node 30 of group 1 and the representative node 40 of group 2.

Although FIG. 12 shows three worker nodes in each of group 1 and group 2, each group only needs two or more worker nodes. Also, although only two groups are shown in FIG. 12, the present system may include three or more groups.

The system configuration of the worker nodes 30 and 40 is the same as in the first embodiment described above, so a detailed description is omitted here.

The following describes an example of the functional configuration of the representative node 30 of group 1. For convenience, FIG. 6 is used for this description, which focuses mainly on the differences from the representative node 30 of group 1 in the first embodiment.

As shown in FIG. 6, the representative node 30 includes a reception control unit 31, a learning data storage unit 32, a weighting factor storage unit 33, a calculation unit 34, and a transmission control unit 35.

The reception control unit 31 receives, from each non-representative node 30 of group 1, the gradient calculated by that non-representative node 30.

The learning data storage unit 32 stores the learning data allocated to the representative node 30 of group 1. The weighting factor storage unit 33 stores the weighting factors of the objective function.

The calculation unit 34 uses the learning data stored in the learning data storage unit 32 and the weighting factors stored in the weighting factor storage unit 33 to calculate a gradient for updating the weighting factors of the objective function.

The calculation unit 34 then updates the weighting factors. It uses the gradients received by the reception control unit 31 (that is, the gradients calculated by the non-representative nodes 30), the gradient it calculated itself, and the weighting factors stored in the weighting factor storage unit 33. The calculation unit 34 calculates the updated weighting factors based on equation (1) described above. The new weighting factors replace those stored in the weighting factor storage unit 33, which updates the weighting factors of the representative node 30 of group 1.

The transmission control unit 35 transmits the gradient calculated by the calculation unit 34 to the non-representative nodes 30 of group 1.

The transmission control unit 35 also transmits two sets of gradients to the representative node of another group (for example, group 2). These are the gradients calculated by the non-representative nodes 30 of group 1 and the gradient calculated by the calculation unit 34 (that is, by the representative node 30).

As described above, the gradients calculated by the non-representative nodes 30 and the representative node 30 of group 1 are transmitted to the representative node 40 of group 2. Likewise, the gradients calculated by the non-representative nodes 40 and the representative node 40 of group 2 are transmitted to the representative node 30 of group 1.

Suppose the representative node 30 of group 1 (its reception control unit 31) receives the gradients calculated by the non-representative nodes 40 and the representative node 40 of group 2. The calculation unit 34 then uses those gradients to update the weighting factors stored in the weighting factor storage unit 33, and the transmission control unit 35 forwards those gradients to the non-representative nodes 30 of group 1.
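The server-less exchange of the second embodiment can be modeled as a single object that plays the roles of units 31, 34, and 35. This Python sketch is hypothetical. Equation (1) is not reproduced in this section, so the update is assumed to be a plain SGD step using the element-wise mean of the gradients. Lists stand in for the messages the transmission control unit 35 would send.

```python
from dataclasses import dataclass, field

Grad = list[float]


@dataclass
class RepresentativeNode:
    """Hypothetical model of representative node 30 in the second embodiment."""

    weights: list[float]
    lr: float = 0.5
    to_own_group: list[Grad] = field(default_factory=list)   # sent to non-representative nodes 30
    to_peer: list[list[Grad]] = field(default_factory=list)  # sent to representative node 40

    def _apply(self, grads: list[Grad]) -> None:
        # Calculation unit 34. Assumes Eq. (1) is a plain SGD step using the
        # element-wise mean of the given gradients.
        n = len(grads)
        mean = [sum(c) / n for c in zip(*grads)]
        self.weights = [w - self.lr * g for w, g in zip(self.weights, mean)]

    def local_round(self, own_grad: Grad, non_rep_grads: list[Grad]) -> None:
        # Update with the own gradient plus those received by reception control unit 31.
        self._apply([own_grad, *non_rep_grads])
        # Transmission control unit 35: own gradient to the group ...
        self.to_own_group.append(own_grad)
        # ... and the whole group's gradients to the other group's representative.
        self.to_peer.append([own_grad, *non_rep_grads])

    def on_peer_gradients(self, peer_grads: list[Grad]) -> None:
        # Gradients from group 2 update the local weights and are relayed inside group 1.
        self._apply(peer_grads)
        self.to_own_group.extend(peer_grads)


rep = RepresentativeNode(weights=[1.0, 1.0])
rep.local_round([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]])  # weights -> [0.5, 1.0]
rep.on_peer_gradients([[0.0, 1.0], [0.0, 1.0]])        # weights -> [0.5, 0.5]
print(rep.weights)  # [0.5, 0.5]
```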

Next, an example of the functional configuration of a non-representative node 30 of group 1 is described. For convenience, FIG. 6 is used again, and the description focuses mainly on the differences from the non-representative node 30 of group 1 in the first embodiment.

The reception control unit 31 receives the gradient calculated by the representative node 30 of group 1 and the gradients calculated by the other non-representative nodes 30, from each of those nodes.

The learning data storage unit 32 stores the learning data allocated to this non-representative node 30. The weighting factor storage unit 33 stores the weighting factors of the objective function.

The calculation unit 34 updates the weighting factors using three inputs. These are the gradients received by the reception control unit 31 (that is, the gradient calculated by the representative node 30 and the gradients calculated by the other non-representative nodes 30), the gradient calculated by the calculation unit 34 itself, and the weighting factors stored in the weighting factor storage unit 33. The calculation unit 34 calculates the updated weighting factors based on equation (1) described above. The new weighting factors replace those stored in the weighting factor storage unit 33, which updates the weighting factors of the non-representative node 30 of group 1.

As described above, the transmission control unit 35 of the representative node 30 of group 1 forwards the gradients calculated by the non-representative nodes 40 and the representative node 40 of group 2. Those gradients are received by the reception control unit 31 and used to update the weighting factors.

An example of the processing procedure of the present system is described below with reference to the sequence chart of FIG. 13. The description here focuses on the processing between group 1 (the worker nodes 30) and group 2 (the worker nodes 40). The processing of the individual worker nodes within each group is described later.

Here, assume for example that the weighting factor storage unit 33 of each worker node 30 in group 1 stores a weighting factor W10, and that the weighting factor storage unit 33 of each worker node 40 in group 2 stores a weighting factor W20. The weighting factors W10 and W20 may have the same value or different values.

First, group 1 (the worker nodes 30 belonging to it) performs gradient calculation by Synchronous-SGD (step S41). Each worker node 30 in group 1 uses the learning data stored in its learning data storage unit 32 and the weighting factor W10 stored in its weighting factor storage unit 33. From these it calculates a gradient for updating the weighting factors of the objective function. The worker nodes 30 in group 1 execute this gradient calculation in synchronization with one another.

Each worker node 30 calculates a new weighting factor (hereinafter, W11) from the gradient calculated in step S41 and the weighting factor W10 stored in its weighting factor storage unit 33. The weighting factor W10 in the weighting factor storage unit 33 of each worker node 30 is thereby updated to W11 (step S42).

After step S42 has been executed, the gradient calculated in step S41 is transmitted from the representative node 30 of group 1 to the representative node 40 of group 2 (step S43).

The representative node 40 of group 2 receives the gradient transmitted in step S43, and the received gradient is shared among the worker nodes 40 belonging to group 2. Each worker node 40 then calculates a new weighting factor (hereinafter, weighting factor W21) from the gradient received at the representative node 40 of group 2 and the weighting factor W20 stored in its weighting factor storage unit 33. The weighting factor W20 stored in the weighting factor storage unit 33 of each worker node 40 is thereby updated to the calculated weighting factor W21 (step S44).
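Step S44 applies a gradient computed in another group to group 2's own weights. A minimal sketch, assuming plain SGD and treating each worker's weights as a list of floats; `sgd_update` and `receive_and_share` are hypothetical names.

```python
from typing import List

Vector = List[float]


def sgd_update(w: Vector, grad: Vector, lr: float) -> Vector:
    """Plain SGD step: w <- w - lr * grad."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def receive_and_share(replicas: List[Vector], remote_grad: Vector,
                      lr: float) -> List[Vector]:
    """Steps S43-S44 seen from group 2.

    The representative node receives group 1's gradient, shares one copy with
    every worker node of group 2, and each worker updates its own replica of
    the weights (W20 -> W21) with that shared gradient.
    """
    shared = list(remote_grad)  # copy held by group 2's representative node
    return [sgd_update(w, shared, lr) for w in replicas]


group2 = [[1.0, 1.0], [1.0, 1.0]]  # W20 on two worker nodes 40
group2 = receive_and_share(group2, [-1.0, -2.0], lr=0.5)
```

Because every worker applies the same shared gradient to the same W20, the replicas in group 2 remain identical after the update.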

Group 2 (the worker nodes 40 belonging to group 2) likewise performs gradient calculation by Synchronous-SGD, as group 1 does (step S45). In this gradient calculation, each worker node 40 in group 2 calculates a gradient for updating the weighting factor of the objective function, using the learning data stored in its learning data storage unit 32 and the weighting factor W21 stored in its weighting factor storage unit 33. The worker nodes 40 in group 2 execute this gradient calculation in synchronization with one another.

Each worker node 40 then calculates a new weighting factor (hereinafter, weighting factor W22) from the gradient calculated in step S45 and the weighting factor W21 stored in its weighting factor storage unit 33. The weighting factor W21 stored in the weighting factor storage unit 33 of each worker node 40 is thereby updated to the calculated weighting factor W22 (step S46).

Once step S46 has been executed, the gradient calculated in step S45 is transmitted from the representative node 40 of group 2 to the representative node 30 of group 1 (step S47).

The representative node 30 of group 1 receives the gradient transmitted in step S47, and the received gradient is shared among the worker nodes 30 belonging to group 1. Each worker node 30 then calculates a new weighting factor (hereinafter, weighting factor W12) from the gradient received at the representative node 30 and the weighting factor W11 stored in its weighting factor storage unit 33. The weighting factor W11 stored in the weighting factor storage unit 33 of each worker node 30 is thereby updated to the calculated weighting factor W12 (step S48).

Although FIG. 13 shows only steps S41 to S48, the processing of FIG. 13 is repeated until the gradient calculation (that is, the parallel distributed learning processing) has been performed on all of the learning data stored in the learning data storage units 32 of the worker nodes 30 and 40.

As described above, in this embodiment the worker nodes within group 1, and likewise those within group 2, execute their processing in synchronization with one another, whereas the processing of group 1 and the processing of group 2 are executed asynchronously.

Specifically, in step S43 of FIG. 13 the gradient calculated in step S41 is transmitted from the representative node 30 of group 1 to the representative node 40 of group 2. This transmission may take place at any time after, for example, steps S41 and S42, and its timing is not affected by the processing of group 2 (the worker nodes 40 belonging to group 2). Similarly, the gradient in step S47 of FIG. 13 may be transmitted at any time after, for example, steps S45 and S46, and its timing is not affected by the processing of group 1 (the worker nodes 30 belonging to group 1).
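The decoupling between groups described above can be sketched with a non-blocking mailbox per representative node: sending never waits for the peer group, and the receiver applies whatever has arrived when it next checks. This is an illustrative assumption built on Python's standard `queue.Queue`; the text does not specify the transport, and the `Representative` class is hypothetical.

```python
import queue
from typing import List, Optional

Vector = List[float]


class Representative:
    """Hypothetical representative node exchanging average gradients asynchronously."""

    def __init__(self, w: Vector, lr: float) -> None:
        self.w = list(w)
        self.lr = lr
        self.inbox: "queue.Queue[Vector]" = queue.Queue()
        self.peer: Optional["Representative"] = None

    def _apply(self, grad: Vector) -> None:
        self.w = [wi - self.lr * gi for wi, gi in zip(self.w, grad)]

    def local_round(self, group_avg_grad: Vector) -> None:
        """Apply the own group's average gradient, then send it to the peer.

        put() on an unbounded queue returns immediately, so the send does not
        depend on what the peer group is doing (cf. steps S43 and S47).
        """
        self._apply(group_avg_grad)
        if self.peer is None:
            raise RuntimeError("connect the two representatives first")
        self.peer.inbox.put(list(group_avg_grad))

    def poll_remote(self) -> int:
        """Apply every peer gradient that has arrived so far; never blocks."""
        applied = 0
        while True:
            try:
                grad = self.inbox.get_nowait()
            except queue.Empty:
                return applied
            self._apply(grad)
            applied += 1


rep1, rep2 = Representative([0.0], lr=1.0), Representative([0.0], lr=1.0)
rep1.peer, rep2.peer = rep2, rep1
rep1.local_round([1.0])  # group 1 finishes two rounds
rep1.local_round([1.0])  # while group 2 is still busy
```

Here group 1 runs ahead without waiting; when group 2 next polls, it applies both of group 1's results at once.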

The processing performed by the representative node and the non-representative nodes of each group when the processing of FIG. 13 is executed is described below.

First, an example of the processing procedure of a representative node is described with reference to the flowchart of FIG. 14, taking the representative node 30 of group 1 as an example.

The calculation unit 34 of the representative node 30 of group 1 calculates a gradient using the learning data stored in the learning data storage unit 32 and the weighting factor (for example, weighting factor W10) stored in the weighting factor storage unit 33 (step S51). Hereinafter, the gradient calculated at the representative node 30 is referred to as the gradient of the representative node 30.

While the representative node 30 of group 1 executes step S51, each non-representative node 30 of group 1 calculates a gradient in synchronization with the representative node 30, as described later. Hereinafter, a gradient calculated at a non-representative node 30 in this way is referred to as the gradient of that non-representative node 30.

The gradient of the representative node 30 of group 1 and the gradients of the non-representative nodes 30 are then distributed within group 1 (step S52). That is, the gradient of the representative node 30 of group 1 is transmitted from the representative node 30 to each non-representative node 30 of group 1, and the gradient of each non-representative node 30 of group 1 is received by the representative node 30 (specifically, by its reception control unit 31).

Next, the calculation unit 34 calculates the average of the gradient calculated in step S51 (the gradient of the representative node 30) and the gradients received by the reception control unit 31 (the gradients of the non-representative nodes 30) (step S53). Hereinafter, the average calculated in step S53 is referred to as the average gradient of group 1.

After step S53, the calculation unit 34 calculates a new weighting factor using the average gradient of group 1 and updates the weighting factor stored in the weighting factor storage unit 33 to the calculated weighting factor (for example, weighting factor W11) (step S54). The weighting factor of the representative node 30 of group 1 is thus updated to a weighting factor based on the gradients calculated by all of the worker nodes 30 in group 1.

After step S54, the transmission control unit 35 transmits the average gradient of group 1 to the representative node 40 of group 2 (step S55).

Steps S51 to S55 are executed by the representative node 30 of group 1 as part of steps S41 to S43 of FIG. 13.
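Steps S51 to S55 on the representative node can be summarized as one function. The sketch assumes plain SGD, gradients held as lists of floats, and a caller-supplied `send_to_other_group` callback standing in for the transmission control unit 35; the function and parameter names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def representative_first_half(
    w: Vector,
    own_grad: Vector,
    peer_grads: Sequence[Vector],
    lr: float,
    send_to_other_group: Callable[[Vector], None],
) -> Vector:
    """Steps S51-S55 on group 1's representative node.

    own_grad   -- gradient computed by this node in S51
    peer_grads -- gradients received from the non-representative nodes in S52
    """
    all_grads = [own_grad, *peer_grads]
    group_avg = [sum(col) / len(all_grads) for col in zip(*all_grads)]  # S53
    new_w = [wi - lr * gi for wi, gi in zip(w, group_avg)]             # S54
    send_to_other_group(group_avg)                                      # S55
    return new_w


sent: List[Vector] = []
w11 = representative_first_half([0.0, 0.0], [-2.0, 0.0], [[0.0, -4.0]], 0.5, sent.append)
```

Passing the transport in as a callback keeps the averaging and update logic independent of how the inter-group message is actually delivered.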

As described later, the processing of FIG. 14 is executed in the same way by the representative node 40 of group 2. Consequently, when the representative node 40 of group 2 executes the step corresponding to step S55, the reception control unit 31 of the representative node 30 of group 1 can receive the average gradient of group 2.

It is then determined whether the reception control unit 31 has received the average gradient of group 2 (step S56).

If it is determined that the average gradient of group 2 has been received (YES in step S56), the transmission control unit 35 transmits the reception flag "True" to the non-representative nodes 30 of group 1 (step S57).

The transmission control unit 35 also transmits the average gradient of group 2 received by the reception control unit 31 to the non-representative nodes 30 of group 1 (step S58).

After step S58, the calculation unit 34 calculates a new weighting factor using the average gradient of group 2 and updates the weighting factor stored in the weighting factor storage unit 33 to the calculated weighting factor (for example, weighting factor W12) (step S59). The weighting factor of the representative node 30 of group 1 is thus updated to a weighting factor based on the gradients calculated by the worker nodes 40 in group 2.

Steps S56 to S59 are executed by the representative node 30 of group 1 as part of step S48 of FIG. 13.

If it is determined in step S56 that the average gradient of group 2 has not been received (NO in step S56), the transmission control unit 35 transmits the reception flag "False" to the non-representative nodes 30 (step S60).
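Steps S56 to S60 can be sketched as follows, assuming plain SGD and modeling the messages to the non-representative nodes as `(kind, payload)` tuples appended to an outbox list. The message format and the function name are hypothetical.

```python
from typing import List, Optional, Tuple, Union

Vector = List[float]
Message = Tuple[str, Union[bool, Vector]]


def representative_second_half(
    w: Vector, remote_avg: Optional[Vector], lr: float, outbox: List[Message]
) -> Vector:
    """Steps S56-S60 on group 1's representative node.

    remote_avg is group 2's average gradient if it has arrived, else None.
    Messages for the non-representative nodes are appended to outbox.
    """
    if remote_avg is None:                   # S56: NO
        outbox.append(("flag", False))       # S60
        return w
    outbox.append(("flag", True))            # S57
    outbox.append(("grad", list(remote_avg)))  # S58
    return [wi - lr * gi for wi, gi in zip(w, remote_avg)]  # S59


out: List[Message] = []
w12 = representative_second_half([1.0], [-5.0], 0.5, out)
```

The flag is sent in both branches, so the non-representative nodes always learn whether a gradient from the other group will follow.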

Through the processing of FIG. 14, the weighting factor of the representative node 30 of group 1 is updated using the gradients calculated by the worker nodes 30 belonging to group 1 (the average gradient of group 1), and is further updated using the gradients calculated by the worker nodes 40 belonging to group 2 (the average gradient of group 2).

Although not shown, the processing of FIG. 14 is executed repeatedly for as long as the processing of FIG. 13 continues.

Next, an example of the processing procedure of a non-representative node is described with reference to the flowchart of FIG. 15, taking a non-representative node 30 of group 1 as an example.

In synchronization with the gradient calculation at the representative node 30 described above, the calculation unit 34 of the non-representative node 30 calculates a gradient using the learning data stored in the learning data storage unit 32 and the weighting factor (for example, weighting factor W10) stored in the weighting factor storage unit 33 (step S71).

The gradient of the representative node 30 and the gradient calculated in step S71 (the gradient of the non-representative node 30) are then distributed within group 1 (step S72). That is, the gradient of the non-representative node 30 of group 1 is transmitted from that node to the representative node 30 of group 1 (and to the other non-representative nodes 30), and the gradients of the representative node 30 (and of the other non-representative nodes 30) are received by the non-representative node 30 (specifically, by its reception control unit 31).

Next, the calculation unit 34 calculates the average of the gradient calculated in step S71 (the gradient of the non-representative node 30) and the gradients received by the reception control unit 31 (the gradients of the representative node 30 and the other non-representative nodes 30) (step S73). The average calculated in step S73 corresponds to the average gradient of group 1 calculated in step S53 of FIG. 14.

After step S73, the calculation unit 34 calculates a new weighting factor using the average gradient of group 1 and updates the weighting factor stored in the weighting factor storage unit 33 to the calculated weighting factor (for example, weighting factor W11) (step S74). The weighting factor of the non-representative node 30 of group 1 is thus updated to a weighting factor based on the gradients calculated by all of the worker nodes 30 in group 1.

Steps S71 to S74 are executed by the non-representative node 30 of group 1 as part of steps S41 and S42 of FIG. 13.

The reception flag transmitted by the representative node 30 in step S57 or step S60 of FIG. 14 is received by the reception control unit 31 of the non-representative node 30.

It is then determined whether the reception control unit 31 has received the reception flag "True" (step S76).

If it is determined that the reception flag "True" has been received (YES in step S76), the reception control unit 31 receives the average gradient of group 2 transmitted from the representative node 30 of group 1 in step S58 of FIG. 14 (step S77).

After step S77, the calculation unit 34 calculates a new weighting factor using the average gradient of group 2 and updates the weighting factor stored in the weighting factor storage unit 33 to the calculated weighting factor (for example, weighting factor W12) (step S78). The weighting factor of the non-representative node 30 of group 1 is thus updated to a weighting factor based on the gradients calculated by the worker nodes 40 in group 2.

Steps S75 to S78 are executed by the non-representative node 30 of group 1 as part of step S48 of FIG. 13.

If it is determined in step S76 that the reception flag "True" has not been received (that is, the reception flag "False" has been received) (NO in step S76), the average gradient of group 2 is not received, and steps S77 and S78 are not executed.
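On the non-representative side, the flag-then-gradient handling of steps S76 to S78 can be sketched as follows. This assumes plain SGD and models the messages from the representative node as `(kind, payload)` tuples in a `collections.deque`; the message format and the function name are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple, Union

Vector = List[float]
Message = Tuple[str, Union[bool, Vector]]


def non_representative_second_half(w: Vector, inbox: Deque[Message], lr: float) -> Vector:
    """Steps S76-S78 on a non-representative node of group 1.

    inbox holds the messages sent by the representative node: first the
    reception flag, then (only if the flag is True) group 2's average gradient.
    """
    kind, flag = inbox.popleft()        # reception flag from S57 or S60
    if kind != "flag":
        raise ValueError("expected the reception flag first")
    if not flag:                        # S76: NO, so S77 and S78 are skipped
        return w
    kind, remote_avg = inbox.popleft()  # S77
    if kind != "grad":
        raise ValueError("expected group 2's average gradient after a True flag")
    return [wi - lr * gi for wi, gi in zip(w, remote_avg)]  # S78
```

For example, `non_representative_second_half([1.0], deque([("flag", True), ("grad", [-5.0])]), 0.5)` yields `[3.5]`, while a `False` flag leaves the weights unchanged.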

Through the processing of FIG. 15, the weighting factor of the non-representative node 30 of group 1 is updated using the gradients calculated by the worker nodes 30 belonging to group 1 (the average gradient of group 1), and is further updated using the gradients calculated by the worker nodes 40 belonging to group 2 (the average gradient of group 2).

Although not shown, the processing of FIG. 15 is executed repeatedly for as long as the processing of FIG. 13 continues.

As described above, in group 1 the gradients are shared among all of the worker nodes 30 belonging to group 1, and each worker node 30 calculates the average gradient of group 1. This can be done efficiently with the collective communication algorithm called Allreduce defined by MPI (MPI_Allreduce), which both exchanges the gradients among the worker nodes 30 and computes their sum over all worker nodes 30; dividing that sum by the number of worker nodes gives the average gradient. Although the use of MPI_Allreduce has been described here, other processing equivalent to MPI_Allreduce may be used instead.
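MPI_Allreduce defines only the result (every process ends up with the reduction of all inputs); the algorithm is chosen by the MPI library. One widely used choice for large buffers is the ring algorithm, a reduce-scatter followed by an allgather. The single-process Python simulation below illustrates that technique, not any particular MPI implementation: it sums the gradient vectors around a ring of at least one node and divides by the number of nodes to obtain the average gradient on every node. The function name is hypothetical.

```python
from typing import List


def ring_allreduce_mean(vectors: List[List[float]]) -> List[List[float]]:
    """Simulate a ring allreduce over p nodes; return each node's copy of the mean."""
    p = len(vectors)
    n = len(vectors[0])
    bufs = [list(v) for v in vectors]
    # Split the index range into p contiguous chunks (some may be empty if n < p).
    bounds = [(c * n // p, (c + 1) * n // p) for c in range(p)]

    # Reduce-scatter: in step s, node r sends chunk (r - s) % p to node r + 1,
    # which adds it into its own buffer. After p - 1 steps node r holds the
    # complete sum of chunk (r + 1) % p.
    for s in range(p - 1):
        msgs = []
        for r in range(p):
            lo, hi = bounds[(r - s) % p]
            msgs.append(((r + 1) % p, lo, hi, bufs[r][lo:hi]))  # snapshot before applying
        for dst, lo, hi, data in msgs:
            for i, x in zip(range(lo, hi), data):
                bufs[dst][i] += x

    # Allgather: in step s, node r forwards its completed chunk (r + 1 - s) % p
    # to node r + 1, which overwrites its copy. After p - 1 steps every node
    # holds every fully reduced chunk.
    for s in range(p - 1):
        msgs = []
        for r in range(p):
            lo, hi = bounds[(r + 1 - s) % p]
            msgs.append(((r + 1) % p, lo, hi, bufs[r][lo:hi]))
        for dst, lo, hi, data in msgs:
            bufs[dst][lo:hi] = data

    # Sum -> average gradient, identical on every node.
    return [[x / p for x in b] for b in bufs]


result = ring_allreduce_mean([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
```

Each node sends and receives only about 2(p - 1)/p of the vector in total, independent of p, which is why the ring pattern is attractive for large gradient buffers.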

Although the processing of the representative node 30 and the non-representative nodes 30 of group 1 has been described here, the representative node 40 and the non-representative nodes 40 of group 2 execute the same processing.

In this embodiment, when the gradient calculation by group 1 (the representative node 30 and the non-representative nodes 30) is faster than the gradient calculation by group 2 (the representative node 40 and the non-representative nodes 40), the gradient calculated in group 1 is transmitted to the representative node 40 of group 2. In this case, the representative node 40 (and the non-representative nodes 40) of group 2 calculate (update) the weighting factor based on the average gradient of group 1 in the parallel distributed learning processing.

Conversely, when the gradient calculation by group 2 is faster than the gradient calculation by group 1, the gradient calculated in group 2 is transmitted to the representative node 30 of group 1. In this case, the representative node 30 (and the non-representative nodes 30) of group 1 calculate (update) the weighting factor based on the average gradient of group 2 in the parallel distributed processing.

As described above, in this embodiment the worker nodes 30 and 40 are divided, at the first level of the hierarchy, into a plurality of groups (group 1 and group 2), and parallel distributed learning is performed within each group by collective-communication-based Synchronous-SGD. At this first level, the gradients are shared among the worker nodes in a group, each worker node calculates the average gradient, and the weighting factor is updated. Because each synchronization involves only the worker nodes of one group rather than all worker nodes, this first level keeps both the synchronization cost and the batch size small.
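To make the batch-size point concrete, assume (for illustration only) N worker nodes in total, each processing a minibatch of b samples per step, split into G equal groups of k = N/G nodes. Averaging gradients over the nodes that synchronize yields an update whose effective minibatch is the union of their minibatches, so a Synchronous-SGD step over all nodes and a group-level step correspond to

$$
B_{\mathrm{all}} = N\,b, \qquad B_{\mathrm{group}} = k\,b = \frac{N}{G}\,b .
$$

For example, with N = 64, b = 32 and G = 8, one Synchronous-SGD step over all nodes would correspond to a minibatch of 2048 samples, whereas each group-level step corresponds to 256.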

At the second level, parallel distributed learning is performed among the representative nodes of the first-level groups by a batch-size-independent parallel method. At this second level the representative nodes need not synchronize with one another, so high throughput can be obtained.

Thus, as in the first embodiment described above, this embodiment achieves high scalability in parallel distributed learning by hierarchically combining, for example, Synchronous-SGD with a batch-size-independent parallel method, which enables parallel distributed learning with a larger degree of parallelism.

The case described in this embodiment in which "the gradient calculation by group 1 is faster than the gradient calculation by group 2" includes the case in which the representative node 40 of group 2 receives the gradient calculated in group 1 (the gradient calculation result of group 1) before group 2 calculates its gradient.

In that case, for example, when the representative node 40 of group 2 receives the gradient calculated in group 1 before group 2 calculates its gradient, group 1 updates its weighting factor based on the gradients calculated in group 1 (that is, the first and second gradients). Group 2, on the other hand, first updates its weighting factor based on the gradients calculated in group 1 and then further updates it based on the gradients calculated in group 2 (that is, the third and fourth gradients). In other words, when the representative node 40 of group 2 receives the gradient calculated in group 1 before group 2 calculates its gradient, the weighting factor of group 1 is updated based on the first and second gradients, and the weighting factor of group 2 is updated based on the first to fourth gradients.

Likewise, the case described in this embodiment in which "the gradient calculation by group 2 is faster than the gradient calculation by group 1" includes the case in which the representative node 30 of group 1 receives the gradient calculated in group 2 (the gradient calculation result of group 2) before group 1 calculates its gradient.

In that case, for example, when the representative node 30 of group 1 receives the gradient calculated in group 2 before group 1 calculates its gradient, group 2 updates its weighting factor based on the gradients calculated in group 2 (that is, the third and fourth gradients). Group 1, on the other hand, first updates its weighting factor based on the gradients calculated in group 2 and then further updates it based on the gradients calculated in group 1 (that is, the first and second gradients). In other words, when the representative node 30 of group 1 receives the gradient calculated in group 2 before group 1 calculates its gradient, the weighting factor of group 2 is updated based on the third and fourth gradients, and the weighting factor of group 1 is updated based on the first to fourth gradients.

That is, in this embodiment the weighting factors may be updated not according to the order of the gradient calculations in the groups (which group finishes its gradient calculation first), but according to which group's gradient calculation result is received first at the representative node of group 1 or group 2.
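The effect of the arrival order can be checked numerically. The sketch below assumes a one-dimensional toy objective in which worker i has loss 0.5*(w - t_i)**2 (so its gradient is w - t_i), two workers per group, plain SGD, and the ordering described above: the slower group applies the faster group's average gradient before computing its own. The objective, the targets, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def avg_grad(w: float, targets: Sequence[float]) -> float:
    """Average of the per-worker gradients d/dw 0.5*(w - t)^2 = w - t."""
    return sum(w - t for t in targets) / len(targets)


def fast_then_slow(w_fast: float, w_slow: float,
                   t_fast: Sequence[float], t_slow: Sequence[float],
                   lr: float) -> Tuple[float, float]:
    """One exchange when the fast group's result reaches the slow group first."""
    g_fast = avg_grad(w_fast, t_fast)  # e.g. first and second gradients averaged
    w_fast = w_fast - lr * g_fast      # fast group: based on its own gradients only
    w_slow = w_slow - lr * g_fast      # slow group applies the received result first
    g_slow = avg_grad(w_slow, t_slow)  # then computes its own gradients on the new weights
    w_slow = w_slow - lr * g_slow      # slow group: now reflects all four gradients
    return w_fast, w_slow


# Group 1 (targets 1 and 3) is faster than group 2 (targets 5 and 7); both start at 0.
w1, w2 = fast_then_slow(0.0, 0.0, [1.0, 3.0], [5.0, 7.0], lr=0.5)
```

Swapping the argument order models the opposite case, in which group 2's result reaches group 1 first.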

Although this embodiment has been described on the assumption that gradients are shared within a group, the configuration may instead be such that the weighting factor updated at each worker node in a group is shared within that group.
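A minimal sketch of this variant, assuming plain SGD and that all worker nodes in the group start from the same weights: each node applies its own gradient locally, and the updated weights are then averaged within the group. Because the SGD update is linear in the gradient, the result equals the gradient-sharing update; that equivalence need not hold for optimizers whose update is nonlinear in the gradient, such as Adam. The function name is hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def update_then_average(w: Vector, grads: Sequence[Vector], lr: float) -> Vector:
    """Each worker updates its own copy of w with its own gradient; the group
    then shares and averages the updated weights instead of the gradients."""
    updated = [[wi - lr * gi for wi, gi in zip(w, g)] for g in grads]
    return [sum(col) / len(updated) for col in zip(*updated)]


w_shared = update_then_average([0.0], [[-1.0], [-3.0]], lr=0.5)
```

With these inputs the averaged weights equal the result of applying the averaged gradient (-2.0) to the shared starting weights.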

According to at least one of the embodiments described above, it is possible to provide a system, a program, and a method capable of achieving high scalability in parallel distributed learning processing.

While certain embodiments of the present invention have been described, these embodiments have been presented by way of example only and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

Description of reference signs: 10: system; 20: server node; 21, 32: learning data storage unit; 22: data allocation unit; 23, 35: transmission control unit; 24, 33: weighting factor storage unit; 25, 31: reception control unit; 26, 34: calculation unit; 30, 40: worker node; 201, 301: CPU; 202, 302: system controller; 203, 303: main memory; 204, 304: BIOS-ROM; 205, 305: nonvolatile memory; 206, 306: communication device; 207, 307: EC.

Claims

A system comprising:
a first node and a second node belonging to a first group; and
a third node and a fourth node belonging to a second group, wherein
when the first node and the second node execute n-th (n is a natural number) parallel distributed processing, the first node calculates a first gradient for updating a first weighting factor of an objective function to a second weighting factor, and the second node calculates a second gradient for updating the first weighting factor of the objective function to the second weighting factor,
when the third node and the fourth node execute m-th (m is a natural number) parallel distributed processing, the third node calculates a third gradient for updating a third weighting factor of the objective function to a fourth weighting factor, and the fourth node calculates a fourth gradient for updating the third weighting factor of the objective function to the fourth weighting factor, and
when the gradient calculation by the first node and the second node is faster than the gradient calculation by the third node and the fourth node, the second weighting factor updated from the first weighting factor is further updated based on the first and second gradients in the (n+1)-th parallel distributed processing executed by the first node and the second node, and the fourth weighting factor updated from the third weighting factor is further updated based on the first to fourth gradients in the (m+1)-th parallel distributed processing executed by the third node and the fourth node.

A system comprising:
a first node and a second node belonging to a first group; and
a third node and a fourth node belonging to a second group, wherein
when the first node and the second node execute n-th (n is a natural number) parallel distributed processing, the first node calculates a first gradient for updating a first weighting factor of an objective function to a second weighting factor, and the second node calculates a second gradient for updating the first weighting factor of the objective function to the second weighting factor,
when the third node and the fourth node execute m-th (m is a natural number) parallel distributed processing, the third node calculates a third gradient for updating a third weighting factor of the objective function to a fourth weighting factor, and the fourth node calculates a fourth gradient for updating the third weighting factor of the objective function to the fourth weighting factor, and
when the gradient calculation by the third node and the fourth node is faster than the gradient calculation by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first to fourth gradients in the (n+1)-th parallel distributed processing executed by the first node and the second node, and the fourth weighting factor updated from the third weighting factor is further updated based on the third and fourth gradients in the (m+1)-th parallel distributed processing executed by the third node and the fourth node.

A system comprising:
A first node and a second node belonging to a first group; and
A third node and a fourth node belonging to a second group, wherein
When the first node and the second node execute the n-th (n is a natural number) parallel distributed processing, the first node calculates a first gradient for updating a first weighting factor of an objective function to a second weighting factor, and the second node calculates a second gradient for updating the first weighting factor of the objective function to the second weighting factor,
When the third node and the fourth node execute the m-th (m is a natural number) parallel distributed processing, the third node calculates a third gradient for updating a third weighting factor of the objective function to a fourth weighting factor, and the fourth node calculates a fourth gradient for updating the third weighting factor of the objective function to the fourth weighting factor,
When the first node and the second node calculate gradients faster than the third node and the fourth node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first and second gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the first to fourth gradients, and
When the third node and the fourth node calculate gradients faster than the first node and the second node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first to fourth gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the third and fourth gradients.
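The update rule of the system claims can be sketched as follows. This is a minimal, non-authoritative sketch: it assumes that each gradient is a plain list of floats, that the per-group gradient is an element-wise mean, and that the weighting factor is updated with a plain SGD step; none of these choices is fixed by the claims. The names `update_groups`, `average`, `sgd_step`, `t_group1` and `t_group2` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def average(grads: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized gradient vectors (assumed combiner)."""
    return [sum(parts) / len(grads) for parts in zip(*grads)]


def sgd_step(weights: Vector, grad: Vector, lr: float) -> Vector:
    """Plain SGD step w <- w - lr * g (assumed; the claims name no optimizer)."""
    return [w - lr * g for w, g in zip(weights, grad)]


def update_groups(
    w2: Vector, g1: Vector, g2: Vector, t_group1: float,
    w4: Vector, g3: Vector, g4: Vector, t_group2: float,
    lr: float = 0.1,
) -> Tuple[Vector, Vector]:
    """Return the further-updated (second, fourth) weighting factors.

    t_group1 and t_group2 are the measured gradient-computation times of the
    first and second groups; the smaller time marks the faster group.
    """
    all_four = average([g1, g2, g3, g4])
    if t_group1 < t_group2:
        # First group is faster: its (n+1)th round uses only the first and
        # second gradients, while the slower second group's (m+1)th round
        # uses the first to fourth gradients.
        new_w2 = sgd_step(w2, average([g1, g2]), lr)
        new_w4 = sgd_step(w4, all_four, lr)
    else:
        # Second group is faster (ties also land here, an arbitrary choice):
        # the roles are swapped.
        new_w2 = sgd_step(w2, all_four, lr)
        new_w4 = sgd_step(w4, average([g3, g4]), lr)
    return new_w2, new_w4


# Usage with hypothetical values: group 1 took 1.0 s, group 2 took 2.0 s.
w2, w4 = update_groups([1.0], [1.0], [3.0], 1.0, [1.0], [5.0], [7.0], 2.0)
```

In this sketch the faster group never waits for the slower one, while the slower group still benefits from the faster group's gradients in its next round.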

The system according to any one of claims 1 to 3, wherein
The second weighting factor is updated based on a fifth gradient calculated from the first and second gradients, and
The fourth weighting factor is updated based on a sixth gradient calculated from the third and fourth gradients.
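One way to calculate a single group gradient (such as the fifth or sixth gradient) from the per-node gradients is sketched below. This is only an assumption for illustration: the claim says the fifth gradient is "calculated from" the first and second gradients without fixing the formula, so the sample-count-weighted mean and the plain SGD step here are hypothetical choices, as are the names `combine_gradients` and `apply_gradient`.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def combine_gradients(
    grads: Sequence[Vector], counts: Optional[Sequence[int]] = None
) -> Vector:
    """Combine per-node gradients into one group gradient.

    Uses a mean weighted by per-node sample counts (a plain mean when counts
    is None). The weighting scheme is an assumption, not the claimed formula.
    """
    if not grads:
        raise ValueError("need at least one gradient")
    if any(len(g) != len(grads[0]) for g in grads):
        raise ValueError("gradients must have equal length")
    weights = list(counts) if counts is not None else [1] * len(grads)
    if len(weights) != len(grads):
        raise ValueError("need one count per gradient")
    total = sum(weights)
    return [sum(c * x for c, x in zip(weights, parts)) / total for parts in zip(*grads)]


def apply_gradient(weights: Vector, grad: Vector, lr: float) -> Vector:
    """Plain SGD update (assumed optimizer)."""
    return [w - lr * g for w, g in zip(weights, grad)]


# Hypothetical values: the fifth gradient from the first and second gradients,
# then the first weighting factor updated to the second weighting factor.
fifth = combine_gradients([[1.0, 2.0], [3.0, 4.0]])
second_weights = apply_gradient([0.5, 0.5], fifth, lr=0.1)
```

Weighting by sample counts keeps the group gradient unbiased when nodes in a group process different amounts of data; with equal counts it reduces to a simple mean.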

The system according to any one of claims 1 to 3, further comprising a server node communicably connected to the first node and the third node, wherein
The server node calculates the second weighting factor and the fourth weighting factor,
The first node transmits the second weighting factor received from the server node to the second node, and
The third node transmits the fourth weighting factor received from the server node to the fourth node.
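The server-and-relay arrangement can be illustrated with a small in-process simulation. This is a hedged sketch, not the claimed implementation: real nodes would communicate over a network, and the classes `Node`, `Group` and `ServerNode`, the plain SGD update on the server, and the learning rate are all hypothetical. Only the representative node of each group (the first or third node) talks to the server; it then forwards the result to the rest of its group.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class Node:
    name: str
    weights: Vector = field(default_factory=list)

    def receive(self, weights: Vector) -> None:
        self.weights = list(weights)  # store a private copy


@dataclass
class Group:
    representative: Node  # node linked to the server (first or third node)
    members: List[Node]   # other nodes of the group (e.g. second or fourth node)

    def relay(self, weights: Vector) -> None:
        # The representative stores the server's result, then forwards it.
        self.representative.receive(weights)
        for node in self.members:
            node.receive(self.representative.weights)


class ServerNode:
    """Hypothetical server that calculates each group's weighting factor."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr  # plain SGD learning rate (an assumption)

    def update_and_send(self, group: Group, group_grad: Vector) -> Vector:
        current = group.representative.weights
        new_weights = [w - self.lr * g for w, g in zip(current, group_grad)]
        group.relay(new_weights)
        return new_weights


# Usage: the server updates group 1 and node1 relays the result to node2.
node1, node2 = Node("node1", [1.0]), Node("node2", [1.0])
server = ServerNode(lr=0.5)
server.update_and_send(Group(node1, [node2]), [2.0])
```

Relaying through one representative per group means the server sends one message per group rather than one per node, which is one plausible reason for this topology.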

The system according to claim 3, wherein
When the first node and the second node calculate gradients faster than the third node and the fourth node, the first and second gradients are transmitted to the third node and the fourth node, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor is further updated based on the first to fourth gradients, and
When the third node and the fourth node calculate gradients faster than the first node and the second node, the third and fourth gradients are transmitted to the first node and the second node, and in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor is further updated based on the first to fourth gradients.
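The gradient transmission between groups can be sketched with a per-group buffer. This is a minimal illustration under stated assumptions: the `Inbox` class stands in for whatever network transport the system uses, and the mean combiner, plain SGD step and the names `forward_and_update`, `mean` and `step` are hypothetical. The same function covers both directions of the claim: the caller passes whichever group is currently faster as the `fast_*` arguments.

```python
from collections import deque
from typing import Deque, List, Tuple

Vector = List[float]


def mean(grads: List[Vector]) -> Vector:
    """Element-wise mean of equally sized gradient vectors."""
    return [sum(parts) / len(grads) for parts in zip(*grads)]


def step(weights: Vector, grad: Vector, lr: float) -> Vector:
    """Plain SGD update (assumed; the claims name no optimizer)."""
    return [w - lr * g for w, g in zip(weights, grad)]


class Inbox:
    """Hypothetical buffer for gradients transmitted from the other group."""

    def __init__(self) -> None:
        self._queue: Deque[Vector] = deque()

    def put(self, grads: List[Vector]) -> None:
        self._queue.extend(grads)

    def take_all(self) -> List[Vector]:
        items = list(self._queue)
        self._queue.clear()
        return items


def forward_and_update(
    fast_w: Vector, fast_grads: List[Vector],
    slow_w: Vector, slow_grads: List[Vector],
    slow_inbox: Inbox, lr: float = 0.1,
) -> Tuple[Vector, Vector]:
    # 1. The faster group transmits its gradients to the slower group.
    slow_inbox.put(fast_grads)
    # 2. The faster group's next round uses only its own gradients.
    new_fast_w = step(fast_w, mean(fast_grads), lr)
    # 3. The slower group's next round uses the received gradients plus its
    #    own, i.e. all four gradients.
    new_slow_w = step(slow_w, mean(slow_inbox.take_all() + slow_grads), lr)
    return new_fast_w, new_slow_w
```

Draining the inbox on each round ensures that forwarded gradients are applied once and are not reused in a later round.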

The system according to any one of claims 1 to 3, wherein
A difference in processing speed between the first node and the second node belonging to the first group is equal to or less than a first threshold, and
A difference in processing speed between the third node and the fourth node belonging to the second group is equal to or less than a second threshold.
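A simple way to form groups that satisfy the speed-difference condition is sketched below. It is a hedged illustration: the claim allows the first and second thresholds to differ, while this sketch uses one `threshold` for every group for simplicity, and the speed metric (for example, samples per second), the greedy rule and the name `group_by_speed` are assumptions.

```python
from typing import Dict, List


def group_by_speed(speeds: Dict[str, float], threshold: float) -> List[List[str]]:
    """Greedily partition nodes so that, within each group, the difference
    between the fastest and slowest node is at most `threshold`.

    `speeds` maps a node name to a measured processing speed; both the metric
    and the greedy rule are assumptions.
    """
    groups: List[List[str]] = []
    group_min = 0.0
    for name, speed in sorted(speeds.items(), key=lambda item: item[1]):
        if groups and speed - group_min <= threshold:
            groups[-1].append(name)  # within threshold of the group's slowest node
        else:
            groups.append([name])    # start a new group anchored at this speed
            group_min = speed
    return groups


# Usage with hypothetical speeds: two slow nodes and two fast nodes.
groups = group_by_speed({"node1": 100.0, "node2": 98.0, "node3": 60.0, "node4": 62.0}, 5.0)
```

Because nodes are visited in ascending speed order and each group is anchored at its slowest member, every node in a group is at most `threshold` faster than every other node in that group.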

The system according to claim 7, wherein
A plurality of nodes including the first node and the second node belong to the first group,
A plurality of nodes including the third node and the fourth node belong to the second group, and
When the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, a first number of nodes belonging to the first group is smaller than a second number of nodes belonging to the second group.
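One possible rule for choosing how many nodes to place in the slower group is to match the aggregate throughput of the two groups. The sketch below assumes that rule; the claim itself only requires the faster group to have fewer nodes than the slower group. The function name `slow_group_size` and the per-node speed metric are hypothetical.

```python
import math


def slow_group_size(fast_speed: float, slow_speed: float, fast_nodes: int) -> int:
    """Smallest slow-group size whose aggregate speed reaches the fast group's.

    Speeds are per-node throughputs in the same (assumed) unit, e.g. samples/s.
    """
    if not 0 < slow_speed <= fast_speed:
        raise ValueError("expects 0 < slow_speed <= fast_speed")
    if fast_nodes < 1:
        raise ValueError("fast group needs at least one node")
    return math.ceil(fast_nodes * fast_speed / slow_speed)


# Usage: 3 fast nodes at 100 samples/s call for 5 slow nodes at 60 samples/s.
second_number = slow_group_size(100.0, 60.0, 3)
```

Rounding up means the slower group never has less aggregate throughput than the faster one, so the two groups finish a round in roughly comparable time.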

The system according to claim 7, wherein, when the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, the processing amount of each of the third node and the fourth node belonging to the second group in the parallel distributed processing is smaller than the processing amount of each of the first node and the second node belonging to the first group.
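Assigning a smaller processing amount to slower nodes can be sketched by splitting a global mini-batch in proportion to node speed. This is an assumed scheme for illustration only: the claim does not specify how the processing amount is chosen, and the name `split_batch`, the speed metric and the largest-remainder rounding are hypothetical.

```python
import math
from typing import Dict


def split_batch(speeds: Dict[str, float], global_batch: int) -> Dict[str, int]:
    """Split `global_batch` samples across nodes in proportion to speed.

    Slower nodes receive fewer samples. Rounding uses the largest-remainder
    method so the shares always sum to `global_batch`.
    """
    total = sum(speeds.values())
    exact = {node: global_batch * s / total for node, s in speeds.items()}
    shares = {node: math.floor(x) for node, x in exact.items()}
    leftover = global_batch - sum(shares.values())
    # Hand the remaining samples to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda node: exact[node] - shares[node], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


# Usage: the third and fourth nodes run at half the speed of the first two.
shares = split_batch({"node1": 100.0, "node2": 100.0, "node3": 50.0, "node4": 50.0}, 300)
```

With shares proportional to speed, each node needs about the same wall-clock time per round, which is one plausible motivation for giving slower nodes less work.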

A program executed by one or more computers of a system comprising a first node and a second node belonging to a first group, and a third node and a fourth node belonging to a second group, the program causing the one or more computers to execute:
Calculating, when the first node and the second node execute the n-th (n is a natural number) parallel distributed processing, a first gradient by the first node for updating a first weighting factor of an objective function to a second weighting factor, and a second gradient by the second node for updating the first weighting factor of the objective function to the second weighting factor; and
Calculating, when the third node and the fourth node execute the m-th (m is a natural number) parallel distributed processing, a third gradient by the third node for updating a third weighting factor of the objective function to a fourth weighting factor, and a fourth gradient by the fourth node for updating the third weighting factor of the objective function to the fourth weighting factor, wherein
When the first node and the second node calculate gradients faster than the third node and the fourth node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first and second gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the first to fourth gradients.

A program executed by one or more computers of a system comprising a first node and a second node belonging to a first group, and a third node and a fourth node belonging to a second group, the program causing the one or more computers to execute:
Calculating, when the first node and the second node execute the n-th (n is a natural number) parallel distributed processing, a first gradient by the first node for updating a first weighting factor of an objective function to a second weighting factor, and a second gradient by the second node for updating the first weighting factor of the objective function to the second weighting factor; and
Calculating, when the third node and the fourth node execute the m-th (m is a natural number) parallel distributed processing, a third gradient by the third node for updating a third weighting factor of the objective function to a fourth weighting factor, and a fourth gradient by the fourth node for updating the third weighting factor of the objective function to the fourth weighting factor, wherein
When the third node and the fourth node calculate gradients faster than the first node and the second node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first to fourth gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the third and fourth gradients.

A program executed by one or more computers of a system comprising a first node and a second node belonging to a first group, and a third node and a fourth node belonging to a second group, the program causing the one or more computers to execute:
Calculating, when the first node and the second node execute the n-th (n is a natural number) parallel distributed processing, a first gradient by the first node for updating a first weighting factor of an objective function to a second weighting factor, and a second gradient by the second node for updating the first weighting factor of the objective function to the second weighting factor; and
Calculating, when the third node and the fourth node execute the m-th (m is a natural number) parallel distributed processing, a third gradient by the third node for updating a third weighting factor of the objective function to a fourth weighting factor, and a fourth gradient by the fourth node for updating the third weighting factor of the objective function to the fourth weighting factor, wherein
When the first node and the second node calculate gradients faster than the third node and the fourth node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first and second gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the first to fourth gradients, and
When the third node and the fourth node calculate gradients faster than the first node and the second node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first to fourth gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the third and fourth gradients.

The program according to any one of claims 10 to 12, wherein
The second weighting factor is updated based on a fifth gradient calculated from the first and second gradients, and
The fourth weighting factor is updated based on a sixth gradient calculated from the third and fourth gradients.

The program according to any one of claims 10 to 12, wherein
The second weighting factor and the fourth weighting factor are calculated by a server node,
The first node transmits the second weighting factor received from the server node to the second node, and
The third node transmits the fourth weighting factor received from the server node to the fourth node.

The program according to claim 10, wherein
When the first node and the second node calculate gradients faster than the third node and the fourth node, the first and second gradients are transmitted to the third node and the fourth node, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor is further updated based on the first to fourth gradients, and
When the third node and the fourth node calculate gradients faster than the first node and the second node, the third and fourth gradients are transmitted to the first node and the second node, and in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor is further updated based on the first to fourth gradients.

The program according to any one of claims 10 to 12, wherein
A difference in processing speed between the first node and the second node belonging to the first group is equal to or less than a first threshold, and
A difference in processing speed between the third node and the fourth node belonging to the second group is equal to or less than a second threshold.

The program according to claim 16, wherein
A plurality of nodes including the first node and the second node belong to the first group,
A plurality of nodes including the third node and the fourth node belong to the second group, and
When the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, a first number of nodes belonging to the first group is smaller than a second number of nodes belonging to the second group.

The program according to claim 16, wherein, when the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, the processing amount of each of the third node and the fourth node belonging to the second group in the parallel distributed processing is smaller than the processing amount of each of the first node and the second node belonging to the first group.

A method comprising:
Calculating, when a first node and a second node belonging to a first group execute the n-th (n is a natural number) parallel distributed processing, a first gradient by the first node for updating a first weighting factor of an objective function to a second weighting factor, and a second gradient by the second node for updating the first weighting factor of the objective function to the second weighting factor; and
Calculating, when a third node and a fourth node belonging to a second group execute the m-th (m is a natural number) parallel distributed processing, a third gradient by the third node for updating a third weighting factor of the objective function to a fourth weighting factor, and a fourth gradient by the fourth node for updating the third weighting factor of the objective function to the fourth weighting factor, wherein
When the first node and the second node calculate gradients faster than the third node and the fourth node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first and second gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the first to fourth gradients.

A method comprising:
Calculating, when a first node and a second node belonging to a first group execute the n-th (n is a natural number) parallel distributed processing, a first gradient by the first node for updating a first weighting factor of an objective function to a second weighting factor, and a second gradient by the second node for updating the first weighting factor of the objective function to the second weighting factor; and
Calculating, when a third node and a fourth node belonging to a second group execute the m-th (m is a natural number) parallel distributed processing, a third gradient by the third node for updating a third weighting factor of the objective function to a fourth weighting factor, and a fourth gradient by the fourth node for updating the third weighting factor of the objective function to the fourth weighting factor, wherein
When the third node and the fourth node calculate gradients faster than the first node and the second node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first to fourth gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the third and fourth gradients.

A method comprising:
Calculating, when a first node and a second node belonging to a first group execute the n-th (n is a natural number) parallel distributed processing, a first gradient by the first node for updating a first weighting factor of an objective function to a second weighting factor, and a second gradient by the second node for updating the first weighting factor of the objective function to the second weighting factor; and
Calculating, when a third node and a fourth node belonging to a second group execute the m-th (m is a natural number) parallel distributed processing, a third gradient by the third node for updating a third weighting factor of the objective function to a fourth weighting factor, and a fourth gradient by the fourth node for updating the third weighting factor of the objective function to the fourth weighting factor, wherein
When the first node and the second node calculate gradients faster than the third node and the fourth node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first and second gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the first to fourth gradients, and
When the third node and the fourth node calculate gradients faster than the first node and the second node, in the (n+1)th parallel distributed processing executed by the first node and the second node, the second weighting factor updated from the first weighting factor is further updated based on the first to fourth gradients, and in the (m+1)th parallel distributed processing executed by the third node and the fourth node, the fourth weighting factor updated from the third weighting factor is further updated based on the third and fourth gradients.