JP7159884B2 - Information processing device and information processing method

Info

Publication number: JP7159884B2
Application number: JP2019009605A
Authority: JP
Inventors: 裕明三上
Original assignee: Sony Corp; Sony Group Corp
Current assignee: Sony Corp; Sony Group Corp
Priority date: 2019-01-23
Filing date: 2019-01-23
Publication date: 2022-10-25
Anticipated expiration: 2038-09-11
Also published as: JP2020042753A

Description

The present disclosure relates to an information processing device and an information processing method.

In recent years, neural networks, which are mathematical models that imitate the workings of the brain and nervous system, have attracted attention. Many techniques for speeding up learning by neural networks have also been proposed. For example, Non-Patent Document 1 discloses a technique for changing the batch size during learning.

Samuel L. Smith and 3 others, "Don't Decay the Learning Rate, Increase the Batch Size", November 1, 2017, [Online], [retrieved September 7, 2018], Internet <https://arxiv.org/pdf/1711.00489.pdf>

However, the technique described in Non-Patent Document 1 depends on a specific learning method and is difficult to apply to learning that does not employ that method.

According to the present disclosure, there is provided an information processing device including: an acquisition unit that acquires a gap value from an ideal state for learning by a neural network; and an instruction unit that instructs a dynamic change of the batch size value in the neural network based on the gap value from the ideal state.

Further, according to the present disclosure, there is provided an information processing method including: acquiring, by a processor, a gap value from an ideal state for learning by a neural network; and instructing a dynamic change of the batch size value in the neural network based on the gap value from the ideal state.

FIG. 1 is a diagram showing an example of the transition of loss when Step Learning rate decay is applied.
FIG. 2 is a diagram for explaining an overview of the batch size change according to an embodiment of the present disclosure.
FIG. 3 is a block diagram showing a functional configuration example of the information processing device according to the embodiment.
FIG. 4 is a diagram showing verification results when the batch size change based on the slope of the loss according to the embodiment is applied to ImageNet/ResNet-50.
FIG. 5 is a diagram showing verification results when the batch size change based on training values according to the embodiment is applied to ImageNet/ResNet-50.
FIG. 6 is a diagram showing verification results when the loss-based batch size change according to the embodiment is applied to learning using MNIST.
FIG. 7 is a diagram showing verification results when the loss-based batch size change according to the embodiment is applied to learning using cifar10.
FIG. 8 is a diagram showing verification results when the loss-based batch size change according to the embodiment is applied to learning using cifar10.
FIG. 9 is a diagram showing an example of a training script and a loss slope calculation module that realize the batch size change based on the first derivative of the loss according to the embodiment.
FIG. 10 is a diagram showing verification results when the per-epoch batch size increase according to the embodiment is applied to learning using MNIST.
FIG. 11 is a diagram showing verification results when the batch size change based on loss and epoch according to the embodiment is applied to learning using cifar10.
FIG. 12 is a diagram showing an example of a training script that realizes the increase and decrease of the batch size based on loss and epoch according to the embodiment.
FIG. 13 is a diagram for explaining how the batch size changing unit according to the embodiment recreates the model in the GPU.
FIG. 14 is a diagram for explaining how the batch size changing unit according to the embodiment controls the increase and decrease of the number of calculation loops.
FIG. 15 is a diagram for explaining how the batch size changing unit according to the embodiment controls the increase and decrease of the GPUs in use.
FIG. 16 is a flowchart showing the flow of control by the batch size changing unit according to the embodiment.
FIG. 17 is a diagram showing a hardware configuration example according to an embodiment of the present disclosure.

Preferred embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. In this specification and the drawings, constituent elements having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

Note that the description will be given in the following order.
1. Embodiment
1.1. Overview
1.2. Functional configuration example of information processing device 10
1.3. Verification results
1.4. Method for realizing batch size increase/decrease
2. Hardware configuration example
3. Summary

<1. Embodiment>
<<1.1. Overview>>
First, an overview of an embodiment of the present disclosure will be described. As described above, many techniques for speeding up learning by neural networks have been proposed in recent years. In general, the time required for DNN (Deep Neural Network) learning is proportional to the number of parameter updates, so reducing the number of updates can be an effective means of speeding up learning.

The number of parameter updates can be reduced, for example, by increasing the batch size. It is also known that, in the latter half of learning, learning still converges even if the batch size is increased. Therefore, by changing the batch size during learning as disclosed in Non-Patent Document 1 and setting the batch size as large as possible, the number of parameter updates can be reduced, which is expected to speed up learning.
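As a rough illustration of the relationship described above: if one parameter update is performed per mini-batch, an epoch over N samples takes ceil(N / B) updates at batch size B. The following sketch is not part of the disclosure; the function name and the sample numbers are hypothetical and are chosen only to show the arithmetic.

```python
import math


def count_updates(num_samples, batch_sizes_per_epoch):
    """Total parameter updates when epoch i uses batch_sizes_per_epoch[i].

    Assumes one update per mini-batch and that the last, possibly
    smaller, mini-batch of each epoch is still used.
    """
    return sum(math.ceil(num_samples / b) for b in batch_sizes_per_epoch)


# Hypothetical numbers: 10,000 samples, 4 epochs.
fixed = count_updates(10_000, [100, 100, 100, 100])
growing = count_updates(10_000, [100, 100, 1000, 1000])
print(fixed, growing)
```

Raising the batch size only in the later epochs already removes a large share of the updates, which is the effect the paragraph above relies on.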

However, the batch size changing method described in Non-Patent Document 1 is applicable only to a specific learning method. Here, the specific learning method refers to a method called Step Learning rate decay.

FIG. 1 is a diagram showing an example of the transition of loss when Step Learning rate decay is applied. As shown in FIG. 1, Step Learning rate decay is a method of lowering the loss stepwise by lowering the learning rate stepwise. In the example shown in FIG. 1, the loss drops sharply near epochs 30 and 60, and the graph has a staircase shape.

With the technique described in Non-Patent Document 1, the batch size can be changed at timings such as epochs 30 and 60, where the loss drops sharply. However, the technique cannot be applied to learning methods in which the loss transition does not exhibit such a staircase shape.

The technical idea according to the present disclosure was conceived with the above points in mind, and makes it possible to effectively speed up learning by a DNN regardless of the learning method. To this end, the information processing device 10 according to an embodiment of the present disclosure includes a learning unit 120 that performs learning using a neural network, and one of its features is that the learning unit 120 dynamically changes the batch size value during learning based on a gap value from the ideal state for learning output by the neural network.

Here, the gap value from the ideal state may be an index that quantitatively expresses the difference between the expected output and the actual output. The gap value from the ideal state according to the present embodiment includes, for example, the loss. The gap value from the ideal state according to the present embodiment may also include the training error and the validation error.

Examples of the training error and validation error used as the gap value include the mean squared error (MSE) and the mean absolute error (MAE), which are sometimes also used as the loss, the Top-k error used in image classification (in particular, the top-1 error and the top-5 error), and the mAP (mean Average Precision) used in object detection.
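For reference, three of the metrics named above can be computed as in the following minimal sketch. It uses plain Python lists, the function names are illustrative rather than taken from the disclosure, and mAP is omitted because it requires a detection-specific matching step.

```python
def mean_squared_error(targets, outputs):
    """Average of squared differences between targets and outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def mean_absolute_error(targets, outputs):
    """Average of absolute differences between targets and outputs."""
    return sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)


def top_k_error(labels, scores, k):
    """Fraction of samples whose true label is not among the k highest scores.

    labels: list of class indices; scores: one list of class scores per sample.
    """
    misses = 0
    for label, row in zip(labels, scores):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)
```

Any of these values, tracked over the course of learning, could serve as the gap value that drives the batch size change.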

Here, an overview of the batch size change according to the present embodiment will be described with reference to FIG. 2. FIG. 2 shows a graph of the transition of loss over epochs. In each graph from FIG. 2 onward, the solid line indicates the loss transition without a batch size change (Reference), and the dashed line indicates the loss transition with the batch size change according to the present embodiment applied (Approach).

The learning unit 120 according to the present embodiment may increase the batch size value during learning when, for example, convergence of learning is estimated based on the loss.

A decreasing loss value indicates that the DNN is approaching a solution, that is, that learning is heading toward convergence (learning is stable). For this reason, the learning unit 120 according to the present embodiment may increase the batch size value during learning based on the n-th derivative of the loss.

For example, the learning unit 120 according to the present embodiment can increase the batch size value when the first derivative of the loss, that is, its slope, falls below a predetermined threshold. In the example shown in FIG. 2, the learning unit 120 increases the batch size value from 32K to 64K at timing T1 (epoch 30), when the slope of the loss has settled.

Further, for example, the learning unit 120 according to the present embodiment can also increase the batch size value when the 0th derivative of the loss, that is, the loss value itself, falls below a predetermined threshold. If this threshold is 0.3, the learning unit 120 may increase the batch size value at timing T2 (epoch 60), when the loss value falls below 0.3. Note that the learning unit 120 may also increase the batch size value based on an n-th derivative with n of 2 or more.
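The two criteria above can be sketched as a single decision function. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the slope is approximated by the difference between the last two recorded losses, and "falls below the threshold" is interpreted for the slope as its magnitude becoming small.

```python
def should_increase_batch_size(loss_history, value_threshold=None,
                               slope_threshold=None):
    """Return True when every configured criterion is met.

    loss_history: recorded losses, oldest first.
    value_threshold: bound on the 0th derivative (the loss itself).
    slope_threshold: bound on the magnitude of the 1st derivative,
        approximated by the difference of the last two losses (assumption).
    """
    if value_threshold is None and slope_threshold is None:
        return False
    if value_threshold is not None:
        if not loss_history or loss_history[-1] >= value_threshold:
            return False
    if slope_threshold is not None:
        if len(loss_history) < 2:
            return False
        if abs(loss_history[-1] - loss_history[-2]) >= slope_threshold:
            return False
    return True
```

Passing both thresholds corresponds to requiring the slope criterion and the loss-value criterion at the same time; passing only one of them uses that criterion alone.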

Comparing Approach with Reference shows that, even when the batch size changing method according to the present embodiment is applied, learning does not diverge and performance is maintained. In other words, the batch size changing method realized by the information processing device 10 according to the present embodiment makes it possible both to secure learning performance and to reduce the number of parameter updates, that is, to shorten the learning time.

Further, with the batch size changing method according to the present embodiment, the batch size can be increased and the learning time shortened even for a learning method in which the loss transition does not exhibit a staircase shape, as shown in FIG. 2. In this way, the information processing device 10 according to the present embodiment makes it possible to effectively speed up learning by a DNN regardless of the learning method.

<<1.2. Functional configuration example of information processing device 10>>
Next, a functional configuration example of the information processing device 10 according to the present embodiment will be described. FIG. 3 is a block diagram showing a functional configuration example of the information processing device 10 according to the present embodiment. Referring to FIG. 3, the information processing device 10 according to the present embodiment includes an input/output control unit 110, a learning unit 120, a differential calculation unit 130, and a batch size changing unit 140.

(Input/output control unit 110)
The input/output control unit 110 according to the present embodiment controls the user interface for DNN learning. For example, the input/output control unit 110 passes various data entered via an input device to the learning unit 120. The input/output control unit 110 also passes, for example, the values output by the learning unit 120 to an output device.

(Learning unit 120)
The learning unit 120 according to the present embodiment performs learning using a DNN. As described above, one of the features of the learning unit 120 according to the present embodiment is that it dynamically changes the batch size value during learning based on the gap value from the ideal state for learning output by the DNN. The gap value from the ideal state according to the present embodiment includes, for example, the loss, the training error, and the validation error.

(Differential calculation unit 130)
The differential calculation unit 130 according to the present embodiment calculates an n-th derivative by differentiating the loss input from the learning unit 120 n times, and outputs the n-th derivative to the learning unit 120.
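The text does not specify how the n-th derivative is computed. One simple possibility, shown here purely as an assumption, is an n-th backward finite difference over the most recent loss values; the function name is hypothetical.

```python
def nth_difference(losses, n):
    """n-th backward finite difference of the most recent losses.

    n = 0 returns the latest loss, n = 1 the latest change in loss,
    n = 2 the change of that change, and so on.
    Raises ValueError if fewer than n + 1 values are available.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if len(losses) < n + 1:
        raise ValueError("need at least n + 1 loss values")
    values = list(losses[-(n + 1):])
    for _ in range(n):
        # Each pass replaces the values with their successive differences.
        values = [b - a for a, b in zip(values, values[1:])]
    return values[0]
```

With losses recorded once per epoch, the result for n = 1 is the per-epoch slope that the learning unit compares with its threshold.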

(Batch size changing unit 140)
The batch size changing unit 140 according to the present embodiment has a function of controlling the increase and decrease of the batch size based on the batch size value set by the learning unit 120. Details of the functions of the batch size changing unit 140 according to the present embodiment will be described later.

The functional configuration example of the information processing device 10 according to the present embodiment has been described above. Note that the configuration described with reference to FIG. 3 is merely an example, and the functional configuration of the information processing device 10 according to the present embodiment is not limited to this example. The functional configuration of the information processing device 10 according to the present embodiment can be flexibly modified according to specifications and operation.

<<1.3. Verification results>>
Next, verification results for the batch size changing method realized by the information processing device 10 according to the present embodiment will be described.

First, the verification results obtained with ImageNet as the dataset and ResNet-50 as the DNN will be described. FIG. 4 is a diagram showing verification results when the batch size change based on the slope of the loss according to the present embodiment is applied to ImageNet/ResNet-50.

Here, learning in Reference was performed with the batch size fixed at 32K. In Approach, on the other hand, the batch size was increased from 32K to 68K at timing T3 (epoch 30), when the first derivative of the loss, that is, its slope, fell below the threshold, and learning was continued.

Comparing Reference with Approach shows that increasing the batch size with the batch size changing method according to the present embodiment does not significantly affect the convergence of the loss.

FIG. 5 is a diagram showing verification results when the batch size change based on training values according to the present embodiment is applied to ImageNet/ResNet-50.

Here, the batch size was increased from 2K to 20K at timing T4 (epoch 30), when the 0th derivative of the training error fell below the threshold of 1.8, and learning was continued.

Referring to FIG. 5, it can be seen that learning heads toward convergence without adverse effect even when the batch size is increased based on the 0th derivative of the training error.

Next, the verification results obtained with MNIST as the dataset will be described. FIG. 6 is a diagram showing verification results when the loss-based batch size change according to the present embodiment is applied to learning using MNIST.

Here, learning in Reference was performed with the batch size fixed at 128. In Approach, on the other hand, the batch size was increased from 128 to 3072 at timing T5 (epoch 1), when the first derivative of the loss fell below its threshold and the 0th derivative of the loss fell below the threshold of 0.03, and learning was continued.

As a result of this control, the number of parameter updates was reduced from 2000 to 560, and the learning time was greatly shortened.

Next, the verification results obtained with cifar10 as the dataset will be described. FIG. 7 and FIG. 8 are diagrams showing verification results when the loss-based batch size change according to the present embodiment is applied to learning using cifar10.

In the verification of FIG. 7, learning in Reference was performed with the batch size fixed at 64. In Approach, on the other hand, the batch size was increased from 64 to 1024 at timing T6 (epoch 5), when the first derivative of the loss fell below its threshold and the 0th derivative of the loss fell below the threshold of 0.35, and learning was continued.

As a result of this control, the number of parameter updates was reduced from 20000 to 5000, and the learning time was greatly shortened.

In the verification of FIG. 8, as in the verification of FIG. 7, learning in Reference was performed with the batch size fixed at 64. In Approach, on the other hand, the batch size was increased from 64 to 1024 at timing T7 (epoch 8), when the 0th derivative of the loss fell below the threshold of 0.35, and learning was continued.

As a result of this control, the number of parameter updates was reduced from 20000 to 7250, and the learning time was greatly shortened.

The verification results for the batch size changing method according to the present embodiment have been described above. These results show that applying the batch size changing method according to the present embodiment reduces the number of parameter updates to about 1/3 to 1/4 with almost no impact on performance. In this way, the information processing device 10 according to the present embodiment makes it possible to effectively speed up learning by a DNN regardless of the learning method.

Note that the batch size change based on the first derivative of the loss can be realized, for example, by a training script TS1 and a loss slope calculation module CM as shown in FIG. 9. The code in FIG. 9 is shown as pseudocode.

In the example shown in FIG. 9, the training script TS1 first calls the loss slope acquisition API, that is, the loss slope calculation module CM, and obtains the current value of loss_grad as the return value.

Next, the obtained value of loss_grad is compared with a threshold, and if the value of loss_grad is below the threshold, the batch size is increased.

The training script TS1 repeats these steps until learning converges.

When called by the training script TS1, the loss slope calculation module CM saves the loss value it holds to loss_prev and calculates loss_grad as the difference between the newly obtained loss and loss_prev. At this time, the loss slope calculation module CM may take a moving average of the loss to remove noise, as shown in the figure.
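The pseudocode of FIG. 9 is not reproduced in this text, so the following is only a sketch of the behavior described above, not the code from the figure. The names loss, loss_prev, and loss_grad follow the description; the class name, the window size, the use of the magnitude of loss_grad, the doubling factor, and the upper limit are assumptions. The training loop replays a list of recorded losses instead of training a real network.

```python
from collections import deque


class LossSlopeCalculator:
    """Sketch of the loss slope calculation module CM."""

    def __init__(self, window=3):
        self._recent = deque(maxlen=window)  # raw losses for the moving average
        self.loss = None        # current (smoothed) loss
        self.loss_prev = None   # smoothed loss held before the latest call

    def update(self, new_loss):
        """Record a loss and return loss_grad (None on the first call)."""
        self._recent.append(new_loss)
        smoothed = sum(self._recent) / len(self._recent)
        self.loss_prev = self.loss   # save the held loss to loss_prev
        self.loss = smoothed
        if self.loss_prev is None:
            return None
        return self.loss - self.loss_prev


def run_training(losses_per_step, threshold, batch_size,
                 factor=2, max_batch_size=1024):
    """Toy stand-in for training script TS1.

    Returns the batch size that was in effect at each step.
    """
    cm = LossSlopeCalculator(window=1)  # window=1 disables smoothing
    used = []
    for loss in losses_per_step:
        used.append(batch_size)
        loss_grad = cm.update(loss)
        # Assumption: compare the magnitude of the slope with the threshold.
        if loss_grad is not None and abs(loss_grad) < threshold:
            batch_size = min(batch_size * factor, max_batch_size)
    return used
```

In a real training script, the loop body would run one training step or epoch at the current batch size and pass the resulting loss to the calculator.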

The description above has mainly dealt with increasing the batch size, but the learning unit 120 according to the present embodiment can also decrease the batch size value during learning when divergence of learning is estimated based on the loss.

For example, in the example shown in FIG. 5, learning is still unstable in period D1, so the learning unit 120 maintains the small batch size value given as the initial value. When learning has stabilized in period D2, the learning unit 120 may increase the batch size value.

With Step Learning rate decay, however, the loss generally drops sharply immediately after the learning rate is lowered, so learning is expected to diverge more easily than before the learning rate was lowered. For this reason, the learning unit 120 according to the present embodiment can aim for convergence by decreasing, in period D3, the batch size value that was increased in period D2. In doing so, the learning unit 120 may set, for example, a batch size value that lies between the values used in period D1 and period D2.

Next, the epoch-based batch size change according to the present embodiment will be described. In learning by a DNN, when the learning rate is not decayed, there is a strong tendency for learning to become easier as learning progresses, that is, as epochs accumulate. For this reason, the learning unit 120 according to the present embodiment can increase the batch size value as epochs elapse. For example, the learning unit 120 according to the present embodiment may increase the batch size value every epoch.

FIG. 10 is a diagram showing verification results when the per-epoch batch size increase according to the present embodiment is applied to learning using MNIST. Here, the initial batch size was set to 128, and the batch size was doubled each epoch: to 256 at epoch 1 (timing T8), to 512 at epoch 2 (timing T9), and to 1024 at epoch 3 (timing T10).
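The doubling schedule used in this verification can be written as a one-line rule. The sketch below is illustrative only: the function name is hypothetical, and the upper limit of 1024 is an assumption, since the text describes the schedule only up to epoch 3.

```python
def batch_size_for_epoch(epoch, initial=128, factor=2, maximum=1024):
    """Batch size for a 0-based epoch, multiplied by factor each epoch.

    The cap at maximum is an assumption added for illustration.
    """
    return min(initial * factor ** epoch, maximum)


schedule = [batch_size_for_epoch(e) for e in range(5)]
print(schedule)
```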

As a result of this control, the number of parameter updates was reduced from 2000 to 938. These verification results show that, even when the batch size is increased every epoch, the learning time can be greatly reduced without significantly affecting the convergence of the loss.

In addition, when divergence of learning is estimated as a result of increasing the batch size value based on the loss or the epoch, the learning unit 120 according to the present embodiment can also aim for convergence by reloading the network model from before the divergence, that is, from the immediately preceding epoch.

FIG. 11 is a diagram showing verification results when the batch size change based on loss and epoch according to the present embodiment is applied to learning using cifar10.

Here, the initial batch size was set to 64, and the threshold for the 0th derivative of the loss was set to 0.35. In the example shown in FIG. 11, the batch size increase was started when the loss fell below the threshold of 0.35 at epoch 8 (timing T11), and the batch size was increased every epoch thereafter.

Subsequently, when the batch size was increased to 4K at epoch 14 (timing T12), divergence of learning was estimated. The learning unit 120 according to the present embodiment therefore stopped increasing the batch size value at the start of epoch 15, reloaded the model as of the start of epoch 14, fixed the batch size value at 2K, and continued learning.

In this way, when the learning unit 120 according to the present embodiment reloads the network model of a past epoch, it may set a batch size value smaller than the value set in that past epoch.

With the above function of the learning unit 120 according to the present embodiment, the batch size value can be increased or decreased automatically based on the loss and the epoch, making it possible to effectively reduce the number of parameter updates while avoiding divergence of learning.

Note that the increase and decrease of the batch size based on loss and epoch as described above can be realized, for example, by a training script TS2 as shown in FIG. 12. The code in FIG. 12 is shown as pseudocode.

図１２に示す一例の場合、訓練スクリプトＴＳ２は、まず、図９に示した損失傾き計算モジュールＣＭを呼び出し返り値として取得したｌｏｓｓ＿ｇｒａｄを、閾値と比較している。ここで、ｌｏｓｓ＿ｇｒａｄが閾値を下回る場合、訓練スクリプトＴＳ２は、バッチサイズの自動増加を開始する。 In the example shown in FIG. 12, the training script TS2 first compares loss_grad obtained as a return value from calling the loss gradient calculation module CM shown in FIG. 9 with a threshold. Now, if loss_grad is below the threshold, the training script TS2 starts auto-increasing the batch size.

その後、訓練スクリプトＴＳ２は、損失が前エポックよりも閾値以上大きくなったか否かを判定する。ここで、損失の増大が認められる場合、訓練スクリプトＴＳ２は、バッチサイズの自動増加を停止する。 The training script TS2 then determines whether the loss is greater than the previous epoch by more than a threshold. Now, if an increase in loss is observed, the training script TS2 stops automatically increasing the batch size.

また、この際、訓練スクリプトＴＳ２は、前エポックにおけるＤＮＮのネットワークモデルを再読み込みする。 Also, at this time, the training script TS2 reloads the network model of the DNN in the previous epoch.
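
As an illustration of the control flow described for the training script TS2, the following is a minimal Python sketch; it is not the pseudocode of FIG. 12 itself. The class `BatchSizeController`, the function `simulate`, the `divergence_margin` value, and the doubling `growth` factor are assumptions introduced here for illustration, and `loss_grad` is a hypothetical stand-in for the loss gradient calculation module CM that returns the n-th finite difference of the per-epoch loss. The caller is assumed to save a checkpoint at the start of every epoch and to restore it when `update` returns `"reload"`.

```python
from dataclasses import dataclass


def loss_grad(losses, n=0):
    """Hypothetical stand-in for the loss gradient calculation module CM.

    Returns the n-th finite difference of the per-epoch loss history;
    n = 0 yields the most recent loss itself.
    """
    values = list(losses)
    for _ in range(n):
        values = [b - a for a, b in zip(values, values[1:])]
    return values[-1]


@dataclass
class BatchSizeController:
    batch_size: int = 64            # initial value used in the FIG. 11 example
    start_threshold: float = 0.35   # threshold compared with loss_grad
    divergence_margin: float = 0.1  # assumed: loss rise that counts as divergence
    growth: int = 2                 # assumed: multiply the batch size each epoch
    n: int = 0                      # order of the derivative to compare
    increasing: bool = False
    stopped: bool = False

    def update(self, losses):
        """Call once per finished epoch; returns 'keep', 'increase' or 'reload'."""
        if self.stopped:
            return "keep"
        if self.increasing and len(losses) >= 2:
            if losses[-1] - losses[-2] >= self.divergence_margin:
                # Divergence estimated: stop increasing and fall back to the
                # previous, smaller batch size. The caller is expected to
                # reload the checkpoint saved before the failed epoch.
                self.increasing = False
                self.stopped = True
                self.batch_size //= self.growth
                return "reload"
        if not self.increasing and len(losses) > self.n:
            if loss_grad(losses, self.n) < self.start_threshold:
                self.increasing = True
        if self.increasing:
            self.batch_size *= self.growth
            return "increase"
        return "keep"


def simulate(per_epoch_losses):
    """Replay a recorded loss curve; log (epoch, action, next batch size)."""
    controller = BatchSizeController()
    history, log = [], []
    for epoch, loss in enumerate(per_epoch_losses, start=1):
        history.append(loss)
        action = controller.update(history)
        log.append((epoch, action, controller.batch_size))
    return log
```

Under these assumptions, an initial value of 64 and a loss that first drops below 0.35 at epoch 8 give a batch size of 4096 for epoch 14, and a rollback then gives 2048. This is consistent with the 4K and 2K values of the FIG. 11 example, although the source does not state that doubling was actually used.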

<<1.4. Method for realizing batch size increase/decrease>>
Next, the method for realizing the batch size increase/decrease according to the present embodiment will be described in detail. The batch size changing unit 140 according to the present embodiment acquires the batch size value set by the learning unit 120 and controls the GPUs (Graphics Processing Units) based on that value, thereby realizing the increase or decrease of the batch size.

For example, the batch size changing unit 140 according to the present embodiment may control the increase or decrease of the batch size by recreating the model in the GPU.

FIG. 13 is a diagram for explaining the recreation of the model in the GPU by the batch size changing unit 140 according to the present embodiment. In this case, the learning unit 120 first inputs the loss value to the differential calculation unit 130 and obtains the n-th derivative of that value. The learning unit 120 then determines the new batch size value based on the obtained n-th derivative and inputs that batch size value to the batch size changing unit 140.

Next, the batch size changing unit 140 according to the present embodiment instructs the GPU currently used for learning to recreate the model based on the input batch size value.

FIG. 13 shows an example in which, of GPU_0 and GPU_1, GPU_0 is currently used for learning and the batch size of the model in GPU_0 is 32; the batch size changing unit 140 instructs GPU_0 to recreate the model and changes the batch size of that model to 64.

With the above control by the batch size changing unit 140 according to the present embodiment, the batch size can be increased globally regardless of the number of GPUs in the information processing device 10, and making use of the parallel computing capability of the GPU is also expected to lead to a further speedup.

Also, for example, the batch size changing unit 140 according to the present embodiment may control the increase or decrease of the batch size by increasing or decreasing the number of loops of the calculation related to learning. This technique is also referred to as accum-grad.

FIG. 14 is a diagram for explaining the control of the number of calculation loops by the batch size changing unit 140 according to the present embodiment. In this case, the batch size changing unit 140 according to the present embodiment instructs the GPU currently used for learning to change the number of calculation loops based on the batch size value input from the learning unit 120.

FIG. 14 shows an example in which, of GPU_0 and GPU_1, GPU_0 is currently used for learning and the batch size of the model in GPU_0 is 32; the batch size changing unit 140 instructs GPU_0 to perform accum-grad twice, so that learning with a batch size of 32 is performed twice.

With the above control by the batch size changing unit 140 according to the present embodiment, the batch size can be increased without being limited by the number of GPUs or the memory capacity, and because the number of synchronization processes decreases, learning can be sped up by the amount corresponding to the eliminated synchronization processes.
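
The accum-grad idea can be sketched in a few lines of Python. The example below is an illustration under stated assumptions: it uses a one-parameter linear model with a squared-error loss rather than a DNN on a GPU, and the function names `grad_sum`, `sgd_step`, and `accum_grad_step` are hypothetical. It shows that two loops over micro-batches of 32, followed by a single parameter update, produce the same update as one batch of 64.

```python
def grad_sum(w, batch):
    """Sum of per-sample gradients of 0.5 * (w * x - y) ** 2 with respect to w."""
    return sum((w * x - y) * x for x, y in batch)


def sgd_step(w, batch, lr):
    """One ordinary update that processes the whole batch at once."""
    return w - lr * grad_sum(w, batch) / len(batch)


def accum_grad_step(w, batch, lr, micro_batch_size):
    """One update that visits the batch in micro-batches.

    Gradients are accumulated across the loops and the parameter is updated
    only once, so the effective batch size is len(batch) although only
    micro_batch_size samples are processed per loop.
    Returns the updated parameter and the number of loops executed.
    """
    accumulated = 0.0
    loops = 0
    for start in range(0, len(batch), micro_batch_size):
        accumulated += grad_sum(w, batch[start:start + micro_batch_size])
        loops += 1
    return w - lr * accumulated / len(batch), loops


if __name__ == "__main__":
    data = [(float(i), 3.0 * i) for i in range(64)]  # synthetic samples, y = 3x
    w_full = sgd_step(0.5, data, 1e-4)
    w_acc, loops = accum_grad_step(0.5, data, 1e-4, micro_batch_size=32)
    print(loops, abs(w_full - w_acc) < 1e-12)
```

Because the parameter is updated once per accumulated batch rather than once per micro-batch, the number of updates, and with it any per-update synchronization, is halved in this two-loop case.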

Also, for example, the batch size changing unit 140 according to the present embodiment may control the increase or decrease of the batch size by increasing or decreasing the number of GPUs used for learning.

FIG. 15 is a diagram for explaining the control of the number of GPUs in use by the batch size changing unit 140 according to the present embodiment. In this case, the batch size changing unit 140 according to the present embodiment instructs a GPU that is not currently used for learning to start operating, based on the batch size value input from the learning unit 120.

FIG. 15 shows an example in which, of GPU_0 and GPU_1, only GPU_0 is currently used for learning and the batch size changing unit 140 instructs GPU_1 to start operating.

With the above control by the batch size changing unit 140 according to the present embodiment, learning can be sped up in proportion to the added computational resources.

The batch size change control methods used by the batch size changing unit 140 according to the present embodiment have been described above. The batch size changing unit 140 according to the present embodiment can further increase the speedup obtained from a larger batch size by selecting among these methods based on, for example, the priority shown in FIG. 16.

FIG. 16 is a flowchart showing the flow of control by the batch size changing unit 140 according to the present embodiment.

Referring to FIG. 16, the batch size changing unit 140 first determines whether an additionally usable GPU exists (S1101).

If an additionally usable GPU exists (S1101: Yes), the batch size changing unit 140 controls the increase of the batch size by assigning that GPU to learning (S1102).

The batch size changing unit 140 then determines whether the target batch size has been achieved by the processing in step S1102 (S1103).

If the target batch size has been achieved (S1103: Yes), the batch size changing unit 140 ends the processing related to the batch size change.

On the other hand, if the target batch size has not been achieved (S1103: No), or if no additionally usable GPU exists (S1101: No), the batch size changing unit 140 determines whether the memory of the GPU currently in use has free space (S1104).

If the memory of the GPU currently in use has free space (S1104: Yes), the batch size changing unit 140 controls the increase of the batch size by recreating the model in the GPU currently in use (S1105).

The batch size changing unit 140 then determines whether the target batch size has been achieved by the processing in step S1105 (S1106).

If the target batch size has been achieved (S1106: Yes), the batch size changing unit 140 ends the processing related to the batch size change.

On the other hand, if the target batch size has not been achieved (S1106: No), or if the memory of the GPU currently in use has no free space (S1104: No), the batch size changing unit 140 controls the increase of the batch size by increasing the number of loops of the calculation related to learning (S1107), and ends the processing related to the batch size change.
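
The priority order of steps S1101 to S1107 can be sketched as follows. This is a minimal Python illustration, not the implementation of the batch size changing unit 140. The `Resources` fields and the function `increase_batch_size` are hypothetical, and the sketch assumes that the global batch size is the product of the number of GPUs in use, the per-GPU batch size, and the number of accum-grad loops, which the source does not state explicitly. Free GPU memory is modeled by a hypothetical `per_gpu_batch_max`.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    gpus_in_use: int        # GPUs currently assigned to learning
    gpus_idle: int          # additionally usable GPUs
    per_gpu_batch: int      # batch size of the model on each GPU
    per_gpu_batch_max: int  # assumed: largest per-GPU batch that fits in memory
    accum_loops: int = 1    # number of accum-grad loops per parameter update

    @property
    def global_batch(self):
        # Assumed relation between the three control methods.
        return self.gpus_in_use * self.per_gpu_batch * self.accum_loops


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def increase_batch_size(res, target):
    """Raise res.global_batch to at least target; return the methods applied."""
    applied = []

    # S1101 / S1102: assign additionally usable GPUs first.
    while res.gpus_idle > 0 and res.global_batch < target:
        res.gpus_idle -= 1
        res.gpus_in_use += 1
        applied.append("add_gpu")
    if res.global_batch >= target:  # S1103
        return applied

    # S1104 / S1105: if memory has free space, recreate the model with a
    # larger per-GPU batch size.
    if res.per_gpu_batch < res.per_gpu_batch_max:
        needed = ceil_div(target, res.gpus_in_use * res.accum_loops)
        res.per_gpu_batch = min(res.per_gpu_batch_max, needed)
        applied.append("recreate_model")
    if res.global_batch >= target:  # S1106
        return applied

    # S1107: cover the remainder by increasing the number of loops.
    res.accum_loops = ceil_div(target, res.gpus_in_use * res.per_gpu_batch)
    applied.append("accum_grad")
    return applied
```

For example, starting from one GPU in use with a per-GPU batch size of 32, one idle GPU, and room for a per-GPU batch size of 64, a target of 256 is reached by applying all three methods in order: adding the GPU, recreating the model at 64, and running two accum-grad loops.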

<2. Hardware configuration example>
Next, a hardware configuration example of the information processing device 10 according to an embodiment of the present disclosure will be described. FIG. 17 is a block diagram showing a hardware configuration example of the information processing device 10 according to an embodiment of the present disclosure. Referring to FIG. 17, the information processing device 10 includes, for example, a processor 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, an interface 877, an input device 878, an output device 879, a storage 880, a drive 881, a connection port 882, and a communication device 883. Note that the hardware configuration shown here is an example, and some of the components may be omitted. Components other than those shown here may also be included.

(Processor 871)
The processor 871 functions as, for example, an arithmetic processing device or a control device, and controls all or part of the operation of each component based on various programs recorded in the ROM 872, the RAM 873, the storage 880, or a removable recording medium 901. The processor 871 includes, for example, a GPU and a CPU. Note that the information processing device 10 according to an embodiment of the present disclosure includes at least two GPUs.

(ROM 872, RAM 873)
The ROM 872 is a means for storing programs to be read by the processor 871, data used for calculation, and the like. The RAM 873 temporarily or permanently stores, for example, programs to be read by the processor 871 and various parameters that change as appropriate when those programs are executed.

(Host bus 874, bridge 875, external bus 876, interface 877)
The processor 871, the ROM 872, and the RAM 873 are connected to one another via, for example, the host bus 874, which is capable of high-speed data transmission. The host bus 874 is in turn connected, for example via the bridge 875, to the external bus 876, which has a relatively low data transmission speed. The external bus 876 is connected to various components via the interface 877.

(Input device 878)
For the input device 878, for example, a mouse, a keyboard, a touch panel, buttons, switches, and levers are used. A remote controller capable of transmitting control signals using infrared rays or other radio waves may also be used as the input device 878. The input device 878 also includes a voice input device such as a microphone.

(Output device 879)
The output device 879 is a device capable of visually or audibly notifying the user of acquired information, for example, a display device such as a CRT (Cathode Ray Tube), LCD, or organic EL display, an audio output device such as a speaker or headphones, a printer, a mobile phone, or a facsimile. The output device 879 according to the present disclosure also includes various vibration devices capable of outputting tactile stimuli.

(Storage 880)
The storage 880 is a device for storing various data. As the storage 880, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device is used.

(Drive 881)
The drive 881 is a device that reads information recorded on the removable recording medium 901, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, or writes information to the removable recording medium 901.

(Removable recording medium 901)
The removable recording medium 901 is, for example, DVD media, Blu-ray (registered trademark) media, HD DVD media, or various semiconductor storage media. The removable recording medium 901 may of course also be, for example, an IC card equipped with a contactless IC chip, or an electronic device.

(Connection port 882)
The connection port 882 is a port for connecting an external connection device 902, for example, a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, an RS-232C port, or an optical audio terminal.

(External connection device 902)
The external connection device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, or an IC recorder.

(Communication device 883)
The communication device 883 is a communication device for connecting to a network, for example, a communication card for wired or wireless LAN, Bluetooth (registered trademark), or WUSB (Wireless USB), a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various kinds of communication.

<3. Summary>
As described above, the information processing device 10 according to an embodiment of the present disclosure includes the learning unit 120 that performs learning using a neural network, and one of its features is that the learning unit 120 dynamically changes the batch size value during learning based on a gap value from an ideal state for the learning, which is output by the neural network. With this configuration, learning by a DNN can be effectively sped up regardless of the learning method.

Although the preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, the technical scope of the present disclosure is not limited to these examples. It is clear that a person with ordinary knowledge in the technical field of the present disclosure could conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present disclosure.

The effects described in this specification are merely explanatory or illustrative and are not limiting. That is, the technology according to the present disclosure can produce other effects that are apparent to those skilled in the art from the description of this specification, in addition to or instead of the above effects.

It is also possible to create a program for causing hardware such as a CPU, a ROM, and a RAM built into a computer to exhibit functions equivalent to those of the configuration of the information processing device 10, and a non-transitory computer-readable recording medium on which the program is recorded can also be provided.

The steps related to the processing of the information processing device 10 in this specification do not necessarily have to be processed in time series in the order described in the flowchart. For example, the steps related to the processing of the information processing device 10 may be processed in an order different from the order described in the flowchart, or may be processed in parallel.

Note that the following configurations also belong to the technical scope of the present disclosure.
(1)
An information processing device comprising:
an acquisition unit that acquires a gap value from an ideal state related to learning by a neural network; and
an instruction unit that instructs a dynamic change of a batch size value in the neural network based on the gap value from the ideal state.
(2)
The information processing device according to (1) above, wherein
the gap value from the ideal state includes at least a value related to a loss,
the acquisition unit acquires the value related to the loss, and
the instruction unit instructs a dynamic change of the batch size value in the neural network based on the value related to the loss.
(3)
The information processing device according to (2) above, wherein
the value related to the loss includes an n-th derivative of the loss (n is an integer of 0 or more; the same applies hereinafter).
(4)
The information processing device according to (3) above, wherein
the n-th derivative of the loss is a derivative of the loss in the time direction.
(5)
The information processing device according to any one of (2) to (4) above, wherein
the acquisition unit acquires the value related to the loss via an API, and
the instruction unit automatically instructs a change of the batch size value based on the value related to the loss.
(6)
The information processing device according to any one of (2) to (5) above, wherein
the instruction unit instructs an increase of the batch size value when convergence of learning is estimated from the value related to the loss.
(7)
The information processing device according to (3) or (4) above, wherein
the instruction unit instructs an increase of the batch size value based on the n-th derivative of the loss.
(8)
The information processing device according to (7) above, wherein
the instruction unit instructs an increase of the batch size value based on at least one of the value of the loss or the slope of the loss falling below a threshold.
(9)
The information processing device according to any one of (2) to (8) above, wherein
the instruction unit instructs a decrease of the batch size value when divergence of learning is estimated based on the value related to the loss.
(10)
The information processing device according to (9) above, wherein
the instruction unit instructs reloading of the network model of a past epoch when divergence of learning is estimated based on the value related to the loss.
(11)
The information processing device according to (10) above, wherein
when the network model of the past epoch has been reloaded, the instruction unit causes a batch size value smaller than the value set in the past epoch to be set.
(12)
The information processing device according to any one of (1) to (11) above, wherein
the instruction unit instructs an increase or decrease of the batch size by recreating the model in a GPU.
(13)
The information processing device according to any one of (1) to (12) above, wherein
the instruction unit instructs an increase or decrease of the batch size by increasing or decreasing the number of loops of the calculation related to learning.
(14)
The information processing device according to any one of (1) to (13) above, wherein
the instruction unit instructs an increase or decrease of the batch size by increasing or decreasing the number of GPUs used for learning.
(15)
The information processing device according to any one of (1) to (14) above, wherein
when an additionally usable GPU exists, the instruction unit instructs an increase of the batch size by assigning that GPU to learning.
(16)
The information processing device according to any one of (1) to (15) above, wherein
when no additionally usable GPU exists and the memory of a GPU currently in use has free space, the instruction unit instructs an increase of the batch size by recreating the model in the GPU currently in use.
(17)
The information processing device according to any one of (1) to (16) above, wherein
when the memory of the GPU currently in use has no free space, the instruction unit instructs an increase of the batch size by increasing the number of loops of the calculation related to learning.
(18)
The information processing device according to (1) above, wherein
the gap value from the ideal state includes at least one of a training error or a validation error.
(19)
An information processing method comprising:
acquiring, by a processor, a gap value from an ideal state related to learning by a neural network; and
instructing a dynamic change of a batch size value in the neural network based on the gap value from the ideal state.

10 Information processing device
110 Input/output control unit
120 Learning unit
130 Differential calculation unit
140 Batch size changing unit

Claims

An information processing device comprising:
an acquisition unit that acquires a gap value from an ideal state related to learning by a neural network; and
an instruction unit that instructs a dynamic change of a batch size value in the neural network based on the gap value from the ideal state.

The information processing device according to claim 1, wherein
the gap value from the ideal state includes at least a value related to a loss,
the acquisition unit acquires the value related to the loss, and
the instruction unit instructs a dynamic change of the batch size value in the neural network based on the value related to the loss.

The information processing device according to claim 2, wherein
the value related to the loss includes an n-th derivative of the loss (n is an integer of 0 or more; the same applies hereinafter).

The information processing device according to claim 3, wherein
the n-th derivative of the loss is a derivative of the loss in the time direction.

The information processing device according to claim 2, wherein
the acquisition unit acquires the value related to the loss via an API, and
the instruction unit automatically instructs a change of the batch size value based on the value related to the loss.

The information processing device according to claim 2, wherein
the instruction unit instructs an increase of the batch size value when convergence of learning is estimated from the value related to the loss.

The information processing device according to claim 3, wherein
the instruction unit instructs an increase of the batch size value based on the n-th derivative of the loss.

The information processing device according to claim 7, wherein
the instruction unit instructs an increase of the batch size value based on at least one of the value of the loss or the slope of the loss falling below a threshold.

The information processing device according to claim 2, wherein
the instruction unit instructs a decrease of the batch size value when divergence of learning is estimated based on the value related to the loss.

The information processing device according to claim 9, wherein
the instruction unit instructs reloading of the network model of a past epoch when divergence of learning is estimated based on the value related to the loss.

The information processing device according to claim 10, wherein
when the network model of the past epoch has been reloaded, the instruction unit causes a batch size value smaller than the value set in the past epoch to be set.

The information processing device according to claim 1, wherein
the instruction unit instructs an increase or decrease of the batch size by recreating the model in a GPU.

The information processing device according to claim 1, wherein
the instruction unit instructs an increase or decrease of the batch size by increasing or decreasing the number of loops of the calculation related to learning.

The information processing device according to claim 1, wherein
the instruction unit instructs an increase or decrease of the batch size by increasing or decreasing the number of GPUs used for learning.

The information processing device according to claim 1, wherein
when an additionally usable GPU exists, the instruction unit instructs an increase of the batch size by assigning that GPU to learning.

The information processing device according to claim 1, wherein
when no additionally usable GPU exists and the memory of a GPU currently in use has free space, the instruction unit instructs an increase of the batch size by recreating the model in the GPU currently in use.

The information processing device according to claim 1, wherein
when the memory of the GPU currently in use has no free space, the instruction unit instructs an increase of the batch size by increasing the number of loops of the calculation related to learning.

The information processing device according to claim 1, wherein
the gap value from the ideal state includes at least one of a training error or a validation error.

An information processing method comprising:
acquiring, by a processor, a gap value from an ideal state related to learning by a neural network; and
instructing a dynamic change of a batch size value in the neural network based on the gap value from the ideal state.