JP2018159992A

JP2018159992A - Parameter adjustment device, learning system, parameter adjustment method and program

Info

Publication number: JP2018159992A
Application number: JP2017055505A
Authority: JP
Inventors: Yumi Ozaki; Takaya Matsuno; Jun Ohashi
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2017-03-22
Filing date: 2017-03-22
Publication date: 2018-10-11
Anticipated expiration: 2037-03-22
Also published as: JP6815240B2

Abstract

PROBLEM TO BE SOLVED: To provide a parameter adjustment device, a learning system, a parameter adjustment method, and a program capable of reducing the man-hours required to adjust hyperparameters. SOLUTION: A parameter adjustment device has an estimation unit, a restriction unit, and a determination unit. The estimation unit estimates, on the basis of a learning result obtained by machine learning, an estimation function that indicates the relationship between a hyperparameter specifying the operation of the machine learning and the learning result. The restriction unit restricts the value range of the hyperparameter on the basis of the estimation function estimated by the estimation unit. The determination unit determines the hyperparameter used for the machine learning from among the hyperparameters contained in the value range restricted by the restriction unit. SELECTED DRAWING: Figure 1

Description

Embodiments of the present invention relate to a parameter adjustment device, a learning system, a parameter adjustment method, and a program.

In recent years, methods using machine learning have been studied in fields such as image recognition, speech recognition, and language analysis. Deep learning methods that use multilayer neural networks have attracted particular attention. In such a method, a learning process using teacher data or the like is performed in advance to adjust the connection weights of the neural network and related quantities. When the connection weights are adjusted in this way, the values of the hyperparameters that define the learning operation (the learning rate, the number of nodes in each layer, the number of layers, and so on) greatly affect the learning result.

JP 2006-23868 A

In a learning process of this kind, the learning process must be repeated many times with different hyperparameter values until the desired learning result is obtained. For example, each time a hyperparameter value is changed, the learning result must be checked, the relationship between the hyperparameter and the learning result must be understood, and the hyperparameter value must then be adjusted again. Adjusting hyperparameters in this way therefore takes time.

In addition, when a general search method based on a probability distribution is used to adjust hyperparameters, the search may be swayed by the initial values and repeatedly explore unnecessary ranges far from the values that should be searched. As a result, the desired learning result may not be obtained even after the hyperparameters are adjusted.

The problem to be solved by the present invention is to provide a parameter adjustment device, a learning system, a parameter adjustment method, and a program that can adjust hyperparameters efficiently.

The parameter adjustment device of the embodiment has an estimation unit, a limitation unit, and a determination unit. The estimation unit estimates, based on a learning result obtained by machine learning, an estimation function indicating the relationship between a hyperparameter that defines the operation of the machine learning and the learning result. The limitation unit limits the value range of the hyperparameter based on the estimation function estimated by the estimation unit. The determination unit determines the hyperparameter to be used for the machine learning from among the hyperparameters included in the value range limited by the limitation unit.

FIG. 1 is a diagram showing an example of the learning system of the embodiment.
FIG. 2 is a flowchart showing an example of the processing of the parameter adjustment device of the embodiment.
FIG. 3 is a flowchart showing an example of the function estimation process of the parameter adjustment device of the embodiment.
FIG. 4 is a diagram showing the relationship between the learning result obtained by the learning process and the first hyperparameter in the embodiment.
FIG. 5 is a diagram showing the process of adjusting the reference point on the reference function to the reference point in the learning result in the embodiment.
FIG. 6 is a diagram showing the process of estimating the function that minimizes the error between the learning result and the calculation result calculated from the reference function in the embodiment.
FIG. 7 is a diagram showing a plurality of estimation functions estimated by the function estimation unit in the embodiment.
FIG. 8 is a diagram showing the value range of the first hyperparameter limited by the search range limiting unit in the embodiment.
FIG. 9 is a flowchart showing the processing of the parameter adjustment device of the example.
FIG. 10 is a flowchart showing the function estimation process of the parameter adjustment device of the example.
FIG. 11 is a diagram showing the relationship between the learning result obtained by the learning process and the first hyperparameter in the example.
FIG. 12 is a diagram showing the process of adjusting the reference point on the reference function to the reference point in the learning result in the example.
FIG. 13 is a diagram showing the process of estimating the function that minimizes the error between the learning result and the calculation result calculated from the reference function in the example.
FIG. 14 is a diagram showing a plurality of estimation functions estimated by the function estimation unit in the example.
FIG. 15 is a diagram showing the value range limited by the search range limiting unit in the example.
FIG. 16 is a diagram showing the learning process result in the example.

Hereinafter, a parameter adjustment device, a learning system, a parameter adjustment method, and a program according to embodiments will be described with reference to the drawings.

FIG. 1 is a diagram showing an example of a learning system S of the embodiment. The learning system S includes, for example, a parameter adjustment device 1 and a learning device 3. The parameter adjustment device 1 and the learning device 3 are connected to each other by a network N. The network N includes, for example, a WAN (Wide Area Network), a LAN (Local Area Network), the Internet, or a dedicated line.

The parameter adjustment device 1 adjusts hyperparameters in machine learning. That is, when adjusting the hyperparameters of machine learning, the parameter adjustment device 1 estimates, as a function, the relationship between a hyperparameter and the learning result from the tendency of the learning results (correct answer rate, etc.) obtained by the learning process, and limits the value range of the hyperparameter based on this function. For example, the parameter adjustment device 1 estimates, as a function, the relationship between a hyperparameter and the learning result that the learning device 3 obtained by performing the learning process with that hyperparameter, and limits the value range of the hyperparameter based on this function.

The parameter adjustment device 1 includes, for example, a parameter candidate determination unit 10 (determination unit), a task transmission unit 12, a function estimation unit 14 (estimation unit), a search range limiting unit 16 (limitation unit), and a storage unit 18. Some or all of the functional units of the parameter adjustment device 1 may be realized by a processor executing a program (software). In this case, the parameter adjustment device 1 may be realized by installing the program on a computer device in advance. Alternatively, it may be realized by installing on a computer device, as appropriate, the program stored on a storage medium such as a CD-ROM or distributed via a network.

The parameter candidate determination unit 10 determines candidate combinations of the types of hyperparameters to be adjusted and their values. To determine the candidate combinations, the parameter candidate determination unit 10 uses, for example, a random search method based on a uniform distribution or a Bayesian search method based on a probability distribution.
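The random search option mentioned above can be illustrated with a minimal sketch. The function name `sample_candidates`, the dictionary of search ranges, and the numeric bounds are all hypothetical and are not taken from the patent; the sketch only shows what drawing candidate values from a uniform distribution over a stored search range might look like.

```python
import random


def sample_candidates(search_range, count, seed=None):
    """Draw `count` candidate values uniformly at random from [low, high]."""
    low, high = search_range
    rng = random.Random(seed)  # local generator so results are reproducible
    return [rng.uniform(low, high) for _ in range(count)]


# Hypothetical search ranges for two hyperparameters A and B.
search_ranges = {"A": (1e-5, 1e-1), "B": (0.01, 0.99)}

# Five candidate values per hyperparameter.
candidates = {
    name: sample_candidates(bounds, count=5, seed=0)
    for name, bounds in search_ranges.items()
}
```

A Bayesian search would replace the uniform draw with a proposal informed by earlier learning results; that variant is not sketched here.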

The task transmission unit 12 transmits to the learning device 3 a task indicating a learning process that uses the candidates determined by the parameter candidate determination unit 10. The learning device 3 executes the learning process based on this task.

The function estimation unit 14 estimates a function indicating the relationship between a hyperparameter and the learning result (hereinafter referred to as an "estimation function") based on the tendency of the learning results received from the learning device 3.

Based on the estimation function estimated by the function estimation unit 14, the search range limiting unit 16 predicts the tendency of the learning result, including in the unlearned range where no learning result has been obtained, and from the predicted tendency limits the hyperparameter to the value range in which the optimal learning result is predicted to be obtained.

The storage unit 18 stores in advance the search ranges of the hyperparameters used in machine learning. The storage unit 18 also stores the value range of each hyperparameter limited by the search range limiting unit 16. The storage unit 18 is realized by a ROM (Read Only Memory), a RAM (Random Access Memory), an HDD (Hard Disk Drive), a flash memory, or the like.

The learning device 3 performs determination processing such as image recognition, speech recognition, and language analysis. The learning device 3 performs the learning process based on the hyperparameters received from the parameter adjustment device 1. The learning device 3 uses, for example, a deep learning method based on a multilayer neural network. In this learning process, the configuration of the neural network, the connection weights, and the like are adjusted. The learning device 3 inputs the learning result of the learning process (correct answer rate, etc.) to the parameter adjustment device 1.

Next, the operation of the parameter adjustment device 1 will be described. FIG. 2 is a flowchart showing an example of the processing of the parameter adjustment device 1.

First, based on a user instruction or the like input from an input unit (not shown), the parameter candidate determination unit 10 selects the hyperparameters to be adjusted from among hyperparameters such as the learning rate, the number of nodes in each layer, and the number of layers (step S101). The following describes an example in which two hyperparameters related to the learning process (a first hyperparameter A and a second hyperparameter B) are to be adjusted.

Next, the parameter candidate determination unit 10 determines a plurality of candidate values for one of the two hyperparameters (the first hyperparameter A). That is, the parameter candidate determination unit 10 reads the search range of the first hyperparameter A from the storage unit 18 and determines a plurality of candidate values for the first hyperparameter A within that search range. A wide search range such as 10ⁿ (n = 1, 2, ...) may be set for the first hyperparameter A. The parameter candidate determination unit 10 sets the other hyperparameter (the second hyperparameter B) to a fixed value.

Next, the task transmission unit 12 transmits the parameter sets determined by the parameter candidate determination unit 10 (the combinations of each of the plural values of the first hyperparameter A with the fixed value of the second hyperparameter B) to the learning device 3 and causes the learning device 3 to perform the learning process (step S105). The task transmission unit 12 may also transmit to the learning device 3 a hyperparameter that defines the number of repetitions of the learning process performed in the learning device 3.

Next, the function estimation unit 14 estimates an estimation function indicating the relationship between the first hyperparameter A and the learning result based on the learning results received from the learning device 3 (step S107). FIG. 3 is a flowchart showing an example of the function estimation process of the parameter adjustment device 1.

In the function estimation process, the function estimation unit 14 first multiplies a predefined reference function f(A) by a constant α so that the value at the reference point on the reference function f(A) equals the value at the reference point in the learning result received from the learning device 3 (step S201). A plurality of reference functions f(A) may be defined in advance. In that case, the function whose tendency is closest to the learning result received from the learning device 3 may be selected from among them. FIG. 4 is a diagram showing the relationship between the learning result (correct answer rate) obtained by the learning process of the learning device 3 and the first hyperparameter A. FIG. 5 is a diagram showing the process of adjusting the reference point D1 on the reference function f(A) to the reference point D2 in the learning result. As shown in FIG. 5, the reference function f(A) is multiplied by α so that the correct answer rate at the reference point D1 on the reference function f(A) equals the correct answer rate at the reference point D2 in the learning result.

Next, the function estimation unit 14 estimates, as a first estimation function F1, the function that minimizes the error (difference) between the learning result received from the learning device 3 and the calculation result calculated from the function α*f(A) obtained by multiplying the reference function f(A) by α (step S203). For example, as shown in FIG. 6, the function estimation unit 14 estimates the first estimation function F1 by finding the function that minimizes the total error between the learning results received from the learning device 3 and the calculation results calculated from the scaled reference function α*f(A).

Next, using the reference function f(A), the function estimation unit 14 estimates a second estimation function F2 corresponding to the minimum value of the second hyperparameter B (step S205). For example, the function estimation unit 14 transmits to the learning device 3 the combinations of each of the plural values of the first hyperparameter A determined by the parameter candidate determination unit 10 with the minimum value of the second hyperparameter B and causes it to perform the learning process. It then estimates, as the second estimation function F2, the function that minimizes the error between the resulting learning results and the calculation results calculated from the function obtained by multiplying the reference function f(A) by a constant.

Next, using the reference function f(A), the function estimation unit 14 estimates a third estimation function F3 corresponding to the maximum value of the second hyperparameter B (step S207). For example, the function estimation unit 14 transmits to the learning device 3 the combinations of each of the plural values of the first hyperparameter A determined by the parameter candidate determination unit 10 with the maximum value of the second hyperparameter B and causes it to perform the learning process. It then estimates, as the third estimation function F3, the function that minimizes the error between the resulting learning results and the calculation results calculated from the function obtained by multiplying the reference function f(A) by a constant.

Next, based on the first estimation function F1, the second estimation function F2, and the third estimation function F3, the function estimation unit 14 estimates estimation functions in the range where no learning result has been obtained (the unlearned range) (step S209). For example, by estimating the change of another parameter C that is associated with the second hyperparameter B in each estimation function, the function estimation unit 14 estimates the estimation functions between the fixed value of the second hyperparameter B and the minimum value of the second hyperparameter B. In the same way, the function estimation unit 14 estimates the estimation functions between the fixed value of the second hyperparameter B and the maximum value of the second hyperparameter B.

FIG. 7 is a diagram showing a plurality of estimation functions estimated by the function estimation unit 14. In the example shown in FIG. 7, after the function estimation unit 14 has estimated the first estimation function F1, the second estimation function F2, and the third estimation function F3, it estimates the change of the other parameter C associated with the second hyperparameter B in each estimation function, and thereby estimates the estimation functions in the unlearned range other than F1, F2, and F3 (dotted lines). This completes the function estimation processing of this flowchart.

Next, based on the plurality of estimation functions estimated by the function estimation unit 14, the search range limiting unit 16 limits the value range of the first hyperparameter A for each value of the second hyperparameter B (step S111). The value of the second hyperparameter B may be selected using a random method based on a uniform distribution, the Metropolis method, or the like. The value range of the first hyperparameter A may be limited to a predetermined range around the peak value of the learning result in each estimation function. The search range limiting unit 16 stores the limited value range in the storage unit 18.

FIG. 8 is a diagram showing the value range of the first hyperparameter A limited by the search range limiting unit 16. In the example shown in FIG. 8, the value range is limited to a predetermined range σ around the peak P of the learning result in the estimation function estimated by the function estimation unit 14.
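The range limitation around the peak can be sketched as follows. This is only an illustration under stated assumptions: the name `limit_range` is hypothetical, σ is treated as a half-width on either side of the peak (the text does not say whether σ is a half-width or a full width), the peak is located by evaluating the estimation function on a grid, and the parabola used as the estimation function is a stand-in.

```python
def limit_range(estimated, grid, sigma, search_range):
    """Return (low, high): a window of half-width sigma around the peak.

    `estimated` maps a value of hyperparameter A to a predicted learning
    result; the peak P is the grid point with the largest prediction.
    """
    values = [estimated(a) for a in grid]
    peak_index = max(range(len(grid)), key=values.__getitem__)
    peak = grid[peak_index]
    # Clip the window so it never leaves the original search range.
    low = max(search_range[0], peak - sigma)
    high = min(search_range[1], peak + sigma)
    return low, high


def estimated(a):
    """Stand-in estimation function (hypothetical): peaks at A = 2."""
    return 0.9 - 0.1 * (a - 2.0) ** 2


grid = [i * 0.5 for i in range(11)]  # 0.0, 0.5, ..., 5.0
low, high = limit_range(estimated, grid, sigma=1.0, search_range=(0.0, 5.0))
```

With these stand-in values the limited range is the interval from 1.0 to 3.0, so later candidates for A would be drawn only from that window.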

Next, the parameter candidate determination unit 10 selects, from among the values of the first hyperparameter A contained in the value range that was limited by the search range limiting unit 16 and stored in the storage unit 18, the value of the first hyperparameter A predicted to yield the optimal learning result, and determines a parameter set consisting of the selected first hyperparameter A and the second hyperparameter B (step S113).

Next, the task transmission unit 12 transmits the parameter set to be used for machine learning, as determined by the parameter candidate determination unit 10, to the learning device 3 and causes the learning device 3 to perform the learning process (step S115). This completes the processing of this flowchart.

According to the embodiment described above, the relationship between each parameter and the learning result is estimated as a function from the tendency of the actual learning results, and learning proceeds while the value range is limited. This reduces the number of searches over unnecessary ranges and shortens the man-hours required to adjust the hyperparameters.

Next, an example of a hyperparameter adjustment method using Momentum SGD will be described. FIG. 9 is a flowchart showing the processing of the parameter adjustment device 1 in the example.

First, based on an operator instruction or the like input from an input unit (not shown), the parameter candidate determination unit 10 selects the hyperparameters to be adjusted from among hyperparameters such as the learning rate, the number of nodes in each layer, and the number of layers (step S301). Here, the parameter candidate determination unit 10 selects the learning rate as the first hyperparameter A and the momentum as the second hyperparameter B.

Next, the parameter candidate determination unit 10 sets the first hyperparameter A (learning rate) to "learning rate = 10⁻ⁿ (n = 1, 2, ..., 5)" and sets the second hyperparameter B (momentum) to the default value B_default, "momentum = 0.90" (step S303).

Next, the task transmission unit 12 transmits the parameter sets determined by the parameter candidate determination unit 10 (the combinations of each of the five values of the first hyperparameter A with the fixed value of the second hyperparameter B) to the learning device 3 and causes the learning device 3 to perform the learning process (step S305). FIG. 11 is a diagram showing the relationship between the learning results obtained by the learning process and the first hyperparameter A.
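As a concrete illustration of the parameter sets sent in this step, the sketch below builds the five combinations from the values given in the text (learning rate 10⁻ⁿ for n = 1 to 5, momentum fixed at 0.90). The dictionary keys and variable names are hypothetical; the patent does not specify a data format for the task.

```python
# Learning rates 10^-n for n = 1, ..., 5 (first hyperparameter A).
learning_rates = [10.0 ** -n for n in range(1, 6)]

# Fixed default value of the second hyperparameter B (momentum).
momentum_default = 0.90

# One parameter set per learning rate, each paired with the fixed momentum.
parameter_sets = [
    {"learning_rate": lr, "momentum": momentum_default}
    for lr in learning_rates
]
```

Each element of `parameter_sets` would correspond to one learning run on the learning device 3.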

Next, the function estimation unit 14 estimates an estimation function indicating the relationship between the first hyperparameter A and the learning result based on the learning results received from the learning device 3 (step S307). FIG. 10 is a flowchart showing the function estimation process of the parameter adjustment device 1.

In the function estimation process, the function estimation unit 14 first multiplies a predefined reference function f(A) by a constant α so that the value at the reference point on the reference function f(A) equals the value at the reference point in the learning result received from the learning device 3 (step S401). Here, the Poisson distribution of the following equation (1) is assumed to be defined as the reference function f(A).

$$f(A) = \frac{\lambda^{k} e^{-\lambda}}{k!} \tag{1}$$

The variable k in equation (1) and the exponent n in the learning rate (10⁻ⁿ) satisfy the relationship n = k + 1. That is, k in equation (1) is a variable associated with the first hyperparameter A. The variable λ in equation (1) is a variable (parameter C) associated with the second hyperparameter B.

As shown in FIG. 12, the reference function f(A) is multiplied by α so that the correct answer rate at the reference point D1 on the reference function f(A) equals the correct answer rate at the reference point D2 in the learning result. For example, the function estimation unit 14 multiplies the reference function f(A) by α so that the value of the distribution at k = 1 (learning rate = 0.01) equals the learning result (correct answer rate) received from the learning device 3.
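The scaling step can be sketched with the standard Poisson probability mass function. The helper names `poisson` and `scale_factor`, the value λ = 2.0 used before fitting, and the observed correct answer rate of 0.80 are hypothetical placeholders, not values from the patent; only the reference point k = 1 (learning rate 0.01) comes from the text.

```python
import math


def poisson(k, lam):
    """Poisson probability mass function: lam^k * exp(-lam) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def scale_factor(observed, k_ref, lam):
    """Constant alpha that makes alpha * f(k_ref) equal the observed value."""
    return observed / poisson(k_ref, lam)


lam = 2.0        # hypothetical value of the parameter lambda
observed = 0.80  # hypothetical correct answer rate at learning rate 0.01

alpha = scale_factor(observed, k_ref=1, lam=lam)

# Scaled reference function evaluated at k = 0..4 (learning rates 10^-1..10^-5).
scaled = [alpha * poisson(k, lam) for k in range(5)]
```

After scaling, the curve passes through the observed value at the reference point, so the remaining mismatch at the other learning rates is what the subsequent fitting step reduces.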

Next, the function estimation unit 14 estimates, as the first estimation function F1, the function that minimizes the total error (difference) between the learning results received from the learning device 3 and the calculation results calculated from the function α*f(A) obtained by multiplying the reference function f(A) by α (step S403). For example, the function estimation unit 14 estimates the value of λ that minimizes the sum of squared errors. As shown in FIG. 13, the function estimation unit 14 estimates the function with λ₀ = 2 in equation (1) as the first estimation function F1.
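One way to carry out the least-squares estimate of λ is a simple grid search, sketched below. The grid of candidate λ values, the choice to recompute α for every candidate λ, and the synthetic correct answer rates are assumptions made for illustration; the patent does not state which optimizer is used.

```python
import math


def poisson(k, lam):
    """Poisson probability mass function: lam^k * exp(-lam) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def fit_lambda(accuracies, k_ref=1, lam_grid=None):
    """Return (lam, alpha) minimising the sum of squared errors.

    accuracies[k] is the learning result for learning rate 10^-(k+1).
    For each candidate lam, alpha is chosen so that the scaled curve
    matches the observation at the reference point k_ref.
    """
    if lam_grid is None:
        lam_grid = [0.1 * i for i in range(1, 101)]  # 0.1, 0.2, ..., 10.0
    best = None
    for lam in lam_grid:
        alpha = accuracies[k_ref] / poisson(k_ref, lam)
        sse = sum(
            (acc - alpha * poisson(k, lam)) ** 2
            for k, acc in enumerate(accuracies)
        )
        if best is None or sse < best[0]:
            best = (sse, lam, alpha)
    return best[1], best[2]


# Synthetic learning results generated from a known curve (hypothetical data).
true_alpha, true_lam = 3.0, 2.0
accuracies = [true_alpha * poisson(k, true_lam) for k in range(5)]

lam_hat, alpha_hat = fit_lambda(accuracies)
```

Because the synthetic data lie exactly on a scaled Poisson curve with λ = 2, the search recovers that value; real learning results would leave a nonzero residual.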

Next, using the reference function f(A), the function estimation unit 14 estimates the second estimation function F2 corresponding to the minimum value (B_min) of the second hyperparameter B (step S405). For example, the function estimation unit 14 transmits to the learning device 3 the combinations of each of the five values of the first hyperparameter A determined by the parameter candidate determination unit 10 with the minimum value (0.01) of the second hyperparameter B and causes it to perform the learning process. It then estimates, as the second estimation function F2, the function that minimizes the error between the resulting learning results and the calculation results calculated from the function obtained by multiplying the reference function f(A) by a constant.

Next, using the reference function f(A), the function estimation unit 14 estimates the third estimation function F3 corresponding to the maximum value (B_max) of the second hyperparameter B (step S407). For example, the function estimation unit 14 transmits to the learning device 3 the combinations of each of the five values of the first hyperparameter A determined by the parameter candidate determination unit 10 with the maximum value (0.99) of the second hyperparameter B and causes it to perform the learning process. It then estimates, as the third estimation function F3, the function that minimizes the error between the resulting learning results and the calculation results calculated from the function obtained by multiplying the reference function f(A) by a constant.

Next, based on the first estimation function F1, the second estimation function F2, and the third estimation function F3, the function estimation unit 14 estimates the estimation functions in the range for which no learning results have been obtained (the unlearned range) (step S409). As shown in FIG. 14, after estimating F1, F2, and F3, the function estimation unit 14 obtains an approximate expression representing the relationship between the second hyperparameter B and the parameter λ in the range 0.01 < momentum < 0.99. For example, the function estimation unit 14 obtains approximate expression (4), which represents the relationship between expressions (2) and (3) below. In expression (4), a and b are constants.

Here, since λ_0 = B_default and ΔB = B − B_default (B is given by a probability distribution at learning time), approximate expression (4) can be written as expression (5).

In this example, the function estimation unit 14 obtains the approximate expression given by expression (6) below (with b in expression (5) set to 0). In approximate expression (6), β is a constant. Using approximate expression (6), the function estimation unit 14 obtains the value of λ at each momentum value in steps of 0.01 (excluding momentum = 0.9).
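The tabulation of λ over the unlearned momentum values can be sketched as follows. Expression (6) is not reproduced in this text, so the sketch takes the approximation as a caller-supplied function `approx` rather than assuming its form. The names `lambda_table`, `approx`, and `default_percent` are hypothetical.

```python
from typing import Callable, Dict


def lambda_table(approx: Callable[[float], float],
                 default_percent: int = 90) -> Dict[float, float]:
    """Evaluate approx at momentum 0.02, 0.03, ..., 0.98, skipping the default.

    Integer percent steps avoid the drift of repeatedly adding 0.01.
    The endpoints 0.01 and 0.99 are left out because the text restricts
    the range to 0.01 < momentum < 0.99.
    """
    table: Dict[float, float] = {}
    for percent in range(2, 99):
        if percent == default_percent:
            continue  # momentum = 0.9 is excluded in the text
        momentum = percent / 100
        table[momentum] = approx(momentum)
    return table
```

Any callable can stand in for `approx`; a linear or quadratic function of the momentum would both fit this interface.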

In this way, the estimation functions in the unlearned range are estimated by estimating how the parameter λ, which is associated with the second hyperparameter B, changes across the first estimation function F1, the second estimation function F2, and the third estimation function F3. This completes the function-estimation processing of this flowchart. The approximate expression obtained by the function estimation unit 14 need not be a linear function; it may be, for example, a quadratic function. If the approximate expression is defined in advance, the processing that estimates the second estimation function F2 corresponding to the minimum value (B_min) of the second hyperparameter B and the third estimation function F3 corresponding to the maximum value (B_max) of the second hyperparameter B may be omitted.

Next, the search range limiting unit 16 limits the value range of the first hyperparameter A based on the plurality of estimation functions estimated by the function estimation unit 14 (step S311). FIG. 15 shows the estimation functions estimated by the function estimation unit 14 and the value range limited by the search range limiting unit 16. In the example shown in FIG. 15, the value range of the first hyperparameter A is limited to the correct-answer-rate peaks P1 to P4 of the estimation functions estimated for four values of the second hyperparameter B (0.09, 0.62, 0.90, and 0.99). The search range limiting unit 16 may instead limit the value range to a predetermined range around each of the peaks P1 to P4.
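Limiting the value range to a predetermined range around a peak can be sketched as follows. The source does not say how the peak is located or how wide the range is, so this sketch assumes the peak is found by evaluating the estimation function on a grid of candidate A values and that the range is symmetric with a caller-chosen `half_width`. All names are illustrative.

```python
from typing import Callable, Sequence, Tuple


def limit_range(estimate: Callable[[float], float],
                grid: Sequence[float],
                half_width: float) -> Tuple[float, float]:
    """Return (lower, upper) around the grid point where estimate is largest.

    Assumptions: grid search for the peak and a symmetric window,
    clipped to the span covered by the grid.
    """
    if not grid:
        raise ValueError("grid must contain at least one candidate value")
    peak = max(grid, key=estimate)
    lower = max(min(grid), peak - half_width)
    upper = min(max(grid), peak + half_width)
    return lower, upper
```

Calling `limit_range` once per estimation function (one per value of B) would give one window per peak, analogous to P1 to P4.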

Next, the parameter candidate determination unit 10 selects the first hyperparameter A to be used for machine learning from among the first hyperparameters A included in the value range limited by the search range limiting unit 16, and determines a parameter set consisting of the selected first hyperparameter A and the second hyperparameter B (step S313).
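The selection in step S313 can be sketched as follows. The claims mention a search method using a uniform distribution, so this sketch assumes A is drawn uniformly from the limited range and paired with a fixed value of B. The function name, its arguments, and the optional `seed` are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def draw_parameter_sets(a_range: Tuple[float, float],
                        b_value: float,
                        count: int,
                        seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Draw `count` parameter sets (A, B) with A uniform over a_range.

    Assumption: independent uniform draws. The source does not give
    the number of candidates or how B is chosen for each set.
    """
    lower, upper = a_range
    if lower > upper:
        raise ValueError("a_range must be ordered as (lower, upper)")
    rng = random.Random(seed)
    return [(rng.uniform(lower, upper), b_value) for _ in range(count)]
```

Each returned pair corresponds to one parameter set that would be sent to the learning device.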

Next, the task transmission unit 12 transmits the parameter set determined by the parameter candidate determination unit 10 for use in machine learning to the learning device 3, and has the learning device 3 perform the learning process (step S315). This completes the processing of this flowchart.

FIG. 16 shows the results obtained when the learning process is performed using hyperparameters adjusted by the parameter adjustment device 1 of the example. The average improvement in the correct answer rate is the difference between the correct answer rate obtained with the reference arguments (lr = 0.01, momentum = 0.9) and the maximum correct answer rate obtained through adjustment. The average search count is the average number of searches that a Bayesian search needs to reach a correct answer rate equal to or higher than the one obtained in the example. Because the Bayesian search is capped at 40 searches, a run that does not reach that rate within the cap is counted as 40.
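The two quantities described for FIG. 16 can be sketched as follows. The function names and the representation of a search run as a list of correct answer rates in search order are assumptions made for illustration; only the cap of 40 comes from the text.

```python
from statistics import mean
from typing import Sequence


def improvement(baseline_accuracy: float,
                tuned_accuracies: Sequence[float]) -> float:
    """Best correct answer rate after adjustment minus the reference rate."""
    return max(tuned_accuracies) - baseline_accuracy


def searches_needed(history: Sequence[float],
                    target: float,
                    cap: int = 40) -> int:
    """Number of searches until the target rate is reached, counted as cap if never."""
    for count, accuracy in enumerate(history[:cap], start=1):
        if accuracy >= target:
            return count
    return cap


def average_search_count(runs: Sequence[Sequence[float]],
                         target: float,
                         cap: int = 40) -> float:
    """Mean of searches_needed over several independent search runs."""
    return mean(searches_needed(run, target, cap) for run in runs)
```

Averaging `improvement` over several trials in the same way would give the average improvement reported in the figure.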

As FIG. 16 shows, compared with a conventional Bayesian search, learning with hyperparameters adjusted by the parameter adjustment device 1 of the example achieves a comparable average improvement with roughly half the average number of searches. In addition, the results at 15 searches show that the example increases the average improvement. For example, with 30 classes and 15 searches, the average improvement is 1.81 with the conventional Bayesian search and 2.52 with the example. Moreover, as the number of classes increases, the example increases the average improvement further.

The embodiment above was described using two parameters to be adjusted (the first hyperparameter A and the second hyperparameter B), but it may also be applied to the adjustment of three or more parameters.

Although several embodiments of the present invention have been described, these embodiments are presented by way of example and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the spirit of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and likewise in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS: 1 ... parameter adjustment device, 3 ... learning device, 10 ... parameter candidate determination unit, 12 ... task transmission unit, 14 ... function estimation unit, 16 ... search range limiting unit, 18 ... storage unit, S ... learning system

Claims

A parameter adjustment device comprising:
an estimation unit that estimates, based on a learning result obtained by machine learning, an estimation function indicating a relationship between a hyperparameter that defines the operation of the machine learning and the learning result;
a limiting unit that limits the value range of the hyperparameter based on the estimation function estimated by the estimation unit; and
a determination unit that determines the hyperparameter to be used for the machine learning from among the hyperparameters included in the value range limited by the limiting unit.

The parameter adjustment device according to claim 1, wherein the estimation unit estimates the estimation function over a range that includes a range in which no learning result of the machine learning has been obtained.

The parameter adjustment device according to claim 1 or 2, wherein the estimation unit estimates, as the estimation function, the reference function that, among a plurality of reference functions, minimizes the difference between the learning result obtained by the machine learning and a calculation result calculated based on that reference function.

The parameter adjustment device according to any one of claims 1 to 3, wherein the determination unit determines the hyperparameter to be used for the machine learning based on a search method using a uniform distribution.

The parameter adjustment device according to any one of claims 1 to 4, wherein the estimation unit estimates an estimation function indicating a relationship between the learning result and a hyperparameter used for adjusting the configuration and the connection weights of a neural network.

The parameter adjustment device according to any one of claims 1 to 5, wherein the estimation unit estimates, as the estimation function, a function that minimizes the difference between the learning result obtained by the machine learning and a calculation result calculated based on a Poisson distribution function.

A learning system comprising:
the parameter adjustment device according to any one of claims 1 to 6; and
a learning device that performs a learning process using the hyperparameter determined by the parameter adjustment device.

A parameter adjustment method comprising:
estimating, based on a learning result obtained by machine learning, an estimation function indicating a relationship between a hyperparameter that defines the operation of the machine learning and the learning result;
limiting the value range of the hyperparameter based on the estimated estimation function; and
determining the hyperparameter to be used for the machine learning from among the hyperparameters included in the limited value range.

A program that causes a computer to:
estimate, based on a learning result obtained by machine learning, an estimation function indicating a relationship between a hyperparameter that defines the operation of the machine learning and the learning result;
limit the value range of the hyperparameter based on the estimated estimation function; and
determine the hyperparameter to be used for the machine learning from among the hyperparameters included in the limited value range.